I’m starting to realize that the key to getting a decent RMSE is to not whiff completely. What I mean is that you want to minimize the number of times your predicted rating is 2 or more away from the actual rating. RMSE squares each error, so a miss of 2 stars costs four times as much as a miss of 1. So the big problem really is determining when people are going to rate a movie 1 or 5 stars. I’m going to try not to worry about 2, 3, or 4 star ratings. In those cases I could predict a 3 and at worst be 1 off. It’s when someone goes to either extreme that it can really hurt your score.
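
To put a number on that, here’s a rough Python sketch. The rating lists are made up for illustration (they aren’t from the Netflix data), and `rmse` is just a small helper I’m defining here. It compares always predicting 3 stars against a set of middle-of-the-road ratings versus the same set with a single 1 star rating swapped in.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings, invented for this example.
always_three = [3, 3, 3, 3, 3]       # the "just guess 3" strategy
middle_actual = [2, 3, 4, 3, 2]      # nobody goes to an extreme
extreme_actual = [2, 3, 4, 3, 1]     # same, but the last rating is a 1

rmse_middle = rmse(always_three, middle_actual)
rmse_extreme = rmse(always_three, extreme_actual)

print(f"no extremes:  {rmse_middle:.3f}")
print(f"one extreme:  {rmse_extreme:.3f}")
```

The single 1 star rating doubles the total squared error (from 3 to 6), even though only one of the five predictions changed from a miss of 1 to a miss of 2.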

Take user 1858615 for example. User 1858615 has rated 227 movies, giving a whopping 183 of them 5 stars (80%). But they’ve also dished out 27 one star ratings (12%). That means that 92% of their ratings are either a 1 or a 5. This user has 5 movies in the probe set, and I have to predict their rating on each. Of course the user’s actual ratings are 1, 5, 5, 5, 5. Guessing the 5s isn’t so hard since the user has a penchant for high ratings, but how do I sniff out that 1 star rating? The movie in question is “Kung Fu Hustle”, which averaged 3.73 stars. Not exactly a dud.
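
Here’s a quick Python sketch of what that one rating does to the score. The counts and the five probe ratings are the ones quoted above. The “always predict 5” strategy is only an assumption for illustration, not a model I’ve actually built, and `rmse` is a helper defined just for this snippet.

```python
import math

# Counts quoted above for user 1858615.
total_ratings = 227
five_star = 183
one_star = 27

# Share of this user's ratings that are a 1 or a 5.
extreme_share = (five_star + one_star) / total_ratings
print(f"extreme ratings: {extreme_share:.1%}")

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# The user's five probe ratings, as listed above.
probe_actual = [1, 5, 5, 5, 5]

# Assumed naive strategy: this user loves 5s, so predict 5 every time.
all_fives = [5] * len(probe_actual)

probe_rmse = rmse(all_fives, probe_actual)
print(f"RMSE when predicting all 5s: {probe_rmse:.3f}")
```

Four of the five predictions are perfect, yet the RMSE for this user still comes out around 1.79, all of it from the single 4 star miss on the 1 star rating.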

My challenge now is to dig deeper into the data and try to figure out why user 1858615 gave movie 10231 one star.
