Ranking in R

How to implement ranking in R

Evan Miller posted a how-to guide back in 2009 for ranking items based on reviews. The problem he highlighted is that sorting by the percentage of positive reviews doesn’t work well on sites like Yelp and Amazon, because it ignores how many reviews each item has. For example, is a restaurant with 2 positive reviews and 0 negative reviews really better than one with 50 positive reviews and 1 negative review? Clearly not, and Miller outlines a better way: rank by the lower bound of the Wilson score confidence interval instead of the raw percentage. Miller has code examples in Ruby, SQL, and even Excel, but not R. Here’s how to implement it in the best statistical programming language.

First we generate some dummy data. For this example, let’s treat a rating of 0 as a bad review and a rating of 1 as a good review.

set.seed(42)  # make the dummy data reproducible

x <- tibble::tibble(
  product = sample(LETTERS, size = 1000, replace = TRUE),  # 26 products, A to Z
  rating  = sample(c(0, 1), size = 1000, replace = TRUE)   # 0 = bad, 1 = good
)

Now for the ranking itself. There’s a lot of math behind it, but the equation is easy to implement with some help from dplyr.
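A minimal sketch of that step, assuming the equation in question is the lower bound of the Wilson score interval at 95% confidence, as in Miller’s post. The helper name `wilson_lower` and the column names `n`, `positive`, and `score` are illustrative choices, not something taken from Miller; the dummy data is regenerated so the block runs on its own.

```r
library(dplyr)

# Lower bound of the Wilson score interval for a proportion.
# pos  = number of positive reviews
# n    = total number of reviews
# conf = confidence level (0.95 gives z of about 1.96)
wilson_lower <- function(pos, n, conf = 0.95) {
  z    <- qnorm(1 - (1 - conf) / 2)
  phat <- pos / n
  (phat + z^2 / (2 * n) -
     z * sqrt((phat * (1 - phat) + z^2 / (4 * n)) / n)) /
    (1 + z^2 / n)
}

# Same dummy data as above: 0 = bad review, 1 = good review.
set.seed(42)
x <- tibble::tibble(
  product = sample(LETTERS, size = 1000, replace = TRUE),
  rating  = sample(c(0, 1), size = 1000, replace = TRUE)
)

# Count reviews per product, score each product, and sort best first.
ranked <- x %>%
  group_by(product) %>%
  summarise(n = n(), positive = sum(rating), .groups = "drop") %>%
  mutate(score = wilson_lower(positive, n)) %>%
  arrange(desc(score))

ranked
```

As a sanity check against the restaurant example, `wilson_lower(2, 2)` comes out at roughly 0.34 while `wilson_lower(50, 51)` is roughly 0.90, so the restaurant with 50 positive reviews and 1 negative review ranks higher, as it should.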

Written on March 21, 2017