Spotify's Shuffle and Personal Websites | Next Issue #15






Hi there!
Yesterday I heard sad news, which quickly became good news (the good news is one of the tweets in this letter). Other than that, we’ll discuss the algorithm behind Spotify’s shuffle, see the pandemic’s impact on shopping through beautiful visualisations, look at a reverse-engineered resume ATS, and revisit my talk on personal websites.
Let’s dive in!

Five Stories
Spotify’s Shuffle Algorithm
When you hit shuffle on Spotify (or Apple Music, or YouTube), do you get a random list of songs to play one after another? If you have ten artists, each with around 100 songs, a truly random shuffle will likely give you 13 back-to-back songs by the same artist by the time you finish the first 100 songs (little simulation).
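A rough way to check this yourself is to simulate it. The sketch below assumes exactly ten artists with 100 songs each and a uniformly random shuffle. The helper name count_repeats is hypothetical, and it counts a "repeat" as a song whose artist matches the one played just before it, so the figure may differ from the linked simulation if that one counts differently.

```r
# Count adjacent same-artist pairs in the first `first_n` songs of a random shuffle.
# Assumption: 10 artists x 100 songs; a "repeat" is a song whose artist matches
# the artist of the song played immediately before it.
count_repeats <- function(n_artists = 10, songs_per_artist = 100, first_n = 100) {
  artists <- rep(seq_len(n_artists), each = songs_per_artist)
  queue <- sample(artists)[seq_len(first_n)]   # uniformly random play order
  sum(queue[-1] == queue[-first_n])            # compare each song with the previous one
}

set.seed(15)
repeats <- replicate(2000, count_repeats())
mean(repeats)    # average number of back-to-back repeats per 100 songs
range(repeats)   # how much it varies from one shuffle to the next
```

The point is less the exact number than the fact that repeats are common even when the shuffle is perfectly random.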
Spotify was bombarded with complaints from users who felt the shuffle was not random, and the team worked intensely on this. Initially, Spotify used the Fisher-Yates shuffle algorithm to create a perfectly random playing queue. It isn’t complicated.
  1. Write down the numbers from 1 to N, where N is the total number of songs.
  2. Pick a random k between 1 and the number of songs still on the list. Find the kth song from the end, note it down in a separate list, and strike it from the first list.
  3. Repeat step 2 until there are no songs left in the first list.
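The steps above can be sketched in a few lines of R. This is a minimal illustration of the pencil-and-paper version described here, not Spotify's code; the function name fisher_yates is my own. In practice, base R's sample() gives you a uniformly random shuffle in one call.

```r
# Pencil-and-paper Fisher-Yates shuffle: repeatedly strike a random song
# from the remaining list and append it to the shuffled queue.
fisher_yates <- function(songs) {
  remaining <- songs
  shuffled <- songs[0]                          # empty vector of the same type
  while (length(remaining) > 0) {
    k <- sample.int(length(remaining), 1)       # random k among the songs still left
    idx <- length(remaining) - k + 1            # position of the kth song from the end
    shuffled <- c(shuffled, remaining[idx])     # note it down separately
    remaining <- remaining[-idx]                # strike it from the first list
  }
  shuffled
}

set.seed(15)
queue <- fisher_yates(paste("Song", 1:10))
queue
```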
Simple, right? Well, listeners didn’t like the results. The engineering team had to define a “more-random-sounding” algorithm. Ultimately, they borrowed the idea behind Floyd-Steinberg dithering (an algorithm primarily used in image processing) to space out songs by the same artist. The algorithm can then be used recursively to space out songs from the same album.
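To make the spacing idea concrete, here is a simplified sketch of one way to do it. It is an assumption-laden illustration, not Spotify's implementation: each artist's songs are placed at roughly even intervals along a 0-to-1 timeline, with a random starting offset and a little jitter, and the queue is read off in timeline order. The name spread_shuffle and the jitter size are hypothetical.

```r
# Spread each artist's songs roughly evenly over the queue (illustrative only).
# Input: a vector giving the artist of each song. Output: a play order (indices).
spread_shuffle <- function(artist) {
  position <- numeric(length(artist))
  for (a in unique(artist)) {
    idx <- which(artist == a)
    m <- length(idx)
    offset <- runif(1, 0, 1 / m)                      # random start for this artist
    jitter <- runif(m, -0.1 / m, 0.1 / m)             # small noise so it is not rigid
    slots <- offset + (seq_len(m) - 1) / m + jitter   # evenly spaced positions
    position[idx[sample.int(m)]] <- slots             # random song order within the artist
  }
  order(position)
}

set.seed(15)
artist <- rep(c("A", "B", "C"), times = c(6, 3, 3))
play_order <- spread_shuffle(artist)
artist[play_order]
```

The same function could be applied again within a single artist, using album instead of artist, which is the recursive step mentioned above.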
The New Normal: Pandemic’s Impact on Shopping Searches is a visualisation project that explores the general interest in different products before and during the Covid-19 pandemic. The products are grouped into three categories: normal, new normal and unusual. Search interest for normal products remained consistent through the pandemic. Products whose search interest increased and has stayed high are called new normal. Unusual products suddenly got popular and are now back to their old levels, even though COVID cases are not :(
The website is a rabbit hole and you can explore search trends for hundreds of products. The website gives you an option to filter products by category as well.
Plotly with ggplot2
If you have created a ggplot2 plot and want to add interactivity, here’s a simple way to do it. Just pass the plot p to ggplotly().
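As a minimal sketch, assuming the ggplot2 and plotly packages are installed, and using the built-in mtcars data purely for illustration (the variables and labels are my choice, not those of the original plot):

```r
library(ggplot2)
library(plotly)

# An ordinary static ggplot2 scatter plot.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Convert it to an interactive plotly widget.
interactive_p <- ggplotly(p)
interactive_p
```

Printing the result in RStudio or an R Markdown document renders the interactive version.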
Making interactive plots with ggplotly().
And by interactive, I mean you can hover on a point and get all the details; select points using a box or lasso tool; zoom in and zoom out; and even download the resulting graphics!
Example of interactive plot with ggplotly().
Someone Reverse-Engineered Resumes Not Landing into Dustbins
Well, not really. In the modern world, being on the job market means applying through hundreds of portals, most of which will not let your resume past their applicant tracking system (ATS), considering it filters out 70% of applicants.
A (spoof) platform developed by Jess Peter gives you tools to generate resume content, a profile picture, a profile video and even thought-leadership tweets. The project uses JSON Resume to create the resume content, NVIDIA’s StyleGAN2 model (trained on 70,000 photos of people) to create the profile picture, GPT-2 to generate the tweets, and the First Order Motion Model framework to animate still photos so that they act out recorded facial expressions.
I wouldn’t recommend using it for actual applications, but it is definitely click-worthy.
I Web, Therefore I Exist
Digital identities have come a long way since the early days of the internet. Physical identities are gradually disappearing, and projects like Aadhaar in India have transformed governmental identification. Today, our digital social identities are owned and managed by Twitter, LinkedIn, and Facebook.
Last Saturday, I gave a talk at the Trenton R Users group on creating and controlling our digital social identity. How? By having a personal website. For the first half of the talk, I argued why personal websites make more sense today than ever before. Later, I moved on to a hands-on website-building session on Owlstown. Believe it or not, it takes a little more than 15 minutes.
We recorded the session, and you can view it for free on YouTube. Other materials and sources are available on my website.
Four Packages
plotly is an R package for interactive web-based plots. You can click on a point to learn more, pinch to zoom, and all the other familiar gestures work as well. Check the guidebook for the different plots you can make.
The creator of quanteda argues that you should drop all other text mining packages in R and use this one for all things text. Check the guidebook for more details.
pavo provides several tools for the spectral and spatial analysis of colour patterns. It offers flexible ways to import spectral data, process it, extract variables and produce graphics, among other things. See the vignette for more details.
tmap is a package to produce thematic maps in R. The syntax for creating plots is similar to that of ggplot2 but tailored to maps. Check the vignette to learn more.
Here’s the link to the spreadsheet with all packages mentioned in past editions.
Three Jargons
pull() is used to extract a column from a data frame while working with pipes. df %>% pull(var) is way neater than using df %>% .$var.
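A small illustration, assuming dplyr is installed; the data frame and its column names are made up for the example:

```r
library(dplyr)

# Hypothetical data frame for illustration.
df <- tibble(id = c("a", "b", "c"), var = c(10, 20, 30))

df %>% pull(var)              # plain numeric vector, the same values as df$var
df %>% pull(var, name = id)   # named vector, with names taken from the id column
df %>% pull(-1)               # negative positions count from the right: the last column
```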
Fair Bet: A fair bet is a bet for which the expected net payoff is zero after accounting for the cost of participation.
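A quick way to check whether a bet is fair is to compute its expected net payoff. The helper expected_net below is a hypothetical name, and the coin-flip game is a made-up example; the roulette figures are the standard single-number bet on a European wheel.

```r
# Expected net payoff of a bet: probability-weighted payout minus the cost to play.
expected_net <- function(payouts, probs, cost) {
  stopifnot(length(payouts) == length(probs),
            isTRUE(all.equal(sum(probs), 1)))
  sum(payouts * probs) - cost
}

# Hypothetical coin flip: pay 1 to play, receive 2 on heads and 0 on tails.
coin <- expected_net(payouts = c(2, 0), probs = c(0.5, 0.5), cost = 1)
coin       # 0, so the bet is fair

# Single-number bet in European roulette: pay 1, receive 36 with probability 1/37.
roulette <- expected_net(payouts = c(36, 0), probs = c(1 / 37, 36 / 37), cost = 1)
roulette   # negative, so the bet is not fair to the player
```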
Sometimes you may need to edit an image in R, such as converting it to black and white or the like. The image_convert() function from magick offers many options for doing so!
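For instance, a greyscale conversion might look like the sketch below. It assumes the magick package is installed and uses ImageMagick's built-in "logo:" demo image as a stand-in for your own file; the output file name is hypothetical.

```r
library(magick)

# Read an image. "logo:" is ImageMagick's built-in demo image;
# replace it with the path or URL of your own file.
img <- image_read("logo:")

# Convert to greyscale.
bw <- image_convert(img, type = "Grayscale")

# The same function can also change the file format on the way out.
bw_png <- image_convert(bw, format = "png")
image_info(bw_png)

# image_write(bw_png, "logo-bw.png")   # uncomment to save (hypothetical file name)
```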
Two Tweets
Albert Rapp
After seeing @TeunvandenBrand's thread about {geomtextpath} I had to try it out, so my contribution for this week's #tidyTuesday does that.

My #rstats code is a bit messy. I had to manually compute the densities and arrange them vertically. Here it is:
R Function A Day
As requested by some of you, there is now a book of these posts! 📗

Makes it easy to-

📑 read
🔍 search
🔗 share

It ain't pretty, but that's the best I'd do in a day 😅

PRs welcome if you notice that something is amiss 🙏

#rstats #DataScience
One Meme
That's a wrap!
With R Function A Day discontinued and me running out of “interesting-but-not-complicated” jargon, I have decided to include R functions in the Three Jargons section of this letter.
As always, feedback is welcome. See you next week!
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.


Knoxville, Tennessee.