Code Snippets in RStudio and Spotify's Search Algorithm | Next Issue #13

Harshvardhan
Hi there!
We are almost two weeks into 2022. Are you still keeping up with your New Year’s resolution? More power to you! 🚀
Today’s edition covers productivity in RStudio using snippets, resources for package development, a quick look into Spotify’s search algorithm, an ambitious project to document all algorithms, and concept maps for data science. I have also started maintaining a spreadsheet of the packages presented in this newsletter so that finding a package again isn’t a pain. (Thanks, Sreyan, for this wonderful idea.)
Let’s dive in.

Five Stories
Code Snippets in RStudio
Code snippets are text macros in RStudio for quickly inserting common pieces of code. For example, typing fun and pressing Tab inserts an R function definition. You can then move from the function name to the arguments to the body by pressing Tab again.
Example of snippets in RStudio.
These are helpful for speedy programming whether you are a novice or an advanced R user. You can also write your own code snippets for functionality you use every day. Jozef compiled a list of 11 ready-to-use examples that could boost your productivity. They are grouped into four common use cases, such as automatically inserting boilerplate code or inserting a code block (like loading those five packages you use every time). Jump to GitLab if you only want the examples.
Awesome R Package Development
Someone once said, “If you give the same advice thrice, write a blog. If you use the same code thrice, write a function. If you use the same function thrice, write a package”.
Developing packages in R is more straightforward than in most other languages, thanks to… other packages. Indrajeet Patil wrote a list of R packages that support package development. Whether you need package skeletons like golem and fusen, feel roxygen2 is not enough for your documentation needs, or need to generate a static website for package documentation, the list has it all.
Instant Search for Music and Podcasts
Spotify boasts 70 million songs and 3.2 million podcasts on its platform. When we click on that search button, how does it ensure we get what we expect? We would be overwhelmed by the options if it were just a textual match. In this blog post, Spotify explains how it designed its search algorithm.
There are three essential requirements of a good search algorithm. First, it is instantaneous, i.e. results update with each keystroke. Second, it is heterogeneous, i.e. the user could be searching for music or podcast. Third, it understands multiple representations of the same word, i.e. text in context.
In general, Spotify found that users delete 53% more characters when searching for a podcast than for a song, even though the average query length is the same. Furthermore, upon searching, users download podcasts nearly six times as often as songs. These insights led them to develop a neural architecture called Neural Instant Search (NIS), which outperformed existing algorithms.
Neural Instant Search for Music and Podcasts
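To make the first two requirements concrete, here is a minimal Python sketch of instant, heterogeneous search. It is only a toy prefix matcher and not Spotify's NIS model: the catalogue, the Item class and the function names are all made up for illustration, and it does nothing for the third requirement (understanding different representations of the same word).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    kind: str  # "song" or "podcast"


# Hypothetical toy catalogue; the titles are invented for this example.
CATALOGUE = [
    Item("Data Science Weekly", "podcast"),
    Item("Dancing in the Rain", "song"),
    Item("Daydream", "song"),
    Item("R for Everyone", "podcast"),
]


def instant_search(query, catalogue=CATALOGUE, limit=5):
    """Return items in which some word starts with the query (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        return []
    hits = [
        item
        for item in catalogue
        if any(word.startswith(q) for word in item.title.lower().split())
    ]
    return hits[:limit]


def results_per_keystroke(text):
    """Re-run the search after every typed character, as an instant search would."""
    return [
        (text[:i], [item.title for item in instant_search(text[:i])])
        for i in range(1, len(text) + 1)
    ]


if __name__ == "__main__":
    for typed, titles in results_per_keystroke("day"):
        print(f"{typed!r}: {titles}")
```

Songs and podcasts come back in the same result list, and the list narrows as more characters are typed. A plain textual match like this is exactly the baseline the post argues is not enough on its own.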
The Arcane Algorithm Archive
The Arcane Algorithm Archive is a collaborative effort to create a guide of important algorithms in all programming languages. The book is far from complete but is an ambitious step in an attempt to be comprehensive. It covers algorithms for plotting functions, fractals such as the Sierpinski triangle, data structures, algorithmic complexity, affine transformations and convolutions among others.
The book also briefly introduces algorithms such as the Euclidean algorithm for finding the greatest common divisor, the Monte Carlo method for simulation, gift wrapping from computational geometry, data compression (Pied Piper, anyone?), the Computus for determining the date of Easter, and an estimate of how high you can count on your fingers (11 quadrillion, apparently).
The majority of the book’s examples are written in Python and many are written in more than one language. The user can select a language of their choice.
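As a taste of the kind of algorithm the book covers, here is a short Python sketch of the Euclidean algorithm mentioned above. This is my own minimal version using the remainder (modulo) form, not code taken from the archive.

```python
def gcd(a, b):
    """Greatest common divisor via the Euclidean algorithm (modulo form)."""
    a, b = abs(a), abs(b)
    # Repeatedly replace (a, b) with (b, a mod b) until the remainder is zero.
    while b:
        a, b = b, a % b
    return a


if __name__ == "__main__":
    print(gcd(48, 18))
```

The loop works because any common divisor of a and b also divides a mod b, so the greatest common divisor never changes while the numbers shrink.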
Concept Maps for R
Concept maps diagrammatically represent relationships between concepts and ideas. They are handy for visual learners, although they can benefit all learners. They help you see the big picture: how higher-level concepts relate to other ideas.
Concept maps work very well for classes or content that have visual elements or in times when it is important to see and understand relationships between different things. They can also be used to analyze information and compare and contrast.
The Learning Center, University of North Carolina at Chapel Hill
RStudio hosts a GitHub repository of concept maps about R ideas. They are designed to be used in introductory data science lessons, but I also found them helpful for jogging my memory. There are concept maps for dplyr, clustering, R Markdown, regular expressions, neural networks and many more.
Four Packages
I have started compiling the list of packages listed in all Next letters in a single spreadsheet so that we all can easily jump to that one package when we need to. Access it here. The link will also be posted in the description of this newsletter.
flexdashboard provides methods to make easy interactive dashboards using R Markdown. It supports related visualisations in a single pane, htmlwidgets, flexible and straightforward layouts for organisation, Shiny apps, and a creative “storyboard” layout for presenting a sequence of visualisations with commentary. Check the vignette here.
rbokeh provides methods to create interactive web-based plots in R. You can click each point in the plot to see more information, and you can even use your mouse to select a portion of the graph to magnify. Kewl. Check the examples here.
dygraphs provides methods for interactive and automatic plotting of time series objects in R. It is highly configurable and provides zoom, highlighting, shaded regions and point-to-point annotations. Check the vignette here.
insight provides methods to recover intermediate information when developing a model — beyond coefficient estimates and estimates of fit. It supports 200+ models currently. Check the vignette here.
Three Jargons
Census: It is the procedure of collecting, recording and calculating information about the whole or a significant part of the population. Generally, it is used to mean national housing and population count of citizens.
Ham Sandwich Theorem: Given n measurable objects in n-dimensional Euclidean space, it is possible to divide each of them exactly in half (by measure) with a single (n−1)-dimensional hyperplane. See Hannah Fry’s video on how to share sandwiches with your siblings fairly for a funny but informative introduction to the theorem.
Bertrand Paradox: Bertrand paradox is an example that shows how the principle of indifference may not produce well-defined results if the problem is not defined exactly. See Grant Sanderson’s video to understand this counter-intuitive paradox.
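The Bertrand paradox is easy to see in a simulation. The classic question is: what is the probability that a random chord of a unit circle is longer than the side of the inscribed equilateral triangle? The Python sketch below tries three textbook ways of choosing a "random" chord. The function names, trial count and seed are my own choices for illustration.

```python
import math
import random

SIDE = math.sqrt(3)  # side length of the equilateral triangle inscribed in a unit circle


def chord_from_endpoints(rng):
    """Two random endpoints on the circle; by symmetry, fix one and draw the angle between them."""
    theta = rng.uniform(0, 2 * math.pi)
    return 2 * math.sin(theta / 2)


def chord_from_radial_point(rng):
    """Random point along a radius; the chord is perpendicular to the radius there."""
    d = rng.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)


def chord_from_midpoint(rng):
    """Chord midpoint uniform over the disc, so its squared distance from the centre is uniform."""
    d_squared = rng.uniform(0, 1)
    return 2 * math.sqrt(1 - d_squared)


def estimate(method, trials=100_000, seed=0):
    """Fraction of sampled chords that are longer than the triangle's side."""
    rng = random.Random(seed)
    return sum(method(rng) > SIDE for _ in range(trials)) / trials


if __name__ == "__main__":
    for method in (chord_from_endpoints, chord_from_radial_point, chord_from_midpoint):
        print(method.__name__, round(estimate(method), 3))
```

The three estimates settle near 1/3, 1/2 and 1/4 respectively. Each method is a reasonable reading of "pick a chord at random", which is why the question has no single answer until the sampling procedure is pinned down.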
Two Tweets
David Scholz
Navigate to one of your github repositories and press "." (period). Well, I didn't know yet.
R Function A Day
Sometimes you wish to remove only certain rows or columns that are completely empty instead of partly empty.

The {remove_empty_*} function family from {janitor} 📦 do exactly this! 🧹

https://t.co/j1RAFZv4pY

#rstats #DataScience https://t.co/QHb1bDzHzB
One Meme
Meme about Albert Einstein and Machine Learning.
That's a wrap!
Hope you learnt something new today. As always, your feedback and suggestions are welcome. See you next week!
Harsh
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

Personal website: https://harsh17.in.

List of all packages covered in past issues: https://www.harsh17.in/nextpackages/.

Knoxville, Tennessee.