
Is R-squared Useless? | Next — Issue #20

Hi there!
Today’s edition starts with a fundamental question: is R-squared useless? There are other interesting stories as well, like Little Miss Data’s three-minute summaries and Stanford’s COVID-19 vaccine distribution algorithm. Two cheatsheet-style blog posts on RStudio, R Markdown and blogdown are also featured. Most of today’s packages and functions are about visualisation. Finally, the meme comes from a recent talk on data mishaps.
Let’s dive in!

Five Stories
Is R-squared Useless?
Like most debates on the internet, this one also started on Reddit when a student posted on r/statistics:
My stats professor just went on a rant about how R-squared values are essentially useless, is there any truth to this?
They were talking about Prof Cosma Shalizi from Carnegie Mellon University, and it turns out there is a lot of truth to the claim. First, R-squared does not measure goodness of fit: a correctly specified model can still have a very low R-squared when the error variance is large. The article demonstrates why through simple simulations in R. Second, R-squared can be high even when the model is entirely wrong, such as when a straight line is fitted to data that follow an exponential curve.
Third, even if the error variance stays the same and the coefficients do not change at all, R-squared can vary widely depending on the range of X. Finally, R-squared cannot be compared between a model of Y and a model of a transformed Y (such as log Y): each value describes variance on a different scale, so the numbers are not comparable.
The solution? Use mean squared error (MSE). The article demonstrates that MSE, computed on the same scale, allows comparison across models even after the response has been transformed.
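To make these claims concrete, here is a minimal base-R sketch in the spirit of the article's simulations (it is not the article's own code). The data-generating models, noise levels, seed and sample size are illustrative assumptions. It fits `lm()` to a correct but noisy model, to the same model over a narrow and a wide range of x, and to an exponential curve. It then compares a linear and a log-scale fit by MSE on the original scale.

```r
# Sketch of the R-squared arguments above, using base R only.
# All models, noise levels and sample sizes are illustrative assumptions.
set.seed(1)
n <- 1000

# Fit y ~ x; return R-squared and the in-sample mean squared error.
fit_stats <- function(x, y) {
  fit <- lm(y ~ x)
  c(r_squared = summary(fit)$r.squared,
    mse       = mean(residuals(fit)^2))
}

# 1. Correct model, large error variance (sd = 5): R-squared is still low.
x1 <- runif(n, 0, 1)
noisy <- fit_stats(x1, 1 + 2 * x1 + rnorm(n, sd = 5))

# 2. Same coefficients and error variance (sd = 1); only the range of x differs.
x_narrow <- runif(n, 0, 1)
x_wide   <- runif(n, 0, 10)
narrow <- fit_stats(x_narrow, 1 + 2 * x_narrow + rnorm(n, sd = 1))
wide   <- fit_stats(x_wide,   1 + 2 * x_wide   + rnorm(n, sd = 1))

# 3. Wrong model: a straight line fitted to a noiseless exponential curve.
x3 <- runif(n, 0, 1)
wrong <- fit_stats(x3, exp(x3))

print(round(rbind(noisy, narrow, wide, wrong), 3))

# 4. Transformed response: the R-squared values of y ~ x and log(y) ~ x live on
#    different scales, but MSE on the original y scale is directly comparable.
x4 <- runif(n, 0, 4)
y4 <- exp(x4 + rnorm(n, sd = 0.1))
lin_fit <- lm(y4 ~ x4)
log_fit <- lm(log(y4) ~ x4)
mse_lin <- mean((y4 - fitted(lin_fit))^2)
# Naive back-transform: exp() of a log-scale fit estimates the conditional median.
mse_log <- mean((y4 - exp(fitted(log_fit)))^2)
print(c(mse_linear = mse_lin, mse_log_model = mse_log))
```

On a typical run, the narrow and wide fits should have almost the same MSE (close to the noise variance of 1) while their R-squared values differ sharply. The exponential curve should still get an R-squared close to 1.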
As often as possible, usually every Friday, Laura Ellis (Little Miss Data) publishes a #FunDataFriday article. Each one is a three-minute introduction to an interesting data resource that answers three basic questions:
  1. What is it? Tutorial, package, book, blog, etc.
  2. Why is it awesome?
  3. How do you get started?
Some of my favourite short reads are Data Illustrator (a free and powerful data visualisation tool), Building a Career in Data Science (a book and a podcast), Data Asset Exchange and Wizard Zines (which I often turn to when I’m frustrated that my code doesn’t work).
As COVID-19 vaccines became a reality, there was a lot of discussion about who should be prioritised. In a dystopian world, like the one in the movie Contagion, it would have been a lottery. Thankfully, it wasn’t.
Stanford University decided to use an algorithm to set priorities. It put administrators and physicians working from home ahead of frontline workers, although the policy was changed swiftly. The administrators squarely blamed the algorithm. Was it the algorithm’s fault? Caitlin argues that it was not.
Algorithms are designed, created, implemented, and tested by people. If algorithms aren’t performing appropriately, responsibility lies with the people who made them.
I won’t spoil the article for you, but the culprit was the final “human-in-the-loop” test that the administration skipped in an attempt to iterate fast.
This article is a tour de GIFs. Using around 100 GIFs, Shannon explains how to customise RStudio and R Markdown for personal use.
My favourite explanations are rearranging panes (which saved me more time than I expected), chunk anatomy and R Markdown troubleshooting. EVERYONE who uses R should read this article at least once.
With an explosion of technologies, functions and features to remember, cheatsheets are more helpful than ever. RStudio hosts a collection of cheatsheets on tidyverse packages. This longish blog post by Prof Irene Vrbik from the University of British Columbia gives a detailed introduction to blogdown and R Markdown syntax. The post is fantastic. Check it out!
Four Packages
ggbump: ggbump is an R package that creates elegant bump charts. Bump charts are suitable for plotting rank changes over time. Check the vignette here.
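A minimal sketch of a bump chart, assuming ggplot2 and ggbump are installed; the teams and ranks are made-up data. `geom_bump()` draws smoothed lines between each team's ranks, and `scale_y_reverse()` puts rank 1 at the top.

```r
library(ggplot2)
library(ggbump)

# Made-up rankings of three teams over four seasons (each season is a permutation of 1:3).
ranks <- data.frame(
  year = rep(2018:2021, times = 3),
  team = rep(c("A", "B", "C"), each = 4),
  rank = c(1, 2, 1, 3,   # team A
           2, 1, 3, 1,   # team B
           3, 3, 2, 2)   # team C
)

p <- ggplot(ranks, aes(x = year, y = rank, colour = team)) +
  geom_bump() +                    # smoothed lines linking each team's ranks
  geom_point(size = 3) +
  scale_y_reverse(breaks = 1:3)    # rank 1 at the top
print(p)
```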
graphlayouts: This package implements several network and graph layouts that are not directly available in igraph. It is usually used in conjunction with ggraph. See the introduction here.
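A minimal sketch of the stress layout, one of the layouts graphlayouts provides, assuming igraph, graphlayouts and ggraph are installed. The random preferential-attachment graph is made-up example data. `layout_with_stress()` returns a coordinate matrix, and ggraph can use the same layout by name.

```r
library(igraph)
library(graphlayouts)
library(ggraph)

set.seed(7)
g <- sample_pa(60, directed = FALSE)   # made-up graph: 60 nodes, preferential attachment

# Stress-majorization layout: one (x, y) row per node.
xy <- layout_with_stress(g)

# The same layout through ggraph, which accepts "stress" as a layout name.
p <- ggraph(g, layout = "stress") +
  geom_edge_link(colour = "grey70") +
  geom_node_point(size = 2)
print(p)
```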
waffle: Waffle charts are square charts used to represent parts of a whole for categorical quantities. This is an excellent tool to present proportions. See the introduction here.
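A minimal sketch, assuming the waffle package is installed; the counts are made up. `waffle()` takes a named vector of parts and draws one square per unit, arranged here in five rows.

```r
library(waffle)

# Made-up counts: each square represents one analyst.
tools <- c(`R only` = 28, `Python only` = 15, Both = 7)

p <- waffle(tools, rows = 5, title = "Tools used by 50 analysts (made-up data)")
print(p)
```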
ggThemeAssist: When you want to modify your ggplot2 theme but you don’t know all the parameters to adjust, this RStudio add-in can help. See the vignette here.
Three Jargons
Sometimes you just want to insert a table into a ggplot2 plot. geom_table from ggpp is here to help! See the documentation here.
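A minimal sketch, assuming ggplot2, ggpp and tibble are installed and using the built-in mtcars data. `geom_table()` reads its tables from a list column mapped to `label`, so the summary data frame is wrapped in `list()` inside a one-row tibble that also holds the table's position.

```r
library(ggplot2)
library(ggpp)
library(tibble)

# Table to embed: mean mpg for each number of cylinders.
summary_tb <- aggregate(mpg ~ cyl, data = mtcars, FUN = function(v) round(mean(v), 1))

# One row: where to draw the table, plus the table itself in a list column.
table_layer <- tibble(x = 5.4, y = 34, tb = list(summary_tb))

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_table(data = table_layer, aes(x = x, y = y, label = tb))
print(p)
```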
Dot plots help you visualise and compare values across many categories at once. ggdotplotstats from ggstatsplot produces these charts with descriptive and inferential statistics included! See the documentation here.
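A minimal sketch, assuming ggstatsplot is installed and using the first twelve cars of the built-in mtcars data. `x` is the numeric variable and `y` the label for each dot; `test.value = 20` is an arbitrary reference value for the one-sample test that ggstatsplot reports on the plot.

```r
library(ggstatsplot)

# First twelve cars of mtcars, with the row names as an explicit label column.
car_mpg <- data.frame(model = rownames(mtcars)[1:12], mpg = mtcars$mpg[1:12])

# One dot per car, plus a one-sample test of mpg against test.value.
p <- ggdotplotstats(data = car_mpg, x = mpg, y = model, test.value = 20)
print(p)
```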
Want to visualise how observations deviate from a fitted model? stat_fit_deviations from ggpmisc draws each residual as a segment from the observed value to the fitted one. See the documentation here.
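A minimal sketch, assuming ggplot2 and ggpmisc are installed and using the built-in mtcars data. The same `lm` formula is passed to `geom_smooth()` and `stat_fit_deviations()`, so the red segments connect each observation to the fitted line.

```r
library(ggplot2)
library(ggpmisc)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Residuals drawn as segments from observed to fitted values.
  stat_fit_deviations(method = "lm", formula = y ~ x, colour = "red") +
  geom_point()
print(p)
```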
Two Tweets
Hadley Wickham
What's your favourite talk that shows a lot of code?
Chad C Williams
@mlittmancs When buying a house, I used statistical modelling in #rstats to predict the final sale price of houses dependent on predictors (e.g., land size).

Did the same when making sure my insurance paid me properly for my truck when it was written off (they didn't and agreed to my price)
One Meme
By Hilary Mason. Data Mishaps.
That's a wrap!
Hope you enjoyed today’s edition. Share it with your friends or colleagues who are interested in data science and R. See you next week!
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

List of all packages covered in past issues:

Created with Revue by Twitter.
Knoxville, Tennessee.