Tidyverse, Reproducibility and Blogging about R | Next Issue #23

Harshvardhan
Hi there!
How many people are older than you? What’s your life expectancy? Would it change if you live in a different country? This website tells you some estimates. Very, very interesting. Apparently, I’ll live ten years more in Japan than in India.
Today’s letter recalls five talks I thoroughly enjoyed from past rstudio::conf events. Organised annually by RStudio, rstudio::conf is one of the largest data science conferences. The first video is by Hadley on the Tidyverse, the second is about the reproducibility crisis, the third is on becoming an R blogger, the fourth is on the gt package, and the final one is on ggplot2 nitpicks.
Bookmark them or binge-watch them. Your choice. Don’t miss the bonus puzzle at the end.

Five Stories
1. Maintaining The House Tidyverse Built
Tidyverse is a universe in itself. It provides the easiest programmatic methods to get started with data science. In this video, Hadley (Chief Scientist, RStudio) talks about how Tidyverse is maintained. He walks through the lifecycle of functions: what it means when a function is experimental, deprecated or superseded, and what a developer can do to ensure their code doesn’t break. This led to the lifecycle package, which helps you break your code less often. He also warns against off-label usage, when a function is used for something it’s not intended to do, like using c() to convert factors to integers. Finally, he advises using RStudio’s or MRAN’s package manager for long-lasting code. A must watch!
Hadley Wickham | Maintaining the house the tidyverse built | RStudio
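To make the off-label point concrete, here is a minimal base R sketch. It is an illustration, not code from the talk, and the factor `f` is made up. As far as I know, `c()` on a factor used to return the underlying integer codes, and since R 4.1.0 it returns a factor instead, so code that leaned on the old behaviour stopped working. The explicit conversions below say what they mean and do not depend on that detail.

```r
# A made-up factor whose labels happen to look like numbers
f <- factor(c("10", "20", "10", "30"))

# Explicit conversion to the integer codes that index the levels
# (levels are sorted as "10", "20", "30", so the codes are 1, 2, 1, 3)
codes <- as.integer(f)

# Explicit conversion to the numbers spelled out by the labels
# (go through character first, otherwise you get the codes again)
values <- as.integer(as.character(f))

print(codes)
print(values)
```

If you want the codes, ask for the codes. If you want the labels as numbers, convert through character first.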
2. R Markdown: The Bigger Picture
Science has a reproducibility crisis. In 2012, Amgen made headlines when it could only replicate 6 out of 53 landmark cancer papers. Some estimates say 75-90% of experimental results cannot be reproduced. The problem isn’t limited to medical journals. Only six out of 18 landmark economics studies could be replicated. Only 13 of 21 papers in Nature and Science, two of the most prestigious cross-discipline journals, could be replicated.
A reproducibility crisis creates a credibility crisis. A large part of the problem is that we’re using analysis tools that weren’t designed for reproducible work. Bottom line? Use a notebook tool like R Markdown to make your research reproducible and probably replicable, argues Garrett Grolemund.
Garrett Grolemund | R Markdown The bigger picture | RStudio (2019)
3. Becoming An R Blogger
Rebecca Barter gives four straight reasons why we should blog. First, you start learning exponentially. You generally don’t know how little you know until you have to explain it to someone, especially without stumbling over your words. Second, it gives you a portfolio to show off. 56% of recruiters are more impressed by websites than any other tool (Forbes). Third, it allows you to practice your communication skills. Finally, it is the most productive way to procrastinate. I built my website almost entirely at times when I wasn’t working on anything else. She also offers many other tips to get started!
Rebecca Barter | Becoming an R blogger | RStudio (2020)
4. Introducing the gt package
Of all the table packages in R, gt is my favourite. gt provides me with the ability to include searchable tables in R Markdown, and people absolutely love it. An executive once complimented them, saying, “they’re so refreshing coming from boring Excel sheets”.
In this talk, Rich Iannone explains how gt is built and what its properties and objectives are. He also gives a quick demo of creating such tables with ease.
Rich Iannone | Introducing the gt package | RStudio (2019)
5. Always look on the bright side of plots
So you have been using ggplot2 for some time now and have a good grasp of its essential properties. You understand how it behaves or is expected to behave. Recently, you graduated from mindlessly copying and pasting code to answering ggplot2 questions on Stack Overflow.
Now you want to be a pro. Where do you start? With this talk by Kara Woo. She talks about the three most common kinds of mishap: mapping mishaps, scale snafus and theme threats. Check it out!
Kara Woo | Always look on the bright side of plots | RStudio
Four Packages
lifecycle: The lifecycle package, from the Tidyverse team, provides tools and conventions to manage the life cycle of your exported functions. The best way to learn about it is through Hadley’s talk.
naniar: Handling missing values is tedious. This package is here to simplify it. Check the vignette here.
gt: My favourite package to create LaTeX, HTML and RTF tables in R. You can include them in R Markdown and literally everywhere you use tables (except Excel, yet…). Check the website here.
summarytools: One of the first things to do when starting with a new dataset is to create a summary table of it. This package significantly simplifies the process. You’ll love it. Check the vignette here.
Three Jargons
Gumbel Distribution: The Gumbel distribution is used to model the maximum (or minimum) of a number of samples from a distribution. It is critical for modelling rare events like maximum river water levels, earthquakes, etc. Read more on it here.
Softmax Function: The softmax function is a multi-dimensional generalisation of the logistic function. It is particularly useful in multinomial logistic regression and neural networks. Read more on it here.
LAPACK: LAPACK is the standard software library for numerical linear algebra. It is written in Fortran, and R, Python’s SciPy and many C/C++ libraries call into its routines. Read more on it here.
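The first two jargons are easy to try out in base R. This is a minimal sketch, not a reference implementation. `softmax()` and `rgumbel()` are hypothetical helper names defined here (base R ships neither), and the Gumbel sampler assumes the usual location `mu` and scale `beta` parameterisation, with CDF F(x) = exp(-exp(-(x - mu) / beta)).

```r
# Softmax: turns a numeric vector into probabilities that sum to 1.
# Subtracting the maximum first avoids overflow in exp().
softmax <- function(x) {
  z <- exp(x - max(x))
  z / sum(z)
}

# With two classes, softmax reduces to the logistic function:
# the first element of softmax(c(x, 0)) equals plogis(x).
p <- softmax(c(2, 0))

# Gumbel sampling by inverse transform: solve u = F(x) for x.
rgumbel <- function(n, mu = 0, beta = 1) {
  u <- runif(n)
  mu - beta * log(-log(u))
}

set.seed(23)
draws <- rgumbel(10000)

# The mean of a standard Gumbel is the Euler-Mascheroni constant (about 0.5772),
# so the sample mean should land close to it.
print(mean(draws))
```

Comparing `p[1]` with `plogis(2)` is a quick check that softmax really does generalise the logistic function.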
Two Tweets
Moriah Taylor (she/her)
This week for #TidyTuesday, we figured out how to use the #RStats {layer} package by @shinysci and @CedScherer to create maps of alternative fueling stations in the United States.

Code: https://t.co/KlFFycCLDL https://t.co/JRG91099Zb
David Neuzerling
In case you didn’t know, RStudio can generate a Roxygen template for any R function. Put your cursor in the body of the function and hit Code > Insert Roxygen Skeleton.

If you add a new parameter you can do this again and RStudio will extend the documentation to include it.
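For a sense of what the second tweet describes, here is roughly what the inserted skeleton looks like for a made-up function, `add_numbers()`. The exact template may differ between RStudio versions, so treat this as a sketch. The empty tags are placeholders you are expected to fill in.

```r
#' Title
#'
#' @param x
#' @param y
#'
#' @return
#' @export
#'
#' @examples
add_numbers <- function(x, y) {
  # Hypothetical function body, only here to give the skeleton something to document
  x + y
}
```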
One Meme
This turned up in r/statisticsmemes. I found it hilarious.
Bonus: Puzzle
Last week our village had a fête. One of the competitions on offer was to guess the number of balls in a bag. There were N balls in the bag, and they were numbered 1, 2, 3, …, N. To help competitors make a sensible guess, they were allowed to take out four balls and note the numbers on them.
When I took part, I pulled out balls numbered 24, 87, 14 and 35.
There is a prize for the person who guesses the correct number exactly. How many balls should I estimate are in the bag?
That's a wrap!
RStudio recently launched a dedicated webpage of resources to help budding data scientists advocate for data science teams at their organisations. Thank you for the suggestion, Matthias!
I hope you enjoyed today’s letter. Could you share it with a friend or a colleague? See you next week!
Harsh
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

List of all packages covered in past issues: https://www.harsh17.in/nextpackages/.

Knoxville, Tennessee.