
Creating music with R | Next - Issue #39

Hi there!
The more I climb out of my trough of disillusionment with Python, the more people I meet to whom I end up explaining why I love R far more. I didn’t have a good explanation. Maybe it’s because I started learning R before Python; maybe because R feels closer to C++; maybe it’s R’s helpful documentation.
But an old blog post by Greg Wilson explains it perfectly: community. There are #TidyTuesday, R Meetups, R-Ladies, and much more. They act as positive liberty, where you have the means to accomplish your goals, as opposed to negative liberty, where no one actively stops you from achieving them. These nudges get me going.
Let’s dive in.

Five Stories
Last week was rstudio::conf(2022), and my Twitter didn’t stop buzzing. Two pieces of news caught everyone’s attention: RStudio changed its corporate name to Posit, and Quarto got the new-kid-in-town badge.
What is Quarto?
Imagine Jupyter Notebook or R Markdown but with major improvements: a place where you can code in R, Python, Julia and more in a single document. It works with RStudio, VS Code, Jupyter Lab, or any notebook or text editor you like. It can produce reports, presentations, websites, blogs, books, and journal articles in HTML, PDF, MS Word, ePub, and more.
The project is developed with support from the Jupyter community. Python documentation published with nbdev 2 will now be published as Quarto websites. There is a lot more to come.
Have you ever thought of creating music with R? I once tried but wasn’t very successful in creating soothing music. Here is a cool project which creates sounds based on different waveforms. It is actually quite simple.
There are independent functions like sine, square, triangle and sawtooth where you provide wave frequency and duration. Use them together, and soon enough, you can play “Mary had a little lamb” in R. 🤯
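To give a feel for the idea, here is a minimal base-R sketch of such waveform generators. The function names (sine_wave, square_wave and so on), the 44,100 Hz sample rate and the note frequencies are my own assumptions for illustration; this is not the linked project's actual code or API.

```r
# Assumed sample rate in Hz (a common audio default, not taken from the project).
sample_rate <- 44100

# Position within one cycle (0 to 1) for every sample of a tone.
phase <- function(freq, duration) {
  n <- round(duration * sample_rate)
  t <- (seq_len(n) - 1) / sample_rate
  (freq * t) %% 1
}

# Hypothetical generators: each returns amplitudes between -1 and 1.
sine_wave     <- function(freq, duration) sin(2 * pi * phase(freq, duration))
square_wave   <- function(freq, duration) ifelse(phase(freq, duration) < 0.5, 1, -1)
sawtooth_wave <- function(freq, duration) 2 * phase(freq, duration) - 1
triangle_wave <- function(freq, duration) 1 - 4 * abs(phase(freq, duration) - 0.5)

# Approximate frequencies (Hz) for three notes in the fourth octave.
notes <- c(C = 261.63, D = 293.66, E = 329.63)

# Opening phrase of "Mary had a little lamb": E D C D E E E.
melody <- c("E", "D", "C", "D", "E", "E", "E")

# Concatenate one 0.3-second sine tone per note into a single sample vector.
song <- unlist(lapply(melody, function(n) sine_wave(notes[[n]], 0.3)))
```

The result is just a numeric vector of samples. Actually hearing it needs an audio package to play the vector or write it to a WAV file, which I have left out here.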
Long-term readers of Next would know how much I love Cédric’s posts. He illustrates his favourite ggplot2 functions and extensions from many associated packages. You can also watch the talk on YouTube.
He covers {ggtext} for text rendering, {ggforce} for fancy annotations, {ggdist} for visualising distributions and many more. If you’re somewhat familiar with ggplot2 syntax, dive in!
4. ROC and AUC in R
Last month, my sister asked me how to plot ROC curves and compute AUC in R. I sent her some blogs, but they didn’t solve her specific use case. Then, she discovered this incredible video (and I bookmarked it).
The best part of this video is the simple explanation. Josh explains each line of code in detail and how it produces the required output. He also covers the theoretical concept behind logistic regression and ROC/AUC.
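If you want the gist before watching, here is a rough base-R sketch of the same idea: fit a logistic regression, sweep a threshold over the predicted probabilities, and integrate the resulting ROC curve. The data are simulated and the variable names are mine; this is not the code from the video, which may well use a dedicated package.

```r
set.seed(42)

# Simulated data (made up for illustration): one predictor, one binary outcome.
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x))

# Logistic regression, then the predicted probability for each observation.
fit  <- glm(y ~ x, family = binomial)
prob <- fitted(fit)

# ROC curve: lower the threshold step by step and record, at each step,
# the true positive rate and the false positive rate.
thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) mean(prob[y == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(prob[y == 0] >= t))

# AUC: area under the ROC points by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# To draw the curve: plot(fpr, tpr, type = "l"); abline(0, 1, lty = 2)
```

An AUC of 0.5 corresponds to the diagonal (random guessing) and 1 to a perfect classifier, so the simulated model above should land somewhere in between.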
In July, artist Zach Katz fed DALL-E 2 a Google Street View image of Brooklyn, New York. He selected pavements and parked cars as aspects to remove and then typed in what he’d like them replaced with: a “strikingly beautiful cobblestone European pedestrian promenade, with an ornate stone water fountain and children playing.”
Actual (left) and DALL-E 2 redesign (right). Bloomberg.
Now, Katz has a Twitter account where he posts pictures of such potential transformations.
Four Packages
ggResidpanel is an R package for creating panels of interactive diagnostic plots for residuals using ggplot2 and plotly to analyze model assumptions from various viewpoints. Vignette.
ggtech provides ggplot2 themes used by major tech companies such as Google, Facebook, Twitter, Airbnb and more. GitHub.
abess is an efficient toolkit for best subset selection. It has ready-to-use functions for logistic regression, Poisson regression, the Cox proportional hazards model, multinomial logistic regression and more. Vignette.
The apa, apaText and apaTables packages provide tools to format output to the APA standard for publication. Vignettes: apa, apaText and apaTables.
Three Jargons
Lag in time-series data is a shift of the series back by a fixed number of time steps. For daily stock prices, a lag of one day means that each day is paired with yesterday’s price.
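A tiny base-R sketch of a one-day lag, using made-up prices (the numbers are purely illustrative):

```r
# Made-up daily closing prices (illustrative values only).
price <- c(100, 102, 101, 105, 107)

# Lag of one day: shift the series down one position and pad the start with NA,
# because the first day has no "yesterday".
lag_1 <- c(NA, head(price, -1))

# Day-over-day change: today's price minus the lagged (yesterday's) price.
change <- price - lag_1

data.frame(day = seq_along(price), price, lag_1, change)
```

If you already use dplyr, dplyr::lag() performs the same shift.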
Class imbalance occurs when we have many more observations from one group than from the other. This can be problematic: if most patients in the data are healthy, the algorithm can learn to call everyone healthy purely because being healthy is more likely.
Downsampling means using fewer observations than we have at our disposal. Why would we do that? Sometimes, we have far more observations of one kind than of another: in a dataset of 10,000 patients, only ten might have a disease. Most classification algorithms do not produce binary outputs; they produce continuous scores that are thresholded to binary, and with so few diseased patients those scores can land on the healthy side of the threshold for nearly everyone. Downsampling healthy patients is one way to avoid such a case.
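Here is a minimal base-R sketch of downsampling on a simulated version of that patient dataset. The column names and the 1:1 target ratio are my assumptions; other ratios are common too.

```r
set.seed(1)

# Simulated patients matching the example: 10,000 rows, 10 with the disease.
patients <- data.frame(
  id     = 1:10000,
  status = c(rep("diseased", 10), rep("healthy", 9990))
)

# Calling everyone healthy is already "right" this often:
baseline_accuracy <- mean(patients$status == "healthy")

diseased <- patients[patients$status == "diseased", ]
healthy  <- patients[patients$status == "healthy", ]

# Keep every diseased patient and a random sample of healthy ones
# (1:1 ratio assumed here).
healthy_sample <- healthy[sample(nrow(healthy), nrow(diseased)), ]
balanced <- rbind(diseased, healthy_sample)

table(balanced$status)
```

The balanced data has ten patients of each kind, so a model trained on it can no longer score well just by calling everyone healthy. The price is that most of the healthy observations are thrown away.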
Two Tweets
Albert Rapp
The #rstats ecosystem makes splitting a stacked bar plot simple. 🥳 This way, comparing groups is sooo much easier! 👌🏽

✂️ Split stacked bars with facet_wrap()
🪢 Combine splits with totals via {patchwork}


Details in thread 🧵
built a quick app that uses gpt-3 to convert from English to RegEx so you don't have to waste time on stackoverflow:
One Meme
That's a wrap!
If you liked today’s letter, why not share it? See you next week!
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

List of all packages covered in past issues:

Created with Revue by Twitter.
Knoxville, Tennessee.