
R 4.2.0 is here | Next - Issue #35

Hi there!
R 4.2.0 was released in April 2022. The simplest way to upgrade is to download the new installer and double-click it. If you’re using Windows, you can also try the installr package.
Let’s jump in.

Five Stories
R 4.2.0, the newest version of R, was released last month. The native pipe |> can now pass its left-hand side to arguments other than the first, using the new placeholder _ (which must be supplied to a named argument). Isabella explains more about pipes here.
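If you haven’t tried the placeholder yet, here is a minimal sketch of what the change looks like. It assumes R 4.2.0 or later and uses the built-in mtcars data; the model is chosen purely for illustration and does not come from the release notes.

```r
# Requires R >= 4.2.0 (the `_` placeholder does not exist in earlier versions).

# By default, |> passes the left-hand side as the first argument:
first_rows <- mtcars |> head(3)

# lm() takes the formula first and the data second, so the data
# has to be routed to a named argument with the placeholder `_`:
fit <- mtcars |> lm(mpg ~ wt, data = _)

# The piped call fits the same model as the ordinary call:
same <- all.equal(coef(fit), coef(lm(mpg ~ wt, data = mtcars)))
same
```

The placeholder has to appear as a named argument (data = _ here), so it is not a drop-in replacement for every use of the magrittr dot.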
Help pages now allow for math formulas with MathJax or KaTeX. knitr support makes them markdown-friendly. There is a “Run Examples” button in help pages now to, you guessed it, run examples. Conditions of length greater than one in if() and while() now generate errors, not warnings. You can read about all the good stuff here.
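The change to conditions is the one most likely to break old scripts, so here is a small sketch of it. It assumes R 4.2.0 or later; the vector x is a made-up example.

```r
# Requires R >= 4.2.0 to see the new behaviour.
x <- c(TRUE, FALSE)  # a condition of length two

outcome <- tryCatch(
  {
    if (x) "ran" else "skipped"
  },
  warning = function(w) "warning",  # what older versions of R signalled
  error   = function(e) "error"     # what R >= 4.2.0 signals
)
outcome

# Stating the intent explicitly works in every version:
if (any(x)) message("at least one element is TRUE")
if (all(x)) message("every element is TRUE")
```

Wrapping the condition in any() or all() is usually the right fix, because it forces you to decide what a vector-valued condition was supposed to mean.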
R2D3 presents a beautiful introduction to machine learning in this scroll-guide. It gently introduces the concepts of variables, decision boundaries, scatterplots, histograms, false positives and negatives, and more.
It starts with an intuitive model: decision trees. What would different branches look like? What makes a good split? What’s node impurity? What’s overfitting?
This is a follow-up to the previous post by R2D3. Ending on overfitting, the last guide hinted at an essential statistical truth: the bias-variance trade-off. A very flexible model can fit the existing data extremely well, which gives it low bias but high variance, so it tends to do poorly on unseen data. A simpler model behaves more consistently on unseen data, at the cost of higher bias. What would you choose?
Mention predictive modeling to the general public and you’re likely to conjure memes of complex mathematical equations swirling. Mention wine, and you get a much different reaction. One can be intimidating, the other inviting.
The authors build a machine learning model to predict wine quality in this post. On the face of it, this is a simple predictive model. But when I scrolled through the story, I realised it is much more. Check it out, even if just for the pretty visualisations!
5. Can Machine Learning Translate Ancient Egyptian Texts?
Egyptian hieroglyphs are tough to decipher. Some characters mean a word, some are sentences, some are phrases, and most are unknown. When Ubisoft’s Assassin’s Creed Origins was released, the game makers wanted to be accurate with the language. They approached Egyptologists who used Google’s Fabricius (an ML tool specifically developed for understanding hieroglyphs) and Gardiner’s Sign List (a classification system designed to aid understanding of hieroglyphs).
The model was great at finding the hieroglyphs, even though most characters survive only in incomplete form. However, the ML program was only 27% accurate at understanding the language.
Four Packages
Models in R quickly blow up in size, taking up a lot of disk space. butcher can reduce the size of the model by eliminating unnecessary closures, formulas, etc. Check the vignette here.
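To see where the extra weight comes from, here is a rough sketch. The helper make_model() and the object big are hypothetical names invented for this example, and the last step assumes the butcher package is installed.

```r
# A hypothetical helper: the model is fitted inside a function that also
# holds a large object the model never uses.
make_model <- function() {
  big <- rnorm(1e6)              # roughly 8 MB of unrelated data
  lm(mpg ~ wt, data = mtcars)
}

fit <- make_model()

# The formula remembers the environment it was created in, so `big`
# is written out with the model whenever it is serialized or saved.
size_before <- length(serialize(fit, NULL))
size_before

# Optional step, assuming the butcher package is installed:
if (requireNamespace("butcher", quietly = TRUE)) {
  slim <- butcher::butcher(fit)  # applies the axe_*() methods available for lm
  size_after <- length(serialize(slim, NULL))
  print(c(before = size_before, after = size_after))
}
```

I use length(serialize(fit, NULL)) rather than object.size() because object.size() does not count the contents of environments, which is exactly where the hidden data lives in this example.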
buildmer finds the largest possible regression model that will still converge, and can then perform backward or forward stepwise elimination based on AIC, BIC, or an F-test of the change in R^2. Check the vignette here.
Looking for free financial data in R? Here’s simfinapi — a package to get all kinds of financial information. You do need to register to get an API key. Check the vignette here.
mlflow is an excellent tool to track your machine learning experiments end-to-end. Check the vignette here.
Three Jargons
A score in machine learning is a metric that measures how good a model is. For classification problems, we use accuracy, AUC, F1 score, etc. For regression problems, we use RMSE, AIC, BIC, etc.
F1-score combines the precision and recall of a classifier into a single metric by taking their harmonic mean.
A more general F score called F-beta uses a positive real number beta, where beta is chosen such that recall is considered beta times as important as precision.
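Here is a small worked example of both scores in base R. The labels and predictions are invented for illustration, and f_beta() is a helper written for this sketch, not a function from any package. It uses the standard formula F-beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall), which reduces to the harmonic mean when beta = 1.

```r
# Made-up labels and predictions (1 = positive class)
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 1, 1, 1, 0, 0, 0)

tp <- sum(predicted == 1 & actual == 1)  # true positives:  3
fp <- sum(predicted == 1 & actual == 0)  # false positives: 3
fn <- sum(predicted == 0 & actual == 1)  # false negatives: 1

precision <- tp / (tp + fp)  # 3 / 6 = 0.50
recall    <- tp / (tp + fn)  # 3 / 4 = 0.75

# Hypothetical helper implementing the F-beta formula
f_beta <- function(precision, recall, beta = 1) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

f1     <- f_beta(precision, recall)              # 0.6
f2     <- f_beta(precision, recall, beta = 2)    # about 0.68, pulled towards recall
f_half <- f_beta(precision, recall, beta = 0.5)  # about 0.54, pulled towards precision
```

Since recall is higher than precision here, weighting recall more heavily (beta = 2) raises the score, and weighting precision more heavily (beta = 0.5) lowers it.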
Two Tweets
Marc J. Lajeunesse
First published use of the {juicr} R package 🍊 to extract data from plot, figures, & charts! see:

No citation though😢-- c'est la vie I guess, I'm just glad someone found it useful!

#RStats #tcltk #OpenSource
One Meme
Need some inspiration? Check this out.
That's a wrap!
I hope you learned something new today. As always, your feedback and suggestions are welcome. See you next week!
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

List of all packages covered in past issues:

Created with Revue by Twitter.
Knoxville, Tennessee.