
Upcoming R Conferences | Next - Issue #28

Harshvardhan
Hi there!
Today, I will discuss a dashboard on inflation in Turkey, a package of datasets for machine learning, a binge-worthy read on making your R experience better, upcoming R conferences and much more.
Let’s dive in! 🦘

Five Stories
Turkey is fighting a tough battle with inflation. Not that other countries are scoring straight A's; as the World Bank put it:
Indeed, the most salient feature of today’s inflation is its ubiquity. In the absence of global policy options to resolve supply-chain disruptions, the task of addressing inflation is left to the major central banks.
Gencer used an elastic net model on time-series data to predict Turkey's inflation rate. He preferred elastic nets over plain generalised linear models, random forests and support vector machines because they handle collinearity, produce an interpretable model and (of course) predict better: the sweet spot of interpretability, performance and speed.
The strongest determinant of inflation is the CPI, followed by annual changes in import prices and then by inflation expectations. It all hints at how inflation feeds a self-sustaining loop.
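For readers new to elastic nets, here is the objective as it is parameterised in R's glmnet package. This is an assumption about the implementation; the dashboard may use a different tool or parameterisation. The ridge term (weight 1 − α) shrinks correlated predictors together, which is what helps with collinearity, while the lasso term (weight α) pushes some coefficients exactly to zero, which keeps the model interpretable. Setting α = 1 gives the lasso and α = 0 gives ridge regression.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^\top \beta\right)^2
  + \lambda\left[\frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 + \alpha\,\lVert \beta \rVert_1\right]
```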
Machine learning datasets are fun to goof around with. Gary Hutson released an R package with six datasets; five related to medical sciences and one on Counter-Strike: Global Offensive.
This blog explores the first dataset, on heart disease, to predict health outcomes using logistic regression. Instead of using a simple glm(), he uses tidymodels to showcase what simple algorithms can do. He shows how to find the crucial variables by visually comparing odds ratios with the OddsPlotty package.
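The odds ratios such a plot displays are just exponentiated logistic-regression coefficients. Below is a minimal base-R sketch of that computation. It is not the blog's tidymodels workflow and does not use the heart disease data; it assumes the built-in mtcars dataset, modelling transmission type from weight and horsepower, and uses Wald intervals from confint.default().

```r
# Logistic regression with base R's glm() on the built-in mtcars data:
# am (0 = automatic, 1 = manual) modelled from weight (wt) and horsepower (hp).
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Coefficients are on the log-odds scale; exponentiating gives odds ratios.
odds_ratios <- exp(coef(fit))

# Wald 95% confidence intervals, transformed to the odds-ratio scale.
ci <- exp(confint.default(fit))

or_table <- data.frame(
  term       = names(odds_ratios),
  odds_ratio = unname(odds_ratios),
  lower      = unname(ci[, 1]),
  upper      = unname(ci[, 2])
)
print(or_table)
```

An odds ratio above 1 means the predictor raises the odds of the outcome; below 1, it lowers them. Plotting these estimates with their intervals is essentially what an odds plot shows.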
It’s conference season once again. This blog describes upcoming R-related conferences. Some interesting ones:
I've never met Jenny Bryan, but I'm afraid of her. Why? Because she's going to set my computer on fire 😨. (I'm improving, but rm(list = ls()) is just so convenient!)
Most of us were introduced to R through a course and carried on with the grace of Google and Stack Overflow. This binge-worthy read presents nifty tips and tricks for taking your R productivity from 30% to 70% in an hour. There are chapters on handling paths safely, naming files, maintaining R (including updating all packages with devtools::update_packages(TRUE)), and much more!
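As a small taste of the "handling paths safely" idea, here is a base-R sketch that builds paths with file.path() instead of pasting separators by hand or calling setwd(). The folder layout and "survey.csv" are hypothetical, and the sketch assumes the session starts in the project root (as it does when you open an RStudio project); the here package's here() is another common way to anchor paths to the project root.

```r
# file.path() joins components with .Platform$file.sep ("/"), which R
# understands on Windows, macOS and Linux alike.
data_file <- file.path("data", "raw", "survey.csv")
print(data_file)  # [1] "data/raw/survey.csv"

# Anchor the path at the project root instead of calling setwd().
# Assumption: the working directory is the project root.
project_root <- getwd()
full_path <- file.path(project_root, "data", "raw", "survey.csv")

# Check before reading so a missing file fails with a clear message.
if (file.exists(full_path)) {
  survey <- read.csv(full_path)
} else {
  message("File not found: ", full_path)
}
```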
Amazon just released MASSIVE, a dataset of the same set of virtual-assistant commands written in 51 different languages. The aim is to promote the development of natural language understanding (NLU), a sub-branch of natural language processing (NLP). They also released open-source code for re-creating Amazon's massively multilingual NLU models.
They are also launching a competition on the MASSIVE dataset, Massively Multilingual NLU 2022 (MMNLU-22), to build a single model that handles all 51 languages. Finally, there's a conference on NLU hosted in Abu Dhabi (and online) to discuss this dataset and its models.
Four Packages
magick is the best image processing library in R. It provides functions for reading and writing, rotating, cropping, filling and annotating images, and more. See the vignette here (https://bit.ly/3khdSoZ).
MLDataR provides real-world datasets for machine learning applications. See the vignette here (https://bit.ly/3Lfo25r).
OddsPlotty creates odds-ratio plots from logistic regression results. See the vignette here (https://bit.ly/3KfGQ3t).
FeatureTerminatoR contains functions for feature selection and recursive feature elimination, a technique that repeatedly fits a model, ranks the predictors by how much weight each carries in it, and drops the weakest until the desired number remain. See the vignette here (https://bit.ly/3xRynkc).
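To make the recursive feature elimination loop concrete, here is a simplified base-R sketch. It is not FeatureTerminatoR's API: the function rfe_lm() is hypothetical, it uses a linear model on the built-in mtcars data, and it ranks predictors by the absolute t-statistic, whereas real RFE tools typically use model-specific importance scores and resampling to choose how many features to keep.

```r
# Hypothetical helper: backward recursive feature elimination with lm().
# Each round fits the model, ranks predictors by |t value|, and drops the weakest.
rfe_lm <- function(data, response, n_keep) {
  stopifnot(n_keep >= 1)
  predictors <- setdiff(names(data), response)
  eliminated <- character(0)
  while (length(predictors) > n_keep) {
    form <- reformulate(predictors, response = response)
    fit <- lm(form, data = data)
    # Importance = absolute t-statistic of each slope (intercept excluded).
    t_vals <- summary(fit)$coefficients[, "t value"]
    importance <- abs(t_vals[names(t_vals) != "(Intercept)"])
    weakest <- names(which.min(importance))
    predictors <- setdiff(predictors, weakest)
    eliminated <- c(eliminated, weakest)  # recorded in order of removal
  }
  list(kept = predictors, eliminated = eliminated)
}

# Keep the three strongest predictors of fuel economy (mpg).
result <- rfe_lm(mtcars, response = "mpg", n_keep = 3)
print(result)
```

Refitting after every removal is the "recursive" part: a predictor's importance can change once a correlated predictor has been dropped.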
Three Jargons
.Renviron contains environment variables to be set in R sessions. It is most useful for storing API keys and tokens, such as GitHub or Twitter credentials. The easiest way to edit it is usethis::edit_r_environ().
.Rprofile contains R code to run whenever R starts up. You can set up your own welcome message. And no, putting library(tidyverse) in it is not a good idea.
Entropy is a measure of the goodness of clusters obtained from a clustering algorithm. For each cluster, it computes an "entropy" from the proportions of its observations that belong to each known class, then averages these values weighted by cluster size. The lower the entropy, the better the clusters.
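A sketch of how this works in practice: .Renviron holds plain NAME=value lines (not R code), and scripts read the values with Sys.getenv() so secrets never appear in your code. The variable name MY_API_KEY and its value are hypothetical; the Sys.setenv() call only simulates what .Renviron would set at startup.

```r
# A ~/.Renviron file contains lines like (hypothetical values):
#   GITHUB_PAT=ghp_your_token_here
#   MY_API_KEY=your_key_here
# R reads it at startup; after editing, restart R or call readRenviron("~/.Renviron").

# Demo only: simulate a value that .Renviron would normally provide.
Sys.setenv(MY_API_KEY = "example-key-123")

# In a script, read the key instead of hard-coding it.
api_key <- Sys.getenv("MY_API_KEY")
if (identical(api_key, "")) {
  stop("MY_API_KEY is not set; add it to .Renviron and restart R.")
}

# An unset variable comes back as an empty string by default.
missing_key <- Sys.getenv("SOME_UNSET_VARIABLE_XYZ")
```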
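Here is a hypothetical ~/.Rprofile as a sketch. It sticks to session conveniences (a default CRAN mirror and a welcome message) and avoids loading analysis packages, since a script that silently relies on your .Rprofile will break on anyone else's machine. The mirror URL and message text are illustrative choices, not requirements.

```r
# Hypothetical ~/.Rprofile: R sources this file at startup.
local({
  # Default CRAN mirror so install.packages() does not prompt for one.
  options(repos = c(CRAN = "https://cloud.r-project.org"))
})

# .First() is run at the start of the session after .Rprofile is sourced.
.First <- function() {
  if (interactive()) {
    message("Welcome back! R ", getRversion(), " is ready.")
  }
}
```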
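A base-R sketch of cluster entropy, assuming the true class labels are known. The helper cluster_entropy() is hypothetical; it uses base-2 logarithms (so a cluster split evenly between two classes scores 1) and treats 0 × log(0) as 0. The k-means example on the built-in iris data is only an illustration.

```r
# Hypothetical helper: size-weighted entropy of clusters w.r.t. known classes.
cluster_entropy <- function(clusters, classes) {
  counts <- table(clusters, classes)  # rows = clusters, columns = classes
  n <- sum(counts)
  # Entropy of the class distribution inside each cluster.
  per_cluster <- apply(counts, 1, function(row) {
    p <- row / sum(row)
    p <- p[p > 0]  # drop empty classes: 0 * log(0) is taken as 0
    -sum(p * log2(p))
  })
  # Weight each cluster's entropy by its share of the observations.
  weights <- rowSums(counts) / n
  sum(weights * per_cluster)
}

# Compare k-means clusters with the true species labels in iris.
set.seed(42)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
e <- cluster_entropy(km$cluster, iris$Species)
print(e)
```

A clustering that matches the classes exactly scores 0; the worst possible score with three classes is log2(3).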
Two Tweets
Rachael Dempsey
Have any resources you could recommend for a team of data scientists (with mostly stats training) hoping to improve their software engineering skills, things like:
⬢ version control
⬢ writing tests for their code
⬢ lots more that I don't know of ☺️

Thanks in advance!
#rstats
David Keyes
Ok seriously if you could use some #rstats support in your org, you should definitely hire an intern! I've got nearly 100 applicants for my internship. Many of them look amazing and it's going to be hard to choose just one!
One Meme
Bonus
I wrote a short parable on how excitement can boost our productivity. The only trouble is that excitement is perishable.
That's a wrap!
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

List of all packages covered in past issues: https://www.harsh17.in/nextpackages/.

Created with Revue by Twitter.
Knoxville, Tennessee.