History of Mathematics and R | Next Issue #16






Yesterday I played a new card game: Nerts. Each player has a deck of cards, from which they randomly draw 13. Everyone plays simultaneously (so no turns!) and, much as in Solitaire, aims to get rid of their “Nert”, the pile of 13 cards. It is challenging in two ways. First, you need to stay vigilant about how others are playing. Second, you need to be fast to win; there are no second places.
The game poses a clear trade-off: diligence vs speed. The same holds for most things in life, including R. Today’s letter talks about R as a first programming language, a history of mathematics museum, and the next generation of kids looking for the right trees in the forest. But first, let’s participate in the famous birthday paradox experiment.

Five Stories
Your twin is likely reading this newsletter too. Why? Because Maths!
The birthday problem in statistics is a veridical paradox: it is true, but you’d cross your arms and say “no way”. You are in a group of n people. What is the probability that at least two people in the group share a birthday? With just 23 people in the group, there is more than a 50% chance that some pair of birthday twins exists. Why?
I will give you an intuitive explanation. With 23 people, there are 23C2 = 253 possible pairs, which is well over the number of days in half a year (182 days). No wonder it is probable to find twins. Russell’s website illustrates the same.
A straightforward interface asks for your birthday and compares it with those of the previous 22 visitors to the website. Then, it simulates the results for thousands of trials, showing the number of comparisons needed and ultimately convincing us why a pair of twins is likely reading this too. (Since this newsletter has 130+ subscribers, the probability that at least two subscribers share a birthday is more than 99.9%.)
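The numbers above can be checked in a few lines of base R. This is a minimal sketch, assuming 365 equally likely birthdays and no leap years; the helper names p_shared() and p_your_twin() are made up for this illustration. It also separates two quantities that are easy to mix up: the chance that some pair in the group matches, and the chance that one particular person finds a match.

```r
# Probability that at least two of n people share a birthday,
# assuming 365 equally likely birthdays (an assumption; leap years ignored).
p_shared <- function(n, days = 365) {
  if (n > days) return(1)
  # P(all distinct) = (365/365) * (364/365) * ... * ((365 - n + 1)/365)
  1 - prod((days - seq_len(n) + 1) / days)
}

# Probability that one specific person shares a birthday
# with at least one of the other n - 1 people.
p_your_twin <- function(n, days = 365) {
  1 - ((days - 1) / days)^(n - 1)
}

choose(23, 2)     # number of distinct pairs among 23 people: 253
p_shared(23)      # about 0.507
p_shared(130)     # very close to 1
p_your_twin(130)  # roughly 0.3
pbirthday(23)     # base R's built-in function for the same group-level quantity
```

The group-level probability climbs quickly because it grows with the number of pairs, while the probability for one particular person grows only with the number of other people.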
Modern college students aren’t organizing their files into folders and directories, forcing some professors to rethink teaching programming. The way we organize files and folders on our computers follows how we do it in real life. Cabinets have drawers, which hold several folders, which hold files and documents. The nesting is inherent to the procedure.
Sometime in 2017, college professors realized that students didn’t know what directories and folders meant. Fundamentally, the new generation has changed how it accesses files. Haven’t we started searching for files instead of organizing them?
But the problem cuts both ways. Many professors do not understand how Instagram or Snapchat is organized, either. One could argue that I’m comparing apples to oranges, but I’d blame muscle memory. It is roughly why we like the current structure in the first place. Is it optimal?
History of Mathematics
The History of Mathematics is a virtual gallery of important physical artefacts around the history of mathematics. There are nine virtual exhibits with 74 artefact pages. The nine topics on display are Counting, Arithmetic, Algebra, Pythagorean Theorem, Geometry, Primes, Pi, Polyhedra and Mathematics Education.
Each exhibit is organized around a timeline of developments and 6-12 artefacts that showcase its development. The artefacts are historical pieces of evidence, closely supported by interactive web resources from Wolfram. Did you know Indiana tried passing a bill that would fix the value of Pi to 3.2, legally speaking? Thanks to Professor C. A. Waldo of Purdue University, it didn’t pass.
I thoroughly enjoyed browsing the exhibits on Pi and Counting. I am sure all R and Data Science enthusiasts like you would also love it. Guess what? There is no entry fee.
Text mining is a growing topic. In our recent class by Prof Zhou, we learnt several essential tools in Python for text mining. In this blog post, I cover the same methods using R. It may, again, be muscle memory, but I found the R solutions neater, though they require many packages instead of just one.
In this short tutorial, I download Jane Austen’s Persuasion from Project Gutenberg and find words that start with a given letter, end with a given letter, contain a given letter, have a certain character length, and more. I also visualise the frequency of common words (“the” is the most common word, obviously).
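To give a flavour of those word filters, here is a minimal sketch in base R only. It is not the code from the blog post: the sample sentence is a made-up stand-in for the novel, and the letters and lengths being searched for are arbitrary choices.

```r
# Toy text standing in for the novel (an assumption; the tutorial
# itself works on Persuasion from Project Gutenberg).
text <- "The captain wrote the letter, and the letter reached Anne at the last moment."

# Lowercase, split on anything that is not a letter, drop empty strings.
words <- unlist(strsplit(tolower(text), "[^a-z]+"))
words <- words[nzchar(words)]

starts_with_l <- unique(words[startsWith(words, "l")])            # start with "l"
ends_with_e   <- unique(words[endsWith(words, "e")])              # end with "e"
contains_tt   <- unique(words[grepl("tt", words, fixed = TRUE)])  # contain "tt"
five_letters  <- unique(words[nchar(words) == 5])                 # exactly five characters

# Word frequencies, most common first.
freq <- sort(table(words), decreasing = TRUE)
head(freq, 3)
```

On a full novel, the same steps apply once the text has been read into a character vector; only the tokenising rule would likely need more care (apostrophes, hyphens, accented letters).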
When did you first learn to program? I was lucky to have a computer science class in high school where we learnt C++: syntax, loops, functions and conditional statements. Today Indian high schools have shifted mainly to Python and Java.
Python is “a programming language that lets you work quickly and integrate systems more effectively”. R is “a free software environment for statistical computing and graphics”. Today, these languages are considered substitutes for each other, though they probably weren’t designed that way.
Sean Kross argues that data scientists and analysts approach programming differently from computer scientists. Instead of worrying about implementation logic, complexity and algorithms, they focus on applying methods to real datasets and bringing them to production. Thus, R makes a lot of sense as a tool that helps them do it. A hot but thoughtful take, in my opinion.
Four Packages
mlr3: R does not provide standardised methods for machine learning models. mlr3 is an attempt to standardise methods using an object-oriented approach. (I’m not sure how this compares with tidymodels, though.) Check the vignette here.
ggforce is ggplot2 on steroids. It extends ggplot2 with several functions that add to what’s possible. Check the complete list of functions here.
Deducer: This is not for you. It is for your friend who likes GUIs and hates programming. Deducer is an (old) GUI for R that offers many SPSS-like functions, with R running in the background.
fuzzyjoin: This package provides methods for non-exact joins between tables. You can fuse tables of text full of misspellings, for example. Check the vignette here.
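The idea behind a non-exact join can be sketched with base R's adist(), which computes edit distances between strings. This is not fuzzyjoin's own interface, just an illustration of the principle; the two tables, the column names and the max_dist threshold are all made up for the example.

```r
# Two small made-up tables; the city names in `survey` contain typos.
survey <- data.frame(city = c("Knoxvile", "Nashvlle", "Memphis"),
                     respondents = c(12, 30, 7),
                     stringsAsFactors = FALSE)
reference <- data.frame(city = c("Knoxville", "Nashville", "Memphis", "Chattanooga"),
                        region = c("East", "Middle", "West", "East"),
                        stringsAsFactors = FALSE)

# Edit (Levenshtein) distance between every survey name and every reference name.
d <- adist(survey$city, reference$city)

# For each survey row, take the closest reference row,
# and keep the match only if it is within max_dist edits.
max_dist <- 2  # hypothetical tolerance
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_len(nrow(d)), best)]
matched <- ifelse(best_dist <= max_dist, reference$city[best], NA_character_)

# Attach the matched name, its region and the distance to the survey table.
joined <- data.frame(survey,
                     matched_city = matched,
                     region = reference$region[match(matched, reference$city)],
                     distance = best_dist,
                     stringsAsFactors = FALSE)
joined
```

Each misspelt city ends up next to its closest reference entry, and rows with no sufficiently close match would get NA instead of a wrong pairing.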
Three Jargons
theme_set() is a ggplot2 command that sets the theme globally for the R session. Use it by calling theme_set(theme_minimal()) anytime before plotting.
qplot() is another ggplot2 function that is meant as a quick replacement for plot(), hist(), and many other plotting functions in base R. See this quick guide.
Collaborative Filtering: Imagine Netflix wanting to know what to recommend to a user, u. It finds ten users who are similar to u. Then, it averages their ratings of several movies and shows and picks one that matches u’s preferences. That’s collaborative filtering.
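The recipe in the collaborative filtering entry can be sketched on a toy ratings matrix. Everything here is an assumption made for illustration: the users, titles and ratings are invented, cosine similarity is just one common way to measure how alike two users are, and two neighbours stand in for the ten mentioned above. It is not a description of how Netflix actually does it.

```r
# Made-up ratings (1-5); NA means the user has not rated that title.
ratings <- rbind(
  u     = c(5, 4, NA, 1, NA),
  ana   = c(5, 5, 4, 1, 2),
  ben   = c(4, 5, 5, 2, 1),
  chloe = c(1, 2, 1, 5, 5)
)
colnames(ratings) <- c("Title A", "Title B", "Title C", "Title D", "Title E")

# Cosine similarity between two users over the titles both have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

target <- "u"
others <- setdiff(rownames(ratings), target)
sims <- sapply(others, function(v) cosine_sim(ratings[target, ], ratings[v, ]))

# Keep the k most similar users (two suffice for a toy matrix).
k <- 2
neighbours <- names(sort(sims, decreasing = TRUE))[seq_len(k)]

# Average the neighbours' ratings for titles the target has not rated yet.
unseen <- colnames(ratings)[is.na(ratings[target, ])]
predicted <- colMeans(ratings[neighbours, unseen, drop = FALSE], na.rm = TRUE)

# Recommend the unseen title with the highest predicted rating.
recommendation <- names(which.max(predicted))
recommendation
```

Here ana and ben rate the shared titles most like u does, so their average ratings decide which unseen title gets recommended.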
Two Tweets
Alison Presmanes Hill
An #rstats #blogdown file hierarchy cheatsheet:

├─ archetypes <- edit me!
├─ config.toml <- edit me!
├─ content <- edit me!
├─ data <- edit me!
├─ layouts <- edit me!
├─ public <- ignore me!
├─ static <- use me! (png/pdf/csv/xls)
├─ themes <- don't touch! https://t.co/gvVA703Lwa
Amelia McNamara
"Things that are still on your computer are approximately useless." -@drob #eUSR #eUSR2017 https://t.co/nS3IBiRHBn
One Meme
That's a wrap!
Have a great rest of the week. Take care and stay safe. Share this letter with your friend or colleague as well.
Until next week,
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

List of all packages covered in past issues: https://www.harsh17.in/nextpackages/.

Knoxville, Tennessee.