
Difference between data science, ML and AI | Next Issue #19

Harshvardhan
How have you been doing?
Last week, I read an old blog post on choral explanations. When we think of “collaboration” on explanations outside the academic context, we think of Wikipedia. Mike Caulfield argues that sites like Stack Overflow and Quora let each writer keep control of their own explanation of a topic, which makes them inherently different from tools that merge everything into one unified text. Having many individual explanations lets us approach a topic from several facets. This idea of individual explanations to a common problem is refreshing.
Today, we will talk about three such ideas: data science, machine learning and artificial intelligence; Jesse Mostipak’s helpful list of data science Twitch streams; an incredibly beautiful and free reincarnation of Euclid’s Elements; and more.
Let’s dive in!

Five Stories
Before machine learning (ML) and artificial intelligence (AI) joined the industry-speak, these terms were unknown jargon. At the first-ever AI conference (the Dartmouth conference), many predicted that we would have superintelligent robots by now. But like flying cars, the “intelligence” is nowhere to be seen. I like Stanislaw Ulam’s take on it.
What is it that you see when you see? You see an object as a key, a man in a car as a passenger, some sheets of paper as a book. It is this word ‘as’ that must be mathematically formalised, on par with connectives “and”, “or”, “implies”, and “not”. …Until you do that, you will not get very far with your AI problem.
But anyway. The terms are here; they’re not going anywhere. So, what do they mean in the modern context? David Robinson proposes a wonderful distinction between the three ideas. Data science produces insights. Machine learning produces predictions. Artificial intelligence produces actions.
Data science involves understanding the data to gain insight. It involves a human decision maker and topics like statistical inference, data visualisation and experimental design. Machine learning is a field of prediction: if I know X, what can I say about Y? Finally, artificial intelligence is the hardest of the three to explain. It is about systems that take actions, which is how AlphaGo can defeat the best Go player. It is notoriously hard to understand, but it works.
If nothing else, read the post for David’s humour. (The meme in today’s newsletter comes from his blog.)
Did you know there’s an entire world of absolutely free live-streamed data science content available 24/7 at twitch.tv? Jesse didn’t, I didn’t, and I’d bet you didn’t either. In this short blog post, she rounds up the best data science streamers — for R and Python.
If you like to follow along with data science competitions and hang out with other data science people, definitely check it out!
Do you consider yourself a visual learner? TidyBlocks is an excellent website for learning statistics and R via blocks. It is game-like and comes with a detailed guide to support your journey. I first read about it in Greg Wilson’s blog (which is an excellent read independently), and my first reaction was, “how had I missed it so far!”.
This is a fantastic tool to teach statistics and R to beginners. The interface is rudimentary, but it is built around blocks that you combine to carry out a statistical analysis.
Every so often, we stumble on a hidden treasure somewhere on the web that makes us wonder how it is available for free. This website is a stunning reproduction of Byrne’s 1847 edition of Euclid, with interactive diagrams and cross-references. The Elements was Euclid’s fundamental work and a pioneering encyclopedia of mathematics. Written around 300 BC, it is one of the oldest large-scale treatises on deductive mathematics.
In total, it is a collection of thirteen books, originally in Greek, made up of postulates, propositions and mathematical proofs. Nicholas carefully read through each book and reproduced Byrne’s version, which is as beautiful as it is thoughtful. Jump to the book’s introduction!
We all make typos when writing code. Here is a small library to help you with that. It will also save you from embarrassment if you’re coding around your mom.
The package is simple: it catches your typing mistakes and shows quick suggestions for the names you might have been trying to type.
Source: ThinkR's website.
Four Packages
fcuk: A package designed to help people with clumsy fingers who make a lot of minor typographical errors. It shows you possible mistakes and how to correct them. See the vignette here.
xkcd: You have likely stumbled on an xkcd comic at some point, or you may be a devoted reader. This package creates xkcd-like plots in R. It is pretty basic, but nothing can beat a good dose of humour in boring slides. See the vignette here.
errorist: This package searches for your error messages the second they occur. So, no need to copy and paste the error message into Google. See the GitHub here.
correlation: Correlation is one of the first analyses we try when presented with new data. Unfortunately, cor() in R is very limited. This package has all you could ask for, and then some more. Even pretty visualisations! See the vignette here.
Three Jargons
Likelihood: Likelihood is the joint probability (or density) of the sampled data, viewed as a function of the (unknown) parameters of a model. It is not a probability density over the parameters; for independent observations, it is the product of the individual sampling densities. See Wikipedia for more details.
Score: The score measures how sensitive the log-likelihood is to a change in the parameters. It is calculated as the derivative (gradient) of the log-likelihood with respect to the parameters. See Wikipedia for more details.
Fisher’s Information: It measures how much information an observable random variable carries about the unknown parameter of the distribution that supposedly describes the population. Formally, it is the variance of the score. See Wikipedia for more details.
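To make the three jargons a bit more concrete, here is a minimal sketch for the simplest case I could think of: independent coin flips (a Bernoulli model) with unknown success probability p. It is in plain Python rather than R so that it runs with no packages. The sample and the function names are made up for illustration; they do not come from any package mentioned in this issue.

```python
import math


def log_likelihood(p, data):
    """Log of the product of Bernoulli densities, one per observation."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)


def score(p, data):
    """Derivative of the log-likelihood with respect to p."""
    successes = sum(data)
    failures = len(data) - successes
    return successes / p - failures / (1 - p)


def fisher_information(p, n):
    """Variance of the score for n independent Bernoulli(p) draws."""
    return n / (p * (1 - p))


# Made-up sample: 7 successes in 10 flips.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p_hat = sum(data) / len(data)  # maximum likelihood estimate of p

print("log-likelihood at p_hat:", log_likelihood(p_hat, data))
print("log-likelihood at 0.5:  ", log_likelihood(0.5, data))
print("score at p_hat:         ", score(p_hat, data))  # close to zero
print("Fisher information:     ", fisher_information(p_hat, len(data)))
```

The log-likelihood is highest at the sample proportion, the score is (numerically) zero there, and the Fisher information grows with the sample size: more flips tell you more about p.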
Two Tweets
Albert Rapp
Ever wanted to use colors in #ggplot2 more efficiently?

Based on #dataviz principles described in a recent blog post from @lisacmuth, I'll show you how.

Find my blog post at https://t.co/0RXHEkAlCd

Stick around for a quick summary thread 🧵⬇️
#rstats
Laura Ellis
The agenda for #DataMishapsNight is now posted! Data mistake stories fall into k=3 clusters:

🍌“Quality” Data Curation and Downstream Quagmires
🍌“I’ll just …” (Prod Stories)
🍌Machine “Learning” and “Directional” Metrics

Join us on Thursday at 7pm CST https://t.co/LCcthgQmL8 https://t.co/W7VKj4XzUw
One Meme
Source: David Robinson's blog.
That's a wrap!
Hope you enjoyed today’s edition. Share it with your friends or colleagues who are interested in data science and R. See you next week!
Harsh
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

Personal website: https://harsh17.in.

List of all packages covered in past issues: https://www.harsh17.in/nextpackages/.

Knoxville, Tennessee.