
Tinker with a Neural Network | Next - Issue #27

Hi there!
Today’s letter covers clustering, tinkering with neural networks, biography generation, AI ethics, etc. Let’s dive in!

Five Stories
Clustering functionality in R is spread across several packages, such as NbClust, cluster, and pvclust. By changing a few parameters, you can quickly get cluster details from several algorithms across these packages. This vignette explains clustering theory as well as the relevant functionality in R. It’s my go-to whenever I’m working on a clustering project.
Neural networks are not simple to understand (although this explanation by 3blue1brown helps). TensorFlow’s online Playground is the perfect place to learn their intricacies through experimentation. The web app lets you control the problem type (regression or classification), the train-test ratio, the learning rate, and, of course, the number of layers and nodes. Tinker and learn!
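To get a feel for one of those knobs, the learning rate, before opening the playground, here is a minimal sketch. It is not the playground's code. It swaps in a toy problem: plain gradient descent on the loss f(w) = w², whose minimum is at w = 0. The three learning rates are hypothetical values chosen to show "too small", "reasonable", and "too large". Python is used purely for illustration.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimize the toy loss f(w) = w**2 with plain gradient descent.

    The gradient is f'(w) = 2*w, so each step computes
    w <- w - learning_rate * 2*w. Returns the final value of w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w              # derivative of w**2
        w -= learning_rate * grad   # step against the gradient
    return w


# Hypothetical learning rates: too small, reasonable, too large
for lr in (0.01, 0.1, 1.1):
    w = gradient_descent(lr)
    print(f"learning rate {lr:<5} -> w after 20 steps = {w:.4f}, loss = {w * w:.4f}")
```

With 0.01, w creeps slowly toward 0. With 0.1, it gets close within 20 steps. With 1.1, every step overshoots the minimum and the loss grows. You can look for the same pattern in the playground's loss curve by raising and lowering its learning rate.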
Wikipedia, which is consistently ranked one of the top 10 most visited websites, is often the first stop for many people looking for information about historical figures and changemakers. But not everyone is equally represented on Wikipedia. Only about 20 per cent of biographies on the English site are about women, according to the Wikimedia Foundation, and we imagine that percentage is even smaller for women from intersectional groups, such as women in science, women in Africa, and women in Asia.
Meta AI’s Generating Biographies project is working on NLP-based tools to assist Wikipedia article writers. The model searches websites for accurate information about a person and drafts a Wikipedia-style entry about them, complete with citations. You can explore the code or contribute on GitHub.
4. I, Roboethicist
What are coding computer ethics?
Feed an image-recognition algorithm more and more photos of cats, and it will get better at recognizing what is and isn’t a cat. However, these algorithms are not perfect. How will the program treat a stuffed toy cat? How will it categorize a picture of a cat on a T-shirt? When the stakes are low, as in image recognition, these errors may not matter much. But for some technologies, being correct most of the time isn’t sufficient. We would simply not accept a pacemaker that works correctly only most of the time, or a plane that avoids crashing into a mountain with just 95% certainty.
Colin is a PhD student at Oregon State. This story is about his research on computer science and ethics. You can listen to their podcast as well!
Many data science projects fail. Data science is speculative work, so career skills such as knowing how to assess a project’s viability, when to cut your losses, and why things didn’t work out are invaluable. Alex has put together a set of questions to ask at each stage of the project lifecycle: scoping, analysis, development, validation, and release.
Four Packages
With so many packages in R, it isn’t easy to keep track of them. For instance, I only knew tidytext and quanteda for NLP and text mining. Yesterday, I found textclean and I’m in love with its check_text() function.
CRAN Task Views aim to provide guidance on which CRAN packages are relevant for a given task. Each view gives a brief overview of its packages, and the ctv package can install all the packages in a view automatically.
Here’s a list of currently available task views. Check out their GitHub here.
Three Jargons
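Akaike information criterion (AIC) estimates a model’s prediction error by trading off goodness of fit against model complexity. When comparing models fitted to the same data, lower is better.
AIC = 2k - 2 log(L), where k is the number of estimated parameters, L is the maximized likelihood, and log is the natural logarithm.
Bayesian information criterion (BIC) is closely related to AIC and is also based on likelihood and model complexity. Its penalty per parameter is log(n) instead of 2, so it penalizes complexity more heavily than AIC once n ≥ 8.
BIC = k log(n) - 2 log(L), where k is the number of estimated parameters, n is the number of observations, and L is the maximized likelihood.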
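Here is a small sketch that makes the two formulas concrete. Python is used purely for illustration. The data and both candidate models are hypothetical:

- Model A is a normal distribution with its mean and variance both estimated (k = 2).
- Model B fixes the mean at an assumed value of 5.0 and estimates only the variance (k = 1).

Both use maximum-likelihood estimates, so log(L) is the maximized log-likelihood.

```python
import math


def gaussian_log_likelihood(data, mu, sigma2):
    """Natural-log likelihood of `data` under a Normal(mu, sigma2) model."""
    n = len(data)
    rss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


def aic(log_l, k):
    # AIC = 2k - 2 log(L)
    return 2 * k - 2 * log_l


def bic(log_l, k, n):
    # BIC = k log(n) - 2 log(L), where n is the number of observations
    return k * math.log(n) - 2 * log_l


data = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7]  # hypothetical measurements
n = len(data)

# Model A: mean and variance both estimated (k = 2), at their ML estimates
mu_hat = sum(data) / n
s2_hat = sum((x - mu_hat) ** 2 for x in data) / n
ll_a = gaussian_log_likelihood(data, mu_hat, s2_hat)

# Model B: mean fixed at a hypothetical 5.0, only the variance estimated (k = 1)
s2_b = sum((x - 5.0) ** 2 for x in data) / n
ll_b = gaussian_log_likelihood(data, 5.0, s2_b)

for name, ll, k in [("A", ll_a, 2), ("B", ll_b, 1)]:
    print(f"Model {name}: log(L) = {ll:.3f}, AIC = {aic(ll, k):.3f}, BIC = {bic(ll, k, n):.3f}")
```

Model A fits slightly better (higher log(L)), but both criteria favour the simpler model B on this data because the extra parameter isn’t worth its penalty. In R, stats::AIC() and stats::BIC() compute these directly from fitted models such as lm() or glm() objects.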
Censoring in statistics is when the value of an observation is only partially known, such as when it lies beyond the limit of the measuring instrument. For example, a bathroom scale might only measure up to 140 kg. If a 160 kg individual is weighed on that scale, the observer would only know that the individual’s weight is at least 140 kg.
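The bathroom-scale example can be simulated in a few lines. This is a minimal sketch in Python, used only for illustration. The weigh() helper and the list of true weights are hypothetical. Each reading is stored together with a flag saying whether it was censored. It also shows why treating censored readings as exact values is a problem.

```python
SCALE_LIMIT_KG = 140.0  # the scale's maximum reading, from the example above


def weigh(true_weight_kg, limit=SCALE_LIMIT_KG):
    """Return (reading, is_censored) for a scale that tops out at `limit`.

    A censored reading means we only know the true weight is >= limit."""
    if true_weight_kg >= limit:
        return limit, True
    return true_weight_kg, False


true_weights = [72.0, 95.5, 160.0, 138.0, 151.2]  # hypothetical individuals
readings = [weigh(w) for w in true_weights]

for w, (reading, censored) in zip(true_weights, readings):
    note = f"at least {reading} kg (censored)" if censored else f"{reading} kg"
    print(f"true {w} kg -> recorded {note}")

# Treating censored readings as exact values biases summaries downward
naive_mean = sum(r for r, _ in readings) / len(readings)
true_mean = sum(true_weights) / len(true_weights)
print(f"naive mean {naive_mean:.1f} kg vs true mean {true_mean:.1f} kg")
```

Methods built for censored data use the flag instead of pretending the cap is the true value. In R, the survival package’s Surv() pairs each value with an indicator of whether it was fully observed.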
Two Tweets
Tessa Davis
Of all the meeting platforms we've used over the last 2 years, Zoom is the one I love most.

Most of us think we know Zoom inside out.

But we don't.

Here are 7 Zoom hacks you probably haven't heard of, but you'll be glad you now know.

Dorsa Amir
Slides are visual aids that assist your presentation. Anytime you put something on your slides, its primary purpose is to help the *audience*, not you.

Nothing should distract from your verbal presentation, it should only enhance it.

A mini-presentation on slide design. 1/
One Meme
It was Kevin Kelly’s 68th birthday and we got 68 Bits of Unsolicited Advice. Then, he had another birthday and we got 99 Additional Bits of Unsolicited Advice.
That's a wrap!
I hope you enjoyed today’s letter. Could you share it with a friend or a colleague? See you next week!
Harshvardhan @harshbutjust

A short and sweet curated collection of R-related works. Five stories. Four packages. Three jargons. Two tweets. One Meme.

List of all packages covered in past issues:

Knoxville, Tennessee.