Mostly Harmless AI

By Alejandro Piad Morffis • Issue #7
🖖 Welcome to another issue of the Mostly Harmless AI newsletter.
Machine learning is becoming an increasingly fundamental component of the products and services we consume every day. But the craft is still very much in its infancy and remains a challenging topic for newcomers, especially those without a formal education.
In this issue, we’ll take a look at an emerging paradigm that promises to revolutionize the machine learning field, much as compilers did in the 70s and 80s: Automated Machine Learning.

🌟 The topic
Everyone who’s done even a bit of machine learning has had to struggle with the intricacies of getting a model to work outside toy problems. There are literally hundreds, if not thousands, of knobs you can turn to make training more efficient, more robust, or even just to make it converge. Machine learning as a science has come very far in the last couple of decades, but as an engineering discipline it is still in its infancy.
Enter AutoML, or Automated Machine Learning. It sounds almost oxymoronic, or tautological: what does it even mean to automate machine learning?
AutoML is an emerging subfield of machine learning that attempts to progressively optimize most of the process for solving a typical machine learning problem. At a very high level, we can think of this process as a series of steps such as data collection and preparation, feature engineering, model selection and tuning, deployment, monitoring, and so on.
Each of these steps involves a lot of technical decisions with a myriad of options. Just take model selection: are we going all-in with neural networks, or should we try some of the basic models first? Which ones, for that matter? And don’t get me started on neural networks: we have to pick among layers, activation functions, regularizers, optimizers… If we look at data preparation, again, there is a plethora of options: how to impute missing values (or whether to do it at all), how to encode the data, which filters to apply… You get the point. AutoML can significantly reduce the cognitive load of these million decisions, though it’s no silver bullet, of course.
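To see how quickly these choices multiply, here is a toy sketch of a pipeline search space. The knob names and options below are illustrative, not from any particular library: even this tiny space already contains well over a hundred configurations.

```python
import random

# An illustrative (hypothetical) search space: a few of the knobs
# mentioned above, each with only a handful of options.
SEARCH_SPACE = {
    "imputation": ["mean", "median", "drop"],
    "encoding": ["one-hot", "ordinal", "target"],
    "model": ["logistic", "random-forest", "svm", "mlp"],
    "regularization": [0.001, 0.01, 0.1, 1.0],
}

# The full grid is already sizeable for this tiny space:
total = 1
for options in SEARCH_SPACE.values():
    total *= len(options)
print(total)  # 3 * 3 * 4 * 4 = 144 configurations

def sample_config(rng=random):
    """Draw one random configuration from the space."""
    return {knob: rng.choice(options) for knob, options in SEARCH_SPACE.items()}

print(sample_config())
```

Real pipelines have dozens of such knobs, many continuous, so exhaustive grids quickly become infeasible, which is exactly why AutoML systems rely on smarter search strategies.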
One of the most successful subtasks in AutoML is model selection and tuning. Say you want to try all the classifiers in scikit-learn. You can loop through all the classes (fortunately, they share a common API), but you also have to try different hyperparameters for each one, and the valid value ranges differ from class to class. And then each candidate has to be trained and cross-validated: how many times? Instead, you can use something like auto-sklearn, a library that wraps the scikit-learn models under a single class, AutoSklearnClassifier.
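Here is what that manual loop looks like in plain scikit-learn, sketched with a hand-picked subset of classifiers on a toy dataset. This is the tedious part that tools like auto-sklearn automate (and extend with hyperparameter tuning):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# A hand-picked subset of classifiers; thanks to the shared
# fit/predict API we can evaluate them all in a uniform loop.
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# 5-fold cross-validation for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Note that this sketch doesn’t even touch hyperparameter tuning; multiply the loop by every parameter combination per model and you see why automating the search pays off.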
Jumping to the deep learning world, the model selection problem is often framed as Neural Architecture Search (NAS). There are literally hundreds of different techniques, but in the end they all boil down to the same idea: an intelligent search over the space of possible neural network architectures.
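A deliberately naive sketch of the idea, using random search over a tiny space of multilayer perceptron layouts (real NAS methods search far larger spaces with much smarter strategies than random sampling):

```python
import random
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
rng = random.Random(0)

# A tiny "architecture space": number of layers and units per layer.
def random_architecture():
    depth = rng.choice([1, 2])
    return tuple(rng.choice([16, 32, 64]) for _ in range(depth))

best_arch, best_score = None, -1.0
for _ in range(5):  # real NAS evaluates hundreds or thousands of candidates
    arch = random_architecture()
    model = MLPClassifier(hidden_layer_sizes=arch, max_iter=200, random_state=0)
    score = cross_val_score(model, X, y, cv=3).mean()
    if score > best_score:
        best_arch, best_score = arch, score

print(best_arch, round(best_score, 3))
```

Swap the random sampler for reinforcement learning, evolutionary search, or gradient-based relaxations and you get the main families of NAS techniques.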
The world of AutoML is already huge and keeps growing. There are lots of open-source libraries you can use today, and the big players in the industry are already packing their cloud platforms with these tools.
Be aware, though, that the field is still young, and there’s no free lunch. Since AutoML is basically machine learning on steroids, all the existing challenges are multiplied. Training an AutoML system costs orders of magnitude more than training a single model, although a lot of research is going into reducing that cost. And two major issues that remain ahead are the explainability of the resulting models and the intrinsic biases they may encode.
In the meantime, next time you have to train a somewhat conventional model on a somewhat conventional dataset, do yourself a favour and try one of the many AutoML libraries available. You’ll be surprised by the results and, even if you still need to tinker with the details, you will at least start with a very good baseline to build upon.
📚 For learners
The best resource for getting up-to-date with the field is the AutoML.org website. You’ll find links to papers, tools, and a very good introductory book.
You can also check out this awesome list on GitHub with links to hundreds of papers, books, tools, blog posts, slides, and more.
🔨 Tools of the trade
It’s impossible to list all the incredible AutoML libraries out there, so I’ll share just three of the best-known, which will get you pretty far.
auto-sklearn is an AutoML wrapper for scikit-learn that gives you a black-box classifier. Under the hood, it’s powered by Bayesian optimization, a super cool technique for efficiently exploring large, complex parameter spaces with non-trivial structure.
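To give a flavour of Bayesian optimization, here is a minimal 1-D sketch written from scratch with NumPy, not auto-sklearn’s actual machinery: fit a Gaussian process to the points evaluated so far, then use an upper-confidence-bound rule to pick the next point to try, balancing exploration and exploitation.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """RBF kernel matrix between two 1-D arrays of points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def objective(x):
    # The "expensive" function we pretend not to know: peak at x = 2.
    return -(x - 2.0) ** 2

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 5.0, 200)   # candidate points to choose from
X = rng.uniform(0.0, 5.0, size=3)   # a few initial random evaluations
y = objective(X)

for _ in range(15):
    # Gaussian-process posterior at the grid, given observations (X, y).
    K = rbf(X, X) + 1e-6 * np.eye(len(X))  # jitter for numerical stability
    K_s = rbf(grid, X)
    mu = K_s @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    # Upper-confidence-bound acquisition: prefer promising OR uncertain points.
    ucb = mu + 2.0 * np.sqrt(np.clip(var, 0.0, None))
    x_next = grid[np.argmax(ucb)]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best_x = X[np.argmax(y)]
print(round(float(best_x), 2))
```

With only ~18 evaluations the search homes in near the true optimum at x = 2, which is the whole point: spend few expensive evaluations (model trainings, in AutoML) by modelling where the good configurations are likely to be.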
autokeras is a Keras-based AutoML framework. You’ll find a few high-level models, like an image or text classifier, which, when fit, perform a neural architecture search over a space of sensible predefined architectures.
auto-pytorch is a similar tool for PyTorch, offering the same kind of high-level models backed by neural architecture search.
🍿 Recommendations
For the last couple of weeks I’ve been reading Stephen King’s “On Writing”. It’s both a short autobiography and a manual for writing better, from the genius of the horror genre. Whether you like his books or not, you have to concede he can write, in a way that leaves many of his readers (myself included) incapable of not rendering his imaginings in their own minds. If you want to know the secret sauce behind the success of his narrative art, and how to apply those same tools to your own writing, fiction or not, this book pretty much sums it up.
On Writing: A Memoir of the Craft by Stephen King
🎤 Word of mouth
Last Friday I challenged you to share an intriguing philosophical issue you couldn’t stop thinking about. Lots of you answered, with mind-blowing questions that sparked discussions that are still alive today.
Alejandro Piad Morffis
Hey folks 🖖!

Have a nice #PhilosophyFriday 🤔!

❓ Share with us an intriguing philosophical question that you cannot stop thinking about... 👇
I also submitted one such dilemma: a contrived scenario in which a self-driving car has to choose, in an impossible situation, between different but equally bad outcomes. The purpose was to surface the need to think deeply about morality and how to encode it in our AI systems because, regardless of our technological advancement, we are still human, and AI is reaching the point where it will inevitably have to deal with fundamental human issues.
Alejandro Piad Morffis
Today is #PhilosophyFriday 🤔!

The year is 2035. You're sitting comfortably in your L5 self-driven car, zooming across the highway.

Suddenly, the truck in front drops a big boulder. In a split second the car AI has to make a choice: brake or dodge... 🧵👇
👥 Community
This week I want you to follow three researchers from the team that created auto-sklearn. They have been doing a tremendous job organizing workshops, writing survey papers, and working on basic science to bring the field of AutoML into frontline research.
I was fortunate to participate in one of their workshops and meet them, and I can tell you they also happen to be genuinely good people, driven by the desire to make machine learning available to as many of us as possible.
☕ Homebrew
Since this issue is on the topic of AutoML, I want to tell you about yet another framework, but this time one I’m personally involved with.
AutoGOAL
autogoal takes a different approach to AutoML. Instead of wrapping an existing library with a transparent API (like auto-sklearn) or giving you high-level constructs (like autokeras), we try to mix and match the most common ML tools you already use under a single unified API.
The library is still in its infancy, but it’s growing quickly. It currently includes hundreds of algorithms from sklearn, spacy, gensim, nltk, keras, and pytorch that you can use as black-box machine learning models. The most interesting part, however, is that under the hood everything is powered by a very flexible DSL that you can adapt to any new library.
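To illustrate the spirit of such a DSL (this is a toy sketch in plain Python, not AutoGOAL’s real API), think of a pipeline space as a grammar: non-terminal symbols expand into sequences of steps, and sampling from the grammar yields concrete pipelines to evaluate.

```python
import random

# A toy pipeline grammar. All names here are illustrative,
# NOT AutoGOAL's actual API or algorithm catalogue.
GRAMMAR = {
    "Pipeline": [["Preprocess", "Classifier"]],
    "Preprocess": [["scale"], ["normalize"], []],  # the step is optional
    "Classifier": [["logistic"], ["svm"], ["Ensemble"]],
    "Ensemble": [["vote", "Classifier", "Classifier"]],
}

def sample(symbol, rng):
    """Recursively expand a grammar symbol into a concrete pipeline."""
    if symbol not in GRAMMAR:  # terminal: an actual algorithm name
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    result = []
    for part in production:
        result.extend(sample(part, rng))
    return result

rng = random.Random(0)
for _ in range(3):
    print(sample("Pipeline", rng))
```

The appeal of the grammar-based view is extensibility: plugging in a new library just means adding productions, and the same search machinery immediately covers the enlarged space.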
It would mean the world to me if you could visit our GitHub and leave us a star, or give us a follow on Twitter.
👋 That’s it for now. Please let me know what you think of this issue, what you’d like to see more (or less) of, and any other feedback you want to share. If you liked this newsletter, consider subscribing (in case you haven’t) and forwarding it to those you love. It’s 💯 free!
Alejandro Piad Morffis

A weekly newsletter on all things AI, including recent news, hot resources, and interesting conversations happening all around the Internet.
