Data Machina

By alg01

A weekly digest of machine learning curiosities, data science geekery, and other data amenities. Curated by @ds_ldn in the middle of the night. PLEASE NOTE: Data Machina is no longer published here. Go to to get the latest Data Machina








Data Machina - A personal note & important news

When I started Data Machina I thought: Well... maybe a few people will read it, but I’ll try my best. It turns out that 145 issues and 2 ½ years later, more than 7,950 people read Data Machina every week. I’ve received amazing feedback, kind donations, awesome …


Data Machina - Issue #144

1. A Guide to The Machine Learning Engineering Loop
2. Unveiling Mathematics behind XGBoost
3. Embeddings @Twitter
4. [perhaps] The 25 Best Data Visualizations of 2018
5. Open sourced: MS Infer.NET Bayesian Inference Framework
6. Serverless Machine Learning with …


Data Machina - Issue #143

1. Time-Series Prediction Using RNN-LSTM
2. Forecasting @Uber: An Introduction
3. Machine Learning, Information Theory & Tail Bounds
4. The Use of Embeddings in OpenAI Five
5. AVA Algorithms: The Art & Science of Image Discovery @Netflix
6. Multi-Armed Ban…


Data Machina - Issue #142

1. Bayesian Data Science: Simulation & Probabilistic Programming
2. NLP's Generalization Problem: How Researchers are Tackling it
3. Serverless for Data Scientists
4. Learning Meaning in NLP - The Semantics Mega-Thread
5. Understanding What Artificial Intellig…


Data Machina - Issue #141

1. Program Synthesis: Can We Teach Computers to Write Code?
2. Opensourcing TransmogrifAI: Automated ML for Structured Data
3. Neural Processes: Probabilistic Gaussian Process+Deep Learning
4. Interpreting Probability: A Worthy Enterprise
5. The Exploding & …


Data Machina - Issue #140

1. How Should We Evaluate Machine Learning for AI? A New Way
2. Scalable Bayesian Inference with Hamiltonian Monte Carlo
3. DeepMind's AlphaGo Zero Demystified
4. ESPnet: An End-to-End Speech Processing Toolkit
5. Facebook Grants: Statistics for Improving Insight…


Data Machina - Issue #139

1. Google AI Chief - Machine Learning Architecture Blueprint
2. On Word Embeddings
3. Tutorial: Version Control for Data Science Projects
4. Introduction to Google BigQuery Machine Learning
5. Cutting Through the Hype of Google AutoML
6. Membrane - Reliable, Scalab…


Data Machina - Issue #138

1. Yann LeCun - Learning World Models: The Next Step Towards AI
2. Machine Learning: Alchemy for the Modern Computer Scientist
3. Geosharded Recommendations with Hilbert Curve @Tinder
4. Seedbank: Collection of Interactive Machine Learning Examples
5. An Open Fra…


Data Machina - Issue #136 (copy)

1. NLP's ImageNet Watershed Moment Has Arrived
2. Design Patterns for Production NLP Systems
3. Berkeley RiseLab: A Short History of Prediction-Serving Systems
4. Do Bayesians Overfit?
5. [awesome] Foundations of Machine Learning
6. Uber Labs: An Intriguing Failin…


Data Machina - Issue #136

1. The Eight Rules of Optimization
2. Story of an ML Pipeline - 5PB daily logs data & 1,000s of Models
3. Probabilistic Deep Learning - Bayes by Backprop
4. Machine Learning + Kafka Streams Examples
5. [new] Facebook Tensor Comprehensions for High Perf. ML
6. …


Data Machina - Issue #135

1. Deploying Machine Learning at Scale
2. TensorFlow: The Confusing Parts
3. Digging into AWS SageMaker Data Science Notebooks
4. Representing & Comparing Probabilities (pdf, 131 slides)
5. Phil's Data Structures Zoo
6. A Visual Explanation of Google's Transfo…


Data Machina - Issue #134

1. Data Science vs. Statistics: Two Cultures?
2. How to Explain Gradient Boosting
3. Scalable Machine Learning with Fully Anonymised Data
4. Better Map Pins with DBSCAN & Random Forests
5. decaNLP: The Natural Language Decathlon
6. Machine Learning ROI: A Canv…


Data Machina - Issue #133

1. Improving Language Understanding with Unsupervised Learning
2. Netflix AI Interview Questions: Acing the AI Interview
3. Twitter Meets TensorFlow
4. AI Infrastructure & Machine Learning Operations (MLOps)
5. Tutorial: Deep Learning for Conversational AI (pd…


Data Machina - Issue #132

1. [new] MLflow: An Open Source Machine Learning Platform
2. Torus: A Toolkit For Docker-First Data Science
3. How a Kalman Filter Works, In Pictures
4. The Building Blocks of Neural Networks Interpretability
5. The 7 Pillars of Causal Reasoning with Reflections …


Data Machina - Issue #131

1. To Build Truly AI Machines, Teach Them Cause & Effect
2. Delayed Impact of Fair Machine Learning & Outcome Curve
3. Combining Methods for Advanced Time-Series Prediction
4. Experiments in Online Networks and Peer Effects
5. Solve the Unsolvable with M…


Data Machina - Issue #130

1. Bayes, Shannon and The Math Formula of Surprise
2. Visualizing 20 Years of Jeff Bezos Letters to Shareholders
3. Compliant Machine Learning & Data Mgmt. with KubeFlow
4. In The Search for Large Primes with Neural Networks
5. Cosmos - All the Algorithms You'…


Data Machina - Issue #129

1. Loc2Vec - Learning Location Embeddings
2. Word2Vec and Friends
3. An Intro to Hashing in the Era of Machine Learning
4. The False Allure of Hashing for Data Anonymization
5. Augmenting Clinical Intelligence w/ Machine Intelligence (pdf)
6. Language of Graphs: F…


Data Machina - Issue #128

1. Lessons Learnt from My 2 Years of AI Research @MIT
2. Why I Lost Faith in p Values
3. Understanding & Interpreting Your Machine Learning Models
4. Optimisation Data Science: Processing Billions of GPS Data
5. A Cartography Nerd's Guide to Custom Map Making
6…


Data Machina - Issue #127

1. Machine Learning's Amazing Ability to Predict Chaos
2. Bayesian Optimisation is Probabilistic Numerics (pdf)
3. Explaining Complex Machine Learning Models with LIME
4. Word Embeddings Under the Hood
5. Options for Machine Learning Models Interoperability
6. Lo…


Data Machina - Issue #126

1. How to Unit Test Machine Learning Code
2. Introducing TensorFlow Probability
3. Probability Theory for Scientists & Engineers
4. Automatic Data Visualization with Seq2Seq Models
5. Rapid NLP Annotation with Bootstrapping & Active Learning
6. Continuous …