
Breaking the Jargons #2: June Edition

Parul Pandey
Hi there!
Welcome to the second edition of this newsletter. This edition brings you a mix of articles, ranging from course reviews and useful open-source libraries for machine learning to tips for automating your projects. I hope you enjoy the read.
📜Articles
Here are some of my favorite articles published in June:
I reviewed the recently released Hugging Face course. I look at the course content, its offerings, and whether or not it ticks the right boxes for us. 
In this article, I give a quick tour of some libraries I recently came across that could be a great supplement to your machine learning stack. These are not your basic EDA libraries but more advanced ones: a library that compiles trained traditional machine learning models into tensor computations, a topic modeling technique that leverages BERT embeddings, and libraries that enable interpretability for PyTorch models.
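The libraries aren't named in this summary. To give a feel for what "compiling a tree into tensor computations" means, here is a hand-rolled NumPy sketch of the matrix formulation of decision-tree inference that Microsoft's Hummingbird describes as its GEMM strategy. Whether that is the library covered in the article is an assumption, this is not the library's own code, and the toy tree, thresholds, and leaf values are hypothetical.

```python
import numpy as np

# Hypothetical toy tree (illustration only):
#   node 0: x[0] < 0.5 ? go to node 1 : leaf 2 (value 30)
#   node 1: x[1] < 2.0 ? leaf 0 (value 10) : leaf 1 (value 20)

# A[f, j] = 1 if internal node j tests feature f
A = np.array([[1, 0],
              [0, 1]])
# B[j] = threshold used at internal node j
B = np.array([0.5, 2.0])
# C[j, k] = +1 if leaf k lies in the left subtree of node j,
#           -1 if it lies in the right subtree, 0 otherwise
C = np.array([[1, 1, -1],
              [1, -1, 0]])
# D[k] = number of "go left" edges on the root-to-leaf path of leaf k
D = np.array([2, 1, 0])
# E[k] = prediction stored at leaf k
E = np.array([10.0, 20.0, 30.0])


def predict_gemm(X):
    """Evaluate the tree for a whole batch using only matrix ops."""
    went_left = (X @ A < B).astype(int)           # (n, nodes): 1 = condition true
    reached = (went_left @ C == D).astype(float)  # (n, leaves): one-hot leaf indicator
    return reached @ E                            # (n,): leaf value per sample


def predict_loop(x):
    """Reference: the usual if/else traversal of the same tree."""
    if x[0] < 0.5:
        return 10.0 if x[1] < 2.0 else 20.0
    return 30.0


X = np.array([[0.0, 1.0], [0.0, 3.0], [1.0, 1.0], [1.0, 5.0]])
print(predict_gemm(X))  # [10. 20. 30. 30.]
print([predict_loop(x) for x in X])
```

Because every step is a matrix product or an element-wise comparison, the batch can run on tensor runtimes and GPUs; the trade-off is that every node is evaluated, not just those on each sample's path.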
Handling categorical variables is an essential component of a machine learning pipeline. There are many ways to encode categorical variables, and pandas' dummy variable encoding is one of them. However, this technique has its own limitations, and in this article, I present some workarounds to avoid falling into the trap.
Have you ever found it difficult to decipher your own codebase? Do you often end up with multiple files like untitled1.py or untitled2.ipynb? The situation is even grimmer in data science, where we often focus on the analysis and the end product while ignoring the quality of the code behind that analysis. In this article, I share my three favorite tools to help organize and structure your projects in a reusable and reproducible format.
Writing can have a transformative effect not only on your data science journey but also on your career. I appeared on the FastBook Reading Sessions organised by Weights & Biases to discuss this topic, and I wrote this piece to summarize what I covered there. It mainly discusses why writing matters in data science and how you can use it as a tool to strengthen your portfolio.
🎙️ Interviews
This time I got to interview Dmitry Gordeev, also known as dott in the Kaggle world. He is a Kaggle Competitions Grandmaster and a Senior Data Scientist at H2O.ai. In this interview, Dmitry talks about his recent win in the Indoor Location & Navigation competition on Kaggle and his approach to data science in general.
🔬 Research Paper Recommendations
The research paper I found pretty interesting this month:
This paper compares the effectiveness of recently proposed deep learning models for tabular data. The authors examine TabNet, Neural Oblivious Decision Ensembles (NODE), DNF-Net, and 1D-CNN, and compare their performance with XGBoost on eleven datasets, nine of which were drawn from the original papers of these deep learning models. The key conclusions of the study are:
  • XGBoost generally outperformed the deep models.
  • In most cases, the deep models performed worse on datasets that did not appear in their original papers.
  • No deep model consistently outperformed the others.
  • However, an ensemble of the deep models and XGBoost outperformed the other models in most cases.
  • Finally, in the words of the authors:
while significant progress has been made using deep models for tabular data, they still do not outperform XGBoost, and further research is needed in this field. Our somewhat improved ensemble results provide another potential avenue for further research.
Tabular Data: Deep Learning is Not All You Need
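The ensemble finding is worth a concrete picture. Here is a minimal sketch of the simplest form of such an ensemble: averaging the class probabilities of several models, optionally with per-model weights. It assumes each model already outputs probabilities as a NumPy array of shape (n_samples, n_classes). The paper's exact weighting scheme isn't described here, and the two probability arrays are placeholders, not results from the study.

```python
import numpy as np


def ensemble_average(predictions, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    predictions: list of arrays, each of shape (n_samples, n_classes)
    weights: optional per-model weights; uniform if None
    """
    stacked = np.stack(predictions)  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(predictions))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so rows still sum to 1
    # Contract the model axis: sum_m weights[m] * stacked[m]
    return np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)


# Placeholder outputs standing in for an XGBoost model and a deep model
xgb_proba = np.array([[0.9, 0.1], [0.4, 0.6]])
deep_proba = np.array([[0.7, 0.3], [0.2, 0.8]])

combined = ensemble_average([xgb_proba, deep_proba])
print(combined)
print(combined.argmax(axis=1))  # [0 1]
```

Uniform averaging is only the baseline; weighting each model by its validation score is a common refinement.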
💡 Concept corner
I find it fascinating when people break down complex machine learning concepts into easy-to-understand bits. Edwin Chen has a wonderful piece on the intuition behind Random Forests 🌳🌳🌳. If you are new to machine learning, it'll help you grasp the concept, and if you are a veteran, you'll enjoy the analogy.
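To complement the intuition, here is a deliberately tiny from-scratch sketch of the two ingredients usually credited for random forests: each tree is trained on a bootstrap sample of the rows and sees a random subset of the features, and the forest predicts by majority vote. To keep it short, the "trees" are one-split decision stumps for binary 0/1 labels, so this is a simplification rather than a full random forest (real implementations grow deep trees and resample features at every split). The helper names and toy data are hypothetical.

```python
import numpy as np


def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold, left label) split with the fewest errors."""
    best = None
    for f in feature_ids:
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            for left_label in (0, 1):
                pred = np.where(left, left_label, 1 - left_label)
                errors = np.sum(pred != y)
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return f, t, left_label


def predict_stump(stump, X):
    f, t, left_label = stump
    return np.where(X[:, f] <= t, left_label, 1 - left_label)


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, d = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)     # bootstrap sample (with replacement)
        cols = rng.choice(d, size=n_features, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], cols))
    return forest


def predict_forest(forest, X):
    votes = np.stack([predict_stump(s, X) for s in forest])  # (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)           # majority vote


X = np.array([[0, 0], [1, 1], [2, 2], [10, 10], [11, 11], [12, 12]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
forest = fit_forest(X, y, n_trees=25, n_features=1, seed=0)

X_new = np.array([[-5, -5], [0, 0], [10, 10], [20, 20]], dtype=float)
print(predict_forest(forest, X_new))
```

Individual stumps can be wrong near the class boundary because their bootstrap sample missed a point, but the vote across many differently sampled stumps smooths those mistakes out, which is the "wisdom of the crowd" idea behind the analogy.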
🎁 Resource of the Month
A new, free OpenCV course has been released by freeCodeCamp.org in association with the creators of OpenCV. The course covers a wide range of exciting topics, including image and video manipulation, image enhancement, filtering, edge detection, object detection, tracking, face detection, and the OpenCV Deep Learning Module.
That is all for this edition. See you with another roundup next month. You can subscribe to receive the newsletter directly in your mailbox every month, or share it with someone who could find it helpful.
Until next month,
Parul
Parul Pandey @pandeyparul

Breaking down data science jargon, an article at a time.
