#1 Newly Proposed TinyTL Provides Significant Accuracy Improvements with Little Memory Overhead
Intelligent edge devices are increasingly part of our daily lives. Combining artificial intelligence (AI) with these devices enables many real-world applications such as smart homes, smart retail, and autonomous driving. Tiny Machine Learning (TinyML) technology aims to make computing at the edge cheaper and more predictable.
However, state-of-the-art deep learning systems typically demand tremendous computational resources and expertise, which hinders their deployment on edge devices.
There has been considerable interest and research to this end. One recent proposal is Tiny-Transfer-Learning (TinyTL), a method for memory-efficient on-device learning that adapts pre-trained models to newly collected data on edge devices.
The critical thing to note about this new method is that, unlike conventional methods that focus on reducing the number of parameters or FLOPs, TinyTL directly optimizes the training memory footprint: it freezes the memory-heavy weight modules and learns only memory-efficient bias modules.
Researchers also introduce lite residual modules that significantly improve the model's adaptation capacity with little memory overhead. Extensive experiments on benchmark datasets consistently show the effectiveness and memory efficiency of TinyTL, paving the way for efficient on-device machine learning.
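The memory saving comes from a property of bias gradients: unlike weight gradients, they do not require storing the layer's input activations during the backward pass. A minimal pure-Python sketch of the idea (this is an illustration of bias-only fine-tuning, not code from TinyTL) on a single frozen-weight linear layer:

```python
# Bias-only fine-tuning in miniature: the weight w of y = w*x + b stays
# frozen, and only the bias b is updated. All names here are illustrative,
# not from the TinyTL codebase.

def train_bias_only(xs, ys, w=2.0, b=0.0, lr=0.1, epochs=100):
    """Fit only the bias b of y = w*x + b; the weight w stays frozen."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x + b
            # dL/db = 2 * (pred - y): computing this gradient needs no
            # stored input activations, which is exactly why bias-only
            # updates are memory-cheap compared to weight updates
            # (dL/dw = 2 * (pred - y) * x would need x to be kept around).
            b -= lr * 2 * (pred - y)
    return b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 5.0 for x in xs]   # data generated with true bias 5
b = train_bias_only(xs, ys)
```

In a deep network the same argument applies layer by layer, which is why freezing the weights lets TinyTL skip storing most intermediate activations during training.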
#2 Google AI: On Device Simultaneous Face, Hand and Pose Prediction
Google recently announced MediaPipe Holistic, a solution that provides a new state-of-the-art human pose topology and unlocks novel use cases. MediaPipe Holistic consists of a new pipeline with optimized pose, face, and hand components that each run in real time, with minimal memory transfer between their inference backends, and added support for interchanging the three components depending on the quality/speed tradeoff.
MediaPipe Holistic estimates the human pose with BlazePose's pose detector and subsequent keypoint model. Then, using the inferred pose keypoints, it derives three region-of-interest (ROI) crops, one for each hand (2x) and one for the face, and employs a re-crop model to improve each ROI. The pipeline then crops the full-resolution input frame to these ROIs and applies task-specific face and hand models to estimate their corresponding keypoints. Finally, all keypoints are merged with those of the pose model to yield the full set of 540+ keypoints. With these 540+ keypoints, MediaPipe Holistic aims to enable holistic, simultaneous perception of body language, gesture, and facial expression. Its blended approach enables remote gesture interfaces, as well as full-body AR, sports analytics, and sign language recognition.
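The ROI-derivation step above can be pictured with a small geometric sketch: take two pose keypoints on the hand and expand their span into a square crop for the hand model. The helper name, keypoint choice, and margin below are assumptions for illustration, not MediaPipe internals:

```python
# Illustrative sketch of deriving a hand ROI from coarse pose keypoints,
# as the Holistic pipeline description does before running the detailed
# hand model. Keypoints are (x, y) pixel coordinates.

def hand_roi(wrist, index_mcp, scale=2.0):
    """Estimate a square ROI (cx, cy, size) around one hand from two
    pose keypoints: the wrist and the index-finger knuckle. The scale
    margin is an invented value, not MediaPipe's."""
    cx = (wrist[0] + index_mcp[0]) / 2
    cy = (wrist[1] + index_mcp[1]) / 2
    # Use the larger axis span so the square crop covers the whole hand.
    span = max(abs(wrist[0] - index_mcp[0]), abs(wrist[1] - index_mcp[1]))
    return cx, cy, span * scale

roi = hand_roi((100.0, 200.0), (120.0, 240.0))
```

The real pipeline additionally refines each ROI with a learned re-crop model before cropping the full-resolution frame.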
This technique for gesture control can unlock various novel use cases where other human-computer interaction modalities are not convenient. You can try it out in the web demo and prototype your own ideas with it at https://mediapipe.dev/demo/holistic_remote/
#3 A Python OpenFST Wrapper With Support for Custom Semirings and Jupyter Notebooks
In this paper, researchers introduce mFST, a new Python library for working with finite-state machines, based on OpenFST. mFST is a thin wrapper that exposes all of OpenFST's methods for manipulating FSTs.
Additionally, mFST is the only Python wrapper for OpenFST that exposes OpenFST's ability to define custom semirings. This makes mFST ideal for developing models that involve learning the weights on an FST or creating neuralized FSTs. mFST is designed to be easy to get started with and has previously been used in homework assignments for an NLP class as well as in projects integrating FSTs and neural networks. The work exhibits the mFST API and shows how to build a simple neuralized FST with PyTorch.
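A custom semiring is just a pair of operations ("plus" and "times" with their identities) that weighted-FST algorithms are written against. The pure-Python sketch below (this is the concept only, not the mFST API) shows the tropical semiring, where paths compose by addition and alternatives by taking the cheaper one:

```python
# The custom-semiring idea in miniature: shortest-path style FST
# algorithms only need "plus" (combine alternative paths) and "times"
# (concatenate path segments) plus their identities. The tropical
# semiring instantiates these as min and +.

class TropicalSemiring:
    zero = float("inf")   # additive identity: an impossible path
    one = 0.0             # multiplicative identity: a free transition

    @staticmethod
    def plus(a, b):       # choose the better of two alternative paths
        return min(a, b)

    @staticmethod
    def times(a, b):      # concatenate two path segments
        return a + b

S = TropicalSemiring
# Two alternative paths: segments costing 1.0 + 2.5, versus a free
# transition followed by a segment costing 4.0.
best = S.plus(S.times(1.0, 2.5), S.times(S.one, 4.0))
```

Swapping in a differentiable semiring (e.g. log-space plus) is what makes learning FST weights with a framework like PyTorch possible, which is the use case the paper targets.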
#4 The Winning Solution of Hateful Memes Challenge
Hateful Memes is a new challenge set for multimodal classification, focused on detecting hate speech in multimodal memes. This paper proposes a new model that combines a multimodal framework with rules, achieving first place with an accuracy of 86.8% and an AUROC of 0.923.
The researcher applied MRM to VisualBERT to enhance the model's effectiveness on Hateful Memes, and used several common techniques such as K-fold, model stacking, and semi-supervised learning, which significantly improved the AUROC and classification accuracy.
The most important element of the model is the combination of rules extracted from the dataset with the multimodal framework, which improved both the accuracy and the AUROC by more than 13%. On the other hand, the work shows that multimodal models perform poorly on difficult samples; according to the researcher, future attempts to improve the multimodal framework should focus on these samples.
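The rules-plus-model combination can be sketched as a simple score-fusion step: if a hand-written rule fires on the meme, it overrides a low model score. The function, threshold, and rule below are invented for illustration and are not taken from the winning solution:

```python
# Toy sketch of fusing a learned probability with hand-written rules,
# the idea the authors credit for their biggest gain. The 0.95 override
# and the placeholder rule are assumptions, not the paper's values.

def combine(model_prob, text, rules):
    """If any rule fires on the meme text, push the score toward 1."""
    if any(rule(text) for rule in rules):
        return max(model_prob, 0.95)   # rule overrides a low model score
    return model_prob

rules = [lambda t: "slur_token" in t]  # placeholder rule pattern
flagged = combine(0.40, "meme text with slur_token", rules)
benign = combine(0.40, "harmless meme text", rules)
```

The appeal of this design is that high-precision rules recover exactly the samples the multimodal model scores poorly, without retraining it.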
#5 Machine Learning for Streaming Data in Python
In machine learning, the conventional approach is to process data in batches or chunks. Batch learning models assume that all the data is available at once; when a new batch of data arrives, such models have to be retrained from scratch. This availability assumption is a hard constraint for applying machine learning in many real-world settings where data is continuously generated. Additionally, storing historical data requires dedicated storage and processing resources, which in some cases is impractical.
A different approach is to treat data as a stream.
In this paper, researchers introduce River, a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics, and evaluators for different stream learning problems.
River results from the merger of the two most popular packages for stream learning in Python: Creme and scikit-multiflow. River introduces a revamped architecture based on the lessons learned from its precursor packages. River's ambition is to be the go-to library for doing machine learning on streaming data. It is open source and backed by a large community of practitioners and researchers.
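The core of the stream-learning contract is that a model sees one example at a time and updates immediately, so no batch of historical data is ever stored. A plain-Python sketch of that learn-one/predict-one style (an illustration of the paradigm, not River's actual API) using online SGD on a one-feature linear model:

```python
# Stream learning in miniature: each example is consumed once with an
# immediate model update, instead of retraining on an accumulated batch.

class OnlineLinearRegressor:
    """One-feature linear model updated one example at a time."""
    def __init__(self, lr=0.1):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def predict_one(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # Single SGD step on squared error; no history is stored.
        err = self.predict_one(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

model = OnlineLinearRegressor()
examples = [(i / 100, 3 * (i / 100) + 1) for i in range(100)]  # y = 3x + 1
for _ in range(50):          # simulate a long stream revisiting the data
    for x, y in examples:
        model.learn_one(x, y)
```

The memory footprint is constant in the length of the stream, which is precisely the property that makes this approach viable where batch retraining is not.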