#1 A Learnable Frontend for Audio Classification
Machine learning (ML) models for audio understanding have made remarkable progress in the last several years. However, unlike computer vision models, which can learn from raw pixels, deep neural networks for audio classification are rarely trained on raw audio waveforms. Instead, they rely on preprocessed inputs in the form of mel filterbanks, which were designed to replicate some aspects of the human auditory response.
In this paper, Google researchers introduce LEAF, a fully learnable frontend for audio classification, as an alternative to handcrafted mel filterbanks. They show across a wide range of tasks that LEAF is a good drop-in replacement for these features with no task-specific adjustment, and that it can even learn a single set of parameters for general-purpose audio classification while outperforming previously proposed learnable frontends.
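For contrast, here is a minimal sketch of the kind of fixed mel filterbank that LEAF is meant to replace. It assumes the HTK-style mel formula 2595·log10(1 + f/700) and triangular filters, which is one common convention among several. The function names are illustrative and are not part of LEAF. Every center frequency and bandwidth below is set by hand, and these are exactly the quantities a learnable frontend turns into trainable parameters.

```python
import math
from typing import List, Optional


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (one convention among several)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int,
                   f_min: float = 0.0,
                   f_max: Optional[float] = None) -> List[List[float]]:
    """Return n_filters triangular filters, each with n_fft // 2 + 1 weights."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # Frequency (Hz) of each FFT bin.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    # n_filters + 2 edge points, equally spaced on the mel scale.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [mel_to_hz(m_lo + i * (m_hi - m_lo) / (n_filters + 1))
             for i in range(n_filters + 2)]
    bank = []
    for j in range(n_filters):
        left, center, right = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in bin_hz:
            if left <= f <= center:      # rising edge of the triangle
                w = (f - left) / (center - left)
            elif center < f <= right:    # falling edge of the triangle
                w = (right - f) / (right - center)
            else:                        # outside this filter's band
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank


bank = mel_filterbank(n_filters=40, n_fft=512, sample_rate=16000)
print(len(bank), len(bank[0]))
```

The filters are spaced evenly in mel rather than in Hz, so they are narrow at low frequencies and wide at high frequencies. This hand-designed prior is what LEAF learns from data instead.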
One limitation is that the model still relies on an underlying convolutional architecture with a fixed filter length and stride. Learning these parameters directly from data would make it easier to generalize across tasks with different sampling rates and frequency content.
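A little arithmetic shows why fixed lengths and strides are a problem. A filter or stride measured in samples covers a different amount of time at each sampling rate. The 401-tap filter and 160-sample stride below are hypothetical illustrative values, and the helper names are not from the paper.

```python
def num_frames(n_samples: int, filter_len: int, stride: int) -> int:
    """Output frames of an unpadded ("valid") strided convolution."""
    if n_samples < filter_len:
        return 0
    return 1 + (n_samples - filter_len) // stride


def window_ms(n_taps: int, sample_rate: int) -> float:
    """Duration in milliseconds spanned by n_taps samples."""
    return 1000.0 * n_taps / sample_rate


# Hypothetical fixed settings, reused unchanged at two sampling rates.
FILTER_LEN, STRIDE = 401, 160
for sr in (16000, 8000):
    # One second of audio at each rate.
    print(sr, window_ms(FILTER_LEN, sr), window_ms(STRIDE, sr),
          num_frames(sr, FILTER_LEN, STRIDE))
```

At 16 kHz the filter spans about 25 ms with a 10 ms hop. At 8 kHz the same settings span about 50 ms with a 20 ms hop, and one second of audio yields 48 frames instead of 98. A frontend that learned these values could adapt to each sampling rate rather than silently changing its time resolution.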
#2 Automatically Lock Your Neural Networks When You’re Away
In this paper, researchers propose Model-Lock (M-LOCK), a built-in, universal protection scheme for deep neural networks, along with several training strategies. Its goal is to stop thieves from stealing a model and getting usable performance out of it while you are away.
The method resembles the serial-number verification used by many services and products, except that the certificate is a pattern or texture in the input image. Even if an attacker illegally obtains access to the model, the model still cannot achieve its expected predictive performance.
This is because M-LOCK automatically locks the model whenever the designed certificate is absent from the input images. As a result, malicious attackers obtain as little information as possible, while legitimate users are unaffected. The work shows that the proposed method can successfully embed such a lock, or daemon, inside an arbitrary neural network on multiple datasets.
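The summary does not spell out the training strategies. One plausible way to build such a lock, offered here only as an assumption, is to train on paired data: images stamped with the certificate keep their true labels, and unstamped copies receive a wrong label. The model then performs well only when the certificate is present. The pattern, stamp location, and helper names below are hypothetical and are not taken from the M-LOCK paper.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


def stamp_certificate(img: Image, pattern: Image,
                      top: int = 0, left: int = 0) -> Image:
    """Return a copy of img with the (hypothetical) certificate pasted at (top, left)."""
    out = [row[:] for row in img]  # copy so the original image is untouched
    for i, prow in enumerate(pattern):
        for j, value in enumerate(prow):
            out[top + i][left + j] = value
    return out


def build_lock_dataset(samples: List[Tuple[Image, int]], pattern: Image,
                       n_classes: int, seed: int = 0) -> List[Tuple[Image, int]]:
    """Turn each (image, label) pair into two training examples (assumed strategy):
    - certificate present: true label, so legitimate users keep accuracy
    - certificate absent:  a random wrong label, so the model stays "locked"
    """
    rng = random.Random(seed)
    out = []
    for img, label in samples:
        out.append((stamp_certificate(img, pattern), label))
        wrong = rng.choice([c for c in range(n_classes) if c != label])
        out.append((img, wrong))
    return out


blank = [[0.0] * 4 for _ in range(4)]
certificate = [[1.0, 0.0], [0.0, 1.0]]
data = build_lock_dataset([(blank, 3)], certificate, n_classes=10)
```

A model trained on such data learns to tie correct predictions to the presence of the pattern. This matches the behavior described above, although the paper's actual strategies may differ.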
Potential uses and effects: M-LOCK does not depend on any special model architecture, so it can be extended to existing models and different tasks with negligible impact on performance. It can also be combined with hardware device features, tightly integrating software and hardware so that deep learning as a service (DLaaS) can be deployed more flexibly and conveniently.
Results: Extensive experiments based on MNIST, FashionMNIST, CIFAR10, CIFAR100, SVHN, and GTSRB datasets demonstrated the feasibility and effectiveness of the proposed scheme.
#3 Incorporating Domain Knowledge into Deep Neural Networks
This paper surveys ways in which domain knowledge has been incorporated when constructing models with neural networks.
This matters because including domain knowledge is of special interest not only for building scientific assistants but also for many other areas that involve understanding data through human-machine collaboration.
In many such cases, machine-based model construction can benefit significantly from human knowledge of the domain, encoded in a sufficiently precise form.
The researchers examine two broad approaches to encoding such knowledge, as logical constraints and as numerical constraints, and describe techniques and results obtained in several subcategories of each.
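As a generic illustration, and not a specific method from the survey, here is one common way to turn a logical rule into a numerical constraint: relax it into a penalty added to the training loss. The rule "A implies B" is violated to the extent that the model's probability for A exceeds its probability for B. The rule, the weight, and the function names are hypothetical.

```python
import math
from typing import List


def cross_entropy(probs: List[float], target: int) -> float:
    """Negative log-likelihood of the target class (clipped to avoid log(0))."""
    return -math.log(max(probs[target], 1e-12))


def implication_penalty(p_a: float, p_b: float) -> float:
    """Soft version of the rule 'A implies B': zero when p(A) <= p(B)."""
    return max(0.0, p_a - p_b)


def constrained_loss(probs: List[float], target: int,
                     p_a: float, p_b: float, weight: float = 0.5) -> float:
    """Task loss plus a weighted penalty for violating the domain rule."""
    return cross_entropy(probs, target) + weight * implication_penalty(p_a, p_b)


print(constrained_loss([0.5, 0.5], target=0, p_a=0.9, p_b=0.4))
```

The weight controls how strongly the domain rule shapes training relative to fitting the data. Setting it to zero recovers the unconstrained model.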
#4 Deep Dynamic Neural Network to Trade-off between Accuracy and Diversity in a News Recommender System
This research paper proposes a deep neural network that predicts a reader’s next click in order to provide timely, highly accurate, and reasonably diverse news recommendations.
The researchers learn news representations by incorporating multiple features of each news item. They also learn a reader’s long-term interests from the whole click history, and the reader’s short-term and diversified interests from recent clicks. The model trades off high accuracy against reasonable diversity and applies attention at different levels to learn useful news and reader representations.
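The summary does not give the exact architecture. The following is only a minimal sketch of the two ideas it names, under stated assumptions. It uses plain dot-product attention, with the candidate article as the query, to pool long-term interests from the full history and short-term interests from recent clicks. It then blends relevance with a novelty term. That blend is a hypothetical linear trade-off, similar in spirit to maximal marginal relevance, and is not the paper's formulation. News vectors are assumed to be unit-length, and `lam` and all function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, clicked: List[Vector]) -> Vector:
    """Attention-weighted sum of clicked-news vectors, weights = softmax(query . click)."""
    weights = softmax([dot(query, c) for c in clicked])
    dim = len(clicked[0])
    return [sum(w * c[d] for w, c in zip(weights, clicked)) for d in range(dim)]


def score(candidate: Vector, history: List[Vector], recent: List[Vector],
          lam: float = 0.3) -> float:
    """Hypothetical accuracy/diversity trade-off for one candidate article."""
    long_term = attend(candidate, history)   # interests from the whole history
    short_term = attend(candidate, recent)   # interests from recent clicks
    relevance = 0.5 * (dot(candidate, long_term) + dot(candidate, short_term))
    # Novelty: how unlike the most similar recent click this candidate is.
    novelty = 1.0 - max(dot(candidate, r) for r in recent)
    return (1.0 - lam) * relevance + lam * novelty


history = [[1.0, 0.0], [0.0, 1.0]]
recent = [[1.0, 0.0]]
print(score([1.0, 0.0], history, recent), score([0.0, 1.0], history, recent))
```

Raising `lam` favors articles unlike what the reader just clicked, and lowering it favors pure relevance. This mirrors the accuracy/diversity trade-off described above.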
Results: Extensive experiments on two news datasets demonstrate the effectiveness of the proposed approach. In future work, the researchers plan to incorporate more types of reader feedback, address issues such as missing negative implicit feedback in news recommender systems (NRS), and conduct a user study on whether the proposed method actually improves metrics such as click-through rate.
#5 How to Quickly and Efficiently Build Layouts from Images Captured Using a Smartphone
Generating a Building Information Model (BIM) in 2D or 3D from indoor scenes has applications including real estate websites, indoor navigation, augmented/virtual reality, and many more.
Reconstructing an indoor scene and generating a 2D or 3D layout or floor plan is a well-known problem, and quite a few algorithms have been proposed for it recently. However, most existing methods either use RGB-D images, which require a depth camera, or depend on panoramic photos, which assume there is little to no occlusion in the rooms.
This paper proposes GRIHA (Generating Room Interior of a House using ARCore), a framework for generating a layout from RGB images captured with an ordinary mobile phone camera. The researchers use Simultaneous Localization and Mapping (SLAM) to estimate the 3D transformations required for layout generation.
SLAM is built into recent mobile libraries such as Google’s ARCore, which makes the proposed method fast and efficient. It lets the user generate a layout by simply taking a few conventional photos, rather than relying on specialized depth hardware or occlusion-free panoramic images.
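To make the role of SLAM concrete, here is a minimal sketch of the geometry involved, not GRIHA's actual pipeline. It assumes that SLAM provides, for each photo, a camera pose made of a rotation R and a translation t. Points seen from that photo can then be mapped into one shared world frame, and a 2D floor plan follows by dropping the vertical axis. The sketch assumes a gravity-aligned world frame with y pointing up and, for simplicity, only a yaw rotation. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Matrix3 = List[List[float]]


def yaw_rotation(theta: float) -> Matrix3:
    """3x3 rotation about the vertical (y) axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def camera_to_world(p: Point3, R: Matrix3, t: Point3) -> Point3:
    """Rigid transform p_world = R @ p_cam + t, using a pose such as SLAM provides."""
    x, y, z = (sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def to_floor_plan(points: List[Point3]) -> List[Tuple[float, float]]:
    """Drop the vertical (y) coordinate, keeping (x, z) floor-plan coordinates."""
    return [(x, z) for x, _, z in points]


# A wall corner seen from a camera turned 90 degrees and standing 1 m along x.
corner = camera_to_world((2.0, 0.0, 0.0), yaw_rotation(math.pi / 2), (1.0, 0.0, 0.0))
print(to_floor_plan([corner]))
```

Because the pose comes from the phone's built-in SLAM, no depth sensor is needed to place points from different photos in a common frame. This is the efficiency argument made above.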
Results: GRIHA outperformed existing methods in the reported comparisons. The system was also tested on multiple hardware platforms to evaluate its hardware dependency and efficiency.