#1 TensorFlow 3D: Bringing 3D Deep Learning Capabilities into TensorFlow
According to scientists at Google AI, the use of 3D sensors is growing rapidly. Their spread has created a need for scene understanding technology that can process the data these sensors capture.
But there’s a problem. Although computer vision has recently made good progress in 3D scene understanding, research is still held back by the limited availability of tools and resources for working with 3D data.
TensorFlow 3D: To further improve 3D scene understanding and reduce entry barriers for interested researchers, Google has recently introduced TensorFlow 3D (TF 3D), a highly modular and efficient library that is designed to bring 3D deep learning capabilities into TensorFlow. TF 3D provides a set of popular operations, loss functions, data processing tools, models, and metrics that enable the broader research community to develop, train and deploy state-of-the-art 3D scene understanding models.
Supported datasets: Currently, TF 3D supports the Waymo Open, ScanNet, and Rio datasets. Still, users can freely convert other popular datasets, such as nuScenes and KITTI, into a similar format, use them in pre-existing or custom-created pipelines, and leverage TF 3D for a wide variety of 3D deep learning research and applications.
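Converting another dataset typically means mapping each frame into a common per-point schema before it enters a pipeline. The sketch below illustrates that idea with a hypothetical KITTI-style sample; the field names are illustrative only, not TF 3D's actual input format.

```python
# Illustrative sketch: map a KITTI-style frame (points + per-point labels)
# into one common dictionary schema that a 3D pipeline could consume.
# Field names are hypothetical -- TF 3D defines its own example format.
def to_common_format(points, labels, frame_id):
    assert len(points) == len(labels), "one semantic label per point"
    return {
        "frame_id": frame_id,
        "num_points": len(points),
        "point_positions": points,   # list of (x, y, z) coordinates
        "point_labels": labels,      # per-point semantic class ids
    }

sample = to_common_format(
    points=[(1.0, 2.0, 0.5), (3.0, 1.0, 0.2)],
    labels=[4, 7],
    frame_id="kitti_000001",
)
```

Once every source dataset is normalized into one schema like this, the same training and evaluation code can run over all of them unchanged.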
#2 Reinforcement Learning Approaches in Social Robotics
The past decade has seen a rapid growth of social robotics in many domains including therapy, eldercare, entertainment, navigation, healthcare, education, personal robots, rehabilitation, and more.
For researchers interested in using and applying reinforcement learning methods in the field, this paper surveys reinforcement learning approaches in social robotic applications.
Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial-and-error with its environment to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots. Research in the field of social robotics and human-robot interaction becomes crucial as more and more robots are entering our lives.
The scope of this research focuses particularly on studies that involve physically embodied social robots and real-world human-robot interactions with users. The researchers present an exhaustive analysis of reinforcement learning approaches in social robotics. Additionally, they categorize existing reinforcement learning approaches based on the method used, the design of the reward mechanisms, and more. PDF paper link: Reinforcement Learning Approaches in Social Robotics
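The trial-and-error loop described above can be sketched in a few lines. The toy problem below (a "robot" learning which greeting a simulated user prefers) and its reward probabilities are invented for illustration; it shows the core reinforcement learning update, not any method from the survey.

```python
import random

random.seed(0)

# Toy social-interaction bandit: the robot picks a greeting style and gets a
# reward from a simulated user. Reward probabilities are made up for the demo.
REWARD_PROB = {"wave": 0.3, "verbal_greeting": 0.8}

def run_bandit(episodes=2000, alpha=0.1, epsilon=0.1):
    q = {a: 0.0 for a in REWARD_PROB}  # estimated value of each action
    for _ in range(episodes):
        # epsilon-greedy: mostly exploit the best-known action, sometimes explore
        if random.random() < epsilon:
            action = random.choice(list(q))
        else:
            action = max(q, key=q.get)
        reward = 1.0 if random.random() < REWARD_PROB[action] else 0.0
        # incremental update from trial-and-error feedback
        q[action] += alpha * (reward - q[action])
    return q

q = run_bandit()
best = max(q, key=q.get)
```

Real social-robotics settings replace the simulated user with a physically embodied robot interacting with people, which is exactly why sample efficiency and safe exploration dominate the design choices the survey categorizes.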
#3 Image Matching across Wide Baselines: From Paper to Practice
Matching two or more views of a scene is at the core of fundamental computer vision problems, including image retrieval, 3D reconstruction, re-localization, and SLAM.
Despite decades of research, image matching remains unsolved in the general, wide-baseline scenario because of many factors that need to be considered, including viewpoint, illumination, occlusions, and camera properties.
This recent research introduces a comprehensive benchmark for local features and robust estimation algorithms. The modular structure of its pipeline makes it easy to integrate, configure, and combine methods and heuristics. Researchers demonstrate this by evaluating dozens of popular algorithms, from seminal works to the cutting edge of machine learning research. They show that classical solutions may still outperform the perceived state of the art with proper settings.
The framework enables researchers to evaluate how a new approach performs in a standardized pipeline, both against its competitors and alongside state-of-the-art solutions for other components, from which it cannot be detached. This is crucial, as sub-optimal hyperparameters can easily hide actual performance.
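The "robust estimation" stage these pipelines evaluate is typically RANSAC-style: fit a geometric model from a minimal sample of correspondences and keep the model most matches agree with. The sketch below shows that principle on a deliberately simple model (a 2D translation with synthetic inlier/outlier matches), not the homography or fundamental-matrix estimators the benchmark actually tests.

```python
import random

random.seed(1)

# Synthetic correspondences: 20 inliers related by a translation, 2 outliers.
true_t = (5.0, -3.0)
inliers = [(float(i), float(2 * i)) for i in range(20)]
matches = [((x, y), (x + true_t[0], y + true_t[1])) for x, y in inliers]
matches += [((0.0, 0.0), (50.0, 40.0)), ((1.0, 1.0), (-30.0, 7.0))]

def ransac_translation(matches, iters=50, thresh=0.5):
    best_t, best_count = None, -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = random.choice(matches)  # minimal sample: 1 match
        t = (x2 - x1, y2 - y1)                       # candidate model
        # count matches consistent with the candidate translation
        count = sum(
            abs(bx - ax - t[0]) < thresh and abs(by - ay - t[1]) < thresh
            for (ax, ay), (bx, by) in matches
        )
        if count > best_count:
            best_t, best_count = t, count
    return best_t, best_count

t, support = ransac_translation(matches)
```

The inlier threshold (`thresh`) and iteration count are exactly the kind of hyperparameters the paper warns about: tuned poorly, they can make a strong method look weak, which is why a standardized pipeline matters.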
#4 SWAGAN: A Style-based WAvelet-driven Generative Model
Considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). However, these networks still suffer from degradation in quality for high-frequency content, stemming from a spectrally biased architecture, and similarly unfavorable loss functions.
In this paper, researchers at Tel Aviv University seek to address this issue by presenting a new general-purpose Style- and WAvelet-based GAN (SWAGAN) that implements progressive generation in the frequency domain.
SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach yields enhancements in the visual quality of the generated images, and considerably increases computational performance.
Scientists demonstrate the advantage of the proposed method by integrating it into the StyleGAN2 framework, and verifying that content generation in the wavelet domain leads to higher quality images with more realistic high-frequency content.
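The wavelet decomposition at the heart of this idea splits a signal into a low-frequency approximation and a high-frequency detail band. Below is a minimal single-level 1D Haar transform to make that split concrete; SWAGAN applies 2D wavelet transforms inside its generator and discriminator, so this is an illustration of the principle, not the paper's implementation.

```python
# Single-level 1D Haar wavelet transform: pairwise averages capture the
# low-frequency content, pairwise differences capture the high-frequency
# content that standard GAN architectures tend to reproduce poorly.
def haar_forward(signal):
    approx = [(a + b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out += [a + d, a - d]  # invertible: no information is lost
    return out

x = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
lo, hi = haar_forward(x)
```

Because the transform is invertible, a network can generate the `lo` and `hi` bands separately and still reconstruct the full signal, which is what lets a frequency-aware generator devote explicit capacity to high-frequency detail.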
#5 An Augmentation Method for Speech Emotion Recognition
Speech emotion recognition (SER) deals with recognizing the perceived emotion of a speaker in a recording. A speaker’s emotion can be categorized into many classes such as angry, happy, sad, neutral, etc.
However, automatically recognizing the emotion is challenging as it depends on many factors such as context and demographics.
This paper proposes CopyPaste, a perceptually motivated augmentation technique, for speech emotion recognition (SER). Researchers suggest that for emotions other than neutral, a speaker is perceived to be expressing that emotion even if it is present only within a short segment of a longer utterance. They present three CopyPaste schemes for model training, and show that the suggested strategies improve SER performance on all the datasets and outperform the standard speech augmentation technique of adding noise to the signal.
The results show that CopyPaste is effective even in noisy test conditions. The best-performing models use both CopyPaste and noise augmentation for training. Although the CopyPaste schemes are presented for the SER task in this research, the fundamental idea can be applied to other utterance-based classification tasks such as language identification and age prediction.
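The core augmentation move can be sketched very simply: concatenate an emotional clip with a neutral one and keep the emotional label, following the paper's perceptual premise. The toy waveforms and function below are illustrative stand-ins; real implementations operate on audio arrays and implement three distinct pasting schemes.

```python
import random

random.seed(0)

# Simplified CopyPaste-style augmentation sketch: the augmented utterance
# inherits the emotional label because listeners perceive the emotion even
# when it occupies only part of the utterance.
def copy_paste(emotional, neutral, emo_label):
    # randomly place the neutral segment before or after the emotional one
    if random.random() < 0.5:
        augmented = neutral + emotional
    else:
        augmented = emotional + neutral
    return augmented, emo_label

angry_clip = [0.9, -0.8, 0.7]          # toy "angry" waveform
neutral_clip = [0.1, -0.1, 0.05, 0.0]  # toy "neutral" waveform
sample, label = copy_paste(angry_clip, neutral_clip, "angry")
```

Each application yields a new labeled training utterance, which is how the technique multiplies the effective size of scarce emotional-speech datasets.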