AI Scholar Weekly - Issue #51

By Educate AI • Issue #51
Google AI Introduces TensorFlow 3D; Which AI Applications will Flourish by 2030; Deep RL Approaches in Social Robots; Towards More Accurate and Practical Motion Capture; The Best Method to Learn ML Fast (and more)

Top AI Research This Week!
#1 TensorFlow 3D: Bringing 3D Deep Learning Capabilities into TensorFlow
According to scientists at Google AI, the use of 3D sensors is growing rapidly. That growth has created a need for scene understanding technology that can process the data these sensors capture.
But there’s a problem. Although computer vision has recently made good progress in 3D scene understanding, challenges remain because few tools and resources can be applied to 3D data.
TensorFlow 3D: To further improve 3D scene understanding and reduce entry barriers for interested researchers, Google has recently introduced TensorFlow 3D (TF 3D), a highly modular and efficient library that is designed to bring 3D deep learning capabilities into TensorFlow. TF 3D provides a set of popular operations, loss functions, data processing tools, models, and metrics that enable the broader research community to develop, train and deploy state-of-the-art 3D scene understanding models. 
Supported datasets: Currently, TF 3D supports the Waymo Open, ScanNet, and Rio datasets. Users can also convert other popular datasets, such as nuScenes and KITTI, into a similar format and use them in the pre-existing or custom-created pipelines, so TF 3D can serve a wide variety of 3D deep learning research and applications.
‘TensorFlow 3D codebase and model has been useful for our 3D computer vision projects, and we hope that you will as well’, researchers say. See GitHub Repository. Read more: 3D Scene Understanding with TensorFlow 3D.
#2 Reinforcement Learning Approaches in Social Robotics
The past decade has seen rapid growth of social robotics in many domains, including therapy, eldercare, entertainment, navigation, healthcare, education, personal robots, rehabilitation, and more.
For researchers interested in using and applying reinforcement learning methods in the field, this paper surveys reinforcement learning approaches in social robotic applications.
Reinforcement learning is a framework for decision-making problems in which an agent interacts with its environment through trial and error to discover an optimal behavior. Since interaction is a key component of both reinforcement learning and social robotics, reinforcement learning can be well suited to real-world interactions with physically embodied social robots. Research in social robotics and human-robot interaction is becoming crucial as more and more robots enter our lives.
The scope of this research is focused on studies that include physical social robots and real-world human-robot interactions with users. Researchers present an exhaustive analysis of reinforcement learning approaches in social robotics. Additionally, they categorize existing reinforcement learning approaches based on the method used, the design of the reward mechanisms, and more. PDF paper link: Reinforcement Learning Approaches in Social Robotics
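The trial-and-error loop described above can be sketched with tabular Q-learning, one classic reinforcement learning algorithm. This is a toy illustration and not a method taken from the survey: the corridor environment, the function names, and every hyperparameter below are assumptions made up for the example.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. The agent starts in state 0 and earns a
    reward of 1.0 only when it reaches the last state.
    Actions: 0 = step left, 1 = step right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Trial and error: explore at random with probability epsilon
            # (or when both actions look equally good), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            if action == 0:
                next_state = max(0, state - 1)
            else:
                next_state = min(goal, state + 1)
            reward = 1.0 if next_state == goal else 0.0
            # Move the estimate toward reward + discounted best next value.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (
                reward + gamma * best_next - q[state][action]
            )
            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties go to 'right')."""
    return [0 if left > right else 1 for left, right in q]


q_table = train_q_learning()
print(greedy_policy(q_table))
```

In a social robot the "environment" would include the human user, and the reward would come from something like engagement or task success. Designing that reward is one of the dimensions along which the survey categorizes existing work.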
#3 Image Matching across Wide Baselines: From Paper to Practice
Matching two or more views of a scene is at the core of fundamental computer vision problems, including image retrieval, 3D reconstruction, re-localization, and SLAM. 
Despite decades of research, image matching remains unsolved in the general, wide-baseline scenario because of many factors that need to be considered, including viewpoint, illumination, occlusions, and camera properties. 
This recent research introduces a comprehensive benchmark for local features and robust estimation algorithms. The modular structure of its pipeline makes it easy to integrate, configure, and combine methods and heuristics. Researchers demonstrate this by evaluating dozens of popular algorithms, from seminal works to the cutting edge of machine learning research. They show that, with proper settings, classical solutions may still outperform the perceived state of the art.
The framework enables researchers to evaluate how a new approach performs in a standardized pipeline, both against its competitors and alongside state-of-the-art solutions for other components, from which it cannot be detached. This is crucial, as sub-optimal hyperparameters can easily hide actual performance.
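To make the hyperparameter point concrete, here is a minimal sketch of one classical matching step: nearest-neighbour descriptor matching filtered by a ratio test. It is an illustration and not code from the benchmark; the function name, the `ratio` default, and the synthetic descriptors are assumptions chosen for the example.

```python
import numpy as np


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs that pass the nearest-neighbour ratio test.

    desc_a and desc_b are 2-D arrays with one descriptor per row.
    desc_b must contain at least two descriptors.
    """
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(desc_a))
    nearest, second = order[:, 0], order[:, 1]
    # Keep a match only if the best candidate is clearly closer
    # than the runner-up.
    keep = dists[rows, nearest] < ratio * dists[rows, second]
    return [(int(i), int(nearest[i])) for i in np.flatnonzero(keep)]


# Synthetic check: desc_a is a shuffled, slightly noisy copy of desc_b.
rng = np.random.default_rng(0)
desc_b = rng.normal(size=(20, 32))
perm = rng.permutation(20)
desc_a = desc_b[perm] + 0.01 * rng.normal(size=(20, 32))
print(len(match_descriptors(desc_a, desc_b)))
```

The `ratio` threshold is the kind of setting the summary refers to: too strict a value discards correct matches, too loose a value passes outliers on to the robust estimator, and either can make a good feature look worse than it is.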
#4 SWAGAN: A Style-based WAvelet-driven Generative Model
Considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). However, these networks still suffer from degradation in quality for high-frequency content, stemming from a spectrally biased architecture, and similarly unfavorable loss functions.
In this paper, researchers at Tel Aviv University address the issue with a new general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain.
SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach yields enhancements in the visual quality of the generated images, and considerably increases computational performance.
Scientists demonstrate the advantage of the proposed method by integrating it into the StyleGAN2 framework and verifying that content generation in the wavelet domain leads to higher-quality images with more realistic high-frequency content.
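For readers new to the wavelet domain, the sketch below shows a single-level 2-D Haar transform, the simplest wavelet decomposition, and its inverse. It is a generic NumPy illustration and not SWAGAN's implementation: which wavelet the paper uses and how the network consumes the bands are not covered in this summary, and the function names are made up for the example.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2-D Haar transform of an array with even height and width.

    Returns four half-resolution bands: a low-frequency approximation and
    three high-frequency detail bands.
    """
    image = np.asarray(image, dtype=float)
    a = image[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # smooth, low-frequency content
    col_detail = (a - b + c - d) / 2.0   # change between neighbouring columns
    row_detail = (a + b - c - d) / 2.0   # change between neighbouring rows
    diag_detail = (a - b - c + d) / 2.0  # diagonal change
    return approx, col_detail, row_detail, diag_detail


def haar_idwt2(approx, col_detail, row_detail, diag_detail):
    """Invert haar_dwt2 and recover the full-resolution array."""
    h, w = approx.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (approx + col_detail + row_detail + diag_detail) / 2.0
    out[0::2, 1::2] = (approx - col_detail + row_detail - diag_detail) / 2.0
    out[1::2, 0::2] = (approx + col_detail - row_detail - diag_detail) / 2.0
    out[1::2, 1::2] = (approx - col_detail - row_detail + diag_detail) / 2.0
    return out


image = np.random.default_rng(0).random((8, 8))
bands = haar_dwt2(image)
print(np.allclose(haar_idwt2(*bands), image))
```

The detail bands isolate exactly the high-frequency content that the summary says GANs struggle with, which is the intuition behind making a generator and discriminator operate on wavelet coefficients directly.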
#5 An Augmentation Method for Speech Emotion Recognition
Speech emotion recognition (SER) deals with recognizing the perceived emotion of a speaker in a recording. A speaker’s emotion can be categorized into many classes such as angry, happy, sad, neutral, etc. 
However, automatically recognizing the emotion is challenging as it depends on many factors such as context and demographics. 
This paper proposes CopyPaste, a perceptually motivated augmentation technique for SER. The researchers suggest that, for emotions other than neutral, a speaker is perceived to be expressing an emotion even if it appears only within a short segment of a longer utterance. They present three CopyPaste schemes for model training and show that the suggested strategies improve SER performance on all the datasets and perform better than the standard speech augmentation technique of adding noise to the signal.
The results show that CopyPaste is effective even in noisy test conditions. The best-performing models use both CopyPaste and noise augmentation for training. Although the CopyPaste schemes are presented for the SER task in this research, the fundamental idea can be applied to other utterance-based classification tasks such as language identification and age prediction.
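The core idea can be sketched in a few lines. The example below is one plausible reading of it, not a reproduction of the paper's three schemes, which this summary does not spell out: an emotional clip is joined to a neutral clip, and the result keeps the emotional label. The function name, the random ordering, and the toy waveforms are assumptions.

```python
import numpy as np


def copy_paste(emotional_wave, neutral_wave, emotional_label, rng):
    """Join an emotional clip and a neutral clip; keep the emotional label.

    The order of the two clips is chosen at random. Both inputs are 1-D
    waveforms that are assumed to share the same sample rate.
    """
    if rng.random() < 0.5:
        mixed = np.concatenate([emotional_wave, neutral_wave])
    else:
        mixed = np.concatenate([neutral_wave, emotional_wave])
    return mixed, emotional_label


rng = np.random.default_rng(0)
angry_clip = rng.normal(size=16000)    # stand-in for 1 s of "angry" speech
neutral_clip = rng.normal(size=8000)   # stand-in for 0.5 s of neutral speech
augmented, label = copy_paste(angry_clip, neutral_clip, "angry", rng)
print(augmented.shape, label)
```

Because the emotional segment now occupies only part of the utterance, a model trained on such examples has to recognize an emotion from a short stretch of a longer recording, which matches the perceptual motivation described above.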
Other Great AI Papers
This approach solves the fundamental challenges that previously confined transformers to low-resolution images by synthesizing images in the megapixel range and outperforming state-of-the-art convolutional approaches. Read more 
AI models are increasingly being used in many real-life applications. While they have been powerful in solving complex decision-making problems, the trustworthiness of these models remains a big obstacle to their broad adoption. This study presents an AI Testing Framework that lets users perform automated testing of black-box AI models through synthetic generation of realistic test cases. Testing Framework for Black-box AI Models
A novel study that provides a thorough summary and discussion of synchronous remote collaboration systems that utilize AR/VR/MR technology. A Survey on Synchronous Augmented, Virtual and Mixed Reality Remote Collaboration Systems
Researchers have designed a simple reinforcement learning agent that, with a specification only of agent state dynamics and a reward function, can operate with some degree of competence in any environment. Simple Agent, Complex Environment
New research that’s an important step towards making motion capture more accurate and practical. It proposes A-NeRF, a fully automatic approach for estimating a volumetric actor model and jointly refining skeleton pose from monocular or multi-view video. It is the first to define NeRF models for extreme and articulated motion and scores high on the Human3.6M benchmark. Read more
AI Resources
The best method to learn ML skills fast ~ On becoming a machine learning engineer
A story about machine learning ~ Free PDF Book on Patterns, Predictions, and Actions 
A better way to build ML — why you should be using Active Learning
Top AI News
BuzzFeed uses AI to create romantic partners in its latest quiz. Read full story
Humana joins with IBM Watson Health on AI tool to provide a better member experience. Read more
Which artificial intelligence applications will flourish by 2030? Read more
AI Scholar Weekly
Thanks for reading. Create a ripple effect by sharing this AI Scholar Newsletter with someone else, and they will also be lit up!
If you have suggestions, comments, or other thoughts, we would love to hear from you. Email me at chris@educateai.org, tweet at @cdossman, like us on Facebook, or connect with me on LinkedIn.
Educate AI

AI Scholar Weekly brings you everything new and exciting in the world of Artificial Intelligence, Machine Learning, and Deep Learning every week for free.
