#1 When Attention Meets Fast Recurrence
Large language models have become increasingly expensive to train because of the computation time and cost they require.
This research shows that by incorporating fast recurrent networks, very little attention computation is needed to achieve both top-performing modeling results and fast training.
The paper presents SRU++, a recurrent unit with optional built-in attention that exhibits state-of-the-art modeling capacity and training efficiency.
The proposed model obtains better perplexity and bits-per-character on standard language modeling benchmarks such as Enwik8 and WikiText-103 while using 2.5x-10x less training time and cost compared to top-performing Transformer models.
The results demonstrate that highly expressive and efficient neural models can be designed around more than just attention: fast recurrence with a small amount of attention can be a leading model architecture.
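To make the speed argument concrete, here is a minimal NumPy sketch of an SRU-style recurrence. It is an illustration, not the paper's implementation: the attention sub-layer of SRU++ is omitted, and the gate parameterization is simplified (the real SRU also feeds the previous cell state into its gates). What it does show is the key property behind SRU's efficiency: all matrix multiplications depend only on the input and can be batched across time, leaving only cheap elementwise operations in the sequential loop.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_layer(x, W, Wf, bf, Wr, br):
    """Simplified SRU-style layer over a sequence x of shape (T, d)."""
    T, d = x.shape
    # Input-dependent projections: computed for all timesteps at once.
    u = x @ W                  # candidate cell inputs
    f = sigmoid(x @ Wf + bf)   # forget gates
    r = sigmoid(x @ Wr + br)   # highway (output) gates
    c = np.zeros(d)
    h = np.zeros((T, d))
    for t in range(T):         # sequential part: elementwise ops only
        c = f[t] * c + (1.0 - f[t]) * u[t]
        h[t] = r[t] * c + (1.0 - r[t]) * x[t]
    return h

rng = np.random.default_rng(0)
T, d = 5, 4
x = rng.normal(size=(T, d))
W, Wf, Wr = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
bf, br = np.zeros(d), np.zeros(d)
h = sru_layer(x, W, Wf, bf, Wr, br)
```

In SRU++, an attention sub-layer replaces part of the input projection, so the expensive sequential attention work is applied sparingly while the recurrence above carries most of the modeling.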
#2 An Opportunity for 3D Computer Vision to Go Beyond Point Clouds With a Full-Waveform LiDAR Dataset
Autonomous vehicles (AVs) can transform how people and goods are transported while improving safety and efficiency. As a contribution to the collective effort of developing safe autonomous vehicles, LeddarTech researchers have published the PixSet dataset.
The PixSet dataset contains 97 sequences, each averaging a few hundred frames, for a total of roughly 29,000 frames. Each frame has been manually annotated with 3D bounding boxes.
What makes this new dataset unique is the use of a flash LiDAR and the inclusion of the full-waveform raw data, in addition to the usual point cloud data. The use of full-waveform data from a flash LiDAR has been shown to improve the performance of segmentation and object detection algorithms in airborne applications, but it has yet to be demonstrated for terrestrial applications such as autonomous driving.
Potential uses and effects: The researchers believe there is great potential to further improve perception algorithms by leveraging the raw full-waveform data provided by the Pixell flash LiDAR. Their work also provides a baseline for 3D object detection performance on the Pixell point clouds. Read more: The First Full-Waveform Flash LiDAR Dataset for Autonomous Vehicle R&D
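To illustrate what "full-waveform" adds over a point cloud, here is a hedged sketch of the classic peak-extraction step that turns a return waveform into discrete range points. Everything here is illustrative: the sample rate, threshold, and synthetic waveform are made up and do not reflect the Pixell sensor's actual format. The point is that a point-cloud LiDAR ships only the extracted peaks, while the full waveform preserves every sample (echo shape, amplitude, multiple returns) for downstream algorithms to exploit.

```python
import numpy as np

C = 299_792_458.0      # speed of light (m/s)
SAMPLE_RATE = 800e6    # hypothetical ADC sample rate (Hz), not Pixell's

def waveform_to_ranges(waveform, threshold):
    """Extract echo peaks from a return waveform and convert each sample
    index (round-trip time of flight) into a range in metres. A point
    cloud keeps only these peaks; the full waveform keeps every sample."""
    ranges = []
    for i in range(1, len(waveform) - 1):
        is_peak = waveform[i] >= waveform[i - 1] and waveform[i] > waveform[i + 1]
        if is_peak and waveform[i] > threshold:
            t = i / SAMPLE_RATE         # round-trip time (s)
            ranges.append(0.5 * C * t)  # halve: light travels out and back
    return ranges

# Synthetic waveform with two echoes (e.g. a pole, then a wall behind it).
wf = np.zeros(256)
wf[40], wf[120] = 1.0, 0.6
ranges = waveform_to_ranges(wf, threshold=0.3)
```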
#3 A Straightforward Framework for Video Retrieval Using CLIP
Video is one of the most consumed forms of media on the internet, and this high consumption calls for effective methods for users to retrieve videos. Currently, most video browsers rely on annotations made by users to identify video contents. Although this solution is simple to implement, it comes at a high price.
To this end, researchers have explored applying the language-image model CLIP to obtain video representations without the need for such annotations. CLIP is a state-of-the-art neural network pre-trained on image-text pairs: it learns a common embedding space in which images and text can be compared directly. Using various techniques described in the paper, the suggested framework obtains state-of-the-art results on the MSR-VTT and MSVD benchmarks.
Potential Uses and Effects: This methodology might help build CLIP-based representations of longer videos. For example, other works have used frame features to construct a graph that can change through time; such a representation could retain the strong text alignment suited to video retrieval. The authors also note that their framework could serve as an expert in a future Mixture-of-Experts (MoE) video retrieval system.
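A minimal sketch of the retrieval idea, with random vectors standing in for CLIP embeddings (running the real model is beyond this illustration, and the paper explores richer aggregation than plain mean pooling): per-frame image embeddings are pooled into a single video vector, and videos are ranked by cosine similarity to the text query embedding.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def video_embedding(frame_embs):
    """Mean-pool per-frame embeddings into one unit-length video vector."""
    return normalize(frame_embs.mean(axis=0))

def retrieve(text_emb, video_embs):
    """Rank videos by cosine similarity to the text query (best first)."""
    sims = video_embs @ normalize(text_emb)
    return np.argsort(-sims)

rng = np.random.default_rng(1)
d = 32
# Stand-ins: 3 videos x 10 frames of d-dim "CLIP" frame embeddings.
frames = normalize(rng.normal(size=(3, 10, d)))
video_embs = np.stack([video_embedding(v) for v in frames])
# Craft a text query close to video 2's content.
text_emb = video_embs[2] + 0.1 * rng.normal(size=d)
ranking = retrieve(text_emb, video_embs)
```

Because CLIP places images and text in the same space, no annotation step is needed: the text query is embedded once and compared against precomputed video vectors.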
#4 Proof That You Can Build Algorithms That Are Robust to Adversarial Attacks
Can we develop provably robust ML models against backdoor attacks? Building machine learning algorithms that are robust to adversarial attacks has been an emerging topic over the last decade in academia and industry.
Recent studies have shown that current defenses are not resilient against intelligent adversaries responding dynamically to the deployed defenses.
One recent, exciting line of research aims to develop provably robust algorithms against evasion attacks, including both deterministic and probabilistic certification approaches to solve the challenge.
Yes: in this paper, a group of researchers present RAB, the first certification process to provide provable robustness for ML models against backdoor attacks. Their technique is inspired by randomized smoothing, originally proposed against evasion attacks.
On the downside: The approach is, in general, computationally expensive, since it requires training a large number of models on the smoothed datasets. To mitigate this, the authors suggest starting from pre-trained classifiers that are assumed to be clean; one could then use RAB training to fine-tune on a smaller dataset and obtain a robustness guarantee.
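The flavor of the smoothing idea can be sketched in a few lines. This is an illustrative analogue, not the paper's RAB procedure: a toy nearest-centroid classifier stands in for a real model, and no actual certificate is computed. Only the structure is shown: train many models, each on a noise-perturbed copy of the training set, then aggregate by majority vote (which is also why the approach is expensive).

```python
import numpy as np

def train_centroid_clf(X, y):
    """Toy stand-in for model training: nearest-class-centroid classifier."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(clf, x):
    classes, centroids = clf
    return int(classes[np.argmin(np.linalg.norm(centroids - x, axis=1))])

def smoothed_predict(X, y, x, sigma=0.5, n_models=51, seed=0):
    """Train many models on noise-perturbed ("smoothed") copies of the
    training set and take a majority vote. In RAB, a large enough vote
    margin is converted into a certificate that no bounded backdoor
    perturbation of the training data can flip the prediction."""
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_models):
        Xn = X + rng.normal(scale=sigma, size=X.shape)  # smooth the dataset
        label = predict(train_centroid_clf(Xn, y), x)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get), votes

# Two well-separated 2-D classes.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
label, votes = smoothed_predict(X, y, x=np.array([4.8, 5.2]))
```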
#5 The Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
Researchers from Facebook AI Research, the University of Freiburg, the Bosch Center for AI, UC Berkeley, and Princeton University propose using automatic hyperparameter optimization (HPO) for model-based RL.
Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL algorithms can be fairly complex due to the separate dynamics modeling and the subsequent planning algorithm, and as a result, they often possess tens of hyperparameters and architectural choices.
For this reason, MBRL typically requires significant human expertise before it can be applied to new problems and domains. The researchers in this paper demonstrate that this problem can be tackled effectively with automated HPO, which yields significantly improved performance compared to configurations tuned by human experts. They further show that dynamically tuning several MBRL hyperparameters during training improves performance beyond static hyperparameters kept fixed for the whole run.
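As a minimal illustration of an HPO loop (not the more sample-efficient optimizers or dynamic schedules used in the paper), here is a random-search sketch over two made-up MBRL hyperparameters, with a synthetic objective standing in for a full training run:

```python
import math
import random

def mbrl_return(lr, horizon):
    """Synthetic stand-in for 'average return after a full MBRL training
    run'; the made-up landscape peaks near lr=1e-3 and horizon=15."""
    return -(math.log10(lr) + 3) ** 2 - 0.01 * (horizon - 15) ** 2

def random_search(n_trials=200, seed=0):
    """Minimal HPO loop: sample configurations, evaluate, keep the best."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {
            "lr": 10 ** rng.uniform(-5, -1),  # log-uniform learning rate
            "horizon": rng.randint(1, 50),    # planning horizon
        }
        score = mbrl_return(**cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search()
```

In real MBRL, each evaluation is an expensive training run over tens of hyperparameters, which is exactly why the paper's automated, sample-efficient HPO (and dynamic tuning during training) pays off over manual expert tuning.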