
Deep Learning Revision - Issue #4

Jean de Dieu Nyandwi
Welcome to the fourth issue of the Deep Learning Revision newsletter. Past issues have each taken a deep dive into a single paper, but let's try a different format this week and, hopefully, in the weeks ahead (I am still learning what's worth reading for you and how best to convey it).
We will cover two recent deep learning papers, two things from the community that I found useful, and one useful open-source project from GitHub. With this format, I hope we can spread more good ideas and news every week!

2 Deep Learning Papers
Imagen - Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Imagen is a new state-of-the-art text-to-image model with a remarkable degree of photorealism and language understanding. Imagen can generate super-high-resolution images from text inputs.
Imagen Architecture
From an architectural point of view, Imagen is simple and straightforward. It consists of a frozen large language model (LLM) used as a text encoder that maps input text into text embeddings, a text-to-image diffusion model that transforms those embeddings into 64x64 images, and two super-resolution models that upscale the image to 256x256 and then to 1024x1024.
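The three-stage cascade can be sketched in a few lines. This is a minimal shape-only sketch, not the real model: the functions below are placeholders standing in for Imagen's frozen text encoder (a T5 model in the paper), its base diffusion model, and its super-resolution models, and the embedding dimension is illustrative.

```python
import numpy as np

def text_encoder(prompt):
    # Frozen LLM maps text to a sequence of embeddings (placeholder shapes).
    return np.zeros((len(prompt.split()), 4096))  # (tokens, embed_dim)

def base_diffusion(text_emb):
    # Text-conditional diffusion model produces a 64x64 RGB image.
    return np.zeros((64, 64, 3))

def super_resolution(image, size):
    # Diffusion super-resolution stage; nearest-neighbor upsampling as a stand-in.
    factor = size // image.shape[0]
    return image.repeat(factor, axis=0).repeat(factor, axis=1)

def imagen_pipeline(prompt):
    emb = text_encoder(prompt)        # text -> embeddings
    img64 = base_diffusion(emb)       # embeddings -> 64x64 image
    img256 = super_resolution(img64, 256)    # 64x64 -> 256x256
    img1024 = super_resolution(img256, 1024) # 256x256 -> 1024x1024
    return img1024

print(imagen_pipeline("a corgi riding a bike").shape)  # (1024, 1024, 3)
```

The point of the sketch is the cascade itself: each stage conditions on the previous one's output, and only the resolutions (64, 256, 1024) are taken from the paper.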
Imagen outperforms essentially all current state-of-the-art image generation models, including the recent DALL·E 2, GLIDE, VQGAN+CLIP, and Latent Diffusion.
Comparison of Imagen with other image generation models
More technical details about Imagen can be found in its paper and on its webpage. Some amazing tweets showcasing images created by Imagen are here, here, and here.
Gato - A Generalist Agent 🐈
Gato is a new generalist agent that can perform a wide range of tasks, such as playing Atari games, captioning images, engaging in dialogue, following instructions, stacking blocks with a real robot arm, and navigating simulated 3D environments.
Gato 🐈
Gato was trained on (and can perform) 604 distinct tasks. Its training dataset contains diverse data such as images, text, button presses, and more. Its architecture is a single large transformer model, a testament to the versatility of the Transformer. Isn't it fascinating that a single model can perform over 600 tasks using the same weights?
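The core trick that lets one transformer handle all of these tasks is serializing every modality (text, image patches, discretized actions) into a single token sequence. Here is a toy sketch of that idea; the vocabulary sizes, offsets, and bin count below are illustrative, not the paper's exact values.

```python
# Illustrative token ranges: text tokens first, then image-patch tokens,
# then discretized action tokens.
TEXT_VOCAB = 32_000      # text tokens occupy [0, 32000)
IMAGE_OFFSET = 32_000    # image-patch tokens follow
ACTION_OFFSET = 33_024   # discretized actions come last

def tokenize_text(token_ids):
    # Text is already a list of integer token IDs.
    return list(token_ids)

def tokenize_image_patches(patch_ids):
    # Image patches are shifted into their own region of the vocabulary.
    return [IMAGE_OFFSET + p for p in patch_ids]

def tokenize_actions(actions, bins=1024):
    # Continuous actions in [0, 1) are discretized into fixed bins.
    return [ACTION_OFFSET + min(int(a * bins), bins - 1) for a in actions]

# One episode step: an observation (image patches + text) followed by actions,
# all flattened into a single sequence a transformer can model autoregressively.
sequence = (
    tokenize_image_patches([3, 17, 42])
    + tokenize_text([101, 205])
    + tokenize_actions([0.25, 0.9])
)
print(sequence)  # [32003, 32017, 32042, 101, 205, 33280, 33945]
```

Because everything is flattened into one vocabulary, the same weights can model an Atari frame, a caption, and a robot action with no task-specific heads.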
More about Gato can be found in its paper and blog. This video provides a great overview of Gato 🐈.
2 Things from the DL Community
History of computer vision contests won by deep CNNs on GPU
A great blog post on the impact of GPUs on the progress of computer vision, and on deep CNNs winning contests long before AlexNet!
PyTorch Training On Apple Silicon Chips
For a long time, it was not possible to train PyTorch models on Apple Silicon chips. Recently, PyTorch announced that this is no longer the case: you can now train PyTorch models on Apple M1 chips. More about the release and Apple's Metal backend here and here. Sebastian Raschka also shared his thoughts about the release on his blog, which I think is an interesting read!
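In practice, using the new backend amounts to selecting the "mps" device (PyTorch's Metal Performance Shaders backend) when it is available. A minimal sketch, which falls back to CPU on machines without an Apple GPU:

```python
import torch

# Use the Apple "mps" backend when available; otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Any model/tensor moved to this device runs on the M1 GPU via Metal.
model = torch.nn.Linear(8, 2).to(device)
x = torch.randn(4, 8, device=device)
out = model(x)
print(out.shape, device)
```

The rest of a training loop is unchanged; only the device string differs from the usual "cuda"/"cpu" choice.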
1 Thing from GitHub
Awesome Computer Vision is a well-curated repository of computer vision resources created by Jia-Bin Huang: books, courses, papers, datasets, pre-trained models, blogs, and more.
Until next week!
If you enjoyed this newsletter, please share it with anyone you think might enjoy reading it!
Jean de Dieu Nyandwi @jeande_d

Trends, ideas, and the latest news in deep learning and computer vision.

Created with Revue by Twitter.