May 9 - Issue #19
An analytical digest of artificial intelligence and machine learning news from the technology industry, research lab and venture capital market.
Reporting from 29th March 2017 through May 9th 2017
I’m Nathan Benaich. Welcome to issue #19 of my AI newsletter! I will synthesise a narrative that analyses and links important news, data, research and startup activity from the AI world. Grab your hot beverage of choice ☕ and enjoy the read! There’s comic relief at the end for those who make it :) A few quick points before we start:
1. I’m excited to launch the 3rd annual Research and Applied AI Summit, a 1-day invite-only, not-for-profit event in London on June 30th exploring the frontiers of AI from the research lab and tech world. Engage with founders and technical leadership from Graphcore, Orbital Insight and more! Request your invite here.
2. Mapillary has released Mapillary Vistas, the world’s largest and most diverse street-level imagery dataset with pixel-accurate, instance-specific annotations! This is big news for researchers and companies working on autonomous mobility as it provides key raw materials to teach agents to navigate our roads. TechCrunch profiles the company here.
3. The 8th edition of London.AI is slated for June 1st at Twitter HQ, featuring Rob Whitehead (CTO, Improbable), Victor Dillard (COO, Desktop Genetics) and Fabio Kuhn (CEO, Vortexa).
🚗 Department of Driverless Cars
Anthony Levandowski has invoked the Fifth Amendment to avoid incriminating himself in court. He then announced to the Uber ATG team that he would recuse himself from all LiDAR-related work and management for the remainder of the litigation. This won’t make much difference if the rest of the team has access to the allegedly stolen Waymo files. Last week saw the last hearing between Uber and Waymo before federal judges decide whether to temporarily shut down Uber’s work on self-driving cars. In the session, a Waymo lawyer argued that Otto was a “clandestine plan” from the get-go; Uber, however, said that they’ve still not found any evidence that any of the 14k files touched Uber servers. What a saga!
Waymo is also taking signups for their early rider program in Phoenix, Arizona. The company is adding 500 self-driving Chrysler Pacifica Hybrid minivans to their existing fleet of 100 vehicles that are already on public roads.
Meanwhile, Apple received a permit
from the DMV to test its self-driving vehicles in California. The company is now the 30th authorised tester
in California and will use three Lexus RX 450h vehicles. Not much more information out there just yet except job ads for: software engineering on Maps Special Projects and computer vision engineers for the Technology Investigation team. This might also be for their mobile AR initiatives!
Mobileye’s co-founder and CTO, Amnon Shashua, delivered a 1hr video talk exploring the machine learning components needed to solve sensing, planning and mapping from the company’s perspective.
Baidu plans to open source much of their self-driving technology stack this July in an effort to “innovate at a higher level” and not “reinvent the wheel”, according to Qi Lu, GM for the company’s Intelligent Driving Group. At this point in the driverless car race, the move doesn’t appear to come from a position of strength.
passed a law allowing the testing of AVs on public roads
provided there is a driver who can take responsibility if required. Furthermore, 15 EU countries are asking the European Commission not to impose data localisation rules as it conducts its two-year review of the Digital Single Market. Such rules would prevent the free flow of data between self-driving cars driving in different countries, thus hampering fleet learning and data network effects.
nuTonomy has signed a deal with Groupe PSA to equip their Peugeot 3008 with self-driving technology developed by the startup. Tests will start from September 2017 in Singapore, where nuTonomy has been trialling its taxi fleet service since last year.
Luminar revealed its work (5 years in the making) on a 1550 nanometer LiDAR system that provides a 200m range vs. the 100m-140m achieved by Velodyne’s 905 nanometer wavelength systems. At 70 mph, another 100m of vision will give a car roughly an extra 3 seconds of reaction time when it sees an obstacle. Here’s a pretty picture of the Luminar output :) Note that Velodyne is working seriously on updated LiDAR designs (termed ‘solid state’) that reduce the form factor and cost, while also improving range.
Luminar's view of the world
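The reaction-time figure quoted above is simple arithmetic: extra seconds equal extra sensing range divided by vehicle speed. Here is a minimal Python sketch of that calculation; the function name and the unit-conversion constant are my own illustrative choices, not anything published by Luminar or Velodyne.

```python
# Metres per second in one mile per hour (1 mile = 1609.344 m, 1 hour = 3600 s).
MPH_TO_MPS = 1609.344 / 3600.0


def extra_reaction_time_s(extra_range_m: float, speed_mph: float) -> float:
    """Seconds gained when a sensor sees `extra_range_m` further ahead at `speed_mph`."""
    return extra_range_m / (speed_mph * MPH_TO_MPS)


# 200m range vs. the 100m end of the quoted 100m-140m band, at 70 mph.
print(round(extra_reaction_time_s(100, 70), 1))  # 3.2
# Against the 140m end of the band the gain is smaller.
print(round(extra_reaction_time_s(60, 70), 1))   # 1.9
```

So the "extra 3 seconds" holds against the shortest-range comparison; against a 140m system the gain is closer to 2 seconds.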
💪 The big boys
Shortly after losing Andrew Ng, Baidu have announced
the opening of a second AI research center in Silicon Valley. This adds 150 scientists to the 200 who are already working at the first site.
have both followed suit, launching their own Silicon Valley AI labs.
In his letter to shareholders
, Jeff Bezos of Amazon
brilliantly articulates what it means to be a Day 1 vs. a Day 2 company. Making the point of embracing external trends, Bezos writes that “machine learning drives our algorithms for demand forecasting, product search ranking, product and deals recommendations, merchandising placements, fraud detection, translations, and much more.”
Buzzfeed ran a profile on the Facebook AI Research
group led by Yann LeCun in NYC. It covers some history of deep learning, the founding of FAIR and its goals, as well as windows into the focus areas for the team. The group also announced an updated release of fastText
, its library for text classification, in 294 languages and with a reduced memory footprint to optimize running on small memory devices. Other areas of research interest include unsupervised learning and predicting future frames in video. On the subject of video prediction, check out the neat results from this research
by Google/Adobe/Michigan that focuses a neural network on human pose to make long-term predictions about motion.
Google published a detailed research paper
that evaluates the performance of their tensor processing unit (TPU), their custom ASIC, operating in a data center environment. A blog post
highlights how inference of different neural networks (CNNs, MLPs and LSTMs) running on a TPU is 15x faster and 30x more power efficient (TOPS/Watt) than when running on an NVIDIA K80 GPU in the same datacenter. The TPU has 25x as many multiplier-accumulator units for matrix multiplication - the core computational operation of the chip - and 3.5x as much on-chip memory as the K80 GPU. Google has used TPUs since 2015 for web and image search, Google Photos and Cloud Vision API, Translate and AlphaGo. The project started in 2014 and was running in a datacenter 15 months later.
🏦 Financial services
Actively managed funds are experiencing large scale withdrawals
due to their fees and largely mediocre performance. Instead, investors are moving capital into passively managed peers that rely on systematic trading models. BlackRock, a $5.4 trillion asset management company, has targeted $30bn in assets to refocus on quantitative strategies. Laurence Fink, CEO, said “We have to change the ecosystem - that means relying more on big data, artificial intelligence, factors and models within quant and traditional investment strategies.” At least 36 managers associated with these funds are leaving the firm as a result.
A report by consultancy Opimas suggests
that capital markets teams could spend $1.5bn on AI technologies in 2017, growing to $2.8bn by 2021. This will result, the report states, in the loss of 230,000 jobs by 2025, of which 90,000 will be from asset managers.
WorldQuant, a $5bn systematic trading firm, shares the strategy behind its “Alpha Factory”, a distributed community of data scourers and quant modellers. These include the firm’s full-time employees but also amateur quants from around the world who access data from the WebSim portal to extract signals from which trading strategies are generated.
📚 Policy and governance
Many think tanks and governmental organisations are producing reports on the impacts of software and machine learning on the workforce. However, there appears to be a dearth of data
to quantitatively measure this impact. For example, how do we track progress in the capabilities of various AI techniques and the use cases they affect? How are different demographics changing their skills and how is the temporary workforce evolving? We need a dedicated information infrastructure to properly inform policy decisions.
The Economist runs a briefing on how data is giving rise to a new economy
. It explores the question of how to value data, the challenges with pricing it for external consumption, the resulting lack of data exchanges and incentives to simply buy entire data-creating companies altogether.
For more on the practical business implications of AI, come attend a three-part discussion forum
at Oxford’s Saïd Business School on May 11th, May 18th (I’m at this one!) and June 1st in which attendees are the participants. The events are moderated by Kenneth Cukier of The Economist, coauthor of the NYT Bestseller “Big Data”. Sign up here.
Historian Yuval Noah Harari makes the point in a TED piece that the AI revolution will create a new unworking class
. His point boils down to this: as a civilisation, we have professionalised our roles and tasks over time such that machines can now displace human labor only by recapitulating these specific capabilities. While there are certainly many situations where domain-focused AI can trump human performance, the real world requires artificial agents to exhibit a generalised learning ability that we don’t currently have the tools to create.
On the same topic, the Pew Research Center published a study on the future of jobs and jobs training.
In total, 1,408 respondents commented on whether they see the emergence of new educational and training programs that can successfully train a large number of workers in the skills they need for jobs in 10 years’ time. Some 70% were mostly optimistic about the future, while others believe that capitalism itself is in real trouble.
A concern for AI developers is the fear of propagating biases that are embedded within the training data from which a system learns. A study in Science
shows how a language model trained on a corpus of human-generated text does adopt, rather predictably, similar stereotypes. Paper here
🆕 New initiatives and frontiers for AI
In an interview with Jack Clark, Hinton implores the research community to push further into neuroscience
to draw inspiration for building AI. He points to mechanisms for long-term memory, reasoning and focused attention as areas for exploration.
A study in Science
has uncovered new fundamentals for how memories are formed
and mature over time (research paper here
). It was previously believed that memories are first created in the hippocampus (short-term) and subsequently transferred to the neocortex for long-term storage. Here, the authors show that episodic memories are created both in the hippocampus and neocortex. In the longer term, cells responsible for these memories in the hippocampus become silent while those in the neocortex retain their activity.
Generative adversarial networks
(GANs) are all the rage in the AI community (see: GAN zoo for a running list
). While discriminative models are used to separate input data (e.g. via classification), generative models can learn to create new examples of the input data they are trained on (e.g. images). A WIRED feature on Ian Goodfellow
, who began working on GANs in 2014, explains how GANs work and their potential impact. The adversarial learning framework works by having one neural network (the generator) tasked with generating a target data type (e.g. images) and another neural network (the discriminator) tasked with telling the real from the fake among the outputs. Over time, the generator learns to fool the discriminator and thus reproduces the inherent structure we see in the real world.
To be truly useful in the real world, AI systems must do more than achieve state-of-the-art performance on a specific task within one domain. They must generalise to new problems without needing to be retrained entirely. To that end, transfer learning
offers an approach to leveraging labelled data and knowledge learned from one source domain/task in order to solve a related target domain/task. Sebastian Ruder shares a detailed post
on the what, why, how for transfer learning.
Wait But Why are back at it with a new piece
on Elon Musk’s Neuralink. It’s a chunky one!
The Toyota Research Institute
(TRI) has committed $35m
over four years to research machine learning applications to materials science. TRI is working with Stanford, MIT, Michigan and others to develop new models and materials for batteries and fuel cells.
too plans significant investments
into AI, albeit to the tune of $1.2bn over the next four years. This represents 20% of the company’s total annual R&D expenditure by March 2021.
Here’s a selection of impactful work that caught my eye:
Tacotron: Towards end-to-end speech synthesis
, Google Research
. The speech recognition and transcription world has moved from multi-step pipelines to end-to-end learning. This means that engineers no longer need to create domain-specific feature extractors, acoustic and language models to take audio and turn it into text. Here, the authors present an end-to-end generative model that does the reverse: it synthesizes speech directly from characters. The sequence-to-sequence RNN model with attention mechanisms is trained entirely from scratch starting with <text, audio> pairs. One of the challenges is that the same text can correspond to different pronunciations and speaking styles, not least because of punctuation. In this work, Tacotron is trained on 26.4 hours of North American English speech produced by a professional female speaker. Listen to the samples here
, which demonstrate the model’s ability to be sensitive to punctuation, robust to spelling mistakes, learn stress and intonation and work well on complex words.
A deep compositional framework for human-like language acquisition in virtual environment
, Baidu Research
. In this work, the authors explore a path to generalising the ability of agents to navigate 2D environments by learning from textual instructions. The core thesis is that if an agent’s perceptual experience can be grounded in language, the agent will be able to learn new tasks through instruction with little difficulty. To that end, an agent in the XWORLD maze environment perceives raw-pixel frames and must take action according to a language command given by a teacher to achieve a set of rewards (e.g. “Please move to the west of the watermelon”). During training, the reinforcement learning agent simultaneously learns the visual representations of the environment, the syntax and semantics of the language and the action module that outputs actions. As a result, the agent exhibits a zero-shot navigation ability, executing previously unseen commands using its existing knowledge encoded by language. Neat! Blog post here.
Unpaired image-to-image translation using cycle-consistent adversarial networks
, UC Berkeley
. The problem of image-to-image translation is one of learning a function that maps image 1 (e.g. horse) to image 2 (e.g. zebra) according to a training set of aligned image pairs. Unsurprisingly, however, there’s not a wealth of paired training data for this task. In this paper, the authors learn functions to map images from domain X into domain Y and use adversarial training to produce believable outputs. To ensure high quality results, they combine this process with cycle consistency - the idea that X can map to Y but must also be able to map back to X without loss. Check out these results
where you can see a Monet painting transformed into a real-looking photograph, winter scenes turned to summer scenes, iPhone photos looking like DSLR photos with shallower depth of field, and more! This is a great example of what new features could come into Photoshop thanks to generative models.
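To make the cycle consistency idea concrete, here is a minimal Python sketch under strong simplifying assumptions: "images" are short lists of floats, and the two mappings (called `G` for X to Y and `F` for Y back to X here) are hand-written toy functions rather than trained networks. The L1 penalty and all names are illustrative choices on my part, and the sketch leaves out the adversarial term entirely.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]
Mapping = Callable[[Image], List[float]]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized 'images'."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Mapping, F: Mapping,
                           xs: Sequence[Image], ys: Sequence[Image]) -> float:
    """Penalise x -> G(x) -> F(G(x)) and y -> F(y) -> G(F(y)) for not returning home."""
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Toy mappings: G doubles every pixel, so a perfect inverse halves it.
def G(x: Image) -> List[float]:
    return [2.0 * v for v in x]


def F_exact(y: Image) -> List[float]:
    return [0.5 * v for v in y]


def F_lossy(y: Image) -> List[float]:
    return [0.4 * v for v in y]


xs = [[1.0, 2.0], [3.0, 4.0]]   # samples from domain X
ys = [[2.0, 4.0], [6.0, 8.0]]   # unpaired samples from domain Y

print(cycle_consistency_loss(G, F_exact, xs, ys))      # 0.0, the round trip is lossless
print(cycle_consistency_loss(G, F_lossy, xs, ys) > 0)  # True, the round trip loses information
```

In training, a term like this would be added to the adversarial losses so that the two mappings stay approximate inverses of each other even without paired examples.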
Federated learning: Collaborative machine learning without centralized training data, Google Research
. Modern cloud-based machine learning services are trained from user data stored in a centralised virtual environment. In this paper, the authors introduce Federated Learning, an approach for a user’s interactions with mobile services to collaboratively inform a shared predictive model across all devices in the federation. Here, a device downloads a machine learning model, updates its parameters by learning from data on the device (re-training occurs on the phone) and shares a summary of updates to the cloud using encrypted communication. That update is propagated to all devices using the shared model. To make training tractable on the device, the authors use a Federated Averaging algorithm (paper
) that combines local stochastic gradient descent (the workhorse for minimizing objective functions) with a server that performs model averaging across devices. This work is implemented in Gboard on Android where user interactions with suggested queries are fed back to a global model to improve predictions as a function of user context.
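The loop described above (download the model, train locally, send an update, average on the server) can be sketched in a few lines of Python. This is a toy under stated assumptions, not Google's implementation: the model is a single weight `w` fitting y ≈ w·x, each "device" holds a handful of made-up data points, devices send back their full updated weight rather than a compressed or encrypted summary, and the server averages weights in proportion to how many examples each device holds.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the device


def local_sgd(w: float, data: Dataset, lr: float = 0.05, epochs: int = 5) -> float:
    """On-device step: a few epochs of SGD on squared error for y ≈ w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w


def federated_averaging(clients: List[Dataset], rounds: int = 20, w0: float = 0.0) -> float:
    """Server loop: broadcast w, collect locally trained weights, average by data size."""
    w = w0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        updates = [(local_sgd(w, data), len(data)) for data in clients]
        w = sum(w_k * n_k for w_k, n_k in updates) / total
    return w


# Hypothetical devices whose private data all follow y = 3x.
client_xs = [[0.5, -1.0], [1.0, 2.0, -0.5], [0.25, 1.5, -2.0, 1.0]]
clients = [[(x, 3.0 * x) for x in xs] for xs in client_xs]

w = federated_averaging(clients)
print(abs(w - 3.0) < 1e-6)  # True, the shared model recovers the slope
```

The raw (x, y) pairs are only ever read inside `local_sgd`; the server sees weights and example counts, which is the property that makes the approach attractive for on-device data.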
Universal adversarial perturbations against semantic image segmentation
, Bosch Center for AI and University of Freiburg
. This paper presents an approach for generating universal adversarial perturbations to input camera images that strip a trained neural network of its ability to predict the correct semantic segmentation labels. Scarily, they find a universal noise filter that can remove all pedestrians from a street crossing segmentation while leaving the remainder of the segmentation mostly unchanged. This has implications for hacking! In a separate paper
by the University of Washington, the same concept of noise upsetting the target label predicted is found, this time with Google’s Cloud Vision API.
Predicting deeper into the future of semantic segmentation
, Facebook AI Research, NYU and Université de Grenoble
. As we’ve seen at the start of the newsletter with Mapillary, semantic segmentation of street level scenes is a key component to solving perception and planning for autonomous vehicles. Prior work has focused on predicting future pixels on the basis of past and current pixels in order to learn a model of the world. This work, however, goes a step further and tasks a model with predicting segmentation maps of not yet observed video frames that lie up to a second or further in the future given a sequence of known segmented video frames. They present a CNN architecture that leverages the recurrence structure of a sequence of segmented frames to predict a single time step into the future and use the most recent outputs to predict the next step. The gradients are then back-propagated through time to inform the next prediction, thus forming an autoregressive process. Very cool and useful for automating segmentations but also for modeling the future behaviour of entities in street scenes.
Quick mentions of other great work:
Recurrent environment simulators
. Achieving long-term planning behaviour is required for many tasks that unfold over time. This work introduces recurrent neural networks that are able to make temporally and spatially coherent predictions for hundreds of time-steps into the future within virtual environments.
Low data drug discovery with one-shot learning
, Stanford University and MIT.
This paper introduces a neural architecture, the iterative refinement LSTM module, for low data learning for drug discovery. The network is also shown to be capable of broader generalization through transfer learning.
Learning from demonstrations for real world reinforcement learning
. Deep RL agents are incredibly data hungry, which works in simulation environments, but limits their utility outside simulators. The authors present an algorithm that allows an agent to leverage data from previous control of a system to accelerate its learning rate.
MIT EmTech Digital released the talk videos
from their most recent event in SF. These include Ilya from OpenAI, Prof. X of AutoX (incredible life story!), Lior from Otto and Ruslan of Apple/CMU.
Sergey Levine delivers a fantastic overview
talk on deep robotic learning at CMU robotics.
The provider of high-resolution Earth imagery, data and analysis, DigitalGlobe, released DeepCore
, a framework enabling developers to train and run ML models for processing, detecting and classifying objects in satellite imagery (w/OpenSpaceNet
). They detail how these tools were used to place 4th
in a recent Kaggle competition on feature detection from satellite imagery.
DeepMind open source Sonnet
, a library built on top of TensorFlow for building complex neural networks.
Curious what food Americans eat? Instacart have open sourced a dataset of 3 million grocery orders
collected through their service.
86 deals (60% US and 28% EU) totalling $834m (53% US and 7% EU).
Domino Data Lab
, a collaborative enterprise data science and quantitative research platform, raised a $27m Series C round
led by Coatue Management. The company mentions customers in pharma, insurance, advanced manufacturing and internet. This round values the company at $113m.
Uptake, a predictive analytics SaaS platform that delivers increased productivity, reliability and safety to industrial applications, raised a $90m Series C round valuing the company at $2bn. Founded in 2014, Uptake now has over 700 employees and has signed a deployment deal with Berkshire Hathaway’s wind power fleet energy subsidiaries to reduce operational and maintenance costs.
, which develops a portable computer vision-based blood diagnostic device, raised a $3.5m seed round from Sequoia Capital. Check this blog post about how they use CNNs to classify blood cell types.
, a stealth semiconductor startup founded by a few authors of Google’s TPU chip, raised a $10m Seed round
led by Social+Capital. No further details went live :)
5 acquisitions, including:
DataRPM, a provider of predictive maintenance software for industrial IoT that runs machine learning experiments, detects anomalies and identifies causal factors and predictors, was acquired for $20m by Progress Software (NASDAQ: PRGS). DataRPM was founded in 2012, employed 53 FTE and raised $6.17m.
ViDi, a computer vision-based solution for industrial supply chain quality assurance, was acquired by Cognex (NASDAQ: CGNX) for an undisclosed amount. ViDi was founded in 2012 in Switzerland.
xPerception, a 3D space position tracking software and hardware product with applications to robotics and virtual reality, was acquired by Baidu (NASDAQ: BIDU) for an undisclosed sum. xPerception was founded in the US in 2016.
Congratulations on reaching the end of issue #19! Here’s a prize :)
Anything else catch your eye? Comments, questions? Just hit reply!
Carefully curated by Nathan Benaich with Revue