- Last week DeepMind released StreetLearn (NeurIPS paper). Navigating through an unstructured environment is a basic capability of intelligent, sentient beings, humans among them. We can do this because we have a sense not only of where we are now but also of where we want to get to, and we use visual cues to refine our understanding of space and direction. Machines, to date, have largely been limited to navigating with coordinates, pre-defined routes, or positioning systems and localisation.
What DeepMind is asking the agent to do is to wayfind to a specific location using only the dataset of Street View images (the street-level photos you see when you use Google Maps). This is the equivalent of a human travelling through a city and memorising its streets, locations, and landmarks using only non-written visual cues.
Google’s DeepMind has done this by building a neural network agent that takes images observed from the environment as input and predicts the next action it should take in that environment. The agent is then trained end-to-end using deep reinforcement learning. The result is an agent that can navigate through a city without ever building a map, knowing only where visual features and landmarks lie.
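To make the "observation in, action out, trained by reinforcement" loop concrete, here is a toy sketch. It is not DeepMind's agent or the StreetLearn environment: the six-node line world, the one-hot "image" features, the single linear layer, and the REINFORCE update are all assumptions chosen to keep the example small, and every name in it is hypothetical.

```python
import math
import random

N_NODES = 6          # hypothetical toy "street": nodes 0..5 in a line
GOAL = N_NODES - 1   # the agent must reach the last node
ACTIONS = (-1, +1)   # move left or move right
GAMMA = 0.9          # discount factor, so shorter routes earn more


def observe(pos):
    """Stand-in for an image: a one-hot feature vector for the current node."""
    return [1.0 if i == pos else 0.0 for i in range(N_NODES)]


def policy(weights, obs):
    """Linear layer plus softmax: observation features -> action probabilities."""
    logits = [sum(w * x for w, x in zip(row, obs)) for row in weights]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(weights, rng, max_steps=50):
    """Roll out the current policy from node 0 and record every step."""
    pos, steps = 0, []
    for _ in range(max_steps):
        obs = observe(pos)
        probs = policy(weights, obs)
        action = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        pos = min(max(pos + ACTIONS[action], 0), N_NODES - 1)
        reward = 1.0 if pos == GOAL else 0.0
        steps.append((obs, probs, action, reward))
        if pos == GOAL:
            break
    return steps


def train(episodes=2000, lr=0.2, seed=0):
    """REINFORCE: raise the log-probability of each action in proportion to
    the discounted return that followed it."""
    rng = random.Random(seed)
    weights = [[0.0] * N_NODES for _ in ACTIONS]
    returns = []
    for _ in range(episodes):
        steps = run_episode(weights, rng)
        g = 0.0
        for obs, probs, action, reward in reversed(steps):
            g = reward + GAMMA * g
            for a in range(len(ACTIONS)):
                # Gradient of log softmax w.r.t. logit a.
                grad = (1.0 if a == action else 0.0) - probs[a]
                for i, x in enumerate(obs):
                    weights[a][i] += lr * g * grad * x
        returns.append(g)  # discounted return from the start node
    return weights, returns
```

Calling `train()` returns the learned weights and the per-episode returns; comparing the average return of early and late episodes shows whether the policy is improving. The agent never receives a map or coordinates, only the feature vector for where it currently stands, which is the point the paragraph above makes at a much larger scale.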