Picking up an object is a deceptively hard problem for a robot. Being able to grasp many different types of things in rapid succession? That’s a major challenge.
In this week's interview, we chat with Derik Pridmore, co-founder and CEO of Osaro, to learn how they are using machine learning to enable even commodity hardware to manipulate a variety of objects intelligently and efficiently.
What is the core idea behind Osaro?
Let’s do a thought experiment with two robots. One has a very sophisticated articulated hand, but its software is limited. The other only has a simple suction cup, but it’s paired with a human level of intelligence. Which robot would be more useful? In many cases, we think it’s the latter. IQ is more important than the end effector.
Many complex end effectors don’t add value. Fast, accurate scene understanding and robust planning can solve lots of problems with a simple suction cup end effector. If you need a robot to last for tens of millions of cycles, you need its hardware to be simple.
When you think about a person controlling that second robot as if they were playing a video game, it's easy to picture them doing it with only an RGB camera. It stands to reason that it's at least possible to solve the problem that way. That's why we've forced ourselves to use low-cost sensors and to make our algorithms perform just as well.
That, in a nutshell, is what we are trying to do.
That makes sense conceptually, but what is your technical approach for reaching that level of performance?
The key is to have the right toolbox of Deep Learning (DL), Reinforcement Learning (RL), and classical Machine Learning (ML). We apply these different types of learning in two major ways: in perception, or how the robot understands the world, and in controls, or how the robot plans and moves within the world. Perception has been the domain of DL, while RL and ML play bigger roles in controls.
On the DL side, we have a model zoo designed for picking at scale with various types of end effectors. We also use a simulation platform and purpose-built data storage and analytics to train, validate, and deploy reliable picking systems that operate in real-world conditions. On the RL side, we’ve leveraged imitation learning using visual and force data.
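To make the imitation-learning idea concrete, here is a minimal behavioral-cloning sketch: fit a policy that maps observations to demonstrated actions. The shapes, the synthetic data, and the linear model are all illustrative stand-ins (a real system would use a neural network over images and force readings), not Osaro's actual stack.

```python
import numpy as np

# Behavioral cloning in its simplest form: learn a mapping from
# observations to the actions a demonstrator took in those states.
rng = np.random.default_rng(0)

# Illustrative demonstration data: 200 steps of an 8-dim observation
# (think flattened visual + force features) and a 3-dim action
# (think a target x, y, z for the end effector).
obs = rng.normal(size=(200, 8))
true_policy = rng.normal(size=(8, 3))
actions = obs @ true_policy + 0.01 * rng.normal(size=(200, 3))

# Fit a linear policy by least squares -- the stand-in here for
# supervised training of a policy network on demonstrations.
learned_policy, *_ = np.linalg.lstsq(obs, actions, rcond=None)

# At run time, a new observation is mapped directly to an action.
new_obs = rng.normal(size=(8,))
predicted_action = new_obs @ learned_policy
```

The appeal of this framing is that it turns control into a supervised-learning problem: demonstrations stand in for reward signals, which sidesteps much of the exploration difficulty that makes full RL hard to train.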
However, as you approach the limits of what’s currently possible, perception and control become more tightly coupled. Deep Reinforcement Learning (DRL) promises to combine the two, but those models are still very difficult to train.
We’ve seen a lot of papers and demos around reinforcement and deep reinforcement learning in robotics. What’s your take?
In many of those papers, researchers are working in a simulation with idealized systems that give them total access to ground truth. They have a perfect representation of the exact shape and position of every object, the position of every robot joint, perfectly timed data, consistent environmental conditions, and so forth.
Those are totally unrealistic assumptions. They’re not tailoring the algorithm to use only information one can realistically get from sensors. They don’t address a lot of real-world problems, such as glare, strange surfaces, or deformable objects.
It’s very tricky to make an algorithm that has only been trained in simulation work in the real world.
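One common way to attack that sim-to-real gap is domain randomization: rather than training against a single idealized simulator, sample physical and visual parameters each episode so the real world falls inside the training distribution. This is a generic technique, not a claim about Osaro's internal approach, and the parameter names and ranges below are illustrative.

```python
import random

def sample_sim_params(rng: random.Random) -> dict:
    """Sample one episode's worth of randomized simulator settings."""
    return {
        "camera_glare": rng.uniform(0.0, 0.5),       # simulated lens glare
        "object_friction": rng.uniform(0.3, 1.2),    # varied surfaces
        "sensor_latency_ms": rng.uniform(0.0, 50.0), # imperfect timing
        "object_softness": rng.uniform(0.0, 1.0),    # deformable objects
    }

rng = random.Random(42)

# Train across a wide spread of conditions instead of one perfect world.
episodes = [sample_sim_params(rng) for _ in range(1000)]
```

A policy that only ever saw one perfect simulator tends to overfit to it; a policy trained across this kind of spread has a better chance of treating a real camera with real glare as just another variation.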
How does all of that tie into Osaro’s business strategy?
We’re looking at situations in which human-level intelligence on every robot would enable full automation. That happens in a few different areas, such as e-commerce picking for distribution and manufacturing for the food and auto industries.
Some people are trying to unlock full automation with their own hardware and software. They’re trying to be “full-stack.” The problem is that it’s hard to convince a large company to buy a startup’s hardware stack. They already have great hardware providers.
We'd rather play to our strengths and focus on the algorithmic part of the problem. That's why we're focused on selling an integrated vision and control software package that fully automates existing industrial robots at scale. We're starting with things that are viable now and building forward-looking applications based on where the technology is heading.