Think ahead: Robots anticipate human actions

A robot projects possible trajectories for the bowl.
Provided/Saxena Lab
Seeing a person carrying a bowl toward the refrigerator, a robot identifies the objects in the scene. Knowing that bowls are storable and refrigerators are places to store things, it projects possible trajectories for the bowl, and decides to open the refrigerator door.

A robot in Cornell’s Personal Robotics Lab has learned to foresee human action and adjust accordingly.

The robot was programmed to refill a person’s cup when it was nearly empty. To do this the robot must plan its movements in advance and then follow the plan. But if a human sitting at the table happens to raise the cup and drink from it,  the robot might pour a drink into a cup that isn’t there. But when the robot sees the human reaching for the cop, it can anticipate the human action and avoid making a mistake. In another test, the robot observed a human carrying an object toward a refrigerator and helpfully opened the refrigerator door.

Hema S. Koppula, Cornell graduate student in computer science, and Ashutosh Saxena, assistant professor of computer science, will describe their work at International Conference of Machine Learning, June 18-21 in Atlanta, and the Robotics: Science and Systems conference June 24-28 in Berlin, Germany.

From a database of 120 3-D videos of people performing common household activities, the robot has been trained to identify human activities by tracking the movements of the body – reduced to a symbolic skeleton for easy calculation – breaking them down into sub-activities like reaching, carrying, pouring or drinking, and to associate the activities with objects. Since each person performs tasks a little differently, the robot can build a model that is general enough to match new events.

“We extract the general principles of how people behave,” said Saxena. “Drinking coffee is a big activity, but there are several parts to it.“ The robot builds a “vocabulary” of such small parts that it can put together in various ways to recognize a variety of big activities, he explained.

Observing a new scene with its Microsoft Kinnect 3-D camera, the robot identifies the activities it sees, considers what uses are possible with the objects in the scene and how those uses fit with the activities; it then generates a set of possible continuations into the future – such as eating, drinking, cleaning, putting away – and finally chooses the most probable. As the action continues, it constantly updates and refines its predictions.

In tests, the robot made correct predictions 82 percent of the time when looking one second into the future, 71 percent correct for three seconds and 57 percent correct for 10 seconds. The robot also was more accurate in identifying current actions when it was also running the anticipation algorithm.

“Even though humans are predictable, they are only predictable part of the time,” Saxena said. “The future would be to figure out how the robot plans its action.  Right now we are almost hard-coding the responses, but there should be a way for the robot to learn how to respond.”

The research was supported by the U.S. Army Research Office, the Alfred E. Sloan Foundation and Microsoft.

Media Contact

Syl Kacapyr