RT-2 is the new version of what the company calls its vision-language-action (VLA) model. The model teaches robots to better recognize visual and language patterns so they can interpret instructions and infer which objects best fit a request.
Researchers tested RT-2 with a robotic arm in an office kitchen setting, asking the arm to decide what would make a good improvised hammer (it chose a rock) and to pick a drink for an exhausted person (a Red Bull). They also told the robot to move a…