Researchers Have Taught Machines How to Follow Lego Instruction Manuals

Lego’s enduring appeal comes not from the complexity of the sets, nor from the adorable minifigure versions of pop culture icons, but from the building process itself: turning a box of seemingly random parts into a complete model. It’s a satisfying experience, and one that robots could one day steal from you, thanks to researchers at Stanford University.

Lego instruction manuals are a masterclass in how to visually convey an assembly process to any builder, regardless of their background, experience level, or the language they speak. Pay close attention to the parts required and the differences between one picture of the partially assembled model and the next, and you can figure out where every part needs to go before moving on to the next step. Lego has refined and polished the design of its instruction manuals over the years, but while they’re easy for humans to follow, machines are only just learning how to interpret these step-by-step guides.

One of the biggest challenges in teaching a machine to build with Lego is interpreting the two-dimensional images of 3D models in traditional printed instruction manuals (although many Lego models can now be assembled via the company’s mobile app, which provides full 3D models of each step that can be rotated and examined from any angle). Humans can look at an image of a Lego brick and instantly determine its 3D structure in order to find it in a pile of bricks, but for robots to do the same, Stanford University researchers had to develop a new learning-based framework they call the Manual-to-Executable-Plan network, or MEPNet for short, as detailed in a recently published paper.
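To make that idea concrete, here’s a minimal sketch in Python of what “translating a manual page into an executable plan” might look like as an interface. Everything here, including the `AssemblyAction` class, the `parse_step` function, and the stud-grid coordinates, is hypothetical scaffolding for illustration, not the authors’ actual code.

```python
# Hypothetical sketch of a manual-to-plan interface; not MEPNet's real API.
from dataclasses import dataclass

@dataclass
class AssemblyAction:
    part_id: str       # which brick this step adds, e.g. "brick_2x4"
    position: tuple    # (x, y, z) placement, in stud units
    rotation_deg: int  # orientation about the vertical axis: 0/90/180/270

def parse_step(step_image, model_so_far):
    """Translate one 2D manual page into machine-executable actions.

    A MEPNet-style parser would (1) identify the new parts drawn on the
    page, (2) recover the 3D pose of the semi-assembled model, and
    (3) infer where each new part attaches. The body here is a stub that
    only shows the shape of the output.
    """
    return [AssemblyAction(part_id="brick_2x4", position=(0, 0, 1), rotation_deg=0)]
```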

Not only does the neural network have to infer the shape and 3D structure of the individual parts identified in the manual at each step, it also needs to interpret the overall shape of the semi-assembled model shown at each step, regardless of its orientation. Depending on where a piece needs to be added, Lego manuals will often depict the semi-assembled model from a completely different perspective than the previous step. The MEPNet framework has to figure out what it’s seeing and how that corresponds to the 3D model it has already generated, as illustrated in the steps above.
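One way to picture the orientation problem: manual drawings mostly show models in axis-aligned poses, so a brute-force matcher could try each of the 24 right-angle rotations and keep the one whose projection best lines up with the drawing. The sketch below is an illustrative stand-in for MEPNet’s learned pose estimation, not the paper’s method; `best_orientation` and its inputs are invented names.

```python
# Illustrative brute-force pose search; MEPNet uses a learned model instead.
from itertools import permutations, product
import numpy as np

def axis_aligned_rotations():
    """All 24 proper rotations that map the coordinate axes onto themselves."""
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = np.zeros((3, 3))
            for row, (col, sign) in enumerate(zip(perm, signs)):
                m[row, col] = sign
            if np.isclose(np.linalg.det(m), 1.0):  # keep rotations, drop reflections
                mats.append(m)
    return mats

def reprojection_error(rot, model_pts, view_pts_2d):
    """Mean distance from the rotated, projected model to points in the drawing."""
    proj = (model_pts @ rot.T)[:, :2]  # rotate, then orthographic projection
    dists = np.linalg.norm(proj[:, None, :] - view_pts_2d[None, :, :], axis=-1)
    return dists.min(axis=1).mean()    # naive nearest-neighbor matching

def best_orientation(model_pts, view_pts_2d):
    """Pick the right-angle rotation that best explains the manual's viewpoint."""
    return min(axis_aligned_rotations(),
               key=lambda rot: reprojection_error(rot, model_pts, view_pts_2d))
```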

Screenshot: Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, and Jiajun Wu

The framework must then determine where the new parts at each step fit into the previously generated 3D model by comparing each new iteration of the semi-assembled model to the previous one. Lego manuals don’t use arrows to indicate where parts go; at most they use a slightly different color to show where new pieces should be placed, which may be too subtle to detect on a scanned image of a printed page. The MEPNet framework has to figure this out on its own, but the process is made a bit easier by a feature unique to Lego bricks: the studs on top and the tubes on the bottom that let the pieces snap securely together. MEPNet understands the positional constraints on how Lego bricks can be stacked and attached based on the location of the studs on a piece, which helps narrow down where in the semi-assembled model each new part can go.
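The stud constraint is easy to illustrate with a toy example. In the sketch below (the grid representation and names are our own invention, not the paper’s), a new brick can only land where its entire footprint lines up with exposed studs on the model built so far, which cuts the space of candidate placements dramatically.

```python
# Toy illustration of the stud constraint; the grid model is invented here.
import numpy as np

def candidate_placements(top_studs, brick_w, brick_l):
    """Return (x, y) offsets where a brick_w x brick_l brick can attach.

    top_studs: 2D boolean array, True wherever the current model
    exposes a free stud at the working height.
    """
    rows, cols = top_studs.shape
    spots = []
    for y in range(rows - brick_l + 1):
        for x in range(cols - brick_w + 1):
            # legal only if every cell under the brick's footprint has a stud
            if top_studs[y:y + brick_l, x:x + brick_w].all():
                spots.append((x, y))
    return spots

# Example: a 4x4 plate whose corner stud is already occupied.
studs = np.ones((4, 4), dtype=bool)
studs[0, 0] = False
print(candidate_placements(studs, brick_w=2, brick_l=2))  # 8 legal 2x2 spots
```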

So can you drop a stack of plastic bricks and a manual in front of a robotic arm and expect a completed model back in a few hours? Not yet. The goal of this research was simply to translate the 2D images of a Lego manual into assembly steps a machine can functionally understand. Teaching a robot to manipulate and assemble Lego bricks is a separate challenge; this is just the first step, although we’re not sure how many Lego fans actually want to pawn the building process off on a machine.

Where this research could have more interesting applications is in automatically converting old Lego instruction manuals into the kind of interactive 3D building guides now included in the Lego mobile app. And with a better understanding of how to translate 2D images into three-dimensional brick-built structures, the framework could potentially be used to develop software that analyzes images of any object and spits out instructions for turning it into a Lego model.
