AI Generated Rovers Not Mechanically Sound (Yet)

Taking a break from exploring electronics, I went to the to-do list and picked off the item “look into generative AI”. This particular story started several years ago when GitHub user @johndpope opened a GitHub issue on my Sawppy repository advocating for a Lexan body shell. Aesthetics is not my focus for Sawppy but I’m glad to see others are thinking about cosmetic enhancements. In a recent update, @johndpope added a large number of images generated by Stable Diffusion. They’re really… something. If there’s potential here, I really have to squint to see it. For the most part these images generated by Stable Diffusion were not mechanically sound. Or even mechanically feasible. Or even sane. They’re a patchwork of bits I recognize, assembled into a surrealistic dream that reminds me of Salvador Dali paintings.

(Image credit: Stable Diffusion from @johndpope prompt)

What’s going on here? An Ars Technica article about Stable Diffusion running on Apple Silicon had put a note in the back of my mind, and after seeing @johndpope’s updates I thought I would look into it further. I do own an Apple MacBook Air with an M1 Apple Silicon processor appropriate for the links in that Ars Technica article, but it is my understanding my gaming PC’s NVIDIA RTX 2070 GPU would be faster still. So I followed instructions on this @AUTOMATIC1111 GitHub repository to run Stable Diffusion locally on my machine.

My experiment results were no better than what @johndpope had posted. Jumbles of things, nothing very coherent, and the occasional misshapen nightmare fuel. Other tools like Midjourney and OpenAI’s DALL-E were supposed to be better, but they were commercial offerings not available for running locally and I didn’t feel like this experiment was worth handing over my credit card. Then I read Microsoft had licensed DALL-E for Bing Image Creator. No credit card necessary, just a Microsoft account. Well, I have that!

To see if things have gotten better, I headed over and here’s the most sane result from the prompt: “mechanical diagram of a six-wheel mars rover in blueprint style”

(Image credit: Bing Image Creator from my prompt)

This is better looking than what I got out of local Stable Diffusion. (And ironically less Dali-like, given the DALL-E name.) But it is clearly weak on sound mechanical design concepts starting with the fact I asked for six wheels and got only four. Symmetry is not a well understood concept, either, as these four wheels are visibly misaligned relative to each other along orthographic axes. And there are random parts scattered around, what’s up with that? And finally, it seemed to have ignored the “Mars” part of my prompt as this creation shows no indication of adaptations for a Martian operating environment.

I tried a few variations on my prompt. My impression is that the best use of this tool is to lean into its tendency for mechanical nonsense and get designs packed with greeble, because it’s certainly got plenty of visual noise. But I can’t use it for anything that has to function mechanically. To be fair, mechanical design is not the focus of such image generators. Plus, this field is still evolving rapidly so in a few months things might be very different. But at least for today, image generation AI poses no threat to mechanical engineering jobs.

A short while later I got another idea: instead of trying to make it do something mechanical, how about an abstract cartoon rover mascot?


[UPDATE]: In the comments, Quinn Morley got an interesting-looking rover from the prompt “SAWPPY the rover, with six wheels and a body made of glass instead of metal, on Mars.”

(Image credit: Bing Image Creator from Quinn Morley’s prompt)

At first glance, this looks really good!

But upon closer inspection, I noticed the suspension linkages are attached to tires instead of hubs, and there seem to be only five wheels instead of the six specified. Something about this particular combination of flaws is appealing to DALL-E’s inscrutable brain because it also showed up in my cartoon mascot experiment.

4 thoughts on “AI Generated Rovers Not Mechanically Sound (Yet)”

  1. Got a decent one in bing image creator on the second try.

    SAWPPY the rover, with six wheels and a body made of glass instead of metal, on Mars.

    https://www.bing.com/images/create/sawppy-the-rover2c-with-six-wheels-and-a-body-made-/652affcb99784a779156992b18ce6430?id=n6KGyp%2blovNSlP8T5ajRvg%3d%3d&view=detailv2&idpp=genimg&FORM=GCRIDP&mode=overlay

    Also just did a metric crapload of these for a portfolio project. The ones for IceHive and Clover are Bing. I’m not in the ChatGPT DALL-E 3 beta yet, unfortunately, so I’m stuck with using it in Bing.

    https://www.quinnmorley.com/portfolio

    1. At first glance: Hey that’s indeed pretty decent!

      Looking closer: Hmm, all suspension linkages are attached to tires instead of hubs, and there seem to be only five wheels instead of the six specified.

      These traits are especially interesting to me because whatever part of the neural net generated this mechanical layout is likely also responsible for the five-wheeled rover in my post scheduled for tomorrow. I’m very curious what training data combined into this particular quirk and whether it’ll disappear in future versions.

      – –

      Are all IceHive & Clover images generated or just the ones with square aspect ratio? At least one Clover image resembles a Fusion 360 screenshot, so it’d be fascinating if that came out of the generator.

      1. Two of them are F360, and then several are actual pictures. What I couldn’t get for months with anything else was the lava tube with clover and crops. I think I finally settled on the prompt “An expansive lava tube blanketed by a dense carpet of small clover, glistening with countless dew droplets. 12 tomato plants stand tall, growing out of the bed of clover.”

        For IceHive it was “Visualize the ‘Ice Hive’ deep ice drilling mission on Saturn’s moon Titan: A vast sludge-covered-ice landscape, under a thick orange-hued atmospheric haze. Individual swarmbots, resembling small and flat ice-hauling tanks, bore a hole into the ground as a team, creating a deep vertical hole. Tetherless and autonomous, they navigate up and down the sheer faces of the hole in distinct helical paths, extracting ice and possibly other materials. Each robot functions autonomously.” I had to completely lie to it that there was no tunnel boring machine or the whole damn thing would go off the rails. Also it kept adding tethers between things or had things fly, so maybe 1 / 5 was up the right alley and maybe 1 / 20 was decent. It only takes one though to really make an idea pop. You can just sit there in Bing hitting generate over and over. It’s great if you have two screens. Still, apparently in ChatGPT you can modify them (sort of like in Midjourney how you could vary one you liked with a modified prompt).

      2. That sounds nice! I’ve frequently wished to pick one of the results “this one is close, let’s refine this one” but there’s no such option on Bing Image Creator today. I understand some of the generative systems have nondeterministic behavior and can’t support fine-tuning, so it’s nice to know that others have such capability.
