John Wayne and the Milkshake

Good Morning Folks! Technically it is still morning at the time of writing this, but it is considerably later in the day than I normally do a blog post. I’m in a funk. I think I can fully admit that right now. The thought of sitting down to write a blog post on most days when I am uncertain what I want to actually talk about… is a bit of a challenge at the moment. When I am actively sharing a build I am working on, or yammering about some cool game I am playing I do pretty well. When I bore myself thinking about what I have been up to… it doesn’t really bode well for trying to convert that into something readable. So instead you are getting an adventure in AI Image Generation using Stable Diffusion. This morning one of my coworkers was all excited about prompting Microsoft Copilot to draw pictures for him.

I think this is a thing everyone goes through when they first start playing with AI Image generation. I tend to do more surrealist prompts… for example, I set forth the basic concept of “John Wayne Drinking a Milkshake”. This was more of a chore than I figured it would be. I had Luke Skywalker eating a bagel rather easily… but I was also trying to explain the concept of “hallucinations” to my coworker and this served as a good lesson. The above image is pretty much what I consider to be the most cogent example of what was generated. There are still some weird things going on with the eyes… and the milkshake that was rendered seems to be cookies and cream… which then was also inexplicably applied to the sleeves of the shirt because that makes perfect sense.

It was one hell of a journey to get there. At first, I thought I would try and get it to spit out “John Wayne in a ’57 Chevy drinking a Milkshake”. There are so many things wrong with this image. Firstly you have the impossible space of a car that is sort of wrapping around the figures in a nightmare hellscape with two not-quite steering wheels… Legs that attach out of the wrong places… and right-side John Wayne appears to be sitting on the lap of a three-legged left-side John Wayne. Then there are the Play-Doh fingers of the right subject and the extra joints of the left subject’s most visible arm and the melted Play-Doh nature of the other arm. Legitimate nightmare fuel. The color palette is good though and FEELS like a 50s almost 60s-era photo. I decided that the bit of the prompt about the ’57 Chevy was too challenging and that I did not want to fuck with it any longer so I abandoned it.

Let’s ignore the fact that we once again have two John Waynes. There are parts of this that look pretty solid. I have no clue why I got shirtless erotic John Wayne trying to hand me a Milkshake that someone is obviously holding with two hands… as evidenced by the phantom figures. There is some jacked-up perspective stuff going on where the mangled hand of shirtless Wayne is somehow holding the straw… which is much closer to the camera than the figure is. Like if you could somehow remove all of the nonsense on the left side of the picture and just have seated-Wayne at the counter with a single milkshake it would be somewhat reasonable.

Shit just kept getting weirder the deeper I went. This is probably my favorite because again two John Waynes… one of which is lovingly holding quite possibly the world’s largest milkshake. The other one is inexplicably wearing a miniskirt and one leg appears to be a table leg. I also sort of love the super skinny tall milkshake that is sitting on the ground. Basically, I wanted to show some hallucinations… and I got a fever dream to explain that concept. The problem I have had with “AI Images” is that for a second when scrolling past them… they seem normal. The longer you look at them the more surreal and nonsensical they become.

I finally had to start including that he was “standing alone” to start to rein in the nonsense and simplify things further. He was no longer drinking a milkshake but just holding one. Eventually, I got a few images that began to look reasonable. Even then the prompt was still a bit fucked up because if you go back to that first image for a figure who was “alone” there were still a couple of dead-eyed “AI Zombies” in the background. Basically, I feel like AI in its current state is a neat party trick. If you want “Superman Riding a Horse with a Shotgun”, you are going to get something that fills that prompt… no matter how confusing or contorted it winds up looking. If you want specificity or correct answers… I am just not sure if we are at a point where large language models can fill that bill. I for example would never want to put anyone’s life in the hands of an AI-based decision engine yet.