
12 min read
Three Baby Dolls Meet For The First Time lets anyone use AI for the first time.
The project started in 2021, long before today’s user-focused generative AI applications existed. It brought this then very technical topic into the street, where everyone could use it for the first time.
I created this installation as part of my bachelor thesis, including organizing the hardware and programming the interface.
without ChatGPT, Midjourney, Stable Diffusion, ...
I was already obsessed with early generative AI models.



With DALL·E, OpenAI showed the first promising text-to-image model, starting a development that has only accelerated since then.
I wanted to tell my friends, I wanted to tell the world. But few people seemed very interested in the topic.

First try
Back then, the results from generative AI models were pretty underwhelming. Each output on its own didn’t say much about real-world potential. So I started thinking: what if I created a self-contained, endless generative process? Like a tiny design agency that keeps producing content non-stop. That way, we could observe a new kind of workflow—without focusing too much on the outcome itself.


OpenAI’s DALL·E looked incredible, but it wasn’t publicly available at the time—not even for testing. So I had to rely on open-source models I could actually run and experiment with. It took a lot of trial and error to find one that really worked for what I had in mind.



The models I experimented with, and some of the prompts I ran:

DeepDaze (SIREN x CLIP): “cosmic love and attention”, “a time traveler in the crowd”, “meditative peace in a sunlit forest”, “mist over green hills”

BigSleep (BigGAN x CLIP): “cosmic love and attention”, “artificial intelligence”, “a pyramid made of ice”, “a lonely house in the woods”

LatentReVision (VQGAN x CLIP): “A house near the lake”, “Strawberry cake”, “A big green snake”, “The colorful planets”
VQGAN x CLIP (LatentReVision) turned out to be the best fit. It gave me the most consistent and usable results, while other models tended to generate overly abstract visuals or strange, repeating patterns.
The first version followed a four-step loop: generate an image, describe it in one word, turn that word into a sentence, then design a layout that brought it all together. Each finished graphic could trigger the next cycle, creating an infinite loop of visual and verbal interpretation.
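To make that flow concrete, here is a minimal sketch of the loop in Python. Every function body is a placeholder standing in for the actual models (VQGAN x CLIP for the images, a captioning model and a language model for the text steps); none of the names below come from the real installation.

```python
# Minimal sketch of the first concept: a self-running four-step loop.
# All functions are placeholders; in the real setup VQGAN x CLIP generated
# the images and language models handled the text steps.

def generate_image(prompt: str) -> str:
    """Placeholder: would run VQGAN x CLIP and return the path of the result."""
    return f"outputs/{prompt[:20].replace(' ', '_')}.png"

def describe_in_one_word(image_path: str) -> str:
    """Placeholder: would caption the image and keep a single keyword."""
    return "serenity"

def expand_to_sentence(word: str) -> str:
    """Placeholder: would ask a language model to build a sentence around the word."""
    return f"a quiet scene full of {word}"

def compose_layout(image_path: str, sentence: str) -> str:
    """Placeholder: would arrange image and text into a finished graphic."""
    return f"layout({image_path!r}, {sentence!r})"

prompt = "cosmic love and attention"   # seed prompt
for cycle in range(4):                 # the real loop ran endlessly
    image = generate_image(prompt)
    word = describe_in_one_word(image)
    sentence = expand_to_sentence(word)
    graphic = compose_layout(image, sentence)
    print(f"cycle {cycle}: {graphic}")
    prompt = sentence                  # each finished graphic seeds the next cycle
```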
But after watching it run for a few rounds, it became pretty clear that this wasn’t going anywhere useful. The design quality was mediocre, and it got boring fast. Research also showed that it’s unlikely designers will be fully replaced by AI—it's more about a collaborative future, where humans and AI push design forward together.
The whole setup was missing the most crucial piece: real human–AI interaction.
At that point, I felt kind of stuck. So I printed out a bunch of AI-generated images and threw them onto paper with ink, trying to think randomly and technically at the same time.
It was actually kind of fun, so I started prototyping a digital version. Using hand tracking, you could place artworks on a digital canvas and use them as brushes. But I still wasn’t happy with the interaction—it didn’t really say much about real-world designer–AI collaboration.
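The write-up doesn’t say which hand-tracking library the prototype used; as a rough illustration, here is a minimal sketch with MediaPipe Hands (an assumption on my part), where the index fingertip acts as the brush position on the canvas.

```python
# Sketch: index fingertip as a brush position on a canvas.
# MediaPipe Hands is used here as one common choice, not the project's actual library.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        tip = results.multi_hand_landmarks[0].landmark[8]   # index fingertip
        h, w, _ = frame.shape
        x, y = int(tip.x * w), int(tip.y * h)
        # here the fingertip position would stamp the selected artwork onto the canvas
        cv2.circle(frame, (x, y), 12, (0, 255, 0), -1)
    cv2.imshow("canvas", frame)
    if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```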
It finally clicked during a class presentation when I showed how I generated the images. Everyone started shouting out prompts they wanted to try, and we spent the whole time exploring what the AI came up with. That half hour was the most productive part of the entire project—and made me rethink the whole concept.
Second try

After that presentation, I quickly built a first prototype of the “Infinite Canvas.” Users could speak a prompt, which was then turned into an image. Once the image appeared, they would see themselves on screen masking out a part of it; this masked section was then refined based on the next prompt.
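A rough sketch of that interaction loop, with placeholder functions standing in for speech-to-text, image generation, and masked refinement (none of these names come from the actual code):

```python
# Sketch of the "Infinite Canvas" prototype loop. Every function is a
# placeholder; the real installation used voice input and a VQGAN x CLIP
# style image model.

def listen_for_prompt() -> str:
    """Placeholder: would record from the microphone and run speech-to-text."""
    return input("prompt> ")   # keyboard input stands in for voice here

def capture_mask() -> str:
    """Placeholder: in the prototype, users masked out part of the image themselves."""
    return "mask"

def generate(prompt: str, base=None, mask=None) -> dict:
    """Placeholder: full generation, or refinement of only the masked region."""
    return {"prompt": prompt, "refined_region": mask is not None}

canvas = None
while True:
    prompt = listen_for_prompt()
    if not prompt:
        break
    if canvas is None:
        canvas = generate(prompt)                                     # first prompt: whole image
    else:
        canvas = generate(prompt, base=canvas, mask=capture_mask())   # later prompts: refine the masked part
    print(canvas)
```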

When showing the prototype to others, I noticed they often used very basic prompts and ended up disappointed with the results—they had a specific style in mind. To fix this, I added a style selection bar with the most common visual categories users seemed to be looking for. They could switch styles anytime—even mid-process—and even blend two styles by selecting the area between categories.
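One way such a bar can blend two neighboring styles is to weight them by the selected position; the style names and the weighted-prompt notation below are made up for illustration, not taken from the installation.

```python
# Sketch: a position on the style bar blends the two styles it sits between.
STYLES = ["photo", "painting", "3D render", "sketch", "pixel art"]   # illustrative names only

def blend_styles(position: float) -> str:
    """position runs from 0 to len(STYLES)-1; values between two indices mix both styles."""
    left = int(position)
    right = min(left + 1, len(STYLES) - 1)
    w = position - left                      # 0.0 = fully left style, 1.0 = fully right
    if w == 0 or left == right:
        return STYLES[left]
    return f"{STYLES[left]}:{1 - w:.2f} | {STYLES[right]}:{w:.2f}"

print(blend_styles(1.3))   # e.g. "painting:0.70 | 3D render:0.30"
```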
Infinity Canvas V3

At that point, I ran a structured user test with people who hadn’t seen the installation before. I asked them to use it while speaking their thoughts aloud, then followed up with a series of questions about the experience and the overall concept.
It quickly became clear that the mask creation step was confusing. Since it wasn’t essential to the experience, I decided to remove it for the sake of simplicity.
Infinity Canvas V4
Final version

I got the opportunity to put my installation in one of our university’s public offices, right on a busy street. This was an amazing chance to elevate its impact even more. Placed behind a big window, it could stay accessible 24/7, open to anyone, at any time, with or without prior experience.
This required a few changes for the installation to work in this context. First, I used a microphone glued directly to the glass. It picked up vibrations through the window and worked amazingly well, even when the street was extremely crowded. This made it almost invisible and gave the screen a kind of magic.
Since the button was wireless, I could glue it directly to the glass too, creating a futuristic, minimalistic look.



Microsite
So that the design process can continue immediately afterward, all images become available right after generation. They’re automatically uploaded in the background and appear on the website a few seconds later.
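A minimal sketch of what that background step could look like: polling the output folder and posting new files to the site. The folder name and endpoint URL are placeholders, not the project’s real ones.

```python
# Sketch: watch the output folder and push new images to the microsite.
import time
import pathlib
import requests

OUTPUT_DIR = pathlib.Path("outputs")
UPLOAD_URL = "https://example.com/api/upload"   # placeholder, not the real endpoint
seen = set()

while True:
    for image in OUTPUT_DIR.glob("*.png"):
        if image not in seen:
            with image.open("rb") as f:
                requests.post(UPLOAD_URL, files={"image": f})   # upload in the background
            seen.add(image)
    time.sleep(2)   # images appear on the site a few seconds after generation
```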

Here, all generated images can be explored in an endless list and downloaded to share or use in other projects.
Check out the website here:
You might be wondering about the strange name. In reality, the project has no fixed title—it uses GPT-2 to continuously generate new names for itself based on the prompt:
“The name of this media art installation about the role of AI in design is:”
To this day, new names are still being created. You’ll see them anywhere dynamic naming is possible.
“Three Baby Dolls Meet For The First Time” was the first name it ever generated—and is used for situations where a changing name isn’t an option.
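A minimal sketch of that naming mechanism using GPT-2 via the Hugging Face transformers library; the sampling settings are my assumptions, only the prompt text comes from the project.

```python
# Sketch: let GPT-2 continue the project's naming prompt to produce a new title.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The name of this media art installation about the role of AI in design is:"

out = generator(prompt, max_new_tokens=12, do_sample=True, temperature=0.9)
name = out[0]["generated_text"][len(prompt):].strip().split("\n")[0]
print(name)   # a freshly generated name; the project's first one was "Three Baby Dolls Meet For The First Time"
```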
