Let’s talk about daydreams first. This might sound unusual, even a bit removed from the title itself. But I assure you, it’s not.
The fascinating process of daydreaming, so passionately hated by productivity enthusiasts and managers yet dearly loved by artists, dreamers, and children alike, has a particular characteristic that I’d like to focus on for the time being: it has the power to manifest pictures in one’s mind. And through practice or repetition, thinking about one particular setting can paint a rather vibrant picture in one’s mind, no matter how impossible or ridiculous that setting might be.
And humankind thrives on peculiarity. Our curiosity peaks whenever something comes along that mere commonality fails to explain. That’s why books like Frankenstein and Harry Potter, and the paintings of Salvador Dalí and Van Gogh, are so widely admired. They challenge us. More specifically, they challenge our perception of reality.
Manifesting vivid, convincing details from dreams and challenging human perception through artistry have seemed, ever since the dawn of humanity, like things only humans are capable of doing. Everything we used to achieve them, technology included, was merely a tool.
It was truly uncharted territory for machines. That lonesome road, however, may have started to open onto a broader horizon. For the first time in human history, a machine may finally have the capacity to be considered a companion, an adversary, or perhaps even a connoisseur in the process of art-making.
Recent Developments in AI-Driven Art Platforms
Now, AI hasn’t penetrated every conceivable domain of art, and even in the domains it has entered, it isn’t above scrutiny or deep criticism. For art forms like literature or music, AI is still like a baby, with little to no sophisticated accomplishment. But in the case of generating visual imagery, there have been some really interesting developments over the last few years.
Prompt to Image
It has always been a fantasy for many to turn written text into images instantly, and only recently has that fantasy become reality. Over the past few years, several AI-powered tools have emerged that can turn written English text into reasonably accurate renderings of what was described. Let’s look at a few examples of these tools.
DALL·E
DALL·E, and its successor DALL·E 2, were among the first of their kind to bring prompt-to-image generation to the general public. DALL·E is a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. Images generated by DALL·E are often spot-on in their relevance to the prompt, but a lot of the time it’s still hit-and-miss. Let me show you what I’m talking about.
This image is generated from the prompt: “An astronaut riding a horse in space”
This is a marvelously well-done job for an auto-generated image. DALL·E works surprisingly well when it comes to depicting unlikely scenarios or well-drawn, cartoonish images.
However, it also struggles in some areas, such as generating multiple faces and keeping factual, historical, or scientific images accurate. Here’s one example of a catastrophic failure.
The prompt for this one was: “Seven Engineers gathered round a whiteboard”
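If you’d like to experiment with prompts like these yourself, DALL·E is also accessible programmatically. Below is a minimal sketch using OpenAI’s Python SDK; the model name, parameters, and client setup shown here are assumptions based on the publicly documented Images API and may differ from the version you have installed.

```python
# Minimal sketch: generating an image from a text prompt with OpenAI's Images API.
# Assumes the `openai` Python package (v1.x) is installed and the OPENAI_API_KEY
# environment variable is set; model names and parameters may change over time.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-2",                               # the DALL·E 2 model discussed above
    prompt="An astronaut riding a horse in space",  # same prompt as the example image
    n=1,                                            # number of images to generate
    size="1024x1024",
)

print(response.data[0].url)  # URL pointing to the generated image
```

Feeding in the whiteboard prompt instead will happily reproduce the kind of failure shown above; the API doesn’t guard against prompts the model handles poorly.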
Midjourney
Around last year, Midjourney took the world by storm with its brilliant, artistically sophisticated knack for creating eye-catching images from simple English prompts. It quickly became the new sensation among art enthusiasts, some of whom began to seriously believe it was time for artists to put down their paintbrushes, since there’s no need for the rigorous process anymore when such a feat can be achieved instantly. Additionally, the command for initiating a prompt is /imagine, which I think is a very nice touch.
Here’s what I’m talking about. This is an image I generated with Midjourney’s beta program. The prompt was: “Snow globe hourglass, 3D, 8k”
What intrigued me about this particular image is that the AI not only understood what a snow globe and an hourglass are and combined them, but also incorporated some intricate, detailed scenery within the object itself.
Another fascinating example: “A cat knight”
Now let’s talk about where Midjourney fails. One pitfall is that it sometimes doesn’t understand the context when a combination of scenarios is given, and it meshes them together in the wrong order.
For example, this is taken from a subreddit called r/midjourneyfails and the prompt was “Sloths barbecuing pineapple on the grill”
Here, the AI couldn’t work out what should be on the grill, the sloth or the pineapple, and decided to put both. But to be fair, resolving ambiguous context has always been a major problem for computational systems.
Upcoming Technologies
Big tech companies are reportedly upping their game in this domain. Google has announced its own prompt-to-image service, Imagen, which not only claims to handle context better than existing platforms but also promises to take things one step further with text-to-video generation. Meta is also getting in on the action with its text-to-video AI, Make-A-Video.