Appreciating the Poetic Misunderstandings of A.I. Art

What does an “Art Deco Buddhist temple” look like? The phrase is nearly nonsensical; it’s hard to imagine a Buddhist temple built in the Art Deco style, the early-twentieth-century Western aesthetic of attenuated architecture and streamlined forms. But this didn’t deter the Twitter account @images_ai, which promises “images generated by A.I. machines.” When another Twitter user threw out that prompt, in early August, @images_ai responded with a picture that looks something like an Orientalist Disney castle, a mashup of pointy spires and red angled roofs with a patterned stone-gray façade. Or perhaps it resembles the archetypal Chinese Buddhist temple crossed with a McDonald’s—a fleeting, half-remembered image from a dream frozen into a permanent JPEG on social media.

Since @images_ai launched, at the end of June, it has gained a following for its constant feed of surreal, glitchy, sometimes beautiful, sometimes shocking pictures created with open-source machine-learning tools. The account has produced everything from a version of Salvador Dalí’s painting “The Persistence of Memory” in the neon-pastel style of Lisa Frank to a depiction of “the edge of reality and time,” a frightening swirl of floating eyes, hourglasses, and windows onto nowhere. It sometimes publishes dozens of images a day. But the account’s proprietor is very much a human being. “People do think that I’m a bot all the time,” Sam Burton-King, a twenty-year-old student at Northwestern University, told me recently. “They think if they tag me with something it will just be made. Most of the requests that I get are crap.”

Burton-King, who uses they and them pronouns, is not a programmer but a musician who arrived at Northwestern from the United Kingdom on a scholarship. They began as a math major, but, finding the coursework too difficult, moved into studying philosophy and music. This year, they noticed that an account they followed on Twitter, a vaporwave music label called DMT Tapes FL, was posting A.I.-generated images—creepy, cyberpunk-ish landscapes in bright blue and pink. They noticed other so-called art accounts, not all of them powered by A.I., among them @gameauras, which offers “video game images with elegiac auras” and now has more than sixty thousand followers. Burton-King decided to “try to curate my own feed and have a page of interesting art that had been created by these machines,” they said. Their project is part artistic archive and part collaboration with the audience, an A.I.-driven game of aesthetic telephone in which the fun is seeing what the machine gets wrong.

To create art for @images_ai, Burton-King feeds a selection of written prompts into what’s known as a generative adversarial network (G.A.N.), a machine-learning system in which two artificial neural networks—computer models that mimic the information processing of a human brain—compete with each other to come up with a result that best matches the inquiry. Burton-King often runs the tasks in browser tabs while studying; each image takes ten to twenty minutes to generate. The final image jerkily coalesces out of a field of pixellated static as the machine progresses through anywhere from a hundred to two thousand iterations, drawing ever closer to something it identifies as an “Art Deco Buddhist temple.”
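
In rough terms, the adversarial setup works like this: one network, the generator, proposes images, while a second, the discriminator, tries to tell those fakes apart from real pictures, and each is trained against the other. The PyTorch sketch below is a minimal, generic illustration of a single round of that competition, not the model behind @images_ai; the tiny network sizes, the learning rates, and the training_step helper are invented purely for demonstration.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; real image G.A.N.s use deep convolutional networks.
generator = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

def training_step(real_images):
    """One round of the competition; real_images is a (batch, 784) tensor of flattened pixels."""
    batch = real_images.size(0)
    noise = torch.randn(batch, 64)

    # The discriminator learns to score real images high and the generator's fakes low.
    fakes = generator(noise).detach()
    d_loss = (loss_fn(discriminator(real_images), torch.ones(batch, 1))
              + loss_fn(discriminator(fakes), torch.zeros(batch, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # The generator learns to make fakes that the discriminator mistakes for real images.
    g_loss = loss_fn(discriminator(generator(noise)), torch.ones(batch, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```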

The neural networks that Burton-King uses include one called CLIP, which is trained on a database of four hundred million text-and-image pairs culled from sites across the Internet—likely including social networks such as Pinterest and Reddit. CLIP was released, in January, by the organization OpenAI, as an effort to better determine how well a written description matches a corresponding image. (Comparing an image of a dog to the word “cat,” CLIP would find a low correlation.) Soon after it launched, though, a twenty-three-year-old artist named Ryan Murdock realized that the program’s processes could be reversed: you could input text, like “cat,” and, starting with blank pixels, “iteratively update the image,” Murdock told me, until CLIP determines that it resembles a cat. “That way, you can take CLIP from being a classifier to being something that can drive image generation,” he said. Murdock pioneered the technique, combining CLIP with a commonly used G.A.N. in a program he called Big Sleep. (@Images_ai started out using Big Sleep and then moved to another system that followed Murdock’s method.)
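
Murdock’s reversal can be sketched in a few lines of PyTorch with OpenAI’s open-source clip package: encode the prompt once, start from random pixels, and repeatedly nudge those pixels so that CLIP’s embedding of the picture moves closer to its embedding of the text. The sketch below is a minimal illustration of that idea, not Big Sleep itself; systems like Murdock’s optimize a G.A.N.’s latent vector rather than raw pixels and add augmentations and other refinements, and the prompt, step count, and learning rate here are arbitrary.

```python
import torch
import clip  # OpenAI's CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep the model in full precision so gradients flow to the pixels

# Encode the prompt once; its embedding is the target the picture must drift toward.
tokens = clip.tokenize(["an Art Deco Buddhist temple"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# "Starting with blank pixels": the image itself is the parameter being optimized.
# (Real pipelines also apply CLIP's input normalization and random crops; omitted here.)
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(500):  # runs of a few hundred to two thousand steps, per the article
    optimizer.zero_grad()
    img_emb = model.encode_image(image.clamp(0, 1))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    # Negative cosine similarity: the loss falls as CLIP decides the picture matches the prompt.
    loss = -(img_emb * text_emb).sum()
    loss.backward()
    optimizer.step()
```

Because only the image tensor is updated, the loop is the “iteratively update the image” process Murdock describes: the picture slowly coalesces as the loss falls.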

CLIP is able to deduce when something looks like a cat, but it also can draw some very problematic conclusions. At one point, Murdock made a system that could take an image and automatically produce a description of it in text. “I never released it because it produced horrifyingly biased and cruel captions when you put people in it,” he said. It spat out derogatory terms—racist and ableist language. He continued, “It should come as little surprise that when you get these really powerful neural networks including, like, all of Reddit, they’re going to come out with some really disturbing attitudes.” In that sense, he added, A.I.-generated images are a “reflection of the collective unconsciousness of the Internet.” CLIP inherited its source material’s biases as well as its information.

“The art is in discriminating the good from the bad,” Burton-King said—figuring out which words to input, how big to make the images, and when to stop the generative process. But the most compelling aspect of the account might be its ability to enact an artistic fever dream, a kind of magic spell: “You can just type something and have it manifest in front of you,” Burton-King said. “I think that’s the main appeal for everyone.” It doesn’t even require coding fluency; @images_ai published a tutorial for anyone to perform the trick using open-source tools online.

What do people want to see? The worst and most common @images_ai requests are Internet memes, according to Burton-King. “People ask for Shrek all the time, or Big Chungus, or Donald Trump in various situations,” they said. One successful prompt was “Elon Musk experiencing pain,” which yielded a grotesque collage of grimacing faces and Tesla chargers, and drew subsequent requests for scenes of other tech entrepreneurs being tortured. Burton-King also gets a lot of K-pop-themed requests (“stan Loona”), which they have refused outright lest they get flooded with fans sending in similar prompts. But they know what makes a good prompt. While we spoke, they pulled up the @images_ai Twitter notifications: “Somebody asked for ‘a million explosions at once,’ ” they said. “I can’t imagine quite what that would make.” (They decided to find out.) Their favorite prompts are lines pulled from poetry or novels: “Evocative words find evocative images very often,” they continued. Two stanzas from William Blake’s poem “Jerusalem,” from the early nineteenth century—“And did those feet in ancient time / Walk upon Englands mountains green . . .”—resulted in an idyllic green landscape beset by giant pairs of feet and dark looming factories, presumably the poem’s “Satanic Mills.”

The machine “will invariably encapsulate the vibe of the text, very rarely in a way that you anticipate,” Burton-King said. But this translation is far from literal—like all visual art, it’s a matter of interpretation. Looking at the Blake-prompted image, a viewer might just as well recall Shelley’s “two vast and trunkless legs of stone.”

Though the Dalí-Lisa Frank mashup that Burton-King created is surprisingly accurate to both artists’ styles, and several Basquiat mashups have worked out well, there’s no mistaking these works for human creations. “There’s a weird random quality to it,” Burton-King said. A.I. images tend to include what look like tiles, with particular patterns repeating in different places, which Burton-King compares to tapestries. The images also don’t make sense as a whole; their components don’t cohere into a unified composition. “There’s a sort of incoherence to it that’s very difficult to emulate,” Burton-King said. The pleasure of @images_ai comes from the queasy gap between the human prompt and the automated result, the poetic misunderstandings that highlight the limits of the machine’s intelligence. Call it an enjoyable form of the uncanny valley.
