Like most people who are extremely online, Brazilian screenwriter Fernando Marés is fascinated by the images generated by the artificial intelligence (AI) model DALL·E mini. In recent weeks, the AI system has become a viral sensation, creating images based on seemingly random and whimsical prompts from users, such as “Lady Gaga as the Joker,” “Elon Musk sued by a capybara,” and more.
Marés, a veteran hacktivist, started using DALL·E mini in early June. But instead of entering text for a specific request, he tried something different: he left the field blank. Fascinated by the seemingly random results, Marés ran the blank search over and over. Then he noticed something strange: almost every time he submitted a blank request, DALL·E mini generated portraits of brown-skinned women wearing saris, a type of clothing common in South Asia.
To find out whether it was just a coincidence, Marés queried DALL·E mini thousands of times with a blank prompt. He then invited his friends to take turns generating images in five browser tabs on his computer. He said he kept going for almost 10 hours without a break. He built a vast repository of over 5,000 unique images and shared 1.4 GB of raw DALL·E mini data with Rest of World.
Most of those images depict brown-skinned women in saris. Why is DALL·E mini seemingly obsessed with this very specific type of image? According to AI researchers, the answer may have something to do with sloppy labeling and incomplete datasets.
DALL·E mini was developed by AI artist Boris Dayma and inspired by DALL·E 2, an OpenAI program that generates hyper-realistic art and images from text input. From meditating cats to robot dinosaurs battling monster trucks in a colosseum, the images blew people's minds, with some calling the program a threat to human illustrators. Acknowledging the potential for misuse, OpenAI limited access to its model to a hand-picked group of 400 researchers.
Dayma was fascinated by the art produced by DALL·E 2 and “wanted to have an open-source version that can be accessed and improved by everyone,” he told Rest of World. So he went ahead and created a stripped-down, open-source version of the model and called it DALL·E mini. He launched it in July 2021, and the model has been training and improving its performance ever since.
DALL·E mini is now a viral internet phenomenon. The images it produces aren't nearly as sharp as DALL·E 2's and show noticeable distortion and blurring, but the system's wild renderings, from the Demogorgon from Stranger Things holding a basketball to a public execution at Disney World, have spawned an entire subculture, with subreddits and Twitter accounts dedicated to curating its images. It has inspired a cartoon in The New Yorker, and the Twitter account Weird Dall-E Creations has more than 730,000 followers. Dayma told Rest of World that the model handles about 5 million prompts a day and that he is currently working to keep up with the extreme growth in user interest. (DALL·E mini has no affiliation with OpenAI and, at OpenAI's urging, was renamed Craiyon on June 20.)
Dayma admits he's baffled as to why the system generates images of brown women in saris for blank requests, but suspects it has something to do with the program's dataset. “It's quite interesting and I'm not sure why it happens,” Dayma told Rest of World after viewing the images. “It's also possible that this type of image was highly represented in the dataset, maybe also with short captions,” he said. Rest of World also reached out to OpenAI, DALL·E 2's creator, to see whether it had any insight, but has not yet heard back.
AI models like DALL·E mini learn to draw by parsing millions of images from the internet along with their captions. The DALL·E mini model was developed on three major datasets: the Conceptual Captions dataset, which contains 3 million image-caption pairs; Conceptual 12M, which contains 12 million image-caption pairs; and OpenAI's corpus of about 15 million images. Dayma and DALL·E mini co-creator Pedro Cuenca noted that their model was also trained on unfiltered data from the internet, which exposes it to unknown and unexplained biases in datasets that can trickle down to image-generation models.
Dayma isn't alone in suspecting the underlying dataset and training model. Seeking answers, Marés turned to the popular machine learning platform Hugging Face, which hosts DALL·E mini. There, the computer science community weighed in, with some members repeatedly offering plausible explanations: the AI could have been trained on millions of images of people from South and Southeast Asia that are “unlabeled” in the training corpus. Dayma disputes this theory, saying that every image in the dataset has a caption.
Michael Cook, who researches the intersection of artificial intelligence, creativity, and game design at Queen Mary University of London, challenged the theory that the dataset included too many pictures of people from South Asia. “Typically, machine learning systems have the inverse problem: they actually don't include enough photos of non-white people,” Cook said.
Cook has his own theory about DALL·E mini's confounding results. “One thing that occurred to me while reading around is that a lot of these datasets strip out text that isn't English, and they also strip out information about specific people, i.e., proper names,” Cook said.
“What we might be seeing is a weird side effect of some of this filtering or pre-processing, where images of Indian women, for example, are less likely to get filtered by the ban list, or the text describing the images is removed and they are added to the dataset with no labels attached.” For instance, if the captions were in Hindi or another language, the text might get mangled during data processing, leaving the image with no caption. “I can't say that for sure; it's just a theory that occurred to me while exploring the data.”
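Cook's theory can be made concrete with a toy example. The Python sketch below is purely hypothetical: the `clean_caption` and `build_pairs` helpers, the ASCII-only rule for "English" text, and the two-entry ban list are assumptions made for illustration, not the actual pre-processing applied to DALL·E mini's datasets. It shows how a filter that strips non-English words and proper names from captions, while still keeping the image, can leave an image paired with an empty caption.

```python
# Hypothetical ban list of proper names (an assumption for illustration only).
BANNED_NAMES = {"priya", "mumbai"}


def clean_caption(caption: str) -> str:
    """Drop non-ASCII words (a crude stand-in for 'non-English') and banned names."""
    kept = []
    for word in caption.split():
        if not word.isascii():
            continue  # assumed rule: treat any non-ASCII word as non-English and drop it
        if word.lower().strip(".,") in BANNED_NAMES:
            continue  # drop proper names that appear on the ban list
        kept.append(word)
    return " ".join(kept)


def build_pairs(raw_pairs):
    """Return (image_id, caption) pairs; images are kept even if the caption ends up empty."""
    return [(image_id, clean_caption(caption)) for image_id, caption in raw_pairs]


if __name__ == "__main__":
    raw = [
        ("img1", "A cat on a sofa"),
        ("img2", "साड़ी पहने महिला"),  # Hindi: "woman wearing a sari"
    ]
    print(build_pairs(raw))
```

In this sketch the second image survives filtering with an empty caption, so a model trained on such pairs could learn to associate that kind of image with a blank prompt. Whether anything like this actually happened with DALL·E mini's training data remains, as Cook says, unconfirmed.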
Bias in AI systems is pervasive, and even well-funded Big Tech initiatives such as Microsoft's chatbot Tay and Amazon's AI recruiting tool have succumbed to the problem. Google's text-to-image model, Imagen, and OpenAI's DALL·E 2 explicitly disclose that their models have the potential to recreate harmful biases and stereotypes, as does DALL·E mini.
Cook has been a vocal critic of what he sees as a growing callousness, and of rote disclosures that shrug off bias as an inevitable part of emerging AI models. He told Rest of World that while it's commendable that a new piece of technology is letting people have a lot of fun, “I think there are serious cultural issues, and social issues, with this technology that we don't really appreciate.”
Dayma, the creator of DALL·E mini, concedes that the model is still a work in progress and that the extent of its biases has not yet been fully documented. “The model has raised much more interest than I expected,” Dayma told Rest of World. He wants the model to remain open source so that his team can study its limitations and biases faster. “I think it's interesting for the public to be aware of what is possible so they can develop a critical mind towards the media they receive as images, to the same extent as media they receive as news articles.”
Meanwhile, the mystery remains unsolved. “I'm learning a lot just by seeing how people use the model,” Dayma told Rest of World. “When it's empty, it's a gray area, so [I] need to investigate further.”
Marés said it's important for people to learn about the potential harms of seemingly fun AI systems like DALL·E mini. The fact that even Dayma is unable to discern why the system spits out these images reinforces Marés's concerns. “That's what the press and critics have [been] saying for years: that these things are unpredictable and they can't control it.”