EXPERT VOICES
Artistic potential and ethical considerations in visual generative AI
6 September 2023 – Vol 1, Issue 1.
My position on visual generative AI is that the technology is useful, but the best way of using it is still unknown. The artworks that most people are producing at the moment (certainly people who are not artists) are much the same. The works are highly derivative, and the colours are largely oversaturated. It is as if there is a particular style that AI promoters subscribe to: hyper-realism and a 1990s science-fiction book-cover aesthetic.
I see many people on Twitter thinking that this is brilliant; that this is going to take over the world, because anyone can now create images and videos. But so far, the creations are not new; rather, they are merely generative.
Generative AI, in essence, is a statistical summary, an aggregation of existing creations. It scours the vast repository of human-produced images (and words) and creates amalgams of familiar elements. This raises a question: are the created works iterations, and therefore just imitations? Can a work truly be deemed new when it is, in many ways, a patchwork of pre-existing material?
As anyone can now use AI, it may seem that we can all do the work of an artist or graphic designer. There was a similar sentiment when Photoshop came out in 1990. But this was not the case. Photoshop is only effectively usable by people who have had graphic training, and who have graphic aesthetic sensibilities or understand the visual language.
The same is true for AI. Indeed, we can all stumble onto DALL-E or Midjourney and instruct the system to imagine a famous work and recreate it in a different style. That's lovely for us. But if we want a piece of art, or if I need some imagery to use in a PowerPoint presentation, then it is much more difficult for me to explain to the AI, to prompt it, because I don't know how to describe the visual elements that I want. So how is it possible to coax the right thing out of it, given its biases and the internal coverage of its data?
I recently tried to use a picture of me and my grown-up daughter in Midjourney. It insisted on turning me into Harrison Ford, and turning my daughter into a famous actress in a science fiction style. Whatever I did, I couldn’t get past that effect. It was just stuck in this particular style. On the other hand, I have seen works that colleagues at the University of Southampton have done with artists, and the results are really impressive as they drive Midjourney to do something creative.
The technology is fantastic, but the statements that people are making about it are overhyped. Perhaps this is because the people who use it seem to have insufficient imagination or limited artistic skill. These are basically the people who promote the technology: the tech people trying out their own technology, and to them it is indeed useful and brilliant.
For artists, I would personally suggest that the potential lies in using AI as yet another tool, combined with their artistic skills. A question artists may ask is whether AI draws images 'pixel by pixel' from scratch, or whether it takes parts of other images and collages them together. The answer is that AI draws images from scratch, pixel by pixel. The way it does so is to keep asking: 'what is the most reasonable way to fill in this next pixel, based on the millions of images and the visual data that I have, and based on the statistical analysis I have made of how different parts of an image relate to each other?' AI also considers how the text in the prompt relates to the overall image. For example, AI has millions of examples of eyes. Images of eyes appear in different contexts, and AI will choose the relevant context. AI does not really know what an eye is; it just knows the statistical makeup of pictures that have been described as having eyes. The result is not a collage, or a copy-and-paste from another photo. Rather, it is a new 'pixel by pixel' drawing.
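The pixel-by-pixel idea can be sketched in miniature. The following toy Python script is my own illustration, not how production systems such as Midjourney are actually implemented (real models use neural networks trained on millions of images and condition on the prompt text). It simply records how often one pixel value follows another in a tiny 'training set', then draws a new sequence by repeatedly asking which value is statistically most likely to come next.

```python
from collections import Counter, defaultdict

# Toy "training set": each image is flattened to a sequence of pixel
# intensities (0-2). This is a deliberately simplified stand-in for
# the millions of images a real model learns from.
training_images = [
    [0, 0, 1, 2, 2, 1, 0, 0],
    [0, 1, 2, 2, 2, 1, 0, 0],
    [0, 0, 1, 2, 1, 1, 0, 0],
]

# Count how often each pixel value follows each previous value.
transitions = defaultdict(Counter)
for image in training_images:
    for prev, nxt in zip(image, image[1:]):
        transitions[prev][nxt] += 1

def generate(start, length):
    """Draw a new 'image' pixel by pixel, each time asking: what is
    the most statistically likely next value, given what was seen
    in the training images?"""
    out = [start]
    for _ in range(length - 1):
        counts = transitions[out[-1]]
        out.append(counts.most_common(1)[0][0] if counts else start)
    return out

new_image = generate(start=0, length=8)
```

The generated sequence is not copied from any one training image; each pixel is chosen from the aggregate statistics, which is the sense in which the result is a new 'pixel by pixel' drawing. A real model samples from a learned probability distribution rather than greedily taking the single most common value.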
So, this new work is based on knowledge of other photos. This may be an issue. We have a model of intellectual property law that dates back hundreds of years. It is based on the idea of copying: of literal copies, and of small adjustments that allow someone to produce derivative versions of a work. The copyright holder can decide that only they have the right to make new copies of the work. The challenge is that generative AI is not creating a copy. For example, it would not take text from a book and simply reproduce it somewhere else. What AI is doing is creating an incredibly complex statistical model, with billions of statistics describing how words and phrases fit together (or how dots fit together, in the case of images), and how they tend to cluster with each other under a variety of circumstances. We can argue that there is no copy in this; there is just a general mishmash of data, the result of "training" the AI. And when you prompt it in a particular way, it might come up with something that has not exactly been seen together in this way before, but is absolutely the sort of thing that could have been seen.
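To make the 'statistics, not copies' point concrete, here is a deliberately tiny word-level sketch (again my own illustration; real models hold billions of parameters rather than a simple follow-table). It records only which word follows which in a small corpus, and then generates text by sampling from those statistics, so it can emit a sentence that appears nowhere verbatim in the corpus.

```python
import random
from collections import defaultdict

# A toy corpus standing in for the AI's "training" text.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat lay on the rug",
]

# Record only the statistics: which word follows which.
# The sentences themselves are not stored as retrievable copies.
follows = defaultdict(list)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)

def generate(start="the", length=6, seed=1):
    """Emit words one at a time by sampling from the recorded
    word-to-word statistics, stopping at a word with no successors."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and follows[out[-1]]:
        out.append(rng.choice(follows[out[-1]]))
    return " ".join(out)

generated = generate()
```

Depending on the seed, the output can be a sentence such as "the dog sat on the mat": plausible English of exactly the kind the corpus contains, yet present in none of the three training sentences. That is the sense in which such a model produces the sort of thing that could have been seen, without making a copy.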
If you prompt it in a different way, it might come up with something that is almost exactly how an artist would have expressed the idea. You can create new AI works of art in a particular style, following the style of an artist, and the result would not look out of place; rather, it would be convincing. It can be seen as a kind of statistical plagiarism, if you like: plagiarism of the knowledge of how an artistic style is made. Instead of saying, 'I am just going to lift things out and put them somewhere else', AI is saying, 'I am going to encode all of the knowledge about how your texts or your images are put together, and use this large statistical mapping to create'. So, under the right conditions, AI can recreate an excerpt from what it has read, or might create something that looks almost identical and speaks in the same voice. We need to think about a new concept of statistical, stochastic plagiarism.
If the machine somehow remembers what all your pictures look like, and then reproduces them, that is plagiarism, and the machine must be shut down. But isn’t that what humans do as well? People somehow remember images they saw, and then talented artists can recreate those images.
We are used to the idea that machines do very simple things well: they can do rigid calculations very quickly and very adeptly. People, on the other hand, write novels and diagnose cancer, which are considered very complicated tasks. AI can now recreate things that we thought only humans could create.
So, will artists, authors, and report writers lose their jobs? I think not. I remember when I came to work at the university, people were saying, 'no one is going to need to go to university anymore, because why would they when they can get all of the information they would learn at university off a CD-ROM?' This was even before we had the internet. Then we had the internet, Google, and then online courses delivered as videos. And we have consistently managed to get over the idea that technology will replace everything. The initial reaction is usually, 'look at this new technology; it's going to replace x, y, z jobs'. And of course, there are some things that it does well!
At the university, we need to understand how to deal with technology that allows students to create essays without having the understanding themselves. This has all happened so quickly; AI has only been popularised in the last year. We have yet to really come to terms with how it can work well, and how to create new value from it. And we have yet to think pedagogically about what this statistical approach to text generation means, and what its implications are. But that is a topic for another interview!
Leslie Carr is Professor of Web Science at the University of Southampton. He is a member of the Web Science Institute, the Centre for Democratic Futures, and the Web and Internet Science group.