When people refer to artificial intelligence (AI) tools, they are often referring more specifically to generative AI, which is exploding in popularity and usage.
What is generative AI?
Generative AI is an umbrella term for deep-learning models that generate text, images, videos, and other content in response to prompts. They use the datasets they were trained on to create various forms of content, ranging from short poems to long articles.
The topic of generative AI came to the fore with the wide release of OpenAI’s ChatGPT, which brought generative AI to the public at large. While this might have been the first form of generative AI that many people came into contact with, it certainly wasn’t the only one to hit the market.
There are now dozens of different ways to access generative AI, with almost every major tech company releasing its own version, whether that’s Google’s Gemini, the Microsoft products built on its massive investment in OpenAI, GitHub’s Copilot, or plenty more besides.
Before generative AI became synonymous with chatbots, it was already being used in statistics, where it is an enormously helpful tool for analyzing numerical data, quickly reading through it to identify patterns and trends. It was the growing strength of deep learning that made it possible to apply the same technology to images, speech, and other complex data types.
The very first of this new class of generative AI was introduced in 2013: variational autoencoders (VAEs), which were widely used for generating realistic images and speech. Over time, these tools were fine-tuned to create ever more realistic images, sounds, and writing.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024
How does generative AI work?
At its core, a generative AI tool turns raw data (which can be anything from a quick prompt to the entire text of a Wikipedia page) into output. The machine-learning element means the AI learns what kind of output is most likely, based on the data it has access to. It cannot truly create anything new; genuine creativity still remains with living creatures alone.
However, it can create fresh output that is inspired by or similar to what it has learned from, without being a direct copy. Generally speaking, a prompt is given in any format the AI system has learned to process, whether that’s text, an image, a video, a webpage, or another kind of input. The system then uses its internal algorithms to create new content in response to the prompt, based on patterns it has learned.
Digging into the underlying technology, generative AI tools usually combine several algorithms to process content. The input is first broken down into basic units, such as turning paragraphs into individual letters, punctuation, and words, or images into distinct visual elements. These units are then encoded as vectors, a numerical representation the AI can use to create fresh output.
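To make that pipeline concrete, here is a deliberately simplified Python sketch: text is split into tokens, each token is mapped to an ID, and each ID is looked up in a table of vectors. The tiny vocabulary, random vector values, and function names are purely illustrative assumptions; real models use vocabularies of tens of thousands of learned tokens and embeddings with hundreds or thousands of dimensions.

```python
import numpy as np

# Toy vocabulary: real models learn tens of thousands of tokens from data.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, ".": 5, "<unk>": 6}

EMBED_DIM = 4  # real models use hundreds or thousands of dimensions
rng = np.random.default_rng(seed=0)
# Random stand-in for the learned embedding table (one vector per token).
embedding_table = rng.normal(size=(len(vocab), EMBED_DIM))

def encode(text: str) -> np.ndarray:
    """Split text into tokens, map them to IDs, and look up their vectors."""
    tokens = text.lower().replace(".", " .").split()
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    return embedding_table[ids]

vectors = encode("The cat sat on the mat.")
print(vectors.shape)  # (7, 4): seven tokens, each encoded as a 4-dimensional vector
```

A real model would then push these vectors through many neural-network layers to predict the most likely next token, repeating that step over and over to build up its response.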
Bias in generative AI
It’s worth noting, when explaining generative AI, that the process of training these models can pass human biases on to them. If there is bias in the original data, whether unconscious or otherwise, that bias will show up in the output later created.
In real-world terms, this has resulted in AI creating images or text containing inaccurate depictions or references. As just one example, Google recently had to suspend image generation of people within Gemini because of historical inaccuracies showing up in its results.
Is the United States a better place to live compared to Nazi Germany? Google Gemini says no. pic.twitter.com/kQTJMZW8a3
— The Rabbit Hole (@TheRabbitHole84) February 27, 2024
Generative AI criticism
This is not the only criticism leveled at generative AI. Artists across various disciplines have complained that deep-learning models are ‘stealing’ from human-made art. After all, as established above, generative AI can’t create anything truly new; it has to learn from existing work.
The data used to ‘teach’ AI models is usually sourced from artistic work. Legally, this must be artwork that is open to the public, but that hasn’t stopped people from resenting the fact that AI images and other forms of output can be created instantaneously, seemingly off the backs of human artists.
How to generate AI images
If you want to try out AI image generation for yourself, there are plenty of tools to choose from. Many graphic design tools, including Adobe, Pixlr, and Canva, have incorporated generative AI into their offerings, but dedicated tools such as Midjourney, DALL-E, Stable Diffusion, and Imagine get the job done as well.
Of course, tools like these aren’t free, with all requiring a paid subscription to use. Midjourney has its own monthly plans, DALL-E comes with the ChatGPT Plus plan at $20 per user, Stable Diffusion’s Basic plan starts at $27 per month, and Imagine starts from as low as $3 a month.
Once you have access to your tool of choice, you can start straight away. Most generative AI tools look much like a chatbot, with a dialog box where you type in your prompt. Getting the best out of your tool requires some nifty prompt writing, and because most tools keep the context of your conversation, results tend to improve as you refine your requests.
You can ask it to tweak images as you go, remaking the same image in different styles or altering specific areas. Every tool differs in how much you can change, but most offer some way to pick out individual elements and alter them across iterations.
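For tools that also expose an API, the same prompt-and-refine loop can be scripted. The sketch below uses OpenAI’s Python client to request a single DALL-E image; it assumes the openai package is installed and an API key is available in the OPENAI_API_KEY environment variable, and the prompt text is just an example.

```python
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# Ask for one image from a plain-text prompt; tweak the prompt and rerun to iterate.
response = client.images.generate(
    model="dall-e-3",
    prompt="A snowy mountain village at dusk, painted in a loose watercolor style",
    size="1024x1024",
    n=1,
)

print(response.data[0].url)  # temporary URL where the generated image can be downloaded
```

Each service has its own interface and pricing, so check the documentation for whichever tool you pick.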
Featured image: Unsplash