# Image Generation (NEW!)

Image generation features are simply features that have one or multiple images in the output.\
Image generation AI features are created in the same way as other AI features in WorkflowAI, either through our\
web app at [WorkflowAI.com](https://workflowai.com), or through our SDK.

The models supporting image generation can be either specific to image generation like Imagen or GPT Image 1, or\
can handle both image an text outputs like Gemini Flash 2.0 Exp.

![Image Generation of a Sunset](/files/Bt8ylH3CiHBts6eNkNYQ)

## What are the best practices when dealing with Image generation ?

Image generation can be a bit more difficult to get right compared to traditional text only tasks. It is good\
to follow a few rules for best results:

* always explicitly mention that the goal is to generate one or multiple images in the instructions. This is\
  especially required for models like Gemini Flash 2.0 Exp that handle more than just image generation.
* use templated instructions (see [Variables in instructions](/python-sdk/agent.md#variables-in-instructions))\
  to inline your input into the message sent to the LLM. Some models like\
  GPT Image 1 are more tolerant to non plain text prompts but others like Imagen are really sensitive\
  to non alphanumeric characters and will generate gibberish if prompted incorrectly.

For example, for an agent that generates an image based on an animal (`string`) and a situation (`string`),\
your instructions could be:

```
Generate an image that represents an {{ animal }} in the situation: {{ situation }}.
```

## How to customize image generation ?

Image specific fields are automatically extracted from the instructions. For example, if your instructions\
mention that you prefer landscape images, the generated images will have a landscape format.

Here are the variables that are extracted

* `quality`: The quality of the image to generate. Can be `low`, `medium`, or `high`.
* `background`: The background type for the image. Can be `opaque`, `transparent`, or `auto`.
* `format`: The file type for the image. Can be `png`, `jpeg`, or `webp`.
* `shape`: The shape of the image to generate. Can be `square`, `portrait`, or `landscape`

{% hint style="info" %}
Not all models support all parameters so models may return images that do not exactly match the desired\
configuration.
{% endhint %}

## Generating multiple images

It is possible to generate multiple images in a single call:

* either by having multiple image fields in the output
* or by having a list of images in the output. In this case, an `image_count` field should be added to the\
  root of the input to control how many images are generated.

![Multiple images](/files/pJkRrYfrBwmAVJJTZyMo)

## How to edit existing Images ?

An image editing feature is simply an image generation feature that contains an image as an input.

An additional `mask` image field can be added to the input schema to allow masking the image.

Not all model supporting image generation support image editing.

![Edited image](/files/C26tzdPVJTncVSOdcEB8)

## What about generated code ?

The code corresponding to image generation features is very similar to the code for any other features.\
The only difference is that it contains an Image in the output.

For example, in python:

```python
from pydantic import BaseModel

import workflowai
from workflowai.fields import Image


class ImageGenerationInput(BaseModel):
    prompt: str


class ImageGenerationOutput(BaseModel):
    image: Image


@workflowai.agent(id="image-generation", model="gpt-image-1")
async def image_generation(_: ImageGenerationInput) -> ImageGenerationOutput: ...


async def generate_image(prompt: str):
    run = await image_generation.run(ImageGenerationInput(prompt=prompt))
    image = run.output.image
    print(image.content_type)
    print(image.data)  # data is the base64 encoded image data
```

## Can I create a feature that outputs an image AND text?

Supposedly, generating image and text is currently supported by Gemini Flash 2.0 Exp. In practice, it can be\
quite difficult to get consistent results.

If this is something you'd like to see us widely support, please add a feature request\
on [GitHub](https://github.com/workflowai/workflowai/discussions) or [Discord](https://discord.com/invite/qcsq8Kva).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.workflowai.com/use-cases/image-generation.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
