Image Generation (NEW!)

Image generation features are simply features that have one or multiple images in the output. Image generation AI features are created in the same way as other AI features in WorkflowAI, either through our web app at WorkflowAI.com, or through our SDK.

The models supporting image generation can be either specific to image generation like Imagen or GPT Image 1, or can handle both image an text outputs like Gemini Flash 2.0 Exp.

What are the best practices when dealing with Image generation ?

Image generation can be a bit more difficult to get right compared to traditional text only tasks. It is good to follow a few rules for best results:

always explicitly mention that the goal is to generate one or multiple images in the instructions. This is especially required for models like Gemini Flash 2.0 Exp that handle more than just image generation.
use templated instructions (see Variables in instructions) to inline your input into the message sent to the LLM. Some models like GPT Image 1 are more tolerant to non plain text prompts but others like Imagen are really sensitive to non alphanumeric characters and will generate gibberish if prompted incorrectly.

For example, for an agent that generates an image based on an animal (string) and a situation (string), your instructions could be:

Generate an image that represents an {{ animal }} in the situation: {{ situation }}.

How to customize image generation ?

Image specific fields are automatically extracted from the instructions. For example, if your instructions mention that you prefer landscape images, the generated images will have a landscape format.

Here are the variables that are extracted

quality: The quality of the image to generate. Can be low, medium, or high.
background: The background type for the image. Can be opaque, transparent, or auto.
format: The file type for the image. Can be png, jpeg, or webp.
shape: The shape of the image to generate. Can be square, portrait, or landscape

Not all models support all parameters so models may return images that do not exactly match the desired configuration.

Generating multiple images

It is possible to generate multiple images in a single call:

either by having multiple image fields in the output
or by having a list of images in the output. In this case, an image_count field should be added to the root of the input to control how many images are generated.

How to edit existing Images ?

An image editing feature is simply an image generation feature that contains an image as an input.

An additional mask image field can be added to the input schema to allow masking the image.

Not all model supporting image generation support image editing.

What about generated code ?

The code corresponding to image generation features is very similar to the code for any other features. The only difference is that it contains an Image in the output.

For example, in python:

from pydantic import BaseModel

import workflowai
from workflowai.fields import Image


class ImageGenerationInput(BaseModel):
    prompt: str


class ImageGenerationOutput(BaseModel):
    image: Image


@workflowai.agent(id="image-generation", model="gpt-image-1")
async def image_generation(_: ImageGenerationInput) -> ImageGenerationOutput: ...


async def generate_image(prompt: str):
    run = await image_generation.run(ImageGenerationInput(prompt=prompt))
    image = run.output.image
    print(image.content_type)
    print(image.data)  # data is the base64 encoded image data

Can I create a feature that outputs an image AND text?

Supposedly, generating image and text is currently supported by Gemini Flash 2.0 Exp. In practice, it can be quite difficult to get consistent results.

If this is something you'd like to see us widely support, please add a feature request on GitHub or Discord.

PreviousImproving your AI Feature NextSchemas

Last updated 5 months ago

Was this helpful?