Image Generation (NEW!)
Image generation features are simply features that have one or multiple images in the output. They are created in the same way as other AI features in WorkflowAI, either through our web app or through our SDK.
The models supporting image generation can either be specific to image generation, like Imagen or GPT Image 1, or handle both image and text outputs, like Gemini Flash 2.0 Exp.
Image generation can be a bit harder to get right than traditional text-only tasks. Following a few rules gives the best results:
Always explicitly mention in the instructions that the goal is to generate one or multiple images. This is especially important for models like Gemini Flash 2.0 Exp that handle more than just image generation.
For example, for an agent that generates an image based on an animal (string) and a situation (string), your instructions could be:
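As an illustration only, instructions for this agent might look like the string below. The exact wording is not from the original page, the `{{animal}}` and `{{situation}}` placeholder syntax is an assumption about how templated instructions reference input fields, and `render_instructions` is a hypothetical helper that mimics the substitution locally so the final prompt can be inspected.

```python
import re

# Hypothetical instructions; the {{field}} placeholder syntax is an assumption.
INSTRUCTIONS = (
    "Generate a single image of a {{animal}} in the following situation: "
    "{{situation}}."
)


def render_instructions(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with the matching input value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing input field: {name}")
        return variables[name]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


prompt = render_instructions(
    INSTRUCTIONS,
    {"animal": "red panda", "situation": "surfing a wave at sunset"},
)
print(prompt)
```

Note that the instructions state the goal ("Generate a single image") explicitly, and that the rendered prompt is plain text with no leftover braces or JSON.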
Image-specific fields are automatically extracted from the instructions. For example, if your instructions mention that you prefer landscape images, the generated images will have a landscape format.
Here are the variables that are extracted:
quality: The quality of the image to generate. Can be low, medium, or high.
background: The background type for the image. Can be opaque, transparent, or auto.
format: The file type for the image. Can be png, jpeg, or webp.
shape: The shape of the image to generate. Can be square, portrait, or landscape.
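To make the accepted values concrete, the four fields can be modelled as a small typed structure. This is only a sketch: `ImageOptions` and its validation are hypothetical and not part of the SDK, since the platform extracts these values from the instructions for you. Here `None` is assumed to mean "not specified in the instructions".

```python
from dataclasses import dataclass, fields
from typing import Optional

# Allowed values, copied from the list of extracted variables.
ALLOWED = {
    "quality": {"low", "medium", "high"},
    "background": {"opaque", "transparent", "auto"},
    "format": {"png", "jpeg", "webp"},
    "shape": {"square", "portrait", "landscape"},
}


@dataclass(frozen=True)
class ImageOptions:
    """Hypothetical container; None means the field was not specified."""

    quality: Optional[str] = None
    background: Optional[str] = None
    format: Optional[str] = None
    shape: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject any value outside the documented set for its field.
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None and value not in ALLOWED[f.name]:
                raise ValueError(
                    f"{f.name} must be one of {sorted(ALLOWED[f.name])}, got {value!r}"
                )


# Instructions that ask for "a landscape PNG" would map to something like this.
options = ImageOptions(shape="landscape", format="png")
```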
It is possible to generate multiple images in a single call:
either by having multiple image fields in the output
or by having a list of images in the output. In this case, an image_count field should be added to the root of the input to control how many images are generated.
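The list-based option can be sketched as a pair of schemas. The `image_count` field name and its position at the root of the input come from the text above; everything else (`Image`, `LogoInput`, `LogoOutput`, and the `fake_generate` stub standing in for the model call) is hypothetical and uses plain dataclasses rather than the SDK's types.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Stand-in for an image value (hypothetical shape, not the SDK type)."""

    content_type: str
    data: bytes


@dataclass
class LogoInput:
    company_name: str
    image_count: int  # root-level field controlling how many images are generated


@dataclass
class LogoOutput:
    images: list[Image] = field(default_factory=list)


def fake_generate(logo_input: LogoInput) -> LogoOutput:
    """Stub for the model call: returns image_count placeholder images."""
    if logo_input.image_count < 1:
        raise ValueError("image_count must be at least 1")
    return LogoOutput(
        images=[
            Image("image/png", f"{logo_input.company_name}-{i}".encode())
            for i in range(logo_input.image_count)
        ]
    )


result = fake_generate(LogoInput(company_name="Acme", image_count=3))
print(len(result.images))
```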
An image editing feature is simply an image generation feature that contains an image as an input.
An additional mask image field can be added to the input schema to allow masking the image.
Not all models that support image generation support image editing.
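As a rough sketch of what an editing feature's schemas could look like: the input carries the image to edit plus the optional `mask` field mentioned above. The class names, the `edit_request` field, and the stand-in `Image` type are hypothetical, and how a given model interprets the mask is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    """Stand-in for an image value (hypothetical shape, not the SDK type)."""

    content_type: str
    data: bytes


@dataclass
class EditImageInput:
    image: Image                  # the image to edit
    edit_request: str             # hypothetical free-text description of the edit
    mask: Optional[Image] = None  # optional mask image


@dataclass
class EditImageOutput:
    image: Image


def is_editing_input(value: object) -> bool:
    """An input describes an editing feature if any of its fields holds an Image."""
    return any(isinstance(v, Image) for v in vars(value).values())


request = EditImageInput(
    image=Image("image/png", b"original-bytes"),
    edit_request="Replace the sky with a starry night",
)
print(is_editing_input(request))
```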
The code for an image generation feature is very similar to the code for any other feature. The only difference is that the output contains an Image.
For example, in Python:
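The sketch below shows the general shape under stated assumptions. It deliberately avoids importing the SDK: `Image` is a stand-in dataclass for the SDK's image type, and the function body is a placeholder where a real feature would hand the call over to the platform. All names are hypothetical.

```python
import asyncio
import base64
from dataclasses import dataclass


@dataclass
class Image:
    """Stand-in for the SDK's image type (hypothetical shape)."""

    content_type: str
    data: str  # base64-encoded image bytes


@dataclass
class AnimalSceneInput:
    animal: str
    situation: str


@dataclass
class AnimalSceneOutput:
    image: Image  # the only difference from a text-only feature


async def generate_animal_scene(scene: AnimalSceneInput) -> AnimalSceneOutput:
    """Generate an image of the animal in the given situation."""
    # Placeholder body: a real feature would delegate to the image model here.
    fake_bytes = f"{scene.animal} / {scene.situation}".encode()
    return AnimalSceneOutput(
        image=Image(
            content_type="image/png",
            data=base64.b64encode(fake_bytes).decode(),
        )
    )


output = asyncio.run(
    generate_animal_scene(AnimalSceneInput("otter", "reading a newspaper"))
)
print(output.image.content_type)
```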
In principle, generating both images and text in the same output is currently supported by Gemini Flash 2.0 Exp. In practice, it can be quite difficult to get consistent results.
Use templated instructions to inline your input into the message sent to the LLM. Some models, like GPT Image 1, are more tolerant of prompts that are not plain text, but others, like Imagen, are very sensitive to non-alphanumeric characters and will generate gibberish if prompted incorrectly.
If this is something you'd like to see us support more widely, please submit a feature request.