@workflowai.agent

Star the repository on GitHub to get notified when new models are added.

Introduction

WorkflowAI takes a different approach to LLMs that simplifies development while providing more structure and reliability.

Traditional LLM Approach vs. WorkflowAI

Let's compare approaches with a practical example: extracting positive and negative aspects from customer reviews.

Traditional Approach

With traditional LLM frameworks, you might create a prompt like this:

```
Based on the following reviews:
{{ reviews }}

Identify what are the positive and negative aspects.
Make sure you return the output in JSON format:
{
    "positive_aspects": [string],
    "negative_aspects": [string]
}
```

This prompt combines several elements:

| Component | Description |
| --- | --- |
| Instructions | "Identify what are the positive and negative aspects." |
| Variables | {{ reviews }} - the data to be processed |
| Output format | A JSON structure with arrays of positive and negative aspects |

Despite explicitly requesting JSON output, there's no guarantee the model will comply. The LLM might return malformed JSON, skip the format entirely, or include incorrect fields—requiring you to write additional validation code.

WorkflowAI Approach

The same task in WorkflowAI becomes more structured and type-safe:
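Here is a minimal sketch of what that agent could look like with the Python SDK; the agent id, model choice, and field names are illustrative:

```python
import workflowai
from pydantic import BaseModel, Field
from workflowai import Model


class ReviewsInput(BaseModel):
    reviews: list[str] = Field(description="The customer reviews to analyze")


class AspectsOutput(BaseModel):
    positive_aspects: list[str] = Field(default_factory=list)
    negative_aspects: list[str] = Field(default_factory=list)


# The agent id and the model enum member are illustrative.
@workflowai.agent(id="review-aspect-extraction", model=Model.GPT_4O_LATEST)
async def extract_aspects(agent_input: ReviewsInput) -> AspectsOutput:
    """Identify the positive and negative aspects mentioned in the reviews."""
    ...
```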

WorkflowAI guarantees your output will match the defined schema by validating responses and automatically handling invalid data. No more worrying about malformed JSON or writing extensive error handling code.

WorkflowAI generates optimal prompts by combining your Pydantic models, docstring instructions, and any additional context. Benefits include automatic type validation, cleaner code architecture, and consistently reliable outputs.

Let's explore how this works by breaking down the different parts of an agent:

  1. Schema (input, output)
  2. Instructions
  3. Temperature
  4. Model

Optionally, an agent can also have tools, which will be explained in the Tools section.

Schema (input, output)

The schema has two structured parts:

| Part | Description |
| --- | --- |
| Input | Defines the variables that the agent will receive as input |
| Output | Defines the variables that the agent will return as output |

The input and output are defined using Pydantic models.

A very simple example of a schema is the following, where the agent receives a question as input and returns an answer as output.
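For instance, a minimal question/answer schema could look like this (class and field names are illustrative):

```python
from pydantic import BaseModel


class QuestionInput(BaseModel):
    question: str


class AnswerOutput(BaseModel):
    answer: str
```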

Find more examples of schemas in the Schemas section.

Descriptions

Adding descriptions to the input and output fields is optional, but it is good practice: descriptions are included in the final prompt sent to the LLM and help align the agent's behavior.
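As a sketch, a description is attached through Pydantic's Field:

```python
from pydantic import BaseModel, Field


class AnswerOutput(BaseModel):
    answer: str = Field(
        description="A concise, factual answer to the question, written for a non-technical reader",
    )
```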

Examples

Another effective way to align the agent's behavior is to provide examples for output fields.
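As a sketch, examples can be attached to an output field the same way (the values are illustrative):

```python
from pydantic import BaseModel, Field


class AnswerOutput(BaseModel):
    answer: str = Field(
        description="A concise, factual answer to the question",
        examples=["Paris is the capital of France."],
    )
```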

There are very few use cases for descriptions and examples on input fields: the LLM will most of the time infer what it needs from the value that is passed.

Required versus optional fields

In short, we recommend using default values for most output fields.

Pydantic is rather strict on model validation by default: if a field has no default value, it must be provided. Although the fact that a field is required is passed to the model, the generation can sometimes come back with missing, null, or empty values, which would then fail validation.
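A sketch of output fields with defaults, so that a missing or null value from the model does not fail validation:

```python
from pydantic import BaseModel, Field


class AspectsOutput(BaseModel):
    # Defaults keep validation from failing if the model omits a field
    # or returns a null/empty value.
    positive_aspects: list[str] = Field(default_factory=list)
    negative_aspects: list[str] = Field(default_factory=list)
    summary: str = ""
```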

Instructions

Instructions help the agent understand the task it needs to perform. Use the function's docstring to add instructions to the agent.

Instructions are automatically passed to the LLM via the system prompt.
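A minimal sketch, with the instructions carried by the docstring (the agent id and model enum member are illustrative):

```python
import workflowai
from pydantic import BaseModel
from workflowai import Model


class QuestionInput(BaseModel):
    question: str


class AnswerOutput(BaseModel):
    answer: str


# The agent id and the model enum member are illustrative.
@workflowai.agent(id="answer-question", model=Model.GPT_4O_MINI_LATEST)
async def answer_question(agent_input: QuestionInput) -> AnswerOutput:
    """
    Answer the user's question factually and concisely.
    If you are not sure of the answer, say so explicitly.
    """
    ...
```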

Variables in instructions

You can customize your agent's instructions using Jinja2 template variables in the docstring. These variables are automatically filled with values from your input model's fields, giving you precise control over the final prompt.

Example: Code Review Agent

We recommend using CursorAI, Claude or ChatGPT to help generate the Jinja2 template.
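A sketch of a code-review agent whose docstring is a Jinja2 template over the input fields (names, fields, and model are illustrative):

```python
import workflowai
from pydantic import BaseModel, Field
from workflowai import Model


class CodeReviewInput(BaseModel):
    language: str = Field(description="Programming language of the diff")
    diff: str = Field(description="The code changes to review")
    focus_areas: list[str] = Field(default_factory=list)


class CodeReviewOutput(BaseModel):
    comments: list[str] = Field(default_factory=list)


# The agent id and the model enum member are illustrative.
@workflowai.agent(id="code-review", model=Model.GPT_4O_LATEST)
async def review_code(review_input: CodeReviewInput) -> CodeReviewOutput:
    """
    You are a senior {{ language }} engineer reviewing the following diff:

    {{ diff }}

    {% if focus_areas %}
    Pay particular attention to:
    {% for area in focus_areas %}
    {{ loop.index }}. {{ area }}
    {% endfor %}
    {% endif %}
    """
    ...
```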

The template uses Jinja2 syntax and supports common templating features including:

  • Variable substitution: {{ variable }}

  • Conditionals: {% if condition %}...{% endif %}

  • Loops: {% for item in items %}...{% endfor %}

  • Loop indices: {{ loop.index }}

See the Jinja2 documentation for the full template syntax and capabilities.

Temperature

The temperature is a parameter that controls the randomness of the output. It is a float between 0 and 1. The default temperature is 0.

Model

The model is the LLM that will be used to generate the output. WorkflowAI offers a unified interface for all the models it supports from OpenAI, Anthropic, Google, and more. Simply pass the model you want to use to the model parameter.

The list of models supported by WorkflowAI is available here, but you can also see the list of models from the playground, for a more user-friendly experience.

Set the model in the @agent decorator.
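For example (QuestionInput and AnswerOutput are the models from the Instructions example above; the enum member is illustrative):

```python
import workflowai
from workflowai import Model


# The enum member below is illustrative; pick any supported model.
@workflowai.agent(id="answer-question", model=Model.CLAUDE_3_5_SONNET_LATEST)
async def answer_question(agent_input: QuestionInput) -> AnswerOutput:
    """Answer the question."""
    ...
```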

When a model is retired or deprecated, WorkflowAI automatically upgrades it to the latest compatible version with equivalent or better pricing. This ensures your agents continue working seamlessly without any code changes needed on your end.

Supported models

When building an agent that uses images or audio, you need a model that supports multimodality. Use the list_models() function to get the list of models and verify that they support your use case by checking the is_not_supported_reason field.

The list_models() function is a powerful way to programmatically discover which models are compatible with your agent's requirements. This is especially important for multimodal agents that handle images or audio, as not all models support these capabilities. You can use this information to dynamically select the most appropriate model at runtime or to provide fallback options.
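A sketch of filtering models for an agent, assuming list_models() is exposed on the agent function as described above:

```python
async def supported_models():
    # is_not_supported_reason is described above; it is empty for models
    # that can run this agent.
    models = await answer_question.list_models()
    return [m for m in models if not m.is_not_supported_reason]
```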

Running the agent

To run the agent, simply call the run function with an input.

When you call run, the associated agent will be created on WorkflowAI Cloud (or your self-hosted server) if it does not already exist.

The agent id will be a slugified version of the function name unless specified explicitly using the id parameter, which is recommended.
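A minimal sketch of running the agent defined earlier (calling the decorated function executes a run):

```python
import asyncio


async def main() -> None:
    output = await answer_question(QuestionInput(question="What is the capital of France?"))
    print(output.answer)


if __name__ == "__main__":
    asyncio.run(main())
```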

Override the default model

You can also pass a model parameter to the agent function itself to override the default model set in the @agent decorator.
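For example, inside an async function (the enum member is illustrative):

```python
output = await answer_question(
    QuestionInput(question="What is the capital of France?"),
    model=Model.GPT_4O_MINI_LATEST,  # overrides the model set in the decorator
)
```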

Cost, latency

WorkflowAI automatically tracks the cost and latency of each run, and makes it available in the run object.
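A sketch, assuming the return type can be annotated with Run[...] to get the full run object, and that cost_usd and duration_seconds are the attribute names for cost and latency:

```python
import workflowai
from workflowai import Model, Run


# QuestionInput / AnswerOutput are the models from the Instructions example above.
@workflowai.agent(id="answer-question", model=Model.GPT_4O_MINI_LATEST)
async def answer_question_run(agent_input: QuestionInput) -> Run[AnswerOutput]:
    """Answer the question."""
    ...


async def show_run_metrics() -> None:
    run = await answer_question_run(QuestionInput(question="What is the capital of France?"))
    print(run.output.answer)
    # cost_usd / duration_seconds are assumed attribute names for cost and latency.
    print(run.cost_usd, run.duration_seconds)
```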

Streaming

WorkflowAI also supports streaming the output, using the stream method. The stream method returns an AsyncIterator, so you can use it in an async for loop.

Even when using streaming, partial outputs are returned as valid output schemas.
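A sketch, assuming the agent exposes the stream() method described above and that each chunk exposes the output fields directly:

```python
async def stream_answer() -> None:
    question = QuestionInput(question="What is the capital of France?")
    # Each chunk is a partial but schema-valid output, per the note above.
    async for chunk in answer_question.stream(question):
        print(chunk.answer)
```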

View the prompt

To access the exact prompt sent by WorkflowAI to the AI provider, as well as the raw response, you can use fetch_completions on a run object.
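A sketch, assuming fetch_completions is awaitable and run is a Run object as in the cost example above:

```python
async def inspect_prompt(run) -> None:
    completions = await run.fetch_completions()
    for completion in completions:
        # Each completion carries the messages sent to the provider and the
        # raw response; print it to inspect the exact prompt.
        print(completion)
```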

The fetch_completions method is particularly useful for debugging, understanding token usage, and auditing the exact interactions with the underlying AI models. This can help you optimize prompts, analyze costs, and ensure the model is receiving the expected instructions.

Error handling

Read more about error handling in the Errors section.

Cache

To save money and improve latency, WorkflowAI supports caching.

By default, the cache setting is auto, meaning that agent runs are cached when the temperature is 0 (the default) and no tools are used. In other words, running the same agent (without tools) twice with the exact same input returns the exact same output without calling the underlying model a second time.

The cache usage string literal is defined in the cache_usage.py file. There are 3 possible values (a usage sketch follows the list below):

  • auto: (default) Use cached results only when temperature is 0, and no tools are used

  • always: Always use cached results if available, regardless of model temperature

  • never: Never use cached results, always execute a new run
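A sketch of forcing the cache, assuming the agent call accepts a use_cache argument with one of the values above:

```python
# Inside an async function: use cached results regardless of temperature.
output = await answer_question(
    QuestionInput(question="What is the capital of France?"),
    use_cache="always",  # "auto" (default), "always", or "never"
)
```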

Reply to a run

For some use cases (for example, chatbots), you may want to reply to a previously created run to maintain conversation history. Use the reply method on the Run object.

For example, a simple travel chatbot agent can be created as follows:
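A sketch; the agent id, fields, and the keyword used to pass the follow-up message to reply are illustrative:

```python
import workflowai
from pydantic import BaseModel, Field
from workflowai import Model, Run


class TravelQuestion(BaseModel):
    question: str = Field(description="The traveler's question")


class TravelAnswer(BaseModel):
    answer: str = ""


# The agent id and the model enum member are illustrative.
@workflowai.agent(id="travel-chatbot", model=Model.GPT_4O_MINI_LATEST)
async def travel_chatbot(question: TravelQuestion) -> Run[TravelAnswer]:
    """Help the user plan their trip and answer travel questions."""
    ...


async def chat() -> None:
    run = await travel_chatbot(TravelQuestion(question="Where should I go in Japan in April?"))
    print(run.output.answer)

    # Reply to the run to keep the conversation history.
    # The user_message keyword is an assumption; check the SDK for the exact name.
    run = await run.reply(user_message="What about in November?")
    print(run.output.answer)
```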

When using run.reply, WorkflowAI will automatically keep the conversation history.

You can continue to reply to the run as many times as you want.

Another use-case for run.reply is to ask a follow-up question, or ask the LLM to double-check its previous answer.

Using multiple clients

You might want to avoid using the shared client, for example if you are using multiple API keys or accounts. It is possible to achieve this by manually creating client instances.
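A sketch, assuming a WorkflowAI client class that mirrors the module-level decorator (the constructor arguments and endpoint are illustrative):

```python
import os

import workflowai


# A dedicated client, e.g. for a second API key or a self-hosted instance.
client = workflowai.WorkflowAI(
    api_key=os.environ["OTHER_WORKFLOWAI_API_KEY"],
    url="https://run.workflowai.com",  # endpoint shown is illustrative
)


# QuestionInput / AnswerOutput are the models from the Instructions example above.
@client.agent(id="answer-question")
async def answer_question_other_account(agent_input: QuestionInput) -> AnswerOutput:
    """Answer the question."""
    ...
```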

Field properties

Pydantic allows a variety of other validation criteria for fields: minimum, maximum, pattern, etc. These additional criteria are included in the JSON Schema that is sent to WorkflowAI and are passed to the model.

These arguments can be used to steer the model in the right direction, but a validation that is too strict can lead to invalid generations. In case of an invalid generation (a handling sketch follows the list below):

  • WorkflowAI retries the inference once, providing the model with the invalid output and the validation error

  • If the model still fails to generate a valid output, the run fails with an InvalidGenerationError. The partial output is available in the partial_output attribute of the InvalidGenerationError
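A sketch combining a constrained field with handling of the invalid-generation case; the import path for InvalidGenerationError is an assumption, while partial_output is documented above:

```python
import workflowai
from pydantic import BaseModel, Field
from workflowai import Model

# The import path below is an assumption; the class name comes from the docs above.
from workflowai import InvalidGenerationError


class ReviewInput(BaseModel):
    review: str


class RatingOutput(BaseModel):
    # The constraint is included in the JSON schema sent to the model.
    rating: int = Field(ge=1, le=5, description="Overall rating from 1 (bad) to 5 (great)")


# The agent id and the model enum member are illustrative.
@workflowai.agent(id="rate-review", model=Model.GPT_4O_MINI_LATEST)
async def rate_review(review: ReviewInput) -> RatingOutput:
    """Rate the overall sentiment of the review on a 1-5 scale."""
    ...


async def safe_rate(review: ReviewInput) -> RatingOutput | None:
    try:
        return await rate_review(review)
    except InvalidGenerationError as e:
        # After one automatic retry, the run fails; the best-effort output is kept here.
        print("Invalid generation:", e.partial_output)
        return None
```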
