@workflowai.agent
Introduction
WorkflowAI takes a different approach to LLMs that simplifies development while providing more structure and reliability.
Traditional LLM Approach vs. WorkflowAI
Let's compare approaches with a practical example: extracting positive and negative aspects from customer reviews.
Traditional Approach
With traditional LLM frameworks, you might create a prompt like this:
```
Based on the following reviews:
{{ reviews }}

Identify what are the positive and negative aspects.

Make sure you return the output in JSON format:
{
  "positive_aspects": [string],
  "negative_aspects": [string]
}
```
This prompt combines several elements:
- Instructions: "Identify what are the positive and negative aspects."
- Variables: {{ reviews }} - the data to be processed
- Output format: a JSON structure with arrays of positive and negative aspects
Despite explicitly requesting JSON output, there's no guarantee the model will comply. The LLM might return malformed JSON, skip the format entirely, or include incorrect fields—requiring you to write additional validation code.
WorkflowAI Approach
The same task in WorkflowAI becomes more structured and type-safe:
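A minimal sketch of what this can look like with the Python SDK is shown below; the agent id, model enum member, and class and field names are illustrative:

```python
import workflowai
from pydantic import BaseModel, Field
from workflowai import Model


class ReviewsInput(BaseModel):
    reviews: list[str] = Field(description="The customer reviews to analyze")


class AspectsOutput(BaseModel):
    positive_aspects: list[str] = Field(
        default_factory=list,
        description="Positive aspects mentioned in the reviews",
    )
    negative_aspects: list[str] = Field(
        default_factory=list,
        description="Negative aspects mentioned in the reviews",
    )


# The agent id and model enum member below are illustrative
@workflowai.agent(id="review-aspect-extraction", model=Model.GPT_4O_LATEST)
async def extract_review_aspects(reviews_input: ReviewsInput) -> AspectsOutput:
    """Identify what are the positive and negative aspects."""
    ...
```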
WorkflowAI guarantees your output will match the defined schema by validating responses and automatically handling invalid data. No more worrying about malformed JSON or writing extensive error handling code.
WorkflowAI generates optimal prompts by combining your Pydantic models, docstring instructions, and any additional context. Benefits include automatic type validation, cleaner code architecture, and consistently reliable outputs.
Let's explore how this works by breaking down the different parts of an agent:
Schema (input, output)
Optionally, an agent can also have tools, which will be explained in the Tools section.
Schema (input, output)
The schema has two structured parts:
- Input: defines the variables that the agent will receive as input
- Output: defines the variables that the agent will return as output
The input and output are defined using Pydantic models.
A very simple example of a schema is the following, where the agent receives a question as input and returns an answer as output.
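For illustration, a minimal sketch of such a schema (class and field names are illustrative):

```python
from pydantic import BaseModel


class AnswerQuestionInput(BaseModel):
    question: str


class AnswerQuestionOutput(BaseModel):
    answer: str
```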
Read more about why schemas are a good idea in the Schemas section.
Descriptions
Adding descriptions to the input and output fields is optional, but it's a good practice to do so, as descriptions will be included in the final prompt sent to the LLM, and will help align the agent's behavior.
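For example, the schema above could be annotated like this (the descriptions themselves are illustrative):

```python
from pydantic import BaseModel, Field


class AnswerQuestionInput(BaseModel):
    question: str = Field(description="The question asked by the user")


class AnswerQuestionOutput(BaseModel):
    answer: str = Field(description="A concise, factual answer to the question")
```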
Examples
Another effective way to align the agent's behavior is to provide examples for output fields.
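A sketch using Pydantic's examples argument (the example value is illustrative):

```python
from pydantic import BaseModel, Field


class AnswerQuestionOutput(BaseModel):
    answer: str = Field(
        description="A concise, factual answer to the question",
        examples=["Paris is the capital of France."],
    )
```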
Required versus optional fields
In short, we recommend using default values for most output fields.
Pydantic is rather strict on model validation by default: if a field has no default value, it must be provided. Although the fact that a field is required is passed to the model, the generation can sometimes omit a required field or return null or empty values, which would cause validation to fail.
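For example, output fields can be given defaults so that a missing or null value does not fail validation (field names are illustrative):

```python
from pydantic import BaseModel, Field


class AspectsOutput(BaseModel):
    # With defaults, a field the model omits does not fail validation
    positive_aspects: list[str] = Field(default_factory=list)
    negative_aspects: list[str] = Field(default_factory=list)
    summary: str = ""
```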
Instructions
Instructions help the agent understand the task it needs to perform. Use the agent function's docstring to add instructions to the agent.
Instructions are automatically passed to the LLM via the system prompt.
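For example, reusing the question-answering schema sketched above (the agent id is illustrative):

```python
import workflowai


@workflowai.agent(id="answer-question")
async def answer_question(question_input: AnswerQuestionInput) -> AnswerQuestionOutput:
    """
    Answer the user's question accurately and concisely.
    If you are unsure of the answer, say so instead of guessing.
    """
    ...
```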
Variables in instructions
You can customize your agent's instructions using Jinja2 template variables in the docstring. These variables are automatically filled with values from your input model's fields, giving you precise control over the final prompt.
We recommend using CursorAI, Claude or ChatGPT to help generate the Jinja2 template.
The template uses Jinja2 syntax and supports common templating features including:
- Variable substitution: {{ variable }}
- Conditionals: {% if condition %}...{% endif %}
- Loops: {% for item in items %}...{% endfor %}
- Loop indices: {{ loop.index }}
See the Jinja2 documentation for the full template syntax and capabilities.
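Below is a sketch of templated instructions, assuming an input model with reviews and tone fields (all names are illustrative):

```python
import workflowai
from pydantic import BaseModel


class SummarizeReviewsInput(BaseModel):
    reviews: list[str]
    tone: str = "neutral"


class SummaryOutput(BaseModel):
    summary: str


@workflowai.agent(id="summarize-reviews")
async def summarize_reviews(summarize_input: SummarizeReviewsInput) -> SummaryOutput:
    """
    Summarize the following reviews in a {{ tone }} tone:
    {% for review in reviews %}
    {{ loop.index }}. {{ review }}
    {% endfor %}
    """
    ...
```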
Temperature
The temperature is a parameter that controls the randomness of the output. It is a float between 0 and 1. The default temperature is 0.
Model
The model is the LLM that will be used to generate the output. WorkflowAI offers a unified interface for all the models it supports from OpenAI, Anthropic, Google, and more. Simply pass the model you want to use to the model parameter.
Set the model in the @agent decorator.
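For example (the exact Model enum member name may differ in your SDK version):

```python
import workflowai
from workflowai import Model


@workflowai.agent(id="answer-question", model=Model.GPT_4O_MINI_LATEST)
async def answer_question(question_input: AnswerQuestionInput) -> AnswerQuestionOutput:
    """Answer the user's question accurately and concisely."""
    ...
```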
Supported models
When building an agent that uses images or audio, you need to use a model that supports multimodality. Use the list_models() function to get the list of models and check whether they support your use case by checking the is_not_supported_reason field.
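A sketch of how this check could look; where list_models() is exposed (here assumed to be on the decorated agent) and the attributes other than is_not_supported_reason are assumptions:

```python
async def print_usable_models():
    # Assumption: list_models() is available on the decorated agent
    models = await answer_question.list_models()
    for model in models:
        if model.is_not_supported_reason is None:
            print(model.id)  # the model can be used with this agent
        else:
            print(f"{model.id}: {model.is_not_supported_reason}")
```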
Running the agent
Before you run the agent, make sure you have set up the WorkflowAI client.
To run the agent, simply call the run function with an input.
When you call run, the associated agent will be created on WorkflowAI Cloud (or your self-hosted server) if it does not already exist.
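A minimal sketch, reusing the question-answering agent from above and assuming the decorated agent exposes a run method that returns a run object:

```python
import asyncio


async def main():
    # Assumption: run() returns a run object whose output matches the Pydantic output model
    run = await answer_question.run(
        AnswerQuestionInput(question="What is the capital of France?")
    )
    print(run.output.answer)


asyncio.run(main())
```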
Override the default model
You can also pass a model parameter to the agent function itself to specify the model you want to use, and override the default model set in the @agent decorator.
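For example (the enum member name is illustrative):

```python
run = await answer_question.run(
    AnswerQuestionInput(question="What is the capital of France?"),
    model=Model.CLAUDE_3_5_SONNET_LATEST,  # overrides the model set in the decorator
)
```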
Cost, latency
WorkflowAI automatically tracks the cost and latency of each run, and makes it available in the run object.
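For example (the exact attribute names on the run object are assumptions):

```python
run = await answer_question.run(AnswerQuestionInput(question="What is the capital of France?"))

# Attribute names are assumptions; inspect the run object in your SDK version
print(f"Cost: ${run.cost_usd}")
print(f"Latency: {run.duration_seconds:.2f}s")
```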
Streaming
WorkflowAI also supports streaming the output, using the stream method. The stream method returns an AsyncIterator, so you can use it in an async for loop.
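A sketch of streaming the partial output as it is generated (the attributes on each chunk are assumptions):

```python
async for chunk in answer_question.stream(
    AnswerQuestionInput(question="What is the capital of France?")
):
    # Assumption: each chunk exposes the partial output parsed so far
    print(chunk.output)
```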
View the prompt
To access the exact prompt sent by WorkflowAI to the AI provider, as well as the raw response, you can use fetch_completions on a run object.
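For example (only fetch_completions itself is documented here; the attribute names on each completion are assumptions):

```python
run = await answer_question.run(AnswerQuestionInput(question="What is the capital of France?"))

completions = await run.fetch_completions()
for completion in completions:
    # Attribute names below are assumptions; inspect the completion object in your SDK version
    print(completion.messages)   # the exact prompt sent to the provider
    print(completion.response)   # the raw response returned by the model
```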
Error handling
Read more about error handling in the Errors section.
Cache
To save money and improve latency, WorkflowAI supports caching.
By default, the cache setting is auto, meaning that agent runs are cached when the temperature is 0 (the default temperature value) and no tools are used. In other words, when you run the same agent (without tools) twice with the exact same input, the exact same output is returned and the underlying model is not called a second time.
The cache usage string literal is defined in the cache_usage.py file. There are 3 possible values:

- auto: (default) use cached results only when the temperature is 0 and no tools are used
- always: always use cached results if available, regardless of the model temperature
- never: never use cached results, always execute a new run
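A sketch of forcing cache behavior on a single run; the use_cache parameter name is an assumption based on the values above:

```python
run = await answer_question.run(
    AnswerQuestionInput(question="What is the capital of France?"),
    use_cache="always",  # assumption: the run method accepts one of the cache usage values
)
```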
Reply to a run
For some use-cases (for example, chatbots), you want to reply to a previously created run to maintain conversation history. Use the reply method from the Run object.
For example, a simple travel chatbot agent can be created as follows:
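The sketch below uses illustrative class names and assumes that reply accepts a user_message argument:

```python
import workflowai
from pydantic import BaseModel


class ChatInput(BaseModel):
    user_message: str


class ChatOutput(BaseModel):
    assistant_message: str


@workflowai.agent(id="travel-chatbot")
async def travel_chatbot(chat_input: ChatInput) -> ChatOutput:
    """You are a friendly travel assistant. Help the user plan their trip."""
    ...


run = await travel_chatbot.run(ChatInput(user_message="I want to visit Japan in April."))
print(run.output.assistant_message)

# Reply to the run to continue the conversation with the history preserved
reply = await run.reply(user_message="What should I pack?")  # assumption: reply takes user_message
print(reply.output.assistant_message)
```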
When using run.reply, WorkflowAI will automatically keep the conversation history.
Note that the reply uses the same output schema as the original run.
You can continue to reply to the run as many times as you want.
Another use-case for run.reply is to ask a follow-up question, or ask the LLM to double-check its previous answer.
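For example, continuing the sketch above:

```python
# Ask the model to double-check its previous answer
double_check = await run.reply(user_message="Are you sure? Please double-check your previous answer.")
print(double_check.output.assistant_message)
```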
Using multiple clients
You might want to avoid using the shared client, for example if you are using multiple API keys or accounts. It is possible to achieve this by manually creating client instances.
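A sketch, assuming the client class is exposed as WorkflowAI and that each client instance provides the same agent decorator as the shared client:

```python
import os

from workflowai import WorkflowAI  # assumption: class name matches the shared client

client_a = WorkflowAI(api_key=os.environ["WORKFLOWAI_API_KEY_ACCOUNT_A"])
client_b = WorkflowAI(api_key=os.environ["WORKFLOWAI_API_KEY_ACCOUNT_B"])


# Assumption: a client instance exposes the same agent decorator as the shared client
@client_a.agent(id="answer-question")
async def answer_question_a(question_input: AnswerQuestionInput) -> AnswerQuestionOutput:
    """Answer the user's question accurately and concisely."""
    ...
```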
Field properties
Pydantic allows a variety of other validation criteria for fields: minimum, maximum, pattern, etc. These additional criteria are included in the JSON Schema that is sent to WorkflowAI, and are passed to the model.
These constraints can be used to steer the model in the right direction. The caveat is that validation that is too strict can lead to invalid generations. In case of an invalid generation:
- WorkflowAI retries the inference once by providing the model with the invalid output and the validation error.
- If the model still fails to generate a valid output, the run fails with an InvalidGenerationError. The partial output is available in the partial_output attribute of the InvalidGenerationError.
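A sketch combining field constraints with handling of a failed generation; the import path of InvalidGenerationError and all class, field, and agent names are assumptions apart from partial_output:

```python
import workflowai
from pydantic import BaseModel, Field
from workflowai import InvalidGenerationError  # assumption: the exact import path may differ


class InvoiceInput(BaseModel):
    raw_text: str


class InvoiceOutput(BaseModel):
    # These constraints are included in the JSON Schema sent to the model
    invoice_number: str = Field(pattern=r"^INV-\d{6}$")
    total_amount: float = Field(ge=0)


@workflowai.agent(id="invoice-extraction")
async def extract_invoice(invoice_input: InvoiceInput) -> InvoiceOutput:
    """Extract the invoice number and total amount from the raw invoice text."""
    ...


try:
    run = await extract_invoice.run(InvoiceInput(raw_text="Invoice INV-000123, total due: $42.50"))
except InvalidGenerationError as e:
    # The partially parsed output is available when the retry also fails validation
    print(e.partial_output)
```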