Tutorial: Generating Structured Output with OpenAI

Last Updated: December 29, 2025

Level: Beginner
Time to complete: 15 minutes
Prerequisites: You must have an API key from an active OpenAI account as this tutorial uses a GPT model by OpenAI.
Components Used: OpenAIChatGenerator, OpenAIResponsesChatGenerator
Goal: Learn how to generate structured outputs with OpenAIChatGenerator or OpenAIResponsesChatGenerator using a Pydantic model or a JSON schema.

Overview

This tutorial shows how to produce structured outputs by providing either a Pydantic model or a JSON schema to OpenAIChatGenerator or OpenAIResponsesChatGenerator.

Note: This feature works only with recent models, starting with gpt-4o-mini.

Installing Dependencies

Install Haystack with pip:

%%bash

pip install -q "haystack-ai>=2.20.0"

Structured Outputs with `OpenAIChatGenerator`

Using Pydantic Models

First, we’ll see how to pass a Pydantic model to OpenAIChatGenerator. For this purpose, we define two Pydantic models, City and CitiesData. These models specify the fields and types of the data structure we want the LLM to return.

from typing import List
from pydantic import BaseModel


class City(BaseModel):
    name: str
    country: str
    population: int


class CitiesData(BaseModel):
    cities: List[City]

You can change these models according to the format you wish to extract from the text.
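To make the role of the type annotations concrete, here is a minimal sketch, assuming Pydantic v2, of how CitiesData accepts well-formed data and rejects a value that does not match a field's type. The input dictionaries are made up for illustration and are not model output.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class City(BaseModel):
    name: str
    country: str
    population: int


class CitiesData(BaseModel):
    cities: List[City]


# Well-formed data: each nested dict is converted into a City instance.
valid = CitiesData(cities=[{"name": "Berlin", "country": "Germany", "population": 3850809}])
print(valid.cities[0].population)

# Hypothetical malformed data: population is free text, not an integer.
bad_fields = []
try:
    CitiesData(cities=[{"name": "Berlin", "country": "Germany", "population": "roughly 3.8 million"}])
except ValidationError as error:
    # Each error records the path ("loc") to the offending field.
    bad_fields = [err["loc"] for err in error.errors()]

print(bad_fields)
```

The same field definitions are what the generator relies on to constrain the LLM's output.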

OpenAIChatGenerator generates text using an OpenAI GPT model by default. We pass our Pydantic model to the response_format parameter in generation_kwargs.

We also need to set the OPENAI_API_KEY environment variable.

Note: You can also set response_format through the generation_kwargs parameter of the chat generator's run method.

import os
from getpass import getpass

from haystack.components.generators.chat import OpenAIChatGenerator

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
chat_generator = OpenAIChatGenerator(generation_kwargs={"response_format": CitiesData})

Running the Component

Run the component with an example passage that you want to convert into JSON. The generator applies the CitiesData model you passed as response_format. For the given example passage, the generated JSON object should look like this:

{
  "cities": [
    {
      "name": "Berlin",
      "country": "Germany",
      "population": 3850809
    },
    {
      "name": "Paris",
      "country": "France",
      "population": 2161000
    },
    {
      "name": "Lisbon",
      "country": "Portugal",
      "population": 504718
    }
  ]
}

The output of the LLM should conform to the schema defined by CitiesData.

from haystack.dataclasses import ChatMessage

text = "Berlin is the capital of Germany. It has a population of 3,850,809. Paris, France's capital, has 2.161 million residents. Lisbon is the capital and the largest city of Portugal with the population of 504,718."
result = chat_generator.run(messages=[ChatMessage.from_user(text)])

Printing the Correct JSON

If you didn’t get any errors, you can now parse the reply and print the resulting JSON.

import json
valid_reply = result["replies"][0].text
valid_json = json.loads(valid_reply)
print(valid_json)
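json.loads returns plain dictionaries. If you would rather work with typed objects, one option is to load the reply text back into the Pydantic model. The sketch below assumes Pydantic v2 and uses a hard-coded sample_reply string as a stand-in for result["replies"][0].text, so it runs without calling the API.

```python
from typing import List

from pydantic import BaseModel


class City(BaseModel):
    name: str
    country: str
    population: int


class CitiesData(BaseModel):
    cities: List[City]


# Hypothetical reply text, standing in for result["replies"][0].text.
sample_reply = (
    '{"cities": ['
    '{"name": "Berlin", "country": "Germany", "population": 3850809}, '
    '{"name": "Paris", "country": "France", "population": 2161000}'
    "]}"
)

# Parse and validate the JSON string in one step.
cities_data = CitiesData.model_validate_json(sample_reply)

for city in cities_data.cities:
    print(f"{city.name} ({city.country}): {city.population}")
```

If the text does not match the model, model_validate_json raises a ValidationError, which gives you a single place to catch malformed replies.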

Using JSON schema

Now, we’ll create a JSON schema of the CitiesData model and pass it to OpenAIChatGenerator. OpenAI expects schemas in a specific format, so the schema generated with model_json_schema() cannot be used directly.
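To see why the generated schema is not a drop-in fit, it can help to print it. The sketch below assumes Pydantic v2 and re-declares the two models so that it runs on its own. The observations in the comments describe the raw Pydantic output only; which of these differences OpenAI actually rejects is covered by the OpenAI guide, not by this example.

```python
import json
from typing import List

from pydantic import BaseModel


class City(BaseModel):
    name: str
    country: str
    population: int


class CitiesData(BaseModel):
    cities: List[City]


raw_schema = CitiesData.model_json_schema()
print(json.dumps(raw_schema, indent=2))

# The nested City model is not inlined; it is referenced through "$defs".
print("$defs" in raw_schema)

# The raw schema does not set "additionalProperties", and it has no outer
# wrapper carrying a "name" or a "strict" flag.
print("additionalProperties" in raw_schema)
```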

For details on how to create schemas for OpenAI, see the OpenAI Structured Outputs guide.

cities_data_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "CitiesData",
        "schema": {
            "type": "object",
            "properties": {
                "cities": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "country": {"type": "string"},
                            "population": {"type": "integer"},
                        },
                        "required": ["name", "country", "population"],
                        "additionalProperties": False,
                    },
                }
            },
            "required": ["cities"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

Pass this JSON schema to the response_format parameter of the chat generator. We run the generator on its own to see the output.

chat_generator = OpenAIChatGenerator(generation_kwargs={"response_format": cities_data_schema})

text = "Berlin is the capital of Germany. It has a population of 3,850,809. Paris, France's capital, has 2.161 million residents. Lisbon is the capital and the largest city of Portugal with the population of 504,718."
result = chat_generator.run(messages=[ChatMessage.from_user(text)])

print(result["replies"][0].text)

Structured Outputs with `OpenAIResponsesChatGenerator`

Using Pydantic Models

We’ll use the City and CitiesData models defined above. OpenAIResponsesChatGenerator generates text using OpenAI’s gpt-5-mini model by default. We pass our Pydantic model to the text_format parameter in generation_kwargs when initializing the generator.

Note: You can set text_format by passing it in generation_kwargs, either when initializing the generator or when calling its run method.

import os
from getpass import getpass

from haystack.components.generators.chat import OpenAIResponsesChatGenerator

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
responses_generator = OpenAIResponsesChatGenerator(generation_kwargs={"text_format": CitiesData})

Let’s check the structured output with a simple user message.

responses_generator.run(messages=[ChatMessage.from_user("Berlin is the capital of Germany. It has a population of 3,850,809. Paris, France's capital, has 2.161 million residents.")])

Using JSON Schema

Now, we’ll create a JSON schema of the CitiesData model and pass it to OpenAIResponsesChatGenerator. We cannot reuse the schema we defined for OpenAIChatGenerator because the OpenAI Responses API expects a different schema format. For further details, see the documentation.

cities_data_schema_responses = {
    "format": {
        "type": "json_schema",
        "name": "CitiesData",
        "schema": {
            "type": "object",
            "properties": {
                "cities": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "country": {"type": "string"},
                            "population": {"type": "integer"},
                        },
                        "required": ["name", "country", "population"],
                        "additionalProperties": False,
                    },
                }
            },
            "required": ["cities"],
            "additionalProperties": False,
        },
        "strict": True,
    }
}
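Comparing cities_data_schema with cities_data_schema_responses, the difference is only in the wrapping: the Chat Completions form nests name, schema, and strict under a "json_schema" key, while the Responses form places them next to "type" under a "format" key. The helper below is a hypothetical convenience, not part of Haystack or the OpenAI SDK, that derives one shape from the other. It assumes the input has exactly the layout shown in this tutorial.

```python
from copy import deepcopy
from typing import Any, Dict


def chat_schema_to_responses_schema(chat_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrap a Chat Completions style schema dict into the Responses style.

    Hypothetical helper: assumes the input looks like
    {"type": "json_schema", "json_schema": {"name", "schema", "strict"}}.
    """
    inner = chat_schema["json_schema"]
    return {
        "format": {
            "type": chat_schema["type"],
            "name": inner["name"],
            # Copy the schema so later edits to one dict do not affect the other.
            "schema": deepcopy(inner["schema"]),
            "strict": inner["strict"],
        }
    }


# Small illustrative schema in the Chat Completions layout.
example_chat_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "City",
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

example_responses_schema = chat_schema_to_responses_schema(example_chat_schema)
print(example_responses_schema)
```

Keeping a single source schema and deriving the other shape avoids the two definitions drifting apart when fields are added.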

We pass our JSON schema to the text parameter in generation_kwargs.

Note: You can also set text through the generation_kwargs parameter of the chat generator's run method.

chat_generator = OpenAIResponsesChatGenerator(generation_kwargs={"text": cities_data_schema_responses})

result = chat_generator.run(messages=[ChatMessage.from_user("Berlin is the capital of Germany. It has a population of 3,850,809. Paris, France's capital, has 2.161 million residents.")])
parsed = json.loads(result["replies"][0].text)

print(parsed)

What’s next

🎉 Congratulations! You’ve learned how to produce structured outputs with OpenAIChatGenerator and OpenAIResponsesChatGenerator using Pydantic models and JSON schemas.

Other chat generators that also support structured outputs: MistralChatGenerator, OpenRouterChatGenerator, NvidiaChatGenerator, MetaLlamaChatGenerator, TogetherAIChatGenerator, LlamaStackChatGenerator and STACKITChatGenerator.

To stay up to date on the latest Haystack developments, you can subscribe to our newsletter and join the Haystack Discord community.

Thanks for reading!
