Integration: MLflow
Trace, evaluate, and monitor your Haystack applications with MLflow.
Overview
MLflow is an open-source platform for managing the end-to-end machine learning and AI lifecycle. It provides native tracing support for Haystack through its autolog integration, giving you full visibility into your Haystack pipeline execution.
MLflow Tracing offers:
- Hierarchical trace visualization of every component, LLM call, retriever step, and pipeline execution
- Automatic token usage and cost tracking for each LLM call
- Built-in evaluation framework with LLM judges and custom scorers (a sketch follows the tracing example below)
- Prompt versioning and management across your AI applications (see the sketch after this overview)
- Fully open source with no vendor lock-in: self-host it or use Managed MLflow in the cloud
You can learn more about the integration in MLflow’s Haystack integration guide.
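As a quick taste of the prompt management feature above, here is a minimal sketch, assuming MLflow 3.x's prompt registry API under mlflow.genai; the prompt name and template are illustrative:
import mlflow

# Register a versioned prompt template (illustrative name and template).
mlflow.genai.register_prompt(
    name="rag-question",
    template="Given these documents, answer the question: {{question}}",
    commit_message="initial version",
)

# Later, load a specific version and fill in its variables.
prompt = mlflow.genai.load_prompt("prompts:/rag-question/1")
print(prompt.format(question="Who lives in Paris?"))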
Installation
pip install mlflow haystack-ai
To start the MLflow tracking server:
mlflow server --port 5000
The MLflow UI will be available at http://localhost:5000.
Usage
Enable tracing for Haystack with a single line of code. This automatically captures traces from all Haystack pipelines and components.
Trace a RAG Pipeline
import mlflow
from haystack import Document, Pipeline
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret
# Point the client at the tracking server started above, then enable MLflow tracing for Haystack
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.haystack.autolog()
mlflow.set_experiment("Haystack")
# Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents(
[
Document(content="My name is Jean and I live in Paris."),
Document(content="My name is Mark and I live in Berlin."),
Document(content="My name is Giorgio and I live in Rome."),
]
)
# Build a RAG pipeline
prompt_template = [
ChatMessage.from_system("You are a helpful assistant."),
ChatMessage.from_user(
"Given these documents, answer the question.\n"
"Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
"Question: {{question}}\n"
"Answer:"
),
]
prompt_builder = ChatPromptBuilder(
    template=prompt_template, required_variables=["question", "documents"]
)
retriever = InMemoryBM25Retriever(document_store=document_store)
llm = OpenAIChatGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"))
rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")
# Ask a question
question = "Who lives in Paris?"
results = rag_pipeline.run(
{
"retriever": {"query": question},
"prompt_builder": {"question": question},
}
)
print(results["llm"]["replies"])
Open the MLflow UI at http://localhost:5000 and navigate to the Traces tab to see detailed traces of your pipeline execution, including component spans, LLM calls, and token usage.
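Traces can also be fetched programmatically. A minimal sketch, assuming the server and experiment from the example above; mlflow.search_traces returns a pandas DataFrame with one row per trace:
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Haystack")

# Pull the five most recent traces from the active experiment.
traces = mlflow.search_traces(max_results=5)
print(traces.shape)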
Disable Tracing
Auto-tracing for Haystack can be disabled by calling:
mlflow.haystack.autolog(disable=True)
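Evaluate a Pipeline
The Overview mentions MLflow's built-in evaluation framework. Below is a minimal sketch that scores the RAG pipeline from the Usage example with a custom scorer, assuming MLflow 3.x's mlflow.genai API; the mentions_jean check and the one-row dataset are illustrative only.
import mlflow
from mlflow.genai.scorers import scorer

# Hypothetical custom scorer: passes when the answer names the right person.
@scorer
def mentions_jean(outputs) -> bool:
    return "Jean" in str(outputs)

# Wrap the RAG pipeline from the Usage example so MLflow can call it per row.
def predict_fn(question: str) -> str:
    results = rag_pipeline.run(
        {"retriever": {"query": question}, "prompt_builder": {"question": question}}
    )
    return results["llm"]["replies"][0].text

mlflow.genai.evaluate(
    data=[{"inputs": {"question": "Who lives in Paris?"}}],
    predict_fn=predict_fn,
    scorers=[mentions_jean],
)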
Use MLflow AI Gateway as an LLM Backend
MLflow AI Gateway (MLflow ≥ 3.0) is a database-backed LLM proxy that routes requests to multiple providers (OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more) through a single OpenAI-compatible endpoint. Provider API keys are stored encrypted on the server, and features like fallback/retry, traffic splitting, and budget tracking are configured in the MLflow UI with no code changes needed.
Since the gateway exposes an OpenAI-compatible API, you can use Haystack’s built-in OpenAIChatGenerator with a custom api_base_url:
1. Install MLflow and start the server:
pip install mlflow[genai]
mlflow server --host 127.0.0.1 --port 5000
2. Create a gateway endpoint in the MLflow UI at http://localhost:5000. Navigate to AI Gateway → Create Endpoint, select a provider and model, and enter your provider API key. See the MLflow AI Gateway documentation for details.
3. Use the endpoint in a Haystack pipeline:
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
pipe = Pipeline()
pipe.add_component("prompt_builder", ChatPromptBuilder())
pipe.add_component(
"llm",
OpenAIChatGenerator(
model="my-chat-endpoint", # your MLflow Gateway endpoint name
api_key=Secret.from_token("unused"), # provider keys live on the server
api_base_url="http://localhost:5000/gateway/openai/v1",
),
)
pipe.connect("prompt_builder", "llm")
messages = [ChatMessage.from_user("What is MLflow AI Gateway?")]
result = pipe.run({"prompt_builder": {"template": messages}})
print(result["llm"]["replies"][0].text)
You can also use it standalone:
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
generator = OpenAIChatGenerator(
model="my-chat-endpoint",
api_key=Secret.from_token("unused"),
api_base_url="http://localhost:5000/gateway/openai/v1",
)
messages = [ChatMessage.from_user("What is MLflow AI Gateway?")]
result = generator.run(messages)
print(result["replies"][0].text)
License
MLflow is distributed under the terms of the Apache-2.0 license.
