Build a GitHub PR Creator Agent

_{Last Updated:
June 30, 2025}

In this recipe, we’ll create an Agent that uses tools from Haystack’s GitHub integration. Given a GitHub issue URL, the agent will not only comment on the issue but it will also fork the repository and open a pull request.

Step-by-step, the agent will:

Fetch and parse the issue description and comments
Identify the relevant directories and files
Determine the next steps for resolution and post them as a comment
Fork the repository and create a new branch
Open a pull request from the newly created branch to the original repository

For this, we’ll use Haystack’s Agent component. It implements a tool-calling functionality with provider-agnostic chat model support. We can use Agent either as a standalone component or within a pipeline.

Install dependencies

%pip install github-haystack -q
%pip install anthropic-haystack -q

GitHub Issue Resolver

First, we’ll create a GitHub issue resolver agent, following the steps in this recipe: Build a GitHub Issue Resolver Agent

import os
from getpass import getpass
from typing import List

from haystack import Pipeline
from haystack.components.agents import Agent
from haystack.components.builders import ChatPromptBuilder
from haystack.components.converters import OutputAdapter
from haystack.dataclasses import ChatMessage, Document
from haystack.tools.from_function import tool

from haystack_integrations.components.connectors.github import GitHubIssueViewer
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator
from haystack_integrations.prompts.github import SYSTEM_PROMPT
from haystack_integrations.tools.github import GitHubRepoViewerTool

os.environ["ANTHROPIC_API_KEY"] = getpass("Anthropic Key: ")

repo_viewer_tool = GitHubRepoViewerTool()

@tool
def create_comment(comment: str) -> str:
    """
    Use this to create a Github comment once you finished your exploration.
    """
    return comment

In this recipe, we simulate creating a comment on GitHub with the above tool for demonstration purposes. For real use cases, you can use GitHubIssueCommenterTool.

# from haystack_integrations.tools.github import GitHubIssueCommenterTool
# issue_commenter_tool = GitHubIssueCommenterTool()

chat_generator = AnthropicChatGenerator(model="claude-sonnet-4-20250514", generation_kwargs={"max_tokens": 8000})

agent = Agent(
    chat_generator=chat_generator,
    system_prompt=SYSTEM_PROMPT,
    tools=[repo_viewer_tool, create_comment],
    exit_conditions=["create_comment"],
    state_schema={"documents": {"type": List[Document]}},
)

issue_template = """
Issue from: {{ url }}
{% for document in documents %}
{% if loop.index == 1 %}
**Title: {{ document.meta.title }}**
{% endif %}
<issue-comment>
{{document.content}}
</issue-comment>
{% endfor %}
    """

issue_builder = ChatPromptBuilder(template=[ChatMessage.from_user(issue_template)], required_variables="*")

issue_fetcher = GitHubIssueViewer()

pp = Pipeline()

pp.add_component("issue_fetcher", issue_fetcher)
pp.add_component("issue_builder", issue_builder)
pp.add_component("agent", agent)

pp.connect("issue_fetcher.documents", "issue_builder.documents")
pp.connect("issue_builder.prompt", "agent.messages")

<haystack.core.pipeline.pipeline.Pipeline object at 0x7ff0f338c0d0>
🚅 Components
  - issue_fetcher: GitHubIssueViewer
  - issue_builder: ChatPromptBuilder
  - agent: Agent
🛤️ Connections
  - issue_fetcher.documents -> issue_builder.documents (List[Document])
  - issue_builder.prompt -> agent.messages (List[ChatMessage])

#pp.show()

Now we have a pipeline with an Agent that receives a GitHub issue URL as input, explores the files in the repository and comments on the GitHub issue with a proposed solution.

issue_url = "https://github.com/deepset-ai/haystack-core-integrations/issues/1268"

result = pp.run({"url": issue_url})

from IPython.display import Markdown, display

display(Markdown("# Comment from Agent\n\n" + result["agent"]["last_message"].tool_call_result.result))

# Comment from Agent

I can confirm that this issue still exists in the current codebase. While the changelog mentions that version 3.1.1 fixed "OpenSearch custom_query use without filters", the fix appears to be incomplete.

## Problem Analysis

The issue occurs in the `_prepare_embedding_search_request` method when using a `custom_query` with empty filters. Looking at the current code:

/```python
body = self._render_custom_query(
    custom_query,
    {
        "$query_embedding": query_embedding,
        "$filters": normalize_filters(filters) if filters else None,
    },
)
/```

While this looks like it should work (it conditionally calls `normalize_filters`), there's a subtle problem: when `filters` is an empty dict `{}`, the conditional `if filters` evaluates to `False`, so `None` is passed for `$filters`. However, **empty dict `{}` is not the same as `None`** - an empty dict is still "truthy" in terms of being a dict object, but it fails the boolean check used here.

## Root Cause

The issue is that `if filters:` returns `False` for empty dict `{}`, but `normalize_filters({})` still gets called in some code paths, or the `None` value causes issues in the OpenSearch query.

Looking at the `normalize_filters` function:

/```python
def normalize_filters(filters: Dict[str, Any]) -> Dict[str, Any]:
    if not isinstance(filters, dict):
        msg = "Filters must be a dictionary"
        raise FilterError(msg)

    if "field" in filters:
        return {"bool": {"must": _parse_comparison_condition(filters)}}
    return _parse_logical_condition(filters)
/```

And `_parse_logical_condition`:

/```python
def _parse_logical_condition(condition: Dict[str, Any]) -> Dict[str, Any]:
    if "operator" not in condition:
        msg = f"'operator' key missing in {condition}"
        raise FilterError(msg)
/```

So when an empty dict `{}` is passed to `normalize_filters`, it doesn't have a "field" key, so it goes to `_parse_logical_condition`, which then fails because there's no "operator" key.

## Recommended Fix

The fix should be to properly handle empty or None filters by only including the `$filters` placeholder when there are actual filters to substitute. Here's the corrected approach:

**For `_prepare_embedding_search_request`:**

/```python
if isinstance(custom_query, dict):
    substitutions = {"$query_embedding": query_embedding}
    if filters:  # Only add $filters if there are actual filters
        substitutions["$filters"] = normalize_filters(filters)
    body = self._render_custom_query(custom_query, substitutions)
/```

**For `_prepare_bm25_search_request`:**

/```python
if isinstance(custom_query, dict):
    substitutions = {"$query": query}
    if filters:  # Only add $filters if there are actual filters  
        substitutions["$filters"] = normalize_filters(filters)
    body = self._render_custom_query(custom_query, substitutions)
/```

This approach ensures that:
1. Empty filters (`{}`) don't get passed to `normalize_filters` 
2. The `$filters` placeholder is only included in custom queries when there are actual filters
3. Custom queries that don't use the `$filters` placeholder work correctly regardless of the filters parameter

This matches the original suggestion in the issue report and would properly resolve the problem for users trying to use custom queries without filters.

Let’s see what files our Agent looked at

for document in result["agent"]["documents"]:
    if document.meta["type"] in ["file_content"]:
        display(Markdown(f"[{document.meta['url']}]({document.meta['url']})"))

https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py

https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/opensearch/src/haystack_integrations/document_stores/opensearch/filters.py

https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/opensearch/src/haystack_integrations/components/retrievers/opensearch/embedding_retriever.py

https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/opensearch/CHANGELOG.md

https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/opensearch/tests/test_embedding_retriever.py

https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py

From Agent to Multi-Agent

In the next step, we’ll make this agent a little more powerful. We will pass the issue comments and the generated proposal to a second agent. We also fork the original repository so that we can make edits. For forking the repository, we need a personal access token from GitHub.

The Agent will then:

view relevant files
perform edits commit by commit
return a PR title and description once it is ready to go

# Either classic token or a fine-grained token that can create repositories and commit code
os.environ["GITHUB_TOKEN"] = getpass("Github Token: ")

from haystack_integrations.components.connectors.github import GitHubRepoForker
from haystack_integrations.prompts.github import FILE_EDITOR_PROMPT, FILE_EDITOR_SCHEMA, PR_CREATOR_PROMPT
from haystack_integrations.tools.github import GitHubFileEditorTool

repo_forker = GitHubRepoForker(create_branch=True, auto_sync=True, wait_for_completion=True)
pp.add_component("repo_forker", repo_forker)

file_editor_tool = GitHubFileEditorTool()

@tool
def create_pr(title: str, body: str) -> str:
    """
    Use this to create a Github PR once you are done with your changes.
    """
    return title + "\n\n" + body

In this recipe, we simulate creating a comment on GitHub with the above tool for demonstration purposes. For real use cases, you can use GitHubPRCreatorTool.

# from haystack_integrations.tools.github import GitHubPRCreatorTool
# pr_creator_tool = GitHubPRCreatorTool()

pr_chat_generator = AnthropicChatGenerator(model="claude-sonnet-4-20250514", generation_kwargs={"max_tokens": 8000})

pr_agent = Agent(
    chat_generator=pr_chat_generator,
    system_prompt=PR_CREATOR_PROMPT,
    tools=[file_editor_tool, create_pr, repo_viewer_tool],
    exit_conditions=["create_pr"],
    state_schema={"repo": {"type": str}, "branch": {"type": str}, "title": {"type": str}, "documents": {"type": List[Document]}},
)

pp.add_component("pr_agent", pr_agent)
adapter = OutputAdapter(
    template="{{issue_messages + [((agent_messages|last).tool_call_result.result)|user_message]}}",
    custom_filters={"user_message": ChatMessage.from_user},
    output_type=List[ChatMessage], unsafe=True
)
pp.add_component("adapter", adapter)

WARNING:haystack.components.converters.output_adapter:Unsafe mode is enabled. This allows execution of arbitrary code in the Jinja template. Use this only if you trust the source of the template.

pp.connect("repo_forker.issue_branch", "pr_agent.branch")
pp.connect("repo_forker.repo", "pr_agent.repo")
pp.connect("agent.messages", "adapter.agent_messages")
pp.connect("issue_builder.prompt", "adapter.issue_messages")
pp.connect("adapter.output", "pr_agent.messages")

<haystack.core.pipeline.pipeline.Pipeline object at 0x7ff0f338c0d0>
🚅 Components
  - issue_fetcher: GitHubIssueViewer
  - issue_builder: ChatPromptBuilder
  - agent: Agent
  - repo_forker: GitHubRepoForker
  - pr_agent: Agent
  - adapter: OutputAdapter
🛤️ Connections
  - issue_fetcher.documents -> issue_builder.documents (List[Document])
  - issue_builder.prompt -> agent.messages (List[ChatMessage])
  - issue_builder.prompt -> adapter.issue_messages (List[ChatMessage])
  - agent.messages -> adapter.agent_messages (List[ChatMessage])
  - repo_forker.issue_branch -> pr_agent.branch (str)
  - repo_forker.repo -> pr_agent.repo (str)
  - adapter.output -> pr_agent.messages (List[ChatMessage])

#pp.show()

result = pp.run(data={"url": issue_url})

from IPython.display import Markdown, display

display(Markdown("# Comment from Agent\n\n" + result["pr_agent"]["last_message"].tool_call_result.result))

# Comment from Agent

Fix OpenSearch custom_query with empty filters

## Summary

This PR fixes an issue where using `custom_query` with `OpenSearchEmbeddingRetriever` or `OpenSearchBM25Retriever` would fail when empty filters (`{}`) were provided.

## Problem

When using custom queries with empty filters dict (`{}`), the code would incorrectly attempt to normalize the empty filters, causing a `FilterError: 'operator' key missing in {}`.

## Root Cause

The conditional check `if filters` in both `_prepare_bm25_search_request` and `_prepare_embedding_search_request` methods evaluates to `True` for empty dictionaries, causing `normalize_filters({})` to be called even though empty dicts should be treated the same as `None`.

## Solution

Updated the conditional checks to explicitly handle empty dictionaries:

/```python
# Before
"$filters": normalize_filters(filters) if filters else None,

# After  
"$filters": normalize_filters(filters) if filters and filters != {} else None,
/```

This ensures that both `None` and `{}` are treated as "no filters" and result in `$filters` being set to `None` in the custom query substitutions.

## Changes Made

1. **Fixed `_prepare_bm25_search_request`** (line 500): Updated filter condition to handle empty dicts
2. **Fixed `_prepare_embedding_search_request`** (line 657): Updated filter condition to handle empty dicts  
3. **Added integration tests**: Created comprehensive tests to verify the fix works for both retriever types

## Testing

- Added new test cases for both embedding and BM25 retrievers with empty filters
- Existing tests continue to pass
- Verified that valid filters still work correctly
- Confirmed that `None` filters continue to work as expected

## Backwards Compatibility

This change is fully backwards compatible. It only affects the edge case where empty filter dicts were previously causing errors - now they work as expected.

Fixes #1268
```text