Maintained by deepset

Integration: Snowflake

A Snowflake integration that allows table retrieval from a Snowflake database.

Authors

Mohamed Sriha

deepset

GitHub Repo PyPI Package

Table of Contents

Snowflake table retriever for Haystack
- Installation
- Usage
- Examples
- License

Installation

Use pip to install Snowflake:

pip install snowflake-haystack

Usage

Once installed, initialize the SnowflakeTableRetriever to use it with Haystack. The integration supports multiple authentication methods including Multi-Factor Authentication (MFA).

Authentication Methods

Password Authentication

from haystack_integrations.components.retrievers.snowflake import SnowflakeTableRetriever
from haystack.utils import Secret

# Traditional password authentication
executor = SnowflakeTableRetriever(
    user="<ACCOUNT-USER>",
    account="<ACCOUNT-IDENTIFIER>",
    authenticator="SNOWFLAKE",
    api_key=Secret.from_env_var("SNOWFLAKE_API_KEY"),
    warehouse="<WAREHOUSE-NAME>",
)

Key-pair Authentication (MFA)

# JWT-based authentication using private key files
executor = SnowflakeTableRetriever(
    user="<ACCOUNT-USER>",
    account="<ACCOUNT-IDENTIFIER>",
    authenticator="SNOWFLAKE_JWT",
    private_key_file=Secret.from_env_var("SNOWFLAKE_PRIVATE_KEY_FILE"),
    private_key_file_pwd=Secret.from_env_var("SNOWFLAKE_PRIVATE_KEY_PWD"),  # Optional if key is encrypted
    warehouse="<WAREHOUSE-NAME>",
)

OAuth Authentication (MFA)

# OAuth-based authentication
executor = SnowflakeTableRetriever(
    user="<ACCOUNT-USER>",
    account="<ACCOUNT-IDENTIFIER>",
    authenticator="OAUTH",
    oauth_client_id=Secret.from_env_var("SNOWFLAKE_OAUTH_CLIENT_ID"),
    oauth_client_secret=Secret.from_env_var("SNOWFLAKE_OAUTH_CLIENT_SECRET"),
    oauth_token_request_url="<TOKEN-REQUEST-URL>",
    warehouse="<WAREHOUSE-NAME>",
)

Authentication Parameters

The SnowflakeTableRetriever supports three authentication methods:

SNOWFLAKE: Traditional password authentication
- Requires: api_key (password)
SNOWFLAKE_JWT: Key-pair authentication with JWT tokens (MFA)
- Requires: private_key_file (path to private key file)
- Optional: private_key_file_pwd (passphrase if key is encrypted)
OAUTH: OAuth-based authentication (MFA)
- Requires: oauth_client_id, oauth_client_secret, oauth_token_request_url

Database Access

Ensure you have select access to the tables before querying the database. More details here:

# Warm up the component so it connects to the database
executor.warm_up()

# Run the retriever
response = executor.run(query="""select * from database_name.schema_name.table_name""")

During component initialization, you could provide the schema and database name to avoid needing to provide them in the SQL query:

executor = SnowflakeTableRetriever(
    ...
    schema_name="<SCHEMA-NAME>",
    database ="<DB-NAME>"
)
executor.warm_up()

response = executor.run(query="""select * from table_name""")

Snowflake table retriever returns a Pandas dataframe and a Markdown version of the table:


print(response["dataframe"].head(2))  # Pandas dataframe
#   Column 1  Column 2
# 0       Value1 Value2
# 1       Value1 Value2

print(response["table"])  # Markdown
# | Column 1  | Column 2  |
# |:----------|:----------|
# | Value1    | Value2    |
# | Value1    | Value2    |

Using SnowflakeTableRetriever within a pipeline:

from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.components.retrievers.snowflake import SnowflakeTableRetriever

executor = SnowflakeTableRetriever(
    user="<ACCOUNT-USER>",
    account="<ACCOUNT-IDENTIFIER>",
    authenticator="SNOWFLAKE",
    api_key=Secret.from_env_var("SNOWFLAKE_API_KEY"),
    warehouse="<WAREHOUSE-NAME>",
)

pipeline = Pipeline()
pipeline.add_component("builder", PromptBuilder(template="Describe this table: {{ table }}"))
pipeline.add_component("snowflake", executor)
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))

pipeline.connect("snowflake.table", "builder.table")
pipeline.connect("builder", "llm")

pipeline.run(data={"query": "select employee, salary from table limit 10;"})

Examples

You can find a code example showing how to use the Snowflake Retriever under the example/ folder of this repo.

License

snowflake-haystack is distributed under the terms of the Apache-2.0 license.