๐Ÿ†• Build and deploy Haystack pipelines with deepset Studio

Integration: LanceDB Haystack

A DocumentStore backed by LanceDB

Authors
Alan Meeson

Table of Contents

Overview

LanceDB-Haystack is an embedded LanceDB backed Document Store for Haystack 2.X.

Installation

The current simplest way to get LanceDB-Haystack is to install from GitHub via pip:

pip install lancedb-haystack

Usage

import pyarrow as pa
from lancedb_haystack import LanceDBDocumentStore
from lancedb_haystack import LanceDBEmbeddingRetriever, LanceDBFTSRetriever

# Declare the metadata fields schema, this lets us filter using it.
# See: https://arrow.apache.org/docs/python/api/datatypes.html
metadata_schema = pa.struct([
  ('title', pa.string()),    
  ('publication_date', pa.timestamp('s')),
  ('page_number', pa.int32()),
  ('topics', pa.list_(pa.string()))
])

# Create the DocumentStore
document_store = LanceDBDocumentStore(
  database='my_database', 
  table_name="documents", 
  metadata_schema=metadata_schema, 
  embedding_dims=384
)

# Create an embedding retriever
embedding_retriever = LanceDBEmbeddingRetriever(document_store)

# Create a Full Text Search retriever
fts_retriever = LanceDBFTSRetriever(document_store)

See also examples/pipeline-usage.ipynb for a full worked example, and the API Reference.

License

Apache License 2.0