Integration: fastRAG

fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines

Authors
Intel Labs

fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. fastRAG is designed to empower researchers and developers with a comprehensive tool-set for advancing retrieval augmented generation.

Comments, suggestions, issues and pull requests are welcome! ❤️

IMPORTANT

fastRAG is now compatible with Haystack v2+. Please report any issues you find.

📣 Updates

  • 2024-05: fastRAG V3 is Haystack 2.0 compatible 🔥
  • 2023-12: Gaudi2 and ONNX runtime support; Optimized Embedding models; Multi-modality and Chat demos; REPLUG text generation.
  • 2023-06: ColBERT index modification: adding/removing documents.
  • 2023-05: RAG with LLM and dynamic prompt synthesis example.
  • 2023-04: Qdrant DocumentStore support.

Key Features

  • Optimized RAG: Build RAG pipelines with state-of-the-art components designed for greater compute efficiency.
  • Optimized for Intel Hardware: Leverage Intel® Extension for PyTorch (IPEX), 🤗 Optimum Intel and 🤗 Optimum-Habana to run as efficiently as possible on Intel® Xeon® Processors and Intel® Gaudi® AI accelerators (see the sketch after this list).
  • Customizable: fastRAG is built using Haystack and HuggingFace. All of fastRAG’s components are 100% Haystack compatible.
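
For example, a model loaded with 🤗 Transformers can be optimized in place with IPEX before it is handed to a pipeline. A minimal sketch, assuming intel_extension_for_pytorch is installed; the model and dtype here are illustrative, not a fastRAG-specific loading path:

import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Illustrative model; any Hugging Face causal LM can be optimized the same way.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.bfloat16)
model.eval()

# ipex.optimize applies operator fusion and memory-layout optimizations for Xeon CPUs.
model = ipex.optimize(model, dtype=torch.bfloat16)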

🚀 Components

For a brief overview of the unique components in fastRAG, refer to the Components Overview page.

LLM Backends

  • Intel Gaudi Accelerators: running LLMs on Gaudi 2
  • ONNX Runtime: running LLMs with the optimized ONNX Runtime
  • OpenVINO: running quantized LLMs using OpenVINO
  • Llama-CPP: running RAG pipelines with LLMs on a Llama-CPP backend

Optimized Components

  • Embedders: optimized int8 bi-encoders
  • Rankers: optimized/sparse cross-encoders

RAG-efficient Components

  • ColBERT: token-based late interaction
  • Fusion-in-Decoder (FiD): generative multi-document encoder-decoder
  • REPLUG: improved multi-document decoder
  • PLAID: incredibly efficient indexing engine

📍 Installation

Preliminary requirements:

  • Python 3.8 or higher.
  • PyTorch 2.0 or higher.

To set up the software, clone the project and run the following, preferably in a newly created virtual environment:

git clone https://github.com/IntelLabs/fastRAG.git
cd fastRAG
pip install .
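
To verify the setup, check that the package imports cleanly:

python -c "import fastrag"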

Usage

You can import components from fastRAG and use them in a Haystack pipeline. The example below builds a BM25 retrieval, reranking, and generation pipeline in which the generator runs an OpenVINO-quantized model:

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.document_stores.in_memory import InMemoryDocumentStore

from fastrag.generators.openvino import OpenVINOGenerator

prompt_template = """
Given these documents, answer the question.
Documents:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}
Question: {{query}}
Answer:
"""

openvino_compressed_model_path = "path/to/quantized/model"

generator = OpenVINOGenerator(
    model="microsoft/phi-2",
    compressed_model_dir=openvino_compressed_model_path,
    device_openvino="CPU",
    task="text-generation",
    generation_kwargs={
        "max_new_tokens": 100,
    }
)

# The store is assumed to already hold your indexed documents.
store = InMemoryDocumentStore()

pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store))
pipe.add_component("ranker", ransformersSimilarityRanker())
pipe.add_component("prompt_builder", PromptBuilder(template=prompt_template))
pipe.add_component("llm", generator)

pipe.connect("retriever.documents", "ranker.documents")
pipe.connect("ranker", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

query = "Who is the main villan in Lord of the Rings?"
answer_result = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    },
    "ranker": {
        "query": query,
        "top_k": 1
    }
})

print(answer_result["llm"]["replies"][0])
#' Sauron\n'
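
The generator above expects a directory containing an OpenVINO-compressed model. A minimal sketch of producing one with 🤗 Optimum Intel; the use of OVWeightQuantizationConfig with 8-bit weights is an illustrative assumption, not fastRAG's prescribed export flow:

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Export the model to OpenVINO IR with 8-bit weight-only quantization (assumed settings).
model = OVModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=8),
)
model.save_pretrained("path/to/quantized/model")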

For more examples, check out Example Use Cases.

License

The code is licensed under the Apache 2.0 License.

Disclaimer

This is not an official Intel product.