SRag | Efficient Retrieval-Augmented Generation

How it Works

SRag simplifies the vectorization pipeline. It ingests your data, creates optimized embeddings, and provides a clean API for semantic retrieval.

Ingestion Engine: Automatically parses text, PDFs, and Markdown files into manageable chunks.
Vectorization: Generates dense embeddings optimized for rapid semantic search.
Query Router: Intercepts user prompts, retrieves relevant context, and adds it to the LLM input.
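The three stages above can be sketched in plain Python. This is only an illustrative toy under stated assumptions, not SRag's implementation: the function names (`chunk_text`, `embed`, `retrieve`, `augment`) are hypothetical, and a bag-of-words count vector stands in for the dense embeddings a real model would produce.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Ingestion stand-in: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Vectorization stand-in: a sparse bag-of-words count vector.
    A real pipeline would call a dense embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def augment(query: str, context: list[str]) -> str:
    """Query-router stand-in: prepend the retrieved context to the user prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"
```

Calling `augment(question, retrieve(question, chunks))` yields the string that would be sent to the LLM in place of the raw question. The `top_k` argument plays the same role as `similarity_top_k` in the LlamaIndex example.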

# Requires the llama-index, llama-index-llms-ollama and llama-index-embeddings-ollama packages
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Setup local models via Ollama
Settings.llm = Ollama(model="qwen2.5:7b", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Load documents and generate vector index
documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Initialize a context chat engine: each message retrieves the 20 most similar chunks as context
chat_engine = index.as_chat_engine(
    chat_mode="context",
    similarity_top_k=20,
    verbose=False
)

Core Architecture

Embeddings

Support for local models via HuggingFace or external APIs like OpenAI.

Vector DB

Lightweight in-memory storage that can be persisted to disk, for fast document retrieval.
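To show how in-memory storage and persistence can coexist, here is a minimal sketch. The class `TinyVectorStore` and its JSON file format are hypothetical, chosen only for illustration, and say nothing about how SRag actually stores vectors.

```python
import json
import math
from pathlib import Path


class TinyVectorStore:
    """Hypothetical in-memory vector store with JSON persistence."""

    def __init__(self) -> None:
        # Each item has the shape {"text": str, "vector": list[float]}
        self.items: list[dict] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.items.append({"text": text, "vector": vector})

    def search(self, query_vector: list[float], top_k: int = 3) -> list[str]:
        """Return the texts of the top_k vectors closest to query_vector."""
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.hypot(*a) * math.hypot(*b)
            return dot / norm if norm else 0.0

        ranked = sorted(self.items, key=lambda it: cosine(query_vector, it["vector"]), reverse=True)
        return [it["text"] for it in ranked[:top_k]]

    def save(self, path: str) -> None:
        """Write the whole store to disk as JSON."""
        Path(path).write_text(json.dumps(self.items), encoding="utf-8")

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        """Rebuild a store from a file written by save()."""
        store = cls()
        store.items = json.loads(Path(path).read_text(encoding="utf-8"))
        return store
```

Searches run entirely in memory; `save` and `load` are only needed so that embeddings do not have to be recomputed after a restart.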

LLM Agnostic

Compatible with Ollama for local inference or any standard API backend.
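One common way to stay backend-agnostic is to code against a small interface rather than a specific client. The sketch below assumes a hypothetical `LLMBackend` protocol with a single `complete` method; `EchoBackend` is a stand-in for testing, and neither name comes from SRag itself.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Hypothetical interface: anything with complete(prompt) -> str qualifies."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for tests; a real one would call Ollama or a hosted API."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


def answer(question: str, context: list[str], llm: LLMBackend) -> str:
    """Build the augmented prompt and pass it to whichever backend was supplied."""
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return llm.complete(prompt)
```

Swapping local inference for a hosted API then means passing a different object to `answer`, with no change to the retrieval code.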