SRag simplifies the vectorization pipeline. It ingests your data, creates optimized embeddings, and provides a clean API for semantic retrieval.
- Ingestion Engine: Automatically parses text, PDFs, and Markdown files into manageable chunks.
- Vectorization: Generates dense embeddings optimized for rapid semantic search.
- Query Router: Intercepts user prompts, retrieves relevant context, and adds that context to the LLM input.
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Use local Ollama models for both generation and embeddings.
Settings.llm = Ollama(model="qwen2.5:7b", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Load the files under ./docs and build an in-memory vector index.
documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# "context" mode retrieves the most similar chunks for each message
# and passes them to the LLM as context.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    similarity_top_k=20,
    verbose=False,
)
```
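
The setup above hands chunking, embedding, and retrieval to LlamaIndex. As a rough illustration of what those three stages do, and of what a setting like `similarity_top_k` controls, here is a dependency-free sketch. It is not SRag's or LlamaIndex's implementation: the function names (`chunk_text`, `embed`, `retrieve`, `augment`) are hypothetical, and a bag-of-words counter stands in for a real dense embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words (sizes are illustrative)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), chunk_size - overlap):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def augment(query: str, context: list[str]) -> str:
    """Prepend the retrieved chunks to the user's question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "Ollama serves local models over HTTP.",
    "Embeddings map text to vectors.",
    "Bananas are rich in potassium.",
]
question = "How are embeddings created from text?"
top = retrieve(question, chunks, top_k=1)
prompt = augment(question, top)
print(prompt)
```

Raising `top_k` gives the model more context to draw on, at the cost of a longer prompt and a higher chance of including irrelevant chunks.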
Core Architecture
Embeddings
Supports local models via Hugging Face or external APIs such as OpenAI.
Vector DB
Lightweight in-memory storage that can be persisted to disk, for fast document retrieval.
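
To make the idea of an in-memory store with a disk snapshot concrete, here is a minimal sketch. `TinyVectorStore` and its JSON file format are hypothetical and are not SRag's storage layer; the sketch only shows vectors held in a dictionary, written to disk, and reloaded.

```python
import json
import tempfile
from pathlib import Path


class TinyVectorStore:
    """Hypothetical in-memory vector store with a JSON snapshot on disk."""

    def __init__(self) -> None:
        self.records: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self.records[doc_id] = vector

    def nearest(self, query: list[float]) -> str:
        """Return the id with the highest dot product against the query.

        Assumes all vectors have the same length and comparable norms.
        """
        return max(
            self.records,
            key=lambda doc_id: sum(q * v for q, v in zip(query, self.records[doc_id])),
        )

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(self.records))

    @classmethod
    def load(cls, path: Path) -> "TinyVectorStore":
        store = cls()
        store.records = json.loads(path.read_text())
        return store


store = TinyVectorStore()
store.add("intro.md", [1.0, 0.0])
store.add("api.md", [0.0, 1.0])

# Write a snapshot, then rebuild the store from it.
with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "store.json"
    store.save(snapshot)
    restored = TinyVectorStore.load(snapshot)

print(restored.nearest([0.1, 0.9]))  # api.md
```

With a LlamaIndex index, the comparable step is usually `index.storage_context.persist(persist_dir=...)`, which writes the default in-memory stores to a directory so the index does not have to be rebuilt on every start.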
LLM Agnostic
Compatible with Ollama for local inference or any standard API backend.
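
One common way to stay backend-agnostic is to code against a small interface rather than a specific client. The sketch below assumes a hypothetical `LLMBackend` protocol with a single `complete` method; the two backends are stand-ins for illustration, not real Ollama or API clients, and nothing here is taken from SRag's own code.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Hypothetical minimal interface that any backend must satisfy."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in for a local model: echoes the last line of the prompt."""

    def complete(self, prompt: str) -> str:
        return "echo: " + prompt.splitlines()[-1]


class UppercaseBackend:
    """Stand-in for a remote API: returns the prompt in upper case."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def answer(question: str, context: list[str], llm: LLMBackend) -> str:
    """Build one prompt from context and question, then delegate to the backend."""
    prompt = "\n".join(context) + "\n" + question
    return llm.complete(prompt)


context = ["Ollama runs models locally."]
print(answer("Where do models run?", context, EchoBackend()))
print(answer("Where do models run?", context, UppercaseBackend()))
```

Because `answer` depends only on the protocol, swapping a local backend for a hosted one requires no change to the retrieval or prompt-building code.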