AI / RAG Architecture

SRag

A modern, lightweight implementation of Retrieval-Augmented Generation (RAG). SRag streamlines the connection between your document embeddings and Large Language Models.

How it Works

SRag simplifies the vectorization pipeline. It ingests your data, creates optimized embeddings, and provides a clean API for semantic retrieval.

  • Ingestion Engine: Automatically parses text, PDFs, and Markdown files into manageable chunks.
  • Vectorization: Generates dense embeddings optimized for rapid semantic search.
  • Query Router: Intercepts user prompts, retrieves relevant context, and augments the LLM input seamlessly.

import os

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Setup local models via Ollama
Settings.llm = Ollama(model="qwen2.5:7b", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Load documents and generate vector index
documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Initialize strict context-based chat engine
chat_engine = index.as_chat_engine(
    chat_mode="context",
    similarity_top_k=20,
    verbose=False,
)
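
The three stages listed above (ingestion, vectorization, query routing) can also be sketched without any external libraries, which makes the data flow easier to follow. This is a minimal illustration, not SRag's actual implementation: the function names (chunk_text, embed, retrieve, augment) are hypothetical, and the bag-of-words vector is a deliberately crude stand-in for a dense embedding model such as nomic-embed-text.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Ingestion: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Vectorization (toy): word counts stand in for a dense embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def augment(query: str, context: list[str]) -> str:
    """Query routing: prepend retrieved context to the user's question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Made-up sample documents for illustration.
docs = [
    "Ollama serves local language models over an HTTP API.",
    "Markdown files are split into chunks before embedding.",
    "Cosine similarity compares the direction of two vectors.",
]
chunks = [c for d in docs for c in chunk_text(d)]
question = "How are Markdown files prepared for embedding?"
prompt = augment(question, retrieve(question, chunks, top_k=1))
print(prompt)
```

The augmented prompt is what would be handed to the LLM. In the LlamaIndex snippet above, the "context" chat mode performs the equivalent retrieve-then-augment step, with similarity_top_k controlling how many chunks are retrieved.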

Core Architecture

Embeddings

Supports local embedding models via HuggingFace or Ollama, as well as external APIs such as OpenAI.

Vector DB

Lightweight in-memory storage that can be persisted to disk, for fast document retrieval.
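
A store that answers queries from memory yet survives restarts can be sketched as follows. This is an assumption about the general design, not SRag's own code: TinyVectorStore, its methods, and the JSON file format are all hypothetical and chosen only to keep the example self-contained.

```python
import json
import math
import os
import tempfile


class TinyVectorStore:
    """Hypothetical in-memory vector store with JSON persistence (illustration only)."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.items.append((text, vector))

    def search(self, query: list[float], top_k: int = 1) -> list[str]:
        """Rank stored texts by cosine similarity to the query vector."""
        def score(vec: list[float]) -> float:
            dot = sum(q * v for q, v in zip(query, vec))
            norm = math.hypot(*query) * math.hypot(*vec)
            return dot / norm if norm else 0.0

        ranked = sorted(self.items, key=lambda item: score(item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

    def save(self, path: str) -> None:
        """Write all (text, vector) pairs to a JSON file."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.items, f)

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        """Rebuild a store from a file written by save()."""
        store = cls()
        with open(path, encoding="utf-8") as f:
            store.items = [(text, vec) for text, vec in json.load(f)]
        return store


store = TinyVectorStore()
store.add("cats", [1.0, 0.0])
store.add("dogs", [0.0, 1.0])

path = os.path.join(tempfile.mkdtemp(), "store.json")
store.save(path)
restored = TinyVectorStore.load(path)
print(restored.search([0.9, 0.1]))
```

A linear scan like this is only reasonable for small collections. With a LlamaIndex index, the comparable step would be index.storage_context.persist(persist_dir=...), though whether SRag relies on that call is not stated here.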

LLM Agnostic

Compatible with Ollama for local inference or any standard API backend.
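
One common way to keep retrieval code independent of the model provider is to depend on a small interface rather than a concrete client. The sketch below illustrates that idea under stated assumptions: LLMBackend, EchoBackend, UppercaseBackend, and answer are hypothetical names, and the two backends are stand-ins where a real adapter would call Ollama or a hosted API.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Anything with a complete(prompt) -> str method counts as a backend."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend that returns the prompt with a prefix."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class UppercaseBackend:
    """Second stand-in, to show that backends are interchangeable."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def answer(question: str, context: list[str], llm: LLMBackend) -> str:
    """Build an augmented prompt and delegate generation to the given backend."""
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return llm.complete(prompt)


context = ["SRag is a RAG implementation."]
print(answer("What is SRag?", context, EchoBackend()))
print(answer("What is SRag?", context, UppercaseBackend()))
```

Because answer only requires a complete method, swapping a local model for a remote one means passing a different object, with no change to the retrieval code.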