REFERENCE(s):
Process 1.
Storing data into a database
Process 2.
Searching the database from Process (1) for answers to a question
aka SEARCH ALGORITHM (past knowledge: using a binary tree)
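The "past knowledge" approach in the notes above can be sketched as a binary search tree keyed by question text: Process 1 inserts question/answer pairs, Process 2 looks a question up by exact match. This is an illustrative sketch (the `Node`, `insert`, and `search` names are ours, not from any library):

```python
# Process 1 (store) and Process 2 (search) over a binary search tree
# keyed by question text. Exact-match lookup only.

class Node:
    def __init__(self, question, answer):
        self.question, self.answer = question, answer
        self.left = self.right = None

def insert(root, question, answer):
    """Process 1: store a question/answer pair in the tree."""
    if root is None:
        return Node(question, answer)
    if question < root.question:
        root.left = insert(root.left, question, answer)
    else:
        root.right = insert(root.right, question, answer)
    return root

def search(root, question):
    """Process 2: walk the tree; O(log n) on a balanced tree."""
    while root is not None:
        if question == root.question:
            return root.answer
        root = root.left if question < root.question else root.right
    return None

root = None
for q, a in [("what is rag", "retrieval-augmented generation"),
             ("what is a vector", "a list of numbers")]:
    root = insert(root, q, a)

print(search(root, "what is rag"))  # retrieval-augmented generation
```

The limitation of this exact-match search is precisely what the RAG architecture below replaces: vector similarity finds answers even when the user's wording differs from the stored text.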
In 2026, the Retrieval-Augmented Generation (RAG) pipeline has evolved from a simple "search-and-paste" script into a sophisticated, multi-layered architecture often referred to as Agentic RAG.
At its core, a RAG pipeline is a system that allows a Large Language Model (LLM) to look up information from a specific, external knowledge base before generating a response. This helps ground the output in facts and keeps it up to date.
🏗️ The 2026 RAG Architecture
Modern pipelines are split into two distinct phases: the Indexing Pipeline (offline) and the Retrieval/Inference Pipeline (online).
1. The Indexing Pipeline (Data Ingestion)
This is how you prepare your knowledge so the AI can "read" it later.
- Ingestion & Parsing: Extracting text, tables, and images from PDFs, wikis, or databases.
- Chunking: Breaking long documents into smaller, semantically meaningful pieces.
- Embedding: Converting text chunks into numerical vectors (lists of numbers) using an embedding model.
- Vector Storage: Saving these vectors in a specialized database (e.g., Pinecone, Milvus, or Weaviate) for fast similarity searching.
2. The Retrieval & Generation Pipeline (User Interaction)
When a user asks a question, the following "online" steps occur:
- Query Transformation: The system rewrites the user's question to make it better for searching (e.g., "Multi-query" or "HyDE" strategies).
- Retrieval (Hybrid Search): The system searches the vector store using Semantic Search (meaning-based) and Keyword Search (exact word-based) simultaneously.
- Reranking: A secondary, more powerful model (Cross-Encoder) takes the top 50 results and picks the 5 most relevant ones to ensure high precision.
- Augmentation: The top 5 chunks are "stuffed" into the prompt alongside the original question.
- Generation: The LLM reads the context and generates an answer, typically including citations to the source documents.
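The online steps above can be sketched in miniature. This is a self-contained toy: cosine similarity over bag-of-words vectors plays the role of semantic search, a substring check plays the role of keyword search, the fusion is a naive score sum, and the reranker and LLM calls are left as placeholders:

```python
# Toy online pipeline: hybrid retrieve -> augment -> (hand off to LLM).
import math
from collections import Counter

def embed(text):
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def cosine(a, b):
    return sum(v * b.get(w, 0.0) for w, v in a.items())

CHUNKS = [  # pretend these came out of the indexing pipeline
    "RAG grounds LLM answers in an external knowledge base.",
    "Reranking uses a cross-encoder for precision.",
    "Bananas are a good source of potassium.",
]

def retrieve(query, k=2):
    """Hybrid search: combine a semantic score and a keyword score."""
    qvec = embed(query)
    scored = []
    for c in CHUNKS:
        semantic = cosine(qvec, embed(c))                      # meaning-based
        keyword = 1.0 if query.lower().split()[0] in c.lower() else 0.0
        scored.append((semantic + keyword, c))                 # naive fusion
    return [c for _, c in sorted(scored, reverse=True)[:k]]

def answer(query):
    """Augmentation: stuff the top chunks into the prompt."""
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations:"
    return prompt  # a real system would send this prompt to the LLM

print(answer("RAG grounding"))
```

A production system would add query transformation before `retrieve`, a cross-encoder reranking pass between retrieval and augmentation, and of course an actual LLM call in `answer`.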
📊 Comparison: Basic vs. Production RAG
| Feature | Basic RAG (2023-24) | Production/Agentic RAG (2026) |
|---|---|---|
| Search Method | Vector (Semantic) only | Hybrid Search (Vector + Keyword) |
| Logic | Linear (Step A → B → C) | Agentic (Self-correcting loops) |
| Accuracy | ~60-70% | 85%+ (due to Reranking) |
| Data Types | Text only | Multimodal (Text, Tables, Images) |
| Security | None | RBAC (Role-Based Access Control) |
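The "Hybrid Search" row above is commonly implemented with Reciprocal Rank Fusion (RRF), which merges the vector and keyword result lists by rank alone, so the two incompatible scoring scales never need to be calibrated against each other. A minimal sketch (the `doc*` IDs are illustrative):

```python
# Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document.
def rrf(rankings, k=60):
    """Fuse ranked lists of doc IDs; k=60 is the constant from the
    original RRF paper (Cormack et al., 2009)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # semantic search order
keyword_hits = ["doc1", "doc9", "doc3"]  # keyword/BM25 order

print(rrf([vector_hits, keyword_hits]))  # doc1 first: high in both lists
```

Because RRF only looks at positions, a document that appears near the top of both lists reliably outranks one that dominates a single list, which is exactly the behavior hybrid search wants.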
🚀 Why RAG is the "Standard" in 2026
- Low Cost: Cheaper than fine-tuning a model every time your data changes.
- Freshness: You can update your database in seconds, and the AI will "know" the new info instantly.
- Fewer Hallucinations: Because the AI is grounded in the provided documents, it is much less likely to make things up.
- Transparency: Every answer comes with a source link, making the AI's "thought process" auditable.