ChatGPT
PIPELINE: Creating and Using


Process 1.

Storing data into a database
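Process 1 can be sketched with Python's built-in `sqlite3` module. The table and column names here are illustrative, not taken from the notes; a real pipeline would use a persistent file path (or a dedicated database) instead of `:memory:`.

```python
import sqlite3

# Minimal sketch of Process 1: storing documents in a database.
# Schema is illustrative only.
conn = sqlite3.connect(":memory:")  # use a file path for persistence
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
)
docs = [
    ("Intro to RAG", "RAG grounds LLM answers in external documents."),
    ("Vector Stores", "Vectors enable fast similarity search."),
]
conn.executemany("INSERT INTO documents (title, body) VALUES (?, ?)", docs)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
print(count)  # 2
```

Once stored this way, the rows can be queried in Process 2 by key, by keyword, or (in the RAG setting described below) by vector similarity.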


Process 2.

Searching the database from Process (1) for answers to a question

a.k.a. SEARCH ALGORITHM (past knowledge: using a binary tree)
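The "binary tree" approach mentioned above can be sketched as a small binary search tree keyed on strings. This is the classic ordered-search technique, not a production index; the keys and values are invented for illustration.

```python
# Minimal binary search tree: O(log n) lookup on average for ordered keys.
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def insert(root, key, value):
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value  # duplicate key: overwrite
    return root

def search(root, key):
    # Walk down the tree, going left for smaller keys, right for larger.
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None  # key not found

root = None
for k, v in [("pipeline", "see Process 1"), ("rag", "see below"), ("agent", "loops")]:
    root = insert(root, k, v)
print(search(root, "rag"))  # see below
```

Exact-key lookup like this is the "past-knowledge" baseline; the RAG pipeline described below replaces it with similarity search over vectors, which can match on meaning rather than exact keys.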


In 2026, the Retrieval-Augmented Generation (RAG) pipeline has evolved from a simple "search-and-paste" script into a sophisticated, multi-layered architecture often referred to as Agentic RAG.

At its core, a RAG pipeline is a system that allows a Large Language Model (LLM) to look up information from a specific, external knowledge base before generating a response. This ensures the output is grounded in facts and up-to-date.


🏗️ The 2026 RAG Architecture

Modern pipelines are split into two distinct phases: the Indexing Pipeline (offline) and the Retrieval/Inference Pipeline (online).

1. The Indexing Pipeline (Data Ingestion)

This is how you prepare your knowledge so the AI can "read" it later.

  • Ingestion & Parsing: Extracting text, tables, and images from PDFs, wikis, or databases.
  • Chunking: Breaking long documents into smaller, semantically meaningful pieces.
  • Embedding: Converting text chunks into numerical vectors (lists of numbers) using an embedding model.
  • Vector Storage: Saving these vectors in a specialized database (e.g., Pinecone, Milvus, or Weaviate) for fast similarity searching.

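The indexing steps above can be sketched as follows. The chunk size, overlap, and the hash-based `toy_embed` are illustrative stand-ins: a real pipeline would call an actual embedding model and write the vectors to a vector store such as those named above.

```python
import hashlib
import math

def chunk(text, size=200, overlap=40):
    # Fixed-size character chunking with overlap; real pipelines often
    # split on sentence or section boundaries instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def toy_embed(text, dims=8):
    # Stand-in for a real embedding model: a deterministic hash-based
    # vector, normalised to unit length. Illustrative only.
    h = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in h[:dims]]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

doc = "RAG pipelines ground LLM answers in retrieved documents. " * 10
# The "index" pairs each chunk with its vector, ready for similarity search.
index = [(c, toy_embed(c)) for c in chunk(doc)]
print(len(index))
```

The overlap between chunks is a common design choice: it keeps sentences that straddle a chunk boundary retrievable from at least one chunk.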
2. The Retrieval & Generation Pipeline (User Interaction)

When a user asks a question, the following "online" steps occur:

  • Query Transformation: The system rewrites the user's question to make it better for searching (e.g., "Multi-query" or "HyDE" strategies).
  • Retrieval (Hybrid Search): The system searches the vector store using Semantic Search (meaning-based) and Keyword Search (exact word-based) simultaneously.
  • Reranking: A secondary, more powerful model (a cross-encoder) re-scores the first-stage results (e.g., the top 50) and keeps only the handful most relevant (e.g., 5) to ensure high precision.
  • Augmentation: The top 5 chunks are "stuffed" into the prompt alongside the original question.
  • Generation: The LLM reads the context and generates an answer, typically including citations to the source documents.

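The online steps above can be sketched end to end. Both scoring functions here are toy placeholders: `cosine` over the stored vectors stands in for semantic search, and the word-overlap `rerank` stands in for a real cross-encoder.

```python
def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb or 1.0)

def retrieve(query_vec, index, k=50):
    # First-stage recall: top-k chunks by vector similarity.
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in scored[:k]]

def rerank(query, chunks, k=5):
    # Stand-in for a cross-encoder: score by shared word count.
    qwords = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: len(qwords & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, chunks):
    # Augmentation: "stuff" the retrieved chunks into the prompt.
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Tiny hand-made index: (chunk text, embedding vector).
index = [
    ("RAG retrieves documents before generating.", [1.0, 0.0]),
    ("Binary trees support ordered search.", [0.0, 1.0]),
]
query = "How does RAG retrieve documents?"
top = rerank(query, retrieve([0.9, 0.1], index), k=1)
prompt = build_prompt(query, top)
print(top[0])
```

The final `prompt` is what gets sent to the LLM; the generation step itself is a model call and is omitted here.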
📊 Comparison: Basic vs. Production RAG

Feature       | Basic RAG (2023-24)      | Production/Agentic RAG (2026)
Search Method | Vector (semantic) only   | Hybrid search (vector + keyword)
Logic         | Linear (Step A → B → C)  | Agentic (self-correcting loops)
Accuracy      | ~60-70%                  | 85%+ (due to reranking)
Data Types    | Text only                | Multimodal (text, tables, images)
Security      | None                     | RBAC (Role-Based Access Control)
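The "self-correcting loop" in the Agentic column can be sketched as a retry loop around retrieve/generate/grade. All three helper functions are hypothetical stubs standing in for real retriever and LLM calls; only the loop structure is the point.

```python
def retrieve(query):
    # Stub retriever; a real one would query the vector store.
    return ["stub context for: " + query]

def generate(query, context):
    # Stub LLM call.
    return f"draft answer to '{query}' using {len(context)} chunk(s)"

def grade(answer):
    # A real grader would be an LLM judge checking groundedness;
    # here any non-empty answer passes.
    return bool(answer)

def agentic_answer(query, max_loops=3):
    for attempt in range(max_loops):
        context = retrieve(query)
        answer = generate(query, context)
        if grade(answer):
            return answer
        # Query transformation before retrying.
        query = f"{query} (rewritten, attempt {attempt + 2})"
    return "I could not find a grounded answer."  # fail closed

print(agentic_answer("What is hybrid search?"))
```

Failing closed on the last line, rather than answering anyway, is what makes the loop "self-correcting" instead of hallucination-prone.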

🚀 Why RAG is the "Standard" in 2026

  • Low Cost: Cheaper than fine-tuning a model every time your data changes.
  • Freshness: You can update your database in seconds, and the AI will "know" the new info instantly.
  • Fewer Hallucinations: Because the AI is grounded in the provided documents, it is much less likely to make things up.
  • Transparency: Every answer comes with a source link, making the AI's "thought process" auditable.
