- Document Ingestion: Processes multi-format documents (PDF, DOCX, TXT, Markdown) into chunks with metadata.
- Vector Storage: Stores embeddings in Gravix Layer’s Vector Database using the `baai/bge-large-en-v1.5` model.
- Contextual Retrieval: Expands queries with conversation history, performs semantic search, and reranks results dynamically.
- Response Generation: Uses `meta-llama/llama-3.1-8b-instruct` to generate responses incorporating conversation history and retrieved chunks.
- Memory Management: Maintains a persistent conversation history (up to 10 interactions) for continuity.
## Prerequisites
To use the system, ensure the following:

- Python Environment: Python 3.8+ (the notebook uses 3.9.7).
- Gravix Layer API Key: Obtain one from platform.gravixlayer.com and set it as an environment variable, or use a `.env` file with `GRAVIXLAYER_API_KEY=your_key` (see the sketch after this list).
- Dependencies: Install the required packages:
  - `gravixlayer`: Official SDK for Gravix Layer APIs.
  - `PyPDF2`, `python-docx`: For PDF and DOCX processing.
  - `requests`, `python-dotenv`: For API calls and environment management.
- Sample Files: Documents (e.g., `/path/to/Test.pdf`) for testing ingestion.
- Jupyter Notebook: Use Jupyter or a compatible IDE to run the notebook.
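A minimal sketch of loading the key with `python-dotenv` (only the environment handling is shown; the `gravixlayer` client setup itself is omitted, since it depends on the SDK's interface):

```python
import os
from dotenv import load_dotenv

# Read GRAVIXLAYER_API_KEY from a local .env file into the environment.
load_dotenv()

api_key = os.getenv("GRAVIXLAYER_API_KEY")
if not api_key:
    raise RuntimeError("GRAVIXLAYER_API_KEY is not set; get a key at platform.gravixlayer.com")
```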
## System Setup
Follow these steps to initialize the system:

- Load Configuration: The notebook defines defaults in `CONFIG`; customize these (e.g., `chunk_size`) if needed. A sketch of the expected shape follows this list.
- Initialize Session State: Run the `init_session_state()` function to set up global variables:
  - `chat_history`: Deque for conversation history (max 10 interactions).
  - `processed_files`: List of ingested documents.
  - `document_chunks`: Dictionary storing chunk IDs and text.
- Create Vector Index: Run `setup_contextual_rag('contextual-rag-demo')` to create or find a vector index. This:
  - Checks for existing indexes with `list_vector_indexes()`.
  - Creates a new index if none exists (`create_vector_index`).
  - Stores the index ID in `session_state['index_id']`.
- Check Status: Use `show_system_status()` to confirm setup.
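The exact `CONFIG` block lives in the notebook; the sketch below is assembled from the defaults quoted throughout this guide, and the key names themselves are assumptions:

```python
# Illustrative defaults; key names are assumptions, values come from this guide.
CONFIG = {
    "embedding_model": "baai/bge-large-en-v1.5",
    "llm_model": "meta-llama/llama-3.1-8b-instruct",
    "chunk_size": 800,       # characters per chunk
    "chunk_overlap": 150,    # characters shared between adjacent chunks
    "retrieval_k": 5,        # hits fetched per vector search
    "rerank_k": 3,           # chunks kept after contextual reranking
    "max_tokens": 600,       # response length limit
    "temperature": 0.1,      # low temperature for grounded answers
    "max_history": 10,       # interactions kept in chat_history
}
```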
## Document Ingestion
Ingest documents to populate the knowledge base:

- Single Document:
  - Use `ingest_document('/path/to/file.pdf', custom_name='Test.pdf')`.
  - Process:
    - Reads file bytes (`extract_text_from_file`); supports PDF (page-wise extraction), DOCX (paragraphs), TXT, and Markdown.
    - Chunks text with `chunk_text_contextually` (sentence-aware, 800 chars, 150-char overlap); a chunking sketch follows this list.
    - Generates metadata (filename, chunk_number, char_start/end, document_type, context_summary).
    - Embeds chunks using `baai/bge-large-en-v1.5` and upserts them to the vector DB (`upsert_vectors`).
    - Updates `session_state` with the processed files and chunks.
- Bulk Ingestion:
  - Use `bulk_ingest_directory('/docs/')` to process all supported files (PDF, DOCX, TXT, MD).
- Metadata: Each chunk includes:
  - `filename`: Document name.
  - `chunk_number`: Sequential index.
  - `char_start`/`char_end`: Text position.
  - `document_type`: pdf/docx/text/markdown.
  - `context_summary`: Top 5 keywords for context.
  - `has_previous`/`is_final`: Structural flags.
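The notebook's `chunk_text_contextually` is not reproduced in this guide; the sketch below illustrates the behavior described above (sentence-aware splitting, 800-character chunks, 150-character overlap, per-chunk metadata). Names and details are illustrative, not the notebook's exact implementation:

```python
import re
from collections import Counter

def chunk_text_contextually(text, filename, chunk_size=800, overlap=150):
    """Illustrative sentence-aware chunker; character positions are approximate."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks, current, start = [], "", 0
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append((start, current))
            tail = current[-overlap:]          # carry the chunk tail forward as overlap
            start += len(current) - len(tail)
            current = tail
        current = f"{current} {sentence}".strip()
    if current:
        chunks.append((start, current))

    records = []
    for i, (char_start, chunk) in enumerate(chunks):
        words = re.findall(r"[a-z]{4,}", chunk.lower())
        keywords = [w for w, _ in Counter(words).most_common(5)]
        records.append({
            "text": chunk,
            "filename": filename,
            "chunk_number": i,
            "char_start": char_start,
            "char_end": char_start + len(chunk),
            "document_type": filename.rsplit(".", 1)[-1].lower(),
            "context_summary": ", ".join(keywords),  # top 5 keywords
            "has_previous": i > 0,
            "is_final": i == len(chunks) - 1,
        })
    return records
```

Calling this on a document's text yields a list of metadata-bearing records ready for embedding and upserting.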
## Querying the System
Query the system to retrieve and generate context-aware responses:

- Single Query:
  - Use `query_contextual_rag("What is the main topic?")` (a usage sketch follows this list).
  - Process:
    - Fetches conversation context (`get_conversation_context`, last 3 interactions).
    - Expands the query with keywords (`_expand_query_contextually`).
    - Searches the vector DB (`search_vectors`, top_k=5), deduplicates hits (`_deduplicate_hits`), and reranks them (`_rerank_by_context`) based on:
      - Conversation keywords (0.05 bonus per match).
      - Query type (e.g., 0.1 bonus for “what” queries matching “definition”).
      - Document structure (0.03 bonus for first chunks).
    - Builds a prompt from system instructions, conversation history, and the top 3 chunks (`_build_contextual_prompt`).
    - Generates a response with `meta-llama/llama-3.1-8b-instruct` (max_tokens=600, temperature=0.1).
    - Post-processes the response (`_post_process_response`, e.g., cleans formatting).
    - Saves the query and response to `chat_history`.
- Interactive Chat:
  - Use `interactive_chat()` for a conversational loop: enter queries, ‘quit’ to exit, ‘clear’ to reset history, ‘status’ for system info.
- Options:
  - `show_context=True`: Displays the retrieved chunks.
  - `show_retrieval_info=True`: Shows a retrieval summary (chunks analyzed, sources).
  - Returned dictionary: `{"query", "response", "context", "hits", "conversation_context_used"}`.
## Key Features
The system’s contextual enhancements distinguish it from traditional RAG:

- Conversational Context:
  - Stores up to 10 interactions in `chat_history` (a deque).
  - Uses recent history (the last 3 Q&A pairs) to inform retrieval and generation.
  - Example: `get_conversation_context()` formats history as “Human: … Assistant: …”.
- Query Enhancement:
  - Expands queries with conversation keywords (`extract_conversation_keywords`, top 8 words).
  - Generates variations (e.g., partial queries, question-type-specific forms).
  - Example: for “What is AI?”, adds keywords like “predictive, expert” from prior chats.
- Contextual Retrieval:
  - Multi-query search (`contextual_search_and_rerank`) with deduplication.
  - Reranks chunks based on:
    - Keyword matches (0.05 bonus per match).
    - Query type alignment (e.g., 0.1 bonus for “how” queries matching “process”).
    - Structural relevance (0.03 bonus for chunk_number=0).
  - Selects the top 3 chunks (`rerank_k=3`). A reranking sketch follows this list.
- Memory Management:
  - Persistent `session_state` tracks files, chunks, and history.
  - Export via `export_conversation('json', 'chat_log.json')` or in TXT format.
- Multi-Document Support:
  - Handles PDF, DOCX, TXT, and Markdown with format-specific extraction.
  - Metadata includes `context_summary` (top 5 keywords per chunk).
- Real-Time Processing:
  - Supports interactive document uploads and querying.
  - Example: ingest a PDF and query it immediately.
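The notebook's `_rerank_by_context` is not reproduced here; the sketch below illustrates the scoring scheme described above (0.05 per conversation-keyword match, 0.1 for query-type alignment, 0.03 for first chunks). The hit structure and the query-type cue table are assumptions:

```python
from collections import Counter

# Hypothetical query-type cues: a "what" query favors definition-like chunks, etc.
QUERY_TYPE_CUES = {"what": "definition", "how": "process", "why": "reason"}

def rerank_by_context(hits, query, conversation_text, rerank_k=3):
    """Illustrative reranker; hits are dicts with 'score', 'text', and 'metadata'."""
    words = [w.strip(".,?!") for w in conversation_text.lower().split() if len(w) > 3]
    keywords = [w for w, _ in Counter(words).most_common(8)]
    cue = QUERY_TYPE_CUES.get(query.lower().split()[0], "")

    for hit in hits:
        text = hit["text"].lower()
        bonus = 0.05 * sum(1 for kw in keywords if kw in text)  # conversation keywords
        if cue and cue in text:
            bonus += 0.1                                        # query-type alignment
        if hit["metadata"].get("chunk_number") == 0:
            bonus += 0.03                                       # first-chunk bonus
        hit["score"] += bonus

    return sorted(hits, key=lambda h: h["score"], reverse=True)[:rerank_k]
```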
## Advanced Usage
Enhance the system with these features:

- Customize Configuration:
  - Adjust `CONFIG` (e.g., `chunk_size=1000`, `retrieval_k=10`) before ingesting or querying.
- Metadata Filtering:
  - Search chunks by metadata using `search_by_metadata` (e.g., restrict results to a specific filename or document_type).
- Conversation Management:
  - View history: `show_conversation_history()`.
  - Clear history: `clear_conversation()`.
  - Export: `export_conversation(...)`; a TXT export produces a timestamped file such as `conversation_20250925_1759.txt`.
- Document Summary:
  - Use `get_document_summary()` for an overview of the ingested documents.
- Extend Reranking:
  - Modify `_rerank_by_context` to add custom bonuses (e.g., for specific keywords); see the sketch after this list.
- Integrate APIs:
  - Add external data sources or alternative LLMs by modifying `generate_contextual_response`.
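As an example of extending the reranker, a custom bonus term could be folded into the scoring loop. The snippet below is a hypothetical fragment in the style of the sketch from Key Features, not the notebook's actual `_rerank_by_context`:

```python
# Hypothetical extension: boost chunks that mention domain-critical terms.
PRIORITY_TERMS = {"safety", "compliance"}

def custom_bonus(chunk_text):
    """Add 0.08 for each priority term found in the chunk."""
    text = chunk_text.lower()
    return 0.08 * sum(1 for term in PRIORITY_TERMS if term in text)

# Inside the reranking loop, fold the extra term into the score:
#   hit["score"] += custom_bonus(hit["text"])
```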
## Best Practices
- Optimize Chunking: Adjust `chunk_size` and `chunk_overlap` for your documents (e.g., larger values for dense texts).
- Query Clarity: Use specific queries to leverage contextual reranking effectively.
- History Management: Clear history periodically in long sessions (`clear_conversation()`).
- Metadata Usage: Leverage `search_by_metadata` for targeted retrieval.
- Testing: Start with small documents (e.g., a single PDF) to validate the setup.
- Backup: Export conversations regularly (`export_conversation`) to save progress.
- Monitor API Usage: Track Gravix Layer API calls to avoid rate limits (see platform.gravixlayer.com).
## Conclusion
The Contextual RAG system with Gravix Layer is a powerful tool for building context-aware, interactive AI applications. Its strengths lie in:

- Conversational Continuity: Persistent history ensures coherent multi-turn interactions.
- Enhanced Retrieval: Query expansion and contextual reranking outperform traditional RAG.
- Flexibility: Supports diverse documents and extensible features.

