- Document Ingestion: Processes multi-format documents (PDF, DOCX, TXT, Markdown) into chunks with metadata.
- Vector Storage: Stores embeddings in Gravix Layer’s Vector Database using the `baai/bge-large-en-v1.5` model.
- Contextual Retrieval: Expands queries with conversation history, performs semantic search, and reranks results dynamically.
- Response Generation: Uses `meta-llama/llama-3.1-8b-instruct` to generate responses incorporating conversation history and retrieved chunks.
- Memory Management: Maintains a persistent conversation history (up to 10 interactions) for continuity.
## Prerequisites
To use the system, ensure the following:

- Python Environment: Python 3.8+ (the notebook uses 3.9.7).
- Gravix Layer API Key: Obtain one from platform.gravixlayer.com and set it as an environment variable, or use a `.env` file with `GRAVIXLAYER_API_KEY=your_key` (see the sketch after this list).
- Dependencies: Install the required packages:
  - `gravixlayer`: Official SDK for Gravix Layer APIs.
  - `PyPDF2`, `python-docx`: For PDF and DOCX processing.
  - `requests`, `python-dotenv`: For API calls and environment management.
- Sample Files: Documents (e.g., `/path/to/Test.pdf`) for testing ingestion.
- Jupyter Notebook: Use Jupyter or a compatible IDE to run the notebook.
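A minimal sketch of loading the key with `python-dotenv` (only the environment handling is shown; the `gravixlayer` client setup itself is omitted, since it depends on the SDK's interface):

```python
import os
from dotenv import load_dotenv

# Read GRAVIXLAYER_API_KEY from a local .env file into the environment.
load_dotenv()

api_key = os.getenv("GRAVIXLAYER_API_KEY")
if not api_key:
    raise RuntimeError("GRAVIXLAYER_API_KEY is not set; get a key at platform.gravixlayer.com")
```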
## System Setup
Follow these steps to initialize the system:

- Load Configuration: The notebook defines defaults in `CONFIG`; customize these (e.g., `chunk_size`) if needed. A sketch of the expected shape follows this list.
- Initialize Session State: Run the `init_session_state()` function to set up global variables:
  - `chat_history`: Deque for conversation history (max 10 interactions).
  - `processed_files`: List of ingested documents.
  - `document_chunks`: Dictionary storing chunk IDs and text.
- Create Vector Index: Run `setup_contextual_rag('contextual-rag-demo')` to create or find a vector index. This:
  - Checks for existing indexes with `list_vector_indexes()`.
  - Creates a new index if none exists (`create_vector_index`).
  - Stores the index ID in `session_state['index_id']`.
- Check Status: Use `show_system_status()` to confirm setup.
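The exact `CONFIG` block lives in the notebook; the sketch below is assembled from the defaults quoted throughout this guide, and the key names themselves are assumptions:

```python
# Illustrative defaults; key names are assumptions, values come from this guide.
CONFIG = {
    "embedding_model": "baai/bge-large-en-v1.5",
    "llm_model": "meta-llama/llama-3.1-8b-instruct",
    "chunk_size": 800,       # characters per chunk
    "chunk_overlap": 150,    # characters shared between adjacent chunks
    "retrieval_k": 5,        # hits fetched per vector search
    "rerank_k": 3,           # chunks kept after contextual reranking
    "max_tokens": 600,       # response length limit
    "temperature": 0.1,      # low temperature for grounded answers
    "max_history": 10,       # interactions kept in chat_history
}
```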
## Document Ingestion
Ingest documents to populate the knowledge base:

- Single Document:
  - Use `ingest_document('/path/to/file.pdf', custom_name='Test.pdf')`.
  - Process:
    - Reads file bytes (`extract_text_from_file`); supports PDF (page-wise extraction), DOCX (paragraphs), TXT, and Markdown.
    - Chunks text with `chunk_text_contextually` (sentence-aware, 800 chars, 150-char overlap); a chunking sketch follows this list.
    - Generates metadata (filename, chunk_number, char_start/end, document_type, context_summary).
    - Embeds chunks using `baai/bge-large-en-v1.5` and upserts them to the vector DB (`upsert_vectors`).
    - Updates `session_state` with the processed files and chunks.
- Bulk Ingestion:
  - Use `bulk_ingest_directory('/docs/')` to process all supported files (PDF, DOCX, TXT, MD).
- Metadata: Each chunk includes:
  - `filename`: Document name.
  - `chunk_number`: Sequential index.
  - `char_start`/`char_end`: Text position.
  - `document_type`: pdf/docx/text/markdown.
  - `context_summary`: Top 5 keywords for context.
  - `has_previous`/`is_final`: Structural flags.
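The notebook's `chunk_text_contextually` is not reproduced in this guide; the sketch below illustrates the behavior described above (sentence-aware splitting, 800-character chunks, 150-character overlap, per-chunk metadata). Names and details are illustrative, not the notebook's exact implementation:

```python
import re
from collections import Counter

def chunk_text_contextually(text, filename, chunk_size=800, overlap=150):
    """Illustrative sentence-aware chunker; character positions are approximate."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks, current, start = [], "", 0
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append((start, current))
            tail = current[-overlap:]          # carry the chunk tail forward as overlap
            start += len(current) - len(tail)
            current = tail
        current = f"{current} {sentence}".strip()
    if current:
        chunks.append((start, current))

    records = []
    for i, (char_start, chunk) in enumerate(chunks):
        words = re.findall(r"[a-z]{4,}", chunk.lower())
        keywords = [w for w, _ in Counter(words).most_common(5)]
        records.append({
            "text": chunk,
            "filename": filename,
            "chunk_number": i,
            "char_start": char_start,
            "char_end": char_start + len(chunk),
            "document_type": filename.rsplit(".", 1)[-1].lower(),
            "context_summary": ", ".join(keywords),  # top 5 keywords
            "has_previous": i > 0,
            "is_final": i == len(chunks) - 1,
        })
    return records
```

Calling this on a document's text yields a list of metadata-bearing records ready for embedding and upserting.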
## Querying the System
Query the system to retrieve and generate context-aware responses:

- Single Query:
  - Use `query_contextual_rag("What is the main topic?")` (a usage sketch follows this list).
  - Process:
    - Fetches conversation context (`get_conversation_context`, last 3 interactions).
    - Expands the query with keywords (`_expand_query_contextually`).
    - Searches the vector DB (`search_vectors`, top_k=5), deduplicates hits (`_deduplicate_hits`), and reranks them (`_rerank_by_context`) based on:
      - Conversation keywords (0.05 bonus per match).
      - Query type (e.g., 0.1 bonus for “what” queries matching “definition”).
      - Document structure (0.03 bonus for first chunks).
    - Builds a prompt from system instructions, conversation history, and the top 3 chunks (`_build_contextual_prompt`).
    - Generates a response with `meta-llama/llama-3.1-8b-instruct` (max_tokens=600, temperature=0.1).
    - Post-processes the response (`_post_process_response`, e.g., cleans formatting).
    - Saves the query and response to `chat_history`.
- Interactive Chat:
  - Use `interactive_chat()` for a conversational loop: enter queries, ‘quit’ to exit, ‘clear’ to reset history, ‘status’ for system info.
- Options:
  - `show_context=True`: Displays the retrieved chunks.
  - `show_retrieval_info=True`: Shows a retrieval summary (chunks analyzed, sources).
  - Returned dictionary: `{"query", "response", "context", "hits", "conversation_context_used"}`.
## Key Features
The system’s contextual enhancements distinguish it from traditional RAG:

- Conversational Context:
  - Stores up to 10 interactions in `chat_history` (a deque).
  - Uses recent history (the last 3 Q&A pairs) to inform retrieval and generation.
  - Example: `get_conversation_context()` formats history as “Human: … Assistant: …”.
- Query Enhancement:
  - Expands queries with conversation keywords (`extract_conversation_keywords`, top 8 words).
  - Generates variations (e.g., partial queries, question-type-specific forms).
  - Example: for “What is AI?”, adds keywords like “predictive, expert” from prior chats.
- Contextual Retrieval:
  - Multi-query search (`contextual_search_and_rerank`) with deduplication.
  - Reranks chunks based on:
    - Keyword matches (0.05 bonus per match).
    - Query type alignment (e.g., 0.1 bonus for “how” queries matching “process”).
    - Structural relevance (0.03 bonus for chunk_number=0).
  - Selects the top 3 chunks (`rerank_k=3`). A reranking sketch follows this list.
- Memory Management:
  - Persistent `session_state` tracks files, chunks, and history.
  - Export via `export_conversation('json', 'chat_log.json')` or in TXT format.
- Multi-Document Support:
  - Handles PDF, DOCX, TXT, and Markdown with format-specific extraction.
  - Metadata includes `context_summary` (top 5 keywords per chunk).
- Real-Time Processing:
  - Supports interactive document uploads and querying.
  - Example: ingest a PDF and query it immediately.
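The notebook's `_rerank_by_context` is not reproduced here; the sketch below illustrates the scoring scheme described above (0.05 per conversation-keyword match, 0.1 for query-type alignment, 0.03 for first chunks). The hit structure and the query-type cue table are assumptions:

```python
from collections import Counter

# Hypothetical query-type cues: a "what" query favors definition-like chunks, etc.
QUERY_TYPE_CUES = {"what": "definition", "how": "process", "why": "reason"}

def rerank_by_context(hits, query, conversation_text, rerank_k=3):
    """Illustrative reranker; hits are dicts with 'score', 'text', and 'metadata'."""
    words = [w.strip(".,?!") for w in conversation_text.lower().split() if len(w) > 3]
    keywords = [w for w, _ in Counter(words).most_common(8)]
    cue = QUERY_TYPE_CUES.get(query.lower().split()[0], "")

    for hit in hits:
        text = hit["text"].lower()
        bonus = 0.05 * sum(1 for kw in keywords if kw in text)  # conversation keywords
        if cue and cue in text:
            bonus += 0.1                                        # query-type alignment
        if hit["metadata"].get("chunk_number") == 0:
            bonus += 0.03                                       # first-chunk bonus
        hit["score"] += bonus

    return sorted(hits, key=lambda h: h["score"], reverse=True)[:rerank_k]
```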
## Advanced Usage
Enhance the system with these features:

- Customize Configuration:
  - Adjust `CONFIG` (e.g., `chunk_size=1000`, `retrieval_k=10`) before ingesting or querying.
- Metadata Filtering:
  - Search chunks by metadata using `search_by_metadata` (e.g., restrict results to a specific filename or document_type).
- Conversation Management:
  - View history: `show_conversation_history()`.
  - Clear history: `clear_conversation()`.
  - Export: `export_conversation(...)`; a TXT export produces a timestamped file such as `conversation_20250925_1759.txt`.
- Document Summary:
  - Use `get_document_summary()` for an overview of the ingested documents.
- Extend Reranking:
  - Modify `_rerank_by_context` to add custom bonuses (e.g., for specific keywords); see the sketch after this list.
- Integrate APIs:
  - Add external data sources or alternative LLMs by modifying `generate_contextual_response`.
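As an example of extending the reranker, a custom bonus term could be folded into the scoring loop. The snippet below is a hypothetical fragment in the style of the sketch from Key Features, not the notebook's actual `_rerank_by_context`:

```python
# Hypothetical extension: boost chunks that mention domain-critical terms.
PRIORITY_TERMS = {"safety", "compliance"}

def custom_bonus(chunk_text):
    """Add 0.08 for each priority term found in the chunk."""
    text = chunk_text.lower()
    return 0.08 * sum(1 for term in PRIORITY_TERMS if term in text)

# Inside the reranking loop, fold the extra term into the score:
#   hit["score"] += custom_bonus(hit["text"])
```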
## Best Practices
- Optimize Chunking: Adjust `chunk_size` and `chunk_overlap` for your documents (e.g., larger values for dense texts).
- Query Clarity: Use specific queries to leverage contextual reranking effectively.
- History Management: Clear history periodically in long sessions (`clear_conversation()`).
- Metadata Usage: Leverage `search_by_metadata` for targeted retrieval.
- Testing: Start with small documents (e.g., a single PDF) to validate the setup.
- Backup: Export conversations regularly (`export_conversation`) to save progress.
- Monitor API Usage: Track Gravix Layer API calls to avoid rate limits (see platform.gravixlayer.com).
## Conclusion
The Contextual RAG system with Gravix Layer is a powerful tool for building context-aware, interactive AI applications. Its strengths lie in:

- Conversational Continuity: Persistent history ensures coherent multi-turn interactions.
- Enhanced Retrieval: Query expansion and contextual reranking outperform traditional RAG.
- Flexibility: Supports diverse documents and extensible features.

