TENGWAR - Enterprise RAG Platform
100% private enterprise RAG platform for teams that need AI-powered document intelligence without sending data to the cloud. Combines intelligent document ingestion, hybrid semantic search, and agentic AI capabilities across three client surfaces (Windows desktop, macOS desktop, and web).
Role
Founder / Lead Developer
Domain
AI · Enterprise RAG · Knowledge Management · On-Premises
Stack
.NET 10 · C# · Blazor · MAUI · vLLM · Qwen3.5 · BGE-M3 · Qdrant · PostgreSQL · SignalR · Docker · MinIO · RabbitMQ · Docling · GLM-OCR · M2M-100 · Microsoft Agent Framework
Problem & Context
Many organizations need internal knowledge assistants but cannot send proprietary documents to external LLM APIs due to legal constraints such as GDPR, trade-secret, or IP-protection requirements. TENGWAR is a plug-and-play on-premises solution that embeds company documents into vector databases and provides intelligent Q&A interfaces with precise source citations, all running locally on a single GPU node.
Responsibilities
- Full-stack architecture and implementation: backend API, desktop clients, web fallback, AI pipeline, and hardware configuration
- RAG architecture: hybrid search combining dense embeddings (BGE-M3), sparse vectors (BM25), and visual search with RRF fusion and cross-encoder reranking (BGE-Reranker-v2-m3)
- 14-phase smart document ingestion pipeline, including structure detection (Docling), OCR (GLM-OCR), asset extraction, content fusion, context enrichment, translation (M2M-100), semantic chunking, tokenization, embedding, and vector storage
- Agentic chat system using Microsoft Agent Framework with autonomous tool invocation (document retrieval, web search, email, calendar, file generation)
- Real-time streaming responses via SignalR across all client surfaces
- Multi-layered security: JWT with RSA asymmetric signing, OAuth (Google + Microsoft Entra ID), RBAC, security clearance levels, department-based document filtering, per-user credential encryption (AES-256-GCM)
- Cross-platform desktop clients with .NET MAUI Blazor Hybrid (Windows + macOS)
- 15-service Docker Compose orchestration with GPU memory-sharing strategy for multiple vLLM services on a single NVIDIA GPU
- Multi-format document support: PDF, DOCX, XLSX, PPTX, TXT, MD, HTML, and more, with automatic language detection and translation (100+ languages via M2M-100)
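The hybrid-search fusion step above can be sketched compactly. This is a minimal Python illustration of Reciprocal Rank Fusion (the project's actual implementation is in C#); the document IDs and retriever results are hypothetical:

```python
# Reciprocal Rank Fusion (RRF): merge ranked ID lists from the dense,
# sparse, and visual retrievers into a single fused ranking.
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """rankings: list of ranked lists of document IDs (best first).
    Each document scores 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # standard RRF term
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]   # hypothetical dense-embedding hits
sparse = ["d1", "d3", "d9"]   # hypothetical BM25 hits
visual = ["d7", "d1"]         # hypothetical visual-search hits
print(rrf_fuse([dense, sparse, visual]))  # → ['d1', 'd3', 'd7', 'd9']
```

In the full pipeline, the fused candidate list would then be passed to the BGE-Reranker-v2-m3 cross-encoder for final scoring.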
Architecture & Stack
- Backend: .NET 10 (C#), ASP.NET Core Web API, Entity Framework Core, SignalR hubs
- Desktop: .NET MAUI Blazor Hybrid (Windows + macOS native)
- Web: Blazor Server in Docker (browser-based fallback)
- Chat Model: vLLM serving Qwen3.5-35B-A3B-FP8 (35B total params, 3B active via MoE)
- Embeddings: BGE-M3 (568M, 1024-dimensional vectors)
- Reranker: BGE-Reranker-v2-m3 (cross-encoder)
- OCR: GLM-OCR 0.9B (vision-based document OCR)
- Translation: M2M-100 1.2B (100+ languages)
- Vector Database: Qdrant (dense + sparse + visual collections)
- Document Processing: Docling-based pipeline with semantic chunking
- Database: PostgreSQL 16
- Object Storage: MinIO (S3-compatible)
- Message Queue: RabbitMQ 3
- Web Search: SearXNG (self-hosted metasearch for agent tool)
- Deployment: Docker Compose (15 services), single NVIDIA GPU with precise memory allocation per vLLM service
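The single-GPU memory-sharing strategy can be illustrated with a Compose fragment. Service names and the memory fractions are hypothetical; the real mechanism is vLLM's `--gpu-memory-utilization` flag (a 0–1 fraction of VRAM), which lets several vLLM containers coexist on one NVIDIA GPU:

```yaml
# Illustrative docker-compose fragment, not the project's actual file.
services:
  chat-llm:                # Qwen3.5-35B-A3B-FP8 chat model
    image: vllm/vllm-openai:latest
    command: ["--model", "Qwen3.5-35B-A3B-FP8",
              "--gpu-memory-utilization", "0.60"]   # hypothetical share
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  embeddings:              # BGE-M3 embedding service
    image: vllm/vllm-openai:latest
    command: ["--model", "BAAI/bge-m3",
              "--gpu-memory-utilization", "0.15"]   # hypothetical share
```

Each service's fraction is tuned so the combined reservations fit the card's VRAM, leaving headroom for KV-cache growth.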
Outcomes
- Enabled 100% private AI assistant for organizations unable to use cloud LLMs
- 14-phase intelligent ingestion pipeline with automatic OCR, translation, and semantic chunking eliminates manual document processing
- Hybrid search (dense + sparse + visual) with RRF fusion and cross-encoder reranking delivers precise, context-aware answers
- Agentic AI autonomously decides when to retrieve documents, search the web, or use other tools
- Multi-client support (Windows, macOS, Web) with shared Blazor UI layer ensures consistent experience
- Security model with clearance levels and department filtering meets enterprise compliance requirements
- Single-GPU deployment with memory-sharing strategy makes the platform accessible to SMBs without expensive infrastructure
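The semantic-chunking phase of the ingestion pipeline can be sketched as greedy token-budgeted packing. This is a simplified Python illustration (the project is C#, and a whitespace split stands in for the real BGE-M3 tokenizer):

```python
# Token-bounded chunking: pack paragraphs into chunks without ever
# splitting a paragraph across chunk boundaries.
def chunk_paragraphs(paragraphs, max_tokens=200):
    """Greedily pack paragraphs into chunks of at most max_tokens
    "tokens" (whitespace words here). A paragraph that alone exceeds
    the budget becomes its own chunk."""
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        n = len(para.split())  # stand-in for a real token count
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))  # close current chunk
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A production version would also add overlap between chunks and split on semantic boundaries (headings, sentence breaks) rather than paragraphs alone.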
Learn More
Official product page: tengwar.net
