Everything we build at Drut AI that can be open-sourced, is: GitHub libraries for the AI engineering stack and open-weight models fine-tuned for specific tasks.
agent-memory
drut-ai/agent-memory
Long-term episodic and semantic memory system for production AI agents. Supports vector stores, episodic recall, and forgetting curves.
rag-eval
drut-ai/rag-eval
Evaluation framework for retrieval-augmented generation pipelines. Covers faithfulness, relevance, context precision, and recall metrics.
prompt-router
drut-ai/prompt-router
Intelligent prompt routing between models based on cost, latency, and capability signals. Drop-in middleware for any LLM stack.
agent-sdk
drut-ai/agent-sdk
TypeScript SDK for building, orchestrating, and debugging multi-step AI agents with first-class tool-use and streaming support.
llm-eval
drut-ai/llm-eval
Lightweight LLM evaluation utilities for Node.js. Run automated evals, golden dataset comparisons, and regression tests in CI.
fast-vector-search
drut-ai/fast-vector-search
High-performance HNSW vector search with SIMD acceleration. Up to 12× faster than FAISS on commodity hardware.
hybrid-retriever
drut-ai/hybrid-retriever
BM25 + dense retrieval fusion with learned re-ranking. Plug-and-play with any vector store. Ships with query expansion.
tool-forge
drut-ai/tool-forge
Type-safe tool definition and validation layer for LLM function calling. Zod-native, works with OpenAI, Anthropic, and Gemini.
context-compressor
drut-ai/context-compressor
Lossless and lossy context compression for long-context LLMs. Reduces token count by 60–80% with configurable fidelity tradeoffs.
trace-llm
drut-ai/trace-llm
OpenTelemetry-native LLM observability layer. Trace token usage, cost, and latency across multi-agent workflows.
memory-graph
drut-ai/memory-graph
Graph-based working memory for agents. Stores entities and relationships extracted from conversations into a queryable knowledge graph.
chunk-wizard
drut-ai/chunk-wizard
Semantic chunking library that respects document structure, tables, and code blocks. Dramatically improves RAG retrieval quality.
gemma4-sql-generator
Based on Gemma 4 9B
Fine-tuned Gemma 4 9B for natural language to SQL. Supports complex joins, CTEs, and window functions. Trained on 180k NL-SQL pairs across PostgreSQL, MySQL, and BigQuery dialects.
qwen3-code-agent
Based on Qwen 3 8B
Qwen 3 8B fine-tuned for agentic coding workflows. Understands tool-use schemas, writes and debugs multi-file code, and reasons over test failures.
llama4-rag-instruct
Based on Llama 4 Scout 17B
Instruction-tuned Llama 4 Scout optimised for faithful RAG responses. Trained to cite sources, flag knowledge gaps, and refuse to hallucinate.
phi4-reasoning-math
Based on Phi-4 14B
Phi-4 fine-tuned with chain-of-thought distillation for multi-step mathematical reasoning. Excels at algebra, calculus, and proof-writing tasks.
mistral-doc-extractor
Based on Mistral 7B v0.3
Mistral 7B fine-tuned for structured extraction from PDFs and enterprise documents. Outputs clean JSON containing extracted entities, dates, and relations.
qwen3-vision-ocr
Based on Qwen3-VL 7B
Qwen3-VL fine-tuned for document layout understanding, table extraction, and handwritten text recognition across 14 languages.
gemma4-intent-sql-classifier
Based on Gemma 4 2B
Ultra-fast 2B classifier that routes user queries to the correct SQL generator (analytical, transactional, or reporting). 4 ms latency on CPU.
llama4-code-reviewer
Based on Llama 4 Maverick 17B
Llama 4 Maverick fine-tuned to perform expert-level code reviews. Identifies security vulnerabilities, performance issues, and style violations with line-level precision.
phi4-legal-reasoning
Based on Phi-4 14B
Contract analysis and legal reasoning model. Identifies clauses, flags risks, and summarises obligations across common-law jurisdictions.
All libraries are actively maintained. PRs, issues, and forks are welcome. Models are on Hugging Face, with quantised GGUF versions available.