Open Source — Drut AI

Built in the open.
Used in production.

Everything we build at Drut AI that can be open-sourced, is: GitHub libraries for the AI engineering stack, and open-weight models fine-tuned for specific tasks.

12 Libraries · 9 Open Models · 8.6k GitHub Stars · 170k+ HF Downloads
GitHub Libraries

Open source tools

12 / 12 packages
Python · v0.9.2

agent-memory

drut-ai/agent-memory

Long-term episodic and semantic memory system for production AI agents. Supports vector stores, episodic recall, and forgetting curves.

Memory · Agents · Vector DB · Episodic
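
The "forgetting curves" mentioned above can be pictured as retention that decays exponentially with time since last recall. The following is a minimal sketch, assuming an Ebbinghaus-style curve R = exp(-t / S). The `MemoryItem` class, the `retention` and `prune` functions, and all numeric values are hypothetical and are not taken from the agent-memory API.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """Hypothetical memory record; not the agent-memory data model."""
    text: str
    age_hours: float        # time since the memory was last recalled
    stability_hours: float  # larger values decay more slowly


def retention(item: MemoryItem) -> float:
    """Ebbinghaus-style retention R = exp(-t / S), in the range (0, 1]."""
    return math.exp(-item.age_hours / item.stability_hours)


def prune(items: list[MemoryItem], threshold: float = 0.2) -> list[MemoryItem]:
    """Keep only memories whose retention is at or above the threshold."""
    return [m for m in items if retention(m) >= threshold]


memories = [
    MemoryItem("user prefers dark mode", age_hours=24, stability_hours=240),
    MemoryItem("one-off weather question", age_hours=72, stability_hours=24),
]
kept = prune(memories)  # only the slowly decaying preference survives
```

A real system would likely reset `age_hours` on each recall and raise `stability_hours` for memories that keep proving useful; those policies are left out here.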
Python · v1.2.0

rag-eval

drut-ai/rag-eval

Evaluation framework for retrieval-augmented generation pipelines. Covers faithfulness, relevance, context precision, and recall metrics.

RAG · Evaluation · Metrics · RAGAS
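
To make the context precision and recall metrics concrete, here is a minimal sketch of their simplest ID-based form, assuming each retrieved chunk has an ID and a labelled set of relevant IDs is available. The function names are hypothetical and this is not the rag-eval API; evaluation frameworks often use model-judged variants of these metrics instead.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunk IDs that are labelled relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk_id in retrieved if chunk_id in relevant)
    return hits / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant chunk IDs that appear in the retrieved list."""
    if not relevant:
        return 0.0
    found = relevant.intersection(retrieved)
    return len(found) / len(relevant)


retrieved = ["c1", "c4", "c7", "c9"]
relevant = {"c1", "c2", "c7"}

precision = context_precision(retrieved, relevant)  # 2 of 4 retrieved are relevant
recall = context_recall(retrieved, relevant)        # 2 of 3 relevant were retrieved
```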
Python · v0.6.1

prompt-router

drut-ai/prompt-router

Intelligent prompt routing between models based on cost, latency, and capability signals. Drop-in middleware for any LLM stack.

Routing · Cost Optimization · Multi-Model
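
One simple way to picture routing on cost, latency, and capability signals is to filter models by hard constraints and then pick the cheapest survivor. The sketch below assumes exactly that policy. The `ModelProfile` class, the `route` function, the model names, and every number are hypothetical, not the prompt-router API or real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model signals; all values are illustrative."""
    name: str
    cost_per_1k_tokens: float  # USD
    p95_latency_ms: int
    capability: int            # 1 (basic) to 5 (strongest)


def route(models: list[ModelProfile], min_capability: int,
          max_latency_ms: int) -> ModelProfile:
    """Return the cheapest model that meets both constraints."""
    eligible = [
        m for m in models
        if m.capability >= min_capability and m.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise ValueError("no model satisfies the constraints")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


MODELS = [
    ModelProfile("small", cost_per_1k_tokens=0.10, p95_latency_ms=300, capability=2),
    ModelProfile("medium", cost_per_1k_tokens=0.50, p95_latency_ms=800, capability=3),
    ModelProfile("large", cost_per_1k_tokens=3.00, p95_latency_ms=2500, capability=5),
]

choice = route(MODELS, min_capability=3, max_latency_ms=1000)  # picks "medium"
```

Treating capability and latency as hard limits keeps the policy easy to audit; a weighted score across all three signals is a common alternative when the limits are soft.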
TypeScript · v2.0.0-beta

agent-sdk

drut-ai/agent-sdk

TypeScript SDK for building, orchestrating, and debugging multi-step AI agents with first-class tool-use and streaming support.

SDK · Agents · TypeScript · Streaming
TypeScript · v1.0.4

llm-eval

drut-ai/llm-eval

Lightweight LLM evaluation utilities for Node.js. Run automated evals, golden dataset comparisons, and regression tests in CI.

Evaluation · CI/CD · Node.js · Testing
Rust · v3.1.0

fast-vector-search

drut-ai/fast-vector-search

High-performance HNSW vector search with SIMD acceleration. Up to 12× faster than FAISS on commodity hardware.

Vector Search · HNSW · SIMD · Rust
Python · v0.4.3

hybrid-retriever

drut-ai/hybrid-retriever

BM25 + dense retrieval fusion with learned re-ranking. Plug-and-play with any vector store. Ships with query expansion.

BM25 · Dense Retrieval · Re-ranking · RAG
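
Fusing a BM25 ranking with a dense-retrieval ranking can be illustrated with reciprocal rank fusion (RRF), a common rank-based baseline. This is a swapped-in technique for illustration only: the description above mentions learned re-ranking, which this sketch does not implement, and the function name and document IDs are hypothetical rather than part of hybrid-retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["d3", "d1", "d2"]   # lexical match order
dense_ranking = ["d1", "d4", "d3"]  # embedding similarity order

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# d1 and d3 appear in both lists, so they rank above d4 and d2
```

RRF needs only ranks, not raw scores, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.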
TypeScript · v1.1.2

tool-forge

drut-ai/tool-forge

Type-safe tool definition and validation layer for LLM function calling. Zod-native, works with OpenAI, Anthropic, and Gemini.

Function Calling · Zod · Type-safe · Tools
Python · v0.7.0

context-compressor

drut-ai/context-compressor

Lossless and lossy context compression for long-context LLMs. Reduces token count by 60–80% with configurable fidelity tradeoffs.

Context Window · Token Reduction · Long Context
Go · v0.3.1

trace-llm

drut-ai/trace-llm

OpenTelemetry-native LLM observability layer. Trace every token, cost, and latency across multi-agent workflows.

Observability · OpenTelemetry · Tracing · Go
Python · v1.0.0

memory-graph

drut-ai/memory-graph

Graph-based working memory for agents. Stores entities and relationships extracted from conversations into a queryable knowledge graph.

Knowledge Graph · Memory · Entities · Neo4j
Python · v0.5.2

chunk-wizard

drut-ai/chunk-wizard

Semantic chunking library that respects document structure, tables, and code blocks. Dramatically improves RAG retrieval quality.

Chunking · Semantic · RAG · Document Parsing
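
What "respecting code blocks" means for a chunker can be shown with a small structure-aware splitter. The sketch below assumes Markdown input, treats each fenced code block as atomic, and packs blocks greedily up to a character budget. It covers only the structure-preserving part, not embedding-based semantic boundaries, and `split_blocks` and `pack_chunks` are hypothetical names rather than the chunk-wizard API.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built indirectly so it can sit inside a fence


def split_blocks(text: str) -> list[str]:
    """Split on blank lines, but keep each fenced code block as one block."""
    blocks: list[str] = []
    current: list[str] = []
    in_code = False
    for line in text.splitlines():
        if line.startswith(FENCE):
            in_code = not in_code
        if line.strip() == "" and not in_code:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def pack_chunks(blocks: list[str], max_chars: int) -> list[str]:
    """Greedily merge consecutive blocks without exceeding max_chars.

    A single block longer than max_chars is emitted whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for block in blocks:
        candidate = block if not current else current + "\n\n" + block
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "\n".join([
    "Intro paragraph.",
    "",
    FENCE + "python",
    "x = 1",
    "",
    "y = 2",
    FENCE,
    "",
    "Closing paragraph.",
])
chunks = pack_chunks(split_blocks(doc), max_chars=40)
```

A naive splitter that cuts on every blank line would break the code block at the empty line between `x = 1` and `y = 2`; tracking the fence state avoids that.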
Open Weights · HuggingFace

Task-specific models

9 / 9 models
SQL Generation · 9B

gemma4-sql-generator

Based on Gemma 4 9B

Gemma 4 9B fine-tuned for natural-language-to-SQL generation. Supports complex joins, CTEs, and window functions. Trained on 180k NL-SQL pairs across PostgreSQL, MySQL, and BigQuery dialects.

Spider 2.0: 84.2%
SQL · NL2SQL · Gemma 4 · PostgreSQL · BigQuery
Code Agent · 8B

qwen3-code-agent

Based on Qwen 3 8B

Qwen 3 8B fine-tuned for agentic coding workflows. Understands tool-use schemas, writes and debugs multi-file code, and reasons over test failures.

SWE-bench: 41.8%
Code · Agentic · Qwen 3 · Multi-file · Debugging
RAG Answering · 17B

llama4-rag-instruct

Based on Llama 4 Scout 17B

Instruction-tuned Llama 4 Scout optimised for faithful RAG responses. Trained to cite sources, flag knowledge gaps, and decline to answer rather than hallucinate.

TruthfulQA: 89.1%
RAG · Faithful · Llama 4 · Citation · Instruct
Mathematical Reasoning · 14B

phi4-reasoning-math

Based on Phi-4 14B

Phi-4 fine-tuned with chain-of-thought distillation for multi-step mathematical reasoning. Excels at algebra, calculus, and proof-writing tasks.

MATH-500: 79.6%
Math · Reasoning · CoT · Phi-4 · STEM
Document Extraction · 7B

mistral-doc-extractor

Based on Mistral 7B v0.3

Mistral 7B fine-tuned for structured extraction from PDFs and enterprise documents. Outputs clean JSON with entity, date, and relation extraction.

DocNLI F1: 91.3%
Extraction · JSON · Documents · Mistral · NER
Visual OCR & Layout · 7B

qwen3-vision-ocr

Based on Qwen3-VL 7B

Qwen3-VL fine-tuned for document layout understanding, table extraction, and handwritten text recognition across 14 languages.

DocVQA: 88.7%
Vision · OCR · Tables · Multilingual · Qwen3-VL
SQL Intent Routing · 2B

gemma4-intent-sql-classifier

Based on Gemma 4 2B

Ultra-fast 2B classifier that routes user queries to the correct SQL generator — analytical vs transactional vs reporting. 4 ms latency on CPU.

Accuracy: 97.2%
SQL · Classification · Routing · Fast · Gemma 4
Code Review · 17B

llama4-code-reviewer

Based on Llama 4 Maverick 17B

Llama 4 Maverick fine-tuned to perform expert-level code reviews. Identifies security vulnerabilities, performance issues, and style violations with line-level precision.

CodeReview-Bench: 76.4%
Code Review · Security · Performance · Llama 4
Legal Reasoning · 14B

phi4-legal-reasoning

Based on Phi-4 14B

Contract analysis and legal reasoning model. Identifies clauses, flags risks, and summarises obligations across common law jurisdictions.

LegalBench: 71.8%
Legal · Contracts · Risk · Reasoning · Phi-4

Contribute or use it in production.

All libraries are actively maintained. PRs, issues, and forks welcome. Models are on HuggingFace with quantised GGUF versions available.