Open Source — Drut AI

Built in the open.
Used in production.

Everything we build at Drut AI that can be open-sourced, is: GitHub libraries for the AI engineering stack and open-weight models fine-tuned for specific tasks.


8.6k GitHub Stars · 170k+ HuggingFace Downloads


GitHub Libraries

Open source tools

12 packages

Python · v0.9.2

agent-memory

drut-ai/agent-memory

Long-term episodic and semantic memory system for production AI agents. Supports vector stores, episodic recall, and forgetting curves.

Memory · Agents · Vector DB · Episodic
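The forgetting-curve idea can be illustrated with a short, generic Python sketch. This is not agent-memory's actual API: the `Memory` class and the `retention`, `recall`, and `prune` functions are hypothetical, and the model assumes an Ebbinghaus-style exponential decay R = exp(-t / S), where t is the time since a memory was last recalled and S is a stability value that grows with each recall.

```python
# Generic forgetting-curve sketch; NOT the agent-memory API. All names are hypothetical.
import math
from dataclasses import dataclass


@dataclass
class Memory:
    """A stored memory with an Ebbinghaus-style stability value (hypothetical schema)."""
    text: str
    last_recalled: float      # timestamp, in hours
    stability: float = 24.0   # hours; larger values mean slower decay


def retention(mem: Memory, now: float) -> float:
    """Estimated retention R = exp(-t / S), with t = hours since last recall."""
    elapsed = max(0.0, now - mem.last_recalled)
    return math.exp(-elapsed / mem.stability)


def recall(mem: Memory, now: float, boost: float = 2.0) -> None:
    """Recalling a memory resets its clock and multiplies its stability (spacing effect)."""
    mem.last_recalled = now
    mem.stability *= boost


def prune(memories: list[Memory], now: float, threshold: float = 0.1) -> list[Memory]:
    """Forget memories whose retention has decayed below the threshold."""
    return [m for m in memories if retention(m, now) >= threshold]


store = [
    Memory("user prefers metric units", last_recalled=0.0),
    Memory("user's dog is named Rex", last_recalled=0.0),
]
recall(store[1], now=10.0)      # the second memory's stability doubles to 48 h
kept = prune(store, now=72.0)   # only memories with retention >= 0.1 survive
```

Here the never-recalled memory decays to exp(-3), about 0.05, after 72 hours and is dropped, while the recalled one survives because its stability doubled. A production system would likely combine a decay score like this with vector-store relevance when deciding what to surface or forget.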

Python · v1.2.0

rag-eval

drut-ai/rag-eval

Evaluation framework for retrieval-augmented generation pipelines. Covers faithfulness, relevance, context precision, and context recall metrics.

RAG · Evaluation · Metrics · RAGAS
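Context precision and context recall can be sketched for the simple case where each retrieved chunk has a ground-truth relevance label. This is a simplification under stated assumptions, not rag-eval's API: RAGAS-style metrics typically use an LLM to judge relevance instead of fixed labels, and the function names below are hypothetical.

```python
# Label-based sketch of two RAG retrieval metrics; NOT the rag-eval API. Names are hypothetical.


def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Rank-aware precision: mean of precision@k over the ranks k that hold a relevant chunk."""
    hits = 0
    precisions = []
    for k, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant chunks that made it into the retrieved context."""
    if not relevant:
        return 1.0  # nothing to recall
    return len(relevant & set(retrieved)) / len(relevant)
```

For retrieved chunks `["c1", "c7", "c3", "c9"]` and relevant set `{"c1", "c3", "c5"}`, precision averages 1/1 and 2/3 (about 0.83), while recall is 2/3 because `c5` was never retrieved. Precision rewards ranking relevant chunks early; recall penalizes missing them entirely.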

Python · v0.6.1

prompt-router

drut-ai/prompt-router

Intelligent prompt routing between models based on cost, latency, and capability signals. Drop-in middleware for any LLM stack.

Routing · Cost Optimization · Multi-Model
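Cost-, latency-, and capability-aware routing can be sketched as a constrained selection: filter out models that are too weak or too slow, then pick the cheapest of the rest. This is a hypothetical illustration, not prompt-router's interface; the model names, prices, latencies, and 1-to-5 capability scores are invented placeholders.

```python
# Constrained-selection routing sketch; NOT the prompt-router API. All names and numbers are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str                   # hypothetical model identifier
    cost_per_1k_tokens: float   # illustrative USD figure, not real pricing
    p50_latency_ms: float       # illustrative median latency
    capability: int             # coarse 1-5 capability score


def route(models: list[ModelProfile], required_capability: int, max_latency_ms: float) -> ModelProfile:
    """Pick the cheapest model that is capable and fast enough; break cost ties on latency."""
    eligible = [
        m for m in models
        if m.capability >= required_capability and m.p50_latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise ValueError("no model satisfies the capability and latency constraints")
    return min(eligible, key=lambda m: (m.cost_per_1k_tokens, m.p50_latency_ms))


MODELS = [
    ModelProfile("small-fast", 0.0002, 150, 2),
    ModelProfile("mid-tier", 0.003, 600, 4),
    ModelProfile("frontier", 0.015, 1500, 5),
]
```

A simple query (`route(MODELS, 1, 1000)`) lands on the cheap model, while a demanding one with a generous latency budget (`route(MODELS, 5, 2000)`) escalates to the most capable model. Raising an error when nothing qualifies makes the caller decide whether to relax a constraint.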

TypeScript · v2.0.0-beta

agent-sdk

drut-ai/agent-sdk

TypeScript SDK for building, orchestrating, and debugging multi-step AI agents with first-class tool-use and streaming support.

SDK · Agents · TypeScript · Streaming

TypeScript · v1.0.4

llm-eval

drut-ai/llm-eval

Lightweight LLM evaluation utilities for Node.js. Run automated evals, golden dataset comparisons, and regression tests in CI.

Evaluation · CI/CD · Node.js · Testing

Rust · v3.1.0

fast-vector-search

drut-ai/fast-vector-search

High-performance HNSW vector search with SIMD acceleration. Up to 12× faster than FAISS on commodity hardware.

Vector Search · HNSW · SIMD · Rust

Python · v0.4.3

hybrid-retriever

drut-ai/hybrid-retriever

BM25 + dense retrieval fusion with learned re-ranking. Plug-and-play with any vector store. Ships with query expansion.

BM25 · Dense Retrieval · Re-ranking · RAG
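One common way to fuse a BM25 ranking with a dense-retrieval ranking is reciprocal rank fusion (RRF), sketched below. RRF is swapped in here as a well-known, training-free baseline; the hybrid-retriever description mentions learned re-ranking, which this sketch does not implement, and the function name and document ids are hypothetical. The constant k = 60 is the value commonly used with RRF.

```python
# Reciprocal rank fusion sketch; NOT the hybrid-retriever API. Names and doc ids are hypothetical.
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc3", "doc1", "doc7"]   # lexical ranking, e.g. from BM25
dense_hits = ["doc1", "doc4", "doc3"]  # embedding-similarity ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

`doc1` comes out first because it sits near the top of both lists, even though BM25 ranked `doc3` first. Because RRF uses only ranks, it needs no score normalization across retrievers; a learned re-ranker would typically be applied to the fused top results afterwards.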

TypeScript · v1.1.2

tool-forge

drut-ai/tool-forge

Type-safe tool definition and validation layer for LLM function calling. Zod-native, works with OpenAI, Anthropic, and Gemini.

Function Calling · Zod · Type-safe · Tools

Python · v0.7.0

context-compressor

drut-ai/context-compressor

Lossless and lossy context compression for long-context LLMs. Reduces token count by 60–80% with configurable fidelity tradeoffs.

Context Window · Token Reduction · Long Context

Go · v0.3.1

trace-llm

drut-ai/trace-llm

OpenTelemetry-native LLM observability layer. Trace every token, cost, and latency across multi-agent workflows.

Observability · OpenTelemetry · Tracing · Go

Python · v1.0.0

memory-graph

drut-ai/memory-graph

Graph-based working memory for agents. Stores entities and relationships extracted from conversations into a queryable knowledge graph.

Knowledge Graph · Memory · Entities · Neo4j

Python · v0.5.2

chunk-wizard

drut-ai/chunk-wizard

Semantic chunking library that respects document structure, tables, and code blocks. Dramatically improves RAG retrieval quality.

Chunking · Semantic · RAG · Document Parsing
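The structure-preserving part of chunking can be sketched as: split Markdown at blank lines, never break inside a fenced code block, then greedily pack whole blocks into size-limited chunks. This is a hypothetical illustration, not chunk-wizard's API; it handles only code fences (not tables) and uses a character budget instead of embeddings or token counts, so it is not semantic chunking on its own.

```python
# Structure-aware chunking sketch; NOT the chunk-wizard API. Function names are hypothetical.
FENCE = "`" * 3  # Markdown code-fence marker, built this way so the snippet stays fence-safe


def split_blocks(text: str) -> list[str]:
    """Split Markdown into blocks at blank lines, keeping fenced code blocks intact."""
    blocks, current, in_code = [], [], False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_code = not in_code  # entering or leaving a code block
        if not line.strip() and not in_code:
            if current:  # blank line outside code ends the current block
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)  # blank lines inside code blocks are kept
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole blocks into chunks of at most max_chars characters.

    A single block longer than max_chars becomes its own oversized chunk
    rather than being cut mid-structure.
    """
    chunks, current = [], ""
    for block in split_blocks(text):
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks
```

The key design choice is that a code block containing blank lines is still treated as one unit, so a retriever never returns half a function. A fuller version would add table detection and use embedding similarity to decide where topic boundaries fall.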

Open Weights · HuggingFace

Task-specific models

9 models

SQL Generation · 9B

gemma4-sql-generator

Based on Gemma 4 9B

Fine-tuned Gemma 4 9B for natural language to SQL. Supports complex joins, CTEs, and window functions. Trained on 180k NL-SQL pairs across PostgreSQL, MySQL, and BigQuery dialects.

Spider 2.0 · 84.2%

SQL · NL2SQL · Gemma 4 · PostgreSQL · BigQuery

Code Agent · 8B

qwen3-code-agent

Based on Qwen 3 8B

Qwen 3 8B fine-tuned for agentic coding workflows. Understands tool-use schemas, writes and debugs multi-file code, and reasons over test failures.

SWE-bench · 41.8%

Code · Agentic · Qwen 3 · Multi-file · Debugging

RAG Answering · 17B

llama4-rag-instruct

Based on Llama 4 Scout 17B

Instruction-tuned Llama 4 Scout optimised for faithful RAG responses. Trained to cite sources, flag knowledge gaps, and refuse to hallucinate.

TruthfulQA · 89.1%

RAG · Faithful · Llama 4 · Citation · Instruct

Mathematical Reasoning · 14B

phi4-reasoning-math

Based on Phi-4 14B

Phi-4 fine-tuned with chain-of-thought distillation for multi-step mathematical reasoning. Excels at algebra, calculus, and proof-writing tasks.

MATH-500 · 79.6%

Math · Reasoning · CoT · Phi-4 · STEM

Document Extraction · 7B

mistral-doc-extractor

Based on Mistral 7B v0.3

Mistral 7B fine-tuned for structured extraction from PDFs and enterprise documents. Outputs clean JSON with entity, date, and relation extraction.

DocNLI F1 · 91.3%

Extraction · JSON · Documents · Mistral · NER

Visual OCR & Layout · 7B

qwen3-vision-ocr

Based on Qwen3-VL 7B

Qwen3-VL fine-tuned for document layout understanding, table extraction, and handwritten text recognition across 14 languages.

DocVQA · 88.7%

Vision · OCR · Tables · Multilingual · Qwen3-VL

SQL Intent Routing · 2B

gemma4-intent-sql-classifier

Based on Gemma 4 2B

Ultra-fast 2B classifier that routes user queries to the correct SQL generator: analytical, transactional, or reporting. 4 ms latency on CPU.

Accuracy · 97.2%

SQL · Classification · Routing · Fast · Gemma 4

Code Review · 17B

llama4-code-reviewer

Based on Llama 4 Maverick 17B

Llama 4 Maverick fine-tuned to perform expert-level code reviews. Identifies security vulnerabilities, performance issues, and style violations with line-level precision.

CodeReview-Bench · 76.4%

Code Review · Security · Performance · Llama 4

Legal Reasoning · 14B

phi4-legal-reasoning

Based on Phi-4 14B

Contract analysis and legal reasoning model. Identifies clauses, flags risks, and summarises obligations across common-law jurisdictions.

LegalBench · 71.8%

Legal · Contracts · Risk · Reasoning · Phi-4

Contribute or use it in production.

All libraries are actively maintained, and PRs, issues, and forks are welcome. Models are on HuggingFace, with quantised GGUF versions available.

GitHub Org · HuggingFace Org
