API Solutions — Drut AI

AI capabilities, delivered as APIs.

Six production-ready AI services. One API key. No infrastructure to manage, no pipelines to build. Start querying in under a minute.

6 API Services
< 1 Min Integration
99.9% Uptime SLA
$0 Infra to Manage
API Offerings · 6 services
01
RAG-as-a-Service
Retrieval-augmented generation. No infra, no indexing pipelines.
retrieval
POST /v1/rag/query

Upload your documents once. Query forever. We handle chunking, embedding, vector storage, re-ranking, and generation — all in a single API call. Bring your own LLM or use ours.

Use cases
Enterprise knowledge bases · Support copilots · Research assistants · Doc QA
Latency: < 400ms p95
SLA: 99.9% uptime
Pricing

Charged per query. Embedding ingestion billed separately at $0.004 / 1k tokens.

Starter: $0.008 / query (up to 10k queries/mo)
Growth: $0.005 / query (10k – 500k queries/mo)
Scale: $0.003 / query (500k+ queries/mo)
Enterprise: custom per-query pricing (SLA + dedicated cluster)
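
As a rough worked example of the pricing above, the sketch below estimates a monthly RAG bill. It assumes the rate of the tier your total monthly volume falls into applies to every query that month (the page does not say whether tiers are graduated), and it treats exactly 10k and 500k queries as belonging to the lower tier. The function and constant names are illustrative and not part of any Drut SDK.

```python
# Hypothetical helper: estimate monthly RAG spend from the published rates.
# Assumption: one flat rate, chosen by total monthly volume, applies to all
# queries. Check your invoice or contract before relying on this.

INGESTION_USD_PER_1K_TOKENS = 0.004

# (upper bound of monthly queries, USD per query); None means no upper bound.
RAG_TIERS = [
    (10_000, 0.008),    # Starter
    (500_000, 0.005),   # Growth
    (None, 0.003),      # Scale
]


def rag_rate(queries_per_month: int) -> float:
    """Return the per-query rate for a monthly query volume."""
    for upper, rate in RAG_TIERS:
        if upper is None or queries_per_month <= upper:
            return rate
    raise AssertionError("unreachable: the last tier has no upper bound")


def estimate_rag_cost(queries_per_month: int, ingested_tokens: int = 0) -> float:
    """Estimate monthly cost in USD: query charges plus embedding ingestion."""
    query_cost = queries_per_month * rag_rate(queries_per_month)
    ingestion_cost = ingested_tokens / 1_000 * INGESTION_USD_PER_1K_TOKENS
    return round(query_cost + ingestion_cost, 2)
```

For instance, estimate_rag_cost(50_000, ingested_tokens=2_000_000) gives 258.0 under these assumptions: 50,000 queries at the Growth rate ($250) plus 2M ingested tokens ($8).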
POST https://api.drut.ai/v1/rag/query
Authorization: Bearer drut_sk_...
Content-Type: application/json

{
  "collection_id": "col_acme_docs_v2",
  "query": "What is our refund policy for enterprise plans?",
  "top_k": 5,
  "rerank": true,
  "model": "drut-rag-1",
  "stream": false
}
02
NL-to-SQL-as-a-Service
Natural language to production SQL. Dialect-aware, schema-grounded.
generation
POST /v1/sql/generate

Send us a natural language question and your schema. Get back validated, optimised SQL — tested against PostgreSQL, MySQL, BigQuery, and Snowflake. Supports CTEs, window functions, and multi-table joins.

Use cases
BI self-service · Data chatbots · Report generators · Analyst copilots
Latency: < 250ms p95
SLA: 99.9% uptime
Pricing

Charged per generation call. Schema tokens are included in the per-call price.

Starter: $0.012 / call (up to 5k calls/mo)
Growth: $0.009 / call (5k – 100k calls/mo)
Scale: $0.006 / call (100k+ calls/mo)
Enterprise: custom per-call pricing (private deployment)
POST https://api.drut.ai/v1/sql/generate
Authorization: Bearer drut_sk_...
Content-Type: application/json

{
  "dialect": "postgresql",
  "schema": {
    "orders": ["id", "user_id", "amount", "created_at", "status"],
    "users":  ["id", "email", "plan", "region"]
  },
  "question": "Monthly revenue by region for paid users in 2024",
  "validate": true
}
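
The request body above could be assembled in Python along these lines. This is a minimal sketch, not the official SDK: the field names are copied from the example request, but only the "postgresql" dialect identifier appears on this page, so the other three identifier strings are guesses, and the helper name is hypothetical.

```python
import json

# Dialects the page says generated SQL is tested against. Only "postgresql"
# is shown in the example request; the other identifier strings are assumed.
ASSUMED_DIALECTS = {"postgresql", "mysql", "bigquery", "snowflake"}


def build_sql_request(dialect: str, schema: dict[str, list[str]],
                      question: str, validate: bool = True) -> str:
    """Return the JSON body for POST /v1/sql/generate as a string."""
    if dialect not in ASSUMED_DIALECTS:
        raise ValueError(f"unsupported dialect: {dialect!r}")
    if not schema or any(not columns for columns in schema.values()):
        raise ValueError("schema must map each table to at least one column")
    body = {
        "dialect": dialect,
        "schema": schema,      # table name -> list of column names
        "question": question,
        "validate": validate,
    }
    return json.dumps(body)
```

Checking the dialect and schema locally catches malformed requests before they count as a billable call.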
03
Agent Memory Layer
Persistent, queryable memory for AI agents. Priced by what you store.
memory
POST /v1/memory

Give your agents a long-term memory that persists across sessions. Store facts, events, user preferences, and agent observations. Retrieve by semantic similarity or exact filters. Memory points expire or persist based on your policy.

Use cases
Personalised agents · Customer history · Research agents · Multi-session bots
Latency: < 60ms p95
SLA: 99.95% uptime
Pricing

Charged per memory point stored per day. Retrieval queries are free up to 50k/mo.

Starter: $0.0002 / point / day (up to 500k points)
Growth: $0.00015 / point / day (500k – 10M points)
Scale: $0.00009 / point / day (10M+ points)
Enterprise: custom per-point pricing (isolated namespace)
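
Because storage is billed per point per day, cost scales with both how many points you keep and how long you keep them. The sketch below works that arithmetic through. It assumes the tier is chosen by the number of points currently stored and that one rate applies to all of them; retrieval charges beyond the free 50k/mo are not listed on this page and are ignored. The names are illustrative only.

```python
# Hypothetical helper: storage cost for the Agent Memory Layer.
# Assumption: a single rate, selected by total points stored, applies to
# every point. Retrieval charges are not modelled.

# (max points stored, USD per point per day); None means no upper bound.
MEMORY_TIERS = [
    (500_000, 0.0002),      # Starter
    (10_000_000, 0.00015),  # Growth
    (None, 0.00009),        # Scale
]


def memory_storage_cost(points: int, days: int = 30) -> float:
    """Return the USD cost of keeping `points` memory points for `days` days."""
    for upper, rate in MEMORY_TIERS:
        if upper is None or points <= upper:
            return round(points * rate * days, 2)
    raise AssertionError("unreachable: the last tier has no upper bound")
```

Under these assumptions, 200,000 points held for 30 days come to 200,000 × $0.0002 × 30 = $1,200, which is why a short ttl_days on low-value points matters.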
POST https://api.drut.ai/v1/memory/write
Authorization: Bearer drut_sk_...
Content-Type: application/json

{
  "namespace": "user_9821",
  "points": [
    {
      "content": "Prefers concise answers, dislikes bullet lists.",
      "type": "preference",
      "ttl_days": 90
    },
    {
      "content": "Reviewed Q3 report on 2025-11-03, flagged APAC anomaly.",
      "type": "event",
      "ttl_days": 30
    }
  ]
}
04
Document Extraction API
PDFs, tables, forms — structured JSON out. Every time.
vision
POST /v1/extract

Send any document — PDF, scanned image, invoice, contract — and receive structured JSON with entities, tables, key-value pairs, and layout annotations. Powered by our Mistral-based fine-tuned extractor.

Use cases
Invoice processing · Contract analysis · Form digitisation · Medical records
Latency: < 800ms p95
SLA: 99.9% uptime
Pricing

Charged per page extracted. Tables and forms count as one page each.

Starter: $0.018 / page (up to 2k pages/mo)
Growth: $0.012 / page (2k – 50k pages/mo)
Scale: $0.007 / page (50k+ pages/mo)
Enterprise: custom per-page pricing (on-prem available)
POST https://api.drut.ai/v1/extract
Authorization: Bearer drut_sk_...
Content-Type: application/json

{
  "document_url": "https://storage.co/invoices/inv_0042.pdf",
  "extract": ["entities", "tables", "key_values"],
  "language": "en",
  "output_format": "json"
}
05
Embeddings API
High-fidelity embeddings for search, clustering, and classification.
retrieval
POST /v1/embed

Generate dense vector embeddings for text, code, or structured data. Choose from our model family: 256d for speed, 1536d for precision, or 768d in between. Batch up to 2,048 inputs per request.

Use cases
Semantic search · Duplicate detection · Clustering · Recommendation
Latency: < 180ms p95
SLA: 99.9% uptime
Pricing

Charged per 1k tokens embedded. Batching is strongly encouraged — up to 2,048 inputs.

Drut-Embed-Fast: $0.00008 / 1k tokens (256d, 120ms p95)
Drut-Embed-Base: $0.00018 / 1k tokens (768d, 180ms p95)
Drut-Embed-Large: $0.00035 / 1k tokens (1536d, 280ms p95)
Enterprise: custom per-token pricing (fine-tuned domain model)
POST https://api.drut.ai/v1/embed
Authorization: Bearer drut_sk_...
Content-Type: application/json

{
  "model": "drut-embed-base",
  "inputs": [
    "Agent memory systems for production LLMs",
    "Long-term episodic storage in transformer agents"
  ],
  "truncate": true
}
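
Since a single request accepts at most 2,048 inputs, larger corpora need to be split client-side. A minimal sketch of that batching follows; the request fields mirror the example above, while the generator name and its defaults are assumptions rather than SDK behaviour.

```python
from typing import Iterator, Sequence

MAX_INPUTS_PER_REQUEST = 2_048  # batch limit stated on this page


def embed_batches(inputs: Sequence[str], model: str = "drut-embed-base",
                  batch_size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[dict]:
    """Yield one /v1/embed request body per batch of inputs, in order."""
    if not 1 <= batch_size <= MAX_INPUTS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 2,048")
    for start in range(0, len(inputs), batch_size):
        yield {
            "model": model,
            "inputs": list(inputs[start:start + batch_size]),
            "truncate": True,
        }
```

With 5,000 inputs this yields three request bodies holding 2,048, 2,048, and 904 inputs respectively.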
06
Reranking API
Cross-encoder reranking. Lift your retrieval precision from 70% to 94%.
retrieval
POST /v1/rerank

Pass a query and a list of candidate documents. Get back a precision-scored, sorted list using our cross-encoder model. Works with any vector retrieval system or BM25.

Use cases
RAG precision boost · Search re-ordering · Candidate shortlisting · FAQ matching
Latency: < 100ms p95
SLA: 99.9% uptime
Pricing

Charged per reranking call, regardless of candidate count (up to 100 docs).

Starter: $0.006 / call (up to 20k calls/mo)
Growth: $0.004 / call (20k – 500k calls/mo)
Scale: $0.0025 / call (500k+ calls/mo)
Enterprise: custom per-call pricing (latency SLA)
POST https://api.drut.ai/v1/rerank
Authorization: Bearer drut_sk_...
Content-Type: application/json

{
  "query": "How to reduce inference cost for large models?",
  "documents": [
    "Quantisation techniques reduce model size by 4x...",
    "The history of transformer architectures...",
    "Batching strategies for GPU inference workloads..."
  ],
  "top_n": 2,
  "return_scores": true
}
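
A common pattern is to over-fetch candidates from BM25 or a vector index and pass only the strongest ones to the reranker, staying within the 100-document limit per call. The sketch below illustrates that step. It assumes first-stage scores where higher means more relevant; the helper name and the (text, score) input shape are hypothetical.

```python
MAX_DOCS_PER_CALL = 100  # per-call candidate limit stated in the pricing note


def build_rerank_request(query: str, candidates: list[tuple[str, float]],
                         top_n: int = 5) -> dict:
    """Build a /v1/rerank body from (text, first_stage_score) candidates.

    Keeps the MAX_DOCS_PER_CALL highest-scoring candidates. Assumes a higher
    first-stage score means a better match.
    """
    if not candidates:
        raise ValueError("at least one candidate document is required")
    best = sorted(candidates, key=lambda c: c[1], reverse=True)[:MAX_DOCS_PER_CALL]
    return {
        "query": query,
        "documents": [text for text, _score in best],
        "top_n": min(top_n, len(best)),  # cannot return more than were sent
        "return_scores": True,
    }
```

Because a call is billed the same for any candidate count up to 100, filling each call with your best 100 candidates costs no more than sending a handful.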

Ready to integrate?

Get an API key and start building in minutes. All services share the same authentication and base URL. SDKs for Python and TypeScript available.
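
Because every endpoint shares one base URL and one bearer-token header, a single thin helper can cover all six services. The following sketch uses only the Python standard library rather than the official SDK. The base URL and headers are taken from the examples on this page; the function names are made up for illustration, and the assumption that responses are JSON objects is not confirmed here.

```python
import json
import urllib.request

BASE_URL = "https://api.drut.ai"  # base URL used by every example on this page


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST for any endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call(path: str, payload: dict, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the response, assumed to be a JSON object."""
    request = build_request(path, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as call("/v1/rerank", {...}, api_key) then differs from call("/v1/embed", {...}, api_key) only in the path and payload.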