The Only Low-latency Graph RAG API

duohub is the only low-latency graph RAG API that resolves complex queries in under 50ms.

Get Started
Trusted by engineers at companies of all sizes
Daily
ElevenLabs
Deakin University
Lockheed Martin
Digbi Health
Rebil
Dimando

Combine graph and vector RAG for the best of both worlds

Achieve unrivalled precision and speed with hybrid vector + graph RAG to deliver the right answer every time in less time than it takes you to blink.

Vector


Vector similarity search visualization

Semantic similarity search with reranking

Graph


Graph traversal visualization

Deep query resolution with graph traversals

"We've become very familiar with the limitations of RAG, particularly for voice agents. I'm glad to say that we've been impressed with the performance of duohub. Our team is able to ship products for new use cases quickly without having to manage complex infrastructure."

Aashay Sachdeva

Founding Team - AI/ML at Sarvam

Create knowledge graphs that fit your data shape with ontologies

Our bespoke graph generation models are trained on intricate ontologies to build graphs that closely fit your data domain.


Start with one of our pre-trained ontology models today, or submit data to create an ontology tailored to your domain.
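As a rough illustration of what an ontology constrains, here is a toy schema sketch. The dict format, type names, and relation triples below are assumptions for illustration, not duohub's actual ontology format:

```python
# Hypothetical ontology sketch: entity types plus the relations that
# may connect them. The schema format here is an assumption.
ontology = {
    "entities": ["Person", "Organisation", "Location", "Event"],
    "relations": [
        ("Person", "WORKS_FOR", "Organisation"),
        ("Person", "TRAVELS_TO", "Location"),
        ("Event", "HELD_IN", "Location"),
    ],
}

def valid_triple(src_type, relation, dst_type):
    """A generated triple is kept only if its types match the ontology."""
    return (src_type, relation, dst_type) in ontology["relations"]

assert valid_triple("Person", "TRAVELS_TO", "Location")
assert not valid_triple("Location", "WORKS_FOR", "Person")
```

Constraining generation to a declared schema is what keeps the resulting graph's shape consistent with your domain.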

Make one API call for all pre and post-processing

Coreference Resolution


Coreference resolution visualization

A preliminary but often overlooked step, coreference resolution improves the performance of all downstream processing by making explicit who or what pronouns such as "he", "she", "it", and "they" refer to in your texts.
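For illustration only, the effect of the step looks like this. The toy string substitution below is not duohub's coreference model; real systems use trained models rather than rules:

```python
# Toy illustration of coreference resolution: substitute each pronoun
# with its referent so downstream steps see explicit names. Real
# systems use trained models, not string replacement.
def naive_resolve(text, referents):
    for pronoun, entity in referents.items():
        text = text.replace(pronoun, entity)
    return text

before = "Ryan met Sarah on Monday. He said he was travelling in two weeks."
after = naive_resolve(before, {"He ": "Ryan ", "he ": "Ryan "})
assert after == "Ryan met Sarah on Monday. Ryan said Ryan was travelling in two weeks."
```

Once every pronoun is an explicit name, fact extraction and entity resolution operate on unambiguous mentions.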

Fact Extraction


Fact extraction visualization

Optionally extract all key facts from your data, enabling compression of large volumes of information without sacrificing accuracy. Our fact extraction models specialise in extracting atomic units of meaning as single sentence statements.
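A sketch of the output shape, using an invented passage and fact list (not duohub's API response):

```python
# Invented example of fact extraction output: a passage compressed
# into atomic, single-sentence statements (not duohub's API response).
passage = (
    "Ryan, who leads the Sydney office, is flying to Singapore "
    "in two weeks to meet a partner team."
)

facts = [
    "Ryan leads the Sydney office.",
    "Ryan is flying to Singapore in two weeks.",
    "Ryan is meeting a partner team in Singapore.",
]

# Each fact stands alone, so it can be stored and retrieved independently.
assert all(fact.count(".") == 1 for fact in facts)
```

Because each statement is an atomic unit of meaning, large volumes of text compress into a small set of independently retrievable facts.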

Entity Resolution


Named entity recognition visualization

Find and merge multiple instances of the same entity that appear under different names, further improving query resolution. Have confidence that each real-world entity is represented by a single node in your graph.
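As a simplified sketch (the alias table and canonical IDs below are illustrative, not duohub's resolution model), entity resolution maps every surface form of an entity to one canonical node:

```python
# Simplified entity resolution: map every surface form of an entity
# to one canonical node ID. The alias table is illustrative only.
aliases = {
    "Ryan Smith": "ryan_smith",
    "R. Smith": "ryan_smith",
    "Ryan": "ryan_smith",
}

def canonical(mention):
    # Fall back to a normalised form for unseen mentions.
    return aliases.get(mention, mention.lower().replace(" ", "_"))

# All three surface forms resolve to the same graph node.
assert canonical("R. Smith") == canonical("Ryan") == "ryan_smith"
```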
Get Started

Integrate with your stack in minutes, not months

Start querying your knowledge base with just three lines of code. No complex setup, no infrastructure headaches.

pip install duohub

from duohub import Duohub
client = Duohub()
result = client.query("Where is Ryan going in two weeks?")
Pipecat
AWS Lambda
Cloud Run
Supabase

Scale to millions of queries, globally

Data is replicated across three locations by default, and more regions can be added. This keeps latency low, with most subgraph queries returning in under 50ms.


Fast or right.
Choose both.

Most solutions force you to compromise: speed or accuracy, latency or precision. Not anymore. Combine the best of both worlds to deliver experiences with conversational AI that leave your customers speechless.

Solutions you know

Hundreds of thousands of dollars, specialised resources, months to production

Latency: 0ms

Live by: May?

duohub

World-class, production-ready infrastructure you can count on today

Latency: 0.00ms

Live by: Today

Pay only for what you use


Graph Generation
Volume rate (per 1k tokens): $0.0050*

Storage
Volume rate (per GB-day): $0.3285*

API Requests
Volume rate (per request): $0.0050*

*Automatic discounted rate at scale.


Level up your enterprise

On-Premise Retrievers

Own your data and get the lowest latency with on-premise retrievers deployed within your VPC

Custom Ontologies

Fit your data domain with custom ontologies and train models to generate graphs from your own data

More GPU

Get more GPU and compute priority when powering graph generation models for faster ingestion

FAQ

What is graph RAG?

Graph RAG (Retrieval-Augmented Generation) combines traditional vector-based retrieval with graph-based knowledge representation. This allows for more precise and contextual information retrieval by leveraging both semantic similarity and the structured relationships in your data.
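Conceptually, a hybrid retrieval step can be sketched like this. The data, scores, and merge strategy below are an illustration of the idea, not duohub's implementation:

```python
# Toy hybrid retrieval: take the top vector-similarity matches, then
# expand each match one hop through the knowledge graph and merge.
# All nodes and scores below are invented for illustration.
vector_scores = {"ryan": 0.92, "singapore": 0.80, "sydney": 0.31}
graph_edges = {
    "ryan": ["singapore", "elevenlabs"],
    "singapore": ["elevenlabs"],
}

def hybrid_retrieve(k=2):
    top = sorted(vector_scores, key=vector_scores.get, reverse=True)[:k]
    results = set(top)
    for node in top:
        results.update(graph_edges.get(node, []))  # one-hop graph expansion
    return results

assert hybrid_retrieve() == {"ryan", "singapore", "elevenlabs"}
```

The graph expansion surfaces results that are related by structure rather than by embedding similarity alone, which is why hybrid retrieval resolves multi-hop questions that pure vector search misses.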

Why use ontologies?

Ontologies provide a formal structure for representing domain knowledge, relationships, and rules. This structured approach enables more accurate reasoning, better context understanding, and improved query resolution compared to simple vector-based retrieval.

How fast are queries?

Our system is designed for high performance, with most subgraph queries completing in under 50ms. With data replicated across three locations by default, you can expect consistent low-latency responses globally.

Can I train models on my own ontologies?

Yes! While we provide pre-trained models, enterprise customers can train models on their own ontologies to better fit their specific data domains and use cases.