Real-time, high-precision memory for voice AI

Bring depth to conversational AI experiences with ultra-low latency, ontology-powered graph RAG

Get Started
Trusted by engineers at companies of all sizes
Daily
ElevenLabs
Deakin University
Lockheed Martin
Digbi Health
Rebil
Dimando

Ground voice agents in truth

Start building better conversational AI with vector or graph RAG in minutes. Discover the right answer every time, even in complex data domains.

Vector


Vector similarity search visualization

Semantic similarity search with reranking

Graph


Graph traversal visualization

Deep query resolution with graph traversals

"We've become very familiar with the limitations of RAG, particularly for voice agents. I'm glad to say that we've been impressed with the performance of duohub. Our team is able to ship products for new use cases quickly without having to manage complex infrastructure."

Aashay Sachdeva

Founding Team - AI/ML at Sarvam

Create knowledge graphs that fit your data using ontologies

Our bespoke graph generation models are trained on intricate ontologies to build graphs that closely fit your data domain.


Start with one of our pre-trained ontology models today, or submit data to create an ontology tailored to your domain.

Make one API call for all pre- and post-processing

Coreference Resolution


Coreference resolution visualization

A precursory but often overlooked step, coreference resolution improves the performance of all downstream processing by making explicit who or what pronouns such as "he", "she", "it", and "they" refer to in your texts.
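To make the transformation concrete, here is a deliberately naive sketch of what coreference resolution does to a text. The heuristic (replace a pronoun with the most recently seen capitalized word) is invented for illustration only; production coreference models handle gender, number, and nested references far more carefully.

```python
PRONOUNS = {"he", "she", "it", "they"}

def naive_coref(text):
    """Toy sketch of coreference resolution: replace a third-person
    pronoun with the most recently seen capitalized word. Real models
    are far more sophisticated; this only shows the shape of the
    transformation that benefits downstream processing."""
    last_entity = None
    out = []
    for tok in text.split():
        word = tok.strip(".,!?")
        if word.lower() in PRONOUNS and last_entity:
            # Make the referent explicit in place of the pronoun.
            out.append(tok.replace(word, last_entity))
        else:
            if word[:1].isupper():
                last_entity = word
            out.append(tok)
    return " ".join(out)
```

After resolution, a downstream indexer sees "Ryan lands on Friday" instead of the ambiguous "He lands on Friday".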

Fact Extraction


Fact extraction visualization

Optionally extract all key facts from your data, enabling compression of large volumes of information without sacrificing accuracy. Our fact extraction models specialise in extracting atomic units of meaning as single-sentence statements.
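The target output shape is easy to show with a hand-written example. The source sentence and the extracted facts below are invented for illustration; duohub's extraction models would produce statements of this shape automatically.

```python
def is_atomic(fact):
    """Heuristic check that a statement is a single short sentence,
    i.e. one atomic unit of meaning."""
    return fact.endswith(".") and fact.count(".") == 1 and len(fact.split()) <= 12

source = "Ryan lives in Berlin and works at Acme, a robotics startup."

# Hand-written target output: each compound clause becomes one
# standalone, single-sentence statement.
facts = [
    "Ryan lives in Berlin.",
    "Ryan works at Acme.",
    "Acme is a robotics startup.",
]
```

Because each fact stands alone, a retriever can return exactly the statements a query needs rather than whole documents.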

Entity Resolution


Named entity recognition visualization

Find and merge multiple instances of the same entity that appear under different names, further improving query resolution. Have confidence that each entity is represented by a single node in your graph.
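A minimal sketch of the merge step, assuming the alias mapping is already known. In practice, entity resolution infers which surface forms are the same entity from context and embeddings; here the `aliases` dictionary stands in for that inference.

```python
def resolve_entities(nodes, aliases):
    """Merge graph nodes whose names map to the same canonical entity.
    `aliases` maps surface forms to a canonical name; discovering this
    mapping is the hard part that real entity resolution performs."""
    merged = {}
    for node in nodes:
        canon = aliases.get(node["name"], node["name"])
        if canon not in merged:
            merged[canon] = {"name": canon, "mentions": 0}
        # Fold duplicate mentions into the single canonical node.
        merged[canon]["mentions"] += node.get("mentions", 1)
    return list(merged.values())
```

For example, nodes named "NYC" and "New York City" collapse into one node, so a traversal starting from either name reaches the same neighborhood.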
Get Started

Scale to millions of queries, globally

Data is replicated in 3 locations by default, with more regions available to be added. This contributes to a low-latency experience, with most subgraph queries returning in under 50ms.


Integrate with your stack in minutes, not months

Start querying your knowledge base with just three lines of code. No complex setup, no infrastructure headaches.

pip install duohub
from duohub import Duohub
client = Duohub()
result = client.query("Where is Ryan going in two weeks?")
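Inside a voice agent loop, the query result typically lands in the system prompt for the current turn. The sketch below assumes the `client.query` call shown above; the prompt template and the shape of the returned result are our own illustration and may differ in practice.

```python
def build_turn_prompt(client, user_utterance):
    """Sketch: fetch only the memory relevant to the current turn and
    inject it into the system prompt, instead of stuffing the whole
    knowledge base into context. `client` is any object exposing a
    duohub-style query(text) method."""
    memory = client.query(user_utterance)
    return (
        "You are a voice assistant. Use this retrieved context if relevant:\n"
        f"{memory}\n"
        "Answer in one or two short sentences."
    )
```

Rebuilding the prompt per turn keeps the context window small, which matters for voice latency.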
Pipecat
AWS Lambda
Cloud Run
Supabase

Fast or right.
Choose both.

Most solutions force you to compromise: speed or accuracy, latency or precision. Not anymore. Combine the best of both worlds to deliver experiences with conversational AI that leave your customers speechless.

Solutions you know

Hundreds of thousands of dollars, specialised resources, months to production

Latency: 0ms

Live by: May?

duohub

World-class, production-ready infrastructure you can count on today

Latency: 0.00ms

Live by: Today

Pay only for what you use

Clear, predictable pricing designed to scale with your success. No hidden fees, no minimum commitments. Start with a generous free plan and get $10 credit when you add a card.


Graph Generation: $0.0050* per 1k tokens

Storage: $0.3285* per GB-day

API Requests: $0.0050* per request

*Automatic discounted rate at scale.


Level up your enterprise

On-Premise Retrievers

Own your data and get the lowest latency with on-premise retrievers deployed within your VPC

Custom Ontologies

Fit your data domain with custom ontologies and train models to generate graphs from your own data

More GPU

Get more GPU and compute priority when powering graph generation models for faster ingestion

FAQ

Why does my voice agent need a memory layer?

Many people start by stuffing all of the context they want their voice agent to know into the prompt. This works for simple tasks, but as the complexity of the task increases, the prompt becomes too long and the agent becomes too slow. Additionally, you risk diverting the LLM's attention away from the important tasks at hand. Picture the context window like your working memory: it's not practical to hold everything in working memory while also attempting to reason. A memory layer allows you to store the context in a more efficient way, so that the agent can quickly retrieve only what is needed when it is needed.

Why use graph RAG instead of plain vector RAG?

Naive vector RAG is a good place to start, but it comes with a few key limitations. Let's use the simple example of a user asking, "Where did you go before New York City?" If you have a lot of content in your memory store about New York City, your query will likely return a lot of content about New York City. But that's not what the user wants to know. Graph RAG gives you an extra layer of querying that lets you use the nodes and relationships to determine what you did before you went to New York City.
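The New York City example above can be sketched with a tiny relationship list. The trip data and the `BEFORE` relationship name are invented for illustration; the point is that one traversal answers a question that similarity search against "New York City" cannot.

```python
# A minimal edge list standing in for a knowledge graph of trips.
trips = [
    ("Lisbon", "BEFORE", "Berlin"),
    ("Berlin", "BEFORE", "New York City"),
    ("New York City", "BEFORE", "Tokyo"),
]

def visited_before(city, edges):
    """Follow the BEFORE relationship backwards one hop: which place
    has an edge pointing at `city`?"""
    for src, rel, dst in edges:
        if rel == "BEFORE" and dst == city:
            return src
    return None
```

A vector search for "New York City" would rank New York City content highest; the traversal instead follows the relationship and returns Berlin.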

Can't we build this ourselves?

You might be able to find the talent required to build complex graph RAG systems, develop entity resolution and coref models, and develop ontologies that match your data shape. However, this is extremely capital- and time-intensive, and you then carry the infrastructure overhead of maintaining the system. As voice AI engineers, we think it's much more convenient to use duohub, which abstracts all of that complexity away so you can focus on what really matters: creating exceptional voice AI experiences.

How do on-premise retrievers work?

We have hybrid graph and vector retrievers that can be easily deployed within your VPC in each region where you operate. You can then use our APIs to process your data into graphs and perform coreference resolution, fact extraction, entity resolution, and more. You do not pay for API requests or storage when using an on-premise retriever.

What support do you offer?

We generally offer same-day support for all customers. If your business needs integration support, custom ontology development, or Service Level Agreements beyond what we already offer, add-ons are available for purchase in the app.