"We've become very familiar with the limitations of RAG, particularly for voice agents. I'm glad to say that we've been impressed with the performance of duohub. Our team is able to ship products for new use cases quickly without having to manage complex infrastructure."
Aashay Sachdeva
Founding Team - AI/ML at Sarvam
Hundreds of thousands of dollars, specialised resources, months to production
World-class, production-ready infrastructure you can count on today
Clear, predictable pricing designed to scale with your success. No hidden fees, no minimum commitments. Start with a generous free plan and get $10 credit when you add a card.
Volume rates apply per 1k tokens (processing), per GB-day (storage), and per request (retrieval), with automatic discounted rates at scale.
Many people start by stuffing all of the context they want their voice agent to know into the prompt. This works for simple tasks, but as task complexity increases, the prompt becomes too long and the agent becomes too slow. You also risk diverting the LLM's attention away from the task at hand. Picture the context window as your working memory: it's not practical to hold everything in working memory while also attempting to reason. A memory layer lets you store context more efficiently, so the agent can quickly retrieve only what is needed, when it is needed.
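To make the contrast concrete, here is a minimal Python sketch of the two approaches. The `memory.query` interface is hypothetical, standing in for whatever memory layer you use; it is not a real SDK call:

```python
# Hypothetical memory interface; names here are illustrative, not a real SDK.

def build_prompt_stuffed(question: str, all_documents: list[str]) -> str:
    # Naive approach: every document rides along on every turn.
    # The prompt grows with your corpus, and latency grows with the prompt.
    context = "\n".join(all_documents)
    return f"Context:\n{context}\n\nUser: {question}"

def build_prompt_with_memory(question: str, memory) -> str:
    # Memory-layer approach: fetch only the few facts relevant to this
    # turn, keeping the prompt short and the model's attention focused.
    facts = memory.query(question, top_k=5)  # hypothetical call
    context = "\n".join(fact.text for fact in facts)
    return f"Context:\n{context}\n\nUser: {question}"
```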
Naive vector RAG is a good place to start, but it comes with a few key limitations. Take the simple example of a user asking your agent, 'Where did you go before New York City?' If your memory store holds a lot of content about New York City, a similarity query will likely return even more content about New York City, which is not what the user wants to know. Graph RAG gives you an extra layer of querying: the nodes and relationships let you determine where the agent was before it went to New York City.
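As an illustration of that extra layer, here is a minimal Python sketch using a hand-built toy graph (not duohub's internal representation) to show how a relationship query answers what similarity search cannot:

```python
# Toy knowledge graph: nodes are places, edges capture the order of visits.
# This hand-rolled structure is illustrative only.
visited_before = {
    "New York City": "Chicago",  # Chicago was visited before NYC
    "Chicago": "Austin",
}

def place_before(city):
    # Vector search over documents about "New York City" would return
    # more NYC content; this graph edge answers the actual question.
    return visited_before.get(city)

print(place_before("New York City"))  # -> "Chicago"
```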
You might be able to find the talent required to build complex graph RAG systems, develop entity resolution and coreference models, and design ontologies that match the shape of your data. However, this is extremely capital- and time-intensive, and you then carry the infrastructure overhead of maintaining the system. As voice AI engineers, we think it's far more convenient to use duohub, which abstracts all of that complexity away so you can focus on what really matters: creating exceptional voice AI experiences.
We offer hybrid graph and vector retrievers that can be easily deployed in your VPC in each region where you operate. You can then use our APIs to process your data into graphs and perform coreference resolution, fact extraction, entity resolution, and more. You do not pay for API requests or storage when using an on-premise retriever.
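As a rough sketch of what a hybrid retriever does, here is an illustrative Python example that blends vector similarity with a one-hop graph boost. The data structures and scoring are assumptions for the sake of the example, not duohub's actual implementation:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_retrieve(query_vec, docs, graph, seed_entities, top_k=5, alpha=0.7):
    """Score each doc by vector similarity, boosted when it mentions an
    entity within one hop of the query's entities in the graph.

    docs: list of (doc_vec, entity_set, text); graph: entity -> neighbours.
    All structures here are illustrative, not a real duohub API.
    """
    # Expand the seed entities one hop through the graph.
    neighbourhood = set(seed_entities)
    for entity in seed_entities:
        neighbourhood.update(graph.get(entity, ()))

    scored = []
    for vec, entities, text in docs:
        sim = cosine(query_vec, vec)
        graph_boost = 1.0 if entities & neighbourhood else 0.0
        scored.append((alpha * sim + (1 - alpha) * graph_boost, text))
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]
```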
We generally offer same-day support to all customers. If your business needs integration support, custom ontology development, or Service Level Agreements beyond what we already offer, add-on options are available for purchase in the app.