Conquering Accuracy to Deploy Conversational AI at Enterprise Scale

This case study examines how Echo AI overcame LLM accuracy issues to quickly move enterprise customers from prototypes to production using Log10’s AI-powered LLMOps workflow.

Situation

Echo AI is a Conversation Intelligence Platform that uses generative AI to review customer conversations and uncover hidden insights. Companies such as StitchFix, CLEAR, and Wine Enthusiast can suddenly go from manually reviewing 1% of conversations to understanding 100% of what their customers are saying; they can now ask and answer nuanced questions about their business that might never have occurred to them previously. This is a transformative shift and a net-new use case.

A single conversation might get analyzed 50 different ways, with 50 unique API calls to 10 different LLMs, producing millions of LLM outputs over hundreds of thousands of customer interactions. Echo AI has built a powerful analysis platform that classifies these outputs to drive actionable business insights and trigger downstream workflows.

Echo AI analyzes every single conversation across every channel with human-level depth and gives leaders the answers to the critical strategic questions that drive growth and retention.

Problem

A major challenge with LLMs today is accuracy: LLMs are notoriously error-prone and can produce different outputs even when given the same prompt. For Echo AI, this had downstream implications whenever an output was tagged erroneously. For example, Echo AI’s customers made financial decisions, such as issuing refunds, based on tagging information, so it was critical that LLM outputs were highly accurate and therefore correctly tagged.

From escalations to supply chain issues, Echo AI’s powerful tagging abilities track anything with just a simple prompt.

Making LLM outputs more accurate began with the initial prompt, which often had to be tailored through an iterative prompt engineering process to produce reliable results. Echo AI engineers found that initial prompts could start out with accuracy as low as 40-50%, and that raising it to the 95%+ required for production was an arduous journey. The process typically took several days of effort to set up, followed by ongoing manual monitoring that could last indefinitely.
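To make the 40-50% and 95% figures concrete, here is a minimal sketch of how a prompt's tagging accuracy might be measured against a hand-labeled sample. The function name, tag names, and sample data are hypothetical illustrations, not Echo AI's or Log10's actual tooling.

```python
PRODUCTION_THRESHOLD = 0.95  # the 95% production bar described in the text


def tagging_accuracy(predicted, expected):
    """Return the fraction of conversations whose predicted tag matches the human label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one labeled example")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical human labels for ten conversations.
expected = ["refund", "escalation", "refund", "shipping", "other",
            "refund", "escalation", "shipping", "other", "refund"]
# Hypothetical tags produced by an untuned first-draft prompt.
predicted = ["refund", "other", "refund", "refund", "other",
             "shipping", "escalation", "other", "other", "other"]

accuracy = tagging_accuracy(predicted, expected)
ready = accuracy >= PRODUCTION_THRESHOLD
print(f"accuracy={accuracy:.0%}, ready for production: {ready}")
```

Each round of prompt engineering would rerun a check like this on the same labeled sample until the result clears the threshold.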

Whenever an underlying model changed, accuracy could regress, forcing engineers to restart the cycle on prompts they had already tuned.
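One way to catch such regressions is to compare per-prompt accuracy before and after a model change. The sketch below assumes accuracy has already been measured for each prompt on both model versions; the prompt names, scores, and tolerance are hypothetical.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return prompts whose accuracy dropped by more than `tolerance`.

    `baseline` and `candidate` map a prompt ID to its measured accuracy
    on the old and new model versions, respectively.
    """
    regressions = {}
    for prompt_id, old_accuracy in baseline.items():
        new_accuracy = candidate.get(prompt_id)
        if new_accuracy is None:
            continue  # prompt was not evaluated on the new model
        drop = old_accuracy - new_accuracy
        if drop > tolerance:
            regressions[prompt_id] = round(drop, 4)
    return regressions


# Hypothetical accuracy measurements for three already-tuned prompts.
baseline = {"refund-request": 0.96, "escalation": 0.97, "supply-chain": 0.95}
candidate = {"refund-request": 0.88, "escalation": 0.97, "supply-chain": 0.955}

regressed = find_regressions(baseline, candidate)
print(regressed)
```

Only the prompts flagged here would need another tuning cycle; the rest can stay as they are.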

Because the AI tech stack was emerging and developer tooling was nascent, fixing LLM accuracy issues was prohibitively time-consuming. Echo AI knew that they needed a more powerful tooling solution or their ability to onboard new customers would stall.

The Log10 Solution

Echo AI turned to Log10, which provides an end-to-end LLMOps workflow that supports logging, prompt engineering and optimization, enabling engineers to solve LLM accuracy issues within a single environment.

With just one line of integration, they began logging and tagging their LLM calls to Log10’s platform. In contrast to other solutions on the market, Log10 could handle direct LLM calls to multiple providers as well as calls made through frameworks such as LangChain, without the added latency of a proxy.

from log10.load import OpenAI

# Tags label the logged calls so they can be filtered later,
# for example by customer and use case.
client = OpenAI(tags=["customer/activision",
                      "use-case/customer-support"])
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are a customer support "
                       "representative for Activision.",
        },
        {
            "role": "user",
            "content": "How can I reset my password if I "
                       "no longer have access to the email address "
                       "associated with my account?",
        },
    ],
    temperature=0,  # reduce sampling randomness
)

print(response)

Log10 is easy to integrate. With just one line of code, all your LLM calls are captured, whether you use a provider library directly (OpenAI, Anthropic, Together, MosaicML, etc.) or go through third-party frameworks such as LangChain or magentic.

When a prompt needed improvement, Echo AI engineers could filter and search millions of logs in seconds to isolate a sample set of LLM call logs, open them in a collaborative Playground, and start improving the prompt using AI-powered suggestions along with a rich array of models and hyperparameters.
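The idea of isolating a sample set by tag can be sketched in a few lines. This is an in-memory illustration only: the log schema, the `sample_logs` helper, and the data are hypothetical, and it does not use or represent Log10's actual search interface.

```python
import random


def sample_logs(logs, required_tags, sample_size, seed=0):
    """Return a reproducible random sample of logs that carry all required tags."""
    required = set(required_tags)
    matches = [log for log in logs if required <= set(log["tags"])]
    rng = random.Random(seed)  # fixed seed so the same sample can be revisited
    return rng.sample(matches, min(sample_size, len(matches)))


# Hypothetical log records; a real system would hold millions of these.
logs = [
    {"id": 1, "tags": ["customer/activision", "use-case/customer-support"], "output": "refund"},
    {"id": 2, "tags": ["customer/example", "use-case/customer-support"], "output": "other"},
    {"id": 3, "tags": ["customer/activision", "use-case/customer-support"], "output": "escalation"},
    {"id": 4, "tags": ["customer/activision", "use-case/customer-support"], "output": "refund"},
    {"id": 5, "tags": ["customer/activision", "use-case/sales"], "output": "other"},
]

sample = sample_logs(
    logs,
    ["customer/activision", "use-case/customer-support"],
    sample_size=2,
)
for log in sample:
    print(log["id"], log["output"])
```

A small, reproducible sample like this is what an engineer would then review and use as the test set while iterating on the prompt.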

Whereas improving a prompt previously took multiple days, with Log10's workflow Echo AI engineers could optimize a prompt for production in less than an hour.

Log10 provides a streamlined end-to-end workflow to find and fix inaccurate LLM outputs.

Identifying an opportunity to make use of Echo AI’s massive trove of logs, Log10 built a Prompt Engineering Copilot to further boost accuracy and efficiency.

The copilot continuously analyzed Echo AI’s logs, testing for ways to improve accuracy via prompt engineering, and surfaced optimizations to solutions engineers when performance thresholds were crossed. On critical tasks, the copilot improved accuracy by 10-20 F1 points, resulting in greater user satisfaction and trust in Echo AI’s offering.
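For readers unfamiliar with the metric, the sketch below computes F1 for a single tag and checks it against an alert threshold. It assumes an "F1 point" means one hundredth of the F1 score; the threshold value, tag names, and data are hypothetical and do not reflect how the copilot is implemented.

```python
ALERT_THRESHOLD = 0.70  # hypothetical F1 level below which a prompt is flagged


def f1_score(predicted, expected, positive_tag):
    """Compute F1 (harmonic mean of precision and recall) for one tag."""
    pairs = list(zip(predicted, expected))
    true_pos = sum(p == positive_tag and e == positive_tag for p, e in pairs)
    false_pos = sum(p == positive_tag and e != positive_tag for p, e in pairs)
    false_neg = sum(p != positive_tag and e == positive_tag for p, e in pairs)
    if true_pos == 0:
        return 0.0  # no correct positives, so precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical human labels and model outputs before and after a prompt change.
expected = ["refund"] * 4 + ["other"] * 4
before = ["refund", "refund", "other", "other", "refund", "other", "other", "other"]
after = ["refund", "refund", "refund", "other", "refund", "other", "other", "other"]

f1_before = f1_score(before, expected, "refund")
f1_after = f1_score(after, expected, "refund")
needs_attention = f1_before < ALERT_THRESHOLD
gain_in_points = (f1_after - f1_before) * 100

print(f"before={f1_before:.3f} after={f1_after:.3f} "
      f"gain={gain_in_points:.1f} F1 points, flagged={needs_attention}")
```

F1 is a useful target for tagging tasks because it penalizes both missed tags and spurious ones, which plain accuracy can hide when one tag is rare.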

Log10’s Prompt Engineering Copilot provides AI-powered assistance to rapidly improve the accuracy of prompts and models.

With the copilot, Echo AI’s Solution Engineering team was able to onboard a 10x influx of customers without increasing the size of the team.

Results

As part of a new guard forming The Modern AI Stack, Log10 is bringing state-of-the-art tooling that solves net-new use cases to the marketplace. With Log10’s end-to-end LLMOps workflow, Echo AI efficiently resolved LLM accuracy issues and quickly moved enterprise customers with six-figure contracts from prototypes to production.