In the auditing industry, precision is essential, and the generation of false information by Large Language Models (LLMs), often called hallucinations, is simply unacceptable. To mitigate this risk, we have been steadily refining our approach to Retrieval-Augmented Generation (RAG), as detailed in our previous blog post. Our journey began with rudimentary semantic search, followed by Precision Chunking and MetaDocumenting to improve retrieval and generation. Today, we are proud to unveil Knowledge-Grounded Generation, a new RAG methodology that grounds LLM responses in pre-existing, verified information. It relies on precise retrieval through a combination of string similarity (Levenshtein distance) and semantic search (vector embeddings).
We have been working closely with our design partners to improve the quality of answers provided by our Q&A chatbots. The obvious first step was to introduce our RAG pipeline and upload their proprietary data as context, which significantly improved answer accuracy. After further testing, however, we realized the answers were good, not great, and our partners hesitated to release the chatbot to their customers for fear that incorrect answers would reflect poorly on them. Neither we nor our partners were satisfied, so we took a radically different approach to generating high-quality, accurate answers: we set out to generate thousands of verified question-and-answer pairs and then apply RAG (retrieval and generation) over those verified pairs.
Our solution first synthetically generates candidate questions and answers from an uploaded document. A subject matter expert reviews and approves them, and they become verified answers in our database. When a user asks a question that matches or closely resembles one in the database, the corresponding verified answer is used and tailored to the specific question. For new questions not in the database, our system uses our other RAG techniques to generate "unverified" answers. If the user or an expert approves such an answer, it is added to the verified database for future use. This creates a continuous learning loop: each interaction can expand the database of verified answers and improve the system’s accuracy over time. For our partners’ users, Knowledge-Grounded Generation nearly eliminates hallucinations.
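The verified-first loop described above can be sketched as a small data structure. This is a minimal illustration, not Tellen’s implementation: `VerifiedStore`, `answer_query`, the `is_match` matcher, and the `fallback` callable (standing in for "our other RAG techniques") are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical plug-in points: a matcher decides whether a user query matches
# a stored question; a fallback produces an unverified answer via other RAG.
Matcher = Callable[[str, str], bool]
FallbackRAG = Callable[[str], str]


@dataclass
class VerifiedStore:
    """Hypothetical store that holds only expert- or user-approved Q&A pairs."""

    pairs: dict[str, str] = field(default_factory=dict)  # question -> answer

    def lookup(self, query: str, is_match: Matcher) -> Optional[str]:
        # Return the approved answer for the first stored question that matches.
        for question, answer in self.pairs.items():
            if is_match(query, question):
                return answer
        return None

    def approve(self, question: str, answer: str) -> None:
        # Approval promotes an answer into the verified set for future queries.
        self.pairs[question] = answer


def answer_query(store: VerifiedStore, query: str, is_match: Matcher,
                 fallback: FallbackRAG) -> Tuple[str, bool]:
    """Return (answer, is_verified): verified answers first, else fallback."""
    approved = store.lookup(query, is_match)
    if approved is not None:
        return approved, True
    return fallback(query), False
```

In use, synthetically generated pairs would be passed to `approve` after expert review; an unverified fallback answer that a user or expert later accepts is passed to `approve` as well, which is what closes the learning loop.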
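Let’s take a closer look at an example. If the user asks, “What does PCAOB stand for?” and a question in our database matches it exactly, the task is easy: we use the verified answer. In practice, however, users rarely phrase a question exactly as it was stored.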
To handle variations that are not identical but very similar, we calculate the Levenshtein Distance between the query and each stored question. The Levenshtein Distance measures the difference between two strings by counting the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other. The smaller the distance, the more similar the incoming query is to a stored question. We accept a match only when the distance is within 2 or 3 single-character edits, so that we capture precisely the intended question.
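The distance itself is a standard dynamic-programming computation. The sketch below implements it and applies the 2-to-3-edit threshold from the text (here `max_edits=3`). The `normalize` step (lowercasing and collapsing whitespace) is an assumption not described in the post; without it, “what does pcaob stand for?” would be 6 edits away from “What does PCAOB stand for?” purely because of capitalization.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    # Dynamic programming over prefixes, keeping only the previous row.
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,                    # insertion
                previous[j] + 1,                       # deletion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    # Assumption: case and whitespace differences should not count as edits.
    return " ".join(text.lower().split())


def closest_question(query: str, stored: Iterable[str],
                     max_edits: int = 3) -> Optional[str]:
    """Return the stored question with the smallest distance to the query,
    or None if no question is within max_edits."""
    best, best_distance = None, max_edits + 1
    for question in stored:
        distance = levenshtein(normalize(query), normalize(question))
        if distance < best_distance:
            best, best_distance = question, distance
    return best
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), and a query missing only its question mark is 1 edit away from the stored question, so it matches.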
While Levenshtein Distance works well for straightforward text comparison, it does not capture the full complexity of natural language, as the variations at the bottom of the table show.
Here, different words or phrases can convey the same meaning. To address this, Tellen also leverages advanced semantic search techniques. We use vector embeddings to represent the meaning of both the incoming query and the stored questions, allowing us to compare them on a conceptual level rather than just a textual one.
When a query is received, we convert it into an embedding—a high-dimensional vector that encapsulates its semantic meaning. We then compare this embedding to the embeddings of the stored questions using a similarity metric (in our case, cosine similarity). If a stored question has a high similarity score with the incoming query, it is considered a match, even if the exact wording differs.
However, we keep the threshold low, around 0.3 in cosine distance (1 minus cosine similarity), so that users are not shown approved question/answer pairs such as “Who founded the PCAOB?”, which has a very different meaning from the query above.
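The embedding comparison can be illustrated with plain cosine similarity. This sketch assumes the stored questions’ embeddings are already computed (by whatever embedding model is in use, which the post does not name) and reads the 0.3 figure as a cosine distance, so a match requires a similarity of at least 0.7. `semantic_match` is a hypothetical helper, and the 3-dimensional vectors in the test are invented toy values, not real embeddings.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_match(query_vec: Sequence[float],
                   stored: Dict[str, Sequence[float]],
                   max_distance: float = 0.3) -> Optional[str]:
    """Return the stored question most similar to the query, or None.

    Assumption: the threshold is a cosine distance (1 - similarity), so a
    match needs similarity >= 1 - max_distance (0.7 by default).
    """
    best_question, best_similarity = None, -1.0
    for question, vec in stored.items():
        similarity = cosine_similarity(query_vec, vec)
        if similarity > best_similarity:
            best_question, best_similarity = question, similarity
    if best_question is not None and 1.0 - best_similarity <= max_distance:
        return best_question
    return None
```

Taking the single best candidate and then applying the threshold, rather than returning every question above it, keeps the result unambiguous: at most one approved answer is passed on to the rewriting step.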
As a final step, we send the user’s original query and the approved answer (found by the combination of Levenshtein Distance and semantic search) to an LLM and ask it to rewrite the answer to suit the user’s query, on the condition that it draws knowledge only from the pre-approved answer itself. The result reads as though it were written for the question asked. We call this output the dynamic approved answer, and the overall process Knowledge-Grounded Generation.
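One way to express the "draw knowledge only from the approved answer" condition is in the prompt itself. The template wording, `build_grounded_prompt`, and `dynamic_approved_answer` below are hypothetical; the post does not publish its prompt. The `llm` argument is any callable that takes a prompt string and returns text, so no particular LLM client API is assumed.

```python
from typing import Callable

# Hypothetical prompt template; the exact wording is an assumption.
GROUNDED_REWRITE_TEMPLATE = (
    "Rewrite the approved answer below so that it directly addresses the "
    "user's question.\n"
    "Use ONLY information contained in the approved answer. Do not add facts, "
    "figures, or sources that it does not contain.\n\n"
    "User question:\n{query}\n\n"
    "Approved answer:\n{approved_answer}\n\n"
    "Rewritten answer:"
)


def build_grounded_prompt(query: str, approved_answer: str) -> str:
    # str.format only parses the template, so braces inside the inserted
    # query or answer text are passed through unchanged.
    return GROUNDED_REWRITE_TEMPLATE.format(
        query=query.strip(), approved_answer=approved_answer.strip()
    )


def dynamic_approved_answer(query: str, approved_answer: str,
                            llm: Callable[[str], str]) -> str:
    """Ask the (caller-supplied) LLM to tailor the approved answer to the query."""
    return llm(build_grounded_prompt(query, approved_answer)).strip()
```

Keeping the approved answer as the only knowledge source in the prompt is what distinguishes this step from ordinary generation: the LLM is asked to rephrase, not to recall.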
By combining Levenshtein Distance with semantic search, as well as a dynamic final response, Tellen ensures that users receive accurate and relevant answers, significantly reducing the risk of hallucinations. Our approach enables chatbots to provide reliable responses, especially in critical fields like auditing and accounting, where precision is paramount.
This multi-layered strategy—string similarity followed by semantic matching and then LLM inference—forms the core of Tellen’s solution to the hallucination problem, ensuring that only approved and accurate information is presented to end-users.
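Putting the three layers together, the cascade might look like the sketch below: string similarity first, semantic matching only if that fails, then a grounded LLM rewrite, with a fallback to unverified RAG when nothing approved matches. This is an illustrative composition under stated assumptions, not Tellen’s code: `knowledge_grounded_answer` is a hypothetical name, `embed`, `llm`, and `fallback` are caller-supplied callables, the case/whitespace normalization is assumed, and the 0.3 threshold is read as a cosine distance.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical plug-in points (not Tellen's actual interfaces).
Embed = Callable[[str], Sequence[float]]
LLM = Callable[[str], str]
FallbackRAG = Callable[[str], str]


def levenshtein(a: str, b: str) -> int:
    # Single-row dynamic programming over insertions, deletions, substitutions.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1, previous[j] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def normalize(text: str) -> str:
    # Assumption: ignore case and whitespace differences.
    return " ".join(text.lower().split())


def knowledge_grounded_answer(query: str, verified: Dict[str, str],
                              embed: Embed, llm: LLM, fallback: FallbackRAG,
                              max_edits: int = 3,
                              max_cos_distance: float = 0.3) -> Tuple[str, bool]:
    """verified maps approved questions to approved answers.
    Returns (answer, is_verified)."""
    # Layer 1: string similarity (Levenshtein) against every approved question.
    match, best_distance = None, max_edits + 1
    for question in verified:
        distance = levenshtein(normalize(query), normalize(question))
        if distance < best_distance:
            match, best_distance = question, distance

    # Layer 2: semantic similarity, only if no near-identical question exists.
    # (A production system would precompute and index stored embeddings.)
    if match is None and verified:
        query_vec = embed(query)
        scored = [(cosine_similarity(query_vec, embed(q)), q) for q in verified]
        best_similarity, candidate = max(scored)
        if 1.0 - best_similarity <= max_cos_distance:
            match = candidate

    # Layer 3: LLM rewrite grounded only in the approved answer.
    if match is not None:
        prompt = (
            "Rewrite the approved answer so it directly addresses the user's "
            "question. Use only information contained in the approved answer.\n\n"
            f"User question: {query}\n\nApproved answer: {verified[match]}\n"
        )
        return llm(prompt), True

    # No approved match: fall back to other RAG techniques (unverified answer).
    return fallback(query), False
```

The ordering is a deliberate cost and precision trade-off: the cheap, strict string check runs first, and the embedding comparison is consulted only for rephrasings that the edit-distance threshold rejects.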