- Retrieval-augmented generation (RAG) improves the accuracy and specificity of large language models.
- Challenges remain, however, and putting RAG into practice requires specific implementation techniques.
- This article is part of Build IT, a series about digital technology trends disrupting industries.
The launch of OpenAI's ChatGPT in November 2022 sparked the latest wave of interest in AI, but it came with some serious issues. People could ask questions about almost any topic, but many of the large language model's answers were either uselessly general or flat-out wrong. No, ChatGPT, Mars doesn't have a population of 2.5 billion people.
Such problems still plague large language models. But there is a solution: retrieval-augmented generation, or RAG. The method was introduced in 2020 by a group of researchers from Meta's AI research group, and it is rewriting the rules of LLMs. The first wave of vague, meandering chatbots is receding, replaced by expert chatbots that can answer surprisingly specific questions.
Although RAG is relatively unknown outside the AI industry, it has come to dominate conversations among insiders, particularly those building user-facing chatbots. Nvidia used RAG to build an LLM that helps its engineers design chips. Perplexity employs RAG in its AI-powered search engine, which currently has over 10 million monthly active users. Salesforce used RAG to build a chatbot platform for customer interaction.
“We've been looking at databases for a long time, and we were really excited about AI, but we asked: What are the unique use cases?” said Bob van Luijt, CEO and co-founder of an AI data infrastructure company. “RAG was the first.” From the user's perspective, the problem was that generative models are stateless, meaning they can't update themselves in response to new information. Tell a chatbot something new today, and without RAG it won't remember it the next time you use it.
Innovations taking AI by storm
“Any industry with large amounts of unstructured data can benefit from RAG,” said van Luijt. “That ranges from insurance companies to law firms to banking to telecommunications.” Companies in these industries often sit on vast amounts of data, but sifting through it for insights is a difficult task. “That's where RAG adds a lot of value. You throw that information in and you're like, ‘Figure it out,’ and it does.”
RAG achieves this by adding a step when the LLM generates a response. Rather than answering based solely on how the model was trained, a RAG system also retrieves additional data (most often text, though modern methods can also process images, audio, and video) and feeds it to the model alongside the prompt.
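That retrieve-then-generate step can be sketched in a few lines. This is a hedged toy illustration, not any vendor's implementation: the retriever here ranks documents by simple word overlap (production systems use vector similarity over embeddings), and the final call to a language model is left out, with the augmented prompt as the result.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by words shared with the query; keep the best top_k.
    A production retriever would use embedding similarity instead."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    """The RAG step: prepend retrieved context to the user's question
    before the prompt is sent to the language model."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Mars has no permanent human population.",
    "ChatGPT launched in November 2022.",
    "RAG was introduced by Meta AI researchers in 2020.",
]
prompt = build_prompt("What is the population of Mars?", docs)
```

Grounding the model in retrieved text is what turns "2.5 billion people" into an answer backed by an actual source.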
Nadaa Taiyab, a data scientist at healthcare IT company Tegria, shared the example of a chatbot she designed that uses RAG to answer nutrition questions based on data from NutritionFacts.org. The nonprofit highlights research linking eggs to type 2 diabetes, yet most LLMs do not report this correlation when asked whether eggs reduce the risk of diabetes. A chatbot powered by RAG, however, can retrieve NutritionFacts.org's public materials and reference them in its responses. “And it worked,” Taiyab said. “It's so magical.”
But not perfect
This magic has made RAG the go-to technology for anyone looking to build a chatbot on top of specific, often proprietary data. But van Luijt cautioned that “like any technology, it's not a silver bullet.”
All data used for RAG must first be converted into embeddings, the series of numbers an LLM can work with, and stored in a vector database. AI engineers understand this well, since it's at the core of how generative AI works, but the devil is in the details. Van Luijt said developers need to employ certain techniques, such as “chunking strategies,” to shape the way RAG presents data to the LLM.
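To make that "series of numbers" concrete, here is a hedged toy sketch: each word is mapped into a slot of a fixed-length vector by a simple deterministic hash, and cosine similarity compares two vectors. Real systems use learned embedding models and a dedicated vector database; the principle of comparing texts as vectors is the same.

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: count each word into one of `dim` slots chosen by a
    simple hash. A real system would call a learned embedding model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        slot = sum(ord(ch) for ch in word) % dim
        vec[slot] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A vector database does essentially this comparison, at scale and with far better embeddings, to find the stored chunks closest in meaning to a user's question.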
The most basic strategy, fixed-size chunking, divides data into pieces like a pizza: all slices are (hopefully) the same size. But this isn't always the best approach, especially if the LLM needs to draw on data spread across different documents. Other strategies, such as “semantic chunking,” use algorithms to select related data across many documents. Implementing this approach, however, requires more expertise and access to more powerful computers. Simply put, it's better, but not cheaper.
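A minimal sketch of the fixed-size approach, assuming character-based windows with a small overlap so a sentence cut at a boundary still appears whole in one of the two neighboring chunks (the parameter names are illustrative, not from any particular library):

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into equal-sized character windows ("pizza slices").
    Consecutive chunks share `overlap` characters at their boundary."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "word " * 100            # a 500-character stand-in for a document
chunks = fixed_size_chunks(text)
```

A semantic chunker would instead group sentences by embedding similarity before splitting, which is exactly why it needs more compute than this one-liner window slicing.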
Overcoming that obstacle can quickly lead to another problem: if successful, RAG may work a little too well.
Kyle DeSana, co-founder of AI analytics company Siftree, warned against implementing RAG carelessly. “They lose touch with the voice of the customer without even realizing it, without doing the analysis,” DeSana said.
Even a successful RAG chatbot has pitfalls, he said. A chatbot with domain expertise responds within seconds and encourages users to keep asking, and the resulting interactions can surface questions beyond the chatbot's scope, creating what is called a feedback loop.
Unraveling feedback loops
While analytics are essential for identifying the shortcomings of RAG-powered AI tools, they remain reactive. AI engineers want more proactive solutions that don't require continually tinkering with the data RAG feeds to the AI. One state-of-the-art technique is the generative feedback loop, which attempts to harness feedback loops to reinforce desired outcomes.
“RAG pipelines are typically unidirectional,” van Luijt explained. But AI models can also use the data they generate to improve the quality of the information available through RAG. Van Luijt cited vacation rental companies like Airbnb and Vrbo as an example. Listings on these sites include many details, some of which the listing's creator may have overlooked or omitted (does the location have easy access to public transportation?). AI is very good at filling in these gaps, and once it does, that data can be fed back into RAG to improve the accuracy and detail of future answers.
“We ask the model, ‘Based on what you have, do you think you can fill in the blanks?’ And it starts updating automatically,” van Luijt said. Weaviate publishes real-world examples of generative feedback loops, including a recreation of Amazon's AI-driven review summaries. In that example, the system not only publishes a summary for people to read but also saves it to the database, where it can later be retrieved through RAG. If a new summary is needed in the future, the AI can refer to its previous answer rather than re-ingesting all the published reviews (which can number in the tens or hundreds of thousands).
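The review-summary pattern can be sketched as follows. This is a hedged illustration of the general idea, not Weaviate's or Amazon's actual code: `summarize` stands in for an LLM call, and the generated summary is written back into the store so later requests retrieve it instead of reprocessing every review.

```python
def summarize(reviews: list[str]) -> str:
    """Stand-in for an LLM call that condenses reviews into a summary."""
    return f"Summary of {len(reviews)} reviews."

class ReviewStore:
    """Feeds generated summaries back into the store, so they can be
    retrieved later instead of being regenerated from raw reviews."""

    def __init__(self) -> None:
        self.summaries: dict[str, str] = {}   # generated data, fed back in
        self.llm_calls = 0

    def get_summary(self, product_id: str, reviews: list[str]) -> str:
        if product_id not in self.summaries:
            self.llm_calls += 1                          # generate once...
            self.summaries[product_id] = summarize(reviews)
        return self.summaries[product_id]                # ...retrieve thereafter

store = ReviewStore()
first = store.get_summary("mug-42", ["Great mug.", "Cracked on day one."])
second = store.get_summary("mug-42", ["Great mug.", "Cracked on day one."])
```

The loop is "generative" because the model's own output becomes part of the retrievable data, closing the pipeline that is normally one-way.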
As the AI industry continues to grow, van Luijt and Taiyab speculate that new techniques could push models to the point where they no longer require retrieval at all. A recent paper by researchers at Google describes a hypothetical LLM with infinite context. Simply put, such an AI chatbot would have virtually infinite memory, able to “remember” all the data it had ever been shown. Google announced in February that it had tested a context window of up to 10 million tokens, each token representing a small chunk of text. That's large enough to hold hundreds of books or tens of thousands of short documents.
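A rough back-of-the-envelope check on the document count, assuming about 0.75 English words per token and about 300 words per short document (both figures are common rules of thumb, not from the paper):

```python
tokens = 10_000_000            # context window size Google reported testing
words_per_token = 0.75         # rough rule of thumb for English text
words_per_doc = 300            # assumed length of a "short document"

words = tokens * words_per_token       # about 7.5 million words
docs = int(words // words_per_doc)     # about 25,000 short documents
```

That lands squarely in the "tens of thousands of short documents" range.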
Currently, the computing resources required are beyond the reach of all but the largest tech giants; Google said that during its February tests, the hardware reached its “thermal limit.” RAG, on the other hand, can be implemented by a single developer in their free time. It has been scaled to serve millions of users, and it is available now.
“Maybe in the future RAG will be completely phased out, because it's not perfect,” Taiyab said. “But for now, this is all we have. Everyone is doing it. It's a fundamental application that sits at the core of working with large language models.”