Retrieval-augmented generation

Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea:

Problem: LLMs can reason, but they don't have the facts specific to your situation. They don't know where your user is located, or which passage in your knowledge base is most relevant, or who is currently on the customer list in the CRM. Whatever your application, you need to add context.

Solution: just add this information into the context window!

Problem 2: but the context window is finite (and usually much smaller than your data), so you probably can't fit in your company's whole handbook, or a technical manual, or a customer's usage logs.

Solution 2: gather up everything that might be relevant to the queries you expect, index it in advance, and at runtime retrieve the most relevant pieces for the LLM to use in the generation step.
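
The index-then-retrieve loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production recipe: it swaps in word-count vectors and cosine similarity where a real system would more likely use learned embeddings and a vector store, and the function names (build_index, retrieve, build_prompt), the sample documents, and the prompt wording are all invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    # Index step (done in advance): pair each document with its word counts.
    return [(doc, Counter(tokenize(doc))) for doc in documents]


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(index, query, k=2):
    # Retrieval step (at runtime): rank documents by similarity to the query.
    query_vec = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, passages):
    # Put the retrieved passages into the context window ahead of the question.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base, made up for this sketch.
documents = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Support is available by email on weekdays.",
]

index = build_index(documents)
query = "How many days do refunds take?"
passages = retrieve(index, query, k=1)
prompt = build_prompt(query, passages)
print(prompt)
```

Here the refund passage is the only one that shares words with the query, so it is the one placed in the prompt. The resulting string is what you would send to the LLM for the generation step.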