Context engineering is information retrieval

The stages of an LLM app seem to go like this:

  • Hardcode the first prompt, get the end-to-end app working.
  • Realise that the answers are bad.
  • Do some prompt engineering.
  • Realise the answers are still bad.
  • Do some more prompt engineering.
  • Discover vector databases!!!1
  • Dump a ton of data as plain strings into the vector db for semantic search on embeddings.
  • Post your achievement on Twitter.
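The "dump strings into the vector db" stage often boils down to something like the following minimal sketch. The hashing-trick `embed` function here is a toy stand-in for a real embedding model, and the document strings are made up -- the point is only the shape of the pipeline: embed everything, then rank by cosine similarity.

```python
import math
import re

# Toy embedding: a hashing-trick bag-of-words vector. Purely illustrative --
# a real system would call an embedding model here, not this.
def embed(text: str, dim: int = 64) -> list[float]:
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# "Dump a ton of data as plain strings" into an in-memory index.
docs = [
    "Our refund policy allows returns within 30 days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday to Friday.",
]
index = [(doc, embed(doc)) for doc in docs]

# Semantic search: embed the query, rank the documents by similarity.
def search(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(search("refund policy for returns"))
```

It works well enough to demo -- which is exactly how the journey above ends.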

The journey usually ends here -- with an impressive demo. But the demo is typically hand-picked from many examples, and for most users and most queries the system doesn't work.

What's next? Improving on this takes much more work. Setting up even semi-rigorous evaluation means tedious work, including manual labelling. Fetching the right context takes even more. Prompt engineering turns into orchestrating multi-prompt chains with intent detection, which leads to interleaved Python code and LLM calls...

Which is to say, another form of engineering.
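A sketch of what that orchestration tends to look like, under loose assumptions: `call_llm` is a fake stand-in so the example runs offline (a real app would call a model API), and the intents and handlers are hypothetical.

```python
from typing import Callable

# Fake LLM call so the sketch runs offline -- a real app would hit a model API.
def call_llm(prompt: str) -> str:
    if "Classify" in prompt:
        return "billing" if "invoice" in prompt else "other"
    return f"[LLM answer to: {prompt[:40]}...]"

# One specialised prompt per intent -- the "multi-prompt chain".
HANDLERS: dict[str, Callable[[str], str]] = {
    "billing": lambda q: call_llm(f"You are a billing expert. Answer: {q}"),
    "other": lambda q: call_llm(f"Answer helpfully: {q}"),
}

def answer(question: str) -> str:
    # LLM call #1: intent detection.
    intent = call_llm(f"Classify this question as billing/other: {question}").strip()
    # Plain Python in between, routing to LLM call #2.
    handler = HANDLERS.get(intent, HANDLERS["other"])
    return handler(question)

print(answer("Why was my invoice charged twice?"))
```

The Python glue and the LLM calls interleave, and the structure is ordinary control flow -- engineering, not prompting.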

What I wanted to focus on, though, is the "fetching the right context" part. While it may seem new, it is the age-old information retrieval problem -- and the solutions are probably similar too. So my suggestion to anyone working to remove hallucinations: brush up on your Information Retrieval 101, and take inspiration from the search-engine builders of 20+ years ago.
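For instance, plain lexical ranking like BM25 -- a workhorse of those 20-year-old search engines -- remains a strong baseline for fetching context. A minimal pure-Python sketch (illustrative, not production code; the example documents are made up):

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class BM25:
    """Classic Okapi BM25 ranking, sketched in pure Python."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.toks = [tokenize(d) for d in docs]
        self.avgdl = sum(len(t) for t in self.toks) / len(self.toks)
        # Document frequency per term, for the IDF weight.
        self.df = Counter(t for toks in self.toks for t in set(toks))
        self.n = len(docs)

    def idf(self, term: str) -> float:
        df = self.df.get(term, 0)
        return math.log((self.n - df + 0.5) / (df + 0.5) + 1.0)

    def score(self, query: str, i: int) -> float:
        tf = Counter(self.toks[i])
        dl = len(self.toks[i])
        s = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            # Term frequency saturates via k1; b normalises for doc length.
            s += self.idf(term) * f * (self.k1 + 1) / (
                f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            )
        return s

    def search(self, query: str, k: int = 1) -> list[str]:
        ranked = sorted(range(self.n), key=lambda i: self.score(query, i),
                        reverse=True)
        return [self.docs[i] for i in ranked[:k]]

docs = [
    "Our refund policy allows returns within 30 days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday to Friday.",
]
bm25 = BM25(docs)
print(bm25.search("refund policy"))
```

Combined with, or instead of, embedding search, this kind of retrieval is exactly the sort of trick the old search-engine literature is full of.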