
The secret to preparing for the world of bots and AI agents

Written by Eric Uggla | Dec 19, 2025

The conversation around AI is dominated by technology. But to use bots and AI agents well, something far less glamorous is required. For those of us working in analytics, the best preparation isn’t a new platform or framework – it’s returning to an old, often postponed task: documentation.

THE NEW EXPECTATION: "JUST ASK THE AI"

Many are now familiar with large language models (LLMs) and access them via chat interfaces. Most also know about hallucinations and wrong answers – as well as the ability, intentional or not, to coax the answer we like out of them. Technological solutions to these problems are heavily hyped.

This stands somewhat in opposition to the world view of analysts, data-oriented people and engineers, who like to find answers in large datasets and reason around numbers. These sources of truth should obviously be of great utility in helping LLMs be more correct.

As everyone gets used to getting answers (correct or not) from their chatbots, they will want to ask for the latest sales figure rather than look up a report or dig into a database. Other AI systems will need access to this data as well (see the latest buzzwords: agentic AI and protocols like MCP). Whether you board the hype train or not, the expectations will come.

So how do we start preparing for that, and what should we look out for?

AN EXPLOSION OF TOOLS, PRODUCTS AND PROMISES

One perspective is the technical one. Many products and platforms – and much hype – are popping up, accompanied by a bewildering amount of discussion about integrations and advanced AI platforms that manage much more than just LLMs.

Technical protocols such as MCP, query routing, model selection, and AI engineering in general are important concepts in the current discussion.

WORDS ARE IMPORTANT

We must bear in mind that no matter how advanced the technology becomes, agents and LLMs still operate in the domain of words.

But what do words mean?

This is not a new challenge for us in data engineering or analytics. Imagine a meeting where a salesperson and someone from accounting are arguing over last year’s sales figures. The salesperson counts everything he registered in the sales system when he talked to the client. The accounting person might only count invoiced items that were actually signed and paid for. They attach different concepts to the same term or word. The keyword here is “semantics”, which on a philosophical level deals with the meaning of words.
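To make the disagreement concrete, here is a minimal sketch. The order records and field names are made up for illustration; the point is only that the same data yields two different “sales” figures depending on which definition you apply.

```python
# Hypothetical order records - the fields and amounts are illustrative only.
orders = [
    {"id": 1, "amount": 1000, "registered": True, "invoiced": True,  "paid": True},
    {"id": 2, "amount": 2500, "registered": True, "invoiced": True,  "paid": False},
    {"id": 3, "amount": 1800, "registered": True, "invoiced": False, "paid": False},
]

# The salesperson's "sales": everything registered in the sales system.
sales_registered = sum(o["amount"] for o in orders if o["registered"])

# Accounting's "sales": only items that were invoiced and paid.
sales_invoiced_paid = sum(o["amount"] for o in orders if o["invoiced"] and o["paid"])

print(sales_registered)     # 5300
print(sales_invoiced_paid)  # 1000
```

Same table, same word, two very different numbers – and neither party is wrong.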

Between humans, we resolve confusions such as the one above by talking to each other and aligning through micro-discussions. We come to understand that we refer to slightly different meanings.

But large language models, as of today, have no concept of meaning. They are just statistical models built on words. So, in order for them to be effective, we have to explain everything precisely. We can pass on what things mean by describing it in words, or override an agent’s usual behaviour (“if asked for sales, please ask for a specification”). But if our description is just “Here you find sales”, it will not do a great job.
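What this looks like in practice can be as simple as a system prompt that carries both the definitions and the override behaviour in plain words. A minimal sketch – the wording, field names and definitions below are hypothetical and must be replaced with your own agreed ones:

```python
# A sketch of agent instructions that carry meaning in words.
# All definitions and behaviour rules here are illustrative assumptions.
SYSTEM_PROMPT = """\
You answer questions about company data.

Definitions:
- "sales" (accounting view): the sum of invoiced and paid order amounts.
- "registered sales" (sales-team view): everything entered in the sales system.

Behaviour:
- If the user asks for "sales" without saying which view they mean,
  ask them to clarify before answering.
"""
```

The prompt is then passed as the system message to whichever chat model or agent framework you use.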

DOCUMENTATION

The solution is not some new complex magic, but something most of us probably have on the backlog from yesterday:

Just document what every dataset and field actually means:

Consider the example above. An LLM will not understand what we mean by “sales inc” in Table 1, as it isn’t specified anywhere. Just adding a description gives it a fighting chance to answer a question with the intended sales number.
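A minimal sketch of what such a description could look like as a small data dictionary entry. The meaning given to “sales inc” below is a hypothetical guess, which is exactly why the entry is marked as a draft until the business signs it off:

```python
# A sketch of a minimal data dictionary entry for the table above.
# The description text is a hypothetical assumption, not the real definition.
DATA_DICTIONARY = {
    "Table 1": {
        "sales inc": {
            "description": "Invoiced sales amount including VAT, in SEK.",
            "owner": "Accounting",
            "status": "draft - pending sign-off",
        },
    },
}
```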

This can be done with prompt engineering, such as providing the description in the query (“sales inc means...”). Or you can put it in the reports. Or, ideally, by integrating it into a compliance/lineage system such as Purview. There is no one-size-fits-all solution.
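The prompt-engineering route can be sketched in a few lines: inline the field description next to the question before it reaches the model. The description text and helper below are assumptions for illustration, not a prescribed implementation.

```python
# A sketch of inlining field descriptions into a question (prompt engineering).
# The description text is hypothetical.
FIELD_DESCRIPTIONS = {
    "sales inc": "Invoiced sales amount including VAT, in SEK.",
}

def build_question(question: str) -> str:
    """Prepend the relevant field descriptions to the user's question."""
    context = "\n".join(f'- "{field}": {desc}' for field, desc in FIELD_DESCRIPTIONS.items())
    return f"Field definitions:\n{context}\n\nQuestion: {question}"

print(build_question("What was sales inc last month?"))
```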

The trick here is that you can ask an AI to document your system, summarize descriptions and take a guess. It might be able to figure out that your sales definition corresponds to typical sales processes or financial standards. But for an unclear “sales inc” it will likely guess wrong, so you will still need to correct it.
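A sketch of that draft-then-review loop, assuming a placeholder `ask_llm` function standing in for whichever chat API you actually use. The important part is not the call itself but the review flag: an AI draft is a guess until a human confirms it.

```python
# A sketch of letting an AI draft descriptions while keeping a human in the loop.
# `ask_llm` is a placeholder for your chat model of choice.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your chat model of choice here")

def draft_description(table: str, column: str, sample_values: list[str]) -> dict:
    prompt = (
        f'Suggest a one-sentence business description for column "{column}" '
        f'in table "{table}". Sample values: {sample_values}. '
        "If the meaning is unclear, say so instead of guessing."
    )
    return {
        "table": table,
        "column": column,
        "description": ask_llm(prompt),
        "status": "ai_draft - needs human review",
    }
```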

POINTS TO CONSIDER

Though providing meaningful descriptions is a good idea, it entails a few pitfalls for us data-centric people:

  • Words are different from the hard truths we data people typically work with. Lawyers, linguists and marketing people can attest to this. It is important to be specific and to make sure definitions are agreed upon.

  • The definitions must be as central and shared as possible. Having differing descriptions across reports and databases – or across the organisation – will lead to confusion. And of course everyone needs to agree on the concepts and terms.

  • Describing everything is a very large task, so start by considering what is most important to have a shared meaning of. Identify your critical data elements and work from there to find what might be complex or unclear.

  • You will likely discover new tools, APIs and words around this. Microsoft is introducing Fabric IQ right now, and similar tools can be found at other competitors. However, the tools are not as important as starting the process: find what needs explanation or clarification, then figure out where to store it and keep it alive (a minimal sketch of what such an entry could contain follows after this list). You can ask AI for help, but you will need to confirm that it matches reality.
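As mentioned in the last point, here is a minimal sketch of what one shared glossary entry could contain, regardless of which tool you eventually store it in. All field names and values are illustrative assumptions.

```python
# A sketch of a central, shared glossary entry - values are illustrative only.
GLOSSARY_ENTRY = {
    "term": "sales",
    "definition": "Sum of invoiced and paid order amounts.",
    "owner": "Head of Accounting",
    "critical_data_element": True,
    "agreed_by": ["Sales", "Accounting"],
    "status": "approved",
    "last_reviewed": "2025-12-01",
}
```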

CONCLUSION – DOCUMENT EARLY, MOVE FASTER LATER

Without documented systems, metrics and ratios, everything remains an unknown unknown. Therefore, the first and most critical step is simply to document what exists today, to the best of our ability, and to surface ambiguities rather than hide them.

As technical solutions mature, the urgency of this foundation will only increase. Expectations will quickly shift from “this would be nice to have” to “this should already exist.” Starting early allows organizations to build understanding, ownership, and process maturity. Trying to retrofit structure and documentation later – under pressure and scrutiny – is almost always more painful and far less effective.

Author: Daniel Hedblom, BI Consultant, Random Forest