Posted on

AI – Hype or actually useful?

Thoughts of the Fractional Chief

TL;DR

Primarily from a tech perspective, there is a lot of talk about AI today,
and much of that talk is exaggerated hype, overconfidence and
sometimes outright false claims, especially by some companies
that claim AI can generate 100% error-free code and outperform
regular devs by 1000x.

Let’s be a bit more realistic about the expectations and possibilities.

What is an LLM? 

It’s in the name – LLM stands for Large Language Model.

An LLM is software trained on large amounts of human language and related content that, at its core, predicts the likely next token(s) (words or pieces of words) given the prompt and the context.
In most real deployments it is probabilistic: if you allow sampling, two identical questions can produce different answers, and if you force deterministic decoding, you get consistent outputs – but consistency is no guarantee of correctness.
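The difference between sampled and deterministic decoding can be sketched in a few lines of Python. The three-word vocabulary and logits below are made up purely for illustration – a real model works over tens of thousands of tokens:

```python
import math
import random

# Toy "next token" distribution, a stand-in for a real model's output.
vocab = ["cat", "dog", "car"]
logits = [2.0, 1.5, 0.3]

def softmax(xs, temperature=1.0):
    # Convert raw logits into probabilities; higher temperature flattens them.
    exps = [math.exp(x / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    # Deterministic decoding: always pick the single most likely token.
    return vocab[max(range(len(logits)), key=lambda i: logits[i])]

def sample(logits, temperature=1.0):
    # Probabilistic decoding: draw a token according to its probability.
    probs = softmax(logits, temperature)
    return random.choices(vocab, weights=probs, k=1)[0]

print(greedy(logits))                         # the same token every time
print({sample(logits) for _ in range(50)})    # usually several distinct tokens
```

Greedy decoding always returns the same answer for the same prompt; sampling does not – which is exactly why two identical questions to a chatbot can come back different.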

LLMs learn statistical patterns from what they were trained on: questions and answers from the internet, books, documentation, scientific writing, and code.
They can produce novel combinations of ideas by generalizing across those patterns, and while this is very useful, it also means they can generate convincing text that is wrong.

Many AI models also use something called quantization. What follows is a simplification, and quantization is not the only cause of mistakes, but it can be a contributing factor.

Quantization reduces numeric precision inside the model (for example in the weights and sometimes activations) to make the model smaller and faster to run.

Conceptually it is like rounding: 1.0001 and 1.0002 may both become 1.0 after rounding.
That kind of approximation can increase error rates, especially on harder reasoning tasks or long chains of inference, because small inaccuracies can (and will) compound as the reasoning progresses. Quantization is only one of several precision-limiting factors with similar effects; see the list at the end of the article.
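To make the rounding analogy concrete, here is a deliberately oversimplified sketch – this is not how model quantization is actually implemented, but it shows how a tiny loss of precision compounds over many steps:

```python
# "Quantize" by rounding to 2 decimal places, then watch the error compound.
full = 1.0001
quantized = round(full, 2)   # 1.0 -- the small difference is lost

x_full, x_quant = 1.0, 1.0
for _ in range(1000):
    x_full *= full           # keeps the tiny factor every step
    x_quant *= quantized     # stays at 1.0 forever

print(x_full)    # ~1.105 -- the tiny factor compounded over 1000 steps
print(x_quant)   # 1.0
print(abs(x_full - x_quant))
```

A per-step error that looks negligible in isolation becomes a visible drift after enough steps – the same intuition applies to long chains of model inference.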

That said, hallucinations are not caused only by quantization – they can occur even without it, because the model is optimized to produce plausible text, not guaranteed truth. All LLMs are, in the end, statistical models.

Please do not get me wrong – I’m all for the use of AI, but it has to be informed and responsible use. The notion that AI is always right and error-free, or that it can be left to do anything without proper vetting by humans, is plainly dangerous – and a seemingly all too common misconception out in the wild.

Knowing how to use it, and using it responsibly, can bring tremendous benefits to pretty much any organization – the keywords being “how to” and “responsibly”.

Bottom line:
There is no LLM today whose output you should treat as guaranteed correct – you still need to apply common sense, knowledge, and verification to the response.

So, security and autonomous agents?

Security and autonomous agents should not be treated lightly.
If you combine high-privilege access with an agent that can produce “plausible” but wrong decisions, you are taking on a very real, very high risk.

There are already examples of organizations learning this the hard way.
Reporting in February 2026 described AWS incidents in which an internal AI coding agent (Kiro) was involved in a 13-hour outage in December 2025 after deleting and recreating an environment, with Amazon attributing the root cause to misconfigured permissions and process failures rather than “the AI” alone.
Ref: Amazon’s AI deleted production. Then Amazon blamed the humans.

The lesson is simple: do not let an AI control production or security-critical systems without strict constraints and prior human approvals, combined with auditing.
If you use AI for anything that can affect production, vet it first.
If all it does is produce text, the risk is much lower, but if it can take (unmonitored) actions, the bar must be set much higher.

So then, what is it good for?  

Now you may think, so what can I use it for then? 

Now that you understand what it is, the practical uses become obvious – treat it like a smart collaborator you can use to offload lower-level tasks:

  • writing a function or two
  • scaffolding
  • generating test modules
  • drafting documentation
  • summarizing tickets, logs, or discussions
  • trying out concepts and solutions
  • hunting down stubborn, hard-to-find bugs, given good input

You still review and own the output, but you do not have to type everything.

It is a power tool. You can talk to it, reason with it, and use it to explore ideas.
You can use it to analyze how text is likely to be perceived by others.
You can use it to help hunt bugs and narrow down hypotheses if you provide solid context – it can save hours, or even days.

Fast research of complex questions can also work well: it can combine and condense large sets of information into something usable.
You just have to verify anything important before you use it.

The key to success is: bring a good specification. 
Be very precise about your functional requirements and the data models, with examples of the data and formats, and you will have a higher rate of success.
Also be equally precise about the do’s and don’ts, as this sets the boundaries the AI works within while creating what you want.

In short, assume the worst and write a clear specification of what you do and don’t want, limits and all, as if the actor were an inexperienced human doing the task for the first time.
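As a sketch of what such a specification might look like in practice, here is a hypothetical prompt – the task, field names, and formats are invented for illustration – together with the kind of answer you would expect back and then review:

```python
# A hypothetical prompt illustrating "bring a good specification":
# explicit requirements, a data example, and explicit do's and don'ts.
spec_prompt = """
Task: write a Python function parse_order(line: str) -> dict.

Functional requirements (must):
- Input format: "<order_id>;<quantity>;<unit_price>", e.g. "A123;2;9.95"
- Return {"order_id": str, "quantity": int, "unit_price": float}
- Raise ValueError on malformed input.

Constraints (must not):
- Must not use third-party libraries.
- Must not silently guess missing fields.
"""

def parse_order(line: str) -> dict:
    # What a reviewed answer to the spec above might look like.
    parts = line.split(";")
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}")
    order_id, quantity, unit_price = parts
    return {
        "order_id": order_id,
        "quantity": int(quantity),
        "unit_price": float(unit_price),
    }

print(parse_order("A123;2;9.95"))
```

Note how the must/must-not sections leave the model very little room to improvise – which is the point.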

Ok, so, what are the practical limitations? 

… you ask. 

Because models can hallucinate and make mistakes, do not blindly copy-paste and run generated code, or other output, as if it is perfect.
Use it safely:

  • if the output is trivial, you may be able to validate it at a glance
  • otherwise, review it, test it, and treat it like a draft from a junior engineer or colleague. 

Also, the old adage still applies: garbage in, garbage out.
Vague prompts produce vague results, and this is true for humans too.
“I am hungry, bring me something to eat” can easily result in your least liked dish.
If you want useful output, give clear requirements and constraints.

Do not assume it will “just know” what you meant.

Architecture, mostly

For higher level architecture, be cautious.
Context windows are limited, and long projects exceed what the model can consistently hold in working memory.
That often leads to linear solutions, duplicated logic, and “works in isolation” code that becomes spaghetti when assembled into an application.

You can partially mitigate this, but it takes discipline:

  • provide tight specifications
  • use simple, explicit constraints like “must”, “must not”, and “shall”, and keep the samples simple and short
  • force modular boundaries and interfaces
  • provide examples of required data models where possible
  • require tests and acceptance criteria
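One way to force modular boundaries in practice is to hand the model a fixed interface and require that any generated code implements it. A minimal Python sketch, with invented names:

```python
from typing import Protocol

class TicketSummarizer(Protocol):
    # The interface you give the AI up front; generated code must match it.
    def summarize(self, ticket_text: str) -> str:
        """Return a short summary of the ticket."""
        ...

class NaiveSummarizer:
    # A trivial stand-in implementation that satisfies the interface;
    # an AI-generated version must slot in behind the same method.
    def summarize(self, ticket_text: str) -> str:
        first_line = ticket_text.strip().splitlines()[0]
        return first_line[:120]

def render_report(summarizer: TicketSummarizer, tickets: list[str]) -> list[str]:
    # Calling code depends only on the interface, never the implementation,
    # so independently generated pieces assemble cleanly later.
    return [summarizer.summarize(t) for t in tickets]

print(render_report(NaiveSummarizer(), ["Login fails on Safari\nSteps: ..."]))
```

Because the boundary is pinned down before any code is generated, “works in isolation” pieces have a much better chance of also working when assembled.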

Agentic AI

Keep the rules simple: 

  • Never let agents freely control everything and make autonomous decisions on your behalf.
  • Be explicit about
    • what they can do,
    • what they cannot do,
    • and what requires a human approval step.

If in doubt, hard-limit the last two items by not giving the agent the interface to act at all: no access to credentials, no chatbot groups for interchange (risk of credential/data leakage), and no direct execution control.
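These rules can be sketched as an explicit allow/deny/approve gate around the agent. The action names and the approval hook below are invented for illustration:

```python
# What the agent may do freely, and what needs a human sign-off.
ALLOWED = {"read_logs", "summarize_ticket"}
NEEDS_APPROVAL = {"restart_service"}
# Everything else (deploys, credential access, ...) is simply refused:
# the agent never gets an interface for it.

def run_action(action: str, approver=None) -> str:
    if action in ALLOWED:
        return f"executed: {action}"
    if action in NEEDS_APPROVAL:
        # approver is a human-in-the-loop callback; no approver, no action.
        if approver is not None and approver(action):
            return f"executed with approval: {action}"
        return f"blocked, awaiting human approval: {action}"
    return f"refused: {action} is not in the agent's interface"

print(run_action("read_logs"))
print(run_action("restart_service"))         # blocked without a human
print(run_action("delete_environment"))      # refused outright
```

The key property is that the default is refusal: anything not explicitly allowed or routed through approval simply cannot happen.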

Additional reasons for hallucinations and limitations of AI models

As with the decimal quantization example earlier, these are limitations of precision, or similar constraints, that add to the reasons models hallucinate.
They are similar in principle to quantization: limited precision in the sources or even removal of sources, conflicting data, oversimplification, short attention spans (forgetfulness), and so on, as in the examples below:

  • Objective mismatch: plausible vs true
    LLMs are trained to predict the next token that fits the context, not to prove statements are true.
    When the model is uncertain, it will often still produce something that sounds coherent.

  • Missing or ambiguous context
    If the prompt does not include key facts, constraints, or definitions, the model fills gaps with the most likely completion.
    That is where confident wrong details show up.

  • No built-in source of truth
    Unless the system is explicitly connected to retrieval (docs, database, search) and forced to use it, the model is relying on its internal patterns, which are not a reliable fact store.

  • Training data gaps and conflicts
    Training data contains errors, outdated info, and contradictions.
    The model can blend incompatible sources or pick a common but wrong pattern.

  • Overgeneralization from patterns
    LLMs are good at analogies. Sometimes they apply a right-shaped pattern to the wrong situation and generate an answer that looks structurally correct but is factually incorrect.

  • Decoding settings and randomness
    Sampling settings (temperature, top-p, etc.) increase diversity but also increase risk of inventing details.
    Deterministic decoding reduces variance but can still be wrong, just consistently wrong.

  • Long context and attention limits
    With long prompts or multi-step tasks, important details can be missed, diluted, or overshadowed by more recent or more frequent cues. Errors then cascade.

  • Weak grounding in math and multi-step verification
    They can produce fluent reasoning text without actually checking intermediate steps.
    If a system does not force external checks (calculators, unit tests, compilers), they will sometimes “talk through” mistakes.

  • Quantization and smaller models (contributing factor)
    Lower precision and smaller models can degrade accuracy and increase error rates, but they are not the root cause.
    Full precision large models hallucinate too.