Why LLMs Are Not Always the Answer
By Mauricio Mathey Garcia-Rada (MBA/MSDS ’24), Lead Data Scientist, ASARCO
The headlines are everywhere. “AI Will Transform Your Business.” “Every Company Needs a Large Language Model (LLM) Strategy.” “The Future Is Generative AI.”
I am a lead data scientist in industrial operations, and here is what might surprise you: my team does not use LLMs for most of the problems we solve. That is not because we are behind the curve, but because we understand something fundamental that often gets lost in the hype. It is not about the model. It is about the problem.
Let me be clear. LLMs are remarkable tools. But trying to use them for predictive analytics in industrial operations is like trying to use a screwdriver to hammer a nail. They are different tools, designed for very different jobs.
What LLMs Actually Do Well
LLMs excel at generating text. They are trained on massive text datasets to predict the next token in a sequence. If I write “The cat sat on the ___,” the model estimates the probability of each candidate for what comes next, selects one, and then repeats the process token by token.
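To make that loop concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the tiny vocabulary, the `NEXT_TOKEN_PROBS` table, and the `generate` function are made up for illustration. A real LLM computes these probabilities with a neural network over the entire preceding context, not by looking up the last word in a table.

```python
import random

# Hypothetical toy "model": probability of the next word given only the previous word.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.5, "mat": 0.4, "moon": 0.1},
    "cat": {"sat": 0.9, "the": 0.1},
    "sat": {"on": 1.0},
    "on": {"the": 1.0},
    "mat": {"<end>": 1.0},
    "moon": {"<end>": 1.0},
}

def generate(prompt, max_new_tokens=10, seed=0):
    """Extend the prompt one token at a time by sampling from the toy table."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = NEXT_TOKEN_PROBS[tokens[-1]]       # estimate probabilities
        words = list(dist)
        weights = [dist[w] for w in words]
        next_token = rng.choices(words, weights=weights, k=1)[0]  # select one
        if next_token == "<end>":
            break
        tokens.append(next_token)                 # commit, then repeat
    return tokens

print(" ".join(generate(["the", "cat", "sat", "on", "the"])))
```

Note that the loop only ever appends. In this sketch, once "moon" has been sampled there is no step that goes back and reconsiders it.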
This probabilistic, autoregressive structure is a major reason hallucinations occur. Once the model commits to a low-probability path, it has no built-in mechanism to verify or correct itself, and every later token is conditioned on that choice. Yet the same structure is exactly what makes LLMs powerful for writing, summarization, and explanation.
It is also why they tend to be a poor primary tool for many industrial analytics problems. Predicting absenteeism, forecasting equipment downtime, anticipating failures, or optimizing production schedules are not text generation problems. They are decision problems rooted in structure, causality, and time-dependent dynamics.
For instance, tree-based models like gradient boosting are designed to work well for structured tabular data. They naturally handle mixed variable types and complex interactions. LLMs are designed for sequential language modeling. Using an LLM for tabular prediction means fighting against its architecture rather than working with it.
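For readers who want to see the idea behind gradient boosting, here is a deliberately tiny sketch using only the Python standard library. The data (`ROWS`, `TARGETS`) and all function names are invented for illustration. Each round fits a one-split tree (a "stump") to the errors left by the previous rounds. Stumps alone produce an additive model and do not capture interactions; production libraries such as XGBoost grow deeper trees and add many refinements, so treat this as a teaching aid, not a recommended implementation.

```python
# Hypothetical tabular data: [load_percent, is_night_shift] -> downtime minutes.
ROWS = [[40, 0], [40, 1], [60, 0], [60, 1], [80, 0], [80, 1], [95, 0], [95, 1]]
TARGETS = [5, 6, 6, 7, 10, 25, 12, 40]

def fit_stump(rows, residuals):
    """Return (feature, threshold, left_value, right_value) for the best single split."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [res for row, res in zip(rows, residuals) if row[j] <= threshold]
            right = [res for row, res in zip(rows, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return None if best is None else best[1:]

def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value

def fit_boosting(rows, targets, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the remaining errors."""
    base = sum(targets) / len(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, r)
                 for p, r in zip(preds, rows)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

model = fit_boosting(ROWS, TARGETS)
baseline = [sum(TARGETS) / len(TARGETS)] * len(TARGETS)
print("baseline MSE:", mse(baseline, TARGETS))
print("boosted MSE: ", mse([predict(model, r) for r in ROWS], TARGETS))
```

The numeric column and the binary shift flag go into the same split search without any special encoding, which is part of why tree-based methods are convenient for mixed tabular data.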
The issue is not that LLMs make errors; all models do. The difference is what they were trained for. LLMs are optimized for general text prediction. Applying them to specific problems is like using Wikipedia to diagnose a problem with your car when you need the technical manual. They generate confident-sounding outputs, but those outputs aren’t calibrated to your actual data or validated against your real outcomes. Even in writing and coding, I routinely see subtle errors, such as code that almost works or explanations that miss key constraints.
Analytics Starts with the Decision, Not the Tool
Before talking about models, it is essential to talk about problem framing. In analytics, there are two broad categories of problems.
Supervised learning is used when there is a known outcome to predict, such as price, demand, or component failure. This includes regression and classification. Unsupervised learning is used when no true label exists and the goal is to discover structure, such as detecting anomalies or grouping similar behaviors.
Algorithms are like recipes. They are instructions for accomplishing a specific task. The key is not the recipe itself, but whether it matches the dish you are trying to cook.
In practice, this means starting with the decision. What action will be taken based on the analysis? What tradeoff is being managed? What does success look like? Only then does model selection make sense.
A Concrete Example from Operations
Consider maintenance in an industrial setting. A department is struggling with unexpected downtime from a specific piece of equipment such as an electric motor.
If the goal is early warning, unsupervised learning may be appropriate. Anomaly detection models can identify deviations in sensor behavior that signal abnormal operation before a failure occurs.
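As one illustration of what early warning without labels can look like, here is a minimal rolling z-score sketch. It is only one simple technique among many and is not necessarily what any given team uses. The `vibration` readings, the window size, and the threshold are all hypothetical values chosen for the example.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, z_threshold=3.0):
    """Return indices whose reading deviates strongly from the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: the z-score is undefined, so skip
        z = (readings[i] - mu) / sigma
        if abs(z) > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical motor vibration readings (mm/s), with one spike.
vibration = [2.1, 2.0, 2.2, 2.1, 1.9, 2.0, 2.1, 2.2, 4.8, 2.0, 2.1]
print(flag_anomalies(vibration))
```

On these made-up readings the function flags only index 8, the 4.8 mm/s spike. No failure labels were needed; the model of "normal" comes entirely from recent sensor history.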
If the goal is to define a maintenance policy, the problem changes. Now supervised learning may be needed to estimate component lifetimes and failure probabilities. The decision balances the cost of proactive replacement against the risk and impact of catastrophic failure.
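The tradeoff in that decision can be written down in a few lines. The sketch below is a single-period simplification with hypothetical costs: it assumes a supervised model has already produced a failure probability for the coming period, and it ignores effects such as the remaining useful life lost when a healthy part is replaced early.

```python
def expected_failure_cost(p_failure, cost_failure):
    """Expected cost of running the component for one more period."""
    return p_failure * cost_failure

def should_replace(p_failure, cost_proactive, cost_failure):
    """Replace now if the expected cost of waiting exceeds the cost of acting."""
    return expected_failure_cost(p_failure, cost_failure) > cost_proactive

def break_even_probability(cost_proactive, cost_failure):
    """Failure probability at which both options cost the same in expectation."""
    return cost_proactive / cost_failure

# Hypothetical figures: planned motor swap vs. unplanned failure with lost production.
COST_PROACTIVE = 8_000
COST_FAILURE = 100_000

for p in (0.05, 0.12):
    print(p, should_replace(p, COST_PROACTIVE, COST_FAILURE))
print("break-even:", break_even_probability(COST_PROACTIVE, COST_FAILURE))
```

With these illustrative numbers the break-even point is a failure probability of 8 percent: below it, waiting is cheaper in expectation; above it, proactive replacement is. The model's job is to supply a trustworthy probability, and the decision rule does the rest.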
Same department. Same general topic. Completely different analytical approaches, driven by the decisions being made.
This is where many organizations go wrong. They start with the tool and then search for a problem that fits it.
Why “Just Tuning the LLM” Does Not Solve This
When teams raise concerns about fit, the response is often, “Can’t we just tune the LLM?” There are three versions of that suggestion.
First is parameter tuning. Adjusting temperature or sampling settings changes how varied or how deterministic a model's responses are, not what it knows. It does nothing to bridge the gap between generic language patterns and highly specific operational behavior.
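A short sketch shows why. Temperature is commonly applied by dividing the model's raw scores (logits) by a constant before converting them to probabilities. The logits below are hypothetical, and `softmax_with_temperature` is an illustrative helper, not any particular library's API.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words.
words = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {w: round(p, 3) for w, p in zip(words, probs)})
```

Lower temperatures concentrate probability on the top candidate and higher temperatures spread it out, but the ranking of candidates never changes. The scores themselves, which encode what the model learned, are untouched.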
Second is fine-tuning on company data. True fine-tuning does modify model weights, but it does not change the model’s fundamental architecture. An LLM remains a probabilistic text generator. Fine-tuning can improve terminology or narrow behavior, but it does not turn the model into a causal system or one that understands physical processes, constraints, or temporal dynamics.
Third is retrieval-augmented generation (RAG), which is useful when the task is to locate and explain existing information. However, in complex operational environments, information is fragmented across tens of thousands of documents, logs, and records. A recent Stanford study shows that as document collections grow beyond roughly 10,000 documents, RAG performance degrades sharply and can fall below simple keyword search. More retrieval does not mean better answers. It often means more noise. Most importantly, retrieval does not create understanding. It only surfaces text. Our hardest problems are not about finding information. They are about modeling systems and making decisions under uncertainty.
This is not a rejection of LLMs. It is a recognition of fit. We use LLMs where language is the problem. We do not use them where structure, causality, and domain-specific dynamics dominate.
Complexity Is a Cost, Not a Virtue
LLMs belong to a class of algorithms known as generative models. They contain billions of parameters and are designed to learn from massive amounts of unstructured data like text, images, and audio. That scale of complexity has a purpose, but only when the problem demands it.
Consider a simple analogy. If you are deciding whether to buy a chocolate bar, would you hire a financial analyst to build a complex financial model projecting costs and benefits over the next decade? Of course not. That level of complexity would be absurd. But if you are evaluating a major capital investment, the same model becomes appropriate.
Now imagine hosting friends for dinner and deciding what to cook. Hiring a financial analyst to project costs and benefits would feel nonsensical because it is the wrong tool for the decision. You would more likely be thinking about allergies, dietary restrictions, and what kind of food your friends enjoy.
Many industrial datasets are highly structured. Think spreadsheets with timestamps, sensor readings, production volumes, and operating states. For these problems, simpler models such as XGBoost or random forests, to name just two, are often not only sufficient but superior. They are easier to validate, easier to explain, and easier to maintain.
The Broader Lesson
This pattern extends beyond LLMs. Consider the current excitement around agentic AI. In many cases, the real value does not come from the AI itself but from automation. Remove the AI component, implement solid robotic process automation, and the result is often cheaper, more stable, and easier to govern.
LLMs are one tool in a larger toolbox. A powerful one for certain tasks, but just one among many. Problems arise when they become the default answer.
Here is a useful warning sign: if the first solution proposed to a business problem is “LLMs,” push back. Ask why. Ask what simpler alternatives were considered. Often, the issue is not ambition but a limited toolkit.
Current research on LLMs is exciting and necessary, and exploring these models helps us understand what is possible. But experimentation alone does not solve business problems. Experiments help us learn, but real value comes from applying the right methods to the right decisions within the context of actionable work.
In business, that means starting with the decision, framing the problem clearly, and selecting the simplest tool that reliably supports action. Different problems require different analytical approaches, and success comes from having the discipline to select the tools that fit the decision, not the ones that are most fashionable or hyped. That is not being behind the curve. That is being ahead of it.


The Batten Institute @ UVA Darden Substack is a space to share bold ideas and ask critical questions about what drives us forward. Our Batten community is broad and deep and our community members – faculty, staff, alumni and students – are invited to share their stories, ideas, and reflections on topics of interest to founders, innovators, and leaders everywhere.



