
Thursday, 9 October 2025

AI is neither the danger nor the solution many think


The biggest danger of AI is not that it will become Skynet and destroy us all. It is that it will make us too lazy to exercise critical thinking when doing research.





There are huge expectations that AIs of various forms will transform business and drive big improvements in productivity. The stock market is currently rewarding firms like Microsoft and Meta (Facebook’s parent) not for huge success in deploying AI but for huge investments in the tech needed to run it. There is a good reason why Nvidia (the firm that makes the key hardware needed to build AI tools) is currently the most valuable firm on the planet by market capitalisation: in the early stages of a gold rush, it is the company selling the spades that makes the money.


Much of the faith in AI is driven by apparently huge achievements in three areas: beating human players at chess and Go; solving the protein folding problem; and generating convincing text in response to queries put to chat engines like ChatGPT or DeepSeek.


But these successes are not as convincing as they appear.


Take DeepMind’s success at building tools that play games like chess or Go. The extrapolation many want to make from this success is that, once set up, the computers seemed to learn how to play the games at an exponential rate. Therefore, the claim goes, if we set up a suitable AI it will rapidly outpace its creators and learn to solve any problem at a similar rate. But, given that we know how the learning algorithm works, this is a false extrapolation.


To cut a long story short, DeepMind’s AI is a pattern recognition engine. Given a suitable training dataset, it was able to see patterns of play that people found hard to see. It found clever and unintuitive ways to play chess and Go that people had not learned, and it learned far faster than any human could. But what made the learning so rapid was not magic. In discrete, finite games with simple fixed rules, the computer could generate a reliable training set of a huge number of complete games, far more than all the games humans have ever played against each other. Because the boards of either game are finite and the rules of play completely unambiguous, the computer could create that enormous set of games extremely rapidly and know for certain which patterns led to success or failure. The key is how quickly reliable training data can be generated. Few real-world problems are like that.
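

To make that concrete, here is a minimal sketch (mine, not DeepMind’s actual pipeline) using tic-tac-toe as a stand-in for chess or Go. Because the rules are fixed and unambiguous, a computer can churn out as many complete, perfectly labelled games as it likes; the function names are invented for illustration.

```python
# Minimal sketch: self-play in a fully specified game yields unlimited,
# perfectly labelled training data. Tic-tac-toe stands in for chess or Go.
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_self_play_game():
    """Play one complete game with random moves; return (positions, result)."""
    board, player, positions = ["."] * 9, "X", []
    while True:
        positions.append("".join(board))
        empty = [i for i, s in enumerate(board) if s == "."]
        if winner(board) or not empty:
            # The rules decide the label; no human annotation is needed.
            return positions, winner(board) or "draw"
        board[random.choice(empty)] = player
        player = "O" if player == "X" else "X"

# Every position in every generated game comes with a guaranteed-correct label.
dataset = [random_self_play_game() for _ in range(100_000)]
print(len(dataset), "labelled games generated")
```

Scale the same idea up to chess or Go, with a far stronger player than random moves, and you have the essence of self-play training data: cheap to generate and guaranteed correct, a luxury almost no real-world problem offers.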


The success of AlphaFold in predicting protein structures from amino-acid sequences looks like a counterargument. AlphaFold has done a better job than several decades of alternative algorithms for predicting protein structures. I don’t question that. AlphaFold’s success is significant enough to deserve a Nobel prize. But it has not, as is often claimed, solved the protein structure problem. It has clearly found common patterns in the training dataset of known protein structures (we have several hundred thousand known structures and sequences, built up over five decades of hard chemical effort since the first X-ray structures of proteins were seen) that evaded previous analysis. But that training set relies on the slow and difficult task of isolating proteins with known sequences, crystallising them and determining their structures using X-ray crystallography (with some help from sophisticated forms of NMR). For proteins similar to known structures, AlphaFold does a good job, but it often stumbles badly when the new protein is too different: it sometimes fails to predict the new structure when the protein sequence is slightly mutated, and it often gives bad predictions when the new protein is very dissimilar to the known structures in the training dataset. The computer can’t do exponential learning because it can’t expand the training data without the slow, hard work of real-world biological chemists finding new structures.


And extrapolating this success to claim that it will revolutionise drug development, as DeepMind founder Demis Hassabis has recently been doing, is jumping the shark. His claim that we might cure all disease or develop new drugs in months rather than years is ludicrous. There is a particularly good takedown of his claim by Science columnist Derek Lowe (who actually works in drug development and rapidly saw through the factual absurdity of the Hassabis claims). The limiting factor for AlphaFold is generating the set of known structures for its training data, and that is slow. And the limiting factor for drug development is not a lack of knowledge of target protein structures. Many other factors matter: identifying which targets matter; designing and synthesising drugs that actually affect the target; testing those drugs in real animals to identify efficacy and side effects; and testing their actual efficacy in people. All of that takes time that is unaffected by knowing the correct structure of a known protein target.


But what about the manifest success of AI chat engines at generating computer code or research results far faster than people can? ChatGPT is amazing!


This is where claims that such tools will rapidly transform or replace many jobs are most worrying. I’m sure there are many jobs that could be replaced by AI. Many UK local newspapers are now owned by Reach plc. Their content is dominated by clickbait headlines designed to attract attention to an overwhelming flood of equally clickbait adverts. The entire operation could probably be run by AI, without using journalists at all, with no diminution of the already abysmal quality. But only because the news and factual content has largely hit rock bottom already and the only performance metric that matters is how many clicks the headlines generate. That many stories are wrong, inaccurate, factually misleading, full of exaggeration or simply made up is pretty much irrelevant. The journalistic ethos has already abandoned any commitment to truth, moral purpose or public good. So replacing those journalists with an AI that has no purpose, focus on truth or moral stance would not make things any worse. By all means replace them.


Sure, many coders now use chat engines to generate code snippets. And these can often produce code much faster than the coders could write it themselves. This is not unexpected given how AIs work: there is a huge volume of code out there to learn from, and AIs can summarise or extract patterns from that huge training set. But is the code always good code? Since one of the major limitations of most AI designs is that they are poor at judgement, this is unclear. Some evaluations have actually suggested that, in aggregate, AIs lower the productivity of programmers (the speed of writing code is not the primary metric that matters; what matters is the speed of writing code that works for its intended use).


A great deal of the time taken to develop software well is spent debugging; more is spent redesigning when users point out it doesn’t quite do what they expected; more still is spent eliminating evils such as major security leaks. A disturbing amount of AI code replicates major security problems: after all, the training set it has learned from is full of leaks and bad practice, and no AI has the built-in judgement to evaluate such things. AIs cannot reliably interpret intent; they are not designed to do so. Though, perhaps, this is also a criticism that can be levelled at many programmers, who design their products to meet narrow technical specifications but ignore the real people who need to use their software. For example, hospital EPRs are notorious for being hugely hostile to the doctors and nurses who are their primary users. No AI will fix that.
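

As a hypothetical illustration of the kind of flaw that saturates public code (and therefore any training set scraped from it), consider a database lookup built by string interpolation, which invites SQL injection, next to the parameterised version that avoids it. The table, column and function names here are invented for the example.

```python
# Hypothetical example: the insecure pattern on top is endemic in public code;
# the parameterised version below it is the safe equivalent.
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # BAD: if username is "x' OR '1'='1", this query returns every row.
    query = f"SELECT id, email FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Better: a parameterised query treats the input purely as data.
    return conn.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchall()
```

A model trained on millions of examples of the first pattern will happily reproduce it, because nothing in the training signal distinguishes insecure code that runs from secure code that runs.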


My own narrow experiments using chat engines to solve simple coding problems often yielded useful results quickly. But my hit rate of code that worked was only about 50%.


And when it comes to using AI to search for useful results, I have found that what typical tools generate is useful but also very unreliable. Chat engines like ChatGPT or DeepSeek are in many ways a better search tool than a simple Google search. But the results, in my experience, almost always contain hallucinations. When writing a column recommending some key books for healthcare managers, I asked a question something like “tell me the top 10 books on health economics”. In the list of ten, two were entirely fabricated (with plausible authors, titles and cover art). More recently, when I asked for academic references that had evaluated the lives saved by the London Major Trauma system (which I was involved in developing and had kept an eye on over the years), the top two references (both presented alongside hyperlinks supposedly linking directly to the publications) were entirely fake (the hyperlinks were to real but unrelated papers). Google searches for the dates, authors or journals did not yield relevant papers. In this case, using AI cost me more time checking the results than I would have spent had I not used AI in the first place.


The ability of AIs to generate plausible text looks magical. But that text is untethered to any judgement about the quality or truth of the content. ChatGPT, DeepSeek and others have been trained to be bullshit generators (in the sense used by philosopher Harry Frankfurt: bullshit is content entirely indifferent to the distinction between truth and falsehood). The ability to generate plausible pictures also seems magical. But many of those pictures are now polluting the internet with fake imagery (some historians are very worried about the proliferation of fake history backed by plausible-looking images that turn out to be AI generated). There is a huge risk that this is a doom loop for reliable facts.


Given the way AI is currently built, there is simply no way it can reliably solve difficult real-world problems: it doesn’t have a reliable training dataset to learn from. The upside is that there is no possibility of AI turning into Skynet and destroying us all. Creating an apocalypse requires reliable knowledge of how the world works, which AIs are ill equipped to acquire.


The real problem is entirely different. And it is a problem shared with many previous complex computer systems: people tend to believe the results the computer generates even when the results are wrong. The UK prosecuted many of the managers of local post offices for financial fraud on the basis of a big accounting system that contained many serious flaws. It took 20 years to start to fix this problem, described by the PM at the time as one of the biggest miscarriages of justice in the history of the UK. Trusting what the computer said, despite evidence it was wrong, was a major contributor to the catastrophe. But the system was not so opaque that the flaws could not, eventually, be uncovered. Had the system been an AI, this might never have been possible, as one of the characteristics of most AIs is a fundamental lack of transparency about how they derive their specific outputs. And AIs are very good at generating plausible outputs even when they are provably wrong.


In short, it is the plausibility of AI output that is the real danger. When AIs have no ability to test the truth or falsehood of their outputs, that plausibility misleads. But this is as much a people problem as an AI problem. If AIs erode our sense of the difference between truth and falsehood, or diminish our scepticism, then we are in trouble.



 
