From The Mandarin: Santow tips the bucket on AI slop
In the Sir Vincent Fairfax Oration, a landmark speech delivered in Sydney on Thursday, former human rights commissioner, and now sought-after ethical adviser and academic, Ed Santow issued a serious wake-up call to assorted artificial intelligence cheer squad leaders and positivity meme flunkies.
Santow is positive about AI but also highly aware of its impact on societal functions, governance, and culture.
In a tightly woven speech that planted a deep stake in the necessity of retaining knowledge and memory, Santow argued that “history matters on its own terms”, and that its interpretation also powers the next version of what we know, as language models dip into the well.
“As AI disrupts our economy, politics, society and environment, I will make three arguments today:
AI might seem like it comes from the future, but it learns from the past, and so it also anchors us to that past.
Our history — or rather our choices about the versions of history that are recorded and remembered — influences how AI takes shape.
It is not enough that we expose AI systems to a ‘more accurate’ view of history; we must also draw the right lessons from history if we are to avoid repeating the mistakes and injustices of the past,” Santow said.
Exposure of AI to better feedstock is a difficult topic because, in large part, it assumes that the quality of inputs will self-correct problematic outputs. Yeah nah.
“Throughout history, we have built machines that are born like Venus — fully formed. When a car rolls off the production line, all it needs is a twist of a key or the press of a button, and it will work as intended. This is not true of AI,” Santow argued.
“AI systems start as ignorant as a newborn — perhaps even more so. A baby will search for its mother’s breast even before the baby can see. An AI system possesses none of a baby’s genetic instincts. Nothing can be assumed. All knowledge must be learned. The process of teaching an AI system — known as ‘machine learning’ — involves exposing the machine to our world.”
There’s a further problem, too, and it’s a systemic one. As internet pioneers like Vint Cerf noted, the great tech behemoth has trouble retaining both memory and history.
“The regime that should be in place [is] one in which old software is preserved; hardware can be emulated in the files so we can run old operating systems and old software so we can actually do something with the digital objects that have been captured and stored,” Cerf said in 2018.
“Think of all the papers we read now, especially academic papers that have URL references. Think about what happens 10, 20, 50 years from now when those don’t resolve anymore because the domain names were abandoned or someone forgot to pay the rent.”
That’s now happening.
But the warnings are at least a decade old.
I am wary of the about-face in my thinking on Large Language Models. Right through my time in lit academia, I was unusually positive about LLMs and their uses in my field. I do not have the skillset, for instance, to work with or for Digipal, but I find their stuff REALLY COOL. It was something of a frustration to my mentors (and me, tbh) that the kind of literary scholarship I wanted to do just... didn't call for these kinds of digital tools. Even in the literary composition realm - while I encountered some truly un-informed uses of LLMs* - I was significantly more willing than most literature scholars to believe that LLM linguistics could make findings as to authorship, at least on a "more likely than not" level.
In part, that is because in first-year English I was assigned some readings (in a sub-unit module on functional linguistics for literary studies) which looked at how forensic linguistics, focused not only on easily-identifiable dialect words but on patterns of "filler" words and sentence structure, had demonstrated throughout the 90s that Australian police were influencing interview records, particularly from Indigenous subjects, in ways which ranged from outright fabrication to shaping/skewing interview reports.** The case made by pragmatics is that individual speakers' uses of function words, sentence structure, etc., are shaped by context (e.g. are you or are you not a policeman), but can also, with sufficient corpus, be distinguished among individuals.

I don't really see any reason to suppose that Billy Shakes is any more unique than the wrongfully convicted Mr Kelvin Condren, or that imitators of/collaborators with Billy Shakes would be less detectable to an algorithm than false police reports. Oh, there are other factors - can't use punctuation for early modern texts, because the printers did that part; medieval texts have layers of author, scribe, oral retellings and subsequent copyings, etc. I've never yet encountered such an identification that I'd hang my hat on as absolutely conclusive out of nowhere, but such studies never come out of nowhere and texts always have some context you can look at. Likely enough to work with? Sure.
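To make the function-word idea concrete: the core move is just comparing relative frequencies of common "filler" words across texts and measuring how far apart the profiles sit. The sketch below is a deliberately toy version of that (a crude cousin of Burrows' Delta, not the actual methods used in the Condren literature); the texts, the word list, and the distance measure are all invented for illustration.

```python
# Toy stylometry: compare relative frequencies of function words.
# Texts and word list are invented placeholders, not real case data.
from collections import Counter

FUNCTION_WORDS = ["the", "and", "of", "to", "that", "but", "well", "so"]

def function_word_profile(text):
    """Relative frequency of each function word in a text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def profile_distance(a, b):
    """Mean absolute difference between two profiles (a crude 'Delta')."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical: two statements from the same speaker, plus a "record of
# interview" written in a very different (police-report) register.
statement_a = "well so I said to him that the dog was gone and that was that"
statement_b = "well I told him so and the dog was gone but that was that"
disputed = "the suspect did then proceed to the premises and did observe the vehicle"

pa, pb, pd = map(function_word_profile, [statement_a, statement_b, disputed])
print(profile_distance(pa, pb))  # same speaker: small distance
print(profile_distance(pa, pd))  # police-register text: larger distance
```

Real attribution work needs a much bigger corpus and word list, of course - the point is only that the signal lives in boring little words, not in distinctive vocabulary.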
I am very wary, therefore, of my current tendency to reskeet dunkings upon AI, sweeping statements about the "word association machine", etc. There are, in addition to fascinating historical uses of LLMs, very important practical ones! I would like to see those continue and be improved upon!***
I don't think I'm wrong that generative LLMs are producing "slop" at the moment - that much is pretty clear. But I am concerned that I'm plugged into a social media feed of academics and wonks who not only see all the current problems but also seem to be unaware of, or to be walking back, the promising uses previously attested. So. I am not recirculating nearly as much as I read, and I am trying to weight my reading via sources like The Mandarin, rather than via Academics Despairing or other versions of the BlueSky Hot Take mill.
The article above says that Santow is "positive about AI". I rather wish it had covered what Santow is positive about, because from what they've quoted from him as to the things to be wary of, he seems to have a nuanced grip on things.
* A stand-out was a linguist using the out-of-copyright editions in the Corpus of Middle English Prose and Verse, apparently unaware how much editorial shaping went into them, or that they are not at all up to date, or, upon quizzing by one of my colleagues, that the poetic texts might predate the manuscripts and differ significantly from spoken English at the time of the manuscript composition, while also not reflecting spoken English of the putative poem composition date.
** I don't have my 2005 syllabi to hand anymore, more fool me. I do not think that the article we were given was Diana Eades, "The case for Condren: Aboriginal English, pragmatics and the law", Journal of Pragmatics 20.2 (1993) 141-162, but it definitely cited that article and Condren's case. Condren is a QLD case and I think the article I read was about a cohort of WA police transcripts - but the article I just cited is useful in that it has a good-enough overview in the unpaywalled abstract to illustrate my point.
*** For instance, PHREDSS, the system which monitors presentations to NSW emergency departments and produces a read-out with alerts of Public Health Interest, is an LLM. You can find a fairly readable evaluation of its use in regional NSW in relation to large gatherings and public health disaster response on the Department of Health and Ageing's website. What I know from my Sources in stats is that the surveillance model is designed specifically for how emergency departments use language and record presentations, and then even the simplest-seeming uses for public health are looked at by experts in both this kind of stats and epidemiology.
The example I was given by my Sources was "pneumonia": in 2020, every day our good friend PHREDSS delivered unto the NSW government its ED data, tagged by presenting condition and location. Pneumonia was a leading indicator for COVID-19 at the time. However, someone has to check and weed out the "person didn't actually drown but they got water on the lungs" kind of pneumonia. (Given what I now know about the frequency of aspiration risks in the elderly and people with chronic illnesses, it's not going to be the surfing accidents that are the main reason you need a human to look at it: it's that if you get a statistical spike in pneumonia admissions from aged care homes in X region, you could be looking at a viral outbreak or you could be looking at some systemic failure of care leading to a whole bunch of elderly people aspirating and it not being addressed appropriately, leading to pneumonia.)
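The "spike in pneumonia admissions" step above is, at its simplest, an aberration-detection problem: compare today's count against a recent baseline and flag the outliers. Here's a minimal sketch of that kind of check, loosely in the spirit of the CDC's EARS C2 algorithm. It is emphatically not how PHREDSS actually works (that's not public in this post, and real systems layer expert review on top, as described above); the daily counts are invented.

```python
# Generic syndromic-surveillance spike check on daily counts.
# NOT the actual PHREDSS method; counts below are invented.
from statistics import mean, stdev

def spike_days(counts, baseline=7, lag=2, threshold=3.0):
    """Flag days whose count exceeds baseline mean + threshold * SD.

    baseline: number of prior days used as the reference window
    lag: days skipped between the baseline window and the day tested
    """
    flagged = []
    for day in range(baseline + lag, len(counts)):
        window = counts[day - lag - baseline : day - lag]
        mu, sd = mean(window), stdev(window)
        # floor the SD so a flat baseline doesn't make the test hair-trigger
        if counts[day] > mu + threshold * max(sd, 0.5):
            flagged.append(day)
    return flagged

# Hypothetical daily pneumonia presentations from one region's EDs,
# with a jump in the last few days:
daily = [4, 5, 3, 6, 4, 5, 4, 5, 6, 4, 5, 14, 16, 15]
print(spike_days(daily))  # -> [11, 12, 13]
```

The flag is only the start, of course: as the aged-care example shows, a human still has to work out whether days 11-13 are a viral outbreak, a care failure, or a coding artefact.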
This 2015 article looks at the ED-side data capture problems relating to "alcohol syndrome", and whether such data has "positive predictive" value for public health, if this sort of thing tickles your brain.