AI-generated content in Wikipedia - a tale of caution

Day 1 23:55 Ground en Ethics, Society & Politics

Dec. 27, 2025 23:55-00:35

I successfully failed with a literature-related project and accidentally built a ChatGPT detector. Then I spoke to the people who uploaded ChatGPT-generated content to Wikipedia.

It began as a standard maintenance project: I wanted to write a tool to find and fix broken ISBN references in Wikipedia. Because every ISBN carries a built-in checksum, this seemed like a straightforward technical task. I expected to find mostly typos. But I also found texts generated by LLMs. These models are effective at creating plausible-sounding content, but (for now) they often fail to generate correct checksums for identifiers like ISBNs. This weakness turned my tool into an unintentional detector for this type of content.

This talk is the story of that investigation. I'll show how the tool works and how it identifies this anti-knowledge. But the tech is only half the story. The other half is human. I contacted the editors who had added this undeclared AI content. I will talk about why they did it, how the Wikipedians reacted, and whether "The End is Nigh" calls might be warranted.
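The checksum test behind the ISBN idea can be sketched in a few lines of Python. This is not the speaker's tool, only a minimal illustration of the standard ISBN-10 and ISBN-13 check-digit rules; the function names (`normalize`, `is_valid_isbn10`, `is_valid_isbn13`, `is_valid_isbn`) are hypothetical and chosen here for readability.

```python
import re


def normalize(raw: str) -> str:
    """Drop hyphens and whitespace and uppercase a trailing 'x'."""
    return re.sub(r"[\s-]", "", raw).upper()


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10 down to 1, weighted sum must be divisible by 11."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    total = 0
    for position, char in enumerate(isbn):
        value = 10 if char == "X" else int(char)  # 'X' stands for 10
        total += (10 - position) * value
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, sum must be divisible by 10."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    total = sum(
        int(char) * (3 if index % 2 else 1)
        for index, char in enumerate(isbn)
    )
    return total % 10 == 0


def is_valid_isbn(raw: str) -> bool:
    """Dispatch on length after normalizing; anything else is invalid."""
    isbn = normalize(raw)
    if len(isbn) == 10:
        return is_valid_isbn10(isbn)
    if len(isbn) == 13:
        return is_valid_isbn13(isbn)
    return False
```

A failed check only says that the number cannot be a real ISBN as written. Whether the cause is a typo or a made-up reference still has to be decided by a human looking at the surrounding text.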

Speakers of this event

Mathias Schindler

I spent several years working on open data topics at Wikimedia Deutschland, then worked on copyright, open data and other topics as a staff member for a Member of the European Parliament. Since moving to the private sector, I have successfully turned open data into my hobby.

I also like the IFG, VIG, UIG and DSGVO (GDPR).