RAG · NLP · Tintin · Very Important Research
What does Haddock actually mean?
A homemade Retrieval-Augmented Generation system built to answer one of the great unsolved questions of 20th century literature: what on earth is Captain Haddock saying, and should anyone be offended?
The problem
Captain Haddock (merchant seaman, ship's captain, full-time catastrophe) is one of the most linguistically inventive characters in the history of comics. When things go wrong (which, in Tintin, is always), Haddock reaches not for ordinary profanity but for a remarkable personal arsenal of invented insults, archaic nouns, and words that sound offensive but technically aren't.
"Blistering barnacles!" is the famous one. But there are hundreds more. Iconoclast. Ostrogoth. Sea gherkin. Troglodyte. Ectoplasm. Nitwit. Macrocephalic baboon. Some of these are real words. Some are historical figures. Some are, apparently, types of freshwater fish.
The question nobody was formally asking — but clearly should have been — is: what do they actually mean? Not in the vague "it's an insult" sense, but properly. Etymologically. Historically. With context.
This is that system.
What it does
Curse ingestion
A curated dataset of Haddock's curses, sourced from across the Tintin albums, each tagged with the book, scene context, and target of the outburst.
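As a rough illustration, a record in a dataset like this could be modelled as below. The field names (`curse`, `album`, `scene`, `target`) and the JSON Lines storage format are assumptions made for the sketch, not the project's actual schema, and the sample row uses placeholder values rather than real data.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CurseRecord:
    """One outburst with its tags (hypothetical field names)."""
    curse: str   # the insult itself
    album: str   # the book it appears in
    scene: str   # short description of the scene context
    target: str  # who or what Haddock is shouting at


def load_curses(jsonl_text: str) -> list[CurseRecord]:
    """Parse one JSON object per line, skipping blank lines."""
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            records.append(CurseRecord(**json.loads(line)))
    return records


# Placeholder row: the values are illustrative, not taken from the real dataset.
sample = (
    '{"curse": "Ostrogoth", "album": "<album title>", '
    '"scene": "<scene context>", "target": "<target>"}'
)
records = load_curses(sample)
print(records[0].curse)
```

Keeping the tags as explicit fields, rather than folding them into free text, means they can later be used for filtering or quoted back as citations.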
Vector knowledge base
Each curse is embedded alongside its etymological background, historical references, and linguistic notes. Stored in a vector database for semantic retrieval.
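The embed-and-store step can be sketched in a few lines. Everything here is a stand-in: the toy `embed` function is a normalised bag-of-words, which only matches shared words, whereas a real system would use a learned embedding model for genuinely semantic retrieval. The `VectorStore` class is a hypothetical in-memory substitute for whatever vector database the project actually uses.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy embedding: unit-length bag-of-words vector (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


class VectorStore:
    """Minimal in-memory store ranking documents by cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[dict[str, float], str]] = []

    def add(self, document: str) -> None:
        # Embed once at ingestion time and keep the vector next to the text.
        self._items.append((embed(document), document))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(w * vec.get(tok, 0.0) for tok, w in q.items()), doc)
            for vec, doc in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]


store = VectorStore()
store.add("Troglodyte: a cave dweller; from Greek for one who creeps into holes.")
store.add("Ostrogoth: member of an eastern Gothic tribe; later a byword for barbarian.")
print(store.search("What is a troglodyte?", k=1))
```

Embedding the curse together with its notes, as described above, is what lets a query about "barbarians" find Ostrogoth even when the curse itself is never mentioned.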
RAG pipeline
At query time, the system retrieves the most relevant curse records and passes them to an LLM with a prompt that would make Hergé cautiously proud. The answers are accurate. The tone is appropriate.
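The prompt-construction half of that step might look something like this. The prompt wording, the record keys (`curse`, `album`, `notes`) and the `llm` callable are all assumptions for illustration; the real prompt is not reproduced here.

```python
from typing import Callable

# Hypothetical wording, not the project's actual prompt.
SYSTEM_PROMPT = (
    "You explain Captain Haddock's curses. Answer only from the numbered "
    "sources, cite them like [1], and say so if the sources do not cover "
    "the question."
)


def build_prompt(question: str, records: list[dict[str, str]]) -> str:
    """Number each retrieved record so the model can cite it, then add the question."""
    sources = "\n".join(
        f"[{i}] {r['curse']} ({r['album']}): {r['notes']}"
        for i, r in enumerate(records, start=1)
    )
    return f"{SYSTEM_PROMPT}\n\nSources:\n{sources}\n\nQuestion: {question}\nAnswer:"


def answer(question: str, records: list[dict[str, str]], llm: Callable[[str], str]) -> str:
    """Send the assembled prompt to any text-in, text-out model function."""
    return llm(build_prompt(question, records))


# Placeholder record; in the real pipeline these come from the retrieval step.
retrieved = [
    {"curse": "Ostrogoth", "album": "<album title>", "notes": "<etymological notes>"}
]
prompt = build_prompt("What is an Ostrogoth?", retrieved)
print(prompt)
```

Numbering the sources in the prompt is one simple way to get answers with citations: the model can only point at what was actually retrieved.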
Natural language queries
Ask it anything. "What is a troglodyte?" "Which curses reference seafaring?" "Is Haddock ever actually rude?" It will answer, with citations.
Context-aware explanations
The system doesn't just define the word — it explains why Haddock might have chosen it, what it would have meant to a 1940s Belgian reader, and whether it constitutes an actual insult in any jurisdiction.
Entirely homemade
No off-the-shelf RAG framework. Built from scratch: embedding pipeline, retrieval logic, prompt construction. The kind of project that teaches you more than it probably should have needed to.
Sample exchange
"What does Haddock mean when he calls someone an Ostrogoth?"
The Ostrogoths were an eastern Gothic tribe who played a prominent role in the fall of the Western Roman Empire in the 5th century. By the medieval period, "Ostrogoth" had become a general term for a barbarian — someone uncultured, uncivilised, and probably eating with their hands. In Haddock's usage, it functions as a strong accusation of boorishness. It is, by any measure, a proper insult. Hergé clearly did his research.
Why
The honest answer is that this started as an excuse to build a RAG system from scratch and understand how retrieval-augmented generation actually works under the hood — without the abstraction layers that most frameworks politely hide from you.
The choice of subject matter was not, strictly speaking, an accident. Haddock's curses are a perfect dataset: they're finite, well-documented, culturally rich, and entirely harmless. Nobody has ever been genuinely wounded by the word macrocephalic.
It also connects to a broader project — The Gentleman's Insult — which takes the same philosophy (insults should be interesting, not just loud) and extends it to other characters with equally distinguished vocabularies.
Automated Mastodon publishing
At some point it became clear that keeping the model's output entirely to myself would be a waste. So I did what any reasonable person would do: I hijacked my own Mastodon account and wired the RAG system directly to it.
Selected outputs — curse explanations, etymological rabbit holes, the occasional Haddock deep-cut — are now published automatically to @sebs_tech@mastodon.social. The account did not consent to this arrangement. It didn't object either.
The pipeline picks outputs that are self-contained and interesting enough to stand alone — explanations that read as a complete thought without needing the query context. The Mastodon API is, it turns out, extremely easy to post to. Possibly too easy.
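A minimal sketch of that last hop, using only the standard library. The `is_self_contained` heuristic is an assumption about what "stands alone" could mean in code, not the pipeline's actual selection logic, and the access token is a placeholder. The endpoint (`POST /api/v1/statuses` with a bearer token and a `status` field) is Mastodon's standard way to publish a post.

```python
import urllib.parse
import urllib.request

MAX_CHARS = 500  # default Mastodon status limit; individual instances may differ


def is_self_contained(text: str) -> bool:
    """Assumed heuristic: fits in one post, ends a sentence, no dangling opener."""
    text = text.strip()
    dangling = ("it ", "this ", "that ", "he ", "as mentioned")
    return (
        0 < len(text) <= MAX_CHARS
        and text.endswith((".", "!", "?"))
        and not text.lower().startswith(dangling)
    )


def build_status_request(instance: str, token: str, text: str) -> urllib.request.Request:
    """Prepare a POST to the statuses endpoint without sending it."""
    body = urllib.parse.urlencode({"status": text, "visibility": "public"}).encode()
    return urllib.request.Request(
        f"{instance}/api/v1/statuses",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


explanation = "Troglodyte: a cave dweller, and by extension anyone wilfully primitive."
if is_self_contained(explanation):
    req = build_status_request("https://mastodon.social", "YOUR_ACCESS_TOKEN", explanation)
    # urllib.request.urlopen(req) would publish it; left out so the sketch has no side effects.
    print(req.get_method(), req.full_url)
```

Separating the "is this worth posting" check from the request building keeps the publishing step trivially testable without touching the network.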
Automated curse explanations, published directly from the model to Mastodon. No human curation. No editorial oversight. Just Haddock, explained, on the internet.
@sebs_tech@mastodon.social →
Status
Functional and in use. The dataset is ongoing: there are 24 Tintin albums and Haddock's range is, it turns out, considerable. If you have a favourite curse that deserves proper documentation, get in touch.