vttoth

I am a software developer and author of computer books. I also work on some problems in theoretical physics. For more information, please visit my personal Web site at http://www.vttoth.com/.

Jan 27, 2026
 

I have been thinking a lot (who hasn’t?) these days about the state of the world, in particular the United States.

Many people I know would like to see Trump gone. Can’t blame them. Unfortunately, I think the mess is deeper, and older than Trump’s presidency. Indirect roots go back decades. Trump is a symptom, maybe an accelerant, but not the cause. And even if he were to vanish tomorrow, the problems would remain.

Looking at the totality of events, from Minneapolis to Venezuela, from Davos to Mar-a-Lago, I realized that I am looking at a system that is under stress. Complex systems neither slide into chaos gradually nor crash instantly: they may reach the point of no return yet show only a few outward signs of impending collapse until it actually happens.

A sad apolitical example is Chernobyl: The reactor’s fate was sealed when they removed the last few control rods, but for another minute and a half, everything seemed normal. The operators could even conduct the test that was the whole point of the ill-fated exercise. When they finally pressed the famous AZ-5 shutdown button, triggering a series of explosions, it was just the closing act of a play that already had a predetermined outcome. Button or no button, the reactor was already primed for a thermal excursion and eventual explosion.

A replica AZ-5 button sold by the Chornobyl shop.

What is common to such cases is that the moment of no return represents the last moment in time when the system could still redundantly absorb or redistribute stress. It does not collapse instantly afterwards: things may start to SNAP, but it takes a while before the failure of individual pieces turns into a cascading collapse. Meanwhile, those in charge may still be under the illusion that they are in control, when in reality, the system is already in an irreversible, runaway state.

In that light, I think I might be able to put my finger on the precise moment when the US reached the point of no return, the first SNAP. It was early summer 2024, when the system failed to sentence Trump despite the criminal convictions. It was, quite simply, too slow. From that moment onward, America’s famed system of “checks and balances” could no longer keep up with events. The system broke even though from the outside, it still appeared, still appears to be functioning. It’s only now that the SNAPs are becoming a cacophony, with new outrage happening every day.

What we now have in the US – ICE effectively above the law, the militarization of federal enforcement even against local authorities, unilateral presidential action on everything from wars to tariffs with no checks on presidential power, weaponization of institutions of law enforcement, open corruption (e.g., Trump’s cryptocurrency, Venezuelan oil money to Qatar) – will not go away with Trump. There will be others – people who will be more determined and more talented than Trump! – exploiting these things. The system, which has been under stress ever since (at least) the days of Gingrich is, in my reading, broken already, and no event will magically repair it. To my untrained eye, I’d say the US ceased being a free democracy sometime in 2025 and became a hybrid regime, a “competitive authoritarian” state. Internationally, as Carney observed, Pax Americana is history: Instead we have a new world order that in many ways resembles the years before 1914, though hopefully with a saner outcome. (I am not terribly optimistic. As Carney said, you’re either at the table or you’re on the menu. That’s not a prescription for sanity.)

So… I cherish the past 80 years. It’s been an incredible period in human history. But I think it’s come to an end.

 Posted by at 4:22 am
Jan 13, 2026
 

His views were notably controversial, especially in these troubled, polarized times. But his cartoons delighted millions, myself included.

Rest in peace, Scott Adams. Along with Catbert, we mourn you.

 Posted by at 2:49 pm
Jan 09, 2026
 

Been a while since I blogged about politics. Not because I don’t think about it a lot… but because lately, it’s become kind of pointless. We are, I feel, past the point where individuals trying to raise sensible concerns can accomplish anything. History is taking over, and it’s not leading us in the right direction.

Looking beyond the specifics: be it the decisive US military action to remove Maduro in Venezuela (and the decision to leave Maduro’s regime in place with his VP in charge), the seizure of Russian-flagged tankers, yet another Russian act of sabotage against undersea infrastructure in the Baltics (this time caught red-handed by the Finns), the military situation in Ukraine, the murder of a Minnesota woman by ICE agents, more shooting by US customs and border protection agents in Portland…

Never mind the politics of the day. It’s the bigger picture that concerns me.

Back in the 1990s, I was able to rent a decent apartment here in Ottawa, all utilities included, for 600-odd dollars. It had a lovely view of the city, and the building was reasonably well-maintained. Eventually we moved out because we bought our townhome, at a price well within our ability to finance.

Today, that same apartment rents for three times the amount. Our townhome? A similar one was sold for five times what we paid for ours 28 years ago. Now, you’d think this is perfectly normal if incomes rose at the same pace, perhaps keeping pace with inflation. But that is not the case. The median income over the same period increased by a measly 30%, give or take.

This, I daresay, is obscene. It means that a couple in their early 30s, like we were back then, doesn’t stand a chance in hell. Especially if they are immigrants like we were, with no family backing, no inherited wealth.

In light of this, I am not surprised by daily reports about rising homelessness, shelters filled to capacity, food banks struggling with demand.

What I find especially troubling is that we seem hell-bent on turning cautionary tales into reality. For instance, there’s the 1952 science-fiction novel by Pohl and Kornbluth, The Space Merchants. Nowadays considered a classic. It describes a society in which corporate power is running rampant, advertising firms rule the world, and profit trumps everything. Replace “advertising” with “social media”, make a few more surface tweaks and the novel feels like it was written in 2025. When the narrative describes the homeless seeking shelter in the staircases of Manhattan’s shiny office towers, I can’t help but think of all the homeless here on Rideau Street, just minutes away from Parliament Hill, in the heart of a wealthy G7 capital city.

Or how about a computer game from the golden era of 8-bit computing, one of the gems of Infocom, the leading company of the “interactive fiction” (that is, text adventure) genre? I have in mind A Mind Forever Voyaging, a game in which you play as the AI protagonist, tasked with entering simulations of your town’s future 10, 20, etc., years hence, to find out how bad policy leads to societal collapse. When I read the game’s descriptions of the city, I am again reminded of Rideau Street’s homeless population.

In one of his best novels, the famous Hungarian writer Jenő Rejtő (killed far too young, at 37, serving in a forced labor battalion on the Eastern Front in 1943, after being drafted on account of being a Jew) has a character, a gourmand chef, utter these words: “The grub is inedible. Back at Manson, they only cooked bad food. That was tolerable. But here, they are cooking good food badly, and that is insufferable.” The West is like that today. It’s not “bad food”: our countries are not dictatorships, not failed states governed by corrupt oligarchs, but proper liberal democracies. Yet, I feel, they are increasingly mismanaged, unable to deliver on what ought to be the basic contract between the State and its Subjects in any regime.

What basic contract? Bertolt Brecht put it best: “Erst kommt das Fressen, dann kommt die Moral,” says Macheath in Kurt Weill’s Threepenny Opera: “Food is the first thing, morals follow on.” A State must deliver food, shelter, basic security, a working infrastructure, and a legitimate hope that tomorrow will be better (or at least, not worse) than today.

A State that fails at that will itself fail. A State that succeeds at this mission will survive, even if it is an authoritarian regime. In fact, if the State is successful, it does not even need significant oppression to stay in power: it will, at the very least, be tolerated by the populace. (I grew up in such a state: Kádár’s “goulash communist” Hungary.) The liberal West may only forget this at its own peril.

What happens when the basic contract is violated? People look for alternatives. They may become desperate. And that’s when populist demagogues arrive on the scene, presenting themselves as saviors. In reality, they have neither the ability nor the inclination to solve anything: they feed on desperation, not solutions.

In contrast, successful States, liberal democracies and hereditary empires alike, share one thing in common: the dreaded “deep state”. That is to say, a competent meritocratic bureaucracy, capable of, and willing to, recognize and solve problems. A robust bureaucracy can survive several bad election cycles or even generations of bad Emperors. Imperial China serves as a powerful example, but we can also include the Roman Principate and, later, Byzantium, along with other empires that remained stable and prosperous for many generations.

No wonder wannabe despots often target the “deep state” first. A competent meritocratic bureaucracy, after all, stands in the way of their march towards unconstrained power. Thinning out the ranks, hollowing out the institutions is therefore the first order of the day for the would-be despot. It’s not always true of course. Talented despots learn to rely on the competent bureaucracy instead of eliminating it. But talented despots are not myopic populist opportunists. They are that rare kind: empire builders. Far too often, the despots we encounter lack both the talent and the vision to build anything. They just exploit the pain, and undermine the very institutions that could alleviate that pain.

This is what we see throughout the West in 2026. Even as the warning signs get stronger—among them rising wealth and income inequality, an oligarchic concentration of astronomical wealth in just a few hands, rising homelessness, decaying infrastructure, an increasingly fragile health care system, rising indebtedness, lack of employment security—there appears to be way too little appetite for meaningful structural solutions. Instead, we get easy slogans. “It’s the damn immigrants,” says one side while the other retorts with a complaint about “white supremacism,” just to name some examples, without implying moral equivalence. The slogans solve nothing: they do create, however, the specter of an “enemy” that must be eliminated, an “enemy” from whom only the populist can protect you.

One of the best records of one of my favorite bands, Electric Light Orchestra, was the incredibly prescient concept album Time, released in 1981. In addition to predicting advanced, well-aligned AI in a way that feels almost uncanny in its detail (“She does the things you do / But she is an IBM / She’s only programmed to be very nice / But she’s as cold as ice […] She tells me that she likes me very much […] She is the latest in technology / Almost mythology / But she has a heart of stone / She has an IQ of 1001 […] And she’s also a telephone“) it also describes a future that is… hollow: “Back in the good old 1980s / when things were so uncomplicated” – in other words, when Western liberal democracies still understood how to deliver on that basic contract, Brecht’s “basic food position“.

 Posted by at 1:03 am
Dec 20, 2025
 

So I had a surprisingly smooth experience with a chat agent, most likely an AI agent though probably with some human-in-the-loop supervision. This had to do with canceling/downgrading my Sirius XM subscription, now that we no longer have a vehicle with satellite radio.

And it got me thinking. Beyond the hype, what does it take to build a reliable AI customer experience (CX) agent?

And that’s when it hit me: I already did it. Granted, not an agent per se, just the way I set up GPT-5 to play chess.

The secret? State machines.

I did not ask GPT to keep track of the board. I did not ask GPT to update the board either. I told GPT the state of the board and asked GPT to make a move.

The board state was tracked not by GPT but by conventional, deterministic code. The board is a state machine. Its state transitions are governed by the rules of chess. There is no ambiguity. The board’s state (including castling and en passant) is encoded in a FEN string unambiguously. When GPT offers a move, its validity is determined by a simple question: does it represent a valid state transition for the chessboard?
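To make the validity check concrete, here is a minimal sketch in Python (not my actual implementation). The transition table is a hard-coded stub for the starting position only; in practice a chess library or engine generates the legal moves from the FEN string:

```python
# Sketch: the board, not the LLM, is the authority on state. The LLM merely
# proposes a move; deterministic code checks whether it is a legal state
# transition before applying it.

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Stub transition table for the starting position; a real implementation
# would generate this from the rules of chess.
LEGAL_TRANSITIONS = {
    START_FEN: {
        "e2e4": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
        "d2d4": "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq d3 0 1",
    },
}

def try_move(fen: str, proposed: str):
    """Return the successor FEN if the proposed move is legal, None otherwise."""
    return LEGAL_TRANSITIONS.get(fen, {}).get(proposed)

# A legal proposal yields a new, well-defined board state...
assert try_move(START_FEN, "e2e4") is not None
# ...while an illegal one (a rook jumping over its own pawn) is rejected.
assert try_move(START_FEN, "a1a5") is None
```

The LLM never mutates the FEN string itself; it only ever submits candidate keys into this table.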

And this is how a good AI CX agent works. It does not unilaterally determine the state of the customer’s account. It offers state changes, which are then evaluated by the rigid logic of a state machine.

Diagram created by ChatGPT to illustrate a CX state machine

Take my case with Sirius XM. Current state: Customer with a radio and Internet subscription. Customer indicates intent to cancel radio. Permissible state changes: Customer cancels; customer downgrades to Internet-only subscription. This is where the LLM comes in: with proper scaffolding and a system prompt, it interrogates the customer. Do you have any favorite Sirius XM stations? Awesome. Are you planning on purchasing another XM radio (or a vehicle equipped with one)? No, fair enough. Would you like a trial subscription to keep listening via the Internet-only service? Great. State change initiated… And that’s when, for instance, a human supervisor comes in, to approve the request after glancing at the chat transcript.

The important thing is, the language model does not decide what the next state is. It has no direct authority over the state of the customer’s account. What it can do, the only thing it can do at this point, is to initiate a valid state transition.
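In Python, the division of labor might look like the sketch below. The states, action names, and approval step are made up for illustration; this is not Sirius XM’s actual system:

```python
# Sketch of a CX state machine: account state lives in deterministic code;
# the LLM may only *propose* transitions permitted from the current state,
# and a supervisor (human or policy) commits them.

TRANSITIONS = {
    "radio+internet": {"cancel_all": "cancelled",
                       "downgrade":  "internet_only"},
    "internet_only":  {"cancel_all": "cancelled"},
    "cancelled":      {},
}

class Account:
    def __init__(self, state="radio+internet"):
        self.state = state
        self.pending = None  # proposed transition awaiting approval

    def propose(self, action: str) -> bool:
        """Called with the LLM's proposal; accepts only valid transitions."""
        if action in TRANSITIONS[self.state]:
            self.pending = action
            return True
        return False

    def approve(self):
        """Supervisor commits the pending transition."""
        if self.pending is not None:
            self.state = TRANSITIONS[self.state][self.pending]
            self.pending = None

acct = Account()
assert not acct.propose("upgrade_to_premium")  # not a valid transition: rejected
assert acct.propose("downgrade")               # valid: queued for approval
acct.approve()
assert acct.state == "internet_only"
```

Whatever the chat transcript says, the account can only ever move along edges that exist in the transition table.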

The hard part when it comes to designing such a CX solution is mapping the states adequately, and making sure that the AI has the right instructions. Here is what is NOT needed:

  • There is no need for a “reasoning” model;
  • There is no need for “agentic” behavior;
  • There is no need for “council-of-experts”, “chain-of-thought” reasoning, “self-critique”, or any of the other hyped inventions.

In fact, a modest locally run model like Gemma-12B would be quite capable of performing the chat function. So there’s no need even to worry about leaking confidential customer information to the cloud.

Bottom line: use language models for what they do best, associative reasoning. Do not try to use a system with no internal state and no modeling capability as a reasoning engine. That would be, if I may offer a crude but (I hope) not stupid analogy, like building a world-class submarine and then, upon realizing that it is not capable of flying, nailing some makeshift wooden wings onto its body.

I almost feel tempted to create a mock CX Web site to demonstrate all this in practice. Then again, I realize that my chess implementation already does much of the same: the AI agent supplies a narrative and a proposed state transition, but the state (the chessboard) is maintained, and its consistency ensured, by conventional software scaffolding.

 Posted by at 3:31 pm
Dec 20, 2025
 

Good-bye, 2022 Accord. Hardly knew ya.

Really, our Accord had ridiculously low mileage. We weren’t driving much even before COVID but since then? I’ve not been to NYC since 2016, or to the Perimeter Institute since, what, 2018 I think. In fact in the past 5 years, the farthest I’ve been from Ottawa was Montreal, maybe twice.

Needless to say, when our dealer saw a car with such low mileage, they pounced. Offered a new lease. I told them, sure, but I’m downsizing: no need for an Accord when we use the car this little, a Civic will do just fine. And a Civic it is.

Things missing? Very few. In no particular order:

  • No Sirius XM satellite radio (apparently it’s gone from Hondas?)
  • No built-in GPS (but Android Auto works way better than it ever did in the Accord);
  • Somewhat fewer USB ports and power outlets (but enough for our needs);
  • No HUD (but the instrument panel is quite adequate); and
  • No turn signals on the mirrors.

This is it. Really. That’s all. (I thought it also lacked blindspot warning, but I was mistaken.) And it’s a car just as decent and capable as the Accord, but substantially cheaper. So… who am I to complain?

So here we go, nice little Civic, until this lease expires. (No, I am not buying cars anymore. They are loaded with things that, when they go bad, are very costly to repair or replace. The technical debt is substantial.)


Almost forgot this rather important bit:

Yes. Made in Canada. These days, sadly, it matters.

 Posted by at 2:33 am
Dec 09, 2025
 

It happened in 1959, at the height of the Cold War, during the early years of the Space Race between the United States of America and the Soviet Union.

An accident in rural Oklahoma, near the town of Winganon.

Rex Brown/CC BY-ND 2.0

But… don’t be fooled by appearances. This thing is not what it looks like.

That is to say, it is not a discarded space capsule, some early NASA relic.

What is it, then? Why, it is the container of a cement mixer that had an accident at this spot. Cement, of course, has the nasty tendency to solidify rapidly if it is not mixed, and we quickly end up with a block that weighs several tons and… well, it’s completely useless.

There were, I understand, plans to bury the thing but it never happened.

But then, in 2011, local artists had an idea and transformed the thing into something else. Painting it with a NASA logo, an American flag, and adding decorations, they made it appear like a discarded space capsule.

And I already know that if I ever take another cross-country drive in America to visit the West Coast, and my route goes anywhere near the place, I will absolutely, definitely visit it. Just as I visited the TARDIS of Doctor Who 12 years ago, when I was spending a few lovely days in the fine city of London.

 Posted by at 3:34 am
Nov 28, 2025
 

Behind every front-end there is a back-end. My WISPL.COM chatbot is no exception. It’s one thing to provide a nice chatbot experience to my select users. It’s another thing to be able to manage the system efficiently.

Sure, I can, and do, perform management tasks directly in the database, using SQL commands. But it’s inelegant and inconvenient. And just because I am the only admin does not mean I cannot make my own life easier by creating a more streamlined management experience.

Take announcements. The WISPL chatbot operates in four languages. Creating an announcement entails writing it in a primary language, translating it into the three other languages, and then posting the requisite records to the database. Doing it by hand is not hard, but it is a chore.

Well, not anymore. I just created a nice back-end UI for this purpose. By itself it’s no big deal of course, but it’s the first time the software itself uses a large language model for a targeted purpose.

Note the highlighted Translate button. It sends the English-language text to a local copy of Gemma, Google’s open-weights LLM. Gemma is small but very capable. Among other things, it can produce near-flawless translations not just into German or French but even into Hungarian.
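The plumbing behind such a Translate button can be quite simple. Here is a hedged sketch, assuming a llama.cpp-style server exposing an OpenAI-compatible /v1/chat/completions endpoint on localhost; the model name, port, and prompt wording are illustrative, not my actual configuration:

```python
# Sketch: build a chat-completion request that asks a local model to
# translate, and post it to a llama.cpp-style local server.
import json
import urllib.request

def build_translation_request(text: str, target_lang: str) -> dict:
    return {
        "model": "gemma",  # illustrative model name
        "messages": [
            {"role": "system",
             "content": f"Translate the user's text into {target_lang}. "
                        "Reply with the translation only."},
            {"role": "user", "content": text},
        ],
        "temperature": 0.2,  # translations should be near-deterministic
    }

def translate(text: str, target_lang: str,
              url="http://localhost:8080/v1/chat/completions") -> str:
    """Send the request to a local OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_translation_request(text, target_lang)).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_translation_request("Scheduled maintenance tonight.", "Hungarian")
assert payload["messages"][1]["content"] == "Scheduled maintenance tonight."
```

Nothing leaves the machine: the request goes to localhost, which is the whole point of running the model locally.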

This back-end also lets me manage WISPL chatbot users as well as the language models themselves. It shows system logs, too.

 Posted by at 5:44 pm
Nov 22, 2025
 

Earlier this morning, still in bed, I was thinking about how electricity entered people’s lives in the past century or so.

My wife and I personally knew older people who spent their childhood in a world without electricity.

  • When electricity finally arrived, it was at first in the form of electric lights.
  • Not much later, simple machines appeared. Maybe a vacuum cleaner. Maybe a coffee grinder. Something with a simple electric motor and some trivial mechanical construction.
  • Next came the radio. Suddenly, electricity introduced a whole new dimension: you were never alone anymore. If you had a radio set and a power source, you could listen to the world.
  • Then there were refrigerators, revolutionizing kitchens. Leftovers were no longer waste or slop for farm animals: they could be consumed a day or two later, kept fresh in the fridge.
  • Not long after, another miracle began to pop up in homes: television sets.
  • Sometime along the way, electric stoves, ventilators, and ultimately, air conditioning also appeared in many homes.

One could almost construct a timeline along these lines. This is what was on my mind earlier in the morning as I was waking up.

And then, a few hours later, a post showed up in my feed on Facebook, courtesy of the City of Ottawa Archives, accompanied by an image of some exhibition grounds celebrating World Television Day back in 1955.

It’s almost as though Facebook read my mind.

No, I do not believe that they did (otherwise I’d be busy constructing my first official tinfoil hat) but it is still an uncanny coincidence. I could not have possibly come up with a better illustration to accompany my morning thoughts on this subject.

 Posted by at 10:16 pm
Nov 22, 2025
 

It was high time, I think. I just finished putting together a Web site that showcases my AI and machine learning related work.

The site is called WISPL. It is a domain name I fortuitously obtained almost a decade ago with an entirely different concept in mind, but which fits perfectly. It’s a short, pronounceable domain name and it reminds one of the phrase, “AI whisperer”.

The site has of course been the home of my “chatbot” for more than two years already, but now it is something more. In addition to the chatbot, I now present my retrieval augmented generation (RAG) solution; I show a Web app that allows the user to play chess against GPT “properly” (while also demonstrating the ground truth that autoregressive stochastic next-token predictors will never be great reasoning engines); I showcase my work on Maxima (the computer algebra system, an example of more “conventional” symbolic AI); and I describe some of my AI/ML research projects.

 Posted by at 3:48 am
Nov 11, 2025
 

I came across this meme earlier today:

For the first time in history, you can say “He is an idiot” and 90% of the world will know whom you are talking about.

In an inspired moment, I fed this sentence, unaltered, to Midjourney. Midjourney 6.1 to be exact.

Our AI friends are… fun.

 Posted by at 6:19 pm
Nov 09, 2025
 

While I was working on my minimalist but full implementation of a GPT, I also thought of a game that can help participants understand better how language models really work. Here are the rules:

  1. Someone asks a question.
  2. Participants take turns, making a best effort to contribute to the answer, ONE WORD AT A TIME.
  3. The round is finished when someone ends it with a period.

Say, there are three participants, Alice, Bob and Christine, trying to answer the question, “What was the most significant geopolitical event of the 20th century?”

A: THE
B: ATOMIC
C: BOMB
A: WAS
B: TESTED
C: IN
A: THE
B: SUMMER
C: OF
A: 1945
B: .

Did Alice really want to talk about the atomic bomb? Perhaps she was thinking of the Sarajevo assassination and the start of WW1. Or the collapse of the USSR.

Did Bob really mean to talk about the bomb? Perhaps he was thinking about the discovery of the atomic nature of matter and how it shaped society. Or maybe something about the atomic chain reaction?

Did Christine really mean to talk about the first atomic test, the Trinity test in New Mexico? Maybe she had in mind Hiroshima and Nagasaki.

The answer we got is an entirely sensible answer. But none of the participants knew that this would be the actual answer. There was no “mind” conceiving this specific answer. Yet the “latent knowledge” was present in the “network” of the three players. At each turn, there were high-probability and lower-probability variants. Participants typically, but not necessarily, picked the highest-probability “next word”, but perhaps opted for a lower-probability alternative on a whim, for instance when Bob used “TESTED” instead of “DROPPED”.

Language models do precisely this, except that in most cases, what they predict next is not a full word (though it might be) but a fragment, a token. There is no advance knowledge of what the model would say, but the latent knowledge is present, as a result of the model’s training.
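The game maps almost directly onto weighted sampling. A toy sketch (the words and probabilities are, of course, made up for illustration):

```python
# Toy sketch of the game as next-token sampling: at each turn the "network"
# assigns probabilities to candidate next words and one is drawn at random,
# usually, but not always, the most likely.
import random

def next_word(distribution: dict, rng: random.Random) -> str:
    """Draw one word according to the given probability weights."""
    words = list(distribution)
    weights = [distribution[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# After "THE ATOMIC BOMB WAS", "DROPPED" may be the most likely continuation,
# but "TESTED" can still be picked, just as Bob did.
dist = {"DROPPED": 0.6, "TESTED": 0.3, "BUILT": 0.1}
rng = random.Random(0)
samples = [next_word(dist, rng) for _ in range(1000)]
assert samples.count("DROPPED") > samples.count("TESTED") > samples.count("BUILT")
```

The only real differences in an LLM are the vocabulary (tokens, not words) and the fact that the distribution at each step is computed by a trained network rather than by human players.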

In 1980, Searle argued, in the form of his famous Chinese Room thought experiment, that algorithmic symbol manipulation does not imply understanding. In his proposed setup, participants who do not speak Chinese manipulate Chinese-language symbols according to preset rules, conveying the illusion of comprehension without actual understanding. I think my little game offers a perfect counterexample: a non-algorithmic game demonstrating the emergence of disembodied intelligence based on the prior world knowledge of its participants, but not directly associated with any specific player.

My wife and I just played two turns of this game. It was a fascinating experience for both of us.

 Posted by at 7:39 pm
Nov 09, 2025
 

A few weeks ago I had an idea.

What if I implement a GPT? No, not something on the scale of ChatGPT, with many hundreds of billions of parameters, consuming countless terawatt-hours, training on a corpus that encompasses much of the world’s literature and most of the Internet.

No, something far more modest. How about… a GPT that emulates the world’s first chatbot, Eliza?

Long story short (the long story will follow in due course on my Web site) I succeeded. I have built a GPT from scratch in C++, including training. I constructed a sensible (though far from perfect) training corpus of user prompts and Eliza responses. And over the course of roughly a week, using a consumer-grade GPU for hardware acceleration, I managed to train my smallest model.

No, don’t expect perfection. My little model does not have hundreds of billions of parameters. It does not even have millions of parameters. It is only a 38 thousand (!) parameter model.

Yet… it works. Sometimes its output is gibberish. But most of the time, the output is definitely Eliza-like.

The best part? The model is so small, its inference runtime works well when implemented in JavaScript, running in-browser.

And here is my first ever exchange with the JavaScript implementation, unfiltered and unedited.

No, I am not going to win awards with this chatbot, but the fact that it works at all, and that it successfully learned the basic Eliza-like behavior is no small potatoes.

For what it’s worth, I was monitoring its training using a little bit of homebrew near-real-time instrumentation, which allowed me to keep an eye on key model parameters, making sure that I intervene, adjusting learning rates, to prevent the training from destabilizing the model.
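My actual instrumentation is homebrew C++ tooling, but the guardrail idea it enables can be sketched in a few lines (the smoothing factor and thresholds here are purely illustrative):

```python
# Sketch of a training guardrail: track an exponential moving average (EMA)
# of the loss, and cut the learning rate whenever the raw loss spikes well
# above it, before the training destabilizes the model.

def monitor_step(loss: float, ema: float, lr: float,
                 alpha=0.1, spike=1.5, cut=0.5):
    """Update the loss EMA; halve the learning rate on a loss spike."""
    if loss > spike * ema:
        lr *= cut
    ema = (1 - alpha) * ema + alpha * loss
    return ema, lr

ema, lr = 2.0, 1e-3
for loss in [1.9, 1.8, 1.7, 4.0, 1.6]:  # one spike mid-run
    ema, lr = monitor_step(loss, ema, lr)
assert lr == 5e-4  # the spike triggered exactly one halving
```

In practice one would watch more than the loss: gradient norms and individual parameter statistics tell the same story earlier.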

I am now training a roughly 10 times larger version. I do not yet know if that training will be successful. If it is, I expect its behavior will be more robust, with less gibberish and more Eliza-like behavior.

In the meantime, I can now rightfully claim that I know what I am talking about… after all, I have a C++ implementation, demonstrably working, complete with backpropagation, by way of credentials.

 Posted by at 1:40 am
Oct 17, 2025
 

Now that I have put together my little RAG project (little but functional, more than a mere toy demo) it led to another idea. The abstract vector embeddings that represent my answers can be visualized, well, sort of, in a two-dimensional representation, and I built just that: an interactive visualization of all my Quora answers.

It is very educational to explore how the embedding model managed to cluster answers by semantics. As a trivial example, there is a little “cat archipelago” in the upper right quadrant: several of my non-physics answers related to cats can be found in this corner. Elsewhere there is, for instance, a cluster of some of my French-language answers.

Anyhow, feel free to take a look. It’s fun. Unlike the RAG engine itself, exploring this map does not even consume any significant computing (GPU) resources on my server.

 Posted by at 7:18 pm
Oct 17, 2025
 

I’ve been reading about this topic a lot lately: Retrieval Augmented Generation, the next best thing that should make large language models (LLMs) more useful, respond more accurately in specific use cases. It was time for me to dig a bit deeper and see if I can make good sense of the subject and understand its implementation.

The main purpose of RAG is to enable a language model to respond using, as context, a set of relevant documents drawn from a documentation library. Preferably, relevance itself is established using machine intelligence, so it’s not just some simple keyword search but semantic analysis that helps pick the right subset.

One particular method is to represent documents in an abstract vector space of many dimensions. A query, then, can be represented in the same abstract vector space. The most relevant documents are found using a “cosine similarity search”, which is to say, by measuring the “angle” between the query and the documents in the library. The smaller the angle (the closer the cosine is to 1) the more likely the document is a match.
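The computation itself is trivial. A minimal version in plain Python (real systems use optimized vector math over embeddings with hundreds of dimensions; three suffice to show the idea):

```python
# Cosine similarity: the cosine of the angle between two vectors,
# i.e., their dot product divided by the product of their lengths.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query     = [1.0, 2.0, 0.0]
doc_close = [2.0, 4.0, 0.1]   # nearly parallel: cosine near 1
doc_far   = [-2.0, 1.0, 0.0]  # orthogonal: cosine is 0
assert cosine_similarity(query, doc_close) > 0.99
assert abs(cosine_similarity(query, doc_far)) < 1e-9
```

Note that the measure ignores vector length entirely: only direction, the “semantic heading” of the embedding, matters.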

The abstract vector space in which representations of documents “live” is itself generated by a specialized language model (an embedding model). Once the right documents are found, they are fed, together with the user’s query, to a generative language model, which then produces the answer.
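Putting the pieces together, the retrieval step amounts to ranking the library by similarity and pasting the winners into the prompt. A sketch, with made-up three-dimensional vectors standing in for real embeddings and document titles standing in for the documents themselves:

```python
# Sketch of RAG retrieval: rank a document library by cosine similarity
# to the query vector, then assemble the top matches into a prompt.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Pretend an embedding model already mapped these documents into vector space.
library = {
    "Answer on black holes":         [0.9, 0.1, 0.0],
    "Answer on cats":                [0.0, 0.2, 0.9],
    "Answer on gravitational waves": [0.8, 0.3, 0.1],
}

def retrieve(query_vec, k=2):
    """Return the k documents closest (by cosine) to the query vector."""
    ranked = sorted(library, key=lambda d: cosine(query_vec, library[d]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    """Feed the retrieved documents, plus the question, to the generator."""
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}"

# A physics-flavored query vector pulls in the two physics answers.
top = retrieve([1.0, 0.2, 0.0])
assert top == ["Answer on black holes", "Answer on gravitational waves"]
```

In the real system, of course, the query vector comes from running the user’s question through the same embedding model that indexed the library.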

As it turns out, I just had the perfect example corpus for a test, technology demo implementation: My more than 11,000 Quora answers, mostly about physics.

Long story short, I now have this:

The nicest part: This RAG solution “lives” entirely on my local hardware. The main language model is Google’s Gemma with 12 billion parameters. At 4-bit quantization, it fits comfortably within the VRAM of a 16 GB consumer-grade GPU, leaving enough room for the cosine similarity search. Consequently, the model responds to queries in record time: the answer page shown in this example was generated in under 30 seconds.

 Posted by at 1:52 am
Oct 04, 2025
 

I regularly get despicable garbage on Facebook, for instance:

  • “Historical” content that’s mostly AI slop, illustrated by “photographs” that are readily identified by Google as generated by their AI;
  • Scam ads, e.g., advertising a business that never existed in the first place, with a “going out of business” once-in-a-lifetime sale;
  • Scam ads, trying to entice me to download, e.g., malicious browser extensions;
  • Catfishing contact requests;
  • Contact requests from cloned accounts, including cloned accounts of friends who, sadly, passed away years ago.

Meanwhile, just the other day, Facebook apparently lost all my prior notifications. Not sure if it is a site-wide problem or specific to my account, but it was annoying either way.

And then… this. I regularly repost my blog entries to Facebook. Over the past year, they randomly removed three of them, for allegedly violating their famous “community standards”. (Because Meta cares so much about “community”. Right.) The three that they removed were

So why do I even bother with Facebook, then? Well, a good question with a simple answer: there are plenty of people — old friends, classmates — that I’d lose touch with otherwise.

That does not mean that I have to like the experience.

Anyhow, now I wonder if this post will also be banned as “spam” by their broken algorithms. Until then, here’s an image of our newest cat.

Marcel may be young (just over 3 months) but he already understands a lot about the world. Including Facebook. His facial expression says it all.

 Posted by at 3:12 am
Sep 30 2025
 

There is a wonderful tool out there that works with many of the published large language models and multimodal models: Llama.cpp, a pure C++ implementation of the inference engine to run models like Meta’s Llama or Google’s Gemma.

The C++ implementation is powerful. It allows a 12-billion parameter model to run at a usable speed even without GPU acceleration, emitting 3-4 tokens per second in the generation phase. That is seriously impressive.

There is one catch. Multimodal operation with images requires embedding the images, often the most time-consuming part: a single image may take 45-60 seconds to encode. And in a multi-turn conversation, the images are re-encoded again and again, slowing down the conversation at every turn.

An obvious solution is to preserve the embeddings in a cache and avoid re-embedding images already cached. Well, this looked like a perfect opportunity to deep-dive into the Llama.cpp code base and make a surgical change. A perfect opportunity also to practice my (supposedly considerable) C++ skills, which I use less and less these days.
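The idea can be sketched in a few lines. This is a Python illustration of the caching pattern, not the actual change, which lives in Llama.cpp’s C++ code; the cache is keyed by a hash of the raw image bytes, so an image already seen is never encoded twice:

```python
import hashlib

class EmbeddingCache:
    """Cache image embeddings, keyed by a hash of the raw image bytes."""
    def __init__(self):
        self._store = {}

    def get_or_compute(self, image_bytes, encode):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            self._store[key] = encode(image_bytes)  # the expensive step
        return self._store[key]

# A toy encoder standing in for the real (slow) vision encoder.
calls = []
def fake_encode(data):
    calls.append(data)
    return [len(data)]

cache = EmbeddingCache()
cache.get_or_compute(b"image-1", fake_encode)
cache.get_or_compute(b"image-1", fake_encode)  # second call served from cache
print(len(calls))  # the encoder ran only once: 1
```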

Well, what can I say? I did it and it works.

I can now converse with Gemma, even with image content, and it feels much snappier.

 Posted by at 2:21 am
Sep 28 2025
 

Once again, I am playing with “low-end” language and multimodal AI running on my own hardware. And I am… somewhat astonished.

But first… recently, I learned how to make the most out of published models available through Hugging Face, using the Llama.cpp project. This project is a C++ “engine” that can run many different models if they are presented in a standard form. In fact, I experimented with Llama.cpp earlier, but only with a prepackaged version. More recently, I opted to take a deeper dive: I can now build Llama.cpp locally and run it with the model of my choice. And that is exactly what I have been doing.

How efficient is Llama.cpp? Well… we can read a lot about just how much power it takes to run large language models, and about the associated insane hardware requirements in the form of powerful GPUs with tons of high-speed RAM. Sure, that helps. But Llama.cpp can run a decent model in the ~10 billion parameter range even without a GPU, and still produce output at a rate of 3-5 tokens (maybe 2-3 words) per second.

But wait… 10 billion? That sounds like a lot until we consider that the leading-edge, “frontier class” models are supposedly in the trillion-parameter range. So surely, a “tiny” 10-billion parameter model is, at best, a toy?

Maybe not.

Take Gemma, now fully incorporated into my WISPL.COM site by way of Llama.cpp. Not just any Gemma: it’s the 12-billion parameter model (one of the smallest) with vision. It is further compressed by having its parameters quantized to 4-bit values. In other words, it’s basically as small as a useful model can be made. Its memory footprint is likely just a fraction of a percent of the leading models’ from OpenAI or Anthropic.

I had a test conversation with Gemma the other day, after ironing out details. Gemma is running here with a 32,768 token context window, using a slightly customized version of my standard system prompt. And look what it accomplished in the course of a single conversation:

  1. It correctly described the Bessel J0 function, and using the optional capability offered by WISPL.COM and described to it in its system prompt, it included a relevant plot.
  2. Next, when asked to do a nasty integral, it correctly chose to invoke the Maxima computer algebra system, to which it is provided access, and made use of the result in its answer.
  3. Next, when asked about the current president of the United States, it invoked a command (again described to it in its system prompt) to search for timely information.
  4. Next it was given a difficult task: a paper I stumbled upon on Vixra, only 5 pages, competently written but, shall we say, unconventional in content. The model received the paper in the form of 150 dpi scanned images; it correctly read the text, assessed a diagram, and offered a coherent, meaningful analysis.
  5. In response to my request, it searched for relevant background (this time, using a search command to obtain the most relevant, as opposed to the most recent, hits) and updated its assessment.
  6. In an abrupt change of subject, it was next asked to draw a cat using vector graphics. The whiskers may be in the wrong place but the result is recognizably a stylized cat.
  7. Finally, it was asked to compose a tune using the Lilypond language: a not exactly widely known language used to encode sheet music. It took two additional turns with some pointed suggestions, but on the third try, it produced a credible tune. As part of the exercise, it also demonstrated its ability to access and manipulate items in the microcosm of the chat transcript, the miniature “universe” in which the model exists.

Throughout it all, and despite the numerous context changes, the model never lost coherence. The final exchanges were rather slow in execution (approximately 20 minutes to parse all images and the entire transcript and generate a response) but the model remained functional.

prompt eval time = 1102654.82 ms /  7550 tokens (  146.05 ms per token,     6.85 tokens per second)
       eval time =   75257.86 ms /   274 tokens (  274.66 ms per token,     3.64 tokens per second)
      total time = 1177912.68 ms /  7824 tokens
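For the record, the numbers in the log are self-consistent: the total time is the sum of the prompt-processing and generation phases, and the per-phase rates work out exactly as reported.

```python
prompt_ms, prompt_tokens = 1102654.82, 7550
eval_ms, eval_tokens = 75257.86, 274

prompt_tps = prompt_tokens / (prompt_ms / 1000)  # ≈ 6.85 tokens/s
eval_tps = eval_tokens / (eval_ms / 1000)        # ≈ 3.64 tokens/s
total_ms = prompt_ms + eval_ms                   # ≈ 1177912.68 ms, as logged

print(round(prompt_tps, 2), round(eval_tps, 2))
```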

This is very respectable performance for a CPU-only run of a 12-billion parameter model with vision. But I mainly remain astonished by the model’s capabilities: its instruction-following ability, its coherence, its robust knowledge that remained free of serious hallucinations or confabulations despite the 4-bit quantization.

In other words, this model may be small but it is not a toy. And the ability to run such capable models locally, without cloud resources (and without the associated leakage of information), opens serious new horizons for diverse applications.

 Posted by at 12:22 am