Oct 17 2025
 

Now that I have put together my little RAG project (little but functional, more than a mere toy demo), it gave me another idea. The abstract vector database of embeddings that represents my answers can be visualized, well, sort of, in a two-dimensional projection, and I built just that: an interactive visualization of all my Quora answers.
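The post does not say which projection method was used; as a minimal sketch, projecting the high-dimensional embedding vectors onto their first two principal components (PCA via NumPy's SVD) might look like this:

```python
import numpy as np

def project_2d(embeddings):
    """Project high-dimensional embedding vectors onto their first two
    principal components (PCA via SVD), for a 2-D scatter plot."""
    X = embeddings - embeddings.mean(axis=0)        # center the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                             # shape (n_points, 2)
```

In practice, nonlinear methods such as t-SNE or UMAP are common choices for this kind of map, since they tend to preserve local cluster structure better than plain PCA.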

It is very educational to explore how the embedding model managed to cluster answers by semantics. As a trivial example, there is a little “cat archipelago” in the upper right quadrant: several of my non-physics answers related to cats can be found in this corner. Elsewhere there is, for instance, a cluster of some of my French-language answers.

Anyhow, feel free to take a look. It’s fun. Unlike the RAG engine itself, exploring this map does not even consume any significant computing (GPU) resources on my server.

 Posted at 7:18 pm
Oct 17 2025
 

I’ve been reading a lot about this topic lately: Retrieval-Augmented Generation (RAG), the next big thing that should make large language models (LLMs) more useful and let them respond more accurately in specific use cases. It was time for me to dig a bit deeper and see if I could make good sense of the subject and understand its implementation.

The main purpose of RAG is to enable a language model to respond using, as context, a set of relevant documents drawn from a documentation library. Preferably, relevance itself is established using machine intelligence, so it’s not just some simple keyword search but semantic analysis that helps pick the right subset.

One particular method is to represent documents in an abstract vector space of many dimensions. A query, then, can be represented in the same abstract vector space. The most relevant documents are found using a “cosine similarity search”, which is to say, by measuring the “angle” between the query and the documents in the library. The smaller the angle (the closer its cosine is to 1), the more likely the document is a match.
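A minimal NumPy sketch of such a cosine similarity search, with toy two-dimensional vectors standing in for real embeddings:

```python
import numpy as np

def cosine_similarity_search(query_vec, doc_matrix, top_k=3):
    """Return indices of the top_k documents whose embedding vectors
    have the smallest angle to (highest cosine similarity with) the query."""
    # Normalize, so the dot product equals the cosine of the angle.
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = docs @ q                        # one cosine similarity per document
    return np.argsort(sims)[::-1][:top_k]  # indices, best match first

# Toy "embeddings": documents 0 and 2 point in nearly the same direction
# as the query; document 1 is nearly orthogonal to it.
docs = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.2]])
query = np.array([1.0, 0.0])
print(cosine_similarity_search(query, docs, top_k=2))  # → [0 2]
```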

The abstract vector space in which representations of documents “live” is itself generated by a specialized language model (an embedding model). Once the right documents are found, they are fed, together with the user’s query, to a generative language model, which then produces the answer.
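The whole retrieve-then-generate flow can be sketched roughly as follows. The `embed` function here is only a toy bag-of-words stand-in for a real embedding model, and in a real system the returned prompt would be handed to the generative model rather than printed:

```python
import numpy as np

def embed(text, vocab):
    # Toy stand-in for a real embedding model: normalized bag-of-words counts.
    words = text.lower().split()
    v = np.array([float(words.count(w)) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n else v

def build_prompt(query, library, top_k=1):
    # Build a shared vocabulary so query and documents live in the same space.
    vocab = sorted({w for doc in library + [query] for w in doc.lower().split()})
    qv = embed(query, vocab)
    # Rank documents by cosine similarity: the vectors are unit-normalized,
    # so the dot product equals the cosine of the angle between them.
    ranked = sorted(library, key=lambda d: float(embed(d, vocab) @ qv), reverse=True)
    # Feed the best matches, together with the user's query, to the generator.
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

library = [
    "Cats sleep most of the day.",
    "Black holes evaporate via Hawking radiation.",
]
print(build_prompt("how long do cats sleep", library))
```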

As it turns out, I just had the perfect example corpus for a test/technology-demo implementation: my more than 11,000 Quora answers, mostly about physics.

Long story short, I now have this:

The nicest part: This RAG solution “lives” entirely on my local hardware. The main language model is Google’s Gemma with 12 billion parameters. At 4-bit quantization, it fits comfortably within the VRAM of a 16 GB consumer-grade GPU, leaving enough room for the cosine similarity search. Consequently, the model responds to queries in record time: the answer page shown in this example was generated in under 30 seconds.
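The arithmetic behind that claim is straightforward: at 4 bits per parameter, 12 billion parameters occupy about 6 GB, leaving roughly 10 GB of a 16 GB card for the KV cache, activations, and the embedding vectors used in the similarity search. A quick sanity check:

```python
# Back-of-envelope check: why a 4-bit quantized 12B model fits in 16 GB VRAM.
params = 12e9            # parameter count of the model
bits_per_param = 4       # 4-bit quantization
weight_gb = params * bits_per_param / 8 / 1e9   # bits → bytes → GB
print(f"Weights alone: about {weight_gb:.0f} GB")
```

Real runtimes add some overhead on top of the raw weights, so the actual footprint is somewhat larger, but the order of magnitude is what matters here.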

 Posted at 1:52 am
Oct 04 2025
 

I regularly get despicable garbage on Facebook, for instance:

  • “Historical” content that’s mostly AI slop, illustrated by “photographs” that are readily identified by Google as generated by its AI;
  • Scam ads, e.g., advertising a business that never existed in the first place, with a “going out of business” once-in-a-lifetime sale;
  • Scam ads, trying to entice me to download, e.g., malicious browser extensions;
  • Catfishing contact requests;
  • Contact requests from cloned accounts, including cloned accounts of friends who, sadly, passed away years ago.

Meanwhile, just the other day, Facebook apparently lost all my prior notifications. Not sure if it is a site-wide problem or specific to my account, but it was annoying either way.

And then… this. I regularly repost my blog entries to Facebook. Over the past year, they randomly removed three of them, for allegedly violating their famous “community standards”. (Because Meta cares so much about “community”. Right.) The three that they removed were

So why do I even bother with Facebook, then? Well, a good question with a simple answer: there are plenty of people — old friends, classmates — that I’d lose touch with otherwise.

That does not mean that I have to like the experience.

Anyhow, now I wonder if this post will also be banned as “spam” by their broken algorithms. Until then, here’s an image of our newest cat.

Marcel may be young (just over 3 months) but he already understands a lot about the world. Including Facebook. His facial expression says it all.

 Posted at 3:12 am