Oct 17 2025
 

Now that I have put together my little RAG project (little but functional, more than a mere toy demo), it gave me another idea. The abstract vector database of embeddings that represents my answers can be visualized, well, sort of, in a two-dimensional projection, and I built just that: an interactive visualization of all my Quora answers.
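The post does not say which projection method was used; as a minimal sketch, projecting the high-dimensional embedding vectors onto their first two principal components (PCA via NumPy's SVD) might look like this:

```python
import numpy as np

def project_2d(embeddings):
    """Project high-dimensional embedding vectors onto their first two
    principal components (PCA via SVD), for a 2-D scatter plot."""
    X = embeddings - embeddings.mean(axis=0)        # center the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                             # shape (n_points, 2)
```

In practice, nonlinear methods such as t-SNE or UMAP are common choices for this kind of map, since they tend to preserve local cluster structure better than plain PCA.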

It is very educational to explore how the embedding model managed to cluster answers by semantics. As a trivial example, there is a little “cat archipelago” in the upper right quadrant: several of my non-physics answers related to cats can be found in this corner. Elsewhere there is, for instance, a cluster of some of my French-language answers.

Anyhow, feel free to take a look. It’s fun. Unlike the RAG engine itself, exploring this map does not even consume any significant computing (GPU) resources on my server.

 Posted at 7:18 pm
Oct 17 2025
 

I’ve been reading a lot about this topic lately: Retrieval-Augmented Generation (RAG), the next big thing that should make large language models (LLMs) more useful and let them respond more accurately in specific use cases. It was time for me to dig a bit deeper and see if I could make good sense of the subject and understand its implementation.

The main purpose of RAG is to enable a language model to respond using, as context, a set of relevant documents drawn from a documentation library. Preferably, relevance itself is established using machine intelligence, so it’s not just some simple keyword search but semantic analysis that helps pick the right subset.

One particular method is to represent documents in an abstract vector space of many dimensions. A query, then, can be represented in the same abstract vector space. The most relevant documents are found using a “cosine similarity search”, which is to say, by measuring the “angle” between the query and the documents in the library. The smaller the angle (the closer its cosine is to 1), the more likely the document is a match.
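A minimal NumPy sketch of such a cosine similarity search, with toy two-dimensional vectors standing in for real embeddings:

```python
import numpy as np

def cosine_similarity_search(query_vec, doc_matrix, top_k=3):
    """Return indices of the top_k documents whose embedding vectors
    have the smallest angle to (highest cosine similarity with) the query."""
    # Normalize, so the dot product equals the cosine of the angle.
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = docs @ q                        # one cosine similarity per document
    return np.argsort(sims)[::-1][:top_k]  # indices, best match first

# Toy "embeddings": documents 0 and 2 point in nearly the same direction
# as the query; document 1 is nearly orthogonal to it.
docs = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.2]])
query = np.array([1.0, 0.0])
print(cosine_similarity_search(query, docs, top_k=2))  # → [0 2]
```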

The abstract vector space in which representations of documents “live” is itself generated by a specialized language model (an embedding model). Once the right documents are found, they are fed, together with the user’s query, to a generative language model, which then produces the answer.
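The whole retrieve-then-generate flow can be sketched roughly as follows. The `embed` function here is only a toy bag-of-words stand-in for a real embedding model, and in a real system the returned prompt would be handed to the generative model rather than printed:

```python
import numpy as np

def embed(text, vocab):
    # Toy stand-in for a real embedding model: normalized bag-of-words counts.
    words = text.lower().split()
    v = np.array([float(words.count(w)) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n else v

def build_prompt(query, library, top_k=1):
    # Build a shared vocabulary so query and documents live in the same space.
    vocab = sorted({w for doc in library + [query] for w in doc.lower().split()})
    qv = embed(query, vocab)
    # Rank documents by cosine similarity: the vectors are unit-normalized,
    # so the dot product equals the cosine of the angle between them.
    ranked = sorted(library, key=lambda d: float(embed(d, vocab) @ qv), reverse=True)
    # Feed the best matches, together with the user's query, to the generator.
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

library = [
    "Cats sleep most of the day.",
    "Black holes evaporate via Hawking radiation.",
]
print(build_prompt("how long do cats sleep", library))
```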

As it turns out, I just had the perfect example corpus for a test/technology-demo implementation: my more than 11,000 Quora answers, mostly about physics.

Long story short, I now have this:

The nicest part: This RAG solution “lives” entirely on my local hardware. The main language model is Google’s Gemma with 12 billion parameters. At 4-bit quantization, it fits comfortably within the VRAM of a 16 GB consumer-grade GPU, leaving enough room for the cosine similarity search. Consequently, the model responds to queries in record time: the answer page shown in this example was generated in under 30 seconds.
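The arithmetic behind that claim is straightforward: at 4 bits per parameter, 12 billion parameters occupy about 6 GB, leaving roughly 10 GB of a 16 GB card for the KV cache, activations, and the embedding vectors used in the similarity search. A quick sanity check:

```python
# Back-of-envelope check: why a 4-bit quantized 12B model fits in 16 GB VRAM.
params = 12e9            # parameter count of the model
bits_per_param = 4       # 4-bit quantization
weight_gb = params * bits_per_param / 8 / 1e9   # bits → bytes → GB
print(f"Weights alone: about {weight_gb:.0f} GB")
```

Real runtimes add some overhead on top of the raw weights, so the actual footprint is somewhat larger, but the order of magnitude is what matters here.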

 Posted at 1:52 am
Oct 04 2025
 

I regularly get despicable garbage on Facebook, for instance:

  • “Historical” content that’s mostly AI slop, illustrated by “photographs” that are readily identified by Google as generated by its AI;
  • Scam ads, e.g., advertising a business that never existed in the first place, with a “going out of business” once-in-a-lifetime sale;
  • Scam ads, trying to entice me to download, e.g., malicious browser extensions;
  • Catfishing contact requests;
  • Contact requests from cloned accounts, including cloned accounts of friends who, sadly, passed away years ago.

Meanwhile, just the other day, Facebook apparently lost all my prior notifications. Not sure if it is a site-wide problem or specific to my account, but it was annoying either way.

And then… this. I regularly repost my blog entries to Facebook. Over the past year, they randomly removed three of them, for allegedly violating their famous “community standards”. (Because Meta cares so much about “community”. Right.) The three that they removed were

So why do I even bother with Facebook, then? Well, a good question with a simple answer: there are plenty of people — old friends, classmates — that I’d lose touch with otherwise.

That does not mean that I have to like the experience.

Anyhow, now I wonder if this post will also be banned as “spam” by their broken algorithms. Until then, here’s an image of our newest cat.

Marcel may be young (just over 3 months) but he already understands a lot about the world. Including Facebook. His facial expression says it all.

 Posted at 3:12 am