GPT, Claude, Gemini, Grok… great services. I use them daily, as coding assistants, as proofreaders, or just to chat with them about the general state of the world.
But they all reside in the cloud. Even when I use my own user interface (which I do most of the time), my use depends on a global infrastructure. Should that infrastructure disappear, for whatever reason (cyberattack, political decisions, war), my user interface would become useless, an empty shell with nothing inside.
Well, at least that was the case until yesterday. As of today, I have an alternative.
Not a great alternative, to be sure. The 7B-parameter Llama model is very small, and its capabilities are limited. It is further constrained by being quantized down to four-bit weights.
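For the curious, here is a minimal sketch of what running such a model looks like, using llama-cpp-python. The GGUF file name and thread count are illustrative assumptions, not my actual configuration:

```python
# Minimal sketch: load a 4-bit quantized 7B model on the CPU with
# llama-cpp-python. The model path is a hypothetical placeholder;
# any Q4 quantization of a 7B Llama-family model would do.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-7b.Q4_K_M.gguf",  # hypothetical 4-bit quant file
    n_ctx=2048,    # context window
    n_threads=8,   # CPU-only inference: match your physical core count
)

out = llm("Q: Why quantize model weights? A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```

Four-bit quantization is what makes this practical at all: it shrinks a 7B model from roughly 14 GB of 16-bit weights to around 4 GB, small enough to fit comfortably in ordinary server RAM.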
Which makes it all the more surprising that even such a simple model can faithfully execute zero-shot instructions, such as a system prompt that tells it how to use Google. More than that, it has the smarts to actually invoke Google when its own knowledge is out of date.
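To make this concrete, here is one way such a protocol can be wired up. This is a hedged sketch only: the SEARCH: convention and the search_web() helper are illustrative inventions, not the actual WISPL implementation.

```python
# Illustrative sketch of zero-shot tool use via a system prompt.
# The model is told to emit a SEARCH: line when it needs current
# information; the wrapper detects it, runs the search, and feeds
# the results back for a second pass.
SYSTEM_PROMPT = (
    "You are a helpful assistant. If a question requires current "
    "information, reply with exactly one line of the form\n"
    "SEARCH: <query>\n"
    "and wait for the results before answering."
)

def search_web(query: str) -> str:
    """Hypothetical helper: run a web search, return snippets as text."""
    raise NotImplementedError

def answer(llm, question: str) -> str:
    history = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    reply = llm.create_chat_completion(messages=history)
    text = reply["choices"][0]["message"]["content"]
    if text.strip().startswith("SEARCH:"):
        query = text.strip()[len("SEARCH:"):].strip()
        history += [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Search results:\n" + search_web(query)},
        ]
        reply = llm.create_chat_completion(messages=history)
        text = reply["choices"][0]["message"]["content"]
    return text
```

The design is deliberately simple: the model never touches the network itself. The front-end merely watches for the agreed-upon marker and pastes the results back into the conversation.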
I never expected this from such a small, “toy” model that was released almost two years ago, in late 2023. But it makes me all the happier that I have now integrated Llava (that is, Llama with vision!) into my WISPL front-end.
Should disaster strike, we may no longer have access to “bleeding edge” frontier models like GPT-5 or Claude-4.1. But good old Llava, with all its limitations, runs entirely locally on my aging Xeon server, and does not even require a GPU to deliver slow but acceptable performance.
I won’t be using Llava daily, to be sure. But it’s there… consider it insurance.