Local Llama 3.2:1b Chat


Here is our local install of Llama 3.2:1b, running on the same old Windows laptop as the rest of this site. The 1b indicates the size of the model: 1 billion parameters. The more parameters, the more thoughtful the model, generally speaking. Or, in other words, the fewer the parameters, the less effort it puts in, like a teenager. At this size it is best suited for summarization, simple questions, recipes, and the like.

The rub is that the more parameters, the more computing power needed, so the best we can do on this machine is the 1b model (and you still need to be patient with it). We have run 7b and 13b models on our 2018 MacBook Air with OK results, but our M1 Mac Mini with 8GB of RAM and our desktop with an AMD Ryzen 3 5300G, 16GB of RAM, and a Radeon RX 5500 can run those, and even somewhat larger models, with no issue and quick responses.

To accomplish this we are using Ollama, Docker, and Open WebUI. Running Ollama in a terminal window gives you just a standard chat function, but Open WebUI unlocks the ability to read images and documents, connect to web search for real-time info, premake prompts and models, and more.
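For a sense of what talking to the local model looks like outside of Open WebUI, here is a minimal sketch in Python that sends a prompt to Ollama's local HTTP API (which listens on port 11434 by default). The prompt and helper name are just placeholders for illustration.

```python
import json
import urllib.request

# Ollama serves a local HTTP API on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_llama(prompt: str, model: str = "llama3.2:1b") -> str:
    """Send a single prompt to the local Ollama instance and return its reply."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # wait for the full response instead of streaming tokens
    }).encode("utf-8")

    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]

if __name__ == "__main__":
    # Example question -- on a 1b model, keep expectations modest.
    print(ask_llama("Give me a simple recipe for pancakes."))
```

On this laptop the reply takes a little while to come back, which is part of why the friendlier Open WebUI interface is worth the extra Docker setup.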

