Hardware And Software Options for Local AI
Figure out the most cost-effective solution for your situation
This article will guide you toward choosing the best hardware setup for you and help you evaluate the tradeoffs involved. This is a complex, multi-variable decision: AI hardware has plenty of tradeoffs on its own, and your particular use cases add more. Some people can buy a Mac and be happy. Others will need to retrofit an existing machine, or even build an entire rig, to satisfy their needs as a power user. We will walk through the tradeoffs of different builds and then tie them back to your use cases.
Baseline Mindset
Let's do a quick run-through of my journey with local AI.
My first foray into AI was exploring Stable Diffusion around two years ago, running AUTOMATIC1111 on my old Nvidia GTX 1060 6GB. It took over 30 seconds just to generate an image on pre-SDXL models. That was the start of my education in how much hardware matters when running AI models locally on your own machines. It was a painful and time-consuming process, but also fun to see a new technology in motion. At the time I was such a novice and didn't understand prompting well, so I would iterate 20 times on every prompt just to get a decent sampling of pictures to choose from. Now, with my 3090, I can use the latest SDXL models and generate images in seconds. I tell this story to highlight how awesome it is to have fast AI, and to have fast AI you need great hardware and software.
I dropped off the AI train for a bit, only to come back when the launch of ChatGPT rekindled my interest in the topic. It was great, but it wasn't local, there were no privacy guarantees, and OpenAI can't be trusted with sensitive data. Once I learned how easy it was to run Ollama on my Mac in early 2024, I started running AI on my personal devices and realized the true power of aligned hardware and software: blazing-fast operation.
The speed was thanks to the tight hardware and software integration of Apple Silicon. Our favorite local AI software, Ollama, is built on the llama.cpp library, which treats Apple's Metal API as a first-class backend, and Apple's MLX framework is a machine learning library similarly tuned to the GPU and CPU capabilities of Apple hardware. The practical benefit is that Ollama can use nearly everything the Mac provides under the M-series chips: thanks to unified memory, essentially the whole RAM pool and substantial processing power are available without a discrete GPU from AMD, Nvidia, or Intel. Suddenly I was running 8-13GB models and having conversations that rivaled the speed and quality of ChatGPT 3.0 and 3.5, all with a few command line commands. I could now have full chat and inline code suggestions on the same device. This is where I realized the power of local AI, but it is also where I hit my limits again.
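To make that concrete, here is a minimal sketch (not part of my original setup, just an illustration) of what talking to a local model looks like once Ollama is installed and a model such as llama3.1 has been pulled. Ollama exposes an HTTP API on port 11434 by default, so a local chat is one request:

```python
import requests

# Ask a locally running Ollama instance for a completion.
# Assumes the Ollama daemon is running and `llama3.1` has been pulled.
resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    json={
        "model": "llama3.1",
        "prompt": "Explain unified memory on Apple Silicon in two sentences.",
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

The command line equivalent is even shorter: `ollama run llama3.1` drops you straight into an interactive chat.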
My Mac wasn't bought for AI, but thankfully I opted for the 18GB M3 model. Small AI models take up roughly 8-16GB of your RAM, and running one of the larger ones can unfortunately starve the rest of your Mac for memory. Throw in a browser with many tabs, a large code repo open in an IDE, plus several other apps, and suddenly you've tied up a lot of system resources and everything slows down.
This led to my current setup. I retrofitted my old desktop with a new 3090, and even though the RAM and CPU (an Intel Core i5-6600K) are years old and it runs Windows, everything AI suddenly became immediately available and I got immensely more productive. I can now run Stable Diffusion XL and the latest Llama 3.1 on this server without tying up my MacBook's RAM, leaving those resources for my coding environment. Superior models and processing power are now just an API call away on my server (more here on that), while my Mac is spared most of the heavy lifting.
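As an illustration of what "an API call away" means in practice, here is a minimal sketch of calling the desktop server from the Mac. The server address below is hypothetical; substitute your own machine's LAN IP, and note that Ollama on the server must be configured to listen on the network (for example via the OLLAMA_HOST environment variable) rather than only on localhost.

```python
import requests

# Hypothetical LAN address of the retrofitted desktop running Ollama;
# replace with your own server's IP or hostname. 11434 is Ollama's default port.
OLLAMA_SERVER = "http://192.168.1.50:11434"

resp = requests.post(
    f"{OLLAMA_SERVER}/api/chat",
    json={
        "model": "llama3.1",
        "messages": [
            {"role": "user", "content": "Summarize the tradeoffs of a used RTX 3090 for local AI."}
        ],
        "stream": False,  # one JSON response rather than a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

The Mac stays responsive because the heavy lifting happens on the 3090; the laptop only sends and receives text.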
I've learned a lot of lessons from this slow evolution, and in the rest of this article I'll share them and give you options on how to proceed. This will save you time and money and get you up to speed.