Linux AI PCs?
I was just going to ping the leaders of the Windows and Linux groups about this, but I'll share it here for everyone.
I acquired an EVO X2 with 128 GB of memory to run AI models under Linux (actually, I've got it set up to dual-boot so that I can use it for gaming if I want). I'm happy with it, but here are the important things I've learned along the way (at great expense of time):
- I find it useful to think of LLMs as coming in small (< 10B parameters), medium (< 75B parameters) and large (> 75B parameters) sizes. The X2 can run smalls and mediums, which is good for many things, but it's never going to be as good as the large commercial online AI services. A rough sizing sketch follows.
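To see why 128 GB handles the mediums, the arithmetic is roughly (parameters) x (bytes per parameter) for the weights, plus overhead for the KV cache and runtime. Here's a quick sketch in Python; the quantization factors are rough assumptions, not exact figures:

    # Back-of-envelope memory sizing: weights alone take roughly
    # (parameters) x (bytes per parameter); KV cache and runtime add more.
    BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.55}  # q4 ~4.5 bits w/ overhead

    def weight_gb(params_billions: float, quant: str = "q4") -> float:
        """Approximate GB needed just for the model weights."""
        return params_billions * BYTES_PER_PARAM[quant]

    for label, params in [("small", 10), ("medium", 70), ("large", 180)]:
        print(f"{label:6s} {params:4d}B: fp16 ~{weight_gb(params, 'fp16'):5.0f} GB, "
              f"q4 ~{weight_gb(params, 'q4'):5.0f} GB")
    # 70B at 4-bit is ~39 GB -- comfortable in the X2's 128 GB;
    # 180B at fp16 is ~360 GB -- not happening on this box.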
- The LLM server is running the no-nonsense 'Lubuntu' Linux distro, which still provides a decent GUI, but without the desktop special effects that would steal CPU cycles from the LLM.
- The X2 uses an AMD GPU, so NVIDIA's CUDA software isn't available to take max advantage of it; CUDA is NVIDIA-only, and AMD's counterpart is ROCm, whose support for this chip is still maturing. Even so, the performance with small and medium-sized LLMs is satisfactory. GPU compute-stack support (CUDA on NVIDIA, ROCm on AMD) is one of the most critical concerns when selecting hardware to host your own LLM.
There's also a 'Vulkan' compute backend in several LLM runtimes that runs well on AMD GPUs; it isn't a CUDA emulator (separate projects such as ZLUDA attempt that), but it can deliver solid performance without CUDA at all.
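If you go the llama.cpp route (one common runtime with a Vulkan backend), here's a sketch of launching its server from Python; the binary name, flags and model path assume a Vulkan-enabled llama.cpp build and a GGUF model you've already downloaded:

    # Start a Vulkan-enabled llama.cpp server (a sketch: assumes 'llama-server'
    # from a llama.cpp build compiled with Vulkan support is on the PATH).
    import subprocess

    MODEL = "/models/llama-3-8b-instruct.Q4_K_M.gguf"  # hypothetical local GGUF path

    proc = subprocess.Popen([
        "llama-server",     # llama.cpp's OpenAI-compatible HTTP server
        "-m", MODEL,        # model file to load
        "-ngl", "99",       # offload all layers to the GPU via Vulkan
        "--port", "8080",   # where LiteLLM (or anything else) can reach it
    ])
    print(f"llama-server running (pid {proc.pid}); API at http://localhost:8080/v1")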
- Forget about using AI to generate graphics locally unless you're ready to part with maybe $50k. Apparently, serious image and video generation requires huge amounts of memory and multiple high-end CUDA-capable GPUs working in concert.
- In addition to the X2 LLM machine, I've got a Dell 7040 micro-tower with an i7 processor and 16 GB of memory as my workstation ($125 used on eBay), with Kubuntu installed (for bells and whistles), and a Dell 7060 micro-tower with an i7, maxed out at 32 GB, as the application/web server ($150 used on eBay). Like the LLM server, the 7060 is running Lubuntu for efficiency.
The app/web server is running the killer combo of LiteLLM (a smart 'LLM router' that interfaces with external and local LLMs) and 'Open WebUI' (an LLM portal that talks to LiteLLM and provides a pretty darn capable web-based chat UI, similar to the big boys'). Each of these, as well as the PostgreSQL database that LiteLLM really needs for all of its features to work, runs in its own container (actually, I'm using 'Podman' rather than 'Docker', but you'll save a ton of time and headaches going with standard Docker).
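Because LiteLLM exposes an OpenAI-compatible API, anything that can talk to OpenAI can talk to your router instead. Here's a minimal sketch using the official openai Python client; the port, key and model alias are my assumptions (LiteLLM's proxy listens on port 4000 by default, and the alias is whatever you define in its config):

    # Chat through the LiteLLM proxy using the standard OpenAI client.
    # Assumes LiteLLM is at localhost:4000 with a virtual key set up.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:4000",  # the LiteLLM proxy, not api.openai.com
        api_key="sk-local-example",        # hypothetical LiteLLM virtual key
    )

    resp = client.chat.completions.create(
        model="local-llama",               # hypothetical alias from LiteLLM's config
        messages=[{"role": "user",
                   "content": "Give me one reason to self-host an LLM."}],
    )
    print(resp.choices[0].message.content)

Open WebUI points at this same endpoint, so the browser chat and your own scripts share one router, one key store and one usage log.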
Open WebUI has some really nice features, such as the ability to define your own virtual assistants with their own guardrails, personalities and instructions.
For the LLM server, there's a nice console program, 'nvtop', which displays GPU load as a real-time graph (despite the name, it supports AMD GPUs as well as NVIDIA). This is how to get a warm fuzzy when running a query against a local LLM install.
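If you'd rather get that warm fuzzy from a script, the amdgpu driver exposes a busy-percent file in sysfs on many kernels. A small polling sketch; the card index and path are assumptions and vary by machine:

    # Poll AMD GPU load from sysfs -- the same signal nvtop graphs.
    # The path is an assumption: amdgpu exposes gpu_busy_percent on many
    # kernels, but the card number differs per machine (check /sys/class/drm/).
    import time
    from pathlib import Path

    BUSY = Path("/sys/class/drm/card0/device/gpu_busy_percent")

    for _ in range(10):                       # ~10 seconds of samples
        pct = int(BUSY.read_text().strip())   # instantaneous load, 0-100
        print(f"GPU busy: {pct:3d}% " + "#" * (pct // 5))
        time.sleep(1)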
Here's the video that got me started:
https://www.youtube.com/watch?v=nQCOTzS5oU0