AI Models and Runtime¶
Running LLMs with Ollama¶
To run LLMs on Dalek nodes, you can use the Open WebUI frontend locally on your machine and connect it directly to a node running the Ollama backend.
Install and Run the OpenWeb-UI Frontend¶
From the quick-start page, there are multiple installation methods; we follow the Docker one here. Run the image on your local machine (Docker pulls it from the registry automatically on first run), making the UI available on localhost:3000:
sudo docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
-e WEBUI_AUTH=False \
-v open-webui:/app/backend/data \
--name open-webui ghcr.io/open-webui/open-webui:main
Info
The important options here are WEBUI_AUTH=False, which launches Open WebUI in
single-user mode (we don't want to handle multiple accounts on a local
machine), and --add-host=host.docker.internal:host-gateway, which makes
host.docker.internal inside the container resolve to your host machine. This
will simplify the connection to the node(s) later.
Run Ollama on a Node¶
Multiple runtimes (also called backends) are available, but Open WebUI handles Ollama out of the box and makes getting models running easy.
You need to allocate the node exclusively to be able to connect to it over SSH.
This can be done by connecting to the front node and then issuing either an
srun or an sbatch command:
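As a sketch of the interactive variant (the partition name az4-n4090 is taken as an example from this page; adapt it and any site-specific srun options to your needs):

```sh
# from the front node: allocate one node exclusively and open a shell on it
srun --partition=az4-n4090 --exclusive --pty bash
```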
Then, on the node, start ollama by running:
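Using the module names described in the Backend APIs section below, this step boils down to:

```sh
module load ollama
ollama serve
```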
Info
This makes Ollama listen to web requests on the node, on port 11434.
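Once serving, the node answers plain HTTP on this port, so any HTTP client can talk to it, not only Open WebUI. A minimal sketch using only the standard library (the model name llama3.2:1b is taken from the Ollama Models table below; the /api/generate endpoint and its "response" field are part of Ollama's documented REST API):

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body expected by Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}


def generate(host: str, model: str, prompt: str, port: int = 11434) -> str:
    """POST a one-shot generation request to an Ollama server, return the text."""
    data = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"http://{host}:{port}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the port forwarding described below in place, `generate("localhost", "llama3.2:1b", "Say hello.")` would return the model's answer.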
Meeting the Ends¶
You then need to set up port forwarding so that your local port 11434 and the node's port 11434 communicate. With a node allocated, run the following on your local machine:
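Based on the description that follows, the command looks like the sketch below, where [FRONT] is a placeholder for the front node's address (not given on this page):

```sh
# -f -N: go to background without launching a shell
# -J:    jump through the front node
# -L:    bind port 11434 on all local interfaces to the node's localhost:11434
ssh -f -N -J [USER_DALEK]@[FRONT] \
    -L '*:11434:localhost:11434' [USER_DALEK]@[NODE_NAME]
```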
Info
This command connects to the node by first jumping through the front node. It
then forwards your local port 11434 to the node's localhost:11434, binding on
all your local interfaces. The command runs in the background without
launching a shell, so if no errors are printed after issuing it, it is working
properly.
[USER_DALEK] is your Dalek login and [NODE_NAME] is the name of the
node you want to connect to (typically something like az4-n4090-2 or
az4-a7900-0).
To check that it works, you can browse
http://localhost:11434/; it should display
Ollama is running.
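Equivalently, from the command line (assuming curl is installed locally):

```sh
# query the forwarded port; per the check above, the server answers
# with the plain-text banner "Ollama is running"
curl http://localhost:11434/
```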
The last step is to open Open WebUI on localhost:3000, navigate to
User > Settings > Admin Settings > Connections > Ollama API, and make sure
that the Ollama API connection is set to http://host.docker.internal:11434.
That's all folks!
Installation Notes¶
Models are installed once and shared by all users. The list of the available
models is given in the Ollama Models section.
By default, you cannot add or update these models (unless you are in the
ai-models group, which is not automatic; see the Technical Details about
ai-models Group section for more information). If you need specific models or
specific versions, you can run unset OLLAMA_MODELS after module load ollama
so that Ollama falls back to your home directory as the default model
library. However, as the disk quota available on the NFS is limited, you
should set Ollama's default models path to the scratch and download your
models there.
For example, you can do the following to store the models on the scratch:
module load ollama
mkdir -p /scratch/$USER/ollama/
OLLAMA_MODELS=/scratch/$USER/ollama/ ollama serve
Tips
Sometimes, large models can take a long time to load into memory. This is because NFS is not as fast as a local disk. If you need to reuse a specific model (or a subset of models) frequently, it may be a good idea to copy it to the scratch workspace.
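A sketch of such a copy, assuming the shared store layout described in the Preinstalled AI Models section (an Ollama model is a manifest plus blobs, so copy the whole store or keep both parts together):

```sh
mkdir -p /scratch/$USER/ollama/
# copy the shared Ollama store (or a pruned subset) onto the faster scratch
rsync -a /mnt/nfs/ai-models/ollama/ /scratch/$USER/ollama/
# then serve from the local copy
OLLAMA_MODELS=/scratch/$USER/ollama/ ollama serve
```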
Danger
Please keep in mind that models are heavy. Whether you store them on the NFS or the scratch, keep an eye on what you really use and clean up occasionally.
Backend APIs¶
Since Ollama v0.15.0, Dalek provides multiple modules to help users target specific backends:
- ollama/x.y.z-cpu: execute on CPU only.
- ollama/x.y.z-cuda: run with CUDA (Nvidia GPUs).
- ollama/x.y.z-rocm: run with ROCm (AMD GPUs).
- ollama/x.y.z-vulkan: run with Vulkan (supports many different GPUs).
- ollama/x.y.z-zauto: automatic version; let Ollama decide how to run.
Even though different modules are provided for a given version, the same Ollama binary is always used; the only difference lies in the specific environment variables used to configure Ollama. For all the modules, the following environment variables are set:
- OLLAMA_KEEP_ALIVE="-1": keep models in RAM and VRAM indefinitely as long as Ollama serves.
- OLLAMA_HOST="0.0.0.0": serve requests from any incoming IP address.
- OLLAMA_MODELS="/mnt/nfs/ai-models/ollama/": as explained in the Installation Notes section, by default Ollama searches for preinstalled models in the /mnt/nfs/ai-models/ollama/ shared folder on the NFS.
The following subsections describe the additional environment variables set by each module.
ollama/x.y.z-cpu¶
The following environment variable is set:
- OLLAMA_LLM_LIBRARY="cpu": force Ollama to run models on the CPU (and thus avoid the GPU).
ollama/x.y.z-cuda¶
The following environment variable is set:
OLLAMA_LLM_LIBRARY="cuda": force Ollama to run models with CUDA.
ollama/x.y.z-rocm¶
The following environment variables are set:
- OLLAMA_LLM_LIBRARY="rocm": force Ollama to run models with ROCm (\(\approx\) HIP).
- HSA_OVERRIDE_GFX_VERSION=11.5.1: only on the az5-a890m partition. If not set, the Radeon 890M iGPU is not supported by ROCm and the models execute on the CPU.
ollama/x.y.z-vulkan¶
The following environment variables are set:
- OLLAMA_VULKAN="1": force Ollama to use the Vulkan backend.
- GGML_VK_VISIBLE_DEVICES="[x]": specify the GPU id to use. [x] is a placeholder; depending on the partition, its default value is:
    - az4-n4090: 1 (select the GeForce RTX 4090 dGPU and NOT the Radeon 610M iGPU).
    - az4-a7900: 1 (select the Radeon RX 7900 XTX dGPU and NOT the Radeon 610M iGPU).
    - iml-ia770: 1 (select the Arc A770 eGPU and NOT the Arc Mobile iGPU).
    - az5-a890m: 0 (select the Radeon 890M iGPU).
Note
Users can override the GGML_VK_VISIBLE_DEVICES environment variable to target
the GPU they want.
Warning
Vulkan does not appear to work on the Intel GPUs of the iml-ia770
partition (Intel Arc A770 eGPU and Intel Arc Mobile iGPU).
ollama/0.9.3-ipex-llm-2.3¶
This is a specific version provided by Intel that combines Ollama with
IPEX-LLM to run on Intel GPUs like
the ones available in the iml-ia770 partition (Arc A770 eGPU and Arc Mobile
iGPU). Before loading this module, you must source the OneAPI script as
follows:
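The path below is the conventional Intel OneAPI install location and is an assumption; adjust it if OneAPI is installed elsewhere on the node:

```sh
# set up the OneAPI environment (compilers, Level Zero runtime, ...)
source /opt/intel/oneapi/setvars.sh
```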
Then you can load the ollama/0.9.3-ipex-llm-2.3 module and the following
environment variables are set:
- ONEAPI_DEVICE_SELECTOR="level_zero:0": select the GPU on which models run:
    - level_zero:0: Intel Arc A770 eGPU (the default),
    - level_zero:1: Intel Arc Mobile iGPU.
- OLLAMA_NUM_GPU=999: make sure all layers of your model run on the Intel GPU; otherwise, some layers may run on the CPU.
- no_proxy=localhost,127.0.0.1
- ZES_ENABLE_SYSMAN="1"
- SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS="1"
The OLLAMA_NUM_GPU, no_proxy, ZES_ENABLE_SYSMAN and
SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS variables are recommended and
detailed in the IPEX-LLM documentation.
Warning
Ollama v0.9.3 is now quite outdated (released Jun 25, 2025) and some models will not run with this version. Please refer to the "Comments" column in the Ollama Models table to check whether a model can run with it.
A model is incompatible with Ollama v0.9.3+IPEX-LLM when the Comments column either says so explicitly or requires an Ollama version higher than v0.9.3. If neither is mentioned, the model should work fine with Ollama v0.9.3+IPEX-LLM on Intel GPUs.
Warning
For now, even though this version of Ollama appears to run on the Arc Mobile iGPU, we haven't been able to generate tokens consistently, and the implementation seems to have some bugs.
This contradicts the IPEX-LLM documentation, which states that Intel Core Ultra processors and Intel Arc A-Series GPUs are supported; even the project's main README claims the Intel Core Ultra iGPU works...
Danger
IPEX-LLM is no longer supported by Intel, and the project was archived in early 2026. To the best of our knowledge, Intel has not yet announced any alternative to support Ollama on its GPUs.
There is an interesting discussion about IPEX-LLM alternatives on Reddit.
Preinstalled AI Models¶
Preinstalled AI models are located in the /mnt/nfs/ai-models folder.
Everyone can read this folder but only users in the ai-models group can modify
it.
For now the /mnt/nfs/ai-models folder contains three sub-folders:
- gguf: models in the GGUF format.
- huggingface-snapshots: models downloaded from Hugging Face; the format can differ depending on the repository.
- ollama: models downloaded from Ollama (proprietary format, only works with Ollama).
The following sub-sections detail the models available in each of these three sub-folders.
GGUF Models¶
| Model Name | From | First rel. date | DL date | Params (B) | Context size (K) | Model size on disk (GB) | Input | Output | MoE | Use cases and architecture | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-oss-20b-mxfp4 | OpenAI | 2025/08 | 2025/10 | 20.000 | 128 | 14.0 | Text | Text | Yes | Conversational LLM | -- |
| llama-2-7b.Q4_0 | Meta | 2023/07 | 2025/11 | 7.000 | 4 | 3.6 | Text | Text | No | Conversational LLM | -- |
Hugging Face Models¶
| Model Name | From | First rel. date | DL date | Params (B) | Context size (K) | Model size on disk (GB) | Input | Output | MoE | Use cases and architecture | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| donut-base | NAVER Labs AI | 2021/11 | 2026/01 | 0.250 | -- | 0.8 | Text, Image, PDF | Text | No | Document understanding (transformer enc-dec) | OCR-free |
| layoutlmv2-base-uncased | Microsoft Research Asia | 2020/12 | 2026/01 | 0.200 | -- | 0.8 | Text, Image | Text | No | Document understanding (transformer enc-only) | With OCR |
| layoutlmv3-base | Microsoft Research Asia | 2022/04 | 2026/01 | 0.100 | -- | 1.9 | Text, Image, PDF | Text | No | Document understanding (transformer enc-only) | With OCR |
| roberta-base-squad2 | Deepset | 2023/06 | 2026/01 | 0.100 | -- | 2.4 | Text | Text | No | Extractive QA (transformer enc-only) | -- |
| distilbert-base-cased | Hugging Face | 2019/09 | 2026/01 | 0.065 | -- | 1.1 | Text | Text | No | Extractive QA (transformer enc-only) | -- |
| bart-large-cnn | Facebook AI | 2019/10 | 2026/01 | 0.400 | -- | 8.0 | Text | Text | No | Text summary (transformer enc-dec) | -- |
| pegasus-cnn_dailymail | Google Research | 2018/12 | 2026/01 | 7.000 | -- | 5.0 | Text | Text | No | Text summary (transformer enc-dec) | -- |
| t5-base | Google Research | 2018/05 | 2026/01 | 0.200 | -- | 4.2 | Text | Text | No | Text summary (transformer enc-dec) | -- |
| PP-OCRv5_server_det | PaddleOCR Team, Baidu | 2025/09 | 2026/01 | 0.100 | -- | 0.1 | Image, PDF | Text, Bounding boxes | No | Multimod CNN + transformer (txt detec + recog) | OCR to raw text |
| idefics2-8b | Hugging Face | 2024/04 | 2026/01 | 0.100 | -- | 32.0 | Text, Image | Text | No | Vision + language with summary and analysis | -- |
| Segment-Anything-Model-2 | Meta | 2024/07 | 2026/01 | 0.033 | -- | 0.1 | Image, Video | Image, Video | No | Vision-only and segmentation | -- |
| gpt-oss-20b | OpenAI | 2025/08 | 2026/02 | 20.900 | 128 | 14.0 | Text | Text | Yes | Conversational LLM | Corrupted |
| Qwen2.5-VL-72B-Instruct | Alibaba Cloud | 2024/09 | 2026/01 | 73.000 | 125 | 137.0 | Text, Image | Text | No | Multimodal LLM | -- |
| Qwen2.5-VL-72B-Instruct-FP8-dynamic | Alibaba Cloud | 2024/09 | 2026/01 | 73.000 | 125 | 72.0 | Text, Image | Text | No | Multimodal LLM | -- |
| Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Meta | 2024/09 | 2026/01 | 89.000 | 128 | 86.0 | Text, Image | Text | No | Multimodal LLM | -- |
| FLUX.1-dev | Black Forest Labs | 2024/08 | 2026/02 | 12.000 | -- | 54.0 | Text | Image | No | Image gen (transformer + diffusion => FLUX) | -- |
| FLUX.2-klein-9B | Black Forest Labs | 2025/11 | 2026/02 | 9.000 | 40 | 50.0 | Text, Image | Image | No | Image gen (transformer + diffusion => FLUX) | Should work on RTX 4090 (~29 GB VRAM) |
| FLUX.2-dev | Black Forest Labs | 2025/11 | 2026/02 | 32.000 | -- | 166.0 | Text, Image | Image | No | Image gen (transformer + diffusion => FLUX) | -- |
| FLUX.2-dev-bnb-4bit | Black Forest Labs | 2025/11 | 2026/02 | 32.000 | -- | 32.0 | Text, Image | Image | No | Image gen (transformer + diffusion => FLUX) | Should work on RTX 4090 (~18 GB VRAM) |
Ollama Models¶
| Model Name | Alias of | From | First rel. date | DL date | Params (B) | Context size (K) | Model size on disk (GB) | Input | Output | MoE | Use cases and architecture | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| codellama:13b | codellama:13b-instruct-q4_0 | Meta | 2023/08 | 2026/02 | 13.000 | 16 | 7.4 | Text | Text | No | LLM for coding | -- |
| codellama:34b | codellama:34b-instruct-q4_0 | Meta | 2023/08 | 2026/02 | 34.000 | 16 | 19.0 | Text | Text | No | LLM for coding | -- |
| deepseek-coder-v2:16b | deepseek-coder-v2:16b-lite-instruct-q4_0 | Deepseek | 2024/07 | 2026/02 | 16.000 | 160 | 8.9 | Text | Text | Yes | LLM for coding | -- |
| deepseek-r1:1.5b | deepseek-r1:1.5b-qwen-distill-q4_K_M | Deepseek | 2025/01 | 2026/02 | 1.500 | 128 | 1.1 | Text | Text | No | Conversational LLM | -- |
| deepseek-r1:7b | deepseek-r1:7b-qwen-distill-q4_K_M | Deepseek | 2025/01 | 2026/02 | 7.000 | 128 | 4.7 | Text | Text | No | Conversational LLM | -- |
| deepseek-r1:8b | deepseek-r1:8b-0528-qwen3-q4_K_M | Deepseek | 2025/01 | 2026/02 | 8.000 | 128 | 5.2 | Text | Text | No | Conversational LLM | -- |
| deepseek-r1:14b | deepseek-r1:14b-qwen-distill-q4_K_M | Deepseek | 2025/01 | 2025/10 | 14.800 | 128 | 9.0 | Text | Text | No | Conversational LLM | -- |
| deepseek-r1:32b | deepseek-r1:32b-qwen-distill-q4_K_M | Deepseek | 2025/01 | 2026/02 | 32.000 | 128 | 20.0 | Text | Text | No | Conversational LLM | -- |
| deepseek-r1:70b | deepseek-r1:70b-llama-distill-q4_K_M | Deepseek | 2025/01 | 2026/02 | 70.000 | 128 | 43.0 | Text | Text | No | Conversational LLM | -- |
| devstral-small-2:24b | devstral-small-2:24b-instruct-2512-q4_K_M | Mistral AI | 2025/12 | 2026/01 | 24.000 | 384 | 15.0 | Text, Image | Text | No | LLM for coding | Incompatible with Ollama v0.9.3+IPEX-LLM |
| aiasistentworld/ERNIE-4.5-21B-A3B-Thinking-LLM:latest | Q4_K_M | Baidu | 2025/06 | 2026/02 | 21.800 | 128 | 13.0 | Text | Text | Yes | Conversational LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| gemma3:270m | gemma3:270m-it-q8_0 | Google DeepMind | 2025/03 | 2026/02 | 0.270 | 32 | 0.3 | Text | Text | No | Conversational LLM | Requires Ollama 0.6 or later |
| gemma3:1b | gemma3:1b-it-q4_K_M | Google DeepMind | 2025/03 | 2026/02 | 1.000 | 32 | 0.8 | Text | Text | No | Conversational LLM | Requires Ollama 0.6 or later |
| gemma3:4b | gemma3:4b-it-q4_K_M | Google DeepMind | 2025/03 | 2026/02 | 4.000 | 128 | 3.3 | Text, Image | Text | No | Conversational LLM | Requires Ollama 0.6 or later |
| gemma3:12b | gemma3:12b-it-q4_K_M | Google DeepMind | 2025/03 | 2026/02 | 12.000 | 128 | 8.1 | Text, Image | Text | No | Conversational LLM | Requires Ollama 0.6 or later |
| gemma3:27b | gemma3:27b-it-q4_K_M | Google DeepMind | 2025/03 | 2026/02 | 27.000 | 128 | 17.0 | Text, Image | Text | No | Conversational LLM | Requires Ollama 0.6 or later |
| gemma4:e2b | gemma4:e2b-it-q4_K_M | Google DeepMind | 2026/04 | 2026/04 | 5.000 | 128 | 7.2 | Text, Image, Video | Text | Yes | Multimodal LLM | Requires Ollama 0.20.0 or later |
| gemma4:e4b | gemma4:e4b-it-q4_K_M | Google DeepMind | 2026/04 | 2026/04 | 8.000 | 128 | 9.6 | Text, Image, Video | Text | Yes | Multimodal LLM | Requires Ollama 0.20.0 or later |
| gemma4:26b | gemma4:26b-a4b-it-q4_K_M | Google DeepMind | 2026/04 | 2026/04 | 26.000 | 256 | 18.0 | Text, Image, Video | Text | Yes | Multimodal LLM | Requires Ollama 0.20.0 or later |
| gemma4:31b | gemma4:31b-it-q4_K_M | Google DeepMind | 2026/04 | 2026/04 | 31.000 | 256 | 20.0 | Text, Image, Video | Text | No | Multimodal LLM | Requires Ollama 0.20.0 or later |
| glm4:9b | glm4:9b-chat-q4_0 | Zhipu AI | 2024/06 | 2025/10 | 9.000 | 128 | 5.5 | Text | Text | No | Conversational LLM | Requires Ollama 0.2 or later |
| glm-4.7-flash:q4_K_M | -- | Zhipu AI | 2026/01 | 2025/10 | 30.000 | 198 | 19.0 | Text | Text | Yes | Conversational LLM | Requires Ollama 0.14.3 or later |
| glm-4.7-flash:q8_0 | -- | Zhipu AI | 2026/01 | 2025/10 | 30.000 | 198 | 32.0 | Text | Text | Yes | Conversational LLM | Requires Ollama 0.14.3 or later |
| glm-4.7-flash:bf16 | -- | Zhipu AI | 2026/01 | 2025/10 | 30.000 | 198 | 60.0 | Text | Text | Yes | Conversational LLM | Requires Ollama 0.14.3 or later |
| gpt-oss:20b | -- | OpenAI | 2025/08 | 2025/10 | 20.900 | 128 | 14.0 | Text | Text | Yes | Conversational LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| gpt-oss:120b | -- | OpenAI | 2025/08 | 2025/10 | 120.000 | 128 | 65.0 | Text | Text | Yes | Conversational LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| granite4:350m | granite4:350m-bf16 | IBM | 2025/11 | 2026/02 | 0.350 | 32 | 0.7 | Text | Text | No | Conversational LLM | -- |
| granite4:350m-h | granite4:350m-h-q8_0 | IBM | 2025/11 | 2026/02 | 0.350 | 32 | 0.4 | Text | Text | Yes | Conversational LLM | -- |
| granite4:1b | granite4:1b-bf16 | IBM | 2025/11 | 2026/02 | 1.000 | 128 | 3.3 | Text | Text | No | Conversational LLM | -- |
| granite4:1b-h | granite4:1b-h-q8_0 | IBM | 2025/11 | 2026/02 | 1.000 | 1000000 | 1.6 | Text | Text | Yes | Conversational LLM | -- |
| granite4:3b | granite4:micro (Q4_K_M) | IBM | 2025/11 | 2026/02 | 3.000 | 128 | 2.1 | Text | Text | No | Conversational LLM | -- |
| granite4:3b-h | granite4:micro-h (Q4_K_M) | IBM | 2025/11 | 2026/02 | 3.000 | 1000000 | 1.9 | Text | Text | Yes | Conversational LLM | -- |
| granite4:7b-a1b-h | granite4:tiny-h (Q4_K_M) | IBM | 2025/11 | 2026/02 | 7.000 | 1000000 | 4.2 | Text | Text | Yes | Conversational LLM | -- |
| granite4:32b-a9b-h | granite4:small-h (Q4_K_M) | IBM | 2025/11 | 2026/02 | 32.000 | 1000000 | 19.0 | Text | Text | Yes | Conversational LLM | -- |
| internlm2.5:1.8b-chat | -- | Shanghai AI Laboratory | 2024/07 | 2025/02 | 1.800 | 32 | 3.8 | Text | Text | No | Conversational LLM | -- |
| internlm2.5:7b-chat | -- | Shanghai AI Laboratory | 2024/07 | 2025/02 | 7.000 | 32 | 15.0 | Text | Text | No | Conversational LLM | -- |
| internlm2.5:7b-chat-1m | -- | Shanghai AI Laboratory | 2024/07 | 2025/02 | 7.000 | 256 | 15.0 | Text | Text | No | Conversational LLM | -- |
| internlm2.5:20b-chat | -- | Shanghai AI Laboratory | 2024/07 | 2025/02 | 20.000 | 32 | 40.0 | Text | Text | No | Conversational LLM | -- |
| internlm3-8b-instruct | -- | Shanghai AI Laboratory | 2025/01 | 2025/02 | 8.000 | 32 | 18.0 | Text | Text | No | Conversational LLM | -- |
| llama2:7b | llama2:7b-chat-q4_0 | Meta | 2023/02 | 2025/02 | 7.000 | 4 | 3.8 | Text | Text | No | Conversational LLM | -- |
| llama2:13b | llama2:13b-chat-q4_0 | Meta | 2023/02 | 2025/02 | 13.000 | 4 | 7.4 | Text | Text | No | Conversational LLM | -- |
| llama2:70b | llama2:70b-chat-q4_0 | Meta | 2023/02 | 2025/02 | 70.000 | 4 | 39.0 | Text | Text | No | Conversational LLM | -- |
| llama3.1:8b | llama3.1:8b-instruct-q4_K_M | Meta | 2024/07 | 2025/02 | 8.000 | 128 | 4.9 | Text | Text | No | Conversational LLM | -- |
| llama3.1:70b | llama3.1:70b-instruct-q4_K_M | Meta | 2024/07 | 2025/02 | 70.000 | 128 | 43.0 | Text | Text | No | Conversational LLM | -- |
| llama3.2:1b | llama3.2:1b-instruct-q8_0 | Meta | 2024/09 | 2025/02 | 1.000 | 128 | 1.3 | Text | Text | No | Conversational LLM | -- |
| llama3.2:3b | llama3.2:3b-instruct-q4_K_M | Meta | 2024/09 | 2025/02 | 3.000 | 128 | 2.0 | Text | Text | No | Conversational LLM | -- |
| llava:13b | llava:13b-v1.6-vicuna-q4_0 | Microsoft Research | 2023/10 | 2026/02 | 13.000 | 4 | 8.0 | Text, Image | Text | No | Multimodal LLM | -- |
| llava:34b | llava:34b-v1.6-q4_0 | Microsoft Research | 2023/10 | 2026/02 | 34.000 | 4 | 20.0 | Text, Image | Text | No | Multimodal LLM | -- |
| llava-llama3:8b | llava-llama3:8b-v1.1-q4_0 | Microsoft Research | 2024/04 | 2026/02 | 8.000 | 8 | 5.5 | Text, Image | Text | No | Multimodal LLM | -- |
| mistral:7b | mistral:7b-instruct-v0.3-q4_K_M | Mistral AI | 2023/09 | 2026/03 | 7.000 | 32 | 4.4 | Text | Text | No | Conversational LLM | -- |
| mistral-small3.2:24b | mistral-small3.2:24b-instruct-2506-q4_K_M | Mistral AI | 2025/06 | 2026/01 | 24.000 | 128 | 15.0 | Text, Image | Text | No | Multimodal LLM | -- |
| mistral-nemo | mistral-nemo:12b-instruct-2407-q4_0 | Mistral AI | 2024/07 | 2026/03 | 12.000 | 1000 | 7.1 | Text | Text | No | Conversational LLM | -- |
| mixtral:8x7b | mixtral:8x7b-instruct-v0.1-q4_0 | Mistral AI | 2023/12 | 2026/01 | 57.000 | 32 | 26.0 | Text | Text | Yes | Conversational LLM | -- |
| mixtral:8x22b | mixtral:8x22b-instruct-v0.1-q4_0 | Mistral AI | 2023/12 | 2025/10 | 140.600 | 64 | 80.0 | Text | Text | Yes | Conversational LLM | -- |
| nomic-embed-text-v2-moe | -- | Nomic AI | 2025/02 | 2026/01 | 0.305 | 512 | 1.0 | Text | Text | Yes | LLM for multilingual retrieval | -- |
| olmo-3:7b | olmo-3:7b-think-q4_K_M | Allen AI | 2025/11 | 2026/02 | 7.000 | 64 | 4.5 | Text | Text | No | Conversational LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| olmo-3:32b | olmo-3:32b-think-q4_K_M | Allen AI | 2025/11 | 2026/02 | 32.000 | 64 | 19.0 | Text | Text | No | Conversational LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| olmo-3.1:32b | olmo-3.1:32b-think-q4_K_M | Allen AI | 2025/12 | 2026/02 | 32.000 | 64 | 19.0 | Text | Text | No | Conversational LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| olmo-3.1:32b-instruct | olmo-3.1:32b-instruct-q4_K_M | Allen AI | 2025/12 | 2026/02 | 32.000 | 64 | 19.0 | Text | Text | No | Conversational LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| phi4:14b | phi4:14b-q4_K_M | Microsoft | 2025/01 | 2026/02 | 14.000 | 16 | 9.1 | Text | Text | No | Conversational LLM | -- |
| phi4-mini:3.8b | phi4-mini:3.8b-q4_K_M | Microsoft | 2025/01 | 2026/02 | 3.800 | 128 | 2.5 | Text | Text | No | Conversational LLM | -- |
| phi4-reasoning:14b | phi4-reasoning:14b-q4_K_M | Microsoft | 2025/04 | 2026/02 | 14.000 | 16 | 11.0 | Text | Text | No | Conversational LLM | -- |
| phi4-mini-reasoning:3.8b | phi4-mini-reasoning:3.8b-q4_K_M | Microsoft | 2025/01 | 2026/02 | 3.800 | 128 | 3.2 | Text | Text | No | Conversational LLM | -- |
| qwen2.5:0.5b | qwen2.5:0.5b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 0.500 | 32 | 0.4 | Text | Text | No | Conversational LLM | -- |
| qwen2.5:1.5b | qwen2.5:1.5b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 1.500 | 32 | 1.0 | Text | Text | No | Conversational LLM | -- |
| qwen2.5:3b | qwen2.5:3b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 3.000 | 32 | 1.9 | Text | Text | No | Conversational LLM | -- |
| qwen2.5:7b | qwen2.5:7b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 7.000 | 32 | 4.7 | Text | Text | No | Conversational LLM | -- |
| qwen2.5:14b | qwen2.5:14b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 14.000 | 32 | 9.0 | Text | Text | No | Conversational LLM | -- |
| qwen2.5:32b | qwen2.5:32b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 32.000 | 32 | 20.0 | Text | Text | No | Conversational LLM | -- |
| qwen2.5:72b | qwen2.5:72b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 72.000 | 32 | 47.0 | Text | Text | No | Conversational LLM | -- |
| qwen2.5vl:7b | qwen2.5vl:7b-q4_K_M | Alibaba Cloud | 2024/12 | 2026/02 | 32.000 | 125 | 6.0 | Text, Image | Text | No | Multimodal LLM | -- |
| qwen2.5vl:32b | qwen2.5vl:32b-q4_K_M | Alibaba Cloud | 2024/12 | 2026/02 | 32.000 | 125 | 21.0 | Text, Image | Text | No | Multimodal LLM | -- |
| qwen3:0.6b | qwen3:0.6b-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 0.600 | 40 | 0.5 | Text | Text | No | Conversational LLM | -- |
| qwen3:1.7b | qwen3:1.7b-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 1.700 | 40 | 1.4 | Text | Text | No | Conversational LLM | -- |
| qwen3:4b | qwen3:4b-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 4.000 | 256 | 2.5 | Text | Text | No | Conversational LLM | -- |
| qwen3:8b | qwen3:4b-thinking-2507-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 8.000 | 40 | 5.2 | Text | Text | No | Conversational LLM | -- |
| qwen3:14b | qwen3:14b-thinking-2507-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 14.000 | 40 | 9.3 | Text | Text | No | Conversational LLM | -- |
| qwen3:30b | qwen3:30b-a3b-thinking-2507-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 30.500 | 256 | 19.0 | Text | Text | Yes | Conversational LLM | -- |
| qwen3:32b | qwen3:32b-q4_K_M | Alibaba Cloud | 2025/04 | 2026/02 | 32.000 | 40 | 20.0 | Text | Text | No | Conversational LLM | -- |
| qwen3-coder:30b | qwen3-coder:30b-a3b-q4_K_M | Alibaba Cloud | 2025/08 | 2025/10 | 30.500 | 256 | 19.0 | Text | Text | Yes | LLM for coding | -- |
| qwen3-coder-next:latest | qwen3-coder-next:q4_K_M | Alibaba Cloud | 2026/02 | 2026/03 | 80.000 | 256 | 52.0 | Text | Text | Yes | LLM for coding | -- |
| qwen3-vl:2b | qwen3-vl:2b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 2.000 | 256 | 1.9 | Text, Image | Text | No | Multimodal LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| qwen3-vl:4b | qwen3-vl:4b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 4.000 | 256 | 3.3 | Text, Image | Text | No | Multimodal LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| qwen3-vl:8b | qwen3-vl:8b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 8.000 | 256 | 6.1 | Text, Image | Text | No | Multimodal LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| qwen3-vl:30b | qwen3-vl:30b-a3b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 30.000 | 256 | 20.0 | Text, Image | Text | Yes | Multimodal LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| qwen3-vl:32b | qwen3-vl:32b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 32.000 | 256 | 21.0 | Text, Image | Text | No | Multimodal LLM | Incompatible with Ollama v0.9.3+IPEX-LLM |
| qwen3.5:0.8b | qwen3.5:0.8b-q8_0 | Alibaba Cloud | 2026/02 | 2026/03 | 0.800 | 256 | 1.0 | Text, Image | Text | No | Multimodal LLM | Requires Ollama 0.17.4 or later |
| qwen3.5:2b | qwen3.5:2b-q8_0 | Alibaba Cloud | 2026/02 | 2026/03 | 2.000 | 256 | 2.7 | Text, Image | Text | No | Multimodal LLM | Requires Ollama 0.17.4 or later |
| qwen3.5:4b | qwen3.5:4b-q4_K_M | Alibaba Cloud | 2026/02 | 2026/03 | 4.000 | 256 | 3.4 | Text, Image | Text | No | Multimodal LLM | Requires Ollama 0.17.4 or later |
| qwen3.5:9b | qwen3.5:9b-q4_K_M | Alibaba Cloud | 2026/02 | 2026/03 | 9.000 | 256 | 6.6 | Text, Image | Text | No | Multimodal LLM | Requires Ollama 0.17.4 or later |
| qwen3.5:27b | qwen3.5:27b-q4_K_M | Alibaba Cloud | 2026/02 | 2026/03 | 27.000 | 256 | 17.0 | Text, Image | Text | No | Multimodal LLM | Requires Ollama 0.17.4 or later |
| qwen3.5:35b | qwen3.5:35b-a3b-q4_K_M | Alibaba Cloud | 2026/02 | 2026/03 | 35.000 | 256 | 24.0 | Text, Image | Text | Yes | Multimodal LLM | Requires Ollama 0.17.4 or later |
| qwen3.5:122b | qwen3.5:122b-a10b-q4_K_M | Alibaba Cloud | 2026/02 | 2026/03 | 122.000 | 256 | 81.0 | Text, Image | Text | Yes | Multimodal LLM | Requires Ollama 0.17.4 or later |
Technical Details about ai-models Group¶
For users in the ai-models group, it has been ensured that newly created files
and folders get the ai-models group by default. For this, the setgid bit has
been set on /mnt/nfs/ai-models and its sub-folders:
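The exact commands used are not reproduced here; a plausible sketch (assuming the tree is first given to the ai-models group) is:

```sh
# give the whole tree to the ai-models group
sudo chgrp -R ai-models /mnt/nfs/ai-models
# set the setgid bit on every directory so new entries inherit the group
sudo find /mnt/nfs/ai-models -type d -exec chmod g+s {} +
```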
Then, still in the /mnt/nfs/ai-models folder, the default group rights have
been updated to force rwx on newly created folders and rw on newly created
files:
# install ACL to have the `setfacl` command
sudo apt install acl
# apply ACL to existing files
find /mnt/nfs/ai-models -type d -exec sudo setfacl -m g:ai-models:rwx {} +
find /mnt/nfs/ai-models -type f -exec sudo setfacl -m g:ai-models:rw- {} +
# apply ACL to the future files
sudo setfacl -R -d -m g:ai-models:rwx /mnt/nfs/ai-models