AI Models and Runtime¶
Running LLMs with Ollama¶
To run LLMs on Dalek nodes, you can use the Open WebUI frontend locally on your machine and connect it directly to a node running the Ollama backend.
Install and Run the Open WebUI Frontend¶
The quick-start page lists several installation methods; we follow the Docker one here. Pull the image from the Docker registry:
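A minimal sketch of the pull step (the image name and tag match the run command below):

```shell
# Pull the Open WebUI image from the GitHub container registry
sudo docker pull ghcr.io/open-webui/open-webui:main
```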
Run it on your local machine, making it available on localhost:3000:
sudo docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
-e WEBUI_AUTH=False \
-v open-webui:/app/backend/data \
--name open-webui ghcr.io/open-webui/open-webui:main
Info
The important options here are that we launch Open WebUI in single-user mode
(we don't want to handle multiple accounts on a local machine), and that the
--add-host=host.docker.internal:host-gateway option makes
host.docker.internal inside the Docker container resolve to the
localhost of your host OS. This will simplify the connection to the
node(s) later.
Run a Backend on a Node¶
Multiple backends are available, but Open WebUI supports Ollama out of the box, which makes getting models running easy.
You need to allocate the node exclusively to be able to connect to it over SSH.
This can be done by connecting to the front node and then issuing either an
srun or an sbatch command:
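For example, an interactive allocation could look like the following (a sketch: the exact options depend on your Slurm configuration, and [NODE_NAME] is the node you want, as described below):

```shell
# Request an exclusive interactive allocation on a specific node
srun --nodes=1 --exclusive --nodelist=[NODE_NAME] --pty bash
```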
Then, on the node, start ollama by running:
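Assuming Ollama is provided as an environment module (as used in the Installation Notes section), a minimal sketch:

```shell
# Load the Ollama module and start the server on the node
module load ollama
ollama serve
```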
Info
This makes Ollama listen for web requests on port 11434 of the node.
Meeting the Ends¶
You then need to perform port forwarding so that your local port 11434 communicates with the node's port 11434. With a node allocated, run the following on your local machine:
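A sketch of the forwarding command (the front node hostname, written front here, is an assumption; adapt it to your cluster):

```shell
# -f -N: run in the background without launching a shell
# -J: jump through the front node
# -L 0.0.0.0:...: bind port 11434 on all local interfaces
ssh -f -N -J [USER_DALEK]@front \
    -L 0.0.0.0:11434:localhost:11434 \
    [USER_DALEK]@[NODE_NAME]
```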
Info
This command connects to the node by first performing a jump through the
front node. It then performs port forwarding to bind your port 11434 to the
node's localhost:11434. The binding is done on all your local interfaces.
The command runs in the background without launching a shell, so if no
errors are printed after issuing it, it is working properly.
[USER_DALEK] is your Dalek login and [NODE_NAME] is the name of the
node you want to connect to (typically something like az4-n4090-2 or
az4-a7900-0).
To check that it works, browse to
http://localhost:11434/; it should display
Ollama is running.
The last step is to navigate to the Open WebUI Admin Settings (on the
localhost:3000 page, User > Settings > Admin Settings > Connections >
Ollama API) and make sure that the Ollama API connection is set to
http://host.docker.internal:11434.
That's all folks!
Installation Notes¶
Models are installed once and shared by all users. The list of available
models is given in the Preinstalled AI Models section below. By default, you
cannot add or update these models (unless you are in the ai-models group,
which is not automatic). If you need specific models or specific versions,
run unset OLLAMA_MODELS after module load ollama so that Ollama falls back
to your home directory as the default model library. As the disk quota
available on the NFS is limited, you should instead point Ollama's models
path to the scratch and download your models there.
For example, you can do the following to store the models on the scratch:
module load ollama
mkdir -p /scratch/$USER/ollama/
OLLAMA_MODELS=/scratch/$USER/ollama/ ollama serve
Danger
Please keep in mind that models are heavy. When you store them on the NFS or the scratch, keep an eye on what you actually use and clean up occasionally.
Preinstalled AI Models¶
Preinstalled AI models are located in the /mnt/nfs/ai-models folder.
Everyone can read this folder but only users in the ai-models group can modify
it.
For now the /mnt/nfs/ai-models folder contains three sub-folders:
- gguf: Models in the GGUF format.
- huggingface-snapshots: Models downloaded from Hugging Face; the format can differ depending on the repository.
- ollama: Models downloaded from Ollama (proprietary format, only works with Ollama).
The following sub-sections detail the models available in each of these sub-folders.
GGUF Models¶
| Model Name | From | First rel. date | DL date | Params (B) | Context size (K) | Model size on disk (GB) | Input | Output | MoE | Use cases and architecture | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-oss-20b-mxfp4 | OpenAI | 2025/08 | 2025/10 | 20.000 | 128 | 14.0 | Text | Text | Yes | General LLM | -- |
| llama-2-7b.Q4_0 | Meta | 2023/07 | 2025/11 | 7.000 | 4 | 3.6 | Text | Text | No | General LLM | -- |
Hugging Face Models¶
| Model Name | From | First rel. date | DL date | Params (B) | Context size (K) | Model size on disk (GB) | Input | Output | MoE | Use cases and architecture | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| donut-base | NAVER Labs AI | 2021/11 | 2026/01 | 0.250 | -- | 0.8 | Text, Image, PDF | Text | No | Document understanding (transformer enc-dec) | OCR-free |
| layoutlmv2-base-uncased | Microsoft Research Asia | 2020/12 | 2026/01 | 0.200 | -- | 0.8 | Text, Image | Text | No | Document understanding (transformer enc-only) | With OCR |
| layoutlmv3-base | Microsoft Research Asia | 2022/04 | 2026/01 | 0.100 | -- | 1.9 | Text, Image, PDF | Text | No | Document understanding (transformer enc-only) | With OCR |
| roberta-base-squad2 | Deepset | 2023/06 | 2026/01 | 0.100 | -- | 2.4 | Text | Text | No | Extractive QA (transformer enc-only) | -- |
| distilbert-base-cased | Hugging Face | 2019/09 | 2026/01 | 0.065 | -- | 1.1 | Text | Text | No | Extractive QA (transformer enc-only) | -- |
| bart-large-cnn | Facebook AI | 2019/10 | 2026/01 | 0.400 | -- | 8.0 | Text | Text | No | Text summary (transformer enc-dec) | -- |
| pegasus-cnn_dailymail | Google Research | 2018/12 | 2026/01 | 7.000 | -- | 5.0 | Text | Text | No | Text summary (transformer enc-dec) | -- |
| t5-base | Google Research | 2018/05 | 2026/01 | 0.200 | -- | 4.2 | Text | Text | No | Text summary (transformer enc-dec) | -- |
| PP-OCRv5_server_det | PaddleOCR Team, Baidu | 2025/09 | 2026/01 | 0.100 | -- | 0.1 | Image, PDF | Text, Bounding boxes | No | Multimod CNN + transformer (txt detec + recog) | OCR to raw text |
| idefics2-8b | Hugging Face | 2024/04 | 2026/01 | 0.100 | -- | 32.0 | Text, Image | Text | No | Vision + language with summary and analysis | -- |
| Segment-Anything-Model-2 | Meta | 2024/07 | 2026/01 | 0.033 | -- | 0.1 | Image, Video | Image, Video | No | Vision-only and segmentation | -- |
| gpt-oss-20b | OpenAI | 2025/08 | 2026/02 | 20.900 | 128 | 14.0 | Text | Text | Yes | General LLM | Corrupted |
| Qwen2.5-VL-72B-Instruct | Alibaba Cloud | 2024/09 | 2026/01 | 73.000 | 125 | 137.0 | Text, Image | Text | No | LVLM | -- |
| Qwen2.5-VL-72B-Instruct-FP8-dynamic | Alibaba Cloud | 2024/09 | 2026/01 | 73.000 | 125 | 72.0 | Text, Image | Text | No | LVLM | -- |
| Llama-3.2-90B-Vision-Instruct-FP8-dynamic | Meta | 2024/09 | 2026/01 | 89.000 | 128 | 86.0 | Text, Image | Text | No | LVLM | -- |
| FLUX.1-dev | Black Forest Labs | 2024/08 | 2026/02 | 12.000 | -- | 54.0 | Text | Image | No | Image gen (transformer + diffusion => FLUX) | -- |
| FLUX.2-klein-9B | Black Forest Labs | 2025/11 | 2026/02 | 9.000 | 40 | 50.0 | Text, Image | Image | No | Image gen (transformer + diffusion => FLUX) | Should work on RTX 4090 (~29 GB VRAM) |
| FLUX.2-dev | Black Forest Labs | 2025/11 | 2026/02 | 32.000 | -- | 166.0 | Text, Image | Image | No | Image gen (transformer + diffusion => FLUX) | -- |
| FLUX.2-dev-bnb-4bit | Black Forest Labs | 2025/11 | 2026/02 | 32.000 | -- | 32.0 | Text, Image | Image | No | Image gen (transformer + diffusion => FLUX) | Should work on RTX 4090 (~18 GB VRAM) |
Ollama Models¶
| Model Name | Alias of | From | First rel. date | DL date | Params (B) | Context size (K) | Model size on disk (GB) | Input | Output | MoE | Use cases and architecture | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| deepseek-r1:1.5b | deepseek-r1:1.5b-qwen-distill-q4_K_M | Deepseek | 2025/01 | 2026/02 | 1.500 | 128 | 1.1 | Text | Text | No | General LLM | -- |
| deepseek-r1:7b | deepseek-r1:7b-qwen-distill-q4_K_M | Deepseek | 2025/01 | 2026/02 | 7.000 | 128 | 4.7 | Text | Text | No | General LLM | -- |
| deepseek-r1:8b | deepseek-r1:8b-0528-qwen3-q4_K_M | Deepseek | 2025/01 | 2026/02 | 8.000 | 128 | 5.2 | Text | Text | No | General LLM | -- |
| deepseek-r1:14b | deepseek-r1:14b-qwen-distill-q4_K_M | Deepseek | 2025/01 | 2025/10 | 14.800 | 128 | 9.0 | Text | Text | No | General LLM | -- |
| deepseek-r1:32b | deepseek-r1:32b-qwen-distill-q4_K_M | Deepseek | 2025/01 | 2026/02 | 32.000 | 128 | 20.0 | Text | Text | No | General LLM | -- |
| deepseek-r1:70b | deepseek-r1:70b-llama-distill-q4_K_M | Deepseek | 2025/01 | 2026/02 | 70.000 | 128 | 43.0 | Text | Text | No | General LLM | -- |
| aiasistentworld/ERNIE-4.5-21B-A3B-Thinking-LLM:latest | Q4_K_M | Baidu | 2025/06 | 2026/02 | 21.800 | 128 | 13.0 | Text | Text | Yes | General LLM | Incompatible with Ollama v0.9.3+ipex-llm |
| gemma3:270m | gemma3:270m-it-q8_0 | Google DeepMind | 2025/03 | 2026/02 | 0.270 | 32 | 0.3 | Text | Text | No | General LLM | Requires Ollama 0.6 or later |
| gemma3:1b | gemma3:1b-it-q4_K_M | Google DeepMind | 2025/03 | 2026/02 | 1.000 | 32 | 0.8 | Text | Text | No | General LLM | Requires Ollama 0.6 or later |
| gemma3:4b | gemma3:4b-it-q4_K_M | Google DeepMind | 2025/03 | 2026/02 | 4.000 | 128 | 3.3 | Text, Image | Text | No | General LLM | Requires Ollama 0.6 or later |
| gemma3:12b | gemma3:12b-it-q4_K_M | Google DeepMind | 2025/03 | 2026/02 | 12.000 | 128 | 8.1 | Text, Image | Text | No | General LLM | Requires Ollama 0.6 or later |
| gemma3:27b | gemma3:27b-it-q4_K_M | Google DeepMind | 2025/03 | 2026/02 | 27.000 | 128 | 17.0 | Text, Image | Text | No | General LLM | Requires Ollama 0.6 or later |
| glm4:9b | glm4:9b-chat-q4_0 | Zhipu AI | 2024/06 | 2025/10 | 9.000 | 128 | 5.5 | Text | Text | No | General LLM | Requires Ollama 0.2 or later |
| glm-4.7-flash:q4_K_M | -- | Zhipu AI | 2026/01 | 2025/10 | 30.000 | 198 | 19.0 | Text | Text | Yes | General LLM | Requires Ollama 0.14.3 or later |
| glm-4.7-flash:q8_0 | -- | Zhipu AI | 2026/01 | 2025/10 | 30.000 | 198 | 32.0 | Text | Text | Yes | General LLM | Requires Ollama 0.14.3 or later |
| glm-4.7-flash:bf16 | -- | Zhipu AI | 2026/01 | 2025/10 | 30.000 | 198 | 60.0 | Text | Text | Yes | General LLM | Requires Ollama 0.14.3 or later |
| gpt-oss:20b | -- | OpenAI | 2025/08 | 2025/10 | 20.900 | 128 | 14.0 | Text | Text | Yes | General LLM | Incompatible with Ollama v0.9.3+ipex-llm |
| gpt-oss:120b | -- | OpenAI | 2025/08 | 2025/10 | 120.000 | 128 | 65.0 | Text | Text | Yes | General LLM | Incompatible with Ollama v0.9.3+ipex-llm |
| granite4:350m | granite4:350m-bf16 | IBM | 2025/11 | 2026/02 | 0.350 | 32 | 0.7 | Text | Text | No | General LLM | -- |
| granite4:350m-h | granite4:350m-h-q8_0 | IBM | 2025/11 | 2026/02 | 0.350 | 32 | 0.4 | Text | Text | Yes | General LLM | -- |
| granite4:1b | granite4:1b-bf16 | IBM | 2025/11 | 2026/02 | 1.000 | 128 | 3.3 | Text | Text | No | General LLM | -- |
| granite4:1b-h | granite4:1b-h-q8_0 | IBM | 2025/11 | 2026/02 | 1.000 | 1000000 | 1.6 | Text | Text | Yes | General LLM | -- |
| granite4:3b | granite4:micro (Q4_K_M) | IBM | 2025/11 | 2026/02 | 3.000 | 128 | 2.1 | Text | Text | No | General LLM | -- |
| granite4:3b-h | granite4:micro-h (Q4_K_M) | IBM | 2025/11 | 2026/02 | 3.000 | 1000000 | 1.9 | Text | Text | Yes | General LLM | -- |
| granite4:7b-a1b-h | granite4:tiny-h (Q4_K_M) | IBM | 2025/11 | 2026/02 | 7.000 | 1000000 | 4.2 | Text | Text | Yes | General LLM | -- |
| granite4:32b-a9b-h | granite4:small-h (Q4_K_M) | IBM | 2025/11 | 2026/02 | 32.000 | 1000000 | 19.0 | Text | Text | Yes | General LLM | -- |
| internlm2.5:1.8b-chat | -- | Shanghai AI Laboratory | 2024/07 | 2025/02 | 1.800 | 32 | 3.8 | Text | Text | No | General LLM | -- |
| internlm2.5:7b-chat | -- | Shanghai AI Laboratory | 2024/07 | 2025/02 | 7.000 | 32 | 15.0 | Text | Text | No | General LLM | -- |
| internlm2.5:7b-chat-1m | -- | Shanghai AI Laboratory | 2024/07 | 2025/02 | 7.000 | 256 | 15.0 | Text | Text | No | General LLM | -- |
| internlm2.5:20b-chat | -- | Shanghai AI Laboratory | 2024/07 | 2025/02 | 20.000 | 32 | 40.0 | Text | Text | No | General LLM | -- |
| internlm3-8b-instruct | -- | Shanghai AI Laboratory | 2025/01 | 2025/02 | 8.000 | 32 | 18.0 | Text | Text | No | General LLM | -- |
| llama2:7b | llama2:7b-chat-q4_0 | Meta | 2023/02 | 2025/02 | 7.000 | 4 | 3.8 | Text | Text | No | General LLM | -- |
| llama2:13b | llama2:13b-chat-q4_0 | Meta | 2023/02 | 2025/02 | 13.000 | 4 | 7.4 | Text | Text | No | General LLM | -- |
| llama2:70b | llama2:70b-chat-q4_0 | Meta | 2023/02 | 2025/02 | 70.000 | 4 | 39.0 | Text | Text | No | General LLM | -- |
| llama3.1:8b | llama3.1:8b-instruct-q4_K_M | Meta | 2024/07 | 2025/02 | 8.000 | 128 | 4.9 | Text | Text | No | General LLM | -- |
| llama3.1:70b | llama3.1:70b-instruct-q4_K_M | Meta | 2024/07 | 2025/02 | 70.000 | 128 | 43.0 | Text | Text | No | General LLM | -- |
| llama3.2:1b | llama3.2:1b-instruct-q8_0 | Meta | 2024/09 | 2025/02 | 1.000 | 128 | 1.3 | Text | Text | No | General LLM | -- |
| llama3.2:3b | llama3.2:3b-instruct-q4_K_M | Meta | 2024/09 | 2025/02 | 3.000 | 128 | 2.0 | Text | Text | No | General LLM | -- |
| mistral:7b | mistral:7b-instruct-v0.3-q4_K_M | Mistral AI | 2023/09 | 2026/03 | 7.000 | 32 | 4.4 | Text | Text | No | General LLM | -- |
| mistral-nemo | mistral-nemo:12b-instruct-2407-q4_0 | Mistral AI | 2024/07 | 2026/03 | 12.000 | 1000 | 7.1 | Text | Text | No | General LLM | -- |
| mixtral:8x7b | mixtral:8x7b-instruct-v0.1-q4_0 | Mistral AI | 2023/12 | 2026/01 | 57.000 | 32 | 26.0 | Text | Text | Yes | General LLM | -- |
| mixtral:8x22b | mixtral:8x22b-instruct-v0.1-q4_0 | Mistral AI | 2023/12 | 2025/10 | 140.600 | 64 | 80.0 | Text | Text | Yes | General LLM | -- |
| olmo-3:7b | olmo-3:7b-think-q4_K_M | Allen AI | 2025/11 | 2026/02 | 7.000 | 64 | 4.5 | Text | Text | No | General LLM | Incompatible with Ollama v0.9.3+ipex-llm |
| olmo-3:32b | olmo-3:32b-think-q4_K_M | Allen AI | 2025/11 | 2026/02 | 32.000 | 64 | 19.0 | Text | Text | No | General LLM | Incompatible with Ollama v0.9.3+ipex-llm |
| olmo-3.1:32b | olmo-3.1:32b-think-q4_K_M | Allen AI | 2025/12 | 2026/02 | 32.000 | 64 | 19.0 | Text | Text | No | General LLM | Incompatible with Ollama v0.9.3+ipex-llm |
| olmo-3.1:32b-instruct | olmo-3.1:32b-instruct-q4_K_M | Allen AI | 2025/12 | 2026/02 | 32.000 | 64 | 19.0 | Text | Text | No | General LLM | Incompatible with Ollama v0.9.3+ipex-llm |
| phi4:14b | phi4:14b-q4_K_M | Microsoft | 2025/01 | 2026/02 | 14.000 | 16 | 9.1 | Text | Text | No | General LLM | -- |
| phi4-mini:3.8b | phi4-mini:3.8b-q4_K_M | Microsoft | 2025/01 | 2026/02 | 3.800 | 128 | 2.5 | Text | Text | No | General LLM | -- |
| phi4-reasoning:14b | phi4-reasoning:14b-q4_K_M | Microsoft | 2025/04 | 2026/02 | 14.000 | 16 | 11.0 | Text | Text | No | General LLM | -- |
| phi4-mini-reasoning:3.8b | phi4-mini-reasoning:3.8b-q4_K_M | Microsoft | 2025/01 | 2026/02 | 3.800 | 128 | 3.2 | Text | Text | No | General LLM | -- |
| qwen2.5:0.5b | qwen2.5:0.5b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 0.500 | 32 | 0.4 | Text | Text | No | General LLM | -- |
| qwen2.5:1.5b | qwen2.5:1.5b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 1.500 | 32 | 1.0 | Text | Text | No | General LLM | -- |
| qwen2.5:3b | qwen2.5:3b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 3.000 | 32 | 1.9 | Text | Text | No | General LLM | -- |
| qwen2.5:7b | qwen2.5:7b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 7.000 | 32 | 4.7 | Text | Text | No | General LLM | -- |
| qwen2.5:14b | qwen2.5:14b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 14.000 | 32 | 9.0 | Text | Text | No | General LLM | -- |
| qwen2.5:32b | qwen2.5:32b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 32.000 | 32 | 20.0 | Text | Text | No | General LLM | -- |
| qwen2.5:72b | qwen2.5:72b-instruct-q4_K_M | Alibaba Cloud | 2024/09 | 2026/02 | 72.000 | 32 | 47.0 | Text | Text | No | General LLM | -- |
| qwen3:0.6b | qwen3:0.6b-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 0.600 | 40 | 0.5 | Text | Text | No | General LLM | -- |
| qwen3:1.7b | qwen3:1.7b-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 1.700 | 40 | 1.4 | Text | Text | No | General LLM | -- |
| qwen3:4b | qwen3:4b-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 4.000 | 256 | 2.5 | Text | Text | No | General LLM | -- |
| qwen3:8b | qwen3:4b-thinking-2507-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 8.000 | 40 | 5.2 | Text | Text | No | General LLM | -- |
| qwen3:14b | qwen3:14b-thinking-2507-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 14.000 | 40 | 9.3 | Text | Text | No | General LLM | -- |
| qwen3:30b | qwen3:30b-a3b-thinking-2507-q4_K_M | Alibaba Cloud | 2025/04 | 2025/10 | 30.500 | 256 | 19.0 | Text | Text | Yes | General LLM | -- |
| qwen3:32b | qwen3:32b-q4_K_M | Alibaba Cloud | 2025/04 | 2026/02 | 32.000 | 40 | 20.0 | Text | Text | No | General LLM | -- |
| codellama:13b | codellama:13b-instruct-q4_0 | Meta | 2023/08 | 2026/02 | 13.000 | 16 | 7.4 | Text | Text | No | LLM for coding | -- |
| codellama:34b | codellama:34b-instruct-q4_0 | Meta | 2023/08 | 2026/02 | 34.000 | 16 | 19.0 | Text | Text | No | LLM for coding | -- |
| deepseek-coder-v2:16b | deepseek-coder-v2:16b-lite-instruct-q4_0 | Deepseek | 2024/07 | 2026/02 | 16.000 | 160 | 8.9 | Text | Text | Yes | LLM for coding | -- |
| devstral-small-2:24b | devstral-small-2:24b-instruct-2512-q4_K_M | Mistral AI | 2025/12 | 2026/01 | 24.000 | 384 | 15.0 | Text, Image | Text | No | LLM for coding | Incompatible with Ollama v0.9.3+ipex-llm |
| qwen3-coder:30b | qwen3-coder:30b-a3b-q4_K_M | Alibaba Cloud | 2025/08 | 2025/10 | 30.500 | 256 | 19.0 | Text | Text | Yes | LLM for coding | -- |
| nomic-embed-text-v2-moe | -- | Nomic AI | 2025/02 | 2026/01 | 0.305 | 512 | 1.0 | Text | Text | Yes | LLM for multilingual retrieval | -- |
| llava-llama3:8b | llava-llama3:8b-v1.1-q4_0 | Microsoft Research | 2024/04 | 2026/02 | 8.000 | 8 | 5.5 | Text, Image | Text | No | LVLM | -- |
| llava:13b | llava:13b-v1.6-vicuna-q4_0 | Microsoft Research | 2023/10 | 2026/02 | 13.000 | 4 | 8.0 | Text, Image | Text | No | LVLM | -- |
| llava:34b | llava:34b-v1.6-q4_0 | Microsoft Research | 2023/10 | 2026/02 | 34.000 | 4 | 20.0 | Text, Image | Text | No | LVLM | -- |
| mistral-small3.2:24b | mistral-small3.2:24b-instruct-2506-q4_K_M | Mistral AI | 2025/06 | 2026/01 | 24.000 | 128 | 15.0 | Text, Image | Text | No | LVLM | -- |
| qwen2.5vl:7b | qwen2.5vl:7b-q4_K_M | Alibaba Cloud | 2024/12 | 2026/02 | 32.000 | 125 | 6.0 | Text, Image | Text | No | LVLM + detection | -- |
| qwen2.5vl:32b | qwen2.5vl:32b-q4_K_M | Alibaba Cloud | 2024/12 | 2026/02 | 32.000 | 125 | 21.0 | Text, Image | Text | No | LVLM + detection | -- |
| qwen3-vl:2b | qwen3-vl:2b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 2.000 | 256 | 1.9 | Text, Image | Text | No | LVLM + detection | Incompatible with Ollama v0.9.3+ipex-llm |
| qwen3-vl:4b | qwen3-vl:4b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 4.000 | 256 | 3.3 | Text, Image | Text | No | LVLM + detection | Incompatible with Ollama v0.9.3+ipex-llm |
| qwen3-vl:8b | qwen3-vl:8b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 8.000 | 256 | 6.1 | Text, Image | Text | No | LVLM + detection | Incompatible with Ollama v0.9.3+ipex-llm |
| qwen3-vl:30b | qwen3-vl:30b-a3b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 30.000 | 256 | 20.0 | Text, Image | Text | Yes | LVLM + detection | Incompatible with Ollama v0.9.3+ipex-llm |
| qwen3-vl:32b | qwen3-vl:32b-thinking-q4_K_M | Alibaba Cloud | 2025/10 | 2026/02 | 32.000 | 256 | 21.0 | Text, Image | Text | No | LVLM + detection | Incompatible with Ollama v0.9.3+ipex-llm |
Info
Interesting article about popular LLMs and the corresponding required architectures:
Technical Details about ai-models Group¶
For users in the ai-models group, it has been ensured that created files and
folders will have the ai-models group by default. For this, the setgid bit
has been added on /mnt/nfs/ai-models and sub-folders:
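A sketch of how this could have been done (the exact invocation used on the cluster is an assumption):

```shell
# set the setgid bit on the top folder and all existing sub-folders
sudo find /mnt/nfs/ai-models -type d -exec chmod g+s {} +
```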
Then, still in the /mnt/nfs/ai-models folder, the default group rights have
been updated to force rwx on newly created folders and rw on newly created
files:
# install ACL to have the `setfacl` command
sudo apt install acl
# apply ACL to existing files
find /mnt/nfs/ai-models -type d -exec sudo setfacl -m g:ai-models:rwx {} +
find /mnt/nfs/ai-models -type f -exec sudo setfacl -m g:ai-models:rw- {} +
# apply default ACL for future files
sudo setfacl -R -d -m g:ai-models:rwx /mnt/nfs/ai-models