LLobyΒΆ
LLoby is a centralized gateway located on the Dalek frontal node
(front.dalek.lip6), designed to provide seamless access to Large Language
Models (LLMs) within the Dalek cluster.
π― MotivationsΒΆ
Workflow Complexity: Traditional setups require manual node reservation, service deployment, and reverse port forwarding for every session.
- LLoby Solution: Automates the entire lifecycle from allocation to request forwarding via a single OpenAI-compatible endpoint.
Resource Inefficiency: Individual user instances (e.g., Ollama) prevent resource sharing and cause cluster fragmentation through excessive, isolated reservations.
- LLoby Solution: Implements an orchestration layer that enables efficient resource sharing across the cluster.
Computational Waste: Intermittent human-AI interaction leaves allocated nodes idle, resulting in significant wasted compute power.
- LLoby Solution: Automatically unallocates workers when unused and leverages llama.cpp's continuous batching to maximize hardware utilization.
β¨ Key FeaturesΒΆ
- Unified Access Point: A single connection point to interact with multiple high-performance models.
- Intelligent Orchestration: LLoby automatically manages compute node allocation. It is specifically designed to minimize interference with ongoing scientific experiments on the Dalek cluster.
- Resource sharing: Based on llama.cpp backend, LLoby enable resource sharing using continuous batching.
- OpenAI Compatibility: LLoby exposes an OpenAI-compatible API, ensuring compatibility with a wide range of modern LLM tools (Open WebUI, continue.dev...).
βοΈβ Setup GuideΒΆ
LLoby runs on port 11087 of Dalek's front node. To use it, you need to map
this port to your local machine using SSH.
Step 1: Establish the SSH Tunnel
Open a terminal on your local machine and run the following command to redirect LLoby's port to your local machine:
Tip
If you are connected to LIP6 network, you can skip step 1 and directly
connect your tools to the openAI endpoint:
http://front.dalek.proj.lip6.fr:11087/v1
Step 2: Configure your LLM Tool (e.g., Open WebUI)
To use LLoby, it is recommended to install on your local machine Applications such as Open WebUI to discuss with LLMs. For code generation, continue.dev (VS code) and avante.nvim (neovim) have been tested, but feel free to configure your own preferred tool!
Once your application is installed, you can connect it to LLoby with the following procedure:
- Open your tool's settings (e.g.: For Open WebUI, navigate to admin panel > Settings > Connections).
- Add a new OpenAI connection.
- Use the following endpoint depending on your situation:
- Using port forwarding:
http://localhost:11087/v1 - From LIP6 network:
http://front.dalek.proj.lip6.fr:11087/v1
- Using port forwarding:
Note
If you used a different port in Step 1, adjust the URL accordingly, e.g.,
http://localhost:11435/v1
πβ Daily Usage TipsΒΆ
π¦ Supported Models:
LLoby provides access to state-of-the-art open-weights models. Below is the current list of available models:
| Family | Variants | quantization |
|---|---|---|
| Qwen 3.6 | 35B, 27B | Q4_K_M |
| Gemma 4 | 31B, 26B, E4B, E2B | Q4_K_M |
| GPT-oss | 20B | mxfp4 |
| Llama 2 | 7B | Q4_0 |
Note
New models are added regularly. For a complete and up-to-date list, please
refer to the GGUF models section.
π Monitoring Node Status & Usage
If your requests seem to be pending or slow, you can check the status of the compute nodes. This is a powerful way to understand the current cluster load and why a model might be waiting for resources:
- Via Terminal:
curl http://localhost:11087/status/txt - Via Terminal (Live tracking):
watch -n 1 curl -s http://localhost:11087/status/txt - Via Browser (Visual mode): http://localhost:11087/status
Tip
From LIP6 network, monitoring can be done via:
- Via Terminal:
curl http://front.dalek.proj.lip6.fr:11087/status/txt - Via Terminal (Live tracking):
watch -n 1 curl -s http://front.dalek.proj.lip6.fr:11087/status/txt - Via Browser (Visual mode): http://front.dalek.proj.lip6.fr:11087/status