LLoby¶

LLoby is a centralized gateway located on the Dalek frontal node (front.dalek.lip6), designed to provide seamless access to Large Language Models (LLMs) within the Dalek cluster.

🎯 Motivations¶

Workflow Complexity: Traditional setups require manual node reservation, service deployment, and reverse port forwarding for every session.

LLoby Solution: Automates the entire lifecycle from allocation to request forwarding via a single OpenAI-compatible endpoint.

Resource Inefficiency: Individual user instances (e.g., Ollama) prevent resource sharing and cause cluster fragmentation through excessive, isolated reservations.

LLoby Solution: Implements an orchestration layer that enables efficient resource sharing across the cluster.

Computational Waste: Intermittent human-AI interaction leaves allocated nodes idle, resulting in significant wasted compute power.

LLoby Solution: Automatically unallocates workers when unused and leverages llama.cpp's continuous batching to maximize hardware utilization.

✨ Key Features¶

Unified Access Point: A single connection point to interact with multiple high-performance models.
Intelligent Orchestration: LLoby automatically manages compute node allocation. It is specifically designed to minimize interference with ongoing scientific experiments on the Dalek cluster.
Resource sharing: Based on llama.cpp backend, LLoby enable resource sharing using continuous batching.
OpenAI Compatibility: LLoby exposes an OpenAI-compatible API, ensuring compatibility with a wide range of modern LLM tools (Open WebUI, continue.dev...).

⚒️ Setup Guide¶

LLoby runs on port 11087 of Dalek's front node. To use it, you need to map this port to your local machine using SSH.

Step 1: Establish the SSH Tunnel

Open a terminal on your local machine and run the following command to redirect LLoby's port to your local machine:

ssh -N -L 11087:localhost:11087 front.dalek.lip6

Tip

If you are connected to LIP6 network, you can skip step 1 and directly connect your tools to the openAI endpoint: http://front.dalek.proj.lip6.fr:11087/v1

Step 2: Configure your LLM Tool (e.g., Open WebUI)

To use LLoby, it is recommended to install on your local machine Applications such as Open WebUI to discuss with LLMs. For code generation, continue.dev (VS code) and avante.nvim (neovim) have been tested, but feel free to configure your own preferred tool!

Once your application is installed, you can connect it to LLoby with the following procedure:

Open your tool's settings (e.g.: For Open WebUI, navigate to admin panel > Settings > Connections).
Add a new OpenAI connection.
Use the following endpoint depending on your situation:
- Using port forwarding: http://localhost:11087/v1
- From LIP6 network: http://front.dalek.proj.lip6.fr:11087/v1

Note

If you used a different port in Step 1, adjust the URL accordingly, e.g., http://localhost:11435/v1

🚀 Daily Usage Tips¶

📦 Supported Models:

LLoby provides access to state-of-the-art open-weights models. Below is the current list of available models:

Family	Variants	quantization
Qwen 3.6	35B, 27B	Q4_K_M
Gemma 4	31B, 26B, E4B, E2B	Q4_K_M
GPT-oss	20B	mxfp4
Llama 2	7B	Q4_0

Note

New models are added regularly. For a complete and up-to-date list, please refer to the GGUF models section.

📊 Monitoring Node Status & Usage

If your requests seem to be pending or slow, you can check the status of the compute nodes. This is a powerful way to understand the current cluster load and why a model might be waiting for resources:

Via Terminal: curl http://localhost:11087/status/txt
Via Terminal (Live tracking): watch -n 1 curl -s http://localhost:11087/status/txt
Via Browser (Visual mode): http://localhost:11087/status

Tip

From LIP6 network, monitoring can be done via:

Via Terminal: curl http://front.dalek.proj.lip6.fr:11087/status/txt
Via Terminal (Live tracking): watch -n 1 curl -s http://front.dalek.proj.lip6.fr:11087/status/txt
Via Browser (Visual mode): http://front.dalek.proj.lip6.fr:11087/status