Hardware Node Settings

This page lists the known hardware and software configurations that users can adjust on the compute nodes. Such control is generally not available on traditional clusters, which makes Dalek quite unique in this respect.

CPU Driver

Dalek compute nodes are configured to allow users to modify the CPU driver parameters. The only prerequisite is membership in the cpudev group. To be added to the cpudev group, please contact the system administrators.
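Before attempting any of the operations below, it can help to confirm that your account is actually in the cpudev group. The following is a minimal sketch, not an official Dalek tool: the helper name has_group is hypothetical, and it relies only on the standard `id -nG` command.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if the group named in $2 appears in the
# space-separated group list given in $1.
has_group() {
    local groups_list="$1" wanted="$2" g
    for g in $groups_list; do
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}

# `id -nG` prints the names of all groups of the calling user.
if has_group "$(id -nG)" cpudev; then
    echo "cpudev: yes"
else
    echo "cpudev: no (contact the system administrators)"
fi
```

Group membership changes usually take effect only in new login sessions, so a fresh login may be needed after being added.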

To prevent unexpected behavior, the CPU configuration is automatically reset when a SLURM job terminates. For this reason, when working with CPU drivers, we strongly recommend running exclusive jobs (i.e., reserving the entire node).
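To check from inside a job whether the node is reserved exclusively, one option is to look at the OverSubscribe field reported by `scontrol show job`, which is the same field the cpudev_setperms source listed further down inspects (it proceeds when the value is NO). The sketch below assumes it runs inside a SLURM job, where SLURM_JOB_ID is set; the helper name oversubscribe_flag is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the value of "OverSubscribe=" from the text
# of `scontrol show job` passed as $1. Returns 1 if the key is absent.
oversubscribe_flag() {
    local out="$1" rest
    case "$out" in
        *OverSubscribe=*) ;;
        *) return 1 ;;
    esac
    rest="${out#*OverSubscribe=}"            # drop everything up to the key
    printf '%s\n' "${rest%%[[:space:]]*}"    # keep up to the next whitespace
}

# Only meaningful inside a job, where Slurm sets SLURM_JOB_ID.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    flag=$(oversubscribe_flag "$(scontrol show job "$SLURM_JOB_ID")")
    if [ "$flag" = "NO" ]; then
        echo "exclusive allocation"
    else
        echo "shared allocation (OverSubscribe=$flag)"
    fi
fi
```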

To ensure system stability, the cpudev_backup script is executed at node boot time. This script creates a snapshot of the /sys/devices/system/cpu hierarchy and stores it in the /tmp/cpudev_sysfs_backup.txt file. When a job ends, the cpudev_restore script is automatically invoked to restore the original CPU configuration.
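Because the snapshot is a plain text file with one path=value entry per line (lines starting with # are comments), it can also be used to see what a job has changed so far. The following sketch is an illustration only: the function name cpudev_diff is hypothetical, and it assumes single-line values.

```shell
#!/bin/bash
# Hypothetical helper: print "path: saved -> current" for every entry of the
# backup file ($1) whose current value differs from the saved one.
cpudev_diff() {
    local backup="$1" line file saved current
    while IFS= read -r line; do
        case "$line" in '#'*|'') continue ;; esac   # skip comments and blanks
        file="${line%%=*}"     # text before the first '='
        saved="${line#*=}"     # text after the first '='
        current=$(cat "$file" 2>/dev/null) || continue   # unreadable: skip
        [ "$current" != "$saved" ] && echo "$file: $saved -> $current"
    done < "$backup"
    return 0
}

if [ -r /tmp/cpudev_sysfs_backup.txt ]; then
    cpudev_diff /tmp/cpudev_sysfs_backup.txt
fi
```

An empty output means that every readable entry still holds its boot-time value.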

To modify CPU drivers (e.g., adjusting frequencies or idle behaviors), we strongly encourage users to use the cpupower command-line tool. Normally, cpupower requires root privileges. On Dalek, however, members of the cpudev group are allowed to run it via sudo:

sudo cpupower [parameters]
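A few typical invocations are sketched below. The wrapper functions and the DRY_RUN variable are hypothetical conveniences; only the cpupower subcommands themselves (frequency-info, frequency-set, idle-set) come from the tool. With DRY_RUN=1, the default in this sketch, the commands are printed rather than executed. Which governors and frequencies are accepted depends on the CPU driver active on the node.

```shell
#!/bin/bash
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the current driver, governor and frequency limits.
show_frequency_info() { run sudo cpupower frequency-info; }
# Select a scaling governor on all CPUs (e.g. performance, powersave).
set_governor()        { run sudo cpupower -c all frequency-set -g "$1"; }
# Cap the maximum frequency on all CPUs (e.g. 2000MHz).
set_max_frequency()   { run sudo cpupower -c all frequency-set -u "$1"; }
# Disable one idle state, given by its index, on all CPUs.
disable_idle_state()  { run sudo cpupower -c all idle-set -d "$1"; }

set_governor performance
set_max_frequency 2000MHz
```

Setting DRY_RUN=0 in the environment makes the same functions execute the commands for real.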

Please refer to the official cpupower documentation for usage details.

Advanced users may also experiment by directly modifying files under /sys/devices/system/cpu. To enable this, a dedicated helper binary, cpudev_setperms, is installed on the nodes. This tool parses the /sys/devices/system/cpu hierarchy and, for each file:

  • changes the ownership from root:root to root:cpudev,
  • grants the cpudev group the same permissions as the root user.

This allows members of the cpudev group to modify the relevant sysfs entries directly. The tool must be run on an exclusively reserved node and can be invoked as follows:

sudo cpudev_setperms
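Once the permissions are in place, sysfs entries can be written with ordinary shell redirection. The kernel may silently adjust or reject a value, so reading it back is a reasonable precaution. The helper below is a sketch with a hypothetical name (write_sysfs); the path in the usage comment is only an example and exists only when the intel_pstate driver is active.

```shell
#!/bin/bash
# Hypothetical helper: write value $2 to file $1, then read it back to
# confirm that the file now holds that value.
write_sysfs() {
    local file="$1" value="$2" current
    if [ ! -w "$file" ]; then
        echo "not writable: $file" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$file" || return 1
    current=$(cat "$file")
    if [ "$current" != "$value" ]; then
        echo "rejected: $file holds '$current', wanted '$value'" >&2
        return 1
    fi
}

# Example usage on a node, after `sudo cpudev_setperms`:
#   write_sysfs /sys/devices/system/cpu/intel_pstate/no_turbo 1
```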

Warning

When modifying files directly under /sys/devices/system/cpu, additional files may be created (for example, when switching CPU drivers). In such cases, cpudev_setperms must be run again to update permissions on the newly created files.
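One way to notice that new files have appeared is to list the files that do not yet belong to the cpudev group. This is a sketch under the assumption that GNU find is available on the nodes; the function name files_missing_group is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list regular files under $1 whose group is not $2
# ($2 may be a group name or a numeric gid).
files_missing_group() {
    local base="$1" group="$2"
    find "$base" -type f ! -group "$group" 2>/dev/null
}

# Example usage on a node:
#   files_missing_group /sys/devices/system/cpu cpudev
```

If the command prints any paths after a driver switch, running sudo cpudev_setperms again should bring them under the cpudev group.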

Tips

powertop is installed on the nodes and can be run with sudo if you are in the cpudev group.

CPU Driver Source Code

Backup Script

/usr/local/sbin/cpudev_backup
#!/bin/bash

# CPU sysfs path and save path
BASE_PATH="/sys/devices/system/cpu"
#BACKUP_FILE="/tmp/cpu_sysfs_backup_$(date +%Y%m%d_%H%M%S).txt"
BACKUP_FILE="/tmp/cpudev_sysfs_backup.txt"

# List of files to exclude
EXCLUDE_FILES=("uevent" "modalias" "subsystem" "device" "scaling_setspeed")

# Create the save file
echo "Path of sysfs CPU parameters save: $BACKUP_FILE"
echo "# sysfs CPU parameters - $(date)" > "$BACKUP_FILE"
echo "# Format: path=value" >> "$BACKUP_FILE"

# Build the command line with file exclusions
exclude_args=()
for file in "${EXCLUDE_FILES[@]}"; do
    exclude_args+=(! -name "$file")
done

# Browse files, exclude those in the list and check owner permissions
find "$BASE_PATH" -type f -perm -u=w "${exclude_args[@]}" 2>/dev/null | while read -r file; do
    value=$(cat "$file" 2>/dev/null)
    if [ $? -eq 0 ]; then
        echo "$file=$value" >> "$BACKUP_FILE"
    fi
done

echo "Save complete."

Restore Script

/usr/local/sbin/cpudev_restore
#!/bin/bash

# Restore the root group to prevent later misuse
echo "Restore root group in /sys/devices/system/cpu/ directory"
chgrp -R root /sys/devices/system/cpu/*

# Check that the save file is given, if not stop here
if [ $# -ne 1 ]; then
    exit 0
fi

BACKUP_FILE="$1"

if [ ! -f "$BACKUP_FILE" ]; then
    echo "Error: '$BACKUP_FILE' does not exist."
    exit 1
fi

# Define paths priority order
PRIORITY_PATHS=(
    "intel_pstate" # first those in this sub-folder
    "cpufreq"      # then those in cpufreq/
)

echo "Restore sysfs CPU parameters from $BACKUP_FILE"

# Read the backup file and sort the lines according to priority order
declare -A file_values
declare -A b_file_values # to check
declare -A i_file_values # for a new attempt at ignored files

while IFS= read -r line; do
    # ignore comments
    if [[ "$line" == \#* ]]; then
        continue
    fi
    file=$(echo "$line" | cut -d'=' -f1)
    value=$(echo "$line" | cut -d'=' -f2-)
    file_values["$file"]="$value"
    b_file_values["$file"]="$value"
done < "$BACKUP_FILE"

# Function to restore a file if necessary
restore_file() {
    local file="$1"
    local target_value="$2"
    if [ -n "$target_value" ]; then
        if [ -w "$file" ]; then
            current_value=$(cat "$file" 2>/dev/null)
            if [ "$current_value" != "$target_value" ]; then
                echo "$target_value" | sudo tee "$file" > /dev/null
                echo "Restored: $file = $target_value"
            else
                echo "Up to date: $file ($target_value)"
            fi
        else
            i_file_values["$file"]="$target_value"
            echo "Ignored (writing failed): $file"
        fi
    fi
}

# Restore according to priority order
for pattern in "${PRIORITY_PATHS[@]}"; do
    echo "--- Restoring files in $pattern/ ---"
    for file in "${!file_values[@]}"; do
        if [[ "$file" == *"$pattern"* ]]; then
            restore_file "$file" "${file_values[$file]}"
            unset file_values["$file"]  # to avoid processing it twice
        fi
    done
done
echo "--- Restoring the remaining files  ---"
for file in "${!file_values[@]}"; do
    restore_file "$file" "${file_values[$file]}"
done

# New attempt for ignored files
echo "--- Restoring previously ignored files ---"
for file in "${!i_file_values[@]}"; do
    restore_file "$file" "${i_file_values[$file]}"
    unset i_file_values["$file"] # to avoid infinite looping
done

# Verify that the sysfs tree has been properly restored
for file in "${!b_file_values[@]}"; do
    v=$(cat "$file" 2>/dev/null)
    t="${b_file_values[$file]}"
    if [[ "$v" != "$t" ]]; then
        echo "Wrong restoration of the sysfs tree. '$file' should have the value '$t', but it has the value '$v'."
        exit 2
    fi
done

echo "Restoration complete."

Set Permissions

/usr/local/sbin/cpudev_setperms
// g++ -std=c++17 -O2 -Wall -o cpudev_setperms cpudev_setperms.cpp

#include <iostream>
#include <string>
#include <vector>
#include <cstring>
#include <cstdlib> // getenv
#include <cctype>  // isspace
#include <regex>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <pwd.h>
#include <grp.h>
#include <ftw.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sstream>
#include <syslog.h>

static const std::vector<std::string> g_local_users = {"administrator", "powerstate", "prober"};
static const std::string g_slurm_path = "/opt/slurm/bin/";
static const std::string g_cpudev_path = "/sys/devices/system/cpu/";
static const std::string g_cpudev_group = "cpudev";

// ================ Get current username =======================================
std::string get_sudo_invoker() {
    // First, check if the program was run via sudo
    const char* sudoUser = getenv("SUDO_USER");
    if (sudoUser && *sudoUser) {
        return std::string(sudoUser);
    }

    // Fallback: use the real UID of the process
    uid_t uid = getuid();
    struct passwd* pw = getpwuid(uid);
    if (pw && pw->pw_name) {
        return std::string(pw->pw_name);
    }

    // Final fallback
    return "unknown";
}

// ================ Run external binary safely and capture its stdout ==========
std::string run_program_capture(const std::string &prog, const std::vector<std::string> &args) {
    int pipefd[2];
    if (pipe(pipefd) == -1) {
        syslog(LOG_ERR, "pipe() failed: %s", strerror(errno));
        return "";
    }

    pid_t pid = fork();
    if (pid < 0) {
        syslog(LOG_ERR, "fork() failed: %s", strerror(errno));
        close(pipefd[0]); close(pipefd[1]);
        return "";
    }

    if (pid == 0) {
        // child
        dup2(pipefd[1], STDOUT_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);

        std::vector<char*> argv;
        argv.reserve(args.size() + 2);
        argv.push_back(const_cast<char*>(prog.c_str()));
        for (const auto &a : args)
            argv.push_back(const_cast<char*>(a.c_str()));
        argv.push_back(nullptr);

        execv(prog.c_str(), argv.data());
        _exit(127);
    }

    // parent
    close(pipefd[1]);
    std::string out;
    char buf[512];
    ssize_t n;
    while ((n = read(pipefd[0], buf, sizeof(buf))) > 0) {
        out.append(buf, buf + n);
    }
    close(pipefd[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
        syslog(LOG_WARNING, "Program %s exited with status %d", prog.c_str(), WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        syslog(LOG_WARNING, "Program %s terminated by signal %d", prog.c_str(), WTERMSIG(status));
    }

    while (!out.empty() && (out.back() == '\n' || out.back() == '\r'))
        out.pop_back();
    return out;
}

// ================ get hostname (safe) ========================================
std::string get_hostname() {
    char host[256];
    if (gethostname(host, sizeof(host)) == 0) {
        return std::string(host);
    }
    return "";
}

// ================ get Slurm JobID (first token) ==============================
std::string get_slurm_job_id(const std::string &user) {
    std::string prog = g_slurm_path + "squeue";
    std::string host = get_hostname();
    if (host.empty()) return "";
    std::vector<std::string> args = {"--noheader", ("--nodelist=" + host), ("--user=" + user), "--Format=JobID"};
    std::string out = run_program_capture(prog, args);
    if (out.empty()) return "";

    size_t pos = out.find('\n');
    std::string firstline = (pos == std::string::npos) ? out : out.substr(0, pos);
    std::istringstream iss(firstline);
    std::string token;
    if (!(iss >> token)) return "";
    std::regex jobre("^\\d+$");
    if (std::regex_match(token, jobre)) return token;
    return "";
}

// ================ parse "OverSubscribe=" value from scontrol output ===========
std::string get_over_subscribe_flag(const std::string &job_id) {
    std::string prog = g_slurm_path + "scontrol";
    std::vector<std::string> args = {"show", "job", job_id};
    std::string out = run_program_capture(prog, args);
    if (out.empty()) return "";
    std::string key = "OverSubscribe=";
    size_t p = out.find(key);
    if (p == std::string::npos) return "";
    p += key.size();
    size_t q = p;
    while (q < out.size() && !isspace((unsigned char)out[q])) ++q;
    return out.substr(p, q - p);
}

// ================ Check if user is local (from hard-coded list) =============
bool is_local_user(const std::string &user) {
    for (auto &u : g_local_users) if (u == user) return true;
    return false;
}

// ================ Resolve group name to gid =================================
bool lookup_gid(const std::string &groupname, gid_t &out_gid) {
    struct group *g = getgrnam(groupname.c_str());
    if (!g) {
        syslog(LOG_ERR, "getgrnam('%s') failed", groupname.c_str());
        return false;
    }
    out_gid = g->gr_gid;
    return true;
}

// Global state for nftw callback
static gid_t g_cpudev_gid = (gid_t)-1;
static bool g_do_chgrp = true;
static bool g_do_chmod_g_eq_u = true;
static int g_change_errors = 0;

// ================ nftw callback ==============================================
int nftw_callback(const char *fpath, const struct stat *sb, int typeflag, struct FTW * /*ftwbuf*/) {
    (void)typeflag; // FTW_PHYS used so symlinks won't be followed
    // Change group preserving owner
    if (g_do_chgrp) {
        if (chown(fpath, sb->st_uid, g_cpudev_gid) != 0) {
            // log occasionally; avoid extremely noisy logs - increment counter and log sample
            ++g_change_errors;
            if (g_change_errors <= 5) {
                syslog(LOG_WARNING, "chown failed on %s: %s", fpath, strerror(errno));
            } else if (g_change_errors == 6) {
                syslog(LOG_WARNING, "Further chown failures suppressed (many)");
            }
        }
    }

    // Set group bits equal to owner bits (g = u)
    if (g_do_chmod_g_eq_u) {
        mode_t cur = sb->st_mode;
        mode_t ubits = (cur & S_IRWXU);
        mode_t new_mode = (cur & ~S_IRWXG) | ((ubits >> 3) & S_IRWXG);
        if ((cur & S_IRWXG) != (new_mode & S_IRWXG)) {
            if (chmod(fpath, new_mode) != 0) {
                ++g_change_errors;
                if (g_change_errors <= 5) {
                    syslog(LOG_WARNING, "chmod failed on %s: %s", fpath, strerror(errno));
                } else if (g_change_errors == 6) {
                    syslog(LOG_WARNING, "Further chmod failures suppressed (many)");
                }
            }
        }
    }
    return 0; // continue
}

// ================ Recursively operate on g_cpudev_path safely ===============
bool apply_cpudev_changes() {
    if (!lookup_gid(g_cpudev_group.c_str(), g_cpudev_gid)) {
        syslog(LOG_ERR, "Group '%s' not found", g_cpudev_group.c_str());
        return false;
    }
    g_change_errors = 0;
    // Use 20 file descriptors at the same time.
    // FTW_PHYS prevents following symlinks (equivalent to chmod -P)
    if (nftw(g_cpudev_path.c_str(), nftw_callback, 20, FTW_PHYS) != 0) {
        syslog(LOG_ERR, "nftw failed on %s: %s", g_cpudev_path.c_str(), strerror(errno));
        return false;
    }
    if (g_change_errors > 0) {
        syslog(LOG_WARNING, "Completed with %d change errors under %s", g_change_errors, g_cpudev_path.c_str());
        std::clog << "(WW) Completed with " << g_change_errors
                  << " change errors under " << g_cpudev_path << std::endl;
    } else {
        syslog(LOG_INFO, "Successfully updated ownership and permissions under %s", g_cpudev_path.c_str());
    }
    return (g_change_errors == 0);
}

// ================ main =======================================================
int main() {
    // open syslog
    openlog("cpudev_setperms", LOG_PID | LOG_CONS, LOG_DAEMON);
    syslog(LOG_INFO, "Program start");

    std::string user = get_sudo_invoker();
    syslog(LOG_INFO, "Invoked by user: %s", user.c_str());

    std::string job_id = get_slurm_job_id(user);
    if (!job_id.empty()) {
        syslog(LOG_INFO, "Found SLURM JobID %s for user %s", job_id.c_str(), user.c_str());
    } else {
        syslog(LOG_INFO, "No SLURM JobID found for user %s on this node", user.c_str());
    }

    std::string over_sub;
    if (!job_id.empty()) {
        over_sub = get_over_subscribe_flag(job_id);
        syslog(LOG_INFO, "OverSubscribe for job %s = '%s'", job_id.c_str(), over_sub.c_str());
    }

    bool local = is_local_user(user);
    if (local) syslog(LOG_INFO, "User %s is in local user list", user.c_str());

    if (over_sub == "NO" || local) {
        syslog(LOG_INFO, "Proceeding to change ownership/permissions for %s", g_cpudev_path.c_str());

        if (!apply_cpudev_changes()) {
            syslog(LOG_ERR, "Failed to update ownership/permissions under %s", g_cpudev_path.c_str());
            std::cerr << "(EE) Failed to update some ownership/permission entries under " << g_cpudev_path << "\n";
            closelog();
            return 3;
        }

        syslog(LOG_INFO, "Completed ownership/permission updates for user %s", user.c_str());
        std::cout << "(II) Permissions and ownership updated successfully for user: " << user << "\n";
        closelog();
        return 0;
    } else {
        syslog(LOG_WARNING, "Node is NOT exclusively allocated and user is not local: aborting for user %s", user.c_str());
        std::cerr << "(EE) This node is NOT exclusively allocated via SLURM, you cannot run this program.\n";
        closelog();
        return 4;
    }
}
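The permission change performed in nftw_callback (group bits set equal to owner bits, i.e. g = u) can be reproduced with shell arithmetic. The snippet below is only a worked illustration of that bit manipulation on octal modes, with a hypothetical function name; it does not touch any file.

```shell
#!/bin/bash
# Hypothetical helper: copy the owner permission bits of an octal mode ($1)
# into its group bits (g = u), mirroring the new_mode computation above.
group_equals_user() {
    local cur=$(( 8#$1 ))
    local ubits=$(( cur & 8#700 ))                    # owner rwx bits
    local new=$(( (cur & ~8#070) | (ubits >> 3) ))    # replace group bits
    printf '%04o\n' "$new"
}

group_equals_user 0644   # prints 0664
group_equals_user 0200   # prints 0220
```

A write-only sysfs file (0200) thus becomes writable by the group as well (0220), while a read-only file (0444) is left unchanged.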

SPANK Plugin to Invoke Restore Script

/mnt/nfs/software/slurm/lib/spank/cpudev.so
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <syslog.h>
#include <pwd.h>

#include <slurm/spank.h>

#define LOG

#ifdef LOG
#define OPENLOG(...) openlog(__VA_ARGS__)
#define SYSLOG(...) syslog(__VA_ARGS__)
#define CLOSELOG(...) closelog(__VA_ARGS__)
#else
#define OPENLOG(...)
#define SYSLOG(...)
#define CLOSELOG(...)
#endif

/*
 * All spank plugins must define this macro for the Slurm plugin loader.
 */
SPANK_PLUGIN(cpudev, 1);

int slurm_spank_task_exit(spank_t spank, int argc, char *argv[])
{
    spank_context_t calling_context = spank_context();

    OPENLOG("slurm_spank_cpudev", LOG_CONS | LOG_PID | LOG_NDELAY, LOG_DAEMON);
    SYSLOG(LOG_NOTICE, "task_exit - program started by SLURM, UID is %d", getuid());

    switch (calling_context)
    {
        case S_CTX_REMOTE: {
            char *nodename = getenv("SLURMD_NODENAME");
            if (nodename == NULL)
            {
                slurm_error("cpudev: getenv of \"SLURMD_NODENAME\" failed!");
                SYSLOG(LOG_ERR, "task_exit - getenv of \"SLURMD_NODENAME\" failed!");
            }
            else
            {
                const char* path_restore_script = "/usr/local/sbin/cpudev_restore";
                if (access(path_restore_script, F_OK) == 0)
                {
                    /* Resolve the job owner's name for logging; fall back to
                       "unknown" if the UID or passwd entry cannot be obtained. */
                    uid_t uid = (uid_t)-1;
                    const char *username = "unknown";
                    if (spank_get_item(spank, S_JOB_UID, &uid) == ESPANK_SUCCESS)
                    {
                        struct passwd *pws = getpwuid(uid);
                        if (pws != NULL && pws->pw_name != NULL)
                            username = pws->pw_name;
                    }

                    char proberctl_script[2048];
                    const char* path_backup = "/tmp/cpudev_sysfs_backup.txt";
                    int is_backup_file;
                    if ((is_backup_file = access(path_backup, F_OK)) == 0)
                    {
                        // changes the group to "root" and restores the CPU sysfs values
                        snprintf(proberctl_script, sizeof(proberctl_script), "%s %s", path_restore_script, path_backup);
                    }
                    else
                    {
                        // in this case the script just changes the group to "root"
                        snprintf(proberctl_script, sizeof(proberctl_script), "%s", path_restore_script);
                    }

                    if (system(proberctl_script))
                    {
                        slurm_error ("cpudev_restore: system command failed!");
                        SYSLOG(LOG_ERR, "task_exit - fail to run cpudev_restore script (user=%s,backup=%d)", username, is_backup_file);
                    }
                    else
                    {
                        SYSLOG(LOG_NOTICE, "task_exit - run cpudev_restore script (user=%s,backup=%d)", username, is_backup_file);
                    }
                }
                else
                {
                    SYSLOG(LOG_NOTICE, "task_exit - cpudev_restore script is not installed on this node");
                }
            }
            break;
        }
        default:
            break;
    }

    CLOSELOG();
    return 0;
}