Infrastructure

Nodes

Threadripper

The Threadripper (TR) is a 64-thread machine with 132 GB of RAM, located in 1525-415, that can be used for heavy, multithreaded computing. It can be used in several ways. First, a root user must run the adduser command to create a new user. At the physical PC, one can then log in and use the command startx to enter a desktop environment. TR runs a Unix-like operating system called FreeBSD.

ROOT and GEANT4 are already installed for all users. Additional software can be installed using pkg, or built by users.

When a user has been added, SSH access can be granted by contacting AU IT (specifically, I have been in contact with Dennis Christensen), who can add users to the SSH-Gateway_Phys-RIPP AD group. TR can then be accessed by a proxy jump through ssh.au.dk, which requires that the user has an AU ID. Once the user has been added to the AD group, they can SSH in with

ssh -t <AU ID>@ssh.au.dk ssh <TR Username>@ripp.st.lab.au.dk

You will be prompted for your AU-wide password, after which you have to confirm with two-factor authentication (however you have that set up). Then you will be prompted for the password of your TR user. This can be made somewhat faster by adding the following to your ~/.ssh/config:

Host TR
    HostName ripp.st.lab.au.dk
    User <TR Username>
    ProxyJump <AU ID>@ssh.au.dk
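With this in place, the whole proxy jump reduces to

ssh TR

and the same alias works for scp, sshfs etc.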

Once this has been set up, you can create a shared folder using SSHFS:

mkdir trmount
sshfs TR:/home/<TR Username> trmount
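When you are done, the mount can be removed again with

fusermount -u trmount

(on Linux; on macOS a plain umount trmount does the same).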

The sshfs command mounts the home directory of your TR user to the directory trmount, which allows you to work directly on the files from your personal machine. Now, to take advantage of the multithreading capabilities of TR, you must either write code which can run multithreaded, or use tricks when running executables from a TR terminal. There are many ways to do such tricks; I will show an example using parallel here. I have created a program which takes a number (the run number) as input, prints the time and run number, waits 4 seconds, prints the time again and then exits:

#include <iostream>
#include <chrono>
#include <thread>
#include <ctime>   // std::ctime
#include <string>  // std::stoi

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <run number>" << std::endl;
        return 1;
    }
    int inputNumber = std::stoi(argv[1]);
    // Print the run number and the current wall-clock time (std::ctime appends a newline)
    auto currentTime = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    std::cout << "Run no. " << inputNumber << ": current time is " << std::ctime(&currentTime);
    // Wait 4 seconds, then print the time again before exiting
    std::this_thread::sleep_for(std::chrono::seconds(4));
    currentTime = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    std::cout << "Run no. " << inputNumber << " current time is " << std::ctime(&currentTime);
    return 0;
}
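Assuming the source is saved as test.cpp (the file name is just an example), the executable can be built on TR with the system clang, e.g.

c++ -std=c++17 -O2 -o test test.cpp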

I can run this in parallel on TR by SSHing in, building the executable as above, and running it with parallel -v ./test ::: 1 2 3 or equivalently parallel -v ./test ::: {1..3}, which produces the output

./test 1
Run no. 1: current time is Mon Feb  5 16:33:35 2024
Run no. 1 current time is Mon Feb  5 16:33:39 2024
./test 2
Run no. 2: current time is Mon Feb  5 16:33:35 2024
Run no. 2 current time is Mon Feb  5 16:33:39 2024
./test 3
Run no. 3: current time is Mon Feb  5 16:33:35 2024
Run no. 3 current time is Mon Feb  5 16:33:39 2024
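By default parallel starts one job per CPU core; the number of simultaneous jobs can be capped with the -j flag, for example

parallel -j 32 -v ./test ::: {1..100}

which runs the 100 jobs at most 32 at a time.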

If analysis of large files is required, these could be uploaded to ERDA, and ERDA can be mounted on TR following this guide.

stkern

In July 2024, the Ubuntu running on stkern (22.04.4) and the services running on it were updated. I leave here some notes that may be useful for understanding the services, for maintenance and for future updates. The stkern server is a virtual machine hosted by AU IT. It can be configured from vcsa01.vm.auit.au.dk, where it is also possible to get terminal access in case SSH ceases to function. It is also possible to power the server off and on from this site and to create snapshots of the VM. To upgrade the machine, I made a local backup of anything important, made sure I was able to make the services run, took a snapshot of the machine and ran do-release-upgrade.

Generally our services are structured like this: each service listens on some port on a local IP address on the machine, and nginx is then responsible for exposing it to the internet over HTTPS. The elossweb and cloudSRIM services run in docker containers, with 3 separate containers for cloudSRIM to allow multiple simultaneous users.
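As an illustration of the general pattern (not a copy of the actual configuration; the hostname, certificate paths and port below are placeholders), an nginx server block exposing such a service looks roughly like this:

server {
    listen 443 ssl;
    server_name example.kern.phys.au.dk;                # placeholder hostname

    ssl_certificate     /etc/ssl/certs/example.crt;     # placeholder certificate
    ssl_certificate_key /etc/ssl/private/example.key;   # placeholder key

    location / {
        # forward requests to the service listening on a local address/port
        proxy_pass http://127.0.0.1:5001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}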

Wiki

https://wiki.kern.phys.au.dk/

The wiki is run by the gitit service. The configuration file and data are located in /var/wiki/, and the service runs on port 5001 on localhost. The service can be controlled with commands such as service gitit start. To see exactly what these commands do, consult /etc/systemd/system/gitit.service. The wiki is backed up on gitlab at

https://gitlab.au.dk/ausa/wiki

Elog

https://elog.kern.phys.au.dk/

Configuration and data files are located in /usr/local/elog. I installed this by doing the following (according to https://elog.psi.ch/elog/adminguide.html):

wget https://elog.psi.ch/elog/download/tar/elog-latest.tar.gz

tar -xzvf elog-latest.tar.gz

cd elog-3.1.5-1

sudo make install

After this, the elog service can be started and stopped with service elog start and service elog stop.

To hand over admin privileges of elog to another user, edit the file elogd.cfg in /usr/local/elog/ on kern and change the field “Admin user” accordingly.

The elog is backed up every hour to https://gitlab.au.dk/ausa/kern/elog with a cronjob. The script run by this cronjob is pushElog.sh in the elog directory. To view or edit the cronjob, do crontab -l or crontab -e.
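The entry looks roughly like this (the exact minute and the path to pushElog.sh should be checked in the actual crontab):

0 * * * * /usr/local/elog/pushElog.sh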

cloudSRIM

https://srim.kern.phys.au.dk/

The files relevant to running cloudSRIM are located in /home/kernadmin/dock-cloudSRIM. CloudSRIM is one of the services that runs in a docker container. Docker can be installed by following the instructions given in https://docs.docker.com/engine/install/ubuntu/.

Of particular importance is the Dockerfile, which controls what the docker containers contain and run.

Before starting cloudSRIM a docker network must be created with the name cloudNet:

docker network create --subnet=172.18.0.0/16 cloudNet

172.18.0.0/16 is the subnet where the docker services will be hosted; this is where nginx expects to find them. Now, to build the image, run the following in the dock-cloudSRIM directory:

docker build -t munken/cloudsrim .

Once built, running the shell script restart.sh should start (and of course also restart) the cloudSRIM service; a sketch of what such a script might look like is given at the end of this subsection. It may be necessary to open the firewall for this port. Once the service runs, one should be able to do

curl -f -H 'Content-Type: application/json' -X POST -d '{"Z1":3,"A1":9,"M":9.026790189057362,"target":{"Z":[1,6],"stoi":[4,2],"gas":false,"density":0.93}}' -o 'Li9_4H_32.dat' 172.18.0.2:5000

and get an output file that looks like

==================================================================

          Calculation using SRIM-2006  

          SRIM version ---> SRIM-2012.01 

          Calc. date   ---> July 12, 2024  

==================================================================

Provided nginx is set up and is exposing cloudsrim, we can do

curl -f -H 'Content-Type: application/json' -X POST -d '{"Z1":3,"A1":9,"M":9.026790189057362,"target":{"Z":[1,6],"stoi":[4,2],"gas":false,"density":0.93}}' -o 'Li9_4H_32.dat' https://srim.kern.phys.au.dk/
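For reference, restart.sh presumably does something along the lines of the sketch below. This is only a guess at its structure, assuming three containers at consecutive fixed IPs (.2, .3, .4) on cloudNet; consult the actual script for the real container names and addresses.

#!/bin/sh
# stop and remove any old containers, then start three fresh ones on cloudNet
for i in 2 3 4; do
    docker rm -f cloudsrim$i 2>/dev/null
    docker run --net cloudNet --ip 172.18.0.$i --name cloudsrim$i -itd munken/cloudsrim
done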

ElossWeb

https://eloss.kern.phys.au.dk/

The relevant files are located in /home/kernadmin/eloss-dock. The docker image is built by first running

docker build -t elossweb .

from that directory, and then starting a container on cloudNet (nginx expects this specific IP address):

docker run --net cloudNet --name elossweb --ip 172.18.0.5 -itd elossweb

Note: elossweb saves SRIM files so that it does not need to generate them again in the future. If for some reason elossweb cannot see cloudsrim, these saved files will be junk, and no output will be given when elossweb is prompted for that file. To delete them, so that elossweb is forced to download them again, do

docker exec -it elossweb bash

and remove the files from the .AUSAlib/SRIM13 directory.
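For example, from the shell inside the container (assuming .AUSAlib sits in the container user's home directory):

rm -v ~/.AUSAlib/SRIM13/*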

InfluxDB

InfluxDB is a time-series database which we mainly use to store data to be plotted in Grafana. We use version 1, as upgrading to later versions is a significant effort that does not give any needed advantages. The installation guide can be found here: https://docs.influxdata.com/influxdb/v1/introduction/install/

The configuration file is located in /etc/influxdb/influxdb.conf, and data in /var/lib/influxdb. It is important that the influxdb user has permissions for this directory, which can be given by

sudo chown -R influxdb:influxdb /var/lib/influxdb/*

The data in /var/lib/influxdb/data is a symlink to /mnt/nfs/influxdb/data. When I set up influxdb on a different machine, this symlink gave problems. It might be better to use a bind mount instead (see the fstab sketch below), but we leave it as is because it is not currently broken. The service can be restarted with

sudo service influxdb restart
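Should the bind-mount route ever be taken, the corresponding /etc/fstab entry would look roughly like this:

/mnt/nfs/influxdb/data  /var/lib/influxdb/data  none  bind  0  0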

To access the DB directly, ssh into kern and run

influx

For basement-related data we use the database measure, with measurements that correspond to different services such as vacuum, vulom etc.
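For example, a quick look at the vacuum data from within the influx shell could be (the exact measurement and field layout is whatever the services have written):

USE measure
SHOW MEASUREMENTS
SELECT * FROM "vacuum" ORDER BY time DESC LIMIT 5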

Grafana

https://grafana.kern.phys.au.dk/

To install, follow directions given here: https://grafana.com/grafana/download

The configuration file is /etc/grafana/grafana.ini.

Code which defines the dashboards is located in /var/lib/grafana/

Once set up, it can be restarted with

sudo service grafana-server restart

nginx

Nginx is responsible for exposing all the services to the internet. Configuration files are located in /etc/nginx/. After I upgraded, I found that it was necessary to do

sudo chown -R nginx:nginx /var/lib/nginx/

and

sudo chmod -R 700 /var/lib/nginx/

otherwise Grafana and the elog would not behave properly (Grafana opened with an error message; images on the elog would not load).
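After changing anything in /etc/nginx/, the configuration can be checked and reloaded with

sudo nginx -t
sudo service nginx reload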

Snowfox

Located in the basement at the 5 MeV accelerator.

Mux_client (snowfox)

Used to monitor trigger data. Can be restarted on snowfox with

sudo service mux_client restart

Pfeiffer (snowfox)

Used to monitor vacuum in basement. Can be restarted on snowfox with

sudo service pfeiffer restart
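Assuming both run as ordinary systemd services, their status and recent log output can be inspected with, for example,

systemctl status mux_client
journalctl -u pfeiffer --since "1 hour ago"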

Gitlab

We do not host our own gitlab server, instead we have one run by AU. In the AUSA group, users can be added and permissions changed by going to https://gitlab.au.dk/ausa and clicking “Manage”.

Legacy

Nodes and services which are no longer active, but whose information may still be useful.

kern

GitLab

Access: https://git.kern.phys.au.dk/

RUP

Runs on data_dumper@stkernfys. We have a premium user with the username: ausadocker, and the usual password.

Code: https://git.kern.phys.au.dk/ausa/RUP

Settings: /etc/RUP/settings.yml

Log: /var/log/RUP.yml

Rules: /etc/RUP/rules.yml

Startup script: /etc/init/RUP.conf

Iterate directory: In case you want to unpack a bunch of files after they have already been uploaded to the data directory, you can use the iterate option. Edit rules.yml and settings.yml. As the data user you can then execute (example iterating all files in the directory is561c/):

/usr/bin/python /usr/local/bin/RUP --rules /etc/RUP/rules.yml --config /etc/RUP/settings.yml --log RUPis561c2.log --iterate is561c/

To start RUP simply go on stkernfys and type

service RUP start

RUP uses docker. To log in to docker, go onto data_dumper on stkern and type

docker login

Collectd

Collectd is used to gather data on running processes. At the moment it is running on Snowfox and the data is sent to Influx in the collectd database. The config file is /etc/collectd/collectd.conf, and the service can be restarted with

service collectd restart

RunDB

Access: https://rundb.kern.phys.au.dk/

Runs in docker.

Docker image: rundb

Docker container: rundb

Docker file: https://git.kern.phys.au.dk/ausa/rundb-docker

Cloned to /home/kernadmin/rundb-docker

Build image with: docker build -t rundb .

Database is in /opt/rundb/runs.db

Log files are in /opt/rundb/log_files

Container started with /home/kernadmin/rundb-docker/run.sh

Runs on port 8092

nginx-configuration: /etc/nginx/sites-available/rundb.kern.phys.au.dk.conf

NB! If rundb is not working, it can be started with the command sudo docker start rundb

DAQC

Access: https://daqc.kern.phys.au.dk/

stkernfys

Access: stkernfys.phys.au.dk

What do we use it for:

Nice to know:

ci-kern

Access: ci-kern.phys.au.dk

If ci-kern runs out of space, either prune the docker images or reinstall it.

One can prune the docker images by typing

docker system prune

on ci-kern.

If pruning does not clear enough space, then one can also delete all dangling volumes:

docker volume rm $(docker volume ls -qf dangling=true)

If we need to install a new runner, then type

gitlab-runner register

Make sure that tags = docker, default image = ubuntu, and executor = docker.
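For reference, the registration can also be done non-interactively, roughly like this (the URL and token are placeholders, to be taken from the runner settings page on gitlab.au.dk):

sudo gitlab-runner register \
  --non-interactive \
  --url https://gitlab.au.dk/ \
  --registration-token <token> \
  --executor docker \
  --docker-image ubuntu \
  --tag-list docker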

Docker

We use docker to pull images on ci-kern for CI implementation and for RUP to do automatic unpacking and handling of data files. The docker user that is used on ci-kern and in RUP (data_dumper's docker login) is AUSA's own user. This is a "pro" user with unlimited pulls. The user is currently set up on Hans' email with the username "ausadocker" and the usual password.