Infrastructure
Nodes
Threadripper
The Threadripper (TR) is a 64-thread, 132 GB RAM machine located in 1525-415 that can be used for heavy, multithreaded computing. It can be used either directly at the physical machine or remotely over SSH. In either case, a root user must first run the adduser
command, which creates a new user. At the physical PC, one can then log in and use the command startx
to enter a desktop environment. TR runs a Unix-like operating system called FreeBSD.
ROOT and GEANT4 are already installed for all users. Additional software can be installed using pkg
, or built by users.
When a user has been added, SSH access can be granted by contacting AU IT (specifically, I have been in contact with Dennis Christensen), who can add users to the SSH-Gateway_Phys-RIPP
AD group. TR can then be accessed by a proxy jump through ssh.au.dk
. This requires that the user has an AU ID. Once the user has been added to the AD group, they can SSH in with
ssh -t <AU ID>@ssh.au.dk ssh <TR Username>@ripp.st.lab.au.dk
You will be prompted for your AU-wide password, after which you have to confirm with two-factor authentication (however you have this set up). Then you will be prompted for the password for your TR user. Connecting can be made somewhat faster by adding the following to your ~/.ssh/config
:
Host TR
    HostName ripp.st.lab.au.dk
    User <TR Username>
    ProxyJump <AU ID>@ssh.au.dk
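With this configuration in place, the proxy jump happens automatically and connecting should reduce to
ssh TR
You will still be prompted for passwords and two-factor authentication as before.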
Once this has been set up, you can create a shared folder using SSHFS:
mkdir trmount
sshfs TR:/home/<TR Username> trmount
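To unmount the shared folder again, something along the following lines should work, depending on your operating system:
fusermount -u trmount
on Linux, or
umount trmount
on macOS/BSD.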
This mounts the home directory of your TR user at the directory trmount, which allows you to work directly on the files from your personal machine. Now, to take advantage of the multithreading capabilities of TR, you must either write code which can run multithreaded, or use tricks when running executables from a TR terminal. There are many ways to do such tricks, and I will show an example using parallel
here. I have created a program which takes as input a number (the thread number), prints the time and thread number, waits 4 seconds and then exits:
#include <chrono>
#include <ctime>
#include <iostream>
#include <thread>

int main(int argc, char* argv[]) {
    // Print the run number and the current time
    auto currentTime = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    int inputNumber = std::stoi(argv[1]);
    std::cout << "Run no. " << inputNumber << ": current time is " << std::ctime(&currentTime);
    // Wait 4 seconds, print the time again, and exit
    std::this_thread::sleep_for(std::chrono::seconds(4));
    currentTime = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    std::cout << "Run no. " << inputNumber << " current time is " << std::ctime(&currentTime);
    return 0;
}
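One way to build this on TR could be the following (assuming the source is saved as test.cpp and using the system compiler, which on FreeBSD is clang):
c++ -std=c++17 -O2 -o test test.cpp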
I can run this in parallel on TR by SSHing in, building the executable on TR, and running it with parallel -v ./test ::: 1 2 3
or equivalently parallel -v ./test ::: {1..3}
, which produces the output
./test 1
Run no. 1: current time is Mon Feb 5 16:33:35 2024
Run no. 1 current time is Mon Feb 5 16:33:39 2024
./test 2
Run no. 2: current time is Mon Feb 5 16:33:35 2024
Run no. 2 current time is Mon Feb 5 16:33:39 2024
./test 3
Run no. 3: current time is Mon Feb 5 16:33:35 2024
Run no. 3 current time is Mon Feb 5 16:33:39 2024
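By default, parallel runs roughly one job per CPU core. If you want to leave some threads free for other users, the number of simultaneous jobs can be capped with the -j option, e.g.
parallel -j 32 -v ./test ::: {1..64}
where the numbers here are only for illustration.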
If analysis of large files is required, these could be uploaded to ERDA, and ERDA can be mounted on TR following this guide.
stkern
In July 2024, the Ubuntu running on stkern (22.04.4) and the services running on it were updated. I leave here some notes that may be useful for understanding the services, for maintenance and for future updates. The stkern server is a virtual machine hosted by AU IT. It can be configured from vcsa01.vm.auit.au.dk, where it is also possible to get terminal access in case SSH ceases to function. It is also possible to power the server off and on from this site and to create snapshots of the VM. To upgrade the machine, I made a local backup of anything important and made sure I could get the services running, took a snapshot of the machine, and did do-release-upgrade.
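In rough terms, the upgrade itself (after the backups and the VM snapshot) amounted to something like
sudo apt update && sudo apt upgrade
sudo do-release-upgrade
though the exact sequence of apt commands may have differed.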
Generally our services are structured like this: each service listens on some port on a local IP address on the machine, and nginx is then responsible for exposing the service to the internet. The elossweb and cloudSRIM services run in docker containers, with 3 separate containers for cloudSRIM to allow multiple simultaneous users.
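As an illustration of this structure, the nginx configuration for a service typically contains a server block along the following lines (hostname, port and certificate paths here are placeholders, not copied from the actual configuration):
server {
    listen 443 ssl;
    server_name example.kern.phys.au.dk;
    ssl_certificate /etc/ssl/certs/example.crt;
    ssl_certificate_key /etc/ssl/private/example.key;
    location / {
        proxy_pass http://127.0.0.1:5001;
    }
}
The actual configuration files are in /etc/nginx/ (see the nginx section below).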
Wiki
https://wiki.kern.phys.au.dk/
The wiki is run by the gitit service. The configuration file and data are located in /var/wiki/, and the service runs on port 5001 on localhost. The service can be controlled with commands such as 'service gitit start'. To see what exactly these commands do, consult /etc/systemd/system/gitit.service
. The wiki is backed up on gitlab at
https://gitlab.au.dk/ausa/wiki
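For reference, the gitit.service unit mentioned above might look roughly like the following sketch (the paths and the location of the gitit binary are assumptions; consult the actual file on stkern):
[Unit]
Description=Gitit wiki

[Service]
WorkingDirectory=/var/wiki
ExecStart=/usr/local/bin/gitit -f /var/wiki/gitit.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target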
Elog
https://elog.kern.phys.au.dk/
Configuration and data files are located in /usr/local/elog. I installed this by doing the following (according to https://elog.psi.ch/elog/adminguide.html):
wget https://elog.psi.ch/elog/download/tar/elog-latest.tar.gz
tar -xzvf elog-latest.tar.gz
cd elog-3.1.5-1
sudo make install
After this the elog service can be started and stopped using service elog start
.
To hand over admin privileges of elog to another user, edit the file elogd.cfg
in /usr/local/elog/
on kern and change the field “Admin user” accordingly.
The elog is backed up every hour with a cronjob to https://gitlab.au.dk/ausa/kern/elog. The script that runs in this cronjob is pushElog.sh in the elog directory. To inspect or edit the cronjob, do crontab -e
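The crontab entry will be something along the lines of (the exact path to the script is an assumption):
0 * * * * /usr/local/elog/pushElog.sh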
cloudSRIM
https://srim.kern.phys.au.dk/
The files relevant to running cloudSRIM are located in /home/kernadmin/dock-cloudSRIM
. CloudSRIM is one of the services that runs in a docker container. Docker can be installed by following the instructions given in https://docs.docker.com/engine/install/ubuntu/.
Of particular importance is the Dockerfile, which controls what the docker containers contain and run.
Before starting cloudSRIM a docker network must be created with the name cloudNet:
docker network create --subnet=172.18.0.0/16 cloudNet
172.18.0.0/16 is the subnet where the docker services will be hosted; this is where nginx expects to find them. Now, in the dock-cloudSRIM directory, do:
docker build -t munken/cloudsrim .
to build the image. Once built, running the shell script restart.sh should start (or restart) the cloudSRIM service. It may be necessary to open the firewall for this port. Once the service runs, one should be able to do
curl -f -H 'Content-Type: application/json' -X POST -d '{"Z1":3,"A1":9,"M":9.026790189057362,"target":{"Z":[1,6],"stoi":[4,2],"gas":false,"density":0.93}}' -o 'Li9_4H_32.dat' 172.18.0.2:5000
and get an output file that looks like
==================================================================
Calculation using SRIM-2006
SRIM version ---> SRIM-2012.01
Calc. date ---> July 12, 2024
==================================================================
…
Provided nginx is set up and is exposing cloudsrim, we can do
curl -f -H 'Content-Type: application/json' -X POST -d '{"Z1":3,"A1":9,"M":9.026790189057362,"target":{"Z":[1,6],"stoi":[4,2],"gas":false,"density":0.93}}' -o 'Li9_4H_32.dat' https://srim.kern.phys.au.dk/
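For reference, restarting a single cloudSRIM container by hand would amount to something like
docker rm -f cloudsrim1
docker run --net cloudNet --name cloudsrim1 --ip 172.18.0.2 -itd munken/cloudsrim
where the container name here is an assumption (check restart.sh for the actual names) and the IP matches the curl example above.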
ElossWeb
https://eloss.kern.phys.au.dk/
The relevant files are located in /home/kernadmin/eloss-dock
. The docker image is built by doing first
docker build -t elossweb .
from that directory, and then starting a container on the cloudNet network (nginx expects this specific IP address):
docker run --net cloudNet --name elossweb --ip 172.18.0.5 -itd elossweb
Note: elossweb saves srim files so that it does not need to generate them again in the future. If for some reason elossweb cannot see cloudsrim, these files will be junk, and no output will be given when elossweb is prompted for that file. To delete them, such that elossweb is forced to download them again, do
docker exec -it elossweb bash
and remove the files from the .AUSAlib/SRIM13 directory.
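For example, inside the container shell (assuming the directory sits under the container user's home directory):
rm -r ~/.AUSAlib/SRIM13/*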
InfluxDB
InfluxDB is a time-series database which has mainly been used to store data to be plotted in grafana. We use version 1, as upgrading to a later version would be a significant effort without giving any advantages we need. An installation guide can be found here: https://docs.influxdata.com/influxdb/v1/introduction/install/
The configuration file is located in /etc/influxdb/influxdb.conf
, and data in /var/lib/influxdb
. It is important that the influxdb user has permissions for this directory, which can be given by
sudo chown -R influxdb:influxdb /var/lib/influxdb/*
The data in /var/lib/influxdb/data
is a symlink to /mnt/nfs/influxdb/data
. When I set up influxdb on a different machine, this symlink gave problems. It might be better to use a bind mount instead, but we leave it as is because it is not currently broken. The service can be restarted with
sudo service influxdb restart
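For reference, the bind-mount alternative mentioned above would correspond to an /etc/fstab entry along these lines (a sketch, not something currently in place):
/mnt/nfs/influxdb/data /var/lib/influxdb/data none bind 0 0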
To access the DB directly, ssh into kern and run
influx
For basement-related stuff we use the database measure
, with measurements corresponding to the different services, such as vacuum
, vulom
etc.
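From the influx prompt, the data can be inspected with ordinary InfluxQL, for example
USE measure
SHOW MEASUREMENTS
SELECT * FROM vacuum ORDER BY time DESC LIMIT 5
where the exact measurement and field names depend on what the services write.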
Grafana
https://grafana.kern.phys.au.dk/
To install, follow directions given here: https://grafana.com/grafana/download
The configuration file is /etc/grafana/grafana.ini
.
Code which defines the dashboards is located in /var/lib/grafana/
Once set up, it can be restarted with
sudo service grafana-server restart
nginx
Nginx is responsible for exposing all the services to the internet. Configuration files are located in /etc/nginx/
. After I upgraded, I found that it was necessary to do
sudo chown -R nginx:nginx /var/lib/nginx/
and
sudo chmod -R 700 /var/lib/nginx/
or grafana and the elog would not behave properly (grafana opened with an error message; images on the elog would not load properly).
Snowfox
Located in the basement at the 5MeV accelerator.
Mux_client (snowfox)
Used to monitor trigger data. Can be restarted on snowfox with
sudo service mux_client restart
Pfeiffer (snowfox)
Used to monitor vacuum in basement. Can be restarted on snowfox with
sudo service pfeiffer restart
Gitlab
We do not host our own gitlab server, instead we have one run by AU. In the AUSA group, users can be added and permissions changed by going to https://gitlab.au.dk/ausa and clicking “Manage”.
Legacy
Nodes and services which are no longer active but the information may still be useful.
kern
GitLab
Access: https://git.kern.phys.au.dk/
RUP
Runs on data_dumper@stkernfys
. We have a premium user with the username: ausadocker, and the usual password.
Code: https://git.kern.phys.au.dk/ausa/RUP
Settings: /etc/RUP/settings.yml
Log: /var/log/RUP.yml
Rules: /etc/RUP/rules.yml
Startup script: /etc/init/RUP.conf
Iterate directory: In case you want to unpack a bunch of files after they are already uploaded to the data directory you can use the iterate
option. Edit the rules.yml
and settings.yml
. On the data
user you can now execute (example to iterate all files in directory is561c/
):
/usr/bin/python /usr/local/bin/RUP --rules /etc/RUP/rules.yml --config /etc/RUP/settings.yml --log RUPis561c2.log --iterate is561c/
To start RUP simply go on stkernfys and type
service RUP start
RUP used docker. To log in to docker, go onto data_dumper on stkern and type
docker login
Collectd
Collectd is used to gather data on running processes. At the moment it is running on Snowfox and the data is sent to Influx in the collectd database. The config file is /etc/collectd/collectd.conf, and the service can be restarted with
service collectd restart
RunDB
Access: https://rundb.kern.phys.au.dk/
Runs in docker.
Docker image: rundb
Docker container: rundb
Docker file: https://git.kern.phys.au.dk/ausa/rundb-docker
Cloned to /home/kernadmin/rundb-docker
Build image with: docker build -t rundb .
Database is in /opt/rundb/runs.db
Log files are in /opt/rundb/log_files
Container started with /home/kernadmin/rundb-docker/run.sh
Runs on port 8092
nginx-configuration: /etc/nginx/sites-available/rundb.kern.phys.au.dk.conf
NB! If rundb is not working, it can be started with the command sudo docker start rundb
DAQC
Access: https://daqc.kern.phys.au.dk/
stkernfys
Access: stkernfys.phys.au.dk
What do we use it for:
Nice to know:
ci-kern
Access: ci-kern.phys.au.dk
If ci-kern runs out of space, either prune the docker images or reinstall it.
One can prune the docker images by typing
docker system prune
on ci-kern.
If pruning does not clear enough space, then one can also delete all dangling volumes:
docker volume rm $(docker volume ls -qf dangling=true)
If we need to install a new runner, then type
gitlab-runner register
Make sure that tags = docker, default image = ubuntu, executor = docker.
Docker
We use docker to pull in images on ci-kern for CI implementation and for RUP to do automatic unpacking and handling of data files. The docker user that is used on ci-kern and in RUP (data_dumper's docker login) is AUSA's own user. This is a "pro" user with unlimited pulls. The user is currently set up on Hans' email with the username "ausadocker" and the usual password.