Docker/Singularity at the Martinos Center
Due to security concerns with Docker, we do not support running Docker in full-access mode on our Linux workstations or the compute cluster. In limited cases, we do support Docker in isolation (namespace remap) mode. This mode lets you build Docker containers, but the isolation restriction prevents proper binding of local storage into the container when running them, so programs inside the container cannot operate on your data outside the container.
In the HPC community a popular alternative is Singularity. It can run most Docker containers used for data-analysis workflows, and its bind mechanism lets the container access data on outside storage with the same access you have normally. The main drawback of Singularity is that, in general, it requires root access to build new images from scratch, but there are a couple of workarounds to that.
Set up your environment for Docker/Singularity
Docker is not installed on the center's Linux workstations by default. If you want it installed, you need to request it from the Martinos Help Desk. You can check for the existence of the /var/run/docker.sock file to confirm Docker is installed on a machine. Singularity, being a simple user-space program, is installed everywhere.
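For example:
ls -l /var/run/docker.sock
(if this socket file does not exist, Docker has not been installed on that workstation)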
A very important issue is that Docker and Singularity can end up writing many GBs to areas of your home directory, which will overflow your quota. To prevent this you must symlink the directories they use so they point to other storage volumes owned by you or your group. The two places that need symlinking are ~/.singularity and, for Docker/Podman, ~/.local/share/containers. Here is an example:
cd /cluster/mygroup/users/raines #<- change this to your group area
mkdir singularity
mkdir singularity/tmp
mkdir singularity/cache
rm -rf ~/.singularity
ln -s $PWD/singularity ~/.singularity
mkdir docker
rm -rf ~/.local/share/containers
ln -s $PWD/docker ~/.local/share/containers
(do this or Singularity/Docker will fill up your home directory or the workstation OS disk at /tmp)
NOTE: In some cases SINGULARITY_TMPDIR and SINGULARITY_CACHEDIR must be on local disk space instead of network space. In these cases you should temporarily set them in your shell to point to space under /scratch, if that exists on the machine. On the MLSC cluster this is done automatically. You need to do this for builds (see below).
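For example, on a workstation with a /scratch volume (this is the same setup used in the build example later on this page):
mkdir -p /scratch/$USER/{tmp,cache}
export SINGULARITY_TMPDIR=/scratch/$USER/tmp
export SINGULARITY_CACHEDIR=/scratch/$USER/cache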
Running pre-built images from Docker Hub
Here is an example using dhcp-structural-pipeline.
cd /cluster/mygroup/users/raines
mkdir dhcp-structural-pipeline
cd dhcp-structural-pipeline
singularity pull dhcpSP.sif docker://biomedia/dhcp-structural-pipeline:latest
(this will take a long time)
mkdir data
(copy your T1 and T2 data into new data subdir)
singularity run -e -B $PWD/data:/data dhcpSP.sif \
subject1 session1 44 -T1 /data/T1.nii.gz -T2 /data/T2.nii.gz -t 8 -d /data
The -e option gives the container a clean environment before running. For some software you may not want this, so that you can pass in settings via environment variables. If you do not use -e, you should remove certain variables that can break the container. LD_LIBRARY_PATH is one example that will really screw things up. Variables with XDG and DBUS in their names can also cause problems.
In dhcp-structural-pipeline, for example, if you have FSLDIR set to something under /usr/pubsw or /space/freesurfer it will fail, since those paths will not be found in the container. Be aware and be careful.
Check out using -e combined with the --env-file option for more consistent control of your shell environment when using containers.
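For example, you can keep just the settings the container actually needs in a small file and pass only those (the variable names below are hypothetical placeholders):
cat myenv.txt
----------------------------------------------------------
| OMP_NUM_THREADS=8
| MYTOOL_OPTION=foo
----------------------------------------------------------
singularity run -e --env-file myenv.txt -B $PWD/data:/data dhcpSP.sif \
subject1 session1 44 -T1 /data/T1.nii.gz -T2 /data/T2.nii.gz -t 8 -d /data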
Also note you do not need to pull the image every time you run it.
You only pull it again to get new versions.
To make the file path environment inside the container very much like
it is outside the container when running things normally on Martinos
Linux machines, you can add the following options:
-B /autofs -B /cluster -B /space -B /homes -B /vast -B /usr/pubsw -B /usr/local/freesurfer
This would let you source the FreeSurfer environment as normal inside the container. For that, though, the container would need to be one with a non-minimal OS install that has all the system libraries FreeSurfer requires.
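For example, to get an interactive shell with the usual Martinos paths available (mycontainer.sif here is a placeholder for your own image):
singularity exec -B /autofs -B /cluster -B /space -B /homes -B /vast \
  -B /usr/pubsw -B /usr/local/freesurfer mycontainer.sif /bin/bash
Inside that shell you can then source the FreeSurfer environment the same way you would outside the container.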
WARNING: The NVIDIA NGC containers do a 'find -L /usr ...' in their entrypoint script on startup. Binding -B /usr/pubsw therefore makes startup take over 15 minutes, as the script searches the hundreds of GBs of files in /usr/pubsw! This 'find' is pretty useless, so there are two solutions:
- Just do not use -B /usr/pubsw if you do not need that path in what you are running
- Add -B /cluster/batch/IMAGES/nvidia_entrypoint.sh:/usr/local/bin/nvidia_entrypoint.sh to your Singularity command line to overwrite the entrypoint script with a copy I made that removes the 'find'
We have also discovered that most Docker images built for NVIDIA GPU use in AI/ML try to do some fancy work in their entrypoint script on startup to put the "correct" CUDA libs in a directory named /usr/local/cuda/compat/lib. You will get errors about changing this in Singularity, since the container's internal filesystem is unwritable. Your CUDA programs in the container might also fail if the CUDA libs in that directory are used.
To fix this, add the following option to singularity:
-B /opt:/usr/local/cuda/compat
This basically nullifies that directory so it is not used and has no libraries. Singularity automatically adds to the LD_LIBRARY_PATH defined in the container a directory with the correct CUDA libs matching the driver running on the host.
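Putting these two fixes together, a GPU container run might look like the following (the image is the TensorFlow image used in the examples below; the script name is a placeholder):
singularity run --nv -e \
  -B /cluster/batch/IMAGES/nvidia_entrypoint.sh:/usr/local/bin/nvidia_entrypoint.sh \
  -B /opt:/usr/local/cuda/compat \
  -B /cluster/mygroup/data:/data \
  /cluster/batch/IMAGES/tensorflow-20.12-tf2-py3.sif \
  python3 /data/train_model.py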
For more info on singularity options run man singularity-run
or man singularity-exec
or read the User Guide.
The difference between run and exec is that run executes the default entrypoint startup script built into the container, while exec skips the startup configuration and just runs the command you give on the command line.
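For example (myimage.sif is a placeholder):
singularity run myimage.sif
(runs the container's built-in entrypoint/startup script)
singularity exec myimage.sif ls /opt
(skips the startup script and runs only the given ls command)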
Building your own Singularity images
Typically, building Singularity images locally requires full root access via sudo, which we will not give on our Linux workstations. There are two workarounds to this. The simplest is that the organization that makes Singularity now has a remote build option that works well and that you can register for.
(create a Singularity definition file; see the sketch below)
singularity remote login
singularity build --remote myimage.sif Singularity
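A minimal definition file might look like this (the base image and package are just placeholders):
cat Singularity
----------------------------------------------------------
| Bootstrap: docker
| From: ubuntu:22.04
|
| %post
|     apt-get update && apt-get install -qqy python3
|
| %runscript
|     exec python3 "$@"
----------------------------------------------------------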
If there is anything sensitive in your build, though, you should not use the remote builder. Instead you should pull a SIF image of the base image you want to start with and then modify it as shown in the fakeroot/writable example in the next section below.
Modifying existing Singularity images
First you should check whether you really need to modify the image. For example, if you are using Python in an image and simply need to add new packages via pip, you can do that without modifying the image by setting PYTHONUSERBASE to a directory that you bind mount into the container. For example:
cd /cluster/itgroup/raines
mkdir -p local/lib
vi vars.txt #create it with your favorite editor (emacs, pico)
cat vars.txt
----------------------------------------------------------
| PYTHONUSERBASE=/cluster/itgroup/raines
| PYTHONPATH=$PYTHONUSERBASE/lib/python3.7/site-packages
| PATH=$PYTHONUSERBASE/bin:$PATH
----------------------------------------------------------
singularity exec --nv --env-file vars.txt \
-B /cluster/itgroup/raines -B /scratch:/scratch \
-B /autofs -B /cluster -B /space -B /vast \
/cluster/batch/IMAGES/tensorflow-20.12-tf2-py3.sif \
pip3 install nibabel
singularity exec --nv --env-file vars.txt \
-B /cluster/itgroup/raines -B /scratch:/scratch \
-B /autofs -B /cluster -B /space -B /vast \
/cluster/batch/IMAGES/tensorflow-20.12-tf2-py3.sif \
python3 /cluster/itgroup/raines/script_needing_nibabel_and_TF.py
To modify an existing SIF image container file, one needs to first convert it to a sandbox, run a shell inside the sandbox in fakeroot/writable mode, and perform the steps in that shell to modify the container as desired. Then you exit the container and convert the sandbox back to a SIF file.
For this to work you will have to email help@nmr.mgh.harvard.edu to request to be added to the /etc/subuid file on the machine you will use for builds, to turn on user namespace mapping. That machine also needs a large /scratch volume (sandboxes do not work on network-mounted volumes). You then do something like this example:
mkdir -p /scratch/$USER/{tmp,cache}
cd /scratch/$USER
export SINGULARITY_TMPDIR=/scratch/$USER/tmp
export SINGULARITY_CACHEDIR=/scratch/$USER/cache
singularity build --sandbox --fakeroot myTF \
/cluster/batch/IMAGES/tensorflow-20.11-tf2-py3.sif
singularity shell --fakeroot --writable --net myTF
> apt-get update
> apt-get install -qqy python3-tk
> python3 -m pip install matplotlib
> exit
singularity build --fakeroot /cluster/mygroup/users/$USER/myTF.sif myTF
NOTE: you can do a rm -rf /scratch/$USER afterward, but there will be a few files you cannot delete due to the namespace mapping that happens. The daily /scratch cleaner job will eventually clean them up.
Building your own Docker image and running with Singularity
There are plenty of tutorials on building Docker images online. You should go and read one of them to get started (here is the official one). The main things to keep in mind are to tag each build of your image with a unique version tag and that you DO NOT need to push/upload the image to any hub. The image you build is not, however, a single file. It is a special overlay that ends up under /var/lib/docker but that you never touch directly. All interaction is via the docker subcommands.
docker build --tag proj_A_recon:v1 .
docker image ls
Note not all directives in a Dockerfile will convert to Singularity, so some should be avoided. More info can be found here. Basically, only the FROM, COPY, ENV, RUN, CMD, HEALTHCHECK, WORKDIR and LABEL directives are supported. Directives that affect the eventual runtime of the container, like VOLUME, will not translate.
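For example, a minimal Dockerfile using only directives that translate cleanly (the base image, package and script name are placeholders):
cat Dockerfile
----------------------------------------------------------
| FROM ubuntu:22.04
| RUN apt-get update && apt-get install -qqy python3
| COPY recon.py /opt/recon.py
| ENV RECON_THREADS=8
| WORKDIR /opt
| CMD ["python3", "/opt/recon.py"]
----------------------------------------------------------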
You can also do a "docker run -it --rm proj_A_recon:v1 bash" to shell into your container to verify and test things internally.
The next step is to convert it to a Singularity SIF image. This will be a single file created in the directory you run the command in.
singularity build proj_A_recon.sif docker-daemon://proj_A_recon:v1
And once that is done you can run it.
singularity run -B /cluster/mygroup/data:/data proj_A_recon.sif
or
singularity exec -B /cluster/mygroup/data:/data proj_A_recon.sif /bin/bash