Install R on an HPC cluster using bioconda

Maintain a consistent R environment across HPC environments with conda.
conda, HPC, Python, R

Install conda

We recommend installing conda with either Anaconda or Miniconda.

Once conda is installed, upgrade the conda base installation.

conda update -n base --channel=defaults conda
conda update -n base --channel=defaults --all

bash configuration

Up to conda v4.3, the location of conda's bin directory should be added to $PATH in ~/.bash_profile:

export PATH="$CONDA_DIR/bin:$PATH"

As of the v4.4 update, conda is loaded differently. Instead of modifying $PATH, a profile script must now be sourced in ~/.bash_profile:

. "$CONDA_DIR/etc/profile.d/conda.sh"
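
If the same ~/.bash_profile is shared between clusters running different conda versions, the two approaches above can be combined. The following is a minimal sketch, not an official conda recipe: `load_conda` is a hypothetical helper name, and it assumes the profile script is named `conda.sh`, as shipped by conda 4.4 and later.

```shell
# Hypothetical helper: load conda from the install prefix given as $1.
load_conda() {
    conda_dir="$1"
    if [ -f "$conda_dir/etc/profile.d/conda.sh" ]; then
        # conda >= 4.4: source the profile script.
        . "$conda_dir/etc/profile.d/conda.sh"
    elif [ -d "$conda_dir/bin" ]; then
        # conda <= 4.3: prepend the bin directory to PATH.
        PATH="$conda_dir/bin:$PATH"
        export PATH
    else
        echo "conda not found under $conda_dir" >&2
        return 1
    fi
}

# Example call in ~/.bash_profile (CONDA_DIR must already be set):
# load_conda "$CONDA_DIR"
```

Checking for the profile script first means the newer mechanism wins whenever it is available, and the $PATH fallback only applies to older installations.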

Set up channels

Ensure that bioconda channels are added in the following order:

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

These commands create a ~/.condarc file, which should contain the following:

channels:
  - bioconda
  - conda-forge
  - defaults
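
Each `conda config --add channels` call places the new channel at the top of the list, which is why bioconda, added last, ends up with the highest priority. To confirm the order, `conda config --get channels` can be used. The sketch below reads the file directly instead. `condarc_channels` is a hypothetical helper, and it assumes a simple ~/.condarc with a block-style `channels:` list like the one shown above.

```shell
# Hypothetical helper: print the channels from a .condarc file,
# one per line, highest priority first. Defaults to ~/.condarc.
condarc_channels() {
    awk '
        # Start collecting at the "channels:" key.
        /^channels:/ { in_list = 1; next }
        # Print each "- name" list item without the dash.
        in_list && /^[[:space:]]*-/ {
            sub(/^[[:space:]]*-[[:space:]]*/, "")
            print
            next
        }
        # Any other top-level key ends the list.
        in_list && /^[^[:space:]#]/ { in_list = 0 }
    ' "${1:-$HOME/.condarc}"
}

# Example call:
# condarc_channels ~/.condarc
```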

Managing environments

A list of installed conda environments can be obtained with:

conda env list

A conda environment can be deactivated with:

conda deactivate

Here’s how to remove an environment:

conda env remove --name=R-3.4.1-YYYYMMDD

Create R 3.4.1 environment

Note that pandoc version 2 currently causes problems when rendering R Markdown templates, so pandoc is pinned to version 1 here.

conda create --name=R-3.4.1-20180614 \
    blas \
    gcc \
    hdf5=1.10.1 \
    java-jdk \
    libgfortran \
    libiconv \
    mysql \
    openblas \
    pandoc=1 \
    r-base=3.4.1 \
    umap-learn

mkdir -p ~/R/library/3.4-bioc-release-YYYYMMDD/library
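
Both the environment name and the library path above carry a date stamp in YYYYMMDD form. As a small sketch, and assuming the stamp is meant to be the creation date, the names can be generated rather than typed by hand. The variable names here are illustrative only.

```shell
# Assumption: the YYYYMMDD stamp is the date the environment is created.
r_version="3.4.1"
r_minor="${r_version%.*}"          # strips the patch level: 3.4.1 becomes 3.4
stamp="$(date +%Y%m%d)"

env_name="R-${r_version}-${stamp}"
lib_dir="$HOME/R/library/${r_minor}-bioc-release-${stamp}/library"

# Print the generated names for use in the commands above.
printf 'environment: %s\n' "$env_name"
printf 'library:     %s\n' "$lib_dir"
```

Generating both values from one stamp keeps the environment and its library directory paired, which makes it easier to remove them together later.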

Create ~/.Renviron and ~/.Rprofile files, using the recommended defaults from seqcloud.

Now let’s activate the conda environment and run R.

conda activate R-3.4.1-20180614
R


XML package

The XML package is known to have compilation issues on CentOS. If you run into this issue, here’s how to build it from source on the Harvard O2 cluster:

R CMD INSTALL --configure-args='XML_CONFIG=/n/app/libxml2/2.9.4/bin/xml2-config' "${pkg_file}"
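
The xml2-config path above is specific to O2. On other clusters the location will differ, so it can help to check that the script exists before starting the build. The following is a sketch under that assumption. `find_xml2_config` is a hypothetical helper that prefers an explicit path and otherwise falls back to whatever `xml2-config` is on $PATH.

```shell
# Hypothetical helper: print a usable xml2-config path.
# $1 is an optional preferred location (for example, a cluster module path).
find_xml2_config() {
    candidate="$1"
    if [ -n "$candidate" ] && [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
    elif command -v xml2-config >/dev/null 2>&1; then
        command -v xml2-config
    else
        echo "xml2-config not found" >&2
        return 1
    fi
}

# Example call (pkg_file is the package source file, as in the command above):
# xml_config="$(find_xml2_config /n/app/libxml2/2.9.4/bin/xml2-config)" &&
#     R CMD INSTALL --configure-args="XML_CONFIG=${xml_config}" "${pkg_file}"
```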