Install bcbio-nextgen

bcbio is a Python toolkit that provides best-practice pipelines for next-generation sequencing data analysis.
  1. Check environment
  2. Install bcbio
  3. Install pre-built genomes
  4. Upgrade instructions
  5. Install custom genomes
    1. GENCODE and Ensembl scripts

This guide uses the bash shell to install bcbio in a Linux environment. To run bcbio on macOS or Windows, refer to bcbio-vm, which is designed to run inside an isolated container.

Check environment

Before running the bcbio installer, ensure that you’re running a clean environment. We recommend keeping the configuration inside your ~/.bashrc file to a minimum. In particular, check that conda is not actively loaded or exported in $PATH:

# conda should not be loaded during install.
which conda
echo $PATH
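
The same check can be scripted so that it fails loudly before the installer runs. The following is a minimal sketch, not part of bcbio: conda_on_path is a hypothetical helper name, and it only inspects the directories listed in $PATH, so it will not notice a conda shell function defined by "conda init".

```bash
# Hypothetical helper (not shipped with bcbio): print the first "conda"
# executable found in a PATH-style string and succeed, or fail if none exists.
# The body runs in a subshell, so the IFS and globbing changes stay local.
conda_on_path() (
    IFS=':'
    set -f  # disable globbing while the path string is split on ':'
    for dir in ${1:-$PATH}; do
        if [ -n "$dir" ] && [ -f "$dir/conda" ] && [ -x "$dir/conda" ]; then
            printf '%s\n' "$dir/conda"
            exit 0
        fi
    done
    exit 1
)

if found="$(conda_on_path)"; then
    echo "conda is still on PATH: ${found}" >&2
else
    echo "OK: no conda executable on PATH"
fi
```

If the check reports a conda path, remove the corresponding lines from ~/.bashrc and open a new terminal before continuing.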

We recommend installing bcbio on a share with at least 100 GB of free space, since the genome builds can be quite large. This guide assumes that bcbio will be installed inside a directory symlinked to ~/bcbio.
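
To confirm the free space before starting, a check along these lines may help. It is a sketch under stated assumptions: free_gb and has_enough_space are hypothetical helper names, the 100 GB default mirrors the recommendation above, and the parsing assumes the filesystem name printed by df contains no spaces.

```bash
# Hypothetical helpers (not part of bcbio).
# Print the free space, in whole gigabytes, on the filesystem holding a directory.
free_gb() {
    # -P selects the portable one-line-per-filesystem layout; -k reports 1 KiB blocks.
    # Column 4 of that layout is the available space.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed if directory $1 has at least $2 GB free (default: 100).
has_enough_space() {
    [ "$(free_gb "$1")" -ge "${2:-100}" ]
}

install_dir="${HOME}/bcbio"
if [ -d "$install_dir" ]; then
    if has_enough_space "$install_dir"; then
        echo "OK: $(free_gb "$install_dir") GB free under ${install_dir}"
    else
        echo "Only $(free_gb "$install_dir") GB free under ${install_dir}" >&2
    fi
fi
```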

Install bcbio

Here we install the latest stable build of bcbio as minimally as possible, without the full set of pre-built genomes.

cd ~/bcbio
mkdir stable tools

Download the installer script.

wget https://raw.github.com/bcbio/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py

bcbio has many configuration options. The official installation guide covers them thoroughly, and the installer's built-in help is available directly in the terminal.

python bcbio_nextgen_install.py --help

We recommend enabling the options used in the install command below by default.

Note that the --aligners and --genomes arguments append, so pass each one multiple times to install multiple aligners and/or genomes.

To install bcbio without any pre-built genomes, set the --nodata flag. We recommend installing at least one pre-built genome (e.g. hg38).

python bcbio_nextgen_install.py stable \
    --upgrade="stable" \
    --tooldir="tools" \
    --aligners="hisat2" \
    --aligners="minimap2" \
    --aligners="bowtie2" \
    --datatarget="rnaseq" \
    --genomes="hg38" \
    --genomes="hg19" \
    --cores="8" \
    --isolate \
    --minimize-disk

The main bcbio_nextgen.py program will be installed to the bin directory inside the tool directory (tools/bin in this guide). After installation has finished, add that directory to the $PATH variable in your ~/.bashrc file and relaunch the terminal.

export BCBIO_DIR="${HOME}/bcbio/tools/bin"
export PATH="${BCBIO_DIR}:${PATH}"
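
If the installation steps are re-run, these lines can end up duplicated in ~/.bashrc. One way to avoid that is to append each line only when it is not already present. This is a minimal sketch; append_once is a hypothetical helper, not something bcbio provides.

```bash
# Hypothetical helper: append a line to a file only if the exact line
# is not already there.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (single quotes keep the variables unexpanded in the file):
#   append_once 'export BCBIO_DIR="${HOME}/bcbio/tools/bin"' "${HOME}/.bashrc"
#   append_once 'export PATH="${BCBIO_DIR}:${PATH}"' "${HOME}/.bashrc"
```

The usage lines are left commented out so that nothing is written to ~/.bashrc until you choose to run them.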

NOTE: bcbio doesn’t have to be exported to $PATH to work. You can call bcbio_nextgen.py directly to run the pipeline.

python "${BCBIO_DIR}/bcbio_nextgen.py"

Calling the program by its full path makes it possible to keep multiple versions of bcbio installed on a single machine.

Install pre-built genomes

bcbio currently requires at least one pre-built genome to be installed; otherwise errors will occur because Galaxy is not properly configured internally. Generally, we recommend installing the pre-built human and mouse genomes.

Here’s how to install additional pre-built genomes after bcbio has been installed:

bcbio_nextgen.py upgrade \
    --upgrade="skip" \
    --genomes="hg38-noalt" \
    --genomes="mm10" \
    --cores=8

Upgrade instructions

bcbio should be upgraded periodically. Here’s how to update to the newest stable release. Note that this command also upgrades all tools (--tools) and genomes (--data); both flags are optional.

bcbio_nextgen.py upgrade --upgrade="stable" --tools --data

Install custom genomes

We recommend performing RNA-seq analysis using the latest genome annotations from GENCODE (human, mouse) or Ensembl. For human datasets, we recommend using GRCh38 instead of the legacy GRCh37 build.

For this, we need to run the bcbio_setup_genome.py script.

bcbio_setup_genome.py --help

If you want to obtain a list of available genomes, simply run bcbio_setup_genome.py without any flags.

In particular, these configuration options are useful when installing custom genomes:

GENCODE and Ensembl scripts

Scripts for installing current GENCODE and Ensembl genome builds into bcbio are available in the koopa bootloader. They are located in the bcbio_setup_genome directory of the package.