bcbio-nextgen Homo sapiens reference genome

Details on the assembly sources used for the hg38 genome build.
  1. Reference genome for alignment (seq/)
    1. HISAT2
  2. Reference transcriptome for RNA-seq (rnaseq/)
  3. YAML recipe links

Reference genome for alignment (seq/)

The hg38 reference genome build (UCSC naming; equivalent to Ensembl GRCh38) comes from the 1000 Genomes Project. It is derived from the NCBI set and adds HLA and decoy sequences alongside the alternative alleles. This build was chosen because it is considered the best current reference for variant calling. Aligners, including HISAT2, STAR, bowtie2, and bwa, use this reference genome FASTA.
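To see how the HLA, decoy, and alternative-allele additions show up in the FASTA, one can group the sequence names from its header lines. This is a minimal sketch, not part of bcbio-nextgen: `contig_names` and `classify_contig` are hypothetical helpers, and the naming patterns (an `HLA-` prefix, `_alt` and `_decoy` suffixes, and chr1 to chr22, chrX, chrY, chrM for primary chromosomes) are assumptions based on the 1000 Genomes GRCh38 naming style that should be checked against the actual headers in seq/.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed primary-chromosome names: chr1-chr22, chrX, chrY, chrM.
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def contig_names(fasta_lines: Iterable[str]) -> List[str]:
    """Return the sequence name (first token after '>') of each FASTA header."""
    return [line[1:].split()[0] for line in fasta_lines if line.startswith(">")]


def classify_contig(name: str) -> str:
    """Assign a contig to a coarse category using assumed naming conventions."""
    if name.startswith("HLA-"):
        return "hla"
    if name.endswith("_decoy"):
        return "decoy"
    if name.endswith("_alt"):
        return "alt"
    if PRIMARY.fullmatch(name):
        return "primary"
    return "other"  # e.g. *_random or chrUn_* scaffolds


# Illustrative lines only, not copied from the real FASTA.
example_headers = [
    ">chr1 example description",
    ">chr1_KI270706v1_random",
    ">chr6_GL000250v2_alt",
    ">chrUn_JTFH01000001v1_decoy",
    ">HLA-A*01:01:01:01",
    ">chrM",
    "ACGTACGTNNNN",  # sequence line, ignored by contig_names
]

counts = Counter(classify_contig(n) for n in contig_names(example_headers))
print(dict(counts))
```

For a real file, `contig_names` accepts an open file handle directly, since iterating over it yields one line at a time.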

Relevant links, with version information:

See also:


HISAT2 pre-built index version information:

Reference transcriptome for RNA-seq (rnaseq/)

bcbio-nextgen currently uses the latest Ensembl GRCh38 release for the hg38 transcript FASTA and GTF. The annotations are remapped from Ensembl to UCSC chromosome naming (see the gtf.yaml recipe link below). The transcript FASTA is used by salmon and kallisto for transcript-level quantification.
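At its simplest, remapping Ensembl annotations to UCSC naming means rewriting the seqname column of each GTF record, for example `1` to `chr1` and `MT` to `chrM`. The sketch below illustrates that idea under stated assumptions: the map covers only the primary chromosomes, records with no mapping are dropped, and `ensembl_to_ucsc_map` and `remap_gtf` are hypothetical helpers. The actual gtf.yaml recipe may handle additional contigs or differ in detail.

```python
from typing import Dict, Iterable, Iterator


def ensembl_to_ucsc_map() -> Dict[str, str]:
    """Hypothetical, partial Ensembl -> UCSC seqname map (primary chromosomes only)."""
    names = {str(i): f"chr{i}" for i in range(1, 23)}
    names.update({"X": "chrX", "Y": "chrY", "MT": "chrM"})
    return names


def remap_gtf(lines: Iterable[str], name_map: Dict[str, str]) -> Iterator[str]:
    """Rewrite the seqname (column 1) of GTF records; drop records with no mapping."""
    for line in lines:
        if line.startswith("#"):
            yield line  # pass header/comment lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        ucsc_name = name_map.get(fields[0])
        if ucsc_name is None:
            continue  # e.g. scaffolds outside this simplified map
        fields[0] = ucsc_name
        yield "\t".join(fields) + "\n"


# Illustrative records only (made-up IDs and coordinates).
example_gtf = [
    "#!genome-build GRCh38\n",
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "example1";\n',
    'MT\tensembl\tgene\t300\t400\t.\t-\t.\tgene_id "example2";\n',
    'KI270728.1\tensembl\tgene\t1\t50\t.\t+\t.\tgene_id "example3";\n',
]

for record in remap_gtf(example_gtf, ensembl_to_ucsc_map()):
    print(record, end="")
```

Dropping unmapped records keeps the sketch simple; a production remapping would more likely carry an explicit entry for every contig present in both assemblies.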

YAML recipe links

Current hg38 recipes:

Recipes per workflow, defined in CloudBioLinux: