Make a GenomicRanges object

makeGRangesFromEnsembl(organism, level = c("genes", "transcripts"),
  genomeBuild = NULL, release = NULL)

makeGRangesFromEnsDb(object, level = c("genes", "transcripts"))

makeGRangesFromGFF(file, level = c("genes", "transcripts"))

annotable(organism, level = c("genes", "transcripts"),
  genomeBuild = NULL, release = NULL)

Arguments

organism

character(1). Full Latin organism name (e.g. "Homo sapiens").

level

character(1). Return ranges as "genes" or "transcripts".

genomeBuild

character(1). Ensembl genome build assembly name (e.g. "GRCh38"). If set to NULL, defaults to the most recent build available. Note: do not pass in UCSC build IDs (e.g. "hg38").

release

integer(1). Ensembl release version (e.g. 90). If set to NULL, defaults to the most recent release available.

object

EnsDb object, or the name of an EnsDb package as character(1).

file

character(1). File path.
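Because genomeBuild expects Ensembl assembly names rather than UCSC build IDs, it can help to translate the name before calling makeGRangesFromEnsembl(). The following is a minimal sketch in base R; `ucscToEnsemblBuild()` is a hypothetical helper that is not part of the package, and its lookup table covers only a few well-known human and mouse builds.

```r
# Hypothetical helper (not provided by the package): translate a few
# well-known UCSC build IDs to the corresponding Ensembl assembly names.
ucscToEnsemblBuild <- function(genomeBuild) {
    stopifnot(is.character(genomeBuild), length(genomeBuild) == 1L)
    # Small illustrative lookup table; extend as needed.
    map <- c(
        hg19 = "GRCh37",
        hg38 = "GRCh38",
        mm10 = "GRCm38",
        mm39 = "GRCm39"
    )
    if (genomeBuild %in% names(map)) {
        map[[genomeBuild]]
    } else {
        # Assume anything else is already an Ensembl-style name.
        genomeBuild
    }
}

ucscToEnsemblBuild("hg38")
ucscToEnsemblBuild("GRCh38")
```

The result could then be passed on, for example as makeGRangesFromEnsembl("Homo sapiens", genomeBuild = ucscToEnsemblBuild("hg38")).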

Value

GRanges.

Functions

  • makeGRangesFromEnsembl: Quickly obtain gene and transcript annotations from Ensembl using AnnotationHub and ensembldb.

    Simply specify the desired organism using its full Latin name. For example, "Homo sapiens" returns human annotations. Optionally, a specific Ensembl genome build (e.g. GRCh38) and release version (e.g. 87) can be requested.

    Under the hood, this function fetches annotations from AnnotationHub using the ensembldb package. AnnotationHub supports versioned Ensembl releases, back to version 87.

    Genome build: use "GRCh38" instead of "hg38", since we're querying Ensembl and not UCSC.

  • makeGRangesFromEnsDb: Use a specific EnsDb object as the annotation source. Alternatively, pass in the name of an EnsDb package as a character(1).

  • makeGRangesFromGFF: GFF (General Feature Format) consists of one line per feature, each containing 9 columns of data, plus optional track definition lines. GTF (General Transfer Format) is identical to GFF version 2. We recommend using a GTF file instead of a GFF3 file, if possible.

    The UCSC website has detailed conventions on the GFF3 format, including the metadata columns.

    Remote URLs and compressed files are supported.

  • annotable: annotable() is a legacy convenience function that calls makeGRangesFromEnsembl() and returns a tibble instead of GRanges. Note that GRanges can also be coerced using as.data.frame().
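To illustrate the coercion mentioned for annotable(), here is a minimal sketch that builds a small GRanges by hand and converts it with as.data.frame(). The ranges and identifiers are made up for illustration and stand in for the output of the functions above; only the GenomicRanges package is assumed to be installed.

```r
library(GenomicRanges)

# Toy ranges with made-up identifiers, standing in for the GRanges
# returned by the functions documented above.
gr <- GRanges(
    seqnames = c("1", "1", "X"),
    ranges = IRanges(
        start = c(100L, 500L, 900L),
        end = c(200L, 800L, 950L)
    ),
    strand = c("+", "-", "+"),
    geneID = c("gene1", "gene2", "gene3"),
    geneBiotype = c("protein_coding", "lincRNA", "miRNA")
)

# Coordinates become the leading columns; metadata columns follow.
df <- as.data.frame(gr)
colnames(df)
```

The resulting data frame has one row per range, so it can be joined to count tables or other per-gene data with ordinary data frame tools.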

Broad class definitions

For gene and transcript annotations, a broadClass column is added, which generalizes the gene types into a smaller number of semantically meaningful groups:

  • coding.

  • noncoding.

  • pseudo.

  • small.

  • decaying.

  • ig (immunoglobulin).

  • tcr (T cell receptor).

  • other.
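How biotypes collapse into these groups is not spelled out on this page; the example output only indicates that geneName, geneBiotype (or transcriptBiotype), and seqnames are consulted. The following is a minimal sketch, assuming a simple first-match pattern lookup on the biotype alone. `toyBroadClass()` and its patterns are hypothetical and are not the package's actual rules.

```r
# Hypothetical illustration only: collapse Ensembl biotypes into broad
# groups using ordered patterns, where the first matching rule wins.
toyBroadClass <- function(biotype) {
    stopifnot(is.character(biotype))
    rules <- c(
        coding = "^protein_coding$",
        ig = "^IG_",
        tcr = "^TR_",
        pseudo = "pseudo",
        decaying = "decay",
        small = "^(miRNA|snRNA|snoRNA|scaRNA|rRNA|tRNA|misc_RNA)$",
        noncoding = "^(lincRNA|lncRNA|antisense|processed_transcript)$"
    )
    out <- rep(NA_character_, length(biotype))
    for (group in names(rules)) {
        # Only assign entries that no earlier rule has claimed.
        hit <- is.na(out) & grepl(rules[[group]], biotype)
        out[hit] <- group
    }
    # Anything left unmatched falls into the catch-all group.
    out[is.na(out)] <- "other"
    out
}

toyBroadClass(c(
    "protein_coding", "IG_V_gene", "TR_C_gene", "processed_pseudogene",
    "nonsense_mediated_decay", "miRNA", "lincRNA", "TEC"
))
```

Ordering the rules matters in this sketch: a biotype such as "IG_V_pseudogene" lands in ig rather than pseudo because the ig rule is checked first.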

GRCh37 (hg19) legacy annotations

makeGRangesFromEnsembl supports the legacy Homo sapiens GRCh37 (release 75) build by internally querying the EnsDb.Hsapiens.v75 package. Alternatively, the corresponding GTF/GFF file can be loaded directly from GENCODE or Ensembl.
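A minimal sketch of requesting the legacy build through the genomeBuild argument follows. It assumes the package documented here is basejump (inferred from `basejumpCacheURL` in the Examples) and that EnsDb.Hsapiens.v75 is installed; the call is skipped when either is missing.

```r
# Assumptions: the package name `basejump` is inferred from the Examples,
# and EnsDb.Hsapiens.v75 must be installed for the legacy build.
if (
    requireNamespace("basejump", quietly = TRUE) &&
    requireNamespace("EnsDb.Hsapiens.v75", quietly = TRUE)
) {
    gr37 <- basejump::makeGRangesFromEnsembl(
        organism = "Homo sapiens",
        level = "genes",
        genomeBuild = "GRCh37"
    )
}
```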

Examples

## makeGRangesFromEnsembl ====
## Genes
x <- makeGRangesFromEnsembl("Homo sapiens", level = "genes")
#> Making GRanges from Ensembl.
#> Matching EnsDb from AnnotationHub 2.14.3 (2018-10-24).
#> AH64923: Ensembl 94 EnsDb for Homo sapiens
#> Making GRanges from EnsDb object.
#> - Organism: Homo sapiens
#> - Genome Build: GRCh38
#> - Ensembl Release: 94
#> - Level: genes
#> Defining broadClass using: geneName, geneBiotype, seqnames
#> Arranging by geneID.
summary(x)
#> [1] "GRanges object with 65687 ranges and 8 metadata columns"
## Transcripts
x <- makeGRangesFromEnsembl("Homo sapiens", level = "transcripts")
#> Making GRanges from Ensembl.
#> Matching EnsDb from AnnotationHub 2.14.3 (2018-10-24).
#> AH64923: Ensembl 94 EnsDb for Homo sapiens
#> Making GRanges from EnsDb object.
#> - Organism: Homo sapiens
#> - Genome Build: GRCh38
#> - Ensembl Release: 94
#> - Level: transcripts
#> Defining broadClass using: geneName, transcriptBiotype, seqnames
#> Arranging by transcriptID.
summary(x)
#> [1] "GRanges object with 228432 ranges and 15 metadata columns"
## makeGRangesFromEnsDb ====
x <- makeGRangesFromEnsDb("EnsDb.Hsapiens.v75")
#> Making GRanges from EnsDb object.
#> Loading required namespace: EnsDb.Hsapiens.v75
#> - Organism: Homo sapiens
#> - Genome Build: GRCh37
#> - Ensembl Release: 75
#> - Level: genes
#> Defining broadClass using: geneName, geneBiotype, seqnames
#> Arranging by geneID.
## makeGRangesFromGFF ====
file <- file.path(basejumpCacheURL, "example.gtf")
## Genes
x <- makeGRangesFromGFF(file = file, level = "genes")
#> Making GRanges from GFF/GTF file.
#> Importing example.gtf using rtracklayer::import().
#> Ensembl GTF detected.
#> 17 gene annotations detected.
#> Defining broadClass using: geneName, geneBiotype, seqnames
#> Arranging by geneID.
summary(x)
#> [1] "GRanges object with 17 ranges and 8 metadata columns"
## Transcripts
x <- makeGRangesFromGFF(file = file, level = "transcripts")
#> Making GRanges from GFF/GTF file.
#> Importing example.gtf using rtracklayer::import().
#> Ensembl GTF detected.
#> 20 transcript annotations detected.
#> Defining broadClass using: geneName, geneBiotype, seqnames
#> Arranging by transcriptID.
summary(x)
#> [1] "GRanges object with 20 ranges and 16 metadata columns"
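For orientation on what the GTF import parses, the nine-column layout described for makeGRangesFromGFF can be inspected by hand. The following base R sketch splits a single made-up GTF feature line into its columns and then into attribute key-value pairs. The line content is invented for illustration, and this is not how the package itself reads files (the log above shows it delegating to rtracklayer::import()).

```r
# Standard column names for GFF version 2 / GTF feature lines.
gtfColumns <- c(
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute"
)

# A single made-up feature line (tab-separated, 9 columns).
line <- paste(
    "1", "toy_source", "gene", "1000", "2000", ".", "+", ".",
    "gene_id \"GENE0001\"; gene_name \"TOY1\"; gene_biotype \"protein_coding\";",
    sep = "\t"
)

# Split the line into its nine named fields.
fields <- strsplit(line, split = "\t", fixed = TRUE)[[1L]]
names(fields) <- gtfColumns

# Split the attribute column into key-value pairs and strip the quotes.
pairs <- strsplit(fields[["attribute"]], split = ";\\s*")[[1L]]
keys <- sub("\\s.*$", "", pairs)
values <- gsub("\"", "", sub("^\\S+\\s+", "", pairs))
attrs <- setNames(values, keys)

fields[c("seqname", "feature", "start", "end", "strand")]
attrs
```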