Overview

The worminfo package provides quick, simple access to up-to-date annotation data for the entire C. elegans genome. The underlying gene annotations are periodically sourced from WormBase, Ensembl, and PANTHER. RNAi clone annotations are sourced from the official ORFeome and Ahringer repositories. WormBase is updated on a bi-monthly release schedule. If the source data appears outdated, file an update request on our GitHub repository, and we’ll promptly issue a rebuild. When the package is loaded, the most recent annotation data is downloaded from our server and then cached locally for quick access.

# Install worminfo and its dependencies
source("https://bioconductor.org/biocLite.R")
biocLite(c(
    "steinbaugh/basejump",
    "steinbaugh/worminfo"
))
biocLite("tidyverse")

# Load the packages
library(basejump)
library(worminfo)
library(tidyverse)

Gene queries

gene() accepts multiple identifier formats:

  • gene
  • sequence
  • name
  • class
  • keyword

The select parameter controls which annotation columns are returned. The function returns a tibble, so we advise performing subsequent data manipulation with the tidyverse collection of packages, dplyr in particular.

gene("WBGene00004804", format = "gene")
gene("T19E7.2", format = "sequence")
gene("skn-1", format = "name")

These queries all return the same result, shown here as glimpse() output:

## Observations: 1
## Variables: 3
## $ gene     <chr> "WBGene00004804"
## $ sequence <chr> "T19E7.2"
## $ name     <chr> "skn-1"

You can obtain all genes in a named class. For example, let’s get the current list of daf (abnormal dauer formation) genes:

gene("daf", format = "class")
## # A tibble: 34 x 4
##    class           gene sequence   name
##    <chr>          <chr>    <chr>  <chr>
##  1   daf WBGene00000897  F29C4.1  daf-1
##  2   daf WBGene00000898 Y55D5A.5  daf-2
##  3   daf WBGene00000899  F25E2.5  daf-3
##  4   daf WBGene00000900  C05D2.1  daf-4
##  5   daf WBGene00000901  W01G7.1  daf-5
##  6   daf WBGene00000902  F31F6.5  daf-6
##  7   daf WBGene00000903  B0412.2  daf-7
##  8   daf WBGene00000904 R05D11.1  daf-8
##  9   daf WBGene00000905  T13C5.1  daf-9
## 10   daf WBGene00000906  F23B2.4 daf-10
## # ... with 24 more rows
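
Because gene() returns a tibble, the result can be passed straight to dplyr verbs. The following is a minimal sketch, not package output: the daf object is a hand-typed stand-in for the first three rows shown above, so the example runs without downloading any annotation data. In practice you would presumably start from gene("daf", format = "class") instead.

```r
library(tibble)
library(dplyr)

# Stand-in for gene("daf", format = "class"): the first three rows
# of the output shown above, typed in by hand.
daf <- tibble(
    class    = "daf",
    gene     = c("WBGene00000897", "WBGene00000898", "WBGene00000899"),
    sequence = c("F29C4.1", "Y55D5A.5", "F25E2.5"),
    name     = c("daf-1", "daf-2", "daf-3")
)

# Keep a subset of genes by name, then keep only two identifier columns.
picked <- daf %>%
    filter(name %in% c("daf-2", "daf-3")) %>%
    select(gene, name)

print(picked)
```

Note that dplyr's select() verb, used here to choose columns after the query, is separate from the select parameter of gene().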

You can perform a keyword search. For example, let’s get all genes that are involved in the unfolded protein response:

gene("unfolded protein response", format = "keyword")
## # A tibble: 65 x 4
##              gene                   keyword  sequence   name
##             <chr>                     <chr>     <chr>  <chr>
##  1 WBGene00000024 unfolded protein response     AC3.3  abu-1
##  2 WBGene00000025 unfolded protein response  F19G12.7  abu-2
##  3 WBGene00000026 unfolded protein response   F31A3.1  abu-3
##  4 WBGene00000027 unfolded protein response   Y5H2A.3  abu-4
##  5 WBGene00000028 unfolded protein response Y105C5A.4  abu-5
##  6 WBGene00000029 unfolded protein response   C03A7.7  abu-6
##  7 WBGene00000030 unfolded protein response   C03A7.8  abu-7
##  8 WBGene00000031 unfolded protein response  C03A7.14  abu-8
##  9 WBGene00000032 unfolded protein response  R09F10.2  abu-9
## 10 WBGene00000033 unfolded protein response   F35A5.3 abu-10
## # ... with 55 more rows

select arguments

The gene() function can return the following annotation columns, which are requested by name through the select parameter:

  • gene
  • name
  • sequence
  • status
  • otherIDs
  • descriptionConcise
  • descriptionProvisional
  • descriptionDetailed
  • descriptionAutomated
  • class
  • ortholog
  • rnaiPhenotype
  • pantherUniprotKB
  • pantherSubfamily
  • pantherFamilyName
  • pantherSubfamilyName
  • pantherGoMF
  • pantherGoBP
  • pantherGoCC
  • pantherClass
  • pantherPathway

When select is NULL, a simple result is returned:

  • gene
  • sequence
  • name

Let’s use the select parameter to retrieve the status and otherIDs for sbp-1 (a bHLH transcription factor):

gene("sbp-1",
     format = "name",
     select = c("status", "otherIDs")) %>%
    glimpse()
## Observations: 1
## Variables: 5
## $ name     <chr> "sbp-1"
## $ gene     <chr> "WBGene00004735"
## $ sequence <chr> "Y47D3B.7"
## $ status   <chr> "Live"
## $ otherIDs <chr> "Y47D3B.7, sbp-1, lpd-1, hlh-20"
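
The otherIDs column packs several identifiers into one comma-separated string. As a small illustration using only base R, the sketch below splits the value shown above into a character vector and checks whether a given alias is listed. The variable names are ours and are not part of worminfo.

```r
# otherIDs value copied from the sbp-1 output above.
other_ids <- "Y47D3B.7, sbp-1, lpd-1, hlh-20"

# Split on commas and trim the surrounding whitespace.
aliases <- trimws(strsplit(other_ids, ",", fixed = TRUE)[[1]])

# Check whether a particular alias is listed for this gene.
has_lpd1 <- "lpd-1" %in% aliases

print(aliases)
print(has_lpd1)
```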

RNAi clone queries

RNA interference libraries are commonly used in C. elegans research. RNAi clones are typically referenced by a gene pair or open reading frame (ORF) sequence, either of which can be difficult to map to an up-to-date gene identifier. We wrote the rnai() function to solve this problem. It lets a researcher search for RNAi clones by gene identifier, or map an RNAi clone by plate location to its up-to-date WormBase gene identifier. The rnai() function has built-in support for the commonly used ORFeome and Ahringer RNAi libraries.

Find clones by gene identifier

Let’s find the locations of sbp-1 in the Ahringer and ORFeome libraries. This is easy to do with the rnai() function by searching by name:

rnai("sbp-1", format = "name") %>%
    glimpse()
## Observations: 1
## Variables: 7
## $ name        <chr> "sbp-1"
## $ gene        <chr> "WBGene00004735"
## $ sequence    <chr> "Y47D3B.7"
## $ genePair    <chr> "Y47D3B.7"
## $ orfeome96   <chr> "10059-H06, 11010-G06"
## $ ahringer384 <chr> "III-6-C01"
## $ ahringer96  <chr> "86-B01"

Alternatively, you can map by gene, sequence (ORF), or legacy genePair. These queries return the same result:

rnai("WBGene00004735", format = "gene")
rnai("Y47D3B.7", format = "sequence")
rnai("Y47D3B.7", format = "genePair")

Map clones by plate location

The rnai() function was written to simplify the task of identifying wells of interest in an RNAi screen. You can simply input the well identifiers by plate location, and rnai() will return up-to-date gene identifiers from WormBase.

Ahringer library format

Well identifiers for the Ahringer 384-well library must include a chromosome prefix in Roman numeral format. The 96-well Ahringer library has unique plate numbers and does not require a chromosome prefix.

c("ahringer384-III-6C01",
  "ahringer96-86-B01") %>%
    rnai() %>%
    glimpse()
## Observations: 2
## Variables: 4
## $ clone    <chr> "ahringer384-III-6C01", "ahringer96-86-B01"
## $ gene     <chr> "WBGene00004735", "WBGene00004735"
## $ sequence <chr> "Y47D3B.7", "Y47D3B.7"
## $ name     <chr> "sbp-1", "sbp-1"
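
To make the two identifier layouts concrete, here is a rough base R sketch that splits the identifiers used above into library, chromosome, plate, and well. The parse_ahringer() helper is hypothetical and is not how rnai() is implemented; it only assumes the two layouts shown in this section and treats the dash between plate and well as optional.

```r
# Hypothetical helper (not part of worminfo): split one Ahringer clone
# identifier into its library, chromosome, plate, and well components.
parse_ahringer <- function(id) {
    parts <- strsplit(id, "-", fixed = TRUE)[[1]]
    library_name <- parts[1]
    if (library_name == "ahringer384") {
        # The 384-well library needs a Roman-numeral chromosome prefix.
        chromosome <- parts[2]
        stopifnot(chromosome %in% c("I", "II", "III", "IV", "V", "X"))
        location <- paste(parts[-(1:2)], collapse = "")
    } else if (library_name == "ahringer96") {
        # The 96-well library has unique plate numbers, so no chromosome.
        chromosome <- NA_character_
        location <- paste(parts[-1], collapse = "")
    } else {
        stop("Not an Ahringer identifier: ", id)
    }
    # location is now the plate digits followed by the well, e.g. "6C01".
    pattern <- "^([0-9]+)([A-P][0-9]{2})$"
    if (!grepl(pattern, location)) {
        stop("Unrecognized plate/well in: ", id)
    }
    data.frame(
        library = library_name,
        chromosome = chromosome,
        plate = as.integer(sub(pattern, "\\1", location)),
        well = sub(pattern, "\\2", location),
        stringsAsFactors = FALSE
    )
}

ids <- c("ahringer384-III-6C01", "ahringer96-86-B01")
parsed <- do.call(rbind, lapply(ids, parse_ahringer))
print(parsed)
```

A check like this can be handy for validating a screen spreadsheet before passing its identifiers to rnai().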

ORFeome library format

We advise formatting identifiers for the 96-well ORFeome library as orfeome96-10001-A01. However, the function is flexible and also supports the historical format (e.g. GHR-11010@G06):

c("GHR-11010@G06",
  "orfeome96-11010-G06") %>%
    rnai() %>%
    glimpse()
## Observations: 2
## Variables: 4
## $ clone    <chr> "GHR-11010@G06", "orfeome96-11010-G06"
## $ gene     <chr> "WBGene00004735", "WBGene00004735"
## $ sequence <chr> "Y47D3B.7", "Y47D3B.7"
## $ name     <chr> "sbp-1", "sbp-1"
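
Since rnai() accepts both formats, no conversion is required. If you want your own records to use a single format, though, the historical identifiers can be rewritten with a regular expression. The normalize_orfeome() helper below is a hypothetical sketch, not a worminfo function, and assumes the historical format is always GHR-, a plate number, @, and a 96-well position.

```r
# Hypothetical helper (not part of worminfo): rewrite historical ORFeome
# identifiers such as "GHR-11010@G06" in the recommended form.
normalize_orfeome <- function(ids) {
    # Capture the plate number and the well, then rebuild the identifier.
    # Identifiers that do not match the historical pattern pass through
    # unchanged.
    sub("^GHR-([0-9]+)@([A-H][0-9]{2})$", "orfeome96-\\1-\\2", ids)
}

normalized <- normalize_orfeome(c("GHR-11010@G06", "orfeome96-11010-G06"))
print(normalized)
```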

Construct a cherrypick library

Here’s how to generate a list of ORFeome clones that target the unfolded protein response:

cherrypick("unfolded protein response", format = "keyword")
## # A tibble: 62 x 7
##                      keyword           gene  sequence   name
##                        <chr>          <chr>     <chr>  <chr>
##  1 unfolded protein response WBGene00000024     AC3.3  abu-1
##  2 unfolded protein response WBGene00000025  F19G12.7  abu-2
##  3 unfolded protein response WBGene00000026   F31A3.1  abu-3
##  4 unfolded protein response WBGene00000027   Y5H2A.3  abu-4
##  5 unfolded protein response WBGene00000028 Y105C5A.4  abu-5
##  6 unfolded protein response WBGene00000029   C03A7.7  abu-6
##  7 unfolded protein response WBGene00000030   C03A7.8  abu-7
##  8 unfolded protein response WBGene00000031  C03A7.14  abu-8
##  9 unfolded protein response WBGene00000032  R09F10.2  abu-9
## 10 unfolded protein response WBGene00000033   F35A5.3 abu-10
## # ... with 52 more rows, and 3 more variables: orfeome96 <chr>,
## #   ahringer384 <chr>, ahringer96 <chr>
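
A gene can have more than one clone in a library, in which case the locations are packed into a single comma-separated string (see the orfeome96 value for sbp-1 earlier). One possible way to turn such a result into a per-well pick list is sketched below with tidyr and dplyr. The hits object is a hand-typed stand-in that reuses the sbp-1 values from above, not real cherrypick() output, and the plate and well column names are our own choice.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Stand-in for one row of cherrypick() output, using the sbp-1 values
# shown earlier; a real result would have one row per gene.
hits <- tibble(
    name      = "sbp-1",
    gene      = "WBGene00004735",
    orfeome96 = "10059-H06, 11010-G06"
)

# One row per clone, split into plate and well, ordered for picking.
pick_list <- hits %>%
    separate_rows(orfeome96, sep = ", ") %>%
    separate(orfeome96, into = c("plate", "well"), sep = "-") %>%
    arrange(plate, well)

print(pick_list)
```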