makeGene2symbolFromGFF functions now support the
unique argument, which returns sanitized values in the
geneName column, ensuring there are no duplicates. This is enabled by default (recommended) but can be disabled using
unique = FALSE. This functionality was added to ensure consistent gene name handling in single-cell RNA-seq analyses.
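As a rough illustration of what deduplicating gene names can look like, the sketch below applies base R's make.unique to a hypothetical gene-to-symbol data frame. The example data and the numeric-suffix scheme are assumptions for illustration only; the package's own sanitization may differ.

```r
# Hypothetical gene-to-symbol mappings containing a duplicated symbol.
gene2symbol <- data.frame(
    geneID = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003"),
    geneName = c("TP53", "SNORA70", "SNORA70"),
    stringsAsFactors = FALSE
)

# base::make.unique appends a numeric suffix to repeated values,
# leaving the first occurrence unchanged.
gene2symbol$geneName <- make.unique(gene2symbol$geneName)
print(gene2symbol)
```

After this step every value in the geneName column is distinct, so it can safely be used for row names.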
basejump.save.compress global options, so the desired file type (e.g. RDS instead of RDA) and compression (e.g. xz instead of gzip) can be easily specified for an entire project.
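A project-wide option of this kind can be read with getOption. The sketch below is a minimal illustration using base R only: the option name comes from the note above, while the helper saveWithOption, the accepted values, and the gzip fallback are assumptions rather than the package's actual behavior.

```r
# Set the project-wide default once, e.g. in a setup script.
options(basejump.save.compress = "xz")

# Hypothetical helper (not part of the package API): look up the option,
# falling back to gzip when it is unset, and pass it to saveRDS().
saveWithOption <- function(object, file) {
    compress <- getOption("basejump.save.compress", default = "gzip")
    saveRDS(object, file = file, compress = compress)
    invisible(file)
}

file <- tempfile(fileext = ".rds")
saveWithOption(mtcars, file)
restored <- readRDS(file)
```

Reading the file back does not require knowing the compression, since readRDS detects it automatically.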
sampleNames now supports assignment for
lanePattern regular expression pattern as a global, which was previously defined in the bcbioBase package.
as coercion method support. Need to ensure
exportMethods(coerce) is included in
as(x, "tbl_df") won’t work when called from an Rscript without the package library loaded. Thanks @roryk for noticing this.
gene2symbol generic to use
..., since we’ve added the
unique = TRUE argument in this release.
annotable: Moved to
makeGRanges.R file, and improved the internal code to export supported formals also used in
makeGRangesFromEnsembl. The function should work exactly the same as previous releases, but now with clearer supported arguments in the documentation.
cleanSystemLibrary, since Travis CI installs packages into the system library, and causes this check to return
as(from, "tbl_df") for internal tibble coercion in all functions.
organism argument matching no longer suggests a default. The current list of supported organisms is in the documentation, and described in the internal
validObject validity checks, where applicable.
sampleData: Made validity check stricter, requiring
sampleName column to be defined, otherwise the function will intentionally error.
formals internally to keep ggplot2 theme formals consistent.
cleanSystemLibrary: Utility function to check whether a user has installed packages into the R system library. Refer to
.libPaths documentation for more information on library paths.
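One way to approximate such a check with base R is sketched below. It treats any package in the default library without a "base" or "recommended" priority as user-installed. This heuristic and the variable names are assumptions; they are not taken from the cleanSystemLibrary source.

```r
# .Library is the default library that ships with R itself.
systemLibrary <- .Library

# installed.packages() reports a Priority of "base" or "recommended" for
# packages bundled with R; other packages have an NA priority.
pkgs <- installed.packages(lib.loc = systemLibrary)
userInstalled <- rownames(pkgs)[is.na(pkgs[, "Priority"])]

# TRUE when nothing beyond the bundled packages lives in the system library.
isClean <- length(userInstalled) == 0L
print(isClean)
```

The result depends on the machine, which is consistent with the note above about Travis CI installing packages into the system library.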
genomeBuild for Ensembl annotation functions. The
genomeBuild argument still works but now informs the user about the change.
prepareTemplate from bcbioBase package. Simplified this function to copy all files inside
extdata/rmarkdown/shared within a specified package. Currently in use for bcbioRNASeq, bcbioSingleCell, and the new pointillism clustering package.
In this release, we are migrating some of the S4 generics previously exported in the bcbioBase package. We are consolidating these functions here for simplicity and stability.
prepareSummarizedExperiment, previously exported in the bcbioBase package. We are using the
make prefix here for consistency (see other gene annotation functions).
curl::has_internet internally to check for Internet connection. This applies to the annotation functions that query web databases.
SummarizedExperiment to an unstructured
list. This is the method used internally for
matchInterestingGroups: New developer function to automatically handle
interestingGroups argument used across various plotting functions and in the bcbio infrastructure packages.
SummarizedExperiment method support for converting objects containing gene symbols (“geneName”) as rownames back to gene identifiers (“geneID”).
eggnog: quickly download current annotations from EggNOG database. Useful for annotating gene-to-protein matches; currently in use with the brightworm RNAi screening package, which contains WormBase gene ID and EggNOG ID annotations.
dplyr::pull to reexported functions.
atomic as primary argument.
SummarizedExperiment. Previously, if
geneName column was a factor, this function would error. This issue has been fixed by ensuring that the symbols provided in
geneName are coerced to a character vector.
makeGRangesFromGFF and other GFF utility functions, including
makeTx2geneFromGFF. Note that
makeGRangesFromGFF now returns additional metadata columns accessible with
S4Vectors::mcols, and that these columns are now sorted alphabetically.
SummarizedExperiment class method support for
ggplot2::theme_linedraw, improving the consistency between these themes.
single_cell_counts, instead of the previous camel case conventions:
readFileByExtension function. Use the
makeNames family of functions manually after data import instead. This helps avoid unwanted sanitization of data.
text as the primary argument, instead of
makeGRangesFromEnsembl now supports remapping of UCSC genome build to Ensembl. However, this isn’t recommended, and will warn the user.
NULL instead of erroring on genome build match failure.
stripTranscriptVersions now matches “
-”, and “
_” version delimiters.
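A regular expression along these lines can strip a trailing version suffix. This is only a sketch: the helper name stripVersion is hypothetical, and including "." in the delimiter set is an assumption based on the usual Ensembl version separator.

```r
# Hypothetical helper: drop a trailing delimiter plus digits.
# Inside the character class, the hyphen is placed last so it is literal.
stripVersion <- function(x) {
    sub("[._-][0-9]+$", "", x)
}

ids <- c(
    "ENST00000431238.7",
    "ENST00000431238-7",
    "ENST00000431238_7",
    "ENST00000431238"
)
stripped <- stripVersion(ids)
print(stripped)
```

Identifiers without a version suffix pass through unchanged.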
assertIsDataFrameOrNULL in a future release.
rtracklayer::import internally to return GFF file as a
GRanges object instead of a
theme_paperwhite. Now using British spelling internally for ggplot code.
theme_paperwhite, removing the black box around the labels when facet wrapping is enabled.
geomean in favor of
panther(organism = "XXX").
emptyRanges enables easy creation of placeholder ranges for
GRanges objects, where transgene and FASTA spike-ins are needed.
hgnc2gene enables easy mapping of HGNC to Ensembl gene identifiers.
mgi2gene enables easy mapping of MGI to Ensembl gene identifiers.
panther function enables easy querying of the PANTHER website. Human, mouse, nematode worm, and fruit fly are currently supported. The specific PANTHER release (e.g. 13) can be declared using the
release argument. Otherwise, the function will return the most recent annotations from the PANTHER website.
readJSON adds support for JSON files. Like the other read functions, it supports both local files and remote URLs.
theme_paperwhite provide minimal, high contrast ggplot2 themes with bold sans serif labels.
.RData files. The function will error by design if multiple data extensions are detected inside the directory specified with the
.biocLite function. Now using
requireNamespace instead, without attempting to install automatically.
stop instead of the rlang equivalents.
multiassignAsEnvir is now recommended in place of
readFileByExtension will now attempt to use the rio package for file extensions that are not natively supported.
assertFormalAnnotationCol to bcbioBase package.
transcripts. These functions allow the return of
data.frame class objects from AnnotationHub using ensembldb.
broadClass definition code to match against chromosome from Ensembl if available.
loadDataAsName now works with unquoted names, improving consistency with
convertUCSCBuildToEnsembl function, for easy remapping of UCSC to Ensembl genome build names (e.g.
plotCorrelationHeatmap here from bcbioRNASeq, for improved consistency with other heatmap functions.
base::make.names that sanitizes using underscores (“_”) rather than dots (“.”).
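The idea can be sketched by post-processing the output of base::make.names. The helper name makeNamesUnderscore is hypothetical, and the package's own variant may handle edge cases differently.

```r
# Hypothetical helper: sanitize with make.names(), then swap dots for
# underscores. fixed = TRUE treats "." as a literal character.
makeNamesUnderscore <- function(x) {
    gsub(".", "_", make.names(x), fixed = TRUE)
}

sanitized <- makeNamesUnderscore(c("sample 1", "gene-id"))
print(sanitized)
```

Note that this simple version also converts dots that were already present in the input.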
readYAML from a generic to standard function.
annotable function has been deprecated in favor of the new
checkAnnotable deprecated in favor of
checkGene2symbol deprecated in favor of
checkTx2gene deprecated in favor of
assertFormalColorFunction deprecated in favor of
initializeDir deprecated in favor of
theme_midnight alias to match the syntax in the ggplot2 package.
annotable to simply work on the Entrez identifier column (
entrez). If a manually passed in data frame still has duplicates, the function will now abort instead of attempting to use
convertTranscriptsToGenes functions. Previously some of this functionality was contained within the
tx2gene generics for the character method. This behavior was inconsistent with
tx2gene usage in the bcbio R packages, so I decided to split these out into separate functions. Now
tx2gene work consistently with the
annotable function to return gene-to-symbol and transcript-to-gene identifier mappings in a
markdownPlotlist are now exported as S4 generics. The
md* function variants are now exported as aliases.
geomean has been renamed to
selectSamples. These functions are now deprecated here in basejump (see
deprecated.R file for more information).
revcomp have been deprecated in favor of
reverseComplement from the Biostrings package.
warn, in place of
loadData. Additionally, the file name must match the internal name in the RData file, otherwise
loadData will warn the user. This is more strict than the default behavior of
base::load, but helps prevent accidental overwrite in the current working environment.
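The stricter behavior described here can be approximated by loading into a temporary environment first and comparing object names against the file name. The helper loadStrict below is a hypothetical sketch, not the loadData implementation, and its choice to skip assignment after warning is an assumption.

```r
# Hypothetical helper: load an RData file only when it contains exactly one
# object named after the file, and never overwrite an existing object.
loadStrict <- function(file, envir = parent.frame()) {
    expected <- tools::file_path_sans_ext(basename(file))
    # Load into a scratch environment so nothing is overwritten yet.
    tmp <- new.env()
    loaded <- load(file, envir = tmp)
    if (!identical(loaded, expected)) {
        warning("File name and internal object name differ: ", expected)
        return(invisible(NULL))
    }
    if (exists(expected, envir = envir, inherits = FALSE)) {
        warning("Object already exists in destination: ", expected)
        return(invisible(NULL))
    }
    assign(expected, get(expected, envir = tmp), envir = envir)
    invisible(expected)
}

# Usage: save an object under a matching file name, then load it.
counts <- 1:3
file <- file.path(tempdir(), "counts.rda")
save(counts, file = file)
dest <- new.env()
loadStrict(file, envir = dest)
```

By contrast, base::load assigns whatever names are stored in the file, silently replacing objects of the same name.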
localOrRemoteFile, previously an internal function, is now exported.
annotable now uses internal GRCh37 annotations from the annotables package, which is saved in the
extdata/ directory internally. Previously, these genome annotations were accessed from lazy loaded data saved in the
data/ directory of the package repository.
annotables now checks for all packages attached by ensembldb and AnnotationHub and forces detachment at the end of the function call. Otherwise, this can result in the unwanted effect of ensembldb masking other user-loaded functions, such as the tidyverse suite (e.g.
camel now handles delimited numbers a little differently. Previously, delimiters in between numbers (e.g. the commas in “1,000,000”) were stripped. Sometimes this can result in confusing names. For example, if we have a column formatted in dotted case containing a decimal (e.g. “resolution.1.6”), the decimal would be stripped (e.g. “resolution16” in camel). Now, we sanitize a numeric delimiter as a lower case “x” character (e.g. “resolution1x6”). This ensures that numbers containing decimals remain semantically meaningful when sanitized with
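The numeric delimiter rule can be illustrated with two regular expression passes. This is a simplified sketch under stated assumptions: the helper name camelSketch is hypothetical, and the real camel function covers many more cases (acronyms, strict mode, leading characters).

```r
# Hypothetical helper illustrating the "x" rule for numeric delimiters.
camelSketch <- function(x) {
    # Pass 1: a delimiter sitting between two digits becomes "x".
    x <- gsub("(?<=[0-9])[.,_-](?=[0-9])", "x", x, perl = TRUE)
    # Pass 2: drop remaining delimiters, uppercasing the next character.
    gsub("[^A-Za-z0-9]+([A-Za-z0-9])", "\\U\\1", x, perl = TRUE)
}

camelSketch(c("resolution.1.6", "1,000,000", "sample.id"))
```

The lookbehind and lookahead keep the digits themselves out of the match, so consecutive delimiters such as both commas in “1,000,000” are each replaced.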
grepl) calls have been simplified to use the default order of “pattern, replacement, x”.
readSampleMetadataFile. We were detecting the presence of
index column but should instead check against
dgCMatrix method support in
aggregateFeatures functions. Both of these functions now use a consistent
groupings parameter, which uses a named factor to define the mappings of either samples (columns) for
aggregateReplicates or genes/transcripts (rows) for
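To make the named-factor convention concrete, the sketch below sums technical replicate columns of a small count matrix with base::rowsum. The matrix, sample names, and the use of rowsum are assumptions for illustration; the package functions may aggregate differently, especially for sparse matrices.

```r
# Hypothetical counts: 2 genes across 4 lane-split samples.
counts <- matrix(
    data = 1:8,
    nrow = 2,
    dimnames = list(
        c("gene1", "gene2"),
        c("s1_L001", "s1_L002", "s2_L001", "s2_L002")
    )
)

# Named factor: names are the original columns, values the target groups.
groupings <- factor(c(
    s1_L001 = "s1", s1_L002 = "s1",
    s2_L001 = "s2", s2_L002 = "s2"
))

# rowsum() aggregates rows, so transpose, sum by group, and transpose back.
aggregated <- t(rowsum(t(counts), group = groupings[colnames(counts)]))
print(aggregated)
```

Aggregating features (rows) follows the same pattern without the transposes, with the factor named by row instead of column.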
makeNames sanitization functions. Now they will work on
names(x) for vectors by default.
detectOrganism to match against “H. sapiens”, etc.
NA values from LibreOffice and Microsoft Excel output in
readFileByExtension. This function now sets
microplate code from the wormbase package here, since it’s of general interest.
sanitizeAnnotable utility functions that will be used in the bcbio R packages.
midnightTheme ggplot2 theme. Originally this was defined as
darkTheme in the bcbioSingleCell package, but can be useful for other plots and has been moved here for general bioinformatics usage. The theme now uses
ggplot2::theme_minimal as the base, with some color tweaks, namely dark gray axes without white axis lines.
loadDataAsName now default to
replace = TRUE. If an object with the same name exists in the destination environment, then a warning is generated.
collapseToString only attempts to dynamically return the original object class on objects that aren’t class
data.frame. I updated this code to behave more nicely with grouped tibbles (
grouped_df), which are a virtual class of
data.frame and therefore can’t be coerced using
NULL for integers and numerics.
prepareSummarizedExperiment, added support for dropping
NULL objects in assays list. This is useful for handling output from bcbioRNASeq when
transformLimit is reached. In this case, the
vst matrices aren’t generated and set
NULL in the assays list. Using
Filter(Negate(is.null), assays) we can drop these
NULL objects and prevent a downstream dimension mismatch in the
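The Filter idiom mentioned above behaves as follows on a small list. The assay names and matrices here are made up for the example.

```r
# Hypothetical assays list where a skipped transformation left a NULL slot.
assays <- list(
    counts = matrix(1:4, nrow = 2),
    vst = NULL
)

# Keep only the elements that are not NULL; names are preserved.
assays <- Filter(Negate(is.null), assays)
names(assays)
```

Only the non-NULL matrices remain, so every element left in the list has dimensions that can be checked against the others.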
readSampleMetadataFile. This now checks for a sequence column containing ACGT nucleotides. When those are detected, the
revcomp column is generated. Otherwise this step is skipped. This is useful for handling multiplexed sample metadata from 10X Genomics Cell Ranger single-cell RNA-seq samples.
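A base R sketch of this conditional step is shown below. The metadata frame and the helper revcompSketch are hypothetical, and the detection rule (every value consisting only of A, C, G, T) is an assumption about what "containing ACGT nucleotides" means.

```r
# Hypothetical helper: reverse complement of DNA strings using base R only.
revcompSketch <- function(x) {
    complement <- chartr("ACGT", "TGCA", x)
    vapply(
        strsplit(complement, split = ""),
        function(bases) paste(rev(bases), collapse = ""),
        character(1)
    )
}

# Hypothetical sample metadata with an index barcode in "sequence".
meta <- data.frame(
    sampleName = c("a", "b"),
    sequence = c("AACG", "ACGT"),
    stringsAsFactors = FALSE
)

# Only generate the column when every value looks like a nucleotide string.
if ("sequence" %in% colnames(meta) && all(grepl("^[ACGT]+$", meta$sequence))) {
    meta$revcomp <- revcompSketch(meta$sequence)
}
print(meta)
```

When the sequence column is missing or holds other values, the data frame is left unchanged.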
annotable function to include nested Entrez identifiers in the
entrez column. This is useful for downstream functional analysis.
toStringUnique code, which is still in use in the wormbase package.
detectOrganism. Now allowing
NULL return for unsupported organism, with a warning.
saveData. Now will skip on existing files when
overwrite = FALSE.
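The skip-on-existing behavior can be sketched with a simple file.exists guard. The helper saveUnlessExists is hypothetical and uses saveRDS for brevity; it is not the saveData implementation.

```r
# Hypothetical helper: write the file only if it is absent, unless the
# caller explicitly allows overwriting.
saveUnlessExists <- function(object, file, overwrite = FALSE) {
    if (file.exists(file) && !isTRUE(overwrite)) {
        message("Skipping existing file: ", basename(file))
        return(invisible(FALSE))
    }
    saveRDS(object, file = file)
    invisible(TRUE)
}

file <- tempfile(fileext = ".rds")
first <- saveUnlessExists(1:3, file)   # file is written
second <- saveUnlessExists(4:6, file)  # skipped, original content kept
```

Returning a logical makes it easy for calling code to tell whether anything was written.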
readDataVersions, which shouldn’t have the column types defined, using
col_types = "ccT".
loadDataAsName. Now rather than using a named character vector for the
mappings argument, the user can simply pass the key value pairs in as dots. For example,
newName1 = "oldName1", newName2 = "oldName2". The legacy
mappings method will still work, as long as the dots argument is a length of 1.
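One way to accept both calling styles is to inspect the captured dots, as sketched below. The helper parseMappings is hypothetical and only shows the argument handling, not any file loading.

```r
# Hypothetical helper: normalize dots into a named character vector.
parseMappings <- function(...) {
    dots <- list(...)
    if (length(dots) == 1L && !is.null(names(dots[[1L]]))) {
        # Legacy style: a single named character vector.
        mappings <- dots[[1L]]
    } else {
        # New style: key value pairs passed directly as dots.
        mappings <- unlist(dots)
    }
    stopifnot(
        is.character(mappings),
        !is.null(names(mappings)),
        all(nzchar(names(mappings)))
    )
    mappings
}

new <- parseMappings(newName1 = "oldName1", newName2 = "oldName2")
legacy <- parseMappings(mappings = c(newName1 = "oldName1", newName2 = "oldName2"))
```

Both calls yield the same named character vector, with the new names as names and the old names as values.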
rowData to be left unset in
prepareSummarizedExperiment. This is useful for setting up objects that don’t contain gene annotations.
readSampleMetadata. This feature wasn’t fully baked and doesn’t offer enough functionality to the user.
detectOrganism and added support for chicken genome.
prepareSummarizedExperiment to make sample loading with
loadSingleCell in the bcbio packages less confusing.
*GTF alias functions to simply wrap the
*GFF functions with S4 methods support.
camel syntax for both lax and strict modes. Added
gsub in internal functions.
loadRemoteData to a standard function instead of using S4 dispatch, allowing the
envir argument to be set properly.
*GFF function variants for
selectSamples. These functions are saved in
methods-*.R files where applicable.
prepareSummarizedExperiment generic definition as primary object.
assignAndSaveData to add silent return of file path.
) in chain operations with magrittr pipe (%>%).
.prepareSampleMetadata utility function, for use with loading sample metadata from an external CSV, Excel, or YAML file.
loadData functionality back to the package.
annotable function documentation and support for Ensembl release versions.
snake name functions. Added the
makeNames functions, by splitting each into their own separate methods file.
geomean function. Also improved internal code of
geomean based on Paul McMurdie’s Stack Overflow post. See function documentation for more information.
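For readers unfamiliar with the calculation, a log-based geometric mean is sketched below. It follows one commonly cited formulation that skips non-positive values in the log sum; whether it matches the package's internal code line for line is an assumption, and the name geomeanSketch is hypothetical.

```r
# Hypothetical helper: geometric mean via the mean of logs, which avoids
# overflow from multiplying many values together.
geomeanSketch <- function(x, na.rm = TRUE) {
    # Non-positive values are excluded from the log sum but still counted
    # in the denominator.
    exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}

geomeanSketch(c(1, 10, 100))
```

For strictly positive input this equals the n-th root of the product of the values.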
collapseToString, to avoid NAMESPACE collisions with tidyverse packages (dplyr, glue).
prepareSummarizedExperiment. Improved row and column name handling in the function. It now outputs more helpful diagnostic messages on error.
detectHPC function to allow for unit testing.
prepareSE for better semantic meaning.
onLoad.R script back to ensure proper attachment of annotables data package.
parent.frame assignment not work correctly.