ecodive
is an R package for calculating ecological
diversity metrics in a parallelized and memory-efficient manner. It is
designed to handle large datasets, such as those common in microbiome
research, with significant speed and memory improvements over existing
tools.
Analyzing ecological diversity is often a computational bottleneck,
especially with large datasets. ecodive
addresses this by
providing:
High Performance: ecodive
is
written in C and parallelized using pthreads, making it dramatically
faster than other R packages. Benchmarks
show it can be up to 43,000x faster and use up to
33,000x less memory.
Zero Dependencies: The package has no external R dependencies, making it lightweight, stable, and easy to install. This also makes it an ideal and secure backend for other R packages.
Comprehensive Metrics: It implements a wide range of common alpha and beta diversity metrics, including classic and phylogenetic-aware methods like Faith’s PD and the complete UniFrac family.
Ease of Use: The API is simple and integrates
seamlessly with popular bioinformatics packages like phyloseq
and
rbiom
.
The latest stable version can be installed from CRAN.
install.packages('ecodive')
The development version is available on GitHub. Please note that this method requires a compiler - see http://www.rstudio.com/ide/docs/packages/prerequisites if the installation does not succeed on the first try.
install.packages('pak')
::pak('cmmr/ecodive') pak
ecodive
functions are straightforward to use. Here are a
few examples.
phyloseq
or
rbiom
objectsThe easiest way to use ecodive
is with a
phyloseq
or rbiom
object. These objects
conveniently bundle the count data and phylogenetic tree.
library(ecodive)
data(esophagus, package = 'phyloseq')
data(hmp50, package = 'rbiom')
# Calculate weighted UniFrac distance
<- weighted_unifrac(esophagus)
w_unifrac print(w_unifrac)
#> B C
#> C 0.1050480
#> D 0.1401124 0.1422409
# Calculate Faith's Phylogenetic Diversity
<- faith(hmp50)
faith_pd print(faith_pd[1:4])
#> HMP01 HMP02 HMP03 HMP04
#> 6.22296 8.59432 8.93375 9.86597
You can also provide the count data and phylogenetic tree as separate objects.
The ex_counts
and ex_tree
objects are
included with ecodive
.
## Example Data ----------------------
<- rarefy(ex_counts)
counts
counts#> Saliva Gums Nose Stool
#> Streptococcus 162 309 6 1
#> Bacteroides 2 2 0 341
#> Corynebacterium 0 0 171 1
#> Haemophilus 180 34 0 1
#> Propionibacterium 1 0 82 0
#> Staphylococcus 0 0 86 1
## Alpha Diversity -------------------
shannon(counts)
#> Saliva Gums Nose Stool
#> 0.74119910 0.35692121 1.10615349 0.07927797
faith(counts, tree = ex_tree)
#> Saliva Gums Nose Stool
#> 180 155 101 202
## Beta Diversity --------------------
bray_curtis(counts)
#> Saliva Gums Nose
#> Gums 0.4260870
#> Nose 0.9797101 0.9826087
#> Stool 0.9884058 0.9884058 0.9913043
weighted_unifrac(counts, tree = ex_tree)
#> Saliva Gums Nose
#> Gums 36.97681
#> Nose 67.23768 55.97101
#> Stool 109.77971 109.44058 110.00870
The online manual for ecodive is available at https://cmmr.github.io/ecodive/. It includes a getting started guide, articles on alpha/beta diversity, and reference pages for each function.
The following commands will check if ecodive passes the bundled testing suite.
install.packages('testthat')
::test_check('ecodive') testthat
Bug reports, feature requests, and general questions can be submitted at https://github.com/cmmr/ecodive/issues.
Pull requests are welcome. Please ensure contributed code is covered by tests and documentation (add additional tests and documentation as needed) and passes all automated tests.
New functions must leverage C and pthreads to minimize memory and CPU time.
Please note that the ecodive project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.