gwas2crispr

GWAS‑to‑CRISPR: streamlined extraction of significant GWAS SNPs, metadata aggregation and optional FASTA/BED/CSV export for downstream CRISPR design (GRCh38/hg38).

Overview

Genome‑wide association studies (GWAS) link traits to genetic variants, but raw summary statistics are not directly usable for guide design. gwas2crispr bridges this gap. It retrieves significant single‑nucleotide polymorphisms (SNPs) for a given Experimental Factor Ontology (EFO) trait, annotates them with gene and study metadata, and returns in‑memory summaries. When requested, it also writes ready‑to‑use CSV, BED and FASTA files for high‑throughput CRISPR target design. All genomic coordinates are mapped to GRCh38/hg38.

Core functions

The main entry points are fetch_gwas() and run_gwas2crispr(); both are described under Function reference below.

CRAN‑safe examples: the package writes no files by default. Examples that use the network or write files are wrapped in \donttest{}. Files are written only when you supply out_prefix, and only to the paths you specify; the documentation uses tempdir().


Installation

Requirements (read first)

Install the core prerequisite (GWAS Catalog client)

install.packages("gwasrapidd")

Install Bioconductor dependencies (for FASTA)

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("Biostrings", "BSgenome.Hsapiens.UCSC.hg38"))
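
Before running the pipeline it can help to confirm which of these packages are actually available. The snippet below is a minimal sketch that only checks for the packages named in this README (optparse is needed for the CLI only); it installs nothing.

```r
# Packages named in this README; optparse is only required by the CLI script.
pkgs <- c("gwasrapidd", "Biostrings", "BSgenome.Hsapiens.UCSC.hg38", "optparse")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

print(installed)
if (!all(installed)) {
  message("Missing: ", paste(pkgs[!installed], collapse = ", "))
}
```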

Install from GitHub

Until the package is on CRAN, install the development version directly:

if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("leopard0ly/gwas2crispr")

After CRAN release you will be able to run:

install.packages("gwas2crispr")

Quick start (primary workflow)

Use a clear prefix and write outputs (CSV/BED/FASTA) to your current working directory:

library(gwas2crispr)

run_gwas2crispr(
  efo_id     = "EFO_0000707",  # lung disease (example)
  p_cut      = 1e-6,
  flank_bp   = 300,
  out_prefix = "lung"          # produces: lung_snps_full.csv / lung_snps_hg38.bed / lung_snps_flank300.fa
)

Outputs

A) Object‑only (no files written)

library(gwas2crispr)

res <- run_gwas2crispr(
  efo_id     = "EFO_0001663",  # Prostate cancer
  p_cut      = 5e-8,
  flank_bp   = 200,
  out_prefix = NULL            # <- no writing; returns objects only
)

res$summary   # one‑row tibble: n_SNPs, SNPs_w_gene, unique_genes, n_studies
res$chr_freq  # table of chromosomes by SNP count
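
To make the summary fields concrete, here is a toy illustration of how counts with these names could be derived from an annotated SNP table. The column names (rsid, chr, gene, study), the placeholder values and the exact counting rules are assumptions for illustration; the package's internal definitions may differ.

```r
# Toy annotated table; all identifiers are made-up placeholders.
toy <- data.frame(
  rsid  = c("rsA", "rsB", "rsC", "rsD"),
  chr   = c("1", "1", "8", "X"),
  gene  = c("GENE1", NA, "GENE2", "GENE1"),
  study = c("S1", "S1", "S2", "S3"),
  stringsAsFactors = FALSE
)

# Assumed meaning of each summary field.
summary_row <- data.frame(
  n_SNPs       = length(unique(toy$rsid)),           # distinct SNPs
  SNPs_w_gene  = sum(!is.na(toy$gene)),              # SNPs with a mapped gene
  unique_genes = length(unique(na.omit(toy$gene))),  # distinct mapped genes
  n_studies    = length(unique(toy$study))           # distinct studies
)

# SNP count per chromosome, most frequent first.
chr_freq <- sort(table(toy$chr), decreasing = TRUE)

print(summary_row)
print(chr_freq)
```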

B) Write files to a safe temporary directory (secondary)

out <- file.path(tempdir(), "prostate")  # CRAN‑friendly
res <- run_gwas2crispr(
  efo_id     = "EFO_0001663",
  p_cut      = 5e-8,
  flank_bp   = 200,
  out_prefix = out
)

res$csv    # path to <prefix>_snps_full.csv
res$bed    # path to <prefix>_snps_hg38.bed
res$fasta  # path to <prefix>_snps_flank<bp>.fa (only if BSgenome installed)
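
Once the files exist, they can be read back for inspection. The helper below is a hypothetical convenience function, not part of gwas2crispr. It assumes the BED file has no header row and reads the FASTA only when Biostrings is installed.

```r
# Hypothetical helper (not part of gwas2crispr): load whichever outputs exist.
read_outputs <- function(res) {
  out <- list()
  if (!is.null(res$csv) && file.exists(res$csv)) {
    out$csv <- utils::read.csv(res$csv, stringsAsFactors = FALSE)
  }
  if (!is.null(res$bed) && file.exists(res$bed)) {
    # Assumption: tab-separated BED without a header line.
    out$bed <- utils::read.delim(res$bed, header = FALSE, stringsAsFactors = FALSE)
  }
  if (!is.null(res$fasta) && file.exists(res$fasta) &&
      requireNamespace("Biostrings", quietly = TRUE)) {
    out$fasta <- Biostrings::readDNAStringSet(res$fasta)
  }
  out
}

# Usage with the `res` object from the example above:
# loaded <- read_outputs(res)
# head(loaded$csv)
```

Entries for files that were not written (for example the FASTA when BSgenome is missing) are simply absent from the returned list.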

Output file names
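
Judging from the examples above, file names follow the pattern <prefix>_snps_full.csv, <prefix>_snps_hg38.bed and <prefix>_snps_flank<bp>.fa. The sketch below rebuilds those names from a prefix and flank size; expected_outputs() is a hypothetical helper written for this README, not a package function.

```r
# Hypothetical helper: derive the expected output paths from the naming
# pattern shown in the examples (<prefix>_snps_full.csv, etc.).
expected_outputs <- function(out_prefix, flank_bp) {
  c(
    csv   = paste0(out_prefix, "_snps_full.csv"),
    bed   = paste0(out_prefix, "_snps_hg38.bed"),
    fasta = paste0(out_prefix, "_snps_flank", as.integer(flank_bp), ".fa")
  )
}

paths <- expected_outputs(file.path(tempdir(), "prostate"), flank_bp = 200)
print(basename(paths))
print(file.exists(paths))  # TRUE only for files a previous run has written
```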


Command‑line interface (CLI)

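A portable Rscript entry point ships with the package: inst/scripts/gwas2crispr.R in the source tree, scripts/gwas2crispr.R once installed (locate it with system.file("scripts", "gwas2crispr.R", package = "gwas2crispr")). Use it to run the pipeline from the shell. The script relies on the optparse package; install it if missing.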

Windows (Command Prompt)

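Version-agnostic (recommended): resolve the script path with system.file() and pass it to Rscript. Rscript does not read a script path from standard input, so capture the path with for /f instead of a pipe. In a batch file, write %%S instead of %S.

for /f "usebackq delims=" %S in (`Rscript -e "cat(system.file('scripts','gwas2crispr.R',package='gwas2crispr'))"`) do Rscript "%S" -e EFO_0001663 -p 5e-8 -f 200 -o "%CD%\prostate"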

Explicit paths (output to the current folder; adjust the R version and library location to match your installation):

"C:\Program Files\R\R-4.4.1\bin\Rscript.exe" ^
  "%LOCALAPPDATA%\R\win-library\4.4\gwas2crispr\scripts\gwas2crispr.R" ^
  -e EFO_0001663 -p 5e-8 -f 200 -o "%CD%\prostate"

Temporary output (system temp):

for /f "usebackq delims=" %S in (`Rscript -e "cat(system.file('scripts','gwas2crispr.R',package='gwas2crispr'))"`) do Rscript "%S" -e EFO_0001663 -p 5e-8 -f 200 -o "%TEMP%\prostate"

Linux/macOS (Bash)

Fixed output (current folder):

Rscript "$(Rscript -e 'cat(system.file("scripts","gwas2crispr.R", package="gwas2crispr"))')" -e EFO_0001663 -p 5e-8 -f 200 -o "$PWD/prostate"

Temporary output (system temp):

Rscript "$(Rscript -e 'cat(system.file("scripts","gwas2crispr.R", package="gwas2crispr"))')" -e EFO_0001663 -p 5e-8 -f 200 -o "$(mktemp -d)/prostate"

Options

If you omit the -o/--out option, no files are written. Use -v/--verbose to emit a concise summary of the run.
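
For orientation, the sketch below shows how flags like these can be declared with optparse. It is not the shipped script: only -o/--out and -v/--verbose are long names confirmed by this README, while --efo, --pcut and --flank, as well as the defaults, are assumptions chosen to mirror the function signature.

```r
library(optparse)

# Short flags match the CLI examples; long names other than --out and
# --verbose are hypothetical.
option_list <- list(
  make_option(c("-e", "--efo"), type = "character",
              help = "EFO trait ID, e.g. EFO_0001663"),
  make_option(c("-p", "--pcut"), type = "double", default = 5e-8,
              help = "P-value cut-off [default %default]"),
  make_option(c("-f", "--flank"), type = "integer", default = 200L,
              help = "Flank size in bp [default %default]"),
  make_option(c("-o", "--out"), type = "character", default = NULL,
              help = "Output prefix; omit to write no files"),
  make_option(c("-v", "--verbose"), action = "store_true", default = FALSE,
              help = "Print a concise run summary")
)

parser <- OptionParser(option_list = option_list)

# Parse an explicit argument vector instead of the real command line.
opt <- parse_args(parser, args = c("-e", "EFO_0001663", "-p", "5e-8", "-f", "200"))

# With no -o given, opt$out is NULL, which maps naturally onto out_prefix = NULL.
print(is.null(opt$out))
```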


Function reference

fetch_gwas(efo_id, p_cut = 5e-8)

Fetches significant associations for an EFO trait. It first tries gwasrapidd::get_associations(); if that call raises an error or returns no rows, it falls back to the EBI GWAS REST API.
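
The "try the primary source, then fall back" behaviour can be sketched generically as follows. fetch_with_fallback() and the two stand-in functions are hypothetical and make no network calls; the real function's internals may differ, including whether the cut-off comparison is strict.

```r
# Hypothetical helper: use `primary`, and fall back when it errors or
# returns zero rows.
fetch_with_fallback <- function(primary, fallback) {
  res <- tryCatch(primary(), error = function(e) NULL)
  if (is.null(res) || nrow(res) == 0L) {
    res <- fallback()
  }
  res
}

# Stand-ins for the two data sources (made-up data, no network access).
primary_fails <- function() stop("service unavailable")
fallback_rest <- function() {
  data.frame(rsid = c("rsA", "rsB"), pvalue = c(1e-9, 1e-3))
}

hits <- fetch_with_fallback(primary_fails, fallback_rest)

# Keep associations below the cut-off (strict "<" is an assumption).
p_cut <- 5e-8
hits  <- hits[hits$pvalue < p_cut, , drop = FALSE]
print(hits)
```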

run_gwas2crispr(efo_id, p_cut = 5e-8, flank_bp = 200, out_prefix = NULL)

Runs the full pipeline: fetches GWAS data, merges gene and study annotations, and returns a list with summary and chr_freq. When out_prefix is provided, the list also contains file paths to the written csv, bed and optional fasta files.
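
As background for flank_bp, the sketch below shows how a symmetric window around a 1-based SNP position maps to a BED-style interval (0-based start, exclusive end). flank_window() is a hypothetical helper; whether the package clips at the chromosome start or writes the flank rather than the SNP itself to the BED file is not specified here.

```r
# Hypothetical helper: +/- flank_bp around a 1-based position.
flank_window <- function(pos, flank_bp) {
  start1 <- pmax(pos - flank_bp, 1L)  # 1-based inclusive start, clipped at 1
  end1   <- pos + flank_bp            # 1-based inclusive end
  data.frame(
    start0 = start1 - 1L,             # BED start is 0-based
    end    = end1,                    # BED end is exclusive, so equals end1
    width  = end1 - start1 + 1L       # number of bases covered
  )
}

w <- flank_window(pos = 1000L, flank_bp = 200L)
print(w)
```

For an unclipped window the width is 2 * flank_bp + 1, since the SNP itself sits in the middle.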


Reproducibility & file layout


Testing

Automated tests live in tests/testthat/ and avoid network calls on CRAN via skip_on_cran(). To run the test suite locally:

devtools::test()
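
A network-dependent test in this style might look like the sketch below. It is illustrative rather than copied from the package's test suite; the checked element names (summary, chr_freq) come from the function reference above.

```r
library(testthat)

test_that("run_gwas2crispr returns summaries without writing files", {
  skip_if_not_installed("gwas2crispr")
  skip_on_cran()     # never hit the network on CRAN
  skip_if_offline()  # skip gracefully when there is no connection

  res <- gwas2crispr::run_gwas2crispr(
    efo_id     = "EFO_0001663",
    p_cut      = 5e-8,
    flank_bp   = 200,
    out_prefix = NULL  # object-only mode: no files written
  )

  expect_type(res, "list")
  expect_true(all(c("summary", "chr_freq") %in% names(res)))
})
```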

Notes on resources


Citation

Please cite gwas2crispr and the resources it builds upon. To see the formatted citation:

citation("gwas2crispr")

Additional background: Sudlow et al. (2015) UK Biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Medicine 12(3): e1001779. doi:10.1371/journal.pmed.1001779.


Getting help

License

MIT © Othman S. I. Mohammed — see the LICENSE file for details.

Acknowledgments

This package builds upon gwasrapidd and the EBI GWAS REST API. Sequence handling and genome data are powered by Biostrings and BSgenome.