Introduction to CohortSymmetry

CohortSymmetry provides tools to perform Sequence Symmetry Analysis (SSA). Before using the package, we highly recommend testing the method against well-known positive and negative controls. Details of SSA and the relevant controls can be found in Pratt et al. (2015).

Below is an example analysis that gives a brief but complete overview of the package's main functions. More context and further examples for each of these functions are provided in later vignettes.

library(CDMConnector)
library(dplyr)
library(DBI)
library(omock)
library(CohortSymmetry)
library(duckdb)

The CohortSymmetry package works with data mapped to the OMOP CDM, so the first step is to connect to a database. As an example, we will use the omock package to generate a mock database with two mock cohorts: index_cohort and marker_cohort.

cdm <- emptyCdmReference(cdmName = "mock") |>
  mockPerson(nPerson = 100) |>
  mockObservationPeriod() |>
  mockCohort(
    name = "index_cohort",
    numberCohorts = 1,
    cohortName = c("index_cohort"),
    seed = 1
  ) |>
  mockCohort(
    name = "marker_cohort",
    numberCohorts = 1,
    cohortName = c("marker_cohort"), 
    seed = 2
  )

con <- dbConnect(duckdb::duckdb())
cdm <- copyCdmTo(con = con, cdm = cdm, schema = "main", overwrite = TRUE)

Once we have established a connection to the database, we can use the generateSequenceCohortSet() function to find the intersection of the two cohorts. This function identifies the individuals who appear in both cohorts and stores them as a new cohort, named intersect, in the cdm reference.

cdm <- generateSequenceCohortSet(
  cdm = cdm,
  indexTable = "index_cohort",
  markerTable = "marker_cohort",
  name = "intersect",
  combinationWindow = c(0, Inf)
)

As shown below, the generated cohort follows the format of an OMOP CDM cohort, with two extra columns: index_date and marker_date. These columns correspond to the cohort_start_date in index_cohort and marker_cohort, respectively.

cdm$intersect |> 
  dplyr::glimpse()
#> Rows: ??
#> Columns: 6
#> Database: DuckDB v1.0.0 [xihangc@Windows 10 x64:R 4.4.1/:memory:]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id           <int> 1, 2, 6, 13, 15, 17, 18, 20, 21, 23, 25, 26, 28, …
#> $ cohort_start_date    <date> 1982-07-06, 2004-01-19, 1981-09-26, 2015-05-05, …
#> $ cohort_end_date      <date> 1982-10-21, 2004-02-02, 1983-08-14, 2015-06-24, …
#> $ index_date           <date> 1982-07-06, 2004-02-02, 1981-09-26, 2015-06-24, …
#> $ marker_date          <date> 1982-10-21, 2004-01-19, 1983-08-14, 2015-05-05, …
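
To make the structure of this table concrete, here is a minimal base R sketch of the same idea on toy data. It is a conceptual illustration, not the package's implementation: the data frames and their values are hypothetical, and the reading of combinationWindow = c(0, Inf) as "gap greater than 0 days, no upper limit" is an assumption. Setting cohort_start_date to the earlier of the two dates and cohort_end_date to the later one mirrors what the output above shows.

```r
# Toy cohort tables (hypothetical values, not taken from the mock CDM above).
index_cohort <- data.frame(
  subject_id = c(1, 2, 3, 4),
  cohort_start_date = as.Date(c("2010-01-01", "2012-06-15",
                                "2015-03-01", "2011-09-09"))
)
marker_cohort <- data.frame(
  subject_id = c(1, 2, 4, 5),
  cohort_start_date = as.Date(c("2010-04-01", "2012-01-10",
                                "2011-09-09", "2018-02-02"))
)

# Keep only subjects present in both cohorts (inner join on subject_id).
both <- merge(
  index_cohort, marker_cohort,
  by = "subject_id", suffixes = c("_index", "_marker")
)
names(both)[names(both) == "cohort_start_date_index"] <- "index_date"
names(both)[names(both) == "cohort_start_date_marker"] <- "marker_date"

# Assumed reading of combinationWindow = c(0, Inf): the absolute gap in days
# must be greater than 0, with no upper limit.
gap_days <- abs(as.numeric(both$marker_date - both$index_date))
both <- both[gap_days > 0 & gap_days <= Inf, ]

# The combined cohort starts at the earlier date and ends at the later one.
both$cohort_start_date <- pmin(both$index_date, both$marker_date)
both$cohort_end_date <- pmax(both$index_date, both$marker_date)
both
```

In this toy example, subjects 3 and 5 drop out because they appear in only one cohort, and subject 4 drops out because both events fall on the same day.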

Once we have the intersect cohort, we can explore the temporal symmetry using summariseTemporalSymmetry(), tableTemporalSymmetry(), and plotTemporalSymmetry():

temporal_symmetry <- summariseTemporalSymmetry(
  cohort = cdm$intersect,
  timescale = "year"
)

tableTemporalSymmetry(result = temporal_symmetry)

plotTemporalSymmetry(result = temporal_symmetry)
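
As a rough intuition for what a temporal symmetry summary contains, the base R sketch below bins the signed gap between marker_date and index_date into years and counts subjects per bin. The dates are hypothetical, and the binning convention (rounding away from zero, 365.25 days per year) is an assumption made for this sketch; the package's own calculation may differ.

```r
# Hypothetical index/marker dates for six subjects.
dates <- data.frame(
  index_date  = as.Date(c("2010-01-01", "2012-06-15", "2015-03-01",
                          "2011-09-09", "2013-05-20", "2016-11-30")),
  marker_date = as.Date(c("2010-04-01", "2012-01-10", "2017-08-01",
                          "2012-02-01", "2012-12-25", "2016-12-15"))
)

# Signed gap in days: positive when the marker follows the index.
gap_days <- as.numeric(dates$marker_date - dates$index_date)

# Convert to whole years, rounding away from zero so that no gap lands in
# bin 0 (an assumed convention for this sketch).
gap_years <- sign(gap_days) * ceiling(abs(gap_days) / 365.25)

# Count subjects per time bin.
symmetry_counts <- table(gap_years)
symmetry_counts
```

Counts that are roughly balanced around zero suggest no preferred ordering of the two events, while a surplus on one side suggests an asymmetry worth investigating.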

Next, we will use the summariseSequenceRatios() function to get the crude sequence ratios, adjusted sequence ratios, and the corresponding confidence intervals.

sequence_ratio <- summariseSequenceRatios(cohort = cdm$intersect)

tableSequenceRatios(result = sequence_ratio)

plotSequenceRatios(result = sequence_ratio)
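
For intuition, the crude sequence ratio in SSA is the number of individuals whose index event comes first divided by the number whose marker event comes first. The base R sketch below computes it on hypothetical dates. It does not reproduce the adjusted sequence ratio or the confidence intervals, which summariseSequenceRatios() reports in addition.

```r
# Hypothetical index/marker dates for six subjects.
dates <- data.frame(
  index_date  = as.Date(c("2010-01-01", "2012-06-15", "2015-03-01",
                          "2011-09-09", "2013-05-20", "2016-11-30")),
  marker_date = as.Date(c("2010-04-01", "2012-01-10", "2017-08-01",
                          "2012-02-01", "2012-12-25", "2016-12-15"))
)

# Subjects whose index event precedes the marker event, and vice versa.
index_first  <- sum(dates$index_date < dates$marker_date)
marker_first <- sum(dates$marker_date < dates$index_date)

# Crude sequence ratio: index-first count over marker-first count.
crude_sr <- index_first / marker_first
crude_sr
```

A crude ratio well above 1 means the marker tends to follow the index more often than the reverse; the adjusted ratio additionally accounts for trends in how often each event occurs over time.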

As a diagram

Diagrammatically, the workflow using CohortSymmetry resembles the following flowchart: