The connector package provides a set of functions to connect to different data sources (such as databases and file systems) and to read and write their data through a consistent interface.
It is designed to be a generic and extensible package, so that new data sources can be added easily.
This vignette demonstrates how to use the connector package to connect to either a file system or a database to access different types of data.
The main function in this package is connect(). Based on a configuration file or a list, it creates a connectors object containing one connector for each of the specified data sources. The configuration can be supplied as an R list or as a JSON or YAML file.
The input list (or configuration file) must have the following structure:
- Only the metadata, env, and datasources fields are allowed.
- datasources is mandatory.
- metadata and env must each be a list of named character vectors of length 1.
- datasources must be a list of unnamed lists.
- Each datasource must contain the element name and the named list element backend.
- backend.type must be provided.
Here is an example anyone can run to see how the connector package works. We will use the configuration file provided below, which uses the file system as the connection type for ADaM and TFL data.
_connector.yml:
metadata:
  adam_path: !expr file.path(getwd(), "adam")
  tfl_path: !expr file.path(getwd(), "tfl")
datasources:
  - name: "adam"
    backend:
      type: "connector::connector_fs"
      path: "{metadata.adam_path}"
  - name: "tfl"
    backend:
      type: "connector::connector_fs"
      path: "{metadata.tfl_path}"
As you can see, the configuration file contains metadata about the paths to the directories where the data will be stored, and two data sources, adam and tfl, both using the connector_fs backend to connect to file system folders.
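Since connect() also accepts a list, the same configuration can be sketched in R. The snippet below builds such a list and checks it against the structural rules listed earlier. Note that check_config() is a hypothetical helper written only for this illustration; it is not part of the connector package, which performs its own validation.

```r
# The YAML configuration above, written as a plain R list.
# Paths are filled in directly instead of using {metadata.*} placeholders.
config <- list(
  metadata = list(
    adam_path = file.path(getwd(), "adam"),
    tfl_path = file.path(getwd(), "tfl")
  ),
  datasources = list(
    list(
      name = "adam",
      backend = list(
        type = "connector::connector_fs",
        path = file.path(getwd(), "adam")
      )
    ),
    list(
      name = "tfl",
      backend = list(
        type = "connector::connector_fs",
        path = file.path(getwd(), "tfl")
      )
    )
  )
)

# Hypothetical helper (not a connector function): returns TRUE when the list
# follows the rules described in this vignette, otherwise raises an error.
check_config <- function(x) {
  allowed <- c("metadata", "env", "datasources")
  stopifnot(
    is.list(x),
    !is.null(names(x)),
    all(names(x) %in% allowed),
    "datasources" %in% names(x),
    is.list(x$datasources),
    is.null(names(x$datasources))
  )
  # metadata and env are optional; when present, each entry must be a
  # named character vector of length 1
  for (field in c("metadata", "env")) {
    if (!is.null(x[[field]])) {
      stopifnot(is.list(x[[field]]), !is.null(names(x[[field]])))
    }
    for (value in x[[field]]) {
      stopifnot(is.character(value), length(value) == 1)
    }
  }
  # every datasource needs a name and a backend with a type
  for (ds in x$datasources) {
    stopifnot(
      is.character(ds$name), length(ds$name) == 1,
      is.list(ds$backend),
      is.character(ds$backend$type), length(ds$backend$type) == 1
    )
  }
  TRUE
}

check_config(config)
```

With the package installed, passing this list to connect() (assumed here to be through its config argument) should give the same two connectors as the YAML file.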
Note that the paths to the directories are defined using metadata variables (e.g., {metadata.adam_path}), which allows you to easily change the paths in one place.
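To illustrate the idea behind these placeholders, here is a minimal base-R sketch of how a {metadata.<key>} reference could be replaced by its value. The helper resolve_placeholders() and the example paths are made up for this illustration; the connector package resolves placeholders itself, and its internal mechanism may differ.

```r
# Hypothetical helper: replace each {metadata.<key>} placeholder in `template`
# with the matching entry of the `metadata` list.
resolve_placeholders <- function(template, metadata) {
  for (key in names(metadata)) {
    placeholder <- paste0("{metadata.", key, "}")
    # fixed = TRUE treats the braces and dots literally, not as a regex
    template <- gsub(placeholder, metadata[[key]], template, fixed = TRUE)
  }
  template
}

# Made-up paths, used only for this example
metadata <- list(
  adam_path = "/data/study/adam",
  tfl_path = "/data/study/tfl"
)

resolve_placeholders("{metadata.adam_path}", metadata)
#> [1] "/data/study/adam"
```

Changing adam_path in the metadata list is then enough to update every datasource that refers to it.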
Now, let’s run the example:
The first step is to create the connections to the data sources.
# Attach the packages used in this example
library(connector)
library(dplyr)
library(ggplot2)

# Load data connections
db <- connect()
#> ────────────────────────────────────────────────────────────────────────────────
#> Connection to:
#> → adam
#> • connector::connector_fs
#> • /private/var/folders/kv/q2rqqp3s0s5f9rxn_854l2lm0000gp/T/RtmpZyw2fH/filefe513b5ee525/adam
#> ────────────────────────────────────────────────────────────────────────────────
#> Connection to:
#> → tfl
#> • connector::connector_fs
#> • /private/var/folders/kv/q2rqqp3s0s5f9rxn_854l2lm0000gp/T/RtmpZyw2fH/filefe513b5ee525/tfl
Next, we create a subset of the iris dataset and store it through the adam connector, which saves it as an RDS file in the adam directory.
## Iris data
setosa <- iris |>
  filter(Species == "setosa")

## Store data
db$adam |>
  write_cnt(setosa, "setosa.rds")
We can also create more complex summaries and store them in the same connector.
mean_for_all_iris <- iris |>
  group_by(Species) |>
  summarise_all(list(mean, median, sd, min, max))

db$adam |>
  write_cnt(mean_for_all_iris, "mean_iris.rds")
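Because the list of functions passed to summarise_all() is unnamed, dplyr labels the resulting columns with generic suffixes such as fn1 and fn2. As an optional variation, not part of the original example, the sketch below names each function and uses across(), so that the column names say which statistic they hold. It assumes dplyr is installed; the guard only keeps the snippet from failing without it.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Same summary as above, but every function is named, so the columns are
  # called e.g. Sepal.Length_mean instead of carrying a generic suffix.
  summary_named <- iris |>
    group_by(Species) |>
    summarise(across(
      everything(),
      list(mean = mean, median = median, sd = sd, min = min, max = max)
    ))

  # Inspect the first few column names
  head(names(summary_named))
}
```

The result can be stored with write_cnt() in the same way as mean_for_all_iris.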

## List and load data
db$adam |>
  list_content_cnt()
#> [1] "mean_iris.rds" "setosa.rds"
We can also read the data we just created back in with the read_cnt() function and filter it further.
# Read and filter data
setosa_filtered <- db$adam |>
  read_cnt("setosa") |>
  filter(Sepal.Length > 5)
#> → Found one file: '/private/var/folders/kv/q2rqqp3s0s5f9rxn_854l2lm0000gp/T/RtmpZyw2fH/filefe513b5ee525/adam/setosa.rds'
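To make it concrete what the file system connector abstracts away, here is a rough base-R analogy of the write, list, and read steps above, using a temporary directory. It is only an analogy: it assumes that files with the .rds extension are stored in R's RDS format, and it does not describe how the connector package is implemented internally.

```r
# A temporary folder stands in for the adam directory.
adam_dir <- file.path(tempdir(), "adam_sketch")
dir.create(adam_dir, showWarnings = FALSE)

# Comparable to write_cnt(setosa, "setosa.rds")
setosa <- iris[iris$Species == "setosa", ]
saveRDS(setosa, file.path(adam_dir, "setosa.rds"))

# Comparable to list_content_cnt()
list.files(adam_dir)

# Comparable to read_cnt("setosa.rds")
setosa_again <- readRDS(file.path(adam_dir, "setosa.rds"))

# The data read back matches what was written
all.equal(setosa, setosa_again)
```

The benefit of the connector interface is that the same write_cnt(), list_content_cnt(), and read_cnt() calls keep working when the backend in the configuration file changes, whereas this base-R version is tied to the local file system.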
Finally, we can create a plot with the ggplot2 package and store it in the tfl connector.
# Create a plot
plot_setosa <- ggplot(setosa_filtered) +
  aes(x = Sepal.Length, y = Sepal.Width) +
  geom_point()

## Store data and plot objects
db$tfl |>
  write_cnt(plot_setosa$data, "setosa_data.csv")
db$tfl |>
  write_cnt(plot_setosa, "setosa_plot.rds")

## Store plot image
tmp_file <- tempfile(fileext = ".png")
ggsave(tmp_file, plot_setosa)
#> Saving 3 x 3 in image
db$tfl |>
  upload_cnt(tmp_file, "setosa_plot.png")

# List all files in the TFL directory
db$tfl |>
  list_content_cnt()
#> [1] "setosa_data.csv" "setosa_plot.png" "setosa_plot.rds"