Title: | Metabolic Pathway Completeness and Abundance Calculation |
Version: | 1.1 |
Description: | Provides tools for analyzing metabolic pathway completeness, abundance, and transcripts using KEGG Orthology (KO) data from (meta)genomic and (meta)transcriptomic studies. Supports both completeness (presence/absence) and abundance-weighted analyses. Includes built-in KEGG reference datasets. For more details see Li et al. (2023) <doi:10.1038/s41467-023-42193-7>. |
URL: | https://github.com/LiuyangLee/mclink |
BugReports: | https://github.com/LiuyangLee/mclink/issues |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | data.table (≥ 1.17.0), dplyr (≥ 1.1.4), stringr (≥ 1.5.1), tibble (≥ 3.2.1), utils |
Depends: | R (≥ 3.5) |
LazyData: | false |
Config/check/use_internal_tz: | TRUE |
NeedsCompilation: | no |
Packaged: | 2025-08-19 10:17:04 UTC; Administrator |
Author: | Liuyang Li |
Maintainer: | Liuyang Li <cyanobacteria@yeah.net> |
Repository: | CRAN |
Date/Publication: | 2025-08-22 18:20:06 UTC |
KEGG Orthology (KO) Abundance/Presence Across Microbial Samples or Genomes
Description
A test dataset (wide-format) showing the relative abundance of KEGG Orthology (KO) entries across multiple microbial samples. Values represent normalized abundance metrics (e.g., TPM, RPKM, relative percentage, presence/absence).
Usage
KO_Sample_wide
Format
A data frame with 2495 rows (KO entries) and 5 variables:
- KO
Character. KEGG Orthology ID (e.g., "K00001").
- Marinobacter salarius
Numeric. Abundance in Genome "Marinobacter salarius".
- Pseudooceanicola nanhaiensis
Numeric. Abundance in Genome "Pseudooceanicola nanhaiensis".
- Alteromonas australica
Numeric. Abundance in Genome "Alteromonas australica".
- Henriciella pelagia
Numeric. Abundance in Genome "Henriciella pelagia".
KEGG Orthology (KO) Pathway Information Dataset
Description
A comprehensive dataset mapping KEGG Orthology (KO) entries to metabolic pathways, including module hierarchy, definitions, and enzyme annotations.
Usage
KO_pathway_ref
Format
A data frame with 3846 rows (KO entries) and 10 variables:
- Orthology_Entry
Character. KEGG Orthology ID (e.g., "K00844").
- Module_Type
Character. Type of metabolic module (e.g., "Pathway modules").
- Level_2
Character. Broad metabolic category (e.g., "Carbohydrate metabolism").
- Level_3
Character. Specific metabolic subcategory (e.g., "Central carbohydrate metabolism").
- Module_Entry
Character. KEGG Module ID (e.g., "M00001").
- Module_Name
Character. Full name of the metabolic module (e.g., "Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate").
- Definition
Character. KO composition of the module, truncated in display (e.g., "(K00844,K12407,...)").
- Orthology_Symbol
Character. Short symbol for the KO (e.g., "HK" for hexokinase).
- Orthology_Name
Character. Full enzyme name with EC number (e.g., "hexokinase
[EC:2.7.1.1]
").- KO_Symbol
Character. Combined KO ID and symbol (e.g., "K00844; HK").
Add Missing Rows to a Data Frame
Description
This function checks if specified rows exist in a data frame, and if not, adds them with all values set to 0. Useful for ensuring consistent KO representation across samples.
Usage
add_rows_if_not_exists(
module_abundance,
add_rows = c("K14126", "K14128", "K14127")
)
Arguments
module_abundance |
A data frame with row names representing KO identifiers (e.g., K numbers) and numeric abundance values |
add_rows |
A character vector of row names (KO identifiers) that should be present in the output. Defaults to c("K14126","K14128","K14127") |
Value
The original data frame with additional rows (if any were missing) where all values are set to 0. Row and column names are preserved.
Calculate Log2 Ratio of Sample Values to Row Means
Description
This function calculates the log2 ratio of each value in a data frame to its corresponding row mean. Zero values are preserved as zeros in the output.
Usage
ata_cal(data = KO_Sample_table)
Arguments
data |
A data frame or matrix containing numerical values to be processed. Rows represent features (e.g., KO terms) and columns represent samples. Default is KO_Sample_table. |
Value
A data frame of the same dimensions as input, where each value is: - log2(sample_value/row_mean) when both sample_value and row_mean are non-zero - 0 when either sample_value or row_mean is zero The Mean_RA column used for calculations is removed from the output.
Convert Abundance Values to Presence/Absence Indicators
Description
Transforms a numeric abundance matrix into a binary presence/absence matrix, where 1 indicates presence (abundance > 0) and 0 indicates absence. Preserves row names as Orthology_Entry column in the output.
Usage
convert_abundance_to_presence(module_abundance)
Arguments
module_abundance |
A data frame containing KO abundance data, must include: - Rows named by Orthology_Entry (KO identifiers) - Numeric columns representing sample abundances - An Orthology_Entry column |
Value
A data frame with: - Binary values (1 = present, 0 = absent) for each sample - Original row names preserved in Orthology_Entry column - Same dimensions as input (excluding the Orthology_Entry column)
Create and Export Pathway-Specific Module Sample Files
Description
Processes pathway information and module sample data to create and export individual pathway-specific files containing scaled module data.
Usage
create_sub_module_sample(
pathway_infor,
Module_Sample_scale,
out_DIR_Module_Sample_by_pathway,
plus_scale_method,
comma_scale_method
)
Arguments
pathway_infor |
Data frame containing pathway information, see examples. |
Module_Sample_scale |
Data frame containing scaled module sample data with module names as row names |
out_DIR_Module_Sample_by_pathway |
Character string specifying output directory |
plus_scale_method |
Scaling method for plus-separated KOs ("mean", "min", or "max") |
comma_scale_method |
Scaling method for comma-separated KOs ("sum" or "max") |
Value
None (writes files to disk)
Escape Special Characters in a String
Description
Escapes all specified special characters in a string by adding a backslash before them. This is particularly useful for preparing strings for use in regular expressions or other contexts where special characters need to be treated as literals.
Usage
escape_special_chars(s)
Arguments
s |
A character string to be processed |
Value
A new string with all specified special characters escaped with backslashes
Extract Innermost Parentheses Content
Description
Extracts all text segments enclosed in the innermost level of parentheses from a string. This is useful for parsing hierarchical or nested parenthetical expressions.
Usage
extract_inner_brackets(s)
Arguments
s |
A character string to process (can contain multiple parenthetical groups) |
Value
A character vector containing all innermost parenthesized segments Returns empty character vector if no matches found
Group KO Abundance Data by Module
Description
Processes KO abundance data to group by metabolic modules, converting presence/absence data into module-level KO lists. Handles missing KOs and maintains sample-specific KO profiles.
Usage
group_ko_by_module(pathway_infor, Sample_KO_abundance)
Arguments
pathway_infor |
Data frame containing pathway information, see examples. |
Sample_KO_abundance |
Data frame of KO abundances with: - Rows as KO identifiers - Columns as samples - Orthology_Entry column |
Value
A data frame where: - Rows are module names - Columns are samples - Cell values are space-separated lists of present KOs - Empty strings for modules with no detected KOs
Metabolic Pathway Coverage Analysis
Description
Analyzes metabolic pathway completeness/abundance from (meta)genome KO presence/abundance data. Can use either built-in KEGG datasets or user-provided data frames. Output includes pathway coverage metrics and detected KOs in each pathway/module.
Usage
mclink(
ref = NULL,
data = NULL,
table_feature = "completeness",
plus_scale_method = "mean",
comma_scale_method = "max",
out_dir = NULL,
split_by_pathway = FALSE
)
Arguments
ref |
Pathway information data frame. When |
data |
Sample KO abundance data frame. When |
table_feature |
Analysis type, either:
|
plus_scale_method |
Scaling method for plus-separated KOs (subunits/complexes):
|
comma_scale_method |
Scaling method for comma-separated KOs (isoforms/alternatives):
|
out_dir |
Output directory path. If |
split_by_pathway |
Logical. If |
Value
A list containing:
coverage - Data frame with pathway coverage metrics
detected_KOs - List of detected KOs per pathway/module
log - log of the analysis process
If out_dir
is specified, results are also written as TSV files.
Examples
data(KO_pathway_ref)
data(KO_Sample_wide)
selected_modules <- c("M00176","M00165","M00173","M00374","M00375","M00376","M00377")
KO_pathway_ref_selected <- KO_pathway_ref[KO_pathway_ref$Module_Entry %in% selected_modules, ]
mc_list =
mclink(ref = KO_pathway_ref_selected,
data = KO_Sample_wide,
table_feature = "completeness",
plus_scale_method = "min",
comma_scale_method = "max")
mc_coverage = mc_list[["coverage"]]
mc_detected_KOs = mc_list[["detected_KOs"]]
mc_log = mc_list[["log"]]
print(head(mc_coverage))
Merge Module Information with Module Table
Description
Merges pathway information with a module table to create a sample-by-module matrix with proper module names. Ensures all modules are represented in the output.
Usage
merge_module_name(pathway_infor, module_table)
Arguments
pathway_infor |
Data frame containing pathway information, see examples. |
module_table |
Data frame containing module data with: - Module_Entry: Matching module identifiers - Orthology_Entry: KO identifiers - Definition: Module definitions - Sample columns with abundance values |
Value
A data frame where: - Rows are module names (from Module_Name) - Columns are samples - All modules from pathway_infor are represented - Original row names are replaced with descriptive module names
Process All Modules in Pathway Information
Description
Processes all metabolic modules in pathway information, handling each module's structure, definition, and bracket components. Aggregates results across all modules.
Usage
process_all_modules(
pathway_infor,
Sample_KO,
plus_scale_method,
comma_scale_method,
verbose = TRUE
)
Arguments
pathway_infor |
Data frame containing pathway information, see examples. |
Sample_KO |
Data frame containing KO (KEGG Orthology) sample data |
plus_scale_method |
Scaling method for plus-separated KOs ("mean", "min", or "max") |
comma_scale_method |
Scaling method for comma-separated KOs ("sum" or "max") |
verbose |
Logical controlling console output:
|
Value
A list with two components:
-
data
: A data frame of processed results for all modules, with unique rows to avoid duplicates. -
log
: A character vector of timestamped log messages.
Process All Pathways Analysis
Description
Processes module sample data across all pathways with specified scaling methods. Handles different comparison methods and outputs results by pathway.
Usage
process_all_pathways(
pathway_infor,
Module_Sample,
out_DIR_Module_Sample_by_pathway,
compare_method = c("log", "avg", "round"),
plus_scale_method,
comma_scale_method
)
Arguments
pathway_infor |
Data frame containing pathway information, see examples. |
Module_Sample |
Data frame of module sample data to process |
out_DIR_Module_Sample_by_pathway |
Output directory for pathway-specific results |
compare_method |
Comparison method to use: "log" (log10 transform), "avg" (average calculation), or "round" (simple rounding) |
plus_scale_method |
Scaling method for plus-separated KOs ("mean", "min", or "max") |
comma_scale_method |
Scaling method for comma-separated KOs ("sum" or "max") |
Value
Main outputs are written to: - Combined module file (All_modules.*.tsv) - Pathway-specific files (via create_sub_module_sample)
Process Module Brackets Recursively
Description
Recursively processes nested brackets in module definitions to calculate pathway completeness or abundance scores. Handles complex pathway structures with multiple nesting levels.
Usage
process_module_brackets(
module_abundance = sub_Sample_KO_pathway,
module_steps_str = module_steps_str,
bracket_count = 1,
step_count = 1,
module_name = "Module",
raw_module_steps = module_steps_str,
plus_scale_method,
comma_scale_method,
abundance_log = list()
)
Arguments
module_abundance |
Data frame containing KO abundance data for the module |
module_steps_str |
String representation of module steps/structure |
bracket_count |
Counter for tracking nested bracket levels |
step_count |
Counter for tracking processing steps |
module_name |
Name of the module being processed |
raw_module_steps |
Original unprocessed module steps string |
plus_scale_method |
Scaling method for plus-separated KOs ("mean", "min", or "max") |
comma_scale_method |
Scaling method for comma-separated KOs ("sum" or "max") |
abundance_log |
A character vector of timestamped log messages |
Value
A list with two components:
-
data
: Data frame containing processed abundance values with: - Rows for each bracket level and final step - Consistent sample columns as input -
log
: A character vector of timestamped log messages.
Process Module Definition String
Description
Cleans and processes module definition strings by removing various patterns and formatting elements to extract core KO relationships. Handles special cases like negative KOs and parenthetical expressions.
Usage
process_module_definition(sub_Sample_KO_pathway)
Arguments
sub_Sample_KO_pathway |
Data frame containing module definitions in a 'Definition' column |
Value
A list with log and character vector of cleaned module definition strings. The vector contains: - Removed negative KO indicators - Simplified parentheses - Normalized space
Process Module Components with Comma Handling
Description
Processes a vector of KOs, applying different handling methods depending on whether they contain commas or not. Useful for processing complex pathway definitions with alternative KOs.
Usage
process_module_loop_comma(
KO_vector,
module_abundance,
process_step_comma,
process_step_direct,
aggregrate_rowname = "step_1",
step_count = 1,
comma_scale_method
)
Arguments
KO_vector |
Character vector of KO identifiers to process |
module_abundance |
Data frame containing KO abundance data |
process_step_comma |
Function to handle comma-separated KOs (alternative forms) |
process_step_direct |
Function to handle individual KOs |
aggregrate_rowname |
Base name for row aggregation (default: 'step_1') |
step_count |
Counter for processing steps (default: 1) |
comma_scale_method |
Scaling method for comma-separated KOs ("sum" or "max") |
Value
List containing: - abundance_table: Processed abundance Data frame - step_count: Updated step counter - abundance_log: log
Process Module Components with Plus and Comma Handling
Description
Processes a vector of KOs, applying different handling methods depending on whether they contain plus signs, commas, or both. Handles complex pathway definitions with both required components (plus-separated) and alternative forms (comma-separated).
Usage
process_module_loop_plu_comma(
KO_vector,
module_abundance,
process_step_plus,
process_step_comma,
process_step_direct,
aggregrate_rowname = "step_1",
step_count = 1,
plus_scale_method,
comma_scale_method
)
Arguments
KO_vector |
Character vector of KO identifiers to process |
module_abundance |
Data frame containing KO abundance data |
process_step_plus |
Function to handle plus-separated KOs (required components) |
process_step_comma |
Function to handle comma-separated KOs (alternative forms) |
process_step_direct |
Function to handle individual KOs |
aggregrate_rowname |
Base name for row aggregation (default: 'step_1') |
step_count |
Counter for processing steps (default: 1) |
plus_scale_method |
Scaling method for plus-separated KOs ("mean", "min", or "max") |
comma_scale_method |
Scaling method for comma-separated KOs ("sum" or "max") |
Value
List containing: - abundance_table: Processed abundance values - step_count: Updated step counter - abundance_log: log
Process Module Components with Plus Sign Handling
Description
Processes a vector of KOs, applying different handling methods depending on whether they contain plus signs or not. Handles pathway definitions with required components (plus-separated KOs representing complex subunits).
Usage
process_module_loop_plus(
KO_vector,
module_abundance,
process_step_plus,
process_step_direct,
aggregrate_rowname = "step_1",
step_count = 1,
plus_scale_method
)
Arguments
KO_vector |
Character vector of KO identifiers to process |
module_abundance |
Data frame containing KO abundance data |
process_step_plus |
Function to handle plus-separated KOs (required components) |
process_step_direct |
Function to handle individual KOs |
aggregrate_rowname |
Base name for row aggregation (default: 'step_1') |
step_count |
Counter for processing steps (default: 1) |
plus_scale_method |
Scaling method for plus-separated KOs ("mean", "min", or "max") |
Value
List containing: - abundance_table: Processed abundance values - step_count: Updated step counter - abundance_log: log
Process Module Steps with Complex KO String Handling
Description
Processes a KO string containing various combinations of KOs separated by different operators (commas, plus signs, or spaces). Handles complex pathway definitions with multiple types of relationships between KOs.
Usage
process_module_step(
module_abundance,
KO_string = "K03388,K03389+K03390+K14083,K14126+K14127,K14128",
aggregrate_rowname = "bracket_1",
step_count = 1,
plus_scale_method,
comma_scale_method
)
Arguments
module_abundance |
Data frame containing KO abundance data |
KO_string |
String representation of KO relationships (default: "K03388,K03389+K03390+K14083,K14126+K14127,K14128") |
aggregrate_rowname |
Base name for row aggregation (default: 'bracket_1') |
step_count |
Counter for processing steps (default: 1) |
plus_scale_method |
Scaling method for plus-separated KOs ("mean", "min", or "max") |
comma_scale_method |
Scaling method for comma-separated KOs ("sum" or "max") |
Value
List containing: - abundance_table: Processed abundance values - step_count: Updated step counter - abundance_log: log
Process Module Structure Data
Description
Filters and merges pathway information with sample KO data for a specific module. Returns a merged dataframe containing KO abundance data and pathway definitions.
Usage
process_module_structure(pathway_infor, Sample_KO, module)
Arguments
pathway_infor |
Data frame containing pathway information, see examples. |
Sample_KO |
Dataframe containing KO abundance data with KO IDs as row names |
module |
Character string of the module ID to process (e.g. "M00563") |
Value
A merged dataframe containing: - KO abundance data for the specified module - Corresponding pathway information - Empty dataframe if no matching KOs found
Process Comma-Separated KOs with Specified Scaling Method
Description
Handles comma-separated KOs by applying the specified scaling method (sum or max). Processes multiple KOs separated by commas and aggregates them into a single row.
Usage
process_step_comma(
module_abundance,
KOs = c("K14126,K14127,K14128"),
aggregrate_rowname,
step_count = 1,
comma_scale_method
)
Arguments
module_abundance |
Data frame containing KO abundance data with required columns: Orthology_Entry, Module_Entry, Definition |
KOs |
Character vector of comma-separated KO IDs (default: "K14126,K14127,K14128") |
aggregrate_rowname |
Base name for row aggregation (default: 'step_1') |
step_count |
Processing step counter (default: 1) |
comma_scale_method |
Scaling method for comma-separated KOs ("sum" or "max") |
Value
List containing: - abundance_table: Processed data with aggregated values - step_count: Updated step counter - abundance_log: log
Direct KO Processing Without Special Handling
Description
Processes KO abundances directly without any special scaling or aggregation. Simply extracts the specified KOs from the abundance table while maintaining the original module metadata.
Usage
process_step_direct(module_abundance, KOs = c("K14126", "K14128", "K14127"))
Arguments
module_abundance |
Data frame containing KO abundance data with required columns: Orthology_Entry, Module_Entry, Definition |
KOs |
Character vector of KO IDs to extract (default: c("K14126","K14128","K14127")) |
Value
List containing: - abundance_table: A subset of the input data frame containing only the specified KOs, with original module metadata preserved - abundance_log: log
Process Plus-Separated KOs with Specified Scaling Method
Description
Handles plus-separated KOs by applying the specified scaling method (mean, min, or max). Processes multiple KOs separated by plus signs and aggregates them into a single row. Note: For mean calculation, uses the sum of existing KO abundances divided by total number of KOs (including those with zero abundance in all samples).
Usage
process_step_plus(
module_abundance,
KOs = c("K14126+K14127+K14128"),
aggregrate_rowname,
step_count = 1,
plus_scale_method
)
Arguments
module_abundance |
Data frame containing KO abundance data with required columns: Orthology_Entry, Module_Entry, Definition |
KOs |
Character string of plus-separated KO IDs (default: "K14126+K14127+K14128") |
aggregrate_rowname |
Base name for row aggregation (default: 'step_1') |
step_count |
Processing step counter (default: 1) |
plus_scale_method |
Scaling method for plus-separated KOs ("mean", "min", or "max") |
Value
List containing: - abundance_table: Processed data with aggregated values - step_count: Updated step counter - abundance_log: log
Process Space-Separated KOs with Mean Calculation
Description
Handles space-separated KOs by calculating the mean abundance across all specified KOs. Processes multiple KOs separated by spaces and aggregates them into a single row. Note: For mean calculation, uses the sum of KO abundances divided by total number of KOs, including those with zero abundance in all samples.
Usage
process_step_space(
module_abundance,
KOs = c("K14126 K14127 K14128"),
aggregrate_rowname,
step_count = 1
)
Arguments
module_abundance |
Data frame containing KO abundance data with required columns: Orthology_Entry, Module_Entry, Definition |
KOs |
Character string of space-separated KO IDs (default: "K14126 K14127 K14128") |
aggregrate_rowname |
Base name for row aggregation (default: 'step_1') |
step_count |
Processing step counter (default: 1) |
Value
List containing: - abundance_table: Processed data with mean values - step_count: Updated step counter - abundance_log: log
Read and Process KO Sample Table with Pathway Information
Description
This function reads a KO sample table (wide format), validates its contents against pathway information, and returns a filtered table containing only KOs present in both datasets.
Usage
read_and_process_KO_table(in_KO_Sample_wide, pathway_infor)
Arguments
in_KO_Sample_wide |
Character string specifying the path to the input KO sample table file. Should be a tab-delimited file with KO identifiers as row names. |
pathway_infor |
Data frame containing pathway information, see examples. |
Value
A list with log and a data frame.The data frame contains the filtered sample data, with only rows that match KOs in the pathway information. Includes an added 'Orthology_Entry' column containing the row names.
Read and process Pathway information dataframe
Description
This function reads a tab-delimited file containing KEGG pathway information, performs data validation and cleaning, and returns a processed data frame.
Usage
read_and_process_pathway_infor(in_KO_pathway_ref)
Arguments
in_KO_pathway_ref |
Character string specifying the path to the input file. The file should be a tab-delimited text file containing KEGG pathway information, with a header row and at least one column named "Module_Entry". |
Value
A list with log and a data frame containing the processed pathway information after removing empty/NA entries. The data frame will have the same columns as the input file. Returns NULL if the file cannot be read.
Remove Outer Parentheses from String
Description
This function checks if a string is wrapped in outer parentheses and removes them if present.
Usage
remove_outer_brackets(s)
Arguments
s |
A character string to be processed |
Value
The input string with outer parentheses removed (if they existed), or the original string if no outer parentheses were found.