Published March 25, 2024
Journal Article (open access)

Enrichment on steps, not genes, improves inference of differentially expressed pathways

Abstract

Enrichment analysis is frequently used in combination with differential expression data to investigate potential commonalities amongst lists of genes and generate hypotheses for further experiments. However, current enrichment analysis approaches on pathways ignore the functional relationships between genes in a pathway, particularly OR logic that occurs when a set of proteins can each individually perform the same step in a pathway. As a result, these approaches miss pathways with large or multiple sets because of an inflation of pathway size (when measured as the total gene count) relative to the number of steps. We address this problem by enriching on step-enabling entities in pathways. We treat sets of protein-coding genes as single entities, and we also weight sets to account for the number of genes in them using the multivariate Fisher’s noncentral hypergeometric distribution. We then show three examples of pathways that are recovered with this method and find that the results have significant proportions of pathways not found in gene list enrichment analysis.
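The size inflation described in the abstract can be illustrated with a toy calculation. The following is a minimal sketch, not the authors' implementation: it uses an ordinary (unweighted) hypergeometric test and omits the multivariate Fisher's noncentral hypergeometric weighting that the paper applies to sets. All gene names, the set name `S`, and the helper functions are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def collapse_to_entities(genes, gene_sets):
    """Replace every gene that belongs to a set by the set's name (OR logic)."""
    gene_to_set = {g: name for name, members in gene_sets.items() for g in members}
    return {gene_to_set.get(g, g) for g in genes}


# Hypothetical background of 100 genes.
background = {f"g{i}" for i in range(100)}

# Hypothetical set: any one of these 10 genes can perform the same step.
gene_sets = {"S": {f"g{i}" for i in range(10)}}

# Toy pathway with three steps: (any member of S), g10, g11.
pathway_genes = gene_sets["S"] | {"g10", "g11"}

# Toy differentially expressed list: one set member, both other steps, two unrelated genes.
de_genes = {"g0", "g10", "g11", "g50", "g51"}

# Gene-level enrichment: the pathway counts as 12 genes.
p_gene = hypergeom_sf(
    k=len(de_genes & pathway_genes),
    N=len(background),
    K=len(pathway_genes),
    n=len(de_genes),
)

# Entity-level enrichment: the set is treated as one step-enabling entity.
bg_entities = collapse_to_entities(background, gene_sets)
pathway_entities = collapse_to_entities(pathway_genes, gene_sets)
de_entities = collapse_to_entities(de_genes, gene_sets)

p_entity = hypergeom_sf(
    k=len(de_entities & pathway_entities),
    N=len(bg_entities),
    K=len(pathway_entities),
    n=len(de_entities),
)

print(f"gene-level p = {p_gene:.3g}, entity-level p = {p_entity:.3g}")
```

In this toy case every step of the pathway is covered by the differentially expressed list, yet the gene-level test sees only 3 of 12 pathway genes hit. Collapsing the set into a single entity makes the pathway 3 of 3, which gives a smaller p-value. The unweighted collapse gives a 10-gene set the same chance of being hit as a single gene, which is the bias the paper's weighting is described as correcting.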

Copyright and License

© 2024 Markarian et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Acknowledgement

The authors thank Mark Zhang for feedback on the writing, as well as Chris Mungall and his group in the Gene Ontology Consortium for their support. This work was supported by the National Human Genome Research Institute (U24HG012212).

Funding

This work was supported by a National Human Genome Research Institute grant (U24HG012212) to PWS, which also supported the salaries of PWS, NM, KVA and DH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability

All data and code used are available in the GitHub repository https://github.com/nmarkari/gocam_enrichment. We have also used Zenodo to assign a DOI to the repository: 10.5281/zenodo.8310236 (https://zenodo.org/records/8310236). Primary sources for the datasets used in testing are listed in Table 5, and the corresponding CSV files after filtering are in our GitHub repository at https://github.com/nmarkari/gocam_enrichment/tree/main/test_data/processed.

Conflict of Interest

The authors have declared that no competing interests exist.

Files

journal.pcbi.1011968.pdf

Additional details

Created: May 9, 2024
Modified: May 9, 2024