Start by loading pwcuratr and a collection of packages that facilitate data manipulation and plotting.

The first step in curating a pathway cluster is to define a set of ‘seed genes’ of interest. This set could, for example, be derived from an extensive literature review. The package pwcuratr ships with a set of genes associated with the serotonin neurotransmitter pathways, which can be loaded directly from the package.

Next, external data is used to define a plausible functional neighborhood, that is, a set of genes that are functionally related to any of the seed genes. We start by querying a snapshot of the reactome.org pathway database for all pathways containing at least one of the seed genes.

reactome_pathway_id n_participating_genes
R-HSA-5579012 1
R-HSA-141333 2
R-HSA-379398 2
R-HSA-380612 2
R-HSA-380615 3
R-HSA-379397 3
R-HSA-209905 4
R-HSA-140179 4
R-HSA-209931 5
R-HSA-379401 5
R-HSA-112311 11
R-HSA-390666 12
R-HSA-83936 12
R-HSA-209776 18
R-HSA-181429 18
R-HSA-181430 18
R-HSA-442660 20
R-HSA-212676 23
R-HSA-5579029 42
R-HSA-375280 43
R-HSA-425397 44
R-HSA-112310 51
R-HSA-425366 100
R-HSA-5668914 111
R-HSA-211945 112
R-HSA-6785807 123
R-HSA-112314 220
R-HSA-416476 223
R-HSA-211859 233
R-HSA-425407 269
R-HSA-112315 287
R-HSA-373076 347
R-HSA-71291 412
R-HSA-112316 438
R-HSA-418594 458
R-HSA-449147 512
R-HSA-500792 513
R-HSA-418555 656
R-HSA-382551 806
R-HSA-1280215 1024
R-HSA-388396 1290
R-HSA-1643685 1304
R-HSA-372790 1370
R-HSA-1430728 2291
R-HSA-168256 2734
R-HSA-162582 3105
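
The query above can be illustrated with a minimal base R sketch. It does not use the pwcuratr API. The membership table, the pathway identifiers (`PW-A`, `PW-B`, `PW-C`) and the gene assignments are made-up toy data, and `n_participating_genes` is assumed to count all genes of a pathway rather than only the seed genes.

```r
# Toy pathway membership table; identifiers and gene assignments are
# hypothetical and do not reflect the Reactome snapshot used by pwcuratr.
membership <- data.frame(
  pathway_id = c("PW-A", "PW-A", "PW-B", "PW-B", "PW-B", "PW-C", "PW-C"),
  gene       = c("HTR1A", "GNB1", "SLC6A4", "HTR1A", "MAOA", "TP53", "MDM2"),
  stringsAsFactors = FALSE
)
seed_genes <- c("HTR1A", "SLC6A4")  # placeholder seed set

# Pathways that contain at least one seed gene.
hit_ids <- unique(membership$pathway_id[membership$gene %in% seed_genes])

# Size of each hit pathway, counting all of its participating genes.
hits  <- membership[membership$pathway_id %in% hit_ids, ]
sizes <- tapply(hits$gene, hits$pathway_id, function(x) length(unique(x)))

pathway_sizes <- data.frame(
  pathway_id            = names(sizes),
  n_participating_genes = as.integer(sizes),
  stringsAsFactors      = FALSE
)
# Sort from the most specific (smallest) to the most generic pathway.
pathway_sizes <- pathway_sizes[order(pathway_sizes$n_participating_genes), ]
print(pathway_sizes)
```

Sorting by size makes the granularity differences between pathways visible at a glance, as in the table above.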

A naive approach to defining a functional neighborhood for the seed genes would be to take the union of all genes that participate in any of the identified reactome pathways. However, reactome.org pathways are defined at very different levels of granularity, and it will not always be sensible to include all participating genes from very large pathways. A first step to address this ‘specificity problem’ is to hand-curate the list of pathways considered for further processing. Here, a completely arbitrary cut-off of 250 participating genes per pathway is chosen. To avoid losing any of the seed genes, they are added manually to the initial list of candidate genes.
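
As a rough sketch of this step in base R: the data frames, the variable names and the inclusive reading of the cut-off (pathways with at most 250 genes are kept) are assumptions made for illustration, not the package's own implementation.

```r
# Toy inputs; pathway identifiers, sizes and memberships are hypothetical.
pathways <- data.frame(
  pathway_id            = c("PW-A", "PW-B", "PW-C"),
  n_participating_genes = c(12L, 220L, 1304L),
  stringsAsFactors      = FALSE
)
membership <- data.frame(
  pathway_id = c("PW-A", "PW-A", "PW-B", "PW-C", "PW-C"),
  gene       = c("HTR1A", "GNB1", "MAOA", "TP53", "HTR1A"),
  stringsAsFactors = FALSE
)
seed_genes <- c("HTR1A", "TPH2")  # placeholder seed set

# Keep only pathways up to the (arbitrary) size cut-off.
max_pathway_size <- 250L
kept_ids <- pathways$pathway_id[pathways$n_participating_genes <= max_pathway_size]

# Candidate genes: all members of the kept pathways, plus every seed gene,
# so that seeds appearing only in large pathways are not lost.
candidate_genes <- union(
  seed_genes,
  membership$gene[membership$pathway_id %in% kept_ids]
)
print(candidate_genes)
```

In this toy example `TPH2` does not occur in any retained pathway and survives only because the seed genes are added back explicitly.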

To refine the candidate set of genes even further, we use predicted functional gene-gene interactions from the 2018 release of the interaction set derived in

Wu, G., Feng, X., & Stein, L. (2010). A human functional protein interaction network and its application to cancer data analysis. Genome Biology, 11(5), R53.

Each interaction is scored between 0 (low confidence) and 1 (high confidence) for functional interaction between the respective genes.

The majority of the included interactions have a confidence score of 0.9 or greater, so we filter the set of interactions at this threshold. The result is a gene interaction network that can be used to refine our notion of functional neighborhood: we filter the set of candidate genes derived above such that only genes connected to a seed gene via at most \(k\) edges are retained.
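
The thresholding and the \(k\)-edge pruning can be sketched with a small breadth-first search in base R. The helper `reachable_within()`, the column names and the toy interaction scores are hypothetical. The sketch also assumes that paths are traced through the full thresholded network before intersecting with the candidate genes, which may differ from what pwcuratr does internally.

```r
# Toy scored interactions; gene pairs and scores are made up for illustration.
interactions <- data.frame(
  gene_a = c("HTR1A", "GNB1", "GNG2", "HTR1A", "MAOA"),
  gene_b = c("GNB1", "GNG2", "ADCY1", "MAOA", "ALDH2"),
  score  = c(0.95, 0.99, 0.92, 0.40, 0.97),
  stringsAsFactors = FALSE
)
candidate_genes <- c("HTR1A", "GNB1", "GNG2", "ADCY1", "MAOA")
seed_genes      <- "HTR1A"
k               <- 2

# Keep only high-confidence interactions.
confident <- interactions[interactions$score >= 0.9, ]

# Hypothetical helper: all genes within at most k edges of any seed gene
# (breadth-first search on an undirected edge list).
reachable_within <- function(edges, seeds, k) {
  visited  <- seeds
  frontier <- seeds
  for (step in seq_len(k)) {
    nbrs <- c(edges$gene_b[edges$gene_a %in% frontier],
              edges$gene_a[edges$gene_b %in% frontier])
    frontier <- setdiff(unique(nbrs), visited)
    if (length(frontier) == 0) break
    visited <- c(visited, frontier)
  }
  visited
}

reachable <- reachable_within(confident, seed_genes, k)

# Prune the candidates to the k-neighborhood; seed genes are always kept.
pruned_genes <- union(seed_genes, intersect(candidate_genes, reachable))
print(pruned_genes)
```

Here `ADCY1` is three edges away from the seed and `MAOA` is linked to it only through a low-confidence interaction, so both are dropped for \(k=2\).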

The final choice of \(k\) can only be justified by manually inspecting the resulting gene networks. As an example, consider the choice \(k=2\). We first prune the set of candidate genes and then create an igraph object to represent the network information. This allows us to use the wealth of graph-based techniques implemented in the igraph package. For instance, one may be interested in the number of connected components of the pruned gene network.

There are 7 components: almost all candidate genes are connected in the first component, and there are two small non-trivial components. All components of size 1 correspond to seed genes for which no functional neighbors were found in the selected reactome.org pathways.
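
A component summary of this kind can be obtained along the following lines with igraph. The edge list and gene set below are toy data and are unrelated to the 7 components reported above. Passing the full gene list as `vertices` is an assumption about how isolated seed genes are kept in the graph.

```r
library(igraph)

# Toy pruned network; edges and genes are hypothetical.
edges <- data.frame(
  from = c("HTR1A", "GNB1", "HTR3A"),
  to   = c("GNB1", "GNG2", "HTR3B"),
  stringsAsFactors = FALSE
)
# Listing all genes as vertices keeps genes without any edge in the graph.
genes <- data.frame(
  name = c("HTR1A", "GNB1", "GNG2", "HTR3A", "HTR3B", "TPH2"),
  stringsAsFactors = FALSE
)

g    <- graph_from_data_frame(edges, directed = FALSE, vertices = genes)
comp <- components(g)

n_components    <- comp$no                             # number of connected components
component_sizes <- sort(comp$csize, decreasing = TRUE)

# Genes that form a component on their own (no functional neighbors).
isolated <- names(comp$membership)[comp$csize[comp$membership] == 1]

print(n_components)
print(component_sizes)
print(isolated)
```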

Based on this visual representation, the neighborhood parameters could be tweaked or the set of initial reactome pathways could be altered. If the network splits into several disconnected components, it might be worthwhile to split the pathway cluster such that each individual network is connected.

For instance, one could consider the HTR subsystem separately by restricting the initial set of seed genes. Since the HTR3 subsystem is not connected, we can filter these seed genes out as well.
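
One simple way to restrict the seed genes is by gene symbol. The sketch below assumes that the HTR and HTR3 subsystems can be identified by the prefixes `HTR` and `HTR3`. The seed list is a placeholder and not necessarily the set shipped with pwcuratr.

```r
# Placeholder seed set; the list bundled with pwcuratr may differ.
seed_genes <- c("HTR1A", "HTR2A", "HTR3A", "HTR3B", "SLC6A4", "TPH2")

# Restrict to the HTR subsystem (assumed: symbols starting with "HTR").
htr_seeds <- seed_genes[grepl("^HTR", seed_genes)]

# Drop the HTR3 genes as well (assumed: symbols starting with "HTR3").
htr_seeds_without_htr3 <- htr_seeds[!grepl("^HTR3", htr_seeds)]
print(htr_seeds_without_htr3)
```

The reduced seed set can then be fed through the same pathway query and pruning steps as before.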