Featured Research

1. Statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples

Pseudotime analysis with single-cell RNA-seq data has been widely used to study dynamic gene regulatory programs along continuous biological processes. While many computational methods have been developed to infer the pseudo-temporal trajectories of cells within a biological sample, methods that compare pseudo-temporal patterns of multiple samples (or replicates) across different experimental conditions are lacking. We developed a comprehensive and statistically rigorous computational framework called Lamian for differential multi-sample pseudotime analysis in single cells (a). It can be used to identify temporal changes associated with sample covariates, such as different biological conditions, and to detect changes in gene expression, cell density, and topology of a pseudo-temporal trajectory. Unlike existing methods that ignore sample variability, Lamian draws statistical inferences after accounting for cross-sample variability and substantially reduces sample-specific false discoveries that are not generalizable to new samples. It is the first comprehensive framework to address many open challenges in pseudotime analysis with multiple single-cell RNA-seq samples, and it significantly improves upon existing methods in the reproducibility of tree structure inference, in differential topology detection, and in differential temporal pattern identification. We are actively applying Lamian in collaborative projects to identify pseudo-temporal transcriptional features in non-small cell lung cancers (b), head and neck squamous cell carcinoma, and acute myeloid leukemia.

  1. Hou, W., Ji, Z., Chen, Z., Wherry, E.J., Hicks, S.*, and Ji, H.* A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples. Nature Communications 14, 7286 (2023). Software package: Lamian.
  2. Caushi, J.X., Zhang, J., Ji, Z., Vaghasia, A., Zhang, B., Hsiue, E., Mog, B., Hou, W., Justesen, S., Blosser, R., Tam, A., Anagnostou, V., Cottrell, T.R., Guo, H., Chan, H., Singh, D., Thapa, S., Dykema, A., Choudhury, C., Aparicio, L., Cheung, L., Lanis, M., Belcaid, Z., Asmar, M.E., Illei, P., Brock, M., Ha, J., Bush, E., Park, B., Bott, M., Naidoo, J., Marrone, K.A., Reuss, J.E., Velculescu, V.E., Chaft, J.E., Kinzler, K.W., Zhou, S., Vogelstein, B., Taube, J.M., Merghoub, T., Brahmer, J.R., Hellmann, M.D., Forde, P.M., Yegnasubramanian, S.*, Ji, H.*, Pardoll, D.M.*, Smith, K.N.* (2021). Transcriptional programs of neoantigen-specific TIL in anti-PD-1-treated lung cancers. Nature, July 21, 2021. PMID: 34290408 PMCID: PMC8338555.
  3. Dykema, A.G., Zhang, J., Cheung, L.S., Connor, S., Zhang, B., Zeng, Z., Cherry, C.M., Li, T., Caushi, J.X., Nishimoto, M., Munoz, A.J., Ji, Z., Hou, W., Zhan, W., Singh, D., Zhang, T., Rashid, R., Mitchell-Flack, M., Bom, S., Tam, A., Ionta, N., Aye, T.H.K., Wang, Y., Sawosik, C.A., Tirado, L.E., Tomasovic, L.M., Spangler, J.B., Anagnostou, W., Yang, S., Spicer, J., Rayes, R., Taube, J., Brahmer, J.R., Forde, P.M., Yegnasubramanian, S.*, Ji, H.*, Pardoll, D.M.*, and Smith, K.N.* (2023). Lung tumor–infiltrating Treg have divergent transcriptional profiles and function linked to checkpoint blockade response. Science Immunology, 8(87). PMID: 37713507.
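
To illustrate why inference that accounts for cross-sample variability differs from pooling all cells, the following is a minimal sketch of a sample-level permutation test. It is not Lamian's model or API: the function name `sample_level_permutation_pvalue`, the per-sample summary values, and the choice of an absolute difference in group means as the test statistic are all assumptions made for illustration. The point is only that condition labels are permuted across samples, not across cells, so the number of samples bounds the evidence.

```python
from itertools import combinations
from statistics import mean


def sample_level_permutation_pvalue(sample_summaries, labels):
    """Exact two-sided permutation p-value computed at the sample level.

    sample_summaries: one number per sample (hypothetical example: a gene's
        mean expression among that sample's cells in a pseudotime window).
    labels: 0/1 condition label per sample.
    Illustrative only; this is not the Lamian test.
    """
    n = len(sample_summaries)

    def abs_mean_diff(group1):
        # Absolute difference in means between the two groups of samples.
        g1 = [sample_summaries[i] for i in group1]
        g0 = [sample_summaries[i] for i in range(n) if i not in group1]
        return abs(mean(g1) - mean(g0))

    observed_group = tuple(i for i, lab in enumerate(labels) if lab == 1)
    observed = abs_mean_diff(observed_group)

    # Enumerate every way of relabeling samples with the same group sizes.
    relabelings = list(combinations(range(n), len(observed_group)))
    as_extreme = sum(abs_mean_diff(g) >= observed - 1e-12 for g in relabelings)
    return as_extreme / len(relabelings)


# Three samples per condition with clearly separated summaries.
p = sample_level_permutation_pvalue(
    [1.0, 1.1, 0.9, 3.0, 3.1, 2.9], [0, 0, 0, 1, 1, 1]
)
print(p)
```

With three samples per condition there are only 20 possible relabelings, so the smallest attainable two-sided p-value is 2/20 no matter how many cells each sample contains. A cell-level test on the same data would treat every cell as an independent replicate and could report far smaller p-values for effects driven by a single sample.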

2. Methods for benchmarking single-cell RNA-seq imputation methods as well as unbiased visualization of single-cell RNA-seq and single-cell spatial data

Single-cell RNA-seq has been developed to measure gene expression in individual cells, leading to significant progress toward systematically identifying heterogeneous cell populations within a tissue. Recent work has led to the development of many imputation methods which address the increased sparsity observed in single-cell RNA-seq data. We are among the first to systematically evaluate the performance of all 18 state-of-the-art single-cell RNA-seq imputation methods (a). Using cell line and tissue data measured across experimental protocols, we evaluated methods in terms of the similarity between imputed cell profiles and bulk samples and whether these methods recover relevant biological signals or introduce spurious noise in downstream differential expression, unsupervised clustering, and pseudo-temporal trajectory analyses.

We also developed methods to address the visualization challenges in single-cell and spatial genomic data. With the increased number of cells and samples in single-cell genomic data, visualizing the low-dimensional representations with scatterplots is often biased because cells are masked by other cells or because the total number of cells is unbalanced across samples. These biases are often overlooked but may lead to misinterpretation of data. We developed a software tool, SCUBI (b), to overcome these biases. In addition, current visualization methods often assign visually similar colors to spatially neighboring cell clusters, making it hard to distinguish among tens of clusters. We developed Palo (c), which optimizes the color palette assignment for single-cell and spatial data in a spatially-aware manner.

  1. Hou, W., Ji, Z., Ji, H.* and Hicks, S.C.* (2020). A Systematic Evaluation of Single-cell RNA-sequencing Imputation Methods. Genome Biology 21, 218. doi: 10.1186/s13059-020-02132-x. PMID: 32854757. PMCID: PMC7450705.
  2. Hou, W., Ji, Z.* (2022). Single-cell Unbiased Visualization with SCUBI. Cell Reports Methods, 100135. Software package: scubi. PMID: 35224531. PMCID: PMC8871596.
  3. Hou, W., Ji, Z.* (2022). Palo: spatially-aware color palette optimization for single-cell and spatial data. Bioinformatics, June 01, 2022. Software package: Palo. PMID: 35642896. PMCID: PMC9272793.
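
The idea of spatially-aware palette assignment described above can be sketched as a small optimization problem. The example below is not the Palo implementation: the function name `assign_palette`, the use of cluster centroids, Euclidean distance in RGB space as a stand-in for perceptual color difference, the `1 / (1 + distance)` weighting, and the exhaustive search over permutations are all simplifying assumptions chosen to keep the sketch short.

```python
from itertools import permutations
from math import dist


def assign_palette(centroids, colors):
    """Assign one color per cluster so nearby clusters get dissimilar colors.

    centroids: dict mapping cluster name -> (x, y) position in the plot.
    colors: list of RGB tuples, one per cluster.
    Exhaustive search over permutations, so only usable for a handful of
    clusters; this is an illustration, not the Palo algorithm.
    """
    names = sorted(centroids)
    best_score, best_assignment = float("-inf"), None
    for perm in permutations(colors):
        score = 0.0
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                spatial = dist(centroids[names[i]], centroids[names[j]])
                color_gap = dist(perm[i], perm[j])
                # Reward large color differences most for close clusters.
                score += color_gap / (1.0 + spatial)
        if score > best_score:
            best_score, best_assignment = score, dict(zip(names, perm))
    return best_assignment


# Clusters A and B are neighbors; C is far away from both.
centroids = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (10.0, 0.0)}
palette = [(255, 0, 0), (200, 0, 0), (0, 0, 255)]  # two reds and one blue
print(assign_palette(centroids, palette))
```

In this toy input the two similar reds are kept apart from each other on the neighboring clusters A and B, with the blue going to one of them. A practical tool would need a perceptual color metric and a heuristic search, since the number of permutations grows factorially with the number of clusters.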

3. Boolean networks and glycosylation networks

A Boolean network (BN) is a mathematical and graphical structure that has been widely used for modeling gene regulatory networks (GRNs). The nodes represent genes, directed edges represent gene-gene regulation, and the network state at time t represents the cell state at that time point. Because the total number of states is finite, a BN eventually settles into one of possibly several attractors, each a set of states that forms a loop. An attractor can be interpreted as a cell type. We mathematically proved that finding the minimum set of driver nodes for BNs is NP-hard and developed an integer linear programming (ILP)-based algorithm to optimize the control strategy (a). We discovered an intriguing relationship between the BN control problem and the coupon collector's problem, a classic problem in combinatorics (b). We presented rigorous algorithms and proofs showing how to drive a BN from one cell type to another and at what minimum cost. We mathematically showed that a small number of driver nodes is enough to control a BN to a target state if target states are restricted to attractors, under a reasonable assumption. This work suggests that if the number of attractors (which may correspond to the number of cell types) is 200, only around nine driver nodes are enough if both an initial state and a target attractor are specified. We also actively developed algorithms for finding the optimal minimal set of driver nodes (b) and for the inference of BNs (c). In addition, we built a systematic framework for constructing glycosylation networks (d). These results have broad applications in studying cell differentiation, cell reprogramming, and other developmental progressions.

  1. Hou, W., Tamura, T., Ching, W.K. and Akutsu, T., 2016. Finding and analyzing the minimum set of driver nodes in control of Boolean networks. Advances in Complex Systems, 19(03), p.1650006. doi: 10.1142/S0219525916500065. [PMID/PMCID not available]
  2. Hou, W., Ruan, P., Ching, W.K. and Akutsu, T., 2019. On the number of driver nodes for controlling a Boolean network when the targets are restricted to attractors. Journal of Theoretical Biology, 463, pp.1-11. PMID: 30543810. [PMCID not available]
  3. Cheng, X., Qiu, Y., Hou, W. and Ching, W.K., 2017. Integer programming-based method for observability of singleton attractors in Boolean networks. IET Systems Biology, 11(1), pp.30-35. PMID: 28303791. [PMCID not available]
  4. Hou, W., Qiu, Y., Hashimoto, N., Ching, W.K. and Aoki-Kinoshita, K.F., 2016. A systematic framework to derive N-glycan biosynthesis process and the automated construction of glycosylation networks. BMC Bioinformatics, 17(7), pp.465-472. PMID: 27454116. PMCID: PMC4965717.
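
The notion of an attractor in a Boolean network, as described in this section, can be made concrete with a short sketch. It assumes synchronous updates (every node is updated at once from the current state), and the function name `find_attractors` and the three-gene example rules are hypothetical, chosen only for illustration. The brute-force enumeration visits all 2^n states, so it is only practical for very small networks and is not one of the algorithms from the papers above.

```python
from itertools import product


def find_attractors(update_fns):
    """Enumerate attractors of a synchronous Boolean network by brute force.

    update_fns: list of functions; update_fns[i](state) gives the next
        value of node i from the current state tuple of 0/1 values.
    Returns a set of attractors, each a tuple of states in cycle order,
    rotated so that the smallest state comes first.
    """
    n = len(update_fns)

    def step(state):
        return tuple(int(f(state)) for f in update_fns)

    attractors = set()
    for start in product((0, 1), repeat=n):
        visit_order = {}  # state -> index at which it was first visited
        state = start
        while state not in visit_order:
            visit_order[state] = len(visit_order)
            state = step(state)
        # 'state' is the first repeated state; the loop starts there.
        loop_start = visit_order[state]
        cycle = [s for s, i in visit_order.items() if i >= loop_start]
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


# Hypothetical three-gene network: genes 0 and 1 copy each other,
# and gene 2 is switched on only when both are on.
rules = [
    lambda s: s[1],
    lambda s: s[0],
    lambda s: s[0] and s[1],
]
for attractor in sorted(find_attractors(rules)):
    print(attractor)
```

This toy network has two fixed points, (0, 0, 0) and (1, 1, 1), and one loop of length two between (0, 1, 0) and (1, 0, 0). Under the interpretation used in this section, each of the three attractors would correspond to a candidate cell type, and a control strategy would seek a small set of driver nodes that moves the network from one attractor to another.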