Hierarchical non-parametric Bayesian clustering of digital expression data

DGEclust is a program for clustering and differential expression analysis of expression data generated by next-generation sequencing assays, such as RNA-seq, CAGE and others. It takes a table of count data as input and estimates the number and parameters of the clusters supported by the data. The estimated cluster configurations can then be post-processed to identify differentially expressed genes and to generate gene- and sample-wise dendrograms and heatmaps.

Internally, DGEclust uses a Hierarchical Dirichlet Process Mixture Model (HDPMM) to model over-dispersed count data, combined with a blocked Gibbs sampler for efficient Bayesian learning. More technical details on the statistical methodology used in this software are given in the following papers:

  1. Vavoulis DV, Gough J (2013). Non-Parametric Bayesian Modelling of Digital Gene Expression Data. J Comput Sci Syst Biol 7:001-009. doi: 10.4172/jcsb.1000131 [PDF]
  2. Vavoulis DV, Francescatto M, Heutink P, Gough J (2014). DGEclust: differential expression analysis of clustered count data. (under review) [PDF]
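
As a rough illustration of the two ingredients named above, the sketch below draws mixture weights from a truncated stick-breaking prior (the finite approximation to a Dirichlet process that blocked Gibbs samplers typically rely on) and evaluates a negative binomial log-probability, a common choice for over-dispersed counts. It is not taken from DGEclust: the function names, the truncation level, the concentration parameter `alpha` and the negative binomial parameterisation are all assumptions made for illustration only.

```python
import math
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking prior.

    Hypothetical helper, not part of DGEclust.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # break off a Beta(1, alpha) share
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # last component takes the rest, so weights sum to 1
    return weights


def neg_binomial_log_pmf(count, mean, dispersion):
    """Log-probability of a count under a negative binomial distribution.

    Assumed parameterisation: variance = mean + dispersion * mean**2,
    which exceeds the Poisson variance whenever dispersion > 0.
    """
    r = 1.0 / dispersion
    p = r / (r + mean)
    return (math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
            + r * math.log(p) + count * math.log(1.0 - p))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)
log_prob = neg_binomial_log_pmf(count=12, mean=10.0, dispersion=0.5)
```

In a sampler of this kind, each cluster would carry its own mean and dispersion, and the weights would be resampled from the current cluster occupancies at every iteration. The hierarchical layer of the model, which lets samples or groups of samples share clusters, is omitted here.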

If you find this software useful, please cite the second paper above. For more information or to report bugs, please email Dimitris Vavoulis or Julian Gough.