A database of DNA Methylation and gene expression in Human Cancer

MethHC is a database for Human pan-cancer gene expression, methylation and microRNA expression.



DNA methylation is the addition of a methyl group to the DNA base cytosine (C) by a DNA methyltransferase (DNMT).

MethHC measures methylation level with the beta value, which ranges from 0 (least methylated) to 1 (most methylated).
The methylation level is given by beta = M / (U + M + 100), where M is the methylated probe intensity and U is the unmethylated probe intensity.
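The beta value formula above can be sketched in a few lines of Python. This is only an illustration: the function name `beta_value` and the `offset` parameter name are assumptions, not part of MethHC.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Return beta = M / (U + M + offset) for one probe.

    The default offset of 100 is the constant from the formula in the text;
    it keeps the ratio stable when both intensities are close to zero.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("probe intensities must be non-negative")
    return methylated / (unmethylated + methylated + offset)


# Illustrative intensities (made up for this example, not real array data).
print(beta_value(methylated=900.0, unmethylated=0.0))
print(beta_value(methylated=0.0, unmethylated=900.0))
```

Because of the offset, a probe never reaches exactly 1 even when no unmethylated signal is detected.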

DNA methylation data were obtained from Illumina Infinium HumanMethylation450 BeadChip data in the TCGA Data Portal.

Gene expression

Gene expression values were obtained from RNA-Seq RPKM (Reads Per Kilobase per Million mapped reads) values in the TCGA Data Portal.

microRNA expression

microRNA expression values were obtained from miRNA sequencing RPM (Reads Per Million mapped reads) values in the TCGA Data Portal.
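For readers unfamiliar with the two units, the standard definitions of RPKM and RPM can be written as below. MethHC takes these values precomputed from TCGA, so this sketch only shows what the units mean; the function and parameter names are assumptions made for the example.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def rpm(read_count: int, total_mapped_reads: int) -> float:
    """Reads Per Million mapped reads (no length normalization)."""
    if total_mapped_reads <= 0:
        raise ValueError("total mapped reads must be positive")
    return read_count * 1e6 / total_mapped_reads


# Made-up counts, for illustration only.
print(rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000))
print(rpm(read_count=250, total_mapped_reads=5_000_000))
```

RPM omits the length term, which is the usual choice for mature microRNAs because they are all of similar length.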

Gene centric regions

TSS (transcription start site) information was taken from the UCSC RefSeq gene annotation for genes and from the miRStart database for microRNAs.

The promoter region was defined as the interval from 1.5 kb upstream to 0.5 kb downstream of the RefSeq TSS.
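The promoter definition depends on strand, because "upstream" means lower coordinates on the plus strand and higher coordinates on the minus strand. A minimal sketch follows, assuming integer genomic coordinates and a hypothetical helper named `promoter_region`; the exact coordinate convention used by MethHC is not stated in the text.

```python
def promoter_region(tss: int, strand: str,
                    upstream: int = 1500, downstream: int = 500) -> tuple[int, int]:
    """Return (start, end) of the promoter window around a TSS.

    Defaults follow the text: 1.5 kb upstream to 0.5 kb downstream.
    The start is clamped at 0 so the window never leaves the chromosome.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end


print(promoter_region(10_000, "+"))  # (8500, 10500)
print(promoter_region(10_000, "-"))  # (9500, 11500)
```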

Enhancer regions were downloaded from the UCSC Table Browser.

Using the NCBI Reference Sequence (RefSeq) gene annotation, we annotated probes into 8 gene regions (promoter, enhancer, TSS1500, TSS200, 5′ UTR, 1st exon, gene body and 3′ UTR).

The hg19 RefSeq information was downloaded from UCSC.

CpG Island region

CpG island regions were defined according to the UCSC criteria: GC content > 50%, observed/expected (Obs/Exp) CpG ratio > 0.60, and length > 200 bp.
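The three thresholds can be checked for a single candidate sequence as sketched below. This only tests one sequence against the stated cut-offs; it is not the scanning procedure UCSC uses to build its CpG island track. The Obs/Exp ratio uses the common formula (number of CpG × length) / (number of C × number of G), and the function names are assumptions made for the example.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides on this strand
    gc_content = (c_count + g_count) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = cpg_count * n / (c_count * g_count) if c_count and g_count else 0.0
    return gc_content, obs_exp


def meets_island_criteria(seq: str) -> bool:
    """Apply the thresholds from the text: GC > 50%, Obs/Exp > 0.60, length > 200 bp."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) > 200 and gc_content > 0.5 and obs_exp > 0.60


# Synthetic sequences, for illustration only.
print(meets_island_criteria("CG" * 150))  # CpG-rich, 300 bp
print(meets_island_criteria("AT" * 150))  # no C or G at all
```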

Shores were defined as the 2 kb regions immediately upstream and downstream of a CpG island, and shelves as the 2 kb regions flanking the outer edge of each shore. Each CpG region is therefore divided into N shelf, N shore, CpG island, S shore and S shelf.
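The shore and shelf definitions translate into simple interval arithmetic around one island. The sketch below assumes a half-open island interval [island_start, island_end), takes "N" as the side with lower coordinates and "S" as the side with higher coordinates, and uses a hypothetical function name; positions beyond the shelves return None.

```python
from typing import Optional


def classify_cpg_context(pos: int, island_start: int, island_end: int,
                         flank: int = 2000) -> Optional[str]:
    """Label a position relative to a single CpG island.

    Shores are the `flank` bases next to the island; shelves are the
    next `flank` bases beyond each shore.
    """
    if island_start <= pos < island_end:
        return "CpG Island"
    if island_start - flank <= pos < island_start:
        return "N shore"
    if island_start - 2 * flank <= pos < island_start - flank:
        return "N shelf"
    if island_end <= pos < island_end + flank:
        return "S shore"
    if island_end + flank <= pos < island_end + 2 * flank:
        return "S shelf"
    return None  # outside the island, shores and shelves


# Hypothetical island at 10,000-11,000.
for position in (7000, 9000, 10500, 12000, 14000, 20000):
    print(position, classify_cpg_context(position, 10_000, 11_000))
```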

Statistical analysis

In the boxplot, a t-test is used to test the difference between the two groups, tumor and normal samples. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming there is no difference between the groups.
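As an illustration of the two-group comparison, the sketch below computes a t statistic and its degrees of freedom using only the standard library. The text does not say which t-test variant MethHC uses; Welch's unequal-variance form is assumed here, and the function name is hypothetical. Turning the statistic into a p-value requires the Student t distribution, which a statistics package such as SciPy provides.

```python
import math
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test.

    Assumption: unequal variances (Welch); the source only says "t-test".
    Each group needs at least two values, and the groups must not both be constant.
    """
    # Squared standard error of each group mean.
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, dof


# Made-up beta values for tumor and normal samples.
tumor = [0.80, 0.90, 0.70, 0.85]
normal = [0.20, 0.30, 0.25, 0.15]
print(welch_t(tumor, normal))
```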

In the scatter plot, the association between the methylation level and the expression level of the corresponding gene is measured by the Pearson correlation coefficient.
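The Pearson correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. A standard-library sketch follows, with a hypothetical function name and made-up values.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / (spread_x * spread_y)


# Made-up example: expression falls as promoter methylation rises.
methylation = [0.1, 0.3, 0.5, 0.7, 0.9]
expression = [9.0, 7.5, 5.0, 2.5, 1.0]
print(pearson_r(methylation, expression))
```

A value near -1 indicates that higher methylation goes with lower expression across samples; a value near 0 indicates no linear association.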

Data generation flow


User Guide

To facilitate access to data and further analyses for the identification of differentially methylated genes, MethHC provides a variety of interfaces and graphical visualizations.


Briefly describes the structure and features of MethHC.

Browse Top 250

This analysis identifies the 250 most hypermethylated, most hypomethylated and most differentially methylated genes.

Methylation Cluster

The methylation cluster view uses a hierarchical clustering graph to show genes and samples with similar DNA methylation patterns. Genes with similar methylation patterns are grouped together and connected by a series of branches. The data were clustered using Euclidean distance and complete linkage.
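To make "Euclidean distance and complete linkage" concrete, here is a deliberately naive agglomerative clustering sketch in plain Python. It is not MethHC's implementation: the function names and the toy beta values are assumptions, and a real analysis would normally use a dedicated clustering library.

```python
import math
from itertools import combinations


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two methylation profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(profiles: dict[str, list[float]]) -> list[tuple]:
    """Cluster profiles bottom-up and return the merge history.

    Each entry is (cluster_a, cluster_b, distance), where clusters are
    tuples of names. Under complete linkage, the distance between two
    clusters is the largest pairwise distance between their members.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            dist = max(euclidean(profiles[i], profiles[j]) for i in a for j in b)
            if best is None or dist < best[0]:
                best = (dist, a, b)
        dist, a, b = best
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, dist))
    return merges


# Toy beta values for three hypothetical genes across two samples.
toy = {"geneA": [0.1, 0.1], "geneB": [0.2, 0.1], "geneC": [0.9, 0.9]}
for step in complete_linkage(toy):
    print(step)
```

The merge history is what a dendrogram draws: each merge becomes a branch whose height is the merge distance.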


Users can submit a group of genes/miRNAs of interest, or the name of a KEGG pathway, to identify differentially methylated genes in a set of tumors. The system shows the differential methylation status of each transcript in the selected tumors as a boxplot. An asterisk marks significant differential methylation in the selected tumors (*: p < 0.05, **: p < 0.005). A detailed view of each gene shows the correlation between DNA methylation and gene expression.
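The asterisk convention can be expressed as a small lookup. The thresholds come from the text; the function name `significance_label` is an assumption made for the example.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the asterisk notation used in the boxplots."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value < 0.005:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


for p in (0.001, 0.01, 0.2):
    print(p, repr(significance_label(p)))
```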


Provides a detailed list of cancer types and sample numbers, and a comparison of MethHC with other methylation databases.


This page