Step 1: Clustering analysis
We carry out two-way clustering of the chips (samples) and the differentially expressed genes, and at the same time predict the functions of new genes. The goal is to classify the different samples and to find differentially expressed genes with similar expression patterns. Clustering methods include hierarchical clustering, K-means clustering and self-organizing maps (SOM). An example is shown in the following chart.
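The two-way idea can be sketched with K-means, one of the methods listed above: genes and samples are clustered separately on the same gene-standardized matrix. This is a minimal illustration assuming NumPy and scikit-learn; `two_way_kmeans`, the toy matrix and the cluster counts are hypothetical, not the service's actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans


def two_way_kmeans(expr, n_gene_clusters, n_sample_clusters, seed=0):
    """Cluster the genes (rows) and the samples (columns) of an expression matrix.

    Hypothetical helper for illustration. Each gene is standardized first so that
    genes group by expression pattern rather than by absolute level.
    """
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    gene_labels = KMeans(n_clusters=n_gene_clusters, n_init=10,
                         random_state=seed).fit_predict(z)
    sample_labels = KMeans(n_clusters=n_sample_clusters, n_init=10,
                           random_state=seed).fit_predict(z.T)
    return gene_labels, sample_labels


# Toy matrix: genes 0-2 are high in samples 0-2, genes 3-5 are high in samples 3-5
rng = np.random.default_rng(1)
up = np.array([5, 5, 5, 1, 1, 1], dtype=float)
expr = np.vstack([up, up, up, up[::-1], up[::-1], up[::-1]]) + rng.normal(0, 0.1, (6, 6))
gene_labels, sample_labels = two_way_kmeans(expr, n_gene_clusters=2, n_sample_clusters=2)
```

Reordering rows and columns by these labels gives the familiar block-structured heat map; hierarchical clustering would additionally provide the dendrograms.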
Differentially expressed genes are classified by biological function based on annotations from the GO (Gene Ontology) database, and significantly enriched categories are screened with statistical tests (P-value). The results are then output according to the client's needs.
Statistics of the differentially expressed genes in the GO database. Tree chart of the distribution of differentially expressed genes across GO nodes.
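A common choice for testing a single GO term is a one-sided hypergeometric test: with N genes on the chip, K of them annotated with the term, and n differentially expressed genes of which k carry the term, the P-value is the probability of seeing k or more by chance. The source does not name its test, so this is an assumed, minimal sketch using only the Python standard library; `go_term_p_value` and the example numbers are hypothetical.

```python
from math import comb


def go_term_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for GO-term enrichment.

    N: genes on the chip, K: chip genes annotated with the GO term,
    n: differentially expressed genes, k: differentially expressed genes with the term.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical numbers: 8 of 50 DE genes carry a term that covers 100 of 10,000 chip genes
p = go_term_p_value(k=8, n=50, K=100, N=10_000)
```

Because thousands of GO terms are tested at once, these P-values are normally corrected for multiple testing before terms are reported.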
Step 3: Pathway analysis
This step constructs signaling pathways and networks of biological functions; comparing and integrating the related differentially expressed genes identifies the relationships between genes and allows dynamic simulation of the pathways. Metabolic pathways with significant differences are screened with statistical tests (P-value), with a view to building a simulated pathway network of the disease state, so that genetic analysis can reveal the relationships between the target genes and biological pathways, diseases or biochemical pathways.
Analysis of differentially expressed genes in a metabolic pathway: red marks up-regulated genes, green marks down-regulated genes.
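For pathways the same question can be posed as Fisher's exact test on a 2x2 table (in the pathway or not, versus differentially expressed or not), followed by a Benjamini-Hochberg correction because many pathways are tested at once. The correction step is our addition rather than something the text specifies. A minimal sketch assuming SciPy; the function names, pathway definitions and gene IDs are hypothetical.

```python
from scipy.stats import fisher_exact


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def pathway_enrichment(de_genes, pathways, universe):
    """de_genes, universe: sets of gene IDs; pathways: dict name -> set of gene IDs.
    Returns name -> (p-value, q-value)."""
    raw = {}
    for name, members in pathways.items():
        members = members & universe
        a = len(de_genes & members)             # DE and in pathway
        b = len(de_genes - members)             # DE, not in pathway
        c = len(members - de_genes)             # not DE, in pathway
        d = len(universe - de_genes - members)  # neither
        _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
        raw[name] = p
    names = list(raw)
    qs = benjamini_hochberg([raw[n] for n in names])
    return {n: (raw[n], q) for n, q in zip(names, qs)}
```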
Using a relevant transcription factor (TF) database, the pwmatch algorithm analyzes how the binding sites of each transcription factor are distributed among the differentially expressed genes, and statistical methods such as the chi-square test identify transcription factors that differ. The aim is to find regulatory elements: transcription factors with statistically significant differences.
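The chi-square comparison can be sketched as a 2x2 table: how many differentially expressed genes versus background genes contain a predicted site for a given transcription factor. The counts would come from the binding-site scan; `tf_site_chi2` and the example counts are hypothetical, and SciPy is assumed.

```python
from scipy.stats import chi2_contingency


def tf_site_chi2(de_with_site, de_total, bg_with_site, bg_total):
    """Chi-square test of whether a TF's binding sites are more (or less) frequent
    among differentially expressed genes than among background genes."""
    table = [[de_with_site, de_total - de_with_site],
             [bg_with_site, bg_total - bg_with_site]]
    chi2, p, dof, expected = chi2_contingency(table)  # Yates-corrected for a 2x2 table
    return chi2, p


# Hypothetical counts: 60 of 100 DE genes vs 200 of 1,000 background genes have a site
chi2_stat, p_value = tf_site_chi2(60, 100, 200, 1000)
```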
Step 5: Data-driven network analysis
This step constructs co-expression gene regulatory networks of the differentially expressed genes, using Bayesian machine-learning methods to build a dynamic network among them. Because the network is built from the data, it can reveal previously unknown regulatory relationships.
The figure on the left shows an example of a gene co-expression network. The network structure is computed with the GeneTS R package and drawn with the MEDUSA software. Each node is a gene; an edge connecting two genes indicates that they may regulate each other. The genes shown are differentially expressed genes.
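As we understand it, GeneTS estimates the network as a graphical Gaussian model, in which an edge corresponds to a non-zero partial correlation between two genes. The sketch below reproduces that idea with plain NumPy: it inverts the correlation matrix (with a small ridge term, an assumption standing in for the package's shrinkage estimator) and keeps partial correlations above a hypothetical threshold of 0.3. It is not the GeneTS code.

```python
import numpy as np


def partial_correlation_network(expr, threshold=0.3, ridge=1e-3):
    """expr: samples x genes. Returns the partial-correlation matrix and an edge list
    of (gene_i, gene_j, partial_correlation) with |value| >= threshold."""
    corr = np.corrcoef(expr, rowvar=False)
    # Small ridge keeps the inverse stable; GeneTS itself uses a shrinkage estimate
    precision = np.linalg.inv(corr + ridge * np.eye(corr.shape[0]))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    n = pcor.shape[0]
    edges = [(i, j, pcor[i, j]) for i in range(n) for j in range(i + 1, n)
             if abs(pcor[i, j]) >= threshold]
    return pcor, edges


# Toy chain a -> b -> c plus an unrelated gene d: a and c correlate only through b
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = a + rng.normal(size=2000)
c = b + rng.normal(size=2000)
d = rng.normal(size=2000)
pcor, edges = partial_correlation_network(np.column_stack([a, b, c, d]))
```

Plain correlation would also link a and c; the partial correlation removes that indirect edge, which is why this kind of model is used for regulatory networks.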
By integrating PubMed text mining, homology prediction, gene neighborhood, protein-protein interaction, gene fusion and other data, a regulatory network of all the differentially expressed genes is drawn in a single plot. This is a knowledge-driven network. It combines the results of previous studies with correlation analysis of the experimental data in order to find new clues about gene co-expression and discover new patterns.
Step 7: Classification of diseases
This step is aimed mainly at complex diseases. Chip data are used to discriminate disease subtypes that traditional diagnostic methods cannot distinguish but that matter greatly for prognosis.
The figure on the left shows the sample clustering results; the samples marked in red are cancer patients. The results clearly divide this type of cancer into two groups, indicating that the tumor may have two subtypes.
Predictive models are built from the chip results with machine-learning methods such as Bayesian networks, PAM and SVM: part of the data is used to train the prediction model, and the rest serves as a test set (independent samples) to verify the model's accuracy. The purpose is to screen a set of target genes from the experimental data and thus build models for early diagnosis, disease prediction and prognosis.
Example: expression values of the characteristic genes used to construct the disease prediction model. The expression level of each gene (red and blue) differs markedly between the two classes. Testing the model gives its accuracy.
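The train/test procedure can be sketched with a linear SVM standing in for the listed methods. This assumes scikit-learn; `train_and_validate`, the 70/30 split and the simulated data are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_and_validate(X, y, test_size=0.3, seed=0):
    """Fit a linear SVM on one part of the samples and report accuracy on the held-out part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    model = SVC(kernel="linear").fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


# Simulated data: 100 samples x 20 genes, the first 5 genes are shifted in class 1
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 20))
X[y == 1, :5] += 3.0
model, acc = train_and_validate(X, y)
```

Stratifying the split keeps the class proportions equal in the training and test sets, which matters when one disease class is small.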
After pre-processing and normalizing the raw data, differentially expressed microRNAs are screened and two-way clustering analysis of the microRNAs is carried out.
Some microRNAs are distributed in clusters in the genome, and the microRNAs in a cluster are transcribed together. Because their post-transcriptional maturation is regulated differently, the expression levels of the mature microRNAs in a cluster differ slightly. Clustering analysis of microRNAs can be used to study these differences in microRNA expression.
MicroRNAs bind the 3'UTR of their target genes and down-regulate them (mainly at the protein level). We have developed our own prediction algorithm based on the TargetScan platform. It provides prediction services that online databases do not offer, and it can predict microRNA target sites outside the 3'UTR (e.g. in the 5'UTR).
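The core of TargetScan-style prediction is a seed match: the mRNA site is complementary to microRNA nucleotides 2-8 (a 7mer-m8 site), and an A opposite position 1 makes it an 8mer. The minimal sketch below finds only those two site types; it is not TargetScan or the in-house algorithm and ignores 7mer-A1 sites, conservation and context scores. The function names and the UTR string are hypothetical; the let-7a sequence is used as the example microRNA.

```python
def reverse_complement(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def find_seed_sites(mirna, utr):
    """Return (position, site_type) for 7mer-m8 and 8mer seed sites of a miRNA in a UTR.

    The same scan works for any region (3'UTR, 5'UTR, CDS); DNA input is converted to RNA.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement(mirna[1:8])  # pairs with miRNA nucleotides 2-8
    sites = []
    start = utr.find(site)
    while start != -1:
        end = start + len(site)
        # An A just 3' of the site (opposite miRNA position 1) makes it an 8mer
        site_type = "8mer" if end < len(utr) and utr[end] == "A" else "7mer-m8"
        sites.append((start, site_type))
        start = utr.find(site, start + 1)
    return sites


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
sites = find_seed_sites(let7a, "AAACUACCUCAGGGCUACCUCGUU")  # toy UTR
```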
This project studies the self-regulation mechanism of microRNAs. The method extracts the promoter region of each microRNA and finds transcription factor binding sites with PWM and other algorithms.
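A position weight matrix (PWM) scan scores every window of the promoter with log-odds values and reports windows close to the best possible score. This is a generic sketch, not pwmatch: the motif counts (an idealized GC box, GGGCGG), the pseudocount, the uniform background and the 0.8 relative-score cutoff are all illustrative assumptions, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=0.5, background=0.25):
    """counts: one dict per motif position mapping base -> count. Returns log2-odds columns."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, min_fraction=0.8):
    """Report (position, window, relative score) for windows scoring at least
    min_fraction of the way from the worst to the best possible score."""
    w = len(pwm)
    best = sum(max(col.values()) for col in pwm)
    worst = sum(min(col.values()) for col in pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        rel = (score - worst) / (best - worst)
        if rel >= min_fraction:
            hits.append((i, window, round(rel, 3)))
    return hits


# Hypothetical motif counts for a GC box (GGGCGG), 10 observations per position
gc_box_pwm = pwm_from_counts([{"G": 10}, {"G": 10}, {"G": 10}, {"C": 10}, {"G": 10}, {"G": 10}])
hits = scan("ATATGGGCGGATAT", gc_box_pwm)
```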
Step 5: Regulatory network analysis
Some transcription factors can bind the promoter region of a microRNA and regulate it. At the same time, the microRNA may bind the 3'UTR of that transcription factor and down-regulate it. This pair in itself forms a network. The approach combines target gene prediction with promoter analysis.
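Once promoter scanning (TF binds miRNA promoter) and target prediction (miRNA targets TF 3'UTR) have produced two edge lists, the mutual-regulation loops described above can be found by intersecting them. A minimal sketch; the function name and the TF/miRNA identifiers are hypothetical placeholders.

```python
def find_feedback_loops(tf_to_mirna, mirna_to_tf):
    """tf_to_mirna: TF -> set of miRNAs whose promoter it binds (promoter analysis).
    mirna_to_tf: miRNA -> set of TFs whose 3'UTR it targets (target prediction).
    Returns sorted (TF, miRNA) pairs that regulate each other."""
    loops = []
    for tf, mirnas in tf_to_mirna.items():
        for mir in mirnas:
            if tf in mirna_to_tf.get(mir, set()):
                loops.append((tf, mir))
    return sorted(loops)


loops = find_feedback_loops(
    {"TF_A": {"miR-x1", "miR-x2"}, "TF_B": {"miR-x3"}},  # hypothetical promoter hits
    {"miR-x1": {"TF_A"}, "miR-x3": {"TF_C"}},             # hypothetical target predictions
)
```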
The data are normalized to eliminate the effects of experimental error, individual differences between biological samples, and noise.
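The text does not say which normalization is used; quantile normalization is one common choice for microarrays and is sketched here as an assumption. It gives every sample the same intensity distribution by replacing each value with the mean of the values at the same rank across samples. Ties are ranked arbitrarily in this simplified version.

```python
import numpy as np


def quantile_normalize(matrix):
    """matrix: probes x samples. Returns a matrix in which every column has the same distribution."""
    ranks = np.argsort(np.argsort(matrix, axis=0), axis=0)  # rank of each value within its column
    mean_of_sorted = np.sort(matrix, axis=0).mean(axis=1)   # reference distribution
    return mean_of_sorted[ranks]


normalized = quantile_normalize(np.array([[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]]))
```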
If the chip probes were annotated not from the genome sequence but from direct cloning and sequencing experiments, a BLAST search against the genome sometimes finds no match. Such probes need to be re-annotated by locating their sequences in the genome.
We re-annotated the original gene list. The original annotation generally assigns a CpG island only to the gene in which it lies, but the promoter actually spans a much wider range, so we added annotations for the genes upstream and downstream of each CpG island.
Genes with at least a two-fold difference in expression are selected.
The CpG island regions are scanned for transcription factor binding sites. Normally, DNA methylation reduces the binding affinity of transcription factors, which lowers gene transcription. Transcription factor analysis can therefore explain the mechanism by which methylation causes transcriptional repression.
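The extended annotation can be sketched as a lookup of the nearest gene start on each side of a CpG island, plus any genes whose start falls inside it. This is a simplified illustration: `flanking_genes` and the coordinates are hypothetical, only transcription start sites on one chromosome are used, and "upstream/downstream" here means lower/higher chromosome coordinate, ignoring strand.

```python
import bisect


def flanking_genes(island_start, island_end, genes):
    """genes: list of (tss_position, gene_name) on one chromosome.
    Returns (nearest gene before the island, genes inside it, nearest gene after it)."""
    genes = sorted(genes)
    positions = [pos for pos, _ in genes]
    left = bisect.bisect_left(positions, island_start) - 1
    right = bisect.bisect_right(positions, island_end)
    before = genes[left][1] if left >= 0 else None
    after = genes[right][1] if right < len(genes) else None
    inside = [name for pos, name in genes if island_start <= pos <= island_end]
    return before, inside, after


genes = [(1000, "GENE1"), (5000, "GENE2"), (9000, "GENE3")]  # hypothetical TSS positions
annotation = flanking_genes(4800, 5200, genes)
```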
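A two-fold threshold corresponds to an absolute log2 fold change of at least 1. A minimal sketch assuming NumPy; the pseudocount (added to avoid division by zero) and the example intensities are hypothetical.

```python
import numpy as np


def two_fold_filter(treated, control, pseudocount=1.0):
    """Return indices of genes with |log2 fold change| >= 1, plus all log2 fold changes."""
    log2fc = np.log2((np.asarray(treated, dtype=float) + pseudocount)
                     / (np.asarray(control, dtype=float) + pseudocount))
    return np.flatnonzero(np.abs(log2fc) >= 1.0), log2fc


selected, log2fc = two_fold_filter([199, 100, 49], [99, 100, 99])  # up, unchanged, down
```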
Reference:
Dong Li et al. CpG methylation plays a vital role in determining tissue- and cell-specific expression of the human cell-death-inducing DFF45-like effector A gene through the regulation of Sp1/Sp3 binding. Nucleic Acids Research, 2007, 1–12.
Red marks genes with significant methylation pattern 1; blue marks genes with significant methylation pattern 2.
We can use text-mining (NLP) methods to build a network of associated genes such as the following:
Part of the p53-associated gene network found with NLP techniques.
For example, acetylation of histone H3 at a gene's CpG island is expected to up-regulate its mRNA level. However, when the CpG probes were mapped to the genes within which they lie, the CpG-chip and expression microarray data did not show this trend.
When the CpG probes were instead mapped using downstream-gene annotations, we found that the mRNA levels of the genes downstream of the acetylated CpG islands also went down.
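The simplest form of text mining counts how often two gene symbols appear in the same sentence. Real NLP pipelines add named-entity recognition and relation extraction; this sketch covers only co-occurrence, and the function name, gene symbols and sentences are hypothetical toy inputs.

```python
import itertools
import re
from collections import Counter


def cooccurrence_network(sentences, genes, min_count=2):
    """Count sentence-level co-mentions of gene symbols; keep pairs seen at least min_count times."""
    patterns = {g: re.compile(r"\b" + re.escape(g) + r"\b") for g in genes}
    pair_counts = Counter()
    for sentence in sentences:
        present = sorted(g for g, pat in patterns.items() if pat.search(sentence))
        pair_counts.update(itertools.combinations(present, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


network = cooccurrence_network(
    ["GENE1 activates GENE2.", "GENE2 and GENE1 interact.", "GENE3 was also measured with GENE1."],
    ["GENE1", "GENE2", "GENE3"],
)
```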
Exon-chip data analysis can be divided into three steps: obtaining exon-level expression values, obtaining gene-level expression values, and alternative splicing analysis.
Exon-level expression values are obtained with the Affymetrix Power Tools (APT) software after normalization with the RMA-sketch algorithm. These exon-level expression values are the basis of all exon analysis.
An exon chip can also serve as an ordinary expression-profiling chip, so gene-level expression values can be calculated in addition to exon-level values. Besides supporting traditional expression-profile analysis, the gene-level values are used to calibrate the exon-level expression data across chips.
Each gene on the chip has several exons, and their expression values differ between tissue samples. After the exon data are corrected with the gene-level data, differentially expressed exons can be selected with ANOVA (t-test), SAM (Significance Analysis of Microarrays) or other methods; such an exon indicates an alternative splicing event.
The exon data are compared before and after correction with the gene-level expression values. For each exon, we use a t-test to find exons that show alternative splicing. In the figure, the second exon is alternatively spliced.
Follow-up analysis is no different from that of regular cDNA microarrays, including GO analysis and signaling pathway analysis; these methods are described in the expression analysis section and are not repeated here.
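The correction-then-test step can be sketched as a "splicing index": subtract the gene-level log2 value from each exon-level log2 value, then t-test the result between groups. An exon whose gene-normalized signal differs is a splicing candidate, while a change in the whole gene cancels out. A minimal sketch assuming NumPy and SciPy; the function name and the simulated gene are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def splicing_index_test(exon_log2, gene_log2, groups):
    """exon_log2: exons x samples (log2 exon-level values of one gene);
    gene_log2: log2 gene-level value per sample; groups: boolean array, True = group B.
    Returns per-exon t statistics and p-values on the gene-normalized exon values."""
    normalized = exon_log2 - gene_log2[np.newaxis, :]  # exon signal relative to the whole gene
    t, p = ttest_ind(normalized[:, groups], normalized[:, ~groups], axis=1)
    return t, p


# Simulated gene: up by 2 in group B overall, and exon index 1 is skipped in group B
rng = np.random.default_rng(0)
groups = np.array([False] * 4 + [True] * 4)
gene = np.where(groups, 10.0, 8.0) + rng.normal(0, 0.05, 8)
exon = gene[np.newaxis, :] + rng.normal(0, 0.1, (3, 8))
exon[1, groups] -= 2.0
t, p = splicing_index_test(exon, gene, groups)
```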
The project pre-processes the client's raw data, examines the data distribution, and performs two-way clustering of samples and protein peaks to give an overall picture of the raw data.
Box plot of the raw data distribution; two-way clustering analysis of the raw data.
PCA is a dimension-reduction technique that maps the multidimensional chip data (one dimension per protein peak) onto a low-dimensional space, where similar samples lie close to each other. PCA can therefore be used to find "outlier" samples.
Principal component analysis of samples. The horizontal axis of the diagram is the first principal component.
Samples located at the top of the image may be outliers.
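PCA and a simple outlier rule can be sketched with NumPy's SVD. The rule here (flag samples whose distance from the center of the PCA plot exceeds the mean distance plus three standard deviations) is a hypothetical choice for illustration, as are the function names and the simulated data.

```python
import numpy as np


def pca_scores(data, n_components=2):
    """data: samples x peaks. Returns sample coordinates on the first principal components."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def flag_outliers(scores, z=3.0):
    """Indices of samples farther from the PCA center than mean + z * sd of all distances."""
    dist = np.linalg.norm(scores - scores.mean(axis=0), axis=1)
    return np.flatnonzero(dist > dist.mean() + z * dist.std())


# Simulated data: 30 samples x 50 peaks, sample 7 shifted upward in every peak
rng = np.random.default_rng(0)
data = rng.normal(size=(30, 50))
data[7] += 5.0
outliers = flag_outliers(pca_scores(data))
```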
Peaks are screened between the experimental groups of samples, and clustering analysis determines the relationships between the differential peaks (proteins) and the samples. In a cluster diagram, peaks or samples that are closer together are generally considered more closely related.
Heat map of differential peaks. Different colors denote the different sample groups, and the vertical axis is the m/z value. Peak intensity is shown in green-black-red, in order of increasing intensity.
Group-diagnosis models are built with decision trees, neural networks and SVMs. The goal is to screen a set of target peaks from the experimental data and use them to build models for early diagnosis, disease prediction and prognosis.
The prediction model is built with the decision tree method: the 20 most accurate decision trees out of 1,000 are selected as the modeling standard.
A prediction model is built with an artificial neural network (ANN). The figure on the left shows the peak-screening principle used in model construction: with about 10 peaks the model already achieves good accuracy, so 10 key peaks were chosen to build the ANN model in this example.
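The text does not say how the 1,000 trees are generated or scored. One plausible reading, sketched here as an assumption, is to grow each tree on a bootstrap sample, score it on the samples left out of that bootstrap, keep the 20 most accurate, and classify new samples by majority vote. Assumes scikit-learn; the tree depth, function names and data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def select_best_trees(X, y, n_trees=1000, n_keep=20, seed=0):
    """Grow trees on bootstrap samples, score each on its out-of-bag samples,
    and return the n_keep most accurate trees."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scored = []
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), boot)
        if len(oob) == 0:
            continue  # no held-out samples to score this tree on
        tree = DecisionTreeClassifier(max_depth=3, random_state=int(rng.integers(1 << 31)))
        tree.fit(X[boot], y[boot])
        scored.append((tree.score(X[oob], y[oob]), tree))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tree for _, tree in scored[:n_keep]]


def vote(trees, X):
    """Majority vote of the selected trees (binary 0/1 labels)."""
    votes = np.mean([t.predict(X) for t in trees], axis=0)
    return (votes >= 0.5).astype(int)
```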
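The "accuracy versus number of peaks" curve can be sketched by keeping the top-k peaks by ANOVA F-score and cross-validating a small neural network for each k. Putting the selection inside the pipeline means it is refit in every fold, so held-out samples never influence which peaks are chosen. Assumes scikit-learn; the network size, the values of k and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def accuracy_by_peak_count(X, y, peak_counts=(2, 5, 10, 20), seed=0):
    """Mean 5-fold cross-validated ANN accuracy using only the top-k peaks, for each k."""
    results = {}
    for k in peak_counts:
        model = make_pipeline(
            SelectKBest(f_classif, k=k),  # peak selection is refit inside each CV fold
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=seed),
        )
        results[k] = cross_val_score(model, X, y, cv=5).mean()
    return results


# Simulated data: 100 samples x 40 peaks, the first 10 peaks differ between groups
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 40))
X[y == 1, :10] += 1.5
```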
For the differential peaks obtained from the above analysis, researchers often do not know the corresponding proteins. To address this, we designed a set of algorithms and software (named peak2gene, patent pending) based on the relevant literature and experimental principles. Using information about the differential peaks provided by the researcher, it identifies and screens the proteins that may correspond to each peak, so that the researcher can carry out in-depth research.
Flow chart of the peak2gene software. The core peak2gene algorithm is supported by SCI-indexed literature, is highly reliable, and has been applied in many studies.
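peak2gene itself is not described in detail, so the following is only a generic illustration of the first step such a tool needs: matching observed peak m/z values to candidate proteins' theoretical masses within a ppm tolerance. The function name, candidate masses and the 500 ppm default are hypothetical.

```python
def match_peaks(peak_mz, candidates, tol_ppm=500.0):
    """peak_mz: observed m/z values; candidates: dict protein -> theoretical m/z.
    Returns, for each peak, (protein, ppm error) pairs within tol_ppm, closest first."""
    matches = {}
    for mz in peak_mz:
        hits = []
        for protein, theo in candidates.items():
            ppm = (mz - theo) / theo * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((protein, round(ppm, 1)))
        matches[mz] = sorted(hits, key=lambda h: abs(h[1]))
    return matches


matches = match_peaks(
    [11702.0, 20000.0],
    {"protein_X": 11700.0, "protein_Y": 11705.0, "protein_Z": 15000.0},  # hypothetical masses
)
```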
Cytokine detection is an important indicator for assessing immune function and is of great value in disease diagnosis, observation of disease course and treatment monitoring. Cytokine antibody chips detect the expression levels of many cytokines simultaneously, so they can be used to analyze correlations in the samples' relative cytokine expression; they can also supplement gene chips in studying the relationship between gene expression and cytokines. Our antibody-chip analysis services include selection of differences, clustering analysis, PCA, predictive modeling, and correlation analysis with gene chips or other protein chips, to explore the hidden biological significance of the results.
Step 1: Cluster analysis
Hierarchical clustering is performed with the Pearson correlation coefficient as the distance measure and average linkage. An example follows (partial):
Clustering analysis of the raw data. Signal intensity is shown in green-black-red, in order of increasing signal value. No strong pattern is apparent in the result.
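The settings above map directly onto SciPy: `metric="correlation"` is 1 minus the Pearson correlation, and `method="average"` is average linkage. A minimal sketch; `cluster_rows`, the cluster count and the simulated cytokine profiles are hypothetical. Applying the same function to the transposed matrix clusters the cytokines.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage


def cluster_rows(data, n_clusters):
    """Average-linkage hierarchical clustering of rows with 1 - Pearson correlation as distance.
    Returns flat cluster labels and the dendrogram leaf order (for drawing the heat map)."""
    Z = linkage(data, method="average", metric="correlation")
    return fcluster(Z, t=n_clusters, criterion="maxclust"), leaves_list(Z)


# Simulated profiles: samples 0-2 follow one cytokine pattern, samples 3-5 the opposite one
rng = np.random.default_rng(0)
pattern = np.array([1, 3, 2, 5, 4, 6, 2, 1], dtype=float)
data = np.vstack([pattern + rng.normal(0, 0.1, 8) for _ in range(3)]
                 + [10 - pattern + rng.normal(0, 0.1, 8) for _ in range(3)])
labels, order = cluster_rows(data, n_clusters=2)
```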
Step 2: Principal component analysis (PCA)
PCA is a dimension-reduction technique that maps high-dimensional chip data onto a two-dimensional space. Each sample is represented by a point in that space, and similar samples lie close together, so PCA can be used to find "outliers" among the samples:
Principal component analysis of samples (see pca.emf). The horizontal axis is the first principal component and the vertical axis is the second principal component. Each sample is shown as a point, and different colors indicate different groups.
Statistical methods such as ANOVA are used to test for differences in antibody (cytokine) expression between the groups. The differentially expressed antibodies are shown in the following chart:
Box plot:
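The group comparison can be sketched as a one-way ANOVA per cytokine with SciPy. The function name, group labels and simulated data are hypothetical; with many cytokines the resulting P-values would normally also be corrected for multiple testing.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_per_cytokine(data, groups):
    """data: samples x cytokines; groups: group label per sample.
    Returns F statistics and p-values of a one-way ANOVA for each cytokine."""
    groups = np.asarray(groups)
    labels = np.unique(groups)
    results = [f_oneway(*(data[groups == g, j] for g in labels))
               for j in range(data.shape[1])]
    return np.array([r[0] for r in results]), np.array([r[1] for r in results])


# Simulated data: 3 groups x 5 samples, 4 cytokines; only cytokine 2 differs between groups
rng = np.random.default_rng(0)
groups = np.repeat(["A", "B", "C"], 5)
data = rng.normal(size=(15, 4))
data[:, 2] += np.repeat([0.0, 3.0, 6.0], 5)
f_stats, p_values = anova_per_cytokine(data, groups)
```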
A decision tree is a tree structure that represents a set of decisions. It is a visual representation of knowledge and also a highly efficient classifier. The main idea is to use information theory as the tool for constructing the tree: at each non-leaf node a key attribute or group of attributes is selected, and the training examples are split top-down until a termination condition is met. A decision tree consists of a root node, a number of non-leaf nodes and a number of leaf nodes. The root node corresponds to the learning task, and each leaf node carries a class label. Decision trees are an important pattern-recognition method; their advantages are clear rules and high classification accuracy.
Gene selection with the decision tree method
The decision tree is built by training, as shown in the following figure:
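The information-theoretic split choice can be made concrete with ID3-style information gain: the attribute whose split most reduces label entropy is chosen for the node. A minimal standard-library sketch; the attribute names and the discretized "high/low" expression values are hypothetical toy inputs.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the samples on a categorical attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def best_split(features, labels):
    """features: dict attribute -> list of values. Returns the attribute with the highest gain."""
    return max(features, key=lambda name: information_gain(features[name], labels))


labels = ["tumor", "tumor", "normal", "normal"]
features = {"geneA": ["high", "high", "low", "low"], "geneB": ["high", "low", "high", "low"]}
chosen = best_split(features, labels)
```

Here geneA separates the classes perfectly (gain of 1 bit) while geneB carries no information (gain of 0), so geneA becomes the root split.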
Price: 5,000 RMB per experiment
Includes:
Processing of the raw chip data and genotyping; we provide a list of statistically significant SNPs.
Descriptive statistics, such as minor allele frequency, Hardy-Weinberg equilibrium and so on.
Significance testing of differences between the experimental and control groups, with calculation of the false discovery rate (FDR).
SNP association analysis with a linear model or a logistic regression model. (All statistics can be computed with SAS, SPSS or S-Plus/R, as selected.)
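The descriptive and basic association statistics in the list above can be sketched from genotype counts: minor allele frequency, a 1-degree-of-freedom chi-square test of Hardy-Weinberg equilibrium, and a 2x2 allelic case/control chi-square test. These are standard textbook formulas, not necessarily the exact tests used by the service; the function names and counts are hypothetical, SciPy is assumed, and regression models are not covered.

```python
from scipy.stats import chi2, chi2_contingency


def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Genotype counts -> frequency of the less common allele."""
    p = (2 * n_AA + n_Aa) / (2 * (n_AA + n_Aa + n_aa))
    return min(p, 1 - p)


def hwe_chi2_p(n_AA, n_Aa, n_aa):
    """1-df chi-square goodness-of-fit test against Hardy-Weinberg proportions."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected = [n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2]
    observed = [n_AA, n_Aa, n_aa]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2.sf(stat, df=1)


def allelic_association_p(case_counts, control_counts):
    """case_counts, control_counts: (n_AA, n_Aa, n_aa). Chi-square test on allele counts."""
    def alleles(c):
        return [2 * c[0] + c[1], 2 * c[2] + c[1]]
    _, p, _, _ = chi2_contingency([alleles(case_counts), alleles(control_counts)],
                                  correction=False)
    return p
```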
CNV is currently a hot research topic, and SNP chip data can be used to calculate CNV accurately. We provide CNV calculation for SNP chips based on CNAG (Copy Number Analyser for GeneChip), dChip (DNA-Chip Analyzer), CNAT (Chromosome Copy Number Analysis Tool) and other algorithms.
Based on each SNP's position on the chromosome, we look for genes (or ESTs) that the SNP may affect. We can also annotate the functions of the corresponding genes (gene ontology, pathway and transcription factor analysis and so on) to explain the SNP's possible mechanism of action. This part follows the conventional expression-profile analysis.
Traditional statistical methods for SNP mining often have limited sensitivity and specificity. Pattern-recognition and machine-learning methods address SNP screening better; we provide tree-based SNP mining algorithms.
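The tools named above use considerably more elaborate statistical models. The core arithmetic they share can be sketched simply: a log2 ratio of sample to diploid reference converts to copy number as 2 * 2^ratio, and neighboring probes are smoothed and merged into segments. This simplified illustration uses a running median and rounding; the window size, function names and input ratios are hypothetical.

```python
import numpy as np


def estimate_copy_number(log2_ratio, window=5):
    """log2_ratio: per-probe log2(sample / diploid reference), ordered along a chromosome.
    Smooths with a running median, converts to copy number 2 * 2**ratio, and rounds."""
    x = np.asarray(log2_ratio, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    smoothed = np.array([np.median(padded[i:i + window]) for i in range(len(x))])
    return np.rint(2 * 2 ** smoothed).astype(int)


def segments(copy_number):
    """Collapse consecutive probes with the same copy number into (start, end, cn) runs."""
    runs, start = [], 0
    for i in range(1, len(copy_number) + 1):
        if i == len(copy_number) or copy_number[i] != copy_number[start]:
            runs.append((start, i - 1, int(copy_number[start])))
            start = i
    return runs


# Hypothetical ratios: 10 normal probes, 10 single-copy gains, 10 single-copy losses
ratios = [0.0] * 10 + [float(np.log2(1.5))] * 10 + [-1.0] * 10
runs = segments(estimate_copy_number(ratios))
```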
Hsiang-Yu Yuan et al. FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Research, 2006, 34(Web Server issue): W635-W641.
Building diagnostic models from screened SNPs with artificial neural networks (ANN), SVM, PAM and other methods has important clinical significance. The following figure shows how we build a diagnostic model with an ANN.
More and more public SNP data are available, mainly from the Illumina and Affymetrix platforms. We provide integrated solutions for analyzing public data, including integration of data across platforms.
SNP chips, expression-profile microarrays and aCGH each have their own technical merits. We provide comprehensive data-integration programs that combine all applicable high-throughput methods to address the corresponding biological problems, such as cancer drug target screening and marker mining for complex genetic diseases.
Stein Aerts et al. Gene prioritization through genomic data fusion. Nature Biotechnology, 2006, 24(5).
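Data fusion for prioritization can be illustrated by combining per-source gene rankings into one list. The cited paper fuses rankings with order statistics; this sketch deliberately uses a simpler mean normalized rank instead, and the gene names and rankings are hypothetical.

```python
def fuse_rankings(rankings):
    """rankings: list of gene lists, each ordered best-first by one data source
    (e.g. SNP association, expression, aCGH). Genes missing from a list get its worst rank.
    Returns all genes ordered by mean normalized rank (ties broken by name)."""
    genes = set().union(*rankings)
    scores = {}
    for g in genes:
        ranks = []
        for r in rankings:
            pos = r.index(g) + 1 if g in r else len(r) + 1
            ranks.append(pos / (len(r) + 1))  # normalize so lists of different length compare
        scores[g] = sum(ranks) / len(ranks)
    return sorted(genes, key=lambda g: (scores[g], g))


fused = fuse_rankings([["G1", "G2", "G3"], ["G2", "G1", "G3"], ["G1", "G3", "G2"]])
```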
This includes the design of data-validation experiments and experimental services.
We offer PCR, or TaqMan real-time PCR genotyping, to verify the results of the SNP-chip service. We also provide follow-up experimental services on SNP function, including:
For SNPs located in a gene's promoter region, we recommend measuring gene expression (western blot), while using transcription factor analysis of the point mutation to assess its effect on transcription factor binding free energy.
For SNPs in introns, we recommend validating alternative splicing (northern blot).
For SNPs in the CDS, non-synonymous mutations can be analyzed with 3D modeling to compare the changes in protein structure.
For SNPs in the 3'-UTR, we recommend prediction of microRNA binding sites and of common 3'-UTR elements (such as AREs), since a mutation can cause the loss of a binding site.
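Before any 3D modeling, a CDS SNP has to be classified as synonymous, non-synonymous or nonsense. A minimal sketch using the standard genetic code (built from the usual TCAG-ordered 64-letter string); the function name and the toy coding sequence are hypothetical, and the CDS is assumed to start at its first codon.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" = stop)
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_cds_snp(cds, position, alt_base):
    """cds: coding DNA sequence in frame; position: 0-based SNP index in cds.
    Returns (effect, codon change, amino-acid change)."""
    cds = cds.upper()
    offset = position % 3
    ref_codon = cds[position - offset:position - offset + 3]
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "non-synonymous"  # candidate for protein-structure (3D) analysis
    return effect, f"{ref_codon}>{alt_codon}", f"{ref_aa}>{alt_aa}"


change = classify_cds_snp("ATGGAAGGC", 3, "A")  # toy CDS: ATG GAA GGC (M E G)
```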