Data Mining Procedures
Step one: Preprocessing Techniques
It is well known that, among all the steps of the data mining procedure, preprocessing is the most important and the most difficult.
Handling and processing different kinds of metabolomic data
Many kinds of metabolomic data have been mentioned above, so processing techniques must handle each kind carefully, taking the nature of each type of data into account. The Sensichip Bioinformatics Analysis Platform, a software platform for metabolomic data analysis developed by our company, supports a variety of data formats.
Normalization of data
Noise and background signals can arise when electrospray is used to ionize samples eluting from chromatography, so noise reduction and baseline correction techniques are required. Many effective preprocessing algorithms have been developed to address these problems and to handle different kinds of data appropriately, such as the LOWESS-based normalization technique.
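As a minimal sketch of LOWESS-based normalization, the code below assumes one common use case: correcting a metabolite's intensity drift along injection order. It fits a basic LOWESS curve (tricube-weighted local linear regression, without the robustness iterations of full LOWESS) and divides each intensity by the fitted trend, rescaled to the median intensity. The function names `lowess` and `lowess_normalize`, the span `frac=0.5`, and the drift-correction setting are illustrative assumptions, not the platform's implementation.

```python
import statistics


def lowess(x, y, frac=0.5):
    """Basic LOWESS (no robustness iterations): for each point, fit a weighted
    linear regression to its nearest neighbours using tricube weights."""
    n = len(x)
    k = min(n, max(2, int(round(frac * n))))  # neighbours per local window
    fitted = []
    for i in range(n):
        dists = [abs(xj - x[i]) for xj in x]
        # The k-th smallest distance sets the window half-width h.
        h = sorted(dists)[k - 1] or 1e-12
        # Tricube weights: (1 - (d/h)^3)^3 inside the window, 0 outside.
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(w)
        xm = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ym = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xm) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xm) * (yj - ym) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x[i] - xm))
    return fitted


def lowess_normalize(order, intensities, frac=0.5):
    """Hypothetical helper: divide intensities by their LOWESS trend along
    injection order and rescale to the median level. Assumes the fitted
    trend stays positive, as it does for positive intensities with mild drift."""
    trend = lowess(order, intensities, frac)
    level = statistics.median(intensities)
    return [v / t * level for v, t in zip(intensities, trend)]


if __name__ == "__main__":
    order = list(range(10))
    # Synthetic intensities with a linear upward drift of 10% per injection.
    raw = [100 * (1 + 0.1 * i) for i in order]
    print([round(v, 1) for v in lowess_normalize(order, raw)])
```

With purely linear drift, the local fits reproduce the drift exactly and every corrected value collapses to the median level. The span controls the trade-off: a small `frac` follows local fluctuations, while a large one approaches a single global line.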
Identification and quantification of metabolites
After noise and background have been removed, two further problems must be addressed. First, peak alignment is needed to correct peak shifts caused by variation in the arrival (retention) times of the same compounds across multiple samples. Sensichip has built a novel peak alignment algorithm into the Sensichip Bioinformatics Analysis Platform. As an alternative approach, an algorithm that performs alignment by clustering the retention times of the peaks corresponding to each compound has also been proposed. Second, chromatographic results can contain overlapping peaks, and an algorithm is needed to resolve and identify each individual peak within them.
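The alternative approach mentioned above, aligning peaks by clustering their retention times, can be sketched as follows. This is a simplified single-linkage illustration, not the platform's own alignment algorithm: each peak is assumed to be a `(sample, retention_time, intensity)` tuple with retention time in minutes, and the tolerance `rt_tol` and the helpers `cluster_peaks` and `align` are hypothetical.

```python
from collections import defaultdict


def cluster_peaks(peaks, rt_tol=0.1):
    """Group peaks whose retention times lie within rt_tol of the previous
    peak in sorted order (single-linkage clustering in one dimension)."""
    ordered = sorted(peaks, key=lambda p: p[1])
    clusters = []
    for peak in ordered:
        if clusters and peak[1] - clusters[-1][-1][1] <= rt_tol:
            clusters[-1].append(peak)
        else:
            clusters.append([peak])
    return clusters


def align(peaks, rt_tol=0.1):
    """Build an aligned feature table: one row per cluster (putative compound),
    keyed by the cluster's mean retention time, mapping sample -> intensity."""
    table = {}
    for cluster in cluster_peaks(peaks, rt_tol):
        rt = round(sum(p[1] for p in cluster) / len(cluster), 3)
        row = defaultdict(float)
        for sample, _, intensity in cluster:
            row[sample] += intensity  # sum if one sample has several peaks here
        table[rt] = dict(row)
    return table


if __name__ == "__main__":
    peaks = [
        ("s1", 5.02, 120.0), ("s2", 5.08, 110.0), ("s3", 4.98, 130.0),
        ("s1", 7.50, 40.0), ("s2", 7.55, 45.0),
    ]
    for rt, row in sorted(align(peaks).items()):
        print(rt, row)
```

Single linkage can chain neighbouring compounds into one cluster when peaks are dense, so practical tools usually also match on m/z and may correct retention-time drift before clustering. Overlapping peaks, the second problem, would need a separate deconvolution step that this sketch does not attempt.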
Dimension Reduction Techniques
Once metabolic profile data have been obtained through proper preprocessing, the data must be reduced to two or three dimensions before they can be inspected visually. Representative methods for this purpose include both unsupervised and supervised approaches, with PCA (principal component analysis) being a representative unsupervised method. The Sensichip Bioinformatics Analysis Platform uses PCA for dimension reduction and data visualization.

Figure 3 PCA scores plot discriminating specimens from normal specimens based on marker metabolites.
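The PCA step can be illustrated with a short, dependency-free sketch. It assumes a samples-by-metabolites matrix given as lists of floats, mean-centres each column, forms the sample covariance matrix, and extracts the leading components by power iteration with deflation. The name `pca_scores` and its parameters are hypothetical; real analyses would normally call a numerical library's eigendecomposition or SVD, and may scale features first (for example to unit variance).

```python
import math
import random


def pca_scores(X, n_components=2, iters=500, seed=0):
    """Return PCA scores (n_samples x n_components) for data matrix X
    (rows = samples, columns = metabolite features)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centre
    # Sample covariance matrix (p x p).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
         for a in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Power iteration converges to the dominant eigenvector of C.
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance along v).
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    # Scores: projection of each centred sample onto each component.
    return [[sum(r[j] * comp[j] for j in range(p)) for comp in components]
            for r in Xc]


if __name__ == "__main__":
    data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    for row in pca_scores(data):
        print([round(s, 3) for s in row])
```

Plotting the first score column against the second gives a scores plot of the kind shown in Figure 3. Component signs are arbitrary, so a mirrored plot is equally valid.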
Feature Analysis and Selection Techniques
A defining characteristic of metabolomic data is the large number of features, so techniques for analyzing these features and selecting among them are needed. Feature selection is also essential for preventing the classifiers we build from over-fitting the given data and for preserving their ability to generalize. In addition, because feature selection identifies the group of metabolites most strongly associated with a particular research question (e.g. a disease), the findings can serve as biomarkers and be applied in practice. Sensichip has also been developing a new feature selection method for the Sensichip Bioinformatics Analysis Platform based on a genetic algorithm, designed with careful consideration of the nature of metabolomic data.

Figure 4 Compounds from multiple samples
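The genetic-algorithm idea for feature selection can be sketched as below. The design is an illustrative assumption, not the platform's method: each candidate is a bit mask over metabolite features; fitness is the leave-one-out accuracy of a nearest-centroid classifier on the selected features, minus a small penalty per feature (favouring compact marker panels); and the population evolves by elitism, tournament selection, uniform crossover, and bit-flip mutation. All function names, parameter values, and the toy data are hypothetical, and each class must contain at least two samples.

```python
import random


def loo_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier that uses only
    the feature columns where mask is True. Assumes >= 2 samples per class."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dists = {}
        for label in sorted(set(y)):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [sum(r[j] for r in rows) / len(rows) for j in cols]
            dists[label] = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
        if min(dists, key=dists.get) == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature."""
    return loo_centroid_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve feature masks; return (best_mask, best_fitness, per-generation best)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:  # memoise: identical masks recur across generations
            cache[key] = fitness(X, y, mask)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        history.append(fit(ranked[0]))
        next_pop = [ranked[0]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of two random masks becomes a parent.
            parent_a = max(rng.sample(ranked, 2), key=fit)
            parent_b = max(rng.sample(ranked, 2), key=fit)
            # Uniform crossover, then independent bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
            child = [not g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fit)
    return best, fit(best), history


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes; features 1-3 are noise.
    X = [[0.0, 1.0, 5.0, 2.0], [0.2, 3.0, 4.0, 1.0], [0.1, 2.0, 6.0, 3.0],
         [0.3, 1.5, 5.5, 2.5], [1.0, 2.5, 5.0, 2.0], [1.2, 1.0, 4.5, 1.0],
         [1.1, 3.0, 6.0, 3.0], [0.9, 2.0, 5.2, 2.5]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    mask, score, _ = ga_select(X, y)
    print("selected:", [j for j, keep in enumerate(mask) if keep], "fitness:", round(score, 3))
```

Scoring masks by leave-one-out accuracy rather than training accuracy makes the search less prone to picking features that merely fit the given samples, although an independent test set is still needed to estimate final performance. Real metabolomic data with hundreds of features would call for a much larger population and more generations.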
Classification Techniques
Classification techniques can generate diagnostic models from metabolic data, and new patients can then be diagnosed by applying their data to the generated models. A variety of classification algorithms exist; in our view, receiver operating characteristic (ROC) curve analysis can be a suitable choice.

Figure 5 Classification accuracies for sample data using the ROC curve classifier
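One way ROC analysis can yield a diagnostic rule is to sweep a threshold over a single marker score. The sketch assumes that higher scores indicate disease and that labels are 1 for patients and 0 for controls. It records the false and true positive rates at each threshold, computes the area under the curve (AUC) with the trapezoidal rule, and picks the threshold that maximises Youden's J statistic (TPR minus FPR). The function names and the example scores are hypothetical, and this is not the platform's own classifier.

```python
def roc_curve(scores, labels):
    """Return ROC points [(fpr, tpr, threshold), ...] for the rule
    'predict positive if score >= threshold', sweeping thresholds high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0, float("inf"))]  # threshold above every score
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos, t))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))


def best_threshold(points):
    """Threshold maximising Youden's J statistic (TPR - FPR)."""
    return max(points, key=lambda p: p[1] - p[0])[2]


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    pts = roc_curve(scores, labels)
    print("AUC:", auc(pts), "threshold:", best_threshold(pts))
```

Youden's J weights sensitivity and specificity equally; if missing a patient is costlier than a false alarm, a threshold with higher sensitivity may be preferable.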
bio-equip.cn
Overview
Our team comes from Fudan University and includes American Ph.D.s with experience in pharmaceutical design, bioinformatics services, biological software, and database development. Our main products and services cover the following fields: gene chips, protein chips, mass spectrometry, and the experimental design, data analysis, and results validation of high-throughput sequencing, as well as multi-dimensional solutions including the revision of SCI papers.
We have built a professional bioinformatics service platform based on MATLAB and the R language. At Sensichip, we aim to integrate today's scientific achievements in bioinformatics with the technologies and products used in data analysis. Our platform provides a wide variety of professional services for gene expression profiling arrays, microRNA arrays, SNP arrays, exon arrays, MeDIP-chip, oligo chips, CGH arrays, protein arrays, and cytokine arrays, as well as data analysis for mass spectrometry, metabolomics, and high-throughput sequencing.
We have successfully extended the platform to text mining, database construction, and biological analysis software development based on Java, Perl, C++, and other languages. At present, Sensichip has over 200 customers at home and abroad, has published many SCI papers with high impact factors, and has participated in applying for and carrying out many nationally funded projects.
Technical Innovation
* HDMD (Human Disease Microarray Database)
Storing and analyzing microarray data for complex human disease such as tumour and diabetes.
* PMBA (Plant Microarray Bioinformatics Analysis)
An integrated microarray analysis system for rice and Arabidopsis.
* MMCP (Multiple Methods for Class Prediction)
A software integrating ANN, SVM, PNN, PAM methods for class prediction using gene expression data.
* PlantQTL-GE
A database system for identifying candidate genes in rice and Arabidopsis using gene expression and QTL information.
Introduction
Genomics
Our major business:
microRNA analysis (including routine analysis, three-dimensional structure prediction, and network construction)
Solexa sequencing experiments and analysis services (small RNA sequencing, mRNA sequencing, ChIP-Seq, RNA-Seq, whole-genome re-sequencing, bacterial whole-genome sequencing, and DNA methylation sequencing).
Co-published papers:
· Chen Lei, et al. The role of microRNA expression pattern in human intrahepatic cholangiocarcinoma. Journal of Hepatology, 2009, 50(2): 358–369. IF=6.642
· Ding JJ, et al. ES Cells Derived from Somatic Cloned and Fertilized Blastocysts are post-transcriptionally Indistinguishable: a MicroRNA and Protein Profiles Comparison. Proteomics, 2009, 9: 1–11. IF=6.088
· Hu SJ, Ren G, Liu JL, et al. MicroRNA expression and regulation in mouse uterus during embryo implantation. J Biol Chem, 2008 Aug 22, 283(34). IF=5.6
· Guodong Li, Wenjuan Zhang, Huazong Zeng, et al. Identification of new biomarkers for osteosarcoma early diagnosis from evidences of SELDI-TOF-MS and microarray. BMC Cancer, 2009, 9: 150. IF=3.08
Proteomics
We focus on:
DIGE-2D experiments and analysis
iTRAQ experiments and data analysis
MALDI-TOF-MS experiments and analysis
SELDI experiments and analysis
Shotgun proteomics, and so on.
Co-published papers:
· Ding JJ, et al. ES Cells Derived from Somatic Cloned and Fertilized Blastocysts are post-transcriptionally Indistinguishable: a MicroRNA and Protein Profiles Comparison. Proteomics, 2009, 9: 1–11. IF=6.088
· Jinghui Guo, et al. Identification of serum biomarkers for pancreatic adenocarcinoma by proteomic analysis. Cancer Science, 2009. IF=3.47
· Guodong Li, Wenjuan Zhang, Huazong Zeng, et al. Identification of new biomarkers for osteosarcoma early diagnosis from evidences of SELDI-TOF-MS and microarray. BMC Cancer, 2009, 9: 150. IF=3.08
Metabolomics
We focus on:
LC-MS experiments and analysis
GC-MS experiments and analysis
NMR experiments and analysis.
Co-published papers:
· Hao Wu, Ruyi Xue, Huazong Zeng, Xizhong Shen, et al. Metabolomic profiling of human urine in hepatocellular carcinoma patients using gas chromatography/mass spectrometry. Analytica Chimica Acta. IF=3.18
· Hao Wu, Huazong Zeng, Xizhong Shen, et al. Metabolomic study for diagnostic model of oesophageal cancer using gas chromatography/mass spectrometry. Journal of Chromatography B, 2009, 877: 3111–3117. IF=2.935
Microarray analysis
We focus on:
Gene chip analysis
MicroRNA chip analysis
Methylation (CpG) microarray analysis
Exon microarray analysis
Protein chip analysis
SNP chip analysis
Antibody microarray analysis.
Co-published papers:
· Ding JJ, et al. ES Cells Derived from Somatic Cloned and Fertilized Blastocysts are post-transcriptionally Indistinguishable: a MicroRNA and Protein Profiles Comparison. Proteomics, 2009, 9: 1–11. IF=6.088
· Guodong Li, Wenjuan Zhang, Huazong Zeng, et al. Identification of new biomarkers for osteosarcoma early diagnosis from evidences of SELDI-TOF-MS and microarray. BMC Cancer, 2009, 9: 150. IF=3.08
Literature mining
We focus on:
Disease & gene literature mining
Gene & gene literature mining
SNP literature mining
CpG literature mining.
Training
Let bioinformatics training come to you. Onsite training is ideal for groups of researchers or for those who need customized instruction in bioinformatics analysis. To help you get the most out of bioinformatics tools, instructors can tailor the curriculum with institute- or industry-specific examples and address the challenges and process issues familiar to members of your organization.