DNA Sequence and Microarray Data Analysis

"Biology easily has 500 years of exciting problems to work on." - Donald E. Knuth

Biological life is a natural digital computer. To some extent, deciphering a genome is like decompiling an executable computer program, which is difficult but not impossible!

My research group works on signal processing and pattern recognition techniques for DNA sequence and microarray data analysis. We study hidden features along a DNA sequence and methods for the transformation and visualization of symbolic data. In signal processing terms, microarray time series can be regarded as a large set of short, noisy signals, and the task is to find the relations among them. The spectral decomposition method we have recently developed for microarray data can process these signals and reveal gene regulatory relations effectively.
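To make the idea of transforming symbolic data into signals concrete, here is a minimal sketch in Python. It is only an illustration, not the spectral decomposition method mentioned above: it assumes the common binary indicator mapping (one 0/1 sequence per base) and a direct discrete Fourier transform. The function names and the toy sequence are hypothetical.

```python
import cmath

def indicator_sequences(seq):
    """Map a DNA string to four binary indicator sequences, one per base."""
    seq = seq.upper()
    return {base: [1.0 if s == base else 0.0 for s in seq] for base in "ACGT"}

def power_spectrum(signal):
    """Return |X[k]|^2 for k = 0..N-1 using a direct O(N^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        x_k = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(x_k) ** 2)
    return spectrum

def total_power(seq):
    """Sum the power spectra of the four indicator sequences."""
    spectra = [power_spectrum(sig) for sig in indicator_sequences(seq).values()]
    return [sum(values) for values in zip(*spectra)]

# Hypothetical toy sequence with an exact period of 3 bases.
toy = "ATG" * 10
power = total_power(toy)

# Strongest non-zero frequency in the first half of the spectrum.
peak = max(range(1, len(toy) // 2 + 1), key=lambda k: power[k])
print(peak, len(toy) // 3)
```

For a sequence with a period of 3, the strongest non-zero frequency falls at k = N/3. Real sequences are far noisier than this toy input, and the direct DFT here is chosen for readability rather than speed.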

We're also working on data biclustering. In clustering, the data matrix is grouped in one direction only (see diagram below), while in biclustering rows and columns are grouped simultaneously. Biclustering is known to be NP-hard, and because a subset of rows and a subset of columns must be selected together, it is much harder than clustering. We have recently discovered a hyperplane model in multidimensional signal space that can be used to separate different types of biclusters and to remove irrelevant rows and columns from the data matrix. Using our method, we can reduce the computational complexity significantly and extract biclusters efficiently. We're applying for a patent for this new technology.
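As a small illustration of what a bicluster is, the sketch below scores a chosen set of rows and columns with the mean squared residue of Cheng and Church, a standard coherence measure. This is a swapped-in textbook measure, not our hyperplane model, and the data matrix and function name are hypothetical. A score of zero means the selected submatrix follows a purely additive pattern, where each row equals a shared profile plus a row-specific offset.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix (rows x cols)."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)

# Hypothetical 4 x 4 expression matrix (rows: genes, columns: conditions).
# Rows 0, 2, 3 restricted to columns 0, 2, 3 share the profile (1, 2, 3)
# shifted by a per-row offset, so they form an additive bicluster.
data = [
    [1.0, 9.0, 2.0, 3.0],
    [5.0, 5.0, 5.0, 0.0],
    [3.0, 1.0, 4.0, 5.0],
    [6.0, 7.0, 7.0, 8.0],
]

bicluster_score = mean_squared_residue(data, [0, 2, 3], [0, 2, 3])
whole_score = mean_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])
print(bicluster_score, whole_score)
```

The embedded bicluster scores essentially zero while the full matrix does not. Finding such row and column subsets without knowing them in advance is the hard part, since the number of candidate submatrices grows exponentially with the matrix size.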

If you're an honours graduate in CS, EE, mathematics, physics or statistics and want to work on challenging problems in the new and rapidly growing field of bioinformatics, please contact me regarding postgraduate research opportunities at City University of Hong Kong and the University of Sydney.

Selected Publications
