The 12th Korea-Singapore Joint Workshop on Bioinformatics and Natural Language Processing

Speakers

Limsoon Wong (NUS, Singapore)

Title: Improving consistency and coverage of MS-based proteomics

Mass spectrometry (MS)-based proteomics is a widely used and powerful tool for profiling system-wide changes in protein expression. It can be applied for various purposes, e.g. biomarker discovery in diseases and the study of drug responses. Nonetheless, MS-based proteomics suffers from consistency issues (poor reproducibility and inter-sample agreement) and coverage issues (inability to detect the entire proteome) that urgently need to be addressed. This talk discusses how these issues can be addressed by proteomic profile analysis techniques that use biological networks (especially protein complexes) as the biological context. In particular, I describe several techniques that we have been developing for complex-based analysis of proteomic profiles. These techniques help identify proteomic-profile analysis results that are more consistent, more reproducible, more robust in the presence of batch effects, and more biologically coherent, and they allow the detected proteome to be expanded to uncover and/or discover novel proteins. Incidentally, I think this work beautifully demonstrates the triumph of logic and computational thinking over noise. (Joint work with Wilson Wen Bin Goh.)
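The abstract does not spell out the complex-based techniques themselves. As a rough illustration of the general idea only (not the authors' methods), the Python sketch below scores each protein complex by the fraction of its members detected in a sample and by a hypergeometric p-value for that overlap, and lists undetected members as candidates for proteome expansion. The function names, the toy complexes and the choice of a hypergeometric test are all assumptions made for illustration.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)


def score_complexes(complexes, detected, universe_size):
    """Score each complex by detected-member coverage and overlap significance.

    complexes: dict mapping complex name -> set of member protein IDs
    detected: set of protein IDs observed in the sample
    universe_size: total number of proteins considered (assumed known)
    Returns: dict name -> (coverage, p_value, sorted undetected members)
    """
    results = {}
    for name, members in complexes.items():
        hits = members & detected
        coverage = len(hits) / len(members)
        # Significance of seeing this many detected members by chance.
        p_value = hypergeom_sf(len(hits), universe_size, len(detected), len(members))
        # Undetected members of a well-covered complex are candidates for follow-up.
        candidates = sorted(members - detected)
        results[name] = (coverage, p_value, candidates)
    return results


# Toy example with hypothetical protein IDs.
complexes = {"C1": {"p1", "p2", "p3", "p4"}, "C2": {"p5", "p9"}}
detected = {"p1", "p2", "p3", "p5"}
for name, (cov, p, missing) in score_complexes(complexes, detected, 10).items():
    print(name, round(cov, 2), round(p, 4), missing)
```

In this toy run, complex C1 has three of four members detected, so its one missing member, p4, is the kind of protein a complex-based approach might flag as likely present but missed.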

Limsoon Wong is KITHCT Chair Professor of Computer Science and Professor of Pathology at the National University of Singapore. He currently works mostly on knowledge discovery technologies and their application to biomedicine. He is a Fellow of the ACM, inducted for his contributions to database theory and computational biology. He co-founded Molecular Connections, an information extraction and curation services company in India, and oversaw its steady growth over the past decade to nearly 2000 research engineers, scientists, and curators.

See-Kiong Ng (I2R, Singapore)

Title: Utilizing Temporal Information for Taxonomy Construction

Taxonomies play an important role in many applications by organizing domain knowledge into a hierarchy of 'is-a' relations between terms. Previous work on automatically constructing taxonomies from text documents either ignored temporal information or used fixed time periods to discretize the document time series. In this talk, we present a time-aware method to automatically construct and effectively maintain a taxonomy from a given series of documents pre-clustered for a domain of interest. The method extracts temporal information from the documents and uses a timestamp contribution function to score the temporal relevance of the evidence from source texts when identifying the taxonomic relations used to construct the taxonomy. The method also updates the taxonomy incrementally, adding fresh relations from new data and removing outdated relations using an information decay function. It thus avoids rebuilding the whole taxonomy from scratch for every update and keeps the taxonomy up to date with the latest information trends in a rapidly evolving domain. (This work is part of the PhD thesis work of Luu Anh Tuan and will be published in TACL.)
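The talk does not give the form of the timestamp contribution or information decay functions. The sketch below assumes, purely for illustration, an exponential decay with a configurable half-life: existing relation scores are decayed since the last update, fresh evidence is added with a weight that depends on its timestamp, and relations whose score falls below a threshold are dropped. The names `incremental_update`, `half_life` and `threshold` are hypothetical.

```python
def incremental_update(scores, last_time, new_evidence, now, half_life, threshold):
    """Update taxonomic relation scores without rebuilding from scratch.

    scores: dict mapping (hypernym, hyponym) -> score as of last_time
    new_evidence: list of ((hypernym, hyponym), timestamp) pairs, timestamp <= now
    half_life: time for a piece of evidence to lose half its weight (assumed exponential)
    threshold: relations scoring below this are treated as outdated and removed
    """
    # Decay all existing scores by the time elapsed since the last update.
    decay = 0.5 ** ((now - last_time) / half_life)
    updated = {rel: s * decay for rel, s in scores.items()}
    # Add fresh evidence, weighted by how recent its timestamp is.
    for rel, timestamp in new_evidence:
        weight = 0.5 ** ((now - timestamp) / half_life)
        updated[rel] = updated.get(rel, 0.0) + weight
    # Drop relations whose decayed score no longer meets the threshold.
    return {rel: s for rel, s in updated.items() if s >= threshold}


scores = {("animal", "dog"): 1.0, ("device", "pager"): 0.3}
new_evidence = [(("device", "smartphone"), 1)]
print(incremental_update(scores, 0, new_evidence, now=1, half_life=1, threshold=0.2))
# -> {('animal', 'dog'): 0.5, ('device', 'smartphone'): 1.0}
```

Here the weakly supported relation ("device", "pager") decays below the threshold and is removed, while the new relation enters with full weight; only the changed scores are touched, not the whole taxonomy.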

See-Kiong Ng (Ph.D., Carnegie Mellon University) is a Principal Scientist and the Programme Director of the Urban Systems Initiative of the Agency for Science, Technology and Research (A*STAR). The Initiative seeks to address the challenges of the rapidly urbanising world through smart city technologies and innovations. See-Kiong also holds a concurrent appointment as Director, Strategic Alliances of A*STAR's Institute for Infocomm Research (I2R). See-Kiong has a long-standing interest in cross-disciplinary applied computer science research. From using data mining to understand the biology of the human body to using big data approaches to understand the biology of complex human cities, See-Kiong has published widely, with more than 100 papers in leading peer-reviewed journals and conferences across multiple disciplines.

Jie Zheng (NTU, Singapore)

Title: SynLethDB: a comprehensive knowledge base of synthetic lethality across diverse cancer cell lines and multiple species

Synthetic lethality (SL) is a type of genetic interaction between two genes such that simultaneous perturbation of both genes results in cell death, while perturbation of either gene alone is not lethal. Hence, inhibiting the SL partners of genes with cancer-specific mutations could selectively kill cancer cells while sparing normal cells. SL is therefore emerging as a promising anticancer strategy that could potentially overcome the drawbacks of traditional chemotherapies by reducing severe side effects. However, there has been no comprehensive database dedicated to collecting SL pairs and related knowledge. In this work, we present a comprehensive database, SynLethDB (http://histone.sce.ntu.edu.sg/SynLethDB/), which contains SL pairs collected from biochemical assays, computational predictions and text mining results on human and four model species, i.e. mouse, fruit fly, worm and yeast. For each SL pair, a confidence score was calculated by integrating individual scores derived from different evidence sources. We also developed a statistical analysis module to estimate the sensitivity of cancer cells to drugs targeting human SL partners, based on large-scale genomics data, gene expression profiles and drug sensitivity profiles on more than 1000 cancer cell lines. To help users access and mine the wealth of data, we have implemented functionalities such as search and filtering, orthology search and gene set enrichment analysis, as well as a user-friendly web interface. SynLethDB should be a useful resource for the biomedical research community and the pharmaceutical industry. In addition, I will introduce the computational problem of SL prediction, with SynLethDB as benchmark data. Biologists can use these knowledge and data resources to guide wet-lab SL screens using the newest technologies, e.g. CRISPR-Cas9.
(This is joint work with Jing Guo and Hui Liu, published in the Database Issue of Nucleic Acids Research, 44(D1): D1011-D1017, 2016.)
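The abstract says only that per-source scores are integrated into one confidence score per SL pair. One common way to combine independent evidence scores in [0, 1] is a noisy-OR, 1 - ∏(1 - s_i). The sketch below uses it as an assumed stand-in; it is not necessarily the formula SynLethDB uses, and the source names in the example are hypothetical.

```python
def combine_scores(source_scores):
    """Noisy-OR combination of per-source confidence scores in [0, 1].

    The combined score is the probability that at least one source is correct,
    assuming the sources are independent.
    """
    prob_all_wrong = 1.0
    for source, score in source_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {source!r} must lie in [0, 1], got {score}")
        prob_all_wrong *= 1.0 - score
    return 1.0 - prob_all_wrong


print(combine_scores({"screen": 0.5, "text_mining": 0.5}))  # -> 0.75
```

A noisy-OR rewards agreement between sources: two moderate pieces of evidence give a higher combined score than either alone, while a pair with no evidence scores 0.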

Jie Zheng is a tenure-track Assistant Professor at the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore, and an adjunct senior research scientist at the Genome Institute of Singapore (GIS). He received his Ph.D. from the University of California, Riverside, USA, and his B.Eng. (Honors) from Zhejiang University, P.R. China, both in Computer Science. Before joining NTU, he was a research scientist at the National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), USA. His research interests are in bioinformatics, computational systems biology and genomics, with the aim of developing novel algorithms and in silico models to help answer biomedical questions (e.g. how cell fate is decided in cancer and stem cells). He has published in top-tier journals such as Nucleic Acids Research, Genome Biology, Molecular Biology and Evolution, Bioinformatics and PLoS Computational Biology.

Wei Lu (SUTD, Singapore)

Title: Advanced Structured Prediction with the StatNLP Framework

Structured prediction is one of the most important topics in various fields, including machine learning, computer vision, natural language processing and bioinformatics. The hidden Markov model (HMM) and probabilistic context-free grammars (PCFGs) are two classic generative models used for predicting outputs in the form of linear-chain and tree structures, respectively. Under the discriminative learning paradigm, researchers then proposed linear-chain conditional random fields (CRFs) (Lafferty et al. 2001), which showed better performance on standard tasks such as information extraction. Several extensions to this model followed, including semi-Markov CRFs (Sarawagi et al. 2004), tree CRFs, and their latent-variable variants. Alternatively, by using a slightly different loss function, one arrives at structured support vector machines (Tsochantaridis et al. 2004) and their variants (Yu and Joachims 2009). Furthermore, with the popularity of neural networks and deep learning, new models that integrate neural networks and graphical models, such as neural CRFs (Do and Artières 2010), have also been proposed.
In this talk, I will discuss how such a wide spectrum of existing structured prediction models can all be unified and implemented under our StatNLP framework (http://statnlp.org/). We also show that the framework can be used to solve certain structured prediction problems that cannot easily be handled by conventional structured prediction models, and discuss potential applications of our framework to different tasks.
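Linear-chain models such as HMMs and linear-chain CRFs share the same decoding step: finding the highest-scoring label sequence with the Viterbi algorithm. The sketch below is a generic log-domain Viterbi decoder written independently of the StatNLP framework, whose actual API is not shown here; the function name and the score tables are illustrative assumptions.

```python
def viterbi(emission, transition, start):
    """Return (best label path, best score) for a linear-chain model.

    emission[t][y]: score of label y at position t
    transition[y_prev][y]: score of moving from label y_prev to label y
    start[y]: score of starting the sequence with label y
    All scores are additive (e.g. log-probabilities or CRF potentials).
    """
    n, num_labels = len(emission), len(start)
    best = [start[y] + emission[0][y] for y in range(num_labels)]
    backpointers = []
    for t in range(1, n):
        pointers, new_best = [], []
        for y in range(num_labels):
            # Best previous label to transition into label y at position t.
            prev = max(range(num_labels), key=lambda yp: best[yp] + transition[yp][y])
            pointers.append(prev)
            new_best.append(best[prev] + transition[prev][y] + emission[t][y])
        backpointers.append(pointers)
        best = new_best
    # Trace back from the best final label.
    y = max(range(num_labels), key=lambda k: best[k])
    path = [y]
    for pointers in reversed(backpointers):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path, max(best)


emission = [[2, 0], [0, 1], [0, 3]]
transition = [[0, -1], [-1, 0]]
start = [0, 0]
print(viterbi(emission, transition, start))  # -> ([0, 1, 1], 5)
```

Swapping in HMM log-probabilities or CRF potentials changes only the score tables, not the decoder, which is one reason a single framework can host many linear-chain models.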

Wei Lu is an Assistant Professor in the Information Systems Technology and Design (ISTD) Pillar of the Singapore University of Technology and Design (SUTD) (http://www.sutd.edu.sg/). He received his PhD from the Singapore-MIT Alliance of the National University of Singapore (NUS) and worked as a postdoctoral research associate at the University of Illinois at Urbana-Champaign (UIUC), USA. His research interests include machine learning and statistical natural language processing, and he is particularly interested in semantic processing (in a broad sense). He has published at conferences such as ACL, EMNLP, NAACL, AAAI and CIKM. His paper on language generation from formal semantics received the best paper award at EMNLP 2011. He served as an area co-chair for ACL 2016, and will serve as publication co-chair for ACL 2017 and ACL 2018 (advisor).

Kwang-Hyun Cho (KAIST, Korea)

Title: Systems biology - Observing complexity and seeing simplicity from biological networks

Systems biology explores the hidden evolutionary principles underlying the emergent properties of living systems by combining biological experiments, mathematical modeling, computer simulation and systems analysis. Such emergent properties arise when multiple components interact with each other in a nonlinear way. Cells have evolved complicated signaling networks to recognize external signals and elicit appropriate responses for survival. We found that intriguing circuits are embedded in such signaling networks; these circuits evolved to perform critical cellular functions and give rise to emergent properties. In particular, we found that feedforward and feedback loops are essential in such circuits, and that cellular dysfunctions related to complex human diseases such as cancer can be caused by malfunctioning of these circuits. In this talk, I will introduce case studies ranging from a small-scale signaling circuit to a large and complex molecular interaction network to discuss how the emergent properties of cellular functions can be induced by complicated interactions among multiple molecules, and how we can control cellular functions by perturbing targeted molecules in the network, which leads to network medicine.
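As a small textbook illustration of how a feedforward loop can produce an emergent behaviour (not one of the case studies from the talk), the sketch below simulates a coherent feedforward loop with AND logic in discrete time: X activates Y, and Z turns on only when both X and the one-step-delayed Y are on. Under these assumed Boolean dynamics, a brief pulse of X is filtered out while a sustained signal switches Z on. The function name is hypothetical.

```python
def simulate_coherent_ffl(x_signal):
    """Discrete-time Boolean coherent feedforward loop with AND logic.

    X -> Y and (X AND Y) -> Z, where Y responds to X with a one-step delay.
    Returns the trace of Z (0/1) for the given input trace of X.
    """
    y = 0  # Y starts off
    z_trace = []
    for x in x_signal:
        z = 1 if (x and y) else 0  # Z needs X now and Y (i.e. X one step earlier)
        z_trace.append(z)
        y = x  # Y follows X after one step
    return z_trace


print(simulate_coherent_ffl([0, 1, 0, 0]))  # brief pulse     -> [0, 0, 0, 0]
print(simulate_coherent_ffl([0, 1, 1, 1]))  # sustained input -> [0, 0, 1, 1]
```

Neither X nor Y alone switches Z on; the pulse-filtering behaviour exists only at the level of the circuit, which is a minimal example of an emergent property arising from component interactions.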

Kwang-Hyun Cho is a Professor and Head of the Department of Bio and Brain Engineering at KAIST and director of the Laboratory for Systems Biology and Bio-Inspired Engineering (http://sbie.kaist.ac.kr). He has received the IEEE/IEEK Joint Award for Young IT Engineer, the Young Scientist Award from the President of Korea, the Walton Fellow Award from Science Foundation Ireland, and the Distinguished International Scholar Invitation from the Chinese Academy of Sciences. He has been working on systems biology with biomedical applications, network medicine, complex network control and bio-inspired engineering. His innovative contributions to systems biology and bio-inspired engineering research, which combine an engineering approach with biochemical experimentation, have led to over 150 high-profile international journal publications. He is currently the Editor-in-Chief of IET Systems Biology (IET, London, U.K.).