[Video (Chinese)] [Program Intro]
Our program aims to train so-called 'physician-scientists'. Most (if not all) graduates of this program will practice medicine in China's top hospitals. Obviously, I might be one exception.
Clinical decision support systems powered by artificial intelligence (AI) have the potential to facilitate accurate diagnosis of diseases and improve the quality of healthcare for patients. However, computational clinical decision making based solely on electronic health records (EHR) is still a challenging task due to the heterogeneity of patients, missing information in reports and the lack of well-annotated documents. We participated in the first Biobank Disease Challenge, an AI and machine learning data analytics competition consisting of two tasks: Task 1 is to develop computational phenotyping algorithms, and Task 2 is an open-ended task aiming for novel insights from AI. Overall, our results tied for 1st place in the Biobank Disease Challenge.
For Task 1, we design a model that uses semi-supervised learning to learn informative feature representations from the training EHRs and to predict the states of certain diseases on the test EHRs. We first learn representations of the raw features from all patients, both unlabeled and labeled, by principal component analysis. Then, using the learnt representations of the labeled patients, we train a logistic classifier that accurately categorizes the patient cohorts by disease state. The model remains interpretable, since we can calculate the relative importance of each raw feature for the classification.
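A minimal sketch of this two-stage idea, not our actual challenge code: PCA is fitted on every patient (labeled or not), a logistic classifier is trained on the labeled subset only, and the classifier weights are projected back to the raw features to get a rough importance score. The data is synthetic, and the helper names (`fit_pca`, `fit_logistic`), the number of components, and the learning-rate settings are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on all rows of X via SVD; return (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def fit_logistic(Z, y, lr=0.1, steps=2000):
    """Logistic regression by plain gradient descent; return (weights, bias)."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * Z.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(0)

# Synthetic stand-in for EHR features: 200 patients, 10 raw features.
# Only feature 0 is shifted by the (hypothetical) disease state.
y_all = rng.integers(0, 2, size=200)
X_all = rng.normal(size=(200, 10))
X_all[:, 0] += 3.0 * (2 * y_all - 1)

labeled = np.arange(40)        # pretend only 40 patients have labels
held_out = np.arange(40, 200)  # the rest are used for evaluation here

# Stage 1: representation learning on ALL patients (labels are not used).
mean, comps = fit_pca(X_all, n_components=3)
Z_all = (X_all - mean) @ comps.T

# Stage 2: supervised classifier on the labeled patients only.
w, b = fit_logistic(Z_all[labeled], y_all[labeled].astype(float))

pred = ((Z_all @ w + b) > 0).astype(int)
accuracy = float(np.mean(pred[held_out] == y_all[held_out]))

# Interpretability: map classifier weights back to the raw feature space.
raw_importance = np.abs(comps.T @ w)
top_feature = int(np.argmax(raw_importance))
print(f"held-out accuracy: {accuracy:.3f}, most important raw feature: {top_feature}")
```

Because both stages are linear, the product `comps.T @ w` gives one weight per raw feature, which is one simple way to read off relative feature importance.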
For Task 2, we formulate the medical treatment procedure as a Markov decision process and develop an intelligent agent that learns, from EHR, the best treatment option for each patient with migraine headache. The agent is optimized using deep reinforcement learning, with the interval until the next hospital visit serving as the reward signal.
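To make the formulation concrete, here is a toy sketch that swaps the deep network for a small Q-table (tabular Q-learning), which is enough to show the idea on two states. The states, treatments, transition rules and visit intervals below are all invented for illustration and are not taken from the challenge data.

```python
import random

# Hypothetical toy MDP: every name and number here is made up.
STATES = ["mild", "severe"]
ACTIONS = ["drug_A", "drug_B"]

# (state, action) -> (next state, days until the next hospital visit)
DYNAMICS = {
    (0, 0): (0, 30.0),  # mild + drug_A: stays mild, long interval
    (0, 1): (0, 10.0),  # mild + drug_B: stays mild, short interval
    (1, 0): (1, 5.0),   # severe + drug_A: stays severe, returns quickly
    (1, 1): (0, 20.0),  # severe + drug_B: improves to mild
}

def train(episodes=500, horizon=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; reward is the next-visit interval."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(episodes):
        s = rng.randrange(len(STATES))  # each episode starts from a random state
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))                   # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])  # exploit
            s_next, reward = DYNAMICS[(s, a)]
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

q_table = train()
policy = {
    STATES[s]: ACTIONS[max(range(len(ACTIONS)), key=lambda i: q_table[s][i])]
    for s in range(len(STATES))
}
print(policy)
```

In a deep reinforcement learning setting, the table `q` would be replaced by a neural network that maps patient features from the EHR to one value per treatment option.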
The increasing amount of scientific literature in biological and biomedical research has made continuous and reliable curation of the latest discoveries a challenge, and automatic biomedical text-mining has been one of the answers to this challenge. In this paper, we aim to further improve the reliability of biomedical text-mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative artificially intelligent reader that can automatically identify authentic articles and effectively acquire the knowledge conveyed in them.
We construct a system whose current primary task is to build a database of genetic associations between genes and complex human traits. Our contributions in this paper are three-fold: 1) We propose to improve the reliability of text-mining by building a system that can directly simulate the behavior of a researcher, and we develop corresponding methods, such as a bi-directional LSTM for text mining and a Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example of constructing a genetic association database. 3) We release our implementation as a generic framework that researchers in the community can conveniently use to construct other databases.
There are millions of articles in the PubMed database. To facilitate information retrieval, curators at the National Library of Medicine (NLM) assign a set of Medical Subject Headings (MeSH) to each article. MeSH is a hierarchically organized vocabulary containing about 28K different concepts, covering fields from clinical medicine to information sciences. Several automatic MeSH indexing models have been developed to improve on the time-consuming and financially expensive manual annotation, including the official NLM tool -- Medical Text Indexer, and the winner of the BioASQ Task5a challenge -- DeepMeSH. However, these models are complex and not interpretable. We propose a novel end-to-end model, AttentionMeSH, which uses deep learning and an attention mechanism to assign MeSH terms to biomedical text. The attention mechanism enables the model to associate textual evidence with annotations, thus providing interpretability at the word level. The model also uses a novel masking mechanism to enhance accuracy and speed. In the final week of the BioASQ Challenge Task6a, we ranked 2nd by average MiF using a model that was still under construction. After the contest, we achieved close to state-of-the-art MiF performance of ~0.684 using our final model. Human evaluations show that AttentionMeSH also provides a high level of interpretability, retrieving about 90% of all expert-labeled relevant words for a given MeSH-article pair with 20 output words.
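For readers unfamiliar with the metric: MiF here is taken to mean the micro-averaged F-measure, which pools true positives, false positives and false negatives over all articles before computing precision and recall. A small sketch follows; the function name `micro_f1` and the example label sets are made up for illustration and are not real annotations.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over a list of (gold, predicted) label sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels correctly predicted
        fp += len(p - g)   # predicted but not in the gold annotation
        fn += len(g - p)   # in the gold annotation but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Two hypothetical articles with illustrative MeSH-style labels.
gold = [{"Humans", "Neoplasms"}, {"Mice", "Genomics"}]
pred = [{"Humans", "Mutation"}, {"Mice", "Genomics"}]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

Pooling counts across articles means that articles with many MeSH terms weigh more than articles with few, which is the main difference from averaging per-article F-scores.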
Many problems in NLP require aggregating information from multiple mentions of the same entity which may be far apart in the text. Existing Recurrent Neural Network (RNN) layers are biased towards short-term dependencies and hence not suited to such tasks. We present a recurrent layer which is instead biased towards coreferent dependencies. The layer uses coreference annotations extracted from an external system to connect entity mentions belonging to the same cluster. Incorporating this layer into a state-of-the-art reading comprehension model improves performance on three datasets -- WikiHop, LAMBADA and the bAbI AI tasks -- with large gains when training data is scarce.
My first deep learning project, carried out in January 2017. I tried a biological seq2seq model (translating mRNA to protein) and re-deciphered the genetic code in silico.
I tried a permutation-based clustering approach on Hi-C data and identified several interesting patterns. [Slides of the proposal]
Genetic mutations are the foundation of cancer development. We propose to construct a suvCas9 system in Saccharomyces cerevisiae that functions as a monitor of the genomic sequence. The system has the potential to prevent carcinogenesis by killing all the mutated cells. [This video] introduces how our system works.
This project won a Gold Medal in iGEM 2016.
We built a robot with a visual servo system that can perform fracture restoration surgery. I was in charge of the computer vision part. We hold one patent in China and have applied for several more Chinese and international patents.
We reported a case of dilated cardiomyopathy (DCM) in which genetic testing identified a novel familial truncating mutation in the TTN gene. We believe it follows an autosomal-dominant inheritance pattern with low penetrance. Genetic testing of the family members identified the patient's son, brother and brother's daughter as carriers, but they are not clinically affected. We speculate that the patient's unfavorable lifestyle may have contributed to the disease. Our findings may provide some insights into the prevention, diagnosis and management of DCM with similar mutations, and they demonstrate the importance of genetic screening of family members after an index patient is found.
Guess where I am?
I focused on building a good academic atmosphere during my term. The class's average GPA increased by 2 percent. We also won the 'Jiatuan' of Tsinghua University, a top honor reserved for the best classes in the university.
And yes - we do have a term limit!
Shadowing program: Dept. of Neurosurgery and Dept. of Infectious Disease at Peking Union Medical College Hospital (北京协和医院); multiple important departments at Beijing Tsinghua Changgung Hospital (北京清华长庚医院).
A Beijing Undergraduate Training Program for Innovation and Entrepreneurship project funded by the Government of Beijing. Healper is a questionnaire-based chatbot for health management. The committee graded our project as excellent.