I am a fifth-year M.D. candidate at the School of Medicine, Tsinghua University, currently visiting the Department of Biomedical Informatics at the University of Pittsburgh. I am interested in deep learning for natural language processing in biomedical fields. My advisor is Prof. Xinghua Lu, and I am also supervised by Prof. Lu's collaborator Prof. William Cohen at CMU. I previously worked with Prof. Xuegong Zhang at Tsinghua University.
My dream is to create AGI in the biomedical field, which will empower human doctors and researchers with the Unreasonable Effectiveness of Data. I also dream of contributing to the revolution of medical education in the era of AI. Moreover, I am enthusiastic about democratizing medicine, i.e., making medical knowledge, resources and techniques available to everyone who needs them.
Overall GPA 93.27/100, Rank 1/32
Our program aims to cultivate so-called 'Physician Scientists'. Most (if not all) of the graduates from this program will practice medicine in top hospitals in China. Obviously, I might be an exception.
The increasing amount of scientific literature in biological and biomedical research has made continuous and reliable curation of the latest discoveries a challenge, and automatic biomedical text-mining has been one of the answers to this challenge. In this paper, we aim to further improve the reliability of biomedical text-mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative artificial intelligence reader that can automatically identify authentic articles and effectively acquire the knowledge they convey.
We construct a system whose current primary task is to build a database of genetic associations between genes and human complex traits. Our contributions in this paper are three-fold: 1) We propose to improve the reliability of text-mining by building a system that can directly simulate the behavior of a researcher, and we develop corresponding methods, such as a Bi-directional LSTM for text mining and a Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example of constructing a genetic association database. 3) We release our implementation as a generic framework that researchers in the community can use to conveniently construct other databases.
There are millions of articles in the PubMed database. To facilitate information retrieval, curators at the National Library of Medicine (NLM) assign a set of Medical Subject Headings (MeSH) to each article. MeSH is a hierarchically organized vocabulary containing about 28K different concepts and covering fields from clinical medicine to information sciences. Several automatic MeSH indexing models have been developed to improve on the time-consuming and financially expensive manual annotation, including the NLM official tool -- Medical Text Indexer, and the winner of the BioASQ Task5a challenge -- DeepMeSH. However, these models are complex and not interpretable. We propose a novel end-to-end model, AttentionMeSH, which utilizes deep learning and an attention mechanism to index MeSH terms to biomedical text. The attention mechanism enables the model to associate textual evidence with annotations, thus providing interpretability at the word level. The model also uses a novel masking mechanism to enhance accuracy and speed. In the final week of BioASQ Challenge Task6a, we ranked 2nd by average MiF using a model that was still under development. After the contest, we achieved close to state-of-the-art MiF performance of ~0.684 using our final model. Human evaluations show that AttentionMeSH also provides a high level of interpretability, retrieving about 90% of all expert-labeled relevant words for a given MeSH-article pair with 20 output words.
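For readers unfamiliar with the MiF score mentioned above, here is a minimal sketch of a micro-averaged F-measure over per-article label sets. It assumes the usual definition (true positives, false positives and false negatives pooled across all articles before computing precision and recall); the function name `micro_f1` and the toy MeSH labels are illustrative and are not taken from the AttentionMeSH code or the official BioASQ evaluator.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F-measure over per-article label sets.

    Counts are pooled over all articles first, so articles with many
    labels weigh more than articles with few labels.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0         # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up annotations for two articles.
gold_labels = [{"Humans", "Neoplasms"}, {"Mice"}]
pred_labels = [{"Humans", "Genomics"}, {"Mice"}]
score = micro_f1(gold_labels, pred_labels)
print(round(score, 3))
```

In this toy case the pooled counts are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3.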
Many problems in NLP require aggregating information from multiple mentions of the same entity which may be far apart in the text. Existing Recurrent Neural Network (RNN) layers are biased towards short-term dependencies and hence not suited to such tasks. We present a recurrent layer which is instead biased towards coreferent dependencies. The layer uses coreference annotations extracted from an external system to connect entity mentions belonging to the same cluster. Incorporating this layer into a state-of-the-art reading comprehension model improves performance on three datasets -- WikiHop, LAMBADA and the bAbI AI tasks -- with large gains when training data is scarce.
My first deep learning project, carried out in January 2017. I tried a biological seq2seq model (translating mRNA to protein) and re-deciphered the genetic code in silico.
I tried a permutation-based clustering approach on Hi-C data and identified several interesting patterns. [Slides of the proposal]
Genetic mutations are the foundation of cancer development. We propose to construct a suvCas9 system in Saccharomyces cerevisiae that monitors the genomic sequence. The system has the potential to prevent carcinogenesis by killing all the mutated cells. [This video] gives an introduction to how our system works.
This project won a Gold Medal in iGEM 2016.
We built a robot with a visual servo system that can perform fracture restoration surgery. I was in charge of the computer vision part. We obtained a patent in China and have applied for additional Chinese and international patents.
We reported a case of dilated cardiomyopathy (DCM) in which genetic testing identified a novel familial truncating mutation in the TTN gene. We believe it follows an autosomal-dominant inheritance pattern with low penetrance. Genetic testing of the family members identified the patient's son, brother and brother’s daughter as carriers, but they are not clinically affected. We speculate that the patient's unfavorable lifestyle may have contributed to the disease. Our findings may provide some insights into the prevention, diagnosis and management of DCM with similar mutations, and they demonstrate the importance of genetic screening of family members after an index patient is found.
Guess where I am?
I focused on building a good academic atmosphere during my term. The average GPA of the class increased by 2 percent. We also won the 'Jiatuan' of Tsinghua University, a top honor awarded only to the best classes in the university.
And yes - we do have a term limit!
Shadowing program: Dept. of Neurosurgery and Dept. of Infectious Disease at Peking Union Medical College Hospital (北京协和医院); multiple important departments at Beijing Tsinghua Changgung Hospital (北京清华长庚医院).
A Beijing Undergraduate Training Program for Innovation and Entrepreneurship project funded by the Government of Beijing. Healper is a chat robot for health management based on questionnaires. Our project was graded excellent by the committee.
I have been deeply influenced and inspired by Richard Feynman and Linus Pauling ever since senior high school.
I enjoy long-distance running, a habit inspired by my hero Prof. Yigong Shi.
I love music. Cheer Chen 陳綺貞 is my favorite singer and songwriter.
I'm a traveler-for-life and travelling is part of me. Here are the places I've visited: