Experiences

Medical Student

School of Medicine, Tsinghua University

[Video (Chinese)] [Program Intro]

Our program aims to train so-called 'physician-scientists'. Most (if not all) graduates of this program will practice medicine in top hospitals in China. Obviously, I might be an exception.

August 2014 - Present

Research Assistant

DBMI of UPitt, MLD of CMU
Partners HealthCare Biobank Disease Challenge [Announcement]

Clinical decision support systems powered by artificial intelligence (AI) have the potential to facilitate accurate diagnosis of diseases and improve the quality of healthcare for patients. However, computational clinical decision making based solely on electronic health records (EHRs) remains challenging because of the heterogeneity of patients, missing information in reports, and the lack of well-annotated documents. We participated in the first Biobank Disease Challenge, an AI and machine learning data analytics competition consisting of two tasks: Task 1 is to develop computational phenotyping algorithms, and Task 2 is an open-ended task seeking novel insights from AI. Overall, our results tied for 1st place in the Biobank Disease Challenge.

For Task 1, we design a model that uses semi-supervised learning to learn informative feature representations from the training EHRs and to predict the states of certain diseases on the test EHRs. We first learn representations of the raw features from all patients, both unlabeled and labeled, by principal component analysis. Then, using the learned representations of the labeled patients, we train a logistic classifier that accurately categorizes the patient cohorts by disease state. The model also remains interpretable, since we can calculate the relative importance of each raw feature for the classification.
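The two-step idea can be sketched on synthetic data. This is a minimal illustration, not the challenge code: the three toy features, the single principal component found by power iteration, and all function names are assumptions made for the example.

```python
import math
import random

def first_component(rows, iters=200):
    """Mean and top principal direction of `rows`, found by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(row, mean, v):
    """Coordinate of one patient along the learned direction."""
    return sum((x - m) * c for x, m, c in zip(row, mean, v))

def train_logistic(zs, ys, lr=0.5, epochs=500):
    """Fit p(y=1|z) = sigmoid(w*z + b) by batch gradient descent."""
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * z + b)))
            gw += (p - y) * z
            gb += p - y
        w -= lr * gw / len(zs)
        b -= lr * gb / len(zs)
    return w, b

# Synthetic stand-in for EHR features: the disease shifts features 0 and 1,
# while feature 2 is pure noise.
rng = random.Random(0)

def make_patient(label):
    return [3.0 * label + rng.gauss(0, 0.2),
            3.0 * label + rng.gauss(0, 0.2),
            rng.gauss(0, 0.2)]

labels = [i % 2 for i in range(10)]
labeled = [make_patient(y) for y in labels]
unlabeled = [make_patient(i % 2) for i in range(40)]

# Step 1: learn the representation from all patients, labeled and unlabeled.
mean, v = first_component(labeled + unlabeled)
# Step 2: train the classifier on the labeled patients only.
w, b = train_logistic([project(r, mean, v) for r in labeled], labels)

def predict(row):
    return int(w * project(row, mean, v) + b > 0)

predictions = [predict(r) for r in labeled]
# The decision score is linear in the raw features, so |w * v[j]| indicates
# how strongly raw feature j moves the classification.
importance = [abs(w * c) for c in v]
```

Because both steps are linear, the importance of a raw feature is just its loading on the component scaled by the classifier weight; in this toy setup the noise feature should receive the smallest value.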

For Task 2, we formulate the medical treatment procedure as a Markov decision process and develop an intelligent agent that learns from EHRs the best treatment option for each patient with migraine headache. The agent is optimized with deep reinforcement learning, using the interval until the next hospital visit as the reward signal.
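As a rough illustration of this formulation, the sketch below runs tabular Q-learning on a hypothetical two-state toy problem. It swaps the deep network for a lookup table, and the states, treatments, and visit intervals are invented for the example rather than taken from the challenge data.

```python
import random

# Hypothetical toy MDP: state 0 = frequent attacks, state 1 = well controlled.
# TRANSITIONS[state][action] = (next_state, days_until_next_visit)
TRANSITIONS = {
    0: {0: (0, 10.0), 1: (1, 30.0)},
    1: {0: (1, 60.0), 1: (0, 20.0)},
}

def q_learning(steps=20000, lr=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(q[state]))       # explore
        else:
            action = max(q[state], key=q[state].get)  # exploit
        next_state, reward = TRANSITIONS[state][action]
        # Reward is the interval until the next visit: longer is better.
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += lr * (target - q[state][action])
        state = next_state
    return q

q = q_learning()
# Greedy treatment choice per state after training.
policy = {s: max(q[s], key=q[s].get) for s in q}
```

In this toy problem the learned policy moves a patient out of the frequent-attack state and then keeps the treatment that yields the longest interval between visits.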

Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning [bioRxiv] [pdf]

The growing volume of scientific literature in biological and biomedical research makes continuous and reliable curation of newly discovered knowledge a challenge, and automatic biomedical text-mining has been one of the answers to it. In this paper, we aim to further improve the reliability of biomedical text-mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative artificially intelligent reader that can automatically identify authentic articles and effectively acquire the knowledge they convey.

We construct a system whose current primary task is to build a database of genetic associations between genes and complex human traits. Our contributions in this paper are three-fold: 1) We propose to improve the reliability of text-mining by building a system that directly simulates the behavior of a researcher, and we develop the corresponding methods, such as a bi-directional LSTM for text mining and a Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example of constructing a genetic association database. 3) We release our implementation as a generic framework so that researchers in the community can conveniently construct other databases.

AttentionMeSH: Interpretable and Scalable Automatic MeSH Indexer [pdf] [code] [slides]

There are millions of articles in the PubMed database. To facilitate information retrieval, curators at the National Library of Medicine (NLM) assign a set of Medical Subject Headings (MeSH) to each article. MeSH is a hierarchically organized vocabulary containing about 28K different concepts, covering fields from clinical medicine to information sciences. Several automatic MeSH indexing models have been developed to reduce the time and cost of manual annotation, including the official NLM tool -- Medical Text Indexer, and the winner of the BioASQ Task5a challenge -- DeepMeSH. However, these models are complex and not interpretable. We propose a novel end-to-end model, AttentionMeSH, which uses deep learning and an attention mechanism to assign MeSH terms to biomedical text. The attention mechanism enables the model to associate textual evidence with annotations, thus providing interpretability at the word level. The model also uses a novel masking mechanism to enhance accuracy and speed. In the final week of BioASQ Challenge Task6a, we ranked 2nd by average MiF using a model that was still under development. After the contest, we achieved close to state-of-the-art MiF performance of ~0.684 using our final model. Human evaluations show that AttentionMeSH also provides a high level of interpretability, retrieving about 90% of all expert-labeled relevant words for a given MeSH-article pair when outputting 20 words.
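How attention can expose word-level evidence for a label is sketched below with generic dot-product attention. This is not the AttentionMeSH architecture; the 2-d vectors, the example sentence, and the function names are hypothetical and hand-picked for illustration.

```python
import math

def attention_weights(label_vec, word_vecs):
    """Softmax over dot products between one label vector and each word vector."""
    scores = [sum(l * w for l, w in zip(label_vec, vec)) for vec in word_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_words(words, weights, k):
    """The k words that receive the most attention for this label."""
    ranked = sorted(zip(words, weights), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:k]]

# Hypothetical embeddings; a trained model would learn these.
words = ["patients", "with", "migraine", "received", "triptans"]
word_vecs = [[0.2, 0.1], [0.0, 0.0], [2.0, 0.5], [0.1, 0.3], [1.0, 0.2]]
label_vec = [1.5, 0.0]  # stand-in for one MeSH term

weights = attention_weights(label_vec, word_vecs)
evidence = top_words(words, weights, k=2)
```

The weights form a distribution over the words of the text, so the highest-weighted words can be read as the textual evidence the model associates with that label.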

Neural Models for Reasoning over Multiple Mentions [arXiv] [pdf] [code]

Many problems in NLP require aggregating information from multiple mentions of the same entity, which may be far apart in the text. Existing Recurrent Neural Network (RNN) layers are biased towards short-term dependencies and hence not suited to such tasks. We present a recurrent layer which is instead biased towards coreferent dependencies. The layer uses coreference annotations extracted from an external system to connect entity mentions belonging to the same cluster. Incorporating this layer into a state-of-the-art reading comprehension model improves performance on three datasets -- WikiHop, LAMBADA and the bAbI AI tasks -- with large gains when training data is scarce.

October 2017 - Present

Undergraduate Research Assistant

Department of Automation, Tsinghua University
In silico re-deciphering of the genetic code [Report]

My first deep learning project, carried out in January 2017. I applied a biological seq2seq model (translating mRNA to protein) and re-deciphered the genetic code in silico.
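For reference, the mapping such a model would have to recover is the standard genetic code. The sketch below does not reproduce the seq2seq model (the report is not included here); it only builds the standard codon table and translates an mRNA string, and the function name `translate` is chosen for the example.

```python
from itertools import product

BASES = "UCAG"
# Amino acids for the 64 codons in UCAG order (standard genetic code); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

peptide = translate("AUGGCCUGGUAA")  # AUG GCC UGG UAA
```

Pairs of mRNA strings and their translations produced this way are the kind of input-output data from which a sequence model could, in principle, infer the codon-to-amino-acid mapping.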

Permutation clustering of the DNA sequence facilitates understanding of the nonlinearly organized genome [Proposal]

I applied a permutation-based clustering approach to Hi-C data and identified several interesting patterns. [Slides of the proposal]

September 2016 - August 2017

Member of Tsinghua 2016 iGEM Team

School of Life Sciences, Tsinghua University
A CRISPR/Cas9-based gene surveillance system in S. cerevisiae [Project Website]

Genetic mutations are the foundation of cancer development. We propose to construct a suvCas9 system in Saccharomyces cerevisiae that monitors the genomic sequence. The system has the potential to prevent carcinogenesis by killing all mutated cells. [This video] introduces how our system works.

This project won a Gold Medal in iGEM 2016.

December 2015 - November 2016

Undergraduate Research Assistant

School of Aerospace, Tsinghua University
Building of an orthopedics surgical robot with visual servo system [Patent]

We built a robot with a visual servo system that can perform fracture restoration surgery. I was in charge of the computer vision part. We hold one patent in China and have applied for several more Chinese and international patents.

October 2016 - August 2017

Undergraduate Research Assistant (Student Leader)

School of Medicine, Tsinghua University
Based on a sequenced pedigree: a novel TTN truncating variant related to dilated cardiomyopathy

We reported a case of DCM in which genetic testing identified a novel familial truncating mutation in the TTN gene. We believe it follows an autosomal-dominant inheritance pattern with low penetrance. Genetic testing of the family members identified the patient's son, brother and brother's daughter as carriers, but they are not clinically affected. We speculate that the patient's unfavorable lifestyle may contribute to the disease. Our findings may provide some insights into the prevention, diagnosis and management of DCM with similar mutations, and demonstrate the importance of genetic screening of family members after an index patient is found.

January 2016 - August 2017

Xuetang Talent

Tsinghua University

Guess where I am?

February 2016 - August 2017

Class President

Class Bio47, Tsinghua University

I focused on building a good academic atmosphere during my term. The average GPA of the class increased by 2 percent. We also won the 'Jiatuan' of Tsinghua University, a top honor awarded only to the best classes in the university.

And yes - we do have a term limit!

September 2015 - September 2016

Intern

Peking Union Medical College Hospital & Beijing Tsinghua Changgung Hospital

Shadowing program: Dept. of Neurosurgery and Dept. of Infectious Disease at Peking Union Medical College Hospital (北京协和医院); multiple important departments at Beijing Tsinghua Changgung Hospital (北京清华长庚医院).

July 2016

Team Leader

Healper, a 'pre-start-up' company

A Beijing Undergraduate Training Program for Innovation and Entrepreneurship project funded by the Government of Beijing. Healper is a questionnaire-based chatbot for health management. Our project was rated excellent by the committee.

August 2016 - August 2017