About
I am a researcher at the BioNLP group under NCBI/NLM/NIH. I received my MD degree from Tsinghua University in 2022. Prior to that, I was an exchange student at the University of Pittsburgh and Carnegie Mellon University. I got my BSc degree from Tsinghua in 2019. My long-term goal is to democratize biomedical knowledge by providing accurate, verifiable, and understandable information to everyone in need. Currently, I work on three topics regarding large language models for biomedicine:
Evaluating the medical capabilities of LLMs. In 2019, we released and characterized the first decoder-only biomedical language model, BioELMo. Our PubMedQA is one of the most commonly used benchmarks for evaluating LLMs in biomedicine. We also reported LLM hallucinations in information seeking, hidden flaws behind expert-level performance, and safety vulnerabilities under adversarial attacks.
Augmentation with retrieval and domain tools. We trained MedCPT, a state-of-the-art embedding model for biomedicine, using large-scale PubMed search logs. Our MedRAG toolkit and benchmark offer practical guidelines for retrieval-augmented generation in medicine. We released a team of biomedical AI agents, including GeneGPT, GeneAgent, and AgentMD.
Novel LLM applications in biomedicine. We are among the pioneers in using LLMs for patient-to-trial matching with TrialGPT, which won the NIH Director’s Challenge Awards. TrialGPT was covered by AUANews, FedScoop, and Azure Government. We wrote reviews on the opportunities and challenges of biomedical LLMs and on biomedical question answering.
I serve as an Associate Editor of the Journal of Medical Internet Research (JMIR), on the editorial board of the Journal of Biomedical Informatics (JBI), and on the editorial committee of a special issue of the Journal of the American Medical Informatics Association (JAMIA).
I am actively looking for potential collaborators and students on these topics. Please feel free to send me an email if you are interested in working with me.
News
- (11/24) TrialGPT: Matching patients to clinical trials with LLMs. Nat Commun
- (10/24) Primer: Demystifying LLMs for medicine.
- (09/24) MedCalc-Bench: The GSM8k in medicine. NeurIPS Datasets and Benchmarks
- (09/24) MedReview: LLMs for medical summarization. npj Digit Med
- (08/24) i-MedRAG: Improving MedRAG with iterative follow-up questions. PSB
- (07/24) Hidden Flaws: behind expert-level scores of GPT-4V in medicine. npj Digit Med
- (06/24) Safety: Adversarial attacks on LLMs in medicine.
- (05/24) GeneAgent: Language agents for gene set knowledge discovery.
- (05/24) MedRAG: Retrieval-augmented generation for medicine. ACL Findings
- (04/24) Trustworthy: Generative AI for evidence-based medicine. J Biomed Inform
- (02/24) AgentMD: Large-scale clinical tool learning.
- (02/24) GeneGPT: Augmenting LLMs with biomedical tools for QA. Bioinformatics
- (02/24) PubMed & Beyond: Biomedical literature search in the age of AI. eBioMedicine
- (01/24) Survey: LLMs for biomedicine and healthcare. Brief Bioinform
- (12/23) PMC-Patients: 167k open patient summaries. Sci Data
- (11/23) MedCPT: Foundation models for embedding bio-texts. Bioinformatics
- (05/23) Perspective: LLMs and medical literature search. J Am Soc Nephrol
- (04/23) LADER: Log-augmented biomedical literature search. ACM SIGIR
- (03/23) RAMM: Retrieval-augmented visual QA in biomedicine. ACM Multimedia
- (09/22) Joined Dr. Zhiyong Lu’s BioNLP Group at NCBI/NLM/NIH.