About

I am a researcher at the BioNLP group under NLM/NIH, supported by the NIH Pathway to Independence (K99/R00) Award. I received my MD degree from Tsinghua University in 2022. Prior to that, I got my BSc degree from Tsinghua in 2019. My long-term goal is to democratize biomedical knowledge by providing accurate, verifiable, and understandable information to everyone in need. Currently, I work on language modeling for biomedicine:

Evaluating the medical capabilities of LLMs. In 2019, we released and characterized the first decoder-only biomedical language model, BioELMo. Our PubMedQA is one of the most commonly used benchmarks for evaluating LLMs in biomedicine. We also reported on LLMs' hallucinations in information seeking, hidden flaws behind expert-level performance, and safety vulnerabilities under adversarial attacks.

Augmentation with retrieval and domain tools. We trained MedCPT, a state-of-the-art embedding model for biomedicine, using large-scale PubMed search logs. Our MedRAG toolkit and benchmark offer practical guidelines for retrieval-augmented generation in medicine. We released a team of biomedical AI agents, including GeneGPT, GeneAgent, and AgentMD.

Novel LLM applications in biomedicine. We are one of the pioneers in using LLMs for patient-to-trial matching with TrialGPT, which won the NIH Director’s Challenge Awards. TrialGPT was covered by POLITICO, Nature, AUANews, and Azure Government. We also wrote reviews on the opportunities and challenges of biomedical LLMs and on biomedical question answering.

I serve as an Area Chair for the ACL Rolling Review and as an Associate Editor of the Journal of Medical Internet Research (JMIR). I am also on the editorial board of the Journal of Biomedical Informatics (JBI) and on the editorial committee of a special issue of the Journal of the American Medical Informatics Association (JAMIA).

I am actively looking for potential collaborators and students on these topics. Please feel free to send me an email if you are interested in working with me. I also welcome inquiries from institutions seeking R00-supported tenure-track faculty in medical AI, biomedical informatics, or related areas.

News

(10/25) AgentMD: Large-scale clinical tool learning. Nat Commun
(10/25) Safety: Adversarial attacks on LLMs in medicine. Nat Commun
(09/25) I received the NIH Pathway to Independence (K99/R00) Award!
(07/25) GeneAgent: Language agents for gene set knowledge discovery. Nat Methods
(06/25) MedCite: Medical text generation with citations. ACL Findings
(06/25) Cell-o1: Large reasoning model for single cell analysis.
(06/25) Medical AI Evaluation: Thoughts on OpenAI’s HealthBench and Beyond.
(05/25) BriefContext: Improving medical RAG with MapReduce. npj Digit Med
(04/25) Benchmark: LLMs for BioNLP tasks. Nat Commun
(03/25) Evaluation: Quantitative info on LLM-generated DDx. npj Digit Med
(02/25) RAG-Gym: Improving agentic RAG with reasoning.
(11/24) TrialGPT: Patient-to-trial matching with AI. Editors’ Highlights at Nat Commun
(10/24) Primer: Demystifying LLMs for medicine.
(09/24) MedCalc-Bench: The GSM8k in medicine. NeurIPS Datasets and Benchmarks
(09/24) MedReview: LLMs for medical summarization. npj Digit Med
(08/24) i-MedRAG: Improving MedRAG with iterative follow-up questions. PSB
(07/24) Hidden Flaws: behind expert-level scores of GPT-4V in medicine. npj Digit Med
(05/24) MedRAG: Retrieval-augmented generation for medicine. ACL Findings
(02/24) GeneGPT: Augmenting LLMs with biomedical tools for QA. Bioinformatics
(02/24) PubMed & Beyond: Biomedical literature search in the age of AI. eBioMedicine
(01/24) Survey: LLMs for biomedicine and healthcare. Brief Bioinform
(12/23) PMC-Patients: 167k open patient case reports. Sci Data
(11/23) MedCPT: Foundation models for embedding bio-texts. Bioinformatics
(04/23) LADER: Log-augmented biomedical literature search. ACM SIGIR
(03/23) RAMM: First visual RAG system for medical QA. ACM Multimedia
(09/22) Joined Dr. Zhiyong Lu’s BioNLP Group at NCBI/NLM/NIH.

Qiao Jin
