Biomedical and Healthcare Natural Language Processing

CS532, Spring 2022

University of Illinois Chicago


Objective

In the recent past, there has been a dramatic revolution in Artificial Intelligence (AI). AI has reshaped our interaction with machines on various fronts, whether in conversations with chatbots, face recognition, or autonomous driving. Natural Language Processing (NLP), a sub-field of AI, shows a similar trend in the healthcare and biomedical domains. With the rapid digitization of medical records, an exponential rise in biomedical literature, and the growing interest in patient interaction with social media, there have been significant advances across several biomedical and healthcare NLP problems. From curating biological information to automating health surveillance of disease outbreaks, there have been a number of success stories in various biomedical NLP applications.

This seminar course is designed to familiarize students with cutting-edge research in biomedical and healthcare NLP. It will provide a systematic introduction to several biomedical, clinical, and healthcare problems; data processing from different sources, including clinical narratives, social media, and the biomedical literature; paper discussions; and research projects that aim to solve real-life healthcare and biomedical problems. The key topics to be covered are biomedical/clinical information extraction, semantics and biomedical/healthcare knowledge graphs, biomedical/healthcare question answering, disease prediction and progression, multi-modal biomedical NLP, summarization, dialogue generation in healthcare and medical domains, and modeling conversations in the healthcare domain.



Time

Monday and Wednesday from 15:00-16:15 CST (Thomas Beckham Hall 180C)
The first two weeks of instruction will be online.

Office Hours

Friday from 11:00-12:00 CST (Virtual)

Piazza

https://piazza.com/uic/spring2022/cs532
Textbook and Readings
  • Biomedical Natural Language Processing, Kevin Bretonnel Cohen and Dina Demner-Fushman
  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Daniel Jurafsky and James H. Martin, available online
Grading Policy
  • Project: proposal, presentations and final paper (50%)
  • Paper presentations (20%)
  • Paper critique (10%)
  • Class participation, discussion, and brainstorming (20%)
Prerequisites
  • CS 421 (Natural Language Processing) or CS 521 (Statistical Natural Language Processing) or other equivalent NLP class, and
  • CS 533 (Deep Learning for NLP)

Coursework

The course consists of:

  1. Paper Presentation
    In each paper presentation class, there will be two presentations (20 minutes each) on the pre-defined research topic. Each student has to present at least two papers in the course.

  2. Paper Discussion Session
    The paper presentation will be followed by a 10-minute paper discussion session. One participant will argue in favor of the paper, and one will argue against it. Each student has to lead the discussion twice (once in favor of and once against the paper) throughout the course.

  3. Paper Critique
    Each student has to submit a detailed assessment of three research papers (not more than two pages) that will consist of (i) the key contributions of the paper, (ii) its main strengths, and (iii) its weaknesses.

  4. Project
    The final project provides you with the opportunity to apply your newly acquired skills towards solving real-life biomedical and healthcare problems. A team of two students has to submit a project at the end of the coursework. The deliverables for the final project include:

    • Proposal Submission
      Each team needs to provide a one-page proposal by Feb 4. The proposal should outline your research objectives, explain them, and describe your plans for pursuing them.

    • Project Presentation
      Each team has to present twice in the semester: at the beginning (week 4) and at the end (week 14). In the first presentation, the team needs to present its research problem, motivation, a plan to tackle the challenge, and a timeline to complete the project. The final presentation will focus more on the methodology, experimental results, analysis, and discussion of the research project.

    • Final Paper
      Each team must write a final report in the paper format of an NLP conference (e.g., ACL, NAACL, AAAI). The paper should have an abstract, introduction, clear motivation, contributions, related work, proposed method, comparison with other baselines, results, and analysis.


Schedule

This is a tentative schedule and is subject to change.

Week Topics Readings and useful links
Week 1
Introduction to Biomedical/Healthcare NLP: background, challenges, applications
  1. Chapter 1 from Cohen and Demner-Fushman's book
Week 2
Biomedical NLP case study: n2c2 (formerly i2b2) and BioCreative challenges
  1. National NLP Clinical Challenges (n2c2)
  2. BioCreative Tasks
Week 3
Biomedical NLP systems: UMLS, Stanza, MetaMap, cTAKES
  1. The Unified Medical Language System (UMLS): integrating biomedical terminology
  2. Biomedical and clinical English model packages for the Stanza Python NLP library
  3. An overview of MetaMap: historical perspective and recent advances
  4. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications
Week 4
Project Proposal
Week 5
Biomedical/Clinical Information Extraction
  1. A Multi-Task Approach for Improving Biomedical Named Entity Recognition by Incorporating Multi-Granularity Information
  2. Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text
  3. BioMegatron: Larger Biomedical Domain Language Model
  4. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Week 6
Semantics and Biomedical/Healthcare Knowledge Graph
  1. Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference
  2. SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining
  3. The impact of learning Unified Medical Language System knowledge embeddings in relation extraction from biomedical texts
  4. Incorporating medical knowledge in BERT for clinical relation extraction
Week 7
Biomedical and Healthcare Question Answering
  1. External Features Enriched Model for Biomedical Question Answering
  2. PubMedQA: A Dataset for Biomedical Research Question Answering
  3. Consumer Health Information and Question Answering: Helping Consumers Find Answers to Their Health-related Information Needs
  4. Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset
  5. Towards Automating Healthcare Question Answering in a Noisy Multilingual Low-Resource Setting
  6. Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering
  7. Towards Medical Machine Reading Comprehension with Structural Knowledge and Plain Text
Week 8
Disease Prediction and Progression
  1. Med-BERT: Pretrained Contextualized Embeddings on Large-scale Structured Electronic Health Records for Disease Prediction
  2. Automated Monitoring of Tweets for Early Detection of the 2014 Ebola Epidemic
  3. Predicting Mortality in Critically ill Patients with Diabetes using Machine Learning and Clinical Notes
  4. Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes
Week 9
Multimodal Biomedical NLP
  1. MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification
  2. MedFuseNet: An Attention-based Multimodal Deep Learning Model for Visual Question Answering in the Medical Domain
  3. Fusion of Medical Imaging and Electronic Health Records using Deep Learning: A Systematic Review and Implementation Guidelines
Week 10
Biomedical Document and Healthcare Records Summarization
  1. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
  2. SumPubMed: Summarization Dataset of PubMed Scientific Articles
  3. What Happened to Me while I Was in the Hospital? Challenges and Opportunities for Generating Patient-Friendly Hospitalization Summaries
  4. Generating SOAP Notes from Doctor-Patient Conversations Using Modular Summarization Techniques
  5. MS^2: Multi-Document Summarization of Medical Studies
  6. Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports
  7. Goal Summarization for Human-Human Health Coaching Dialogues
Week 11
Dialogue Generation in Healthcare and Medical Domains
  1. MedDialog: Large-scale Medical Dialogue Datasets
  2. Would you like to tell me more? Generating a corpus of psychotherapy dialogues
  3. Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach
Week 12
Modeling Conversations in the Healthcare Domain
  1. A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support
  2. A Quantitative Analysis of Patients’ Narratives of Heart Failure
  3. Summarizing Behavioral Change Goals from SMS Exchanges to Support Health Coaches
  4. Human-Human Health Coaching via Text Messages: Corpus, Annotation, and Analysis
  5. Modeling Dialogue in Conversational Cognitive Health Screening Interviews
Week 13
The Role of Social Media in the Healthcare Domain
  1. It Takes Two to Empathize: One to Seek and One to Provide
  2. Identifying Depressive Symptoms from Tweets: Figurative Language Enabled Multitask Learning Framework
  3. Identifying Medical Self-Disclosure in Online Communities
  4. Modeling Self-Disclosure in Social Networking Sites
  5. The Channel Matters: Self-disclosure, Reciprocity and Social Support in Online Cancer Support Groups
  6. CancerEmo: A Dataset for Fine-Grained Emotion Detection
Week 14
Final Project Presentation
Week 15
Paper Due
Note
Students may select papers outside the list above. However, each such paper has to be approved by the course instructor one week before the paper-discussion week.