Biomedical and Healthcare Natural Language Processing

CS532, Spring 2022

University of Illinois Chicago

Objective

In the recent past, there has been a dramatic revolution in Artificial Intelligence (AI), reshaping our interaction with machines on many fronts, from face recognition to autonomous driving. Natural Language Processing (NLP), a sub-field of AI, has changed how we communicate with machines, for example through conversations with chatbots. A similar trend is also observed in the healthcare and biomedical domains. With the rapid digitization of medical records, the exponential growth of the biomedical literature, and patients' growing engagement with social media, there have been significant advances across many biomedical and healthcare NLP problems. From curating biological information to automating health surveillance of disease outbreaks, biomedical NLP applications have produced a number of success stories.

This seminar course is designed to familiarize students with cutting-edge research in biomedical and healthcare NLP. It provides a systematic introduction to biomedical/clinical/healthcare problems and to processing data from different sources, including clinical narratives, social media, and the biomedical literature. The course combines paper discussions with research projects that aim to solve real-life healthcare/biomedical problems. The key topics to be covered are biomedical/clinical information extraction, semantics and biomedical/healthcare knowledge graphs, biomedical/healthcare question answering, disease prediction and progression, multimodal biomedical NLP, summarization, dialogue generation in the healthcare and medical domains, and modeling conversations in the healthcare domain.

Time	Monday and Wednesday, 15:00-16:15 CST (Thomas Beckham Hall 180C). The first two weeks of instruction will be online.
Office Hours	Friday from 11:00-12:00 CST (Virtual)
Piazza	https://piazza.com/uic/spring2022/cs532
Textbook and Readings	Biomedical Natural Language Processing, Kevin Bretonnel Cohen and Dina Demner-Fushman; Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Daniel Jurafsky and James H. Martin (available online)
Grading Policy	Project: proposal, presentations, and final paper (50%); Paper presentations (20%); Paper critique (10%); Class participation, discussion, and brainstorming (20%)
Prerequisites	CS 421 (Natural Language Processing), CS 521 (Statistical Natural Language Processing), or an equivalent NLP course; and CS 533 (Deep Learning for NLP)

Coursework

The course consists of:

Paper Presentation
Each paper presentation class will have two presentations (20 minutes each) on that week's predefined research topic. Each student must present at least two papers during the course.
Paper Discussion Session
Each paper presentation will be followed by a 10-minute discussion session, in which one participant argues in favor of the paper and another argues against it. Each student must lead the discussion twice during the course (once in favor of a paper and once against a paper).
Paper Critique
Each student must submit a detailed assessment of three research papers (no more than two pages) covering (i) the key contributions of the paper, (ii) its main strengths, and (iii) its weaknesses.
Project
The final project provides you with the opportunity to apply your newly acquired skills to real-life biomedical and healthcare problems. Students work in teams of two and submit the project at the end of the course. The deliverables for the final project include:
- Proposal Submission
  Each team needs to submit a one-page proposal by Feb 4. The proposal should outline and explain your research objectives and describe your plans for pursuing them.
- Project Presentation
  Each team presents twice during the semester: at the beginning (week 4) and at the end (week 14). In the first presentation, the team describes its research problem, motivation, plan to tackle the challenge, and timeline for completing the project. The final presentation focuses on the methodology, experimental results, analysis, and discussion of the research project.
- Final Paper
  Each team must write a final report in the paper format of an NLP or AI conference (e.g., ACL, NAACL, AAAI). The paper should include an abstract, an introduction with a clear motivation and contributions, related work, the proposed method, a comparison with baselines, results, and analysis.

Schedule

This is a tentative schedule and is subject to change.

Week	Topics	Readings and useful links
Week 1	Introduction to Biomedical/Healthcare NLP: background, challenges, applications	Chapter 1 from Cohen and Demner-Fushman's book
Week 2	Biomedical NLP case study: n2c2 (formerly i2b2) challenge and BioCreative challenge	National NLP Clinical Challenges (n2c2); BioCreative Tasks
Week 3	Biomedical NLP systems: UMLS, Stanza, MetaMap, cTAKES	The Unified Medical Language System (UMLS): integrating biomedical terminology; Biomedical and clinical English model packages for the Stanza Python NLP library; An overview of MetaMap: historical perspective and recent advances; Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications
Week 4	Project Proposal
Week 5	Biomedical/Clinical Information Extraction	A Multi-Task Approach for Improving Biomedical Named Entity Recognition by Incorporating Multi-Granularity Information; Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text; BioMegatron: Larger Biomedical Domain Language Model; Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Week 6	Semantics and Biomedical/Healthcare Knowledge Graph	Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference; SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining; The impact of learning Unified Medical Language System knowledge embeddings in relation extraction from biomedical texts; Incorporating medical knowledge in BERT for clinical relation extraction
Week 7	Biomedical and Healthcare Question Answering	External Features Enriched Model for Biomedical Question Answering; PubMedQA: A Dataset for Biomedical Research Question Answering; Consumer Health Information and Question Answering: Helping Consumers Find Answers to Their Health-related Information Needs; Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset; Towards Automating Healthcare Question Answering in a Noisy Multilingual Low-Resource Setting; Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering; Towards Medical Machine Reading Comprehension with Structural Knowledge and Plain Text
Week 8	Disease Prediction and Progression	Med-BERT: Pretrained Contextualized Embeddings on Large-Scale Structured Electronic Health Records for Disease Prediction; Automated Monitoring of Tweets for Early Detection of the 2014 Ebola Epidemic; Predicting Mortality in Critically ill Patients with Diabetes using Machine Learning and Clinical Notes; Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes
Week 9	Multimodal Biomedical NLP	MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification; MedFuseNet: An Attention-based Multimodal Deep Learning Model for Visual Question Answering in the Medical Domain; Fusion of Medical Imaging and Electronic Health Records using Deep Learning: A Systematic Review and Implementation Guidelines; Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes
Week 10	Biomedical Document and Healthcare Records Summarization	A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents; SumPubMed: Summarization Dataset of PubMed Scientific Articles; What Happened to Me while I Was in the Hospital? Challenges and Opportunities for Generating Patient-Friendly Hospitalization Summaries; Generating SOAP Notes from Doctor-Patient Conversations Using Modular Summarization Techniques; MS^2: Multi-Document Summarization of Medical Studies; Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports; Goal Summarization for Human-Human Health Coaching Dialogues
Week 11	Dialogue Generation in Healthcare and Medical Domains	MedDialog: Large-scale Medical Dialogue Datasets; Would you like to tell me more? Generating a corpus of psychotherapy dialogues; Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach
Week 12	Modeling Conversations in the Healthcare Domain	A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support; A Quantitative Analysis of Patients’ Narratives of Heart Failure; Summarizing Behavioral Change Goals from SMS Exchanges to Support Health Coaches; Human-Human Health Coaching via Text Messages: Corpus, Annotation, and Analysis; Modeling Dialogue in Conversational Cognitive Health Screening Interviews
Week 13	The Role of Social Media in the Healthcare Domain	It Takes Two to Empathize: One to Seek and One to Provide; Identifying Depressive Symptoms from Tweets: Figurative Language Enabled Multitask Learning Framework; Identifying Medical Self-Disclosure in Online Communities; Modeling Self-Disclosure in Social Networking Sites; The Channel Matters: Self-disclosure, Reciprocity and Social Support in Online Cancer Support Groups; CancerEmo: A Dataset for Fine-Grained Emotion Detection
Week 14	Final Project Presentation
Week 15	Final Paper Due

Note

Students may select papers outside the list above. However, any such paper must be approved by the course instructor one week before its paper-discussion week.