Skim Literature (SkimLit) 📜
Project Description
In this project, I replicated the deep learning model behind the 2017 paper PubMed 200k RCT: a
Dataset for Sequential Sentence Classification in Medical Abstracts and added further modelling
experiments of my own.
The paper introduced a new dataset, PubMed 200k RCT, which consists of ~200,000
labelled Randomized Controlled Trial (RCT) abstracts.
What is an RCT?
Randomized controlled trials (RCTs) are prospective studies that measure the effectiveness of a new
intervention or treatment. Although no study is likely on its own to prove causality, randomization reduces
bias and provides a rigorous tool to examine cause-effect relationships between an intervention and outcome.
The goal of the dataset was to explore the ability of NLP models to classify sentences that appear in
sequential order.
In other words, given the abstract of an RCT, what role does each sentence serve in the abstract?
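To make the task concrete, here is a minimal sketch of how one abstract could be turned into per-sentence training samples. It assumes the plain-text layout used by the public PubMed RCT files (an `###<id>` line, then one `LABEL<TAB>sentence` line per sentence, then a blank line) and the five labels BACKGROUND, OBJECTIVE, METHODS, RESULTS and CONCLUSIONS. The function name `parse_abstracts`, the dictionary keys, and the toy abstract in `EXAMPLE` are illustrative and not taken from the project.

```python
from typing import Dict, List, Tuple

# Toy abstract written for illustration only (not real dataset content).
EXAMPLE = (
    "###0000\n"
    "OBJECTIVE\tTo test whether treatment A reduces pain .\n"
    "METHODS\tPatients were randomized to treatment A or placebo .\n"
    "RESULTS\tPain scores were lower in the treatment A group .\n"
    "\n"
)


def parse_abstracts(text: str) -> List[Dict]:
    """Split raw text into one sample per sentence, keeping its position in the abstract."""
    samples: List[Dict] = []
    current: List[Tuple[str, str]] = []  # (label, sentence) pairs of the abstract being read

    def flush() -> None:
        # Emit every sentence of the finished abstract along with its position.
        total = len(current)
        for i, (label, sentence) in enumerate(current):
            samples.append(
                {"target": label, "text": sentence, "line_number": i, "total_lines": total}
            )
        current.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("###") or not line:
            flush()  # an ID line or a blank line marks an abstract boundary
        elif "\t" in line:
            label, sentence = line.split("\t", 1)
            current.append((label, sentence))
    flush()  # handle a final abstract with no trailing blank line
    return samples


samples = parse_abstracts(EXAMPLE)
for sample in samples:
    print(sample["line_number"], sample["target"], sample["text"])
```

Keeping `line_number` and `total_lines` next to each sentence is one way to give a model access to the sentence's position in the abstract, which is what makes the classification sequential rather than sentence-by-sentence in isolation.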
- View Project Implementation: