Top-performing Natural Language Processing Model
Stanford Question Answering Dataset 2.0
2nd Place
Abstract
Released in 2016, the Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is a segment of text, or span, from the corresponding reading passage. Model performance on SQuAD surpassed human performance in 2018. In response, the SQuAD2.0 challenge was released, which combines the 100,000 questions in SQuAD1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, a system must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. The Layer 6 NLP Team developed a model that ranked 2nd on the leaderboard (as of March 28, 2019).
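To make the answer-or-abstain requirement concrete, the following is a minimal sketch of one common way to decide between returning a span and abstaining. It is not the Layer 6 model, whose details are not described here. The function name pick_answer, the per-token start and end scores, the null_score for "no answer", and the threshold are all hypothetical names chosen for illustration.

```python
from typing import List, Optional


def pick_answer(
    tokens: List[str],
    start_scores: List[float],
    end_scores: List[float],
    null_score: float,
    threshold: float = 0.0,
    max_span_len: int = 30,
) -> Optional[str]:
    """Return the best-scoring span, or None to abstain.

    All inputs are assumed to come from some span-prediction model;
    start_scores and end_scores have one entry per token.
    """
    best_score = float("-inf")
    best_span = None

    # Score every span (i, j) with i <= j and at most max_span_len tokens.
    for i, start in enumerate(start_scores):
        for j in range(i, min(i + max_span_len, len(tokens))):
            score = start + end_scores[j]
            if score > best_score:
                best_score = score
                best_span = (i, j)

    # Abstain when "no answer" beats the best span by more than the threshold.
    if best_span is None or null_score - best_score > threshold:
        return None

    i, j = best_span
    return " ".join(tokens[i : j + 1])
```

In a sketch like this, the threshold would typically be tuned on held-out data so that the system abstains on unanswerable questions without giving up too many answerable ones.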
