Hoover, B., Strobelt, H. & Gehrmann, S., Submitted. exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models.

Large language models can produce powerful contextual representations that lead to improvements across many NLP tasks. Since these models are typically guided by a sequence of learned self-attention mechanisms and may comprise undesired inductive biases, it is paramount to be able to explore what the attention has learned. While static analyses of these models lead to targeted insights, interactive tools are more dynamic and can help humans better gain an intuition for the model-internal reasoning process. We present exBERT, an interactive tool named after the popular BERT language model, that provides insights into the meaning of the contextual representations by matching a human-specified input to similar contexts in a large annotated dataset. By aggregating the annotations of the matching similar contexts, exBERT helps intuitively explain what each attention head has learned.

Ziegler, Z., et al., Submitted. Encoder-Agnostic Adaptation for Conditional Language Generation.

Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks. However, it is an open question how to use similar techniques for language generation. Early results in the encoder-agnostic setting have been mostly negative. In this work, we explore methods for adapting a pretrained language model to arbitrary conditional input.


We observe that pretrained transformer models are sensitive to large parameter changes during tuning. Therefore, we propose an adaptation that directly injects arbitrary conditioning into self attention, an approach we call pseudo self attention. Through experiments on four diverse conditional text generation tasks, we show that this encoder-agnostic technique outperforms strong baselines, produces coherent generations, and is data-efficient.
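The injection described above can be illustrated with a minimal NumPy sketch of a single unbatched attention head. This is a hedged reconstruction, not the authors' implementation: the function name and the split between reused and newly learned projections (`Wq, Wk, Wv` vs. `Uk, Uv`) are illustrative, and the conditioning is assumed to be visible at every decoding step while the self-attention part stays causal.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_self_attention(y, x, Wq, Wk, Wv, Uk, Uv):
    """One attention head with conditioning injected into the
    key/value space of the pretrained decoder's self-attention.

    y:  (T, d) decoder hidden states (pretrained LM side)
    x:  (S, d) arbitrary conditioning representation
    Wq, Wk, Wv: pretrained LM projections, reused unchanged
    Uk, Uv:     new, learned projections for the conditioning
    """
    q = y @ Wq                                    # (T, d)
    k = np.concatenate([x @ Uk, y @ Wk], axis=0)  # (S+T, d)
    v = np.concatenate([x @ Uv, y @ Wv], axis=0)  # (S+T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (T, S+T)
    # Causal mask over the self-attention block only; the
    # conditioning columns are visible to every query position.
    T, S = y.shape[0], x.shape[0]
    mask = np.zeros((T, S + T))
    mask[:, S:] = np.triu(np.ones((T, T)), k=1) * -1e9
    return softmax(scores + mask) @ v             # (T, d)
```

Because only `Uk` and `Uv` are new, the pretrained parameters see inputs from the same distribution as during pretraining, which is one way to keep parameter changes small during tuning.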

Gehrmann, S., Ziegler, Z. & Rush, A.M., 2019. Generating Abstractive Summaries with Finetuned Language Models. In INLG 2019 (GenChal Track).
Neural abstractive document summarization is commonly approached by models that exhibit a mostly extractive behavior. This behavior is facilitated by a copy-attention which allows models to copy words from a source document. While models in the mostly extractive news summarization domain benefit from this inductive bias, they commonly fail to paraphrase or compress information from the source document. Recent advances in transfer-learning from large pretrained language models give rise to alternative approaches that do not rely on copy-attention and instead learn to generate concise and abstractive summaries. In this paper, as part of the TL;DR challenge, we compare the abstractiveness of summaries from different summarization approaches and show that transfer-learning can be efficiently utilized without any changes to the model architecture. We demonstrate that the approach leads to a higher level of abstraction for a similar performance on the TL;DR challenge tasks, enabling true natural language compression.
Gehrmann, S., et al., 2019. Visual Interactions with Deep Models through Collaborative Semantic Inference. IEEE Transactions on Visualization and Computer Graphics.
Automation of tasks can have critical consequences when humans lose agency over decision processes. Deep learning models are particularly susceptible since current black-box approaches lack explainable reasoning. We argue that both the visual interface and model structure of deep learning systems need to take interaction design into account. We propose a framework of collaborative semantic inference (CSI) for the co-design of interactions and models to enable visual collaboration between humans and algorithms. The approach exposes the intermediate reasoning process of models, which allows semantic interactions with the visual metaphors of a problem: a user can both understand and control parts of the model reasoning process. We demonstrate the feasibility of CSI with a co-designed case study of a document summarization system.
Zancanaro, M., et al., 2019. Evaluating an Automated Mediator for Joint Narratives in a Conflict Situation. Behaviour & Information Technology.
Joint narratives are often used in the context of reconciliation interventions for people in social conflict situations, which arise, for example, due to ethnic or religious differences. The interventions aim to encourage a change in attitudes of the participants towards each other. Typically, a human mediator is fundamental for achieving a successful intervention. In this work, we present an automated approach to support remote interactions between pairs of participants as they contribute to a shared story in their own language. A key component is an automated cognitive tutor that guides the participants through a controlled escalation/de-escalation process during the development of a joint narrative. We performed a controlled study comparing a trained human mediator to the automated mediator. The results demonstrate that an automated mediator, although simple at this stage, effectively supports interactions and helps to achieve positive outcomes comparable to those attained by the trained human mediator.
Gehrmann, S., Strobelt, H. & Rush, A.M., 2019. GLTR: Statistical Detection and Visualization of Generated Text. In ACL 2019 (Demo).
Süzgün, M., et al., 2019. LSTMs can Perform Dynamic Counting. In Deep Learning and Formal Languages (ACL 2019 Workshop).
Sercu, T., et al., 2019. Interactive Visual Exploration of Latent Space (IVELS) for Peptide Auto-Encoder Model Selection. In Deep Generative Models for Highly Structured Data (ICLR 2019 Workshop).
Gehrmann, S., Layne, S. & Dernoncourt, F., 2019. Improving Human Text Comprehension through Semi-Markov CRF-based Neural Section Title Generation. In North American Chapter of the Association for Computational Linguistics (NAACL 2019).
Titles of short sections within long documents support readers by guiding their focus towards relevant passages and by providing anchor-points that help to understand the progression of the document. The positive effects of section titles are even more pronounced when measured on readers with less developed reading abilities, for example in communities with limited labeled text resources. We therefore aim to develop techniques to generate section titles in low-resource environments. In particular, we present an extractive pipeline for section title generation by first selecting the most salient sentence and then applying deletion-based compression. Our compression approach is based on a Semi-Markov Conditional Random Field that leverages unsupervised word representations such as ELMo or BERT, eliminating the need for a complex encoder-decoder architecture. The results show that this approach is competitive with high-resource sequence-to-sequence models, while strongly outperforming them in low-resource settings. In a human-subject study across subjects with varying reading abilities, we find that our section titles improve the speed of completing comprehension tasks while retaining similar accuracy.
Elder, H., et al., 2018. Towards Controllable Generation of Diverse Natural Language. In INLG 2018 (Challenge Track).
In natural language generation (NLG), the task is to generate utterances from a more abstract input, such as structured data. An added challenge is to generate utterances that contain an accurate representation of the input, while reflecting the fluency and variety of human-generated text. In this paper, we report experiments with NLG models that can be used in task-oriented dialogue systems. We explore the use of additional input to the model to encourage diversity and control of outputs. While our submission does not rank highly using automated metrics, qualitative investigation of generated utterances suggests that the use of additional information in neural network NLG systems is a promising research direction.
Gehrmann, S., Deng, Y. & Rush, A.M., 2018. Bottom-Up Abstractive Summarization. In EMNLP 2018.
Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but perform poorly at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpora. Furthermore, the content selector can be trained with as little as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain.
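The bottom-up step described in the abstract can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the paper's code: the function name, the hard-threshold masking, and the renormalization fallback are assumptions about how a selector's token probabilities could constrain a copy-attention distribution at inference time.

```python
import numpy as np

def bottom_up_mask(copy_attn, select_prob, eps=0.2):
    """Constrain a summarizer's copy attention to source tokens the
    content selector marked as likely summary content.

    copy_attn:   (T, S) copy-attention distribution per decoding step
    select_prob: (S,)   selector probability that each source token
                        belongs in the summary
    eps:         selection threshold for the hard mask
    """
    keep = (select_prob > eps).astype(float)  # hard mask over source tokens
    masked = copy_attn * keep                 # zero out unselected tokens
    # Renormalize so each decoding step keeps a valid distribution;
    # if a step's mass is entirely masked, fall back to the original.
    z = masked.sum(axis=1, keepdims=True)
    return np.where(z > 0, masked / np.maximum(z, 1e-12), copy_attn)
```

The selector and the summarizer stay separate models, which is what makes the selector trainable on as little as 1,000 sentences while the generation model is left unchanged.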
Gehrmann, S., et al., 2018. End-to-End Content and Plan Selection for Data-to-Text Generation. In INLG 2018.
Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG. This problem can be challenging when the form of the structured data varies between examples. This paper presents a survey of several extensions to sequence-to-sequence models to account for the latent content selection process, particularly variants of copy attention and coverage decoding. We further propose a training method based on diverse ensembling to encourage models to learn distinct sentence templates during training. An empirical evaluation of these techniques shows an increase in the quality of generated text across five automated metrics, as well as in human evaluation.
Strobelt, H., et al., 2018. Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. IEEE Transactions on Visualization and Computer Graphics.
Sequence-to-sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work with a five-stage black-box pipeline that begins with encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction and "what if"-style exploration of trained sequence-to-sequence models through each stage of the translation process. The aim is to identify which patterns have been learned, to detect model errors, and to probe the model with counterfactual scenarios. We demonstrate the utility of our tool through several real-world sequence-to-sequence use cases on large-scale models.
Gehrmann, S., et al., 2018. End-to-End Content and Plan Selection for Natural Language Generation.

This paper describes our entry for the INLG 2018 E2E NLG challenge. Generating fluent natural language descriptions from structured data is a key sub-task for conversational agents. In the E2E NLG challenge, the task is to generate these utterances conditioned on multiple attributes and values. Our system utilizes several extensions to the general-purpose sequence-to-sequence (S2S) architecture to model the latent content selection process, particularly different variants of copy attention and coverage decoding. In addition, we propose a new training method based on diverse ensembling to encourage the model to learn latent plans in training. We empirically evaluate these techniques and show that the system increases the quality of generated text across five automated metrics. Out of a total of sixty submitted systems from 16 institutions, our best system ranks first in three of the five metrics, including ROUGE.

Strobelt, H., et al., 2018. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24 (1), pp. 667-676.
Gehrmann, S., et al., 2018. Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives. PLOS ONE, 13 (2), e0192360.
Wu, J.T., et al., 2017. Behind the Scenes: A Medical Natural Language Processing Project. International Journal of Medical Informatics.
Strobelt, H., et al., 2016. Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks.

Recurrent neural networks, and in particular long short-term memory networks (LSTMs), are a remarkably effective tool for sequence modeling that learn a dense black-box hidden representation of their sequential input. Researchers interested in better understanding these models have studied the changes in hidden state representations over time and noticed some interpretable patterns but also significant noise. In this work, we present LSTMVis, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics. The tool allows a user to select a hypothesis input range to focus on local state changes, to match these state changes to similar patterns in a large dataset, and to align these results with domain-specific structural annotations. We further show several use cases of the tool for analyzing specific hidden state properties on datasets containing nesting, phrase structure, and chord progressions, and demonstrate how the tool can be used to isolate patterns for further statistical analysis.

Gehrmann, S., et al., 2015. Deploying AI Methods to Support Collaborative Writing: a Preliminary Investigation. In CHI'15 Extended Abstracts on Human Factors in Computing Systems. ACM.
Many documents (e.g., academic papers, government reports) are typically written by multiple authors. While existing tools facilitate and support such collaborative efforts (e.g., Dropbox, Google Docs), these tools lack intelligent information sharing mechanisms. Capabilities such as "track changes" and "diff" visualize changes to authors, but do not distinguish between minor and major edits and do not consider the possible effects of edits on other parts of the document. Drawing collaborators' attention to specific edits and describing them remains the responsibility of authors. This paper presents our initial work toward the development of a collaborative system that supports multi-author writing. We describe methods for tracking paragraphs, identifying significant edits, and predicting parts of the paper that are likely to require changes as a result of previous edits. Preliminary evaluation of these methods shows promising results.