Explore 514 Papers
Title Summary Venue Year
Unsupervised Class-Specific Abstractive Summarization of Customer Reviews The paper proposes a model for large-scale unsupervised abstractive summarization of customer reviews in e-commerce. The model addresses the challenge of reducing generic and uninformative content and producing useful information related to specific product aspects by modeling reviews in the context of topical classes of interest. The proposed model can generate class-specific summaries from multiple reviews of each product without ground-truth summaries, using only class probabilities or labels. The model combines a generative variational autoencoder with a class-correlation gating mechanism and a hierarchical structure. Human evaluation shows that the generated summaries are relevant, fluent, and representative, and evaluation using a reference dataset shows that the model outperforms state-of-the-art abstractive and extractive baselines. ACL 2021
Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance The paper presents a new method for summarizing opinionated texts using a recursive Gaussian mixture model. The model generates sentences with tree-structured topic guidance, where the root sentence conveys generic content, and the leaf sentences describe specific topics. Experimental results show that the generated topic sentences are more informative and cover more of the input content than those generated by recent unsupervised summarization models. The paper also demonstrates that the variance of latent Gaussians represents the granularity of sentences, similar to Gaussian word embedding. TACL 2021
Unsupervised Opinion Summarization as Copycat-Review Generation The paper discusses the task of opinion summarization, which involves automatically creating summaries that reflect subjective information expressed in multiple documents, such as product reviews. While previous work has focused on selecting fragments from input reviews to produce a summary, the authors propose a generative model that can produce abstractive summaries by generating novel sentences. They consider the unsupervised setting, where no summaries are used in training, and define a hierarchical variational autoencoder model that can control the "amount of novelty" in new reviews. At test time, the model produces summaries that reflect consensus opinions by forcing the novelty to be minimal. Experiments on Amazon and Yelp datasets show that setting the review's latent code to its mean allows the model to produce fluent and coherent summaries. ACL 2020
Unsupervised document summarization using pre-trained sentence embeddings and graph centrality The paper proposes a simple and fast method for summarizing documents of any size using sentence embeddings produced by deep language models. This method is based on graph centrality and can satisfy any length constraint on the summaries produced. The proposed method performs competitively with more sophisticated supervised methods and can serve as a proxy for abstractive summarization techniques. NAACL 2021
Unsupervised Single Document Abstractive Summarization using Semantic Units The paper discusses the importance of content frequency in abstractive summarization and proposes a two-stage training framework for the model to learn the frequency of each semantic unit in the source text. The model is trained in an unsupervised manner and identifies sentences with high-frequency semantic units during inference to generate summaries. The model outperforms other unsupervised methods on the CNN/Daily Mail summarization task and achieves competitive ROUGE scores with fewer parameters than pre-trained models. It can be trained under low-resource language settings and is a potential solution for real-world applications where pre-trained models are not applicable. AACL 2022
BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle The paper proposes a new approach to unsupervised sentence summarization using the Information Bottleneck principle. The approach seeks a compressed sentence that can best predict the next sentence, using an iterative algorithm that gradually searches shorter subsequences of the given sentence. The method can efficiently perform extractive sentence summarization over a large corpus using only pretrained language models with no direct supervision. The paper also presents a new approach to self-supervised abstractive summarization, where a transformer-based language model is trained on the output summaries of the unsupervised method. Empirical results show that the extractive method outperforms other unsupervised models on multiple automatic metrics, and the self-supervised abstractive model outperforms unsupervised baselines by human evaluation along multiple attributes. EMNLP 2019
Simple Unsupervised Summarization by Contextual Matching The paper proposes an unsupervised method for sentence summarization using language modeling. The approach uses two language models, one generic and one specific to the target domain, and employs a product-of-experts criterion to maintain contextual matching and output fluency. The experiments show promising results for both abstractive and extractive summarization without the need for paired data. ACL 2019
Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking The paper presents a model for end-to-end abstractive summarization of product reviews without supervision. The model uses a discourse tree to represent the review, with the summary as the root and child sentences providing detailed explanations. The model recursively estimates parents from their children to learn the discourse tree and generate a concise summary. An architecture is introduced to rank the importance of each sentence on the tree and focus on the main review point. Experimental results show that the model outperforms other unsupervised approaches and achieves competitive performance with supervised models for long reviews. The induced tree demonstrates that child sentences provide additional information about their parent, and the generated summary abstracts the entire review. ACL 2019
TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising The paper proposes a transformer-based unsupervised abstractive summarization system called TED that pretrains on large-scale data using the lead bias in news articles. The system is then fine-tuned on target domains through theme modeling and a denoising autoencoder to enhance the quality of generated summaries. TED outperforms all unsupervised abstractive baselines on various datasets, and the summaries it generates are highly abstractive. The authors also show that each component of TED's objective function is effective. EMNLP 2020
Q-learning with Language Model for Edit-based Unsupervised Summarization The paper proposes a new approach for unsupervised text summarization that combines Q-learning with edit-based summarization. The method combines two modules to form an Editorial Agent and Language Model converter (EALM), where the agent predicts edit actions and the LM converter generates a summary based on the action signals. Q-learning is used to train the agent to produce proper edit actions. Experimental results show that EALM performs competitively compared to previous methods, even with no validation set. The approach also allows for the use of reinforcement learning techniques in unsupervised summarization. Qualitative analysis is conducted to provide insights for future research in unsupervised summarizers. EMNLP 2020
Unsupervised Extractive Text Summarization with Distance-Augmented Sentence Graphs The paper discusses the limitations of supervised summarization due to the high cost and difficulty of obtaining large quantities of human-generated summaries. It proposes an unsupervised approach to extractive text summarization using an automatically constructed sentence graph to select salient sentences based on similarities and relative distances. The approach is generalized from single-document to multi-document settings by aggregating document-level graphs via proximity-based cross-document edges. In experiments on benchmark datasets, the proposed approach achieved competitive or better results than previous state-of-the-art unsupervised extractive summarization methods in both single-document and multi-document settings, and its performance is competitive with strong supervised baselines. SIGIR 2021
SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents The paper discusses the need for tailored summaries based on the user's intent and how existing methods fall short when query interpretation is subjective. While several datasets exist for summarization with objective intents, no datasets exist for subjective intents where different users will provide different summaries. The authors present SUBSUME, the first dataset for evaluation of subjective summary extraction systems, containing 2,200 triplets over 48 Wikipedia pages with ten intents of varying subjectivity. The paper explores baseline algorithms for subjective extractive summarization and shows that example-based approaches better capture subjective intents than query-based ones, motivating further research on this challenging problem. EMNLP 2021
GenCompareSum: a hybrid unsupervised summarization method using salience The paper proposes a hybrid, unsupervised, abstractive-extractive approach for text summarization (TS) that generates salient textual fragments representing key points in a document and selects the most important sentences using BERTScore. The approach is evaluated on documents from the biomedical and general scientific domains and compared to existing unsupervised and supervised methods. The authors show that their approach outperforms existing methods despite not needing a vast amount of labelled training data. ACL 2022
Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization The paper proposes a Non-Autoregressive Unsupervised Summarization (NAUS) approach for generating short summaries without the need for parallel data. The approach involves edit-based search and training an encoder-only non-autoregressive Transformer based on the search result. The paper also introduces a dynamic programming approach for length-control decoding, which is important for the summarization task. Experiments on two datasets show that NAUS achieves state-of-the-art performance for unsupervised summarization and improves inference efficiency. Additionally, the algorithm is able to perform explicit length-transfer summary generation. ACL 2022
Cross-domain Aspect/Sentiment-aware Abstractive Review Summarization The paper proposes a model called CASAS for aspect/sentiment-aware abstractive review summarization in a domain adaptation scenario. The model leverages a domain classification task to recognize the domain information of texts and transfer knowledge from source domains to target domains. The experiments conducted on Amazon reviews show that CASAS outperforms other methods in both out-of-domain and in-domain setups. CIKM 2018
Ensure the Correctness of the Summary: Incorporate Entailment Knowledge into Abstractive Sentence Summarization The paper discusses the importance of correctness in sentence summarization and proposes a new approach that incorporates entailment knowledge into abstractive summarization models. The authors argue that a correct summary should not contain erroneous information with respect to the source sentence. They propose an entailment-aware encoder and decoder and use entailment Reward Augmented Maximum Likelihood (RAML) training. Experimental results show that their models outperform baselines in terms of informativeness and correctness. COLING 2018
Zero-Shot Aspect-Based Scientific Document Summarization using Self-Supervised Pre-training The paper explores the zero-shot setting for aspect-based scientific document summarization, which can improve document assistance systems and reader experience. However, current datasets have limited aspects, causing models to over-fit to specific domains. The authors establish baseline results for zero-shot performance and propose a self-supervised pre-training approach to enhance it. They create a biomedical aspect-based summarization dataset using PubMed structured abstracts and show promising results when pre-trained with unlabelled in-domain data. ACL 2022
An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework Based on Semantic Blocks The paper proposes an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR) framework for unsupervised long document summarization, which is based on semantic blocks. The framework addresses the problem that existing methods fail to consider efficiency and effectiveness at the same time when the input document is extremely long. The proposed method converts one-step ranking into a hierarchical, multi-granularity, two-stage ranking: the coarse-level stage splits the document into facet-aware semantic blocks and filters out insignificant blocks, and the fine-level stage selects salient sentences in each block and extracts the final summary from the selected sentences. The framework achieves new state-of-the-art unsupervised summarization results on Gov-Report and BillSum and runs 4 to 28 times faster than previous methods. COLING 2022
A Neural Attention Model for Abstractive Sentence Summarization The paper proposes a new approach to abstractive sentence summarization using a fully data-driven method. The method utilizes a local attention-based model that generates each word of the summary based on the input sentence. The model is simple in structure, but can be trained end-to-end and scaled to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared to other strong baselines. EMNLP 2015
Diversity driven Attention Model for Query-based Abstractive Summarization The paper proposes a model for query-based summarization that addresses the problem of repeated phrases in the summary. The model is based on the encode-attend-decode paradigm and includes a query attention model and a diversity-based attention model. The authors introduce a new query-based summarization dataset and show that their model outperforms vanilla encode-attend-decode models with a gain of 28% in ROUGE-L scores. ACL 2017
Abstractive Sentence Summarization with Attentive Recurrent Neural Networks The paper discusses a new method for abstractive sentence summarization, which generates a shorter version of a given sentence while preserving its meaning. The method uses a conditional recurrent neural network (RNN) with a novel convolutional attention-based encoder to ensure that the decoder focuses on the appropriate input words. The model relies on learned features and is easy to train on large data sets. The experiments show that the model outperforms the state-of-the-art method on the Gigaword corpus and performs competitively on the DUC-2004 shared task. NAACL 2016
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond The paper discusses the use of sequence-to-sequence recurrent neural networks (RNNs) for text summarization. It also explores various techniques for improving the performance of these models, such as attention mechanisms and pointer networks. The authors present experimental results on several benchmark datasets, demonstrating the effectiveness of their approach. They also discuss potential future directions for research in this area. CoNLL 2016
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network The paper discusses the limitations of convolutional neural networks (CNNs) in generating summaries of desired lengths for different scenarios with space or length constraints. To address this problem, the authors propose an approach to constrain the summary length by extending a convolutional sequence to sequence model. The results show that this approach generates high-quality summaries with user-defined length and outperforms baselines in terms of ROUGE score, length variations, and semantic similarity. EMNLP 2018
Selective Encoding for Abstractive Sentence Summarization The paper proposes a selective encoding model for abstractive sentence summarization, which includes a sentence encoder, a selective gate network, and an attention-equipped decoder. The model uses recurrent neural networks and constructs a second-level sentence representation for better performance. The model was evaluated on multiple datasets and outperformed state-of-the-art baseline models. ACL 2017
Deep Recurrent Generative Decoder for Abstractive Text Summarization The paper proposes a new framework for abstractive text summarization using a sequence-to-sequence oriented encoder-decoder model with a deep recurrent generative decoder. The model learns latent structure information from target summaries using a recurrent latent random model and neural variational inference. Abstractive summaries are generated using both generative latent variables and discriminative deterministic states. The model outperforms state-of-the-art methods on benchmark datasets in different languages. EMNLP 2017
A Pilot Study of Domain Adaptation Effect for Neural Abstractive Summarization The paper explores domain adaptation for neural abstractive summarization and investigates what information can be transferred to a new domain. The study finds that pre-training based on extractive summaries benefits the neural summarization model and that a combination of in-domain and out-of-domain setup yields better summaries when in-domain data is insufficient. The model is capable of selecting salient content even when trained on out-of-domain data, but requires in-domain data to capture the style for a target domain. EMNLP 2017
Abstractive Document Summarization with a Graph-Based Attentional Neural Model The paper discusses the challenges of abstractive document summarization and proposes a novel graph-based attention mechanism in the sequence-to-sequence framework to address the saliency factor of summarization. The experimental results show that the proposed model achieves considerable improvement over previous neural abstractive models and is competitive with state-of-the-art extractive methods. ACL 2017
A Deep Reinforced Model for Abstractive Summarization The paper discusses the limitations of current attentional, RNN-based encoder-decoder models for abstractive summarization on longer documents and introduces a new neural network model with a novel intra-attention and a training method that combines supervised word prediction and reinforcement learning. The resulting summaries are more readable and the model achieves an improved ROUGE-1 score on the CNN/Daily Mail dataset compared to previous state-of-the-art models. Human evaluation also shows that the model produces higher quality summaries. ICLR 2018
Faithful to the Original: Fact-Aware Neural Abstractive Summarization The paper discusses the problem of fake facts in abstractive summarization, where different parts of the source text are fused together. The authors propose a solution that leverages open information extraction and dependency parse technologies to extract actual fact descriptions from the source text, and a dual-attention sequence-to-sequence framework to generate summaries conditioned on both the source text and the extracted fact descriptions. Experiments show that their model can reduce fake summaries by 80%, while also improving informativeness. AAAI 2018
Generative Adversarial Network for Abstractive Text Summarization The paper proposes an adversarial process for abstractive text summarization, where a generative model and a discriminative model are simultaneously trained. The generator is built as an agent of reinforcement learning, while the discriminator attempts to distinguish the generated summary from the ground truth summary. The model achieves competitive ROUGE scores with state-of-the-art methods on the CNN/Daily Mail dataset and is able to generate more abstractive, readable, and diverse summaries. AAAI 2018
Boosting Few-Shot Abstractive Summarization with Auxiliary Tasks The paper discusses the challenge of summarization in niche domains and proposes a solution to the few-shot problem by designing auxiliary tasks to assist abstractive summarization. The authors use BART as the base sequence-to-sequence model and incorporate the main and auxiliary tasks under a multi-task framework. They also use a task-specific adapter and adaptive weight mechanism to adjust the contribution of auxiliary tasks to the main task. The experiments show the effectiveness of their method for few-shot datasets, and they propose pre-training the model on unlabeled datasets to further improve performance. CIKM 2021
Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation The paper discusses how models pretrained on large text corpora achieve state-of-the-art performance on English text summarization tasks, but fine-tuning them on new, niche domains is infeasible due to the requirement of hundreds of thousands of data points. The authors introduce a novel and generalizable method called WikiTransfer, which fine-tunes pretrained models for summarization in an unsupervised, dataset-specific manner using pseudo-summaries produced from generic Wikipedia data. WikiTransfer models achieve state-of-the-art, zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate effectiveness on three additional diverse datasets. The authors also employ data augmentation and introduce a regularization term to improve few-shot transfer performance. The paper further studies the effect of dataset aspects on transfer performance and evaluates the quality of output summaries using both automatic and human evaluation. NAACL 2021
Improving Neural Abstractive Document Summarization with Structural Regularization The paper discusses the limitations of current neural sequence-to-sequence models in document summarization and proposes a solution that leverages the structural information of both documents and multi-sentence summaries to improve performance. The proposed method involves incorporating structural-compression and structural-coverage regularization to capture the information compression and coverage properties of document summarization. Experimental results show that the proposed method significantly improves the performance of document summarization and outperforms current state-of-the-art neural abstractive methods. EMNLP 2018
Controllable Abstractive Summarization The paper discusses how current document summarization models do not take into account user preferences such as desired length, style, entities of interest, and how much of the document has been read. The authors propose a neural summarization model that allows users to specify these preferences, resulting in high quality summaries tailored to their needs. The system can also automatically set control variables and outperforms state-of-the-art abstractive systems on the CNN-Dailymail dataset. ACL 2018
Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting The paper proposes a summarization model that selects important sentences and rewrites them to create a concise summary. The authors use a new sentence-level policy gradient method to bridge the gap between two neural networks and achieve higher scores on all metrics, including human evaluation, on the CNN/Daily Mail dataset. The model also enables faster inference and training convergence than previous models, and it performs well on the DUC-2002 dataset. ACL 2018
Global Encoding for Abstractive Summarization The paper proposes a new global encoding framework to improve the conventional sequence-to-sequence model in neural abstractive summarization, which often suffers from repetition and semantic irrelevance. The framework controls the information flow from the encoder to the decoder based on the global information of the source context, using a convolutional gated unit to perform global encoding and improve the representations of the source-side information. Evaluations on two datasets show that the proposed model outperforms baseline models and is capable of generating higher quality summaries with reduced repetition. ACL 2018
Structure-Infused Copy Mechanisms for Abstractive Summarization The paper discusses the limitations of current summarization systems and proposes a new approach that incorporates source-side syntactic information to improve the quality of summaries. The approach uses structure-infused copy mechanisms to copy important words and relations from the source sentence to the summary sentence. Experimental results show that this approach is effective and outperforms state-of-the-art methods. COLING 2018
A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss The paper proposes a unified model that combines the strengths of extractive and abstractive summarization. The model uses sentence-level attention to modulate word-level attention, resulting in a more readable paragraph. The model also introduces a novel inconsistency loss function to penalize the inconsistency between the two levels of attention. With end-to-end training, the model achieves state-of-the-art ROUGE scores and, according to a human evaluation, produces the most informative and readable summaries on the CNN/Daily Mail dataset. ACL 2018
Aspect and Sentiment Aware Abstractive Review Summarization The paper discusses the lack of research on end-to-end abstractive review summarization, which is important for businesses and consumers to make informed decisions. The authors propose a mutual attention mechanism that learns the representations of context, sentiment, and aspect words within reviews, acting as an encoder. The learned representations are incorporated into the decoder to generate aspect/sentiment-aware review summaries via an attention fusion network. The abstractive summarizer is jointly trained with the text categorization task, which helps learn a category-specific text encoder. The experimental results on a real-life dataset show that their model outperforms other strong competitors. COLING 2018
Entity Commonsense Representation for Neural Abstractive Summarization The paper explores the use of linked entities to improve the performance of a neural text summarizer. The authors propose a module called Entity2Topic (E2T) that transforms a list of entities into a vector representation of the summary's topic. They use an off-the-shelf entity linking system (ELS) to extract linked entities, but resolve imperfections in the ELS by encoding entities with selective disambiguation and pooling entity vectors using firm attention. Applying E2T to a simple sequence-to-sequence model with attention mechanism results in significant improvements in the performance of the summarizer on the Gigaword and CNN datasets. NAACL 2018
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents The paper proposes a new model for abstractive summarization of longer-form documents, such as research papers. The model uses a hierarchical encoder to model the discourse structure of the document and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that the proposed model outperforms state-of-the-art models. NAACL 2018
Deep Communicating Agents for Abstractive Summarization The paper proposes a new approach to abstractive summarization using deep communicating agents in an encoder-decoder architecture. The task of encoding a long text is divided across multiple collaborating agents, each responsible for a subsection of the input text. These encoders are connected to a single decoder, trained using reinforcement learning to generate a focused and coherent summary. Empirical results show that this approach leads to higher quality summaries compared to several strong baselines. NAACL 2018
Guiding Generation for Abstractive Text Summarization Based on Key Information Guide Network The paper proposes a guiding generation model that combines extractive and abstractive methods for text summarization. The model uses a Key Information Guide Network (KIGN) to encode keywords and guide the generation process, and a prediction-guide mechanism to obtain long-term value for future decoding. The model is evaluated on the CNN/Daily Mail dataset and shows significant improvements compared to previous models. NAACL 2018
Frustratingly Easy Model Ensemble for Abstractive Summarization The paper discusses the effectiveness of ensemble methods for text-generation tasks, but notes that they often come with increased computational costs. The authors propose an alternative unsupervised ensemble method called post-ensemble, which selects a majority-like output in post-processing. The method is theoretically related to kernel density estimation based on the von Mises-Fisher kernel. Experimental results on a news headline-generation task show that the proposed method outperforms current ensemble methods. EMNLP 2018
Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling The paper proposes a new approach to document summarization that explicitly models and optimizes the information selection process. This is achieved through an information selection layer that includes global information filtering and local sentence selection. The approach is trained using distantly-supervised training guided by a gold summary. Experimental results show that this approach significantly improves document summarization performance and outperforms state-of-the-art neural abstractive methods. EMNLP 2018
Bottom-Up Abstractive Summarization The paper proposes a technique to improve the content selection of neural network-based methods for abstractive summarization. The technique involves using a data-efficient content selector to identify phrases in the source document that should be included in the summary. This selector is used as a bottom-up attention step to constrain the model to likely phrases, resulting in improved text compression and fluent summaries. The approach is simpler and higher performing than other end-to-end content selection models, and can be trained with as little as 1,000 sentences, making it easy to transfer to a new domain. The technique was shown to significantly improve ROUGE scores for both the CNN-DM and NYT corpora. EMNLP 2018
A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization The paper proposes a deep learning approach to automatic summarization that incorporates topic information into the ConvS2S model and uses self-critical sequence training (SCST) for optimization. Through a biased probability generation mechanism, the approach improves the coherence, diversity, and informativeness of generated summaries. Reinforcement training optimizes the model directly for the non-differentiable ROUGE metric and avoids exposure bias during inference. The method is evaluated on three datasets and shows superior performance in abstractive summarization. IJCAI 2018
Exploring Human-Like Reading Strategy for Abstractive Text Summarization The paper discusses the challenges of generating high-quality abstractive summaries with deep neural network-based methods and proposes a novel Hybrid learning model for Abstractive Text Summarization (HATS) that follows a hierarchical routine similar to the human reading strategy. HATS consists of three major components: a knowledge-based attention network, a multitask encoder-decoder network, and a generative adversarial network, which correspond to the different stages of human-like reading. Experimental results on two real-life datasets, CNN/Daily Mail and Gigaword, show that HATS achieves strong results. AAAI 2019
An Entity-Driven Framework for Abstractive Summarization The paper introduces SENECA, a new system for entity-driven coherent abstractive summarization that uses entity information to generate informative and coherent abstracts. The framework takes a two-step approach: an entity-aware content selection module identifies salient sentences, and an abstract generation module performs cross-sentence information compression and abstraction. The generation module is trained with rewards that promote coherence, conciseness, and clarity, and the two modules are further connected using reinforcement learning. Automatic evaluation shows that SENECA outperforms previous state-of-the-art systems on ROUGE and coherence measures on the New York Times and CNN/Daily Mail datasets, and human judges rate its summaries as more informative and coherent than those of popular summarization models. EMNLP 2019
BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization The paper proposes a new model called Bi-directional Selective Encoding with Template (BiSET) for summarizing articles. The model uses templates discovered from training data to select key information from source articles and guide the summarization process. The experiments conducted on a standard summarization dataset show that the BiSET model significantly improves the summarization performance and achieves a new state of the art. ACL 2019
Attention Optimization for Abstractive Document Summarization The paper discusses the importance of attention in improving document summarization models. The authors propose an attention refinement unit paired with a local variance loss that supervises the attention model at each decoding step, and a global variance loss that optimizes the attention distributions from a global perspective. The effectiveness of the proposed methods is verified through experiments on the CNN/Daily Mail dataset. EMNLP 2019
Contrastive Attention Mechanism for Abstractive Sentence Summarization The paper proposes a contrastive attention mechanism for abstractive sentence summarization, which includes both conventional attention that focuses on relevant parts of the source sentence and opponent attention that focuses on irrelevant or less relevant parts. The two are trained in opposite ways, encouraging the contribution of conventional attention and discouraging the contribution of opponent attention. Experiments show that the proposed mechanism focuses more on relevant parts and substantially improves on state-of-the-art performance for the task. The code is available on GitHub. EMNLP 2019
Aspect and Opinion Aware Abstractive Review Summarization with Reinforced Hard Typed Decoder The paper discusses a two-stage reinforcement learning approach for abstractive review summarization. The approach first predicts the type of the output word and then generates the final word distribution based on the predicted type. In experiments on two Amazon product review datasets, the method outperforms several strong baselines in terms of ROUGE scores. CIKM 2019
Concept Pointer Network for Abstractive Summarization The paper proposes a concept pointer network that improves abstractive summarization by generating new conceptual words to express concrete details. The network uses knowledge-based, context-aware conceptualizations to derive an extended set of candidate concepts, and then points to the most appropriate choice using both the concept set and the original source text. The model is optimized with a novel distantly supervised learning method guided by reference summaries and the test set. The proposed approach provides statistically significant improvements over several state-of-the-art models on both the DUC2004 and Gigaword datasets, and a human evaluation supports the quality of the summaries it produces. EMNLP 2019
How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing The paper introduces a model called Prototype Editing based Summary Generator (PESG) that uses prototype document-summary pairs to generate summaries that follow a particular style and its patterns. The model addresses two challenges: incorporating the patterns learned from the prototype while avoiding copying its irrelevant facts, and generating new summaries based on the summary pattern or the extracted facts. A fact checker estimates the mutual information between the input document and the generated summary, and PESG achieves state-of-the-art performance on both automatic metrics and human evaluations. EMNLP 2019
Summary Level Training of Sentence Rewriting for Abstractive Summarization The paper proposes a new training approach for Sentence Rewriting models, which combine extractive and abstractive summarization. Existing models in this framework rely on suboptimal labels, causing a mismatch between the training objective and the evaluation metric. The authors present a novel training signal that directly maximizes summary-level ROUGE scores through reinforcement learning, and they incorporate BERT into their model. They show that their model and training procedure achieve new state-of-the-art performance on both the CNN/Daily Mail and New York Times datasets and generalize better on the DUC-2002 test set. EMNLP 2019
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization The paper discusses the limitations of conventional reward measures for deep reinforcement learning in abstractive summarization, which can result in repetitive and incoherent sentences. The authors instead propose using distributional semantics to measure how well a generated summary matches the reference, which enables sentence-level evaluation and encourages semantically correct phrases. Based on human judgments on the Gigaword and CNN/Daily Mail datasets, the proposed distributional semantic reward (DSR) is shown to better capture the lexical and compositional diversity of natural language. EMNLP 2019
Improving Abstractive Document Summarization with Salient Information Modeling The paper proposes a Transformer-based encoder-decoder framework with two novel extensions for abstractive document summarization. The first extension is a focus-attention mechanism that places a Gaussian focal bias on attention scores to enhance the perception of local context, which helps produce salient and informative summaries. The second extension is an independent saliency-selection network that manages the information flow from encoder to decoder, effectively reducing the influence of secondary information on the generated summaries. Experimental results on the CNN/Daily Mail benchmark show that the proposed model outperforms other state-of-the-art baselines on the ROUGE metrics. ACL 2019
In Conclusion Not Repetition: Comprehensive Abstractive Summarization With Diversified Attention Based On Determinantal Point Processes The paper discusses the limitations of existing Seq2Seq models for abstractive summarization and introduces a new model called DivCNN Seq2Seq, which uses Determinantal Point Process methods to produce attention distributions that account for both quality and diversity. The new model achieves greater comprehensiveness than existing models and strong baselines without breaking the end-to-end architecture. Code and datasets for reproducing the results are available online. CONLL 2019
Abstractive Summarization of Reddit Posts with Multi-level Memory Networks The paper discusses a method for summarizing Reddit posts using multi-level memory networks. The authors propose a model that captures the important information in a post and generates a summary that accurately reflects its content. The model uses both word-level and sentence-level representations to capture the meaning of the post and the relationships between its parts. The authors evaluate the model on a dataset of TIFU (Today I Fucked Up) posts from Reddit and show that it outperforms several baseline methods in terms of ROUGE scores. NAACL 2019
Abstractive Text Summarization Based on Deep Learning and Semantic Content Generalization The paper presents a new method for improving abstractive text summarization using deep learning and semantic data transformations. The method involves using a theoretical model for semantic-based text generalization along with a deep encoder-decoder architecture to produce a summary in generalized form. The summary is then transformed into a human-readable form while retaining important information and addressing the problem of out-of-vocabulary or rare words. The approach is evaluated on two datasets with positive results. ACL 2019
Scoring Sentence Singletons and Pairs for Abstractive Summarization The paper discusses the challenge of summarizing text by both compressing single sentences and fusing pairs of sentences, since sentence selection methods work only with single sentences and not with combinations of them. The authors propose a framework that ranks sentence singletons and pairs together in a unified space, mimicking how humans summarize: select either a single sentence or a pair of sentences, then compress or fuse them into a summary sentence. The framework is tested on both single- and multi-document summarization datasets, and the authors report findings on both sentence selection and abstraction. ACL 2019
Neural Query-Biased Abstractive Summarization Using Copying Mechanism The paper discusses the query-biased summarization task, in which conventional approaches achieve better performance by including words that overlap between the source and the query in the summary. RNN-based approaches, however, do not explicitly model this phenomenon. The paper proposes an RNN-based query-biased summarizer that uses a copying mechanism to prioritize these overlapping words in the summary. Experimental results show that this strategy works well for neural query-biased summarizers. ECIR 2020
The Summary Loop: Learning to Write Abstractive Summaries Without Examples The paper presents a new approach to unsupervised abstractive summarization that maximizes coverage and fluency under a given length constraint. Coverage is measured by masking key terms in the original document and having a coverage model fill them in using the generated summary. The unsupervised training procedure uses both the coverage and fluency models to generate and score summaries. The method outperforms previous unsupervised methods by more than 2 ROUGE-1 points and approaches the results of competitive supervised methods. The model attains higher levels of abstraction, with shorter copied passages, and learns to compress and merge sentences without supervision. ACL 2020
Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization The paper discusses the limitations of the seq2seq network in identifying key regions of the source for text summarization. The authors propose a solution by augmenting salient ontological terms into the summarizer for clinical abstractive summarization. Their experiments on two clinical data sets show that their model significantly improves state-of-the-art results in terms of ROUGE metrics, which is important in the healthcare domain where any improvement can impact patients’ welfare. ACL 2020
Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward The paper discusses the limitations of current sequence-to-sequence models for abstractive summarization and proposes a new framework called ASGARD, which uses dual encoders and a reward system based on a multiple choice cloze test to better capture entity interactions and generate more informative summaries. The authors show that their models produce significantly higher ROUGE scores and are rated as more informative and containing fewer errors by human judges compared to other systems. ACL 2020
Composing Elementary Discourse Units in Abstractive Summarization The paper proposes a new method for abstractive summarization that operates on elementary discourse units (EDUs) instead of sentences. The method includes an EDU selection model that groups informative EDUs and an EDU fusion model that combines them into sentences. Reinforcement learning is used to improve summarization performance. The model was tested on CNN/Daily Mail and showed promising results. ACL 2020
Controlling the Amount of Verbatim Copying in Abstractive Summarization The paper discusses the challenge of creating abstracts that accurately summarize the original text without changing its meaning. It explores neural summarization models that generate summaries with varying degrees of copying, from purely extractive to highly generative. The authors present a method that allows copying to be controlled during both training and decoding, and demonstrate its effectiveness through extensive experiments. The paper also reports interesting and non-obvious findings about the summarization process. AAAI 2020
Keywords-Guided Abstractive Sentence Summarization This paper proposes an abstractive sentence summarization method that applies keyword guidance signals to both the encoder and the decoder of the sequence-to-sequence model. A multi-task learning framework is adopted to jointly learn to extract keywords and generate a summary for the input sentence. The authors apply keyword-guided selective encoding strategies that filter source information by modeling the interactions between the input sentence and the keywords. They extend the pointer-generator network with a dual-attention and a dual-copy mechanism, which integrate the semantics of the input sentence and the keywords and can copy words from both. The authors demonstrate that multi-task learning and keyword-oriented guidance benefit sentence summarization, achieving better performance than competitive models on the English Gigaword sentence summarization dataset. AAAI 2020
A Cascade Approach to Neural Abstractive Summarization with Content Selection and Fusion The paper presents an empirical study supporting the use of a cascade architecture for neural text summarization. The study shows that a pipeline architecture, which separately identifies important content pieces and stitches them together, performs comparably or better than end-to-end systems that perform content selection and surface realization jointly. The paper also discusses the challenges of evaluating summarization systems and suggests future research directions. AACL 2020
SemSUM: Semantic Dependency Guided Neural Abstractive Summarization The paper proposes a new approach to neural abstractive summarization that incorporates semantic dependency graphs to improve semantic relevance and reduce content deviation in generated summaries. The proposed model, SemSUM, leverages the information of original input texts and corresponding semantic dependency graphs to guide the summarization process. The model was evaluated on three datasets and showed significant improvements in automatic evaluation ROUGE metrics. AAAI 2020
Joint Parsing and Generation for Abstractive Summarization The paper proposes a solution to the problem of ungrammatical and inaccurate sentences produced by abstractive summarization systems. The proposed method generates a sentence and its syntactic dependency parse simultaneously, encouraging grammatical sentences that preserve the original meaning. The paper presents a novel neural architecture for abstractive summarization that combines a sequential decoder with a tree-based decoder, as well as a human evaluation protocol for assessing summary accuracy. The method is evaluated on several datasets and achieves competitive results against strong baselines. AAAI 2020
Self-Attention Guided Copy Mechanism for Abstractive Summarization The paper proposes a Transformer-based model to improve the copy mechanism in abstractive summarization. The model identifies the importance of each source word using degree centrality with a directed graph built by the self-attention layer. The centrality of each source word is used to guide the copy process explicitly, resulting in better performance than baseline methods on the CNN/Daily Mail and Gigaword datasets. ACL 2020
A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining The paper discusses the challenge of summarizing meeting transcripts and proposes a novel abstractive summarization network adapted to the meeting scenario. The network uses a hierarchical structure to accommodate long transcripts and a role vector to capture differences among speakers. Because meeting summary data are scarce, the model is pre-trained on large-scale news summarization data. Empirical results show that the proposed model outperforms previous approaches in both automatic metrics and human evaluation, increasing the ROUGE-1 score from 34.66% to 46.28% on the ICSI dataset. EMNLP 2020
Friendly Topic Assistant for Transformer Based Abstractive Summarization The paper discusses the use of topic models to improve the performance of Transformer-based models in abstractive document summarization. The proposed model, called topic assistant (TA), includes three modules and is compatible with various Transformer-based models. TA is user-friendly and only introduces a small number of extra parameters. Experimental results on three datasets demonstrate that TA is able to improve the performance of several Transformer-based models. EMNLP 2020
Reducing Quantity Hallucinations in Abstractive Summarization The paper discusses the issue of hallucination in abstractive summaries and proposes a solution, the HERMAN system. HERMAN learns to recognize and verify quantity entities (such as dates and numbers) in candidate summaries and up-ranks the summaries whose quantity terms are supported by the original text. Experimental results show higher precision and F1 scores for up-ranked summaries without a loss in recall, and human evaluation shows a preference for up-ranked summaries. EMNLP 2020
Rewards with Negative Examples for Reinforced Topic-Focused Abstractive Summarization This paper discusses the problem of generating abstractive summaries focused on a particular topic. The authors propose a deep reinforcement learning approach that uses a negative example baseline to teach the model what it should not focus on. They adapt existing datasets for this task and show that their approach outperforms a self-critical baseline on various evaluation metrics. EMNLP 2021
Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization The paper discusses the task of abstractive summarization, which involves generating a concise summary of input documents. The authors adapt the TP-TRANSFORMER architecture, which enriches the original Transformer with the Tensor Product Representation (TPR), to this task. The model encodes two separate representations for each token, one for its syntactic structure and one for its semantic content, and then binds them into the TPR as the layer output. The authors argue that this structured intermediate representation enables the model to better control the content and structure of the generated summary. The adapted TP-TRANSFORMER significantly outperforms both the Transformer and the original TP-TRANSFORMER on several abstractive summarization datasets, according to both automatic and human evaluations. The authors also demonstrate emergent structural information in the role vectors and improved syntactic interpretability in the TPR layer outputs. NAACL 2021
Multi-Fact Correction in Abstractive Text Summarization The paper discusses the challenges faced by system-generated abstractive summaries, which often contain factual inconsistencies. To address this issue, the authors propose SpanFact, a suite of two factual correction models that use knowledge from question answering models to correct errors in system-generated summaries. The models use single- or multi-masking strategies to replace entities, ensuring semantic consistency with the source text while retaining the syntactic structure of the summaries. Experiments show that SpanFact significantly improves the factual consistency of system-generated summaries without sacrificing summary quality. EMNLP 2020
On the Summarization of Consumer Health Questions The paper discusses the challenge of question understanding in question answering, particularly in the context of natural language questions that are longer than necessary and contain peripheral information. The authors study neural abstractive models for medical question summarization and introduce the MeQSum corpus of 1,000 summarized consumer health questions. They explore data augmentation methods and evaluate state-of-the-art neural abstractive models on this task. The authors show that semantic augmentation from question datasets improves performance and that pointer-generator networks outperform sequence-to-sequence attentional models, achieving a ROUGE-1 score of 44.16%. The paper also includes a detailed error analysis and suggestions for improving question summarization. ACL 2019
Factual Error Correction for Abstractive Summarization Models The paper discusses the challenge of ensuring factual consistency in abstractive summarization systems and proposes a post-editing corrector module to address this issue. The module is pre-trained on artificial examples created by applying heuristic transformations on reference summaries. Experimental results show that the model is able to correct factual errors in summaries generated by other neural summarization models and outperforms previous models on factual consistency evaluation on the CNN/DailyMail dataset. However, the paper also notes that transferring from artificial error correction to downstream settings is still challenging. EMNLP 2020
Pre-training for Abstractive Document Summarization by Reinstating Source Text The paper discusses the challenge of training large SEQ2SEQ-based summarization models on limited supervised summarization data and presents three sequence-to-sequence pre-training objectives for pre-training a SEQ2SEQ-based abstractive summarization model on unlabeled text. These objectives, sentence reordering, next sentence generation, and masked document generation, are closely related to the abstractive document summarization task. Experiments on two benchmark summarization datasets show that all three objectives improve performance over the baselines. Using only 19GB of text for pre-training, the method achieves results comparable to models pre-trained on large-scale data, demonstrating its effectiveness. Code and models are publicly available. EMNLP 2020
Controllable Abstractive Sentence Summarization with Guiding Entities The paper proposes a controllable abstractive sentence summarization model that generates summaries with guiding entities. The model ensures that the guiding entities appear in the final summaries and can generate more novel entities. The model is evaluated with fine-grained informativeness metrics from the perspectives of relevance, extraness, and omission. Experimental results show that the model outperforms state-of-the-art methods on both automatic evaluation scores and informativeness metrics. COLING 2020
BASS: Boosting Abstractive Summarization with Unified Semantic Graph The paper proposes a new framework called BASS for abstractive summarization of long or multi-document text, which is challenging for the Seq2Seq architecture because of its inability to analyze long-distance relations in text. BASS uses a unified semantic graph to aggregate co-referent phrases and convey rich relations between them. A graph-based encoder-decoder model is also proposed to improve document representation and summary generation by leveraging the graph structure. Several graph augmentation methods are designed to encode both explicit and implicit relations in the text, and a graph-propagation attention mechanism is developed in the decoder to select salient content for the summary. Empirical results show that BASS brings substantial improvements on both long-document and multi-document summarization tasks. ACL 2021
Leveraging Lead Bias for Zero-shot Abstractive News Summarization The paper proposes leveraging the lead bias in news articles to pre-train abstractive news summarization models on large-scale unlabeled news corpora. The authors collect a massive news corpus and conduct data cleaning and filtering via statistical analysis. They apply self-supervised pre-training on this dataset to existing generation models BART and T5 for domain adaptation. The approach dramatically improves the summarization quality and achieves state-of-the-art results for zero-shot news summarization without any fine-tuning. The model is deployed in Microsoft News and provides public APIs as well as a demo website for multi-lingual news summarization. SIGIR 2021
Abstractive Text Summarization with Hierarchical Multi-scale Abstraction Modeling and Dynamic Memory The paper proposes a new approach to text summarization using hierarchical multi-scale abstraction modeling and dynamic memory. The system is designed to extract important information from large amounts of text and generate a concise summary. The approach is evaluated on several datasets and shows promising results compared to other state-of-the-art methods. SIGIR 2021
Improve Query Focused Abstractive Summarization by Incorporating Answer Relevance The paper proposes a new model called QFS-BART for generating summaries that are coherent and relevant to the answer for a given query. Unlike previous QFS models, QFS-BART explicitly considers the answer relevance of the source documents to the query by using a question answering model. The model also takes advantage of large pre-trained models to improve summarization performance. Empirical results on the Debatepedia dataset show that QFS-BART achieves state-of-the-art performance. ACL 2021
Improving Factual Consistency of Abstractive Summarization via Question Answering The paper addresses the problem of factual inconsistency in abstractive summarization models. The authors propose an efficient automatic evaluation metric to measure factual consistency and a novel learning algorithm that maximizes the proposed metric during model training. Through extensive experiments, the authors confirm that their method is effective in improving factual consistency and overall quality of the summaries, as judged by both automatic metrics and human evaluation. ACL 2021
AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization The paper discusses the challenges faced by state-of-the-art abstractive summarization models due to their reliance on extensive labeled data, which limits their generalization ability on domains where such data are not available. The authors present a study of domain adaptation for the abstractive summarization task in a low-resource setting, focusing on the second phase of pre-training on large-scale generative models under three different settings. The experiments show that the effectiveness of pre-training is correlated with the similarity between the pre-training data and the target domain task. The authors also find that continuing pre-training could lead to catastrophic forgetting, and a learning method with less forgetting can alleviate this issue. The results highlight the need for more advanced domain adaptation methods for the abstractive summarization task, as a huge gap still exists between the low-resource and high-resource settings. NAACL 2021
Reinforcement Learning for Abstractive Question Summarization with Question-aware Semantic Rewards The paper discusses the need for reliable and accurate question answering systems for online consumer health questions. It introduces a reinforcement learning-based framework for abstractive question summarization that uses two novel rewards, obtained from downstream tasks, to regularize the question generation model. The proposed method outperforms state-of-the-art models and generates more diverse and semantically valid questions with fewer factual inconsistencies. The source code is available on GitHub. ACL 2021
SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization The paper introduces SimCLS, a new framework for abstractive summarization that improves the performance of existing top-performing models by a large margin. The framework formulates text generation as a reference-free evaluation problem assisted by contrastive learning. Experimental results show that SimCLS achieves absolute ROUGE-1 improvements of 2.51 over BART and 2.50 over PEGASUS on the CNN/DailyMail dataset, setting a new state of the art. The code and results have been open-sourced, and the proposed models have been deployed on the ExplainaBoard platform so that researchers can analyze the systems in a more fine-grained way. ACL 2021
Sentence-level Planning for Especially Abstractive Summarization The paper proposes a new model, the sentence planner, for generating more abstractive summaries. The model includes a hierarchical decoder that first generates a representation for the next summary sentence and then conditions the word generator on this representation. The generated summaries are more abstractive while still achieving high ROUGE scores against human reference summaries. The effectiveness of the design decisions is verified through extensive evaluations. EMNLP 2021
Exploring Multitask Learning for Low-Resource Abstractive Summarization The paper investigates the use of multitask learning for abstractive summarization with limited training data. Four different tasks, including extractive summarization, language modeling, concept detection, and paraphrase detection, are incorporated individually and in combination to improve abstractive summarization. The results show that multitask learning can enhance the performance of abstractive summarization, and certain tasks, such as paraphrase detection, consistently benefit the task. EMNLP 2021
Planning with Learned Entity Prompts for Abstractive Summarization The paper introduces a mechanism to improve the generation of abstractive summaries by learning an intermediate plan that grounds the summary generation. This is achieved by prepending target summaries with entity chains, which are ordered sequences of entities mentioned in the summary. Transformer-based sequence-to-sequence models are then trained to generate the entity chain and continue generating the summary based on the entity chain and input. The approach was evaluated on multiple datasets and demonstrated improved entity specificity and planning in summaries, achieving state-of-the-art performance in terms of ROUGE on some datasets. The mechanism also provides a way to control hallucinations in abstractive summaries, outperforming state-of-the-art approaches for faithfulness when evaluated automatically and by humans. TACL 2021
Discourse Understanding and Factual Consistency in Abstractive Summarization The paper introduces a framework called Co-opNet for generating abstractive summaries with factual consistency and narrative flow. Co-opNet is a transformer-based framework in which a generator works with a discriminator to compose coherent long-form summaries. The paper explores four different discriminator objectives, each capturing a different aspect of coherence. Co-opNet's ability to learn these objectives is measured on arXiv scientific papers, with empirical results showing improved global coherence compared to competitive baselines. EACL 2021
EASE: Extractive-Abstractive Summarization End-to-End using the Information Bottleneck Principle The paper proposes a new framework called EASE that combines the strengths of extractive and abstractive summarization systems to generate concise and interpretable summaries. The framework uses the Information Bottleneck principle to jointly train extraction and abstraction in an end-to-end fashion. Inspired by human summarization methods, the framework first extracts a pre-defined amount of evidence spans and then generates a summary using only the evidence. The authors show through automatic and human evaluations that the generated summaries are better than strong extractive and extractive-abstractive baselines. EMNLP 2021
Learn to Copy from the Copying History: Correlational Copy Network for Abstractive Summarization The paper proposes a new copying scheme called Correlational Copying Network (CoCoNet) for abstractive summarization that enhances the standard copying mechanism by keeping track of the copying history. CoCoNet takes advantage of prior copying distributions and encourages the model to copy the input word that is relevant to the previously copied one. The model is strengthened through pretraining with suitable corpora that simulate the copying behaviors. Experimental results show that CoCoNet can copy more accurately and achieves new state-of-the-art performance on summarization benchmarks, including CNN/DailyMail for news summarization and SAMSum for dialogue summarization. The code is available at https://github.com/hrlinlp/coconet. EMNLP 2021
Knowledge and Keywords Augmented Abstractive Sentence Summarization The paper proposes an abstractive sentence summarization method (KAS) that addresses the issue of sparse knowledge structure. The method uses topic keywords and knowledge structure to generate high-quality summaries. The results show that KAS outperforms existing methods in terms of ROUGE scores and human evaluation. EMNLP 2021
GSum: A General Framework for Guided Neural Abstractive Summarization The paper discusses the challenges of neural abstractive summarization models, which can produce coherent summaries but may be unfaithful and difficult to control. The authors propose a guided summarization framework (GSum) that can effectively take different types of external guidance as input and demonstrate its effectiveness in achieving state-of-the-art performance on popular summarization datasets. The authors also show how different types of guidance can generate qualitatively different summaries, providing a degree of controllability to the learned models. NAACL 2021
Entity-level Factual Consistency of Abstractive Text Summarization The paper discusses the challenge of ensuring factual consistency in abstractive summarization, particularly in relation to entity hallucination. The authors propose new metrics to measure entity-level factual consistency and suggest filtering training data as a solution to the problem. They also propose a summary-worthy entity classification task and a joint entity and summary generation approach to further improve entity-level metrics. EACL 2021
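A minimal sketch of an entity-level precision score against the source, plus a threshold filter over training pairs. The function names, the case-insensitive substring matching, and the default threshold of 1.0 (drop any example containing an unsupported summary entity) are assumptions for illustration, not the paper's exact procedure.

```python
from typing import Iterable, List, Tuple

# (source document, reference summary, named entities found in the summary)
Example = Tuple[str, str, List[str]]


def precision_source(summary_entities: Iterable[str], source_text: str) -> float:
    """Fraction of summary entities that also appear in the source.

    Matching is a case-insensitive substring test, a simplifying assumption.
    """
    entities = set(summary_entities)
    if not entities:
        return 1.0  # no entities, so nothing can be hallucinated
    source = source_text.lower()
    supported = sum(1 for e in entities if e.lower() in source)
    return supported / len(entities)


def filter_examples(examples: List[Example], threshold: float = 1.0) -> List[Example]:
    """Keep only training pairs whose summary entities are sufficiently supported."""
    return [ex for ex in examples if precision_source(ex[2], ex[0]) >= threshold]


data = [
    ("Bob lives in Rome.", "Bob is in Rome.", ["Bob", "Rome"]),
    ("Bob lives in Rome.", "Bob is in Oslo.", ["Bob", "Oslo"]),  # "Oslo" is unsupported
]
print([summary for _, summary, _ in filter_examples(data)])  # ['Bob is in Rome.']
```

Lowering `threshold` trades data quantity for faithfulness; a threshold of 0.5 would keep both examples above.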
Enhancing Factual Consistency of Abstractive Summarization The paper discusses the issue of inconsistency between automatic abstractive summaries and the original text, which can distort or fabricate facts. To address this problem, the authors propose a fact-aware summarization model called FASUM, which integrates factual relations into the summary generation process using graph attention. They also introduce a factual corrector model called FC to automatically correct factual errors in existing summaries. Empirical results show that FASUM produces more factually consistent summaries compared to existing systems, and FC can improve the factual consistency of given summaries by modifying only a few keywords. NAACL 2021
Global-aware Beam Search for Neural Abstractive Summarization The paper presents a new algorithm for neural abstractive summarization that addresses the local optimality problem of the original beam search. The algorithm uses a novel global protocol based on the attention distribution to generate summaries in a near-globally optimal fashion. Because the global attention distribution can be predicted before inference, the global scoring mechanism enables step-wise improvements to beam search. The algorithm is shown to significantly improve state-of-the-art summarization models on nine datasets and remains robust even with corrupted attention distributions. The code and examples are available. NEURIPS 2022
Better Highlighting: Creating Sub-Sentence Summary Highlights The paper proposes a method to generate summary highlights that can be overlaid on original documents to help readers sift through large amounts of text. The method aims to prevent distortion of the original meaning by providing summaries in context. The method combines determinantal point processes and deep contextualized representations to identify important and non-redundant sub-sentence segments to form self-contained highlights. The paper presents extensive experiments on summarization datasets to demonstrate the flexibility and modeling power of the method. The authors conclude that highlighting is a promising avenue for future summarization research. EMNLP 2020
A New Approach to Overgenerating and Scoring Abstractive Summaries The paper proposes a new approach to generate multiple summaries with diverse content and varying lengths, and then select the best ones based on user needs. The approach involves a two-staged strategy to generate a diverse set of candidate summaries from the source text and then score and select admissible ones. The generator gives precise control over the length of the summary, and the selectors are designed to predict the optimal summary length and emphasize faithfulness to the original text. The approach achieves state-of-the-art performance on benchmark summarization datasets. NAACL 2021
Improving Faithfulness in Abstractive Summarization with Contrast Candidate Generation and Selection The paper discusses how current models for neural abstractive summarization often generate summaries that are not faithful to the original context. To address this issue, the authors propose a post-processing technique called contrast candidate generation and selection. They generate alternative candidate summaries in which named entities and quantities are replaced with ones of compatible semantic types from the source document, and then use a discriminative correction model to select the best candidate as the final output summary. The authors' experiments show that this method is effective in identifying and correcting extrinsic hallucinations. They also analyze the typical hallucination phenomena produced by different types of neural summarization systems, in the hope of providing insights for future work in this direction. NAACL 2021
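The candidate-generation step can be sketched as swapping each summary entity for a same-type entity from the source, then keeping whichever candidate a scoring function rates highest. The type labels, the plain string replacement, and the `score` callable (standing in for the trained discriminative correction model) are all hypothetical.

```python
from typing import Callable, Dict, List


def contrast_candidates(summary: str,
                        summary_entities: Dict[str, str],
                        source_entities: Dict[str, str]) -> List[str]:
    """Return the original summary plus variants with one entity swapped.

    Both dicts map entity text to a semantic type (e.g. "PERSON", "CARDINAL").
    Replacement uses plain str.replace, which is a simplification.
    """
    candidates = [summary]
    for entity, ent_type in summary_entities.items():
        for replacement, repl_type in source_entities.items():
            if repl_type == ent_type and replacement != entity:
                candidates.append(summary.replace(entity, replacement))
    return candidates


def select_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Pick the candidate the (hypothetical) correction model scores highest."""
    return max(candidates, key=score)


cands = contrast_candidates(
    "Smith won 3 medals.",
    {"Smith": "PERSON", "3": "CARDINAL"},
    {"Jones": "PERSON", "Smith": "PERSON", "5": "CARDINAL"},
)
print(cands)  # ['Smith won 3 medals.', 'Jones won 3 medals.', 'Smith won 5 medals.']
```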
Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning The paper proposes a new method for evaluating the quality of document summarization systems without requiring human-generated reference summaries. The method uses unsupervised contrastive learning and a new metric based on BERT that covers both linguistic qualities and semantic informativeness. The model is trained with a ranking loss using different types of negative samples for each summary. The experiments on Newsroom and CNN/Daily Mail datasets show that the proposed method outperforms other metrics and is generalizable across datasets. EMNLP 2020
To Point or Not to Point: Understanding How Abstractive Summarizers Paraphrase Text The paper discusses the limitations of abstractive neural summarization models despite their improved ROUGE scores. The authors conducted experiments on the pointer-generator model to understand how it controls its level of abstraction and extraction. The model utilizes syntactic boundaries to truncate sentences on an extractive-biased dataset, but when forced to generate, it only shows simple paraphrasing abilities with factual inaccuracies and hallucinations. On an abstractive-biased dataset, the model copies infrequently and shows limited abstractive abilities. The results suggest that abstractive summarization models lack the semantic understanding necessary to generate faithful and abstractive paraphrases. ACL 2021
Attention Head Masking for Inference Time Content Selection in Abstractive Summarization The paper presents a technique called attention head masking to effectively inform content selection in Transformer-based abstractive summarization models. This technique is applied on encoder-decoder attentions to identify important content during inference. The authors demonstrate the effectiveness of this technique on three document summarization datasets, including in-domain and cross-domain settings. Their models outperform prior state-of-the-art models on CNN/Daily Mail and New York Times datasets. Additionally, the inference-time masking technique is data-efficient, requiring less than 20% of the training samples to outperform BART fine-tuned on the full CNN/DailyMail dataset. NAACL 2021
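A minimal sketch of the masking idea for a single attention head at one decoding step: logits of source positions outside the content-selection mask are set to negative infinity before the softmax, so the head can only attend to selected tokens. How the mask is produced, and which heads and layers are masked, are left open here as assumptions.

```python
import math
from typing import List


def masked_attention(scores: List[float], keep: List[bool]) -> List[float]:
    """Softmax over source positions with unselected positions masked out.

    scores: raw decoder-to-encoder attention logits for one head at one step.
    keep:   content-selection mask; True marks a salient source token.
    """
    if not any(keep):
        raise ValueError("at least one source position must remain visible")
    masked = [s if k else -math.inf for s, k in zip(scores, keep)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_attention([1.0, 2.0, 3.0], [True, False, True])
print(weights[1])  # 0.0: the masked token receives no attention
```

Because the mask is applied only at inference, the underlying model weights stay unchanged, which matches the inference-time framing above.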
Using Question Answering Rewards to Improve Abstractive Summarization The paper discusses the issues with current neural abstractive summarization models and presents a framework to train these models to improve their summaries. The framework involves training a sequence-to-sequence model and then further training it in a Reinforcement Learning setting with question-answering based rewards. The experimental results show that this approach improves the quality of the generated summaries, and human evaluations show a preference for its summaries over those of general abstractive summarization models 30% of the time. EMNLP 2021
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization The paper discusses the limitations of using beam search to generate summaries with sequence-to-sequence neural networks, due to the large search space and exposure bias. The authors propose a solution of directly training a second-stage model to perform re-ranking on a set of summary candidates, resulting in improved performance of the base model. Their SummaReranker model achieves state-of-the-art results on several datasets, with code and checkpoints available online. ACL 2022
Faithful Abstractive Summarization via Fact-aware Consistency-constrained Transformer The paper proposes a new model for abstractive summarization called Entity-Relation Pointer Generator Network (ERPGN) that formalizes the facts in the original document as a factual knowledge graph and generates a high-quality summary by directly modeling consistency between the summary and the knowledge graph. The model uses two pointer network structures to capture the facts in the original document and two semantic-level losses to measure the disagreement between the summary and the facts. The experiments show that ERPGN outperforms classic abstractive summarization models and state-of-the-art fact-aware baseline methods in terms of faithfulness. CIKM 2022
CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization The paper discusses a new approach to generating abstractive summaries that are both faithful and factually consistent with the given articles. The approach uses a contrastive learning formulation that leverages both reference summaries and automatically generated erroneous summaries to train summarization systems that are better at distinguishing between them. The paper also describes four strategies for creating negative samples that resemble errors made commonly by two state-of-the-art models, BART and PEGASUS. Experiments on XSum and CNN/Daily Mail show that the contrastive learning framework consistently produces more factual summaries than other approaches, according to QA-based factuality evaluation. Human judges also find that the model summaries correct more errors. EMNLP 2021
Exploring Explainable Selection to Control Abstractive Summarization The paper discusses the limitations of current neural models for document summarization, which lack transparency and control. To address this issue, the authors propose a novel select-and-generate framework called ESCA that focuses on explainability. The framework reveals the latent centrality and interactions between sentences, along with scores for sentence novelty and relevance, to give users a window into the choices the model is making and an opportunity to guide those choices. A novel pair-wise matrix captures the sentence interactions, centrality, and attribute scores, and a mask with tunable attribute thresholds allows the user to control which sentences are likely to be included in the extraction. A sentence-deployed attention mechanism in the abstractor ensures the final summary emphasizes the desired content. ESCA outperformed eight state-of-the-art models on the CNN/DailyMail and NYT50 benchmark datasets in a series of experiments assessed with ROUGE metrics and two human evaluations. AAAI 2021
Improving the Faithfulness of Abstractive Summarization via Entity Coverage Control The paper discusses the limitations of abstractive summarization systems that use pre-trained language models, which are prone to hallucinating facts that are not faithful to the input context. To address this issue, the authors propose a method called Entity Coverage Control (ECC) that computes entity coverage precision and adds a control code to each training example to guide the model to recognize faithful contents. They also extend their method through intermediate fine-tuning on noisy data extracted from Wikipedia to enable zero-shot summarization. The proposed method leads to more faithful and salient abstractive summarization in supervised fine-tuning and zero-shot settings, as demonstrated by experimental results on three benchmark datasets of different domains and styles. NAACL 2022
PSP: Pre-trained Soft Prompts for Few-Shot Abstractive Summarization The paper presents a new approach to few-shot abstractive summarization using a soft prompts architecture coupled with prompt pre-training and fine-tuning. The soft prompts consist of continuous input embeddings across an encoder and decoder, with a new inner-prompt introduced to capture document-level information. The approach uses prompt pre-training with self-supervised pseudo-data to teach the model basic summarizing capability, followed by fine-tuning with few-shot examples using lightweight soft prompts. Experimental results on the CNN/DailyMail and XSum datasets show that the method outperforms full-model tuning and Prompt Tuning, and delivers competitive results against PrefixTuning with significantly fewer parameters. COLING 2022
FACTPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization The paper presents FACTPEGASUS, an abstractive summarization model that focuses on factuality during pre-training and fine-tuning. The model uses a sentence selection strategy to create pseudo-summaries that are both important and factual, and introduces three complementary components for fine-tuning: a corrector to remove hallucinations, a contrastor to differentiate factual from nonfactual summaries, and a connector to improve knowledge transfer. Experiments show that FACTPEGASUS substantially improves factuality, yields more factual summaries than the original pre-training objective in zero-shot and few-shot settings, and retains factual behavior more robustly than strong baselines. NAACL 2022
Attention Temperature Matters in Abstractive Summarization Distillation The paper discusses how abstractive text summarization relies on large, computationally expensive pre-trained sequence-to-sequence Transformer models, and proposes a method to distill these models into smaller ones with minimal performance loss. The method involves manipulating attention temperatures in Transformers to make pseudo labels easier to learn for student models. Experiments on three summarization datasets show that this method consistently improves vanilla pseudo-labeling based methods, and both pseudo labels and summaries produced by the student models are shorter and more abstractive. The code for the proposed method is available on GitHub. ACL 2022
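The sketch below shows scaled dot-product attention with an extra temperature factor, softmax(q·k / (τ·√d)). It only illustrates how a temperature above 1 flattens the attention distribution; where the paper applies the temperature and which values it uses are not stated in this summary, so `temperature` here is an illustrative parameter.

```python
import math
from typing import List


def attention_weights(query: List[float], keys: List[List[float]],
                      temperature: float = 1.0) -> List[float]:
    """softmax(q . k_i / (temperature * sqrt(d))) over all keys k_i.

    temperature = 1 gives standard scaled dot-product attention;
    temperature > 1 spreads the weights more evenly.
    """
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / (temperature * math.sqrt(d))
              for key in keys]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


q, ks = [1.0, 0.0], [[2.0, 0.0], [0.0, 0.0]]
sharp = attention_weights(q, ks, temperature=1.0)
flat = attention_weights(q, ks, temperature=2.0)
# The higher temperature moves weight away from the best-matching key.
print(round(sharp[0], 3), round(flat[0], 3))
```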
Towards Abstractive Grounded Summarization of Podcast Transcripts The paper discusses the challenges of summarizing podcasts, including factual inconsistencies and speech disfluencies in transcripts. The authors propose a novel abstractive summarization method that grounds summary segments in specific regions of the transcript to improve summarization quality. They conducted a series of analyses on a large podcast dataset and found that their approach achieved promising results, improving both automatic and human evaluation of summarization quality. ACL 2022
Extractive Elementary Discourse Units for Improving Abstractive Summarization The paper discusses the use of elementary discourse units (EDUs) as the textual unit of content selection for abstractive summarization. The authors propose a novel summarization model in which an EDU selector first chooses salient content and a generator then rewrites the selected EDUs as the final summary. To determine the relevance of each EDU to the entire document, the authors apply group tag embedding. Extensive experiments on the CNN/Daily Mail dataset demonstrate the effectiveness of their model. SIGIR 2022
BRIO: Bringing Order to Abstractive Summarization The paper proposes a new training paradigm for abstractive summarization models that assumes a non-deterministic distribution, which assigns probability mass to different candidate summaries based on their quality. This approach addresses the performance degradation issue during inference, where the model needs to compare system-generated summaries that deviate from the reference summary. The proposed method achieves a new state-of-the-art result on the CNN/DailyMail and XSum datasets, and can estimate probabilities of candidate summaries that are more correlated with their level of quality. ACL 2022
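A minimal sketch of a BRIO-style pairwise ranking loss, assuming candidates are already sorted from best to worst by a quality metric such as ROUGE and that each candidate's score is its length-normalized log-probability under the model. The margin value and `alpha` default are placeholders; in training, this loss is combined with the usual cross-entropy objective.

```python
from typing import List


def length_normalized_score(token_logprobs: List[float], alpha: float = 1.0) -> float:
    """f(S) = (sum of token log-probabilities) / |S| ** alpha."""
    return sum(token_logprobs) / (len(token_logprobs) ** alpha)


def ranking_loss(scores: List[float], margin: float = 0.001) -> float:
    """Pairwise margin loss over candidates sorted best-to-worst by quality.

    Candidate i should outscore each lower-ranked candidate j by at least
    (j - i) * margin; violations are penalized linearly.
    """
    loss = 0.0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss += max(0.0, scores[j] - scores[i] + (j - i) * margin)
    return loss


# Candidate token log-probabilities, already ordered by descending ROUGE.
cand_logprobs = [[-0.5, -0.7], [-1.0, -1.2, -0.8], [-2.0]]
scores = [length_normalized_score(lp) for lp in cand_logprobs]
print(scores, ranking_loss(scores, margin=0.1))  # model order matches quality order: loss 0.0
```

The loss is nonzero only when the model assigns a lower-quality candidate a score that is too close to, or above, a better one, which is how probability mass gets tied to candidate quality.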
Source-summary Entity Aggregation in Abstractive Summarization The paper discusses the phenomenon of referring to entities in later discourse by a more general description, and how this applies to summarization. The authors categorize these instances as source-summary entity aggregations and analyze them in the CNN/DAILYMAIL corpus. They examine how well three state-of-the-art summarization systems can generate such aggregations and develop techniques to encourage them to generate more. The results show that there is significant room for improvement in producing semantically correct aggregations. COLING 2022
Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation The paper proposes a method to improve abstractive summarization by using multi-task learning with the auxiliary tasks of question generation and entailment generation. The former helps the summarization model identify salient, question-worthy details, while the latter teaches the model how to rewrite a summary that is a directed-logical subset of the input document. The paper also proposes novel multi-task architectures with high-level layer-specific sharing and soft-sharing mechanisms, which result in statistically significant improvements over the state-of-the-art on various datasets. The paper presents quantitative and qualitative analysis studies of the model's learned saliency and entailment skills. ACL 2018
Improving Abstraction in Text Summarization The paper proposes two techniques to improve the level of abstraction in abstractive text summarization. The first technique involves decomposing the decoder into a contextual network and a pretrained language model. The second technique involves a novelty metric that encourages the generation of novel phrases. The proposed model achieves results comparable to state-of-the-art models, while achieving a significantly higher level of abstraction as measured by n-gram overlap with the source document. EMNLP 2018
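Abstraction measured by n-gram overlap can be sketched as the share of summary n-grams that do not occur in the source. Whitespace tokenization, lowercasing, and counting distinct n-grams are simplifying assumptions; this is a measurement helper, not the paper's novelty reward itself.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Distinct n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(summary: str, source: str, n: int = 3) -> float:
    """Share of distinct summary n-grams that never occur in the source."""
    summary_grams = ngrams(summary.lower().split(), n)
    if not summary_grams:
        return 0.0
    source_grams = ngrams(source.lower().split(), n)
    return len(summary_grams - source_grams) / len(summary_grams)


src = "the cat sat on the mat"
print(novel_ngram_ratio("the cat sat", src, n=3))           # 0.0, fully copied
print(novel_ngram_ratio("a dog sat on the mat", src, n=2))  # 0.4
```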
Should We Trust This Summary? Bayesian Abstractive Summarization to The Rescue The paper explores uncertainty in modern abstractive summarization models using Bayesian Deep Learning. The authors use Monte Carlo dropout to approximate Bayesian inference and perform multiple stochastic forward passes to quantify uncertainty at prediction time. This allows for filtering out generated summaries of high uncertainty and can be used for selecting samples for annotation. Bayesian inference also enables finding a summary that performs better than a deterministic one and is more robust to uncertainty. Their Variational Bayesian equivalents of BART and PEGASUS outperform their deterministic counterparts on multiple benchmark datasets. ACL 2022
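A minimal sketch of the Monte Carlo dropout procedure: draw several summaries with dropout kept active and treat their disagreement as uncertainty. `sample_summary` is a hypothetical callable wrapping one stochastic forward pass, and the set-based unigram F1 used as the similarity is a stand-in assumption rather than the paper's measure.

```python
import itertools
from typing import Callable


def unigram_f1(a: str, b: str) -> float:
    """Set-based unigram F1 between two summaries (a stand-in similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    overlap = len(ta & tb)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(ta), overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def mc_dropout_uncertainty(sample_summary: Callable[[], str], n_passes: int = 10) -> float:
    """Draw n_passes stochastic summaries and return 1 - mean pairwise similarity.

    sample_summary is a hypothetical callable that runs one forward pass of the
    summarizer with dropout still active, so repeated calls can differ.
    """
    if n_passes < 2:
        raise ValueError("need at least two passes to measure disagreement")
    samples = [sample_summary() for _ in range(n_passes)]
    sims = [unigram_f1(a, b) for a, b in itertools.combinations(samples, 2)]
    return 1.0 - sum(sims) / len(sims)


steady = mc_dropout_uncertainty(lambda: "profits rose sharply", n_passes=5)
print(steady)  # 0.0 when every pass agrees
```

Summaries whose uncertainty exceeds a chosen cutoff could then be filtered out or routed to annotators, as the summary above describes.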
Length Control in Abstractive Summarization by Pretraining Information Selection The paper proposes a new approach for length-controllable summarization models that adapts the encoding of the source based on the desired length. This is achieved through a length-aware attention mechanism (LAAM) that is trained on a summary length balanced dataset built from the original training data. The results show that this approach is effective in generating high-quality summaries with desired lengths, including those that were not seen in the original training set. Previous models tended to generate summaries as long as those in the training data, but LAAM can generate shorter summaries as well. ACL 2022
Neural Network-Based Abstract Generation for Opinions and Arguments The paper proposes a neural network model that generates informative and concise summaries for opinionated text. The model uses an attention-based mechanism to absorb information from multiple text units and an importance-based sampling method to integrate important input. The system outperforms state-of-the-art summarization systems on newly collected datasets of movie reviews and arguments and is rated higher in human evaluation for informativeness and grammaticality. NAACL 2016
Get To The Point: Summarization with Pointer-Generator Networks The paper discusses the limitations of neural sequence-to-sequence models for abstractive text summarization, which can inaccurately reproduce factual details and repeat themselves. The authors propose a new architecture that uses a hybrid pointer-generator network to accurately reproduce information while retaining the ability to generate novel words, and coverage to discourage repetition. The model is applied to the CNN/Daily Mail summarization task and outperforms the current abstractive state-of-the-art by at least 2 ROUGE points. ACL 2017
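The two mechanisms named above can be sketched as follows: the final word distribution mixes generation and copying through p_gen, and the coverage loss penalizes attending again to source positions that have already received attention. Dictionaries and lists stand in for tensors here, which is a simplification.

```python
from collections import defaultdict
from typing import Dict, List


def final_distribution(p_gen: float, vocab_dist: Dict[str, float],
                       attention: List[float], source_tokens: List[str]) -> Dict[str, float]:
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w).

    A source word missing from vocab_dist still gets probability through copying.
    """
    dist: Dict[str, float] = defaultdict(float)
    for word, prob in vocab_dist.items():
        dist[word] += p_gen * prob
    for weight, word in zip(attention, source_tokens):
        dist[word] += (1.0 - p_gen) * weight
    return dict(dist)


def coverage_loss(attention: List[float], coverage: List[float]) -> float:
    """sum_i min(a_i, c_i), where c is the summed attention of all previous steps."""
    return sum(min(a, c) for a, c in zip(attention, coverage))


dist = final_distribution(0.8, {"the": 0.5, "cat": 0.5}, [0.6, 0.4], ["Kowalski", "cat"])
print(round(dist["Kowalski"], 2))  # 0.12, an out-of-vocabulary word reachable only by copying
```

Taking the minimum means a position is penalized only to the extent it is attended to again, so the loss discourages repetition without forbidding revisits outright.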
From Neural Sentence Summarization to Headline Generation: A Coarse-to-Fine Approach The paper discusses the challenge of extending sentence summarization models to the task of document headline generation. The proposed solution is a coarse-to-fine approach that first identifies important sentences using document summarization techniques and then uses a multi-sentence summarization model with hierarchical attention to generate headlines. The approach significantly improves the performance of neural sentence summarization models on the headline generation task, as demonstrated by experimental results on a large real dataset. IJCAI 2017
Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization The paper introduces a new summarization task called extreme summarization, which requires an abstractive modeling approach to create a one-sentence news summary that answers the question "What is the article about?" A large dataset was collected from the BBC, and a novel abstractive model based on convolutional neural networks was proposed. The model was shown to outperform both extractive and abstractive approaches when evaluated by humans and automatically. The architecture captures long-range dependencies in a document and recognizes pertinent content. EMNLP 2018
Multi-Reward Reinforced Summarization with Saliency and Entailment The paper discusses the task of abstractive text summarization, which involves compressing a long document into a short summary while maintaining important aspects such as saliency, logical entailment, and non-redundancy. The authors propose a reinforcement learning approach with two novel reward functions, ROUGESal and Entail, in addition to a coverage-based baseline. The ROUGESal reward up-weights salient phrases/words detected via a keyphrase classifier, while the Entail reward gives high scores to logically-entailed summaries using an entailment classifier. The authors show that combining these rewards with traditional metric-based rewards leads to superior performance improvement, achieving state-of-the-art results on the CNN/Daily Mail dataset and strong improvements on the DUC-2002 dataset. NAACL 2018
Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks The paper proposes a method for achieving unpaired abstractive summarization using an auto-encoder that encodes input text into human-readable sentences. The auto-encoder consists of a generator and a reconstructor, with a discriminator used to ensure the generator output resembles human-written sentences. The generator encodes the input text into a shorter word sequence, and the reconstructor recovers the generator input from the generator output. This approach achieves abstractive summarization without the need for document-summary pairs as training data, and promising results are shown on both English and Chinese corpora. EMNLP 2018
A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification The paper proposes a hierarchical end-to-end model for joint learning of text summarization and sentiment classification, where the sentiment classification label is treated as a further "summarization" of the text summarization output. The model achieves better performance than strong baseline systems on both abstractive summarization and sentiment classification, as shown by experimental results on Amazon online reviews datasets. Text summarization and sentiment classification aim to capture the main ideas of the text at different levels, with text summarization describing the text within a few sentences and sentiment classification summarizing the text into an even more abstract fashion, i.e., a sentiment class. IJCAI 2018
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models The paper discusses how abstractive summarization approaches based on Reinforcement Learning (RL) can overcome classical likelihood maximization. The most commonly used summarization metric, ROUGE, has limitations such as bias towards lexical similarity and suboptimal accounting for fluency and readability. The paper proposes alternative evaluation measures based on Question Answering, which were found to be favorable compared to ROUGE and do not require reference summaries. Training an RL-based model on these metrics leads to improvements in both human and automated metrics. EMNLP 2019
Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization The paper proposes a new method for studying content selection in topic-focused summarization called the summary cloze task. The task involves generating the next sentence of a summary based on a topic, a partial summary, and a reference document. The challenge is deciding what information in the references is relevant to the topic and partial summary and should be included in the summary. The paper reports experimental results on a dataset of nearly 500k summary cloze instances from Wikipedia using various extractive and abstractive models. The results show that the task remains a significant challenge, but the topic and partial summary help the models identify relevant content. EMNLP 2019
Text Summarization with Pretrained Encoders The paper discusses the use of Bidirectional Encoder Representations from Transformers (BERT) in text summarization and proposes a framework for both extractive and abstractive models. They introduce a document-level encoder based on BERT that can express the semantics of a document and obtain representations for its sentences. They also propose a new fine-tuning schedule for abstractive summarization that adopts different optimizers for the encoder and decoder to alleviate the mismatch between the two. The experiments on three datasets show that their model achieves state-of-the-art results in both extractive and abstractive settings. EMNLP 2019
Generating Formality-tuned Summaries Using Input-dependent Rewards The paper discusses a reinforcement learning based approach to generate formality-tailored summaries for an input article. The model can generate both formal and informal summary variants, accommodating the psycho-linguistic preferences of the intended audience. The proposed framework includes a novel input-dependent reward function that aids in training the model with stylistic feedback on sampled and ground-truth summaries. Automated and qualitative evaluations show the viability of the approach. CONLL 2019
OPINIONDIGEST: A Simple Framework for Opinion Summarization The paper presents OPINIONDIGEST, an opinion summarization framework that uses an Aspect-based Sentiment Analysis model to extract opinion phrases from reviews and trains a Transformer model to reconstruct the original reviews. The framework selects the most popular opinions and uses them to generate an opinion summary. OPINIONDIGEST can also generate customized summaries by filtering opinions according to aspect and sentiment. The framework outperforms competitive baselines in automatic evaluation and produces informative summaries with promising customization capabilities, as verified by human studies. ACL 2020
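The selection step can be sketched as counting extracted opinion phrases and keeping the most frequent ones, optionally filtered by aspect and sentiment for customized summaries. The `(phrase, aspect, sentiment)` tuple format and exact-match counting are assumptions; the selected phrases would then be passed to the trained Transformer to generate the summary.

```python
from collections import Counter
from typing import List, Optional, Tuple

Opinion = Tuple[str, str, str]  # (phrase, aspect, sentiment), assumed extractor output


def select_opinions(opinions: List[Opinion], k: int = 3,
                    aspect: Optional[str] = None,
                    sentiment: Optional[str] = None) -> List[str]:
    """Return the k most frequent opinion phrases, optionally filtered by aspect/sentiment."""
    kept = [phrase for phrase, a, s in opinions
            if (aspect is None or a == aspect) and (sentiment is None or s == sentiment)]
    return [phrase for phrase, _ in Counter(kept).most_common(k)]


ops = [
    ("friendly staff", "service", "pos"),
    ("friendly staff", "service", "pos"),
    ("slow service", "service", "neg"),
    ("tasty food", "food", "pos"),
]
print(select_opinions(ops, k=1))                                     # ['friendly staff']
print(select_opinions(ops, k=2, aspect="service", sentiment="neg"))  # ['slow service']
```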
Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach The paper discusses aspect-based abstractive summarization, which generates a summary of a document based on a specific topic of interest. Previous studies have only focused on a small set of pre-defined topics, limiting the application of the task. The authors propose a new method that allows summarization on arbitrary topics relevant to the document, using external knowledge sources such as ConceptNet and Wikipedia. Experiments show that their approach improves performance on both real and synthetic documents. EMNLP 2020
Multi-hop Inference for Question-driven Summarization The paper proposes a new method called Multi-hop Selective Generator (MSG) for question-driven abstractive summarization. This method incorporates multi-hop reasoning to provide justifications for the generated summaries. The proposed method outperforms state-of-the-art methods on two non-factoid QA datasets, namely WikiHow and PubMedQA. The method jointly models the relevance to the question and the interrelation among different sentences via a human-like multi-hop inference module and a gated selective pointer generator network with a multi-view coverage mechanism. EMNLP 2020
Improving Truthfulness of Headline Generation The paper discusses the concern about the truthfulness of generated summaries in abstractive summarization and explores improving the truthfulness in headline generation on two popular datasets. The study analyzes headlines generated by the state-of-the-art encoder-decoder model and shows that the model sometimes generates untruthful headlines due to untruthful supervision data used for training the model. To remedy this problem, the study hypothesizes that removing untruthful instances from the supervision data may help and builds a binary classifier that predicts an entailment relation between an article and its headline to filter out untruthful instances. Experimental results demonstrate that the headline generation model trained on filtered supervision data shows remarkable improvements in automatic and manual evaluations of the generated headlines. ACL 2020
Long-Span Summarization via Local Attention and Content Selection The paper discusses the use of transformer-based models in natural language processing tasks, specifically document summarization. While these models have achieved impressive results, they struggle to scale as input length grows, making it difficult to train or fine-tune them for long document summarization. The paper proposes two methods, local self-attention and explicit content selection, to address long-span dependencies in abstractive summarization. The approaches are compared on various network configurations and tested on standard long-span summarization tasks, achieving state-of-the-art ROUGE scores on all three tasks. The paper also notes that their approach can achieve comparable or better results than existing approaches without requiring a large-scale GPU card. ACL 2021
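Local self-attention can be illustrated with a sliding-window mask in which each token attends only to neighbors within a fixed distance. The symmetric window is one common variant and an assumption here; the paper's exact configuration may differ, and practical implementations avoid materializing the full matrix.

```python
from typing import List


def local_attention_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j, i.e. |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def allowed_pairs(mask: List[List[bool]]) -> int:
    """Number of query-key pairs scored; roughly seq_len * (2 * window + 1)."""
    return sum(row.count(True) for row in mask)


mask = local_attention_mask(6, window=1)
print(allowed_pairs(mask), "of", 6 * 6)  # 16 of 36
```

The scored pairs grow linearly with sequence length for a fixed window, rather than quadratically as in full self-attention, which is what makes long inputs tractable.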
QuestEval: Summarization Asks for Fact-based Evaluation The paper discusses the limitations of current metrics for evaluating summarization, such as ROUGE, and proposes a new framework called QUESTEVAL. Unlike other metrics, QUESTEVAL does not require a groundtruth reference and relies on question answering models to assess whether a summary contains all the relevant information from its source document. The paper shows that QUESTEVAL significantly improves the correlation with human judgments over four evaluation dimensions: consistency, coherence, fluency, and relevance. The authors also provide code and models for the framework. EMNLP 2021
Sequence Level Contrastive Learning for Text Summarization The paper proposes a contrastive learning model for supervised abstractive text summarization, which maximizes the similarities between different views of the same mean representation during training. The model outperforms a strong sequence-to-sequence text generation model on three different summarization datasets and achieves better faithfulness ratings in human evaluation. The code is available at https://github.com/xssstory/SeqCo. AAAI 2022
Towards Zero-Shot Conditional Summarization with Adaptive Multi-Task Fine-Tuning The paper discusses the problem of conditional summarization, where content selection and surface realization are based on a natural language question or topic description. The authors explore the use of multi-task fine-tuning (MTFT) on twenty-one natural language tasks to enable zero-shot conditional summarization on five tasks. They present four new summarization datasets and report zero-shot performance using T5 and BART, demonstrating that MTFT can improve zero-shot summarization quality. The paper highlights the importance of specific summaries for applications such as question answering and literature discovery. EMNLP 2020
RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation The paper proposes two reward functions for abstractive summarization, RwB-Hinge and RISK, to improve upon the negative log-likelihood (NLL) baselines commonly used to train models. The experiments show that the proposed approach consistently improves performance over the NLL baselines when fine-tuning an NLL pre-trained model on nine diverse summarization datasets. The work is motivated by the observation that the reward function plays a key role in the performance of reinforcement learning yet remains partially unexplored. ACL 2021
Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems The paper discusses the challenges of using transformer models for NLP tasks, particularly in summarization, due to the computational expense of the encoder-decoder attention mechanism. The authors propose a modified architecture that selects a subset of input sentences to constrain the attention mechanism, based on the empirical observation of a sparse sentence structure in document summarization. Experiments on various summarization tasks show that the proposed approach maintains system performance while reducing computational cost. EMNLP 2021
Faithful to the Document or to the World? Mitigating Hallucinations via Entity-Linked Knowledge in Abstractive Summarization The paper discusses how existing abstractive summarization systems generate text that is not directly inferable from the source alone, resulting in content hallucinations. These hallucinations are sometimes factual but unfaithful to the source. The paper suggests that these factual hallucinations occur due to the prevalence of factual yet unfaithful entities in summarization datasets. The authors find that these entities are examples of additional world knowledge being used to connect entities and concepts. They demonstrate that connecting entities to an external knowledge base can improve the factuality of summaries without making them more extractive. EMNLP 2022
Controllable Summarization with Constrained Markov Decision Process The paper discusses controllable text summarization, which allows users to control specific attributes of generated summaries. The authors propose a new training framework based on Constrained Markov Decision Process (CMDP) that includes a reward function and constraints to improve summarization control. The reward function encourages summaries to resemble human-written references, while the constraints prevent generated summaries from violating user-imposed requirements. The framework can be used to control important attributes of summarization, such as length, covered entities, and abstractiveness. Experiments show that the CMDP framework helps generate informative summaries while complying with specific attribute requirements. TACL 2021
StructSum: Summarization via Structured Representations The paper discusses the challenges faced by abstractive text summarization models, including layout bias, limited abstractiveness, and lack of transparency. The authors propose a framework based on document-level structure induction for summarization that incorporates latent and explicit dependencies across sentences in the source document into end-to-end single-document summarization models. The framework improves the coverage of content in the source documents, generates more abstractive summaries by generating more novel n-grams, and incorporates interpretable sentence-level structures, while performing on par with standard baselines. The framework was trained on the CNN/DM dataset. EACL 2021
Informative and Controllable Opinion Summarization The paper proposes a new approach to opinion summarization that eliminates the need for pre-selected content and allows for the use of all input reviews. The approach involves condensing the reviews into multiple dense vectors which are then used as input to an abstractive model. The framework also includes a zero-shot customization technique that takes user preferences into account. Experimental results show that the proposed model outperforms existing methods on the Rotten Tomatoes dataset and generates more informative and customized summaries. EACL 2021
Annotating and Modeling Fine-grained Factuality in Summarization The paper discusses the issue of factual errors in abstractive summarization systems and explores different data sources for training models to identify these errors. The authors found that factual errors differ significantly across datasets and that human-labeled data with fine-grained annotations is more effective for training models than synthetic data or sentence-level annotations. They also show that their best factuality detection model enables training of more factual summarization models by identifying non-factual tokens in the training data. NAACL 2021
Jointly Learning Guidance Induction and Faithful Summary Generation via Conditional Variational Autoencoders The paper discusses the challenge of generating factually consistent summaries through abstractive summarization and proposes a novel framework based on conditional variational autoencoders that induces guidance information and simultaneously generates summaries guided by it. Experiments on the XSUM and CNNDM datasets show that the approach generates relevant and fluent summaries that are more faithful than those of existing state-of-the-art approaches according to multiple factual consistency metrics. NAACL 2022
Efficient Few-Shot Fine-Tuning for Opinion Summarization The paper discusses the challenges of abstractive summarization in opinion summarization due to the lack of large annotated datasets of reviews paired with reference summaries. To address this, the authors propose a few-shot method based on adapters that can easily store in-domain knowledge. Instead of fine-tuning the entire model, adapters are added and pre-trained in a task-specific way on a large corpus of unannotated customer reviews, using held-out reviews as pseudo summaries. The adapters are then fine-tuned on the small available human-annotated dataset. The authors show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning. Additionally, for summary personalization, the authors condition on aspect keyword queries, automatically created from generic datasets. This results in better-organized summary content reflected in improved coherence and fewer redundancies. NAACL 2022
Towards Summarizing Healthcare Questions in Low-Resource Setting The paper discusses the challenges of creating large-scale datasets for abstractive document summarization in closed domains like healthcare, where human annotation requires domain expertise. The authors propose a data selection strategy that uses guided semantic-overlap and diversity-based objective functions to generate diverse and semantic questions in a low-resource setting. Their experiments on benchmark healthcare question summarization datasets show that their method achieves new state-of-the-art results and generates diverse, fluent, and informative summarized questions. COLING 2022
Modeling Content Importance for Summarization with Pre-trained Language Models The paper discusses the challenge of modeling content importance for summarization, which previous methods have struggled with due to their focus on word-level salience and lack of consideration for semantics and context. The authors propose a new approach that applies information theory to pretrained language models, allowing for a more comprehensive evaluation of importance that can be applied to different types of semantic units. Experiments on two datasets show that their method outperforms prior work in terms of F1 and ROUGE scores. EMNLP 2020
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning The paper proposes a method for automatic document summarization that learns from users' preferences instead of using reference summaries. The method reduces sample complexity by leveraging active learning, preference learning, and reinforcement learning techniques through a new objective function. The authors conducted both simulation and real-user experiments, which showed that their method significantly advances the state of the art. The source code is available for free on GitHub. EMNLP 2018
Closed-Book Training to Improve Summarization Encoder Memory The paper discusses the importance of a strong encoder in neural sequence-to-sequence summarization models and proposes a method to improve the encoder's memorization capabilities by adding an additional 'closed-book' decoder without attention and pointer mechanisms. This forces the encoder to be more selective in the information it encodes in its memory state, leading to improved performance on the CNN/Daily Mail dataset in terms of ROUGE and METEOR metrics, as well as human evaluation. The paper also presents several tests and ablations to demonstrate the effectiveness of the proposed method. EMNLP 2018
Generating topic-oriented summaries using neural attention The paper proposes an attention-based RNN framework to generate multiple summaries of a single document that are tuned to different topics of interest. Existing summarization algorithms generate a single summary and cannot generate multiple summaries that are tailored to the interests of different readers. The proposed method outperforms existing baselines and suggests that generative networks can be successfully biased to look at sentences relevant to a topic and generate topic-tuned summaries. NAACL 2018
Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization The paper proposes a new approach to seq2seq summarization that uses existing summaries as soft templates to guide the model. The authors retrieve proper summaries as candidate templates using an IR platform and extend the seq2seq framework to conduct template reranking and template-aware summary generation. Experiments show that this approach significantly outperforms state-of-the-art methods, and that the soft templates on their own are already highly competitive. Importing high-quality external summaries also improves the stability and readability of generated summaries. ACL 2018
Attribute-aware Sequence Network for Review Summarization The paper proposes an Attribute-aware Sequence Network (ASN) for review summarization that takes into account users' characteristics such as gender, age, and occupation. The ASN includes three modules: an attribute encoder, an attribute-aware review encoder, and an attribute-aware summary decoder. The authors validate their model using a new dataset called TripAtt, which includes 495,440 attribute-review-summary triplets. The experiments show that ASN achieves state-of-the-art performance on review summarization in both auto-metric ROUGE and human evaluation. EMNLP 2019
Pretraining-Based Natural Language Generation for Text Summarization The paper proposes a new pretraining-based encoder-decoder framework that generates an output sequence from an input sequence in two stages. The encoder uses BERT to encode the input sequence into context representations, and in the first stage a Transformer-based decoder generates a draft output sequence. In the second stage, each word of the draft sequence is masked and fed to BERT, and the input sequence is combined with the draft representation generated by BERT so that a Transformer-based decoder can predict the refined word for each masked position. The authors describe this as the first approach to apply BERT to text generation tasks; evaluated on text summarization, it achieves new state-of-the-art results on both the CNN/Daily Mail and New York Times datasets. CONLL 2019
Convex Aggregation for Opinion Summarization The paper discusses recent advances in text autoencoders and their ability to generate grammatically correct and consistent text from aggregated latent vectors. However, the commonly used simple average approach for vector aggregation can lead to overly generic summaries due to unexpected L2-norm shrinkage in the aggregated latent vectors, which the paper refers to as summary vector degeneration. To address this issue, the authors develop a framework called COOP, which searches input combinations for the latent vector aggregation using input-output word overlap. Experimental results show that COOP successfully alleviates the summary vector degeneration issue and establishes new state-of-the-art performance on two opinion summarization benchmarks. The code for COOP is available at https://github.com/megagonlabs/coop. EMNLP 2021
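To make the degeneration issue concrete, the plain-Python sketch below uses toy 2-D vectors as stand-ins for autoencoder latents: averaging unit-norm vectors that point in different directions yields a shorter vector. It also shows a brute-force search over input subsets in the spirit of COOP. The `score` callable is a hypothetical placeholder; in the paper, candidates are scored by word overlap between the decoded summary and the input reviews, which needs a trained decoder that is not modelled here.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def average(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors (the 'simple average' aggregation)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def best_subset_average(
    vectors: Sequence[Sequence[float]],
    score: Callable[[Vector], float],
) -> Tuple[Tuple[int, ...], Vector]:
    """Average every non-empty subset of inputs and keep the best-scoring average.

    Enumerating all 2**n - 1 subsets is only feasible for a handful of inputs.
    `score` is a hypothetical stand-in for COOP's input-output word-overlap criterion.
    """
    best_idx: Tuple[int, ...] = ()
    best_vec: Vector = []
    best_score = -math.inf
    for k in range(1, len(vectors) + 1):
        for idx in itertools.combinations(range(len(vectors)), k):
            candidate = average([vectors[i] for i in idx])
            s = score(candidate)
            if s > best_score:
                best_idx, best_vec, best_score = idx, candidate, s
    return best_idx, best_vec


# Two orthogonal unit vectors: their mean has norm ~0.707 rather than 1.
latents = [[1.0, 0.0], [0.0, 1.0]]
print(l2_norm(average(latents)))
```

In a realistic setup, `score` would decode `candidate` into text and count the words it shares with the input reviews, so subsets whose average still decodes to specific content win over the shrunken full average.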
Learning Opinion Summarizers by Selecting Informative Reviews The paper discusses the challenges of opinion summarization and proposes a new approach that jointly learns to select informative subsets of reviews and to summarize the opinions expressed in these subsets. The authors collected a large dataset of summaries paired with user reviews for over 31,000 products, but the large number of reviews per product made summarizing all of them impractical. The authors use amortized variational inference and policy gradient methods for joint training and show that selecting informative reviews improves summary quality and reduces hallucinations. EMNLP 2021
A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss The paper proposes a dual-view model that jointly improves review summarization and sentiment classification tasks. The model uses an encoder to learn a context representation for the review and a summary decoder to generate a review summary. Two sentiment classifiers are used to predict sentiment labels for the review and generated summary. An inconsistency loss is introduced during training to penalize disagreement between the two classifiers and help the decoder generate a summary with a consistent sentiment tendency. Experiment results on four real-world datasets demonstrate the effectiveness of the proposed model. SIGIR 2020
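As a rough illustration of the inconsistency loss, the sketch below assumes that disagreement between the review-view and summary-view classifiers is measured as the KL divergence between their predicted sentiment distributions. The KL form, its direction, and the function names are our assumptions for illustration; the paper's exact formulation may differ. The penalty is zero when both classifiers agree and grows as they diverge.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over sentiment-label logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def inconsistency_loss(review_logits: Sequence[float], summary_logits: Sequence[float]) -> float:
    """Assumed penalty: KL(summary-view distribution || review-view distribution)."""
    return kl_divergence(softmax(summary_logits), softmax(review_logits))


# Identical predictions give zero penalty; opposite sentiment gives a positive one.
agree = inconsistency_loss([2.0, 0.1, -1.0], [2.0, 0.1, -1.0])
disagree = inconsistency_loss([2.0, 0.1, -1.0], [-1.0, 0.1, 2.0])
print(agree, disagree)
```

During training, a term like this would be added to the summarization and classification losses so that gradients push the decoder toward summaries whose sentiment matches the review's.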
Document Summarization with VHTM: Variational Hierarchical Topic-Aware Mechanism The paper introduces a new approach, VHTM, that combines summarization with topic inference and merges topics into multiple granularity levels. This is in contrast to previous work that relied on pre-trained single-grained topic models. The approach is validated through comprehensive experiments, which demonstrate its superior performance compared to baselines. AAAI 2020
Learning to summarize from human feedback The paper discusses how language models are limited by the data and metrics used for a particular task, such as summarization models being trained to predict human reference summaries and evaluated using ROUGE. The authors propose training a model to optimize for human preferences, using a large dataset of human comparisons between summaries and reinforcement learning. They apply their method to a version of the TL;DR dataset of Reddit posts and find that their models significantly outperform both human reference summaries and larger models fine-tuned with supervised learning alone. The authors also conduct extensive analyses to understand their human feedback dataset and fine-tuned models and establish that their reward model generalizes to new datasets and results in better summaries than optimizing ROUGE according to humans. The paper aims to motivate machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want. NEURIPS 2020
TLDR: Extreme Summarization of Scientific Documents The paper introduces TLDR generation, a new form of extreme summarization for scientific papers that involves high compression of the source material and requires expert knowledge of domain-specific language. To facilitate research on this task, the authors introduce SCITLDR, a dataset of 5.4K TLDRs over 3.2K papers that includes both author-written and expert-derived summaries. The authors propose CATTS, a learning strategy that uses titles as an auxiliary training signal to generate TLDRs. CATTS outperforms strong baselines under both automated metrics and human evaluations. The data and code for this research are publicly available at https://github.com/allenai/scitldr. EMNLP 2020
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks The paper discusses the effectiveness of using pre-trained checkpoints for Sequence Generation. The authors developed a Transformer-based sequence-to-sequence model that is compatible with pre-trained BERT, GPT-2, and RoBERTa checkpoints. They conducted an empirical study and found that initializing their model with these checkpoints resulted in new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion. This demonstrates the potential of pre-training for Sequence Generation tasks. TACL 2020
Learning to Fuse Sentences with Transformers for Summarization This paper explores the ability of Transformers to fuse sentences and proposes algorithms that enhance this ability by leveraging knowledge of points of correspondence between sentences. The authors conducted extensive experiments to investigate the effects of different design choices on Transformer's performance and found that modeling points of correspondence between sentences is crucial for effective sentence fusion. The ability to fuse sentences is important for summarization systems to produce succinct abstracts, but current summarizers can fail at it, producing few summary sentences by fusion or generating incorrect fusions that fail to retain the original meaning. EMNLP 2020
Focus Attention: Promoting Faithfulness and Diversity in Summarization The paper introduces a new method called Focus Attention Mechanism that encourages seq2seq decoders to generate tokens that are similar or topical to the input document. The authors also propose a Focus Sampling method to enable the generation of diverse summaries. The evaluation on the BBC extreme summarization task shows that models augmented with Focus Attention generate summaries that are closer to the target and more faithful to their input documents, outperforming their vanilla counterparts on ROUGE and multiple faithfulness measures. The paper also demonstrates that Focus Sampling is more effective in generating diverse and faithful summaries than other decoding methods. ACL 2021
HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization The paper discusses the importance of document structure for efficient information consumption, but notes that it is difficult to encode this structure into modern Transformer architectures. The authors present HIBRIDS, a model that injects hierarchical biases into attention scores to better incorporate document structure. They also introduce a new task, hierarchical question-summary generation, which involves summarizing content into a hierarchy of questions and summaries. The authors annotate a new dataset with over 6,000 question-summary hierarchies labeled on long government reports and show that their model produces better hierarchies than comparisons on both hierarchy quality and content coverage. The model also improves the generation of long-form summaries from government reports and Wikipedia articles, as measured by ROUGE scores. ACL 2022
Learning Summary Prior Representation for Extractive Summarization The paper introduces the concept of summary prior, which determines how much of a sentence should be included in a summary without considering its context. The authors propose a new summary system called PriorSum, which uses convolutional neural networks to capture summary prior features from length-variable phrases. The learned prior features are combined with document-dependent features for sentence ranking. Experiments on the DUC generic summarization benchmarks show that PriorSum outperforms existing methods and can identify different aspects supporting the summary prior. ACL 2015
Transformer Reasoning Network for Personalized Review Summarization The paper proposes a novel transformer-based reasoning framework for personalized review summarization in E-commerce platforms. The quality of generated summaries is highly related to the characteristics of users and products, including their historical summaries. However, most previous works ignore the interaction between the input review and corresponding historical summaries. The proposed approach involves inter- and intra-attention in the encoder to learn the personalized representation of the input review and a memory-decoder attention module in the decoder to retrieve more useful information for the final summary generation. The approach outperforms many competitive baseline methods in generating more reasonable summaries for recommendation. SIGIR 2021
RefSum: Refactoring Neural Summarization The paper presents a new framework called Refactor for text summarization and summary combination. The authors highlight the limitations of previous methods and perform a comprehensive evaluation involving twenty-two base systems, four datasets, and three different application scenarios. The Refactor model achieves new state-of-the-art results on the CNN/DailyMail dataset. The authors open-source all the code and provide a convenient interface for other researchers to use as an off-the-shelf tool to achieve further performance improvements. NAACL 2021
Noisy Self-Knowledge Distillation for Text Summarization The paper proposes a new method called self-knowledge distillation for text summarization that improves the training process by using guidance from a teacher model and multiple noise signals to better model uncertainty. The proposed method achieves state-of-the-art results on three benchmarks for both pretrained and non-pretrained summarizers. NAACL 2021
Self-Supervised Learning for Contextualized Extractive Summarization The paper proposes a new approach to improve extractive summarization by introducing three pre-training tasks that capture document-level context in a self-supervised manner. The proposed method is validated through experiments on the CNN/DM dataset, and the results show that a simple model with pre-training outperforms previous state-of-the-art models. ACL 2019
Efficient Attentions for Long Document Summarization HEPOS is a new efficient encoder-decoder attention model that effectively identifies important information from a source document for summarization. The authors conducted a study of existing efficient self-attentions and combined them with HEPOS to process ten times more tokens than existing models that use full attentions. They also presented a new dataset, GOVREPORT, with longer documents and summaries, and showed that their models produced significantly higher ROUGE scores than competitive comparisons, including new state-of-the-art results on PubMed. Human evaluation also showed that their models generated more informative summaries with fewer unfaithful errors. NAACL 2021
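The head-wise positional strides idea behind HEPOS can be sketched as a boolean mask over source positions, assuming (as we understand the method) that decoder head h attends only to source positions j with j mod s equal to h mod s for a stride s. The function name `hepos_mask` and the stride value are hypothetical; a real implementation would apply such a mask inside the encoder-decoder attention of a Transformer rather than build Python lists.

```python
from typing import List


def hepos_mask(num_heads: int, source_len: int, stride: int) -> List[List[bool]]:
    """mask[h][j] is True when decoder head h may attend to source position j.

    With stride s, head h sees positions h % s, h % s + s, h % s + 2s, ...,
    so each head covers about source_len / s tokens, and when
    num_heads >= stride the heads jointly cover every position.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return [[j % stride == h % stride for j in range(source_len)] for h in range(num_heads)]


mask = hepos_mask(num_heads=4, source_len=8, stride=2)
for h, row in enumerate(mask):
    # Heads 0 and 2 see even positions; heads 1 and 3 see odd positions.
    print(h, [j for j, allowed in enumerate(row) if allowed])
```

The design trade-off is that each head's attention cost drops by roughly a factor of s, while the union of heads still touches the whole source, which is what lets the model process much longer inputs.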
Aspect-Controllable Opinion Summarization This paper proposes a new approach for generating customized summaries based on aspect queries, such as describing the location and room of a hotel. The authors create a synthetic training dataset enriched with aspect controllers and fine-tune a pretrained model to generate aspect-specific summaries. Experiments show that their model outperforms previous state-of-the-art methods and can generate personalized summaries by controlling the number of aspects discussed. EMNLP 2021
Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback The paper discusses the importance of incorporating human preferences in summarization models to align with human interests. It proposes a new framework for training summarization models with preference feedback in an interactive manner, leveraging offline data and a novel reward model to improve performance and sample efficiency. The experiments conducted on three datasets confirm the benefits of the proposed framework in active, few-shot, and online settings of preference learning. NAACL 2022
Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models The paper proposes a new NLP task of generating a list of problems in a patient's daily care plan using input from provider's progress notes during hospitalization. The study investigates the performance of T5 and BART, two state-of-the-art seq2seq transformer architectures, in solving this problem. The evaluation methods include ROUGE, BERTScore, cosine similarity on sentence embedding, and F-score on medical concepts. The results show that T5 with domain adaptive pre-training achieves significant performance gains compared to a rule-based system and general domain pre-trained language models, indicating a promising direction for tackling the problem summarization task. The study provides a corpus built on top of progress notes from publicly available electronic health record progress notes in the Medical Information Mart for Intensive Care (MIMIC)-III. COLING 2022
Document Summarization with Latent Queries The paper discusses the development of neural models for creating generic summaries for single or multiple documents, driven by the availability of large-scale datasets. However, for query-focused summarization (QFS), labeled training data is not easily accessible. The authors propose a unified modeling framework for any type of summarization, assuming that all summaries are a response to a query, which is observed in QFS and latent in generic summarization. They model queries as discrete latent variables over document tokens and learn representations compatible with observed and unobserved query verbalizations. The framework formulates summarization as a generative process and optimizes a latent query model and a conditional language model. Despite learning from generic summarization data only, their approach outperforms strong comparison systems across benchmarks, query types, document settings, and target domains. TACL 2022
Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization The paper discusses the use of long-span language models (LMs) in unsupervised query-focused extractive summarization systems. The authors propose the use of Across Sentence Boundary LSTM-based LMs (ASBLSTM and biASBLSTM) that are specifically designed for this task. They conducted experiments on a real-world corpus with 100 Wikipedia event descriptions as queries and found that using the long-span models in an integer linear programming (ILP) formulation of the MMR criterion was the most effective approach compared to several state-of-the-art baseline methods from the literature. ECIR 2018
Identifying Implicit Quotes for Unsupervised Extractive Summarization of Conversations The paper proposes an unsupervised extractive neural summarization model called Implicit Quote Extractor for conversational texts. The model aims to extract quoted sentences as summaries, even if they are not explicitly shown in replies. The training task of the model is to predict whether a reply candidate is a true reply to a post, and to do so, the model learns to extract sentences that replies frequently refer to. The model is evaluated on two email datasets and one social media dataset, and the results confirm that it is useful for extractive summarization. The paper also discusses whether quote extraction is an important factor for summarization and whether the model can capture salient sentences that conventional methods cannot. AACL 2020
Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers The paper discusses a new method for unsupervised extractive document summarization, which involves selecting important sentences from a document without using labeled summaries during training. The authors propose using transformer attentions to rank sentences, and pre-train a hierarchical transformer model using unlabeled documents only. They then use sentence-level self-attentions and pre-training objectives to rank sentences. Experiments on CNN/DailyMail and New York Times datasets show that their model achieves state-of-the-art performance on unsupervised summarization, and is less dependent on sentence positions. When combined with a recent unsupervised model explicitly modeling sentence positions, the results are even better. EMNLP 2020
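The attention-based part of the ranking can be sketched as follows, assuming a sentence-level self-attention matrix has already been taken from the pre-trained hierarchical transformer and that a sentence's score is the total attention it receives from the other sentences. This covers only the attention signal; the paper also uses its pre-training objectives to score sentences, which is not modelled here, and the function names are hypothetical.

```python
from typing import List, Sequence


def attention_received(attn: Sequence[Sequence[float]]) -> List[float]:
    """attn[i][j] is the weight sentence i puts on sentence j (rows sum to 1).

    A sentence's score is the total attention it receives from the other
    sentences; attention a sentence pays to itself (the diagonal) is ignored.
    """
    n = len(attn)
    return [sum(attn[i][j] for i in range(n) if i != j) for j in range(n)]


def rank_sentences(attn: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k highest-scoring sentences in document order."""
    scores = attention_received(attn)
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(top)


# Sentence 1 receives most of the attention, then sentence 2.
attn = [[0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.2, 0.7, 0.1]]
print(rank_sentences(attn, k=2))
```

Sorting the selected indices back into document order is a common convention for extractive summaries, so the output reads in the source's original sequence.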
Improving Unsupervised Extractive Summarization with Facet-Aware Modeling The paper discusses the problem of facet bias in unsupervised extractive summarization, where existing graph-based methods tend to select sentences within the same facet. To address this, the authors propose a facet-aware centrality-based ranking model that introduces a sentence-document weight to pay more attention to different facets. The method is evaluated on 8 benchmark datasets and consistently outperforms strong baselines, especially in long and multi-document scenarios. The performance gains are attributed to alleviating the facet bias problem. ACL 2021
Unsupervised Extractive Summarization using Pointwise Mutual Information The paper proposes a new approach to unsupervised extractive summarization using pointwise mutual information (PMI) between sentences to measure relevance and redundancy. The method involves a greedy sentence selection algorithm to maximize relevance and minimize redundancy of extracted sentences. The authors show that their method outperforms similarity-based methods on datasets in various domains, including news, medical journal articles, and personal anecdotes. EACL 2021
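The greedy selection step can be sketched as follows, assuming sentence-to-sentence PMI values have already been estimated (the paper derives them from a pretrained language model, which is omitted here). Measuring relevance as average PMI with the other sentences, measuring redundancy as maximum PMI with already-selected sentences, and the `redundancy_weight` parameter are our simplifying assumptions, not the paper's exact objective.

```python
from typing import List, Sequence


def greedy_pmi_select(
    pmi: Sequence[Sequence[float]],
    budget: int,
    redundancy_weight: float = 1.0,
) -> List[int]:
    """Greedily pick up to `budget` sentence indices from a precomputed PMI matrix.

    pmi[i][j] is the (assumed precomputed) PMI between sentences i and j.
    Each step adds the sentence with the best relevance minus weighted redundancy.
    """
    n = len(pmi)
    # Relevance: average PMI between a sentence and every other sentence.
    relevance = [sum(pmi[i][j] for j in range(n) if j != i) / max(n - 1, 1) for i in range(n)]
    selected: List[int] = []
    while len(selected) < min(budget, n):
        best, best_score = -1, float("-inf")
        for i in range(n):
            if i in selected:
                continue
            # Redundancy: strongest PMI link to anything already in the summary.
            redundancy = max((pmi[i][j] for j in selected), default=0.0)
            score = relevance[i] - redundancy_weight * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


# Sentence 0 is most relevant; sentence 1 overlaps heavily with it, so 2 is picked next.
pmi = [[0.0, 2.0, 1.0], [2.0, 0.0, 0.5], [1.0, 0.5, 0.0]]
print(greedy_pmi_select(pmi, budget=2))
```

Raising `redundancy_weight` trades relevance for diversity; with a weight of zero the procedure reduces to picking the top sentences by relevance alone.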
Unsupervised Extractive Opinion Summarization Using Sparse Coding The paper presents a new method called Semantic Autoencoder (SemAE) for extractive opinion summarization in an unsupervised manner. SemAE uses dictionary learning to capture semantic information from reviews and learns a latent representation of each sentence over semantic units. The extractive summarization algorithm leverages these representations to identify representative opinions among hundreds of reviews. SemAE can also perform controllable summarization to generate aspect-specific summaries. The authors report strong performance on SPACE and AMAZON datasets and provide their code publicly. ACL 2022
Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised The paper presents a neural framework for summarizing opinions from online product reviews. The framework is knowledge-lean and only requires light supervision in the form of product domain labels and user-provided ratings. The method combines two weakly supervised components to identify salient opinions and form extractive summaries from multiple reviews. The authors introduce an opinion summarization dataset that includes a training set of product reviews from six diverse domains and human-annotated development and test sets with gold standard aspect annotations, salience labels, and opinion summaries. Automatic evaluation shows significant improvements over baselines, and a large-scale study indicates that the opinion summaries generated by the framework are preferred by human judges according to multiple criteria. EMNLP 2018
Discourse-Aware Unsupervised Summarization of Long Scientific Documents The paper proposes an unsupervised graph-based ranking model for summarizing long scientific documents. The method uses a two-level hierarchical graph representation of the document and asymmetrical positional cues to determine sentence importance. The approach outperforms strong unsupervised baselines in automatic metrics and human evaluation on the PubMed and arXiv datasets. It also achieves performance comparable to many state-of-the-art supervised approaches. The results suggest that patterns in the discourse structure are a strong signal for determining importance in scientific articles. EACL 2021
SummVD : An efficient approach for unsupervised topic-based text summarization The paper introduces a new method called SummVD for automatic unsupervised extractive summarization. It uses singular value decomposition and word clustering to reduce the dimensionality of word embeddings and to represent words on a small number of dimensions, each representing a hidden topic. SummVD is efficient and outperforms recent extractive approaches. It requires few resources in terms of data and computing power, making it suitable for use in live summarization systems. AACL 2022
Summarization Evaluation in the Absence of Human Model Summaries Using the Compositionality of Word Embeddings The paper presents a new approach for evaluating the quality of summaries without the need for human model summaries. The approach uses word embeddings to develop features that reflect coverage, diversity, informativeness, and coherence of summaries. These features are then used to train a learning model for predicting summary content quality. The proposed metric was evaluated on data from query-focused and update summarization tasks in TAC 2008 and 2009, and the results show that the feature combination provides reliable estimates of summary content quality when model summaries are not available. COLING 2018
GUSUM: Graph-Based Unsupervised Summarization using Sentence Features Scoring and Sentence-BERT The paper presents a new method for unsupervised extractive document summarization called Graph-Based Unsupervised Summarization (GUSUM). The method uses sentence embeddings and features to modify traditional graph ranking algorithms and compute sentence centrality. The approach aims to include the most important sentences while excluding those with similar meanings in the summary. The method is evaluated on several datasets and achieves high performance when evaluated both automatically and by humans. COLING 2022
Sentence Centrality Revisited for Unsupervised Summarization The paper discusses the development of an unsupervised approach for single document summarization, which utilizes a modified graph-based ranking algorithm. The algorithm incorporates BERT, a neural representation learning model, to capture sentential meaning, and builds graphs with directed edges to consider the relative position of nodes in a document. The approach was tested on three news summarization datasets and outperformed strong baselines by a significant margin. The authors argue that this approach is more realistic than relying on large-scale and high-quality training data for different types of summaries, domains, or languages. ACL 2019
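A minimal sketch of degree centrality on a directed sentence graph, assuming a precomputed sentence-similarity matrix. Edges to preceding and following sentences are weighted separately by `lam_prev` and `lam_next` (hypothetical names). This illustrates only how directed edges let position influence centrality; it does not reproduce the paper's BERT-based similarity or its tuned weights.

```python
from typing import List


def directed_centrality(sim: List[List[float]],
                        lam_prev: float,
                        lam_next: float) -> List[float]:
    """Score each sentence by similarity-weighted degree, split by direction.

    sim[i][j] : similarity between sentences i and j (assumed precomputed)
    lam_prev  : weight on edges to sentences that appear before sentence i
    lam_next  : weight on edges to sentences that appear after sentence i
    """
    n = len(sim)
    scores = []
    for i in range(n):
        before = sum(sim[i][j] for j in range(i))
        after = sum(sim[i][j] for j in range(i + 1, n))
        scores.append(lam_prev * before + lam_next * after)
    return scores
```

Setting `lam_next` above `lam_prev` favours sentences that are similar to the text following them, which tends to reward earlier sentences; setting both weights equal recovers ordinary undirected degree centrality.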
Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction The paper discusses the process of automatic sentence summarization, which involves creating a shorter version of a sentence while retaining its most important information. The authors propose an unsupervised objective function that considers language fluency and semantic similarity metrics to find a high-scoring summary through discrete optimization. Their method achieves a new state-of-the-art for unsupervised sentence summarization according to ROUGE scores. The authors also highlight the sensitivity of the commonly reported ROUGE F1 metric to summary length and suggest that future evaluation should group summarization systems by output length brackets. ACL 2020
Extractive Summarisation Based on Keyword Profile and Language Model The paper presents a statistical framework for summarizing scientific papers by extracting information-rich citation sentences that capture the main contributions of the paper. The framework involves two stages: salient keywords are automatically discovered in the first stage, and citation sentences that best capture the paper's main contributions are identified in the second stage. Using methods rooted in quantitative statistics and information theory, the approach outperforms current state-of-the-art systems in scientific paper summarization. NAACL 2015
Integrating Importance, Non-Redundancy and Coherence in Graph-Based Extractive Summarization The paper proposes a graph-based method for extractive single-document summarization that considers importance, non-redundancy, and local coherence simultaneously. The method uses a bipartite graph consisting of sentence and entity nodes to rank sentences based on importance and ensure non-redundancy and local coherence of the summary. The method is applied to scientific articles from the journal PLOS Medicine and achieves better results than other systems on this data. The method also achieves state-of-the-art results on DUC 2002 data, and incorporating the local coherence measure always achieves the best results. Human judgments are used to evaluate the coherence of the summaries. IJCAI 2015
Topical Coherence for Graph-based Extractive Summarization The paper presents an approach for extractive single-document summarization using a weighted graphical representation of documents obtained by topic modeling. The approach optimizes importance, coherence, and non-redundancy simultaneously using integer linear programming (ILP). The system's performance is compared with state-of-the-art results on scientific articles from PLOS Medicine and on DUC 2002 data using ROUGE scores. Human judges evaluate the coherence of summaries generated by the system in comparison to two baselines, and the approach obtains competitive performance. EMNLP 2015
Gibberish, Assistant, or Master? Using Tweets Linking to News for Extractive Single-Document Summarization The paper explores using tweets linking to news for generating extractive summaries of documents. By regarding every tweet as a vote for candidate sentences, the authors use unsupervised summarization models to rank candidate extracts via a random walk on a heterogeneous graph. The linking tweets can thus opportunistically "supervise" the summarization with no need for reference summaries. The influence of the volume and latency of tweets on the quality of output summaries is analyzed. Compared with truly supervised summarizers unaware of tweets, their method achieves significantly better results with a reasonably small tradeoff on latency. Compared with the same summarizers using tweets as auxiliary features, their method is comparable while needing fewer tweets and much less time to achieve significant outperformance. SIGIR 2015
A Redundancy-Aware Sentence Regression Framework for Extractive Summarization The paper proposes a new approach to extractive summarization that models sentence importance and redundancy simultaneously by evaluating the relative importance of a sentence given a set of selected sentences. The proposed method uses a new framework to conduct regression with respect to the relative gain of a sentence calculated by the ROUGE metric and incorporates additional features derived from sentence relations. Experiments on multi-document summarization datasets show that the proposed method outperforms state-of-the-art extractive summarization approaches. COLING 2016
Combining Graph Degeneracy and Submodularity for Unsupervised Extractive Summarization The paper presents an unsupervised text summarization system that uses a submodularity framework to generate summaries in a greedy way while maintaining high performance. The system includes a novel coverage reward term that assigns scores to words based on the graph-of-words representation of text and the k-core decomposition algorithm. The system was evaluated on three datasets and achieved state-of-the-art performance, particularly in the meeting domain. EMNLP 2017
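A minimal sketch of the k-core idea behind the coverage reward, assuming an unweighted, undirected graph-of-words built with a sliding co-occurrence window. The paper's weighting scheme and exact reward are not reproduced; `graph_of_words`, `core_numbers`, and `coverage_reward` are hypothetical helpers. Core numbers are computed with the standard peeling procedure (repeatedly remove the lowest-degree node).

```python
from typing import Dict, Iterable, List, Set


def graph_of_words(tokens: List[str], window: int = 2) -> Dict[str, Set[str]]:
    """Link each word to the words that follow it within `window` tokens."""
    graph: Dict[str, Set[str]] = {t: set() for t in tokens}
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph[tokens[j]].add(w)
    return graph


def core_numbers(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """k-core decomposition by peeling: core[v] = largest k with v in the k-core."""
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    remaining = set(graph)
    core: Dict[str, int] = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: (degree[u], u))  # lowest current degree
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in graph[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def coverage_reward(summary_tokens: Iterable[str], core: Dict[str, int]) -> int:
    """Sum core numbers over *distinct* covered words (a submodular coverage score)."""
    return sum(core.get(w, 0) for w in set(summary_tokens))
```

Counting each distinct word once makes the reward a weighted set-coverage function, which is monotone submodular over sets of sentences; that property is what allows a greedy selection loop to come with approximation guarantees.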
Enumeration of Extractive Oracle Summaries The paper proposes an Integer Linear Programming formulation to obtain extractive oracle summaries in terms of ROUGE-N and an algorithm that enumerates all of the oracle summaries for a set of reference summaries to evaluate system summaries. The experimental results show that there is room for improvement in extractive summarization and that F-measures derived from the enumerated oracle summaries have stronger correlations with human judgment than those derived from single oracle summaries. EACL 2017
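A minimal sketch of oracle enumeration for toy inputs. The paper uses an ILP formulation; this sketch swaps in exhaustive search over sentence subsets (exponential, so only suitable for a handful of sentences), scores against a single reference with ROUGE-N recall, and assumes a word budget. All function names are hypothetical. Every subset that ties for the best score is returned, which is the point of enumerating oracles rather than keeping one.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: Counter, reference: Counter) -> float:
    total = sum(reference.values())
    if total == 0:
        return 0.0
    return sum((candidate & reference).values()) / total  # clipped overlap


def enumerate_oracles(sentences: List[List[str]], reference: List[str],
                      max_words: int, n: int = 1) -> Tuple[float, List[Tuple[int, ...]]]:
    ref = ngrams(reference, n)
    best, oracles = -1.0, []
    for k in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), k):
            if sum(len(sentences[i]) for i in subset) > max_words:
                continue  # violates the length budget
            # Count n-grams per sentence so none span a sentence boundary.
            cand = sum((ngrams(sentences[i], n) for i in subset), Counter())
            score = rouge_n_recall(cand, ref)
            if score > best:
                best, oracles = score, [subset]
            elif score == best:
                oracles.append(subset)
    return best, oracles
```

For example, with two identical one-word sentences `["cat"]` and a reference `["cat"]`, both `(0,)` and `(1,)` are returned as oracles.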
Toward Extractive Summarization of Online Forum Discussions via Hierarchical Attention Networks This paper discusses the task of forum thread summarization, which has not been extensively studied. The authors propose a model that uses hierarchical attention networks and neural attention mechanisms to build sentence and thread representations for summarization. The results show that their approach outperforms other methods and that removing redundancies is important for achieving the best results. AAAI 2017
Extractive Summarization Using Multi-Task Learning with Document Classification The paper proposes a framework for automatic document summarization that extracts sentences using externally related information. The focus is on single document summarization using small amounts of reference summaries, and the framework uses multitask learning with curriculum learning for sentence extraction and document classification. The proposed method is evaluated on financial report and news corpus datasets, and the results show comparable performance to state-of-the-art systems. EMNLP 2017
Rank-Aware Gain-Based Evaluation of Extractive Summarization The paper discusses the limitations of the ROUGE metric for evaluating extractive summarization tasks and proposes a new evaluation metric called Sem-nCG, which is both rank-aware and semantic-aware. The paper also demonstrates how to generate more reliable semantic-aware ground truths for evaluating extractive summarization tasks without additional human intervention. Preliminary experimental results show that the Sem-nCG metric is semantic-aware and has a higher correlation with human judgement for single document summarization when a single reference is considered. CIKM 2022
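A minimal sketch of normalized cumulative gain at rank k (nCG@k), the rank-aware quantity Sem-nCG builds on. The per-sentence gains here are illustrative numbers supplied by the caller; deriving them semantically, as the paper does, is not reproduced, and `ncg_at_k` is a hypothetical name.

```python
from typing import Dict, List


def ncg_at_k(system_ranking: List[int], gains: Dict[int, float], k: int) -> float:
    """Cumulative gain of the system's top-k sentences divided by the ideal top-k gain.

    system_ranking : sentence ids in the order the system extracts them
    gains          : assumed per-sentence gain (relevance) values
    """
    cg = sum(gains.get(s, 0.0) for s in system_ranking[:k])
    ideal = sum(sorted(gains.values(), reverse=True)[:k])
    return cg / ideal if ideal > 0 else 0.0
```

With gains `{0: 3.0, 1: 2.0, 2: 1.0, 3: 0.0}`, a system that extracts sentences 2 and 0 scores (1 + 3) / (3 + 2) = 0.8 at k = 2, while extracting 0 and 1 scores 1.0.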
A Supervised Approach to Extractive Summarisation of Scientific Papers The paper discusses the challenges of summarizing large, complex scientific publications using neural approaches, which require large datasets. The authors introduce a new dataset for summarization of computer science publications and develop models using both neural sentence encoding and traditional summarization features. They find that models that encode sentences and their local and global context perform best, outperforming established baseline methods. CONLL 2017
SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents The paper presents SummaRuNNer, a Recurrent Neural Network (RNN) based model for extractive summarization of documents. The model achieves performance better than or comparable to state-of-the-art and is very interpretable, allowing visualization of its predictions broken up by abstract features such as information content, salience, and novelty. The paper also introduces abstractive training of the extractive model, which can train on human-generated reference summaries alone, eliminating the need for sentence-level extractive labels. AAAI 2017
Hybrid MemNet for Extractive Summarization The paper discusses the problem of extractive text summarization and the limitations of conventional approaches that rely on manually compiled features. The authors propose a data-driven system called Hybrid MemNet, which uses an end-to-end deep network to learn a continuous unified representation of a document and generate its summary. The system captures both local and global sentential information and identifies summary-worthy sentences. Experimental results on two corpora show significant performance gains compared to state-of-the-art baselines. CIKM 2017
Extractive Summarization with SWAP-NET: Sentences and Words from Alternating Pointer Networks SWAP-NET is a new neural sequence-to-sequence model for extractive summarization that identifies both salient sentences and key words in an input document, and then combines them to form the extractive summary. The model uses a new two-level pointer network based architecture that models the interaction of key words and salient sentences. Experiments on large scale benchmark corpora demonstrate that SWAP-NET outperforms state-of-the-art extractive summarizers. ACL 2018
SQuALITY: Building a Long-Document Summarization Dataset the Hard Way The paper discusses the challenges of assembling summarization datasets and proposes a new approach of hiring contractors to write original summaries from scratch. The resulting dataset, SQuALITY, consists of question-focused summaries and is shown to be challenging for state-of-the-art summarization systems. The authors also note that existing automatic evaluation metrics are weak indicators of summary quality. SQuALITY is available for use at https://github.com/nyu-mll/SQuALITY. EMNLP 2022
Reinforced Extractive Summarization with Question-Focused Rewards The paper proposes a new training method for extractive summarization that uses Cloze-style comprehension questions as rewards rather than sentence labels obtained by aligning human abstracts with source documents, since such alignments are often inaccurate. The method encourages system summaries to preserve important source content and share common words with the abstracts, and uses reinforcement learning with a question-focused reward function to promote concise, fluent, and informative summaries. Experiments show that the proposed method is effective and outperforms state-of-the-art systems on standard summarization datasets. ACL 2018
Attentive Encoder-based Extractive Text Summarization The paper proposes an attentive encoder-based summarization (AES) model for generating article summaries that considers both the global information of a document and the relationships of sentences in the document. The model uses both unidirectional and bidirectional recurrent neural networks (RNNs) to construct encoders, resulting in unidirectional attentive encoder-based summarization (Uni-AES) and bidirectional attentive encoder-based summarization (Bi-AES). The experimental results show that Bi-AES outperforms Uni-AES and achieves substantial improvements over a relevant baseline. CIKM 2018
Ranking Sentences for Extractive Summarization with Reinforcement Learning This paper proposes a new algorithm for single document summarization, which is the task of creating a shorter version of a document while retaining its main information. The algorithm is based on a sentence ranking task and uses a reinforcement learning objective to optimize the ROUGE evaluation metric. The authors trained a neural summarization model using this algorithm on the CNN and DailyMail datasets and found that it outperformed existing extractive and abstractive systems in both automatic and human evaluations. NAACL 2018
On Extractive and Abstractive Neural Document Summarization with Transformer Language Models The paper presents a neural method for producing abstractive summaries of long documents. The method performs a simple extractive step before generating a summary; the extracted content is then used to condition the transformer language model on relevant information. The approach produces more abstractive summaries compared to prior work that employs a copy mechanism, while still achieving higher ROUGE scores. The authors provide extensive comparisons with strong baseline methods and multiple variants of their approach, using four different summarization tasks and datasets. They find that transformer-based methods produce summaries with fewer n-gram copies, leading to n-gram copying statistics that are more similar to human-generated abstracts. A human evaluation shows that transformers are ranked highly for coherence and fluency, but purely extractive methods score higher for informativeness and relevance. The authors hope that their architectures and experiments may serve as strong points of comparison for future work. EMNLP 2020
Fine-grained Factual Consistency Assessment for Abstractive Summarization Models This paper proposes a framework called SumFC for assessing the factual consistency of abstractive summarization models. SumFC uses a two-stage approach to select relevant sentences and perform fine-grained consistency reasoning at the sentence level. The model is trained using data synthesis and contrastive loss to identify subtle cues. Experimental results show that SumFC outperforms previous methods and can distinguish detailed differences better. EMNLP 2021
BANDITSUM: Extractive Summarization as a Contextual Bandit The paper proposes a new method called BANDITSUM for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. The approach treats extractive summarization as a contextual bandit problem, where the model chooses a sequence of sentences to include in the summary based on the document context. A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. The experiments show that BANDITSUM achieves better or comparable ROUGE scores than state-of-the-art approaches and converges using fewer update steps. Additionally, BANDITSUM performs significantly better than competing approaches when good summary sentences appear late in the source document. EMNLP 2018
Harnessing Popularity in Social Media for Extractive Summarization of Online Conversations The paper discusses using popularity measures in social media as a way to summarize online conversations. They propose a Disjunctive model that separates the contribution of content and context in determining popularity. They evaluate their model using a dataset where the informativeness of comments is annotated and show that their model outperforms baseline models that use popularity as a measure of informativeness. EMNLP 2018
Neural Latent Extractive Document Summarization The paper proposes a new approach to extractive summarization that uses a latent variable model where sentences are viewed as latent variables. This approach avoids the need for heuristically created sentence-level labels, which may be suboptimal. Instead, sentences with activated variables are used to infer gold summaries, and the loss during training comes directly from these summaries. The model was tested on the CNN/DailyMail dataset and was found to outperform a strong extractive baseline trained on heuristically approximated labels and perform competitively with several recent models. EMNLP 2018
DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization DeepChannel is a neural model for extractive document summarization that uses a salience score to represent the importance of sentences in a document. The salience score is estimated using an attention-based deep neural network, and the model uses a contrastive training strategy to learn the salience estimation network. The most salient sentences are iteratively extracted from the document to generate a summary. The model achieves state-of-the-art ROUGE scores on the CNN/Daily Mail dataset and shows strong robustness in out-of-domain tests. It also demonstrates tremendous data efficiency, achieving a high ROUGE-1 F-1 score with only 1/100 of the training set. AAAI 2019
Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation The paper proposes a new approach to document summarization using Reinforcement Learning (RL) algorithms. The approach, called RELIS, learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. This approach reduces training time by two orders of magnitude compared to state-of-the-art models while performing on par with them. The authors prove that RELIS is guaranteed to generate near-optimal summaries with appropriate L2R and RL algorithms. The approach is evaluated on extractive multi-document summarization. IJCAI 2019
Neural Extractive Text Summarization with Syntactic Compression The paper discusses recent neural network approaches to summarization, which are either selection-based extraction or generation-based abstraction. The authors present a neural model for single-document summarization that combines extraction and syntactic compression. The model selects sentences from the document, identifies possible compressions based on constituency parses, and scores those compressions with a neural model to produce the final summary. The authors construct oracle extractive-compressive summaries for learning and achieve strong performance on the CNN/Daily Mail and New York Times datasets, outperforming an off-the-shelf compression module. Human and manual evaluation shows that the model's output generally remains grammatical. EMNLP 2019
Exploiting Discourse-Level Segmentation for Extractive Summarization The paper proposes using discourse-level segmentation to improve extractive summarization, as it can more precisely identify the core content in a document compared to using sentences as the elementary unit. The authors investigate the effectiveness of this approach using two basic neural network architectures and a deep bi-directional transformer, and achieve state-of-the-art performance when combining discourse-level segmentation with their adapted contextual representation model on the CNN/Daily Mail dataset. EMNLP 2019
Extractive Summarization of Long Documents by Combining Global and Local Context The paper proposes a new neural summarization model for long documents that considers both global and local context. The model outperforms previous work on two scientific paper datasets and shows that its benefits increase with longer documents. Surprisingly, the study finds that the benefits of the model come mainly from modeling the local context, even for the longest documents. EMNLP 2019
DistilSum: Distilling the Knowledge for Extractive Summarization DistilSum is a new approach to extractive summarization that uses a teacher mechanism and a student model to produce high-entropy soft targets at a high temperature. The student model is trained to match these targets and then tested with a temperature of 1 to distill for ground-truth labels. Compared to the current best extractive classifier, BERTSUMEXT, DistilSum achieves a substantial improvement in both text similarity and performance of the classifier on the CNN/DM dataset. The source code for DistilSum will be available on GitHub. CIKM 2020
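A minimal sketch of temperature-scaled soft targets, the general knowledge-distillation mechanism the summary refers to. This is generic softmax-with-temperature distillation, not DistilSum's exact teacher mechanism or loss; the function names and the example logits are hypothetical. Raising the temperature flattens the distribution, which is what makes the soft targets higher-entropy.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float) -> float:
    """Cross-entropy of the student's softened output against the teacher's soft targets."""
    p_teacher = softmax_with_temperature(teacher_logits, temperature)
    p_student = softmax_with_temperature(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
```

When the student matches the teacher exactly, this cross-entropy equals the entropy of the soft targets, its minimum; at test time the student's logits would be read out at temperature 1.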
Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses The paper discusses how sentence position is a strong feature for news summarization, but recent neural systems excessively exploit this trend, which can be detrimental when summarizing documents where important content is in later parts of the article. The authors propose two techniques to make systems sensitive to the importance of content in different parts of the article: pretraining the model with randomly shuffled sentences and using an auxiliary ROUGE-based loss. These techniques significantly improve the performance of a reinforcement learning-based extractive system, with the auxiliary loss being more powerful than pretraining. EMNLP 2019
Reading Like HER: Human Reading Inspired Extractive Summarization The paper proposes a new approach to extractive text summarization for long documents by simulating the two-stage process of human summarization. The approach uses a convolutional neural network to encode the gist of paragraphs for rough reading and a decision-making policy with an adapted termination mechanism for careful reading. The problem is formulated as a contextual bandit problem and solved with policy gradient. Experiments on the CNN and DailyMail datasets show that the proposed method provides high-quality summaries with varied length and outperforms state-of-the-art extractive methods in terms of ROUGE metrics. EMNLP 2019
Guiding Extractive Summarization with Question-Answering Rewards The paper discusses the challenge of developing a supervised summarization system due to the lack of ground-truth data. The authors propose a novel framework that uses question-answering rewards to guide the system in producing informative and fluent summaries that perform well on question-answering tasks. The system learns from human abstracts and aims to produce summaries that can answer important questions. The results show that the proposed framework outperforms strong summarization baselines as evaluated by automatic metrics and human assessors. NAACL 2019
STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings The paper introduces STRASS, an extractive text summarization method that selects sentences with the closest embeddings to the document embedding. The model learns a transformation of the document embedding to maximize the similarity between the extractive summary and the ground truth summary. The training is inexpensive and can be done on CPU, and the inference time is short and linear. The paper also introduces the French CASS dataset and shows that the method performs similarly to state-of-the-art extractive methods while keeping training and inference time low. ACL 2019
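A minimal sketch of the selection step: keep the sentences whose embeddings are closest (by cosine similarity) to the document embedding. It assumes the embeddings are already computed, treats the learned transformation of the document embedding as already applied, and uses a fixed similarity `threshold` as a hypothetical selection rule; the paper's training procedure is not reproduced.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_by_similarity(sentence_embs: List[Sequence[float]],
                         doc_emb: Sequence[float],
                         threshold: float) -> List[int]:
    """Return indices of sentences whose similarity to the document meets the threshold.

    doc_emb is assumed to be the (already transformed) document embedding.
    """
    return [i for i, s in enumerate(sentence_embs) if cosine(s, doc_emb) >= threshold]
```

The loop makes one similarity computation per sentence, so inference is linear in the number of sentences, consistent with the cost profile described in the summary.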
HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization The paper proposes a new approach for extractive summarization called HETFORMER, which is based on a Transformer-based pre-trained model with multi-granularity sparse attentions. The approach models different types of semantic nodes in raw text as a potential heterogeneous graph and directly learns heterogeneous relationships among nodes by Transformer. The experiments show that HETFORMER achieves state-of-the-art performance in ROUGE F1 while using less memory and fewer parameters compared to existing methods that use GNNs with pre-trained models. EMNLP 2021
Extractive Summarization as Text Matching The paper proposes a new approach to building neural extractive summarization systems by formulating the task as a semantic text matching problem. This paradigm shift is based on a comprehensive analysis of the gap between sentence-level and summary-level extractors. The authors demonstrate the effectiveness of the matching framework by achieving state-of-the-art results on the CNN/DailyMail dataset and five other datasets. They also release their code, processed dataset, and generated summaries to encourage further research in this area. ACL 2020
Bridging Hierarchical and Sequential Context Modeling for Question-driven Extractive Answer Summarization The paper discusses the challenges of answer summarization in non-factoid question answering and proposes a unified model that integrates hierarchical and sequential context modeling for question-driven extractive answer summarization. The model uses a hierarchical compare-aggregate method to integrate the interaction between QA pairs in both word-level and sentence-level into the final question and answer representations. The question-aware sequential extractor is then used to produce a summary for the lengthy answer. The experimental results show that the proposed method achieves superior performance on WikiHowQA and PubMedQA. SIGIR 2020
On the Abstractiveness of Neural Document Summarization The paper discusses modern neural document summarization systems that aim to produce abstractive summaries. The authors conducted a study to verify the degree of abstractiveness of these systems and found that many tend to be near-extractive in practice. They also implemented a pure copy system that achieved comparable results while being more computationally efficient. The authors suggest that future efforts should focus on developing more efficient systems that can better utilize the vocabulary in the original document. EMNLP 2018
Re-evaluating Automatic Summarization with BLEU and 192 Shades of ROUGE The paper analyzes current evaluation methodologies for summarization metrics and identifies concerns such as the absence of methods for testing improvements over a baseline and the omission of important components of human assessment. The authors propose an evaluation methodology that overcomes these challenges and reveals which metric variants outperform others. They also find that the machine translation metric BLEU performs similarly to ROUGE for evaluating summarization systems. The authors replicate a recent evaluation that relied on suboptimal ROUGE variants and find different conclusions about the relative performance of state-of-the-art summarization systems. EMNLP 2015
Heterogeneous Graph Neural Networks for Extractive Document Summarization The paper presents a new approach called HETERSUMGRAPH for extractive document summarization. It uses a graph-based neural network that includes semantic nodes of different granularity levels, which act as intermediaries between sentences and enrich cross-sentence relations. The graph structure is flexible and can be extended from a single-document setting to multi-document by introducing document nodes. The authors claim to be the first to introduce different types of nodes into graph-based neural networks for extractive document summarization and have performed a comprehensive qualitative analysis to investigate their benefits. The code for HETERSUMGRAPH will be released on GitHub. ACL 2020
Discourse-Aware Neural Extractive Text Summarization The paper introduces a new neural summarization model called DISCOBERT, which addresses issues with sentence-based extractive models and the limitations of BERT in capturing long-range dependencies in documents. DISCOBERT extracts sub-sentential discourse units and constructs structural discourse graphs to capture long-range dependencies, which are encoded with Graph Convolutional Networks. On popular summarization benchmarks, the proposed model outperforms state-of-the-art methods among BERT-base models. ACL 2020
Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT The paper proposes a new approach to extractive summarization that focuses on fact-level semantic units rather than individual sentences. The model uses a hierarchical structure to incorporate multiple levels of textual information and is combined with BERT using a hierarchical graph mask to improve natural language understanding. The experiments on the CNN/DailyMail dataset show that the proposed model achieves state-of-the-art results. COLING 2020
Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network The paper discusses the challenges of sentence-level extractive text summarization, particularly in modeling redundancy between extracted sentences. The authors propose a new approach called HAHSum, which uses a hierarchical attentive heterogeneous graph to model different levels of information and spotlight redundancy dependencies between sentences. The approach iteratively refines sentence representations with a redundancy-aware graph and delivers label dependencies by message passing. Experiments on large-scale benchmark corpora demonstrate that HAHSum outperforms previous extractive summarizers. EMNLP 2020
Conditional Neural Generation using Sub-Aspect Functions for Extractive News Summarization The paper discusses the challenges of text summarization in the news domain, where neural models easily overfit due to the inverted pyramid writing style and where a variety of summaries is needed for different users. The authors propose a neural framework that can flexibly control summary generation by introducing sub-aspect functions (importance, diversity, position) regulated by control codes. They demonstrate that extracted summaries with minimal position bias are comparable to those generated by standard models that take advantage of position preference, and that news summaries generated with a focus on diversity can be preferred more by human raters. The authors suggest that a more flexible neural summarization framework providing more control options could be desirable for tailoring summaries to different user preferences. EMNLP 2020
Stepwise Extractive Summarization and Planning with Structured Transformers The paper proposes encoder-centric stepwise models for extractive summarization using structured transformers: HiBERT and Extended Transformers. The models enable stepwise summarization by injecting the previously generated summary into the structured transformer as an auxiliary sub-structure. The models are efficient in modeling the structure of long inputs and do not rely on task-specific redundancy-aware modeling, making them general-purpose extractive content planners for different tasks. The stepwise models achieve state-of-the-art ROUGE performance on CNN/DailyMail extractive summarization and Rotowire table-to-text generation without any redundancy-aware modeling or sentence filtering. Of the two structured transformers tested, stepwise Extended Transformers provide the best performance across both datasets and set a new standard for these challenges. EMNLP 2020
Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks The paper proposes a new approach to extractive text summarization that addresses the limitations of existing models in capturing inter-sentence relationships and topical information. The proposed model uses a graph neural network to efficiently represent the document structure and a joint neural topic model to discover latent topics for sentence selection. The experimental results show that the proposed model outperforms existing approaches on both short and long document datasets, demonstrating its robustness across document genres and lengths. The model's effectiveness in long document summarization is attributed to its ability to preselect salient content using topical information. COLING 2020
Goal-Directed Extractive Summarization of Financial Reports The paper discusses the importance of extractive summarization for the financial reports that companies file, since these reports affect stock prices. The lack of in-domain labeled summarization data is a major obstacle to training finance-specific summarization models. The paper proposes a goal-directed approach to modeling 10-K report summarization, leveraging summaries with labeled goal-related data for stock buy/sell classification. The paper also considers a multi-task learning method with an industry classification auxiliary task, which provides further improvements. The proposed method significantly outperforms strong baselines in intrinsic and extrinsic evaluations for stock buy/sell classification and portfolio construction tasks. CIKM 2021
Deep Differential Amplifier for Extractive Summarization The paper discusses the issue of imbalanced sentence classification in extractive summarization, which cannot be easily addressed by data sampling or augmentation algorithms. To solve this problem, the authors propose a deep differential amplifier framework that calculates and amplifies the semantic difference between each sentence and the other sentences, and applies a residual unit to deepen the architecture. The model pays more attention to the pivotal information of one sentence, unlike previous approaches that model all informative context in the source document. Experimental results show that the proposed summarizer performs competitively against state-of-the-art methods. The source code will be available on GitHub. ACL 2021
Post-Editing Extractive Summaries by Definiteness Prediction The paper discusses the limitations of extractive summarization and proposes a post-editing step that focuses on the definiteness of noun phrases to improve the coherence and readability of extractive summaries. The proposed system was evaluated through human and automatic evaluation studies, which showed that the system generated improved summaries. The authors also noted that the system relied on local cues rather than pragmatic reasoning to make decisions. EMNLP 2021
Extractive Opinion Summarization in Quantized Transformer Spaces The paper presents the Quantized Transformer, an unsupervised system for extractive opinion summarization that uses a clustering interpretation of the quantized space and a novel extraction algorithm to discover popular opinions among hundreds of reviews. The system also enables controllable summarization without further training by utilizing properties of the quantized space to extract aspect-specific summaries. The authors also introduce SPACE, a large-scale evaluation benchmark for opinion summarizers, and demonstrate the promise of their approach through experiments and human studies. TACL 2021
Krimping texts for better summarization The paper introduces a new approach to automated text summarization based on the Minimum Description Length principle and the Krimp dataset compression algorithm. The approach represents a text as a transactional dataset and describes it using frequent sequences of words. The summary is compiled from sentences that compress the document, which reduces summarization to a maximal coverage problem. This problem is solved with a greedy algorithm, and evaluation results are reported. EMNLP 2015
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization The paper proposes a new model called HIBERT for document encoding in neural extractive summarization models. It pre-trains the model on unlabeled data and applies it to the summarization model, which performs better than the same model with random initialization. The proposed model achieves state-of-the-art performance on the CNN/DailyMail and New York Times datasets. ACL 2019
Considering Nested Tree Structure in Sentence Extractive Summarization with Pre-trained Transformer The paper proposes a new model called NeRoBERTa for sentence extractive summarization, which uses nested tree structures consisting of syntactic and discourse trees to improve coherence and informativeness of the summary. The model outperforms baseline models in ROUGE and achieves comparable scores to state-of-the-art models in human evaluation. The paper highlights the difficulty of using pre-trained BERT-based encoders for this task and suggests the use of nested tree structures for better performance. EMNLP 2021
The Effect of Pretraining on Extractive Summarization for Scientific Documents The paper explores the impact of pretraining on a BERT-based extractive summarization system for scientific documents. The authors found that an intermediate pretraining step using existing summarization datasets improved performance and achieved state-of-the-art results on a scientific summarization dataset. They also analyzed the effects of varying the size and domain of the pretraining corpus, changing the length of the input sequence, and varying target tasks. Additionally, they investigated how intermediate pretraining interacts with contextualized word embeddings trained on different domains. NAACL 2021
Extractive Summarization Considering Discourse and Coreference Relations based on Heterogeneous Graph The paper proposes a model for extractive summarization that incorporates both discourse and coreference relations. The model uses a heterogeneous graph containing three types of nodes, each corresponding to text spans of different granularity. Experimental results on a benchmark summarization dataset show that the proposed method is effective. EACL 2021
Sliding Selector Network with Dynamic Memory for Extractive Summarization of Long Documents The paper proposes a new approach to extractive summarization of long-form documents using a sliding selector network with dynamic memory. This approach addresses the loss of summary-relevant content caused by the input-length limit of the text encoder in neural summarization models. The sliding window extracts summary sentences segment by segment, and the memory mechanism preserves and updates history information dynamically, allowing semantic flow across different windows. Experimental results on two large-scale datasets of scientific papers show that this model outperforms previous state-of-the-art models. Qualitative and quantitative investigations are also performed to understand how the model works and where the performance gain comes from. NAACL 2021
Flexible Non-Autoregressive Extractive Summarization with Threshold: How to Extract a Non-Fixed Number of Summary Sentences The paper proposes a non-autoregressive method for extractive summarization called ThresSum, which extracts a non-fixed number of summary sentences without sorting them by predicted probabilities. Instead, ThresSum picks sentences individually from the source document when their predicted probabilities exceed a threshold. During training, the model enhances sentence representations through iterative refinement and through weak supervision with soft labels, which are generated progressively by adjusting the temperature with a knowledge distillation algorithm. ThresSum outperforms BERTSUMEXT by a substantial 0.74 ROUGE-1 points on the CNN/DM dataset. AAAI 2021
Multiplex Graph Neural Network for Extractive Text Summarization The paper proposes a new approach to extractive text summarization, which involves extracting the most representative sentences from a given document. The authors note that sentence embedding is important for creating a good summary, and that recent studies have used graph neural networks to capture inter-sentential relationships. However, these approaches do not consider multiple types of inter-sentential relationships or intra-sentential relationships. To address these issues, the authors propose a Multiplex Graph Convolutional Network (MultiGCN) to model different types of relationships among sentences and words. They then use this approach to create a Multiplex Graph Summarization (Multi-GraS) model for extractive text summarization. The authors evaluate their approach on the CNN/DailyMail benchmark dataset and demonstrate its effectiveness. EMNLP 2021
OTExtSum: Extractive Text Summarisation with Optimal Transport The available abstract text lists only the authors' affiliations, which include the University of Sydney and Renmin University of China, and their contact information; it does not describe the method. Beyond what the title indicates (extractive text summarisation using optimal transport), the paper's content cannot be summarized from this text. NAACL 2022
MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes MemSum is a reinforcement-learning-based extractive summarizer that considers the text content of the sentence, the global text context of the rest of the document, and the extraction history consisting of the set of sentences that have already been extracted. It obtains state-of-the-art test-set performance in summarizing long documents taken from PubMed, arXiv, and GovReport. Ablation studies demonstrate the importance of local, global, and history information. A human evaluation confirms the high quality and low redundancy of the generated summaries, stemming from MemSum’s awareness of extraction history. ACL 2022
Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization The paper discusses the issue of faithfulness errors in abstractive summarization systems and proposes a framework for evaluating the effectiveness of such systems. The authors generate a faithfulness-abstractiveness trade-off curve to serve as a control and show that current methods for improving faithfulness fail to consistently improve over the control at the same level of abstractiveness. They then introduce a selector to identify the most faithful and abstractive summary for a given document and demonstrate that this system can attain higher faithfulness scores in human evaluations while being more abstractive than the baseline system on two datasets. Additionally, the authors show that their system achieves a better faithfulness-abstractiveness trade-off than the control at the same level of abstractiveness. ACL 2022
HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information The paper proposes a new approach to improve extractive summarization models by explicitly incorporating hierarchical structure information into a pre-trained, encoder-only Transformer language model. The proposed HiStruct+ model outperforms a strong baseline on three datasets, including PubMed and arXiv, and the improvement is more significant for datasets with more conspicuous hierarchical structures. The ablation study shows that the hierarchical position information is the main contributor to the model's state-of-the-art performance. ACL 2022
MuchSUM: Multi-channel Graph Neural Network for Extractive Summarization The paper discusses the limitations of using pre-trained BERT-based encoders for extractive text summarization and proposes a new approach called MuchSUM, which is a multi-channel graph convolutional network that incorporates multiple summary-worthy features. The approach introduces three specific graph channels to encode node textual features, node centrality features, and node position features, respectively, under bipartite word-sentence heterogeneous graphs. A cross-channel convolution operation is designed to distill the common graph representations shared by different channels, and the sentence representations of each channel are fused for extractive summarization. The approach also investigates three weighted graphs in each channel to infuse edge features for graph-based summarization modeling. Experimental results demonstrate that the MuchSUM model can achieve considerable performance compared with some BERT-initialized graph-based extractive summarization systems. SIGIR 2022
SAPGraph: Structure-aware Extractive Summarization for Scientific Papers with Heterogeneous Graph The paper discusses the challenges of scientific paper summarization in NLP and presents a solution called SAPGraph. The framework uses paper structure to generate more comprehensive and valuable summaries than previous works, which tend to extract summaries from the beginning of the paper. SAPGraph is based on a structure-aware heterogeneous graph that models the document as a graph with three kinds of nodes and edges, based on structural information about facets and knowledge. The paper also provides CORD-SUM, a large-scale dataset of COVID-19-related papers, for experiments. AACL 2022
Does Pretraining for Summarization Require Knowledge Transfer? The paper discusses pretraining techniques in text summarization and challenges the idea that knowledge transfer is the reason for their success. The authors show that pretraining on randomly selected character n-grams can achieve performance similar to that of models pretrained on real corpora, which could eliminate concerns over offensive language, bias, and copyright issues. The authors also design several tasks to test the structure of pretraining tasks but find no significant benefit, leaving open the possibility of a small role for knowledge transfer. EMNLP 2021
Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders The paper discusses entity-centric summarization, which produces a summary of a document specific to a given target entity. Extractive summaries are preferred over abstractive ones as they preserve factuality and can be used in downstream tasks. The authors explore methods to solve this task by recasting it as a sentence selection task, using methods inspired by information retrieval. They test different architecture variants and loss functions and achieve up to a 5.8 F1 improvement over past state-of-the-art and outperform the entity-centric Lead 3 heuristic by 1.1 F1. The authors also show strong results on the related task of salient sentence selection for an entity. AACL 2022
GRETEL: Graph Contrastive Topic Enhanced Language Model for Long Document Extractive Summarization The paper proposes a new model called Graph contRastivE Topic Enhanced Language model (GRETEL) that combines the graph contrastive topic model with pre-trained language models (PLMs) to improve text summarization. The graph contrastive topic model integrates the hierarchical transformer encoder and graph contrastive learning to capture and integrate global semantic information from the document context and the gold summary. GRETEL aims to extract salient sentences that are topically related to the gold summary, rather than redundant sentences that cover sub-optimal topics. Experimental results on general domain and biomedical datasets show that GRETEL outperforms state-of-the-art methods. COLING 2022
HeterGraphLongSum: Heterogeneous Graph Neural Network with Passage Aggregation for Extractive Long Document Summarization The paper discusses the effectiveness of Graph Neural Network (GNN)-based models in Natural Language Processing (NLP) tasks, particularly in Extractive Document Summarization (EDS). However, long-form document summarization with graph-based approaches remains a challenge. The paper proposes a new model called HeterGraphLongSum, which uses three types of semantic units (word, sentence, and passage) to represent long documents in a graph structure. The model achieves promising results on extractive long document summarization without relying on pre-trained language models such as BERT. The source code is available on GitHub. COLING 2022
Multi Graph Neural Network for Extractive Long Document Summarization The paper discusses the use of Heterogeneous Graph Neural Networks (HeterGNN) for document summarization, specifically for long documents. The authors address the issue of lacking inter-sentence connections and propose a solution by building a graph on sentence-level nodes and combining it with HeterGNN to capture semantic information. The experiments conducted on two benchmark datasets show that this method achieves state-of-the-art results in the field of document summarization. COLING 2022
Compressive Document Summarization via Sparse Optimization The paper presents a sparse optimization framework for extractive document summarization with a decomposable convex objective function. An efficient ADMM algorithm is derived to solve it, and an additional sentence dissimilarity term is introduced to encourage diversity in the summaries. The framework achieves significant improvement over previous related work and is generalized to compressive summarization with a block coordinate descent algorithm. The compressive summarization results are competitive against state-of-the-art results while maintaining reasonable readability, as demonstrated on DUC 2006 and DUC 2007 datasets. IJCAI 2015
Optimizing Sentence Modeling and Selection for Document Summarization The paper proposes a new approach to extractive document summarization, which involves selecting salient sentences from a given document. The approach, called DivSelect+CNNLM, addresses two challenges: modeling information redundancy among candidate sentences and selecting the most appropriate sentences. It introduces a novel neural network language model based on a convolutional neural network (CNN) to project sentences into dense distributed representations, and it models sentence redundancy using cosine similarity. The selection process is formulated as an optimization problem, constructing a diversified selection process (DivSelect) to select sentences with high prestige and dissimilarity. Evaluations on benchmark datasets demonstrate the effectiveness of the approach. IJCAI 2015
Improving abstractive summarization with energy-based re-ranking The paper discusses the weaknesses of current abstractive summarization systems, such as the omission of relevant information and the generation of factual inconsistencies. It proposes an energy-based model that learns to re-rank summaries according to recently developed summarization metrics, which consistently improves the scores achieved by the predicted summaries. However, the paper also notes that the re-ranking approach should be used with care for highly abstractive summaries, as the available metrics are not yet sufficiently reliable for this purpose. EMNLP 2022
Active Learning for Abstractive Text Summarization The paper discusses the challenges of creating human-curated annotated datasets for abstractive text summarization (ATS) and the potential of Active Learning (AL) to reduce the amount of annotation required. However, no effective AL query strategies existed for ATS, because uncertain instances are usually noisy and selecting them can degrade model performance. The paper proposes the first effective query strategy for AL in ATS, based on diversity principles, which improves model performance in terms of ROUGE and consistency scores. The paper also analyzes the effect of self-learning and shows that it can further increase the performance of the model. EMNLP 2022
AttSum: Joint Learning of Focusing and Summarization with Neural Attention The paper discusses the challenges of extractive query-focused summarization, specifically the tasks of query relevance ranking and sentence saliency ranking. Previous systems have struggled to perform both tasks effectively, but the proposed system, AttSum, tackles them jointly using distributed representations and an attention mechanism. The system is evaluated on benchmark datasets and achieves competitive performance without the use of hand-crafted features. The authors also observe that the sentences identified as relevant to the query do indeed meet the query's needs. COLING 2016
Summarising the points made in online political debates The paper proposes an abstractive approach to summarize argumentative discussions in online communities. The approach extracts key content through 'point' extraction, where a point is a verb and its syntactic arguments. The approach uses dependency parse information and verb case frames to identify and extract valid points and generates an abstractive summary that discusses the key points being made in the debate. The approach was evaluated using a corpus of online political debates and showed significant improvements over a high-performing extractive summarizer. ACL 2016
Neural Summarization by Extracting Sentences and Words The paper proposes a new approach to extractive summarization using neural networks and continuous sentence features. The approach includes a hierarchical document encoder and an attention-based extractor, allowing for different classes of summarization models. The models were trained on large scale corpora and achieved results comparable to the state of the art without any linguistic annotation. ACL 2016
Summarizing Lengthy Questions The paper proposes the task of question summarization and analyzes question-summary pairs from a Community Question Answering site. It finds that some questions require abstractive approaches instead of extractive approaches. The authors created a dataset and trained extractive and abstractive summarization models, comparing them based on ROUGE scores and manual evaluations. The results show that an abstractive method using an encoder-decoder model with a copying mechanism performs better according to both ROUGE-2 F-measure and human judges' evaluations. IJCNLP 2017
SEHY: A Simple yet Effective Hybrid Model for Summarization of Long Scientific Documents The paper discusses the challenges of long-document summarization and proposes a Simple yet Effective HYbrid approach (SEHY) that selects salient sections instead of sentences for summary generation. The approach exploits discourse information and avoids full-text understanding while retaining salient information within the length limit. The paper also presents two strategies for training the extractor and evaluates the approach on a large-scale scientific paper dataset. The authors also discuss how the disciplinary class of a scientific paper affects the performance of SEHY. Experimental results show the effectiveness of the approach and yield interesting findings on arXiv and its subsets. AACL 2022
Learning to Extract Coherent Summary via Deep Reinforcement Learning The paper proposes a neural coherence model to capture cross-sentence semantic and syntactic coherence patterns in order to extract more coherent summaries. The proposed model can be trained in an end-to-end fashion using unlabeled data and is used in combination with the ROUGE package to design a reinforcement learning method to train a neural extractive summarizer called the Reinforced Neural Extractive Summarization (RNES) model. The RNES model learns to optimize coherence and informative importance of the summary simultaneously and outperforms existing baselines in terms of ROUGE on the CNN/Daily Mail dataset. The qualitative evaluation shows that summaries produced by RNES are more coherent and readable. AAAI 2018
Neural Document Summarization by Jointly Learning to Score and Select Sentences The paper presents a new approach to extractive document summarization that combines sentence scoring and selection into a single neural network framework. The approach uses a hierarchical encoder to represent the document sentences and integrates the selection strategy into the scoring model. Experiments on the CNN/Daily Mail dataset show that the proposed framework outperforms existing extractive summarization models. ACL 2018
Iterative Document Representation Learning Towards Summarization with Polishing ITS is a new model for extractive text summarization that iteratively polishes the document representation on multiple passes through the document, inspired by the observation that humans often need to read an article multiple times to fully understand and summarize its contents. The model also includes a selective reading mechanism that accurately determines the extent to which each sentence should be updated. Experimental results on two datasets show that ITS outperforms state-of-the-art extractive systems when evaluated by both machines and humans. EMNLP 2018
Generating Wikipedia by Summarizing Long Sequences The paper discusses a method for generating English Wikipedia articles by summarizing source documents using extractive summarization and a neural abstractive model. The abstractive model uses a decoder-only architecture that can attend to very long sequences, allowing it to generate fluent and coherent multi-sentence paragraphs and even whole articles. The model is able to extract relevant factual information when given reference documents, as reflected in perplexity, ROUGE scores, and human evaluations. ICLR 2018
Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator The paper discusses the limitations of Pointer Generators in modern summarization systems, which are restricted to exact word matches and result in a bias towards extractive generations. The authors propose a solution by allowing the model to "edit" pointed tokens, transforming them into a target space with a learned relation embedding. The model is shown to capture more latent alignment relations, improve word alignment accuracy, generate higher quality summaries, and bring more abstraction to the generated summaries. The proposed approach is validated on three large-scale summarization datasets. EMNLP 2019
An Editorial Network for Enhanced Document Summarization The paper proposes a new approach to summarization called the Editorial Network, which combines extractive and abstractive methods. This approach is applied as a post-processing step to a sequence of extracted sentences. The paper also proposes a novel soft-labeling approach for training the "editor." The effectiveness of this approach is demonstrated on the CNN/DailyMail dataset, where it outperforms state-of-the-art extractive-only and abstractive-only baselines. ACL 2019
Transformer-based Model for Single Documents Neural Summarization The paper proposes a system that enhances performance on single-document summarization tasks, using the CNN/DailyMail and Newsroom datasets. The system follows the encoder-decoder paradigm but focuses on the encoder. The authors introduce a framework that encodes the source text first with a transformer and then with a sequence-to-sequence model. They find that the transformer and the seq2seq model complement each other, resulting in a richer encoded vector representation. Additionally, paying more attention to the vocabulary of target words during abstraction improves performance. The authors test their hypothesis and framework on both extractive and abstractive single-document summarization tasks. ACL 2019
Jointly Extracting and Compressing Documents with Summary State Representations The paper presents a new neural model for text summarization that extracts sentences from a document and compresses them to generate concise and informative summaries. The model dynamically determines the length of the output summary based on gold summaries observed during training, and does not require the length constraints typical of extractive summarization. The model achieves state-of-the-art results on the CNN/DailyMail and Newsroom datasets, improving over current extractive and abstractive methods. A new dataset of oracle compressive summaries derived automatically from the CNN/DailyMail reference summaries is also made available. NAACL 2019
Single Document Summarization as Tree Induction The paper proposes a new approach to single-document extractive summarization, using a multi-root dependency tree to generate summaries. The model is designed to refine its structures through an iterative algorithm, and is shown to perform competitively against existing methods on two benchmark datasets. This approach differs from previous methods that rely on linguistically motivated document representations. NAACL 2019
Inducing Document Structure for Aspect-based Summarization The paper discusses aspect-based summarization, which generates a summary centered around a specific aspect of a document. The authors induce latent document structure and train their models in a scalable synthetic setup, resulting in improvements in summarization over topic-agnostic baselines. The models accurately segment documents by aspect and can produce both abstractive and extractive aspect-based summaries. The learned document structure is particularly advantageous for summarizing long documents, and the results transfer from synthetic training documents to natural news articles from CNN/Daily Mail and RCV1. ACL 2019
Cross-Task Knowledge Transfer for Query-Based Text Summarization The paper explores the possibility of transferring knowledge between machine reading comprehension (MRC) and query-based text summarization. The authors use an MRC model trained on the SQuAD1.1 dataset to build an extractive query-based summarizer, which compresses the output of the MRC model using a new sentence compression technique. They also use pre-trained machine translation systems to abstract the extracted summaries. The models achieve state-of-the-art results on the CNN/Daily Mail and Debatepedia datasets, and can serve as powerful baselines for future systems. The authors hope that their results will encourage further research on transfer learning from large MRC corpora to query-based summarization. ACL 2019
Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews The paper discusses the time-consuming process of manually extracting relevant aspects and opinions from large volumes of user-generated text. It proposes a solution for generating personalized aspect-based opinion summaries from online tourist reviews, allowing readers to control various attributes of the summary. The approach involves an unsupervised method to extract coherent aspects and an Integer Linear Programming (ILP) based extractive technique to select informative opinions around those aspects while respecting user-specified values. The authors evaluate and compare their summaries using crowdsourcing and ROUGE-based metrics and obtain competitive results. SIGIR 2020
Copy or Rewrite: Hybrid Summarization with Hierarchical Reinforcement Learning The paper proposes a hybrid framework for summarization called HYSUM that combines extractive and abstractive methods to generate informative and concise summaries. Existing extract-then-abstract methods suffer from information loss in the abstraction step, but HYSUM can switch between copying and rewriting sentences based on redundancy to effectively combine the advantages of both methods. The paper also proposes an end-to-end training method based on Hierarchical Reinforcement Learning to enhance cooperation between the extraction and rewriting modules. Automatic and human evaluations show that HYSUM outperforms existing models on the CNN/DailyMail corpus. AAAI 2020
Few-Shot Learning for Opinion Summarization The paper discusses the task of opinion summarization, which involves creating text that reflects subjective information expressed in multiple documents, such as user reviews of a product. The lack of large datasets for training supervised models has led to the use of extractive methods that select text fragments in an unsupervised or weakly-supervised way. However, recent research has shown that abstractive summaries can also be produced in an unsupervised fashion. The paper presents a method that uses a handful of summaries to bootstrap the generation of summary text with expected properties such as writing style, informativeness, fluency, and sentiment preservation. The approach involves training a conditional Transformer language model to generate a new product review given other available reviews of the product, and fine-tuning a plug-in module that predicts property values on a handful of summaries. The approach outperforms previous extractive and abstractive methods in automatic and human evaluation on Amazon and Yelp datasets. EMNLP 2020
Interactive Text Ranking with Bayesian Optimization: A Case Study on Community QA and Summarization The paper proposes an interactive text ranking approach that uses Bayesian optimization to focus on high-quality candidates and integrate prior knowledge to address the lack of user or task-specific training data. The approach significantly outperforms existing interactive approaches in community question answering and extractive multi-document summarization. The ranking function learned by the method is also an effective reward function for reinforcement learning, improving the state of the art for interactive text ranking. TACL 2020
Compressive Summarization with Plausibility and Salience Modeling The paper proposes a new approach to compressive summarization that uses data-driven criteria of plausibility and salience to determine which spans of sentences can be deleted. A pre-trained Transformer model judges each criterion, and only deletions that are both plausible and not salient are applied. The approach achieves strong in-domain results on benchmark summarization datasets and can generalize cross-domain with fine-tuning on only 500 samples. Human evaluation shows that the plausibility model generally selects for grammatical and factual deletions. EMNLP 2020
News Editorials: Towards Summarizing Long Argumentative Texts The paper notes that automatic summarization of argumentative texts has been little explored and presents a new corpus of 1330 summaries for 266 news editorials. The summaries are evaluated using a specific annotation scheme and aim to be thesis-indicative, persuasive, reasonable, concise, and self-contained. The corpus contains at least three high-quality summaries for about 90% of the editorials, making it useful for developing and evaluating summarization technology for long argumentative texts. The paper also reports on an in-depth corpus analysis and the evaluation of two extractive summarization models. COLING 2020
Globalizing BERT-based Transformer Architectures for Long Document Summarization The paper discusses the limitations of current transformer-based architectures when fine-tuning large language models on downstream tasks that require reasoning over long documents. To address this issue, the authors introduce a novel hierarchical propagation layer that spreads information between multiple transformer windows. They validate the effectiveness of their approach on three extractive summarization corpora of long scientific papers and news articles, reporting state-of-the-art results for long document summarization and comparable results for the summarization of shorter documents. EACL 2021
Summarizing Long-Form Document with Rich Discourse Information The paper proposes a new extractive summarization model called HEROES to address the deficiencies of existing models for summarizing long-form documents. The two main deficiencies are the increase in computation due to the size of the input document and the lack of exploitation of discourse structural information. HEROES consists of two modules: a content ranking module that selects important sections and sentences to create a short digest, and an extractive summarization module based on a heterogeneous graph with nodes from different discourse levels and designed edge connections to reflect the discourse hierarchy of the document. Experimental results show that HEROES outperforms various strong baselines. CIKM 2021
Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning This paper introduces a new technique to demote the lead bias that neural extractive summarizers learn from news articles, improving their performance on data with different or no lead bias. Experiments on two news corpora show that the technique effectively reduces the model's learned lead bias and improves its generality on out-of-distribution data, without any significant loss in performance on in-distribution data. ACL 2021
Contextualized Rewriting for Text Summarization The paper discusses the limitations of extractive summarization and the potential benefits of abstractive rewriting. However, existing abstractive rewriting systems consider only the extracted summaries as input, which can lose important background knowledge from the document. To address this issue, the authors propose a contextualized rewriting approach that takes in the entire original document. They formalize this approach as a seq2seq problem with group alignments and introduce group tags to model the alignments. The system identifies extracted summaries through content-based addressing and achieves significant improvements in ROUGE scores compared to non-contextualized rewriting systems without requiring reinforcement learning. AAAI 2021
AUTOSUMM: Automatic Model Creation for Text Summarization Deep learning models have shown state-of-the-art performance on various datasets for extractive and abstractive text summarization, and the paper proposes methods to create such models automatically. The methods combine Neural Architecture Search and Knowledge Distillation, leveraging the knowledge of large language models such as BERT and GPT-2 to develop smaller, customized models for any given dataset. The proposed methods achieve near state-of-the-art accuracy while reducing inference time and model size. EMNLP 2021
Leveraging Information Bottleneck for Scientific Document Summarization The paper presents an unsupervised extractive approach to summarizing long scientific documents based on the Information Bottleneck principle. The approach first uses signals as queries to retrieve key content from the source document, then uses a pre-trained language model to conduct further sentence search and editing and return the final extracted summaries. The framework can be extended to a multi-view framework using different signals. The proposed framework was evaluated on three scientific document datasets and was found to be effective. Human evaluation suggests that the extracted summaries cover more content aspects than those of previous systems. EMNLP 2021
Hierarchical Heterogeneous Graph Attention Network for Syntax-Aware Summarization The paper proposes a new approach to summarization that incorporates the constituent structure of the text using Graph Neural Networks. They use a hierarchical heterogeneous graph attention network over constituency-based parse trees for syntax-aware summarization, which reflects how humans construct summaries hierarchically. The model is effective for both abstractive and extractive summarization tasks on five benchmark datasets from various domains, and further performance improvement can be obtained using state-of-the-art pre-trained models. AAAI 2022
TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation The paper discusses the challenges of generating long/extended summaries for scientific papers, which provide more detailed information than traditional abstracts. The authors propose an extractive summarizer called TSTR that uses introductory information as pointers to salient information. The evaluations on two large-scale datasets show significant improvement in ROUGE and average ROUGE scores compared to strong baselines and state-of-the-art methods. Human evaluations also favor TSTR-generated extended summaries in terms of cohesion and completeness. NAACL 2022
Supporting content evaluation of student summaries by Idea Unit embedding The paper proposes a method for computer-assisted content evaluation of summaries by establishing a correspondence between segments of the source text and its summary using "Idea Units (IUs)." The IU correspondence is based on the similarity between vector representations of IUs. The proposed method is more robust against rephrased expressions than conventional ROUGE-based baselines and outperformed the baselines in recall. The proposed method has been implemented in a GUI tool called "Segment Matcher" to help teachers establish a link between corresponding IUs across the summary and source text. ACL 2019
COLO: A Contrastive Learning based Re-ranking Framework for One-Stage Summarization The paper proposes a new framework called COLO for one-stage summarization that uses contrastive learning to generate summaries directly based on summary-level scores, without additional modules or parameters. The framework improves extractive and abstractive results on the CNN/DailyMail benchmark while maintaining parameter and inference efficiency. Compared to state-of-the-art multi-stage systems, COLO saves more than 100 GPU training hours and has a 3-8x speed-up ratio during inference while achieving comparable results. COLING 2022
Scientific Article Summarization Using Citation-Context and Article's Discourse Structure The paper proposes a new approach to summarizing scientific articles that takes into account citation-context and the document discourse model. The method overcomes the problem of inconsistency between citation summaries and the article's content by providing context for each citation. The approach leverages the inherent discourse structure of scientific articles to produce better summaries and shows a significant improvement over existing summarization approaches in terms of ROUGE scores on a scientific summarization dataset. The method is adaptable to other domains beyond the biomedical domain used for evaluation. EMNLP 2015
Summarizing Student Responses to Reflection Prompts The paper proposes a new algorithm for summarizing student responses to reflection prompts. Unlike traditional methods, the algorithm creates summaries from extracted phrases rather than sentences, and ranks the phrases by the number of students who mention them. Experimental results show that this approach outperforms other summarization methods in terms of ROUGE scores. EMNLP 2015
Movie Script Summarization as Graph-based Scene Extraction The paper discusses the task of movie script summarization and how it can improve script browsing, provide a general idea of the plotline, and reduce reading time. The authors propose a graph-based model that selects an optimal chain of scenes by considering logical progression, diversity, and importance. Human evaluation shows that their model produces more informative summaries compared to other methods. NAACL 2015
Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge The paper proposes an unsupervised model that uses distributed representations of words and domain knowledge to extract context from referenced papers to reflect their exact contributions. The model significantly outperforms the state of the art and improves citation-based summarization of scientific articles. The paper highlights the importance of appropriate context for citation texts and presents a solution to address this problem. SIGIR 2017
Coarse-to-Fine Attention Models for Document Summarization The paper proposes a new approach to document summarization using a coarse-to-fine attention model that hierarchically reads a document. This approach selects top-level chunks of text using coarse attention and then reads the words of the chosen chunks using fine attention. Unlike standard attention models, this method scales with the number of top-level chunks and can handle longer sequences. While it may lag behind state-of-the-art baselines, the proposed method achieves the desired behavior of sparsely attending to subsets of the document for generation. EMNLP 2017
Concept-based Summarization using Integer Linear Programming: From Concept Pruning to Multiple Optimal Solutions The paper discusses the challenges of sentence selection in concept-based summarization, which is modelled as a budgeted maximum coverage problem. To find optimal solutions efficiently, low-weight concepts need to be pruned. However, reducing the number of concepts leads to lower ROUGE scores and multiple optimal solutions. The authors propose an extension to the model that provides a single optimal solution and eliminates the need for concept pruning using an approximation algorithm that achieves comparable performance to exact inference. EMNLP 2015
Generating Coherent Summaries of Scientific Articles Using Coherence Patterns The paper introduces a new approach to automatic summarization of scientific articles that takes into account coherence. The approach uses a graph-based model and coherence patterns mined from a corpus of abstracts to generate summaries that are coherent, important, and non-redundant. The approach is optimized using Mixed Integer Programming and outperforms baseline and state-of-the-art systems in terms of coherence and relevance. EMNLP 2016
Summarizing topical contents from PubMed documents using a thematic analysis The paper proposes a method for improving the search and browsing experience in PubMed by finding sub-topics or themes from a set of documents and computing representative titles for each theme. The method combines a thematic clustering algorithm and the Pool Adjacent Violators algorithm to induce significant themes. The system was tested on five disease sets from OMIM and outperformed LDA in terms of performance measures. The quality of theme titles was also evaluated by comparing them with manually created titles. EMNLP 2015
Using Relevant Public Posts to Enhance News Article Summarization The paper explores using public posts on social media to improve automatic summary generation for news articles. Different approaches are proposed, including using frequency information from posts to re-estimate bigram weights and re-weighting the importance of dependency tree edges for sentence compression. The experiments conducted on Facebook data show that relevant public posts can be effectively leveraged to improve news article summarization results. COLING 2016
Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints The paper presents a model for single-document summarization that combines compression and anaphoricity constraints. The model selects textual units for the summary based on learned weights from a large corpus. Compression rules allow for content deletion within a sentence, and anaphoricity constraints ensure cross-sentence coherence by including pronoun antecedents or rewriting pronouns as full mentions. The final system outperforms prior work on both ROUGE and human judgments of linguistic quality. ACL 2016
Automatic Summarization of Student Course Feedback The paper proposes a new approach to summarizing student course feedback using the integer linear programming (ILP) framework. This approach allows different student responses to share co-occurrence statistics and alleviates sparsity issues. The experimental results on a student feedback corpus show that this approach outperforms a range of baselines in terms of both ROUGE scores and human evaluation. NAACL 2016
Low-Resource Neural Headline Generation This paper discusses the challenges of improving headline quality on smaller datasets using neural headline generation models. The authors propose a new method that allows for pre-training all parameters of the model and utilizing all available text. This approach resulted in significant improvements in perplexity and ROUGE scores, with up to a 32.4% relative improvement in perplexity and 2.84 points in ROUGE. EMNLP 2017
Automatic Text Summarization Using Reinforcement Learning with Embedding Features The paper discusses the use of simple embedding features in a reinforcement learning approach to automatic text summarization. The authors propose a new deep learning network for estimating the Q-values used in reinforcement learning and evaluate their model using ROUGE scores on various datasets. The results show that their model is competitive with previous models. IJCNLP 2017
Vocabulary Tailored Summary Generation The paper proposes a neural framework for generating summaries that are tailored to the linguistic preferences of a specific audience. Existing frameworks do not take such preferences into account, but the proposed method tunes the summary words at generation time to match the target vocabulary. The evaluations show that the proposed approach maintains superior summary quality compared to a word-embedding-based lexical substitution algorithm. The paper demonstrates two applications of the proposed approach to generate summaries with simpler or shorter words for better readability. COLING 2018
ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks The paper discusses the challenges of scientific article summarization and proposes solutions to these challenges. The authors develop and release a large-scale manually-annotated corpus for scientific papers on computational linguistics and propose summarization methods that integrate the authors' original highlights and the article's actual impacts on the community to create comprehensive, hybrid summaries. The authors conduct experiments to demonstrate the efficacy of their corpus in training data-driven models for scientific paper summarization and the advantage of their hybrid summaries over abstracts and traditional citation-based summaries. The large annotated corpus and hybrid methods provide a new framework for scientific paper summarization research. AAAI 2019
Better Rewards Yield Better Summaries: Learning to Summarise Without References The paper discusses the limitations of using ROUGE scores as rewards in Reinforcement Learning (RL) based document summarisation systems, as high ROUGE scores do not necessarily correspond to high human judgement. To address this, the authors learn a reward function from human ratings on 2,500 summaries, which only takes the document and system summary as input. The learned rewards are shown to have significantly higher correlation with human ratings than previous approaches. The authors conduct human evaluation experiments and find that RL systems using their learned rewards generate summaries with higher human ratings compared to state-of-the-art supervised-learning systems and ROUGE-as-rewards RL summarisation systems. The learned reward function and source code are available at https://github.com/yg211/summary-reward-no-reference. EMNLP 2019
Towards Annotating and Creating Sub-Sentence Summary Highlights The paper discusses the benefits of creating summary highlights at the sub-sentence level and proposes a method for generating them by annotating summary-worthy sub-sentences and teaching classifiers to do the same. The task is framed as jointly selecting important sentences and identifying a single most informative textual unit from each sentence, which reduces the complexity involved in sentence compression. The study provides new benchmarks and baselines for generating highlights at the sub-sentence level. EMNLP 2019
Neural Review Summarization Leveraging User and Product Information The paper discusses product review summarization, which is a personalized and targeted form of text summarization that provides a brief summary of an online product review. The authors explore different ways to use user and product information to improve review summarization and demonstrate that their approaches are highly effective and outperform existing summarization methods. This technique is useful for both sellers and consumers in making purchase decisions. CIKM 2019
Global Optimization under Length Constraint for Neural Text Summarization The paper proposes a global optimization method called GOLC for neural text summarization models that increases the probabilities of generating summaries with high evaluation scores within a desired length. The method is compared to two other optimization methods on two datasets, and the results show that GOLC generates fewer overlength summaries while maintaining the fastest processing speed. The importance of generating in-length summaries for post-editing is also demonstrated: using in-length summaries improves post-editing time by approximately 30% to 40%. ACL 2019
From Arguments to Key Points: Towards Automatic Argument Summarization The paper proposes a method for generating concise summaries from a large collection of arguments on a given topic by representing them as a small set of key points, each scored according to its salience. The authors analyze a large dataset of crowd-contributed arguments and find that a small number of key points per topic is typically sufficient for covering the vast majority of the arguments. They also show that a domain expert can often predict these key points in advance. The paper introduces a novel large-scale dataset for the task of argument-to-key point mapping and reports promising empirical results for an extensive set of experiments with this dataset. ACL 2020
Systematically Exploring Redundancy Reduction in Summarizing Long Documents The paper explores the problem of redundancy in neural summarization and proposes three new methods to balance non-redundancy and importance when summarizing long documents. The authors organize existing methods into categories based on when and how redundancy is considered and show that their proposed methods achieve state-of-the-art ROUGE scores while significantly reducing redundancy on two scientific paper datasets. AACL 2020
Weakly-Supervised Opinion Summarization by Leveraging External Information The paper proposes a generative method called ASPMEM for opinion summarization from online product reviews. ASPMEM contains an array of memory cells to store aspect-related knowledge, which helps obtain a better opinion representation and infer aspect information more precisely. The method is evaluated on both aspect identification and opinion summarization tasks and outperforms state-of-the-art methods without relying on human supervision. The proposed method uses domain knowledge from external sources to automatically identify relevant aspects, eliminating the need for additional human effort. AAAI 2020
Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis The paper discusses the importance of not only extracting salient points when summarizing a collection of views, arguments or opinions, but also quantifying their prevalence. The traditional approach of creating textual summaries lacks this quantitative aspect. The paper proposes a method for automatic extraction of key points, which enables fully automatic analysis and achieves performance comparable to a human expert. The applicability of key point analysis goes beyond argumentation data, as demonstrated by promising results in municipal surveys and user reviews. The paper also presents an in-depth evaluation of argument-to-key point matching models, where previous results are substantially outperformed. EMNLP 2020
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization The paper proposes a new dataset, WikiAsp, for multi-domain aspect-based summarization, which aims to encourage research in open-domain aspect-based summarization. The dataset is built using Wikipedia articles from 20 different domains, and several baseline models are proposed and tested. The results highlight challenges that existing summarization models face in this setting, such as handling pronouns and time-sensitive information. TACL 2021
A Joint Model for Structure-based News Genre Classification with Application to Text Summarization The paper proposes a joint model for structure-based news genre classification that identifies one of four commonly used news structures and recognizes a sequence of news elements within the article that define the corresponding news structure. The joint model consistently outperforms its variants that perform two tasks independently, which supports the idea that preserving the two-way dependencies and constraints between a type of news structure and its sequence of news elements enables the model to better predict both of them. The system's predicted news structure type and news elements have improved the performance of text summarization when incorporated into a recent neural network system. ACL 2021
Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents The paper discusses faceted summarization, which provides multiple summaries of a long document from different perspectives, each targeting specific sections such as purpose, method, findings, and value. The lack of large-scale faceted summarization datasets has hindered research in this area, but the authors present FacetSum, a benchmark built on Emerald journal articles covering diverse domains. The study's analyses and empirical results highlight the importance of structured summaries, and the authors believe FacetSum will drive further advances in summarization research and NLP systems that can leverage structured information in both long texts and summaries. ACL 2021
Utility of Missing Concepts in Query-biased Summarization The paper discusses a new approach to query-biased summarization (QBS) that aims to reduce user effort in finding relevant documents. The approach identifies missing information in a retrieved document and presents it in a search interface for crowd workers to judge document relevance based on snippets and missing information. The method, called DSPApprox, uses classical approaches to find terms or phrases relevant to a query. The experimental results show both benefits and limitations of the method compared with traditional ones that only show relevant snippets. SIGIR 2021
LenAtten: An Effective Length Controlling Unit For Text Summarization The paper discusses fixed length summarization and the trade-off between length controllability and summary quality. The authors introduce a new length controlling unit called LenAtten, which improves length controllability and ROUGE scores while maintaining strong generalization ability. The experimental results show that their model is significantly better than the best-performing length controllable summarizer on the CNN/Daily Mail dataset. ACL 2021
Word Graph Guided Summarization for Radiology Findings The paper discusses the challenges faced by radiologists in writing impression sections of radiology reports, which summarize essential findings and are critical for communicating medical information to physicians. Automatic impression generation has emerged as an attractive research direction to facilitate this clinical practice. The paper proposes a novel method for automatic impression generation, where a word graph is constructed from the findings to record critical words and their relations, and a Word Graph guided Summarization model (WGSUM) is designed to generate impressions with the help of the word graph. Experimental results on two datasets confirm the validity and effectiveness of the proposed approach, achieving state-of-the-art results. Further experiments are conducted to analyze the impact of different graph designs on the performance of the method. ACL 2021
PASS: Perturb-and-Select Summarizer for Product Reviews The paper discusses the challenges of automatically producing concise and informative summaries for product reviews, including the tendency for summarizers to favor generic content and the potential for self-contradicting summaries due to reviewer disagreements. The authors propose the PASS system, which uses a pre-trained Transformer-based model and applies systematic perturbations to generate multiple summaries per product. The system also includes a method for ranking the summaries based on coherence. The authors compare their system to other methods and show that it produces more informative, diverse, and coherent summaries. ACL 2021
Decision-Focused Summarization The paper proposes a new approach to summarization called decision-focused summarization, which aims to summarize relevant information for a particular decision. They use a predictive model to make the decision based on the full text and then select representative sentences that lead to similar model decisions while accounting for non-redundancy. The method, called DecSum, is evaluated on a testbed where the task is to summarize restaurant reviews to predict future ratings on Yelp. DecSum outperforms other summarization methods in decision faithfulness and representativeness and enables humans to outperform random chance in predicting which restaurant will be better rated in the future. EMNLP 2021
QSG Transformer: Transformer with Query-Attentive Semantic Graph for Query-Focused Summarization The paper discusses the task of Query-Focused Summarization (QFS) and the limitations of Transformer-based summarization models in utilizing relationships between distant words and query information. To address these issues, the authors propose the QSG Transformer, a novel QFS model that leverages structural information from a Query-attentive Semantic Graph (QSG). The QSG node representations are improved by a query-attentive graph attention network, which uses Personalized PageRank to propagate information from the query node through the QSG. The proposed method achieves superior performance over state-of-the-art models on two QFS datasets. SIGIR 2022
Enhancing Scientific Papers Summarization with Citation Graph The paper proposes a new approach to scientific paper summarization that utilizes the citation network of the papers. The authors argue that previous approaches have focused too much on the content of the papers and have not taken into account the importance of the citation network. They introduce a new model called CGSUM that incorporates both the source paper and its references. They also construct a new dataset called Semantic Scholar Network (SSN) that contains 141K research papers and 661K citation relationships. The experiments show that the proposed model outperforms pretrained models even with a simple architecture and that the citation graph is crucial for generating high-quality summaries. AAAI 2021
MIRANEWS: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization The paper discusses the problem of 'extrinsic hallucinations' in single-document news summarization, where the summary contains facts not present in the source document. The authors propose using multiple supplementary resource documents to mitigate this problem and present a new dataset called MIRANEWS to benchmark existing summarization models. They show that more than 27% of facts mentioned in the gold summaries of MIRANEWS are better grounded in the assisting documents than in the main source articles. The authors also conduct an error analysis of summaries generated by pretrained models fine-tuned on MIRANEWS, revealing that assisted summarization reduces hallucinations by 55% compared to single-document summarization models trained on the main article only. EMNLP 2021
Graph Enhanced Contrastive Learning for Radiology Findings Summarization The paper proposes a unified framework for automatic impression generation in radiology reports that leverages both extra knowledge and the original findings in an integrated way. The proposed method encodes each input finding using a text encoder and constructs a graph through its entities and dependency tree. A graph encoder is then adopted to model relation information in the constructed graph. Finally, contrastive learning is introduced to emphasize key words in the findings. The experimental results on OpenI and MIMIC-CXR confirm the effectiveness of the proposed method. ACL 2022
Comparative Opinion Summarization via Collaborative Decoding The paper proposes a new task called comparative opinion summarization, which generates two contrastive summaries and one common summary from two different sets of reviews to help users compare multiple choices. The authors develop a framework called COCOSUM, which consists of two base summarization models that jointly generate the summaries. Experimental results show that COCOSUM produces higher-quality summaries than existing opinion summarization models. The dataset and code are available for use. ACL 2022
Focus on the Action: Learning to Highlight and Summarize Jointly for Email To-Do Items Summarization The paper discusses the task of automatic email to-do item generation, which involves generating action mentions from emails to help people schedule their daily work. The authors propose a learning to highlight and summarize framework (LHS) to identify the most salient text and actions and generate more faithful to-do items. The LHS model outperforms baseline models and achieves state-of-the-art performance in both quantitative evaluation and human judgement. The paper also highlights specific challenges that current models face with email to-do summarization. ACL 2022
Semantic Overlap Summarization among Multiple Alternative Narratives: An Exploratory Study The paper introduces a new NLP task called Semantic Overlap Summarization (SOS), which involves generating a summary of the overlapping information in multiple alternative narratives. The authors created a benchmark dataset by collecting alternative narrative pairs and manually creating reference summaries. They found that the popular ROUGE metric is not suitable for evaluating this task and instead used a sentence-wise annotation technique with three overlap labels. Their experiments showed that this technique yielded higher correlation with human judgment and higher inter-rater agreement than the ROUGE metric. COLING 2022
Generation of Patient After-Visit Summaries to Support Physicians The paper discusses the problem of physicians not having enough time to write clear and informative after-visit summaries for patients, and explores the possibility of using automatic generation of summaries. The study uses a clinical dataset to examine whether automatic summaries can effectively convey the important details of clinical visits. The results suggest that generating lay language after-visit summaries is still a challenging task, but a feedback mechanism is introduced to alert physicians when automatic summaries fail to capture important details or contain potentially detrimental information. Automatic and human evaluation shows the effectiveness of this approach in providing writing feedback and supporting physicians. COLING 2022
ENTSUM: A Data Set for Entity-Centric Summarization The paper discusses controllable summarization, which aims to provide summaries that take into account user-specified aspects and preferences. The authors introduce a human-annotated data set (ENTSUM) for controllable summarization with a focus on named entities as the aspects to control. They conduct an extensive analysis and show that existing methods for controllable summarization fail to generate entity-centric summaries. The authors propose extensions to state-of-the-art summarization approaches that achieve substantially better results on their data set. The paper highlights the challenging nature of this task and the proposed data set. ACL 2022
Quantifying Appropriateness of Summarization Data for Curriculum Learning The paper proposes a method of curriculum learning to train summarization models from noisy data. They use sequence-to-sequence models and propose a model that can quantify noise from a single noisy corpus. They conduct experiments on three summarization models and show that their method improves performance. They also analyze how different curricula affect the performance of pretrained and nonpretrained summarization models. Human evaluation results also show that their method improves the performance of summarization models. EACL 2021
NEWSROOM: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies The paper introduces NEWSROOM, a dataset of 1.3 million articles and summaries from 38 major news publications, extracted from search and social media metadata between 1998 and 2017. The summaries demonstrate a high diversity of summarization styles, combining abstractive and extractive strategies. The authors analyze the extraction strategies used in NEWSROOM summaries and compare them to other datasets to evaluate its diversity and difficulty. They also train existing methods on the data to evaluate its utility and challenges. The dataset is available online at summari.es. NAACL 2018
Summarizing Community-based Question-Answer Pairs The paper proposes a new task of summarizing Community-based Question Answering (CQA) pairs to help users quickly digest key information. The authors design a multi-stage data annotation process and create a benchmark dataset, COQASUM, based on the Amazon QA corpus. They compare extractive and abstractive summarization methods and establish a strong baseline approach called DedupLED. The experiments confirm two key challenges of the CQA summarization task: sentence-type transfer and deduplication removal. The data and code are publicly available. EMNLP 2022
Intrinsic Evaluation of Summarization Datasets The paper discusses the importance of high quality data for building statistical models in natural language processing (NLP), and the need to evaluate data quality during dataset construction or post hoc. It highlights that popular summarization datasets are often drawn from natural sources without quality assurance guarantees, and that data quality has gone largely unquestioned in recent summarization research. The authors introduce 5 intrinsic metrics and apply them to 10 popular datasets, finding that data usage in recent summarization research is sometimes inconsistent with the underlying properties of the datasets employed. They also discover that their metrics can serve as inexpensive heuristics for detecting low quality examples. EMNLP 2020
A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization The paper discusses the challenge of cross-lingual summarization and the lack of available resources for this task. To address this issue, the authors present a new dataset for monolingual and cross-lingual summarization in the English-German pair. They collected high-quality cross-lingual data from Spektrum der Wissenschaft and complemented it with a similar dataset from the Wikipedia Science Portal. The authors also conducted experiments with various summarization models and found that the proposed dataset is useful for monolingual and cross-lingual summarization. EMNLP 2021
How Domain Terminology Affects Meeting Summarization Performance The paper discusses the importance of meetings in organizations and the need for a meeting summarization system to help users quickly search and sift through large meeting collections. The authors analyze the impact of domain terminology, or jargon terms, on the performance of meeting summarization and find that it can have a substantial impact. They create gold-standard annotations for jargon terms on a sizable meeting corpus and publicly release all domain terminology to advance research in meeting summarization. COLING 2020
Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization The paper discusses the impact of non-summary texts, specifically straplines, on the quality of news article summarization. The authors identify straplines as a common form of non-summary text that is often included in scraped corpora used for news summarization. They present a rule-based strapline detection method that achieves good performance and show that removing straplines and noise from the training data of a news summarizer results in higher-quality summaries, with improvements of up to 7 ROUGE points. ACL 2022
TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts The paper discusses the importance of training data in developing summarization systems and introduces a new large-scale summarization dataset called TLDR9+, containing over 9 million training instances extracted from the Reddit discussion forum. The dataset is specifically gathered for extreme summarization and is more than twice as large as the previously proposed dataset. The authors also distill a more fine-grained dataset called TLDRHQ by sampling high-quality instances from TLDR9+ with the help of human annotations. The paper further evaluates different state-of-the-art summarization models on the proposed datasets. EMNLP 2021
An Exploration of Post-Editing Effectiveness in Text Summarization The paper discusses the potential benefits of human-AI collaboration in text summarization through post-editing. The study, conducted with 72 participants, compared post-editing of provided summaries with manual summarization in terms of summary quality, human efficiency, and user experience on formal and informal text. The results suggest that post-editing is useful in some cases but not in others, and participants' different editing strategies and needs for assistance offer implications for future human-AI summarization systems. NAACL 2022
SoLSCSum: A Linked Sentence-Comment Dataset for Social Context Summarization The paper introduces a new dataset called SoLSCSum for social context summarization, consisting of 157 open-domain articles and their comments from Yahoo News, manually annotated by two annotators to extract standard summaries. The dataset has high inter-annotator agreement and can be used to train summarization methods such as SVMs. The paper also demonstrates the potential use of the dataset by training a learning-to-rank model with local and cross features, which achieved significant improvements in document summarization over state-of-the-art baselines. CIKM 2016
Comparative Document Summarisation via Classification The paper discusses extractive summarization in a comparative setting, where the objective is to select a small number of documents that represent each group and distinguish them from other groups. The authors propose a new set of objective functions that connect recent literature on document summarization, interpretable machine learning, and data subset selection. They cast the problem as a binary classification among different groups and derive objectives based on the maximum mean discrepancy and a gradient-based optimization strategy. The authors evaluate comparative summarization methods on a newly curated collection of controversial news topics over 13 months and find that gradient-based optimization outperforms discrete and baseline approaches in 15 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimization elicit 7% more accurate classification from human workers than discrete optimization. The authors suggest that their formulation of comparative summarization will be useful in comparing content sources, authors, related topics, or distinct viewpoints. AAAI 2019
LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization The paper discusses the importance of summaries, keyphrases, and titles in capturing the content of a document. The authors introduce LipKey, a news corpus with human-written abstractive summaries, absent keyphrases, and titles. They use multi-task training and joint structured inputs to improve transformer-based summarization models by including absent keyphrases and titles as additional context to the source document. COLING 2022
Toward Unifying Text Segmentation and Long Document Summarization The paper discusses the importance of text segmentation in understanding and summarizing long documents, particularly transcripts of audio/video recordings. The authors propose an approach that simultaneously performs summarization and segmentation to learn robust sentence representations, further enhanced by an optimization-based regularizer that promotes the selection of diverse summary sentences. The approach was evaluated on multiple datasets and achieves state-of-the-art performance on publicly available benchmarks, with better cross-genre transferability when equipped with text segmentation. The paper also includes analyses that quantify the impact of section segmentation on summarizing written and spoken documents of substantial length and complexity. EMNLP 2022
Analyzing Multi-Task Learning for Abstractive Text Summarization The paper explores the effects of task families on abstractive text summarization, specifically analyzing the influence of multi-task learning strategies using task families for the English language. The authors group tasks into three strategies and evaluate trained models through two downstream tasks, finding that certain combinations of task families positively impact downstream performance. They also find that choice and combinations of task families influence downstream performance more than the training scheme, supporting the use of task families for abstractive text summarization. The code is publicly available. EMNLP 2022
WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation The paper discusses the challenges of evaluating summarization output from existing datasets, which are often curated from academic documents written for experts. To address this issue, the authors present a new dataset based on article summaries from the WikiHow website, which are written in plain language and focused on how-to articles. The authors compare their dataset to existing ones and show that it makes human evaluation more manageable and effective. A human evaluation conducted on PubMed and the proposed dataset supports their findings. ACL 2021
HaRiM: Evaluating Summary Quality with Hallucination Risk The paper discusses the challenge of measuring the factual consistency of generated text in summarization models. The authors propose a reference-free metric called HaRiM, which measures hallucination risk based on token likelihoods and correlates well with human judgment on three summary-quality annotation sets. They reinterpret a previously suggested objective as a hallucination risk measurement to better estimate summary quality without requiring additional training or alignment to human judgments. The authors hope their work will facilitate progress in both automated evaluation and generation of summaries. AACL 2022
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization The paper discusses the issue of neural abstractive summarization models generating content inconsistent with the source document, and the inadequacy of existing automatic metrics to capture such mistakes. The authors propose an automatic question answering (QA) based metric for evaluating the faithfulness of generated summaries, which has a higher correlation with human faithfulness scores, especially on highly abstractive summaries. The authors also find that current models exhibit a trade-off between abstractiveness and faithfulness, with outputs having less word overlap with the source document being more likely to be unfaithful. ACL 2020
NEWTS: A Corpus for News Topic-Focused Summarization The paper discusses how text summarization models are improving and how existing benchmarking corpora may not reflect the full range of summarization needs. The paper introduces a new topical summarization corpus called NEWTS, which is based on the CNN/DailyMail dataset and annotated via online crowd-sourcing. Each source article is paired with two reference summaries, each focusing on a different theme of the source document. The paper evaluates existing techniques and analyzes the effectiveness of different prompting methods. ACL 2022
EMAILSUM: Abstractive Email Thread Summarization The paper discusses the importance of summarizing conversation threads to improve work and communication efficiency. To aid in research on thread summarization, the authors developed an abstractive Email Thread Summarization dataset and conducted a study on different summarization techniques. The study revealed challenges in current abstractive summarization models, such as understanding the sender's intent and identifying the roles of sender and receiver. The authors also found that widely used automatic evaluation metrics are weakly correlated with human judgments, emphasizing the importance of human evaluation and the development of better metrics. EACL 2021
Rigid Formats Controlled Text Generation The paper discusses the challenges of generating text in rigid formats such as lyrics, sonnets, and classical Chinese poetry, which require adherence to strict formatting and rhyming schemes while maintaining sentence integrity. The authors propose a framework called SongNet, which is a Transformer-based auto-regressive language model that uses tailor-designed symbols to improve modeling performance. The attention mechanism is also improved to capture future information on the format. The framework is pre-trained and fine-tuned, and experiments show that it generates better results than existing methods in terms of both automatic metrics and human evaluation. ACL 2020
SummScreen: A Dataset for Abstractive Screenplay Summarization The paper introduces a summarization dataset called SUMMSCREEN, which consists of pairs of TV series transcripts and human-written recaps. The dataset poses a challenge for abstractive summarization due to plot details being scattered throughout the transcript and the presence of content that does not directly relate to the central plot. The paper proposes two entity-centric evaluation metrics and evaluates several methods, including neural models and those based on nearest neighbors. An oracle extractive approach outperforms all benchmarked models, indicating that neural models are unable to fully exploit the input transcripts. Human evaluation and qualitative analysis show that non-oracle models are competitive with their oracle counterparts but generate unfaithful facts, suggesting future research directions. ACL 2022
Towards Personalized Review Summarization via User-Aware Sequence Network The paper discusses personalized review summarization, which generates a condensed summary of a user's review that accounts for the user's preferences on different aspects or their writing style. The proposed model, User-aware Sequence Network (USN), considers the user's characteristics when generating summaries and contains a user-aware encoder and a user-aware decoder. The user-aware encoder selects the important information in a review, and the user-aware decoder incorporates user characteristics and word-using habits to generate personalized summaries. The model was validated on a new dataset and achieved state-of-the-art performance on personalized review summarization. The paper focuses on single-review summarization and leaves adapting the model to multi-review summarization scenarios for future work. In the paper's example, a review describes a hotel near the airport with a clean and comfortable room at a slightly high price, and the generated summary is "very quite room in a great location." AAAI 2019
A Statistical Analysis of Summarization Evaluation Metrics Using Resampling Methods The paper discusses the challenges in evaluating the quality of summarization evaluation metrics and proposes methods for calculating confidence intervals and running hypothesis tests for correlations using two resampling methods, bootstrapping and permutation. The authors evaluate the proposed methods through simulation experiments and apply them to several automatic evaluation metrics across three sets of human annotations. They find that the confidence intervals are wide, indicating high uncertainty in the reliability of automatic metrics. However, two recent metrics, QAEval and BERTScore, show statistically significant improvements over ROUGE in some evaluation settings. TACL 2021
Point at the Triple: Generation of Text Summaries from Knowledge Base Triples (Extended Abstract) The paper discusses a method for generating natural language summaries from knowledge base triples using a pointer-generator network. The network can generate regular words and verbalize triples in multiple ways. The approach was evaluated through automatic and human evaluations on single-domain and open-domain summary generation tasks, and it significantly outperformed other data-driven baselines. IJCAI 2020
Summarizing Relationships for Interactive Concept Map Browsers The paper discusses concept maps, which are visual summaries of important concepts from a dataset, displayed as vertices connected by edges that carry natural language descriptions of the relationships between concepts. While previous concept maps have been static, the paper presents a model that responds to queries by returning short, importance-ranked, natural language descriptions of the relationship between two requested concepts for display in a visual interface. The model is trained on a new public dataset, and the code and data are released on GitHub. EMNLP 2019
BillSum: A Corpus for Automatic Summarization of US Legislation The paper introduces BillSum, the first dataset for summarization of US Congressional and California state bills. The authors explain the challenges in processing this type of data and benchmark extractive methods that consider neural sentence representations and traditional contextual features. They also demonstrate that models built on Congressional bills can be used to summarize California bills, showing that methods developed on this dataset can transfer to states without human-written summaries. EMNLP 2019
How well do you know your summarization datasets? The paper discusses the lack of understanding of the characteristics of datasets used to train and evaluate summarization systems, and how they affect system performance and reliability of metrics. The authors manually analyze 600 samples from three popular summarization datasets and classify them into six categories based on noise types and summarization difficulty. They then analyze 27 state-of-the-art summarization models and 5 popular metrics, and report their findings on the distinct data quality and complexity distributions of datasets, the dependence of model performance and metric reliability on sample complexity, and the low scores received by faithful summaries due to poor diversity of references. The authors also release the code, annotated data, and model outputs. ACL 2021
Topic Concentration in Query Focused Summarization Datasets The paper discusses Query-Focused Summarization (QFS), which summarizes a document cluster in response to a specific input query. The authors note that current state-of-the-art algorithms for QFS do not significantly improve upon generic summarization methods, which ignore query relevance, when evaluated on traditional QFS datasets. They hypothesize that this is due to the high topic concentration in these datasets. To address this, they introduce a new QFS dataset with controlled levels of topic concentration and compare algorithms on this dataset. They report strong improvement in performance for algorithms that properly model query relevance and present three new QFS algorithms that outperform state-of-the-art methods on the new dataset. AAAI 2016
Neural Text Summarization: A Critical Evaluation The paper discusses the current state of text summarization, which aims to condense long documents into shorter versions while retaining important information. Despite increased interest and research, progress on benchmark datasets has stalled. The authors identify three primary issues: 1) datasets may contain noise and are underconstrained, 2) evaluation metrics do not account for important factors such as factual correctness, and 3) models overfit to biases in current datasets and lack diversity in their outputs. EMNLP 2019
Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization The paper discusses how abstractive summarization systems often generate content that is not directly inferable from the source text, known as "hallucinations." However, the authors found that much of this hallucinated content is factual and can provide useful background information in a summary. They propose a novel detection approach to separate factual from non-factual hallucinations of entities, using pre-trained and finetuned masked language models. Their approach outperforms five baselines and strongly correlates with human judgments. The authors also show that their detector, when used as a reward signal in an off-line reinforcement learning algorithm, significantly improves the factuality of summaries while maintaining the level of abstractiveness. ACL 2022
Generating a Structured Summary of Numerous Academic Papers: Dataset and Method The paper discusses the challenges of summarizing numerous academic papers into a structured summary and proposes a solution called BigSurvey, which is a large-scale dataset for generating comprehensive summaries of academic papers on each topic. The authors utilize target summaries from over 7,000 survey papers and their 430,000 reference papers' abstracts as input documents. They also propose a summarization method called category-based alignment and sparse transformer (CAST), which outperforms various advanced summarization methods. IJCAI 2022
A Corpus of Very Short Scientific Summaries The paper introduces a new task of summarizing scientific articles in the chemistry domain into one or two-sentence table of contents entries. The authors use an open access publication corpus and evaluate their approach using state-of-the-art summarization methods. CONLL 2020
A Closer Look at Data Bias in Neural Extractive Summarization Models The paper discusses the current state of summarization datasets and how different factors of datasets affect the generalization behavior of neural extractive summarization models. The authors propose several properties of datasets that matter for the generalization of summarization models and analyze how different properties of datasets influence the choices of model structure design and training methods. They demonstrate that a deep understanding of dataset characteristics can lead to significant improvements in existing models. EMNLP 2019
BrailleSUM: A News Summarization System for the Blind and Visually Impaired People The paper discusses the challenges of document summarization for blind and visually impaired people and proposes a new system called BrailleSUM. The system takes into account the length of each sentence in news articles and uses an ILP-based summarization method. Evaluation results show that BrailleSUM can produce shorter braille summaries without sacrificing content quality. ACL 2015
SummEval: Re-evaluating Summarization Evaluation The paper addresses the lack of consensus and comprehensive studies on evaluation metrics for text summarization. The authors re-evaluate 14 automatic evaluation metrics and benchmark 23 recent summarization models using these metrics. They also assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format. Additionally, they implement and share a toolkit for evaluating summarization models across a broad range of automatic metrics and assemble the largest and most diverse collection of human judgments of model-generated summaries on the CNN/DailyMail dataset. The authors hope that their work will promote a more complete evaluation protocol for text summarization and advance research in developing evaluation metrics that better correlate with human judgments. TACL 2021
AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization The paper discusses the challenge of answer summarization in Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers, where each question thread can receive a large number of answers with different perspectives. The absence of a dataset to provide supervision for producing such summaries is a major obstacle. The paper introduces a novel dataset of 4,631 CQA threads for answer summarization curated by professional linguists. The pipeline gathers annotations for all subtasks of answer summarization, including relevant answer sentence selection, grouping these sentences based on perspectives, summarizing each perspective, and producing an overall summary. The paper also introduces a novel unsupervised approach for multi-perspective data augmentation that boosts summarization performance according to automatic evaluation. Finally, the paper proposes reinforcement learning rewards to improve factual consistency and answer coverage and analyzes areas for improvement. NAACL 2022
Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization This paper discusses three techniques for improving abstractive summarization models without requiring additional data. These techniques include data synthesis with paraphrasing, data augmentation with sample mixing, and curriculum learning with two new difficulty metrics. The experiments conducted show that these techniques can improve summarization performance across two models and two small datasets, both when applied in isolation and when combined. EMNLP 2021
Re-evaluating Evaluation in Text Summarization The paper discusses the importance of automated evaluation metrics in text summarization tasks and highlights the need to re-evaluate the current standard metric, ROUGE, which has been used for almost 20 years. The authors assess the reliability of automatic metrics using top-scoring system outputs on modern datasets and systems, both abstractive and extractive, for system-level and summary-level evaluation settings. They find that conclusions about evaluation metrics on older datasets do not necessarily hold on modern datasets and systems. The authors release a dataset of human judgments collected from 25 top-scoring neural summarization systems, which can be found on GitHub. EMNLP 2020
CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines The paper discusses the societal challenge of convincing people to get vaccinated against COVID-19 and the use of social media analysis to understand specific concerns people have towards vaccines. The authors have curated CAVES, a large-scale dataset of about 10k COVID-19 anti-vaccine tweets labeled into various specific anti-vaccine concerns in a multi-label setting. This is the first multi-label classification dataset that provides explanations for each label and class-wise summaries of all tweets. Preliminary experiments show that this is a challenging dataset for multi-label explainable classification and tweet summarization. SIGIR 2022
ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining The paper discusses the lack of standardized datasets for summarizing online discussions, which has resulted in abstractive text summarization primarily focusing on news articles. To address this gap, the authors design annotation protocols to crowdsource four new datasets on diverse online conversation forms. They benchmark state-of-the-art models on these datasets and analyze characteristics associated with the data. They also evaluate these models on widely-used conversation summarization datasets to establish strong baselines in this domain. The authors incorporate argument mining through graph construction to directly model the issues, viewpoints, and assertions present in a conversation and filter noisy input, showing comparable or improved results according to automatic and human evaluations. ACL 2021
ASPECTNEWS: Aspect-Oriented Summarization of News Documents The paper discusses the limitations of generic and query-based summaries and proposes aspect-oriented summaries that focus on high-level topics discussed among similar types of documents. The authors collected a dataset of aspect-oriented summaries for articles in news sub-domains and evaluated existing techniques for generating such summaries without in-domain training data. They compared different training schemes and found that their final approach produced focused summaries that were better than those from a generic summarization system or keyword matching, and that the system was sensitive to the choice of keywords. ACL 2022
CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems The paper discusses the limitations of existing evaluation methods for text summarization models, which are typically trained and evaluated on the same dataset. The authors argue that this approach can narrow our understanding of the generalization ability of different summarization systems. To address this, they perform an in-depth analysis of different datasets and investigate the performance of 11 representative summarization systems on 5 datasets from different domains under a cross-dataset setting. The study reveals the effect of model architectures and generation methods (i.e., abstractive and extractive) on model generalization ability and sheds light on the limitations of existing summarizers. Supplementary code can be found on their GitHub page. EMNLP 2020
End-to-End Segmentation-based News Summarization The paper introduces a new way of digesting news content by segmenting a news article into multiple sections and generating corresponding summaries for each section. The authors create a dataset called SEGNEWS, consisting of 27k news articles with sections and aligned heading-style section summaries. They propose a novel segmentation-based language generation model adapted from pretrained language models that can jointly segment a document and produce the summary for each section. Experimental results on SEGNEWS show that their model outperforms several state-of-the-art sequence-to-sequence generation models for this task. ACL 2022
Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization The paper discusses how neural abstractive summarization models can generate summaries that are factually inconsistent with their source documents. Previous attempts to recognize such inconsistencies using natural language inference (NLI) have been unsuccessful due to the models' inability to generalize to the task. The authors propose a data generation pipeline called Falsesum, which uses a text generation model to introduce varying types of factual inconsistencies into human-annotated summaries. The resulting dataset contains diverse yet plausible examples, and models trained on it improve performance on four benchmarks for detecting factual inconsistency in summarization. NAACL 2022
Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering The paper discusses the redundancy and excessive length of crowdsourced answers in Community Question Answering (CQA), which limit the performance of answer selection and make answers harder for community users to digest. To solve these problems, the authors propose a novel joint learning model that tackles the tasks of answer selection and answer summary generation in CQA. They design a question-driven pointer-generator network that exploits the correlation between question-answer pairs to attend to the essential information when generating answer summaries. They also use the answer summaries to reduce the noise in long original answers when ranking the relevance of question-answer pairs. The authors construct a new large-scale CQA corpus, WikiHowQA, which contains long answers for answer selection as well as reference summaries for answer summarization. The experimental results show that the joint learning method effectively addresses the answer redundancy issue in CQA and achieves state-of-the-art results on both answer selection and text summarization tasks. The proposed model is also shown to transfer well to resource-poor CQA tasks that lack reference answer summaries. AAAI 2020
Evaluation of Abstractive Summarisation Models with Machine Translation in Deliberative Processes The paper discusses the summarization of deliberative processes in non-English languages, which involves combining multiple narratives of poor grammatical quality in a single text. The authors evaluate various abstractive summarization models in combination with a machine translation model, and report promising results in terms of fluency, consistency, and relevance of the summaries produced. The approach is easy to implement for many languages by changing the translation model. EMNLP 2021
Finding a Balanced Degree of Automation for Summary Evaluation The paper discusses the challenges of evaluating summaries using human evaluation and automatic metrics. The authors propose LitePyramid, a flexible family of summary evaluation metrics ranging from semi-automatic to fully automatic, which uses a natural language inference model and a semantic role labeling model to replace manual work. LitePyramid is compared to 15 existing metrics and evaluated on three meta-evaluation datasets and a newly collected dataset. The results show that LitePyramid consistently has the best summary-level correlations and can reduce costs for future data collection. EMNLP 2021
SUMPUBMED: Summarization Dataset of PubMed Scientific Articles The paper discusses the limitations of text summarization models that are trained on news article datasets, where the summary is typically located at the beginning of the text. To address this issue, the authors created a new dataset called SUMPUBMED, which contains scientific articles from the PubMed archive. The summary in SUMPUBMED is distributed throughout the text and contains rare domain-specific scientific terms, making it challenging for seq2seq models that are trained on news articles to summarize effectively. The authors conclude that SUMPUBMED provides new opportunities for improving text summarization models and developing new evaluation metrics. ACL 2021
Unsupervised Opinion Summarization with Content Planning The paper discusses the challenges of using deep learning techniques for summarizing reviews due to the lack of large-scale datasets. The authors propose a method that incorporates content planning into the summarization model, which improves the quality of the output and allows for the creation of more natural synthetic datasets. The content plans are generated from aspect and sentiment distributions induced from data without expensive annotations. The synthetic datasets are created by sampling pseudo-reviews from a Dirichlet distribution, and the model generates summaries based on input reviews and induced content plans. Experimental results show that their approach outperforms other models in generating informative, coherent, and fluent summaries that capture opinion consensus. AAAI 2021
Automatic learner summary assessment for reading comprehension The paper discusses the development of a tool for assessing learner reading comprehension through automated assessment of their summaries. The authors propose three novel approaches to assess the summaries and evaluate them on two datasets they created. The results show that their models outperform traditional approaches and produce quality assessments close to those of professional examiners. NAACL 2019
Investigating Metric Diversity for Evaluating Long Document Summarisation The paper discusses the LongSumm shared task, which focuses on long document summarization and has been limited by its use of a single family of metrics for evaluation. The authors replicated the evaluation using multiple test set samples and found that the use of additional metrics revealed high-quality summaries missed by the original metrics. They also suggest that SPICE could be a candidate metric for summarization evaluation in LongSumm. The relative ranking of systems changed under this more rigorous evaluation, but some key learnings from previous years still held. COLING 2022
At Which Level Should We Extract? An Empirical Study on Extractive Document Summarization The paper discusses the effectiveness of extractive methods in automatic document summarization and proposes extracting sub-sentential units instead of full sentences. The authors show that extracting full sentences can introduce redundant and unnecessary content, and present a neural extractive model that leverages sub-sentential information. The experiments and analyses demonstrate that extracting sub-sentential units performs competitively compared to full sentence extraction. The paper provides inspiration for future research on the basic extraction units in extractive summarization. COLING 2020
Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution The paper proposes a two-step method to interpret the decisions made by neural abstractive summarization models. The first step involves analyzing the model's behavior to categorize each decoder decision into one of several generation modes. The second step involves interpreting the decisions using different attribution methods to determine their importance for the generation of the next token. The paper demonstrates the method's capability to identify phrases the summarization model has memorized and determine where in the training pipeline this memorization happened, as well as study complex generation phenomena like sentence fusion on a per-instance basis. ACL 2021
Understanding Points of Correspondence between Sentences for Abstractive Summarization The paper discusses the challenge of fusing sentences with disparate content to create informative and succinct summaries, which is a task that humans can easily perform but is difficult for modern abstractive summarizers. The authors propose introducing the notion of points of correspondence, which are cohesive devices that tie any two sentences together into a coherent text, and provide a dataset containing human annotations of points of correspondence between sentences. The dataset bridges the gap between coreference resolution and summarization and can serve as a basis for future work to measure the success of sentence fusion systems. ACL 2020
HOLMS: Alternative Summary Evaluation with Large Language Models The paper discusses the need for evaluation measures in document summarization that can rank systems based on individual summaries rather than just an average score. It highlights the limitations of current measures like ROUGE and BLEU, which are lexical in nature and not ideal for training neural networks. The authors propose a new hybrid evaluation measure called HOLMS, which combines language models and lexical similarity measures. They demonstrate through experiments that HOLMS outperforms ROUGE and BLEU in correlation with human judgments on several extractive summarization datasets for both linguistic quality and pyramid scores. COLING 2020
Evaluation of Summarization Systems across Gender, Age, and Race The paper discusses how summarization systems are evaluated by human annotators and raters, who are often recruited through platforms with skewed demographics. The authors argue that this can lead to bias in system development and evaluation, as summary evaluation is sensitive to protected attributes. They suggest building models that cater to all groups rather than just some. EMNLP 2021
Phrase-Level Localization of Inconsistency Errors in Summarization by Weak Supervision The paper proposes a methodology for identifying inconsistency errors in summarization. A synthetic dataset is created to train a model called SumPhrase, which can detect factual errors in summarization more effectively than existing weakly supervised methods. Jointly identifying the original sentences that correspond to each error is shown to improve error detection accuracy. COLING 2022
A Graph-theoretic Summary Evaluation for ROUGE The paper discusses the limitations of the ROUGE evaluation metric for text summarization, which only considers surface similarities between summaries and cannot accurately assess summaries with lexical variations and paraphrasing. The authors propose a graph-based approach to incorporate both lexical and semantic similarities into ROUGE. The results of experiments on TAC AESOP datasets show that this approach improves the correlation between ROUGE and human judgments. EMNLP 2018
Time-Limits and Summaries for Faster Relevance Assessing The paper discusses the importance of relevance assessing in applications such as high-recall retrieval and test collection construction. The authors conducted a user study with 60 participants to investigate the impact of time limits and document size on relevance assessing. They found that using a time limit as short as 15 seconds or judging document summaries in place of full documents could significantly speed judging without significantly affecting judging quality. Participants found judging document summaries with a 60 second time limit to be the easiest and best experience. The authors suggest that high quality document summaries can provide the same speed benefits as time limits while improving the judging experience for assessors. SIGIR 2019
What Makes a Good Podcast Summary? This paper discusses the motivation behind abstractive summarization of podcasts, which is driven by the increasing popularity of podcasts and the needs of their listeners. The authors note that podcasting is a unique domain that differs from news and other media commonly studied in automatic summarization research. The study uses a collection of podcast summaries generated by different algorithms and human judgments of summary quality from the TREC 2020 Podcasts Track to explore the correlations between various automatic evaluation metrics and human judgments, as well as the linguistic aspects of summaries that lead to strong evaluations. The qualities of a good podcast summary are still unknown, and this study aims to shed light on this topic. SIGIR 2022
Earlier Isn’t Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization The paper explores the biases and sub-aspects of summarization systems, specifically position, importance, and diversity, across nine different summarization corpora. The study finds that while position exhibits substantial bias in news articles, this is not the case for academic papers and meeting minutes. Additionally, different types of summarization systems are composed of different degrees of the sub-aspects. The study provides useful lessons for developing new summarization systems and collecting new summarization datasets. EMNLP 2019
Searching for Effective Neural Extractive Summarization: What Works and What’s Next The paper discusses the success of deep neural networks in text summarization, but notes that there is still much to be understood about why they work so well and how they can be improved. The authors explore different model architectures, transferable knowledge, and learning schemas to improve neural extractive summarization systems. They also present a new framework that achieves state-of-the-art results on CNN/DailyMail. The paper aims to provide insights for future research on extractive summarization, and the source code is available on GitHub. ACL 2019
Is Human Scoring the Best Criteria for Summary Evaluation? The paper challenges the commonly held belief that a summary quality measure is best judged by how closely it correlates with quality scores produced by human annotators. The authors present observations that question this view and propose an alternative criterion for selecting the best measure from a group of measures that does not rely on human scores. ACL 2021
SUMMAC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization The paper discusses the importance of factual consistency in summaries and the limitations of natural language inference (NLI) models for inconsistency detection. The authors propose a new method called SUMMACConv, which segments documents into sentence units and aggregates scores between pairs of sentences, enabling NLI models to be successfully used for this task. They also introduce a new benchmark called SUMMAC, consisting of six large inconsistency detection datasets. On this benchmark, SUMMACConv obtains state-of-the-art results with a balanced accuracy of 74.4%, a 5% improvement compared with prior work. TACL 2022
What Have We Achieved on Text Summarization? The paper discusses the current state of text summarization using deep learning and highlights the gaps that still exist between automatic summarizers and human professionals. The authors use the Multidimensional Quality Metrics (MQM) framework to identify 8 major sources of errors in 10 representative summarization models. They find that extractive summarizers are generally better than abstractive ones in terms of faithfulness and factual consistency. They also note that pre-training techniques, particularly sequence-to-sequence pre-training, are highly effective for improving text summarization, with BART being the most effective. The paper provides insights into the strengths and limitations of different summarization techniques and highlights areas for future research. EMNLP 2020
Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization The paper discusses the importance of generating summaries that are not only fluent and informative but also factually correct, and the rapid development of factuality evaluation. However, existing meta-evaluation methodologies for factuality metrics are opaque, leaving the relative advantages and applicability of these metrics poorly understood. The paper presents an adversarial meta-evaluation methodology that diagnoses the strengths and weaknesses of 6 existing top-performing metrics over 24 diagnostic test datasets and searches for directions for further improvement by data augmentation. The authors propose several calls for future research and make all code, diagnostic test datasets, and trained factuality models available. EMNLP 2021
Facet-Aware Evaluation for Extractive Summarization The paper proposes a new evaluation setup for extractive summarization that focuses on assessing the information coverage in extracted summaries. This setup involves treating each sentence in the reference summary as a facet and identifying the sentences in the document that express the semantics of each facet as support sentences. The evaluation is then performed by comparing the indices of extracted sentences and support sentences of all the facets in the reference summary. The authors construct an extractive version of the CNN/Daily Mail dataset to facilitate this new evaluation setup and demonstrate that it is more effective than commonly adopted metrics like ROUGE in manifesting better correlation with human judgment, enabling fine-grained evaluation and comparative analysis, and revealing valuable insights of state-of-the-art summarization methods. ACL 2020
Mapping the Design Space of Human-AI Interaction in Text Summarization The paper explores the role of humans in automatic text summarization systems and the design considerations for human-AI interaction in text generation tasks. The authors conducted a literature review and developed a taxonomy of five interactions in AI-assisted text generation. They designed text summarization prototypes for each interaction and interviewed 16 users to understand their expectations, experience, and needs regarding efficiency, control, and trust with AI in text summarization. The paper proposes design considerations for human-AI interaction in text summarization and broader text generation tasks. NAACL 2022
Fact-based Content Weighting for Evaluating Abstractive Summarisation The paper discusses the difficulty of evaluating abstractive summarization using standard word-overlap-based metrics, and introduces a new evaluation metric based on fact-level content weighting. The metric relates the facts of the document to the facts of the summary, and assumes that a good summary will reflect all relevant facts present in the human-generated reference summary. The authors confirm this hypothesis by showing that their weightings are highly correlated to human perception and compare favorably to a recent manual highlight-based metric. ACL 2020
A Semantic QA-Based Approach for Text Summarization Evaluation The paper discusses the challenge of assessing the quality of Natural Language Processing and Computational Linguistics applications that generate new texts based on existing texts. Specifically, the paper focuses on the problem of pinpointing content differences between two text passages, especially for large passages such as articles and books. The authors propose a new approach that treats one text passage as a small knowledge base and asks it a large number of questions to identify all content points. By comparing the correctly answered questions from two text passages, the authors are able to compare their content precisely. An experiment on the DUC 2007 summarization corpus shows promising results. AAAI 2018
Self-Repetition in Abstractive Neural Summarizers The paper analyzes self-repetition in the output of neural summarizers, measuring it as the number of repeated n-grams of length four or longer. Three popular architectures (BART, T5, and Pegasus) are analyzed, and it is found that BART is particularly prone to self-repetition. Fine-tuning on more abstractive data and data featuring formulaic language is associated with a higher rate of self-repetition. Qualitative analysis reveals that systems produce artefacts such as ads and disclaimers unrelated to the content being summarized, as well as formulaic phrases common in the fine-tuning domain. The paper suggests that their approach to corpus level analysis of self-repetition may help practitioners clean up training data for summarizers and ultimately support methods for minimizing the amount of self-repetition. AACL 2022
A Principled Framework for Evaluating Summarizers: Comparing Models of Summary Quality against Human Judgments The paper introduces a new framework for evaluating extractive summarizers based on an optimization problem. It shows that every extractive summarizer can be broken down into an objective function and an optimization technique. The authors compare and evaluate several objective functions in well-known summarizers and analyze their correlation with human judgments. The comparison across two datasets provides surprising insights into the role and performance of objective functions in different summarizers. ACL 2017
On Faithfulness and Factuality in Abstractive Summarization The paper examines the limitations of neural text generation models for abstractive document summarization and finds that these models often generate content that is unfaithful to the input document. A large scale human evaluation of several neural abstractive summarization systems was conducted to better understand the types of hallucinations they produce. The analysis shows that pretrained models are better summarizers in terms of generating faithful and factual summaries as evaluated by humans. Textual entailment measures are found to better correlate with faithfulness than standard metrics, potentially leading to better automatic evaluation metrics and training and decoding criteria. ACL 2020
From COMET to COMES – Can Summary Evaluation Benefit from Translation Evaluation? The paper discusses the use of COMET, a neural-based evaluation metric for Machine Translation systems, for evaluating Text Summarization systems. Despite being trained on multilingual MT outputs, COMET performs well in monolingual settings for predicting summarization output quality. The authors introduce a variant of the model, COMES, trained on annotated summarization outputs using MT data for pre-training. The performance of COMES is examined on several datasets with human judgments for different notions of summary quality, across various domains and languages. AACL 2022
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary The paper proposes a new metric, QAEval, to evaluate the content quality of a summary using question-answering (QA) instead of traditional text overlap based metrics such as ROUGE. QA-based methods directly measure a summary's information overlap with a reference, making them fundamentally different from text overlap metrics. The authors demonstrate the experimental benefits of QA-based metrics through an analysis of QAEval, which outperforms current state-of-the-art metrics on most evaluations using benchmark datasets. The authors also identify the performance bottlenecks of QAEval and estimate that its potential upper-bound performance surpasses all other automatic metrics, approaching that of the gold-standard Pyramid Method. TACL 2021
Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation The paper discusses the importance of manual evaluation in summary evaluation methodology and the traditional Pyramid protocol, which is reliable but expensive and requires expertise. Cheaper and less thorough manual evaluation methods have been used instead, but the authors propose a lightweight sampling-based version of the Pyramid approach that can be crowdsourced. They analyze the performance of their method and release their crowdsourced Summary Content Units and crowdsourcing scripts for future evaluations. NAACL 2019
A Simple Theoretical Model of Importance for Summarization The paper argues that establishing theoretical models of Importance will advance our understanding of summarization and improve summarization systems. The authors propose definitions of Redundancy, Relevance, and Informativeness, and show how Importance arises as a single quantity that unifies these concepts. The paper also provides intuitions to interpret the proposed quantities and experiments to demonstrate the potential of the framework to inform and guide subsequent works. ACL 2019
Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples The paper examines the performance of neural abstractive summarizers in generating summary texts and their ability to understand deeper syntactic and semantic structures. The authors generate a set of contrastive summaries and test whether existing neural summarizers score them more highly than human-written summaries. They find that these systems fail to understand the source text in a majority of cases. NAACL 2019
SSAS: Semantic Similarity for Abstractive Summarization The paper introduces a new metric called Semantic Similarity for Abstractive Summarization (SSAS) that evaluates system-generated summaries at a semantic inference level. Previous approaches relied on word or syntactic sub-sequence overlap, which cannot evaluate summaries at this level. SSAS uses natural language inference and paraphrasing techniques to weigh quantities representing agreement, contradiction, topical neutrality, paraphrasing, and optionally ROUGE score between a system-generated and human-written summary. IJCNLP 2017
HIGHRES: Highlight-based Reference-less Evaluation of Summarization The paper discusses the challenges of manual evaluation of system-generated summaries and proposes a novel approach called HIGHlight-based Reference-less Evaluation of Summarization (HIGHRES). This approach involves assessing summaries against the source document using manually highlighted salient content. The authors validate their approach by employing crowd-workers to augment a dataset and compare two state-of-the-art systems. They demonstrate that HIGHRES improves inter-annotator agreement and helps emphasize differences among systems that would be ignored under other evaluation approaches. ACL 2019
Pruning Basic Elements for Better Automatic Evaluation of Summaries The paper introduces a new automatic evaluation measure for summarization called pruned Basic Elements (pBE). It addresses the weakness of the widely used BE concept, which redundantly matches basic elements. pBE prunes basic elements by disregarding frequency count and reducing semantically overlapped elements based on word similarity. The study shows that pBE outperforms ROUGE in DUC datasets and achieves the highest rank correlation coefficient in TAC 2011 AESOP task. NAACL 2018
Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation The paper discusses the issue of factual inconsistency in generated summaries despite significant progress in text summarization. The authors propose a novel metric to evaluate factual consistency in text summarization via counterfactual estimation, which removes the effect of the language prior from the total causal effect on the generated summary. This provides a simple yet effective way to evaluate consistency without relying on other auxiliary tasks. Experiments on three public abstractive text summarization datasets show that the proposed metric correlates better with human judgments and is convenient to use. The source code is available at https://github.com/xieyxclack/factual_coco. EMNLP 2021
A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy The paper proposes a training-free and reference-free summarization evaluation metric to avoid the costly and time-consuming process of collecting human-annotated references and ratings. The metric consists of a centrality-weighted relevance score and a self-referenced redundancy score. The relevance score is computed between the given summary and a pseudo reference built from the source document, and the redundancy score measures redundant information within the summary. The final evaluation score combines the relevance and redundancy scores. The proposed method outperforms existing methods on both multi-document and single-document summarization evaluation. The authors also release their source code. ACL 2021
SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling The paper discusses the limitations of current automatic summary evaluation metrics, which focus on lexical similarity and require a reference summary. The authors propose a weakly supervised approach that does not require a reference summary, using existing summarization datasets and pairing documents with corrupted reference summaries for training. In cross-domain tests, their approach outperforms baselines and shows advantages in gauging linguistic qualities over all metrics. NAACL 2022
FACTGRAPH: Evaluating Factuality in Summarization with Semantic Graph Representations The paper discusses the limitations of current abstractive summarization approaches, which often generate summaries that are not factually consistent with the source document. The authors propose a new method called FACTGRAPH, which decomposes the document and summary into structured meaning representations (MR) to better evaluate factuality. FACTGRAPH encodes these MRs using a graph encoder and text encoder, and experiments show that it outperforms previous approaches by up to 15% in identifying factual errors and inconsistencies. NAACL 2022
Estimating Summary Quality with Pairwise Preferences The paper proposes a new evaluation approach for automatic summarization systems based on pairwise preferences of sentences, which is simpler and cheaper to obtain than gold standard summaries. The authors show that humans can provide useful feedback using this approach, and that it outperforms the three most popular versions of ROUGE with less expensive human input. Additionally, the framework can reuse already available evaluation data to achieve even better results. NAACL 2018
Studying Summarization Evaluation Metrics in the Appropriate Scoring Range The paper discusses the issue of evaluating automatic summarization systems using human judgments. The current human judgment datasets were created during the DUC/TAC shared tasks, but modern systems are better than the best systems submitted at that time. The paper shows that evaluation metrics which behave similarly on these datasets strongly disagree in the higher-scoring range where current systems operate. This creates a problem as we cannot decide which metric to trust. The paper calls for collecting human judgments for high-scoring summaries to resolve this debate and improve summarization systems and metrics. ACL 2019
Automated Pyramid Summarization Evaluation The paper discusses Pyramid evaluation, a method that assesses the content of paragraph-length summaries of source texts. The method builds a pyramid that lists the distinct units of content found in several reference summaries, weights each unit by the number of reference summaries it occurs in, and produces three scores based on the weighted content of new summaries. The paper presents an automated version of this method that is more efficient, transparent, and complete than previous automated pyramid methods. The new method is tested on a dataset of student summaries and on historical NIST data from extractive summarizers. CONLL 2019
Training Dynamics for Text Summarization Models The paper discusses the fine-tuning process of pre-trained language models for summarization tasks and analyzes the training dynamics for generation models. The study focuses on different datasets and summary properties, such as abstractiveness and hallucination, to understand what the model learns at different stages of its fine-tuning process. The authors find that the model learns to copy the input early in the training process consistently across all datasets studied, while factual errors are learned in the later stages, though this behavior is more varied across domains. Based on these observations, the authors explore complementary approaches for modifying training to achieve different goals, such as improving factuality or improving abstractiveness. ACL 2022
Content Selection in Deep Learning Models of Summarization The paper discusses experiments with deep learning models of summarization in various domains, finding that many sophisticated features do not improve performance over simpler models. This suggests that creating a summarizer for a new domain may be easier than previously thought, and questions the benefit of deep learning models for summarization in domains with massive datasets. The paper suggests that new forms of sentence representations or external knowledge sources are needed for better summarization. EMNLP 2018
Analyzing Sentence Fusion in Abstractive Summarization The paper examines how abstractive summarization systems combine information from multiple sentences to form summary sentences. The researchers analyzed the outputs of five state-of-the-art summarizers and found that while the summary sentences were mostly grammatical, they often failed to remain faithful to the original article. The study highlights the need for further research in this area to improve the accuracy of abstractive summarization systems. EMNLP 2019
Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries The paper discusses the issue of factual inconsistencies in current pre-trained models used for summarization and the need to evaluate the factual consistency of summaries to develop better models. The authors conducted crowdsourced evaluations using two different methods to determine the factors that affect the reliability of human evaluation. They found that the ranking-based Best-Worst Scaling method is more reliable than the rating-based Likert Scale method, which highly depends on the target dataset and evaluation design. To improve crowdsourcing reliability, they extended the Likert rating scale and presented a scoring algorithm for Best-Worst Scaling called value learning. The authors also made their crowdsourcing guidelines publicly available to facilitate future work on factual consistency in summarization. NAACL 2022
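The value-learning algorithm is not described in this entry, so the sketch below shows only the standard counting procedure for Best-Worst Scaling, a common baseline: each item's score is (times chosen best minus times chosen worst) divided by times shown. The function name and the annotation tuples are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter

def bws_counting_scores(judgments):
    """Standard Best-Worst Scaling counting scores (not the paper's value learning).

    judgments: list of (items_shown, best_item, worst_item) tuples.
    Returns a dict item -> (#best - #worst) / #times shown, in [-1, 1].
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in judgments:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Hypothetical annotations: each tuple shows four summaries, the one judged
# most factually consistent, and the one judged least consistent.
judgments = [
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "D"), "A", "C"),
    (("A", "B", "C", "D"), "B", "D"),
]
scores = bws_counting_scores(judgments)
ranking = sorted(scores, key=scores.get, reverse=True)  # best-rated first
```

Because every judgment is a relative choice within one tuple, this counting scheme does not depend on annotators sharing an absolute rating scale, which is one reason ranking-based protocols can be more reliable than Likert ratings.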
Automatic Pyramid Evaluation Exploiting EDU-based Extractive Reference Summaries The paper discusses the automation of the pyramid method, a manual evaluation framework. The authors transform human-made reference summaries into extractive reference summaries consisting of Elementary Discourse Units (EDUs) from source documents. They then weight each EDU by counting the number of extractive reference summaries that contain it. The summary is scored based on the correspondences between EDUs in the summary and those in the pyramid. The authors conducted experiments on DUC and TAC data sets and found that their methods strongly correlate with various manual evaluations. EMNLP 2018
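The EDU weighting step can be sketched as below. Two simplifications are assumptions, not the paper's method: summary EDUs are matched to pyramid EDUs by exact identifier (the paper computes correspondences), and the score is normalized by the best weight sum achievable with the same number of EDUs, as in the classic pyramid score. The EDU identifiers are hypothetical.

```python
from collections import Counter

def build_pyramid(reference_edu_sets):
    """Weight each EDU by the number of extractive references that contain it."""
    weights = Counter()
    for ref in reference_edu_sets:
        weights.update(set(ref))  # count an EDU at most once per reference
    return weights

def pyramid_score(summary_edus, weights):
    """Matched weight divided by the best weight achievable with as many EDUs.

    Assumes exact EDU-id matching; unmatched EDUs contribute 0.
    """
    summary_edus = set(summary_edus)
    observed = sum(weights[e] for e in summary_edus)
    best = sum(sorted(weights.values(), reverse=True)[: len(summary_edus)])
    return observed / best if best else 0.0

# Three hypothetical extractive references built from source-document EDUs.
references = [{"e1", "e2", "e3"}, {"e1", "e3"}, {"e1", "e4"}]
weights = build_pyramid(references)  # e1 -> 3, e3 -> 2, e2 -> 1, e4 -> 1
score = pyramid_score({"e1", "e2"}, weights)
```

Here the summary collects weight 3 + 1 = 4, while the best two-EDU summary would collect 3 + 2 = 5, giving a score of 0.8.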
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics The paper discusses the reliability of automatic summarization evaluation metrics in replicating human judgments of summary quality. The authors identify two inconsistencies in the definition of system-level correlation and propose changes to address them. First, they suggest using the full test set instead of a subset judged by humans to calculate the system score for an automatic metric, leading to more precise estimates of system-level correlations. Second, they propose calculating correlations only on pairs of systems with small differences in automatic scores, which are commonly observed in practice. The authors demonstrate that the best estimate of the correlation of ROUGE to human judgments is near 0 in realistic scenarios, highlighting the need for more high-quality human judgments and improved automatic metrics when differences in system scores are small. NAACL 2022
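A minimal sketch of a pairwise, Kendall-style view of system-level correlation. It assumes each system already has one metric score and one human score (per the first proposal above, the metric score would be averaged over the full test set). The function name `pairwise_agreement` and the `max_delta` cutoff are illustrative assumptions, not the authors' implementation; the cutoff mimics the second proposal of only counting system pairs whose automatic scores are close.

```python
from itertools import combinations

def pairwise_agreement(metric_scores, human_scores, max_delta=None):
    """Kendall-style agreement between metric and human rankings of systems.

    metric_scores, human_scores: dict system -> system-level score.
    If max_delta is given, only pairs whose metric scores differ by at most
    max_delta are counted. Tied pairs are ignored.
    """
    concordant = discordant = 0
    for a, b in combinations(sorted(metric_scores), 2):
        dm = metric_scores[a] - metric_scores[b]
        if max_delta is not None and abs(dm) > max_delta:
            continue
        dh = human_scores[a] - human_scores[b]
        if dm * dh > 0:
            concordant += 1
        elif dm * dh < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else float("nan")

# Hypothetical system-level scores for three systems.
metric = {"s1": 0.40, "s2": 0.41, "s3": 0.50}
human = {"s1": 3.2, "s2": 3.0, "s3": 4.1}
all_pairs = pairwise_agreement(metric, human)
close_pairs = pairwise_agreement(metric, human, max_delta=0.02)
```

On this toy data the metric agrees with humans on two of three pairs overall, but on the only pair with a small metric difference it gets the order wrong, which illustrates how a correlation can look reasonable overall yet be poor where systems are close.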
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries The paper discusses the limitations of abstractive summarization models due to frequent factual inconsistencies in their output. Existing automatic evaluation metrics are not sensitive to such errors. The authors propose QAGS, an automatic evaluation protocol that identifies factual inconsistencies in generated summaries by asking questions about the summary and its source. QAGS has higher correlations with human judgments of factual consistency than other automatic evaluation metrics and provides interpretability by indicating which tokens of a summary are inconsistent and why. The authors believe QAGS is a promising tool for automatically generating usable and factually consistent text. Code for QAGS is available on GitHub. ACL 2020
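The final comparison step of a QA-based protocol can be sketched as follows, assuming questions have already been generated from the summary and answered once against the summary and once against the source. The sketch compares paired answers with SQuAD-style token-level F1 and averages over questions; question generation and answering are omitted, and the function names and answers are hypothetical.

```python
from collections import Counter

def token_f1(pred, gold):
    """SQuAD-style token-overlap F1 between two answer strings."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def qa_consistency(summary_answers, source_answers):
    """Average answer similarity over the questions generated from the summary."""
    scores = [token_f1(s, d) for s, d in zip(summary_answers, source_answers)]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical answers to two questions, e.g. "Who resigned?" and "When?".
from_summary = ["the mayor", "tuesday"]
from_source = ["the mayor", "on monday"]
score = qa_consistency(from_summary, from_source)
```

A low-scoring question points at the summary span that disagrees with the source (here the date), which is the kind of interpretability the entry describes.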
Evaluation of Cross Domain Text Summarization The paper discusses the effectiveness of extractive-abstractive hybrid summarization in generating concise summaries for long documents. Two approaches to hybrid summarization, extraction-then-abstraction and extraction-with-abstraction, are compared and evaluated through large-scale experiments. The study examines the generalization of the algorithms by testing them within and across news domains and comparing automatic assessments to human judgments. The results show that the extraction-then-abstraction approach outperforms the extraction-with-abstraction approach, especially for cross-domain headline generation. SIGIR 2020
Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries The paper analyzes the token alignments used by reference-based metrics such as ROUGE and BERTScore to compare summaries and argues that their scores largely cannot be interpreted as measuring information overlap. Rather, they are better estimates of the extent to which the summaries discuss the same topics. The consequence of this result is that the most frequently used summarization evaluation metrics do not align with the community’s research goal, to generate summaries with high-quality information. However, the paper concludes by demonstrating that a recently proposed metric, QAEval, which scores summaries using question-answering, appears to better capture information quality than current evaluations, highlighting a direction for future research. CONLL 2021
Using Analytic Scoring Rubrics in the Automatic Assessment of College-Level Summary Writing Tasks in L2 The paper discusses the automated scoring of college-level summary writing tasks in English as a second language (EL2) using the Reading-for-Understanding (RU) cognitive framework, extended with the Reading-to-Write (RW) element, and analytic scoring with six rubrics covering content and writing quality. The authors show that regression models with reference-based and linguistic features perform better than baselines across all rubrics and reveal interesting correlations between summary features and analytic rubrics, highlighting the links between the RU and RW constructs. IJCNLP 2017
How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation The paper discusses the importance of automatically evaluating the coherence of summaries and the challenges of doing so due to the use of disparate datasets and metrics. The authors conduct a large-scale investigation of various methods for summary coherence modeling and introduce two novel analysis measures to identify biases in coherence measures. They find that currently available automatic coherence measures are not reliable across all evaluation metrics, but large-scale language models fine-tuned on self-supervised tasks show promising results if they are trained to generalize across different summary lengths. COLING 2022
PrefScore: Pairwise Preference Learning for Reference-free Summarization Quality Assessment The paper proposes a method for evaluating machine-generated summaries without a human-written reference summary. The method involves learning the preference rank of summaries using the Bradley-Terry power ranking model from inferior summaries generated by corrupting base summaries. The experiments conducted on several datasets show that the proposed method can produce scores highly correlated with human ratings. COLING 2022
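The Bradley-Terry model says that item a beats item b with probability exp(s_a) / (exp(s_a) + exp(s_b)), i.e. sigmoid(s_a - s_b). In the paper the score s comes from a learned model over summary text; in this simplified sketch each toy summary just gets a free scalar strength fit by gradient ascent on the Bradley-Terry log-likelihood. The item names and the (winner, loser) pairs, in which a base summary beats its corrupted versions, are hypothetical.

```python
import math

def bt_prob(s_a, s_b):
    """Bradley-Terry probability that a is preferred over b (logistic form)."""
    return 1.0 / (1.0 + math.exp(-(s_a - s_b)))

def fit_strengths(items, preferences, lr=0.5, steps=200):
    """Fit one scalar strength per item from (winner, loser) pairs.

    Plain gradient ascent on sum(log sigmoid(s_winner - s_loser)).
    """
    s = {item: 0.0 for item in items}
    for _ in range(steps):
        grad = {item: 0.0 for item in items}
        for winner, loser in preferences:
            p = bt_prob(s[winner], s[loser])
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for item in items:
            s[item] += lr * grad[item]
    return s

# Hypothetical preferences: base > lightly corrupted (c1) > heavily corrupted (c2).
items = ["base", "c1", "c2"]
preferences = [("base", "c1"), ("base", "c2"), ("c1", "c2")]
strengths = fit_strengths(items, preferences)
```

Because the preference pairs come from corruption rather than human labels, the training signal is free to generate, which is what makes this style of reference-free scorer cheap to train.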
How to Evaluate a Summarizer: Study Design and Statistical Analysis for Manual Linguistic Quality Evaluation The paper discusses the importance of manual evaluation in assessing progress in automatic text summarization. The authors conducted a survey on recent summarization system papers and found little agreement on how to perform evaluation studies. They conducted two evaluation experiments on coherence and repetitiveness and compared Likert-type and ranking annotations. They found that the best choice of evaluation method can vary depending on the aspect being evaluated. The authors also found that study parameters are often not fully reported and subsequent statistical analysis ignores grouping factors. They showed that the total number of annotators can have a strong impact on study power and that current statistical analysis methods can inflate type I error rates up to eight-fold. They highlight that eliciting multiple judgments per summary leads to less powerful and reliable annotations for system comparison given a fixed study budget. EACL 2021
Evaluating the Factual Consistency of Abstractive Text Summarization The paper proposes a model-based approach for verifying factual consistency and identifying conflicts between source documents and generated summaries. The model is trained jointly for three tasks: predicting whether each summary sentence is factually consistent or not, extracting a span in the source document to support this consistency prediction, and extracting the inconsistent span from each summary sentence that is deemed inconsistent. The approach outperforms previous models and provides useful assistance in verifying factual consistency. The authors also release a dataset, code, and trained model weights for factual consistency verification. EMNLP 2020
ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings The paper introduces a new reference-free summary quality evaluation measure called ESTIME, which focuses on the faithfulness of the summary. The measure counts potential inconsistencies between the summary and the source document and correlates strongly with expert scores in the SummEval dataset. The paper also presents a method of generating subtle factual errors in human summaries and shows that ESTIME is more sensitive to these errors than other common evaluation measures. EMNLP 2021
Question Answering as an Automatic Evaluation Metric for News Article Summarization The paper discusses recent developments in automatic summarization and headline generation, which have focused on maximizing ROUGE scores. The authors propose an alternative evaluation metric called Answering Performance for Evaluation of Summaries (APES), which uses reading comprehension to assess a summary's ability to answer questions about the source article. They compare APES to manual evaluation metrics and present a neural abstractive model that maximizes APES and increases ROUGE scores. NAACL 2019
An Anchor-Based Automatic Evaluation Metric for Document Summarization The paper discusses a new protocol for designing reference-based metrics for document summarization that also takes the source documents into account. The proposed anchored ROUGE metric anchors each summary particle to the source document, making the computation more robust. Empirical results on benchmark datasets show that using the source document yields a higher correlation with human judgments for the ROUGE metric. The protocol is self-explanatory and easy to implement, and can foster various effective designs of reference-based metrics besides anchored ROUGE. COLING 2020
QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization The paper discusses the importance of factual consistency in text summarization models and evaluates two types of metrics, entailment-based and question answering (QA)-based, for measuring this quality. The authors find that carefully selecting the components of a QA-based metric is critical to performance and propose an optimized metric called QAFACTEVAL, which outperforms previous QA-based and entailment-based metrics. Additionally, the authors suggest that combining both types of metrics can further improve performance. NAACL 2022
SUM-QE: a BERT-based Summary Quality Estimation Model The paper introduces a new model called SUM-QE, which uses BERT to evaluate the quality of summaries. Unlike other models, SUM-QE focuses on linguistic quality aspects that are not captured by content-based approaches. The model achieves high correlations with human ratings and outperforms simpler models. The predictions of SUM-QE can be used for system development and to inform users about the quality of automatically generated summaries and other types of text. EMNLP 2019
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation The paper discusses the challenges of assessing the quality of natural language generation systems through human annotation, which is expensive and time-consuming. Researchers often rely on automatic metrics, but existing string-based metrics like BLEU do not handle synonyms well. The authors introduce InfoLM, a family of untrained metrics that uses a pre-trained masked language model and information measures to address these flaws. They demonstrate that InfoLM achieves significant improvement and correlation gains in many configurations on both summarization and data2text generation through direct assessment. AAAI 2022
The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization The paper discusses the limitations of using ROUGE to evaluate summarization systems and presents experiments on using distributed representations for evaluation. The results show that the max value over each dimension of the summary ELMo word embeddings and averaging the cosine similarity of all encoders yield high correlation with human ratings in both reference-based and reference-free settings. The distributed representations outperform ROUGE in recent corpora for abstractive news summarization but are less effective on older test data and systems. EMNLP 2019
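The best-performing configuration in this entry (max over each embedding dimension, then cosine similarity, averaged across encoders) can be sketched as below. Toy vectors stand in for ELMo or other encoder outputs, and the function names are hypothetical. In the reference-based setting the second argument would hold reference-summary word vectors; in the reference-free setting it would hold source-document word vectors.

```python
import math

def max_pool(word_vectors):
    """Element-wise max over each embedding dimension."""
    return [max(dim) for dim in zip(*word_vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embedding_score(summary_vecs, other_vecs):
    """Cosine similarity between max-pooled summary and reference/source vectors."""
    return cosine(max_pool(summary_vecs), max_pool(other_vecs))

def multi_encoder_score(summary_by_encoder, other_by_encoder):
    """Average the per-encoder cosine similarities."""
    scores = [embedding_score(s, o) for s, o in zip(summary_by_encoder, other_by_encoder)]
    return sum(scores) / len(scores)

# Hypothetical 3-dimensional word vectors from a single encoder.
summary = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
reference = [[1.0, 3.0, 2.0]]
score = embedding_score(summary, reference)
```

Max pooling keeps, for every dimension, the strongest activation among the summary's words, so the pooled vector behaves like a soft bag of salient features rather than an average that long summaries would dilute.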
A Thorough Evaluation of Task-Specific Pretraining for Summarization The paper compares task-agnostic pretraining objectives with task-specific pretraining objectives for summarization tasks in a controlled study. The results show that task-agnostic pretraining is sufficient for most cases, reducing the need for costly task-specific pretraining. The study also reports new state-of-the-art numbers for two summarization tasks using a T5 model with 11 billion parameters and an optimal beam search length penalty. EMNLP 2021
How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature The paper discusses how traditional summarization evaluations compared systems that produced summaries of the same length, whereas neural approaches have done away with this requirement. The paper presents experiments showing that summaries of different lengths produced by the same system have a clear non-linear pattern of quality as measured by ROUGE F1 scores. The paper proposes a new evaluation method in which ROUGE scores are normalized by those of a random system producing summaries of the same length. The paper reanalyzes recently reported results and shows that some negative results are actually reports of system improvement once differences in length are taken into account. Finally, the paper presents a small-scale human evaluation showing a similar trend of perceived quality increasing with summary length, calling for similar normalization when reporting human scores. NAACL 2019
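A minimal sketch of length normalization, under several assumptions: a simple unigram-overlap F1 stands in for ROUGE-1 F1, and the "random system" is modeled as shuffling the source sentences and keeping as many words as the system summary has, averaged over seeded samples. The function names and data are hypothetical; the paper's exact random baseline may differ.

```python
import random
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 (a simplified stand-in for ROUGE-1 F1)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def random_baseline(doc_sentences, reference, length, samples=100, seed=0):
    """Average score of a hypothetical random system of the given word length."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        shuffled = rng.sample(doc_sentences, len(doc_sentences))
        words = " ".join(shuffled).split()[:length]
        total += rouge1_f1(" ".join(words), reference)
    return total / samples

def length_normalized_rouge(summary, doc_sentences, reference, samples=100, seed=0):
    """System score divided by the random baseline at the same length."""
    length = len(summary.split())
    baseline = random_baseline(doc_sentences, reference, length, samples, seed)
    return rouge1_f1(summary, reference) / baseline if baseline else float("inf")

doc = ["the cat sat", "dogs bark loudly", "birds fly south"]
normalized = length_normalized_rouge("the cat sat", doc, "the cat sat")
```

A value above 1 means the system beats a random summary of the same length, so two systems that output different lengths are each compared against a baseline that shares their length.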
Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics The paper discusses the evaluation of automatic metrics in text summarization, specifically focusing on the disagreement between metrics when ranking high-scoring summaries. The authors revisit previous experiments and suggest that the narrow scoring range of summaries may be the reason for the disagreement. They also analyze three other properties that impact inter-metric agreement: Ease of Summarization, Abstractiveness, and Coverage. The authors make their analysis code and data publicly available to encourage reproducible research. COLING 2020
What Makes a Good and Useful Summary? Incorporating Users in Automatic Summarization Research The paper discusses the gap between the current research focus in automatic summarization and users' needs, particularly university students who heavily rely on summaries. To address this, the authors propose a survey methodology that can be adjusted to investigate different user groups. They find that the current research directions do not fully align with students' needs and suggest ways to mitigate this mismatch in future research. NAACL 2022
Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency The paper discusses the lack of a well-defined formulation for summarization evaluation, which has led to popular summarization datasets being constructed in a way that does not guarantee validity or factual consistency. The authors address this issue by combining factual consistency models to identify problematic instances and release a filtered summarization dataset called SummFC with improved factual consistency. They demonstrate that models trained on this dataset achieve improved performance in nearly all quality aspects and argue that it should become a valid benchmark for developing and evaluating summarization systems. EMNLP 2022
Evaluating Multiple System Summary Lengths: A Case Study The paper explores whether reference summaries of a single length can be used to evaluate system summaries of varying lengths. The authors conducted a case study using several variants of the ROUGE metric and found that the evaluation protocol is competitive. This paves the way for practical evaluation of varying-length summaries using existing summarization benchmarks. EMNLP 2018
Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference The paper discusses the limitations of abstractive summarization due to factual errors in generated summaries. The authors evaluate summaries produced by state-of-the-art models and find that errors occur frequently, especially with more abstractive models. They explore the use of textual entailment predictions to detect and reduce such errors by reranking alternative predicted summaries. The authors find that current entailment models do not offer the desired performance for this task and release their annotations as additional test data for future evaluations of natural language inference. ACL 2019
Understanding Neural Abstractive Summarization Models via Uncertainty The paper discusses the difficulty in interpreting the behavior of seq2seq abstractive summarization models, which generate text in a free-form manner. The authors analyze summarization decoders by studying the entropy of the model's token-level predictions, finding a correlation between low prediction entropy and where the model copies tokens rather than generating novel text. The decoder's uncertainty also connects to factors like sentence position and syntactic distance between adjacent pairs of tokens, giving insight into what factors make a context particularly selective for the model's next output token. Finally, the authors study the relationship between decoder uncertainty and attention behavior to understand how attention gives rise to these observed effects in the model. The paper concludes that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly. EMNLP 2020
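The uncertainty measure in this entry is the entropy of the decoder's next-token distribution at each step. A minimal sketch, assuming the per-step probability vectors are already available; the distributions and the threshold are hypothetical, and in the paper low-entropy steps tend to coincide with copying from the input.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def low_entropy_steps(step_distributions, threshold):
    """Indices of decoding steps whose prediction entropy is below threshold."""
    return [i for i, dist in enumerate(step_distributions) if entropy(dist) < threshold]

# Hypothetical decoder outputs over a 4-token vocabulary.
steps = [
    [0.97, 0.01, 0.01, 0.01],  # confident step, e.g. copying a source token
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain step
]
flagged = low_entropy_steps(steps, threshold=0.5)
```

A uniform distribution over four tokens has entropy ln 4 (about 1.39 nats), the maximum for that vocabulary size, while the confident step above has entropy of roughly 0.17 nats.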
What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization The paper discusses the need to re-assess the focus and objectives of automatic text summarization and whether they align with users' desires. The authors conducted a survey among heavy users of pre-made summaries and found that the current focus of the field does not fully align with participants' wishes. They propose adopting a broader perspective on automatic summarization, expanding the types of input material that can be summarized, and defining requirements for datasets that can facilitate these research directions. They also propose including usefulness as an important aspect of summarization in the evaluation methodology and propose a methodology to evaluate the usefulness of a summary. The authors hope to unlock important research directions for future work on automatic summarization. NAACL 2022
Leveraging Locality in Abstractive Text Summarization The paper discusses the challenges of using neural attention models for long text summarization due to the quadratic memory complexity of the self-attention module. Instead of designing more efficient attention modules, the authors investigate if models with a restricted context can have competitive performance. They propose a locality-aware modeling strategy where the model is applied to individual pages grouped by the principle of locality during both the encoding and decoding stages. The authors empirically investigate three kinds of locality in text summarization at different levels of granularity and show that their model outperforms strong baseline models with efficient attention modules. EMNLP 2022
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics The paper discusses the importance of answer verification in question answering-based summarization evaluation metrics. The authors benchmark various answer verification methods, including lexical overlap and more sophisticated text comparison methods like BERTScore and LERC. They find that LERC performs well in some settings, but overall, improved verification performance does not necessarily lead to better QA-based metric quality. The authors attribute this to dataset properties. ACL 2022
Gradient-based Adversarial Factual Consistency Evaluation for Abstractive Summarization The paper proposes a method for generating highly abstract yet factually correct summaries using an efficient weakly supervised adversarial data augmentation approach. The approach forms a factual consistency dataset and trains an evaluation model that can accurately and robustly discriminate factual consistency and trace factual errors. Experiments and analysis on public annotated summarization and factual consistency datasets demonstrate the effectiveness and reasonableness of the approach. The code is available at https://github.com/parZival27/GrAdualCC. EMNLP 2021
The limits of automatic summarisation according to ROUGE This paper highlights the limitations of using the ROUGE metric for evaluating summarization systems, particularly with respect to optimal solutions. The authors provide the first proof that the task of summarization is NP-hard. However, they also demonstrate that greedy algorithms perform well on three benchmark datasets. The paper also points out the difficulty of ensuring overall quality assurance, as there is no natural upper bound on the quality of summarization systems and even humans cannot achieve optimal summarization. EACL 2017
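A minimal greedy extractor of the kind this entry refers to, under stated assumptions: clipped unigram recall against a single reference stands in for ROUGE, and a word budget limits summary length. At each step it adds the sentence with the largest recall gain that still fits, and stops when no sentence improves the score. This is an illustration of greedy approximation, not the authors' exact algorithm; the sentences are hypothetical.

```python
from collections import Counter

def unigram_recall(sentences, ref_counts):
    """Clipped unigram recall of the concatenated sentences against the reference."""
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    got = Counter()
    for sent in sentences:
        got.update(sent.split())
    return sum((got & ref_counts).values()) / total

def greedy_extract(sentences, reference, max_words):
    """Repeatedly add the sentence with the largest recall gain that fits the budget."""
    ref_counts = Counter(reference.split())
    chosen, used = [], 0
    while True:
        current = unigram_recall([sentences[i] for i in chosen], ref_counts)
        best_idx, best_gain = None, 0.0
        for i, sent in enumerate(sentences):
            if i in chosen or used + len(sent.split()) > max_words:
                continue
            candidate = [sentences[j] for j in chosen] + [sent]
            gain = unigram_recall(candidate, ref_counts) - current
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:  # nothing fits or nothing helps
            return [sentences[i] for i in chosen]
        chosen.append(best_idx)
        used += len(sentences[best_idx].split())

source = ["the cat sat", "on the mat", "a dog barked"]
summary = greedy_extract(source, "the cat sat on the mat", max_words=6)
```

Greedy selection avoids searching all sentence subsets, which is what makes it attractive when finding the exact optimum is NP-hard; it carries no optimality guarantee in general.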
Reference-free Summarization Evaluation via Semantic Correlation and Compression Ratio The paper proposes a new automatic reference-free evaluation metric for summarization that compares semantic distribution between source document and summary by pretrained language models and considers summary compression ratio. The experiments show that this metric is more consistent with human evaluation in terms of coherence, consistency, relevance, and fluency. NAACL 2022
GO FIGURE: A Meta Evaluation of Factuality in Summarization The paper discusses the challenge of ensuring factual correctness in machine-generated text and introduces a meta-evaluation framework called GO FIGURE for evaluating factuality evaluation metrics. The framework proposes five necessary conditions for evaluating factuality metrics on diagnostic factuality data across three different summarization tasks. The benchmark analysis of ten factuality metrics shows that the framework provides a robust and efficient evaluation that is extensible to multiple types of factual consistency and standard generation metrics, including QA metrics. However, the performance of QA metrics is highly dependent on the way in which questions are generated. ACL 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics The paper discusses the issue of factually unreliable outputs generated by modern summarization models and the lack of common benchmarks to measure their factuality. To address this, the authors devise a typology of factual errors and collect human annotations of generated summaries from state-of-the-art summarization systems for the CNN/DM and XSum datasets. They identify the proportion of different categories of factual errors in various summarization models and benchmark factuality metrics, showing their correlation with human judgement and specific strengths and weaknesses. NAACL 2021
Play the Shannon Game With Language Models: A Human-Free Approach to Summary Evaluation The paper introduces new reference-free summary evaluation metrics that use a pretrained language model to estimate the information content shared between a document and its summary. These metrics are a modern take on the Shannon Game and an extension of BLANC. The authors empirically verify that their metrics achieve state-of-the-art correlation with human judgement of the summary quality dimensions of coherence and relevance, as well as competitive correlation with human judgement of consistency and fluency. AAAI 2022
Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than ROUGE? The paper discusses the limitations of the traditional ROUGE metric for evaluating automated summarization tasks and proposes a semantic-aware nCG-based evaluation metric called Sem-nCG. The paper demonstrates how to generate more reliable semantic-aware ground truths for evaluating extractive summarization tasks without additional human intervention. The authors conducted extensive experiments using the CNN/DailyMail dataset and found that Sem-nCG is more reliable and shows higher correlation with human judgement than ROUGE. The paper suggests that ROUGE often leads to inaccurate conclusions and Sem-nCG is a better alternative for evaluating extractive summarization tasks. ACL 2022
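Sem-nCG builds on normalized cumulative gain: the summed gain of the top-k sentences a system selects, divided by the best achievable gain for k sentences. A minimal sketch of the nCG@k computation, assuming per-sentence gains are already given (in Sem-nCG they come from semantic-aware ground truth, which is not reproduced here). The sentence ids and gain values are hypothetical.

```python
def ncg_at_k(ranked_ids, gains, k):
    """Normalized cumulative gain of the top-k ranked sentences.

    ranked_ids: sentence ids in the order the system selects/ranks them.
    gains: dict sentence id -> gain (unlisted ids count as 0).
    """
    cg = sum(gains.get(i, 0.0) for i in ranked_ids[:k])
    ideal = sum(sorted(gains.values(), reverse=True)[:k])
    return cg / ideal if ideal else 0.0

# Hypothetical gains for four source sentences.
gains = {"s1": 3.0, "s2": 2.0, "s3": 1.0, "s4": 0.0}
system_ranking = ["s3", "s1", "s4", "s2"]
score = ncg_at_k(system_ranking, gains, k=2)
```

The system's top two sentences earn 1 + 3 = 4 of an ideal 3 + 2 = 5, so nCG@2 is 0.8. Unlike n-gram overlap, this credits a system for picking sentences that carry important content even when their wording differs from the reference.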
Factorizing Content and Budget Decisions in Abstractive Summarization of Long Documents The paper proposes a method called FACTORSUM1 that disentangles content selection from the budget used to cover salient content, improving the performance and applicability of abstractive summarizers. This is achieved by factorizing summarization into two steps through an energy function: (1) generation of abstractive summary views covering salient information in subsets of the input document (document views); (2) combination of these views into a final summary, following a budget and content guidance. The model achieves significantly higher ROUGE scores on multiple benchmarks for long document summarization, and is effective for domain adaptation. The performance gains are due to more flexible budget adaptation and processing of shorter contexts provided by partial document views. EMNLP 2022
Learning to Score System Summaries for Better Content Selection Evaluation The paper proposes a new automatic scoring metric for evaluating summaries, trained on human judgments from classical summarization datasets. The approach learns the combination of existing automatic scoring metrics that correlates best with human judgments. The reliability of the new metric is tested through a manual evaluation, and the trained metric is released as an open-source tool. EMNLP 2017
Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports The paper discusses the limitations of existing neural abstractive summarization models in terms of factual correctness and proposes a framework to evaluate and optimize the factual correctness of generated summaries using an information extraction module and reinforcement learning. The proposed method is applied to the summarization of radiology reports, where factual correctness is crucial, and is shown to substantially improve the quality of outputs over a competitive neural summarization system, approaching the quality of human-authored summaries. ACL 2020
Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking The paper discusses the challenge of determining whether a generated summary is factually consistent with the source text, despite recent advances in abstractive summarization systems. A recent approach is to train a factual consistency classifier on factually consistent and inconsistent summaries, with the former readily available as reference summaries in existing summarization datasets. However, generating factually inconsistent summaries that are closely relevant to the source text remains a challenge. The paper proposes a method of generating such summaries using source texts and reference summaries with key information masked. Experiments on seven benchmark datasets demonstrate that factual consistency classifiers trained on summaries generated using this method generally outperform existing models and show a competitive correlation with human judgments. The characteristics of the summaries generated using this method are also analyzed, and a pre-trained model and code will be released. NAACL 2022
Readability Controllable Biomedical Document Summarization The paper discusses the need for readability controllable summarization for biomedical documents, as existing summarization systems do not consider the varying levels of expertise of readers. The authors introduce a new task of generating technical summaries for experts and plain language summaries for laypeople, and construct a corpus of biomedical papers with both types of summaries. They benchmark multiple advanced summarization models and propose a novel metric to evaluate the readability discrepancy between the two types of summaries. The results show that current control techniques are not effective in generating suitable summaries for different levels of expertise. EMNLP 2022
Scientific Paper Extractive Summarization Enhanced by Citation Graphs The paper discusses the use of citation graphs to improve scientific paper extractive summarization. The authors propose two models: a Multi-granularity Unsupervised Summarization model (MUS) and a Graph-based Supervised Summarization model (GSS). MUS fine-tunes a pre-trained encoder model on the citation graph via link prediction tasks, while GSS introduces a gated sentence encoder and a graph information fusion module to refine the sentence representations. Experiments on a public benchmark dataset show that both models bring substantial improvements over the prior state-of-the-art model. EMNLP 2022
Why Do You Feel This Way? Summarizing Triggers of Emotions in Social Media Posts The paper discusses the importance of understanding the triggers that lead to people's emotions during crises such as the COVID-19 pandemic. It proposes a novel approach of emotion detection and trigger summarization using social media posts, which tend to be charged with multiple emotions and scattered triggers. The authors introduce COVIDET, a dataset of ~1,900 English Reddit posts related to COVID-19, with manual annotations of perceived emotions and abstractive summaries of their triggers. The paper also presents strong baselines for jointly detecting emotions and summarizing emotion triggers. The authors conclude that COVIDET presents new challenges in emotion-specific summarization and multi-emotion detection in long social media posts. EMNLP 2022
Extractive Summarization of Legal Decisions using Multi-task Learning and Maximal Marginal Relevance The paper presents techniques for extractive summarization of legal decisions in a low-resource setting with limited expert-annotated data. The models locate relevant content using a sequential model and tackle redundancy by leveraging maximal marginal relevance to compose summaries. Measured against expert-extracted summaries, the proposed approaches achieve ROUGE scores that match those obtained from inter-annotator comparison. The multi-task learning variant uses rhetorical role identification as an auxiliary task to further improve the summarizer. EMNLP 2022
Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling The paper proposes a new approach to correcting factual errors in abstractive summarization models. Instead of using heuristics to generate non-factual summaries, the authors generate hard, representative synthetic examples of non-factual summaries through infilling language models. With this data, they train a more robust fact-correction model, called FACTEDIT, to post-edit summaries and improve their factual consistency. The approach substantially outperforms prior methods in correcting erroneous summaries on two popular summarization datasets, improving factuality scores by more than 11 points on CNN/DM and more than 31 points on XSum on average across multiple summarization models, while maintaining competitive summarization quality. EMNLP 2022
Learning to Generate Overlap Summaries through Noisy Synthetic Data The paper discusses the Semantic Overlap Summarization (SOS) task, which involves summarizing common information from multiple alternate narratives. The lack of existing datasets for supervised training is a major challenge for this task. To address this, the authors propose a novel data augmentation technique to create synthetic data for training a seq-to-seq model. Through experiments using news narratives, they show that models trained using the synthetic dataset provide significant performance improvements over pre-trained summarization techniques and are close to models trained on golden training data. The proposed data augmentation technique is effective for training seq-to-seq models on the SOS task. EMNLP 2022
SentBS: Sentence-level Beam Search for Controllable Summarization The paper discusses the limitations of current structure-controlling methods in controllable text generation and proposes a new method called sentence-level beam search generation (SentBS) to address these limitations. SentBS evaluates sentences throughout the generation process to select suitable ones for subsequent generations. The paper experiments with different decoding methods as subcomponents for SentBS and evaluates the results on the structure-controlled dataset MReD. The experiments show that all explored combinations for SentBS can improve the agreement between the generated text and the desired structure, with the best method reducing structural discrepancies by approximately 68%. EMNLP 2022
SEM-F1: an Automatic Way for Semantic Evaluation of Multi-Narrative Overlap Summaries at Scale The paper discusses the Semantic Overlap Summarization (SOS) task, which involves generating a summary of the information common to multiple alternative narratives. The authors focus on the automated evaluation of the SOS task using a benchmark dataset and find that the popular ROUGE metric is not suitable for this task. They propose a new evaluation metric called SEM-F1, which yields higher correlation with human judgment and inter-rater agreement than ROUGE. The metric is inspired by the sentence-wise annotation technique using overlap labels reported in previous work. EMNLP 2022
Factual Error Correction for Abstractive Summaries Using Entity Retrieval The paper discusses the problem of factual errors in abstractive summarization systems and proposes an efficient factual error correction system called RFEC. The system is based on entity retrieval: it retrieves evidence sentences from the original document to reduce the amount of text to analyze, detects entity-level errors in the summaries, and substitutes the wrong entities with accurate ones from the evidence sentences. The experimental results show that RFEC outperforms baseline methods in correcting factual errors while also running faster. EMNLP 2022
ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts The paper discusses the lack of efficient techniques to summarize financial documents and introduces a new dataset called ECTSum, which consists of transcripts of earnings calls and expert-written bullet point summaries. The authors benchmark their dataset with state-of-the-art summarization methods and present a simple yet effective approach called ECT-BPS to generate bullet points that capture important facts discussed in the calls. EMNLP 2022
FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness The paper discusses the unfaithful generation problem in current Seq2Seq summarization models, which persists despite their ability to generate fluent and grammatical text. The authors propose a new perspective, factual robustness, to measure the faithfulness of existing systems; it is defined as the ability to correctly generate factual information in the presence of adversarial unfaithful information. They propose a novel training strategy called FRSUM, which enhances the model's factual robustness by teaching it to defend against both explicit adversarial samples and implicit factual adversarial perturbations. The evaluation results show that FRSUM consistently improves the faithfulness of various Seq2Seq models, such as T5 and BART. EMNLP 2022
Learning From the Source Document: Unsupervised Abstractive Summarization The paper introduces an unsupervised learning method called SCR (Summarize, Contrast and Review) for abstractive text summarization. Unlike most state-of-the-art methods, which rely heavily on high-quality, large-scale parallel corpora, SCR removes the need for reference summaries. It leverages contrastive learning and is the first work to apply it to unsupervised abstractive summarization. The model is trained using true source documents as positive examples and strategically generated fake source documents as negative examples. The generated summaries are also guided to be similar to human-written texts. Extensive experiments show that SCR outperforms other unsupervised abstractive summarization baselines, demonstrating its effectiveness. EMNLP 2022
CiteSum: Citation Text-guided Scientific Extreme Summarization and Domain Adaptation with Limited Supervision The paper proposes a new approach to automatically extract ultra-short summaries of scientific papers from their citation texts, creating a new benchmark dataset called CiteSum without human annotation. The authors conduct a comprehensive analysis of CiteSum and demonstrate the usefulness of the dataset by adapting models pre-trained on CiteSum to new tasks and domains with limited supervision. The results show that CITES, their model pre-trained on CiteSum, outperforms most fully-supervised methods on SciTLDR for scientific extreme summarization and achieves significant gains on XSum for news extreme summarization and news headline generation. EMNLP 2022
Mutual Information Alleviates Hallucinations in Abstractive Summarization The paper discusses the tendency of abstractive summarization models to output content not supported by the source document, known as hallucinations. The authors identify high model uncertainty as a signal that hallucinated content is more likely during generation. They propose a decoding strategy that switches to optimizing for the pointwise mutual information of the source and target tokens when the model exhibits uncertainty, which decreases the probability of hallucinated tokens while maintaining the ROUGE and BERTScore scores of top-performing decoding strategies. Experiments on the XSum dataset support the effectiveness of the proposed method. EMNLP 2022
REFEREE: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation REFEREE is a new framework for sentence summarization that can be trained without the need for gold summaries. It allows for direct control of compression ratio and uses Symbolic Knowledge Distillation to distill latent knowledge from pre-trained language models. The framework proposes iterative distillation of knowledge, where student models from previous iterations serve as teacher models in the next iteration. The results show that the final student models outperform the much larger GPT3-Instruct model in terms of controllability of compression ratios without compromising the quality of summarization. The iterative distillation process also produces a high-quality dataset of sentence-summary pairs with varying degrees of compression ratios. EMNLP 2022
Unsupervised Opinion Summarisation in the Wasserstein Space The paper discusses the challenges of opinion summarization of social media posts and presents WassOS, an unsupervised abstractive summarization model that uses the Wasserstein distance. The model disentangles the distributions of documents/posts into separate semantic and syntactic spaces and obtains the summary distribution using the Wasserstein barycenter. A latent variable is then fed into a GRU decoder with a transformer layer to produce the final summary. The experiments on multiple datasets show that WassOS outperforms the state-of-the-art on ROUGE metrics and consistently produces the best summaries with respect to meaning preservation according to human evaluations. EMNLP 2022
BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization The paper discusses the limitations of existing text summarization datasets and introduces BOOKSUM, a collection of datasets for long-form narrative summarization. The dataset covers documents from the literature domain, such as novels, plays, and stories, and includes highly abstractive, human-written summaries at three levels of granularity: paragraph, chapter, and book. The unique challenges posed by the domain and structure of the dataset include processing long documents, non-trivial causal and temporal dependencies, and rich discourse structures. The paper also evaluates multiple extractive and abstractive summarization models as baselines for the dataset. EMNLP 2022
NARRASUM: A Large-Scale Dataset for Abstractive Narrative Summarization The paper proposes NARRASUM, a large-scale narrative summarization dataset containing 122K narrative documents and their corresponding abstractive summaries. The dataset is collected from plot descriptions of movies and TV episodes with diverse genres. The paper highlights the challenges of summarizing a narrative, which requires an understanding of event causality and character behaviors. The experiments show a large performance gap between humans and state-of-the-art summarization models on NARRASUM. The authors hope that this dataset will promote future research in summarization and broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum. EMNLP 2022
Don’t Say What You Don’t Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search The paper discusses the issue of "hallucinations" in abstractive summarization systems, where the system produces statements not supported by the source text. The authors analyze the connection between hallucinations and training data, and find that models hallucinate because they are trained on target summaries that are not supported by the source. They present a new decoding method called PINOCCHIO, which improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations. PINOCCHIO detects likely model hallucinations based on various measures of attribution to the source text and can backtrack to find more consistent output, or produce no summary at all when no consistent generation can be found. The experiments show that PINOCCHIO improves the consistency of generation by an average of 68% on two abstractive summarization datasets without hurting recall. EMNLP 2022
HYDRASUM: Disentangling Style Features in Text Summarization with Multi-Decoder Models The paper introduces HYDRASUM, a new summarization architecture that uses multiple decoders to automatically learn contrasting summary styles without extra supervision. HYDRASUM provides a simple mechanism to obtain stylistically-diverse summaries by sampling from individual decoders or their mixtures, outperforming baseline models on three summarization datasets. A small modification to the gating strategy during training can enforce an even stricter style partitioning, allowing users to vary summary styles along multiple dimensions. EMNLP 2022
Improving Faithfulness by Augmenting Negative Summaries from Fake Documents The paper discusses the issue of current abstractive summarization systems producing summaries that are unfaithful to the source document, which can lead to misinformation. The authors propose a back-translation-style approach to augment negative samples that mimic factual errors made by the model, in order to teach the model to distinguish between faithful and unfaithful summaries. They also incorporate textual entailment data through multitasking to further improve performance. Experiments on three datasets show that their method consistently improves faithfulness without sacrificing informativeness. EMNLP 2022
Learning to Revise References for Faithful Summarization The paper proposes a new approach to improve the quality of reference summaries while retaining all data. The approach involves selectively rewriting unsupported reference sentences to better reflect source data. A synthetic dataset of positive and negative revisions is automatically generated, and models are trained to revise reference sentences with contrastive learning. The intensity of revisions is treated as a controllable attribute to balance faithfulness and abstraction. The proposed method is tested on noisy references from publicly available MIMIC-III discharge summaries for hospital-course summarization, and models trained on revised clinical references are found to be more faithful, informative, and fluent than models trained on original or filtered data. EMNLP 2022
HEGEL: Hypergraph Transformer for Long Document Summarization The paper discusses the challenges of extractive summarization for long documents due to the extended structured input context and long-distance sentence dependencies. It proposes HEGEL, a hypergraph neural network that captures high-order cross-sentence relations to improve summarization. HEGEL uses hypergraph transformer layers to update and learn effective sentence representations and fuses different types of sentence dependencies, including latent topics, keyword coreference, and section structure. The paper validates HEGEL through extensive experiments on two benchmark datasets, demonstrating its effectiveness and efficiency. EMNLP 2022
Long Text and Multi-Table Summarization: Dataset and Method The paper discusses the limitations of existing document summarization methods that focus only on text and filter out non-textual content, such as tables. To address this, the authors propose FINDSum, a large-scale dataset for long text and multi-table summarization. The dataset is built on 21,125 annual reports from 3,794 companies and has two subsets for summarizing each company's results of operations and liquidity. The authors present three types of summarization methods and propose evaluation metrics to assess the usage of numerical information in produced summaries. The paper highlights the importance of jointly considering input textual and tabular data when summarizing report documents. EMNLP 2022
Improving Factual Consistency in Summarization with Compression-Based Post-Editing The paper discusses the problem of factual inconsistency in summarization models and proposes a model-agnostic approach to address it through post-editing. The focus is on removing extrinsic entity errors, or entities not in the source, to improve consistency while retaining the summary's essential information and form. The proposed method uses sentence-compression data to train the post-editing model to remove errors marked with special tokens. The model improves factual consistency while maintaining ROUGE and can be applied on top of another post-editor, improving entity precision by up to a total of 38%. The paper also compares different post-editing approaches and analyzes settings where post-editors show the largest improvements. EMNLP 2022
Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature The paper discusses the importance of lay summarisation in making scientific literature more accessible to non-experts. It highlights the limitations of current corpora for this task and presents two new datasets, PLOS and eLife, containing biomedical journal articles and expert-written lay summaries. The paper characterizes the lay summaries and benchmarks them using mainstream summarization approaches, demonstrating their utility and identifying key challenges. The datasets and code are available for use. EMNLP 2022
Revisiting text decomposition methods for NLI-based factuality scoring of summaries The paper discusses the use of Natural Language Inference models to score the factuality of generated summaries. Previous studies have shown that decomposing either the input document or the summary into sentences can improve factuality scoring. However, the paper systematically compares different granularities of decomposition and shows that fine-grained decomposition is not always the best strategy. The results also suggest that incorporating additional context can improve performance, but this may not apply to all datasets. The paper highlights the importance of caution in model and methodology selection for downstream tasks. EMNLP 2022
Learning with Rejection for Abstractive Text Summarization The paper proposes a new training objective for abstractive summarization that uses rejection learning to identify and reject potentially noisy tokens. Existing methods instead drop noisy samples or tokens from the training set, which reduces its size and creates an artificial propensity to copy words from the source. The authors also propose a regularized decoding objective that penalizes non-factual candidate summaries during inference. In evaluations against five baseline models, the method improves the factuality of generated summaries while also increasing their abstractiveness. EMNLP 2022
Abstractive Summarization Guided by Latent Hierarchical Document Structure The paper proposes a new approach to summarizing articles using a hierarchy-aware graph neural network (HierGNN). This approach captures the underlying structure and dependencies between sentences in the input article, which is essential for integrating and consolidating information from different parts of the text. The HierGNN model consists of three main steps: learning a hierarchical document structure, propagating sentence information over this structure, and using graph-level attention to concentrate the decoder on salient information. Experiments show that HierGNN improves upon strong sequence models such as BART by a significant margin in average ROUGE-1/2/L on CNN/DM and XSum. Human evaluation also shows that summaries produced by HierGNN are more relevant and less redundant than those of the baselines. The model synthesizes summaries by fusing multiple source sentences rather than compressing a single source sentence, and it processes long inputs more effectively. EMNLP 2022
Towards Summary Candidates Fusion The paper discusses the limitations of current abstractive summarization methods and proposes a new paradigm called SummaFusion, which fuses multiple summary candidates to produce a novel abstractive second-stage summary. This method improves both the ROUGE scores and qualitative properties of the summaries, especially in the few-shot setup where it sets a new state-of-the-art. The code and checkpoints for SummaFusion are available on GitHub. EMNLP 2022
Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization The paper discusses the development of an abstractive model for unsupervised summarization of texts, which is based on reinforcement learning and does not require human-written summaries. The model uses a Markov decision process with rewards to formulate the summarization process and a multi-summary learning mechanism to generate multiple summaries of varying lengths that enhance each other. Experimental results show that the proposed model outperforms both abstractive and extractive models and frequently generates new words not present in the input texts. EMNLP 2022
CTRLSUM: Towards Generic Controllable Text Summarization The paper introduces CTRLSUM, a framework for generating summaries that can be controlled through a set of keywords. The keywords are automatically extracted during training, and at test time, a control function maps control signals to keywords. The same trained model can be applied to control summaries on various dimensions without affecting the model training process or pretrained models. The framework is effective in entity-centric and length-controllable summarization, contribution summarization on scientific papers, invention purpose summarization on patent filings, and question-guided summarization on news articles. CTRLSUM is also comparable or better than strong pretrained systems in standard, unconstrained summarization settings. EMNLP 2022
SNAC: Coherence Error Detection for Narrative Summarization The paper discusses the lack of appropriate evaluation frameworks for summarizing long texts, which inhibits progress in this field. The authors introduce SNAC, a narrative coherence evaluation framework for fine-grained annotations of long summaries. They develop a taxonomy of coherence errors in generated narrative summaries and collect annotations for 6.6k sentences across 150 book and movie summaries. The collected annotations allow them to benchmark past work in coherence modeling and train a strong classifier for automatically localizing coherence errors in generated summaries. The SNAC framework can support future work in long document summarization and coherence evaluation, including improved summarization modeling and post-hoc summary correction. EMNLP 2022
Universal Evasion Attacks on Summarization Scoring The paper discusses the importance of automatic scoring of summaries in guiding the development of summarizers, but notes that summary scoring has not been studied as a machine learning task to assess its accuracy and robustness. The authors perform evasion attacks to explore the robustness of summary scoring systems and find that non-summary strings can achieve scores competitive with those of good summarizers on popular metrics such as ROUGE, METEOR, and BERTScore. The attacks also outperform state-of-the-art summarization methods on ROUGE-1 and ROUGE-L, and score the second-highest on METEOR. The authors observe a BERTScore backdoor where a simple trigger can score higher than any automatic summarization method. The results highlight the low robustness of current scoring systems at the system level, and the authors hope that their proposed attacks will facilitate the development of more robust summary scores. EMNLP 2022
Salience Allocation as Guidance for Abstractive Summarization The paper proposes a new summarization approach called SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON) that uses salience expectation to guide abstractive summarization and adapts well to articles with different levels of abstractiveness. The paper argues that extractive summaries as guidance can be too strict and lead to information loss or noisy signals. SEASON is shown to be effective and reliable in automatic and human evaluations on two benchmark datasets, and empirical results on more than one million news articles demonstrate a natural fifteen-fifty salience split for news article sentences, providing useful insights for composing news articles. EMNLP 2022
R-TeaFor: Regularized Teacher-Forcing for Abstractive Summarization The paper proposes a new method called Regularized Teacher-Forcing (R-TeaFor) to address the exposure bias problem in training sequence generation models. R-TeaFor uses the pairwise relationship between the original training data and their modified versions for better regularization. The experiments show that R-TeaFor outperforms previous state-of-the-art models in summarization and generalizes to different pre-trained models. EMNLP 2022
How Far are We from Robust Long Abstractive Summarization? The paper discusses the evaluation of long document abstractive summarization systems using fine-grained human annotations. It highlights the trade-off between generating relevant summaries and factual ones, and suggests promising directions for developing factual consistency metrics. The study also reveals the limitations of factuality metrics in detecting different types of factual errors and the effectiveness of ROUGE and BARTScore in evaluating the relevancy of a summary. The authors release their annotated long document dataset to contribute to the development of metrics across a broader range of summarization settings. EMNLP 2022
Summarizing Procedural Text: Data and Approach The paper proposes a procedural text summarization task with two summarization granularities, step-view and global-view, which respectively summarize each step in the procedural text separately or give an overall summary for all steps. To tackle this task, the authors propose an Entity-State Graph-based Summarizer (ESGS), which builds on state-of-the-art entity state tracking methods and constructs a heterogeneous graph to aggregate contextual information for each procedure. The authors also propose using the contextualized procedure graph representation to predict the salient entity. Experiments conducted on two datasets verify the effectiveness of the proposed model. EMNLP 2022
Few-shot Query-Focused Summarization with Prefix-Merging The paper proposes a new approach called prefix-merging for few-shot learning in query-focused summarization. The approach integrates the knowledge of text summarization and question answering into a properly designed prefix and applies it to query-focused summarization. With only a small number of trainable parameters, prefix-merging outperforms fine-tuning on query-focused summarization. The paper also discusses the influence of different prefix designs and provides a visual explanation of how prefix-merging works. EMNLP 2022
Opinion Summarization by Weak-Supervision from Mix-structured Data The paper discusses the challenges of summarizing the opinions in multiple reviews and proposes a new method to address them. The authors convert each review into a mix of structured and unstructured data, called opinion-aspect pairs (OAs) and implicit sentences (ISs), and synthesize training pairs with such mix-structured data as input and the textual summary as output. They design a summarization model with an OA encoder and an IS encoder and show that their approach outperforms previous methods on the Yelp, Amazon, and Rotten Tomatoes datasets. EMNLP 2022
Are Abstractive Summarization Models truly ‘Abstractive’? An Empirical Study to Compare the two Forms of Summarization The paper discusses the shift from extractive to abstractive methods in automatic text summarization and how large autoregressive language models have contributed to this shift. The authors revisit extractive methods and compare their performance to state-of-the-art abstractive models, finding that the summaries generated by abstractive methods are not completely abstractive. They propose an evaluation metric that measures the degree of abstractiveness of a summary relative to extractive methods. The authors conduct experiments on two summarization datasets using five extractive and abstractive summarization techniques to confirm their findings. EMNLP 2022
X-FACTOR: A Cross-metric Evaluation of Factual Correctness in Abstractive Summarization The paper discusses the issue of factually inconsistent summaries produced by abstractive summarization models and proposes X-FACTOR, a cross-evaluation of three high-performing fact-aware abstractive summarization methods. The three methods are a fact-aware filtering mechanism that improves the quality of training data, a corrector module that improves the factual consistency of generated summaries, and a re-ranking technique that samples summary instances and reranks them by factuality. The paper also provides a detailed cross-metric agreement analysis showing how tuning a model to output summaries based on one factuality metric influences factuality as measured by other metrics. The goal of the work is to facilitate research that improves the factuality and faithfulness of abstractive summarization models. EMNLP 2022
Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation The paper combines two lines of research on the multi-head self-attention mechanism of the transformer model. The first line aims to understand why and how transformers work, while the second proposes new attention augmentation methods to make transformers more accurate, efficient, and interpretable. The authors present a human-in-the-loop pipeline to discover task-specific attention patterns, which are then injected into both the original models and smaller ones. The benefits of this approach are demonstrated in two case studies on extractive summarization and topic segmentation, where the models show considerable improvements in accuracy and efficiency after the discovered patterns are injected into attention heads. EMNLP 2022
Unsupervised Token-level Hallucination Detection from Summary Generation By-products The paper discusses the issue of hallucinations in abstractive summarization, which are model generations that are not faithful to the source document. Current methods for detecting hallucinations are limited to certain datasets and focus on noun phrases and named entities. The authors propose a new method that detects candidate hallucinations at the token level, regardless of their part of speech, using information already produced during summary generation. They evaluate their method on the CNN/DailyMail dataset and show that it achieves better precision-recall tradeoffs than existing methods. The authors also repurpose an existing factuality dataset and create their own token-level annotations. Overall, their method enables practitioners to generate summaries and identify possible hallucinations with minimal overhead. EMNLP 2022
Unsupervised Multi-Granularity Summarization The paper proposes an unsupervised multi-granularity summarization framework called GRANUSUM, which can generate summaries with customizable semantic coverage. The framework uses events as the basic semantic units of the source documents and ranks them by salience. A model is developed to summarize input documents using given events as anchors and hints, producing multi-granular summaries in an unsupervised manner. The paper also introduces a new benchmark called GranuDUC, which contains multiple summaries at different granularities for each document cluster. Experimental results show that GRANUSUM outperforms strong baselines in multi-granularity summarization and, by exploiting event information, achieves state-of-the-art performance in the conventional unsupervised abstractive setting. EMNLP 2022