Search papers by interacting with the summarization pipeline components
Selecting multiple components currently performs an OR search, so applying more filters returns more results rather than fewer.
Document Representation
  • 90 Papers
  • 30 Papers
  • 17 Papers
  • 61 Papers
Model Training
  • 128 Papers
  • 29 Papers
Summary Creation
  • 80 Papers
  • 50 Papers
  • 9 Papers
Found 514 papers
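
The OR behaviour described above can be illustrated with a minimal Python sketch. The paper ids, the tag assignments, and the function names `search_or` and `search_and` are hypothetical and only show why adding filters widens the result set; they do not describe how this site is implemented.

```python
# Hypothetical in-memory index: paper id -> set of pipeline-component tags.
papers = {
    "p1": {"Document Representation", "Model Training"},
    "p2": {"Model Training"},
    "p3": {"Summary Creation"},
    "p4": {"Document Representation", "Summary Creation"},
}


def search_or(index, selected):
    """Return ids of papers tagged with at least one selected component."""
    selected = set(selected)
    return sorted(pid for pid, tags in index.items() if tags & selected)


def search_and(index, selected):
    """Return ids of papers tagged with every selected component."""
    selected = set(selected)
    return sorted(pid for pid, tags in index.items() if selected <= tags)


# One filter, then two filters combined with OR: the result set can only grow.
one = search_or(papers, ["Model Training"])
two = search_or(papers, ["Model Training", "Summary Creation"])
# The same two filters combined with AND, shown for contrast.
both = search_and(papers, ["Model Training", "Summary Creation"])
```

With OR semantics every extra filter adds the papers matching that filter, so the count grows or stays the same. An AND search would narrow the results instead.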

Thi Nhat Anh Nguyen, Mingwei Shen, Karen Hovsepian

1.  Unsupervised Class-Specific Abstractive Summarization of Customer Reviews
ACL, 2021 Unsupervised Learning

The paper proposes a model for large-scale unsupervised abstractive summarization of customer reviews in e-commerce. The model addresses the challenge of reducing generic and uninformative content and producing useful information related to specific product aspects by modeling reviews in the context of topical classes of interest. The proposed model can generate class-specific summaries from multiple reviews of each product without ground-truth summaries, using only class probabilities or labels. The model combines a generative variational autoencoder with a class-correlation gating mechanism and a hierarchical structure. Human evaluation shows that the generated summaries are relevant, fluent, and representative, and evaluation using a reference dataset shows that the model outperforms state-of-the-art abstractive and extractive baselines.

Masaru Isonuma, Junichiro Mori, Danushka Bollegala, Ichiro Sakata

2.  Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance
TACL, 2021 Unsupervised Learning

The paper presents a new method for summarizing opinionated texts using a recursive Gaussian mixture model. The model generates sentences with tree-structured topic guidance, where the root sentence conveys generic content, and the leaf sentences describe specific topics. Experimental results show that the generated topic sentences are more informative and cover more of the input content than those generated by recent unsupervised summarization models. The paper also demonstrates that the variance of the latent Gaussians represents the granularity of sentences, as in Gaussian word embeddings.

Arthur Bražinskas, Mirella Lapata, Ivan Titov

3.  Unsupervised Opinion Summarization as Copycat-Review Generation
ACL, 2020 Unsupervised Learning

The paper discusses the task of opinion summarization, which involves automatically creating summaries that reflect subjective information expressed in multiple documents, such as product reviews. While previous work has focused on selecting fragments from input reviews to produce a summary, the authors propose a generative model that can produce abstractive summaries by generating novel sentences. They consider the unsupervised setting, where no summaries are used in training, and define a hierarchical variational autoencoder model that can control the "amount of novelty" in new reviews. At test time, the model produces summaries that reflect consensus opinions by forcing the novelty to be minimal. Experiments on Amazon and Yelp datasets show that setting the review's latent code to its mean allows the model to produce fluent and coherent summaries.

Juan Ramirez-Orta, Evangelos Milios

4.  Unsupervised document summarization using pre-trained sentence embeddings and graph centrality
NAACL, 2021 Unsupervised Learning

The paper proposes a simple and fast method for summarizing any document of any size using sentence embeddings produced by deep language models. This method is based on graph centrality and can satisfy any length constraints for the summaries produced. The proposed method offers competitive performance to more sophisticated supervised methods and can serve as a proxy for abstractive summarization techniques.
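
As a rough illustration of the graph-centrality idea in the summary above, the Python sketch below ranks sentences by degree centrality in a cosine-similarity graph and keeps the top ones. It is not the authors' implementation: bag-of-words count vectors stand in for the pre-trained sentence embeddings, the length constraint is simplified to a sentence count, and the names `bow`, `cosine`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Bag-of-words count vector (a stand-in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(sentences, max_sentences):
    """Pick the most central sentences and return them in document order."""
    vecs = [bow(s) for s in sentences]
    n = len(sentences)
    # Degree centrality: sum of similarities to every other sentence.
    centrality = [
        sum(cosine(vecs[i], vecs[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    # Highest centrality first; ties go to the earlier sentence.
    ranked = sorted(range(n), key=lambda i: (-centrality[i], i))
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


doc = [
    "The cat sat on the mat.",
    "The cat sat on the sofa.",
    "Stock prices fell sharply today.",
]
summary = summarize(doc, 2)
```

Here the two sentences about the cat share most of their words, so they are the most central and are selected; the unrelated sentence has zero similarity to both and is dropped.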

Jhen-Yi Wu, Ying-Jia Lin, Hung-Yu Kao

5.  Unsupervised Single Document Abstractive Summarization using Semantic Units
AACL, 2022 Unsupervised Learning

The paper discusses the importance of content frequency in abstractive summarization and proposes a two-stage training framework for the model to learn the frequency of each semantic unit in the source text. The model is trained in an unsupervised manner and identifies sentences with high-frequency semantic units during inference to generate summaries. The model outperforms other unsupervised methods on the CNN/Daily Mail summarization task and achieves competitive ROUGE scores with fewer parameters than pre-trained models. It can be trained under low-resource language settings and is a potential solution for real-world applications where pre-trained models are not applicable.

Peter West, Ari Holtzman, Jan Buys, Yejin Choi

6.  BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle
EMNLP, 2019 Unsupervised Learning

The paper proposes a new approach to unsupervised sentence summarization using the Information Bottleneck principle. The approach seeks a compressed sentence that can best predict the next sentence, using an iterative algorithm that gradually searches shorter subsequences of the given sentence. The method can efficiently perform extractive sentence summarization over a large corpus using only pretrained language models with no direct supervision. The paper also presents a new approach to self-supervised abstractive summarization, where a transformer-based language model is trained on the output summaries of the unsupervised method. Empirical results show that the extractive method outperforms other unsupervised models on multiple automatic metrics, and the self-supervised abstractive model outperforms unsupervised baselines by human evaluation along multiple attributes.

Jiawei Zhou, Alexander M. Rush

7.  Simple Unsupervised Summarization by Contextual Matching
ACL, 2019 Unsupervised Learning

The paper proposes an unsupervised method for sentence summarization using language modeling. The approach uses two language models, one generic and one specific to the target domain, and employs a product-of-experts criterion to maintain contextual matching and output fluency. The experiments show promising results for both abstractive and extractive summarization without the need for paired data.

Masaru Isonuma, Junichiro Mori, Ichiro Sakata

8.  Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking
ACL, 2019 Unsupervised Learning

The paper presents a model for end-to-end abstractive summarization of product reviews without supervision. The model uses a discourse tree to represent the review, with the summary as the root and child sentences providing detailed explanations. The model recursively estimates parents from their children to learn the discourse tree and generate a concise summary. An architecture is introduced to rank the importance of each sentence on the tree and focus on the main review point. Experimental results show that the model outperforms other unsupervised approaches and achieves competitive performance with supervised models for long reviews. The induced tree demonstrates that child sentences provide additional information about their parent, and the generated summary abstracts the entire review.

Ziyi Yang, Chenguang Zhu, Robert Gmyr, Michael Zeng, Xuedong Huang, Eric Darve

9.  TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising
EMNLP, 2020 Unsupervised Learning

The paper proposes a transformer-based unsupervised abstractive summarization system called TED that pretrains on large-scale data using the lead bias in news articles. The system is then fine-tuned on target domains through theme modeling and a denoising autoencoder to enhance the quality of generated summaries. TED outperforms all unsupervised abstractive baselines on various datasets and the summaries generated by TED are highly abstractive. Each component in the objective function of TED is highly effective.

Ryosuke Kohita, Akifumi Wachi, Yang Zhao, Ryuki Tachibana

10.  Q-learning with Language Model for Edit-based Unsupervised Summarization
EMNLP, 2020 Reinforced Learning

The paper proposes a new approach for unsupervised text summarization using Q-learning with edit-based summarization. The method combines two modules to form an Editorial Agent and Language Model converter (EALM), where the agent predicts edit actions and the LM converter generates a summary based on the action signals. Q-learning is used to train the agent to produce proper edit actions. Experimental results show that EALM performs competitively compared to previous methods, even with no validation set. The approach also allows for the use of reinforcement learning techniques in unsupervised summarization. Qualitative analysis is conducted to provide insights for future research in unsupervised summarizers.

Jingzhou Liu, Dominic J. D. Hughes, Yiming Yang

11.  Unsupervised Extractive Text Summarization with Distance-Augmented Sentence Graphs
SIGIR, 2021 Unsupervised Learning

The paper discusses the limitations of supervised summarization due to the high cost and difficulty of obtaining large quantities of human-generated summaries. It proposes an unsupervised approach to extractive text summarization using an automatically constructed sentence graph to select salient sentences based on similarities and relative distances. The approach is generalized from single-document to multi-document settings by aggregating document-level graphs via proximity-based cross-document edges. In experiments on benchmark datasets, the proposed approach achieved competitive or better results than previous state-of-the-art unsupervised extractive summarization methods in both single-document and multi-document settings, and the performance is competitive to strong supervised baselines.

Nishant Yadav, Matteo Brucato, Anna Fariha, Oscar Youngquist, Julian Killingback, Alexandra Meliou, Peter J. Haas

12.  SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents
EMNLP, 2021 Supervised Learning

The paper discusses the need for tailored summaries based on the user's intent and how existing methods fall short when query interpretation is subjective. While several datasets exist for summarization with objective intents, no datasets exist for subjective intents where different users will provide different summaries. The authors present SUBSUME, the first dataset for evaluation of subjective summary extraction systems, containing 2,200 triplets over 48 Wikipedia pages with ten intents of varying subjectivity. The paper explores baseline algorithms for subjective extractive summarization and shows that example-based approaches better capture subjective intents than query-based ones, motivating further research on this challenging problem.

Jennifer A Bishop, Qianqian Xie, Sophia Ananiadou

13.  GenCompareSum: a hybrid unsupervised summarization method using salience
ACL, 2022 Unsupervised Learning

The paper proposes a hybrid, unsupervised, abstractive-extractive approach for text summarization (TS) that generates salient textual fragments representing key points in a document and selects the most important sentences using BERTScore. The approach is evaluated on documents from the biomedical and general scientific domains and compared to existing unsupervised and supervised methods. The authors show that their approach outperforms existing methods despite not needing a vast amount of labelled training data.

Puyuan Liu, Chenyang Huang, Lili Mou

14.  Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization
ACL, 2022 Unsupervised Learning

The paper proposes a Non-Autoregressive Unsupervised Summarization (NAUS) approach for generating short summaries without the need for parallel data. The approach involves edit-based search and training an encoder-only non-autoregressive Transformer based on the search result. The paper also introduces a dynamic programming approach for length-control decoding, which is important for the summarization task. Experiments on two datasets show that NAUS achieves state-of-the-art performance for unsupervised summarization and improves inference efficiency. Additionally, the algorithm is able to perform explicit length-transfer summary generation.

Min Yang, Qiang Qu, Jia Zhu, Ying Shen, Zhou Zhao

15.  Cross-domain Aspect/Sentiment-aware Abstractive Review Summarization
CIKM, 2018 Supervised Learning

The paper proposes a model called CASAS for aspect/sentiment-aware abstractive review summarization in a domain adaptation scenario. The model leverages a domain classification task to recognize the domain information of texts and transfer knowledge from source domains to target domains. The experiments conducted on Amazon reviews show that CASAS outperforms other methods in both out-of-domain and in-domain setups.

Haoran Li, Junnan Zhu, Jiajun Zhang, Chengqing Zong

16.  Ensure the Correctness of the Summary: Incorporate Entailment Knowledge into Abstractive Sentence Summarization
COLING, 2018 Supervised Learning

The paper discusses the importance of correctness in sentence summarization and proposes a new approach that incorporates entailment knowledge into abstractive summarization models. The authors argue that a correct summary should not contain erroneous information with respect to the source sentence. They propose an entailment-aware encoder and decoder and use entailment Reward Augmented Maximum Likelihood (RAML) training. Experimental results show that their models outperform baselines in terms of informativeness and correctness.

Amir Soleimani, Vassilina Nikoulina, Benoit Favre, Salah Ait-Mokhtar

17.  Zero-Shot Aspect-Based Scientific Document Summarization using Self-Supervised Pre-training
ACL, 2022 Supervised Learning

The paper explores the zero-shot setting for aspect-based scientific document summarization, which can improve document assistance systems and reader experience. However, current datasets have limited aspects, causing models to over-fit to specific domains. The authors establish baseline results for zero-shot performance and propose a self-supervised pre-training approach to enhance it. They create a biomedical aspect-based summarization dataset using PubMed structured abstracts and show promising results when pre-trained with unlabelled in-domain data.

Xinnian Liang, Jing Li, Shuangzhi Wu, Jiali Zeng, Yufan Jiang, Mu Li, Zhoujun Li

18.  An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework Based on Semantic Blocks
COLING, 2022 Unsupervised Learning

The paper proposes an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR) framework for unsupervised long document summarization, which is based on the semantic block. The framework addresses the problem that existing methods fail to be both efficient and effective when the input document is extremely long. The proposed method converts one-step ranking into a hierarchical, multi-granularity, two-stage ranking: the coarse-level stage splits the document into facet-aware semantic blocks and filters out insignificant blocks, and the fine-level stage selects salient sentences in each block and extracts the final summary from the selected sentences. The framework achieves new state-of-the-art unsupervised summarization results on Gov-Report and BillSum and runs 4 to 28 times faster than previous methods.

Alexander M. Rush, Sumit Chopra

19.  A Neural Attention Model for Abstractive Sentence Summarization
EMNLP, 2015 Supervised Learning

The paper proposes a new approach to abstractive sentence summarization using a fully data-driven method. The method utilizes a local attention-based model that generates each word of the summary based on the input sentence. The model is simple in structure, but can be trained end-to-end and scaled to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared to other strong baselines.

Preksha Nema, Mitesh M. Khapra, Anirban Laha, Balaraman Ravindran

20.  Diversity driven Attention Model for Query-based Abstractive Summarization
ACL, 2017 Supervised Learning

The paper proposes a model for query-based summarization that addresses the problem of repeated phrases in the summary. The model is based on the encode-attend-decode paradigm and includes a query attention model and a diversity-based attention model. The authors introduce a new query-based summarization dataset and show that their model outperforms vanilla encode-attend-decode models with a gain of 28% in ROUGE-L scores.

Sumit Chopra, Michael Auli, Alexander M. Rush

21.  Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
NAACL, 2016 Supervised Learning

The paper discusses a new method for Abstractive Sentence Summarization, which generates a shorter version of a given sentence while preserving its meaning. The method uses a conditional recurrent neural network (RNN) with a novel convolutional attention-based encoder to ensure that the decoder focuses on the appropriate input words. The model relies on learned features and is easy to train on large data sets. The experiments show that the model outperforms the state-of-the-art method on the Gigaword corpus and performs competitively on the DUC-2004 shared task.

Ramesh Nallapati, Bowen Zhou, Cicero dos Santos, Bing Xiang

22.  Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
CoNLL, 2016 Supervised Learning

The paper discusses the use of sequence-to-sequence recurrent neural networks (RNNs) for text summarization. It also explores various techniques for improving the performance of these models, such as attention mechanisms and pointer networks. The authors present experimental results on several benchmark datasets, demonstrating the effectiveness of their approach. They also discuss potential future directions for research in this area.

Yizhu Liu, Zhiyi Luo, Kenny Q. Zhu

23.  Controlling Length in Abstractive Summarization Using a Convolutional Neural Network
EMNLP, 2018 Supervised Learning

The paper discusses the limitations of convolutional neural networks (CNNs) in generating summaries of desired lengths for different scenarios with space or length constraints. To address this problem, the authors propose an approach to constrain the summary length by extending a convolutional sequence to sequence model. The results show that this approach generates high-quality summaries with user-defined length and outperforms baselines in terms of ROUGE score, length variations, and semantic similarity.

Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou

24.  Selective Encoding for Abstractive Sentence Summarization
ACL, 2017 Supervised Learning

The paper proposes a selective encoding model for abstractive sentence summarization, which includes a sentence encoder, a selective gate network, and an attention-equipped decoder. The model uses recurrent neural networks and constructs a second-level sentence representation for better performance. The model was evaluated on multiple datasets and outperformed state-of-the-art baseline models.

Piji Li, Wai Lam, Lidong Bing, Zihao Wang

25.  Deep Recurrent Generative Decoder for Abstractive Text Summarization
EMNLP, 2017 Supervised Learning

The paper proposes a new framework for abstractive text summarization using a sequence-to-sequence oriented encoder-decoder model with a deep recurrent generative decoder. The model learns latent structure information from target summaries using a recurrent latent random model and neural variational inference. Abstractive summaries are generated using both generative latent variables and discriminative deterministic states. The model outperforms state-of-the-art methods on benchmark datasets in different languages.

Xinyu Hua, Lu Wang

26.  A Pilot Study of Domain Adaptation Effect for Neural Abstractive Summarization
EMNLP, 2017 Supervised Learning

The paper explores domain adaptation for neural abstractive summarization and investigates what information can be transferred to a new domain. The study finds that pre-training based on extractive summaries benefits the neural summarization model and that a combination of in-domain and out-of-domain setup yields better summaries when in-domain data is insufficient. The model is capable of selecting salient content even when trained on out-of-domain data, but requires in-domain data to capture the style for a target domain.

Jiwei Tan, Xiaojun Wan, Jianguo Xiao

27.  Abstractive Document Summarization with a Graph-Based Attentional Neural Model
ACL, 2017 Supervised Learning

The paper discusses the challenges of abstractive document summarization and proposes a novel graph-based attention mechanism in the sequence-to-sequence framework to address the saliency factor of summarization. The experimental results show that the proposed model achieves considerable improvement over previous neural abstractive models and is competitive with state-of-the-art extractive methods.

Romain Paulus, Caiming Xiong, Richard Socher

28.  A Deep Reinforced Model for Abstractive Summarization
ICLR, 2018 Reinforced Learning

The paper discusses the limitations of current attentional, RNN-based encoder-decoder models for abstractive summarization on longer documents and introduces a new neural network model with a novel intra-attention and a training method that combines supervised word prediction and reinforcement learning. The resulting summaries are more readable and the model achieves an improved ROUGE-1 score on the CNN/Daily Mail dataset compared to previous state-of-the-art models. Human evaluation also shows that the model produces higher quality summaries.

Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li

29.  Faithful to the Original: Fact-Aware Neural Abstractive Summarization
AAAI, 2018 Supervised Learning

The paper discusses the problem of fake facts in abstractive summarization, where different parts of the source text are fused together. The authors propose a solution that leverages open information extraction and dependency parse technologies to extract actual fact descriptions from the source text, and a dual-attention sequence-to-sequence framework to generate summaries conditioned on both the source text and the extracted fact descriptions. Experiments show that their model can reduce fake summaries by 80%, while also improving informativeness.

Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li

30.  Generative Adversarial Network for Abstractive Text Summarization
AAAI, 2018 Reinforced Learning

The paper proposes an adversarial process for abstractive text summarization, where a generative model and a discriminative model are simultaneously trained. The generator is built as an agent of reinforcement learning, while the discriminator attempts to distinguish the generated summary from the ground truth summary. The model achieves competitive ROUGE scores with state-of-the-art methods on the CNN/Daily Mail dataset and is able to generate more abstractive, readable, and diverse summaries.

Qiwei Bi, Haoyuan Li, Hanfang Yang

31.  Boosting Few-Shot Abstractive Summarization with Auxiliary Tasks
CIKM, 2021 Supervised Learning

The paper discusses the challenge of summarization in niche domains and proposes a solution to the few-shot problem by designing auxiliary tasks to assist abstractive summarization. The authors use BART as the base sequence-to-sequence model and incorporate the main and auxiliary tasks under a multi-task framework. They also use a task-specific adapter and adaptive weight mechanism to adjust the contribution of auxiliary tasks to the main task. The experiments show the effectiveness of their method for few-shot datasets, and they propose pre-training the model on unlabeled datasets to further improve performance.

Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq Joty, Dragomir Radev, Yashar Mehdad

32.  Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation
NAACL, 2021 Supervised Learning

The paper discusses how models pretrained on large text corpora achieve state-of-the-art performance on English text summarization tasks, but fine-tuning them on new, niche domains is infeasible due to the requirement of hundreds of thousands of data points. The authors introduce a novel and generalizable method called WikiTransfer, which fine-tunes pretrained models for summarization in an unsupervised, dataset-specific manner using pseudo-summaries produced from generic Wikipedia data. WikiTransfer models achieve state-of-the-art, zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate effectiveness on three additional diverse datasets. The authors also employ data augmentation and introduce a regularization term to improve few-shot transfer performance. The paper further studies the effect of dataset aspects on transfer performance and evaluates the quality of output summaries using both automatic and human evaluation.

Wei Li, Xinyan Xiao, Yajuan Lyu, Yuanzhuo Wang

33.  Improving Neural Abstractive Document Summarization with Structural Regularization
EMNLP, 2018 Supervised Learning

The paper discusses the limitations of current neural sequence-to-sequence models in document summarization and proposes a solution that leverages the structural information of both documents and multi-sentence summaries to improve performance. The proposed method involves incorporating structural-compression and structural-coverage regularization to capture the information compression and coverage properties of document summarization. Experimental results show that the proposed method significantly improves the performance of document summarization and outperforms current state-of-the-art neural abstractive methods.

Angela Fan, David Grangier, Michael Auli

34.  Controllable Abstractive Summarization
ACL, 2018 Supervised Learning

The paper discusses how current document summarization models do not take into account user preferences such as desired length, style, entities of interest, and how much of the document has been read. The authors propose a neural summarization model that allows users to specify these preferences, resulting in high-quality summaries tailored to their needs. The system can also automatically set control variables and outperforms state-of-the-art abstractive systems on the CNN-Dailymail dataset.

Yen-Chun Chen, Mohit Bansal

35.  Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
ACL, 2018 Reinforced Learning

The paper proposes a summarization model that selects important sentences and rewrites them to create a concise summary. The authors use a new sentence-level policy gradient method to bridge the gap between two neural networks and achieve higher scores on all metrics, including human evaluation, on the CNN/Daily Mail dataset. The model also enables faster inference and training convergence than previous models, and it performs well on the DUC2002 dataset.

Junyang Lin, Xu Sun, Shuming Ma, Qi Su

36.  Global Encoding for Abstractive Summarization
ACL, 2018 Supervised Learning

The paper proposes a new global encoding framework to improve the conventional sequence-to-sequence model in neural abstractive summarization, which often suffers from repetition and semantic irrelevance. The framework controls the information flow from the encoder to the decoder based on the global information of the source context, using a convolutional gated unit to perform global encoding and improve the representations of the source-side information. Evaluations on two datasets show that the proposed model outperforms baseline models and is capable of generating higher quality summaries with reduced repetition.

Kaiqiang Song, Lin Zhao, Fei Liu

37.  Structure-Infused Copy Mechanisms for Abstractive Summarization
COLING, 2018 Supervised Learning

The paper discusses the limitations of current summarization systems and proposes a new approach that incorporates source-side syntactic information to improve the quality of summaries. The approach uses structure-infused copy mechanisms to copy important words and relations from the source sentence to the summary sentence. Experimental results show that this approach is effective and outperforms state-of-the-art methods.

Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, Min Sun

38.  A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss
ACL, 2018 Supervised Learning

The paper proposes a unified model that combines the strengths of extractive and abstractive summarization. The model uses sentence-level attention to modulate word-level attention, resulting in a more readable paragraph. The model also introduces a novel inconsistency loss function to penalize the inconsistency between the two levels of attention. With end-to-end training, the model achieves state-of-the-art ROUGE scores and produces the most informative and readable summaries on the CNN/Daily Mail dataset according to a human evaluation.

Min Yang, Qiang Qu, Ying Shen, Qiao Liu, Wei Zhao, Jia Zhu

39.  Aspect and Sentiment Aware Abstractive Review Summarization
COLING, 2018 Supervised Learning

The paper discusses the lack of research on end-to-end abstractive review summarization, which is important for businesses and consumers to make informed decisions. The authors propose a mutual attention mechanism that learns the representations of context, sentiment, and aspect words within reviews, acting as an encoder. The learned representations are incorporated into the decoder to generate aspect/sentiment-aware review summaries via an attention fusion network. The abstractive summarizer is jointly trained with the text categorization task, which helps learn a category-specific text encoder. The experimental results on a real-life dataset show that their model outperforms other strong competitors.

Reinald Kim Amplayo, Seung-won Hwang

40.  Entity Commonsense Representation for Neural Abstractive Summarization
NAACL, 2018 Supervised Learning

The paper explores the use of linked entities to improve the performance of a neural text summarizer. The authors propose a module called Entity2Topic (E2T) that transforms a list of entities into a vector representation of the summary's topic. They use an off-the-shelf entity linking system (ELS) to extract linked entities, but resolve imperfections in the ELS by encoding entities with selective disambiguation and pooling entity vectors using firm attention. Applying E2T to a simple sequence-to-sequence model with attention mechanism results in significant improvements in the performance of the summarizer in the Gigaword and CNN datasets.

Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Nazli Goharian

41.  A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
NAACL, 2018 Supervised Learning

The paper proposes a new model for abstractive summarization of longer-form documents, such as research papers. The model uses a hierarchical encoder to model the discourse structure of the document and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that the proposed model outperforms state-of-the-art models.

Asli Celikyilmaz, Antoine Bosselut, Xiaodong He, Yejin Choi

42.  Deep Communicating Agents for Abstractive Summarization
NAACL, 2018 Reinforced Learning

The paper proposes a new approach to abstractive summarization using deep communicating agents in an encoder-decoder architecture. The task of encoding a long text is divided across multiple collaborating agents, each responsible for a subsection of the input text. These encoders are connected to a single decoder, trained using reinforcement learning to generate a focused and coherent summary. Empirical results show that this approach leads to higher quality summaries compared to several strong baselines.

Chenliang Li, Weiran Xu, Sheng Gao

43.  Guiding Generation for Abstractive Text Summarization Based on Key Information Guide Network
NAACL, 2018 Supervised Learning

The paper proposes a guiding generation model that combines extractive and abstractive methods for text summarization. The model uses a Key Information Guide Network (KIGN) to encode keywords and guide the generation process, and a prediction-guide mechanism to obtain long-term value for future decoding. The model is evaluated on the CNN/Daily Mail dataset and shows significant improvements compared to previous models.

Hayato Kobayashi

44.  Frustratingly Easy Model Ensemble for Abstractive Summarization
EMNLP, 2018 Unsupervised Learning

The paper discusses the effectiveness of ensemble methods for text-generation tasks, but notes that they often come with increased computational costs. The authors propose an alternative unsupervised ensemble method called post-ensemble, which selects a majority-like output in post-processing. The method is theoretically related to kernel density estimation based on the von Mises-Fisher kernel. Experimental results on a news headline-generation task show that the proposed method outperforms current ensemble methods.
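
As a rough illustration of selecting a "majority-like" output, the sketch below picks the candidate whose mean cosine similarity to the other candidates is highest. It is a minimal sketch under stated assumptions, not the paper's implementation: the bag-of-words vectors stand in for whatever sentence representation a real system would use, and the names `bow_vector`, `cosine`, and `select_majority_like` are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text):
    """Hypothetical stand-in for a sentence embedding: word counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_majority_like(candidates):
    """Return the candidate most similar, on average, to all the others."""
    if len(candidates) == 1:
        return candidates[0]
    vectors = [bow_vector(c) for c in candidates]
    best_index, best_score = 0, float("-inf")
    for i, vi in enumerate(vectors):
        others = [cosine(vi, vj) for j, vj in enumerate(vectors) if j != i]
        score = sum(others) / len(others)
        if score > best_score:  # ties keep the earliest candidate
            best_index, best_score = i, score
    return candidates[best_index]


# Outputs from four hypothetical models for the same input.
outputs = [
    "stocks fall on rate fears",
    "stocks fall on rate fears today",
    "stocks rise on rate hopes",
    "local team wins championship",
]
chosen = select_majority_like(outputs)
```

Because selection happens only after every model has produced its output, the individual models need no joint decoding, which is where the cost saving over a runtime ensemble would come from.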

Wei Li, Xinyan Xiao, Yajuan Lyu, Yuanzhuo Wang

45.  Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling
EMNLP, 2018 Supervised Learning

The paper proposes a new approach to document summarization that explicitly models and optimizes the information selection process. This is achieved through an information selection layer that includes global information filtering and local sentence selection. The approach is trained using distantly-supervised training guided by a golden summary. Experimental results show that this approach significantly improves document summarization performance and outperforms state-of-the-art neural abstractive methods.

Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush

46.  Bottom-Up Abstractive Summarization
EMNLP, 2018 Supervised Learning

The paper proposes a technique to improve the content selection of neural network-based methods for abstractive summarization. The technique involves using a data-efficient content selector to identify phrases in the source document that should be included in the summary. This selector is used as a bottom-up attention step to constrain the model to likely phrases, resulting in improved text compression and fluent summaries. The approach is simpler and higher performing than other end-to-end content selection models, and can be trained with as little as 1,000 sentences, making it easy to transfer to a new domain. The technique was shown to significantly improve ROUGE scores on both the CNN-DM and NYT corpora.

Li Wang, Junlin Yao, Yunzhe Tao, Li Zhong, Wei Liu, Qiang Du

47.  A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization
IJCAI, 2018 Reinforced Learning

The paper proposes a deep learning approach to automatic summarization that incorporates topic information into the ConvS2S model and uses SCST for optimization. The approach improves coherence, diversity, and informativeness of generated summaries through a biased probability generation mechanism. Reinforcement training optimizes the model with respect to the non-differentiable metric ROUGE and avoids exposure bias during inference. The method is evaluated on three datasets and shows superior performance in abstractive summarization.

Min Yang, Qiang Qu, Wenting Tu, Ying Shen, Zhou Zhao, Xiaojun Chen

48.  Exploring Human-Like Reading Strategy for Abstractive Text Summarization
AAAI, 2019 Supervised Learning

The paper discusses the challenges of generating high-quality abstractive summaries using deep neural network based methods and proposes a novel Hybrid learning model for Abstractive Text Summarization (HATS) that follows a hierarchical routine similar to human-like reading strategy. HATS consists of three major components, a knowledge-based attention network, a multitask encoder-decoder network, and a generative adversarial network, which are consistent with the different stages of the human-like reading strategy. The experimental results on two real-life datasets, CNN/Daily Mail and Gigaword, demonstrate that HATS achieves impressive results.

Eva Sharma, Luyang Huang, Zhe Hu, Lu Wang

49.  An Entity-Driven Framework for Abstractive Summarization
EMNLP, 2019 Reinforced Learning

The paper introduces SENECA, a new system for entity-driven coherent abstractive summarization that uses entity information to generate informative and coherent abstracts. The framework takes a two-step approach, with an entity-aware content selection module identifying salient sentences and an abstract generation module conducting cross-sentence information compression and abstraction. The model is trained with rewards that promote coherence, conciseness, and clarity, and the two modules are connected using reinforcement learning. Automatic evaluation shows that SENECA outperforms previous state-of-the-art on ROUGE and coherence measures on New York Times and CNN/Daily Mail datasets, and human judges rate its summaries as more informative and coherent than those by popular summarization models.

Kai Wang, Xiaojun Quan, Rui Wang

50.  BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization
ACL, 2019 Supervised Learning

The paper proposes a new model called Bi-directional Selective Encoding with Template (BiSET) for summarizing articles. The model uses templates discovered from training data to select key information from source articles and guide the summarization process. The experiments conducted on a standard summarization dataset show that the BiSET model significantly improves the summarization performance and achieves a new state of the art.

Min Gui, Junfeng Tian, Rui Wang, Zhenglu Yang

51.  Attention Optimization for Abstractive Document Summarization
EMNLP, 2019 Supervised Learning

The paper discusses the importance of attention in improving document summarization models. The authors propose an attention refinement unit that uses both local and global variance loss to supervise the attention model at each decoding step and optimize the attention distributions from a global perspective. The effectiveness of the proposed methods is verified through experiments on the CNN/Daily Mail dataset.

Xiangyu Duan, Hongfei Yu, Mingming Yin, Min Zhang, Weihua Luo, Yue Zhang

52.  Contrastive Attention Mechanism for Abstractive Sentence Summarization
EMNLP, 2019 Supervised Learning

The paper proposes a contrastive attention mechanism for abstractive sentence summarization, which includes both conventional attention that focuses on relevant parts of the source sentence and opponent attention that focuses on irrelevant or less relevant parts. The mechanism is trained in an opposite way to encourage the contribution from conventional attention and discourage the contribution from opponent attention. Experiments show that the proposed mechanism is more focused on relevant parts and greatly improves the state-of-the-art performance on the task. The code is available on GitHub.

Yufei Tian, Jianfei Yu, Jing Jiang

53.  Aspect and Opinion Aware Abstractive Review Summarization with Reinforced Hard Typed Decoder
CIKM, 2019 Reinforced Learning

The paper discusses a two-stage reinforcement learning approach for abstractive review summarization. The approach predicts the output word type and then generates the final word distribution based on the predicted word type. In experiments on two Amazon product review datasets, the method outperforms several strong baselines on ROUGE scores.

Wang Wenbo, Gao Yang, Zhou Yuxiang

54.  Concept Pointer Network for Abstractive Summarization
EMNLP, 2019 Supervised Learning

The paper proposes a concept pointer network for improving abstractive summarization by generating new conceptual words to express concrete details. The network uses knowledge-based, context-aware conceptualizations to derive an extended set of candidate concepts and points to the most appropriate choice using both the concept set and original source text. The training model is optimized using a novel method of distantly-supervised learning guided by reference summaries and the testing set. The proposed approach provides statistically significant improvements over several state-of-the-art models on both the DUC2004 and Gigaword datasets, and a human evaluation supports the quality of the summaries produced within this framework.

Shen Gao, Xiuying Chen, Piji Li, Zhangming Chan, Dongyan Zhao, Rui Yan

55.  How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing
EMNLP, 2019 Supervised Learning

The paper introduces a model called Prototype Editing based Summary Generator (PESG) that utilizes prototype document-summary pairs to generate better summaries that conform to a particular style with patterns. The model addresses two challenges: incorporating learned patterns from the prototype while avoiding copying irrelevant facts, and generating new summaries based on the summary pattern or extracted facts. A fact checker is used to estimate mutual information between the input document and generated summary, resulting in state-of-the-art performance in both automatic metrics and human evaluations.

Sanghwan Bae, Taeuk Kim, Jihoon Kim, Sang-goo Lee

56.  Summary Level Training of Sentence Rewriting for Abstractive Summarization
EMNLP, 2019 Reinforced Learning

The paper proposes a new approach to combining extractive and abstractive summarization using Sentence Rewriting models. The existing models in this framework rely on suboptimal labels, causing a mismatch between the training objective and evaluation metric. The authors present a novel training signal that directly maximizes summary-level ROUGE scores through reinforcement learning and incorporate BERT into their model. They show that their proposed model and training procedure obtain new state-of-the-art performance on both the CNN/Daily Mail and New York Times datasets and generalize better on the DUC-2002 test set.

Siyao Li, Deren Lei, Pengda Qin, William Yang Wang

57.  Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization
EMNLP, 2019 Reinforced Learning

The paper discusses the limitations of using conventional reward measures for deep reinforcement learning in abstractive summarization tasks, which can result in repetitive and incoherent sentences. Instead, the authors propose using distributional semantics to measure the matching degrees, allowing for sentence-level evaluation and the generation of semantically-correct phrases. The proposed distributional semantics reward (DSR) is shown to have superior performance in capturing the lexical and compositional diversity of natural language, based on human judgments on Gigaword and CNN/Daily Mail datasets.

Yongjian You, Weijia Jia, Tianyi Liu, Wenmian Yang

58.  Improving Abstractive Document Summarization with Salient Information Modeling
ACL, 2019 Supervised Learning

The paper proposes a Transformer-based encoder-decoder framework with two novel extensions for abstractive document summarization. The first extension is a focus-attention mechanism that models a Gaussian focal bias on attention scores to enhance the perception of local context, contributing to producing salient and informative summaries. The second extension is an independent saliency-selection network that manages the information flow from encoder to decoder, effectively reducing the influences of secondary information on the generated summaries. Experimental results on the CNN/Daily Mail benchmark show that the proposed model outperforms other state-of-the-art baselines on the ROUGE metrics.
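
One simple way to picture a Gaussian focal bias is as a penalty, added to the raw attention scores before the softmax, that grows with distance from a focal position. The sketch below is illustrative only and assumes a fixed `center` and `sigma` passed in by the caller; how the paper obtains these quantities is not stated in the summary above, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_focal_attention(scores, center, sigma):
    """Attention weights after adding a Gaussian bias around `center`.

    scores: raw attention scores, one per source position.
    center: focal position (assumed given; hypothetical parameter).
    sigma:  width of the focus window (assumed given; hypothetical parameter).
    """
    biased = [
        s - ((j - center) ** 2) / (2 * sigma ** 2)
        for j, s in enumerate(scores)
    ]
    return softmax(biased)


# With uniform raw scores, the bias alone shapes the distribution,
# so the weights peak at the focal position and fall off symmetrically.
weights = gaussian_focal_attention([0.0] * 5, center=2, sigma=1.0)
```

A smaller `sigma` narrows the window and concentrates attention on the immediate neighbourhood of the focal position, while a large `sigma` leaves the original scores nearly unchanged.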

Lei Li, Wei Liu, Marina Litvak, Natalia Vanetik, Zuying Huang

59.  In Conclusion Not Repetition: Comprehensive Abstractive Summarization With Diversified Attention Based On Determinantal Point Processes
CONLL, 2019 Supervised Learning

The paper discusses the limitations of existing Seq2Seq models for abstractive summarization and introduces a new model called DivCNN Seq2Seq that uses Determinantal Point Processes methods to produce an attention distribution that considers both quality and diversity. The new model achieves a higher level of comprehensiveness compared to existing models and strong baselines without breaking the end-to-end architecture. The code and datasets needed to reproduce the results are available online.

Byeongchang Kim, Hyunwoo Kim, Gunhee Kim

60.  Abstractive Summarization of Reddit Posts with Multi-level Memory Networks
NAACL, 2019 Supervised Learning

The paper discusses a method for summarizing Reddit posts using multi-level memory networks. The authors propose a model that can capture the important information in a post and generate a summary that accurately reflects the content. The model uses both word-level and sentence-level representations to capture the meaning of the post and the relationships between different parts of the text. The authors evaluate their model on a dataset of TIFU (Today I Fucked Up) posts from Reddit and show that it outperforms several baseline methods in terms of ROUGE scores.

Panagiotis Kouris, Georgios Alexandridis, Andreas Stafylopatis

61.  Abstractive Text Summarization Based on Deep Learning and Semantic Content Generalization
ACL, 2019 Supervised Learning

The paper presents a new method for improving abstractive text summarization using deep learning and semantic data transformations. The method involves using a theoretical model for semantic-based text generalization along with a deep encoder-decoder architecture to produce a summary in generalized form. The summary is then transformed into a human-readable form while retaining important information and addressing the problem of out-of-vocabulary or rare words. The approach is evaluated on two datasets with positive results.

Logan Lebanoff, Kaiqiang Song, Franck Dernoncourt, Doo Soon Kim, Seokhwan Kim, Walter Chang, Fei Liu

62.  Scoring Sentence Singletons and Pairs for Abstractive Summarization
ACL, 2019 Supervised Learning

The paper discusses the challenge of summarizing text by both compressing single sentences and fusing pairs, as sentence selection methods only work with single sentences and not combinations of them. The authors propose a framework that ranks sentence singletons and pairs together in a unified space, modeling human methodology by selecting either a single sentence or a pair of sentences and compressing or fusing them to produce a summary sentence. The framework was tested on both single and multidocument summarization datasets, with findings reported on sentence selection and abstraction.

Tatsuya Ishigaki, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen, Manabu Okumura

63.  Neural Query-Biased Abstractive Summarization Using Copying Mechanism
ECIR, 2020 Supervised Learning

The paper discusses the query-biased summarization task and how conventional approaches have achieved better performance by including overlapping words between the source and the query in the summary. However, RNN-based approaches do not explicitly model this phenomenon. The paper proposes an RNN-based query-biased summarizer that primarily includes overlapping words in the summary using a copying mechanism. Experimental results show that this strategy works well for neural query-biased summarizers.

Philippe Laban, Andrew Hsi

64.  The Summary Loop: Learning to Write Abstractive Summaries Without Examples
ACL, 2020 Unsupervised Learning

The paper presents a new approach to unsupervised abstractive summarization that maximizes coverage and fluency while adhering to a length constraint. The method includes key terms from the original document and uses a coverage model to fill them in the generated summary. The unsupervised training procedure uses both coverage and fluency models to generate and score summaries. The method outperforms previous unsupervised methods by more than 2 R-1 points and approaches results of competitive supervised methods. The model attains higher levels of abstraction with shorter copied passages and learns to compress and merge sentences without supervision.

Sajad Sotudeh, Nazli Goharian, Ross W. Filice

65.  Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization
ACL, 2020 Supervised Learning

The paper discusses the limitations of the seq2seq network in identifying key regions of the source for text summarization. The authors propose a solution by augmenting salient ontological terms into the summarizer for clinical abstractive summarization. Their experiments on two clinical data sets show that their model significantly improves state-of-the-art results in terms of ROUGE metrics, which is important in the healthcare domain where any improvement can impact patients’ welfare.

Luyang Huang, Lingfei Wu, Lu Wang, John M. Fabrizi, Joseph P. Ganim

66.  Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward
ACL, 2020 Reinforced Learning

The paper discusses the limitations of current sequence-to-sequence models for abstractive summarization and proposes a new framework called ASGARD, which uses dual encoders and a reward system based on a multiple choice cloze test to better capture entity interactions and generate more informative summaries. The authors show that their models produce significantly higher ROUGE scores and are rated as more informative and containing fewer errors by human judges compared to other systems.

Zhenwen Li, Wenhao Wu, Sujian Li

67.  Composing Elementary Discourse Units in Abstractive Summarization
ACL, 2020 Reinforced Learning

The paper proposes a new method for abstractive summarization using elementary discourse units (EDUs) instead of sentences. The method includes an EDU selection model to group informative EDUs and an EDU fusion model to combine them into sentences. The reinforcement learning mechanism is used to improve the summarization performance. The model was tested on CNN/Daily Mail and showed promising results.

Kaiqiang Song, Bingqing Wang, Zhe Feng, Liu Ren, Fei Liu

68.  Controlling the Amount of Verbatim Copying in Abstractive Summarization
AAAI, 2020 Supervised Learning

The paper discusses the challenge of creating abstracts that accurately summarize the original text without changing its meaning. It explores the use of neural summarization models to generate summaries with varying degrees of copying, from purely extractive to highly generative. The authors present a method that allows for control over copying during both training and decoding stages, and demonstrate its effectiveness through extensive experiments. The paper also reveals interesting and unobvious findings about the process of summarization.

Haoran Li, Junnan Zhu, Jiajun Zhang, Chengqing Zong, Xiaodong He

69.  Keywords-Guided Abstractive Sentence Summarization
AAAI, 2020 Supervised Learning

This paper proposes an abstractive sentence summarization method that applies guidance signals of keywords to both the encoder and the decoder in the sequence-to-sequence model. A multi-task learning framework is adopted to jointly learn to extract keywords and generate a summary for the input sentence. The authors apply keywords-guided selective encoding strategies to filter source information by investigating the interactions between the input sentence and the keywords. They extend the pointer-generator network with a dual-attention and a dual-copy mechanism, which can integrate the semantics of the input sentence and the keywords, and copy words from both the input sentence and the keywords. The authors demonstrate that multi-task learning and keywords-oriented guidance facilitate the sentence summarization task, achieving better performance than the competitive models on the English Gigaword sentence summarization dataset.

Logan Lebanoff, Franck Dernoncourt, Doo Soon Kim, Walter Chang, Fei Liu

70.  A Cascade Approach to Neural Abstractive Summarization with Content Selection and Fusion
AACL, 2020 Supervised Learning

The paper presents an empirical study supporting the use of a cascade architecture for neural text summarization. The study shows that a pipeline architecture, which separately identifies important content pieces and stitches them together, performs comparably or better than end-to-end systems that perform content selection and surface realization jointly. The paper also discusses the challenges of evaluating summarization systems and suggests future research directions.

Hanqi Jin, Tianming Wang, Xiaojun Wan

71.  SemSUM: Semantic Dependency Guided Neural Abstractive Summarization
AAAI, 2020 Supervised Learning

The paper proposes a new approach to neural abstractive summarization that incorporates semantic dependency graphs to improve semantic relevance and reduce content deviation in generated summaries. The proposed model, SemSUM, leverages the information of original input texts and corresponding semantic dependency graphs to guide the summarization process. The model was evaluated on three datasets and showed significant improvements in automatic evaluation ROUGE metrics.

Kaiqiang Song, Logan Lebanoff, Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Chen Li, Dong Yu, Fei Liu

72.  Joint Parsing and Generation for Abstractive Summarization
AAAI, 2020 Supervised Learning

The paper proposes a solution to the problem of ungrammatical and inaccurate sentences produced by abstractive summarization systems. The proposed method involves generating a sentence and its syntactic dependency parse simultaneously to encourage grammatical sentences and maintain the original meaning. The paper presents a novel neural architecture for abstractive summarization that combines a sequential decoder with a tree-based decoder and a human evaluation protocol to assess the accuracy of the summary. The method is evaluated on various datasets and shows competitive results against strong baselines.

Song Xu, Haoran Li, Peng Yuan, Youzheng Wu, Xiaodong He, Bowen Zhou

73.  Self-Attention Guided Copy Mechanism for Abstractive Summarization
ACL, 2020 Supervised Learning

The paper proposes a Transformer-based model to improve the copy mechanism in abstractive summarization. The model identifies the importance of each source word using degree centrality with a directed graph built by the self-attention layer. The centrality of each source word is used to guide the copy process explicitly, resulting in better performance than baseline methods on the CNN/Daily Mail and Gigaword datasets.
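
To make the centrality idea concrete, the sketch below treats a self-attention matrix as a weighted directed graph, where entry `attention[i][j]` is the weight of the edge from word `i` to word `j`, and scores each word by the total weight it receives. This is a minimal sketch under assumptions: the use of in-degree, the inclusion of self-loops, and the normalization are illustrative choices, and `indegree_centrality` is a hypothetical name rather than anything taken from the paper.

```python
def indegree_centrality(attention):
    """Score each source word by the attention mass it receives.

    attention: square matrix as a list of rows; attention[i][j] is the
    weight word i places on word j (each row typically sums to 1).
    Returns scores normalized to sum to 1.
    """
    size = len(attention)
    received = [0.0] * size
    for row in attention:
        for j, weight in enumerate(row):
            received[j] += weight  # self-loops are counted (an assumption)
    total = sum(received)
    if total == 0:
        return received
    return [r / total for r in received]


# Toy 3-word example: every word attends mostly to word 1.
toy_attention = [
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
]
centrality = indegree_centrality(toy_attention)
most_central = centrality.index(max(centrality))
```

Scores of this kind could then be used to bias a copy distribution toward words that the rest of the sentence attends to heavily.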

Chenguang Zhu, Ruochen Xu, Michael Zeng, Xuedong Huang

74.  A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining
EMNLP, 2020 Supervised Learning

The paper discusses the challenge of summarizing meeting transcripts and proposes a novel abstractive summary network that adapts to the meeting scenario. The network includes a hierarchical structure to accommodate long transcripts and a role vector to depict the difference among speakers. The model is pre-trained on large-scale news summary data due to the inadequacy of meeting summary data. The empirical results show that the proposed model outperforms previous approaches in both automatic metrics and human evaluation, with an increase in ROUGE-1 score from 34.66% to 46.28% on the ICSI dataset.

Zhengjue Wang, Zhibin Duan, Hao Zhang, Chaojie Wang, Long Tian, Bo Chen, Mingyuan Zhou

75.  Friendly Topic Assistant for Transformer Based Abstractive Summarization
EMNLP, 2020 Supervised Learning

The paper discusses the use of topic models to improve the performance of Transformer-based models in abstractive document summarization. The proposed model, called topic assistant (TA), includes three modules and is compatible with various Transformer-based models. TA is user-friendly and only introduces a small number of extra parameters. Experimental results on three datasets demonstrate that TA is able to improve the performance of several Transformer-based models.

Zheng Zhao, Shay B. Cohen, Bonnie Webber

76.  Reducing Quantity Hallucinations in Abstractive Summarization
EMNLP, 2020 Supervised Learning

The paper discusses the issue of hallucination in abstractive summaries and proposes a solution using the HERMAN system. HERMAN verifies specific entities in summaries and up-ranks those whose quantity terms are supported by the original text. Experimental results show higher precision and F1 scores for up-ranked summaries without a loss in recall, and human evaluation shows a preference for up-ranked summaries.
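
The up-ranking step can be illustrated with a toy re-ranker that prefers candidate summaries whose numbers also appear in the source. This sketch substitutes a plain string match for whatever verification the system actually performs, so it should be read as an illustration of the re-ranking idea only; `quantities`, `support_score`, and `rerank` are hypothetical names.

```python
import re

# Matches integers and simple decimals or grouped numbers, e.g. 300, 3.5, 1,200.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def quantities(text):
    """Return the set of number strings found in the text."""
    return set(NUMBER.findall(text))


def support_score(summary, source):
    """Fraction of the summary's numbers that also occur in the source."""
    found = quantities(summary)
    if not found:
        return 1.0  # nothing to verify, so nothing is unsupported
    return len(found & quantities(source)) / len(found)


def rerank(candidates, source):
    """Order candidates by support score, highest first.

    Python's sort is stable, so candidates with equal scores keep the
    order given by the upstream summarizer (e.g. beam order).
    """
    return sorted(
        candidates,
        key=lambda c: support_score(c, source),
        reverse=True,
    )


source_text = "The company hired 300 workers in 2019."
beam = [
    "The company hired 500 workers in 2019.",
    "The company hired 300 workers in 2019.",
]
ranked = rerank(beam, source_text)
```

Exact matching misses paraphrased quantities such as "three hundred" or rounded figures, which is one reason a learned verifier is attractive for this task.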

Khalil Mrini, Can Liu, Markus Dreyer

77.  Rewards with Negative Examples for Reinforced Topic-Focused Abstractive Summarization
EMNLP, 2021 Reinforced Learning

This paper discusses the problem of generating abstractive summaries focused on a particular topic. The authors propose a deep reinforcement learning approach that uses a negative example baseline to improve the model's ability to identify what it should not focus on. They adapt existing datasets for this task and show that their approach outperforms a self-critical baseline in various evaluation metrics.

Yichen Jiang, Asli Celikyilmaz, Paul Smolensky, Paul Soulos, Sudha Rao, Hamid Palangi, Roland Fernandez, Caitlin Smith, Mohit Bansal, Jianfeng Gao

78.  Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization
NAACL, 2021 Supervised Learning

The paper discusses the task of abstractive summarization, which involves generating a concise summary of input documents. The authors adapt the TP-TRANSFORMER architecture, which enriches the original Transformer with the Tensor Product Representation (TPR), for this task. The model encodes two separate representations for each token to represent the syntactic structure and semantic content separately, and then binds them into the TPR as the layer output. The authors argue that this structured intermediate representation enables the model to better control the contents and structures when generating the summary. The TP-TRANSFORMER outperforms the Transformer and the original TP-TRANSFORMER significantly on several abstractive summarization datasets based on both automatic and human evaluations. The authors also demonstrate the emergent structural information in the role vectors and improved syntactic interpretability in the TPR layer outputs.

Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu

79.  Multi-Fact Correction in Abstractive Text Summarization
EMNLP, 2020 Supervised Learning

The paper discusses the challenges faced by system-generated abstractive summaries, which often contain factual inconsistencies. To address this issue, the authors propose SpanFact, a suite of two factual correction models that use knowledge from question answering models to correct errors in system-generated summaries. The models use single- or multi-masking strategies to replace entities and ensure semantic consistency with the source text while retaining the syntactic structure of the summaries. Experiments show that SpanFact significantly improves the factual consistency of system-generated summaries without sacrificing summary quality.

Asma Ben Abacha, Dina Demner-Fushman

80.  On the Summarization of Consumer Health Questions
ACL, 2019

The paper discusses the challenge of question understanding in question answering, particularly in the context of natural language questions that are longer than necessary and contain peripheral information. The authors study neural abstractive models for medical question summarization and introduce the MeQSum corpus of 1,000 summarized consumer health questions. They explore data augmentation methods and evaluate state-of-the-art neural abstractive models on this task. The authors show that semantic augmentation from question datasets improves performance and that pointer-generator networks outperform sequence-to-sequence attentional models, achieving a ROUGE-1 score of 44.16%. The paper also includes a detailed error analysis and suggestions for improving question summarization.

Meng Cao, Yue Dong, Jiapeng Wu, Jackie Chi Kit Cheung

81.  Factual Error Correction for Abstractive Summarization Models
EMNLP, 2020 Supervised Learning

The paper discusses the challenge of ensuring factual consistency in abstractive summarization systems and proposes a post-editing corrector module to address this issue. The module is pre-trained on artificial examples created by applying heuristic transformations on reference summaries. Experimental results show that the model is able to correct factual errors in summaries generated by other neural summarization models and outperforms previous models on factual consistency evaluation on the CNN/DailyMail dataset. However, the paper also notes that transferring from artificial error correction to downstream settings is still challenging.

Yanyan Zou, Xingxing Zhang, Wei Lu, Furu Wei, Ming Zhou

82.  Pre-training for Abstractive Document Summarization by Reinstating Source Text
EMNLP, 2020 Supervised Learning

The paper discusses the challenge of training large SEQ2SEQ based summarization models on limited supervised summarization data and presents three sequence-to-sequence pre-training objectives that allow for pre-training a SEQ2SEQ based abstractive summarization model on unlabeled text. These objectives include sentence reordering, next sentence generation, and masked document generation, which have close relations with the abstractive document summarization task. Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines. The method achieves comparable results to models pre-trained on large-scale data with only 19GB text for pre-training, demonstrating its effectiveness. Code and models are publicly available.

Changmeng Zheng, Yi Cai, Guanjie Zhang, Qing Li

83.  Controllable Abstractive Sentence Summarization with Guiding Entities
COLING, 2020 Supervised Learning

The paper proposes a controllable abstractive sentence summarization model that generates summaries with guiding entities. The model ensures that entities appear in final output summaries and can generate more novel entities. The proposed model is evaluated using fine-grained informativeness metrics in the relevance, extraness, and omission perspectives. Experimental results show that the model outperforms the state-of-the-art methods in both automatic evaluation scores and informativeness metrics.

Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Ziqiang Cao, Sujian Li, Hua Wu, Haifeng Wang

84.  BASS: Boosting Abstractive Summarization with Unified Semantic Graph
ACL, 2021 Supervised Learning

The paper proposes a new framework called BASS for abstractive summarization of long or multi-document text, which is challenging for the Seq2Seq architecture due to its inability to analyze long-distance relations in text. BASS utilizes a unified semantic graph to aggregate co-referent phrases and convey rich relations between them. A graph-based encoder-decoder model is also proposed to improve document representation and summary generation by leveraging the graph structure. Several graph augmentation methods are designed to encode both explicit and implicit relations in the text, while a graph propagation attention mechanism is developed in the decoder to select salient content for the summary. Empirical results show that BASS brings substantial improvements for both long-document and multi-document summarization tasks.

Chenguang Zhu, Ziyi Yang, Robert Gmyr, Michael Zeng, Xuedong Huang

85.  Leveraging Lead Bias for Zero-shot Abstractive News Summarization
SIGIR, 2021 Supervised Learning

The paper proposes leveraging the lead bias in news articles to pre-train abstractive news summarization models on large-scale unlabeled news corpora. The authors collect a massive news corpus and conduct data cleaning and filtering via statistical analysis. They apply self-supervised pre-training on this dataset to existing generation models BART and T5 for domain adaptation. The approach dramatically improves the summarization quality and achieves state-of-the-art results for zero-shot news summarization without any fine-tuning. The model is deployed in Microsoft News and provides public APIs as well as a demo website for multi-lingual news summarization.

Lihan Wang, Min Yang, Chengming Li, Ying Shen, Ruifeng Xu

86.  Abstractive Text Summarization with Hierarchical Multi-scale Abstraction Modeling and Dynamic Memory
SIGIR, 2021 Supervised Learning

The paper proposes a new approach to text summarization using hierarchical multi-scale abstraction modeling and dynamic memory. The system is designed to extract important information from large amounts of text and generate a concise summary. The approach is evaluated on several datasets and shows promising results compared to other state-of-the-art methods.

Dan Su, Tiezheng Yu, Pascale Fung

87.  Improve Query Focused Abstractive Summarization by Incorporating Answer Relevance
ACL, 2021 Supervised Learning

The paper proposes a new model called QFS-BART for generating summaries that are both coherent and answer-related to a given query. Unlike previous QFS models, QFS-BART considers the explicit answer relevance of the source documents given the query via a question answering model. The model also takes advantage of large pre-trained models for improved summarization performance. Empirical results on the Debatepedia dataset show that QFS-BART achieves state-of-the-art performance.

Feng Nan, Cicero Nogueira dos Santos, Henghui Zhu, Patrick Ng, Kathleen McKeown, Ramesh Nallapati, Dejiao Zhang, Zhiguo Wang, Andrew O. Arnold, Bing Xiang

88.  Improving Factual Consistency of Abstractive Summarization via Question Answering
ACL, 2021 Supervised Learning

The paper addresses the problem of factual inconsistency in abstractive summarization models. The authors propose an efficient automatic evaluation metric to measure factual consistency and a novel learning algorithm that maximizes the proposed metric during model training. Through extensive experiments, the authors confirm that their method is effective in improving factual consistency and overall quality of the summaries, as judged by both automatic metrics and human evaluation.

Tiezheng Yu, Zihan Liu, Pascale Fung

89.  AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization
NAACL, 2021 Supervised Learning

The paper discusses the challenges faced by state-of-the-art abstractive summarization models due to their reliance on extensive labeled data, which limits their generalization ability on domains where such data are not available. The authors present a study of domain adaptation for the abstractive summarization task in a low-resource setting, focusing on the second phase of pre-training on large-scale generative models under three different settings. The experiments show that the effectiveness of pre-training is correlated with the similarity between the pre-training data and the target domain task. The authors also find that continuing pre-training could lead to catastrophic forgetting, and a learning method with less forgetting can alleviate this issue. The results highlight the need for more advanced domain adaptation methods for the abstractive summarization task, as a huge gap still exists between the low-resource and high-resource settings.

Shweta Yadav, Deepak Gupta, Asma Ben Abacha, Dina Demner-Fushman

90.  Reinforcement Learning for Abstractive Question Summarization with Question-aware Semantic Rewards
ACL, 2021 Reinforced Learning

The paper discusses the need for reliable and accurate question answering systems for online consumer health questions. It introduces a reinforcement learning-based framework for abstractive question summarization, with two novel rewards obtained from downstream tasks to regularize the question generation model. The proposed method outperforms state-of-the-art models and generates more diverse and semantically valid questions with fewer factual inconsistencies. The source code is available on GitHub.

Yixin Liu, Pengfei Liu

91.  SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization
ACL, 2021 Supervised Learning

The paper introduces a new framework called SimCLS for abstractive summarization, which improves the performance of existing top-performing models by a large margin. The framework formulates text generation as a reference-free evaluation problem assisted by contrastive learning. The experimental results show that SimCLS achieves an absolute improvement of 2.51 over BART and 2.50 over PEGASUS in ROUGE-1 on the CNN/DailyMail dataset, driving the state-of-the-art performance to a new level. The code and results have been open-sourced, and the proposed models have been deployed on the ExplainaBoard platform for researchers to understand the systems in a more fine-grained way.

Andreas Marfurt, James Henderson

92.  Sentence-level Planning for Especially Abstractive Summarization
EMNLP, 2021 Supervised Learning

The paper proposes a new model called the sentence planner model to generate more abstractive summaries. The model includes a hierarchical decoder that generates a representation for the next summary sentence and conditions the word generator on this representation. The generated summaries are more abstractive and achieve high ROUGE scores when compared to human reference summaries. The effectiveness of the design decisions is verified through extensive evaluations.

Ahmed Magooda, Mohamed Elaraby, Diane Litman

93.  Exploring Multitask Learning for Low-Resource Abstractive Summarization
EMNLP, 2021 Supervised Learning

The paper investigates the use of multitask learning for abstractive summarization with limited training data. Four different tasks, including extractive summarization, language modeling, concept detection, and paraphrase detection, are incorporated individually and in combination to improve abstractive summarization. The results show that multitask learning can enhance the performance of abstractive summarization, and certain tasks, such as paraphrase detection, consistently benefit the task.

Shashi Narayan, Yao Zhao, Joshua Maynez, Vitaly Nikolaev, Ryan McDonald

94.  Planning with Learned Entity Prompts for Abstractive Summarization
TACL, 2021 Supervised Learning

The paper introduces a mechanism to improve the generation of abstractive summaries by learning an intermediate plan that grounds the summary generation. This is achieved by prepending target summaries with entity chains, which are ordered sequences of entities mentioned in the summary. Transformer-based sequence-to-sequence models are then trained to generate the entity chain and continue generating the summary based on the entity chain and input. The approach was evaluated on multiple datasets and demonstrated improved entity specificity and planning in summaries, achieving state-of-the-art performance in terms of ROUGE on some datasets. The mechanism also provides a way to control hallucinations in abstractive summaries, outperforming state-of-the-art approaches for faithfulness when evaluated automatically and by humans.
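A minimal sketch of how a training target could be prepended with an entity chain, as described above. The function name, the `[ENTITYCHAIN]` and `[SUMMARY]` marker strings, and the `|` separator are illustrative assumptions, and the entity list is assumed to come from an external tagger.

```python
def build_planned_target(summary, entities,
                         chain_tag="[ENTITYCHAIN]", summary_tag="[SUMMARY]"):
    """Prepend an ordered entity chain to a target summary.

    `entities` is assumed to come from an external entity tagger. Only
    entities that occur in the summary are kept, ordered by first mention.
    """
    mentioned = [(summary.find(e), e) for e in entities if e in summary]
    chain = [e for _, e in sorted(mentioned)]
    return f"{chain_tag} {' | '.join(chain)} {summary_tag} {summary}"


target = build_planned_target(
    "Disney released Frozen in 2013.",
    ["Frozen", "Pixar", "Disney"],  # "Pixar" is not in the summary and is dropped
)
print(target)
# [ENTITYCHAIN] Disney | Frozen [SUMMARY] Disney released Frozen in 2013.
```

A sequence-to-sequence model trained on such targets generates the chain first and then the summary, so editing the chain at inference time is one way to steer which entities appear.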

Saadia Gabriel, Antoine Bosselut, Jeff Da, Ari Holtzman, Jan Buys, Kyle Lo, Asli Celikyilmaz, Yejin Choi

95.  Discourse Understanding and Factual Consistency in Abstractive Summarization
EACL, 2021 Supervised Learning

The paper introduces a framework called Co-opNet for generating abstractive summaries with factual consistency and narrative flow. Co-opNet is a transformer-based framework where a generator works with a discriminator architecture to compose coherent long-form summaries. The paper explores four different discriminator objectives to capture different aspects of coherence. The ability of Co-opNet to learn these objectives is measured using arXiv scientific papers, with empirical results showing improved global coherence compared to competitive baselines.

Haoran Li, Arash Einolghozati, Srinivasan Iyer, Bhargavi Paranjape, Yashar Mehdad, Sonal Gupta, Marjan Ghazvininejad

96.  EASE: Extractive-Abstractive Summarization End-to-End using the Information Bottleneck Principle
EMNLP, 2021 Supervised Learning

The paper proposes a new framework called EASE that combines the strengths of extractive and abstractive summarization systems to generate concise and interpretable summaries. The framework uses the Information Bottleneck principle to jointly train extraction and abstraction in an end-to-end fashion. Inspired by human summarization methods, the framework first extracts a pre-defined amount of evidence spans and then generates a summary using only the evidence. The authors show through automatic and human evaluations that the generated summaries are better than strong extractive and extractive-abstractive baselines.

Haoran Li, Song Xu, Peng Yuan, Yujia Wang, Youzheng Wu, Xiaodong He, Bowen Zhou

97.  Learn to Copy from the Copying History: Correlational Copy Network for Abstractive Summarization
EMNLP, 2021 Supervised Learning

The paper proposes a new copying scheme called Correlational Copying Network (CoCoNet) for abstractive summarization that enhances the standard copying mechanism by keeping track of the copying history. CoCoNet takes advantage of prior copying distributions and encourages the model to copy the input word that is relevant to the previously copied one. The model is strengthened through pretraining with suitable corpora that simulate the copying behaviors. Experimental results show that CoCoNet can copy more accurately and achieves new state-of-the-art performances on summarization benchmarks, including CNN/DailyMail for news summarization and SAMSum for dialogue summarization. The code is available at https://github.com/hrlinlp/coconet.

Shuo Guan, Ping Zhu, Zhihua Wei

98.  Knowledge and Keywords Augmented Abstractive Sentence Summarization
EMNLP, 2021 Supervised Learning

The paper proposes KAS, an abstractive sentence summarization method that addresses the issue of sparse knowledge structure. The method uses topic keywords and knowledge structure to generate high-quality summaries. The results show that KAS outperforms existing methods in terms of ROUGE scores and human evaluation.

Zi-Yi Dou, Pengfei Liu, Hiroaki Hayashi, Zhengbao Jiang, Graham Neubig

99.  GSum: A General Framework for Guided Neural Abstractive Summarization
NAACL, 2021 Supervised Learning

The paper discusses the challenges of neural abstractive summarization models, which can produce coherent summaries but may be unfaithful and difficult to control. The authors propose a guided summarization framework (GSum) that can effectively take different types of external guidance as input and demonstrate its effectiveness in achieving state-of-the-art performance on popular summarization datasets. The authors also show how different types of guidance can generate qualitatively different summaries, providing a degree of controllability to the learned models.

Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, Bing Xiang

100.  Entity-level Factual Consistency of Abstractive Text Summarization
EACL, 2021 Supervised Learning

The paper discusses the challenge of ensuring factual consistency in abstractive summarization, particularly in relation to entity hallucination. The authors propose new metrics to measure entity-level factual consistency and suggest filtering training data as a solution to the problem. They also propose a summary-worthy entity classification task and a joint entity and summary generation approach to further improve entity level metrics.

Chenguang Zhu, William Hinthorn, Ruochen Xu, Qingkai Zeng, Michael Zeng, Xuedong Huang, Meng Jiang

101.  Enhancing Factual Consistency of Abstractive Summarization
NAACL, 2021 Supervised Learning

The paper discusses the issue of inconsistency between automatic abstractive summaries and the original text, which can distort or fabricate facts. To address this problem, the authors propose a fact-aware summarization model called FASUM, which integrates factual relations into the summary generation process using graph attention. They also introduce a factual corrector model called FC to automatically correct factual errors in existing summaries. Empirical results show that FASUM produces more factually consistent summaries compared to existing systems, and FC can improve the factual consistency of given summaries by modifying only a few keywords.

Ye Ma, Zixun Lan, Lu Zong, Kaizhu Huang

102.  Global-aware Beam Search for Neural Abstractive Summarization
NEURIPS, 2022 Supervised Learning

The paper presents a new algorithm for neural abstractive summarization that addresses the local optimality problem of the original beam search. The algorithm uses a novel global protocol based on the attention distribution to generate summaries in a near-global optimal fashion. The global attention distribution can be predicted before inference, allowing for step-wise improvements on the beam search through the global scoring mechanism. The algorithm is shown to significantly improve state-of-the-art summarization models on nine datasets and remains robust even with corrupted attention distributions. Code and examples are available.

Sangwoo Cho, Kaiqiang Song, Chen Li, Dong Yu, Hassan Foroosh, Fei Liu

103.  Better Highlighting: Creating Sub-Sentence Summary Highlights
EMNLP, 2020 Supervised Learning

The paper proposes a method to generate summary highlights that can be overlaid on original documents to help readers sift through large amounts of text. The method aims to prevent distortion of the original meaning by providing summaries in context. The method combines determinantal point processes and deep contextualized representations to identify important and non-redundant sub-sentence segments to form self-contained highlights. The paper presents extensive experiments on summarization datasets to demonstrate the flexibility and modeling power of the method. The authors conclude that highlighting is a promising avenue for future summarization research.

Kaiqiang Song, Bingqing Wang, Zhe Feng, Fei Liu

104.  A New Approach to Overgenerating and Scoring Abstractive Summaries
NAACL, 2021 Supervised Learning

The paper proposes a new approach to generate multiple summaries with diverse content and varying lengths, and then select the best ones based on user needs. The approach involves a two-staged strategy to generate a diverse set of candidate summaries from the source text and then score and select admissible ones. The generator gives precise control over the length of the summary, and the selectors are designed to predict the optimal summary length and emphasize faithfulness to the original text. The approach achieves state-of-the-art performance in benchmark summarization datasets.

Sihao Chen, Fan Zhang, Kazoo Sone, Dan Roth

105.  Improving Faithfulness in Abstractive Summarization with Contrast Candidate Generation and Selection
NAACL, 2021 Supervised Learning

The paper discusses how current models for neural abstractive summarization often generate summaries that are not faithful to the original context. To address this issue, the authors propose a post-processing technique called contrast candidate generation and selection. They generate alternative candidate summaries where named entities and quantities are replaced with ones of compatible semantic types from the source document, and then use a discriminative correction model to select the best candidate as the final output summary. The authors' experiments show that this method is effective in identifying and correcting extrinsic hallucinations. They also analyze the typical hallucination phenomena produced by different types of neural summarization systems, in the hope of providing insights for future work in this direction.
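The candidate generation step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: entities are passed in as (text, type) pairs from a hypothetical external tagger, one entity is swapped at a time, and the selection step (the discriminative correction model) is not shown.

```python
def contrast_candidates(summary, summary_entities, source_entities):
    """Generate variants of `summary` by swapping one entity at a time
    for a source entity of the same type.

    Both entity arguments are lists of (text, type) pairs, assumed to be
    produced by an external named-entity tagger.
    """
    candidates = []
    for text, etype in summary_entities:
        for src_text, src_type in source_entities:
            if src_type == etype and src_text != text:
                candidates.append(summary.replace(text, src_text))
    return candidates


source_entities = [("Paris", "GPE"), ("Lyon", "GPE"), ("2019", "DATE")]
summary = "The summit was held in Berlin in 2019."
summary_entities = [("Berlin", "GPE"), ("2019", "DATE")]

cands = contrast_candidates(summary, summary_entities, source_entities)
for c in cands:
    print(c)
# The summit was held in Paris in 2019.
# The summit was held in Lyon in 2019.
```

Here "Berlin" does not occur in the source entities, so each candidate replaces it with a source entity of the same type; a separate model would then choose among the original summary and the candidates.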

Hanlu Wu, Tengfei Ma, Lingfei Wu, Tariro Manyumwa, Shouling Ji

106.  Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning
EMNLP, 2020

The paper proposes a new method for evaluating the quality of document summarization systems without requiring human-generated reference summaries. The method uses unsupervised contrastive learning and a new metric based on BERT that covers both linguistic qualities and semantic informativeness. The model is trained with a ranking loss using different types of negative samples for each summary. The experiments on Newsroom and CNN/Daily Mail datasets show that the proposed method outperforms other metrics and is generalizable across datasets.

Matt Wilber, William Timkey, Marten van Schijndel

107.  To Point or Not to Point: Understanding How Abstractive Summarizers Paraphrase Text
ACL, 2021

The paper discusses the limitations of abstractive neural summarization models despite their improved ROUGE scores. The authors conducted experiments on the pointer-generator model to understand how it controls its level of abstraction and extraction. The model utilizes syntactic boundaries to truncate sentences on an extractive-biased dataset, but when forced to generate, it only shows simple paraphrasing abilities with factual inaccuracies and hallucinations. On an abstractive-biased dataset, the model copies infrequently and shows limited abstractive abilities. The results suggest that abstractive summarization models lack the semantic understanding necessary to generate faithful and abstractive paraphrases.

Shuyang Cao, Lu Wang

108.  Attention Head Masking for Inference Time Content Selection in Abstractive Summarization
NAACL, 2021 Supervised Learning

The paper presents a technique called attention head masking to effectively inform content selection in Transformer-based abstractive summarization models. This technique is applied on encoder-decoder attentions to identify important content during inference. The authors demonstrate the effectiveness of this technique on three document summarization datasets, including in-domain and cross-domain settings. Their models outperform prior state-of-the-art models on CNN/Daily Mail and New York Times datasets. Additionally, the inference-time masking technique is data-efficient, requiring less than 20% of the training samples to outperform BART fine-tuned on the full CNN/DailyMail dataset.

Chulaka Gunasekara, Guy Feigenblat, Benjamin Sznajder, Ranit Aharonov, Sachindra Joshi

109.  Using Question Answering Rewards to Improve Abstractive Summarization
EMNLP, 2021 Reinforced Learning

The paper discusses the issues with current neural abstractive summarization models and presents a framework to train these models to improve their summaries. The framework involves training a sequence-to-sequence model and then further training it in a Reinforcement Learning setting with question-answering based rewards. The experimental results show that this approach can improve the quality of the summaries generated by these models, with human evaluations showing a preference for the approach over general abstractive summarization models 30% of the time.

Mathieu Ravaut, Shafiq Joty, Nancy F. Chen

110.  SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
ACL, 2022 Supervised Learning

The paper discusses the limitations of using beam search to generate summaries with sequence-to-sequence neural networks, due to the large search space and exposure bias. The authors propose a solution of directly training a second-stage model to perform re-ranking on a set of summary candidates, resulting in improved performance of the base model. Their SummaReranker model achieves state-of-the-art results on several datasets, with code and checkpoints available online.

Yuanjie Lyu, Chen Zhu, Tong Xu, Zikai Yin, Enhong Chen

111.  Faithful Abstractive Summarization via Fact-aware Consistency-constrained Transformer
CIKM, 2022 Supervised Learning

The paper proposes a new model for abstractive summarization called Entity-Relation Pointer Generator Network (ERPGN) that formalizes the facts in the original document as a factual knowledge graph and generates a high-quality summary by directly modeling consistency between the summary and the knowledge graph. The model uses two pointer network structures to capture the facts in the original document and two semantic-level losses to measure the disagreement between the summary and the facts. The experiments show that ERPGN outperforms classic abstractive summarization models and state-of-the-art fact-aware baseline methods in terms of faithfulness.

Shuyang Cao, Lu Wang

112.  CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization
EMNLP, 2021 Supervised Learning

The paper discusses a new approach to generating abstractive summaries that are both faithful and factually consistent with the given articles. The approach uses a contrastive learning formulation that leverages both reference summaries and automatically generated erroneous summaries to train summarization systems that are better at distinguishing between them. The paper also describes four strategies for creating negative samples that resemble errors made commonly by two state-of-the-art models, BART and PEGASUS. Experiments on XSum and CNN/Daily Mail show that the contrastive learning framework consistently produces more factual summaries than other approaches, according to QA-based factuality evaluation. Human judges also find that the model summaries correct more errors.

Haonan Wang, Yang Gao, Yu Bai, Mirella Lapata, Heyan Huang

113.  Exploring Explainable Selection to Control Abstractive Summarization
AAAI, 2021 Supervised Learning

The paper discusses the limitations of current neural models for document summarization, which lack transparency and control. To address this issue, the authors propose a novel select-and-generate framework called ESCA that focuses on explainability. The framework reveals the latent centrality and interactions between sentences, along with scores for sentence novelty and relevance, to give users a window into the choices the model is making and an opportunity to guide those choices. A novel pair-wise matrix captures the sentence interactions, centrality, and attribute scores, and a mask with tunable attribute thresholds allows the user to control which sentences are likely to be included in the extraction. A sentence-deployed attention mechanism in the abstractor ensures the final summary emphasizes the desired content. ESCA outperformed eight state-of-the-art models on the CNN/DailyMail and NYT50 benchmark datasets in a series of experiments assessed with ROUGE metrics and two human evaluations.

Haopeng Zhang, Semih Yavuz, Wojciech Kryściński, Kazuma Hashimoto, Yingbo Zhou

114.  Improving the Faithfulness of Abstractive Summarization via Entity Coverage Control
NAACL, 2022 Supervised Learning

The paper discusses the limitations of abstractive summarization systems that use pre-training language models, which are prone to hallucinating facts that are not faithful to the input context. To address this issue, the authors propose a method called Entity Coverage Control (ECC) that computes entity coverage precision and adds a control code to each training example to guide the model to recognize faithful contents. They also extend their method through intermediate fine-tuning on noisy data extracted from Wikipedia to enable zero-shot summarization. The proposed method leads to more faithful and salient abstractive summarization in supervised fine-tuning and zero-shot settings, as demonstrated by experimental results on three benchmark datasets of different domains and styles.
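A minimal sketch of the two steps described above: computing entity coverage precision and turning it into a control code prepended to the input. The summary does not specify the binning or the token format, so the three equal-width bins, the `<cov_k>` tokens, and the treatment of entity-free summaries are assumptions made for illustration.

```python
def entity_coverage_precision(summary_entities, source_text):
    """Fraction of summary entities that also appear in the source text."""
    if not summary_entities:
        # Assumption: a summary with no entities counts as fully covered.
        return 1.0
    covered = sum(1 for e in summary_entities if e in source_text)
    return covered / len(summary_entities)


def control_code(precision, num_bins=3):
    """Map a precision in [0, 1] to a hypothetical control token such as <cov_2>."""
    bin_index = min(int(precision * num_bins), num_bins - 1)
    return f"<cov_{bin_index}>"


source = "Alice met Bob in Paris."
p = entity_coverage_precision(["Alice", "London"], source)  # "London" is unsupported
tagged_input = f"{control_code(p)} {source}"
print(p, tagged_input)
# 0.5 <cov_1> Alice met Bob in Paris.
```

At inference time, prepending the highest-coverage code would be one way to ask the model for summaries whose entities are grounded in the source.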

Xiaochen Liu, Yang Gao, Yu Bai, Jiawei Li, Yinan Hu, Heyan Huang, Boxing Chen

115.  PSP: Pre-trained Soft Prompts for Few-Shot Abstractive Summarization
COLING, 2022 Supervised Learning

The paper presents a new approach to few-shot abstractive summarization using a soft prompts architecture coupled with prompt pre-training and fine-tuning. The soft prompts consist of continuous input embeddings across an encoder and decoder, with a new inner-prompt introduced to capture document-level information. The approach uses prompt pre-training with self-supervised pseudo-data to teach the model basic summarizing capability, followed by fine-tuning with few-shot examples using lightweight soft prompts. Experimental results on the CNN/DailyMail and XSum datasets show that the method outperforms full-model tuning and Prompt Tuning, and delivers competitive results against PrefixTuning with significantly fewer parameters.

David Wan, Mohit Bansal

116.  FACTPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization
NAACL, 2022 Supervised Learning

The paper presents FACTPEGASUS, an abstractive summarization model that focuses on factuality during pre-training and fine-tuning. The model uses a sentence selection strategy to create pseudo-summaries that are both important and factual, and introduces three complementary components for fine-tuning: a corrector to remove hallucinations, a contrastor to differentiate factual from nonfactual summaries, and a connector to improve knowledge transfer. Experiments show that FACTPEGASUS substantially improves factuality and is more factual than using the original pre-training objective in zero-shot and few-shot settings, while also retaining factual behavior more robustly than strong baselines.

Shengqiang Zhang, Xingxing Zhang, Hangbo Bao, Furu Wei

117.  Attention Temperature Matters in Abstractive Summarization Distillation
ACL, 2022 Supervised Learning

The paper discusses how abstractive text summarization relies on large, computationally expensive pre-trained sequence-to-sequence Transformer models, and proposes a method to distill these models into smaller ones with minimal performance loss. The method involves manipulating attention temperatures in Transformers to make pseudo labels easier to learn for student models. Experiments on three summarization datasets show that this method consistently improves vanilla pseudo-labeling based methods, and both pseudo labels and summaries produced by the student models are shorter and more abstractive. The code for the proposed method is available on GitHub.

Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu

118.  Towards Abstractive Grounded Summarization of Podcast Transcripts
ACL, 2022 Supervised Learning

The paper discusses the challenges of summarizing podcasts, including factual inconsistencies and speech disfluencies in transcripts. The authors propose a novel abstractive summarization method that grounds summary segments in specific regions of the transcript to improve summarization quality. They conducted a series of analyses on a large podcast dataset and found that their approach achieved promising results, improving both automatic and human evaluation of summarization quality.

Ye Xiong, Teeradaj Racharak, Minh Le Nguyen

119.  Extractive Elementary Discourse Units for Improving Abstractive Summarization
SIGIR, 2022 Supervised Learning

The paper discusses the use of elementary discourse units (EDUs) as the textual unit of content selection for abstractive summarization. The authors propose a novel summarization model in which an EDU selector first chooses salient content and a generator then rewrites the selected EDUs as the final summary. To determine the relevance of each EDU to the entire document, the authors apply group tag embedding. Extensive experiments on the CNN/Daily Mail dataset demonstrate the effectiveness of their model.

Yixin Liu, Pengfei Liu, Dragomir Radev, Graham Neubig

120.  BRIO: Bringing Order to Abstractive Summarization
ACL, 2022 Supervised Learning

The paper proposes a new training paradigm for abstractive summarization models that assumes a non-deterministic distribution, which assigns probability mass to different candidate summaries based on their quality. This approach addresses the performance degradation issue during inference, where the model needs to compare system-generated summaries that deviate from the reference summary. The proposed method achieves a new state-of-the-art result on the CNN/DailyMail and XSum datasets, and can estimate probabilities of candidate summaries that are more correlated with their level of quality.

José Ángel González, Annie Louis, Jackie C. K. Cheung

121.  Source-summary Entity Aggregation in Abstractive Summarization
COLING, 2022 Supervised Learning

The paper discusses the phenomenon of referring to entities in later discourse by a more general description, and how this applies to summarization. The authors categorize these instances as source-summary entity aggregations and analyze them in the CNN/DailyMail corpus. They examine how well three state-of-the-art summarization systems can generate such aggregations and develop techniques to encourage them to generate more. The results show that there is significant room for improvement in producing semantically correct aggregations.

Han Guo, Ramakanth Pasunuru, Mohit Bansal

122.  Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation
ACL, 2018 Supervised Learning

The paper proposes a method to improve abstractive summarization by using multi-task learning with the auxiliary tasks of question generation and entailment generation. The former helps the summarization model identify salient questioning-worthy details, while the latter teaches the model how to rewrite a summary that is a directed-logical subset of the input document. The paper also proposes novel multitask architectures with high-level layer-specific sharing and soft-sharing mechanisms, which result in statistically significant improvements over the state-of-the-art on various datasets. The paper presents quantitative and qualitative analysis studies of the model's learned saliency and entailment skills.

Wojciech Kryściński, Romain Paulus, Caiming Xiong, Richard Socher

123.  Improving Abstraction in Text Summarization
EMNLP, 2018 Supervised Learning

The paper proposes two techniques to improve the level of abstraction in abstractive text summarization. The first technique involves decomposing the decoder into a contextual network and a pretrained language model. The second technique involves a novelty metric that encourages the generation of novel phrases. The proposed model achieves results comparable to state-of-the-art models, while achieving a significantly higher level of abstraction as measured by n-gram overlap with the source document.

Alexios Gidiotis, Grigorios Tsoumakas

124.  Should We Trust This Summary? Bayesian Abstractive Summarization to The Rescue
ACL, 2022 Unsupervised Learning

The paper explores uncertainty in modern abstractive summarization models using Bayesian Deep Learning. The authors use Monte Carlo dropout to approximate Bayesian inference and perform multiple stochastic forward passes to quantify uncertainty at prediction time. This allows for filtering out generated summaries of high uncertainty and can be used for selecting samples for annotation. Bayesian inference also enables finding a summary that performs better than a deterministic one and is more robust to uncertainty. Their Variational Bayesian equivalents of BART and PEGASUS outperform their deterministic counterparts on multiple benchmark datasets.

Yizhu Liu, Qi Jia, Kenny Q. Zhu

125.  Length Control in Abstractive Summarization by Pretraining Information Selection
ACL, 2022 Supervised Learning

The paper proposes a new approach for length-controllable summarization models that adapts the encoding of the source based on the desired length. This is achieved through a length-aware attention mechanism (LAAM) that is trained on a summary length balanced dataset built from the original training data. The results show that this approach is effective in generating high-quality summaries with desired lengths, including those that were not seen in the original training set. Previous models tended to generate summaries as long as those in the training data, but LAAM can generate shorter summaries as well.

Lu Wang

126.  Neural Network-Based Abstract Generation for Opinions and Arguments
NAACL, 2016 Supervised Learning

The paper proposes a neural network model that generates informative and concise summaries for opinionated text. The model uses an attention-based mechanism to absorb information from multiple text units and an importance-based sampling method to integrate important input. The system outperforms state-of-the-art summarization systems on newly collected datasets of movie reviews and arguments and is rated higher in human evaluation for informativeness and grammaticality.

Abigail See, Peter J. Liu, Christopher D. Manning

127.  Get To The Point: Summarization with Pointer-Generator Networks
ACL, 2017 Supervised Learning

The paper discusses the limitations of neural sequence-to-sequence models for abstractive text summarization, which can inaccurately reproduce factual details and repeat themselves. The authors propose a new architecture that uses a hybrid pointer-generator network to accurately reproduce information while retaining the ability to generate novel words, and coverage to discourage repetition. The model is applied to the CNN/Daily Mail summarization task and outperforms the current abstractive state-of-the-art by at least 2 ROUGE points.
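
The hybrid pointer-generator idea can be sketched as a mixture of two distributions: with probability p_gen the model generates a word from its vocabulary distribution, and otherwise it copies a source token according to the attention weights. The snippet below is a minimal toy version with hand-picked numbers; the function name, the dictionary representation, and all values are assumptions for illustration, and the coverage mechanism is not shown.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities into one distribution over words.

    p_gen         : probability of generating from the vocabulary (0..1)
    vocab_dist    : dict mapping vocabulary word -> generation probability
    attention     : attention weight for each source position (sums to 1)
    source_tokens : source words aligned with the attention weights
    """
    # Generation part: scale every vocabulary probability by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention to whichever word sits at each
    # source position, including words that are outside the vocabulary.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Toy values (hypothetical): "arsenal" and "match" are out-of-vocabulary.
vocab_dist = {"the": 0.5, "team": 0.3, "won": 0.2}
source_tokens = ["arsenal", "won", "the", "match"]
attention = [0.6, 0.2, 0.1, 0.1]

dist = final_distribution(0.4, vocab_dist, attention, source_tokens)
```

In this toy case the out-of-vocabulary word "arsenal" receives probability mass only through copying, which is how such a model can reproduce rare names and figures from the source.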

Jiwei Tan, Jianguo Xiao

128.  From Neural Sentence Summarization to Headline Generation: A Coarse-to-Fine Approach
IJCAI, 2017 Supervised Learning

The paper discusses the challenge of extending sentence summarization models to the task of document headline generation. The proposed solution is a coarse-to-fine approach that first identifies important sentences using document summarization techniques and then uses a multi-sentence summarization model with hierarchical attention to generate headlines. The approach significantly improves the performance of neural sentence summarization models on the headline generation task, as demonstrated by experimental results on a large real dataset.

Shashi Narayan, Shay B. Cohen, Mirella Lapata

129.  Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
EMNLP, 2018 Supervised Learning

The paper introduces a new summarization task called extreme summarization, which requires an abstractive modeling approach to create a one-sentence news summary that answers the question "What is the article about?" A large dataset was collected from the BBC, and a novel abstractive model based on convolutional neural networks was proposed. The model was shown to outperform both extractive and abstractive approaches in both human and automatic evaluation. The architecture captures long-range dependencies in a document and recognizes pertinent content.

Ramakanth Pasunuru, Mohit Bansal

130.  Multi-Reward Reinforced Summarization with Saliency and Entailment
NAACL, 2018 Reinforced Learning

The paper discusses the task of abstractive text summarization, which involves compressing a long document into a short summary while maintaining important aspects such as saliency, logical entailment, and non-redundancy. The authors propose a reinforcement learning approach with two novel reward functions, ROUGESal and Entail, in addition to a coverage-based baseline. The ROUGESal reward up-weights salient phrases/words detected via a keyphrase classifier, while the Entail reward gives high scores to logically-entailed summaries using an entailment classifier. The authors show that combining these rewards with traditional metric-based rewards leads to superior performance improvement, achieving state-of-the-art results on the CNN/Daily Mail dataset and strong improvements on the DUC-2002 dataset.

Yau-Shian Wang, Hung-Yi Lee

131.  Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks
EMNLP, 2018 Supervised Learning

The paper proposes a method for achieving unpaired abstractive summarization using an auto-encoder that encodes input text into human-readable sentences. The auto-encoder consists of a generator and a reconstructor, with a discriminator used to ensure the generator output resembles human-written sentences. The generator encodes the input text into a shorter word sequence, and the reconstructor recovers the generator input from the generator output. This approach achieves abstractive summarization without the need for document-summary pairs as training data, and promising results are shown on both English and Chinese corpora.

Shuming Ma, Xu Sun, Junyang Lin, Xuancheng Ren

132.  A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification
IJCAI, 2018 Supervised Learning

The paper proposes a hierarchical end-to-end model for joint learning of text summarization and sentiment classification, where the sentiment classification label is treated as a further "summarization" of the text summarization output. The model achieves better performance than strong baseline systems on both abstractive summarization and sentiment classification, as shown by experimental results on Amazon online reviews datasets. Text summarization and sentiment classification aim to capture the main ideas of the text at different levels, with text summarization describing the text within a few sentences and sentiment classification summarizing the text into an even more abstract fashion, i.e., a sentiment class.

Thomas Scialom, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

133.  Answers Unite! Unsupervised Metrics for Reinforced Summarization Models
EMNLP, 2019 Reinforced Learning

The paper discusses how abstractive summarization approaches based on Reinforcement Learning (RL) can overcome the limitations of classical likelihood maximization. The most commonly used summarization metric, ROUGE, has limitations such as bias towards lexical similarity and suboptimal accounting for fluency and readability. The paper proposes alternative evaluation measures based on Question Answering, which were found to be favorable compared to ROUGE and do not require reference summaries. Training an RL-based model on these metrics leads to improvements in both human and automated metrics.

Daniel Deutsch

134.  Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization
EMNLP, 2019 Supervised Learning

The paper proposes a new method for studying content selection in topic-focused summarization called the summary cloze task. The task involves generating the next sentence of a summary based on a topic, a partial summary, and a reference document. The challenge is deciding what information in the references is relevant to the topic and partial summary and should be included in the summary. The paper reports experimental results on a dataset of nearly 500k summary cloze instances from Wikipedia using various extractive and abstractive models. The results show that the task remains a significant challenge, but the topic and partial summary help the models identify relevant content.

Yang Liu, Mirella Lapata

135.  Text Summarization with Pretrained Encoders
EMNLP, 2019 Supervised Learning

The paper discusses the use of Bidirectional Encoder Representations from Transformers (BERT) in text summarization and proposes a framework for both extractive and abstractive models. They introduce a document-level encoder based on BERT that can express the semantics of a document and obtain representations for its sentences. They also propose a new fine-tuning schedule for abstractive summarization that adopts different optimizers for the encoder and decoder to alleviate the mismatch between the two. The experiments on three datasets show that their model achieves state-of-the-art results in both extractive and abstractive settings.

Kushal Chawla, Balaji Vasan Srinivasan, Niyati Chhaya

136.  Generating Formality-tuned Summaries Using Input-dependent Rewards
CONLL, 2019 Reinforced Learning

The paper discusses a reinforcement learning based approach to generate formality-tailored summaries for an input article. The model can generate both formal and informal summary variants, accommodating the psycho-linguistic preferences of the intended audience. The proposed framework includes a novel input-dependent reward function that aids in training the model with stylistic feedback on sampled and ground-truth summaries. Automated and qualitative evaluations show the viability of the approach.

Yoshihiko Suhara, Xiaolan Wang, Stefanos Angelidis, Wang-Chiew Tan

137.  OPINIONDIGEST: A Simple Framework for Opinion Summarization
ACL, 2020 Supervised Learning

The paper presents OPINIONDIGEST, an opinion summarization framework that uses an Aspect-based Sentiment Analysis model to extract opinion phrases from reviews and trains a Transformer model to reconstruct the original reviews. The framework selects the most popular opinions and uses them to generate an opinion summary. OPINIONDIGEST can also generate customized summaries by filtering opinions according to aspect and sentiment. The framework outperforms competitive baselines in automatic evaluation and produces informative summaries with promising customization capabilities, as verified by human studies.

Bowen Tan, Lianhui Qin, Eric P. Xing, Zhiting Hu

138.  Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach
EMNLP, 2020 Supervised Learning

The paper discusses aspect-based abstractive summarization, which generates a summary of a document based on a specific topic of interest. Previous studies have only focused on a small set of pre-defined topics, limiting the application of the task. The authors propose a new method that allows summarization on arbitrary topics relevant to the document, using external knowledge sources such as ConceptNet and Wikipedia. Experiments show that their approach improves performance on both real and synthetic documents.

Yang Deng, Wenxuan Zhang, Wai Lam

139.  Multi-hop Inference for Question-driven Summarization
EMNLP, 2020 Supervised Learning

The paper proposes a new method called Multi-hop Selective Generator (MSG) for question-driven abstractive summarization. This method incorporates multi-hop reasoning to provide justifications for the generated summaries. The proposed method outperforms state-of-the-art methods on two non-factoid QA datasets, namely WikiHow and PubMedQA. The method jointly models the relevance to the question and the interrelation among different sentences via a human-like multi-hop inference module and a gated selective pointer generator network with a multi-view coverage mechanism.

Kazuki Matsumaru, Sho Takase, Naoaki Okazaki

140.  Improving Truthfulness of Headline Generation
ACL, 2020 Supervised Learning

The paper discusses the concern about the truthfulness of generated summaries in abstractive summarization and explores improving the truthfulness in headline generation on two popular datasets. The study analyzes headlines generated by the state-of-the-art encoder-decoder model and shows that the model sometimes generates untruthful headlines due to untruthful supervision data used for training the model. To remedy this problem, the study hypothesizes that removing untruthful instances from the supervision data may help and builds a binary classifier that predicts an entailment relation between an article and its headline to filter out untruthful instances. Experimental results demonstrate that the headline generation model trained on filtered supervision data shows remarkable improvements in automatic and manual evaluations of the generated headlines.

Potsawee Manakul, Mark J. F. Gales

141.  Long-Span Summarization via Local Attention and Content Selection
ACL, 2021 Supervised Learning

The paper discusses the use of transformer-based models in natural language processing tasks, specifically document summarization. While these models have achieved impressive results, they struggle with scaling as input length grows, making it difficult to train or fine-tune them for long document summarization. The paper proposes two methods, local self-attention and explicit content selection, to address long-span dependencies in abstractive summarization. The approaches are compared on various network configurations and tested on standard long-span summarization tasks, achieving state-of-the-art results on all three tasks in the ROUGE scores. The paper also notes that their approach can achieve comparable or better results than existing approaches without requiring a large-scale GPU card.

Thomas Scialom, Paul-Alexis Dray, Patrick Gallinari, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano, Alex Wang

142.  QuestEval: Summarization Asks for Fact-based Evaluation
EMNLP, 2021

The paper discusses the limitations of current metrics for evaluating summarization, such as ROUGE, and proposes a new framework called QUESTEVAL. Unlike other metrics, QUESTEVAL does not require a groundtruth reference and relies on question answering models to assess whether a summary contains all the relevant information from its source document. The paper shows that QUESTEVAL significantly improves the correlation with human judgments over four evaluation dimensions: consistency, coherence, fluency, and relevance. The authors also provide code and models for the framework.

Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei

143.  Sequence Level Contrastive Learning for Text Summarization
AAAI, 2022 Supervised Learning

The paper proposes a contrastive learning model for supervised abstractive text summarization, which maximizes the similarities between different views of the same mean representation during training. The model outperforms a strong sequence-to-sequence text generation model on three different summarization datasets and achieves better faithfulness ratings in human evaluation. The code is available at https://github.com/xssstory/SeqCo.

Travis R. Goodwin, Max E. Savery, Dina Demner-Fushman

144.  Towards Zero-Shot Conditional Summarization with Adaptive Multi-Task Fine-Tuning
EMNLP, 2020 Supervised Learning

The paper discusses the problem of conditional summarization, where content selection and surface realization are based on a natural language question or topic description. The authors explore the use of multi-task fine-tuning (MTFT) on twenty-one natural language tasks to enable zero-shot conditional summarization on five tasks. They present four new summarization datasets and report zero-shot performance using T5 and BART, demonstrating that MTFT can improve zero-shot summarization quality. The paper highlights the importance of specific summaries for applications such as question answering and literature discovery.

Jacob Parnell, Inigo Jauregi Unanue, Massimo Piccardi

145.  RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation
ACL, 2021 Reinforced Learning

The paper proposes two reward functions for abstractive summarization, RwBHinge and RISK, to improve upon the negative loglikelihood (NLL) baselines commonly used in training models. The experiments show that the proposed approach consistently improves performance over the NLL baselines when fine-tuning an NLL pre-trained model on nine diverse summarization datasets. The reward function used in reinforcement learning plays a key role in performance and is still partially unexplored.

Potsawee Manakul, Mark J. F. Gales

146.  Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
EMNLP, 2021 Supervised Learning

The paper discusses the challenges of using transformer models for NLP tasks, particularly in summarization, due to the computational expense of the encoder-decoder attention mechanism. The authors propose a modified architecture that selects a subset of input sentences to constrain the attention mechanism, based on the empirical observation of a sparse sentence structure in document summarization. Experiments on various summarization tasks show that the proposed approach maintains system performance while reducing computational cost.

Yue Dong, John Wieting, Pat Verga

147.  Faithful to the Document or to the World? Mitigating Hallucinations via Entity-Linked Knowledge in Abstractive Summarization
EMNLP, 2022 Supervised Learning

The paper discusses how existing abstractive summarization systems generate text that is not directly inferable from the source alone, resulting in content hallucinations. These hallucinations are sometimes factual but unfaithful to the source. The paper suggests that these factual hallucinations occur due to the prevalence of factual yet unfaithful entities in summarization datasets. The authors find that these entities are examples of additional world knowledge being used to connect entities and concepts. They demonstrate that connecting entities to an external knowledge base can improve the factuality of summaries without making them more extractive.

Hou Pong Chan, Lu Wang, Irwin King

148.  Controllable Summarization with Constrained Markov Decision Process
TACL, 2021 Reinforced Learning

The paper discusses controllable text summarization, which allows users to control specific attributes of generated summaries. The authors propose a new training framework based on Constrained Markov Decision Process (CMDP) that includes a reward function and constraints to improve summarization control. The reward function encourages summaries to resemble human-written references, while the constraints prevent generated summaries from violating user-imposed requirements. The framework can be used to control important attributes of summarization, such as length, covered entities, and abstractiveness. Experiments show that the CMDP framework helps generate informative summaries while complying with specific attribute requirements.

Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, Yulia Tsvetkov

149.  StructSum: Summarization via Structured Representations
EACL, 2021 Supervised Learning

The paper discusses the challenges faced by abstractive text summarization models, including layout bias, limited abstractiveness, and lack of transparency. The authors propose a framework based on document-level structure induction for summarization that incorporates latent and explicit dependencies across sentences in the source document into end-to-end single-document summarization models. The framework improves the coverage of content in the source documents, generates more abstractive summaries by generating more novel n-grams, and incorporates interpretable sentence-level structures, while performing on par with standard baselines. The framework was trained on the CNN/DM dataset.

Reinald Kim Amplayo, Mirella Lapata, Samuel L. Jackson

150.  Informative and Controllable Opinion Summarization
EACL, 2021 Supervised Learning

The paper proposes a new approach to opinion summarization that eliminates the need for pre-selected content and allows for the use of all input reviews. The approach involves condensing the reviews into multiple dense vectors which are then used as input to an abstractive model. The framework also includes a zero-shot customization technique that takes user preferences into account. Experimental results show that the proposed model outperforms existing methods on the Rotten Tomatoes dataset and generates more informative and customized summaries.

Tanya Goyal, Greg Durrett

151.  Annotating and Modeling Fine-grained Factuality in Summarization
NAACL, 2021 Supervised Learning

The paper discusses the issue of factual errors in abstractive summarization systems and explores different data sources for training models to identify these errors. The authors found that factual errors differ significantly across datasets and that human-labeled data with fine-grained annotations is more effective for training models than synthetic data or sentence-level annotations. They also show that their best factuality detection model enables training of more factual summarization models by identifying non-factual tokens in the training data.

Wang Xu, Tiejun Zhao

152.  Jointly Learning Guidance Induction and Faithful Summary Generation via Conditional Variational Autoencoders
NAACL, 2022 Supervised Learning

The paper discusses the challenges of generating factually consistent summaries through abstractive summarization and proposes a novel framework based on conditional variational autoencoders that induces guidance information and generates guided summaries at the same time. Experiments on the XSUM and CNNDM datasets show that the approach generates relevant and fluent summaries that are more faithful than existing state-of-the-art approaches according to multiple factual consistency metrics.

Arthur Bražinskas, Ramesh Nallapati, Mohit Bansal, Markus Dreyer

153.  Efficient Few-Shot Fine-Tuning for Opinion Summarization
NAACL, 2022 Supervised Learning

The paper discusses the challenges of abstractive summarization in opinion summarization due to the lack of large annotated datasets of reviews paired with reference summaries. To address this, the authors propose a few-shot method based on adapters that can easily store in-domain knowledge. Instead of fine-tuning the entire model, adapters are added and pre-trained in a task-specific way on a large corpus of unannotated customer reviews, using held-out reviews as pseudo summaries. The adapters are then fine-tuned on the small available human-annotated dataset. The authors show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning. Additionally, for summary personalization, the authors condition on aspect keyword queries, automatically created from generic datasets. This results in better-organized summary content reflected in improved coherence and fewer redundancies.

Shweta Yadav, Cornelia Caragea

154.  Towards Summarizing Healthcare Questions in Low-Resource Setting
COLING, 2022 Supervised Learning

The paper discusses the challenges of creating large-scale datasets for abstractive document summarization in closed domains like healthcare, where human annotation requires domain expertise. The authors propose a data selection strategy that uses guided semantic-overlap and diversity-based objective functions to generate diverse and semantic questions in a low-resource setting. Their experiments on benchmark healthcare question summarization datasets show that their method achieves new state-of-the-art results and generates diverse, fluent, and informative summarized questions.

Liqiang Xiao, Lu Wang, Hao He, Yaohui Jin

155.  Modeling Content Importance for Summarization with Pre-trained Language Models
EMNLP, 2020 Supervised Learning

The paper discusses the challenge of modeling content importance for summarization, which previous methods have struggled with due to their focus on word-level salience and lack of consideration for semantics and context. The authors propose a new approach that applies information theory to pretrained language models, allowing for a more comprehensive evaluation of importance that can be applied to different types of semantic units. Experiments on two datasets show that their method outperforms prior work in terms of F1 and ROUGE scores.

Yang Gao, Christian M. Meyer, Iryna Gurevych

156.  APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning
EMNLP, 2018 Reinforced Learning

The paper proposes a method for automatic document summarization that learns from users' preferences instead of using reference summaries. The method reduces sample complexity by leveraging active learning, preference learning, and reinforcement learning techniques through a new objective function. The authors conducted both simulation and real-user experiments, which showed that their method significantly advances the state of the art. The source code is available for free on GitHub.

Yichen Jiang, Mohit Bansal

157.  Closed-Book Training to Improve Summarization Encoder Memory
EMNLP, 2018 Reinforced Learning

The paper discusses the importance of a strong encoder in neural sequence-to-sequence summarization models and proposes a method to improve the encoder's memorization capabilities by adding a 'closed-book' decoder without attention and pointer mechanisms. This forces the encoder to be more selective in the information it encodes in its memory state, leading to improved performance on the CNN/Daily Mail dataset in terms of ROUGE and METEOR metrics, as well as human evaluation. The paper also presents several tests and ablations to demonstrate the effectiveness of the proposed method.

Kundan Krishna, Balaji Vasan Srinivasan

158.  Generating topic-oriented summaries using neural attention
NAACL, 2018 Supervised Learning

The paper proposes an attention-based RNN framework to generate multiple summaries of a single document that are tuned to different topics of interest. Existing summarization algorithms generate a single summary and cannot generate multiple summaries that are tailored to the interests of different readers. The proposed method outperforms existing baselines and suggests that generative networks can be successfully biased to look at sentences relevant to a topic and generate topic-tuned summaries.

Ziqiang Cao, Wenjie Li, Furu Wei, Sujian Li

159.  Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
ACL, 2018 Supervised Learning

The paper proposes a new approach to seq2seq summarization that uses existing summaries as soft templates to guide the model. The authors retrieve proper summaries as candidate templates using an IR platform and extend the seq2seq framework to conduct template reranking and template-aware summary generation. Experiments show that this approach significantly outperforms state-of-the-art methods and that even the soft templates themselves are highly competitive. Importing high-quality external summaries also improves the stability and readability of generated summaries.

Junjie Li, Xuepeng Wang, Dawei Yin, Chengqing Zong

160.  Attribute-aware Sequence Network for Review Summarization
EMNLP, 2019 Supervised Learning

The paper proposes an Attribute-aware Sequence Network (ASN) for review summarization that takes into account users' characteristics such as gender, age, and occupation. The ASN includes three modules: an attribute encoder, an attribute-aware review encoder, and an attribute-aware summary decoder. The authors validate their model using a new dataset called TripAtt, which includes 495,440 attribute-review-summary triplets. The experiments show that ASN achieves state-of-the-art performance on review summarization in both auto-metric ROUGE and human evaluation.

Haoyu Zhang, Jingjing Cai, Jianjun Xu, Ji Wang

161.  Pretraining-Based Natural Language Generation for Text Summarization
CONLL, 2019 Supervised Learning

The paper proposes a new pretraining-based encoder-decoder framework for generating output sequences from input sequences in two stages. The encoder uses BERT to encode the input sequence into context representations, while the decoder uses a Transformer-based decoder to generate a draft output sequence in the first stage. In the second stage, each word of the draft sequence is masked and fed to BERT, and the input sequence and draft representation generated by BERT are combined to predict the refined word for each masked position using a Transformer-based decoder. This approach is the first to apply BERT to text generation tasks, and the proposed method is evaluated on the text summarization task, achieving new state-of-the-art results on both CNN/Daily Mail and New York Times datasets.

Hayate Iso, Xiaolan Wang, Yoshihiko Suhara, Stefanos Angelidis, Wang-Chiew Tan

162.  Convex Aggregation for Opinion Summarization
EMNLP, 2021 Supervised Learning

The paper discusses recent advances in text autoencoders and their ability to generate grammatically correct and consistent text from aggregated latent vectors. However, the commonly used simple average approach for vector aggregation can lead to overly generic summaries due to unexpected L2-norm shrinkage in the aggregated latent vectors, which the paper refers to as summary vector degeneration. To address this issue, the authors develop a framework called COOP, which searches input combinations for the latent vector aggregation using input-output word overlap. Experimental results show that COOP successfully alleviates the summary vector degeneration issue and establishes new state-of-the-art performance on two opinion summarization benchmarks. The code for COOP is available at https://github.com/megagonlabs/coop.
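
The L2-norm shrinkage mentioned above follows from a general property of averaging: the mean of several vectors is never longer than the longest of them, and it is strictly shorter when they point in different directions. The toy example below only illustrates that effect with made-up three-dimensional vectors; it is a sketch, not the COOP search procedure, and all names and values are assumptions.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def mean_vector(vectors):
    """Element-wise simple average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical latent vectors for three reviews, each with unit length
# but pointing in a different direction.
latents = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

input_norms = [l2_norm(v) for v in latents]
avg_norm = l2_norm(mean_vector(latents))  # shorter than every input vector
```

The more the input vectors disagree in direction, the more the averaged vector shrinks toward the origin, which is the situation in which a decoder might fall back to generic text.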

Arthur Bražinskas, Mirella Lapata, Ivan Titov

163.  Learning Opinion Summarizers by Selecting Informative Reviews
EMNLP, 2021 Supervised Learning

The paper discusses the challenges of opinion summarization and proposes a new approach that involves jointly learning to select informative subsets of reviews and summarizing the opinions expressed in these subsets. The authors collected a large dataset of summaries paired with user reviews for over 31,000 products, but the large number of reviews per product made summarization impractical. The authors use amortized variational inference and policy gradient methods for joint training and demonstrate the importance of selecting informative reviews resulting in improved quality of summaries and reduced hallucinations.

Hou Pong Chan, Wang Chen, Irwin King

164.  A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss
SIGIR, 2020 Supervised Learning

The paper proposes a dual-view model that jointly improves review summarization and sentiment classification tasks. The model uses an encoder to learn a context representation for the review and a summary decoder to generate a review summary. Two sentiment classifiers are used to predict sentiment labels for the review and generated summary. An inconsistency loss is introduced during training to penalize disagreement between the two classifiers and help the decoder generate a summary with a consistent sentiment tendency. Experiment results on four real-world datasets demonstrate the effectiveness of the proposed model.
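
One way to penalize disagreement between two classifiers is to compare their predicted label distributions with a divergence such as KL. The sketch below assumes that form purely for illustration; the summary above does not state the exact formulation of the inconsistency loss, and the function name and probability values are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical sentiment distributions over (negative, neutral, positive).
review_view = [0.7, 0.2, 0.1]       # classifier reading the review
summary_agree = [0.7, 0.2, 0.1]     # summary-side classifier, same belief
summary_disagree = [0.1, 0.2, 0.7]  # summary-side classifier, opposite belief

loss_agree = kl_divergence(review_view, summary_agree)
loss_disagree = kl_divergence(review_view, summary_disagree)
```

Under this assumed form, the penalty is zero when both classifiers agree and grows as their predictions diverge, so adding it to the training objective pushes the generated summary toward the sentiment of the review.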

Xiyan Fu, Jun Wang, Jinghan Zhang, Jinmao Wei, Zhenglu Yang

165.  Document Summarization with VHTM: Variational Hierarchical Topic-Aware Mechanism
AAAI, 2020 Supervised Learning

The paper introduces a new approach, VHTM, that combines summarization with topic inference and merges topics into multiple granularity levels. This is in contrast to previous work that relied on pre-trained single-grained topic models. The approach is validated through comprehensive experiments, which demonstrate its superior performance compared to baselines.

Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

166.  Learning to summarize from human feedback
NEURIPS, 2020 Reinforced Learning

The paper discusses how language models are limited by the data and metrics used for a particular task, such as summarization models being trained to predict human reference summaries and evaluated using ROUGE. The authors propose training a model to optimize for human preferences, using a large dataset of human comparisons between summaries and reinforcement learning. They apply their method to a version of the TL;DR dataset of Reddit posts and find that their models significantly outperform both human reference summaries and larger models fine-tuned with supervised learning alone. The authors also conduct extensive analyses to understand their human feedback dataset and fine-tuned models and establish that their reward model generalizes to new datasets and results in better summaries than optimizing ROUGE according to humans. The paper aims to motivate machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.

Isabel Cachola, Kyle Lo, Arman Cohan, Daniel S. Weld

167.  TLDR: Extreme Summarization of Scientific Documents
EMNLP, 2020 Supervised Learning

The paper introduces TLDR generation, a new extreme summarization technique for scientific papers that involves compressing the source material and requires expert knowledge of the domain-specific language. To facilitate research on this task, the authors introduce SCITLDR, a dataset of 5.4K TLDRs over 3.2K papers that includes both author-written and expert-derived summaries. The authors propose CATTS, a learning strategy that uses titles as an auxiliary training signal to generate TLDRs. CATTS outperforms strong baselines under both automated metrics and human evaluations. The data and code for this research are publicly available at https://github.com/allenai/scitldr.

Sascha Rothe, Shashi Narayan

168.  Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
TACL, 2020 Unsupervised Learning

The paper discusses the effectiveness of using pre-trained checkpoints for Sequence Generation. The authors developed a Transformer-based sequence-to-sequence model that is compatible with pre-trained BERT, GPT-2, and RoBERTa checkpoints. They conducted an empirical study and found that initializing their model with these checkpoints resulted in new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion. This demonstrates the potential of pre-training for Sequence Generation tasks.

Logan Lebanoff, Franck Dernoncourt, Doo Soon Kim, Lidan Wang, Walter Chang, Fei Liu

169.  Learning to Fuse Sentences with Transformers for Summarization
EMNLP, 2020 Supervised Learning

This paper explores the ability of Transformers to fuse sentences and proposes algorithms to enhance their ability to perform sentence fusion by leveraging the knowledge of points of correspondence between sentences. The authors conducted extensive experiments to investigate the effects of different design choices on Transformers' performance and found that modeling points of correspondence between sentences is crucial for effective sentence fusion. The ability to fuse sentences is important for summarization systems to produce succinct abstracts, but current summarizers can fail on fusing sentences, leading to few summary sentences or incorrect fusions that fail to retain the original meaning.

Rahul Aralikatte, Shashi Narayan, Joshua Maynez, Sascha Rothe, Ryan McDonald

170.  Focus Attention: Promoting Faithfulness and Diversity in Summarization
ACL, 2021 Supervised Learning

The paper introduces a new method called Focus Attention Mechanism that encourages seq2seq decoders to generate tokens that are similar or topical to the input document. The authors also propose a Focus Sampling method to enable the generation of diverse summaries. The evaluation on the BBC extreme summarization task shows that models augmented with Focus Attention generate summaries that are closer to the target and more faithful to their input documents, outperforming their vanilla counterparts on ROUGE and multiple faithfulness measures. The paper also demonstrates that Focus Sampling is more effective in generating diverse and faithful summaries than other decoding methods.

Shuyang Cao, Lu Wang

171.  HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization
ACL, 2022 Supervised Learning

The paper discusses the importance of document structure for efficient information consumption, but notes that it is difficult to encode this structure into modern Transformer architectures. The authors present HIBRIDS, which injects hierarchical biases into the calculation of attention scores to capture document structure. They also introduce a new task, hierarchical question-summary generation, which involves summarizing content into a hierarchy of questions and summaries. The authors annotate a new dataset with over 6,000 question-summary hierarchies labeled on long government reports and show that their model produces better hierarchies than comparisons on both hierarchy quality and content coverage. The model also improves the generation of long-form summaries from government reports and Wikipedia articles, as measured by ROUGE scores.

Ziqiang Cao, Furu Wei, Sujian Li, Wenjie Li, Ming Zhou, Houfeng Wang

172.  Learning Summary Prior Representation for Extractive Summarization
ACL, 2015 Supervised Learning

The paper introduces the concept of summary prior, which captures how suitable a sentence is for inclusion in a summary without considering its context. The authors propose a new summarization system called PriorSum, which uses convolutional neural networks to capture summary prior features from length-variable phrases. The learned prior features are combined with document-dependent features for sentence ranking. Experiments on the DUC generic summarization benchmarks show that PriorSum outperforms existing methods and can identify different aspects supporting the summary prior.

Hongyan Xu, Hongtao Liu, Pengfei Jiao, Wenjun Wang

173.  Transformer Reasoning Network for Personalized Review Summarization
SIGIR, 2021 Supervised Learning

The paper proposes a novel transformer-based reasoning framework for personalized review summarization in E-commerce platforms. The quality of generated summaries is highly related to the characteristics of users and products, including their historical summaries. However, most previous works ignore the interaction between the input review and corresponding historical summaries. The proposed approach involves inter- and intra-attention in the encoder to learn the personalized representation of the input review and a memory-decoder attention module in the decoder to retrieve more useful information for the final summary generation. The approach outperforms many competitive baseline methods in generating more reasonable summaries for recommendation.

Yixin Liu, Zi-Yi Dou, Pengfei Liu

174.  RefSum: Refactoring Neural Summarization
NAACL, 2021 Supervised Learning

The paper presents a new framework called Refactor for text summarization and summary combination. The authors highlight the limitations of previous methods and perform a comprehensive evaluation involving twenty-two base systems, four datasets, and three different application scenarios. The Refactor model achieves new state-of-the-art results on the CNN/DailyMail dataset and addresses the limitations of traditional methods. The authors open-source all the code and provide a convenient interface for other researchers to use as an off-the-shelf tool to achieve further performance improvements.

Yang Liu, Sheng Shen, Mirella Lapata

175.  Noisy Self-Knowledge Distillation for Text Summarization
NAACL, 2021 Supervised Learning

The paper proposes a new method called self-knowledge distillation for text summarization that can improve the training process by using guidance from a teacher model and multiple noise signals to better model uncertainty. The proposed method achieves state-of-the-art results on three benchmarks for both pretrained and non-pretrained summarizers.

Hong Wang, Xin Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang

176.  Self-Supervised Learning for Contextualized Extractive Summarization
ACL, 2019 Supervised Learning

The paper proposes a new approach to improve extractive summarization by introducing three pre-training tasks that capture document-level context in a self-supervised manner. The proposed method is validated through experiments on the CNN/DM dataset, and the results show that a simple model with pre-training outperforms previous state-of-the-art models.

Luyang Huang, Shuyang Cao, Nikolaus Parulian, Heng Ji, Lu Wang

177.  Efficient Attentions for Long Document Summarization
NAACL, 2021 Supervised Learning

HEPOS is a new efficient encoder-decoder attention model that effectively identifies important information from a source document for summarization. The authors conducted a study of existing efficient self-attentions and combined them with HEPOS to process ten times more tokens than existing models that use full attentions. They also presented a new dataset, GOVREPORT, with longer documents and summaries, and showed that their models produced significantly higher ROUGE scores than competitive comparisons, including new state-of-the-art results on PubMed. Human evaluation also showed that their models generated more informative summaries with fewer unfaithful errors.

Reinald Kim Amplayo, Stefanos Angelidis, Mirella Lapata

178.  Aspect-Controllable Opinion Summarization
EMNLP, 2021 Supervised Learning

This paper proposes a new approach for generating customized summaries based on aspect queries, such as describing the location and room of a hotel. The authors create a synthetic training dataset enriched with aspect controllers and fine-tune a pretrained model to generate aspect-specific summaries. Experiments show that their model outperforms previous state-of-the-art methods and can generate personalized summaries by controlling the number of aspects discussed.

Duy-Hung Nguyen, Nguyen Viet Dung Nghiem, Bao-Sinh Nguyen, Dung Tien Le, Shahab Sabahi, Minh-Tien Nguyen, Hung Le, Hoang Cau, Dong Da

179.  Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback
NAACL, 2022 Reinforced Learning

The paper discusses the importance of incorporating human preferences in summarization models to align with human interests. It proposes a new framework for training summarization models with preference feedback in an interactive manner, leveraging offline data and a novel reward model to improve performance and sample efficiency. The experiments conducted on three datasets confirm the benefits of the proposed framework in active, few-shot, and online settings of preference learning.

Yanjun Gao, Timothy Miller, Dongfang Xu, Matthew M. Churpek, Majid Afshar

180.  Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models
COLING, 2022 Supervised Learning

The paper proposes a new NLP task of generating a list of problems in a patient's daily care plan using input from provider's progress notes during hospitalization. The study investigates the performance of T5 and BART, two state-of-the-art seq2seq transformer architectures, in solving this problem. The evaluation methods include ROUGE, BERTScore, cosine similarity on sentence embedding, and F-score on medical concepts. The results show that T5 with domain adaptive pre-training achieves significant performance gains compared to a rule-based system and general domain pre-trained language models, indicating a promising direction for tackling the problem summarization task. The study provides a corpus built on top of progress notes from publicly available electronic health record progress notes in the Medical Information Mart for Intensive Care (MIMIC)-III.

Yumo Xu, Mirella Lapata

181.  Document Summarization with Latent Queries
TACL, 2022 Supervised Learning

The paper discusses the development of neural models for creating generic summaries for single or multiple documents, driven by the availability of large-scale datasets. However, for query-focused summarization (QFS), labeled training data is not easily accessible. The authors propose a unified modeling framework for any type of summarization, assuming that all summaries are a response to a query, which is observed in QFS and latent in generic summarization. They model queries as discrete latent variables over document tokens and learn representations compatible with observed and unobserved query verbalizations. The framework formulates summarization as a generative process and optimizes a latent query model and a conditional language model. Despite learning from generic summarization data only, their approach outperforms strong comparison systems across benchmarks, query types, document settings, and target domains.

Mittul Singh, Arunav Mishra, Youssef Oualil, Klaus Berberich, Dietrich Klakow

182.  Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization
ECIR, 2018 Unsupervised Learning

The paper discusses the use of long-span language models (LMs) in unsupervised query-focused extractive summarization systems. The authors propose the use of Across Sentence Boundary LSTM-based LMs (ASBLSTM and biASBLSTM) that are specifically designed for this task. They conducted experiments on a real-world corpus with 100 Wikipedia event descriptions as queries and found that using the long-span models in an integer linear programming (ILP) formulation of MMR criterion was the most effective approach compared to several state-of-the-art baseline methods from the literature.
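
For readers unfamiliar with the MMR (maximal marginal relevance) criterion mentioned above, the sketch below shows the standard greedy form of MMR, not the ILP formulation the paper uses. It also substitutes a simple word-overlap (Jaccard) similarity for the long-span language model scores; `jaccard`, `mmr_select`, and the trade-off parameter `lam` are illustrative names.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a language-model-based score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def mmr_select(query, sentences, k=2, lam=0.7, sim=jaccard):
    """Greedily pick k sentences, trading query relevance against redundancy."""
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = sim(sentences[i], query)
            # Redundancy is the highest similarity to any already selected sentence.
            redundancy = max(
                (sim(sentences[i], sentences[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


query = "earthquake japan"
sentences = [
    "earthquake hits japan",
    "earthquake hits japan today",
    "japan rescue teams deployed",
]
diverse = mmr_select(query, sentences, k=2, lam=0.5)
relevance_only = mmr_select(query, sentences, k=2, lam=1.0)
```

With `lam=0.5` the near-duplicate second sentence is skipped in favor of the rescue sentence, while `lam=1.0` ignores redundancy and keeps both earthquake sentences.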

Ryuji Kano, Yasuhide Miura, Tomoki Taniguchi, Tomoko Ohkuma

183.  Identifying Implicit Quotes for Unsupervised Extractive Summarization of Conversations
AACL, 2020 Unsupervised Learning

The paper proposes an unsupervised extractive neural summarization model called Implicit Quote Extractor for conversational texts. The model aims to extract quoted sentences as summaries, even if they are not explicitly shown in replies. The training task of the model is to predict whether a reply candidate is a true reply to a post, and to do so, the model learns to extract sentences that replies frequently refer to. The model is evaluated on two email datasets and one social media dataset, and the results confirm that it is useful for extractive summarization. The paper also discusses whether quote extraction is an important factor for summarization and whether the model can capture salient sentences that conventional methods cannot.

Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Ming Zhou

184.  Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers
EMNLP, 2020 Unsupervised Learning

The paper discusses a new method for unsupervised extractive document summarization, which involves selecting important sentences from a document without using labeled summaries during training. The authors propose using transformer attentions to rank sentences, and pre-train a hierarchical transformer model using unlabeled documents only. They then use sentence-level self-attentions and pre-training objectives to rank sentences. Experiments on CNN/DailyMail and New York Times datasets show that their model achieves state-of-the-art performance on unsupervised summarization, and is less dependent on sentence positions. When combined with a recent unsupervised model explicitly modeling sentence positions, the results are even better.

Xinnian Liang, Shuangzhi Wu, Mu Li, Zhoujun Li

185.  Improving Unsupervised Extractive Summarization with Facet-Aware Modeling
ACL, 2021 Unsupervised Learning

The paper discusses the problem of facet bias in unsupervised extractive summarization, where existing graph-based methods tend to select sentences within the same facet. To address this, the authors propose a facet-aware centrality-based ranking model that introduces a sentence-document weight to pay more attention to different facets. The method is evaluated on 8 benchmark datasets and consistently outperforms strong baselines, especially in long and multi-document scenarios. The performance gains are attributed to alleviating the facet bias problem.

Vishakh Padmakumar

186.  Unsupervised Extractive Summarization using Pointwise Mutual Information
EACL, 2021 Unsupervised Learning

The paper proposes a new approach to unsupervised extractive summarization using pointwise mutual information (PMI) between sentences to measure relevance and redundancy. The method involves a greedy sentence selection algorithm to maximize relevance and minimize redundancy of extracted sentences. The authors show that their method outperforms similarity-based methods on datasets in various domains, including news, medical journal articles, and personal anecdotes.

Somnath Basu Roy Chowdhury, Chao Zhao, Snigdha Chaturvedi

187.  Unsupervised Extractive Opinion Summarization Using Sparse Coding
ACL, 2022 Unsupervised Learning

The paper presents a new method called Semantic Autoencoder (SemAE) for extractive opinion summarization in an unsupervised manner. SemAE uses dictionary learning to capture semantic information from reviews and learns a latent representation of each sentence over semantic units. The extractive summarization algorithm leverages these representations to identify representative opinions among hundreds of reviews. SemAE can also perform controllable summarization to generate aspect-specific summaries. The authors report strong performance on SPACE and AMAZON datasets and provide their code publicly.

Stefanos Angelidis, Mirella Lapata

188.  Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised
EMNLP, 2018 Supervised Learning

The paper presents a neural framework for summarizing opinions from online product reviews. The framework is knowledge-lean and only requires light supervision in the form of product domain labels and user-provided ratings. The method combines two weakly supervised components to identify salient opinions and form extractive summaries from multiple reviews. The authors introduce an opinion summarization dataset that includes a training set of product reviews from six diverse domains and human-annotated development and test sets with gold standard aspect annotations, salience labels, and opinion summaries. Automatic evaluation shows significant improvements over baselines, and a large-scale study indicates that the opinion summaries generated by the framework are preferred by human judges according to multiple criteria.

Yue Dong, Andrei Mircea, Jackie C. K. Cheung

189.  Discourse-Aware Unsupervised Summarization of Long Scientific Documents
EACL, 2021 Unsupervised Learning

The paper proposes an unsupervised graph-based ranking model for summarizing long scientific documents. The method uses a two-level hierarchical graph representation of the document and asymmetrical positional cues to determine sentence importance. The approach outperforms strong unsupervised baselines in automatic metrics and human evaluation on the PubMed and arXiv datasets. It also achieves performance comparable to many state-of-the-art supervised approaches. The results suggest that patterns in the discourse structure are a strong signal for determining importance in scientific articles.

Gabriel Shenouda, Christophe Rodrigues, Aurélien Bossard

190.  SummVD : An efficient approach for unsupervised topic-based text summarization
AACL, 2022 Unsupervised Learning

The paper introduces a new method called SummVD for automatic unsupervised extractive summarization. It uses singular value decomposition and word clustering to reduce the dimensionality of word embeddings, representing words on a small number of dimensions that each correspond to a hidden topic. This makes SummVD an efficient method for text summarization, outperforming recent extractive approaches. It requires few resources in terms of data and computing power, making it suitable for use in live summarization systems.

Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong, Fang Chen

191.  Summarization Evaluation in the Absence of Human Model Summaries Using the Compositionality of Word Embeddings
COLING, 2018

The paper presents a new approach for evaluating the quality of summaries without the need for human model summaries. The approach uses word embeddings to develop features that reflect coverage, diversity, informativeness, and coherence of summaries. These features are then used to train a learning model for predicting summary content quality. The proposed metric was evaluated on data from query-focused and update summarization tasks in TAC 2008 and 2009, and the results show that the feature combination provides reliable estimates of summary content quality when model summaries are not available.

Tuba Gokhan, Phillip Smith, Mark Lee

192.  GUSUM: Graph-Based Unsupervised Summarization using Sentence Features Scoring and Sentence-BERT
COLING, 2022 Unsupervised Learning

The paper presents a new method for unsupervised extractive document summarization called Graph-Based Unsupervised Summarization (GUSUM). The method uses sentence embeddings and features to modify traditional graph ranking algorithms and compute sentence centrality. The approach aims to include the most important sentences while excluding those with similar meanings in the summary. The method is evaluated on several datasets and achieves high performance when evaluated both automatically and by humans.

Hao Zheng, Mirella Lapata

193.  Sentence Centrality Revisited for Unsupervised Summarization
ACL, 2019 Unsupervised Learning

The paper discusses the development of an unsupervised approach for single document summarization, which utilizes a modified graph-based ranking algorithm. The algorithm incorporates BERT, a neural representation learning model, to capture sentential meaning, and builds graphs with directed edges to consider the relative position of nodes in a document. The approach was tested on three news summarization datasets and outperformed strong baselines by a significant margin. The authors argue that this approach is more realistic than relying on large-scale and high-quality training data for different types of summaries, domains, or languages.
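
The position-aware, directed-edge centrality described above can be sketched as follows. This is a simplified illustration under stated assumptions: bag-of-words cosine similarity stands in for the BERT sentence representations, and the weights `w_prev` and `w_next` (for edges to preceding and following sentences) are hypothetical values, not the paper's settings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Bag-of-words cosine similarity; a stand-in for BERT-based similarity."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def directed_centrality(sentences, w_prev=0.2, w_next=0.8):
    """Score each sentence, weighting similarity to later sentences differently
    from similarity to earlier ones."""
    scores = []
    for i, s in enumerate(sentences):
        prev = sum(cosine(s, sentences[j]) for j in range(i))
        nxt = sum(cosine(s, sentences[j]) for j in range(i + 1, len(sentences)))
        scores.append(w_prev * prev + w_next * nxt)
    return scores


doc = ["the cat sat", "the cat ran", "dogs bark"]
scores = directed_centrality(doc)
```

Because similarity to following sentences is weighted more heavily here, the first of two similar sentences scores higher than the second, which is how directed edges let position influence the ranking.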

Raphael Schumann, Lili Mou, Yao Lu, Olga Vechtomova, Katja Markert

194.  Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction
ACL, 2020 Unsupervised Learning

The paper discusses the process of automatic sentence summarization, which involves creating a shorter version of a sentence while retaining its most important information. The authors propose an unsupervised objective function that considers language fluency and semantic similarity metrics to find a high-scoring summary through discrete optimization. Their method achieves a new state-of-the-art for unsupervised sentence summarization according to ROUGE scores. The authors also highlight the sensitivity of the commonly reported ROUGE F1 metric to summary length and suggest that future evaluation should group summarization systems by output length brackets.

Han Xu, Eric Martin, Ashesh Mahidadia

195.  Extractive Summarisation Based on Keyword Profile and Language Model
NAACL, 2015 Unsupervised Learning

The paper presents a statistical framework for summarizing scientific papers by extracting information-rich citation sentences that capture the main contributions of the paper. The framework involves two stages, where salient keywords are automatically discovered in the first stage and citation sentences that best capture the paper's main contributions are identified in the second stage. The approach outperforms current state-of-the-art systems in scientific paper summarization using methods rooted in quantitative statistics and information theory.

Daraksha Parveen, Michael Strube

196.  Integrating Importance, Non-Redundancy and Coherence in Graph-Based Extractive Summarization
IJCAI, 2015 Unsupervised Learning

The paper proposes a graph-based method for extractive single-document summarization that considers importance, non-redundancy, and local coherence simultaneously. The method uses a bipartite graph consisting of sentence and entity nodes to rank sentences based on importance and ensure non-redundancy and local coherence of the summary. The method is applied to scientific articles from the journal PLOS Medicine and achieves better results than other systems on this data. The method also achieves state-of-the-art results on DUC 2002 data, and incorporating the local coherence measure always achieves the best results. Human judgments are used to evaluate the coherence of the summaries.

Daraksha Parveen, Hans-Martin Ramsl, Michael Strube

197.  Topical Coherence for Graph-based Extractive Summarization
EMNLP, 2015 Unsupervised Learning

The paper presents an approach for extractive single-document summarization using a weighted graphical representation of documents obtained by topic modeling. The approach optimizes importance, coherence, and non-redundancy simultaneously using ILP. The system's performance is compared with state-of-the-art results on scientific articles from PLOS Medicine and on DUC 2002 data using ROUGE scores. Human judges evaluate the coherence of summaries generated by the system in comparison to two baselines, and the approach obtains competitive performance.

Zhongyu Wei, Wei Gao

198.  Gibberish, Assistant, or Master? Using Tweets Linking to News for Extractive Single-Document Summarization
SIGIR, 2015 Unsupervised Learning

The paper explores using tweets linking to news for generating extractive summaries of documents. Regarding every tweet as a vote for candidate sentences, the authors use unsupervised summarization models to rank candidate extracts via random walk on a heterogeneous graph. The linking tweets opportunistically "supervise" the summarization with no need for reference summaries. The influence of the volume and latency of tweets on the quality of output summaries is analyzed. Compared to truly supervised summarizers unaware of tweets, the method achieves significantly better results with a reasonably small tradeoff on latency. Compared to the same summarizers using tweets as auxiliary features, the method performs comparably while needing fewer tweets and much less time to achieve significant outperformance.

Pengjie Ren, Furu Wei, Zhumin Chen, Jun Ma, Ming Zhou

199.  A Redundancy-Aware Sentence Regression Framework for Extractive Summarization
COLING, 2016 Supervised Learning

The paper proposes a new approach to extractive summarization that models sentence importance and redundancy simultaneously by evaluating the relative importance of a sentence given a set of selected sentences. The proposed method uses a new framework to conduct regression with respect to the relative gain of a sentence calculated by the ROUGE metric and incorporates additional features derived from sentence relations. Experiments on multi-document summarization datasets show that the proposed method outperforms state-of-the-art extractive summarization approaches.
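
The notion of a sentence's relative gain given already selected sentences can be illustrated with a simplified ROUGE-n recall. The sketch below is an assumption-laden approximation: it uses whitespace tokenization, a single reference, and no stemming, and the helper names `ngrams`, `rouge_n_recall`, and `relative_gain` are illustrative. It only computes the quantity that would serve as a regression target, not the regression model itself.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(summary_sents, reference, n=1):
    """Clipped n-gram recall of the summary sentences against one reference."""
    ref = ngrams(reference.lower().split(), n)
    summ = Counter()
    for s in summary_sents:
        summ += ngrams(s.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, summ[gram]) for gram, count in ref.items())
    return overlap / total


def relative_gain(candidate, selected, reference, n=1):
    """ROUGE improvement from adding the candidate to the selected sentences."""
    with_candidate = rouge_n_recall(selected + [candidate], reference, n)
    return with_candidate - rouge_n_recall(selected, reference, n)


reference = "the cat sat on the mat"
first_pick = relative_gain("the cat sat", [], reference)
redundant = relative_gain("a cat sat", ["the cat sat"], reference)
complementary = relative_gain("on the mat", ["the cat sat"], reference)
```

A sentence that repeats already covered content gets a gain of zero, while one covering the remaining reference words gets a large gain, so importance and redundancy are captured by one number.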

Antoine J.-P. Tixier, Polykarpos Meladianos, Michalis Vazirgiannis

200.  Combining Graph Degeneracy and Submodularity for Unsupervised Extractive Summarization
EMNLP, 2017 Unsupervised Learning

The paper presents an unsupervised text summarization system that uses a submodularity framework to generate summaries in a greedy way while maintaining high performance. The system includes a novel coverage reward term that assigns scores to words based on the graph-of-words representation of text and the k-core decomposition algorithm. The system was evaluated on three datasets and achieved state-of-the-art performance, particularly in the meeting domain.
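
To illustrate how a graph-of-words and k-core decomposition can assign scores to words, the sketch below builds a co-occurrence graph from a token list and computes each word's core number by repeatedly peeling the lowest-degree node. The function names and the `span` parameter (how many following tokens each word links to) are assumptions for illustration; how the paper turns core numbers into its coverage reward is not specified in the summary.

```python
from collections import defaultdict


def graph_of_words(tokens, span=1):
    """Undirected graph linking each token to the next `span` tokens."""
    adj = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + span, len(tokens))):
            if tokens[j] != w:
                adj[w].add(tokens[j])
                adj[tokens[j]].add(w)
    return adj


def core_numbers(adj):
    """Core number of every node, via iterative removal of the min-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)
        # The core level never decreases as nodes are peeled away.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# A triangle a-b-c with a pendant node d: the triangle forms the 2-core.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
toy_cores = core_numbers(toy)
text_cores = core_numbers(graph_of_words("a b c a".split(), span=1))
```

Words with higher core numbers sit in denser parts of the graph, which is the intuition behind using them as indicators of importance.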

Tsutomu Hirao, Masaaki Nishino, Jun Suzuki, Masaaki Nagata

201.  Enumeration of Extractive Oracle Summaries
EACL, 2017 Unsupervised Learning

The paper proposes an Integer Linear Programming formulation to obtain extractive oracle summaries in terms of ROUGE-n and an algorithm that enumerates all of the oracle summaries for a set of reference summaries to evaluate system summaries. The experimental results show that there is room for improvement in extractive summarization and that F-measures derived from the enumerated oracle summaries have stronger correlations with human judgment than those derived from single oracle summaries.

Sansiri Tarnpradab, Fei Liu, Kien A. Hua

202.  Toward Extractive Summarization of Online Forum Discussions via Hierarchical Attention Networks
AAAI, 2017 Supervised Learning

This paper discusses the task of forum thread summarization, which has not been extensively studied. The authors propose a model that uses hierarchical attention networks and neural attention mechanisms to build sentence and thread representations for summarization. The results show that their approach outperforms other methods and that removing redundancies is important for achieving the best results.

Masaru Isonuma, Toru Fujino, Junichiro Mori, Yutaka Matsuo, Ichiro Sakata

203.  Extractive Summarization Using Multi-Task Learning with Document Classification
EMNLP, 2017 Supervised Learning

The paper proposes a framework for automatic document summarization that extracts sentences using externally related information. The focus is on single document summarization using small amounts of reference summaries, and the framework uses multitask learning with curriculum learning for sentence extraction and document classification. The proposed method is evaluated on financial report and news corpus datasets, and the results show comparable performance to state-of-the-art systems.

Mousumi Akter

204.  Rank-Aware Gain-Based Evaluation of Extractive Summarization
CIKM, 2022

The paper discusses the limitations of the ROUGE metric for evaluating extractive summarization tasks and proposes a new evaluation metric called Sem-nCG, which is both rank-aware and semantic-aware. The paper also demonstrates how to generate more reliable semantic-aware ground truths for evaluating extractive summarization tasks without additional human intervention. Preliminary experimental results show that the Sem-nCG metric is semantic-aware and has a higher correlation with human judgement for single document summarization when a single reference is considered.

Ed Collins, Isabelle Augenstein, Sebastian Riedel

205.  A Supervised Approach to Extractive Summarisation of Scientific Papers
CONLL, 2017 Supervised Learning

The paper discusses the challenges of summarizing large, complex scientific publications using neural approaches, which require large datasets. The authors introduce a new dataset for summarization of computer science publications and develop models using both neural sentence encoding and traditional summarization features. They find that models that encode sentences and their local and global context perform best, outperforming established baseline methods.

Ramesh Nallapati, Feifei Zhai, Bowen Zhou

206.  SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents
AAAI, 2017 Supervised Learning

The paper presents SummaRuNNer, a Recurrent Neural Network (RNN) based model for extractive summarization of documents. The model achieves performance better than or comparable to state-of-the-art and is very interpretable, allowing visualization of its predictions broken up by abstract features such as information content, salience, and novelty. The paper also introduces abstractive training of the extractive model, which can train on human-generated reference summaries alone, eliminating the need for sentence-level extractive labels.

Abhishek Kumar Singh, Manish Gupta, Vasudeva Varma

207.  Hybrid MemNet for Extractive Summarization
CIKM, 2017 Supervised Learning

The paper discusses the problem of extractive text summarization and the limitations of conventional approaches that rely on manually compiled features. The authors propose a data-driven system called Hybrid MemNet, which uses an end-to-end deep network to learn a continuous unified representation of a document and generate its summary. The system captures both local and global sentential information and identifies summary-worthy sentences. Experimental results on two corpora show significant performance gains compared to state-of-the-art baselines.

Aishwarya Jadhav, Vaibhav Rajan

208.  Extractive Summarization with SWAP-NET: Sentences and Words from Alternating Pointer Networks
ACL, 2018 Supervised Learning

SWAP-NET is a new neural sequence-to-sequence model for extractive summarization that identifies both salient sentences and key words in an input document, and then combines them to form the extractive summary. The model uses a new two-level pointer network based architecture that models the interaction of key words and salient sentences. Experiments on large scale benchmark corpora demonstrate that SWAP-NET outperforms state-of-the-art extractive summarizers.

Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman

209.  SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
EMNLP, 2022

The paper discusses the challenges of assembling summarization datasets and proposes a new approach of hiring contractors to write original summaries from scratch. The resulting dataset, SQuALITY, consists of question-focused summaries and is shown to be challenging for state-of-the-art summarization systems. The authors also note that existing automatic evaluation metrics are weak indicators of summary quality. SQuALITY is available for use at https://github.com/nyu-mll/SQuALITY.

Kristjan Arumae, Fei Liu

210.  Reinforced Extractive Summarization with Question-Focused Rewards
ACL, 2018 Reinforced Learning

The paper proposes a new training method for extractive summarization using Cloze-style comprehension questions instead of human abstracts, which are often inaccurate due to difficulty aligning them with source documents. The method encourages system summaries to preserve important source content and share common words with the abstracts, and uses reinforcement learning with a question-focused reward function to promote concise, fluent, and informative summaries. Experiments show that the proposed method is effective and outperforms state-of-the-art systems on standard summarization datasets.

Chong Feng, Fei Cai, Honghui Chen, Maarten de Rijke

211.  Attentive Encoder-based Extractive Text Summarization
CIKM, 2018 Supervised Learning

The paper proposes an attentive encoder-based summarization (AES) model for generating article summaries that considers both the global information of a document and the relationships of sentences in the document. The model uses both unidirectional and bidirectional recurrent neural networks (RNNs) to construct encoders, resulting in unidirectional attentive encoder-based summarization (Uni-AES) and bidirectional attentive encoder-based summarization (Bi-AES). The experimental results show that Bi-AES outperforms Uni-AES and achieves substantial improvements over a relevant baseline.

Shashi Narayan, Shay B. Cohen, Mirella Lapata

212.  Ranking Sentences for Extractive Summarization with Reinforcement Learning
NAACL, 2018 Reinforced Learning

This paper proposes a new algorithm for single document summarization, which is the task of creating a shorter version of a document while retaining its main information. The algorithm is based on a sentence ranking task and uses a reinforcement learning objective to optimize the ROUGE evaluation metric. The authors trained a neural summarization model using this algorithm on the CNN and DailyMail datasets and found that it outperformed existing extractive and abstractive systems in both automatic and human evaluations.
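As a rough illustration of what optimizing ROUGE as a reward involves, the sketch below scores a set of extracted sentences against a reference summary with a simplified ROUGE-1 F1 (clipped unigram overlap on lowercased, whitespace-split tokens). It is not the official ROUGE toolkit and not the authors' reward code; the function names `rouge1_f1` and `summary_reward` are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming or stopword handling."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection clips repeated words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def summary_reward(sentences, selected_indices, reference):
    """Reward for an extraction: score the chosen sentences, kept in document order."""
    summary = " ".join(sentences[i] for i in sorted(selected_indices))
    return rouge1_f1(summary, reference)
```

In a reinforcement learning setup, a scalar like this would be computed for each sampled extraction and used to weight the policy update; that training loop is omitted here.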

Jonathan Pilault, Raymond Li, Sandeep Subramanian, Christopher Pal

213.  On Extractive and Abstractive Neural Document Summarization with Transformer Language Models
EMNLP, 2020 Supervised Learning

The paper presents a method for producing abstractive summaries of long documents using neural abstractive summarization. The method involves performing a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information. The approach produces more abstractive summaries compared to prior work that employs a copy mechanism, while still achieving higher ROUGE scores. The authors provide extensive comparisons with strong baseline methods and multiple variants of their approach, using four different summarization tasks and datasets. They find that transformer-based methods produce summaries with fewer n-gram copies, leading to n-gram copying statistics that are more similar to human-generated abstracts. A human evaluation shows that transformers are ranked highly for coherence and fluency, but purely extractive methods score higher for informativeness and relevance. The authors hope that their architectures and experiments may serve as strong points of comparison for future work.

Sen Zhang, Jianwei Niu, Chuyuan Wei

214.  Fine-grained Factual Consistency Assessment for Abstractive Summarization Models
EMNLP, 2021

This paper proposes a framework called SumFC for assessing the factual consistency of abstractive summarization models. SumFC uses a two-stage approach to select relevant sentences and perform fine-grained consistency reasoning at the sentence level. The model is trained using data synthesis and contrastive loss to identify subtle cues. Experimental results show that SumFC outperforms previous methods and can distinguish detailed differences better.

Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie C.K. Cheung

215.  BANDITSUM: Extractive Summarization as a Contextual Bandit
EMNLP, 2018 Reinforced Learning

The paper proposes a new method called BANDITSUM for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. The approach treats extractive summarization as a contextual bandit problem, where the model chooses a sequence of sentences to include in the summary based on the document context. A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. The experiments show that BANDITSUM achieves better or comparable ROUGE scores than state-of-the-art approaches and converges using fewer update steps. Additionally, BANDITSUM performs significantly better than competing approaches when good summary sentences appear late in the source document.

Ryuji Kano, Yasuhide Miura, Motoki Taniguchi, Yan-Ying Chen, Francine Chen, Tomoko Ohkuma

216.  Harnessing Popularity in Social Media for Extractive Summarization of Online Conversations
EMNLP, 2018 Supervised Learning

The paper discusses using popularity measures in social media as a way to summarize online conversations. The authors propose a Disjunctive model that separates the contribution of content and context in determining popularity. They evaluate their model using a dataset where the informativeness of comments is annotated and show that their model outperforms baseline models that use popularity as a measure of informativeness.

Xingxing Zhang, Mirella Lapata, Furu Wei, Ming Zhou

217.  Neural Latent Extractive Document Summarization
EMNLP, 2018 Supervised Learning

The paper proposes a new approach to extractive summarization that uses a latent variable model where sentences are viewed as latent variables. This approach avoids the need for heuristically created sentence-level labels, which may be suboptimal. Instead, sentences with activated variables are used to infer gold summaries, and the loss during training comes directly from these summaries. The model was tested on the CNN/Dailymail dataset and was found to outperform a strong extractive baseline trained on heuristically approximated labels and perform competitively with several recent models.

Jiaxin Shi, Chen Liang, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang

218.  DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization
AAAI, 2019 Supervised Learning

DeepChannel is a neural model for extractive document summarization that uses a salience score to represent the importance of sentences in a document. The salience score is estimated using an attention-based deep neural network, and the model uses a contrastive training strategy to learn the salience estimation network. The most salient sentences are iteratively extracted from the document to generate a summary. The model achieves state-of-the-art ROUGE scores on the CNN/Daily Mail dataset and shows strong robustness in out-of-domain tests. It also demonstrates tremendous data efficiency, achieving a high ROUGE-1 F-1 score with only 1/100 of the training set.

Yang Gao, Christian Meyer, Mohsen Mesgar, Iryna Gurevych

219.  Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation
IJCAI, 2019 Reinforced Learning

The paper proposes a new approach to document summarization using Reinforcement Learning (RL) algorithms. The approach, called RELIS, learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. This approach reduces training time by two orders of magnitude compared to state-of-the-art models while performing on par with them. The authors prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. The approach is evaluated on extractive multi-document summarization.

Jiacheng Xu

220.  Neural Extractive Text Summarization with Syntactic Compression
EMNLP, 2019 Supervised Learning

The paper discusses recent neural network approaches to summarization, which are either selection-based extraction or generation-based abstraction. The authors present a neural model for single-document summarization that combines extraction and syntactic compression. The model selects sentences from the document, identifies possible compressions based on constituency parses, and scores those compressions with a neural model to produce the final summary. The authors construct oracle extractive-compressive summaries for learning and achieve strong performance on the CNN/Daily Mail and New York Times datasets, outperforming an off-the-shelf compression module. Human and manual evaluation shows that the model's output generally remains grammatical.

Zhengyuan Liu, Nancy F. Chen

221.  Exploiting Discourse-Level Segmentation for Extractive Summarization
EMNLP, 2019 Supervised Learning

The paper proposes using discourse-level segmentation to improve extractive summarization, as it can more precisely identify the core content in a document compared to using sentences as the elementary unit. The authors investigate the effectiveness of this approach using two basic neural network architectures and a deep bi-directional transformer, and achieve state-of-the-art performance when combining discourse-level segmentation with their adapted contextual representation model on the CNN/Daily Mail dataset.

Wen Xiao, Giuseppe Carenini

222.  Extractive Summarization of Long Documents by Combining Global and Local Context
EMNLP, 2019 Supervised Learning

The paper proposes a new neural summarization model for long documents that considers both global and local context. The model outperforms previous work on two scientific paper datasets and shows that its benefits increase with longer documents. Surprisingly, the study finds that the benefits of the model come mainly from modeling the local context, even for the longest documents.

Ruipeng Jia, Yanan Cao, Haichao Shi, Fang Fang, Yanbing Liu, Jianlong Tan

223.  DistilSum: Distilling the Knowledge for Extractive Summarization
CIKM, 2020 Supervised Learning

DistilSum is a new approach to extractive summarization that consists of a teacher mechanism and a student model. The teacher mechanism produces high-entropy soft targets at a high temperature; the student model is trained to match these targets and is then tested with a temperature of 1 to distill for ground-truth labels. Compared to the current best extractive classifier, BERTSUMEXT, DistilSum achieves a substantial improvement in both text similarity and performance of the classifier on the CNN/DM dataset. The source code for DistilSum will be available on Github.
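The role of temperature in producing high-entropy soft targets can be seen in a small sketch. It assumes the usual temperature-scaled softmax from knowledge distillation; the logits are made-up values, and treating them as one distribution over candidate sentences is an assumption for illustration, not a detail taken from the paper.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher temperature flattens the output."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


teacher_logits = [4.0, 1.0, 0.5]  # hypothetical teacher scores
hard = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=5.0)

print("T=1 entropy:", round(entropy(hard), 3))
print("T=5 entropy:", round(entropy(soft), 3))
```

The ranking of the scores is unchanged by the temperature; only the sharpness of the distribution differs, which is why testing at a temperature of 1 recovers near-hard decisions.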

Matt Grenander, Yue Dong, Jackie C. K. Cheung, Annie Louis

224.  Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses
EMNLP, 2019 Reinforced Learning

The paper discusses how sentence position is a strong feature for news summarization, but recent neural systems excessively exploit this trend, which can be detrimental when summarizing documents where important content is in later parts of the article. The authors propose two techniques to make systems sensitive to the importance of content in different parts of the article: pretraining the model with randomly shuffled sentences and using an auxiliary ROUGE-based loss. These techniques significantly improve the performance of a reinforcement learning-based extractive system, with the auxiliary loss being more powerful than pretraining.
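The shuffling idea can be sketched in a few lines. This is only an illustration of removing position cues from a training document, not the authors' pipeline; `shuffled_copy` is a hypothetical helper, and returning the permutation so that sentence-level labels can be remapped is an assumption.

```python
import random


def shuffled_copy(sentences, seed=None):
    """Return the sentences in random order, plus the permutation that was applied."""
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    # order[j] is the original index of the sentence now at position j
    return [sentences[i] for i in order], order


article = ["Lead sentence.", "Second sentence.", "Background detail.", "Closing quote."]
shuffled, order = shuffled_copy(article, seed=0)
```

Because the lead sentence no longer sits reliably in first position, a model trained on such copies has to rely on content rather than position to find important sentences.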

Ling Luo, Xiang Ao, Yan Song, Feiyang Pan, Min Yang, Qing He

225.  Reading Like HER: Human Reading Inspired Extractive Summarization
EMNLP, 2019 Supervised Learning

The paper proposes a new approach to extractive text summarization for long documents by simulating the two-stage process of human summarization. The approach uses a convolutional neural network to encode the gist of paragraphs for rough reading and a decision-making policy with an adapted termination mechanism for careful reading. The problem is formulated as a contextual bandit problem and solved with policy gradient. Experiments on the CNN and DailyMail datasets show that the proposed method provides high-quality summaries with varied length and outperforms state-of-the-art extractive methods in terms of ROUGE metrics.

Kristjan Arumae, Fei Liu

226.  Guiding Extractive Summarization with Question-Answering Rewards
NAACL, 2019 Reinforced Learning

The paper discusses the challenge of developing a supervised summarization system due to the lack of ground-truth data. The authors propose a novel framework that uses question-answering rewards to guide the system in producing informative and fluent summaries that perform well on question-answering tasks. The system learns from human abstracts and aims to produce summaries that can answer important questions. The results show that the proposed framework outperforms strong summarization baselines as evaluated by automatic metrics and human assessors.

Léo Bouscarrat, Antoine Bonnefoy, Thomas Peel, Cécile Pereira

227.  STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings
ACL, 2019 Supervised Learning

The paper introduces STRASS, an extractive text summarization method that selects sentences with the closest embeddings to the document embedding. The model learns a transformation of the document embedding to maximize the similarity between the extractive summary and the ground truth summary. The training is inexpensive and can be done on CPU, and the inference time is short and linear. The paper also introduces the French CASS dataset and shows that the method performs similarly to state-of-the-art extractive methods while keeping training and inference time low.
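Selecting the sentences closest to a document embedding can be sketched as below. The learned transformation of the document embedding is omitted, the vectors are toy values, and picking a fixed top `k` by cosine similarity is an assumption made for illustration; `select_closest` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_closest(sentence_embeddings, document_embedding, k):
    """Return the indices of the k sentences most similar to the document, in document order."""
    scores = [cosine(e, document_embedding) for e in sentence_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(ranked[:k])


sentence_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]  # toy embeddings
document_vector = [1.0, 1.0]
print(select_closest(sentence_vectors, document_vector, k=2))
```

Scoring is a single pass over the sentences, which is consistent with the short, linear inference time the summary mentions.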

Ye Liu, Jian-Guo Zhang, Yao Wan, Congying Xia, Lifang He, Philip S. Yu

228.  HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization
EMNLP, 2021 Supervised Learning

The paper proposes a new approach for extractive summarization called HETFORMER, which is based on a Transformer-based pre-trained model with multi-granularity sparse attentions. The approach models different types of semantic nodes in raw text as a potential heterogeneous graph and directly learns heterogeneous relationships among nodes by Transformer. The experiments show that HETFORMER achieves state-of-the-art performance in Rouge F1 while using less memory and fewer parameters compared to existing methods that use GNNs with pre-trained models.

Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang

229.  Extractive Summarization as Text Matching
ACL, 2020 Supervised Learning

The paper proposes a new approach to building neural extractive summarization systems by formulating the task as a semantic text matching problem. This paradigm shift is based on a comprehensive analysis of the gap between sentence-level and summary-level extractors. The authors demonstrate the effectiveness of the matching framework by achieving state-of-the-art results on the CNN/DailyMail dataset and five other datasets. They also release their codes, processed dataset, and generated summaries to encourage further research in this area.

Yang Deng, Wenxuan Zhang, Yaliang Li, Min Yang, Wai Lam, Ying Shen

230.  Bridging Hierarchical and Sequential Context Modeling for Question-driven Extractive Answer Summarization
SIGIR, 2020 Supervised Learning

The paper discusses the challenges of answer summarization in non-factoid question answering and proposes a unified model that integrates hierarchical and sequential context modeling for question-driven extractive answer summarization. The model uses a hierarchical compare-aggregate method to integrate the interaction between QA pairs in both word-level and sentence-level into the final question and answer representations. The question-aware sequential extractor is then used to produce a summary for the lengthy answer. The experimental results show that the proposed method achieves superior performance on WikiHowQA and PubMedQA.

Fangfang Zhang, Jin-ge Yao, Rui Yan

231.  On the Abstractiveness of Neural Document Summarization
EMNLP, 2018

The paper discusses modern neural document summarization systems that aim to produce abstractive summaries. The authors conducted a study to verify the degree of abstractiveness of these systems and found that many tend to be near-extractive in practice. They also implemented a pure copy system that achieved comparable results while being more computationally efficient. The authors suggest that future efforts should focus on developing more efficient systems that can better utilize the vocabulary in the original document.

Yvette Graham

232.  Re-evaluating Automatic Summarization with BLEU and 192 Shades of ROUGE
EMNLP, 2015

The paper analyzes current evaluation methodologies for summarization metrics and identifies concerns such as the absence of methods for testing improvements over a baseline and the omission of important components of human assessment. The authors propose an evaluation methodology that overcomes these challenges and reveals which metric variants outperform others. They also find that the machine translation metric BLEU performs similarly to ROUGE for evaluating summarization systems. The authors replicate a recent evaluation that relied on suboptimal ROUGE variants and find different conclusions about the relative performance of state-of-the-art summarization systems.

Danqing Wang, Pengfei Liu, Yining Zheng, Xipeng Qiu, Xuanjing Huang

233.  Heterogeneous Graph Neural Networks for Extractive Document Summarization
ACL, 2020 Supervised Learning

The paper presents a new approach called HETERSUMGRAPH for extractive document summarization. It uses a graph-based neural network that includes semantic nodes of different granularity levels, which act as intermediaries between sentences and enrich cross-sentence relations. The graph structure is flexible and can be extended from a single-document setting to multi-document by introducing document nodes. The authors claim to be the first to introduce different types of nodes into graph-based neural networks for extractive document summarization and have performed a comprehensive qualitative analysis to investigate their benefits. The code for HETERSUMGRAPH will be released on Github.

Jiacheng Xu, Zhe Gan, Yu Cheng, Jingjing Liu

234.  Discourse-Aware Neural Extractive Text Summarization
ACL, 2020 Supervised Learning

The paper introduces a new neural summarization model called DISCOBERT, which addresses issues with sentence-based extractive models and the limitations of BERT in capturing long-range dependencies in documents. DISCOBERT extracts sub-sentential discourse units and constructs structural discourse graphs to capture long-range dependencies, which are encoded with Graph Convolutional Networks. The proposed model outperforms other state-of-the-art BERT-base models on popular summarization benchmarks.

Ruifeng Yuan, Zili Wang, Wenjie Li

235.  Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT
COLING, 2020 Supervised Learning

The paper proposes a new approach to extractive summarization that focuses on fact-level semantic units rather than individual sentences. The model uses a hierarchical structure to incorporate multiple levels of textual information and is combined with BERT using a hierarchical graph mask to improve natural language understanding. The experiments on the CNN/DailyMail dataset show that the proposed model achieves state-of-the-art results.

Ruipeng Jia, Yanan Cao, Hengzhu Tang, Fang Fang, Cong Cao, Shi Wang

236.  Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network
EMNLP, 2020 Supervised Learning

The paper discusses the challenges of sentence-level extractive text summarization, particularly in modeling redundancy between extracted sentences. The authors propose a new approach called HAHSum, which uses a hierarchical attentive heterogeneous graph to model different levels of information and spotlight redundancy dependencies between sentences. The approach iteratively refines sentence representations with a redundancy-aware graph and delivers label dependencies by message passing. Experiments on large-scale benchmark corpora demonstrate that HAHSum outperforms previous extractive summarizers.

Zhengyuan Liu, Ke Shi, Nancy F. Chen

237.  Conditional Neural Generation using Sub-Aspect Functions for Extractive News Summarization
EMNLP, 2020 Supervised Learning

The paper discusses the challenges of text summarization in the news domain, where neural models easily overfit due to the inverted pyramid writing style and the need to generate a variety of summaries for different users. The authors propose a neural framework that can flexibly control summary generation by introducing subaspect functions (importance, diversity, position) regulated by control codes. They demonstrate that extracted summaries with minimal position bias are comparable to those generated by standard models that take advantage of position preference, and that news summaries generated with a focus on diversity can be more preferred by human raters. The authors suggest that a more flexible neural summarization framework providing more control options could be desirable in tailoring to different user preferences.

Shashi Narayan, Joshua Maynez, Jakub Adamek, Daniele Pighin, Blaz̆ Bratanic, Ryan McDonald

238.  Stepwise Extractive Summarization and Planning with Structured Transformers
EMNLP, 2020 Supervised Learning

The paper proposes encoder-centric stepwise models for extractive summarization using structured transformers - HiBERT and Extended Transformers. The models enable stepwise summarization by injecting the previously generated summary into the structured transformer as an auxiliary sub-structure. The models are efficient in modeling the structure of long inputs and do not rely on task-specific redundancy-aware modeling, making them a general purpose extractive content planner for different tasks. The stepwise models achieve state-of-the-art performance in terms of Rouge without any redundancy aware modeling or sentence filtering in CNN/DailyMail extractive summarization and Rotowire table-to-text generation. Amongst the two structured transformers tested, stepwise Extended Transformers provides the best performance across both datasets and sets a new standard for these challenges.

Peng Cui, Le Hu, Yuanchao Liu

239.  Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks
COLING, 2020 Supervised Learning

The paper proposes a new approach to extractive text summarization that addresses the limitations of existing models in capturing intersentence relationships and topical information. The proposed model uses a graph neural network to efficiently represent the document structure and a joint neural topic model to discover latent topics for sentence selection. The experimental results show that the proposed model outperforms existing approaches on both short and long document datasets, demonstrating its robustness in different document genres and lengths. The model's effectiveness in long document summarization is attributed to its ability to preselect salient contents using topical information.

Yash Agrawal, Vivek Anand, Manish Gupta, S Arunachalam, Vasudeva Varma

240.  Goal-Directed Extractive Summarization of Financial Reports
CIKM, 2021 Supervised Learning

The paper discusses the importance of extractive summarization of financial reports filed by companies, which impact their stock prices. The lack of in-domain labeled summarization data is a major obstacle to train finance-specific summarization models. The paper proposes a goal-directed approach to modeling 10-K report summarization, leveraging summaries with labeled goal-related data for stock buy/sell classification. The paper also considers a multi-task learning method with an industry classification auxiliary task to provide improvements. The proposed method significantly outperforms strong baselines in intrinsic and extrinsic evaluations for stock buy/sell classification and portfolio construction tasks.

Ruipeng Jia, Yanan Cao, Fang Fang, Yuchen Zhou, Zheng Fang, Yanbing Liu, Shi Wang

241.  Deep Differential Amplifier for Extractive Summarization
ACL, 2021 Supervised Learning

The paper discusses the issue of imbalanced sentence classification in extractive summarization, which cannot be easily addressed by data sampling or augmentation algorithms. To solve this problem, the authors propose a deep differential amplifier framework that calculates and amplifies the semantic difference between each sentence and other sentences, and applies a residual unit to deepen the architecture. The model pays more attention to the pivotal information of one sentence, which is different from previous approaches that model all informative context in the source document. Experimental results show that the proposed summarizer performs competitively against state-of-the-art methods. The source code will be available on Github.

Jad Kabbara, Jackie Chi Kit Cheung

242.  Post-Editing Extractive Summaries by Definiteness Prediction
EMNLP, 2021 Supervised Learning

The paper discusses the limitations of extractive summarization and proposes a postediting step that focuses on the definiteness of noun phrases to improve the coherence and readability of extractive summaries. The proposed system was evaluated through human and automatic evaluation studies, which showed that the system generated improved summaries. The authors also noted that the system relied on local cues rather than pragmatic reasoning to make decisions.

Stefanos Angelidis, Reinald Kim Amplayo, Yoshihiko Suhara, Xiaolan Wang, Mirella Lapata

243.  Extractive Opinion Summarization in Quantized Transformer Spaces
TACL, 2021 Supervised Learning

The paper presents the Quantized Transformer, an unsupervised system for extractive opinion summarization that uses a clustering interpretation of the quantized space and a novel extraction algorithm to discover popular opinions among hundreds of reviews. The system also enables controllable summarization without further training by utilizing properties of the quantized space to extract aspect-specific summaries. The authors also introduce SPACE, a large-scale evaluation benchmark for opinion summarizers, and demonstrate the promise of their approach through experiments and human studies.

Marina Litvak, Natalia Vanetik

244.  Krimping texts for better summarization
EMNLP, 2015 Unsupervised Learning

The paper introduces a new approach for automated text summarization using the Minimum Description Length principle and the Krimp dataset compression algorithm. The approach represents a text as a transactional dataset and describes it using frequent sequences of words. The summary is compiled from sentences that compress the document, with the problem of summarization reduced to maximal coverage. The approach is evaluated using a greedy algorithm and the results are presented.
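Reducing summarization to maximal coverage with a greedy algorithm can be sketched as follows. The pattern sets here are made-up stand-ins for the frequent word sequences a Krimp-style step would produce; the mining and compression steps themselves are not shown, and `greedy_cover` is a hypothetical name.

```python
def greedy_cover(sentence_patterns, max_sentences):
    """Greedily pick sentences that cover the most not-yet-covered patterns.

    sentence_patterns: one set of pattern identifiers per sentence.
    Returns the chosen sentence indices (in pick order) and the covered patterns.
    """
    covered = set()
    chosen = []
    remaining = set(range(len(sentence_patterns)))
    while remaining and len(chosen) < max_sentences:
        # sorted() makes ties resolve to the earliest sentence
        best = max(sorted(remaining), key=lambda i: len(sentence_patterns[i] - covered))
        if not sentence_patterns[best] - covered:
            break  # no remaining sentence adds new coverage
        chosen.append(best)
        covered |= sentence_patterns[best]
        remaining.remove(best)
    return chosen, covered


patterns = [{"a", "b"}, {"b", "c", "d"}, {"a"}, {"e"}]  # hypothetical pattern sets
print(greedy_cover(patterns, max_sentences=2))
```

Greedy selection is the standard approximation for maximum coverage, which is NP-hard to solve exactly.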

Xingxing Zhang, Furu Wei, Ming Zhou

245.  HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
ACL, 2019 Supervised Learning

The paper proposes a new model called HIBERT for document encoding in neural extractive summarization models. It pre-trains the model using unlabeled data and applies it to the summarization model, resulting in better performance compared to randomly initialized models. The proposed model achieves state-of-the-art performance on the CNN/Dailymail and New York Times datasets.

Jingun Kwon, Naoki Kobayashi, Hidetaka Kamigaito, Manabu Okumura

246.  Considering Nested Tree Structure in Sentence Extractive Summarization with Pre-trained Transformer
EMNLP, 2021 Supervised Learning

The paper proposes a new model called NeRoBERTa for sentence extractive summarization, which uses nested tree structures consisting of syntactic and discourse trees to improve coherence and informativeness of the summary. The model outperforms baseline models in ROUGE and achieves comparable scores to state-of-the-art models in human evaluation. The paper highlights the difficulty of using pre-trained BERT-based encoders for this task and suggests the use of nested tree structures for better performance.

Yash Gupta, Pawan Sasanka Ammanamanchi, Shikha Bordia, Arjun Manoharan, Deepak Mittal, Ramakanth Pasunuru, Manish Shrivastava, Maneesh Singh, Mohit Bansal, Preethi Jyothi

247.  The Effect of Pretraining on Extractive Summarization for Scientific Documents
NAACL, 2021 Supervised Learning

The paper explores the impact of pretraining on a BERT-based extractive summarization system for scientific documents. The authors found that an intermediate pretraining step using existing summarization datasets improved performance and achieved state-of-the-art results on a scientific summarization dataset. They also analyzed the effects of varying the size and domain of the pretraining corpus, changing the length of the input sequence, and varying target tasks. Additionally, they investigated how intermediate pretraining interacts with contextualized word embeddings trained on different domains.

Yin Jou Huang, Sadao Kurohashi

248.  Extractive Summarization Considering Discourse and Coreference Relations based on Heterogeneous Graph
EACL, 2021 Supervised Learning

The paper proposes a model for extractive summarization that incorporates both discourse and coreference relations. The model uses a heterogeneous graph containing three types of nodes, each corresponding to text spans of different granularity. Experimental results on a benchmark summarization dataset show that the proposed method is effective.

Peng Cui

249.  Sliding Selector Network with Dynamic Memory for Extractive Summarization of Long Documents
NAACL, 2021 Supervised Learning

The paper proposes a new approach to extractive summarization of long-form documents using a sliding selector network with dynamic memory. This approach addresses the issue of loss of summary-relevant contents due to the length limitation of text encoder in neural-based summarization models. The sliding window extracts summary sentences segment by segment and the memory mechanism preserves and updates history information dynamically, allowing semantic flow across different windows. Experimental results on two large-scale datasets of scientific papers show that this model outperforms previous state-of-the-art models. Qualitative and quantitative investigations are also performed to understand how the model works and where the performance gain comes from.

Ruipeng Jia, Yanan Cao, Haichao Shi, Fang Fang, Pengfei Yin, Shi Wang

250.  Flexible Non-Autoregressive Extractive Summarization with Threshold: How to Extract a Non-Fixed Number of Summary Sentences
AAAI, 2021 Supervised Learning

The paper proposes a non-autoregressive method for extractive summarization called ThresSum, which extracts a non-fixed number of summary sentences without sorting them by predicted probabilities. Instead, ThresSum picks sentences individually from the source document when the predicted probabilities exceed a threshold. During training, the model enhances sentence representation through iterative refinement and weak supervision with soft labels generated progressively by adjusting the temperature with a knowledge distillation algorithm. ThresSum outperforms BERTSUMEXT with a substantial improvement of 0.74 ROUGE-1 score on CNN/DM dataset.
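The difference between threshold-based extraction and the more common rank-and-take-top-k selection can be shown with a short sketch. The probabilities, the 0.5 threshold, and both function names are assumptions for illustration; the model that would produce the probabilities is not shown.

```python
def select_by_threshold(probabilities, threshold=0.5):
    """Pick every sentence whose predicted probability exceeds the threshold."""
    return [i for i, p in enumerate(probabilities) if p > threshold]


def select_top_k(probabilities, k=3):
    """Baseline: rank sentences by probability and always take exactly k of them."""
    ranked = sorted(range(len(probabilities)), key=lambda i: -probabilities[i])
    return sorted(ranked[:k])


probs = [0.9, 0.2, 0.55, 0.1]  # hypothetical per-sentence probabilities
print(select_by_threshold(probs))  # number of sentences depends on the document
print(select_top_k(probs, k=3))    # always three sentences, even low-scoring ones
```

With a threshold, each sentence is judged independently, so the summary length varies with the document instead of being fixed in advance.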

Baoyu Jing, Zeyu You, Tao Yang, Wei Fan, Hanghang Tong

251.  Multiplex Graph Neural Network for Extractive Text Summarization
EMNLP, 2021 Supervised Learning

The paper proposes a new approach to extractive text summarization, which involves extracting the most representative sentences from a given document. The authors note that sentence embedding is important for creating a good summary, and that recent studies have used graph neural networks to capture inter-sentential relationships. However, these approaches do not consider multiple types of inter-sentential relationships or intra-sentential relationships. To address these issues, the authors propose a Multiplex Graph Convolutional Network (MultiGCN) to model different types of relationships among sentences and words. They then use this approach to create a Multiplex Graph Summarization (Multi-GraS) model for extractive text summarization. The authors evaluate their approach on the CNN/DailyMail benchmark dataset and demonstrate its effectiveness.

Peggy Tang, Kun Hu, Rui Yan, Lei Zhang, Junbin Gao, Zhiyong Wang

252.  OTExtSum: Extractive Text Summarisation with Optimal Transport
NAACL, 2022 Unsupervised Learning

As its title indicates, the paper addresses extractive text summarisation using optimal transport. The authors are affiliated with several institutions, including the University of Sydney and Renmin University of China. The available abstract text does not describe the method or results in further detail.

Nianlong Gu

253.  MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes
ACL, 2022 Reinforced Learning

MemSum is a reinforcement-learning-based extractive summarizer that considers the text content of the sentence, the global text context of the rest of the document, and the extraction history consisting of the set of sentences that have already been extracted. It obtains state-of-the-art test-set performance in summarizing long documents taken from PubMed, arXiv, and GovReport. Ablation studies demonstrate the importance of local, global, and history information. A human evaluation confirms the high quality and low redundancy of the generated summaries, stemming from MemSum’s awareness of extraction history.

Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown

254.  Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization
ACL, 2022 Supervised Learning

The paper discusses the issue of faithfulness errors in abstractive summarization systems and proposes a framework for evaluating the effectiveness of such systems. The authors generate a faithfulness-abstractiveness trade-off curve to serve as a control and show that current methods for improving faithfulness fail to consistently improve over the control at the same level of abstractiveness. They then introduce a selector to identify the most faithful and abstractive summary for a given document and demonstrate that this system can attain higher faithfulness scores in human evaluations while being more abstractive than the baseline system on two datasets. Additionally, the authors show that their system achieves a better faithfulness-abstractiveness trade-off than the control at the same level of abstractiveness.

Qian Ruan, Malte Ostendorff, Georg Rehm

255.  HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information
ACL, 2022 Supervised Learning

The paper proposes a new approach to improve extractive summarization models by explicitly incorporating hierarchical structure information into a pre-trained, encoder-only Transformer language model. The proposed HiStruct+ model outperforms a strong baseline on three datasets, including PubMed and arXiv, and the improvement is more significant for datasets with more conspicuous hierarchical structures. The ablation study shows that the hierarchical position information is the main contributor to the model's state-of-the-art performance.

Qianren Mao, Hongdong Zhu, Junnan Liu, Cheng Ji, Hao Peng, Jianxin Li, Lihong Wang, Zheng Wang

256.  MuchSUM: Multi-channel Graph Neural Network for Extractive Summarization
SIGIR, 2022 Supervised Learning

The paper discusses the limitations of using pre-trained BERT-based encoders for extractive text summarization and proposes a new approach called MuchSUM, which is a multi-channel graph convolutional network that incorporates multiple summary-worthy features. The approach introduces three specific graph channels to encode node textual features, node centrality features, and node position features, respectively, under bipartite word-sentence heterogeneous graphs. A cross-channel convolution operation is designed to distill the common graph representations shared by different channels, and the sentence representations of each channel are fused for extractive summarization. The approach also investigates three weighted graphs in each channel to infuse edge features for graph-based summarization modeling. Experimental results demonstrate that the MuchSUM model can achieve considerable performance compared with some BERT-initialized graph-based extractive summarization systems.

Siya Qi, Lei Li, Yiyang Li, Jin Jiang, Dingxin Hu, Yuze Li, Yingqi Zhu, Yanquan Zhou, Marina Litvak, Natalia Vanetik

257.  SAPGraph: Structure-aware Extractive Summarization for Scientific Papers with Heterogeneous Graph
AACL, 2022 Supervised Learning

The paper discusses the challenges of scientific paper summarization in NLP and presents a solution called SAPGraph. The framework utilizes paper structure to generate more comprehensive and valuable summaries compared to previous works that tend to extract summaries from the head of the paper. SAPGraph is based on a structure-aware heterogeneous graph that models the document into a graph with three kinds of nodes and edges based on structure information of facets and knowledge. The paper also provides a large-scale dataset of COVID-19-related papers, CORD-SUM, for experiments.

Kundan Krishna, Jeffrey Bigham, Zachary C. Lipton

258.  Does Pretraining for Summarization Require Knowledge Transfer?
EMNLP, 2021

The paper discusses pretraining techniques in text summarization and challenges the idea that knowledge transfer is the reason for its success. The authors show that pretraining on randomly selected character n-grams can achieve similar performance to models pretrained on real corpora, which could eliminate concerns over offensive language, bias, and copyright issues. The authors also design several tasks to test the structure of pretraining tasks, but find no significant benefit, leaving the possibility of a small role for knowledge transfer.

Ella Hofmann-Coyle, Mayank Kulkarni, Lingjue Xie, Mounica Maddela, Daniel Preoţiuc-Pietro

259.  Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders
AACL, 2022 Supervised Learning

The paper discusses entity-centric summarization, which produces a summary of a document specific to a given target entity. Extractive summaries are preferred over abstractive ones as they preserve factuality and can be used in downstream tasks. The authors explore methods to solve this task by recasting it as a sentence selection task, using methods inspired by information retrieval. They test different architecture variants and loss functions and achieve up to a 5.8 F1 improvement over past state-of-the-art and outperform the entity-centric Lead 3 heuristic by 1.1 F1. The authors also show strong results on the related task of salient sentence selection for an entity.
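Recasting the task as retrieval-style sentence selection can be illustrated with a small Python sketch of bi-encoder scoring: the entity and each sentence are encoded independently, and relevance is the dot product of the two encodings. The bag-of-words `encode` function is a deliberately simple stand-in for the learned neural encoders, and all names here are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def encode(text: str) -> Dict[str, int]:
    """Toy encoder: bag-of-words counts (a stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def dot(u: Dict[str, int], v: Dict[str, int]) -> int:
    """Dot product of two sparse vectors."""
    return sum(count * v.get(term, 0) for term, count in u.items())


def select_sentences(entity: str, sentences: List[str], k: int = 1) -> List[str]:
    """Score each sentence against the entity and return the top k in document order."""
    query = encode(entity)  # the entity is encoded once, independently of the sentences
    scored = [(dot(query, encode(s)), i) for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for i in sorted(i for _, i in top)]


sentences = [
    "acme corp reported record profits",
    "the weather was mild",
    "analysts praised acme corp strategy",
]
print(select_sentences("Acme Corp", sentences, k=2))
```

Since the two sides are encoded separately, sentence encodings can be computed once and reused for any number of target entities.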

Qianqian Xie, Jimin Huang

260.  GRETEL: Graph Contrastive Topic Enhanced Language Model for Long Document Extractive Summarization
COLING, 2022 Supervised Learning

The paper proposes a new model called Graph contRastivE Topic Enhanced Language model (GRETEL) that combines the graph contrastive topic model with pre-trained language models (PLMs) to improve text summarization. The graph contrastive topic model integrates the hierarchical transformer encoder and graph contrastive learning to capture and integrate global semantic information from the document context and the gold summary. GRETEL aims to extract salient sentences that are topically related to the gold summary, rather than redundant sentences that cover sub-optimal topics. Experimental results on general domain and biomedical datasets show that GRETEL outperforms state-of-the-art methods.

Tuan-Anh Phan, Nam Bui

261.  HeterGraphLongSum: Heterogeneous Graph Neural Network with Passage Aggregation for Extractive Long Document Summarization
COLING, 2022 Supervised Learning

The paper discusses the effectiveness of Graph Neural Network (GNN)-based models in Natural Language Processing (NLP) tasks, particularly in Extractive Document Summarization (EDS). However, long-form document summarization using graph-based approaches is still a challenge. The paper proposes a new model called HeterGraphLongSum, which includes three types of semantic units (word, sentence, and passage) to represent long documents in a graph structure. The model achieves promising results for the extractive long document summarization problem without relying on pre-trained language models like BERT. The source code is available on GitHub.

Xuan-Dung Doan, Le-Minh Nguyen, Nam Bui

262.  Multi Graph Neural Network for Extractive Long Document Summarization
COLING, 2022 Supervised Learning

The paper discusses the use of Heterogeneous Graph Neural Networks (HeterGNN) for document summarization, specifically for long documents. The authors address the issue of lacking inter-sentence connections and propose a solution by building a graph on sentence-level nodes and combining it with HeterGNN to capture semantic information. The experiments conducted on two benchmark datasets show that this method achieves state-of-the-art results in the field of document summarization.

Jin-ge Yao, Xiaojun Wan, Jianguo Xiao

263.  Compressive Document Summarization via Sparse Optimization
IJCAI, 2015 Unsupervised Learning

The paper presents a sparse optimization framework for extractive document summarization with a decomposable convex objective function. An efficient ADMM algorithm is derived to solve it, and an additional sentence dissimilarity term is introduced to encourage diversity in the summaries. The framework achieves significant improvement over previous related work and is generalized to compressive summarization with a block coordinate descent algorithm. The compressive summarization results are competitive against state-of-the-art results while maintaining reasonable readability, as demonstrated on DUC 2006 and DUC 2007 datasets.

Wenpeng Yin, Yulong Pei

264.  Optimizing Sentence Modeling and Selection for Document Summarization
IJCAI, 2015 Unsupervised Learning

The paper proposes a new approach to extractive document summarization, which involves selecting salient sentences from a given document. The approach, called DivSelect+CNNLM, addresses two challenges: modeling information redundancy among candidate sentences and selecting the most appropriate sentences. It introduces a novel neural network language model based on convolutional neural network (CNN) to project sentences into dense distributed representations and models sentence redundancy using cosine similarity. The selection process is formulated as an optimization problem, constructing a diversified selection process (DivSelect) to select sentences with high prestige and dissimilarity. The approach is evaluated on benchmark datasets and shows effectiveness in summarization.
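The idea of preferring sentences that are both prestigious and dissimilar to those already chosen can be sketched in Python. Note that this is a greedy, MMR-style heuristic swapped in for illustration; it is not the optimization formulation used by DivSelect, and the vectors, prestige scores, and `trade_off` parameter are hypothetical rather than CNN-derived.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as the redundancy measure between sentences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_diverse_select(
    vectors: List[Sequence[float]],
    prestige: List[float],
    k: int,
    trade_off: float = 0.5,
) -> List[int]:
    """Greedily pick k sentences, rewarding prestige and penalising redundancy."""
    selected: List[int] = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        def gain(i: int) -> float:
            # Redundancy is the highest similarity to any already selected sentence.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return trade_off * prestige[i] - (1 - trade_off) * redundancy

        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]  # sentences 0 and 1 are near-duplicates
prestige = [0.9, 0.8, 0.5]
print(greedy_diverse_select(vectors, prestige, k=2))  # [0, 2]
```

Sentence 1 has higher prestige than sentence 2 but is almost identical to sentence 0, so the redundancy penalty causes the less prestigious but dissimilar sentence to be chosen.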

Diogo Pernes, Afonso Mendes, André F. T. Martins

265.  Improving abstractive summarization with energy-based re-ranking
EMNLP, 2022 Supervised Learning

The paper discusses the weaknesses of current abstractive summarization systems, such as the omission of relevant information and the generation of factual inconsistencies. It proposes an energy-based model that learns to re-rank summaries according to recent advances in summarization metrics, which consistently improves the scores achieved by the predicted summaries. However, the paper also notes that the re-ranking approach should be used with care for highly abstractive summaries, as the available metrics are not yet sufficiently reliable for this purpose.
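Re-ranking with an energy function reduces, at inference time, to sorting candidate summaries by their energy and keeping the lowest. The Python sketch below shows only that final step. The `toy_energy` function (negative unigram precision against the source) is a hypothetical stand-in for the learned energy-based model, which in the paper is trained against summarization metrics.

```python
from typing import Callable, List


def toy_energy(source: str, summary: str) -> float:
    """Stand-in energy: negative fraction of summary tokens found in the source.

    Lower energy means a better candidate, following the usual convention.
    """
    source_tokens = set(source.lower().split())
    tokens = summary.lower().split()
    if not tokens:
        return 0.0
    return -sum(t in source_tokens for t in tokens) / len(tokens)


def rerank(
    source: str,
    candidates: List[str],
    energy: Callable[[str, str], float] = toy_energy,
) -> List[str]:
    """Order candidate summaries from lowest to highest energy."""
    return sorted(candidates, key=lambda c: energy(source, c))


source = "the cat sat on the mat"
candidates = ["a dog ran", "the cat sat", "the cat flew"]
print(rerank(source, candidates)[0])  # the cat sat
```

The quality of the result depends entirely on the energy function, which is why the caution above about unreliable metrics for highly abstractive summaries applies directly to this step.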

Akim Tsvigun, Ivan Lysenko, Danila Sedashov, Ivan Lazichny, Eldar Damirov, Vladimir Karlov, Artemy Belousov, Leonid Sanochkin, Maxim Panov, Alexander Panchenko, Mikhail Burtsev, Artem Shelmanov

266.  Active Learning for Abstractive Text Summarization
EMNLP, 2022 Supervised Learning

The paper discusses the challenges of creating human-curated annotated datasets for abstractive text summarization (ATS) and the potential of Active Learning (AL) to reduce the amount of annotation required. Prior to this work, there were no effective AL query strategies for ATS, because uncertain instances are usually noisy and selecting them can degrade model performance. The paper proposes the first effective query strategy for AL in ATS based on diversity principles, which improves model performance in terms of ROUGE and consistency scores. The paper also analyzes the effect of self-learning and shows that it can further increase the performance of the model.

Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei, Yanran Li

267.  AttSum: Joint Learning of Focusing and Summarization with Neural Attention
COLING, 2016 Supervised Learning

The paper discusses the challenges of extractive query-focused summarization, specifically the tasks of query relevance ranking and sentence saliency ranking. Previous systems have struggled to perform both tasks effectively, but the proposed system, AttSum, tackles them jointly using distributed representations and an attention mechanism. The system is evaluated on benchmark datasets and achieves competitive performance without the use of hand-crafted features. The authors also observe that the sentences identified as relevant to the query do indeed meet the query's needs.

Charlie Egan, Advaith Siddharthan

268.  Summarising the points made in online political debates
ACL, 2016 Unsupervised Learning

The paper proposes an abstractive approach to summarize argumentative discussions in online communities. The approach extracts key content through 'point' extraction, where a point is a verb and its syntactic arguments. The approach uses dependency parse information and verb case frames to identify and extract valid points and generates an abstractive summary that discusses the key points being made in the debate. The approach was evaluated using a corpus of online political debates and showed significant improvements over a high-performing extractive summarizer.

Jianpeng Cheng, Mirella Lapata

269.  Neural Summarization by Extracting Sentences and Words
ACL, 2016 Supervised Learning

The paper proposes a new approach to extractive summarization using neural networks and continuous sentence features. The approach includes a hierarchical document encoder and an attention-based extractor, allowing for different classes of summarization models. The models were trained on large scale corpora and achieved results comparable to the state of the art without any linguistic annotation.

Tatsuya Ishigaki, Hiroya Takamura, Manabu Okumura

270.  Summarizing Lengthy Questions
IJCNLP, 2017 Supervised Learning

The paper proposes the task of question summarization and analyzes question-summary pairs from a Community Question Answering site. It finds that some questions require abstractive approaches instead of extractive approaches. The authors created a dataset and trained extractive and abstractive summarization models, comparing them based on ROUGE scores and manual evaluations. The results show that an abstractive method using an encoder-decoder model with a copying mechanism performs better according to both ROUGE-2 F-measure and human judges' evaluations.

Zhihua Jiang, Junzhan Yang, Dongning Rao

271.  SEHY: A Simple yet Effective Hybrid Model for Summarization of Long Scientific Documents
AACL, 2022 Supervised Learning

The paper discusses the challenges of long-document summarization and proposes a Simple yet Effective HYbrid approach (SEHY) that selects salient sections instead of sentences for summary generation. The approach exploits discourse information and avoids full-text understanding while retaining salient information within the length limit. The paper also presents two strategies for training the extractor and evaluates the approach on a large-scale scientific paper dataset. The authors also discuss how the disciplinary class of a scientific paper affects the performance of SEHY. Experimental results on arXiv and its subsets show the effectiveness of the approach.

Yuxiang Wu, Baotian Hu

272.  Learning to Extract Coherent Summary via Deep Reinforcement Learning
AAAI, 2018 Reinforced Learning

The paper proposes a neural coherence model to capture cross-sentence semantic and syntactic coherence patterns in order to extract more coherent summaries. The proposed model can be trained in an end-to-end fashion using unlabeled data and is used in combination with the ROUGE package to design a reinforcement learning method to train a neural extractive summarizer called the Reinforced Neural Extractive Summarization (RNES) model. The RNES model learns to optimize coherence and informative importance of the summary simultaneously and outperforms existing baselines in terms of ROUGE on the CNN/Daily Mail dataset. The qualitative evaluation shows that summaries produced by RNES are more coherent and readable.

Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao

273.  Neural Document Summarization by Jointly Learning to Score and Select Sentences
ACL, 2018 Supervised Learning

The paper presents a new approach to extractive document summarization that combines sentence scoring and selection into a single neural network framework. The approach uses a hierarchical encoder to represent the document sentences and integrates the selection strategy into the scoring model. Experiments on the CNN/Daily Mail dataset show that the proposed framework outperforms existing extractive summarization models.

Xiuying Chen, Shen Gao, Chongyang Tao, Yan Song, Dongyan Zhao, Rui Yan

274.  Iterative Document Representation Learning Towards Summarization with Polishing
EMNLP, 2018 Supervised Learning

ITS is a new model for extractive text summarization that iteratively polishes the document representation on multiple passes through the document, inspired by the observation that humans often need to read an article multiple times to fully understand and summarize its contents. The model also includes a selective reading mechanism that accurately determines the extent to which each sentence should be updated. Experimental results on two datasets show that ITS outperforms state-of-the-art extractive systems when evaluated by both machines and humans.

Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Łukasz Kaiser, Noam Shazeer

275.  Generating Wikipedia by Summarizing Long Sequences
ICLR, 2018 Supervised Learning

The paper discusses a method for generating English Wikipedia articles by summarizing source documents using extractive summarization and a neural abstractive model. The abstractive model uses a decoder-only architecture that can attend to very long sequences, allowing it to generate fluent and coherent multi-sentence paragraphs and even whole articles. The model is able to extract relevant factual information when given reference documents, as reflected in perplexity, ROUGE scores, and human evaluations.

Xiaoyu Shen, Yang Zhao, Hui Su, Dietrich Klakow

276.  Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator
EMNLP, 2019 Supervised Learning

The paper discusses the limitations of Pointer Generators in modern summarization systems, which are restricted to exact word matches and result in a bias towards extractive generations. The authors propose a solution by allowing the model to "edit" pointed tokens, transforming them into a target space with a learned relation embedding. The model is shown to capture more latent alignment relations, improve word alignment accuracy, generate higher quality summaries, and bring more abstraction to the generated summaries. The proposed approach is validated on three large-scale summarization datasets.

Edward Moroshko, Guy Feigenblat, Haggai Roitman, David Konopnicki

277.  An Editorial Network for Enhanced Document Summarization
ACL, 2019 Supervised Learning

The paper proposes a new approach to summarization called the Editorial Network, which combines extractive and abstractive methods. This approach is applied as a postprocessing step to a sequence of extracted sentences. The paper also suggests a novel soft-labeling approach for training the "editor." The effectiveness of this approach is demonstrated using the CNN/DailyMail dataset, and it is shown to outperform state-of-the-art extractive-only or abstractive-only baselines.

Elozino Egonmwan, Yllias Chali

278.  Transformer-based Model for Single Documents Neural Summarization
ACL, 2019 Supervised Learning

The paper proposes a system that improves performance on single-document summarization. The system follows the encoder-decoder paradigm with a focus on the encoder: the source text is encoded first with a transformer and then with a sequence-to-sequence model. The authors find that the transformer and seq2seq model complement each other, resulting in a richer encoded vector representation, and that paying more attention to the vocabulary of target words during abstraction improves performance. They evaluate the framework on extractive and abstractive single-document summarization tasks using the CNN/DailyMail and Newsroom datasets.

Afonso Mendes, Shashi Narayan, Sebastião Miranda, Zita Marinho, André F. T. Martins, Shay B. Cohen

279.  Jointly Extracting and Compressing Documents with Summary State Representations
NAACL, 2019 Supervised Learning

The paper presents a new neural model for text summarization that extracts sentences from a document and compresses them to generate concise and informative summaries. The model dynamically determines the length of the output summary based on gold summaries observed during training, and does not require length constraints typical to extractive summarization. The model achieves state-of-the-art results on the CNN/DailyMail and Newsroom datasets, improving over current extractive and abstractive methods. A new dataset of oracle compressive summaries derived automatically from the CNN/DailyMail reference summaries is also made available.

Yang Liu, Ivan Titov, Mirella Lapata

280.  Single Document Summarization as Tree Induction
NAACL, 2019 Supervised Learning

The paper proposes a new approach to single-document extractive summarization, using a multi-root dependency tree to generate summaries. The model is designed to refine its structures through an iterative algorithm, and is shown to perform competitively against existing methods on two benchmark datasets. This approach differs from previous methods that rely on linguistically motivated document representations.

Lea Frermann, Alexandre Klementiev

281.  Inducing Document Structure for Aspect-based Summarization
ACL, 2019 Supervised Learning

The paper discusses aspect-based summarization, which generates a summary centered around a specific aspect of a document. The authors induce latent document structure and train their models in a scalable synthetic setup, resulting in improvements in summarization over topic-agnostic baselines. The models accurately segment documents by aspect and can produce both abstractive and extractive aspect-based summaries. The learned document structure is particularly advantageous for summarizing long documents, and the results transfer from synthetic training documents to natural news articles from CNN/Daily Mail and RCV1.

Elozino Egonmwan, Vittorio Castelli, Md Arafat Sultan

282.  Cross-Task Knowledge Transfer for Query-Based Text Summarization
ACL, 2019 Unsupervised Learning

The paper explores the possibility of transferring knowledge between machine reading comprehension (MRC) and query-based text summarization. The authors use an MRC model trained on the SQuAD1.1 dataset to build an extractive query-based summarizer, which compresses the output of the MRC model using a new sentence compression technique. They also use pre-trained machine translation systems to abstract the extracted summaries. The models achieve state-of-the-art results on the CNN/Daily Mail and Debatepedia datasets, and can serve as powerful baselines for future systems. The authors hope that their results will encourage further research on transfer learning from large MRC corpora to query-based summarization.

Rajdeep Mukherjee, Hari Chandana Peruri, Uppada Vishnu, Pawan Goyal, Sourangshu Bhattacharya, Niloy Ganguly

283.  Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews
SIGIR, 2020 Unsupervised Learning

The paper discusses the time-consuming process of manually extracting relevant aspects and opinions from large volumes of user-generated text. It proposes a solution for generating personalized aspect-based opinion summaries from online tourist reviews, allowing readers to control various attributes of the summary. The approach involves an unsupervised method to extract coherent aspects and an Integer Linear Programming (ILP) based extractive technique to select informative opinions around those aspects while respecting user-specified values. The authors evaluate and compare their summaries using crowdsourcing and ROUGE-based metrics and obtain competitive results.

Liqiang Xiao, Lu Wang, Hao He, Yaohui Jin

284.  Copy or Rewrite: Hybrid Summarization with Hierarchical Reinforcement Learning
AAAI, 2020 Reinforced Learning

The paper proposes a hybrid framework for summarization called HYSUM that combines extractive and abstractive methods to generate informative and concise summaries. Existing extract-then-abstract methods suffer from information loss in the abstraction step, but HYSUM can switch between copying and rewriting sentences based on redundancy to effectively combine the advantages of both methods. The paper also proposes an end-to-end reinforcing method based on Hierarchical Reinforcement Learning to enhance cooperation between the extraction and rewriting modules. Automatic and human evaluations show that HYSUM outperforms existing models on the CNN/DailyMail corpus.

Arthur Bražinskas, Mirella Lapata, Ivan Titov

285.  Few-Shot Learning for Opinion Summarization
EMNLP, 2020 Supervised Learning

The paper discusses the task of opinion summarization, which involves creating text that reflects subjective information expressed in multiple documents, such as user reviews of a product. The lack of large datasets for training supervised models has led to the use of extractive methods that select text fragments in an unsupervised or weakly-supervised way. However, recent research has shown that abstractive summaries can also be produced in an unsupervised fashion. The paper presents a method that uses a handful of summaries to bootstrap the generation of summary text with expected properties such as writing style, informativeness, fluency, and sentiment preservation. The approach involves training a conditional Transformer language model to generate a new product review given other available reviews of the product, and fine-tuning a plug-in module that predicts property values on a handful of summaries. The approach outperforms previous extractive and abstractive methods in automatic and human evaluation on Amazon and Yelp datasets.

Edwin Simpson, Yang Gao, Iryna Gurevych

286.  Interactive Text Ranking with Bayesian Optimization: A Case Study on Community QA and Summarization
TACL, 2020 Reinforced Learning

The paper proposes an interactive text ranking approach that uses Bayesian optimization to focus on high-quality candidates and integrate prior knowledge to address the lack of user or task-specific training data. The approach significantly outperforms existing interactive approaches in community question answering and extractive multidocument summarization. The ranking function learned by the method is also an effective reward function for reinforcement learning, improving the state of the art for interactive text ranking.

Shrey Desai, Jiacheng Xu, Greg Durrett

287.  Compressive Summarization with Plausibility and Salience Modeling
EMNLP, 2020 Supervised Learning

The paper proposes a new approach to compressive summarization that uses data-driven criteria of plausibility and salience to determine which spans of sentences can be deleted. A pre-trained Transformer model judges each criterion, and only deletions that are both plausible and not salient are applied. The approach achieves strong in-domain results on benchmark summarization datasets and can generalize cross-domain with fine-tuning on only 500 samples. Human evaluation shows that the plausibility model generally selects for grammatical and factual deletions.
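The deletion rule described above (apply a deletion only when it is plausible and the span is not salient) can be written as a short Python sketch. In the paper both judgments come from a pre-trained Transformer; here the scores are hard-coded, and the `Span` type, the threshold values, and the function name are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    start: int           # first token index of the candidate span (inclusive)
    end: int             # one past the last token index (exclusive)
    plausibility: float  # assumed score: how safe deleting the span is
    salience: float      # assumed score: how important the span's content is


def compress(
    tokens: List[str],
    spans: List[Span],
    plausibility_min: float = 0.5,
    salience_max: float = 0.5,
) -> List[str]:
    """Delete only spans that are both plausible to remove and not salient."""
    dropped = set()
    for span in spans:
        if span.plausibility >= plausibility_min and span.salience < salience_max:
            dropped.update(range(span.start, span.end))
    return [tok for i, tok in enumerate(tokens) if i not in dropped]


tokens = "The company , founded in 1990 , announced layoffs on Monday".split()
spans = [
    Span(2, 7, plausibility=0.9, salience=0.1),   # appositive: deleted
    Span(9, 11, plausibility=0.8, salience=0.9),  # salient: kept
    Span(0, 2, plausibility=0.1, salience=0.1),   # implausible deletion: kept
]
print(" ".join(compress(tokens, spans)))  # The company announced layoffs on Monday
```

Requiring both conditions keeps the two criteria independent: a span can be grammatically removable yet still carry content the summary needs, and vice versa.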

Shahbaz Syed, Roxanne El Baff, Khalid Al-Khatib, Johannes Kiesel, Benno Stein, Martin Potthast

288.  News Editorials: Towards Summarizing Long Argumentative Texts
COLING, 2020 Unsupervised Learning

The paper discusses the lack of exploration in automatic summarization of argumentative texts and presents a new corpus of 1330 summaries for 266 news editorials. The summaries are evaluated based on a specific annotation scheme and aim to be thesis-indicative, persuasive, reasonable, concise, and self-contained. The corpus contains at least three high-quality summaries for about 90% of the editorials, making it useful for the development and evaluation of summarization technology for long argumentative texts. The paper also reports on an in-depth corpus analysis and the evaluation of two extractive summarization models.

Quentin Grail, Julien Perez, Eric Gaussier

289.  Globalizing BERT-based Transformer Architectures for Long Document Summarization
EACL, 2021 Supervised Learning

The paper discusses the limitations of using current transformer-based architectures for fine-tuning large language models on downstream tasks that require reasoning with long documents. To address this issue, the authors introduce a novel hierarchical propagation layer that spreads information between multiple transformer windows. They validate the effectiveness of their approach on three extractive summarization corpora of long scientific papers and news articles and report state-of-the-art results for long document summarization and comparable results for smaller document summarization.

Tianyu Zhu, Wen Hua, Jianfeng Qu, Xiaofang Zhou

290.  Summarizing Long-Form Document with Rich Discourse Information
CIKM, 2021 Supervised Learning

The paper proposes a new extractive summarization model called HEROES to address the deficiencies of existing models for summarizing long-form documents. The two main deficiencies are the increase in computation due to the size of the input document and the lack of exploitation of discourse structural information. HEROES consists of two modules: a content ranking module that selects important sections and sentences to create a short digest, and an extractive summarization module based on a heterogeneous graph with nodes from different discourse levels and designed edge connections to reflect the discourse hierarchy of the document. Experimental results show that HEROES outperforms various strong baselines.

Linzi Xing, Wen Xiao, Giuseppe Carenini

291.  Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning
ACL, 2021 Supervised Learning

This paper introduces a new technique to reduce lead bias in news articles and improve the performance of neural extractive summarizers on data with different or no bias. The experiments conducted on two news corpora show that this technique effectively reduces the model's learned lead bias and improves its generality on out-of-distribution data, without any significant loss in performance on in-distribution data.

Guangsheng Bao, Yue Zhang

292.  Contextualized Rewriting for Text Summarization
AAAI, 2021 Supervised Learning

The paper discusses the limitations of extractive summarization and the potential benefits of abstractive rewriting. However, existing abstractive rewriting systems consider only extracted summaries as input, which can result in the loss of important background knowledge. To address this issue, the authors propose a contextualized rewriting approach that takes in the entire original document. They formalize this approach as a seq2seq problem with group alignments and introduce group tags to model the alignments. The system identifies extracted summaries through content-based addressing and achieves significant improvements on ROUGE scores compared to non-contextualized rewriting systems without requiring reinforcement learning.

Sharmila Reddy Nangi, Atharv Tyagi, Jay Mundra, Sagnik Mukherjee, Snehal Raj, Aparna Garimella, Niyati Chhaya

293.  AUTOSUMM: Automatic Model Creation for Text Summarization
EMNLP, 2021 Supervised Learning

The paper proposes methods to automatically create deep learning models for extractive and abstractive text summarization tasks, which have shown state-of-the-art performances on various datasets. The methods use a combination of Neural Architecture Search and Knowledge Distillation techniques, leveraging the knowledge provided by large language models such as BERT and GPT-2 to develop smaller, customized models for any given dataset. The proposed methods achieve near state-of-the-art performances in terms of accuracy while reducing inference time and model size.

Jiaxin Ju, Ming Liu, Huan Yee Koh, Yuan Jin, Lan Du, Shirui Pan

294.  Leveraging Information Bottleneck for Scientific Document Summarization
EMNLP, 2021 Unsupervised Learning

The paper presents an unsupervised extractive approach to summarizing long scientific documents using the Information Bottleneck principle. The approach uses signals as queries to retrieve key content from the source document, then applies a pre-trained language model to conduct further sentence search and editing and return the final extracted summaries. The framework can be extended to a multi-view framework by using different signals. The proposed framework was evaluated on three scientific document datasets and was found to be effective. Human evaluation suggests that the extracted summaries cover more content aspects than previous systems.

Zixing Song, Irwin King

295.  Hierarchical Heterogeneous Graph Attention Network for Syntax-Aware Summarization
AAAI, 2022 Supervised Learning

The paper proposes a new approach to summarization that incorporates the constituent structure of the text using Graph Neural Networks. They use a hierarchical heterogeneous graph attention network over constituency-based parse trees for syntax-aware summarization, which reflects how humans construct summaries hierarchically. The model is effective for both abstractive and extractive summarization tasks on five benchmark datasets from various domains, and further performance improvement can be obtained using state-of-the-art pre-trained models.

Sajad Sotudeh, Nazli Goharian

296.  TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation
NAACL, 2022 Supervised Learning

The paper discusses the challenges of generating long/extended summaries for scientific papers, which provide more detailed information than traditional abstracts. The authors propose an extractive summarizer called TSTR that uses introductory information as pointers to salient information. The evaluations on two large-scale datasets show significant improvement in ROUGE and average ROUGE scores compared to strong baselines and state-of-the-art methods. Human evaluations also favor TSTR-generated extended summaries in terms of cohesion and completeness.

Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga, Yasuyo Sawaki

297.  Supporting content evaluation of student summaries by Idea Unit embedding
ACL, 2019

The paper proposes a method for computer-assisted content evaluation of summaries by establishing a correspondence between segments of the source text and its summary using "Idea Units (IUs)." The IU correspondence is based on the similarity between vector representations of the IUs. The proposed method is more robust against rephrased expressions than conventional ROUGE-based baselines and outperformed the baselines in recall. The proposed method has been implemented in a GUI tool called "Segment Matcher" to help teachers establish a link between corresponding IUs across the summary and source text.
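
The similarity-based IU correspondence described above can be illustrated with a minimal sketch. It assumes that each Idea Unit has already been embedded as a plain list of floats, that cosine similarity is the similarity measure, and that a link is kept only above a threshold. The function names, the threshold value, and the choice of cosine similarity are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_idea_units(summary_vecs, source_vecs, threshold=0.5):
    """Link each summary IU to its most similar source IU.

    Hypothetical helper: `summary_vecs` and `source_vecs` are non-empty lists of
    embedding vectors; `threshold` is an assumed cut-off, not a value from the paper.
    Returns a list of (summary_index, source_index, similarity) tuples.
    """
    links = []
    for i, s_vec in enumerate(summary_vecs):
        sims = [cosine(s_vec, t_vec) for t_vec in source_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= threshold:
            links.append((i, best, sims[best]))
    return links
```

A summary IU whose best similarity falls below the threshold is left unlinked, which is one simple way to flag summary content with no clear counterpart in the source text.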

Chenxin An, Ming Zhong, Zhiyong Wu, Qin Zhu, Xuanjing Huang, Xipeng Qiu

298.  COLO: A Contrastive Learning based Re-ranking Framework for One-Stage Summarization
COLING, 2022 Supervised Learning

The paper proposes a new framework called COLO for one-stage summarization that uses contrastive learning to generate summaries directly based on summary-level scores, without additional modules or parameters. The framework improves extractive and abstractive results on the CNN/DailyMail benchmark while maintaining parameter and inference efficiency. Compared to state-of-the-art multi-stage systems, COLO saves more than 100 GPU training hours and has a 3-8x speed-up ratio during inference while achieving comparable results.

Arman Cohan, Nazli Goharian

299.  Scientific Article Summarization Using Citation-Context and Article's Discourse Structure
EMNLP, 2015 Supervised Learning

The paper proposes a new approach to summarizing scientific articles that takes into account citation-context and the document discourse model. The method overcomes the problem of inconsistency between citation summaries and the article's content by providing context for each citation. The approach leverages the scientific article's inherent discourse structure to produce better summaries and shows a significant improvement over existing summarization approaches in terms of ROUGE scores on a scientific summarization dataset. The method is adaptable to other domains beyond the biomedical domain used for evaluation.

Wencan Luo, Diane Litman

300.  Summarizing Student Responses to Reflection Prompts
EMNLP, 2015 Unsupervised Learning

The paper proposes a new algorithm for summarizing student responses to reflection prompts. Unlike traditional methods, the algorithm creates summaries from extracted phrases rather than sentences, and ranks the phrases by the number of students who mention them. Experimental results show that this approach outperforms other summarization methods in terms of ROUGE scores.
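
The ranking step described above, ordering phrases by how many students mention them, can be sketched as follows. This is only an illustration under stated assumptions: phrase extraction is taken as already done, phrases are matched by exact lowercase string, and the input format and function name are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def rank_phrases(responses):
    """Rank phrases by the number of distinct students who mention them.

    Hypothetical input format: `responses` maps a student id to the list of
    phrases extracted from that student's response. Exact string matching after
    lowercasing is an assumption made for this sketch.
    """
    students_per_phrase = defaultdict(set)
    for student, phrases in responses.items():
        for phrase in phrases:
            # A set ensures a student repeating a phrase is counted once.
            students_per_phrase[phrase.lower().strip()].add(student)
    return sorted(
        ((phrase, len(students)) for phrase, students in students_per_phrase.items()),
        key=lambda item: (-item[1], item[0]),  # most students first, ties alphabetical
    )
```

Counting distinct students rather than raw occurrences keeps one verbose response from dominating the ranking.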

Philip John Gorinski, Mirella Lapata

301.  Movie Script Summarization as Graph-based Scene Extraction
NAACL, 2015 Unsupervised Learning

The paper discusses the task of movie script summarization and how it can improve script browsing, provide a general idea of the plotline, and reduce reading time. The authors propose a graph-based model that selects an optimal chain of scenes by considering logical progression, diversity, and importance. Human evaluation shows that their model produces more informative summaries compared to other methods.

Arman Cohan, Nazli Goharian

302.  Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge
SIGIR, 2017 Unsupervised Learning

The paper proposes an unsupervised model that uses distributed representation of words and domain knowledge to extract context from referenced papers to reflect their exact contributions. The model significantly outperforms the state-of-the-art and improves citation-based summarization of scientific articles. The paper highlights the importance of appropriate context for citation texts and presents a solution to address this problem.

Jeffrey Ling, Alexander M. Rush

303.  Coarse-to-Fine Attention Models for Document Summarization
EMNLP, 2017 Supervised Learning

The paper proposes a new approach to document summarization using a coarse-to-fine attention model that hierarchically reads a document. This approach selects top-level chunks of text using coarse attention and then reads the words of the chosen chunks using fine attention. Unlike standard attention models, this method scales with the number of top-level chunks and can handle longer sequences. While it may lag behind state-of-the-art baselines, the proposed method achieves the desired behavior of sparsely attending to subsets of the document for generation.

Florian Boudin, Hugo Mougard, Benoit Favre

304.  Concept-based Summarization using Integer Linear Programming: From Concept Pruning to Multiple Optimal Solutions
EMNLP, 2015 Unsupervised Learning

The paper discusses the challenges of sentence selection in concept-based summarization, which is modelled as a budgeted maximum coverage problem. To find optimal solutions efficiently, low-weight concepts need to be pruned. However, reducing the number of concepts leads to lower ROUGE scores and multiple optimal solutions. The authors propose an extension to the model that provides a single optimal solution and eliminates the need for concept pruning using an approximation algorithm that achieves comparable performance to exact inference.
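
To make the budgeted maximum coverage formulation concrete, the sketch below uses the textbook greedy heuristic: repeatedly pick the sentence with the best ratio of newly covered concept weight to length while it still fits the budget. This is a generic approximation chosen for illustration, not the approximation algorithm proposed in the paper, and the data layout and function name are assumptions.

```python
def greedy_budgeted_coverage(sentences, weights, budget):
    """Greedy heuristic for budgeted maximum coverage.

    Hypothetical inputs: `sentences` is a list of (length, set_of_concepts) pairs
    with positive lengths, `weights` maps each concept to its weight, and `budget`
    is the maximum total summary length.
    Returns (selected sentence indices, total weight of covered concepts).
    """
    selected, covered, used = [], set(), 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            length, concepts = sentences[i]
            if used + length > budget:
                continue  # sentence no longer fits
            # Each concept counts once: only uncovered concepts add gain.
            gain = sum(weights.get(c, 0.0) for c in concepts - covered)
            ratio = gain / length
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits or nothing adds new weight
        selected.append(best)
        covered |= sentences[best][1]
        used += sentences[best][0]
        remaining.remove(best)
    return selected, sum(weights.get(c, 0.0) for c in covered)
```

Because each concept is counted only once, a sentence that repeats already covered concepts contributes no gain, which is how the objective discourages redundancy.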

Daraksha Parveen, Mohsen Mesgar, Michael Strube

305.  Generating Coherent Summaries of Scientific Articles Using Coherence Patterns
EMNLP, 2016 Unsupervised Learning

The paper introduces a new approach to automatic summarization of scientific articles that takes into account coherence. The approach uses a graph-based model and coherence patterns mined from a corpus of abstracts to generate summaries that are coherent, important, and non-redundant. The approach is optimized using Mixed Integer Programming and outperforms baseline and state-of-the-art systems in terms of coherence and relevance.

Sun Kim, Lana Yeganova, John Wilbur

306.  Summarizing topical contents from PubMed documents using a thematic analysis
EMNLP, 2015 Unsupervised Learning

The paper proposes a method for improving the search and browsing experience in PubMed by finding sub-topics or themes from a set of documents and computing representative titles for each theme. The method combines a thematic clustering algorithm and the Pool Adjacent Violators algorithm to induce significant themes. The system was tested on five disease sets from OMIM and outperformed LDA in terms of performance measures. The quality of theme titles was also evaluated by comparing them with manually created titles.

Chen Li, Zhongyu Wei, Yang Liu, Yang Jin, Fei Huang

307.  Using Relevant Public Posts to Enhance News Article Summarization
COLING, 2016 Unsupervised Learning

The paper explores using public posts on social media to improve automatic summary generation for news articles. Different approaches are proposed, including using frequency information from posts to re-estimate bigram weights and re-weighting a dependency tree edge's importance for sentence compression. The experiments conducted on Facebook data show that relevant public posts can be effectively leveraged to improve news article summarization results.

Greg Durrett, Taylor Berg-Kirkpatrick, Dan Klein

308.  Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints
ACL, 2016 Unsupervised Learning

The paper presents a model for single-document summarization that combines compression and anaphoricity constraints. The model selects textual units for the summary based on learned weights from a large corpus. Compression rules allow for content deletion within a sentence, and anaphoricity constraints ensure cross-sentence coherence by including pronoun antecedents or rewriting pronouns as full mentions. The final system outperforms prior work on both ROUGE and human judgments of linguistic quality.

Wencan Luo, Fei Liu, Zitao Liu, Diane Litman

309.  Automatic Summarization of Student Course Feedback
NAACL, 2016 Unsupervised Learning

The paper proposes a new approach to summarizing student course feedback using the integer linear programming (ILP) framework. This approach allows different student responses to share co-occurrence statistics and alleviates sparsity issues. The experimental results on a student feedback corpus show that this approach outperforms a range of baselines in terms of both ROUGE scores and human evaluation.

Ottokar Tilk, Tanel Alumäe

310.  Low-Resource Neural Headline Generation
EMNLP, 2017 Supervised Learning

This paper discusses the challenges of improving headline quality on smaller datasets using neural headline generation models. The authors propose a new method that allows for pre-training all parameters of the model and utilizing all available text. This approach resulted in significant improvements in perplexity and ROUGE scores, with up to a 32.4% relative improvement in perplexity and 2.84 points in ROUGE.

Gyoung Ho Lee, Kong Joo Lee

311.  Automatic Text Summarization Using Reinforcement Learning with Embedding Features
IJCNLP, 2017 Reinforced Learning

The paper discusses the use of simple embedding features in a reinforcement learning approach to automatic text summarization. The authors propose a new deep learning network for estimating the Q-values used in reinforcement learning and evaluate their model using ROUGE scores on various datasets. The results show that their model is competitive with previous models.

Kundan Krishna, Aniket Murhekar, Saumitra Sharma, Balaji Vasan Srinivasan

312.  Vocabulary Tailored Summary Generation
COLING, 2018 Supervised Learning

The paper proposes a neural framework for generating summaries that are tailored to the linguistic preferences of a specific audience. Existing frameworks do not take into account such preferences, but the proposed method tunes the summary words at the time of generation to match the target vocabulary. The evaluations show that the proposed approach maintains a superior summary quality compared to a word embedding based lexical substitution algorithm. The paper demonstrates two applications of the proposed approach to generate summaries with simpler or shorter words for better readability.

Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander R. Fabbri, Irene Li, Dan Friedman, Dragomir R. Radev

313.  ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
AAAI, 2019 Supervised Learning

The paper discusses the challenges of scientific article summarization and proposes solutions to these challenges. The authors develop and release a large-scale manually-annotated corpus for scientific papers on computational linguistics and propose summarization methods that integrate the authors' original highlights and the article's actual impacts on the community to create comprehensive, hybrid summaries. The authors conduct experiments to demonstrate the efficacy of their corpus in training data-driven models for scientific paper summarization and the advantage of their hybrid summaries over abstracts and traditional citation-based summaries. The large annotated corpus and hybrid methods provide a new framework for scientific paper summarization research.

Florian Böhm, Yang Gao, Christian M. Meyer, Ori Shapira, Ido Dagan, Iryna Gurevych

314.  Better Rewards Yield Better Summaries: Learning to Summarise Without References
EMNLP, 2019 Reinforced Learning

The paper discusses the limitations of using ROUGE scores as rewards in Reinforcement Learning (RL) based document summarisation systems, as high ROUGE scores do not necessarily correspond to high human judgement. To address this, the authors learn a reward function from human ratings on 2,500 summaries, which only takes the document and system summary as input. The learned rewards are shown to have significantly higher correlation with human ratings than previous approaches. The authors conduct human evaluation experiments and find that RL systems using their learned rewards generate summaries with higher human ratings compared to state-of-the-art supervised-learning systems and ROUGE-as-rewards RL summarisation systems. The learned reward function and source code are available at https://github.com/yg211/summary-reward-no-reference.

Kristjan Arumae, Parminder Bhatia, Fei Liu

315.  Towards Annotating and Creating Sub-Sentence Summary Highlights
EMNLP, 2019 Supervised Learning

The paper discusses the benefits of creating summary highlights at the sub-sentence level and proposes a method for generating them by annotating summary-worthy sub-sentences and teaching classifiers to do the same. The task is framed as jointly selecting important sentences and identifying a single most informative textual unit from each sentence, which reduces the complexity involved in sentence compression. The study provides new benchmarks and baselines for generating highlights at the sub-sentence level.

Hui Liu, Xiaojun Wan

316.  Neural Review Summarization Leveraging User and Product Information
CIKM, 2019 Supervised Learning

The paper discusses product review summarization, which is a personalized and targeted form of text summarization that provides a brief summary of an online product review. The authors explore different ways to use user and product information to improve review summarization and demonstrate that their approaches are highly effective and outperform existing summarization methods. This technique is useful for both sellers and consumers in making purchase decisions.

Takuya Makino, Tomoya Iwakura, Hiroya Takamura, Manabu Okumura

317.  Global Optimization under Length Constraint for Neural Text Summarization
ACL, 2019 Supervised Learning

The paper proposes a global optimization method called GOLC for neural text summarization models that increases the probabilities of generating summaries with high evaluation scores within a desired length. The method is compared to two other optimization methods on two datasets and the results show that GOLC generates fewer overlength summaries while maintaining the fastest processing speed. The importance of generating in-length summaries for post-editing is also demonstrated, with post-editing time improving by approximately 30% to 40% when in-length summaries are used.

Roy Bar-Haim, Lilach Eden, Roni Friedman, Yoav Kantor, Dan Lahav, Noam Slonim

318.  From Arguments to Key Points: Towards Automatic Argument Summarization
ACL, 2020 Supervised Learning

The paper proposes a method for generating concise summaries from a large collection of arguments on a given topic by representing them as a small set of key points, each scored according to its salience. The authors analyze a large dataset of crowd-contributed arguments and find that a small number of key points per topic is typically sufficient for covering the vast majority of the arguments. They also show that a domain expert can often predict these key points in advance. The paper introduces a novel large-scale dataset for the task of argument-to-key point mapping and reports promising empirical results for an extensive set of experiments with this dataset.

Wen Xiao, Giuseppe Carenini

319.  Systematically Exploring Redundancy Reduction in Summarizing Long Documents
AACL, 2020 Supervised Learning

The paper explores the problem of redundancy in neural summarization and proposes three new methods to balance non-redundancy and importance when summarizing long documents. The authors organize existing methods into categories based on when and how redundancy is considered and show that their proposed methods achieve state-of-the-art ROUGE scores while significantly reducing redundancy on two scientific paper datasets.

Chao Zhao, Snigdha Chaturvedi

320.  Weakly-Supervised Opinion Summarization by Leveraging External Information
AAAI, 2020 Supervised Learning

The paper proposes a generative method called ASPMEM for opinion summarization from online product reviews. ASPMEM contains an array of memory cells to store aspect-related knowledge, which helps obtain a better opinion representation and infer aspect information more precisely. The method is evaluated on both aspect identification and opinion summarization tasks and outperforms state-of-the-art methods without relying on human supervision. The proposed method uses domain knowledge from external sources to automatically identify relevant aspects, eliminating the need for additional human effort.

Roy Bar-Haim, Yoav Kantor, Lilach Eden, Roni Friedman, Dan Lahav, Noam Slonim

321.  Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis
EMNLP, 2020 Supervised Learning

The paper discusses the importance of not only extracting salient points when summarizing a collection of views, arguments or opinions, but also quantifying their prevalence. The traditional approach of creating textual summaries lacks this quantitative aspect. The paper proposes a method for automatic extraction of key points, which enables fully automatic analysis and achieves performance comparable to a human expert. The applicability of key point analysis goes beyond argumentation data, as demonstrated by promising results in municipal surveys and user reviews. The paper also presents an in-depth evaluation of argument-to-key point matching models, where previous results are substantially outperformed.

Hiroaki Hayashi, Prashant Budania, Peng Wang, Chris Ackerson, Raj Neervannan, Graham Neubig

322.  WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
TACL, 2021

The paper proposes a new dataset, WikiAsp, for multi-domain aspect-based summarization, which aims to encourage research in open-domain aspect-based summarization. The dataset is built using Wikipedia articles from 20 different domains, and several baseline models are proposed and tested. The results highlight challenges that existing summarization models face in this setting, such as handling pronouns and time-sensitive information.

Zeyu Dai, Ruihong Huang

323.  A Joint Model for Structure-based News Genre Classification with Application to Text Summarization
ACL, 2021 Supervised Learning

The paper proposes a joint model for structure-based news genre classification that identifies one of four commonly used news structures and recognizes a sequence of news elements within the article that define the corresponding news structure. The joint model consistently outperforms its variants that perform two tasks independently, which supports the idea that preserving the two-way dependencies and constraints between a type of news structure and its sequence of news elements enables the model to better predict both of them. The system's predicted news structure type and news elements have improved the performance of text summarization when incorporated into a recent neural network system.

Rui Meng, Khushboo Thaker, Lei Zhang, Yue Dong, Xingdi Yuan, Tong Wang, Daqing He

324.  Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents
ACL, 2021 Supervised Learning

The paper discusses faceted summarization, which provides multiple summaries of a long document from different perspectives, each targeting specific sections such as purpose, method, findings, and value. The lack of large-scale faceted summarization datasets has hindered research in this area, but the authors present FacetSum, a benchmark built on Emerald journal articles covering diverse domains. The study's analyses and empirical results highlight the importance of structured summaries, and the authors believe FacetSum will drive further advances in summarization research and NLP systems that can leverage structured information in both long texts and summaries.

Sheikh Muhammad Sarwar, Felipe Moraes, Jiepu Jiang, James Allan

325.  Utility of Missing Concepts in Query-biased Summarization
SIGIR, 2021 Unsupervised Learning

The paper discusses a new approach to query-biased summarization (QBS) that aims to reduce user effort in finding relevant documents. The approach identifies missing information in a retrieved document and presents it in a search interface for crowd workers to judge document relevance based on snippets and missing information. The method, called DSPApprox, uses classical approaches to find terms or phrases relevant to a query. The experimental results show both benefits and limitations of the method compared with traditional ones that only show relevant snippets.

Zhongyi Yu, Zhenghao Wu, Hao Zheng, Zhe XuanYuan, Jefferson Fong, Weifeng Su

326.  LenAtten: An Effective Length Controlling Unit For Text Summarization
ACL, 2021 Supervised Learning

The paper discusses fixed-length summarization and the trade-off between length controllability and summary quality. The authors introduce a new length controlling unit called LenAtten, which improves length controllability and ROUGE scores while maintaining strong generalization ability. The experimental results show that their model is significantly better than the best-performing length controllable summarizer on the CNN/Daily Mail dataset.

Jinpeng Hu, Jianling Li, Zhihong Chen, Yaling Shen, Yan Song, Xiang Wan, Tsung-Hui Chang

327.  Word Graph Guided Summarization for Radiology Findings
ACL, 2021 Supervised Learning

The paper discusses the challenges faced by radiologists in writing impression sections of radiology reports, which summarize essential findings and are critical for communicating medical information to physicians. Automatic impression generation has emerged as an attractive research direction to facilitate this clinical practice. The paper proposes a novel method for automatic impression generation, where a word graph is constructed from the findings to record critical words and their relations, and a Word Graph guided Summarization model (WGSUM) is designed to generate impressions with the help of the word graph. Experimental results on two datasets confirm the validity and effectiveness of the proposed approach, achieving state-of-the-art results. Further experiments are conducted to analyze the impact of different graph designs on the performance of the method.

Nadav Oved, Ran Levy

328.  PASS: Perturb-and-Select Summarizer for Product Reviews
ACL, 2021 Supervised Learning

The paper discusses the challenges of automatically producing concise and informative summaries for product reviews, including the tendency for summarizers to favor generic content and the potential for self-contradicting summaries due to reviewer disagreements. The authors propose the PASS system, which uses a pre-trained Transformer-based model and applies systematic perturbations to generate multiple summaries per product. The system also includes a method for ranking the summaries based on coherence. The authors compare their system to other methods and show that it produces more informative, diverse, and coherent summaries.

Chao-Chun Hsu, Chenhao Tan

329.  Decision-Focused Summarization
EMNLP, 2021 Supervised Learning

The paper proposes a new approach to summarization called decision-focused summarization, which aims to summarize relevant information for a particular decision. They use a predictive model to make the decision based on the full text and then select representative sentences that lead to similar model decisions while accounting for non-redundancy. The method, called DecSum, is evaluated on a testbed where the task is to summarize restaurant reviews to predict future ratings on Yelp. DecSum outperforms other summarization methods in decision faithfulness and representativeness and enables humans to outperform random chance in predicting which restaurant will be better rated in the future.

Choongwon Park, Youngjoong Ko

330.  QSG Transformer: Transformer with Query-Attentive Semantic Graph for Query-Focused Summarization
SIGIR, 2022 Supervised Learning

The paper discusses the task of Query-Focused Summarization (QFS) and the limitations of Transformer-based summarization models in utilizing relationships between distant words and query information. To address these issues, the authors propose the QSG Transformer, a novel QFS model that leverages structure information on Query-attentive Semantic Graph (QSG). The QSG node representation is improved by a query-attentive graph attention network, which spreads the information of the query node into QSG using Personalized PageRank. The proposed method achieves superior performance over state-of-the-art models on two QFS datasets.
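
Personalized PageRank, which the summary names as the mechanism for spreading query information through the graph, can be sketched with a plain power iteration. This is only the standalone algorithm on a toy adjacency list; how the paper integrates it into the graph attention network is not shown, and the function name, the teleport probability `alpha`, and the iteration count are assumptions made for illustration.

```python
def personalized_pagerank(adjacency, query_node, alpha=0.15, iterations=50):
    """Power-iteration Personalized PageRank seeded at a single query node.

    Hypothetical inputs: `adjacency` maps every node to a list of its out-neighbors;
    `alpha` is the probability of jumping back to `query_node` at each step.
    Returns a dict mapping each node to its score; scores sum to 1.
    """
    nodes = list(adjacency)
    scores = {n: 1.0 if n == query_node else 0.0 for n in nodes}
    for _ in range(iterations):
        # Teleport mass always returns to the query node.
        nxt = {n: (alpha if n == query_node else 0.0) for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                # Dangling node: send its remaining mass back to the query node.
                nxt[query_node] += (1 - alpha) * scores[n]
                continue
            share = (1 - alpha) * scores[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        scores = nxt
    return scores
```

Nodes that are well connected to the query node end up with higher scores, which gives a query-specific notion of relevance over the graph.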

Chenxin An, Ming Zhong, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang

331.  Enhancing Scientific Papers Summarization with Citation Graph
AAAI, 2021 Supervised Learning

The paper proposes a new approach to scientific paper summarization that utilizes the citation network of the papers. The authors argue that previous approaches have focused too much on the content of the papers and have not taken into account the importance of the citation network. They introduce a new model called CGSUM that incorporates both the source paper and its references. They also construct a new dataset called Semantic Scholar Network (SSN) that contains 141K research papers and 661K citation relationships. The experiments show that the proposed model outperforms pretrained models even with a simple architecture and that the citation graph is crucial for generating high-quality summaries.

Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas

332.  MIRANEWS: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
EMNLP, 2021 Supervised Learning

The paper discusses the problem of 'extrinsic hallucinations' in single-document news summarization, where the summary contains facts not present in the source document. The authors propose using multiple supplementary resource documents to mitigate this problem and present a new dataset called MIRANEWS to benchmark existing summarization models. They show that more than 27% of facts mentioned in the gold summaries of MIRANEWS are better grounded on assisting documents than in the main source articles. The authors also conduct an error analysis of generated summaries from pretrained models fine-tuned on MIRANEWS, revealing that assisted summarization reduces 55% of hallucinations when compared to single-document summarization models trained on the main article only.

Jinpeng Hu, Zhuo Li, Zhihong Chen, Zhen Li, Xiang Wan, Tsung-Hui Chang

333.  Graph Enhanced Contrastive Learning for Radiology Findings Summarization
ACL, 2022 Supervised Learning

The paper proposes a unified framework for automatic impression generation in radiology reports that leverages both extra knowledge and the original findings in an integrated way. The proposed method encodes each input finding using a text encoder and constructs a graph through its entities and dependency tree. A graph encoder is then adopted to model relation information in the constructed graph. Finally, contrastive learning is introduced to emphasize key words in the findings. The experimental results on OpenI and MIMIC-CXR confirm the effectiveness of the proposed method.

Hayate Iso, Xiaolan Wang, Stefanos Angelidis, Yoshihiko Suhara

334.  Comparative Opinion Summarization via Collaborative Decoding
ACL, 2022 Supervised Learning

The paper proposes a new task called comparative opinion summarization, which generates two contrastive summaries and one common summary from two different sets of reviews to help users compare multiple choices. The authors develop a framework called COCOSUM, which consists of two base summarization models that jointly generate the summaries. Experimental results show that COCOSUM produces higher-quality summaries than existing opinion summarization models. The dataset and code are available for use.

Kexun Zhang, Jiaao Chen, Diyi Yang

335.  Focus on the Action: Learning to Highlight and Summarize Jointly for Email To-Do Items Summarization
ACL, 2022 Supervised Learning

The paper discusses the task of automatic email to-do item generation, which involves generating action mentions from emails to help people schedule their daily work. The authors propose a learning to highlight and summarize framework (LHS) to identify the most salient text and actions and generate more faithful to-do items. The LHS model outperforms baseline models and achieves state-of-the-art performance in both quantitative evaluation and human judgement. The paper also highlights specific challenges that current models face with email to-do summarization.

Naman Bansal, Mousumi Akter, Shubhra Kanti Karmaker

336.  Semantic Overlap Summarization among Multiple Alternative Narratives: An Exploratory Study
COLING, 2022 Supervised Learning

The paper introduces a new NLP task called Semantic Overlap Summarization (SOS), which involves generating a summary from multiple alternative narratives. The authors created a benchmark dataset by collecting alternative narrative pairs and manually creating reference summaries. They found that the popular ROUGE metric is not suitable for evaluating this task and instead used a sentence-wise annotation technique with three overlap labels. Their experiments showed that this technique yielded higher correlation with human judgment and higher inter-rater agreement compared to the ROUGE metric.

Pengshan Cai, Fei Liu, Adarsha Bajracharya, Joe Sills, Alok Kapoor, Weisong Liu, Dan Berlowitz, David Levy, Richeek Pradhan, Hong Yu

337.  Generation of Patient After-Visit Summaries to Support Physicians
COLING, 2022 Supervised Learning

The paper discusses the problem of physicians not having enough time to write clear and informative after-visit summaries for patients, and explores the possibility of using automatic generation of summaries. The study uses a clinical dataset to examine whether automatic summaries can effectively convey the important details of clinical visits. The results suggest that generating lay language after-visit summaries is still a challenging task, but a feedback mechanism is introduced to alert physicians when automatic summaries fail to capture important details or contain potentially detrimental information. Automatic and human evaluation shows the effectiveness of this approach in providing writing feedback and supporting physicians.

Mounica Maddela, Mayank Kulkarni

338.  ENTSUM: A Data Set for Entity-Centric Summarization
ACL, 2022 Supervised Learning

The paper discusses controllable summarization, which aims to provide summaries that take into account user-specified aspects and preferences. The authors introduce a human-annotated data set (ENTSUM) for controllable summarization with a focus on named entities as the aspects to control. They conduct an extensive analysis and show that existing methods for controllable summarization fail to generate entity-centric summaries. The authors propose extensions to state-of-the-art summarization approaches that achieve substantially better results on their data set. The paper highlights the challenging nature of this task and the proposed data set.

Ryuji Kano, Takumi Takahashi, Toru Nishino, Motoki Taniguchi, Tomoki Taniguchi, Tomoko Ohkuma

339.  Quantifying Appropriateness of Summarization Data for Curriculum Learning
EACL, 2021

The paper proposes a curriculum learning method for training summarization models on noisy data. The authors use sequence-to-sequence models and propose a model that can quantify noise from a single noisy corpus. They conduct experiments on three summarization models and show that their method improves performance. They also analyze how different curricula affect the performance of pretrained and non-pretrained summarization models. Human evaluation results also show that their method improves the performance of summarization models.

Max Grusky, Mor Naaman, Yoav Artzi

340.  NEWSROOM: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies
NAACL, 2018

The paper introduces NEWSROOM, a dataset of 1.3 million articles and summaries from 38 major news publications, extracted from search and social media metadata between 1998 and 2017. The summaries demonstrate a high diversity of summarization styles, combining abstractive and extractive strategies. The authors analyze the extraction strategies used in NEWSROOM summaries and compare them to other datasets to evaluate its diversity and difficulty. They also train existing methods on the data to evaluate its utility and challenges. The dataset is available online at summari.es.

Ting-Yao Hsu, Yoshi Suhara, Xiaolan Wang

341.  Summarizing Community-based Question-Answer Pairs
EMNLP, 2022 Supervised Learning

The paper proposes a new task of summarizing Community-based Question Answering (CQA) pairs to help users quickly digest key information. The authors design a multi-stage data annotation process and create a benchmark dataset, COQASUM, based on the Amazon QA corpus. They compare extractive and abstractive summarization methods and establish a strong baseline approach called DedupLED. The experiment confirms two key challenges, sentence-type transfer and deduplication removal, towards the CQA summarization task. The data and code are publicly available.

Rishi Bommasani, Claire Cardie

342.  Intrinsic Evaluation of Summarization Datasets
EMNLP, 2020

The paper discusses the importance of high quality data for building statistical models in natural language processing (NLP), and the need to evaluate data quality during dataset construction or post hoc. It highlights that popular summarization datasets are often drawn from natural sources without quality assurance guarantees, and that data quality has gone largely unquestioned in recent summarization research. The authors introduce 5 intrinsic metrics and apply them to 10 popular datasets, finding that data usage in recent summarization research is sometimes inconsistent with the underlying properties of the datasets employed. They also discover that their metrics can serve as inexpensive heuristics for detecting low quality examples.

Mehwish Fatima, Michael Strube

343.  A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization
EMNLP, 2021

The paper discusses the challenge of cross-lingual summarization and the lack of available resources for this task. To address this issue, the authors present a new dataset for monolingual and cross-lingual summarization in the English-German pair. They collected high-quality cross-lingual data from Spektrum der Wissenschaft and complemented it with a similar dataset from the Wikipedia Science Portal. The authors also conducted experiments with various summarization models and found that the proposed dataset is useful for monolingual and cross-lingual summarization.

Jia Jin Koay, Alexander Roustai, Xiaojin Dai, Dillon Burns, Alec Kerrigan, Fei Liu

344.  How Domain Terminology Affects Meeting Summarization Performance
COLING, 2020

The paper discusses the importance of meetings in organizations and the need for a meeting summarization system to help users quickly search and sift through large meeting collections. The authors analyze the impact of domain terminology, or jargon terms, on the performance of meeting summarization and find that it can have a substantial impact. They create gold-standard annotations for jargon terms on a sizable meeting corpus and publicly release all domain terminology to advance research in meeting summarization.

Amr Keleg, Matthias Lindemann, Danyang Liu, Wanqiu Long, Bonnie L. Webber

345.  Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization
ACL, 2022

The paper discusses the impact of non-summary texts, specifically straplines, on the quality of news article summarization. The authors identify straplines as a common form of non-summary text that is often included in scraped corpora used for news summarization. They present a rule-based strapline detection method that achieves good performance and show that removing straplines and noise from the training data of a news summarizer results in higher quality summaries, with improvements of up to 7 ROUGE points.

Sajad Sotudeh, Hanieh Deilamsalehy, Franck Dernoncourt, Nazli Goharian

346.  TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts
EMNLP, 2021

The paper discusses the importance of training data in developing summarization systems and introduces a new large-scale summarization dataset called TLDR9+, containing over 9 million training instances extracted from the Reddit discussion forum. The dataset is specifically gathered for extreme summarization and is more than twice as large as the previously proposed dataset. The authors also distill a more fine-grained dataset called TLDRHQ by sampling high-quality instances from TLDR9+ with the help of human annotations. The paper further evaluates different state-of-the-art summarization models on the proposed datasets.

Vivian Lai, Alison Smith-Renner, Ke Zhang, Ruijia Cheng, Wenjuan Zhang, Joel Tetreault, Alejandro Jaimes

347.  An Exploration of Post-Editing Effectiveness in Text Summarization
NAACL, 2022

The paper discusses the potential benefits of human-AI collaboration in text summarization through post-editing. The study conducted with 72 participants compared post-editing provided summaries with manual summarization for summary quality, human efficiency, and user experience on formal and informal text. The results suggest that post-editing can be useful in some cases, but not in others, and participants' different editing strategies and needs for assistance offer implications for future human-AI summarization systems.

Minh-Tien Nguyen, Minh-Le Nguyen

348.  SoLSCSum: A Linked Sentence-Comment Dataset for Social Context Summarization
CIKM, 2016

The paper introduces a new dataset called SoLSCSum for social context summarization, consisting of 157 open-domain articles and their comments from Yahoo News that were manually annotated by two annotators to extract standard summaries. The dataset has a high inter-annotator agreement and can be used to train summarization methods such as SVM. The paper also demonstrates the potential use of the dataset by training a learning to rank model with local and cross features, which achieved significant improvements in document summarization over state-of-the-art baselines.

Umanga Bista, Alexander Mathews, Minjeong Shin, Aditya Krishna Menon, Lexing Xie

349.  Comparative Document Summarisation via Classification
AAAI, 2019

The paper discusses extractive summarization in a comparative setting, where the objective is to select a small number of documents that represent each group and distinguish them from other groups. The authors propose a new set of objective functions that connect recent literature on document summarization, interpretable machine learning, and data subset selection. They cast the problem as a binary classification among different groups and derive objectives based on the maximum mean discrepancy and a gradient-based optimization strategy. The authors evaluate comparative summarization methods on a newly curated collection of controversial news topics over 13 months and find that gradient-based optimization outperforms discrete and baseline approaches in 15 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimization elicit 7% more accurate classification from human workers than discrete optimization. The authors suggest that their formulation of comparative summarization will be useful in comparing content sources, authors, related topics, or distinct viewpoints.
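
As a rough illustration of the maximum mean discrepancy (MMD) mentioned above, the sketch below computes the standard biased estimate of squared MMD between two sets of vectors with an RBF kernel. The document embeddings, the `gamma` value, and the function names are hypothetical choices made for this example; the summary does not specify the paper's exact objectives or kernels, so this should not be read as the authors' implementation.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """RBF (Gaussian) kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between the samples xs and ys."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, gamma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

# Hypothetical 2-d "document embeddings" for one group and two candidate summaries.
group = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
candidate_close = [[0.0, 0.1], [1.0, 0.9]]
candidate_far = [[5.0, 5.0], [6.0, 5.0]]

print(mmd_squared(group, candidate_close))
print(mmd_squared(group, candidate_far))
```

Under this reading, a candidate subset with a lower MMD to its own group is more representative of that group, and a higher MMD to the other groups makes it more distinguishing.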

Fajri Koto, Timothy Baldwin, Jey Han Lau

350.  LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization
COLING, 2022

The paper discusses the importance of summaries, keyphrases, and titles in capturing the content of a document. The authors introduce LipKey, a news corpus with human-written abstractive summaries, absent keyphrases, and titles. They use multi-task training and joint structured inputs to improve transformer-based summarization models by including absent keyphrases and titles as additional context to the source document.

Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu, Dong Yu

351.  Toward Unifying Text Segmentation and Long Document Summarization
EMNLP, 2022 Supervised Learning

The paper discusses the importance of text segmentation in understanding and summarizing long documents, particularly in transcripts of audio/video recordings. The authors propose an approach that simultaneously performs summarization and segmentation to learn robust sentence representations, which is further enhanced by an optimization-based regularizer to promote selection of diverse summary sentences. The approach was evaluated on multiple datasets and found to achieve state-of-the-art performance on publicly available benchmarks, with better cross-genre transferability when equipped with text segmentation. The paper also includes analyses to quantify the impact of section segmentation on summarizing written and spoken documents of substantial length and complexity.

Frederic Kirstein, Jan Philip Wahle, Terry Ruas, Bela Gipp

352.  Analyzing Multi-Task Learning for Abstractive Text Summarization
EMNLP, 2022

The paper explores the effects of task families on abstractive text summarization, specifically analyzing the influence of multi-task learning strategies using task families for the English language. The authors group tasks into three strategies and evaluate trained models through two downstream tasks, finding that certain combinations of task families positively impact downstream performance. They also find that choice and combinations of task families influence downstream performance more than the training scheme, supporting the use of task families for abstractive text summarization. The code is publicly available.

Nachshon Cohen, Oren Kalinsky, Yftah Ziser, Alessandro Moschitti

353.  WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation
ACL, 2021

The paper discusses the challenges of evaluating summarization output from existing datasets, which are often curated from academic documents written for experts. To address this issue, the authors present a new dataset based on article summaries from the WikiHow website, which are written in plain language and focused on how-to articles. The authors compare their dataset to existing ones and show that it makes human evaluation more manageable and effective. A human evaluation conducted on PubMed and the proposed dataset supports their findings.

Seonil Son, Junsoo Park, Jeong-in Hwang, Junghwa Lee, Hyungjong Noh, Yeonsoo Lee

354.  HaRiM: Evaluating Summary Quality with Hallucination Risk
AACL, 2022

The paper discusses the challenge of measuring the factual consistency of generated text in summarization models. The authors propose a reference-free metric called HaRiM, which measures hallucination risk based on token likelihoods and correlates well with human judgment on three summary-quality annotation sets. They reinterpret a previously suggested objective as a hallucination risk measurement to better estimate summary quality without requiring additional training or alignment to human judgments. The authors hope their work will facilitate progress in both automated evaluation and generation of summaries.

Esin Durmus, Mona Diab

355.  FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
ACL, 2020

The paper discusses the issue of neural abstractive summarization models generating content inconsistent with the source document, and the inadequacy of existing automatic metrics to capture such mistakes. The authors propose an automatic question answering (QA) based metric for evaluating the faithfulness of generated summaries, which has a higher correlation with human faithfulness scores, especially on highly abstractive summaries. The authors also find that current models exhibit a trade-off between abstractiveness and faithfulness, with outputs having less word overlap with the source document being more likely to be unfaithful.
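
The word-overlap notion in the last sentence can be made concrete with a small sketch. The function below measures the fraction of summary n-grams that also occur in the source, a common proxy for how extractive a summary is. It is not the FEQA metric itself, and the whitespace tokenization, lowercasing, and function names are simplifying assumptions made for this example.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_overlap(source, summary, n=1):
    """Fraction of summary n-grams that also occur in the source."""
    source_ngrams = set(ngrams(source.lower().split(), n))
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0  # summary shorter than n tokens
    hits = sum(gram in source_ngrams for gram in summary_ngrams)
    return hits / len(summary_ngrams)

source = "the cat sat on the mat"
print(ngram_overlap(source, "the cat sat"))        # copied span
print(ngram_overlap(source, "a feline rested"))    # fully reworded
print(ngram_overlap(source, "cat on the mat", n=2))
```

A low overlap score marks a highly abstractive summary, which is the regime where the summary above reports that outputs are more likely to be unfaithful.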

Seyed Ali Bahrainian, Sheridan Feucht, Carsten Eickhoff

356.  NEWTS: A Corpus for News Topic-Focused Summarization
ACL, 2022

The paper discusses how text summarization models are improving and how existing benchmarking corpora may not reflect the full range of summarization needs. The paper introduces a new topical summarization corpus called NEWTS, which is based on the CNN/Dailymail dataset and annotated via online crowd-sourcing. Each source article is paired with two reference summaries, each focusing on a different theme of the source document. The paper evaluates existing techniques and analyzes the effectiveness of different prompting methods.

Shiyue Zhang, Asli Celikyilmaz, Jianfeng Gao, Mohit Bansal

357.  EMAILSUM: Abstractive Email Thread Summarization
EACL, 2021

The paper discusses the importance of summarizing conversation threads to improve work and communication efficiency. To aid in research on thread summarization, the authors developed an abstractive Email Thread Summarization dataset and conducted a study on different summarization techniques. The study revealed challenges in current abstractive summarization models, such as understanding the sender's intent and identifying the roles of sender and receiver. The authors also found that widely used automatic evaluation metrics are weakly correlated with human judgments, emphasizing the importance of human evaluation and the development of better metrics.

Piji Li, Haisong Zhang, Xiaojiang Liu, Shuming Shi

358.  Rigid Formats Controlled Text Generation
ACL, 2020

The paper discusses the challenges of generating text in rigid formats such as lyrics, sonnets, and classical Chinese poetry, which require adherence to strict formatting and rhyming schemes while maintaining sentence integrity. The authors propose a framework called SongNet, which is a Transformer-based auto-regressive language model that uses tailor-designed symbols to improve modeling performance. The attention mechanism is also improved to capture future information on the format. The framework is pre-trained and fine-tuned, and experiments show that it generates better results than existing methods in terms of both automatic metrics and human evaluation.

Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel

359.  SummScreen: A Dataset for Abstractive Screenplay Summarization
ACL, 2022

The paper introduces a summarization dataset called SUMMSCREEN, which consists of pairs of TV series transcripts and human-written recaps. The dataset poses a challenge for abstractive summarization due to plot details being scattered throughout the transcript and the presence of content that does not directly relate to the central plot. The paper proposes two entity-centric evaluation metrics and evaluates several methods, including neural models and those based on nearest neighbors. An oracle extractive approach outperforms all benchmarked models, indicating that neural models are unable to fully exploit the input transcripts. Human evaluation and qualitative analysis show that non-oracle models are competitive with their oracle counterparts but generate unfaithful facts, suggesting future research directions.

Junjie Li, Haoran Li, Chengqing Zong

360.  Towards Personalized Review Summarization via User-Aware Sequence Network
AAAI, 2019

The paper discusses personalized review summarization, which generates a condensed summary for a user's review, accounting for their preference on different aspects or writing style. The proposed model, User-aware Sequence Network (USN), considers the user's characteristics when generating summaries, containing a user-aware encoder and decoder. The user-aware encoder selects important information of a review, and the user-aware decoder incorporates user characteristics and word-using habits to generate personalized summaries. The model was validated using a new dataset, and achieved state-of-the-art performance on personalized review summarization. The paper focuses on single-review summarization and leaves adapting the model to multi-review summarization scenarios for future work. The review provided is about a hotel near the airport, with a clean and comfortable room and a slightly high price. The summary generated is "very quite room in a great location."

Daniel Deutsch, Rotem Dror, Dan Roth

361.  A Statistical Analysis of Summarization Evaluation Metrics Using Resampling Methods
TACL, 2021

The paper discusses the challenges in evaluating the quality of summarization evaluation metrics and proposes methods for calculating confidence intervals and running hypothesis tests for correlations using two resampling methods, bootstrapping and permutation. The authors evaluate the proposed methods through simulation experiments and apply them to several automatic evaluation metrics across three sets of human annotations. They find that confidence intervals are wide, indicating high uncertainty in the reliability of automatic metrics. However, two recent works, QAEval and BERTScore, show statistical improvements over ROUGE in some evaluation settings.
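
The two resampling ideas named above can be sketched in a few lines. The example below is a simplified, single-metric illustration: a percentile bootstrap confidence interval for the Pearson correlation between metric scores and human scores, and a permutation test of the null hypothesis that the two are unrelated. The paper's actual procedures (for instance, how systems and inputs are resampled, and how two metrics are compared against each other) are not described in the summary, so the function names, the toy data, and the choice of Pearson correlation are assumptions made for this sketch.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # degenerate resample: treat as no correlation
    return cov / (vx * vy) ** 0.5

def bootstrap_ci(x, y, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the correlation."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample pairs with replacement
        stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

def permutation_pvalue(x, y, n_resamples=2000, seed=0):
    """One-sided p-value for the null hypothesis of no association."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if pearson(x, shuffled) >= observed:
            count += 1
    return (count + 1) / (n_resamples + 1)

# Toy data (hypothetical): metric scores and human ratings for eight summaries.
metric_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
human_scores = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
print(bootstrap_ci(metric_scores, human_scores))
print(permutation_pvalue(metric_scores, human_scores))
```

With only a handful of annotated summaries the bootstrap interval is typically wide, which matches the high uncertainty the summary reports.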

Pavlos Vougiouklis, Elena Simperl

362.  Point at the Triple: Generation of Text Summaries from Knowledge Base Triples (Extended Abstract)
IJCAI, 2020

The paper discusses a method for generating natural language summaries from knowledge base triples using a pointer-generator network. The network can generate regular words and verbalize triples in multiple ways. The approach was evaluated through automatic and human evaluations on single and open-domain summaries generation tasks, and it outperformed other data-driven baselines significantly.

Abram Handler, Prem Ganeshkumar, Brendan O’Connor, Mohamed AlTantawy, Slobodan Milosevic

363.  Summarizing Relationships for Interactive Concept Map Browsers
EMNLP, 2019

The paper discusses concept maps, which are visual summaries of important concepts from a dataset, displayed as vertices with edges showing natural language descriptions of relationships between concepts. While previous attempts at creating concept maps have been static, the paper presents a model that responds to queries by returning short, importance-ranked, natural language descriptions of the relationship between two requested concepts for display in a visual interface. The model is trained on a new public dataset, and the code and data are available on GitHub.

Anastassia Kornilova, Vlad Eidelman

364.  BillSum: A Corpus for Automatic Summarization of US Legislation
EMNLP, 2019

The paper introduces BillSum, the first dataset for summarization of US Congressional and California state bills. The authors explain the challenges in processing this type of data and benchmark extractive methods that consider neural sentence representations and traditional contextual features. They also demonstrate that models built on Congressional bills can be used to summarize California bills, showing that methods developed on this dataset can transfer to states without human-written summaries.

Priyam Tejaswin, Dhruv Naik, Pengfei Liu

365.  How well do you know your summarization datasets?
ACL, 2021

The paper discusses the lack of understanding of the characteristics of datasets used to train and evaluate summarization systems, and how they affect system performance and reliability of metrics. The authors manually analyze 600 samples from three popular summarization datasets and classify them into six categories based on noise types and summarization difficulty. They then analyze 27 state-of-the-art summarization models and 5 popular metrics, and report their findings on the distinct data quality and complexity distributions of datasets, the dependence of model performance and metric reliability on sample complexity, and the low scores received by faithful summaries due to poor diversity of references. The authors also release the code, annotated data, and model outputs.

Tal Baumel, Raphael Cohen, Michael Elhadad

366.  Topic Concentration in Query Focused Summarization Datasets
AAAI, 2016

The paper discusses Query-Focused Summarization (QFS), which summarizes a document cluster in response to a specific input query. The authors note that current state-of-the-art algorithms for QFS do not significantly improve upon generic summarization methods, which ignore query relevance, when evaluated on traditional QFS datasets. They hypothesize that this is due to the high topic concentration in these datasets. To address this, they introduce a new QFS dataset with controlled levels of topic concentration and compare algorithms on this dataset. They report strong improvement in performance for algorithms that properly model query relevance and present three new QFS algorithms that outperform state-of-the-art methods on the new dataset.

Wojciech Kryściński, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher

367.  Neural Text Summarization: A Critical Evaluation
EMNLP, 2019

The paper discusses the current state of text summarization, which aims to condense long documents into shorter versions while retaining important information. Despite increased interest and research, progress on benchmark datasets has stalled. The authors identify three primary issues: 1) datasets may contain noise and are underconstrained, 2) evaluation metrics do not account for important factors such as factual correctness, and 3) models overfit to biases in current datasets and lack diversity in their outputs.

Meng Cao, Yue Dong, Jackie Chi Kit Cheung

368.  Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization
ACL, 2022

The paper discusses how abstractive summarization systems often generate content that is not directly inferable from the source text, known as "hallucinations." However, the authors found that much of this hallucinated content is factual and can provide useful background information in a summary. They propose a novel detection approach to separate factual from non-factual hallucinations of entities, using pre-trained and finetuned masked language models. Their approach outperforms five baselines and strongly correlates with human judgments. The authors also show that their detector, when used as a reward signal in an off-line reinforcement learning algorithm, significantly improves the factuality of summaries while maintaining the level of abstractiveness.

Shuaiqi LIU, Jiannong Cao, Zhiyuan Wen

369.  Generating a Structured Summary of Numerous Academic Papers: Dataset and Method
IJCAI, 2022

The paper discusses the challenges of summarizing numerous academic papers into a structured summary and proposes a solution called BigSurvey, which is a large-scale dataset for generating comprehensive summaries of academic papers on each topic. The authors utilize target summaries from over 7,000 survey papers and their 430,000 reference papers' abstracts as input documents. They also propose a summarization method called category-based alignment and sparse transformer (CAST), which outperforms various advanced summarization methods.

Yifan Chen, Tamara Polajnar, Colin Batchelor, Simone Teufel

370.  A Corpus of Very Short Scientific Summaries
CONLL, 2020

The paper introduces a new task of summarizing scientific articles in the chemistry domain into one or two-sentence table of contents entries. The authors use an open access publication corpus and evaluate their approach using state-of-the-art summarization methods.

Ming Zhong, Danqing Wang, Pengfei Liu, Xipeng Qiu, Xuanjing Huang

371.  A Closer Look at Data Bias in Neural Extractive Summarization Models
EMNLP, 2019

The paper discusses the current state of summarization datasets and how different factors of datasets affect the generalization behavior of neural extractive summarization models. The authors propose several properties of datasets that matter for the generalization of summarization models and analyze how different properties of datasets influence the choices of model structure design and training methods. They demonstrate that a deep understanding of dataset characteristics can lead to significant improvements in existing models.

Xiaojun Wan, Yue Hu

372.  BrailleSUM: A News Summarization System for the Blind and Visually Impaired People
ACL, 2015

The paper discusses the challenges of document summarization for blind and visually impaired people and proposes a new system called BrailleSUM. The system takes into account the length of each sentence in news articles and uses an ILP-based summarization method. Evaluation results show that BrailleSUM can produce shorter braille summaries without sacrificing content quality.

Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, Dragomir Radev

373.  SummEval: Re-evaluating Summarization Evaluation
TACL, 2021

The paper addresses the lack of consensus and comprehensive studies on evaluation metrics for text summarization. The authors re-evaluate 14 automatic evaluation metrics and benchmark 23 recent summarization models using these metrics. They also assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format. Additionally, they implement and share a toolkit for evaluating summarization models across a broad range of automatic metrics and assemble the largest and most diverse collection of human judgments of model-generated summaries on the CNN/DailyMail dataset. The authors hope that their work will promote a more complete evaluation protocol for text summarization and advance research in developing evaluation metrics that better correlate with human judgments.

Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab

374.  AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization
NAACL, 2022

The paper discusses the challenge of answer summarization in Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers, where each question thread can receive a large number of answers with different perspectives. The absence of a dataset to provide supervision for producing such summaries is a major obstacle. The paper introduces a novel dataset of 4,631 CQA threads for answer summarization curated by professional linguists. The pipeline gathers annotations for all subtasks of answer summarization, including relevant answer sentence selection, grouping these sentences based on perspectives, summarizing each perspective, and producing an overall summary. The paper also introduces a novel unsupervised approach for multi-perspective data augmentation that boosts summarization performance according to automatic evaluation. Finally, the paper proposes reinforcement learning rewards to improve factual consistency and answer coverage and analyzes areas for improvement.

Ahmed Magooda, Diane Litman

375.  Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization
EMNLP, 2021

This paper discusses three techniques for improving abstractive summarization models without requiring additional data. These techniques include data synthesis with paraphrasing, data augmentation with sample mixing, and curriculum learning with two new difficulty metrics. The experiments conducted show that these techniques can improve summarization performance across two models and two small datasets, both when applied in isolation and when combined.

Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu, Graham Neubig

376.  Re-evaluating Evaluation in Text Summarization
EMNLP, 2020

The paper discusses the importance of automated evaluation metrics in text summarization tasks and highlights the need to re-evaluate the current standard metric, ROUGE, which has been used for almost 20 years. The authors assess the reliability of automatic metrics using top-scoring system outputs on modern datasets and systems, both abstractive and extractive, for system-level and summary-level evaluation settings. They find that conclusions about evaluation metrics on older datasets do not necessarily hold on modern datasets and systems. The authors release a dataset of human judgments collected from 25 top-scoring neural summarization systems, which can be found on GitHub.
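
The difference between the system-level and summary-level settings mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the score layout (rows are systems, columns are documents), and the use of Pearson correlation are all assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length score lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined here; this sketch returns 0
    return cov / sqrt(vx * vy)

def system_level(metric, human):
    """Average each system's scores over documents, then correlate across systems."""
    metric_means = [sum(row) / len(row) for row in metric]
    human_means = [sum(row) / len(row) for row in human]
    return pearson(metric_means, human_means)

def summary_level(metric, human):
    """Correlate across systems separately for each document, then average."""
    n_docs = len(metric[0])
    per_doc = [
        pearson([row[d] for row in metric], [row[d] for row in human])
        for d in range(n_docs)
    ]
    return sum(per_doc) / n_docs

# Hypothetical scores: 3 systems (rows) x 2 documents (columns).
metric_scores = [[0.2, 0.4], [0.5, 0.5], [0.9, 0.6]]
human_scores = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
sys_corr = system_level(metric_scores, human_scores)
summ_corr = summary_level(metric_scores, human_scores)
print(sys_corr, summ_corr)
```

A metric can rank systems well on average while still ordering individual summaries poorly, which is why the two settings are reported separately.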

Soham Poddar, Azlaan Mustafa Samad, Rajdeep Mukherjee, Niloy Ganguly, Saptarshi Ghosh

377.  CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines
SIGIR, 2022

The paper discusses the societal challenge of convincing people to get vaccinated against COVID-19 and the use of social media analysis to understand specific concerns people have towards vaccines. The authors have curated CAVES, a large-scale dataset of about 10k COVID-19 anti-vaccine tweets labeled into various specific anti-vaccine concerns in a multi-label setting. This is the first multi-label classification dataset that provides explanations for each label and class-wise summaries of all tweets. Preliminary experiments show that this is a challenging dataset for multi-label explainable classification and tweet summarization.

Alexander R. Fabbri, Faiaz Rahman, Imad Rizvi, Borui Wang, Haoran Li, Yashar Mehdad, Dragomir Radev

378.  ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining
ACL, 2021

The paper discusses the lack of standardized datasets for summarizing online discussions, which has resulted in abstractive text summarization primarily focusing on news articles. To address this gap, the authors design annotation protocols to crowdsource four new datasets on diverse online conversation forms. They benchmark state-of-the-art models on these datasets and analyze characteristics associated with the data. They also evaluate these models on widely-used conversation summarization datasets to establish strong baselines in this domain. The authors incorporate argument mining through graph construction to directly model the issues, viewpoints, and assertions present in a conversation and filter noisy input, showing comparable or improved results according to automatic and human evaluations.

Ojas Ahuja, Jiacheng Xu, Akshay Gupta, Kevin Horecka, Greg Durrett

379.  ASPECTNEWS: Aspect-Oriented Summarization of News Documents
ACL, 2022

The paper discusses the limitations of generic and query-based summaries and proposes aspect-oriented summaries that focus on high-level topics discussed among similar types of documents. The authors collected a dataset of aspect-oriented summaries for articles in news sub-domains and evaluated existing techniques for generating such summaries without in-domain training data. They compared different training schemes and found that their final approach produced focused summaries that were better than those from a generic summarization system or keyword matching, and that the system was sensitive to the choice of keywords.

Yiran Chen, Pengfei Liu, Ming Zhong, Zi-Yi Dou, Danqing Wang, Xipeng Qiu, Xuanjing Huang

380.  CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems
EMNLP, 2020

The paper discusses the limitations of existing evaluation methods for text summarization models, which are typically trained and evaluated on the same dataset. The authors argue that this approach can narrow our understanding of the generalization ability of different summarization systems. To address this, they perform an in-depth analysis of different datasets and investigate the performance of 11 representative summarization systems on 5 datasets from different domains under a cross-dataset setting. The study reveals the effect of model architecture and generation method (i.e., abstractive or extractive) on model generalization ability and sheds light on the limitations of existing summarizers. Supplementary code can be found on their GitHub page.

Yang Liu, Chenguang Zhu, Michael Zeng

381.  End-to-End Segmentation-based News Summarization
ACL, 2022

The paper introduces a new way of digesting news content by segmenting a news article into multiple sections and generating corresponding summaries for each section. The authors create a dataset called SEGNEWS, consisting of 27k news articles with sections and aligned heading-style section summaries. They propose a novel segmentation-based language generation model adapted from pretrained language models that can jointly segment a document and produce the summary for each section. Experimental results on SEGNEWS show that their model outperforms several state-of-the-art sequence-to-sequence generation models for this task.

Prasetya Ajie Utama, Joshua Bambrick, Nafise Sadat Moosavi, Iryna Gurevych

382.  Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization
NAACL, 2022

The paper discusses how neural abstractive summarization models can generate summaries that are factually inconsistent with their source documents. Previous attempts to recognize such inconsistencies using natural language inference (NLI) have been unsuccessful due to the models' inability to generalize to the task. The authors propose a data generation pipeline called Falsesum, which uses a text generation model to introduce varying types of factual inconsistencies into human-annotated summaries. The resulting dataset contains diverse yet plausible examples, and models trained on it improve performance on four benchmarks for detecting factual inconsistency in summarization.

Yang Deng, Wai Lam, Yuexiang Xie, Daoyuan Chen, Yaliang Li, Min Yang, Ying Shen

383.  Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering
AAAI, 2020

The paper discusses the issues of redundancy and lengthiness in crowdsourced answers in Community Question Answering (CQA), which limit the performance of answer selection and lead to difficulties for community users. To solve these problems, the authors propose a novel joint learning model that tackles the tasks of answer selection and answer summary generation in CQA. They design a question-driven pointer-generator network that exploits the correlation information between question-answer pairs to help attend to the essential information when generating answer summaries. They also leverage the answer summaries to alleviate noise in the original lengthy answers when ranking the relevance of question-answer pairs. The authors construct a new large-scale CQA corpus, WikiHowQA, which contains long answers for answer selection as well as reference summaries for answer summarization. The experimental results show that the joint learning method effectively addresses the answer redundancy issue in CQA and achieves state-of-the-art results on both answer selection and text summarization tasks. The proposed model is also shown to transfer well to resource-poor CQA tasks that lack reference answer summaries.

Miguel Arana-Catania, Rob Procter, Yulan He, Maria Liakata

384.  Evaluation of Abstractive Summarisation Models with Machine Translation in Deliberative Processes
EMNLP, 2021

The paper discusses the summarization of deliberative processes in non-English languages, which involves combining multiple narratives of poor grammatical quality in a single text. The authors evaluate various abstractive summarization models in combination with a machine translation model, and report promising results in terms of fluency, consistency, and relevance of the summaries produced. The approach is easy to implement for many languages by changing the translation model.

Shiyue Zhang, Mohit Bansal

385.  Finding a Balanced Degree of Automation for Summary Evaluation
EMNLP, 2021

The paper discusses the challenges of evaluating summarization tasks using human evaluation and automatic metrics. The authors propose LitePyramid, a flexible summary evaluation metric that ranges from semi-automatic to fully automatic and uses a natural language inference model and a semantic role labeling model to replace manual work. LitePyramid is compared to 15 existing metrics and evaluated on three meta-evaluation datasets and a newly collected dataset. The results show that LitePyramid consistently has the best summary-level correlations and can reduce costs for future data collection.

Vivek Gupta, Prerna Bharti, Pegah Nokhiz, Harish Karnick

386.  SUMPUBMED: Summarization Dataset of PubMed Scientific Articles
ACL, 2021

The paper discusses the limitations of text summarization models that are trained on news article datasets, where the summary is typically located at the beginning of the text. To address this issue, the authors created a new dataset called SUMPUBMED, which contains scientific articles from the PubMed archive. The summary in SUMPUBMED is distributed throughout the text and contains rare domain-specific scientific terms, making it challenging for seq2seq models that are trained on news articles to summarize effectively. The authors conclude that SUMPUBMED provides new opportunities for improving text summarization models and developing new evaluation metrics.

Reinald Kim Amplayo, Stefanos Angelidis, Mirella Lapata

387.  Unsupervised Opinion Summarization with Content Planning
AAAI, 2021

The paper discusses the challenges of using deep learning techniques for summarizing reviews due to the lack of large-scale datasets. The authors propose a method that incorporates content planning into the summarization model, which improves the quality of the output and allows for the creation of more natural synthetic datasets. The content plans are generated from aspect and sentiment distributions induced from data without expensive annotations. The synthetic datasets are created by sampling pseudo-reviews from a Dirichlet distribution, and the model generates summaries based on input reviews and induced content plans. Experimental results show that their approach outperforms other models in generating informative, coherent, and fluent summaries that capture opinion consensus.

Menglin Xia, Ekaterina Kochmar, Ted Briscoe

388.  Automatic learner summary assessment for reading comprehension
NAACL, 2019

The paper discusses the development of a tool for assessing learner reading comprehension through automated assessment of their summaries. The authors propose three novel approaches to assess the summaries and evaluate them on two datasets they created. The results show that their models outperform traditional approaches and produce quality assessments close to those of professional examiners.

Cai Yang, Stephen Wan

389.  Investigating Metric Diversity for Evaluating Long Document Summarisation
COLING, 2022

The paper discusses the LongSumm shared task, which focuses on long document summarization and has been limited by its use of a single family of metrics for evaluation. The authors replicated the evaluation using multiple test set samples and found that the use of additional metrics revealed high-quality summaries missed by the original metrics. They also suggest that SPICE could be a candidate metric for summarization evaluation in LongSumm. The relative ranking of systems changed under this more rigorous evaluation, but some key learnings from previous years still held.

Qingyu Zhou, Furu Wei, Ming Zhou

390.  At Which Level Should We Extract? An Empirical Study on Extractive Document Summarization
COLING, 2020

The paper discusses the effectiveness of extractive methods in automatic document summarization and proposes extracting sub-sentential units instead of full sentences. The authors show that extracting full sentences can introduce redundant and unnecessary content, and present a neural extractive model that leverages sub-sentential information. The experiments and analyses demonstrate that extracting sub-sentential units performs competitively compared to full sentence extraction. The paper provides inspiration for future research on the basic extraction units in extractive summarization.

Jiacheng Xu

391.  Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution
ACL, 2021

The paper proposes a two-step method to interpret the decisions made by neural abstractive summarization models. The first step involves analyzing the model's behavior to categorize each decoder decision into one of several generation modes. The second step involves interpreting the decisions using different attribution methods to determine their importance for the generation of the next token. The paper demonstrates the method's capability to identify phrases the summarization model has memorized and determine where in the training pipeline this memorization happened, as well as study complex generation phenomena like sentence fusion on a per-instance basis.

Logan Lebanoff, John Muchovej, Franck Dernoncourt, Doo Soon Kim, Lidan Wang, Walter Chang, Fei Liu

392.  Understanding Points of Correspondence between Sentences for Abstractive Summarization
ACL, 2020

The paper discusses the challenge of fusing sentences with disparate content to create informative and succinct summaries, which is a task that humans can easily perform but is difficult for modern abstractive summarizers. The authors propose introducing the notion of points of correspondence, which are cohesive devices that tie any two sentences together into a coherent text, and provide a dataset containing human annotations of points of correspondence between sentences. The dataset bridges the gap between coreference resolution and summarization and can serve as a basis for future work to measure the success of sentence fusion systems.

Yassine Mrabet, Dina Demner-Fushman

393.  HOLMS: Alternative Summary Evaluation with Large Language Models
COLING, 2020

The paper discusses the need for evaluation measures in document summarization that can rank systems based on individual summaries rather than just an average score. It highlights the limitations of current measures like ROUGE and BLEU, which are lexical in nature and not ideal for training neural networks. The authors propose a new hybrid evaluation measure called HOLMS, which combines language models and lexical similarity measures. They demonstrate through experiments that HOLMS outperforms ROUGE and BLEU in correlation with human judgments on several extractive summarization datasets for both linguistic quality and pyramid scores.

Anna Jørgensen, Anders Søgaard

394.  Evaluation of Summarization Systems across Gender, Age, and Race
EMNLP, 2021

The paper discusses how summarization systems are evaluated by human annotators and raters, who are often recruited through platforms with skewed demographics. The authors argue that this can lead to bias in system development and evaluation, as summary evaluation is sensitive to protected attributes. They suggest building models that cater to all groups rather than just some.

Masato Takatsuka, Tetsunori Kobayashi, Yoshihiko Hayashi

395.  Phrase-Level Localization of Inconsistency Errors in Summarization by Weak Supervision
COLING, 2022

The paper proposes a methodology for identifying inconsistency errors in summarization. A synthetic dataset is created to train a model called SumPhrase, which can detect factual errors in summarization more effectively than existing weakly supervised methods. The joint identification of error-corresponding original sentences is proven to be effective in improving error detection accuracy.

Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong, Fang Chen

396.  A Graph-theoretic Summary Evaluation for ROUGE
EMNLP, 2018

The paper discusses the limitations of the ROUGE evaluation metric for text summarization, which only considers surface similarities between summaries and cannot accurately assess summaries with lexical variations and paraphrasing. The authors propose a graph-based approach to incorporate both lexical and semantic similarities into ROUGE. The results of experiments on TAC AESOP datasets show that this approach improves the correlation between ROUGE and human judgments.

Shahin Rahbariasl, Mark D. Smucker

397.  Time-Limits and Summaries for Faster Relevance Assessing
SIGIR, 2019

The paper discusses the importance of relevance assessing in applications such as high-recall retrieval and test collection construction. The authors conducted a user study with 60 participants to investigate the impact of time limits and document size on relevance assessing. They found that using a time limit as short as 15 seconds or judging document summaries in place of full documents could significantly speed judging without significantly affecting judging quality. Participants found judging document summaries with a 60 second time limit to be the easiest and best experience. The authors suggest that high quality document summaries can provide the same speed benefits as time limits while improving the judging experience for assessors.

Rezvaneh Rezapour, Rosie Jones, Sravana Reddy, Ian Soboroff

398.  What Makes a Good Podcast Summary?
SIGIR, 2022

This paper discusses the motivation behind abstractive summarization of podcasts, which is driven by the increasing popularity of podcasts and the needs of their listeners. The authors note that podcasting is a unique domain that differs from news and other media commonly studied in automatic summarization research. The study uses a collection of podcast summaries generated by different algorithms and human judgments of summary quality from the TREC 2020 Podcasts Track to explore the correlations between various automatic evaluation metrics and human judgments, as well as the linguistic aspects of summaries that lead to strong evaluations. The qualities of a good podcast summary are still unknown, and this study aims to shed light on this topic.

Taehee Jung, Dongyeop Kang, Lucas Mentch, Eduard Hovy

399.  Earlier Isn’t Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization
EMNLP, 2019

The paper explores the biases and sub-aspects of summarization systems, specifically position, importance, and diversity, across nine different summarization corpora. The study finds that while position exhibits substantial bias in news articles, this is not the case for academic papers and meeting minutes. Additionally, different types of summarization systems are composed of different degrees of the sub-aspects. The study provides useful lessons for developing new summarization systems and collecting new summarization datasets.

Ming Zhong, Pengfei Liu, Danqing Wang, Xipeng Qiu, Xuanjing Huang

400.  Searching for Effective Neural Extractive Summarization: What Works and What’s Next
ACL, 2019

The paper discusses the success of deep neural networks in text summarization, but notes that there is still much to be understood about why they work so well and how they can be improved. The authors explore different model architectures, transferable knowledge, and learning schemas to improve neural extractive summarization systems. They also present a new framework that achieves state-of-the-art results on CNN/DailyMail. The paper aims to provide insights for future research on extractive summarization, and the source code is available on GitHub.

Oleg Vasilyev, John Bohannon

401.  Is Human Scoring the Best Criteria for Summary Evaluation?
ACL, 2021

The paper challenges the commonly held belief that a summary quality measure is best judged by how closely it correlates with quality scores produced by human annotators. The authors present observations that question this view and propose an alternative criterion for selecting the best measure from a group of measures that does not rely on human scores.

Philippe Laban, Tobias Schnabel, Paul N. Bennett, Marti A. Hearst

402.  SUMMAC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization
TACL, 2022

The paper discusses the importance of factual consistency in summaries and the limitations of natural language inference (NLI) models for inconsistency detection. The authors propose a new method called SUMMACConv, which segments documents into sentence units and aggregates scores between pairs of sentences, enabling NLI models to be successfully used for this task. They also introduce a new benchmark called SUMMAC, consisting of six large inconsistency detection datasets. On this benchmark, SUMMACConv obtains state-of-the-art results with a balanced accuracy of 74.4%, a 5% improvement compared with prior work.
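
The idea of aggregating sentence-pair scores into one consistency score can be sketched as below. This is only an illustration under stated assumptions: the pairwise scores are taken as given (in practice they would come from an NLI model), and the max-then-mean aggregation and the name `aggregate_consistency` are choices made for this example, not the learned aggregation the summary attributes to SUMMACConv.

```python
def aggregate_consistency(pair_scores):
    """Collapse a document-by-summary score matrix into one number.

    pair_scores[i][j] is an assumed score in [0, 1] for how strongly
    document sentence i supports summary sentence j.
    """
    n_summary = len(pair_scores[0])
    # For each summary sentence, keep its best-supporting document sentence.
    best_support = [max(row[j] for row in pair_scores) for j in range(n_summary)]
    # Average over summary sentences to get a summary-level score.
    return sum(best_support) / n_summary

# Hypothetical scores: 2 document sentences (rows) x 2 summary sentences (columns).
scores = [
    [0.9, 0.1],
    [0.2, 0.3],
]
consistency = aggregate_consistency(scores)
print(consistency)
```

Taking the maximum per summary sentence reflects the intuition that a claim needs only one supporting document sentence; a low average then points to summary sentences with no support anywhere in the document.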

Dandan Huang, Leyang Cui, Sen Yang, Guangsheng Bao, Kun Wang, Jun Xie, Yue Zhang

403.  What Have We Achieved on Text Summarization?
EMNLP, 2020

The paper discusses the current state of text summarization using deep learning and highlights the gaps that still exist between automatic summarizers and human professionals. The authors use the Multidimensional Quality Metric to identify 8 major sources of errors on 10 representative summarization models. They find that extractive summarizers are generally better than abstractive ones in terms of faithfulness and factual-consistency. They also note that pre-training techniques, particularly sequence-to-sequence pre-training, are highly effective for improving text summarization, with BART being the most effective. The paper provides insights into the strengths and limitations of different summarization techniques and highlights areas for future research.

Yiran Chen, Pengfei Liu, Xipeng Qiu

404.  Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization
EMNLP, 2021

The paper discusses the importance of generating summaries that are not only fluent and informative but also factually correct, and the rapid development of the field of factual evaluation. However, the meta-evaluation methodologies for factuality metrics are opaque, leading to insufficient understanding of their relative advantages and applicability. The paper presents an adversarial meta-evaluation methodology that diagnoses the strengths and weaknesses of 6 existing top-performing metrics over 24 diagnostic test datasets and searches for directions for further improvement by data augmentation. The authors propose several calls for future research and make all code, diagnostic test datasets, and trained factuality models available.

Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han

405.  Facet-Aware Evaluation for Extractive Summarization
ACL, 2020

The paper proposes a new evaluation setup for extractive summarization that focuses on assessing the information coverage in extracted summaries. This setup involves treating each sentence in the reference summary as a facet and identifying the sentences in the document that express the semantics of each facet as support sentences. The evaluation is then performed by comparing the indices of extracted sentences and support sentences of all the facets in the reference summary. The authors construct an extractive version of the CNN/Daily Mail dataset to facilitate this new evaluation setup and demonstrate that it is more effective than commonly adopted metrics like ROUGE: it correlates better with human judgment, enables fine-grained evaluation and comparative analysis, and reveals valuable insights into state-of-the-art summarization methods.
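
The index comparison described above can be sketched as a simple coverage score. This is a minimal sketch under assumptions: a facet is counted as covered when at least one of its support sentences was extracted, and the function name `facet_coverage` and the data layout are hypothetical rather than taken from the paper.

```python
def facet_coverage(extracted_indices, facet_supports):
    """Fraction of reference facets covered by the extracted sentences.

    extracted_indices: document sentence indices chosen by the summarizer.
    facet_supports: one set of support sentence indices per facet
                    (one facet per reference summary sentence).
    """
    extracted = set(extracted_indices)
    if not facet_supports:
        return 0.0
    # Assumption: a facet is covered if any of its support sentences was extracted.
    covered = sum(1 for support in facet_supports if extracted & set(support))
    return covered / len(facet_supports)

# Hypothetical example: three facets, the extractor picked sentences 0, 3 and 7.
coverage = facet_coverage([0, 3, 7], [{0, 1}, {4}, {7}])
print(coverage)
```

Because the comparison works on sentence indices rather than word overlap, two extracted sentences that express the same facet do not raise the score twice.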

Ruijia Cheng, Alison Smith-Renner, Ke Zhang, Joel R. Tetreault, Alejandro Jaimes

406.  Mapping the Design Space of Human-AI Interaction in Text Summarization
NAACL, 2022

The paper explores the role of humans in automatic text summarization systems and the design considerations for human-AI interaction in text generation tasks. The authors conducted a literature review and developed a taxonomy of five interactions in AI-assisted text generation. They designed text summarization prototypes for each interaction and interviewed 16 users to understand their expectations, experience, and needs regarding efficiency, control, and trust with AI in text summarization. The paper proposes design considerations for human-AI interaction in text summarization and broader text generation tasks.

Xinnuo Xu, Ondřej Dušek, Jingyi Li, Verena Rieser, Ioannis Konstas

407.  Fact-based Content Weighting for Evaluating Abstractive Summarisation
ACL, 2020

The paper discusses the difficulty of evaluating abstractive summarization using standard word-overlap-based metrics, and introduces a new evaluation metric based on fact-level content weighting. The metric relates the facts of the document to the facts of the summary, and assumes that a good summary will reflect all relevant facts present in the human-generated reference summary. The authors confirm this hypothesis by showing that their weightings are highly correlated to human perception and compare favorably to a recent manual highlight-based metric.

Ping Chen, Fei Wu, Tong Wang, Wei Ding

408.  A Semantic QA-Based Approach for Text Summarization Evaluation
AAAI, 2018

The paper discusses the challenge of assessing the quality of Natural Language Processing and Computational Linguistics applications that generate new texts based on existing texts. Specifically, the paper focuses on the problem of pinpointing content differences between two text passages, especially for large passages such as articles and books. The authors propose a new approach that treats one text passage as a small knowledge base and asks it a large number of questions to identify all content points. By comparing the correctly answered questions from two text passages, the authors are able to compare their content precisely. An experiment on the DUC 2007 summarization corpus shows promising results.

Nikita Salkar, Thomas Trikalinos, Byron C. Wallace, Ani Nenkova

409.  Self-Repetition in Abstractive Neural Summarizers
AACL, 2022

The paper analyzes self-repetition in the output of neural summarizers, measuring it as the number of repeated n-grams of length four or longer. Three popular architectures (BART, T5, and Pegasus) are analyzed, and it is found that BART is particularly prone to self-repetition. Fine-tuning on more abstractive data and data featuring formulaic language is associated with a higher rate of self-repetition. Qualitative analysis reveals that systems produce artefacts such as ads and disclaimers unrelated to the content being summarized, as well as formulaic phrases common in the fine-tuning domain. The paper suggests that their approach to corpus level analysis of self-repetition may help practitioners clean up training data for summarizers and ultimately support methods for minimizing the amount of self-repetition.
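
A corpus-level check for repeated four-grams, in the spirit of the measure described above, might look like the following. It is a rough sketch under assumptions: whitespace tokenization, lowercasing, and the definition of the rate as the fraction of summaries sharing at least one n-gram with another summary are illustrative choices, and `self_repetition_rate` is a hypothetical name.

```python
from collections import Counter

def ngrams(tokens, n):
    """Set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def self_repetition_rate(summaries, n=4):
    """Fraction of summaries that share at least one n-gram with another summary."""
    if not summaries:
        return 0.0
    per_summary = [ngrams(s.lower().split(), n) for s in summaries]
    # Each summary contributes an n-gram at most once, so this counts
    # the number of summaries in which the n-gram occurs.
    summary_freq = Counter(g for grams in per_summary for g in grams)
    repeated = sum(
        1 for grams in per_summary if any(summary_freq[g] > 1 for g in grams)
    )
    return repeated / len(summaries)

# Hypothetical system outputs; the first and third share a formulaic phrase.
outputs = [
    "click here to read the full story online",
    "the council approved the budget on monday",
    "officials said click here to read the full report",
]
rate = self_repetition_rate(outputs)
print(rate)
```

Listing the most frequent entries of `summary_freq` is one way to surface formulaic phrases, such as ads or disclaimers, that may be worth removing from training data.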

Maxime Peyrard, Judith Eckle-Kohler

410.  A Principled Framework for Evaluating Summarizers: Comparing Models of Summary Quality against Human Judgments
ACL, 2017

The paper introduces a new framework for evaluating extractive summarizers based on an optimization problem. It shows that every extractive summarizer can be broken down into an objective function and an optimization technique. The authors compare and evaluate several objective functions in well-known summarizers and analyze their correlation with human judgments. The comparison across two datasets provides surprising insights into the role and performance of objective functions in different summarizers.

Joshua Maynez, Shashi Narayan, Bernd Bohnet, Ryan McDonald

411.  On Faithfulness and Factuality in Abstractive Summarization
ACL, 2020

The paper examines the limitations of neural text generation models for abstractive document summarization and finds that these models often generate content that is unfaithful to the input document. A large scale human evaluation of several neural abstractive summarization systems was conducted to better understand the types of hallucinations they produce. The analysis shows that pretrained models are better summarizers in terms of generating faithful and factual summaries as evaluated by humans. Textual entailment measures are found to better correlate with faithfulness than standard metrics, potentially leading to better automatic evaluation metrics and training and decoding criteria.

Mateusz Krubiński, Pavel Pecina

412.  From COMET to COMES – Can Summary Evaluation Benefit from Translation Evaluation?
AACL, 2022

The paper discusses the use of COMET, a neural-based evaluation metric for Machine Translation systems, for evaluating Text Summarization systems. Despite being trained on multilingual MT outputs, COMET performs well in monolingual settings for predicting summarization output quality. The authors introduce a variant of the model, COMES, trained on annotated summarization outputs using MT data for pre-training. The performance of COMES is examined on several datasets with human judgments for different notions of summary quality, across various domains and languages.

Daniel Deutsch, Tania Bedrax-Weiss, Dan Roth

413.  Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
TACL, 2021

The paper proposes a new metric, QAEval, to evaluate the content quality of a summary using question-answering (QA) instead of traditional text-overlap-based metrics such as ROUGE. QA-based methods directly measure a summary's information overlap with a reference, making them fundamentally different from text overlap metrics. The authors demonstrate the experimental benefits of QA-based metrics through an analysis of QAEval, which outperforms current state-of-the-art metrics on most evaluations using benchmark datasets. The authors also identify the performance bottlenecks of QAEval and estimate that its potential upper-bound performance surpasses all other automatic metrics, approaching that of the gold-standard Pyramid Method.

Ori Shapira, David Gabay, Yang Gao, Hadar Ronen, Ramakanth Pasunuru, Mohit Bansal, Yael Amsterdamer, Ido Dagan

414.  Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation
NAACL, 2019

The paper discusses the importance of manual evaluation in summary evaluation methodology and the traditional Pyramid protocol, which is reliable but expensive and requires expertise. Cheaper and less thorough manual evaluation methods have been used instead, but the authors propose a lightweight sampling-based version of the Pyramid approach that can be crowdsourced. They analyze the performance of their method and release their crowdsourced Summary-ContentUnits and crowdsourcing scripts for future evaluations.

Maxime Peyrard

415.  A Simple Theoretical Model of Importance for Summarization
ACL, 2019

The paper argues that establishing theoretical models of Importance will advance our understanding of summarization and improve summarization systems. The authors propose definitions of Redundancy, Relevance, and Informativeness, and show how Importance arises as a single quantity that unifies these concepts. The paper also provides intuitions to interpret the proposed quantities and experiments to demonstrate the potential of the framework to inform and guide subsequent works.

Krtin Kumar, Jackie Chi Kit Cheung

416.  Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples
NAACL, 2019

The paper examines the performance of neural abstractive summarizers in generating summary texts and their ability to understand deeper syntactic and semantic structures. The authors generate a set of contrastive summaries and test whether existing neural summarizers score them more highly than human-written summaries. They find that these systems fail to understand the source text in a majority of cases.

Raghuram Vadapalli, Litton J Kurisinkel, Manish Gupta, Vasudeva Varma

417.  SSAS: Semantic Similarity for Abstractive Summarization
IJCNLP, 2017

The paper introduces a new metric called Semantic Similarity for Abstractive Summarization (SSAS) that evaluates system-generated summaries at a semantic inference level. Previous approaches relied on word or syntactic sub-sequence overlap, which cannot evaluate summaries at this level. SSAS uses natural language inference and paraphrasing techniques to weigh quantities representing agreement, contradiction, topical neutrality, paraphrasing, and optionally ROUGE score between a system-generated and human-written summary.

Hardy, Shashi Narayan, Andreas Vlachos

418.  HIGHRES: Highlight-based Reference-less Evaluation of Summarization
ACL, 2019

The paper discusses the challenges of manual evaluation of system-generated summaries and proposes a novel approach called HIGHlight-based Reference-less Evaluation of Summarization (HIGHRES). This approach involves assessing summaries against the source document using manually highlighted salient content. The authors validate their approach by employing crowd-workers to augment a dataset and compare two state-of-the-art systems. They demonstrate that HIGHRES improves inter-annotator agreement and helps emphasize differences among systems that would be ignored under other evaluation approaches.

Ukyo Honda, Tsutomu Hirao, Masaaki Nagata

419.  Pruning Basic Elements for Better Automatic Evaluation of Summaries
NAACL, 2018

The paper introduces a new automatic evaluation measure for summarization called pruned Basic Elements (pBE). It addresses the weakness of the widely used BE concept, which redundantly matches basic elements. pBE prunes basic elements by disregarding frequency count and reducing semantically overlapped elements based on word similarity. The study shows that pBE outperforms ROUGE in DUC datasets and achieves the highest rank correlation coefficient in TAC 2011 AESOP task.

Yuexiang Xie, Fei Sun, Yang Deng, Yaliang Li, Bolin Ding

420.  Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation
EMNLP, 2021

The paper discusses the issue of factual inconsistency in generated summaries despite significant progress in text summarization. The authors propose a novel metric to evaluate factual consistency in text summarization via counterfactual estimation, which removes the effect of language prior from the total causal effect on the generated summary. This provides a simple yet effective way to evaluate consistency without relying on other auxiliary tasks. The authors conduct experiments on three public abstractive text summarization datasets and demonstrate the advantages of the proposed metric in improving the correlation with human judgments and the convenience of usage. The source code is available at https://github.com/xieyxclack/factual_coco.

Wang Chen, Piji Li, Irwin King

421.  A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy
ACL, 2021

The paper proposes a training-free and reference-free summarization evaluation metric to avoid the costly and time-consuming process of collecting human-annotated references and ratings. The metric consists of a centrality-weighted relevance score and a self-referenced redundancy score. The relevance score is computed between the pseudo reference built from the source document and the given summary, and the redundancy score evaluates the redundant information in the summary. The final evaluation score is produced by combining the relevance and redundancy scores. The proposed method outperforms existing methods on both multi-document and single-document summarization evaluation. The source code is publicly available.

Forrest Sheng Bao, Minghui Qiu, Yinfei Yang, Cen Chen

422.  SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling
NAACL, 2022

The paper discusses the limitations of current automatic summary evaluation metrics, which focus on lexical similarity and require a reference summary. The authors propose a weakly supervised approach that does not require a reference summary, using existing summarization datasets and pairing documents with corrupted reference summaries for training. In cross-domain tests, their approach outperforms baselines and shows advantages in gauging linguistic qualities over all metrics.

Leonardo F. R. Ribeiro, Mengwen Liu, Iryna Gurevych, Markus Dreyer, Mohit Bansal

423.  FACTGRAPH: Evaluating Factuality in Summarization with Semantic Graph Representations
NAACL, 2022

The paper discusses the limitations of current abstractive summarization approaches, which often generate summaries that are not factually consistent with the source document. The authors propose a new method called FACTGRAPH, which decomposes the document and summary into structured meaning representations (MR) to better evaluate factuality. FACTGRAPH encodes these MRs using a graph encoder and text encoder, and experiments show that it outperforms previous approaches by up to 15% in identifying factual errors and inconsistencies.

Markus Zopf

424.  Estimating Summary Quality with Pairwise Preferences
NAACL, 2018

The paper proposes a new evaluation approach for automatic summarization systems based on pairwise preferences of sentences, which is simpler and cheaper to obtain than gold standard summaries. The authors show that humans can provide useful feedback using this approach, and that it outperforms the three most popular versions of ROUGE with less expensive human input. Additionally, the framework can reuse already available evaluation data to achieve even better results.

Maxime Peyrard

425.  Studying Summarization Evaluation Metrics in the Appropriate Scoring Range
ACL, 2019

The paper discusses the issue of evaluating automatic summarization systems using human judgments. The current human judgment datasets were created during the DUC/TAC shared tasks, but modern systems are better than the best systems submitted at that time. The paper shows that evaluation metrics which behave similarly on these datasets strongly disagree in the higher-scoring range where current systems operate. This creates a problem as we cannot decide which metric to trust. The paper calls for collecting human judgments for high-scoring summaries to resolve this debate and improve summarization systems and metrics.

Yanjun Gao, Chen Sun, Rebecca J. Passonneau

426.  Automated Pyramid Summarization Evaluation
CONLL, 2019

The paper discusses the development of a method called Pyramid evaluation, which assesses the content of paragraph-length summaries of source texts. This method involves creating a pyramid that lists distinct units of content found in several reference summaries, weights them based on how many reference summaries they occur in, and produces three scores based on the weighted content of new summaries. The paper presents an automated version of this method that is more efficient, transparent, and complete than previous automated pyramid methods. The new method is tested on a dataset of student summaries and historical NIST data from extractive summarizers.
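The weighting step described above can be illustrated with a minimal sketch. It assumes content units have already been extracted and matched, so each unit is just a string label; the function names are hypothetical, and only a raw score and one normalized score are shown rather than the three scores the paper reports.

```python
from collections import Counter


def build_pyramid(reference_units):
    """Weight each content unit by the number of reference summaries containing it."""
    weights = Counter()
    for units in reference_units:
        weights.update(set(units))  # count a unit at most once per reference
    return weights


def raw_pyramid_score(summary_units, weights):
    """Sum the weights of the distinct content units found in the summary."""
    return sum(weights.get(unit, 0) for unit in set(summary_units))


def normalized_pyramid_score(summary_units, weights):
    """Raw score divided by the best score attainable with the same number of units."""
    n_units = len(set(summary_units))
    ideal = sum(sorted(weights.values(), reverse=True)[:n_units])
    return raw_pyramid_score(summary_units, weights) / ideal if ideal else 0.0


# Three toy reference summaries, each reduced to labeled content units (assumed input).
references = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
pyramid = build_pyramid(references)
print(raw_pyramid_score(["a", "c"], pyramid))
print(normalized_pyramid_score(["a", "c"], pyramid))
```

Units that appear in more references sit higher in the pyramid, so a summary is rewarded more for covering widely shared content than for covering content mentioned by a single reference.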

Tanya Goyal, Jiacheng Xu, Junyi Jessy Li, Greg Durrett

427.  Training Dynamics for Text Summarization Models
ACL, 2022

The paper discusses the fine-tuning process of pre-trained language models for summarization tasks and analyzes the training dynamics for generation models. The study focuses on different datasets and summary properties, such as abstractiveness and hallucination, to understand what the model learns at different stages of its fine-tuning process. The authors find that the model learns to copy the input early in the training process consistently across all datasets studied, while factual errors are learned in the later stages, though this behavior is more varied across domains. Based on these observations, the authors explore complementary approaches for modifying training to achieve different goals, such as improving factuality or improving abstractiveness.

Chris Kedzie, Kathleen McKeown

428.  Content Selection in Deep Learning Models of Summarization
EMNLP, 2018

The paper discusses experiments with deep learning models of summarization in various domains, finding that many sophisticated features do not improve performance over simpler models. This suggests that creating a summarizer for a new domain may be easier than previously thought, and questions the benefit of deep learning models for summarization in domains with massive datasets. The paper suggests that new forms of sentence representations or external knowledge sources are needed for better summarization.

Logan Lebanoff, John Muchovej, Franck Dernoncourt, Doo Soon Kim, Seokhwan Kim, Walter Chang, Fei Liu

429.  Analyzing Sentence Fusion in Abstractive Summarization
EMNLP, 2019

The paper examines how abstractive summarization systems combine information from multiple sentences to form summary sentences. The researchers analyzed the outputs of five state-of-the-art summarizers and found that while the summary sentences were mostly grammatical, they often failed to remain faithful to the original article. The study highlights the need for further research in this area to improve the accuracy of abstractive summarization systems.

Xiangru Tang, Alexander R. Fabbri, Ziming Mao, Griffin Adams, Borui Wang, Haoran Li, Yashar Mehdad, Dragomir Radev

430.  Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries
NAACL, 2022

The paper discusses the issue of factual inconsistencies in current pre-trained models used for summarization and the need to evaluate the factual consistency of summaries to develop better models. The authors conducted crowdsourced evaluations using two different methods to determine the factors that affect the reliability of human evaluation. They found that the ranking-based Best-Worst Scaling method is more reliable than the rating-based Likert Scale method, which highly depends on the target dataset and evaluation design. To improve crowdsourcing reliability, they extended the Likert rating scale and presented a scoring algorithm for Best-Worst Scaling called value learning. The authors also made their crowdsourcing guidelines publicly available to facilitate future work on factual consistency in summarization.

Tsutomu Hirao, Hidetaka Kamigaito, Masaaki Nagata

431.  Automatic Pyramid Evaluation Exploiting EDU-based Extractive Reference Summaries
EMNLP, 2018

The paper discusses the automation of the pyramid method, a manual evaluation framework. The authors transform human-made reference summaries into extractive reference summaries consisting of Elementary Discourse Units (EDUs) from source documents. They then weight each EDU by counting the number of extractive reference summaries that contain it. The summary is scored based on the correspondences between EDUs in the summary and those in the pyramid. The authors conducted experiments on DUC and TAC data sets and found that their methods strongly correlate with various manual evaluations.

Daniel Deutsch, Rotem Dror, Dan Roth

432.  Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
NAACL, 2022

The paper discusses the reliability of automatic summarization evaluation metrics in replicating human judgments of summary quality. The authors identify two inconsistencies in the definition of system-level correlation and propose changes to address them. First, they suggest using the full test set instead of a subset judged by humans to calculate the system score for an automatic metric, leading to more precise estimates of system-level correlations. Second, they propose calculating correlations only on pairs of systems with small differences in automatic scores, which are commonly observed in practice. The authors demonstrate that the best estimate of the correlation of ROUGE to human judgments is near 0 in realistic scenarios, highlighting the need for more high-quality human judgments and improved automatic metrics when differences in system scores are small.
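The second proposal, restricting the correlation to system pairs whose automatic scores are close, can be sketched as follows. This is a simplified Kendall-style agreement statistic that ignores ties, not the authors' exact estimator; the function name and the `max_gap` parameter are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_agreement(metric_scores, human_scores, max_gap=None):
    """Kendall-style agreement between metric and human system rankings.

    If max_gap is given, only system pairs whose metric scores differ by at
    most max_gap are counted. Tied pairs are ignored.
    """
    concordant = discordant = 0
    for a, b in combinations(range(len(metric_scores)), 2):
        metric_diff = metric_scores[a] - metric_scores[b]
        human_diff = human_scores[a] - human_scores[b]
        if max_gap is not None and abs(metric_diff) > max_gap:
            continue  # skip pairs the metric separates by a wide margin
        product = metric_diff * human_diff
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Toy system-level scores (made-up numbers, for illustration only).
metric = [0.1, 0.2, 0.9]
human = [2, 1, 3]
print(pairwise_agreement(metric, human))               # all pairs
print(pairwise_agreement(metric, human, max_gap=0.2))  # close pairs only
```

In the toy data the metric ranks the clearly separated systems correctly but gets the one close pair wrong, which is the kind of gap between overall and small-difference correlation the paper draws attention to.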

Alex Wang, Kyunghyun Cho, Mike Lewis

433.  Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
ACL, 2020

The paper discusses the limitations of abstractive summarization models due to frequent factual inconsistencies in their output. Existing automatic evaluation metrics are not sensitive to such errors. The authors propose QAGS, an automatic evaluation protocol that identifies factual inconsistencies in generated summaries by asking questions about the summary and its source. QAGS has higher correlations with human judgments of factual consistency than other automatic evaluation metrics and provides interpretability by indicating which tokens of a summary are inconsistent and why. The authors believe QAGS is a promising tool for automatically generating usable and factually consistent text. Code for QAGS is available on GitHub.

Liam Scanlon, Shiwei Zhang, Xiuzhen Zhang, Mark Sanderson

434.  Evaluation of Cross Domain Text Summarization
SIGIR, 2020

The paper discusses the effectiveness of extractive-abstractive hybrid summarization in generating concise summaries for long documents. Two approaches to hybrid summarization, extraction-then-abstraction and extraction-with-abstraction, are compared and evaluated through large-scale experiments. The study examines the generalization of the algorithms by testing them within and across news domains and comparing automatic assessments to human judgments. The results show that the extraction-then-abstraction approach outperforms the extraction-with-abstraction approach, especially for cross-domain headline generation.

Daniel Deutsch, Dan Roth

435.  Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries
CONLL, 2021

The paper analyzes the token alignments used by reference-based metrics such as ROUGE and BERTScore to compare summaries and argues that their scores largely cannot be interpreted as measuring information overlap. Rather, they are better estimates of the extent to which the summaries discuss the same topics. The consequence of this result is that the most frequently used summarization evaluation metrics do not align with the community’s research goal, to generate summaries with high-quality information. However, the paper concludes by demonstrating that a recently proposed metric, QAEval, which scores summaries using question-answering, appears to better capture information quality than current evaluations, highlighting a direction for future research.

Tamara Sladoljev-Agejev, Jan Šnajder

436.  Using Analytic Scoring Rubrics in the Automatic Assessment of College-Level Summary Writing Tasks in L2
IJCNLP, 2017

The paper discusses the automated scoring of college-level summary writing tasks in English as a second language (EL2) using the Reading-for-Understanding (RU) cognitive framework, extended with the Reading-to-Write (RW) element, and analytic scoring with six rubrics covering content and writing quality. The authors show that regression models with reference-based and linguistic features perform better than baselines across all rubrics and reveal interesting correlations between summary features and analytic rubrics, highlighting the links between the RU and RW constructs.

Julius Steen, Katja Markert

437.  How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation
COLING, 2022

The paper discusses the importance of automatically evaluating the coherence of summaries and the challenges of doing so due to the use of disparate datasets and metrics. The authors conduct a large-scale investigation of various methods for summary coherence modeling and introduce two novel analysis measures to identify biases in coherence measures. They find that currently available automatic coherence measures are not reliable across all evaluation metrics, but large-scale language models fine-tuned on self-supervised tasks show promising results if they are trained to generalize across different summary lengths.

Ge Luo, Forrest Sheng Bao

438.  PrefScore: Pairwise Preference Learning for Reference-free Summarization Quality Assessment
COLING, 2022

The paper proposes a method for evaluating machine-generated summaries without a human-written reference summary. The method involves learning the preference rank of summaries using the Bradley-Terry power ranking model from inferior summaries generated by corrupting base summaries. The experiments conducted on several datasets show that the proposed method can produce scores highly correlated with human ratings.
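As a rough illustration of the Bradley-Terry idea, the sketch below computes the probability that one summary is preferred over another from two scalar scores, and the corresponding pairwise training loss. It assumes each summary's strength is the exponential of a model score; the function names are hypothetical, and the scoring model itself is out of scope here.

```python
import math


def preference_probability(score_a, score_b):
    """Bradley-Terry probability that A is preferred over B, with strengths exp(score)."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def pairwise_loss(score_better, score_worse):
    """Negative log-likelihood of observing the better summary preferred."""
    return -math.log(preference_probability(score_better, score_worse))


# A base summary should outscore its corrupted version (toy scores, assumed).
print(preference_probability(2.0, 0.0))
print(pairwise_loss(2.0, 0.0))
```

Minimizing this loss over many (base, corrupted) pairs pushes the scorer to rank intact summaries above their corrupted variants, so the learned score can then be used without a reference.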

Julius Steen, Katja Markert

439.  How to Evaluate a Summarizer: Study Design and Statistical Analysis for Manual Linguistic Quality Evaluation
EACL, 2021

The paper discusses the importance of manual evaluation in assessing progress in automatic text summarization. The authors conducted a survey on recent summarization system papers and found little agreement on how to perform evaluation studies. They conducted two evaluation experiments on coherence and repetitiveness and compared Likert-type and ranking annotations. They found that the best choice of evaluation method can vary depending on the aspect being evaluated. The authors also found that study parameters are often not fully reported and subsequent statistical analysis ignores grouping factors. They showed that the total number of annotators can have a strong impact on study power and that current statistical analysis methods can inflate type I error rates up to eight-fold. They highlight that eliciting multiple judgments per summary leads to less powerful and reliable annotations for system comparison given a fixed study budget.

Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher

440.  Evaluating the Factual Consistency of Abstractive Text Summarization
EMNLP, 2020

The paper proposes a model-based approach for verifying factual consistency and identifying conflicts between source documents and generated summaries. The model is trained jointly for three tasks: predicting whether each summary sentence is factually consistent or not, extracting a span in the source document to support this consistency prediction, and extracting the inconsistent span from each summary sentence that is deemed inconsistent. The approach outperforms previous models and provides useful assistance in verifying factual consistency. The authors also release a dataset, code, and trained model weights for factual consistency verification.

Oleg Vasilyev, John Bohannon

441.  ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings
EMNLP, 2021

The paper introduces a new reference-free summary quality evaluation measure called ESTIME, which focuses on the faithfulness of the summary. The measure counts potential inconsistencies between the summary and the source document and correlates strongly with expert scores in the SummEval dataset. The paper also presents a method of generating subtle factual errors in human summaries and shows that ESTIME is more sensitive to these errors than other common evaluation measures.

Matan Eyal, Tal Baumel, Michael Elhadad

442.  Question Answering as an Automatic Evaluation Metric for News Article Summarization
NAACL, 2019

The paper discusses recent developments in automatic summarization and headline generation, which have focused on maximizing ROUGE scores. The authors propose an alternative evaluation metric called Answering Performance for Evaluation of Summaries (APES), which uses reading comprehension to assess a summary's ability to answer questions about the source article. They compare APES to other manual evaluation metrics and present a neural abstractive model that maximizes APES and increases ROUGE scores.

Kexiang Wang, Tianyu Liu, Baobao Chang, Zhifang Sui

443.  An Anchor-Based Automatic Evaluation Metric for Document Summarization
COLING, 2020

The paper discusses a new protocol for designing reference-based metrics for document summarization that requires the endorsement of source documents. The proposed anchored ROUGE metric fixes each summary particle on the source document, resulting in a more solid computation. Empirical results on benchmark datasets show that using the source document induces a higher correlation with human judgments for the ROUGE metric. The protocol is self-explanatory and easy to implement, and can foster various effective designs of reference-based metrics besides the anchored ROUGE.

Alexander R. Fabbri, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

444.  QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization
NAACL, 2022

The paper discusses the importance of factual consistency in text summarization models and evaluates two types of metrics, entailment-based and question answering (QA)-based, for measuring this quality. The authors find that carefully selecting the components of a QA-based metric is critical to performance and propose an optimized metric called QAFACTEVAL, which outperforms previous QA-based and entailment-based metrics. Additionally, the authors suggest that combining both types of metrics can further improve performance.

Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki, Ion Androutsopoulos

445.  SUM-QE: a BERT-based Summary Quality Estimation Model
EMNLP, 2019

The paper introduces a new model called SUM-QE, which uses BERT to evaluate the quality of summarizations. Unlike other models, SUM-QE focuses on linguistic quality aspects that are not captured by content-based approaches. The model achieves high correlations with human ratings and outperforms simpler models. The predictions of SUM-QE can be used for system development and to inform users about the quality of automatically generated summaries and other types of text.

Pierre Jean A. Colombo, Chloé Clavel, Pablo

446.  InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
AAAI, 2022

The paper discusses the challenges of assessing the quality of natural language generation systems through human annotation, which is expensive and time-consuming. Researchers often rely on automatic metrics, but existing string-based metrics like BLEU do not handle synonyms well. The authors introduce InfoLM, a family of untrained metrics that uses a pre-trained masked language model and information measures to address these flaws. They demonstrate that InfoLM achieves significant improvement and correlation gains in many configurations on both summarization and data2text generation through direct assessment.

Simeng Sun, Ani Nenkova

447.  The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization
EMNLP, 2019

The paper discusses the limitations of using ROUGE to evaluate summarization systems and presents experiments on using distributed representations for evaluation. The results show that the max value over each dimension of the summary ELMo word embeddings and averaging the cosine similarity of all encoders yield high correlation with human ratings in both reference-based and reference-free settings. The distributed representations outperform ROUGE in recent corpora for abstractive news summarization but are less effective on older test data and systems.

Sascha Rothe, Joshua Maynez, Shashi Narayan

448.  A Thorough Evaluation of Task-Specific Pretraining for Summarization
EMNLP, 2021

The paper compares task-agnostic pretraining objectives with task-specific pretraining objectives for summarization tasks in a controlled study. The results show that task-agnostic pretraining is sufficient for most cases, reducing the need for costly task-specific pretraining. The study also reports new state-of-the-art numbers for two summarization tasks using a T5 model with 11 billion parameters and an optimal beam search length penalty.

Simeng Sun, Ori Shapira, Ido Dagan, Ani Nenkova

449.  How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature
NAACL, 2019

The paper discusses how traditional summarization evaluations compared systems that produced summaries of the same length, but neural approaches have done away with this requirement. The paper presents experiments showing that summaries of different lengths produced by the same system have a clear non-linear pattern of quality as measured by ROUGE F1 scores. The paper proposes a new evaluation method where ROUGE scores are normalized by those of a random system producing summaries of the same length. The paper reanalyzes recently reported results and shows that some negative results are actually reports of system improvement once differences in length are taken into account. Finally, the paper presents a small-scale human evaluation showing a similar trend of perceived quality increase with summary length, calling for the need of similar normalization in reporting human scores.
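The normalization idea can be sketched as follows, under several simplifying assumptions: a unigram F1 score stands in for ROUGE, the random system simply samples words from the source with replacement, and the normalized score is the ratio of the two. All function names and parameters are hypothetical.

```python
import random
from collections import Counter


def unigram_f1(candidate, reference):
    """Unigram-overlap F1, used here as a simple stand-in for ROUGE-1 F1."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def random_baseline(source, reference, length, trials=200, seed=0):
    """Average score of random summaries of the given length drawn from the source."""
    rng = random.Random(seed)
    words = source.split()
    scores = [
        unigram_f1(" ".join(rng.choices(words, k=length)), reference)
        for _ in range(trials)
    ]
    return sum(scores) / trials


def length_normalized_score(summary, source, reference):
    """System score divided by the random-system score at the same length."""
    baseline = random_baseline(source, reference, len(summary.split()))
    system = unigram_f1(summary, reference)
    return system / baseline if baseline > 0 else float("inf")


source = "the cat sat on the mat while the dog slept by the door"
reference = "the cat sat on the mat"
print(length_normalized_score("the cat sat", source, reference))
```

Because the baseline is recomputed for each summary length, two systems producing summaries of different lengths are each compared against what chance achieves at their own length.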

Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu

450.  Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics
COLING, 2020

The paper discusses the evaluation of automatic metrics in text summarization, specifically focusing on the disagreement between metrics when ranking high-scoring summaries. The authors revisit previous experiments and suggest that the narrow scoring range of summaries may be the reason for the disagreement. They also analyze three other properties that impact inter-metric agreement: Ease of Summarization, Abstractiveness, and Coverage. The authors make their analysis code and data publicly available to encourage reproducible research.

Maartje ter Hoeve, Julia Kiseleva, Maarten de Rijke

451.  What Makes a Good and Useful Summary? Incorporating Users in Automatic Summarization Research
NAACL, 2022

The paper discusses the gap between the current research focus in automatic summarization and users' needs, particularly university students who heavily rely on summaries. To address this, the authors propose a survey methodology that can be adjusted to investigate different user groups. They find that the current research directions do not fully align with students' needs and suggest ways to mitigate this mismatch in future research.

Yanzhu Guo, Chloé Clavel, Moussa Kamal Eddine, Michalis Vazirgiannis

452.  Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency
EMNLP, 2022

The paper discusses the lack of a well-defined formulation for summarization evaluation, which has led to popular summarization datasets being constructed in a way that does not guarantee validity or factual consistency. The authors address this issue by combining factual consistency models to identify problematic instances and release a filtered summarization dataset called SummFC with improved factual consistency. They demonstrate that models trained on this dataset achieve improved performance in nearly all quality aspects and argue that it should become a valid benchmark for developing and evaluating summarization systems.

Ori Shapira, David Gabay, Hadar Ronen, Judit Bar-Ilan, Yael Amsterdamer, Ani Nenkova, Ido Dagan

453.  Evaluating Multiple System Summary Lengths: A Case Study
EMNLP, 2018

The paper explores whether reference summaries of a single length can be used to evaluate system summaries of varying lengths. The authors conducted a case study using several variants of the ROUGE metric and found that the evaluation protocol is competitive. This paves the way for practical evaluation of varying-length summaries using existing summarization benchmarks.

Tobias Falke, Leonardo F. R. Ribeiro, Prasetya Ajie Utama, Ido Dagan, Iryna Gurevych

454.  Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference
ACL, 2019

The paper discusses the limitations of abstractive summarization due to factual errors in generated summaries. The authors evaluate summaries produced by state-of-the-art models and find that errors occur frequently, especially with more abstractive models. They explore the use of textual entailment predictions to detect and reduce such errors by reranking alternative predicted summaries. The authors find that current entailment models do not offer the desired performance for this task and release their annotations as additional test data for future evaluations of natural language inference.

Jiacheng Xu, Shrey Desai, Greg Durrett

455.  Understanding Neural Abstractive Summarization Models via Uncertainty
EMNLP, 2020

The paper discusses the difficulty in interpreting the behavior of seq2seq abstractive summarization models, which generate text in a free-form manner. The authors analyze summarization decoders by studying the entropy of the model's token-level predictions, finding a correlation between low prediction entropy and where the model copies tokens rather than generating novel text. The decoder's uncertainty also connects to factors like sentence position and syntactic distance between adjacent pairs of tokens, giving insight into what factors make a context particularly selective for the model's next output token. Finally, the authors study the relationship between decoder uncertainty and attention behavior to understand how attention gives rise to these observed effects in the model. The paper concludes that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
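The quantity at the center of this analysis, the entropy of each next-token distribution, is straightforward to compute. The sketch below assumes the decoder's per-step probability distributions are already available as lists of floats; the function names are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_profile(step_distributions):
    """Entropy at every decoding step of one generated summary."""
    return [token_entropy(probs) for probs in step_distributions]


# Toy distributions over a four-word vocabulary (made-up numbers).
steps = [
    [0.97, 0.01, 0.01, 0.01],  # peaked: the model is nearly certain
    [0.25, 0.25, 0.25, 0.25],  # flat: the model is maximally uncertain
]
print(entropy_profile(steps))
```

A peaked distribution yields low entropy, the regime the summary associates with copying, while a flat distribution yields high entropy.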

Maartje ter Hoeve, Julia Kiseleva, Maarten de Rijke

456.  What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization
NAACL, 2022

The paper discusses the need to re-assess the focus and objectives of automatic text summarization and whether they align with users' desires. The authors conducted a survey among heavy users of pre-made summaries and found that the current focus of the field does not fully align with participants' wishes. They propose adopting a broader perspective on automatic summarization, expanding the types of input material that can be summarized, and defining requirements for datasets that can facilitate these research directions. They also propose including usefulness as an important aspect of summarization in the evaluation methodology and propose a methodology to evaluate the usefulness of a summary. The authors hope to unlock important research directions for future work on automatic summarization.

Yixin Liu, Ansong Ni, Linyong Nan, Budhaditya Deb, Chenguang Zhu, Ahmed H. Awadallah, Dragomir Radev

457.  Leveraging Locality in Abstractive Text Summarization
EMNLP, 2022 Supervised Learning

The paper discusses the challenges of using neural attention models for long text summarization due to the quadratic memory complexity of the self-attention module. Instead of designing more efficient attention modules, the authors investigate if models with a restricted context can have competitive performance. They propose a locality-aware modeling strategy where the model is applied to individual pages grouped by the principle of locality during both the encoding and decoding stages. The authors empirically investigate three kinds of locality in text summarization at different levels of granularity and show that their model outperforms strong baseline models with efficient attention modules.

Daniel Deutsch, Dan Roth

458.  Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
ACL, 2022

The paper discusses the importance of answer verification in question answering-based summarization evaluation metrics. The authors benchmark various answer verification methods, including lexical overlap and more sophisticated text comparison methods like BERTScore and LERC. They find that LERC performs well in some settings, but overall, improved verification performance does not necessarily lead to better QA-based metric quality. The authors attribute this to dataset properties.

Zhiyuan Zeng, Jiaze Chen, Weiran Xu, Lei Li

459.  Gradient-based Adversarial Factual Consistency Evaluation for Abstractive Summarization
EMNLP, 2021

The paper addresses factual consistency evaluation for abstractive summarization. It proposes an efficient weakly supervised adversarial data augmentation approach to build a factual consistency dataset, and trains an evaluation model on it that can accurately and robustly discriminate factual consistency and trace factual errors. Experiments and analysis on public annotated summarization and factual consistency datasets demonstrate the effectiveness and reasonableness of the approach. The code for the approach can be found at https://github.com/parZival27/GrAdualCC.

Natalie Schluter

460.  The limits of automatic summarisation according to ROUGE
EACL, 2017

This paper highlights the limitations of using the ROUGE metric for evaluating summarization systems, particularly in terms of optimal solutions. The author provides the first proof that the task of summarization is NP-hard, but also demonstrates that greedy algorithms perform well on three benchmark datasets. The paper also points out the difficulty of overall quality assurance, as there is no natural upper bound on the quality of summarization systems and even humans cannot achieve optimal summarization.
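
To make the idea of a greedy algorithm concrete, here is a minimal sketch of greedy extractive selection that adds, at each step, the sentence giving the largest gain in unigram recall against a reference. It is an assumed stand-in for ROUGE-1 recall with whitespace tokenization; the function names and the sentence budget are illustrative and do not reproduce the paper's exact procedure.

```python
from collections import Counter


def unigram_recall(summary_tokens, reference_counts):
    """Fraction of reference unigrams (with multiplicity) covered by the summary."""
    total = sum(reference_counts.values())
    if total == 0:
        return 0.0
    overlap = sum((Counter(summary_tokens) & reference_counts).values())
    return overlap / total


def greedy_select(sentences, reference, max_sentences):
    """Greedily pick sentence indices that most improve unigram recall."""
    ref_counts = Counter(reference.lower().split())
    selected, selected_tokens, best_score = [], [], 0.0
    while len(selected) < max_sentences:
        best_idx, best_candidate_score = None, best_score
        for i, sentence in enumerate(sentences):
            if i in selected:
                continue
            score = unigram_recall(selected_tokens + sentence.lower().split(), ref_counts)
            # Strict improvement only, so ties keep the earliest sentence.
            if score > best_candidate_score:
                best_idx, best_candidate_score = i, score
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
        selected_tokens += sentences[best_idx].lower().split()
        best_score = best_candidate_score
    return selected, best_score
```

Each step is cheap, but the result is only a local optimum; finding the exact optimum is the problem the paper shows to be NP-hard.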

Yizhu Liu, Qi Jia, Kenny Q. Zhu

461.  Reference-free Summarization Evaluation via Semantic Correlation and Compression Ratio
NAACL, 2022

The paper proposes a new automatic reference-free evaluation metric for summarization that uses pretrained language models to compare the semantic distributions of the source document and the summary, and that also takes the summary's compression ratio into account. The experiments show that this metric is more consistent with human evaluation in terms of coherence, consistency, relevance, and fluency.

Saadia Gabriel, Asli Celikyilmaz, Rahul Jha, Yejin Choi, Jianfeng Gao

462.  GO FIGURE: A Meta Evaluation of Factuality in Summarization
ACL, 2021

The paper discusses the challenge of ensuring factual correctness in machine-generated text and introduces GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics. The framework proposes five necessary conditions for evaluating factuality metrics on diagnostic factuality data across three different summarization tasks. The benchmark analysis on ten factuality metrics shows that the framework provides a robust and efficient evaluation that is extensible to multiple types of factual consistency and standard generation metrics, including QA metrics. However, the performance of QA metrics is highly dependent on the way in which questions are generated.

Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov

463.  Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
NAACL, 2021

The paper discusses the issue of factually unreliable outputs generated by modern summarization models and the lack of common benchmarks to measure their factuality. To address this, the authors devise a typology of factual errors and collect human annotations of generated summaries from state-of-the-art summarization systems for the CNN/DM and XSum datasets. They identify the proportion of different categories of factual errors in various summarization models and benchmark factuality metrics, showing their correlation with human judgement and specific strengths and weaknesses.

Nicholas Egan, Oleg Vasilyev, John Bohannon

464.  Play the Shannon Game With Language Models: A Human-Free Approach to Summary Evaluation
AAAI, 2022

The paper introduces new reference-free summary evaluation metrics that use a pretrained language model to estimate the information content shared between a document and its summary. These metrics are a modern take on the Shannon Game and an extension of BLANC. The authors empirically verify that their metrics achieve state-of-the-art correlation with human judgement of the summary quality dimensions of coherence and relevance, as well as competitive correlation with human judgement of consistency and fluency.

Mousumi Akter, Naman Bansal, Shubhra Kanti Karmaker

465.  Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than ROUGE?
ACL, 2022

The paper discusses the limitations of the traditional ROUGE metric for evaluating automated summarization tasks and proposes a semantic-aware nCG-based evaluation metric called Sem-nCG. The paper demonstrates how to generate more reliable semantic-aware ground truths for evaluating extractive summarization tasks without additional human intervention. The authors conducted extensive experiments using the CNN/DailyMail dataset and found that Sem-nCG is more reliable and shows higher correlation with human judgement than ROUGE. The paper suggests that ROUGE often leads to inaccurate conclusions and Sem-nCG is a better alternative for evaluating extractive summarization tasks.
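
The normalized cumulative gain (nCG) underlying such a metric can be sketched as follows. The per-sentence gain values are treated as given inputs here; how Sem-nCG derives them from semantic similarity is not reproduced, and the function name and signature are assumptions made for this example.

```python
def ncg_at_k(gains, selected, k):
    """Normalized cumulative gain of the first k selected sentence indices.

    gains: hypothetical gain per document sentence (higher is more relevant).
    selected: indices of the sentences an extractive system chose, in order.
    """
    cumulative = sum(gains[i] for i in selected[:k])
    # Ideal cumulative gain: the k highest gains available in the document.
    ideal = sum(sorted(gains, reverse=True)[:k])
    return cumulative / ideal if ideal > 0 else 0.0
```

For example, with gains `[3, 2, 1, 0]` and a system that picks sentences 0 and 2, the cumulative gain is 4 against an ideal of 5.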

Marcio Fonseca, Yftah Ziser, Shay B. Cohen

466.  Factorizing Content and Budget Decisions in Abstractive Summarization of Long Documents
EMNLP, 2022 Supervised Learning

The paper proposes a method called FACTORSUM that disentangles content selection from the budget used to cover salient content, improving the performance and applicability of abstractive summarizers. This is achieved by factorizing summarization into two steps through an energy function: (1) generation of abstractive summary views covering salient information in subsets of the input document (document views); (2) combination of these views into a final summary, following a budget and content guidance. The model achieves significantly higher ROUGE scores on multiple benchmarks for long document summarization, and is effective for domain adaptation. The performance gains are due to more flexible budget adaptation and processing of shorter contexts provided by partial document views.

Maxime Peyrard, Teresa Botschen, Iryna Gurevych

467.  Learning to Score System Summaries for Better Content Selection Evaluation
EMNLP, 2017

The paper proposes a new automatic scoring metric for evaluating summaries, based on human judgments from classical summarization datasets. The model learns the best combination of existing automatic scoring metrics that correlates with human judgments. The reliability of the new metric is tested through a manual evaluation, and the trained metric is released as an open-source tool.

Yuhao Zhang, Derek Merck, Emily Bao Tsai, Christopher D. Manning, Curtis P. Langlotz

468.  Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports
ACL, 2020

The paper discusses the limitations of existing neural abstractive summarization models in terms of factual correctness and proposes a framework to evaluate and optimize the factual correctness of generated summaries using an information extraction module and reinforcement learning. The proposed method is applied to the summarization of radiology reports, where factual correctness is crucial, and is shown to substantially improve the quality of outputs over a competitive neural summarization system, approaching the quality of human-authored summaries.

Hwanhee Lee, Kang Min Yoo, Joonsuk Park, Hwaran Lee, Kyomin Jung

469.  Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking
NAACL, 2022

The paper discusses the challenge of determining whether a generated summary is factually consistent with the source text, despite recent advances in abstractive summarization systems. The latest approach is to train a factual consistency classifier on factually consistent and inconsistent summaries, with the former readily available as reference summaries in existing summarization datasets. However, generating factually inconsistent summaries that are closely relevant to the source text remains a challenge. The paper proposes a method of generating such summaries using source texts and reference summaries with key information masked. Experiments on seven benchmark datasets demonstrate that factual consistency classifiers trained on summaries generated using this method generally outperform existing models and show a competitive correlation with human judgments. The characteristics of the summaries generated using this method are also analyzed, and a pre-trained model and code will be released.

Zheheng Luo, Qianqian Xie, Sophia Ananiadou

470.  Readability Controllable Biomedical Document Summarization
EMNLP, 2022 Supervised Learning

The paper discusses the need for readability controllable summarization for biomedical documents, as existing summarization systems do not consider the varying levels of expertise of readers. The authors introduce a new task of generating technical summaries for experts and plain language summaries for laypeople, and construct a corpus of biomedical papers with both types of summaries. They benchmark multiple advanced summarization models and propose a novel metric to evaluate the readability discrepancy between the two types of summaries. The results show that current control techniques are not effective in generating suitable summaries for different levels of expertise.

Xiuying Chen, Mingzhe Li, Shen Gao, Rui Yan, Xin Gao, Xiangliang Zhang

471.  Scientific Paper Extractive Summarization Enhanced by Citation Graphs
EMNLP, 2022 Unsupervised Learning

The paper discusses the use of citation graphs to improve scientific paper extractive summarization. The authors propose two models: a Multi-granularity Unsupervised Summarization model (MUS) and a Graph-based Supervised Summarization model (GSS). MUS finetunes a pre-trained encoder model on the citation graph by link prediction tasks, while GSS introduces a gated sentence encoder and a graph information fusion module to polish the sentence representation. Experiments on a public benchmark dataset show that both models bring substantial improvements over the prior state-of-the-art model.

Hongli Zhan, Tiberiu Sosea, Cornelia Caragea, Junyi Jessy Li

472.  Why Do You Feel This Way? Summarizing Triggers of Emotions in Social Media Posts
EMNLP, 2022 Supervised Learning

The paper discusses the importance of understanding the triggers that lead to people's emotions during crises such as the COVID-19 pandemic. It proposes a novel approach of emotion detection and trigger summarization using social media posts, which tend to be charged with multiple emotions and scattered triggers. The authors introduce COVIDET, a dataset of ~1,900 English Reddit posts related to COVID-19, with manual annotations of perceived emotions and abstractive summaries of their triggers. The paper also presents strong baselines for jointly detecting emotions and summarizing emotion triggers. The authors conclude that COVIDET presents new challenges in emotion-specific summarization and multi-emotion detection in long social media posts.

Abhishek Agarwal, Shanshan Xu, Matthias Grabmair

473.  Extractive Summarization of Legal Decisions using Multi-task Learning and Maximal Marginal Relevance
EMNLP, 2022 Supervised Learning

The paper presents techniques for extractive summarization of legal decisions in a low-resource setting using limited expert annotated data. The models locate relevant content using a sequential model and tackle redundancy by leveraging maximal marginal relevance to compose summaries. The proposed approaches can achieve ROUGE scores vis-à-vis expert extracted summaries that match those achieved by inter-annotator comparison. The multi-task learning model variant leverages rhetorical role identification as an auxiliary task to further improve the summarizer.
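
Maximal marginal relevance (MMR) trades off a sentence's relevance against its redundancy with sentences already chosen. The sketch below is a generic version under stated assumptions: relevance scores are supplied by some upstream model, redundancy is measured with Jaccard word overlap, and `lam` is a hypothetical trade-off weight. None of these choices are claimed to match the paper's implementation.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def mmr_select(sentences, relevance, k, lam=0.7):
    """Pick up to k sentence indices by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Redundancy is the highest similarity to any sentence picked so far.
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the selection reduces to ranking by relevance alone; lowering `lam` penalizes near-duplicate sentences more strongly.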

Vidhisha Balachandran, Hannaneh Hajishirzi, William W. Cohen, Yulia Tsvetkov

474.  Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling
EMNLP, 2022 Supervised Learning

The paper proposes a new approach, called FACTEDIT, to correcting factual errors in abstractive summarization models. Instead of using heuristics to generate non-factual summaries, the authors generate hard, representative synthetic examples of non-factual summaries through infilling language models. With this data, they train a more robust fact-correction model to post-edit the summaries to improve factual consistency. The approach is shown to vastly outperform prior methods in correcting erroneous summaries on two popular summarization datasets, improving factuality scores by more than 11 points on CNN/DM and more than 31 points on XSum on average across multiple summarization models, while maintaining competitive summarization quality.

Naman Bansal, Mousumi Akter, Shubhra Kanti Karmaker

475.  Learning to Generate Overlap Summaries through Noisy Synthetic Data
EMNLP, 2022 Supervised Learning

The paper discusses the Semantic Overlap Summarization (SOS) task, which involves summarizing common information from multiple alternate narratives. The lack of existing datasets for supervised training is a major challenge for this task. To address this, the authors propose a novel data augmentation technique to create synthetic data for training a seq-to-seq model. Through experiments using news narratives, they show that models trained using the synthetic dataset provide significant performance improvements over pre-trained summarization techniques and are close to models trained on golden training data. The proposed data augmentation technique is effective for training seq-to-seq models on the SOS task.

Chenhui Shen, Liying Cheng, Lidong Bing, Yang You, Luo Si

476.  SentBS: Sentence-level Beam Search for Controllable Summarization
EMNLP, 2022 Supervised Learning

The paper discusses the limitations of current structure-controlling methods in controllable text generation and proposes a new method called sentence-level beam search generation (SentBS) to address these limitations. SentBS evaluates sentences throughout the generation process to select suitable ones for subsequent generations. The paper experiments with different decoding methods as subcomponents for SentBS and evaluates the results on the structure-controlled dataset MReD. The experiments show that all explored combinations for SentBS can improve the agreement between the generated text and the desired structure, with the best method reducing structural discrepancies by approximately 68%.

Naman Bansal, Mousumi Akter, Shubhra Kanti Karmaker

477.  SEM-F1: an Automatic Way for Semantic Evaluation of Multi-Narrative Overlap Summaries at Scale
EMNLP, 2022

The paper discusses the Semantic Overlap Summarization (SOS) task, which involves generating a summary from multiple alternative narratives that convey common information. The authors focus on the automated evaluation of the SOS task using a benchmark dataset and find that the popular ROUGE metric is not suitable for this task. They propose a new evaluation metric called SEM-F1, which yields higher correlation with human judgment and inter-rater agreement compared to ROUGE. The metric is inspired by the sentence-wise annotation technique using overlap labels reported in previous work.

Hwanhee Lee, Cheoneum Park, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Juae Kim, Kyomin Jung

478.  Factual Error Correction for Abstractive Summaries Using Entity Retrieval
EMNLP, 2022

The paper discusses the problem of factual errors in abstractive summarization systems and proposes a solution in the form of an efficient factual error correction system called RFEC. The system is based on entity retrieval and retrieves evidence sentences from the original document to reduce the length of the text to analyze. It then detects entity-level errors in the summaries and substitutes the wrong entities with accurate ones from the evidence sentences. The experimental results show that RFEC outperforms baseline methods in correcting factual errors with a faster speed.

Rajdeep Mukherjee, Abhinav Bohra, Akash Banerjee, Soumya Sharma, Manjunath Hegde, Afreen Shaikh, Shivani Shrivastava, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

479.  ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts
EMNLP, 2022 Supervised Learning

The paper discusses the lack of efficient techniques to summarize financial documents and introduces a new dataset called ECTSum, which consists of transcripts of earnings calls and expert-written bullet point summaries. The authors benchmark their dataset with state-of-the-art summarization methods and present a simple yet effective approach called ECT-BPS to generate bullet points that capture important facts discussed in the calls.

Wenhao Wu, Wei Li, Jiachen Liu, Xinyan Xiao, Ziqiang Cao, Sujian Li, Hua Wu

480.  FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness
EMNLP, 2022 Supervised Learning

The paper discusses the unfaithful generation problem in current Seq2Seq summarization models, despite their ability to generate fluent and grammatical text. The authors propose a new perspective of factual robustness to measure the faithfulness of existing systems, which is the ability to correctly generate factual information over adversarial unfaithful information. They propose a novel training strategy called FRSUM, which enhances the model's factual robustness by teaching it to defend against both explicit adversarial samples and implicit factual adversarial perturbations. The evaluation results show that FRSUM consistently improves the faithfulness of various Seq2Seq models, such as T5 and BART.

Haojie Zhuang, Wei Emma Zhang, Jian Yang, Congbo Ma, Yutong Qu, Quan Z. Sheng

481.  Learning From the Source Document: Unsupervised Abstractive Summarization
EMNLP, 2022 Unsupervised Learning

The paper introduces an unsupervised learning method called SCR (Summarize, Contrast and Review) for abstractive text summarization. Unlike most state-of-the-art methods that heavily rely on high-quality and large-scale parallel corpora, SCR removes the need for reference summaries. It leverages contrastive learning and is the first work to apply it for unsupervised abstractive summarization. The model is trained using true source documents as positive examples and strategically generated fake source documents as negative examples. The generated summaries are also guided to be similar to human-written texts. The extensive experiments show that SCR outperforms other unsupervised abstractive summarization baselines, demonstrating its effectiveness.

Yuning Mao, Ming Zhong, Jiawei Han

482.  CiteSum: Citation Text-guided Scientific Extreme Summarization and Domain Adaptation with Limited Supervision
EMNLP, 2022 Supervised Learning

The paper proposes a new approach to automatically extract ultra-short summaries of scientific papers from their citation texts, creating a new benchmark dataset called CiteSum without human annotation. The authors conduct a comprehensive analysis of CiteSum and demonstrate the usefulness of the dataset by adapting models pre-trained on CiteSum (referred to as CITES) to new tasks and domains with limited supervision. The results show that CITES outperforms most fully-supervised methods on SciTLDR for scientific extreme summarization and achieves significant gains on XSum for news extreme summarization and news headline generation.

Liam van der Poel, Ryan Cotterell, Clara Meister

483.  Mutual Information Alleviates Hallucinations in Abstractive Summarization
EMNLP, 2022

The paper discusses the tendency of abstractive summarization models to output content not supported by the source document, known as hallucination. The authors identify high model uncertainty as a condition under which hallucinated content becomes more likely during generation. They propose a decoding strategy that switches to optimizing for pointwise mutual information of the source and target token when the model exhibits uncertainty, which decreases the probability of hallucinated tokens while maintaining the ROUGE and BERTScore results of top-performing decoding strategies. The experiments on the XSum dataset support the effectiveness of their proposed method.
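
One way to read this decoding strategy is sketched below for a single greedy step. The sketch assumes two next-token distributions are available as dictionaries: one conditioned on the source document and one from a language model that sees only the generated prefix. The entropy threshold, the weight `lam`, and all names are hypothetical, and the details of the paper's formulation may differ.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a token -> probability mapping."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def pick_next_token(cond_probs, uncond_probs, threshold, lam=1.0):
    """Greedy next-token choice that switches to a PMI-style score when uncertain.

    cond_probs: p(token | source, prefix); uncond_probs: p(token | prefix).
    Assumes uncond_probs assigns a positive probability to every candidate.
    """
    uncertain = entropy(cond_probs) >= threshold

    def score(token):
        value = math.log(cond_probs[token])
        if uncertain:
            # Penalize tokens that are likely even without reading the source.
            value -= lam * math.log(uncond_probs[token])
        return value

    candidates = [t for t, p in cond_probs.items() if p > 0]
    return max(candidates, key=score)
```

When the conditional distribution is confident, the function behaves like ordinary greedy decoding; only high-entropy steps are rescored.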

Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov, Yejin Choi

484.  REFEREE: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation
EMNLP, 2022 Unsupervised Learning

REFEREE is a new framework for sentence summarization that can be trained without the need for gold summaries. It allows for direct control of compression ratio and uses Symbolic Knowledge Distillation to distill latent knowledge from pre-trained language models. The framework proposes iterative distillation of knowledge, where student models from previous iterations serve as teacher models in the next iteration. The results show that the final student models outperform the much larger GPT3-Instruct model in terms of controllability of compression ratios without compromising the quality of summarization. The iterative distillation process also produces a high-quality dataset of sentence-summary pairs with varying degrees of compression ratios.

Jiayu Song, Iman Munire Bilal, Adam Tsakalidis, Rob Procter, Maria Liakata

485.  Unsupervised Opinion Summarisation in the Wasserstein Space
EMNLP, 2022 Unsupervised Learning

The paper discusses the challenges of opinion summarization of social media posts and presents WassOS, an unsupervised abstractive summarization model that uses the Wasserstein distance. The model disentangles the distributions of documents/posts into separate semantic and syntactic spaces and obtains the summary distribution using the Wasserstein barycenter. A latent variable is then fed into a GRU decoder with a transformer layer to produce the final summary. The experiments on multiple datasets show that WassOS outperforms the state-of-the-art on ROUGE metrics and consistently produces the best summaries with respect to meaning preservation according to human evaluations.

Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, Dragomir Radev

486.  BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization
EMNLP, 2022

The paper discusses the limitations of existing text summarization datasets and introduces BOOKSUM, a collection of datasets for long-form narrative summarization. The dataset covers literature documents and includes highly abstractive, human-written summaries on three levels of granularity. The unique challenges posed by the domain and structure of the dataset include processing long documents, non-trivial causal and temporal dependencies, and rich discourse structures. The paper also evaluates multiple extractive and abstractive summarization models as baselines for the dataset.

Chao Zhao, Faeze Brahman, Kaiqiang Song, Wenlin Yao, Dian Yu, Snigdha Chaturvedi

487.  NARRASUM: A Large-Scale Dataset for Abstractive Narrative Summarization
EMNLP, 2022

The paper proposes NARRASUM, a large-scale narrative summarization dataset containing 122K narrative documents and their corresponding abstractive summaries. The dataset is collected from plot descriptions of movies and TV episodes with diverse genres. The paper highlights the challenges of summarizing a narrative, which requires an understanding of event causality and character behaviors. The experiments show a large performance gap between humans and state-of-the-art summarization models on NARRASUM. The authors hope that this dataset will promote future research in summarization and broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.

Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey

488.  Don’t Say What You Don’t Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
EMNLP, 2022

The paper discusses the issue of "hallucinations" in abstractive summarization systems, where the system produces statements not supported by the source text. The authors analyze the connection between hallucinations and training data, and find that models hallucinate because they train on target summaries that are unsupported by the source. They present a new decoding method called PINOCCHIO, which improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations. PINOCCHIO detects likely model hallucinations based on various measures of attribution to the source text and can backtrack to find more consistent output or produce no summary at all when no consistent generation can be found. The experiments show that PINOCCHIO improves the consistency of generation by an average of 68% on two abstractive summarization datasets without hurting recall.

Tanya Goyal, Nazneen Rajani, Wenhao Liu, Wojciech Kryściński

489.  HYDRASUM: Disentangling Style Features in Text Summarization with Multi-Decoder Models
EMNLP, 2022 Supervised Learning

The paper introduces HYDRASUM, a new summarization architecture that uses multiple decoders to automatically learn contrasting summary styles without extra supervision. HYDRASUM provides a simple mechanism to obtain stylistically-diverse summaries by sampling from individual decoders or their mixtures, outperforming baseline models on three summarization datasets. A small modification to the gating strategy during training can enforce an even stricter style partitioning, allowing users to vary summary styles along multiple dimensions.

Tianshu Wang, Faisal Ladhak, Esin Durmus, He He

490.  Improving Faithfulness by Augmenting Negative Summaries from Fake Documents
EMNLP, 2022

The paper discusses the issue of current abstractive summarization systems producing summaries that are unfaithful to the source document, which can lead to misinformation. The authors propose a back-translation-style approach to augment negative samples that mimic factual errors made by the model, in order to teach the model to distinguish between faithful and unfaithful summaries. They also incorporate textual entailment data through multitasking to further improve performance. Experiments on three datasets show that their method consistently improves faithfulness without sacrificing informativeness.

Griffin Adams, Han-Chin Shing, Qing Sun, Christopher Winestock, Kathleen McKeown, Noémie Elhadad

491.  Learning to Revise References for Faithful Summarization
EMNLP, 2022

The paper proposes a new approach to improve the quality of reference summaries while retaining all data. The approach involves selectively rewriting unsupported reference sentences to better reflect source data. A synthetic dataset of positive and negative revisions is automatically generated, and models are trained to revise reference sentences with contrastive learning. The intensity of revisions is treated as a controllable attribute to balance faithfulness and abstraction. The proposed method is tested on noisy references from publicly available MIMIC-III discharge summaries for hospital-course summarization, and models trained on revised clinical references are found to be more faithful, informative, and fluent than models trained on original or filtered data.

Haopeng Zhang, Xiao Liu, Jiawei Zhang

492.  HEGEL: Hypergraph Transformer for Long Document Summarization
EMNLP, 2022 Supervised Learning

The paper discusses the challenges of extractive summarization for long documents due to the extended structured input context and long-distance sentence dependency. It proposes HEGEL, a hypergraph neural network that captures high-order cross-sentence relations to improve summarization. HEGEL uses hypergraph transformer layers to update and learn effective sentence representations and fuses different types of sentence dependencies, including latent topics, keywords coreference, and section structure. The paper validates HEGEL through extensive experiments on two benchmark datasets, demonstrating its effectiveness and efficiency.

Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen

493.  Long Text and Multi-Table Summarization: Dataset and Method
EMNLP, 2022

The paper discusses the limitations of existing document summarization methods that focus only on text and filter out non-textual content, such as tables. To address this, the authors propose FINDSum, a large-scale dataset for long text and multi-table summarization. The dataset is built on 21,125 annual reports from 3,794 companies and has two subsets for summarizing each company's results of operations and liquidity. The authors present three types of summarization methods and propose evaluation metrics to assess the usage of numerical information in produced summaries. The paper highlights the importance of jointly considering input textual and tabular data when summarizing report documents.

Alexander R. Fabbri, Prafulla Kumar Choubey, Jesse Vig, Chien-Sheng Wu, Caiming Xiong

494.  Improving Factual Consistency in Summarization with Compression-Based Post-Editing
EMNLP, 2022

The paper discusses the problem of factual inconsistency in summarization models and proposes a model-agnostic approach to address it through post-editing. The focus is on removing extrinsic entity errors, or entities not in the source, to improve consistency while retaining the summary's essential information and form. The proposed method uses sentence-compression data to train the post-editing model to remove errors marked with special tokens. The model improves factual consistency while maintaining ROUGE and can be applied on top of another post-editor, improving entity precision by up to a total of 38%. The paper also compares different post-editing approaches and analyzes settings where post-editors show the largest improvements.
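
Entity precision, as used informally above, can be illustrated as the share of summary entities that also appear in the source. The sketch assumes entities have already been extracted by some named-entity recognizer and uses case-insensitive substring matching; the function name and the value returned for an entity-free summary are conventions chosen for this example.

```python
def entity_precision(summary_entities, source_text):
    """Fraction of summary entities that occur verbatim in the source text."""
    if not summary_entities:
        return 1.0  # assumed convention: no entities means no extrinsic errors
    source = source_text.lower()
    supported = [entity for entity in summary_entities if entity.lower() in source]
    return len(supported) / len(summary_entities)
```

Entities that fail this check correspond to the extrinsic entity errors that a post-editor would aim to remove.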

Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton

495.  Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature
EMNLP, 2022

The paper discusses the importance of lay summarisation in making scientific literature more accessible to non-experts. It highlights the limitations of current corpora for this task and presents two new datasets, PLOS and eLife, containing biomedical journal articles and expert-written lay summaries. The paper characterizes the lay summaries and benchmarks them using mainstream summarization approaches, demonstrating their utility and identifying key challenges. The datasets and code are available for use.

John Glover, Federico Fancellu, Vasudevan Jagannathan, Matthew R. Gormley, Thomas Schaaf

496.  Revisiting text decomposition methods for NLI-based factuality scoring of summaries
EMNLP, 2022

The paper discusses the use of Natural Language Inference models to score the factuality of generated summaries. Previous studies have shown that decomposing either the input document or the summary into sentences can improve factuality scoring. However, the paper systematically compares different granularities of decomposition and shows that fine-grained decomposition is not always the best strategy. The results also suggest that incorporating additional context can improve performance, but this may not apply to all datasets. The paper highlights the importance of caution in model and methodology selection for downstream tasks.

Meng Cao, Yue Dong, Jingyi He, Jackie Chi Kit Cheung

497.  Learning with Rejection for Abstractive Text Summarization
EMNLP, 2022 Supervised Learning

The paper proposes a new training objective for abstractive summarization that uses rejection learning to identify and reject potentially noisy tokens. They also propose a regularized decoding objective that penalizes non-factual candidate summaries during inference. The method improves the factuality of generated summaries while increasing their abstractiveness, as shown in evaluations compared to five baseline models. Existing methods drop noisy samples or tokens from the training set, reducing its size and creating an artificial propensity to copy words from the source.

Yifu Qiu, Shay B. Cohen

498.  Abstractive Summarization Guided by Latent Hierarchical Document Structure
EMNLP, 2022 Supervised Learning

The paper proposes a new approach to summarizing scientific articles using a hierarchy-aware graph neural network (HierGNN). This approach captures the underlying structure and dependencies between sentences in the input article, which is essential for integrating and consolidating information from different parts of the text. The HierGNN model consists of three main steps: learning a hierarchical document structure, propagating sentence information over this structure, and using graph-level attention to concentrate the decoder on salient information. Experiments show that HierGNN improves upon strong sequence models such as BART, with a significant margin in average ROUGE-1/2/L for CNN/DM and XSum. Human evaluation also demonstrates that summaries produced by HierGNN are more relevant and less redundant than baselines. The model synthesizes summaries by fusing multiple source sentences, rather than compressing a single source sentence, and processes long inputs more effectively.

Mathieu Ravaut, Shafiq Joty, Nancy F. Chen

499.  Towards Summary Candidates Fusion
EMNLP, 2022 Supervised Learning

The paper discusses the limitations of current abstractive summarization methods and proposes a new paradigm called SummaFusion, which fuses multiple summary candidates to produce a novel abstractive second-stage summary. This method improves both the ROUGE scores and qualitative properties of the summaries, especially in the few-shot setup where it sets a new state-of-the-art. The code and checkpoints for SummaFusion are available on GitHub.

Dongmin Hyun, Xiting Wang, Chanyoung Park, Xing Xie, Hwanjo Yu

500.  Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization
EMNLP, 2022 Reinforced Learning

The paper discusses the development of an abstractive model for unsupervised summarization of texts, which is based on reinforcement learning and does not require human-written summaries. The model uses a Markov decision process with rewards to formulate the summarization process and a multi-summary learning mechanism to generate multiple summaries of varying lengths that enhance each other. Experimental results show that the proposed model outperforms both abstractive and extractive models and frequently generates new words not present in the input texts.

Junxian He, Wojciech Kryściński, Bryan McCann, Nazneen Rajani, Caiming Xiong

501.  CTRLSUM: Towards Generic Controllable Text Summarization
EMNLP, 2022 Supervised Learning

The paper introduces CTRLSUM, a framework for generating summaries that can be controlled through a set of keywords. The keywords are automatically extracted during training, and at test time, a control function maps control signals to keywords. The same trained model can be applied to control summaries on various dimensions without affecting the model training process or pretrained models. The framework is effective in entity-centric and length-controllable summarization, contribution summarization on scientific papers, invention purpose summarization on patent filings, and question-guided summarization on news articles. CTRLSUM is also comparable or better than strong pretrained systems in standard, unconstrained summarization settings.

Tanya Goyal, Junyi Jessy Li, Greg Durrett

502.  SNAC: Coherence Error Detection for Narrative Summarization
EMNLP, 2022

The paper discusses the lack of appropriate evaluation frameworks for summarizing long texts, which inhibits progress in this field. The authors introduce SNAC, a narrative coherence evaluation framework for fine-grained annotations of long summaries. They develop a taxonomy of coherence errors in generated narrative summaries and collect annotations for 6.6k sentences across 150 book and movie summaries. The collected annotations allow them to benchmark past work in coherence modeling and train a strong classifier for automatically localizing coherence errors in generated summaries. The SNAC framework can support future work in long document summarization and coherence evaluation, including improved summarization modeling and posthoc summary correction.

Wenchuan Mu, Kwan Hui Lim

503.  Universal Evasion Attacks on Summarization Scoring
EMNLP, 2022

The paper discusses the importance of automatic scoring of summaries in guiding the development of summarizers, but notes that summary scoring has not been studied as a machine learning task to assess its accuracy and robustness. The authors perform evasion attacks to explore the robustness of summary scoring systems and find that non-summary strings can achieve competitive scores with good summarizers on popular metrics such as ROUGE, METEOR, and BERTScore. The attacks also outperform state-of-the-art summarization methods on ROUGE-1 and ROUGE-L, and score the second-highest on METEOR. The authors observe a BERTScore backdoor where a simple trigger can score higher than any automatic summarization method. The low robustness of current scoring systems at the system level is highlighted, and the authors hope that their proposed attacks will facilitate the development of summary scores.

Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu

504.  Salience Allocation as Guidance for Abstractive Summarization
EMNLP, 2022 Supervised Learning

The paper proposes a new summarization approach called SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON) that uses salience expectation to guide abstractive summarization and adapts well to articles with different levels of abstractiveness. The paper argues that extractive summaries as guidance can be too strict and lead to information loss or noisy signals. SEASON is shown to be effective and reliable in automatic and human evaluations on two benchmark datasets, and empirical results on more than one million news articles demonstrate a natural fifteen-fifty salience split for news article sentences, providing useful insights for composing news articles.

Guan-Yu Lin, Pu-Jen Cheng

505.  R-TeaFor: Regularized Teacher-Forcing for Abstractive Summarization
EMNLP, 2022 Supervised Learning

The paper proposes a new method called Regularized Teacher-Forcing (R-TeaFor) to address the exposure bias problem in training sequence generation models. R-TeaFor utilizes the pairwise relationship between the original training data and the modified ones for better regularization. The experiments show that R-TeaFor outperforms previous state-of-the-art models in summarization and can be generalized to different pre-trained models.

Huan Yee Koh, Jiaxin Ju, He Zhang, Ming Liu, Shirui Pan

506.  How Far are We from Robust Long Abstractive Summarization?
EMNLP, 2022

The paper discusses the evaluation of long document abstractive summarization systems using fine-grained human annotations. It highlights the trade-off between generating relevant summaries and factual ones, and suggests promising directions for developing factual consistency metrics. The study also reveals the limitations of factuality metrics in detecting different types of factual errors and the effectiveness of ROUGE and BARTScore in evaluating the relevancy of a summary. The authors release their annotated long document dataset to contribute to the development of metrics across a broader range of summarization settings.

Shen Gao, Haotong Zhang, Xiuying Chen, Dongyan Zhao, Rui Yan

507.  Summarizing Procedural Text: Data and Approach
EMNLP, 2022 Supervised Learning

The paper proposes a procedural text summarization task with two summarization granularities, step-view and global-view, which summarize each step of the procedural text separately or give an overall summary of all steps, respectively. To tackle this task, the authors propose an Entity-State Graph-based Summarizer (ESGS), which builds on state-of-the-art entity state tracking methods and constructs a heterogeneous graph to aggregate contextual information for each procedure. The authors also propose using the contextualized procedure graph representation to predict the salient entity. Experiments conducted on two datasets verify the effectiveness of the proposed model.

Ruifeng Yuan, Zili Wang, Ziqiang Cao, Wenjie Li

508.  Few-shot Query-Focused Summarization with Prefix-Merging
EMNLP, 2022 Supervised Learning

The paper proposes a new approach called prefix-merging for few-shot learning in query-focused summarization. The approach integrates the knowledge of text summarization and question answering into a properly designed prefix and applies it to query-focused summarization. With only a small amount of trainable parameters, prefix-merging outperforms fine-tuning on query-focused summarization. The paper also discusses the influence of different prefix designs and proposes a visualized explanation for how prefix-merging works.

Yizhu Liu, Qi Jia, Kenny Q. Zhu

509.  Opinion Summarization by Weak-Supervision from Mix-structured Data
EMNLP, 2022 Supervised Learning

The paper discusses the challenges of opinion summarization of multiple reviews and proposes a new method to address the issue. The authors convert each review into a mix of structured and unstructured data, called opinion-aspect pairs (OAs) and implicit sentences (ISs), and synthesize training pairs with such mix-structured data as input and the textual summary as output. They design a summarization model with an OA encoder and an IS encoder and show that their approach outperforms previous methods on the Yelp, Amazon and RottenTomatoes datasets.

Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah

510.  Are Abstractive Summarization Models truly ‘Abstractive’? An Empirical Study to Compare the two Forms of Summarization
EMNLP, 2022

The paper discusses the shift from extractive to abstractive methods in automatic text summarization, and how large autoregressive language models have contributed to this shift. The authors revisit extractive methods and compare their performance to state-of-the-art abstractive models, finding that abstractive methods are not completely abstract in their generated summaries. They propose an evaluation metric to measure the degree of abstractiveness of a summary compared to extractive methods. The authors conduct experiments on two summarization datasets using five techniques in extractive and abstractive summarization to confirm their findings.
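
One common proxy for how abstractive a summary is, sketched below, is the fraction of summary n-grams that never occur in the source. This is an illustrative assumption rather than the metric proposed in the paper; `tokenize`, `ngrams`, and `novel_ngram_fraction` are hypothetical helper names.

```python
import re


def tokenize(text):
    # Lowercased word tokens; punctuation is dropped (simplifying assumption).
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    # Set of all contiguous n-token tuples in the token list.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_fraction(source, summary, n=2):
    # Share of distinct summary n-grams that do not appear in the source:
    # 0.0 means fully copied at this n-gram size, 1.0 means fully novel.
    source_ngrams = ngrams(tokenize(source), n)
    summary_ngrams = ngrams(tokenize(summary), n)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)


source = "the cat sat on the mat"
for candidate in ("the cat sat", "the cat slept on the mat", "a feline rested"):
    print(candidate, "->", novel_ngram_fraction(source, candidate))
```

A purely extractive summary scores 0.0 under this proxy, so a low score for an abstractive model suggests that it is largely copying from the source.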

Subhajit Chaudhury, Sarathkrishna Swaminathan, Chulaka Gunasekara, Maxwell Crouse, Srinivas Ravishankar, Daiki Kimura, Keerthiram Murugesan, Ramón Fernandez Astudillo, Tahira Naseem, Pavan Kapanipathi, Alexander Gray

511.  X-FACTOR: A Cross-metric Evaluation of Factual Correctness in Abstractive Summarization
EMNLP, 2022

The paper discusses the issue of factually inconsistent summaries produced by abstractive summarization models and proposes X-FACTOR, a cross-evaluation of three high-performing fact-aware abstractive summarization methods. The authors propose a fact-aware filtering mechanism to improve the quality of training data, a corrector module to improve the factual consistency of generated summaries, and a re-ranking technique that samples summary instances and reranks them based on their factuality. The paper also provides a detailed cross-metric agreement analysis to show how tuning a model to output summaries based on a particular factuality metric influences factuality as determined by other metrics. The goal of the work is to facilitate research that improves the factuality and faithfulness of abstractive summarization models.

Raymond Li, Wen Xiao, Linzi Xing, Lanjun Wang, Gabriel Murray, Giuseppe Carenini

512.  Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
EMNLP, 2022 Supervised Learning

The paper combines two lines of research on the multi-head self-attention mechanism of the transformer model. The first line of research aims to understand why and how transformers work, while the second proposes new attention augmentation methods to make transformers more accurate, efficient, and interpretable. The authors present a human-in-the-loop pipeline to discover task-specific attention patterns, which are then injected into both smaller models and the original model. The benefits of this approach are demonstrated in two case studies on extractive summarization and topic segmentation, where the models show considerable improvements in accuracy and efficiency after the discovered patterns are injected into attention heads.

Andreas Marfurt, James Henderson

513.  Unsupervised Token-level Hallucination Detection from Summary Generation By-products
EMNLP, 2022 Unsupervised Learning

The paper discusses the issue of hallucinations in abstractive summarization, which are model generations that are not faithful to the source document. Current methods for detecting hallucinations are limited to certain datasets and focus on noun phrases and named entities. The authors propose a new method that detects candidate hallucinations at the token level, regardless of each token's part of speech, using information already produced during summary generation. They evaluate their method on the CNN/DailyMail dataset and show that it achieves better precision-recall tradeoffs than existing methods. The authors also repurpose an existing factuality dataset and create their own token-level annotations. Overall, their method enables practitioners to generate summaries and identify possible hallucinations with minimal overhead.

Ming Zhong, Yang Liu, Suyu Ge, Yuning Mao, Yizhu Jiao, Xingxing Zhang, Yichong Xu, Chenguang Zhu, Michael Zeng, Jiawei Han

514.  Unsupervised Multi-Granularity Summarization
EMNLP, 2022 Unsupervised Learning

The paper proposes an unsupervised multi-granularity summarization framework called GRANUSUM, which can generate summaries with customizable semantic coverage. The framework uses events as the basic semantic units of the source documents and ranks them by their salience. A model is developed to summarize input documents with given events as anchors and hints, producing multi-granular summaries in an unsupervised manner. The paper also introduces a new benchmark called GranuDUC, which contains multiple summaries at different granularities for each document cluster. Experimental results show that GRANUSUM outperforms strong baselines in multi-granularity summarization and, by exploiting event information, achieves state-of-the-art performance under the conventional unsupervised abstractive setting.