Overview

This tool provides a unified way of exploring the literature on neural, single-document, English text summarization. It currently hosts 514 publications (spanning 2015-2022) from top venues such as ACL, EMNLP, and AAAI, among others.

Each paper has been carefully annotated for multiple attributes to provide a quick overview, following the annotation scheme described below. This allows researchers (especially those new to the field) to easily find relevant papers and to better organize their literature reviews.

Conceptual View of a Neural Text Summarization Pipeline. A paper may target one or more components.
Annotation Scheme

Each paper is annotated with the following attributes (a hypothetical example record is sketched after the list):

Domain
The domain(s) of the source documents that are being summarized.
Dataset
The dataset(s) used for training and evaluation. For example, CNN/DailyMail, Gigaword, etc.
Learning Paradigm
Whether the model was trained via supervised, unsupervised, or reinforcement learning.
Paper Type
Whether the paper proposes a new method, an analysis (evaluation), a metric, a dataset, or a combination of these.
Metric
The metrics used for automatic evaluation. For example, ROUGE, BLEU, etc.
Human Evaluation
The summary quality criteria that were evaluated manually. For example, Informativeness, Fluency, etc.
Pipeline Component
The component of the conceptual pipeline shown above that the paper focuses on. For example, a new objective function, efficient encoding of the input document, post-processing of the generated summaries, etc.
Challenge Type
The type of challenge outlined in Challenges that the paper addresses. For example, controlled (or tailored) summarization, hallucinations in the generated summaries, etc.
Purpose of the Summaries*
This answers three questions:
  1. Who is the target audience for the generated summaries?
  2. Why are the summaries being generated?
  3. How will the generated summaries be used?
This allows for a better alignment between the goals and the results of the paper.
Claims & Contributions*
The claims (or research questions) outlined and the corresponding contributions (solutions) presented by the paper.
Summary*
A brief summary of the paper.
Year
The year the paper was published.
Venue
The conference the paper was published in. For example, ACL, EMNLP, etc.
Code URL
The link to the code repository for the paper.
Paper URL
The link to the paper.
*These attributes are automatically generated and may require closer inspection. If you find any errors, please report them for correction.
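To make the scheme concrete, below is a hypothetical annotation record written as a Python dictionary. All field values are made up for illustration and do not describe a real entry in the collection.

```python
# A hypothetical record illustrating the annotation scheme above.
# Field names mirror the attributes; values are invented for illustration only.
example_paper = {
    "title": "An Illustrative Summarization Paper",
    "domain": ["News"],
    "dataset": ["CNN/DailyMail"],
    "learning_paradigm": "Supervised",
    "paper_type": ["Method"],
    "metric": ["ROUGE-1", "ROUGE-2", "ROUGE-L"],
    "human_evaluation": ["Informativeness", "Fluency"],
    "pipeline_component": ["Objective Function"],
    "challenge_type": ["Hallucinations"],
    "purpose_of_summaries": {            # automatically generated (*)
        "who": "General news readers",
        "why": "Quickly grasp the main events of an article",
        "how": "Displayed as a short preview alongside the headline",
    },
    "claims_and_contributions": "...",   # automatically generated (*)
    "summary": "...",                    # automatically generated (*)
    "year": 2021,
    "venue": "ACL",
    "code_url": "https://github.com/example/repo",
    "paper_url": "https://example.org/paper",
}
```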
Pipeline Components
Input Encoding
Consists of methods to better encode long documents that cannot be processed (without truncation) by standard Transformer models, for instance, by using hierarchical (or graph) attention, and/or by feeding additional context to the model, such as the discourse structure of the document (if available) or user-specific aspects.
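As a rough illustration (not any particular paper's architecture), the sketch below splits a long document into chunks, encodes each chunk with a token-level encoder, and then lets a second encoder attend across the chunk representations. All module names and sizes are placeholders.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Toy two-level encoder: token-level within chunks, then chunk-level."""

    def __init__(self, vocab_size=30522, d_model=256, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        token_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        chunk_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.token_encoder = nn.TransformerEncoder(token_layer, num_layers=2)
        self.chunk_encoder = nn.TransformerEncoder(chunk_layer, num_layers=2)

    def forward(self, chunks):
        # chunks: (batch, n_chunks, chunk_len) token ids
        b, n, l = chunks.shape
        x = self.embed(chunks.view(b * n, l))        # encode each chunk independently
        x = self.token_encoder(x)                    # (b*n, chunk_len, d_model)
        chunk_repr = x.mean(dim=1).view(b, n, -1)    # pool tokens -> one vector per chunk
        return self.chunk_encoder(chunk_repr)        # attend across chunk vectors

doc = torch.randint(0, 30522, (1, 8, 128))           # 8 chunks of 128 tokens each
print(HierarchicalEncoder()(doc).shape)              # torch.Size([1, 8, 256])
```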
Unit Relationship
Consists of methods that explicitly model the relationships between the source document's units (words, sentences, or even passages). A key motivation for this is the information redundancy in a long document, or key connections between different parts of the document that may be missed by standard encoding.
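A minimal illustration of reasoning over unit relationships: compute pairwise sentence similarity with TF-IDF vectors and flag highly similar (potentially redundant) pairs. This is a generic sketch, not a method from a specific paper, and the threshold is arbitrary.

```python
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The company reported record profits this quarter.",
    "Profits hit a record high this quarter, the company said.",
    "The CEO announced plans to expand into Asia.",
]

# Pairwise similarity between sentence-level units of the document.
tfidf = TfidfVectorizer().fit_transform(sentences)
sim = cosine_similarity(tfidf)

# Flag pairs above a threshold as potentially redundant.
for i, j in combinations(range(len(sentences)), 2):
    if sim[i, j] > 0.5:
        print(f"Sentences {i} and {j} look redundant (similarity={sim[i, j]:.2f})")
```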
Data Augmentation
Consists of methods that employ various data augmentation techniques to introduce specific aspects (facets) into the model, to create contrastive (or adversarial) examples for robustness, or simply to overcome the lack of suitable data in low-resource domains.
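The toy sketch below creates a contrastive (factually corrupted) summary by swapping one named entity for another, in the spirit of augmentation for robustness and faithfulness. It is purely illustrative: a naive string replacement with a hand-written entity pool stands in for a real NER pipeline.

```python
import random

def corrupt_summary(summary, entities):
    """Create a contrastive negative example by swapping one entity for another.

    `entities` is a pool of candidate entities; a real system would obtain
    them via NER over the corpus rather than a hand-written list.
    """
    present = [e for e in entities if e in summary]
    if not present:
        return summary  # nothing to corrupt
    original = random.choice(present)
    replacement = random.choice([e for e in entities if e != original])
    return summary.replace(original, replacement)

summary = "Apple reported record profits, CEO Tim Cook said on Tuesday."
entity_pool = ["Apple", "Tim Cook", "Microsoft", "Satya Nadella", "Tuesday", "Friday"]

positive = summary                                 # faithful reference
negative = corrupt_summary(summary, entity_pool)   # corrupted, usable as a negative example
print(negative)
```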
External Knowledge
Consists of methods that use external knowledge to improve the model's performance, for instance, via knowledge graphs, domain-specific vocabularies, or information from pre-trained language models.
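One simple way such knowledge can be injected is to concatenate retrieved snippets (e.g., entity descriptions) to the source before encoding. The sketch below shows only this input-construction step; the lookup table is a made-up stand-in for a real knowledge graph or domain vocabulary.

```python
# Hypothetical knowledge lookup standing in for a knowledge graph or domain vocabulary.
knowledge_base = {
    "ECB": "ECB: the European Central Bank, central bank of the euro area.",
    "QE": "QE: quantitative easing, large-scale asset purchases by a central bank.",
}

def augment_with_knowledge(document, kb):
    # Retrieve definitions for terms mentioned in the document and prepend them.
    snippets = [desc for term, desc in kb.items() if term in document]
    return " ".join(snippets) + " [SEP] " + document if snippets else document

doc = "The ECB signalled it may wind down QE later this year."
model_input = augment_with_knowledge(doc, knowledge_base)
print(model_input)  # knowledge snippets prepended, separated from the source text
```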
Objective Function
Consists of methods that introduce new objective functions alongside the standard cross-entropy loss to better suit the task at hand, for instance, to emphasize diversity, faithfulness, or custom rewards (in the case of reinforcement learning).
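As a generic sketch (not a specific paper's loss), the snippet below combines token-level cross-entropy with a weighted auxiliary term; the `faithfulness_penalty` is a placeholder scalar that would come from some auxiliary scorer, and in RL-based setups the extra term would instead be a policy-gradient reward.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target_ids, faithfulness_penalty, weight=0.1):
    """Standard cross-entropy plus a weighted auxiliary objective.

    logits: (batch, seq_len, vocab) decoder outputs
    target_ids: (batch, seq_len) gold summary token ids
    faithfulness_penalty: scalar tensor from a placeholder auxiliary scorer
    """
    ce = F.cross_entropy(logits.transpose(1, 2), target_ids)  # token-level CE
    return ce + weight * faithfulness_penalty

logits = torch.randn(2, 10, 100, requires_grad=True)
targets = torch.randint(0, 100, (2, 10))
penalty = torch.tensor(0.3)   # stand-in for, e.g., an entailment-based faithfulness score
loss = combined_loss(logits, targets, penalty)
loss.backward()
```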
Auxiliary Tasks
Consists of methods that employ additional tasks, for instance, via multi-task learning, or via pre-training on related tasks (such as textual entailment, paraphrasing, gap-sentence prediction, etc.) that help the summarization task.
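A minimal multi-task sketch: a shared encoder feeds both a summarization head and an auxiliary head (e.g., an entailment classifier), and the training loss is a weighted sum of the two task losses. All components, sizes, and weights are placeholders.

```python
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """Shared encoder with a summarization head and an auxiliary task head."""

    def __init__(self, vocab_size=1000, d_model=128, n_aux_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.summarization_head = nn.Linear(d_model, vocab_size)  # next-token logits
        self.aux_head = nn.Linear(d_model, n_aux_labels)          # e.g. entailment labels

    def forward(self, token_ids):
        states, last = self.encoder(self.embed(token_ids))
        return self.summarization_head(states), self.aux_head(last.squeeze(0))

model = SharedEncoderMultiTask()
tokens = torch.randint(0, 1000, (4, 20))
sum_logits, aux_logits = model(tokens)

# Total loss is a weighted sum of the task losses (random targets, arbitrary weight).
sum_loss = nn.functional.cross_entropy(sum_logits.transpose(1, 2),
                                       torch.randint(0, 1000, (4, 20)))
aux_loss = nn.functional.cross_entropy(aux_logits, torch.randint(0, 3, (4,)))
total_loss = sum_loss + 0.5 * aux_loss
```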
Unit Selection
Consists of methods that explicitly select the units (words, sentences, or even passages) that are most relevant to the summary, for instance, via copying or pointing to specific text spans in the source document. A key motivation for this is the loss of information when the model is forced to generate a summary of a fixed length.
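The sketch below shows the core idea behind copy/pointer mechanisms: the final word distribution mixes the decoder's vocabulary (generation) distribution with an attention-based copy distribution over the source tokens. All numbers are arbitrary and only illustrate one decoding step.

```python
import numpy as np

vocab = ["<unk>", "the", "profits", "rose", "sharply", "acme"]
source_tokens = ["acme", "profits", "rose"]                 # source document tokens

# Placeholder decoder outputs for a single generation step.
p_vocab = np.array([0.05, 0.30, 0.25, 0.30, 0.10, 0.0])    # generation distribution
attention = np.array([0.6, 0.3, 0.1])                       # attention over source tokens
p_gen = 0.4                                                 # probability of generating vs copying

# Scatter the copy probabilities onto the corresponding vocabulary positions.
p_copy = np.zeros(len(vocab))
for tok, a in zip(source_tokens, attention):
    p_copy[vocab.index(tok)] += a

p_final = p_gen * p_vocab + (1 - p_gen) * p_copy
print(vocab[int(p_final.argmax())])   # "acme" wins via copying despite a low generation probability
```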
Controlled Generation
Consists of methods that guide the model to generate summaries with specific attributes such as style, length, or tone, for instance, by providing additional text as guidance to condition the summary generation process, or by restricting the model's vocabulary to a specific domain.
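One common, simple form of control is to prepend control codes to the input so that the model conditions on them during generation. The sketch below only constructs such inputs; the token names and length buckets are made up and not taken from any specific system.

```python
def add_control_codes(document, target_length=None, style=None):
    """Prepend hypothetical control tokens to condition summary generation."""
    codes = []
    if target_length is not None:
        # Bucket the desired summary length into coarse control tokens.
        bucket = "short" if target_length <= 30 else "medium" if target_length <= 80 else "long"
        codes.append(f"<len_{bucket}>")
    if style is not None:
        codes.append(f"<style_{style}>")
    return " ".join(codes + [document])

doc = "The central bank raised interest rates by half a percentage point on Thursday."
print(add_control_codes(doc, target_length=25, style="headline"))
# -> "<len_short> <style_headline> The central bank raised interest rates ..."
```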
Post Processing
Consists of methods that post-process the generated summaries to improve their quality, for instance, via re-ranking, re-writing, or swapping specific spans of the summary to achieve the desired goal.
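As a generic re-ranking sketch, the snippet below scores candidate summaries by word overlap with the source and keeps the best one. A real system would use a learned scorer (e.g., a faithfulness or quality model) instead of this toy overlap score.

```python
def overlap_score(candidate, source):
    """Toy scorer: fraction of candidate words that also appear in the source."""
    cand_words = candidate.lower().split()
    source_words = set(source.lower().split())
    return sum(w in source_words for w in cand_words) / max(len(cand_words), 1)

source = "The city council approved the new transit budget after a lengthy debate."
candidates = [
    "The council approved the transit budget after debate.",
    "The mayor vetoed the transit budget on Monday.",      # contains unsupported content
]

# Re-rank the candidates and keep the highest-scoring one.
best = max(candidates, key=lambda c: overlap_score(c, source))
print(best)
```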