
CEG: A joint model for causal commonsense events enhanced story ending generation

  • Yushi Zhang,

    Roles Data curation, Investigation, Methodology, Validation, Writing – original draft

    Affiliation School of Computer Science and Technology, East China Normal University, Shanghai, China

  • Yan Yang ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Validation, Writing – review & editing

    yanyang@cs.ecnu.edu.cn

    Affiliations School of Computer Science and Technology, East China Normal University, Shanghai, China, Shanghai Institute of AI for Education and School of Computer Science and Technology, Shanghai, China

  • Ming Gu,

    Roles Writing – review & editing

    Affiliation School of Computer Science and Technology, East China Normal University, Shanghai, China

  • Feng Gao,

    Roles Writing – review & editing

    Affiliation School of Computer Science and Technology, East China Normal University, Shanghai, China

  • Chengcai Chen,

    Roles Resources

    Affiliation Xiaoi Robot Technology Co., Ltd., Shanghai, China

  • Liang He

    Roles Funding acquisition

    Affiliations School of Computer Science and Technology, East China Normal University, Shanghai, China, Shanghai Institute of AI for Education and School of Computer Science and Technology, Shanghai, China

Abstract

With the success of pre-trained language models, the performance of story ending generation has improved dramatically, yet the task remains challenging due to the lack of commonsense reasoning ability. Most previous works focus on using commonsense knowledge to enhance the implicit correlations between words but ignore the hidden causality between sentences or events. In this paper, we propose the Causal commonsense Enhanced joint model for story ending Generation (CEG), which incorporates causal commonsense event knowledge to generate a reasonable story ending. Specifically, we first develop a commonsense events inference model trained on GLUCOSE, which converts static knowledge into a dynamic generation model able to discover unseen knowledge. It uses prompts to produce various commonsense events behind the stories as pseudo-labels for the dataset. Then, we propose a joint model for the causal events inference task and the story ending generation task to inject the inferred knowledge into generation; it consists of a shared encoder, an inference decoder, and a generation decoder. In the causal events inference task, we use the shared encoder and the inference decoder to reason about the causal events behind each sentence of the story context, which helps the model better understand the story and provides long-distance dependencies for story ending generation. In the story ending generation task, we combine the hidden states of the causal events with the story context and generate the story ending with the shared encoder and the generation decoder. We jointly train the model on the two tasks so that the generation decoder produces story endings that better match the story clues. Experimental results on the ROCStories dataset show that our model outperforms previous works, demonstrating the effectiveness of the joint model and the generated causal events.

Introduction

Story ending generation aims to conclude a story and complete the plot given the context. It requires a model to understand the implicit commonsense knowledge beyond the text. Pre-trained language models such as GPT-2 [1] and BERT [2] have achieved great success in terms of fluency and informativeness. However, these models focus only on the surface meaning of the text, ignoring the commonsense knowledge behind the stories. Therefore, it is crucial to equip the model with commonsense reasoning ability to discover the commonsense knowledge underlying the story and further improve the performance of story ending generation.

Previous works mainly integrate ConceptNet [3] to enhance commonsense reasoning ability. Specifically, some methods [4, 5] enrich word representations with their neighbors’ information on knowledge graphs but neglect inference ability. Others [6, 7] extract a subgraph from ConceptNet according to the keywords in the story and then explore appropriate concepts for story generation. Although these studies have made great progress in commonsense reasoning, they focus purely on commonsense correlations at the word level, which might cause incoherence between endings and story clues [4, 7]. As shown in Fig 1, method (a) obtains words related to the keywords through the knowledge graph. The word “crush” is highly related to the keywords “love” and “dream” but leads to a wrong story ending that contradicts the story context.

thumbnail
Fig 1. Story endings generated by different methods.

Words in red are keywords of the story, which are linked to ConceptNet to infer related words (orange nodes) in method (a). Method (b) directly generates results according to the story context. Method (c) reasons about the causal events behind each sentence through various relations from the commonsense event knowledge base and incorporates them into the ending generation.

https://doi.org/10.1371/journal.pone.0286049.g001

In daily life, story development depends on the whole story rather than a single sentence or word, which implies that humans need to discover the logicality and causality among the story sentences and then capture the in-depth gist. Comparing the results of methods (b) and (c) in Fig 1, the former is generated directly from the story context without reasoning, lacks understanding of the whole story, and only pays attention to the last two sentences. Meanwhile, the result of method (c) in Fig 1 shows that explicitly fusing the commonsense events behind the story context into the ending generation can help predict future events and generate a reasonable ending.

Luckily, some large-scale datasets of implicit event commonsense knowledge have been proposed, such as ATOMIC [8] and GLUCOSE [9]. Unlike ConceptNet, these knowledge bases focus on causal relations between events, which provide the basis for reasoning on the event level [8, 9]. Specifically, GLUCOSE [9] defines ten relations in daily life, including event, emotion, possession, etc. Therefore, with the help of commonsense knowledge about events, models can capture the underlying events in the context reasoned by different relations to understand the story better and make the ending generation coherent with the story clues.

In this paper, we propose the Causal commonsense Enhanced joint model for story ending Generation (CEG), which infers multi-relational implicit commonsense events behind the context and integrates them explicitly into the story ending generation to make the ending more logical. Specifically, we propose a commonsense events inference model based on BART [10] and fine-tune it on GLUCOSE [9]. It uses prompts to discover, through different relations, the underlying commonsense events of each story sentence as pseudo-labels. Furthermore, using a generation model avoids being restricted to the size of the original knowledge base and allows the discovery of unseen knowledge. Then, we devise a joint model for the causal events inference task and the story ending generation task, which consists of a shared encoder, an inference decoder, and a generation decoder. The causal events inference task uses the shared encoder and the inference decoder to reason about the underlying causal events of each story sentence, which assists the model in understanding the story from a global perspective and captures long-distance dependencies. In the story ending generation task, we encode the hidden states of the causal events with the shared encoder and combine them with the story context to generate the story ending with the generation decoder. By sharing the encoder parameters, the model can better utilize their semantic information. Finally, we jointly train the model with the losses from the two tasks to make them integrated and avoid error propagation. Experimental results on the ROCStories dataset [11] show that our model outperforms previous works, demonstrating the effectiveness of CEG and of the commonsense events generated from multiple context sentences.

Our contributions are summarized as follows:

  • We propose a story commonsense events inference model trained on GLUCOSE, which transforms static knowledge into a dynamic generative model. It infers commonsense events by prompts as pseudo-labels to train the joint model in the causal events inference task, which can improve the commonsense reasoning ability of the model.
  • To the best of our knowledge, this is the first time that a joint model has been proposed for the causal events inference task and the story ending generation task, which infers causal events from each story sentence to provide long-distance commonsense information and improve the generation performance. The joint training of the model and the sharing of parameters make the inferred events more beneficial for the generation.
  • We conduct experiments on the story ending generation dataset ROCStories, and the results demonstrate that our model generates more reasonable story endings than the previous works.

Related works

Story ending generation is an important subtask of story generation, which aims to generate a reasonable story ending according to the story context. However, it is difficult to capture commonsense knowledge using only the surface information of the story. Therefore, some works constructed commonsense knowledge bases by exploring the relevance between concepts or events, and integrating such commonsense knowledge into generation models has proven effective.

Story generation

Story generation is the task of generating a reasonable story according to the leading context. It requires the model to generate multiple story sentences given the beginning of the story or some keywords [12–16]. [14, 16] utilized a VAE to encode the story plots or keywords and then generated the follow-up stories. [15] introduced an event sequence as a trigger to help generate stories. In addition, some works [12, 13] used external knowledge to enhance story generation. [12] trained the model on ConceptNet and ATOMIC to obtain commonsense knowledge but lacked inference according to the story. [13] utilized ConceptNet to reason about related words according to the keywords in the leading context but ignored reasoning over the story context.

Different from story generation, story ending generation requires the model to generate a story ending according to the given story context. According to the results in [16], story plots become increasingly complex and informative as the story progresses, so story ending generation is more difficult than story generation. [17] proposed a Seq2Seq model using adversarial training to enrich the diversity of the results; compared to maximum likelihood estimation, adversarial training prevents the model from generating a generic ending for different story contexts. [18] combined the generation model with reinforcement learning to make the ending sensible. [19] drove the model to focus on the key phrases in the context to avoid generic endings. [20] turned the story context into a graph with dependency relations to improve the logical capability of the model. [21] proposed controllable story generation to generate a specific story ending for the given context and the user’s intent. However, these models were trained only on large-scale corpora and ignored commonsense knowledge.

Commonsense knowledge

Commonsense knowledge bases play an essential role in enhancing the commonsense reasoning ability of models. ConceptNet [3] is a well-known knowledge graph connecting words and phrases with their relations. However, it mainly focuses on commonsense relevance between words and cannot provide enough information to capture the relations between events. Unlike ConceptNet, ATOMIC [8] is composed of event sentences and their relations, such as “what impact does an event have on X”. [22] proposed COMET, fine-tuned on ATOMIC, to generate knowledge with a generative model. [9] proposed GLUCOSE, which includes story-specific statements paired with inference rules generalized from those statements. The events of GLUCOSE come from daily life stories, which are better suited for the story ending generation task, so we use GLUCOSE as our commonsense knowledge base.

Commonsense knowledge integrated generation

It has been proven that pre-trained language models like BERT [2] or GPT-2 [1] contain commonsense knowledge. However, these models still lack reasoning ability and cannot obtain knowledge behind the context, which cannot be solved by increasing the model size alone [23]. How to integrate commonsense knowledge into the generation model is therefore a critical challenge in many tasks. [5, 6, 24] searched for subgraphs of ConceptNet according to the context and integrated them into the model to enrich its commonsense knowledge. However, these works only infused the word representations of commonsense knowledge into the model and ignored reasoning. [7] made inferences on ConceptNet to reason about possible concepts for generation but focused only on the word level, ignoring the relations between events. [25] extracted representations from COMET to improve the commonsense reasoning of the model. [26] used a generation model fine-tuned on a commonsense knowledge base to complete the tasks.

Unlike these works, our model aims to infer the implicit events behind the story to improve story ending generation. We first propose a commonsense events inference model fine-tuned on GLUCOSE, which transforms the static knowledge base into a dynamic generative model to infer the commonsense events behind the story as pseudo-labels. In addition, the generative model generalizes beyond the static knowledge base and can generate unseen knowledge. After that, we propose a joint model that generates the story ending through two tasks: the causal events inference task and the story ending generation task. In the causal events inference task, we reason about the causal events of each story sentence through different relations, which provides global information and long-distance dependencies for story ending generation. In the story ending generation task, we generate the ending by fusing the causal events with the story context. We jointly train the model and share the encoder parameters to make the causal events more helpful to story ending generation.

Materials and methods

In this section, we introduce our method in detail. To uncover the commonsense knowledge behind the story, we first propose a commonsense events inference model fine-tuned on GLUCOSE, which discovers implicit commonsense events as pseudo-labels for the story. Then we design a joint model trained on the causal events inference task and the story ending generation task. The causal events inference task reasons about causal events through different relations and provides long-distance information for the story ending generation task. The joint training makes the causal events carry more information beneficial to story ending generation and helps the model generate more reasonable story endings.

Task formulation

In this paper, we exploit commonsense events behind the story context to help generate a more reasonable story ending. The task is formalized as follows: given the story context X = {X1, X2, ⋯, Xm}, where m is the number of context sentences, we need to generate a reasonable ending Y that completes the plot. Each sentence Xi contains several words $x_i^{(1)}, x_i^{(2)}, \cdots, x_i^{(l_i)}$, where $l_i$ is the length of Xi. To perceive the logicality behind the story, we first reason about the causal events C = {C11, C12, ⋯, C1n, C21, ⋯, Cmn} behind the story through the relation set R = {r1, r2, ⋯, rn}, where n is the number of relation types and Cij is the causal event inferred from story sentence Xi and relation rj. Then we combine the causal events with the story context to generate a reasonable story ending Y, formally:

(1) $P(Y \mid X, C) = \prod_{k=1}^{|Y|} P\!\left(y^{(k)} \mid y^{(1:k-1)}, X, C\right)$

Commonsense events inference

To generate a reasonable story ending, the model is required to understand the causal relevance between different events of the story. We use the external commonsense knowledge base GLUCOSE [9] to train the model and assist the model in inferring the events from the story context. Since GLUCOSE cannot cover all events in the story and link them correctly to the corresponding event in the knowledge base, we design a commonsense events inference model fine-tuned on GLUCOSE. It can transform static knowledge into a dynamic generative model and generate commonsense events unseen in the knowledge base. And then, we employ it to reason about the commonsense events of the story as pseudo-labels to augment the data and train the joint model.

We fine-tune a BART [10] model on GLUCOSE. At the encoding step, we feed the entire story context X = {X1, X2, ⋯, Xm} together with the story ending Y into the BART encoder and obtain their representation H. At the decoding step, we use prompts for the different relations to guide the generation. For each relation in GLUCOSE, we set a prompt such as “emotion that motivates 〈sentence〉 is”, where the special token 〈sentence〉 is replaced by the specific sentence Xi that needs inference. More detailed relation categories and prompts are shown in Table 1. Together with the representation from the encoder, the prompt Promptij of relation rj and sentence Xi is fed into the BART decoder to generate the corresponding event Iij:

(2) $H = \mathrm{BART\text{-}Encoder}([X; Y])$

(3) $I_{ij} = \mathrm{BART\text{-}Decoder}(\mathrm{Prompt}_{ij}, H)$

At the inference step, we use the same format as the training step to generate commonsense events I = {I11, I12, ⋯, I1n, I21, ⋯, Imn} for each story, where Iij is the commonsense event inferred from story sentence Xi and relation rj. These commonsense events are regarded as pseudo-labels to train the joint model in the causal events inference task.
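To make the pseudo-labeling step concrete, the sketch below shows how prompt-guided event inference could be run with a BART model from the Hugging Face transformers library. The checkpoint name, example prompts, and generation settings are illustrative assumptions rather than the authors’ released code.

```python
# Hedged sketch: prompt-guided commonsense event inference with BART.
# Checkpoint, prompt wording, and decoding settings are assumptions.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
# In practice this model would first be fine-tuned on GLUCOSE.
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# One prompt template per relation type (wording is illustrative).
PROMPTS = {
    "event":    "event that causes {sentence} is",
    "emotion":  "emotion that motivates {sentence} is",
    "location": "location that enables {sentence} is",
}

def infer_pseudo_labels(story_sentences, ending):
    """Generate one commonsense event per (sentence, relation) pair."""
    context = " ".join(story_sentences + [ending])
    enc = tokenizer(context, return_tensors="pt", truncation=True)
    events = {}
    for i, sentence in enumerate(story_sentences):
        for relation, template in PROMPTS.items():
            prompt = template.format(sentence=sentence)
            # Feed the prompt as the beginning of the decoder sequence.
            dec = tokenizer(prompt, add_special_tokens=False, return_tensors="pt")
            out = model.generate(
                input_ids=enc["input_ids"],
                decoder_input_ids=dec["input_ids"],
                max_length=48,
                num_beams=4,
            )
            events[(i, relation)] = tokenizer.decode(out[0], skip_special_tokens=True)
    return events
```

A call such as `infer_pseudo_labels([...four context sentences...], ending)` would then return one candidate event per sentence-relation pair, stored as the pseudo-labels I used to train the joint model.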

Causal commonsense enhanced joint model for story ending generation

We propose the Causal commonsense Enhanced joint model for story ending Generation (CEG) to infuse commonsense knowledge into story ending generation. Fig 2 illustrates the model architecture, which consists of a shared encoder, an inference decoder, and a generation decoder. The model is jointly trained on the causal events inference task and the story ending generation task. The causal events inference task aims to generate causal events relevant to the story context using the shared encoder and the inference decoder; each causal event is inferred from the corresponding sentence in the story context through a specific causal relation. In the story ending generation task, we then generate the story ending from the story context and the hidden states of the causal events using the shared encoder and the generation decoder. The two tasks share the same encoder to better utilize its linguistic properties. Finally, we train the model jointly with the losses from the two tasks to make the causal events more adaptive for story ending generation.

thumbnail
Fig 2. The architecture of CEG.

The model architecture consists of three components: a shared encoder, an inference decoder, and a generation decoder. The causal events inference task uses the shared encoder and the inference decoder to generate causal events for the story context. The story ending generation task uses the shared encoder and the generation decoder, which integrate the causal events and the story context to produce the ending.

https://doi.org/10.1371/journal.pone.0286049.g002

Causal events inference.

The commonsense knowledge behind the story is implicit, and a language model usually cannot capture it from the story context alone. We therefore use external commonsense knowledge to train the model to reason about events related to the story. The causal events inference uses an encoder-decoder architecture [27], which consists of a shared encoder and an inference decoder. We concatenate the sentences in the story context X = {X1, X2, ⋯, Xm} with special tokens, denoted as Xconcat = [X1, 〈/s〉, X2, 〈/s〉, ⋯, 〈/s〉, Xm], and input it into the shared encoder to get the story representation Hs:

(4) $\tilde{H} = \mathrm{MultiHead}(X_{\mathrm{concat}}, X_{\mathrm{concat}}, X_{\mathrm{concat}})$

(5) $H_s = \mathrm{FFN}(\tilde{H})$

where the shared encoder is a Transformer-based [28] encoder, FFN is a fully connected feed-forward network containing two linear transformations with a ReLU activation, and MultiHead is the multi-head attention layer of the Transformer, which encodes the value V according to the attention weights calculated between the query Q and the key K:

(6) $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \cdots, \mathrm{head}_h)W^{O}$

(7) $\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$

(8) $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} + M\right)V$

where $W^{O}$, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the parameters of the attention mechanism, M is the attention mask, and $d_k$ is the dimension of the query and key.
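For readers who prefer code to notation, the following is a minimal PyTorch sketch of the masked scaled dot-product attention and the multi-head layer described by Eqs (6)-(8); the default dimensions match the BART-base setting reported later, and all names are illustrative.

```python
import math
import torch
import torch.nn as nn

def attention(q, k, v, mask=None):
    """Eq (8): softmax(QK^T / sqrt(d_k) + M) V, with M as an additive mask."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores + mask  # 0 where attention is allowed, -inf where blocked
    return torch.matmul(torch.softmax(scores, dim=-1), v)

class MultiHead(nn.Module):
    """Eqs (6)-(7): project into heads, attend, concatenate, project back."""
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)
        def split(x):  # (batch, length, d_model) -> (batch, heads, length, d_k)
            return x.view(batch, -1, self.h, self.d_k).transpose(1, 2)
        heads = attention(split(self.w_q(q)), split(self.w_k(k)), split(self.w_v(v)), mask)
        merged = heads.transpose(1, 2).contiguous().view(batch, -1, self.h * self.d_k)
        return self.w_o(merged)
```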

At the decoding step, we use the inference decoder to generate the causal events C = {C11, C12, ⋯, C1n, C21, ⋯, Cmn} from the prompts of the relation set R and the story representation Hs, where m is the number of context sentences and n is the number of relations. The inference decoder is a Transformer-based decoder consisting of self attention, cross attention, and a fully connected feed-forward network. We input the prompts shown in Table 1 into the decoder to guide the model to generate causal events. Promptij is the prompt for story sentence Xi and relation rj, which is used to infer the causal event Cij. The story representation Hs is fed into the cross attention, where attention weights are calculated against the hidden states from the self attention. Finally, we obtain the hidden states of the causal events O = {O11, ⋯, O1n, ⋯, Omn} and their corresponding sentences C = {C11, ⋯, C21, ⋯, Cmn}:

(9) $a^{(k)} = \mathrm{MultiHead}\!\left(x^{(k)}, x^{(1:k-1)}, x^{(1:k-1)}\right)$

(10) $b^{(k)} = \mathrm{MultiHead}\!\left(a^{(k)}, H_s, H_s\right)$

(11) $o^{(k)} = \mathrm{FFN}\!\left(b^{(k)}\right)$

(12) $P\!\left(c^{(k)}\right) = \mathrm{softmax}\!\left(o^{(k)}W + b\right)$

(13) $c^{(k)} = \arg\max P\!\left(c^{(k)}\right)$

where $a^{(k)}$ is the output of the self attention at step k and $b^{(k)}$ is the output of the cross attention. In self attention, each token x(k) calculates attention weights with the history tokens x(1 : k−1), and the self-attention mask is a left-to-right mask to ensure each token only depends on the history tokens, as shown in Fig 3. In cross attention, we calculate attention weights between the output hidden states of the self attention and the story representation Hs. Finally, we obtain the hidden state o(k) and its corresponding token c(k), where o(k) is the kth hidden state of Oij and c(k) is the kth token of Cij.
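The three attention masks referenced above and in Fig 3 can be built as simple additive masks; the snippet below is a rough illustration with assumed shapes (0 marks positions a token may attend to, -inf marks blocked positions).

```python
import torch

def encoder_mask(src_len):
    # Bidirectional: every encoder token attends to every other token.
    return torch.zeros(src_len, src_len)

def decoder_self_mask(tgt_len):
    # Left-to-right: token k attends only to tokens up to position k.
    mask = torch.full((tgt_len, tgt_len), float("-inf"))
    return torch.triu(mask, diagonal=1)  # zeros on and below the diagonal

def cross_mask(tgt_len, src_len):
    # Cross attention: every decoder step sees the full encoder output.
    return torch.zeros(tgt_len, src_len)
```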

thumbnail
Fig 3. Mask of different modules.

The first mask is the encoder mask, which is bidirectional: each token accesses the information of all other tokens. The second mask is the mask of the self attention in the decoder, which is left-to-right: each token only accesses information from the history tokens. The third mask is the mask of the cross attention in the decoder, where each token accesses all information from the encoder.

https://doi.org/10.1371/journal.pone.0286049.g003

We calculate the loss of the causal events between the generated results C and the pseudo-labels I. The loss function is the negative log-likelihood:

(14) $\mathcal{L}_{C} = -\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l_{ij}} \log P\!\left(c_{ij}^{(k)} = y_k\right)$

where m is the number of sentences in the story, n is the number of inference events, lij is the sequence length of the causal event Cij, yk is the kth token of Iij, and $c_{ij}^{(k)}$ is the kth token of Cij. P(⋅) represents the probability score from the model output, and log(⋅) is the base-2 logarithm.

Furthermore, we use the hidden states of the causal events O = {O11, ⋯, O1n, ⋯, Omn} in the story ending generation task instead of the causal event sentences C, so the loss from the story ending generation task can optimize the generation of the causal events and make them more precise.

Story ending generation.

Starting with the hidden states of the causal events O, we apply a Transformer-based encoder-decoder model with the shared encoder and the generation decoder to generate the story ending. To utilize the linguistic and semantic information learned in the causal events inference task, we reuse the same shared encoder. First, we use the shared encoder to encode the hidden states O into intermediate representations $\tilde{O} = \{\tilde{O}_{11}, \cdots, \tilde{O}_{mn}\}$; each hidden state Oij is encoded into $\tilde{O}_{ij}$ individually. We then concatenate the intermediate representations with the story representation Hs and input the concatenation into the generation decoder, where it is used to compute the attention weights with each step’s hidden states in the cross attention of the generation decoder:

(15) $\tilde{O}_{ij} = \mathrm{SharedEncoder}\!\left(O_{ij}\right)$

(16) $\hat{y}^{(k)} = \mathrm{GenerationDecoder}\!\left([\tilde{O}_{11}; \cdots; \tilde{O}_{mn}; H_s],\ \hat{y}^{(1:k-1)}\right)$

where $\hat{Y}$ is the story ending generated by the model, $\hat{y}^{(k)}$ is the kth token of the ending, and $\hat{y}^{(1:k-1)}$ are the history tokens.
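As a compact illustration of Eqs (15)-(16), the sketch below shows how the causal-event hidden states and the story representation could be combined before decoding; the module call signatures are assumptions, not the released implementation.

```python
import torch

def generation_forward(shared_encoder, generation_decoder, event_states, story_repr, prev_tokens):
    """Re-encode each causal-event hidden state, concatenate with the story
    representation, and let the generation decoder attend over the result.
    (Interfaces are assumed for illustration only.)"""
    encoded = [shared_encoder(o) for o in event_states]   # Eq (15), one pass per O_ij
    memory = torch.cat(encoded + [story_repr], dim=1)     # [O~_11; ...; O~_mn; H_s]
    return generation_decoder(prev_tokens, memory)        # Eq (16): next-token prediction
```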

The story ending loss is the negative log-likelihood between the generated ending $\hat{Y}$ and the golden ending Y:

(17) $\mathcal{L}_{Y} = -\sum_{k=1}^{|Y|} \log P\!\left(\hat{y}^{(k)} = y^{(k)}\right)$

where $\hat{y}^{(k)}$ is the kth token of the generated ending and y(k) is the kth token of the golden ending Y. P(⋅) represents the probability score from the model output, and log(⋅) is the base-2 logarithm.

Training objective.

The training objective is the sum of the losses of the two tasks:

(18) $\mathcal{L} = \mathcal{L}_{C} + \mathcal{L}_{Y}$

where $\mathcal{L}_{C}$ is the loss of the causal events inference task and $\mathcal{L}_{Y}$ is the loss of the story ending generation task. We then train the model jointly on the two tasks with the total loss $\mathcal{L}$.
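A minimal sketch of the resulting joint training step is given below, assuming the model returns the two task losses; the optimizer and interface names are illustrative.

```python
def joint_train_step(model, batch, optimizer):
    """One joint update: L = L_C (causal events inference) + L_Y (ending generation)."""
    loss_inference, loss_generation = model(
        story=batch["story"],           # context sentences X_1..X_m
        pseudo_events=batch["events"],  # pseudo-labels I from the inference model
        gold_ending=batch["ending"],    # golden ending Y
    )
    loss = loss_inference + loss_generation   # Eq (18)
    optimizer.zero_grad()
    loss.backward()                           # gradients flow through the shared encoder
    optimizer.step()
    return loss.item()
```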

Results

To evaluate the ability of our model, we conduct experiments on ROCStories [11] and compare our model with other baseline models. We use both automatic metrics and human evaluation to measure the performance of the models. Then, we explore the effectiveness of the different parts of our model and present some cases to analyze the differences between the models.

Dataset

We conduct experiments on the ROCStories dataset, which contains stories describing daily life, each consisting of five sentences. The task is to use the first four sentences as story context to generate a reasonable story ending. We use the data splits in [7], as shown in Table 2. To highlight the causal events behind the stories, we employ GLUCOSE as our commonsense knowledge base, which contains commonsense events inferred from story contexts. The size of GLUCOSE is also shown in Table 2.

Implementation details

Five types of relations defined in GLUCOSE cover the causes and effects of a specific sentence, including event, emotion, location, possession, and other attributes, as shown in Table 1. For each story sentence and relation, we infer one causal event. We use the base version of BART to initialize the shared encoder and the two decoders. Both the encoder and the decoders have 768-dimensional hidden states, 6 layers, and 12 attention heads. All our experiments are conducted on a cloud computing instance with an Intel Xeon Gold 6130 CPU and an Nvidia Tesla A100 40 GB GPU.
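One plausible way to realize this initialization with the Hugging Face transformers library is sketched below; the exact wiring in the authors’ implementation may differ.

```python
import copy
from transformers import BartModel

# BART-base: 6 encoder layers, 6 decoder layers, 768-d hidden states, 12 heads.
bart = BartModel.from_pretrained("facebook/bart-base")

shared_encoder = bart.get_encoder()                      # shared by both tasks
inference_decoder = bart.get_decoder()                   # causal events inference task
generation_decoder = copy.deepcopy(bart.get_decoder())   # story ending generation task
```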

Automatic metrics

We adopt the following automatic metrics to evaluate the performance of the models.

BLEU-n: BLEU-n measures the similarity between generated endings and the ground truths. It calculates the n-gram overlap between the generated story ending $\hat{Y}$ and the golden story ending Y. The higher the BLEU score, the closer the generated story ending is to the correct answer, and the better the model’s generation performance.

(19) $\mathrm{BLEU}\text{-}n = \frac{\sum_{ngram} \mathrm{Count}_{\mathrm{match}}(ngram)}{\sum_{ngram} \mathrm{Count}(ngram)}$

where Countmatch(ngram) is the number of n-grams that appear in both the generated story ending $\hat{Y}$ and the golden story ending Y, and Count(ngram) is the number of n-grams in the golden story ending Y. In our experiments, we evaluate the results with n = 1, 2, 3, 4, reported as BLEU-1, BLEU-2, BLEU-3, and BLEU-4.
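An illustrative implementation of the overlap ratio in Eq (19) (not the full corpus-level BLEU with brevity penalty used by standard toolkits) might look like this:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_n(generated, reference, n):
    """Matched n-grams over the number of n-grams in the golden ending (Eq 19)."""
    gen_counts = Counter(ngrams(generated.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    matched = sum(min(count, gen_counts[gram]) for gram, count in ref_counts.items())
    total = sum(ref_counts.values())
    return matched / total if total else 0.0

print(bleu_n("jen was rescued by the lifeguard", "jen was rescued safely", 1))  # 0.75
```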

DISTINCT-n: DISTINCT-n measures the diversity of the generated story endings. It calculates the ratio of distinct n-grams over all generated story endings. The higher the DISTINCT score, the more diverse the generated results.

(20) $\mathrm{DISTINCT}\text{-}n = \frac{\mathrm{Count}_{\mathrm{distinct}}(ngram)}{\mathrm{Count}(ngram)}$

where Countdistinct(ngram) is the number of distinct n-grams in all generated story endings and Count(ngram) is the total number of n-grams in all generated story endings. In our experiments, we evaluate the results with n = 1, 2, 3, 4, reported as DISTINCT-1, DISTINCT-2, DISTINCT-3, and DISTINCT-4.
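A correspondingly simple sketch of DISTINCT-n (Eq 20), computed over the whole set of generated endings:

```python
def distinct_n(endings, n):
    """Ratio of distinct n-grams to total n-grams across all generated endings (Eq 20)."""
    all_ngrams = []
    for ending in endings:
        tokens = ending.split()
        all_ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(all_ngrams)) / len(all_ngrams) if all_ngrams else 0.0

print(distinct_n(["jen was rescued", "jen was happy"], 2))  # 0.75
```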

In the test step, we use the checkpoint with the best BLEU-1 on the development set as the final model to generate endings on the test set. In addition, we also conduct a human evaluation to investigate the model’s ability.

Baselines

We compare our model with several promising baselines:

Seq2Seq [7]: A simple encoder-decoder model based on long short-term memory (LSTM) with an attention mechanism.

IE+GA [5]: IE+GA uses an incremental encoder to model the story context from different sentences. After that, a multi-source attention mechanism is applied to gather the information from the story and the ConceptNet.

T-CVAE [16]: A conditional variational autoencoder based on the Transformer for story ending generation, which uses a latent variable to learn the distribution of stories.

GPT-2 [1]: We fine-tune a GPT-2 model on the ROCStories dataset to generate story endings.

GRF [7]: A generation model that equips pre-trained models with dynamic multi-hop reasoning over an external commonsense knowledge graph to generate the result.

Result

The results of automatic metrics are summarized in Table 3.

According to Table 3, we can observe the following:

  • The results of our model show an improvement over the baseline models on diverse metrics. It indicates that the commonsense knowledge of events contributes to the story ending generation. The model obtains the commonsense reasoning ability through joint training and infers causal commonsense events to improve the story ending generation. It allows the generation model to understand the implicit commonsense knowledge behind the story and generate more reasonable endings.
  • Comparing the results of CEG and GRF, our model outperforms GRF on BLEU-1 to BLEU-4 by 0.8, 0.7, 0.6, and 0.5, which indicates that event-level commonsense knowledge is more helpful to story ending generation. Causal events of the story context can predict the story’s development and lead to a logical ending. In addition, causal events help CEG perform better on sentence diversity, with DISTINCT-3 and DISTINCT-4 higher by 3.0 and 5.1, respectively. At the same time, GRF searches for related words on ConceptNet to enrich the story context, which leads to better lexical diversity and DISTINCT-1 and DISTINCT-2 scores higher by 2.0 and 1.6.
  • We find that pre-trained language models improve the diversity of the generated results by comparing the pre-trained language models (GPT-2, GRF, CEG) with traditional language models (Seq2Seq, IE+GA, T-CVAE). The DISTINCT-1 and DISTINCT-2 scores of GPT-2 are higher than those of T-CVAE by 4.8 and 10.7. However, a pre-trained model only uses surface semantic information and lacks reasoning ability. It generates the story ending from related information in the dataset, which may cause errors in unseen scenarios. To avoid these problems, we incorporate commonsense knowledge into the model to enhance its reasoning ability, which improves BLEU-1 by 1.8 compared with GPT-2.

Human evaluation

We conduct a human evaluation of the generated sentences based on reasonableness and fluency; both metrics are on a 1-3 Likert scale, where a higher score indicates that the model generates more reasonable and fluent endings. Fluency measures whether the generated sentences are coherent and fluent; annotators evaluate the grammar and readability of the ending. Reasonableness measures whether the generated sentences are suitable for the story context; annotators focus on whether the story ending is appropriate given the story context. We randomly sample 200 generated results from CEG and GRF for the human evaluation. The results are summarized in Table 4.

The results show that our model performs better than GRF on reasonableness by 0.1 and on fluency by 0.03, which demonstrates our model’s reasoning capability. GRF searches for words that are merely neighbors of the story keywords in ConceptNet; these words may not fit the story context, leading to lower reasonableness and fluency. In contrast, our model infers different causal events, which not only improve reasonableness but also provide sentence-level refinement to generate a more fluent story ending.

Ablation study

In this part, we evaluate the influence of the different relation types on the generation model. We remove the different relation types from CEG and compare the results of the ablated variants with the full model. The results are summarized in Table 5.

Comparing the results of the different experiments, we find that all relation types help the model perform better. CEG infuses different types of causal events to comprehensively understand the story and generate a more reasonable ending based on the context, and the BLEU scores decrease when any relation type is removed. In addition, we find that the BLEU-1 and BLEU-2 of CEG w/o emotion decline the most among all ablation settings. We analyzed the dataset and found that emotional events run through the main storyline. Therefore, it is important for the model to reason about causal events concerning emotion to capture the story’s clues and then generate a reasonable ending. With the help of emotional knowledge, the model can more effectively grasp the main storyline and utilize the other types of knowledge.

Furthermore, we conduct experiments to evaluate the effect of joint training. We use sentences of causal events as external knowledge instead of hidden states to enhance the story ending generation (CEG w/o joint). In this way, the loss of the generation model is detached from the inference model.

Table 6 shows that the performance of CEG w/o joint decreases by 1.9 on BLEU-1 and 1.7 on BLEU-2. Without joint training, there is a gap between causal events inference and story ending generation: the focus of the causal events is scattered, which disrupts the generation model and leads to worse results. Joint training allows the model to focus on the events required to produce a reasonable story ending.

Case study

Table 7 provides examples from ROCStories and the results generated by CEG and GRF. Combining the endings generated by the two models with the story context, we find that the endings of GRF contain some logical errors. In the first case, the story context describes an accident during a trip to the beach, but GRF only captures a vague development at the word level and ignores the causal information between the sentences. Meanwhile, CEG infers that “Jen needs help” or “Jen wants safety” from the second sentence and “Jen is rescued” from the fourth sentence. Therefore, CEG can utilize the long-distance events and understand the story’s main idea, making the result more related to the context.

Furthermore, in the second case, GRF produces an ending that does not match the story context and ignores the story’s logic. In contrast to GRF, our model yields the inference sentences “Jill feel(s) prepared” and “Jill possess(es) a real estate license”. We also show some inference sentences in Table 8. We find that the model infers many related events from the story context, which are useful for the generation.

Conclusion

In this paper, we propose CEG, a joint model that infers the causal events behind the story to enhance story ending generation. We first use the commonsense events inference model to convert GLUCOSE into a dynamic generative model and discover commonsense events unseen in the original knowledge base. Then we propose a joint model trained on the causal events inference task and the story ending generation task. In the causal events inference task, the model reasons about the causal events behind each story sentence through different relations to provide long-distance commonsense knowledge, which helps the model better understand the story and generate more reasonable story endings. The joint training and parameter sharing make the causal events more related to the story ending. The experiments on the ROCStories dataset show that our model outperforms the best baseline by 0.8 on BLEU-1 and 3.0 on DISTINCT-3, which indicates that our results are more consistent with commonsense and more informative. Our model also performs better on reasonableness by 0.1 and on fluency by 0.03 in the human evaluation, which shows that it generates more reasonable and fluent endings. In addition, ablation studies demonstrate the effectiveness of the commonsense events inference and the joint training in our model.

Our model lacks interpretability in knowledge selection and covers a limited set of commonsense relations. In the future, we plan to improve interpretability with explicit knowledge selection mechanisms. In addition, we will enhance the reasoning ability by combining different commonsense knowledge bases, which can provide more extensive information and knowledge.

Acknowledgments

We thank the reviewers for their time and effort in reviewing our work. Their helpful comments and suggestions improved the quality of our work. In addition, special thanks to our colleagues in the lab for their help and support.

References

  1. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, et al. Language models are unsupervised multitask learners. OpenAI blog. 2019;1(8):9.
  2. Devlin J, Chang M, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019. p. 4171–4186.
  3. Speer R, Chin J, Havasi C. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In: Singh SP, Markovitch S, editors. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. AAAI Press; 2017. p. 4444–4451. Available from: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14972.
  4. Chen J, Chen J, Yu Z. Incorporating Structured Commonsense Knowledge in Story Completion. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27—February 1, 2019. AAAI Press; 2019. p. 6244–6251.
  5. Guan J, Wang Y, Huang M. Story Ending Generation with Incremental Encoding and Commonsense Knowledge. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27—February 1, 2019. AAAI Press; 2019. p. 6473–6480.
  6. Zhang H, Liu Z, Xiong C, Liu Z. Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs. In: Jurafsky D, Chai J, Schluter N, Tetreault JR, editors. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. Association for Computational Linguistics; 2020. p. 2031–2043.
  7. Ji H, Ke P, Huang S, Wei F, Zhu X, Huang M. Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph. In: Webber B, Cohn T, He Y, Liu Y, editors. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020. Association for Computational Linguistics; 2020. p. 725–736.
  8. Sap M, Bras RL, Allaway E, Bhagavatula C, Lourie N, Rashkin H, et al. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27—February 1, 2019. AAAI Press; 2019. p. 3027–3035.
  9. Mostafazadeh N, Kalyanpur A, Moon L, Buchanan DW, Berkowitz L, Biran O, et al. GLUCOSE: GeneraLized and COntextualized Story Explanations. In: Webber B, Cohn T, He Y, Liu Y, editors. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020. Association for Computational Linguistics; 2020. p. 4569–4586.
  10. Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In: Jurafsky D, Chai J, Schluter N, Tetreault JR, editors. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. Association for Computational Linguistics; 2020. p. 7871–7880.
  11. Mostafazadeh N, Chambers N, He X, Parikh D, Batra D, Vanderwende L, et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories. In: Knight K, Nenkova A, Rambow O, editors. NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016. The Association for Computational Linguistics; 2016. p. 839–849.
  12. Guan J, Huang F, Zhao Z, Zhu X, Huang M. A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation. Transactions of the Association for Computational Linguistics. 2020;8:93–108.
  13. Xu P, Patwary M, Shoeybi M, Puri R, Fung P, Anandkumar A, et al. MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics; 2020. p. 2831–2845. Available from: https://aclanthology.org/2020.emnlp-main.226.
  14. Xie Z, Lau JH, Cohn T. Exploring Story Generation with Multi-task Objectives in Variational Autoencoders. In: Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association. Online: Australasian Language Technology Association; 2021. p. 97–106. Available from: https://aclanthology.org/2021.alta-1.10.
  15. Tang C, Lin C, Huang H, Guerin F, Zhang Z. EtriCA: Event-Triggered Context-Aware Story Generation Augmented by Cross Attention. In: Goldberg Y, Kozareva Z, Zhang Y, editors. Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022. Association for Computational Linguistics; 2022. p. 5504–5518. Available from: https://aclanthology.org/2022.findings-emnlp.403.
  16. Wang T, Wan X. T-CVAE: Transformer-Based Conditioned Variational Autoencoder for Story Completion. In: Kraus S, editor. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. ijcai.org; 2019. p. 5233–5239.
  17. Li Z, Ding X, Liu T. Generating Reasonable and Diversified Story Ending Using Sequence to Sequence Model with Adversarial Training. In: Bender EM, Derczynski L, Isabelle P, editors. Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018. Association for Computational Linguistics; 2018. p. 1033–1043. Available from: https://aclanthology.org/C18-1088/.
  18. Zhao Y, Liu L, Liu C, Yang R, Yu D. From Plots to Endings: A Reinforced Pointer Generator for Story Ending Generation. In: Zhang M, Ng V, Zhao D, Li S, Zan H, editors. Natural Language Processing and Chinese Computing—7th CCF International Conference, NLPCC 2018, Hohhot, China, August 26-30, 2018, Proceedings, Part I. vol. 11108 of Lecture Notes in Computer Science. Springer; 2018. p. 51–63.
  19. Gupta P, Bannihatti Kumar V, Bhutani M, Black AW. WriterForcing: Generating more interesting story endings. In: Proceedings of the Second Workshop on Storytelling. Florence, Italy: Association for Computational Linguistics; 2019. p. 117–126. Available from: https://aclanthology.org/W19-3413.
  20. Huang Q, Mo L, Li P, Cai Y, Liu Q, Wei J, et al. Story Ending Generation with Multi-Level Graph Convolutional Networks over Dependency Trees. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press; 2021. p. 13073–13081. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/17545.
  21. Peng N, Ghazvininejad M, May J, Knight K. Towards Controllable Story Generation. In: Proceedings of the First Workshop on Storytelling. New Orleans, Louisiana: Association for Computational Linguistics; 2018. p. 43–49. Available from: https://aclanthology.org/W18-1505.
  22. Bosselut A, Rashkin H, Sap M, Malaviya C, Celikyilmaz A, Choi Y. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. In: Korhonen A, Traum DR, Màrquez L, editors. Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics; 2019. p. 4762–4779.
  23. Zellers R, Holtzman A, Bisk Y, Farhadi A, Choi Y. HellaSwag: Can a Machine Really Finish Your Sentence? In: Korhonen A, Traum DR, Màrquez L, editors. Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics; 2019. p. 4791–4800.
  24. Zhou H, Young T, Huang M, Zhao H, Xu J, Zhu X. Commonsense Knowledge Aware Conversation Generation with Graph Attention. In: Lang J, editor. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. ijcai.org; 2018. p. 4623–4629.
  25. Bhagavatula C, Bras RL, Malaviya C, Sakaguchi K, Holtzman A, Rashkin H, et al. Abductive Commonsense Reasoning. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net; 2020. Available from: https://openreview.net/forum?id=Byg1v1HKDB.
  26. Ammanabrolu P, Cheung W, Broniec W, Riedl MO. Automated Storytelling via Causal, Commonsense Plot Ordering. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press; 2021. p. 5859–5867. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/16733.
  27. Cho K, van Merrienboer B, Gülçehre Ç, Bahdanau D, Bougares F, Schwenk H, et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In: Moschitti A, Pang B, Daelemans W, editors. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. ACL; 2014. p. 1724–1734.
  28. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All you Need. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, et al., editors. Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA; 2017. p. 5998–6008. Available from: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.