




AligNarr: Aligning Narratives of Different Length for Movie Summarization


Abouhamra,  Mostafa
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;


Abouhamra, M. (2019). AligNarr: Aligning Narratives of Different Length for Movie Summarization. Master Thesis, Universität des Saarlandes, Saarbrücken.

Cite as: http://hdl.handle.net/21.11116/0000-0004-5836-D
Automatic text alignment is an important problem in natural language processing, since it can be used to create the data needed to train different language models. Most research on automatic summarization revolves around news articles or scientific papers, which are relatively short texts with a simple and clear structure. The bigger the difference in size between the summary and the original text, the harder the problem becomes, since important information is sparser and harder to identify. Therefore, creating datasets from larger texts can help improve automatic summarization. In this project, we try to develop an algorithm that can automatically create a dataset for abstractive automatic summarization of larger narrative texts such as movie scripts. To this end, we chose sentences as summary text units and scenes as script text units, and developed an algorithm that uses some of the latest natural language processing techniques to align scenes and sentences based on the similarity of their meanings. Solving this alignment problem can provide important information about how to evaluate the meaning of a text, which can help us create better abstractive summarization models. Our method uses different similarity scoring techniques (embedding similarity, word inclusion and entity inclusion) to align script scenes and summary sentences, and achieved an F1 score of 0.39. Analyzing our results showed that the bigger the difference in the number of text units being aligned, the more difficult the alignment problem is. We also critiqued our own similarity scoring techniques and the different alignment algorithms, which are based on integer linear programming and local optimization, showed their limitations, and discussed ideas to improve them.
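To make the scoring idea in the abstract concrete, here is a minimal sketch of how three similarity signals could be combined to align summary sentences with script scenes and how the result could be scored with F1. It is not the thesis implementation. Everything in it is an assumption made for illustration: a bag-of-words cosine stands in for embedding similarity, capitalized tokens stand in for named entities, the three scores are averaged with equal weights, and each sentence is simply assigned to its best-scoring scene instead of using integer linear programming or local optimization. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into alphabetic word tokens (case preserved)."""
    return re.findall(r"[A-Za-z']+", text)


def entities(text):
    """Crude stand-in for named entities: every capitalized token (assumption)."""
    return [t for t in tokens(text) if t[0].isupper()]


def cosine(a, b):
    """Bag-of-words cosine, a stand-in for embedding similarity (assumption)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
        math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def inclusion(sentence_items, scene_items):
    """Fraction of distinct sentence items that also occur in the scene."""
    s = set(sentence_items)
    return len(s & set(scene_items)) / len(s) if s else 0.0


def score(sentence, scene):
    """Equal-weight average of the three similarity signals (assumed weights)."""
    st = [t.lower() for t in tokens(sentence)]
    sc = [t.lower() for t in tokens(scene)]
    return (cosine(st, sc)
            + inclusion(st, sc)
            + inclusion(entities(sentence), entities(scene))) / 3


def align(sentences, scenes):
    """Assign each sentence index to the index of its best-scoring scene."""
    return {
        (i, max(range(len(scenes)), key=lambda j: score(sent, scenes[j])))
        for i, sent in enumerate(sentences)
    }


def f1(predicted, gold):
    """F1 over sets of (sentence index, scene index) alignment pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented toy data, for illustration only.
scenes = [
    "Rick meets Ilsa at the cafe in Casablanca.",
    "The plane leaves the airport at night while Rick watches.",
]
sentences = [
    "Rick sees Ilsa again at his cafe.",
    "Rick watches the plane leave.",
]
predicted = align(sentences, scenes)
print(sorted(predicted), f1(predicted, {(0, 0), (1, 1)}))
```

Assigning each sentence independently to its best scene ignores the order of the narrative and lets several sentences compete for the same scene without any global constraint. A global formulation such as the integer linear programming approach mentioned in the abstract would be one way to address this.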