Technical Reports

HPL-2011-145

Automatic Text Summarization and Small-World Networks

Balinsky, Helen; Balinsky, Alexander; Simske, Steven
HP Laboratories

Keyword(s): text summarization; small-world network; unusual behavior detection; Helmholtz principle

Abstract: Automatic text summarization is an important and challenging problem. Over the years, the amount of text available electronically has grown exponentially. This growth has created a huge demand for automatic methods and tools for text summarization. We can think of automatic summarization as a type of information compression. To achieve such compression, better modelling and understanding of document structures and internal relations is required. In this article, we develop a novel approach to extractive text summarization by modelling texts and documents as small-world networks. Based on our recent work on the detection of unusual behavior in text, we model a document as a one-parameter family of graphs with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle. We demonstrate that for some range of the parameter, the resulting graph becomes a small-world network. Such a remarkable structure opens the possibility of applying many measures and tools from social network theory to the problem of extracting the most important sentences and structures from text documents. We also hope that documents will prove to be a new and rich source of examples of complex networks.
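
The graph construction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the Helmholtz-based edge rule, so the sketch substitutes a much simpler stand-in: two sentences are linked when they share at least `threshold` distinct words, with `threshold` playing the role of the single parameter. The function names and this edge rule are assumptions made for illustration, not the authors' method. The two measures computed are the ones usually used to check for small-world structure: a high clustering coefficient together with a short average path length.

```python
from collections import deque
from itertools import combinations


def build_graph(sentences, threshold):
    """Vertices are sentence indices. ASSUMPTION: an edge joins two sentences
    sharing at least `threshold` distinct words (a stand-in for the
    Helmholtz-based rule, which the abstract does not define)."""
    words = [set(s.lower().split()) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if len(words[i] & words[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj):
    """Mean local clustering coefficient; vertices of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of this vertex.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def average_path_length(adj):
    """Mean shortest-path length over connected ordered pairs, via BFS."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

Sweeping `threshold` over a range of values and recording both measures for each resulting graph gives a rough way to look for the parameter range in which the document graph behaves like a small-world network.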

10 Pages

Additional Publication Information: To be presented at DocEng 2011: 11th ACM Symposium on Document Engineering.

External Posting Date: September 6, 2011 [Abstract]. Approved for External Publication
Internal Posting Date: September 6, 2011 [Fulltext]
