HP Labs automates publishing with content analysis and composition technology




Few industries have been transformed as much by digital technology as publishing. The ubiquity of Internet-connected computers, smartphones and tablet devices like the Slate and iPad has forever changed the way text, photos and multimedia are created, published and consumed. This transformation continues with the emergence of social media and content creation technologies, which offer tools and platforms for individuals to self-publish via the Internet.

Recognizing the significance of this change in the industry, researchers at HP's Printing and Content Delivery Lab (PCDL) are developing a variety of automated publishing applications to help publishers, organizations and everyday people make the most of this transformation.

Research Director Qian Lin.

"Our research focuses primarily on content creation and consumption technologies," says Qian Lin, who leads the Automated Publishing program in PCDL. "Our goal is to help publishers optimize content creation and distribution while enabling a better and more interactive content experience for consumers."

HP researchers exploit major trends in publishing

HP research projects focus on three major publishing trends:

  • the need for publishers to optimize text and visual-based Web content for different form factors: printed pages and the smaller displays of smartphones and hand-held devices
  • the desire among individuals to "tell their own stories," using mobile phone cameras and social media platforms like Facebook to share photos, for example
  • the growth of micropublishing, whereby individuals and organizations leverage easy-to-use tools to produce professional-quality print publications and online magazines

Although applications developed by HP's Automated Publishing research teams tackle discrete problems, the underlying technology is the same.

Content analysis and composition technology

"Two core technologies form the foundation of our content creation and consumption applications. One conducts content analysis using machine learning and computer vision technology, and the second does content synthesis and composition," says Lin. "By using different combinations of these technologies, we can enable different applications for different purposes."

Machine learning is a method of training computers to understand data sets, so they can recognize and classify data according to various patterns. For example, HP's Automated Publishing researchers used machine learning to train computers so they can analyze elements on a Web page and classify them accordingly.

Computer vision technology is similar, but instead of analyzing patterns in data, it looks for actual features in an image. To detect a face in an image, for instance, computer vision analysis works by applying a series of filters to the image and classifying the result, thus detecting the presence of a face in the image.
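The filter-then-classify idea can be illustrated with a toy cascade. This is only a sketch and not HP's detector: the two Haar-like filters, their thresholds and all function names are assumptions chosen for illustration, and a real detector would use many more filters learned from training data.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (dark) to 255 (bright)


def region_mean(img: Image, top: int, left: int, height: int, width: int) -> float:
    """Average brightness of a rectangular region."""
    total = sum(img[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width)


def cheeks_minus_eyes(img: Image) -> float:
    """Hypothetical filter 1: lower half brightness minus upper half brightness."""
    h, w = len(img), len(img[0])
    half = h // 2
    return region_mean(img, half, 0, h - half, w) - region_mean(img, 0, 0, half, w)


def bridge_minus_eyes(img: Image) -> float:
    """Hypothetical filter 2: centre of the upper half minus its two sides."""
    h, w = len(img), len(img[0])
    half, third = h // 2, w // 3
    centre = region_mean(img, 0, third, half, w - 2 * third)
    sides = (region_mean(img, 0, 0, half, third)
             + region_mean(img, 0, w - third, half, third)) / 2
    return centre - sides


# Each stage pairs a filter with an assumed minimum response.
CASCADE: List[Tuple[Callable[[Image], float], float]] = [
    (cheeks_minus_eyes, 40.0),
    (bridge_minus_eyes, 20.0),
]


def looks_like_face(window: Image) -> bool:
    """Apply the filters in order; reject as soon as one stage fails."""
    for feature, threshold in CASCADE:
        if feature(window) < threshold:
            return False
    return True
```

Because a window is rejected at the first failing stage, most non-face regions cost only one or two filter evaluations, which is the usual motivation for arranging the filters as a cascade.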

The content synthesis and composition technology developed by HP's Automated Publishing researchers solves content optimization problems, such as adapting Web pages for mobile device screens. Its algorithms use mathematical models to optimize static layouts, such as an 8 ½-by-11-inch piece of paper or a smartphone screen. By resizing an image or white space, or removing unnecessary content elements, the algorithms help to ensure a quality viewing experience for users.
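A rough sketch of the "remove, then resize" idea follows. It uses a simple greedy heuristic as a stand-in for the mathematical models mentioned above, which the article does not describe; the `Block` type, the `fit_to_page` function and the `min_scale` limit are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Block:
    name: str
    kind: str            # "text", "image" or "space"
    height: float        # height in points at the target width
    optional: bool = False


def fit_to_page(blocks: List[Block], page_height: float,
                min_scale: float = 0.5) -> List[Block]:
    """Greedy stand-in for a layout optimizer (an assumption, not HP's model)."""
    keep = list(range(len(blocks)))

    def total() -> float:
        return sum(blocks[i].height for i in keep)

    # Step 1: drop optional elements, last first, while the page overflows.
    for i in reversed(range(len(blocks))):
        if total() <= page_height:
            break
        if blocks[i].optional:
            keep.remove(i)

    result = [blocks[i] for i in keep]

    # Step 2: shrink images and white space by one shared factor.
    overflow = sum(b.height for b in result) - page_height
    flexible = sum(b.height for b in result if b.kind in ("image", "space"))
    if overflow > 0 and flexible > 0:
        # If min_scale is reached, the page may still overflow.
        scale = max(min_scale, 1 - overflow / flexible)
        result = [replace(b, height=b.height * scale)
                  if b.kind in ("image", "space") else b
                  for b in result]
    return result
```

Text blocks are never scaled in this sketch, on the assumption that shrinking type hurts readability more than shrinking a photo or a margin does.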

"Our content synthesis and composition technology forms a core foundation for automated publishing," says Lin. "It can be used for a variety of applications, from personal and professional publishing, to content creation for small business communications, to configuring Web content for tablets and printers."

The applications

HP's researchers are working on several promising content synthesis and composition technology-based applications. Three examples of such work are optimizing Web pages for printing, optimizing them for viewing on hand-held devices, and creating photo albums on the fly.

Smart Print

Anyone who's bemoaned their printer for wasting paper and ink by printing pages with useless text, such as navigation links and indecipherable banner ads, or pages with only a few words, will appreciate the HP Smart Print application.

Using content-analysis technology, HP SmartPrint allows users to print only the information they want on a Web page. SmartPrint automatically analyzes Web pages to determine which elements are useful and should be included for printing. Once SmartPrint selects an area to print, users can then modify the print area to make sure only the content they want to see on paper is actually printed.

"SmartPrint is enabled by an algorithm that does a combination of DOM (document object model)-tree and visual analysis, which looks at the HTML file to determine the elements a user would most likely want on a printed page," says Jerry Liu, project lead for HP Smart Print, and research manager, HP PCDL.
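To make the DOM-analysis half of that description concrete, here is a minimal sketch that walks an HTML document and keeps only blocks that look like body text. The link-density heuristic, the thresholds and the names `BlockScanner` and `printable_blocks` are assumptions for illustration; the article does not say which features SmartPrint actually uses, and the visual-analysis half is not modeled here.

```python
from html.parser import HTMLParser
from typing import List, Tuple

BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3"}


class BlockScanner(HTMLParser):
    """Collects (text, link_chars) for each block-level element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, int]] = []
        self._text: List[str] = []
        self._link_chars = 0
        self._in_link = 0

    def end_block(self) -> None:
        """Store the text gathered so far as one block and start a new one."""
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.end_block()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.end_block()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def printable_blocks(html: str, max_link_density: float = 0.3,
                     min_chars: int = 25) -> List[str]:
    """Keep blocks that are long enough and are not mostly link text."""
    scanner = BlockScanner()
    scanner.feed(html)
    scanner.close()
    scanner.end_block()
    return [text for text, link_chars in scanner.blocks
            if len(text) >= min_chars
            and link_chars / len(text) <= max_link_density]
```

Navigation bars and banner ads tend to be short or almost entirely link text, so even this crude filter removes many of them; a production system would combine it with rendered-layout cues.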

HP Smart Print not only improves the reading experience by optimizing printed content, but it also reduces paper and ink use. Another major innovation that Liu and his team achieved with SmartPrint is the application's scalability. Although similar technologies can filter out extraneous content when printing a handful of pages, HP SmartPrint prints 1,000 Web pages as easily as it does five.

The application is available for download as HP Smart Print. It's embedded in the HP Bing search toolbar.

Article Clipper

Similar to HP SmartPrint, Article Clipper uses content-analysis technology to improve readers' interaction with Web pages. It differs from SmartPrint, however, in that Article Clipper optimizes Web page content for hand-held devices and smartphones, rather than printed paper.

Because smartphones and hand-held devices have tiny screens compared to laptops and personal computers, they often display only portions of a Web page, frustrating publishers and readers alike.

"Article Clipper solves this problem by eliminating excess elements and clutter on Web pages, like navigation links and ads, and it can consolidate content from multiple Web pages," says Jian Fan, project lead for Article Clipper, and researcher, HP PCDL. "It makes reading Web content on mobile platforms much easier."

Article Clipper is used primarily for optimizing RSS feeds, which readers receive in a standardized format: headline, intro paragraph and link to a Web page. Article Clipper improves the readability of RSS feeds by pulling content from their associated Web pages and combining it with the titles and intro paragraphs, eliminating the need to click through to Web pages to read the content. If a user receives multiple RSS feeds, Article Clipper will follow the links of all the feeds and assemble the Web page content into a multipage magazine.
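The feed-to-magazine step could be sketched roughly as follows, assuming RSS 2.0 input. The `build_magazine` function and its `fetch_article` callback are hypothetical; the callback stands in for the download and content-analysis step that extracts the article body from a linked Web page.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def build_magazine(feeds: List[str],
                   fetch_article: Callable[[str], str]) -> List[Dict[str, str]]:
    """Turn RSS 2.0 documents into a list of magazine pages.

    `fetch_article` is a caller-supplied function (an assumption of this
    sketch) that returns the cleaned article text for a URL.
    """
    pages: List[Dict[str, str]] = []
    for feed_xml in feeds:
        root = ET.fromstring(feed_xml)
        for item in root.iter("item"):
            link = (item.findtext("link") or "").strip()
            pages.append({
                "title": (item.findtext("title") or "").strip(),
                "intro": (item.findtext("description") or "").strip(),
                "body": fetch_article(link) if link else "",
                "source": link,
            })
    return pages
```

Passing the fetcher in as a parameter keeps the sketch free of network code and makes it easy to swap in a cached or offline source.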

Article Clipper aggregates content into HTML pages as well as PDFs. It is available as the Net News app on HP's Slate, which comes with the HP Photosmart eStation All-in-One Printer series.

HP Photobook

Photobook authoring application on an iPad

With cameras now standard on most mobile phones, people are snapping more photographs than ever. To share them, many users upload photos to Flickr or Facebook and then tag and organize the pictures in separate albums. This requires a good Internet connection, and it's made easier with a keyboard and mouse.

Imagine if you could do all that on a smartphone or tablet computer, without connecting to a Web-based photo service. You can with HP Photobook.

"We built HP Photobook as a touch-based application that lets users create, edit and view photo albums on their hand-held devices, without having to actively upload them to Web-based photo services," says Jun Xiao, project lead for HP Photobook, and researcher, HP PCDL.

Although creating and editing a photo album on a computer seems simple, it requires a lot of computing muscle. This makes it difficult to do on hand-held devices, which have less computing power, different input and output technologies, smaller screens, and no mouse.

HP Photobook solves this problem by using image-analysis technology to automate such user tasks as photo selection, grouping, cropping, editing and layout. Instead of using a mouse, HP Photobook users can create and edit photo albums by tapping and dragging fingers across a screen. "HP Photobook enables a much better photo scrapbook experience," says Xiao.
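As an illustration of automated grouping only, the sketch below splits photos into album pages wherever there is a long pause between shots. It groups by capture time rather than by image content, so it is a simpler swapped-in technique and not the image analysis the article refers to; the `group_by_time` name and the two-hour default gap are assumptions.

```python
from datetime import datetime, timedelta
from typing import List


def group_by_time(timestamps: List[datetime],
                  max_gap: timedelta = timedelta(hours=2)) -> List[List[datetime]]:
    """Start a new group whenever two consecutive shots are more than max_gap apart."""
    groups: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= max_gap:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups
```

Each resulting group could then be handed to a layout step as one album page or spread.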

HP Photobook is available as a download in the iTunes App Store.