Technical Reports

HPL-2010-159

OfCourse: Web Content Discovery, Classification and Information Extraction for Online Course Materials

Xiong, Yuhong; Luo, Ping; Zhao, Yong; Lin, Fen; Feng, Shicong; Zhou, Baoyao; Zheng, Liwei
HP Laboratories

Keyword(s): Vertical search, online courses, Web classification, Web information extraction

Abstract: In this paper we present OfCourse, a vertical search engine for online course materials. These materials have two characteristics: they are scattered sparsely across university Web sites, and they are created by teachers using widely varying HTML templates and layouts. These characteristics pose challenges for Web classification (identifying the course materials) and for Web information extraction (extracting course metadata, such as course title, time, and ID, from the identified course homepages). Here, we describe our proposed method for tackling these challenges and the features of the system. OfCourse, which contains over 60,000 courses from the top 50 universities in the US, is currently available for public access.

2 Pages

Additional Publication Information: Published in the 18th ACM Conference on Information and Knowledge Management, Hong Kong (demo paper), November 2-6, 2009

External Posting Date: October 21, 2010 [Fulltext]. Approved for External Publication
Internal Posting Date: October 21, 2010 [Fulltext]
