HP Labs
Selected Publications

2008
Language Feature Mining for Music Emotion Classification via Supervised Learning from Lyrics
Hui He, Jianming Jin, Yuhong Xiong, Bo Chen, Wu Sun, Ling Zhao

Keywords:

Emotion classification, n-grams, Naïve Bayes, Maximum Entropy, Support Vector Machine

Abstract:

In recent years, efficient and intelligent music information retrieval has become very important. One essential task in this field is music emotion classification by learning from lyrics. This problem differs from traditional text categorization in that more linguistic or semantic information is required for better emotion analysis. Therefore, this paper focuses on how to extract useful and meaningful language features. We investigate three kinds of preprocessing methods and a series of n-grams of different orders under the well-known n-gram language model framework to extract more semantic features. We then employ three supervised learning methods (Naïve Bayes, maximum entropy classification, and support vector machines) to examine the classification performance. Experimental results show that the feature extraction methods improve music emotion classification accuracy. Maximum entropy classification with unigram+bigram+trigram features achieves the best accuracy and is well suited to music emotion classification.
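
The unigram+bigram+trigram feature set the abstract refers to can be sketched in a few lines; tokenization of the lyrics and the downstream classifiers (Naïve Bayes, MaxEnt, SVM) are omitted here, so this is only the feature extraction step:

```python
def ngram_features(tokens, max_n=3):
    """Collect all n-grams up to order max_n from a token sequence.

    With max_n=3 this yields the unigram+bigram+trigram feature set;
    a classifier then works on counts of these features.
    """
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats
```

For example, `ngram_features(["love", "me", "do"])` yields the three unigrams, the two bigrams, and the single trigram of that line.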


Information-Theoretic Distance Measures for Clustering Validation: Generalization and Normalization
Ping Luo, Hui Xiong, Guoxing Zhan, Junjie Wu and Zhongzhi Shi

Keywords:

Clustering Validation, Entropy, Information-Theoretic Distance Measures, K-means Clustering

Abstract:

This paper studies the generalization and normalization issues of information-theoretic distance measures for clustering validation. Along this line, we first introduce a uniform representation of distance measures, defined as quasi-distance, which is induced from a general form of conditional entropy. The quasi-distance possesses three properties: symmetry, the triangle law, and the minimum reachable. These properties ensure that the quasi-distance naturally lends itself as an external measure for clustering validation. In addition, we observe that the ranges of the distance measures differ when they are applied to clustering validation on different data sets. Therefore, when comparing the performance of clustering algorithms across data sets, distance normalization is required to equalize the ranges of the distance measures. A critical challenge for distance normalization is to obtain the range of a distance measure for a given data set. To that end, we theoretically analyze the computation of the maximum value of a distance measure for a data set. Finally, we compare the performance of the partitional clustering algorithm K-means on various real-world data sets. The experiments show that the normalized distance measures perform better than the original distance measures when comparing clusterings of different data sets. Also, the normalized Shannon distance has the best performance among the four distance measures under study.
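
One concrete member of this family, the (unnormalized) Shannon distance, is the variation of information H(A|B) + H(B|A) between two clusterings. A minimal sketch follows; the paper's contribution, normalizing by the data-set-dependent maximum, is not reproduced here:

```python
import math
from collections import Counter

def shannon_distance(labels_a, labels_b):
    """Variation of information H(A|B) + H(B|A) between two clusterings,
    each given as a per-point list of cluster labels (natural log)."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), c in joint.items():
        p = c / n  # joint probability of the label pair (a, b)
        # -p * [log p(a,b)/p(a) + log p(a,b)/p(b)] summed gives H(B|A)+H(A|B)
        vi -= p * (math.log(p * n / pa[a]) + math.log(p * n / pb[b]))
    return vi
```

Identical clusterings get distance 0, and the measure is symmetric, illustrating two of the three quasi-distance properties directly.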


Inductive Transfer Learning for Unlabeled Target-domain via Hybrid Regularization
Fuzhen Zhuang, Ping Luo, Qing He, and Zhongzhi Shi

Keywords:

Transfer Learning, Inductive Learning, Transductive Learning, Hybrid Regularization

Abstract:

Recent years have witnessed an increasing interest in transfer learning. This paper deals with the classification problem in which the target domain, whose distribution differs from that of the source domain, is totally unlabeled, and aims to build an inductive model for unseen data. First, we analyze the problem of class-ratio drift in previous work on transductive transfer learning, and propose a normalization method that moves predictions toward the desired class ratio. Furthermore, we develop a hybrid regularization framework for inductive transfer learning. It considers three factors: the distribution geometry of the target domain via manifold regularization, the entropy of the prediction probabilities via entropy regularization, and the class prior via expectation regularization. This framework is used to adapt the inductive model learnt from the source domain to the target domain. Finally, experiments on real-world text data show the effectiveness of our inductive transfer learning method, which can also handle unseen test points.
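
The class-ratio normalization idea can be sketched as follows: rescale each predicted class's probability mass toward a target prior, then renormalize each example's distribution. This is a minimal illustration of countering class-ratio drift, not the paper's exact procedure:

```python
def normalize_class_ratio(probs, target_prior):
    """Move predicted probabilities toward a desired class ratio.

    probs: list of per-example probability rows, e.g. [[0.9, 0.1], ...]
    target_prior: desired class ratio, e.g. [0.5, 0.5]
    Each class column is rescaled by target/predicted mass (assumed
    nonzero for every class), then each row is renormalized to sum to 1.
    """
    n, k = len(probs), len(target_prior)
    col_mass = [sum(row[j] for row in probs) / n for j in range(k)]
    scaled = [[row[j] * target_prior[j] / col_mass[j] for j in range(k)]
              for row in probs]
    return [[v / sum(row) for v in row] for row in scaled]
```

If every example drifts toward one class, the rescaling pulls the aggregate prediction back to the stated prior while preserving valid per-example distributions.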


A Practical and Adaptive Framework for Super-Resolution
Heng Su, Liang Tang, Daniel Tretter, and Jie Zhou

Keywords:

Super-resolution, registration error, block based, de-block

Abstract:

In this paper a novel, practical, and adaptive framework for super-resolution is proposed. Existing super-resolution algorithms are limited by assumptions that fit particular imaging environments; hence they may fail when facing real, complex scenes. We propose to divide the target image into adaptively sized blocks and apply different conventional algorithms to parts of the image with different characteristics. The proposed framework is highly extensible and general: various super-resolution and single-frame image enhancement technologies can be adaptively assembled to improve the robustness and result quality of the SR operation. Experiments with real-life video show encouraging improvements.
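
The block-wise dispatch can be sketched with a toy per-block statistic. Here local variance stands in for the block characteristics, and the algorithm names are placeholders; the actual framework uses adaptive block sizes and richer criteria:

```python
def choose_block_algorithms(image, block, var_threshold=100.0):
    """Assign an SR algorithm to each fixed-size block of a grayscale
    image (a list of pixel rows), based on the block's intensity
    variance: textured blocks get a reconstruction-based method, flat
    blocks a cheap single-frame interpolation. Names are illustrative."""
    h, w = len(image), len(image[0])
    plan = {}
    for y in range(0, h, block):
        for x in range(0, w, block):
            vals = [image[i][j]
                    for i in range(y, min(y + block, h))
                    for j in range(x, min(x + block, w))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            plan[(y, x)] = ("reconstruction_sr" if var > var_threshold
                            else "interpolation")
    return plan
```

A flat region thus avoids the cost of full reconstruction, while edges and texture receive the stronger algorithm, which is the robustness argument the abstract makes.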


The Influence of Independent Scheduling Delay on Multi-Query Processing Based on Fork-Join
Yong Wang, Limei Jiao, Ying Liu, Huaiming Song

Keywords:

Multi-query processing, Fork-join model, coefficient of Scheduling Distance

Abstract:

Fork-join is a basic query processing model in shared-nothing parallel database systems. A query Q is decomposed into a number of sub-queries, each of which is processed independently on a processing element (PE); all the sub-query results are then "joined" and returned as Q's result. In this scheme, the query processing time of Q depends on the finish time of its last sub-query. Despite its importance, few works have studied the performance of multi-query processing under the fork-join model. In this paper, we present a model, called CSD (Coefficient of Scheduling Distance), to evaluate the influence of independent PE scheduling on the total completion time. All experiments are conducted on DBroker, a large-scale production system for network security management. The experimental results show that the CSD model is accurate and effective.
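
The timing behavior the CSD model analyzes can be simulated in a few lines. This sketch only shows why independent per-PE scheduling orders matter; the CSD coefficient itself is the paper's contribution and is not reproduced:

```python
def query_completion_times(pe_queues, subquery_time):
    """Fork-join multi-query timing sketch.

    pe_queues: one list per processing element, giving the order in
    which that PE runs its sub-queries (identified by query id).
    subquery_time: query id -> running time of its sub-query on a PE.
    A query finishes when its slowest sub-query, across all PEs, does.
    """
    finish = {}  # query id -> latest sub-query finish time
    for queue in pe_queues:
        t = 0.0
        for qid in queue:  # PE-local, independently chosen schedule
            t += subquery_time[qid]
            finish[qid] = max(finish.get(qid, 0.0), t)
    return finish
```

With two PEs running queries A and B in the same order, A completes early; if one PE reverses its order, A must wait behind B there, so its completion time jumps — exactly the scheduling-distance effect being modeled.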


DSpace Concept Linking - An Innovative Feature for Virtual Olympic Museum
Baoyao Zhou, Yuhong Xiong, Wei Liu, Xukun Shen, Yue Qi, Jiahui Wang, Yong Hu

Keywords:

Virtual Olympic Museum, Concept linking, Concept-tree

Abstract:

DSpace organizes all of its items using predefined communities and collections. Although such a hierarchical data model works well for organizing related content in DSpace, there is still room to improve users' browsing experience. DSpace concept linking aims to discover richer semantic relationships among DSpace resources and link them together across different communities and collections to further facilitate content navigation. The concept linking function has been implemented as an innovative feature of the Virtual Olympic Museum.
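
The cross-collection linking step can be sketched with an inverted index from concepts to items. How concepts are assigned to items (the concept-tree extraction) is assumed already done, and the data layout here is illustrative:

```python
def concept_links(items):
    """Link items that share a concept but live in different collections.

    items: item id -> (collection name, set of concept labels).
    Returns an undirected set of cross-collection item-id pairs.
    """
    by_concept = {}
    for item_id, (collection, concepts) in items.items():
        for c in concepts:
            by_concept.setdefault(c, []).append((item_id, collection))
    links = set()
    for members in by_concept.values():
        for i, (id_a, col_a) in enumerate(members):
            for id_b, col_b in members[i + 1:]:
                if col_a != col_b:  # only link across collections
                    links.add(tuple(sorted((id_a, id_b))))
    return links
```

Items in the same collection are already grouped by the hierarchy, so only cross-collection pairs add navigation value — hence the `col_a != col_b` filter.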


A Rule-based Metadata Extractor for Learning Materials
Yu Yang, Ming Zhang, Baoyao Zhou

Keywords:

Metadata extraction, Regular expression, Information refinement

Abstract:

Integrating all kinds of learning materials is becoming increasingly important for teachers and students who want to take advantage of online e-learning courses. As the key part of the whole Online Course Organization System, the metadata extraction function needs to be accurate enough when dealing with semi-structured documents, even incompact ones. We design and implement a Metadata Extractor that applies several rules ordered by priority, followed by an information refinement step that improves the final accuracy. When the domain changes, users only need to supply domain-specific rules, without modifying the program. Experiments show that our method performs very well on semi-structured documents, with an F-measure higher than 85%, which indicates that the method is quite practical.
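
The prioritized-rule idea can be sketched with regular expressions: rules are tried in priority order, and a lower-priority rule only fires if no higher-priority rule for the same field has matched. The field names and patterns below are illustrative assumptions, not the system's actual rule set:

```python
import re

# Ordered by priority; earlier rules for a field win. Patterns are
# hypothetical examples of domain-specific rules a user might supply.
RULES = [
    ("title", re.compile(r"^Course\s*Title:\s*(.+)$", re.M)),
    ("title", re.compile(r"^Title:\s*(.+)$", re.M)),
    ("instructor", re.compile(r"^Instructor:\s*(.+)$", re.M)),
]

def extract_metadata(text):
    """Apply prioritized regex rules to a semi-structured document."""
    meta = {}
    for field, pattern in RULES:
        if field in meta:
            continue  # a higher-priority rule already filled this field
        m = pattern.search(text)
        if m:
            meta[field] = m.group(1).strip()
    return meta
```

Swapping in a new domain means replacing `RULES`, which is the "input specific rules, without considering the program" property the abstract describes; the refinement step (e.g. cleaning extracted values) would follow extraction.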


2007
Building a Scalable Web Query System
Meichun Hsu and Yuhong Xiong

Keywords:

Focused Crawling, Web Page Classification, Metadata Extraction

Abstract:

Nowadays, the dominant way to find information on the web is through search. General search engines are very effective, but search phrases and results are unstructured, which limits a user's ability to further automate the processing of search results. In recent years, we have seen efforts to build systems that support more precise queries on the web for certain content verticals. We describe the general problems of building an extensible web query system and report some of our work in this area.


Object Extraction by Spatio-Temporal Assembling
Xiaoke Qin, Liang Tang, and Jie Zhou

Keywords:

Spatio-temporal assembling, Object extraction, Canal traffic surveillance

Abstract:

Among various algorithms for vision-based traffic monitoring, spatio-temporal (ST) slice analysis is attractive because it computes over a larger temporal scale. However, it is unsuitable for further pattern recognition, since the conventional ST slice does not preserve the spatial relationships of the original object image. In this paper, we propose a novel algorithm for accurate traffic object extraction. In contrast to previous ST algorithms that rely on one scan line per frame, we assemble the object from foreground strips obtained from each frame, together with carefully designed motion estimation. Thus, both spatial and temporal information is used more effectively. Applications in real canal traffic scenes show the advantages of our algorithm.


On Line Course Organization
Ming Zhang, Weichun Wang, Yi Zhou, Yu Yang, Yuhong Xiong, and Xiaoming Li

Keywords:

Focused Crawling, Metadata Extraction, Learning Object Management, Ontology

Abstract:

In order to help users access on-line materials with more specific questions, we build a learning portal named Fusion. First, we develop FusionCrawler, a focused crawler based on link classification, to download potential course pages. We then use a binary classifier to pick out the course pages. After the course pages are identified, we use FusionExtractor, a DOM-tree-based regular expression wrapper, to extract metadata. The metadata include course name, instructor information, course outline, and other relevant information, and they are stored in a database behind the portal. Experimental results show that our approach to organizing on-line courses based on focused crawling and metadata extraction is effective: FusionCrawler retrieves on average 40-50% more on-topic learning materials than a normal focused crawler, while FusionExtractor's average F1 is 85%. With metadata for more than 1,400 MIT OCW, 3,000 UIUC, and 1,000 WISC courses; 300 courses from GreatLearning with 3,000 Chinese course videos; and nearly 1,000 videos from the Internet Archive, the Fusion portal provides several kinds of search functions, such as quick search, advanced search, and semantic navigation browsing.
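
The link-classification stage of a focused crawler decides which outgoing links are worth fetching. FusionCrawler's classifier is learned; the keyword filter below is only a toy stand-in showing the role that stage plays in the pipeline:

```python
def classify_links(links, topic_terms, threshold=1):
    """Keep links whose URL or anchor text mentions enough topic terms.

    links: list of (url, anchor_text) pairs found on a crawled page.
    topic_terms: lowercase keywords describing the target vertical.
    A real focused crawler would replace this scoring with a trained
    link classifier, as FusionCrawler does.
    """
    keep = []
    for url, anchor in links:
        text = (url + " " + anchor).lower()
        score = sum(term in text for term in topic_terms)
        if score >= threshold:
            keep.append(url)
    return keep
```

The kept URLs feed the frontier; pages fetched from them then pass through the binary course-page classifier and, finally, the metadata extractor.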


DM-DSpace: Federating DSpace Based Digital Museums in China
Wei Liu, Yuhong Xiong, Baoyao Zhou, James Rutherford, Xukun Shen, Yue Qi, Xiaoyu Li, Weihua Huang, Shu Wang, Bailiang Chen

Keywords:

Digital Museum, DSpace, Federation

Abstract:

The China Digital Museum Project is exploring how to build a federation in which tens of heterogeneous knowledge repositories can share information easily and all digitized artifacts can be stored, managed, preserved, and disseminated. Ultimately, the project will lead to "virtual museums" formed by arranging digital assets by subject, regardless of physical location. These requirements motivated the idea of federating DSpace, which resulted in an extended DSpace named DM-DSpace.


2006
An Effective Approach for Periodic Web Personalization
Baoyao Zhou, Siu Cheung Hui, Alvis C. M. Fong

Keywords:

Web personalization, Web usage mining, Periodic access patterns

Abstract:

Periodic web personalization aims to recommend the most relevant resources to a user during a specific time period by analyzing the user's periodic access patterns from web usage logs. In this paper, we propose a novel web usage mining approach for supporting effective periodic web personalization. The proposed approach first constructs a user behavior model, called the Personal Web Usage Lattice, from web usage logs using the fuzzy Formal Concept Analysis technique. Based on the Personal Web Usage Lattice, the resources the user is most probably interested in during a given period can be deduced efficiently. This approach enables the costly personalized resource preparation process to be done in advance rather than in real time.
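
The fuzzy concept lattice is beyond a short sketch, but the underlying idea — per-period access patterns deduced offline from usage logs — can be approximated with plain frequency counts. The period-bucketing function and data shapes below are assumptions for illustration:

```python
from collections import Counter

def periodic_recommendations(log, period_of, k=3):
    """Recommend each period's most-accessed resources for one user.

    log: list of (timestamp, resource) access records.
    period_of: maps a timestamp to a period label (e.g. hour-of-day
    bucket). Returns period -> top-k resources. The paper builds a
    fuzzy Personal Web Usage Lattice instead of these raw counts.
    """
    counts = {}
    for ts, resource in log:
        counts.setdefault(period_of(ts), Counter())[resource] += 1
    return {p: [r for r, _ in c.most_common(k)] for p, c in counts.items()}
```

Because the table is built from historical logs, the expensive preparation happens offline, and serving a recommendation for the current period is a dictionary lookup — the same precomputation benefit the abstract claims for the lattice.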


© 2009 Hewlett-Packard Development Company, L.P.