2008
Language Feature Mining for Music Emotion Classification via Supervised Learning from Lyrics Hui He, Jianming Jin, Yuhong Xiong, Bo Chen, Wu Sun, Ling Zhao
Keywords:
Emotion classification, n-grams, Naïve Bayes, Maximum Entropy, Support Vector Machine
Abstract:
In recent years, efficient and intelligent music information retrieval has become very important. One essential aspect of this field is music emotion classification by learning from lyrics. This problem differs from traditional text categorization in that more linguistic or semantic information is required for better emotion analysis. We therefore focus on how to extract useful and meaningful language features. We investigate three preprocessing methods and a series of n-grams of different orders under the well-known n-gram language model framework to extract more semantic features. We then employ three supervised learning methods (Naïve Bayes, maximum entropy classification, and support vector machines) to examine classification performance. Experimental results show that these feature extraction methods improve music emotion classification accuracy. Maximum entropy classification with unigram+bigram+trigram features achieves the best accuracy and is well suited to music emotion classification.
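The unigram+bigram+trigram feature step described in this abstract can be sketched as follows. This is a minimal illustration only: the tokenizer, preprocessing, and classifiers are omitted, and the function name is hypothetical, not from the paper.

```python
from collections import Counter

def extract_ngrams(tokens, max_n=3):
    """Collect all n-grams of order 1..max_n (unigram+bigram+trigram by default)."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

lyric = "love me tender love me sweet".split()
feats = extract_ngrams(lyric)
```

The resulting counts would typically be fed as a sparse feature vector into any of the three classifiers the paper compares.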
Information-Theoretic Distance Measures for Clustering Validation: Generalization and Normalization Ping Luo, Hui Xiong, Guoxing Zhan, Junjie Wu and Zhongzhi Shi
Keywords:
Clustering Validation, Entropy, Information-Theoretic Distance Measures, K-means Clustering
Abstract:
This paper studies the generalization and normalization of information-theoretic distance measures for clustering validation. Along this line, we first introduce a uniform representation of distance measures, termed the quasi-distance, induced from a general form of conditional entropy. The quasi-distance possesses three properties: symmetry, the triangle inequality, and minimum reachability. These properties ensure that the quasi-distance naturally lends itself to use as an external measure for clustering validation. In addition, we observe that the ranges of the distance measures differ when they are applied to clustering validation on different data sets. Therefore, when comparing the performance of clustering algorithms across data sets, distance normalization is required to equalize the ranges of the distance measures. A critical challenge for normalization is obtaining the range of a distance measure for a given data set. To that end, we theoretically analyze the computation of the maximum value of a distance measure on a data set. Finally, we compare the performance of the K-means partitioning algorithm on various real-world data sets. The experiments show that the normalized distance measures outperform the original ones when comparing clusterings across data sets, and that the normalized Shannon distance performs best among the four distance measures under study.
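A conditional-entropy-based distance between two clusterings, and its normalization, can be sketched as below. The paper derives the exact maximum of each measure for a given data set; the sketch instead divides by log(n), a simple generic upper bound on this (Shannon/variation-of-information) distance, purely for illustration.

```python
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    """Shannon (VI) distance between two clusterings: H(A|B) + H(B|A)."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    pa = Counter(labels_a)
    pb = Counter(labels_b)
    vi = 0.0
    for (a, b), nab in joint.items():
        p_ab = nab / n
        # log(p(a)/p(a,b)) + log(p(b)/p(a,b)), with the 1/n factors cancelling
        vi += p_ab * (math.log(pa[a] / nab) + math.log(pb[b] / nab))
    return vi

def normalized_vi(labels_a, labels_b):
    """Divide by log(n), an upper bound on VI, so scores are comparable across data sets."""
    n = len(labels_a)
    return variation_of_information(labels_a, labels_b) / math.log(n) if n > 1 else 0.0
```

Identical clusterings (up to relabeling) get distance 0, and the normalized score lies in [0, 1], which is what makes cross-data-set comparison meaningful.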
Inductive Transfer Learning for Unlabeled Target-domain via Hybrid Regularization Fuzhen Zhuang, Ping Luo, Qing He, and Zhongzhi Shi
Keywords:
Transfer Learning, Inductive Learning, Transductive Learning, Hybrid Regularization
Abstract:
Recent years have witnessed increasing interest in transfer learning. This paper deals with the classification problem in which the target domain, whose distribution differs from that of the source domain, is entirely unlabeled, and aims to build an inductive model for unseen data. First, we analyze the problem of class ratio drift in previous work on transductive transfer learning and propose a normalization method that moves predictions toward the desired class ratio. Furthermore, we develop a hybrid regularization framework for inductive transfer learning that considers three factors: the distribution geometry of the target domain via manifold regularization, the entropy of the prediction probabilities via entropy regularization, and the class prior via expectation regularization. This framework adapts the inductive model learnt from the source domain to the target domain. Finally, experiments on real-world text data show the effectiveness of our inductive transfer learning method, which can also handle unseen test points.
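One simple way to counter class ratio drift, in the spirit of the normalization step this abstract mentions, is to rescale predicted class probabilities toward a desired prior. This is a hypothetical simplification for illustration, not the paper's exact formulation.

```python
def normalize_class_ratio(probs, target_prior):
    """Rescale each class column so the average predicted probability moves
    toward the target class prior, then renormalize each row to sum to 1.
    probs: list of per-example probability rows; target_prior: desired class ratio."""
    n = len(probs)
    k = len(target_prior)
    current = [sum(row[j] for row in probs) / n for j in range(k)]
    scale = [target_prior[j] / current[j] if current[j] > 0 else 0.0 for j in range(k)]
    out = []
    for row in probs:
        scaled = [row[j] * scale[j] for j in range(k)]
        s = sum(scaled)
        out.append([v / s for v in scaled])
    return out

# A classifier drifted heavily toward class 0; pull predictions back toward 50/50.
adjusted = normalize_class_ratio([[0.9, 0.1], [0.8, 0.2]], [0.5, 0.5])
```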
A Practical and Adaptive Framework for Super-Resolution Heng Su, Liang Tang, Daniel Tretter, and Jie Zhou
Keywords:
Super-resolution, registration error, block based, de-block
Abstract:
In this paper a novel, practical, and adaptive framework for super-resolution is proposed. Existing super-resolution algorithms are limited by assumptions that fit particular imaging environments, so they may fail on real, complex scenes. We propose dividing the target image into adaptively sized blocks and applying different conventional algorithms to parts of the image with different characteristics. The proposed framework is highly extensible and general: various super-resolution and single-frame image enhancement techniques can be adaptively assembled to improve the robustness and result quality of the SR operation. Experiments with real-life video indicate encouraging improvements.
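The per-block dispatch idea can be sketched with a toy rule that routes each block to an algorithm based on its pixel variance. The threshold, the variance criterion, and the algorithm names are all illustrative assumptions; the paper's actual block-characterization method may differ.

```python
def choose_block_algorithm(block, edge_threshold=100.0):
    """Pick an upscaling method per block: a (hypothetical) edge-preserving
    SR routine for detailed blocks, plain interpolation for flat ones.
    block: flat list of grayscale pixel values."""
    mean = sum(block) / len(block)
    variance = sum((p - mean) ** 2 for p in block) / len(block)
    return "edge_sr" if variance > edge_threshold else "interpolate"
```

A flat sky region would be handled by cheap interpolation, while a high-contrast block would be sent to the more expensive routine.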
The Influence of Independent Scheduling Delay on Multi-Query Processing Based on Fork-Join Yong Wang, Limei Jiao, Ying Liu, Huaiming Song
Keywords:
Multi-query processing, Fork-join model, coefficient of Scheduling Distance
Abstract:
Fork-join is a basic query processing model in shared-nothing parallel database systems. A query Q is decomposed into a number of sub-queries, each of which is processed independently on a processing element (PE); all the sub-query results are then "joined" and returned as Q's result. In this scheme, the processing time of Q depends on the time of its last-finished sub-query. Despite its importance, few studies have examined the performance of multi-query processing under the fork-join model. In this paper, we present a model, called CSD (Coefficient of Scheduling Distance), to evaluate the influence of independent PE scheduling on the total completion time. All the experiments are conducted on DBroker, a large-scale production system for network security management. The experimental results show that the CSD model is accurate and effective.
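The "last-finished sub-query" property can be illustrated with a small simulation: a fork-join query finishes when its slowest PE does, so adding independent scheduling delay per PE inflates the completion time as the PE count grows. This sketch is not the CSD model itself, just the phenomenon it quantifies; the delay distribution is an assumption.

```python
import random

def fork_join_time(subquery_times):
    """A fork-join query finishes when its slowest sub-query finishes."""
    return max(subquery_times)

def avg_completion(num_pes, base=1.0, jitter=0.5, trials=1000, seed=7):
    """Average completion time over many queries, with a uniform random
    scheduling delay added independently on each PE."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += fork_join_time([base + rng.uniform(0, jitter) for _ in range(num_pes)])
    return total / trials
```

Even though per-PE work is fixed, the expected maximum of more independent delays is larger, so wider queries pay a growing scheduling penalty.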
DSpace Concept Linking - An Innovative Feature for Virtual Olympic Museum Baoyao Zhou, Yuhong Xiong, Wei Liu, Xukun Shen, Yue Qi, Jiahui Wang, Yong Hu
Keywords:
Virtual Olympic Museum, Concept linking, Concept-tree
Abstract:
DSpace organizes all of its items using predefined communities and collections. Although this hierarchical data model works well for organizing related content in DSpace, there is still room to improve users' browsing experience. DSpace concept linking aims to discover richer semantic relationships among DSpace resources and link them together across different communities and collections to further facilitate content navigation. The concept linking function has been implemented as an innovative feature of the Virtual Olympic Museum.
A Rule-based Metadata Extractor for Learning Materials Yu Yang, Ming Zhang, Baoyao Zhou
Keywords:
Metadata extraction, Regular expression, Information refinement
Abstract:
Integrating all kinds of learning materials is becoming increasingly important for teachers and students who rely on online e-learning courses. As the key part of the whole Online Course Organization System, the metadata extraction function needs to be accurate when dealing with semi-structured documents, even loosely structured ones. We design and implement a metadata extractor that evaluates several rules ordered by priority, followed by an information refinement step that improves the final accuracy. When the domain changes, users only need to supply domain-specific rules, without modifying the program. Experiments show that our method performs well on semi-structured documents, with an F-measure above 85%, indicating that the method is practical.
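The priority-ordered rule matching described here can be sketched with regular expressions: rules are tried in order and the first match wins. The rule patterns and field name below are hypothetical examples, not the paper's actual rule set, and the refinement step is only hinted at in a comment.

```python
import re

# Hypothetical rules for one metadata field, ordered by priority:
# the first pattern that matches wins.
TITLE_RULES = [
    re.compile(r"^Course\s*Title\s*:\s*(.+)$", re.MULTILINE),
    re.compile(r"^Title\s*:\s*(.+)$", re.MULTILINE),
    re.compile(r"^#\s*(.+)$", re.MULTILINE),  # fall back to a top-level heading
]

def extract_title(text):
    for rule in TITLE_RULES:
        m = rule.search(text)
        if m:
            return m.group(1).strip()  # a refinement step would clean this further
    return None
```

Swapping in a new rule list adapts the extractor to a new domain without touching the matching loop, which is the portability property the abstract claims.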
2007
Building a Scalable Web Query System Meichun Hsu and Yuhong Xiong
Keywords:
Focused Crawling, Web Page Classification, Metadata Extraction
Abstract:
Nowadays, the dominant way to find information on the web is through search. General search engines are very effective, but search phrases and results are unstructured, which limits a user's ability to further automate the processing of search results. In recent years, we have seen efforts to build systems that support more precise queries on the web for certain content verticals. We describe the general problems in building an extensible web query system and report some of our work in this area.
Object Extraction by Spatio-Temporal Assembling Xiaoke Qin, Liang Tang, and Jie Zhou
Keywords:
Spatio-temporal assembling, Object extraction, Canal traffic surveillance
Abstract:
Among the various algorithms for vision-based traffic monitoring, spatio-temporal (ST) slice analysis is attractive because it computes over a larger temporal scale. However, it is unsuitable for further pattern recognition, since the conventional ST slice cannot preserve the spatial relationships of the original object image. In this paper, we propose a novel algorithm for accurate traffic object extraction. In contrast to previous ST algorithms that depend on one line per frame, we assemble the object from foreground strips obtained from each frame together with carefully designed motion estimation. Thus, both spatial and temporal information is used more effectively. Applications in real canal traffic scenes demonstrate the advantages of our algorithm.
On Line Course Organization Ming Zhang, Weichun Wang, Yi Zhou, Yu Yang, Yuhong Xiong, and Xiaoming Li
Keywords:
Focused Crawling - Metadata Extraction - Learning Object Management - Ontology
Abstract:
To help users access on-line materials with more specific questions, we build a learning portal named Fusion. First we develop FusionCrawler, a link-classification focused crawler, to download potential course pages. We then use a binary classifier to pick out the course pages. After the course pages are identified, we use FusionExtractor, a DOM-tree-based regular expression wrapper, to extract metadata. The metadata include course name, instructor information, course outline, and other relevant information, and are stored in a database behind the portal. Experimental results show that our approach to organizing on-line courses based on focused crawling and metadata extraction is effective: FusionCrawler obtains on average 40-50% more on-topic learning materials than a normal focused crawler, while FusionExtractor achieves an average F1 of 85%. With metadata from more than 1,400 MIT OCW, 3,000 UIUC, and 1,000 WISC courses, 300 courses from GreatLearning with 3,000 Chinese course videos, and nearly 1,000 videos from the Internet Archive, the Fusion portal provides several kinds of search functionality, including quick search, advanced search, and semantic navigation browsing.
DM-DSpace: Federating DSpace Based Digital Museums in China Wei Liu, Yuhong Xiong, Baoyao Zhou, James Rutherford, Xukun Shen, Yue Qi, Xiaoyu Li, Weihua Huang, Shu Wang, Bailiang Chen
Keywords:
Digital Museum, DSpace, Federation
Abstract:
The China Digital Museum Project is working toward a federation in which tens of heterogeneous knowledge repositories can share information easily and all digitized artifacts can be stored, managed, preserved, and disseminated. Ultimately, the project will lead to "virtual museums" formed by arranging digital assets by subject, regardless of physical location. These requirements motivated the idea of federating DSpace, which resulted in an extended DSpace named DM-DSpace.
2006
An Effective Approach for Periodic Web Personalization Baoyao Zhou, Siu Cheung Hui, Alvis C. M. Fong
Keywords:
Web personalization, Web usage mining, Periodic access patterns
Abstract:
Periodic web personalization aims to recommend the most relevant resources to a user during a specific time period by analyzing the user's periodic access patterns from web usage logs. In this paper, we propose a novel web usage mining approach to support effective periodic web personalization. The proposed approach first constructs a user behavior model, called a Personal Web Usage Lattice, from web usage logs using the fuzzy Formal Concept Analysis technique. Based on the Personal Web Usage Lattice, the resources that the user is most probably interested in during a given period can be deduced efficiently. This approach enables the costly preparation of personalized resources to be done in advance rather than in real time.
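The period-indexed recommendation idea can be illustrated with plain frequency counts. Note that the paper builds a Personal Web Usage Lattice with fuzzy Formal Concept Analysis; the sketch below replaces that with simple counting solely to show how recommendations become a lookup by period, precomputable ahead of time.

```python
from collections import defaultdict, Counter

def build_period_model(access_log):
    """Group page accesses from a usage log by period (e.g. hour or part of day).
    access_log: iterable of (period, page) pairs."""
    model = defaultdict(Counter)
    for period, page in access_log:
        model[period][page] += 1
    return model

def recommend(model, period, k=2):
    """Most frequently accessed pages for the given period."""
    return [page for page, _ in model[period].most_common(k)]

log = [("morning", "news"), ("morning", "news"),
       ("morning", "mail"), ("evening", "sports")]
model = build_period_model(log)
```

Because the model is built offline from logs, serving a recommendation at request time is just a dictionary lookup, matching the abstract's point about preparing resources in advance.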