Digital music assets are increasingly abundant and play a growing role in
everyday life. This raises the problem of how to organize musical assets
efficiently so that they are easy to browse and retrieve. Because the
volume of digital music material is growing rapidly, manual indexing and
retrieval are impractical. In this project, we focus on building systems
and methods for the automatic categorization and retrieval of musical
assets. Applications of this work include online music shopping, personal
music library organization, searching for preferred music channels on the
web, and retrieving video segments based on music content. Three
technologies developed in this project are described below.
Semi-Automatic Approach for Music Classification
Audio categorization is essential when managing a music database, whether
a professional library or a personal collection. However, fully automatic
categorization of music into classes suitable for browsing and searching
is not yet supported by today's technology. Music classification is also
subjective to some extent, since each user may have their own criteria
for categorizing music. We therefore proposed a semi-automatic approach
to music classification. In this approach, a music browsing system
provides a set of tools that separate music into a number of broad types
(e.g., male solo, female solo, string instrument performance) using
existing music analysis methods. Starting from the results of this
automatic process, the user may cluster pieces in the database into finer
classes and/or correct misclassifications manually according to their own
preferences and definitions. Such a system can greatly improve the
efficiency of music browsing and retrieval while ensuring the accuracy of
the results and the user's satisfaction.
Here are illustrations of the system.
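The classify-then-correct workflow can be sketched as follows. This is a
minimal illustration, not the system's implementation: the class names,
the feature dictionary, and the toy classifier are all hypothetical
placeholders standing in for the paper's music analysis methods.

```python
# Sketch of the semi-automatic workflow: an automatic classifier assigns
# coarse types, then manual user corrections override them. The classes
# and the feature-based rule below are illustrative assumptions only.

def auto_classify(features):
    """Hypothetical coarse classifier mapping a feature dict to a broad type."""
    if features["has_vocals"]:
        return "male solo" if features["pitch_mean"] < 200.0 else "female solo"
    return "instrumental"

class SemiAutoLibrary:
    def __init__(self):
        self.labels = {}        # track id -> current label
        self.user_fixed = set() # tracks the user has manually corrected

    def add(self, track_id, features):
        # Automatic pass: assign a broad type from existing analysis methods.
        self.labels[track_id] = auto_classify(features)

    def relabel(self, track_id, label):
        # Manual pass: a user correction always wins over the automatic result.
        self.labels[track_id] = label
        self.user_fixed.add(track_id)

lib = SemiAutoLibrary()
lib.add("t1", {"has_vocals": True, "pitch_mean": 150.0})
lib.add("t2", {"has_vocals": False, "pitch_mean": 0.0})
lib.relabel("t2", "string instruments performance")  # user refines the class
```

The key design point is that the automatic result is only a starting
point; every label remains editable, so the user's own taxonomy takes
precedence wherever the two disagree.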
For more details of this work, please refer to the following paper:
Tong Zhang, "Semi-automatic approach for music
classification," SPIE's Conference on Internet Multimedia
Management Systems IV (part of ITCom'03), vol. 5242, Orlando, Sep.
2003. (PDF Download)
Automatic Music Instrument Classification
While most previous work on musical instrument recognition has focused on
classifying single notes in monophonic music, in this work we proposed a
scheme for distinguishing instruments in continuous music pieces that may
contain one or more kinds of
instruments. Highlights of the system include music segmentation
into notes, harmonic partial estimation in polyphonic sound, note
feature calculation and normalization, note classification using a
set of neural networks, and music piece categorization with fuzzy
logic principles. Example outputs of the system are "the music
piece is 100% guitar (with 90% likelihood)" and "the music
piece is 60% violin and 40% piano, thus a violin/piano duet".
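The final aggregation step can be sketched as follows. This is a
simplified stand-in for the paper's fuzzy-logic categorization: it
assumes each note has already been classified into per-instrument
probabilities, and it summarizes the piece as instrument proportions plus
an average confidence.

```python
# Sketch of piece-level aggregation under simplified assumptions: each
# note carries per-instrument probabilities from the note classifier, and
# the piece is summarized by the share of notes each instrument wins.
# The paper's actual fuzzy-logic rules are not reproduced here.
from collections import Counter

def summarize_piece(note_probs):
    """note_probs: list of dicts mapping instrument -> probability, one per note."""
    votes = Counter()
    confidence = 0.0
    for probs in note_probs:
        best = max(probs, key=probs.get)  # winning instrument for this note
        votes[best] += 1
        confidence += probs[best]         # how sure the note classifier was
    n = len(note_probs)
    proportions = {inst: count / n for inst, count in votes.items()}
    return proportions, confidence / n

# Ten notes: six lean violin, four lean piano.
notes = [{"violin": 0.9, "piano": 0.1}] * 6 + [{"violin": 0.2, "piano": 0.8}] * 4
props, conf = summarize_piece(notes)
# props -> {"violin": 0.6, "piano": 0.4}, i.e. "60% violin and 40% piano"
```

An output such as `{"violin": 0.6, "piano": 0.4}` corresponds to the
"60% violin and 40% piano, thus a violin/piano duet" verdict quoted
above, with the averaged note confidence playing the role of the
likelihood figure.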
The system has been tested with twelve kinds of musical instruments, with
very promising experimental results: an accuracy of about 80% is
achieved, rising to about 90% if misclassifications within the same
instrument family (e.g., cello, viola, and violin) are tolerated. A
demonstration system for musical instrument classification and music
timbre retrieval was also developed.
For algorithmic details, please refer to the following paper:
Tong Zhang, "Instrument classification in polyphonic music
based on timbre analysis," SPIE's Conference on Internet
Multimedia Management Systems II (part of ITCom'01), vol. 4519,
p. 136-147, Denver, Aug. 2001. (PDF Download)
Automatic Singer Identification
The singer's identity is essential information for organizing, browsing,
and retrieving music collections. In this work, a system for automatic
singer identification is developed which recognizes the singer of a song
by analyzing the music signal; in addition, songs that are similar in
terms of the singer's voice are clustered. The proposed scheme follows
the framework of common speaker identification systems, but special
effort is made to distinguish the singing voice from instrumental sounds
in a song. A statistical model is trained for each singer's voice using
typical song(s) of that singer. Then, for a song to be identified, the
starting point of the singing voice is detected and a portion of the song
is excerpted from that point. Audio features are extracted and matched
against the singers' voice models in the database, and the song is
assigned to the best-matching model. Promising results are obtained on a
small set of samples, with accuracy rates of around 80%.
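The matching step can be sketched as follows. The paper specifies only "a
statistical model" per singer; a single diagonal Gaussian is assumed here
for brevity (common speaker identification systems typically use Gaussian
mixtures), and the feature frames are synthetic placeholders rather than
real audio features.

```python
# Sketch of model training and matching, assuming one diagonal-Gaussian
# voice model per singer. Feature vectors are synthetic stand-ins for the
# audio features extracted from the excerpted portion of the song.
import numpy as np

def train_model(frames):
    """Fit a diagonal Gaussian to a singer's feature frames (n x d)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def log_likelihood(frames, model):
    mean, var = model
    # Sum of per-frame diagonal-Gaussian log densities.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)

def identify(frames, models):
    """Assign the excerpt to the singer model with the best match."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))

rng = np.random.default_rng(0)
models = {
    "singer_a": train_model(rng.normal(0.0, 1.0, (200, 12))),
    "singer_b": train_model(rng.normal(3.0, 1.0, (200, 12))),
}
excerpt = rng.normal(3.0, 1.0, (50, 12))  # frames resembling singer_b's voice
# identify(excerpt, models) -> "singer_b"
```

The song is assigned to whichever model yields the highest likelihood for
the excerpted frames, mirroring the "best match" rule described above.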
In the proposed approach, four audio features are extracted and
integrated to detect the start of the singer's voice in a song, which
helps to remove the prelude of the song. Besides singer identification,
this technique can be used in several other music management
applications, such as audio thumbnailing. Here is an
example.
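The detection step can be sketched as a weighted combination of per-frame
scores. The four features used in the paper are not named here, so the
feature tracks, weights, and threshold below are illustrative assumptions
only.

```python
# Illustrative sketch of singing-onset detection: several per-frame
# feature scores are combined, and the prelude ends at the first frame
# where the combined score crosses a threshold. Feature names, weights,
# and the threshold are hypothetical placeholders.

def detect_singing_start(feature_tracks, weights, threshold, frame_sec=0.5):
    """feature_tracks: dict name -> list of per-frame scores in [0, 1]."""
    n = len(next(iter(feature_tracks.values())))
    for i in range(n):
        combined = sum(w * feature_tracks[name][i] for name, w in weights.items())
        if combined >= threshold:
            return i * frame_sec  # time (s) where the singing voice starts
    return None  # no singing voice detected

tracks = {
    "energy":      [0.2, 0.3, 0.8, 0.9],  # placeholder vocal-evidence scores
    "harmonicity": [0.1, 0.2, 0.7, 0.9],
}
start = detect_singing_start(tracks, {"energy": 0.5, "harmonicity": 0.5}, 0.6)
# start -> 1.0  (third frame crosses the threshold; frames are 0.5 s long)
```

Everything before the returned time would be treated as prelude and
skipped when excerpting the portion of the song used for identification.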
For more details, please refer to the following paper:
Tong Zhang, "Automatic singer identification," Proc.
of ICME'03, Baltimore, July 2003. (PDF
Download)
Contact
For more information about the technology, please contact Tong
Zhang (tong.zhang@hp.com) at
Imaging Technology Dept., HP Labs.