Audio Based IR
  • A global framework implementation in order to achieve a robust and generic solution for
    audio-based multimedia indexing and retrieval, specifically:
      - Generic Support for Audio Codecs
      - Generic Support for File Formats
      - Generic Support for Capturing, Encoding and Acoustic Parameters/Variations
      - Generic Support for AFeX Framework Parameters
      - Generic Support for audio clip duration and content variation

  • The main objective is content-based retrieval of audio (by speaker, subject, or "sounds like" similarity) in a way that agrees with human judgment and (aural) perception.
Unsupervised Audio Classification and Segmentation in MUVIS

    Objectives and Requirements - Preliminary Work

  • Design is made particularly for Audio based Multimedia Indexing and Retrieval
  • No training required
  • Fully automatic and unsupervised
  • Multimodal Structure
  • Bitstream Mode (Compressed Domain)
  • Generic Mode (UnCompressed Domain)
      - Decoding - if required
  • Classification and Segmentation scheme should be robust to:
      - Compression parameters (Compressed audio)
      - Sampling frequency (16 kHz - 44.1 kHz)
      - No. of channels
      - Audio compression scheme
  • Generic parameters
      - Sound volume level
      - Clip duration
      - Content (Class) variation within audio clip
  • High accuracy is required:
      - Reduces the class domain into 4 main categories
        - Speech, Music, Fuzzy and Silence
      - High precision is required:
        - Primary objective is to minimize the critical errors
  • Errors are divided into three types
      - Critical Errors
      - Semi-Critical Errors
      - Non-Critical Errors

  • Pure Class Types
      - Speech
      - Music
      - Silence
  • Impure Class Type
      - Fuzzy

  • A segment has a fuzzy class type if
      - Either it is not classifiable as a pure class due to some potential uncertainties or anomalies in the audio source
      - Or it exhibits features from more than one pure class
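
The class taxonomy above can be sketched in C++ as follows. This is only an illustration: the enumeration and function names are hypothetical and are not taken from the MUVIS sources.

```cpp
#include <string>

// Hypothetical names for the four class types described above.
// The actual enumerators used in MUVIS are not listed in this document.
enum class AudioClass { Speech, Music, Silence, Fuzzy };

// Speech, Music and Silence are pure classes; Fuzzy is the only impure class.
inline bool isPureClass(AudioClass c) {
    return c != AudioClass::Fuzzy;
}

// Human-readable label for a class type.
inline std::string className(AudioClass c) {
    switch (c) {
        case AudioClass::Speech:  return "Speech";
        case AudioClass::Music:   return "Music";
        case AudioClass::Silence: return "Silence";
        case AudioClass::Fuzzy:   return "Fuzzy";
    }
    return "";
}
```

A segment that cannot be assigned to a pure class, or that shows features of more than one pure class, would be labelled `AudioClass::Fuzzy` in this sketch.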
Audio Classification and Segmentation Main Approach
  • Audio Classification and Segmentation are interdependent problems
  • 4-step algorithm
  • Each step produces the initial state for the next step
  • Iteration loops ensure global segmentation within the clip
  • More global segmentation assures more accuracy in classification within the segment

    Experimental Results
  • In total, the method was applied to 260 (~15 hours) MP3, 100 (~5 hours) AAC and 200 (~10 hours) PCM clips.

    Bit-Stream Mode:

    Generic Mode:

Audio Feature Extraction (AFeX) Framework
  • Independent AFeX module(s) can be integrated into the MUVIS framework for audio-based indexing and retrieval.

    AFeX Overview
  • Each AFeX algorithm should be implemented as a Dynamic Link Library (DLL) that conforms to the AFeX API. The AFeX API provides the necessary handshaking and information flow between a MUVIS application and an AFeX module.
  • The AFeX interface is defined in the AFex_API.h file. As mentioned before, any AFeX algorithm should be implemented as a DLL using this API header file. AFex_API.h mainly defines five API functions required to manage all feature extraction operations dynamically. It also specifies the data structures necessary for feature extraction and for communication between the module and the application. Figure 1 summarizes the API functions and the linkage between MUVIS applications and a sample AFeX module. The naming convention for an AFeX module is as follows:
    AFex_[fourCC code].dll (e.g. AFex_MFCC.dll)
  • All AFeX modules should be stored in the same directory as the application (DBSEditor and MBrowser) or simply in the C:\MUVIS\ directory. If the database contains audio clips or video clips with an audio track, a separate feature file is created for each clip and each AFeX module, and it is stored along with the clip. This feature file contains the feature vectors for all the key-frames in the audio track. The naming convention in this case is as follows: [indexed video file name]_AFex.[FourCC code] (e.g. “MTV_CLIP_12_AFex.MFCC”)
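
The two naming conventions above can be illustrated with a short C++ sketch. The helper names `afexModuleFileName` and `afexFeatureFileName` are hypothetical and are not part of the AFeX API; they only show how the file names are composed.

```cpp
#include <string>

// Module file name, following the pattern AFex_[fourCC code].dll
std::string afexModuleFileName(const std::string& fourcc) {
    return "AFex_" + fourcc + ".dll";
}

// Feature file name, following the pattern [indexed file name]_AFex.[FourCC code]
std::string afexFeatureFileName(const std::string& clipBaseName,
                                const std::string& fourcc) {
    return clipBaseName + "_AFex." + fourcc;
}
```

For instance, `afexModuleFileName("MFCC")` yields `AFex_MFCC.dll`, and `afexFeatureFileName("MTV_CLIP_12", "MFCC")` yields `MTV_CLIP_12_AFex.MFCC`.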

    AFeX API
  • Two enumeration types, two structures and five API function properties (name and types) are declared in AFex_API.h. The owner of an AFeX module should implement all these API functions. There is also a macro that converts a character array to a FourCC code.
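
As an illustration of what a FourCC conversion does, the following C++ sketch packs four characters into a 32-bit code and unpacks them again. The function names and the byte order (first character in the lowest byte) are assumptions; the actual macro in AFex_API.h may be defined differently.

```cpp
#include <cstdint>
#include <string>

// Assumed packing: the first character goes into the lowest byte.
// The real macro in AFex_API.h may use a different byte order.
constexpr std::uint32_t makeFourCC(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Inverse operation: recover the four characters from a packed code.
std::string fourCCToString(std::uint32_t code) {
    std::string s(4, ' ');
    for (int i = 0; i < 4; ++i) {
        s[i] = static_cast<char>((code >> (8 * i)) & 0xFFu);
    }
    return s;
}
```

With this packing, two modules receive the same code only if all four characters match, which is what makes the code usable as a unique identifier for modules and their feature files.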
    AFex_API.h : Data Structures
  • c_categ Enumeration:
    Defines the audio class types of each audio frame to be used for feature extraction.
  • AudioType Enumeration:
    This enumeration is not currently used.
  • AFexParam Structure:
    Filled by an AFeX module during the handshaking operation. A MUVIS application calls the AFex_Bind function once at run-time with a pointer to this structure. The AFeX module then fills the following members of the structure to introduce itself to the application:
      char feat_name[255] : Description of the feature (e.g. "The feature: MFCC"). Used by applications as a title.
      long feat_fourcc : Feature fourcc code (e.g. _FourCC('MFCC') ). Unique identification code for the feature extraction algorithm. Used by applications to identify each AFeX module and its associated files.
      unsigned int feat_param_no : Number of parameters (e.g. 6 for MFCC).
      long* feat_param_fourcc : Array of parameter fourcc codes (e.g. [_FourCC('NoFl'), _FourCC('Hfre'), _FourCC('Ford'), …] ). Used for display purposes.
      double* feat_param_default : Array of parameter default values (e.g. [32,22050,24, …] for MFCC).
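
The handshaking step can be sketched as follows. The structure layout mirrors the members listed above, but everything else is an assumption made for illustration: the "DEMO" feature, its two parameters and their default values, the `packFourCC` helper, and the return codes (0 for success, -1 for failure) are hypothetical and are not taken from AFex_API.h.

```cpp
#include <cstring>

// Layout as described above; the authoritative definition is in AFex_API.h.
struct AFexParam {
    char         feat_name[255];
    long         feat_fourcc;
    unsigned int feat_param_no;
    long*        feat_param_fourcc;
    double*      feat_param_default;
};

// Hypothetical stand-in for the FourCC macro (byte order is an assumption).
static long packFourCC(const char (&s)[5]) {
    long code = 0;
    for (int i = 0; i < 4; ++i) {
        code |= static_cast<long>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    return code;
}

// Module-owned storage for a made-up "DEMO" feature with two parameters.
static long   g_paramCodes[2];
static double g_paramDefaults[2] = {32.0, 22050.0};

// Sketch of the handshake: the module introduces itself to the application.
// Return codes are assumed (0 = success, -1 = failure).
int AFex_Bind(AFexParam* p) {
    if (p == nullptr) return -1;
    std::strncpy(p->feat_name, "The feature: DEMO", sizeof(p->feat_name) - 1);
    p->feat_name[sizeof(p->feat_name) - 1] = '\0';
    p->feat_fourcc = packFourCC("DEMO");
    g_paramCodes[0] = packFourCC("NoFl");
    g_paramCodes[1] = packFourCC("Hfre");
    p->feat_param_no = 2;
    p->feat_param_fourcc = g_paramCodes;
    p->feat_param_default = g_paramDefaults;
    return 0;
}
```

Keeping the parameter arrays in module-owned static storage is one simple way to make sure the pointers handed to the application remain valid after AFex_Bind returns.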

  • AFrameParam Structure:
    A MUVIS application calls the AFex_Extract function to extract the features of the frame stored inside the given AFrameParam structure. The frame format is the one specified by the AFeX module in the AFexParam structure.
      short* buffer; // buffer of audio PCM samples..
      int buf_len; // length of buffer..
      AudioType a_type; // frame audio type..
      c_categ f_class; // frame classification type..
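
A possible way for an application to prepare frames is sketched below. The structure members follow the listing above; the enumerator names, the `splitIntoFrames` helper, and the interpretation of buf_len as a count of samples (rather than bytes) are assumptions made for this example.

```cpp
#include <algorithm>
#include <vector>

// Placeholder enumerations: the real enumerators are declared in AFex_API.h
// and are not listed in this document, so these names are hypothetical.
enum AudioType { AUDIO_TYPE_UNSPECIFIED };
enum c_categ   { CATEG_SPEECH, CATEG_MUSIC, CATEG_FUZZY, CATEG_SILENCE };

// Members as listed above.
struct AFrameParam {
    short*    buffer;   // buffer of audio PCM samples
    int       buf_len;  // length of buffer (assumed here to be in samples)
    AudioType a_type;   // frame audio type
    c_categ   f_class;  // frame classification type
};

// Hypothetical helper: cut a PCM signal into consecutive frames of at most
// frameLen samples. The frames point into pcm, so pcm must outlive them.
std::vector<AFrameParam> splitIntoFrames(std::vector<short>& pcm,
                                         int frameLen, c_categ cls) {
    std::vector<AFrameParam> frames;
    if (frameLen <= 0) return frames;
    const int total = static_cast<int>(pcm.size());
    for (int start = 0; start < total; start += frameLen) {
        AFrameParam f;
        f.buffer  = pcm.data() + start;
        f.buf_len = std::min(frameLen, total - start);
        f.a_type  = AUDIO_TYPE_UNSPECIFIED;
        f.f_class = cls;
        frames.push_back(f);
    }
    return frames;
}
```

Each resulting AFrameParam could then be passed to AFex_Extract; the last frame may be shorter than frameLen when the clip length is not a multiple of it.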

    AFex_API.h : Functions
    Five API function properties (name and types) are declared in AFex_API.h. The creator of an AFeX module should implement all specified API functions, which are described as follows:
  • int AFex_Bind(AFexParam*): Used for the handshaking operation between a MUVIS application and an AFeX module. The AFeX module fills the given structure to introduce itself to the application. This function is called only once at the beginning, just after the application links the AFeX module at run-time.
  • double* AFex_Init(double*, int, int, int): The feature extraction parameters are passed in to initialize the AFeX module. The AFeX module performs the necessary initialization operations, e.g. memory allocation, table creation etc. This function is called to initialize a unique sub-feature extraction operation. A new sub-feature can be created by using a different set of feature parameters.
  • double* AFex_Extract(AFrameParam, int&): Extracts the features of an audio frame (buffer). It returns the feature vectors, which should be normalized so that the total length of the vector lies between 0.0 and 1.0. This normalization is required for merging multiple (sub-) features while querying in MBrowser.
  • int AFex_Exit(AFexParam *): Resets and terminates the AFeX module operation. It frees the entire memory space allocated in the AFex_Bind function. Additionally, if AFex_Init has already been called, this function resets the AFeX module so that it can perform further feature extraction operations. This function is called at least once when the MUVIS application terminates, but it might also be called at the end of each AFeX operation per sub-feature extraction.
  • double AFex_GetDistance(double *, double*, int): Computes the similarity measure as the distance between two feature vectors; the appropriate distance measurement algorithm should therefore be implemented in this function. The resulting distance is returned as a double precision number.
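
The normalization requirement and the distance function can be sketched as follows. This is a minimal illustration under stated assumptions: the Euclidean (L2) norm is taken as the "total length" of a vector, the third argument of AFex_GetDistance is taken to be the vector dimension, and Euclidean distance is used as the metric. The helper `normalizeToUnitLength` is hypothetical; a real module may use a different normalization or distance measure.

```cpp
#include <cmath>

// Hypothetical helper: scale v in place so that its Euclidean length does
// not exceed 1.0. Vectors that are already short enough are left unchanged.
void normalizeToUnitLength(double* v, int n) {
    double sumSq = 0.0;
    for (int i = 0; i < n; ++i) sumSq += v[i] * v[i];
    const double norm = std::sqrt(sumSq);
    if (norm > 1.0) {
        for (int i = 0; i < n; ++i) v[i] /= norm;
    }
}

// Sketch of the distance function using the Euclidean metric.
// n is assumed to be the dimension of both feature vectors.
double AFex_GetDistance(double* a, double* b, int n) {
    double sumSq = 0.0;
    for (int i = 0; i < n; ++i) {
        const double d = a[i] - b[i];
        sumSq += d * d;
    }
    return std::sqrt(sumSq);
}
```

With both vectors bounded in length by 1.0, the Euclidean distance between them is bounded by 2.0, which keeps distances from different (sub-) features on a comparable scale when they are merged.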
    For more information see our publications
Please send questions and bug reports to webm@ster.
Release Date: 01.03.2006
Last Update: 10.09.2013
Copyright 2013 MUVIS