• Update: A document (PDF, 10 MB) on research activities in 2014-15 is hosted at: Click here.
Some of the key contributions of the group are summarized below:
  • Auditory Spectral Estimation: Robust estimation of the instantaneous frequency (IF) of a time-varying sinusoid is explored, first using joint time-frequency representations (TFRs) and then using level-crossing information motivated by the auditory system. It is shown that zero-crossing instants can be used to estimate the instantaneous phase function by interpolating its irregularly spaced samples [J2, C1]. Similarly, higher-order TFRs, such as the cross-L-Wigner distribution and complex TFRs, are shown to be more robust IF estimators of arbitrary polynomial phase functions in the presence of additive noise [J5, J6, C14].
  • Speech/Audio Compression: A new technique is developed for decomposing speech and music signals into linear chirp components, each with a mixture-Gaussian envelope. It uses a successive approximation approach and is hence well suited to approximating a signal down to its perceptual noise threshold. The model is shown to be effective for a variety of signals (harmonic, tonal, transient and noise-like/unvoiced) and provides near-transparent reconstruction at a parameter load of 2-4 parameters/ms [J3, C2, P1]. We also showed that vector quantization (VQ) of side information is effective for low-bitrate speech/audio coding in MPEG-4 AAC [C8, C9]. We have also pursued an adaptive filter-bank approach to audio coding with joint time-frequency-domain bit allocation, which provides a better estimate of the perceptual entropy of transient signals. For very-low-bitrate speech coding, we have formulated an approach of speech recognition followed by speech synthesis, with separate HMMs developed for vocal-tract modeling and glottal-pulse modeling. This joint HMM-based synthesis appears promising.
  • Speech Recognition: Towards robust speech recognition in automobile noise, we addressed robustness at four levels: (i) word boundary detection, (ii) auditory/discriminant features, (iii) noise compensation and (iv) growing HMMs/NNs. For (i), we developed a new algorithm based on the norm of the mel-frequency cepstral vectors, which achieves less than 2% false word detection with no missed words [J1]. For (ii), a robust feature transformation is determined using a discriminant formulation and a gradient search over the training data; this has led to a jointly optimal DCT and weighting lifter for the MFCC parameters [M16]. We also developed an auditory-motivated feature vector with variable time-frequency resolution (EarLyzer) for improved performance [C5]. For (iii), it is shown that spectral addition of background noise beyond the word boundaries is more effective than spectral subtraction. For pattern matching, a 2D matched-filter model is developed to match speech patterns in the joint TF representation [J4]. For (iv), a growing HMM or a growing SOM (self-organizing map) neural network is considered for a compact statistical representation of the speech patterns. Increasing the statistical resolution in a class-discriminant manner is found to improve HMM classification of noisy signals [C7].
  • Speech Enhancement: Intelligibility/quality enhancement of binaural noisy speech is modeled using an iterative coherence-filter approach, which is shown to perform better than the single-channel model [C3].
  • Audio Processing: For hi-fi audio rendering, we explored real-time simulation of dynamic localization using head-related impulse response (HRIR) interpolation and filtering. Of the various interpolation schemes, linear interpolation is shown to be sufficient to achieve perceptually accurate localization of the source [C4].
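The zero-crossing route to IF estimation in the Auditory Spectral Estimation item can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of [J2, C1]: it assumes a single real sinusoid x(t) = A cos(φ(t)) with monotonically increasing phase, locates crossings by linear interpolation between samples, and uses a least-squares polynomial fit (a stand-in for whatever interpolator the cited work uses) to turn the irregular phase samples into a smooth φ(t). The function names are hypothetical.

```python
import numpy as np

def zero_crossing_times(x, fs):
    """Sub-sample zero-crossing instants (seconds), located by linear interpolation."""
    neg = np.signbit(x)
    idx = np.nonzero(neg[:-1] != neg[1:])[0]  # sign change between idx and idx + 1
    x0, x1 = x[idx], x[idx + 1]
    return (idx + x0 / (x0 - x1)) / fs

def if_from_zero_crossings(x, fs, order=3):
    """Estimate the IF (Hz) of x(t) = A cos(phi(t)) from its zero crossings.

    Successive zero crossings of a cosine with increasing phase are pi radians
    apart, so they are irregularly spaced samples of phi(t) up to a constant
    offset, which the polynomial fit absorbs.
    """
    tz = zero_crossing_times(x, fs)
    phase = np.pi * np.arange(len(tz))
    coeffs = np.polyfit(tz, phase, order)  # least-squares polynomial phase model
    dphi = np.polyder(coeffs)              # d(phi)/dt in rad/s
    return lambda t: np.polyval(dphi, t) / (2 * np.pi)

# Usage: linear chirp whose IF is f(t) = 200 + 400 t Hz
fs = 8000
t = np.arange(0, 0.5, 1 / fs)
x = np.cos(2 * np.pi * (200 * t + 200 * t ** 2))
f_hat = if_from_zero_crossings(x, fs)
print(f"{float(f_hat(0.25)):.1f} Hz")  # approximately 300 Hz, the true IF at t = 0.25 s
```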
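For comparison, the TFR route to IF estimation can be illustrated with the plain pseudo Wigner-Ville distribution (WVD), whose peak along frequency tracks the IF of a linear chirp. This is only a baseline sketch: it does not implement the cross-L-Wigner or complex TFRs of [J5, J6, C14], the lag window, lag length and FFT size are arbitrary choices, and `analytic_signal` and `pwvd_if` are hypothetical names.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def pwvd_if(x, fs, n, half_lag=64, nfft=1024):
    """IF estimate (Hz) at sample n from the peak of one pseudo-WVD time slice.

    The lag kernel z(n+m) z*(n-m) oscillates at twice the IF, so the estimate
    is valid only for IF below fs/4.
    """
    z = analytic_signal(x)
    L = min(half_lag, n, len(z) - 1 - n)  # keep n +/- m inside the signal
    if nfft <= 2 * L:
        raise ValueError("nfft must exceed the lag span")
    m = np.arange(-L, L + 1)
    kernel = np.zeros(nfft, dtype=complex)
    kernel[m % nfft] = np.hanning(2 * L + 1) * z[n + m] * np.conj(z[n - m])
    W = np.fft.fft(kernel).real  # Hermitian kernel in m gives a real spectrum
    k = np.argmax(W[: nfft // 2])
    return k * fs / (2 * nfft)

# Usage: the same linear chirp, f(t) = 200 + 400 t Hz
fs = 8000
t = np.arange(0, 0.5, 1 / fs)
x = np.cos(2 * np.pi * (200 * t + 200 * t ** 2))
print(pwvd_if(x, fs, 2000))  # near 300 Hz, quantized to fs / (2 * nfft) ~ 3.9 Hz bins
```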
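The successive-approximation chirp decomposition in the Speech/Audio Compression item can be sketched as a greedy, matching-pursuit-style loop over Gaussian-windowed linear chirps. Assumptions to flag: each atom here has a single Gaussian envelope rather than the mixture-Gaussian envelope of [J3], parameters come from a coarse hand-picked grid, amplitude and phase are fitted jointly by least squares on a cosine/sine pair, and a fixed atom count replaces any perceptual stopping rule. All names are hypothetical.

```python
import itertools
import numpy as np

def gaussian_chirp_pair(n, tc, sigma, f, c, fs):
    """Cosine/sine Gaussian-windowed linear-chirp atoms centred at sample tc.

    sigma: envelope width (samples); f: centre frequency (Hz); c: chirp rate (Hz/s).
    """
    k = np.arange(n) - tc                 # samples relative to the atom centre
    env = np.exp(-0.5 * (k / sigma) ** 2)
    t = k / fs
    ph = 2 * np.pi * (f * t + 0.5 * c * t ** 2)
    return env * np.cos(ph), env * np.sin(ph)

def best_atom(r, fs, grid):
    """Grid-search the atom whose (cos, sin) span captures the most residual energy."""
    best_gain, best_params, best_approx = -1.0, None, None
    for params in itertools.product(*grid):
        G = np.stack(gaussian_chirp_pair(len(r), *params, fs), axis=1)
        coef, *_ = np.linalg.lstsq(G, r, rcond=None)  # amplitude and phase in one fit
        approx = G @ coef
        gain = float(approx @ approx)     # energy removed from the residual
        if gain > best_gain:
            best_gain, best_params, best_approx = gain, params, approx
    return best_params, best_approx

def decompose(x, fs, grid, n_atoms):
    """Successive approximation: subtract one best-fitting atom per iteration."""
    r = np.asarray(x, dtype=float).copy()
    atoms = []
    for _ in range(n_atoms):
        params, approx = best_atom(r, fs, grid)
        atoms.append(params)
        r = r - approx
    return atoms, r

# Usage: a signal built from two atoms that lie on the search grid
fs, n = 8000, 512
grid = ([128, 256, 384], [32, 64], [500, 1000, 1500], [0, 20000])
a_c, a_s = gaussian_chirp_pair(n, 384, 64, 1500, 20000, fs)
b_c, _ = gaussian_chirp_pair(n, 128, 32, 500, 0, fs)
x = 1.5 * a_c + 0.5 * a_s + 1.0 * b_c
atoms, r = decompose(x, fs, grid, n_atoms=2)
```

Because each least-squares fit is an orthogonal projection, the residual energy drops by exactly the `gain` of the chosen atom, which is why the greedy step maximizes it.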
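A word-boundary detector driven by the norm of cepstral vectors, as in the Speech Recognition item, could be organized as below. This is a hypothetical sketch, not the algorithm of [J1]: the threshold rule (a multiple of the median norm over leading noise-only frames), the hangover and the minimum word length are assumptions, and the input is a precomputed MFCC matrix. With real MFCCs, whether the energy-like coefficient c0 is included would strongly affect the norm; the source does not specify this, so the usage example uses synthetic frames.

```python
import numpy as np

def detect_words(mfcc, n_noise=10, ratio=2.0, hangover=3, min_len=5):
    """Return (start, end) frame ranges (end exclusive) where the MFCC-vector norm
    exceeds `ratio` times the median norm of the first `n_noise` frames."""
    norms = np.linalg.norm(mfcc, axis=1)
    thresh = ratio * np.median(norms[:n_noise])  # assumes leading frames are noise only
    active = norms > thresh
    segments, start, gap = [], None, 0
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > hangover:                   # the word ended `gap` frames ago
                end = i - gap + 1
                if end - start >= min_len:
                    segments.append((start, end))
                start, gap = None, 0
    if start is not None and len(active) - gap - start >= min_len:
        segments.append((start, len(active) - gap))
    return segments

def synthetic_noise_frames(n, offset, dim=13):
    """Deterministic low-level frames: every coefficient lies within 10% of 0.05."""
    rows = np.arange(offset, offset + n)[:, None]
    return 0.05 * (1 + 0.1 * np.sin(rows + np.arange(dim)))

# Usage: noise / word / noise / word / noise, 80 frames of 13 coefficients
mfcc = np.vstack([
    synthetic_noise_frames(20, 0), np.ones((15, 13)),
    synthetic_noise_frames(20, 35), np.ones((10, 13)),
    synthetic_noise_frames(15, 65),
])
print(detect_words(mfcc))
```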
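The 2D matched-filter idea from the Speech Recognition item can be illustrated by sliding a time-frequency template over a larger TF image and scoring each offset. As an assumption, the sketch below uses normalized cross-correlation (insensitive to gain and offset) as the matching score; the model in [J4] may differ. `matched_filter_2d` is a hypothetical name, and the TF image is random placeholder data rather than a real spectrogram.

```python
import numpy as np

def matched_filter_2d(S, T):
    """Normalized 2-D cross-correlation of template T (f x n) over TF image S (F x N).

    Returns an (F - f + 1, N - n + 1) map of scores in [-1, 1]; a score of 1 means
    the patch of S at that offset equals a * T + b for some a > 0.
    """
    f, n = T.shape
    Tz = T - T.mean()
    t_norm = np.linalg.norm(Tz)
    out = np.zeros((S.shape[0] - f + 1, S.shape[1] - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            Pz = S[i:i + f, j:j + n]
            Pz = Pz - Pz.mean()
            denom = t_norm * np.linalg.norm(Pz)
            out[i, j] = np.sum(Pz * Tz) / denom if denom > 0 else 0.0
    return out

# Usage: hide a scaled and offset copy of a patch, then find it again
rng = np.random.default_rng(0)
S = rng.random((40, 100))                # placeholder TF image (frequency x time)
T = 2.0 * S[5:15, 30:50] + 1.0           # template with a different gain and offset
scores = matched_filter_2d(S, T)
best = tuple(int(v) for v in np.unravel_index(np.argmax(scores), scores.shape))
print(best)  # offset (frequency row, time column) of the best match
```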
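A coherence-based binaural gain, in the spirit of the Speech Enhancement item, can be sketched as follows: auto- and cross-spectra of the two ear signals are smoothed recursively over STFT frames, and the magnitude-squared coherence in each bin serves as a gain on the channel average. This is a single-pass simplification assumed for illustration, not the iterative coherence filter of [C3]; the smoothing constant, STFT sizes and function names are hypothetical, and resynthesis by overlap-add is omitted.

```python
import numpy as np

def stft(x, nfft=256, hop=128):
    """Hann-windowed short-time Fourier transform; rows are frames."""
    w = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    frames = np.stack([x[i * hop:i * hop + nfft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def coherence_gain(L, R, alpha=0.9, eps=1e-12):
    """Per-bin gain = recursively smoothed magnitude-squared coherence of L and R.

    Components common to both ears give values near 1; independent noise in
    the two ears gives values near 0.
    """
    pll = np.zeros(L.shape[1])
    prr = np.zeros(L.shape[1])
    plr = np.zeros(L.shape[1], dtype=complex)
    G = np.empty(L.shape)
    for m in range(L.shape[0]):
        pll = alpha * pll + (1 - alpha) * np.abs(L[m]) ** 2
        prr = alpha * prr + (1 - alpha) * np.abs(R[m]) ** 2
        plr = alpha * plr + (1 - alpha) * L[m] * np.conj(R[m])
        G[m] = np.abs(plr) ** 2 / (pll * prr + eps)
    return G

# Usage: a 1 kHz tone common to both ears plus independent noise in each ear
fs = 8000
rng = np.random.default_rng(1)
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
left = tone + 0.3 * rng.standard_normal(t.size)
right = tone + 0.3 * rng.standard_normal(t.size)
L, R = stft(left), stft(right)
G = coherence_gain(L, R)
enhanced = G * 0.5 * (L + R)  # gain applied to the channel average (bin 32 = 1 kHz)
```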
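Linear HRIR interpolation for a moving source, as in the Audio Processing item, can be sketched as below: the HRIR at an intermediate azimuth is a weighted average of the two measured neighbours, and the source is filtered block by block with an HRIR updated every block. The azimuth grid, block size and random placeholder HRIRs are assumptions (real use would load a measured HRIR set for each ear), and the function names are hypothetical.

```python
import numpy as np

def interpolate_hrir(grid_deg, hrirs, az_deg):
    """Linearly interpolate between the two measured HRIRs that bracket az_deg.

    grid_deg: sorted measurement azimuths (degrees); hrirs: array (len(grid_deg), taps).
    """
    grid = np.asarray(grid_deg, dtype=float)
    if not grid[0] <= az_deg <= grid[-1]:
        raise ValueError("azimuth outside the measured range")
    j = min(int(np.searchsorted(grid, az_deg, side="right")) - 1, len(grid) - 2)
    a = (az_deg - grid[j]) / (grid[j + 1] - grid[j])  # 0 at grid[j], 1 at grid[j + 1]
    return (1 - a) * hrirs[j] + a * hrirs[j + 1]

def render_moving_source(x, grid_deg, hrirs, az_per_block, block=256):
    """Block-wise filtering of mono x for one ear, updating the HRIR every block.

    Block outputs are combined by overlap-add, so a fixed azimuth reproduces
    plain convolution with that HRIR.
    """
    taps = hrirs.shape[1]
    y = np.zeros(len(x) + taps - 1)
    for b, start in enumerate(range(0, len(x), block)):
        h = interpolate_hrir(grid_deg, hrirs, az_per_block[b])
        seg = x[start:start + block]
        y[start:start + len(seg) + taps - 1] += np.convolve(seg, h)
    return y

# Usage with placeholder data (not measured HRIRs)
rng = np.random.default_rng(2)
grid = [0, 15, 30]
hrirs = rng.standard_normal((3, 32))
x = rng.standard_normal(1000)
mid = interpolate_hrir(grid, hrirs, 7.5)                 # halfway between 0 and 15 degrees
y = render_moving_source(x, grid, hrirs, [0, 10, 20, 30])  # source sweeping across the grid
```

Switching filters at block boundaries can produce audible discontinuities; crossfading between the outputs of the old and new HRIRs is a common remedy that this sketch does not attempt.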
Future work:
  • Resolution/robustness trade-off in time-varying IF estimation using level-crossing (LC) information.
  • Speech/Audio analysis/synthesis using polynomial chirps.
  • Discriminant training of speech patterns using growing HMMs and growing neural networks.
  • Binaural speech enhancement using an estimate of the head-related transfer function (HRTF).
  • Real-time audio morphing for audio effects.