
Audio Understanding

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-09-16 | An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems | Hitesh Tulsiani et al. | 2409.10515v1 | null |
| 2024-09-16 | MusicLIME: Explainable Multimodal Music Understanding | Theodoros Sotirou et al. | 2409.10496v1 | link |
| 2024-09-16 | Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages | Ming-Hao Hsu et al. | 2409.10429v1 | null |
| 2024-09-16 | Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement | Wenze Ren et al. | 2409.10376v1 | null |
| 2024-09-16 | Ultra-Low Latency Speech Enhancement - A Comprehensive Study | Haibin Wu et al. | 2409.10358v1 | null |
| 2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et al. | 2409.10357v1 | null |
| 2024-09-16 | DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis | Fa-Ting Hong et al. | 2409.10281v1 | null |
| 2024-09-16 | oboVox Far Field Speaker Recognition: A Novel Data Augmentation Approach with Pretrained Models | Muhammad Sudipto Siam Dip et al. | 2409.10240v1 | null |
| 2024-09-16 | Speech as a Biomarker for Disease Detection | Catarina Botelho et al. | 2409.10230v1 | null |
| 2024-09-16 | RF-GML: Reference-Free Generative Machine Listener | Arijit Biswas et al. | 2409.10210v1 | null |
| 2024-09-16 | Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization | Xiaoxue Gao et al. | 2409.10157v1 | null |
| 2024-09-16 | Room impulse response prototyping using receiver distance estimations for high quality room equalisation algorithms | James Brooks-Park et al. | 2409.10131v1 | null |
| 2024-09-16 | Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT | Ryota Komatsu et al. | 2409.10103v1 | link |
| 2024-09-16 | Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge | Shuiyun Liu et al. | 2409.10076v1 | null |
| 2024-09-16 | Speaker Contrastive Learning for Source Speaker Tracing | Qing Wang et al. | 2409.10072v1 | null |
| 2024-09-16 | StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion | Yinghao Aaron Li et al. | 2409.10058v1 | null |
| 2024-09-16 | TBDM-Net: Bidirectional Dense Networks with Gender Information for Speech Emotion Recognition | Vlad Striletchi et al. | 2409.10056v1 | null |
| 2024-09-16 | Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments | Wessel Ledder et al. | 2409.10048v1 | null |
| 2024-09-16 | DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval | Yifei Xin et al. | 2409.10025v1 | null |
| 2024-09-16 | DNN-based ensemble singing voice synthesis with interactions between singers | Hiroaki Hyodo et al. | 2409.09988v1 | null |
| 2024-09-16 | A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models | Ryandhimas E. Zezario et al. | 2409.09914v1 | null |
| 2024-09-15 | Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning | Siqi Sun et al. | 2409.09891v1 | null |
| 2024-09-15 | Constructing a Singing Style Caption Dataset | Hyunjong Ok et al. | 2409.09866v1 | link |
| 2024-09-15 | Efficient Video to Audio Mapper with Visual Scene Detection | Mingjing Yi et al. | 2409.09823v1 | null |
| 2024-09-15 | Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition | Chao-Han Huck Yang et al. | 2409.09785v2 | null |
| 2024-09-15 | Self-supervised Multimodal Speech Representations for the Assessment of Schizophrenia Symptoms | Gowtham Premananth et al. | 2409.09733v1 | null |
| 2024-09-15 | A Comprehensive Methodological Survey of Human Activity Recognition Across Divers Data Modalities | Jungpil Shin et al. | 2409.09678v1 | null |
| 2024-09-15 | Self-supervised Learning for Acoustic Few-Shot Classification | Jingyong Liang et al. | 2409.09647v1 | null |
| 2024-09-15 | Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement | Yudong Yang et al. | 2409.09642v1 | null |
| 2024-09-15 | Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection | Xuanru Zhou et al. | 2409.09621v1 | link |