Audio Understanding

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-06-26 | GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation | Wentao Hu et al. | 2506.21513v1 | null |
| 2025-06-26 | SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture | Kehan Sui et al. | 2506.21478v1 | null |
| 2025-06-26 | Aligning Spoken Dialogue Models from User Interactions | Anne Wu et al. | 2506.21463v1 | null |
| 2025-06-26 | ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Huadai Liu et al. | 2506.21448v1 | null |
| 2025-06-26 | Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform | Maxime Leiber et al. | 2506.21440v1 | null |
| 2025-06-26 | Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort | Franco Rugolon et al. | 2506.21429v1 | null |
| 2025-06-26 | Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings | Ghazal Al-Shwayyat et al. | 2506.21386v1 | null |
| 2025-06-26 | Exploring Adapter Design Tradeoffs for Low Resource Music Generation | Atharva Mehta et al. | 2506.21298v1 | null |
| 2025-06-26 | Integrating Vehicle Acoustic Data for Enhanced Urban Traffic Management: A Study on Speed Classification in Suzhou | Pengfei Fan et al. | 2506.21269v1 | null |
| 2025-06-26 | Prompt-Guided Turn-Taking Prediction | Koji Inoue et al. | 2506.21191v1 | null |
| 2025-06-26 | Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4 | Jongyeon Park et al. | 2506.21174v1 | null |
| 2025-06-26 | A Hierarchical Deep Learning Approach for Minority Instrument Detection | Dylan Sechet et al. | 2506.21167v1 | null |
| 2025-06-26 | Post-training for Deepfake Speech Detection | Wanying Ge et al. | 2506.21090v1 | null |
| 2025-06-26 | PeakNetFP: Peak-based Neural Audio Fingerprinting Robust to Extreme Time Stretching | Guillem Cortès-Sebastià et al. | 2506.21086v1 | null |
| 2025-06-26 | CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate | Hankun Wang et al. | 2506.21074v1 | null |
| 2025-06-26 | Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance | Akio Hayakawa et al. | 2506.20995v1 | null |
| 2025-06-26 | OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs | Yiman Zhang et al. | 2506.20960v1 | null |
| 2025-06-26 | A Multi-Stage Framework for Multimodal Controllable Speech Synthesis | Rui Niu et al. | 2506.20945v1 | null |
| 2025-06-25 | Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers | Furkan Mumcu et al. | 2506.20816v1 | null |
| 2025-06-25 | Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings | Ankit Shah et al. | 2506.20609v1 | null |
| 2025-06-25 | Multimodal Representation Learning and Fusion | Qihang Jin et al. | 2506.20494v1 | null |
| 2025-06-25 | The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models | Yi Wang et al. | 2506.20361v1 | null |
| 2025-06-25 | Feature Hallucination for Self-supervised Action Recognition | Lei Wang et al. | 2506.20342v1 | null |
| 2025-06-25 | Malicious earworms and useful memes, how the far-right surfs on TikTok audio trends | Marloes Geboers et al. | 2506.20695v1 | null |
| 2025-06-25 | Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR | Aleš Pražák et al. | 2506.20288v1 | null |
| 2025-06-25 | CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment | Papa Séga Wade et al. | 2506.20243v1 | null |
| 2025-06-25 | An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | Marie Kunešová et al. | 2506.20190v1 | null |
| 2025-06-25 | MEL: Multi-level Ensemble Learning for Resource-Constrained Environments | Krishna Praneet Gudipaty et al. | 2506.20094v1 | null |
| 2025-06-24 | Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons | Dengyu Wu et al. | 2506.20015v1 | null |
| 2025-06-24 | Improved Topology-Independent Distributed Adaptive Node-Specific Signal Estimation for Wireless Acoustic Sensor Networks | Paul Didier et al. | 2506.20001v1 | null |