2025-06-26 |
GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation |
Wentao Hu et.al. |
2506.21513v1 |
null |
2025-06-26 |
SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture |
Kehan Sui et.al. |
2506.21478v1 |
null |
2025-06-26 |
Aligning Spoken Dialogue Models from User Interactions |
Anne Wu et.al. |
2506.21463v1 |
null |
2025-06-26 |
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing |
Huadai Liu et.al. |
2506.21448v1 |
null |
2025-06-26 |
Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform |
Maxime Leiber et.al. |
2506.21440v1 |
null |
2025-06-26 |
Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort |
Franco Rugolon et.al. |
2506.21429v1 |
null |
2025-06-26 |
Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings |
Ghazal Al-Shwayyat et.al. |
2506.21386v1 |
null |
2025-06-26 |
Exploring Adapter Design Tradeoffs for Low Resource Music Generation |
Atharva Mehta et.al. |
2506.21298v1 |
null |
2025-06-26 |
Integrating Vehicle Acoustic Data for Enhanced Urban Traffic Management: A Study on Speed Classification in Suzhou |
Pengfei Fan et.al. |
2506.21269v1 |
null |
2025-06-26 |
Prompt-Guided Turn-Taking Prediction |
Koji Inoue et.al. |
2506.21191v1 |
null |
2025-06-26 |
Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4 |
Jongyeon Park et.al. |
2506.21174v1 |
null |
2025-06-26 |
A Hierarchical Deep Learning Approach for Minority Instrument Detection |
Dylan Sechet et.al. |
2506.21167v1 |
null |
2025-06-26 |
Post-training for Deepfake Speech Detection |
Wanying Ge et.al. |
2506.21090v1 |
null |
2025-06-26 |
PeakNetFP: Peak-based Neural Audio Fingerprinting Robust to Extreme Time Stretching |
Guillem Cortès-Sebastià et.al. |
2506.21086v1 |
null |
2025-06-26 |
CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate |
Hankun Wang et.al. |
2506.21074v1 |
null |
2025-06-26 |
Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance |
Akio Hayakawa et.al. |
2506.20995v1 |
null |
2025-06-26 |
OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs |
Yiman Zhang et.al. |
2506.20960v1 |
null |
2025-06-26 |
A Multi-Stage Framework for Multimodal Controllable Speech Synthesis |
Rui Niu et.al. |
2506.20945v1 |
null |
2025-06-25 |
Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers |
Furkan Mumcu et.al. |
2506.20816v1 |
null |
2025-06-25 |
Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings |
Ankit Shah et.al. |
2506.20609v1 |
null |
2025-06-25 |
Multimodal Representation Learning and Fusion |
Qihang Jin et.al. |
2506.20494v1 |
null |
2025-06-25 |
The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models |
Yi Wang et.al. |
2506.20361v1 |
null |
2025-06-25 |
Feature Hallucination for Self-supervised Action Recognition |
Lei Wang et.al. |
2506.20342v1 |
null |
2025-06-25 |
Malicious earworms and useful memes, how the far-right surfs on TikTok audio trends |
Marloes Geboers et.al. |
2506.20695v1 |
null |
2025-06-25 |
Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR |
Aleš Pražák et.al. |
2506.20288v1 |
null |
2025-06-25 |
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment |
Papa Séga Wade et.al. |
2506.20243v1 |
null |
2025-06-25 |
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS |
Marie Kunešová et.al. |
2506.20190v1 |
null |
2025-06-25 |
MEL: Multi-level Ensemble Learning for Resource-Constrained Environments |
Krishna Praneet Gudipaty et.al. |
2506.20094v1 |
null |
2025-06-24 |
Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons |
Dengyu Wu et.al. |
2506.20015v1 |
null |
2025-06-24 |
Improved Topology-Independent Distributed Adaptive Node-Specific Signal Estimation for Wireless Acoustic Sensor Networks |
Paul Didier et.al. |
2506.20001v1 |
null |