| 2025-06-26 | SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture | Kehan Sui et al. | 2506.21478v1 | null |
| 2025-06-26 | A Keyword-Based Technique to Evaluate Broad Question Answer Script | Tamim Al Mahmud et al. | 2506.21461v1 | null |
| 2025-06-26 | Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings | Ghazal Al-Shwayyat et al. | 2506.21386v1 | null |
| 2025-06-26 | Prompt-Guided Turn-Taking Prediction | Koji Inoue et al. | 2506.21191v1 | null |
| 2025-06-25 | The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models | Yi Wang et al. | 2506.20361v1 | null |
| 2025-06-25 | CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment | Papa Séga Wade et al. | 2506.20243v1 | null |
| 2025-06-25 | An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | Marie Kunešová et al. | 2506.20190v1 | null |
| 2025-06-24 | Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation | Jaejun Lee et al. | 2506.19446v1 | null |
| 2025-06-24 | Learning to assess subjective impressions from speech | Yuto Kondo et al. | 2506.19335v1 | null |
| 2025-06-23 | Selecting N-lowest scores for training MOS prediction models | Yuto Kondo et al. | 2506.18326v1 | null |
| 2025-06-23 | Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting | Yuto Kondo et al. | 2506.18307v1 | null |
| 2025-06-23 | JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles | Yuto Kondo et al. | 2506.18296v1 | null |
| 2025-06-22 | Human Voice is Unique | Rita Singh et al. | 2506.18182v1 | null |
| 2025-06-22 | Causal Interventions in Bond Multi-Dealer-to-Client Platforms | Paloma Marín et al. | 2506.18147v1 | null |
| 2025-06-22 | AI Harmonizer: Expanding Vocal Expression with a Generative Neurosymbolic Music AI System | Lancelot Blanchard et al. | 2506.18143v1 | null |
| 2025-06-22 | Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings | Jason Clarke et al. | 2506.18055v1 | null |
| 2025-06-21 | Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning | Mingfei Lau et al. | 2506.17525v1 | null |
| 2025-06-19 | Unpacking Generative AI in Education: Computational Modeling of Teacher and Student Perspectives in Social Media Discourse | Paulina DeVito et al. | 2506.16412v1 | null |
| 2025-06-19 | Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching | Shoutrik Das et al. | 2506.16127v1 | null |
| 2025-06-19 | VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge | Zijing Zhao et al. | 2506.16020v1 | null |
| 2025-06-18 | PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | Shufan Li et al. | 2506.15556v1 | null |
| 2025-06-18 | "How can we learn and use AI at the same time?": Participatory Design of GenAI with High School Students | Isabella Pu et al. | 2506.15525v2 | null |
| 2025-06-18 | Foundation of Affective Computing and Interaction | Changzeng Fu et al. | 2506.15497v1 | null |
| 2025-06-18 | I Know You're Listening: Adaptive Voice for HRI | Paige Tuttösí et al. | 2506.15107v1 | null |
| 2025-06-18 | EmojiVoice: Towards long-term controllable expressivity in robot speech | Paige Tuttösí et al. | 2506.15085v1 | null |
| 2025-06-17 | A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments | Md Jahangir Alam Khondkar et al. | 2506.15000v1 | link |
| 2025-06-17 | ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors | Jongin Choi et al. | 2506.14657v1 | null |
| 2025-06-17 | Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval | Ruofan Hu et al. | 2506.14445v1 | null |
| 2025-06-17 | SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling | Tawsif Ahmed et al. | 2506.14293v3 | null |
| 2025-06-16 | Multimodal "Puppeteer": An Exploration of Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality | Yuchong Zhang et al. | 2506.13189v1 | null |