Singing Voice Synthesis and Conversion

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-06-26 | SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture | Kehan Sui et al. | 2506.21478v1 | null |
| 2025-06-26 | A Keyword-Based Technique to Evaluate Broad Question Answer Script | Tamim Al Mahmud et al. | 2506.21461v1 | null |
| 2025-06-26 | Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings | Ghazal Al-Shwayyat et al. | 2506.21386v1 | null |
| 2025-06-26 | Prompt-Guided Turn-Taking Prediction | Koji Inoue et al. | 2506.21191v1 | null |
| 2025-06-25 | The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models | Yi Wang et al. | 2506.20361v1 | null |
| 2025-06-25 | CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment | Papa Séga Wade et al. | 2506.20243v1 | null |
| 2025-06-25 | An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | Marie Kunešová et al. | 2506.20190v1 | null |
| 2025-06-24 | Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation | Jaejun Lee et al. | 2506.19446v1 | null |
| 2025-06-24 | Learning to assess subjective impressions from speech | Yuto Kondo et al. | 2506.19335v1 | null |
| 2025-06-23 | Selecting N-lowest scores for training MOS prediction models | Yuto Kondo et al. | 2506.18326v1 | null |
| 2025-06-23 | Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting | Yuto Kondo et al. | 2506.18307v1 | null |
| 2025-06-23 | JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles | Yuto Kondo et al. | 2506.18296v1 | null |
| 2025-06-22 | Human Voice is Unique | Rita Singh et al. | 2506.18182v1 | null |
| 2025-06-22 | Causal Interventions in Bond Multi-Dealer-to-Client Platforms | Paloma Marín et al. | 2506.18147v1 | null |
| 2025-06-22 | AI Harmonizer: Expanding Vocal Expression with a Generative Neurosymbolic Music AI System | Lancelot Blanchard et al. | 2506.18143v1 | null |
| 2025-06-22 | Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings | Jason Clarke et al. | 2506.18055v1 | null |
| 2025-06-21 | Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning | Mingfei Lau et al. | 2506.17525v1 | null |
| 2025-06-19 | Unpacking Generative AI in Education: Computational Modeling of Teacher and Student Perspectives in Social Media Discourse | Paulina DeVito et al. | 2506.16412v1 | null |
| 2025-06-19 | Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching | Shoutrik Das et al. | 2506.16127v1 | null |
| 2025-06-19 | VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge | Zijing Zhao et al. | 2506.16020v1 | null |
| 2025-06-18 | PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | Shufan Li et al. | 2506.15556v1 | null |
| 2025-06-18 | "How can we learn and use AI at the same time?": Participatory Design of GenAI with High School Students | Isabella Pu et al. | 2506.15525v2 | null |
| 2025-06-18 | Foundation of Affective Computing and Interaction | Changzeng Fu et al. | 2506.15497v1 | null |
| 2025-06-18 | I Know You're Listening: Adaptive Voice for HRI | Paige Tuttösí et al. | 2506.15107v1 | null |
| 2025-06-18 | EmojiVoice: Towards long-term controllable expressivity in robot speech | Paige Tuttösí et al. | 2506.15085v1 | null |
| 2025-06-17 | A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments | Md Jahangir Alam Khondkar et al. | 2506.15000v1 | link |
| 2025-06-17 | ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors | Jongin Choi et al. | 2506.14657v1 | null |
| 2025-06-17 | Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval | Ruofan Hu et al. | 2506.14445v1 | null |
| 2025-06-17 | SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling | Tawsif Ahmed et al. | 2506.14293v3 | null |
| 2025-06-16 | Multimodal "Puppeteer": An Exploration of Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality | Yuchong Zhang et al. | 2506.13189v1 | null |