Singing Voice Synthesis and Conversion

Publish Date	Title	Authors	PDF (arXiv ID)	Code
2025-06-26	SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture	Kehan Sui et al.	2506.21478v1	null
2025-06-26	A Keyword-Based Technique to Evaluate Broad Question Answer Script	Tamim Al Mahmud et al.	2506.21461v1	null
2025-06-26	Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings	Ghazal Al-Shwayyat et al.	2506.21386v1	null
2025-06-26	Prompt-Guided Turn-Taking Prediction	Koji Inoue et al.	2506.21191v1	null
2025-06-25	The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models	Yi Wang et al.	2506.20361v1	null
2025-06-25	CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment	Papa Séga Wade et al.	2506.20243v1	null
2025-06-25	An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS	Marie Kunešová et al.	2506.20190v1	null
2025-06-24	Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation	Jaejun Lee et al.	2506.19446v1	null
2025-06-24	Learning to assess subjective impressions from speech	Yuto Kondo et al.	2506.19335v1	null
2025-06-23	Selecting N-lowest scores for training MOS prediction models	Yuto Kondo et al.	2506.18326v1	null
2025-06-23	Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting	Yuto Kondo et al.	2506.18307v1	null
2025-06-23	JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles	Yuto Kondo et al.	2506.18296v1	null
2025-06-22	Human Voice is Unique	Rita Singh et al.	2506.18182v1	null
2025-06-22	Causal Interventions in Bond Multi-Dealer-to-Client Platforms	Paloma Marín et al.	2506.18147v1	null
2025-06-22	AI Harmonizer: Expanding Vocal Expression with a Generative Neurosymbolic Music AI System	Lancelot Blanchard et al.	2506.18143v1	null
2025-06-22	Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings	Jason Clarke et al.	2506.18055v1	null
2025-06-21	Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning	Mingfei Lau et al.	2506.17525v1	null
2025-06-19	Unpacking Generative AI in Education: Computational Modeling of Teacher and Student Perspectives in Social Media Discourse	Paulina DeVito et al.	2506.16412v1	null
2025-06-19	Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching	Shoutrik Das et al.	2506.16127v1	null
2025-06-19	VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge	Zijing Zhao et al.	2506.16020v1	null
2025-06-18	PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction	Shufan Li et al.	2506.15556v1	null
2025-06-18	"How can we learn and use AI at the same time?": Participatory Design of GenAI with High School Students	Isabella Pu et al.	2506.15525v2	null
2025-06-18	Foundation of Affective Computing and Interaction	Changzeng Fu et al.	2506.15497v1	null
2025-06-18	I Know You're Listening: Adaptive Voice for HRI	Paige Tuttösí et al.	2506.15107v1	null
2025-06-18	EmojiVoice: Towards long-term controllable expressivity in robot speech	Paige Tuttösí et al.	2506.15085v1	null
2025-06-17	A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments	Md Jahangir Alam Khondkar et al.	2506.15000v1	link
2025-06-17	ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors	Jongin Choi et al.	2506.14657v1	null
2025-06-17	Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval	Ruofan Hu et al.	2506.14445v1	null
2025-06-17	SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling	Tawsif Ahmed et al.	2506.14293v3	null
2025-06-16	Multimodal "Puppeteer": An Exploration of Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality	Yuchong Zhang et al.	2506.13189v1	null