Singing Voice Synthesis and Conversion

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-09-16 | An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems | Hitesh Tulsiani et al. | 2409.10515v1 | null |
| 2024-09-16 | Voice control interface for surgical robot assistants | Ana Davila et al. | 2409.10225v1 | null |
| 2024-09-16 | Speaker Contrastive Learning for Source Speaker Tracing | Qing Wang et al. | 2409.10072v1 | null |
| 2024-09-16 | StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion | Yinghao Aaron Li et al. | 2409.10058v1 | null |
| 2024-09-16 | DNN-based ensemble singing voice synthesis with interactions between singers | Hiroaki Hyodo et al. | 2409.09988v1 | null |
| 2024-09-15 | Constructing a Singing Style Caption Dataset | Hyunjong Ok et al. | 2409.09866v1 | link |
| 2024-09-15 | Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition | Chao-Han Huck Yang et al. | 2409.09785v2 | null |
| 2024-09-14 | MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion | Sho Inoue et al. | 2409.09352v1 | null |
| 2024-09-14 | DSCLAP: Domain-Specific Contrastive Language-Audio Pre-Training | Shengqiang Liu et al. | 2409.09289v1 | null |
| 2024-09-14 | M$^{3}$V: A multi-modal multi-view approach for Device-Directed Speech Detection | Anna Wang et al. | 2409.09284v1 | null |
| 2024-09-14 | SafeEar: Content Privacy-Preserving Audio Deepfake Detection | Xinfeng Li et al. | 2409.09272v1 | null |
| 2024-09-13 | Seed-Music: A Unified Framework for High Quality and Controlled Music Generation | Ye Bai et al. | 2409.09214v1 | null |
| 2024-09-13 | HLTCOE JHU Submission to the Voice Privacy Challenge 2024 | Henry Li Xinyuan et al. | 2409.08913v2 | null |
| 2024-09-13 | DualSep: A Light-weight dual-encoder convolutional recurrent network for real-time in-car speech separation | Ziqian Wang et al. | 2409.08610v1 | null |
| 2024-09-13 | Effective Integration of KAN for Keyword Spotting | Anfeng Xu et al. | 2409.08605v1 | null |
| 2024-09-13 | LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling | Yubo Huang et al. | 2409.08583v1 | null |
| 2024-09-13 | Incorporating Procedural Fairness in Flag Submissions on Social Media Platforms | Yunhee Shim et al. | 2409.08498v1 | null |
| 2024-09-13 | Beyond Functionality: Co-Designing Voice User Interfaces for Older Adults' Well-being | Xinhui Hu et al. | 2409.08449v1 | null |
| 2024-09-12 | Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations | Wangjin Zhou et al. | 2409.08039v1 | null |
| 2024-09-12 | Detecting and Defending Against Adversarial Attacks on Automatic Speech Recognition via Diffusion Models | Nikolai L. Kühne et al. | 2409.07936v1 | null |
| 2024-09-12 | Bridging Discrete and Continuous: A Multimodal Strategy for Complex Emotion Detection | Jiehui Jia et al. | 2409.07901v1 | null |
| 2024-09-11 | Echoes of Privacy: Uncovering the Profiling Practices of Voice Assistants | Tina Khezresmaeilzadeh et al. | 2409.07444v2 | null |
| 2024-09-11 | D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | Hong-Hanh Nguyen-Le et al. | 2409.07390v1 | null |
| 2024-09-11 | Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT | Kazuki Yamauchi et al. | 2409.07265v1 | null |
| 2024-09-11 | Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm | Yuning Wu et al. | 2409.07226v1 | link |
| 2024-09-11 | A Continual and Incremental Learning Approach for TinyML On-device Training Using Dataset Distillation and Model Size Adaption | Marcus Rüb et al. | 2409.07114v1 | null |
| 2024-09-11 | Bio-Eng-LMM AI Assist chatbot: A Comprehensive Tool for Research and Education | Ali Forootani et al. | 2409.07110v1 | null |
| 2024-09-11 | The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction | Wen-Chin Huang et al. | 2409.07001v1 | null |
| 2024-09-10 | VoiceWukong: Benchmarking Deepfake Voice Detection | Ziwei Yan et al. | 2409.06348v1 | null |
| 2024-09-10 | InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself | Chang Zeng et al. | 2409.06330v1 | null |