| 2024-09-16 | An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems | Hitesh Tulsiani et al. | 2409.10515v1 | null |
| 2024-09-16 | Voice control interface for surgical robot assistants | Ana Davila et al. | 2409.10225v1 | null |
| 2024-09-16 | Speaker Contrastive Learning for Source Speaker Tracing | Qing Wang et al. | 2409.10072v1 | null |
| 2024-09-16 | StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion | Yinghao Aaron Li et al. | 2409.10058v1 | null |
| 2024-09-16 | DNN-based ensemble singing voice synthesis with interactions between singers | Hiroaki Hyodo et al. | 2409.09988v1 | null |
| 2024-09-15 | Constructing a Singing Style Caption Dataset | Hyunjong Ok et al. | 2409.09866v1 | link |
| 2024-09-15 | Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition | Chao-Han Huck Yang et al. | 2409.09785v2 | null |
| 2024-09-14 | MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion | Sho Inoue et al. | 2409.09352v1 | null |
| 2024-09-14 | DSCLAP: Domain-Specific Contrastive Language-Audio Pre-Training | Shengqiang Liu et al. | 2409.09289v1 | null |
| 2024-09-14 | M$^{3}$V: A multi-modal multi-view approach for Device-Directed Speech Detection | Anna Wang et al. | 2409.09284v1 | null |
| 2024-09-14 | SafeEar: Content Privacy-Preserving Audio Deepfake Detection | Xinfeng Li et al. | 2409.09272v1 | null |
| 2024-09-13 | Seed-Music: A Unified Framework for High Quality and Controlled Music Generation | Ye Bai et al. | 2409.09214v1 | null |
| 2024-09-13 | HLTCOE JHU Submission to the Voice Privacy Challenge 2024 | Henry Li Xinyuan et al. | 2409.08913v2 | null |
| 2024-09-13 | DualSep: A Light-weight dual-encoder convolutional recurrent network for real-time in-car speech separation | Ziqian Wang et al. | 2409.08610v1 | null |
| 2024-09-13 | Effective Integration of KAN for Keyword Spotting | Anfeng Xu et al. | 2409.08605v1 | null |
| 2024-09-13 | LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling | Yubo Huang et al. | 2409.08583v1 | null |
| 2024-09-13 | Incorporating Procedural Fairness in Flag Submissions on Social Media Platforms | Yunhee Shim et al. | 2409.08498v1 | null |
| 2024-09-13 | Beyond Functionality: Co-Designing Voice User Interfaces for Older Adults' Well-being | Xinhui Hu et al. | 2409.08449v1 | null |
| 2024-09-12 | Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations | Wangjin Zhou et al. | 2409.08039v1 | null |
| 2024-09-12 | Detecting and Defending Against Adversarial Attacks on Automatic Speech Recognition via Diffusion Models | Nikolai L. Kühne et al. | 2409.07936v1 | null |
| 2024-09-12 | Bridging Discrete and Continuous: A Multimodal Strategy for Complex Emotion Detection | Jiehui Jia et al. | 2409.07901v1 | null |
| 2024-09-11 | Echoes of Privacy: Uncovering the Profiling Practices of Voice Assistants | Tina Khezresmaeilzadeh et al. | 2409.07444v2 | null |
| 2024-09-11 | D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | Hong-Hanh Nguyen-Le et al. | 2409.07390v1 | null |
| 2024-09-11 | Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT | Kazuki Yamauchi et al. | 2409.07265v1 | null |
| 2024-09-11 | Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm | Yuning Wu et al. | 2409.07226v1 | link |
| 2024-09-11 | A Continual and Incremental Learning Approach for TinyML On-device Training Using Dataset Distillation and Model Size Adaption | Marcus Rüb et al. | 2409.07114v1 | null |
| 2024-09-11 | Bio-Eng-LMM AI Assist chatbot: A Comprehensive Tool for Research and Education | Ali Forootani et al. | 2409.07110v1 | null |
| 2024-09-11 | The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction | Wen-Chin Huang et al. | 2409.07001v1 | null |
| 2024-09-10 | VoiceWukong: Benchmarking Deepfake Voice Detection | Ziwei Yan et al. | 2409.06348v1 | null |
| 2024-09-10 | InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself | Chang Zeng et al. | 2409.06330v1 | null |