Speech Synthesis and Conversion

Publish Date	Title	Authors	PDF	Code
2025-06-26	DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion	Yansong Qu et.al.	2506.21544v1	null
2025-06-26	MADrive: Memory-Augmented Driving Scene Modeling	Polina Karpikova et.al.	2506.21520v1	null
2025-06-26	Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge	Boyu Gou et.al.	2506.21506v1	null
2025-06-26	SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture	Kehan Sui et.al.	2506.21478v1	null
2025-06-26	ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing	Huadai Liu et.al.	2506.21448v1	null
2025-06-26	Low-metallicity massive single stars with rotation. III. Source of ionization and C-IV emission in I Zw 18	Dorottya Szécsi et.al.	2506.21442v1	null
2025-06-26	EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting	Taoyu Wu et.al.	2506.21420v1	null
2025-06-26	XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation	Bowen Chen et.al.	2506.21416v1	null
2025-06-26	DynamicBench: Evaluating Real-Time Report Generation in Large Language Models	Jingyao Li et.al.	2506.21343v1	null
2025-06-26	Rubidium Abundances in Cool Giants from High-Resolution H-band Spectra: A New Diagnostic for Galactic Chemical Evolution	Nils Ryde et.al.	2506.21332v1	null
2025-06-26	An H$α$ Cloud in the HI Tail: Recent Star Formation in the Outskirts of NGC 4258 Revealed by Nanshan 1-m Telescope	Cheng Cheng et.al.	2506.21321v1	null
2025-06-26	HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation	Diego Biagini et.al.	2506.21287v1	null
2025-06-26	FairyGen: Storied Cartoon Video from a Single Child-Drawn Character	Jiayi Zheng et.al.	2506.21272v1	null
2025-06-26	Wurtzite Boron Nitride as a potential defects host	M. Silvetti et.al.	2506.21197v1	null
2025-06-26	Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image	Pufan Li et.al.	2506.21152v1	null
2025-06-26	Learning to See in the Extremely Dark	Hai Jiang et.al.	2506.21132v1	null
2025-06-26	Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling	Hansam Cho et.al.	2506.21045v1	null
2025-06-26	User-in-the-Loop View Sampling with Error Peaking Visualization	Ayaka Yasunaga et.al.	2506.21009v1	null
2025-06-26	Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology	Qiuyi Qi et.al.	2506.21001v1	null
2025-06-26	DBMovi-GS: Dynamic View Synthesis from Blurry Monocular Video via Sparse-Controlled Gaussian Splatting	Yeon-Ji Song et.al.	2506.20998v1	null
2025-06-26	Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance	Akio Hayakawa et.al.	2506.20995v1	null
2025-06-26	Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models	Donggoo Kang et.al.	2506.20946v1	null
2025-06-26	A Multi-Stage Framework for Multimodal Controllable Speech Synthesis	Rui Niu et.al.	2506.20945v1	null
2025-06-25	3DGH: 3D Head Generation with Composable Hair and Face	Chengan He et.al.	2506.20875v1	null
2025-06-25	Nowhere left to hide: revealing realistic gravitational-wave populations in high dimensions and high resolution with PixelPop	Sofia Alvarez-Lopez et.al.	2506.20731v1	null
2025-06-25	Architectural mechanisms of a universal fault-tolerant quantum computer	Dolev Bluvstein et.al.	2506.20661v1	null
2025-06-25	PhasePoly: An Optimization Framework for Phase Polynomials in Quantum Circuits	Zihan Chen et.al.	2506.20624v1	null
2025-06-25	Video Perception Models for 3D Scene Synthesis	Rui Huang et.al.	2506.20601v1	null
2025-06-25	Interplay of magnetic ordering and charge transport in a distorted ScAl$_3$C$_3$-type GdZn$_3$As$_3$	Zhiyu Zhou et.al.	2506.20454v1	null
2025-06-25	HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling	Tobias Vontobel et.al.	2506.20452v1	null