2025-06-26 |
SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark |
Alex Costanzino et al. |
2506.21549v1 |
null |
2025-06-26 |
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation |
Xinzhuo Li et al. |
2506.21546v1 |
null |
2025-06-26 |
DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion |
Yansong Qu et al. |
2506.21544v1 |
null |
2025-06-26 |
WorldVLA: Towards Autoregressive Action World Model |
Jun Cen et al. |
2506.21539v1 |
null |
2025-06-26 |
Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval |
Hani Alomari et al. |
2506.21538v1 |
null |
2025-06-26 |
Exploring the Design Space of 3D MLLMs for CT Report Generation |
Mohammed Baharoon et al. |
2506.21535v1 |
null |
2025-06-26 |
The Kaleidoscope Survey: Strong Gravitational Lensing in Galaxy Clusters with Radial Arcs |
Catherine Cerny et al. |
2506.21531v1 |
null |
2025-06-26 |
Revealing electron-lattice decoupling by Peltier thermometry and nanoscale thermal imaging in graphene |
Saurabh Kumar Srivastav et al. |
2506.21523v1 |
null |
2025-06-26 |
Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration |
Jiahe Chen et al. |
2506.21509v1 |
null |
2025-06-26 |
Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising |
Hojat Asgariandehkordi et al. |
2506.21499v1 |
null |
2025-06-26 |
TITAN: Query-Token based Domain Adaptive Adversarial Learning |
Tajamul Ashraf et al. |
2506.21484v1 |
null |
2025-06-26 |
SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture |
Kehan Sui et al. |
2506.21478v1 |
null |
2025-06-26 |
Wild refitting for black box prediction |
Martin J. Wainwright et al. |
2506.21460v1 |
null |
2025-06-26 |
Spatial Mental Modeling from Limited Views |
Baiqiao Yin et al. |
2506.21458v1 |
null |
2025-06-26 |
A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario |
Cyrus Addy et al. |
2506.21451v1 |
null |
2025-06-26 |
Controllable 3D Placement of Objects with Scene-Aware Diffusion Models |
Mohamed Omran et al. |
2506.21446v1 |
null |
2025-06-26 |
HyperSORT: Self-Organising Robust Training with hyper-networks |
Samuel Joutard et al. |
2506.21430v1 |
null |
2025-06-26 |
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation |
Bowen Chen et al. |
2506.21416v1 |
null |
2025-06-26 |
Plasmonically Enhanced Flexural-Mode AlScN Nanoplate Resonator as Uncooled and Ultrafast IR Detector with High Responsivity |
Aurelio Venditti et al. |
2506.21412v1 |
null |
2025-06-26 |
Distributed Cross-Channel Hierarchical Aggregation for Foundation Models |
Aristeidis Tsaris et al. |
2506.21411v1 |
null |
2025-06-26 |
FastRef: Fast Prototype Refinement for Few-Shot Industrial Anomaly Detection |
Long Tian et al. |
2506.21398v1 |
null |
2025-06-26 |
High-quality metalens enables minimally invasive CFB endoscopy |
Ruixiang Song et al. |
2506.21379v1 |
null |
2025-06-26 |
GenFlow: Interactive Modular System for Image Generation |
Duc-Hung Nguyen et al. |
2506.21369v1 |
null |
2025-06-26 |
rQdia: Regularizing Q-Value Distributions With Image Augmentation |
Sam Lerman et al. |
2506.21367v1 |
null |
2025-06-26 |
CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection |
Zhixin Cheng et al. |
2506.21364v1 |
null |
2025-06-26 |
Primordial Metamaterials |
A. Ware et al. |
2506.21359v1 |
null |
2025-06-26 |
ToosiCubix: Monocular 3D Cuboid Labeling via Vehicle Part Annotations |
Behrooz Nasihatkon et al. |
2506.21358v1 |
null |
2025-06-26 |
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models |
Hongbo Liu et al. |
2506.21356v1 |
null |
2025-06-26 |
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning |
Melanie Rieff et al. |
2506.21355v1 |
null |
2025-06-26 |
Generalizable Neural Electromagnetic Inverse Scattering |
Yizhe Cheng et al. |
2506.21349v1 |
null |