| 2025-06-26 | Whole-Body Conditioned Egocentric Video Prediction | Yutong Bai et al. | 2506.21552v1 | null |
| 2025-06-26 | SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark | Alex Costanzino et al. | 2506.21549v1 | null |
| 2025-06-26 | SAM4D: Segment Anything in Camera and LiDAR Streams | Jianyun Xu et al. | 2506.21547v1 | null |
| 2025-06-26 | HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | Xinzhuo Li et al. | 2506.21546v1 | null |
| 2025-06-26 | DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion | Yansong Qu et al. | 2506.21544v1 | null |
| 2025-06-26 | StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning | Chuxin Wang et al. | 2506.21541v1 | null |
| 2025-06-26 | WorldVLA: Towards Autoregressive Action World Model | Jun Cen et al. | 2506.21539v1 | null |
| 2025-06-26 | Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | Hani Alomari et al. | 2506.21538v1 | null |
| 2025-06-26 | ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers | Nicholas S. DiBrita et al. | 2506.21537v1 | null |
| 2025-06-26 | Exploring the Design Space of 3D MLLMs for CT Report Generation | Mohammed Baharoon et al. | 2506.21535v1 | null |
| 2025-06-26 | WAFT: Warping-Alone Field Transforms for Optical Flow | Yihan Wang et al. | 2506.21526v1 | null |
| 2025-06-26 | MADrive: Memory-Augmented Driving Scene Modeling | Polina Karpikova et al. | 2506.21520v1 | null |
| 2025-06-26 | G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation | Mohammed Rakib et al. | 2506.21514v1 | null |
| 2025-06-26 | GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation | Wentao Hu et al. | 2506.21513v1 | null |
| 2025-06-26 | Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration | Jiahe Chen et al. | 2506.21509v1 | null |
| 2025-06-26 | Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems | Francesco Vitale et al. | 2506.21502v1 | null |
| 2025-06-26 | Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising | Hojat Asgariandehkordi et al. | 2506.21499v1 | null |
| 2025-06-26 | From multi-allocations to allocations, with subadditive valuations | Uriel Feige et al. | 2506.21493v1 | null |
| 2025-06-26 | Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection | Tobias J. Riedlinger et al. | 2506.21486v1 | null |
| 2025-06-26 | TITAN: Query-Token based Domain Adaptive Adversarial Learning | Tajamul Ashraf et al. | 2506.21484v1 | null |
| 2025-06-26 | An equation-based batch distillation simulation to evaluate the effect of multiplicities in thermodynamic activity coefficients | Jennifer Werner et al. | 2506.21483v1 | null |
| 2025-06-26 | Global and Local Entailment Learning for Natural World Imagery | Srikumar Sastry et al. | 2506.21476v1 | null |
| 2025-06-26 | Reinforcement Learning for Optimal Control of Spin Magnetometers | Logan W. Cooke et al. | 2506.21475v1 | null |
| 2025-06-26 | Logios : An open source Greek Polytonic Optical Character Recognition system | Perifanos Konstantinos et al. | 2506.21474v1 | null |
| 2025-06-26 | Evaluation of Traffic Signals for Daily Traffic Pattern | Mohammad Shokrolah Shirazi et al. | 2506.21469v1 | null |
| 2025-06-26 | TopK Language Models | Ryosuke Takahashi et al. | 2506.21468v1 | null |
| 2025-06-26 | Spatial Mental Modeling from Limited Views | Baiqiao Yin et al. | 2506.21458v1 | null |
| 2025-06-26 | Rethinking Oversaturation in Classifier-Free Guidance via Low Frequency | Kaiyu Song et al. | 2506.21452v1 | null |
| 2025-06-26 | A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario | Cyrus Addy et al. | 2506.21451v1 | null |
| 2025-06-26 | ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Huadai Liu et al. | 2506.21448v1 | null |