Vision Transformer

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-06-26 | Whole-Body Conditioned Egocentric Video Prediction | Yutong Bai et al. | 2506.21552v1 | null |
| 2025-06-26 | SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark | Alex Costanzino et al. | 2506.21549v1 | null |
| 2025-06-26 | SAM4D: Segment Anything in Camera and LiDAR Streams | Jianyun Xu et al. | 2506.21547v1 | null |
| 2025-06-26 | HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | Xinzhuo Li et al. | 2506.21546v1 | null |
| 2025-06-26 | DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion | Yansong Qu et al. | 2506.21544v1 | null |
| 2025-06-26 | StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning | Chuxin Wang et al. | 2506.21541v1 | null |
| 2025-06-26 | WorldVLA: Towards Autoregressive Action World Model | Jun Cen et al. | 2506.21539v1 | null |
| 2025-06-26 | Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | Hani Alomari et al. | 2506.21538v1 | null |
| 2025-06-26 | ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers | Nicholas S. DiBrita et al. | 2506.21537v1 | null |
| 2025-06-26 | Exploring the Design Space of 3D MLLMs for CT Report Generation | Mohammed Baharoon et al. | 2506.21535v1 | null |
| 2025-06-26 | WAFT: Warping-Alone Field Transforms for Optical Flow | Yihan Wang et al. | 2506.21526v1 | null |
| 2025-06-26 | MADrive: Memory-Augmented Driving Scene Modeling | Polina Karpikova et al. | 2506.21520v1 | null |
| 2025-06-26 | G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation | Mohammed Rakib et al. | 2506.21514v1 | null |
| 2025-06-26 | GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation | Wentao Hu et al. | 2506.21513v1 | null |
| 2025-06-26 | Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration | Jiahe Chen et al. | 2506.21509v1 | null |
| 2025-06-26 | Process mining-driven modeling and simulation to enhance fault diagnosis in cyber-physical systems | Francesco Vitale et al. | 2506.21502v1 | null |
| 2025-06-26 | Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising | Hojat Asgariandehkordi et al. | 2506.21499v1 | null |
| 2025-06-26 | From multi-allocations to allocations, with subadditive valuations | Uriel Feige et al. | 2506.21493v1 | null |
| 2025-06-26 | Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection | Tobias J. Riedlinger et al. | 2506.21486v1 | null |
| 2025-06-26 | TITAN: Query-Token based Domain Adaptive Adversarial Learning | Tajamul Ashraf et al. | 2506.21484v1 | null |
| 2025-06-26 | An equation-based batch distillation simulation to evaluate the effect of multiplicities in thermodynamic activity coefficients | Jennifer Werner et al. | 2506.21483v1 | null |
| 2025-06-26 | Global and Local Entailment Learning for Natural World Imagery | Srikumar Sastry et al. | 2506.21476v1 | null |
| 2025-06-26 | Reinforcement Learning for Optimal Control of Spin Magnetometers | Logan W. Cooke et al. | 2506.21475v1 | null |
| 2025-06-26 | Logios : An open source Greek Polytonic Optical Character Recognition system | Perifanos Konstantinos et al. | 2506.21474v1 | null |
| 2025-06-26 | Evaluation of Traffic Signals for Daily Traffic Pattern | Mohammad Shokrolah Shirazi et al. | 2506.21469v1 | null |
| 2025-06-26 | TopK Language Models | Ryosuke Takahashi et al. | 2506.21468v1 | null |
| 2025-06-26 | Spatial Mental Modeling from Limited Views | Baiqiao Yin et al. | 2506.21458v1 | null |
| 2025-06-26 | Rethinking Oversaturation in Classifier-Free Guidance via Low Frequency | Kaiyu Song et al. | 2506.21452v1 | null |
| 2025-06-26 | A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario | Cyrus Addy et al. | 2506.21451v1 | null |
| 2025-06-26 | ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Huadai Liu et al. | 2506.21448v1 | null |