Vision Transformer

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-09-16 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et al. | 2409.10516v1 | null |
| 2024-09-16 | Machine Learning Optimization of non-Kasha Behavior and of Transient Dynamics in Model Retinal Isomerization | Davinder Singh et al. | 2409.10505v1 | null |
| 2024-09-16 | Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles | Kulin Shah et al. | 2409.10502v1 | null |
| 2024-09-16 | Flash STU: Fast Spectral Transform Units | Y. Isabel Liu et al. | 2409.10489v2 | null |
| 2024-09-16 | Do Pre-trained Vision-Language Models Encode Object States? | Kaleb Newman et al. | 2409.10488v1 | null |
| 2024-09-16 | XLM for Autonomous Driving Systems: A Comprehensive Review | Sonda Fourati et al. | 2409.10484v1 | null |
| 2024-09-16 | Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance | Simone Maurizio La Cava et al. | 2409.10481v1 | null |
| 2024-09-16 | SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing | Qi Qian et al. | 2409.10476v1 | null |
| 2024-09-16 | MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion | Lehong Wu et al. | 2409.10473v1 | null |
| 2024-09-16 | KoroT-3E: A Personalized Musical Mnemonics Tool for Enhancing Memory Retention of Complex Computer Science Concepts | Xiangzhe Yuan et al. | 2409.10446v1 | null |
| 2024-09-16 | Deep-Wide Learning Assistance for Insect Pest Classification | Toan Nguyen et al. | 2409.10445v1 | link |
| 2024-09-16 | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | Jingpei Lu et al. | 2409.10441v1 | null |
| 2024-09-16 | Learning Semi-Supervised Medical Image Segmentation from Spatial Registration | Qianying Liu et al. | 2409.10422v1 | null |
| 2024-09-16 | HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Vineet Bhat et al. | 2409.10419v1 | null |
| 2024-09-16 | Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation | Hanbo Bi et al. | 2409.10389v1 | null |
| 2024-09-16 | Mamba-ST: State Space Model for Efficient Style Transfer | Filippo Botti et al. | 2409.10385v1 | null |
| 2024-09-16 | Learning Gentle Grasping from Human-Free Force Control Demonstration | Mingxuan Li et al. | 2409.10371v1 | null |
| 2024-09-16 | Integrated nowcasting of convective precipitation with Transformer-based models using multi-source data | Çağlar Küçük et al. | 2409.10367v1 | null |
| 2024-09-16 | Robust image representations with counterfactual contrastive learning | Mélanie Roschewitz et al. | 2409.10365v1 | link |
| 2024-09-16 | Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning | Amin Karimi Monsefi et al. | 2409.10362v1 | null |
| 2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et al. | 2409.10357v1 | null |
| 2024-09-16 | Taming Diffusion Models for Image Restoration: A Review | Ziwei Luo et al. | 2409.10353v1 | null |
| 2024-09-16 | Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation | Yifan Xu et al. | 2409.10350v1 | null |
| 2024-09-16 | MEGS: Morphological Evaluation of Galactic Structure | Ufuk Çakır et al. | 2409.10346v1 | link |
| 2024-09-16 | VAE-QWGAN: Improving Quantum GANs for High Resolution Image Generation | Aaron Mark Thomas et al. | 2409.10339v1 | null |
| 2024-09-16 | Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering | Euntae Choi et al. | 2409.10335v1 | null |
| 2024-09-16 | DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving | Songning Lai et al. | 2409.10330v1 | null |
| 2024-09-16 | InfoDisent: Explainability of Image Classification Models by Information Disentanglement | Łukasz Struski et al. | 2409.10329v1 | null |
| 2024-09-16 | Fuse4Seg: Image-Level Fusion Based Multi-Modality Medical Image Segmentation | Yuchen Guo et al. | 2409.10328v2 | null |
| 2024-09-16 | Baking Relightable NeRF for Real-time Direct/Indirect Illumination Rendering | Euntae Choi et al. | 2409.10327v1 | null |