2025-06-26 | SAM4D: Segment Anything in Camera and LiDAR Streams | Jianyun Xu et al. | 2506.21547v1 | null
2025-06-26 | Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | Hani Alomari et al. | 2506.21538v1 | null
2025-06-26 | G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation | Mohammed Rakib et al. | 2506.21514v1 | null
2025-06-26 | Global and Local Entailment Learning for Natural World Imagery | Srikumar Sastry et al. | 2506.21476v1 | null
2025-06-26 | Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort | Franco Rugolon et al. | 2506.21429v1 | null
2025-06-26 | Distributed Cross-Channel Hierarchical Aggregation for Foundation Models | Aristeidis Tsaris et al. | 2506.21411v1 | null
2025-06-26 | CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection | Zhixin Cheng et al. | 2506.21364v1 | null
2025-06-26 | SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | Melanie Rieff et al. | 2506.21355v1 | null
2025-06-26 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Qize Yang et al. | 2506.21277v1 | null
2025-06-26 | WordCon: Word-level Typography Control in Scene Text Rendering | Wenda Shi et al. | 2506.21276v1 | null
2025-06-26 | Integrating Vehicle Acoustic Data for Enhanced Urban Traffic Management: A Study on Speed Classification in Suzhou | Pengfei Fan et al. | 2506.21269v1 | null
2025-06-26 | GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models | Qifei Cui et al. | 2506.21245v1 | null
2025-06-26 | DiMPLe -- Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation | Umaima Rahman et al. | 2506.21237v1 | null
2025-06-26 | MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification | Shadman Sobhan et al. | 2506.21199v1 | null
2025-06-26 | Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion | Yuguang Zhang et al. | 2506.21144v1 | null
2025-06-26 | Semantic-aware Digital Twin for AI-based CSI Acquisition | Jiajia Guo et al. | 2506.21126v1 | null
2025-06-26 | IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes | Yujia Liang et al. | 2506.21116v1 | null
2025-06-26 | DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning | Kang He et al. | 2506.21096v1 | null
2025-06-26 | EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception | Sanjoy Chowdhury et al. | 2506.21080v1 | null
2025-06-26 | TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence | Feng Jiang et al. | 2506.21028v1 | null
2025-06-26 | LASFNet: A Lightweight Attention-Guided Self-Modulation Feature Fusion Network for Multimodal Object Detection | Lei Hao et al. | 2506.21018v1 | null
2025-06-26 | Multimodal Prompt Alignment for Facial Expression Recognition | Fuyan Ma et al. | 2506.21017v1 | null
2025-06-26 | TSDASeg: A Two-Stage Model with Direct Alignment for Interactive Point Cloud Segmentation | Chade Li et al. | 2506.20991v1 | null
2025-06-26 | EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | Xiao Zhang et al. | 2506.20986v1 | null
2025-06-26 | ThermalDiffusion: Visual-to-Thermal Image-to-Image Translation for Autonomous Navigation | Shruti Bansal et al. | 2506.20969v1 | null
2025-06-26 | OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs | Yiman Zhang et al. | 2506.20960v1 | null
2025-06-26 | Hierarchical Sub-action Tree for Continuous Sign Language Recognition | Dejie Yang et al. | 2506.20947v1 | null
2025-06-26 | A Multi-Stage Framework for Multimodal Controllable Speech Synthesis | Rui Niu et al. | 2506.20945v1 | null
2025-06-26 | E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs | Van-Hoang Phan et al. | 2506.20944v1 | null
2025-06-25 | Stellar Dynamics in Open Clusters Increases the Binary Fraction and Mass Ratios: Evidence from Photometric Binaries in 35 Open Clusters | Anna C. Childs et al. | 2506.20889v1 | null