2024-09-16 |
MusicLIME: Explainable Multimodal Music Understanding |
Theodoros Sotirou et.al. |
2409.10496v1 |
link |
2024-09-16 |
Flash STU: Fast Spectral Transform Units |
Y. Isabel Liu et.al. |
2409.10489v2 |
null |
2024-09-16 |
XLM for Autonomous Driving Systems: A Comprehensive Review |
Sonda Fourati et.al. |
2409.10484v1 |
null |
2024-09-16 |
Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation |
Hanbo Bi et.al. |
2409.10389v1 |
null |
2024-09-16 |
Nonlinear Causality in Brain Networks: With Application to Motor Imagery vs Execution |
Sipan Aslan et.al. |
2409.10374v1 |
null |
2024-09-16 |
Fuse4Seg: Image-Level Fusion Based Multi-Modality Medical Image Segmentation |
Yuchen Guo et.al. |
2409.10328v2 |
null |
2024-09-16 |
Soft modes in vector spin glass models on sparse random graphs |
Silvio Franz et.al. |
2409.10312v1 |
null |
2024-09-16 |
SOLVR: Submap Oriented LiDAR-Visual Re-Localisation |
Joshua Knights et.al. |
2409.10247v1 |
null |
2024-09-16 |
Neuromorphic Facial Analysis with Cross-Modal Supervision |
Federico Becattini et.al. |
2409.10213v1 |
null |
2024-09-16 |
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models |
Weihao Ye et.al. |
2409.10197v1 |
null |
2024-09-16 |
LiLoc: Lifelong Localization using Adaptive Submap Joining and Egocentric Factor Graph |
Yixin Fang et.al. |
2409.10172v1 |
null |
2024-09-16 |
Data-Centric Strategies for Overcoming PET/CT Heterogeneity: Insights from the AutoPET III Lesion Segmentation Challenge |
Balint Kovacs et.al. |
2409.10120v1 |
link |
2024-09-16 |
Participation Factors for Nonlinear Autonomous Dynamical Systems in the Koopman Operator Framework |
Kenji Takamichi et.al. |
2409.10105v1 |
null |
2024-09-16 |
MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior |
Weijing Tao et.al. |
2409.10090v1 |
null |
2024-09-16 |
Cross-modality image synthesis from TOF-MRA to CTA using diffusion-based models |
Alexander Koch et.al. |
2409.10089v1 |
null |
2024-09-16 |
DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion |
Yuchen Guo et.al. |
2409.10080v1 |
null |
2024-09-16 |
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion |
Yinghao Aaron Li et.al. |
2409.10058v1 |
null |
2024-09-15 |
TransForce: Transferable Force Prediction for Vision-based Tactile Sensors with Sequential Image Translation |
Zhuo Chen et.al. |
2409.09870v1 |
null |
2024-09-15 |
Physically-Consistent Parameter Identification of Robots in Contact |
Shahram Khorshidi et.al. |
2409.09850v1 |
null |
2024-09-15 |
On the Effect of Robot Errors on Human Teaching Dynamics |
Jindan Huang et.al. |
2409.09827v1 |
null |
2024-09-15 |
Efficient Video to Audio Mapper with Visual Scene Detection |
Mingjing Yi et.al. |
2409.09823v1 |
null |
2024-09-15 |
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics |
Yuxuan Liu et.al. |
2409.09811v1 |
null |
2024-09-15 |
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving |
Haisheng Su et.al. |
2409.09777v1 |
null |
2024-09-15 |
Explore the Hallucination on Low-level Perception for MLLMs |
Yinan Sun et.al. |
2409.09748v1 |
null |
2024-09-15 |
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection |
Yaning Zhang et.al. |
2409.09724v1 |
null |
2024-09-15 |
Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs |
Mengmeng Ren et.al. |
2409.09715v1 |
null |
2024-09-15 |
A Comprehensive Methodological Survey of Human Activity Recognition Across Divers Data Modalities |
Jungpil Shin et.al. |
2409.09678v1 |
null |
2024-09-15 |
Enhancing Weakly-Supervised Object Detection on Static Images through (Hallucinated) Motion |
Cagri Gungor et.al. |
2409.09616v1 |
null |
2024-09-15 |
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training |
Yiyi Tao et.al. |
2409.09582v1 |
null |
2024-09-14 |
Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment |
Ohad Cohen et.al. |
2409.09545v1 |
null |