Audio Processing and Recognition

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-09-16 | Do Pre-trained Vision-Language Models Encode Object States? | Kaleb Newman et al. | 2409.10488v1 | null |
| 2024-09-16 | MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion | Lehong Wu et al. | 2409.10473v1 | null |
| 2024-09-16 | A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration | Zhang Zheng et al. | 2409.10403v1 | null |
| 2024-09-16 | Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation | Hanbo Bi et al. | 2409.10389v1 | null |
| 2024-09-16 | VAE-QWGAN: Improving Quantum GANs for High Resolution Image Generation | Aaron Mark Thomas et al. | 2409.10339v1 | null |
| 2024-09-16 | Escaping Local Minima: Hybrid Artificial Potential Field with Wall-Follower for Decentralized Multi-Robot Navigation | Joonkyung Kim et al. | 2409.10332v1 | null |
| 2024-09-16 | MGSA: Multi-granularity Graph Structure Attention for Knowledge Graph-to-Text Generation | Shanshan Wang et al. | 2409.10294v1 | null |
| 2024-09-16 | SPAC: Sampling-based Progressive Attribute Compression for Dense Point Clouds | Xiaolong Mao et al. | 2409.10293v1 | null |
| 2024-09-16 | Anatomical Positional Embeddings | Mikhail Goncharov et al. | 2409.10291v1 | link |
| 2024-09-16 | Speech as a Biomarker for Disease Detection | Catarina Botelho et al. | 2409.10230v1 | null |
| 2024-09-16 | Garment Attribute Manipulation with Multi-level Attention | Vittorio Casula et al. | 2409.10206v1 | null |
| 2024-09-16 | Orienting gaze toward a visual target: Neurophysiological synthesis with epistemological considerations | Laurent Goffart et al. | 2409.10189v1 | null |
| 2024-09-16 | TCDformer-based Momentum Transfer Model for Long-term Sports Prediction | Hui Liu et al. | 2409.10176v1 | null |
| 2024-09-16 | Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference | Huy-Dung Nguyen et al. | 2409.10095v1 | null |
| 2024-09-16 | Messy Code Makes Managing ML Pipelines Difficult? Just Let LLMs Rewrite the Code! | Sebastian Schelter et al. | 2409.10081v1 | null |
| 2024-09-16 | DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion | Yuchen Guo et al. | 2409.10080v1 | null |
| 2024-09-16 | Learning Latent Wireless Dynamics from Channel State Information | Charbel Bou Chaaya et al. | 2409.10045v1 | null |
| 2024-09-16 | ViewActive: Active viewpoint optimization from a single image | Jiayi Wu et al. | 2409.09997v1 | null |
| 2024-09-16 | SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning | Amogh Joshi et al. | 2409.09990v1 | null |
| 2024-09-16 | Mining of Switching Sparse Networks for Missing Value Imputation in Multivariate Time Series | Kohei Obata et al. | 2409.09930v1 | null |
| 2024-09-16 | Rapid Adaptation of Earth Observation Foundation Models for Segmentation | Karthick Panner Selvam et al. | 2409.09907v1 | null |
| 2024-09-16 | Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors | Joseph Suh et al. | 2409.09905v1 | null |
| 2024-09-15 | GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion | Vitor Guizilini et al. | 2409.09896v1 | null |
| 2024-09-15 | Optimality of Motion Camouflage Under Escape Uncertainty | Mallory Gaspard et al. | 2409.09890v1 | null |
| 2024-09-15 | Constructing a Singing Style Caption Dataset | Hyunjong Ok et al. | 2409.09866v1 | link |
| 2024-09-15 | Formalizing, Normalizing, and Splitting the Energy Network Re-Dispatch for Quantum Annealing | Loong Kuan Lee et al. | 2409.09857v1 | null |
| 2024-09-15 | Latent Diffusion Models for Controllable RNA Sequence Generation | Kaixuan Huang et al. | 2409.09828v1 | null |
| 2024-09-15 | Abnormal Event Detection In Videos Using Deep Embedding | Darshan Venkatrayappa et al. | 2409.09804v1 | null |
| 2024-09-15 | CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks | Bhawna Paliwal et al. | 2409.09795v1 | null |
| 2024-09-15 | VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction | Haoyu Wu et al. | 2409.09740v2 | null |