ENHANCED SOUND EVENT LOCALIZATION AND DETECTION IN REAL 360° AUDIO-VISUAL SOUNDSCAPES

Type
Journal
Authors
Roman ( Adrian S. Roman )
Balamurugan ( Baladithya Balamurugan )
Pothuganti ( Rithik Pothuganti )
 
Category
Article
Publication Year
2024 
URL
[ private ] 
Abstract
This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. Our model leverages YOLO and DETIC object detectors. We also build a framework that implements audio-visual data augmentation and audio-visual synthetic data generation. We deliver an audio-visual SELDnet system that outperforms the existing audio-visual SELD baseline. 
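The fusion step mentioned in the abstract (merging audio and video information before the GRU) can be illustrated with a minimal sketch. The abstract does not specify how the two streams are combined, so everything here is an assumption for illustration: the function name `fuse_frames`, the per-frame concatenation, and the index-based alignment of the shorter video sequence to the audio frame rate are all hypothetical, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def fuse_frames(audio: List[Vector], video: List[Vector]) -> List[Vector]:
    """Concatenate per-frame audio and video features (assumed fusion scheme).

    Video features typically arrive at a lower frame rate than audio
    features, so each audio frame is paired with the video frame at the
    same relative position in the sequence (hypothetical alignment rule).
    """
    if not audio or not video:
        raise ValueError("both feature sequences must be non-empty")
    fused = []
    for t, audio_vec in enumerate(audio):
        # Map audio frame index t onto the shorter video timeline.
        v_idx = t * len(video) // len(audio)
        # List concatenation builds one joint feature vector per time step.
        fused.append(audio_vec + video[v_idx])
    return fused


# Toy input: 4 audio frames with 2 features each, 2 video frames with 1 feature each.
audio_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
video_feats = [[1.0], [2.0]]
fused = fuse_frames(audio_feats, video_feats)
```

In a setup like this, the fused sequence (one joint vector per time step) is what a recurrent layer such as a GRU would consume; in practice the video vectors might encode object-detector outputs such as bounding boxes, which the abstract also does not detail.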
Description
https://arxiv.org/pdf/2401.17129.pdf 