
dc.contributor.author: Selbes, Berkay
dc.contributor.author: Sert, Mustafa
dc.date.accessioned: 2024-05-03T11:53:48Z
dc.date.available: 2024-05-03T11:53:48Z
dc.date.issued: 2023
dc.identifier.isbn: 979-8-4007-0277-8 [en_US]
dc.identifier.uri: http://hdl.handle.net/11727/12049
dc.description.abstract: Video captioning aims to generate natural language sentences that describe an input video. Generating coherent sentences is challenging because video content is complex: it requires object and scene understanding, extraction of object- and event-specific auditory information, and acquisition of the relationships among objects. In this study, we address the problem of efficient modeling of object interactions in scenes, as they carry crucial information about the events in the visual scene. To this end, we propose to use object features along with auditory information to better model the audio-visual scene appearing within the video. Specifically, we extract object features with Faster R-CNN and auditory features with VGGish, and design a transformer encoder-decoder architecture in a multimodal setup. Experiments on MSR-VTT show encouraging results: object features, combined with the auditory information, model object interactions better than ResNet features. [en_US]
dc.language.iso: eng [en_US]
dc.relation.isversionof: 10.1145/3607540.3617141 [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Video captioning [en_US]
dc.subject: transformers [en_US]
dc.subject: attention [en_US]
dc.subject: NLP [en_US]
dc.subject: VGGish [en_US]
dc.subject: Faster R-CNN [en_US]
dc.subject: object feature [en_US]
dc.title: Multimodal Video Captioning Using Object-Auditory Information Fusion with Transformers [en_US]
dc.type: conferenceObject [en_US]
dc.relation.journal: 2nd Workshop on User-Centric Narrative Summarization of Long Videos (NarSUM) [en_US]
dc.identifier.startpage: 51 [en_US]
dc.identifier.endpage: 56 [en_US]
dc.identifier.wos: 001125005500008 [en_US]
dc.identifier.scopus: 2-s2.0-85178356164 [en_US]


Files in this item:

There are no files associated with this item.

