Audio Captioning with Composition of Acoustic and Semantic Information

Eren, Aysegul Ozkaya; Sert, Mustafa

dc.contributor.author	Eren, Aysegul Ozkaya
dc.contributor.author	Sert, Mustafa
dc.date.accessioned	2022-09-05T09:46:51Z
dc.date.available	2022-09-05T09:46:51Z
dc.date.issued	2021
dc.description.abstract	Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. To address this problem, previous studies mostly use encoder-decoder-based models without considering semantic information. To fill this gap, we present a novel encoder-decoder architecture using bi-directional Gated Recurrent Units (BiGRU) with audio and semantic embeddings. We extract semantic embeddings by obtaining subjects and verbs from the audio clip captions and combine these embeddings with audio embeddings to feed the BiGRU-based encoder-decoder model. To enable semantic embeddings for the test audio clips, we introduce a Multilayer Perceptron classifier to predict the semantic embeddings of those clips. We also present exhaustive experiments to show the efficiency of different features and datasets for our proposed model on the audio captioning task. To extract audio features, we use log Mel energy features, VGGish embeddings, and pretrained audio neural network (PANN) embeddings. Extensive experiments on two audio captioning datasets, Clotho and AudioCaps, show that our proposed model outperforms state-of-the-art audio captioning models across different evaluation metrics and that using the semantic information improves the captioning performance.	en_US
dc.identifier.endpage	160	en_US
dc.identifier.issn	1793-351X	en_US
dc.identifier.issue	02	en_US
dc.identifier.scopus	2-s2.0-85109474276	en_US
dc.identifier.startpage	143	en_US
dc.identifier.uri	https://arxiv.org/pdf/2105.06355.pdf
dc.identifier.uri	http://hdl.handle.net/11727/7509
dc.identifier.volume	15	en_US
dc.identifier.wos	000670288200002	en_US
dc.language.iso	eng	en_US
dc.relation.isversionof	10.1142/S1793351X21400018	en_US
dc.relation.journal	INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Audio captioning	en_US
dc.subject	PANNs	en_US
dc.subject	VGGish	en_US
dc.subject	GRU	en_US
dc.subject	BiGRU	en_US
dc.title	Audio Captioning with Composition of Acoustic and Semantic Information	en_US
dc.type	Article	en_US
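The abstract describes two data-flow steps: composing a semantic embedding (subjects and verbs taken from a clip's captions) with the audio embedding before the BiGRU encoder-decoder, and using a Multilayer Perceptron to predict that semantic embedding for test clips that have no captions. The Python sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. The fixed `LEXICON` is a hypothetical stand-in for a part-of-speech tagger, the multi-hot encoding and per-frame concatenation in `compose` are assumed fusion choices, the MLP weights are toy values, and the BiGRU encoder-decoder itself is omitted.

```python
import math
from typing import Dict, List, Sequence, Set

# HYPOTHETICAL stand-in for a part-of-speech tagger: a tiny fixed lexicon that
# marks a few words as subjects (nouns) or verbs. A real pipeline would tag the
# captions with an NLP toolkit rather than look words up in a hand-made list.
LEXICON: Dict[str, str] = {
    "dog": "subject", "man": "subject", "car": "subject", "bird": "subject",
    "barks": "verb", "talks": "verb", "passes": "verb", "sings": "verb",
}


def extract_terms(caption: str) -> Set[str]:
    """Return the subject and verb words that appear in one caption."""
    words = caption.lower().replace(",", " ").replace(".", " ").split()
    return {w for w in words if w in LEXICON}


def build_vocabulary(captions: Sequence[str]) -> List[str]:
    """Sorted list of every subject/verb term seen in the training captions."""
    terms: Set[str] = set()
    for caption in captions:
        terms |= extract_terms(caption)
    return sorted(terms)


def semantic_vector(captions: Sequence[str], vocab: List[str]) -> List[float]:
    """Multi-hot vector: 1.0 where a vocabulary term occurs in any caption of the clip."""
    present: Set[str] = set()
    for caption in captions:
        present |= extract_terms(caption)
    return [1.0 if term in present else 0.0 for term in vocab]


def compose(audio_frames: List[List[float]], semantic: List[float]) -> List[List[float]]:
    """Append the clip-level semantic vector to every audio frame.

    ASSUMPTION: the semantic vector is repeated per frame before it enters the
    encoder; the paper's exact fusion point may differ.
    """
    return [frame + semantic for frame in audio_frames]


def mlp_predict(x: List[float], w1: List[List[float]], b1: List[float],
                w2: List[List[float]], b2: List[float],
                threshold: float = 0.5) -> List[float]:
    """Two-layer MLP sketch for test clips (no captions available).

    ReLU hidden layer, then one sigmoid unit per vocabulary term; units whose
    probability exceeds `threshold` are set to 1.0, giving a predicted
    multi-hot semantic vector. Weights here are toy values, not trained ones.
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    probs = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    return [1.0 if p > threshold else 0.0 for p in probs]


if __name__ == "__main__":
    clip_captions = ["A dog barks while a car passes.", "A dog barks loudly."]
    vocab = build_vocabulary(clip_captions + ["A bird sings."])
    sem = semantic_vector(clip_captions, vocab)
    print(vocab)  # ['barks', 'bird', 'car', 'dog', 'passes', 'sings']
    print(sem)    # [1.0, 0.0, 1.0, 1.0, 1.0, 0.0]

    # Two frames of a toy 3-dimensional audio embedding (standing in for
    # log Mel, VGGish, or PANN features).
    frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    encoder_input = compose(frames, sem)
    print(len(encoder_input), len(encoder_input[0]))  # 2 9

    # Test-time path: mean-pool the frames and let the (toy) MLP predict the
    # semantic vector instead of reading it from captions.
    pooled = [sum(col) / len(frames) for col in zip(*frames)]
    pred = mlp_predict(pooled, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0],
                       [[1.0, 1.0]] * len(vocab), [-1.0, 0.0] * 3)
    print(compose(frames, pred)[0][3:])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

The list-based layout keeps the sketch dependency-free; a trained tagger in place of `LEXICON`, a trained network in place of the toy MLP weights, and the BiGRU encoder-decoder that would consume `encoder_input` are all still needed before this resembles the described system.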

Files

Original bundle

Name: ds98.pdf
Size: 812.2 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission

Collections

Faculty of Engineering
Scopus Open Access Publications
Scopus Indexed Publications Collection
WoS Open Access Publications
WoS Indexed Publications Collection