Multimodal Video Concept Classification based on Convolutional Neural Network and Audio Feature Combination

Selbes, Berkay; Sert, Mustafa

dc.contributor.author	Selbes, Berkay
dc.contributor.author	Sert, Mustafa
dc.contributor.orcID	0000-0002-7056-4245	en_US
dc.contributor.researcherID	AAB-8673-2019	en_US
dc.date.accessioned	2023-07-20T10:09:38Z
dc.date.available	2023-07-20T10:09:38Z
dc.date.issued	2017
dc.description.abstract	Video concept classification is an important task for applications such as content-based video indexing and search. In this study, we propose a multimodal video classification method based on feature-level fusion of audiovisual signals. In the proposed method, we extract Mel Frequency Cepstral Coefficient (MFCC) features from the audio part of the video signal and convolutional neural network (CNN) features from the visual part, and calculate three statistical representations of the MFCC feature vectors. We perform feature-level fusion of both modalities using the concatenation operator and train Support Vector Machine (SVM) classifiers on these multimodal features. We evaluate the effectiveness of the proposed method on the TRECVID video performance dataset for both single-modal and multimodal cases. Our results show that fusing the standard deviation representation of the audio modality with the GoogLeNet CNN features improves classification accuracy.	en_US
dc.identifier.issn	2165-0608	en_US
dc.identifier.scopus	2-s2.0-85026325770	en_US
dc.identifier.uri	http://hdl.handle.net/11727/10010
dc.identifier.wos	000413813100586	en_US
dc.language.iso	tur	en_US
dc.relation.isversionof	10.1109/SIU.2017.7960723	en_US
dc.relation.journal	25th Signal Processing and Communications Applications Conference (SIU)	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Convolutional Neural Network	en_US
dc.subject	Support Vector Machine	en_US
dc.subject	Feature Extraction	en_US
dc.subject	TRECVID	en_US
dc.subject	Video concept classification	en_US
dc.title	Multimodal Video Concept Classification based on Convolutional Neural Network and Audio Feature Combination	en_US
dc.type	Conference Object	en_US
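
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pool_mfcc` and `fuse` are hypothetical, the inputs are toy values, and of the three statistical representations only the standard deviation is named in the abstract, so the "mean" and "mean_std" modes below are assumptions. MFCC extraction, CNN feature extraction, and SVM training are left out.

```python
from statistics import fmean, pstdev


def pool_mfcc(frames, mode="std"):
    """Collapse per-frame MFCC vectors into one fixed-length vector.

    frames: list of equal-length lists, one MFCC vector per audio frame.
    mode:   "std" is the representation named in the abstract; "mean" and
            "mean_std" are assumed here for illustration only.
    """
    if not frames:
        raise ValueError("at least one MFCC frame is required")
    columns = list(zip(*frames))  # one tuple per cepstral coefficient
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population standard deviation
    if mode == "mean":
        return means
    if mode == "std":
        return stds
    if mode == "mean_std":
        return means + stds
    raise ValueError(f"unknown pooling mode: {mode}")


def fuse(visual_features, audio_features):
    """Feature-level fusion: concatenate the visual and audio vectors."""
    return list(visual_features) + list(audio_features)


# Toy inputs (made up): 3 audio frames of 2 coefficients each, and a
# 4-dimensional stand-in for a CNN descriptor of the visual stream.
mfcc_frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
cnn_features = [0.1, 0.2, 0.3, 0.4]

fused = fuse(cnn_features, pool_mfcc(mfcc_frames, mode="std"))
```

The fused vector has the visual dimensions followed by the pooled audio dimensions, and a vector of this form is what would be passed to a classifier such as an SVM.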

Collections

Mühendislik Fakültesi / Faculty of Engineering