Multimodal Video Concept Classification based on Convolutional Neural Network and Audio Feature Combination

dc.contributor.authorSelbes, Berkay
dc.contributor.authorSert, Mustafa
dc.contributor.orcID0000-0002-7056-4245en_US
dc.contributor.researcherIDAAB-8673-2019en_US
dc.date.accessioned2023-07-20T10:09:38Z
dc.date.available2023-07-20T10:09:38Z
dc.date.issued2017
dc.description.abstractVideo concept classification is an important task for applications such as content-based video indexing and search. In this study, we propose a multimodal video classification method based on feature-level fusion of audiovisual signals. We extract Mel Frequency Cepstral Coefficient (MFCC) features from the audio part of the video signal and convolutional neural network (CNN) features from the visual part, and compute three statistical representations of the MFCC feature vectors. We fuse the two modalities at the feature level using the concatenation operator and train Support Vector Machine (SVM) classifiers on the resulting multimodal features. We evaluate the effectiveness of the proposed method on the TRECVID video performance dataset for both single-modal and multimodal cases. Our results show that fusing the standard deviation representation of the audio modality with the GoogLeNet CNN features improves classification accuracy.en_US
dc.identifier.issn2165-0608en_US
dc.identifier.scopus2-s2.0-85026325770en_US
dc.identifier.urihttp://hdl.handle.net/11727/10010
dc.identifier.wos000413813100586en_US
dc.language.isoturen_US
dc.relation.isversionof10.1109/SIU.2017.7960723en_US
dc.relation.journal25th Signal Processing and Communications Applications Conference (SIU)en_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectConvolutional Neural Networken_US
dc.subjectSupport Vector Machineen_US
dc.subjectFeature Extractionen_US
dc.subjectTRECVIDen_US
dc.subjectVideo concept classificationen_US
dc.titleMultimodal Video Concept Classification based on Convolutional Neural Network and Audio Feature Combinationen_US
dc.typeConference Objecten_US
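
The feature-level fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mfcc_std` and `fuse`, the toy input values, and the use of the population standard deviation are all assumptions made for illustration. Real MFCC frames and GoogLeNet descriptors would come from an audio front end and a pretrained network, and the fused vector would then be passed to an SVM classifier.

```python
from statistics import pstdev
from typing import List, Sequence


def mfcc_std(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-coefficient standard deviation across all MFCC frames of one clip.

    Assumption: population standard deviation; the abstract does not say
    which estimator is used.
    """
    if not frames:
        raise ValueError("need at least one MFCC frame")
    # zip(*frames) regroups the values by coefficient index.
    return [pstdev(column) for column in zip(*frames)]


def fuse(audio_vec: Sequence[float], visual_vec: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate the two modality vectors."""
    return list(audio_vec) + list(visual_vec)


# Toy stand-ins (hypothetical values): 3 frames x 2 MFCC coefficients,
# and a 4-dimensional descriptor in place of a real CNN output.
mfcc_frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
cnn_features = [0.1, 0.2, 0.3, 0.4]

fused = fuse(mfcc_std(mfcc_frames), cnn_features)
print(len(fused))  # audio dimensions + visual dimensions
```

Concatenation keeps each modality's dimensions intact, so the classifier sees one vector per clip whose length is the sum of the audio and visual feature lengths.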
