Mühendislik Fakültesi / Faculty of Engineering
Permanent URI for this collection: https://hdl.handle.net/11727/1401
Item: Video Scene Classification Using Spatial Pyramid Based Features (2014)
Sert, Mustafa; Ergun, Hilal (ORCID: 0000-0002-7056-4245; ResearcherID: AAB-8673-2019)
Recognition of video scenes is a challenging problem due to the unconstrained structure of video content. Here, we propose a spatial pyramid based method for the recognition of video scenes and explore the effect of parameter optimization on recognition accuracy. In the experiments, different sampling methods, dictionary sizes, kernel methods, and pyramid levels are examined. A Support Vector Machine (SVM) is employed for classification due to its success in pattern recognition applications. Our experiments show that the dictionary size and proper pyramid levels in the feature representation drastically enhance recognition accuracy.

Item: Analysis of Deep Neural Network Models for Acoustic Scene Classification (2019)
Basbug, Ahmet Melih; Sert, Mustafa
Acoustic scene classification is one of the active fields of both the audio signal processing and machine learning communities. Due to uncontrolled environment characteristics and the wide diversity of environmental sounds, the classification of acoustic environment recordings by computer systems is a challenging task. In this study, the performance of deep learning algorithms on the acoustic scene classification problem, which involves continuous information in sound events, is analyzed. For this purpose, the success of AlexNet- and VGGish-based 4- and 8-layered convolutional neural networks utilizing long short-term memory recurrent neural network (LSTM-RNN) and gated recurrent unit recurrent neural network (GRU-RNN) architectures has been analyzed for this classification task. In this direction, we adapt the LSTM-RNN and GRU-RNN models with the 4- and 8-layered CNN architectures for the classification.
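The GRU building block named above can be sketched as a single recurrent step in plain NumPy; the layer sizes, random weights, and toy input sequence below are illustrative assumptions, not the study's actual configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: gates decide how much of the previous
    hidden state h to keep versus overwrite with new content."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
n_in, n_hid = 8, 4                            # hypothetical sizes
W = [rng.standard_normal((n_hid, n_in)) * 0.1 for _ in range(3)]
U = [rng.standard_normal((n_hid, n_hid)) * 0.1 for _ in range(3)]

h = np.zeros(n_hid)
for t in range(5):                            # a 5-frame toy feature sequence
    x = rng.standard_normal(n_in)
    h = gru_step(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
print(h.shape)  # (4,)
```

In a CNN+GRU pipeline like the one described, each `x` would be a CNN feature vector for one time frame rather than random noise.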
Our experimental results show that the 4-layered CNN with the GRU structure improves accuracy.

Item: Classification of Obstructive Sleep Apnea using Multimodal and Sigma-based Feature Representation (2019)
Memis, Gokhan; Sert, Mustafa
Obstructive sleep apnea (OSA) is a sleep disorder characterized by a decrease in blood oxygen saturation and waking up after a long time. Diagnosis requires monitoring a full night with a polysomnogram device, so there is a need for computer-based methods for the diagnosis of OSA. In this study, a method based on feature selection is proposed for OSA classification using oxygen saturation and electrocardiogram signals. Standard deviation (sigma) based features are created to increase accuracy and reduce computational complexity. To evaluate their effectiveness, the obtained features were compared with Naive Bayes (NB), k-nearest neighbor (kNN), and Support Vector Machine (SVM) classifiers. Tests performed on the PhysioNet dataset, which consists of real clinical samples, show that the use of sigma-based features results in an average performance increase of 1.98% across all test scenarios.

Item: The Effectiveness of Feature Selection Methods on Physical Activity Recognition (2018)
Memis, Gokhan; Sert, Mustafa
Monitoring physical activity over long activity durations can be costly, so there is a need for efficient computer-based algorithms. Smartphone sensors such as the accelerometer, magnetometer, and gyroscope are used for physical activity recognition in many studies. In this study, we propose a multi-modal approach that classifies different physical activities at the feature level by fusing electrocardiography (ECG), accelerometer, magnetometer, and gyroscope signals.
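The sigma-based features and feature-level fusion described in the items above can be sketched as follows; the window length, signal lengths, and random stand-in signals are assumptions for illustration, not the clinical recordings used in the studies:

```python
import numpy as np

def sigma_features(signal, win=4):
    """Per-window standard deviation ("sigma") summaries of a 1-D signal:
    a cheap, fixed-size description of its local variability."""
    n = len(signal) // win
    return np.std(signal[: n * win].reshape(n, win), axis=1)

rng = np.random.default_rng(1)
ecg  = rng.standard_normal(32)   # hypothetical stand-in for an ECG recording
spo2 = rng.standard_normal(32)   # hypothetical stand-in for oxygen saturation

# Feature-level fusion: concatenate the per-modality feature vectors
# into one vector that a single classifier (e.g. SVM) can consume.
fused = np.concatenate([sigma_features(ecg), sigma_features(spo2)])
print(fused.shape)  # (16,)
```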
We use Support Vector Machine (SVM), nearest neighbor, Naive Bayes, Random Tree, and Bagging RepTree classifiers as learning algorithms and provide comprehensive empirical results on the fusion strategy. Our experimental results on real clinical examples from the MHealth dataset show that the proposed feature-level fusion approach gives an average accuracy of 98.40% using SVM, the highest value across all scenarios. We also observe that the SVM classifier with the gyroscope signal alone, the strongest single modality, gives an average accuracy of 96.27%. We achieve a significant improvement in comparison with existing studies.

Item: Continuous Valence Prediction Using Recurrent Neural Networks with Facial Expressions and EEG Signals (2018)
Sen, Dogancan; Sert, Mustafa
Automatic analysis of human emotions by computer systems is an important task for human-machine interaction. Recent studies show that the temporal characteristics of emotions play an important role in the success of automatic recognition. The use of different signals (facial expressions, bio-signals, etc.) is also important for understanding emotions. In this study, we propose a multi-modal method based on feature-level fusion of human facial expressions and electroencephalogram (EEG) data to predict human emotions in the continuous valence dimension. For this purpose, a recurrent neural network with long short-term memory units (LSTM-RNN) is designed. The proposed method is evaluated on the MAHNOB-HCI dataset.

Item: Sentiment Analysis on Microblog Data based on Word Embedding and Fusion Techniques (2017)
Hayran, Ahmet; Sert, Mustafa (ORCID: 0000-0002-7056-4245; ResearcherID: AAB-8673-2019)
People often use social platforms to state their views and desires. Twitter is one of the most popular microblog services used for this purpose. In this study, we propose a new approach for automatically classifying the sentiment of microblog messages.
The proposed approach is based on robust feature representation and fusion. We use word embeddings as the feature representation and a Support Vector Machine as the classifier. In our approach, we first calculate statistical measures from the word embedding representations and fuse them using different combinations. Learning is performed on these fused features and tested on a Turkish tweet dataset. Results show that the proposed approach significantly reduces the dimension of the tweet representation and enhances sentiment classification accuracy. The best performance is attained by the proposed Dvot fusion technique, with an accuracy of 80.05%.

Item: Multimodal Video Concept Classification based on Convolutional Neural Network and Audio Feature Combination (2017)
Selbes, Berkay; Sert, Mustafa (ORCID: 0000-0002-7056-4245; ResearcherID: AAB-8673-2019)
Video concept classification is a very important task for applications such as content-based video indexing and searching. In this study, we propose a multi-modal video classification method based on feature-level fusion of audiovisual signals. In the proposed method, we extract Mel Frequency Cepstral Coefficient (MFCC) and convolutional neural network (CNN) features from the audio and visual parts of the video signal, respectively, and calculate three statistical representations of the MFCC feature vectors. We perform feature-level fusion of both modalities using the concatenation operator and train Support Vector Machine (SVM) classifiers on these multimodal features. We evaluate the effectiveness of the proposed method on the TRECVID video dataset for both single- and multi-modal cases.
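The statistical word-embedding fusion described above can be sketched with toy data; the 16-dimensional random embeddings and the mean/max measures are illustrative assumptions, and the papers' actual measures and combinations (such as Dvot) may differ:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy embedding table: in practice these would come from a trained
# word embedding model, not random vectors.
vocab = {w: rng.standard_normal(16) for w in "the movie was great fun".split()}

def tweet_vector(tokens, emb):
    """Fuse per-word embeddings into one fixed-size tweet representation
    via statistical measures (here: element-wise mean and max),
    then concatenate the measures."""
    M = np.stack([emb[t] for t in tokens if t in emb])
    return np.concatenate([M.mean(axis=0), M.max(axis=0)])

v = tweet_vector("the movie was great".split(), vocab)
print(v.shape)  # (32,)
```

The resulting vector has a fixed size regardless of tweet length, which is what makes it usable as an SVM input.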
Our results show that fusing the standard deviation representation of the audio modality with GoogleNet CNN features improves classification accuracy.

Item: Feature-level Fusion of Convolutional Neural Networks for Visual Object Classification (2016)
Ergun, Hilal; Sert, Mustafa (ORCID: 0000-0002-7056-4245; ResearcherID: AAB-8673-2019)
Deep learning architectures have shown great success in various computer vision applications. In this study, we investigate some of the most popular convolutional neural network (CNN) architectures, namely GoogleNet, AlexNet, VGG19, and ResNet. Furthermore, we show possible early feature fusion strategies for visual object classification tasks. Concatenation of features, average pooling, and maximum pooling are among the investigated fusion strategies. We obtain state-of-the-art results on the well-known image classification datasets Caltech-101, Caltech-256, and Pascal VOC 2007.

Item: Facial Action Unit Detection using Variable Decision Thresholds (2016)
Aksoy, Nukhet; Sert, Mustafa (ORCID: 0000-0002-7056-4245; ResearcherID: D-3080-2015)
Detection of facial action units (AUs) is an important research field for recognizing emotional states in facial expressions. Here, we propose a novel yet effective method that utilizes variable decision thresholds at the prediction stage of a binary learning method for AU detection. The method applies a thresholding technique to find optimum values for each AU and uses these values as the decision threshold of the support vector machine (SVM) algorithm. Our experiments on the Extended Cohn-Kanade (CK+) dataset show significant improvements on most of the AUs, with an average F1 score of 6.383% compared with the baseline method.

Item: Movie Rating Prediction Using Ensemble Learning and Mixed Type Attributes (2017)
Ozkaya Eren, Aysegul; Sert, Mustafa (ORCID: 0000-0002-7056-4245; ResearcherID: AAB-8673-2019)
Nowadays, audiences can easily share their ratings about a movie on the internet.
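The variable-decision-threshold idea from the action unit study above amounts to replacing a single global SVM cut-off with a per-AU value; a minimal sketch, with hypothetical decision scores and thresholds rather than the study's tuned values:

```python
import numpy as np

def predict_with_thresholds(scores, thresholds):
    """Binary detection per action unit: an AU fires when its SVM
    decision score exceeds that AU's own tuned threshold, rather
    than a single global cut-off of 0."""
    return (scores > thresholds).astype(int)

# Hypothetical SVM decision scores for 4 AUs and per-AU thresholds
scores = np.array([0.30, -0.10, 0.05, 0.80])
thresholds = np.array([0.25, -0.20, 0.10, 0.50])  # e.g. tuned per AU on a dev set
print(predict_with_thresholds(scores, thresholds))  # [1 1 0 1]
```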
Predicting movie user ratings automatically is particularly valuable for predicting box office gross in the cinema sector. As a result, movie rating prediction has become a popular application area for machine learning researchers. Although most recent studies use mostly numerical features in their analyses, handling nominal features is still an open problem. In this study, we propose a method for predicting movie user ratings based on the collaboration of numerical and nominal features and on ensemble learning. The effectiveness and performance of the proposed approach are validated on the Internet Movie Database (IMDb) dataset by comparison with different methods from the literature. Results show that using mixed data types along with ensemble learning improves movie rating prediction.
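The mixed-type feature handling and ensemble averaging described in the last item can be sketched as follows; the attribute names, genre domain, and constant base predictors are purely illustrative assumptions, not the study's actual features or learned models:

```python
import numpy as np

def one_hot(value, categories):
    """Turn a nominal attribute (e.g. genre) into numeric indicators
    so it can sit alongside numerical attributes in one vector."""
    return np.array([1.0 if value == c else 0.0 for c in categories])

genres = ["action", "comedy", "drama"]  # hypothetical nominal domain

# Mixed-type example vector: [runtime_minutes, budget_musd] + one-hot genre
x = np.concatenate([np.array([120.0, 35.0]), one_hot("comedy", genres)])

# A toy ensemble: average the outputs of several base predictors.
# Real base predictors would be trained models, not constants.
predictors = [lambda v: 6.5, lambda v: 7.0, lambda v: 7.5]
rating = np.mean([p(x) for p in predictors])
print(rating)  # 7.0
```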