Browsing by Author "Sert, Mustafa"
Now showing 1 - 20 of 49
Item: Acoustic Scene Classification Using Spatial Pyramid Pooling With Convolutional Neural Networks (2019). Basbug, Ahmet Melih; Sert, Mustafa (ORCID: 0000-0002-7056-4245; ResearcherID: AAB-8673-2019).
Automatic understanding of audio events and acoustic scenes has been an active research topic for researchers from the signal processing and machine learning communities. Recognition of acoustic scenes in real-life scenarios is a challenging task due to the diversity of environmental sounds and uncontrolled environments, and efficient methods and feature representations are needed to cope with these challenges. In this study, we address the acoustic scene classification of raw audio signals and propose a cascaded CNN architecture that uses the spatial pyramid pooling (SPP, also referred to as spatial pyramid matching) method to aggregate local features from the convolutional layers of the CNN. We use three well-known audio features, namely MFCC, Mel energy, and spectrogram, to represent audio content, and evaluate the effectiveness of the proposed CNN-SPP architecture on the DCASE 2018 acoustic scene performance dataset. Our results show that the proposed CNN-SPP architecture with the spectrogram feature improves classification accuracy.

Item: Adli uygulamalar için ses içerik analizi [Audio content analysis for forensic applications] (Başkent Üniversitesi Fen Bilimleri Enstitüsü, 2018). Sarman, Sercan; Sert, Mustafa.
The increase in violent events has heightened the importance of forensic investigations. All accessible auditory and visual data are highly valuable during the examinations performed after violent incidents; audio forensic analysis covers tasks such as determining the location where a violent incident occurred and the type of violence involved. With location-independent access to online content via smart devices and the rapid growth in the amount of content available, automatic classification of content has become increasingly important, in particular the automatic detection of content that can adversely affect children and young people. The success of studies in signal processing, especially in audio forensic analysis, shows that machine learning methods used in other areas can be applied to violent scene classification. In this thesis, we address the audio-based classification of gunshot sounds and of violent scenes in video data. For this purpose, machine learning and ensemble learning approaches are applied to the problem. The methods are examined comparatively on performance datasets, achieving classification accuracies of up to 66% in gunshot classification and 62% in violent scene classification.

Item: Analysis of Deep Neural Network Models for Acoustic Scene Classification (2019). Basbug, Ahmet Melih; Sert, Mustafa.
Acoustic scene classification is an active field in both the audio signal processing and machine learning communities. Due to uncontrolled environment characteristics and the wide diversity of environmental sounds, classifying acoustic environment recordings by computer systems is a challenging task. In this study, the performance of deep learning algorithms on the acoustic scene classification problem, which involves continuous information in sound events, is analyzed. For this purpose, the success of AlexNet- and VGGish-based 4- and 8-layer convolutional neural networks utilizing long short-term memory recurrent neural network (LSTM-RNN) and gated recurrent unit recurrent neural network (GRU-RNN) architectures is analyzed for this classification task. In this direction, we adapt the LSTM-RNN and GRU-RNN models to the 4- and 8-layer CNN architectures for classification. Our experimental results show that the 4-layer CNN with the GRU structure improves accuracy.

Item: Anomaly Detection in Smart Home Environments using Convolutional Neural Network (2021). Ercan, Naci Mert; Sert, Mustafa.
The use of smart devices in home environments has been increasing in recent years. The wireless connection of these devices to the internet enables smart homes to be built at lower cost; hence, recognizing activities in home environments and detecting possible anomalies in those activities is important for several applications.
In this study, we propose a new method based on the changepoint representation of sensor data and variable-length windowing for the recognition of abnormal activities. We present comparative analyses with different representations to demonstrate the efficacy of the proposed scheme. Our results on the WSU performance dataset show that the use of variable-length windowing improves anomaly detection performance in comparison to fixed-length windowing.

Item: Audio Based Violent Scene Classification Using Ensemble Learning (2018). Sarman, Sercan; Sert, Mustafa.
In this paper, we deal with the problem of violent scene detection. Although the visual signal has been widely used in detecting violent scenes from video data, the audio modality has not been explored as much. Moreover, in some scenarios, such as video surveillance, the visual modality can be missing or absent due to environmental conditions. Therefore, we use the audio modality of video data to decide whether a video scene is violent or not. For this purpose, we propose an ensemble learning method to classify video scenes as "violent" or "non-violent", and we provide empirical analyses for both different audio features and different classifiers. We obtain the best classification performance using the Random Forest algorithm along with the ZCR feature. We use the MediaEval Violent Scene Detection task dataset for the evaluations and obtain superior results compared with the literature, with an official MAP@100 metric of 66%.

Item: Audio Captioning Based on Combined Audio and Semantic Embeddings (2020). Eren, Aysegul Ozkaya; Sert, Mustafa.
Audio captioning is a recently proposed task for automatically generating a textual description of a given audio clip. Most existing approaches use the encoder-decoder model without using semantic information.
In this study, we propose a bi-directional gated recurrent unit (BiGRU) model based on the encoder-decoder architecture using audio and semantic embeddings. To obtain semantic embeddings, we extract subject-verb embeddings using the subjects and verbs from the audio captions, and we use a multilayer perceptron classifier to predict the subject-verb embeddings of test audio clips at the testing stage. To extract audio features, in addition to log Mel energies, we use a pretrained audio neural network (PANN) as a feature extractor, used for the first time in the audio captioning task, to explore the usability of audio embeddings in this task. We combine the audio and semantic embeddings to feed the BiGRU-based encoder-decoder model, and evaluate our model on two audio captioning datasets: Clotho and AudioCaps. Experimental results show that the proposed BiGRU-based deep model significantly outperforms the state-of-the-art results across different evaluation metrics, and that including semantic information enhances captioning performance.

Item: Audio Captioning with Composition of Acoustic and Semantic Information (2021). Eren, Aysegul Ozkaya; Sert, Mustafa.
Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. Previous studies mostly use encoder-decoder-based models without considering semantic information. To fill this gap, we present a novel encoder-decoder architecture using bi-directional gated recurrent units (BiGRU) with audio and semantic embeddings. We extract semantic embeddings by obtaining subjects and verbs from the audio clip captions and combine these embeddings with audio embeddings to feed the BiGRU-based encoder-decoder model. To enable semantic embeddings for the test audio clips, we introduce a multilayer perceptron classifier to predict them.
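The audio-plus-semantic embedding combination described above can be sketched roughly as a feature-axis concatenation. This is only a minimal illustration: the function name, the 2048-d PANN-style audio embedding, and the 300-d subject-verb embedding size are assumptions, not the authors' implementation.

```python
import numpy as np

def combine_embeddings(audio_emb, semantic_emb):
    # Repeat the clip-level semantic vector across time steps, then
    # concatenate it with the per-step audio embedding (feature axis).
    repeated = np.tile(semantic_emb, (audio_emb.shape[0], 1))
    return np.concatenate([audio_emb, repeated], axis=-1)

# Illustrative shapes: 10 time steps of a 2048-d audio embedding plus
# one 300-d subject-verb embedding for the whole clip.
audio = np.random.rand(10, 2048)
semantic = np.random.rand(300)
fused = combine_embeddings(audio, semantic)
print(fused.shape)  # (10, 2348)
```

The fused sequence would then be fed to the encoder; any sequence model (here a BiGRU, per the abstract) consumes it step by step.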
We also present exhaustive experiments to show the efficiency of different features and datasets for our proposed model on the audio captioning task. To extract audio features, we use log Mel energy features, VGGish embeddings, and pretrained audio neural network (PANN) embeddings. Extensive experiments on two audio captioning datasets, Clotho and AudioCaps, show that our proposed model outperforms state-of-the-art audio captioning models across different evaluation metrics, and that using the semantic information improves captioning performance.

Item: Audio-based Event Detection in Office Live Environments Using Optimized MFCC-SVM Approach (2015). Kucukbay, Selver Ezgi; Sert, Mustafa (ORCID: 0000-0002-7056-4245; ResearcherID: AAB-8673-2019).
Audio data contains many kinds of sounds and is an important source for multimedia applications. One of these is unstructured environmental sounds (also referred to as audio events), which have noise-like characteristics with flat spectra. Therefore, recognition methods designed for music and speech data are generally not appropriate for environmental sounds. In this paper, we propose an MFCC-SVM based approach that exploits the effect of feature representation and learner optimization for efficient recognition of audio events from audio signals. The proposed approach considers efficient representation of MFCC features using different window and hop sizes, varying the number of Mel coefficients in the analyses, as well as optimizing the SVM parameters. Moreover, 16 different audio events from the IEEE Audio and Acoustic Signal Processing (AASP) Challenge dataset, namely alert, clear throat, cough, door slam, drawer, keyboard, keys, knock, laughter, mouse, page turn, pen drop, phone, printer, speech, and switch, collected from office live environments, are used in the evaluations.
Our empirical evaluations show that, with the proposed MFCC feature representation and optimized SVM classifier, tests using 5-fold cross-validation yield scores of 62%, 58%, and 55% for precision, recall, and F-measure, respectively. Extensive experiments on audio-based event detection using the IEEE AASP Challenge dataset show the effectiveness of the proposed approach.

Item: Çevrimiçi Öğrenme Ortamlarında Öğrenme Analitiği Verileri Ve Makine Öğrenmesi Kullanarak Akademik Başarının Değerlendirilmesi [Assessment of academic success in online learning environments using learning analytics data and machine learning] (Başkent Üniversitesi Fen Bilimler Enstitüsü, 2022). Tekinarslan, Ramazan; Sert, Mustafa.
During the Covid-19 pandemic, the use of online learning environments, previously available but not widespread, increased rapidly. Predicting and classifying student success with machine learning methods on the learning analytics data generated in these environments has gained importance in recent years. To understand the relationship between learning analytics data obtained from the online learning environment and student success, this thesis addresses the prediction and classification of student success. Features related to student success are identified using correlation, feature importance, Fisher score, SelectKBest, and information gain feature selection methods; the selected features are normalized, and the data is represented with the one-hot encoding (OHE) method. We propose a method based on the OHE representation of data, feature selection, and a convolutional neural network (CNN) architecture, and, to demonstrate its efficacy, we also apply traditional machine learning algorithms: random forest (RF), multilayer perceptron (MLP), and k-nearest neighbors (k-NN). Two datasets are used: Moodle data from Başkent University's online learning environment for the 2020-2021 academic year, and the Open University (UK) online learning dataset covering 2013-2014. On the Başkent University dataset, the proposed CNN model achieves 92% accuracy in three-class classification, both with and without the OHE representation, exceeding the traditional machine learning methods. On the Open University dataset, compared with binary (fail, pass), three-class (withdrawn, fail, pass), and four-class (withdrawn, fail, pass, distinction) classification results in the literature, the proposed CNN-based architecture achieves higher accuracies of 95.43%, 88%, and 73.32%, respectively. For the prediction of students' grades, the root mean square error (RMSE) and mean absolute error (MAE) values remain below 1% for the proposed CNN model, a lower error rate than the other models. Combining feature selection, the OHE representation of data, and a CNN-based architecture, previously used separately on different datasets, is the contribution of this work to the literature.

Item: Classification of Indoor-Outdoor Location using Blood Oxygen Saturation Signal (2018). Memis, Gokhan; Sert, Mustafa (ORCID: 0000-0002-7056-4245; ResearcherID: AAB-8673-2019).
Wearable technology, one of the most significant trends in the mobile computing evolution, has been changing our daily life.
It has become increasingly popular in many different areas, such as the military, healthcare, entertainment, and education. In this paper, we aim to determine a person's indoor-outdoor location using an oxygen saturation (SpO2) sensor. To this end, we build a new dataset consisting of twelve subjects between the ages of 20 and 65 and propose an ensemble learning based method for indoor-outdoor classification. We provide comparative tests with Naive Bayes (NB), k-nearest neighbor (kNN), and support vector machine (SVM) algorithms on the dataset and present empirical results regarding SpO2 usage in different age groups. Our experimental results on real examples show that Random Forest gives the best classification rates, with an average accuracy of 69.33% across all test scenarios. We also observe that blood oxygen saturation decreases as age increases.

Item: Classification of Obstructive Sleep Apnea using Multimodal and Sigma-based Feature Representation (2019). Memis, Gokhan; Sert, Mustafa.
Obstructive sleep apnea (OSA) is a sleep disorder characterized by decreases in blood oxygen saturation and prolonged awakenings. Diagnosis conventionally requires monitoring a full night with a polysomnography device, so computer-based methods for diagnosing OSA are needed. In this study, a method based on feature selection is proposed for OSA classification using oxygen saturation and electrocardiogram signals. Standard deviation (sigma) based features are created to increase accuracy and reduce computational complexity. To evaluate their effectiveness, the obtained features are compared using Naive Bayes (NB), k-nearest neighbor (kNN), and support vector machine (SVM) classifiers.
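As a rough illustration of the sigma-based feature idea mentioned above, per-window standard deviations of a physiological signal can be computed as below. The window size and the toy SpO2 values are made-up, and this is a sketch of the general idea, not the authors' exact feature pipeline.

```python
import statistics

def sigma_features(signal, window):
    # Split the signal into non-overlapping windows and emit one
    # standard deviation ("sigma") value per window.
    return [statistics.pstdev(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, window)]

spo2 = [97, 96, 95, 90, 88, 87, 96, 97, 97, 96, 95, 96]  # toy SpO2 trace
features = sigma_features(spo2, window=4)
print(len(features))  # 3 windows -> 3 sigma values
```

Such a representation compresses each window to a single value, which is one way the abstract's claim of reduced computational complexity could come about.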
Tests performed on the PhysioNet dataset, which consists of real clinical samples, show that using sigma-based features results in an average performance increase of 1.98% across all test scenarios.

Item: Çok kipli video kavram sınıflandırılması [Multimodal video concept classification] (Başkent Üniversitesi Fen Bilimleri Enstitüsü, 2018). Selbes, Berkay; Sert, Mustafa.
With the escalation of internet usage, multimedia data is continuously produced and shared; its volume therefore grows rapidly, and automated methods are needed to analyze the content produced. Video data is an important component of multimedia data. Video content analysis, which can be defined as the automatic determination of temporal or spatial events and concepts in video content, is an important research topic for several applications, such as audio-video based surveillance and content-based search and retrieval. It is a difficult task due to the complex nature of video content and requires efficient algorithms for extracting the high-level information included in the content; the increasing size of video data makes this task more difficult. In this thesis, a method based on the fusion of audio and visual modalities for multimodal content analysis of video data is proposed and implemented on a big data platform. The proposed method is based on fusing representations of Mel-frequency cepstral coefficient (MFCC) features with convolutional neural network (CNN) features and is implemented on the Apache Spark big data platform. Its performance is evaluated on the TRECVID 2012 SIN dataset. Our results show that the multimodal approach improves the accuracy of the single-modality approach, and that the big data platform significantly reduces the computation time of the multimodal video content analysis method.

Item: Combining Acoustic and Semantic Similarity for Acoustic Scene Retrieval (2019). Sert, Mustafa; Basbug, Ahmet Melih.
Automatic retrieval of acoustic scenes in large audio collections is a challenging task due to the complex structure of these sounds. A robust and flexible retrieval system should address both the acoustic and semantic aspects of these sounds and how to combine them. In this study, we introduce an acoustic scene retrieval system that uses a combined acoustic and semantic similarity method.
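One generic way to realize a combined acoustic-semantic similarity like the one mentioned above is a weighted combination of the two scores. The convex-combination form and the `alpha` parameter below are illustrative assumptions, not necessarily the paper's exact formulation.

```python
def combined_similarity(acoustic_sim, semantic_sim, alpha=0.5):
    # Convex combination of two similarity scores assumed in [0, 1];
    # alpha weights the acoustic term, (1 - alpha) the semantic term.
    assert 0.0 <= alpha <= 1.0
    return alpha * acoustic_sim + (1 - alpha) * semantic_sim

score = combined_similarity(0.8, 0.4, alpha=0.5)
```

Ranking retrieval candidates by such a combined score lets a semantic match compensate for a weak acoustic match, and vice versa.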
To address the acoustic aspects of sound scenes, we use a cascaded convolutional neural network (CNN) with a gated recurrent unit (GRU). The acoustic similarity is calculated in feature space using the Euclidean distance, and the semantic similarity is obtained using the path similarity method of WordNet. Two performance datasets, TAU Urban Acoustic Scenes 2019 and TUT Urban Acoustic Scenes 2018, are used to compare the performance of the proposed retrieval system with the literature and with the developed baseline. Results show that the semantic similarity improves the mAP and P@k scores.

Item: Continuous Valence Prediction Using Recurrent Neural Networks with Facial Expressions and EEG Signals (2018). Sen, Dogancan; Sert, Mustafa.
Automatic analysis of human emotions by computer systems is an important task for human-machine interaction. Recent studies show that the temporal characteristics of emotions play an important role in the success of automatic recognition, and that the use of different signals (facial expressions, bio-signals, etc.) is important for understanding emotions. In this study, we propose a multimodal method based on feature-level fusion of human facial expressions and electroencephalogram (EEG) data to predict human emotions along the continuous valence dimension. For this purpose, a recurrent neural network with long short-term memory units (LSTM-RNN) is designed. The proposed method is evaluated on the MAHNOB-HCI performance dataset.

Item: Derin sinir ağlarını kullanarak uzun ve kısa videolarda zamansal eylem tanıma [Temporal action recognition in long and short videos using deep neural networks] (Başkent Üniversitesi Fen Bilimler Enstitüsü, 2023). Şahin, Yağmur; Sert, Mustafa.
Today, the vast amount of video data has increased the importance of deep learning in areas such as semantic information extraction and action recognition. Due to the complex and dynamic nature of videos, advanced modeling techniques and algorithms are needed. This thesis investigates the problem of action recognition in videos with the aim of extracting semantic information from the video content growing with digital technologies. Most existing studies focus on the classification of short videos. Within the scope of the thesis, an original model based on three-dimensional convolutional neural networks and an attention mechanism is proposed for the classification of long videos as well as short ones. This integration enhances the learning process in both short and long videos and enables accurate action detection. For long videos, the proposed model first identifies potential event boundaries using a neural network known as a region proposal network and then performs classification on the proposed video segments. Experimental studies on datasets such as HMDB, UCF, and ActivityNet show that attention mechanisms significantly improve model performance. By integrating 3D convolutional neural networks and attention mechanisms, the proposed model improves feature extraction and activity detection from videos. The model's ability to recognize various activity types and video structures was evaluated using the HMDB and UCF datasets for short video clips and the ActivityNet dataset for longer videos. In particular, on the UCF and HMDB datasets the model using the self-attention mechanism achieved high accuracy rates, while on ActivityNet the multi-head attention mechanism recognized the complex interactions in longer videos more effectively. These findings highlight the crucial role of attention mechanisms in extracting semantic information from videos and reveal the potential of deep learning methods in this area. The obtained results clearly indicate the proposed deep learning model's adaptability to different video structures and its capacity for effective information extraction.

Item: Detection of Basic Human Physical Activities With Indoor Outdoor Information Using Sigma-Based Features and Deep Learning (2019). Memis, Gokhan; Sert, Mustafa (ORCID: 0000-0002-5758-4321, 0000-0002-7056-4245; ResearcherID: AAB-8673-2019).
Devices created through developments in wearable technology are increasingly becoming a part of our daily lives; in particular, sensors have enhanced the usefulness of such devices. The aim of this paper is to detect human physical activity, along with indoor/outdoor information, using mobile phones and a separate oxygen saturation sensor. As there is no relevant dataset in the literature for this type of detection, data covering four different types of human physical activity was collected through mobile phone and oxygen saturation sensors; 12 people aged between 20 and 65 years participated in the study. During the data collection process, different physical activities under different environmental conditions were performed by the subjects in 10-minute sessions. Next, a novel deep neural network (DNN) model specifically designed for physical activity recognition is proposed. To improve accuracy and reduce computational complexity, standard deviation (sigma) based features are introduced. To evaluate its efficacy, we conducted comparisons with selected machine learning algorithms on our proposed dataset. The results on our dataset indicate that the multimodal sigma-based features give the best classification accuracy, 81.60%, using the proposed DNN method.
Furthermore, the classification accuracy of the proposed DNN method without sigma-based features was 79.04%.

Item: Development of a Decision Support System for Selection of Reviewers to Evaluate Research and Development Projects (2023). Kocak, Serdar; Ic, Yusuf Tansel; Sert, Mustafa; Atalay, Kumru Didem; Dengiz, Berna.
The evaluation of research and development (R&D) projects consists of many steps, depending on the government funding agency and the support program, and reviewer evaluation reports are observed to have a crucial impact on project support decisions. In this study, a decision support system (DSS), named R&D Reviewer, is developed to help decision-makers assign appropriate reviewers to R&D project proposals. The aim is an artificial intelligence based decision support system that enables the classification of Turkish R&D projects with natural language processing (NLP) methods; we also examine the reviewer ranking process using fuzzy multi-criteria decision-making methods. The data in the database is processed primarily to classify the R&D projects using the "Word2Vec" word embedding model. We also design a convolutional neural network (CNN) model to select features using an automatic feature learning approach, and we incorporate a new integrated hesitant fuzzy VIKOR and TOPSIS methodology into the developed DSS for the reviewer ranking process.

Item: The development of a reviewer selection method: a multi-level hesitant fuzzy VIKOR and TOPSIS approach (2021). Kocak, Serdar; Ic, Yusuf Tansel; Atalay, Kumru Didem; Sert, Mustafa; Dengiz, Berna (ORCID: 0000-0001-9274-7467; ResearcherID: AGE-3003-2022).
This paper proposes a new approach for the selection of reviewers to evaluate research and development (R&D) projects, using a new integrated hesitant fuzzy VIKOR and TOPSIS methodology.
A reviewer selection model must have a multi-level framework in which reviewer selection strategies and related objectives guide the second level, the reviewer performance ranking process. The model must measure reviewer performance on the activities that are necessary for a successful R&D project evaluation. A novel model is presented in this paper. In the proposed methodology, the aim is to select a reviewer within a hierarchical decision-making structure. The selection criteria values and their weights were obtained using the hesitant fuzzy VIKOR method, and the conventional TOPSIS model was used to select a suitable reviewer. We developed a simpler procedure for performing the reviewer selection process effectively. The new approach was tested on a real case study, and satisfactory results were obtained. A comparative analysis is also included in the article for illustrative purposes.
Item Düzenleyici DNA motiflerinin tahmini(Başkent Üniversitesi Fen Bilimleri Enstitüsü, 2009) Yıldız, Kerem; Sert, MustafaA major study in molecular biology is to understand the mechanisms that regulate the expression of genes.
An important challenge in this area is to identify regulatory elements (motifs), notably the binding sites in deoxyribonucleic acid (DNA) for transcription factors. Over the past few years, numerous tools have become available for this task. Despite the large number of proposed tools, the prediction of DNA motifs remains a complex challenge. In this study, a novel motif prediction method using a Probabilistic Suffix Tree (PST) is proposed. Experimental results are evaluated comparatively with other motif prediction tools and show that the proposed method gives a better recognition rate than the compared tools on human and mouse genomes.
Item Early and Late Level Fusion of Deep Convolutional Neural Networks for Visual Concept Recognition(2016) Ergun, Hilal; Akyuz, Yusuf Caglar; Sert, Mustafa; Liu, Jianquan; 0000-0002-7056-4245; 0000-0002-7056-4245; B-1296-2011; D-3080-2015; AAB-8673-2019Visual concept recognition has been an active research field over the last decade. Reflecting this attention, deep learning architectures are showing great promise in various computer vision domains, including image classification, object detection, event detection, and action recognition in videos. In this study, we investigate various aspects of convolutional neural networks for visual concept recognition. We analyze recent studies and different network architectures in terms of both running time and accuracy. In our proposed visual concept recognition system, we first discuss important properties of the popular convolutional network architectures under consideration. We then describe our method for feature extraction at different levels of abstraction. We present extensive empirical information along with best practices for big data practitioners, and use these best practices to propose efficient fusion mechanisms for both single and multiple network models.
We present state-of-the-art results on benchmark datasets while keeping computational costs at a low level. Our results show that these state-of-the-art results can be achieved without extensive data augmentation techniques.
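The early and late fusion strategies named in the entry above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the function names, the equal default weights, and the score renormalization step are assumptions. Early fusion combines per-network feature vectors before a single classifier; late fusion combines the per-network class scores after classification.

```python
import numpy as np

def early_fusion(feats_a, feats_b):
    """Early fusion (assumed form): concatenate the feature vectors
    of two networks into one representation for a shared classifier."""
    return np.concatenate([np.asarray(feats_a), np.asarray(feats_b)], axis=-1)

def late_fusion(scores_a, scores_b, weights=(0.5, 0.5)):
    """Late fusion (assumed form): weighted average of the class-score
    vectors produced independently by two networks, renormalized so
    the fused scores again sum to 1."""
    wa, wb = weights
    fused = wa * np.asarray(scores_a, dtype=float) + wb * np.asarray(scores_b, dtype=float)
    return fused / fused.sum(axis=-1, keepdims=True)
```

For example, fusing class scores `[0.2, 0.8]` and `[0.6, 0.4]` with equal weights yields `[0.4, 0.6]`, so the second class is still predicted but with lower confidence than either network alone would suggest.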