Mühendislik Fakültesi / Faculty of Engineering

Permanent URI for this collection: https://hdl.handle.net/11727/1401

Search Results

Now showing 1 - 10 of 36
  • Item
    ParsyBot: Chatbot for Baskent University Related FAQs
    (18TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, ICSC 2024, 2024-05-17) Karkiner, Zeynep; Yaman, Begum; Zengin, Begum; Cavli, Feride Nursena; Sert, Mustafa
    Reading regulations and instructions can take a lot of time and sometimes ends in disappointment. To avoid this, people prefer sources that provide fast and accurate answers when accessing information. Chatbots are one of the most popular topics today and can be adapted to various fields, e.g., healthcare, finance, and education. This paper presents ParsyBot, a Turkish chatbot designed to inform users about the regulations, admissions, departments, scholarships, and social clubs of Baskent University. Furthermore, users may ask questions via voice in Turkish, a feature that is uncommon among other chatbots. ParsyBot uses a pre-trained BERT model fine-tuned on the regulations and instructions of Baskent University, and it runs on web and mobile platforms to make it available to everyone. In our experiments on this dataset, ParsyBot reached 0.81 METEOR and 0.24 ROUGE-1, which are promising scores compared to ChatGPT 3.5.
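
    As a toy illustration of the BERT-based matching such a chatbot builds on, the sketch below (not the authors' code) embeds a user question and each FAQ question with a publicly available Turkish BERT checkpoint and returns the answer of the closest match by cosine similarity; the model name, the FAQ entries, and the mean-pooling step are assumptions for the example.

        import torch
        from transformers import AutoModel, AutoTokenizer

        NAME = "dbmdz/bert-base-turkish-cased"  # assumed Turkish BERT checkpoint
        tok = AutoTokenizer.from_pretrained(NAME)
        model = AutoModel.from_pretrained(NAME)

        faq = {  # hypothetical FAQ entries, question -> answer
            "Burs basvurusu nasil yapilir?": "Basvurular Eylul ayinda online alinir.",
            "Kulup uyeligi ucretli mi?": "Ogrenci kulupleri ucretsizdir.",
        }

        def embed(text):
            # Mean-pool BERT's last hidden states into one sentence vector.
            inputs = tok(text, return_tensors="pt", truncation=True)
            with torch.no_grad():
                hidden = model(**inputs).last_hidden_state
            return hidden.mean(dim=1).squeeze(0)

        def answer(question):
            q = embed(question)
            best = max(faq, key=lambda k: torch.cosine_similarity(q, embed(k), dim=0))
            return faq[best]

        print(answer("Burslara ne zaman basvurabilirim?"))
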
  • Item
    Sarcasm Detection in News Headlines with Deep Learning
    (32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024-12-05) Karkiner, Zeynep; Sert, Mustafa
    Sarcasm detection is one of the recent topics studied in the field of natural language processing. Although sarcasm detection is generally carried out on social media comments in the literature, it can also be applied to news headlines, which are expected to be completely objective and to reflect reality. In this study, sarcasm detection was carried out using various deep learning models on a dataset containing sarcastic and non-sarcastic news headlines. The classification accuracy and training-time performance of the BERT, RNN, LSTM, and GRU models were compared. While the BERT model reached the highest accuracy (0.88), the RNN was the most successful model in terms of training time.
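
    A minimal sketch of one of the compared recurrent classifiers, a single-layer GRU over token embeddings with a binary output head; the vocabulary size, dimensions, and padded-token input format are illustrative assumptions, not the paper's exact configuration.

        import torch
        import torch.nn as nn

        class SarcasmGRU(nn.Module):
            def __init__(self, vocab_size, embed_dim=100, hidden=64):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
                self.gru = nn.GRU(embed_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 1)   # sarcastic vs. non-sarcastic logit

            def forward(self, token_ids):
                x = self.embed(token_ids)          # (batch, seq_len, embed_dim)
                _, h = self.gru(x)                 # final state summarizes the headline
                return self.head(h.squeeze(0))     # train with BCEWithLogitsLoss

        model = SarcasmGRU(vocab_size=20000)
        logits = model(torch.randint(1, 20000, (8, 30)))   # 8 headlines, 30 tokens each
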
  • Item
    Multimodal Video Captioning Using Object-Auditory Information Fusion with Transformers
    (2023) Selbes, Berkay; Sert, Mustafa
    Video captioning aims to generate natural language descriptions of an input video. Generating coherent natural language sentences is a challenging task due to the complex nature of video content, which requires object and scene understanding, extraction of object- and event-specific auditory information, and acquisition of relationships among objects. In this study, we address the problem of efficiently modeling object interactions in scenes, as they carry crucial information about the events in the visual scene. To this end, we propose to use object features along with auditory information to better model the audio-visual scene appearing within the video. Specifically, we extract object features with Faster R-CNN and auditory features with VGGish, and design a transformer encoder-decoder architecture in a multimodal setup. Experiments on the MSR-VTT dataset show encouraging results: combined with the auditory information, object features model object interactions better than ResNet features.
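
    The fusion idea can be sketched as follows: project the two feature streams to a shared width, concatenate them along the time axis, and feed the result to a standard transformer encoder-decoder. All dimensions and the plain nn.Transformer are illustrative assumptions, not the paper's exact architecture.

        import torch
        import torch.nn as nn

        d_model, vocab = 256, 10000
        proj_obj = nn.Linear(2048, d_model)     # Faster R-CNN region features
        proj_aud = nn.Linear(128, d_model)      # VGGish segment embeddings
        transformer = nn.Transformer(d_model=d_model, batch_first=True)
        word_embed = nn.Embedding(vocab, d_model)
        to_vocab = nn.Linear(d_model, vocab)

        obj = torch.randn(2, 20, 2048)          # 20 object features per video
        aud = torch.randn(2, 10, 128)           # 10 auditory segments per video
        src = torch.cat([proj_obj(obj), proj_aud(aud)], dim=1)   # fused sequence
        tgt = word_embed(torch.randint(0, vocab, (2, 15)))       # shifted captions
        logits = to_vocab(transformer(src, tgt))                 # (2, 15, vocab)
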
  • Item
    Efficient Recognition of Human Emotional States from Audio Signals
    (2014) Erdem, Ernur Sonat; Sert, Mustafa; https://orcid.org/0000-0002-7056-4245; AAB-8673-2019
    Automatic recognition of human emotional states is an important task for efficient human-machine communication. Most existing works focus on the recognition of emotional states using audio signals alone, visual signals alone, or both. Here we propose empirical methods for feature extraction and classifier optimization that consider the temporal aspects of audio signals, and introduce our framework for efficiently recognizing human emotional states from audio signals. The framework is based on the prediction of input audio clips described using representative low-level features. In the experiments, seven (7) discrete emotional states (anger, fear, boredom, disgust, happiness, sadness, and neutral) from the EmoDB dataset are recognized and tested using nineteen (19) audio features (15 standalone, 4 joint) with a Support Vector Machine (SVM) classifier. Extensive experiments have been conducted to demonstrate the effect of the feature extraction and classifier optimization methods on the recognition accuracy of the emotional states. Our experiments show that the feature extraction and classifier optimization procedures lead to a significant improvement of over 11% in emotion recognition. As a result, the overall recognition accuracy achieved for the seven emotions in the EmoDB dataset is 83.33%, compared to the baseline accuracy of 72.22%.
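
    The classifier-optimization step can be sketched as a cross-validated grid search over SVM hyperparameters; the random feature matrix below stands in for the real low-level audio features, and the parameter grid is an illustrative assumption.

        import numpy as np
        from sklearn.model_selection import GridSearchCV
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X = np.random.randn(535, 19)        # stand-in: 19 features x 535 EmoDB clips
        y = np.random.randint(0, 7, 535)    # 7 emotion classes

        grid = GridSearchCV(
            make_pipeline(StandardScaler(), SVC()),
            {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.001]},
            cv=5,
        )
        grid.fit(X, y)
        print(grid.best_params_, round(grid.best_score_, 3))
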
  • Item
    Audio-based Event Detection in Office Live Environments Using Optimized MFCC-SVM Approach
    (2015) Kucukbay, Selver Ezgi; Sert, Mustafa; https://orcid.org/0000-0002-7056-4245; AAB-8673-2019
    Audio data contains many kinds of sounds and is an important source for multimedia applications. One such kind is unstructured environmental sounds (also referred to as audio events), which have noise-like characteristics with flat spectra. Therefore, recognition methods designed for music and speech data are generally not appropriate for environmental sounds. In this paper, we propose an MFCC-SVM based approach that exploits feature representation and learner optimization for efficient recognition of audio events from audio signals. The proposed approach considers efficient representation of MFCC features using different window and hop sizes and different numbers of Mel coefficients in the analyses, as well as optimizing the SVM parameters. Moreover, 16 different audio events from the IEEE Audio and Acoustic Signal Processing (AASP) Challenge Dataset, namely alert, clear throat, cough, door slam, drawer, keyboard, keys, knock, laughter, mouse, page turn, pen drop, phone, printer, speech, and switch, collected from live office environments, are utilized in the evaluations. Our empirical evaluations using 5-fold cross-validation show that the optimized MFCC feature representation and SVM classifier achieve scores of 62%, 58%, and 55% for precision, recall, and F-measure, respectively. These extensive experiments on audio-based event detection using the IEEE AASP Challenge dataset show the effectiveness of the proposed approach.
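
    A minimal sketch of the MFCC-SVM pipeline: extract MFCCs with configurable window size, hop size, and coefficient count, average them over time into one vector per clip, and train an SVM. The file names, labels, and parameter values are illustrative assumptions.

        import librosa
        import numpy as np
        from sklearn.svm import SVC

        def clip_features(path, n_mfcc=13, win=1024, hop=512):
            y, sr = librosa.load(path, sr=None)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                        n_fft=win, hop_length=hop)
            return mfcc.mean(axis=1)            # one fixed-length vector per clip

        paths = ["cough_01.wav", "knock_01.wav"]   # hypothetical office recordings
        labels = [0, 1]                            # indices into the 16 event classes
        X = np.stack([clip_features(p) for p in paths])
        clf = SVC(C=10, gamma="scale").fit(X, labels)
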
  • Item
    Video Scene Classification Using Spatial Pyramid Based Features
    (2014) Sert, Mustafa; Ergun, Hilal; https://orcid.org/0000-0002-7056-4245; AAB-8673-2019
    Recognition of video scenes is a challenging problem due to the unconstrained structure of video content. Here, we propose a spatial pyramid based method for the recognition of video scenes and explore the effect of parameter optimization on the recognition accuracy. In the experiments, different sampling methods, dictionary sizes, kernel methods, and pyramid levels are examined. A Support Vector Machine (SVM) is employed for classification due to its success in pattern recognition applications. Our experiments show that the dictionary size and proper pyramid levels in the feature representation drastically enhance the recognition accuracy.
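
    A spatial pyramid descriptor can be sketched as follows: quantize local descriptors against a visual dictionary, then concatenate per-cell visual-word histograms over successively finer grids. The grid levels and dictionary size below are the kinds of parameters the paper explores; the random keypoints are stand-in data.

        import numpy as np

        def spatial_pyramid(positions, words, img_w, img_h, dict_size, levels=(1, 2, 4)):
            """Concatenate per-cell visual-word histograms over pyramid levels."""
            hists = []
            for g in levels:                              # g x g grid per level
                cell_w, cell_h = img_w / g, img_h / g
                for i in range(g):
                    for j in range(g):
                        in_cell = ((positions[:, 0] // cell_w == i) &
                                   (positions[:, 1] // cell_h == j))
                        hists.append(np.bincount(words[in_cell], minlength=dict_size))
            v = np.concatenate(hists).astype(float)
            return v / max(v.sum(), 1.0)                  # L1-normalized descriptor

        pos = np.random.rand(500, 2) * [640, 480]         # keypoint locations
        w = np.random.randint(0, 1000, 500)               # dictionary indices
        feature = spatial_pyramid(pos, w, 640, 480, dict_size=1000)
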
  • Item
    Audio Captioning Based on Combined Audio and Semantic Embeddings
    (2020) Eren, Aysegul Ozkaya; Sert, Mustafa
    Audio captioning is a recently proposed task for automatically generating a textual description of a given audio clip. Most existing approaches use the encoder-decoder model without semantic information. In this study, we propose a bi-directional Gated Recurrent Unit (BiGRU) model based on the encoder-decoder architecture using audio and semantic embeddings. To obtain the semantic embeddings, we extract subject-verb embeddings from the subjects and verbs of the audio captions, and use a Multilayer Perceptron classifier to predict the subject-verb embeddings of test audio clips at the testing stage. To extract audio features, in addition to log Mel energies, we use a pretrained audio neural network (PANN) as a feature extractor, employed here for the first time in audio captioning, to explore the usability of audio embeddings for this task. We combine the audio and semantic embeddings to feed the BiGRU-based encoder-decoder model, and evaluate our model on two audio captioning datasets: Clotho and AudioCaps. Experimental results show that the proposed BiGRU-based deep model significantly outperforms state-of-the-art results across different evaluation metrics, and that the inclusion of semantic information enhances captioning performance.
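
    The embedding-fusion idea can be sketched by repeating the clip-level subject-verb vector across time and concatenating it with frame-level audio embeddings before a bidirectional GRU encoder; all dimensions and the random tensors are illustrative assumptions.

        import torch
        import torch.nn as nn

        audio_dim, sem_dim, hidden = 2048, 300, 256
        encoder = nn.GRU(audio_dim + sem_dim, hidden,
                         batch_first=True, bidirectional=True)

        audio = torch.randn(4, 50, audio_dim)    # e.g. 50 PANN frames per clip
        semantic = torch.randn(4, sem_dim)       # predicted subject-verb embedding
        fused = torch.cat([audio, semantic.unsqueeze(1).expand(-1, 50, -1)], dim=-1)
        enc_out, _ = encoder(fused)              # (4, 50, 2*hidden), fed to the decoder
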
  • Item
    Analysis of Deep Neural Network Models for Acoustic Scene Classification
    (2019) Basbug, Ahmet Melih; Sert, Mustafa
    Acoustic scene classification is one of the active fields of both the audio signal processing and machine learning communities. Due to uncontrolled environment characteristics and the great diversity of environmental sounds, the classification of acoustic environment recordings by computer systems is a challenging task. In this study, the performance of deep learning algorithms on the acoustic scene classification problem, where sound events carry continuous information, is analyzed. For this purpose, the success of AlexNet- and VGGish-based 4- and 8-layered convolutional neural networks utilizing long short-term memory recurrent neural network (LSTM-RNN) and gated recurrent unit recurrent neural network (GRU-RNN) architectures is analyzed for this classification task. In this direction, we combine the LSTM-RNN and GRU-RNN models with the 4- and 8-layered CNN architectures for the classification. Our experimental results show that the 4-layered CNN with the GRU structure improves the accuracy.
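
    A minimal sketch of a 4-layered CNN combined with a GRU on a log-Mel spectrogram input; the channel counts, pooling scheme, and input shape are illustrative assumptions rather than the paper's exact configuration.

        import torch
        import torch.nn as nn

        class CNNGRU(nn.Module):
            def __init__(self, n_classes=10):
                super().__init__()
                layers, ch = [], 1
                for out_ch in (32, 64, 128, 128):          # 4 conv blocks
                    layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.ReLU(),
                               nn.MaxPool2d((2, 1))]       # pool frequency, keep time
                    ch = out_ch
                self.cnn = nn.Sequential(*layers)
                self.gru = nn.GRU(128 * 8, 128, batch_first=True)
                self.head = nn.Linear(128, n_classes)

            def forward(self, x):                          # x: (B, 1, 128 mels, T)
                f = self.cnn(x)                            # (B, 128, 8, T)
                f = f.flatten(1, 2).transpose(1, 2)        # (B, T, 128*8)
                _, h = self.gru(f)
                return self.head(h.squeeze(0))

        logits = CNNGRU()(torch.randn(2, 1, 128, 100))
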
  • Item
    Classification of Obstructive Sleep Apnea using Multimodal and Sigma-based Feature Representation
    (2019) Memis, Gokhan; Sert, Mustafa
    Obstructive sleep apnea (OSA) is a sleep disorder characterized by decreases in blood oxygen saturation and repeated awakenings. Diagnosis normally requires monitoring a full night of sleep with a polysomnography device, so there is a need for computer-based methods for the diagnosis of OSA. In this study, a method based on feature selection is proposed for OSA classification using oxygen saturation and electrocardiogram signals. Standard deviation (sigma) based features are created to increase accuracy and reduce computational complexity. To evaluate their effectiveness, the obtained features were compared across selected machine learning algorithms: the Naive Bayes (NB), k-nearest neighbor (kNN), and Support Vector Machine (SVM) classifiers. Tests performed on the PhysioNet dataset, which consists of real clinical samples, show that the use of sigma-based features results in an average performance increase of 1.98% across all test scenarios.
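
    The sigma-based feature idea can be sketched by summarizing each fixed-length epoch of the SpO2 and ECG signals with its standard deviation and classifying the resulting two-dimensional vectors; the epoch length and the synthetic signals are illustrative assumptions.

        import numpy as np
        from sklearn.svm import SVC

        def sigma_features(signal, epoch_len):
            n = len(signal) // epoch_len
            epochs = signal[: n * epoch_len].reshape(n, epoch_len)
            return epochs.std(axis=1)            # one sigma value per epoch

        spo2 = np.random.randn(6000)             # stand-ins for real SpO2/ECG recordings
        ecg = np.random.randn(6000)
        X = np.column_stack([sigma_features(spo2, 60), sigma_features(ecg, 60)])
        y = np.random.randint(0, 2, len(X))      # apnea vs. normal, per epoch
        clf = SVC().fit(X, y)
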
  • Item
    Combining Acoustic and Semantic Similarity for Acoustic Scene Retrieval
    (2019) Sert, Mustafa; Basbug, Ahmet Melih
    Automatic retrieval of acoustic scenes in large audio collections is a challenging task due to the complex structure of these sounds. A robust and flexible retrieval system should address both the acoustic and semantic aspects of these sounds and how to combine them. In this study, we introduce an acoustic scene retrieval system that uses a combined acoustic- and semantic-similarity method. To address the acoustic aspects of sound scenes, we use a cascaded convolutional neural network (CNN) with a gated recurrent unit (GRU). The acoustic similarity is calculated in feature space using the Euclidean distance, and the semantic similarity is obtained using the path similarity method of WordNet. Two benchmark datasets, TAU Urban Acoustic Scenes 2019 and TUT Urban Acoustic Scenes 2018, are used to compare the performance of the proposed retrieval system with the literature and with the developed baseline. Results show that the semantic similarity improves the mAP and P@k scores.
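
    The combined similarity can be sketched as a weighted sum of an acoustic term derived from the Euclidean distance between embeddings and a WordNet path-similarity term between scene labels; the fusion weight, the distance-to-similarity mapping, and the random embeddings are illustrative assumptions.

        import numpy as np
        from nltk.corpus import wordnet as wn    # requires nltk.download("wordnet")

        def combined_score(emb_q, emb_c, label_q, label_c, alpha=0.5):
            acoustic = 1.0 / (1.0 + np.linalg.norm(emb_q - emb_c))  # distance -> similarity
            semantic = wn.synsets(label_q)[0].path_similarity(wn.synsets(label_c)[0])
            return alpha * acoustic + (1 - alpha) * (semantic or 0.0)

        score = combined_score(np.random.randn(128), np.random.randn(128),
                               "park", "street")
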