Research Outputs | TR-Dizin | WoS | Scopus | PubMed

Permanent URI for this community: https://hdl.handle.net/11727/4806

Audio Captioning with Composition of Acoustic and Semantic Information
(2021) Eren, Aysegul Ozkaya; Sert, Mustafa
Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. To address this problem, previous studies mostly use encoder-decoder-based models without considering semantic information. To fill this gap, we present a novel encoder-decoder architecture using bi-directional Gated Recurrent Units (BiGRU) with audio and semantic embeddings. We extract semantic embeddings by obtaining subjects and verbs from the audio clip captions and combine these embeddings with audio embeddings to feed the BiGRU-based encoder-decoder model. To enable semantic embeddings for the test audio clips, we introduce a Multilayer Perceptron classifier to predict the semantic embeddings of those clips. We also present exhaustive experiments to show the efficiency of different features and datasets for our proposed model on the audio captioning task. To extract audio features, we use log Mel energy features, VGGish embeddings, and pretrained audio neural network (PANN) embeddings. Extensive experiments on two audio captioning datasets, Clotho and AudioCaps, show that our proposed model outperforms state-of-the-art audio captioning models across different evaluation metrics and that using the semantic information improves the captioning performance.
The development of a reviewer selection method: a multi-level hesitant fuzzy VIKOR and TOPSIS approaches
(2021) Kocak, Serdar; Ic, Yusuf Tansel; Atalay, Kumru Didem; Sert, Mustafa; Dengiz, Berna; 0000-0001-9274-7467; AGE-3003-2022
This paper proposes a new approach for the selection of reviewers to evaluate research and development (R&D) projects using a new integrated hesitant fuzzy VIKOR and TOPSIS methodology. A reviewer selection model must have a multi-level framework in which reviewer selection strategies and related objectives guide the second level of the reviewer performance ranking process. The model must measure reviewer performance related to the activities that are necessary for the R&D project evaluation to be successful. A novel model is presented in this paper. In the proposed methodology, the aim is to select a reviewer in a hierarchical decision-making structure. The selection criteria values and their weights were obtained using the hesitant fuzzy VIKOR method. For the selection of a suitable reviewer, the conventional TOPSIS model was used. We developed a simpler procedure for effectively performing the reviewer selection process. The new approach was tested with a real case study and satisfactory results were obtained. A comparative analysis is also included in the article for illustrative purposes.
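The conventional TOPSIS step mentioned in the abstract above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the weights are taken as given crisp numbers (in the paper they come from the hesitant fuzzy VIKOR stage), and the function name `topsis` and the example criteria are hypothetical.

```python
import math

def topsis(scores, weights, benefit):
    """Rank alternatives with conventional (crisp) TOPSIS.

    scores  : list of rows, one row of criterion values per alternative
    weights : one weight per criterion (assumed given; normalised here)
    benefit : True if larger is better for that criterion, False for a cost
    Returns the closeness coefficient of each alternative (higher is better).
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    total_w = sum(weights)
    # Vector-normalise each criterion column, then apply the normalised weight.
    col_norm = [math.sqrt(sum(row[j] ** 2 for row in scores)) for j in range(n_crit)]
    v = [[row[j] / col_norm[j] * weights[j] / total_w for j in range(n_crit)]
         for row in scores]
    cols = list(zip(*v))
    # Positive ideal and negative ideal (anti-ideal) solutions per criterion.
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    closeness = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness

# Hypothetical example: three reviewers, criteria = (expertise score, response delay in days).
reviewer_scores = [[9, 1], [1, 9], [5, 5]]
ranking = topsis(reviewer_scores, weights=[0.6, 0.4], benefit=[True, False])
```

In this toy example the first reviewer is best on both criteria and the second is worst on both, so they sit at the two ends of the closeness scale, with the third in between.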
Anomaly Detection in Smart Home Environments using Convolutional Neural Network
(2021) Ercan, Naci Mert; Sert, Mustafa
The use of smart devices in home environments has been increasing in recent years. The wireless connection of these devices to the internet enables smart homes to be built at lower cost; hence, recognition of activities in home environments and detection of possible anomalies in those activities are important for several applications. In this study, we propose a new method based on the changepoint representation of sensor data and variable-length windowing for the recognition of abnormal activities. We present comparative analyses with different representations to demonstrate the efficacy of the proposed scheme. Our results on the WSU performance dataset show that the use of variable-length windowing improves the anomaly detection performance in comparison to fixed-length windowing.
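As a rough illustration of the changepoint representation named in the abstract above, the sketch below marks the time steps at which a binary sensor changes state. The abstract does not specify the exact encoding or how the variable-length windows are formed, so this follows the common definition of a changepoint encoding and should be read as an assumption; the function name is hypothetical.

```python
def changepoint_representation(states):
    """Encode a sampled sensor state sequence as changepoints.

    Returns a list of the same length with 1 at every time step where the
    sensor value differs from the previous step, and 0 elsewhere.
    The first step has no predecessor and is encoded as 0 (an assumption).
    """
    if not states:
        return []
    return [0] + [int(cur != prev) for prev, cur in zip(states, states[1:])]

# Hypothetical door sensor sampled at fixed intervals (0 = closed, 1 = open).
door = [0, 0, 1, 1, 1, 0]
encoded = changepoint_representation(door)
```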
Visual and Auditory Data Fusion for Energy-Efficient and Improved Object Recognition in Wireless Multimedia Sensor Networks
(2019) Koyuncu, Murat; Yazici, Adnan; Civelek, Muhsin; Cosar, Ahmet; Sert, Mustafa; 0000-0002-7056-4245; AAB-8673-2019
Automatic threat classification without human intervention is a popular research topic in wireless multimedia sensor networks (WMSNs), especially within the context of surveillance applications. This paper explores the effect of fusing audio-visual multimedia and scalar data collected by the sensor nodes in a WMSN for the purpose of energy-efficient and accurate object detection and classification. To do so, we implemented a wireless multimedia sensor node with video and audio capturing and processing capabilities in addition to traditional/ordinary scalar sensors. The multimedia sensors are kept in sleep mode in order to save energy until they are activated by the scalar sensors, which are always active. The object recognition results obtained from video and audio applications are fused to increase the object recognition performance of the sensor node. Final results are forwarded to the sink in text format, and this greatly reduces the size of data transmitted in the network. Performance test results of the implemented prototype system show that fusing audio data with visual data significantly improves the automatic object recognition capability of a sensor node. Since auditory data requires less processing power compared to visual data, the overhead of processing the auditory data is not high, and it helps to extend the network lifetime of WMSNs.
Acoustic Scene Classification Using Spatial Pyramid Pooling With Convolutional Neural Networks
(2019) Basbug, Ahmet Melih; Sert, Mustafa; 0000-0002-7056-4245; AAB-8673-2019
Automatic understanding of audio events and acoustic scenes has been an active research topic for researchers from the signal processing and machine learning communities. Recognition of acoustic scenes in real-life scenarios is a challenging task due to the diversity of environmental sounds and uncontrolled environments. Efficient methods and feature representations are needed to cope with these challenges. In this study, we address the acoustic scene classification of raw audio signals and propose a cascaded CNN architecture that uses the spatial pyramid pooling (SPP, also referred to as spatial pyramid matching) method to aggregate local features coming from convolutional layers of the CNN. We use three well-known audio features, namely MFCC, Mel Energy, and spectrogram, to represent audio content and evaluate the effectiveness of our proposed CNN-SPP architecture on the DCASE 2018 acoustic scene performance dataset. Our results show that the proposed CNN-SPP architecture with the spectrogram feature improves the classification accuracy.
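The idea behind spatial pyramid pooling, as referred to in the abstract above, is that a feature map of any size is pooled over progressively finer grids so the output length is fixed. The sketch below is a minimal single-channel version using max pooling; the pyramid levels (1, 2, 4), the choice of max pooling, and the function name are assumptions, since the abstract does not give the configuration of the CNN-SPP architecture.

```python
import math

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map over an n x n grid for each pyramid level.

    feature_map : non-empty list of equal-length rows (one channel)
    levels      : grid sizes; the output has sum(n * n for n in levels) values,
                  independent of the input height and width
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin edges use floor/ceil so every bin covers at least one row.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled

# Two hypothetical feature maps of different sizes yield vectors of equal length.
small = [[r * 4 + c for c in range(4)] for r in range(4)]
large = [[(r * 7 + c) % 11 for c in range(9)] for r in range(6)]
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
```

With levels (1, 2, 4), both vectors have 1 + 4 + 16 = 21 entries, which is what lets a fixed-size classifier follow convolutional layers that see inputs of varying size.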
Sketch recognition using transfer learning
(2019) Sert, Mustafa; Boyaci, Emel; 0000-0002-7056-4245; AAB-8673-2019
Humans have an excellent ability to recognize freehand sketch drawings despite their abstract and sparse structures. Understanding freehand sketches with automated methods is a challenging task due to the diversity and abstract structures of these sketches. In this paper, we propose an efficient freehand sketch recognition scheme, which is based on the feature-level fusion of Convolutional Neural Networks (CNNs) in the transfer learning context. Specifically, we analyse different layer performances of distinct ImageNet pretrained CNNs and combine the best-performing layer features within the CNN-SVM pipeline for recognition. We also employ Principal Component Analysis (PCA) to reduce the fused deep feature dimensions to ensure the efficiency of the recognition application on limited-capacity devices. We perform evaluations on two real sketch benchmark datasets, namely Sketchy and TU-Berlin, to show the effectiveness of the proposed scheme. Our experimental results show that the feature-level fusion scheme with PCA achieves recognition accuracies of 97.91% and 72.5% on the Sketchy and TU-Berlin datasets, respectively. This result is promising when compared with the human recognition accuracy of 73.1% on the TU-Berlin dataset. We also develop a sketch recognition application for smart devices to demonstrate the proposed scheme.
Detection of Basic Human Physical Activities With Indoor Outdoor Information Using Sigma-Based Features and Deep Learning
(2019) Memis, Gokhan; Sert, Mustafa; 0000-0002-5758-4321; 0000-0002-7056-4245; AAB-8673-2019
The devices created on account of the developments in wearable technology are increasingly becoming a part of our daily lives. In particular, sensors have enhanced the usefulness of such devices. The aim of this paper is to detect human physical activity along with indoor/outdoor information by using mobile phones and a separate oxygen saturation sensor. There is no relevant dataset in the literature for this type of detection. For this purpose, data from four different types of human physical activity was collected through mobile phone and oxygen saturation sensors; 12 people aged between 20-65 years participated in the study. During the data collection process, different physical activities under different environmental conditions were performed by the subjects in 10 min. As a next step, a novel deep neural network (DNN) model specifically designed for physical activity recognition was proposed. In order to improve accuracy and reduce the computational complexity, standard deviation (sigma)-based features were introduced. To evaluate its efficacy, we conducted comparisons with selected machine learning algorithms on our proposed dataset. The results on our dataset indicate that the multimodal sigma-based features give the best classification accuracy of 81.60% using our proposed DNN method. Furthermore, the accuracy of the classification made with our proposed DNN method without sigma-based features was 79.04%.
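One plausible reading of the sigma-based features in the abstract above is the standard deviation of each sensor channel over a window, concatenated across modalities. The sketch below illustrates that reading only; the window length, the non-overlapping windowing, the channel names, and both function names are assumptions not stated in the abstract.

```python
from statistics import pstdev

def sigma_features(signal, window):
    """Population standard deviation of each non-overlapping window of a signal."""
    return [pstdev(signal[start:start + window])
            for start in range(0, len(signal) - window + 1, window)]

def multimodal_sigma_features(channels, window):
    """Build one feature vector per window, with one sigma value per channel.

    channels : dict mapping a channel name to its list of samples
               (all channels assumed to be sampled in step and equally long)
    """
    per_channel = [sigma_features(samples, window) for samples in channels.values()]
    return [list(values) for values in zip(*per_channel)]

# Hypothetical readings: one accelerometer axis and an oxygen saturation sensor.
readings = {
    "accel_x": [1, 1, 1, 1, 0, 2, 0, 2],
    "spo2": [97, 97, 99, 99, 98, 98, 98, 98],
}
features = multimodal_sigma_features(readings, window=4)
```

Each window is reduced to a single number per channel, which is consistent with the abstract's stated goal of lowering computational complexity before the classifier.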