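Sound event detection in weakly labeled datasets using semi-supervised learning and fusion techniques (original title: Yarı denetimli öğrenme ve füzyon teknikleri ile zayıf etiketli veri kümelerinde ses olayı sezimi)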

Akar, Yeşim

Files

10591803.pdf (4.97 MB)

Date

2023

Authors

Akar, Yeşim

Publisher

Başkent Üniversitesi Fen Bilimleri Enstitüsü

Abstract

Sound Event Detection (SED) is the task of automatically identifying and classifying specific sound events within audio signals. It has a wide range of applications, including security systems, automation systems, and audio-based user interaction. However, most SED models come with long training times and high computational costs. This thesis aims to develop deep learning-based methods that accurately detect sound events, together with their start and end points, in recordings obtained mostly from daily life. Unlike similar studies, our research focuses on datasets composed mostly of weakly labeled and unlabeled audio. To accelerate the training process, we use the mean teacher model, a semi-supervised learning technique. Attention mechanisms, in turn, focus on specific parts of a sound signal and exploit temporal context and relationships, which makes more effective results possible. In this study, alongside the teacher-student model, the roles of self-attention and multi-head attention mechanisms in sound event detection are examined in depth.
The effects of attention mechanisms on individual and combined features are analyzed using low-level and high-level audio features: Mel-Frequency Cepstral Coefficients (MFCC), Log-Mel Spectrogram (Log-Mel), Bidirectional Encoder representation from Audio Transformers (BEATs), Audio Spectrogram Transformer (AST), and Pretrained Audio Neural Networks (PANNs). The potential of the multi-head attention mechanism, including its use with early and late fusion techniques, is compared against and evaluated alongside the self-attention mechanism. Our results indicate that using combined features instead of individual features significantly improves sound event detection performance, especially when attention mechanisms are integrated. Even higher performance was achieved by combining features with the early fusion method and integrating the multi-head attention mechanism. This study offers methods to increase the robustness and generalization performance of neural networks in scenarios where labeled training data is scarce.
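
As a rough illustration of three ideas named in the abstract, the sketch below shows a mean teacher weight update, early fusion, and late fusion in plain Python. It is not taken from the thesis: the function names, the list-based data layout, and the smoothing factor `alpha` are assumptions chosen for illustration, and late fusion is shown as simple averaging of per-model probabilities, which is only one of several possible schemes.

```python
from typing import Dict, List

Vector = List[float]
Frames = List[Vector]  # one feature (or probability) vector per time frame


def ema_update(teacher: Dict[str, Vector], student: Dict[str, Vector],
               alpha: float = 0.999) -> Dict[str, Vector]:
    """Mean teacher step: teacher weights follow an exponential moving
    average of the student weights (alpha is an assumed smoothing factor)."""
    return {
        name: [alpha * t + (1.0 - alpha) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


def early_fusion(feature_sets: List[Frames]) -> Frames:
    """Early fusion: concatenate the feature vectors of each frame
    (e.g. a Log-Mel vector and an embedding) before the model sees them."""
    return [
        [value for vector in frame_vectors for value in vector]
        for frame_vectors in zip(*feature_sets)
    ]


def late_fusion(probability_sets: List[Frames]) -> Frames:
    """Late fusion (assumed variant): average the frame-level class
    probabilities produced by separately trained models."""
    return [
        [sum(values) / len(values) for values in zip(*frame_probs)]
        for frame_probs in zip(*probability_sets)
    ]


if __name__ == "__main__":
    # Toy data: 2 frames, one 2-dimensional and one 1-dimensional feature set.
    low_level = [[0.1, 0.2], [0.3, 0.4]]
    embedding = [[1.0], [2.0]]
    print(early_fusion([low_level, embedding]))

    # Two hypothetical models scoring 2 frames over 2 event classes.
    model_a = [[0.2, 0.8], [0.9, 0.1]]
    model_b = [[0.6, 0.4], [0.7, 0.3]]
    print(late_fusion([model_a, model_b]))

    print(ema_update({"w": [0.0, 0.0]}, {"w": [1.0, 1.0]}, alpha=0.9))
```

In early fusion a single model sees the joint feature vector, so an attention layer can relate information across feature types. In late fusion each feature type gets its own model, and only the outputs are combined.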

Keywords

Sound Event Detection, Audio Features, Multi-Head Attention Mechanism, Mean Teacher Model, Early Fusion, Late Fusion

URI

http://hdl.handle.net/11727/12282

Collections

Fen Bilimleri Enstitüsü / Science Institute
