Mühendislik Fakültesi / Faculty of Engineering

Permanent URI for this collection: https://hdl.handle.net/11727/1401

Search Results

Now showing 1 - 4 of 4
  • Item
    Video Scene Classification Using Spatial Pyramid Based Features
    (2014) Sert, Mustafa; Ergun, Hilal; https://orcid.org/0000-0002-7056-4245; AAB-8673-2019
    Recognition of video scenes is a challenging problem due to the unconstrained structure of video content. Here, we propose a spatial pyramid based method for the recognition of video scenes and explore the effect of parameter optimization on recognition accuracy. In the experiments, different sampling methods, dictionary sizes, kernel methods, and pyramid levels are examined. A Support Vector Machine (SVM) is employed for classification due to its success in pattern recognition applications. Our experiments show that the dictionary size and proper pyramid levels in the feature representation drastically enhance recognition accuracy.
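    A minimal sketch of the kind of pipeline the abstract describes: local descriptors already assigned to visual words are pooled into a spatial pyramid histogram and classified with an SVM. The random data, dictionary size, pyramid depth, and kernel below are illustrative placeholders, not the paper's settings.

    # Spatial pyramid pooling of visual-word assignments, then SVM classification.
    import numpy as np
    from sklearn.svm import SVC

    def spatial_pyramid_histogram(points, words, dict_size, levels=2):
        # points: (N, 2) keypoint coordinates normalized to [0, 1)
        # words:  (N,) visual-word indices in [0, dict_size)
        feats = []
        for level in range(levels + 1):
            cells = 2 ** level
            for gx in range(cells):
                for gy in range(cells):
                    in_cell = ((points[:, 0] * cells).astype(int) == gx) & \
                              ((points[:, 1] * cells).astype(int) == gy)
                    hist = np.bincount(words[in_cell], minlength=dict_size)
                    feats.append(hist / max(words.size, 1))
        return np.concatenate(feats)

    # Hypothetical usage with random data standing in for real descriptors.
    rng = np.random.default_rng(0)
    X = np.stack([spatial_pyramid_histogram(rng.random((200, 2)),
                                            rng.integers(0, 100, 200), 100)
                  for _ in range(20)])
    y = rng.integers(0, 2, 20)
    clf = SVC(kernel="rbf").fit(X, y)  # the kernel is one of the tuned parameters
    print(clf.score(X, y))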
  • Item
    Feature-level Fusion of Convolutional Neural Networks for Visual Object Classification
    (2016) Ergun, Hilal; Sert, Mustafa; https://orcid.org/0000-0002-7056-4245; AAB-8673-2019
    Deep learning architectures have shown great success in various computer vision applications. In this study, we investigate some of the most popular convolutional neural network (CNN) architectures, namely GoogleNet, AlexNet, VGG19, and ResNet. Furthermore, we show possible early feature fusion strategies for visual object classification tasks. Concatenation of features, average pooling, and maximum pooling are among the investigated fusion strategies. We obtain state-of-the-art results on the well-known image classification datasets Caltech-101, Caltech-256, and Pascal VOC 2007.
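    The three fusion strategies named in the abstract can be illustrated with a short sketch; the arrays below are random stand-ins for features extracted from networks such as AlexNet, VGG19, GoogleNet, or ResNet, and the truncation used to align dimensions is a hypothetical simplification rather than the paper's method.

    # Early (feature-level) fusion of two CNN feature vectors.
    import numpy as np

    rng = np.random.default_rng(0)
    feat_a = rng.random(4096)   # e.g. a fully-connected layer activation
    feat_b = rng.random(2048)   # e.g. a global pooling layer activation

    # Concatenation keeps every dimension of every network.
    fused_concat = np.concatenate([feat_a, feat_b])

    # Average and maximum pooling need a common dimensionality; here the
    # vectors are (hypothetically) truncated to the shorter length.
    d = min(feat_a.size, feat_b.size)
    stacked = np.stack([feat_a[:d], feat_b[:d]])
    fused_avg = stacked.mean(axis=0)
    fused_max = stacked.max(axis=0)

    print(fused_concat.shape, fused_avg.shape, fused_max.shape)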
  • Item
    Efficient Bag of Words Based Concept Extraction for Visual Object Retrieval
    (2016) Ergun, Hilal; Sert, Mustafa; https://orcid.org/0000-0002-7056-4245; AAB-8673-2019
    The recent burst of multimedia content available on the Internet is pushing expectations of multimedia retrieval systems ever higher. Such systems should offer better performance in terms of both speed and memory consumption while maintaining accuracy comparable to state-of-the-art implementations. In this paper, we discuss alternative implementations of visual object retrieval systems based on the popular bag-of-words model and show an optimal selection of processing steps. We demonstrate our approach using both keyword-based and example-based retrieval queries on three frequently used benchmark databases, namely Oxford, Paris, and Pascal VOC 2007. Additionally, we investigate the effect of different distance comparison metrics on retrieval accuracy. Results show that relatively simple but efficient vector quantization can compete with more sophisticated feature encoding schemes when combined with the adapted inverted index structure.
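    A minimal sketch of a bag-of-words retrieval pipeline with hard vector quantization and an inverted index, in the spirit of the abstract; the toy data, dictionary size, and cosine scoring are illustrative assumptions rather than the paper's exact configuration.

    # Vector quantization into a visual dictionary, inverted index, and query.
    import numpy as np
    from collections import defaultdict
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    descriptors = [rng.random((50, 64)) for _ in range(10)]   # 10 toy "images"

    # Hard vector quantization: cluster descriptors into visual words.
    kmeans = KMeans(n_clusters=32, n_init=10, random_state=0)
    kmeans.fit(np.vstack(descriptors))

    def bow_histogram(desc):
        words = kmeans.predict(desc)
        hist = np.bincount(words, minlength=32).astype(float)
        return hist / np.linalg.norm(hist)

    db_hists = [bow_histogram(d) for d in descriptors]

    # Inverted index: visual word -> images containing it.
    inverted = defaultdict(list)
    for img_id, hist in enumerate(db_hists):
        for word in np.nonzero(hist)[0]:
            inverted[word].append(img_id)

    def query(desc, top_k=3):
        q = bow_histogram(desc)
        candidates = {i for w in np.nonzero(q)[0] for i in inverted[w]}
        # Cosine similarity; the paper compares several distance metrics.
        ranked = sorted(candidates, key=lambda i: -float(q @ db_hists[i]))
        return ranked[:top_k]

    print(query(descriptors[3]))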
  • Item
    Early and Late Level Fusion of Deep Convolutional Neural Networks for Visual Concept Recognition
    (2016) Ergun, Hilal; Akyuz, Yusuf Caglar; Sert, Mustafa; Liu, Jianquan; https://orcid.org/0000-0002-7056-4245; B-1296-2011; D-3080-2015; AAB-8673-2019
    Visual concept recognition has been an active research field over the last decade. Reflecting this attention, deep learning architectures have shown great promise in various computer vision domains, including image classification, object detection, event detection, and action recognition in videos. In this study, we investigate various aspects of convolutional neural networks for visual concept recognition. We analyze recent studies and different network architectures in terms of both running time and accuracy. In our proposed visual concept recognition system, we first discuss several important properties of the popular convolutional network architectures under consideration. Then we describe our method for feature extraction at different levels of abstraction. We present extensive empirical information along with best practices for big data practitioners. Using these best practices, we propose efficient fusion mechanisms for both single and multiple network models. We present state-of-the-art results on benchmark datasets while keeping computational costs low. Our results show that these state-of-the-art results can be reached without extensive data augmentation techniques.
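    A minimal sketch contrasting early (feature-level) and late (score-level) fusion of two network models, as discussed in the abstract; the random features, logistic regression classifiers, and score averaging are illustrative assumptions, not the paper's exact pipeline.

    # Early fusion: one classifier on concatenated features.
    # Late fusion: one classifier per network, scores averaged.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    feats_net1 = rng.random((100, 512))   # features from network model 1
    feats_net2 = rng.random((100, 512))   # features from network model 2
    labels = rng.integers(0, 2, 100)

    early = LogisticRegression(max_iter=1000).fit(
        np.hstack([feats_net1, feats_net2]), labels)

    clf1 = LogisticRegression(max_iter=1000).fit(feats_net1, labels)
    clf2 = LogisticRegression(max_iter=1000).fit(feats_net2, labels)
    late_scores = (clf1.predict_proba(feats_net1) + clf2.predict_proba(feats_net2)) / 2
    late_preds = late_scores.argmax(axis=1)

    print("early fusion accuracy:", early.score(np.hstack([feats_net1, feats_net2]), labels))
    print("late fusion accuracy :", (late_preds == labels).mean())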