Transformer architectures are powerful deep learning models
widely used in both image processing and natural language
processing. They are well suited to learning long-range
dependencies, mainly because of the self-attention mechanism.
The Vision Transformer (ViT) extends this concept to image
processing by dividing an image into patches and treating the
patches as a sequence, unlike other structures such as CNNs.
Because the attention mechanism relates each patch to all the
other patches, the model can learn relationships across the whole
image. Unlike in CNNs, the image is not processed directly but as
a sequence of patches, which allows it to be used effectively.
In this research, a ViT model was trained on images taken from an
eye disease classification dataset.
The research aims to determine the effect of image resolution on
model accuracy. Various performance metrics were used to compare
the learning and generalization behavior of the trained models.
The results show how sensitive ViT models are to resolution and
can guide the selection of the best resolution. They may benefit
not only academia but also industry by increasing the efficiency
of deep learning models. In this study, the effects of different
image resolutions on the performance of the ViT model were
examined, and running times were compared.
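
The patch-based processing described in the abstract, and the reason resolution may affect running time, can be illustrated with a minimal sketch. It is not the authors' code: the function names `patchify` and `sequence_stats`, the patch size of 16, and the resolutions 224, 384, and 448 are assumptions chosen for illustration, and the image is a plain nested list where a real pipeline would use an array library.

```python
def patchify(image, patch_size):
    """Split an image (rows of pixels, each pixel a list of channel values)
    into a sequence of flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    # Walk the patch grid row by row, left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)  # flatten channel values
            patches.append(patch)
    return patches


def sequence_stats(resolution, patch_size=16):
    """Return (number of patches, number of patch pairs compared by
    self-attention) for a square image of the given side length."""
    num_patches = (resolution // patch_size) ** 2
    return num_patches, num_patches * num_patches


if __name__ == "__main__":
    # Hypothetical resolutions, not the ones evaluated in the study.
    for resolution in (224, 384, 448):
        num_patches, pairs = sequence_stats(resolution)
        print(resolution, num_patches, pairs)
```

Under these assumptions, doubling the side length of the image quadruples the number of patches and multiplies the number of patch pairs that self-attention compares by sixteen, which is one plausible reason for resolution to influence running time.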
International Conference on Advanced Technologies, Computer Engineering and Science
ICATCES
Hafize Esra ÖZDENİZ
Alper KILIÇ