INDEXIVE | Smart Conference Indexing Services

Year: 2025
Subject Area: Computer Science
Broadcast Area: International
Document Type: Article
Language: English


2025 Performance Analysis of Vision Transformer Models Depending on Image Resolution

Transformer architectures are powerful deep learning models widely used in both image processing and natural language processing. They are well suited to learning long-range dependencies, largely because of the self-attention mechanism. The Vision Transformer (ViT) carries this idea over to image processing: unlike structures such as CNNs, it divides an image into patches and treats the patches as a sequence. The attention mechanism relates each patch to every other patch, which improves learning. Because the image is handled as a patch sequence rather than processed directly as in CNNs, the approach is effective to use. In this study, a ViT model was trained on images from an eye disease classification dataset. The aim is to determine the effect of image resolution on model accuracy. Various performance metrics were used to compare how well the models trained at different resolutions learn and generalize. The results show how sensitive ViT models are to resolution and support the selection of the best resolution. They benefit not only academia but also industry by increasing the efficiency of deep learning models. The study also compares running times across the different resolutions.
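The patch-splitting step described in the abstract, and the reason resolution affects running time, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the patch size of 16, and the two example resolutions are assumptions chosen for illustration, and the class token is ignored. The sketch splits a 2D image into a row-major sequence of flattened patches and counts how many patch pairs self-attention has to compare at a given resolution.

```python
def patchify(image, patch_size):
    """Split a 2D image (list of rows) into a row-major sequence of flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    sequence = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single token.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            sequence.append(patch)
    return sequence


def num_patches(resolution, patch_size=16):
    """Number of patches for a square image (patch_size=16 is an assumed value)."""
    return (resolution // patch_size) ** 2


def attention_pairs(resolution, patch_size=16):
    """Patch pairs compared by one self-attention layer: quadratic in sequence length."""
    n = num_patches(resolution, patch_size)
    return n * n


if __name__ == "__main__":
    # Hypothetical resolutions, not the ones evaluated in the paper.
    for resolution in (224, 384):
        print(resolution, num_patches(resolution), attention_pairs(resolution))
```

Under these assumptions, raising the resolution lengthens the patch sequence quadratically in the image side, and the attention cost grows with the square of that sequence length, which is one reason running time is worth comparing alongside accuracy.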

International Conference on Advanced Technologies, Computer Engineering and Science
ICATCES

Hafize Esra ÖZDENİZ, Alper KILIÇ

97 213
