Features
- VGG16 Feature Extraction: Utilizes pre-trained VGG16 layers to extract rich feature representations from input images.
- Encoder-Decoder Architecture: Employs a combination of an encoder to ...
This project implements an Image Captioning model using a CNN-RNN architecture in PyTorch. The model leverages the pre-trained Inception v3 model for feature extraction from images and an LSTM-based ...
In this paper, we address image classification tasks using the powerful CLIP vision-language model. Our goal is to improve classification performance using CLIP's image encoder, by proposing ...
To mitigate this issue, we leverage a Contrastive Language-Image Pretraining (CLIP)-based architecture, whose semantic knowledge from massive datasets aims to enhance the discriminative ...
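The CLIP-based classification idea in the two snippets above reduces to comparing an image embedding against text embeddings of class prompts by cosine similarity. A minimal sketch follows; the random tensors stand in for real encoder outputs (which would come from a CLIP checkpoint via, e.g., the open_clip or transformers libraries), so only the scoring logic is shown.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of CLIP-style zero-shot classification.
# Random tensors stand in for encoder outputs to keep the demo offline.
torch.manual_seed(0)
num_classes, embed_dim = 5, 512

image_emb = torch.randn(2, embed_dim)                 # CLIP image-encoder output
class_text_emb = torch.randn(num_classes, embed_dim)  # "a photo of a {class}" prompts

# CLIP scores by cosine similarity: L2-normalize, then dot product.
image_emb = F.normalize(image_emb, dim=-1)
class_text_emb = F.normalize(class_text_emb, dim=-1)
logits = 100.0 * image_emb @ class_text_emb.T  # logit scale ~100, as in CLIP
probs = logits.softmax(dim=-1)
pred = probs.argmax(dim=-1)
print(probs.shape, pred.shape)  # torch.Size([2, 5]) torch.Size([2])
```

A learned classification head on top of the image encoder (rather than text prompts) is the other common way to exploit CLIP's features for a fixed label set.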
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering. While generative models provide a ...
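GIT's core formulation, casting vision-language tasks as next-token prediction over image tokens followed by text tokens in a single transformer, can be sketched with a toy model. All sizes here are illustrative and the components (random patch embeddings, a generic transformer stack) are stand-ins, not the actual GIT implementation.

```python
import torch
import torch.nn as nn

# Toy sketch of a GIT-style single decoder: image patch embeddings and text
# token embeddings are concatenated and processed by one transformer that
# predicts the next text token. Dimensions are illustrative only.
torch.manual_seed(0)
vocab_size, d_model, n_patches, seq_len = 100, 64, 4, 6

patch_emb = torch.randn(2, n_patches, d_model)  # stand-in for a vision encoder
tokens = torch.randint(0, vocab_size, (2, seq_len))
text_emb = nn.Embedding(vocab_size, d_model)(tokens)

layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
decoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.cat([patch_emb, text_emb], dim=1)  # [B, n_patches + seq_len, d_model]
total = n_patches + seq_len
# Causal mask over text positions (True = blocked); image tokens are visible
# to every position, as in GIT's attention scheme.
mask = torch.triu(torch.ones(total, total, dtype=torch.bool), diagonal=1)
mask[:, :n_patches] = False

out = decoder(x, mask=mask)
logits = nn.Linear(d_model, vocab_size)(out[:, n_patches:])  # text positions only
print(logits.shape)  # torch.Size([2, 6, 100])
```

Captioning and VQA then differ only in the text that is fed in: an empty prefix for captioning, the question tokens for question answering.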