Omkar Thawakar

JULY,2022

Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer

European Conference on Computer Vision (ECCV) 2022.

State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computations. We argue that such an attention computation ignores the multi-scale spatio-temporal feature relationships that are crucial to tackle target appearance deformations in videos. To address this issue, we propose a transformer-based VIS framework, named MS-STS VIS, that comprises a novel multi-scale spatio-temporal split (MS-STS) attention module in the encoder. The proposed MS-STS module effectively captures spatio-temporal feature relationships at multiple scales across frames in a video. We further introduce an attention block in the decoder to enhance the temporal consistency of the detected instances in different frames of a video. Moreover, an auxiliary discriminator is introduced during training to ensure better foreground-background separability within the multi-scale spatio-temporal feature space. We conduct extensive experiments on two benchmarks: Youtube-VIS (2019 and 2021). Our MS-STS VIS achieves state-of-the-art performance on both benchmarks. When using the ResNet50 backbone, our MS-STS VIS achieves a mask AP of 50.1% on the Youtube-VIS 2019 val. set, outperforming the best reported results in the literature by 2.7%, and by 4.8% at the higher overlap threshold of AP75, while being comparable in model size and speed. When using the Swin Transformer backbone, MS-STS VIS achieves a mask AP of 61.0% on the Youtube-VIS 2019 val. set.

Project Page

JULY,2019

Image and Video Super Resolution using Recurrent Generative Adversarial Network

IEEE International conference on Advanced Video and Signal based Surveillance (AVSS) Taipei, Taiwan 2019.

Recently, convolutional neural networks with residual learning have achieved high accuracy for single-image super-resolution at different scale factors. With an adversarial learning model, the transformation function from a low-resolution input image to a high-resolution target image can be learned effectively. In this paper, we propose a method for image and video super-resolution using a recurrent generative adversarial network named SR²GAN. In the proposed model (SR²GAN), we use recursive learning for video super-resolution to overcome the difficulty of learning the transformation function for synthesizing realistic high-resolution images. This recursive approach helps to reduce the number of parameters as the depth of the model increases. An extensive evaluation is performed to examine the effectiveness of the proposed model, which shows that SR²GAN performs better in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) than state-of-the-art methods for super-resolution.
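For readers unfamiliar with the PSNR metric mentioned above, the following is a minimal sketch of how it is commonly computed. It is not taken from the paper: the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of pixel intensities (an assumption
    for this sketch); max_value is the largest possible intensity.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the super-resolved image is closer to the ground-truth high-resolution image in the mean-squared-error sense.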

JULY,2019

Motion Saliency Based Generative Adversarial Network For Underwater Moving Object Segmentation.

IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan 2019.

Underwater moving object segmentation is a challenging task. Problems such as absorption, scattering, and attenuation of light rays between the scene and the imaging platform degrade the visibility of images and video frames. In addition, back-scattering further complicates underwater video analysis, because light rays interact with underwater particles and are scattered back to the sensor. In this paper, a novel Motion Saliency Based Generative Adversarial Network (GAN) for Underwater Moving Object Segmentation (MOS) is proposed. The proposed network comprises both identity mappings and dense connections for underwater MOS. To the best of our knowledge, this is the first paper to use GAN-based unpaired learning for MOS in underwater videos. Initially, the motion saliency of the current frame is estimated using a few initial video frames and the current frame. The estimated motion saliency is then given as input to the proposed network for foreground estimation. To examine the effectiveness of the proposed network, the Fish4Knowledge [1] underwater video dataset and challenging video categories of the ChangeDetection.net-2014 [2] dataset are considered. The segmentation accuracy of existing state-of-the-art methods is compared with that of the proposed approach in terms of average F-measure. The experimental results show that the proposed network significantly improves on existing state-of-the-art methods for MOS.
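The F-measure used for comparison above is the harmonic mean of precision and recall. The sketch below shows one common way to compute it for a binary foreground mask and to average it over frames. The function names and the flat-list mask representation are assumptions for illustration, not the evaluation code used in the paper.

```python
def f_measure(predicted, ground_truth):
    """F-measure of a binary foreground mask against its ground truth.

    Both masks are flat sequences of 0/1 values (1 = foreground).
    """
    tp = sum(1 for p, g in zip(predicted, ground_truth) if p and g)
    fp = sum(1 for p, g in zip(predicted, ground_truth) if p and not g)
    fn = sum(1 for p, g in zip(predicted, ground_truth) if not p and g)
    if tp == 0:
        return 0.0  # no correctly detected foreground pixels
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_measure(predicted_masks, ground_truth_masks):
    """Mean F-measure over a sequence of frames."""
    scores = [f_measure(p, g) for p, g in zip(predicted_masks, ground_truth_masks)]
    return sum(scores) / len(scores)
```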

FEB,2018

Application of Machine Learning for profile reconstruction of IPM device.

International Conference on Computing: Communication, Networks and Security (IC3NS-2018), India

Measured IPM profiles can be significantly distorted due to displacement of residual ions or electrons by interaction with beam fields for high-brightness or high-energy beams [1, 2, 3]. It is thus difficult to deduce the characteristics of the actual beam from the measurements. In this project, different machine learning regression algorithms are applied to reconstruct the actual beam profile from the measurement data.

JANUARY,2019

Training optimisation of feedforward neural network for binary classification.

2019 International Conference on Computer Communication and Informatics (ICCCI -2019), 2019, Coimbatore, INDIA

In this paper, we present a novel technique to reduce the training time of a feedforward neural network by estimating some of the parameters involved in the construction and initialization of the network. These estimated parameters include the number and size of the hidden layers along with the weights of the neurons. The weights and network architecture are estimated before training by formulating an approximation of the decision boundary we want the network to learn. This configuration allows the network to learn the optimum weights in fewer iterations than with random initialization of the weights.
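To illustrate the general idea of initializing weights from an approximate decision boundary, here is a minimal sketch for a single neuron. The boundary approximation used (the hyperplane that perpendicularly bisects the segment between the two class means) and the function name `init_from_class_means` are assumptions chosen for illustration. They are not the procedure described in the paper.

```python
def init_from_class_means(points, labels):
    """Return (weights, bias) for one neuron from a crude linear boundary.

    Assumption for this sketch: the boundary is the hyperplane halfway
    between the class means, oriented so class 1 gets a positive score.
    """
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == 0]
    dim = len(points[0])
    mean_pos = [sum(p[i] for p in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(p[i] for p in neg) / len(neg) for i in range(dim)]
    # The weight vector points from the class-0 mean to the class-1 mean.
    weights = [a - b for a, b in zip(mean_pos, mean_neg)]
    # The bias places the zero-score surface at the midpoint of the two means.
    midpoint = [(a + b) / 2 for a, b in zip(mean_pos, mean_neg)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return weights, bias
```

Starting gradient descent from such an estimate, rather than from random values, means the neuron already separates the classes roughly, so training only has to refine the boundary.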

About Me

Best Research Paper Award

Best thesis award for B.Tech project

Maze Solver Robot

My Goals

Social

Deep Learning / Machine Learning

Artificial Intelligence

Computer Vision

Reinforcement Learning