I am a researcher specializing in visual computing and machine learning. My doctoral research, supervised by Prof. Chetan Arora at IIT Delhi, focused on novel methods for enhancing the trustworthiness and reliability of deep neural network (DNN) based classifiers. During my doctoral studies, I developed techniques for out-of-distribution detection, uncertainty quantification, and theoretically grounded refinement of DNN models.
Before pursuing my doctoral research, I had the privilege of working with Prof. Ramakrishna Kakarala at Nanyang Technological University on high dynamic range (HDR) imaging algorithms, which formed part of an image processing pipeline for smartphone cameras. Our work received the Best Student Paper award at the 2012 SPIE conference in Burlingame, California. I completed my Master's degree at the School of Electronic Engineering and Computing at Dublin City University in 2014, under the guidance of Prof. Noel O'Connor and Prof. Alan Smeaton. My research there focused on reducing false alarms in surveillance camera networks. Part of this work was licensed to Netwatch Systems, and the team received the Invent award for the project.
I am currently a Senior Scientist at TCS Research Labs, in the Deep Learning and Artificial Intelligence Group (DLAI) located at the Research and Development Park, IIT Delhi, India. At TCS, my work spans efficient DNN inference through model compression, trustworthy ML, continual learning and, most notably, algorithms for creative and immersive content generation, such as images, videos, and 3D/4D data.
Outside work, I enjoy painting, traveling, cooking and baking, composting, planting tree saplings, and music.
Calibration in visual question answering (VQA) measures how well a model's confidence reflects its correctness, which is critical in autonomous, high-stakes applications where models are often overconfident. We propose AlignVQA, a debate-based multi-agent framework in which specialized VLMs generate answers and generalist agents critique and aggregate them to produce better-calibrated confidence estimates. Our AlignCal loss additionally improves calibration during training, significantly reducing calibration error across VQA benchmarks.
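A common way to quantify calibration error is the expected calibration error (ECE). Predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below is a generic equal-width-bin ECE for illustration only. It is not the AlignVQA or AlignCal code, and the function name and 10-bin default are assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width binned ECE: size-weighted mean of |accuracy - confidence| per bin.

    Illustrative sketch (hypothetical helper), not code from AlignVQA/AlignCal.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    # Bin b holds confidences in [b/n_bins, (b+1)/n_bins); a confidence of 1.0 goes in the last bin.
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        acc = sum(1 for i in members if correct[i]) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        ece += (len(members) / n) * abs(acc - conf)
    return ece


# Four answers at 90% confidence, only half correct: ECE = |0.5 - 0.9| = 0.4.
print(expected_calibration_error([0.9] * 4, [True, True, False, False]))
```

A well-calibrated model keeps this gap small. For example, answers given at 75% confidence should be correct about 75% of the time.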
StyleGAN is a powerful generative model but suffers from catastrophic forgetting when trained continuously on new data distributions. We propose StyleCL, which enables lifelong learning by learning task-specific latent subspace dictionaries and lightweight feature adaptors, while reusing prior knowledge when beneficial. This approach avoids forgetting, improves generation quality across datasets, and requires significantly fewer additional parameters per task.
Knowledge Distillation for Calibration (KD(C)) extends distillation beyond accuracy transfer to produce lightweight, well-calibrated models. We show, both theoretically and empirically, that calibration can be improved without sacrificing accuracy. In some cases, KD(C) even surpasses standard training, achieving both better calibration and higher accuracy.
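For context, the standard (Hinton-style) distillation objective trains a student on a mix of temperature-softened teacher probabilities and hard labels. The sketch below shows only that generic objective for a single example. It is not KD(C) itself, and the temperature of 4 and weight `alpha` of 0.5 are illustrative assumptions.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled log-softmax, computed with the log-sum-exp trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Generic (Hinton-style) distillation loss for one example; not KD(C) itself.

    alpha weights the KL term between temperature-softened teacher and student
    distributions; (1 - alpha) weights ordinary cross-entropy on the hard label.
    The KL term is scaled by temperature**2 to keep its gradient magnitude
    comparable as the temperature changes.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL(teacher || student) over the softened distributions.
    kl = sum(
        math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # Cross-entropy on the hard label at temperature 1.
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature**2 * kl + (1 - alpha) * ce


print(kd_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], label=0))
```

Working in log space avoids taking the logarithm of a probability that has underflowed to zero. In practice the same objective would be written in an autodiff framework and averaged over a mini-batch.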
Diffusion models enable high-quality prompt-based image editing but struggle to make precise, fine-grained changes from text alone. We propose a zero-shot multi-diffusion framework for localized multi-object editing that supports additions, replacements, and edits in a single pass. We also introduce the LoMOE-Bench dataset, and our method outperforms prior approaches in both editing quality and speed.
We introduce ReMOVE, a novel reference-free metric for assessing how effectively diffusion-based image editing models erase objects, evaluated post-generation. Unlike existing measures such as LPIPS and CLIPScore, ReMOVE can evaluate inpainting without a reference image, a situation common in practice. ReMOVE also distinguishes object removal from object replacement, a key issue for diffusion models because of the stochastic nature of image generation.
Animating a virtual character from an actor's real performance is a challenging task that currently requires expensive motion capture setups and additional effort from expert animators, making it accessible only to large production houses. Our work aims to democratize this task with Transfer4D, a frugal alternative that uses only commodity depth sensors and further reduces animators' effort by automating rigging and animation transfer. Our approach transfers motion from an incomplete, single-view depth video to a semantically similar target mesh, unlike prior work, which assumes the source to be noise-free and watertight.
We achieve state-of-the-art deep neural network calibration by proposing a differentiable loss term that can be used effectively in gradient descent optimisation, together with a dynamic data pruning strategy that not only promotes legitimate high-confidence samples, enhancing trust in DNN classifiers, but also reduces the training time for calibration.
We propose a novel Compounded Corruption (CnC) technique for out-of-distribution (OOD) data augmentation. A major advantage of CnC is that it requires no hold-out data beyond the training set. Our extensive comparison with 20 methods published at major conferences over the last four years shows that a model trained with CnC-based data augmentation significantly outperforms the state of the art in both OOD detection accuracy and inference time.
We propose a novel auxiliary loss function, Multi-class Difference in Confidence and Accuracy (MDCA), for deep neural network calibration. The loss can be combined with any application-specific classification loss in the image, NLP, and speech domains. We also demonstrate its utility in semantic segmentation tasks.
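A minimal sketch of the MDCA term follows. It assumes the term is computed per mini-batch as follows: for each class, take the absolute difference between the mean predicted probability and the fraction of samples labelled with that class, then average over classes. The pure-Python function name is hypothetical and only illustrates the arithmetic. It is not the released implementation.

```python
from typing import Sequence


def mdca_loss(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """MDCA over one mini-batch (illustrative sketch; hypothetical helper).

    probs:  N x K predicted class probabilities (each row sums to 1).
    labels: N integer class labels in [0, K).
    Returns (1/K) * sum_j | mean_i probs[i][j] - mean_i [labels[i] == j] |.
    """
    n = len(probs)
    if n == 0 or n != len(labels):
        raise ValueError("need a non-empty batch with one label per row")
    k = len(probs[0])
    total = 0.0
    for j in range(k):
        mean_conf = sum(row[j] for row in probs) / n        # average confidence in class j
        mean_acc = sum(1 for y in labels if y == j) / n     # empirical frequency of class j
        total += abs(mean_conf - mean_acc)
    return total / k


# Uniform predictions on a batch whose labels are all class 0.
print(mdca_loss([[0.5, 0.5], [0.5, 0.5]], [0, 0]))
```

During training, such a term would typically be added to the classification loss as a weighted sum, for example `loss = ce + beta * mdca`, with `beta` an assumed tuning weight. It would be implemented in an autodiff framework so that gradients flow through the predicted probabilities.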
Current Research Team, Research and Innovation Park, IIT Delhi