Publications

Conferences & Journals

EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
Omkar Thawakar et al.
CVPR 2026 (Findings) arXiv GitHub
7+ Citations 1.2K+ Model Downloads
CoVR-R: Reason-Aware Composed Video Retrieval
Omkar Thawakar et al.
CVPR 2026 (Findings) arXiv GitHub
Thinking Beyond Labels: Vocabulary-Free Fine-Grained Recognition using Reasoning-Augmented LMMs
Dmitry Demidov, Zaigham Zaheer, Zongyan Han, Omkar Thawakar et al.
CVPR 2026 (Main) arXiv GitHub
DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
Shubham Patle, Sara Ghaboura, ..., Omkar Thawakar, et al.
EACL 2026 (Main) arXiv GitHub
500+ Dataset Downloads
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar, et al.
ICCV 2025 arXiv GitHub
7.1K+ Dataset Downloads 522 Model Downloads 2 Citations
LlamaV-o1: Rethinking Step-by-Step Visual Reasoning in LLMs
Omkar Thawakar, D Dissanayake, KP More, R Thawkar, A Heakl, N Ahsan, ...
300+ GitHub Stars 130+ Citations 7.3K+ Dataset Downloads 31K+ Model Downloads
TimeTravel: A Benchmark for Historical & Cultural Artifact Understanding
S Ghaboura, KP More, R Thawkar, W Al Ghallabi, O Thawakar, FS Khan, ...
10+ Citations 3.1K+ Dataset Downloads
MobiLLaMA: Towards Accurate & Lightweight Fully Transparent GPT
Omkar Thawakar, A Vayani, S Khan, H Cholakkal, RM Anwer, M Felsberg, ...
ICLR 2025 (Spotlight, Top-2%) arXiv GitHub
600+ GitHub Stars 50+ Citations 320K+ Model Downloads
All Languages Matter: Evaluating LMMs on 100 Culturally Diverse Languages
A Vayani, D Dissanayake, H Watawana, N Ahsan, N Sasikumar, ...
CVPR 2025 (Highlight) arXiv GitHub
40+ Citations 9.8K+ Dataset Downloads
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar, M Naseer, RM Anwer, S Khan, M Felsberg, M Shah, ...
CVPR 2024 arXiv GitHub
25+ Citations
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
S Ghaboura, A Heakl, O Thawakar, AHSA Alharthi, I Riahi, A Radman, ...
NAACL 2025 arXiv GitHub
15+ Citations 7.7K+ Dataset Downloads
Fann or Flop: A Benchmark for Arabic Poetry Understanding
W Al Ghallabi, R Thawkar, S Ghaboura, KP More, O Thawakar, ...
EMNLP 2025 (Main Track) arXiv GitHub
8+ Citations 3.2K+ Dataset Downloads
XrayGPT: Chest Radiograph Summarization using Medical VLMs
Omkar Thawakar et al.
ACL-Workshop 2024 arXiv GitHub
500+ GitHub Stars 300+ Citations 352K+ Model Downloads
3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers
Omkar Thawakar et al.
MICCAI 2023 arXiv GitHub
Fast Video Instance Segmentation via Recurrent Encoder Transformers
Omkar Thawakar et al.
CAIP 2023 arXiv GitHub
Video Instance Segmentation via Multi-Scale Spatio-Temporal Attention
Omkar Thawakar et al.
ECCV 2022 arXiv GitHub
20+ Citations
Video Instance Segmentation in an Open-World
Omkar Thawakar et al.
International Journal of Computer Vision (IJCV), 2024 arXiv GitHub
8 Citations
Image and Video Super Resolution Using Recurrent Generative Adversarial Network
Omkar Thawakar et al.
IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2019 arXiv GitHub
25+ Citations
Motion Saliency Based Generative Adversarial Network for Underwater Moving Object Segmentation
Prashant W Patil, Omkar Thawakar et al.
IEEE International Conference on Image Processing (ICIP), 2019 arXiv GitHub
35+ Citations

Under Review / Preprints

Mobile-O: Unified Multimodal Understanding & Generation on Mobile
A. Shaker, Omkar Thawakar et al.
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Komal Kumar, Omkar Thawakar et al.
2.2K+ GitHub Stars 100+ Citations
Ain: The Arabic Inclusive Large Multimodal Model
Ahmed Heakl, Sara Ghaboura, Omkar Thawakar, ...
Preprint 2025 arXiv GitHub
50+ GitHub Stars 8+ Citations 192K+ Model Downloads
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration
Akshay Dudhane, Omkar Thawakar et al.
Preprint 2024 arXiv GitHub
70+ GitHub Stars 22+ Citations

Patents

System and Method for Video Instance Segmentation via Multi-Scale Spatio-Temporal Transformers
Omkar Thawakar et al.
US Patent App. 17/983,841 GRANTED

For a complete list of publications, citations, and metrics:

See full list on Google Scholar