Publications

Conferences & Journals

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
Shubham Patle, Sara Ghaboura, ... Omkar Thawakar, et al.
EACL 2026 (Main)
202 Dataset Downloads
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar, et al.
ICCV 2025
7.1K+ Dataset Downloads 522 Model Downloads 2 Citations
LlamaV-o1: Rethinking Step-by-Step Visual Reasoning in LLMs
Omkar Thawakar, D Dissanayake, KP More, R Thawkar, A Heakl, N Ahsan, ...
ACL 2025
300+ GitHub Stars 110+ Citations 6.1K+ Dataset Downloads 25K+ Model Downloads
TimeTravel: A Benchmark for Historical & Cultural Artifact Understanding
S Ghaboura, KP More, R Thawkar, W Al Ghallabi, O Thawakar, FS Khan, ...
ACL 2025
10+ Citations 2.5K+ Dataset Downloads
MobiLLaMA: Towards Accurate & Lightweight Fully Transparent GPT
Omkar Thawakar, A Vayani, S Khan, H Cholakal, RM Anwer, M Felsberg, ...
ICLR 2025 (Spotlight, Top-2%)
600+ GitHub Stars 50+ Citations 250K+ Model Downloads
All Languages Matter: Evaluating LMMs on 100 Culturally Diverse Languages
A Vayani, D Dissanayake, H Watawana, N Ahsan, N Sasikumar, ...
CVPR 2025 (Highlight)
40+ Citations 9.8K+ Dataset Downloads
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar, M Naseer, RM Anwer, S Khan, M Felsberg, M Shah, ...
CVPR 2024
20+ Citations
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
S Ghaboura, A Heakl, O Thawakar, AHSA Alharthi, I Riahi, A Radman, ...
NAACL 2025
10+ Citations 6.5K+ Dataset Downloads
Fann or Flop: A Benchmark for Arabic Poetry Understanding
W Al Ghallabi, R Thawkar, S Ghaboura, KP More, O Thawakar, ...
EMNLP 2025 (Main Track)
2.1K+ Dataset Downloads
XrayGPT: Chest Radiograph Summarization using Medical VLMs
Omkar Thawakar et al.
ACL-Workshop 2024
500+ GitHub Stars 300+ Citations 352K+ Model Downloads
3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers
Omkar Thawakar et al.
MICCAI 2023
Fast Video Instance Segmentation via Recurrent Encoder Transformers
Omkar Thawakar et al.
CAIP 2023
Video Instance Segmentation via Multi-Scale Spatio-Temporal Attention
Omkar Thawakar et al.
ECCV 2022
20+ Citations
Video Instance Segmentation in an Open-World
Omkar Thawakar et al.
International Journal of Computer Vision (IJCV), 2024
8 Citations
Image and Video Super Resolution Using Recurrent Generative Adversarial Network
Omkar Thawakar et al.
IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2019
25+ Citations
Motion Saliency Based Generative Adversarial Network for Underwater Moving Object Segmentation
Prashant W Patil, Omkar Thawakar et al.
IEEE International Conference on Image Processing (ICIP), 2019
35+ Citations

Under Review / Preprints

EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
Omkar Thawakar et al.
3 Citations
CoVR-R: Reason-Aware Composed Video Retrieval
Omkar Thawakar et al.
Mobile-O: Unified Multimodal Understanding & Generation on Mobile
A. Shaker, Omkar Thawakar et al.
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Komal Kumar, Omkar Thawakar et al.
2.2K+ GitHub Stars 100+ Citations
AIN: The Arabic Inclusive Large Multimodal Model
Ahmed Heakl, Sara Ghaboura, Omkar Thawakar, ...
Preprint 2025
50+ GitHub Stars 8+ Citations 192K+ Model Downloads
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration
Akshay Dudhane, Omkar Thawakar et al.
Preprint 2024
70+ GitHub Stars 22+ Citations

Patents

System and Method for Video Instance Segmentation via Multi-Scale Spatio-Temporal Transformers
Omkar Thawakar et al.
US Patent App. 17/983,841 (Granted)

For a complete list of publications, citations, and metrics, see the full list on Google Scholar.