Publications

Conferences & Journals

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar, et al.
ICCV 2025
LlamaV-o1: Rethinking Step-by-Step Visual Reasoning in LLMs
Omkar Thawakar, D Dissanayake, KP More, R Thawkar, A Heakl, N Ahsan, ...
ACL 2025
300+ GitHub Stars, 110+ Citations
TimeTravel: A Benchmark for Historical & Cultural Artifact Understanding
S Ghaboura, KP More, R Thawkar, W Al Ghallabi, O Thawakar, FS Khan, ...
ACL 2025
10+ Citations
MobiLLaMA: Towards Accurate & Lightweight Fully Transparent GPT
Omkar Thawakar, A Vayani, S Khan, H Cholakkal, RM Anwer, M Felsberg, ...
ICLR 2025 (Spotlight, Top-2%)
600+ GitHub Stars, 50+ Citations, 200K+ HuggingFace Downloads
All Languages Matter: Evaluating LMMs on 100 Culturally Diverse Languages
A Vayani, D Dissanayake, H Watawana, N Ahsan, N Sasikumar, ...
CVPR 2025 (Highlight)
40+ Citations
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar, M Naseer, RM Anwer, S Khan, M Felsberg, M Shah, ...
CVPR 2024
20+ Citations
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
S Ghaboura, A Heakl, O Thawakar, AHSA Alharthi, I Riahi, A Radman, ...
NAACL 2025
10+ Citations
Fann or Flop: A Benchmark for Arabic Poetry Understanding
W Al Ghallabi, R Thawkar, S Ghaboura, KP More, O Thawakar, ...
EMNLP 2025 (Main Track)
XrayGPT: Chest Radiograph Summarization using Medical VLMs
Omkar Thawakar et al.
ACL 2024 (Workshop)
500+ GitHub Stars, 300+ Citations, 120K+ HuggingFace Downloads
3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers
Omkar Thawakar et al.
MICCAI 2023
Fast Video Instance Segmentation via Recurrent Encoder Transformers
Omkar Thawakar et al.
CAIP 2023
Video Instance Segmentation via Multi-Scale Spatio-Temporal Attention
Omkar Thawakar et al.
ECCV 2022
20+ Citations
Video Instance Segmentation in an Open-World
Omkar Thawakar et al.
International Journal of Computer Vision (IJCV), 2024

Under Review / Preprints

EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
Omkar Thawakar et al.
CoVR-R: Reason-Aware Composed Video Retrieval
Omkar Thawakar et al.
Mobile-O: Unified Multimodal Understanding & Generation on Mobile
A. Shaker, Omkar Thawakar et al.
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Komal Kumar, Omkar Thawakar et al.
2.2K+ GitHub Stars, 100+ Citations

Patents

System and Method for Video Instance Segmentation via Multi-Scale Spatio-Temporal Transformers
Omkar Thawakar et al.
US Patent App. 17/983,841 (Granted)

For a complete list of publications, citations, and metrics, see the full list on Google Scholar.