Omkar Thawakar

PhD Researcher | Multimodal AI | Video Understanding | LLMs & Agents

PhD researcher at MBZUAI working on multimodal reasoning, video understanding, large multimodal models (LMMs), and self-evolving AI systems, with a strong focus on real-world deployment.

Highlights

  • [CVPR 2026] 3 papers accepted.
  • [ICLR-SLLM 2025] Spotlight (Top-2%) for MobiLLaMA.
  • [CVPR 2025] Highlight for All Languages Matter (LMM Evaluation).
  • [Impact] 300K+ HuggingFace downloads across models.
  • [Award] Khalifa Fund Entrepreneurship Competition Winner (250K AED).
  • [Award] Sandook Al Watan Entrepreneurship Competition Winner.
  • [Startup] Founder & Tech Lead @ Lawa.AI.

Spotlight Research

MobiLLaMA (ICLR 2025)

Accurate & Lightweight Fully Transparent GPT. 200K+ Downloads.

LlamaV-o1 (ACL 2025)

Rethinking Step-by-Step Visual Reasoning in LLMs.

Recent Projects

VisQ (Visual Query)

iOS application for composed image and video retrieval on iPhone

VisQ brings reason-aware visual retrieval to iPhone with an on-device Qwen3-VL-2B Core ML runtime. Users can search personal media with natural language or run composed retrieval using a reference image + edit prompt, then inspect "Why This Matched" explanations powered by the model's reasoning capability.

On-device AI | Composed Retrieval | Explainable Results | Privacy-First | Offline-First

  • Indexes local photos and videos directly from the iPhone photo library.
  • Supports text search and reference-image-guided retrieval with scene edits.
  • Surfaces human-readable match reasons and visual explanation chips.
  • Keeps embeddings, ranking, and inference on-device for privacy-preserving search.

Built from research

VisQ is based on our recent research work CoVR-R: Reason-Aware Composed Video Retrieval, translating reason-aware composed retrieval into a practical iPhone app for local-first multimodal search.

Available now on the Apple App Store.

Startups & Industry

Lawa.AI

Founder & Tech Lead

Agentic AI platform for enterprises and businesses.

  • Multilingual, privacy-first AI agents.
  • Deployed in real organizational workflows.
  • $150K+ projected annual revenue (2026).
  • $70K pre-seed + multiple grants.

Nutrigenics.Care

Founder & CTO

AI-powered personalized nutrition platform.

  • Nutrition-GPT engine.
  • Clinical collaboration.
  • $100K+ grant funding.
  • Microsoft Founders Hub Grant ($150K Technical Support).

Recent Preprints & Submitted Work

EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
Omkar Thawakar et al.
Accepted: CVPR 2026 (Findings)

CoVR-R: Reason-Aware Composed Video Retrieval
Omkar Thawakar et al.
Accepted: CVPR 2026 (Findings)

Mobile-O: Unified Multimodal Understanding & Generation on Mobile
A. Shaker, Omkar Thawakar et al.
Submitted / Preprint

LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Komal Kumar, Omkar Thawakar et al.
Submitted / Preprint