Publications
Conferences & Journals
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
ICCV 2025
LlamaV-o1: Rethinking Step-by-Step Visual Reasoning in LLMs
ACL 2025
300+ GitHub Stars
110+ Citations
TimeTravel: A Benchmark for Historical & Cultural Artifact Understanding
ACL 2025
10+ Citations
MobiLLaMA: Towards Accurate & Lightweight Fully Transparent GPT
ICLR 2025 (Spotlight, Top-2%)
600+ GitHub Stars
50+ Citations
200K+ HuggingFace Downloads
All Languages Matter: Evaluating LMMs on 100 Culturally Diverse Languages
CVPR 2025 (Highlight)
40+ Citations
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
CVPR 2024
20+ Citations
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
NAACL 2025
10+ Citations
Fann or Flop: A Benchmark for Arabic Poetry Understanding
EMNLP 2025 (Main Track)
XrayGPT: Chest Radiograph Summarization using Medical VLMs
ACL Workshop 2024
500+ GitHub Stars
300+ Citations
120K+ HuggingFace Downloads
3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers
MICCAI 2023
Fast Video Instance Segmentation via Recurrent Encoder Transformers
CAIP 2023
Video Instance Segmentation via Multi-Scale Spatio-Temporal Attention
ECCV 2022
20+ Citations
Video Instance Segmentation in an Open-World
International Journal of Computer Vision (IJCV), 2024
Under Review / Preprints
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
CoVR-R: Reason-Aware Composed Video Retrieval
Mobile-O: Unified Multimodal Understanding & Generation on Mobile
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
2.2K+ GitHub Stars
100+ Citations
Patents
System and Method for Video Instance Segmentation via Multi-Scale Spatio-Temporal Transformers
US Patent App. 17/983,841
GRANTED
For a complete list of publications, citations, and metrics:
See full list on Google Scholar