Publications
Conferences & Journals
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
7+ Citations
1.2K+ Model Downloads
Thinking Beyond Labels: Vocabulary-Free Fine-Grained Recognition using Reasoning-Augmented LMMs
DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
500+ Dataset Downloads
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
7.1K+ Dataset Downloads
522 Model Downloads
2 Citations
LlamaV-o1: Rethinking Step-by-Step Visual Reasoning in LLMs
300+ GitHub Stars
130+ Citations
7.3K+ Dataset Downloads
31K+ Model Downloads
TimeTravel: A Benchmark for Historical & Cultural Artifact Understanding
10+ Citations
3.1K+ Dataset Downloads
MobiLLaMA: Towards Accurate & Lightweight Fully Transparent GPT
600+ GitHub Stars
50+ Citations
320K+ Model Downloads
All Languages Matter: Evaluating LMMs on 100 Culturally Diverse Languages
40+ Citations
9.8K+ Dataset Downloads
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
25+ Citations
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
15+ Citations
7.7K+ Dataset Downloads
Fann or Flop: A Benchmark for Arabic Poetry Understanding
8+ Citations
3.2K+ Dataset Downloads
XrayGPT: Chest Radiograph Summarization using Medical VLMs
500+ GitHub Stars
300+ Citations
352K+ Model Downloads
Video Instance Segmentation via Multi-Scale Spatio-Temporal Attention
20+ Citations
Video Instance Segmentation in an Open-World
8 Citations
Image and Video Super Resolution using Recurrent Generative Adversarial Network
IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2019
25+ Citations
Motion Saliency Based Generative Adversarial Network for Underwater Moving Object Segmentation
35+ Citations
Under Review / Preprints
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
2.2K+ GitHub Stars
100+ Citations
Ain: The Arabic Inclusive Large Multimodal Model
50+ GitHub Stars
8+ Citations
192K+ Model Downloads
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration
70+ GitHub Stars
22+ Citations
Patents
System and Method for Video Instance Segmentation via Multi-Scale Spatio-Temporal Transformers
US Patent App. 17/983,841
GRANTED
For a complete list of publications, citations, and metrics:
See full list on Google Scholar