Publications
Conferences & Journals
DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
EACL 2026 (Main)
202 Dataset Downloads
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
ICCV 2025
7.1K+ Dataset Downloads
522 Model Downloads
2 Citations
LlamaV-o1: Rethinking Step-by-Step Visual Reasoning in LLMs
ACL 2025
300+ GitHub Stars
110+ Citations
6.1K+ Dataset Downloads
25K+ Model Downloads
TimeTravel: A Benchmark for Historical & Cultural Artifact Understanding
ACL 2025
10+ Citations
2.5K+ Dataset Downloads
MobiLLaMA: Towards Accurate & Lightweight Fully Transparent GPT
ICLR 2025 (Spotlight, Top-2%)
600+ GitHub Stars
50+ Citations
250K+ Model Downloads
All Languages Matter: Evaluating LMMs on 100 Culturally Diverse Languages
CVPR 2025 (Highlight)
40+ Citations
9.8K+ Dataset Downloads
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
CVPR 2024
20+ Citations
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
NAACL 2025
10+ Citations
6.5K+ Dataset Downloads
Fann or Flop: A Benchmark for Arabic Poetry Understanding
EMNLP 2025 (Main Track)
2.1K+ Dataset Downloads
XrayGPT: Chest Radiograph Summarization using Medical VLMs
ACL-Workshop 2024
500+ GitHub Stars
300+ Citations
352K+ Model Downloads
3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers
MICCAI 2023
Fast Video Instance Segmentation via Recurrent Encoder Transformers
CAIP 2023
Video Instance Segmentation via Multi-Scale Spatio-Temporal Attention
ECCV 2022
20+ Citations
Video Instance Segmentation in an Open-World
International Journal of Computer Vision (IJCV), 2024
8 Citations
Image and Video Super Resolution Using Recurrent Generative Adversarial Network
IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2019
25+ Citations
Motion Saliency Based Generative Adversarial Network for Underwater Moving Object Segmentation
IEEE International Conference on Image Processing (ICIP), 2019
35+ Citations
Under Review / Preprints
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
3 Citations
CoVR-R: Reason-Aware Composed Video Retrieval
Mobile-O: Unified Multimodal Understanding & Generation on Mobile
LLM Post-Training: A Deep Dive into Reasoning Large Language Models
2.2K+ GitHub Stars
100+ Citations
Ain: The Arabic Inclusive Large Multimodal Model
Preprint 2025
50+ GitHub Stars
8+ Citations
192K+ Model Downloads
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration
Preprint 2024
70+ GitHub Stars
22+ Citations
Patents
System and Method for Video Instance Segmentation via Multi-Scale Spatio-Temporal Transformers
US Patent App. 17/983,841
GRANTED
For a complete list of publications, citations, and metrics:
See full list on Google Scholar