The year 2025 marks an explosion in multimodal AI adoption. Vision-Language Models (VLMs) like GPT-4V and Gemini are now integrated into…