Project Summary: Developed KRETA, a comprehensive benchmark for Korean Reading and rEasoning in Text-rich VQA Attuned to diverse visual contexts, addressing the critical gap in Korean language resources for Vision-Language Model evaluation.
Links: GitHub
Understanding and reasoning over text within visual contexts poses a significant challenge for Vision-Language Models, given the complexity and diversity of real-world scenarios. While text-rich VQA datasets exist for high-resource languages like English, a critical gap remains for low-resource languages such as Korean.
KRETA enables in-depth evaluation of both visual text understanding and reasoning, supporting multifaceted assessment across 15 domains and 26 image types. The project introduces a semi-automated VQA generation pipeline optimized for text-rich settings, combining refined stepwise image decomposition with a rigorous seven-metric evaluation protocol to ensure data quality. The benchmark provides a foundation for culturally aware evaluation of Vision-Language Models and for developing Korean-specific AI capabilities.
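A minimal sketch of how such a semi-automated pipeline could be structured is below. All names here (`decompose_image`, `generate_qa`, the seven metric keys, the threshold) are illustrative assumptions, not KRETA's actual API: in the real pipeline, the decomposition, generation, and scoring stages would be backed by OCR/detection models and LLM judges rather than the stubs shown.

```python
from dataclasses import dataclass, field

# Hypothetical seven quality metrics; the benchmark's actual metric names may differ.
QUALITY_METRICS = [
    "answerability", "visual_grounding", "text_dependence",
    "reasoning_depth", "fluency", "factuality", "cultural_relevance",
]

@dataclass
class Region:
    """A sub-region produced by stepwise image decomposition."""
    image_id: str
    bbox: tuple          # (x, y, w, h) in pixels
    ocr_text: str        # text recovered from this region

@dataclass
class QAPair:
    question: str
    answer: str
    region: Region
    scores: dict = field(default_factory=dict)

def decompose_image(image_id: str) -> list[Region]:
    """Stepwise decomposition: split an image into text-bearing regions.
    Stub standing in for a detection + OCR stage."""
    return [Region(image_id, (0, 0, 100, 40), "서울역")]  # e.g. a sign reading "Seoul Station"

def generate_qa(region: Region) -> QAPair:
    """Draft a question-answer pair grounded in a region's text.
    Stub standing in for an LLM generation step."""
    return QAPair(
        question="What does the sign in this region say?",
        answer=region.ocr_text,
        region=region,
    )

def score_qa(qa: QAPair) -> QAPair:
    """Score a candidate pair on all seven metrics (stubbed as perfect scores);
    in practice an LLM judge or human annotator would assign these."""
    qa.scores = {m: 1.0 for m in QUALITY_METRICS}
    return qa

def build_benchmark(image_ids: list[str], threshold: float = 0.8) -> list[QAPair]:
    """Semi-automated loop: decompose, generate, score, and keep only pairs
    that pass every metric; borderline pairs would go to human review."""
    kept = []
    for image_id in image_ids:
        for region in decompose_image(image_id):
            qa = score_qa(generate_qa(region))
            if all(s >= threshold for s in qa.scores.values()):
                kept.append(qa)
    return kept

if __name__ == "__main__":
    for qa in build_benchmark(["img_001"]):
        print(qa.question, "->", qa.answer)
```

The filter-then-review structure is what makes the pipeline semi-automated: automatic scoring prunes low-quality candidates cheaply, so human effort concentrates on the pairs that survive all seven metrics.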