KRETA: Korean Text-Rich VQA Benchmark

Project Summary: Developed KRETA, a comprehensive benchmark for Korean Reading and rEasoning in Text-rich VQA Attuned to diverse visual contexts, addressing the critical gap in Korean language resources for Vision-Language Model evaluation.

Links: GitHub

Understanding and reasoning over text within visual contexts poses a significant challenge for Vision-Language Models, given the complexity and diversity of real-world scenarios. While text-rich VQA datasets exist for high-resource languages like English, a critical gap remains for low-resource languages such as Korean.

KRETA facilitates in-depth evaluation of both visual text understanding and reasoning capabilities, supporting multifaceted assessment across 15 domains and 26 image types. The project introduces a semi-automated VQA generation pipeline specifically optimized for text-rich settings, leveraging refined stepwise image decomposition and a rigorous seven-metric evaluation protocol to ensure data quality. The benchmark establishes new standards for culturally-aware AI evaluation and supports the development of Korean-specific AI capabilities.

프로젝트 요약: KRETA는 다양한 시각 맥락에 맞춘 Korean Reading and rEasoning in Text-rich VQA 벤치마크입니다. 비전 언어 모델 평가에서 한국어 자원이 부족하다는 문제를 다루기 위해 개발했습니다.

Links: GitHub

시각 자료 안의 텍스트를 이해하고 추론하는 일은 현실 세계의 복잡성과 다양성 때문에 비전 언어 모델에 중요한 과제입니다. 영어처럼 자원이 많은 언어에는 text-rich VQA 데이터셋이 있지만, 한국어처럼 상대적으로 자원이 부족한 언어에는 여전히 큰 공백이 있습니다.

KRETA는 시각 텍스트 이해와 추론 능력을 깊이 있게 평가할 수 있도록 설계되었습니다. 15개 도메인과 26개 이미지 유형에 걸쳐 다면적인 평가를 지원합니다. 이 프로젝트는 text-rich 환경에 최적화된 반자동 VQA 생성 파이프라인을 제안하며, 정교한 단계별 이미지 분해와 엄격한 7개 지표 평가 프로토콜을 활용해 데이터 품질을 확보했습니다. 이 벤치마크는 문화적 맥락을 고려한 AI 평가의 새로운 기준을 세우고, 한국어에 특화된 AI 역량 개발을 지원합니다.