LoRA-Tuned Multimodal RAG System for Technical Manual QA: A Case Study on Hyundai Staria

Yerin Nam, Hansun Choi, Jonggeun Choi, Hyukjin Kwon

Research output: Contribution to journal › Article › peer-review

Abstract

This study develops a domain-adaptive multimodal Retrieval-Augmented Generation (RAG) system to improve the accuracy and efficiency of technical question answering over large-scale structured manuals. Using Hyundai Staria maintenance documents as a case study, we extracted text and images from PDF manuals and constructed QA, RAG, and multi-turn datasets that reflect realistic troubleshooting scenarios. To overcome the limitations of baseline RAG models, we propose an enhanced architecture that incorporates sentence-level similarity annotations and parameter-efficient fine-tuning via Low-Rank Adaptation (LoRA), using the bLLossom-8B language model and the BAAI-bge-m3 embedding model. Experimental results show that the proposed system improves on existing RAG systems by 3.0 percentage points in BERTScore, 3.0 percentage points in cosine similarity, and 18.0 percentage points in ROUGE-L, with notable gains in image-guided response accuracy. A qualitative evaluation by 20 domain experts yielded an average satisfaction score of 4.4 out of 5. The study presents a practical and extensible AI framework for multimodal document understanding, with broad applicability across automotive, industrial, and defense-related technical documentation.
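
Two of the metrics named in the abstract, cosine similarity and ROUGE-L, have standard definitions that can be illustrated briefly. The following is a minimal sketch, not the authors' evaluation code: it assumes whitespace tokenization, lowercasing, and the balanced F1 form of ROUGE-L, none of which the abstract specifies. The function names are hypothetical, and in practice the vectors passed to `cosine_similarity` would come from an embedding model rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as a balanced F1 over LCS-based precision and recall.

    Assumption: whitespace tokenization and lowercasing; the paper's
    actual preprocessing is not stated in the abstract.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, scoring the candidate "check brake fluid" against the reference "check the brake fluid level" gives an LCS of three tokens, so precision is 1.0, recall is 0.6, and the F1 value is 0.75.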

Original language: English
Article number: 8387
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue number: 15
State: Published - Aug 2025

Keywords

  • AI for structured manuals
  • domain adaptation
  • LoRA-based fine-tuning
  • multimodal RAG
  • question-answering system
  • technical documentation
