TY - JOUR
T1 - LoRA-Tuned Multimodal RAG System for Technical Manual QA
T2 - A Case Study on Hyundai Staria
AU - Nam, Yerin
AU - Choi, Hansun
AU - Choi, Jonggeun
AU - Kwon, Hyukjin
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/8
Y1 - 2025/8
N2 - This study develops a domain-adaptive multimodal RAG (Retrieval-Augmented Generation) system to improve the accuracy and efficiency of technical question answering based on large-scale structured manuals. Using Hyundai Staria maintenance documents as a case study, we extracted text and images from PDF manuals and constructed QA, RAG, and Multi-Turn datasets to reflect realistic troubleshooting scenarios. To overcome limitations of baseline RAG models, we proposed an enhanced architecture that incorporates sentence-level similarity annotations and parameter-efficient fine-tuning via LoRA (Low-Rank Adaptation) using the bLLossom-8B language model and BAAI-bge-m3 embedding model. Experimental results show that the proposed system achieved improvements of 3.0 percentage points in BERTScore, 3.0 percentage points in cosine similarity, and 18.0 percentage points in ROUGE-L compared to existing RAG systems, with notable gains in image-guided response accuracy. A qualitative evaluation by 20 domain experts yielded an average satisfaction score of 4.4 out of 5. This study presents a practical and extensible AI framework for multimodal document understanding, with broad applicability across automotive, industrial, and defense-related technical documentation.
KW - AI for structured manuals
KW - domain adaptation
KW - LoRA-based fine-tuning
KW - multimodal RAG
KW - question-answering system
KW - technical documentation
UR - https://www.scopus.com/pages/publications/105013297998
U2 - 10.3390/app15158387
DO - 10.3390/app15158387
M3 - Article
AN - SCOPUS:105013297998
SN - 2076-3417
VL - 15
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 15
M1 - 8387
ER -