Abstract
This study develops a domain-adaptive multimodal retrieval-augmented generation (RAG) system to improve the accuracy and efficiency of technical question answering over large-scale structured manuals. Using Hyundai Staria maintenance documents as a case study, we extract text and images from PDF manuals and construct QA, RAG, and multi-turn datasets that reflect realistic troubleshooting scenarios. To overcome the limitations of baseline RAG models, we propose an enhanced architecture that incorporates sentence-level similarity annotations and parameter-efficient fine-tuning via Low-Rank Adaptation (LoRA), using the bLLossom-8B language model and the BAAI-bge-m3 embedding model. In experiments, the proposed system improves on existing RAG systems by 3.0 percentage points in BERTScore, 3.0 percentage points in cosine similarity, and 18.0 percentage points in ROUGE-L, with notable gains in image-guided response accuracy. In a qualitative evaluation, 20 domain experts gave an average satisfaction score of 4.4 out of 5. The result is a practical and extensible AI framework for multimodal document understanding, applicable to automotive, industrial, and defense-related technical documentation.
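To make two of the reported metrics concrete, the following is a minimal sketch of cosine similarity and ROUGE-L in plain Python. It is not the authors' evaluation code: the whitespace tokenization, the lowercasing, the balanced F1 form of ROUGE-L, and the function names `cosine_similarity`, `lcs_length`, and `rouge_l_f1` are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_a * norm_b)


def lcs_length(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, start=1):
            if r == h:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as a balanced F1 over whitespace tokens (assumed variant)."""
    ref = reference.lower().split()
    hyp = candidate.lower().split()
    if not ref or not hyp:
        return 0.0
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, with the reference "replace the cabin air filter" and the candidate "replace the air filter", the longest common subsequence has four tokens, so precision is 4/4, recall is 4/5, and the F1 score is 8/9. A gain reported in percentage points, as in the abstract, is the difference between two such scores expressed on a 0 to 100 scale.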
| Original language | English |
|---|---|
| Article number | 8387 |
| Journal | Applied Sciences (Switzerland) |
| Volume | 15 |
| Issue number | 15 |
| State | Published - Aug 2025 |
Keywords
- AI for structured manuals
- LoRA-based fine-tuning
- domain adaptation
- multimodal RAG
- question-answering system
- technical documentation
Title
LoRA-Tuned Multimodal RAG System for Technical Manual QA: A Case Study on Hyundai Staria