TY - GEN
T1 - MixLoss
T2 - 17th International Conference on Human System Interaction, HSI 2025
AU - Jung, Gunoh
AU - Tang, Qing
AU - Lee, Hongdon
AU - Jung, Hail
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Table Structure Recognition (TSR) is a fundamental challenge in document analysis, especially for industrial documents where tables often contain complex layouts and noisy formatting. Although recent image-to-markup approaches have made progress using end-to-end learning, they commonly suffer from misalignment between predicted cell bounding boxes and actual text regions, leading to structural parsing errors. To address this issue, we propose MixLoss, a collaborative learning framework designed to improve TSR performance by enforcing position-wise consistency between HTML structure prediction and bounding box detection. MixLoss combines HTML tokens with bounding box coordinates in a unified sequence, inserting coordinate tokens directly after filled cell tokens. This design ensures structural alignment while maintaining computational efficiency. Extensive experiments show that MixLoss delivers significant improvements on real-world industrial datasets, including a 2.2% gain on IX DocBench, while maintaining strong performance on standard benchmarks. These results demonstrate the effectiveness of collaborative learning in enhancing table structure recognition for practical industrial document parsing.
KW - Deep Learning
KW - Industrial Document Analysis
KW - Table Structure Recognition
UR - https://www.scopus.com/pages/publications/105017126938
U2 - 10.1109/HSI66212.2025.11142406
DO - 10.1109/HSI66212.2025.11142406
M3 - Conference contribution
AN - SCOPUS:105017126938
T3 - International Conference on Human System Interaction, HSI
BT - Proceeding - 17th International Conference on Human System Interaction, HSI 2025
PB - IEEE Computer Society
Y2 - 16 July 2025 through 19 July 2025
ER -