Layout analysis plays a crucial role in the automated digitization of printed documents within the cultural heritage sector. Automatically identifying layout elements such as titles, headlines, pictures, captions, and body text creates valuable metadata and allows a larger number of documents to be digitized with the same resources.
Because layouts vary so widely, automatic layout analysis is a complex challenge for traditional algorithmic approaches. However, combining neural networks for pixel-level image labeling with algorithmic techniques has significantly improved both quality and robustness to changes in layout style.
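As an illustration of how pixel-level labels might be combined with an algorithmic step, the sketch below groups connected pixels of the same class into rectangular zones. It is a minimal example under stated assumptions, not the method used in this work: the label map is assumed to be a small grid of integer class IDs (0 for background) already produced by some network, and the function name `label_map_to_zones` and the class numbering are hypothetical.

```python
from collections import deque


def label_map_to_zones(label_map, background=0):
    """Group 4-connected pixels of the same class into bounding-box zones.

    label_map: list of rows, each a list of integer class IDs (assumed input
    format; a real pipeline would take this from a segmentation network).
    Returns a list of (class_id, x0, y0, x1, y1) with inclusive coordinates.
    """
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    zones = []
    for y in range(height):
        for x in range(width):
            cls = label_map[y][x]
            if cls == background or seen[y][x]:
                continue
            # Breadth-first flood fill over pixels sharing this class.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and label_map[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            zones.append((cls, x0, y0, x1, y1))
    return zones


# Toy label map: 1 = headline, 2 = body text (hypothetical class IDs).
toy_map = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [2, 2, 0, 2],
    [2, 2, 0, 2],
]
zones = label_map_to_zones(toy_map)
```

On this toy input the headline row becomes one zone and the two body-text columns become two separate zones, which hints at why such post-processing can be sensitive to noisy or fragmented pixel labels.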
Among the neural networks used for this purpose, the "Transformer" class stands out as it can directly transform an image into labeled zones.
Focusing on historic newspapers, we will discuss the strengths and weaknesses of three methods: algorithmic approaches, pixel-based AI, and Transformer AI. Initial results indicate that Transformer networks have the potential to significantly enhance the quality, robustness, and scope of automated layout analysis. We will present the first experimental findings from our ongoing research and development efforts.