Workshop on data extraction for digitized source material

In the context of Prof. Dr. Madeleine Herren-Oesch’s lecture on “Global Moments: Fin de siècle and Orientalism in 1900”, the Institute for European Global Studies organized a one-day workshop on data extraction on 27 October 2016. For the first time, workshop participants actively used the method of extracting and generating tables of contents, a workflow developed by the project Global Information at a Glance. Fifteen students and project collaborators thus inaugurated the process of making parts of the Asia Directories and Chronicles accessible in more detail.

The Asia Directories and Chronicles are a collection of yearbooks that provide comprehensive information on foreign economic and political actors in Asia in the late 19th and early 20th centuries. Copies, published between 1863 and the Second World War, are held in libraries dispersed all over the world. The project Global Information at a Glance therefore collects digital copies of this vast collection and compiles them into a comprehensive digital library.

The sheer size of the collection and the vast amount of content call for new ways of making this information accessible. While the extraction of data from, for example, the register of persons can be automated, the less structured parts and data segments require collaborative methods. Such a collaborative method was employed at the workshop on 27 October 2016.

All participants work together online to enrich the corpus, which by now comprises 50,000 pages. Highlighting headings on the screen allows them to be classified, and these classifications are then used to generate digital tables of contents automatically. During the workshop, participants were able to classify over 3,000 captions.

The next data extraction workshop is planned for the spring semester 2017, in the context of the lecture on “Europe in Asia – Asia in Europe: actors in the ‘Asia Directories’ in the 19th and 20th century”.