During a recent collaboration with the British Library, BL asked whether Wikisource texts could be exported to ALTO XML, or to any other XML format that is convenient.
ALTO XML seems to be an XML format designed for OCR output. It encodes text positioning data, which we do not keep in wikitext. In that respect it's closer to the DjVu OCR format than to anything we store.
Dumb question about "any other XML format": would HTML/XHTML work?
Hi, thanks for considering this! I raised this initially with Bohisattwa. I think it would need to be XML or hOCR. My idea was that if transcriptions could be exported as XML, we could ingest them into our library system so that the books become searchable through our IIIF viewer. With word coordinates in the XML, search terms would be highlighted in our image viewer.
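To make the word-coordinate idea concrete, here is a rough sketch in Python. It looks up a search term in a simplified ALTO-style fragment and returns the bounding box a viewer could use for highlighting. The sample XML, its coordinate values, and the function name `find_word_boxes` are made up for illustration. The fragment also leaves out the namespace declaration and the description section that real ALTO files carry, so treat this as an assumption-laden sketch and not as how any library system actually ingests ALTO.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ALTO-style fragment. Real ALTO files declare an
# XML namespace and carry more metadata; the coordinates here are invented.
ALTO_SAMPLE = """<alto>
  <Layout>
    <Page ID="p1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1" HPOS="200" VPOS="300" WIDTH="900" HEIGHT="60">
            <String CONTENT="Wikisource" HPOS="200" VPOS="300" WIDTH="420" HEIGHT="60"/>
            <String CONTENT="texts" HPOS="640" VPOS="300" WIDTH="200" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def find_word_boxes(alto_xml, term):
    """Return (HPOS, VPOS, WIDTH, HEIGHT) for every word matching `term`."""
    root = ET.fromstring(alto_xml)
    boxes = []
    # Each String element holds one word plus its position on the page image.
    for word in root.iter("String"):
        if word.get("CONTENT", "").lower() == term.lower():
            box = tuple(
                int(word.get(attr))
                for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
            boxes.append(box)
    return boxes


print(find_word_boxes(ALTO_SAMPLE, "texts"))  # [(640, 300, 200, 60)]
```

The catch for an export from Wikisource is the one already raised above: wikitext has the words but not the `HPOS`/`VPOS`/`WIDTH`/`HEIGHT` values, so those would have to come from the original OCR layer or be re-derived somehow.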