News from Donald Sturgeon, who has used optical character recognition to provide extraordinary searchable access to pre-modern Chinese texts online:
Chinese Text Project: over ten million pages of pre-modern Chinese texts now searchable online
A major update to the site has applied OCR to over ten million pages of transmitted texts stored in the Library and, where possible, linked the scanned texts to the existing digital editions that correspond to them. Over 3000 existing texts have been successfully linked, allowing scanned texts to be displayed side by side with their transcriptions and searched as text.
Additionally, around ten thousand new texts and editions have been transcribed for the first time using OCR. While these transcriptions inevitably contain many errors, they make it possible to search the scanned texts and immediately locate information within them. All newly transcribed texts have been added to the Wiki; please help by correcting errors when using these resources.
For further details, please see the OCR instructions.