Monday, April 4, 2016

Apr 4 – Scanning & Transcribing Documents

Rutland Daily Herald
January 3, 1868
I’m exploring the limits of online historical research.  I’m learning about the terrifying events in 1868 Rutland that may have kept Hattie from returning to Rutland High School after Christmas for the winter term.  I’ve read summaries of the events, but the Rutland Daily Herald published a detailed story, and I want to read that. 

I found a version online that uses technology to scan and transcribe printed text into a word document, rather than an image.  I remember trying to use a primitive version of this in 1996, but the text rendered was completely unintelligible.  This newspaper article was “scanned” in 2014, so the technology should be greatly improved, but the result is terrible.  The text is clear but the meaning is gibberish.  And yet, there is some sense to it…  If the text-reading program reads “m” and consistently turns it into “iii”, some sense can be discerned in the narrative. 
Mtairway which stood botwecn the .shop
and Cramton'ii main building affordi il ;
an easy moans of communication for the
fiery mon.iter, and whilo the firemen, ap-
prehending 110 dmiger in other quai'tera,
were still bnay in endeavoring to hold the
fire in clieelc in the building in wliicli it
oi'iginntcd, it had gained a firm foothold
in the main building aTid likewise eora-
nnuiicated to the building of Mr. Bailey,
which latter being constructed of wood,
afforded an easy prey to the incruaHing
fiamcs, and was quickly a heap of fjh ape-
leas ruins. 
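
A consistent misreading like that could, in principle, be undone mechanically. Here is a minimal sketch in Python of the idea, not a real tool: the confusion table, the tiny word list, and the function names are all my own assumptions for illustration. It tries each suspected substitution on a word and keeps the result only if it produces a word it recognizes.

```python
import re

# Hypothetical confusion table: (what the scanner produced, what was probably printed).
# "iii" for "m" is the pattern described above; "li" for "h" is guessed from "wliicli".
CONFUSIONS = [("li", "h"), ("iii", "m")]

# Tiny stand-in vocabulary (an assumption); a real attempt would need a full word list.
KNOWN_WORDS = {"which", "the", "in", "building", "fire", "it", "main"}

def fix_word(word):
    """Return a corrected word if one substitution yields a known word, else the word unchanged."""
    if word.lower() in KNOWN_WORDS:
        return word
    for wrong, right in CONFUSIONS:
        candidate = word.replace(wrong, right)
        if candidate.lower() in KNOWN_WORDS:
            return candidate
    return word

def fix_text(text):
    """Apply fix_word to every run of letters, leaving punctuation and spacing alone."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_word(m.group(0)), text)

print(fix_text("in the building in wliicli it"))
```

Checking each candidate against a word list keeps the substitutions from damaging words that were already read correctly, but anything outside the list (such as "fiamcs") is simply left as it is.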

I started trying to fix it.  It can be fun to turn gibberish into meaningful language.  But that’s a rabbit hole I could get lost in.  I could spend much too much time on this tangential pursuit.  I could wait until I can go to the Vermont Historical Society archives and read the article myself.  But if this terrible version is available online, surely there might have been a history volunteer somewhere who has already parsed this text.  

So I searched for another online version of the same news story, and found a visual scan (jpg format).  The text is tiny, but I could read it!  Still, wouldn’t it be much more useful to turn the bad textual scan into a more readable version?  Yes, but I shouldn’t be using my research time for this chore.  I can move past this technical issue and proceed with writing about the real human story of Hattie in high school.
