Technical Specifications
Technical Specifications for Digital Resources
For digital delivery, CRL produces derivative files for access purposes by scanning the best available copies. In some cases, the quality of the source may result in a scan of lower-than-optimal quality. In almost every instance, CRL retains the original source materials, whether paper or microform, indefinitely, so they remain available for rescanning if that becomes necessary or appropriate. In collaborative digitization efforts, the files are scanned to the highest standard available to the digital aggregator or publisher.
Some digital files from CRL collections are represented as page images only, but OCR is applied to produce searchable text when the quality of the scan, the format of the document, and the language of the text are suitable.
General specifications for capture and access
- Master scans: TIFFs retained for archival use.
- Image capture: Minimum imaging specifications are 300 to 400 dpi, mostly bitonal, with grayscale used as needed for legibility. Images for select content (including the APCRL collection from ProQuest) are in full color.
- OCR: Uncorrected OCR (optical character recognition) is applied to provide searchable text whenever the format and quality of the original source will support it.
- Access files: Scanned documents are accessible as PDF files, combining page images with searchable text. For digital delivery content, multi-page PDF files were produced until 2014, and single-page PDF files have been produced since then. For the single-page PDF files, CRL’s DDS server produces multi-page PDFs on the fly, based on user-defined page ranges.
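The on-the-fly assembly described under "Access files" can be illustrated with a minimal sketch. This is not CRL's DDS implementation, which is not documented here. The range syntax ("1-3,7"), the file naming pattern (page_0001.pdf), and the function names parse_page_ranges and select_page_files are all assumptions made for illustration. The sketch covers only the selection step: turning a user-defined page range into the ordered list of single-page PDF files that a server would then merge into one multi-page PDF.

```python
from pathlib import Path


def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Expand a range string such as "1-3,7" into 1-based page numbers.

    The range syntax is an assumption for this sketch, not a documented
    DDS format.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1-3,,7"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


def select_page_files(doc_dir: Path, spec: str, page_count: int) -> list[Path]:
    """Return the single-page PDF paths to merge, in the requested order.

    The page_0001.pdf naming pattern is hypothetical.
    """
    return [
        doc_dir / f"page_{number:04d}.pdf"
        for number in parse_page_ranges(spec, page_count)
    ]
```

For example, parse_page_ranges("1-3,7", 10) returns [1, 2, 3, 7], and a request that runs past the last page raises ValueError rather than silently truncating. Merging the selected files into a single PDF would be a separate step handled by a PDF library.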
Management of CRL digital assets
CRL has or controls two classes of digital assets:
- Digital assets generated by CRL or under the CRL organizational umbrella
- Digital assets generated by CRL partnerships with other organizations, including publishers.
Digital assets generated by CRL or under the CRL organizational umbrella are all managed in the same way. Master files are maintained locally, with one copy stored online and a backup copy stored offline. Files from early digitization projects (prior to 2009) are stored with Amazon Web Services.
CRL obtains copies of digital assets produced through partnerships for safekeeping and eventual access through CRL’s digital platform. CRL relies on the asset management expertise and resources of its partners to ensure long-term preservation of these digital assets.