Digitization and Digital Resources

This guide serves as an introduction to the concepts and terminology of digitization and to the basic file formats used in the process.

Scanning

Scanning an image or document involves several factors that can greatly enhance or degrade the resulting scanned file.  Here is a list of terms involved in the process:

Resolution: The scanning process captures images as dots, and resolution refers to the number of dots recorded per unit of length. The more dots recorded per inch, the sharper and clearer the image will be.  Resolution is expressed in a measurement called DPI or PPI.

  • Dots per inch (DPI) or pixels per inch (PPI)? Essentially they are the same measurement, though DPI is used in reference to printing, while PPI is used in reference to image editing.
  • Ideally, for textual documents, resolution should range between 300 and 600 PPI.  For images, the range should be between 300 and 800 PPI.
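
As a rough illustration of what these figures mean in practice, the sketch below converts a physical page size and a PPI setting into pixel dimensions. The function name pixel_dimensions and the letter-size page (8.5 x 11 inches) are assumptions chosen for the example, not part of any scanner's software.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Return (width_px, height_px) for a scan of the given physical size.

    Hypothetical helper: pixels along each side = inches * pixels per inch.
    """
    return round(width_in * ppi), round(height_in * ppi)


# A letter-size page at the low and high ends of the range suggested for text.
for ppi in (300, 600):
    w, h = pixel_dimensions(8.5, 11, ppi)
    print(f"{ppi} PPI: {w} x {h} pixels ({w * h:,} total)")
```

Doubling the PPI doubles the pixel count along each side, so the total number of pixels grows fourfold.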

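Interpolation: This refers to the scanning software estimating additional pixels so that it can report a resolution higher than the scanner actually captures.  Interpolated pixels add no real detail, so find out the maximum optical resolution of a particular scanner and scan at or below it rather than relying upon interpolation.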
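
To make the idea concrete, here is a minimal sketch of one common kind of interpolation (linear), applied to a single row of grayscale samples. It is an assumption for illustration only; actual scanner software may use other methods, and the name upsample_row is hypothetical.

```python
def upsample_row(row, factor):
    """Linearly interpolate a row of samples to roughly `factor` times the density.

    Values between two captured samples are estimated, not measured.
    """
    out = []
    for i in range(len(row) - 1):
        a, b = row[i], row[i + 1]
        for step in range(factor):
            t = step / factor  # fractional position between sample a and sample b
            out.append(round(a + (b - a) * t))
    out.append(row[-1])  # keep the final captured sample
    return out


optical = [0, 100, 200]          # samples the scanner actually captured
print(upsample_row(optical, 2))  # [0, 50, 100, 150, 200]
```

The values 50 and 150 were never observed by the scanner; they are averages of their neighbors, which is why interpolation increases the pixel count without adding detail.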

Bit: A bit is a single binary digit, expressed as either a 0 or a 1. Since there are two possibilities for every bit, a value stored in n bits can take 2 to the power of n distinct values.  Most bit depths range between 1 and 24 bits.

Bit Depth: Bit depth refers to the number of bits used to define a single pixel.  The greater the bit depth, the greater the variety of color within a digital file.

  • Bitonal images have only two colors, black and white (1 bit per pixel).
  • Grayscale images use multiple bits per pixel, typically from 2 to 8, to represent shades of gray.
  • Color images use between 8 and 24 bits per pixel; at 24 bits, this provides over 16 million colors.
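
The relationship between bit depth and the number of available shades can be checked with a short calculation. This is a sketch under simple assumptions: distinct_values is a hypothetical helper, and the three bit depths are representative examples rather than an exhaustive list.

```python
def distinct_values(bit_depth):
    """Number of distinct values one pixel can take at the given bit depth."""
    return 2 ** bit_depth


# Representative bit depths for each image type.
for label, bits in [("bitonal", 1), ("grayscale", 8), ("color", 24)]:
    print(f"{label:9} {bits:2} bits -> {distinct_values(bits):,} values")
```

At 24 bits the result is 16,777,216, which is the source of the "over 16 million colors" figure.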

Optical Character Recognition (OCR): OCR is the process by which software recognizes textual information in a scanned document and converts it to machine-readable text.  Handwritten content is normally not adequately supported by OCR, though machine-typed documents are generally read with high accuracy.