Final post and outcomes

Although the POPE project ended a while ago, one issue remained outstanding: running the Preserv plugin to ascertain which file formats we held. First, our POPE final report is attached.

Outputs:

  • 6,331 records had 10,568 documents attached to them (and thus preserved)
  • Documents were re-introduced to the digital arena by scanning printed backups
  • We analysed the retrieved documents and found that most were PDF v1.3 or later; plans are in place to migrate older formats.

Recommendations:

  • Organisations moving to a new CMS with slightly different URLs should consider adding simple Apache redirects to their websites, so that lost document silos are reconnected.
  • Subject domains should preserve their Official Publications; there is a strong business case for this (see report).
  • Consider IPR factors carefully before starting the work, so that it can be carried out effectively.

What next (in a perfect world)?

  • Find out which of the files are machine readable and OCR those which are not.
  • Put this into a discovery search system and link it to our thesaurus for a rich yet intuitive retrieval experience

 

 

 

Categories: Uncategorized

Progress

It’s already 2012, with only a couple of months left to run on the project. Updates have been less regular than planned, but we are making good progress. As of today, we have completed 3,305 records (66% of the total).

Just before Christmas we experimented with the scanning aspect of the project, which aims to recreate digital copies from printouts. The printouts were made as our preservation copy, and that decision has now been fully justified, as they are being used to replace digital content lost from the web. Initial experiments showed that batching the material and passing it through our sheet-feed scanner will do the job nicely. Some tweaks are needed, such as the ability to output the scans as JPEG rather than as large PDFs. The reason is that we wish to produce accessible, web-sized downloadable PDFs which are nevertheless of sufficient quality to yield highly accurate OCR. Another tweak is to actually measure that OCR accuracy by comparing a sample of the raw OCR text against the page image. Lots to do!
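As a rough illustration of the accuracy check mentioned above, here is a minimal Python sketch. It assumes that someone has typed out a short passage by hand from the page image to act as a reference, and it uses the standard library's `difflib` to score the OCR text against it. The function names and the sample strings are made up for illustration; this is not the method used in the project.

```python
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are not counted as errors."""
    return " ".join(text.lower().split())


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a 0.0-1.0 similarity ratio between a hand-checked reference and OCR text."""
    matcher = SequenceMatcher(None, normalise(reference), normalise(ocr_output))
    return matcher.ratio()


# Hypothetical sample: the reference is typed by hand from the page image,
# the OCR text is what the scanning software produced for the same passage.
reference = "Department for Education annual report"
ocr_output = "Department for Educatlon annual rep0rt"
print(f"similarity: {ocr_similarity(reference, ocr_output):.2%}")
```

The ratio is a similarity score rather than a formal character error rate, but averaged over a sample of pages it should give a usable indication of whether the scan settings are good enough.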

Off to a cracking start

We now have our member of staff in place, working their way through those 5,000 records and checking whether the links are live or not. These records used to be in the library catalogue and were added there before 2010. The metadata had been imported to DERA before POPE was even thought of, but we didn’t have the resources to go and pick up the full-text documents for preservation. One interesting finding is that links from the same organisation can be grouped together, and that broken links are often due to nothing more than a change to the base URL which has not been covered by a standard HTTP redirect.
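The grouping idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats the host name as a stand-in for "the organisation", it assumes the base URL change is a change of host, and the `example.org` addresses and function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def group_by_host(urls):
    """Group URLs by host name so links from one organisation can be reviewed together."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).netloc.lower()].append(url)
    return dict(groups)


def rewrite_host(url, host_map):
    """Swap an old host for its replacement, keeping path, query and fragment unchanged."""
    parts = urlsplit(url)
    new_host = host_map.get(parts.netloc.lower())
    if new_host is None:
        return url  # no known replacement for this host, leave the link alone
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


# Hypothetical broken links taken from catalogue records.
links = [
    "http://old.example.org/publications/report-2008.pdf",
    "http://old.example.org/publications/circular-12.pdf",
    "http://other.example.com/docs/guidance.pdf",
]
# Hypothetical mapping, worked out by hand once per organisation.
host_map = {"old.example.org": "www.example.org"}

for host, urls in group_by_host(links).items():
    print(host, len(urls))
print([rewrite_host(url, host_map) for url in links])
```

Working out the new base URL once per group and then applying it to every link in that group saves checking each record individually; the candidate URLs would still need testing to confirm they are live.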

Categories: Uncategorized

Welcome to POPE!

Preservation of Official Publications in Education

Welcome to the Preservation of Official Publications in Education (POPE) blog. POPE is a JISC-funded project, and progress updates will be posted here during its four-month lifetime.