Oct 25 2014

Preservation processing update

By mminer in practice

DataAccessioner developer Seth Shaw just sent a tool to help with reports analysis. He says he wants to get feedback on a simple report transformation (from .xml to .csv) tool first. After that, he’s going to add a way to aggregate the data from the .csv into size by type of file, etc. within the DataAccessioner.

He’s created a DA-branded version of his XSLTProcessor and named it the DA Metadata Transformer (DA-MT; see image below). You can download it @ http://dataaccessioner.org/downloads/da-mt/da-mt.zip

With this tool, you can copy in the XML output if DataAccessioner and receive a .csv file that can be opened in Excel. Once in Excel, sorting to identify file types and size-per-type is possible.

Screen shot of DA-MT interface; download at http://dataaccessioner.org/downloads/da-mt/da-mt.zip

He wants us to note:
1) Although the download is available he hasn’t yet created any documentation or links to it from within the DA website. There’s no firm time on completion at this point.
2) The original processor’s code is on GitHub (https://github.com/seth-shaw/XSLTProcessor) however it retains the original general purpose text. At the suggestion of some POWRR partners, he changed existing labels on the processor and created a “branded language file” that is included on GitHub but it requires a manual process after building to make the change.
3) An example of the general-purpose use is for mass-producing HTML or other versions of finding-aids from EAD. Most EAD transformation tools use the same process as the DA-MT. Your sources are the EAD files and the transforms are the “stylesheets” (xsl or xslt).

Where this all fits in my DP workflow: I use DataAccessioner to capture technical metadata as I move files from transfer media to my as-yet non-bit-level storage device. I use DA-MT to aggregate the file information from xml to something I can understand: file types, quantities and sizes by type. I store the aggregate information in my regular accession files (currently a spreadsheet). My accession information and an Access copy are in a different hard drive from the Master copy and XML. Some day, I will move the accessions with content I think is most at-risk (due to format or other unique attribute) into a bit-checking storage environment.

In keeping with the POWRR motto of “good enough DP for real people,” this workflow costs me no money, no technical expertise (beyond downloading Java and two processing files via ZIP) and very little extra time.

With DA, I am capturing all the recommended technical information for use by a back-end preservation system. With DA-MT I can track growth rate of digital content overall, make a case for purchasing better storage, and keep an eye on where all the at-risk file types are in the interim.

Another way to think of this workflow? I know a healthful diet includes a lot of leafy greens. Even though I can never remember the vitamins in each type of vegetable, I know they are there and they are good for me!

So put DA and DA-MT into your workflow for the long term health of your DP program!

mminer

5 comments

1 ping

Skip to comment form

- Asher Jackson on November 10, 2014 at 8:10 pm
- #
- Reply
I understand there may be a plug-in that can be used to convert the XML files into a spreadsheet. How would I locate this?
1. - on November 24, 2014 at 10:44 pm
  - #
  - Reply
  Hi Asher: You can use this link to download: http://dataaccessioner.org/downloads/da-mt/da-mt.zip
- Martin Peltz on January 5, 2015 at 6:20 pm
- #
- Reply
Hello. Has Seth created any documentation for the DA Metadata Transformer yet? I downloaded the DA-MT but some more detailed instructions on how to use the software to its optimum capacity would definitely help.
- Danielle Spalenka on February 2, 2015 at 3:28 pm
- #
- Reply
Not yet, but once you create your XML file using DA, open the DA-MT. From the button “Add DA Metadata,” go to the file path where your XML file is saved. (The default report type is a .csv file, but you can also view in HTML by downloading the file https://raw.githubusercontent.com/seth-shaw/XSLTProcessor/master/xslt/files.html.xslt).

From here, you will need to pick a place where the report will go live. Click the “Set Output Directory” to chose where you would like this report to live. Hit “Generate Reports” and the .csv file should appear in your output directory. Seth will be updating the website soon and we hope to create some more documentation in the coming months. Hope this helps!
- ANGGI HIDAYATULLOH on November 4, 2015 at 6:23 am
- #
- Reply
I use DataAccessioner to capture technical metadata as I move files from transfer media to my as-yet non-bit-level storage device