DataAccessioner developer Seth Shaw just sent a tool to help with reports analysis. He says he wants to get feedback on a simple report transformation (from .xml to .csv) tool first. After that, he’s going to add a way to aggregate the data from the .csv into size by type of file, etc. within the DataAccessioner.
He’s created a DA-branded version of his XSLTProcessor and named it the DA Metadata Transformer (DA-MT). You can download it @ http://dataaccessioner.org/downloads/da-mt/da-mt.zip
With this tool, you can copy in the XML output if DataAccessioner and receive a .csv file that can be opened in Excel. Once in Excel, sorting to identify file types and size-per-type is possible.
He wants us to note:
1) Although the download is available he hasn’t yet created any documentation or links to it from within the DA website. There’s no firm time on completion at this point.
2) The original processor’s code is on GitHub (https://github.com/seth-shaw/XSLTProcessor) however it retains the original general purpose text. At the suggestion of some POWRR partners, he changed existing labels on the processor and created a “branded language file” that is included on GitHub but it requires a manual process after building to make the change.
3) An example of the general-purpose use is for mass-producing HTML or other versions of finding-aids from EAD. Most EAD transformation tools use the same process as the DA-MT. Your sources are the EAD files and the transforms are the “stylesheets” (xsl or xslt).
Where this all fits in my DP workflow: I use DataAccessioner to capture technical metadata as I move files from transfer media to my as-yet non-bit-level storage device. I use DA-MT to aggregate the file information from xml to something I can understand: file types, quantities and sizes by type. I store the aggregate information in my regular accession files (currently a spreadsheet). My accession information and an Access copy are in a different hard drive from the Master copy and XML. Some day, I will move the accessions with content I think is most at-risk (due to format or other unique attribute) into a bit-checking storage environment.
In keeping with the POWRR motto of “good enough DP for real people,” this workflow costs me no money, no technical expertise (beyond downloading Java and two processing files via ZIP) and very little extra time.
With DA, I am capturing all the recommended technical information for use by a back-end preservation system. With DA-MT I can track growth rate of digital content overall, make a case for purchasing better storage, and keep an eye on where all the at-risk file types are in the interim.
Another way to think of this workflow? I know a healthful diet includes a lot of leafy greens. Even though I can never remember the vitamins in each type of vegetable, I know they are there and they are good for me!
So put DA and DA-MT into your workflow for the long term health of your DP program!