Back in October, we announced the new reporting tool for DataAccessioner was ready for download. The DA Metadata Transformer (DA-MT) tool was developed by Seth Shaw to transform the raw XML output from DataAccessioner into .csv and HTML files so that the reports are much easier to read. Many people had asked for documentation to provide more detailed instructions on how to use the tool. We are pleased to announce that a document has been created that provides instructions and screenshots for using this reporting tool to aid in preservation processing.
Mar 03 2015
Jan 27 2015
The Digital POWRR Project (Preserving digital Objects with Restricted Resources) is pleased to announce the continuation of the POWRR workshops for the next two years. The project, From Theory to Action: Extending the Reach of Digital POWRR Preservation Workshops, has been made possible in part by a major grant from the National Endowment for the Humanities: Celebrating 50 years of Excellence. The grant will allow the POWRR Project to update, develop, and present a minimum of six workshops on digital preservation for archivists, librarians, and other cultural heritage professionals, aimed particularly at those from small and medium-sized institutions.
The Digital POWRR Project began as an Institute of Museum and Library Services (IMLS)-funded grant study to explore practical and pragmatic solutions to digital preservation at under-funded institutions. During the course of our study, Digital POWRR Project team members realized that many information professionals felt overwhelmed by the scope of the problem. This prevented them from moving forward with implementing digital preservation activities. We found that digital preservation is best thought of as an incremental, ongoing, and ever-shifting set of actions, reactions, workflows, and policies. We can start performing digital preservation activities by taking small steps to prioritize and triage digital collections, while working to build awareness and advocate for resources.
We prepared a workshop curriculum based on these findings and presented it to several groups of information professionals as part of the project’s dissemination phase. Much to our surprise, registration for these workshops filled up quickly and created a long waiting list of eager professionals trying to get in. Towards the end of the project, organizations of information professionals were still reaching out to team members in hopes of bringing the workshop to their area. With the funds of the newly awarded grant from the National Endowment for the Humanities Division of Preservation and Access, the workshop can continue providing practical, hands-on solutions to begin digital preservation practices that meet the demands of the information professionals from small and under-funded institutions.
Over the course of the next two years, the POWRR Preservation Workshops will conduct a minimum of six workshops across the country. We will collaborate with regional organizations of information professionals, which will allow us to emphasize outreach to medium-sized and smaller institutions. These organizations will also help us promote the workshops. Should demand permit, the workshops could be repeated back-to-back on subsequent days at each location. Institutions are encouraged to send a single representative in order to maximize the reach to various institutions. The POWRR Project will also have a limited number of travel bursaries available to individuals in need of assistance traveling to the workshops.
Check back here for updates and to see if a workshop is coming to your area!
Nov 17 2014
Nov 17 2014
Take a look at this opportunity to hear members of the POWRR Team discuss the trials, tribulations, victories, and the future of the Digital POWRR Project.
“The NDSA Infrastructure working group invites you and your colleagues to a call on the outcomes of the Digital POWRR project. In keeping with our ongoing series of conversations, you can expect about half the call to be a presentation and the other half to be time for conversation and discussion.
Title: The Digital POWRR Project: What we discovered, what we did about it, and what still needs to be done.
Abstract: Lynne M Thomas and Jaime Schumacher will discuss the outcomes of this IMLS National Leadership Grant project, outline those deliverables that were particularly well-received by the community, identify gaps that have yet to be addressed, and, with the project end-date approaching, seek guidance on the transfer of project-created products that should be maintained and cultivated for the benefit of the wider community.
When: November 18, 2014 at 2pm ET
Call in #: 877-299-5123”
Oct 25 2014
DataAccessioner developer Seth Shaw just sent a tool to help with reports analysis. He says he wants to get feedback on a simple report transformation (from .xml to .csv) tool first. After that, he’s going to add a way to aggregate the data from the .csv into size by type of file, etc. within the DataAccessioner.
He’s created a DA-branded version of his XSLTProcessor and named it the DA Metadata Transformer (DA-MT; see image below). You can download it @ http://dataaccessioner.org/downloads/da-mt/da-mt.zip
With this tool, you can copy in the XML output of DataAccessioner and receive a .csv file that can be opened in Excel. Once in Excel, sorting to identify file types and size-per-type is possible.
He wants us to note:
1) Although the download is available he hasn’t yet created any documentation or links to it from within the DA website. There’s no firm time on completion at this point.
2) The original processor’s code is on GitHub (https://github.com/seth-shaw/XSLTProcessor); however, it retains the original general-purpose text. At the suggestion of some POWRR partners, he changed existing labels on the processor and created a “branded language file” that is included on GitHub, but making the change requires a manual process after building.
3) An example of the general-purpose use is for mass-producing HTML or other versions of finding-aids from EAD. Most EAD transformation tools use the same process as the DA-MT. Your sources are the EAD files and the transforms are the “stylesheets” (xsl or xslt).
Where this all fits in my DP workflow: I use DataAccessioner to capture technical metadata as I move files from transfer media to my as-yet non-bit-level storage device. I use DA-MT to aggregate the file information from xml to something I can understand: file types, quantities and sizes by type. I store the aggregate information in my regular accession files (currently a spreadsheet). My accession information and an Access copy are in a different hard drive from the Master copy and XML. Some day, I will move the accessions with content I think is most at-risk (due to format or other unique attribute) into a bit-checking storage environment.
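For anyone who prefers a script to sorting in Excel, the aggregation step described above can be sketched in a few lines of Python. This is only a rough illustration and not part of DA or DA-MT: the column names `format` and `size` are assumptions about what the .csv contains, and the sample rows are invented, so check both against the header row of your own export.

```python
import csv
import io
from collections import defaultdict


def summarize_by_type(csv_file, type_column="format", size_column="size"):
    """Return {file type: (file count, total bytes)} for a CSV report.

    The column names are assumptions; adjust them to match your .csv header.
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for row in csv.DictReader(csv_file):
        file_type = (row.get(type_column) or "").strip() or "unknown"
        size_text = (row.get(size_column) or "").strip()
        counts[file_type] += 1
        # Rows with a missing or non-numeric size still count as files.
        sizes[file_type] += int(size_text) if size_text.isdigit() else 0
    return {t: (counts[t], sizes[t]) for t in counts}


# Hypothetical sample rows standing in for a real report.
sample = io.StringIO(
    "name,format,size\n"
    "minutes.pdf,PDF,20480\n"
    "photo1.jpg,JPEG,1048576\n"
    "photo2.jpg,JPEG,2097152\n"
)
summary = summarize_by_type(sample)
for file_type, (count, total) in sorted(summary.items()):
    print(f"{file_type}: {count} file(s), {total} bytes")
```

To try it on a real report, open the file with `open(path, newline="")` and pass that file object in place of `sample`.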
In keeping with the POWRR motto of “good enough DP for real people,” this workflow costs me no money, no technical expertise (beyond downloading Java and two processing files via ZIP) and very little extra time.
With DA, I am capturing all the recommended technical information for use by a back-end preservation system. With DA-MT I can track growth rate of digital content overall, make a case for purchasing better storage, and keep an eye on where all the at-risk file types are in the interim.
Another way to think of this workflow? I know a healthful diet includes a lot of leafy greens. Even though I can never remember the vitamins in each type of vegetable, I know they are there and they are good for me!
So put DA and DA-MT into your workflow for the long term health of your DP program!
Oct 17 2014
I was honored to be able to represent the Digital POWRR project at iPres 2014, the 11th International Conference on Preservation of Digital Objects. iPres was held at the State Library of Victoria in Melbourne, Australia from October 6-10, 2014, and brought together leaders in the field of digital preservation from nations across the globe.
Melbourne was a spectacular host city for the conference, as it is full of friendly people, delicious food, beautiful architecture and scenery, and is home to a unique central business district that features countless narrow alleyways brimming with restaurants, shops, bars, and clubs. When I started researching my travel plans, I discovered one of these alleys had been christened “AC/DC Lane” in 2004 after the famous Australian rock band AC/DC. The photo of the street sign – complete with lightning bolt – made me laugh, as it then dawned on me just how much my POWRR poster had subconsciously been drawing from this iconic band’s imagery and energy. When I introduced my poster in Thursday’s quick-fire posters session, I made sure to mention this association, and invited the audience to perhaps alternatively think of my poster, and the Digital POWRR project overall, as “Digital Preservation Done Dirt Cheap” (riffing on the band’s 1976 hit album and song “Dirty Deeds Done Dirt Cheap”). It remains to be seen if digital preservation can indeed qualify as a “dirty deed,” and I had to stop myself from writing a Weird Al style parody of the song with my own lyrics relating to preservation topics. In any event, my comparison struck a very positive and happy chord among the participants, and I entertained constant attention from and stimulating discussions with delegates for the rest of the day. I was thrilled. Thunderstruck, even.
On the more serious side, my poster was titled “The Digital POWRR Project: Enabling Collaborative Pragmatic Digital Preservation Approaches.” I attempted to summarize the major work that the Digital POWRR team has been involved in over the last three years, including: the process of testing various preservation services and systems, researching and writing our white paper, compiling our (very popular) tool grid and our current collaboration with the COPTR initiative, the creation of specialized advocacy materials, the development of our popular workshops, our investment in the further development of the open source accessioning tool DataAccessioner, and our work on developing collaborative legal frameworks that can be utilized by anyone in the preservation world. The poster also tries to present some concluding thoughts and lessons learned from the experience of working on this particular project. My poster and summary from iPres can be viewed here in their high-resolution glory. I would like to give a big shout out to my friend Daniel M. Kanemoto for providing the graphical direction and styling that subconsciously harnessed the power of POWRR as well as AC/DC. I remember saying “Can you help me do a poster? All I know is that it needs to be full of thunderbolts. Can you do that?”
I was fortunate enough to attend a number of highly stimulating panels, papers, discussions, and workshops during my time at iPres. Among the highlights of my trip were the hands-on workshop for the BitCurator digital forensics software toolkit (led by the enigmatic Cal Lee), a wonderful panel moderated by Paul Wheatley titled “Getting to Digital Preservation Tools that ‘Just Work’” (which was held in an overflowing room and probably could have sustained a discussion for an entire day!), hearing about further developments in the 4C Project (including the spectacular Digital Curation Costs Exchange site), and, certainly, the spectacular gala dinner. The State Library of Victoria was a lovely host venue, combining a stunning building (or series of 23 buildings!) with truly warm and helpful employees. iPres 2015 will be held in Chapel Hill, North Carolina, so keep an eye out for more on that in the coming months!
Oh yes, one other highlight of my trip was getting to see an actual live koala in the wild! I didn’t have much time for sightseeing, but I did manage to take a day long bus tour of the Great Ocean Road. It was a day that I will always cherish. Cheers to my new mates in Australia, and thanks for the wonderful week Down Under.
***As for the title of this post, “For Those About to Preserve….We Salute You”…if you are familiar with the source material at all, then I hope you enjoy these alternative lyrics that just somehow popped into my head. Who knew that AC/DC was so relevant to our field??
“Stand up and be counted for what you are about to receive!
We are the curators,
We’ll give you everything you need!
Hail hail to the chain of custody!
Cuz format migration has got the right of way.
Hey, can you emulate this for me?
We’re not just saving for today.
For those about to preserve, we salute you….
For those about to preserve, we salute you…
We preserve files at dawn on the digital front line..
Like a bolt right outta the blue,
The skies alight with a computer byte,
Checksums will roll and rock tonight!
For those about to preserve….we salute you…”
Aug 27 2014
Aug 13 2014
The POWRR Team is excited to share this news with those looking for a hosted, soup-to-nuts, digital preservation solution:
DuraSpace and Artefactual have joined forces! Check out the news release at the link below:
May 02 2014
In response to a question from her supervisor, a recent workshop participant asked how to guesstimate the amount – in numbers – of data she would need to store per week.
My reply was that it’s hard to estimate amounts of new material you might need to store in the future until you decide what you’re preserving. Selection is the unsung hero, in my POV, of any kind of preservation. We simply must decide what we are willing and able to keep. But where to start?
An inventory of what you’re currently responsible for is widely recommended and very helpful. After that, it will be useful to note exactly what in your inventory is most at risk. Data accumulation rates that will matter most to administrators will depend on what you decide you must preserve at full bit-level and what can live happily, for at least some time, in basic offline and (don’t forget this part!) geographically distributed storage locations.
Here’s an example of this decision making principle from my world:
Most of the material that my library has digitized, I’m comfortable NOT assigning to the queue for bit-level preservation. There are only a few objects that I digitized because the media they were on was so out of date that it was inaccessible or the originals were too fragile. These things that were truly digitized to preserve their current state AND intellectual value are a higher bit-level preservation concern for me, but they are also in a minority of my digital holdings right now.
Paper material like yearbooks, honors theses, faculty meeting minutes and the student newspaper are all really useful as searchable digital objects, and I want to protect the investment we made in scanning (some outsourced, some not), but I have all the originals and will not be discarding them so I’m not concerned with moving the digital versions into a preservation system right away. Those things will be just fine in an offline storage location until we decide if we want to pay extra just to protect that initial investment. Some born-digital documents that are worth keeping digitally (due to importance of content AND value of keeping in keyword searchable format), stay that way. But a lot of the messages that are important to keep (e.g., brief meeting records and emails from administrators about policy changes) do not have attributes that make them worth keeping digitally, so I print them.
On the other hand, my campus has over a decade’s worth of digital-only campus photographs, all in jpg, and our new content management system allows individual departments and organizations to post their own photos to their own pages. Those things are unique and being created by people who couldn’t care less about high-res formats because they’re just out there doing their jobs and trying to attract people to our school or showcase their achievements. Additionally, our major events (commencement, several all-campus convocations and colloquia, and our sporting events) are now only being captured live and streamed through a subscription service. I have decided that these born-digital media are more at risk both because of how they are created and because of the lack of consistent metadata they are created with. Therefore, they’re higher up in my queue for prioritizing preservation actions that include enhancing metadata and monitoring format migration needs in an automated environment.
Note that they are also not “library” materials, so they’re going into my this-is-a-common-good-and-therefore-a-shared-cost-responsibility argument 😉
I use the word “queue” because we have no subscription to an automated bit-level system yet. But by separating things out in this way, I have a smaller amount of somewhat-regularly-added-to types of content I can guesstimate based on past collection practices. I had success in getting on a regular transfer schedule of the streamed media because IT is in charge of monitoring that service, and they are now giving me an annual deposit of everything that was streamed the previous year. I’ve got a tougher chore with people posting things to their own web pages. However, IT is trying to communicate the value of using Flickr accounts to manage these files, and if people do start following that advice, I’ll be able to quantify that data because IT passes out logins to our campus Flickr subscription. That way I’ll get a glimpse of what people are doing and can start educational outreach with those dept/orgs about improving their creation and description practices.
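To turn past collection practices into the per-week number a supervisor is asking for, the arithmetic is simple enough to script. The sketch below is illustrative only: the deposit sizes are made-up figures, and averaging past annual deposits is just one assumed way to guesstimate, so treat the result as a starting point for conversation and not a forecast.

```python
def weekly_estimate_gb(annual_deposits_gb, copies=1):
    """Average past annual deposits and convert to GB per week.

    `copies` multiplies the result to account for geographically
    distributed copies of the same content.
    """
    if not annual_deposits_gb:
        raise ValueError("need at least one year of past deposits")
    average_per_year = sum(annual_deposits_gb) / len(annual_deposits_gb)
    return average_per_year / 52 * copies


# Hypothetical sizes (in GB) of three past annual deposits.
past_deposits_gb = [400, 520, 640]

one_copy = weekly_estimate_gb(past_deposits_gb)
two_copies = weekly_estimate_gb(past_deposits_gb, copies=2)
print(f"About {one_copy:.1f} GB per week, or {two_copies:.1f} GB with two copies")
```

Running the estimate separately for each content type in the queue (streamed media, campus photographs, and so on) keeps the numbers tied to the selection decisions behind them.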
Frankly, people who are doing their own thing outside of campus-related programs/services are, in my opinion, on their own. It sounds harsh to say it that way, but I can’t save what I don’t know exists and don’t have access to!
May 02 2014
In a previous post about acquiring digital content, Stacey mentioned that we often “take it all in, the good shepherds that we are. We build systems and websites that can do nifty things.” Stacey’s post was a cautionary tale, and others have expressed these concerns, too. Now I’ll add my voice to it.
I’m a firm believer in being practical about what I attempt to do and honest with others about what I can’t. In my case, that means I won’t “take it all in,” and for what I do take in, I’m critical about what I will keep in electronic form.
A post I read last year on the Society of American Archivists’ listserv for Lone Arrangers (i.e., people who work in archives and have no other full time staff assigned) touched on this topic. The post was about how lone arrangers were managing email preservation and many products were mentioned and then this came from an archivist who also teaches archives management courses:
“The archiving email question comes up all the time, and I have a stock answer. I tell my [X university] preservation students to be bold, if they have to, and keep paper. Yes, paper. Print it out, attachments included, stick it in a folder, and forget about it.
“My motto as an archivist, lone arranger and preservation teacher is, ‘Don’t sign up for the impossible.’ If big institutions are working hard and spending more to sustain their email archives, we little guys ought to be asking ourselves why. That way, we’ll have the answers when the administration comes to us and says to start archiving email.”
I couldn’t agree more! From my POV, it’s all about choosing where you are going to invest your time. I’m lucky to work in a private institution, so I am not subject to all the public records requirements some of my colleagues are, but I think there are larger issues at stake and as a profession I think we do need to start pushing back a bit.
Our users (and the people/agencies that “mandate” things of us) don’t have reasonable expectations when it comes to digital objects. They think these things exist in tangible forms because they can see them before their very eyes, but the underlying code is anything but tangible. And the way objects are created and served by our users makes a lot of what we might capture not really worth saving, according to “best practices” (thinking of those 72dpi jpgs I was sent a while ago).
For us to capture objects and make them meaningful over time, we have to impress on the people who create them and on the people who choose the systems our users operate in, that standards exist for a reason. A printed piece of paper is not the flashiest use of the latest new technology, but as long as the paper and ink last, and as long as the language/symbols printed on paper hold meaning, it can be conveyed over time!