Testing A Piece of Digital Preservation Management Software: Day One Thoughts

As our intrepid readers know, we are now entering the testing phase of our project. Want to see where we are? Check out the wiki.

So, I attempted to test my first piece of software today. Keep in mind that this is end-user style testing, where my goal was to launch the installed software on the requisite laptop, completely cold, and figure out how easy it was to use out of the box by poking at it. We’re asking questions like “how intuitive is it?” You can find our tool evaluation documents on our wiki.

It was…educational. And not as easy/intuitive as I’d hoped.

Here’s what I was hoping to do with this software (no, not identifying it right now, but it’s one of the ones we’ve selected to look at in-depth):

1. Ingest a folder with a small directory structure and three different file types.

2. Have the software extract or import metadata from the files (or, later, from a supplied metadata file).

3. Add the metadata to the files, and POOF! prepared for export into the long-term storage vessel of my choice.

*jazzhands*

It…didn’t work like that. I suspected not, and that was confirmed.

This tool seems aimed towards large-batch prep for preservation of materials that are fully processed with all the metadata ready to go, not a small batch of stuff without ready metadata.

I had trouble navigating through the software to find the pages indicated by the help documents. My first move was to read the help and try to find a solution there. Within 15 minutes of opening the software I was making notes in caps lock on my documentation sheet out of frustration, especially when the navigation in the help documentation also failed and I had to relaunch the software to regain the windows I had clicked away from.

 

This tool (based on the help files that I finally attempted to use) assumed I had metadata all ready to go in a delimited file. What I had was an XML record for the whole file that I managed to export from our ARCHON system, and I couldn’t figure out how to convert or import it into the “delimited file” that the system expected.

The tool also assumed that I knew enough about MODS to build a record more or less from scratch, if I didn’t have delimited metadata to add. (I’m a former cataloger but not a metadata expert by any means. I don’t routinely build records, I use tools like ARCHON to build them FOR me out of relatively standardized natural language that I enter into the software.)

I have trouble imagining the workflow for this software without more expertise than I currently have. I’m sure I’ll DEVELOP more as I learn more. But.  Frustrating.

I’m sure that the other tools I’ll test will likely be frustrating in slightly different ways.

Digital preservation lesson for the day: there will always be things that you don’t know. The cost of not knowing is the time it takes to either ask for help or figure it out yourself.

…whee?

Reflections from Digital Directions 2013 Conference

Digital Directions Sign


From July 21st to the 23rd I had the opportunity to attend Digital Directions: Fundamentals of Creating and Managing Digital Collections, a conference sponsored by the Northeast Document Conservation Center (NEDCC), on the campus of the University of Michigan, Ann Arbor. Along with the friendly people, good eats, and plenty of bookstores, I was able to network and engage with people about digital issues in their many forms. Attendees ranged from experts in the field to those just getting started in the process. Find the conference program here.

The first day was a lot of theory and guidelines about various issues for digital collections. It was quite overwhelming at some points, with a lot of resources and terms being introduced. Day two consisted of breakout sessions, where I focused on digital collections, digital preservation, metadata, digital repositories, and cloud storage. Day three focused on collaboration within your institution, specifically between IT and the library, and externally. One of the main ideas from day three was that partnerships are beneficial and that all project partners, big and small, bring something valuable to the table.

On the third day attendees had the opportunity to tour two of the labs on the U of M campus: the Digital Conversion Unit or the Technology Lab. I did the tour of the conversion unit, which was quite impressive; I was glad to see that they have the same Epson 11000XL that we have at Chicago State. See images below.

During the informal breaks I had the opportunity to talk about our project (and to pass out the cards), and everyone I spoke to about it was impressed. Some had heard about it and others were interested in learning more. One of the archivists I talked to mentioned that it was hard within her state to get people to work together, so she was interested in knowing how we formed and what the responsibilities of each institution are.

Conference Takeaways

  1. Know your institution in terms of risk management (is some loss acceptable to you? who will be doing the metadata, and how specific will it be?), budget, staffing (whose responsibility is what), formats, mission, etc.
  2. It does not take much to get started with digital preservation; every little bit helps.
  3. You really cannot do it alone (get assistance at every stage of the process).
  4. Adapt standards, guidelines, and best practices to your institution; sometimes just “good enough” works.
  5. Make your metadata interoperable and specific (e.g. “downstate and Illinois” versus just “downstate”), so that when you merge records it is clear.
  6. Approach stakeholders with a tailored message; this can be done through workshops and one-on-one sessions. When involving IT, do not let them take over the project; this is your territory.
  7. Assessment of digital collections has to be done, whether qualitative or quantitative.
  8. Document what you have done to the collections so that 1) those in the future can know and 2) you can confirm that data was not lost in the transitions (bit count).
  9. Within the conversation about digital preservation we need to make clear the difference between preservation copies and access copies.
  10. Pay attention to the environment in which digitization takes place, in terms of lighting, monitors, and equipment.

I think it is interesting and important to point out that while there was diversity among the conference attendees, it seemed to me that I was the only attendee (if not, I apologize) from a predominantly African American university. This is why I think CSU’s involvement can demonstrate the importance not just of digitizing and providing access to collections about African Americans, but also of preserving them for the long term for future generations.

One of the highlights (in addition to the conference) was exploring the campus libraries and talking with the Outreach Archivist in their Special Collections. To me it is always nice to see how their special collections are set up and what kinds of programming they do.

If you will excuse my metaphors, digital preservation is like this cube in a couple of ways:

  1. It only takes a little push to get rolling, and
  2. You have to look at it from several angles

Please add your metaphor in the comments.

Post written by Aaisha Haykal, University Archivist, Chicago State University.

Where is the practical digital preservation research? Some here.

Today the NDIIPP blog asked, “Where is applied digital preservation research?”

*stands up*

We’re right here.

The Digital POWRR project is focused on pragmatic, practical, sustainable digital preservation for small and medium-sized libraries and cultural heritage organizations. We know we need to do this work, but the resources, and sometimes the knowledge, are tough to come by.

Which is where our project comes in. Have you looked at our Tool Grid? We’ve compared and contrasted over 50 different tools for digital preservation. We’re spending this summer doing in-depth end-user testing of several open-source comprehensive DP solutions (Archivematica, Curator’s Workbench, and Hoppla) to see how they interact with long-term storage services like MetaArchive, DuraCloud, and Archive-It.

Want to keep up on our shenanigans? Check out our project wiki. We’re trying to figure out how to do DP at our own institutions, publicly, without a net.

Thanks to the IMLS for funding this highwire act.

A little soul-baring

In our regular conference call this month I was venting, yet again, about all the trouble I have with people not being critical about 1) what digital content they create, and 2) how we can possibly save it all. These obstacles seem insurmountable. My colleagues challenged me to write about this frustration so here goes. I’ll try not to let this become a complete rant 🙂

All of us on the POWRR team are doing a lot of reading–articles, books, software “about” pages, technical reports–and there seems to be no controversy about how digital preservation actually happens: you make copies, put them in different places and do bit-level checks from here to eternity. Or at least for as long as it takes for someone to come up with a better way. That process is what happens once an object is captured/ingested into a DP system. But there’s the rub! You’ve got to get your virtual hands on the things in order to ingest them!

Once upon a time I lived in a part of the country that was known for its cockroach population. I was in an apartment that I thoroughly cleaned before moving in and that the landlord regularly had bug toxins professionally applied to. And yet, I could still count on seeing fleeting glimpses of scurrying bodies if I turned the kitchen light on at night. That’s the image that comes to mind when I think of how “official” communication happens at my institution. My only consolation comes from regular attendance at professional archivists’ conferences that help me realize I am not alone.

We live in a fragmented, scurrying world of e-record creation. It’s a place where Content Management Systems are viewed as a good thing so that every employee can create a Web page, upload, edit or change a document in their department 24/7. Things get “routed” by blog posts, Websites and email distribution lists. Tweets, Facebook, Vimeo, YouTube and ISSUU channels are all the rage. Question: if there’s a conflict between the way a document shows up in one cloud service compared to another, who will sort it out?

And if by chance you’ve gotten the ear of the right person in the right office and can snag a copy of things as they’re distributed, you’ll also need to become a file-name decoder. Who knows how long it has been since anyone has gone through professional training to become a secretary? Instead of a reliable core of institutionally-knowledgeable people who learned how to file and name things, every person today is their own publisher and distributor. And every person who comes after them will have their own way of doing things and then it’s back to square one: introduce yourself, state what kind of record you’d like a copy of and why, let alone how.

We know that people have always come up with creative ways to document their own lives, but now those practices have spread to professional realms of our institutional history. This feels like a game-changer. Others will have different points-of-view, but I have no interest in telling people about what they should do with their personal files — for research or pleasure. People in the library and archives professions are often asked about how individuals can save personal objects after the fact, like when a photo album gets wet or when the folded over letters/certificates of an ancestor resurface from some forgotten storage place. My anticipated response to the question “What should I do?” after the fact in the digital age seems pretty brief at this point: Find a digital archaeologist!

The future of my profession lies in becoming that archaeologist. I may already be the dinosaur 😉 but my sincere hope is that my institution’s digital bones will last long enough to be excavated by my future self. More will be needed, though, to make sense out of the Digital Deluge. I’d like to propose a new area of study for the archivist/archaeologist: social psychology. There’s going to be a real need for professionally prepared people who can understand why/how people create things as well as how to help their objects resurface for future use.

What Are Your Digital Preservation Roadblocks?

shared under a Creative Commons license DeweyC21; original source: http://www.artsjournal.com/dewey21c/road_block.jpg

Working on large initiatives like digital preservation often feels like being in a car on a bumpy road under construction. There are lots of stops and starts. It’s “hurry up and wait.”

One of the things we’ve come across here on the Digital POWRR project that we weren’t expecting is a set of technical roadblocks. Specifically, that moment when you’re figuring out what next step to take, and you look at the instructions that some generous soul has written up for installing or configuring a piece of software, and you see words like “SourceForge,” “virtual machine,” and “command line.” I’ve linked to the Wikipedia explanation of each, which is one of the ways I’ve come to understand these terms.

That’s the first roadblock. The instructions assume that you, too, are a developer or tester, and know what these things are, how to access them, and how to install and run them on your computer. If you’ve never used a command-line interface within a virtual machine to install software from SourceForge, you may find some of this intimidating.

If you’re in an open computing environment (i.e. you can download anything you want to your machine without anyone’s permission), you may be able to poke at some of this stuff and muddle through it, especially with the help of user communities. That said, I would love to see some of the instruction write-ups for software get “translated” from developer-to-developer into developer-to-possible-end-user.

Let’s put it this way. I did the tiniest amount of command line work back in the late 1990s in library school. I learned HTML. I took one scripting class, and we worked in Perl. I am not a programmer or a network administrator, and the chances of me becoming one in the next year are slim to none. I can follow basic directions to complete a task, and am unlikely to blow up my computer by hitting a wrong key. I can tell the difference between a good source of code and a bad one, and I’m not the kind of user who clicks random links and spreads malware or viruses. I’m *reasonably* savvy as a user. (Real world example: I could probably root my Nook to turn it into an Android tablet with decent directions, but it would take me three times as long as someone who does this sort of thing for a living or as a hobby.)

This is partially because of the second roadblock. I work in a closed computing environment, and have for most of my professional career. Like many libraries, we have dedicated professional staff who maintain our desktop machines and monitor all the different software packages that we use. They are great, but any project that involves new software in a computing environment like ours means that they, too, are involved in the project. They have to be, to maintain the integrity of our networked environment.

Even if I wanted to, I’m not generally allowed to download and install software on my desktop just to take it for a spin. I’ve never installed open source software, and I wouldn’t know how to set up a virtual machine if my life depended upon it.

I understand the need for a closed computing environment when you’re trying to deal with a really complex system. No one wants to deal with people installing random software willy-nilly that could crash everyone else’s computers. Heck, no one wants to be the end-user who DID that.

So how do we drive around the roadblocks? Lots of meetings. Good, clear communication. More meetings. Understanding that the systems folks in our closed environment are very busy, and making sure that our requests for software are clear, concise, and lay out the risks, if any, as best we can determine them.

This is one of the roadblocks we’re dealing with as we get ready to test software: the combination of not being terribly familiar with the downloading/configuration side, and working within the schedules of our fantastic IT folk to make it all work.

What are your roadblocks to digital preservation?

Preservation Exhibit at NIU

With Preservation Month just around the corner we have put together a Preservation Exhibit here at NIU that covers both physical and digital preservation. We would like you to see a little preview of what has come together, and in the future we hope to also adapt the exhibit into an online version. We’ll keep you updated. ; )

This is the exhibit introduction poster…

Our Gift to the Future Exhibit Introduction Poster

 

The first case focuses on physical preservation. It discusses and shows some of the tools and strategies that archivists can use to help preserve paper (and other physical) materials.

Preservation Exhibit Case1 NIU

The second case includes two “tales.” One is a Tale of 2 Artifacts, contrasting the Dead Sea Scrolls, which have been preserved as much as possible, with the ’96 election website for Clinton, which has not been preserved; most of what remains is an image of the homepage. The second is a Tale of 2 Generations, 1913 and 2013: the individuals of 1913 have journals and photos that have been preserved, but do the individuals of today know how to save their Facebook profiles and the like so that future generations can learn about them?

Preservation Exhibit Case 2 NIU

The third case explains what can happen to digital objects and files over time, and thus why they need to be preserved with care.   Specifically it explains hardware obsolescence, software obsolescence, and bit-level deterioration.

Preservation Exhibit Case 3 NIU

The last case includes personal preservation tips (top) and explains what is going on at NIU concerning digital preservation (bottom).

Preservation Exhibit Case 4 NIU

The last piece of our exhibit is a poster that gives a little bit more information about our Digital POWRR research project.  Here is a digital version…

PowrrPoster1

Are you doing something to share or celebrate Preservation Month?  Tell us about it in the comments!

Do it anyway.

 

I’ve been thinking about this song lately. Not just because I’m a fan of the Muppets and enjoy Ben Folds (although both of these things are absolutely true).

But because keeping up with digital preservation, all of the tech, all of the news, all of the projects is hard.

I just want to acknowledge that. This is hard.

One of the tasks I needed to complete this month was getting information about several projects to add to our Tool Grid. This will be available once it’s complete.

Talk about a very quick lesson in how much there is to know, and how little of it I can keep in my head.

The way our tool grid works is that we divvied up the names of tools, and each of us basically tries to figure out what we can from looking at each tool’s website. This practice should be familiar to anyone currently working in libraries and archives.

Just as an example, let me talk about a tool called BWF Metaedit, which was on my list.

So far as I can tell, BWF Metaedit is an open source tool that “permits embedding, editing, and exporting of metadata in Broadcast WAVE Format (BWF) files.” It was created by “Federal Agencies Digitization Guidelines Initiative (FADGI) supported by AudioVisual Preservation Solutions.”

So, free to use, and it does a specific task. Great. I’m approaching this as someone who doesn’t work with AV very much; my metadata experience is limited to social media tagging and Archon these days, although I used to be a cataloger way back when.  I tried looking at the documentation, and I must say:

This documentation? Not designed for anyone other than a current developer or someone who does coding routinely. Here’s how you set preferences for the software. Here’s a suggested workflow. I think I understood *some* of that one.

Now, my MLS is about *cough* 15 years old. I took a single scripting class there (Perl, if you’re curious), specifically to give me some chance of being able to parse this kind of information when working with specialists.

I’m feeling very much out of my depth here.

That, I expect, is not uncommon. Nor is the feeling that there just isn’t time and energy and administrative space to learn all of this stuff, even if I wanted to. Which I’m not sure I do, right this second, as this particular tool is less likely to be “my problem” in current library workflows here. It’s not one of the tools we’re looking at closely as one of our more comprehensive solutions, but it’s good to know it exists.

But for the materials that *are* “my problem”? I need to learn “all this stuff” *gestures at the wide world of tech*.

…or maybe I don’t. Maybe I need to learn just what I need to learn. Maybe I need to acknowledge that there will be gaps in my knowledge, and that’s OK. I can develop expertise over time, in the things that are relevant to me, to my collections, to my organization. Just as I did for the rest of my library work. For teaching and bibliographic instruction. For collection development. For exhibits. I’m basically self-taught in many of these areas, going from things I experienced as a student assistant, graduate assistant, library assistant, and patron. Or I just did what I thought was best for the collection based on the information I had, and didn’t exceed my budget.

I did it anyway. Without ALL the information. Just enough.

Digital preservation is no different.

Yes. I live in the fear that Digital POWRR will somehow miss one of the bajillion projects that are out there. It’s really, really hard to get one’s arms around this whole digital preservation problem. I fear that we will not choose wisely for our digital preservation setup, even though we are doing everything in our power to do so. I fear that budget restrictions will force us to select an option that is not the best for us, but what we can afford.

It’s absolutely a risk. One that every organization must take.

We must do it anyway.

Paralysis, or hoping the problem will go away, is not an option. These materials will only grow in number and size, and change. It is better to do something than nothing. Something gets us more options 3-5 years down the road.

Doing nothing ensures that our current crop of cultural heritage professionals will be remembered as the generation that really, really borked the digital revolution and killed our historic legacy just as surely as the people who burned letters and papers in the 19th century.

Which is definitely NOT what I was hoping for.

 

A Cry From the Trenches: Fighting the Ongoing War with Digital Obsolescence One Day at a Time

I’ve lost track of how many “OH CRAP!” moments I have experienced in my short career. As the Digital Collections Curator at Northern Illinois University, I have seen some pretty gnarly (digital) things. Things that have really hammered home for me the importance of the research work we are doing on the Digital POWRR project.

Many institutions have acquired digital content without much thought given to future use and access. We take it all in, the good shepherds that we are. We build systems and websites that can do nifty things. But we frequently forget that digital materials require active and ongoing preservation. This is not a one-time thing! Traditional printed materials readily show us their boo-boos upon physical inspection: torn pages, broken bindings, insect damage, mold, graffiti, etc. Digital stuff, on the other hand, can’t be eyeballed for a quick health check. You need to be mindfully monitoring it constantly. Like a puppy near a bunch of electrical cables. 0.o

However, sometimes just monitoring or checking files does not ensure that they will be adequately preserved for true long-term use. You are inevitably going to run into challenges opening, rendering, or transforming digital objects that are dependent on proprietary software. I’m sure most everyone knows the pains of shuttling even things as simple as word processing files between different software versions and hardware platforms. In the mid-1990s, I remember being horrified that Microsoft Word wouldn’t open my ClarisWorks files. How many of us actually take the time to forward-migrate word processing documents proactively? The need always seems to surface just when you need a crucial piece of information. At least there are digital forensics tools and converters to help you out of that jam. But what if you opted to save materials in a compressed, lossy file type rather than retaining a full-quality master file? Getting those bytes back is simply not possible.

This very issue is currently a problem here in the Digital Initiatives unit as we move old digital objects into a Fedora Commons-based repository framework. We have a collection of audio files that were recorded in the late 1990s for which we have no master or preservation format. There are no analog master tapes, no recording software session files, no digital .wavs, and certainly no broadcast .wavs. The majority of the recordings only seem to exist in RealMedia format. Even though I plan to follow the Association for Recorded Sound Collections’ best practices regarding audio materials, it remains to be seen whether we will be able to migrate RealMedia files to .wav. We have also worked with a variety of video files for our Southeast Asia Digital Library project, and have run into various baffling issues with file types, codecs, and software platforms. Standards in digital video are in such flux that one can feel utterly paralyzed when it comes to making preservation decisions. The LOC digital preservation blog, The Signal, even refers to digital video standards as presently being “the wild, wild west.” I wonder if I should start wearing a holster and riding a pony to work….

And what if you can’t even find your darn files to begin with? Are they stored on a network drive, a server, or scattered across external hard drives, flash drives, or even CD/DVD-ROMs? Consolidating all of our materials onto a single NAS storage unit has been a huge step forward for us; however, occasionally I still cannot find files when I need them. Sometimes master files have different filenames than their derivatives. This would not be a problem if it were a file here or there, but hundreds or thousands is another matter. Bulk file-renaming programs are very powerful tools that can be a blessing to people working in a digital production lab. However, whoever is doing image production and processing needs to keep an eye on their part of the preservation puzzle. Remember to sync your naming across all file versions, and document your work, preferably for your co-workers and successors. Better yet, create and stick to consistent file naming protocols for all of your digital collections; that is one of the best preservation strategies there is. Also, consider sticking to hierarchical folder structures that are easily intuited by your colleagues, and document your methods of organization as much as you can.

There are other preservation nightmares that we have experienced in the lab. Have you ever been stranded? (“Stranded on your own….stranded far from home..” as a favorite old song of mine goes.) Someone forgot to pick you up….your car broke down….your flight was cancelled….doesn’t it feel awful? Have you ever been unable to rescue and revive stranded digital objects? The poor things. They went on a three-hour tour, and now they’re stuck on Gilligan’s Island. (And yes, I know they’re just 1s and 0s…I just tend to anthropomorphize things to keep myself entertained!)

Our lab’s own Gilligan’s Island has recently become a full-fledged lost continent of Atlantis, sadly. I am speaking of a stand-alone server that we used to power our interactive map resources, running ESRI’s ArcGIS Server software. The server was quite old; it ran Windows Server 2000. Yes, *wince*. We had known it was terrifically out of date for some time, but we were unable to migrate to a new server because our version of ArcGIS required this particular server environment. Upgrading ArcGIS (which the library had initially received for free) was a costly proposition. The files associated with the software are proprietary and cannot simply be exported and loaded into an open source alternative. After years of hand-wringing, the server finally failed. Migrating the data and software to a virtual machine also failed. We are going to have to reconstruct Atlantis in an entirely new framework, rather than building on what was there previously.

Although this loss will allow us to reconceptualize and refresh our interactive resources, it’s a shame to lose a digital resource that you or your colleagues have put a lot of hard work into. But this experience has proved to be a genuine “life lesson” that can inform and improve practice for ourselves and others as well. Our Digital Initiatives unit, like countless others, sprang up due to the availability of grant money for large-scale digitization initiatives. There was an initial push to make materials available, and not as much thought given towards their long-term preservation. The Digital POWRR project is allowing us to go through the digital preservation planning process, so that we can fully take inventory of our digital materials, assess (and adjust) our past and future practices, and write comprehensive policies so that we never have to deal with the “oh crap” types of moments that I have described in this post. Well, maybe not never….but hopefully way less! Additionally, policies and procedures need to be holistic, addressing both the digital objects and their underlying environment (hardware and software). Websites and other interactive types of resources should be part of any ongoing digital preservation plan – they need to be consistently evaluated and refreshed as well.

Digital POWRR has become kind of like the brown paper bag I hyperventilate into when I’m having an “oh crap” kind of day. It’s soothing to know that I’m not alone, and that I have colleagues grappling with the same issues out there in the digital ether. And it’s downright exciting to finally be able to install and test digital curation tools and systems that will help get us thinking in more of a “lifecycle” mindset, rather than a linear one. I’m definitely feeling more confident in our ability to straighten out our past and lay the groundwork for the future.

2012 Digital POWRR Annual Project Report

We have posted the Digital POWRR report for the first year to our wiki. If you’re interested in reading our Annual Report narrative from 2012, you can do so HERE! It discusses the progress the Digital POWRR project has made and some of the challenges we have encountered. Boy, have we gotten a lot done, so be sure to check it out! Enjoy : )

An Unsung Hero of Grants Administration: The Business Manager

Everyone’s organization is different, but we all have someone who fits the description:

The Library Business Manager, otherwise known as She (or He) Who Makes Things Happen.

The Business Manager is the person who does lots of cat-herding, who makes sure that all of the forms that you didn’t know existed get filled out, and that the processes at any given institution are followed, so that all of the money from the grant can be used. They deal with red tape so that you can focus on the actual grant work. They are like administrative ninjas.

They know whom to call to get key questions answered, paperwork moved along, and checks cut.

We *love* our business manager. She makes things happen. (Thanks, S!)

If you’re new to grants administration, here’s a word to the wise: get to know (and appreciate) your business manager (or the person who fulfills that role in your organization).  They make life so much better for everyone.