Managing Born Digital Content For Dummies

**Disclaimer: No, we don’t actually think we (or you) are “dummies” for not knowing this information, even if that is how we may feel. It’s just a catchy title. ; )**

We have all been there at some point in our lives.  We set out to get something done and have no idea how to do it.  So the logical next step is to find some sort of instructions, yeah? *insert head nods*

We (many of us on the Digital POWRR team) had just such a question: what do we do first when we have a physical piece of media in our hands? Lo and behold, we found this wonderful article that we believed would solve all our problems!

Erway, Ricky. 2012. You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Received on Physical Media. Dublin, Ohio: OCLC Research. http://www.oclc.org/content/dam/research/publications/library/2012/2012-06.pdf

(Seriously, it’s great! Give it a read!)

After briefly reading it over, many of us accepted that the problem was solved and moved on with our lives, temporarily. Then, recently, one of us actually tried to go through it step by step, and we realized our knowledge may not be as deep as we originally thought. So we have taken this wonderful article, which gives a step-by-step process for getting off the ground with physical media, and added some information for those of us who may not have all the training to get over the hurdles we weren’t aware were going to be in our way. It can be found in the Digital Preservation 101 section of this website, or directly linked right…HERE.

Discovering these resources was simply a matter of researching the things that tripped us up. How do we “write protect” things? What is a disk image? How do we create a checksum? The additional resources are just links to places that will help answer those questions, so instead of you having to take the time to look for them, we did!
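
For the checksum question in particular, a few lines of code can do the job. What follows is a minimal Python sketch, not something taken from the Erway article or from our resource list: it reads a file in chunks and returns its SHA-256 checksum using Python's standard `hashlib` module. The function name `sha256_of_file` and the file name in the note below are our own placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string.

    The function name and chunk size are illustrative choices.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so a large disk image does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("disk_image.img")` (a hypothetical file name) returns a 64-character hexadecimal string. Run it again later: if the string is unchanged, the file's contents are unchanged.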

Digital Preservation Tool Help? Yes, Please!

The Digital POWRR project team is excited to be working on something we think will be helpful to those of you who are trying to sort out what on EARTH all of the (alleged) digital preservation tools and technologies actually do! It started with a brainstorming session where the team captured every tool/technology/service that we have come across in our digital preservation explorations. We came up with almost 90….ACK! Seriously, who has the TIME or the INCLINATION to sift through all of these tools to figure out what they do, how much they cost, etc.

WE DO!

Taking a divide-and-conquer approach (along with a whittled down list of 60 tools), the Digital POWRR team is tackling this so you don’t have to! As a part of our investigation into how institutions with fewer resources (read: money and/or people) can engage successfully in digital preservation, we have created a grid that will map out each tool against a list of functions a digital preservation system should provide. We have based our list of functions on the OAIS reference model and thrown in a few of our own, like:

 

  • Is it open source?
  • What are the basic system requirements?
  • How much does it cost (or is it FREE!!!)?
  • Does it offer a geographically dispersed data storage model?

We feel the pain of our colleagues who are trying to figure out this “digital preservation thing” while still managing all of their normal responsibilities. (ya know, like explaining to the very nice donor why your institution can’t take Uncle Bert’s collection of romance paperbacks off of their hands…)

What the Digital POWRR team is hoping to accomplish with this particular exercise is this: Professionals who are overwhelmed by the concept of digital preservation and the number of technologies that purport to fulfill some digital preservation requirement will be able to use the grid we have created to understand what a digital preservation system should do, which tools provide which functions, and a snapshot of each tool’s costs/system requirements/etc. We also recognize that some institutions with fewer resources (our project’s target audience!) need to piece together a digital preservation system with various open source/freely available tools….we are hoping the grid will help them in that process.
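
To make the idea concrete, here is one way such a grid could be represented, as a small Python sketch. Everything in it is an assumption for illustration: the tool names are placeholders, the function names are examples, and the True/False values are not findings from our evaluation.

```python
# Hypothetical grid: each tool maps to the functions it does or does not provide.
# Tool names and values are placeholders, not results from the POWRR evaluation.
GRID = {
    "Tool A": {"ingest": True, "fixity checking": True,
               "geographically dispersed storage": False, "open source": True},
    "Tool B": {"ingest": False, "fixity checking": True,
               "geographically dispersed storage": True, "open source": False},
    "Tool C": {"ingest": True, "fixity checking": False,
               "geographically dispersed storage": False, "open source": True},
}


def tools_providing(grid, function):
    """List the tools marked as providing a given function."""
    return sorted(tool for tool, functions in grid.items() if functions.get(function))


def uncovered_functions(grid, chosen_tools, required):
    """List the required functions that none of the chosen tools provide."""
    covered = {
        function
        for tool in chosen_tools
        for function, provided in grid[tool].items()
        if provided
    }
    return sorted(set(required) - covered)
```

An institution piecing together its own system could use something like `uncovered_functions` to see which requirements a chosen combination of tools still leaves unmet.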

We will be spending the next few months working on this, so look for our results by late spring 2013. We will also be including the grid in the final report of the larger investigation we are conducting. That report will be coming out through the IMLS in 2014. That should be just in time for about 60 new digital preservation tools to be introduced on the market……ARGH!!!!

What are we missing? What has our culture lost forever?

Why are we researching a solution to the digital preservation challenge? It’s a great question. Mostly, it is because we know that important objects are being stored digitally, and we do not want to wait for a disaster to happen before we prepare. But has a digital disaster of data loss already taken place?

The truth of the matter is that as humans we are embarrassed by our failures. We try to hide them. Failures to preserve our digital objects are no different: we sweep them under the rug and accept that the data is lost for good. Finding a story of a digital file of historical importance being lost has proved difficult thus far (we aren’t giving up yet). Companies have come clean about customer data being stolen or lost because that is the best business move, and individuals come forward with stories of computer crashes in hopes of getting help restoring their personal photos and documents. (Side note: don’t wait until you lose all your personal data to learn the lesson of backing up. Start now.) However, no one wants to take the heat for lost data that may be of historical importance. Or maybe we do not even know that it is lost yet.

You may be saying, “What do you mean there are no stories of lost data? Plenty of photos, books, and historical papers have gone up in smoke or been lost in other disasters.” We know about much of the physical material that we have unfortunately lost, but I’m talking about 1s and 0s: the pictures that were taken only on a digital camera at a special event, saved onto a computer, and then forgotten, to be lost to bit rot, viruses, or obsolescence. The emails of a prolific author, exchanged with loved ones and discussing the inspiration for their books, that were disregarded. The blog posts or web pages that stirred a group of people to make change and then were not recovered when the server crashed. Has anyone taken note of events such as these?

In some cases we have noticed that digital files or data were missing (for example, the missing Bush Administration emails, the 1996 election websites, or the GeoCities web hosting service that was shut down) and then worked hard to recover as much of that data as possible before it was too late. Tools like the Wayback Machine, which archives the web, help to solve this problem as well. Still, in my heart of hearts, I believe that we may be missing pieces of our history because of a failure to preserve digital objects. Wouldn’t it be a shame to just accept it? Shouldn’t we attempt to note what is missing so that future generations can at least learn from our preservation mistakes? What are we missing? What has our culture lost forever?

I should note that these questions are not rhetorical.  Please comment and share your stories of data that you were unable to recover, even if it was only personal. What is your data loss horror story? Also please share stories of any historical digital objects (e-mails, documents, photos, websites, music, video, etc) that you think may be lost to us, and we will follow up on your leads!

Digital preservation one-on-one

One of the (many) things happening in the Digital POWRR project is that each project partner is interviewing a group of users on our individual campuses to see how users manage their own data.

We’re asking questions about what kind of data people create in their daily work, how much of it there is, what types of files they make, how they keep their files safe, and whether they’ve ever had a massive data loss. We also ask about user awareness of campus, college, or departmental policies about how to maintain data. You can see our full list of questions on our project wiki.

This has a couple of different purposes. The first is to give us an idea of how much data is created, managed, and used on our campuses in an official capacity, and how much is created, managed, and used without the campus’s knowledge or intervention.

The other purpose is to begin an educational one-on-one conversation about data loss and how to manage personal workflows so that user data survives as long as possible.

The hoped-for outcome is that we will have a reasonable cross-sectional understanding of How People Work On Our Campus, which will inform our tool selections and our expectations for how much storage we need, for example.  We’re also hoping that we’ll have the opportunity to educate the folks we’re meeting with to think about the safety, security, and longevity of their data as they create it.

We are still in the middle of the survey, so I can’t give full results. But I will say that thus far, what we’re learning has been rather eye-opening. Most of the people we are talking to fully believe that they are totally on their own, and whether or not they go beyond saving their working files on their working computer seems to relate directly to whether they’ve had a massive data loss before. If they have, it’s multiple backups for them, all on their own data carriers.

Most of them are not using the campus networked resources at all, for a long list of reasons.

That, I think, should give all of us pause.

*goes back to doing interviews*

 

Do we like to assume that people think about digital preservation?

My oldest book as a kid was one that had been given to me by a dear friend of my grandparents. It was a large book with a few photographs and thick pages, published in 1867. I still have it today on a shelf, and I pick it up once in a while to read a few lines. Years have passed for me, and so have they for computer technology, which has changed dramatically and fast.

Back when I was doing structural engineering analysis on an old IBM machine, I used 5 1/4-inch disks, and I soon switched to 3.5-inch floppy disks to save my documents. But how could I store larger files, such as TIFFs and JPEGs, and most importantly my thesis? What was called the “super-floppy” served the purpose, but will I be able to retrieve my thesis from that 100-MB Zip disk today?

Comic depicting the difficulties of preserving material.

Like Cathy in the comic strip from the January 2007 Digital Preservation newsletter of the Library of Congress (http://www.digitalpreservation.gov/news/2007/news_archive0107img.html), I had printed my thesis. Since then, I have converted it to a PDF and applied OCR (Optical Character Recognition); however, I should not feel any more reassured than I was before. The difference is that today, it seems, I sit on the other side. I am involved in digital preservation at work, which gives me a better understanding of why “during the early decades of computing, the threat of file format obsolescence to the long-term maintenance of digital objects was not widely recognized” (http://www.dpworkshop.org/dpm-eng/oldmedia/obsolescence1.html). Earlier on, my main concern was, “If I break or drop the floppy disk, it would be bad.” I think today we take it for granted. It is saved, on the computer or not. It is saved.

It is simply different. I don’t necessarily need to store my files on a physical device such as a USB x-times-GB flash drive or on my computer’s hard drive, except perhaps for immediate convenience; I can simply upload them to the cloud and retrieve them quite rapidly. The one constraint is having internet access.

Perhaps there is no right answer, but we like to assume that people are thinking about digital preservation each time they grab a camera, for example. How many of us think of TIFF or CR2 before taking some spur-of-the-moment shots that we might want to keep later? How many of us write in or do 3-D design with proprietary software?

Granted, not everything has to be about digital preservation per se: given the multitude of data produced each day, not everything is to be kept. Nonetheless, is getting things back even feasible? It is not just about time and funds. Is the file even retrievable?

Amazingly, it can be, as in the example presented by Matthew Kirschenbaum in his article Digital Magic: Preservation for a New Era (http://chronicle.com/article/Digital-Magic-Preservation/131091/#top), which led to the retrieval of Swallows (greatblankness.com), the video game that Paul Zelevansky had formatted on an “old pre-Macintosh Apple II.”

The Train-the-Trainer workshop launched by the Library of Congress’s new Digital Preservation Outreach and Education (DPOE) program is a great way to boost awareness of what digital preservation entails as new trainers will help disseminate that information to their local communities. 

So, is there anything wrong with keeping files on the computer, or with the fact that I forgot about my thesis being saved on the 100-MB Zip disk in a proprietary format? Life simply went on, and I knew I had those files. I will need to get back to them sometime. Similar examples are the use of Corel WordPerfect in legal practice, which causes a readability gap with Microsoft Word (http://law.duke.edu/actech/wordprocessing/), or retrieving files in another proprietary format, such as one from VICON Corporation.

Of course, files are backed up according to one’s needs, and those needs depend on the field of study. In medical research, what happens to files depends on the ‘data use agreement’; in computer programming, files will be obsolete after a few years. So what does digital preservation mean? Will it be the same as keeping a book on a shelf? A computer hard disk failure may be the end of one’s priceless data.

What keeps YOU up at night?

As Lynne mentioned in her previous post, the entire Digital POWRR team, from all 5 partner institutions, got together for a marathon-style meeting in August. There was angst. There was gnashing of teeth. There was A LOT accomplished! One item on our agenda was to make a laundry list of questions and concerns about digital preservation that we want to ask the members of our Advisory Group in October. (If you know one of our Advisors, please do not warn them of this impending onslaught…we are hoping to have them all show up at the meeting!)

These are the questions that are keeping members of our team awake at night!

You see, our team is made up of archivists, curators, historians, and librarians. We are all responsible, in one way or another, for making sure that all objects entrusted to our institutions’ care are protected, preserved, made accessible, and passed safely on to the next generation.

And that includes those pesky digital objects.

You know, the ones that show up at your Archive carefully tucked away inside the deep recesses of a 5 1/4″ floppy disc.

Or the ones that are housed inside the hard drive of an author whose family just bequeathed all of her valuable “papers” to your institution. Oh and by the way, she liked to write her papers in EasyWriter and that hard drive is in a 1979 Apple II.

Some of those pesky digital objects that are important to your institution might be in the form of research datasets created by a faculty member on your campus. They are stored on an external hard drive, located in his office, in a proprietary format, and he thinks that because they are digital, they will last forever!

But you know better.

We know better.

And that is why it keeps us up at night.

New website under construction!

Welcome to the new Digital POWRR website!  Please excuse us while we get up to date with our information and design at this address.  If you’re looking for current information please visit our up-to-date website.

Lessons Learned #1: The First Stages of a Grant

One of the goals we’ve committed to in our IMLS grant is transparency about the lessons that we are learning along the way. This is the first in what we hope will be a series of “lessons learned” posts that will talk about our experiences.

Lesson 1: It will take longer to hire your grant-funded position than you think.

  • It took us 6 months to hire Jaime, our Project Director. Fortunately, the IMLS assured us that we are right on target and that they were expecting that kind of timeline.

Lesson 2: Administrivia, especially as you get rolling, will eat a lot more of your time than you anticipated (a good friend of mine likens it to being nibbled to death by ducks).

  • Figuring out logistical issues in terms of how to pay for concrete things from a grant, particularly when the grant’s payout system is based upon pay-first-then-be-reimbursed, rather than “here’s your big fat check; go get ’em!” takes time and patience from everyone involved.
  • Figuring out how to incorporate grant management practices and requirements for this particular grant and this particular granting agency into our library’s current business procedures also eats time. Administrative things like who signs what, who sends which form to whom, and in which order change based on the current requirements, the granting agency, and more. Thus, even when you have an experienced Principal Investigator on board, it still feels like starting from scratch (which it rather is, in many cases).
  • Questions like “What is the standard per diem I will receive when attending X conference in Y location?” can get very complicated answers.
  • If your grant is working with other people in any way for research purposes (e.g., doing a survey), you will likely need to go through an Institutional Review Board training and application process.

Lesson 3: There is, despite our best efforts, really no replacement for an in-person meeting.

  • Our first in-person meeting of the entire grant team once Jaime was on board got more accomplished in a single session than we had in the nearly 6 months of conference calls that we used to keep in contact before her hire.
  • In-person meetings work best with a strong agenda and a strong moderator. Time limits for discussion keep things moving, too. (We knew this, and that’s why our in-person meeting was so successful.)
  • Lots of caffeine helps. So do breaks for food that are purely social. After all, we were all willing to get into this because we wanted to work with each other. 🙂

Lesson 4: The first time you put together your project timeline, you will likely scare yourself.

  • Our in-person meeting involved us looking at our goals, and then backtracking to figure out what we needed to do when to achieve them. Putting it all together in a timeline reinforced the urgency of the project, moving us firmly from a mindset of “we have two whole years!” to “we only have two years!”

Lesson 5: It’s okay. Do it anyway. Even failure is an option.

  • Fear can be a great motivator.  Make use of it.
  • One of the beautiful things about IMLS National Leadership Grants is that they have the possibility for failure built right into their DNA. The whole point of these grants is to take risks, to experiment, to try something new. Which may not work, or may not turn out how you think it will.
  • The real outcome of these grants is the lessons that you learn, and the skills that you gain, in the trying and the testing, regardless of final outcome.

Long-term Preservation of Digital Objects

Historians producing digital objects in the course of their work may be unaware of the need to provide means for their long-term preservation. Although many of us have assumed that digital objects are eternal, they are in fact subject to degradation and decay like any other artifact.

Should digital objects stored on a single server, CD, or DVD succumb to the process of “bit rot” (http://en.wikipedia.org/wiki/Bit_rot), they may be effectively compromised or even lost. Damage to storage media like CDs and DVDs can also compromise or eradicate data, as can the catastrophic failure of a server, or even a single disk drive in an array.

In the case of materials digitized from analog formats, the only recourse would be to re-scan them. Born-digital objects could only be recreated in toto.

In addition, digital objects created and stored on specific media may one day become unusable. For example, I presently have in my desk drawer approximately ten or twelve 3 1/2″ floppy disks containing notes for, and drafts of, my dissertation. I know of no computer on which I can open them.

Another, hypothetical, example concerns earlier materials that I created as an undergraduate student in the mid-1980s. These are of course long gone, but the point I am trying to make concerns the fact that I created them in a program called WordPerfect. Even if I had been prescient enough to migrate these files forward from the original 5 1/4″ floppy disks onto newer media, I would still be hard-pressed to find a machine that has a copy of WordPerfect installed and is able to read twenty-five-year-old files.

The task of digital preservation thus includes preventing the loss of digital objects due to bit rot or the failure of storage media; preventing them from becoming stranded on obsolete media; and preventing their loss in obsolete, proprietary software formats.
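
Guarding against the first of these, silent corruption, is often handled with what practitioners call fixity checking: record a checksum for every file, then recompute and compare later. The sketch below is one minimal way to do that in Python, under the assumption that the collection lives in a single folder and that each file is small enough to read into memory. The function names `build_manifest` and `changed_files` are hypothetical, not part of any particular tool.

```python
import hashlib
from pathlib import Path


def checksum(path):
    """SHA-256 of a file's contents (reads the whole file into memory)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(folder):
    """Map each file's path, relative to the folder, to its checksum."""
    root = Path(folder)
    return {
        str(item.relative_to(root)): checksum(item)
        for item in sorted(root.rglob("*"))
        if item.is_file()
    }


def changed_files(folder, manifest):
    """Return manifest entries that are now missing or no longer match."""
    root = Path(folder)
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or checksum(target) != expected:
            problems.append(relative_path)
    return problems
```

A check like this only detects damage; repairing it still requires a second, intact copy stored somewhere else.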

This month Northern Illinois University Libraries, where I work, began work on a project funded by the Institute of Museum and Library Services (IMLS), investigating digital preservation options for medium-sized and smaller institutions. In submitting a request for support, my colleague Lynne Thomas (Curator of Rare Books and Special Collections at NIU Libraries) noted that many large institutions, including flagship state universities and well-endowed private institutions, have implemented long-term digital preservation plans supported by significant technical infrastructure and ongoing expenditures. This presents a challenge for institutions like mine, where we have created approximately six terabytes of digital data, as much as many large institutions, but have yet to address its long-term preservation due to budgetary constraints.

In subsequent posts I will discuss additional aspects of digital preservation, as well as the challenges and (hopefully) solutions that emerge from our project work.

This post originally appeared at http://drewvandecreek.blogspot.com/ on December 9, 2011.