What are we missing? What has our culture lost forever?

Why are we researching a solution to the digital preservation challenge? It is a great question. Mostly, it is because we know that important objects are being stored digitally, and we do not want to wait for a disaster to happen before we prepare. But has a digital disaster of data loss already taken place?

The truth of the matter is that as humans we are embarrassed by our failures. We try to hide them. Failures to preserve our digital objects are no different: we try to sweep them under the rug, accepting that the data is lost for good. Finding a story of a digital file of historical importance being lost has proved to be difficult thus far (we aren’t giving up yet). Companies have come clean about their customer data being stolen or lost because that is the best business move, and individuals come forward with their stories of computer crashes in hopes of getting help to restore their personal photos and documents. (Side note: don’t wait to lose all your personal data to learn the lesson of backing up. Start now.) However, no one wants to take the heat for lost data that may be of historical importance, or maybe we do not even know that it is lost yet.

You may be saying, what do you mean there are no stories of lost data? Plenty of photos, books, and historical papers have been lost to fires and other disasters. We know about much of the physical data that we have unfortunately lost, but I’m talking about 1s and 0s. The pictures that were taken only on a digital camera at a special event, saved onto a computer, and then forgotten, only to be lost to bit rot, viruses, or obsolescence. The emails of a prolific author, discarded even though they contained messages to loved ones discussing the inspiration for the author’s books. The blog posts or web pages that stirred a group of people to make change and then were not recovered when the server crashed. Has anyone taken note of events such as these?

In some cases we have noticed that digital files or data were missing (for example, the missing Bush Administration emails, the 1996 election websites, or the Geocities website service that was shut down) and then worked hard to recover as much of that data as possible before it was too late. Tools like the Wayback Machine that archive the internet help to solve this problem as well. In my heart of hearts, I still believe that we may be missing pieces of our history because of a failure to preserve digital objects. Wouldn’t it be a shame to just accept it? Shouldn’t we attempt to note what is missing so that future generations can at least learn from our preservation mistakes? What are we missing? What has our culture lost forever?

I should note that these questions are not rhetorical. Please comment and share your stories of data that you were unable to recover, even if it was only personal. What is your data loss horror story? Also, please share stories of any historical digital objects (e-mails, documents, photos, websites, music, video, etc.) that you think may be lost to us, and we will follow up on your leads!

Digital preservation one-on-one

One of the (many) things happening in the Digital POWRR project is that each project partner is interviewing a group of users on our individual campuses to see how users manage their own data.

We’re asking questions about what kind of data people create in their daily work, how much of it there is, what types of files they make, how they keep their files safe, and whether they’ve ever had a massive data loss. We also ask about user awareness of campus, college, or departmental policies about how to maintain data. You can see our full list of questions on our project wiki.

This has a couple of different purposes. The first is to give us an idea of how much data is created, managed, and used on our campuses in an official capacity, and how much data is created, managed, and used on our campuses without the campus’s knowledge or intervention.

The other purpose is to begin an educational one-on-one conversation about data loss and how to manage personal workflows so that user data survives as long as possible.

The hoped-for outcome is that we will have a reasonable cross-sectional understanding of How People Work On Our Campus, which will inform our tool selections and our expectations for how much storage we need, for example.  We’re also hoping that we’ll have the opportunity to educate the folks we’re meeting with to think about the safety, security, and longevity of their data as they create it.

We are still in the middle of the survey, so I can’t give full results. But I will say that thus far, what we’re learning has been rather eye-opening. Most of the people we are talking to fully believe that they are totally on their own, and whether or not they go beyond saving their working files on their working computer seems to relate directly to whether they’ve had a massive data loss before. If they have, it’s multiple backups for them, all on their own data carriers.

Most of them are not using the campus networked resources at all, for a long list of reasons.

That, I think, should give all of us pause.

*goes back to doing interviews*


Do we like to assume that people think about digital preservation?

My oldest book as a kid was one that had been given to me by a dear friend of my grandparents. It was a large book with a few photographs and thick pages, published in 1867. I still have it today on a shelf. I pick it up once in a while to read a few lines. Years have passed for me, and so have they for computer technology, which has changed dramatically and fast.

After doing structural engineering analysis on an old IBM machine with 5 1/4-inch disks, I soon switched to 3.5-inch floppy disks to save my documents. But how could I store larger files, such as TIFF and JPEG images and, most importantly, my thesis? What is called the “super-floppy” served the purpose, but will I be able to retrieve my thesis from that 100-MB Zip disk today?

Comic depicting the difficulties of preserving material.

Like Cathy in the comic strip featured in the January 2007 Digital Preservation newsletter of the Library of Congress (http://www.digitalpreservation.gov/news/2007/news_archive0107img.html), I had printed my thesis. Since then, I have converted it into a PDF and applied OCR (Optical Character Recognition); however, I should not feel more reassured than I was before. The difference is that today I sit on the other side, it seems. I am involved in digital preservation through my work, which now gives me a better understanding of why “during the early decades of computing, the threat of file format obsolescence to the long-term maintenance of digital objects was not widely recognized” (http://www.dpworkshop.org/dpm-eng/oldmedia/obsolescence1.html). Earlier on, my main concern was, “if I break or drop the floppy disk, it would be bad.” I think today, we take it for granted. It is saved, on the computer or not. It is saved.

It is simply different. I don’t necessarily need to store my files on a physical device such as a USB x-times-GB flash drive or on my computer’s hard drive, except perhaps for immediate convenience; I can simply upload them to the cloud and retrieve them quite rapidly. One constraint is the need for internet access.

Perhaps there is no right answer. We like to assume that people are thinking about digital preservation each time they grab a camera, for example. How many of us think of TIFF or CR2 before taking some shots on the spur of the moment that we might want to keep later? How many of us write or do 3-D design with proprietary software?

Granted, not everything has to be about digital preservation per se: given the multitude of data produced each day, not everything is to be kept. Nonetheless, is getting things back even feasible? It is not just about time and funds: is the file even retrievable?

Amazingly, it can be, as in the example presented by Matthew Kirschenbaum in his article, Digital Magic: Preservation for a New Era (http://chronicle.com/article/Digital-Magic-Preservation/131091/#top), which led to the retrieval of Swallows (greatblankness.com), the video game Paul Zelevansky had formatted on an “old pre-Macintosh Apple II”.

The Train-the-Trainer workshop launched by the Library of Congress’s new Digital Preservation Outreach and Education (DPOE) program is a great way to boost awareness of what digital preservation entails as new trainers will help disseminate that information to their local communities. 

So, is there anything wrong with keeping files on the computer, or with the fact that I forgot about my thesis being saved on the 100-MB Zip disk in a proprietary format? Life simply went on, and I knew I had those files. I will need to get back to them sometime. Similar examples are the use of Corel WordPerfect in legal practice, which causes a readability gap with Microsoft Word (http://law.duke.edu/actech/wordprocessing/), or retrieving files from another proprietary format, such as those of VICON Corporation.

Of course, files are backed up according to one’s needs, and those needs depend on the field of study. In medical research, what happens to files depends on the ‘data use agreement’; in computer programming, files may be obsolete after a few years. So what does digital preservation mean? Will it be the same as keeping a book on a shelf? A computer hard disk failure may be the end of one’s priceless data.

What keeps YOU up at night?


As Lynne mentioned in her previous post, the entire Digital POWRR team, from all 5 partner institutions, got together for a marathon-style meeting in August. There was angst. There was gnashing of teeth. There was A LOT accomplished! One item on our agenda was to make a laundry list of questions and concerns about digital preservation that we want to ask the members of our Advisory Group in October. (If you know one of our Advisors, please do not warn them of this impending onslaught…we are hoping to have them all show up at the meeting!)

These are the questions that are keeping members of our team awake at night!

You see, our team is made up of archivists, curators, historians, and librarians. We are all responsible, in one way or another, for making sure that all objects entrusted to our institutions’ care are protected, preserved, made accessible, and passed safely on to the next generation.

And that includes those pesky digital objects.

You know, the ones that show up at your Archive carefully tucked away inside the deep recesses of a 5 1/4″ floppy disc.

Or the ones that are housed inside the hard drive of an author whose family just bequeathed all of her valuable “papers” to your institution. Oh and by the way, she liked to write her papers in EasyWriter and that hard drive is in a 1979 Apple II.

Some of those pesky digital objects that are important to your institution might be in the form of research datasets created by a faculty member on your campus. They are stored on an external hard drive, located in his office, in a proprietary format, and he thinks that because they are digital, they will last forever!

But you know better.

We know better.

And that is why it keeps us up at night.

New website under construction!

Welcome to the new Digital POWRR website!  Please excuse us while we get up to date with our information and design at this address.  If you’re looking for current information please visit our up-to-date website.

Lessons Learned #1: The First Stages of a Grant

One of the goals we’ve committed to in our IMLS grant is transparency about the lessons that we are learning along the way. This is the first in what we hope will be a series of “lessons learned” posts that will talk about our experiences.

Lesson 1: It will take longer to hire your grant-funded position than you think.

  • It took us 6 months to hire Jaime, our Project Director. Fortunately, the IMLS assured us that we were right on target and that they were expecting that kind of timeline.

Lesson 2: Administrivia, especially as you get rolling, will eat a lot more of your time than you anticipated (a good friend of mine likens it to being nibbled to death by ducks).

  • Figuring out logistical issues in terms of how to pay for concrete things from a grant, particularly when the grant’s payout system is based upon pay-first-then-be-reimbursed, rather than “here’s your big fat check; go get ’em!” takes time and patience from everyone involved.
  • Figuring out how to incorporate grant management practices and requirements for this particular grant and this particular granting agency into our library’s current business procedures also eats time. Administrative things like who signs what, who sends which form to whom, and in which order change based on the current requirements, the granting agency, and more. Thus, even when you have an experienced Principal Investigator on board, it still feels like starting from scratch (which it rather is, in many cases).
  • Questions like “What is the standard per diem I will receive when attending X conference in Y location?” can get very complicated answers.
  • If your grant is working with other people in any way for research purposes (e.g., doing a survey), you will likely need to go through an Institutional Review Board training and application process.

Lesson 3: There is, despite our best efforts, really no replacement for an in-person meeting.

  • Our first in-person meeting of the entire grant team, once Jaime was on board, accomplished more in a single session than we had in the nearly 6 months of conference calls that we used to keep in contact before her hire.
  • In-person meetings work best with a strong agenda and a strong moderator. Time limits for discussion keep things moving, too. (We knew this, and that’s why our in-person meeting was so successful.)
  • Lots of caffeine helps. So do breaks for food that are purely social. After all, we were all willing to get into this because we wanted to work with each other. 🙂

Lesson 4: The first time you put together your project timeline, you will likely scare yourself.

  • Our in-person meeting involved us looking at our goals, and then backtracking to figure out what we needed to do when to achieve them. Putting it all together in a timeline reinforced the urgency of the project, moving us firmly from a mindset of “we have two whole years!” to “we only have two years!”

Lesson 5: It’s okay. Do it anyway. Even failure is an option.

  • Fear can be a great motivator.  Make use of it.
  • One of the beautiful things about IMLS National Leadership Grants is that they have the possibility for failure built right into their DNA. The whole point of these grants is to take risks, to experiment, to try something new. Which may not work, or may not turn out how you think it will.
  • The real outcome of these grants is the lessons that you learn, and the skills that you gain, in the trying and the testing, regardless of final outcome.

Long-term Preservation of Digital Objects

Historians producing digital objects in the course of their work may be unaware of the need to provide means for their long-term preservation. Although many of us have assumed that digital objects are eternal, they are in fact subject to degradation and decay like any other artifact.

Should digital objects stored on a single server, CD, or DVD succumb to the process of “bit rot” (http://en.wikipedia.org/wiki/Bit_rot), they may be effectively compromised or even lost. Damage to storage media like CDs and DVDs can also compromise or eradicate data, as can the catastrophic failure of a server, or even a single disk drive in an array.

In the case of materials digitized from analog formats, the only recourse would be to re-scan them. Born-digital objects could only be recreated in toto.

In addition, digital objects created and stored on specific media may one day become unusable. For example, I presently have in my desk drawer approximately ten or twelve 3 1/2″ floppy disks containing notes for, and drafts of, my dissertation. I know of no computer on which I can open them.

Another, hypothetical, example concerns earlier materials that I created as an undergraduate student in the mid-1980s. These are of course long gone, but the point I am trying to make concerns the fact that I created them in a program called WordPerfect. Even if I had been prescient enough to migrate these files forward from the original 5 1/4″ floppy disks onto newer media, I would still be hard-pressed to find a machine with a copy of WordPerfect installed on it that is able to read twenty-five-year-old files.

The task of digital preservation thus includes preventing the loss of digital objects due to bit rot or the failure of storage media; preventing them from becoming stranded on obsolete media; and preventing their loss in obsolete, proprietary software formats.

This month Northern Illinois University Libraries, where I work, began work on a project funded by the Institute of Museum and Library Services (IMLS), investigating digital preservation options for medium-sized and smaller institutions. In submitting a request for support, my colleague Lynne Thomas (Curator of Rare Books and Special Collections at NIU Libraries) noted that many large institutions, including flagship state universities and well-endowed private institutions, have implemented long-term digital preservation plans supported by significant technical infrastructure and ongoing expenditures. This presents a challenge for institutions like mine, where we have created approximately six terabytes of digital data, as much as many large institutions, but have yet to address its long-term preservation due to budgetary constraints.

In subsequent posts I will discuss additional aspects of digital preservation, as well as the challenges and (hopefully) solutions that emerge from our project work.

This blog post is reposted from one that originally appeared at http://drewvandecreek.blogspot.com/ on December 9, 2011.

Untying the Digital Preservation Knot

Meg pointed out the whys and wherefores of digital preservation in her last post.

I’m supposed to talk about how.

Of course, since the point of the IMLS grant is to help figure out the “how” for libraries with fewer resources, I’m not going to have all of the answers just yet. We have some ideas about the how, but until we run through all of the scenarios with people who know more than we do, those ideas won’t be very useful.

I have a metaphor I’d like to play with a little bit, if you’ll bear with me. I’m a novice knitter, and so thinking about process in this way is helpful to me. Dealing with digital preservation feels a lot like untying a very messy knot in someone else’s knitting.

It’s as though we’re being handed a portion of a partially knitted, really quite complicated lace scarf, a very messy ex-ball of yarn that a kitten has gotten at, some knitting needles which may or may not be the right size, no pattern, no instructions, and no stitch markers, and being told to figure it out and go on from here. (This is the data and digital objects sitting on servers across my campus.) Some of it has been worked, some of it has been relatively ignored, and on occasion, some damage has happened through benign neglect (the kitten is bit rot).

Now, we have some literature that can serve as a guidebook when we get to the part where we can learn new skills and keep knitting the scarf. But there’s still a lot of groundwork to be done before we get to that stage. We need to figure out what we have: what kind of yarn? what sized needles? what kind of stitches make up the pattern? Can we find a similar pattern elsewhere, or do we need to reverse-engineer it from what we see?  Do we have the right tools for this pattern?

Thinking about digital preservation feels a lot like that.

When I think about long-term access to digital objects, I think about it in terms of pulling out yarn knots. More fundamentally, before we can keep knitting, we need to untie all the knots made by that kitten. There may be enough yarn there to move forward, but it’s unworkable until we can pull at the individual threads, untie knots, separate parts from one another, and smooth things out to make a workable ball of yarn.

If I pull on the server space knot, does personnel that can help come with it, or is that a different portion of the thread? Can I pull out particular objects (knots) without making other knots (different file types) more difficult to deal with? Where can I find a knitting instructor (expert) or a book (documentation) or a friend who can explain a new technique (experienced end-user)? How will I know when I’ve mastered a particular stitch (technique or product) enough to incorporate it into my pattern without errors? Even if I learn that stitch, does it complement the scarf pattern (data) I already have?

Would it be more effective to rip back the other person’s work, and begin again with a simpler pattern? Would doing so give the same effect, or does it fundamentally change the scarf too much? As I pull one knot, the yarn often tightens in other areas. Will I be able to unravel as I need to? Do I need to change my approach?

Eventually, as a knitter, to learn a new skill, you just have to sit down and do it. You will probably mess it up quite a few times. But, with some help, and good humor, you will come out the other side with a scarf you can live with. Even if it doesn’t look exactly like the pattern on the page.

Digital preservation is kind of like that, too. What matters is the result (long-term access to the data we create). The process of getting there may not be pretty, and it may not look as nice as the very experienced knitter’s work next to yours, but it still keeps your neck warm.

So here’s to the first attempts to unravel the knot.

Why you should care about digital preservation (DP)

Many people assume that digital files are superior to analog formats. Who doesn’t like to search for what they need online and have access to it instantly? Surely the market will keep up with our demand for access! But who is providing that content and how much will you have to pay each time you use it? As curators of the evidence of our cultural heritage, we owe it to our institutions, patrons AND our future selves to think through DP (digital preservation) issues as soon as a digital object is created.

On the surface, the superiority of digital content seems obvious. We often hear that digital files take up less space, cost less to create and use fewer natural resources to store more data. Besides, everyone and everything is going digital… how could it be a problem for our future? I’d argue that the perceived strengths of digital storage are also its weaknesses:
1) Less space—the danger with this idea is that it leads us to think we can keep everything. Also, our need to purchase lots of storage space may lull us into thinking it’s okay to purchase cheap storage. Buying cheap storage will make system failure more likely. Relying on “free” or cheap cloud space subjects our collections to the will of commercial hosts’ practices of data mining our content or pulling the rug out by closing the site altogether.
2) Costs less—costs are hidden. The number one resource digital things need is the thing most people don’t usually think about: people. People power is needed to create digital files that are usable and identifiable now and 50 years from now; additionally, some person has to set up, and possibly initiate, frequent migration of file formats. Good DP systems also have subscription or purchase costs.
3) Uses fewer resources—saving paper or not printing photographs does save the environment from the chemical processes needed to create the material, but if something goes wrong and a file becomes corrupt (called “bit rot”), some knowledgeable person must intervene and find or create a replacement.

The latter point, about file replacement, currently has a technological solution: multiple copies of the same file can be shared in a DP system and examined for bit rot. When a file becomes corrupt, the system can be designed to automatically call up one of the other file locations and create a new copy. But all of this is dependent on the ability of people to create the object in a migrate-able format to begin with and include enough information in the item so that people far into the future will be able to find it and understand it.
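The check-and-replace idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular DP system works: it assumes the copies are ordinary files reachable by path, that SHA-256 checksums are an acceptable way to compare them, and that a strict majority of copies is still intact. The function names `sha256_of` and `audit_and_repair` are hypothetical.

```python
import hashlib
import shutil
from collections import Counter
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_and_repair(copies: list[Path]) -> list[Path]:
    """Compare copies of one file and overwrite any that disagree with the majority.

    Assumption: more than half of the copies are still intact.
    Returns the list of copies that were repaired.
    """
    digests = {path: sha256_of(path) for path in copies}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        # Without a clear majority we cannot tell which copy is the good one.
        raise ValueError("no majority among copies; manual review needed")

    good_copy = next(p for p, d in digests.items() if d == majority_digest)
    repaired = []
    for path, digest in digests.items():
        if digest != majority_digest:
            shutil.copyfile(good_copy, path)  # replace the corrupt copy
            repaired.append(path)
    return repaired
```

Calling `audit_and_repair` on, say, three copies kept in three different locations would return the paths it had to overwrite, or an empty list if all copies still match. Note what the sketch does not do: it cannot tell whether the file is in a format anyone will be able to open in the future, which is exactly the part that still depends on people.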

Finally, think about how fast digital file formats change. As cultural heritage curators, we do not have the luxury of stuffing digital media into an archivally sound box and walking away.

If we ignore it, it will go away. Source: http://geekandpoke.typepad.com