DuraSpace and Artefactual have joined forces!

The POWRR Team is excited to share this news with those looking for a hosted, soup-to-nuts digital preservation solution:

DuraSpace and Artefactual have joined forces! Check out the news release at the link below:

DuraSpace and Artefactual Partner to Offer New Hosted Service

Post-workshop question: How to quantify projected data growth

In response to a question from her supervisor, a recent workshop participant asked how to guesstimate the amount – in numbers – of data she would need to store per week.


An illustration of the “rapid growth rate of data with storage growth trends following” captured from xzbackup.com

My reply was that it’s hard to estimate how much new material you might need to store in the future until you decide what you’re preserving. In my POV, selection is the unsung hero of any kind of preservation. We simply must decide what we are willing and able to keep. But where to start?

An inventory of what you’re currently responsible for is widely recommended and very helpful. After that, it will be useful to note exactly what in your inventory is most at risk. Data accumulation rates that will matter most to administrators will depend on what you decide you must preserve at full bit-level and what can live happily, for at least some time, in basic offline and (don’t forget this part!) geographically distributed storage locations.

Here’s an example of this decision making principle from my world:
Most of the material that my library has digitized, I’m comfortable NOT assigning to the queue for bit-level preservation. There are only a few objects that I digitized because the media they were on was so out of date that it was inaccessible, or because the originals were too fragile. These things that were truly digitized to preserve their current state AND intellectual value are a higher bit-level preservation concern for me, but they are also a minority of my digital holdings right now.

Paper material like yearbooks, honors theses, faculty meeting minutes and the student newspaper are all really useful as searchable digital objects, and I want to protect the investment we made in scanning (some outsourced, some not). But I have all the originals and will not be discarding them, so I’m not concerned with moving the digital versions into a preservation system right away. Those things will be just fine in an offline storage location until we decide whether we want to pay extra just to protect that initial investment. Some born-digital documents are worth keeping digitally (due to importance of content AND the value of keeping them in a keyword-searchable format), so they stay that way. But a lot of the messages that are important to keep (e.g., brief meeting records and emails from administrators about policy changes) do not have attributes that make them worth keeping digitally, so I print them.

On the other hand, my campus has over a decade’s worth of digital-only campus photographs, all in jpg, and our new content management system allows individual departments and organizations to post their own photos to their own pages. Those things are unique and are being created by people who couldn’t care less about high-res formats because they’re just out there doing their jobs and trying to attract people to our school or showcase their achievements. Additionally, our major events (commencement, several all-campus convocations and colloquia, and our sporting events) are now only being captured live and streamed through a subscription service. I have decided that these born-digital media are more at risk, both because of how they are created and because of the inconsistent metadata they are being created with. Therefore, they’re higher up in my queue for preservation actions, including enhancing metadata and monitoring format migration needs in an automated environment.

Note that they are also not “library” materials, so they’re going into my this-is-a-common-good-and-therefore-a-shared-cost-responsibility argument 😉

I use the word “queue” because we have no subscription to an automated bit-level system yet. But by separating things out in this way, I have a smaller amount of somewhat-regularly-added-to types of content that I can guesstimate based on past collection practices. I had success in getting a regular transfer schedule for the streamed media because IT is in charge of monitoring that service, and they now give me an annual deposit of everything that was streamed the previous year. I’ve got a tougher chore with people posting things to their own web pages. However, IT is trying to communicate the value of using Flickr accounts to manage these files, and if people do start following that advice, I’ll be able to quantify that data because IT hands out logins to our campus Flickr subscription. That way I’ll get a glimpse of what people are doing and can start educational outreach with those departments and organizations about improving their creation and description practices.

Frankly, people who are doing their own thing outside of campus-related programs/services are, in my opinion, on their own. It sounds harsh to say it that way, but I can’t save what I don’t know exists and don’t have access to!
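To show how “guesstimating based on past collection practices” can turn into a number an administrator can use, here is a minimal Python sketch. It assumes you have a list of past yearly deposit sizes for one regularly added content type (the annual transfer of streamed media is a good candidate), that future years grow at the historical average, and that every copy kept in a geographically separate location holds everything. The function names and the idea of passing a copy count are our own illustration, not a POWRR method.

```python
WEEKS_PER_YEAR = 52


def average_annual_gb(annual_deposits_gb: list[float]) -> float:
    """Mean size of past yearly deposits, in GB."""
    if not annual_deposits_gb:
        raise ValueError("need at least one year of deposit history")
    return sum(annual_deposits_gb) / len(annual_deposits_gb)


def average_weekly_growth_gb(annual_deposits_gb: list[float]) -> float:
    """Average weekly growth implied by past yearly deposits, in GB."""
    return average_annual_gb(annual_deposits_gb) / WEEKS_PER_YEAR


def projected_footprint_gb(current_gb: float,
                           annual_deposits_gb: list[float],
                           years: int,
                           copies: int) -> float:
    """Storage needed after `years`, counting every stored copy.

    Assumes future yearly growth matches the historical average and that
    every copy (for example, one per geographic location) holds everything.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    total_one_copy = current_gb + average_annual_gb(annual_deposits_gb) * years
    return total_one_copy * copies
```

With made-up figures of 40, 64, and 52 GB deposited over three past years, `average_weekly_growth_gb([40.0, 64.0, 52.0])` returns 1.0 GB per week, and `projected_footprint_gb(500.0, [40.0, 64.0, 52.0], years=5, copies=3)` returns 2280.0 GB. Separating out the regularly added content first is what keeps an average like this meaningful; a one-off backlog would skew it.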

What can we reasonably accomplish?

In a previous post about acquiring digital content, Stacey mentioned that we often “take it all in, the good shepherds that we are. We build systems and websites that can do nifty things.” Stacey’s post was a cautionary tale, and others have expressed these concerns, too. Now I’ll add my voice to it.

I’m a firm believer in being practical about what I attempt to do and honest with others about what I can’t. In my case, that means I won’t “take it all in,” and for what I do take in, I’m critical about what I will keep in electronic form.

A post I read last year on the Society of American Archivists’ listserv for Lone Arrangers (i.e., people who work in archives and have no other full-time staff assigned) touched on this topic. The post asked how lone arrangers were managing email preservation. Many products were mentioned, and then this reply came from an archivist who also teaches archives management courses:
“The archiving email question comes up all the time, and I have a stock answer. I tell my [X university] preservation students to be bold, if they have to, and keep paper. Yes, paper. Print it out, attachments included, stick it in a folder, and forget about it.

“My motto as an archivist, lone arranger and preservation teacher is, ‘Don’t sign up for the impossible.’ If big institutions are working hard and spending more to sustain their email archives, we little guys ought to be asking ourselves why. That way, we’ll have the answers when the administration comes to us and says to start archiving email.”

I couldn’t agree more! From my POV, it’s all about choosing where you are going to invest your time. I’m lucky to work in a private institution, so I am not subject to all the public records requirements some of my colleagues are, but I think there are larger issues at stake and as a profession I think we do need to start pushing back a bit.

Our users (and the people/agencies that “mandate” things of us) don’t have reasonable expectations when it comes to digital objects. They think these things exist in tangible forms because they can see them before their very eyes, but the underlying code is anything but tangible. And the way our users create and serve objects makes a lot of what we might capture not really worth saving, according to “best practices” (thinking of those 72dpi jpgs I was sent a while ago).

For us to capture objects and make them meaningful over time, we have to impress on the people who create them, and on the people who choose the systems our users operate in, that standards exist for a reason. A printed piece of paper is not the flashiest use of the latest technology, but as long as the paper and ink last, and as long as the language and symbols printed on it hold meaning, its content can be conveyed over time!

Digital/Online Materials and their Place in Historical Scholarship

A post by Drew VandeCreek

At the recent meeting of the American Historical Association in Washington, D.C., I made a presentation as part of a discussion session (i.e., not a regular panel – we sat in a circle and talked after very short presentations made by people sitting as part of the circle) exploring digital materials, ranging from blogs and web sites to social media, and the questions that they raise as scholars begin to make use of them as primary sources. Other presenters talked about the future of MOOCs and crowd-sourcing the search for elusive information about a relatively obscure historical figure. I discussed the work of the Digital POWRR project and the challenges presented by the fact that digital objects are generally subject to loss in the relatively short term for a number of reasons, including hardware and software incompatibility and the degradation of storage media.

One major question that emerged in the discussion was the status of social media materials and other online, digital sources, given that they are so prone to loss. One presenter at the preceding panel (our discussion group was part of a linked set of two events) described how she had based her work on Pakistani women in part on a web site that no longer existed, apparently because of hacking by parties who believed that Pakistani women should not express themselves in this format. The presenter said that she had printed out the site’s pages for her own records and thus could document her use of the source. But this made me wonder about the future practice of history.

So, what of digital sources like blogs, web sites, and social media objects like tweets? Digital objects’ intrinsic frailty and the complex, easily disrupted nature of the internet used to present them make them fundamentally unreliable as primary sources, at least by the standards developed for the use of analog/paper media materials.

It seems to me that although history is certainly not a science in any way, historians are similar to scientists in at least one regard. Much like a scientific discovery can only be accepted and confirmed as other practitioners are able to repeat the experiment and yield the same result, historians are accustomed to being able to lay their hands on a paper source cited in a footnote. Manuscripts are usually unique items, but if one travels to the archive and looks in the box and folder number cited, the item will be there. There may be a very small number of copies of a book, but if one is willing to make the trip to the right library, the book will be there. Historians will of course debate a scholar’s reading of a source, but the existence of the source itself is fundamental to the discipline. If the item is not there, practitioners may rightly begin to ask questions about the legitimacy of a work citing it.

Many of the participants in the AHA discussion emphasized the need to preserve online digital materials as fully as possible. I certainly concur. But a whole host of problems, not the least of which is the considerable expense involved in the curation and preservation of digital materials, make this impossible. We will have to face the fact that a considerable number of the online digital objects that future historians may want to use as evidence will simply disappear.

In this situation, several questions occur to me: How will we evaluate work citing online materials that are no longer existent? What if scholars relying on such missing evidence can produce a print-out or other facsimile of the materials? Can we distinguish cases of vanished evidence in which legitimate facsimiles exist from cases of academic fraud?


An e-records’ transfer tale

In mid-December I received my first-ever completely electronic records transfer from a student organization. The group’s faculty advisor attended two of my campus presentations this year and followed up with a request for a one-on-one meeting to talk about their specific kinds of records. Before the end of the semester, a student leader of the group sent me an email with attachments of five photographs from their biggest event of the year, plus a scanned version of the event poster and a Word document with the names of the people pictured and the event location and date.

Very exciting!

There were some problems with them not following the naming conventions I recommended, but since I was handling the accession on an item-level basis this wasn’t a big deal. I congratulated the student on being our first fully electronic donation from a registered student organization and thanked her for her efforts. Then I examined the photo files more closely 🙁

The images were 72dpi jpgs and when I tried upsampling they became fuzzy.

In a series of follow-up email exchanges, the student told me the faculty advisor had taken the images with an iPhone 5, and the advisor told me he hadn’t changed any settings and the pictures on the phone were large enough. We assume the iPhone compressed them when he sent the images from his phone to the student. Because it was so close to the end of the semester, we never completed the transfer.

{sigh} Something else to face in the busy time at the beginning of the new year.

Lesson learned: don’t accept emailed pictures at face value. Both of my constituents knew the digital object and metadata requirements I requested and thought they were in compliance, but the transmission was scrambled along the way. The old adage of “trust but verify” is still relevant!
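One way to “verify” before closing an accession is to check each JPEG’s pixel dimensions rather than its dpi tag: the dpi value stored in a JPEG header is only a hint for printing, while the pixel count is what limits how far an image can be enlarged before it turns fuzzy. The minimal Python sketch below reads width and height from the JPEG frame header (the SOF segment) using only the standard library. The function names and the `min_long_edge_px` threshold are hypothetical; the threshold should come from your own transfer guidelines.

```python
import struct

# Start-of-frame markers that carry image dimensions. 0xC4 (DHT),
# 0xC8 (reserved) and 0xCC (DAC) share the 0xCn range but are not frames.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) in pixels from a JPEG's frame header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt JPEG marker stream")
        marker = data[i + 1]
        if marker == 0xFF:          # fill byte before a marker; skip it
            i += 1
            continue
        if marker == 0xDA:          # start of scan: no frame header seen
            break
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            if i + 9 > len(data):
                raise ValueError("truncated frame header")
            # After the length: precision (1 byte), height, width.
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length
    raise ValueError("no frame header found")


def meets_minimum(data: bytes, min_long_edge_px: int) -> bool:
    """True if the image's longer side has at least min_long_edge_px pixels."""
    width, height = jpeg_dimensions(data)
    return max(width, height) >= min_long_edge_px
```

In practice you would read each attachment with `open(path, "rb").read()` and pass the bytes to `meets_minimum`, so undersized images are flagged while the donor can still resend the originals.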

ANADP II Part 10: Closing Plenary by Adam Farquhar

Trends and Impact

Farquhar’s (British Library) work has 3 strands: BL team, Planets project, Open Planets Foundation

  • High-value tech & practitioner exchange; Sustainability: new CEO (Ed Fahey)
  • Dataset/Datasite (BL): Research infrastructure & capacity
  • Digital Scholarship: new services, use digital content in new ways.

On to the meeting:

Intellectual honesty is our frame! (Follow the IIPC example: BE NICE!)

  • Inclusion of service providers/vendors–how do we make them welcome?
  • Structure: some action sessions were more project based
  • Gap: Do we have: a consensus on what it means to align? Single clear global voice + national voice? Right folks to influence national legislation?
  • We have underestimated the amount of work it takes
  • Smaller scale collaboration across national boundaries
  • Improved use, shared maintenance = Big impact
  • It’s super-important to handle the legal stuff
  • Organizational axis!
  • Standards: choose wisely. Standards should follow practice.
  • Few discussions about technical underpinnings here
  • Focus on cost, rather than value
  • Bottlenecks are interesting.
  • Challenge: Can we take 50% off the costs of DP?
  • Education and training
  • Alignment — reducing variation & redundancy
  • Don’t lose the voice of evaluation
  • Interdependence: threat or menace? It must be carefully managed.

Trends!

  • Non-print legal deposit & regulations for data management plans
  • Shift to business as usual: operational budgets & teams, capital investment in digital infrastructure
  • Often with external service providers
  • Shift to born-digital
  • Greater scale
  • New usage pattern: from single items to dataset(s) analysis
  • Architecture needs to be constructed for these new use patterns
  • Digital library architectures will feel very 1990s soon.
  • OAIS may need a re-think in light of this use
  • Assume more/everything gets looked at! Implementers will need to think differently
  • Reduced funding, growing market problem: Not only a memory institution. Spreading in importance! Personal Digital Archiving solutions create additional pressure for 30 year access.
  • Open thinking about the role of vendors/service providers: opportunity to drive down costs
  • Shift from project thinking to infrastructure funding. Infrastructure can be invisible: we don’t want to disappear.

What’s next?

  • We can be effective … or cheer
  • Legal: (mostly cheering)
  • What can we learn from RDA? Should we hang our coat on that hook? Join an interest group?
  • Worry about our loss of identity/community? It’s an interesting structure, the working group model
  • Education & Training! The economics of it. Think about cost sinks? Engaging more broadly may cost more money, but it’s worth it.
  • Our message & coherence: we need to communicate our consensus messages
  • SCOPE: we’ve been drawing our boundaries too tight
  • This is a scary problem! We narrow things down & put out boundaries to make it less so.
  • Soup-to-nuts handling needs sorting out.
  • So does selection & access

And with that, we were sent out to try to change the world of DP. 🙂

 

ANADP II part 9: Winning Poster Talks, Current Opportunities for Collaboration

Poster sessions at ANADP II were part of a contest. The three winners were Paul Wheatley (COPTR), Cal Lee (UNC), and Neil Grindley.

Each was asked to do a lightning talk, and then we moved into discussing current opportunities for collaboration.

Neil Grindley: 4Cproject.eu. Digital curation as an investment: realizing the value of assets via curation.

Paul Wheatley: Community Owned Digital Preservation Tool Registry: COPTR. [NB: Digital POWRR has contributed all of our Tool Grid information to this project, and we’re working to help make the wiki tool more dynamic. Our goal is that the information in the registry gets spit out in a format that resembles our tool grid.]

Comment from Wheatley: We need to build to fail SLOWLY, so that we have time to recover. (Yes, we need to assume that we will fail. ALL technology does so, eventually.)

Cal Lee: Digital Forensics for Digital Preservation (BitCurator)

Current opportunities for collaboration

Research Data Alliance:

  • Community organization.
  • 21st century science is global; so is the data infrastructure.
  • The internet as model: interactivity, exchange across networks. Community consensus.
  • RDA Colloquium = funding agencies
  • Interest groups propose. Working groups deliver.

International Internet Preservation Consortium (IIPC):

  • Preserving the web of today for 50-100 years later
  • Internet Archive + 46 members
  • Building progress
  • Tool –> Community –> Collection

Educopia / Katherine Skinner

  • Consulting — Events — Research
  • Interdependence means less chance of human failure (i.e. people leaving, etc.)
  • ICONC project–NDSA’s review of the last 3 years
  • SCAPE: Scalable Preservation Environments

Future opportunities for Collaboration

Google Doc of action items from the conference available

Oya Rieger (Cornell U / ArXiv)

  • Alignment –> cohesive values
  • Sustainability is the capacity to endure. It’s not just money, but social & political will…
  • Digital literacies: how do we survive/thrive in digital culture?
  • Assessment & Outcomes: small bites are less overwhelming
  • Study: 20-25% of ejournals collected (NOT published, COLLECTED) are being preserved right now
  • History of dependency on grant funds means we need to place more emphasis on new organizational models, embedding DP in all areas of the library
  • Registry: she worries about registries. They get forgotten. What about 21st c. methods of spreading info? MOOCs?
  • Collaboration: it’s a very demanding process. It requires good interpersonal and teambuilding skills. The research data community is coming together and bringing in partners. How do we get in on that?
  • We want more about use, usability, access, and discovery.
  • Want more about open access and scholarly communications

Jeremy York, HathiTrust:

  • Issues: Enormity of the work vs. small number of people doing it.
  • Funding infrastructure with grants = bad plan
  • Specialization of function
  • Preservation-in-place vs. offsite
  • Succession of content?
  • Alignment of goals and diverse voices
  • Imagine: participatory stacks across sectors
  • Broad technical and human infrastructure that allows digital preservation to happen among other things
  • Focus on functions across sectors and we’ll get there
  • Infrastructure technical & human IN education curricula (English, History, etc.)
  • Digital literacy AS infrastructure
  • We need to share information about our practices
  • Make educational resources for the public
  • Expose our data

Up next, the closing plenary.

ANADP II Part 8: Action session, Assessing Training

This is the action session that followed the panel discussion of training opportunities and programs on day 2 of ANADP II. It asked us to brainstorm the kinds of information we would want available to us when searching a “master database” or coordinated website of digital preservation training opportunities. If we had one-stop shopping, how would we shop?

Here’s what we came up with:

  • Levels: Beginner, Intermediate, Advanced
  • Content: What is addressed? Bit level DP? Emulation? Advocacy?
  • Audience: Is this training aimed at administrators? Managers? Practitioners?
  • Format: What is the delivery method? In person? Webinar? Hybrid of the two?
  • Length: How long is the training?
  • Is it broken into modules? Intensive? Are there prerequisites? Is it part of a series or a standalone?
  • What’s the cost? Is it free? Is there a fee?
  • Who are the instructors, and what are their bona fides?
  • Who is sponsoring or backing this training? Are they an accredited organization or a vendor?
  • Technically what’s being covered? Organizationally? What’s the policy orientation?
  • Is it national? International?
  • What language is it conducted in?
  • What references is it using?
  • When was it last updated?

We talked about the RIDLS criteria, which are used to describe, review, and assess information literacy training interventions in higher education. Here, information literacy is defined as handling research information in a research capacity (i.e., not just FINDING the information, but handling it throughout its lifecycle). It’s not just about librarians; it takes in other outlooks and is geared towards designers and providers of information.

The notion is that a training database could be developed along the lines of RIDLS criteria.
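The brainstormed fields map fairly naturally onto a record type plus a filter, which is roughly what “one-stop shopping” would need underneath. This is a minimal Python sketch under our own assumptions: the class, field names, and value vocabularies are hypothetical, are not taken from RIDLS or any existing registry, and cover only a subset of the criteria listed above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Level(Enum):
    BEGINNER = 1
    INTERMEDIATE = 2
    ADVANCED = 3


@dataclass
class TrainingOpportunity:
    """One entry in a hypothetical DP training catalog."""
    title: str
    level: Level
    topics: set[str]        # what is addressed, e.g. "bit-level", "emulation"
    audiences: set[str]     # e.g. "administrators", "managers", "practitioners"
    delivery: str           # "in person", "webinar", or "hybrid"
    length_hours: float
    cost: float             # 0.0 means free; currency left unspecified
    sponsor: str            # accredited organization, vendor, etc.
    instructors: list[str]
    language: str
    last_updated: str       # ISO 8601 date string, e.g. "2014-01-31"
    prerequisites: list[str] = field(default_factory=list)
    international: bool = False


def find_training(catalog: list[TrainingOpportunity], *,
                  level: Optional[Level] = None,
                  topic: Optional[str] = None,
                  audience: Optional[str] = None,
                  max_cost: Optional[float] = None,
                  language: Optional[str] = None) -> list[TrainingOpportunity]:
    """Return entries that satisfy every filter that was supplied."""
    matches = []
    for item in catalog:
        if level is not None and item.level != level:
            continue
        if topic is not None and topic not in item.topics:
            continue
        if audience is not None and audience not in item.audiences:
            continue
        if max_cost is not None and item.cost > max_cost:
            continue
        if language is not None and item.language != language:
            continue
        matches.append(item)
    return matches
```

Making every filter an optional keyword argument lets a searcher combine any subset of criteria, for example only free, beginner-level offerings aimed at practitioners.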

Thus endeth Day 2.

ANADP II Part 7: Building Capacity / Capacity Alignment

Building capacity / Capacity alignment

Joy Davidson, DCC; Martha Whitehead, Research Data Canada; Laura Molloy, HATII; Mary Molinaro, U Kentucky/DPOE/NDSR

  • DPOE train the trainer
  • One size does NOT fit all.
  • Less about formal education, more about what’s happening on the ground
  • HATII: DigCurV, BlogForever, Open Planets, DCC (UK)
  • 2010 Training Needs Assessment Survey (Nancy McGovern)–ties into DPOE
  • NDSR and DPOE: 2 sides of the same coin.
  • DPOE: Practitioners. Train-the-trainer. Regional. Participants then go on to train others.
  • NDSR: New grads, residency, Cohort-based, Field experience.
  • Questions of scalability– small institutions at the end of the road?
  • DPOE: Conversations in the digital preservation community & Library of Congress & 2010 Assessment Survey.
  • Challenges: Lack of awareness, lack of policy & planning, limited educational resources, lack of funding
  • Practitioners want: 1 stop, hands-on, near home, in person, half-to-1-day workshops
  • Challenge: How do we do that? Train the trainer to build network, Share curriculum. Train at *audience* level. Cost is $1-1.5K per trainee, roughly. Volunteer instructors.
  • NDSR: DC, NY, Boston cohorts, with goal to expand internationally, create standardized practice. Needs ongoing support–from the private sector? National training associations? Vendors?
  • Research Data Canada (RDC) education: Strengths & Challenges! Uneven funding & research infrastructure for creators, stewards, users.

Lynne: One of the major things to come out of this session was the danger of relying on “soft” funding. Many projects come and go in 1-5 year cycles, which makes continuity in training and practice VERY difficult to maintain. Also, with so much volunteerism required to function, there will be attrition when budget cuts happen, particularly for travel.

 

An (almost) Graduate’s Perspective

During the graduation ceremony this weekend, I doubt many students will be thinking about how long their digital objects will be around to tell the tales of their accomplishments. Their minds will be reeling with thoughts like “I’m so glad to be done,” “That was so much work!”, “I hope I get a job right away,” or maybe just “Let’s get this over with so I can get home and NOT work on writing any papers.” What many of them don’t know (but hopefully some of them do) is that there is a whole team of people planning and working to make sure their important digital objects can be, and are, preserved. I’m happy that I will be among those students this weekend, but it is bittersweet to be ending my time working on this team.

Working with this powerful POWRR team has been a wonderful opportunity. I’ve learned so much about digital preservation, and while I don’t plan on working directly in the digital preservation field, I will be sharing the knowledge with those I encounter. I know that the work this team completes in the course of the grant (and beyond) will greatly contribute to the digital preservation community. Let me tell you, they are working really hard on this!

While my personal contribution to the project is small (booking reservations, updating websites, taking minutes, doing research, documenting testing processes, crunching data, etc.), I’m so happy to have been a part of it. I began my time as a graduate assistant on the project with an open mind, and now I understand many of the risks involved in not preserving material, the different standards that preserved documents must live up to in order to survive, the numerous tools that are available to make preservation progress, and the fact that any program or process is always going to be a little bit more complicated than it sounds. When technology is giving you a rough time, it is best to keep at it and grab a friend to help you through the problem.

In conclusion, digital preservation is important today and it will be important tomorrow…and every day. If you know nothing about digital preservation, that’s totally okay, but decide that it’s time to learn the basics and check out this awesome DP 101 page. If you already know a bunch and want to start taking steps toward helping with a solution, you can also go to the 101 page (there’s an advanced section just for you), or you can learn more about the tools available by checking out the magnificent Tool Grid.