Jan 29 2014

Digital/Online Materials and their Place in Historical Scholarship

A post by Drew VandeCreek

At the recent meeting of the American Historical Association in Washington, D.C., I made a presentation as part of a discussion session (i.e., not a regular panel – we sat in a circle and talked after very short presentations made by people sitting as part of the circle) exploring digital materials, ranging from blogs and web sites to social media, and the questions that they raise as scholars begin to make use of them as primary sources. Other presenters talked about the future of MOOCs and crowd-sourcing the search for elusive information about a relatively obscure historical figure. I discussed the work of the Digital POWRR project and the challenges presented by the fact that digital objects are generally subject to loss in the relatively short term due to a number of reasons, including hardware and software incompatibility and the degradation of storage media.

One major question that emerged in the discussion was the status of social media materials and other online, digital sources in light of the fact that they are so prone to loss. One presenter at the preceding panel (our discussion group was part of a linked set of two events) described how she had based her work on Pakistani women in part on a web site that no longer existed, apparently because of hacking activities undertaken by parties believing that Pakistani women should not express themselves in this format. The presenter said that she had printed out the site’s pages for her own record and thus could document her use of the source. But this made me wonder about the future practice of history.

So, what of digital sources like blogs, web sites, and social media objects like tweets? Digital objects’ intrinsic frailty and the complex, easily disrupted nature of the internet used to present them make them fundamentally unreliable as primary sources, at least by the standards developed for the use of analog/paper media materials.

It seems to me that although history is certainly not a science in any way, historians are similar to scientists in at least one regard. Just as a scientific discovery can only be accepted and confirmed when other practitioners are able to repeat the experiment and obtain the same result, historians are accustomed to being able to lay their hands on a paper source cited in a footnote. Manuscripts are usually unique items, but if one travels to the archive and looks in the box and folder number cited, the item will be there. There may be a very small number of copies of a book, but if one is willing to make the trip to the right library, the book will be there. Historians will of course debate a scholar’s reading of a source, but the existence of the source itself is fundamental to the discipline. If the item is not there, practitioners may rightly begin to ask questions about the legitimacy of a work citing it.

Many of the participants in the AHA discussion emphasized the need to preserve online digital materials as fully as possible. I certainly concur. But a whole host of problems, not the least of which is the considerable expense involved in the curation/preservation of digital materials, make complete preservation impossible. We will have to face the fact that a considerable number of online digital objects that future historians may want to use as evidence will simply disappear.

In this situation, several questions occur to me: How will we evaluate work citing online materials that no longer exist? What if scholars relying on such missing evidence can produce a print-out or other facsimile of the materials? Can we distinguish cases of vanished evidence in which legitimate facsimiles exist from cases of academic fraud?

Dec 30 2013

An e-records transfer tale

In mid-December I received my first-ever completely electronic records transfer from a student organization. The group’s faculty advisor attended two of my campus presentations this year and followed up with a request for a one-on-one meeting to talk about their specific kinds of records. Before the end of the semester a student leader of the group sent me an email with attachments of five photographs from their biggest event of the year, plus a scanned version of the event poster and a Word document with the names of the people pictured and the event location and date.

Very exciting!

There were some problems with them not following the naming conventions I recommended, but since I was handling the accession on an item-level basis this wasn’t a big deal. I congratulated the student on being our first fully electronic donation from a registered student organization and thanked her for her efforts. Then I examined the photo files more closely :-(

The images were 72dpi jpgs and when I tried upsampling they became fuzzy.

In a series of follow-up email exchanges, the student told me the faculty advisor had taken the images with an iPhone 5, and the advisor told me he hadn’t made any setting changes and the pictures on the phone were a large enough size. We assume the iPhone compressed them when he sent the images from his phone to the student. That close to the end of the semester, we never completed the transfer.

{sigh} Something else to face in the busy time at the beginning of the new year.

Lesson learned: don’t accept emailed pictures at face value. Both of my constituents knew the digital object and metadata requirements I requested and thought they were in compliance, but the transmission was scrambled along the way. The old adage of “trust but verify” is still relevant!
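For anyone who would like to automate the “verify” step, here is a minimal sketch in Python. It reads the pixel dimensions directly from a JPEG’s header using only the standard library, since the pixel count says more about usable size than the dpi tag does. The function names and the 2,000-pixel threshold are illustrative assumptions, not an archival standard; substitute whatever minimum your own requirements call for.

```python
import struct

# SOF (start-of-frame) markers carry the image size. 0xC4, 0xC8 and 0xCC
# fall in the same numeric range but are not frame headers, so skip them.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_dimensions(data):
    """Return (width, height) in pixels from JPEG bytes, or None if not found."""
    if data[:2] != b"\xff\xd8":           # every JPEG starts with the SOI marker
        return None
    i = 2
    while i + 9 <= len(data):
        if data[i] != 0xFF:
            return None                    # lost sync with the segment structure
        marker = data[i + 1]
        if marker == 0xFF:                 # fill byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD9:
            i += 2                         # markers that have no length field
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            # Segment layout: length (2), precision (1), height (2), width (2)
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length                    # jump to the next segment
    return None


def large_enough(data, min_long_edge=2000):
    """Hypothetical acceptance check: longer side at least min_long_edge pixels."""
    size = jpeg_dimensions(data)
    return size is not None and max(size) >= min_long_edge
```

Reading each attachment in binary mode and passing the bytes to `large_enough` before sending the thank-you note would flag a downsized image while the donor is still reachable.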

Dec 11 2013

ANADP II Part 10: Closing Plenary by Adam Farquhar

Trends and Impact

Farquhar’s (British Library) work has 3 strands: BL team, Planets project, Open PLANETS foundation

  • High-value tech & practitioner exchange, Sustainability New CEO (Ed Fahey)
  • Dataset/Datasite (BL): Research infrastructure & capacity
  • Digital Scholarship: new services, use digital content in new ways.

On to the meeting:

Intellectual honesty is our frame! (Follow the IIPC example: BE NICE!)

  • Inclusion of service providers/vendors–how do we make them welcome?
  • Structure: some action sessions were more project based
  • Gap: Do we have: a consensus on what it means to align? Single clear global voice + national voice? Right folks to influence national legislation?
  • We have underestimated the amount of work it takes
  • Smaller scale collaboration across national boundaries
  • Improved use, shared maintenance = Big impact
  • It’s super-important to handle the legal stuff
  • Organizational axis!
  • Standards: choose wisely. Standards should follow practice.
  • Few discussions about technical underpinnings here
  • Focus on cost, rather than value
  • Bottlenecks are interesting.
  • Challenge: Can we take 50% off the costs of DP?
  • Education and training
  • Alignment — reducing variation & redundancy
  • Don’t lose the voice of evaluation
  • Interdependence: threat or menace? It must be carefully managed.


  • Non print legal deposit & regulations for data management plans
  • Shift to business as usual: operational budgets & teams, capital investment in digital infrastructure
  • Often with external service providers
  • Shift to born-digital
  • Greater scale
  • New usage pattern: from single items to dataset(s) analysis
  • Architecture needs to be constructed for these new use patterns
  • Digital library architectures will feel very 1990s soon.
  • OAIS may need a re-think in light of this use
  • Assume more/everything gets looked at! Implementers will need to think differently
  • Reduced funding, growing market problem: Not only a memory institution. Spreading in importance! Personal Digital Archiving solutions create additional pressure for 30 year access.
  • Open thinking about the role of vendors/service providers: opportunity to drive down costs
  • Shift from project thinking to infrastructure funding. Infrastructure can be invisible: we don’t want to disappear.

What’s next?

  • We can be effective … or cheer
  • Legal: (mostly cheering)
  • What can we learn from RDA? Should we hang our coat on that hook? Join an interest group?
  • Worry about our loss of identity/community? It’s an interesting structure, the working group model
  • Education & Training! The economics of it. Think about cost sinks? Engaging more broadly may cost more money, but it’s worth it.
  • Our message & coherence: we need to communicate our consensus messages
  • SCOPE: we’ve been drawing our boundaries too tight
  • This is a scary problem! We narrow things down & put out boundaries to make it less so.
  • Soup-to-nuts handling needs sorting out.
  • So does selection & access

And with that, we were sent out to try to change the world of DP. :-)


Dec 11 2013

ANADP II part 9: Winning Poster Talks, Current Opportunities for Collaboration

Poster sessions at ANADP II were part of a contest. The three winners were Paul Wheatley (COPTR), Cal Lee (UNC), and Neil Grindley.

Each was asked to do a lightning talk, and then we moved into discussing current opportunities for collaboration.

Neil Grindley: 4Cproject.eu Digital Curation as an investment– realizing the value of assets via curation.

Paul Wheatley: Community Owned Digital Preservation Tool Registry: COPTR. [NB: Digital POWRR has contributed all of our Tool Grid information to this project, and we're working to help make the wiki tool more dynamic. Our goal is that the information in the registry gets spit out in a format that resembles our tool grid.]

Comment from Wheatley: We need to build to fail SLOWLY, so that we have time to recover. (Yes, we need to assume that we will fail. ALL technology does so, eventually.)

Cal Lee: Digital Forensics for Digital Preservation (BitCurator)

Current opportunities for collaboration

Research Data Alliance:

  • Community organization.
  • 21st century science is global; so is the data infrastructure.
  • The internet as model: interactivity, exchange across networks. Community consensus.
  • RDA Colloquium = funding agencies
  • Interest groups propose. Working groups deliver.

International Internet Preservation Consortium (IIPC):

  • Preserving the web of today for 50-100 years later
  • Internet Archive + 46 members
  • Building progress
  • Tool –> Community –> Collection

Educopia / Katherine Skinner

  • Consulting — Events — Research
  • Interdependence means less chance of human failure (i.e. people leaving, etc.)
  • ICONC project–NDSA’s review of the last 3 years
  • SCAPE: Scalable Preservation Environments

Future opportunities for Collaboration

Google Doc of action items from the conference available

Oya Rieger (Cornell U / ArXiv)

  • Alignment –> cohesive values
  • Sustainability is the capacity to endure. It’s not just money, but social & political will…
  • Digital literacies: how do we survive/thrive in digital culture?
  • Assessment & Outcomes: small bites are less overwhelming
  • Study: 20-25% of ejournals collected (NOT published, COLLECTED) are being preserved right now
  • History of dependency on grant funds means we need to place more emphasis on new organizational models, embedding DP in all areas of the library
  • Registry: she worries about registries. They get forgotten. What about 21st c. methods of spreading info? MOOCs?
  • Collaboration: it’s a very demanding process. It requires good interpersonal and teambuilding skills. The research data community is coming together and bringing in partners. How do we get in on that?
  • We want more about use, usability, access, and discovery.
  • Want more about open access and scholarly communications

Jeremy York, Hathi Trust:

  • Issues: Enormity of the work vs. small number of people doing it.
  • Funding infrastructure with grants = bad plan
  • Specialization of function
  • Preservation-in-place vs. offsite
  • Succession of content?
  • Alignment of goals and diverse voices
  • Imagine: participatory stacks across sectors
  • Broad technical and human infrastructure that allows digital preservation to happen among other things
  • Focus on functions across sectors and we’ll get there
  • Infrastructure technical & human IN education curricula (English, History, etc.)
  • Digital literacy AS infrastructure
  • We need to share information about our practices
  • Make educational resources for the public
  • Expose our data

Up next, the closing plenary.

Dec 11 2013

ANADP II Part 8: Action session, Assessing Training

This is the action session that followed the panel discussion of training opportunities and programs on day 2 of ANADP II. It asked us to brainstorm the kinds of information we would want available to us when searching a “master database” or coordinated website of digital preservation training opportunities. If we had one-stop shopping, how would we shop?

Here’s what we came up with:

  • Levels: Beginner, Intermediate, Advanced
  • Content: What is addressed? Bit level DP? Emulation? Advocacy?
  • Audience: Is this training aimed at administrators? Managers? Practitioners?
  • Format: What is the delivery method? In person? Webinar? Hybrid of the two?
  • Length: How long is the training?
  • Is it broken into modules? Intensive? Are there prerequisites? Is it part of a series or a standalone?
  • What’s the cost? Is it free? Is there a fee?
  • Who are the instructors, and what are their bona fides?
  • Who is sponsoring or backing this training? Are they an accredited organization or a vendor?
  • Technically what’s being covered? Organizationally? What’s the policy orientation?
  • Is it national? International?
  • What language is it conducted in?
  • What references is it using?
  • When was it last updated?
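
As a rough illustration of how the list above might turn into a searchable record, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class name, the field names, the `shop` function, and the two sample entries are made up and do not describe any existing database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrainingListing:
    """One entry in a hypothetical catalog of digital preservation training."""
    title: str
    level: str            # "beginner", "intermediate" or "advanced"
    topics: List[str]     # e.g. ["bit-level DP", "emulation", "advocacy"]
    audience: str         # "administrators", "managers" or "practitioners"
    delivery: str         # "in person", "webinar" or "hybrid"
    hours: float          # total length of the training
    cost: float           # 0 means free
    language: str
    sponsor: str
    last_updated: str     # ISO date string
    prerequisites: List[str] = field(default_factory=list)


def shop(listings, level=None, audience=None, max_cost=None):
    """Return the listings that match every filter that was supplied."""
    matches = []
    for item in listings:
        if level is not None and item.level != level:
            continue
        if audience is not None and item.audience != audience:
            continue
        if max_cost is not None and item.cost > max_cost:
            continue
        matches.append(item)
    return matches


# Made-up sample entries, for illustration only.
catalog = [
    TrainingListing("Intro workshop", "beginner", ["bit-level DP"],
                    "practitioners", "in person", 4, 0,
                    "English", "Example sponsor A", "2013-12-10"),
    TrainingListing("Emulation intensive", "advanced", ["emulation"],
                    "practitioners", "webinar", 12, 250,
                    "English", "Example sponsor B", "2013-11-01"),
]

free_beginner = shop(catalog, level="beginner", max_cost=0)
```

Keeping each question from the brainstorm as its own field is what makes “one-stop shopping” possible: a searcher can combine any of them as filters.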

We talked about the RIDLS criteria, which are meant to describe, review, and assess information literacy training interventions in higher education. Information literacy is defined as handling research information in a research capacity (i.e. not just FINDING the information, but handling it throughout its lifecycle). It’s not just about librarians, but about other outlooks, geared towards designers and providers of information.

The notion is that a training database could be developed along the lines of RIDLS criteria.

Thus endeth Day 2.

Dec 10 2013

ANADP II Part 7: Building Capacity / Capacity Alignment

Building capacity / Capacity alignment

Joy Davidson, DCC; Martha Whitehead, Research Data Canada, Laura Molloy, HATII, Mary Molinaro, U Kentucky/DPOE/NDSR

  • DPOE train the trainer
  • One size does NOT fit all.
  • Less about formal education, more about what’s happening on the ground
  • HATII: DigCurV, BlogForever, Open Planets, DCC (UK)
  • 2010 Training Needs Assessment Survey (Nancy McGovern)–ties into DPOE
  • NDSR and DPOE: 2 sides of the same coin.
  • DPOE: Practitioners. Train-the-trainer. Regional. Participants then go on to train others.
  • NDSR: New grads, residency, Cohort-based, Field experience.
  • Questions of scalability– small institutions at the end of the road?
  • DPOE: Conversations in the digital preservation community & Library of Congress & 2010 Assessment Survey.
  • Challenges: Lack of awareness, lack of policy & planning, limited educational resources, lack of funding
  • Practitioners want: 1 stop, hands-on, near home, in person, half-to-1-day workshops
  • Challenge: How do we do that? Train the trainer to build network, Share curriculum. Train at *audience* level. Cost is $1-1.5K per trainee, roughly. Volunteer instructors.
  • NDSR: DC, NY, Boston cohorts, with goal to expand internationally, create standardized practice. Needs ongoing support–from the private sector? National training associations? Vendors?
  • Research Data Canada (RDC) education: Strengths & Challenges! Uneven funding & research infrastructure for creators, stewards, users.

Lynne: One of the major things to come out of this session was the dangers of being on “soft” funding. Lots of projects that come and go in 1-5 year cycles, which makes continuity for training and practice VERY difficult to maintain. Also, with so much volunteerism required to function, there will be attrition when budget cuts happen, particularly for travel.


Dec 10 2013

An (almost) Graduate’s Perspective

During the graduation ceremony this weekend I doubt many students will be thinking about how long their digital objects will be around to tell the tales of their accomplishments. Their minds will be reeling with thoughts of, “I’m so glad to be done.”, “That was so much work!”, “I hope I get a job right away.”, or maybe just “Let’s get this over with so I can get home and NOT work on writing any papers.” What many of them don’t know (but hopefully some of them do) is that there is a whole team of people planning and working towards making sure their important digital objects have the ability to be and are preserved.  I’m happy that I will be among those students this weekend, but it is bittersweet to be ending my time working on this team.

Working with this powerful POWRR team has been a wonderful opportunity. I’ve learned so much about digital preservation and while I don’t plan on directly working in the digital preservation field I will be sharing the knowledge with those I encounter. I know that the work this team completes in the course of the grant (and beyond) will greatly contribute to the digital preservation community.  Let me tell you they are working really hard on this!

While my personal contribution to the project is small (booking reservations, updating websites, taking minutes, doing research, documenting testing processes, crunching data, etc.), I’m so happy to have been a part of it. I began my time as a graduate assistant on the project with an open mind, and now I understand many of the risks involved in not preserving material, the different standards that preserved documents must live up to in order to survive, the numerous different tools that are available to make preservation progress, and the fact that any program or process is always going to be a little bit more complicated than it sounds. In those cases where technology is giving you a rough time, it is best to keep at it and grab a friend to help you through the problem.

In conclusion, digital preservation is important today and it will be important tomorrow…and every day. If you know nothing about digital preservation, that’s totally okay, but decide that it’s time to learn the basics and check out this awesome DP 101 page. If you already know a bunch and want to start taking steps toward helping with a solution, you can also go to the 101 page (there’s an advanced section just for you), or you can learn more about the tools available by checking out the magnificent Tool Grid.

Dec 05 2013

ANADP II Part 6 Day 2: Towards a Cost Spectrum

Action session 4: Towards a Cost Spectrum

Aaron Trehub (Auburn University) & Gail McMillan (Virginia Tech)

Uniting Theory & Practice

(This was a slideshow and then discussion of all of the numerous attempts at costing that are out there; I’m sure slides will be available soon)

  • Once we’ve decided to create a digital collection, we’ve taken on a preservation obligation
  • We’re not saying “cost doesn’t matter”
  • Cost & starting up are the two most difficult things
  • Work to date on cost modeling: Open PLANETS blog, LIFE project
  • Transparency
  • The downside of relying on cyclical funding
  • Keeping Research Data Safe (Beagrie)
  • Cost model for Digital Preservation (Excel Spreadsheet)
  • APARSEN (EU Initiative)
  • CDL-TCP Total Cost of preservation
  • Pay as you go: Continued funding, 4C Project… vs.
  • Paid up: Term limited funding, maybe a “fighting chance”?
  • Rosenthal: Ingest is about half, preservation 1/3, access 1/6th of cost
  • ADPNet: Designed tiered for small institutions to join. Membership + storage fees + equipment.
  • Paralysis of Choice. Danger of researching to death. Which doesn’t work any more.
  • Low barriers to entry–negligible losses in Tessella pilot program
  • DPC Wiki (we need to add the POWRR project!)
  • We have empirical evidence of how this (funding) works.
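
To make the Rosenthal split noted above concrete, here is a small worked example in Python. It takes the three shares from the notes at face value (ingest about half, preservation a third, access a sixth); the function name and the 60,000 budget figure are hypothetical and chosen only because they divide evenly.

```python
from fractions import Fraction

# Shares as recorded in the session notes.
SHARES = {
    "ingest": Fraction(1, 2),
    "preservation": Fraction(1, 3),
    "access": Fraction(1, 6),
}

# The three shares account for the whole budget.
assert sum(SHARES.values()) == 1


def split_budget(total):
    """Divide a total budget according to SHARES and return a dict of floats."""
    return {stage: float(total * share) for stage, share in SHARES.items()}


# Hypothetical total, for illustration only.
example = split_budget(60000)
```

As the discussion notes go on to say, an approximation like this is often enough for planning purposes.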

Lynne’s notes: At this point, there was a lively discussion about the importance of integrating digital preservation practices across multiple salary lines/positions in libraries, rather than having it be a single line item. When it’s part of *everyone*’s job, it is very difficult to cut, whereas a single-person or fee-for-service model is in much more danger of looking like a tempting cut when funding reductions roll around. Also, it was pointed out that we don’t necessarily need to know EXACTLY what DP will cost; an approximation will in many cases do just fine for planning purposes, especially with a mostly-sweat-equity model that only has direct expenses for equipment and storage…


Dec 05 2013

ANADP II Part 5 Day 2: Let’s talk money.

The official theme for the day was “Resource Alignment & Capacity Alignment”… but the discussions were all about cold, hard cash, whether in the form of ACTUAL cash or invested in resources like people or equipment.

Panel 2: Neil Grindley (JISC), Tom Cramer (Stanford/DuraSpace), Sabine Schrimpf (nestor), Tom Wilson (U Alabama/ADPNet)

What is the most economical deployment of our resources?

  • Can we afford to do preservation?
  • As much as we need to?
  • Things will be lost. Life goes on. We can’t preserve everything.
  • Preservation gains value from heterogeneity (of processes, software, hardware, etc.)
  • We can’t afford to do DP on everything–that’s a good thing. Ongoing demonstrated value of DP activities is necessary.
  • DP requires choices and active measures, i.e. TRIAGE.
  • If you had more money, would you preserve more? Staff bottlenecks–they can’t do more work than they already are at present staffing.
  • Is it worth it to collaborate on DP internationally? Maybe. We need a business case. Common solutions to common problems. This is good for getting on the agenda of decision-makers.
  • Also important: finding metrics for measuring outcomes.
  • Scoping expectations for any collaborations–where are we heading?
  • Yes, working with other people is really a PAIN. And yet, as the saying goes: go fast alone, go far together.
  • Collaborating is still the best option. The problem is too big to go it alone.
  • We’re better at coordinating than collaborating.

How do we get DP noticed / FUNDED?

  • Need for blurred boundaries in our organizations on occasion. Sweat equity that goes across multiple staff.
  • Don’t create silos of expertise! Educate MORE people. Cross train staff & administrators. Strategic money decisions then come from input across the whole organization.
  • Follow the funding chain.
  • Digital assets/objects: They have implicit value and liabilities. This is critical in selection and appraisal. Organizations have difficulty deciding what is valuable.
  • At ‘Bama, they focused on special collections digitization projects exclusively for ADPNet. Workflows were a big issue: where are duplications identified? How were they defined?
  • Line item–could lead to administrative veto.  Sell them on work we’re already doing. Cultural memory work total, of which this is a part.
  • Risk mitigation model: this is also “insurance”
  • A piecemeal approach may be okay.
  • There is a BIG distinction between “asset management” and “digital preservation.” Organization is NOT preservation, and we need to make this clear.
  • Consider neglected communities: such as ALL your suppliers.
  • Business language is not always comfortable for us, but we should really frame this as leveraging assets to pay for liabilities.
  • Return on investment/ the value of scholarship isn’t the question. The question is WHY DOES IT COST SO MUCH?
  • In our language, we need to not make single arguments.
  • Our audience needs a temporal framework (e.g. preservation for hundreds of years, new formats last few decades)
  • We need to consider the externalization of costs: producers of content are not doing triage/selection of any kind right now.
  • Our costs come from lack of metadata at submission, i.e. setup costs.
  • What can we STOP doing, how can we realign, reassign our resources to kill this bottleneck?

Dec 04 2013

ANADP II Part 4: Day 2

Day 1 at ANADP II finished with a group of action sessions; I attended the “Applying the OAIS Framework to Distributed Digital Preservation” session, which was facilitated by Matt Schultz (MetaArchive) and Eld Zierau (Royal Library of Denmark). I don’t have a lot of notes, as we were busy moving cutouts of different types of distributed digital preservation system elements around to try to make an instant system based on our respective collections.

This was far more effective than it sounds (and rather resembles real life, where collaborations are often based upon who shows up). But it didn’t make for much note-taking.

This was followed by a lovely wine reception at the Palau Moja, featuring a performance by a classical guitarist.

Day 2!

We opened with going over the themes that emerged from Day 1:

  • Legal: We need ‘sweeping principles’ not minutiae (which is where the lawyers tend to focus)
  • Organizational: Mirroring between centers
  • Alignment & Interdependence
  • Community building: reaching out to neglected constituencies, and BEING THERE
  • We do things differently here
  • Intellectual Honesty
  • Documentation & Storytelling
  • PR: Elevator Pitches!
  • Who is “we”?
  • Good enough vs. Best Practices
  • Proliferating projects lead to fragmentation
  • External funding: threat or menace
  • Sustainability/self sufficiency–Assume no new funding

The rest of the day was devoted to talking money, extensively, which will go into the next post, as I have a LOT of notes.

