
Qualitative Content Analysis: World Bank examines popularity of its PDFs

May 9, 2014 12:44:00 PM

The World Bank has investigated the popularity of its policy reports and found that almost one-third of them are never downloaded, let alone read! I'm not surprised. The format in which the reports are presented and the quality of the metadata attached to them tell a large part of the story.

The World Bank is an international financial institution, affiliated with the United Nations, that provides loans to developing countries for capital programs. Its mission is to eradicate poverty. It invests about 25% of its budget in knowledge products, such as policy reports that explain how to make the world better. Until now, the effectiveness of these reports had never been researched.

1,600 reports in 5 years

The World Bank conducted internal research into the exposure of its policy reports. From a database of 130,000 publicly available documents, it selected 1,600 reports produced between 2008 and 2012, all of them published on its external website as PDF documents. The Bank wanted to know how many times each report had been downloaded and how often it had been cited.

One-third is never read

The results are shocking, though not surprising; I'll come back to that later. It turns out that:

Almost one-third of the 1,600 policy reports were NEVER downloaded. Not once.
Almost 40% were downloaded between 1 and 100 times.
Only 25 reports had more than 1,000 downloads in 5 years.
The share of reports downloaded at least once kept dropping, from 71% in 2008 to 59% in 2012, while the number of reports published stayed stable at around 300 per year.

To put these numbers into perspective, worldbank.org attracts around 3 million monthly visits.

World Bank policy reports downloads

Source: World Bank report
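Figures like the ones listed above boil down to simple arithmetic over per-report download counts. The Python sketch below assumes a hypothetical list of (year, downloads) pairs and uses made-up toy numbers, not World Bank data. It shows how the download bands and the yearly share of reports downloaded at least once could be computed. The "101-1000" band is an assumption added to cover the range the article does not mention.

```python
from collections import Counter


def bucket(downloads: int) -> str:
    """Assign a report to a download band (bands follow the list above)."""
    if downloads == 0:
        return "never"
    if downloads <= 100:
        return "1-100"
    if downloads <= 1000:
        return "101-1000"  # assumed intermediate band
    return ">1000"


def summarize(reports):
    """reports: iterable of (year, downloads) tuples (hypothetical format).

    Returns the number of reports per band and, per year, the share
    of reports that were downloaded at least once.
    """
    bands = Counter()
    total_per_year = Counter()
    downloaded_per_year = Counter()
    for year, downloads in reports:
        bands[bucket(downloads)] += 1
        total_per_year[year] += 1
        if downloads > 0:
            downloaded_per_year[year] += 1
    share = {
        year: downloaded_per_year[year] / total_per_year[year]
        for year in sorted(total_per_year)
    }
    return bands, share


# Hypothetical toy data, NOT the World Bank's figures.
sample = [(2008, 0), (2008, 45), (2008, 1200), (2012, 0), (2012, 0), (2012, 80)]
bands, share = summarize(sample)
print(dict(bands))                                  # {'never': 3, '1-100': 2, '>1000': 1}
print({y: round(s, 2) for y, s in share.items()})  # {2008: 0.67, 2012: 0.33}
```

With real data, the same two numbers (band counts and yearly share) would reproduce the kind of breakdown the World Bank reported.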

Burying knowledge in unread PDFs

The Washington Post concludes that solutions to the world's problems might be buried in PDFs nobody reads. It stresses how PDF has become the default document format for think tanks, governments, and universities alike. Data journalist Christopher Ingraham calls on all of them to do the kind of research the World Bank has now bravely done.

Why wasn’t exposure monitored?

I find it baffling, by the way, that the World Bank should receive praise for researching the effectiveness of its publications. Its goal is not just to make information available; it is to get as many people as possible to actively use that information to make the world less poor. So why, after putting out some 300 reports a year since 2008, was the first Omniture report only created six years later? I would have expected them to monitor this from the get-go.

There are two main reasons why these figures do not surprise me. The first has to do with the nature of the portable document format, the second with the metadata associated with the reports.

Locking up data

PDF is, in many respects, the easy way out for people who haven't truly embraced the web. You write your document in a word processor, as if you intend to print it, and save it as a PDF to put it online. The content is then completely locked inside the PDF. That is a pity, particularly in the case of the World Bank, which could release a lot of useful data for data journalists and others to build on.

PDF and mobile aren’t friends

PDF is a mobile-unfriendly document format, although things have improved in recent years, for instance now that Chrome can display a PDF in a browser window. Nevertheless, with more and more people accessing the internet only through their mobile phones, PDF is not the best format. HTML is the way to go, not only for accessibility but also because search engines index it more easily.

Missing document metadata

That brings me to the second point. A major reason documents aren't downloaded is that people can't find them. And even if they can, it is often unclear what the document contains, so they don't bother. Ironically, the research into the reports not being read is itself only available as a PDF. Let's take a look at its metadata. For starters, the file has an uninterpretable name: WPS6851.pdf. The document properties aren't much to look at either, with 'World Bank Document' as the title and no subject or keywords.

World Bank PDF properties

Source: World Bank report (PDF)
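The shortcomings described above can be checked mechanically. The Python sketch below is a hypothetical audit: it assumes the document-information fields (title, subject, keywords) have already been extracted into a plain dictionary by some PDF tool, and its rules (a single-token file name, a title from a small list of placeholders) are illustrative heuristics, not a standard.

```python
import re
from pathlib import PurePath

# Hypothetical examples of placeholder titles that say nothing about the content.
GENERIC_TITLES = {"world bank document", "untitled", "document"}


def audit_metadata(filename: str, info: dict) -> list[str]:
    """Return human-readable findings about a document's metadata.

    `info` is assumed to map "title", "subject" and "keywords" to strings
    (missing keys are treated as empty).
    """
    findings = []
    stem = PurePath(filename).stem
    # Heuristic: a file name made of one token, such as "WPS6851",
    # tells a reader nothing about the content.
    if len([w for w in re.split(r"[-_\s]+", stem) if w]) < 2:
        findings.append(f"file name '{filename}' is not descriptive")
    title = (info.get("title") or "").strip()
    if not title or title.lower() in GENERIC_TITLES or title == stem:
        findings.append(f"title '{title}' is missing or generic")
    for field in ("subject", "keywords"):
        if not (info.get(field) or "").strip():
            findings.append(f"{field} field is empty")
    return findings


# The properties described above for the World Bank working paper.
for finding in audit_metadata("WPS6851.pdf", {"title": "World Bank Document"}):
    print(finding)
# file name 'WPS6851.pdf' is not descriptive
# title 'World Bank Document' is missing or generic
# subject field is empty
# keywords field is empty
```

Run over a whole repository, a check like this would list the documents whose metadata most needs attention.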

Metadata on the web page

Admittedly, the web page where the PDF is published does provide some meta information. There we see, for what it's worth, that this is what the World Bank calls a Policy Research Working Paper. The page provides author profiles, an abstract, and a document title: "Which World Bank reports are widely read?" The meta description provided to Google simply repeats the title.

Snappy document description

My question: why aren't these metadata included in the actual document? And I wonder whether an abstract alone is enough to trigger anyone's interest. This one is on the long side, at 1,162 characters. A decent meta description of at most 156 characters should suffice, and it can be reused in many ways. A couple of keywords would be really helpful, too.
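A hand-written description is best, but as a fallback a long abstract can be cut down to the 156-character limit mentioned above. The Python sketch below (the function name and sample text are hypothetical) trims at a word boundary and appends an ellipsis, so the result never exceeds the limit.

```python
def meta_description(abstract: str, limit: int = 156) -> str:
    """Trim an abstract to at most `limit` characters at a word boundary."""
    text = " ".join(abstract.split())  # collapse line breaks and repeated spaces
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis, then back up to the last space
    # so no word is cut in half.
    cut = text[: limit - 1]
    if " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip(" ,;:") + "…"


# Hypothetical stand-in for a long abstract.
long_abstract = "This paper examines " + "how often reports are downloaded and cited " * 30
description = meta_description(long_abstract)
print(len(description), description)
```

Mechanical trimming only keeps the length in check; a sentence written for the purpose will almost always say more in the same space.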

Open up the info, describe it meaningfully

In conclusion, it's a good thing the World Bank is analyzing the popularity of its reports. Many other organizations that produce large amounts of information should follow its lead. But two fundamental issues with the way its policy reports are published remain unaddressed. A closed document format like PDF and the lack of meaningful metadata do not help the contents get found and shared freely and easily.

-------------------------------------------------------------------

How to perform a qualitative content analysis on your repository?

Want to understand which content in your repository is unpopular? Or which content on file shares is redundant, obsolete and trivial? Or do you simply want to know what content your repository holds in general? A file share analysis can be useful, for example, ahead of a content migration or the implementation of a new WCM or ECM system. Learn which insights a content analysis can give. Download the report.
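To give an idea of what the first pass of such a file share analysis might look like, here is a minimal Python sketch using only the standard library. The definitions are assumptions for illustration, not the method behind the report: "redundant" means byte-identical files (same SHA-256 hash), "obsolete" means not modified in a hypothetical three years, and "trivial" means empty files.

```python
import hashlib
import os
import tempfile
import time
from collections import defaultdict


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rot_scan(root: str, obsolete_after_days: int = 3 * 365, trivial_max_bytes: int = 0) -> dict:
    """Walk `root` and flag redundant, obsolete and trivial files.

    The thresholds are illustrative assumptions, not an industry standard.
    """
    by_hash = defaultdict(list)
    obsolete, trivial = [], []
    cutoff = time.time() - obsolete_after_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
                file_hash = sha256_of(path)
            except OSError:
                continue  # skip broken links and unreadable files
            if st.st_size <= trivial_max_bytes:
                trivial.append(path)
            if st.st_mtime < cutoff:
                obsolete.append(path)
            by_hash[file_hash].append(path)
    redundant = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return {"redundant": redundant, "obsolete": sorted(obsolete), "trivial": sorted(trivial)}


# Demo on a throwaway directory: two identical files and one empty file.
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("a.txt", b"same"), ("b.txt", b"same"), ("empty.txt", b"")]:
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(data)
    print(rot_scan(tmp))
```

A real analysis would add qualitative judgment on top of these mechanical flags, for instance whether an old file is still a valid reference document.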