Qualitative Content Analysis: World Bank examine popularity of PDFs
by Corné van Leuveren, on May 9, 2014 12:44:00 PM
The World Bank have investigated the popularity of their policy reports and found that almost one-third are never read! I’m not surprised. The format in which the reports are presented and the quality of the metadata associated with them tell a large part of the story.
The World Bank is an international financial institution within the United Nations system that provides loans to developing countries for capital programs. Their mission is to eradicate poverty. They invest about 25% of their budget in knowledge products like policy reports explaining how to make the world better. The effectiveness of these reports had not been researched until now.
1,600 reports in 5 years
The World Bank conducted internal research to investigate the exposure of their policy reports. From a database containing 130,000 publicly available documents, they selected 1,600 reports produced between 2008 and 2012. All of them were published on the external website as PDF documents. The World Bank wanted to know how many times each report had actually been downloaded and how often it had been cited.
One-third is never read
The results are shocking, though not surprising. I'll go into that later. It turns out:
- Almost one-third of 1,600 policy reports were NEVER downloaded. Not once.
- Almost 40% were downloaded between 1 and 100 times.
- Only 25 reports have more than 1,000 downloads - in 5 years.
- Download percentages are consistently dropping: from 71% in 2008 to 59% in 2012, with the total number of reports remaining stable at around 300 per year.
To put these numbers into perspective: worldbank.org attracts around 3 million monthly visits.
Source: World Bank report
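To make the percentages above more tangible, here is a rough back-of-the-envelope calculation in Python. It only uses the figures quoted in this post; because the post says "almost" one-third and "almost" 40%, the first two results are upper bounds rather than exact counts.

```python
TOTAL_REPORTS = 1600  # sample size quoted in the post

# "Almost one-third" and "almost 40%" are approximations in the post,
# so the counts below are upper bounds, not exact figures.
never_downloaded_max = TOTAL_REPORTS / 3
low_downloads_max = TOTAL_REPORTS * 0.40

# Share of reports with more than 1,000 downloads (25 out of 1,600).
top_share_pct = 25 / TOTAL_REPORTS * 100


def summarize() -> dict:
    """Return the rounded figures as a dictionary."""
    return {
        "never_downloaded_max": round(never_downloaded_max),
        "low_downloads_max": round(low_downloads_max),
        "top_share_pct": round(top_share_pct, 2),
    }


print(summarize())
```

In other words: at most roughly 533 reports were never downloaded, at most 640 were downloaded between 1 and 100 times, and under 2% of the reports passed the 1,000-download mark.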
Burying knowledge in unread PDFs
The Washington Post concludes that solutions to world problems might be buried in PDFs nobody reads. They stress the omnipresence of PDF as the single document format in the think tank industry, the government, and universities. Data journalist Christopher Ingraham invites all of them to perform the kind of research the World Bank have bravely done now.
Why wasn’t exposure monitored?
I find it baffling, by the way, that the World Bank should receive praise for researching the effectiveness of their publications. Their goal is not just to make information available; it is to have as many people as possible actively use the info to make the world less poor. So why, after churning out 300 reports a year since 2008, was the first Omniture web analytics report only created 6 years later? I would have expected them to monitor this from the get-go.
There are two main reasons why these figures do not surprise me. The first has to do with the nature of the portable document format, the second with the metadata associated with the reports.
Locking up data
PDF is in many respects the easy way out for people who haven't truly embraced online publishing. You write up your document in a word processor, as if you intend to print it, and save it as a PDF to throw it online. The contents of the PDF are completely locked inside. Which is a pity, particularly in the case of the World Bank, who could unleash a lot of useful data for data journalists and others to elaborate upon.
PDF and mobile aren’t friends
PDF is a mobile-unfriendly document format, although this has improved a lot in recent years; Chrome, for instance, can display a PDF in a browser window. Nevertheless, with more and more people only accessing the internet through their mobile phones, PDF is not the best format. HTML is the way to go. Not only for reasons of accessibility, but also for easier indexing by search engines.
Missing document metadata
Which brings me to the second point. One main reason for documents not being downloaded is that people can't find them. Or even if they can, it is unclear what to expect from their contents, so users won't bother. Ironically, the research into their reports not being read is only available as a PDF. Let's take a look at its metadata. For starters, the report has a totally uninterpretable file name: WPS6851.pdf. The document properties aren't much to look at either, with 'World Bank Document' as title and no subject or keywords.
Source: World Bank report (PDF)
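The kind of check described above is easy to automate. Below is a minimal sketch in Python, assuming the document properties have already been read into a plain dictionary (reading them from the PDF itself is left out). The function name `audit_metadata`, the list of generic titles, and the file name pattern are my own illustrative choices, not anything the World Bank uses.

```python
import re

# Illustrative list of titles that say nothing about the contents (assumption).
GENERIC_TITLES = {"world bank document", "untitled", "document"}


def audit_metadata(filename: str, properties: dict) -> list:
    """Return a list of human-readable metadata problems for one document."""
    problems = []

    # A short letter prefix followed by digits (e.g. "WPS6851") is an opaque ID.
    stem = filename.rsplit(".", 1)[0]
    if re.fullmatch(r"[A-Za-z]{0,5}\d+", stem):
        problems.append("filename is an opaque identifier")

    title = (properties.get("title") or "").strip()
    if not title or title.lower() in GENERIC_TITLES:
        problems.append("title is missing or generic")

    for field in ("subject", "keywords"):
        if not (properties.get(field) or "").strip():
            problems.append(f"{field} is missing")

    return problems


# The values below are the ones found in the World Bank report discussed here.
print(audit_metadata(
    "WPS6851.pdf",
    {"title": "World Bank Document", "subject": "", "keywords": ""},
))
```

Run against a whole repository, a check like this quickly shows how many documents are published without any meaningful description.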
Metadata on the web page
Admittedly, the page on the website where the PDF is published does provide some meta information. There we see – for what it’s worth – that this is what the World Bank calls a Policy Research Working Paper. It provides author profiles, an abstract, and a document name: "Which World Bank reports are widely read?" The meta description provided to Google is merely the repeated title.
Snappy document description
My question: why aren't these metadata included in the actual document? And I'm wondering if an abstract alone is enough to trigger one's interest. At 1,162 characters, it's on the long side. A decent meta description of at most 156 characters should suffice and can be used in many useful ways. A couple of keywords would be really helpful, too.
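Ideally someone writes that short description by hand, but as a fallback it can be derived from the abstract. The Python sketch below trims a text to the 156-character limit mentioned above without cutting a word in half. The function name `make_meta_description` and the sample text are hypothetical, purely for illustration.

```python
MAX_DESCRIPTION_LENGTH = 156  # limit suggested in this post


def make_meta_description(abstract: str, limit: int = MAX_DESCRIPTION_LENGTH) -> str:
    """Shorten an abstract to at most `limit` characters at a word boundary."""
    text = " ".join(abstract.split())  # collapse line breaks and double spaces
    if len(text) <= limit:
        return text

    ellipsis = "..."
    cut = text[: limit - len(ellipsis)]
    # If the cut landed in the middle of a word, drop that partial word.
    if " " in cut and text[len(cut)] != " ":
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:") + ellipsis


# Placeholder text standing in for a real abstract.
sample_abstract = "An abstract that runs on for quite a while. " * 10
description = make_meta_description(sample_abstract)
print(len(description), description)
```

A trimmed abstract is never as good as a purpose-written summary, but it beats a meta description that merely repeats the title.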
Open up the info, describe it meaningfully
In conclusion, it’s a good thing the World Bank are analyzing the popularity of their reports. Many other organizations that produce information on a large scale should follow their lead. But two fundamental issues with the way their policy reports are published are left unaddressed. A closed document format like PDF and the lack of meaningful metadata associated with the document don’t help its contents to be found and shared, freely and easily.
How to perform a qualitative content analysis on your repository?
Want to understand which content in your repository is unpopular? Or which content on file shares is redundant, obsolete, or trivial? Or do you just want to know what content your repository holds in general? A file share analysis can be useful, for example, when you are migrating content or implementing a new WCM or ECM. Learn which insights a content analysis can give. Download the report.