Issue Number Six: The Total Archive

How to be open about being closed

How does the Internet forget what it should not remember? Reuben Binns dives inside the rules for Biographies of Living Persons at Wikipedia and the right to be forgotten.

In the summer of 2014, Wikipedia created a public list of pages that have been partly hidden from search engines for privacy reasons (Wikimedia Foundation 2015). It includes entries on criminals, famous musicians, and a chess player, all of whom appear to have made requests to Google to have the content de-listed from searches on their name. This “index of the de-indexed” is one of the many curious by-products of the online encyclopedia’s ongoing construction. It encapsulates a peculiar set of contradictions arising out of the project’s concurrent imperatives: to be at once selective and comprehensive, and to exclude the vast majority of edits while maintaining its radical openness.

Visualizing Deletions in Wikipedia. From

The list, “Notices received from search engines,” comprises links that have been removed from certain search engine results under European data privacy laws. The ruling that established the so-called “right to be forgotten,” issued by the European Court of Justice in May 2014, confirmed that European citizens have the right to request the removal of links to certain content about them when their name is entered into a search engine.[1] The right has its legal basis in decades-old data protection laws, but had been unenforced until a Spanish court ruled in favor of an individual who wanted Google to remove a link to a news article detailing his previously unpaid debts. The court agreed that Google would be required to remove the link to the article from search results that were based on the individual’s name. This opened the floodgates for other individuals to make similar requests under what became known as the “right to be forgotten.”

As many commentators have noted, the phrase “the right to be forgotten” is misleading; the original content isn’t removed, but only becomes harder to find using the individual’s name. In each case the search engine is required to weigh the values of privacy and the public interest before accepting or rejecting a request. If Google’s in-house arbitrators do approve a request to de-list, they notify the website that has been de-listed.

The Wikimedia Foundation, the nonprofit that operates Wikipedia, compiles such notices in the index. Clearly, publishing a list of pages containing the personal information of subjects who have explicitly attempted to obscure those pages somewhat undermines the purpose of the right to be forgotten. In this sense, the index of the de-indexed looks like a retaliatory blow struck by defenders of openness in their battle against censorship and undeserved privacy. Wikimedia’s press release accompanying the index supports this interpretation; it argues that content “should not be hidden from Internet users seeking truthful and relevant information,” and that the ruling “runs counter to the ethos and values of the Wikimedia movement” (Wikimedia Foundation 2014). Wikimedia is not alone in making these notifications public; the BBC also maintains a list of affected pages to preserve “the integrity of the BBC’s online archive” (McIntosh 2015).

We could see this simply as a clash between those who think certain information about individuals should be made public and those who don’t. But Wikipedia’s existing processes for handling the deletion and selection of content suggest a more nuanced position. The apparent hostility toward removing information on grounds of privacy belies the measured stance to be found in Wikipedia’s long-established policies. For many years before Google Spain, the project had its own policy on biographies of living persons (BOLP). This includes many admirable principles that echo those laid out in the court decision. Biographies should be based on up-to-date and reputable sources, “relevant to a disinterested article about the subject,” with due regard for privacy. The policy warns against spreading “titillating claims about people’s lives; the possibility of harm to living subjects must always be considered when exercising editorial judgment” (Wikipedia 2015). Furthermore, biographies may in some cases be entirely deleted, upon request, if the individual concerned is a relatively unknown, non-public figure.

An example: information about a deleted Wikipedia page for “Songs about Masturbation.”

These rules have themselves been developed in the “wiki-way” (through online discussion and consensus building) and aim to balance various criteria, including the public interest, privacy, and freedom of speech. Compare the requirements of the BOLP to the considerations outlined in the Google Spain decision, and they begin to look roughly equivalent (in some cases, the BOLP appears to impose an even stronger imperative to forget). So in addition to the recent list of pages de-indexed by search engines, there is a much older record of changes made by the site’s editors in accordance with its own self-imposed privacy principles outlined in the BOLP. Some recent examples of privacy-motivated deletions that have arisen out of this policy include removing a link between an author’s real name and a suspected pen name; removing contextual information about an individual’s family members; and removing references from a medical doctor’s biography to rumors that his or her medical license had been revoked.

Given the substantially overlapping criteria between the right to be forgotten and the BOLP policy, why would the Wikimedia Foundation denounce the former while implicitly endorsing the latter? The notion that this controversy is simply due to disagreement about the balance between openness and privacy is unsatisfactory because the two policies are in broad agreement. One way to explain the disparity may be by paying attention to Wikipedia’s commitment to a principle of openness and the role this plays in justifying the entire project.

Even if it has its own version of the right to be forgotten, Wikipedia’s procedure for “forgetting” is very much its own. Every edit is logged, stored, and debated with reference to the community policies and principles before being approved. Every point of every debate over every edit is also logged, along with the references to the relevant policies. One can therefore find a comprehensive, indelible memory of everything that was ever forgotten, why it was forgotten, who advocated for it, and who objected.

Far from being fundamentally at odds with the idea of forgetting—of closing down material that infringes on individual privacy—the open encyclopedia embraces it. But it manages to reconcile the apparent conflict between open and closed by being open about being closed. This suggests a general strategy by which those working within the open paradigm can feel comfortable within its limitations. If the participation, the policies, the processes, and the end product are all “open,” then maybe forgetting need not be seen as an ideological compromise.

Visualizing Deletions in Wikipedia. From

The difference between censorship and mere editing is therefore grounded in the community’s ability to square its founding principle of openness with some of the new normative considerations it faces. What looks like a substantive conflict between open and closed, public and private, transparency and privacy is dissolved by appeal to a second-order principle of openness, which preserves ideological consistency and editorial sovereignty.

Indeed, publishing indexes of the de-indexed is just one way that the administrative systems and bureaucracy that lie behind Wikipedia’s topic pages are subjected to a kind of radical openness. “Talk” pages, where the site’s editors deliberate over their activity, have grown faster and are busier than the articles themselves. The “Department Directory” page unveils a bewildering array of governing committees and policy-making processes, from abuse response and counter-vandalism to volunteer recruitment, dispute resolution, and deletion. Every contribution takes place in publicly accessible forums, recorded for posterity in a vast archive of editorial ephemera.[2] Compare this approach with that of traditionally “closed” (or at least less open) institutions of government, business, or science. Detailed records of internal activity, if they even exist, are usually hidden by default. Even if the official output (a white paper here, a scientific publication there) is made open, the process behind it is not.

The project’s commitment to making its inner bureaucracy open and archived is not just an ideological fetish collectively imposed by its community, but perhaps also fundamental to the encyclopedic project. The vast archive of publicly recorded activity serves an important function regarding the encyclopedia’s primary content. Wikipedia’s aim is to amass the “sum total of human knowledge.” This doesn’t mean including everything that anyone has said or could ever say (it is not Borges’ “Library of Babel”), as we can see from the record of deletion and the community’s numerous editorial principles. The project legitimizes leaving certain content out by being open about the means and justification for exclusion.

Commercial general encyclopedias never had to justify openly what they’d left out and why (thereby generating significant work for historians interested in their selection criteria). By contrast, Wikipedia’s archive of talk pages exists as a record of what was left out and why. The project navigates the contested space between what is considered “the world’s knowledge” and what is private, sensitive, irrelevant, unimportant, spurious, or sensationalist. The demarcation of these categories is inherently contestable. By facilitating and archiving such contests openly, the project aims to justify its ambitious claims to totality. A total archive of editorial activity is therefore central to the project’s mission to amass the “sum total of human knowledge.”

Reuben Binns is a postdoctoral research fellow in Computer Science at the University of Oxford, interested in philosophical, technical and legal aspects of personal data, privacy, and the web.


McIntosh, Neil. 2015. “List of BBC Web Pages Which Have Been Removed from Google’s Search Results.” June 15. link to BBC Website.

Wikimedia Foundation. 2014. “Wikipedia Pages Censored in European Search Results.” link to Wikipedia.

Wikimedia Foundation. 2015. “Notices received from search engines.” link to Wikipedia.

Wikipedia. 2015. “Biographies of Living Persons.” link to Wikipedia.

[1] Google Spain SL, Google Inc. v Agencia Española de Protección de Datos, Mario Costeja González (2014)

[2] In this sense, Wikipedia may be reminiscent of the Cairo Genizah, described elsewhere in this volume by Benjamin Outhwaite. The accumulated background pages of Wikipedia are rather like the “ephemera” of daily Egyptian Jewish culture, “piling up in a stratified manner” as a result of “the rabbinic prohibition against destroying holy writ.”