
Miss Manners and TPMs

Earlier this week the Eastern District of Pennsylvania issued its opinion in Healthcare Advocates v. Harding, Earley, Follmer & Frailey [pdf]. This case, which involved copyright infringement, DMCA, and CFAA claims brought against a law firm for accessing and printing a litigant’s web pages via the Internet Archive’s Wayback Machine, had the potential to be a total disaster. Luckily, we scraped by with a mere semi-disaster. The court reached the right result, granting the law firm summary judgment on each claim, but its reasoning, particularly with respect to the DMCA claim, gives me some cause for concern.

The issue centered around Healthcare’s use of a robots.txt file. Web spiders, or robots, automatically scour the web and copy the contents of publicly accessible web pages so that search engines and archives can index and preserve their contents. Site owners who prefer not to have their sites indexed or archived can add a file named “robots.txt” in order to state their preference that the spider ignore the site, or certain portions of it. The Internet Archive’s policy on robots.txt files states that when it finds one, it will stop archiving the site and disable access to previously archived versions.
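For readers who have never looked at one, here is a rough sketch of what a robots.txt file says and how a well-behaved spider might read it, using Python's standard `urllib.robotparser` module. The file contents, the `example.com` URLs, and the `SomeOtherBot` name are made up for illustration, and treating `ia_archiver` as the name the Internet Archive's crawler answers to is an assumption on my part; none of this is taken from the record in the case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one named crawler to stay out entirely,
# and ask every other crawler to skip a single directory.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a cooperative spider would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named crawler is asked not to fetch anything on the site.
archiver_allowed = parser.can_fetch("ia_archiver", "http://example.com/index.html")

# Any other crawler is asked to skip only /private/.
other_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/index.html")
other_private = parser.can_fetch("SomeOtherBot", "http://example.com/private/page.html")

print(archiver_allowed, other_allowed, other_private)
```

Note that `can_fetch` only reports what the site owner has asked for. Whether the spider honors the answer is up to the spider.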

Harding, in the course of investigating assertions of infringement made by Healthcare against Harding’s client, accessed archived copies of the Healthcare site. Healthcare argued that it added a robots.txt file to its site and, as a result, denied access to older versions of that site. According to Healthcare, the only way Harding could have accessed its site was by circumventing its robots.txt “protection measure.” But the court found that, because of a malfunction at the Internet Archive, Harding simply accessed the Wayback Machine through the normal means where it found archived copies of the Healthcare site. Since Harding took no steps to avoid or bypass any protection measure arguably restricting access to the site, it could not have circumvented in violation of section 1201.

The court’s analysis on the circumvention point is correct, as far as it goes. The problem, from my perspective, is that the court ever got to the question of circumvention. Section 1201(a) provides:

No person shall circumvent a technological measure that effectively controls access to a work protected under this title [and]

a technological measure “effectively controls access to a work” if the measure, in the ordinary course of its operation, requires the application of information, or a process or a treatment, with the authority of the copyright owner, to gain access to the work.

The court, after dutifully reciting this definition, went on to conclude that under the circumstances “the robots.txt file qualifies as a technological measure effectively controlling access to the archived copyrighted images of Healthcare Advocates.” The court, to its credit, explicitly avoided any general conclusions about whether robots.txt qualifies as an effective technological protection measure, stating that its “finding should not be interpreted as a finding that a robots.txt file universally qualifies as a technological measure that controls access to copyrighted works under the DMCA.”

But the court’s conclusion that robots.txt functions as an effective TPM, even here, is mistaken. Think first of how robots.txt functions in the everyday non-Internet Archive context. A website operator adds a robots.txt file with the goal of warding off those pesky spiders and robots. Every web browser on the planet remains free to access this content. In fact, the average user virtually never has reason to know or care whether a site includes a robots.txt file. Obviously there is no restriction on this variety of access.

But just as users are free to access these sites, so too, as a technological matter, are those dreaded robots and spiders. Robots.txt does not impose any technological barrier whatsoever. A spider need not avoid or bypass robots.txt in order to access those sites; it simply needs to disregard it – no application of information necessary.
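To make the point concrete, here is a toy sketch in Python. The `SITE` dictionary, the `serve` function, and the two fetchers are all hypothetical stand-ins I made up for a web server and two spiders; no real crawler is being described. The only difference between the polite spider and the indifferent one is a check the polite one chooses to run on itself.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# A pretend website: the server hands out whatever path is requested.
SITE = {
    "/robots.txt": "User-agent: *\nDisallow: /\n",
    "/index.html": "<h1>Welcome</h1>",
}


def serve(path: str) -> Optional[str]:
    """The 'server'. It never consults robots.txt before responding."""
    return SITE.get(path)


def polite_fetch(agent: str, path: str) -> Optional[str]:
    """A spider that reads robots.txt first and voluntarily honors it."""
    parser = RobotFileParser()
    parser.parse((serve("/robots.txt") or "").splitlines())
    if not parser.can_fetch(agent, path):
        return None  # the spider declines; nothing stopped it
    return serve(path)


def indifferent_fetch(path: str) -> Optional[str]:
    """A spider that never looks at robots.txt at all."""
    return serve(path)


polite_result = polite_fetch("AnyBot", "/index.html")
indifferent_result = indifferent_fetch("/index.html")
print(polite_result, indifferent_result)
```

The indifferent spider supplies no password, key, or other information to get the page. It just asks for it, the same way a browser does.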

Robots.txt, after all, is something of a gentleman’s agreement. It is, in internet terms, a polite request that one’s horde of robots and spiders avoid accessing particular directories, and nothing more. The force of good etiquette is not sufficient to produce a technological protection measure. The robots.txt protocol is just a convention. And one cannot, under the DMCA, point to a convention that imposes no technological restraint on accessing a work and claim that it functions as a TPM.

Imagine a book publisher who prints up standard English-language, text-on-both-sides, non-invisible-ink, pages-freely-turnable books with a clear notice on the cover that states “We, the publishers, would rather appreciate it if you would refrain from opening this book and perusing its text should you find it on the sidewalk unattended.” This notice, no matter how politely worded and widely followed, does not give rise to a DMCA claim when a curious passerby browses the book.

What if the local library decides, of its own accord, to enforce that notice? Does it magically transform into a TPM? “Magically,” here, signifies that this question is rhetorical in nature. Now, the library may well have its own technological means of preventing access to certain books, among them books with the no-reading notice. Assuming those means independently satisfy the requirements of 1201, they could be effective TPMs. But one who circumvents the library’s TPM cannot reasonably be said to have circumvented the notice on the book cover.

So that tortured analogy was all a means to say that the court here confused an alleged violation of the Internet Archive’s own access controls, which were triggered voluntarily by the robots.txt file, with alleged violations of robots.txt itself. Of course, Healthcare’s own briefing undoubtedly didn’t do much to clarify the conflation inherent in its own argument. Since there was no circumvention, this confusion didn’t hurt Harding. But any suggestion that robots.txt serves as an effective TPM could hurt plenty of companies in future litigation.
