Duplicate Content: Not Your Father’s Plagiarism
Duplicate content is not plagiarism as you might think of it because Google’s focus is on providing relevant answers to users’ search queries.
What is duplicate content?
The title of this post is a riff on the infamous “Not Your Father’s Oldsmobile” ads of the 1980s. In the ’80s, duplicate content was synonymous with plagiarism. In the eyes of Google, and because of the way the Internet works, that is not so today.
Moz defines duplicate content as “content that appears on the Internet in more than one place.” Google views content as duplicate when it is “appreciably similar.” In general, duplicate content is repetition of certain types of content (in contrast to plagiarism, which is outright appropriation). But the problem with “appreciably similar” content is that it complicates Google’s eternal quest to return the most relevant search results possible to users’ queries.
Risks of duplicate content
Any lawyer engaged in creating web content should be aware of the risks, which include:
- Failing to appear in results for users’ search queries
- Being blacklisted by Google, effectively rendering you a non-entity
- Marketing agencies making false claims that your site fails the “Copyscape test”
- Loss of reputation (and possible litigation) for engaging in plagiarism
In this post, I will examine each of these risks in turn, starting with a common duplicate content scenario for context.
Teen busted for beer, Mom searches for a defense lawyer
Imagine the anxious mother of a wayward teen, sleepless at night in front of the screen, plugging “best criminal defense lawyers underage drinking” into the search bar.
Whose content does she find?
In an ideal world, Google returns search results that prominently feature the local trial veteran, who just so happens to be the author of a preeminent treatise on underage drinking law. This trial veteran has excerpted her treatise and integrated portions of it into her marketing copy. The copy demonstrates her undeniable expertise. There are other tangible benefits as well, such as authoritative inbound links leading to this content, in large part because of its quality and this lawyer’s excellent reputation.
The trial veteran, however, is dismayed to find that Google also serves the “appreciably similar” content of the greenhorn who just last week began growing a beard to look more grizzled than his half year in practice would suggest. In some cases, the greenhorn may appear above the trial veteran, prompting her to do some brief research on this young chap and poke around his site.
She finds some of her appreciably similar content smattered across his site, cries plagiarism and begins to wonder whether she is losing business from it.
Risk: Failing to appear prominently in search results
[Threat level: Minimal]
In most cases – at least, in the vast majority of those I’ve reviewed as part of my duplicate content analysis – it’s not plagiarism. Forget about the irritating fact that the perpetrator is a greenhorn with a questionable beard; it’s not relevant.
What is relevant is a lawyer’s ability to provide satisfactory answers to potential clients’ questions. Relevance is Google’s focus. In all likelihood, Google would not spotlight the greenhorn’s content because Google already knows that it is duplicate content. If the greenhorn nonetheless shows up on page one for some of the anxious mother’s searches, it’s not because of cribbed content alone. Google is sophisticated enough to distinguish between common marketing techniques and overt appropriation.
Google’s use of the phrase “appreciably similar,” rather than “plagiarized,” in its definition of duplicate content gives us a clue as to its viewpoint. Ultimately, Google rests its case on the “good job” it does in serving the right version of content to users’ search queries.
Take-away: Even if the greenhorn has shamelessly cribbed substantial portions of the trial veteran’s preeminent treatise and used it in his own marketing copy, Google usually knows what’s what. In an apples-to-apples comparison, Google would favor the trial veteran’s content in search results.
Risk: Being blacklisted by Google
[Threat level: Minimal]
As Google says, “Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”
Here are three examples of duplicate content that is not deceptive:
- Multiple URLs that all contain the same content, such as those built for tracking purposes, which dilute “link juice” and page authority
- Two pages with the same content: one with “www” in the domain name, the other without
- Printer versions of pages
Note that these are technical in nature, not nefarious. They also pertain to your own website. If you aren’t ranking for a particular piece of content, be sure that you haven’t diluted the impact of your own page by accidentally giving it several duplicates. (On a side note, a number of internal pages or sections of duplicate content could be an example of misguided strategy or poor execution.)
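For readers who want to check their own site for these technical duplicates, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not how Google or any particular tool works: the function names and the list of ignored query parameters are hypothetical, and the example.com URLs are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: an illustrative set of query parameters that only track visits
# or switch to a printer view. Adjust it to whatever your own site uses.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref", "print"}


def canonical_url(url: str) -> str:
    """Collapse common technical variants of a URL into a single form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat "www" and non-"www" hosts as the same page
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    path = parts.path.rstrip("/") or "/"  # ignore a trailing slash
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


def group_duplicates(urls):
    """Return only the canonical forms that more than one URL maps to."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    pages = [
        "https://www.example.com/underage-drinking",
        "https://example.com/underage-drinking?print=1",
        "https://example.com/contact",
    ]
    for canonical, variants in group_duplicates(pages).items():
        print(canonical, "<-", variants)
```

Any group it reports is a candidate for consolidation, typically with a redirect or a rel="canonical" link that points the variants at one preferred URL.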
Again, the focus here is on search results.
The caveat is that engaging in deceptive practices to manipulate search rankings or generate traffic – such as copying large swaths of a competitor’s site because it outperforms in search – could get your site removed from search results.
Why? Because you’re messing with Google’s search results, and Google won’t have that.
For better or for worse, duplicate content is a part of the internet, and it is treated differently from agency to agency and from industry to industry: Some shops devote time and attention to it, others don’t. Given its focus on relevance in search, Google takes a permissive approach to duplicate content, whereas FindLaw, as a lawyer-marketing company, holds its writers to high standards and is far less permissive. In other words, standards of acceptable “duplication” will vary depending on your industry.
Take-away: Duplicate content exists everywhere on the Internet. Google does not default to blacklisting sites that contain duplicate content, unless you’re trying to manipulate search results.
Risk: False claims that your site fails the ‘Copyscape test’
[Threat level: Minimal]
Copyscape is a third-party tool that helps content owners locate instances (and assess the scope and severity) of duplicate content. Copyscape runs an algorithm that searches and returns results on potential duplicate content issues, but these results still require an actual human being to analyze.
There is no such thing as a “Copyscape test” that a website either passes or fails.
As we’ve seen with other online tools, like those that measure page speed, marketing agencies may make this claim to drive a competitive wedge and win business. The reality is that if you’ve been on the web for any appreciable length of time, you will have some duplicate content. Just the act of producing content – especially in competitive practice areas like criminal defense (“underage drinking may result in such-and-such charges and this-or-that penalty”) – will nearly always result in some degree of repetition, such as common branding and calls to action. But this does not excuse outright appropriation and adjustment of a competitor’s marketing copy as though it were a game of Mad Libs.
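To see why a similarity score cannot be a pass/fail verdict, consider this rough Python sketch. It compares two passages by the three-word sequences they share, a common textbook technique. It is not Copyscape’s actual algorithm, which I have no visibility into, and the function names and sample sentences are invented for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Share of word sequences the two texts have in common (0.0 to 1.0)."""
    first, second = shingles(a, size), shingles(b, size)
    if not first or not second:
        return 0.0  # a text too short to compare shares nothing
    return len(first & second) / len(first | second)


if __name__ == "__main__":
    veteran = "Underage drinking may result in charges"
    greenhorn = "Underage drinking may result in penalties"
    print(similarity(veteran, greenhorn))
```

Two lawyers describing the same statute will score high on a measure like this without either one copying the other, which is why a human being still has to read the flagged passages.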
Take-away: A Copyscape report, by its mere existence, is not evidence of a duplicate content problem. In lawyer marketing, most duplicate content is of the permissible kind: portions of statutes, case law, publicly available legal information and common navigational elements on websites, among others. These have a negligible impact on search results.
Risk: Loss of reputation for engaging in plagiarism
[Threat level: High]
Fortunately, the greenhorn may possess a modicum of reasonable judgment, as he did not appear to have engaged in wholesale plagiarism of the trial veteran’s preeminent underage drinking treatise.
When he wrote his own marketing copy, he cribbed from several sources to cobble his site together, mixing and matching words as he went along. Copyscape flagged his site, but a closer look revealed no actual plagiarism, only snippets of appreciably similar phrases and sentences from multiple sources smattered here and there like pigeon droppings.
Furthermore, much of this was statutory language and common selling points (the lackluster “we answer our phones and return messages”), as opposed to the few instances of exact-match phrases and sentences (which are troubling, at least from an ethical standpoint).
You may notice that the threat level is high on this particular risk factor.
A lawyer’s reputation is everything, and plagiarism risks damage to one’s reputation, as well as the threat of being blacklisted by Google and the threat of legal action should the original author wish to defend her copyright.
In my opinion, the act of plagiarism, in which the offender attempts to pass off another person’s original work as their own, goes hand in hand with attempting to manipulate search results. The only reason it’s done is to be more relevant than the other player in the eyes of Google.
In this situation, duplicate content truly is your father’s plagiarism.