Easy methods to minimise plagiarism and check for AI-generated content on your website
Plagiarised content is a real problem online and, if left unchecked, can dramatically lower traffic to your site. It's important to stay on top of it and to remain vigilant.
However, plagiarism isn't your only potential problem. Advancing technology has produced a host of automated content generation tools such as ChatGPT, CopyAI and WordHero, but Google has been clear that it does not consider automatically generated content acceptable unless it's unique, high quality and helpful. If an external agency or employee is adding website content on your behalf, it's worth checking that their work meets those standards.
Let's look at both in more detail, and then explore some options to check for plagiarism and AI-generated content.
What is plagiarised content?
Let's presume you have a new product for your online shop. You write an original description for this product rather than copying the manufacturer's blurb because you know that original content is going to set you apart from your competition. Google quickly finds your new product page and indexes it.
All good so far. But let's assume you were one of the first people anywhere to get this new product online. Bear in mind that there are lots of different ranking factors apart from having original content, but all other things being equal, there's a good chance you're ranking well. That's great for you, as that means traffic, but it can also be a problem as your competitors put together their own product pages. The lazy ones might copy your carefully crafted content onto their site and benefit from all your hard work!
That's not fair!
Google claims to be able to detect which content is the original version (we would guess by considering the timestamp for when the page was first discovered). However, we have seen lots of examples where our customers' original content has been copied, and the copied content has out-ranked the original content, resulting in loss of traffic.
What about AI-generated content?
This is a rapidly developing field, and Google's response to it is bound to shift over time, too. It recognises that certain types of content have been generated automatically for a long time now, for example, big collections of data such as sports scores, weather forecasts or transcripts, and it doesn't deny that these are helpful to users. But Google also says:
“...those seeking success in Google Search should be looking to produce original, high-quality, people-first content demonstrating qualities E-E-A-T” [experience, expertise, authoritativeness, and trustworthiness]
AI-generated content has many valuable applications, not least content ideation and topic research. However, AI is not infallible, and even though it can usually churn out reasonable-sounding waffle, its quality and accuracy often leave much to be desired. It's a fairly sure bet that Google does not intend for you to use it for the wholesale production of cut-and-paste content on your web pages.
How to check for copied or AI-generated content
There are various online tools out there that can do this.
- One such tool is Copyscape. You enter the URL for the page you want to check and it will then check for duplicates online. There's a basic free version and a more elaborate paid version.
- Another is Originality.AI, a plagiarism tool and AI detector. The price is very reasonable (at the time of writing), and it allows unlimited team members, unlimited websites and unlimited scans to check for plagiarism and for the suspected use of AI writing tools.
- A rough and ready method to spot-check short passages of text (for example, part of a blog post that you've sweated days to create) is to simply copy a sentence or two from your article and paste it into Google. Putting the sentence in quotation marks tells Google to look for that exact phrase. Search results should show your article at the top, but anyone who has copied it (either as an exact or a partial match) should be listed, too.
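The spot-check in the last bullet can be sped up with a few lines of Python. This is only a minimal sketch: the function names and the sample text are made up for illustration, the sentence splitting is deliberately naive, and it does nothing more than build the search links for you to open in a browser yourself.

```python
import re
from urllib.parse import quote_plus


def pick_sentences(text, count=2, min_words=8):
    """Return up to `count` sentences long enough to be distinctive.

    Hypothetical helper: splits on ., ! or ? followed by whitespace,
    which is good enough for a quick spot check.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    long_enough = [s for s in sentences if len(s.split()) >= min_words]
    return long_enough[:count]


def exact_match_search_url(sentence):
    """Build a Google search link for the sentence as a quoted, exact phrase."""
    phrase = sentence.strip().rstrip(".!?")
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


if __name__ == "__main__":
    # Sample text is invented; paste in a passage from your own article.
    article = (
        "Short one. This handmade oak chopping board is oiled twice "
        "before it leaves our workshop. Another brief line."
    )
    for sentence in pick_sentences(article):
        print(exact_match_search_url(sentence))
```

Open each printed link and look for any result that isn't your own page. Short sentences are skipped on purpose, as they are more likely to match unrelated pages by coincidence.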
I've found a match
If you find a match or matches, then there are a few things you can do. Firstly, you can report it to Google via their Copyright Removal Tool. Secondly, you can take action to minimise the impact.
We recommend checking the PageRank of the offending site or sites using an online tool. If the PageRank of the offending site is higher than that of your site, a pragmatic approach would be to rework your content to make it unique again. We have seen evidence that a website with higher PageRank will out-rank the site with the original content. If the PageRank is lower and the site is not out-ranking you, then we recommend leaving your content as is.
If rewriting content that you've worked hard to create and optimise doesn't sit well with you, you could try to contact the website owner to ask them to remove the plagiarised content. This approach often pays off, especially if you can prove that yours was written first.
If you find the same company persistently stealing your content, and they reside in the same country as you, then it may be worth paying a solicitor to write a letter on your behalf warning them to stop. Further to that, here in the UK we have the small claims court, which, for disputes under £10k, can be a cost-effective route forward.
There are a few other tricks you can use to help combat lazy content scrapers, such as:
- Embed lots of internal links to related pages into your content. That way, if they simply copy and paste from your site, you'll at least benefit from some new links!
- For blog posts, you could add text to the bottom of your post along the lines of "The post #BlogPostTitle# first appeared on yourdomain.co.uk"
- You can stop them serving your images by blocking image requests that are referred from their domain (often called hotlink protection)
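The last two tricks are easy to automate. The Python below is a rough sketch rather than a finished solution: the function names and domains are hypothetical, and in practice image blocking is usually set up in your web server or CDN configuration rather than in application code. Bear in mind, too, that the Referer header can be missing or faked, so this deters lazy scrapers rather than determined ones.

```python
from urllib.parse import urlparse


def attribution_footer(post_title, domain):
    """Return the line of text to append to the bottom of a blog post."""
    return f"The post {post_title} first appeared on {domain}"


def is_hotlink_from(referer, blocked_domains):
    """Return True when the Referer header points at a blocked domain.

    `referer` is the raw header value (or None when the browser sent none).
    Subdomains of a blocked domain are treated as blocked as well.
    """
    if not referer:
        # No Referer: a direct visit or a privacy setting, so let it through.
        return False
    host = (urlparse(referer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


if __name__ == "__main__":
    # Domains below are placeholders, not real offenders.
    blocked = ["scraper-example.com"]
    print(attribution_footer("My Post Title", "yourdomain.co.uk"))
    print(is_hotlink_from("https://www.scraper-example.com/copied-page", blocked))
    print(is_hotlink_from("https://yourdomain.co.uk/blog", blocked))
```

An image request for which `is_hotlink_from` returns True could be refused or answered with a placeholder image, while requests from your own pages and from visitors with no Referer carry on as normal.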
It's demoralising when stolen content persistently out-ranks you. And it's tempting for time-pressed employees, or agencies lacking creativity, to take short-cuts and rely on AI tools more than they should. Don't fall foul of Google's guidelines; carry out your own checks and maintain your website's high quality and performance.