Beginner’s Guide to Preventing Blog Content Scraping in WordPress

If you write original content day in and day out, you already are aware of the fact that your posts will end up on bunch of SPAM sites within a few days sometimes even few minutes. Some users even noted that the site with stolen content outranked the original post. It is very frustrating as a website owner to see that someone is stealing your content without permission, monetizing it, outranking you in SERPs, and stealing your audience. Content Scraping is a huge problem these days considering that it is so easy for someone to steal your content. In this article, we will cover what is blog content scraping, how to catch content scrapers, how to deal with content scrapers, how you can reduce and prevent content scraping, how to take advantage of content scraping, how to make money from content scrapers, and is content scraping ever good?

What is Blog Content Scraping?

Blog content scraping is an act usually performed with scripts that extract content from numerous sources and pulls it into one site. It is so easy now that anyone can install a WordPress site, put a free or commercial theme, and install a few plugins that will go and scrape content from selected blogs, so it can be published on their site.

Why are they Stealing my Content?

Some of our users have asked us why are they stealing my content? The simple answer is because you are AWESOME. The truth is that these content scrapers have ulterior motives. Below are just few reasons why someone would scrape your content:

  • Affiliate commission – There are some dirty affiliate marketers out there that just wants to exploit the system to make few extra bucks. They will use your content and other’s content to bring traffic to their site through search engine. These sites are usually targeted towards a specific niche, so they have related products that they are promoting.
  • Lead Generation – Often we see lawyers and realtors doing this. They want to seem like industry leaders in their small communities. They do not have the bandwidth to produce quality content, so they go out and scrape content from other sources. Sometimes, they are not even aware of this because they are paying some scumbag $30/month to add content and help them get better SEO. We have encountered quite a few of these in the past.
  • Advertising Revenue – Some folks just want to create a “hub” of knowledge. A one-stop-shop for users in a specific niche. If I had a penny for every time someone has done this with our content, then we would have a few hundred pennies. Often we notice that our site content is being scraped. The scraper always replies, I was doing this for the good of the community. Except the site is plastered with ads.

These are just a few reasons why someone would steal your content.

How to Catch Content Scrapers?

Catching content scrapers is a tedious task and can take up a lot of time. The are few ways that you can utilize to catch content scrapers.

Search Google with Your Post Titles

Yup that is as painful as it sounds. This method is probably not worth it especially if you are writing about a very popular topic.

Trackbacks

If you add internal links in your posts, you will notice a trackback if a site steals your content. This way is pretty much the scraper telling you that they are scraping your content. If you are using Akismet, then a lot of these trackbacks will show up in the SPAM folder. Again, this will only work if you have internal links in your posts.

Webmaster Tools

If you use google webmaster tools, then you are probably aware of the Links to your site page. If you look under “Traffic”, you will see a page that says Links to your site. Chances are your scrapers will be among the top ones there. They will have hundreds if not thousands of links to your pages (considering that you have internal links).

Links to Your Site - Google Webmaster Tools

FeedBurner Uncommon Uses

If you have setup Feedburner for your WordPress blog, then you can see some uncommon uses. In the Analyze Tab under Feed Stats, you will see “Uncommon Uses”. There you will see a list of sites.

FeedBurner Uncommon Uses

How to Deal with Content Scrapers

There are few approaches that people take when dealing with content scrapers. The Do Nothing Approach, Kill them all approach, Take Advantage of them approach.

The Do Nothing Approach

This is by far the easiest approach you can take. Usually the most popular bloggers would recommend this because it takes A LOT of time fighting the scrapers. This approach simply recommends that “instead of fighting them, spend your time producing even more quality content and having fun”. Now obviously if it is a well-known blog like Smashing Magazine, CSS-Tricks, Problogger, or others, then they do not have to worry about it. They are authority sites in Google’s eyes.

However during the Panda Update, we know some good sites got flagged as scrapers because google thought their scrapers were original content. So this approach is not always the best in our opinion.

Kill them all Approach

The exact opposite of the “Do Nothing Approach”. In this approach, you simply contact the scraper and ask them to take the content down. If they refuse to do so or simply do not reply to your requests, then you file a DMCA (Digital Millennium Copyright Act) with their host. In our experience, majority of the scraping websites do not have a contact form available. If they do, then utilize it. If they do not have the contact form, then you need to do a Whois Lookup.

Whois Lookup

You can see the contact info on the administrative contact. Usually the administrative, and technical contact is the same. The whois also shows the domain registrar. Most well-known web hosting companies and domain registrars have DMCA forms or emails. You can see that this specific person is with Hostgator because of their nameservers. HostGator has a form for DMCA complaints. If the nameserver is something like ns1.theirdomain.com, then you have to dig deeper by doing reverse IP lookups and searching for IPs.

You can also use a third party service for DMCA.com for takedowns.

Jeff Starr in his article suggest that you should block the bad guy’s IPs. Access your logs for their IP address, and then block it with something like this in your root .htaccess file:

Deny from 123.456.789

You can also redirect them to a dummy feed by doing something like this:

RewriteCond %{REMOTE_ADDR} 123.456.789.
RewriteRule .* http://dummyfeed.com/feed [R,L]

You can get really creative here as Jeff suggests. Send them to really large text feeds full with Lorem Ipsum. You can send them some disgusting images of bad things. You can also send them right back to their own server causing an infinite loop which will crash their site.

The last approach that we take is to take Advantage of them.

How to Take Advantage of Content Scrapers

This is our approach of dealing with content scrapers, and it turns out quite well. It helps our SEO as well as help us make extra bucks. Majority of the scrapers use your RSS Feed to steal your content. So these are some of the things that you can do:

  • Internal Linking – You need to interlink the CRAP out of your posts. With the Internal Linking Feature in WordPress 3.1, it is now easier than ever. When you have internal links in your article, it helps you increase pageviews and reduce bounce rate on your own site. Secondly, it gets you backlinks from the people who are stealing your content. Lastly, it allows you to steal their audience. If you are a talented blogger, then you understand the art of internal linking. You have to place your links on interesting keywords. Make it tempting for the user to click it. If you do that, then the scraper’s audience will too click on it. Just like that, you took a visitor from their site and brought them back to where they should have been in the first place.
  • Auto Link Keywords with Affiliate Links – There are few plugins like Ninja Affiliate and SEO Smart Links that will automatically replace assigned keywords with affiliate links. For example: HostGator, StudioPress, MaxCDN, Gravity Forms << These all will be auto-replaced with affiliate links when this post goes live.
  • Get Creative with RSS Footer – You can either use the RSS Footer or WordPress SEO by Yoast Plugin to add custom items to your RSS Footer. You can add just about anything you want here. We know some people who like to promote their own products to their RSS readers. So they will add banners. Guess what, now those banners will appear on these scraper’s website as well. In our case, we always add a little disclaimer at the bottom of our posts in our RSS feeds. It simply reads like “How to Put Your WordPress Site in Read Only State for Site Migrations and Maintenance is a post from: WPBeginner which is not allowed to be copied on other sites.” By doing this, we get a backlink to the original article from scraper’s site which lets google and other search engines know we are authority. It also lets their users know that the site is stealing our content. If you are good with codes, then you can totally get nuts. Such as adding related posts just for your RSS readers, and bunch of other stuff. Check out our guide to completely manipulating your WordPress RSS feed.

How You Can Reduce Blog Content Scraping and Possibly Prevent It

Considering if you take our approach of lots of internal linking, adding affiliate links, rss banners and such chances are that you will reduce content scraping to good measure. If you take Jeff Starr’s suggestion of redirecting content scrapers, that too will stop those scrapers. Aside from what we have shared above, there are a few other tricks that you can use.

Full vs. Summary RSS Feed

There has been a debate in the blogging community whether to have full RSS feed or summary RSS feed. We are not going to go into much details about that debate, however one of the PROS of having a Summary Only RSS feed is that you prevent content scraping. You can change the settings by going to your WordPress admin panel and going under Settings » Reading. Then change the setting For each article in a feed show: Summary.

Note: We have full feed because we care more about our RSS readers than the spammers.

Trackback SPAM

Trackbacks and Pingbacks definitely had great uses however, they are now constantly being abused. Often themes display trackbacks and pingbacks under or among the comments. This gives the spammer an incentive to scrape your site and send trackbacks. If you mistakenly approves it, then they get a backlink and mention from your site. Here is how you can disable Trackbacks on all future posts. Here is an article that will show you how to disable trackbacks and pings on existing WordPress posts as well.

Is Content Scraping Ever Good?

It can be. If you see that you are making money from the scraper’s site, then sure it can be. If you see a lot of traffic from a scraper’s site, then it can be. In most cases however, it is not. You should always try to get your content taken off. But you will realize as your blog gets larger, it is almost impossible to keep track of all content scrapers. We still send out DMCA complaints, however we know that there are tons of other sites that are stealing our content that we just cannot keep up with.

What are your thoughts? Do you use any other mechanics to prevent content scraping? Would love to hear your thoughts.