Published in Blogging on Monday, February 20th, 2006
Yep, it was that time again. Time to do the random searches and see who has been lifting content off of some of my other sites and my clients' sites.
With the proliferation of splogs and the ease of chewing up and spitting out RSS, it has become an almost monthly habit of mine to do a little searching for people who have been ripping off content on sites that I manage.
Here's how I find it, and what I do to deal with the issue.
Google of course works fairly well for finding duplicate content on the web, but the tool of choice for this task is Copyscape.
I like to go in and test a few random pages along with the money pages to see what I can find. Sadly I almost always find something. Happily, though, it can usually be resolved quite quickly, because the law is on your side - at least that has been my experience.
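If you want a rough second opinion once you have the text of a suspect page in hand, a few lines of Python can estimate how much of your copy shows up in it. This is only a minimal sketch of a word-shingle comparison, not how Copyscape or Google work internally; the function names and the five-word window are my own assumptions for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    mine = shingles(original, size)
    if not mine:
        return 0.0
    return len(mine & shingles(suspect, size)) / len(mine)


if __name__ == "__main__":
    # Hypothetical sample strings; in practice, paste in your page text
    # and the text of the page you suspect.
    my_post = "Time to do the random searches and see who has been lifting content"
    their_page = "Welcome! Time to do the random searches and see who has been lifting content. Ads here."
    print(copied_fraction(my_post, their_page))
```

A value near 1.0 suggests most of your wording appears verbatim on the other page, while a value near 0.0 suggests little direct overlap. It says nothing about who published first, so treat it as a prompt to look closer rather than proof.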
I find that flagrant reproduction of RSS feeds is an easy one to handle. A simple note explaining that "If the content is not removed in 48 hours we will be advising your hosts, your registrar and the major search engines of the infraction" will usually get the ball rolling quite nicely.
Aside: I once called a guy after getting his whois info, and let me say that was very effective, though I don't recommend it as it is far easier to keep your cool in writing.
People tend not to put up a fight (often replying that their tech guy was responsible - sheesh), but if they do, fire them this link to help them get informed (though they likely know that they're on the dark side of the law). This part is rarely necessary, but it's a nice touch because they will take you very seriously once they understand that you know what you are talking about.
Plain theft of copy (as in not RSS republishing) can be a bit more difficult (whose copy is it?), but as Mike Davidson explains here, people who have been caught generally back down quite quickly. (Replace the word "you" in that comment with "ISP" and it is a decent description of how a DMCA complaint against a site works.)
In the end the thieves will usually back down - quite often the fear of losing their AdSense account is enough motivation.
A lot of people think that this is a losing battle, and truth be told, it can be a difficult issue to keep tabs on.
The major issue for me is duplicate content in Google - I've had other sites running my content rank above mine, and I'm not a big fan of that.
So for me it is worth keeping a lookout once in a while. This is especially easy with newer sites that get little traffic, but if you know your sites well enough, you can see dips in traffic to certain areas that you know should be higher. That can be a sign that it's time to do some research!