Pagejacking - Identifying and Dealing with Pagejackers
by Michael Bloch
This article on pagejacking is the result of a recent experience
we had with a competitor who thought it would be a good idea to
copy an entire page from our site using a sneaky method. It turned
out he had done the same to dozens of others. I think he's regretting
that strategy now :).
What is pagejacking?
In essence, pagejacking is the copying of a page by unauthorized
parties in order to filter off traffic to another site. The copying
doesn't include just the wording - it's the whole box and dice.
Traffic to the illegitimate page is then usually redirected to a
competing, or at times, totally unrelated offer.
Why do people pagejack?
When you have the good fortune of having a page that ranks highly
in the SERPs (Search Engine Results Pages), it brings you both
good and bad attention. Some unscrupulous individuals may take
copies of your pages in an attempt to get equally high, or higher,
rankings and thereby capture some of the traffic that really
should have gone to your site.
If the pagejacker is also well versed in search engine optimization,
the *majority* of the search engine traffic that usually arrives on
your site can end up redirected to the pagejacker. As you can imagine,
this can be very costly to your online business.
How is pagejacking executed?
The "newbie" pagejacker simply copies your page in its
entirety and pastes it into another page on their own site. They may
add some of their own offers to the page and adjust the links in
your content to point to other pages on their site. Only the most
stupid of pagejackers use this process.
The more advanced pagejacking strategy is quite clever. First, a
copy of your page is taken. A page is then created on the pagejacker's
site that is basically a carbon copy of your content, including
meta-tags. The pagejacker then adds extra scripting so that only
search engine robots can read the content of the page.
A 302 .htaccess redirect or meta-refresh is then used to automatically
redirect human viewers to a totally different page - they never
see your content.
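As a rough illustration of the meta-refresh variant described above, the sketch below scans a saved copy of a suspect page for meta-refresh tags. The function and class names are my own, and it only covers the meta-refresh case: a server-side 302 redirect or robot-only cloaking leaves no trace in the HTML, so an empty result proves nothing.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the content value of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (bare attributes), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() == "refresh":
            self.refreshes.append(attr_map.get("content", ""))


def find_meta_refresh(html):
    """Return the content attribute of each meta-refresh tag found in html."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes
```

A content value such as "0; url=..." on a page that otherwise mirrors your own text is a strong hint that human visitors are being sent elsewhere.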
How do I detect pagejacking?
You can detect pagejacking quite easily as most pagejackers will
only bother with pages that have decent search engine rankings.
Use the following process:
Identify a couple of phrases that are rather uncommon in a popular
page on your site. Run these phrases through a query on the most
popular search engines such as Google, Yahoo and MSN. When querying
the engines, ensure you enclose your query in quotes, e.g.
"the flomble is pink with black stripes".
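If you check several phrases regularly, building the quoted queries by hand gets tedious. Here is a minimal sketch that turns a phrase into an exact-match search URL. The endpoint and the "q" parameter name are assumptions for illustration; swap in whatever your preferred engines use.

```python
from urllib.parse import urlencode

# Assumed search endpoint, for illustration only; substitute the engines you use.
SEARCH_URL = "https://www.google.com/search"


def exact_phrase_url(phrase, search_url=SEARCH_URL, param="q"):
    """Build a search URL that queries for phrase as an exact, quoted match."""
    # Strip any quotes the caller already added so the phrase is quoted once.
    quoted = '"' + phrase.strip().strip('"') + '"'
    return search_url + "?" + urlencode({param: quoted})
```

Calling exact_phrase_url("the flomble is pink with black stripes") gives a link you can open in a browser and review by hand.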
In the results that come back, as long as the phrase you have used
is uncommon, you'll probably only see your page and instances of
pagejacking. Even if you're not able to use an uncommon phrase as
the basis of your search criteria, or you allow the reproduction
of some of your content on other sites and you wind up with 100
results, go through all the results pages anyway. Yahoo, Google
and MSN usually show snippets from each page, which will make
it easier to identify a site that is using pagejacked content.
To confirm that the suspect listing is in fact pagejacked content,
instead of clicking on the link to the page in the search engine
results, click on the "cached" option. It will display
the page as it appeared to the search engine robot the last time
it was crawled. High ranking pages are usually crawled quite regularly,
so the cached copy should be reasonably fresh.
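Once you have saved the cached copy, you can get a rough measure of how much of it matches your own page. The sketch below is one way to do that with Python's standard library; the function names are my own and the tag stripping is deliberately crude, so treat the score as a prompt for a manual look rather than as proof.

```python
import re
from difflib import SequenceMatcher


def visible_words(html):
    """Crudely strip tags and return the lowercase words left over."""
    without_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", without_scripts)
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap_ratio(original_html, suspect_html):
    """Return a 0.0-1.0 similarity score between the two pages' word sequences."""
    a = visible_words(original_html)
    b = visible_words(suspect_html)
    # autojunk=False stops difflib from ignoring very common words on long pages.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()
```

A score close to 1.0 means the cached page is nearly word for word the same as yours; a pagejacker who has added offers or changed links will score a little lower.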
How do I deal with pagejacking?
Pagejackers by nature are a snivelling, cowardly breed and easy
to deal with if you go about it in the right way.
If you have identified pagejacked content, the first thing you need
to do is to save the cached copy of the page - this is very important
as it is solid evidence.
One of the great features of Google is that when it displays cached
copies of pages, it adds a box to the top of it with identifying
information, including the URL and the date the cached copy was
taken.
If you are using Internet Explorer, to save a copy of the cached
page, simply go to "File", select "Save as"
and in the "Save as type" dropdown option, choose "Web
archive, single file (*.mht)". This option will download everything,
including images and the Google info box into a single file. Having
a single file makes it easier to transmit to other parties during
the follow-up process. Once you have the archive file safely stored
on your own computer, it's time to swing into action.
The first thing you should do is to contact the owner of the site.
There is no need to be overly polite in the notification, but also
do not be abusive. Bear in mind that in some cases, the pagejacker
may *not* be the actual site owner. The owner of the site may have
employed an unethical optimization company who used the pagejacking
technique. Regardless, it is the site owner's responsibility to
deal with the situation.
I recommend writing a brief note along these lines:
Subject = "Copyright infringement - (Domain Name)"
Body = "It has come to my attention that you have made unauthorized
use of my copyrighted work located here: (copyrighted work URL),
by reproducing it on your site (their URL with infringing copy).
At no time have I given permission for you to reproduce my original
content in such a way.
A cached copy from Google of the illegally copied content on your
site is attached, along with details as to its location on your
site and the date it was gathered. It appears that my content is
being used on your site as part of a pagejacking strategy and is
visible only to search engines.
As the legal owner of this copyrighted content, I demand that you
remove my property from your site immediately.
You have 72 hours to remove this content. If the content is not
removed within this time frame, then I will find it necessary to
take further action, including contacting Google and your hosting
service and pursuing any other legal avenues I have at my disposal.
Sincerely
Your name
Your contact details"
Ensure you flag the email as urgent and select the read receipt
option in your email software. If, after 72 hours, the content has
not been removed, you should first contact the company hosting the site.
These details, as well as the domain name registrant, can usually
be found on the WHOIS record for the domain name by looking at the
nameserver information, or by running a trace on the domain name.
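If you save the raw WHOIS output as text, the nameserver lines can be pulled out with a few lines of Python. This is only a sketch: WHOIS formats differ between registries, the "Name Server:" and "nserver:" labels matched here are just two common ones, and the function name is my own. Bear in mind too that the nameserver only hints at the hosting company; it may belong to a separate DNS provider.

```python
import re


def nameservers_from_whois(whois_text):
    """Return the unique nameserver hostnames listed in raw WHOIS text."""
    found = []
    for line in whois_text.splitlines():
        match = re.match(
            r"\s*(?:name ?server|nserver)\s*:\s*(\S+)", line, re.IGNORECASE
        )
        if match:
            # Normalise case and drop any trailing dot before de-duplicating.
            host = match.group(1).rstrip(".").lower()
            if host not in found:
                found.append(host)
    return found
```

The part of the hostname after the first dot (examplehost.net in a made-up record listing ns1.examplehost.net) is usually the place to start looking for the host's abuse or copyright contact.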
If you do find it necessary to contact the hosting service, check
the host's site first for guidelines for copyright complaints. Each
company's copyright complaint process may differ slightly, and it's
important that you follow their submission guidelines carefully.
If the infringement has caused you a major loss in profit, then
it is advisable, if it is within your means, to contact your lawyer
before taking any sort of action.
How do I prevent pagejacking?
In short - you don't. It gets to a point where you can spend so
much time trying to protect your online business from parasites
and copycats that you may as well not bother with having a site
at all. Monitoring is the key in relation to pagejacking.
Other possible negative effects of pagejacking
I've read a number of reports on the subject of pagejacking that
appear to indicate that some search engines will favor the pagejacked
page over the original one to the point that the original page will
be dropped from the SERPs altogether. The reason for this is that
most search engines employ duplicate content filters - and the way
some work is that the higher ranking page is usually the one that
is kept.
One very important negative effect of pagejacking is damage to your
brand. For instance, a pagejacker may copy a page that contains
multiple instances of your business or product name. If the pagejacker
is successful in achieving consistently higher rankings than your
own content, unsuspecting surfers may begin to associate the brand
with misleading content and steer clear of it altogether.
Protecting your site from online parasites is an ongoing battle;
I hope this article has assisted you in dealing with one aspect
of this multi-faceted war.
Related learning resources
Preventing credit card fraud
Pay per click fraud - ppc anti-fraud strategies and tools
Michael Bloch, Taming the Beast
http://www.tamingthebeast.net
Tutorials, web content, tools and software.
Web Marketing, Internet Development & Ecommerce Resources