Dec 05

Google announced new markup for communicating multilingual content to Google’s spiders. The new link element is rel="alternate" hreflang="x", where you define the language and location in the hreflang attribute. Here are examples of how you may use it respectively:…
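The excerpt cuts off before its examples. As a rough sketch only, the Python below prints one such link element per language/region version of a page. The URLs and the three locale codes are hypothetical, and the value format (a language code, optionally followed by a region code such as en-gb) reflects my understanding of Google's documentation rather than anything stated in the excerpt.

```python
from html import escape

# Hypothetical language/region versions of one page, keyed by hreflang value.
ALTERNATES = {
    "en": "https://example.com/en/oven-mitts",
    "en-gb": "https://example.com/uk/oven-mitts",
    "de": "https://example.com/de/ofenhandschuhe",
}


def hreflang_links(alternates: dict[str, str]) -> str:
    """Return one <link rel="alternate" hreflang="..."> element per version."""
    lines = []
    for lang, url in sorted(alternates.items()):
        lines.append(
            f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        )
    return "\n".join(lines)


print(hreflang_links(ALTERNATES))
```

In practice, the same set of links would typically appear in the head of every version of the page; see the full article for the details of the original announcement.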



Please visit Search Engine Land for the full article.




Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing


Dec 03

Posted by caseyhen

This week we are joined by Carlos del Rio from Agillian, who is based here in Seattle, WA. Carlos discusses a method that will help you make great content by following 3 easy steps. After watching the video, dive into the comments and share your thoughts on using CRO to make great content.

P.S. It looks like we might have also been joined by a fly, so please excuse him when he flies across the screen a few times.


Video Transcription

Hey Mozzers. I'm Carlos del Rio. I own a consultancy called Agillian, and I am the author of "User Driven Change: Give Them What They Want" and a "Strategic Framework for Emerging Media," which is kind of a mouthful. Even I have trouble saying it.

I am here today to tell you how to use CRO to make great content, and when I say great content, I mean for any portion of your marketing campaign. So, you need to make sure that you meet the most basic requirements of conversion optimization. The three things that are the most important for all conversion rates are a clear action, a clear purpose, and a clear value. That's what every landing page is trying to do. That's what every pay-per-click ad is trying to do. Tell people what you want them to do, tell them what it is about, and communicate the value they're going to get out of the interaction. So, "Buy tires cheap," or "Buy tires, free delivery." Something where they know what they are coming for and that they get something at the other end. For example, if you are writing a piece of content for your blog, you want to be able to answer, "Is it clear what the purpose of this blog post is? Is it clear what the topic is? Is it clear that there is value for this person in sharing it with their friends?" Essentially, if you are doing blog and content marketing, it is really for the links. We know that's what it's about. The same goes if you're making LOLcats, or if you're sending out an email to solicit a link buy.

So, in all of your strategies you want to know what this particular campaign is doing. Is it helping our users understand what they can do with us? Is it helping them understand who we are, or is it helping them understand what the value is? Each one of the individual pieces, like each piece of link bait or each email or each tool that you build, is supposed to answer all three of these very clearly. You want to know exactly how to interact with it. You want to know what it is going to do. You want to know why it is of value to you.

So, if you take the example of, like, LOLcats, we've all seen these. The difference between the millions of LOLcats that nobody cares about and the LOLcats that end up in your Facebook stream every 15 minutes is that the ones that get shared answer all three: the clear action, which is share me; the purpose, this is a LOLcat; and the value, this is the funniest LOLcat that I've seen all day. This is the LOLcat that crosses over with my community. If I were to make a cat playing on a computer that said, "I'm up in your Internet messing with your title tags," you're going to find that funny because you are in SEO, but almost everybody else is going to be like, uh, lame.

Or take This or That, for example. Rebecca Kelley recently did one that asked, "Does Justin Bieber look like Velma from Scooby-Doo?" This enrages both people who like Justin Bieber and people who like Velma. So, what she is doing is creating a place where you interact with this piece of content, and she has two groups of people that want to interact with this type of content. They get to show what they think, and they get value out of having you know what they think. When they pass this on to their friends, who come in and do those three things to derive value for themselves, you get traffic, which you are monetizing.

It is the same thing with the LOLcats. Cheezburger makes money off of people coming to visit. They get people coming to visit by thinking about a clear action, a clear purpose, and a clear value from the perspective of their users.

In the same way, you are here in the Moz community, and they have two kinds of users: basic users and premium users. They keep building new tools, and they have to think about: What is the action of this tool, what is the purpose of this tool, and is it going to be valuable to the community? When they write to every one of the basic members and say, "We have this great new tool," they really have to go through this process twice. First, does the tool meet these standards? Is it clear what I can do with the tool? Is it clear what the tool is going to deal with? Is it clear that I can get some value out of it? Second, they have to write an email that makes it very clear what they want you to do, which is switch from being a basic to a premium user. It has to be very clear what this tool is going to do for you, and it has to be very clear that you're going to derive value out of it. Otherwise, they aren't going to get a good conversion rate.

So, hopefully, these examples will give you something that's actionable for your business and let you take conversion rate optimization into all of the things that you're doing for your marketing.

I'm Carlos del Rio. Thanks.

Video transcription by Speechpad.com



SEOmoz Daily SEO Blog


Dec 02

Yahoo is cleaning up Associated Content by deleting some of the articles, moving the keepers to Yahoo’s domain and giving the site a new name, too. And next year, Yahoo will begin an online training course to help its writers create higher quality content. The company has announced plans to…



Please visit Search Engine Land for the full article.






Nov 18

Posted by Kenny Martin

When keyword targeting is approached separately from a content creation strategy, the concocted results can often leave us scratching our heads and pointing fingers at the malformed "Frankenpages." By fostering a more cohesive relationship between these traditionally detached endeavors, we can greatly enhance our results and deliver considerable value to our audience.

This week Rand shows us how we can move past conventional keyword targeting practices and generate web pages that won't leave us "running for the hills."


Video Transcription

Howdy SEOmoz fans! Welcome to another edition of Whiteboard Friday. Thrilled to have you with us. Today we're talking about mapping keywords to content for maximum impact.

Now the problem is that a lot of folks think about the world of keyword research and keyword targeting separately from the world of content creation. This happens a lot of the time because the SEO person is not always involved in designing the content strategy or deciding what's going to go on the website. They're brought in after the fact, maybe in an internal role or in an external consulting role. That can be super frustrating. Let me give you an example of the traditional keyword targeting process and show you why it is so bad.

So here's Mr. Biz Owner, and he would like to rank well for oven mitts. A perfectly reasonable request: he wants to rank for oven mitts. Great. All right. So the SEO person is brought in, and the SEO person goes, "Well, you know, I want to be able to make some changes. I need to add some content to your website." The business owner is like, "No, no, no, no, no. I already have a page. I just want it to rank for oven mitts." Well, okay. Let's choose the best page you've got for oven mitts and we'll try to make that one rank better. The business owner is like, "All right. All right. Good job. Good job. I appreciate that. You did good work. Now I want to rank well for heat retardant oven mitts." The SEO is like, "Well, okay. You know what? We can modify that page again and target that particular phrase."

But this cycle goes on and on and on. Soon enough you'll have a Frankenpage, ooh, super scary. He's trying to target ridiculous terms like "advanced kid-friendly oven mitts for hardcore baked lentils." You're like, "How did this happen? How did this Frankenpage get here?"

Well, it got there because of this broken process, in which the SEO is not the person with the authority or the influence to choose what content should exist on the website and which content should target which keywords. This happens all over the Web. You can click on tons of search results in all sorts of verticals and sort of be like, "What were they thinking when they made this page?" It's not that the website is all that bad or that they have done something terrible in SEO. It's just that it is not strategic. It is a very tactical approach to SEO, and that tends to lose out over time to pages that are built specifically for users searching for those things and that deliver everything they want in the content.

So, let's talk about a strategy to do exactly that. Over here we have a better process. No Frankenpages.

Step one: Establish the full list of keywords. Rather than going one by one and saying, oh, we want to target this, we want to target that, it's nice to be able to start with the full list of keywords. If you need to refine that keyword list later, begin this process again and make sure that the new keywords you need to target work into it in the same way. We've got our full list of keywords to target. Hopefully, we've figured out how valuable and important they are, so we have our spreadsheet. We say, "Well, these are the top converting keywords. These are the ones that send the most traffic, and these are the ones with the lowest difficulty. So based on those three factors, this is how we want to target them." Then we'll map the keywords to existing content based on their relevance. This means: does the page's content actually serve the needs of the keyword phrase it is targeting? So, if you have a heat-retardant oven mitts page, does it actually contain heat-retardant oven mitts? Is it a full category page? Is it a subcategory page? Is it a single item that happens to be the most heat-retardant oven mitt? Is it a brand page? What is it? We make sure that it is relevant.
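To make step one concrete, here is a minimal Python sketch that ranks a keyword spreadsheet by the three factors above (conversion, traffic, difficulty) and then maps each keyword to the most relevant existing page. The scoring formula and the word-overlap relevance check are assumptions made for this illustration, not Rand's method; real relevance is a human judgment about what the page actually offers, and the pages and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    phrase: str
    conversion_rate: float  # share of visits that convert, 0.0 to 1.0
    monthly_searches: int
    difficulty: int  # 0 (easy) to 100 (hard), e.g. a keyword tool's score


def priority(kw: Keyword) -> float:
    """Illustrative score: expected conversions, discounted by difficulty."""
    return kw.conversion_rate * kw.monthly_searches * (1 - kw.difficulty / 100)


def relevance(phrase: str, page_text: str) -> float:
    """Crude proxy: the fraction of the phrase's words that appear on the page."""
    words = phrase.lower().split()
    page_words = set(page_text.lower().split())
    return sum(w in page_words for w in words) / len(words)


def map_keywords(keywords, pages, threshold=1.0):
    """Map each keyword, highest priority first, to its most relevant page.

    A keyword maps to None when no page reaches the threshold, which
    flags a gap where new content may be needed.
    """
    mapping = {}
    for kw in sorted(keywords, key=priority, reverse=True):
        best_url, best_score = None, 0.0
        for url, text in pages.items():
            score = relevance(kw.phrase, text)
            if score > best_score:
                best_url, best_score = url, score
        mapping[kw.phrase] = best_url if best_score >= threshold else None
    return mapping


# Hypothetical pages (URL -> visible text) and spreadsheet rows.
pages = {
    "/oven-mitts": "oven mitts in every size material and style",
    "/oven-mitts/heat-retardant": "heat-retardant oven mitts for high temperatures",
}
keywords = [
    Keyword("pit fire gloves", 0.04, 150, 10),
    Keyword("oven mitts", 0.02, 5000, 60),
    Keyword("heat-retardant oven mitts", 0.05, 400, 20),
]
result = map_keywords(keywords, pages)
print(result)
```

A None in the result is the cue to build new content rather than bolt another phrase onto an existing page.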

Second, we're going to target user intent. This means not just thinking about whether the page is relevant for the keyword, but thinking about, "What does the user want when he gets to this page?" If I am searching for heat-retardant oven mitts, I probably want a bunch of information about why they're heat retardant, what they're made from, and what kind of temperatures they can handle. I want to know where I can buy these magical oven mitts, what the sources are, what the different brands are. I'd like to be able to filter on that data. Maybe I even want tutorials and demos, like, oh, well, these are the kinds of things that you could cook with them. Cool.

Then you can think about yourself, about conversion goals. So you make them happy and they'll make you happy. The conversion goal can be we want them to sign up for an email, we want them to click on a button, we want them to add this to their cart, we want them to convert out of the store. Great. Whatever that is, fine, super.

Then we have step two and a half, which is sort of an interim step. The reason we've got it is that a lot of the time when you're mapping keywords to content, it is not a 1:1 ratio. This again can make for Frankenpages unless you're careful. So you want to decide whether each page has a multiple- or a single-keyword focus. For the oven mitts, for just that broad keyword phrase, I would very strongly suggest to a business owner who has a website about oven mitts that it should be one page in and of itself. We should not try to make this a multiple-keyword targeting page, because we don't know what the user intent is. Someone searching with a phrase that broad is going to need to do a lot of research and discover whether they want heat-retardant ones or ones for grills or pit fires, whether they're looking for a certain material, gloves that withstand certain temperatures, kid-friendly gloves, gloves in certain sizes, gloves with fingers, or gloves in the classic mitt form. Whatever that is, we need to provide them with a ton of different sorts of data. So this page is going to have all sorts of selections and things. That has to map to A, B, and C here, or we're going to lose out, and that's why I wouldn't try to get a bunch of different phrases ranking for this page.

You could conceivably have a page for oven mitts and oven gloves and target both on the same one. So "oven mitts and gloves" could be a page title, could be the target. But I don't know. I think gloves implies fingers and mitts implies just like this, and then there are the hybrid ones that have one finger. I don't know where those go. Kitchen people will figure that out. Don't worry.

Then you have things like, oh, well, this page, oven mitts for kids, that can target lots of keywords like child-friendly oven mitts or kid-friendly oven mitts or oven mitts in children's sizes. So you take the user intent and the relevance of the keyword and you add those onto the page, and then you can figure out all the keywords that the kid-friendly page should target. We'll put the most important ones in the title. We'll put maybe the secondary ones in the body content. We'll try to make that page work for that combination, because we don't want to build one page that's child-friendly and one that's kid-friendly when they are exactly the same page just to be able to target different keywords. That generally makes no sense, because, again, the link equity gets split up, and Google does a lot with topic modeling anyway to figure out that those two are probably really similar. So that doesn't make good sense. We can do this instead. So I'll draw a tiny little oven glove right there. Oh, adorable, for kids.
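Here is a hedged sketch of the kid-friendly example: one group of near-synonymous phrases shares a single page, the phrase with the highest search volume goes in the title, and the rest become secondary phrases for the body copy. The volumes, the brand name, and the `plan_page` helper are all hypothetical, and "highest volume wins the title" is a simplifying assumption; you might weigh conversion or difficulty instead.

```python
def plan_page(phrases: dict[str, int], brand: str, title_slots: int = 1) -> dict:
    """Split one keyword group into title phrases and body phrases.

    `phrases` maps near-synonymous phrases to hypothetical monthly search
    volumes. The whole group targets a single page instead of one
    near-identical page per phrase.
    """
    ranked = sorted(phrases, key=phrases.get, reverse=True)
    title_phrases = ranked[:title_slots]
    body_phrases = ranked[title_slots:]
    title = " | ".join([p.capitalize() for p in title_phrases] + [brand])
    return {"title": title, "body_phrases": body_phrases}


kids_group = {
    "child-friendly oven mitts": 300,
    "kid-friendly oven mitts": 900,
    "oven mitts in children's sizes": 150,
}
plan = plan_page(kids_group, brand="Example Mitt Co.")
print(plan["title"])  # Kid-friendly oven mitts | Example Mitt Co.
print(plan["body_phrases"])
```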

Then you have high-temperature oven mitts. These are, oh, they're big and strong. They can handle a bunch of high temperatures. Oh, look at all that heat they can take. The high-temperature oven mitts could be ones that include phrases like heat resistant, heat retardant, for advanced chefs, for foodies, whatever it is. Those high-temperature oven gloves, they can target a bunch of phrases as well, but we have to go back to relevance and user intent for those.

Then finally, maybe we'll have something in the longer tail, like pit fire mitts or pit fire gloves, and those are for people who need to dig around in coals or who are doing fancy smoking in a backyard barbecue. Whatever it is. Professional grade stuff. Fine. Cool. I don't know. I'll put a hammer there to indicate they're, like, hardcore professionals. I'm not sure why.

Once you have done this process, you can take the map of keywords to content that you created and actually go build that content to make searchers happy. This works so much better than the Frankenpage approach. I can't even describe to you how well this will work. It doesn't have to be right from the start. You can take an existing site right now, run through this process, and have just a huge win both in terms of your ability to target searches and rank for those keywords as well as your ability to better convert those visitors because of how you've targeted the relevance and the user intent.

I hope you've enjoyed this edition of Whiteboard Friday. We'll see you again next week. Take care.

Video transcription by Speechpad.com





Nov 18

I recently had a nice chat with Ken Evoy, founder of SiteSell and creator of SBI!, and we discussed the differences between blogging vs. building a theme-based content site.

I’ve known Ken since the ’90s(!), when I read his book “Make Your Site SELL!” (MYSS!), and it was fun to catch up and talk about creating affiliate sites.

Some things we touched on…

  • The changes in the affiliate marketing landscape from 1997 to today
  • What you need to know when starting an affiliate business now
  • Who blogging is for (and not for)
  • Who theme-based content sites are for
  • What happens when you take a break from blogging
  • What happens when you put a content site on auto-pilot
  • Socialization of the Net and how it augments your business
  • Which platforms deliver a complete package of process-and-tools for building a successful e-business

Enjoy the chat, and if you haven’t gotten your first affiliate site up yet, get it going.

Anybody can do it – it just takes patience, diligence, and a unique idea.

Listen to my discussion with Ken Evoy.


Affiliate Marketing Blog


Nov 17

Posted by Dr. Pete

“No one saw the panda uprising coming. One day, they were frolicking in our zoos. The next, they were frolicking in our entrails. They came for the identical twins first, then the gingers, and then the rest of us. I finally trapped one and asked him the question burning in all of our souls – 'Why?!' He just smiled and said ‘You humans all look alike to me.’”

- Sgt. Jericho “Bamboo” Jackson


OK, maybe we’re starting to get a bit melodramatic about this whole Panda thing. While it’s true that Panda didn’t change everything about SEO, I think it has been a wake-up call about SEO issues we’ve been ignoring for too long.

One of those issues is duplicate content. While duplicate content as an SEO problem has been around for years, the way Google handles it has evolved dramatically and seems to only get more complicated with every update. Panda has upped the ante even more.

So, I thought it was a good time to cover the topic of duplicate content, as it stands in 2011, in depth. This is designed to be a comprehensive resource – a complete discussion of what duplicate content is, how it happens, how to diagnose it, and how to fix it. Maybe we’ll even round up a few rogue pandas along the way.


I. What Is Duplicate Content?

Let’s start with the basics. Duplicate content exists when any two (or more) pages share the same content. If you’re a visual learner, here’s an illustration for you:

Illustration of duplicates

Easy enough, right? So, why does such a simple concept cause so much difficulty? One problem is that people often make the mistake of thinking that a “page” is a file or document sitting on their web server. To a crawler (like Googlebot), a page is any unique URL it happens to find, usually through internal or external links. Especially on large, dynamic sites, creating two URLs that land on the same content is surprisingly easy (and often unintentional).
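To see how easily one document turns into several "pages," consider the hypothetical URLs below, all of which a site might link to for the same product. The sketch normalizes them with Python's `urllib.parse`. The rules it applies (lowercase the host, drop "www.", drop a trailing slash, discard session and tracking parameters, sort what's left) are assumptions for this example; whether such variants really serve the same content depends entirely on your site, and a crawler has no way to know that in advance.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed: these parameters never change the page content on this site.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def normalize(url: str) -> str:
    """Collapse URL variants that (by assumption) serve identical content."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(params), ""))


crawled = [
    "http://www.example.com/oven-mitts/",
    "http://example.com/oven-mitts",
    "http://example.com/oven-mitts?sessionid=8f2a",
    "http://EXAMPLE.com/oven-mitts/?utm_source=newsletter",
]
unique = {normalize(u) for u in crawled}
print(f"{len(crawled)} URLs crawled, {len(unique)} distinct page(s):", unique)
```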


II. Why Do Duplicates Matter?

Duplicate content as an SEO issue was around long before the Panda update, and has taken many forms as the algorithm has changed. Here’s a brief look at some major issues with duplicate content over the years…

The Supplemental Index

In the early days of Google, just indexing the web was a massive computational challenge. To deal with this challenge, some pages that were seen as duplicates or just very low quality were stored in a secondary index called the “supplemental” index. These pages automatically became 2nd-class citizens, from an SEO perspective, and lost any competitive ranking ability.

Around late 2006, Google integrated supplemental results back into the main index, but those results were still often filtered out. You know you’ve hit filtered results anytime you see this warning at the bottom of a Google SERP:

Omitted results in Google

Even though the index was unified, results were still “omitted”, with obvious consequences for SEO. Of course, in many cases, these pages really were duplicates or had very little search value, and the practical SEO impact was negligible, but not always.

The Crawl “Budget”

It’s always tough to talk limits when it comes to Google, because people want to hear an absolute number. There is no absolute crawl budget or fixed number of pages that Google will crawl on a site. There is, however, a point at which Google may give up crawling your site for a while, especially if you keep sending spiders down winding paths.

Although the “budget” isn’t absolute, even for a given site, you can get a sense of Google’s crawl allocation for your site in Google Webmaster Tools (under “Diagnostics” > “Crawl Stats”):

GWT crawl graph

So, what happens when Google hits so many duplicate paths and pages that it gives up for the day? Practically, the pages you want indexed may not get crawled. At best, they probably won’t be crawled as often.

The Indexation “Cap”

Similarly, there’s no set “cap” to how many pages of a site Google will index. There does seem to be a dynamic limit, though, and that limit is relative to the authority of the site. If you fill up your index with useless, duplicate pages, you may push out more important, deeper pages. For example, if you load up on thousands of internal search results, Google may not index all of your product pages. Many people make the mistake of thinking that more indexed pages are better. I’ve seen too many situations where the opposite was true. All else being equal, bloated indexes dilute your ranking ability.

The Penalty Debate

Long before Panda, a debate would erupt every few months over whether or not there was a duplicate content penalty. While these debates raised valid points, they often focused on semantics – whether or not duplicate content caused a Capital-P Penalty. While I think the conceptual difference between penalties and filters is important, the upshot for a site owner is often the same. If a page isn’t ranking (or even indexed) because of duplicate content, then you’ve got a problem, no matter what you call it.

The Panda Update

Since Panda (starting in February 2011), the impact of duplicate content has become much more severe in some cases. It used to be that duplicate content could only harm that content itself. If you had a duplicate, it might go supplemental or get filtered out. Usually, that was ok. In extreme cases, a large number of duplicates could bloat your index or cause crawl problems and start impacting other pages.

Panda made duplicate content part of a broader quality equation – now, a duplicate content problem can impact your entire site. If you’re hit by Panda, non-duplicate pages may lose ranking power, stop ranking altogether, or even fall out of the index. Duplicate content is no longer an isolated problem.


III. Three Kinds of Duplicates

Before we dive into examples of duplicate content and the tools for dealing with them, I’d like to cover 3 broad categories of duplicates. They are: (1) True Duplicates, (2) Near Duplicates, and (3) Cross-domain Duplicates. I’ll be referencing these 3 main types in the examples later in the post.

(1) True Duplicates

A true duplicate is any page that is 100% identical (in content) to another page. These pages only differ by the URL:
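Because true duplicates differ only by URL, they can be found mechanically: hash each URL's content and group the URLs whose hashes match. A minimal sketch, assuming you already have the fetched content for each URL in a dict (the URLs and markup here are hypothetical). On real sites, small dynamic bits such as timestamps or session tokens in the markup can make otherwise identical pages hash differently, so hashing the extracted main content is often more useful than hashing raw HTML.

```python
import hashlib
from collections import defaultdict


def find_true_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose content is byte-for-byte identical.

    `pages` maps URL -> fetched content. Returns only groups with 2+ URLs.
    """
    groups = defaultdict(list)
    for url, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "/product?id=42": "<h1>Heat-Retardant Oven Mitt</h1><p>Quilted cotton.</p>",
    "/products/oven-mitt": "<h1>Heat-Retardant Oven Mitt</h1><p>Quilted cotton.</p>",
    "/products/pit-glove": "<h1>Pit Fire Glove</h1><p>For hot coals.</p>",
}
print(find_true_duplicates(pages))
```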

True duplicates

(2) Near Duplicates

A near duplicate differs from another page (or pages) by a very small amount – it could be a block of text, an image, or even the order of the content:

Near duplicates

An exact definition of “near” is tough to pin down, but I’ll discuss some examples in detail later.
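Since “near” has no fixed definition, one common way to put a number on it is w-shingling with Jaccard similarity: break each page's text into overlapping word sequences and measure how much the two sets overlap. This is a generic technique, not a description of how Google measures similarity, and the 0.8 threshold and sample texts below are arbitrary assumptions.

```python
def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping w-word sequences in the text."""
    words = text.lower().split()
    if len(words) < w:
        return {tuple(words)}
    return {tuple(words[i : i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


a = "our quilted oven mitts protect your hands from hot pans baking trays and cast iron skillets"
b = a + " free shipping"  # same page plus a small block of extra text
c = "pit fire gloves for digging around in hot coals"

print(round(jaccard(shingles(a), shingles(b)), 3))  # 0.875
print(is_near_duplicate(a, b), is_near_duplicate(a, c))  # True False
```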

(3) Cross-domain Duplicates

A cross-domain duplicate occurs when two websites share the same piece of content:

Cross-domain duplicates

These duplicates could be either “true” or “near” duplicates. Contrary to what some people believe, cross-domain duplicates can be a problem even for legitimate, syndicated content.


IV. Tools for Fixing Duplicates

This may seem out of order, but I want to discuss the tools for dealing with duplicates before I dive into specific examples. That way, I can recommend the appropriate tools to fix each example without confusing anyone.

(1) 404 (Not Found)

Of course, the simplest way to deal with duplicate content is to just remove it and return a 404 error. If the content really has no value to visitors or search, and if it has no significant inbound links or traffic, then total removal is a perfectly valid option.

(2) 301 Redirect

Another way to remove a page is via a 301-redirect. Unlike a 404, the 301 tells visitors (humans and bots) that the page has permanently moved to another location. Human visitors seamlessly arrive at the new page. From an SEO perspective, most of the inbound link authority is also passed to the new page. If your duplicate content has a clear canonical URL, and the duplicate has traffic or inbound links, then a 301-redirect may be a good option.
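As a minimal sketch of both options, here is a small WSGI app: duplicate paths with a clear canonical URL get a 301 pointing there, removed duplicates with no links or traffic get a 404, and everything else is served normally. The path tables are hypothetical, and on most real sites these rules would live in the web server or CMS configuration rather than in application code.

```python
from html import escape

# Hypothetical duplicate paths, each mapped to its canonical URL.
REDIRECTS = {
    "/product.php": "https://example.com/products/oven-mitt",
    "/products/oven-mitt/print": "https://example.com/products/oven-mitt",
}
# Hypothetical duplicates with no links or traffic, removed outright.
REMOVED = {"/products/oven-mitt-copy"}


def app(environ, start_response):
    """WSGI app: 301 duplicates to their canonical URL, 404 removed pages."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # Permanent redirect: visitors land on the canonical page, and
        # most inbound link authority is expected to follow it.
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    if path in REMOVED:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [f"<h1>Page at {escape(path)}</h1>".encode("utf-8")]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result.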

(3) Robots.txt

Another option is to leave the duplicate content available for human visitors, but block it for search crawlers. The oldest and probably still easiest way to do this is with a robots.txt file, which must sit in your root directory. It looks something like this:

Robots.txt sample code

One advantage of robots.txt is that it’s relatively easy to block entire folders or even URL parameters. The disadvantage is that it’s an extreme and sometimes unreliable solution. While robots.txt is effective for blocking uncrawled content, it’s not great for removing content already in the index. The major search engines also seem to frown on its overuse, and don’t generally recommend robots.txt for duplicate content.
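The sample robots.txt above only survives as a screenshot, so here's a minimal sketch of the same idea. It assumes a hypothetical duplicate folder called "/print/" on example.com and uses Python's standard-library `urllib.robotparser` to check which URLs a compliant crawler would skip. Each search engine parses robots.txt with its own quirks, so treat this as a sanity check, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of a duplicate "/print/" folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts the file's lines directly


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(is_crawlable("http://www.example.com/products/ipad2"))  # True
print(is_crawlable("http://www.example.com/print/ipad2"))     # False
```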

(4) Meta Robots

You can also control the behavior of search bots at the page level, with a header-level directive known as the “Meta Robots” tag (or sometimes “Meta Noindex”). In its simplest form, the tag looks something like this:

Meta Noindex sample code

This directive tells search bots not to index this particular page or follow links on it. Anecdotally, I find it a bit more SEO-friendly than Robots.txt, and because the tag can be created dynamically with code, it can often be more flexible.

The other common variant for Meta Robots is the content value “NOINDEX, FOLLOW”, which allows bots to crawl the paths on the page without adding the page to the search index. This can be useful for pages like internal search results, where you may want to block certain variations (I’ll discuss this more later) but still follow the paths to product pages.

One quick note: there is no need to ever add a Meta Robots tag with “INDEX, FOLLOW” to a page. All pages are indexed and followed by default (unless blocked by other means).
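Because the tag is just text in the page header, it's easy to generate per page from your templates. Here's a minimal sketch; the function name is hypothetical, and you'd call it from whatever template system you use. It also bakes in the note above: for the default INDEX, FOLLOW case, it emits nothing at all.

```python
def meta_robots_tag(index: bool = True, follow: bool = True) -> str:
    """Build a Meta Robots tag for the page header.

    Returns an empty string for INDEX, FOLLOW, since that's the default
    behavior and the tag would add nothing.
    """
    if index and follow:
        return ""
    content = ", ".join([
        "INDEX" if index else "NOINDEX",
        "FOLLOW" if follow else "NOFOLLOW",
    ])
    return f'<meta name="robots" content="{content}">'


print(meta_robots_tag(index=False, follow=False))  # blocks indexing and link-following
print(meta_robots_tag(index=False))                # the NOINDEX, FOLLOW variant
```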

(5) Rel=Canonical

In 2009, the search engines banded together to create the Rel=Canonical directive, sometimes called just “Rel-canonical” or the “Canonical Tag”. This allows webmasters to specify a canonical version for any page. The tag goes in the page header (like Meta Robots), and a simple example looks like this:

Rel=canonical sample code

When search engines arrive on a page with a canonical tag, they attribute the page to the canonical URL, regardless of the URL they used to reach the page. So, for example, if a bot reached the above page using the URL “www.example.com/index.html”, the search engine would not index the additional, non-canonical URL. Typically, it seems that inbound link-juice is also passed through the canonical tag.

It’s important to note that you need to clearly understand what the proper canonical page is for any given website template. Canonicalizing your entire site to just one page or the wrong pages can be catastrophic.
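As with Meta Robots, the canonical tag is usually printed by your page template. Here's a minimal sketch, assuming you've already decided what the canonical URL is (the helper name is hypothetical). The one detail worth getting right is escaping: an "&" in a query string should be written as "&amp;" inside an HTML attribute.

```python
from html import escape


def canonical_tag(canonical_url: str) -> str:
    """Return a rel=canonical link element for the page header."""
    # escape() converts &, <, >, and quotes so the href attribute stays valid HTML.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


print(canonical_tag("http://www.example.com/"))
print(canonical_tag("http://www.example.com/product.php?id=1234&color=red"))
```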

(6) Google URL Removal

In Google Webmaster Tools (GWT), you can request that an individual page (or directory) be manually removed from the index. Click on “Site configuration” > “Crawler access”, and you’ll see a series of 3 tabs. Click on the 3rd tab, “Remove URL”, to get this:

GWT URL removal screen

Since this tool only removes one URL or path at a time and is completely at Google’s discretion, it’s usually a last-ditch approach to duplicate content. I just want to be thorough, though, and cover all of your options. An important technical note: you need to 404, Robots.txt-block, or Meta Noindex the page before requesting removal. Removal via GWT is primarily a last defense when Google is being stubborn.

(7) Google Parameter Blocking

You can also use GWT to specify URL parameters that you want Google to ignore (which essentially blocks indexation of pages with those parameters). If you click on “Site Configuration” > “URL parameters”, you’ll get a list something like this:

GWT URL parameters list

This list shows URL parameters that Google has detected, as well as the settings for how those parameters should be crawled. Keep in mind that the “Let Googlebot decide” setting doesn’t reflect other blocking tactics, like Robots.txt or Meta Robots. If you click on “Edit”, you’ll get the following options:

GWT Parameter blocking screen

Google changed these recently, and I find the new version a bit confusing, but essentially “Yes” means the parameter is important and should be indexed, while “No” means the parameter indicates a duplicate. The GWT tool seems to be effective (and can be fast), but I don’t usually recommend it as a first line of defense. It won’t impact other search engines, and it can’t be read by SEO tools and monitoring software. It could also be modified by Google at any time.

(8) Bing URL Removal

Bing Webmaster Center (BWC) has tools very similar to GWT’s options above. Actually, I think the Bing parameter blocking tool came before Google’s version. To request a URL removal in Bing, click on the “Index” tab and then “Block URLs” > “Block URL and Cache”. You’ll get a pop-up like this:

Bing URL removal screen

BWC actually gives you a wider range of options, including blocking a directory and your entire site. Obviously, that last one usually isn’t a good idea.

(9) Bing Parameter Blocking

In the same section of BWC (“Index”), there’s an option called “URL Normalization”. The name implies Bing treats this more like canonicalization, but there’s only one option – “ignore”. Like Google, you get a list of auto-detected parameters and can add or modify them:

Bing parameter blocking screen

As with the GWT tools, I’d consider the Bing versions to be a last resort. Generally, I’d only use these tools if other methods have failed, and one search engine is just giving you grief.

(10) Rel=Prev & Rel=Next

Just this year (September 2011), Google gave us a new tool for fighting a particular form of near-duplicate content – paginated search results. I’ll describe the problem in more detail in the next section, but essentially paginated results are any searches where the results are broken up into chunks, with each chunk (say, 10 results) having its own page/URL.

You can now tell Google how paginated content connects by using a pair of tags much like Rel-Canonical. They’re called Rel-Prev and Rel-Next. Implementation is a bit tricky, but here’s a simple example:

Rel=Prev sample code

In this example, the search bot has landed on page 3 of search results, so you need two tags: (1) a Rel-Prev pointing to page 2, and (2) a Rel-Next pointing to page 4. Where it gets tricky is that you’re almost always going to have to generate these tags dynamically, as your search results are probably driven by one template.
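Since one template usually drives every page of results, the tags have to be computed from the current page number. Here's a minimal sketch, assuming a hypothetical URL scheme where each results page is addressed as "?page=N"; you'd adapt the URL building to whatever your internal search actually uses.

```python
from typing import List


def pagination_tags(base_url: str, page: int, last_page: int) -> List[str]:
    """Build rel="prev"/rel="next" link elements for one page of results.

    Assumes (hypothetically) that page N lives at base_url + "?page=N".
    """
    if not 1 <= page <= last_page:
        raise ValueError(f"page must be between 1 and {last_page}")
    tags = []
    if page > 1:  # the first page has no previous page
        tags.append(f'<link rel="prev" href="{base_url}?page={page - 1}">')
    if page < last_page:  # the last page has no next page
        tags.append(f'<link rel="next" href="{base_url}?page={page + 1}">')
    return tags


# A bot landing on page 3 of 10 should see a Rel-Prev to page 2 and a Rel-Next to page 4.
for tag in pagination_tags("http://www.example.com/search", page=3, last_page=10):
    print(tag)
```

Page 1 gets only a Rel-Next and the last page only a Rel-Prev, which is why the function checks both ends.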

While initial results suggest these tags do work, they’re not currently honored by Bing, and we really don’t have much data on their effectiveness. I’ll briefly discuss other methods for dealing with paginated content in the next section.

(11) Syndication-Source

In November of 2010, Google introduced a set of tags for publishers of syndicated content. The Meta Syndication-Source directive can be used to indicate the original source of a republished article, as follows:

Syndication-source sample code

Even Google’s own advice on when to use this tag versus a cross-domain canonical tag is a little unclear. Google launched this tag as “experimental”, and I’m not sure they’ve publicly announced a status change. It’s something to watch, but don’t rely on it.

(12) Internal Linking

It’s important to remember that your best tool for dealing with duplicate content is to not create it in the first place. Granted, that’s not always possible, but if you find yourself having to patch dozens of problems, you may need to re-examine your internal linking structure and site architecture.

When you do correct a duplication problem, such as with a 301-redirect or the canonical tag, it’s also important to make your other site cues reflect that change. It’s amazing how often I see someone set a 301 or canonical to one version of a page, and then continue to link internally to the non-canonical version and fill their XML sitemap with non-canonical URLs. Internal links are strong signals, and sending mixed signals will only cause you problems.

(13) Don’t Do Anything

Finally, you can let the search engines sort it out. This is what Google recommended you do for years, actually. Unfortunately, in my experience, especially for large sites, this is almost always a bad idea. It’s important to note, though, that not all duplicate content is a disaster, and Google certainly can filter some of it out without huge consequences. If you only have a few isolated duplicates floating around, leaving them alone is a perfectly valid option.


V. Examples of Duplicate Content

So, now that we’ve worked backwards and sorted out the tools for fixing duplicate content, what does it actually look like in the wild? I’m going to cover a wide range of examples that represent the issues you can expect on a real website. Throughout this section, I’ll reference the solutions listed in Section IV – for example, a reference to a 301-redirect will cite (IV-2).

(1) “www” vs. Non-www

For sitewide duplicate content, this is probably the biggest culprit. If you’ve got bad internal paths or have attracted links and social mentions to the wrong URL, you may end up with both the “www” version and the non-www (root domain) version of your URLs indexed:

www versus non-www example

Most of the time, a 301-redirect (IV-2) is your best choice here. This is a common problem, and Google is good about honoring redirects for cases like these.
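Most people set this redirect up in their web server config, but the logic is simple enough to sketch in Python. This assumes a hypothetical site where "www.example.com" is the preferred version; the function returns the target of the 301, or None when the request is already on the "www" host.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

ROOT_DOMAIN = "example.com"  # hypothetical: swap in your own root domain
PREFERRED_HOST = "www." + ROOT_DOMAIN


def www_redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect a non-www request to, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.hostname != ROOT_DOMAIN:
        return None  # already on www (or some other host this sketch doesn't handle)
    # Swap only the host; keep scheme, path, and query so deep links land on the same page.
    # (Explicit ports are ignored in this sketch.)
    return urlunsplit((parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment))


print(www_redirect_target("http://example.com/about?ref=1"))  # http://www.example.com/about?ref=1
print(www_redirect_target("http://www.example.com/about"))    # None
```

Redirecting page-to-page, rather than sending everything to the home page, is what keeps inbound links to deep pages pointing at the right content.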

You may also want to set your preferred address in Google Webmaster Tools. Under “Site Configuration” > “Settings”, you should see a section called “Preferred domain”:

GWT Preferred domain screen

There’s a quirk in GWT where, to set a preferred domain, you may have to create GWT profiles for both your “www” and non-www versions of the site. While this is annoying, it won’t cause any harm. If you’re having major canonicalization issues, I’d recommend it. If you’re not, then you can leave well enough alone and let Google determine the preferred domain.

(2) Staging Servers

While much less common than (1), this problem is often also caused by subdomains. In a typical scenario, you’re working on a new site design for a relaunch, your dev team sets up a subdomain with the new site, and they accidentally leave it open to crawlers. What you end up with is two sets of indexed URLs that look something like this:

Staging URL example

Your best bet is to prevent this problem before it happens, by blocking the staging site with Robots.txt (IV-3). If you find your staging site indexed, though, you’ll probably need to 301-redirect (IV-2) those pages or Meta Noindex them (IV-4).

(3) Trailing Slashes ("/")

This is a problem people often have questions about, although it's less of an SEO issue than it once was. Technically, in the original HTTP protocol, a URL with a trailing slash and one without it were different URLs. Here's a simple example:

Trailing slash example

These days, almost all browsers automatically add the trailing slash behind the scenes and resolve both versions the same way. Matt Cutts did a recent video suggesting that Google automatically canonicalizes these URLs in "the vast majority of cases".

(4) Secure (https) Pages

If your site has secure pages (designated by the “https:” protocol), you may find that both secure and non-secure versions are getting indexed. This most frequently happens when navigation links from secure pages – like shopping cart pages – also end up secured, usually due to relative paths, creating variants like this:

Secure URL example

Ideally, these problems are solved by the site-architecture itself. In many cases, it’s best to Noindex (IV-4) secure pages – shopping cart and check-out pages have no place in the search index. After the fact, though, your best option is a 301-redirect (IV-2). Be cautious with any sitewide solutions – if you 301-redirect all “https:” pages to their “http:” versions, you could end up removing security entirely. This is a tricky problem to solve and should be handled carefully.

(5) Home-page Duplicates

While problems (1)-(3) can all create home-page duplicates, the home-page has a couple of unique problems of its own. The most typical problem is that both the root domain and the actual home-page document name get indexed. For example:

Home-page duplicate example

Although this problem can be solved with a 301-redirect (IV-2), it’s often a good idea to put a canonical tag on your home-page (IV-5). Home pages are uniquely afflicted by duplicates, and a proactive canonical tag can prevent a lot of problems.
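Whichever fix you choose, you need a reliable way to map the document-name version back to the root. Here's a minimal sketch, assuming a hypothetical list of default document names; check which ones your server actually responds to.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default document names; use the ones your server actually serves.
DEFAULT_DOCUMENTS = {"index.htm", "index.html", "index.php", "default.aspx"}


def strip_default_document(url: str) -> str:
    """Map ".../index.htm"-style URLs to their folder root, e.g. "/index.htm" -> "/"."""
    parts = urlsplit(url)
    folder, _, filename = parts.path.rpartition("/")
    if filename.lower() not in DEFAULT_DOCUMENTS:
        return url  # not a default document, so leave it alone
    return urlunsplit((parts.scheme, parts.netloc, folder + "/", parts.query, parts.fragment))


print(strip_default_document("http://www.example.com/index.htm"))  # http://www.example.com/
print(strip_default_document("http://www.example.com/about"))      # unchanged
```

The result can feed either the 301 target or the home-page canonical tag.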

Of course, it’s important to also be consistent with your internal paths (IV-12). If you want the root version of the URL to be canonical, but then link to “/index.htm” in your navigation, you’re sending mixed signals to Google every time the crawlers visit.

(6) Session IDs

Some websites (especially e-commerce platforms) tag each new visitor with a tracking parameter. On occasion, that parameter ends up in the URL and gets indexed, creating something like this:

Session ID URL example

That image really doesn’t do the problem justice, because in reality you can end up with a duplicate for every single session ID and page combination that gets indexed. Session IDs in the URL can easily add 1000s of duplicate pages to your index.

The best option, if possible on your site/platform, is to remove the session ID from the URL altogether and store it in a cookie. There are very few good reasons to create these URLs, and no reason to let bots crawl them. If that’s not feasible, implementing the canonical tag (IV-5) sitewide is a good bet. If you really get stuck, you can block the parameter in Google Webmaster Tools (IV-7) and Bing Webmaster Center (IV-9).
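If you're stuck with session IDs in the URL for now, your template still has to work out the clean URL to put in the canonical tag. Here's a minimal sketch, assuming hypothetical parameter names "sessionid" and "sid"; your platform may use something else entirely.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; check what your platform actually appends.
SESSION_PARAMS = {"sessionid", "sid"}


def strip_session_id(url: str) -> str:
    """Remove session-ID parameters from `url`, keeping all other parameters in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(strip_session_id("http://www.example.com/product.php?id=1234&sessionid=A1B2C3"))
```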

(7) Affiliate Tracking

This problem looks a lot like (6) and happens when sites provide a tracking variable to their affiliates. This variable is typically appended to landing page URLs, like so:

Affiliate URL example

The damage is usually a bit less extreme than with session IDs (6), but it can still cause large-scale duplication. The solutions are similar, too. Ideally, you can capture the affiliate ID in a cookie and 301-redirect (IV-2) to the canonical version of the page. Otherwise, you’ll probably either need to use canonical tags (IV-5) or block the affiliate URL parameter.
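Here's a rough sketch of the cookie-plus-redirect approach, using only Python's standard library. The parameter name "aff", the cookie name, and the 30-day lifetime are all hypothetical; plug in whatever your affiliate program actually uses. The function returns the 301 target and the Set-Cookie header value, or None if there's no affiliate ID to capture.

```python
from http.cookies import SimpleCookie
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

AFFILIATE_PARAM = "aff"    # hypothetical query parameter
COOKIE_NAME = "affiliate"  # hypothetical cookie name


def affiliate_redirect(url: str) -> Optional[Tuple[str, str]]:
    """Return (Location, Set-Cookie value) for a 301, or None if `url` has no affiliate ID."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    ids = [value for key, value in pairs if key == AFFILIATE_PARAM and value]
    if not ids:
        return None
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = ids[0]
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["max-age"] = 30 * 24 * 60 * 60  # hypothetical 30-day window
    # The redirect target is the same page with the affiliate parameter removed.
    remaining = urlencode([(key, value) for key, value in pairs if key != AFFILIATE_PARAM])
    location = urlunsplit((parts.scheme, parts.netloc, parts.path, remaining, parts.fragment))
    return location, cookie[COOKIE_NAME].OutputString()


print(affiliate_redirect("http://www.example.com/ipad2?aff=partner42"))
```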

(8) Duplicate Paths

Having duplicate paths to a page is perfectly fine, but when duplicate paths generate duplicate URLs, then you’ve got a problem. Let’s say a product page can be reached one of 3 ways:

Duplicate path examples

Here, the iPad2 product page can be reached by 2 categories and a user-generated tag. User-generated tags are especially problematic, because they can theoretically spawn unlimited versions of a page.

Ideally, these path-based URLs shouldn’t be created at all. However a page is navigated to, it should only have one URL for SEO purposes. Some will argue that including navigation paths in the URL is a positive cue for site visitors, but even as someone with a usability background, I think the cons almost always outweigh the pros here.

If you already have variations indexed, then a 301-redirect (IV-2) or canonical tag (IV-5) are probably your best options. In many cases, implementing the canonical tag will be easier, since there may be too many variations to easily redirect. Long-term, though, you’ll need to re-evaluate your site architecture.

(9) Functional Parameters

Functional parameters are URL parameters that change a page slightly but have no value for search and are essentially duplicates. For example, let’s say that all of your product pages have a printable version, and that version has its own URL:

Print parameter URL example

Here, the “print=1” URL variable indicates a printable version, which normally would have the same content but a modified template. Your best bet is to not index these at all, with something like a Meta Noindex (IV-4), but you could also use a canonical tag (IV-5) to consolidate these pages.

(10) International Duplicates

These duplicates occur when you have content for different countries which share the same language, all hosted on the same root domain (it could be subfolders or subdomains). For example, you may have an English version of your product pages for the US, UK, and Australia:

International sub-folder example

Unfortunately, this one’s a bit tough – in some cases, Google will handle it perfectly well and rank the appropriate content in the appropriate countries. In other cases, even with proper geo-targeting, they won’t. It’s often better to target the language itself than the country, but there are legitimate reasons to split off country-specific content, such as pricing.

If your international content does get treated as duplicate content, there’s no easy answer. If you 301-redirect, you lose the page for visitors. If you use the canonical tag, then Google will only rank one version of the page. The “right” solution can be highly situational and really depends on the risk-reward tradeoff (and the scope of the filter/penalty).

(11) Search Sorts

So far, all of the examples I’ve given have been true duplicates. I’d like to dive into a few examples of “near” duplicates, since that concept is a bit fuzzy. A few common examples pop up with internal search engines, which tend to spin off many variants – sortable results, filters, and paginated results being the most frequent problems.

Search sort duplicates pop up whenever a sort (ascending/descending) creates a separate URL. While the two sorted results are technically different pages, they add no additional value to the search index and contain the same content, just in a different order. URLs might look like:

Search sort URL example

In most cases, it’s best just to block the sortable versions completely, usually by adding a Meta Noindex (IV-4) selectively to pages called with that parameter. In a pinch, you could block the sort parameter in Google Webmaster Tools (IV-7) and Bing Webmaster Center (IV-9).

(12) Search Filters

Search filters are used to narrow an internal search – it could be price, color, features, etc. Filters are very common on e-commerce sites that sell a wide variety of products. Search filter URLs look a lot like search sorts, in many cases:

Search filter URL example

The solution here is similar to (11) – don’t index the filters. As long as Google has a clear path to products, indexing every variant usually causes more harm than good.
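For both sorts and filters, the fix usually comes down to adding the Meta Noindex selectively, based on which parameters the page was called with. Here's a minimal sketch, assuming hypothetical parameter names; it returns NOINDEX, FOLLOW so bots can still follow the paths to your product pages.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names used by an internal search engine.
SORT_PARAMS = {"sort", "order"}
FILTER_PARAMS = {"color", "price", "brand"}


def search_robots_content(url: str) -> Optional[str]:
    """Return the Meta Robots content for a search URL, or None to emit no tag (the default)."""
    params = parse_qs(urlsplit(url).query)  # parse_qs drops blank values by default
    if not (SORT_PARAMS | FILTER_PARAMS).isdisjoint(params):
        # A sorted or filtered variant: keep it out of the index but let bots follow links.
        return "NOINDEX, FOLLOW"
    return None


print(search_robots_content("http://www.example.com/search?q=ipad"))             # None
print(search_robots_content("http://www.example.com/search?q=ipad&sort=price"))  # NOINDEX, FOLLOW
```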

(13) Search Pagination

Pagination is an easy problem to describe and an incredibly difficult one to solve. Any time you split internal search results into separate pages, you have paginated content. The URLs are easy enough to visualize:

Search pagination URL example

Of course, across 100s of results, one search can easily spin out dozens of near duplicates. While the results themselves differ, many important features of the pages (Titles, Meta Descriptions, Headers, copy, template, etc.) are identical. Add to that the fact that Google isn’t a big fan of “search within search” (having their search pages land on yours).

In the past, Google has said to let them sort pagination out – problem is, they haven’t done it very well. Recently, Google introduced Rel=Prev and Rel=Next (IV-10). Initial data suggests these tags work, but we don’t have much data, they’re difficult to implement, and Bing doesn’t currently support them.

You have 3 other, viable options (in my opinion), although how and when they’re viable depends a lot on the situation:

  1. You can Meta Noindex,Follow pages 2+ of search results. Let Google crawl the paginated content but don’t let them index it.
  2. You can create a “View All” page that links to all search results at one URL, and let Google auto-detect it. This seems to be Google’s other preferred option.
  3. You can create a “View All” page and set the canonical tag of paginated results back to that page. This is unofficially endorsed, but the pages aren’t really duplicates in the traditional sense, so some claim it violates the intent of Rel-canonical.
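Option 1 is the easiest of the three to automate. Here's a minimal sketch, assuming the page number arrives in a hypothetical "page" parameter: pages 2+ get NOINDEX, FOLLOW, and page 1 is left with no tag at all.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def pagination_robots_content(url: str, page_param: str = "page") -> Optional[str]:
    """Return "NOINDEX, FOLLOW" for pages 2+ of search results, or None for page 1."""
    values = parse_qs(urlsplit(url).query).get(page_param)
    if not values:
        return None  # no page parameter, so this is page 1
    try:
        page = int(values[0])
    except ValueError:
        return "NOINDEX, FOLLOW"  # junk page values shouldn't be indexed either
    return "NOINDEX, FOLLOW" if page >= 2 else None


print(pagination_robots_content("http://www.example.com/search?q=ipad&page=3"))  # NOINDEX, FOLLOW
```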

Adam Audette has a recent, in-depth discussion of search pagination that I highly recommend. Pagination for SEO is a very difficult topic and well beyond the scope of this post.

(14) Product Variations

Product variant pages are pages that branch off from the main product page and only differ by one feature or option. For example, you might have a page for each color a product comes in:

Product color URL examples

It can be tempting to want to index every color variation, hoping it pops up in search results, but in most cases I think the cons outweigh the pros. If you have a handful of product variations and are talking about dozens of pages, fine. If product variations spin out into 100s or 1000s, though, it’s best to consolidate. Although these pages aren’t technically true duplicates, I think it’s ok to Rel-canonical (IV-5) the options back up to the main product page.

One side note: I purposely used “static” URLs in this example to demonstrate a point. Just because a URL doesn’t have parameters, that doesn’t make it immune to duplication. Static URLs (parameter-free) may look prettier, but they can be duplicates just as easily as dynamic URLs.

(15) Geo-keyword Variations

Once upon a time, “local SEO” meant just copying all of your pages 100s of times, adding a city name to the URL, and swapping out that city in the page copy. It created URLs like these:

Geo-keyword URL examples

In 2011, not only is local SEO a lot more sophisticated, but these pages are almost always going to look like near-duplicates. If you have any chance of ranking, you’re going to need to invest in legitimate, unique content for every geographic region you spin out. If you aren’t willing to make that investment, then don’t create the pages. They’ll probably backfire.

(16) Other “Thin” Content

This isn’t really an example, but I wanted to stop and explain a word we throw around a lot when it comes to content: “thin”. While thin content can mean a variety of things, I think many examples of thin content are near-duplicates like (14) above. Whenever you have pages that vary by only a tiny percentage of content, you risk those pages looking low-value to Google. If those pages are heavy on ads (with more ads than unique content), you’re at even more risk. When too much of your site is thin, it’s time to revisit your content strategy.

(17) Syndicated Content

These last 3 examples all relate to cross-domain content. Here, the URLs don’t really matter – they could be wildly different. Examples (17) and (18) only differ by intent. Syndicated content is any content you use with permission from another site. However you retrieve and integrate it, that content is available on another site (and, often, many sites).

While syndication is legitimate, it’s still likely that one or more copies will get filtered out of search results. You could roll the dice and see what happens (IV-13), but conventional SEO wisdom says that you should link back to the source and probably set up a cross-domain canonical tag (IV-5). A cross-domain canonical looks just like a regular canonical, but with a reference to someone else’s domain.

Of course, a cross-domain canonical tag means that, assuming Google honors the tag, your page won’t get indexed or rank. In some cases, that’s fine – you’re using the content for its value to visitors. Practically, I think it depends on the scope. If you occasionally syndicate content to beef up your own offerings but also have plenty of unique material, then link back and leave it alone. If a larger part of your site is syndicated content, then you could find yourself running into trouble. Unfortunately, using the canonical tag (IV-5) means you'll lose the ranking ability of that content, but it could keep you from getting penalized or having Panda-related problems.

(18) Scraped Content

Scraped content is just like syndicated content, except that you didn’t ask permission (and might even be breaking the law). The best solution: QUIT BREAKING THE LAW!

Seriously, no de-duping solution is going to satisfy the scrapers among you, because most solutions will knock your content out of ranking contention. The best you can do is pad the scraped content with as much of your own, unique content as possible.

(19) Cross-ccTLD Duplicates

Finally, it’s possible to run into trouble when you copy same-language content across countries (see example (10) above), even with separate Top-Level Domains (TLDs). Fortunately, this problem is fairly rare, but we see it with English-language content and even with some European languages. For example, I frequently see questions about Dutch content on Dutch and Belgian domains ranking improperly.

Unfortunately, there’s no easy answer here, and most of the solutions aren’t traditional duplicate-content approaches. In most cases, you need to work on your targeting factors and clearly show Google that the domain is tied to the country in question.


VI. Which URL Is Canonical?

I’d like to take a quick detour to discuss an important question – whether you use a 301-redirect or a canonical tag, how do you know which URL is actually canonical? I often see people making a mistake like this:

Bad canonical tag example

The problem is that “product.php” is just a template – you’ve now collapsed all of your products down to a single page (that probably doesn’t even display a product). In this case, the canonical version probably includes a parameter, like “id=1234”.

The canonical page isn’t always the simplest version of the URL – it’s the simplest version of the URL that generates UNIQUE content. Let’s say you have these 3 URLs that all generate the same product page:

Canonical URL examples

Two of these versions are essentially duplicates, and the “print” and “session” parameters represent variations on the main product page that should be de-duped. The “id” parameter is essential to the content, though – it determines which product is actually being displayed.
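One way to encode that rule in a template is to whitelist the parameters that actually determine content and drop everything else. Here's a minimal sketch, assuming "id" is the only content-defining parameter for a hypothetical "product.php". It refuses to build a canonical URL without an "id", which is exactly the collapse-to-template mistake shown above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: "id" is the only parameter that changes which product is displayed.
CONTENT_PARAMS = {"id"}


def canonical_url(url: str) -> str:
    """Keep only content-defining parameters (sorted for a stable URL) and drop the rest."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS)
    if not kept:
        # Without an id, product.php is just a template; don't canonicalize everything to it.
        raise ValueError(f"no content-defining parameter in {url!r}")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


for variant in (
    "http://www.example.com/product.php?id=1234",
    "http://www.example.com/product.php?id=1234&print=1",
    "http://www.example.com/product.php?session=5678&id=1234",
):
    print(canonical_url(variant))  # all three print the "?id=1234" version
```

A whitelist fails safe in one direction (a new tracking parameter gets dropped automatically) but not the other: forget a parameter that really does change the content, and those pages collapse together. The list needs a review whenever the template changes.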

So, consider yourself warned. As much trouble as rampant duplicates can be, bad canonicalization can cause even more damage in some cases. Plan carefully, and make absolutely sure you select the correct canonical versions of your pages before consolidating them.


VII. Tools for Diagnosing Duplicates

So, now that you recognize what duplicate content looks like, how do you go about finding it on your own site? Here are a few tools to get you started – I won’t claim it’s a complete list, but it covers the bases:

(1) Google Webmaster Tools

In Google Webmaster Tools, you can pull up a list of duplicate TITLE tags and Meta Descriptions Google has crawled. While these don’t tell the whole story, they’re a good starting point. Many URL-based duplicates will naturally generate identical Meta data. In your GWT account, go to “Diagnostics” > “HTML Suggestions”, and you’ll see a table like this:

GWT duplicate detection screen

You can click on “Duplicate meta descriptions” and “Duplicate title tags” to pull up a list of the duplicates. This is a great first stop for finding your trouble-spots.

(2) Google’s Site: Command

When you already have a sense of where you might be running into trouble and need to take a deeper dive, Google’s “site:” command is a very powerful and flexible tool. What really makes “site:” powerful is that you can use it in conjunction with other search operators.

Let’s say, for example, that you’re worried about home-page duplicates. To find out if Google has indexed any copies of your home-page, you could use the “site:” command with the “intitle:” operator, like this:

site: plus intitle: example

Put the title in quotes to capture the full phrase, and always use the root domain (leave off “www”) when making a wide sweep for duplicate content. This will detect both “www” and non-www versions.

Another powerful combination is “site:” plus the “inurl:” operator. You could use this to detect parameters, such as the search-sort problem mentioned above:

site: plus inurl: example

The “inurl:” operator can also detect the protocol used, which is handy for finding out whether any secure (https:) copies of your pages have been indexed:

site: plus inurl: example #2

You can also combine the “site:” operator with regular search text, to find near-duplicates (such as blocks of repeated content). To search for a block of content across your site, just include it in quotes:

site: plus text block example

I should also mention that searching for a unique block of content in quotes is a cheap and easy way to find out if people have been scraping your site. Just leave off the “site:” operator and search for a long or unique block entirely in quotes.

Of course, these are just a few examples, but if you really need to dig deep, these simple tools can be used in powerful ways. Ultimately, the best way to tell if you have a duplicate content problem is to see what Google sees.
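If you run these checks regularly, it can help to keep the queries in one place and generate the search URLs from them. Here's a minimal sketch; the domain, title, and parameter name are hypothetical placeholders, and the code only builds URLs for you to open by hand (it doesn't fetch or scrape anything).

```python
from urllib.parse import urlencode

DOMAIN = "example.com"  # hypothetical: use the root domain (no "www") to catch both versions

# Hypothetical checks mirroring the examples above.
CHECKS = {
    "home-page duplicates": f'site:{DOMAIN} intitle:"Example Home Page"',
    "search sorts": f"site:{DOMAIN} inurl:sort",
    "secure copies": f"site:{DOMAIN} inurl:https",
    "repeated text block": f'site:{DOMAIN} "a unique sentence from your page"',
}


def google_search_url(query: str) -> str:
    """Build a Google search URL; urlencode() escapes the quotes, colons, and spaces."""
    return "https://www.google.com/search?" + urlencode({"q": query})


for label, query in CHECKS.items():
    print(f"{label}: {google_search_url(query)}")
```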

(3) SEOmoz Campaign Manager

If you’re an SEOmoz PRO member, you have access to some additional tools for spotting duplicates in your Campaigns. In addition to duplicate page titles, the Campaign manager will detect duplicate content on the pages themselves. You can see duplicate pages we’ve detected from the Campaign Overview screen:

SEOmoz Campaign screen

Click on the “Duplicate Page Content” link and you’ll not only see a list of potential duplicates, but you’ll get a graph of how your duplicate count has changed over time:

SEOmoz duplicate content graph

The historical graph can be very useful for determining if any recent changes you’ve made have created (or resolved) duplicate content issues.

Just a technical note, since it comes up a lot in Q&A: our system currently uses a threshold of 95% to determine whether content is duplicated. This is based on the source code (not the text copy), so the amount of actual duplicate content may vary depending on the code/content ratio.
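To see why the code/content ratio matters, here's a rough illustration. It uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure against the same 95% threshold. I'm not claiming this is the algorithm our system uses; it's just a sketch showing how a large shared template can push two pages over the line even when their copy differs.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.95  # the 95% figure mentioned above; the similarity measure itself is a stand-in


def looks_duplicate(source_a: str, source_b: str, threshold: float = THRESHOLD) -> bool:
    """Compare two pages' HTML source and flag them if they're at least `threshold` similar."""
    # autojunk=False stops difflib from ignoring very common characters in long strings.
    ratio = SequenceMatcher(None, source_a, source_b, autojunk=False).ratio()
    return ratio >= threshold


# A hypothetical page template: heavy shared navigation, a tiny amount of unique copy.
TEMPLATE = (
    "<html><head><title>Widgets</title></head><body>"
    + "<div class='nav'>menu item</div>" * 50
    + "<p>{}</p></body></html>"
)

red = TEMPLATE.format("Our red widget is great.")
blue = TEMPLATE.format("Our blue widget is different.")
print(looks_duplicate(red, blue))  # True: the shared template dominates the source code
```

Compare only the two sentences, without the template, and the same pair falls below the threshold.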

(4) Your Own Brain

Finally, it’s important to remember to use your own brain. Finding duplicate content often requires some detective work, and over-relying on tools can leave some gaps in what you find. One critical step is to systematically navigate your site to find where duplicates are being created. For example, does your internal search have sorts and filters? Do those sorts and filters get translated into URL variables, and are they crawlable? If they are, you can use the “site:” command to dig deeper. Even finding a handful of trouble spots using your own sleuthing skills can end up revealing 1000s of duplicate pages, in my experience.


I Hope That Covers It

If you’ve made it this far: congratulations – you’re probably as exhausted as I am. I hope that covers everything you’d want to know about the state of duplicate content in 2011, but if not, I’d be happy to answer questions in the comments. Dissenting opinions are welcome, too. Some of these topics, like pagination, are extremely tricky in practice, and there’s often not one “right” answer. Finally, if you liked my panda mini-poster, here’s a link to a larger version of Pandas Take No Prisoners.



SEOmoz Daily SEO Blog


Nov 04

Posted by Dr. Pete

Mom with obnoxious son

With every skirmish in the ongoing war over SEO hats, I inevitably hear someone say “I built great content, and no one cared – content marketing doesn’t work.” I’m not here to deny it – sometimes, “great” content falls flat on its face.

Part of the problem is that we throw around that word like it’s self-evident (“Build great content! Tada!”), but the other part is that we just don’t give our own content a chance to succeed. Too often, it’s not the fault of the content or even Google, but what we do (or don’t do) after we create that content. Here are a few ideas for evaluating “great” content and putting it into action…

Don’t Listen to Your Mom

Before you even start promoting your “great” content, take a minute to make sure it’s as good as you think it is. Have you ever seen an American Idol audition where some kid came out spouting how they were God’s gift to singing and dancing and then proceeded to look like Charlie Sheen doing a one-man show? Apparently, they never performed in front of anyone but their mom. Don’t trust your fans when it comes to the really important content. Find some critics and listen to them. The content that people will come back to time and time again usually didn’t get written in one draft.

What Does “Great” Mean?

Just the word “great” is a minefield of ambiguity. We all have some ability to judge quality, but too often our measures of greatness are based on hindsight – a blog post was “great” because it got a lot of traffic, Tweets, Likes, etc. I don’t think there’s any one recipe for great content, but I have seen some common themes, at least in my own content marketing successes. Most great content will match at least one of these:

(1) Great Content Has Credibility

As a consultant and subject-matter expert, my most successful content has been the pieces that really distill years of my own experience. Don’t cover a topic if you don’t know what you’re talking about. On the flip side, don’t underestimate the value of your own expertise, even if you think your subject matter is boring.

(2) Great Content Takes Real Effort

Not all great content has to cost a lot (plenty of unknown brands have proven that), but I think that most great content takes time and effort to create. If you know someone poured themselves into a piece, whether it’s a well-researched post, a well-edited video, or a gorgeous infographic, it says that they respect your time and intelligence. Real effort resonates with people. Respect your readers.

(3) Great Content Is Actionable

This is more a feature of informational content than link-bait, but great blog posts, for example, leave you walking away with something useful. Whether it’s SEO tactics, recipes, or home-improvement tips, if you leave with actionable knowledge, you’re going to remember that content. Give people useful information and help them put it into action.

(4) Great Content Begs to Be Shared

On the link-bait side, great content is something you instantly want to show others, whether it’s out of awe, disgust, or just to show that you’re cool. When you’re done creating a piece, are you eager to hit “publish” or are you just glad that it’s over and you can go home? Create content that you’re proud to share, not just because it might go viral, but because you’re the one who has to share it first (see below).

Market Your Marketing

The great irony of content marketing is that you have to market it. We’d all like to write content that everyone links to just by sheer virtue of its greatness. Some people will argue that this is “pure” and that marketing is somehow a stain on real greatness, but (pardon me) that’s bullshit. Wanting to be recognized solely for our virtues is nothing more than an ego trip. If you sit around waiting for a job because you think you’re a genius, but never apply or never talk to anyone, good luck. Your ego is in your way. The same goes for content. Content marketing requires marketing, and that starts with you.

(1) Reach Out to People

Remember what I said about creating content that you can’t wait to share? Well, here’s your chance. If you churn out crap just to build links, you’ll be embarrassed to tell people about it, and you should be. If you know you built something great, you’ll be eager to show your friends and peers. So, show them – contact people directly and let them know you have something great. Don’t just tweet it once and forget – email people, IM them, call if you have to.

(2) Time Your Launch

Too often, we put hours or days into a piece of content and then just hit “Publish” when it’s done, like 8pm on a Sunday when our whole industry is on planes to a conference that starts Monday morning. Plan your content publishing like you would plan a product launch: pre-announce that it’s coming, time your launch well, and don’t be afraid to re-announce. You’re not going to get anyone bent out of shape because you tweeted the same link in the morning AND the afternoon (as long as you don’t make a habit out of it). Only a small percentage of your followers are paying attention at any given moment.

Although I think timing depends a lot on your audience, Dan Zarrella has written some great posts on the science of timing content. HubSpot also has a tool called TweetWhen that you can use to see when you’re most likely to be re-tweeted.

(3) Have a Promotion Plan

It’s funny how we’ll pour our hearts and souls into a piece of content, but then, as soon as it’s finished, we’re on to the next project. Then, we wonder why no one cares. I have to admit, I’ve been guilty of this one too many times. Don’t forget the importance of what happens after you publish your content. Better yet, build a marketing plan that covers those next steps. Hit your social media outlets, actively build links, do guest-posts on relevant sites, etc. We see content go viral and assume it just happened by magic – 10% of the time, that may be true, but the other 90% someone hit the streets and made it happen.

(4) Post It Somewhere Else

It’s tough to put a lot of time into a piece of content and not let it live on your own site, but sometimes you need to go where your audience is. Take Oli Gardner’s massive Noob Guide to Online Marketing published earlier this year on SEOmoz. Oli could’ve easily posted this guide on Unbounce, but he opted to target a slightly different but still very relevant audience. Over 4,000 Tweets and almost 100K visits later, it’s hard to deny that this tactic had a positive impact for his reputation and company.

Greatness Isn’t Instant

One last tip: at the speed of the internet, we tend to think that every success is overnight. Some content takes days or even weeks to make its mark. I think the days of trying to make Digg’s home page left us with some bad habits, and one of those is giving up on content that doesn’t explode in the first hour after it’s published. It’s nice when it happens, but too often that explosion just left behind the charred remains of servers and nothing but some traffic logs to show for it.

If you believe your content is great, give it a chance. It could catch on because of a guest post, a well-placed link, an interview, or any of a hundred factors that happen in the days and weeks after the content goes live. Even if you finally decide it did fail, learn what you can from it. People want to bank everything on one-shot content, but even the best content marketers don’t succeed 100% of the time (I’d say they’re lucky to bat .200). Failed content still carries valuable information, and you can build the next piece of great content on top of it.



SEOmoz Daily SEO Blog


Oct 27

A HighRankings Forum thread has an interesting post by a site owner who wants more traffic.

He made a few statements that stood out to me, but one stood out more than the others…




Search Engine Roundtable


Oct 25

What exactly do I mean by that? Stoney deGeyter uses the analogy of being in the middle of a brick-and-mortar store without any sales assistants around to help. Looking up and down aisles, not finding an available employee within driving distance, I get the urge to shout, ‘I’m going to steal something!’ just to see if anyone cares. I can do him one better than that. Stuck in a store with employees that won’t help, I’ll walk right out the door and go visit their competitors, who are more than willing to give me all the assistance I need. So think about that for a second. If I’ll go to the effo…
SEO Chat – Search Engine Optimization Tutorials


Oct 20

Brafton’s infographic “Why Content for SEO?” explores how content is key to search engine visibility. They also have a related post that breaks down each area of the infographic. I like this stat: “76% of marketers who have strategic SEO campaigns in place invest in content creation.”




Search Engine Journal
