Jan 28

Two weeks ago, Google launched Search Plus Your World. Since then, Google has faced strong criticisms that SPYW is making its search relevancy worse and favoring its Google+ social network too much. Not so, says Google search chief Amit Singhal. Most Google users are happy, Singhal said. Of course,…



Please visit Search Engine Land for the full article.




Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing


Jan 28

Inspiration strikes in strange ways; in this case, I reached the end of Neil Patel’s excellent article covering 13 questions you should ask yourself while writing a blog post. In his discussion of the last question, he compared a blog post to a restaurant meal. Will your reader complain about your blog post because you’ve served them skimpy fare? Are you feeding people content so they are full when they leave your site…or are they hungry, looking for more? Patel asked. If they are still hungry, your readers probably won’t come back. So if your readers are devouring your blog content, th…
SEO Chat – Search Engine Optimization Tutorials


Jan 20

Posted by randfish

What happens when you have a page that ranks very well, but it isn't the page that pulls in the sales that you need? Oftentimes the page that does convert very well is "boring" and subsequently ranks poorly.

In this week's Whiteboard Friday, we are going to go over some strategies you can use to get those classically "boring" pages to rank well. Don't forget to leave your comments below. Enjoy!


Video Transcription

The transcription for this video will be coming soon.

Video transcription by Speechpad.com



SEOmoz Daily SEO Blog


Jan 28

Speaking today at the World Economic Forum in Davos, Switzerland, outgoing Google CEO Eric Schmidt deflected suggestions that Google is in competition with the likes of Apple and Facebook, and rejected speculation that those fights are the real reason that Larry Page is taking the CEO’s…



Please visit Search Engine Land for the full article.




Search Engine Land: News About Search Engines & Search Marketing


Oct 08

Posted by Aaron Wheeler

I’ve always liked encyclopedias; when I was in middle school I started using Encarta on CD-ROM, and sure, I usually needed it for "help" with my homework, but sometimes I would stray to non-copy-and-pasting-from-encyclopedia activities and watch terribly animated videos of war battles or Shakespearean plays. My poor children will never know the joys of a succinct five page article on the American Revolution with an accompanying 30-second 160 X 200 resolution video! I suppose they’ll have to make do with the way too informative Wikipedia article and an accompanying overly high-def retelling of events – do they really need to be able to see Benjamin Franklin’s hickeys?

Anyways, if my aforementioned future kids do end up needing to write about the American Revolution, and you have a great site about it, how can you make sure they end up seeing your content? There are a lot of reasons for why it can be hard to rank for reference content, but fortunately, Whiteboard Friday is here to help! This week, Rand discusses some great ways to get your reference content to the top of the SERPs.


Video Transcription

Howdy, SEOmoz fans! Welcome to another edition of Whiteboard Friday. This week we’re talking specifically about reference content, which is a type of content that often has a tough time earning external links, has a tough time getting rankings and visibility in the search results. Yet a lot of people are interested in it, both (a) from a searcher perspective and (b) as marketers who are interested in ranking for that type of topic so that they can draw in traffic to help brand their site, to sell advertising, to build themselves up as industry authorities, and sometimes even to make direct sales as they relate to that reference content.

So, let’s start with some tips, some specific action items that you can take that will help your reference content get more rankings. When I talk about reference content, I mean everything from, like, dictionary-type definitions to encyclopedic types of content to how-to content. Anything that is sort of less about a news item, an exciting development, or a blog post and more like a piece of content that is simply informational in nature and designed to provide sort of an evergreen long-term resource. It’s tough to get this stuff ranking, but I think we can help.

First off, let’s talk about keyword usage. As you’re building out this content, a lot of people think, "All right. I need to have a certain number of the target keywords and I’m going to use these keyword variations and I’m going to have this keyword density." I talk about the keyword density myth a lot of the time. The problem is, and I think one of the reasons it doesn’t resonate with folks or why people still say, "You know what, I think Rand is full of it on keyword density. It totally works," is because it is true that in many cases you can have a keyword that is used a certain number of times, a small number of times on a page and you can increase the number of times that it is used on a page and see the rankings go up. People say, "Well, that’s proof that keyword density works." In some semantic form, that is technically correct.

The problem is density itself is not necessarily, is almost certainly not the metric. Let’s say very certainly not the metric that search engines are using. So when you use that metric you might be conflating different variables. It could indeed be the case that adding more keywords and increasing technically what could be measured through density is helpful. But density itself is a bad way to measure things. What I’d urge you to instead think about is, "Am I hitting all of these items, and am I doing a good job with them?" If I am, chances are good that increasing my density, measuring my density, is going to add no value. Certainly, it’s the case that the search engines don’t measure it. We don’t want to be doing things that are sort of obviously known to be not used by the engines.

So, things like using the keyword element in the title, preferably at the beginning of the title, particularly for reference content, is really good. People want to see right in the title in the search engine results that your page is about the content that they’re searching for. In the H1 headline. The H1 headline may not help all that much, or specifically using the H1 tag to designate your headline, as opposed to just having it big, bold, and at the top of the page, may not help that much. So if it is a pain, I wouldn’t worry about it. But if you can, it is sort of a nice, good semantics thing to do. Good web standards.

Certainly, having it in the headline, whether you’re using the H1 tag or not, is important because when someone clicks on that result and reaches your page, you want to reinforce the notion right there at the top of the page, in the headline, that this content that they’ve reached is about what they searched for and it is what they just clicked on. When you have the disconnect between those words and phrases, I really worry that a lot of times your bounce rate will increase, you’ll see people leaving the page. It makes good sense from a usability perspective.

The meta description is certainly a good place to use it. It will get bolded and highlighted in the search result, even though it doesn’t directly help with rankings.

The URL, same story. Although URLs do seem to have some nice correlation. It looked like in our ranking models that they have some causation influencing that. Certainly, you can see that when you change over to search friendly URLs that use the keywords in there those are very nice for SEO purposes as well.

The body tag usage. This is where people get super obsessed with keyword density. Most of the time, unless your article is really huge, I don’t worry very much about keyword density or the number of times you use it. You use it a few times, you use it the number of times that it makes sense in the document — two, three, four, five, six, right. Those are fine. But I wouldn’t obsess about like, "Okay. Wait. I think we have it nine times here. We should only have it eight because the average of the top ten is that they’re only using it X many times." Get out of town. Like, no way, man. This stuff is not helping here. It is good to use it in the body tag.

It also is surprisingly good to use it in things like the image Alt tag and in the file name of an image that’s on the page. I don’t know what it is. It could just be correlation. It might not be causation, but it turns out that the image Alt tag is more highly correlated than H1s are. So, maybe it’s just the case that people like having images that are on the topic. Or maybe the search engines actually do have a preference about this kind of stuff.
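To make that checklist concrete, here is a minimal sketch that reports whether a keyword shows up in the places just discussed: the title, the H1, the meta description, the URL, and image alt text. It uses only Python's standard-library `html.parser`. The names `KeywordPlacementParser` and `keyword_placement`, the sample page, and the example.com URL are all made up for illustration. It deliberately does not count occurrences or compute density.

```python
from html.parser import HTMLParser


class KeywordPlacementParser(HTMLParser):
    """Collects the text of the page locations mentioned in the transcript."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "h1": "", "meta_description": "", "img_alt": ""}
        self._open = None  # name of the text-bearing tag we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._open = tag
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.fields["meta_description"] += attrs.get("content") or ""
        elif tag == "img":
            self.fields["img_alt"] += " " + (attrs.get("alt") or "")

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open:
            self.fields[self._open] += data


def keyword_placement(html, url, keyword):
    """Return a dict of booleans: does the keyword appear in each location?"""
    parser = KeywordPlacementParser()
    parser.feed(html)
    kw = keyword.lower()
    report = {name: kw in text.lower() for name, text in parser.fields.items()}
    report["title_starts_with"] = parser.fields["title"].strip().lower().startswith(kw)
    # URLs usually hyphenate words, so compare against a hyphenated form.
    report["url"] = kw.replace(" ", "-") in url.lower()
    return report


# Hypothetical page and URL, for illustration only.
page = """<html><head><title>American Revolution: A Short History</title>
<meta name="description" content="A readable overview of the American Revolution.">
</head><body><h1>The American Revolution</h1>
<img src="american-revolution-map.png" alt="Map of the American Revolution">
</body></html>"""

report = keyword_placement(page, "http://example.com/american-revolution", "American Revolution")
for location, present in report.items():
    print(location, present)
```

The report is a yes/no per location on purpose: the point above is to hit each of these items once and do it well, not to tune how many times the phrase appears.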

You should definitely be worried about readability. If a normal, average user comes to the page and they read it, but the material is not connecting with them and doesn’t make great sense, get out of there. It’s trouble. This is one of the ways that SEOs and people in independent websites can really compete with Wikipedia, which is oftentimes hard to read, hard to parse, hard to understand, not tremendously well written. It’s written by a group of authors a lot of the time. A lot of the material can be dense. The same goes for a lot of professionally published content that just isn’t as accessible.

Completeness. So, one of the things that I definitely think about and this relates back to sort of topic modeling and LDA stuff to whatever extent that’s being used. Certainly it seems like it is being used to some substantive effect, but we don’t know exactly how much. Being able to comprehensively cover the topic that you’re talking about will mean that more people like your content, reference it, use it, enjoy it, share it with their friends, and it means that they are getting value out of it, which means that metrics like time on site and browse rate will go up, which might help your SEO, might not help your SEO, but will certainly help your site metrics. You care about those, too.

Then, I think a lot about the angle that you’re taking with your writing. Things like, I’m going to take a research-driven angle, or I’m going to take an opinion-driven angle, or I’m going to take sort of a showing all the different controversial sides of this, or I am going to walk through the history of this. Having that angle that is sort of unique and people say, "Wow, when I visit SEOmoz, I feel like I get a really thorough understanding of all the issues around a particular topic. Or I get a very opinionated piece from Rand about what he thinks about a particular SEO tactic and how people have used it. Then I get different sorts of opinions in the comments." That angle that you take can brand your site, brand your domain, and your company as having useful information on that topic. All of these things are far better to think about. If you nail those, you’re going to win out over keyword density.

Next item that you do have to worry about with reference content is architecture — internal architecture and internal linking. We talk about this ideal link architecture, the ideal pyramid, a lot. You start with your home page. If you can do this thing where you’ve got a hundred links approximately-ish per page, a hundred unique links, and that’s linking down to the second level with all of your categories and each of those are linking down to subcategories, you can get to a million pages in just one, two, three hops. Three hops from any single page on a site to a million subpages means that even the most robust quantity of reference content can be reached in a small number of clicks. That portends really good things for search engines and for users who are trying to parse through your material and potentially surf your site.
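The arithmetic behind that pyramid is easy to check. The sketch below assumes an idealized site where every page carries the same number of unique links and no two pages link to the same child, which no real site achieves. The function names are made up for illustration.

```python
def pages_reachable(links_per_page, hops):
    """Pages sitting exactly `hops` clicks from the top of an ideal pyramid,
    assuming every page links to `links_per_page` pages nothing else links to."""
    return links_per_page ** hops


def hops_needed(total_pages, links_per_page):
    """Smallest number of hops whose deepest level can hold `total_pages`."""
    hops = 0
    capacity = 1
    while capacity < total_pages:
        capacity *= links_per_page
        hops += 1
    return hops


for depth in range(1, 4):
    print(depth, "hops:", pages_reachable(100, depth), "pages")

print("hops for a million pages:", hops_needed(1_000_000, 100))
```

With roughly a hundred links per page, the levels hold 100, 10,000, and 1,000,000 pages, which is where the "million pages in three hops" figure comes from. Dropping to 50 links per page pushes a million pages out to four hops.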

This is a great way to think about organizing your site. You’re never going to get to this perfect layer, but if you can think about this organization as a structure as you’re planning, it would be very helpful. You don’t have to do this with your home page either. If you think about something like a sitemap, an HTML sitemap on your site that you link to in the footer of every page and that page links to all of these and then they all link to these, you’ve accomplished the same thing. You’ve basically made it three or maybe four hops from any page on your site to a million pages. That’s a really good thing.

You should also be thinking about things like using categories and subcategories intelligently. You can’t just be listing content. Those categories and subcategory pages have to be useful and valuable in and of themselves. We’ve talked about that a little bit in the past here on SEOmoz, too. The relevance and usefulness of those pages is going to predict whether they themselves can draw in links. If these pages can draw in external links, you know that’s going to help all the pages that they point to down below to rank better, to earn more linkages and page rank and trust. Those metrics that will flow down through a site.

I think it is very important and very wise to look at models like what Wikipedia has done and NY Times has done, what About.com has done, with cross-referencing content at deep levels. When you get to these deep pages down here and it has a link back up to that category and over to this page which it’s referencing in the content, that’s super useful from a visitor’s standpoint because they’ll click more. You might have a higher browse rate, a higher pages per session, as well as driving SEO value in that the search engines might see this one or see that this is linked to and then follow those links out from there, pass more link juice and more crawling power across those pages.

The last one, and I know the most challenging one, is earning external links. Reference content, are you kidding me? It just doesn’t get linked to, you know. How are you going to win with this stuff? But there are ways. Successful companies have done really good things on this front. The first one I recommend is from the content perspective. Multimedia content, visual explanations, these kinds of things rock. I was pointing today on Twitter to a post from King Arthur Flour. Can you think of a more boring company? King Arthur Flour? Are you kidding me? They have an amazing blog. Their blog has earned hundreds, thousands of links because they’ve produced these blog posts that are sort of reference content about how to bake French bread and how to do no-knead bread. What they do is make them highly multimedia intensive. So, every step of the way they’ve got photo after photo after photo after photo. Tons of comments. People just loving it to death. Granted, you know, they’re in a moderately interesting area of recipes, but it is super competitive, and yet they rank for this stuff. They’re able to draw people in. And they can show off the fact that, you know, King Arthur Flour is sort of very highly rated for this kind of thing by other professional chefs, etc. Those visual explanations, the video content, they rock, right. You’re watching Whiteboard Friday, huh?

Next piece that I really like is doing things with research content as well as like charts, graphs, and data. Even if you take your data from third party sources and you reference back to it, if you’re the one who produces the actual visual chart, other people who want to embed that chart, want to use it in presentations, want to use it in blog posts, who want to talk about it, are going to use your materials. You can check out the SEOmoz free charts section where we take a bunch of data that’s from sources like Eightfold Logic and comScore and Nielsen and Hitwise, put them all together, and then put them into interesting charts that other people can reference and embed on their pages. Of course, they’ll link back to those original sources, as well as to us. Those are great ways to get your reference content to actually earn those links.

The last one, two methods to kind of go out there and do distribution. Those are licensing and translation. These tactics are ideal because you’ll see all these other sites that are copying your content are linking in to your work, referencing back to that original. That is going to provide for the fact that even though these might be technically duplicate content, when the engines see them referencing your single source, especially multiples referencing your single source, they’re going to know this is the original. You can do this with licensing where you say, "Hey, I know you are in this industry and you’d like to license out some content. I’ll be the reference resource for you. You can put this stuff on your site."

It’s brilliant, too, for translation. As the Web is getting more global, more people are interested in this. More people are trying to rank for search content in all sorts of other countries. You can say, "Oh, buon giorno! Would you like to translate this piece into Italiano?" Right? Those kinds of things are absolutely phenomenal.

By the way, I had a great time in Milan with some friends from WebRanking.it and Marco, exceptional experience. The Social Media Conference there had 25,000 people come to it. It’s insane. People care about SEO overseas, and you can leverage that to get these translated articles out there on the Web and then to have the links point back to you. What does it look like to Google when ten sites from all over the world are all pointing back to your reference articles? It looks like you’re going to win at SEO.

All right, everyone. I hope you’ve enjoyed this edition of Whiteboard Friday. I hope you’ll join us again next week for another one. Take care.

Video transcription by SpeechPad.com



Follow SEOmoz on Twitter! Also, you can always follow me, Aaron.

If you have any tips or advice that you’ve learned along the way, or if you also love pretty much anything HBO produces, we’d love to hear about it in the comments below. Post your comment and be heard!



SEOmoz Daily SEO Blog


Sep 11

Posted by randfish

Last week at our annual mozinar, Ben Hendrickson gave a talk on a unique methodology for improving SEO. The reception was overwhelming – I’ve never previously been part of a professional event where thunderous applause broke out not once but multiple times in the midst of a speaker’s remarks.

Ben Hendrickson of SEOmoz speaking at the London Distilled/SEOmoz PRO Training
Ben Hendrickson speaking last fall at the Distilled/SEOmoz PRO Training London
(he’ll be returning this year)

I doubt I can recreate the energy and excitement of the 320-person filled room that day, but my goal in this post is to help explain the concepts of topic modeling, vector space models as they relate to information retrieval and the work we’ve done on LDA (Latent Dirichlet Allocation). I’ll also try to explain the relationship and potential applications to the practice of SEO.

A Request: Curiously, prior to the release of this post and our research publicly, there have been a number of negative remarks and criticisms from several folks in the search community suggesting that LDA (or topic modeling in general) is definitively not used by the search engines. We think there’s a lot of evidence to suggest engines do use these, but we’d be excited to see contradicting evidence presented. If you have such work, please do publish!

The Search Rankings Pie Chart

Many of us are likely familiar with the ranking factors survey SEOmoz conducts every two years (we’ll have another one next year and I expect some exciting/interesting differences). Of course, we know that this aggregation of opinion is likely missing out on many factors and may over or under-emphasize the ones it does show.

Here’s an illustration I created for a presentation recently to help illustrate the major categories in the overall results:

Illustration of Ranking Factors Survey Data

This suggests that many SEOs don’t ascribe much weight to on-page optimization

I myself have often felt that from all the metrics, tests and observations of Google’s ranking results, the importance of on-page factors like keyword usage or TF*IDF (explained below) is fairly small. Certainly, I’ve not observed many results, even in low competitive spaces, where one can simply add in a few more repetitions of the keyword, maybe toss in a few synonyms or "related searches" and improve rankings. This experience, which many SEOs I’ve talked to share, has led me to believe that linking signals are an overwhelming majority of how the engines order results.

But, I love to be wrong.

Some of the work we’ve been doing around topic modeling, specifically using a process called LDA (Latent Dirichlet Allocation), has shown some surprisingly strong results. This has made me (and I think a lot of the folks who attended Ben’s talk last Tuesday) question whether it was simply a naive application of the concept of "relevancy" or "keyword usage" that gave us this biased perspective.

Why Search Engines Need Topic Modeling

Some queries are very simple – a search for "wikipedia" is non-ambiguous, straightforward and can be effectively returned by even a very basic web search engine. Other searches aren’t nearly as simple. Let’s look at how engines might order two results – a simple problem most of the time that can be somewhat complex depending on the situation.

Query for Batman

Query for Chief Wiggum

Query for Superman

Query for Pianist

For complex queries or when relating large quantities of results with lots of content-related signals, search engines need ways to determine the intent of a particular page. Just because a page mentions the keyword 4 or 5 times in prominent places, or even mentions similar phrases/synonyms, doesn’t necessarily mean that it’s truly relevant to the searcher’s query.

Historically, lots of SEOs have put effort into this process, so what we’re doing here isn’t revolutionary, and topic models, LDA included, have been around for a long time. However, no one in the field, to our knowledge, has made a topic modeling system public or compared its output with Google rankings (to help see how potentially influential these signals might be). The work Ben presented, and the really exciting bit (IMO), is in those numbers.

Term Vector Spaces & Topic Modeling

Term vector spaces, topic modeling and cosine similarity sound like tough concepts, and when Ben first mentioned them on stage, a lot of the attendees (myself included) felt a bit lost. However, Ben (along with Will Critchlow, whose Cambridge mathematics degree came in handy) helped explain these to me, and I’ll do my best to replicate that here:

Simplistic Term Vector Model

In this imaginary example, every word in the English language is related to either "cat" or "dog," the only topics available. To measure whether a word is more related to "dog," we use a vector space model that creates those relationships mathematically. The illustration above does a reasonable job showing our simplistic world. Words like "bigfoot" are perfectly in the middle with no more closeness to "cat" than to "dog." But words like "canine" and "feline" are clearly closer to one than the other and the degree of the angle in the vector model illustrates this (and gives us a number).

BTW - in an LDA vector space model, topics wouldn’t have exact label associations like "dog" and "cat" but would instead be things like "the vector around the topic of dogs."

Unfortunately, I can’t really visualize beyond this step, as it relies on taking the simple model above and scaling it to thousands or millions of topics, each of which would have its own dimension (and anyone who’s tried knows that drawing more than 3 dimensions in a blog post is pretty hard). Using this construct, the model can compute the similarity between any word or groups of words and the topics it’s created. You can learn more about this from Stanford University’s posting of Introduction to Information Retrieval, which has a specific section on Vector Space Models.
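For the two-topic toy world, the "number" the angle gives us is the cosine of the angle between two vectors. Here is a minimal sketch. The coordinates for each word are invented purely for illustration and do not come from any real topic model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so call it unrelated
    return dot / (norm_a * norm_b)


# Axes are (cat, dog). These word coordinates are made-up illustrative values.
CAT = (1.0, 0.0)
DOG = (0.0, 1.0)
words = {
    "feline": (0.9, 0.1),
    "canine": (0.1, 0.9),
    "bigfoot": (0.5, 0.5),
}

for word, vector in words.items():
    print(word,
          "cat:", round(cosine_similarity(vector, CAT), 3),
          "dog:", round(cosine_similarity(vector, DOG), 3))
```

The same function works unchanged on vectors with thousands of dimensions, which is the part that can't be drawn.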

Correlation of our LDA Results w/ Google.com Rankings

Over the last 10 months, Ben (with help from other SEOmoz team members) has put together a topic modeling system based on a relatively simple implementation of LDA. While it’s certainly challenging to do this work, we doubt we’re the first SEO-focused organization to do so, though possibly the first to make it publicly available.

When we first started this research, we didn’t know what kind of an input LDA/topic modeling might have on search engines. Thus, on completion, we were pretty excited (maybe even ecstatic) to see the following results:

 

Correlation Between Google.com Rankings and Various Single Metrics
Spearman Correlation of LDA, Linking IPs and TF*IDF

(the vertical blue bars indicate standard error in the diagram, which is relatively low thanks to the large sample set)

Using the same process we did for our release of Google vs. Bing correlation/ranking data at SMX Advanced (we posted much more detail on the process here), we’ve shown the Spearman correlations for a set of metrics familiar to most SEOs against some of the LDA results, including:

  • TF*IDF – the classic term weighting formula, TF*IDF measures keyword usage in a more accurate way than a more primitive metric like keyword density. In this case, we just took the TF*IDF score of the page content that appeared in Google’s rankings
  • Followed IPs – this is our highest correlated single link-based metric, and shows the number of unique IP addresses hosting a website that contains a followed link to the URL. As we’ve shown in the past, with metrics like Page Authority (which uses machine learning to build more complex ranking models) we can do even better, but it’s valuable in this context to just think and compare raw link numbers.
  • LDA Cosine – this is the score produced from the new LDA labs tool. It measures the cosine similarity of topics between a given page or content block and the topics produced by the query.
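To show why TF*IDF is described as more accurate than keyword density, here is a minimal sketch of one common textbook variant: term frequency as a share of the document, times the log of inverse document frequency. This is an assumption for illustration, not necessarily the exact weighting used in our correlation work, and the three-document corpus is invented.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """TF*IDF of `term` in one tokenized document, relative to `corpus`
    (a list of tokenized documents). One common variant among several."""
    if not doc_tokens:
        return 0.0
    # The TF half is essentially keyword density: occurrences / total words.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    docs_with_term = sum(1 for doc in corpus if term in doc)
    if docs_with_term == 0:
        return 0.0
    # The IDF half discounts words that appear in most documents.
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf


# Tiny made-up corpus, for illustration only.
docs = [
    "the dog chased the ball".split(),
    "the cat ignored the dog".split(),
    "the cat sat on the mat".split(),
]

for term in ("the", "dog", "ball"):
    print(term, round(tf_idf(term, docs[0], docs), 4))
```

In the first document "the" has the highest density (two of five words) yet scores zero because it appears in every document, while "ball" appears once and scores highest because it is rare in the corpus.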

The correlation of the LDA scores with rankings is uncanny. Certainly, they’re not a perfect correlation, but that shouldn’t be expected given the supposed complexity of Google’s ranking algorithm and the many factors therein. But, seeing LDA scores show this dramatic result made us seriously question whether there was causation at work here (and we hope to do additional research via our ranking models to attempt to show that impact). Perhaps good links are more likely to point to pages that are more "relevant" via a topic model, or some other aspect of Google’s algorithm that we don’t yet understand naturally biases towards these.
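For readers who haven't met Spearman correlation before: it is the ordinary (Pearson) correlation computed on ranks rather than raw values, so it only asks whether one list tends to rise when the other does. Below is a minimal standard-library sketch. The five positions and scores are invented for illustration and are not data from our study.

```python
import math


def average_ranks(values):
    """1-based ranks, smallest value first; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: ranking positions 1-5 and a metric score for each result.
positions = [1, 2, 3, 4, 5]
scores = [0.62, 0.55, 0.58, 0.31, 0.40]

# Position 1 is best, so negate positions to make "higher is better" on both sides.
rho = spearman([-p for p in positions], scores)
print(round(rho, 3))
```

A value of 1.0 would mean the scores fall in exactly the same order as the rankings. This toy list has two adjacent pairs out of order, so it comes in below that.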

However, given that many SEO best practices (e.g. keywords in title tags, static URLs and ) have dramatically lower correlations and the same difficulties proving causation, we suspect a lot of SEO professionals will be deeply interested in trying this approach.

The LDA Labs Tool Now Available; Some Recommendations for Testing & Use

We’ve just recently made the LDA Labs tool available. You can use this to input a word, phrase, chunk of text or an entire page’s content (via the URL input box) along with a desired query (the keyword term/phrase you want to rank for) and the tool will give back a score that represents the cosine similarity in a percentage form (100% = perfect, 0% = no relationship).

LDA Topics Tool

When you use the tool, be aware of a few issues:

  • Scores Change Slightly with Each Run
    This is because, like a pollster interviewing 100 voters in a city to get a sense of the local electorate, we check a sample of the topics a content+query combo could fit with (checking every possibility would take an exceptionally long time). You can, therefore, expect the percentage output to fluctuate 1-5% each time you check a page/content block against a query.
  • Scores are for English Only
    Unfortunately, because our topics are built from a corpus of English language documents, we can’t currently provide scores for non-English queries.
  • LDA isn’t the Whole Picture
    Remember that while the average correlation is in the 0.33 range, we shouldn’t expect scores for any given set of search results to go in precisely descending order (a correlation of 1.0 would suggest that behavior).
  • The Tool Currently Runs Against Google.com in the US only
    You should be able to see the same results the tool extracts from by using a personalization-agnostic search string like http://www.google.com/xhtml?q=my+search&pws=0
  • Using Synonyms, "Related Searches" or Wonder Wheel Suggestions May Not Help
    Term vector models are more sophisticated representations of "concepts" and "topics," so while many SEOs have long recommended using synonyms or adding "related searches" as keywords on their pages and others have suggested the importance of "topically relevant content" there haven’t been great ways to measure these or show their correlation with rankings. The scores you see from the tool will be based on a much less naive interpretation of the connections between words than these classic approaches.
  • Scores are Relative (20% might not be bad)
    Don’t presume that getting a 15% or a 20% is always a terrible result. If the folks ranking in the top 10 all have LDA scores in the 10-20% range, you’re likely doing a reasonable job. Some queries simply won’t produce results that fit remarkably well with given topics (which could be a weakness of our model or a weirdness about the query itself).
  • Our Topic Models Don’t Currently Use Phrases
    Right now, the topics we construct are around single word concepts. We imagine that the search engines have probably gone above and beyond this into topic modeling that leverages multi-word phrases, too, and we hope to get there someday ourselves.
  • Keyword Spamming Might Improve Your LDA Score, But Probably Not Your Rankings
    Like anything else in the SEO world, manipulatively applying the process is probably a terrible idea. Even if this tool worked perfectly to measure keyword relevance and topic modeling in Google, it would be unwise to simply stuff 50 words over and over on your page to get the highest LDA score you could. Quality content that real people actually want to find should be the goal of SEO, and Google’s almost certainly sophisticated enough to determine the difference between junk content that matches topic models and real content that real users will like (even if the tool’s scoring can’t do that).
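
The first caveat in the list above (scores change slightly with each run) follows from sampling. The sketch below only illustrates the pollster analogy and is not the tool's actual procedure: it assumes a made-up population of per-topic scores and averages a random sample of 100 of them on each run. All names and numbers are hypothetical.

```python
import random
import statistics


def sampled_score(per_topic_scores, sample_size, rng):
    """Estimate the average score from a random sample instead of every topic."""
    sample = rng.sample(per_topic_scores, sample_size)
    return statistics.mean(sample)


rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Made-up per-topic similarity percentages (assumption for illustration only).
all_scores = [rng.uniform(0, 60) for _ in range(10_000)]
exact = statistics.mean(all_scores)

# Five "runs" of the same content+query, each polling a different sample.
runs = [sampled_score(all_scores, 100, rng) for _ in range(5)]
for i, estimate in enumerate(runs, start=1):
    print(f"run {i}: {estimate:.1f}% (average over all topics: {exact:.1f}%)")
```

Each run lands near the full average but not exactly on it, which is the same kind of small run-to-run movement described above. A larger sample would narrow the spread at the cost of more computation.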

If you’re trying to do serious SEO analysis and improvement, my suggested methodology is to build a chart something like this:

SERPs analysis of "SEO" in Google.com w/ Linkscape Metrics + LDA

Right now, you can use Keyword Difficulty’s export function and then add in some of these metrics manually (though in the future, we’re working towards building this type of analysis right into the web app beta).

Once you’ve got a chart like this, you can get a better sense of what’s propping up your competitors’ rankings – anchor text, domain authority, or maybe something related to topic modeling relevancy (which the LDA tool could help with).
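
One rough way to read a chart like this is to compute a rank correlation between each metric column and ranking position. The sketch below uses Spearman's rank correlation, which is an assumption on our part about a reasonable measure here, and every number in it is invented for illustration; the column names are hypothetical stand-ins for whatever you export.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) upward, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented top-10 data for one query (not real measurements).
positions = list(range(1, 11))
metrics = {
    "lda_score": [62, 48, 55, 30, 41, 35, 22, 28, 18, 25],
    "domain_authority": [90, 70, 85, 88, 60, 75, 65, 50, 72, 55],
    "anchor_text_links": [120, 300, 80, 150, 40, 90, 60, 200, 30, 70],
}

# Negate position so that a positive correlation means
# "higher metric goes with a better ranking".
goodness = [-p for p in positions]
correlations = {name: spearman(col, goodness) for name, col in metrics.items()}
for name, value in correlations.items():
    print(f"{name}: {value:+.2f}")
```

A metric whose correlation sits well above the others for a given results page is a candidate for what is propping up those rankings, though ten rows is a small sample and correlation alone does not show cause.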

Undoubtedly, Google’s More Sophisticated than This

While the correlations are high, and the excitement around the tool both inside SEOmoz and from a lot of our members and community is equally high, this is not us "reversing the algorithm." We may have built a great tool for improving the relevancy of your pages and helping to judge whether topic modeling is another component in the rankings, but it remains to be seen if we can simply improve scores on pages and see them rise in the results.

What’s exciting to us isn’t that we’ve found a secret formula (LDA has been written about for years and vector space models have been around for decades), but that we’re making a potentially valuable addition to the parts of SEO we’ve traditionally had little measurement around.

BTW – Thanks to Michael Cottam, who suggested the reference of research work by a number of Googlers on pLDA. There are hundreds of papers from Google and Microsoft (Bing) researchers around LDA-related topics, too, for those interested. Reading through some of these, you can see that major search engines have almost certainly built more advanced models to handle this problem. Our correlation and testing of the tool’s usefulness will show whether a naive implementation can still provide value for optimizing pages.

For those who’d like to investigate more, we’ve made all of our raw data available here (in XLS format, though you’ll need a more sophisticated model to do LDA). If you have interest in digging into this, feel free to email Ben at SEOmoz dot org.

How Do I Explain this to the Boss/Client?

The simplest method I’ve found is to use an analogy like:

If we want to rank well for "the rolling stones" it’s probably a really good idea to use words like "Mick Jagger," "Keith Richards," and "tour dates." It’s also probably not super smart to use words like "rubies," "emeralds," "gemstones," or the phrase "gathers no moss," as these might confuse search engines (and visitors) as to the topic we’re covering.

This tool tries to give a best guess number about how well we’re doing on this front vs. other people on the web (or sample blocks of words or content we might want to try). Hopefully, it can help us figure out when we’ve done something like writing about the Stones but forgetting to mention Keith Richards.

As always, we’re looking forward to your feedback and results. We’ve already had some folks write in to us saying they used the tool to optimize the contents of some pages and have seen dramatic rankings boosts. As we know, that might not mean anything about the tool itself or the process, but it certainly has us hoping for great things.

p.s. The next step, obviously, is to produce a tool that can make recommendations on words to add or remove to help improve this score. That’s certainly something we’re looking into.

p.p.s. We’re leaving the Labs LDA tool free for anyone to use for a while, as we’d love to hear what the community thinks of the process and want to get as broad input as possible. Future iterations may be PRO-only.



SEOmoz Daily SEO Blog
