The Absolute First Step to Optimizing Your Website
Posted by Al Scillitani on February 26, 2008 – 7:24 am
I bet you are thinking this article is about title tags, meta tags, keyword research, or even content. It’s not. It is about making sure your site is as search engine friendly as possible and is not getting penalized by the engines. You can optimize all you want, but if your site is not bot friendly, it may not matter much.
Here are some steps to take to check your site:
Google Webmaster Tools
Go into your Google account, validate your site, and add a Google sitemap. https://www.google.com/webmasters/tools/docs/en/about.html
The information you get back is invaluable.
Is your site all Flash/Ajax?
It is OK to have Flash and Ajax on your site. However, it is not OK to have an all-Flash page without any text or text links, nor is it OK to have an all-Ajax site that goes from page to page without the url changing.
Dynamic urls “?” static pages only
The engines are doing a much better job of recognizing dynamic urls. What matters is that the url stays the same for each page. If the product on the page is green widgets and the url is www.yourdomain.com/?12345.aspx, this is not a problem as long as the page stays ?12345.aspx with that specific product on it every time you, or someone else, goes to the page.
Search engine friendly urls
With dynamic urls, best practice is to use a url re-write to name the urls with proper keywords. In the example above, the url should be www.yourdomain.com/green-widgets.aspx
According to others who have tested dashes vs. underscores vs. keywords run up against each other, the order of preference for SEO is dashes first (green-widgets), then keywords up against each other (greenwidgets), then underscores (green_widgets). Recently there was some controversy when Matt Cutts stated that underscores are recognized equally in Google. The hoopla did not affect me at all because I would never use underscores anyway. Underscores are horrible for usability reasons. If you send the link to someone, or someone links to your page in an article, it is hard to tell whether it is an underscore or a space because the link gets underlined.
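As a rough illustration of the dash-separated naming described above, here is a minimal Python sketch that turns a product name into a url-friendly slug. The function name `slugify` and the sample inputs are my own assumptions for the example. In practice the url re-write itself happens in your web server or application framework.

```python
import re


def slugify(name: str) -> str:
    """Turn a product name into a lowercase, dash-separated url slug."""
    # Replace every run of characters that are not letters or digits
    # (spaces, underscores, punctuation) with a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    # Trim any dashes left over at either end.
    return slug.strip("-")


print(slugify("Green Widgets"))   # green-widgets
print(slugify("green_widgets"))   # green-widgets
```

Underscores are converted to dashes as well, which matches the preference order given above.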
Session IDs
If there is any way to get rid of them, do it. Similar to dynamic urls, session IDs used to be a complete nightmare for the bots. Google is getting better, but even if Google figures them out, it may take years for the other engines to catch up. If the session ID changes the url every time someone visits, the bots may recognize these as duplicate pages or get stuck trying to index the site. I think there is a way around this with PHP. Maybe someone can comment if they know PHP?
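To see why session IDs look like duplicate pages, it can help to strip them out and compare what is left. The sketch below is Python rather than PHP and does not remove session IDs from your site. It only shows that two urls differing only by session ID point to the same page. The parameter names in `SESSION_PARAMS` are assumptions (PHPSESSID is PHP's default session name, and the others are common guesses), so adjust them to whatever your platform uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; change these to match your own site.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid", "jsessionid"}


def strip_session_id(url: str) -> str:
    """Return the url with any session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


a = strip_session_id("http://www.yourdomain.com/widgets.php?id=7&PHPSESSID=abc123")
b = strip_session_id("http://www.yourdomain.com/widgets.php?id=7&PHPSESSID=zzz999")
print(a)        # http://www.yourdomain.com/widgets.php?id=7
print(a == b)   # True
```

If two visits produce urls that only match after stripping, a bot is likely to see them as two different pages with the same content.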
Duplicate content & Crosslinking
In the Google search box, type in related:yourdomain.com
and look at the results. Do you have orphaned pages, doorway pages, duplicate pages/sites?
Now type in site:yourdomain.com
and look at the results. Do you have orphaned pages, doorway pages, duplicate pages/sites? Go all the way to the last page. Are you getting an Omitted Results link? If so, click on it and look through those links again. Check to make sure you do not have duplicate https and http pages, duplicate meta tags, content, etc.
Hidden content
In your browser, go to Edit and “Select All.” Look for images, links, and pixels that get highlighted but do not appear when nothing is selected. The engines treat these as “hidden” content, and your site will get penalized for them if it hasn’t been already.
Javascript hiding
Google does not follow JavaScript links. However, Google does READ JavaScript and checks your external files to see if you are trying to hide content or trick the bots.
Frames
I haven’t seen a site in frames for a while. If your site is in frames there are things you can do to optimize it, but I am not even going to go into it. Put your Commodore 64 away, buy Dreamweaver or other web software, and create a new functional, user friendly website.
Robots.txt
In your browser, type in yourdomain.com/robots.txt. Are you blocking categories that you shouldn’t be? Are you allowing the bots to index categories that they shouldn’t?
One of the first things an engine’s bot does is check this file to see where to go and where not to go.
http://www.robotstxt.org/robotstxt.html
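If you would rather check a list of paths than read the file by eye, Python's standard library includes a robots.txt parser. The sketch below is only an illustration. The robots.txt content and the paths are made up, and `blocked_paths` is a helper name I chose for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /admin/
"""


def blocked_paths(robots_txt: str, paths, agent: str = "*"):
    """Return the paths that the given robots.txt blocks for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


print(blocked_paths(ROBOTS_TXT, ["/green-widgets.aspx", "/checkout/cart", "/admin/"]))
# ['/checkout/cart', '/admin/']
```

Feed it the pages you want indexed, and anything that comes back in the list is being blocked.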
Server status & Redirects 301
Go to www.rexswain.com and click on HTTP Viewer. Type in your url. The results show how long it takes your server to return the site and whether or not you have any redirects. If you have a redirect, in most cases it should be a 301 redirect. A 301 redirect tells the engines that the move is permanent. This helps in 2 ways. First, it prevents duplicate content. A 302 redirect is a temporary redirect, so the engines may still keep both the old page and the new page in the index. Second, a 301 redirect tells the engines to take all the backlinks from the original page and carry them to the new page. Check old interior pages as well. They should 301 to the new page.
Stay away from meta, javascript, and other redirects.
Check your non-www site as well. Your domain without the www in front of it should 301 to the www url. If it does not, I believe it is a simple change in Apache. In IIS it is a little more difficult, but it can be done.
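The same check can be scripted. This is a minimal Python sketch, not a replacement for the HTTP Viewer. It makes one request without following redirects and reports the status code and the Location header. The helper names `first_response` and `describe_redirect` are my own, and yourdomain.com is the placeholder used throughout this article.

```python
import http.client


def first_response(host: str, path: str = "/"):
    """Make one HEAD request and return (status, Location header).

    http.client does not follow redirects, so this shows the first answer
    the server gives, which is what a bot sees first.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe_redirect(status: int) -> str:
    """Give a plain-language reading of an HTTP status code."""
    if status in (301, 308):
        return "permanent redirect"
    if status in (302, 303, 307):
        return "temporary redirect, consider a 301"
    if 200 <= status < 300:
        return "no redirect"
    return "other status, investigate"


# Example (needs network access; replace the placeholder with your own domain):
# status, location = first_response("yourdomain.com")
# print(status, describe_redirect(status), location)
```

Run it against the non-www version of your domain. You are hoping for a permanent redirect whose Location is the www url.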
Textlink sitemap
Mandatory, no. Good idea, yes. Make it as easy as possible for the engines to find your pages. A link to your site map and a simple format may expedite the engines finding your interior pages. A good example for a large site is http://www.apple.com/sitemap/
404 Error page
Like the sitemap, not mandatory, but may help the bots, and customers, stay on your site. A good example for a large site is http://www.apple.com/404error
Webpage optimizer
I really like this tool. It tells you download times and areas needing improvement. It is currently a free tool and I am not affiliated with them, or any other site mentioned in this article, in any way. http://www.websiteoptimization.com/services/analyze/
Validator
This is a little controversial. Some say it does not matter, some say it does. I am on the “it does” side, and I think validation should be done no matter what. The validator checks to see if your site is compliant with W3C standards. I believe this helps with SEO because it helps the engines get through your site with minimal errors. If HTML tags are missing, you have 2 body tags, etc., this may delay, or even worse, stop the bot from crawling past that page. It may also help with browser issues (the site looks different in IE compared to Firefox).
http://validator.w3.org/
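For a quick local check of the specific problems mentioned above (missing tags, 2 body tags), here is a rough Python sketch using the standard library's HTML parser. It is nowhere near a full validator and it only counts opening tags. The names `TagCounter` and `structural_problems` are my own, and the results are things worth a second look rather than confirmed errors.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each opening tag appears in a page."""

    def __init__(self):
        super().__init__()
        self.opened = Counter()

    def handle_starttag(self, tag, attrs):
        self.opened[tag] += 1


def structural_problems(html: str):
    """List missing or repeated html, head, and body tags."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    problems = []
    for tag in ("html", "head", "body"):
        count = counter.opened[tag]
        if count == 0:
            problems.append(f"missing <{tag}>")
        elif count > 1:
            problems.append(f"{count} <{tag}> tags")
    return problems


print(structural_problems("<html><head></head><body><body><p>Hi</p></body></html>"))
# ['2 <body> tags']
```

Anything it flags is a reason to run the page through the W3C validator for the full report.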
If there are items I missed, please let me know.
6 Responses to “The Absolute First Step to Optimizing Your Website”
great post.
By Dan London on Feb 26, 2008
Awesome tools Al - I will add them to my tool box.
By Garrett French on Feb 26, 2008
Al,
Great post. Can you give me an example of a dynamically generated URL? I was discussing this recently, and aside from a site search query (within the domain - not a Google site: search), I actually wasn’t able to identify a dynamic URL.
For the redirects, I have found using xenu to be much more effective than rexswain. Xenu will crawl your entire site and report all redirects, categorized by type, among other things. And it’s free.
By Rick on Feb 26, 2008
Rick, There are really 2 types of dynamic urls. They have “?” and “&” identifying a query string added to the domain.
One type, the larger engines can handle. The other, they have problems with.
1. You go to an ecommerce site. The url is yourdomain.com/?444444.aspx and every time you or anyone else visits that page, the url remains yourdomain.com/?444444.aspx.
2. You go to an ecommerce site. The url is yourdomain.com/?444444.aspx and every time you or anyone else visits that page, the url query string changes to something else with the same content.
Anyone have an example of a site with dynamic urls that would give the engines a problem?
This one def needs to be SEO Friendly, but will still get indexed by Google http://nicoandzoe.com/catalog/product_info.php?cPath=44&products_id=361
I will look around for ones that would give GGL issues. It will always be best to re-write to more SEO friendly links.
Thanks for the info about xenu!
By Al Scillitani on Feb 26, 2008
Bigger fonts.
Re: “dynamic pages” vs. “dynamic urls”:
I consider a “dynamic URL” one that has a question mark in it, at the least. Count the ampersands (&), plus the question mark, and that’s how many variables are being passed through the URL. More variables = less awesome.
I consider a “dynamic page” anything that gets put together on the server side, then, through the magic of the Internet(!), displayed to the user. The fact is that you don’t know when this is happening…most pages on the Internet are dynamic, and therefore can cause issues with rapidly changing content on the same URL.
For site owners, use mod_rewrite on your important pages. Eliminate unnecessary directories (forward slashes in your URL), and tighten up the length of your URLs.
And that’s all I’s gotsta say about THAT.
By Ben on Feb 26, 2008
Love it.. A lot of this should be basic knowledge for anyone wanting to optimise their site, alas this is not the case.. I noticed you shot down a couple of so-called ‘Black Hat’ practices (cloaking) good call
By Shane: content writer on Mar 26, 2008