
A standard SEO website analysis checks everything on your site to make sure it is search engine friendly and optimized. The analyst will run tools on your site to check internal links, links coming from other sites (backlinks), keywords, keyword metrics, code quality, currently indexed pages, and a whole lot more, depending on the depth they want to report on.
What I am going to run through are some tools and files you should have in place to ensure everything is working as intended. These include link checking, text-only browsing, and files that help your site get indexed properly.

Check Every URL, No Matter Where: Xenu Link Checker

Xenu’s Link Sleuth™ will check your site for broken links of any type.
Link checking is done on:

  • Internal and external links
  • Images
  • Frames
  • Plug-ins (Flash, QuickTime, and more)
  • Backgrounds
  • Local image maps
  • Style sheets
  • Scripts
  • Java applets

As the software runs, it displays a continuously updated list of the good and bad links found on every page of your site. You can sort each column by different criteria to pinpoint problems, and a report can be produced at any time.
This program will find broken URLs in your CSS and JavaScript files and will show what type of file each URL points to (text/html, image/gif, and more). Another nice feature for SEO is the ability to sort the page title column. This allows you to identify duplicated titles, misspellings, unoptimized titles, and pages without titles.
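
The title check can also be sketched in a few lines of Python if you ever want to run it outside of Xenu. This is only a rough illustration, not how Xenu works internally. The names `page_title` and `title_report` and the `pages` dictionary (URL mapped to HTML source) are made up for the example, and fetching the pages is left to you.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the page title with surrounding and repeated whitespace removed."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split())


def title_report(pages):
    """pages maps URL -> HTML source. Returns (duplicates, untitled)."""
    by_title = defaultdict(list)
    untitled = []
    for url, html in pages.items():
        title = page_title(html)
        if title:
            # Compare titles case-insensitively so "Home" and "home" collide.
            by_title[title.lower()].append(url)
        else:
            untitled.append(url)
    duplicates = {t: urls for t, urls in by_title.items() if len(urls) > 1}
    return duplicates, untitled
```

Calling `title_report` on a handful of saved pages gives you the duplicated titles and the pages with no title at all, which are the same two problems the sorted title column makes visible.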

If you have a site with a lot of external links to other sites, you should run this weekly, if not twice a week or more, to ensure the resources are still working.

Download from http://home.snafu.de/tilman/xenulink.html

Lynx Browser: The Text Web Browser

Lynx is a web browser like Firefox, IE, or Chrome, except that Lynx only shows you text. It renders your web page and gives you a rough idea of how it would look to a search engine’s bot crawling your site. Some examples are Googlebot, Bingbot, and Yahoo’s Slurp. You want to know what your site looks like in the eyes of the search engines.

Lynx is a very old piece of software originally written for Unix systems. Lynx is available for Windows in two forms: the easy one and the hard one. The hard one is the original program, and most people who are not used to installing software without a nice installer that does the work for you will struggle with it. The easy one is just an extension for Firefox called Yellowpipe Lynx Viewer.

Remember that what is closest to the top is the most important part of the page. If you don’t see any important text, your page is not optimized. The reason ALT attributes are so important on images is that they give the web bots a name to put with what is there. The image itself is not displayed, only the words inside the img tag’s alt attribute. If your code was <img src="logo.gif" alt="Company’s Name Logo"/> the text displayed would be “Company’s Name Logo”.
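
To make the alt attribute point concrete, here is a small Python sketch of a text-only view. It is a simplification and not what Lynx actually does: it only keeps visible text and swaps each image for its alt text. The class name `TextOnlyView`, the function `text_view`, and the placeholder for images without alt text are all made up for this example, and it does not bother to skip script or style blocks.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Keeps page text and replaces each <img> with its alt text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            # An image with no alt text gives a text-only reader nothing to go on.
            self.parts.append(alt if alt else "[image with no alt text]")

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def text_view(html):
    """Return the page as one line of text, in the order it appears in the HTML."""
    viewer = TextOnlyView()
    viewer.feed(html)
    viewer.close()
    return " ".join(viewer.parts)
```

Running `text_view` on a page full of images with missing alt attributes makes the problem obvious very quickly, because the output is mostly placeholders.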

The Lynx browser puts a number next to each link that is displayed as text. The link URLs are then listed at the very bottom of the page in the order they were found on the page. That is the reason for the number next to the text.
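
The numbering scheme is easy to imitate if you want to see the idea in code. The Python sketch below only roughly mimics the layout described above and is not Lynx's real output format; `NumberedLinks` and `render` are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class NumberedLinks(HTMLParser):
    """Numbers each link in the text and remembers its URL for a list at the end."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.references = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.references.append(href)
                # The marker is the link's position in the reference list.
                self.text.append(f"[{len(self.references)}]")

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def render(html):
    """Return the page text followed by the numbered list of link URLs."""
    parser = NumberedLinks()
    parser.feed(html)
    parser.close()
    body = " ".join(parser.text)
    refs = "\n".join(f"{i}. {url}" for i, url in enumerate(parser.references, 1))
    return body + "\n\nReferences\n" + refs
```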

My suggestion would be to use the simple version, the Firefox add-on. I do support using the original version and finding someone technical to set it up for you, and there are some compiled versions available, but the Firefox add-on is more convenient for quick page checking.

http://download.cnet.com/Yellowpipe-Lynx-Viewer-Tool
http://en.wikipedia.org/wiki/Lynx_web_browser

Search Engine File Restrictions and Permissions: Robots.txt

A robots.txt file is a file specifically for search engine bots, crawlers, or spiders. This file tells them, or rather suggests to them, what they can go see. A well-behaved web bot that finds a link on your site will scan your robots.txt file before indexing it, to make sure it is allowed to.

A de facto standard robots.txt file that allows every file on your site to be indexed would be:

  User-agent: *
  Disallow:

If, however, you have a folder of PDFs you do not want indexed directly in a search engine, your robots.txt file would look like:

  User-agent: *
  Disallow: /pdf/

Or say you do not want web bots to crawl your site at all:

  User-agent: *
  Disallow: /
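
If you want to check how a well-behaved bot would read these three files, Python's standard library includes a robots.txt parser. The sketch below feeds it the rules shown above. The helper name `allowed` and the example.com URLs are assumptions for illustration, and real search engine crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The three robots.txt files from the examples above.
ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_PDF = "User-agent: *\nDisallow: /pdf/"
BLOCK_ALL = "User-agent: *\nDisallow: /"


def allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    pdf_url = "http://www.example.com/pdf/guide.pdf"
    for name, rules in [("allow all", ALLOW_ALL),
                        ("block /pdf/", BLOCK_PDF),
                        ("block all", BLOCK_ALL)]:
        print(name, allowed(rules, pdf_url))
```

To test a live site instead of a string, `RobotFileParser` also has `set_url()` and `read()` methods that download the file for you.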

Badly behaved web bots, like the ones that scan for email addresses or carry out other blackhat acts, will not even consider obeying your robots.txt. They could even read it to find places you do not want anyone to see that are public and not protected. So do not try to hide anything with just a robots file.

A great place to make a robots.txt file: http://www.mcanerin.com/en/search-engine/robots-txt.asp

Search Engine Sitemap: Sitemap.xml

Your sitemap XML file is like a robots.txt file in that it talks to web bots. However, this time, instead of telling them where they can go, the sitemap.xml file is the table of contents for your entire site.
The sitemap file also allows you to include additional info about each page:

  • Its last updated time
  • How often the page is updated
  • The importance the page has in relation to other pages in the site

This allows search engines and other web bots to crawl the site more intelligently and faster.
An example of a sitemap file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>
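
If you would rather script the file than type it by hand, the same structure can be produced with Python's standard library. This is a minimal sketch and not a full generator: the function name `build_sitemap` and the list-of-dictionaries input are assumptions made for the example, and it does no validation of dates or priority values.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from dicts with 'loc' and optional extra fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Only write the optional fields that were actually supplied.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
         "changefreq": "monthly", "priority": 0.8},
    ]))
```

One small benefit of building the file this way is that characters such as & in a URL are escaped for you, which is an easy thing to miss when editing XML by hand.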

If you would like the technical explanation of the sitemap.xml format, I suggest going to the resources below. The easiest way to make a sitemap.xml file is to use a generator; these are available all over the web.
The sitemap.xml complements the robots.txt file, and you can add a line to the robots.txt file to reference your sitemap.xml file location. In the robots.txt you would add:

Sitemap: http://www.yoursite.com/sitemap.xml
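
A quick way to confirm the line is picked up is to read it back with Python's built-in robots.txt parser (the `site_maps()` method needs Python 3.8 or newer). The helper name `sitemap_urls` is made up for this sketch, and how an actual search engine handles the line is up to that search engine.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt):
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    robots = (
        "User-agent: *\n"
        "Disallow:\n"
        "Sitemap: http://www.yoursite.com/sitemap.xml\n"
    )
    print(sitemap_urls(robots))
```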

You will see some sitemaps with the extension .gz. That extension indicates a compressed (gzip) version of the XML file. Sites with thousands of pages can have a very large sitemap file. Compressing it reduces bandwidth, among other things.
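
Producing the .gz version takes very little code. Here is a hedged Python sketch using the standard gzip module; the function names and the sitemap.xml.gz file name are only examples.

```python
import gzip


def compress_sitemap(xml_text):
    """Return the sitemap XML as gzip-compressed bytes."""
    return gzip.compress(xml_text.encode("utf-8"))


def write_compressed_sitemap(xml_text, path="sitemap.xml.gz"):
    """Write the compressed sitemap to disk, ready to upload."""
    with open(path, "wb") as handle:
        handle.write(compress_sitemap(xml_text))
```

Sitemaps compress well because every entry repeats the same tags and the same domain name, so the saving on a large site can be substantial.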

http://www.sitemaps.org/
http://www.xml-sitemaps.com/

Search Engine Crawling and Indexing Tools

Google, Yahoo, and Bing all have portals for you to help you help them. You can submit your site, add your sitemap.xml file location, and see any errors or problems the search engine is having with your site.
Of the three, Google Webmaster Tools is the most full featured. GWT, as we refer to it, lets you see a lot of great information, including:

  • Your Website Configuration
    • Sitemaps (add your sitemap.xml file URL)
    • Crawler access (add your robots.txt file URL)
    • Sitelinks (for established sites of authority, cool looking search results block)
    • Change of address (changing domain but keeping content. Read completely)
    • Settings (only show to a specific country, specify correct url structure of domain)
  • Your site on the web
    • Top search queries (where do you rank high and for what)
    • Links to your site (backlinks from other sites)
    • Keywords (what keywords your site ranks for)
    • Internal links (what pages are found)
    • Subscriber stats (who is subscribed to your RSS feed)
  • Googlebot Diagnostics
    • Crawl errors (shows you pages it had a hard time finding)
    • Crawl stats (Googlebot activity in the last 90 days on your site)
    • HTML suggestions (any problems with meta tags, page title tags, pages it cannot index)

That sums up the Google Webmaster Tools features. I would suggest starting with Google, then going on to Yahoo and Bing. If you spend enough time in Google, you will run through Yahoo and Bing in minutes. The features are about the same, but Google has a lot more functionality, so Yahoo and Bing are currently much more scaled-down versions of webmaster tool portals.

Google Webmaster Tools
http://www.google.com/webmasters/tools/

Yahoo Site Explorer
https://siteexplorer.search.yahoo.com/

Bing Webmaster Tools
http://www.bing.com/webmaster

The major idea is to get in and let Google, Yahoo, and Bing know you exist. It is like opening the door to your store and throwing a program guide and menu at them.

This concludes my SEO website analysis starting guide. These tools are designed to get you started and show the three top search engines you exist, if they do not know already. You will also find any problems they are having trying to index your website, along with any suggestions. I suggest you follow this guide in the order it is written:

  1. Check your links/URLs on your site (Xenu Link Checker)
  2. Look at your website’s content structure in Lynx or Yellowpipe Lynx Viewer. Do the headings all render correctly? Is it easily readable by humans?
  3. Create robots.txt and sitemap.xml files to tell search engine web bots and crawlers what to index and what not to index.
  4. After the previous steps are completed, go register for Google, Yahoo, and Bing and start filling in the information to help them index you better. Remember, when you think you have a problem ranking, go check out the webmaster tools. You would be surprised how many times the error is in there waiting for you to correct it.