Crawler Access and Crawl Errors in Google Webmaster Tools – Part 4

In the previous post of the series we saw how to increase blog traffic with the same content via some useful Google Webmaster Tools SEO tricks. In this post – the fourth article in our Webmaster Tools series – let's talk about another important topic: Google crawl errors.

Crawl errors affect your site's search ranks big time. Search engine bots don't like finding that content you claim is available on your site isn't actually available for crawling. Fortunately, WordPress blog SEO is simple enough that you can fix all those errors in a few minutes.

Verify Crawler Access

1. Log in to your Google Webmaster Tools account and head to Site configuration -> Crawler access. If you do not see the status as 200 (Success) (see the picture below), your site is blocked from crawling. Right below that there's a copy of your robots.txt, and the reason for a total block is usually a 'Disallow: /' statement in it. You can use the 'Generate robots.txt' tab on the same page to generate the right robots.txt file for your site while blocking certain directories.
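For illustration, here is what that difference looks like in robots.txt itself. The directory names below are only the typical WordPress defaults; adjust them to your own setup:

```
# BAD: this one rule blocks your entire site from all crawlers
User-agent: *
Disallow: /

# GOOD: allow crawling in general, block only selected directories
# (wp-admin and wp-includes are common WordPress choices)
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
```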

[Screenshot: Google Webmaster Tools – Crawler access status]

You can then test and retest your site's crawler accessibility using the 'Test' button at the bottom of the same page. If there are still issues, you might want to check the file permissions (chmod) on robots.txt (read: WordPress File Permissions), which should allow the crawler to read the file.
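For a quick check from your own machine, here is a minimal Python sketch using the standard library's robots.txt parser. It only evaluates robots.txt rules (not file permissions or server issues), and example.com is a placeholder for your own domain:

```python
# Minimal sketch: ask whether Googlebot may crawl given URLs per robots.txt.
# example.com is a placeholder -- substitute your own domain.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetch and parse the live robots.txt

for url in ("https://example.com/", "https://example.com/wp-admin/"):
    ok = parser.can_fetch("Googlebot", url)
    print(url, "->", "crawlable" if ok else "blocked")
```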

2. If the robots.txt looks fine both in terms of content and permissions, head over to the Diagnostics -> Crawl errors section of your Webmaster Tools.

Now, this is where most of the crawler's bad encounters are listed, and where appropriate corrections are required on your side.

Fixing Webmaster Tools Crawl Errors

In this article we will focus only on the Web crawl error data and skip the Mobile CHTML and Mobile WML/XHTML tabs. This is mainly because mobile capability on a WordPress blog is usually provided by plugins, which should take care of redirection, XHTML rendering and correct linking themselves.

[Screenshot: Webmaster Tools – Crawl errors report]

Note: It is possible that your Webmaster Tools crawl errors report doesn't have any of the error types mentioned below. In that case, be happy about the time saved – you may hit the back button or exit this article right now.

1. Under the Web tab, the first sub-tab (underlined in orange) lists HTTP errors such as 403 Forbidden. You don't need to worry much about these, as they only show attempts to access disallowed or blocked folders and files. There is little to do here, but it is a good indication that someone tried to reach your blocked internal pages – probably an excited blogging buddy or a hacker.

2. The last sub-tab mostly lists 500 errors, which again shouldn't bother you much, because most 50x errors only reflect the technical status of your website or a page at a particular point in time.

3. The Not Found (number) sub-tab is of concern now, and the data therein may help you increase your traffic, reduce bounce rate and get better search ranks.

This tab lists all 404 (Page Not Found) errors on your blog – requests that end up on your WordPress blog's 404 page.

You can skip the URLs with 'unavailable' status for now, because they don't exist anyway or were used during some testing or preview in the past. We do, however, need to fix those 404 (Not Found) errors that have pages linking to them. In order to do that, do the following:

Analyze 404 Errors
  • Click the URL and make sure that it actually comes up with a 404 error and hence needs a fix (a scripted way to re-check many URLs is sketched after this list). This initial check has to be done because Google Webmaster Tools takes a few weeks to clear or update crawl errors.
  • If it's a real 404 error, the reasons could be the following:
    1. You have deleted or hidden (turned status to draft) that post
    2. You have changed the permalink of that post
    3. A scraper's autoblogging or feed-scraping software wrongly referred to it
    4. A genuine inbound link (your comment elsewhere or genuine linking from another blog or even your own internal link) referred to the wrong URL
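Re-checking each reported URL by hand gets tedious, so here is a minimal Python sketch that re-tests a list of URLs and prints their current status codes. The URLs below are placeholders; paste in the ones from your own Crawl errors report:

```python
# Minimal sketch: re-check URLs reported in the Crawl errors data.
# The URLs below are placeholders -- use the ones from your own report.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

urls = [
    "https://example.com/some-old-permalink/",
    "https://example.com/deleted-post/",
]

for url in urls:
    try:
        status = urlopen(url, timeout=10).getcode()
    except HTTPError as e:
        status = e.code  # 404, 403, 500, ...
    except URLError as e:
        status = e.reason  # DNS failure, refused connection, ...
    print(status, url)
```

Anything that still returns 404 needs one of the fixes below.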
404 Not Found Error Fix

In order to fix those 404 errors permanently, take the following steps.

1. If it is an internal linking issue, identify the internal posts and pages (i.e. the pages linking to the broken URL) and edit those posts to provide the right link. Retest the edited link to make sure that there's no more 404 error.

2. If it's an issue with a scraper or an automated tool referring to wrong pages, contact the scraper's webmaster and ask them to refrain from such acts.

3. If you changed the permalink of a genuine post, you then have to 301 redirect the old URL to the new post. I use an amazing plugin to achieve this: the WordPress Redirection plugin. After I installed this easy-to-use plugin, my 404 errors dropped drastically (even the one shown in the screenshot above) and my page views per month went up by 500 or 600. The plugin also reports how many redirects took place for each entry in its admin area.
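If you prefer not to use a plugin, the same permanent redirect can be declared in your site's .htaccess file on Apache (the paths below are placeholders for your actual old and new permalinks):

```apache
# 301 (permanent) redirect from an old permalink to the new one.
# Both paths are placeholders -- substitute your real URLs.
Redirect 301 /old-post-permalink/ https://example.com/new-post-permalink/
```

Search engines treat a 301 as "moved permanently" and transfer the old URL's standing to the new one, which is exactly what you want after a permalink change.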

4. If for some reason you cannot put a redirect in place for a genuine external link, you may ask your webmaster friend to change that inbound link for you, or to delete it in the worst case. It is important to do this for your top posts that are traffic feeders.

Hope you enjoyed reading this post. The fifth part of the series will be out next week.

Part V: Sitemap Optimization

Happy SEO!
