Before proceeding with the fifth article in our Google Webmaster Tools series, I want to share some results of the actions taken since the first article (Keyword significance). This blog’s keyword significance now looks as shown in the picture below, which is a very good match for what this blog’s content is about. Things are working!
Now let us move on to today’s topic, which is all about optimizing sitemaps.
Diagnose Sitemap Issues
As usual, log in to your Webmaster Tools account and check the Sitemap status on the dashboard itself. If it shows a red X instead of a green tick, you have a reason to worry.
The most common reasons for the red X status to appear on a sitemap are:
- The sitemap has become unreadable by the bots. In that case, make sure the sitemap file has the right permissions (chmod) so that the web server can read it.
- Your sitemap is not XML compliant, or the sitemap building process was somehow aborted. Check your sitemap generation tool or plugin and rebuild the sitemap.
- You have scraped or copied content and the Google bot refuses to index it. Stay away from copy-paste content and autoblogging.
- The sitemap download took too long and the bot gave up halfway. Use a gzip-compressed (.gz) version of the sitemap, and optimize it as described in the next section below.
- You have opted for the non-www version of the preferred domain in the Webmaster Tools settings, but the sitemap links are still built with www. Make sure that your canonical URL, WordPress blog URL and Webmaster Tools preferred domain settings are consistent.
Once you have taken care of the above issues, the Sitemap status should show green.
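Three of the checks above (file permissions, XML compliance and www consistency) can be run against a local copy of the sitemap before you resubmit it. The following is only a minimal sketch in Python: the function name diagnose_sitemap and the example host are my own assumptions, and it cannot tell you anything about copied content or download timeouts.

```python
import os
import stat
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def diagnose_sitemap(path, preferred_host):
    """Return a list of problems found in a local sitemap file (hypothetical helper)."""
    problems = []

    # The web server usually needs the file to be world-readable (e.g. chmod 644).
    if not os.stat(path).st_mode & stat.S_IROTH:
        problems.append("file is not world-readable (try chmod 644)")

    # A truncated or aborted sitemap build usually fails to parse as XML.
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems

    # Every <loc> should use the preferred host (with or without www).
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if urlparse(url).netloc != preferred_host:
            problems.append(f"host mismatch: {url}")

    return problems
```

For example, diagnose_sitemap("sitemap.xml", "example.com") returns an empty list when none of these three problems is present.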
To move to the next part of the diagnosis, click the Site Configuration -> Sitemaps link to get a screen like the one given below.
Here you have to check if there is a significant difference between the URLs submitted and the URLs in the web index. In my case it is a perfect scenario: I have 392 URLs submitted (362 posts + 8 pages + 1 homepage + 2 DS tools URLs + 19 categories), of which 373 are indexed, because I have the ‘noindex’ robots meta tag specified on the 19 categories. You have to work out the same math by getting the numbers from your WordPress admin panel and cross-checking that they match.
If there are differences, check whether noindex is specified on your posts, pages or additional links. If your new URLs (new posts, pages) don’t appear in the web index a few minutes or an hour after they are published, check your WordPress ping list and the settings therein. You can check if your site URL is in the Google web index by typing the following in Google Search:
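The same cross-check can be written down as a few lines of Python. This is just the arithmetic from my numbers above; replace the counts with the ones from your own WordPress admin panel (the dictionary keys are only labels I picked).

```python
# URL counts by type, taken from the example above.
submitted = {
    "posts": 362,
    "pages": 8,
    "homepage": 1,
    "DS tools URLs": 2,
    "categories": 19,
}

# URL types that carry the 'noindex' robots meta tag.
noindex = {"categories"}

total_submitted = sum(submitted.values())
expected_indexed = sum(n for kind, n in submitted.items() if kind not in noindex)

print(total_submitted)   # 392
print(expected_indexed)  # 373
```

If the "URLs in web index" figure in Webmaster Tools is well below expected_indexed, something other than your intended noindex settings is keeping pages out of the index.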
Note: Use without www if your preferred domain is without the www prefix
If the situation doesn’t improve with the ping list changes, you may want to increase your crawl rate by changing Crawl rate values in the Webmaster Tools Site Configuration -> Settings -> Set custom crawl rate.
Please note that raising the crawl rate can increase the load on your hosting server.
1. Google XML Sitemaps is an amazing plugin for generating your sitemap. However, as your website grows in number of pages or articles, you may want to keep your sitemap as small as possible. If you have checked Google’s own sitemap, you will notice that it doesn’t bother with a visual style sheet or some of the other parameters that the Google XML Sitemaps plugin adds. It does not include the change frequency, the last modification time, etc.
You may want to remove this excess information by editing the plugin files, if you have the expertise to do so.
This exercise can cut your sitemap file size roughly in half. A smaller sitemap file downloads faster, which helps crawling and indexing.
2. In addition, on the sitemap plugin settings page, make sure that you generate a .gz version of the sitemap (sitemap.xml.gz) and reference it in robots.txt.
3. The third step is to add your other pages (those not generated by WordPress) to the sitemap. Keep the priority of all your pages at 0.5 and the homepage at 1.0 as defaults.
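If you would rather not edit the plugin files, steps 1 and 3 can also be sketched as a small Python script that post-processes the generated sitemap. This is not how the plugin works internally; it is only an illustration under my own assumptions: the function name optimize_sitemap is hypothetical, the input is a standard sitemaps.org urlset, and lastmod and changefreq are taken to be the "excess" tags.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def tag(name):
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{NS}}}{name}"


def optimize_sitemap(xml_bytes, homepage, extra_urls=(), drop=("lastmod", "changefreq")):
    """Strip optional tags and append extra pages (hypothetical helper)."""
    # Serialize with a default namespace instead of ns0: prefixes.
    ET.register_namespace("", NS)
    # Parsing this way also discards the xml-stylesheet instruction.
    root = ET.fromstring(xml_bytes)

    # Step 1: remove the optional tags from every <url> entry.
    for url in root.findall(tag("url")):
        for name in drop:
            for child in url.findall(tag(name)):
                url.remove(child)

    # Step 3: add pages that are not in the sitemap yet.
    existing = {(loc.text or "").strip() for loc in root.iter(tag("loc"))}
    for page in extra_urls:
        if page in existing:
            continue
        url = ET.SubElement(root, tag("url"))
        ET.SubElement(url, tag("loc")).text = page
        # Homepage gets 1.0, everything else the 0.5 default.
        ET.SubElement(url, tag("priority")).text = "1.0" if page == homepage else "0.5"

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Read sitemap.xml as bytes, pass it through optimize_sitemap together with your homepage URL and the list of extra pages, and write the result back. Keep in mind that the plugin will overwrite the file the next time it rebuilds the sitemap.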
Once you have done the above steps, your sitemap is optimized for its physical size and content.
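If your plugin version does not offer the .gz option from step 2, the compression and the robots.txt entry can be sketched in Python as follows. The function names and the example domain are my own assumptions; the only fixed part is the "Sitemap:" line format that robots.txt uses, which expects the full URL of the sitemap.

```python
import gzip
import shutil


def gzip_sitemap(src="sitemap.xml", dst="sitemap.xml.gz"):
    """Write a gzip-compressed copy of the sitemap and return its path."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def robots_sitemap_line(site_url, filename="sitemap.xml.gz"):
    """Build the line to add to robots.txt for the compressed sitemap."""
    return f"Sitemap: {site_url.rstrip('/')}/{filename}"
```

For example, robots_sitemap_line("http://example.com/") gives "Sitemap: http://example.com/sitemap.xml.gz". Use the www or non-www form that matches your preferred domain.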
Let me know if you have any queries on this particular post and I will be glad to help you.
Happy Site Optimization!