In the first chapter of this series we saw how robots.txt and sitemap.xml play big roles in exposing what can be crawled on a site. The Search Engine (SE) bot may have reached your site during its regular crawl or via an inbound link (social media, your friends’ blogrolls, directories etc.), but the important point here is that linking to your site from elsewhere improves its visibility, in the eyes of both humans and search engine bots.
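For reference, Python's standard `urllib.robotparser` module can read a robots.txt file and answer whether a given URL may be crawled, much as a well-behaved bot would. The sketch below assumes a hypothetical robots.txt for `www.example.com` that blocks `/private/` and advertises a sitemap; `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for www.example.com (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""

rp = RobotFileParser()
# parse() takes the file as a list of lines instead of fetching it over the network.
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("*", "https://www.example.com/blog/post-1/"))      # True
print(rp.can_fetch("*", "https://www.example.com/private/page.html"))  # False
print(rp.site_maps())  # ['https://www.example.com/sitemap.xml']
```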
Why is linking important?
Once the SE bots reach your site or a particular page, they would relish the opportunity of crawling pages with similar or relevant content linked from the current page. Linking helps the bots with this crawl-and-proceed process, both within your site and out to other sites.
A lot of us may have heard that link building is the key to obtaining high page ranks and hence improved search visibility of our content. Unfortunately, many of us were told theories and stories about backlinking alone (other sites linking to you), and hence building links always meant link exchanges and commenting like crazy.
In my opinion, effective linking involves the following:
- Internal organizing of your own site
- Visible entry points to the contents
- Relevant links – forward, internal and backward
Internal organizing in this context is all about how you use your page-level meta tags and how you link between your pages: index files, archives, tagged categories etc. Further, the noindex and nofollow flags should be used where required to keep pages that duplicate other content from being indexed. These aspects guide the bots the right way throughout their crawling process. The other aspect of internal organizing is the proper usage of HTML tags, categories, tags etc., which I have talked about in the past.
ROBOTS meta tag, or page-level index control
Once the SE bot reaches a page on your site, it checks whether the page may be indexed and whether the links from it are ‘crawlable’. The page-level setting of interest here is the ROBOTS meta tag in the page's head section. Depending on the value of this meta tag, the SE bot decides whether to index the page and whether to crawl the links originating from it.
The following is the format of the ROBOTS meta tag:
<meta name="robots" content="index,follow" />
A content value of index,follow instructs the SE bots to index the current page and also to crawl the links on it. A value of noindex tells the bots to keep the page out of their index, while nofollow tells them not to crawl its links; with noindex,nofollow, the SE bots' crawling job ends at this page for the current crawling chain. A setting of index,nofollow may not make sense at the moment, but it seems to have further implications that we shall discuss later, perhaps via comments here.
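As a rough illustration of how a bot might interpret this tag, the sketch below uses Python's standard `html.parser` to read the ROBOTS meta tag and turn it into two flags: may the page be indexed, and may its links be followed. It assumes that a missing tag means index,follow and that directives are comma-separated and case-insensitive; the names `RobotsMetaParser` and `robots_policy` are hypothetical, and real search engines apply more rules than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, and its default
        # handle_startendtag calls this method, so "<meta ... />" is covered too.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_policy(html):
    """Return (may_index, may_follow); a page without the tag defaults to index,follow."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


page = '<head><meta name="robots" content="noindex,follow" /></head>'
print(robots_policy(page))  # (False, True)
```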
Link level control
While the page-level index,follow tells the SE bots to follow all outbound links from a page, this can be overridden for individual links using the REL attribute of the anchor tag. A REL value of nofollow tells the SE bots not to consider that particular link for further crawling and indexing.
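A similar sketch for link-level control: it collects the `href` of each anchor and sets aside those whose `rel` attribute contains `nofollow`. The `page_allows_follow` parameter is a hypothetical stand-in for the page-level ROBOTS setting, since a page-level nofollow covers every link on the page; `LinkCollector` and `links_to_follow` are illustrative names only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Splits the <a href> links on a page into followed and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. rel="external nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def links_to_follow(html, page_allows_follow=True):
    """Links a bot may crawl: page-level setting first, then per-link rel."""
    if not page_allows_follow:
        # A page-level nofollow applies to every link on the page.
        return []
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.followed


sample = (
    '<a href="/a.html">A</a> '
    '<a href="/b.html" rel="nofollow">B</a> '
    '<a href="/c.html" rel="external nofollow">C</a>'
)
print(links_to_follow(sample))  # ['/a.html']
```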
The role of page- and link-level control over indexing and following is depicted in the following picture. The green (curved) path shows the route that an SE bot is allowed to crawl. The red arrows show the links that the bots would not follow, and website-3 in the picture has only noindex pages, which is not an ideal setup.
A well-connected website has its main pages’ entry points defined in the sitemap. Further, the internal connectivity of the pages is established via good-quality internal linking. This is because search engine bots love to crawl from page A to B, from B to C and so on, as long as relevant information and relationships can be picked up on the fly. Your internal pages are just points (or nodes) visited during a bot's long journey, which may span multiple sites. More internal linking therefore means more indexing opportunities for your own pages, as long as the links are relevant.
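To make the crawl-and-proceed idea concrete, here is a toy breadth-first crawl over a made-up site. Every page, URL and flag in `SITE` is hypothetical: each page carries its ROBOTS meta settings and its outgoing links, with a per-link nofollow flag. The crawl starts from sitemap entry points, indexes only pages without noindex, and stops at page-level or link-level nofollow. Real crawlers are far more involved; this only models the rules described above.

```python
from collections import deque

# Hypothetical toy site: robots meta flags plus outgoing links per page.
# Each link is (target, nofollow), where nofollow=True models rel="nofollow".
SITE = {
    "/": {"index": True, "follow": True,
          "links": [("/blog/", False), ("/about/", False)]},
    "/blog/": {"index": True, "follow": True,
               "links": [("/blog/post-1/", False), ("/blog/tag/seo/", True)]},
    "/blog/post-1/": {"index": True, "follow": True,
                      "links": [("/blog/post-2/", False)]},
    "/blog/post-2/": {"index": True, "follow": False,  # page-level nofollow
                      "links": [("/blog/post-3/", False)]},
    "/blog/post-3/": {"index": True, "follow": True, "links": []},
    "/blog/tag/seo/": {"index": False, "follow": True,
                       "links": [("/blog/post-1/", False)]},
    "/about/": {"index": False, "follow": True,  # noindex,follow
                "links": [("/contact/", False)]},
    "/contact/": {"index": True, "follow": True, "links": []},
}


def crawl(site, entry_points):
    """Breadth-first crawl from sitemap entry points; returns pages indexed, in order."""
    seen = set(entry_points)
    queue = deque(entry_points)
    indexed = []
    while queue:
        url = queue.popleft()
        page = site.get(url)
        if page is None:
            continue  # Link points outside this toy site.
        if page["index"]:
            indexed.append(url)
        if not page["follow"]:
            continue  # Page-level nofollow: this crawl chain stops here.
        for target, nofollow in page["links"]:
            if nofollow or target in seen:
                continue  # Link-level nofollow, or already queued.
            seen.add(target)
            queue.append(target)
    return indexed


print(crawl(SITE, ["/"]))
# ['/', '/blog/', '/blog/post-1/', '/contact/', '/blog/post-2/']
```

With only `/` as the entry point, `/blog/post-3/` is never reached because the page linking to it is marked nofollow, and `/blog/tag/seo/` is skipped because its only inbound link is nofollow. `/contact/` is still found through `/about/`, which is crawled for its links but not indexed. Listing `/blog/post-3/` in the sitemap, i.e. passing it as an extra entry point, brings it into the index.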
While backlinks bring indexing opportunity (and priority) to your site or its pages, it’s also your duty to forward, or direct, the SE bots to some relevant, good-quality neighbouring sites. This forward linking, if done with proper tagging/keywords (which I will talk about in the next post), need not be a loss to you, depending on the credentials of the target pages.
Basically, in order to build a search-engine-friendly website, linking (inbound, outbound and within) is very important. In the coming posts we will talk about various link building mechanisms, keywords’ role in linking etc.