SEO – Link Building Series: How Do Search Engines Find Your Blog or Website?

How can I get my site on Google search?
I have answered that question at least 10 times on Yahoo Answers and several times over email. Instead of responding to yet another question individually, I thought of creating this reference post (and, time permitting, a whole series) here on this blog.

How do people discover your blog or website?

Before looking into the technical details of what it takes to increase the search visibility of your blog, you should understand how people discover your website content or blog posts on the web. Unless somebody already knows about your website, there are only three major ways for a total stranger to find you on the web:

#1 Advertising: Online/offline advertising, Exchange advertisement programs, PPC (Pay-Per-Click) advertising, SEM (Search Engine Marketing), Article marketing etc

#2 Referral links: Blogrolls, Directory listing, Forums, RSS/Feed directories, Social bookmarks, Social network links etc

#3 Search Engines: Via Google, Yahoo, MSN, Ask etc.

Of the above three, the most popular method people use to find information on the web is search engines. It is probably also the most sustainable long-term model for driving traffic to your website or blog.

The process of optimizing your site content to make it search engine friendly, and easier for the search engine bots to spot, is known as Search Engine Optimization (SEO).

How do search engines find information?

The World Wide Web (WWW) is, in principle, a huge network of millions of interconnected computers and the information stored on them. For your blog or website content to be probed and indexed by the search engines, they first need to know that your site exists. You can either submit a newer site to the search engines yourself or let them find it automatically via links to your site from other sites. In either case, the search engine spider (a piece of software that examines your site for its content) ultimately needs to crawl your site and index its content for search users. There are two entities on your site that help these spiders probe it effectively. They are:

#1 Robots.txt: Once your website is submitted to the search sites, the search engine spiders (also known as crawlers or bots) will attempt to navigate your site to understand its content. A robots.txt file maintained on your website tells the spiders which folders and modules on your web server they are allowed to crawl. (The format and composition of robots.txt is not in the scope of this post.)

#2 Sitemap.xml: Once robots.txt allows spiders to crawl certain areas of your server folders, the sitemap pitches in to provide more details. The sitemap.xml file lists all your pages, posts, links to tools and web applications etc. that need to be visible to public users (and hence indexed by the search engines). Sitemap.xml also carries more details on these pages, such as when each was last changed and its priority, that help spiders decide when to index them.
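
To make the sitemap idea a little more concrete, here is a minimal sketch in Python that builds a tiny sitemap.xml with the location, last-changed date and priority of each page. The page URLs, dates and priorities are made up for illustration, and the helper name build_sitemap is my own; in practice a plugin or your publishing platform usually generates this file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for an iterable of page dicts.

    Each dict is assumed to have 'loc', 'lastmod' and 'priority' keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]            # page address
        ET.SubElement(url, "lastmod").text = page["lastmod"]    # last change date
        ET.SubElement(url, "priority").text = page["priority"]  # 0.0 to 1.0
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
pages = [
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15", "priority": "1.0"},
    {"loc": "https://www.example.com/seo-basics/", "lastmod": "2024-01-10", "priority": "0.8"},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The printed text is what you would save as sitemap.xml in the root folder of the site.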

Basically, robots.txt provides folder- and file-level access details, and sitemap.xml provides detailed information on pages. These two files are essential for any search engine friendly website, and they are usually kept in the root folder of your website.
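
As a rough illustration of the folder-level access rules, the sketch below feeds a small, made-up robots.txt (note that the file is conventionally named robots.txt) to the parser in Python's standard library and asks whether a crawler may fetch two URLs. The rules, the example.com URLs and the "ExampleBot" crawler name are assumptions for the example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: every crawler is kept out of two folders,
# and the file also points to the site's sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the parsed rules let this crawler fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://www.example.com/blog/my-post/"))   # public post
print(is_allowed("https://www.example.com/admin/settings"))  # blocked folder
print(parser.site_maps())  # sitemap URLs listed in the file (Python 3.8+)
```

Well-behaved spiders run a check much like is_allowed before requesting a page, which is why a stray Disallow rule can keep a whole site out of the index.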

Page & link level instructions to the spiders

In addition to the above two files, there are page and link level instructions that help spiders decide whether to index a page, or the target page (URL) that a link on the page points to. These are page meta tags and link properties, and I shall talk about them in another post soon.
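
Until that post is written, here is a small sketch of what a spider might look for on a page: a robots meta tag for the page as a whole, and a rel="nofollow" property on individual links. The sample HTML and the class name RobotsHintParser are made up for illustration; real crawlers handle many more cases.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collect page-level robots directives and links marked nofollow."""

    def __init__(self):
        super().__init__()
        self.page_directives = []  # from <meta name="robots" content="...">
        self.nofollow_links = []   # href values of <a rel="nofollow"> links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            self.page_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]
        elif tag == "a":
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel_values and attrs.get("href"):
                self.nofollow_links.append(attrs["href"])


# Hypothetical page source, for illustration only.
SAMPLE_HTML = """
<html><head>
<meta name="robots" content="noindex, follow">
</head><body>
<a href="https://www.example.com/about/">About</a>
<a href="https://www.example.com/login/" rel="nofollow">Login</a>
</body></html>
"""

hints = RobotsHintParser()
hints.feed(SAMPLE_HTML)
print(hints.page_directives)
print(hints.nofollow_links)
```

A page whose directives include "noindex" is asking search engines to keep it out of their results, even if robots.txt allows it to be crawled.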

Additional tips

  • Sitemap.xml is usually submitted to search engines via tools such as Google Webmaster Tools. Your robots.txt can also be analyzed there, and a sample robots.txt created
  • Submitting your website or blog to search engines can be done either manually per search engine or via bulk submission services (paid as well as free) offered by certain sites
  • Submitting your website does not guarantee that it will appear in search results immediately. Other factors determine how much weight the search engines give your site content

Part II: Page Rank and Search Engine Optimization >>


  1. This is a great tutorial for beginners Ajith – now when people ask you questions you can just link to your own article (and get more traffic).

    As I was reading it I started thinking about terms that we use that are normal to us but are actually a little strange – robots, spiders – it almost sounds like a bad horror film 😉

    Ok – now I’m on to part 2.


  2. @Kim, thanks, and you spelt out my idea 🙂 I think I can point people to this post, where they can learn a few basics on SEO and link building.

  3. I think pinging is another way to let search engines find and index your page.
    But sometimes search engines can find your site by some other secret methods which are not listed above. Recently I built a new website and had not started link building or any other SE marketing, and I found it is indexed by Google. Strange?!


  4. @Chinese Girl, thanks for your visit and comment 🙂 Thanks for reminding me of ‘pinging’ as a means to alert the search engines to crawl your site. Platforms like WordPress have this feature built in, and plugins like Google XML Sitemaps further reinforce it.

    Even without search engine submission, the search engine bots could still find you from your public profiles (on social networks, Yahoo/Google Groups, blog directories, FeedBurner etc.)

  5. We generally don’t bother submitting our sites to the search engines as such. We do an initial batch of directory listings and links from quality high PR sites, allied to good on-page optimization, and we find the pages get indexed quickly by the search engine spiders. It actually seems to take longer if the site gets submitted…

  6. Richael Neet :

    One of my friends, Swastik has really found an ingenious method of quickly getting your site indexed in Google. Just go to Google Keyword Tool, analyze your website for keywords and presto; your site gets crawled and indexed in a very short time. I have tried this and it seems to work for me too. 🙂

    Just a little tip I felt like sharing…

  7. @Ruby web, that sounds like a strategy 🙂

    @Richael, wow, cool tip… I should try it with my very next post 🙂

  8. Submitting to web directories is a vital part of every successful link building strategy. Apart from driving traffic to your website through direct referrals, web directories provide static, one-way links to your site, boosting your link popularity and improving your rankings on the major search engines like Google and Yahoo.

  9. Sometimes adding the sitemap.xml and robots.txt is just not enough. If it’s a fresh website, I would not recommend adding the project to the directories, as this will not bring you the desired effect; Google will just see another link in a spam directory with a low authority rate. Instead, I would suggest adding the website to social media platforms, like Twitter for example. Since Google has launched real-time search, it has to index this information, so all the links posted on Twitter are crawled almost immediately and, from my experience, are indexed faster, even if they have a tag.

  10. Sir, please tell me the correct use of robots.txt. I don’t know robots.txt, but my site is restricted by robots.txt, so please send information on how to recover from this problem and when this robots.txt file is useful, and send information on SEO.

    • The best way to solve the issue is to sign up with Google Webmaster Tools, upload your sitemap and diagnose your robots.txt file there. It has a mechanism to design and validate a robots.txt file. Otherwise, just contact me with your blog URL and current robots.txt content.

      • I just checked your blog. You seem to have disabled the search engines from finding your blog. First, check your Blogger admin settings. In Settings -> Basics, opt for “Yes” to “Let search engines find your blog” and “Add your blog to our listings”. Once that is done, search engines should find your blog and index it.

        In addition (after making the above change), load your blog in the browser, choose “View source” and search for <meta content='noindex' name='robots'>. If that statement is there, search engines will not index your posts. In that case, you will most likely have to edit the Blogger template itself to remove it.

        • Sir, I have done this step. Now how do I find my blog in the search index? Please help me, sir.

        • @Anbu, your site is already indexed by Google. You will surely start building traffic in due course. You need to do a little bit of site optimization to create the right post titles though.

  11. Sir, my site still has a robots.txt file. Does it affect my search results, and how do I modify the robots.txt file?
