How Search Engines Work - Article
Relevant Search with Search Engines:
The goal of a search engine is to spread information across the internet by returning relevant search results and web content. Search engines generate revenue by selling advertising space in various formats and placements. Pay-per-click advertising, as in Google AdWords, is the most common method among the major search engines. Many people think that search engines make money through some hidden technique, but that is not true at all.
Listing a New Site is not easy:
Listing a new site is not an easy task for any search engine. First, it is hard to know whether the site exists on the web at all. Once someone submits a new site, or the search engine's crawler reaches it through a link, the question of its quality arises. Most search engines estimate relevancy and quality by checking the incoming links that a page gets from different sites.
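The incoming-link check described above can be illustrated with a small sketch. It is not how any particular search engine scores pages; it only counts how many distinct outside sites link to each page. The function name linking_sites and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def linking_sites(links):
    """Count the distinct external hosts that link to each target URL.

    `links` is an iterable of (source_url, target_url) pairs.
    """
    hosts_by_target = {}
    for source, target in links:
        source_host = urlsplit(source).hostname
        target_host = urlsplit(target).hostname
        # A site linking to itself says nothing about outside endorsement.
        if source_host == target_host:
            continue
        hosts_by_target.setdefault(target, set()).add(source_host)
    return {target: len(hosts) for target, hosts in hosts_by_target.items()}


# Hypothetical link data: two posts on one blog, one forum thread, one self-link.
links = [
    ("https://blog.example/post1", "https://newsite.example/"),
    ("https://blog.example/post2", "https://newsite.example/"),
    ("https://forum.example/t/1", "https://newsite.example/"),
    ("https://newsite.example/about", "https://newsite.example/"),
]
incoming = linking_sites(links)
```

Counting distinct sites rather than raw links means that many links from a single site count only once, which matches the emphasis on links "from different sites".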
Parts of a Search Engine:
Search engines organize their web content in many different ways, but every crawler-based search engine, such as Yahoo, Google, and MSN, has the same basic structure. Every such search engine has the three basic parts listed below:
• Crawler or Spider
• Index (or catalog)
• Search Algorithm
Crawler (or Spider):
The crawler does just what its name implies. It scours the web following links, updating pages, and adding new pages when it comes across them. Each search engine has periods of deep crawling and periods of shallow crawling. There is also a scheduler mechanism to prevent a spider from overloading servers and to tell the spider what documents to crawl next and how frequently to crawl them. Rapidly changing or highly important documents are more likely to get crawled frequently. The frequency of crawl should typically have little effect on search relevancy; it simply helps the search engines keep fresh content in their index. The home page of CNN.com might get crawled once every 10 minutes. A popular rapidly growing forum might get crawled a few dozen times each day. A static site with little link popularity and rarely changing content might only get crawled once or twice a month.
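A scheduler of this kind can be sketched with a priority queue keyed on the time each page is next due. This is a minimal illustration rather than a real crawler: it fetches nothing, the page names are placeholders, and the intervals are assumptions loosely based on the figures above (10 minutes, a few dozen times a day, and roughly twice a month).

```python
import heapq


def simulate_crawls(intervals, horizon_minutes):
    """Return how many times each page is crawled before the horizon.

    `intervals` maps a page name to its recrawl interval in minutes.
    """
    # Heap entries are (next_due_minute, page); every page is due at minute 0.
    queue = [(0, page) for page in intervals]
    heapq.heapify(queue)
    counts = {page: 0 for page in intervals}
    while queue and queue[0][0] < horizon_minutes:
        due, page = heapq.heappop(queue)
        counts[page] += 1  # a real spider would fetch the page here
        # Reschedule the page one interval after this visit.
        heapq.heappush(queue, (due + intervals[page], page))
    return counts


# Assumed intervals, in minutes, for three kinds of page.
intervals = {
    "news-home": 10,               # rapidly changing, highly important
    "busy-forum": 40,              # a few dozen crawls per day
    "static-site": 15 * 24 * 60,   # roughly twice a month
}
counts = simulate_crawls(intervals, 24 * 60)  # simulate one day
```

Over one simulated day the news page is visited 144 times, the forum 36 times, and the static site once. The visit frequency follows entirely from the interval assigned to each page.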
The main benefit of having a frequently crawled page is that you can get your new sites, pages, or projects crawled quickly by linking to them from a powerful or frequently changing page.
The Index:
The index is where the data collected by the spider is stored. When you perform a search on a major search engine, you are not searching the web itself, but the cache of the web provided by that search engine's index.
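One common way to build such an index is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure for illustration; the article does not say how any given search engine stores its data. The page texts and the names build_index and lookup are hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, word):
    """Return the URLs containing `word`, sorted for stable output."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical crawled pages: URL -> extracted text.
pages = {
    "a.example/home": "Fresh news every hour",
    "b.example/forum": "Forum news and discussion",
    "c.example/about": "About this static site",
}
index = build_index(pages)
news_pages = lookup(index, "News")
```

The lookup reads only the stored index and never touches the live pages, which is why a search reflects the web as it was when the spider last visited.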
Search Algorithm and Interface:
The search algorithm and search interface are used to find the most relevant documents in the index for a given search query. First, the search engine tries to determine user intent by looking at the words the searcher typed in.
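As a rough illustration of matching a query against the index, the sketch below splits the query into words and ranks pages by how many of those words they contain. This is a deliberately naive stand-in: real search algorithms weigh many more signals, such as the incoming links mentioned earlier. The index contents and the function name rank are assumptions.

```python
def rank(index, query):
    """Rank URLs by how many distinct query words they contain.

    `index` maps a lowercase word to the set of URLs containing it.
    """
    scores = {}
    # Use a set so a word repeated in the query is counted only once.
    for term in set(query.lower().split()):
        for url in index.get(term, ()):
            scores[url] = scores.get(url, 0) + 1
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical index: word -> pages containing that word.
index = {
    "cheap": {"a.example/flights", "b.example/hotels"},
    "flights": {"a.example/flights", "c.example/news"},
}
results = rank(index, "Cheap flights")
```

Here the page containing both query words ranks first, followed by the pages that match only one of them.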