SEO for Dummies 3 – How Does a Search Engine Work?
How does a search engine select the pages to show for a given query? How is a specific query processed? How does a search engine find pages online?
This article briefly explains how a search engine works.
1. Crawling
Search engines use automated programs (called spiders or bots) that explore the web, jumping from one page to another by following the links they find.
2. Indexing
When a new page is found, or a known page is revisited, its content is saved in the search engine's database so that it can be accessed faster in the future.
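To make the crawl-and-save idea concrete, here is a minimal sketch. It assumes a tiny in-memory "web" (the `TOY_WEB` dictionary and its `.example` URLs are made up for illustration) instead of real HTTP requests, and it is not how any real search engine is implemented.

```python
from collections import deque

# Hypothetical toy web: URL -> (page text, outgoing links).
# It stands in for real HTTP fetches.
TOY_WEB = {
    "a.example/": ("welcome to the home page", ["a.example/about", "b.example/"]),
    "a.example/about": ("about this site", ["a.example/"]),
    "b.example/": ("another site about search engines", ["a.example/about"]),
    "c.example/": ("an orphan page nobody links to", []),
}

def crawl(seed):
    """Breadth-first crawl from `seed`; returns the saved copy of every page reached."""
    store = {}                 # the "search engine database": URL -> saved content
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in store or url not in TOY_WEB:
            continue           # already saved, or a dead link
        text, links = TOY_WEB[url]
        store[url] = text      # save the content for later lookups
        queue.extend(links)    # follow the links found on this page
    return store

pages = crawl("a.example/")
print(sorted(pages))  # ['a.example/', 'a.example/about', 'b.example/']
```

Note that `c.example/` is never saved: no page links to it, so the spider cannot find it.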
3. Returning Results
When a query is sent to the search engine (i.e. when a user hits the “search” button on the search engine homepage), the matching pages are selected and ranked with a specific algorithm (every search engine has its own super-secret ranking algorithm), and the pages are returned to the user in descending order of importance.
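The select-then-rank step can be sketched as follows. The `INDEX` contents and the importance scores are placeholders invented for the example; a real engine computes its scores with an algorithm it does not publish.

```python
# Hypothetical index: page URL -> (saved text, importance score).
INDEX = {
    "a.example/seo": ("seo tips for beginners", 0.9),
    "b.example/cooking": ("pasta recipes", 0.7),
    "c.example/seo-tools": ("free seo tools", 0.4),
}

def search(query):
    """Return the URLs matching every query term, most important first."""
    terms = query.lower().split()
    # 1. select the pages that contain every query term
    matches = [(url, score) for url, (text, score) in INDEX.items()
               if all(term in text.split() for term in terms)]
    # 2. rank them by descending importance
    matches.sort(key=lambda match: match[1], reverse=True)
    return [url for url, _ in matches]

print(search("seo"))  # ['a.example/seo', 'c.example/seo-tools']
```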
There are enormous differences among the ranking algorithms used by search engines, but all of them are based on relevance and popularity.
These terms come from Information Retrieval, of which search engines are one of the most visible applications.
Basically, higher relevance means that the document is more focused on the given search term, and higher popularity means that the document is cited more often by other sources.
In search engine terms, relevance is evaluated by analyzing:
- the textual content of the page
- the pages that provide inbound links, by
  - reading the anchor text used to link to the document
  - reading the text surrounding the link
  - evaluating the linking pages
This means, for example, that a page can rank well for a phrase or keyword even if that phrase never appears on the page.
(One famous case is the Bush biography ranking #1 for “miserable failure” on Google: this was the result of massive use of “miserable failure” as anchor text in links to www.whitehouse.gov/president/. This is no longer true, due to a change in Google's algorithm.)
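Here is a small sketch of why anchor text can make a page relevant for words it never contains. The URLs, page texts and anchor lists are hypothetical, and the scoring (a plain count of occurrences) is a deliberate simplification.

```python
# Hypothetical data: page text, plus the anchor texts of the links pointing at each page.
PAGE_TEXT = {
    "bio.example/": "official biography of the president",
    "blog.example/": "my thoughts on politics",
}
INBOUND_ANCHORS = {
    "bio.example/": ["miserable failure", "miserable failure", "president bio"],
    "blog.example/": ["a political blog"],
}

def relevance(url, phrase):
    """Count occurrences of `phrase` on the page and in the anchor text pointing to it."""
    on_page = PAGE_TEXT[url].count(phrase)
    in_anchors = sum(anchor.count(phrase) for anchor in INBOUND_ANCHORS.get(url, []))
    return on_page + in_anchors

print(relevance("bio.example/", "miserable failure"))   # 2
print(relevance("blog.example/", "miserable failure"))  # 0
```

The phrase is absent from the text of `bio.example/`, yet the page still scores higher than the other one because two inbound links use it as anchor text.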
Popularity is evaluated by counting the number of links to the given page (more links means more popularity).
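Counting inbound links is easy to sketch. The link graph below is made up, and the choices to count each linking page only once and to ignore self-links are assumptions of this example, not something every engine necessarily does.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "d.example/": ["c.example/", "b.example/"],
}

def popularity(links):
    """Return the number of inbound links for each linked-to page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking page once
            if target != source:      # ignore self-links
                counts[target] += 1
    return counts

scores = popularity(LINKS)
print(scores.most_common())  # [('c.example/', 3), ('b.example/', 2)]
```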
Given these two main criteria, each search engine adds its own interpretation, for example giving more weight to some “trusted” sites (.edu and .gov domains and sites with higher popularity are considered more trusted), or giving different weights to each page element (page title, body, heading tags…).
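Giving different weights to each page element could look like the sketch below. The weights in `FIELD_WEIGHTS` are invented for illustration (the real values are part of each engine's secret recipe), and matching whole lowercase words is a simplification.

```python
# Hypothetical weights: these numbers are made up for the example.
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}

def weighted_relevance(page, term):
    """Sum the term occurrences in each page element, scaled by that element's weight."""
    return sum(weight * page.get(field, "").lower().split().count(term)
               for field, weight in FIELD_WEIGHTS.items())

page = {
    "title": "SEO for dummies",
    "headings": "how search engines work",
    "body": "seo basics: crawling, indexing and ranking for seo beginners",
}
print(weighted_relevance(page, "seo"))  # 5.0
```

One match in the title counts 3.0 and two matches in the body count 1.0 each, so with these weights a keyword in the title is worth as much as three in the body.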
As an example, consider my Google guide: it ranks #1 for ‘mapelli’ (my last name) on Google because many sites have linked to it using the title of the page (which contains the domain name, i.e. www.mapelli.info) as link text, and Google gives high relevance to the text of inbound links. The same article is not in the top 100 results on Yahoo. (This is no longer true, due to a change in Google's algorithm.)
The obvious consequence is that if you want to get higher rankings you have to:
- allow search engines to find your site
- make it easy for the spiders to understand the structure of the pages
- increase your relevance
- increase your popularity
We’ll talk about how to do this in the next few articles.
- Spiders or bots: automated programs that crawl the web and index the pages
- Relevance: represents how well a web page matches the search terms
- Popularity: represents the number of “citations” (inbound links) of a given web page; it is a measure of the importance of the page
This entry was posted by Francesco Mapelli on 2007/01/15 at 1:46 am, and is filed under Uncategorized.