Web Search Engines

A web search engine does three main things:

  1. First, it gathers the content of the web pages it will make searchable, using a program called a crawler (or spider).
  2. Next, it organizes the content it has collected in a way that allows for efficient retrieval. This process is called indexing (see the sketch after this list).
  3. Lastly, it takes a query from an end user, determines which pages match that query, and displays the results in accordance with some ranking.

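As a rough illustration of steps 2 and 3, here is a minimal sketch of an inverted index and a conjunctive ("all words must appear") lookup in Python. The names (build_index, search) and the example pages are made up for illustration, and the sketch ignores ranking entirely.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages containing every word of the query (unranked)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())   # keep only pages matching all terms
    return results

# Hypothetical crawled content, keyed by URL.
pages = {
    "https://example.com/a": "web search engines crawl the web",
    "https://example.com/b": "an index maps words to pages",
}
index = build_index(pages)
print(search(index, "web search"))   # {'https://example.com/a'}
```

A production index stores far more than word-to-page sets (positions, term frequencies, link data for ranking), but the retrieval idea is the same: look up each query term and combine the matching page sets.
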
A crawler using a breadth-first crawl starts with a set of known URLs that function as vertices. Links on the web pages corresponding to those URLs play the role of directed edges from the known vertices to new vertices. Following a breadth-first search algorithm, the crawler explores the network of URLs in "concentric circles" around the starting set: first visiting pages that are one link away from the starting URLs, then pages that are two links away, and so on.
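
Below is a minimal sketch of such a breadth-first crawler using only Python's standard library. The names (crawl_bfs, LinkExtractor) and the depth cutoff (max_depth) are illustrative assumptions, not part of any particular engine; a real crawler would also respect robots.txt, rate limits, and duplicate-content checks.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_bfs(seed_urls, max_depth=2):
    """Visit pages one link from the seeds, then two links, and so on."""
    visited = set(seed_urls)
    queue = deque((url, 0) for url in seed_urls)   # FIFO queue drives the BFS order
    pages = {}                                     # url -> raw HTML, the gathered content

    while queue:
        url, depth = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
        except OSError:
            continue                               # skip pages that fail to load
        pages[url] = html

        if depth == max_depth:
            continue                               # don't follow links past the depth limit

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)              # resolve relative links against the page URL
            if link.startswith("http") and link not in visited:
                visited.add(link)                  # mark when enqueued, so each URL is fetched once
                queue.append((link, depth + 1))

    return pages
```

The FIFO queue is what makes the crawl breadth-first: every depth-1 page is dequeued and fetched before any depth-2 page, which produces exactly the "concentric circles" order described above.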