Web search query

From Wikipedia, the free encyclopedia

A web search query is a query that a user enters into a web search engine to satisfy an information need. Web search queries are distinctive in that they are unstructured and often ambiguous; they differ greatly from standard query languages, which are governed by strict syntax rules.

Types

There are three broad categories that cover most web search queries[1]:

  • Informational queries – Queries that cover a broad topic (e.g., colorado or trucks) for which there may be thousands of relevant results.
  • Navigational queries – Queries that seek a single website or web page of a single entity (e.g., youtube or delta airlines).
  • Transactional queries – Queries that reflect the intent of the user to perform a particular action, like purchasing a car or downloading a screen saver.

Search engines often support a fourth type of query that is used far less frequently:

  • Connectivity queries – Queries that report on the connectivity of the indexed web graph (e.g., Which links point to this URL?, and How many pages are indexed from this domain name?).
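
As a minimal sketch of how the two example connectivity queries could be answered, the snippet below inverts a small out-link graph to find the pages that point to a URL, and counts indexed pages per domain. The graph, the URLs, and the function names are all hypothetical and chosen for illustration; a real search engine's index is far larger and organized differently.

```python
from collections import defaultdict

# Hypothetical toy web graph: each indexed page maps to the pages it links to.
LINKS = {
    "a.example/home": ["b.example/post", "c.example/"],
    "b.example/post": ["c.example/"],
    "b.example/about": ["a.example/home", "c.example/"],
}


def build_inlinks(links):
    """Invert the out-link graph to answer 'Which links point to this URL?'."""
    inlinks = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            inlinks[target].add(source)
    return inlinks


def pages_in_domain(links, domain):
    """Count indexed pages (the keys of the graph) that belong to a domain."""
    return sum(1 for url in links if url.split("/", 1)[0] == domain)


INLINKS = build_inlinks(LINKS)
print(sorted(INLINKS["c.example/"]))
print(pages_in_domain(LINKS, "b.example"))
```

Building the inverted map once makes each "who links here" lookup a single dictionary access, which is why this sketch precomputes it rather than scanning the graph per query.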

Characteristics

Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by[2]. Nevertheless, a 2001 study[3] that analyzed queries from the Excite search engine revealed several characteristics of web search:

  • The average length of a search query was 2.4 terms.
  • About half of the users entered a single query while a little less than a third of users entered three or more unique queries.
  • Close to half of the users examined only the first one or two pages of results (10 results per page).
  • Less than 5% of users used advanced search features (e.g., Boolean operators like AND, OR, and NOT).
  • The top three most frequently used terms were and, of, and sex.

Another study in 2005 of Yahoo's query logs revealed that 33% of the queries from the same user were repeat queries and that 87% of the time the user would click on the same result[4]. This suggests that many users use repeat queries to revisit or re-find information.
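
One simple way to measure repeat behavior is sketched below. It assumes a time-ordered log of (user, query, clicked result) triples and treats a query as a repeat when the same user has issued the identical query string before; the study's own definitions may differ, and the log and names here are hypothetical.

```python
# Hypothetical time-ordered click log: (user_id, query, clicked_url).
CLICK_LOG = [
    ("u1", "youtube", "youtube.com"),
    ("u1", "youtube", "youtube.com"),
    ("u2", "youtube", "youtube.com"),
    ("u1", "trucks", "a.example"),
    ("u1", "trucks", "b.example"),
]


def repeat_stats(log):
    """Return (share of queries that are repeats, share of repeats with the same click)."""
    last_click = {}  # (user, query) -> most recently clicked result
    repeats = 0
    same_click = 0
    for user, query, url in log:
        key = (user, query)
        if key in last_click:
            repeats += 1
            if last_click[key] == url:
                same_click += 1
        last_click[key] = url
    repeat_share = repeats / len(log)
    same_click_share = same_click / repeats if repeats else 0.0
    return repeat_share, same_click_share


print(repeat_stats(CLICK_LOG))
```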

In addition, much research has shown that query term frequency distributions conform to a power law, or long-tail, distribution. That is, a small portion of the terms observed in a large query log (e.g., more than 100 million queries) are used most often, while the remaining terms are used less often individually[5]. This example of the Pareto principle (or 80-20 rule) allows search engines to employ optimization techniques such as index or database partitioning, caching, and pre-fetching.
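
To illustrate why a long-tail distribution makes caching attractive, the sketch below assumes an idealized Zipf distribution, in which the frequency of the item at rank r is proportional to 1/r^s, and computes the share of traffic covered by caching only the most frequent items. The exponent s = 1 and the sizes used are assumptions for illustration, not measurements from any query log.

```python
def zipf_coverage(num_items, cache_size, exponent=1.0):
    """Share of total traffic covered by the cache_size most frequent items,
    assuming frequency at rank r is proportional to 1 / r**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, num_items + 1)]
    return sum(weights[:cache_size]) / sum(weights)


# Caching the top 1% of 100,000 distinct items under the assumed distribution.
print(round(zipf_coverage(100_000, 1_000), 2))
```

Under these assumptions, caching 1% of the distinct items covers about 62% of all requests, which is the kind of skew that makes caching and partitioning by popularity worthwhile.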

References

  1. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze (2007), Introduction to Information Retrieval, Ch. 19.
  2. Dawn Kawamoto and Elinor Mills (2006), AOL apologizes for release of user search data.
  3. Amanda Spink, Dietmar Wolfram, Major B. J. Jansen, Tefko Saracevic (2001). "Searching the web: The public and their queries". Journal of the American Society for Information Science and Technology 52 (3): 226–234. doi:10.1002/1097-4571(2000)9999:9999<::AID-ASI1591>3.3.CO;2-I.
  4. Jaime Teevan, Eytan Adar, Rosie Jones, Michael Potts (2005). "History repeats itself: Repeat Queries in Yahoo's query logs". Proceedings of the 29th Annual ACM Conference on Research and Development in Information Retrieval (SIGIR '06): 703–704. doi:10.1145/1148170.1148326.
  5. Ricardo Baeza-Yates. "Applications of Web Query Mining", Springer Berlin / Heidelberg, pp. 7–22.
