Web search query
From Wikipedia, the free encyclopedia
A web search query is a query that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are unstructured and often ambiguous; they differ greatly from standard query languages, which are governed by strict syntax rules.
Types
There are three broad categories that cover most web search queries[1]:
- Informational queries – Queries that cover a broad topic (e.g., colorado or trucks) for which there may be thousands of relevant results.
- Navigational queries – Queries that seek a single website or web page of a single entity (e.g., youtube or delta airlines).
- Transactional queries – Queries that reflect the intent of the user to perform a particular action, like purchasing a car or downloading a screen saver.
Search engines often support a fourth type of query that is used far less frequently:
- Connectivity queries – Queries that report on the connectivity of the indexed web graph (e.g., Which links point to this URL?, and How many pages are indexed from this domain name?).
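The four categories above can be pictured as labels assigned to a raw query string. The following is a minimal sketch, not a method described in the cited sources: the hint lists `KNOWN_SITES` and `ACTION_WORDS`, the `link:` and `site:` prefixes, and the rule order are all illustrative assumptions, and a real search engine would rely on far richer signals.

```python
from enum import Enum


class QueryType(Enum):
    INFORMATIONAL = "informational"
    NAVIGATIONAL = "navigational"
    TRANSACTIONAL = "transactional"
    CONNECTIVITY = "connectivity"


# Hypothetical hint lists, chosen only to mirror the examples in the text.
KNOWN_SITES = {"youtube", "delta airlines"}
ACTION_WORDS = {"buy", "purchase", "download", "order"}
# Assumed operator prefixes for questions about the indexed web graph.
CONNECTIVITY_PREFIXES = ("link:", "site:")


def classify(query: str) -> QueryType:
    """Assign one of the four query types using simple hand-written rules."""
    q = query.strip().lower()
    if q.startswith(CONNECTIVITY_PREFIXES):
        return QueryType.CONNECTIVITY
    if q in KNOWN_SITES:
        return QueryType.NAVIGATIONAL
    if ACTION_WORDS & set(q.split()):
        return QueryType.TRANSACTIONAL
    # Anything else is treated as a broad informational query.
    return QueryType.INFORMATIONAL
```

Under these rules, `classify("youtube")` is navigational, `classify("download screen saver")` is transactional, and `classify("colorado")` falls through to informational.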
Characteristics
Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by[2]. Nevertheless, a 2001 study[3] that analyzed queries from the Excite search engine revealed several notable characteristics of web search:
- The average length of a search query was 2.4 terms.
- About half of the users entered a single query while a little less than a third of users entered three or more unique queries.
- Close to half of the users examined only the first one or two pages of results (10 results per page).
- Less than 5% of users used advanced search features (e.g., Boolean operators like AND, OR, and NOT).
- The three most frequently used terms were "and", "of", and "sex".
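Statistics of this kind can be derived from a list of raw query strings. The sketch below is only an illustration of the idea, not the methodology of the Excite study: the function name `summarize`, whitespace tokenization, and the rule that only uppercase AND, OR, and NOT count as Boolean operators are all assumptions.

```python
from collections import Counter

# Assumption: only uppercase forms count as operators, so a lowercase
# "and" is still tallied as an ordinary term.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def summarize(queries):
    """Return mean query length, share of queries using Boolean
    operators, and the three most frequent terms."""
    if not queries:
        raise ValueError("need at least one query")
    term_counts = Counter()
    total_terms = 0
    with_operators = 0
    for q in queries:
        terms = q.split()  # naive whitespace tokenization
        total_terms += len(terms)
        if BOOLEAN_OPERATORS & set(terms):
            with_operators += 1
        term_counts.update(t.lower() for t in terms)
    n = len(queries)
    return {
        "mean_terms": total_terms / n,
        "operator_share": with_operators / n,
        "top_terms": [t for t, _ in term_counts.most_common(3)],
    }


sample = ["colorado", "delta airlines", "cats AND dogs", "history of colorado"]
stats = summarize(sample)
```

For the four sample queries, the mean length is 9 terms over 4 queries (2.25), and one query in four uses a Boolean operator.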
Another study in 2005 of Yahoo's query logs revealed that 33% of the queries from the same user were repeat queries and that 87% of the time the user would click on the same result[4]. This suggests that many users issue repeat queries to revisit or re-find information.
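A repeat-query measurement of this sort can be sketched over a click log. The example below assumes a hypothetical log of `(user, query, clicked_url)` tuples in chronological order, and it counts a click as "the same result" when it matches that user's most recent click for the same query; the cited study may define both notions differently.

```python
def repeat_stats(log):
    """Return (share of queries that repeat an earlier query by the same
    user, share of those repeats that click the same result again)."""
    last_click = {}  # (user, normalized query) -> most recently clicked URL
    total = repeats = same_click = 0
    for user, query, url in log:
        total += 1
        key = (user, query.strip().lower())
        if key in last_click:
            repeats += 1
            if last_click[key] == url:
                same_click += 1
        last_click[key] = url
    repeat_share = repeats / total if total else 0.0
    same_click_share = same_click / repeats if repeats else 0.0
    return repeat_share, same_click_share


# Made-up log entries, for illustration only.
log = [
    ("u1", "python docs", "docs.python.org"),
    ("u1", "python docs", "docs.python.org"),
    ("u2", "python docs", "docs.python.org"),
    ("u1", "weather", "example.com/weather"),
    ("u1", "python docs", "wiki.python.org"),
]
repeat_share, same_click_share = repeat_stats(log)
```

In this toy log, two of the five entries repeat an earlier query by the same user, and one of those two repeats clicks the same URL as before.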
In addition, much research has shown that query term frequency distributions conform to a power law, or long-tail, distribution. That is, a small portion of the terms observed in a large query log (e.g., more than 100 million queries) are used most often, while the remaining terms are used less often individually.[5] This example of the Pareto principle (or 80-20 rule) allows search engines to employ optimization techniques such as index or database partitioning, caching and pre-fetching.
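The concentration of usage in a few terms can be checked by measuring how much of the total term volume the most frequent terms account for. This is a minimal sketch under stated assumptions: the function `head_coverage` and the synthetic Zipf-like counts (the term of rank r occurring about 1000 / r times) are illustrative and are not data from any real query log.

```python
from collections import Counter


def head_coverage(term_counts, head_fraction=0.2):
    """Share of all term occurrences accounted for by the most
    frequent `head_fraction` of distinct terms."""
    counts = sorted(term_counts.values(), reverse=True)
    if not counts:
        return 0.0
    head_size = max(1, int(len(counts) * head_fraction))
    return sum(counts[:head_size]) / sum(counts)


# Synthetic Zipf-like frequencies for 100 hypothetical terms.
zipf_counts = Counter({f"term{r}": 1000 // r for r in range(1, 101)})
coverage = head_coverage(zipf_counts)
```

When the coverage of the head is high, caching results for only the most frequent terms can serve a large share of traffic, which is the intuition behind the optimizations mentioned above. For a uniform distribution, by contrast, the top 20% of terms would cover only 20% of occurrences.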
References
- [1] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze (2007), Introduction to Information Retrieval, Ch. 19
- [2] Dawn Kawamoto and Elinor Mills (2006), AOL apologizes for release of user search data
- [3] Amanda Spink, Dietmar Wolfram, Major B. J. Jansen, Tefko Saracevic (2001). "Searching the web: The public and their queries". Journal of the American Society for Information Science and Technology 52 (3): 226–234.
- [4] Jaime Teevan, Eytan Adar, Rosie Jones, Michael Potts (2005). "History repeats itself: Repeat Queries in Yahoo's query logs". Proceedings of the 29th Annual ACM Conference on Research and Development in Information Retrieval (SIGIR '06): 703–704. doi:10.1145/1148170.1148326.
- [5] Ricardo Baeza-Yates. "Applications of Web Query Mining", Springer Berlin / Heidelberg, pp. 7–22.