Semantic URL

Semantic URLs, also sometimes referred to as clean URLs, RESTful URLs, user-friendly URLs, or SEO-friendly URLs, are Uniform Resource Locators (URLs) intended to improve the usability and accessibility of a website or web service by being immediately and intuitively meaningful to non-expert users. Such URL schemes tend to reflect the conceptual structure of a collection of information and decouple the user interface from a server's internal representation of information. Other reasons for using clean URLs include search engine optimization (SEO),[1] conforming to the representational state transfer (REST) style of software architecture, and ensuring that individual web resources remain consistently at the same URL. This makes the World Wide Web a more stable and useful system, and allows more durable and reliable bookmarking of web resources.[2]

Semantic URLs also do not contain implementation details of the underlying web application, which makes it easier to change the implementation of a resource at a later date. For example, many non-semantic URLs include the filename of a server-side script or the name of its directory, such as example.php, example.asp or cgi-bin. If the underlying implementation of a resource is changed, such URLs would need to change along with it. Likewise, when URLs are non-semantic, moving or restructuring the site database can break links, both internal ones and those from external sites; broken external links can in turn lead to removal from search engine listings. Semantic URLs present a consistent location for resources to user agents regardless of internal structure. A further potential benefit is that concealing internal server or application information can improve the security of a system.

Structure

A non-semantic URL is typically composed of a path, script name, and query string. The query string parameters dictate the content that is to be shown on the page, and frequently include information that is opaque or irrelevant to users, such as internal numeric identifiers for values in a database, illegibly encoded data, session IDs, implementation details, and so on. Semantic URLs, by contrast, contain only the path of a resource, in a hierarchy that reflects some logical structure which can easily be interpreted and manipulated by users.

Non-semantic URL                                                 Semantic URL
http://example.com/index.php?page=name                           http://example.com/name
http://example.com/index.php?page=consulting/marketing           http://example.com/consulting/marketing
http://example.com/products?category=2&pid=25                    http://example.com/products/2/25
http://example.com/cgi-bin/feed.cgi?feed=news&frm=rss            http://example.com/news.rss
http://example.com/services/index.jsp?category=legal&id=patents  http://example.com/services/legal/patents
http://example.com/kb/index.php?cat=8&id=41                      http://example.com/kb/8/41
http://example.com/index.php?mod=profiles&id=193                 http://example.com/profiles/193
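
The difference between the two forms can be seen by splitting a URL into its components. The following is a minimal Python sketch using the standard urllib.parse module; the helper name describe and the rule that a final path segment containing a dot counts as a script name are assumptions made for illustration, not part of any standard.

```python
from posixpath import basename
from urllib.parse import parse_qs, urlsplit


def describe(url):
    """Split a URL into its path, script name (if any), and query parameters."""
    parts = urlsplit(url)
    # Assumption: a final path segment containing a dot is a script file name.
    last_segment = basename(parts.path)
    script = last_segment if "." in last_segment else None
    return {
        "path": parts.path,
        "script": script,
        "query": parse_qs(parts.query),  # each value is a list of strings
    }


print(describe("http://example.com/kb/index.php?cat=8&id=41"))
print(describe("http://example.com/kb/8/41"))
```

For the non-semantic form the result carries a script name and two query parameters, while the semantic form yields only a path. The dot heuristic is crude: it would also flag a semantic URL such as http://example.com/news.rss.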

Implementation

The implementation of a semantic URL most often involves transparently rewriting it into the query string form understood by the server-side software. As this takes place on the server side, the semantic URL is the only form seen by the user.
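
The rewriting step can be sketched as a small table of patterns that map a semantic path to its internal query string form. This is an illustrative Python sketch, not the mechanism of any particular web server; the rule table REWRITE_RULES and the function rewrite are hypothetical names, and the rules are modelled on the example URLs in the table above.

```python
import re

# Hypothetical rules: (pattern for the semantic path, internal URL template).
REWRITE_RULES = [
    (re.compile(r"^/products/(\d+)/(\d+)$"), r"/products?category=\1&pid=\2"),
    (re.compile(r"^/profiles/(\d+)$"), r"/index.php?mod=profiles&id=\1"),
    (re.compile(r"^/kb/(\d+)/(\d+)$"), r"/kb/index.php?cat=\1&id=\2"),
]


def rewrite(path):
    """Return the internal URL for a semantic path, or None if no rule matches."""
    for pattern, template in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            # Substitute the captured path segments into the template.
            return match.expand(template)
    return None


print(rewrite("/products/2/25"))  # /products?category=2&pid=25
print(rewrite("/profiles/193"))   # /index.php?mod=profiles&id=193
```

Because the mapping is applied on the server, the internal form never needs to appear in links or in the browser's address bar.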

For search engine optimization purposes, web developers often take this opportunity to include relevant keywords in the URL and remove irrelevant words. Common words that are removed include articles and conjunctions, while descriptive keywords are added to increase user-friendliness and improve search engine rankings.[1]
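
Removing common words from a title before it is placed in a URL might look like the following Python sketch. The stop-word list and the function name keywords are assumptions chosen for illustration; real systems use their own lists.

```python
# Illustrative stop-word list limited to articles and conjunctions.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "nor"}


def keywords(title):
    """Return the lowercase words of a title with stop words removed."""
    words = title.lower().split()
    return [word for word in words if word not in STOP_WORDS]


print(keywords("The History of Tea and Coffee"))
# ['history', 'of', 'tea', 'coffee']
```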

A fragment identifier can be included at the end of a semantic URL for references within a page, and need not be user-readable.[3]
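
The fragment can be separated from the rest of a URL with Python's standard urllib.parse.urldefrag function, as in this short sketch. The wrapper split_fragment and the fragment value sec-3 are hypothetical.

```python
from urllib.parse import urldefrag


def split_fragment(url):
    """Return the URL without its fragment, and the fragment itself."""
    base, fragment = urldefrag(url)
    return base, fragment


print(split_fragment("http://example.com/services/legal/patents#sec-3"))
# ('http://example.com/services/legal/patents', 'sec-3')
```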

Slug

Some systems define a slug as the part of a URL which identifies a page using human-readable keywords.[4][5] It is usually the end part of the URL, which can be interpreted as the name of the resource, similar to the basename in a filename or the title of a page. The name is based on the use of the word slug in the news media to indicate a short name given to an article for internal use.

Slugs are typically generated automatically from a page title but can also be entered or altered manually, so that while the page title remains designed for display and human readability, its slug may be optimized for brevity or for consumption by search engines. Long page titles may also be truncated to keep the final URL to a reasonable length.

Slugs are generally entirely lowercase, with accented characters replaced by their closest equivalents in the English alphabet and whitespace characters replaced by a hyphen or an underscore, so that the slug does not need to be percent-encoded. Punctuation marks are generally removed.
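
These conventions can be sketched in Python as follows. This is one possible approach under stated assumptions, not the algorithm of any particular content management system; the function name slugify, its separator parameter, and the sample title are hypothetical.

```python
import re
import unicodedata


def slugify(title, separator="-"):
    """Build a lowercase ASCII slug from a page title."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then remove punctuation (keep letters, digits,
    # whitespace, hyphens and underscores).
    cleaned = re.sub(r"[^a-z0-9\s_-]", "", ascii_only.lower())
    # Collapse runs of whitespace, hyphens and underscores into one separator.
    return re.sub(r"[\s_-]+", separator, cleaned).strip(separator)


print(slugify("Café Menu: Spring & Summer!"))  # cafe-menu-spring-summer
```

Characters with no ASCII decomposition, such as "ø", are dropped rather than transliterated by this sketch, so a production system would typically need a transliteration table.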

References

  1. Opitz, Pascal (28 February 2006). "Clean URLs for better search engine ranking". Content with Style. Retrieved 9 September 2010.
  2. Berners-Lee, Tim (1998). "Cool URIs don't change". Style Guide for online hypertext. W3C. Retrieved 6 March 2011.
  3. "Uniform Resource Identifier (URI): Generic Syntax". RFC 3986. Internet Engineering Task Force. Retrieved 2 May 2014.
  4. "Slug". WordPress glossary.
  5. "Slug". Django glossary.
