Talk:Ternary search tree

I would like to dispute the claim that a ternary search tree is faster than hashing "for many typical search problems". I believe this claim is based on the paper located at:

http://www.cs.princeton.edu/~rs/strings

...which the Dr. Dobb's article is also based on. The paper compares the TST against a very slow hash table implementation. In particular, it uses a table size that is not prime (thus increasing the chance of collisions) and a load factor of 1.0, when most performant implementations use around 0.75 (see Java's Hashtable, for instance). A higher load factor increases collisions and decreases performance. It also uses plain chaining, whereas fast hash tables (for example Google's freely available code) include part of the chain in the table itself, thus almost always eliminating an extra cache miss. A typical hash table lookup may incur only about 2-4 cache misses in total, so saving one of them is a huge difference.
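
To make the load factor point concrete, here is a rough Python sketch of a chained hash table that counts key comparisons per successful lookup. This is toy code of my own, not the implementation from the paper or from Google; the names fnv1a, build_chained_table and average_probes are made up for illustration, and FNV-1a is used only because it is deterministic.

```python
def fnv1a(key):
    """32-bit FNV-1a over the UTF-8 bytes of key (assumed hash, chosen for determinism)."""
    h = 0x811C9DC5                      # FNV offset basis
    for b in key.encode("utf-8"):
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def build_chained_table(keys, num_buckets, hash_fn=fnv1a):
    """Build a separate-chaining table: one Python list (the chain) per bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        chain = buckets[hash_fn(k) % num_buckets]
        if k not in chain:              # ignore duplicate keys
            chain.append(k)
    return buckets


def average_probes(buckets, keys, hash_fn=fnv1a):
    """Mean number of key comparisons for a successful lookup of each key."""
    total = 0
    for k in keys:
        chain = buckets[hash_fn(k) % len(buckets)]
        total += chain.index(k) + 1     # position in the chain, counted from 1
    return total / len(keys)


def compare_load_factors(n=10000):
    """Return average probes at load factor 1.0 and at roughly 0.75."""
    keys = ["word%d" % i for i in range(n)]
    results = {}
    for num_buckets in (n, (n * 4) // 3 + 1):
        table = build_chained_table(keys, num_buckets)
        results[num_buckets] = average_probes(table, keys)
    return results
```

Calling compare_load_factors() gives the average chain position at both table sizes. The usual textbook estimate for a successful lookup with chaining is roughly 1 + (load factor)/2 comparisons, so the gap should be visible, though the exact numbers depend on the hash function and the keys. Note that this counts comparisons, not cache misses; each extra chain node in a pointer-based table is a potential extra miss.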

Finally, the test setup is biased towards TST performance. They do lookups one after another in a tight loop and in sorted order, meaning that most of the ~9 nodes the TST has to traverse for a successful lookup are already in the CPU cache. For unsuccessful lookups they change the first letter of a string, thus ensuring the best possible results from the TST (which always compares the characters of the string in order). For an isolated lookup, that is, when the program runs a lot of other code and then does a few query/add operations, a hash table will have far fewer cache misses than a TST.
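
For anyone who wants to see why changing the first letter is the best case for a TST, here is a minimal Python sketch of a ternary search tree whose lookup also reports how many nodes it touched. Again this is my own illustrative code, not the code from the paper; the class and method names are hypothetical, and it stores keys only (no values, no empty strings).

```python
class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch            # character stored at this node
        self.lo = None          # subtree for smaller characters
        self.eq = None          # subtree for the next character of the key
        self.hi = None          # subtree for larger characters
        self.is_end = False     # True if a key ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        if not key:
            raise ValueError("empty keys are not supported in this sketch")
        self.root = self._insert(self.root, key, 0)

    def _insert(self, node, key, i):
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1)
        else:
            node.is_end = True
        return node

    def lookup(self, key):
        """Return (found, nodes_visited); characters are compared in order."""
        if not key:
            return False, 0
        node, i, visited = self.root, 0, 0
        while node is not None:
            visited += 1
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node = node.eq      # matched this character, move to the next
                i += 1
            else:
                return node.is_end, visited
        return False, visited       # fell off the tree: key is absent


tree = TernarySearchTree()
for word in ("cat", "cab", "dog"):
    tree.insert(word)
```

With these three words, tree.lookup("cab") walks four nodes, while tree.lookup("xat") (first letter changed) gives up after two, because the mismatch is discovered before any of the later characters are examined. In a real implementation each node visited is a pointer dereference and so a potential cache miss, which is the cost the sorted, tight-loop benchmark hides.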

Given the above, perhaps the original article creator or somebody with in-depth knowledge can elaborate in the article on which search problems a TST is actually faster for.

I don't know how relevant this is (only found this article by googling 'digital trie'), but people looking to compare hashing to a ternary search tree may find this interesting: http://unthought.net/c++/c_vs_c++.html --MaXiMiUS 22:47, 17 August 2007 (UTC)