Wildmat
- The correct title of this article is wildmat. The initial letter is shown capitalized due to technical restrictions.
wildmat is a pattern matching library developed by Rich Salz. Based on the wildcard syntax already used in the Bourne shell, wildmat provides a uniform mechanism for matching patterns across applications with simpler syntax than that typically offered by regular expressions. Patterns are implicitly anchored at the beginning and end of each string when testing for a match.
There are five pattern matching operations other than a strict one-to-one match between the pattern and the source to be checked for a match.
- The first is an asterisk (*) to match any sequence of zero or more characters.
- The second is a question mark (?) to match any single character.
- The third specifies a specific set of characters. The set is specified as a list of characters, or as a range of characters where the beginning and end of the range are separated by a minus (or dash) character, or as any combination of lists and ranges. The dash can also be included in the set as a character if it is the beginning or end of the set. This set is enclosed in square brackets. The close square bracket (]) may be used in a set if it is the first character in the set.
- The fourth operation is the logical negation of the third. It is specified the same way, with the addition of a caret character (^) as the first character of the set, just inside the open square bracket.
- The final operation uses the backslash character to remove the special meaning of an open square bracket ([), an asterisk, a question mark, or another backslash. Two backslashes in sequence therefore match a single literal backslash.
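The five operations above can be illustrated with a small matcher. The following is a minimal sketch in Python, not Salz's original C implementation. The names wildmat, _match and _match_set are chosen here for illustration only. Two behaviours are assumptions rather than anything stated above: an unterminated set never matches, and a lone trailing backslash is treated as a literal backslash.

```python
def wildmat(text: str, pattern: str) -> bool:
    """Return True if the whole of text matches pattern (implicitly anchored)."""
    return _match(text, 0, pattern, 0)


def _match_set(ch, pat, pi):
    """Test ch against the set starting at pat[pi] (just after '[').

    Returns (index of the closing ']', whether ch is accepted).
    """
    negate = pi < len(pat) and pat[pi] == "^"
    if negate:
        pi += 1
    found = False
    first = True  # a ']' in first position is a literal member
    while pi < len(pat) and (first or pat[pi] != "]"):
        lo = pat[pi]
        # A '-' between two members is a range; at either end it is literal.
        if pi + 2 < len(pat) and pat[pi + 1] == "-" and pat[pi + 2] != "]":
            found = found or lo <= ch <= pat[pi + 2]
            pi += 3
        else:
            found = found or ch == lo
            pi += 1
        first = False
    if pi >= len(pat):
        return pi, False  # assumption: an unterminated set never matches
    return pi, found != negate


def _match(text, ti, pat, pi):
    while pi < len(pat):
        c = pat[pi]
        if c == "*":
            # Collapse consecutive '*', then try every possible split point.
            while pi < len(pat) and pat[pi] == "*":
                pi += 1
            if pi == len(pat):
                return True  # a trailing '*' matches whatever remains
            return any(_match(text, k, pat, pi)
                       for k in range(ti, len(text) + 1))
        if ti >= len(text):
            return False  # pattern needs a character but text is exhausted
        if c == "?":
            pass  # matches any single character
        elif c == "[":
            pi, ok = _match_set(text[ti], pat, pi + 1)
            if not ok:
                return False
        else:
            if c == "\\" and pi + 1 < len(pat):
                pi += 1  # backslash: take the next character literally
            if pat[pi] != text[ti]:
                return False
        ti += 1
        pi += 1
    # Implicit anchoring: the text must be fully consumed as well.
    return ti == len(text)
```

For example, wildmat("comp.lang.c", "comp.*") is true, while wildmat("foobar", "foo") is false because the pattern is anchored at both ends. The recursive handling of the asterisk is kept simple for readability and can be slow on patterns with many asterisks.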
wildmat is most commonly seen in NNTP implementations such as Salz's own INN, and also appears in unrelated software such as GNU tar.
The full wildmat syntax is unable to handle multi-octet character sets, and poses problems when the text being searched may contain multiple incompatible character sets. A simplified version of wildmat oriented toward UTF-8 encoding has been developed by the IETF NNTP working group, to be included in an upcoming standards document.
External links
- comp.sources.misc article from Rich Salz containing the wildmat source code