pygtrie

pygtrie is a Python library implementing a trie data structure.

Trie data structure, also known as radix or prefix tree, is a tree associating keys to values where all the descendants of a node have a common prefix (associated with that node).

The trie module contains trie.Trie, trie.CharTrie and trie.StringTrie classes each implementing a mutable mapping interface, i.e. dict interface. As such, in most circumstances, trie.Trie could be used as a drop-in replacement for a dict, but the prefix nature of the data structure is trie’s real strength.

The module also contains trie.PrefixSet class which uses a trie to store a set of prefixes such that a key is contained in the set if it or its prefix is stored in the set.

Features

  • A full mutable mapping implementation.
  • Supports iterating over as well as deleting a subtrie.
  • Supports prefix checking as well as shortest and longest prefix look-up.
  • Extensible for any kind of user-defined keys.
  • A PrefixSet supports “all keys starting with given prefix” logic.
  • Can store any value including None.

Installation

To install bz2file, run:

pip install pygtrie

Or download the sources and save trie.py file with your project.

Trie classes

class Trie(*args, **kwargs)[source]

A trie implementation with dict interface plus some extensions.

Keys used with the trie.Trie must be iterable, yielding hashable objects. In other words, for a given key, dict.fromkeys(key) must be valid.

In particular, strings work fine as trie keys, however when getting keys back from iterkeys() method for example, instead of strings, tuples of characters are produced. For that reason, trie.CharTrie or trie.StringTrie may be preferred when using trie.Trie with string keys.

clear()[source]

Removes all the values from the trie.

copy()[source]

Returns a shallow copy of the trie.

classmethod fromkeys(keys, value=None)[source]

Creates a new trie with given keys set.

Args:
keys: An iterable of keys that should be set in the new trie. value: Value to associate with given keys.
Returns:
A new trie where each key from keys has been set to the given value.
has_key(key)[source]

Indicates whether given key has value associated with it.

has_node(key)[source]

Returns whether given node is in the trie.

Args:
key: A key to look for.
Returns:
A bitwise or of HAS_VALUE and HAS_SUBTRIE values indicating that node has a value associated with it and that it is a prefix of another existing key respectively.
has_subtrie(key)[source]

Returns whether given key is a prefix of another key in the trie.

items(prefix=<object object>, shallow=False)[source]

Returns a list of (key, value) pairs in given subtrie.

iteritems(prefix=<object object>, shallow=False)[source]

Yields all nodes with associated values with given prefix.

Args:

prefix: Prefix to limit iteration to. shallow: Perform a shallow traversal, i.e. do not yield items if their

prefix has been yielded.
Yields:
(key, value) tuples.
Raises:
KeyError: If prefix does not match any node.
iterkeys(prefix=<object object>, shallow=False)[source]

Yields all keys having associated values with given prefix.

Args:

prefix: Prefix to limit iteration to. shallow: Perform a shallow traversal, i.e. do not yield keys if their

prefix has been yielded.
Yields:
All the keys (with given prefix) with associated values in the trie.
Raises:
KeyError: If prefix does not match any node.
itervalues(prefix=<object object>, shallow=False)[source]

Yields all values associated with keys with given prefix.

Args:

prefix: Prefix to limit iteration to. shallow: Perform a shallow traversal, i.e. do not yield values if their

prefix has been yielded.
Yields:
All the values associated with keys (with given prefix) in the trie.
Raises:
KeyError: If prefix does not match any node.
keys(prefix=<object object>, shallow=False)[source]

Returns a list of all the keys, with given prefix, in the trie.

longest_prefix(key)[source]

Finds the longest prefix of a key with a value.

Args:
key: Key to look for.
Returns:
(k, value) where k is the longest prefix of key (it may equal key) and value is a value associated with that key. If no node is found, (None, None) is returned.
pop(key, default=<object object>)[source]

Deletes value associated with given key and returns it.

Args:

key: A key to look for. Must be iterable. default: If specified, value that will be returned if given key has no

value associated with it. If not specified, method will throw KeyError in such cases.
Returns:
Removed value, if key had value associated with it, or default (if given).
Raises:
ShortKeyError: If default has not been specified and the key has no
value associated with it but is a prefix of some key with a value. Note that ShortKeyError is subclass of KeyError.
KeyError: If default has not been specified and key has no value
associated with it nor is a prefix of an existing key.
popitem()[source]

Deletes a value from the trie.

Returns:
(key, value) tuple indicating deleted key.
Raises:
KeyError: If the trie is empty.
prefixes(key)[source]

Walks towards the node specified by key and yields all found values.

Args:
key: Key to look for.
Yields:
(k, value) pairs denoting keys with associated values encountered on the way towards the specified key.
setdefault(key, value)[source]

Sets value of a given node if not set already. Returns it afterwards.

shortest_prefix(key)[source]

Finds the shortest prefix of a key with a value.

Args:
key: Key to look for.
Returns:
(k, value) where k is the shortest prefix of key (it may equal key) and value is a value associated with that key. If no node is found, (None, None) is returned.
update(*args, **kwargs)[source]

Updates stored values. Works like dict.update.

values(prefix=<object object>, shallow=False)[source]

Returns a list of values in given subtrie.

class CharTrie(*args, **kwargs)[source]

A variant of a trie.Trie which accepts strings as keys.

The only difference between trie.CharTrie and trie.Trie is that when trie.CharTrie returns keys back to the client (for instance in keys() method is called), those keys are returned as strings.

class StringTrie(*args, **kwargs)[source]

A variant of trie.Trie accepting strings with a separator as keys.

The trie accepts stings as keys which are split into components using a separator specified during initialisation (“/” by default).

PrefixSet class

class PrefixSet(iterable=None, factory=<class 'trie.Trie'>, **kwargs)[source]

A set of prefixes.

trie.PrefixSet works similar to a normal set except it is said to contain a key if the key or it’s prefix is stored in the set. For instance, if “foo” is added to the set, the set contains “foo” as well as “foobar”.

The set supports addition of elements but does not support removal of elements. This is because there’s no obvious consistent and intuitive behaviour for element deletion.

add(key)[source]

Adds given key to the set.

If the set already contains prefix of the key being added, this operation has no effect. If the key being added is a prefix of some existing keys in the set, those keys are deleted and replaced by a single entry for the key being added.

For example, if the set contains key “foo” adding a key “foobar” does not change anything. On the other hand, if the set contains keys “foobar” and “foobaz”, adding a key “foo” will replace those two keys with a single key “foo”.

This makes a difference when iterating over the keys or counting number of keys. Counter intuitively, adding of a key can decrease size of the set.

Args:
key: Key to add.
clear()[source]

Removes all keys from the set.

copy()[source]

Returns a copy of the prefix set.

iter(prefix=<object object>)[source]

Iterates over all keys in the set optionally starting with a prefix.

Since a key does not have to be explicitly added to the set to be an element of the set, this method does not iterate over all possible keys that the set contains, but only over the shortest set of prefixes of all the keys the set contains.

For example, if “foo” has been added to the set, the set contains also “foobar”, but this method will not iterate over “foobar”.

If prefix argument is given, method will iterate over keys with given prefix only. The keys yielded from the function if prefix is given does not have to be a subset (in mathematical sense) of the keys yielded when there is not prefix. This happens, if the set contains a prefix of the given prefix.

For example, if only “foo” has been added to the set, iter method called with no arguments will yield “foo” only. However, when called with “foobar” argument, it will yield “foobar” only.