pygtrie

Implementation of a trie data structure.

Trie data structure, also known as radix or prefix tree, is a tree associating keys to values where all the descendants of a node have a common prefix (associated with that node).

The trie module contains pygtrie.Trie, pygtrie.CharTrie and pygtrie.StringTrie classes each implementing a mutable mapping interface, i.e. dict interface. As such, in most circumstances, pygtrie.Trie could be used as a drop-in replacement for a dict, but the prefix nature of the data structure is trie’s real strength.

The module also contains pygtrie.PrefixSet class which uses a trie to store a set of prefixes such that a key is contained in the set if it or its prefix is stored in the set.

Features

  • A full mutable mapping implementation.
  • Supports iterating over as well as deleting a subtrie.
  • Supports prefix checking as well as shortest and longest prefix look-up.
  • Extensible for any kind of user-defined keys.
  • A PrefixSet supports “all keys starting with given prefix” logic.
  • Can store any value including None.

For some simple examples see example.py file.

Installation

To install pygtrie, run:

pip install pygtrie

Or download the sources and save pygtrie.py file with your project.

Upgrading from 0.9.x

The 1.0 release introduced backwards incompatibility in naming. The module has been renamed from trie to pygtrie. Fortunately, updating scripts using pygtrie should boil down to replacing:

from pytrie import trie

with:

import pygtrie as trie

Trie classes

class Trie(*args, **kwargs)[source]

A trie implementation with dict interface plus some extensions.

Keys used with the pygtrie.Trie must be iterable, yielding hashable objects. In other words, for a given key, dict.fromkeys(key) must be valid.

In particular, strings work fine as trie keys, however when getting keys back from iterkeys() method for example, instead of strings, tuples of characters are produced. For that reason, pygtrie.CharTrie or pygtrie.StringTrie may be preferred when using pygtrie.Trie with string keys.

clear()[source]

Removes all the values from the trie.

copy()[source]

Returns a shallow copy of the trie.

enable_sorting(enable=True)[source]

Enables sorting of child nodes when iterating and traversing.

Normally, child nodes are not sorted when iterating or traversing over the trie (just like dict elements are not sorted). This method allows sorting to be enabled (which was the behaviour prior to pygtrie 2.0 release).

For Trie class, enabling sorting of children is identical to simply sorting the list of items since Trie returns keys as tuples. However, for other implementations such as StringTrie the two may behove subtly different. For example, sorting items might produce:

root/foo-bar root/foo/baz

even though foo comes before foo-bar.

Args:
enable: Whether to enable sorting of child nodes.
classmethod fromkeys(keys, value=None)[source]

Creates a new trie with given keys set.

Args:
keys: An iterable of keys that should be set in the new trie. value: Value to associate with given keys.
Returns:
A new trie where each key from keys has been set to the given value.
has_key(key)[source]

Indicates whether given key has value associated with it.

has_node(key)[source]

Returns whether given node is in the trie.

Args:
key: A key to look for.
Returns:
A bitwise or of HAS_VALUE and HAS_SUBTRIE values indicating that node has a value associated with it and that it is a prefix of another existing key respectively.
has_subtrie(key)[source]

Returns whether given key is a prefix of another key in the trie.

items(prefix=<object object>, shallow=False)[source]

Returns a list of (key, value) pairs in given subtrie.

iteritems(prefix=<object object>, shallow=False)[source]

Yields all nodes with associated values with given prefix.

Args:

prefix: Prefix to limit iteration to. shallow: Perform a shallow traversal, i.e. do not yield items if

their prefix has been yielded.
Yields:
(key, value) tuples.
Raises:
KeyError: If prefix does not match any node.
iterkeys(prefix=<object object>, shallow=False)[source]

Yields all keys having associated values with given prefix.

Args:

prefix: Prefix to limit iteration to. shallow: Perform a shallow traversal, i.e. do not yield keys if

their prefix has been yielded.
Yields:
All the keys (with given prefix) with associated values in the trie.
Raises:
KeyError: If prefix does not match any node.
itervalues(prefix=<object object>, shallow=False)[source]

Yields all values associated with keys with given prefix.

Args:

prefix: Prefix to limit iteration to. shallow: Perform a shallow traversal, i.e. do not yield values if

their prefix has been yielded.
Yields:
All the values associated with keys (with given prefix) in the trie.
Raises:
KeyError: If prefix does not match any node.
keys(prefix=<object object>, shallow=False)[source]

Returns a list of all the keys, with given prefix, in the trie.

longest_prefix(key)[source]

Finds the longest prefix of a key with a value.

Args:
key: Key to look for.
Returns:
(k, value) where k is the longest prefix of key (it may equal key) and value is a value associated with that key. If no node is found, (None, None) is returned.
pop(key, default=<object object>)[source]

Deletes value associated with given key and returns it.

Args:

key: A key to look for. Must be iterable. default: If specified, value that will be returned if given key has

no value associated with it. If not specified, method will throw KeyError in such cases.
Returns:
Removed value, if key had value associated with it, or default (if given).
Raises:
ShortKeyError: If default has not been specified and the key has
no value associated with it but is a prefix of some key with a value. Note that ShortKeyError is subclass of KeyError.
KeyError: If default has not been specified and key has no value
associated with it nor is a prefix of an existing key.
popitem()[source]

Deletes a value from the trie.

Returns:
(key, value) tuple indicating deleted key.
Raises:
KeyError: If the trie is empty.
prefixes(key)[source]

Walks towards the node specified by key and yields all found values.

Args:
key: Key to look for.
Yields:
(k, value) pairs denoting keys with associated values encountered on the way towards the specified key.
setdefault(key, value=None)[source]

Sets value of a given node if not set already. Also returns it.

shortest_prefix(key)[source]

Finds the shortest prefix of a key with a value.

Args:
key: Key to look for.
Returns:
(k, value) where k is the shortest prefix of key (it may equal key) and value is a value associated with that key. If no node is found, (None, None) is returned.
traverse(node_factory, prefix=<object object>)[source]

Traverses the tree using node_factory object.

node_factory is a callable function which accepts (path_conv, path, children, value=...) arguments, where path_conv is a lambda converting path representation to key, path is the path to this node, children is an iterable of children nodes constructed by node_factory, optional value is the value associated with the path.

node_factory’s children argument is a generator which has a few consequences:

  • To traverse into node’s children, the generator must be iterated over. This can by accomplished by a simple “children = list(children)” statement.
  • Ignoring the argument allows node_factory to stop the traversal from going into the children of the node. In other words, whole subtrie can be removed from traversal if node_factory chooses so.
  • If children is stored as is (i.e. as a generator) when it is iterated over later on it will see state of the trie as it is during the iteration and not when traverse method was called.
Note: Unlike iterators, traverse method uses stack recursion which means
that using it on deep tries may lead to a RuntimeError exception thrown once Python’s maximum recursion depth is reached.
Args:
node_factory: Makes opaque objects from the keys and values of the
trie.
prefix: Prefix for node to start traversal, by default starts at
root.
Returns:
Node object constructed by node_factory corresponding to the root node.
update(*args, **kwargs)[source]

Updates stored values. Works like dict.update.

values(prefix=<object object>, shallow=False)[source]

Returns a list of values in given subtrie.

class CharTrie(*args, **kwargs)[source]

A variant of a pygtrie.Trie which accepts strings as keys.

The only difference between pygtrie.CharTrie and pygtrie.Trie is that when pygtrie.CharTrie returns keys back to the client (for instance in keys() method is called), those keys are returned as strings.

class StringTrie(*args, **kwargs)[source]

pygtrie.Trie variant accepting strings with a separator as keys.

The trie accepts strings as keys which are split into components using a separator specified during initialisation (“/” by default).

PrefixSet class

class PrefixSet(iterable=None, factory=<class 'pygtrie.Trie'>, **kwargs)[source]

A set of prefixes.

pygtrie.PrefixSet works similar to a normal set except it is said to contain a key if the key or it’s prefix is stored in the set. For instance, if “foo” is added to the set, the set contains “foo” as well as “foobar”.

The set supports addition of elements but does not support removal of elements. This is because there’s no obvious consistent and intuitive behaviour for element deletion.

add(key)[source]

Adds given key to the set.

If the set already contains prefix of the key being added, this operation has no effect. If the key being added is a prefix of some existing keys in the set, those keys are deleted and replaced by a single entry for the key being added.

For example, if the set contains key “foo” adding a key “foobar” does not change anything. On the other hand, if the set contains keys “foobar” and “foobaz”, adding a key “foo” will replace those two keys with a single key “foo”.

This makes a difference when iterating over the keys or counting number of keys. Counter intuitively, adding of a key can decrease size of the set.

Args:
key: Key to add.
clear()[source]

Removes all keys from the set.

copy()[source]

Returns a copy of the prefix set.

iter(prefix=<object object>)[source]

Iterates over all keys in the set optionally starting with a prefix.

Since a key does not have to be explicitly added to the set to be an element of the set, this method does not iterate over all possible keys that the set contains, but only over the shortest set of prefixes of all the keys the set contains.

For example, if “foo” has been added to the set, the set contains also “foobar”, but this method will not iterate over “foobar”.

If prefix argument is given, method will iterate over keys with given prefix only. The keys yielded from the function if prefix is given does not have to be a subset (in mathematical sense) of the keys yielded when there is not prefix. This happens, if the set contains a prefix of the given prefix.

For example, if only “foo” has been added to the set, iter method called with no arguments will yield “foo” only. However, when called with “foobar” argument, it will yield “foobar” only.

Version History

2.0: 2016/07/06

  • Sorting of child nodes is disabled by default for better performance. enable_sorting method can be used to bring back old behaviour.
  • Tries of arbitrary depth can be pickled without reaching Python’s recursion limits. (N.B. The pickle format is incompatible with one from 1.2 release). _Node’s __getstate__ and __setstate__ method can be used to implement other serialisation methods such as JSON.

1.2: 2016/06/21 [pulled back from PyPi]

  • Tries can now be pickled.
  • Iterating no longer uses recursion so tries of arbitrary depth can be iterated over. The traverse method, however, still uses recursion thus cannot be used on big structures.

1.1: 2016/01/18

  • Fixed PyPi installation issues; all should work now.

1.0: 2015/12/16

  • The module has been renamed from trie to pygtrie. This could break current users but see documentation for how to quickly upgrade your scripts.
  • Added traverse method which goes through the nodes of the trie preserving structure of the tree. This is a depth-first traversal which can be used to search for elements or translate a trie into a different tree structure.
  • Minor documentation fixes.

0.9.3: 2015/05/28

  • Minor documentation fixes.

0.9.2: 2015/05/28

  • Added Sphinx configuration and updated docstrings to work better with Sphinx.

0.9.1: 2014/02/03

  • New name.

0.9: 2014/02/03

  • Initial release.