pygtrie

Pure Python implementation of a trie data structure compatible with Python 2.x and Python 3.x.

Trie data structure, also known as radix or prefix tree, is a tree associating keys to values where all the descendants of a node have a common prefix (associated with that node).

The trie module contains pygtrie.Trie, pygtrie.CharTrie and pygtrie.StringTrie classes each implementing a mutable mapping interface, i.e. dict interface. As such, in most circumstances, pygtrie.Trie could be used as a drop-in replacement for a dict, but the prefix nature of the data structure is trie’s real strength.

The module also contains pygtrie.PrefixSet class which uses a trie to store a set of prefixes such that a key is contained in the set if it or its prefix is stored in the set.

Features

  • A full mutable mapping implementation.
  • Supports iterating over as well as deleting of a branch of a trie (i.e. subtrie)
  • Supports prefix checking as well as shortest and longest prefix look-up.
  • Extensible for any kind of user-defined keys.
  • A PrefixSet supports “all keys starting with given prefix” logic.
  • Can store any value including None.

For a few simple examples see example.py file.

Installation

To install pygtrie, simply run:

pip install pygtrie

or by adding line such as:

pygtrie == 2.*

to project’s requirements file.

Trie classes

class Trie(*args, **kwargs)[source]

A trie implementation with dict interface plus some extensions.

Keys used with the pygtrie.Trie class must be iterable which each component being a hashable objects. In other words, for a given key, dict.fromkeys(key) must be valid expression.

In particular, strings work well as trie keys, however when getting them back (for example via Trie.iterkeys method), instead of strings, tuples of characters are produced. For that reason, pygtrie.CharTrie or pygtrie.StringTrie classes may be preferred when using string keys.

__delitem__(key_or_slice)[source]

Deletes value associated with given key or raises KeyError.

If argument is a key, value associated with it is deleted. If the key is also a prefix, its descendents are not affected. On the other hand, if the argument is a slice (in which case it must have only start set), the whole subtrie is removed. For example:

>>> import pygtrie
>>> t = pygtrie.StringTrie()
>>> t['foo'] = 'Foo'
>>> t['foo/bar'] = 'Bar'
>>> t['foo/bar/baz'] = 'Baz'
>>> del t['foo/bar']
>>> t.keys()
['foo', 'foo/bar/baz']
>>> del t['foo':]
>>> t.keys()
[]
Parameters:

key_or_slice – A key to look for or a slice. If key is a slice, the whole subtrie will be removed.

Raises:
  • ShortKeyError – If the key has no value associated with it but is a prefix of some key with a value. This is not thrown if key_or_slice is a slice – in such cases, the whole subtrie is removed. Note that ShortKeyError is subclass of KeyError.
  • KeyError – If key has no value associated with it nor is a prefix of an existing key.
  • TypeError – If key is a slice whose stop or step are not None.
__eq__(other)[source]

Compares this trie’s mapping with another mapping.

Note that this method doesn’t take trie’s structure into consideration. What matters is whether keys and values in both mappings are the same. This may lead to unexpected results, for example:

>>> import pygtrie
>>> t0 = StringTrie({'foo/bar': 42}, separator='/')
>>> t1 = StringTrie({'foo.bar': 42}, separator='.')
>>> t0 == t1
False
>>> t0 = StringTrie({'foo/bar.baz': 42}, separator='/')
>>> t1 = StringTrie({'foo/bar.baz': 42}, separator='.')
>>> t0 == t1
True
>>> t0 = Trie({'foo': 42})
>>> t1 = CharTrie({'foo': 42})
>>> t0 == t1
False

This behaviour is required to maintain consistency with Mapping interface and its __eq__ method. For example, this implementation maintains transitivity of the comparison:

>>> t0 = StringTrie({'foo/bar.baz': 42}, separator='/')
>>> d = {'foo/bar.baz': 42}
>>> t1 = StringTrie({'foo/bar.baz': 42}, separator='.')
>>> t0 == d
True
>>> d == t1
True
>>> t0 == t1
True
>>> t0 = Trie({'foo': 42})
>>> d = {'foo': 42}
>>> t1 = CharTrie({'foo': 42})
>>> t0 == d
False
>>> d == t1
True
>>> t0 == t1
False
Parameters:other – Other object to compare to.
Returns:NotImplemented if this method does not know how to perform the comparison or a bool denoting whether the two objects are equal or not.
__getitem__(key_or_slice)[source]

Returns value associated with given key or raises KeyError.

When argument is a single key, value for that key is returned (or KeyError exception is thrown if the node does not exist or has no value associated with it).

When argument is a slice, it must be one with only start set in which case the access is identical to Trie.itervalues invocation with prefix argument.

Example

>>> import pygtrie
>>> t = pygtrie.StringTrie()
>>> t['foo/bar'] = 'Bar'
>>> t['foo/baz'] = 'Baz'
>>> t['qux'] = 'Qux'
>>> t['foo/bar']
'Bar'
>>> sorted(t['foo':])
['Bar', 'Baz']
>>> t['foo']  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ShortKeyError: 'foo'
Parameters:

key_or_slice – A key or a slice to look for.

Returns:

If a single key is passed, a value associated with given key. If a slice is passed, a generator of values in specified subtrie.

Raises:
  • ShortKeyError – If the key has no value associated with it but is a prefix of some key with a value. Note that ShortKeyError is subclass of KeyError.
  • KeyError – If key has no value associated with it nor is a prefix of an existing key.
  • TypeError – If key_or_slice is a slice but it’s stop or step are not None.
__init__(*args, **kwargs)[source]

Initialises the trie.

Arguments are interpreted the same way Trie.update interprets them.

__len__()[source]

Returns number of values in a trie.

Note that this method is expensive as it iterates over the whole trie.

__setitem__(key_or_slice, value)[source]

Sets value associated with given key.

If key_or_slice is a key, simply associate it with given value. If it is a slice (which must have start set only), it in addition clears any subtrie that might have been attached to particular key. For example:

>>> import pygtrie
>>> t = pygtrie.StringTrie()
>>> t['foo/bar'] = 'Bar'
>>> t['foo/baz'] = 'Baz'
>>> sorted(t.keys())
['foo/bar', 'foo/baz']
>>> t['foo':] = 'Foo'
>>> t.keys()
['foo']
Parameters:
  • key_or_slice – A key to look for or a slice. If it is a slice, the whole subtrie (if present) will be replaced by a single node with given value set.
  • value – Value to set.
Raises:

TypeError – If key is a slice whose stop or step are not None.

clear()[source]

Removes all the values from the trie.

copy(_Trie__make_copy=<function <lambda>>)[source]

Returns a shallow copy of the object.

enable_sorting(enable=True)[source]

Enables sorting of child nodes when iterating and traversing.

Normally, child nodes are not sorted when iterating or traversing over the trie (just like dict elements are not sorted). This method allows sorting to be enabled (which was the behaviour prior to pygtrie 2.0 release).

For Trie class, enabling sorting of children is identical to simply sorting the list of items since Trie returns keys as tuples. However, for other implementations such as StringTrie the two may behave subtly different. For example, sorting items might produce:

root/foo-bar
root/foo/baz

even though foo comes before foo-bar.

Parameters:enable – Whether to enable sorting of child nodes.
classmethod fromkeys(keys, value=None)[source]

Creates a new trie with given keys set.

This is roughly equivalent to calling the constructor with a (key, value) for key in keys generator.

Parameters:
  • keys – An iterable of keys that should be set in the new trie.
  • value – Value to associate with given keys.
Returns:

A new trie where each key from keys has been set to the given value.

has_key(key)[source]

Indicates whether given key has value associated with it.

See Trie.has_node for more detailed documentation.

has_node(key)[source]

Returns whether given node is in the trie.

Return value is a bitwise or of HAS_VALUE and HAS_SUBTRIE constants indicating node has a value associated with it and that it is a prefix of another existing key respectively. Both of those are independent of each other and all of the four combinations are possible. For example:

>>> import pygtrie
>>> t = pygtrie.StringTrie()
>>> t['foo/bar'] = 'Bar'
>>> t['foo/bar/baz'] = 'Baz'
>>> t.has_node('qux') == 0
True
>>> t.has_node('foo/bar/baz') == pygtrie.Trie.HAS_VALUE
True
>>> t.has_node('foo') == pygtrie.Trie.HAS_SUBTRIE
True
>>> t.has_node('foo/bar') == (pygtrie.Trie.HAS_VALUE |
...                           pygtrie.Trie.HAS_SUBTRIE)
True

There are two higher level methods built on top of this one which give easier interface for the information. Trie.has_key returns whether node has a value associated with it and Trie.has_subtrie checks whether node is a prefix. Continuing previous example:

>>> t.has_key('qux'), t.has_subtrie('qux')
(False, False)
>>> t.has_key('foo/bar/baz'), t.has_subtrie('foo/bar/baz')
(True, False)
>>> t.has_key('foo'), t.has_subtrie('foo')
(False, True)
>>> t.has_key('foo/bar'), t.has_subtrie('foo/bar')
(True, True)
Parameters:key – A key to look for.
Returns:Non-zero if node exists and if it does a bit-field denoting whether it has a value associated with it and whether it has a subtrie.
has_subtrie(key)[source]

Returns whether given key is a prefix of another key in the trie.

See Trie.has_node for more detailed documentation.

items(prefix=<pygtrie._NoChildren object>, shallow=False)[source]

Returns a list of (key, value) pairs in given subtrie.

This is equivalent to constructing a list from generator returned by Trie.iteritems which see for more detailed documentation.

iteritems(prefix=<pygtrie._NoChildren object>, shallow=False)[source]

Yields all nodes with associated values with given prefix.

Only nodes with values are output. For example:

>>> import pygtrie
>>> t = pygtrie.StringTrie()
>>> t['foo'] = 'Foo'
>>> t['foo/bar/baz'] = 'Baz'
>>> t['qux'] = 'Qux'
>>> sorted(t.items())
[('foo', 'Foo'), ('foo/bar/baz', 'Baz'), ('qux', 'Qux')]

Items are generated in topological order (i.e. parents before child nodes) but the order of siblings is unspecified. At an expense of efficiency, Trie.enable_sorting method can turn deterministic ordering of siblings.

With prefix argument, only items with specified prefix are generated (i.e. only given subtrie is traversed) as demonstrated by:

>>> t.items(prefix='foo')
[('foo', 'Foo'), ('foo/bar/baz', 'Baz')]

With shallow argument, if a node has value associated with it, it’s children are not traversed even if they exist which can be seen in:

>>> sorted(t.items(shallow=True))
[('foo', 'Foo'), ('qux', 'Qux')]
Parameters:
  • prefix – Prefix to limit iteration to.
  • shallow – Perform a shallow traversal, i.e. do not yield items if their prefix has been yielded.
Yields:

(key, value) tuples.

Raises:

KeyError – If prefix does not match any node.

iterkeys(prefix=<pygtrie._NoChildren object>, shallow=False)[source]

Yields all keys having associated values with given prefix.

This is equivalent to taking first element of tuples generated by Trie.iteritems which see for more detailed documentation.

Parameters:
  • prefix – Prefix to limit iteration to.
  • shallow – Perform a shallow traversal, i.e. do not yield keys if their prefix has been yielded.
Yields:

All the keys (with given prefix) with associated values in the trie.

Raises:

KeyError – If prefix does not match any node.

itervalues(prefix=<pygtrie._NoChildren object>, shallow=False)[source]

Yields all values associated with keys with given prefix.

This is equivalent to taking second element of tuples generated by Trie.iteritems which see for more detailed documentation.

Parameters:
  • prefix – Prefix to limit iteration to.
  • shallow – Perform a shallow traversal, i.e. do not yield values if their prefix has been yielded.
Yields:

All the values associated with keys (with given prefix) in the trie.

Raises:

KeyError – If prefix does not match any node.

keys(prefix=<pygtrie._NoChildren object>, shallow=False)[source]

Returns a list of all the keys, with given prefix, in the trie.

This is equivalent to constructing a list from generator returned by Trie.iterkeys which see for more detailed documentation.

longest_prefix(key)[source]

Finds the longest prefix of a key with a value.

This is roughly equivalent to taking the last object yielded by Trie.prefixes with additional handling for situations when no prefixes are found.

Example

>>> import pygtrie
>>> t = pygtrie.StringTrie()
>>> t['foo'] = 'Foo'
>>> t['foo/bar/baz'] = 'Baz'
>>> t.longest_prefix('foo/bar/baz/qux')
('foo/bar/baz': 'Baz')
>>> t.longest_prefix('foo/bar/baz/qux').key
'foo/bar/baz'
>>> t.longest_prefix('foo/bar/baz/qux').value
'Baz'
>>> t.longest_prefix('does/not/exist')
(None Step)
>>> bool(t.longest_prefix('does/not/exist'))
False
Parameters:key – Key to look for.
Returns:pygtrie.Trie._Step object (which can be used to extract or set node’s value as well as get node’s key), or a pygtrie.Trie._NoneStep object (which is falsy value simulating a _Step with None key and value) if no prefix is found.

The object can be treated as (key, value) pair denoting key with associated value of the prefix. This is deprecated, prefer using key and value properties of the object.

merge(other, overwrite=False)[source]

Moves nodes from other trie into this one.

The merging happens at trie structure level and as such is different than iterating over items of one trie and setting them in the other trie.

The merging may happen between different types of tries resulting in different (key, value) pairs in the destination trie compared to the source. For example, merging two pygtrie.StringTrie objects each using different separators will work as if the other trie had separator of this trie. Similarly, a pygtrie.CharTrie may be merged into a pygtrie.StringTrie but when keys are read those will be joined by the separator. For example:

>>> import pygtrie
>>> st = pygtrie.StringTrie(separator='.')
>>> st.merge(pygtrie.StringTrie({'foo/bar': 42}))
>>> list(st.items())
[('foo.bar', 42)]
>>> st.merge(pygtrie.CharTrie({'baz': 24}))
>>> sorted(st.items())
[('b.a.z', 24), ('foo.bar', 42)]

Not all tries can be merged into other tries. For example, a pygtrie.StringTrie may not be merged into a pygtrie.CharTrie because the latter imposes a requirement for each component in the key to be exactly one character while in the former components may be arbitrary length.

Note that the other trie is cleared and any references or iterators over it are invalidated. To preserve other’s value it needs to be copied first.

Parameters:
  • other – Other trie to move nodes from.
  • overwrite – Whether to overwrite existing values in this trie.
pop(key, default=<pygtrie._NoChildren object>)[source]

Deletes value associated with given key and returns it.

Parameters:
  • key – A key to look for.
  • default – If specified, value that will be returned if given key has no value associated with it. If not specified, method will throw KeyError in such cases.
Returns:

Removed value, if key had value associated with it, or default (if given).

Raises:
  • ShortKeyError – If default has not been specified and the key has no value associated with it but is a prefix of some key with a value. Note that ShortKeyError is subclass of KeyError.
  • KeyError – If default has not been specified and key has no value associated with it nor is a prefix of an existing key.
popitem()[source]

Deletes an arbitrary value from the trie and returns it.

There is no guarantee as to which item is deleted and returned. Neither in respect to its lexicographical nor topological order.

Returns:(key, value) tuple indicating deleted key.
Raises:KeyError – If the trie is empty.
prefixes(key)[source]

Walks towards the node specified by key and yields all found items.

Example

>>> import pygtrie
>>> t = pygtrie.StringTrie()
>>> t['foo'] = 'Foo'
>>> t['foo/bar/baz'] = 'Baz'
>>> list(t.prefixes('foo/bar/baz/qux'))
[('foo': 'Foo'), ('foo/bar/baz': 'Baz')]
>>> list(t.prefixes('does/not/exist'))
[]
Parameters:

key – Key to look for.

Yields:

pygtrie.Trie._Step objects which can be used to extract or set node’s value as well as get node’s key.

The objects can be treated as (k, value) pairs denoting keys with associated values encountered on the way towards the specified key. This is deprecated, prefer using key and value properties of the object.

setdefault(key, default=None)[source]

Sets value of a given node if not set already. Also returns it.

In contrast to Trie.__setitem__, this method does not accept slice as a key.

shortest_prefix(key)[source]

Finds the shortest prefix of a key with a value.

This is roughly equivalent to taking the first object yielded by Trie.prefixes with additional handling for situations when no prefixes are found.

Example

>>> import pygtrie
>>> t = pygtrie.StringTrie()
>>> t['foo'] = 'Foo'
>>> t['foo/bar/baz'] = 'Baz'
>>> t.shortest_prefix('foo/bar/baz/qux')
('foo': 'Foo')
>>> t.shortest_prefix('foo/bar/baz/qux').key
'foo'
>>> t.shortest_prefix('foo/bar/baz/qux').value
'Foo'
>>> t.shortest_prefix('does/not/exist')
(None Step)
>>> bool(t.shortest_prefix('does/not/exist'))
False
Parameters:key – Key to look for.
Returns:pygtrie.Trie._Step object (which can be used to extract or set node’s value as well as get node’s key), or a pygtrie.Trie._NoneStep object (which is falsy value simulating a _Step with None key and value) if no prefix is found.

The object can be treated as (key, value) pair denoting key with associated value of the prefix. This is deprecated, prefer using key and value properties of the object.

strictly_equals(other)[source]

Checks whether tries are equal with the same structure.

This is stricter comparison than the one performed by equality operator. It not only requires for keys and values to be equal but also for the two tries to be of the same type and have the same structure.

For example, for two pygtrie.StringTrie objects to be equal, they need to have the same structure as well as the same separator as seen below:

>>> import pygtrie
>>> t0 = StringTrie({'foo/bar': 42}, separator='/')
>>> t1 = StringTrie({'foo.bar': 42}, separator='.')
>>> t0.strictly_equals(t1)
False
>>> t0 = StringTrie({'foo/bar.baz': 42}, separator='/')
>>> t1 = StringTrie({'foo/bar.baz': 42}, separator='.')
>>> t0 == t1
True
>>> t0.strictly_equals(t1)
False
Parameters:other – Other trie to compare to.
Returns:Whether the two tries are the same type and have the same structure.
traverse(node_factory, prefix=<pygtrie._NoChildren object>)[source]

Traverses the tree using node_factory object.

node_factory is a callable which accepts (path_conv, path, children, value=…) arguments, where path_conv is a lambda converting path representation to key, path is the path to this node, children is an iterable of children nodes constructed by node_factory, optional value is the value associated with the path.

node_factory’s children argument is an iterator which has a few consequences:

  • To traverse into node’s children, the object must be iterated over. This can by accomplished by a simple children = list(children) statement.
  • Ignoring the argument allows node_factory to stop the traversal from going into the children of the node. In other words, whole subtries can be removed from traversal if node_factory chooses so.
  • If children is stored as is (i.e. as a iterator) when it is iterated over later on it may see an inconsistent state of the trie if it has changed between invocation of this method and the iteration.

However, to allow constant-time determination whether the node has children or not, the iterator implements bool conversion such that has_children = bool(children) will tell whether node has children without iterating over them. (Note that bool(children) will continue returning True even if the iterator has been iterated over).

Trie.traverse has two advantages over Trie.iteritems and similar methods:

  1. it allows subtries to be skipped completely when going through the list of nodes based on the property of the parent node; and
  2. it represents structure of the trie directly making it easy to convert structure into a different representation.

For example, the below snippet prints all files in current directory counting how many HTML files were found but ignores hidden files and directories (i.e. those whose names start with a dot):

import os
import pygtrie

t = pygtrie.StringTrie(separator=os.sep)

# Construct a trie with all files in current directory and all
# of its sub-directories.  Files get set a True value.
# Directories are represented implicitly by being prefixes of
# files.
for root, _, files in os.walk('.'):
    for name in files: t[os.path.join(root, name)] = True

def traverse_callback(path_conv, path, children, is_file=False):
    if path and path[-1] != '.' and path[-1][0] == '.':
        # Ignore hidden directory (but accept root node and '.')
        return 0
    elif is_file:
        print path_conv(path)
        return int(path[-1].endswith('.html'))
    else:
        # Otherwise, it's a directory.  Traverse into children.
        return sum(children)

print t.traverse(traverse_callback)

As documented, ignoring the children argument causes subtrie to be omitted and not walked into.

In the next example, the trie is converted to a tree representation where child nodes include a pointer to their parent. As before, hidden files and directories are ignored:

import os
import pygtrie

t = pygtrie.StringTrie(separator=os.sep)
for root, _, files in os.walk('.'):
    for name in files: t[os.path.join(root, name)] = True

class File(object):
    def __init__(self, name):
        self.name = name
        self.parent = None

class Directory(File):
    def __init__(self, name, children):
        super(Directory, self).__init__(name)
        self._children = children
        for child in children:
            child.parent = self

def traverse_callback(path_conv, path, children, is_file=False):
    if not path or path[-1] == '.' or path[-1][0] != '.':
        if is_file:
            return File(path[-1])
        children = filter(None, children)
        return Directory(path[-1] if path else '', children)

root = t.traverse(traverse_callback)

Note: Unlike iterators, when used on a deep trie, traverse method is prone to rising a RuntimeError exception when Python’s maximum recursion depth is reached. This can be addressed by not iterating over children inside of the node_factory. For example, the below code converts a trie into an undirected graph using adjacency list representation:

def undirected_graph_from_trie(t):
    '''Converts trie into a graph and returns its nodes.'''

    Node = collections.namedtuple('Node', 'path neighbours')

    class Builder(object):
        def __init__(self, path_conv, path, children, _=None):
            self.node = Node(path_conv(path), [])
            self.children = children
            self.parent = None

        def build(self, queue):
            for builder in self.children:
                builder.parent = self.node
                queue.append(builder)
            if self.parent:
                self.parent.neighbours.append(self.node)
                self.node.neighbours.append(self.parent)
            return self.node

    nodes = [t.traverse(Builder)]
    i = 0
    while i < len(nodes):
        nodes[i] = nodes[i].build(nodes)
        i += 1
    return nodes
Parameters:
  • node_factory – Makes opaque objects from the keys and values of the trie.
  • prefix – Prefix for node to start traversal, by default starts at root.
Returns:

Node object constructed by node_factory corresponding to the root node.

update(*args, **kwargs)[source]

Updates stored values. Works like dict.update.

values(prefix=<pygtrie._NoChildren object>, shallow=False)[source]

Returns a list of values in given subtrie.

This is equivalent to constructing a list from generator returned by Trie.itervalues which see for more detailed documentation.

walk_towards(key)[source]

Yields nodes on the path to given node.

Parameters:

key – Key of the node to look for.

Yields:

pygtrie.Trie._Step objects which can be used to extract or set node’s value as well as get node’s key.

When representing nodes with assigned values, the objects can be treated as (k, value) pairs denoting keys with associated values encountered on the way towards the specified key. This is deprecated, prefer using key and value properties or get method of the object.

Raises:

KeyError – If node with given key does not exist. It’s all right if they value is not assigned to the node provided it has a child node. Because the method is a generator, the exception is raised only once a missing node is encountered.

class CharTrie(*args, **kwargs)[source]

A variant of a pygtrie.Trie which accepts strings as keys.

The only difference between pygtrie.CharTrie and pygtrie.Trie is that when pygtrie.CharTrie returns keys back to the client (for instance when Trie.keys method is called), those keys are returned as strings.

Common example where this class can be used is a dictionary of words in a natural language. For example:

>>> import pygtrie
>>> t = pygtrie.CharTrie()
>>> t['wombat'] = True
>>> t['woman'] = True
>>> t['man'] = True
>>> t['manhole'] = True
>>> t.has_subtrie('wo')
True
>>> t.has_key('man')
True
>>> t.has_subtrie('man')
True
>>> t.has_subtrie('manhole')
False
class StringTrie(*args, **kwargs)[source]

pygtrie.Trie variant accepting strings with a separator as keys.

The trie accepts strings as keys which are split into components using a separator specified during initialisation (forward slash, i.e. /, by default).

Common example where this class can be used is when keys are paths. For example, it could map from a path to a request handler:

import pygtrie

def handle_root(): pass
def handle_admin(): pass
def handle_admin_images(): pass

handlers = pygtrie.StringTrie()
handlers[''] = handle_root
handlers['/admin'] = handle_admin
handlers['/admin/images'] = handle_admin_images

request_path = '/admin/images/foo'

handler = handlers.longest_prefix(request_path)
__init__(*args, **kwargs)[source]

Initialises the trie.

Except for a separator named argument, all other arguments are interpreted the same way Trie.update interprets them.

Parameters:
  • *args – Passed to super class initialiser.
  • **kwargs – Passed to super class initialiser.
  • separator – A separator to use when splitting keys into paths used by the trie. “/” is used if this argument is not specified. This named argument is not specified on the function’s prototype because of Python’s limitations.
Raises:
classmethod fromkeys(keys, value=None, separator='/')[source]

Creates a new trie with given keys set.

This is roughly equivalent to calling the constructor with a (key, value) for key in keys generator.

Parameters:
  • keys – An iterable of keys that should be set in the new trie.
  • value – Value to associate with given keys.
Returns:

A new trie where each key from keys has been set to the given value.

PrefixSet class

class PrefixSet(iterable=(), factory=<class 'pygtrie.Trie'>, **kwargs)[source]

A set of prefixes.

pygtrie.PrefixSet works similar to a normal set except it is said to contain a key if the key or it’s prefix is stored in the set. For instance, if “foo” is added to the set, the set contains “foo” as well as “foobar”.

The set supports addition of elements but does not support removal of elements. This is because there’s no obvious consistent and intuitive behaviour for element deletion.

__contains__(key)[source]

Checks whether set contains key or its prefix.

__init__(iterable=(), factory=<class 'pygtrie.Trie'>, **kwargs)[source]

Initialises the prefix set.

Parameters:
  • iterable – A sequence of keys to add to the set.
  • factory – A function used to create a trie used by the pygtrie.PrefixSet.
  • kwargs – Additional keyword arguments passed to the factory function.
__iter__()[source]

Return iterator over all prefixes in the set.

See PrefixSet.iter method for more info.

__len__()[source]

Returns number of keys stored in the set.

Since a key does not have to be explicitly added to the set to be an element of the set, this method does not count over all possible keys that the set contains (since that would be infinity), but only over the shortest set of prefixes of all the keys the set contains.

For example, if “foo” has been added to the set, the set contains also “foobar”, but this method will not count “foobar”.

add(value)[source]

Adds given value to the set.

If the set already contains prefix of the value being added, this operation has no effect. If the value being added is a prefix of some existing values in the set, those values are deleted and replaced by a single entry for the value being added.

For example, if the set contains value “foo” adding a value “foobar” does not change anything. On the other hand, if the set contains values “foobar” and “foobaz”, adding a value “foo” will replace those two values with a single value “foo”.

This makes a difference when iterating over the values or counting number of values. Counter intuitively, adding of a value can decrease size of the set.

Parameters:value – Value to add.
clear()[source]

Removes all keys from the set.

copy()[source]

Returns a shallow copy of the object.

discard(value)[source]

Raises NotImplementedError.

iter(prefix=<pygtrie._NoChildren object>)[source]

Iterates over all keys in the set optionally starting with a prefix.

Since a key does not have to be explicitly added to the set to be an element of the set, this method does not iterate over all possible keys that the set contains, but only over the shortest set of prefixes of all the keys the set contains.

For example, if “foo” has been added to the set, the set contains also “foobar”, but this method will not iterate over “foobar”.

If prefix argument is given, method will iterate over keys with given prefix only. The keys yielded from the function if prefix is given does not have to be a subset (in mathematical sense) of the keys yielded when there is not prefix. This happens, if the set contains a prefix of the given prefix.

For example, if only “foo” has been added to the set, iter method called with no arguments will yield “foo” only. However, when called with “foobar” argument, it will yield “foobar” only.

pop()[source]

Raises NotImplementedError.

remove(value)[source]

Raises NotImplementedError.

Custom exceptions

class ShortKeyError[source]

Raised when given key is a prefix of an existing longer key but does not have a value associated with itself.

Version History

2.5: 2022/07/16

  • Add pygtrie.Trie.merge method which merges structures of two tries.

  • Add pygtrie.Trie.strictly_equals method which compares two tries with stricter rules than regular equality operator. It’s not sufficient that keys and values are the same but the structure of the tries must be the same as well. For example:

    >>> t0 = StringTrie({'foo/bar.baz': 42}, separator='/')
    >>> t1 = StringTrie({'foo/bar.baz': 42}, separator='.')
    >>> t0 == t1
    True
    >>> t0.strictly_equals(t1)
    False
    
  • Fix pygtrie.Trie.__eq__ implementation such that key values are taken into consideration rather than just looking at trie structure. To see what this means it’s best to look at a few examples. Firstly:

    >>> t0 = StringTrie({'foo/bar': 42}, separator='/')
    >>> t1 = StringTrie({'foo.bar': 42}, separator='.')
    >>> t0 == t1
    False
    

    This used to be true since the two tries have the same node structure. However, as far as Mapping interface is concerned, they use different keys, i.e. `set(t0) != set(t1). Secondly:

    >>> t0 = StringTrie({'foo/bar.baz': 42}, separator='/')
    >>> t1 = StringTrie({'foo/bar.baz': 42}, separator='.')
    >>> t0 == t1
    True
    

    This used to be false since the two tries have different node structures (the first one splits key into ('foo', 'bar.baz') while the second into ('foo/bar', 'baz')). However, their keys are the same, i.e. `set(t0) == set(t1). And lastly:

    >>> t0 = Trie({'foo': 42})
    >>> t1 = CharTrie({'foo': 42})
    >>> t0 == t1
    False
    

    This used to be true since the two tries have the same node structure. However, the two classes return key as different values. pygtrie.Trie returns keys as tuples while pygtrie.CharTrie returns them as strings.

2.4.2: 2021/01/03

  • Remove use of ‘super’ in setup.py to fix compatibility with Python 2.7. This changes build code only; no changes to the library itself.

2.4.1: 2020/11/20

  • Remove dependency on packaging module from setup.py to fix installation on systems without that package. This changes build code only; no changes to the library itself. [Thanks to Eric McLachlan for reporting]

2.4.0: 2020/11/19 [pulled back from PyPi]

  • Change children argument of the node_factory passed to pygtrie.Trie.traverse from a generator to an iterator with a custom bool conversion. This allows checking whether node has children without having to iterate over them (bool(children))

    To test whether this feature is available, one can check whether Trie.traverse.uses_bool_convertible_children property is true, e.g.: getattr(pygtrie.Trie.traverse, 'uses_bool_convertible_children', False).

    [Thanks to Pallab Pain for suggesting the feature]

2.3.3: 2020/04/04

  • Fix to ‘AttributeError: _NoChildren object has no attribute sorted_items’ failure when iterating over a trie with sorting enabled. [Thanks to Pallab Pain for reporting]
  • Add value property setter to step objects returned by pygtrie.Trie.walk_towards et al. This deprecates the set method.
  • The module now exports pygtrie.__version__ making it possible to determine version of the library at run-time.

2.3.2: 2019/07/18

  • Trivial metadata fix

2.3.1: 2019/07/18 [pulled back from PyPi]

  • Fix to pygtrie.PrefixSet initialisation incorrectly storing elements even if their prefixes are also added to the set.

    For example, PrefixSet(('foo', 'foobar')) incorrectly resulted in a two-element set even though the interface dictates that only foo is kept (recall that if foo is member of the set, foobar is as well). [Thanks to Tal Maimon for reporting]

  • Fix to pygtrie.Trie.copy method not preserving enable-sorting flag and, in case of pygtrie.StringTrie, separator property.

  • Add support for the copy module so copy.copy can now be used with trie objects.

  • Leafs and nodes with just one child use more memory-optimised representation which reduces overall memory usage of a trie structure.

  • Minor performance improvement for adding new elements to a pygtrie.PrefixSet.

  • Improvements to string representation of objects which now includes type and, for pygtrie.StringTrie object, value of separator property.

2.3: 2018/08/10

  • New pygtrie.Trie.walk_towards method allows walking a path towards a node with given key accessing each step of the path. Compared to pygtrie.Trie.walk_prefixes method, steps for nodes without assigned values are returned.
  • Fix to pygtrie.PrefixSet.copy not preserving type of backing trie.
  • pygtrie.StringTrie now checks and explicitly rejects empty separators. Previously empty separator would be accepted but lead to confusing errors later on. [Thanks to Waren Long]
  • Various documentation improvements, Python 2/3 compatibility and test coverage (python-coverage reports 100%).

2.2: 2017/06/03

  • Fixes to setup.py breaking on Windows which prevents installation among other things.

2.1: 2017/03/23

2.0: 2016/07/06

  • Sorting of child nodes is disabled by default for better performance. pygtrie.Trie.enable_sorting method can be used to bring back old behaviour.
  • Tries of arbitrary depth can be pickled without reaching Python’s recursion limits. (N.B. The pickle format is incompatible with one from 1.2 release). _Node’s __getstate__ and __setstate__ method can be used to implement other serialisation methods such as JSON.

1.2: 2016/06/21 [pulled back from PyPI]

  • Tries can now be pickled.
  • Iterating no longer uses recursion so tries of arbitrary depth can be iterated over. The pygtrie.Trie.traverse method, however, still uses recursion thus cannot be used on big structures.

1.1: 2016/01/18

  • Fixed PyPI installation issues; all should work now.

1.0: 2015/12/16

  • The module has been renamed from trie to pygtrie. This could break current users but see documentation for how to quickly upgrade your scripts.
  • Added pygtrie.Trie.traverse method which goes through the nodes of the trie preserving structure of the tree. This is a depth-first traversal which can be used to search for elements or translate a trie into a different tree structure.
  • Minor documentation fixes.

0.9.3: 2015/05/28

  • Minor documentation fixes.

0.9.2: 2015/05/28

  • Added Sphinx configuration and updated docstrings to work better with Sphinx.

0.9.1: 2014/02/03

  • New name.

0.9: 2014/02/03

  • Initial release.