'How to implement the remove function of a trie in python?

I've read the following implementation of the trie in python: https://stackoverflow.com/a/11016430/2225221

and tried to make the remove fnction for it. Basically, I had problems even with the start: If you want to remove a word from a trie, it can has sub-"words", or it can be "subword" of another word.

If you remove with "del dict[key]", you are removing these above mentioned two kinds of words also. Could anyone help me in this, how to remove properly the chosen word (let us presume it's in the trie)



Solution 1:[1]

I think it is better to do it recursively, code as following:

def remove(self, word):
    self.delete(self.tries, word, 0)

def delete(self, dicts, word, i):
    if i == len(word):
        if 'end' in dicts:
            del dicts['end']
            if len(dicts) == 0:
                return True
            else:
                return False
        else:
            return False
    else:
        if word[i] in dicts and self.delete(dicts[word[i]], word, i + 1):
            if len(dicts[word[i]]) == 0:
                del dicts[word[i]]
                return True
            else:
                return False

        else:
            return False

Solution 2:[2]

def remove_a_word_util(self, word, idx, node):
    if len(word) == idx:
        node.is_end_of_word = False
        return bool(node.children)

    ch = word[idx]
    if ch not in node.children:
        return True

    flag = self.remove_a_word_util(word, idx+1, node.children[ch])
    if flag:
        return True

    node.children.pop(ch)
    return bool(node.children) or node.is_end_of_word

Solution 3:[3]

One method of handling structures like this is through recursion. The great thing about recursion in this case is that it zips to the bottom of the trie, then passes the returned values back up through the branches.

The following function does just that. It goes to the leaf and deletes the _end value, just in case the input word is a prefix of another. It then passes up a boolean (boo) which indicates that the current_dict is still in an outlying branch. Once we hit a point where the current dict has more than one child, we delete the appropriate branch and set boo to False so each remaining recursion will do nothing.

def trie_trim(term, trie=SYNONYMS, prev=0):
    # checks that we haven't hit the end of the word
    if term:
        first, rest = term[0], term[1:]
        current_length = len(trie)
        next_length, boo = trie_trim(rest, trie=trie[first], prev=current_length)

        # this statement avoids trimming excessively if the input is a prefix because
        # if the word is a prefix, the first returned value will be greater than 1
        if boo and next_length > 1:
            boo = False

        # this statement checks for the first occurrence of the current dict having more than one child
        # or it checks that we've hit the bottom without trimming anything
        elif boo and (current_length > 1 or not prev):
            del trie[first]
            boo = False

        return current_length, boo

    # when we do hit the end of the word, delete _end
    else:
        del trie[_end]
        return len(trie) + 1, True

Solution 4:[4]

A bit of a long one, but I hope this helps answer your question:

class Trie:
    WORD_END = "$"
    
    def __init__(self):
        self.trie = {}

    def insert(self, word):
        cur = self.trie
        for char in word:
            if char not in cur:
                cur[char] = {}
            cur = cur[char]
        cur[Trie.WORD_END] = word

    def delete(self, word):
        def _delete(word, cur_trie, i=0):
            if i == len(word):
                if Trie.WORD_END not in cur_trie:
                    raise ValueError("'%s' is not registered in the trie..." %word)
                cur_trie.pop(Trie.WORD_END)
                if len(cur_trie) > 0:
                    return False
                return True
            if word[i] not in cur_trie:
                raise ValueError("'%s' is not registered in the trie..." %word)
            cont = _delete(word, cur_trie[word[i]], i+1)
            if cont:
                cur_trie.pop(word[i])
                if Trie.WORD_END in cur_trie:
                    return False
                return True
            return False
        _delete(word, self.trie)

t = Trie()
t.insert("bar")
t.insert("baraka")
t.insert("barakalar")

t.delete("barak") # raises error as 'barak' is not a valid WORD_END although it is a valid path.
t.delete("bareka") # raises error as 'e' does not exist in the path.
t.delete("baraka") # deletes the WORD_END of 'baraka' without deleting any letter as there is 'barakalar' afterwards.
t.delete("barakalar") # deletes until the previous word (until the first Trie.WORD_END; "$" - by going backwards with recursion) in the same path (until 'baraka').

Solution 5:[5]

In case you need the whole DS:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.wordCounter = 0
        self.prefixCounter = 0

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            
            node.prefixCounter += 1                  
            node = node.children[char] 

        node.wordCounter += 1

    def countWordsEqualTo(self, word: str) -> int:
        node = self.root
        if node.children:
            for char in word:
                node = node.children[char]               
        else:
            return 0
            
        return node.wordCounter
 
    def countWordsStartingWith(self, prefix: str) -> int:
        node = self.root
        if node.children:
            for char in prefix:
                node = node.children[char]               
        else:
            return 0

        return node.prefixCounter

    def erase(self, word: str) -> None:
        node = self.root
        for char in word:
            if node.children:
                node.prefixCounter -= 1
                node = node.children[char]
            else:
                return None

        node.wordCounter -= 1

        if node.wordCounter == 0:
            self.dfsRemove(self.root, word, 0)

    def dfsRemove(self, node: TrieNode, word: str, idx: int) -> None:
        if len(word) == idx:
            node.wordCounter = 0
            return

        char = word[idx]
        if char not in node.children:
            return

        self.dfsRemove(node.children[char], word, idx+1)
        
        node.children.pop(char)
        
            



trie = Trie();
trie.insert("apple");                     #// Inserts "apple".
trie.insert("apple");                     #// Inserts another "apple".
print(trie.countWordsEqualTo("apple"))    #// There are two instances of "apple" so return 2.
print(trie.countWordsStartingWith("app")) #// "app" is a prefix of "apple" so return 2.
trie.erase("apple")                       #// Erases one "apple".
print(trie.countWordsEqualTo("apple"))    #// Now there is only one instance of "apple" so return 1.
print(trie.countWordsStartingWith("app")) #// return 1
trie.erase("apple");                      #// Erases "apple". Now the trie is empty.
print(trie.countWordsEqualTo("apple"))    #// return 0
print(trie.countWordsStartingWith("app")) #// return 0

Solution 6:[6]

I would argue that this implementation is the most succinct and easiest to understand after a bit of staring.

        def removeWord(word, node=None):
            if not node:
                node = self.root
            if word == "":
                node.isEnd = False
                return

            newnode = node.children[word[0]]
            removeWord(word[1:], newnode)
            if not newnode.isEnd and len(newnode.children) == 0:
                del node.children[word[0]]

Although it's a little tricky to understand with the default parameter node=None at first, this is the most succinct implementation of a Trie removal that handles marking the word node.isEnd = False while also pruning extraneous nodes.

  • The method is first called as Trie.removeWord("ToBeDeletedWord").
  • In subsequent recursion calls, a node tied to the corresponding letter ("T" then "o" then "B" then "e" etc. etc.) is added to the next recursion (e.g "remove 'oBeDeletedWord' with the node at T").
  • Once we hit the end node that has the full string ToBeDeletedWord , the last recursion calls removeWord("", <node d>)
  • In this last recursion call, we mark node.isEnd = False. Later, the node is no longer marked isEnd and it has no children so we can call the delete operator.
  • Once that last recursion call ends, the rest of the recursions (e.g TobeDeletedWor, TobeDeletedWo, TobeDeletedW, etc. etc.) will then observe that it too is not an end node and there are no more children. These nodes will also delete.

You will have to read this a couple of times but this implementation is concise, readable, and correct. The difficulty is that the recursion happens midfunction rather than at the beginning or end.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 user10542404
Solution 2 himanshu sangal
Solution 3 Eric Ed Lohmar
Solution 4 mcagriardic
Solution 5
Solution 6 DFeng