'How to implement the remove function of a trie in python?
I've read the following implementation of the trie in python: https://stackoverflow.com/a/11016430/2225221
and tried to make the remove fnction for it. Basically, I had problems even with the start: If you want to remove a word from a trie, it can has sub-"words", or it can be "subword" of another word.
If you remove with "del dict[key]", you are removing these above mentioned two kinds of words also. Could anyone help me in this, how to remove properly the chosen word (let us presume it's in the trie)
Solution 1:[1]
I think it is better to do it recursively, code as following:
def remove(self, word):
self.delete(self.tries, word, 0)
def delete(self, dicts, word, i):
if i == len(word):
if 'end' in dicts:
del dicts['end']
if len(dicts) == 0:
return True
else:
return False
else:
return False
else:
if word[i] in dicts and self.delete(dicts[word[i]], word, i + 1):
if len(dicts[word[i]]) == 0:
del dicts[word[i]]
return True
else:
return False
else:
return False
Solution 2:[2]
def remove_a_word_util(self, word, idx, node):
if len(word) == idx:
node.is_end_of_word = False
return bool(node.children)
ch = word[idx]
if ch not in node.children:
return True
flag = self.remove_a_word_util(word, idx+1, node.children[ch])
if flag:
return True
node.children.pop(ch)
return bool(node.children) or node.is_end_of_word
Solution 3:[3]
One method of handling structures like this is through recursion. The great thing about recursion in this case is that it zips to the bottom of the trie, then passes the returned values back up through the branches.
The following function does just that. It goes to the leaf and deletes the _end value, just in case the input word is a prefix of another. It then passes up a boolean (boo) which indicates that the current_dict is still in an outlying branch. Once we hit a point where the current dict has more than one child, we delete the appropriate branch and set boo to False so each remaining recursion will do nothing.
def trie_trim(term, trie=SYNONYMS, prev=0):
# checks that we haven't hit the end of the word
if term:
first, rest = term[0], term[1:]
current_length = len(trie)
next_length, boo = trie_trim(rest, trie=trie[first], prev=current_length)
# this statement avoids trimming excessively if the input is a prefix because
# if the word is a prefix, the first returned value will be greater than 1
if boo and next_length > 1:
boo = False
# this statement checks for the first occurrence of the current dict having more than one child
# or it checks that we've hit the bottom without trimming anything
elif boo and (current_length > 1 or not prev):
del trie[first]
boo = False
return current_length, boo
# when we do hit the end of the word, delete _end
else:
del trie[_end]
return len(trie) + 1, True
Solution 4:[4]
A bit of a long one, but I hope this helps answer your question:
class Trie:
WORD_END = "$"
def __init__(self):
self.trie = {}
def insert(self, word):
cur = self.trie
for char in word:
if char not in cur:
cur[char] = {}
cur = cur[char]
cur[Trie.WORD_END] = word
def delete(self, word):
def _delete(word, cur_trie, i=0):
if i == len(word):
if Trie.WORD_END not in cur_trie:
raise ValueError("'%s' is not registered in the trie..." %word)
cur_trie.pop(Trie.WORD_END)
if len(cur_trie) > 0:
return False
return True
if word[i] not in cur_trie:
raise ValueError("'%s' is not registered in the trie..." %word)
cont = _delete(word, cur_trie[word[i]], i+1)
if cont:
cur_trie.pop(word[i])
if Trie.WORD_END in cur_trie:
return False
return True
return False
_delete(word, self.trie)
t = Trie()
t.insert("bar")
t.insert("baraka")
t.insert("barakalar")
t.delete("barak") # raises error as 'barak' is not a valid WORD_END although it is a valid path.
t.delete("bareka") # raises error as 'e' does not exist in the path.
t.delete("baraka") # deletes the WORD_END of 'baraka' without deleting any letter as there is 'barakalar' afterwards.
t.delete("barakalar") # deletes until the previous word (until the first Trie.WORD_END; "$" - by going backwards with recursion) in the same path (until 'baraka').
Solution 5:[5]
In case you need the whole DS:
class TrieNode:
def __init__(self):
self.children = {}
self.wordCounter = 0
self.prefixCounter = 0
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node.prefixCounter += 1
node = node.children[char]
node.wordCounter += 1
def countWordsEqualTo(self, word: str) -> int:
node = self.root
if node.children:
for char in word:
node = node.children[char]
else:
return 0
return node.wordCounter
def countWordsStartingWith(self, prefix: str) -> int:
node = self.root
if node.children:
for char in prefix:
node = node.children[char]
else:
return 0
return node.prefixCounter
def erase(self, word: str) -> None:
node = self.root
for char in word:
if node.children:
node.prefixCounter -= 1
node = node.children[char]
else:
return None
node.wordCounter -= 1
if node.wordCounter == 0:
self.dfsRemove(self.root, word, 0)
def dfsRemove(self, node: TrieNode, word: str, idx: int) -> None:
if len(word) == idx:
node.wordCounter = 0
return
char = word[idx]
if char not in node.children:
return
self.dfsRemove(node.children[char], word, idx+1)
node.children.pop(char)
trie = Trie();
trie.insert("apple"); #// Inserts "apple".
trie.insert("apple"); #// Inserts another "apple".
print(trie.countWordsEqualTo("apple")) #// There are two instances of "apple" so return 2.
print(trie.countWordsStartingWith("app")) #// "app" is a prefix of "apple" so return 2.
trie.erase("apple") #// Erases one "apple".
print(trie.countWordsEqualTo("apple")) #// Now there is only one instance of "apple" so return 1.
print(trie.countWordsStartingWith("app")) #// return 1
trie.erase("apple"); #// Erases "apple". Now the trie is empty.
print(trie.countWordsEqualTo("apple")) #// return 0
print(trie.countWordsStartingWith("app")) #// return 0
Solution 6:[6]
I would argue that this implementation is the most succinct and easiest to understand after a bit of staring.
def removeWord(word, node=None):
if not node:
node = self.root
if word == "":
node.isEnd = False
return
newnode = node.children[word[0]]
removeWord(word[1:], newnode)
if not newnode.isEnd and len(newnode.children) == 0:
del node.children[word[0]]
Although it's a little tricky to understand with the default parameter node=None at first, this is the most succinct implementation of a Trie removal that handles marking the word node.isEnd = False while also pruning extraneous nodes.
- The method is first called as
Trie.removeWord("ToBeDeletedWord"). - In subsequent recursion calls, a node tied to the corresponding letter ("T" then "o" then "B" then "e" etc. etc.) is added to the next recursion (e.g "remove 'oBeDeletedWord' with the node at T").
- Once we hit the end node that has the full string
ToBeDeletedWord, the last recursion callsremoveWord("", <node d>) - In this last recursion call, we mark
node.isEnd = False. Later, the node is no longer markedisEndand it has no children so we can call the delete operator. - Once that last recursion call ends, the rest of the recursions (e.g
TobeDeletedWor,TobeDeletedWo,TobeDeletedW, etc. etc.) will then observe that it too is not an end node and there are no more children. These nodes will also delete.
You will have to read this a couple of times but this implementation is concise, readable, and correct. The difficulty is that the recursion happens midfunction rather than at the beginning or end.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user10542404 |
| Solution 2 | himanshu sangal |
| Solution 3 | Eric Ed Lohmar |
| Solution 4 | mcagriardic |
| Solution 5 | |
| Solution 6 | DFeng |
