'Alternate optimized ways to check for presence of an element in a list?
I have an simple brute force AI code for the 8 puzzle problem. One of the steps involves checking if the current state has already been visited, and all the visited states are stored in a list called visited = [].
I noticed my program was taking too long to find the solution, and started using wall time checks to see what functions were taking the longest. To my surprise, the built in sorting algorithm was using almost 35 seconds, and had to sort through a maximum of 21000 elements at a time.
On the other hand, my visited list reached a maximum length of 42000. (I'm appending Class objects to this visited list as explained below)
Every time a new state is generated, it is compared with all the states in the visited list to check if it already exists there using the not_in method.
def not_in(self, visited):
for node in visited:
if (self.data == node.data):
return 0
return 1
This checks if the currently passed in Node has its Node.data equal to any other Node in the visited list.
The caller function looks like this -
if Node.not_in(temp_right, visited):
visited.append(temp_right)
where self refers to a class object of the following type-
class Node:
def __init__(self, data, parent):
self.data = data
self.parent = parent
where self.data is a list, and self.parent refers to another class object Node, like a linked list.
This not_in function has to perform over 1366241896 comparisons to reach the solution from my current starting state.
Is there a way to optimize this function? Because that's a lot of comparisons that I'd like to avoid. The whole program exclusively uses lists to store data, if that helps my case.
Solution 1:[1]
"my visited list reached a maximum length of 42000". But in theory the possible states are 9!, or 362880.
One possibility would be to use a dict to create "buckets" where to store the lists of states that pertain to a given bucket. Let's assume you store the state as a 9-char string, so the solved state would be something like 12345678_. Then you may use e.g. the first 3 chars to identify the bucket - you would have 504 buckets (the permutations of 9 values in 3 slots), each with a maximum list lenght of 6!, or 720.
The not_in method would change accordingly. I made two non-object versions of your method, with the list and the dict/bucket approach respectively, and compared the perfomances on a random sample of 10000 states:
from collections import defaultdict
from itertools import permutations
from random import sample
from timeit import timeit
##functions
def not_in_l(data, visited_l):
for node in visited_l:
if (data == node):
return 0
return 1
def not_in_d(data, visited_d):
for node in visited_d[data[:3]]:
if (data == node):
return 0
return 1
##setup sample data
visited_l = sample(list(permutations('12345678_')),10000)
visited_d = defaultdict(list)
for x in visited_l:
visited_d[x[:3]].append(x)
##test that functions return same value
for i in sample(range(10000),100):
assert not_in_l(visited_l[i], visited_l) == not_in_d(visited_l[i], visited_d)
print('list function')
print(timeit('for i in range(0,10000,999):not_in_l(visited_l[i],visited_l)', globals=globals(), number=1000))
print('dict function')
print(timeit('for i in range(0,10000,999):not_in_d(visited_l[i],visited_d)', globals=globals(), number=1000))
The results on my PC are
list function
3.9646465
dict function
0.02327440000000003
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | gimix |
