'Nested for loop is really slow for query set traversal in Django
I have two models named machine and performance,
class machine(models.Model):
machine_type = models.CharField(null=True, max_length=10)
machine_no = models.IntegerField(null=True)
machine_name = models.CharField(null=True,max_length=255)
machine_sis = models.CharField(null=True, max_length=255)
store_code = models.IntegerField(null=True)
created = models.DateTimeField(auto_now_add=True)
class Performance(models.Model):
machine_no = models.IntegerField(null=True)
power = models.IntegerField(null=True)
store_code = models.IntegerField(null=True)
created = models.DateTimeField(auto_now_add=True)
For each Machine, there are multiple fields of in Performance Model and I have to find the count of Performance Model's rows in the db which have power = some_integer. Here is what my view looks like:
machines = machine.objects.filter(machine_type="G",machine_sis="919")
Let's say machine.count() sometimes is 100. For each of this machine I need to calculate the number of machines which have power = 100 in performance model. So what I did first was but was really slow:
for obj in machines:
print performance.objects.filter(machine_no=obj.machine_no,power=100).count()
My second approach was faster than the first approach:
for obj in machines:
data = performance.objects.filter(machine_no=obj.machine_no,power=100)
counter = 0
for p in data: # ***** lets say this loop is called star-loop
if p.power == 100:
counter +=1
My problem is that the speed is really slow when I have to check 100 machines in Performance model whose power = something.
I am not using foreign key in Performance Model because the actual architecture is more complex and I can't use machine number or anything as foreign key because when identifying each machine uniquely I need multiple columns of machine.
Also, this project is working in production and I can't take much chance. I am using Django 1.11, Python 2.7 and postresql rds instance. I have increased the network performance buy renting a better instance from aws.
Solution 1:[1]
This looks like a case of the N+1 Select porblem. You can do the following to reduce query count:
machines = machine.objects.filter(machine_type="G",machine_sis="919")
machine_nos = machine.values_list('machine_no', flat=True)
performance = performance.objects.filter(machine_no__in=machine_nos, power=100)
This reduces number of queries to a maximum of three
Solution 2:[2]
You can use raw queries. It maybe like this. Please update to use exactly your database table name.
machine.objects.raw(select * from machine as b
join (select count(id), machine_no from performance where power=100
group by machine_no) as a
on b.id = a.machine_no
where b.machine_type="G" and b.machine_sis="919")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | rtindru |
| Solution 2 | Nguyen Quang Trung |
