How to analyze database performance
I have a 4 TB database, used mostly as a data warehouse, running on PostgreSQL with Django.
The structure looks like Instagram's:
```python
from django.db import models
from django.utils.translation import gettext_lazy as _


class Influencer(models.Model):
    followers = models.ManyToManyField(
        "instagram_data.Follower", verbose_name=_("Followers"))


class Follower(models.Model):
    is_private = models.BooleanField(_("is_private"), default=False, db_index=True)
    is_verified = models.BooleanField(_("is_verified"), default=False, db_index=True)
    is_business = models.BooleanField(_("is_business"), default=False, db_index=True)
    business_category = models.CharField(
        _("business_category"), max_length=254, null=True, blank=True, db_index=True)
    media_count = models.BigIntegerField(_("media_count"), default=0, db_index=True)
    follower_count = models.BigIntegerField(_("follower_count"), default=0, db_index=True)
    following_count = models.BigIntegerField(_("following_count"), default=0, db_index=True)
    gender = models.CharField(
        # Gender (a choices enum) is defined elsewhere in the project
        _("gender"), choices=Gender.choices, null=True,
        blank=True, max_length=50, db_index=True)
    language = models.ManyToManyField(
        "instagram_data.Language", verbose_name=_("language_code_name"), blank=True)
```
Each Follower has many "filters".
I first choose X influencers, then apply the follower filters (is_private and so on).
Getting results takes some time (about 30 seconds per 100K users).
Most of the filter fields are indexed, as you can see.
What can I do to speed things up?
How can I tell whether the bottleneck is disk I/O, and how fast would the disk need to be?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow