Postgres transaction table with a billion rows and multiple JSON columns

We have a new project where we need to use Postgres 14 to scale up a transaction table that gets heavily updated. The master table has about 60 million rows over a six-month period, and a child table has about 600 million rows. The data retention period is six months, after which we have to drop the oldest monthly partition.

I want opinions from Postgres experts on whether this design is right and whether anything has been overlooked:

Parent/Master table

ID
JSON 1 ---> a couple of hundred characters
JSON 2 ---> 50 characters

The table has about 20 columns. Updates are always based on the primary key.

Child table

Parent_IDFK (parent key from the parent/master table)
Occurance_id (every parent has 10 rows in the child table, numbered 1, 2, 3, 4, 5, ...; these are the occurrences)
Occurrence JSON. Each child row linked to a parent has its own JSON; let's call it the occurrence JSON. So child 1 has the occurrence 1 JSON, child 2 has the occurrence 2 JSON, and so on.

Over the course of a day, a row is first inserted into the master table. Then about 10 rows are inserted into the child table. After the child record is inserted, we have to update the parent with the aggregate of the occurrences. The update of the aggregate JSON in the parent table will look something like this:

UPDATE PARENT SET AGGREGATE_JSON= (sum up the SUM of occurrences in the Child table for that parent key) WHERE ID=<>.
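
To make the aggregation step concrete, here is a minimal sketch of the logic only, written in Python for illustration. It assumes each occurrence JSON is a flat object whose numeric fields are summed per key; the post does not specify the JSON layout, so the field names and the `aggregate_occurrences` helper are hypothetical. In the actual design this would run inside the stored procedure.

```python
import json
from collections import Counter


def aggregate_occurrences(occurrence_docs):
    """Sum the numeric fields of several occurrence JSON strings, key by key.

    Assumption: each document is a flat JSON object; non-numeric fields
    (strings, booleans, nested values) are ignored.
    """
    totals = Counter()
    for doc in occurrence_docs:
        for key, value in json.loads(doc).items():
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[key] += value
    # sort_keys gives a stable text form for the aggregate JSON
    return json.dumps(dict(totals), sort_keys=True)


if __name__ == "__main__":
    # Hypothetical occurrence documents for one parent key
    children = ['{"count": 2, "amount": 10}', '{"count": 3, "amount": 5, "note": "x"}']
    print(aggregate_occurrences(children))
```

The result is what would be written to AGGREGATE_JSON for that parent ID.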

There will also be updates to the child table based on the primary key and occurrence ID.

Other than that, there will be heavy reads. Here is my design:

1) Primary key on the master ID. There may be no need to partition a 60-million-row table. Because searches are based on dates, I will have another index on the start date.

2) Child table. The primary key is (Master ID, Occurance ID, StartDate). The table is partitioned by start date.

3) I will try to precompute aggregates daily wherever possible and read from those aggregates, so that full table scans are avoided.

4) When we update the child table, we always specify the partition key. Something like this:

UPDATE CHILD SET <> WHERE PARENT_IDFK=<> AND OCCURANCE_ID=<> AND START_DATE(partition key)=<>.

That way the update is pruned to a single partition and scans of the other partitions are avoided.

5) All INSERTs/UPDATEs will go through stored procedures, keeping the Python/Flask middleware as free of SQL code as possible.
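
The monthly partitioning in point 2 and the six-month retention could be handled by a small maintenance script, kept separate from the Flask middleware. Below is a minimal sketch that only generates the DDL strings. It assumes the child table is named `child`, was created with `PARTITION BY RANGE (start_date)`, and that the current month counts toward the six retained months. The partition naming scheme and all helper names are hypothetical.

```python
from datetime import date
from typing import List

RETENTION_MONTHS = 6  # from the post: six months of data are kept


def add_months(first_day: date, n: int) -> date:
    """Return the first day of the month n months away (n may be negative)."""
    index = first_day.year * 12 + (first_day.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def partition_name(table: str, first_day: date) -> str:
    # Hypothetical naming scheme: child_2024_01, child_2024_02, ...
    return f"{table}_{first_day:%Y_%m}"


def create_partition_sql(table: str, first_day: date) -> str:
    """DDL for one monthly range partition; the upper bound is exclusive."""
    upper = add_months(first_day, 1)
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(table, first_day)} "
        f"PARTITION OF {table} "
        f"FOR VALUES FROM ('{first_day.isoformat()}') TO ('{upper.isoformat()}');"
    )


def drop_partition_sql(table: str, first_day: date) -> List[str]:
    """Detach, then drop, so the old month goes away without a bulk DELETE."""
    name = partition_name(table, first_day)
    return [
        f"ALTER TABLE {table} DETACH PARTITION {name};",
        f"DROP TABLE {name};",
    ]


def expired_month(today: date) -> date:
    """Month to drop, assuming the current month counts toward retention."""
    return add_months(today.replace(day=1), -RETENTION_MONTHS)


if __name__ == "__main__":
    today = date(2024, 7, 15)
    # Create next month's partition ahead of time, then retire the oldest one
    print(create_partition_sql("child", add_months(today.replace(day=1), 1)))
    for statement in drop_partition_sql("child", expired_month(today)):
        print(statement)
```

Dropping a whole partition removes a month of data as a metadata operation rather than as millions of row deletes, which is the main reason to partition by the retention unit. Note that Postgres requires the primary key of a partitioned table to include the partition key, which is consistent with StartDate being part of the child primary key.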

Are there any other points you would add, or is this good enough?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
