'Resolving One-to-Many relations efficiently with GROUP BY/json_agg

My goal is to build a GraphQL Api thats backed by a Postgres Database. In order to query the database efficiently i need to generate SQL based on the incoming GraphQL query. One Solution that I found was to use join-monsters for this task.

Now to my question:

Given the following example schema from their docs img

Why is join-monsters generating queries like this:

  SELECT 
    u.name as 'username', 
    u.id as 'user_id',
    p.id as 'post_id',
    p.title as 'title',
    p.content as 'post_content',
    c.id as 'comment_id',
    c.content as 'comment_content',
    cu.name as 'commenter_name'
  FROM "user" u 
    LEFT JOIN "post" p ON p.user_id = u.id
    LEFT JOIN "comment" c ON c.post_id = p.id
    JOIN "user" cu ON c."user_id" = cu.id

Or like this:

  SELECT 
    u.name as 'username', 
    u.id as 'user_id',
    p.id as 'post_id',
    p.title as 'title',
    p.content as 'post_content',
  FROM "user" u 
    LEFT JOIN "post" p ON p.user_id = u.id
  
  -- use returned post_ids for next query

  SELECT 
    c.id as 'comment_id',
    c.content as 'comment_content',
    cu.name as 'commenter_name'
  FROM "comment" c 
    JOIN "user" cu ON c.user_id = cu.id
  WHERE c.post_id IN (1, 2, 3, 4)

But not like this:

  SELECT 
    u.name as 'username', 
    u.id as 'user_id',
    json_agg(p1.post) as 'posts'
  FROM "user" u 
  LEFT JOIN (
    SELECT 
      p.user_id,
      json_build_object(
        'id', p.id,
        'title', p.title,
        'content', p.content,
        'comments', json_agg(c1.comment)  
      )
    FROM post p
    LEFT JOIN (
      SELECT
        c.post_id,
        json_build_object(
          'id', c.id,
          'content', c.content,
          'commenter_name', cu.name
        ) as 'comment'
      FROM comment c
      JOIN "user" cu ON cu.id = c."user_id"
    )c1 ON c1."post_id" = p.id
    GROUP BY p."user_id", p.title, p.content
  )p1
  ON p1."user_id" = u.id
  GROUP BY u.name, u.id

I know that the first query leads to a lot of memory allocation and to the result set containing a lot of dublicate data (Assume in the first example a user has 20 posts each with 20 comments. That would mean having the same user appear 400 times in different rows in the result set), which is why join-monsters tries to split it up into multiple queries. Also the result of the first two queries would need to be hydrated to match the shape of the GraphQL Query. So wouldn´t the latter query need less memory then the first, since data is aggregated before being passed from one subquery to the next, thus reducing dublication and speeding up the query ?

What could be the drawback for such an approach ?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source