'What is the best way to enhance query performance for a large database?

I'm working on an RDS MariaDB database which contains more than 270 highly-connected tables with more than 10 million records. The main goal is to enhance queries performance, the most important queries are very complex and I need a solution that handles these queries more effectively. The following is a query that generates the result in 6 minutes!!

SELECT table1.col1  AS col1,
   table1.col2      AS col2,
   table2.col1      AS country,
   table3.col1      AS col1,
   table2.col1      AS col1,
   table3.col2      AS time,
   Concat_ws('; ', Collect_set(DISTINCT table4.col1)) AS Route,
   CASE
     WHEN table5.col1 IS NULL THEN 'N/A'
     ELSE table5.col1
   end              AS MAH,
   CASE
     WHEN table6.col1 IS NULL THEN 'N/A'
     ELSE table6.col1
   end              AS MFG,
   table7.col1      AS PackSize,
   table1.col3      AS col3
FROM   table3
   INNER JOIN table7
           ON table3.col3 = table7.col1
   INNER JOIN table8
           ON table3.col4 =
              table8.col4
   INNER JOIN table2
           ON table8.col4 = table2.col1
   LEFT JOIN table9 AS table5
          ON table3.col5 = table5.col1
   LEFT JOIN table9 AS table6
          ON table3.col6 = table6.col1
   INNER JOIN table10
           ON table3.col3 =
              table10.col1
   INNER JOIN table4
           ON table10.col2 = table4.col1
   INNER JOIN (SELECT DISTINCT table11.col1                  AS
                               col3,
                               table12.col1 AS
          DosageUnit_col1,
                               table13.col1             AS
          col1,
                               table13.col2                AS
                               col2
               FROM   table7
                      INNER JOIN table11
                              ON table7.col2 =
                                 table11.col1
                      INNER JOIN table14
                              ON table7.col1 =
                                 table14.col1
                      INNER JOIN table12
                              ON table14.col1 =
                                 table12.col1
                      INNER JOIN (SELECT table15.col1
                                         AS
                                         col1,
                                         Concat_ws(' / ', Collect_set(
                                         table16.col1)) AS
                              col1,
                                         Concat_ws('/', Collect_set(
                                         table17.col2,
                                         table18.col1))
                                         AS
                              col2
                                  FROM   table15
                                         INNER JOIN table17
                                                 ON table15.col1 =
table17.col3
INNER JOIN table18
ON
table17.col1 =
table18.col1
INNER JOIN table16
ON table17.col4 =
table16.col1
GROUP  BY table15.col1) AS
table13
ON table12.col2 =
table13.col1
GROUP  BY dosageunit_col1) AS table1
ON table7.col1 = table1.dosageunit_col1
WHERE  table3.col3 IN ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,     ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,     ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,     ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,     ?, ?) and `market_authorizations`.`authority_id` in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
   AND table3.col4 IN ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )
   AND table3.mapped_col1 IS NULL
   AND col2 >= ''
GROUP  BY table3.col1
ORDER  BY table3.col2 DESC;

The first idea I thought about is to migrate the data to Amazon Redshift, so I created a migration task using AWS DMS to export all data from RDS to S3 in Apache Parquet format to take advantage of the columnar storage and compression. The migration task created a parquet file for each exported table. To test the performance of the various options to use in the solution (RDS, Redshift, Redshift Spectrum, Athena), I used the following query which returns the first million records of a table:

select * from table limit 1000000;

and the execution time in seconds was as follows:

  • RDS MariaDB (8 CPU, 32 RAM, 600 IOPS) => 23.462 sec
  • Redshift Spectrum (Two nodes, each node is with 2 CPUs, 15 RAM) => 23 sec
  • Athena => 4.01 sec

Although Athena and Redshift spectrum read from the same file stored in S3, the performance between both is significantly different.

Why the performance didn't get better at all when we used Redshift Spectrum?

Why do Redshift and RDS have the same performance?

Why I have got the same performance exactly (23 sec) when I resized the Redshift cluster from one node into two nodes?

Is that because I use unpartitioned data (one parquet file for each table)?

Why does Athena have the best performance even though it reads from the same source as the Redshift Spectrum?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source