EMR memory utilization

I have an EMR cluster running Presto with 1 main node and 5 core nodes, all of type r6gd.16xlarge, which is ridiculously large. I am using a Glue catalog as the metastore.

The problem is that an insert of about 6 million rows takes 13-15 minutes, which seems slow given the hardware. Since EC2 does not provide memory monitoring out of the box, I installed the CloudWatch agent and created memory metrics. I have added this configuration to the Presto server:

            {
                classification: 'presto-config',
                configurationProperties: {
                    'query.max-memory-per-node': '128GB', // 25% of a node
                    'query.max-total-memory-per-node': '256GB', // 50% of a node
                    'query.max-memory': '1536GB', // 60% of the cluster
                    'query.max-total-memory': '2048GB', // 80% of the cluster
                    'query.low-memory-killer.policy': 'none',
                },
            },
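For reference, here is a rough sketch of how the percentages in the comments above map to those values. It assumes each r6gd.16xlarge has 512 GiB of RAM and that only the 5 core nodes count toward the cluster total; the `share` helper and the constant names are made up for illustration and are not part of any EMR or Presto API.

```typescript
// Assumption: an r6gd.16xlarge has 512 GiB of RAM.
const NODE_MEMORY_GB = 512;
// Assumption: only the 5 core nodes count toward the cluster total.
const WORKER_NODES = 5;

// Hypothetical helper: formats a fraction of a memory pool as a Presto data-size string.
function share(totalGb: number, fraction: number): string {
    return `${Math.round(totalGb * fraction)}GB`;
}

const clusterMemoryGb = NODE_MEMORY_GB * WORKER_NODES; // 5 * 512 = 2560

const prestoConfig = {
    classification: 'presto-config',
    configurationProperties: {
        'query.max-memory-per-node': share(NODE_MEMORY_GB, 0.25),       // 25% of a node
        'query.max-total-memory-per-node': share(NODE_MEMORY_GB, 0.5),  // 50% of a node
        'query.max-memory': share(clusterMemoryGb, 0.6),                // 60% of the cluster
        'query.max-total-memory': share(clusterMemoryGb, 0.8),          // 80% of the cluster
        'query.low-memory-killer.policy': 'none',
    },
};

console.log(prestoConfig.configurationProperties);
```

Under those assumptions the computed values match the ones in my configuration, so the limits themselves are what I intended.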

However, while running the insert, CPU never goes above 2% on any node and memory usage never goes above 3%.

What can I change in the Presto settings to "unlock" using more of my hardware?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow