MySQL query performance: generating a report for a given day with count and cumulative count from a huge table – MySQL

by
Ali Hasan
libmysqlclient query-optimization

Quick Fix: Utilize a summary table to store daily subtotals. Consider partitioning for easy data deletion. Drop unnecessary indexes and avoid selecting the same column twice. Qualify columns with their respective tables. Use JOIN instead of LEFT JOIN when appropriate. Use EXISTS or LEFT JOIN with IS NULL instead of NOT IN. Store date and time values in DATETIME or TIMESTAMP rather than VARCHAR. Explore using CTEs instead of temp tables. Implement suggested indexes and drop redundant ones.

The Problem:

I have a table with 2 million records locally and 80 million records on the production server. I need a MySQL query that generates a report for a given day with the cumulative count and count. The query takes 2 minutes for 2 million records locally, and I’m concerned about the performance impact on the production server.

The table is partitioned based on the ProductionStatusNo column, and I have indexes on StatusDateTime, ProductionFacility, and ProductionStatusNo. The query uses a stored procedure and a series of UNION ALL statements to calculate the cumulative count and count for each ProductionStatus.

How can I improve the performance of the query on the production server?

The Solutions:

Solution 1: Maintain a Summary Table

To improve the performance of your MySQL query, consider creating and maintaining a summary table that contains daily subtotals. This can significantly reduce the amount of data that needs to be processed, leading to faster query execution times.

Key Points:

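  • The summary table should hold one row per day (and per reporting dimension, such as ProductionFacility and ProductionStatusNo) with that day's count; the cumulative count can then be derived by summing the daily rows up to the report date.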
  • Regularly update the summary table to keep it in sync with the detail table.
  • Use the summary table for reporting purposes instead of directly querying the detail table.
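
The key points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table names `production_detail` and `production_daily_summary`, the column `status_count`, and the column types are hypothetical, and StatusDateTime is assumed to be stored as DATETIME. Only StatusDateTime, ProductionFacility, and ProductionStatusNo come from the question.

```sql
-- Hypothetical summary table: one row per day, facility, and status.
CREATE TABLE production_daily_summary (
    summary_date       DATE         NOT NULL,
    ProductionFacility VARCHAR(50)  NOT NULL,  -- assumed type
    ProductionStatusNo INT          NOT NULL,  -- assumed type
    status_count       INT UNSIGNED NOT NULL,
    PRIMARY KEY (summary_date, ProductionFacility, ProductionStatusNo)
);

-- Refresh the subtotals for one day (for example from a nightly job).
-- The range predicate leaves StatusDateTime unwrapped so its index can be used.
SET @day = '2024-01-15';

INSERT INTO production_daily_summary
    (summary_date, ProductionFacility, ProductionStatusNo, status_count)
SELECT DATE(StatusDateTime), ProductionFacility, ProductionStatusNo, COUNT(*)
FROM production_detail                      -- hypothetical detail table name
WHERE StatusDateTime >= @day
  AND StatusDateTime <  DATE_ADD(@day, INTERVAL 1 DAY)
GROUP BY DATE(StatusDateTime), ProductionFacility, ProductionStatusNo
ON DUPLICATE KEY UPDATE status_count = VALUES(status_count);

-- Report for the given day: that day's count and the cumulative count
-- up to and including that day, per status.
SELECT ProductionStatusNo,
       SUM(CASE WHEN summary_date = @day THEN status_count ELSE 0 END) AS day_count,
       SUM(status_count) AS cumulative_count
FROM production_daily_summary
WHERE summary_date <= @day
GROUP BY ProductionStatusNo;
```

Rerunning the refresh for a day overwrites that day's subtotals rather than duplicating them. The `VALUES()` form in `ON DUPLICATE KEY UPDATE` still works in MySQL 8.0 but is deprecated there, so newer servers may emit a warning.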

Benefits:

  • Improved query performance, especially for large datasets.
  • Reduced load on the database server.
  • Simplified queries, as you only need to work with the summary table.

Additional Recommendations:

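  • Partitioning mainly pays off when you need to delete old data, and then only if the partitions follow a date column so that old partitions can be dropped; the existing partitioning on ProductionStatusNo does not provide that benefit.
  • Avoid redundant indexes and ensure that the ones you have are appropriate for your queries.
  • Use JOIN instead of LEFT JOIN when a matching row is always expected; keep LEFT JOIN with an IS NULL check only where you are testing for missing rows.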
  • Avoid using COALESCE when it’s not needed.
  • Consider using CTEs instead of temp tables when possible.
  • Consider adding indexes to the exd and d tables as suggested.
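
As a rough illustration of the partitioning point, the sketch below shows range partitioning by date, so that purging old data becomes a partition drop instead of a large DELETE. The table name `production_detail` and the partition names and boundaries are hypothetical, and StatusDateTime is assumed to be a DATETIME column.

```sql
-- Assumption: StatusDateTime is DATETIME and is part of every unique key
-- (including the primary key), which MySQL requires of a partitioning column.
ALTER TABLE production_detail
PARTITION BY RANGE (TO_DAYS(StatusDateTime)) (
    PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Removing a whole month of old rows is then a quick metadata operation.
ALTER TABLE production_detail DROP PARTITION p2024_01;
```

Repartitioning an existing table rebuilds it, so on an 80-million-row table this would need to be planned and tested before being run in production.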

Q&A

What is a practical improvement?

Build and maintain a summary table of daily subtotals, and run the report against it.

How can partitioning help?

Partitioning is unlikely to speed up this query; it is mainly useful for deleting old data.

Is NOT EXISTS or LEFT JOIN more efficient than NOT IN?

Instead of NOT IN, try NOT EXISTS (SELECT 1 …) or LEFT JOIN … WHERE … IS NULL.
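
A minimal sketch of the two forms follows. The table names `d` and `exd` appear in the recommendations, but their roles and the columns `id` and `d_id` are assumed here purely for illustration.

```sql
-- Hypothetical schema: d(id) and exd(d_id).
-- Pattern being replaced:
--   SELECT d.id FROM d WHERE d.id NOT IN (SELECT exd.d_id FROM exd);

-- Option 1: NOT EXISTS with a correlated subquery.
SELECT d.id
FROM d
WHERE NOT EXISTS (SELECT 1 FROM exd WHERE exd.d_id = d.id);

-- Option 2: LEFT JOIN and keep only the rows that found no match.
SELECT d.id
FROM d
LEFT JOIN exd ON exd.d_id = d.id
WHERE exd.d_id IS NULL;
```

Both options return the rows of `d` that have no match in `exd`. Unlike NOT IN, they do not collapse to an empty result when the subquery column contains NULL, so results can differ from the original query if `exd.d_id` is nullable. Comparing the EXPLAIN output of each form on the real tables is the safest way to pick one.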
