
Tuning MySQL Insert Speed for Fun and Profit

I like MySQL, I really do. Generally it performs well and does what it's supposed to do without much fuss. It's a dependable workhorse that I usually trust blindly to get the job done. However, while responding to a recent cry for help from one of our customers, I ran into a severe performance issue when trying to import their old production database. What took 2 minutes using MyISAM tables (don't ask, this is an absolutely ancient system) took more than 3 hours using InnoDB. Of course I could have just been happy that the import completed and that I now had the data in our database, but I'm a curious guy, and when faced with a performance regression of this magnitude, I want to know what's going on.

One of my old customers gave me a copy of their production database, containing about 50 million rows, in order for us to diagnose a problem. I imported the file, which took about 2 minutes, took a look and discovered that the tables were of the MyISAM type (again, don't ask). Not thinking that this would cause an issue, I did a search/replace on the SQL dump, converted the MyISAM table designations to InnoDB, created a new database, restarted the import and waited … and waited some more, went to get coffee, came back and waited some more. After about 15 minutes I got tired of waiting and took a look at the database. The import was adding rows, but at such an abysmal rate that I found it hard to believe. The machine it was running on has an 8-core Ryzen 9, 32 GB of RAM and two NVMe SSDs; certainly no slouch. So what was going on?

I have long since settled on the following MySQL configuration file for my development machine:

innodb_buffer_pool_size = 256M
max_heap_table_size = 256M
query_cache_size = 128M
table_cache = 512
read_buffer_size = 64M
read_rnd_buffer_size = 64M
sort_buffer_size = 64M
tmp_table_size = 256M
thread_cache_size = 16
thread_concurrency = 16

and figured that this configuration would be valid for anything I do. It usually is, since in our development work I rarely deal with huge data sets and I'm more concerned with getting indexing and algorithms right. In this case, however, it wasn't enough. After much searching and testing, I found that adding the following lines to my.cnf gave us a dramatic performance increase for our import script:

innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT

With these 2 lines added, the import now ran in 1 minute 8 seconds, which is twice as fast as the original MyISAM import and about 200(!) times as fast as my first attempt using InnoDB tables. While arriving at this kind of performance improvement is great, it's only half the battle; the second half is understanding why. So what do these options mean?

  • innodb_flush_log_at_trx_commit: By default, innodb_flush_log_at_trx_commit is set to 1, meaning the log is flushed to disk at every transaction commit, so modifications made by the transaction won't be lost during a MySQL, OS, or hardware crash. For workloads with many small transactions, you can reduce disk I/O to the logs and improve performance by setting innodb_flush_log_at_trx_commit to 0, meaning no log flush on each transaction commit. However, recent transactions may be lost if MySQL crashes, so for full ACID compliance the default value of 1 is required!
  • innodb_flush_method: This variable changes the way InnoDB opens files and flushes data to disk, and should be considered very important for InnoDB performance. By default, InnoDB uses fsync() (without O_DSYNC) to flush both log and data files. Setting this variable to O_DIRECT causes InnoDB to use O_DIRECT when opening data files, while still using fsync() to flush both data and log files. O_DIRECT is useful when an application maintains its own caching mechanism, which is certainly true for MySQL/InnoDB. It avoids the overhead of double buffering (caching the same pages both in InnoDB's buffer pool and in the OS page cache) and reduces swap pressure. So unless you are doing something unusual like running on SAN storage (which you should reconsider anyway), use O_DIRECT; it leads to a significant improvement in InnoDB performance by removing double buffering.
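To get a feel for why flushing on every commit is so expensive, here is a small standalone sketch (not MySQL itself; the file names and record contents are made up) comparing an fsync() after every write, analogous to innodb_flush_log_at_trx_commit = 1, with a single fsync() at the end, roughly what setting it to 0 allows InnoDB to do:

```python
import os
import tempfile
import time

def write_log(path: str, records: list[bytes], fsync_each: bool) -> float:
    """Append all records to path; optionally fsync() after every write."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)
            if fsync_each:
                f.flush()
                os.fsync(f.fileno())  # force this "commit" to stable storage
        f.flush()
        os.fsync(f.fileno())          # final flush either way
    return time.perf_counter() - start

# Stand-in for many small transactions hitting the redo log.
records = [b"INSERT ...;\n"] * 2000

with tempfile.TemporaryDirectory() as d:
    per_commit = write_log(os.path.join(d, "every.log"), records, fsync_each=True)
    deferred = write_log(os.path.join(d, "once.log"), records, fsync_each=False)

print(f"fsync per commit: {per_commit:.3f}s, deferred: {deferred:.3f}s")
```

The exact numbers depend heavily on the storage hardware, but the per-commit variant issues thousands of synchronous flushes where the deferred variant issues one, which is the same trade the InnoDB setting makes.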

So there you have it: two parameters which together improve bulk insert performance by a factor of 200. Certainly something worth knowing.

In case you care how much each of these options contributed to the performance increase: innodb_flush_log_at_trx_commit accounted for about 95% of it, with the remaining 5% going to innodb_flush_method.

Alternative Approach/Solution

The above settings are relevant when you have lots of individual insert statements to process. In my case, the database export had been created in that manner and it was all I had to work with. If you know you may be facing this situation, you should create your dump with the extended-insert option enabled (as it is by default). This generates statements which insert multiple rows in a single statement. The resulting SQL looks like this:

INSERT INTO MyTable (id,name,city)  
VALUES (1, 'Jeff', 'New York'), 
       (2, 'Mike', 'Miami'), 
       (3, 'Steve', 'Dallas');  
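If you are generating such statements yourself rather than via mysqldump, batching the rows is straightforward. Here is a sketch; the helper name and the naive quoting are made up for illustration (real code should use a driver's parameter binding or proper escaping):

```python
def build_extended_insert(table: str, columns: list[str], rows: list[tuple]) -> str:
    """Build one multi-row INSERT statement from a list of row tuples.

    Naive quoting only: ints are emitted as-is, everything else is wrapped
    in single quotes without escaping. Fine for a demo, not for real data.
    """
    cols = ",".join(columns)
    values = ",\n       ".join(
        "(" + ", ".join(str(v) if isinstance(v, int) else f"'{v}'" for v in row) + ")"
        for row in rows
    )
    return f"INSERT INTO {table} ({cols})\nVALUES {values};"

sql = build_extended_insert(
    "MyTable",
    ["id", "name", "city"],
    [(1, "Jeff", "New York"), (2, "Mike", "Miami"), (3, "Steve", "Dallas")],
)
print(sql)
```

Sending one such statement per few hundred rows saves a round trip and a parse per row, which is exactly why extended-insert dumps import so much faster.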

Laravel-hero Povilas Korop has created a nice video which benchmarks alternative approaches when having to perform many inserts. You can find this video on his YouTube channel. It's more Laravel-based than the raw SQL which led to this article, but it's well done and may give you alternative ideas, especially in the context of using a PHP Framework.

