We like MySQL, we really do. Generally it performs well and does what it’s supposed to do without much fuss. It’s a dependable workhorse that we usually trust blindly to get the job done. However, during a recent cry for help from one of our customers we experienced a severe performance issue when trying to import their old production database. What took 2 minutes using MyISAM tables, took more than 3 hours using InnoDB. Of course we could have just been happy that the import completed and that we now had it in our database, but we are curious folks and when the performance devil pokes us, we want to know why.
One of our customers gave us a copy of their production database, containing about 1 million rows, in order for us to diagnose a problem. We imported the file, which took about 2 minutes, took a look and discovered that the tables were of the MyISAM type. Not thinking that this would cause an issue, we did a search/replace on the SQL dump, coverted the MyISAM table designations to InnoDB, created a new database, restarted the import and waited … and waited some more, went to get coffee, came back and waited some more. After about 15 minutes we got tired of waiting and took a look at the database. The import was adding rows, but it was doing so at a speed which was so abysmal, that we found it hard to believe. The machine it was running on is an 4-core i7-2600K with Hyperthreading, 8 GB RAM and 2 SATA disks; certainly no slouch. So what was going on?
We have long since settled on the following MySQL configuration file for our development machines:
innodb_buffer_pool_size = 64M max_heap_table_size = 64M query_cache_size = 32M table_cache = 256 read_buffer_size = 8M read_rnd_buffer_size = 8M sort_buffer_size = 8M tmp_table_size = 64M thread_cache_size = 8 thread_concurrency = 8
and figured that this configuration would be valid for anything we do. It usually is, since for our development work we usually don't deal with huge data sets and are more concerned with getting indexing and algorithms right. However in this case this wasn't enough. After much searching and testing we found that adding the following lines to my.cnf allowed us to wring a dramatic performance increase for our import script:
innodb_flush_log_at_trx_commit = 0 innodb_flush_method = O_DIRECT
With these 2 lines added, the import now ran in 1 minute 8 seconds, which is twice as fast as the original MyISAM import and about 200(!) times as fast as our first attempt using InnoDB tables. While arriving this type of performance improvement is great, it’s only half the battle; the 2nd half is understanding why. So what do these options mean:
innodb_flush_log_at_trx_commit: As a default, innodb_flush_log_at_trx_commit is set to 1, meaning the log is flushed to the disk at a transaction commit, and modifications made by the transaction won’t be lost during a MySQL, OS, or HW crash. For workloads running with many small transactions, you can reduce disk I/O to the logs to improve performance by setting the innodb_flush_log_at_trx_commit parameter to 0, meaning no log flushing on each transaction commit. However, the transaction might be lost if MySQL crashes so for ACID compliance the default value of 1 is required!
innodb_flush_method: This variable changes the way InnoDB open files and flush data to disk and is should be considered as very important for InnoDB performance. By default, InnoDB uses fsync() (without O_DSYNC) to flush both log and data files. Setting this variable to O_DIRECT will result in InnoDB using O_DIRECT while opening files and fsync() to flush both data and log files. O_DIRECT is useful when an application maintains it's own caching mechanism which is very well true for MySQL/InnoDB. O_DIRECT is the option that should be used in most of the cases as it takes the overhead of double buffering and reduces swap pressure. So if you are not doing anything unusual like SAN storage etc (which otherwise also you should reconsider before doing), always use O_DIRECT for this. This leads to a significant improvement in InnoDB performance by removing double buffering.
So there you have it. 2 parameters which together improve bulk insert performance by a factor of 200. Certainly something worth knowing.
In case you care how much each of these options contributed to the performance increase: innodb_flush_log_at_trx_commit accounted for about 95% of the performance increase with the remaining 5% going to innodb_flush_method.