I’ve been lately compiling and generating .deb packages for several MySQL and MariaDB recent versions, and I wanted to try them more in depth -specially MySQL 8.0 and MariaDB 10.2, both of which are still in development.
Several people have already given their first impressions (or will do soon), and testing early is the best way to catch bugs and regressions, and get them fixed before the official release. In fact, as I will comment later, I ran into breaking bugs on both MySQL 8.0 and MariaDB 10.2, which I immediately reported or joined others on the bug report.
Last time I did a similar test, I found an important 30% regression in single-thread performance between 5.6 and 5.7, and I wasn’t the only one reporting single-thread performance concerns. This is why I included 3 generations of major database versions, the latest minor release -as of December 2016- of the old but stable 5.6/10.0; the latest GA versions 5.7/10.1 and the next, in-development versions 8.0/10.2 (exact versions shown bellow).
Very recently, there was a call for better MySQL benchmarks, and I cannot agree more with the general idea. However, I have to make some disclaimers: the following are “tests results”, closer to regression testing than to benchmarking. Benchmarking is hard, and I do not have the time or the resources to do them properly and extensively. What I wanted to do is to do a very specific test of a very specific operation (LOAD DATA in a single thread) under very specific circumstances/configuration [in some cases, bad configuration practices] to see if a previously-occurred problem was still there. This is not a mere thought experiment, it will help me tune better the import/backup recovery process, and did allow me get familiarized with the newest versions’ particular idiosyncrasies. Clearly, as you will discover, I do not yet know how to tune the latest unreleased versions, (who knows at this point), so join me on the discovery process.
Secondly, the test I will be doing (LOAD DATA) did not include secondary indexes or JOINs. Again, this will test import times, but not other more common operations like point selects, updates and range selects. Thirdly, I am comparing both Generally Available (stable) versions and versions still in development. The latter can change a lot between now and the release date. I repeat again: This will be in no way representative of the overall performance of MySQL or MariaDB versions. If you get away with the idea that “X is better than Y” based on a chart shown here, you will probably be very wrong. You have been warned.
I hope, however, that some of these results will be helpful to some fellow DBAs and MySQL/MariaDB users and develpers, and that is why I am sharing them early. You may also help me explain some of the results I am getting, which I may not 100% understand yet.
- Oracle MySQL 5.6.34
- Oracle MySQL 5.7.16
- Oracle MySQL 8.0.0-dmr (not “generally available”)
- MariaDB Server 10.0.28
- MariaDB Server 10.1.19
- MariaDB Server 10.2.2 (non “generally available”)
All versions were compiled from the source code downloaded from its official website with the recommended options (
Hardware (desktop grade- no Xeon or a proper RAID setup):
- Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (x86_64 Quad-core with hyperthreading)
- 16 GB of RAM
- Single, dektop-grade, Samsung SSD 850 PRO 512GB
OS and configuration:
- Debian GNU/Linux 8.6 “Jessie”
- datadir formatted as xfs, mounted with noatime option, all on top of LVM
Several server configurations where used, commented on each individual test.
- I used again the nodes.csv I used last time, a 3,700,635,579-byte text (tab-separated) file containing 46,741,126 lines (rows) with an old OSM node data dump for Spain:
171773 38.6048402 -0.0489871 4 2012-08-25 00:37:46 12850816 472193 rubensd 171774 38.6061981 -0.0496867 2 2008-01-19 10:23:21 666916 9250 j3m 171775 38.6067166 -0.0498342 2 2008-01-19 10:23:21 666916 9250 j3m 171776 38.6028122 -0.0497136 5 2012-08-25 00:26:40 12850816 472193 rubensd 171933 40.4200658 -3.7016652 6 2011-11-29 05:37:42 9984625 33103 sergionaranja
Those rows will be loaded using the load_data_04.py script I used last time, which basically creates a table like this:
CREATE TABLE `nodes` ( `id` bigint PRIMARY KEY, `lat` decimal(9,7), `lon` decimal(10,7), `version` int, `timestamp` timestamp, `changeset` bigint, `uid` bigint, `user` varchar(255) );
And then executes
LOAD DATA, then commits:
# finishing if (options.engine == 'MyISAM'): cursor.execute('FLUSH TABLES') else: db.commit() cursor.close() db.close()
General impressions and breaking bugs found
Setting up the server for the first time in MySQL and MariaDB has now differed. While MariaDB continues using the
mysql_install_db script, MySQL, since 5.7, uses the
mysqld server binary directly. Also, while I have to thank Oracle for focusing on security, if you are going to setup a server just for pure testing, the default account creation options can be a bit annoying. I have to say thanks because there is an additional `–initialize-insecure` which simplifies the setup for one-time setups like this benchmark or a jenkins test, and that you probably should not be using on a proper production. Maybe it could be helpful for automatic deploys, where accounts are taken care by a script.
For MySQL, I found a problem in 8.0.0 in which
mysqld --initialize did not work with binary collations (neither it started if I did the initialization with the default collation, and then started the server with binary as the default encoding). The error I got was:
mysqld: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘binary’ at line 1
[ERROR] Data Dictionary initialization failed.
This was clearly caused by the latest feature, the new InnoDB data dictionary. I reported this as Bug #83706 and apparently it will be fixed for 8.0.1.
Binary collation is very important for Wikipedias, because some time ago (and even recently, with the latest extensions to the standard) full Unicode support was difficult to get -and we need to support 300 languages. Also, we have certain needs like custom collations support (e.g. several collations on the same table, numeric sorting). As a consequence of that, most text data is now stored in raw binary. For these tests, I had to disable the binary character for some tests for 8.0.0, which I have appropriately noted it below.
For MariaDB, I ran into an issue that had already been reported (MDEV-10540), in which
mysql_install_db got hanged if the log_bin option was enabled. While this bug could be workarounded by doing so just for the installation, the option is so common and important that I think the issue should be solved as soon as possible.
I also run into MDEV-1124, MariaDB Server promoting an external business, a mistake which hopefully will be fixed by the release of 10.1.20.
In order to perform the tests, I had to destroy and recreate the datadir, starting from scratch on each run (
rm + --initialize). This is also a testimony that you no longer have binary compatibility between server vendors, due to the differences on the mysql schema, performance schema tables, the new data dictionary, etc.
Another thing to notice is that, because of the latest security patches, you have to enable explicitly the
secure_file_priv in order to use the
LOAD DATA query. You probably do not want to do that on a production server, more details about the issue on CVE-2016-6662 and the other recently related issues.
Default config testing
While testing with the default config is the #1 sin one could do while doing benchmarking (because you are testing the vendor defaults, not a real performance comparison), my test is silly and simple enough for this to be interesting, and spending some time there. Long gone are the times in which MySQL’s defaults were horrible, and Oracle seems very focused lately on trying to setup good defaults that would be useful for the majority of people. But let’s also be honest, these options (together with what packagers setup) are also the ones that 90% of the mysql users out there will use because they are not running dedicated services that require a lot of tuning nor have a dedicated DBA to do it.
So I went and setup mysql with just a bunch of administrative options. In particular I would like to note that by default:
- No binary log enabled in all versions
- Default buffer pool size is 134217728 for all versions
innodb_log_file_sizedefaults to 50MB
[client] port = 3306 socket = /tmp/mysql.sock [mysqld] # administrative options user = mysql socket = /tmp/mysql.sock port = 3306 datadir = /srv/sqldata basedir = /opt/mysql # mysql is a symbolic link to each basedir tmpdir = /srv/tmp # prevent imports/exports to arbitrary dirs secure_file_priv = /tmp
Some interesting config differences between 5.6 and MariaDB 10.0 to have into account: performance schema is disabled on MariaDB, and many buffers and caches have different size
< # This is MariaDB 10.0 > # This is MySQL 5.6 25,26c5,7 < back_log 150 --- > back_log 80 61a43 > default_tmp_storage_engine InnoDB 67a51,53 > end_markers_in_json OFF > enforce_gtid_consistency OFF > eq_range_index_dive_limit 10 101,103c82 < host_cache_size 128 --- > host_cache_size 279 208c170 < innodb_open_files 400 --- > innodb_open_files 2000 259,261c210 < join_buffer_size 131072 < join_buffer_space_limit 2097152 --- > join_buffer_size 262144 263c212 < key_buffer_size 134217728 --- > key_buffer_size 8388608 292a243 > master_info_repository FILE 294c245 < max_allowed_packet 1048576 --- > max_allowed_packet 4194304 319d268 < mrr_buffer_size 262144 341,353c289,304 < optimizer_selectivity_sampling_limit 100 < optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on < optimizer_use_condition_selectivity 1 < performance_schema OFF --- > optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on > performance_schema ON 388,390c337,338 < query_cache_size 0 < query_cache_strip_comments OFF < query_cache_type ON --- > query_cache_size 1048576 > query_cache_type OFF 399a349 > relay_log_info_repository FILE 415,417c357,358 < secure_auth OFF --- > secure_auth ON 423a371,373 > slave_allow_batching OFF > slave_checkpoint_group 512 > slave_checkpoint_period 300 425,426d374 < slave_ddl_exec_mode IDEMPOTENT < slave_domain_parallel_threads 0 431,432c379,381 < slave_parallel_max_queued 131072 < slave_parallel_threads 0 --- > slave_parallel_workers 0 > slave_pending_jobs_size_max 16777216 441c390 < sort_buffer_size 2097152 --- > sort_buffer_size 262144 447c396 < sql_mode --- > sql_mode STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION 465,467c414,416 < sync_master_info 0 < sync_relay_log 0 < sync_relay_log_info 0 --- > sync_master_info 10000 > sync_relay_log 10000 > sync_relay_log_info 10000 469,471c418,421 < table_definition_cache 400 < table_open_cache 400 < thread_cache_size 0 --- > table_definition_cache 1400 > table_open_cache 2000 > table_open_cache_instances 1 > thread_cache_size 9 474,479c424 < thread_pool_size 8 < thread_pool_stall_limit 500 < thread_stack 294912 --- > thread_stack 262144
Some interesting config differences between MySQL 5.7 and MySQL 8.0: Innodb removed innodb_file_format and innodb_large_prefix (deprecated in 5.7), only allowing Antilope format for backwards compatibility ( thanks Morgo and Jörg Brühe for the correction). Also, some extra variables related to the new data dictionary.
< # This is MySQL 5.7 > # This is MySQL 8.0 93c93 < ignore_db_dirs --- > information_schema_stats CACHED 122d121 < innodb_checksums ON 136,138d134 < innodb_file_format Barracuda < innodb_file_format_check ON < innodb_file_format_max Barracuda 143c139 < innodb_flush_method --- > innodb_flush_method fsync 163d158 < innodb_large_prefix ON 165d159 < innodb_locks_unsafe_for_binlog OFF 208d201 < innodb_stats_sample_pages 8 213d205 < innodb_support_xa ON 331a324 > performance_schema_error_size 1043 348c341 < performance_schema_max_memory_classes 320 --- > performance_schema_max_memory_classes 350 354c347 < performance_schema_max_rwlock_classes 40 --- > performance_schema_max_rwlock_classes 50 360c353 < performance_schema_max_statement_classes 193 --- > performance_schema_max_statement_classes 201 370a364 > persisted_globals_load ON 404a399 > schema_definition_cache 256 465a461 > stored_program_definition_cache 256 468d463 < sync_frm ON 475a471 > tablespace_definition_cache 256
Some interesting config differences between MariaDB 10.1 and MariaDB 10.2: some extra features integrated/gone compatible from the xtradb engine and more recent versions of MySQL; enabling the buffer pool dump by default, also setting Barracuda as the default file format, sql strict mode, new checksum algorithms, deleting deprecated variables, option for CTEs, and server_id now defaults to 1.
< # This is MariaDB 10.1 > # This is MariaDB 10.2 16c16 < aria_recover NORMAL --- > aria_recover_options NORMAL 31c31 < binlog_checksum NONE --- > binlog_checksum CRC32 47c47,48 --- > check_constraint_checks ON 121c122 < innodb_adaptive_hash_index_partitions 1 --- > innodb_adaptive_hash_index_parts 8 123d123 < innodb_additional_mem_pool_size 8388608 136c136,137 < innodb_buffer_pool_dump_at_shutdown OFF --- > innodb_buffer_pool_chunk_size 134217728 > innodb_buffer_pool_dump_at_shutdown ON 138c139 < innodb_buffer_pool_dump_pct 100 --- > innodb_buffer_pool_dump_pct 25 140c141 < innodb_buffer_pool_instances 8 --- > innodb_buffer_pool_instances 1 142c143 < innodb_buffer_pool_load_at_startup OFF --- > innodb_buffer_pool_load_at_startup ON 144d144 < innodb_buffer_pool_populate OFF 148c148 < innodb_checksum_algorithm INNODB --- > innodb_checksum_algorithm crc32 150d149 < innodb_cleaner_lsn_age_factor HIGH_CHECKPOINT 158d156 < innodb_corrupt_table_action assert 161a160 > innodb_default_row_format dynamic 171d169 < innodb_empty_free_list_algorithm BACKOFF 177d174 < innodb_fake_changes OFF 180c177 < innodb_file_format Antelope --- > innodb_file_format Barracuda 182c179 < innodb_file_format_max Antelope --- > innodb_file_format_max Barracuda 183a181 > innodb_fill_factor 100 187a186 > innodb_flush_sync ON 192d190 < innodb_foreground_preflush EXPONENTIAL_BACKOFF 210,212c208 < innodb_kill_idle_transaction 0 < innodb_large_prefix OFF < innodb_lock_schedule_algorithm fcfs --- > innodb_large_prefix ON 214d209 < innodb_locking_fake_changes ON 221c212 < innodb_log_checksum_algorithm INNODB --- > innodb_log_checksums ON 225a217 > innodb_log_write_ahead_size 8192 227,228d218 < innodb_max_bitmap_file_size 104857600 < innodb_max_changed_pages 1000000 230c220 < innodb_max_dirty_pages_pct_lwm 0.001000 --- > innodb_max_dirty_pages_pct_lwm 0.000000 233c223 < innodb_mirrored_log_groups 1 --- > innodb_max_undo_log_size 1073741824 243a234 > innodb_page_cleaners 1 248c239,240 < innodb_purge_threads 1 --- > innodb_purge_rseg_truncate_frequency 128 > innodb_purge_threads 4 256d247 < innodb_sched_priority_cleaner 19 259,261d249 < innodb_show_locks_held 10 < innodb_show_verbose_locks 0 < innodb_simulate_comp_failures 0 275c263 < innodb_strict_mode OFF --- > innodb_strict_mode ON 279a268 > innodb_temp_data_file_path ibtmp1:12M:autoextend 283,284c272,273 < innodb_track_changed_pages OFF < innodb_undo_directory . --- > innodb_undo_directory ./ > innodb_undo_log_truncate OFF 289d277 < innodb_use_global_flush_log_at_trx_commit ON 292,293d279 < innodb_use_stacktrace OFF < innodb_use_sys_malloc ON 352a339 > max_recursive_iterations 4294967295 388c375 < optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off --- > optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on 413c400 < performance_schema_max_statement_classes 178 --- > performance_schema_max_statement_classes 184 465c452,456 < server_id 0 --- > server_id 1 479a471 > slave_parallel_workers 0 508a501 > standards_compliant_cte ON 520c513,514 < thread_cache_size 0 --- > table_open_cache_instances 8 > thread_cache_size 151 525a520,521 > thread_pool_prio_kickup_timer 1000 > thread_pool_priority auto 528c524 < thread_stack 295936 --- > thread_stack 297984
These are the results I got:
|avg run (seconds)||193.378||215.2826667||186.081||194.032||195.0173333||198.0726667|
|insert throughput compared to 5.6||100.00%||89.83%||103.92%||99.66%||99.16%||97.63%|
Nothing surprising here, for MySQL, the 5.7 regression still shows, now only around 10%; not sure if because of the work done already to mitigate it on BUG#75981, or just the different hardware I used compared to my last test. The good news is that the regressions seems solved for 8.0, getting even better results than 5.6. We will see if that maintains when we normalize the configuration options.
Regarding MariaDB, 10.0 shows a similar performance to 5.6 (probably within the range of potential measurement errors). What it is noticeable is a very small degradation for 10.1 and 10.2. Nothing that worries me, normally more features create a small overhead (the same has happened in MySQL in the past) which is in most cases worth paying, and many times not even noticeable under normal load. There is also still time for 10.2 to optimize potential problems.
Let’s get a bit more serious, let’s test a configuration used in real life an let’s uniformize configurations. Wikimedia core production databases use this configuration. There are several things, however, I have to comment that will affect this test:
- The tests here were not done on the same production machines, and some of the configuration has been heavily tuned for large amounts of memory available and hardware RAID controller, not available for the test
- Some of the configuration options are MariaDB 10-dependent, with our build. They do not work on MySQL, or other versions of MariaDB- in those cases those options were disabled. Some of them work and were kept, but they may have a negative impact in this context.
- It is obvioulsy not optimized for the specific LOAD DATA use case
- WMF configuration is MariaDB 10-focused, so I expect MySQL server to not be properly tuned for it.
- SSL was not used, probably not interesting for a single connection
- Binary collation is used, as I commented before, it is the chosen charset/collation for mediawiki in WMF servers. As 8.0.0-dmr had the bug I mentioned not allowing that collation, I had to use the default collation there. Beware, that could skew its results
This is the configuration used for this test:
[client] port = 3306 socket = /tmp/mysql.sock [mysqld] # administrative options user = mysql socket = /tmp/mysql.sock port = 3306 # MariaDB only #extra-port = 3307 datadir = /srv/sqldata basedir = /opt/mysql tmpdir = /srv/tmp server_id = 1 skip-external-locking skip-name-resolve temp-pool # Always start in read_only mode. # Master selection will be handled by orchestration. read_only = 1 # prevent imports/exports to arbitrary dirs secure_file_priv = /tmp # thread and connection handling # Mariadb only #thread_handling = pool-of-threads # Mariadb only #thread_pool_stall_limit = 100 # Mariadb only #thread_pool_size = 32 # Mariadb only #thread_pool_max_threads = 2000 max_connections = 10000 back_log = 500 # MariaDB only # extra_max_connections = 10 max_connect_errors = 1000000000 max_allowed_packet = 32M connect_timeout = 3 query_cache_size = 0 query_cache_type = 0 event_scheduler = 1 log-warnings = 0 thread_stack = 192K thread_cache_size = 300 interactive_timeout = 28800 wait_timeout = 3600 transaction-isolation = REPEATABLE-READ # binary log an replication options log-bin log-slave-updates sync-binlog = 1 binlog-cache-size = 1M max-binlog_size = 1000M binlog-format = STATEMENT expire-logs-days = 7 slave-transaction-retries = 4294967295 # table limits tmp-table-size = 64M max-heap-table-size = 64M table-open-cache = 50000 table-definition-cache = 40000 open-files-limit = 200000 # optimizer options # MariaDB only #sql-mode = IGNORE_BAD_TABLE_OPTIONS optimizer_switch = 'mrr=on,mrr_cost_based=on' # MariaDB only # ,mrr_sort_keys=on,optimize_join_buffer_size=on' # MariaDB only #use-stat-tables = preferably # charsets and collations character_set_server = binary character_set_filesystem = binary collation_server = binary # InnoDB options default-storage-engine = InnoDB innodb_file_per_table = 1 innodb_buffer_pool_size = 12G innodb_log_file_size = 2G innodb_flush_log_at_trx_commit = 1 innodb_flush_method = O_DIRECT innodb_thread_concurrency = 0 innodb_io_capacity = 1000 # MariaDB only # innodb_stats_sample_pages = 16 innodb_stats_method = nulls_unequal # MariaDB only #aria_pagecache_buffer_size = 1G # MariaDB only #join_cache_level = 8 # dump and load innodb buffer at start and stop innodb_buffer_pool_load_at_startup = 1 innodb_buffer_pool_dump_at_shutdown = 1 # profiling and metrics # Enabling performance_schema (disabled by default in MariaDB10) performance_schema = 1 # downsizing performance schema memory usage: T99485 performance_schema_max_thread_instances = 500 performance_schema_max_cond_instances = 1000 performance_schema_accounts_size = 300 performance_schema_hosts_size = 300 performance_schema_events_statements_history_size = 10 performance_schema_events_statements_history_long_size = 1000 performance_schema_events_waits_history_size = 10 performance_schema_events_waits_history_long_size = 1000 performance_schema_events_stages_history_size = 10 performance_schema_events_stages_history_long_size = 1000 performance_schema_max_mutex_instances = 5000 performance_schema_max_rwlock_instances = 2000 performance_schema_max_socket_instances = 500 performance_schema_max_table_instances = 1000 [mysqldump] quick max_allowed_packet = 32M
As you can see, the biggest changes were:
- Buffer pool increased to 12GB, so that the whole file can fit into memory
- Log file size increased to 2GB, so also that the whole file can fit into the transaction log
- Performance schema enabled by default, this is very important for our query monitoring
- No pool of connections (not available on MySQL)
- Very large open_file_limits and max_connections, which requires downsizing the P_S parameters. (Update: I am being told that this is not necessary on 5.7 and 8.0, as it auto-resizes. I still tune it as I have to do it for MariaDB 10.0)
These are the results:
|avg (seconds)||212.501||234.3543333||220.1906667 (*)||185.5396667||187.598||210.6946667|
|median (seconds)||212.48||234.608||220.123 (*)||188.16||191.393||209.287|
|rows inserted/s||219957.2049||199446.391||212275.6914 (*)||251919.8554||249155.7799||221842.9481|
|insert throughput compared to 5.6||100.00%||90.68%||96.51% (*)||114.53%||113.27%||100.86%|
(*) Using UTF-8 collation because an 8.0.0 bug, which may impact the results
Again, if you prefer some graph bars:
The main conclusions that I can get from here is that my current performance configuration is not MySQL or MariaDB 10.2-ready. While in the case of 10.0 and 10.1, the performance has gotten around 11% better, for MariaDB 10.2 and MySQL, it has gotten worse, not better.
One thing that is not on the above summary is that MariaDB’s results, unlike MySQLs, even if better on average, have a higher amount of execution time variability. All test done on the 3 MySQL versions ended up within a second from each other. MariaDB results end up in a 10-second range. Throughput is not the only variable to take into account when doing performance testing: stability of timings could be important in many cases to provide a stable response time.
The other thing to comment, do these results show a more important regression for 10.2? Let’s do another test to confirm it.
Let’s try to test a configuration that optimizes for LOAD DATA speed, without regards for security (disabling features that you normally wouldn’t want to do in production: double write buffer, innodb checksums, …
This was the configuration used:
[client] port = 3306 socket = /tmp/mysql.sock [mysqld] # administrative options user = mysql socket = /tmp/mysql.sock port = 3306 datadir = /srv/sqldata basedir = /opt/mysql tmpdir = /srv/tmp #binlog disabled # prevent imports/exports to arbitrary dirs secure_file_priv = /tmp max_connections = 500 query_cache_size = 0 query_cache_type = 0 # optimizer options # MariaDB only sql-mode = '' # charsets and collations #character_set_server = binary #character_set_filesystem = binary #collation_server = binary # InnoDB options default-storage-engine = InnoDB innodb_file_per_table = 1 innodb_buffer_pool_size = 12G innodb_log_file_size = 2G innodb_flush_log_at_trx_commit = 1 # profiling and metrics performance_schema = 0 skip-innodb-doublewrite innodb_flush_neighbors = 0 innodb_checksum_algorithm = none
And again, the results:
|insert throughput compared to 5.6||100.00%||87.20%||84.58%||98.86%||96.00%||93.80%|
The trend is similar to the previous results, with a 20-25% improvement in performance compared to the last test case. Except one version-8.0 seems to perform worse than on the previous configuration. This doesn’t make sense to me, but the results, like in the previous case, are consistent and repeatable. Is there a regression hidden here?
Regarding MariaDB, this configuration seems more stable (load-wise) for it, with less variability between runs. Still a downwards trend for newer versions.
One more result -SELECT count(*)
While checking that all the records were being inserted correctly, and no rows were lost in the process, I run into a curious case of different count + full scan speeds. When doing:
SELECT count(*) FROM nodes;
I got the following timings on console:
|run time (seconds)||8.31 sec||4.81 sec||5.02 sec||1 min 47.88 sec||2 min 5.08 sec||6.61 sec|
For those still obsessed on getting a nice bar chart every time:
I knew there were some optimizations on
SELECT count(*) that made that faster for InnoDB (not instant, like MyISAM), but I didn’t know it was enabled as early as 5.6. It seems MariaDB only merged that feature (or a similar one doing the same) starting with 10.2.
Providing data without an explanation is a bad practice, however, I believe this will help as a starting point to debug (by me, or any others that would want to reproduce the results- I think I have given enough information to do that) and see what code points are blocking this memory-only, 1-thread concurrency specific test. Again, remember this is a very specific test and not a proper benchmark to base decisions on. The intended goals which were: 1) identify regressions 2) identify bugs 3) get a first quick look at the options changed and 4) start working with the newest versions, all were accomplished.
As a followup, I would like to explore more in depth those potentially observed regressions on 8.0 and 10.2, and see if it is just a case of adapting the configuration or if there is an underlying cause slowing them down. If they happened to be bugs, there should be plenty of time to fix those before a stable release from both vendors. The other thing that I would like to analyze next is this very same load type, but with innodb compression, and compare both the performance and the resulting file sizes.
I hope this has been helpful for someone else other than me, too. I have probably made some mistakes along the way, please tell me so on a comment or @jynus.
4 thoughts on “Testing (again) LOAD DATA on MySQL 5.6, 5.7, 8.0 (non-GA) and MariaDB 10.0, 10.1 and 10.2 (non-GA)”
Nice article. It would be nice to attach a slave to each server and let the LOAD DATA replicate to see how the replication thread handles it so we can see if there is any regression or change in that sense.
This was fun to read. Thanks for putting so much effort into writing it.
What I missed, dearly: In your “diff” output of configuration variables, there is no indication which is which. Readers might do the wrong association.
Thank you, for the suggestion. Indeed it was confusing, so much that I missinterpreted the 5.7 and 8.0 results. I have clarified that and added some headers on the diff to make them clearer. When 8.0 and 10.2 are GA I may add a more in depth comparions than just a diff.
Comments are closed.