load data testingI’ve been lately compiling and generating .deb packages for several MySQL and MariaDB recent versions, and I wanted to try them more in depth -specially MySQL 8.0 and MariaDB 10.2, both of which are still in development.

Several people have already given their first impressions (or will do soon), and testing early is the best way to catch bugs and regressions, and get them fixed before the official release. In fact, as I will comment later, I ran into breaking bugs on both MySQL 8.0 and MariaDB 10.2, which I immediately reported or joined others on the bug report.

Last time I did a similar test, I found an important 30% regression in single-thread performance between 5.6 and 5.7, and I wasn’t the only one reporting single-thread performance concerns. This is why I included 3 generations of major database versions, the latest minor release -as of December 2016- of the old but stable 5.6/10.0; the latest GA versions 5.7/10.1 and the next, in-development versions 8.0/10.2 (exact versions shown bellow).

Very recently, there was a call for better MySQL benchmarks, and I cannot agree more with the general idea. However, I have to make some disclaimers: the following are “tests results”, closer to regression testing than to benchmarking. Benchmarking is hard, and I do not have the time or the resources to do them properly and extensively. What I wanted to do is to do a very specific test of a very specific operation (LOAD DATA in a single thread) under very specific circumstances/configuration [in some cases, bad configuration practices] to see if a previously-occurred problem was still there. This is not a mere thought experiment, it will help me tune better the import/backup recovery process, and did allow me get familiarized with the newest versions’ particular idiosyncrasies. Clearly, as you will discover, I do not yet know how to tune the latest unreleased versions, (who knows at this point), so join me on the discovery process.

Secondly, the test I will be doing (LOAD DATA) did not include secondary indexes or JOINs. Again, this will test import times, but not other more common operations like point selects, updates and range selects. Thirdly, I am comparing both Generally Available (stable) versions and versions still in development. The latter can change a lot between now and the release date. I repeat again: This will be in no way representative of the overall performance of MySQL or MariaDB versions. If you get away with the idea that “X is better than Y” based on a chart shown here, you will probably be very wrong. You have been warned.

I hope, however, that some of these results will be helpful to some fellow DBAs and MySQL/MariaDB users and develpers, and that is why I am sharing them early. You may also help me explain some of the results I am getting, which I may not 100% understand yet.

The setup

Server versions:

  • Oracle MySQL 5.6.34
  • Oracle MySQL 5.7.16
  • Oracle MySQL 8.0.0-dmr (not “generally available”)
  • MariaDB Server 10.0.28
  • MariaDB Server 10.1.19
  • MariaDB Server 10.2.2 (non “generally available”)

All versions were compiled from the source code downloaded from its official website with the recommended options (-DBUILD_CONFIG=mysql_release).

Hardware (desktop grade- no Xeon or a proper RAID setup):

  • Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (x86_64 Quad-core with hyperthreading)
  • 16 GB of RAM
  • Single, dektop-grade, Samsung SSD 850 PRO 512GB

OS and configuration:

  • Debian GNU/Linux 8.6 “Jessie”
  • datadir formatted as xfs, mounted with noatime option, all on top of LVM

Several server configurations where used, commented on each individual test.

Dataset:

  • I used again the nodes.csv I used last time, a 3,700,635,579-byte text (tab-separated) file containing 46,741,126 lines (rows) with an old OSM node data dump for Spain:
171773  38.6048402      -0.0489871      4       2012-08-25 00:37:46     12850816        472193  rubensd
171774  38.6061981      -0.0496867      2       2008-01-19 10:23:21     666916  9250    j3m
171775  38.6067166      -0.0498342      2       2008-01-19 10:23:21     666916  9250    j3m
171776  38.6028122      -0.0497136      5       2012-08-25 00:26:40     12850816        472193  rubensd
171933  40.4200658      -3.7016652      6       2011-11-29 05:37:42     9984625 33103   sergionaranja

Those rows will be loaded using the load_data_04.py script I used last time, which basically creates a table like this:

CREATE TABLE `nodes` (
  `id` bigint PRIMARY KEY,
  `lat` decimal(9,7),
  `lon` decimal(10,7),
  `version` int,
  `timestamp` timestamp,
  `changeset` bigint,
  `uid` bigint,
  `user` varchar(255)
);

And then executes LOAD DATA, then commits:

# finishing
if (options.engine == 'MyISAM'):
    cursor.execute('FLUSH TABLES')
else:
    db.commit()
 
cursor.close()
db.close()

General impressions and breaking bugs found

Bad bug :-(
Setting up the server for the first time in MySQL and MariaDB has now differed. While MariaDB continues using the mysql_install_db script, MySQL, since 5.7, uses the mysqld server binary directly. Also, while I have to thank Oracle for focusing on security, if you are going to setup a server just for pure testing, the default account creation options can be a bit annoying. I have to say thanks because there is an additional `–initialize-insecure` which simplifies the setup for one-time setups like this benchmark or a jenkins test, and that you probably should not be using on a proper production. Maybe it could be helpful for automatic deploys, where accounts are taken care by a script.

For MySQL, I found a problem in 8.0.0 in which mysqld --initialize did not work with binary collations (neither it started if I did the initialization with the default collation, and then started the server with binary as the default encoding). The error I got was:

mysqld: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘binary’ at line 1
[ERROR] Data Dictionary initialization failed.

This was clearly caused by the latest feature, the new InnoDB data dictionary. I reported this as Bug #83706 and apparently it will be fixed for 8.0.1.

Binary collation is very important for Wikipedias, because some time ago (and even recently, with the latest extensions to the standard) full Unicode support was difficult to get -and we need to support 300 languages. Also, we have certain needs like custom collations support (e.g. several collations on the same table, numeric sorting). As a consequence of that, most text data is now stored in raw binary. For these tests, I had to disable the binary character for some tests for 8.0.0, which I have appropriately noted it below.

For MariaDB, I ran into an issue that had already been reported (MDEV-10540), in which mysql_install_db got hanged if the log_bin option was enabled. While this bug could be workarounded by doing so just for the installation, the option is so common and important that I think the issue should be solved as soon as possible.

I also run into MDEV-1124, MariaDB Server promoting an external business, a mistake which hopefully will be fixed by the release of 10.1.20.

In order to perform the tests, I had to destroy and recreate the datadir, starting from scratch on each run (rm + --initialize). This is also a testimony that you no longer have binary compatibility between server vendors, due to the differences on the mysql schema, performance schema tables, the new data dictionary, etc.

Another thing to notice is that, because of the latest security patches, you have to enable explicitly the secure_file_priv in order to use the LOAD DATA query. You probably do not want to do that on a production server, more details about the issue on CVE-2016-6662 and the other recently related issues.

Default config testing

While testing with the default config is the #1 sin one could do while doing benchmarking (because you are testing the vendor defaults, not a real performance comparison), my test is silly and simple enough for this to be interesting, and spending some time there. Long gone are the times in which MySQL’s defaults were horrible, and Oracle seems very focused lately on trying to setup good defaults that would be useful for the majority of people. But let’s also be honest, these options (together with what packagers setup) are also the ones that 90% of the mysql users out there will use because they are not running dedicated services that require a lot of tuning nor have a dedicated DBA to do it.

So I went and setup mysql with just a bunch of administrative options. In particular I would like to note that by default:

  • No binary log enabled in all versions
  • Default buffer pool size is 134217728 for all versions
  • innodb_log_file_size defaults to 50MB
[client]
port   = 3306
socket = /tmp/mysql.sock

[mysqld]

# administrative options
user       = mysql
socket     = /tmp/mysql.sock
port       = 3306
datadir    = /srv/sqldata
basedir    = /opt/mysql # mysql is a symbolic link to each basedir
tmpdir     = /srv/tmp
# prevent imports/exports to arbitrary dirs
secure_file_priv               = /tmp

Some interesting config differences between 5.6 and MariaDB 10.0 to have into account: performance schema is disabled on MariaDB, and many buffers and caches have different size

< # This is MariaDB 10.0
> # This is MySQL 5.6
25,26c5,7
< back_log	150
---
> back_log	80
61a43
> default_tmp_storage_engine	InnoDB
67a51,53
> end_markers_in_json	OFF
> enforce_gtid_consistency	OFF
> eq_range_index_dive_limit	10
101,103c82
< host_cache_size	128
---
> host_cache_size	279
208c170
< innodb_open_files	400
---
> innodb_open_files	2000
259,261c210
< join_buffer_size	131072
< join_buffer_space_limit	2097152
---
> join_buffer_size	262144
263c212
< key_buffer_size	134217728
---
> key_buffer_size	8388608
292a243
> master_info_repository	FILE
294c245
< max_allowed_packet	1048576
---
> max_allowed_packet	4194304
319d268
< mrr_buffer_size	262144
341,353c289,304
< optimizer_selectivity_sampling_limit	100
< optimizer_switch	index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
< optimizer_use_condition_selectivity	1
< performance_schema	OFF
---
> optimizer_switch	index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on
> performance_schema	ON
388,390c337,338
< query_cache_size	0
< query_cache_strip_comments	OFF
< query_cache_type	ON
---
> query_cache_size	1048576
> query_cache_type	OFF
399a349
> relay_log_info_repository	FILE
415,417c357,358
< secure_auth	OFF
---
> secure_auth	ON
423a371,373
> slave_allow_batching	OFF
> slave_checkpoint_group	512
> slave_checkpoint_period	300
425,426d374
< slave_ddl_exec_mode	IDEMPOTENT
< slave_domain_parallel_threads	0
431,432c379,381
< slave_parallel_max_queued	131072
< slave_parallel_threads	0
---
> slave_parallel_workers	0
> slave_pending_jobs_size_max	16777216
441c390
< sort_buffer_size	2097152
---
> sort_buffer_size	262144
447c396
< sql_mode	
---
> sql_mode	STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION
465,467c414,416
< sync_master_info	0
< sync_relay_log	0
< sync_relay_log_info	0
---
> sync_master_info	10000
> sync_relay_log	10000
> sync_relay_log_info	10000
469,471c418,421
< table_definition_cache	400
< table_open_cache	400
< thread_cache_size	0
---
> table_definition_cache	1400
> table_open_cache	2000
> table_open_cache_instances	1
> thread_cache_size	9
474,479c424
< thread_pool_size	8
< thread_pool_stall_limit	500
< thread_stack	294912
---
> thread_stack	262144

Some interesting config differences between MySQL 5.7 and MySQL 8.0: Innodb removed innodb_file_format and innodb_large_prefix (deprecated in 5.7), only allowing Antilope format for backwards compatibility ( thanks Morgo and Jörg Brühe for the correction). Also, some extra variables related to the new data dictionary.

< # This is MySQL 5.7
> # This is MySQL 8.0
93c93
< ignore_db_dirs	
---
> information_schema_stats	CACHED
122d121
< innodb_checksums	ON
136,138d134
< innodb_file_format	Barracuda
< innodb_file_format_check	ON
< innodb_file_format_max	Barracuda
143c139
< innodb_flush_method	
---
> innodb_flush_method	fsync
163d158
< innodb_large_prefix	ON
165d159
< innodb_locks_unsafe_for_binlog	OFF
208d201
< innodb_stats_sample_pages	8
213d205
< innodb_support_xa	ON
331a324
> performance_schema_error_size	1043
348c341
< performance_schema_max_memory_classes	320
---
> performance_schema_max_memory_classes	350
354c347
< performance_schema_max_rwlock_classes	40
---
> performance_schema_max_rwlock_classes	50
360c353
< performance_schema_max_statement_classes	193
---
> performance_schema_max_statement_classes	201
370a364
> persisted_globals_load	ON
404a399
> schema_definition_cache	256
465a461
> stored_program_definition_cache	256
468d463
< sync_frm	ON
475a471
> tablespace_definition_cache	256

Some interesting config differences between MariaDB 10.1 and MariaDB 10.2: some extra features integrated/gone compatible from the xtradb engine and more recent versions of MySQL; enabling the buffer pool dump by default, also setting Barracuda as the default file format, sql strict mode, new checksum algorithms, deleting deprecated variables, option for CTEs, and server_id now defaults to 1.

< # This is MariaDB 10.1
> # This is MariaDB 10.2
16c16
< aria_recover	NORMAL
---
> aria_recover_options	NORMAL
31c31
< binlog_checksum	NONE
---
> binlog_checksum	CRC32
47c47,48
---
> check_constraint_checks	ON
121c122
< innodb_adaptive_hash_index_partitions	1
---
> innodb_adaptive_hash_index_parts	8
123d123
< innodb_additional_mem_pool_size	8388608
136c136,137
< innodb_buffer_pool_dump_at_shutdown	OFF
---
> innodb_buffer_pool_chunk_size	134217728
> innodb_buffer_pool_dump_at_shutdown	ON
138c139
< innodb_buffer_pool_dump_pct	100
---
> innodb_buffer_pool_dump_pct	25
140c141
< innodb_buffer_pool_instances	8
---
> innodb_buffer_pool_instances	1
142c143
< innodb_buffer_pool_load_at_startup	OFF
---
> innodb_buffer_pool_load_at_startup	ON
144d144
< innodb_buffer_pool_populate	OFF
148c148
< innodb_checksum_algorithm	INNODB
---
> innodb_checksum_algorithm	crc32
150d149
< innodb_cleaner_lsn_age_factor	HIGH_CHECKPOINT
158d156
< innodb_corrupt_table_action	assert
161a160
> innodb_default_row_format	dynamic
171d169
< innodb_empty_free_list_algorithm	BACKOFF
177d174
< innodb_fake_changes	OFF
180c177
< innodb_file_format	Antelope
---
> innodb_file_format	Barracuda
182c179
< innodb_file_format_max	Antelope
---
> innodb_file_format_max	Barracuda
183a181
> innodb_fill_factor	100
187a186
> innodb_flush_sync	ON
192d190
< innodb_foreground_preflush	EXPONENTIAL_BACKOFF
210,212c208
< innodb_kill_idle_transaction	0
< innodb_large_prefix	OFF
< innodb_lock_schedule_algorithm	fcfs
---
> innodb_large_prefix	ON
214d209
< innodb_locking_fake_changes	ON
221c212
< innodb_log_checksum_algorithm	INNODB
---
> innodb_log_checksums	ON
225a217
> innodb_log_write_ahead_size	8192
227,228d218
< innodb_max_bitmap_file_size	104857600
< innodb_max_changed_pages	1000000
230c220
< innodb_max_dirty_pages_pct_lwm	0.001000
---
> innodb_max_dirty_pages_pct_lwm	0.000000
233c223
< innodb_mirrored_log_groups	1
---
> innodb_max_undo_log_size	1073741824
243a234
> innodb_page_cleaners	1
248c239,240
< innodb_purge_threads	1
---
> innodb_purge_rseg_truncate_frequency	128
> innodb_purge_threads	4
256d247
< innodb_sched_priority_cleaner	19
259,261d249
< innodb_show_locks_held	10
< innodb_show_verbose_locks	0
< innodb_simulate_comp_failures	0
275c263
< innodb_strict_mode	OFF
---
> innodb_strict_mode	ON
279a268
> innodb_temp_data_file_path	ibtmp1:12M:autoextend
283,284c272,273
< innodb_track_changed_pages	OFF
< innodb_undo_directory	.
---
> innodb_undo_directory	./
> innodb_undo_log_truncate	OFF
289d277
< innodb_use_global_flush_log_at_trx_commit	ON
292,293d279
< innodb_use_stacktrace	OFF
< innodb_use_sys_malloc	ON
352a339
> max_recursive_iterations	4294967295
388c375
< optimizer_switch	index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
---
> optimizer_switch	index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
413c400
< performance_schema_max_statement_classes	178
---
> performance_schema_max_statement_classes	184
465c452,456
< server_id	0
---
> server_id	1
479a471
> slave_parallel_workers	0
508a501
> standards_compliant_cte	ON
520c513,514
< thread_cache_size	0
---
> table_open_cache_instances	8
> thread_cache_size	151
525a520,521
> thread_pool_prio_kickup_timer	1000
> thread_pool_priority	auto
528c524
< thread_stack	295936
---
> thread_stack	297984

These are the results I got:

5.6.34 5.7.16 8.0.0-dmr 10.0.28 10.1.18 10.2.2
avg run (seconds) 193.378 215.2826667 186.081 194.032 195.0173333 198.0726667
median (seconds) 194.017 215.425 187.855 197.746 199.459 197.354
rows inserted/s 241708.6018 217115.1385 251186.9885 240893.9041 239676.7774 235979.6876
insert throughput compared to 5.6 100.00% 89.83% 103.92% 99.66% 99.16% 97.63%

If you prefer a graph:
Throughput result comparison, in rows inserted/second, for the LOAD data test with default configuration

Nothing surprising here, for MySQL, the 5.7 regression still shows, now only around 10%; not sure if because of the work done already to mitigate it on BUG#75981, or just the different hardware I used compared to my last test. The good news is that the regressions seems solved for 8.0, getting even better results than 5.6. We will see if that maintains when we normalize the configuration options.

Regarding MariaDB, 10.0 shows a similar performance to 5.6 (probably within the range of potential measurement errors). What it is noticeable is a very small degradation for 10.1 and 10.2. Nothing that worries me, normally more features create a small overhead (the same has happened in MySQL in the past) which is in most cases worth paying, and many times not even noticeable under normal load. There is also still time for 10.2 to optimize potential problems.

WMF-like config

Let’s get a bit more serious, let’s test a configuration used in real life an let’s uniformize configurations. Wikimedia core production databases use this configuration. There are several things, however, I have to comment that will affect this test:

  • The tests here were not done on the same production machines, and some of the configuration has been heavily tuned for large amounts of memory available and hardware RAID controller, not available for the test
  • Some of the configuration options are MariaDB 10-dependent, with our build. They do not work on MySQL, or other versions of MariaDB- in those cases those options were disabled. Some of them work and were kept, but they may have a negative impact in this context.
  • It is obvioulsy not optimized for the specific LOAD DATA use case
  • WMF configuration is MariaDB 10-focused, so I expect MySQL server to not be properly tuned for it.
  • SSL was not used, probably not interesting for a single connection
  • Binary collation is used, as I commented before, it is the chosen charset/collation for mediawiki in WMF servers. As 8.0.0-dmr had the bug I mentioned not allowing that collation, I had to use the default collation there. Beware, that could skew its results

This is the configuration used for this test:

[client]
port   = 3306
socket = /tmp/mysql.sock

[mysqld]

# administrative options
user       = mysql
socket     = /tmp/mysql.sock
port       = 3306
# MariaDB only
#extra-port = 3307
datadir    = /srv/sqldata
basedir    = /opt/mysql
tmpdir     = /srv/tmp
server_id  = 1

skip-external-locking
skip-name-resolve
temp-pool

# Always start in read_only mode.
# Master selection will be handled by orchestration.
read_only = 1

# prevent imports/exports to arbitrary dirs
secure_file_priv               = /tmp
# thread and connection handling
# Mariadb only
#thread_handling                = pool-of-threads
# Mariadb only
#thread_pool_stall_limit        = 100
# Mariadb only
#thread_pool_size               = 32
# Mariadb only
#thread_pool_max_threads        = 2000
max_connections                = 10000
back_log                       = 500
# MariaDB only
# extra_max_connections          = 10
max_connect_errors             = 1000000000
max_allowed_packet             = 32M
connect_timeout                = 3
query_cache_size               = 0
query_cache_type               = 0
event_scheduler                = 1
log-warnings                   = 0
thread_stack                   = 192K
thread_cache_size              = 300
interactive_timeout            = 28800
wait_timeout                   = 3600
transaction-isolation          = REPEATABLE-READ

# binary log an replication options
log-bin
log-slave-updates
sync-binlog                    = 1
binlog-cache-size              = 1M
max-binlog_size                = 1000M
binlog-format                  = STATEMENT
expire-logs-days               = 7
slave-transaction-retries      = 4294967295

# table limits
tmp-table-size                 = 64M
max-heap-table-size            = 64M
table-open-cache               = 50000
table-definition-cache         = 40000
open-files-limit               = 200000

# optimizer options
# MariaDB only
#sql-mode                       = IGNORE_BAD_TABLE_OPTIONS
optimizer_switch               = 'mrr=on,mrr_cost_based=on'
# MariaDB only
# ,mrr_sort_keys=on,optimize_join_buffer_size=on'
# MariaDB only
#use-stat-tables                = preferably

# charsets and collations
character_set_server           = binary
character_set_filesystem       = binary
collation_server               = binary

# InnoDB options
default-storage-engine         = InnoDB
innodb_file_per_table          = 1
innodb_buffer_pool_size        = 12G
innodb_log_file_size           = 2G
innodb_flush_log_at_trx_commit = 1
innodb_flush_method            = O_DIRECT
innodb_thread_concurrency      = 0
innodb_io_capacity             = 1000
# MariaDB only
# innodb_stats_sample_pages      = 16
innodb_stats_method            = nulls_unequal
# MariaDB only
#aria_pagecache_buffer_size     = 1G
# MariaDB only
#join_cache_level               = 8

# dump and load innodb buffer at start and stop
innodb_buffer_pool_load_at_startup  = 1
innodb_buffer_pool_dump_at_shutdown = 1

# profiling and metrics
# Enabling performance_schema (disabled by default in MariaDB10)
performance_schema                                     = 1
# downsizing performance schema memory usage: T99485
performance_schema_max_thread_instances                = 500
performance_schema_max_cond_instances                  = 1000
performance_schema_accounts_size                       = 300
performance_schema_hosts_size                          = 300
performance_schema_events_statements_history_size      = 10
performance_schema_events_statements_history_long_size = 1000
performance_schema_events_waits_history_size           = 10
performance_schema_events_waits_history_long_size      = 1000
performance_schema_events_stages_history_size          = 10
performance_schema_events_stages_history_long_size     = 1000
performance_schema_max_mutex_instances                 = 5000
performance_schema_max_rwlock_instances                = 2000
performance_schema_max_socket_instances                = 500
performance_schema_max_table_instances                 = 1000

[mysqldump]

quick
max_allowed_packet = 32M

As you can see, the biggest changes were:

  • Buffer pool increased to 12GB, so that the whole file can fit into memory
  • Log file size increased to 2GB, so also that the whole file can fit into the transaction log
  • Performance schema enabled by default, this is very important for our query monitoring
  • No pool of connections (not available on MySQL)
  • Very large open_file_limits and max_connections, which requires downsizing the P_S parameters. (Update: I am being told that this is not necessary on 5.7 and 8.0, as it auto-resizes. I still tune it as I have to do it for MariaDB 10.0)

These are the results:

5.6.34 5.7.16 8.0.0-dmr 10.0.28 10.1.18 10.2.2
avg (seconds) 212.501 234.3543333 220.1906667 (*) 185.5396667 187.598 210.6946667
median (seconds) 212.48 234.608 220.123 (*) 188.16 191.393 209.287
rows inserted/s 219957.2049 199446.391 212275.6914 (*) 251919.8554 249155.7799 221842.9481
insert throughput compared to 5.6 100.00% 90.68% 96.51% (*) 114.53% 113.27% 100.86%

(*) Using UTF-8 collation because an 8.0.0 bug, which may impact the results

Again, if you prefer some graph bars:
Througput comparison for the LOAD DATA test on a WMF-like configuration

The main conclusions that I can get from here is that my current performance configuration is not MySQL or MariaDB 10.2-ready. While in the case of 10.0 and 10.1, the performance has gotten around 11% better, for MariaDB 10.2 and MySQL, it has gotten worse, not better.

One thing that is not on the above summary is that MariaDB’s results, unlike MySQLs, even if better on average, have a higher amount of execution time variability. All test done on the 3 MySQL versions ended up within a second from each other. MariaDB results end up in a 10-second range. Throughput is not the only variable to take into account when doing performance testing: stability of timings could be important in many cases to provide a stable response time.

The other thing to comment, do these results show a more important regression for 10.2? Let’s do another test to confirm it.

Insecure config

Let’s try to test a configuration that optimizes for LOAD DATA speed, without regards for security (disabling features that you normally wouldn’t want to do in production: double write buffer, innodb checksums, …

This was the configuration used:

[client]
port   = 3306
socket = /tmp/mysql.sock

[mysqld]

# administrative options
user       = mysql
socket     = /tmp/mysql.sock
port       = 3306
datadir    = /srv/sqldata
basedir    = /opt/mysql
tmpdir     = /srv/tmp

#binlog disabled

# prevent imports/exports to arbitrary dirs
secure_file_priv               = /tmp
max_connections                = 500

query_cache_size               = 0
query_cache_type               = 0

# optimizer options
# MariaDB only
sql-mode                       = ''

# charsets and collations
#character_set_server           = binary
#character_set_filesystem       = binary
#collation_server               = binary

# InnoDB options
default-storage-engine         = InnoDB
innodb_file_per_table          = 1
innodb_buffer_pool_size        = 12G
innodb_log_file_size           = 2G
innodb_flush_log_at_trx_commit = 1

# profiling and metrics
performance_schema             = 0
skip-innodb-doublewrite
innodb_flush_neighbors         = 0
innodb_checksum_algorithm      = none

And again, the results:

5.6.34 5.7.16 8.0.0-dmr 10.0.28 10.1.18 10.2.2
avg (seconds) 146.038 167.482 172.653 147.721 152.1256667 155.683
median (seconds) 146.016 167.054 171.364 147.889 151.937 155.414
rows inserted/s 320061.395 279081.4894 270722.9298 316414.9038 307253.3848 300232.6908
insert throughput compared to 5.6 100.00% 87.20% 84.58% 98.86% 96.00% 93.80%

Graphically:
Throughput results in rows inserted per second for LOAD DATA test with the insecure configuration

The trend is similar to the previous results, with a 20-25% improvement in performance compared to the last test case. Except one version-8.0 seems to perform worse than on the previous configuration. This doesn’t make sense to me, but the results, like in the previous case, are consistent and repeatable. Is there a regression hidden here?

Regarding MariaDB, this configuration seems more stable (load-wise) for it, with less variability between runs. Still a downwards trend for newer versions.

One more result -SELECT count(*)

While checking that all the records were being inserted correctly, and no rows were lost in the process, I run into a curious case of different count + full scan speeds. When doing:

SELECT count(*) FROM nodes;

I got the following timings on console:

5.6.34 5.7.16 8.0.0-dmr 10.0.28 10.1.18 10.2.2
run time (seconds) 8.31 sec 4.81 sec 5.02 sec 1 min 47.88 sec 2 min 5.08 sec 6.61 sec

For those still obsessed on getting a nice bar chart every time:

Execution time, in seconds, of SELECT count(*) FROM nodes;

I knew there were some optimizations on SELECT count(*) that made that faster for InnoDB (not instant, like MyISAM), but I didn’t know it was enabled as early as 5.6. It seems MariaDB only merged that feature (or a similar one doing the same) starting with 10.2.

Conclusions

Providing data without an explanation is a bad practice, however, I believe this will help as a starting point to debug (by me, or any others that would want to reproduce the results- I think I have given enough information to do that) and see what code points are blocking this memory-only, 1-thread concurrency specific test. Again, remember this is a very specific test and not a proper benchmark to base decisions on. The intended goals which were: 1) identify regressions 2) identify bugs 3) get a first quick look at the options changed and 4) start working with the newest versions, all were accomplished.

As a followup, I would like to explore more in depth those potentially observed regressions on 8.0 and 10.2, and see if it is just a case of adapting the configuration or if there is an underlying cause slowing them down. If they happened to be bugs, there should be plenty of time to fix those before a stable release from both vendors. The other thing that I would like to analyze next is this very same load type, but with innodb compression, and compare both the performance and the resulting file sizes.

I hope this has been helpful for someone else other than me, too. I have probably made some mistakes along the way, please tell me so on a comment or @jynus.

Testing (again) LOAD DATA on MySQL 5.6, 5.7, 8.0 (non-GA) and MariaDB 10.0, 10.1 and 10.2 (non-GA)
Tagged on:                                             

4 thoughts on “Testing (again) LOAD DATA on MySQL 5.6, 5.7, 8.0 (non-GA) and MariaDB 10.0, 10.1 and 10.2 (non-GA)

  • 2016-12-07 at 08:15
    Permalink

    Nice article. It would be nice to attach a slave to each server and let the LOAD DATA replicate to see how the replication thread handles it so we can see if there is any regression or change in that sense.

  • 2016-12-12 at 12:53
    Permalink

    This was fun to read. Thanks for putting so much effort into writing it.

  • 2016-12-23 at 19:54
    Permalink

    Quite interesting.
    What I missed, dearly: In your “diff” output of configuration variables, there is no indication which is which. Readers might do the wrong association.

    • 2017-02-05 at 17:15
      Permalink

      Thank you, for the suggestion. Indeed it was confusing, so much that I missinterpreted the 5.7 and 8.0 results. I have clarified that and added some headers on the diff to make them clearer. When 8.0 and 10.2 are GA I may add a more in depth comparions than just a diff.

Comments are closed.