Update: I have included the results for when PCID is disabled, for comparison, as a worst-case scenario.

After learning about Meltdown and Spectre, I waited patiently for a fix from my OS vendor. However, there were several reports of performance impact due to the kernel mitigation: for example, on the PostgreSQL developers mailing list there were reports of up to 23% throughput loss; Red Hat engineers reported a regression range of 1-20%, flagging OLTP systems as the worst-affected type of workload. As the impact will be highly dependent on the hardware and workload, I decided to run some tests myself for the use cases I need.

My setup

It is similar to that of my previous tests:

Hardware (desktop-grade, no Xeon or proper RAID):

  • Intel(R) Core(TM) i7-4790K CPU @ 4.0GHz (x86_64 quad-core with hyperthreading) with PCID support (disabling PCID with the “nopcid” kernel command-line option will also be tested)
  • 32 GB of RAM
  • Single, desktop-grade, Samsung SSD 850 PRO 512GB
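As a reference, this is the kind of change used for the “nopcid” runs- a sketch assuming a Debian-style GRUB setup (the bootloader details are an assumption; only the nopcid kernel option itself is from the tests):

```shell
# /etc/default/grub - append nopcid to the kernel command line,
# then run update-grub and reboot for it to take effect
GRUB_CMDLINE_LINUX_DEFAULT="quiet nopcid"
```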

OS and configuration:

  • Debian GNU/Linux 9.3 “Stretch”, comparing kernels:
    • 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (no mitigation)
    • 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (latest kernel with security updates backported, including pti enabled, according to the security announcement)
  • datadir formatted as xfs, mounted with noatime option, all on top of LVM
  • MariaDB Server 10.1.30 compiled from source, queried locally through unix socket
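Before running anything, it is worth confirming what the system under test actually has; a quick check sketch (standard Linux paths- note that the sysfs “vulnerabilities” entry only exists on kernels carrying the reporting patches, so its absence means “unknown”, not “unpatched”):

```shell
# Does the CPU advertise PCID, and does the kernel report its
# Meltdown mitigation status?
pcid_status=$(grep -qw pcid /proc/cpuinfo 2>/dev/null && echo "PCID supported" || echo "no PCID")
echo "$pcid_status"
if [ -r /sys/devices/system/cpu/vulnerabilities/meltdown ]; then
    cat /sys/devices/system/cpu/vulnerabilities/meltdown
else
    echo "KPTI status not reported by this kernel"
fi
```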

The tests performed:

  • The single-thread write with LOAD DATA
  • A read-only sysbench with 8 and 64 threads

The results

LOAD DATA (single thread)

We have measured the LOAD DATA performance of a single OpenStreetMap table (CSV file) in several previous tests, as we had detected a regression on some MySQL versions with single-threaded write loads. I believe it could be an interesting place to start. I tested both the default configuration and another one closer to the WMF production configuration:

                                                 Load time    rows/s
Unpatched Kernel, default configuration          229.4±1s     203754
Patched Kernel, default configuration            227.8±2.5s   205185
Patched Kernel, nopcid, default configuration    227.9±1.6s   205099
Unpatched Kernel, WMF configuration              163.5±1s     285878
Patched Kernel, WMF configuration                163.3±1s     286229
Patched Kernel, nopcid, WMF configuration        165.1±1.3s   283108

No meaningful regressions are observed in this case between the patched and unpatched kernels- the variability is within the measurement error. The nopcid runs could be showing some overhead, but at around 1% it is barely above the measurement error. The nopcid option is interesting not because of the hardware support, but because of the kernel support- backporting PCID support may not be an option for older distro versions, as Moritz notes in the comments.
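The rough 1% figure for nopcid comes directly from the measured load times; for example, for the WMF configuration:

```shell
# Relative slowdown of the nopcid run vs the patched run (WMF config)
awk 'BEGIN { printf "%.1f%%\n", (165.1 - 163.3) / 163.3 * 100 }'
```

This prints 1.1%- within, or barely above, the ±1-1.3s measurement error of those runs.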

It is interesting to note, although off-topic, that while the results with the WMF “optimized” configuration have improved compared to previous years' results (most likely due to improved CPU and memory resources), the defaults have gotten worse- a reminder that defaults are not a good baseline for comparison.

This is not a surprising result: a single thread is not a realistic OLTP workload, and more time is spent on I/O waits than on the extra syscall overhead.


Let’s try a different workload- let’s use a proper benchmarking tool, create a table and perform point selects on it, with two different levels of concurrency: 8 threads and 64 threads:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test --mysql-user=test prepare
sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test --mysql-user=test --max-time=120 --oltp-read-only=on --max-requests=0 --num-threads={8, 64} run
                                      TPS        SELECTs/s      95th percentile latency (ms)
Unpatched Kernel, 8 threads           7333±30    100953±1000    1.15±0.05
Patched Kernel, 8 threads             6867±150   96140±2000     1.20±0.01
Patched Kernel, nopcid, 8 threads     6637±20    92915±200      1.27±0.05
Unpatched Kernel, 64 threads          7298±50    102176±1000    43.21±0.15
Patched Kernel, 64 threads            6768±40    94747±1000     43.66±0.15
Patched Kernel, nopcid, 64 threads    6648±10    93073±100      43.96±0.10

In this case we can observe around a 4-7% regression in throughput when PCID is enabled. With PCID disabled, the regression grows to 9-10%- bad, but not as bad as the “up to 20%” some had warned about. If you are in my situation, an upgrade to stretch would be worthwhile just to get the PCID support.
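For reference, the regression percentages quoted above can be recomputed from the TPS column of the table:

```shell
# Throughput loss relative to the unpatched kernel, from the TPS column
awk 'BEGIN {
    printf "8 threads,  pti:          %.1f%%\n", (7333-6867)/7333*100
    printf "8 threads,  pti, nopcid:  %.1f%%\n", (7333-6637)/7333*100
    printf "64 threads, pti:          %.1f%%\n", (7298-6768)/7298*100
    printf "64 threads, pti, nopcid:  %.1f%%\n", (7298-6648)/7298*100
}'
```

This gives 6.4% and 7.3% with PCID, and 9.5% and 8.9% without it; the SELECTs/s column gives slightly lower figures (around 4.8% for 8 threads), hence the 4-7% and 9-10% ranges.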

Further testing would be required to check at what level of concurrency, or with what kind of workload, the extra context-switch cost hurts more or less. It would be interesting to measure it with production traffic, too, as some of the above regressions could be nullified once network latencies are added to the mix. Future patches may also change how the mitigation works, and having PCID support will probably keep helping transparently on all modern hardware.

Have you detected a larger regression? Are you going to patch all your databases right away? Tell me at @jynus.

Finding out the MySQL performance regression due to kernel mitigation for Meltdown CPU vulnerability

7 thoughts on “Finding out the MySQL performance regression due to kernel mitigation for Meltdown CPU vulnerability”

  • 2018-01-05 at 14:41

    Thanks for running these tests! The patch set integrated in Debian stretch also backports support for PCID, so you are already making use of the PCID speedup, BTW. Older backports will usually not have PCID support (which was only introduced in Linux 4.14), e.g. the current patches for Linux 4.4.x do not support it.

    • 2018-01-05 at 22:42

      I will try to get some time and update the numbers with that suggestion. It would be very interesting because not all kernel versions (and with that, older distros like debian oldstable-jessie) will support PCID, as Moritz comments above.

    • 2018-01-06 at 19:02

      Alexey, thanks for the suggestion, I have added those numbers, too.

  • 2018-01-05 at 22:09

    Thank you for a nice perf report.

    I assume the worst case (up to X%) occurs when you maximize context switches per query and one way to do that is to maximize mutex contention. sysbench update-index or update-nonindex with a 1-row table and concurrency might show more regression than what you get above.

    My fork of sysbench has that (update one row) and a few read-only tests that also get contention with InnoDB. Some results are at but I have been too lazy to push my sysbench changes to Alexey.

    • 2018-01-05 at 22:48

      I apologize in advance (my university professors would be ashamed of me :-/) for not providing extra metrics to justify the numbers obtained- statistics of function calls and system calls, and a X-time-axis evolution of some metrics, but I may not have all the time I would like to prepare the graphs 🙁

      I know InnoDB does active waiting (I think InnoDB calls them “spins”) instead of full-blown mutexes in certain cases; do you think that could alter the results somehow, OR could we tune InnoDB with existing configuration options to optimize for fewer context switches? Like I said to Alexey, I will try to get the time for additional tests, but cannot promise anything! 🙂

  • Pingback: This Week in Data with Colin Charles 23: CPU security continues to draw attention - Prosyscom Hosting

Comments are closed.