Message ID: 20210818063107.2696454-1-yuzhao@google.com (mailing list archive)
Series: Multigenerational LRU Framework
Kernel / MariaDB benchmark with MGLRU

TLDR
====
With the MGLRU, MariaDB achieved 95% CIs [5.24, 10.71]% and [20.22,
25.97]% more transactions per minute (TPM), respectively, under the
medium- and high-concurrency conditions when slightly overcommitting
memory. There were no statistically significant changes in TPM under
other conditions.

Rationale
=========
Memory overcommit can improve utilization and, if not overdone, can
also increase throughput. The challenges are estimating working sets
and optimizing page reclaim. The risks are performance degradations
and OOM kills. Short of overcoming the challenges, the only way to
reduce the risks is to overprovision memory.

MariaDB is one of the most popular open-source RDBMSs. HammerDB is
the leading open-source benchmarking software derived from the TPC
specifications. OLTP is the most important use case for RDBMSs.

Matrix
======
Kernels: version [+ patchset]
* Baseline: 5.14
* Patched: 5.14 + MGLRU

Memory conditions: % of memory size
* Underutilizing: ~10% on inactive file list
* Overcommitting: ~10% swapped out

Concurrency conditions: average # of users per CPU
* Low: ~3
* Medium: ~13
* High: ~19

Total configurations: 12
Data points per configuration: 10
Total run duration (minutes) per data point: ~45

Procedure
=========
The latest MGLRU patchset for the 5.14 kernel is available at

  git fetch https://linux-mm.googlesource.com/page-reclaim \
      refs/changes/30/1430/1

Baseline and patched 5.14 kernel images are available at

  https://drive.google.com/drive/folders/1eMkQleAFGkP2vzM_JyRA21oKE0ESHBqp

<install and configure OS>
hammerdbcli auto prep_tpcc.tcl
systemctl stop mariadb
e2image <backup /mnt/data>

<for each kernel>
    grub2-set-default <baseline / patched>
    <for each memory condition>
        <update /etc/my.cnf>
        <for each concurrency condition>
            <update run_tpcc.tcl>
            <for each data point>
                systemctl stop mariadb
                e2image <restore /mnt/data>
                reboot
                hammerdbcli auto run_tpcc.tcl
                <collect TPM>

Hardware
========
Memory (GB): 64
CPU (total #): 32
NVMe SSD (GB): 1024

OS
==
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 (Ootpa)

$ cat /proc/swaps
Filename        Type       Size      Used  Priority
/dev/nvme0n1p3  partition  32970748  0     -2

$ mount | grep data
/dev/nvme0n1p4 on /mnt/data type ext4 (rw,relatime,seclabel)

$ cat /proc/cmdline
<existing parameters> systemd.unified_cgroup_hierarchy=1

$ cat /sys/fs/cgroup/user.slice/memory.min
4294967296

$ cat /proc/sys/vm/overcommit_memory
1

MariaDB
=======
$ mysql --version
mysql  Ver 15.1 Distrib 10.3.28-MariaDB, for Linux (x86_64) using readline 5.1

$ cat /etc/my.cnf
<existing parameters>
[mysqld]
innodb_buffer_pool_size=<50G, 60G>
innodb_doublewrite=0
innodb_flush_log_at_trx_commit=0
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_flush_neighbors=0
innodb_io_capacity=4000
innodb_io_capacity_max=20000
innodb_log_buffer_size=1G
innodb_log_file_size=20G
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
max_connections=1000
datadir=/mnt/data

HammerDB
========
$ hammerdbcli -h
HammerDB CLI v4.2
Copyright (C) 2003-2021 Steve Shaw
Type "help" for a list of commands
Usage: hammerdbcli [ auto [ script_to_autoload.tcl ] ]

$ cat prep_tpcc.tcl
dbset db maria
diset connection maria_socket /var/lib/mysql/mysql.sock
diset tpcc maria_count_ware 1200
diset tpcc maria_num_vu 32
diset tpcc maria_partition true
buildschema
waittocomplete
quit

$ cat run_tpcc.tcl
dbset db maria
diset connection maria_socket /var/lib/mysql/mysql.sock
diset tpcc maria_total_iterations 20000000
diset tpcc maria_driver timed
diset tpcc maria_rampup 10
diset tpcc maria_duration 30
diset tpcc maria_allwarehouse true
vuset logtotemp 1
vuset unique 1
loadscript
vuset vu <100, 400, 600>
vucreate
vurun
runtimer 3000
vudestroy

Results
=======
Comparing the patched with the baseline kernel, MariaDB achieved 95%
CIs [5.24, 10.71]% and [20.22, 25.97]% more TPM, respectively, under
the medium- and high-concurrency conditions when slightly
overcommitting memory.
There were no statistically significant changes in TPM under other
conditions.

+--------------------+-----------------------+-----------------------+
| Mean TPM [95% CI]  | Underutilizing memory | Overcommitting memory |
+--------------------+-----------------------+-----------------------+
| Low concurrency    | 270811.6 / 271522.7   | 447933.4 / 447283.3   |
|                    | [-40.97, 1463.17]     | [-1330.61, 30.41]     |
+--------------------+-----------------------+-----------------------+
| Medium concurrency | 240212.9 / 242846.7   | 327276.6 / 353372.7   |
|                    | [-2611.38, 7878.98]   | [17149.01, 35043.19]  |
+--------------------+-----------------------+-----------------------+
| High concurrency   | 283897.8 / 283668.1   | 274069.7 / 337366.8   |
|                    | [-11538.08, 11078.68] | [55417.42, 71176.78]  |
+--------------------+-----------------------+-----------------------+
Table 1. Comparison between the baseline and patched kernels

Comparing overcommitting with underutilizing memory, MariaDB achieved
95% CIs [65.12, 65.68]% and [32.45, 40.04]% more TPM, respectively,
under the low- and medium-concurrency conditions when using the
baseline kernel; 95% CIs [64.48, 64.98]%, [43.53, 47.50]% and [16.48,
21.38]% more TPM, respectively, under the low-, medium- and
high-concurrency conditions when using the patched kernel. There were
no statistically significant changes in TPM under other conditions.
+--------------------+------------------------+----------------------+
| Mean TPM [95% CI]  | Baseline kernel        | Patched kernel       |
+--------------------+------------------------+----------------------+
| Low concurrency    | 270811.6 / 447933.4    | 271522.7 / 447283.3  |
|                    | [176362.0, 177881.6]   | [175089.3, 176431.9] |
+--------------------+------------------------+----------------------+
| Medium concurrency | 240212.9 / 327276.6    | 242846.7 / 353372.7  |
|                    | [77946.4, 96181.0]     | [105707.7, 115344.3] |
+--------------------+------------------------+----------------------+
| High concurrency   | 283897.8 / 274069.7    | 283668.1 / 337366.8  |
|                    | [-21605.703, 1949.503] | [46758.85, 60638.55] |
+--------------------+------------------------+----------------------+
Table 2. Comparison between underutilizing and overcommitting memory

Metrics collected during each run are available at

  https://github.com/ediworks/KernelPerf/tree/master/mglru/mariadb/5.14

References
==========
HammerDB v4.2 New Features:
https://www.hammerdb.com/blog/uncategorized/hammerdb-v4-2-new-features-pt1-mariadb-build-and-test-example-with-the-cli/

Appendix
========
$ cat raw_data.r
v <- c(
    # baseline 50g 100vu
    269531,270113,270256,270367,270393,270630,270707,271373,272291,272455,
    # baseline 50g 400vu
    231856,234985,235144,235552,238551,239994,244413,245255,247997,248382,
    # baseline 50g 600vu
    256365,271733,275966,280623,281014,283764,293327,296750,298728,300708,
    # baseline 60g 100vu
    446973,447383,447412,447489,447874,448046,448123,448531,448739,448764,
    # baseline 60g 400vu
    312427,312936,313780,321503,329554,330551,332377,333584,337105,348949,
    # baseline 60g 600vu
    262338,262971,266242,266489,268036,272494,279045,281472,289942,291668,
    # patched 50g 100vu
    270621,270913,271026,271137,271517,271616,271699,272117,272218,272363,
    # patched 50g 400vu
    233314,238265,238722,240540,241676,245204,245688,247440,248417,249201,
    # patched 50g 600vu
    271114,271928,277562,279455,282074,285515,287836,288508,289451,303238,
    # patched 60g 100vu
    445923,446178,446837,446889,447331,447480,447823,447999,448145,448228,
    # patched 60g 400vu
    345705,349373,350832,351229,351758,352520,355130,355247,357762,364171,
    # patched 60g 600vu
    330860,334705,336001,337291,338326,338361,338970,339163,339784,340207
)
a <- array(v, dim = c(10, 3, 2, 2))

mems <- c("50g", "60g")
vus <- c("100vu", "400vu", "600vu")
kers <- c("baseline", "patched")

# baseline vs patched
for (m in 1:2) {
    for (c in 1:3) {
        r <- t.test(a[, c, m, 1], a[, c, m, 2])
        print(r)
        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("%s %s: no significance", mems[m], vus[c])
        } else {
            s <- sprintf("%s %s: [%.2f, %.2f]%%", mems[m], vus[c], -p[2], -p[1])
        }
        print(s)
    }
}

# 50g vs 60g
for (k in 1:2) {
    for (c in 1:3) {
        r <- t.test(a[, c, 1, k], a[, c, 2, k])
        print(r)
        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("%s %s: no significance", kers[k], vus[c])
        } else {
            s <- sprintf("%s %s: [%.2f, %.2f]%%", kers[k], vus[c], -p[2], -p[1])
        }
        print(s)
    }
}

$ R -q -s -f raw_data.r

        Welch Two Sample t-test

data:  a[, c, m, 1] and a[, c, m, 2]
t = -2.0139, df = 15.122, p-value = 0.06217
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1463.17673    40.97673
sample estimates:
mean of x mean of y
 270811.6  271522.7

[1] "50g 100vu: no significance"

        Welch Two Sample t-test

data:  a[, c, m, 1] and a[, c, m, 2]
t = -1.0564, df = 17.673, p-value = 0.305
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7878.98  2611.38
sample estimates:
mean of x mean of y
 240212.9  242846.7

[1] "50g 400vu: no significance"

        Welch Two Sample t-test

data:  a[, c, m, 1] and a[, c, m, 2]
t = 0.043083, df = 15.895, p-value = 0.9662
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11078.68  11538.08
sample estimates:
mean of x mean of y
 283897.8  283668.1

[1] "50g 600vu: no significance"

        Welch Two Sample t-test

data:  a[, c, m, 1] and a[, c, m, 2]
t = 2.0171, df = 16.831, p-value = 0.05993
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -30.41577  1330.61577
sample estimates:
mean of x mean of y
 447933.4  447283.3

[1] "60g 100vu: no significance"

        Welch Two Sample t-test

data:  a[, c, m, 1] and a[, c, m, 2]
t = -6.3473, df = 12.132, p-value = 3.499e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -35043.19  -17149.01
sample estimates:
mean of x mean of y
 327276.6  353372.7

[1] "60g 400vu: [5.24, 10.71]%"

        Welch Two Sample t-test

data:  a[, c, m, 1] and a[, c, m, 2]
t = -17.844, df = 10.233, p-value = 4.822e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -71176.78  -55417.42
sample estimates:
mean of x mean of y
 274069.7  337366.8

[1] "60g 600vu: [20.22, 25.97]%"

        Welch Two Sample t-test

data:  a[, c, 1, k] and a[, c, 2, k]
t = -495.48, df = 15.503, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -177881.6  -176362.0
sample estimates:
mean of x mean of y
 270811.6  447933.4

[1] "baseline 100vu: [65.12, 65.68]%"

        Welch Two Sample t-test

data:  a[, c, 1, k] and a[, c, 2, k]
t = -20.601, df = 13.182, p-value = 2.062e-11
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -96181.0  -77946.4
sample estimates:
mean of x mean of y
 240212.9  327276.6

[1] "baseline 400vu: [32.45, 40.04]%"

        Welch Two Sample t-test

data:  a[, c, 1, k] and a[, c, 2, k]
t = 1.7607, df = 16.986, p-value = 0.09628
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1949.503  21605.703
sample estimates:
mean of x mean of y
 283897.8  274069.7

[1] "baseline 600vu: no significance"

        Welch Two Sample t-test

data:  a[, c, 1, k] and a[, c, 2, k]
t = -553.68, df = 16.491, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -176431.9  -175089.3
sample estimates:
mean of x mean of y
 271522.7  447283.3

[1] "patched 100vu: [64.48, 64.98]%"

        Welch Two Sample t-test

data:  a[, c, 1, k] and a[, c, 2, k]
t = -48.194, df = 17.992, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -115344.3  -105707.7
sample estimates:
mean of x mean of y
 242846.7  353372.7

[1] "patched 400vu: [43.53, 47.50]%"

        Welch Two Sample t-test

data:  a[, c, 1, k] and a[, c, 2, k]
t = -17.109, df = 10.6, p-value = 4.629e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -60638.55  -46758.85
sample estimates:
mean of x mean of y
 283668.1  337366.8

[1] "patched 600vu: [16.48, 21.38]%"
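The R script above expresses each Welch interval as a percentage of the baseline mean, then negates and swaps the bounds so that positive numbers read as gains for the second sample. A minimal Python sketch of that conversion (the function name is mine; the interval and mean values are copied from the 400vu t-test outputs above):

```python
def ci_as_percent_gain(ci, baseline_mean):
    # Express a 95% CI for (baseline - treatment) as a percentage of
    # the baseline mean, negating and swapping the bounds so that
    # positive numbers read as "treatment is [lo, hi]% more".
    lo = -ci[1] / baseline_mean * 100
    hi = -ci[0] / baseline_mean * 100
    if lo < 0 < hi:
        return None  # interval straddles zero: no significance
    return (round(lo, 2), round(hi, 2))

# 60g 400vu, from the t-test output above:
print(ci_as_percent_gain((-35043.19, -17149.01), 327276.6))  # (5.24, 10.71)
# 50g 400vu straddles zero:
print(ci_as_percent_gain((-7878.98, 2611.38), 240212.9))     # None
```

This reproduces the "[5.24, 10.71]% more TPM" figure quoted in the TLDR.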
Kernel / Memcached benchmark with MGLRU

TLDR
====
With the MGLRU, Memcached achieved 95% CIs [23.54, 32.25]%, [20.76,
41.61]%, [13.85, 15.97]%, [21.59, 30.02]% and [23.94, 29.92]% more
operations per second (OPS), respectively, for sequential access w/
THP=always, random access w/ THP=always, random access w/ THP=never,
Gaussian access w/ THP=always and Gaussian access w/ THP=never. There
were no statistically significant changes in OPS for sequential
access w/ THP=never.

Background
==========
Memory overcommit can increase utilization and, if carried out
properly, can also increase throughput. The challenges are to improve
working set estimation and to optimize page reclaim. The risks are
performance degradations and OOM kills. Short of overcoming the
challenges, the only way to reduce the risks is to underutilize
memory.

Memcached is one of the most popular open-source in-memory KV stores.
memtier_benchmark is the leading open-source KV store benchmarking
software that supports multiple access patterns. THP can have a
negative effect under memory pressure, due to internal and/or
external fragmentation.

Matrix
======
Kernels: version [+ patchset]
* Baseline: 5.14
* Patched: 5.14 + MGLRU

Memory conditions: % of memory size
* Underutilizing: N/A
* Overcommitting: ~10% swapped out (zram)

THP (2MB Transparent Huge Pages):
* Always
* Never

Read patterns (2kB objects):
* Parallel sequential
* Uniform random
* Gaussian (SD = 1/6 of key range)

Total configurations: 12
Data points per configuration: 10
Total run duration (minutes) per data point: ~20

Note that the goal of this benchmark is to compare performance for
the same key range, object size, and hit ratio. Since Memcached does
not support backing storage, underutilizing memory would require
fewer in-memory objects, which would lower the hit ratio; that
condition is therefore not applicable here.
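The three read patterns correspond to memtier_benchmark's P:P, R:R and G:G key patterns over a fixed key range. The sketch below only illustrates those distributions; it is not memtier's implementation. The function names, the clamping of Gaussian tails, and the use of the 1..30000000 key range (taken from the run script later in this report) are my assumptions:

```python
import random

# Key range used by run_memtier.sh later in this report.
KEY_MIN, KEY_MAX = 1, 30_000_000

def sequential(n, start=KEY_MIN):
    # P:P -- walk the key space in order, wrapping at the end.
    span = KEY_MAX - KEY_MIN + 1
    return [(start - KEY_MIN + i) % span + KEY_MIN for i in range(n)]

def uniform(n, rng):
    # R:R -- every key equally likely.
    return [rng.randint(KEY_MIN, KEY_MAX) for _ in range(n)]

def gaussian(n, rng):
    # G:G -- mean at mid-range, SD = 1/6 of the key range, so ~99.7%
    # of draws land inside the range; tails are clamped here.
    mid = (KEY_MIN + KEY_MAX) / 2
    sd = (KEY_MAX - KEY_MIN) / 6
    return [min(max(int(rng.gauss(mid, sd)), KEY_MIN), KEY_MAX)
            for _ in range(n)]
```

Under the Gaussian pattern roughly 68% of reads hit the middle third of the key space, giving page reclaim a clear hot/cold split; the uniform pattern offers no such locality.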
Procedure
=========
The latest MGLRU patchset for the 5.14 kernel is available at

  git fetch https://linux-mm.googlesource.com/page-reclaim \
      refs/changes/30/1430/1

Baseline and patched 5.14 kernel images are available at

  https://drive.google.com/drive/folders/1eMkQleAFGkP2vzM_JyRA21oKE0ESHBqp

<install and configure OS>

<for each kernel>
    grub2-set-default <baseline, patched>
    <for each THP setting>
        echo <always, never> > \
            /sys/kernel/mm/transparent_hugepage/enabled
        <update /etc/sysconfig/memcached>
        <for each access pattern>
            <update run_memtier.sh>
            <for each data point>
                reboot
                run_memtier.sh
                <collect OPS>

Hardware
========
Memory (GB): 64
CPU (total #): 32
NVMe SSD (GB): 1024

OS
==
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 (Ootpa)

$ cat /proc/swaps
Filename    Type       Size     Used  Priority
/dev/zram0  partition  8388604  0     -2

$ cat /proc/cmdline
<existing parameters> systemd.unified_cgroup_hierarchy=1

$ cat /sys/fs/cgroup/user.slice/memory.min
4294967296

$ cat /proc/sys/vm/overcommit_memory
1

Memcached
=========
$ memcached -V
memcached 1.5.22

$ cat /etc/sysconfig/memcached
USER="memcached"
MAXCONN="10000"
CACHESIZE="65536"
OPTIONS="-s /tmp/memcached.sock -a 0766 -t 16 -b 10000 -B binary <-L>"

memtier_benchmark
=================
$ memtier_benchmark -v
memtier_benchmark 1.3.0
Copyright (C) 2011-2020 Redis Labs Ltd.
This is free software.  You may redistribute copies of it under the
terms of the GNU General Public License
<http://www.gnu.org/licenses/gpl.html>.  There is NO WARRANTY, to the
extent permitted by law.
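Back-of-the-envelope arithmetic (my own, not from the report) shows why this configuration overcommits: CACHESIZE=65536 (MB) allows the cache to grow to all of RAM, and the 30M 2kB objects loaded by the run script nearly fill the RAM left after the 4GB memory.min reservation, before counting per-item overhead:

```python
# Payload bytes alone, ignoring Memcached's per-item overhead,
# already consume ~93% of the unreserved RAM; with overhead and the
# rest of the system, the footprint spills ~10% onto zram swap.
ram = 64 * 2**30
reserved = 4 * 2**30          # memory.min for user.slice
values = 30_000_000 * 2000    # objects loaded by run_memtier.sh
print(round(values / (ram - reserved), 2))  # 0.93
```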
$ cat run_memtier.sh
# load objects
memtier_benchmark -S /tmp/memcached.sock -P memcache_binary \
    -n allkeys -c 1 -t 16 --ratio 1:0 --pipeline 1 -d 2000 \
    --key-minimum=1 --key-maximum=30000000 --key-pattern=P:P

# run benchmark
memtier_benchmark -S /tmp/memcached.sock -P memcache_binary \
    -n 30000000 -c 1 -t 16 --ratio 0:1 --pipeline 1 --randomize \
    --distinct-client-seed --key-minimum=1 --key-maximum=30000000 \
    --key-pattern=<P:P, R:R, G:G>

Results
=======
Comparing the patched with the baseline kernel, Memcached achieved
95% CIs [23.54, 32.25]%, [20.76, 41.61]%, [13.85, 15.97]%, [21.59,
30.02]% and [23.94, 29.92]% more OPS, respectively, for sequential
access w/ THP=always, random access w/ THP=always, random access w/
THP=never, Gaussian access w/ THP=always and Gaussian access w/
THP=never. There were no statistically significant changes in OPS
for sequential access w/ THP=never.

+-------------------+-----------------------+------------------------+
| Mean OPS [95% CI] | THP=always            | THP=never              |
+-------------------+-----------------------+------------------------+
| Sequential access | 519599.7 / 664543.2   | 525394.8 / 527170.6    |
|                   | [122297.9, 167589.0]  | [-15138.63, 18690.31]  |
+-------------------+-----------------------+------------------------+
| Random access     | 450033.2 / 590360.7   | 509237.3 / 585142.4    |
|                   | [93415.59, 187239.37] | [70504.51, 81305.60]   |
+-------------------+-----------------------+------------------------+
| Gaussian access   | 481182.4 / 605358.7   | 531270.8 / 674341.4    |
|                   | [103892.6, 144460.0]  | [127199.8, 158941.2]   |
+-------------------+-----------------------+------------------------+
Table 1. Comparison between the baseline and patched kernels

Comparing THP=never with THP=always, Memcached achieved 95% CIs
[2.73, 23.58]% and [5.45, 15.37]% more OPS, respectively, for random
access and Gaussian access when using the baseline kernel; 95% CIs
[-22.65, -18.69]% and [10.67, 12.12]% more OPS, respectively, for
sequential access and Gaussian access when using the patched kernel.
There were no statistically significant changes in OPS under other
conditions.

+-------------------+-----------------------+------------------------+
| Mean OPS [95% CI] | Baseline kernel       | Patched kernel         |
+-------------------+-----------------------+------------------------+
| Sequential access | 519599.7 / 525394.8   | 664543.2 / 527170.6    |
|                   | [-18739.71, 30329.80] | [-150551.0, -124194.1] |
+-------------------+-----------------------+------------------------+
| Random access     | 450033.2 / 509237.3   | 590360.7 / 585142.4    |
|                   | [12303.49, 106104.69] | [-10816.1516, 379.475] |
+-------------------+-----------------------+------------------------+
| Gaussian access   | 481182.4 / 531270.8   | 605358.7 / 674341.4    |
|                   | [26229.02, 73947.84]  | [64570.58, 73394.70]   |
+-------------------+-----------------------+------------------------+
Table 2.
Comparison between THP=always and THP=never

Metrics collected during each run are available at

  https://github.com/ediworks/KernelPerf/tree/master/mglru/memcached/5.14

References
==========
memtier_benchmark: A High-Throughput Benchmarking Tool for Redis &
Memcached:
https://redis.com/blog/memtier_benchmark-a-high-throughput-benchmarking-tool-for-redis-memcached/

Appendix
========
$ cat raw_data.r
v <- c(
    # baseline THP=always sequential
    460266.29, 466497.70, 516145.38, 523474.39, 528507.72,
    529481.86, 533867.92, 537028.56, 546027.45, 554699.89,
    # baseline THP=always random
    371470.66, 378967.63, 381137.01, 385205.60, 449100.72,
    474670.76, 490470.46, 513341.53, 525159.49, 530808.55,
    # baseline THP=always Gaussian
    455674.14, 457089.50, 460001.46, 463269.94, 468283.00,
    474169.61, 477684.67, 506331.96, 507875.30, 541444.54,
    # baseline THP=never sequential
    501887.04, 507303.10, 509573.54, 515222.79, 517429.04,
    530805.74, 536490.44, 538088.45, 540459.92, 556687.57,
    # baseline THP=never random
    496489.97, 506444.42, 508002.80, 508707.39, 509746.28,
    511157.58, 511897.57, 511926.06, 512652.28, 515348.95,
    # baseline THP=never Gaussian
    493199.15, 504207.48, 518781.40, 520536.21, 528619.45,
    540677.91, 544365.57, 551698.32, 554046.80, 556576.14,
    # patched THP=always sequential
    660711.43, 660936.88, 661275.57, 662540.65, 663417.25,
    665546.99, 665680.49, 667564.03, 668555.96, 669202.36,
    # patched THP=always random
    582574.69, 583714.04, 587102.54, 587375.85, 588997.85,
    589052.96, 593922.17, 594722.98, 596178.28, 599965.83,
    # patched THP=always Gaussian
    601707.98, 602055.03, 603020.28, 603335.93, 604519.55,
    605086.48, 607405.59, 607570.79, 609009.54, 609875.98,
    # patched THP=never sequential
    507753.56, 509462.65, 509964.30, 510369.66, 515001.36,
    531685.00, 543709.22, 545142.98, 548392.56, 550224.74,
    # patched THP=never random
    571017.21, 579705.57, 582801.51, 584475.82, 586247.73,
    587209.97, 587354.87, 588661.14, 591237.23, 592712.76,
    # patched THP=never Gaussian
    666403.77, 669691.68, 670248.43, 672190.97, 672466.43,
    674320.42, 674897.72, 677282.76, 678886.51, 687024.85
)
a <- array(v, dim = c(10, 3, 2, 2))

# baseline vs patched
for (thp in 1:2) {
    for (pattern in 1:3) {
        r <- t.test(a[, pattern, thp, 1], a[, pattern, thp, 2])
        print(r)
        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("thp%d pattern%d: no significance", thp, pattern)
        } else {
            s <- sprintf("thp%d pattern%d: [%.2f, %.2f]%%",
                         thp, pattern, -p[2], -p[1])
        }
        print(s)
    }
}

# THP=always vs THP=never
for (kernel in 1:2) {
    for (pattern in 1:3) {
        r <- t.test(a[, pattern, 1, kernel], a[, pattern, 2, kernel])
        print(r)
        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("kernel%d pattern%d: no significance",
                         kernel, pattern)
        } else {
            s <- sprintf("kernel%d pattern%d: [%.2f, %.2f]%%",
                         kernel, pattern, -p[2], -p[1])
        }
        print(s)
    }
}

$ R -q -s -f raw_data.r

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -14.434, df = 9.1861, p-value = 1.269e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -167589.0  -122297.9
sample estimates:
mean of x mean of y
 519599.7  664543.2

[1] "thp1 pattern1: [23.54, 32.25]%"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -6.7518, df = 9.1333, p-value = 7.785e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -187239.37  -93415.59
sample estimates:
mean of x mean of y
 450033.2  590360.7

[1] "thp1 pattern2: [20.76, 41.61]%"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -13.805, df = 9.1933, p-value = 1.866e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -144460.0  -103892.6
sample estimates:
mean of x mean of y
 481182.4  605358.7

[1] "thp1 pattern3: [21.59, 30.02]%"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -0.22059, df = 17.979, p-value = 0.8279
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18690.31  15138.63
sample estimates:
mean of x mean of y
 525394.8  527170.6

[1] "thp2 pattern1: no significance"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -29.606, df = 17.368, p-value = 2.611e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -81305.60  -70504.51
sample estimates:
mean of x mean of y
 509237.3  585142.4

[1] "thp2 pattern2: [13.85, 15.97]%"

        Welch Two Sample t-test

data:  a[, pattern, thp, 1] and a[, pattern, thp, 2]
t = -20.02, df = 10.251, p-value = 1.492e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -158941.2  -127199.8
sample estimates:
mean of x mean of y
 531270.8  674341.4

[1] "thp2 pattern3: [23.94, 29.92]%"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = -0.50612, df = 14.14, p-value = 0.6206
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -30329.80  18739.71
sample estimates:
mean of x mean of y
 519599.7  525394.8

[1] "kernel1 pattern1: no significance"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = -2.8503, df = 9.1116, p-value = 0.01885
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -106104.69  -12303.49
sample estimates:
mean of x mean of y
 450033.2  509237.3

[1] "kernel1 pattern2: [2.73, 23.58]%"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = -4.4308, df = 16.918, p-value = 0.0003701
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -73947.84  -26229.02
sample estimates:
mean of x mean of y
 481182.4  531270.8

[1] "kernel1 pattern3: [5.45, 15.37]%"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = 23.374, df = 9.5538, p-value = 9.402e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 124194.1  150551.0
sample estimates:
mean of x mean of y
 664543.2  527170.6

[1] "kernel2 pattern1: [-22.65, -18.69]%"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = 1.96, df = 17.806, p-value = 0.06583
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -379.4756  10816.1516
sample estimates:
mean of x mean of y
 590360.7  585142.4

[1] "kernel2 pattern2: no significance"

        Welch Two Sample t-test

data:  a[, pattern, 1, kernel] and a[, pattern, 2, kernel]
t = -33.687, df = 13.354, p-value = 2.614e-14
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -73394.70  -64570.58
sample estimates:
mean of x mean of y
 605358.7  674341.4

[1] "kernel2 pattern3: [10.67, 12.12]%"
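Both appendices above load their 120 raw data points with `array(v, dim = c(10, 3, 2, 2))`. R fills arrays column-major, i.e. the first index varies fastest, so each consecutive run of 10 values in `v` is one configuration's data points. A small Python sketch of the equivalent zero-based index arithmetic, using the Memcached dimensions (data point, pattern, THP, kernel); the helper name is mine:

```python
def flat_index(point, pattern, thp, kernel, dims=(10, 3, 2, 2)):
    # Column-major (R/Fortran) order: earlier indices vary fastest.
    d0, d1, d2, _ = dims
    return point + d0 * (pattern + d1 * (thp + d2 * kernel))

# First 10 values of v are "baseline THP=always sequential":
print(flat_index(0, 0, 0, 0))  # 0
# "baseline THP=never random" (a[, 2, 2, 1] in R) starts at:
print(flat_index(0, 1, 1, 0))  # 40
```

This matches the comment order of the vectors in raw_data.r: pattern cycles fastest, then THP, then kernel.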
Kernel / Apache Spark benchmark with MGLRU

TLDR
====
With the MGLRU, Apache Spark took 95% CIs [9.28, 11.19]% and [12.20,
14.93]% less wall time to sort 3 billion random integers,
respectively, under the medium- and high-concurrency conditions when
slightly overcommitting memory. There were no statistically
significant changes in wall time when sorting the same dataset under
other conditions.

Background
==========
Memory overcommit can increase utilization and, if carried out
properly, can also increase throughput. The challenges are to improve
working set estimation and to optimize page reclaim. The risks are
performance degradations and OOM kills. Short of overcoming the
challenges, the only way to reduce the risks is to underutilize
memory.

Apache Spark is one of the most popular open-source big-data
frameworks. Dataset sorting is the most widely used benchmark for
such frameworks.

Matrix
======
Kernels: version [+ patchset]
* Baseline: 5.14
* Patched: 5.14 + MGLRU

Memory conditions: % of memory size
* Underutilizing: ~10% on inactive file list
* Overcommitting: ~10% swapped out

Concurrency conditions: average # of workers per CPU
* Low: 1
* Medium: 2
* High: 3

Cluster mode: local
Dataset size: 3 billion random integers (57GB text)
Total configurations: 12
Data points per configuration: 10
Total run duration (minutes) per data point: ~20

Procedure
=========
The latest MGLRU patchset for the 5.14 kernel is available at

  git fetch https://linux-mm.googlesource.com/page-reclaim \
      refs/changes/30/1430/1

Baseline and patched 5.14 kernel images are available at

  https://drive.google.com/drive/folders/1eMkQleAFGkP2vzM_JyRA21oKE0ESHBqp

<install and configure OS>
spark-shell < gen.scala

<for each kernel>
    grub2-set-default <baseline, patched>
    <for each memory condition>
        <update run_spark.sh>
        <for each concurrency condition>
            <update run_spark.sh>
            <for each data point>
                reboot
                run_spark.sh
                <collect wall time>

Hardware
========
Memory (GB): 64
CPU (total #): 32
NVMe SSD (GB): 1024

OS
==
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 (Ootpa)

$ cat /proc/swaps
Filename        Type       Size      Used  Priority
/dev/nvme0n1p3  partition  32970748  0     -2

$ cat /proc/cmdline
<existing parameters> systemd.unified_cgroup_hierarchy=1

$ cat /sys/fs/cgroup/user.slice/memory.min
4294967296

$ cat /proc/sys/vm/overcommit_memory
1

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

Apache Spark
============
$ spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.12
Branch HEAD
Compiled by user centos on 2021-05-24T04:27:48Z
Revision de351e30a90dd988b133b3d00fa6218bfcaba8b8
Url https://github.com/apache/spark
Type --help for more information.

$ cat gen.scala
import java.io._
import scala.collection.mutable.ArrayBuffer

object GenData {
    def main(args: Array[String]): Unit = {
        val file = new File("dataset.txt")
        val writer = new BufferedWriter(new FileWriter(file))
        val buf = ArrayBuffer(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
        for (_ <- 0 until 300000000) {
            for (i <- 0 until 10) {
                buf.update(i, scala.util.Random.nextLong())
            }
            writer.write(s"${buf.mkString(",")}\n")
        }
        writer.close()
    }
}
GenData.main(Array())

$ cat sort.scala
import java.time.temporal.ChronoUnit
import org.apache.spark.sql.SparkSession

object SparkSort {
    def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().getOrCreate()
        val file = spark.sparkContext.textFile("dataset.txt", 32)
        val start = java.time.Instant.now()
        val results = file.flatMap(_.split(","))
            .map(x => (x, 1)).sortByKey().takeOrdered(10)
        val finish = java.time.Instant.now()
        println(s"wall time: ${ChronoUnit.SECONDS.between(start, finish)}")
        results.foreach(println)
        spark.stop()
    }
}
SparkSort.main(Array())

$ cat run_spark.sh
spark-shell --master local\[<32, 64, 96>\] \
    --driver-memory <52G, 62G> < sort.scala

Results
=======
Comparing the patched with the baseline kernel, Apache Spark took 95%
CIs [9.28, 11.19]% and [12.20, 14.93]% less wall time to sort the
dataset, respectively, under the medium- and high-concurrency
conditions when slightly overcommitting memory. There were no
statistically significant changes in wall time under other
conditions.

+--------------------+-----------------------+-----------------------+
| Mean wall time (s) | Underutilizing memory | Overcommitting memory |
| [95% CI]           |                       |                       |
+--------------------+-----------------------+-----------------------+
| Low concurrency    | 1037.1 / 1037.0       | 1038.2 / 1036.6       |
|                    | [-1.41, 1.21]         | [-3.67, 0.47]         |
+--------------------+-----------------------+-----------------------+
| Medium concurrency | 1141.8 / 1142.6       | 1297.9 / 1165.1       |
|                    | [-1.35, 2.95]         | [-145.21, -120.38]    |
+--------------------+-----------------------+-----------------------+
| High concurrency   | 1239.3 / 1236.4       | 1456.8 / 1259.2       |
|                    | [-7.81, 2.01]         | [-217.53, -177.66]    |
+--------------------+-----------------------+-----------------------+
Table 1. Comparison between the baseline and patched kernels

Comparing overcommitting with underutilizing memory, Apache Spark
took 95% CIs [12.58, 14.76]% and [15.95, 19.15]% more wall time to
sort the dataset, respectively, under the low- and medium-concurrency
conditions when using the baseline kernel; 95% CIs [1.78, 2.16]% and
[1.42, 2.27]% more wall time, respectively, under the medium- and
high-concurrency conditions when using the patched kernel. There were
no statistically significant changes in wall time under other
conditions.
+--------------------+------------------------+----------------------+
| Mean wall time (s) | Baseline kernel        | Patched kernel       |
| [95% CI]           |                        |                      |
+--------------------+------------------------+----------------------+
| Low concurrency    | 1037.1 / 1038.2        | 1037.0 / 1036.6      |
|                    | [-0.31, 2.51]          | [-2.43, 1.63]        |
+--------------------+------------------------+----------------------+
| Medium concurrency | 1141.8 / 1297.9        | 1142.6 / 1165.1      |
|                    | [143.68, 168.51]       | [20.33, 24.66]       |
+--------------------+------------------------+----------------------+
| High concurrency   | 1239.3 / 1456.8        | 1236.4 / 1259.2      |
|                    | [197.62, 237.37]       | [17.55, 28.04]       |
+--------------------+------------------------+----------------------+
Table 2. Comparison between underutilizing and overcommitting memory

Metrics collected during each run are available at

  https://github.com/ediworks/KernelPerf/tree/master/mglru/spark/5.14

Appendix
========
$ cat raw_data_spark.r
v <- c(
    # baseline 52g 32t
    1034, 1036, 1036, 1037, 1037, 1037, 1038, 1038, 1038, 1040,
    # baseline 52g 64t
    1139, 1139, 1140, 1140, 1142, 1143, 1143, 1144, 1144, 1144,
    # baseline 52g 96t
    1236, 1237, 1238, 1238, 1238, 1239, 1240, 1241, 1243, 1243,
    # baseline 62g 32t
    1036, 1036, 1038, 1038, 1038, 1038, 1039, 1039, 1040, 1040,
    # baseline 62g 64t
    1266, 1277, 1284, 1296, 1299, 1302, 1311, 1313, 1314, 1317,
    # baseline 62g 96t
    1403, 1431, 1440, 1447, 1460, 1461, 1467, 1475, 1487, 1497,
    # patched 52g 32t
    1035, 1036, 1036, 1037, 1037, 1037, 1037, 1038, 1038, 1039,
    # patched 52g 64t
    1138, 1140, 1140, 1143, 1143, 1143, 1144, 1145, 1145, 1145,
    # patched 52g 96t
    1228, 1228, 1233, 1234, 1235, 1236, 1236, 1240, 1246, 1248,
    # patched 62g 32t
    1032, 1035, 1035, 1035, 1036, 1036, 1037, 1039, 1040, 1041,
    # patched 62g 64t
    1162, 1164, 1164, 1164, 1164, 1164, 1166, 1166, 1168, 1169,
    # patched 62g 96t
    1252, 1256, 1256, 1258, 1260, 1260, 1260, 1260, 1265, 1265
)
a <- array(v, dim = c(10, 3, 2, 2))

# baseline vs patched
for (mem in 1:2) {
    for (con in 1:3) {
        r <- t.test(a[, con, mem, 1], a[, con, mem, 2])
        print(r)
        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("mem%d con%d: no significance", mem, con)
        } else {
            s <- sprintf("mem%d con%d: [%.2f, %.2f]%%", mem, con, -p[2], -p[1])
        }
        print(s)
    }
}

# 52g vs 62g
for (ker in 1:2) {
    for (con in 1:3) {
        r <- t.test(a[, con, 1, ker], a[, con, 2, ker])
        print(r)
        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("ker%d con%d: no significance", ker, con)
        } else {
            s <- sprintf("ker%d con%d: [%.2f, %.2f]%%", ker, con, -p[2], -p[1])
        }
        print(s)
    }
}

$ R -q -s -f raw_data_spark.r

        Welch Two Sample t-test

data:  a[, con, mem, 1] and a[, con, mem, 2]
t = 0.16059, df = 16.4, p-value = 0.8744
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.21749  1.41749
sample estimates:
mean of x mean of y
   1037.1    1037.0

[1] "mem1 con1: no significance"

        Welch Two Sample t-test

data:  a[, con, mem, 1] and a[, con, mem, 2]
t = -0.78279, df = 17.565, p-value = 0.4442
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.950923  1.350923
sample estimates:
mean of x mean of y
   1141.8    1142.6

[1] "mem1 con2: no significance"

        Welch Two Sample t-test

data:  a[, con, mem, 1] and a[, con, mem, 2]
t = 1.2933, df = 11.303, p-value = 0.2217
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.019103  7.819103
sample estimates:
mean of x mean of y
   1239.3    1236.4

[1] "mem1 con3: no significance"

        Welch Two Sample t-test

data:  a[, con, mem, 1] and a[, con, mem, 2]
t = 1.6562, df = 13.458, p-value = 0.1208
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4799188  3.6799188
sample estimates:
mean of x mean of y
   1038.2    1036.6

[1] "mem2 con1: no significance"

        Welch Two Sample t-test

data:  a[, con, mem, 1] and a[, con, mem, 2]
t = 24.096, df = 9.2733, p-value = 1.115e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 120.3881  145.2119
sample estimates:
mean of x mean of y
   1297.9    1165.1

[1] "mem2 con2: [-11.19, -9.28]%"

        Welch Two Sample t-test

data:  a[, con, mem, 1] and a[, con, mem, 2]
t = 22.289, df = 9.3728, p-value = 1.944e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 177.6666  217.5334
sample estimates:
mean of x mean of y
   1456.8    1259.2

[1] "mem2 con3: [-14.93, -12.20]%"

        Welch Two Sample t-test

data:  a[, con, 1, ker] and a[, con, 2, ker]
t = -1.6398, df = 17.697, p-value = 0.1187
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.5110734  0.3110734
sample estimates:
mean of x mean of y
   1037.1    1038.2

[1] "ker1 con1: no significance"

        Welch Two Sample t-test

data:  a[, con, 1, ker] and a[, con, 2, ker]
t = -28.33, df = 9.2646, p-value = 2.57e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -168.5106  -143.6894
sample estimates:
mean of x mean of y
   1141.8    1297.9

[1] "ker1 con2: [12.58, 14.76]%"

        Welch Two Sample t-test

data:  a[, con, 1, ker] and a[, con, 2, ker]
t = -24.694, df = 9.1353, p-value = 1.12e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -237.3794  -197.6206
sample estimates:
mean of x mean of y
   1239.3    1456.8

[1] "ker1 con3: [15.95, 19.15]%"

        Welch Two Sample t-test

data:  a[, con, 1, ker] and a[, con, 2, ker]
t = 0.42857, df = 12.15, p-value = 0.6757
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.630775  2.430775
sample estimates:
mean of x mean of y
   1037.0    1036.6

[1] "ker2 con1: no significance"

        Welch Two Sample t-test

data:  a[, con, 1, ker] and a[, con, 2, ker]
t = -21.865, df = 17.646, p-value = 3.151e-14
alternative hypothesis: true difference in means is not equal
to 0 95 percent confidence interval: -24.66501 -20.33499 sample estimates: mean of x mean of y 1142.6 1165.1 [1] "ker2 con2: [1.78, 2.16]%" Welch Two Sample t-test data: a[, con, 1, ker] and a[, con, 2, ker] t = -9.2738, df = 14.72, p-value = 1.561e-07 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -28.04897 -17.55103 sample estimates: mean of x mean of y 1236.4 1259.2 [1] "ker2 con3: [1.42, 2.27]%"
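The t statistics and degrees of freedom printed above follow from the
standard Welch formulas. As a cross-check (a sketch for readers, not
part of the original procedure; the helper name is made up), the
following Python snippet recomputes t and the Welch-Satterthwaite df
for the 62g/64t baseline-vs-patched comparison from the raw data in
raw_data_spark.r, and converts the printed confidence interval into
the reported percentage form:

```python
from statistics import mean, variance

# Raw wall times (s) for the 62g 64t configuration, copied from
# raw_data_spark.r.
baseline = [1266, 1277, 1284, 1296, 1299, 1302, 1311, 1313, 1314, 1317]
patched = [1162, 1164, 1164, 1164, 1164, 1164, 1166, 1166, 1168, 1169]

def welch(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t, df = welch(baseline, patched)
print(t, df)  # compare with "t = 24.096, df = 9.2733" in the mem2 con2 output

# The R script reports the CI as a percentage of the first sample mean,
# negated so a positive value means the patched kernel took less time:
lo, hi = 120.3881, 145.2119   # printed 95% CI for mem2 con2
m = mean(baseline)            # 1297.9, the printed "mean of x"
print(-round(hi * 100 / m, 2), -round(lo * 100 / m, 2))  # [-11.19, -9.28]%
```

The CI endpoints themselves are not recomputed here because inverting
the t distribution needs more than the Python standard library.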
Kernel / MongoDB benchmark with MGLRU

TLDR
====
With the MGLRU, MongoDB achieved 95% CIs [2.23, 3.44]%, [6.97, 9.73]%
and [2.16, 3.55]% more operations per second (OPS), respectively, for
exponential (distribution) access, random access and Zipfian access,
when underutilizing memory; and 95% CIs [8.83, 10.03]%, [21.12,
23.14]% and [5.53, 6.46]% more OPS, respectively, for the same three
access distributions when slightly overcommitting memory.

Background
==========
Memory overcommit can increase utilization and, if carried out
properly, can also increase throughput. The challenges are to improve
working set estimation and to optimize page reclaim. The risks are
performance degradation and OOM kills. Short of overcoming the
challenges, the only way to reduce the risks is to underutilize
memory.

MongoDB is one of the most popular open-source NoSQL databases. YCSB
is the leading open-source NoSQL database benchmarking software that
supports multiple access distributions.

Matrix
======
Kernels: version [+ patchset]
* Baseline: 5.14
* Patched: 5.14 + MGLRU

Memory utilization: % of memory size
* Underutilizing: ~15% on inactive file list
* Overcommitting: ~5% swapped out

Concurrency: average # of users per CPU
* Medium: 2

Access distributions (1kB objects, 20% update):
* Exponential
* Uniform random
* Zipfian

Total configurations: 12
Data points per configuration: 10
Total run duration (minutes) per data point: ~20

Note that MongoDB reached its peak performance at this concurrency
level: its throughput degraded with either fewer or more users.
Procedure
=========
The latest MGLRU patchset for the 5.14 kernel is available at

    git fetch https://linux-mm.googlesource.com/page-reclaim \
        refs/changes/30/1430/1

Baseline and patched 5.14 kernel images are available at

    https://drive.google.com/drive/folders/1eMkQleAFGkP2vzM_JyRA21oKE0ESHBqp

<install and configure OS>
ycsb_load.sh
systemctl stop mongod
e2image <backup /mnt/data>

<for each kernel>
    grub2-set-default <baseline, patched>
    <for each memory utilization>
        <update /etc/mongod.conf>
        <for each access distribution>
            <update ycsb_run.sh>
            <for each data point>
                systemctl stop mongod
                e2image <restore /mnt/data>
                reboot
                ycsb_run.sh
                <collect OPS>

Hardware
========
Memory (GB): 64
CPU (total #): 32
NVMe SSD (GB): 1024

OS
==
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 (Ootpa)

$ cat /proc/swaps
Filename          Type          Size          Used     Priority
/dev/nvme0n1p3    partition     32970748      0        -2

$ cat /proc/cmdline
<existing parameters> systemd.unified_cgroup_hierarchy=1

$ cat /sys/fs/cgroup/user.slice/memory.min
4294967296

$ cat /proc/sys/vm/overcommit_memory
1

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

MongoDB
=======
$ mongod --version
db version v5.0.3
Build Info: {
    "version": "5.0.3",
    "gitVersion": "657fea5a61a74d7a79df7aff8e4bcf0bc742b748",
    "openSSLVersion": "OpenSSL 1.1.1g FIPS  21 Apr 2020",
    "modules": [],
    "allocator": "tcmalloc",
    "environment": {
        "distmod": "rhel80",
        "distarch": "x86_64",
        "target_arch": "x86_64"
    }
}

$ cat /etc/mongod.conf
# mongod.conf

<existing parameters>

# Where and how to store data.
storage:
  dbPath: /mnt/data
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: <50, 60>

<existing parameters>

YCSB
====
$ git log
commit ce3eb9ce51c84ee9e236998cdd2cefaeb96798a8 (HEAD -> master, origin/master, origin/HEAD)
Author: Ivan <john.koepi@gmail.com>
Date:   Tue Feb 16 17:38:00 2021 +0200

    [scylla] enable token aware LB by default, improve the docs (#1507)

$ cat ycsb_load.sh
# load objects
ycsb load mongodb -s -threads 16 \
    -p mongodb.url=mongodb://%2Ftmp%2Fmongodb-27017.sock \
    -p workload=site.ycsb.workloads.CoreWorkload \
    -p recordcount=80000000

$ cat ycsb_run.sh
# run benchmark
ycsb run mongodb -s -threads 64 \
    -p mongodb.url=mongodb://%2Ftmp%2Fmongodb-27017.sock \
    -p workload=site.ycsb.workloads.CoreWorkload \
    -p recordcount=80000000 -p operationcount=80000000 \
    -p readproportion=0.8 -p updateproportion=0.2 \
    -p requestdistribution=<exponential, uniform, zipfian>

Results
=======
Comparing the patched with the baseline kernel, MongoDB achieved 95%
CIs [2.23, 3.44]%, [6.97, 9.73]% and [2.16, 3.55]% more OPS,
respectively, for exponential access, random access and Zipfian
access, when underutilizing memory; and 95% CIs [8.83, 10.03]%,
[21.12, 23.14]% and [5.53, 6.46]% more OPS, respectively, for the
same three access distributions when slightly overcommitting memory.
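The percentage CIs quoted in these results are derived from the raw
OPS intervals in the appendix: each interval is scaled by the
first-sample mean and negated, and an interval that spans zero is
reported as not significant. A hedged Python equivalent of that R
helper (the function name is illustrative, not from the original
scripts):

```python
def percent_ci(ci_low, ci_high, mean_x):
    """Scale a raw 95% CI by the first sample mean and flip its sign,
    so a positive range means the second sample improved on the first;
    a CI spanning zero means no statistically significant change."""
    p_low = ci_low * 100 / mean_x
    p_high = ci_high * 100 / mean_x
    if p_low < 0 < p_high:   # the CI spans zero
        return None          # "no significance"
    return (round(-p_high, 2), round(-p_low, 2))

# kern1 dist3: raw CI and "mean of x" taken from the appendix output.
print(percent_ci(-1695.7509, -330.9911, 92532.25))  # -> (0.36, 1.83)
```

Applied to every t.test result in the appendix, this reproduces the
percentage ranges quoted throughout this report.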
+--------------------+-----------------------+-----------------------+
| Mean OPS [95% CI]  | Underutilizing memory | Overcommitting memory |
+--------------------+-----------------------+-----------------------+
| Exponential access | 76615.56 / 78788.76   | 73984.90 / 80961.66   |
|                    | [1708.76, 2637.62]    | [6533.94, 7419.58]    |
+--------------------+-----------------------+-----------------------+
| Random access      | 62093.40 / 67276.01   | 55990.56 / 68379.91   |
|                    | [4324.96, 6040.25]    | [11824.09, 12954.62]  |
+--------------------+-----------------------+-----------------------+
| Zipfian access     | 92532.25 / 95174.43   | 93545.62 / 99151.12   |
|                    | [1997.20, 3287.17]    | [5171.27, 6039.72]    |
+--------------------+-----------------------+-----------------------+

Table 1. Comparison between the baseline and patched kernels

Comparing overcommitting with underutilizing memory, MongoDB achieved
95% CIs [-4.10, -2.77]%, [-11.20, -8.46]% and [0.36, 1.83]% more OPS,
respectively, for exponential access, random access and Zipfian
access, when using the baseline kernel; and 95% CIs [2.27, 3.25]%,
[0.78, 2.50]% and [3.81, 4.54]% more OPS, respectively, for the same
three access distributions when using the patched kernel.

+--------------------+-----------------------+-----------------------+
| Mean OPS [95% CI]  | Baseline kernel       | Patched kernel        |
+--------------------+-----------------------+-----------------------+
| Exponential access | 76615.56 / 73984.90   | 78788.76 / 80961.66   |
|                    | [-3139.12, -2122.20]  | [1786.70, 2559.09]    |
+--------------------+-----------------------+-----------------------+
| Random access      | 62093.40 / 55990.56   | 67276.01 / 68379.91   |
|                    | [-6953.44, -5252.23]  | [525.42, 1682.38]     |
+--------------------+-----------------------+-----------------------+
| Zipfian access     | 92532.25 / 93545.62   | 95174.43 / 99151.12   |
|                    | [330.99, 1695.75]     | [3628.31, 4325.06]    |
+--------------------+-----------------------+-----------------------+

Table 2. Comparison between underutilizing and overcommitting memory

Metrics collected during each run are available at

    https://github.com/ediworks/KernelPerf/tree/master/mglru/mongodb/5.14

Appendix
========

$ cat raw_data_mongodb.r
v <- c(
    # baseline 50g exp
    75814.86, 75884.91, 76052.71, 76621.01, 76641.19,
    76661.24, 76870.15, 77017.79, 77289.08, 77302.67,
    # baseline 50g uni
    60638.17, 60968.91, 61128.61, 61548.40, 61779.30,
    61917.58, 62152.28, 63440.15, 63625.47, 63735.11,
    # baseline 50g zip
    91271.16, 91482.41, 91524.17, 92467.16, 92585.62,
    92843.29, 92885.65, 93229.98, 93408.94, 93624.08,
    # baseline 60g exp
    73183.67, 73191.30, 73527.58, 73831.79, 74047.95,
    74056.24, 74401.23, 74418.53, 74547.58, 74643.08,
    # baseline 60g uni
    55175.76, 55477.42, 55605.52, 55680.21, 55903.39,
    56171.05, 56375.06, 56380.43, 56509.94, 56626.78,
    # baseline 60g zip
    92653.82, 92775.02, 93100.44, 93290.21, 93593.74,
    93775.64, 93868.72, 93915.12, 94194.77, 94288.69,
    # patched 50g exp
    78349.95, 78385.64, 78392.33, 78419.91, 78726.59,
    78738.68, 78930.72, 78948.25, 79404.38, 79591.14,
    # patched 50g uni
    66622.91, 66667.33, 66951.43, 67104.80, 67117.30,
    67196.90, 67389.75, 67406.62, 68131.43, 68171.61,
    # patched 50g zip
    94261.14, 94822.34, 94914.70, 95114.89, 95156.75,
    95205.90, 95383.78, 95612.00, 95624.00, 95648.81,
    # patched 60g exp
    80272.04, 80612.33, 80679.23, 80717.74, 81011.18,
    81029.64, 81146.68, 81371.84, 81379.13, 81396.76,
    # patched 60g uni
    67559.52, 67600.11, 67718.90, 68062.57, 68278.78,
    68446.56, 68452.82, 68853.86, 69278.34, 69547.67,
    # patched 60g zip
    98706.81, 98864.41, 98903.77, 99044.10, 99155.68,
    99162.94, 99165.64, 99482.31, 99484.91, 99540.62
)

a <- array(v, dim = c(10, 3, 2, 2))

# baseline vs patched
for (mem in 1:2) {
    for (dist in 1:3) {
        r <- t.test(a[, dist, mem, 1], a[, dist, mem, 2])
        print(r)

        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("mem%d dist%d: no significance", mem, dist)
        } else {
            s <- sprintf("mem%d dist%d: [%.2f, %.2f]%%", mem, dist, -p[2], -p[1])
        }
        print(s)
    }
}

# 50g vs 60g
for (kern in 1:2) {
    for (dist in 1:3) {
        r <- t.test(a[, dist, 1, kern], a[, dist, 2, kern])
        print(r)

        p <- r$conf.int * 100 / r$estimate[1]
        if ((p[1] > 0 && p[2] < 0) || (p[1] < 0 && p[2] > 0)) {
            s <- sprintf("kern%d dist%d: no significance", kern, dist)
        } else {
            s <- sprintf("kern%d dist%d: [%.2f, %.2f]%%", kern, dist, -p[2], -p[1])
        }
        print(s)
    }
}

$ R -q -s -f raw_data_mongodb.r

        Welch Two Sample t-test

data:  a[, dist, mem, 1] and a[, dist, mem, 2]
t = -9.8624, df = 17.23, p-value = 1.671e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2637.627 -1708.769
sample estimates:
mean of x mean of y
 76615.56  78788.76

[1] "mem1 dist1: [2.23, 3.44]%"

        Welch Two Sample t-test

data:  a[, dist, mem, 1] and a[, dist, mem, 2]
t = -13.081, df = 12.744, p-value = 9.287e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6040.256 -4324.964
sample estimates:
mean of x mean of y
 62093.40  67276.01

[1] "mem1 dist2: [6.97, 9.73]%"

        Welch Two Sample t-test

data:  a[, dist, mem, 1] and a[, dist, mem, 2]
t = -8.8194, df = 13.459, p-value = 5.833e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3287.17 -1997.20
sample estimates:
mean of x mean of y
 92532.25  95174.43

[1] "mem1 dist3: [2.16, 3.55]%"

        Welch Two Sample t-test

data:  a[, dist, mem, 1] and a[, dist, mem, 2]
t = -33.368, df = 16.192, p-value = 2.329e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7419.582 -6533.942
sample estimates:
mean of x mean of y
 73984.90  80961.66

[1] "mem2 dist1: [8.83, 10.03]%"

        Welch Two Sample t-test

data:  a[, dist, mem, 1] and a[, dist, mem, 2]
t = -46.386, df = 16.338, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12954.62 -11824.09
sample estimates:
mean of x mean of y
 55990.56  68379.91

[1] "mem2 dist2: [21.12, 23.14]%"

        Welch Two Sample t-test

data:  a[, dist, mem, 1] and a[, dist, mem, 2]
t = -27.844, df = 13.209, p-value = 4.049e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6039.729 -5171.275
sample estimates:
mean of x mean of y
 93545.62  99151.12

[1] "mem2 dist3: [5.53, 6.46]%"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = 10.87, df = 18, p-value = 2.439e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2122.207 3139.125
sample estimates:
mean of x mean of y
 76615.56  73984.90

[1] "kern1 dist1: [-4.10, -2.77]%"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = 15.593, df = 12.276, p-value = 1.847e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 5252.237 6953.447
sample estimates:
mean of x mean of y
 62093.40  55990.56

[1] "kern1 dist2: [-11.20, -8.46]%"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = -3.1512, df = 15.811, p-value = 0.006252
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1695.7509  -330.9911
sample estimates:
mean of x mean of y
 92532.25  93545.62

[1] "kern1 dist3: [0.36, 1.83]%"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = -11.836, df = 17.672, p-value = 7.84e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2559.092 -1786.704
sample estimates:
mean of x mean of y
 78788.76  80961.66

[1] "kern2 dist1: [2.27, 3.25]%"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = -4.0276, df = 16.921, p-value = 0.0008807
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1682.3864  -525.4236
sample estimates:
mean of x mean of y
 67276.01  68379.91

[1] "kern2 dist2: [0.78, 2.50]%"

        Welch Two Sample t-test

data:  a[, dist, 1, kern] and a[, dist, 2, kern]
t = -24.26, df = 15.517, p-value = 9.257e-14
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4325.062 -3628.314
sample estimates:
mean of x mean of y
 95174.43  99151.12

[1] "kern2 dist3: [3.81, 4.54]%"
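A note on the data layout: R's array() fills in column-major order, so
a[point, dist, mem, kern] with dim = c(10, 3, 2, 2) walks the flat
vector point-fastest. For readers who want to slice the raw data in
another language, this small sketch (the helper name is made up)
reproduces the same indexing in Python:

```python
# Flat, 0-based position of a[point, dist, mem, kern] (all 1-based, as
# in R) within the column-major vector v used by raw_data_mongodb.r,
# whose shape is (10, 3, 2, 2): point varies fastest, kernel slowest.
def r_index(point, dist, mem, kern):
    return (point - 1) + 10 * (dist - 1) + 30 * (mem - 1) + 60 * (kern - 1)

# The "baseline 60g uni" block (kern=1, mem=2, dist=2) starts at flat
# position 40, i.e. the 41st value in v (55175.76 above).
print(r_index(1, 2, 2, 1))  # -> 40
```

The same mapping applies to raw_data_spark.r, which uses the identical
(10, 3, 2, 2) shape with concurrency in place of access distribution.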