From patchwork Fri Jun 29 19:21:24 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10497343
From: Josef Bacik
To: fstests@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH] fstests: add generic/499, a cgroup io.latency test
Date: Fri, 29 Jun 2018 15:21:24 -0400
Message-Id: <20180629192124.22505-1-josef@toxicpanda.com>
X-Mailer: git-send-email 2.14.3
X-Mailing-List: fstests@vger.kernel.org

The infrastructure is fairly straightforward: just some helpers to
verify that cgroup2 is mounted and to set up a cgroup for our test. The
fio JSON script just uses the existing library to print out the value
for a given key in a fio JSON output file.

The test itself first runs a fio job by itself, outside of any cgroup,
to establish the baseline performance of the job. Then we set up two
cgroups, one protected with io.latency and one that is not, and run the
same job with 2 threads doing the same workload, one in the protected
group and one in the unprotected group. We are trying to verify that
io.latency does the protection properly.

I tested this with my kernel that has io.latency enabled, as well as
with the various features turned off, so hopefully it fails gracefully
for people who don't have the prerequisites.

Signed-off-by: Josef Bacik
---
NOTE: You need io.latency, which can only be found in my blk-iolatency
branches in my btrfs-next tree on kernel.org. You also need the fio
master branch because I had to fix fio to work with cgroup2 and spit out
the right numbers for us to verify.

 common/cgroup             |  42 +++++++++++++
 common/config             |   1 +
 common/perf               |  10 +++
 src/perf/fio-key-value.py |  28 +++++++++
 tests/generic/499         | 151 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/499.out     |   2 +
 tests/generic/group       |   1 +
 7 files changed, 235 insertions(+)
 create mode 100644 common/cgroup
 create mode 100644 src/perf/fio-key-value.py
 create mode 100644 tests/generic/499
 create mode 100644 tests/generic/499.out

diff --git a/common/cgroup b/common/cgroup
new file mode 100644
index 000000000000..d74d402976a7
--- /dev/null
+++ b/common/cgroup
@@ -0,0 +1,42 @@
+##/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# common functions for setting up cgroups for tasks
+
+_cgroup2_base_dir()
+{
+	grep cgroup2 /proc/mounts | awk '{ print $2 }'
+}
+
+_cleanup_cgroup2()
+{
+	_dir=$(_cgroup2_base_dir)/xfstest
+	for i in $(find ${_dir} -type d | tac)
+	do
+		rmdir $i
+	done
+}
+
+_require_cgroup2()
+{
+	grep -q 'cgroup2' /proc/mounts || _notrun "This test requires cgroup2"
+}
+
+_require_cgroup2_controller_file()
+{
+	_require_cgroup2
+
+	_controller=$1
+	_file=$2
+	_dir=$(_cgroup2_base_dir)
+
+	grep -q ${_controller} ${_dir}/cgroup.controllers || \
+		_notrun "No support for ${_controller} cgroup controller"
+
+	mkdir ${_dir}/xfstest
+	echo "+${_controller}" > ${_dir}/cgroup.subtree_control
+	if [ ! -f ${_dir}/xfstest/${_file} ]; then
+		_cleanup_cgroup2
+		_notrun "Cgroup file ${_file} doesn't exist"
+	fi
+}
diff --git a/common/config b/common/config
index a127e98fba2d..d9284ccdf4c7 100644
--- a/common/config
+++ b/common/config
@@ -192,6 +192,7 @@ export SETCAP_PROG="$(type -P setcap)"
 export GETCAP_PROG="$(type -P getcap)"
 export CHECKBASHISMS_PROG="$(type -P checkbashisms)"
 export XFS_INFO_PROG="$(type -P xfs_info)"
+export TIME_PROG="$(type -P time)"
 
 # use 'udevadm settle' or 'udevsettle' to wait for lv to be settled.
 # newer systems have udevadm command but older systems like RHEL5 don't.
diff --git a/common/perf b/common/perf
index 8b4c9bef8a8d..56034b00a410 100644
--- a/common/perf
+++ b/common/perf
@@ -38,3 +38,13 @@ _fio_results_compare()
 		-c $PERF_CONFIGNAME -d $RESULT_BASE/fio-results.db \
 		-n $_testname $_resultfile
 }
+
+_fio_results_key()
+{
+	_job=$1
+	_key=$2
+	_resultfile=$3
+
+	$PYTHON2_PROG $here/src/perf/fio-key-value.py -k $_key -j $_job \
+		$_resultfile
+}
diff --git a/src/perf/fio-key-value.py b/src/perf/fio-key-value.py
new file mode 100644
index 000000000000..208e9a453a19
--- /dev/null
+++ b/src/perf/fio-key-value.py
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import FioResultDecoder
+import json
+import argparse
+import sys
+import platform
+
+parser = argparse.ArgumentParser()
+parser.add_argument('-j', '--jobname', type=str,
+                    help="The jobname we want our key from.",
+                    required=True)
+parser.add_argument('-k', '--key', type=str,
+                    help="The key we want the value of", required=True)
+parser.add_argument('result', type=str,
+                    help="The result file read")
+args = parser.parse_args()
+
+json_data = open(args.result)
+data = json.load(json_data, cls=FioResultDecoder.FioResultDecoder)
+
+for job in data['jobs']:
+    if job['jobname'] == args.jobname:
+        if args.key not in job:
+            print('')
+        else:
+            print("{}".format(job[args.key]))
+        break
diff --git a/tests/generic/499 b/tests/generic/499
new file mode 100644
index 000000000000..c50fe3c8db36
--- /dev/null
+++ b/tests/generic/499
@@ -0,0 +1,151 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# FS QA Test No. 499
+#
+# Test that verifies the io.latency controller is doing something resembling
+# what it's supposed to be doing.
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+fio_config_single=$tmp-single.fio
+fio_config_double=$tmp-double.fio
+fio_results=$tmp.json
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	_cleanup_cgroup2
+	cd /
+	rm -f $tmp.*
+}
+
+_cgroup_run()
+{
+	$FIO_PROG --output-format=json --output=$2 $3
+	echo "$1 finished" >> $seqres.full
+	cat /sys/fs/cgroup/xfstest/fast/io.stat >> $seqres.full
+	cat /sys/fs/cgroup/xfstest/slow/io.stat >> $seqres.full
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/cgroup
+. ./common/perf
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_cgroup2_controller_file io io.latency
+_require_fio_results
+
+rm -f $seqres.full
+
+# This is the basic test so we have a baseline for this box
+cat >$fio_config_single << EOF
+[fast]
+directory=${SCRATCH_MNT}
+direct=1
+allrandrepeat=1
+readwrite=randrw
+size=4G
+ioengine=libaio
+iodepth=1024
+fallocate=none
+EOF
+
+# This runs one thread in a high priority group, another in a low priority
+# group.
+cat >$fio_config_double << EOF
+[global]
+directory=${SCRATCH_MNT}
+direct=1
+allrandrepeat=1
+readwrite=randrw
+size=4G
+ioengine=libaio
+iodepth=1024
+fallocate=none
+
+[fast]
+cgroup=xfstest/fast
+
+[slow]
+cgroup=xfstest/slow
+EOF
+
+_require_fio $fio_config_double
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# We want to make sure the scratch device is large enough that we don't incur
+# ENOSPC related slowdowns
+_size=$((16 * 4 * 1024 * 1024))
+_require_fs_space $SCRATCH_MNT $_size
+
+# We run the test once so we have an idea of how fast this workload will go with
+# nobody else doing IO on the device.
+$FIO_PROG --output-format=json --output=${fio_results} $fio_config_single
+
+_scratch_unmount
+
+# Grab the time the job took
+_time_taken=$(_fio_results_key fast job_runtime $fio_results)
+[ "${_time_taken}" = "" ] && _notrun "fio doesn't report job_runtime"
+
+echo "normal time taken ${_time_taken}" >> $seqres.full
+
+# File system internals are going to affect us a bit here, so we need to be a
+# little fuzzy about our thresholds.  First we need to make sure our fast group
+# isn't affected too much, and 15% gives us a little bit of wiggle room.  But if
+# we just so happen to be pretty fast and 2 tasks running at the same time with
+# equal weight happens to finish in this threshold we need to have a second
+# higher threshold to make sure that the slow task was indeed throttled.  So set
+# a 50% threshold that the slow group must exceed to make sure we did actually
+# throttle the slow group
+_fast_thresh=$((${_time_taken} + ${_time_taken} * 15 / 100))
+_slow_thresh=$((${_time_taken} + ${_time_taken} * 50 / 100))
+echo "fast threshold time is ${_fast_thresh}" >> $seqres.full
+echo "slow threshold time is ${_slow_thresh}" >> $seqres.full
+
+# Create the cgroup files
+_dir=$(_cgroup2_base_dir)/xfstest
+echo "+io" > ${_dir}/cgroup.subtree_control
+mkdir ${_dir}/fast
+mkdir ${_dir}/slow
+
+# We set the target to 1usec because we could have a fast device that is capable
+# of remarkable IO latencies that would skew the test.  It needs to be low
+# enough that we do actually throttle the slow group, otherwise this test will
+# make no sense.
+_major=$((0x$(stat -c "%t" ${SCRATCH_DEV})))
+_minor=$((0x$(stat -c "%T" ${SCRATCH_DEV})))
+echo "${_major}:${_minor} target=1" > ${_dir}/fast/io.latency
+[ $? -ne 0 ] && _fatal "Failed to set our latency target"
+
+# Start from a clean slate
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+
+$FIO_PROG --output-format=json --output=${fio_results} ${fio_config_double}
+
+_scratch_unmount
+
+# Pull the times for both jobs
+_fast_time=$(_fio_results_key fast job_runtime $fio_results)
+echo "Fast time ${_fast_time}" >> $seqres.full
+_slow_time=$(_fio_results_key slow job_runtime $fio_results)
+echo "Slow time ${_slow_time}" >> $seqres.full
+
+[ ${_fast_thresh} -lt ${_fast_time} ] && \
+	_fatal "Too much of a performance drop for the protected workload"
+[ ${_slow_thresh} -gt ${_slow_time} ] && \
+	_fatal "The slow group does not appear to have been throttled"
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/generic/499.out b/tests/generic/499.out
new file mode 100644
index 000000000000..c363e6848f82
--- /dev/null
+++ b/tests/generic/499.out
@@ -0,0 +1,2 @@
+QA output created by 499
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 83a6fdab7880..c685c3e65c18 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -501,3 +501,4 @@
 496 auto quick swap
 497 auto quick swap collapse
 498 auto quick log
+499 auto
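
Reader's note, not part of the patch: the pass/fail arithmetic in tests/generic/499 may be easier to follow as a small standalone sketch. The function names `thresholds` and `check` are hypothetical and do not exist in fstests. The 15% and 50% margins and the failure messages are taken from the test, and `//` is used on the assumption that the runtimes are non-negative integers, where it matches the shell's `$(( ))` integer division.

```python
def thresholds(baseline):
    """Return (fast_thresh, slow_thresh) for a baseline job_runtime.

    Hypothetical helper mirroring the test's shell arithmetic:
    baseline + baseline * 15 / 100 and baseline + baseline * 50 / 100,
    both with integer division.
    """
    fast_thresh = baseline + baseline * 15 // 100
    slow_thresh = baseline + baseline * 50 // 100
    return fast_thresh, slow_thresh


def check(baseline, fast_time, slow_time):
    """Return the failure message the test would emit, or None on success."""
    fast_thresh, slow_thresh = thresholds(baseline)
    # Same comparison as: [ ${_fast_thresh} -lt ${_fast_time} ]
    if fast_thresh < fast_time:
        return "Too much of a performance drop for the protected workload"
    # Same comparison as: [ ${_slow_thresh} -gt ${_slow_time} ]
    if slow_thresh > slow_time:
        return "The slow group does not appear to have been throttled"
    return None


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    print(thresholds(1000))
    print(check(1000, 1100, 1600))
```

The two thresholds deliberately leave a gap: a protected job may be up to 15% slower than the baseline, while the unprotected job must be at least 50% slower, so two jobs that merely share the device equally cannot satisfy both conditions by accident.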