From patchwork Thu May 7 12:22:55 2009
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 26263
Date: Thu, 7 May 2009 14:22:55 +0200
From: Andrea Righi
To: Vivek Goyal
Message-ID: <20090507122254.GA5892@linux>
References: <1241553525-28095-1-git-send-email-vgoyal@redhat.com>
	<20090505132441.1705bfad.akpm@linux-foundation.org>
	<20090506023332.GA1212@redhat.com>
	<20090506203228.GH8180@redhat.com>
	<20090506213453.GC4282@linux>
	<20090506215235.GJ8180@redhat.com>
	<20090507090450.GA4613@linux>
In-Reply-To: <20090507090450.GA4613@linux>
Cc: dhaval@linux.vnet.ibm.com, snitzer@redhat.com, peterz@infradead.org,
	dm-devel@redhat.com, dpshah@google.com, jens.axboe@oracle.com,
	agk@redhat.com, balbir@linux.vnet.ibm.com, paolo.valente@unimore.it,
	guijianfeng@cn.fujitsu.com, fernando@oss.ntt.co.jp, mikew@google.com,
	jmoyer@redhat.com, nauman@google.com, m-ikeda@ds.jp.nec.com,
	lizf@cn.fujitsu.com, fchecconi@gmail.com, s-uchida@ap.jp.nec.com,
	containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	Andrew Morton
Subject: [dm-devel] Re: IO scheduler based IO Controller V2
On Thu, May 07, 2009 at 11:04:50AM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > > > Without io-throttle patches
> > > > ---------------------------
> > > > - Two readers, first BE prio 7, second BE prio 0
> > > >
> > > > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > > > High prio reader finished
> > > > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > > >
> > > > Note: There is no service differentiation between prio 0 and prio 7 tasks
> > > > with io-throttle patches.
> > > >
> > > > Test 3
> > > > ======
> > > > - Run one RT reader and one BE reader in the root cgroup without any
> > > > limitations. I guess this should mean unlimited BW and behavior should
> > > > be the same as with CFQ without the io-throttling patches.
> > > >
> > > > With io-throttle patches
> > > > =========================
> > > > Ran the test 4 times because I was getting different results in different
> > > > runs.
> > > >
> > > > - Two readers, one RT prio 0, the other BE prio 7
> > > >
> > > > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > > > RT task finished
> > > >
> > > > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > > >
> > > > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > > >
> > > > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > > > RT task finished
> > > >
> > > > Note: Out of 4 runs, it looks like twice there was complete priority
> > > > inversion and the RT task finished after the BE task. In the other two
> > > > runs, the difference between the BW of the RT and BE tasks is much
> > > > smaller than without the patches. In fact, once it was almost the same.
> > >
> > > This is strange. If you don't set any limit there shouldn't be any
> > > difference with respect to the other case (without io-throttle patches).
> > >
> > > At worst a small overhead given by the task_to_iothrottle(), under
> > > rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> > > reproduce this strange behaviour.
> >
> > Ya, I also found this strange. At least in the root group there should not
> > be any behavior change (at most one might expect a little drop in
> > throughput because of the extra code).
>
> Hi Vivek,
>
> I'm not able to reproduce the strange behaviour above.
>
> Which commands are you running exactly? Is the system isolated (stupid
> question), no cron or background tasks doing IO during the tests?
>
> Following the script I've used:
>
> $ cat test.sh
> #!/bin/sh
> echo 3 > /proc/sys/vm/drop_caches
> ionice -c 1 -n 0 dd if=bigfile1 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/RT: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/RT: \1/"
> ionice -c 2 -n 7 dd if=bigfile2 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/BE: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/BE: \1/"
> for i in 1 2; do
> 	wait
> done
>
> And the results on my PC:
>
> 2.6.30-rc4
> ~~~~~~~~~~
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.3406 s, 11.5 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.989 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.4436 s, 10.5 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.9555 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.622 s, 11.3 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.9856 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.5664 s, 11.4 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.8522 s, 20.7 MB/s
>
> 2.6.30-rc4 + io-throttle, no BW limit, both tasks in the root cgroup
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.6739 s, 10.4 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.2853 s, 20.0 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.7483 s, 10.3 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.3597 s, 19.9 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.6843 s, 10.4 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.4886 s, 19.6 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.8621 s, 10.3 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.6737 s, 19.4 MB/s
> RT: 4:blockio:/
>
> The difference seems to be just the expected overhead.

BTW, it is possible to reduce the io-throttle overhead even more for
non io-throttle users (also when CONFIG_CGROUP_IO_THROTTLE is enabled)
using the trick below.
2.6.30-rc4 + io-throttle + following patch, no BW limit, tasks in root cgroup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 17.462 s, 14.1 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.7865 s, 20.8 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 18.8375 s, 13.0 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.9148 s, 20.6 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 19.6826 s, 12.5 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.8715 s, 20.7 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 18.9152 s, 13.0 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.8925 s, 20.6 MB/s
RT: 4:blockio:/

[ To be applied on top of io-throttle v16 ]

Signed-off-by: Andrea Righi
---
 block/blk-io-throttle.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/blk-io-throttle.c b/block/blk-io-throttle.c
index e2dfd24..8b45c71 100644
--- a/block/blk-io-throttle.c
+++ b/block/blk-io-throttle.c
@@ -131,6 +131,14 @@ struct iothrottle_node {
 	struct iothrottle_stat stat;
 };
 
+/*
+ * This is a trick to reduce the unneeded overhead when io-throttle is not used
+ * at all. We use a counter of the io-throttle rules; if the counter is zero,
+ * we immediately return from the io-throttle hooks, without accounting IO and
+ * without checking if we need to apply some limiting rules.
+ */
+static atomic_t iothrottle_node_count __read_mostly;
+
 /**
  * struct iothrottle - throttling rules for a cgroup
  * @css: pointer to the cgroup state
@@ -193,6 +201,7 @@ static void iothrottle_insert_node(struct iothrottle *iot,
 {
 	WARN_ON_ONCE(!cgroup_is_locked());
 	list_add_rcu(&n->node, &iot->list);
+	atomic_inc(&iothrottle_node_count);
 }
 
 /*
@@ -214,6 +223,7 @@ iothrottle_delete_node(struct iothrottle *iot, struct iothrottle_node *n)
 {
 	WARN_ON_ONCE(!cgroup_is_locked());
 	list_del_rcu(&n->node);
+	atomic_dec(&iothrottle_node_count);
 }
 
 /*
@@ -250,8 +260,10 @@ static void iothrottle_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
 	 * reference to the list.
 	 */
 	if (!list_empty(&iot->list))
-		list_for_each_entry_safe(n, p, &iot->list, node)
+		list_for_each_entry_safe(n, p, &iot->list, node) {
 			kfree(n);
+			atomic_dec(&iothrottle_node_count);
+		}
 	kfree(iot);
 }
 
@@ -836,7 +848,7 @@ cgroup_io_throttle(struct bio *bio, struct block_device *bdev, ssize_t bytes)
 	unsigned long long sleep;
 	int type, can_sleep = 1;
 
-	if (iothrottle_disabled())
+	if (iothrottle_disabled() || !atomic_read(&iothrottle_node_count))
 		return 0;
 	if (unlikely(!bdev))
 		return 0;
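
For anyone who wants to try the same fast-path idea outside the kernel, here
is a minimal, self-contained userspace sketch of the pattern the patch uses
(a global counter of installed rules checked before doing any per-IO work).
The names rule_count, add_rule(), io_hook() and apply_rules() are made up for
illustration and are not io-throttle symbols.

/* Build with: gcc -std=c11 -Wall fastpath.c */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int rule_count;	/* number of throttling rules currently installed */

static void add_rule(void)	{ atomic_fetch_add(&rule_count, 1); }
static void remove_rule(void)	{ atomic_fetch_sub(&rule_count, 1); }

/* Slow path: stands in for walking the RCU-protected rule list and
 * accounting/limiting the IO in the real code. */
static long apply_rules(long bytes)
{
	printf("accounting %ld bytes against the configured rules\n", bytes);
	return 0;	/* no sleep needed in this toy example */
}

/* Per-IO hook: when no rule exists anywhere, the cost is a single atomic read. */
static long io_hook(long bytes)
{
	if (atomic_load(&rule_count) == 0)
		return 0;	/* fast path: behave as if io-throttle were absent */
	return apply_rules(bytes);
}

int main(void)
{
	io_hook(4096);		/* no rules yet: returns immediately */
	add_rule();
	io_hook(4096);		/* a rule exists: takes the accounting path */
	remove_rule();
	return 0;
}

The trade-off is the same as in the patch above: rule insertion and removal
pay for one extra atomic operation, which is negligible next to the per-IO
work saved on systems that never configure a bandwidth limit.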