From patchwork Fri Jul 10 01:56:21 2009
X-Patchwork-Submitter: Gui Jianfeng
X-Patchwork-Id: 34915
Message-ID: <4A569FC5.7090801@cn.fujitsu.com>
Date: Fri, 10 Jul 2009 09:56:21 +0800
From: Gui Jianfeng
To: Vivek Goyal
In-Reply-To: <1246564917-19603-1-git-send-email-vgoyal@redhat.com>
References: <1246564917-19603-1-git-send-email-vgoyal@redhat.com>
Cc: dhaval@linux.vnet.ibm.com, snitzer@redhat.com, peterz@infradead.org,
    dm-devel@redhat.com, dpshah@google.com, jens.axboe@oracle.com,
    agk@redhat.com, balbir@linux.vnet.ibm.com, paolo.valente@unimore.it,
    fernando@oss.ntt.co.jp, mikew@google.com, jmoyer@redhat.com,
    nauman@google.com, m-ikeda@ds.jp.nec.com, lizf@cn.fujitsu.com,
    fchecconi@gmail.com, akpm@linux-foundation.org, jbaron@redhat.com,
    linux-kernel@vger.kernel.org, s-uchida@ap.jp.nec.com,
    righi.andrea@gmail.com, containers@lists.linux-foundation.org
Subject: [dm-devel] [PATCH] io-controller: implement per group request
 allocation limitation

Hi Vivek,

This patch exports a cgroup-based per-group request limits interface
and removes the global one. Now we can use this interface to set
different request allocation limits for different groups.
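For illustration, user space would drive the new knob roughly like the
minimal sketch below. The mount point (/cgroup) and the prefixed file
name (io.nr_group_requests) are assumptions for this example; they
depend on how the io controller hierarchy is actually mounted.

/*
 * Minimal user-space sketch: set a per-group request limit by writing
 * to the new cgroup file. Path and file name are assumed, not taken
 * from the patch.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/cgroup/grp1/io.nr_group_requests", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	fprintf(f, "%lu\n", 64UL);	/* allow grp1 at most 64 requests */
	fclose(f);
	return 0;
}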
Signed-off-by: Gui Jianfeng
---
 block/blk-core.c     |   23 ++++++++++--
 block/blk-settings.c |    1 -
 block/blk-sysfs.c    |   43 -----------------------
 block/elevator-fq.c  |   94 ++++++++++++++++++++++++++++++++++++++++++++++---
 block/elevator-fq.h  |    4 ++
 5 files changed, 111 insertions(+), 54 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 79fe6a9..7010b76 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -722,13 +722,20 @@ static void ioc_set_batching(struct request_queue *q, struct io_context *ioc)
 static void __freed_request(struct request_queue *q, int sync,
 					struct request_list *rl)
 {
+	struct io_group *iog;
+	unsigned long nr_group_requests;
+
 	if (q->rq_data.count[sync] < queue_congestion_off_threshold(q))
 		blk_clear_queue_congested(q, sync);
 
 	if (q->rq_data.count[sync] + 1 <= q->nr_requests)
 		blk_clear_queue_full(q, sync);
 
-	if (rl->count[sync] + 1 <= q->nr_group_requests) {
+	iog = rl_iog(rl);
+
+	nr_group_requests = get_group_requests(q, iog);
+
+	if (nr_group_requests && rl->count[sync] + 1 <= nr_group_requests) {
 		if (waitqueue_active(&rl->wait[sync]))
 			wake_up(&rl->wait[sync]);
 	}
@@ -828,6 +835,8 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
 	const bool is_sync = rw_is_sync(rw_flags) != 0;
 	int may_queue, priv;
 	int sleep_on_global = 0;
+	struct io_group *iog;
+	unsigned long nr_group_requests;
 
 	may_queue = elv_may_queue(q, rw_flags);
 	if (may_queue == ELV_MQUEUE_NO)
@@ -843,7 +852,12 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
 		if (q->rq_data.count[is_sync]+1 >= q->nr_requests)
 			blk_set_queue_full(q, is_sync);
 
-	if (rl->count[is_sync]+1 >= q->nr_group_requests) {
+	iog = rl_iog(rl);
+
+	nr_group_requests = get_group_requests(q, iog);
+
+	if (nr_group_requests &&
+	    rl->count[is_sync]+1 >= nr_group_requests) {
 		ioc = current_io_context(GFP_ATOMIC, q->node);
 		/*
 		 * The queue request descriptor group will fill after this
@@ -852,7 +866,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
 		 * This process will be allowed to complete a batch of
 		 * requests, others will be blocked.
		 */
-		if (rl->count[is_sync] <= q->nr_group_requests)
+		if (rl->count[is_sync] <= nr_group_requests)
 			ioc_set_batching(q, ioc);
 		else {
 			if (may_queue != ELV_MQUEUE_MUST
@@ -898,7 +912,8 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
 	 * from per group request list
 	 */
 
-	if (rl->count[is_sync] >= (3 * q->nr_group_requests / 2))
+	if (nr_group_requests &&
+	    rl->count[is_sync] >= (3 * nr_group_requests / 2))
 		goto out;
 
 	rl->starved[is_sync] = 0;
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 78b8aec..bd582a7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -148,7 +148,6 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
 	 * set defaults
 	 */
 	q->nr_requests = BLKDEV_MAX_RQ;
-	q->nr_group_requests = BLKDEV_MAX_GROUP_RQ;
 	q->make_request_fn = mfn;
 	blk_queue_dma_alignment(q, 511);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 92b9f25..706d852 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -78,40 +78,8 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
 	return ret;
 }
 
 #ifdef CONFIG_GROUP_IOSCHED
-static ssize_t queue_group_requests_show(struct request_queue *q, char *page)
-{
-	return queue_var_show(q->nr_group_requests, (page));
-}
-
 extern void elv_io_group_congestion_threshold(struct request_queue *q,
 					      struct io_group *iog);
-
-static ssize_t
-queue_group_requests_store(struct request_queue *q, const char *page,
-			   size_t count)
-{
-	struct hlist_node *n;
-	struct io_group *iog;
-	struct elv_fq_data *efqd;
-	unsigned long nr;
-	int ret = queue_var_store(&nr, page, count);
-
-	if (nr < BLKDEV_MIN_RQ)
-		nr = BLKDEV_MIN_RQ;
-
-	spin_lock_irq(q->queue_lock);
-
-	q->nr_group_requests = nr;
-
-	efqd = &q->elevator->efqd;
-
-	hlist_for_each_entry(iog, n, &efqd->group_list, elv_data_node) {
-		elv_io_group_congestion_threshold(q, iog);
-	}
-
-	spin_unlock_irq(q->queue_lock);
-	return ret;
-}
 #endif
 
 static ssize_t queue_ra_show(struct request_queue *q, char *page)
@@ -278,14 +246,6 @@ static struct queue_sysfs_entry queue_requests_entry = {
 	.store = queue_requests_store,
 };
 
-#ifdef CONFIG_GROUP_IOSCHED
-static struct queue_sysfs_entry queue_group_requests_entry = {
-	.attr = {.name = "nr_group_requests", .mode = S_IRUGO | S_IWUSR },
-	.show = queue_group_requests_show,
-	.store = queue_group_requests_store,
-};
-#endif
-
 static struct queue_sysfs_entry queue_ra_entry = {
 	.attr = {.name = "read_ahead_kb", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_ra_show,
@@ -360,9 +320,6 @@ static struct queue_sysfs_entry queue_iostats_entry = {
 
 static struct attribute *default_attrs[] = {
 	&queue_requests_entry.attr,
-#ifdef CONFIG_GROUP_IOSCHED
-	&queue_group_requests_entry.attr,
-#endif
 	&queue_ra_entry.attr,
 	&queue_max_hw_sectors_entry.attr,
 	&queue_max_sectors_entry.attr,
diff --git a/block/elevator-fq.c b/block/elevator-fq.c
index 29392e7..bfb0210 100644
--- a/block/elevator-fq.c
+++ b/block/elevator-fq.c
@@ -59,6 +59,35 @@ elv_release_ioq(struct elevator_queue *eq, struct io_queue **ioq_ptr);
 #define for_each_entity_safe(entity, parent) \
 	for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
 
+unsigned long get_group_requests(struct request_queue *q,
+				 struct io_group *iog)
+{
+	struct cgroup_subsys_state *css;
+	struct io_cgroup *iocg;
+	unsigned long nr_group_requests;
+
+	if (!iog)
+		return q->nr_requests;
+
+	rcu_read_lock();
+
+	if (!iog->iocg_id) {
+		nr_group_requests = 0;
+		goto out;
+	}
+
+	css = css_lookup(&io_subsys, iog->iocg_id);
+	if (!css) {
+		nr_group_requests = 0;
+		goto out;
+	}
+
+	iocg = container_of(css, struct io_cgroup, css);
+	nr_group_requests = iocg->nr_group_requests;
+out:
+	rcu_read_unlock();
+	return nr_group_requests;
+}
 
 static struct io_entity *bfq_lookup_next_entity(struct io_sched_data *sd,
 						int extract);
@@ -1257,14 +1286,17 @@ void elv_io_group_congestion_threshold(struct request_queue *q,
 					struct io_group *iog)
 {
 	int nr;
+	unsigned long nr_group_requests;
 
-	nr = q->nr_group_requests - (q->nr_group_requests / 8) + 1;
-	if (nr > q->nr_group_requests)
-		nr = q->nr_group_requests;
+	nr_group_requests = get_group_requests(q, iog);
+
+	nr = nr_group_requests - (nr_group_requests / 8) + 1;
+	if (nr > nr_group_requests)
+		nr = nr_group_requests;
 	iog->nr_congestion_on = nr;
 
-	nr = q->nr_group_requests - (q->nr_group_requests / 8)
-		- (q->nr_group_requests / 16) - 1;
+	nr = nr_group_requests - (nr_group_requests / 8)
+		- (nr_group_requests / 16) - 1;
 	if (nr < 1)
 		nr = 1;
 	iog->nr_congestion_off = nr;
@@ -1283,6 +1315,7 @@ int elv_io_group_congested(struct request_queue *q, struct page *page, int sync)
 {
 	struct io_group *iog;
 	int ret = 0;
+	unsigned long nr_group_requests;
 
 	rcu_read_lock();
 
@@ -1300,10 +1333,11 @@ int elv_io_group_congested(struct request_queue *q, struct page *page, int sync)
 	}
 
 	ret = elv_is_iog_congested(q, iog, sync);
+	nr_group_requests = get_group_requests(q, iog);
 	if (ret)
 		elv_log_iog(&q->elevator->efqd, iog, "iog congested=%d sync=%d"
 			" rl.count[sync]=%d nr_group_requests=%d",
-			ret, sync, iog->rl.count[sync], q->nr_group_requests);
+			ret, sync, iog->rl.count[sync], nr_group_requests);
 	rcu_read_unlock();
 	return ret;
 }
@@ -1549,6 +1583,48 @@ free_buf:
 	return ret;
 }
 
+static u64 io_cgroup_nr_requests_read(struct cgroup *cgroup,
+				      struct cftype *cftype)
+{
+	struct io_cgroup *iocg;
+	u64 ret;
+
+	if (!cgroup_lock_live_group(cgroup))
+		return -ENODEV;
+
+	iocg = cgroup_to_io_cgroup(cgroup);
+	spin_lock_irq(&iocg->lock);
+	ret = iocg->nr_group_requests;
+	spin_unlock_irq(&iocg->lock);
+
+	cgroup_unlock();
+
+	return ret;
+}
+
+static int io_cgroup_nr_requests_write(struct cgroup *cgroup,
+				       struct cftype *cftype,
+				       u64 val)
+{
+	struct io_cgroup *iocg;
+
+	if (val < BLKDEV_MIN_RQ)
+		val = BLKDEV_MIN_RQ;
+
+	if (!cgroup_lock_live_group(cgroup))
+		return -ENODEV;
+
+	iocg = cgroup_to_io_cgroup(cgroup);
+
+	spin_lock_irq(&iocg->lock);
+	iocg->nr_group_requests = (unsigned long)val;
+	spin_unlock_irq(&iocg->lock);
+
+	cgroup_unlock();
+
+	return 0;
+}
+
 #define SHOW_FUNCTION(__VAR)						\
 static u64 io_cgroup_##__VAR##_read(struct cgroup *cgroup,		\
 					struct cftype *cftype)		\
@@ -1735,6 +1811,11 @@ static int io_cgroup_disk_dequeue_read(struct cgroup *cgroup,
 
 struct cftype bfqio_files[] = {
 	{
+		.name = "nr_group_requests",
+		.read_u64 = io_cgroup_nr_requests_read,
+		.write_u64 = io_cgroup_nr_requests_write,
+	},
+	{
 		.name = "policy",
 		.read_seq_string = io_cgroup_policy_read,
 		.write_string = io_cgroup_policy_write,
@@ -1790,6 +1871,7 @@ static struct cgroup_subsys_state *iocg_create(struct cgroup_subsys *subsys,
 
 	spin_lock_init(&iocg->lock);
 	INIT_HLIST_HEAD(&iocg->group_data);
+	iocg->nr_group_requests = BLKDEV_MAX_GROUP_RQ;
 	iocg->weight = IO_DEFAULT_GRP_WEIGHT;
 	iocg->ioprio_class = IO_DEFAULT_GRP_CLASS;
 	INIT_LIST_HEAD(&iocg->policy_list);
diff --git a/block/elevator-fq.h b/block/elevator-fq.h
index f089a55..df077d0 100644
--- a/block/elevator-fq.h
+++ b/block/elevator-fq.h
@@ -308,6 +308,7 @@ struct io_cgroup {
 	unsigned int weight;
 	unsigned short ioprio_class;
 
+	unsigned long nr_group_requests;
 	/* list of io_policy_node */
 	struct list_head policy_list;
@@ -386,6 +387,9 @@ struct elv_fq_data {
 	unsigned int fairness;
 };
 
+extern unsigned long get_group_requests(struct request_queue *q,
+					struct io_group *iog);
+
 /* Logging facilities. */
 #ifdef CONFIG_DEBUG_GROUP_IOSCHED
 #define elv_log_ioq(efqd, ioq, fmt, args...) \
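For reviewers, the limit semantics the patch implements can be
summarized outside the kernel tree. The sketch below (plain user-space
C with simplified stand-in types, not kernel code) mirrors the logic
above: a NULL io_group falls back to the queue-wide q->nr_requests, a
group whose cgroup has gone away yields 0, and the callers guard each
comparison with "nr_group_requests &&" so a vanished cgroup never gates
allocation on a stale limit.

/*
 * Standalone sketch of the per-group limit semantics; all types here
 * are simplified stand-ins for illustration only.
 */
#include <stdbool.h>
#include <stdio.h>

struct queue { unsigned long nr_requests; };
struct group { bool cgroup_alive; unsigned long nr_group_requests; };

/* Mirrors get_group_requests(): NULL group -> queue-wide limit,
 * dead cgroup -> 0, otherwise the cgroup's own limit. */
static unsigned long group_limit(struct queue *q, struct group *g)
{
	if (!g)
		return q->nr_requests;
	if (!g->cgroup_alive)
		return 0;
	return g->nr_group_requests;
}

/* Mirrors the check in get_request(): a limit of 0 means the
 * per-group check is skipped entirely. */
static bool group_full(struct queue *q, struct group *g,
		       unsigned long in_flight)
{
	unsigned long limit = group_limit(q, g);

	return limit && in_flight + 1 >= limit;
}

int main(void)
{
	struct queue q = { .nr_requests = 128 };
	struct group g = { .cgroup_alive = true, .nr_group_requests = 64 };

	printf("full at 63 in flight? %d\n", group_full(&q, &g, 63)); /* 1 */
	printf("full at 10 in flight? %d\n", group_full(&q, &g, 10)); /* 0 */
	return 0;
}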