From patchwork Wed Aug 23 18:15:15 2017
X-Patchwork-Submitter: Shaohua Li
X-Patchwork-Id: 9918163
From: Shaohua Li
To: linux-block@vger.kernel.org
Cc: axboe@kernel.dk, tj@kernel.org, lizefan@huawei.com, Kernel-team@fb.com, Shaohua Li
Subject: [RFC] block/loop: make loop cgroup aware
Date: Wed, 23 Aug 2017 11:15:15 -0700

Not a merge request, for discussion only.

The loop block device handles IO in a separate thread. The IO it
actually dispatches isn't cloned from the IO the loop device received,
so the dispatched IO loses the cgroup context.

I'm ignoring the buffered IO case for now, which is quite complicated.
Making the loop thread aware of the cgroup context doesn't really help
there: the loop device only writes to a single file, and in the current
writeback cgroup implementation a file can only belong to one cgroup.

For the direct IO case, we could work around the issue in theory. For
example, say we assign cgroup1 5M/s of bandwidth on the loop device and
cgroup2 10M/s. We can create a special cgroup for the loop thread and
assign it at least 15M/s (the sum of the two limits) on the underlying
disk. This way the two cgroups are throttled correctly, but it is
tricky to set up.

This patch tries to address the issue. When the loop thread is handling
IO, it declares that the IO is on behalf of the original task; the
block layer then uses the original task to find the cgroup. The concept
probably works for other scenarios too, but I haven't made it generic
yet.
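
To make the "on behalf of" idea concrete, here is a minimal userspace
sketch of the lookup. It is only a model under stated assumptions:
struct task, effective_io_cgroup() and worker_submit() are hypothetical
names for illustration and are not kernel APIs, and the integer cgroup
id stands in for the real css lookup.

```c
/* Userspace model only; none of these types or helpers are the kernel's. */
#include <assert.h>
#include <stddef.h>

struct task {
	int io_cgroup;            /* stand-in for the task's io cgroup */
	struct task *cgroup_task; /* task we issue IO on behalf of, or NULL */
};

/* Prefer the delegating task's cgroup if one is set, else our own. */
static int effective_io_cgroup(const struct task *t)
{
	assert(t != NULL);
	return t->cgroup_task ? t->cgroup_task->io_cgroup : t->io_cgroup;
}

/*
 * Model of the worker thread issuing one IO for 'origin'.
 * Returns the cgroup the IO would be charged to.
 */
static int worker_submit(struct task *worker, struct task *origin)
{
	int cg;

	if (origin)
		worker->cgroup_task = origin;	/* declare "on behalf of" */
	cg = effective_io_cgroup(worker);	/* lookup done at submit time */
	worker->cgroup_task = NULL;		/* clear once submission returns */
	return cg;
}
```

In this model, a worker in cgroup 0 that submits for a task in cgroup 1
is charged to cgroup 1, and falls back to its own cgroup when no
original task is recorded.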

Signed-off-by: Shaohua Li
---
 block/bio.c                |  5 ++++-
 drivers/block/loop.c       | 14 +++++++++++++-
 drivers/block/loop.h       |  1 +
 include/linux/blk-cgroup.h |  3 ++-
 include/linux/sched.h      |  1 +
 5 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e241bbc..8f0df3c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -2058,7 +2058,10 @@ int bio_associate_current(struct bio *bio)
 
 	get_io_context_active(ioc);
 	bio->bi_ioc = ioc;
-	bio->bi_css = task_get_css(current, io_cgrp_id);
+	if (current->cgroup_task)
+		bio->bi_css = task_get_css(current->cgroup_task, io_cgrp_id);
+	else
+		bio->bi_css = task_get_css(current, io_cgrp_id);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(bio_associate_current);
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ef83349..fefede3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -77,7 +77,7 @@
 #include
 #include
 #include "loop.h"
-
+#include
 #include
 
 static DEFINE_IDR(loop_index_idr);
@@ -471,6 +471,8 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
 {
 	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
 
+	if (cmd->cgroup_task)
+		put_task_struct(cmd->cgroup_task);
 	cmd->ret = ret;
 	blk_mq_complete_request(cmd->rq);
 }
@@ -502,11 +504,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	cmd->iocb.ki_complete = lo_rw_aio_complete;
 	cmd->iocb.ki_flags = IOCB_DIRECT;
 
+	if (cmd->cgroup_task)
+		current->cgroup_task = cmd->cgroup_task;
+
 	if (rw == WRITE)
 		ret = call_write_iter(file, &cmd->iocb, &iter);
 	else
 		ret = call_read_iter(file, &cmd->iocb, &iter);
 
+	current->cgroup_task = NULL;
+
 	if (ret != -EIOCBQUEUED)
 		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
 	return 0;
@@ -1705,6 +1712,11 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 		break;
 	}
 
+	if (cmd->use_aio) {
+		cmd->cgroup_task = current;
+		get_task_struct(current);
+	} else
+		cmd->cgroup_task = NULL;
 	kthread_queue_work(&lo->worker, &cmd->work);
 
 	return BLK_STS_OK;
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 2c096b9..eb98d4d 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -73,6 +73,7 @@ struct loop_cmd {
 	bool use_aio; /* use AIO interface to handle I/O */
 	long ret;
 	struct kiocb iocb;
+	struct task_struct *cgroup_task;
 };
 
 /* Support for loadable transfer modules */
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 9d92153..38a5517 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -232,7 +232,8 @@ static inline struct blkcg *bio_blkcg(struct bio *bio)
 {
 	if (bio && bio->bi_css)
 		return css_to_blkcg(bio->bi_css);
-	return task_blkcg(current);
+	return task_blkcg(current->cgroup_task ?
+			  current->cgroup_task : current);
 }
 
 static inline struct cgroup_subsys_state *
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8337e2d..a5958b0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -897,6 +897,7 @@ struct task_struct {
 	struct css_set __rcu *cgroups;
 	/* cg_list protected by css_set_lock and tsk->alloc_lock: */
 	struct list_head cg_list;
+	struct task_struct *cgroup_task;
 #endif
 #ifdef CONFIG_INTEL_RDT_A
 	int closid;