From patchwork Thu Aug 29 07:18:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tero Kristo
X-Patchwork-Id: 13782704
From: Tero Kristo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 1/2] bdev: add support for CPU latency PM QoS tuning
Date: Thu, 29 Aug 2024 10:18:19 +0300
Message-ID: <20240829075423.1345042-2-tero.kristo@linux.intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240829075423.1345042-1-tero.kristo@linux.intel.com>
References: <20240829075423.1345042-1-tero.kristo@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

Add support for limiting CPU latency while block IO is running. When a
block IO is started, it will add a user configurable CPU latency limit in
place (if any.)
The limit is removed with a configurable timeout mechanism.

Signed-off-by: Tero Kristo
---
 block/bdev.c              | 51 +++++++++++++++++++++++++++++++++++++++
 block/bio.c               |  2 ++
 block/blk.h               |  2 ++
 include/linux/blk_types.h | 12 +++++++++
 4 files changed, 67 insertions(+)

diff --git a/block/bdev.c b/block/bdev.c
index 353677ac49b3..8f20681a4ea6 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -405,10 +405,18 @@ void __init bdev_cache_init(void)
 	blockdev_superblock = blockdev_mnt->mnt_sb; /* For writeback */
 }

+static void bdev_pm_qos_work(struct work_struct *work)
+{
+	struct bdev_cpu_latency_qos *qos =
+		container_of(work, struct bdev_cpu_latency_qos, work.work);
+	dev_pm_qos_remove_request(&qos->req);
+}
+
 struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 {
 	struct block_device *bdev;
 	struct inode *inode;
+	int cpu;

 	inode = new_inode(blockdev_superblock);
 	if (!inode)
@@ -433,6 +441,16 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 		return NULL;
 	}
 	bdev->bd_disk = disk;
+	bdev->bd_pm_qos = alloc_percpu(struct bdev_cpu_latency_qos);
+	if (!bdev->bd_pm_qos) {
+		free_percpu(bdev->bd_stats);
+		iput(inode);
+		return NULL;
+	}
+	for_each_possible_cpu(cpu)
+		INIT_DELAYED_WORK(per_cpu_ptr(&bdev->bd_pm_qos->work, cpu),
+				  bdev_pm_qos_work);
+	bdev->cpu_lat_limit = -1;
 	return bdev;
 }

@@ -462,6 +480,19 @@ void bdev_unhash(struct block_device *bdev)

 void bdev_drop(struct block_device *bdev)
 {
+	int cpu;
+	struct bdev_cpu_latency_qos *qos;
+
+	for_each_possible_cpu(cpu) {
+		qos = per_cpu_ptr(bdev->bd_pm_qos, cpu);
+		if (dev_pm_qos_request_active(&qos->req)) {
+			cancel_delayed_work(&qos->work);
+			dev_pm_qos_remove_request(&qos->req);
+		}
+	}
+
+	free_percpu(bdev->bd_pm_qos);
+
 	iput(BD_INODE(bdev));
 }

@@ -1281,6 +1312,26 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 	blkdev_put_no_open(bdev);
 }

+void bdev_update_cpu_latency_pm_qos(struct block_device *bdev)
+{
+	int cpu;
+	struct bdev_cpu_latency_qos *qos;
+
+	if (!bdev || bdev->cpu_lat_limit < 0)
+		return;
+
+	cpu = raw_smp_processor_id();
+	qos = per_cpu_ptr(bdev->bd_pm_qos, cpu);
+
+	if (!dev_pm_qos_request_active(&qos->req))
+		dev_pm_qos_add_request(get_cpu_device(cpu), &qos->req,
+				       DEV_PM_QOS_RESUME_LATENCY,
+				       bdev->cpu_lat_limit);
+
+	mod_delayed_work(system_wq, &qos->work,
+			 msecs_to_jiffies(bdev->cpu_lat_timeout));
+}
+
 bool disk_live(struct gendisk *disk)
 {
 	return !inode_unhashed(BD_INODE(disk->part0));
diff --git a/block/bio.c b/block/bio.c
index e9e809a63c59..6c46d75345d7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -282,6 +282,8 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_max_vecs = max_vecs;
 	bio->bi_io_vec = table;
 	bio->bi_pool = NULL;
+
+	bdev_update_cpu_latency_pm_qos(bio->bi_bdev);
 }
 EXPORT_SYMBOL(bio_init);

diff --git a/block/blk.h b/block/blk.h
index 189bc25beb50..dda2a188984b 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -516,6 +516,8 @@ void drop_partition(struct block_device *part);

 void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors);

+void bdev_update_cpu_latency_pm_qos(struct block_device *bdev);
+
 struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
 		struct lock_class_key *lkclass);

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 781c4500491b..0ed29603eaa9 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include

 struct bio_set;
 struct bio;
@@ -38,6 +39,11 @@ struct bio_crypt_ctx;
 #define PAGE_SECTORS (1 << PAGE_SECTORS_SHIFT)
 #define SECTOR_MASK (PAGE_SECTORS - 1)

+struct bdev_cpu_latency_qos {
+	struct dev_pm_qos_request req;
+	struct delayed_work work;
+};
+
 struct block_device {
 	sector_t bd_start_sect;
 	sector_t bd_nr_sectors;
@@ -71,6 +77,12 @@ struct block_device {

 	struct partition_meta_info *bd_meta_info;
 	int bd_writers;
+
+	/* For preventing deep idle during block I/O */
+	struct bdev_cpu_latency_qos __percpu *bd_pm_qos;
+	int cpu_lat_timeout;
+	int cpu_lat_limit;
+
 	/*
 	 * keep this out-of-line as it's both big and not needed in the fast
 	 * path

From patchwork Thu Aug 29 07:18:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tero Kristo
X-Patchwork-Id: 13782705
From: Tero Kristo
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 2/2] block/genhd: add sysfs knobs for the CPU latency PM QoS settings
Date: Thu, 29 Aug 2024 10:18:20 +0300
Message-ID: <20240829075423.1345042-3-tero.kristo@linux.intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240829075423.1345042-1-tero.kristo@linux.intel.com>
References: <20240829075423.1345042-1-tero.kristo@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

Add sysfs knobs for the following parameters:

  cpu_lat_limit_us: for limiting the CPU latency to given value when block
                    IO is running
  cpu_lat_timeout_ms: for clearing up the CPU latency limit after block IO
                      is complete

This can be used to prevent the CPU from entering deep idle states when
block IO is running and waiting for an interrupt, potentially causing
large latencies to the operation.

Signed-off-by: Tero Kristo
---
 block/genhd.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 8f1f3c6b4d67..2fbd967a3e36 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1046,6 +1046,48 @@ static ssize_t partscan_show(struct device *dev,
 	return sprintf(buf, "%u\n", disk_has_partscan(dev_to_disk(dev)));
 }

+static ssize_t cpu_lat_limit_us_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+
+	return sprintf(buf, "%d\n", bdev->cpu_lat_limit);
+}
+
+static ssize_t cpu_lat_limit_us_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	int i;
+
+	if (count > 0 && !kstrtoint(buf, 10, &i))
+		bdev->cpu_lat_limit = i;
+
+	return count;
+}
+
+static ssize_t cpu_lat_timeout_ms_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+
+	return sprintf(buf, "%d\n", bdev->cpu_lat_timeout);
+}
+
+static ssize_t cpu_lat_timeout_ms_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	int i;
+
+	if (count > 0 && !kstrtoint(buf, 10, &i))
+		bdev->cpu_lat_timeout = i;
+
+	return count;
+}
+
 static DEVICE_ATTR(range, 0444, disk_range_show, NULL);
 static DEVICE_ATTR(ext_range, 0444, disk_ext_range_show, NULL);
 static DEVICE_ATTR(removable, 0444, disk_removable_show, NULL);
@@ -1060,6 +1102,8 @@ static DEVICE_ATTR(inflight, 0444, part_inflight_show, NULL);
 static DEVICE_ATTR(badblocks, 0644, disk_badblocks_show, disk_badblocks_store);
 static DEVICE_ATTR(diskseq, 0444, diskseq_show, NULL);
 static DEVICE_ATTR(partscan, 0444, partscan_show, NULL);
+static DEVICE_ATTR_RW(cpu_lat_limit_us);
+static DEVICE_ATTR_RW(cpu_lat_timeout_ms);

 #ifdef CONFIG_FAIL_MAKE_REQUEST
 ssize_t part_fail_show(struct device *dev,
@@ -1111,6 +1155,8 @@ static struct attribute *disk_attrs[] = {
 	&dev_attr_events_poll_msecs.attr,
 	&dev_attr_diskseq.attr,
 	&dev_attr_partscan.attr,
+	&dev_attr_cpu_lat_limit_us.attr,
+	&dev_attr_cpu_lat_timeout_ms.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	&dev_attr_fail.attr,
 #endif
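
Note on patch 1/2: the interplay between the QoS request and its delayed
work in bdev_update_cpu_latency_pm_qos() may be easier to follow in
isolation. What follows is a minimal userspace sketch, not kernel code,
assuming a single CPU and a caller-supplied millisecond clock. The names
struct qos_slot, struct model_bdev, model_update(), model_expire() and
model_demo() are hypothetical and introduced only for illustration; they
mirror the fields and control flow visible in the diff and nothing more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one per-CPU bdev_cpu_latency_qos slot. */
struct qos_slot {
	bool active;      /* stands in for dev_pm_qos_request_active() */
	int limit_us;     /* value passed when the request was added */
	long expires_ms;  /* stands in for the delayed work deadline */
};

/* Hypothetical model of the two tunables added to struct block_device. */
struct model_bdev {
	int cpu_lat_limit;    /* negative means "no limit", as in the patch */
	int cpu_lat_timeout;  /* milliseconds */
	struct qos_slot slot; /* single CPU only, for brevity */
};

/* Mirrors bdev_update_cpu_latency_pm_qos(): add once, always re-arm. */
static void model_update(struct model_bdev *b, long now_ms)
{
	if (!b || b->cpu_lat_limit < 0)
		return;

	/* Model assumption only: the patch itself does not check this. */
	assert(b->cpu_lat_timeout >= 0);

	if (!b->slot.active) {
		b->slot.active = true;
		b->slot.limit_us = b->cpu_lat_limit;
	}

	/* Like mod_delayed_work(): a pending deadline is pushed out. */
	b->slot.expires_ms = now_ms + b->cpu_lat_timeout;
}

/* Mirrors bdev_pm_qos_work(): drop the request once the deadline passes. */
static void model_expire(struct model_bdev *b, long now_ms)
{
	if (b->slot.active && now_ms >= b->slot.expires_ms)
		b->slot.active = false;
}

/* Walks one scenario; returns 0 when every step behaves as described. */
static int model_demo(void)
{
	struct model_bdev b = { .cpu_lat_limit = -1, .cpu_lat_timeout = 3 };

	model_update(&b, 0);             /* limit unset: nothing is added */
	if (b.slot.active)
		return 1;

	b.cpu_lat_limit = 10;
	model_update(&b, 0);             /* first bio: added, deadline 3 */
	model_update(&b, 2);             /* second bio: deadline moves to 5 */
	model_expire(&b, 3);
	if (!b.slot.active)
		return 2;                /* must still be held at t=3 */

	b.cpu_lat_limit = 50;            /* tunable changed while active */
	model_update(&b, 4);             /* only the deadline moves, to 7 */
	if (b.slot.limit_us != 10)
		return 3;                /* old limit stays in force */

	model_expire(&b, 7);             /* deadline reached: request dropped */
	if (b.slot.active)
		return 4;

	model_update(&b, 8);             /* next bio picks up the new limit */
	return b.slot.limit_us == 50 ? 0 : 5;
}
```

Calling model_demo() from a small main() exercises the scenario. The
third step reflects what the diff appears to do: because the request is
only added when it is not already active, a limit written through sysfs
while I/O is in flight would not reach that CPU until the current request
has timed out.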