From patchwork Thu Jun 16 21:22:08 2022
From: Stefan Roesch
Subject: [PATCH v9 01/14] mm: Move starting of background writeback into the main balancing loop
Date: Thu, 16 Jun 2022 14:22:08 -0700
Message-ID: <20220616212221.2024518-2-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

From: Jan Kara

Currently we start background writeback, if we are over the background
threshold, only after exiting the main loop in balance_dirty_pages().
The decision may therefore be based on already stale values (we may
have slept for a significant amount of time), and it is also
inconvenient for the refactoring needed for async dirty throttling.
Move the check into the main waiting loop.
Signed-off-by: Jan Kara
Signed-off-by: Stefan Roesch
---
 mm/page-writeback.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 55c2776ae699..e59c523aed1a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1627,6 +1627,19 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 			}
 		}

+		/*
+		 * In laptop mode, we wait until hitting the higher threshold
+		 * before starting background writeout, and then write out all
+		 * the way down to the lower threshold. So slow writers cause
+		 * minimal disk activity.
+		 *
+		 * In normal mode, we start background writeout at the lower
+		 * background_thresh, to keep the amount of dirty memory low.
+		 */
+		if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
+		    !writeback_in_progress(wb))
+			wb_start_background_writeback(wb);
+
 		/*
 		 * Throttle it only when the background writeback cannot
 		 * catch-up. This avoids (excessively) small writeouts
@@ -1657,6 +1670,7 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 			break;
 		}

+		/* Start writeback even when in laptop mode */
 		if (unlikely(!writeback_in_progress(wb)))
 			wb_start_background_writeback(wb);

@@ -1823,23 +1837,6 @@ static void balance_dirty_pages(struct bdi_writeback *wb,

 	if (!dirty_exceeded && wb->dirty_exceeded)
 		wb->dirty_exceeded = 0;
-
-	if (writeback_in_progress(wb))
-		return;
-
-	/*
-	 * In laptop mode, we wait until hitting the higher threshold before
-	 * starting background writeout, and then write out all the way down
-	 * to the lower threshold. So slow writers cause minimal disk activity.
-	 *
-	 * In normal mode, we start background writeout at the lower
-	 * background_thresh, to keep the amount of dirty memory low.
-	 */
-	if (laptop_mode)
-		return;
-
-	if (nr_reclaimable > gdtc->bg_thresh)
-		wb_start_background_writeback(wb);
 }

 static DEFINE_PER_CPU(int, bdp_ratelimits);
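The structural point generalizes beyond the kernel: a threshold check
made once after a sleeping loop sees stale counters, while the same
check made at the top of each iteration sees fresh ones. A
self-contained user-space C sketch of the "after" shape (all names
here are illustrative stand-ins, not kernel symbols):

#include <stdbool.h>
#include <unistd.h>

static long bg_thresh = 100;
static long dirty = 250;               /* stand-in for the dirty counter */
static bool wb_running;

static long sample_dirty(void)            { return dirty; }
static void start_background_wb(void)     { wb_running = true; }
static void writer_sleep(void)            { usleep(1000); dirty -= 50; }

static void balance_loop(void)
{
	for (;;) {
		long nr = sample_dirty();  /* fresh value each iteration */

		/*
		 * The check now lives inside the loop, so the decision to
		 * kick background writeback is based on the value we just
		 * sampled -- not on whatever it was before the last sleep.
		 */
		if (nr > bg_thresh && !wb_running)
			start_background_wb();

		if (nr <= bg_thresh)       /* balanced again, done */
			break;

		writer_sleep();            /* may sleep a long time */
	}
}

int main(void)
{
	balance_loop();
	return 0;
}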
From patchwork Thu Jun 16 21:22:09 2022
From: Stefan Roesch
Subject: [PATCH v9 02/14] mm: Move updates of dirty_exceeded into one place
Date: Thu, 16 Jun 2022 14:22:09 -0700
Message-ID: <20220616212221.2024518-3-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

From: Jan Kara

The transition of wb->dirty_exceeded from 0 to 1 happens before we go
to sleep in balance_dirty_pages(), while the transition from 1 to 0
happens when exiting from balance_dirty_pages(), possibly based on old
values. This does not make much sense, since wb->dirty_exceeded should
simply reflect whether wb is over the dirty limit, so that entry into
balance_dirty_pages() is ratelimited less while we are over the limit.
Move the two updates together.

Signed-off-by: Jan Kara
Signed-off-by: Stefan Roesch
---
 mm/page-writeback.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e59c523aed1a..90b1998c16a1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1729,8 +1729,8 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 			sdtc = mdtc;
 		}

-		if (dirty_exceeded && !wb->dirty_exceeded)
-			wb->dirty_exceeded = 1;
+		if (dirty_exceeded != wb->dirty_exceeded)
+			wb->dirty_exceeded = dirty_exceeded;

 		if (time_is_before_jiffies(READ_ONCE(wb->bw_time_stamp) +
 					   BANDWIDTH_INTERVAL))
@@ -1834,9 +1834,6 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 		if (fatal_signal_pending(current))
 			break;
 	}
-
-	if (!dirty_exceeded && wb->dirty_exceeded)
-		wb->dirty_exceeded = 0;
 }

 static DEFINE_PER_CPU(int, bdp_ratelimits);
From patchwork Thu Jun 16 21:22:10 2022
From: Stefan Roesch
Subject: [PATCH v9 03/14] mm: Add balance_dirty_pages_ratelimited_flags() function
Date: Thu, 16 Jun 2022 14:22:10 -0700
Message-ID: <20220616212221.2024518-4-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

From: Jan Kara

This adds the helper function balance_dirty_pages_ratelimited_flags():
balance_dirty_pages_ratelimited() with an additional flags parameter,
which is passed through to balance_dirty_pages(). For async buffered
writes the flag value will be BDP_ASYNC.

If balance_dirty_pages() gets called for an async buffered write, we
don't want to wait. Instead we need to indicate to the caller that
throttling is needed, so that it can stop writing and offload the rest
of the write to a context that can block.

balance_dirty_pages_ratelimited() is reimplemented on top of the new
helper.

Signed-off-by: Jan Kara
Signed-off-by: Stefan Roesch
Reviewed-by: Christoph Hellwig
Reported-by: kernel test robot
---
 include/linux/writeback.h |  7 ++++++
 mm/page-writeback.c       | 51 +++++++++++++++++++++++++++++++--------
 2 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index da21d63f70e2..b8c9610c2313 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -364,7 +364,14 @@ void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty);
 unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
 void wb_update_bandwidth(struct bdi_writeback *wb);
+
+/* Invoke balance dirty pages in async mode. */
+#define BDP_ASYNC	0x0001
+
 void balance_dirty_pages_ratelimited(struct address_space *mapping);
+int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
+		unsigned int flags);
+
 bool wb_over_bg_thresh(struct bdi_writeback *wb);

 typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 90b1998c16a1..bfca433640fc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1554,8 +1554,8 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
  * If we're over `background_thresh' then the writeback threads are woken to
  * perform some writeout.
  */
-static void balance_dirty_pages(struct bdi_writeback *wb,
-				unsigned long pages_dirtied)
+static int balance_dirty_pages(struct bdi_writeback *wb,
+			       unsigned long pages_dirtied, unsigned int flags)
 {
 	struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) };
 	struct dirty_throttle_control mdtc_stor = { MDTC_INIT(wb, &gdtc_stor) };
@@ -1575,6 +1575,7 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 	struct backing_dev_info *bdi = wb->bdi;
 	bool strictlimit = bdi->capabilities & BDI_CAP_STRICTLIMIT;
 	unsigned long start_time = jiffies;
+	int ret = 0;

 	for (;;) {
 		unsigned long now = jiffies;
@@ -1803,6 +1804,10 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 					  period,
 					  pause,
 					  start_time);
+		if (flags & BDP_ASYNC) {
+			ret = -EAGAIN;
+			break;
+		}
 		__set_current_state(TASK_KILLABLE);
 		wb->dirty_sleep = now;
 		io_schedule_timeout(pause);
@@ -1834,6 +1839,7 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
 		if (fatal_signal_pending(current))
 			break;
 	}
+	return ret;
 }

 static DEFINE_PER_CPU(int, bdp_ratelimits);
@@ -1855,27 +1861,34 @@ static DEFINE_PER_CPU(int, bdp_ratelimits);
 DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;

 /**
- * balance_dirty_pages_ratelimited - balance dirty memory state
- * @mapping: address_space which was dirtied
+ * balance_dirty_pages_ratelimited_flags - Balance dirty memory state.
+ * @mapping: address_space which was dirtied.
+ * @flags: BDP flags.
  *
  * Processes which are dirtying memory should call in here once for each page
  * which was newly dirtied. The function will periodically check the system's
  * dirty state and will initiate writeback if needed.
  *
- * Once we're over the dirty memory limit we decrease the ratelimiting
- * by a lot, to prevent individual processes from overshooting the limit
- * by (ratelimit_pages) each.
+ * See balance_dirty_pages_ratelimited() for details.
+ *
+ * Return: If @flags contains BDP_ASYNC, it may return -EAGAIN to
+ * indicate that memory is out of balance and the caller must wait
+ * for I/O to complete. Otherwise, it will return 0 to indicate
+ * that either memory was already in balance, or it was able to sleep
+ * until the amount of dirty memory returned to balance.
 */
-void balance_dirty_pages_ratelimited(struct address_space *mapping)
+int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
+					  unsigned int flags)
 {
 	struct inode *inode = mapping->host;
 	struct backing_dev_info *bdi = inode_to_bdi(inode);
 	struct bdi_writeback *wb = NULL;
 	int ratelimit;
+	int ret = 0;
 	int *p;

 	if (!(bdi->capabilities & BDI_CAP_WRITEBACK))
-		return;
+		return ret;

 	if (inode_cgwb_enabled(inode))
 		wb = wb_get_create_current(bdi, GFP_KERNEL);
@@ -1915,9 +1928,27 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
 	preempt_enable();

 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(wb, current->nr_dirtied);
+		ret = balance_dirty_pages(wb, current->nr_dirtied, flags);

 	wb_put(wb);
+	return ret;
+}
+
+/**
+ * balance_dirty_pages_ratelimited - balance dirty memory state.
+ * @mapping: address_space which was dirtied.
+ *
+ * Processes which are dirtying memory should call in here once for each page
+ * which was newly dirtied. The function will periodically check the system's
+ * dirty state and will initiate writeback if needed.
+ *
+ * Once we're over the dirty memory limit we decrease the ratelimiting
+ * by a lot, to prevent individual processes from overshooting the limit
+ * by (ratelimit_pages) each.
+ */
+void balance_dirty_pages_ratelimited(struct address_space *mapping)
+{
+	balance_dirty_pages_ratelimited_flags(mapping, 0);
 }
 EXPORT_SYMBOL(balance_dirty_pages_ratelimited);
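The calling convention the helper establishes is worth spelling out. A
hypothetical write-path fragment (only
balance_dirty_pages_ratelimited_flags() and BDP_ASYNC come from this
patch; the surrounding function is purely illustrative):

/* Hypothetical caller sketch, not part of this series. */
static int my_write_step(struct address_space *mapping, bool nowait)
{
	unsigned int bdp_flags = nowait ? BDP_ASYNC : 0;
	int ret;

	/*
	 * With BDP_ASYNC set this call never sleeps: instead of throttling
	 * the writer it returns -EAGAIN, and the caller is expected to stop
	 * writing and redo the request from a context that may block.
	 */
	ret = balance_dirty_pages_ratelimited_flags(mapping, bdp_flags);
	if (ret)
		return ret;	/* -EAGAIN: punt to a blocking context */

	/* ... copy data into the page cache and mark it dirty ... */
	return 0;
}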
From patchwork Thu Jun 16 21:22:12 2022
From: Stefan Roesch
Subject: [PATCH v9 05/14] iomap: Add async buffered write support
Date: Thu, 16 Jun 2022 14:22:12 -0700
Message-ID: <20220616212221.2024518-6-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

This adds async buffered write support to iomap.

It replaces the call to balance_dirty_pages_ratelimited() with a call
to balance_dirty_pages_ratelimited_flags(), which allows iomap to
specify whether the write request is async.

In addition, the call is moved to the beginning of iomap_write_iter().
If it stayed at the end of the function and the decision were made to
throttle writes, there would be no request left for io-uring to wait
on: with the check made before the write is issued, the write instead
returns -EAGAIN, and io-uring punts the request and processes it in an
io-worker. The cost of moving the call is that write throttling now
happens one page later.
Signed-off-by: Stefan Roesch
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
---
 fs/iomap/buffered-io.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3c97b713f831..83cf093fcb92 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -559,6 +559,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	loff_t block_size = i_blocksize(iter->inode);
 	loff_t block_start = round_down(pos, block_size);
 	loff_t block_end = round_up(pos + len, block_size);
+	unsigned int nr_blocks = i_blocks_per_folio(iter->inode, folio);
 	size_t from = offset_in_folio(folio, pos), to = from + len;
 	size_t poff, plen;

@@ -567,6 +568,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	folio_clear_error(folio);

 	iop = iomap_page_create(iter->inode, folio, iter->flags);
+	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
+		return -EAGAIN;

 	do {
 		iomap_adjust_read_range(iter->inode, folio, &block_start,
@@ -584,7 +587,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 				return -EIO;
 			folio_zero_segments(folio, poff, from, to, poff + plen);
 		} else {
-			int status = iomap_read_folio_sync(block_start, folio,
+			int status;
+
+			if (iter->flags & IOMAP_NOWAIT)
+				return -EAGAIN;
+
+			status = iomap_read_folio_sync(block_start, folio,
 					poff, plen, srcmap);
 			if (status)
 				return status;
@@ -613,6 +621,9 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
 	int status = 0;

+	if (iter->flags & IOMAP_NOWAIT)
+		fgp |= FGP_NOWAIT;
+
 	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
 	if (srcmap != &iter->iomap)
 		BUG_ON(pos + len > srcmap->offset + srcmap->length);
@@ -632,7 +643,7 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
 			fgp, mapping_gfp_mask(iter->inode->i_mapping));
 	if (!folio) {
-		status = -ENOMEM;
+		status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
 		goto out_no_page;
 	}
 	if (pos + len > folio_pos(folio) + folio_size(folio))
@@ -750,6 +761,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 	loff_t pos = iter->pos;
 	ssize_t written = 0;
 	long status = 0;
+	struct address_space *mapping = iter->inode->i_mapping;
+	unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;

 	do {
 		struct folio *folio;
@@ -762,6 +775,11 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		bytes = min_t(unsigned long, PAGE_SIZE - offset,
 						iov_iter_count(i));
 again:
+		status = balance_dirty_pages_ratelimited_flags(mapping,
+							       bdp_flags);
+		if (unlikely(status))
+			break;
+
 		if (bytes > length)
 			bytes = length;

@@ -770,6 +788,10 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		 * Otherwise there's a nasty deadlock on copying from the
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
+		 *
+		 * For async buffered writes the assumption is that the user
+		 * page has already been faulted in. This can be optimized by
+		 * faulting the user page.
 		 */
 		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
 			status = -EFAULT;
 			break;
@@ -781,7 +803,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 			break;

 		page = folio_file_page(folio, pos >> PAGE_SHIFT);
-		if (mapping_writably_mapped(iter->inode->i_mapping))
+		if (mapping_writably_mapped(mapping))
 			flush_dcache_page(page);

 		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
@@ -806,8 +828,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		pos += status;
 		written += status;
 		length -= status;
-
-		balance_dirty_pages_ratelimited(iter->inode->i_mapping);
 	} while (iov_iter_count(i) && length);

 	return written ? written : status;
@@ -825,6 +845,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
 	};
 	int ret;

+	if (iocb->ki_flags & IOCB_NOWAIT)
+		iter.flags |= IOMAP_NOWAIT;
+
 	while ((ret = iomap_iter(&iter, ops)) > 0)
 		iter.processed = iomap_write_iter(&iter, i);
 	if (iter.pos == iocb->ki_pos)
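Taken together, the iomap changes mean a filesystem ->write_iter() can
simply pass IOCB_NOWAIT through: iomap maps it to IOMAP_NOWAIT, and
every blocking point in the write path (dirty throttling, folio
lookup, read-in of partially mapped blocks) surfaces as -EAGAIN
instead of sleeping. A condensed, hypothetical caller (real
filesystems also handle locking and write checks, as the XFS patch
later in this series shows; my_fs_iomap_ops is a placeholder):

/* Hypothetical ->write_iter() sketch, not the series' exact code. */
static ssize_t my_fs_buffered_write(struct kiocb *iocb, struct iov_iter *from)
{
	/*
	 * iomap_file_buffered_write() copies IOCB_NOWAIT into the iomap
	 * iterator as IOMAP_NOWAIT; on success it advances iocb->ki_pos
	 * by the bytes written and returns that count, and a would-block
	 * condition comes back as -EAGAIN for io_uring to punt and retry.
	 */
	return iomap_file_buffered_write(iocb, from, &my_fs_iomap_ops);
}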
From patchwork Thu Jun 16 21:22:13 2022
From: Stefan Roesch
Subject: [PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter()
Date: Thu, 16 Jun 2022 14:22:13 -0700
Message-ID: <20220616212221.2024518-7-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

If iomap_write_iter() encounters -EAGAIN, return -EAGAIN to the caller.

Signed-off-by: Stefan Roesch
---
 fs/iomap/buffered-io.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 83cf093fcb92..f2e36240079f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -830,7 +830,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		length -= status;
 	} while (iov_iter_count(i) && length);

-	return written ? written : status;
+	if (status == -EAGAIN) {
+		iov_iter_revert(i, written);
+		return -EAGAIN;
+	}
+	if (written)
+		return written;
+	return status;
 }

 ssize_t
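The iov_iter_revert() is the subtle part: when -EAGAIN is returned,
iocb->ki_pos is not advanced, so rewinding the iterator by the bytes
already copied keeps the two in sync, and a retry issued later from a
blocking context replays the whole range rather than skipping bytes.
In sketch form (a hypothetical caller-side contract, not code from
this series):

/* Hypothetical retry wrapper illustrating the -EAGAIN contract. */
static ssize_t write_with_retry(struct kiocb *iocb, struct iov_iter *from,
				const struct iomap_ops *ops)
{
	ssize_t ret = iomap_file_buffered_write(iocb, from, ops);

	if (ret == -EAGAIN) {
		/*
		 * ki_pos was not advanced and the iterator was reverted,
		 * so re-issuing from a context that may block sees exactly
		 * the state of the original submission.
		 */
		iocb->ki_flags &= ~IOCB_NOWAIT;
		ret = iomap_file_buffered_write(iocb, from, ops);
	}
	return ret;
}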
From patchwork Thu Jun 16 21:22:16 2022
From: Stefan Roesch
Subject: [PATCH v9 09/14] fs: Split off inode_needs_update_time and __file_update_time
Date: Thu, 16 Jun 2022 14:22:16 -0700
Message-ID: <20220616212221.2024518-10-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

This splits the functions inode_needs_update_time() and
__file_update_time() out of file_update_time(). This is required to
support async buffered writes. No functional changes are intended in
this patch.

Signed-off-by: Stefan Roesch
Reviewed-by: Jan Kara
Reviewed-by: Christian Brauner (Microsoft)
---
 fs/inode.c | 76 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 26 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index a2e18379c8a6..ff726d99ecc7 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2049,35 +2049,18 @@ int file_remove_privs(struct file *file)
 }
 EXPORT_SYMBOL(file_remove_privs);

-/**
- * file_update_time - update mtime and ctime time
- * @file: file accessed
- *
- * Update the mtime and ctime members of an inode and mark the inode
- * for writeback. Note that this function is meant exclusively for
- * usage in the file write path of filesystems, and filesystems may
- * choose to explicitly ignore update via this function with the
- * S_NOCMTIME inode flag, e.g. for network filesystem where these
- * timestamps are handled by the server. This can return an error for
- * file systems who need to allocate space in order to update an inode.
- */
-
-int file_update_time(struct file *file)
+static int inode_needs_update_time(struct inode *inode, struct timespec64 *now)
 {
-	struct inode *inode = file_inode(file);
-	struct timespec64 now;
 	int sync_it = 0;
-	int ret;

 	/* First try to exhaust all avenues to not sync */
 	if (IS_NOCMTIME(inode))
 		return 0;

-	now = current_time(inode);
-	if (!timespec64_equal(&inode->i_mtime, &now))
+	if (!timespec64_equal(&inode->i_mtime, now))
 		sync_it = S_MTIME;

-	if (!timespec64_equal(&inode->i_ctime, &now))
+	if (!timespec64_equal(&inode->i_ctime, now))
 		sync_it |= S_CTIME;

 	if (IS_I_VERSION(inode) && inode_iversion_need_inc(inode))
@@ -2086,15 +2069,50 @@ int file_update_time(struct file *file)
 	if (!sync_it)
 		return 0;

-	/* Finally allowed to write? Takes lock. */
-	if (__mnt_want_write_file(file))
-		return 0;
+	return sync_it;
+}
+
+static int __file_update_time(struct file *file, struct timespec64 *now,
+			      int sync_mode)
+{
+	int ret = 0;
+	struct inode *inode = file_inode(file);

-	ret = inode_update_time(inode, &now, sync_it);
-	__mnt_drop_write_file(file);
+	/* try to update time settings */
+	if (!__mnt_want_write_file(file)) {
+		ret = inode_update_time(inode, now, sync_mode);
+		__mnt_drop_write_file(file);
+	}

 	return ret;
 }
+
+/**
+ * file_update_time - update mtime and ctime time
+ * @file: file accessed
+ *
+ * Update the mtime and ctime members of an inode and mark the inode for
+ * writeback. Note that this function is meant exclusively for usage in
+ * the file write path of filesystems, and filesystems may choose to
+ * explicitly ignore updates via this function with the S_NOCMTIME inode
+ * flag, e.g. for network filesystems where these timestamps are handled
+ * by the server. This can return an error for file systems that need to
+ * allocate space in order to update an inode.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int file_update_time(struct file *file)
+{
+	int ret;
+	struct inode *inode = file_inode(file);
+	struct timespec64 now = current_time(inode);
+
+	ret = inode_needs_update_time(inode, &now);
+	if (ret <= 0)
+		return ret;
+
+	return __file_update_time(file, &now, ret);
+}
 EXPORT_SYMBOL(file_update_time);

 /**
@@ -2111,6 +2129,8 @@ EXPORT_SYMBOL(file_update_time);
 int file_modified(struct file *file)
 {
 	int ret;
+	struct inode *inode = file_inode(file);
+	struct timespec64 now = current_time(inode);

 	/*
 	 * Clear the security bits if the process is not being run by root.
@@ -2123,7 +2143,11 @@ int file_modified(struct file *file)
 	if (unlikely(file->f_mode & FMODE_NOCMTIME))
 		return 0;

-	return file_update_time(file);
+	ret = inode_needs_update_time(inode, &now);
+	if (ret <= 0)
+		return ret;
+
+	return __file_update_time(file, &now, ret);
 }
 EXPORT_SYMBOL(file_modified);
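The point of the split is that the check and the update can now be
used separately: an async write path can ask "does the timestamp need
updating?" without performing the (potentially blocking) update. A
later patch in the series builds kiocb_modified() on these helpers; in
spirit it looks like the following sketch (illustrative, not the
series' exact code):

/* Sketch of a nowait-aware file_modified() built on the new helpers. */
static int kiocb_modified_sketch(struct kiocb *iocb)
{
	struct file *file = iocb->ki_filp;
	struct inode *inode = file_inode(file);
	struct timespec64 now = current_time(inode);
	int ret;

	ret = inode_needs_update_time(inode, &now);
	if (ret <= 0)
		return ret;		/* nothing to update, or error */

	/* The check itself never blocks; only the update might. */
	if (iocb->ki_flags & IOCB_NOWAIT)
		return -EAGAIN;		/* redo where blocking is allowed */

	return __file_update_time(file, &now, ret);
}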
From patchwork Thu Jun 16 21:22:18 2022
From: Stefan Roesch
Subject: [PATCH v9 11/14] io_uring: Add support for async buffered writes
Date: Thu, 16 Jun 2022 14:22:18 -0700
Message-ID: <20220616212221.2024518-12-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

This enables async buffered writes in io-uring for the filesystems
that support them. Buffered writes are enabled for blocks that are
already in the page cache or that can be acquired with noio semantics.

Signed-off-by: Stefan Roesch
---
 fs/io_uring.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3aab4182fd89..22a0bb8c5fe5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4311,7 +4311,7 @@ static inline int io_iter_do_read(struct io_kiocb *req, struct iov_iter *iter)
 		return -EINVAL;
 }

-static bool need_read_all(struct io_kiocb *req)
+static bool need_complete_io(struct io_kiocb *req)
 {
 	return req->flags & REQ_F_ISREG ||
 		S_ISBLK(file_inode(req->file)->i_mode);
@@ -4440,7 +4440,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 	} else if (ret == -EIOCBQUEUED) {
 		goto out_free;
 	} else if (ret == req->cqe.res || ret <= 0 || !force_nonblock ||
-		   (req->flags & REQ_F_NOWAIT) || !need_read_all(req)) {
+		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) {
 		/* read all, failed, already did sync or don't want to retry */
 		goto done;
 	}
@@ -4536,9 +4536,10 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		if (unlikely(!io_file_supports_nowait(req)))
 			goto copy_iov;

-		/* file path doesn't support NOWAIT for non-direct_IO */
-		if (force_nonblock && !(kiocb->ki_flags & IOCB_DIRECT) &&
-			(req->flags & REQ_F_ISREG))
+		/* File path supports NOWAIT for non-direct_IO only for block devices. */
+		if (!(kiocb->ki_flags & IOCB_DIRECT) &&
+		    !(kiocb->ki_filp->f_mode & FMODE_BUF_WASYNC) &&
+		    (req->flags & REQ_F_ISREG))
 			goto copy_iov;

 		kiocb->ki_flags |= IOCB_NOWAIT;
@@ -4592,6 +4593,24 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		/* IOPOLL retry should happen for io-wq threads */
 		if (ret2 == -EAGAIN && (req->ctx->flags & IORING_SETUP_IOPOLL))
 			goto copy_iov;
+
+		if (ret2 != req->cqe.res && ret2 >= 0 && need_complete_io(req)) {
+			struct io_async_rw *rw;
+
+			/* This is a partial write. The file pos has already been
+			 * updated, setup the async struct to complete the request
+			 * in the worker. Also update bytes_done to account for
+			 * the bytes already written.
+			 */
+			iov_iter_save_state(&s->iter, &s->iter_state);
+			ret = io_setup_async_rw(req, iovec, s, true);
+
+			rw = req->async_data;
+			if (rw)
+				rw->bytes_done += ret2;
+
+			return ret ? ret : -EAGAIN;
+		}
 done:
 		kiocb_done(req, ret2, issue_flags);
 	} else {
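From user space the submission API is unchanged; the gain is that a
plain buffered write submitted through io_uring can now complete
inline instead of always being punted to an io-wq worker. A minimal
liburing program that exercises this path (error handling trimmed;
the file path is illustrative and liburing is assumed to be
installed):

#include <fcntl.h>
#include <string.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int fd;

	memset(buf, 'a', sizeof(buf));
	io_uring_queue_init(8, &ring, 0);

	/* O_DIRECT is *not* set: this is a buffered write. */
	fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0644);

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_write(sqe, fd, buf, sizeof(buf), 0);
	io_uring_submit(&ring);

	/*
	 * On FMODE_BUF_WASYNC filesystems (e.g. XFS after this series),
	 * the completion typically arrives without an io-wq punt when
	 * the pages are already in the page cache.
	 */
	io_uring_wait_cqe(&ring, &cqe);
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;
}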
From patchwork Thu Jun 16 21:22:19 2022
From: Stefan Roesch
Subject: [PATCH v9 12/14] io_uring: Add tracepoint for short writes
Date: Thu, 16 Jun 2022 14:22:19 -0700
Message-ID: <20220616212221.2024518-13-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

This adds the io_uring_short_write tracepoint to io_uring. A short
write is issued when not all pages that are required for a write are
in the page cache and the async buffered write has to return -EAGAIN.

Signed-off-by: Stefan Roesch
---
 fs/io_uring.c                   |  3 +++
 include/trace/events/io_uring.h | 25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 22a0bb8c5fe5..510c09192832 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4597,6 +4597,9 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		if (ret2 != req->cqe.res && ret2 >= 0 && need_complete_io(req)) {
 			struct io_async_rw *rw;

+			trace_io_uring_short_write(req->ctx, kiocb->ki_pos - ret2,
+						req->cqe.res, ret2);
+
 			/* This is a partial write. The file pos has already been
 			 * updated, setup the async struct to complete the request
 			 * in the worker. Also update bytes_done to account for
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 66fcc5a1a5b1..25df513660cc 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -600,6 +600,31 @@ TRACE_EVENT(io_uring_cqe_overflow,
 		  __entry->cflags, __entry->ocqe)
 );

+TRACE_EVENT(io_uring_short_write,
+
+	TP_PROTO(void *ctx, u64 fpos, u64 wanted, u64 got),
+
+	TP_ARGS(ctx, fpos, wanted, got),
+
+	TP_STRUCT__entry(
+		__field(void *,	ctx)
+		__field(u64,	fpos)
+		__field(u64,	wanted)
+		__field(u64,	got)
+	),
+
+	TP_fast_assign(
+		__entry->ctx	= ctx;
+		__entry->fpos	= fpos;
+		__entry->wanted	= wanted;
+		__entry->got	= got;
+	),
+
+	TP_printk("ring %p, fpos %lld, wanted %lld, got %lld",
+			  __entry->ctx, __entry->fpos,
+			  __entry->wanted, __entry->got)
+);
+
 #endif /* _TRACE_IO_URING_H */

 /* This part must be outside protection */
From patchwork Thu Jun 16 21:22:21 2022
From: Stefan Roesch
Subject: [PATCH v9 14/14] xfs: Add async buffered write support
Date: Thu, 16 Jun 2022 14:22:21 -0700
Message-ID: <20220616212221.2024518-15-shr@fb.com>
In-Reply-To: <20220616212221.2024518-1-shr@fb.com>
References: <20220616212221.2024518-1-shr@fb.com>

This adds async buffered write support to XFS. Async buffered write
requests return -EAGAIN if the ilock cannot be obtained immediately.

Signed-off-by: Stefan Roesch
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_file.c  | 11 +++++------
 fs/xfs/xfs_iomap.c |  5 ++++-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5a171c0b244b..8d9b14d2b912 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -410,7 +410,7 @@ xfs_file_write_checks(
 	spin_unlock(&ip->i_flags_lock);

 out:
-	return file_modified(file);
+	return kiocb_modified(iocb);
 }

 static int
@@ -700,12 +700,11 @@ xfs_file_buffered_write(
 	bool			cleared_space = false;
 	unsigned int		iolock;

-	if (iocb->ki_flags & IOCB_NOWAIT)
-		return -EOPNOTSUPP;
-
 write_retry:
 	iolock = XFS_IOLOCK_EXCL;
-	xfs_ilock(ip, iolock);
+	ret = xfs_ilock_iocb(iocb, iolock);
+	if (ret)
+		return ret;

 	ret = xfs_file_write_checks(iocb, from, &iolock);
 	if (ret)
@@ -1165,7 +1164,7 @@ xfs_file_open(
 {
 	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
 		return -EIO;
-	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
+	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
 	return generic_file_open(inode, file);
 }

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index bcf7c3694290..5d50fed291b4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -886,6 +886,7 @@ xfs_buffered_write_iomap_begin(
 	bool			eof = false, cow_eof = false, shared = false;
 	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;
+	unsigned int		lockmode = XFS_ILOCK_EXCL;

 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -897,7 +898,9 @@ xfs_buffered_write_iomap_begin(

 	ASSERT(!XFS_IS_REALTIME_INODE(ip));

-	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
+	if (error)
+		return error;

 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(&ip->i_df)) ||
 	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {