From patchwork Fri Oct 11 09:11:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qun-Wei Lin
X-Patchwork-Id: 13832258
From: Qun-Wei Lin
To: Jens Axboe, Minchan Kim, Sergey Senozhatsky, Vishal Verma,
 Dan Williams, Dave Jiang, Ira Weiny, Andrew Morton, Matthias Brugger,
 AngeloGioacchino Del Regno, "Huang, Ying", Chris Li, Ryan Roberts,
 David Hildenbrand, Kairui Song, "Matthew Wilcox (Oracle)",
 Dan Schatzberg, Barry Song
CC: Casper Li, Chinwen Chang, Andrew Yang, John Hsu, Qun-Wei Lin
Subject: [PATCH] mm: Split BLK_FEAT_SYNCHRONOUS and SWP_SYNCHRONOUS_IO into
 separate read and write flags
Date: Fri, 11 Oct 2024 17:11:33 +0800
Message-ID: <20241011091133.28173-1-qun-wei.lin@mediatek.com>
X-Mailer: git-send-email 2.18.0

This patch splits the BLK_FEAT_SYNCHRONOUS feature flag into two
separate flags: BLK_FEAT_READ_SYNCHRONOUS and
BLK_FEAT_WRITE_SYNCHRONOUS. Similarly, the SWP_SYNCHRONOUS_IO flag is
split into SWP_READ_SYNCHRONOUS_IO and SWP_WRITE_SYNCHRONOUS_IO.

These changes are motivated by the need to better accommodate certain
swap devices that support synchronous read operations but asynchronous
write operations.
The existing BLK_FEAT_SYNCHRONOUS and
SWP_SYNCHRONOUS_IO flags are not sufficient for these devices, as they
enforce synchronous behavior for both read and write operations.

Signed-off-by: Qun-Wei Lin
---
 drivers/block/brd.c           |  3 ++-
 drivers/block/zram/zram_drv.c |  5 +++--
 drivers/nvdimm/btt.c          |  3 ++-
 drivers/nvdimm/pmem.c         |  5 +++--
 include/linux/blkdev.h        | 24 ++++++++++++++++--------
 include/linux/swap.h          | 31 ++++++++++++++++---------------
 mm/memory.c                   |  4 ++--
 mm/page_io.c                  |  6 +++---
 mm/swapfile.c                 |  7 +++++--
 9 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 2fd1ed101748..619a56bf747e 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -336,7 +336,8 @@ static int brd_alloc(int i)
 		.max_hw_discard_sectors	= UINT_MAX,
 		.max_discard_segments	= 1,
 		.discard_granularity	= PAGE_SIZE,
-		.features		= BLK_FEAT_SYNCHRONOUS |
+		.features		= BLK_FEAT_READ_SYNCHRONOUS |
+					  BLK_FEAT_WRITE_SYNCHRONOUS |
 					  BLK_FEAT_NOWAIT,
 	};
 
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ad9c9bc3ccfc..d2927ea76488 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2345,8 +2345,9 @@ static int zram_add(void)
 #if ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE
 		.max_write_zeroes_sectors	= UINT_MAX,
 #endif
-		.features			= BLK_FEAT_STABLE_WRITES |
-						  BLK_FEAT_SYNCHRONOUS,
+		.features			= BLK_FEAT_STABLE_WRITES |
+						  BLK_FEAT_READ_SYNCHRONOUS |
+						  BLK_FEAT_WRITE_SYNCHRONOUS,
 	};
 	struct zram *zram;
 	int ret, device_id;
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 423dcd190906..1665d98f51af 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1501,7 +1501,8 @@ static int btt_blk_init(struct btt *btt)
 		.logical_block_size	= btt->sector_size,
 		.max_hw_sectors		= UINT_MAX,
 		.max_integrity_segments	= 1,
-		.features		= BLK_FEAT_SYNCHRONOUS,
+		.features		= BLK_FEAT_READ_SYNCHRONOUS |
+					  BLK_FEAT_WRITE_SYNCHRONOUS,
 	};
 	int rc;
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 210fb77f51ba..c22a6ee13769 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -455,8 +455,9 @@ static int pmem_attach_disk(struct device *dev,
 		.logical_block_size	= pmem_sector_size(ndns),
 		.physical_block_size	= PAGE_SIZE,
 		.max_hw_sectors		= UINT_MAX,
-		.features		= BLK_FEAT_WRITE_CACHE |
-					  BLK_FEAT_SYNCHRONOUS,
+		.features		= BLK_FEAT_WRITE_CACHE |
+					  BLK_FEAT_READ_SYNCHRONOUS |
+					  BLK_FEAT_WRITE_SYNCHRONOUS,
 	};
 	int nid = dev_to_node(dev), fua;
 	struct resource *res = &nsio->res;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50c3b959da28..88e96d6cead2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -304,20 +304,23 @@ typedef unsigned int __bitwise blk_features_t;
 /* don't modify data until writeback is done */
 #define BLK_FEAT_STABLE_WRITES		((__force blk_features_t)(1u << 5))
 
-/* always completes in submit context */
-#define BLK_FEAT_SYNCHRONOUS		((__force blk_features_t)(1u << 6))
+/* read operations always completes in submit context */
+#define BLK_FEAT_READ_SYNCHRONOUS	((__force blk_features_t)(1u << 6))
+
+/* write operations always completes in submit context */
+#define BLK_FEAT_WRITE_SYNCHRONOUS	((__force blk_features_t)(1u << 7))
 
 /* supports REQ_NOWAIT */
-#define BLK_FEAT_NOWAIT			((__force blk_features_t)(1u << 7))
+#define BLK_FEAT_NOWAIT			((__force blk_features_t)(1u << 8))
 
 /* supports DAX */
-#define BLK_FEAT_DAX			((__force blk_features_t)(1u << 8))
+#define BLK_FEAT_DAX			((__force blk_features_t)(1u << 9))
 
 /* supports I/O polling */
-#define BLK_FEAT_POLL			((__force blk_features_t)(1u << 9))
+#define BLK_FEAT_POLL			((__force blk_features_t)(1u << 10))
 
 /* is a zoned device */
-#define BLK_FEAT_ZONED			((__force blk_features_t)(1u << 10))
+#define BLK_FEAT_ZONED			((__force blk_features_t)(1u << 11))
 
 /* supports PCI(e) p2p requests */
 #define BLK_FEAT_PCI_P2PDMA		((__force blk_features_t)(1u << 12))
@@ -1303,9 +1306,14 @@ static inline bool bdev_nonrot(struct block_device *bdev)
 	return blk_queue_nonrot(bdev_get_queue(bdev));
 }
 
-static inline bool bdev_synchronous(struct block_device *bdev)
+static inline bool bdev_read_synchronous(struct block_device *bdev)
+{
+	return bdev->bd_disk->queue->limits.features & BLK_FEAT_READ_SYNCHRONOUS;
+}
+
+static inline bool bdev_write_synchronous(struct block_device *bdev)
 {
-	return bdev->bd_disk->queue->limits.features & BLK_FEAT_SYNCHRONOUS;
+	return bdev->bd_disk->queue->limits.features & BLK_FEAT_WRITE_SYNCHRONOUS;
 }
 
 static inline bool bdev_stable_writes(struct block_device *bdev)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ca533b478c21..6719c6006894 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -205,21 +205,22 @@ struct swap_extent {
 	offsetof(union swap_header, info.badpages)) / sizeof(int))
 
 enum {
-	SWP_USED	= (1 << 0),	/* is slot in swap_info[] used? */
-	SWP_WRITEOK	= (1 << 1),	/* ok to write to this swap?	*/
-	SWP_DISCARDABLE = (1 << 2),	/* blkdev support discard */
-	SWP_DISCARDING	= (1 << 3),	/* now discarding a free cluster */
-	SWP_SOLIDSTATE	= (1 << 4),	/* blkdev seeks are cheap */
-	SWP_CONTINUED	= (1 << 5),	/* swap_map has count continuation */
-	SWP_BLKDEV	= (1 << 6),	/* its a block device */
-	SWP_ACTIVATED	= (1 << 7),	/* set after swap_activate success */
-	SWP_FS_OPS	= (1 << 8),	/* swapfile operations go through fs */
-	SWP_AREA_DISCARD = (1 << 9),	/* single-time swap area discards */
-	SWP_PAGE_DISCARD = (1 << 10),	/* freed swap page-cluster discards */
-	SWP_STABLE_WRITES = (1 << 11),	/* no overwrite PG_writeback pages */
-	SWP_SYNCHRONOUS_IO = (1 << 12),	/* synchronous IO is efficient */
-					/* add others here before... */
-	SWP_SCANNING	= (1 << 14),	/* refcount in scan_swap_map */
+	SWP_USED			= (1 << 0),	/* is slot in swap_info[] used? */
+	SWP_WRITEOK			= (1 << 1),	/* ok to write to this swap?	*/
+	SWP_DISCARDABLE			= (1 << 2),	/* blkdev support discard */
+	SWP_DISCARDING			= (1 << 3),	/* now discarding a free cluster */
+	SWP_SOLIDSTATE			= (1 << 4),	/* blkdev seeks are cheap */
+	SWP_CONTINUED			= (1 << 5),	/* swap_map has count continuation */
+	SWP_BLKDEV			= (1 << 6),	/* its a block device */
+	SWP_ACTIVATED			= (1 << 7),	/* set after swap_activate success */
+	SWP_FS_OPS			= (1 << 8),	/* swapfile operations go through fs */
+	SWP_AREA_DISCARD		= (1 << 9),	/* single-time swap area discards */
+	SWP_PAGE_DISCARD		= (1 << 10),	/* freed swap page-cluster discards */
+	SWP_STABLE_WRITES		= (1 << 11),	/* no overwrite PG_writeback pages */
+	SWP_READ_SYNCHRONOUS_IO		= (1 << 12),	/* synchronous read IO is efficient */
+	SWP_WRITE_SYNCHRONOUS_IO	= (1 << 13),	/* synchronous write IO is efficient */
+						/* add others here before... */
+	SWP_SCANNING			= (1 << 14),	/* refcount in scan_swap_map */
 };
 
 #define SWAP_CLUSTER_MAX 32UL
diff --git a/mm/memory.c b/mm/memory.c
index 2366578015ad..93eb6c29e52c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4278,7 +4278,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swapcache = folio;
 
 	if (!folio) {
-		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
+		if (data_race(si->flags & SWP_READ_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
 			/* skip swapcache */
 			folio = alloc_swap_folio(vmf);
@@ -4413,7 +4413,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_nomap;
 	}
 
-	/* allocated large folios for SWP_SYNCHRONOUS_IO */
+	/* allocated large folios for SWP_READ_SYNCHRONOUS_IO */
 	if (folio_test_large(folio) && !folio_test_swapcache(folio)) {
 		unsigned long nr = folio_nr_pages(folio);
 		unsigned long folio_start = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
diff --git a/mm/page_io.c b/mm/page_io.c
index 78bc88acee79..ffcc9dbbe61e 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -455,10 +455,10 @@ void __swap_writepage(struct folio *folio, struct writeback_control *wbc)
 		swap_writepage_fs(folio, wbc);
 	/*
 	 * ->flags can be updated non-atomicially (scan_swap_map_slots),
-	 * but that will never affect SWP_SYNCHRONOUS_IO, so the data_race
+	 * but that will never affect SWP_WRITE_SYNCHRONOUS_IO, so the data_race
 	 * is safe.
 	 */
-	else if (data_race(sis->flags & SWP_SYNCHRONOUS_IO))
+	else if (data_race(sis->flags & SWP_WRITE_SYNCHRONOUS_IO))
 		swap_writepage_bdev_sync(folio, wbc, sis);
 	else
 		swap_writepage_bdev_async(folio, wbc, sis);
@@ -592,7 +592,7 @@ static void swap_read_folio_bdev_async(struct folio *folio,
 void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 {
 	struct swap_info_struct *sis = swp_swap_info(folio->swap);
-	bool synchronous = sis->flags & SWP_SYNCHRONOUS_IO;
+	bool synchronous = sis->flags & SWP_READ_SYNCHRONOUS_IO;
 	bool workingset = folio_test_workingset(folio);
 	unsigned long pflags;
 	bool in_thrashing;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0cded32414a1..84f6fc86be2b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3460,8 +3460,11 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (si->bdev && bdev_stable_writes(si->bdev))
 		si->flags |= SWP_STABLE_WRITES;
 
-	if (si->bdev && bdev_synchronous(si->bdev))
-		si->flags |= SWP_SYNCHRONOUS_IO;
+	if (si->bdev && bdev_read_synchronous(si->bdev))
+		si->flags |= SWP_READ_SYNCHRONOUS_IO;
+
+	if (si->bdev && bdev_write_synchronous(si->bdev))
+		si->flags |= SWP_WRITE_SYNCHRONOUS_IO;
 
 	if (si->bdev && bdev_nonrot(si->bdev)) {
 		si->flags |= SWP_SOLIDSTATE;
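
Illustration only, not part of the patch: a minimal user-space sketch of
how the split flags are meant to flow from a block device's feature mask
to the swap flags, for a device that completes reads in submit context
but writes asynchronously. The four bit values are copied from the hunks
above; the helper names (swap_flags_from_features, swap_read_is_sync,
swap_write_is_sync) are hypothetical and do not exist in the kernel.

```c
#include <assert.h>	/* for callers that want to assert on the mapping */
#include <stdbool.h>

/* Bit positions as introduced by this patch; plain unsigned ints here. */
#define BLK_FEAT_READ_SYNCHRONOUS	(1u << 6)
#define BLK_FEAT_WRITE_SYNCHRONOUS	(1u << 7)
#define SWP_READ_SYNCHRONOUS_IO		(1u << 12)
#define SWP_WRITE_SYNCHRONOUS_IO	(1u << 13)

/*
 * Hypothetical helper: mirrors the two independent checks that the
 * patch adds to swapon(), one per direction.
 */
static inline unsigned int swap_flags_from_features(unsigned int features)
{
	unsigned int flags = 0;

	if (features & BLK_FEAT_READ_SYNCHRONOUS)
		flags |= SWP_READ_SYNCHRONOUS_IO;
	if (features & BLK_FEAT_WRITE_SYNCHRONOUS)
		flags |= SWP_WRITE_SYNCHRONOUS_IO;
	return flags;
}

/* Hypothetical helper: the test the read side (swap-in) would apply. */
static inline bool swap_read_is_sync(unsigned int swp_flags)
{
	return (swp_flags & SWP_READ_SYNCHRONOUS_IO) != 0;
}

/* Hypothetical helper: the test the write side (swap-out) would apply. */
static inline bool swap_write_is_sync(unsigned int swp_flags)
{
	return (swp_flags & SWP_WRITE_SYNCHRONOUS_IO) != 0;
}
```

A driver that sets only BLK_FEAT_READ_SYNCHRONOUS would, under this
sketch, take the synchronous path on swap-in and the asynchronous path
on swap-out, which is the case the commit message describes. Drivers
that previously set BLK_FEAT_SYNCHRONOUS set both bits and keep their
old behavior.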