From patchwork Tue Dec 17 14:39:42 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 11297747
From: Jens Axboe <axboe@kernel.dk>
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com
Subject: [PATCHSET v5 0/6] Support for RWF_UNCACHED
Date: Tue, 17 Dec 2019 07:39:42 -0700
Message-Id: <20191217143948.26380-1-axboe@kernel.dk>

Recently someone asked me how io_uring buffered IO compares to mmap'ed IO
in terms of performance. So I ran some tests with buffered IO, and found
the experience to be somewhat painful. The test case is pretty basic:
random reads over a dataset that's 10x the size of RAM. Performance starts
out fine, but once the page cache fills up we hit a throughput cliff. CPU
usage of the IO threads goes up, and kswapd spends 100% of a core trying
to keep up. Seeing that, I was reminded of the many complaints I hear
about buffered IO, and the fact that most of the folks complaining will
ultimately bite the bullet and move to O_DIRECT just to get the kernel out
of the way.

But I don't think it needs to be like that. Switching to O_DIRECT isn't
always easily doable. The buffers have different lifetimes, sizes, and
alignment constraints, etc. On top of that, mixing buffered and O_DIRECT
IO can be painful.
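To make the alignment constraint concrete, here is a small hand-written
example (not part of this series) of what a direct read ends up looking
like. The 4096-byte alignment is an assumption for illustration; the real
requirement depends on the device and filesystem.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLK_ALIGN 4096	/* assumed alignment; the real requirement is device/fs specific */

static ssize_t read_direct(const char *path, off_t off, size_t len)
{
	void *buf;
	ssize_t ret;
	int fd;

	/* O_DIRECT needs an aligned buffer; plain malloc() memory won't do */
	if (posix_memalign(&buf, BLK_ALIGN, len))
		return -1;

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		free(buf);
		return -1;
	}

	/* off and len must also be suitably aligned, or the IO fails with EINVAL */
	ret = pread(fd, buf, len, off);

	close(fd);
	free(buf);
	return ret;
}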
Seems to me that we have an opportunity to provide something that sits
somewhere in between buffered and O_DIRECT, and this is where RWF_UNCACHED
enters the picture. If this flag is set on the IO, we get the following
behavior:

- If the data is in cache, it remains in cache and the copy (in or out) is
  served to/from that. This is true for both reads and writes.

- For writes, if the data is NOT in cache, we add it while performing the
  IO. When the IO is done, we remove it again.

- For reads, if the data is NOT in the cache, we allocate a private page
  and use that for the IO. When the IO is done, we free this page. The
  page never sees the page cache.

With this, I can do 100% smooth buffered reads or writes without pushing
the kernel to the state where kswapd is sweating bullets. In fact it
doesn't even register. (A sketch of the intended userspace usage is
appended at the end of this mail.)

Comments appreciated! This should work on any standard file system, using
either the generic helpers or iomap. I have tested ext4 and xfs for the
right read/write behavior, but no further validation has been done yet.
This version contains the bigger prep patch of switching iomap_apply() and
the actors to struct iomap_data, and I hope I didn't mess that up too
badly. I'll try and exercise it all; I've done XFS mounts and reads+writes
and it seems happy from that POV, at least. The core of the changes is
actually really small; the majority of the diff is just prep work to get
there.

Patches are against current git, and can also be found here:

https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached

 fs/ceph/file.c          |   2 +-
 fs/dax.c                |  25 +++--
 fs/ext4/file.c          |   2 +-
 fs/iomap/apply.c        |  61 ++++++++--
 fs/iomap/buffered-io.c  | 230 +++++++++++++++++++++++++---------------
 fs/iomap/direct-io.c    |  57 +++++-----
 fs/iomap/fiemap.c       |  48 +++++----
 fs/iomap/seek.c         |  64 ++++++-----
 fs/iomap/swapfile.c     |  27 ++---
 fs/iomap/trace.h        |   4 +-
 fs/nfs/file.c           |   2 +-
 fs/xfs/xfs_iomap.c      |   7 +-
 include/linux/fs.h      |   7 +-
 include/linux/iomap.h   |  20 +++-
 include/uapi/linux/fs.h |   5 +-
 mm/filemap.c            |  87 +++++++++++++--
 16 files changed, 439 insertions(+), 209 deletions(-)

Changes since v4:
- Add patch for disabling delalloc on buffered writes on xfs
- Fixup an XFS flags mishap
- Add iomap flag trace definitions
- Fixup silly add_to_page_cache() vs add_to_page_cache_lru() mistake

Changes since v3:
- Add iomap_actor_data to cut down on arguments
- Fix bad flag drop in iomap_write_begin()
- Remove unused IOMAP_WRITE_F_UNCACHED flag
- Don't use the page cache at all for reads

Changes since v2:
- Rework the write side according to Chinner's suggestions. Much cleaner
  this way. It does mean that we invalidate the full write region if just
  ONE page (or more) had to be created, where before it was more granular.
  I don't think that's a concern, and on the plus side, we now no longer
  have to chunk invalidations into 15/16 pages at a time.
- Cleanups

Changes since v1:
- Switch to pagevecs for write_drop_cached_pages()
- Use page_offset() instead of manual shift
- Ensure we hold a reference on the page between calling ->write_end() and
  checking the mapping on the locked page
- Fix XFS multi-page streamed writes, we'd drop the UNCACHED flag after
  the first page

-- 
Jens Axboe
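P.S. For reference, a minimal sketch of how a userspace consumer might
exercise the flag through preadv2()/pwritev2(). It assumes a kernel with
this series applied; the fallback RWF_UNCACHED value below is a
placeholder for illustration only, the real definition comes from the
patched include/uapi/linux/fs.h, and "testfile" is just a made-up path.

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000040	/* placeholder; use the value from the patched uapi header */
#endif

int main(void)
{
	char buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int fd = open("testfile", O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return 1;

	memset(buf, 0xaa, sizeof(buf));

	/* Write: reuses cached pages if present, otherwise drops them after the IO */
	if (pwritev2(fd, &iov, 1, 0, RWF_UNCACHED) < 0)
		return 1;

	/* Read: served from cache if present, otherwise done via a private page */
	if (preadv2(fd, &iov, 1, 0, RWF_UNCACHED) < 0)
		return 1;

	close(fd);
	return 0;
}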