From patchwork Tue Dec 17 14:39:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297753 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0B1826C1 for ; Tue, 17 Dec 2019 14:39:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BE23F24679 for ; Tue, 17 Dec 2019 14:39:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="Rz0PTf0W" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BE23F24679 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4984F8E0076; Tue, 17 Dec 2019 09:39:54 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3D2FE8E0072; Tue, 17 Dec 2019 09:39:54 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 299338E0076; Tue, 17 Dec 2019 09:39:54 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0065.hostedemail.com [216.40.44.65]) by kanga.kvack.org (Postfix) with ESMTP id F20748E0072 for ; Tue, 17 Dec 2019 09:39:53 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with SMTP id ACB0D8249980 for ; Tue, 17 Dec 2019 14:39:53 +0000 (UTC) X-FDA: 76274892666.27.trip28_5653fd3cb4a03 X-Spam-Summary: 2,0,0,9e401d099bef07f4,d41d8cd98f00b204,axboe@kernel.dk,::linux-fsdevel@vger.kernel.org:linux-block@vger.kernel.org:willy@infradead.org:clm@fb.com:torvalds@linux-foundation.org:david@fromorbit.com:axboe@kernel.dk,RULES_HIT:2:41:355:379:541:800:960:966:988:989:1260:1311:1314:1345:1359:1431:1437:1515:1535:1605:1730:1747:1777:1792:1981:2194:2196:2198:2199:2200:2201:2393:2559:2562:2693:2740:2897:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:4049:4120:4250:4321:4385:4470:4605:5007:6261:6653:7875:7903:8531:8660:8957:9036:9040:10004:11026:11232:11473:11658:11914:12043:12291:12296:12297:12438:12517:12519:12555:12683:12895:12986:13148:13161:13229:13230:13894:14096:14394:21080:21324:21444:21451:21627:21990:30054:30079,0,RBL:209.85.166.195:@kernel.dk:.lbl8.mailshell.net-62.2.0.100 66.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:26,LUA_SUMMARY:none X-HE-Tag: trip28_5653fd3cb4a03 X-Filterd-Recvd-Size: 9419 Received: from mail-il1-f195.google.com (mail-il1-f195.google.com [209.85.166.195]) by imf11.hostedemail.com (Postfix) with ESMTP for ; Tue, 17 Dec 2019 14:39:53 +0000 (UTC) Received: by mail-il1-f195.google.com with SMTP id c4so4116664ilo.7 for ; Tue, 17 Dec 2019 06:39:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=qEQyHfYbXoJej+s6/OMXT2BLZEPX6SiZ+5HLaKOeShc=; b=Rz0PTf0WwZEmsAUdF5wBZZkfVKCAltgfrzi74jjEA3lLneDy8dSauI/jbG3xoO7X7R e/VLmB+AiDGb4AjO2B/a6EyP+ygtn5D4ibbttq9rQLt5QiJNuzGA8PKC/CE9cdH4s9f+ dNaoH3JbLMpqNSpla+Y1wRrLG2jXJyiYRfyNa/cTX2EX/ow4JZkh0XhafeJtifcvPcPw /mA6Xil4l219I3wA5uND8mxzy81yPgODPqEFgFMTmd/OtU9Ds3F3kx/oD5fjJd118/Sd EO3NABK5jRgz+ogHrcK6ROc0cGWS6zuF8O/S6gQWgDglVQaB14e5ws/hq5XF5GmnU0g9 mBug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=qEQyHfYbXoJej+s6/OMXT2BLZEPX6SiZ+5HLaKOeShc=; b=pLZe2kbzDy1vaP50v5kK82PaGI7CKY3s5SOO6dBO5nf8hyRqkkxgQc5p4S1ou+u887 PczTnRk1cmrbAx73QGsAs+YY0ISEAh7SbrxgaguHQbHGeKeXjVyKFVGZffBRToHuhITo 6v/qHxnVjUnXKKWpMM80NuS10wj4VPEmkVKrhoCcRnaLTiUD0aJhTYStJgryOK247mQS 2kW3IZqKbBd+HBb6jQVtSalhoIbFDk57N7Rlee13AwjIMX5zjCwBzXnU74tX6S9u7qun 5zgg6CcCeWu8zPxKDAweJMMeTmTACEP6pl6ZB67kxtMlTxL2/QW25bivmlzWDUKUb72x h/jw== X-Gm-Message-State: APjAAAUuLXx1uI8MJqKUjKVKzfzukj05qo7lkb/RSxYofVdIVRAAWXsR BK4+ZyDm0U8OFjNtHESI5u/jcfuriJ2aiQ== X-Google-Smtp-Source: APXvYqxNs1UwvqiKd0kuf9aHlqRHi8PBLH1NSYmwr4Xu4fISsW1/lVDKFz8aN9GKiiU+f+n2z5OIYg== X-Received: by 2002:a92:d809:: with SMTP id y9mr18025008ilm.261.1576593592098; Tue, 17 Dec 2019 06:39:52 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:51 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 1/6] fs: add read support for RWF_UNCACHED Date: Tue, 17 Dec 2019 07:39:43 -0700 Message-Id: <20191217143948.26380-2-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If RWF_UNCACHED is set for io_uring (or preadv2(2)), we'll use private pages for the buffered reads. These pages will never be inserted into the page cache, and they are simply droped when we have done the copy at the end of IO. If pages in the read range are already in the page cache, then use those for just copying the data instead of starting IO on private pages. A previous solution used the page cache even for non-cached ranges, but the cost of doing so was too high. Removing nodes at the end is expensive, even with LRU bypass. On top of that, repeatedly instantiating new xarray nodes is very costly, as it needs to memset 576 bytes of data, and freeing said nodes involve an RCU call per node as well. All that adds up, making uncached somewhat slower than O_DIRECT. With the current solition, we're basically at O_DIRECT levels of performance for RWF_UNCACHED IO. Protect against truncate the same way O_DIRECT does, by calling inode_dio_begin() to elevate the inode->i_dio_count. Signed-off-by: Jens Axboe --- include/linux/fs.h | 3 +++ include/uapi/linux/fs.h | 5 ++++- mm/filemap.c | 38 ++++++++++++++++++++++++++++++++------ 3 files changed, 39 insertions(+), 7 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 98e0349adb52..092ea2a4319b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -314,6 +314,7 @@ enum rw_hint { #define IOCB_SYNC (1 << 5) #define IOCB_WRITE (1 << 6) #define IOCB_NOWAIT (1 << 7) +#define IOCB_UNCACHED (1 << 8) struct kiocb { struct file *ki_filp; @@ -3418,6 +3419,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags) ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC); if (flags & RWF_APPEND) ki->ki_flags |= IOCB_APPEND; + if (flags & RWF_UNCACHED) + ki->ki_flags |= IOCB_UNCACHED; return 0; } diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 379a612f8f1d..357ebb0e0c5d 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -299,8 +299,11 @@ typedef int __bitwise __kernel_rwf_t; /* per-IO O_APPEND */ #define RWF_APPEND ((__force __kernel_rwf_t)0x00000010) +/* drop cache after reading or writing data */ +#define RWF_UNCACHED ((__force __kernel_rwf_t)0x00000040) + /* mask of flags supported by the kernel */ #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ - RWF_APPEND) + RWF_APPEND | RWF_UNCACHED) #endif /* _UAPI_LINUX_FS_H */ diff --git a/mm/filemap.c b/mm/filemap.c index bf6aa30be58d..7ddc4d8386cf 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1990,6 +1990,13 @@ static void shrink_readahead_size_eio(struct file *filp, ra->ra_pages /= 4; } +static void buffered_put_page(struct page *page, bool clear_mapping) +{ + if (clear_mapping) + page->mapping = NULL; + put_page(page); +} + /** * generic_file_buffered_read - generic file read routine * @iocb: the iocb to read @@ -2013,6 +2020,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, struct address_space *mapping = filp->f_mapping; struct inode *inode = mapping->host; struct file_ra_state *ra = &filp->f_ra; + bool did_dio_begin = false; loff_t *ppos = &iocb->ki_pos; pgoff_t index; pgoff_t last_index; @@ -2032,6 +2040,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, offset = *ppos & ~PAGE_MASK; for (;;) { + bool clear_mapping = false; struct page *page; pgoff_t end_index; loff_t isize; @@ -2048,6 +2057,13 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (!page) { if (iocb->ki_flags & IOCB_NOWAIT) goto would_block; + /* UNCACHED implies no read-ahead */ + if (iocb->ki_flags & IOCB_UNCACHED) { + did_dio_begin = true; + /* block truncate for UNCACHED reads */ + inode_dio_begin(inode); + goto no_cached_page; + } page_cache_sync_readahead(mapping, ra, filp, index, last_index - index); @@ -2106,7 +2122,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, isize = i_size_read(inode); end_index = (isize - 1) >> PAGE_SHIFT; if (unlikely(!isize || index > end_index)) { - put_page(page); + buffered_put_page(page, clear_mapping); goto out; } @@ -2115,7 +2131,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (index == end_index) { nr = ((isize - 1) & ~PAGE_MASK) + 1; if (nr <= offset) { - put_page(page); + buffered_put_page(page, clear_mapping); goto out; } } @@ -2147,7 +2163,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, offset &= ~PAGE_MASK; prev_offset = offset; - put_page(page); + buffered_put_page(page, clear_mapping); written += ret; if (!iov_iter_count(iter)) goto out; @@ -2189,7 +2205,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (unlikely(error)) { if (error == AOP_TRUNCATED_PAGE) { - put_page(page); + buffered_put_page(page, clear_mapping); error = 0; goto find_page; } @@ -2206,7 +2222,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, * invalidate_mapping_pages got it */ unlock_page(page); - put_page(page); + buffered_put_page(page, clear_mapping); goto find_page; } unlock_page(page); @@ -2221,7 +2237,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, readpage_error: /* UHHUH! A synchronous read error occurred. Report it */ - put_page(page); + buffered_put_page(page, clear_mapping); goto out; no_cached_page: @@ -2234,6 +2250,14 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, error = -ENOMEM; goto out; } + if (iocb->ki_flags & IOCB_UNCACHED) { + __SetPageLocked(page); + page->mapping = mapping; + page->index = index; + clear_mapping = true; + goto readpage; + } + error = add_to_page_cache_lru(page, mapping, index, mapping_gfp_constraint(mapping, GFP_KERNEL)); if (error) { @@ -2250,6 +2274,8 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, would_block: error = -EAGAIN; out: + if (did_dio_begin) + inode_dio_end(inode); ra->prev_pos = prev_index; ra->prev_pos <<= PAGE_SHIFT; ra->prev_pos |= prev_offset;