From patchwork Sun Jul 18 22:39:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 12384521 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0FF3C636C8 for ; Sun, 18 Jul 2021 22:39:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 99EDE61166 for ; Sun, 18 Jul 2021 22:39:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233505AbhGRWms (ORCPT ); Sun, 18 Jul 2021 18:42:48 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:60367 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233062AbhGRWmp (ORCPT ); Sun, 18 Jul 2021 18:42:45 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626647985; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=azzEr4YShufMXoWVaQ28QKOg/ot07qvO0ZF8GTvxfys=; b=DpUayOA4ro4fLxC4z547VYl8L76/SQseYiSWPV8efKZPVNAODe1H5YDXDV2PVpXGIKi1zM jEHZXukAHGrTrCYqR/oE0Up8wcDtkyBx9hMw35U4UcAQeorNAvGr/xmtbcva1kTf6NA5EI spS8H9lI60WpunHYGdu9GyP0i2sATNI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-130-BxkBRjW-OhOQCpuNplEvDQ-1; 
Sun, 18 Jul 2021 18:39:44 -0400 X-MC-Unique: BxkBRjW-OhOQCpuNplEvDQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1B2DC362FC; Sun, 18 Jul 2021 22:39:43 +0000 (UTC) Received: from max.com (unknown [10.40.195.75]) by smtp.corp.redhat.com (Postfix) with ESMTP id E3E3D60C0F; Sun, 18 Jul 2021 22:39:40 +0000 (UTC) From: Andreas Gruenbacher To: Linus Torvalds Cc: Alexander Viro , Jan Kara , Matthew Wilcox , Christoph Hellwig , cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, Andreas Gruenbacher Subject: [PATCH v2 1/6] iov_iter: Introduce fault_in_iov_iter helper Date: Mon, 19 Jul 2021 00:39:27 +0200 Message-Id: <20210718223932.2703330-2-agruenba@redhat.com> In-Reply-To: <20210718223932.2703330-1-agruenba@redhat.com> References: <20210718223932.2703330-1-agruenba@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Introduce a new fault_in_iov_iter helper for manually faulting in an iterator. Unlike fault_in_pages_writeable(), this function is non-destructive. We'll use fault_in_iov_iter in gfs2 once we've determined that the iterator passed to .read_iter or .write_iter isn't in memory.
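As a rough userspace sketch of the segment walk this patch introduces (fault_in_user_pages() replaced by a stub; fault_in_iov_model() and g_faultable are invented names for illustration, not the kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Bytes the stub below will still "fault in"; set by the caller. */
static size_t g_faultable;

/* Stand-in for fault_in_user_pages(): pretends that only the next
 * g_faultable bytes of any range can be faulted in. */
static size_t fault_in_stub(const char *start, size_t len)
{
    size_t l = len < g_faultable ? len : g_faultable;
    (void)start;
    g_faultable -= l;
    return l;
}

/* Mirrors the walk in the patch's fault_in_iov_iter(): honor iov_offset
 * in the first segment only, stop at the first short fault-in, and
 * return the total number of bytes faulted in. */
static size_t fault_in_iov_model(const struct iovec *iov, size_t nsegs,
                                 size_t iov_offset, size_t count)
{
    size_t ret = 0, skip = iov_offset;

    for (size_t s = 0; s < nsegs && count; s++, skip = 0) {
        size_t len = iov[s].iov_len - skip;

        if (len > count)
            len = count;
        if (!len)
            continue;
        size_t l = fault_in_stub((const char *)iov[s].iov_base + skip, len);
        ret += l;
        if (l != len)
            break;
        count -= l;
    }
    return ret;
}
```

The early break on a short fault-in is what lets the gfs2 callers later distinguish "made progress, retry" from "no progress possible".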
Signed-off-by: Andreas Gruenbacher --- include/linux/mm.h | 3 ++ include/linux/uio.h | 1 + lib/iov_iter.c | 42 ++++++++++++++++++++++++++++ mm/gup.c | 68 +++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 114 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8ae31622deef..dc7d64632a60 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1836,6 +1836,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int pin_user_pages_fast(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages); +unsigned long fault_in_user_pages(unsigned long start, unsigned long len, + bool write); + int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, struct task_struct *task, bool bypass_rlim); diff --git a/include/linux/uio.h b/include/linux/uio.h index d3ec87706d75..74f819c41735 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -124,6 +124,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page, void iov_iter_advance(struct iov_iter *i, size_t bytes); void iov_iter_revert(struct iov_iter *i, size_t bytes); int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes); +size_t fault_in_iov_iter(const struct iov_iter *i); size_t iov_iter_single_seg_count(const struct iov_iter *i); size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes, struct iov_iter *i); diff --git a/lib/iov_iter.c b/lib/iov_iter.c index c701b7a187f2..3beecf8f77de 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -487,6 +487,48 @@ int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes) } EXPORT_SYMBOL(iov_iter_fault_in_readable); +/** + * fault_in_iov_iter - fault in iov iterator for reading / writing + * @i: iterator + * + * Faults in the iterator using get_user_pages, i.e., without triggering + * hardware page faults. 
+ * + * This is primarily useful when we know that some or all of the pages in @i + * aren't in memory. For iterators that are likely to be in memory, + * fault_in_pages_readable() may be more appropriate. + * + * Unlike fault_in_pages_writeable(), this function is non-destructive even + * when faulting in pages for writing. + * + * Returns the number of bytes faulted in, or the size of @i if @i doesn't need + * faulting in. + */ +size_t fault_in_iov_iter(const struct iov_iter *i) +{ + size_t count = i->count; + const struct iovec *p; + size_t ret = 0, skip; + + if (iter_is_iovec(i)) { + for (p = i->iov, skip = i->iov_offset; count; p++, skip = 0) { + unsigned long len = min(count, p->iov_len - skip); + unsigned long start, l; + + if (unlikely(!len)) + continue; + start = (unsigned long)p->iov_base + skip; + l = fault_in_user_pages(start, len, iov_iter_rw(i) != WRITE); + ret += l; + if (unlikely(l != len)) + break; + count -= l; + } + } + return ret; +} +EXPORT_SYMBOL(fault_in_iov_iter); + void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov, unsigned long nr_segs, size_t count) diff --git a/mm/gup.c b/mm/gup.c index 3ded6a5f26b2..f87fb24d59e1 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1565,6 +1565,74 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start, } #endif /* !CONFIG_MMU */ +/** + * fault_in_user_pages - fault in an address range for reading / writing + * @start: start of address range + * @len: length of address range + * @write: fault in for writing + * + * Note that we don't pin or otherwise hold the pages that we fault + * in. There's no guarantee that they'll stay in memory for any duration of + * time. + * + * Returns the number of bytes faulted in from @start.
+ */ +unsigned long fault_in_user_pages(unsigned long start, unsigned long len, + bool write) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma = NULL; + unsigned long end, nstart, nend; + int locked = 0; + int gup_flags; + + /* + * FIXME: Make sure this function doesn't succeed for pages that cannot + * be accessed; otherwise we could end up in a loop trying to fault in + * and then access the pages. (It's okay if a page gets evicted and we + * need more than one retry.) + */ + + /* + * FIXME: Are these the right FOLL_* flags? + */ + + gup_flags = FOLL_TOUCH | FOLL_POPULATE; + if (write) + gup_flags |= FOLL_WRITE; + + end = PAGE_ALIGN(start + len); + for (nstart = start & PAGE_MASK; nstart < end; nstart = nend) { + unsigned long nr_pages; + long ret; + + if (!locked) { + locked = 1; + mmap_read_lock(mm); + vma = find_vma(mm, nstart); + } else if (nstart >= vma->vm_end) + vma = vma->vm_next; + if (!vma || vma->vm_start >= end) + break; + nend = min(end, vma->vm_end); + if (vma->vm_flags & (VM_IO | VM_PFNMAP)) + continue; + if (nstart < vma->vm_start) + nstart = vma->vm_start; + nr_pages = (nend - nstart) / PAGE_SIZE; + ret = __get_user_pages_locked(mm, nstart, nr_pages, + NULL, NULL, &locked, gup_flags); + if (ret <= 0) + break; + nend = nstart + ret * PAGE_SIZE; + } + if (locked) + mmap_read_unlock(mm); + if (nstart > start) + return min(nstart - start, len); + return 0; +} + /** * get_dump_page() - pin user page in memory while writing it to core dump * @addr: user address From patchwork Sun Jul 18 22:39:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 12384523 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, 
From: Andreas Gruenbacher To: Linus Torvalds Cc: Alexander Viro , Jan Kara , Matthew Wilcox , Christoph Hellwig , cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, Andreas Gruenbacher Subject: [PATCH v2 2/6] iomap: Fix iomap_dio_rw return value for page faults Date: Mon, 19 Jul 2021 00:39:28 +0200 Message-Id: <20210718223932.2703330-3-agruenba@redhat.com> In-Reply-To: <20210718223932.2703330-1-agruenba@redhat.com> References: <20210718223932.2703330-1-agruenba@redhat.com> When a page fault occurs during direct I/O, iomap_dio_rw can currently return 0 when a page cannot be accessed. In that case, -EFAULT should be returned instead. (For reads, a return value of 0 indicates the end of file.) Fix that by converting a return value of 0 from iomap_apply to -EFAULT: at that point, we know that we should have been able to transfer something.
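A minimal userspace sketch of the return-value fixup (dio_fixup_ret() is an invented name; the real change lives inline in __iomap_dio_rw()):

```c
#include <assert.h>
#include <errno.h>

/* Sketch of the fix: a return of 0 here means no bytes could be
 * transferred because a page was inaccessible -- not end of file -- so
 * 0 is mapped to -EFAULT.  For a read, a 0 return is reserved for EOF,
 * which is why the caller could not otherwise tell the cases apart. */
static long dio_fixup_ret(long ret)
{
    if (ret == 0)
        return -EFAULT;  /* page was not accessible; let the caller retry */
    return ret;          /* real errors and byte counts pass through */
}
```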
Signed-off-by: Andreas Gruenbacher --- fs/iomap/direct-io.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 9398b8c31323..a87a43ee8278 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -561,6 +561,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio, iomap_dio_actor); if (ret <= 0) { + if (ret == 0) + ret = -EFAULT; /* magic error code to fall back to buffered I/O */ if (ret == -ENOTBLK) { wait_for_completion = true; From patchwork Sun Jul 18 22:39:29 2021 X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 12384527
From: Andreas Gruenbacher To: Linus Torvalds Cc: Alexander Viro , Jan Kara , Matthew Wilcox , Christoph Hellwig , cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, Andreas Gruenbacher Subject: [PATCH v2 3/6] gfs2: Add wrapper for iomap_file_buffered_write Date: Mon, 19 Jul 2021 00:39:29 +0200 Message-Id: <20210718223932.2703330-4-agruenba@redhat.com> In-Reply-To: <20210718223932.2703330-1-agruenba@redhat.com> References: <20210718223932.2703330-1-agruenba@redhat.com> Add a wrapper around iomap_file_buffered_write. We'll add code here later for when the operation needs to be retried.
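The shape of the wrapper can be sketched in userspace like this (bdi_state, inner_write(), and buffered_write() are invented stand-ins for current->backing_dev_info, iomap_file_buffered_write(), and the new gfs2_file_buffered_write()):

```c
#include <assert.h>

static int bdi_state;          /* stands in for current->backing_dev_info */
static long inner_calls;

static long inner_write(void)  /* stands in for iomap_file_buffered_write() */
{
    assert(bdi_state == 1);    /* must run with backing_dev_info set */
    inner_calls++;
    return 512;                /* bytes written */
}

/* The wrapper factors the backing_dev_info set/clear out of both call
 * sites, giving the retry logic added later a single place to live. */
static long buffered_write(void)
{
    long ret;

    bdi_state = 1;             /* current->backing_dev_info = inode_to_bdi(inode) */
    ret = inner_write();
    bdi_state = 0;             /* current->backing_dev_info = NULL */
    return ret;
}
```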
Signed-off-by: Andreas Gruenbacher --- fs/gfs2/file.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 493a83e3f590..13f701493c3c 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -857,6 +857,18 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) return written ? written : ret; } +static ssize_t gfs2_file_buffered_write(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + ssize_t ret; + + current->backing_dev_info = inode_to_bdi(inode); + ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops); + current->backing_dev_info = NULL; + return ret; +} + /** * gfs2_file_write_iter - Perform a write to a file * @iocb: The io context @@ -908,9 +920,7 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) goto out_unlock; iocb->ki_flags |= IOCB_DSYNC; - current->backing_dev_info = inode_to_bdi(inode); - buffered = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops); - current->backing_dev_info = NULL; + buffered = gfs2_file_buffered_write(iocb, from); if (unlikely(buffered <= 0)) { if (!ret) ret = buffered; @@ -932,9 +942,7 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (!ret || ret2 > 0) ret += ret2; } else { - current->backing_dev_info = inode_to_bdi(inode); - ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops); - current->backing_dev_info = NULL; + ret = gfs2_file_buffered_write(iocb, from); if (likely(ret > 0)) { iocb->ki_pos += ret; ret = generic_write_sync(iocb, ret); From patchwork Sun Jul 18 22:39:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 12384525 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
From: Andreas Gruenbacher To: Linus Torvalds Cc: Alexander Viro , Jan Kara , Matthew Wilcox , Christoph Hellwig , cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, Andreas Gruenbacher Subject: [PATCH v2 4/6] gfs2: Fix mmap + page fault deadlocks for buffered I/O Date: Mon, 19 Jul 2021 00:39:30 +0200 Message-Id: <20210718223932.2703330-5-agruenba@redhat.com> In-Reply-To: <20210718223932.2703330-1-agruenba@redhat.com> References: <20210718223932.2703330-1-agruenba@redhat.com> In the .read_iter and .write_iter file operations, we're accessing user-space memory while holding the inode's glock. There's a possibility that the memory is mapped to the same file, in which case we'd recurse on the same glock. More complex scenarios can involve multiple glocks, processes, and even cluster nodes. Avoid these kinds of problems by disabling page faults while holding a glock. If a page fault occurs, we either end up with a partial read or write, or with -EFAULT if nothing could be read or written. In that case, we drop the glock, fault in the requested pages manually, and repeat the operation. This locking problem in gfs2 was originally reported by Jan Kara. Linus came up with the proposal to disable page faults. Many thanks to Al Viro and Matthew Wilcox for their feedback as well.
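The drop-lock/fault-in/retry pattern can be modeled in userspace as follows (all names here are illustrative stand-ins for the kernel API, and the "page fault" is simulated by a counter):

```c
#include <assert.h>
#include <errno.h>

static int faults_pending;   /* attempts that will hit a "page fault" */
static int lock_held;

static long do_io(void)      /* stands in for generic_file_read_iter() */
{
    assert(lock_held);       /* I/O only happens under the lock */
    if (faults_pending > 0) {
        faults_pending--;
        return -EFAULT;      /* pagefault_disable() turned the fault into an error */
    }
    return 4096;             /* bytes transferred */
}

/* Model of the retry loop: take the lock, attempt the I/O with page
 * faults disabled, drop the lock, and -- with the lock no longer held --
 * fault the pages in manually and retry on -EFAULT. */
static long locked_io_with_retry(void)
{
    long ret;
retry:
    lock_held = 1;           /* gfs2_glock_nq() */
    ret = do_io();
    lock_held = 0;           /* gfs2_glock_dq() */
    if (ret == -EFAULT)      /* fault_in_iov_iter() would run here and succeed */
        goto retry;
    return ret;
}
```

The crucial property is that the manual fault-in happens only after the lock is dropped, which is what breaks the deadlock cycle described above.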
Signed-off-by: Andreas Gruenbacher --- fs/gfs2/file.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 13f701493c3c..99df7934b4d8 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -824,6 +824,12 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) size_t written = 0; ssize_t ret; + /* + * In this function, we disable page faults when we're holding the + * inode glock while doing I/O. If a page fault occurs, we drop the + * inode glock, fault in the pages manually, and then we retry. + */ + if (iocb->ki_flags & IOCB_DIRECT) { ret = gfs2_file_direct_read(iocb, to, &gh); if (likely(ret != -ENOTBLK)) @@ -831,6 +837,7 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) iocb->ki_flags &= ~IOCB_DIRECT; } iocb->ki_flags |= IOCB_NOIO; + /* Leave page faults enabled while we're not holding any locks. */ ret = generic_file_read_iter(iocb, to); iocb->ki_flags &= ~IOCB_NOIO; if (ret >= 0) { @@ -845,13 +852,19 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) } ip = GFS2_I(iocb->ki_filp->f_mapping->host); gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); +retry: ret = gfs2_glock_nq(&gh); if (ret) goto out_uninit; + pagefault_disable(); ret = generic_file_read_iter(iocb, to); + pagefault_enable(); if (ret > 0) written += ret; gfs2_glock_dq(&gh); + if (unlikely(iov_iter_count(to) && (ret > 0 || ret == -EFAULT)) && + fault_in_iov_iter(to)) + goto retry; out_uninit: gfs2_holder_uninit(&gh); return written ? written : ret; @@ -863,9 +876,20 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb, struct iov_iter *fro struct inode *inode = file_inode(file); ssize_t ret; + /* + * In this function, we disable page faults when we're holding the + * inode glock while doing I/O. If a page fault occurs, we drop the + * inode glock, fault in the pages manually, and then we retry.
+ */ + +retry: current->backing_dev_info = inode_to_bdi(inode); + pagefault_disable(); ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops); + pagefault_enable(); current->backing_dev_info = NULL; + if (unlikely(ret == -EFAULT) && fault_in_iov_iter(from)) + goto retry; return ret; } From patchwork Sun Jul 18 22:39:31 2021 X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 12384529
From: Andreas Gruenbacher To: Linus Torvalds Cc: Alexander Viro , Jan Kara , Matthew Wilcox , Christoph Hellwig , cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, Andreas Gruenbacher Subject: [PATCH v2 5/6] iov_iter: Introduce ITER_FLAG_FAST_ONLY flag Date: Mon, 19 Jul 2021 00:39:31 +0200 Message-Id: <20210718223932.2703330-6-agruenba@redhat.com> In-Reply-To: <20210718223932.2703330-1-agruenba@redhat.com> References: <20210718223932.2703330-1-agruenba@redhat.com> Introduce a new ITER_FLAG_FAST_ONLY flag to indicate to get_user_pages to use the FOLL_FAST_ONLY flag. This will cause get_user_pages to fail when it would otherwise fault in a page. Currently, the ITER_FLAG_FAST_ONLY flag is only checked in iov_iter_get_pages and iov_iter_get_pages_alloc. This is enough for iomap_dio_rw(), but it may make sense to check for this flag in other contexts as well.
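The bit-packing in iov_iter.type that this patch extends can be checked in isolation; here is a userspace model (ITER_READ/ITER_WRITE stand in for the kernel's READ/WRITE values 0 and 1, and the accessors mirror iov_iter_type(), iov_iter_rw(), and the new iov_iter_is_fast_only()):

```c
#include <assert.h>

/* Flag values copied from the patch; READ/WRITE occupy bit 0. */
enum { ITER_READ = 0, ITER_WRITE = 1 };
enum {
    ITER_FLAG_FAST_ONLY = 2,
    ITER_IOVEC  = 4,
    ITER_KVEC   = 8,
};

/* iov_iter_type(): mask off the r/w bit and the flag bit. */
static unsigned int iter_type(unsigned int type)
{
    return type & ~(ITER_READ | ITER_WRITE | ITER_FLAG_FAST_ONLY);
}

/* iov_iter_rw(): just the read/write bit. */
static unsigned int iter_rw(unsigned int type)
{
    return type & (ITER_READ | ITER_WRITE);
}

/* iov_iter_is_fast_only(): is the FOLL_FAST_ONLY request set? */
static int iter_is_fast_only(unsigned int type)
{
    return !!(type & ITER_FLAG_FAST_ONLY);
}
```

Because ITER_FLAG_FAST_ONLY lives below the lowest type bit, setting or clearing it never changes what iov_iter_type() reports.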
Signed-off-by: Andreas Gruenbacher --- include/linux/uio.h | 15 ++++++++++++--- lib/iov_iter.c | 20 +++++++++++++++----- 2 files changed, 27 insertions(+), 8 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 74f819c41735..d3d629c2153a 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -18,6 +18,9 @@ struct kvec { }; enum iter_type { + /* set if get_user_pages should use FOLL_FAST_ONLY */ + ITER_FLAG_FAST_ONLY = 2, + /* iter types */ ITER_IOVEC = 4, ITER_KVEC = 8, @@ -30,8 +33,9 @@ enum iter_type { struct iov_iter { /* * Bit 0 is the read/write bit, set if we're writing. - * Bit 1 is the BVEC_FLAG_NO_REF bit, set if type is a bvec and - * the caller isn't expecting to drop a page reference when done. + * Bit 1 is the ITER_FLAG_FAST_ONLY bit, set if get_user_pages + * should use the FOLL_FAST_ONLY flag when trying to fault in pages + * (only useful for type ITER_IOVEC). */ unsigned int type; size_t iov_offset; @@ -55,7 +59,7 @@ struct iov_iter { static inline enum iter_type iov_iter_type(const struct iov_iter *i) { - return i->type & ~(READ | WRITE); + return i->type & ~(READ | WRITE | ITER_FLAG_FAST_ONLY); } static inline bool iter_is_iovec(const struct iov_iter *i) @@ -93,6 +97,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i) return i->type & (READ | WRITE); } +static inline bool iov_iter_is_fast_only(const struct iov_iter *i) +{ + return i->type & ITER_FLAG_FAST_ONLY; +} + /* * Total number of bytes covered by an iovec. 
* diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 3beecf8f77de..182ff2afed19 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1538,6 +1538,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start) { + unsigned int gup_flags = 0; + if (maxsize > i->count) maxsize = i->count; @@ -1548,6 +1550,11 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, if (unlikely(iov_iter_is_discard(i))) return -EFAULT; + if (iov_iter_rw(i) != WRITE) + gup_flags |= FOLL_WRITE; + if (iov_iter_is_fast_only(i)) + gup_flags |= FOLL_FAST_ONLY; + iterate_all_kinds(i, maxsize, v, ({ unsigned long addr = (unsigned long)v.iov_base; size_t len = v.iov_len + (*start = addr & (PAGE_SIZE - 1)); @@ -1558,9 +1565,7 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, len = maxpages * PAGE_SIZE; addr &= ~(PAGE_SIZE - 1); n = DIV_ROUND_UP(len, PAGE_SIZE); - res = get_user_pages_fast(addr, n, - iov_iter_rw(i) != WRITE ? FOLL_WRITE : 0, - pages); + res = get_user_pages_fast(addr, n, gup_flags, pages); if (unlikely(res < 0)) return res; return (res == n ? len : res * PAGE_SIZE) - *start; @@ -1665,6 +1670,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start) { + unsigned int gup_flags = 0; struct page **p; if (maxsize > i->count) @@ -1677,6 +1683,11 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, if (unlikely(iov_iter_is_discard(i))) return -EFAULT; + if (iov_iter_rw(i) != WRITE) + gup_flags |= FOLL_WRITE; + if (iov_iter_is_fast_only(i)) + gup_flags |= FOLL_FAST_ONLY; + iterate_all_kinds(i, maxsize, v, ({ unsigned long addr = (unsigned long)v.iov_base; size_t len = v.iov_len + (*start = addr & (PAGE_SIZE - 1)); @@ -1688,8 +1699,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, p = get_pages_array(n); if (!p) return -ENOMEM; - res = get_user_pages_fast(addr, n, - iov_iter_rw(i) != WRITE ? 
FOLL_WRITE : 0, p); + res = get_user_pages_fast(addr, n, gup_flags, p); if (unlikely(res < 0)) { kvfree(p); return res; From patchwork Sun Jul 18 22:39:32 2021 X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 12384531
(mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-422-_lYvoSUYM2qhfapcOfSkeQ-1; Sun, 18 Jul 2021 18:40:10 -0400 X-MC-Unique: _lYvoSUYM2qhfapcOfSkeQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3994F362F8; Sun, 18 Jul 2021 22:40:09 +0000 (UTC) Received: from max.com (unknown [10.40.195.75]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6D5A560C0F; Sun, 18 Jul 2021 22:40:02 +0000 (UTC) From: Andreas Gruenbacher To: Linus Torvalds Cc: Alexander Viro , Jan Kara , Matthew Wilcox , Christoph Hellwig , cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, Andreas Gruenbacher Subject: [PATCH v2 6/6] gfs2: Fix mmap + page fault deadlocks for direct I/O Date: Mon, 19 Jul 2021 00:39:32 +0200 Message-Id: <20210718223932.2703330-7-agruenba@redhat.com> In-Reply-To: <20210718223932.2703330-1-agruenba@redhat.com> References: <20210718223932.2703330-1-agruenba@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Direct I/O differs from buffered I/O in that it uses bio_iov_iter_get_pages for grabbing page references and for manually faulting in pages instead of triggering actual page faults. For disabling these manual page faults, it's not enough to call pagefault_disable(); instead, we use the new ITER_FLAG_FAST_ONLY flag for telling iomap_dio_rw to stop faulting pages in for us. 
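[Editor's note: the gup_flags accumulation that patch 1 of this series introduces in lib/iov_iter.c is behavior-preserving, and ITER_FLAG_FAST_ONLY simply composes one more flag into it. A minimal user-space sketch of that equivalence — with stand-in values, not the kernel's real FOLL_* constants or READ/WRITE definitions — looks like this:]

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values; the real FOLL_* constants live in linux/mm.h. */
#define FOLL_WRITE     0x01u
#define FOLL_FAST_ONLY 0x02u
#define READ  0
#define WRITE 1

/* Old style: the write flag chosen inline at each call site.
 * Note rw != WRITE means the iterator is a read *destination*, so the
 * transfer writes into user memory and needs FOLL_WRITE on the pages. */
static unsigned int old_gup_flags(int rw)
{
	return rw != WRITE ? FOLL_WRITE : 0;
}

/* New style: flags accumulated up front, so additional conditions
 * (here a stand-in for iov_iter_is_fast_only()) compose without
 * widening every get_user_pages_fast() call site. */
static unsigned int new_gup_flags(int rw, bool fast_only)
{
	unsigned int gup_flags = 0;

	if (rw != WRITE)
		gup_flags |= FOLL_WRITE;
	if (fast_only)
		gup_flags |= FOLL_FAST_ONLY;
	return gup_flags;
}
```

[With fast_only false, the new form reduces to the old conditional for both directions; with fast_only set, FOLL_FAST_ONLY is simply OR-ed in on top.]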
Signed-off-by: Andreas Gruenbacher
---
 fs/gfs2/file.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 99df7934b4d8..6feb857a8a1c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -763,21 +763,42 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 	struct file *file = iocb->ki_filp;
 	struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);
 	size_t count = iov_iter_count(to);
+	size_t written = 0;
 	ssize_t ret;
 
+	/*
+	 * In this function, we disable page faults when we're holding the
+	 * inode glock while doing I/O.  If a page fault occurs, we drop the
+	 * inode glock, fault in the pages manually, and then we retry.  Unlike
+	 * in gfs2_file_read_iter, iomap_dio_rw can trigger implicit as
+	 * well as manual page faults, and we need to disable both kinds
+	 * separately.
+	 */
+
 	if (!count)
 		return 0; /* skip atime */
 
 	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, gh);
+retry:
 	ret = gfs2_glock_nq(gh);
 	if (ret)
 		goto out_uninit;
 
+	pagefault_disable();
+	to->type |= ITER_FLAG_FAST_ONLY;
 	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, 0);
+	to->type &= ~ITER_FLAG_FAST_ONLY;
+	pagefault_enable();
+
 	gfs2_glock_dq(gh);
+	if (ret > 0)
+		written += ret;
+	if (unlikely(iov_iter_count(to) && (ret > 0 || ret == -EFAULT)) &&
+	    fault_in_iov_iter(to))
+		goto retry;
 out_uninit:
 	gfs2_holder_uninit(gh);
-	return ret;
+	return written ? written : ret;
 }
 
 static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
@@ -790,6 +811,12 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	loff_t offset = iocb->ki_pos;
 	ssize_t ret;
 
+	/*
+	 * In this function, we disable page faults when we're holding the
+	 * inode glock while doing I/O.  If a page fault occurs, we drop the
+	 * inode glock, fault in the pages manually, and then we retry.
+	 */
+
 	/*
 	 * Deferred lock, even if it's a write, since we do no allocation on
 	 * this path.  All we need to change is the atime, and this lock mode
@@ -799,6 +826,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	 * VFS does.
 	 */
 	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, gh);
+retry:
 	ret = gfs2_glock_nq(gh);
 	if (ret)
 		goto out_uninit;
@@ -807,11 +835,16 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 	if (offset + len > i_size_read(&ip->i_inode))
 		goto out;
 
+	from->type |= ITER_FLAG_FAST_ONLY;
 	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, 0);
+	from->type &= ~ITER_FLAG_FAST_ONLY;
+
 	if (ret == -ENOTBLK)
 		ret = 0;
 out:
 	gfs2_glock_dq(gh);
+	if (unlikely(ret == -EFAULT) && fault_in_iov_iter(from))
+		goto retry;
 out_uninit:
 	gfs2_holder_uninit(gh);
 	return ret;
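[Editor's note: both gfs2 direct I/O paths follow the same dance — take the glock, attempt the transfer with faulting suppressed, and on -EFAULT drop the glock, fault the pages in manually, and retry. A rough user-space illustration of that shape only — all functions below are stand-ins, nothing like gfs2's real glock or iomap machinery — is:]

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Stand-in state: the user buffer's page becomes accessible only after
 * a manual fault-in, mimicking a page that is not yet resident. */
static bool page_resident = false;

/* Mimics iomap_dio_rw() on an iterator with ITER_FLAG_FAST_ONLY set:
 * it may only touch pages that are already resident, and reports
 * -EFAULT instead of taking a page fault. */
static ssize_t dio_attempt(size_t count)
{
	if (!page_resident)
		return -EFAULT;		/* would have needed a page fault */
	return (ssize_t)count;		/* full transfer */
}

/* Mimics fault_in_iov_iter(): non-destructively makes the pages
 * resident; a "true" return means progress was made, so retrying
 * cannot loop forever. */
static bool fault_in(void)
{
	page_resident = true;
	return true;
}

/* The retry pattern from gfs2_file_direct_read(), reduced to its shape:
 * lock, attempt with faults suppressed, and on -EFAULT drop the lock,
 * fault in manually, and retry. */
static ssize_t locked_dio_read(size_t count)
{
	ssize_t ret;
retry:
	/* the glock is acquired here in the real code (gfs2_glock_nq) */
	ret = dio_attempt(count);
	/* ...and dropped here (gfs2_glock_dq), before any fault-in */
	if (ret == -EFAULT && fault_in())
		goto retry;
	return ret;
}
```

[The point of the shape is that the manual fault-in happens strictly outside the lock, which is what breaks the mmap + page fault deadlock the patch title refers to.]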