diff mbox series

[v7,18/19] iov_iter: Introduce nofault flag to disable page faults

Message ID 20210827164926.1726765-19-agruenba@redhat.com (mailing list archive)
State New, archived
Headers show
Series gfs2: Fix mmap + page fault deadlocks | expand

Commit Message

Andreas Gruenbacher Aug. 27, 2021, 4:49 p.m. UTC
Introduce a new nofault flag to indicate to get_user_pages to use the
FOLL_NOFAULT flag.  This will cause get_user_pages to fail when it
would otherwise fault in a page.

Currently, the noio flag is only checked in iov_iter_get_pages and
iov_iter_get_pages_alloc.  This is enough for iomaop_dio_rw, but it
may make sense to check in other contexts as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 include/linux/uio.h |  1 +
 lib/iov_iter.c      | 20 +++++++++++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

Comments

Al Viro Aug. 27, 2021, 6:47 p.m. UTC | #1
On Fri, Aug 27, 2021 at 06:49:25PM +0200, Andreas Gruenbacher wrote:
> Introduce a new nofault flag to indicate to get_user_pages to use the
> FOLL_NOFAULT flag.  This will cause get_user_pages to fail when it
> would otherwise fault in a page.
> 
> Currently, the noio flag is only checked in iov_iter_get_pages and
> iov_iter_get_pages_alloc.  This is enough for iomaop_dio_rw, but it
> may make sense to check in other contexts as well.

I can live with that, but
	* direct assignments (as in the next patch) are fucking hard to
grep for.  Is it intended to be "we set it for duration of primitive",
or...?
	* it would be nice to have a description of intended semantics
for that thing.  This "may make sense to check in other contexts" really
needs to be elaborated (and agreed) upon.  Details, please.
Andreas Gruenbacher Aug. 27, 2021, 7:56 p.m. UTC | #2
On Fri, Aug 27, 2021 at 8:47 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Aug 27, 2021 at 06:49:25PM +0200, Andreas Gruenbacher wrote:
> > Introduce a new nofault flag to indicate to get_user_pages to use the
> > FOLL_NOFAULT flag.  This will cause get_user_pages to fail when it
> > would otherwise fault in a page.
> >
> > Currently, the noio flag is only checked in iov_iter_get_pages and
> > iov_iter_get_pages_alloc.  This is enough for iomaop_dio_rw, but it
> > may make sense to check in other contexts as well.
>
> I can live with that, but
>         * direct assignments (as in the next patch) are fucking hard to
> grep for.  Is it intended to be "we set it for duration of primitive",
> or...?

It's for this kind of pattern:

       pagefault_disable();
       to->nofault = true;
       ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL,
                          IOMAP_DIO_PARTIAL, written);
       to->nofault = false;
       pagefault_enable();

Clearing the flag at the end isn't strictly necessary, but it kind of
documents that the flag pertains to iomap_dio_rw and not something
else.

>         * it would be nice to have a description of intended semantics
> for that thing.  This "may make sense to check in other contexts" really
> needs to be elaborated (and agreed) upon.  Details, please.

Maybe the description should just be something like:

"Introduce a new nofault flag to indicate to iov_iter_get_pages not to
fault in user pages.

This is implemented by passing the FOLL_NOFAULT flag to get_user_pages,
which causes get_user_pages to fail when it would otherwise fault in a
page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting
in pages when page faults are not allowed."

Thanks,
Andreas
diff mbox series

Patch

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ffa431aeb067..ea35e511268f 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -29,6 +29,7 @@  enum iter_type {
 
 struct iov_iter {
 	u8 iter_type;
+	bool nofault;
 	bool data_source;
 	size_t iov_offset;
 	size_t count;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 968f2d2595cd..22a82f272754 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -513,6 +513,7 @@  void iov_iter_init(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
 		.iter_type = ITER_IOVEC,
+		.nofault = false,
 		.data_source = direction,
 		.iov = iov,
 		.nr_segs = nr_segs,
@@ -1523,13 +1524,17 @@  ssize_t iov_iter_get_pages(struct iov_iter *i,
 		return 0;
 
 	if (likely(iter_is_iovec(i))) {
+		unsigned int gup_flags = 0;
 		unsigned long addr;
 
+		if (iov_iter_rw(i) != WRITE)
+			gup_flags |= FOLL_WRITE;
+		if (i->nofault)
+			gup_flags |= FOLL_NOFAULT;
+
 		addr = first_iovec_segment(i, &len, start, maxsize, maxpages);
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
-		res = get_user_pages_fast(addr, n,
-				iov_iter_rw(i) != WRITE ?  FOLL_WRITE : 0,
-				pages);
+		res = get_user_pages_fast(addr, n, gup_flags, pages);
 		if (unlikely(res <= 0))
 			return res;
 		return (res == n ? len : res * PAGE_SIZE) - *start;
@@ -1645,15 +1650,20 @@  ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		return 0;
 
 	if (likely(iter_is_iovec(i))) {
+		unsigned int gup_flags = 0;
 		unsigned long addr;
 
+		if (iov_iter_rw(i) != WRITE)
+			gup_flags |= FOLL_WRITE;
+		if (i->nofault)
+			gup_flags |= FOLL_NOFAULT;
+
 		addr = first_iovec_segment(i, &len, start, maxsize, ~0U);
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
 		p = get_pages_array(n);
 		if (!p)
 			return -ENOMEM;
-		res = get_user_pages_fast(addr, n,
-				iov_iter_rw(i) != WRITE ?  FOLL_WRITE : 0, p);
+		res = get_user_pages_fast(addr, n, gup_flags, p);
 		if (unlikely(res <= 0)) {
 			kvfree(p);
 			*pages = NULL;