[v2,0/4] Avoid live-lock in fault-in+uaccess loops with sub-page faults

Message ID 20211201193750.2097885-1-catalin.marinas@arm.com (mailing list archive)
State New, archived

Commit Message

Catalin Marinas Dec. 1, 2021, 7:37 p.m. UTC
Hi,

Following the discussions on the first series,

https://lore.kernel.org/r/20211124192024.2408218-1-catalin.marinas@arm.com

this new patchset aims to generalise the sub-page probing and introduce
a minimum size to the fault_in_*() functions. I called this 'v2' but I
can rebase it on top of v1 and keep v1 as a btrfs live-lock
back-portable fix. The fault_in_*() API improvements would be a new
series. Anyway, I'd first like to know whether this is heading in the
right direction and whether it's worth adding min_size to all
fault_in_*() (more below).

v2 adds a 'min_size' argument to all fault_in_*() functions with current
callers passing 0 (or we could make it 1). A probe_subpage_*() call is
made for the min_size range, though with all 0 this wouldn't have any
effect. The only difference is btrfs search_ioctl() in the last patch
which passes a non-zero min_size to avoid the live-lock (functionally
that's the same as the v1 series).
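
For illustration, the resulting interface and caller shape would be
roughly the following (a sketch only, assuming the functions keep
returning the number of bytes not faulted in; the btrfs call site is
paraphrased, not quoted from the actual patch):

/* returns the number of bytes *not* faulted in, as before */
size_t fault_in_writeable(char __user *uaddr, size_t size, size_t min_size);

/* existing callers keep their old behaviour by passing 0 */
if (fault_in_writeable(ubuf, len, 0))
	return -EFAULT;

/*
 * search_ioctl() instead requests a fully probed minimum, so a
 * sub-page fault within the first item is detected up front:
 */
if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset,
		       sizeof(struct btrfs_ioctl_search_header)))
	return -EFAULT;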

In terms of sub-page probing, I don't think with the current kernel
anything other than search_ioctl() matters. The buffered file I/O can
already cope with current fault_in_*() + copy_*_user() loops (the
uaccess makes progress). Direct I/O either goes via GUP + kernel mapping
access (and memcpy() can't fault) or, if the user buffer is not page
aligned, it may fall back to buffered I/O. So we really only care about
fault_in_writeable(), as in v1.
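
For reference, the loop shape that makes buffered I/O safe looks
roughly like this (a simplified sketch with the current two-argument
prototype, not code from any particular filesystem, and assuming
copy_to_user() reports its progress exactly, i.e. ignoring the
user-copy fuzz discussed below):

static ssize_t copy_loop(char __user *ubuf, const char *kbuf, size_t count)
{
	ssize_t total = count;

	while (count) {
		size_t not_copied;

		/* probes the first, possibly unaligned, byte directly */
		if (fault_in_writeable(ubuf, count))
			return -EFAULT;
		not_copied = copy_to_user(ubuf, kbuf, count);
		/*
		 * Advance past whatever was copied: the next
		 * fault_in_writeable() then starts at the fault address
		 * and its first-byte probe reports the sub-page fault
		 * instead of live-locking. search_ioctl() restarts the
		 * copy from the beginning of the buffer, so it never
		 * probes the faulting address.
		 */
		ubuf += count - not_copied;
		kbuf += count - not_copied;
		count = not_copied;
	}
	return total;
}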

Linus suggested that we could use the min_size to request a minimum
guaranteed probed size (in most cases this would be 1) and put a cap on
the faulted-in size, say two pages. All the fault_in_iov_iter_*()
callers will need to check the actual quantity returned by fault_in_*()
rather than bail out on non-zero, but Andreas has a patch already (though
I think there are a few cases in btrfs etc.):

https://lore.kernel.org/r/20211123151812.361624-1-agruenba@redhat.com

With these callers fixed, we could add something like the diff below.
But, again, min_size doesn't actually have any current use in the kernel
other than fault_in_writeable() and search_ioctl().

Thanks for having a look. Suggestions welcomed.

------------------8<-------------------------------

Catalin Marinas (4):
  mm: Introduce a 'min_size' argument to fault_in_*()
  mm: Probe for sub-page faults in fault_in_*()
  arm64: Add support for user sub-page fault probing
  btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page
    faults

 arch/Kconfig                        |  7 ++++
 arch/arm64/Kconfig                  |  1 +
 arch/arm64/include/asm/uaccess.h    | 59 +++++++++++++++++++++++++++++
 arch/powerpc/kernel/kvm.c           |  2 +-
 arch/powerpc/kernel/signal_32.c     |  4 +-
 arch/powerpc/kernel/signal_64.c     |  2 +-
 arch/x86/kernel/fpu/signal.c        |  2 +-
 drivers/gpu/drm/armada/armada_gem.c |  2 +-
 fs/btrfs/file.c                     |  6 +--
 fs/btrfs/ioctl.c                    |  7 +++-
 fs/f2fs/file.c                      |  2 +-
 fs/fuse/file.c                      |  2 +-
 fs/gfs2/file.c                      |  8 ++--
 fs/iomap/buffered-io.c              |  2 +-
 fs/ntfs/file.c                      |  2 +-
 fs/ntfs3/file.c                     |  2 +-
 include/linux/pagemap.h             |  8 ++--
 include/linux/uaccess.h             | 53 ++++++++++++++++++++++++++
 include/linux/uio.h                 |  6 ++-
 lib/iov_iter.c                      | 28 +++++++++++---
 mm/filemap.c                        |  2 +-
 mm/gup.c                            | 37 +++++++++++++-----
 22 files changed, 203 insertions(+), 41 deletions(-)

Comments

Andreas Gruenbacher Dec. 3, 2021, 3:29 p.m. UTC | #1
Catalin,

On Wed, Dec 1, 2021 at 8:38 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> Hi,
>
> Following the discussions on the first series,
>
> https://lore.kernel.org/r/20211124192024.2408218-1-catalin.marinas@arm.com
>
> this new patchset aims to generalise the sub-page probing and introduce
> a minimum size to the fault_in_*() functions. I called this 'v2' but I
> can rebase it on top of v1 and keep v1 as a btrfs live-lock
> back-portable fix.

That's what I was actually expecting: an updated patch series that
changes the btrfs code to keep track of the user-copy fault address,
plus the corresponding changes to the fault_in functions to call the
appropriate arch functions, and the arch functions that probe far
enough from the fault address to prevent live-locks. In this step, how
far the arch functions need to probe depends on the fault windows of
the user-copy functions.
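
Roughly this shape, in other words (a hypothetical sketch; ubuf,
buf_size and sk_offset are named after the search_ioctl() code, and
copy_items_to_user() stands in for btrfs's copy_to_sk() machinery, it
is not a real function):

size_t sk_offset = 0;
int ret;

while (1) {
	/*
	 * sk_offset tracks how far the previous copy got, so the retry
	 * faults in from the fault address, and the arch probe only
	 * needs to cover the user-copy fault window from there.
	 */
	if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
		return -EFAULT;	/* probe at the fault address failed */

	ret = copy_items_to_user(ubuf, buf_size, &sk_offset);
	if (ret != -EFAULT)
		break;
	/* -EFAULT: sk_offset was advanced up to the fault point */
}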

A next step (as a patch series on top) would be to make sure direct
I/O also takes sub-page faults into account. That seems to require
probing the entire address range before the actual copying. A concern
I have about this is time-of-check versus time-of-use: what if
sub-page faults are added after the probing but before the copying?
Other than that, an approach like adding min_size parameters might
work, except that maybe we can find a better name. Also, in order not
to make things even more messy, the fault_in functions should probably
continue to report how much of the address range they've failed to
fault in. Callers can then check for themselves whether the function
could fault in min_size bytes or not.
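
So the caller-side check would look something like this (hypothetical,
assuming the functions keep returning the length of the tail they
failed to fault in):

size_t not_faulted_in = fault_in_writeable(uaddr, size, min_size);

if (size - not_faulted_in < min_size)
	return -EFAULT;		/* couldn't fault in even min_size bytes */
size -= not_faulted_in;		/* continue with the usable prefix */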

> The fault_in_*() API improvements would be a new
> series. Anyway, I'd first like to know whether this is heading in the
> right direction and whether it's worth adding min_size to all
> fault_in_*() (more below).
>
> v2 adds a 'min_size' argument to all fault_in_*() functions with current
> callers passing 0 (or we could make it 1). A probe_subpage_*() call is
> made for the min_size range, though with all 0 this wouldn't have any
> effect. The only difference is btrfs search_ioctl() in the last patch
> which passes a non-zero min_size to avoid the live-lock (functionally
> that's the same as the v1 series).

In the btrfs case, the copying will already trigger sub-page faults;
we only need to make sure that the next fault-in attempt happens at
the fault address. (And that the fault_in functions take the user-copy
fuzz into account, which we also need for byte granularity copying
anyway.) Otherwise, we're creating the same time-of-check versus
time-of-use disparity as for direct I/O here, unnecessarily.

> In terms of sub-page probing, I don't think with the current kernel
> anything other than search_ioctl() matters. The buffered file I/O can
> already cope with current fault_in_*() + copy_*_user() loops (the
> uaccess makes progress). Direct I/O either goes via GUP + kernel mapping
> access (and memcpy() can't fault) or, if the user buffer is not page
> aligned, it may fall back to buffered I/O. So we really only care about
> fault_in_writeable(), as in v1.

Yes from a regression point of view, but note that direct I/O still
circumvents the sub-page fault checking, which seems to defeat the
whole point.

> Linus suggested that we could use the min_size to request a minimum
> guaranteed probed size (in most cases this would be 1) and put a cap on
> the faulted-in size, say two pages. All the fault_in_iov_iter_*()
> callers will need to check the actual quantity returned by fault_in_*()
> rather than bail out on non-zero, but Andreas has a patch already (though
> I think there are a few cases in btrfs etc.):
>
> https://lore.kernel.org/r/20211123151812.361624-1-agruenba@redhat.com
>
> With these callers fixed, we could add something like the diff below.
> But, again, min_size doesn't actually have any current use in the kernel
> other than fault_in_writeable() and search_ioctl().

We're trying pretty hard to handle large I/O requests efficiently at
the filesystem level. A small, static upper limit in the fault-in
functions has the potential to ruin those efforts. So I'm not a fan of
that.

> Thanks for having a look. Suggestions welcomed.
>
> ------------------8<-------------------------------
> diff --git a/mm/gup.c b/mm/gup.c
> index 7fa69b0fb859..3aa88aa8ce9d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1658,6 +1658,8 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
>  }
>  #endif /* !CONFIG_MMU */
>
> +#define MAX_FAULT_IN_SIZE      (2 * PAGE_SIZE)
> +
>  /**
>   * fault_in_writeable - fault in userspace address range for writing
>   * @uaddr: start of address range
> @@ -1671,6 +1673,7 @@ size_t fault_in_writeable(char __user *uaddr, size_t size, size_t min_size)
>  {
>         char __user *start = uaddr, *end;
>         size_t faulted_in = size;
> +       size_t max_size = clamp_t(size_t, MAX_FAULT_IN_SIZE, min_size, size);
>
>         if (unlikely(size == 0))
>                 return 0;
> @@ -1679,7 +1682,7 @@ size_t fault_in_writeable(char __user *uaddr, size_t size, size_t min_size)
>                         return size;
>                 uaddr = (char __user *)PAGE_ALIGN((unsigned long)uaddr);
>         }
> -       end = (char __user *)PAGE_ALIGN((unsigned long)start + size);
> +       end = (char __user *)PAGE_ALIGN((unsigned long)start + max_size);
>         if (unlikely(end < start))
>                 end = NULL;
>         while (uaddr != end) {
> @@ -1726,9 +1729,10 @@ size_t fault_in_safe_writeable(const char __user *uaddr, size_t size,
>         struct vm_area_struct *vma = NULL;
>         int locked = 0;
>         size_t faulted_in = size;
> +       size_t max_size = clamp_t(size_t, MAX_FAULT_IN_SIZE, min_size, size);
>
>         nstart = start & PAGE_MASK;
> -       end = PAGE_ALIGN(start + size);
> +       end = PAGE_ALIGN(start + max_size);
>         if (end < nstart)
>                 end = 0;
>         for (; nstart != end; nstart = nend) {
> @@ -1759,7 +1763,7 @@ size_t fault_in_safe_writeable(const char __user *uaddr, size_t size,
>         if (locked)
>                 mmap_read_unlock(mm);
>         if (nstart != end)
> -               faulted_in = min_t(size_t, nstart - start, size);
> +               faulted_in = min_t(size_t, nstart - start, max_size);
>         if (faulted_in < min_size ||
>             (min_size && probe_subpage_safe_writeable(uaddr, min_size)))
>                 return size;
> @@ -1782,6 +1786,7 @@ size_t fault_in_readable(const char __user *uaddr, size_t size,
>         const char __user *start = uaddr, *end;
>         volatile char c;
>         size_t faulted_in = size;
> +       size_t max_size = clamp_t(size_t, MAX_FAULT_IN_SIZE, min_size, size);
>
>         if (unlikely(size == 0))
>                 return 0;
> @@ -1790,7 +1795,7 @@ size_t fault_in_readable(const char __user *uaddr, size_t size,
>                         return size;
>                 uaddr = (const char __user *)PAGE_ALIGN((unsigned long)uaddr);
>         }
> -       end = (const char __user *)PAGE_ALIGN((unsigned long)start + size);
> +       end = (const char __user *)PAGE_ALIGN((unsigned long)start + max_size);
>         if (unlikely(end < start))
>                 end = NULL;
>         while (uaddr != end) {
> ------------------8<-------------------------------
>
> Catalin Marinas (4):
>   mm: Introduce a 'min_size' argument to fault_in_*()
>   mm: Probe for sub-page faults in fault_in_*()
>   arm64: Add support for user sub-page fault probing
>   btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page
>     faults
>
>  arch/Kconfig                        |  7 ++++
>  arch/arm64/Kconfig                  |  1 +
>  arch/arm64/include/asm/uaccess.h    | 59 +++++++++++++++++++++++++++++
>  arch/powerpc/kernel/kvm.c           |  2 +-
>  arch/powerpc/kernel/signal_32.c     |  4 +-
>  arch/powerpc/kernel/signal_64.c     |  2 +-
>  arch/x86/kernel/fpu/signal.c        |  2 +-
>  drivers/gpu/drm/armada/armada_gem.c |  2 +-
>  fs/btrfs/file.c                     |  6 +--
>  fs/btrfs/ioctl.c                    |  7 +++-
>  fs/f2fs/file.c                      |  2 +-
>  fs/fuse/file.c                      |  2 +-
>  fs/gfs2/file.c                      |  8 ++--
>  fs/iomap/buffered-io.c              |  2 +-
>  fs/ntfs/file.c                      |  2 +-
>  fs/ntfs3/file.c                     |  2 +-
>  include/linux/pagemap.h             |  8 ++--
>  include/linux/uaccess.h             | 53 ++++++++++++++++++++++++++
>  include/linux/uio.h                 |  6 ++-
>  lib/iov_iter.c                      | 28 +++++++++++---
>  mm/filemap.c                        |  2 +-
>  mm/gup.c                            | 37 +++++++++++++-----
>  22 files changed, 203 insertions(+), 41 deletions(-)
>

Thanks,
Andreas
Linus Torvalds Dec. 3, 2021, 5:57 p.m. UTC | #2
On Fri, Dec 3, 2021 at 7:29 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
>
> We're trying pretty hard to handle large I/O requests efficiently at
> the filesystem level. A small, static upper limit in the fault-in
> functions has the potential to ruin those efforts. So I'm not a fan of
> that.

I don't think fault-in should happen under any sane normal circumstances.

Except for low-memory situations, and then you don't want to fault in
large areas.

Do you really expect to write big areas that the user has never even
touched? That would be literally insane.

And if the user _has_ touched them, then they'll be in-core. Except
for the "swapped out" case.

End result: this is purely a correctness issue, not a performance issue.

                       Linus
Andreas Gruenbacher Dec. 3, 2021, 6:11 p.m. UTC | #3
On Fri, Dec 3, 2021 at 6:58 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Dec 3, 2021 at 7:29 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
> > We're trying pretty hard to handle large I/O requests efficiently at
> > the filesystem level. A small, static upper limit in the fault-in
> > functions has the potential to ruin those efforts. So I'm not a fan of
> > that.
>
> I don't think fault-in should happen under any sane normal circumstances.
>
> Except for low-memory situations, and then you don't want to fault in
> large areas.
>
> Do you really expect to write big areas that the user has never even
> touched? That would be literally insane.
>
> And if the user _has_ touched them, then they'll be in-core. Except
> for the "swapped out" case.
>
> End result: this is purely a correctness issue, not a performance issue.

It happens when you mmap a file and write the mmapped region to
another file, for example. I don't think we want to make filesystems
go bonkers in such scenarios. Scaling down in response to memory
pressure sounds perfectly fine though.
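
A minimal userspace illustration of that scenario (error handling
trimmed for brevity):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int copy_via_mmap(const char *from, const char *to)
{
	int in = open(from, O_RDONLY);
	int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	struct stat st;
	void *src;

	fstat(in, &st);
	src = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, in, 0);
	/*
	 * The write() uaccess reads from the mapping; pages of 'src'
	 * that were never touched have to be faulted in by the
	 * kernel's fault-in/copy retry loop.
	 */
	write(out, src, st.st_size);
	munmap(src, st.st_size);
	close(in);
	close(out);
	return 0;
}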

Thanks,
Andreas
Linus Torvalds Dec. 3, 2021, 6:25 p.m. UTC | #4
On Fri, Dec 3, 2021 at 10:12 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
> It happens when you mmap a file and write the mmapped region to
> another file, for example.

Do you actually have such loads? Nobody should use mmap() for
single-access file copy purposes. It's slower than just doing the copy,
exactly because of the page fault overhead.

In other words, you seem to be worrying about the performance of a
load that is _explicitly_ badly written. You should be fixing the
application, not making the kernel do stupid things.

Also, it's worth noting that that situation should be caught by the
page-in code, which will map multiple pages in one go
(do_fault_around() - for when the pages are cached), and do the
readahead logic (filemap_fault() - for when the pages aren't in the
page cache).

Both of which are a lot more important than the "synchronously fault
in pages one at a time".

                Linus
Catalin Marinas Dec. 3, 2021, 7:51 p.m. UTC | #5
Hi Andreas,

On Fri, Dec 03, 2021 at 04:29:18PM +0100, Andreas Gruenbacher wrote:
> On Wed, Dec 1, 2021 at 8:38 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Following the discussions on the first series,
> >
> > https://lore.kernel.org/r/20211124192024.2408218-1-catalin.marinas@arm.com
> >
> > this new patchset aims to generalise the sub-page probing and introduce
> > a minimum size to the fault_in_*() functions. I called this 'v2' but I
> > can rebase it on top of v1 and keep v1 as a btrfs live-lock
> > back-portable fix.
> 
> That's what I was actually expecting: an updated patch series that
> changes the btrfs code to keep track of the user-copy fault address,
> plus the corresponding changes to the fault_in functions to call the
> appropriate arch functions, and the arch functions that probe far
> enough from the fault address to prevent live-locks. In this step, how
> far the arch functions need to probe depends on the fault windows of
> the user-copy functions.

I have that series as well; see the top patch here (well, you've seen it
already):

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=devel/btrfs-live-lock-fix

But I'm not convinced it's worth it if we go for the approach in v2
here. A key difference between v2 and the above branch is that in v2
probe_subpage_writeable() checks exactly the range it is given
(min_size), while in the devel/btrfs-live-lock-fix branch it can be
given PAGE_SIZE or more but only checks the first 16 bytes, to cover
the copy_to_user() error margin. The latter assumes that the caller
will always attempt the fault_in() from where the uaccess failed,
rather than relying on the fault_in() itself to avoid the live-lock.

The v1 posted earlier also checks the full range, but only in
fault_in_writeable(), which seems to be the only function that matters
for btrfs in the arm64 case.
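
For reference, the granule-stepping idea behind such a probe looks
roughly like this (a sketch, not the code in the arm64 patch; it relies
on MTE tag checks operating on 16-byte granules and firing on loads
too, so a read per granule is enough once the pages are writable):

static inline int probe_user_range_mte(const char __user *uaddr, size_t size)
{
	const char __user *end = uaddr + size;
	char val;

	if (!system_supports_mte())
		return 0;
	/* the (possibly unaligned) first byte ... */
	if (get_user(val, uaddr))
		return -EFAULT;
	/*
	 * ... then one byte per granule; writeability of the page was
	 * already established by the regular fault-in.
	 */
	for (uaddr = PTR_ALIGN(uaddr, MTE_GRANULE_SIZE); uaddr < end;
	     uaddr += MTE_GRANULE_SIZE)
		if (get_user(val, uaddr))
			return -EFAULT;
	return 0;
}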

Maybe I should post the other series as an alternative and get some
input on it.

> A next step (as a patch series on top) would be to make sure direct
> I/O also takes sub-page faults into account. That seems to require
> probing the entire address range before the actual copying. A concern
> I have about this is time-of-check versus time-of-use: what if
> sub-page faults are added after the probing but before the copying?

With direct I/O (that doesn't fall back to buffered), the access is done
via the kernel mapping following a get_user_pages(). Since the access
here cannot cope with exceptions, it must be unchecked. Yes, we do have
the time-of-check vs time-of-use problem, but I'm not worried. I regard
MTE as a probabilistic security feature. Things get even murkier if the
I/O is done by some DMA engine, which ignores tags anyway.

CHERI, OTOH, is a lot stricter, but there is no time-of-check vs
time-of-use issue there since all permissions are encoded in the pointer
itself (we might just expand access_ok() to take this into account).

We could use the min_size logic for this, I think: in functions like
gfs2_file_direct_read() you'd have to fault in first and then invoke
iomap_dio_rw().
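
I.e. something like this (a hypothetical sketch, not proposed code;
'to' and 'count' named as in the current gfs2_file_direct_read(), with
the min_size semantics of this series):

/* probe the entire destination up front: min_size == full size */
if (fault_in_iov_iter_writeable(to, count, count))
	return -EFAULT;
ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, 0);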

Anyway, like I said before, I'd leave the MTE accesses for direct I/O
unchecked as they currently are; I don't think it's worth the effort or
the potential slow-down (it would be significant).

> Other than that, an approach like adding min_size parameters might
> work except that maybe we can find a better name. Also, in order not
> to make things even more messy, the fault_in functions should probably
> continue to report how much of the address range they've failed to
> fault in. Callers can then check for themselves whether the function
> could fault in min_size bytes or not.

That's fine as well. I did it this way because I found the logic easier
to write.

> > The fault_in_*() API improvements would be a new
> > series. Anyway, I'd first like to know whether this is heading in the
> > right direction and whether it's worth adding min_size to all
> > fault_in_*() (more below).
> >
> > v2 adds a 'min_size' argument to all fault_in_*() functions with current
> > callers passing 0 (or we could make it 1). A probe_subpage_*() call is
> > made for the min_size range, though with all 0 this wouldn't have any
> > effect. The only difference is btrfs search_ioctl() in the last patch
> > which passes a non-zero min_size to avoid the live-lock (functionally
> > that's the same as the v1 series).
> 
> In the btrfs case, the copying will already trigger sub-page faults;
> we only need to make sure that the next fault-in attempt happens at
> the fault address. (And that the fault_in functions take the user-copy
> fuzz into account, which we also need for byte granularity copying
> anyway.) Otherwise, we're creating the same time-of-check versus
> time-of-use disparity as for direct-IO here, unnecessarily.

I don't think it matters for btrfs. In a way, you'd have the
time-of-check vs time-of-use problem even if you fault in from where the
uaccess failed. It's just that in practice it's impossible to live-lock,
as it would need very precise synchronisation to change the tags from
another CPU. But you do guarantee that the uaccess was correct.

> > In terms of sub-page probing, I don't think with the current kernel
> > anything other than search_ioctl() matters. The buffered file I/O can
> > already cope with current fault_in_*() + copy_*_user() loops (the
> > uaccess makes progress). Direct I/O either goes via GUP + kernel mapping
> > access (and memcpy() can't fault) or, if the user buffer is not PAGE
> > aligned, it may fall back to buffered I/O. So we really only care about
> > fault_in_writeable(), as in v1.
> 
> Yes from a regression point of view, but note that direct I/O still
> circumvents the sub-page fault checking, which seems to defeat the
> whole point.

It doesn't entirely defeat it. From my perspective, MTE is more of a
best effort to find use-after-free and similar bugs. It has a
performance penalty and I wouldn't want to make it worse. Some libc
allocators even go for untagged (unchecked) memory if the required size
is over some threshold (usually when the allocation falls back to
multiple pages). Such memory is more likely to be involved in direct I/O
anyway, so the additional check in fault_in() wouldn't matter.

> > Linus suggested that we could use the min_size to request a minimum
> > guaranteed probed size (in most cases this would be 1) and put a cap on
> > the faulted-in size, say two pages. All the fault_in_iov_iter_*()
> > callers will need to check the actual quantity returned by fault_in_*()
> rather than bail out on non-zero, but Andreas has a patch already (though
> > I think there are a few cases in btrfs etc.):
> >
> > https://lore.kernel.org/r/20211123151812.361624-1-agruenba@redhat.com
> >
> > With these callers fixed, we could add something like the diff below.
> > But, again, min_size doesn't actually have any current use in the kernel
> > other than fault_in_writeable() and search_ioctl().
> 
> We're trying pretty hard to handle large I/O requests efficiently at
> the filesystem level. A small, static upper limit in the fault-in
> functions has the potential to ruin those efforts. So I'm not a fan of
> that.

I can't comment on this; I haven't spent much time in fs land. But I did
notice that generic_perform_write(), for example, already limits the
fault_in() to PAGE_SIZE, so this potential min_size optimisation
wouldn't make any difference there.
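
For reference, the relevant loop (abbreviated from mm/filemap.c, with
most of the per-page work elided):

do {
	unsigned long offset = pos & (PAGE_SIZE - 1);
	size_t bytes = min_t(unsigned long, PAGE_SIZE - offset,
			     iov_iter_count(i));

	/* the fault-in is already capped at one page per iteration */
	if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
		status = -EFAULT;
		break;
	}
	/* ... ->write_begin(), copy_page_from_iter_atomic(), ->write_end() ... */
} while (iov_iter_count(i));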

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 7fa69b0fb859..3aa88aa8ce9d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1658,6 +1658,8 @@  static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 }
 #endif /* !CONFIG_MMU */
 
+#define MAX_FAULT_IN_SIZE	(2 * PAGE_SIZE)
+
 /**
  * fault_in_writeable - fault in userspace address range for writing
  * @uaddr: start of address range
@@ -1671,6 +1673,7 @@  size_t fault_in_writeable(char __user *uaddr, size_t size, size_t min_size)
 {
 	char __user *start = uaddr, *end;
 	size_t faulted_in = size;
+	size_t max_size = clamp_t(size_t, MAX_FAULT_IN_SIZE, min_size, size);
 
 	if (unlikely(size == 0))
 		return 0;
@@ -1679,7 +1682,7 @@  size_t fault_in_writeable(char __user *uaddr, size_t size, size_t min_size)
 			return size;
 		uaddr = (char __user *)PAGE_ALIGN((unsigned long)uaddr);
 	}
-	end = (char __user *)PAGE_ALIGN((unsigned long)start + size);
+	end = (char __user *)PAGE_ALIGN((unsigned long)start + max_size);
 	if (unlikely(end < start))
 		end = NULL;
 	while (uaddr != end) {
@@ -1726,9 +1729,10 @@  size_t fault_in_safe_writeable(const char __user *uaddr, size_t size,
 	struct vm_area_struct *vma = NULL;
 	int locked = 0;
 	size_t faulted_in = size;
+	size_t max_size = clamp_t(size_t, MAX_FAULT_IN_SIZE, min_size, size);
 
 	nstart = start & PAGE_MASK;
-	end = PAGE_ALIGN(start + size);
+	end = PAGE_ALIGN(start + max_size);
 	if (end < nstart)
 		end = 0;
 	for (; nstart != end; nstart = nend) {
@@ -1759,7 +1763,7 @@  size_t fault_in_safe_writeable(const char __user *uaddr, size_t size,
 	if (locked)
 		mmap_read_unlock(mm);
 	if (nstart != end)
-		faulted_in = min_t(size_t, nstart - start, size);
+		faulted_in = min_t(size_t, nstart - start, max_size);
 	if (faulted_in < min_size ||
 	    (min_size && probe_subpage_safe_writeable(uaddr, min_size)))
 		return size;
@@ -1782,6 +1786,7 @@  size_t fault_in_readable(const char __user *uaddr, size_t size,
 	const char __user *start = uaddr, *end;
 	volatile char c;
 	size_t faulted_in = size;
+	size_t max_size = clamp_t(size_t, MAX_FAULT_IN_SIZE, min_size, size);
 
 	if (unlikely(size == 0))
 		return 0;
@@ -1790,7 +1795,7 @@  size_t fault_in_readable(const char __user *uaddr, size_t size,
 			return size;
 		uaddr = (const char __user *)PAGE_ALIGN((unsigned long)uaddr);
 	}
-	end = (const char __user *)PAGE_ALIGN((unsigned long)start + size);
+	end = (const char __user *)PAGE_ALIGN((unsigned long)start + max_size);
 	if (unlikely(end < start))
 		end = NULL;
 	while (uaddr != end) {