diff mbox series

[v3,3/6] fs: use copy_from_user_nolog() to copy mount() data

Message ID 20211208044808.872554-4-pcc@google.com (mailing list archive)
State New, archived
Headers show
Series kernel: introduce uaccess logging | expand

Commit Message

Peter Collingbourne Dec. 8, 2021, 4:48 a.m. UTC
With uaccess logging the contract is that the kernel must not report
accessing more data than necessary, as this can lead to false positive
reports in downstream consumers. This generally works out of the box
when instrumenting copy_{from,to}_user(), but with the data argument
to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
much as we can, if the PAGE_SIZE sized access failed) and figure out
later how much we actually need.

To prevent this from leading to a false positive report, use
copy_from_user_nolog(), which will prevent the access from being logged.
Recall that it is valid for the kernel to report accessing less
data than it actually accessed, as uaccess logging is a best-effort
mechanism for reporting uaccesses.

Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
Signed-off-by: Peter Collingbourne <pcc@google.com>
---
 fs/namespace.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Dmitry Vyukov Dec. 8, 2021, 9:34 a.m. UTC | #1
On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
>
> With uaccess logging the contract is that the kernel must not report
> accessing more data than necessary, as this can lead to false positive
> reports in downstream consumers. This generally works out of the box
> when instrumenting copy_{from,to}_user(), but with the data argument
> to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> much as we can, if the PAGE_SIZE sized access failed) and figure out
> later how much we actually need.
>
> To prevent this from leading to a false positive report, use
> copy_from_user_nolog(), which will prevent the access from being logged.
> Recall that it is valid for the kernel to report accessing less
> data than it actually accessed, as uaccess logging is a best-effort
> mechanism for reporting uaccesses.
>
> Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> Signed-off-by: Peter Collingbourne <pcc@google.com>
> ---
>  fs/namespace.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 659a8f39c61a..8f5f2aaca64e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -31,6 +31,7 @@
>  #include <uapi/linux/mount.h>
>  #include <linux/fs_context.h>
>  #include <linux/shmem_fs.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include "pnode.h"
>  #include "internal.h"
> @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
>         if (!copy)
>                 return ERR_PTR(-ENOMEM);
>
> -       left = copy_from_user(copy, data, PAGE_SIZE);
> +       /*
> +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> +        * the uaccess buffer, as this can lead to false positive reports in
> +        * downstream consumers.
> +        */
> +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);

A late idea...
Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
flag. Better for user-space, at least can detect UAFs by checking the
first byte. And a more logical kernel annotation (maybe will be used
in some other tools? or if we ever check user tags in the kernel).

Probably not too important today since we use this only in 2 places,
but longer term may be better.

Btw, what's the story with BPF accesses? Can we log them theoretically?

Previously the comment said:

+       /*
+        * Avoid copy_from_user() here as it may leak information about the BPF
+        * program to userspace via the uaccess buffer.
+        */

but now it says something very generic:

/*
* Avoid logging uaccesses here as the BPF program may not be following
* the uaccess log rules.
*/






>         /*
>          * Not all architectures have an exact copy_from_user(). Resort to
> --
> 2.34.1.173.g76aa8bc2d0-goog
Peter Collingbourne Dec. 9, 2021, 9:42 p.m. UTC | #2
On Wed, Dec 8, 2021 at 1:35 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
> >
> > With uaccess logging the contract is that the kernel must not report
> > accessing more data than necessary, as this can lead to false positive
> > reports in downstream consumers. This generally works out of the box
> > when instrumenting copy_{from,to}_user(), but with the data argument
> > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> > much as we can, if the PAGE_SIZE sized access failed) and figure out
> > later how much we actually need.
> >
> > To prevent this from leading to a false positive report, use
> > copy_from_user_nolog(), which will prevent the access from being logged.
> > Recall that it is valid for the kernel to report accessing less
> > data than it actually accessed, as uaccess logging is a best-effort
> > mechanism for reporting uaccesses.
> >
> > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > ---
> >  fs/namespace.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 659a8f39c61a..8f5f2aaca64e 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -31,6 +31,7 @@
> >  #include <uapi/linux/mount.h>
> >  #include <linux/fs_context.h>
> >  #include <linux/shmem_fs.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include "pnode.h"
> >  #include "internal.h"
> > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
> >         if (!copy)
> >                 return ERR_PTR(-ENOMEM);
> >
> > -       left = copy_from_user(copy, data, PAGE_SIZE);
> > +       /*
> > +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> > +        * the uaccess buffer, as this can lead to false positive reports in
> > +        * downstream consumers.
> > +        */
> > +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);
>
> A late idea...
> Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
> flag. Better for user-space, at least can detect UAFs by checking the
> first byte. And a more logical kernel annotation (maybe will be used
> in some other tools? or if we ever check user tags in the kernel).
>
> Probably not too important today since we use this only in 2 places,
> but longer term may be better.

I'm not sure about this. The overreads are basically an implementation
detail of the kernel, so I'm not sure it makes sense to expose them. A
scheme where we expose all overreads wouldn't necessarily help with
UAF, because what if for example the kernel reads *behind* the
user-provided pointer? I guess it could lead to false positives.

> Btw, what's the story with BPF accesses? Can we log them theoretically?
>
> Previously the comment said:
>
> +       /*
> +        * Avoid copy_from_user() here as it may leak information about the BPF
> +        * program to userspace via the uaccess buffer.
> +        */
>
> but now it says something very generic:
>
> /*
> * Avoid logging uaccesses here as the BPF program may not be following
> * the uaccess log rules.
> */

Yes we should be able to log them theoretically, but we don't need to
do that now. See my reply here:

https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly.

Peter
Dmitry Vyukov Dec. 10, 2021, 2:59 a.m. UTC | #3
On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote:
> > > With uaccess logging the contract is that the kernel must not report
> > > accessing more data than necessary, as this can lead to false positive
> > > reports in downstream consumers. This generally works out of the box
> > > when instrumenting copy_{from,to}_user(), but with the data argument
> > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> > > much as we can, if the PAGE_SIZE sized access failed) and figure out
> > > later how much we actually need.
> > >
> > > To prevent this from leading to a false positive report, use
> > > copy_from_user_nolog(), which will prevent the access from being logged.
> > > Recall that it is valid for the kernel to report accessing less
> > > data than it actually accessed, as uaccess logging is a best-effort
> > > mechanism for reporting uaccesses.
> > >
> > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> > > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > > ---
> > >  fs/namespace.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > index 659a8f39c61a..8f5f2aaca64e 100644
> > > --- a/fs/namespace.c
> > > +++ b/fs/namespace.c
> > > @@ -31,6 +31,7 @@
> > >  #include <uapi/linux/mount.h>
> > >  #include <linux/fs_context.h>
> > >  #include <linux/shmem_fs.h>
> > > +#include <linux/uaccess-buffer.h>
> > >
> > >  #include "pnode.h"
> > >  #include "internal.h"
> > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
> > >         if (!copy)
> > >                 return ERR_PTR(-ENOMEM);
> > >
> > > -       left = copy_from_user(copy, data, PAGE_SIZE);
> > > +       /*
> > > +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> > > +        * the uaccess buffer, as this can lead to false positive reports in
> > > +        * downstream consumers.
> > > +        */
> > > +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);
> >
> > A late idea...
> > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
> > flag. Better for user-space, at least can detect UAFs by checking the
> > first byte. And a more logical kernel annotation (maybe will be used
> > in some other tools? or if we ever check user tags in the kernel).
> >
> > Probably not too important today since we use this only in 2 places,
> > but longer term may be better.
>
> I'm not sure about this. The overreads are basically an implementation
> detail of the kernel, so I'm not sure it makes sense to expose them. A
> scheme where we expose all overreads wouldn't necessarily help with
> UAF, because what if for example the kernel reads *behind* the
> user-provided pointer? I guess it could lead to false positives.

If user-space uses logging to check addressability, then it can safely
check only the first byte (right? there must be at least 1 byte passed
by user-space at that address). And that's enough to detect UAFs.

> > Btw, what's the story with BPF accesses? Can we log them theoretically?
> >
> > Previously the comment said:
> >
> > +       /*
> > +        * Avoid copy_from_user() here as it may leak information about the BPF
> > +        * program to userspace via the uaccess buffer.
> > +        */
> >
> > but now it says something very generic:
> >
> > /*
> > * Avoid logging uaccesses here as the BPF program may not be following
> > * the uaccess log rules.
> > */
>
> Yes we should be able to log them theoretically, but we don't need to
> do that now. See my reply here:
>
> https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly.

I see. These could be marked with another flag.
I don't have a strong opinion about this. But I am mentioning this
because my experience is that it's better to expose more raw info from
kernel in these cases, rather than hardcoding policies into kernel
code (what's ignored/why/when) b/c a delay from another kernel change
to wide deployment is 5+ years and user-space code may need to detect
and deal with all various versions of the kernel logic.
Say, fuzzing may still want to know about the mount options (rather
than no signal that the kernel reads at least something at that
address). But adding them later with a flag is not really a backwards
compatible change b/c you now have addressability checking code that's
not checking the new flag and will produce false positives.
Peter Collingbourne Dec. 10, 2021, 4:02 a.m. UTC | #4
On Thu, Dec 9, 2021 at 6:59 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote:
> > > > With uaccess logging the contract is that the kernel must not report
> > > > accessing more data than necessary, as this can lead to false positive
> > > > reports in downstream consumers. This generally works out of the box
> > > > when instrumenting copy_{from,to}_user(), but with the data argument
> > > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> > > > much as we can, if the PAGE_SIZE sized access failed) and figure out
> > > > later how much we actually need.
> > > >
> > > > To prevent this from leading to a false positive report, use
> > > > copy_from_user_nolog(), which will prevent the access from being logged.
> > > > Recall that it is valid for the kernel to report accessing less
> > > > data than it actually accessed, as uaccess logging is a best-effort
> > > > mechanism for reporting uaccesses.
> > > >
> > > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> > > > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > > > ---
> > > >  fs/namespace.c | 8 +++++++-
> > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > index 659a8f39c61a..8f5f2aaca64e 100644
> > > > --- a/fs/namespace.c
> > > > +++ b/fs/namespace.c
> > > > @@ -31,6 +31,7 @@
> > > >  #include <uapi/linux/mount.h>
> > > >  #include <linux/fs_context.h>
> > > >  #include <linux/shmem_fs.h>
> > > > +#include <linux/uaccess-buffer.h>
> > > >
> > > >  #include "pnode.h"
> > > >  #include "internal.h"
> > > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
> > > >         if (!copy)
> > > >                 return ERR_PTR(-ENOMEM);
> > > >
> > > > -       left = copy_from_user(copy, data, PAGE_SIZE);
> > > > +       /*
> > > > +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> > > > +        * the uaccess buffer, as this can lead to false positive reports in
> > > > +        * downstream consumers.
> > > > +        */
> > > > +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);
> > >
> > > A late idea...
> > > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
> > > flag. Better for user-space, at least can detect UAFs by checking the
> > > first byte. And a more logical kernel annotation (maybe will be used
> > > in some other tools? or if we ever check user tags in the kernel).
> > >
> > > Probably not too important today since we use this only in 2 places,
> > > but longer term may be better.
> >
> > I'm not sure about this. The overreads are basically an implementation
> > detail of the kernel, so I'm not sure it makes sense to expose them. A
> > scheme where we expose all overreads wouldn't necessarily help with
> > UAF, because what if for example the kernel reads *behind* the
> > user-provided pointer? I guess it could lead to false positives.
>
> If user-space uses logging to check addressability, then it can safely
> check only the first byte (right? there must be at least 1 byte passed
> by user-space at that address). And that's enough to detect UAFs.

I was thinking more e.g. what if the kernel reads an entire page with
copy_from_user() and takes a subset of it later. Then the first byte
could point to some other random allocation in the same page and lead
to a false UAF report if we just consider the first byte.

So I think the use cases for accesses with this flag set may be
limited to things like fuzzers.

> > > Btw, what's the story with BPF accesses? Can we log them theoretically?
> > >
> > > Previously the comment said:
> > >
> > > +       /*
> > > +        * Avoid copy_from_user() here as it may leak information about the BPF
> > > +        * program to userspace via the uaccess buffer.
> > > +        */
> > >
> > > but now it says something very generic:
> > >
> > > /*
> > > * Avoid logging uaccesses here as the BPF program may not be following
> > > * the uaccess log rules.
> > > */
> >
> > Yes we should be able to log them theoretically, but we don't need to
> > do that now. See my reply here:
> >
> > https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly.
>
> I see. These could be marked with another flag.
> I don't have a strong opinion about this. But I am mentioning this
> because my experience is that it's better to expose more raw info from
> kernel in these cases, rather than hardcoding policies into kernel
> code (what's ignored/why/when) b/c a delay from another kernel change
> to wide deployment is 5+ years and user-space code may need to detect
> and deal with all various versions of the kernel logic.
> Say, fuzzing may still want to know about the mount options (rather
> than no signal that the kernel reads at least something at that
> address). But adding them later with a flag is not really a backwards
> compatible change b/c you now have addressability checking code that's
> not checking the new flag and will produce false positives.

I think this is a good point. I'll see about adding flags for the BPF
and overread cases.

Peter
Dmitry Vyukov Dec. 10, 2021, 4:23 a.m. UTC | #5
On Fri, 10 Dec 2021 at 05:02, Peter Collingbourne <pcc@google.com> wrote:
> > On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote:
> > > > > With uaccess logging the contract is that the kernel must not report
> > > > > accessing more data than necessary, as this can lead to false positive
> > > > > reports in downstream consumers. This generally works out of the box
> > > > > when instrumenting copy_{from,to}_user(), but with the data argument
> > > > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> > > > > much as we can, if the PAGE_SIZE sized access failed) and figure out
> > > > > later how much we actually need.
> > > > >
> > > > > To prevent this from leading to a false positive report, use
> > > > > copy_from_user_nolog(), which will prevent the access from being logged.
> > > > > Recall that it is valid for the kernel to report accessing less
> > > > > data than it actually accessed, as uaccess logging is a best-effort
> > > > > mechanism for reporting uaccesses.
> > > > >
> > > > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> > > > > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > > > > ---
> > > > >  fs/namespace.c | 8 +++++++-
> > > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > > index 659a8f39c61a..8f5f2aaca64e 100644
> > > > > --- a/fs/namespace.c
> > > > > +++ b/fs/namespace.c
> > > > > @@ -31,6 +31,7 @@
> > > > >  #include <uapi/linux/mount.h>
> > > > >  #include <linux/fs_context.h>
> > > > >  #include <linux/shmem_fs.h>
> > > > > +#include <linux/uaccess-buffer.h>
> > > > >
> > > > >  #include "pnode.h"
> > > > >  #include "internal.h"
> > > > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
> > > > >         if (!copy)
> > > > >                 return ERR_PTR(-ENOMEM);
> > > > >
> > > > > -       left = copy_from_user(copy, data, PAGE_SIZE);
> > > > > +       /*
> > > > > +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> > > > > +        * the uaccess buffer, as this can lead to false positive reports in
> > > > > +        * downstream consumers.
> > > > > +        */
> > > > > +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);
> > > >
> > > > A late idea...
> > > > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
> > > > flag. Better for user-space, at least can detect UAFs by checking the
> > > > first byte. And a more logical kernel annotation (maybe will be used
> > > > in some other tools? or if we ever check user tags in the kernel).
> > > >
> > > > Probably not too important today since we use this only in 2 places,
> > > > but longer term may be better.
> > >
> > > I'm not sure about this. The overreads are basically an implementation
> > > detail of the kernel, so I'm not sure it makes sense to expose them. A
> > > scheme where we expose all overreads wouldn't necessarily help with
> > > UAF, because what if for example the kernel reads *behind* the
> > > user-provided pointer? I guess it could lead to false positives.
> >
> > If user-space uses logging to check addressability, then it can safely
> > check only the first byte (right? there must be at least 1 byte passed
> > by user-space at that address). And that's enough to detect UAFs.
>
> I was thinking more e.g. what if the kernel reads an entire page with
> copy_from_user() and takes a subset of it later. Then the first byte
> could point to some other random allocation in the same page and lead
> to a false UAF report if we just consider the first byte.

Humm.. good point. As I said I am not strong about this. I don't know
what's the right answer.

> So I think the use cases for accesses with this flag set may be
> limited to things like fuzzers.
>
> > > > Btw, what's the story with BPF accesses? Can we log them theoretically?
> > > >
> > > > Previously the comment said:
> > > >
> > > > +       /*
> > > > +        * Avoid copy_from_user() here as it may leak information about the BPF
> > > > +        * program to userspace via the uaccess buffer.
> > > > +        */
> > > >
> > > > but now it says something very generic:
> > > >
> > > > /*
> > > > * Avoid logging uaccesses here as the BPF program may not be following
> > > > * the uaccess log rules.
> > > > */
> > >
> > > Yes we should be able to log them theoretically, but we don't need to
> > > do that now. See my reply here:
> > >
> > > https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly.
> >
> > I see. These could be marked with another flag.
> > I don't have a strong opinion about this. But I am mentioning this
> > because my experience is that it's better to expose more raw info from
> > kernel in these cases, rather than hardcoding policies into kernel
> > code (what's ignored/why/when) b/c a delay from another kernel change
> > to wide deployment is 5+ years and user-space code may need to detect
> > and deal with all various versions of the kernel logic.
> > Say, fuzzing may still want to know about the mount options (rather
> > than no signal that the kernel reads at least something at that
> > address). But adding them later with a flag is not really a backwards
> > compatible change b/c you now have addressability checking code that's
> > not checking the new flag and will produce false positives.
>
> I think this is a good point. I'll see about adding flags for the BPF
> and overread cases.
>
> Peter
diff mbox series

Patch

diff --git a/fs/namespace.c b/fs/namespace.c
index 659a8f39c61a..8f5f2aaca64e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -31,6 +31,7 @@ 
 #include <uapi/linux/mount.h>
 #include <linux/fs_context.h>
 #include <linux/shmem_fs.h>
+#include <linux/uaccess-buffer.h>
 
 #include "pnode.h"
 #include "internal.h"
@@ -3197,7 +3198,12 @@  static void *copy_mount_options(const void __user * data)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	left = copy_from_user(copy, data, PAGE_SIZE);
+	/*
+	 * Use copy_from_user_nolog to avoid reporting overly large accesses in
+	 * the uaccess buffer, as this can lead to false positive reports in
+	 * downstream consumers.
+	 */
+	left = copy_from_user_nolog(copy, data, PAGE_SIZE);
 
 	/*
 	 * Not all architectures have an exact copy_from_user(). Resort to