[PULL] vhost: cleanups and fixes

Message ID	20181101171938-mutt-send-email-mst@kernel.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Date: Thu, 1 Nov 2018 17:19:38 -0400 From: "Michael S. Tsirkin" <mst@redhat.com> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, bijan.mottahedeh@oracle.com, gedwards@ddn.com, joe@perches.com, lenaic@lhuard.fr, liang.z.li@intel.com, mhocko@kernel.org, mhocko@suse.com, mst@redhat.com, stefanha@redhat.com, torvalds@linux-foundation.org, wei.w.wang@intel.com Subject: [PULL] vhost: cleanups and fixes Message-ID: <20181101171938-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit Sender: kvm-owner@vger.kernel.org Precedence: bulk
Series	[PULL] vhost: cleanups and fixes \| expand [PULL] vhost: cleanups and fixes

Michael S. Tsirkin Nov. 1, 2018, 9:19 p.m. UTC

The following changes since commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d:

  Linux 4.19 (2018-10-22 07:37:37 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 79f800b2e76923cd8ce0aa659cb5c019d9643bc9:

  MAINTAINERS: remove reference to bogus vsock file (2018-10-24 21:16:14 -0400)

----------------------------------------------------------------
virtio, vhost: fixes, tweaks

virtio balloon page hinting support
vhost scsi control queue

misc fixes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Bijan Mottahedeh (3):
      vhost/scsi: Respond to control queue operations
      vhost/scsi: Extract common handling code from control queue handler
      vhost/scsi: Use common handling code in request queue handler

Greg Edwards (1):
      vhost/scsi: truncate T10 PI iov_iter to prot_bytes

Lénaïc Huard (1):
      kvm_config: add CONFIG_VIRTIO_MENU

Stefan Hajnoczi (1):
      MAINTAINERS: remove reference to bogus vsock file

Wei Wang (3):
      virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
      mm/page_poison: expose page_poisoning_enabled to kernel modules
      virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

 MAINTAINERS                         |   1 -
 drivers/vhost/scsi.c                | 426 ++++++++++++++++++++++++++++--------
 drivers/virtio/virtio_balloon.c     | 380 +++++++++++++++++++++++++++++---
 include/uapi/linux/virtio_balloon.h |   8 +
 kernel/configs/kvm_guest.config     |   1 +
 mm/page_poison.c                    |   6 +
 6 files changed, 688 insertions(+), 134 deletions(-)

Linus Torvalds Nov. 1, 2018, 9:44 p.m. UTC | #1

On Thu, Nov 1, 2018 at 2:19 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> virtio, vhost: fixes, tweaks

Pulled.

                   Linus

Kees Cook Nov. 1, 2018, 11 p.m. UTC | #2

On Thu, Nov 1, 2018 at 2:19 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> The following changes since commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d:
>
>   Linux 4.19 (2018-10-22 07:37:37 +0100)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
>
> for you to fetch changes up to 79f800b2e76923cd8ce0aa659cb5c019d9643bc9:
>
>   MAINTAINERS: remove reference to bogus vsock file (2018-10-24 21:16:14 -0400)
>
> ----------------------------------------------------------------
> virtio, vhost: fixes, tweaks
>
> virtio balloon page hinting support
> vhost scsi control queue
>
> misc fixes.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ----------------------------------------------------------------
> Bijan Mottahedeh (3):
>       vhost/scsi: Respond to control queue operations

+static void
+vhost_scsi_send_tmf_resp(struct vhost_scsi *vs,
+                          struct vhost_virtqueue *vq,
+                          int head, unsigned int out)
+{
+       struct virtio_scsi_ctrl_tmf_resp __user *resp;
+       struct virtio_scsi_ctrl_tmf_resp rsp;
+       int ret;
+
+       pr_debug("%s\n", __func__);
+       memset(&rsp, 0, sizeof(rsp));
+       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+       resp = vq->iov[out].iov_base;
+       ret = __copy_to_user(resp, &rsp, sizeof(rsp));

Is it actually safe to trust that iov_base has passed an earlier
access_ok() check here? Why not just use copy_to_user() instead?

-Kees

>       vhost/scsi: Extract common handling code from control queue handler
>       vhost/scsi: Use common handling code in request queue handler
>
> Greg Edwards (1):
>       vhost/scsi: truncate T10 PI iov_iter to prot_bytes
>
> Lénaïc Huard (1):
>       kvm_config: add CONFIG_VIRTIO_MENU
>
> Stefan Hajnoczi (1):
>       MAINTAINERS: remove reference to bogus vsock file
>
> Wei Wang (3):
>       virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>       mm/page_poison: expose page_poisoning_enabled to kernel modules
>       virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>
>  MAINTAINERS                         |   1 -
>  drivers/vhost/scsi.c                | 426 ++++++++++++++++++++++++++++--------
>  drivers/virtio/virtio_balloon.c     | 380 +++++++++++++++++++++++++++++---
>  include/uapi/linux/virtio_balloon.h |   8 +
>  kernel/configs/kvm_guest.config     |   1 +
>  mm/page_poison.c                    |   6 +
>  6 files changed, 688 insertions(+), 134 deletions(-)

Linus Torvalds Nov. 1, 2018, 11:06 p.m. UTC | #3

On Thu, Nov 1, 2018 at 4:00 PM Kees Cook <keescook@chromium.org> wrote:
>
> +       memset(&rsp, 0, sizeof(rsp));
> +       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
> +       resp = vq->iov[out].iov_base;
> +       ret = __copy_to_user(resp, &rsp, sizeof(rsp));
>
> Is it actually safe to trust that iov_base has passed an earlier
> access_ok() check here? Why not just use copy_to_user() instead?

Good point.

We really should have removed those double-underscore things ages ago.

Also, apart from the address, what about the size? Wouldn't it be
better to use copy_to_iter() rather than implement it badly by hand?

               Linus

Michael S. Tsirkin Nov. 1, 2018, 11:38 p.m. UTC | #4

On Thu, Nov 01, 2018 at 04:00:23PM -0700, Kees Cook wrote:
> On Thu, Nov 1, 2018 at 2:19 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > The following changes since commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d:
> >
> >   Linux 4.19 (2018-10-22 07:37:37 +0100)
> >
> > are available in the Git repository at:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> >
> > for you to fetch changes up to 79f800b2e76923cd8ce0aa659cb5c019d9643bc9:
> >
> >   MAINTAINERS: remove reference to bogus vsock file (2018-10-24 21:16:14 -0400)
> >
> > ----------------------------------------------------------------
> > virtio, vhost: fixes, tweaks
> >
> > virtio balloon page hinting support
> > vhost scsi control queue
> >
> > misc fixes.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > ----------------------------------------------------------------
> > Bijan Mottahedeh (3):
> >       vhost/scsi: Respond to control queue operations
> 
> +static void
> +vhost_scsi_send_tmf_resp(struct vhost_scsi *vs,
> +                          struct vhost_virtqueue *vq,
> +                          int head, unsigned int out)
> +{
> +       struct virtio_scsi_ctrl_tmf_resp __user *resp;
> +       struct virtio_scsi_ctrl_tmf_resp rsp;
> +       int ret;
> +
> +       pr_debug("%s\n", __func__);
> +       memset(&rsp, 0, sizeof(rsp));
> +       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
> +       resp = vq->iov[out].iov_base;
> +       ret = __copy_to_user(resp, &rsp, sizeof(rsp));
> 
> Is it actually safe to trust that iov_base has passed an earlier
> access_ok() check here? Why not just use copy_to_user() instead?
> 
> -Kees

I am not sure copy_to_user will do the right thing here, because all
this runs in context of a kernel thread.  We do need access_ok which
takes place way earlier in context of the task.

Another reason it is safe is because the address is not
coming from userspace at all.




> >       vhost/scsi: Extract common handling code from control queue handler
> >       vhost/scsi: Use common handling code in request queue handler
> >
> > Greg Edwards (1):
> >       vhost/scsi: truncate T10 PI iov_iter to prot_bytes
> >
> > Lénaïc Huard (1):
> >       kvm_config: add CONFIG_VIRTIO_MENU
> >
> > Stefan Hajnoczi (1):
> >       MAINTAINERS: remove reference to bogus vsock file
> >
> > Wei Wang (3):
> >       virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >       mm/page_poison: expose page_poisoning_enabled to kernel modules
> >       virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
> >
> >  MAINTAINERS                         |   1 -
> >  drivers/vhost/scsi.c                | 426 ++++++++++++++++++++++++++++--------
> >  drivers/virtio/virtio_balloon.c     | 380 +++++++++++++++++++++++++++++---
> >  include/uapi/linux/virtio_balloon.h |   8 +
> >  kernel/configs/kvm_guest.config     |   1 +
> >  mm/page_poison.c                    |   6 +
> >  6 files changed, 688 insertions(+), 134 deletions(-)
> 
> 
> 
> -- 
> Kees Cook

Michael S. Tsirkin Nov. 1, 2018, 11:55 p.m. UTC | #5

On Thu, Nov 01, 2018 at 04:06:19PM -0700, Linus Torvalds wrote:
> On Thu, Nov 1, 2018 at 4:00 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > +       memset(&rsp, 0, sizeof(rsp));
> > +       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
> > +       resp = vq->iov[out].iov_base;
> > +       ret = __copy_to_user(resp, &rsp, sizeof(rsp));
> >
> > Is it actually safe to trust that iov_base has passed an earlier
> > access_ok() check here? Why not just use copy_to_user() instead?
> 
> Good point.
> 
> We really should have removed those double-underscore things ages ago.

Well in case of vhost there are a bunch of reasons to keep them.

One is that all access_ok checks take place
way earlier in context of the owner task. Result is saved
and then used for access repeatedly. Skipping reding access_ok twice
did seem to give a small speedup in the past.
In fact I am looking
into switching some of the uses to unsafe_put_user/unsafe_get_user
after doing something like barrier_nospec after the
access_ok checks. Seems to give a measureable speedup.

Another is that the double underscore things actually can be done
in softirq context if you do preempt_disable/preempt_enable.
IIUC Jason's looking into using that to cut down the latency
for when the access is very small.

> Also, apart from the address, what about the size? Wouldn't it be
> better to use copy_to_iter() rather than implement it badly by hand?
> 
>                Linus

Generally size is checked when we retrieve the iov but I will recheck
this case and reply here.

Mark Rutland Nov. 2, 2018, 11:46 a.m. UTC | #6

On Thu, Nov 01, 2018 at 04:06:19PM -0700, Linus Torvalds wrote:
> On Thu, Nov 1, 2018 at 4:00 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > +       memset(&rsp, 0, sizeof(rsp));
> > +       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
> > +       resp = vq->iov[out].iov_base;
> > +       ret = __copy_to_user(resp, &rsp, sizeof(rsp));
> >
> > Is it actually safe to trust that iov_base has passed an earlier
> > access_ok() check here? Why not just use copy_to_user() instead?
> 
> Good point.
> 
> We really should have removed those double-underscore things ages ago.

FWIW, on arm64 we always check/sanitize the user address as a result of
our sanitization of speculated values. Almost all of our uaccess
routines have an explicit access_ok().

All our uaccess routines mask the user pointer based on addr_limit,
which prevents speculative or architectural uaccess to kernel addresses
when addr_limit it USER_DS:

    4d8efc2d5ee4c9cc ("arm64: Use pointer masking to limit uaccess speculation")

We also inhibit speculative stores to addr_limit being forwarded under
speculation:

    c2f0ad4fc089cff8 ("arm64: uaccess: Prevent speculative use of the current addr_limit")

... and given all that, we folded explicit access_ok() checks into
__{get,put}_user():

    84624087dd7e3b48 ("arm64: uaccess: Don't bother eliding access_ok checks in __{get, put}_user")

IMO we could/should do the same for __copy_{to,from}_user().

Thanks,
Mark.

Michael S. Tsirkin Nov. 2, 2018, 1:04 p.m. UTC | #7

On Fri, Nov 02, 2018 at 11:46:36AM +0000, Mark Rutland wrote:
> On Thu, Nov 01, 2018 at 04:06:19PM -0700, Linus Torvalds wrote:
> > On Thu, Nov 1, 2018 at 4:00 PM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > +       memset(&rsp, 0, sizeof(rsp));
> > > +       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
> > > +       resp = vq->iov[out].iov_base;
> > > +       ret = __copy_to_user(resp, &rsp, sizeof(rsp));
> > >
> > > Is it actually safe to trust that iov_base has passed an earlier
> > > access_ok() check here? Why not just use copy_to_user() instead?
> > 
> > Good point.
> > 
> > We really should have removed those double-underscore things ages ago.
> 
> FWIW, on arm64 we always check/sanitize the user address as a result of
> our sanitization of speculated values. Almost all of our uaccess
> routines have an explicit access_ok().
> 
> All our uaccess routines mask the user pointer based on addr_limit,
> which prevents speculative or architectural uaccess to kernel addresses
> when addr_limit it USER_DS:
> 
>     4d8efc2d5ee4c9cc ("arm64: Use pointer masking to limit uaccess speculation")
> 
> We also inhibit speculative stores to addr_limit being forwarded under
> speculation:
> 
>     c2f0ad4fc089cff8 ("arm64: uaccess: Prevent speculative use of the current addr_limit")
> 
> ... and given all that, we folded explicit access_ok() checks into
> __{get,put}_user():
> 
>     84624087dd7e3b48 ("arm64: uaccess: Don't bother eliding access_ok checks in __{get, put}_user")
> 
> IMO we could/should do the same for __copy_{to,from}_user().
> 
> Thanks,
> Mark.

I've tried making access_ok mask the parameter it gets.  Works because
access_ok is a macro. Most users pass in a variable so that will block
attempts to use speculation to bypass the access_ok checks.

Not 100% as someone can copy the value before access_ok, but
then it's all mitigation anyway.

Places which call access_ok on a non-lvalue need to be fixed then but
there are not too many of these.

The advantage here is that a code like this:

access_ok
for(...)
	__get_user

isn't slowed down as the masking is outside the loop.

OTOH macros changing their arguments are kind of ugly.
What do others think?

Just to show what I mean:

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index aae77eb8491c..c4d12c8f47d7 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -7,6 +7,7 @@
 #include <linux/compiler.h>
 #include <linux/kasan-checks.h>
 #include <linux/string.h>
+#include <linux/nospec.h>
 #include <asm/asm.h>
 #include <asm/page.h>
 #include <asm/smap.h>
@@ -69,6 +70,33 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
 	__chk_range_not_ok((unsigned long __force)(addr), size, limit); \
 })
 
+/*
+ * Test whether a block of memory is a valid user space address.
+ * Returns 0 if the range is valid, address itself otherwise.
+ */
+static inline unsigned long __verify_range_nospec(unsigned long addr,
+						  unsigned long size,
+						  unsigned long limit)
+{
+	/* Be careful about overflow */
+	limit = array_index_nospec(limit, size);
+
+	/*
+	 * If we have used "sizeof()" for the size,
+	 * we know it won't overflow the limit (but
+	 * it might overflow the 'addr', so it's
+	 * important to subtract the size from the
+	 * limit, not add it to the address).
+	 */
+	if (__builtin_constant_p(size)) {
+		return array_index_nospec(addr, limit - size + 1);
+	}
+
+	/* Arbitrary sizes? Be careful about overflow */
+	return array_index_mask_nospec(limit, size) &
+		array_index_nospec(addr, limit - size + 1);
+}
+
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 # define WARN_ON_IN_IRQ()	WARN_ON_ONCE(!in_task())
 #else
@@ -95,12 +123,46 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
  * checks that the pointer is in the user space range - after calling
  * this function, memory access functions may still return -EFAULT.
  */
-#define access_ok(type, addr, size)					\
+#define unsafe_access_ok(type, addr, size)					\
 ({									\
 	WARN_ON_IN_IRQ();						\
 	likely(!__range_not_ok(addr, size, user_addr_max()));		\
 })
 
+/**
+ * access_ok_nospec: - Checks if a user space pointer is valid
+ * @type: Type of access: %VERIFY_READ or %VERIFY_WRITE.  Note that
+ *        %VERIFY_WRITE is a superset of %VERIFY_READ - if it is safe
+ *        to write to a block, it is always safe to read from it.
+ * @addr: User space pointer to start of block to check
+ * @size: Size of block to check
+ *
+ * Context: User context only. This function may sleep if pagefaults are
+ *          enabled.
+ *
+ * Checks if a pointer to a block of memory in user space is valid.
+ *
+ * Returns address itself (nonzero) if the memory block may be valid,
+ * zero if it is definitely invalid.
+ *
+ * To prevent speculation, the returned value must then be used
+ * for accesses.
+ *
+ * Note that, depending on architecture, this function probably just
+ * checks that the pointer is in the user space range - after calling
+ * this function, memory access functions may still return -EFAULT.
+ */
+#define access_ok_nospec(type, addr, size)			\
+({								\
+	WARN_ON_IN_IRQ();					\
+	__chk_user_ptr(addr);					\
+	addr = (typeof(addr) __force)				\
+	__verify_range_nospec((unsigned long __force)(addr),	\
+			       size, user_addr_max());		\
+})
+
+#define access_ok(type, addr, size) access_ok_nospec(type, addr, size)
+
 /*
  * These are the main single-value transfer routines.  They automatically
  * use the right size if we just have the right pointer type.

Linus Torvalds Nov. 2, 2018, 4:14 p.m. UTC | #8

On Fri, Nov 2, 2018 at 6:04 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> I've tried making access_ok mask the parameter it gets.

PLEASE don't do this.

Just use "copy_to/from_user()".

We have had lots of bugs because code bitrots.

And no, the access_ok() checks aren't expensive, not even in a loop.
They *used* to be somewhat expensive compared to the access, but that
simply isn't true any more. The real expense in copy_to_user and
friends are in the user access bit setting (STAC and CLAC on x86),
which easily an order of magnitude more expensive than access_ok().

So just get rid of the double-underscore version. It's basically
always a mis-optimization due to entirely historical reasons. I can
pretty much guarantee that it's not visible in profiles.

                  Linus

Michael S. Tsirkin Nov. 2, 2018, 4:59 p.m. UTC | #9

On Fri, Nov 02, 2018 at 09:14:51AM -0700, Linus Torvalds wrote:
> On Fri, Nov 2, 2018 at 6:04 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > I've tried making access_ok mask the parameter it gets.
> 
> PLEASE don't do this.

Okay.

> Just use "copy_to/from_user()".

Just for completeness I'd like to point out for vhost the copies are
done from the kernel thread.  So yes we can switch to copy_to/from_user
but for e.g. 32-bit userspace running on top of a 64 bit kernel it is
IIUC not sufficient - we must *also* do access_ok checks on control path
when addresses are passed to the kernel and when current points to the
correct task struct.

> We have had lots of bugs because code bitrots.

Yes, I wish we did not need these access_ok checks and could just rely
on copy_to/from_user.

> And no, the access_ok() checks aren't expensive, not even in a loop.
> They *used* to be somewhat expensive compared to the access, but that
> simply isn't true any more. The real expense in copy_to_user and
> friends are in the user access bit setting (STAC and CLAC on x86),
> which easily an order of magnitude more expensive than access_ok().
> 
> So just get rid of the double-underscore version. It's basically
> always a mis-optimization due to entirely historical reasons. I can
> pretty much guarantee that it's not visible in profiles.
> 
>                   Linus

OK. So maybe we should focus on switching to user_access_begin/end +
unsafe_get_user/unsafe_put_user in a loop which does seem to be
measureable.  That moves the barrier out of the loop, which seems to be
consistent with what you would expect.

Linus Torvalds Nov. 2, 2018, 5:10 p.m. UTC | #10

On Fri, Nov 2, 2018 at 9:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Just for completeness I'd like to point out for vhost the copies are
> done from the kernel thread.  So yes we can switch to copy_to/from_user
> but for e.g. 32-bit userspace running on top of a 64 bit kernel it is
> IIUC not sufficient - we must *also* do access_ok checks on control path
> when addresses are passed to the kernel and when current points to the
> correct task struct.

Don't you take over the VM with "use_mm()" when you do the copies? So
yes, it's a kernel thread, but it has a user VM, and though that
should have the user limits.

No?

          Linus

Linus Torvalds Nov. 2, 2018, 5:15 p.m. UTC | #11

On Fri, Nov 2, 2018 at 10:10 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Don't you take over the VM with "use_mm()" when you do the copies? So
> yes, it's a kernel thread, but it has a user VM, and though that
> should have the user limits.

Oooh. *Just* as I sent this, I realized that "use_mm()" doesn't update
the thread addr_limit.

That actually looks like a bug to me - although one that you've
apparently been aware of and worked around.

Wouldn't it be nicer to just make "use_mm()" do

        set_fs(USER_DS);

instead? And undo it on unuse_mm()?

And, in fact, maybe we should default kernel threads to have a zero
address limit, so that they can't do any user accesses at all without
doing this?

Adding Al to the cc, because I think he's been looking at set_fs() in general.

                  Linus

Michael S. Tsirkin Nov. 2, 2018, 5:21 p.m. UTC | #12

On Fri, Nov 02, 2018 at 10:10:45AM -0700, Linus Torvalds wrote:
> On Fri, Nov 2, 2018 at 9:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > Just for completeness I'd like to point out for vhost the copies are
> > done from the kernel thread.  So yes we can switch to copy_to/from_user
> > but for e.g. 32-bit userspace running on top of a 64 bit kernel it is
> > IIUC not sufficient - we must *also* do access_ok checks on control path
> > when addresses are passed to the kernel and when current points to the
> > correct task struct.
> 
> Don't you take over the VM with "use_mm()" when you do the copies?

Yes we do.

> So
> yes, it's a kernel thread, but it has a user VM, and though that
> should have the user limits.
> 
> No?
> 
>           Linus

Here's what I meant: we have

#define access_ok(type, addr, size)                                     \
({                                                                      \
        WARN_ON_IN_IRQ();                                               \
        likely(!__range_not_ok(addr, size, user_addr_max()));           \
})

and

#define user_addr_max() (current->thread.addr_limit.seg)

it seems that it depends on current not on the active mm.

get_user and friends are similar:

ENTRY(__get_user_1)
        mov PER_CPU_VAR(current_task), %_ASM_DX
        cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
        jae bad_get_user
        sbb %_ASM_DX, %_ASM_DX          /* array_index_mask_nospec() */
        and %_ASM_DX, %_ASM_AX
        ASM_STAC
1:      movzbl (%_ASM_AX),%edx
        xor %eax,%eax
        ASM_CLAC
        ret
ENDPROC(__get_user_1)
EXPORT_SYMBOL(__get_user_1)

Linus Torvalds Nov. 2, 2018, 6:02 p.m. UTC | #13

On Fri, Nov 2, 2018 at 10:21 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> it seems that it depends on current not on the active mm.

Yes, see my other suggestion to just fix "use_mm()" to update the address range.

Would you mind testing that?

Because that would seem to be the *much* less error-prone model..

           Linus

Michael S. Tsirkin Nov. 2, 2018, 6:12 p.m. UTC | #14

On Fri, Nov 02, 2018 at 11:02:35AM -0700, Linus Torvalds wrote:
> On Fri, Nov 2, 2018 at 10:21 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > it seems that it depends on current not on the active mm.
> 
> Yes, see my other suggestion to just fix "use_mm()" to update the address range.
> 
> Would you mind testing that?

Sure, I'll test it.

> Because that would seem to be the *much* less error-prone model..
> 
>            Linus

I agree, it's always been bothering me.

Al Viro Nov. 2, 2018, 7:01 p.m. UTC | #15

On Fri, Nov 02, 2018 at 10:15:56AM -0700, Linus Torvalds wrote:
> On Fri, Nov 2, 2018 at 10:10 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Don't you take over the VM with "use_mm()" when you do the copies? So
> > yes, it's a kernel thread, but it has a user VM, and though that
> > should have the user limits.
> 
> Oooh. *Just* as I sent this, I realized that "use_mm()" doesn't update
> the thread addr_limit.
> 
> That actually looks like a bug to me - although one that you've
> apparently been aware of and worked around.
> 
> Wouldn't it be nicer to just make "use_mm()" do
> 
>         set_fs(USER_DS);
> 
> instead? And undo it on unuse_mm()?
> 
> And, in fact, maybe we should default kernel threads to have a zero
> address limit, so that they can't do any user accesses at all without
> doing this?

Try it and watch it fail to set initramfs up, let alone exec the init...

> Adding Al to the cc, because I think he's been looking at set_fs() in general.

It would be the right thing (with return to KERNEL_DS), but I'm not certain
if GPU users will survive - these two
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h:157:                         use_mm(mmptr);                          \
drivers/gpu/drm/i915/gvt/kvmgt.c:1799:          use_mm(kvm->mm);
I don't understand the call chains there (especially for the first one) well
enough to tell.

Michael S. Tsirkin Nov. 30, 2018, 1:44 p.m. UTC | #16

On Thu, Nov 01, 2018 at 04:06:19PM -0700, Linus Torvalds wrote:
> On Thu, Nov 1, 2018 at 4:00 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > +       memset(&rsp, 0, sizeof(rsp));
> > +       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
> > +       resp = vq->iov[out].iov_base;
> > +       ret = __copy_to_user(resp, &rsp, sizeof(rsp));
> >
> > Is it actually safe to trust that iov_base has passed an earlier
> > access_ok() check here? Why not just use copy_to_user() instead?
> 
> Good point.
> 
> We really should have removed those double-underscore things ages ago.
> 
> Also, apart from the address, what about the size? Wouldn't it be
> better to use copy_to_iter() rather than implement it badly by hand?
> 
>                Linus

Bijan can you respond please?
Are you going to look into this and convert code to copy_to_iter?
I don't think we should release Linux like this, so if you don't
have the time I'd rather revert for now and you can look
into reposting for the next release.

Thanks,

Bijan Mottahedeh Nov. 30, 2018, 7:01 p.m. UTC | #17

On 11/30/2018 5:44 AM, Michael S. Tsirkin wrote:
> On Thu, Nov 01, 2018 at 04:06:19PM -0700, Linus Torvalds wrote:
>> On Thu, Nov 1, 2018 at 4:00 PM Kees Cook <keescook@chromium.org> wrote:
>>> +       memset(&rsp, 0, sizeof(rsp));
>>> +       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
>>> +       resp = vq->iov[out].iov_base;
>>> +       ret = __copy_to_user(resp, &rsp, sizeof(rsp));
>>>
>>> Is it actually safe to trust that iov_base has passed an earlier
>>> access_ok() check here? Why not just use copy_to_user() instead?
>> Good point.
>>
>> We really should have removed those double-underscore things ages ago.
>>
>> Also, apart from the address, what about the size? Wouldn't it be
>> better to use copy_to_iter() rather than implement it badly by hand?
>>
>>                 Linus
> Bijan can you respond please?
> Are you going to look into this and convert code to copy_to_iter?
> I don't think we should release Linux like this, so if you don't
> have the time I'd rather revert for now and you can look
> into reposting for the next release.
>
> Thanks,
>

Sure, will do.  Can I send an individual patch for the fix to 
vhost_scsi_send_tmf_reject()?

Thanks.

--bijan

Michael S. Tsirkin Nov. 30, 2018, 7:55 p.m. UTC | #18

On Fri, Nov 30, 2018 at 11:01:03AM -0800, Bijan Mottahedeh wrote:
> On 11/30/2018 5:44 AM, Michael S. Tsirkin wrote:
> > On Thu, Nov 01, 2018 at 04:06:19PM -0700, Linus Torvalds wrote:
> > > On Thu, Nov 1, 2018 at 4:00 PM Kees Cook <keescook@chromium.org> wrote:
> > > > +       memset(&rsp, 0, sizeof(rsp));
> > > > +       rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
> > > > +       resp = vq->iov[out].iov_base;
> > > > +       ret = __copy_to_user(resp, &rsp, sizeof(rsp));
> > > > 
> > > > Is it actually safe to trust that iov_base has passed an earlier
> > > > access_ok() check here? Why not just use copy_to_user() instead?
> > > Good point.
> > > 
> > > We really should have removed those double-underscore things ages ago.
> > > 
> > > Also, apart from the address, what about the size? Wouldn't it be
> > > better to use copy_to_iter() rather than implement it badly by hand?
> > > 
> > >                 Linus
> > Bijan can you respond please?
> > Are you going to look into this and convert code to copy_to_iter?
> > I don't think we should release Linux like this, so if you don't
> > have the time I'd rather revert for now and you can look
> > into reposting for the next release.
> > 
> > Thanks,
> > 
> 
> Sure, will do.  Can I send an individual patch for the fix to
> vhost_scsi_send_tmf_reject()?
> 
> Thanks.
> 
> --bijan

Please go ahead.

[PULL] vhost: cleanups and fixes

Pull-request

Message

Comments