Message ID: 1438180163-275465-2-git-send-email-imammedo@redhat.com (mailing list archive)
State: New, archived
On Wed, Jul 29, 2015 at 04:29:22PM +0200, Igor Mammedov wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
>
> Userspace currently simply tries to give vhost as many regions
> as it happens to have, but you only have the mem table
> when you have initialized a large part of VM, so graceful
> failure is very hard to support.
>
> The result is that userspace tends to fail catastrophically.
>
> Instead, add a new ioctl so userspace can find out how much kernel
> supports, up front. This returns a positive value that we commit to.
>
> Also, document our contract with legacy userspace: when running on an
> old kernel, you get -1 and you can assume at least 64 slots. Since 0
> value's left unused, let's make that mean that the current userspace
> behaviour (trial and error) is required, just in case we want it back.

What's wrong with reading the module parameter value? It's there in
sysfs ...

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  drivers/vhost/vhost.c      |  7 ++++++-
>  include/uapi/linux/vhost.h | 17 ++++++++++++++++-
>  2 files changed, 22 insertions(+), 2 deletions(-)
> [...]
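To make the proposed contract concrete, a minimal userspace sketch could look like the following. It assumes the uapi header added by this patch; the device path and error handling are illustrative only and not part of the series.

/* Illustrative only: query the vhost region limit as described above,
 * falling back to the documented minimum of 64 on older kernels. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>        /* needs the header change from this patch */

static int query_max_mem_regions(int vhost_fd)
{
        int r = ioctl(vhost_fd, VHOST_GET_MEM_MAX_NREGIONS);

        if (r >= 0)
                return r;       /* >0: hard limit; 0: no static limit, trial and error */
        return VHOST_MEM_MAX_NREGIONS_DEFAULT;  /* old kernel: at least 64 slots */
}

int main(void)
{
        int fd = open("/dev/vhost-net", O_RDWR);        /* example device node */

        if (fd < 0) {
                perror("open");
                return 1;
        }
        printf("max mem regions: %d\n", query_max_mem_regions(fd));
        close(fd);
        return 0;
}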
On Wed, 29 Jul 2015 17:43:17 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Jul 29, 2015 at 04:29:22PM +0200, Igor Mammedov wrote:
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> > [...]
>
> What's wrong with reading the module parameter value? It's there in
> sysfs ...

For most cases it would work, but a distro doesn't have to mount
sysfs under /sys, so it adds to the app the burden of discovering
where sysfs is mounted and which module parameter to look for.

So IMHO, sysfs is more of a human-oriented interface,
while an ioctl is a more stable API for apps.

> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > [...]
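For comparison, the sysfs route discussed here would be roughly the sketch below; the /sys mount point and the /sys/module/vhost/parameters path are assumptions, which is exactly the portability concern raised above.

/* Illustrative only: read the module parameter via sysfs.
 * Assumes sysfs is mounted at /sys, which is the objection above. */
#include <stdio.h>

static long read_max_mem_regions_sysfs(void)
{
        FILE *f = fopen("/sys/module/vhost/parameters/max_mem_regions", "r");
        long val = -1;

        if (f) {
                if (fscanf(f, "%ld", &val) != 1)
                        val = -1;
                fclose(f);
        }
        return val;     /* -1 if sysfs isn't where we expect or the file is absent */
}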
On Wed, Jul 29, 2015 at 04:53:51PM +0200, Igor Mammedov wrote:
> On Wed, 29 Jul 2015 17:43:17 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Wed, Jul 29, 2015 at 04:29:22PM +0200, Igor Mammedov wrote:
> > > [...]
> >
> > What's wrong with reading the module parameter value? It's there in
> > sysfs ...
>
> For most cases it would work, but a distro doesn't have to mount
> sysfs under /sys,

If it wants to rewrite all userspace, sure it doesn't.

> so it adds to the app the burden of discovering
> where sysfs is mounted and which module parameter to look for.
>
> So IMHO, sysfs is more of a human-oriented interface,
> while an ioctl is a more stable API for apps.

I don't think that's right. ioctls only make sense for per-fd info.

> [...]
On 29/07/2015 16:56, Michael S. Tsirkin wrote:
> > > > Also, document our contract with legacy userspace: when running on an
> > > > old kernel, you get -1 and you can assume at least 64 slots. Since 0
> > > > value's left unused, let's make that mean that the current userspace
> > > > behaviour (trial and error) is required, just in case we want it back.
> > >
> > > What's wrong with reading the module parameter value? It's there in
> > > sysfs ...
> >
> > For most cases it would work, but a distro doesn't have to mount
> > sysfs under /sys,
>
> If it wants to rewrite all userspace, sure it doesn't.

I agree; on the other hand, it doesn't seem far-fetched to have a per-fd
maximum in the future. So I think this patch is more future-proof.

Paolo
On Wed, Jul 29, 2015 at 05:01:43PM +0200, Paolo Bonzini wrote:
> On 29/07/2015 16:56, Michael S. Tsirkin wrote:
> > > > > Also, document our contract with legacy userspace: when running on an
> > > > > old kernel, you get -1 and you can assume at least 64 slots. Since 0
> > > > > value's left unused, let's make that mean that the current userspace
> > > > > behaviour (trial and error) is required, just in case we want it back.
> > > >
> > > > What's wrong with reading the module parameter value? It's there in
> > > > sysfs ...
> > >
> > > For most cases it would work, but a distro doesn't have to mount
> > > sysfs under /sys,
> >
> > If it wants to rewrite all userspace, sure it doesn't.
>
> I agree; on the other hand, it doesn't seem far-fetched to have a per-fd
> maximum in the future. So I think this patch is more future-proof.
>
> Paolo

Possibly, but this calls for some kind of privilege separation model,
where a privileged app can set the per-fd limit while regular ones can
only read it. Sounds very complex. Let's see some of this code first.

And I really think there are better ways to future-proof it, e.g. teach
userspace to do error handling: revert adding a slot if one of the
components can't support it.
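A rough sketch of that "try and revert" alternative might look like this; the rollback policy and the -ENOSPC convention are hypothetical illustrations, not anything proposed in this series.

/* Hypothetical sketch: attempt VHOST_SET_MEM_TABLE with the enlarged table
 * and restore the previous one if the kernel rejects the extra region. */
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

int try_set_mem_table(int vhost_fd, struct vhost_memory *new_table,
                      struct vhost_memory *old_table)
{
        if (ioctl(vhost_fd, VHOST_SET_MEM_TABLE, new_table) == 0)
                return 0;               /* new slot accepted */

        /* Roll back to the table that was known to work. */
        if (ioctl(vhost_fd, VHOST_SET_MEM_TABLE, old_table) < 0)
                return -errno;          /* rollback itself failed */
        return -ENOSPC;                 /* caller should drop the new slot */
}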
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index eec2f11..76dc0cf 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -30,7 +30,7 @@
 
 #include "vhost.h"
 
-static ushort max_mem_regions = 64;
+static ushort max_mem_regions = VHOST_MEM_MAX_NREGIONS_DEFAULT;
 module_param(max_mem_regions, ushort, 0444);
 MODULE_PARM_DESC(max_mem_regions,
        "Maximum number of memory regions in memory map. (default: 64)");
@@ -944,6 +944,11 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
        long r;
        int i, fd;
 
+       if (ioctl == VHOST_GET_MEM_MAX_NREGIONS) {
+               r = max_mem_regions;
+               goto done;
+       }
+
        /* If you are not the owner, you can become one */
        if (ioctl == VHOST_SET_OWNER) {
                r = vhost_dev_set_owner(d);
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index ab373191..2511954 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -80,7 +80,7 @@ struct vhost_memory {
  * Allows subsequent call to VHOST_OWNER_SET to succeed. */
 #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
 
-/* Set up/modify memory layout */
+/* Set up/modify memory layout: see also VHOST_GET_MEM_MAX_NREGIONS below. */
 #define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
 
 /* Write logging setup. */
@@ -127,6 +127,21 @@ struct vhost_memory {
 /* Set eventfd to signal an error */
 #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
 
+/* Query upper limit on nregions in VHOST_SET_MEM_TABLE arguments.
+ * Returns:
+ *     0 < value <= MAX_INT - gives the upper limit, higher values will fail
+ *     0 - there's no static limit: try and see if it works
+ *     -1 - on failure
+ */
+#define VHOST_GET_MEM_MAX_NREGIONS _IO(VHOST_VIRTIO, 0x23)
+
+/* Returned by VHOST_GET_MEM_MAX_NREGIONS to mean there's no static limit:
+ * try and it'll work if you are lucky. */
+#define VHOST_MEM_MAX_NREGIONS_NONE 0
+/* We support at least as many nregions in VHOST_SET_MEM_TABLE:
+ * for use on legacy kernels without VHOST_GET_MEM_MAX_NREGIONS support. */
+#define VHOST_MEM_MAX_NREGIONS_DEFAULT 64
+
 /* VHOST_NET specific defines */
 
 /* Attach virtio net ring to a raw socket, or tap device.