Message ID | 20170403085822.13863-1-slp@redhat.com |
---|---|
State | New, archived |
On Mon, 3 Apr 2017 10:58:22 +0200
Sergio Lopez <slp@redhat.com> wrote:

> When quickly unmapping and mapping memory regions (as may happen in
> address_space_update_topology), if running with a non-unlimited
> RLIMIT_MEMLOCK, the kernel may return ENOMEM for a map request
> because the previous unmap has been processed, but not accounted yet.
>
> Probably this should be fixed in the kernel ensuring a deterministic
> behavior for VFIO map and unmap operations. Until then, this works
> around the issue, waiting 10ms and trying again.

I think we need to know what that kernel fix is before adding arbitrary
delays and retries in userspace code (Do we know why 10ms works? Is it
too long/short?).

I think I have a test program that reproduces this. I set up vfio and
allocate two 4k buffers, one for mapping through vfio and one for
mlocking. I clone(2) the process with CLONE_VM, and the clone loops
doing mlock/munlock while the main thread does map/unmap. This fails in
a fraction of a second, while running either loop independently works
well. Still investigating. Thanks,

Alex

> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  hw/vfio/common.c | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f3ba9b9..db41fa5 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -228,17 +228,32 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
>          map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
>      }
>
> -    /*
> -     * Try the mapping, if it fails with EBUSY, unmap the region and try
> -     * again. This shouldn't be necessary, but we sometimes see it in
> -     * the VGA ROM space.
> -     */
> -    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
> -        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
> -         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
> +    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
>          return 0;
>      }
>
> +    if (errno == ENOMEM) {
> +        /*
> +         * When quickly unmapping and mapping ranges, the kernel may
> +         * return ENOMEM for a map request because the previous unmap
> +         * has not been accounted yet. Wait a bit and try again.
> +         */
> +        usleep(10 * 1000);
> +        if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
> +            return 0;
> +        }
> +    } else if (errno == EBUSY) {
> +        /*
> +         * If mapping fails with EBUSY, unmap the region and try again.
> +         * This shouldn't be necessary, but we sometimes see it in the
> +         * VGA ROM space.
> +         */
> +        if (vfio_dma_unmap(container, iova, size) == 0 &&
> +            ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
> +            return 0;
> +        }
> +    }
> +
>      error_report("VFIO_MAP_DMA: %d", -errno);
>      return -errno;
>  }
On Mon, Apr 3, 2017 at 5:40 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Mon, 3 Apr 2017 10:58:22 +0200
> Sergio Lopez <slp@redhat.com> wrote:
>
> > When quickly unmapping and mapping memory regions (as may happen in
> > address_space_update_topology), if running with a non-unlimited
> > RLIMIT_MEMLOCK, the kernel may return ENOMEM for a map request
> > because the previous unmap has been processed, but not accounted yet.
> >
> > Probably this should be fixed in the kernel ensuring a deterministic
> > behavior for VFIO map and unmap operations. Until then, this works
> > around the issue, waiting 10ms and trying again.
>
> I think we need to know what that kernel fix is before adding arbitrary
> delays and retries in userspace code (Do we know why 10ms works? Is
> it too long/short?).

AFAIK, from userspace we can't know when a given piece of work scheduled
on a kernel workqueue has completed. Calling usleep ensures the process
will yield, and 10ms looks like enough time for plenty of context
switches, but I agree with you that it's pretty arbitrary.

On the other hand, this code is only reached in a pretty exceptional
situation, which is not relevant from a performance point of view, and
there's already a workaround for a non-deterministic EBUSY while
mapping the VGA ROM space.

We could leave this as is and wait for a fix in the kernel, but I
think it's a good idea to work around the issue for older kernels too.

Sergio.
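Alex asks why 10ms and whether it is too long or too short; Sergio answers that userspace cannot observe when the deferred kernel work finishes. One alternative to a single fixed sleep is a bounded number of retries with a growing delay. The sketch below is not what the patch does: `retry_on_enomem` and the callback type `map_op_fn` are hypothetical names, the callback stands in for `ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)`, and the attempt count and initial delay are arbitrary assumptions, so it does not settle which delay is right. It uses `nanosleep()` instead of the patch's `usleep()`, since `usleep()` was removed from POSIX.1-2008.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/*
 * Hypothetical callback type: returns 0 on success, or -1 with errno set.
 * In the patch this role is played by
 * ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map).
 */
typedef int (*map_op_fn)(void *opaque);

/*
 * Hypothetical helper: call op(); while it fails with ENOMEM, sleep and
 * try again, doubling the delay each time. Any other errno ends the loop
 * immediately. Returns 0 on success or the negated errno of the last
 * failure. max_attempts and initial_delay_us are arbitrary assumptions.
 */
static int retry_on_enomem(map_op_fn op, void *opaque,
                           int max_attempts, long initial_delay_us)
{
    long delay_us = initial_delay_us;
    int err = 0;

    if (max_attempts < 1) {
        max_attempts = 1;       /* always try at least once */
    }

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (op(opaque) == 0) {
            return 0;
        }
        err = errno;            /* save it before later calls can overwrite it */
        if (err != ENOMEM || attempt == max_attempts - 1) {
            break;              /* not the ENOMEM case, or out of attempts */
        }

        struct timespec ts = {
            .tv_sec = delay_us / 1000000,
            .tv_nsec = (delay_us % 1000000) * 1000,
        };
        nanosleep(&ts, NULL);   /* an interrupted sleep just retries early */
        delay_us *= 2;          /* exponential backoff */
    }
    return -err;
}
```

If something like this were used, it would presumably replace only the single `usleep()`/`ioctl()` pair in the ENOMEM branch of the patched `vfio_dma_map()`, leaving the EBUSY branch unchanged.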
```diff
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f3ba9b9..db41fa5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -228,17 +228,32 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }
 
-    /*
-     * Try the mapping, if it fails with EBUSY, unmap the region and try
-     * again. This shouldn't be necessary, but we sometimes see it in
-     * the VGA ROM space.
-     */
-    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
-         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
         return 0;
     }
 
+    if (errno == ENOMEM) {
+        /*
+         * When quickly unmapping and mapping ranges, the kernel may
+         * return ENOMEM for a map request because the previous unmap
+         * has not been accounted yet. Wait a bit and try again.
+         */
+        usleep(10 * 1000);
+        if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
+            return 0;
+        }
+    } else if (errno == EBUSY) {
+        /*
+         * If mapping fails with EBUSY, unmap the region and try again.
+         * This shouldn't be necessary, but we sometimes see it in the
+         * VGA ROM space.
+         */
+        if (vfio_dma_unmap(container, iova, size) == 0 &&
+            ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
+            return 0;
+        }
+    }
+
     error_report("VFIO_MAP_DMA: %d", -errno);
     return -errno;
 }
```
When quickly unmapping and mapping memory regions (as may happen in
address_space_update_topology), if running with a non-unlimited
RLIMIT_MEMLOCK, the kernel may return ENOMEM for a map request
because the previous unmap has been processed, but not accounted yet.

Probably this should be fixed in the kernel ensuring a deterministic
behavior for VFIO map and unmap operations. Until then, this works
around the issue, waiting 10ms and trying again.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/vfio/common.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)
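The commit message says the ENOMEM case only arises with a "non-unlimited" RLIMIT_MEMLOCK, i.e. a finite soft limit. A minimal sketch of how a process could check which situation it is in, using getrlimit(2); the helper name `memlock_limit_is_finite` is hypothetical and is not part of QEMU or of this patch.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: report whether the calling process's soft
 * RLIMIT_MEMLOCK limit is finite (the "non-unlimited" case).
 * Returns 1 if finite, 0 if RLIM_INFINITY, -1 if getrlimit() fails.
 * If limit_out is non-NULL, the soft limit in bytes is stored there.
 */
static int memlock_limit_is_finite(rlim_t *limit_out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1;
    }
    if (limit_out != NULL) {
        *limit_out = rl.rlim_cur;   /* soft limit, in bytes */
    }
    return rl.rlim_cur != RLIM_INFINITY;
}
```

A result of 1 only establishes the precondition named in the commit message; whether a map request actually hits the ENOMEM race still depends on the unmap/map timing discussed in the thread.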