Message ID | 20210611114606.320008-1-philmd@redhat.com (mailing list archive)
---|---
State | New, archived
Series | block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> January 2018), the Linux VFIO_IOMMU_MAP_DMA ioctl only returned
> -ENOMEM in case of error. The driver correctly handled the error
> path to recycle its volatile IOVA mappings.
>
> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> DMA mappings per container", April 2019) added the -ENOSPC error to
> signal that the user exhausted the DMA mappings available for a
> container.
>
> The block driver started to misbehave:
>
>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>   (qemu)
>   (qemu) info status
>   VM status: paused (io-error)
>   (qemu) c
>   VFIO_MAP_DMA failed: No space left on device
>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.
>
> Fix by handling the -ENOSPC error when DMA mappings are exhausted;
> other errors (such as -ENOMEM) are still handled later in the same
> function.
>
> An easy way to reproduce this bug is to restrict the DMA mapping
> limit (65535 by default) when loading the VFIO IOMMU module:
>
>   # modprobe vfio_iommu_type1 dma_entry_limit=666
>
> Cc: qemu-stable@nongnu.org
> Reported-by: Michal Prívozník <mprivozn@redhat.com>
> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> Buglink: https://bugs.launchpad.net/qemu/+bug/1863333
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Michal, is it still possible for you to test this (old bug)?

Unfortunately I no longer have access to the machine. But, IIRC, it was
fairly easy to reproduce - just pass through any NVMe disk using the
NVMe block backend (-blockdev '{"driver":"nvme", ...).

Sorry,
Michal
On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> January 2018), the Linux VFIO_IOMMU_MAP_DMA ioctl only returned
> -ENOMEM in case of error. The driver correctly handled the error
> path to recycle its volatile IOVA mappings.
>
> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> DMA mappings per container", April 2019) added the -ENOSPC error to
> signal that the user exhausted the DMA mappings available for a
> container.

Hmm, this commit was added before v5.1-rc4. So while this patch fixes
the behavior on v5.1-rc4+ kernels, the same fix running on older
kernels will still have the same problem...

Should I check uname(2)'s utsname.release[]? Is it reliable?

> The block driver started to misbehave:
>
> [...]
>
> Fix by handling the -ENOSPC error when DMA mappings are exhausted;
> other errors (such as -ENOMEM) are still handled later in the same
> function.
>
> [...]
>
> A functional test using viommu & nested VM is planned (suggested by
> Stefan and Maxim).
> ---
>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index 2b5421e7aa6..12f9dd5cce3 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -1030,7 +1030,7 @@ try_map:
>          r = qemu_vfio_dma_map(s->vfio,
>                                qiov->iov[i].iov_base,
>                                len, true, &iova);
> -        if (r == -ENOMEM && retry) {
> +        if (r == -ENOSPC && retry) {
>              retry = false;
>              trace_nvme_dma_flush_queue_wait(s);
>              if (s->dma_map_count) {
>
On 6/14/21 6:03 PM, Philippe Mathieu-Daudé wrote:
> On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
>> [...]
>>
>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>> DMA mappings per container", April 2019) added the -ENOSPC error to
>> signal that the user exhausted the DMA mappings available for a
>> container.
>
> Hmm, this commit was added before v5.1-rc4.
>
> So while this fixes the behavior of v5.1-rc4+ kernels, older kernels
> using this fix will have the same problem...
>
> Should I check uname(2)'s utsname.release[]? Is it reliable?

Not at all. What if somebody runs a kernel with that commit backported?

I would leave this up to distro maintainers to figure out. They know
what kernel patches are backported and thus what QEMU patches should be
backported too. I'm wondering if there's a way we can help them by
letting them know (other than mentioning the kernel commit).

Michal
On Mon, 2021-06-14 at 18:03 +0200, Philippe Mathieu-Daudé wrote:
> On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
> > [...]
> >
> > To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> > DMA mappings per container", April 2019) added the -ENOSPC error to
> > signal that the user exhausted the DMA mappings available for a
> > container.
>
> Hmm, this commit was added before v5.1-rc4.
>
> So while this fixes the behavior of v5.1-rc4+ kernels, older kernels
> using this fix will have the same problem...

Hi!

I wonder: why not check for both -ENOMEM and -ENOSPC and recycle the
mappings in both cases? I think that would work on both old and new
kernels.

What do you think?

Best regards,
	Maxim Levitsky

> Should I check uname(2)'s utsname.release[]? Is it reliable?
>
> [...]
On 6/17/21 2:40 PM, Maxim Levitsky wrote:
> On Mon, 2021-06-14 at 18:03 +0200, Philippe Mathieu-Daudé wrote:
>> [...]
>>
>> So while this fixes the behavior of v5.1-rc4+ kernels, older kernels
>> using this fix will have the same problem...
>
> Hi!
>
> I wonder: why not check for both -ENOMEM and -ENOSPC and recycle the
> mappings in both cases?
>
> I think that would work on both old and new kernels.
>
> What do you think?

Yes, worst case we retry one more time for nothing.

Alex suggested using VFIO_IOMMU_GET_INFO to get the limit via
VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL, and if that is not available the
driver should use a reasonable value to limit itself.

I'll try to get it done quickly, otherwise I'll fall back to your
"dual errno" case.

Thanks both for the ideas,

Phil.
diff --git a/block/nvme.c b/block/nvme.c
index 2b5421e7aa6..12f9dd5cce3 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1030,7 +1030,7 @@ try_map:
         r = qemu_vfio_dma_map(s->vfio,
                               qiov->iov[i].iov_base,
                               len, true, &iova);
-        if (r == -ENOMEM && retry) {
+        if (r == -ENOSPC && retry) {
             retry = false;
             trace_nvme_dma_flush_queue_wait(s);
             if (s->dma_map_count) {
When the NVMe block driver was introduced (see commit bdd6a90a9e5,
January 2018), the Linux VFIO_IOMMU_MAP_DMA ioctl only returned
-ENOMEM in case of error. The driver correctly handled the error path
to recycle its volatile IOVA mappings.

To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
DMA mappings per container", April 2019) added the -ENOSPC error to
signal that the user exhausted the DMA mappings available for a
container.

The block driver started to misbehave:

  qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
  (qemu)
  (qemu) info status
  VM status: paused (io-error)
  (qemu) c
  VFIO_MAP_DMA failed: No space left on device
  qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.

Fix by handling the -ENOSPC error when DMA mappings are exhausted;
other errors (such as -ENOMEM) are still handled later in the same
function.

An easy way to reproduce this bug is to restrict the DMA mapping
limit (65535 by default) when loading the VFIO IOMMU module:

  # modprobe vfio_iommu_type1 dma_entry_limit=666

Cc: qemu-stable@nongnu.org
Reported-by: Michal Prívozník <mprivozn@redhat.com>
Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
Buglink: https://bugs.launchpad.net/qemu/+bug/1863333
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
Michal, is it still possible for you to test this (old bug)?

A functional test using viommu & nested VM is planned (suggested by
Stefan and Maxim).
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)