diff mbox series

block/nvme: Fix VFIO_MAP_DMA failed: No space left on device

Message ID 20210611114606.320008-1-philmd@redhat.com (mailing list archive)
State New, archived
Headers show
Series block/nvme: Fix VFIO_MAP_DMA failed: No space left on device | expand

Commit Message

Philippe Mathieu-Daudé June 11, 2021, 11:46 a.m. UTC
When the NVMe block driver was introduced (see commit bdd6a90a9e5,
January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
-ENOMEM in case of error. The driver was correctly handling the
error path to recycle its volatile IOVA mappings.

To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
DMA mappings per container", April 2019) added the -ENOSPC error to
signal the user exhausted the DMA mappings available for a container.

The block driver started to mis-behave:

  qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
  (qemu)
  (qemu) info status
  VM status: paused (io-error)
  (qemu) c
  VFIO_MAP_DMA failed: No space left on device
  qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.

Fix by handling the -ENOSPC error when DMA mappings are exhausted;
other errors (such -ENOMEM) are still handled later in the same
function.

An easy way to reproduce this bug is to restrict the DMA mapping
limit (65535 by default) when loading the VFIO IOMMU module:

  # modprobe vfio_iommu_type1 dma_entry_limit=666

Cc: qemu-stable@nongnu.org
Reported-by: Michal Prívozník <mprivozn@redhat.com>
Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
Buglink: https://bugs.launchpad.net/qemu/+bug/1863333
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
Michal, is it still possible for you to test this (old bug)?

A functional test using viommu & nested VM is planned (suggested by
Stefan and Maxim).
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michal Privoznik June 14, 2021, 3:25 p.m. UTC | #1
On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
> -ENOMEM in case of error. The driver was correctly handling the
> error path to recycle its volatile IOVA mappings.
> 
> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> DMA mappings per container", April 2019) added the -ENOSPC error to
> signal the user exhausted the DMA mappings available for a container.
> 
> The block driver started to mis-behave:
> 
>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>   (qemu)
>   (qemu) info status
>   VM status: paused (io-error)
>   (qemu) c
>   VFIO_MAP_DMA failed: No space left on device
>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.
> 
> Fix by handling the -ENOSPC error when DMA mappings are exhausted;
> other errors (such -ENOMEM) are still handled later in the same
> function.
> 
> An easy way to reproduce this bug is to restrict the DMA mapping
> limit (65535 by default) when loading the VFIO IOMMU module:
> 
>   # modprobe vfio_iommu_type1 dma_entry_limit=666
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Michal Prívozník <mprivozn@redhat.com>
> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> Buglink: https://bugs.launchpad.net/qemu/+bug/1863333
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Michal, is it still possible for you to test this (old bug)?
> 

Unfortunately I no longer have access to the machine. But, IIRC it was
fairly easy to reproduce - just passthrough any NVMe disk using NVMe
disk backend (-blockdev '{"driver":"nvme", ...).

Sorry,
Michal
Philippe Mathieu-Daudé June 14, 2021, 4:03 p.m. UTC | #2
On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
> -ENOMEM in case of error. The driver was correctly handling the
> error path to recycle its volatile IOVA mappings.
> 
> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> DMA mappings per container", April 2019) added the -ENOSPC error to
> signal the user exhausted the DMA mappings available for a container.

Hmm this commit has been added before v5.1-rc4.

So while this fixes the behavior of v5.1-rc4+ kernels,
older kernels using this fix will have the same problem...

Should I check uname(2)'s utsname.release[]? Is it reliable?

> The block driver started to mis-behave:
> 
>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>   (qemu)
>   (qemu) info status
>   VM status: paused (io-error)
>   (qemu) c
>   VFIO_MAP_DMA failed: No space left on device
>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.
> 
> Fix by handling the -ENOSPC error when DMA mappings are exhausted;
> other errors (such -ENOMEM) are still handled later in the same
> function.
> 
> An easy way to reproduce this bug is to restrict the DMA mapping
> limit (65535 by default) when loading the VFIO IOMMU module:
> 
>   # modprobe vfio_iommu_type1 dma_entry_limit=666
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Michal Prívozník <mprivozn@redhat.com>
> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> Buglink: https://bugs.launchpad.net/qemu/+bug/1863333
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Michal, is it still possible for you to test this (old bug)?
> 
> A functional test using viommu & nested VM is planned (suggested by
> Stefan and Maxim).
> ---
>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 2b5421e7aa6..12f9dd5cce3 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -1030,7 +1030,7 @@ try_map:
>          r = qemu_vfio_dma_map(s->vfio,
>                                qiov->iov[i].iov_base,
>                                len, true, &iova);
> -        if (r == -ENOMEM && retry) {
> +        if (r == -ENOSPC && retry) {
>              retry = false;
>              trace_nvme_dma_flush_queue_wait(s);
>              if (s->dma_map_count) {
>
Michal Privoznik June 14, 2021, 5:58 p.m. UTC | #3
On 6/14/21 6:03 PM, Philippe Mathieu-Daudé wrote:
> On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>> -ENOMEM in case of error. The driver was correctly handling the
>> error path to recycle its volatile IOVA mappings.
>>
>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>> DMA mappings per container", April 2019) added the -ENOSPC error to
>> signal the user exhausted the DMA mappings available for a container.
> 
> Hmm this commit has been added before v5.1-rc4.
> 
> So while this fixes the behavior of v5.1-rc4+ kernels,
> older kernels using this fix will have the same problem...
> 
> Should I check uname(2)'s utsname.release[]? Is it reliable?

Not at all. What if somebody runs kernel with that commit backported? I
would leave this up to distro maintainers to figure out. They know what
kernel patches are backported and thus what qemu patches should be
backported too. I'm wondering if there's a way we can help them by
letting know (other than mentioning the kernel commit).

Michal
Maxim Levitsky June 17, 2021, 12:40 p.m. UTC | #4
On Mon, 2021-06-14 at 18:03 +0200, Philippe Mathieu-Daudé wrote:
> On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
> > When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> > January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
> > -ENOMEM in case of error. The driver was correctly handling the
> > error path to recycle its volatile IOVA mappings.
> > 
> > To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> > DMA mappings per container", April 2019) added the -ENOSPC error to
> > signal the user exhausted the DMA mappings available for a container.
> 
> Hmm this commit has been added before v5.1-rc4.
> 
> So while this fixes the behavior of v5.1-rc4+ kernels,
> older kernels using this fix will have the same problem...


Hi!

I wonder why not to check for both -ENOMEM and -ENOSPC
and recycle the mappings in both cases?

I think that would work on both old and new kernels.

What do you think?

Best regards,
	Maxim Levitsky

> 
> Should I check uname(2)'s utsname.release[]? Is it reliable?
> 
> > The block driver started to mis-behave:
> > 
> >   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
> >   (qemu)
> >   (qemu) info status
> >   VM status: paused (io-error)
> >   (qemu) c
> >   VFIO_MAP_DMA failed: No space left on device
> >   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.
> > 
> > Fix by handling the -ENOSPC error when DMA mappings are exhausted;
> > other errors (such -ENOMEM) are still handled later in the same
> > function.
> > 
> > An easy way to reproduce this bug is to restrict the DMA mapping
> > limit (65535 by default) when loading the VFIO IOMMU module:
> > 
> >   # modprobe vfio_iommu_type1 dma_entry_limit=666
> > 
> > Cc: qemu-stable@nongnu.org
> > Reported-by: Michal Prívozník <mprivozn@redhat.com>
> > Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> > Buglink: https://bugs.launchpad.net/qemu/+bug/1863333
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
> > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> > ---
> > Michal, is it still possible for you to test this (old bug)?
> > 
> > A functional test using viommu & nested VM is planned (suggested by
> > Stefan and Maxim).
> > ---
> >  block/nvme.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 2b5421e7aa6..12f9dd5cce3 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -1030,7 +1030,7 @@ try_map:
> >          r = qemu_vfio_dma_map(s->vfio,
> >                                qiov->iov[i].iov_base,
> >                                len, true, &iova);
> > -        if (r == -ENOMEM && retry) {
> > +        if (r == -ENOSPC && retry) {
> >              retry = false;
> >              trace_nvme_dma_flush_queue_wait(s);
> >              if (s->dma_map_count) {
> >
Philippe Mathieu-Daudé June 17, 2021, 3:48 p.m. UTC | #5
On 6/17/21 2:40 PM, Maxim Levitsky wrote:
> On Mon, 2021-06-14 at 18:03 +0200, Philippe Mathieu-Daudé wrote:
>> On 6/11/21 1:46 PM, Philippe Mathieu-Daudé wrote:
>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>>> -ENOMEM in case of error. The driver was correctly handling the
>>> error path to recycle its volatile IOVA mappings.
>>>
>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>>> DMA mappings per container", April 2019) added the -ENOSPC error to
>>> signal the user exhausted the DMA mappings available for a container.
>>
>> Hmm this commit has been added before v5.1-rc4.
>>
>> So while this fixes the behavior of v5.1-rc4+ kernels,
>> older kernels using this fix will have the same problem...
> 
> 
> Hi!
> 
> I wonder why not to check for both -ENOMEM and -ENOSPC
> and recycle the mappings in both cases?
> 
> I think that would work on both old and new kernels.
> 
> What do you think?

Yes, worst case we retry one more time for nothing.

Alex suggested to use VFIO_IOMMU_GET_INFO to get the limit
with VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL, and if not available
the driver should use a reasonable value to limit itself.

I'll try to get it quick, otherwise fall back to your "dual
errno" case.

Thanks both for the ideas,

Phil.

>>> diff --git a/block/nvme.c b/block/nvme.c
>>> index 2b5421e7aa6..12f9dd5cce3 100644
>>> --- a/block/nvme.c
>>> +++ b/block/nvme.c
>>> @@ -1030,7 +1030,7 @@ try_map:
>>>          r = qemu_vfio_dma_map(s->vfio,
>>>                                qiov->iov[i].iov_base,
>>>                                len, true, &iova);
>>> -        if (r == -ENOMEM && retry) {
>>> +        if (r == -ENOSPC && retry) {
>>>              retry = false;
>>>              trace_nvme_dma_flush_queue_wait(s);
>>>              if (s->dma_map_count) {
>>>
diff mbox series

Patch

diff --git a/block/nvme.c b/block/nvme.c
index 2b5421e7aa6..12f9dd5cce3 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1030,7 +1030,7 @@  try_map:
         r = qemu_vfio_dma_map(s->vfio,
                               qiov->iov[i].iov_base,
                               len, true, &iova);
-        if (r == -ENOMEM && retry) {
+        if (r == -ENOSPC && retry) {
             retry = false;
             trace_nvme_dma_flush_queue_wait(s);
             if (s->dma_map_count) {