diff mbox series

[RFC] memory: Fix dma-reentrancy issues at the MMIO level

Message ID 20211217030858.834822-1-alxndr@bu.edu (mailing list archive)
State New, archived
Headers show
Series [RFC] memory: Fix dma-reentrancy issues at the MMIO level | expand

Commit Message

Alexander Bulekov Dec. 17, 2021, 3:08 a.m. UTC
Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
the DeviceState, which is set/checked when we call an accessor
associated with the device's IO MRs.

The problem, in short, as I understand it: For the vast majority of
cases, we want to prevent a device from accessing it's own PIO/MMIO
regions over DMA.

This patch/solution is based on some assumptions:
1. DMA accesses that hit mmio regions are only dangerous if they end up
interacting with memory-regions belonging to the device initiating the
DMA.
Not dangerous:  sdhci_pio->dma_write->e1000_mmio
Dangerous:      sdhci_pio->dma_write->sdhci_mmio

2. Most devices do not interact with their own PIO/MMIO memory-regions
using DMA.

3. There is no way for there to be multiple simultaneous accesses to a
device's PIO/MMIO memory-regions.

4. All devices are QOMified :-)

With this patch, I wasn't able to reproduce the issues being tracked
here, with QTest reproducers:
https://gitlab.com/qemu-project/qemu/-/issues/556

This passes the i386 qos/qtests for me and I was able to boot some linux/windows
VMs with basic devices configured, without any apparent problems.

Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Li Qiang <liq3ea@gmail.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Bandan Das <bsd@redhat.com>
Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 include/hw/qdev-core.h |  1 +
 softmmu/memory.c       | 15 +++++++++++++++
 softmmu/trace-events   |  1 +
 3 files changed, 17 insertions(+)

Comments

Qiuhao Li Dec. 17, 2021, 6:27 a.m. UTC | #1
Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:

1. Device A Timer's callback -> Device A MMIO handler
2. Device A BH's callback -> Device A MMIO handler

In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?

Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:

cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
-machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
-device nvme,drive=disk0,serial=1 -qtest stdio \

outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
outl 0xcf8 0x80000804               /* CMD - Command */
outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
clock_step
EOF

CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?
Klaus Jensen Dec. 17, 2021, 8:37 a.m. UTC | #2
On Dec 17 06:27, Qiuhao Li wrote:
> Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:
> 
> 1. Device A Timer's callback -> Device A MMIO handler
> 2. Device A BH's callback -> Device A MMIO handler
> 
> In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?
> 
> Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:
> 
> cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
> -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
> -device nvme,drive=disk0,serial=1 -qtest stdio \
> 
> outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
> outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
> outl 0xcf8 0x80000804               /* CMD - Command */
> outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
> write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
> write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
> write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
> write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
> write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
> write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
> write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
> write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
> write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
> clock_step
> EOF
> 
> CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?
> 

This is a good reproducer. Does it still work if you do the `write
0xe0001000 0x1 0x01` at the end instead? It looks weird that you ring
the doorbell prior to writing the command in the queue.
Mauro Matteo Cascella Dec. 17, 2021, 9:44 a.m. UTC | #3
On Fri, Dec 17, 2021 at 7:28 AM Qiuhao Li <Qiuhao.Li@outlook.com> wrote:
>
> Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:
>
> 1. Device A Timer's callback -> Device A MMIO handler
> 2. Device A BH's callback -> Device A MMIO handler
>
> In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?
>
> Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:
>
> cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
> -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
> -device nvme,drive=disk0,serial=1 -qtest stdio \
>
> outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
> outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
> outl 0xcf8 0x80000804               /* CMD - Command */
> outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
> write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
> write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
> write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
> write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
> write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
> write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
> write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
> write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
> write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
> clock_step
> EOF
>
> CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556??

Upstream issue: https://gitlab.com/qemu-project/qemu/-/issues/782

Thanks.
Philippe Mathieu-Daudé Dec. 17, 2021, 1:58 p.m. UTC | #4
On 12/17/21 04:08, Alexander Bulekov wrote:
> Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
> the DeviceState, which is set/checked when we call an accessor
> associated with the device's IO MRs.

Your approach is exactly what Gerd suggested:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg831437.html

> The problem, in short, as I understand it: For the vast majority of
> cases, we want to prevent a device from accessing it's own PIO/MMIO
> regions over DMA.
> 
> This patch/solution is based on some assumptions:
> 1. DMA accesses that hit mmio regions are only dangerous if they end up
> interacting with memory-regions belonging to the device initiating the
> DMA.
> Not dangerous:  sdhci_pio->dma_write->e1000_mmio
> Dangerous:      sdhci_pio->dma_write->sdhci_mmio

It doesn't have to be dangerous, see Paolo's example which
invalidated my previous attempt and forced me to write 24
patches in multiples series to keep the "niche" cases working:
https://www.mail-archive.com/qemu-block@nongnu.org/msg72939.html

> 
> 2. Most devices do not interact with their own PIO/MMIO memory-regions
> using DMA.
> 
> 3. There is no way for there to be multiple simultaneous accesses to a
> device's PIO/MMIO memory-regions.
> 
> 4. All devices are QOMified :-)
> 
> With this patch, I wasn't able to reproduce the issues being tracked
> here, with QTest reproducers:
> https://gitlab.com/qemu-project/qemu/-/issues/556
> 
> This passes the i386 qos/qtests for me and I was able to boot some linux/windows
> VMs with basic devices configured, without any apparent problems.
> 
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Mauro Matteo Cascella <mcascell@redhat.com>
> Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Li Qiang <liq3ea@gmail.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Laurent Vivier <lvivier@redhat.com>
> Cc: Bandan Das <bsd@redhat.com>
> Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
> Cc: Darren Kenny <darren.kenny@oracle.com>
> Cc: Bin Meng <bin.meng@windriver.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  include/hw/qdev-core.h |  1 +
>  softmmu/memory.c       | 15 +++++++++++++++
>  softmmu/trace-events   |  1 +
>  3 files changed, 17 insertions(+)
> 
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 20d3066595..32f7c779ab 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -193,6 +193,7 @@ struct DeviceState {
>      int instance_id_alias;
>      int alias_required_for_version;
>      ResettableState reset;
> +    int engaged_in_direct_io;
>  };
>  
>  struct DeviceListener {
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7340e19ff5..255c3c602f 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>      uint64_t access_mask;
>      unsigned access_size;
>      unsigned i;
> +    DeviceState *dev = NULL;
>      MemTxResult r = MEMTX_OK;
>  
>      if (!access_size_min) {
> @@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>          access_size_max = 4;
>      }
>  
> +    /* Do not allow more than one simultanous access to a device's IO Regions */
> +    if (mr->owner &&
> +        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> +        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
> +        if (dev->engaged_in_direct_io) {
> +            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
> +            return MEMTX_ERROR;
> +        }
> +        dev->engaged_in_direct_io = true;
> +    }
> +
>      /* FIXME: support unaligned access? */
>      access_size = MAX(MIN(size, access_size_max), access_size_min);
>      access_mask = MAKE_64BIT_MASK(0, access_size * 8);
> @@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>                          access_mask, attrs);
>          }
>      }
> +    if (dev) {
> +        dev->engaged_in_direct_io = false;
> +    }
>      return r;
>  }
>  
> diff --git a/softmmu/trace-events b/softmmu/trace-events
> index 9c88887b3c..d7228316db 100644
> --- a/softmmu/trace-events
> +++ b/softmmu/trace-events
> @@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
>  memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
>  memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> +memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
>  memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
>
Alexander Bulekov Dec. 17, 2021, 2:20 p.m. UTC | #5
On 211217 0627, Qiuhao Li wrote:
> Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:
> 
> 1. Device A Timer's callback -> Device A MMIO handler
> 2. Device A BH's callback -> Device A MMIO handler
> 
> In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?

Ahh, I forgot that this can happen in BHs as well... I think the only
place to do that, without major API changes, is at pci_dma_rw and
dma_buffer_rw (where we can catch a glimpse of the
PCIDevice/DeviceState).

I'll send a V2.

-Alex

> 
> Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:
> 
> cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
> -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
> -device nvme,drive=disk0,serial=1 -qtest stdio \
> 
> outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
> outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
> outl 0xcf8 0x80000804               /* CMD - Command */
> outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
> write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
> write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
> write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
> write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
> write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
> write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
> write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
> write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
> write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
> clock_step
> EOF
> 
> CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?
> 
> ________________________________
> From: Alexander Bulekov <alxndr@bu.edu>
> Sent: Friday, December 17, 2021 11:08
> To: qemu-devel@nongnu.org <qemu-devel@nongnu.org>
> Cc: Alexander Bulekov <alxndr@bu.edu>; Philippe Mathieu-Daudé <philmd@redhat.com>; Mauro Matteo Cascella <mcascell@redhat.com>; Qiuhao Li <Qiuhao.Li@outlook.com>; Peter Xu <peterx@redhat.com>; Jason Wang <jasowang@redhat.com>; David Hildenbrand <david@redhat.com>; Gerd Hoffmann <kraxel@redhat.com>; Peter Maydell <peter.maydell@linaro.org>; Li Qiang <liq3ea@gmail.com>; Thomas Huth <thuth@redhat.com>; Laurent Vivier <lvivier@redhat.com>; Bandan Das <bsd@redhat.com>; Edgar E . Iglesias <edgar.iglesias@gmail.com>; Darren Kenny <darren.kenny@oracle.com>; Bin Meng <bin.meng@windriver.com>; Paolo Bonzini <pbonzini@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; Eduardo Habkost <eduardo@habkost.net>
> Subject: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
> 
> Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
> the DeviceState, which is set/checked when we call an accessor
> associated with the device's IO MRs.
> 
> The problem, in short, as I understand it: For the vast majority of
> cases, we want to prevent a device from accessing it's own PIO/MMIO
> regions over DMA.
> 
> This patch/solution is based on some assumptions:
> 1. DMA accesses that hit mmio regions are only dangerous if they end up
> interacting with memory-regions belonging to the device initiating the
> DMA.
> Not dangerous:  sdhci_pio->dma_write->e1000_mmio
> Dangerous:      sdhci_pio->dma_write->sdhci_mmio
> 
> 2. Most devices do not interact with their own PIO/MMIO memory-regions
> using DMA.
> 
> 3. There is no way for there to be multiple simultaneous accesses to a
> device's PIO/MMIO memory-regions.
> 
> 4. All devices are QOMified :-)
> 
> With this patch, I wasn't able to reproduce the issues being tracked
> here, with QTest reproducers:
> https://gitlab.com/qemu-project/qemu/-/issues/556
> 
> This passes the i386 qos/qtests for me and I was able to boot some linux/windows
> VMs with basic devices configured, without any apparent problems.
> 
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Mauro Matteo Cascella <mcascell@redhat.com>
> Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Li Qiang <liq3ea@gmail.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Laurent Vivier <lvivier@redhat.com>
> Cc: Bandan Das <bsd@redhat.com>
> Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
> Cc: Darren Kenny <darren.kenny@oracle.com>
> Cc: Bin Meng <bin.meng@windriver.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  include/hw/qdev-core.h |  1 +
>  softmmu/memory.c       | 15 +++++++++++++++
>  softmmu/trace-events   |  1 +
>  3 files changed, 17 insertions(+)
> 
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 20d3066595..32f7c779ab 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -193,6 +193,7 @@ struct DeviceState {
>      int instance_id_alias;
>      int alias_required_for_version;
>      ResettableState reset;
> +    int engaged_in_direct_io;
>  };
> 
>  struct DeviceListener {
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7340e19ff5..255c3c602f 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>      uint64_t access_mask;
>      unsigned access_size;
>      unsigned i;
> +    DeviceState *dev = NULL;
>      MemTxResult r = MEMTX_OK;
> 
>      if (!access_size_min) {
> @@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>          access_size_max = 4;
>      }
> 
> +    /* Do not allow more than one simultanous access to a device's IO Regions */
> +    if (mr->owner &&
> +        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> +        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
> +        if (dev->engaged_in_direct_io) {
> +            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
> +            return MEMTX_ERROR;
> +        }
> +        dev->engaged_in_direct_io = true;
> +    }
> +
>      /* FIXME: support unaligned access? */
>      access_size = MAX(MIN(size, access_size_max), access_size_min);
>      access_mask = MAKE_64BIT_MASK(0, access_size * 8);
> @@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>                          access_mask, attrs);
>          }
>      }
> +    if (dev) {
> +        dev->engaged_in_direct_io = false;
> +    }
>      return r;
>  }
> 
> diff --git a/softmmu/trace-events b/softmmu/trace-events
> index 9c88887b3c..d7228316db 100644
> --- a/softmmu/trace-events
> +++ b/softmmu/trace-events
> @@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
>  memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
>  memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> +memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
>  memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
> --
> 2.33.0
>
Alexander Bulekov Dec. 17, 2021, 2:30 p.m. UTC | #6
On 211217 1458, Philippe Mathieu-Daudé wrote:
> On 12/17/21 04:08, Alexander Bulekov wrote:
> > Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
> > the DeviceState, which is set/checked when we call an accessor
> > associated with the device's IO MRs.
> 
> Your approach is exactly what Gerd suggested:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg831437.html

Yes - my bad for not searching my mail more carefully.

> 
> > The problem, in short, as I understand it: For the vast majority of
> > cases, we want to prevent a device from accessing it's own PIO/MMIO
> > regions over DMA.
> > 
> > This patch/solution is based on some assumptions:
> > 1. DMA accesses that hit mmio regions are only dangerous if they end up
> > interacting with memory-regions belonging to the device initiating the
> > DMA.
> > Not dangerous:  sdhci_pio->dma_write->e1000_mmio
> > Dangerous:      sdhci_pio->dma_write->sdhci_mmio
> 
> It doesn't have to be dangerous, see Paolo's example which
> invalidated my previous attempt and forced me to write 24
> patches in multiples series to keep the "niche" cases working:
> https://www.mail-archive.com/qemu-block@nongnu.org/msg72939.html

I don't understand what IO accesses this decodes to. This is loading a
picture into VRAM?

-Alex

> 
> > 
> > 2. Most devices do not interact with their own PIO/MMIO memory-regions
> > using DMA.
> > 
> > 3. There is no way for there to be multiple simultaneous accesses to a
> > device's PIO/MMIO memory-regions.
> > 
> > 4. All devices are QOMified :-)
> > 
> > With this patch, I wasn't able to reproduce the issues being tracked
> > here, with QTest reproducers:
> > https://gitlab.com/qemu-project/qemu/-/issues/556
> > 
> > This passes the i386 qos/qtests for me and I was able to boot some linux/windows
> > VMs with basic devices configured, without any apparent problems.
> > 
> > Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> > Cc: Mauro Matteo Cascella <mcascell@redhat.com>
> > Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > Cc: Peter Maydell <peter.maydell@linaro.org>
> > Cc: Li Qiang <liq3ea@gmail.com>
> > Cc: Thomas Huth <thuth@redhat.com>
> > Cc: Laurent Vivier <lvivier@redhat.com>
> > Cc: Bandan Das <bsd@redhat.com>
> > Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
> > Cc: Darren Kenny <darren.kenny@oracle.com>
> > Cc: Bin Meng <bin.meng@windriver.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > ---
> >  include/hw/qdev-core.h |  1 +
> >  softmmu/memory.c       | 15 +++++++++++++++
> >  softmmu/trace-events   |  1 +
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 20d3066595..32f7c779ab 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -193,6 +193,7 @@ struct DeviceState {
> >      int instance_id_alias;
> >      int alias_required_for_version;
> >      ResettableState reset;
> > +    int engaged_in_direct_io;
> >  };
> >  
> >  struct DeviceListener {
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index 7340e19ff5..255c3c602f 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
> >      uint64_t access_mask;
> >      unsigned access_size;
> >      unsigned i;
> > +    DeviceState *dev = NULL;
> >      MemTxResult r = MEMTX_OK;
> >  
> >      if (!access_size_min) {
> > @@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
> >          access_size_max = 4;
> >      }
> >  
> > +    /* Do not allow more than one simultanous access to a device's IO Regions */
> > +    if (mr->owner &&
> > +        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> > +        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
> > +        if (dev->engaged_in_direct_io) {
> > +            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
> > +            return MEMTX_ERROR;
> > +        }
> > +        dev->engaged_in_direct_io = true;
> > +    }
> > +
> >      /* FIXME: support unaligned access? */
> >      access_size = MAX(MIN(size, access_size_max), access_size_min);
> >      access_mask = MAKE_64BIT_MASK(0, access_size * 8);
> > @@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
> >                          access_mask, attrs);
> >          }
> >      }
> > +    if (dev) {
> > +        dev->engaged_in_direct_io = false;
> > +    }
> >      return r;
> >  }
> >  
> > diff --git a/softmmu/trace-events b/softmmu/trace-events
> > index 9c88887b3c..d7228316db 100644
> > --- a/softmmu/trace-events
> > +++ b/softmmu/trace-events
> > @@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
> >  memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
> >  memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> >  memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> > +memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
> >  memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> >  memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> >  memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
> > 
>
Philippe Mathieu-Daudé Dec. 17, 2021, 3:25 p.m. UTC | #7
On 12/17/21 15:30, Alexander Bulekov wrote:
> On 211217 1458, Philippe Mathieu-Daudé wrote:
>> On 12/17/21 04:08, Alexander Bulekov wrote:
>>> Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
>>> the DeviceState, which is set/checked when we call an accessor
>>> associated with the device's IO MRs.
>>
>> Your approach is exactly what Gerd suggested:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg831437.html
> 
> Yes - my bad for not searching my mail more carefully.

Well it is not "exactly" the same, but almost.

>>
>>> The problem, in short, as I understand it: For the vast majority of
>>> cases, we want to prevent a device from accessing it's own PIO/MMIO
>>> regions over DMA.
>>>
>>> This patch/solution is based on some assumptions:
>>> 1. DMA accesses that hit mmio regions are only dangerous if they end up
>>> interacting with memory-regions belonging to the device initiating the
>>> DMA.
>>> Not dangerous:  sdhci_pio->dma_write->e1000_mmio
>>> Dangerous:      sdhci_pio->dma_write->sdhci_mmio
>>
>> It doesn't have to be dangerous, see Paolo's example which
>> invalidated my previous attempt and forced me to write 24
>> patches in multiples series to keep the "niche" cases working:
>> https://www.mail-archive.com/qemu-block@nongnu.org/msg72939.html
> 
> I don't understand what IO accesses this decodes to. This is loading a
> picture into VRAM?

I'd say "loading a picture into VRAM via the DMA" but am not sure :)

This link is helpful:
http://petesqbsite.com/sections/tutorials/tutorials/peekpoke.txt
Alexander Bulekov Dec. 17, 2021, 4:51 p.m. UTC | #8
On 211217 1625, Philippe Mathieu-Daudé wrote:
> On 12/17/21 15:30, Alexander Bulekov wrote:
> > On 211217 1458, Philippe Mathieu-Daudé wrote:
> >> On 12/17/21 04:08, Alexander Bulekov wrote:
> >>> Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
> >>> the DeviceState, which is set/checked when we call an accessor
> >>> associated with the device's IO MRs.
> >>
> >> Your approach is exactly what Gerd suggested:
> >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg831437.html
> > 
> > Yes - my bad for not searching my mail more carefully.
> 
> Well it is not "exactly" the same, but almost.
> 
> >>
> >>> The problem, in short, as I understand it: For the vast majority of
> >>> cases, we want to prevent a device from accessing it's own PIO/MMIO
> >>> regions over DMA.
> >>>
> >>> This patch/solution is based on some assumptions:
> >>> 1. DMA accesses that hit mmio regions are only dangerous if they end up
> >>> interacting with memory-regions belonging to the device initiating the
> >>> DMA.
> >>> Not dangerous:  sdhci_pio->dma_write->e1000_mmio
> >>> Dangerous:      sdhci_pio->dma_write->sdhci_mmio
> >>
> >> It doesn't have to be dangerous, see Paolo's example which
> >> invalidated my previous attempt and forced me to write 24
> >> patches in multiples series to keep the "niche" cases working:
> >> https://www.mail-archive.com/qemu-block@nongnu.org/msg72939.html
> > 
> > I don't understand what IO accesses this decodes to. This is loading a
> > picture into VRAM?
> 
> I'd say "loading a picture into VRAM via the DMA" but am not sure :)
> 
> This link is helpful:
> http://petesqbsite.com/sections/tutorials/tutorials/peekpoke.txt
>

https://github.com/microsoft/GW-BASIC/blob/edf82c2ebf6bfe099c2054e0ae125c3efe5769c4/GIO86.ASM#L333

AFAICT this would just do repeated MMIO writes to VRAM - no DMA
involved?

Maybe there is some way to log when a device performs a DMA access to
it's own IO regions, so that we could identify these niche cases? We
would still need a way to actually trigger that behavior...
diff mbox series

Patch

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 20d3066595..32f7c779ab 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@  struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    int engaged_in_direct_io;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5..255c3c602f 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -532,6 +532,7 @@  static MemTxResult access_with_adjusted_size(hwaddr addr,
     uint64_t access_mask;
     unsigned access_size;
     unsigned i;
+    DeviceState *dev = NULL;
     MemTxResult r = MEMTX_OK;
 
     if (!access_size_min) {
@@ -541,6 +542,17 @@  static MemTxResult access_with_adjusted_size(hwaddr addr,
         access_size_max = 4;
     }
 
+    /* Do not allow more than one simultanous access to a device's IO Regions */
+    if (mr->owner &&
+        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+        if (dev->engaged_in_direct_io) {
+            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+            return MEMTX_ERROR;
+        }
+        dev->engaged_in_direct_io = true;
+    }
+
     /* FIXME: support unaligned access? */
     access_size = MAX(MIN(size, access_size_max), access_size_min);
     access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -555,6 +567,9 @@  static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
+    if (dev) {
+        dev->engaged_in_direct_io = false;
+    }
     return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887b3c..d7228316db 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@  memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"