diff mbox series

[v6,1/4] memory: prevent dma-reentracy issues

Message ID 20230205040737.3567731-2-alxndr@bu.edu (mailing list archive)
State New, archived
Headers show
Series memory: prevent dma-reentracy issues | expand

Commit Message

Alexander Bulekov Feb. 5, 2023, 4:07 a.m. UTC
Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
This flag is set/checked prior to calling a device's MemoryRegion
handlers, and set when device code initiates DMA.  The purpose of this
flag is to prevent two types of DMA-based reentrancy issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Acked-by: Peter Xu <peterx@redhat.com>
---
 include/hw/qdev-core.h |  7 +++++++
 softmmu/memory.c       | 17 +++++++++++++++++
 softmmu/trace-events   |  1 +
 3 files changed, 25 insertions(+)

Comments

Fiona Ebner March 10, 2023, 11:14 a.m. UTC | #1
Am 05.02.23 um 05:07 schrieb Alexander Bulekov:
> Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> This flag is set/checked prior to calling a device's MemoryRegion
> handlers, and set when device code initiates DMA.  The purpose of this
> flag is to prevent two types of DMA-based reentrancy issues:
> 
> 1.) mmio -> dma -> mmio case
> 2.) bh -> dma write -> mmio case
> 
> These issues have led to problems such as stack-exhaustion and
> use-after-frees.
> 
> Summary of the problem from Peter Maydell:
> https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
> 
> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> Acked-by: Peter Xu <peterx@redhat.com>
> ---
>  include/hw/qdev-core.h |  7 +++++++
>  softmmu/memory.c       | 17 +++++++++++++++++
>  softmmu/trace-events   |  1 +
>  3 files changed, 25 insertions(+)
> 
Hi,
there seems to be an issue with this patch or existing issue exposed by
this patch in combination with the LSI SCSI controller.

After applying this patch on current master (i.e.
ee59483267de29056b5b2ee2421ef3844e5c9932), a Debian 11 with the LSI
controller would not boot properly anymore:
> [    7.540907] sym0: <895a> rev 0x0 at pci 0000:00:05.0 irq 10
> [    7.546028] sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> [    7.559724] sym0: SCSI BUS has been reset.
> [    7.560820] sym0: interrupted SCRIPT address not found.
> [    7.563802] scsi host2: sym-2.2.3
> [    7.881111] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> [    7.881998] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
> [    7.925902] e1000 0000:00:03.0 ens3: renamed from eth0
> [   32.654811] scsi 2:0:0:0: tag#192 ABORT operation started
> [   37.764283] scsi 2:0:0:0: ABORT operation timed-out.
> [   37.774974] scsi 2:0:0:0: tag#192 DEVICE RESET operation started
> [   42.882488] scsi 2:0:0:0: DEVICE RESET operation timed-out.
> [   42.883606] scsi 2:0:0:0: tag#192 BUS RESET operation started
> [   48.002437] scsi 2:0:0:0: BUS RESET operation timed-out.
> [   48.003030] scsi 2:0:0:0: tag#192 HOST RESET operation started
> [   48.010226] sym0: SCSI BUS has been reset.
> [   53.122472] scsi 2:0:0:0: HOST RESET operation timed-out.
> [   53.123030] scsi 2:0:0:0: Device offlined - not ready after error recovery

The commandline I used is:
./qemu-system-x86_64 \
   -cpu 'kvm64' \
   -m 4096 \
   -serial 'stdio' \
   -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
   -drive
'file=/dev/zvol/myzpool/vm-9006-disk-0,if=none,id=drive-scsi0,format=raw' \
   -device
'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -machine 'pc'

Happy to provide any more information if necessary!

CC-ing Fam Zheng (reviewer:SCSI)

Originally reported by one of our community members:
https://forum.proxmox.com/threads/123843/

Best Regards,
Fiona
Alexander Bulekov March 10, 2023, 12:23 p.m. UTC | #2
On 230310 1214, Fiona Ebner wrote:
> Am 05.02.23 um 05:07 schrieb Alexander Bulekov:
> > Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> > This flag is set/checked prior to calling a device's MemoryRegion
> > handlers, and set when device code initiates DMA.  The purpose of this
> > flag is to prevent two types of DMA-based reentrancy issues:
> > 
> > 1.) mmio -> dma -> mmio case
> > 2.) bh -> dma write -> mmio case
> > 
> > These issues have led to problems such as stack-exhaustion and
> > use-after-frees.
> > 
> > Summary of the problem from Peter Maydell:
> > https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
> > 
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
> > 
> > Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > Acked-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/hw/qdev-core.h |  7 +++++++
> >  softmmu/memory.c       | 17 +++++++++++++++++
> >  softmmu/trace-events   |  1 +
> >  3 files changed, 25 insertions(+)
> > 
> Hi,
> there seems to be an issue with this patch or existing issue exposed by
> this patch in combination with the LSI SCSI controller.
> 
> After applying this patch on current master (i.e.
> ee59483267de29056b5b2ee2421ef3844e5c9932), a Debian 11 with the LSI
> controller would not boot properly anymore:
> > [    7.540907] sym0: <895a> rev 0x0 at pci 0000:00:05.0 irq 10
> > [    7.546028] sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> > [    7.559724] sym0: SCSI BUS has been reset.
> > [    7.560820] sym0: interrupted SCRIPT address not found.
> > [    7.563802] scsi host2: sym-2.2.3
> > [    7.881111] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> > [    7.881998] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
> > [    7.925902] e1000 0000:00:03.0 ens3: renamed from eth0
> > [   32.654811] scsi 2:0:0:0: tag#192 ABORT operation started
> > [   37.764283] scsi 2:0:0:0: ABORT operation timed-out.
> > [   37.774974] scsi 2:0:0:0: tag#192 DEVICE RESET operation started
> > [   42.882488] scsi 2:0:0:0: DEVICE RESET operation timed-out.
> > [   42.883606] scsi 2:0:0:0: tag#192 BUS RESET operation started
> > [   48.002437] scsi 2:0:0:0: BUS RESET operation timed-out.
> > [   48.003030] scsi 2:0:0:0: tag#192 HOST RESET operation started
> > [   48.010226] sym0: SCSI BUS has been reset.
> > [   53.122472] scsi 2:0:0:0: HOST RESET operation timed-out.
> > [   53.123030] scsi 2:0:0:0: Device offlined - not ready after error recovery
> 
> The commandline I used is:
> ./qemu-system-x86_64 \
>    -cpu 'kvm64' \
>    -m 4096 \
>    -serial 'stdio' \
>    -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
>    -drive
> 'file=/dev/zvol/myzpool/vm-9006-disk-0,if=none,id=drive-scsi0,format=raw' \
>    -device
> 'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
>    -machine 'pc'
> 
> Happy to provide any more information if necessary!
> 
> CC-ing Fam Zheng (reviewer:SCSI)
> 
> Originally reported by one of our community members:
> https://forum.proxmox.com/threads/123843/
> 
> Best Regards,
> Fiona
> 

Thanks, I confirmed this by booting up a livecd iso with an lsi device
attached.  I will do some digging

Stack-trace:

#0  trace_memory_region_reentrant_io (cpu_index=<optimized out>, mr=<optimized out>, offset=<optimized out>, size=<optimized out>) at trace/trace-softmmu.h:337
#1  0x000055555815ce67 in access_with_adjusted_size (addr=addr@entry=0x1000, value=0x7ffef01fb980, size=size@entry=0x4, access_size_min=0x1, access_size_min@entry=0x0, access_size_max=0x4, access_fn=0x555558181370 <memory_region_read_accessor>, mr=0x627000000c50, attrs=...
) at ../softmmu/memory.c:552
#2  0x000055555815aec7 in memory_region_dispatch_read1 (mr=0x627000000c50, addr=0x1000, pval=<optimized out>, size=0x4, attrs=...) at ../softmmu/memory.c:1448
#3  memory_region_dispatch_read (mr=0x627000000c50, addr=0x1000, pval=<optimized out>, op=<optimized out>, attrs=...) at ../softmmu/memory.c:1475
#4  0x000055555819eb77 in flatview_read_continue (fv=<optimized out>, addr=0xfebf1000, attrs=..., ptr=<optimized out>, len=0x4, addr1=0x627000000c50, l=<optimized out>, mr=<optimized out>) at ../softmmu/physmem.c:2893
#5  0x000055555819f844 in flatview_read (fv=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at ../softmmu/physmem.c:2935
#6  0x000055555819f554 in address_space_read_full (as=<optimized out>, addr=0xfebf1000, attrs=..., buf=0x7ffef01fd800, len=0x4) at ../softmmu/physmem.c:2948
#7  0x00005555575475ab in dma_memory_rw_relaxed (as=0x0, len=0x4, dir=DMA_DIRECTION_TO_DEVICE, attrs=..., addr=<optimized out>, buf=<optimized out>) at include/sysemu/dma.h:87
#8  dma_memory_rw (as=0x0, len=0x4, dir=DMA_DIRECTION_TO_DEVICE, attrs=..., addr=<optimized out>, buf=<optimized out>) at include/sysemu/dma.h:130
#9  pci_dma_rw (dev=<optimized out>, addr=0xfebf1000, len=0x4, dir=DMA_DIRECTION_TO_DEVICE, attrs=..., buf=<optimized out>) at include/hw/pci/pci_device.h:233
#10 pci_dma_read (dev=<optimized out>, addr=0xfebf1000, len=0x4, buf=<optimized out>) at include/hw/pci/pci_device.h:252
#11 read_dword (s=0x627000000100, addr=<optimized out>) at ../hw/scsi/lsi53c895a.c:472
#12 lsi_execute_script (s=s@entry=0x627000000100) at ../hw/scsi/lsi53c895a.c:1155
#13 0x0000555557540c67 in lsi_reg_writeb (s=<optimized out>, offset=<optimized out>, val=<optimized out>) at ../hw/scsi/lsi53c895a.c:2005
#14 0x000055555815d232 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at ../softmmu/memory.c:493
#15 0x000055555815cbfb in access_with_adjusted_size (addr=0x2c, value=value@entry=0x7ffef01fdba0, size=size@entry=0x4, access_size_min=<optimized out>, access_size_min@entry=0x1, access_size_max=<optimized out>, access_fn=0x55555815cef0 <memory_region_write_accessor>, mr=0
x627000000b40, attrs=...) at ../softmmu/memory.c:569
#16 0x000055555815c134 in memory_region_dispatch_write (mr=0x627000000b40, addr=<optimized out>, data=<optimized out>, op=<optimized out>, attrs=...) at ../softmmu/memory.c:1539
#17 0x00005555581a87b1 in flatview_write_continue (fv=<optimized out>, addr=0xfebf302c, attrs=..., ptr=<optimized out>, len=0x4, addr1=0x627000000c50, l=<optimized out>, mr=<optimized out>) at ../softmmu/physmem.c:2826
#18 0x000055555819fdc4 in flatview_write (fv=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at ../softmmu/physmem.c:2868
#19 0x000055555819fad4 in address_space_write (as=<optimized out>, addr=0xfebf302c, attrs=..., buf=0x7ffff3e16028, len=0x4) at ../softmmu/physmem.c:2964
#20 0x00005555584078dc in kvm_cpu_exec (cpu=cpu@entry=0x629000019200) at ../accel/kvm/kvm-all.c:2980
#21 0x0000555558421d64 in kvm_vcpu_thread_fn (arg=0x629000019200) at ../accel/kvm/kvm-accel-ops.c:51
#22 0x0000555558a5a279 in qemu_thread_start (args=0x60300008a6d0) at ../util/qemu-thread-posix.c:512
#23 0x00007ffff66a7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 0x00007ffff672866c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Alexander Bulekov March 10, 2023, 12:31 p.m. UTC | #3
On 230310 0723, Alexander Bulekov wrote:
> On 230310 1214, Fiona Ebner wrote:
> > Am 05.02.23 um 05:07 schrieb Alexander Bulekov:
> > > Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> > > This flag is set/checked prior to calling a device's MemoryRegion
> > > handlers, and set when device code initiates DMA.  The purpose of this
> > > flag is to prevent two types of DMA-based reentrancy issues:
> > > 
> > > 1.) mmio -> dma -> mmio case
> > > 2.) bh -> dma write -> mmio case
> > > 
> > > These issues have led to problems such as stack-exhaustion and
> > > use-after-frees.
> > > 
> > > Summary of the problem from Peter Maydell:
> > > https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
> > > 
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
> > > 
> > > Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > > Acked-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  include/hw/qdev-core.h |  7 +++++++
> > >  softmmu/memory.c       | 17 +++++++++++++++++
> > >  softmmu/trace-events   |  1 +
> > >  3 files changed, 25 insertions(+)
> > > 
> > Hi,
> > there seems to be an issue with this patch or existing issue exposed by
> > this patch in combination with the LSI SCSI controller.
> > 
> > After applying this patch on current master (i.e.
> > ee59483267de29056b5b2ee2421ef3844e5c9932), a Debian 11 with the LSI
> > controller would not boot properly anymore:
> > > [    7.540907] sym0: <895a> rev 0x0 at pci 0000:00:05.0 irq 10
> > > [    7.546028] sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> > > [    7.559724] sym0: SCSI BUS has been reset.
> > > [    7.560820] sym0: interrupted SCRIPT address not found.
> > > [    7.563802] scsi host2: sym-2.2.3
> > > [    7.881111] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> > > [    7.881998] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
> > > [    7.925902] e1000 0000:00:03.0 ens3: renamed from eth0
> > > [   32.654811] scsi 2:0:0:0: tag#192 ABORT operation started
> > > [   37.764283] scsi 2:0:0:0: ABORT operation timed-out.
> > > [   37.774974] scsi 2:0:0:0: tag#192 DEVICE RESET operation started
> > > [   42.882488] scsi 2:0:0:0: DEVICE RESET operation timed-out.
> > > [   42.883606] scsi 2:0:0:0: tag#192 BUS RESET operation started
> > > [   48.002437] scsi 2:0:0:0: BUS RESET operation timed-out.
> > > [   48.003030] scsi 2:0:0:0: tag#192 HOST RESET operation started
> > > [   48.010226] sym0: SCSI BUS has been reset.
> > > [   53.122472] scsi 2:0:0:0: HOST RESET operation timed-out.
> > > [   53.123030] scsi 2:0:0:0: Device offlined - not ready after error recovery
> > 
> > The commandline I used is:
> > ./qemu-system-x86_64 \
> >    -cpu 'kvm64' \
> >    -m 4096 \
> >    -serial 'stdio' \
> >    -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
> >    -drive
> > 'file=/dev/zvol/myzpool/vm-9006-disk-0,if=none,id=drive-scsi0,format=raw' \
> >    -device
> > 'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
> >    -machine 'pc'
> > 
> > Happy to provide any more information if necessary!
> > 
> > CC-ing Fam Zheng (reviewer:SCSI)
> > 
> > Originally reported by one of our community members:
> > https://forum.proxmox.com/threads/123843/
> > 
> > Best Regards,
> > Fiona
> > 
> 
> Thanks, I confirmed this by booting up a livecd iso with an lsi device
> attached.  I will do some digging
> 
> Stack-trace:
> 
> #0  trace_memory_region_reentrant_io (cpu_index=<optimized out>, mr=<optimized out>, offset=<optimized out>, size=<optimized out>) at trace/trace-softmmu.h:337
> #1  0x000055555815ce67 in access_with_adjusted_size (addr=addr@entry=0x1000, value=0x7ffef01fb980, size=size@entry=0x4, access_size_min=0x1, access_size_min@entry=0x0, access_size_max=0x4, access_fn=0x555558181370 <memory_region_read_accessor>, mr=0x627000000c50, attrs=...
> ) at ../softmmu/memory.c:552
> #2  0x000055555815aec7 in memory_region_dispatch_read1 (mr=0x627000000c50, addr=0x1000, pval=<optimized out>, size=0x4, attrs=...) at ../softmmu/memory.c:1448

This MR seems to be "lsi-ram".

From hw/scsi/lsi53c895a.c:

memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
        "lsi-ram", 0x2000);                    

So the LSI device is reading an LSI "Script" from its own IO region.. In
this particular case, I think there was no reason to use
memory_region_init_io rather than memory_region_init_ram, but this makes
me worried that there are other devices that use something like this.

-Alex
Peter Maydell March 10, 2023, 12:45 p.m. UTC | #4
On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
> This MR seems to be "lsi-ram".
>
> From hw/scsi/lsi53c895a.c:
>
> memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
>         "lsi-ram", 0x2000);
>
> So the LSI device is reading an LSI "Script" from its own IO region.. In
> this particular case, I think there was no reason to use
> memory_region_init_io rather than memory_region_init_ram, but this makes
> me worried that there are other devices that use something like this.

This particular device predates the entire MemoryRegion set of
abstractions, so it might have seemed easier at the time.
The endianness handling of the current code is also a bit
confusing and might make it tricky to convert to a RAM MR.

thanks
-- PMM
Alexander Bulekov March 10, 2023, 1:02 p.m. UTC | #5
On 230310 1245, Peter Maydell wrote:
> On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
> > This MR seems to be "lsi-ram".
> >
> > From hw/scsi/lsi53c895a.c:
> >
> > memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
> >         "lsi-ram", 0x2000);
> >
> > So the LSI device is reading an LSI "Script" from its own IO region.. In
> > this particular case, I think there was no reason to use
> > memory_region_init_io rather than memory_region_init_ram, but this makes
> > me worried that there are other devices that use something like this.
> 
> This particular device predates the entire MemoryRegion set of
> abstractions, so it might have seemed easier at the time.
> The endianness handling of the current code is also a bit
> confusing and might make it tricky to convert to a RAM MR.

With my trivial mr_io - > mr_ram conversion, I no longer hit the
re-entrancy tracepoint, and my livecd boots, but it's probably not
exhaustively using the script functionality.. 

Does the endianness actually cause a problem? As long as all accesses
(CPU -> LSI_RAM and LSI -> LSI_RAM) occur through the address_space API,
are we safe?
-Alex
Mark Cave-Ayland March 10, 2023, 1:07 p.m. UTC | #6
On 10/03/2023 12:45, Peter Maydell wrote:

> On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
>> This MR seems to be "lsi-ram".
>>
>>  From hw/scsi/lsi53c895a.c:
>>
>> memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
>>          "lsi-ram", 0x2000);
>>
>> So the LSI device is reading an LSI "Script" from its own IO region.. In
>> this particular case, I think there was no reason to use
>> memory_region_init_io rather than memory_region_init_ram, but this makes
>> me worried that there are other devices that use something like this.
> 
> This particular device predates the entire MemoryRegion set of
> abstractions, so it might have seemed easier at the time.
> The endianness handling of the current code is also a bit
> confusing and might make it tricky to convert to a RAM MR.

Since the LSI controller is attached to a PCI BAR then the accesses from the host to 
the RAM should all be little endian (you can see the conversions in the driver I 
wrote for the 40p machine to allow it to boot Linux: 
https://github.com/openbios/openbios/blob/master/drivers/lsi.c).

I'm mildly curious that read_dword() which reads the SCRIPTS program internally 
returns the value via cpu_to_le32(), as at first glance I would expect that to be the 
other way around...



ATB,

Mark.
Alexander Bulekov March 10, 2023, 1:28 p.m. UTC | #7
On 230310 0802, Alexander Bulekov wrote:
> On 230310 1245, Peter Maydell wrote:
> > On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
> > > This MR seems to be "lsi-ram".
> > >
> > > From hw/scsi/lsi53c895a.c:
> > >
> > > memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
> > >         "lsi-ram", 0x2000);
> > >
> > > So the LSI device is reading an LSI "Script" from its own IO region.. In
> > > this particular case, I think there was no reason to use
> > > memory_region_init_io rather than memory_region_init_ram, but this makes
> > > me worried that there are other devices that use something like this.
> > 
> > This particular device predates the entire MemoryRegion set of
> > abstractions, so it might have seemed easier at the time.
> > The endianness handling of the current code is also a bit
> > confusing and might make it tricky to convert to a RAM MR.
> 
> With my trivial mr_io - > mr_ram conversion, I no longer hit the
> re-entrancy tracepoint, and my livecd boots, but it's probably not
> exhaustively using the script functionality.. 
> 
> Does the endianness actually cause a problem? As long as all accesses
> (CPU -> LSI_RAM and LSI -> LSI_RAM) occur through the address_space API,
> are we safe?
> -Alex

Or maybe a rom_device with memory_region_rom_device_set_romd(romd_mode =
false) is better here?
Peter Maydell March 10, 2023, 1:37 p.m. UTC | #8
On Fri, 10 Mar 2023 at 13:29, Alexander Bulekov <alxndr@bu.edu> wrote:
>
> On 230310 0802, Alexander Bulekov wrote:
> > On 230310 1245, Peter Maydell wrote:
> > > On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
> > > > This MR seems to be "lsi-ram".
> > > >
> > > > From hw/scsi/lsi53c895a.c:
> > > >
> > > > memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
> > > >         "lsi-ram", 0x2000);
> > > >
> > > > So the LSI device is reading an LSI "Script" from its own IO region.. In
> > > > this particular case, I think there was no reason to use
> > > > memory_region_init_io rather than memory_region_init_ram, but this makes
> > > > me worried that there are other devices that use something like this.
> > >
> > > This particular device predates the entire MemoryRegion set of
> > > abstractions, so it might have seemed easier at the time.
> > > The endianness handling of the current code is also a bit
> > > confusing and might make it tricky to convert to a RAM MR.
> >
> > With my trivial mr_io - > mr_ram conversion, I no longer hit the
> > re-entrancy tracepoint, and my livecd boots, but it's probably not
> > exhaustively using the script functionality..
> >
> > Does the endianness actually cause a problem? As long as all accesses
> > (CPU -> LSI_RAM and LSI -> LSI_RAM) occur through the address_space API,
> > are we safe?

I'm not sure -- I looked at the code and I wasn't confident
on exactly what the combination of the DEVICE_LITTLE_ENDIAN
MemoryRegion and the use of stn_le_p/ldn_le_p would be.
I think it ought to be possible to use a RAM MR, but we'd
need to check the handling on both BE and LE hosts. Migration
compatibility is the other thing we would need to check, to
avoid accidentally switching from "script_ram[] contents are
in order X" to "they are in order Y"...

> Or maybe a rom_device with memory_region_rom_device_set_romd(romd_mode =
> false) is better here?

I don't think that helps -- if we can let the guest do direct
reads from the region then it's equally OK to let it do
direct writes, so we could just use a RAM MR.

-- PMM
diff mbox series

Patch

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 35fddb19a6..8858195262 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -162,6 +162,10 @@  struct NamedClockList {
     QLIST_ENTRY(NamedClockList) node;
 };
 
+typedef struct {
+    bool engaged_in_io;
+} MemReentrancyGuard;
+
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
@@ -194,6 +198,9 @@  struct DeviceState {
     int alias_required_for_version;
     ResettableState reset;
     GSList *unplug_blockers;
+
+    /* Is the device currently in mmio/pio/dma? Used to prevent re-entrancy */
+    MemReentrancyGuard mem_reentrancy_guard;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9d64efca26..eefeeae317 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -533,6 +533,7 @@  static MemTxResult access_with_adjusted_size(hwaddr addr,
     uint64_t access_mask;
     unsigned access_size;
     unsigned i;
+    DeviceState *dev = NULL;
     MemTxResult r = MEMTX_OK;
 
     if (!access_size_min) {
@@ -542,6 +543,19 @@  static MemTxResult access_with_adjusted_size(hwaddr addr,
         access_size_max = 4;
     }
 
+    /* Do not allow more than one simultanous access to a device's IO Regions */
+    if (mr->owner &&
+        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+        if (dev) {
+            if (dev->mem_reentrancy_guard.engaged_in_io) {
+                trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+                return MEMTX_ERROR;
+            }
+            dev->mem_reentrancy_guard.engaged_in_io = true;
+        }
+    }
+
     /* FIXME: support unaligned access? */
     access_size = MAX(MIN(size, access_size_max), access_size_min);
     access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -556,6 +570,9 @@  static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
+    if (dev) {
+        dev->mem_reentrancy_guard.engaged_in_io = false;
+    }
     return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 22606dc27b..62d04ea9a7 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@  memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"