[v6,1/4] memory: prevent dma-reentracy issues

Message ID	20230205040737.3567731-2-alxndr@bu.edu (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> IronPort-Data: A9a23:d6ZY16Lh4oOS4/c1FE+Rb5clxSXFcZb7ZxGr2PjKsXjdYENS0TVRz 2JKCjjXM6yIazD9LdF/bYvnoRkB75XTx4A2HAZorCE8RH908vbIVI+TRqvS04J+DSFhoGZPt Zh2hgzodZhsJpPkjk7xdOOn9Twkj/rgqoPUUIbsIjp2SRJvVBAvgBdin/9RqoNziJ2yDhjlV ena+qUzA3f4nW8rWo4ow/jb8kg35qyu4GhwUmEWPpingnePzxH5M7pCfcldH1OgKqFIE+izQ fr0zb3R1gs1KD90V7tJOp6iGqE7aue60Tqm0xK6aID76vR2nRHe545gXBYqhea7vB3S9zx54 I0lWZVd0m7FNIWV8AgWe0Aw/y2TocSqUVIISJSymZX78qHIT5fj6+t1Klg1GqI2xr00BmJvr cceDToIcynW0opawJrjIgVtrsEqLc2uJYBG/386lWmfAvEhTpTOBa7N4Le03h9q3pEITauYP ZBDL2A3NHwsYDUWUrsTIJs6jOGknFH1bntVpE/9Sa8fuTWPkFIsj+O2WDbTUoSWQ+4EpXSSm lzh8TzUI0AoLN6e+RPQpxpAgceKx0sXQrk6DbC967tmjUOewkQVDxsZU0b9puO24nNSQPpaI k0QvzMt9O08rhTzCNb6WBK8rTiPuRt0t8dsLtDWITqlksL8izt1zEBdJtKdQLTKbPMLeAE= IronPort-HdrOrdr: A9a23:ViXBL6//pcVJX08ldoluk+DoI+orL9Y04lQ7vn2ZLiY4TiX4ra +TdZEgviMc5wx+ZJhNo7G90cu7MBHhHO9OgbX5VI3KNGOKhILPFvAG0WKI+UyCJ8SRzJ866Y 5QN4R4Fd3sHRxboK/BkXCF+g8bsb26GXaT9IDj80s= From: Alexander Bulekov <alxndr@bu.edu> To: qemu-devel@nongnu.org Cc: Alexander Bulekov <alxndr@bu.edu>, Stefan Hajnoczi <stefanha@redhat.com>, =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= <philmd@linaro.org>, Mauro Matteo Cascella <mcascell@redhat.com>, Peter Xu <peterx@redhat.com>, Jason Wang <jasowang@redhat.com>, David Hildenbrand <david@redhat.com>, Gerd Hoffmann <kraxel@redhat.com>, Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>, Bandan Das <bsd@redhat.com>, "Edgar E . Iglesias" <edgar.iglesias@gmail.com>, Darren Kenny <darren.kenny@oracle.com>, Bin Meng <bin.meng@windriver.com>, Paolo Bonzini <pbonzini@redhat.com>, "Michael S . Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, =?utf-8?q?Daniel_P_=2E_Berra?= =?utf-8?q?ng=C3=A9?= <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, Jon Maloy <jmaloy@redhat.com>, Siqi Chen <coc.cyqh@gmail.com> Subject: [PATCH v6 1/4] memory: prevent dma-reentracy issues Date: Sat, 4 Feb 2023 23:07:34 -0500 Message-Id: <20230205040737.3567731-2-alxndr@bu.edu> In-Reply-To: <20230205040737.3567731-1-alxndr@bu.edu> References: <20230205040737.3567731-1-alxndr@bu.edu> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CES-GSUITE_AUTH: bf3aNvsZpxl8 Received-SPF: pass client-ip=216.71.140.196; envelope-from=alxndr@bu.edu; helo=esa8.hc2706-39.iphmx.com X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HK_RANDOM_ENVFROM=0.001, HK_RANDOM_FROM=0.999, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Series	memory: prevent dma-reentracy issues \| expand [v6,0/4] memory: prevent dma-reentracy issues [v6,1/4] memory: prevent dma-reentracy issues [v6,2/4] async: Add an optional reentrancy guard to the BH API [v6,3/4] checkpatch: add qemu_bh_new/aio_bh_new checks [v6,4/4] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

Alexander Bulekov Feb. 5, 2023, 4:07 a.m. UTC

Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
This flag is set/checked prior to calling a device's MemoryRegion
handlers, and set when device code initiates DMA.  The purpose of this
flag is to prevent two types of DMA-based reentrancy issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282

Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Acked-by: Peter Xu <peterx@redhat.com>
---
 include/hw/qdev-core.h |  7 +++++++
 softmmu/memory.c       | 17 +++++++++++++++++
 softmmu/trace-events   |  1 +
 3 files changed, 25 insertions(+)

Fiona Ebner March 10, 2023, 11:14 a.m. UTC | #1

Am 05.02.23 um 05:07 schrieb Alexander Bulekov:
> Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> This flag is set/checked prior to calling a device's MemoryRegion
> handlers, and set when device code initiates DMA.  The purpose of this
> flag is to prevent two types of DMA-based reentrancy issues:
> 
> 1.) mmio -> dma -> mmio case
> 2.) bh -> dma write -> mmio case
> 
> These issues have led to problems such as stack-exhaustion and
> use-after-frees.
> 
> Summary of the problem from Peter Maydell:
> https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
> 
> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> Acked-by: Peter Xu <peterx@redhat.com>
> ---
>  include/hw/qdev-core.h |  7 +++++++
>  softmmu/memory.c       | 17 +++++++++++++++++
>  softmmu/trace-events   |  1 +
>  3 files changed, 25 insertions(+)
> 
Hi,
there seems to be an issue with this patch or existing issue exposed by
this patch in combination with the LSI SCSI controller.

After applying this patch on current master (i.e.
ee59483267de29056b5b2ee2421ef3844e5c9932), a Debian 11 with the LSI
controller would not boot properly anymore:
> [    7.540907] sym0: <895a> rev 0x0 at pci 0000:00:05.0 irq 10
> [    7.546028] sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> [    7.559724] sym0: SCSI BUS has been reset.
> [    7.560820] sym0: interrupted SCRIPT address not found.
> [    7.563802] scsi host2: sym-2.2.3
> [    7.881111] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> [    7.881998] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
> [    7.925902] e1000 0000:00:03.0 ens3: renamed from eth0
> [   32.654811] scsi 2:0:0:0: tag#192 ABORT operation started
> [   37.764283] scsi 2:0:0:0: ABORT operation timed-out.
> [   37.774974] scsi 2:0:0:0: tag#192 DEVICE RESET operation started
> [   42.882488] scsi 2:0:0:0: DEVICE RESET operation timed-out.
> [   42.883606] scsi 2:0:0:0: tag#192 BUS RESET operation started
> [   48.002437] scsi 2:0:0:0: BUS RESET operation timed-out.
> [   48.003030] scsi 2:0:0:0: tag#192 HOST RESET operation started
> [   48.010226] sym0: SCSI BUS has been reset.
> [   53.122472] scsi 2:0:0:0: HOST RESET operation timed-out.
> [   53.123030] scsi 2:0:0:0: Device offlined - not ready after error recovery

The commandline I used is:
./qemu-system-x86_64 \
   -cpu 'kvm64' \
   -m 4096 \
   -serial 'stdio' \
   -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
   -drive
'file=/dev/zvol/myzpool/vm-9006-disk-0,if=none,id=drive-scsi0,format=raw' \
   -device
'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -machine 'pc'

Happy to provide any more information if necessary!

CC-ing Fam Zheng (reviewer:SCSI)

Originally reported by one of our community members:
https://forum.proxmox.com/threads/123843/

Best Regards,
Fiona

Alexander Bulekov March 10, 2023, 12:23 p.m. UTC | #2

On 230310 1214, Fiona Ebner wrote:
> Am 05.02.23 um 05:07 schrieb Alexander Bulekov:
> > Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> > This flag is set/checked prior to calling a device's MemoryRegion
> > handlers, and set when device code initiates DMA.  The purpose of this
> > flag is to prevent two types of DMA-based reentrancy issues:
> > 
> > 1.) mmio -> dma -> mmio case
> > 2.) bh -> dma write -> mmio case
> > 
> > These issues have led to problems such as stack-exhaustion and
> > use-after-frees.
> > 
> > Summary of the problem from Peter Maydell:
> > https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
> > 
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
> > 
> > Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > Acked-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/hw/qdev-core.h |  7 +++++++
> >  softmmu/memory.c       | 17 +++++++++++++++++
> >  softmmu/trace-events   |  1 +
> >  3 files changed, 25 insertions(+)
> > 
> Hi,
> there seems to be an issue with this patch or existing issue exposed by
> this patch in combination with the LSI SCSI controller.
> 
> After applying this patch on current master (i.e.
> ee59483267de29056b5b2ee2421ef3844e5c9932), a Debian 11 with the LSI
> controller would not boot properly anymore:
> > [    7.540907] sym0: <895a> rev 0x0 at pci 0000:00:05.0 irq 10
> > [    7.546028] sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> > [    7.559724] sym0: SCSI BUS has been reset.
> > [    7.560820] sym0: interrupted SCRIPT address not found.
> > [    7.563802] scsi host2: sym-2.2.3
> > [    7.881111] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> > [    7.881998] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
> > [    7.925902] e1000 0000:00:03.0 ens3: renamed from eth0
> > [   32.654811] scsi 2:0:0:0: tag#192 ABORT operation started
> > [   37.764283] scsi 2:0:0:0: ABORT operation timed-out.
> > [   37.774974] scsi 2:0:0:0: tag#192 DEVICE RESET operation started
> > [   42.882488] scsi 2:0:0:0: DEVICE RESET operation timed-out.
> > [   42.883606] scsi 2:0:0:0: tag#192 BUS RESET operation started
> > [   48.002437] scsi 2:0:0:0: BUS RESET operation timed-out.
> > [   48.003030] scsi 2:0:0:0: tag#192 HOST RESET operation started
> > [   48.010226] sym0: SCSI BUS has been reset.
> > [   53.122472] scsi 2:0:0:0: HOST RESET operation timed-out.
> > [   53.123030] scsi 2:0:0:0: Device offlined - not ready after error recovery
> 
> The commandline I used is:
> ./qemu-system-x86_64 \
>    -cpu 'kvm64' \
>    -m 4096 \
>    -serial 'stdio' \
>    -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
>    -drive
> 'file=/dev/zvol/myzpool/vm-9006-disk-0,if=none,id=drive-scsi0,format=raw' \
>    -device
> 'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
>    -machine 'pc'
> 
> Happy to provide any more information if necessary!
> 
> CC-ing Fam Zheng (reviewer:SCSI)
> 
> Originally reported by one of our community members:
> https://forum.proxmox.com/threads/123843/
> 
> Best Regards,
> Fiona
> 

Thanks, I confirmed this by booting up a livecd iso with an lsi device
attached.  I will do some digging

Stack-trace:

#0  trace_memory_region_reentrant_io (cpu_index=<optimized out>, mr=<optimized out>, offset=<optimized out>, size=<optimized out>) at trace/trace-softmmu.h:337
#1  0x000055555815ce67 in access_with_adjusted_size (addr=addr@entry=0x1000, value=0x7ffef01fb980, size=size@entry=0x4, access_size_min=0x1, access_size_min@entry=0x0, access_size_max=0x4, access_fn=0x555558181370 <memory_region_read_accessor>, mr=0x627000000c50, attrs=...
) at ../softmmu/memory.c:552
#2  0x000055555815aec7 in memory_region_dispatch_read1 (mr=0x627000000c50, addr=0x1000, pval=<optimized out>, size=0x4, attrs=...) at ../softmmu/memory.c:1448
#3  memory_region_dispatch_read (mr=0x627000000c50, addr=0x1000, pval=<optimized out>, op=<optimized out>, attrs=...) at ../softmmu/memory.c:1475
#4  0x000055555819eb77 in flatview_read_continue (fv=<optimized out>, addr=0xfebf1000, attrs=..., ptr=<optimized out>, len=0x4, addr1=0x627000000c50, l=<optimized out>, mr=<optimized out>) at ../softmmu/physmem.c:2893
#5  0x000055555819f844 in flatview_read (fv=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at ../softmmu/physmem.c:2935
#6  0x000055555819f554 in address_space_read_full (as=<optimized out>, addr=0xfebf1000, attrs=..., buf=0x7ffef01fd800, len=0x4) at ../softmmu/physmem.c:2948
#7  0x00005555575475ab in dma_memory_rw_relaxed (as=0x0, len=0x4, dir=DMA_DIRECTION_TO_DEVICE, attrs=..., addr=<optimized out>, buf=<optimized out>) at include/sysemu/dma.h:87
#8  dma_memory_rw (as=0x0, len=0x4, dir=DMA_DIRECTION_TO_DEVICE, attrs=..., addr=<optimized out>, buf=<optimized out>) at include/sysemu/dma.h:130
#9  pci_dma_rw (dev=<optimized out>, addr=0xfebf1000, len=0x4, dir=DMA_DIRECTION_TO_DEVICE, attrs=..., buf=<optimized out>) at include/hw/pci/pci_device.h:233
#10 pci_dma_read (dev=<optimized out>, addr=0xfebf1000, len=0x4, buf=<optimized out>) at include/hw/pci/pci_device.h:252
#11 read_dword (s=0x627000000100, addr=<optimized out>) at ../hw/scsi/lsi53c895a.c:472
#12 lsi_execute_script (s=s@entry=0x627000000100) at ../hw/scsi/lsi53c895a.c:1155
#13 0x0000555557540c67 in lsi_reg_writeb (s=<optimized out>, offset=<optimized out>, val=<optimized out>) at ../hw/scsi/lsi53c895a.c:2005
#14 0x000055555815d232 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at ../softmmu/memory.c:493
#15 0x000055555815cbfb in access_with_adjusted_size (addr=0x2c, value=value@entry=0x7ffef01fdba0, size=size@entry=0x4, access_size_min=<optimized out>, access_size_min@entry=0x1, access_size_max=<optimized out>, access_fn=0x55555815cef0 <memory_region_write_accessor>, mr=0
x627000000b40, attrs=...) at ../softmmu/memory.c:569
#16 0x000055555815c134 in memory_region_dispatch_write (mr=0x627000000b40, addr=<optimized out>, data=<optimized out>, op=<optimized out>, attrs=...) at ../softmmu/memory.c:1539
#17 0x00005555581a87b1 in flatview_write_continue (fv=<optimized out>, addr=0xfebf302c, attrs=..., ptr=<optimized out>, len=0x4, addr1=0x627000000c50, l=<optimized out>, mr=<optimized out>) at ../softmmu/physmem.c:2826
#18 0x000055555819fdc4 in flatview_write (fv=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at ../softmmu/physmem.c:2868
#19 0x000055555819fad4 in address_space_write (as=<optimized out>, addr=0xfebf302c, attrs=..., buf=0x7ffff3e16028, len=0x4) at ../softmmu/physmem.c:2964
#20 0x00005555584078dc in kvm_cpu_exec (cpu=cpu@entry=0x629000019200) at ../accel/kvm/kvm-all.c:2980
#21 0x0000555558421d64 in kvm_vcpu_thread_fn (arg=0x629000019200) at ../accel/kvm/kvm-accel-ops.c:51
#22 0x0000555558a5a279 in qemu_thread_start (args=0x60300008a6d0) at ../util/qemu-thread-posix.c:512
#23 0x00007ffff66a7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 0x00007ffff672866c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Alexander Bulekov March 10, 2023, 12:31 p.m. UTC | #3

On 230310 0723, Alexander Bulekov wrote:
> On 230310 1214, Fiona Ebner wrote:
> > Am 05.02.23 um 05:07 schrieb Alexander Bulekov:
> > > Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> > > This flag is set/checked prior to calling a device's MemoryRegion
> > > handlers, and set when device code initiates DMA.  The purpose of this
> > > flag is to prevent two types of DMA-based reentrancy issues:
> > > 
> > > 1.) mmio -> dma -> mmio case
> > > 2.) bh -> dma write -> mmio case
> > > 
> > > These issues have led to problems such as stack-exhaustion and
> > > use-after-frees.
> > > 
> > > Summary of the problem from Peter Maydell:
> > > https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
> > > 
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
> > > 
> > > Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > > Acked-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  include/hw/qdev-core.h |  7 +++++++
> > >  softmmu/memory.c       | 17 +++++++++++++++++
> > >  softmmu/trace-events   |  1 +
> > >  3 files changed, 25 insertions(+)
> > > 
> > Hi,
> > there seems to be an issue with this patch or existing issue exposed by
> > this patch in combination with the LSI SCSI controller.
> > 
> > After applying this patch on current master (i.e.
> > ee59483267de29056b5b2ee2421ef3844e5c9932), a Debian 11 with the LSI
> > controller would not boot properly anymore:
> > > [    7.540907] sym0: <895a> rev 0x0 at pci 0000:00:05.0 irq 10
> > > [    7.546028] sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> > > [    7.559724] sym0: SCSI BUS has been reset.
> > > [    7.560820] sym0: interrupted SCRIPT address not found.
> > > [    7.563802] scsi host2: sym-2.2.3
> > > [    7.881111] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> > > [    7.881998] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
> > > [    7.925902] e1000 0000:00:03.0 ens3: renamed from eth0
> > > [   32.654811] scsi 2:0:0:0: tag#192 ABORT operation started
> > > [   37.764283] scsi 2:0:0:0: ABORT operation timed-out.
> > > [   37.774974] scsi 2:0:0:0: tag#192 DEVICE RESET operation started
> > > [   42.882488] scsi 2:0:0:0: DEVICE RESET operation timed-out.
> > > [   42.883606] scsi 2:0:0:0: tag#192 BUS RESET operation started
> > > [   48.002437] scsi 2:0:0:0: BUS RESET operation timed-out.
> > > [   48.003030] scsi 2:0:0:0: tag#192 HOST RESET operation started
> > > [   48.010226] sym0: SCSI BUS has been reset.
> > > [   53.122472] scsi 2:0:0:0: HOST RESET operation timed-out.
> > > [   53.123030] scsi 2:0:0:0: Device offlined - not ready after error recovery
> > 
> > The commandline I used is:
> > ./qemu-system-x86_64 \
> >    -cpu 'kvm64' \
> >    -m 4096 \
> >    -serial 'stdio' \
> >    -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
> >    -drive
> > 'file=/dev/zvol/myzpool/vm-9006-disk-0,if=none,id=drive-scsi0,format=raw' \
> >    -device
> > 'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
> >    -machine 'pc'
> > 
> > Happy to provide any more information if necessary!
> > 
> > CC-ing Fam Zheng (reviewer:SCSI)
> > 
> > Originally reported by one of our community members:
> > https://forum.proxmox.com/threads/123843/
> > 
> > Best Regards,
> > Fiona
> > 
> 
> Thanks, I confirmed this by booting up a livecd iso with an lsi device
> attached.  I will do some digging
> 
> Stack-trace:
> 
> #0  trace_memory_region_reentrant_io (cpu_index=<optimized out>, mr=<optimized out>, offset=<optimized out>, size=<optimized out>) at trace/trace-softmmu.h:337
> #1  0x000055555815ce67 in access_with_adjusted_size (addr=addr@entry=0x1000, value=0x7ffef01fb980, size=size@entry=0x4, access_size_min=0x1, access_size_min@entry=0x0, access_size_max=0x4, access_fn=0x555558181370 <memory_region_read_accessor>, mr=0x627000000c50, attrs=...
> ) at ../softmmu/memory.c:552
> #2  0x000055555815aec7 in memory_region_dispatch_read1 (mr=0x627000000c50, addr=0x1000, pval=<optimized out>, size=0x4, attrs=...) at ../softmmu/memory.c:1448

This MR seems to be "lsi-ram".

From hw/scsi/lsi53c895a.c:

memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
        "lsi-ram", 0x2000);                    

So the LSI device is reading an LSI "Script" from its own IO region.. In
this particular case, I think there was no reason to use
memory_region_init_io rather than memory_region_init_ram, but this makes
me worried that there are other devices that use something like this.

-Alex

Peter Maydell March 10, 2023, 12:45 p.m. UTC | #4

On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
> This MR seems to be "lsi-ram".
>
> From hw/scsi/lsi53c895a.c:
>
> memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
>         "lsi-ram", 0x2000);
>
> So the LSI device is reading an LSI "Script" from its own IO region.. In
> this particular case, I think there was no reason to use
> memory_region_init_io rather than memory_region_init_ram, but this makes
> me worried that there are other devices that use something like this.

This particular device predates the entire MemoryRegion set of
abstractions, so it might have seemed easier at the time.
The endianness handling of the current code is also a bit
confusing and might make it tricky to convert to a RAM MR.

thanks
-- PMM

Alexander Bulekov March 10, 2023, 1:02 p.m. UTC | #5

On 230310 1245, Peter Maydell wrote:
> On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
> > This MR seems to be "lsi-ram".
> >
> > From hw/scsi/lsi53c895a.c:
> >
> > memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
> >         "lsi-ram", 0x2000);
> >
> > So the LSI device is reading an LSI "Script" from its own IO region.. In
> > this particular case, I think there was no reason to use
> > memory_region_init_io rather than memory_region_init_ram, but this makes
> > me worried that there are other devices that use something like this.
> 
> This particular device predates the entire MemoryRegion set of
> abstractions, so it might have seemed easier at the time.
> The endianness handling of the current code is also a bit
> confusing and might make it tricky to convert to a RAM MR.

With my trivial mr_io - > mr_ram conversion, I no longer hit the
re-entrancy tracepoint, and my livecd boots, but it's probably not
exhaustively using the script functionality.. 

Does the endianness actually cause a problem? As long as all accesses
(CPU -> LSI_RAM and LSI -> LSI_RAM) occur through the address_space API,
are we safe?
-Alex

Mark Cave-Ayland March 10, 2023, 1:07 p.m. UTC | #6

On 10/03/2023 12:45, Peter Maydell wrote:

> On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
>> This MR seems to be "lsi-ram".
>>
>>  From hw/scsi/lsi53c895a.c:
>>
>> memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
>>          "lsi-ram", 0x2000);
>>
>> So the LSI device is reading an LSI "Script" from its own IO region.. In
>> this particular case, I think there was no reason to use
>> memory_region_init_io rather than memory_region_init_ram, but this makes
>> me worried that there are other devices that use something like this.
> 
> This particular device predates the entire MemoryRegion set of
> abstractions, so it might have seemed easier at the time.
> The endianness handling of the current code is also a bit
> confusing and might make it tricky to convert to a RAM MR.

Since the LSI controller is attached to a PCI BAR then the accesses from the host to 
the RAM should all be little endian (you can see the conversions in the driver I 
wrote for the 40p machine to allow it to boot Linux: 
https://github.com/openbios/openbios/blob/master/drivers/lsi.c).

I'm mildly curious that read_dword() which reads the SCRIPTS program internally 
returns the value via cpu_to_le32(), as at first glance I would expect that to be the 
other way around...

ATB,

Mark.

Alexander Bulekov March 10, 2023, 1:28 p.m. UTC | #7

On 230310 0802, Alexander Bulekov wrote:
> On 230310 1245, Peter Maydell wrote:
> > On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
> > > This MR seems to be "lsi-ram".
> > >
> > > From hw/scsi/lsi53c895a.c:
> > >
> > > memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
> > >         "lsi-ram", 0x2000);
> > >
> > > So the LSI device is reading an LSI "Script" from its own IO region.. In
> > > this particular case, I think there was no reason to use
> > > memory_region_init_io rather than memory_region_init_ram, but this makes
> > > me worried that there are other devices that use something like this.
> > 
> > This particular device predates the entire MemoryRegion set of
> > abstractions, so it might have seemed easier at the time.
> > The endianness handling of the current code is also a bit
> > confusing and might make it tricky to convert to a RAM MR.
> 
> With my trivial mr_io - > mr_ram conversion, I no longer hit the
> re-entrancy tracepoint, and my livecd boots, but it's probably not
> exhaustively using the script functionality.. 
> 
> Does the endianness actually cause a problem? As long as all accesses
> (CPU -> LSI_RAM and LSI -> LSI_RAM) occur through the address_space API,
> are we safe?
> -Alex

Or maybe a rom_device with memory_region_rom_device_set_romd(romd_mode =
false) is better here?

Peter Maydell March 10, 2023, 1:37 p.m. UTC | #8

On Fri, 10 Mar 2023 at 13:29, Alexander Bulekov <alxndr@bu.edu> wrote:
>
> On 230310 0802, Alexander Bulekov wrote:
> > On 230310 1245, Peter Maydell wrote:
> > > On Fri, 10 Mar 2023 at 12:32, Alexander Bulekov <alxndr@bu.edu> wrote:
> > > > This MR seems to be "lsi-ram".
> > > >
> > > > From hw/scsi/lsi53c895a.c:
> > > >
> > > > memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
> > > >         "lsi-ram", 0x2000);
> > > >
> > > > So the LSI device is reading an LSI "Script" from its own IO region.. In
> > > > this particular case, I think there was no reason to use
> > > > memory_region_init_io rather than memory_region_init_ram, but this makes
> > > > me worried that there are other devices that use something like this.
> > >
> > > This particular device predates the entire MemoryRegion set of
> > > abstractions, so it might have seemed easier at the time.
> > > The endianness handling of the current code is also a bit
> > > confusing and might make it tricky to convert to a RAM MR.
> >
> > With my trivial mr_io - > mr_ram conversion, I no longer hit the
> > re-entrancy tracepoint, and my livecd boots, but it's probably not
> > exhaustively using the script functionality..
> >
> > Does the endianness actually cause a problem? As long as all accesses
> > (CPU -> LSI_RAM and LSI -> LSI_RAM) occur through the address_space API,
> > are we safe?

I'm not sure -- I looked at the code and I wasn't confident
on exactly what the combination of the DEVICE_LITTLE_ENDIAN
MemoryRegion and the use of stn_le_p/ldn_le_p would be.
I think it ought to be possible to use a RAM MR, but we'd
need to check the handling on both BE and LE hosts. Migration
compatibility is the other thing we would need to check, to
avoid accidentally switching from "script_ram[] contents are
in order X" to "they are in order Y"...

> Or maybe a rom_device with memory_region_rom_device_set_romd(romd_mode =
> false) is better here?

I don't think that helps -- if we can let the guest do direct
reads from the region then it's equally OK to let it do
direct writes, so we could just use a RAM MR.

-- PMM

[v6,1/4] memory: prevent dma-reentracy issues

Commit Message

Comments

Patch