[v2] hw/scsi/lsi53c895a: add timer to scripts processing

Message ID 20240229204407.1699260-1-svens@stackframe.org (mailing list archive)
State New, archived

Commit Message

Sven Schnelle Feb. 29, 2024, 8:44 p.m. UTC
HP-UX 10.20 seems to make the lsi53c895a spin on a memory location
under certain circumstances. As the SCSI controller and CPU are not
running at the same time, this loop will never finish. After some
time, the check loop interrupts with an unexpected device disconnect
(UDC). This works, but is slow because the kernel resets the SCSI
controller. Instead of signaling UDC, start a timer and exit the loop.
Until the timer fires, the CPU can process instructions which might
change the memory location.

The instruction limit is also reduced because scripts running on
the SCSI processor are usually very short. This keeps the time until
the loop is exited short.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Sven Schnelle <svens@stackframe.org>
---
Changes in v2:
- update comment in lsi_execute_script()
- reset waiting state and del timer in lsi_execute_script() to
  handle the case where script processing is triggered via
  register write, and not from the pending timer
- delete timer in lsi_scsi_exit()

 hw/scsi/lsi53c895a.c | 43 +++++++++++++++++++++++++++++++++----------
 hw/scsi/trace-events |  2 ++
 2 files changed, 35 insertions(+), 10 deletions(-)
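
The approach described in the commit message can be sketched as a small
standalone simulation. This is only an illustration under stated assumptions,
not the QEMU code: struct sim, timer_arm(), timer_cancel(), execute_script(),
simulate() and restart_cancels_timer() are hypothetical names, virtual time is
a plain counter, and the "script" is reduced to a loop polling one flag. Only
the budget of 100 instructions and the 500 us delay are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_INSN       100   /* mirrors LSI_MAX_INSN after the patch */
#define TIMER_DELAY_US 500   /* mirrors the delay in lsi_scripts_timer_start() */

enum wait_state { NOWAIT, WAIT_SCRIPTS };

/* Hypothetical stand-in for the device state; not the real LSIState. */
struct sim {
    enum wait_state waiting;
    bool timer_armed;
    int64_t deadline_us;   /* virtual time at which the timer fires */
    int64_t now_us;        /* current virtual time */
    int flag;              /* memory location the script spins on */
    int yields;            /* how often the script gave up its budget */
};

static void timer_arm(struct sim *s)
{
    s->deadline_us = s->now_us + TIMER_DELAY_US;
    s->timer_armed = true;
}

static void timer_cancel(struct sim *s)
{
    s->timer_armed = false;
}

/* Returns true if the script finished, false if it yielded to the guest. */
static bool execute_script(struct sim *s)
{
    int insn = 0;

    /* Entered by some other path while a timer is pending: drop the timer. */
    if (s->waiting == WAIT_SCRIPTS) {
        timer_cancel(s);
        s->waiting = NOWAIT;
    }

    while (s->flag == 0) {          /* the script polls the memory location */
        if (++insn > MAX_INSN) {    /* budget used up: defer instead of UDC */
            s->waiting = WAIT_SCRIPTS;
            timer_arm(s);
            s->yields++;
            return false;
        }
    }
    return true;
}

/* The guest CPU writes the flag at flag_set_at_us; returns the yield count. */
int simulate(int64_t flag_set_at_us)
{
    struct sim s = {0};
    bool done = execute_script(&s);

    while (!done) {
        assert(s.timer_armed && s.waiting == WAIT_SCRIPTS);
        s.now_us = s.deadline_us;   /* guest CPU runs until the deadline */
        if (s.now_us >= flag_set_at_us) {
            s.flag = 1;             /* guest updates the polled location */
        }
        s.timer_armed = false;      /* timer fires: same steps as the callback */
        s.waiting = NOWAIT;
        done = execute_script(&s);
    }
    return s.yields;
}

/* Guest restarts the script via a "register write" before the timer fires. */
bool restart_cancels_timer(void)
{
    struct sim s = {0};

    execute_script(&s);             /* flag is 0: yields and arms the timer */
    s.flag = 1;
    bool done = execute_script(&s); /* direct restart, timer still pending */
    return done && !s.timer_armed && s.waiting == NOWAIT;
}
```

Calling simulate(1200) models a guest that updates the location 1200 us after
the script starts; the script yields three times (at 0, 500 and 1000 us) and
completes on the timer firing at 1500 us. restart_cancels_timer() models the
v2 change: a restart that does not come from the timer clears the waiting
state and leaves no stale timer behind.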

Comments

Peter Maydell March 4, 2024, 3:08 p.m. UTC | #1
On Thu, 29 Feb 2024 at 20:44, Sven Schnelle <svens@stackframe.org> wrote:
>
> HP-UX 10.20 seems to make the lsi53c895a spinning on a memory location
> under certain circumstances. As the SCSI controller and CPU are not
> running at the same time this loop will never finish. After some
> time, the check loop interrupts with a unexpected device disconnect.
> This works, but is slow because the kernel resets the scsi controller.
> Instead of signaling UDC, start a timer and exit the loop. Until the
> timer fires, the CPU can process instructions which might changes the
> memory location.
>
> The limit of instructions is also reduced because scripts running on
> the SCSI processor are usually very short. This keeps the time until
> the loop is exit short.

"exited"

>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Sven Schnelle <svens@stackframe.org>
> ---
> Changes in v2:
> - update comment in lsi_execute_script()
> - reset waiting state and del timer in lsi_execute_script() to
>   handle the case where script processing is triggered via
>   register write, and not from the pending timer
> - delete timer in lsi_scsi_exit()

Other than the s/host/guest/ comment fix,
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

I don't suppose anybody has a setup with the Windows drivers
to test this on? (commit ee4d919f30f13 suggests that at least
Windows XP and 2003 had this problem.)

thanks
-- PMM
Sven Schnelle March 4, 2024, 3:58 p.m. UTC | #2
Peter Maydell <peter.maydell@linaro.org> writes:

> On Thu, 29 Feb 2024 at 20:44, Sven Schnelle <svens@stackframe.org> wrote:
>>
>> HP-UX 10.20 seems to make the lsi53c895a spinning on a memory location
>> under certain circumstances. As the SCSI controller and CPU are not
>> running at the same time this loop will never finish. After some
>> time, the check loop interrupts with a unexpected device disconnect.
>> This works, but is slow because the kernel resets the scsi controller.
>> Instead of signaling UDC, start a timer and exit the loop. Until the
>> timer fires, the CPU can process instructions which might changes the
>> memory location.
>>
>> The limit of instructions is also reduced because scripts running on
>> the SCSI processor are usually very short. This keeps the time until
>> the loop is exit short.
>
> "exited"
>
>>
>> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Sven Schnelle <svens@stackframe.org>
>> ---
>> Changes in v2:
>> - update comment in lsi_execute_script()
>> - reset waiting state and del timer in lsi_execute_script() to
>>   handle the case where script processing is triggered via
>>   register write, and not from the pending timer
>> - delete timer in lsi_scsi_exit()
>
> Other than the s/host/guest/ comment fix,
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>
> I don't suppose anybody has a setup with the Windows drivers
> to test this on? (commit ee4d919f30f13 suggests that at least
> Windows XP and 2003 had this problem.)

I have a Windows XP VM with lsi53c895a. I just fired it up and added
a qemu_log() in the timer path, and saw it trigger once while copying
a few files. It looks like Windows XP (or rather its SCSI driver) also
works with this patch.
Peter Maydell March 4, 2024, 4:07 p.m. UTC | #3
On Mon, 4 Mar 2024 at 15:58, Sven Schnelle <svens@stackframe.org> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> writes:
>
> > On Thu, 29 Feb 2024 at 20:44, Sven Schnelle <svens@stackframe.org> wrote:
> >>
> >> HP-UX 10.20 seems to make the lsi53c895a spinning on a memory location
> >> under certain circumstances. As the SCSI controller and CPU are not
> >> running at the same time this loop will never finish. After some
> >> time, the check loop interrupts with a unexpected device disconnect.
> >> This works, but is slow because the kernel resets the scsi controller.
> >> Instead of signaling UDC, start a timer and exit the loop. Until the
> >> timer fires, the CPU can process instructions which might changes the
> >> memory location.
> >>
> >> The limit of instructions is also reduced because scripts running on
> >> the SCSI processor are usually very short. This keeps the time until
> >> the loop is exit short.
> >
> > "exited"
> >
> >>
> >> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> >> Signed-off-by: Sven Schnelle <svens@stackframe.org>
> >> ---
> >> Changes in v2:
> >> - update comment in lsi_execute_script()
> >> - reset waiting state and del timer in lsi_execute_script() to
> >>   handle the case where script processing is triggered via
> >>   register write, and not from the pending timer
> >> - delete timer in lsi_scsi_exit()
> >
> > Other than the s/host/guest/ comment fix,
> > Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> >
> > I don't suppose anybody has a setup with the Windows drivers
> > to test this on? (commit ee4d919f30f13 suggests that at least
> > Windows XP and 2003 had this problem.)
>
> I have a Windows XP VM with lsi53c895a. I just fired it up and added
> a qemu_log() in the timer path, and seen it trigger once while copying
> a few files. It looks like Windows XP (or better the SCSI driver) also
> works with this patch.

Excellent; thanks for testing.

-- PMM
Patch

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index d607a5f9fb..4ff9470381 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -188,7 +188,7 @@  static const char *names[] = {
 #define LSI_TAG_VALID     (1 << 16)
 
 /* Maximum instructions to process. */
-#define LSI_MAX_INSN    10000
+#define LSI_MAX_INSN    100
 
 typedef struct lsi_request {
     SCSIRequest *req;
@@ -205,6 +205,7 @@  enum {
     LSI_WAIT_RESELECT, /* Wait Reselect instruction has been issued */
     LSI_DMA_SCRIPTS, /* processing DMA from lsi_execute_script */
     LSI_DMA_IN_PROGRESS, /* DMA operation is in progress */
+    LSI_WAIT_SCRIPTS, /* SCRIPTS stopped because of instruction count limit */
 };
 
 enum {
@@ -224,6 +225,7 @@  struct LSIState {
     MemoryRegion ram_io;
     MemoryRegion io_io;
     AddressSpace pci_io_as;
+    QEMUTimer *scripts_timer;
 
     int carry; /* ??? Should this be an a visible register somewhere?  */
     int status;
@@ -415,6 +417,7 @@  static void lsi_soft_reset(LSIState *s)
     s->sbr = 0;
     assert(QTAILQ_EMPTY(&s->queue));
     assert(!s->current);
+    timer_del(s->scripts_timer);
 }
 
 static int lsi_dma_40bit(LSIState *s)
@@ -1127,6 +1130,12 @@  static void lsi_wait_reselect(LSIState *s)
     }
 }
 
+static void lsi_scripts_timer_start(LSIState *s)
+{
+    trace_lsi_scripts_timer_start();
+    timer_mod(s->scripts_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 500);
+}
+
 static void lsi_execute_script(LSIState *s)
 {
     PCIDevice *pci_dev = PCI_DEVICE(s);
@@ -1136,6 +1145,11 @@  static void lsi_execute_script(LSIState *s)
     int insn_processed = 0;
     static int reentrancy_level;
 
+    if (s->waiting == LSI_WAIT_SCRIPTS) {
+        timer_del(s->scripts_timer);
+        s->waiting = LSI_NOWAIT;
+    }
+
     reentrancy_level++;
 
     s->istat1 |= LSI_ISTAT1_SRUN;
@@ -1143,8 +1157,8 @@  again:
     /*
      * Some windows drivers make the device spin waiting for a memory location
      * to change. If we have executed more than LSI_MAX_INSN instructions then
-     * assume this is the case and force an unexpected device disconnect. This
-     * is apparently sufficient to beat the drivers into submission.
+     * assume this is the case and start a timer. Until the timer fires, the
+     * host CPU has a chance to run and change the memory location.
      *
      * Another issue (CVE-2023-0330) can occur if the script is programmed to
      * trigger itself again and again. Avoid this problem by stopping after
@@ -1152,13 +1166,8 @@  again:
      * which should be enough for all valid use cases).
      */
     if (++insn_processed > LSI_MAX_INSN || reentrancy_level > 8) {
-        if (!(s->sien0 & LSI_SIST0_UDC)) {
-            qemu_log_mask(LOG_GUEST_ERROR,
-                          "lsi_scsi: inf. loop with UDC masked");
-        }
-        lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
-        lsi_disconnect(s);
-        trace_lsi_execute_script_stop();
+        s->waiting = LSI_WAIT_SCRIPTS;
+        lsi_scripts_timer_start(s);
         reentrancy_level--;
         return;
     }
@@ -2197,6 +2206,9 @@  static int lsi_post_load(void *opaque, int version_id)
         return -EINVAL;
     }
 
+    if (s->waiting == LSI_WAIT_SCRIPTS) {
+        lsi_scripts_timer_start(s);
+    }
     return 0;
 }
 
@@ -2294,6 +2306,15 @@  static const struct SCSIBusInfo lsi_scsi_info = {
     .cancel = lsi_request_cancelled
 };
 
+static void scripts_timer_cb(void *opaque)
+{
+    LSIState *s = opaque;
+
+    trace_lsi_scripts_timer_triggered();
+    s->waiting = LSI_NOWAIT;
+    lsi_execute_script(s);
+}
+
 static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
 {
     LSIState *s = LSI53C895A(dev);
@@ -2313,6 +2334,7 @@  static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
                           "lsi-ram", 0x2000);
     memory_region_init_io(&s->io_io, OBJECT(s), &lsi_io_ops, s,
                           "lsi-io", 256);
+    s->scripts_timer = timer_new_us(QEMU_CLOCK_VIRTUAL, scripts_timer_cb, s);
 
     /*
      * Since we use the address-space API to interact with ram_io, disable the
@@ -2337,6 +2359,7 @@  static void lsi_scsi_exit(PCIDevice *dev)
     LSIState *s = LSI53C895A(dev);
 
     address_space_destroy(&s->pci_io_as);
+    timer_del(s->scripts_timer);
 }
 
 static void lsi_class_init(ObjectClass *klass, void *data)
diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events
index d72f741ed8..f0f2a98c2e 100644
--- a/hw/scsi/trace-events
+++ b/hw/scsi/trace-events
@@ -302,6 +302,8 @@  lsi_execute_script_stop(void) "SCRIPTS execution stopped"
 lsi_awoken(void) "Woken by SIGP"
 lsi_reg_read(const char *name, int offset, uint8_t ret) "Read reg %s 0x%x = 0x%02x"
 lsi_reg_write(const char *name, int offset, uint8_t val) "Write reg %s 0x%x = 0x%02x"
+lsi_scripts_timer_triggered(void) "SCRIPTS timer triggered"
+lsi_scripts_timer_start(void) "SCRIPTS timer started"
 
 # virtio-scsi.c
 virtio_scsi_cmd_req(int lun, uint32_t tag, uint8_t cmd) "virtio_scsi_cmd_req lun=%u tag=0x%x cmd=0x%x"