diff mbox series

[v4,1/6] x86/msi: passthrough all MSI-X vector ctrl writes to device model

Message ID 6c030c703fcd37d7dfae3c488c21b5bca540e352.1700790421.git-series.marmarek@invisiblethingslab.com (mailing list archive)
State New, archived
Headers show
Series MSI-X support with qemu in stubdomain, and other related changes | expand

Commit Message

Marek Marczykowski-Górecki Nov. 24, 2023, 1:47 a.m. UTC
QEMU needs to know whether clearing maskbit of a vector is really
clearing, or was already cleared before. Currently Xen sends only
clearing that bit to the device model, but not setting it, so QEMU
cannot detect it. Because of that, QEMU is working this around by
checking via /dev/mem, but that isn't the proper approach.

Give all necessary information to QEMU by passing all ctrl writes,
including masking a vector. Advertise the new behavior via
XENVER_get_features, so QEMU can know it doesn't need to access /dev/mem
anymore.

While this commit doesn't move the whole maskbit handling to QEMU (as
discussed on xen-devel as one of the possibilities), it is a necessary
first step anyway. Including telling QEMU it will get all the required
information to do so. The actual implementation would need to include:
 - a hypercall for QEMU to control just maskbit (without (re)binding the
   interrupt again
 - a methor for QEMU to tell Xen it will actually do the work
Those are not part of this series.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
I did not added any control to enable/disable this new behavior (as
Roger have suggested for possible non-QEMU ioreqs). I don't see how the
new behavior could be problematic for some existing ioreq server (they
already received writes to those addresses, just not all of them),
but if that's really necessary, I can probably add a command line option
to restore previous behavior system-wide.

Changes in v4:
- ignore unaligned writes with X86EMUL_OKAY
- restructure the code to forward all writes in _msixtbl_write() instead
  of manipulating return value of msixtbl_write() - this makes
  WRITE_LEN4_COMPLETION special case unnecessary
- advertise the changed behavior via XENVER_get_features instead of DMOP
v3:
 - advertise changed behavior in XEN_DMOP_get_ioreq_server_info - make
   "flags" parameter IN/OUT
 - move len check back to msixtbl_write() - will be needed there anyway
   in a later patch
v2:
 - passthrough quad writes to emulator too (Jan)
 - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
   #define for this magic value
---
 xen/arch/x86/hvm/vmsi.c       | 19 ++++++++++++++-----
 xen/common/kernel.c           |  1 +
 xen/include/public/features.h |  8 ++++++++
 3 files changed, 23 insertions(+), 5 deletions(-)

Comments

Jan Beulich Nov. 27, 2023, 4:18 p.m. UTC | #1
On 24.11.2023 02:47, Marek Marczykowski-Górecki wrote:
> @@ -357,7 +356,17 @@ static int cf_check _msixtbl_write(
>      const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
>      uint64_t val)
>  {
> -    return msixtbl_write(current, address, len, val);
> +    /* ignore invalid length or unaligned writes */

Nit: Style (capital first letter).

> +    if ( len != 4 && len != 8 || !IS_ALIGNED(address, len) )

Please parenthesize the && as an operand of ||.

> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -642,6 +642,7 @@ long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  fi.submap |= (1U << XENFEAT_direct_mapped);
>              else
>                  fi.submap |= (1U << XENFEAT_not_direct_mapped);
> +            fi.submap |= (1U << XENFEAT_dm_msix_all_writes);

This probably wants to move up into the x86-only section?

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 128f23636279..2436154c40b6 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -283,8 +283,8 @@  static int msixtbl_write(struct vcpu *v, unsigned long address,
     unsigned long flags;
     struct irq_desc *desc;
 
-    if ( (len != 4 && len != 8) || (address & (len - 1)) )
-        return r;
+    if ( !IS_ALIGNED(address, len) )
+        return X86EMUL_OKAY;
 
     rcu_read_lock(&msixtbl_rcu_lock);
 
@@ -345,8 +345,7 @@  static int msixtbl_write(struct vcpu *v, unsigned long address,
 
 unlock:
     spin_unlock_irqrestore(&desc->lock, flags);
-    if ( len == 4 )
-        r = X86EMUL_OKAY;
+    r = X86EMUL_OKAY;
 
 out:
     rcu_read_unlock(&msixtbl_rcu_lock);
@@ -357,7 +356,17 @@  static int cf_check _msixtbl_write(
     const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
     uint64_t val)
 {
-    return msixtbl_write(current, address, len, val);
+    /* ignore invalid length or unaligned writes */
+    if ( len != 4 && len != 8 || !IS_ALIGNED(address, len) )
+        return X86EMUL_OKAY;
+
+    /*
+     * This function returns X86EMUL_UNHANDLEABLE even if write is properly
+     * handled, to propagate it to the device model (so it can keep its
+     * internal state in sync).
+     */
+    msixtbl_write(current, address, len, val);
+    return X86EMUL_UNHANDLEABLE;
 }
 
 static bool cf_check msixtbl_range(
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 08dbaa2a054c..229784c6ce52 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -642,6 +642,7 @@  long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
                 fi.submap |= (1U << XENFEAT_direct_mapped);
             else
                 fi.submap |= (1U << XENFEAT_not_direct_mapped);
+            fi.submap |= (1U << XENFEAT_dm_msix_all_writes);
             break;
         default:
             return -EINVAL;
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 36936f6a4ee0..634534827d43 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -120,6 +120,14 @@ 
 #define XENFEAT_runstate_phys_area	  18
 #define XENFEAT_vcpu_time_phys_area	  19
 
+/*
+ * If set, Xen will passthrough all MSI-X vector ctrl writes to device model,
+ * not only those unmasking an entry. This allows device model to properly keep
+ * track of the MSI-X table without having to read it from the device behind
+ * Xen's backs. This information is relevant only for device models.
+ */
+#define XENFEAT_dm_msix_all_writes        20
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */