
KVM: x86: Use MMCONFIG for all PCI config space accesses

Message ID 20200730193510.578309-1-jusual@redhat.com
State New, archived
Series KVM: x86: Use MMCONFIG for all PCI config space accesses

Commit Message

Julia Suvorova July 30, 2020, 7:35 p.m. UTC
Using MMCONFIG instead of I/O ports cuts the number of config space
accesses in half, which is faster on KVM and opens the door for
additional optimizations such as Vitaly's "[PATCH 0/3] KVM: x86: KVM
MEM_PCI_HOLE memory":
https://lore.kernel.org/kvm/20200728143741.2718593-1-vkuznets@redhat.com

However, this change will not bring significant performance improvement
unless it is running on x86 within a hypervisor. Moreover, allowing
MMCONFIG access for addresses < 256 can be dangerous for some devices:
see commit a0ca99096094 ("PCI x86: always use conf1 to access config
space below 256 bytes"). That is why a special feature flag is needed.

Introduce KVM_FEATURE_PCI_GO_MMCONFIG, which can be enabled when the
configuration is known to be safe (e.g. in QEMU).

Signed-off-by: Julia Suvorova <jusual@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     |  4 ++++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c                | 14 ++++++++++++++
 3 files changed, 19 insertions(+)

Comments

Andy Shevchenko July 30, 2020, 7:50 p.m. UTC | #1
On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@redhat.com> wrote:
>
> Using MMCONFIG instead of I/O ports cuts the number of config space
> accesses in half, which is faster on KVM and opens the door for
> additional optimizations such as Vitaly's "[PATCH 0/3] KVM: x86: KVM
> MEM_PCI_HOLE memory":

> https://lore.kernel.org/kvm/20200728143741.2718593-1-vkuznets@redhat.com

You may use a Link: tag for this.

> However, this change will not bring significant performance improvement
> unless it is running on x86 within a hypervisor. Moreover, allowing
> MMCONFIG access for addresses < 256 can be dangerous for some devices:
> see commit a0ca99096094 ("PCI x86: always use conf1 to access config
> space below 256 bytes"). That is why a special feature flag is needed.
>
> Introduce KVM_FEATURE_PCI_GO_MMCONFIG, which can be enabled when the
> configuration is known to be safe (e.g. in QEMU).

...

> +static int __init kvm_pci_arch_init(void)
> +{
> +       if (raw_pci_ext_ops &&
> +           kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)) {

Better to use the traditional pattern, i.e.
  if (not_supported)
    return bail_out;

  ...do useful things...
  return 0;

> +               pr_info("PCI: Using MMCONFIG for base access\n");
> +               raw_pci_ops = raw_pci_ext_ops;
> +               return 0;
> +       }

> +       return 1;

Hmm... I don't remember what positive codes mean there. Perhaps you
need to return an error code instead?

> +}

Vitaly Kuznetsov July 31, 2020, 9:22 a.m. UTC | #2
Andy Shevchenko <andy.shevchenko@gmail.com> writes:

> On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@redhat.com> wrote:
>
> ...
>
>> +static int __init kvm_pci_arch_init(void)
>> +{
>> +       if (raw_pci_ext_ops &&
>> +           kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)) {
>
> Better to use the traditional pattern, i.e.
>   if (not_supported)
>     return bail_out;
>
>   ...do useful things...
>   return 0;
>
>> +               pr_info("PCI: Using MMCONFIG for base access\n");
>> +               raw_pci_ops = raw_pci_ext_ops;
>> +               return 0;
>> +       }
>
>> +       return 1;
>
> Hmm... I don't remember what positive codes mean there. Perhaps you
> need to return an error code instead?

If I'm reading the code correctly,

pci_arch_init() has the following:

        if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
                return 0;


so returning '1' here means 'continue' and this seems to be
correct. (E.g. Hyper-V's hv_pci_init() does the same). What I'm not sure
about is 'return 0' above as this will result in skipping the rest of
pci_arch_init(). Was this desired or should we return '1' in both cases?

Andy Shevchenko July 31, 2020, 9:41 a.m. UTC | #3
On Fri, Jul 31, 2020 at 12:22 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
> > On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@redhat.com> wrote:

...

> >> +static int __init kvm_pci_arch_init(void)
> >> +{
> >> +       if (raw_pci_ext_ops &&

> >> +               return 0;
> >> +       }
> >
> >> +       return 1;
> >
> > Hmm... I don't remember what positive codes mean there. Perhaps you
> > need to return an error code instead?
>
> If I'm reading the code correctly,
>
> pci_arch_init() has the following:
>
>         if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
>                 return 0;
>
>
> so returning '1' here means 'continue' and this seems to be
> correct. (E.g. Hyper-V's hv_pci_init() does the same). What I'm not sure
> about is 'return 0' above as this will result in skipping the rest of
> pci_arch_init(). Was this desired or should we return '1' in both cases?

I think it depends on what you want. In complex cases we recognize three
possibilities:

-ERRNO: function failed, we have to stop and bail out with the error from the callee
0: function OK, stop and return 0
1: function OK, continue with the rest in the caller

Do we need this, or is the current scheme enough for all (existing) callees?

Julia Suvorova July 31, 2020, 6:23 p.m. UTC | #4
On Fri, Jul 31, 2020 at 11:22 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>
> > On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@redhat.com> wrote:
> > ...
>
> If I'm reading the code correctly,
>
> pci_arch_init() has the following:
>
>         if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
>                 return 0;
>
>
> so returning '1' here means 'continue' and this seems to be
> correct. (E.g. Hyper-V's hv_pci_init() does the same). What I'm not sure
> about is 'return 0' above as this will result in skipping the rest of
> pci_arch_init(). Was this desired or should we return '1' in both cases?

This is intentional, because pci_direct_init(), called later in
pci_arch_init(), would overwrite raw_pci_ops. And since QEMU has no
entries in pciprobe_dmi_table, it is safe to skip the rest of the function.

Best regards, Julia Suvorova.

Patch

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..711f2074877b 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@  KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
                                               async pf acknowledgment msr
                                               0x4b564d07.
 
+KVM_FEATURE_PCI_GO_MMCONFIG       15          guest checks this feature bit
+                                              before using MMCONFIG for all
+                                              PCI config accesses
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                               per-cpu warps are expeced in
                                               kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..5793f372cae0 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@ 
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_PCI_GO_MMCONFIG	15
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index df63786e7bfa..1ec73e6f25ce 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@ 
 #include <asm/hypervisor.h>
 #include <asm/tlb.h>
 #include <asm/cpuidle_haltpoll.h>
+#include <asm/pci_x86.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
 
@@ -715,6 +716,18 @@  static uint32_t __init kvm_detect(void)
 	return kvm_cpuid_base();
 }
 
+static int __init kvm_pci_arch_init(void)
+{
+	if (raw_pci_ext_ops &&
+	    kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)) {
+		pr_info("PCI: Using MMCONFIG for base access\n");
+		raw_pci_ops = raw_pci_ext_ops;
+		return 0;
+	}
+
+	return 1;
+}
+
 static void __init kvm_apic_init(void)
 {
 #if defined(CONFIG_SMP)
@@ -726,6 +739,7 @@  static void __init kvm_apic_init(void)
 static void __init kvm_init_platform(void)
 {
 	kvmclock_init();
+	x86_init.pci.arch_init = kvm_pci_arch_init;
 	x86_platform.apic_post_init = kvm_apic_init;
 }