Message ID | 20200730193510.578309-1-jusual@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86: Use MMCONFIG for all PCI config space accesses |
On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@redhat.com> wrote:
>
> Using MMCONFIG instead of I/O ports cuts the number of config space
> accesses in half, which is faster on KVM and opens the door for
> additional optimizations such as Vitaly's "[PATCH 0/3] KVM: x86: KVM
> MEM_PCI_HOLE memory":

> https://lore.kernel.org/kvm/20200728143741.2718593-1-vkuznets@redhat.com

You may use the Link: tag for this.

> However, this change will not bring significant performance improvement
> unless it is running on x86 within a hypervisor. Moreover, allowing
> MMCONFIG access for addresses < 256 can be dangerous for some devices:
> see commit a0ca99096094 ("PCI x86: always use conf1 to access config
> space below 256 bytes"). That is why a special feature flag is needed.
>
> Introduce KVM_FEATURE_PCI_GO_MMCONFIG, which can be enabled when the
> configuration is known to be safe (e.g. in QEMU).

...

> +static int __init kvm_pci_arch_init(void)
> +{
> +	if (raw_pci_ext_ops &&
> +	    kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)) {

Better to use the traditional pattern, i.e.

	if (not_supported)
		return bail_out;

	...do useful things...
	return 0;

> +		pr_info("PCI: Using MMCONFIG for base access\n");
> +		raw_pci_ops = raw_pci_ext_ops;
> +		return 0;
> +	}

> +	return 1;

Hmm... I don't remember what positive codes mean there. Perhaps you
need to return an error code instead?

> +}
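The early-return shape suggested in this reply could look roughly like the sketch below. It is a userspace model, not kernel code: `struct pci_raw_ops` is reduced to a name, `has_go_mmconfig` stands in for `kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)`, and `sketch_setup()` is a hypothetical helper for exercising both paths. The return values are kept exactly as posted (1 when the feature is unavailable, 0 after switching to MMCONFIG).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel symbols in the patch. */
struct pci_raw_ops { const char *name; };

static const struct pci_raw_ops conf1_ops = { "conf1" };
static const struct pci_raw_ops mmcfg_ops = { "mmconfig" };

static const struct pci_raw_ops *raw_pci_ops = &conf1_ops;
static const struct pci_raw_ops *raw_pci_ext_ops;
static bool has_go_mmconfig;	/* models the KVM feature bit check */

/* Hypothetical helper: choose which preconditions hold before a call. */
static void sketch_setup(bool ext_available, bool feature)
{
	raw_pci_ops = &conf1_ops;
	raw_pci_ext_ops = ext_available ? &mmcfg_ops : NULL;
	has_go_mmconfig = feature;
}

/* Same behaviour as the posted function, restructured to bail out first. */
static int kvm_pci_arch_init_sketch(void)
{
	assert(raw_pci_ops != NULL);

	if (!raw_pci_ext_ops || !has_go_mmconfig)
		return 1;	/* not supported: the caller continues */

	printf("PCI: Using MMCONFIG for base access\n");
	raw_pci_ops = raw_pci_ext_ops;
	return 0;		/* handled: the caller stops here */
}
```

The unsupported case leaves first, so the useful work sits at the top indentation level and the two return values are easy to compare side by side.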
Andy Shevchenko <andy.shevchenko@gmail.com> writes:

> On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@redhat.com> wrote:
>>
>> Using MMCONFIG instead of I/O ports cuts the number of config space
>> accesses in half, which is faster on KVM and opens the door for
>> additional optimizations such as Vitaly's "[PATCH 0/3] KVM: x86: KVM
>> MEM_PCI_HOLE memory":
>
>> https://lore.kernel.org/kvm/20200728143741.2718593-1-vkuznets@redhat.com
>
> You may use the Link: tag for this.
>
>> However, this change will not bring significant performance improvement
>> unless it is running on x86 within a hypervisor. Moreover, allowing
>> MMCONFIG access for addresses < 256 can be dangerous for some devices:
>> see commit a0ca99096094 ("PCI x86: always use conf1 to access config
>> space below 256 bytes"). That is why a special feature flag is needed.
>>
>> Introduce KVM_FEATURE_PCI_GO_MMCONFIG, which can be enabled when the
>> configuration is known to be safe (e.g. in QEMU).
>
> ...
>
>> +static int __init kvm_pci_arch_init(void)
>> +{
>> +	if (raw_pci_ext_ops &&
>> +	    kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)) {
>
> Better to use the traditional pattern, i.e.
>
>	if (not_supported)
>		return bail_out;
>
>	...do useful things...
>	return 0;
>
>> +		pr_info("PCI: Using MMCONFIG for base access\n");
>> +		raw_pci_ops = raw_pci_ext_ops;
>> +		return 0;
>> +	}
>
>> +	return 1;
>
> Hmm... I don't remember what positive codes mean there. Perhaps you
> need to return an error code instead?

If I'm reading the code correctly, pci_arch_init() has the following:

	if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
		return 0;

so returning '1' here means 'continue' and this seems to be
correct. (E.g. Hyper-V's hv_pci_init() does the same.) What I'm not sure
about is 'return 0' above, as this will result in skipping the rest of
pci_arch_init(). Was this desired or should we return '1' in both cases?
On Fri, Jul 31, 2020 at 12:22 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
> > On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@redhat.com> wrote:

...

> >> +static int __init kvm_pci_arch_init(void)
> >> +{
> >> +	if (raw_pci_ext_ops &&
> >> +		return 0;
> >> +	}
> >
> >> +	return 1;
> >
> > Hmm... I don't remember what positive codes mean there. Perhaps you
> > need to return an error code instead?
>
> If I'm reading the code correctly,
>
> pci_arch_init() has the following:
>
>	if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
>		return 0;
>
>
> so returning '1' here means 'continue' and this seems to be
> correct. (E.g. Hyper-V's hv_pci_init() does the same). What I'm not sure
> about is 'return 0' above as this will result in skipping the rest of
> pci_arch_init(). Was this desired or should we return '1' in both cases?

I think it depends on what you want. In complex cases we recognize three
possibilities:

  -ERRNO: function failed, we have to stop and bail out with the error
          from the callee
  0:      function OK, stop and return 0
  1:      function OK, continue the rest in callee

Do we need this here, or is the current behaviour enough for all
existing callees?
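The three-way convention listed in this reply can be sketched as a caller that distinguishes negative, zero, and positive results from an optional hook. This is only an illustration of the convention as described, not what pci_arch_init() does today (the quoted check only tests for zero). All names below (`arch_init_hook_t`, `generic_init_sketch`, the three sample hooks, `rest_ran`) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook type: <0 = failed, 0 = done, 1 = continue. */
typedef int (*arch_init_hook_t)(void);

static int rest_ran;	/* counts how often the generic part ran */

static int generic_init_sketch(arch_init_hook_t hook)
{
	if (hook) {
		int ret = hook();

		if (ret < 0)
			return ret;	/* hook failed: propagate the error */
		if (ret == 0)
			return 0;	/* hook handled everything: stop */
		/* positive: fall through and do the generic work */
	}

	rest_ran++;		/* stands in for the rest of the init path */
	return 0;
}

/* Sample hooks, one per outcome. */
static int hook_fail(void)     { return -ENODEV; }
static int hook_done(void)     { return 0; }
static int hook_continue(void) { return 1; }
```

With this shape, `generic_init_sketch(hook_fail)` hands `-ENODEV` back to its own caller, while the other two hooks differ only in whether the generic part runs afterwards.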
On Fri, Jul 31, 2020 at 11:22 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>
> > On Thu, Jul 30, 2020 at 10:37 PM Julia Suvorova <jusual@redhat.com> wrote:
> >>
> >> Using MMCONFIG instead of I/O ports cuts the number of config space
> >> accesses in half, which is faster on KVM and opens the door for
> >> additional optimizations such as Vitaly's "[PATCH 0/3] KVM: x86: KVM
> >> MEM_PCI_HOLE memory":
> >
> >> https://lore.kernel.org/kvm/20200728143741.2718593-1-vkuznets@redhat.com
> >
> > You may use the Link: tag for this.
> >
> >> However, this change will not bring significant performance improvement
> >> unless it is running on x86 within a hypervisor. Moreover, allowing
> >> MMCONFIG access for addresses < 256 can be dangerous for some devices:
> >> see commit a0ca99096094 ("PCI x86: always use conf1 to access config
> >> space below 256 bytes"). That is why a special feature flag is needed.
> >>
> >> Introduce KVM_FEATURE_PCI_GO_MMCONFIG, which can be enabled when the
> >> configuration is known to be safe (e.g. in QEMU).
> >
> > ...
> >
> >> +static int __init kvm_pci_arch_init(void)
> >> +{
> >> +	if (raw_pci_ext_ops &&
> >> +	    kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)) {
> >
> > Better to use the traditional pattern, i.e.
> >
> >	if (not_supported)
> >		return bail_out;
> >
> >	...do useful things...
> >	return 0;
> >
> >> +		pr_info("PCI: Using MMCONFIG for base access\n");
> >> +		raw_pci_ops = raw_pci_ext_ops;
> >> +		return 0;
> >> +	}
> >
> >> +	return 1;
> >
> > Hmm... I don't remember what positive codes mean there. Perhaps you
> > need to return an error code instead?
>
> If I'm reading the code correctly,
>
> pci_arch_init() has the following:
>
>	if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
>		return 0;
>
>
> so returning '1' here means 'continue' and this seems to be
> correct. (E.g. Hyper-V's hv_pci_init() does the same). What I'm not sure
> about is 'return 0' above as this will result in skipping the rest of
> pci_arch_init(). Was this desired or should we return '1' in both cases?

This is intentional because pci_direct_init() is about to overwrite
raw_pci_ops. And since QEMU doesn't have anything in
pciprobe_dmi_table, it is safe to skip it.

Best regards, Julia Suvorova.
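The reasoning in this reply can be modelled in a few lines. The sketch below is a deliberately simplified userspace model: `direct_init_model()` stands in for the later step that, according to the reply, overwrites `raw_pci_ops`, and `pci_arch_init_model()` reproduces only the quoted hook check. Every name ending in `_model`, and both hook variants, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are the real kernel objects. */
struct pci_raw_ops { const char *name; };

static const struct pci_raw_ops conf1_ops = { "conf1" };
static const struct pci_raw_ops mmcfg_ops = { "mmconfig" };
static const struct pci_raw_ops *raw_pci_ops;

/* Models the later step that, per the reply above, overwrites raw_pci_ops. */
static void direct_init_model(void)
{
	raw_pci_ops = &conf1_ops;
}

/* Models the quoted check: a hook returning 0 ends the generic init early. */
static int pci_arch_init_model(int (*hook)(void))
{
	if (hook && !hook())
		return 0;

	direct_init_model();
	return 0;
}

/* Variant from the patch: select MMCONFIG and stop the generic init. */
static int hook_select_and_stop(void)
{
	raw_pci_ops = &mmcfg_ops;
	return 0;
}

/* Variant raised in review: select MMCONFIG but let the generic init go on. */
static int hook_select_and_continue(void)
{
	raw_pci_ops = &mmcfg_ops;
	return 1;
}
```

In this model, only `hook_select_and_stop` leaves `raw_pci_ops` pointing at the MMCONFIG ops after `pci_arch_init_model()` returns. With `hook_select_and_continue` the assignment is undone by the later step, which is the effect the reply describes.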
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..711f2074877b 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
                                               async pf acknowledgment msr
                                               0x4b564d07.
 
+KVM_FEATURE_PCI_GO_MMCONFIG       15          guest checks this feature bit
+                                              before using MMCONFIG for all
+                                              PCI config accesses
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                               per-cpu warps are expeced in
                                               kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..5793f372cae0 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_PCI_GO_MMCONFIG	15
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index df63786e7bfa..1ec73e6f25ce 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@
 #include <asm/hypervisor.h>
 #include <asm/tlb.h>
 #include <asm/cpuidle_haltpoll.h>
+#include <asm/pci_x86.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
 
@@ -715,6 +716,18 @@ static uint32_t __init kvm_detect(void)
 	return kvm_cpuid_base();
 }
 
+static int __init kvm_pci_arch_init(void)
+{
+	if (raw_pci_ext_ops &&
+	    kvm_para_has_feature(KVM_FEATURE_PCI_GO_MMCONFIG)) {
+		pr_info("PCI: Using MMCONFIG for base access\n");
+		raw_pci_ops = raw_pci_ext_ops;
+		return 0;
+	}
+
+	return 1;
+}
+
 static void __init kvm_apic_init(void)
 {
 #if defined(CONFIG_SMP)
@@ -726,6 +739,7 @@ static void __init kvm_apic_init(void)
 static void __init kvm_init_platform(void)
 {
 	kvmclock_init();
+	x86_init.pci.arch_init = kvm_pci_arch_init;
 	x86_platform.apic_post_init = kvm_apic_init;
 }
 
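As a rough illustration of what "guest checks this feature bit" amounts to, the sketch below tests a bit in a 32-bit feature word using the bit numbers from the diff. It is an assumption-laden model: `feature_bit_set()` is a hypothetical helper, and `features` stands for the value the guest obtains from the hypervisor (the kernel reads it through CPUID and wraps the test in `kvm_para_has_feature()`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as listed in the diff above. */
#define KVM_FEATURE_ASYNC_PF_INT	14
#define KVM_FEATURE_PCI_GO_MMCONFIG	15

/*
 * Hypothetical helper: 'features' is the feature word reported by the
 * hypervisor; returns true when the given bit number is set in it.
 */
static bool feature_bit_set(uint32_t features, unsigned int bit)
{
	assert(bit < 32);
	return (features >> bit) & 1u;
}
```

A host that advertises only bit 14 would make `feature_bit_set(features, KVM_FEATURE_PCI_GO_MMCONFIG)` return false, so the guest would keep using its default config access method.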
Using MMCONFIG instead of I/O ports cuts the number of config space
accesses in half, which is faster on KVM and opens the door for
additional optimizations such as Vitaly's "[PATCH 0/3] KVM: x86: KVM
MEM_PCI_HOLE memory":
https://lore.kernel.org/kvm/20200728143741.2718593-1-vkuznets@redhat.com

However, this change will not bring significant performance improvement
unless it is running on x86 within a hypervisor. Moreover, allowing
MMCONFIG access for addresses < 256 can be dangerous for some devices:
see commit a0ca99096094 ("PCI x86: always use conf1 to access config
space below 256 bytes"). That is why a special feature flag is needed.

Introduce KVM_FEATURE_PCI_GO_MMCONFIG, which can be enabled when the
configuration is known to be safe (e.g. in QEMU).

Signed-off-by: Julia Suvorova <jusual@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     |  4 ++++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c                | 14 ++++++++++++++
 3 files changed, 19 insertions(+)
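The "in half" claim follows from how the two mechanisms address a register. The port-based method (conf1) first writes an address to port 0xCF8 and then transfers data through port 0xCFC, so each config read or write costs two port accesses. MMCONFIG encodes bus, device, function, and register in a memory offset, so one load or store is enough. The sketch below computes both encodings. The function and enum names are hypothetical, and the conf1 form shown is the basic one limited to the first 256 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a device number (0-31) and function number (0-7) into one byte. */
#define PCI_DEVFN(dev, fn)	((((dev) & 0x1f) << 3) | ((fn) & 0x07))

/* Guest accesses needed per config read or write with each method. */
enum { CONF1_ACCESSES = 2, MMCONFIG_ACCESSES = 1 };

/* Value written to the address port (0xCF8) before using the data port (0xCFC). */
static uint32_t conf1_address(uint8_t bus, uint8_t devfn, uint16_t reg)
{
	assert(reg < 256);	/* basic conf1 reaches only the first 256 bytes */
	return 0x80000000u | ((uint32_t)bus << 16) |
	       ((uint32_t)devfn << 8) | (reg & 0xfcu);
}

/* Byte offset into the memory-mapped window; one access does the whole job. */
static uint32_t mmconfig_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
	assert(reg < 4096);	/* 4 KiB of config space per function */
	return ((uint32_t)bus << 20) | ((uint32_t)devfn << 12) | reg;
}
```

Under a hypervisor each trapped access is comparatively expensive, which is why the commit message expects the benefit mainly in that setting.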