[v6,00/12] SVM cleanup and INVPCID feature support

Message ID 159985237526.11252.1516487214307300610.stgit@bmoger-ubuntu (mailing list archive)

Babu Moger Sept. 11, 2020, 7:27 p.m. UTC
This series adds support for PCID/INVPCID on AMD guests. While doing so,
it restructures the vmcb_control_area data structure to combine all the
intercept vectors into one array of 32-bit words, which makes future
additions easier. It also rearranges some PCID-related code to make it
common between SVM and VMX.
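
As a rough illustration of the "one array of 32-bit words" idea, here is a
minimal userspace sketch. It is not the code from the patches: the struct
name, the word count, and the sketch_* helpers are hypothetical and only
mirror the set/clr/is pattern named in the patch titles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical word count; the real control area defines its own. */
#define MAX_INTERCEPT_WORDS 5

/* Hypothetical stand-in for the intercept part of the control area. */
struct control_area_sketch {
	uint32_t intercepts[MAX_INTERCEPT_WORDS];
};

/* An intercept id encodes (word index * 32 + bit within that word). */
#define INTERCEPT_ID(word, bit) ((word) * 32u + (bit))

static void sketch_set_intercept(struct control_area_sketch *c, unsigned int id)
{
	assert(id < 32u * MAX_INTERCEPT_WORDS);
	c->intercepts[id / 32u] |= UINT32_C(1) << (id % 32u);
}

static void sketch_clr_intercept(struct control_area_sketch *c, unsigned int id)
{
	assert(id < 32u * MAX_INTERCEPT_WORDS);
	c->intercepts[id / 32u] &= ~(UINT32_C(1) << (id % 32u));
}

static bool sketch_is_intercept(const struct control_area_sketch *c,
				unsigned int id)
{
	assert(id < 32u * MAX_INTERCEPT_WORDS);
	return (c->intercepts[id / 32u] >> (id % 32u)) & 1u;
}
```

With this layout, adding a new intercept only needs a new id (and possibly
one more word); the helpers themselves do not change.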

INVPCID interceptions are added only when the guest is running with shadow
page tables enabled. In this case the hypervisor needs to handle the TLB
flush based on the type of the INVPCID instruction.

For guests with nested page table (NPT) support, the INVPCID feature
works as if it were running natively. KVM does not need to do any special
handling.
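
To make "based on the type of the INVPCID instruction" concrete, the sketch
below classifies the four architectural INVPCID types into the kind of flush
a hypervisor would have to emulate. It is an illustration only, not the
handler from this series: all names are hypothetical, and the checks for a
non-canonical address and for CR4.PCIDE=0 are left out.

```c
#include <stdint.h>

/* The four architecturally defined INVPCID types. */
enum invpcid_type_sketch {
	INVPCID_SKETCH_INDIV_ADDR = 0,      /* one address in one PCID */
	INVPCID_SKETCH_SINGLE_CTXT = 1,     /* everything in one PCID */
	INVPCID_SKETCH_ALL_INCL_GLOBAL = 2, /* all PCIDs, including globals */
	INVPCID_SKETCH_ALL_NON_GLOBAL = 3,  /* all PCIDs, except globals */
};

/* Hypothetical result type: what the emulation would need to do. */
enum flush_action_sketch {
	FLUSH_ONE_ADDR,
	FLUSH_ONE_PCID,
	FLUSH_ALL,
	FLUSH_ALL_NON_GLOBAL,
	FLUSH_FAULT, /* the instruction would raise #GP instead */
};

/* 128-bit descriptor: bits 11:0 PCID, bits 63:12 reserved, then address. */
struct invpcid_desc_sketch {
	uint64_t pcid_word;
	uint64_t addr;
};

static enum flush_action_sketch
classify_invpcid(uint64_t type, const struct invpcid_desc_sketch *d)
{
	/* Types above 3 and non-zero reserved bits are invalid. */
	if (type > INVPCID_SKETCH_ALL_NON_GLOBAL || (d->pcid_word >> 12) != 0)
		return FLUSH_FAULT;

	switch (type) {
	case INVPCID_SKETCH_INDIV_ADDR:
		return FLUSH_ONE_ADDR;
	case INVPCID_SKETCH_SINGLE_CTXT:
		return FLUSH_ONE_PCID;
	case INVPCID_SKETCH_ALL_INCL_GLOBAL:
		return FLUSH_ALL;
	default:
		return FLUSH_ALL_NON_GLOBAL;
	}
}
```

A shadow-paging implementation is free to over-flush, for example by
treating the non-global case like a full flush; the sketch keeps the cases
separate only to show the distinction.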

AMD documentation for the INVPCID feature is available in "AMD64 Architecture
Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34 (or later)".

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---

v6:
 One minor change in patch #04. Otherwise same as v5.
 Added Reviewed-by tags to all the patches.

v5:
 https://lore.kernel.org/lkml/159846887637.18873.14677728679411578606.stgit@bmoger-ubuntu/
 All the changes are related to rebase.
 Applies cleanly on mainline and the kvm (master) tree.
 Resending it to get some attention.

v4:
 https://lore.kernel.org/lkml/159676101387.12805.18038347880482984693.stgit@bmoger-ubuntu/
 1. Renamed the functions __set_intercept/__clr_intercept/__is_intercept
    to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept and made them
    take the vmcb_control_area structure (suggested by Paolo).
 2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle kvm_read/write_guest_virt*()")
    to make it common across both SVM and VMX (suggested by Jim Mattson).
 3. Addressed a few other comments from Jim Mattson. Dropped "Reviewed-by"
    from the few patches that I have changed since v3.

v3:
 https://lore.kernel.org/lkml/159597929496.12744.14654593948763926416.stgit@bmoger-ubuntu/
 1. Addressed the comments from Jim Mattson. Follow the v2 link below
    for the context.
 2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
    using native __set_bit, clear_bit and test_bit.
 3. Combined all the intercept vectors into a single array of 32-bit words.
 4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
    clr_exception_intercept etc. Used the generic set_intercept and
    clr_intercept where applicable.
 5. Tested both L1 guests and L2 nested guests.

v2:
  https://lore.kernel.org/lkml/159234483706.6230.13753828995249423191.stgit@bmoger-ubuntu/
  - Addressed a few comments from Jim Mattson.
  - KVM interceptions are added only when TDP is off. No interceptions
    when TDP is on.
  - Reverted the fault priority to the original order in VMX.
  
v1:
  https://lore.kernel.org/lkml/159191202523.31436.11959784252237488867.stgit@bmoger-ubuntu/

Babu Moger (12):
      KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
      KVM: SVM: Change intercept_cr to generic intercepts
      KVM: SVM: Change intercept_dr to generic intercepts
      KVM: SVM: Modify intercept_exceptions to generic intercepts
      KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
      KVM: SVM: Add new intercept vector in vmcb_control_area
      KVM: nSVM: Cleanup nested_state data structure
      KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
      KVM: SVM: Remove set_exception_intercept and clr_exception_intercept
      KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
      KVM: X86: Move handling of INVPCID types to x86
      KVM:SVM: Enable INVPCID feature on AMD


 arch/x86/include/asm/svm.h      |  117 +++++++++++++++++++++++++----------
 arch/x86/include/uapi/asm/svm.h |    2 +
 arch/x86/kvm/svm/nested.c       |   66 +++++++++-----------
 arch/x86/kvm/svm/svm.c          |  131 ++++++++++++++++++++++++++-------------
 arch/x86/kvm/svm/svm.h          |   87 +++++++++-----------------
 arch/x86/kvm/trace.h            |   21 ++++--
 arch/x86/kvm/vmx/nested.c       |   12 ++--
 arch/x86/kvm/vmx/vmx.c          |   95 ----------------------------
 arch/x86/kvm/vmx/vmx.h          |    2 -
 arch/x86/kvm/x86.c              |  106 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |    3 +
 11 files changed, 364 insertions(+), 278 deletions(-)

Comments

Paolo Bonzini Sept. 12, 2020, 5:08 p.m. UTC | #1
On 11/09/20 21:27, Babu Moger wrote:
> The following series adds the support for PCID/INVPCID on AMD guests.
> While doing it re-structured the vmcb_control_area data structure to
> combine all the intercept vectors into one 32 bit array. Makes it easy
> for future additions. Re-arranged few pcid related code to make it common
> between SVM and VMX.

Queued except for patch 9 with only some changes to the names (mostly
replacing "vector" with "word").  It should get to kvm.git on Monday or
Tuesday, please give it a shot.

Thanks!

Paolo
Sean Christopherson Sept. 14, 2020, 3:05 p.m. UTC | #2
On Sat, Sep 12, 2020 at 07:08:05PM +0200, Paolo Bonzini wrote:
> Queued except for patch 9 with only some changes to the names (mostly
> replacing "vector" with "word").  It should get to kvm.git on Monday or
> Tuesday, please give it a shot.

Belated vote for s/vector/word, I found EXCEPTION_VECTOR quite confusing.
Babu Moger Sept. 14, 2020, 6:33 p.m. UTC | #3
On 9/12/20 12:08 PM, Paolo Bonzini wrote:
> On 11/09/20 21:27, Babu Moger wrote:
>> The following series adds the support for PCID/INVPCID on AMD guests.
>> While doing it re-structured the vmcb_control_area data structure to
>> combine all the intercept vectors into one 32 bit array. Makes it easy
>> for future additions. Re-arranged few pcid related code to make it common
>> between SVM and VMX.
> 
> Queued except for patch 9 with only some changes to the names (mostly
> replacing "vector" with "word").  It should get to kvm.git on Monday or
> Tuesday, please give it a shot.

Thanks Paolo. Tested guest, nested guest, and kvm-unit-tests. Everything
works as expected.
Jim Mattson Jan. 19, 2021, 11:01 p.m. UTC | #4
On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:

> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
> as expected.

Debian 9 does not like this patch set. As a kvm guest, it panics on a
Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
please see the attached kernel log snippet. Debian 10 is fine, so I
assume this is a guest bug.
Babu Moger Jan. 19, 2021, 11:45 p.m. UTC | #5
On 1/19/21 5:01 PM, Jim Mattson wrote:
> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
> 
>> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
>> as expected.
> 
> Debian 9 does not like this patch set. As a kvm guest, it panics on a
> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
> please see the attached kernel log snippet. Debian 10 is fine, so I
> assume this is a guest bug.
> 

We had an issue with the PCID feature earlier, which showed up only with
SEV guests. It was resolved recently. Do you think this is unrelated to that?
Here is the patch set:
https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/
Jim Mattson Jan. 20, 2021, 9:14 p.m. UTC | #6
On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com> wrote:
> We had an issue with PCID feature earlier. This was showing only with SEV
> guests. It is resolved recently. Do you think it is not related that?
> Here are the patch set.
> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/

The Debian 9 release we tested is not an SEV guest.
Babu Moger Jan. 20, 2021, 9:45 p.m. UTC | #7
On 1/20/21 3:14 PM, Jim Mattson wrote:
> The Debian 9 release we tested is not an SEV guest.
OK. I have not tested Debian 9 before. I will try it now and let you know
how it goes. Thanks.
Babu Moger Jan. 21, 2021, 3:10 a.m. UTC | #8
On 1/20/21 3:45 PM, Babu Moger wrote:
> ok. I have not tested Debian 9 before. I will try now. Will let you know
> how it goes. thanks
> 

I have reproduced the issue locally. Will investigate. Thanks.
Babu Moger Jan. 21, 2021, 11:51 p.m. UTC | #9
On 1/20/21 9:10 PM, Babu Moger wrote:
> I have reproduced the issue locally. Will investigate. thanks
> 
A few updates:
1. As Jim mentioned earlier, this appears to be a guest kernel issue.
Debian 9 runs the base kernel 4.9.0-14. The problem reproduces consistently
with this kernel.

2. This guest kernel (4.9.0-14) does not like the new INVPCID feature.

3. The system comes up fine when the INVPCID feature is disabled with the
boot parameter "noinvpcid", and also with "nopcid". nopcid disables both
PCID and INVPCID.

4. Upgraded the guest kernel to v5.0 and the system comes up fine.

5. The system also comes up fine with the latest guest kernel, 5.11.0-rc4.
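
For reference, the interaction described in item 3 can be modelled as below.
This is only a sketch under stated assumptions: the function and struct
names are hypothetical, and the rule that "nopcid" turns off both features
is taken from the observation above, not from the guest kernel source. The
CPUID bit positions are the architectural ones (PCID is CPUID.01H:ECX[17],
INVPCID is CPUID.(EAX=07H,ECX=0):EBX[10]).

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_PCID    (UINT32_C(1) << 17)
#define CPUID_7_EBX_INVPCID (UINT32_C(1) << 10)

/* Hypothetical summary of what the guest kernel ends up using. */
struct pcid_features_sketch {
	bool pcid;
	bool invpcid;
};

/*
 * leaf1_ecx and leaf7_ebx are raw CPUID register values as seen by the
 * guest; nopcid and noinvpcid say whether the boot parameter was given.
 */
static struct pcid_features_sketch
effective_features(uint32_t leaf1_ecx, uint32_t leaf7_ebx,
		   bool nopcid, bool noinvpcid)
{
	struct pcid_features_sketch f;

	f.pcid = (leaf1_ecx & CPUID_1_ECX_PCID) != 0 && !nopcid;
	/* Assumption from item 3: nopcid also disables INVPCID. */
	f.invpcid = (leaf7_ebx & CPUID_7_EBX_INVPCID) != 0 &&
		    !nopcid && !noinvpcid;
	return f;
}
```

On an x86 build with GCC or Clang, the register values could be obtained
with __get_cpuid_count() from <cpuid.h>; the sketch takes them as plain
arguments so that it stays portable.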

I did not bisect further yet.
Thanks
Babu
Babu Moger Jan. 23, 2021, 1:52 a.m. UTC | #10
On 1/21/21 5:51 PM, Babu Moger wrote:
> Few updates.
> 1. Like Jim mentioned earlier, this appears to be guest kernel issue.
> Debian 9 runs the base kernel 4.9.0-14. Problem can be seen consistently
> with this kernel.
> 
> 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> 
> 3. System comes up fine when invpcid feature is disabled with the boot
> parameter "noinvpcid" and also with "nopcid". nopcid disables both pcid
> and invpcid.
> 
> 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> 
> 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> 
> I did not bisect further yet.
> Babu
> Thanks


One more update:
The system comes up fine with kernel v4.9 (checked out at upstream tag v4.9).
So, I am assuming this is something specific to the Debian 4.9.0-14 kernel.

Note: I couldn't go back to prior versions (v4.8 or earlier) due to compile
issues.
Thanks
Babu
Jim Mattson Feb. 24, 2021, 12:13 a.m. UTC | #11
Any updates? What should we be telling customers with Debian 9 guests? :-)

On Fri, Jan 22, 2021 at 5:52 PM Babu Moger <babu.moger@amd.com> wrote:
> Some more update:
>  System comes up fine with kernel v4.9(checked out on upstream tag v4.9).
> So, I am assuming this is something specific to Debian 4.9.0-14 kernel.
>
> Note: I couldn't go back prior versions(v4.8 or earlier) due to compile
> issues.
> Thanks
> Babu
>
Babu Moger Feb. 24, 2021, 10:17 p.m. UTC | #12
> -----Original Message-----
> From: Jim Mattson <jmattson@google.com>
> Sent: Tuesday, February 23, 2021 6:14 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov
> <vkuznets@redhat.com>; Wanpeng Li <wanpengli@tencent.com>; kvm list
> <kvm@vger.kernel.org>; Joerg Roedel <joro@8bytes.org>; the arch/x86
> maintainers <x86@kernel.org>; LKML <linux-kernel@vger.kernel.org>; Ingo
> Molnar <mingo@redhat.com>; Borislav Petkov <bp@alien8.de>; H . Peter Anvin
> <hpa@zytor.com>; Thomas Gleixner <tglx@linutronix.de>; Makarand Sonare
> <makarandsonare@google.com>; Sean Christopherson <seanjc@google.com>
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> Any updates? What should we be telling customers with Debian 9 guests? :-)

Found another problem with the PCID feature on SVM. It has to do with the
CR4 flags being reset during bootup. The problem showed up with kexec
loading on a VM. I am not sure whether it is related to this issue. Will
send the patch soon.

> 
> On Fri, Jan 22, 2021 at 5:52 PM Babu Moger <babu.moger@amd.com> wrote:
> >
> >
> >
> > On 1/21/21 5:51 PM, Babu Moger wrote:
> > >
> > >
> > > On 1/20/21 9:10 PM, Babu Moger wrote:
> > >>
> > >>
> > >> On 1/20/21 3:45 PM, Babu Moger wrote:
> > >>>
> > >>>
> > >>> On 1/20/21 3:14 PM, Jim Mattson wrote:
> > >>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com>
> wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
> > >>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger
> <babu.moger@amd.com> wrote:
> > >>>>>>
> > >>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests.
> > >>>>>>> Everything works as expected.
> > >>>>>>
> > >>>>>> Debian 9 does not like this patch set. As a kvm guest, it
> > >>>>>> panics on a Milan CPU unless booted with 'nopcid'. Gmail
> > >>>>>> mangles long lines, so please see the attached kernel log
> > >>>>>> snippet. Debian 10 is fine, so I assume this is a guest bug.
> > >>>>>>
> > >>>>>
> > >>>>> We had an issue with PCID feature earlier. This was showing only
> > >>>>> with SEV guests. It is resolved recently. Do you think it is not
> > >>>>> related to that? Here is the patch set.
> > >>>>> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/
> > >>>>
> > >>>> The Debian 9 release we tested is not an SEV guest.
> > >>> ok. I have not tested Debian 9 before. I will try now. Will let
> > >>> you know how it goes. thanks
> > >>>
> > >>
> > >> I have reproduced the issue locally. Will investigate. thanks
> > >>
> > > Few updates.
> > > 1. Like Jim mentioned earlier, this appears to be guest kernel issue.
> > > Debian 9 runs the base kernel 4.9.0-14. Problem can be seen
> > > consistently with this kernel.
> > >
> > > 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> > >
> > > 3. System comes up fine when invpcid feature is disabled with the
> > > boot parameter "noinvpcid" and also with "nopcid". nopcid disables
> > > both pcid and invpcid.
> > >
> > > 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> > >
> > > 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> > >
> > > I did not bisect further yet.
> > > Babu
> > > Thanks
> >
> >
> > Some more update:
> >  System comes up fine with kernel v4.9(checked out on upstream tag v4.9).
> > So, I am assuming this is something specific to Debian 4.9.0-14 kernel.
> >
> > Note: I couldn't go back to prior versions (v4.8 or earlier) due to
> > compile issues.
> > Thanks
> > Babu
> >
Babu Moger March 10, 2021, 1:04 a.m. UTC | #13
> -----Original Message-----
> From: Babu Moger <babu.moger@amd.com>
> Sent: Wednesday, February 24, 2021 4:17 PM
> To: Jim Mattson <jmattson@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov
> <vkuznets@redhat.com>; Wanpeng Li <wanpengli@tencent.com>; kvm list
> <kvm@vger.kernel.org>; Joerg Roedel <joro@8bytes.org>; the arch/x86
> maintainers <x86@kernel.org>; LKML <linux-kernel@vger.kernel.org>; Ingo
> Molnar <mingo@redhat.com>; Borislav Petkov <bp@alien8.de>; H . Peter
> Anvin <hpa@zytor.com>; Thomas Gleixner <tglx@linutronix.de>; Makarand
> Sonare <makarandsonare@google.com>; Sean Christopherson
> <seanjc@google.com>
> Subject: RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> 
> 
> > -----Original Message-----
> > From: Jim Mattson <jmattson@google.com>
> > Sent: Tuesday, February 23, 2021 6:14 PM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov
> > <vkuznets@redhat.com>; Wanpeng Li <wanpengli@tencent.com>; kvm list
> > <kvm@vger.kernel.org>; Joerg Roedel <joro@8bytes.org>; the arch/x86
> > maintainers <x86@kernel.org>; LKML <linux-kernel@vger.kernel.org>;
> > Ingo Molnar <mingo@redhat.com>; Borislav Petkov <bp@alien8.de>; H .
> > Peter Anvin <hpa@zytor.com>; Thomas Gleixner <tglx@linutronix.de>;
> > Makarand Sonare <makarandsonare@google.com>; Sean Christopherson
> > <seanjc@google.com>
> > Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> >
> > Any updates? What should we be telling customers with Debian 9 guests?
> > :-)
> 
> Found another problem with the PCID feature on SVM. It has to do with the CR4
> flags being reset during bootup. The problem was showing up with kexec loading
> on a VM. I am not sure if this is related to that. Will send the patch soon.

Tried to reproduce the problem on upstream kernel versions without any
success.  Tried v4.9-0 and v4.8-0. Both these upstream versions are
working fine. So "git bisect" on upstream is ruled out.

Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears the
problem is on Debian 4.9 kernel. I am not sure how to run git bisect on
Debian kernel. Tried anyway. It is pointing to

47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
commit 47811c66356d875e76a6ca637a9d384779a659bb
Author: Ben Hutchings <benh@debian.org>
Date:   Mon Mar 8 01:17:32 2021 +0100

    Prepare to release linux (4.9.258-1).

It does not appear to be the right commit. I am out of ideas now.
Thanks
Babu

> 
> >
> > On Fri, Jan 22, 2021 at 5:52 PM Babu Moger <babu.moger@amd.com> wrote:
> > >
> > >
> > >
> > > On 1/21/21 5:51 PM, Babu Moger wrote:
> > > >
> > > >
> > > > On 1/20/21 9:10 PM, Babu Moger wrote:
> > > >>
> > > >>
> > > >> On 1/20/21 3:45 PM, Babu Moger wrote:
> > > >>>
> > > >>>
> > > >>> On 1/20/21 3:14 PM, Jim Mattson wrote:
> > > >>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com> wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
> > > >>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
> > > >>>>>>
> > > >>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests.
> > > >>>>>>> Everything works as expected.
> > > >>>>>>
> > > >>>>>> Debian 9 does not like this patch set. As a kvm guest, it
> > > >>>>>> panics on a Milan CPU unless booted with 'nopcid'. Gmail
> > > >>>>>> mangles long lines, so please see the attached kernel log
> > > >>>>>> snippet. Debian 10 is fine, so I assume this is a guest bug.
> > > >>>>>>
> > > >>>>>
> > > >>>>> We had an issue with PCID feature earlier. This was showing
> > > >>>>> only with SEV guests. It is resolved recently. Do you think it
> > > >>>>> is not related to that? Here is the patch set.
> > > >>>>> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/
> > > >>>>
> > > >>>> The Debian 9 release we tested is not an SEV guest.
> > > >>> ok. I have not tested Debian 9 before. I will try now. Will let
> > > >>> you know how it goes. thanks
> > > >>>
> > > >>
> > > >> I have reproduced the issue locally. Will investigate. thanks
> > > >>
> > > > Few updates.
> > > > 1. Like Jim mentioned earlier, this appears to be guest kernel issue.
> > > > Debian 9 runs the base kernel 4.9.0-14. Problem can be seen
> > > > consistently with this kernel.
> > > >
> > > > 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> > > >
> > > > 3. System comes up fine when invpcid feature is disabled with the
> > > > boot parameter "noinvpcid" and also with "nopcid". nopcid disables
> > > > both pcid and invpcid.
> > > >
> > > > 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> > > >
> > > > 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> > > >
> > > > I did not bisect further yet.
> > > > Babu
> > > > Thanks
> > >
> > >
> > > Some more update:
> > >  System comes up fine with kernel v4.9(checked out on upstream tag
> v4.9).
> > > So, I am assuming this is something specific to Debian 4.9.0-14 kernel.
> > >
> > > Note: I couldn't go back to prior versions (v4.8 or earlier) due to
> > > compile issues.
> > > Thanks
> > > Babu
> > >
Paolo Bonzini March 10, 2021, 9:08 a.m. UTC | #14
On 10/03/21 02:04, Babu Moger wrote:
> Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears the
> problem is on Debian 4.9 kernel. I am not sure how to run git bisect on
> Debian kernel. Tried anyway. It is pointing to
> 
> 47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
> commit 47811c66356d875e76a6ca637a9d384779a659bb
> Author: Ben Hutchings<benh@debian.org>
> Date:   Mon Mar 8 01:17:32 2021 +0100
> 
>      Prepare to release linux (4.9.258-1).
> 
> It does not appear to be the right commit. I am out of ideas now.
> Thanks

Have you tried bisecting the upstream stable kernels (from 4.9.0 to 
4.9.258)?

Paolo
Babu Moger March 10, 2021, 2:55 p.m. UTC | #15
> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Wednesday, March 10, 2021 3:09 AM
> To: Moger, Babu <Babu.Moger@amd.com>; Jim Mattson
> <jmattson@google.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>; Wanpeng Li
> <wanpengli@tencent.com>; kvm list <kvm@vger.kernel.org>; Joerg Roedel
> <joro@8bytes.org>; the arch/x86 maintainers <x86@kernel.org>; LKML <linux-
> kernel@vger.kernel.org>; Ingo Molnar <mingo@redhat.com>; Borislav Petkov
> <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Thomas Gleixner
> <tglx@linutronix.de>; Makarand Sonare <makarandsonare@google.com>; Sean
> Christopherson <seanjc@google.com>
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> On 10/03/21 02:04, Babu Moger wrote:
> > Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears
> > the problem is on Debian 4.9 kernel. I am not sure how to run git
> > bisect on Debian kernel. Tried anyway. It is pointing to
> >
> > 47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
> > commit 47811c66356d875e76a6ca637a9d384779a659bb
> > Author: Ben Hutchings<benh@debian.org>
> > Date:   Mon Mar 8 01:17:32 2021 +0100
> >
> >      Prepare to release linux (4.9.258-1).
> >
> > It does not appear to be the right commit. I am out of ideas now.
> > Thanks
> 
> Have you tried bisecting the upstream stable kernels (from 4.9.0 to 4.9.258)?

I couldn't reproduce the issue on any of the upstream versions. I have
tried v4.9, v4.8 and even on latest v5.11. No issues there.

Jim mentioned that Debian 10, which is based on kernel version 4.19, is also
fine. The issue appears to affect only Debian 9 (kernel v4.9.0-14).
Babu Moger March 10, 2021, 2:58 p.m. UTC | #16
On 3/10/21 8:55 AM, Babu Moger wrote:
> 
> 
>> -----Original Message-----
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Sent: Wednesday, March 10, 2021 3:09 AM
>> To: Moger, Babu <Babu.Moger@amd.com>; Jim Mattson
>> <jmattson@google.com>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>; Wanpeng Li
>> <wanpengli@tencent.com>; kvm list <kvm@vger.kernel.org>; Joerg Roedel
>> <joro@8bytes.org>; the arch/x86 maintainers <x86@kernel.org>; LKML <linux-
>> kernel@vger.kernel.org>; Ingo Molnar <mingo@redhat.com>; Borislav Petkov
>> <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Thomas Gleixner
>> <tglx@linutronix.de>; Makarand Sonare <makarandsonare@google.com>; Sean
>> Christopherson <seanjc@google.com>
>> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
>>
>> On 10/03/21 02:04, Babu Moger wrote:
>>> Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears
>>> the problem is on Debian 4.9 kernel. I am not sure how to run git
>>> bisect on Debian kernel. Tried anyway. It is pointing to
>>>
>>> 47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
>>> commit 47811c66356d875e76a6ca637a9d384779a659bb
>>> Author: Ben Hutchings<benh@debian.org>
>>> Date:   Mon Mar 8 01:17:32 2021 +0100
>>>
>>>      Prepare to release linux (4.9.258-1).
>>>
>>> It does not appear to be the right commit. I am out of ideas now.
>>> Thanks
>>
>> Have you tried bisecting the upstream stable kernels (from 4.9.0 to 4.9.258)?

I couldn't reproduce the issue on any of the upstream versions. I have
tried v4.9, v4.8 and even on latest v5.11. No issues there. There is no
upstream version 4.9.258.

Jim mentioned that Debian 10, which is based on kernel version 4.19, is also
fine. The issue appears to affect only Debian 9 (kernel v4.9.0-14).
Paolo Bonzini March 10, 2021, 3:31 p.m. UTC | #17
On 10/03/21 15:58, Babu Moger wrote:
> There is no upstream version 4.9.258.

Sure there is, check out https://cdn.kernel.org/pub/linux/kernel/v4.x/

The easiest way to do it is to bisect on the linux-4.9.y branch of 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git.

paolo
Babu Moger March 11, 2021, 1:21 a.m. UTC | #18
On 3/10/21 9:31 AM, Paolo Bonzini wrote:
> On 10/03/21 15:58, Babu Moger wrote:
>> There is no upstream version 4.9.258.
> 
> Sure there is, check out
> https://cdn.kernel.org/pub/linux/kernel/v4.x/
> 
> 
> The easiest way to do it is to bisect on the linux-4.9.y branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git.
> 
Paolo, Thanks for pointing that out. Bisected linux-4.9.y branch.
It is pointing at

# git bisect good
59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
commit 59094faf3f618b2d2b2a45acb916437d611cede6
Author: Borislav Petkov <bp@suse.de>
Date:   Mon Dec 25 13:57:16 2017 +0100

    x86/kaiser: Move feature detection up


    ... before the first use of kaiser_enabled as otherwise funky
    things happen:

      about to get started...
      (XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0000]
      (XEN) Pagetable walk from ffff88022a449090:
      (XEN)  L4[0x110] = 0000000229e0e067 0000000000001e0e
      (XEN)  L3[0x008] = 0000000000000000 ffffffffffffffff
      (XEN) domain_crash_sync called from entry.S: fault at ffff82d08033fd08
      entry.o#create_bounce_frame+0x135/0x14d
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.9.1_02-3.21  x86_64  debug=n   Not tainted ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff81007460>]
      (XEN) RFLAGS: 0000000000000286   EM: 1   CONTEXT: pv guest (d0v0)

    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 e56bbc975c3fd1a774b6cc0d6699c0c24e66be1c
e06231dccc8589b4baa0cd5759a37899b7ec71c1 M    arch

Not sure what is going on with this commit. Still looking.
Borislav Petkov March 11, 2021, 8:07 p.m. UTC | #19
On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
> # git bisect good
> 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
> commit 59094faf3f618b2d2b2a45acb916437d611cede6
> Author: Borislav Petkov <bp@suse.de>
> Date:   Mon Dec 25 13:57:16 2017 +0100
> 
>     x86/kaiser: Move feature detection up

What is the reproducer?

Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
.config?

Upthread is talking about PCID, so I'm guessing host needs to be Zen3
with PCID. Anything else?

Thx.
Borislav Petkov March 11, 2021, 8:32 p.m. UTC | #20
On Thu, Mar 11, 2021 at 09:07:55PM +0100, Borislav Petkov wrote:
> On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
> > # git bisect good
> > 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
> > commit 59094faf3f618b2d2b2a45acb916437d611cede6
> > Author: Borislav Petkov <bp@suse.de>
> > Date:   Mon Dec 25 13:57:16 2017 +0100
> > 
> >     x86/kaiser: Move feature detection up
> 
> What is the reproducer?
> 
> Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
> .config?
> 
> Upthread is talking about PCID, so I'm guessing host needs to be Zen3
> with PCID. Anything else?

That oops points to:

[    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!

which is:

        local_flush_tlb();
        sync_core();
        /* Could also do a CLFLUSH here to speed up CPU recovery; but
           that causes hangs on some VIA CPUs. */
        for (i = 0; i < len; i++)
                BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);	<---
        local_irq_restore(flags);
        return addr;

in text_poke() which basically says that the patching verification
fails. And you have a local_flush_tlb() before that. And with PCID maybe
it is not flushing properly or whatnot.

And deep down in the TLB flushing code, it does:

        if (kaiser_enabled)
                kaiser_flush_tlb_on_return_to_user();

and that uses PCID...

Anyway, needs more info.
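
[Editor's sketch] The read-back check quoted above from text_poke() can be modeled in user space as below. This is only an illustration of the byte-for-byte comparison, not the kernel code: first_mismatch() is a hypothetical helper name, and where the kernel triggers BUG_ON() the sketch returns the offending index instead. In the thread's scenario the suspicion is that a stale translation makes the read-back see the old bytes, which is exactly the case where this comparison fails.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model of the verification loop at the end of text_poke().
 * Returns the index of the first byte at which the patched location
 * differs from the intended opcode bytes, or -1 if all bytes match.
 * (Hypothetical helper; the kernel calls BUG_ON() on a mismatch.)
 */
static long first_mismatch(const unsigned char *addr,
                           const unsigned char *opcode, size_t len)
{
    assert(addr != NULL && opcode != NULL);

    for (size_t i = 0; i < len; i++) {
        if (addr[i] != opcode[i])
            return (long)i;
    }
    return -1;
}
```

A caller would pass the bytes read back through the original mapping as addr and the bytes it intended to write as opcode; any non-negative result corresponds to the BUG at alternative.c:709 in the log.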
Babu Moger March 11, 2021, 8:57 p.m. UTC | #21
On 3/11/21 2:32 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 09:07:55PM +0100, Borislav Petkov wrote:
>> On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
>>> # git bisect good
>>> 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
>>> commit 59094faf3f618b2d2b2a45acb916437d611cede6
>>> Author: Borislav Petkov <bp@suse.de>
>>> Date:   Mon Dec 25 13:57:16 2017 +0100
>>>
>>>     x86/kaiser: Move feature detection up
>>
>> What is the reproducer?
>>
>> Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
>> .config?
>>
>> Upthread is talking about PCID, so I'm guessing host needs to be Zen3
>> with PCID. Anything else?
> 
> That oops points to:
> 
> [    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
> 
> which is:
> 
>         local_flush_tlb();
>         sync_core();
>         /* Could also do a CLFLUSH here to speed up CPU recovery; but
>            that causes hangs on some VIA CPUs. */
>         for (i = 0; i < len; i++)
>                 BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);	<---
>         local_irq_restore(flags);
>         return addr;
> 
> in text_poke() which basically says that the patching verification
> fails. And you have a local_flush_tlb() before that. And with PCID maybe
> it is not flushing properly or whatnot.
> 
> And deep down in the TLB flushing code, it does:
> 
>         if (kaiser_enabled)
>                 kaiser_flush_tlb_on_return_to_user();
> 
> and that uses PCID...
> 
> Anyway, needs more info.

Boris,
 It is related to the PCID and INVPCID combination. A few more details.
 1. System comes up fine with "noinvpcid". So, it happens when invpcid is
enabled.
 2. Host is coming up fine. Problem is with the guest.
 3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
 4. Debian 10 is fine.
 5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
 6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.

 Let me know if you want me to try something else.
thanks
Babu
Jim Mattson March 11, 2021, 9:23 p.m. UTC | #22
On Thu, Mar 11, 2021 at 12:32 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Mar 11, 2021 at 09:07:55PM +0100, Borislav Petkov wrote:
> > On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
> > > # git bisect good
> > > 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
> > > commit 59094faf3f618b2d2b2a45acb916437d611cede6
> > > Author: Borislav Petkov <bp@suse.de>
> > > Date:   Mon Dec 25 13:57:16 2017 +0100
> > >
> > >     x86/kaiser: Move feature detection up
> >
> > What is the reproducer?
> >
> > Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
> > .config?
> >
> > Upthread is talking about PCID, so I'm guessing host needs to be Zen3
> > with PCID. Anything else?
>
> That oops points to:
>
> [    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
>
> which is:
>
>         local_flush_tlb();
>         sync_core();
>         /* Could also do a CLFLUSH here to speed up CPU recovery; but
>            that causes hangs on some VIA CPUs. */
>         for (i = 0; i < len; i++)
>                 BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);       <---
>         local_irq_restore(flags);
>         return addr;
>
> in text_poke() which basically says that the patching verification
> fails. And you have a local_flush_tlb() before that. And with PCID maybe
> it is not flushing properly or whatnot.
>
> And deep down in the TLB flushing code, it does:
>
>         if (kaiser_enabled)
>                 kaiser_flush_tlb_on_return_to_user();
>
> and that uses PCID...

I would expect kaiser_enabled to be false (and PCIDs not to be used),
since AMD CPUs are not vulnerable to Meltdown.
Borislav Petkov March 11, 2021, 9:36 p.m. UTC | #23
On Thu, Mar 11, 2021 at 01:23:47PM -0800, Jim Mattson wrote:
> I would expect kaiser_enabled to be false (and PCIDs not to be used),
> since AMD CPUs are not vulnerable to Meltdown.

Ah, of course. The guest dmesg should have

"Kernel/User page tables isolation: disabled."

Lemme see if I can reproduce.
Borislav Petkov March 11, 2021, 9:40 p.m. UTC | #24
On Thu, Mar 11, 2021 at 02:57:04PM -0600, Babu Moger wrote:
>  It is related to the PCID and INVPCID combination. A few more details.
>  1. System comes up fine with "noinvpcid". So, it happens when invpcid is
> enabled.

Which system, host or guest?

>  2. Host is coming up fine. Problem is with the guest.

Aha, guest.

>  3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
>  4. Debian 10 is fine.
>  5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
>  6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.
> 
>  Let me know if want me to try something else.

Yes, I assume host has the patches which belong to this thread?

So please describe:

1. host has these patches, cmdline params, etc.
2. guest is a 4.9 kernel, cmdline params, etc.

Please be exact and specific so that I can properly reproduce.

Thx.
Babu Moger March 11, 2021, 9:50 p.m. UTC | #25
On 3/11/21 3:36 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 01:23:47PM -0800, Jim Mattson wrote:
>> I would expect kaiser_enabled to be false (and PCIDs not to be used),
>> since AMD CPUs are not vulnerable to Meltdown.
> 
> Ah, of course. The guest dmesg should have
> 
> "Kernel/User page tables isolation: disabled."

yes.
 #dmesg |grep isolation
[    0.000000] Kernel/User page tables isolation: disabled

> 
> Lemme see if I can reproduce.
>
Babu Moger March 11, 2021, 10:04 p.m. UTC | #26
On 3/11/21 3:40 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 02:57:04PM -0600, Babu Moger wrote:
>>  It is related PCID and INVPCID combination. Few more details.
>>  1. System comes up fine with "noinvpid". So, it happens when invpcid is
>> enabled.
> 
> Which system, host or guest?
> 
>>  2. Host is coming up fine. Problem is with the guest.
> 
> Aha, guest.
> 
>>  3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
>>  4. Debian 10 is fine.
>>  5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
>>  6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.
>>
>>  Let me know if want me to try something else.
> 
> Yes, I assume host has the patches which belong to this thread?

Yes. Host has all these patches. Right now I am on 5.12.0-rc2. I just
updated yesterday. I was able to reproduce it on 5.11 also.


> 
> So please describe:
> 
> 1. host has these patches, cmdline params, etc.

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.12.0-rc2+ root=/dev/mapper/rhel-root ro
crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root
rd.lvm.lv=rhel/swap ras=cec_disable nmi_watchdog=0 warn_ud2=on selinux=0
earlyprintk=serial,ttyS1,115200n8 console=ttyS1,115200n8


> 2. guest is a 4.9 kernel, cmdline params, etc.

I use qemu command line to bring up the guest. Make sure to use "-cpu host".

qemu-system-x86_64 -name deb9 -m 16384 -smp cores=16,threads=1,sockets=1
-hda vdisk-deb.qcow2 -enable-kvm -net nic  -net
bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper -cpu host,+svm
-nographic


The grub command line looks like this on the guest.

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.9.0-14-amd64
root=UUID=a0069240-cd60-4795-a391-273266dbae29 ro console=ttyS0,112500n8
earlyprintk
Babu Moger March 11, 2021, 10:15 p.m. UTC | #27
On 3/11/21 4:04 PM, Babu Moger wrote:
> 
> 
> On 3/11/21 3:40 PM, Borislav Petkov wrote:
>> On Thu, Mar 11, 2021 at 02:57:04PM -0600, Babu Moger wrote:
>>>  It is related PCID and INVPCID combination. Few more details.
>>>  1. System comes up fine with "noinvpid". So, it happens when invpcid is
>>> enabled.
>>
>> Which system, host or guest?
>>
>>>  2. Host is coming up fine. Problem is with the guest.
>>
>> Aha, guest.
>>
>>>  3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
>>>  4. Debian 10 is fine.
>>>  5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
>>>  6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.
>>>
>>>  Let me know if want me to try something else.
>>
>> Yes, I assume host has the patches which belong to this thread?
> 
> Yes. Host has all these patches. Right now I am on 5.12.0-rc2. I just
> updated yesterday. I was able to reproduce 5.11 also.
> 
> 
>>
>> So please describe:
>>
>> 1. host has these patches, cmdline params, etc.

My host is
# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 (Ootpa)
# uname -r
5.12.0-rc2+


> 
> # cat /proc/cmdline
> BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.12.0-rc2+ root=/dev/mapper/rhel-root ro
> crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root
> rd.lvm.lv=rhel/swap ras=cec_disable nmi_watchdog=0 warn_ud2=on selinux=0
> earlyprintk=serial,ttyS1,115200n8 console=ttyS1,115200n8
> 
> 
>> 2. guest is a 4.9 kernel, cmdline params, etc.
> 
> I use qemu command line to bring up the guest. Make sure to use "-cpu host".
> 
> qemu-system-x86_64 -name deb9 -m 16384 -smp cores=16,threads=1,sockets=1
> -hda vdisk-deb.qcow2 -enable-kvm -net nic  -net
> bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper -cpu host,+svm
> -nographic
> 
> 
> The grub command line looks like this on the guest.
> 
> cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-4.9.0-14-amd64
> root=UUID=a0069240-cd60-4795-a391-273266dbae29 ro console=ttyS0,112500n8
> earlyprintk
>
Borislav Petkov March 11, 2021, 11:52 p.m. UTC | #28
On Thu, Mar 11, 2021 at 04:15:37PM -0600, Babu Moger wrote:
> My host is
> # cat /etc/redhat-release
> Red Hat Enterprise Linux release 8.3 (Ootpa)
> # uname -r
> 5.12.0-rc2+

Please upload host and guest .config.

Thx.
Babu Moger March 12, 2021, 4:12 p.m. UTC | #29
> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Thursday, March 11, 2021 5:52 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Jim Mattson
> <jmattson@google.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; Wanpeng
> Li <wanpengli@tencent.com>; kvm list <kvm@vger.kernel.org>; Joerg Roedel
> <joro@8bytes.org>; the arch/x86 maintainers <x86@kernel.org>; LKML <linux-
> kernel@vger.kernel.org>; Ingo Molnar <mingo@redhat.com>; H . Peter Anvin
> <hpa@zytor.com>; Thomas Gleixner <tglx@linutronix.de>; Makarand Sonare
> <makarandsonare@google.com>; Sean Christopherson <seanjc@google.com>
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> On Thu, Mar 11, 2021 at 04:15:37PM -0600, Babu Moger wrote:
> > My host is
> > # cat /etc/redhat-release
> > Red Hat Enterprise Linux release 8.3 (Ootpa)
> > # uname -r
> > 5.12.0-rc2+
> 
> Please upload host and guest .config.

Host config.

https://pastebin.com/wuLzEqZr

Guest config

https://pastebin.com/mvzEEq6R
Borislav Petkov March 24, 2021, 9:21 p.m. UTC | #30
Ok,

some more experimenting Babu and I did led us to:

---
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f5ca15622dc9..259aa4889cad 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
 	 */
 	if (kaiser_enabled)
 		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+	else
+		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
 	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
 }

applied on the guest kernel which fixes the issue. And let me add Hugh
who did that PCID stuff at the time. So lemme summarize for Hugh and to
ask him nicely to sanity-check me. :-)

Basically, you have an AMD host which supports PCID and INVPCID and you
boot on it a 4.9 guest. It explodes like the panic below.

What fixes it is this:

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f5ca15622dc9..259aa4889cad 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
 	 */
 	if (kaiser_enabled)
 		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+	else
+		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
 	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
 }

---

and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
false because AMD is not affected by Meltdown, which means, there's no
user/kernel pagetables split.

And that also means, you have global TLB entries which means that if you
look at that __native_flush_tlb_single() function, it needs to flush
global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
INVLPG in the kaiser_enabled=0 case. Ergo, the above hunk.

But I might be completely off here thus this note...

Thoughts?

Thx.


[    1.235726] ------------[ cut here ]------------
[    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
[    1.240926] invalid opcode: 0000 [#1] SMP
[    1.243301] Modules linked in:
[    1.244585] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
[    1.247657] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[    1.251249] task: ffff909363e94040 task.stack: ffffa41bc0194000
[    1.253519] RIP: 0010:[<ffffffff8fa2e40c>]  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
[    1.256593] RSP: 0018:ffffa41bc0197d90  EFLAGS: 00010096
[    1.258657] RAX: 000000000000000f RBX: 0000000001020800 RCX: 00000000feda3203
[    1.261388] RDX: 00000000178bfbff RSI: 0000000000000000 RDI: ffffffffff57a000
[    1.264168] RBP: ffffffff8fbd3eca R08: 0000000000000000 R09: 0000000000000003
[    1.266983] R10: 0000000000000003 R11: 0000000000000112 R12: 0000000000000001
[    1.269702] R13: ffffa41bc0197dcf R14: 0000000000000286 R15: ffffed1c40407500
[    1.272572] FS:  0000000000000000(0000) GS:ffff909366300000(0000) knlGS:0000000000000000
[    1.275791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.278032] CR2: 0000000000000000 CR3: 0000000010c08000 CR4: 00000000003606f0
[    1.280815] Stack:
[    1.281630]  ffffffff8fbd3eca 0000000000000005 ffffa41bc0197e03 ffffffff8fbd3ecb
[    1.284660]  0000000000000000 0000000000000000 ffffffff8fa2e835 ccffffff8fad4326
[    1.287729]  1ccd0231874d55d3 ffffffff8fbd3eca ffffa41bc0197e03 ffffffff90203844
[    1.290852] Call Trace:
[    1.291782]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
[    1.294900]  [<ffffffff8fbd3ecb>] ? swap_entry_free+0x12b/0x300
[    1.297267]  [<ffffffff8fa2e835>] ? text_poke_bp+0x55/0xe0
[    1.299473]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
[    1.301896]  [<ffffffff8fa2b64c>] ? arch_jump_label_transform+0x9c/0x120
[    1.304557]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
[    1.306790]  [<ffffffff8fb81d92>] ? __jump_label_update+0x72/0x80
[    1.309255]  [<ffffffff8fb8206f>] ? static_key_slow_inc+0x8f/0xa0
[    1.311680]  [<ffffffff8fbd7a57>] ? frontswap_register_ops+0x107/0x1d0
[    1.314281]  [<ffffffff9077078c>] ? init_zswap+0x282/0x3f6
[    1.316547]  [<ffffffff9077050a>] ? init_frontswap+0x8c/0x8c
[    1.318784]  [<ffffffff8fa0223e>] ? do_one_initcall+0x4e/0x180
[    1.321067]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
[    1.323366]  [<ffffffff9073f08d>] ? kernel_init_freeable+0x16b/0x1ec
[    1.325873]  [<ffffffff90011d50>] ? rest_init+0x80/0x80
[    1.327989]  [<ffffffff90011d5a>] ? kernel_init+0xa/0x100
[    1.330092]  [<ffffffff9001f424>] ? ret_from_fork+0x44/0x70
[    1.332311] Code: 00 0f a2 4d 85 e4 74 4a 0f b6 45 00 41 38 45 00 75 19 31 c0 83 c0 01 48 63 d0 49 39 d4 76 33 41 0f b6 4c 15 00 38 4c 15 00 74 e9 <0f> 0b 48 89 ef e8 da d6 19 00 48 8d bd 00 10 00 00 48 89 c3 e8 
[    1.342818] RIP  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
[    1.345859]  RSP <ffffa41bc0197d90>
[    1.347285] ---[ end trace 0a1c5ab5eb16de89 ]---
[    1.349169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.349169] 
[    1.352885] Kernel Offset: 0xea00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    1.357039] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.357039]
Paolo Bonzini March 24, 2021, 9:59 p.m. UTC | #31
On 24/03/21 22:21, Borislav Petkov wrote:
>  	if (kaiser_enabled)
>   		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> +	else
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +
>   	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
>   }

I think the kernel ASID flush can also be moved under the "if"?

> and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
> false because AMD is not affected by Meltdown, which means, there's no
> user/kernel pagetables split.
> 
> And that also means, you have global TLB entries which means that if you
> look at that __native_flush_tlb_single() function, it needs to flush
> global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
> INVLPG in the kaiser_enabled=0 case. Errgo, the above hunk.

Makes sense.
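
The difference between the two instructions can be illustrated with a toy
user-space model. This is only a sketch, not kernel code: the toy_* names,
the slot count and the addresses are made up for illustration. It assumes
the architectural behaviour that an individual-address INVPCID leaves
global translations alone, while INVLPG also drops global entries for the
address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TLB_SLOTS 4

/* One cached translation in the toy TLB (hypothetical model). */
struct toy_tlb_entry {
	bool valid;
	bool global;	/* filled from a PTE with the Global bit set */
	uint16_t pcid;	/* tag; only meaningful for non-global entries */
	uint64_t vaddr;	/* page-aligned linear address */
};

struct toy_tlb {
	struct toy_tlb_entry slot[TOY_TLB_SLOTS];
	uint16_t cur_pcid;	/* PCID currently loaded in CR3 */
};

static void toy_tlb_fill(struct toy_tlb *t, size_t i, uint64_t vaddr,
			 uint16_t pcid, bool global)
{
	assert(i < TOY_TLB_SLOTS);
	t->slot[i] = (struct toy_tlb_entry){
		.valid = true, .global = global, .pcid = pcid, .vaddr = vaddr,
	};
}

/* Would a lookup of vaddr under the current PCID hit a cached entry? */
static bool toy_tlb_hit(const struct toy_tlb *t, uint64_t vaddr)
{
	for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
		const struct toy_tlb_entry *e = &t->slot[i];

		if (e->valid && e->vaddr == vaddr &&
		    (e->global || e->pcid == t->cur_pcid))
			return true;
	}
	return false;
}

/* INVPCID, individual address: drops (pcid, vaddr) but never global entries. */
static void toy_invpcid_flush_one(struct toy_tlb *t, uint16_t pcid,
				  uint64_t vaddr)
{
	for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
		struct toy_tlb_entry *e = &t->slot[i];

		if (e->valid && !e->global && e->pcid == pcid &&
		    e->vaddr == vaddr)
			e->valid = false;
	}
}

/* INVLPG: drops vaddr for the current PCID and any global entry for vaddr. */
static void toy_invlpg(struct toy_tlb *t, uint64_t vaddr)
{
	for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
		struct toy_tlb_entry *e = &t->slot[i];

		if (e->valid && e->vaddr == vaddr &&
		    (e->global || e->pcid == t->cur_pcid))
			e->valid = false;
	}
}
```

Filling a global entry and then calling toy_invpcid_flush_one() on its
address leaves toy_tlb_hit() true; only toy_invlpg() makes it miss. That
is the stale-entry situation the hunk above is meant to avoid.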

Paolo
Hugh Dickins March 25, 2021, 12:05 a.m. UTC | #32
On Wed, 24 Mar 2021, Borislav Petkov wrote:

> Ok,
> 
> some more experimenting Babu and I did lead us to:
> 
> ---
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..259aa4889cad 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>  	 */
>  	if (kaiser_enabled)
>  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> +	else
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +
>  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
>  }
> 
> applied on the guest kernel which fixes the issue. And let me add Hugh
> who did that PCID stuff at the time. So lemme summarize for Hugh and to
> ask him nicely to sanity-check me. :-)

Just a brief interim note to assure you that I'm paying attention,
but wow, it's a long time since I gave any thought down here!
Trying to page it all back in...

I see no harm in your workaround if it works, but it's not as if
this is a previously untried path: so I'm suspicious how an issue
here with Globals could have gone unnoticed for so long, and need
to understand it better.

Hugh

> 
> Basically, you have an AMD host which supports PCID and INVPCID and you
> boot on it a 4.9 guest. It explodes like the panic below.
> 
> What fixes it is this:
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..259aa4889cad 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>  	 */
>  	if (kaiser_enabled)
>  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> +	else
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +
>  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
>  }
> 
> ---
> 
> and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
> false because AMD is not affected by Meltdown, which means, there's no
> user/kernel pagetables split.
> 
> And that also means, you have global TLB entries which means that if you
> look at that __native_flush_tlb_single() function, it needs to flush
> global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
> INVLPG in the kaiser_enabled=0 case. Errgo, the above hunk.
> 
> But I might be completely off here thus this note...
> 
> Thoughts?
> 
> Thx.
> 
Hugh Dickins March 25, 2021, 2:43 a.m. UTC | #33
On Wed, 24 Mar 2021, Hugh Dickins wrote:
> On Wed, 24 Mar 2021, Borislav Petkov wrote:
> 
> > Ok,
> > 
> > some more experimenting Babu and I did lead us to:
> > 
> > ---
> > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> > index f5ca15622dc9..259aa4889cad 100644
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
> >  	 */
> >  	if (kaiser_enabled)
> >  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> > +	else
> > +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> > +
> >  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> >  }
> > 
> > applied on the guest kernel which fixes the issue. And let me add Hugh
> > who did that PCID stuff at the time. So lemme summarize for Hugh and to
> > ask him nicely to sanity-check me. :-)
> 
> Just a brief interim note to assure you that I'm paying attention,
> but wow, it's a long time since I gave any thought down here!
> Trying to page it all back in...
> 
> I see no harm in your workaround if it works, but it's not as if
> this is a previously untried path: so I'm suspicious how an issue
> here with Globals could have gone unnoticed for so long, and need
> to understand it better.

Right, after looking into it more, I completely agree with you:
the Kaiser series (in both 4.4-stable and 4.9-stable) was simply
wrong to lose that invlpg - fine in the kaiser case when we don't
enable Globals at all, but plain wrong in the !kaiser_enabled case.
One way or another, we have somehow got away with it for three years.

I do agree with Paolo that the PCID_ASID_KERN flush would be better
moved under the "if (kaiser_enabled)" now. (And if this were ongoing
development, I'd want to rewrite the function altogether: but no,
these old stable trees are not the place for that.)
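
For reference, the three shapes of the INVPCID_SINGLE path discussed in
this thread can be compared with a small user-space model. It is a sketch
under assumptions, not the actual -stable patch: the function and enum
names are hypothetical, and each function only records which flush
instructions that variant would issue.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit flags naming the flush instructions a variant issues (model only). */
enum toy_flush_op {
	TOY_INVPCID_USER = 1 << 0,	/* invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr) */
	TOY_INVPCID_KERN = 1 << 1,	/* invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr) */
	TOY_INVLPG       = 1 << 2,	/* invlpg (addr) */
};

/* Shape visible in the hunk's context lines, before the fix. */
static unsigned int flush_single_4_9_stable(bool kaiser_enabled)
{
	unsigned int ops = 0;

	if (kaiser_enabled)
		ops |= TOY_INVPCID_USER;
	ops |= TOY_INVPCID_KERN;
	return ops;
}

/* Shape after the hunk quoted above: add INVLPG when !kaiser_enabled. */
static unsigned int flush_single_with_hunk(bool kaiser_enabled)
{
	unsigned int ops = 0;

	if (kaiser_enabled)
		ops |= TOY_INVPCID_USER;
	else
		ops |= TOY_INVLPG;
	ops |= TOY_INVPCID_KERN;
	return ops;
}

/* Shape with the kernel ASID flush moved under the "if" as well. */
static unsigned int flush_single_kern_under_if(bool kaiser_enabled)
{
	if (kaiser_enabled)
		return TOY_INVPCID_USER | TOY_INVPCID_KERN;
	return TOY_INVLPG;
}

/* Usage example: what each variant does in the !kaiser_enabled case. */
static void toy_check_variants(void)
{
	assert(!(flush_single_4_9_stable(false) & TOY_INVLPG));
	assert(flush_single_with_hunk(false) & TOY_INVLPG);
	assert(flush_single_kern_under_if(false) == TOY_INVLPG);
}
```

In this model the last variant issues a single INVLPG when kaiser is off
and is unchanged from the other two when kaiser is on.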

Boris, may I leave both -stable fixes to you?
Let me know if you'd prefer me to clean up my mess.

Thanks a lot for tracking this down,
Hugh

Borislav Petkov March 25, 2021, 9:56 a.m. UTC | #34
On Wed, Mar 24, 2021 at 07:43:29PM -0700, Hugh Dickins wrote:
> Right, after looking into it more, I completely agree with you:
> the Kaiser series (in both 4.4-stable and 4.9-stable) was simply
> wrong to lose that invlpg - fine in the kaiser case when we don't
> enable Globals at all, but plain wrong in the !kaiser_enabled case.
> One way or another, we have somehow got away with it for three years.

Yeah, because there were no boxes with kaiser_enabled=0 *and* PCID
which would set INVPCID_SINGLE. Before those, it would INVLPG in the
!INVPCID_SINGLE case.
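
Spelled out as a truth table, in a minimal user-space sketch. The helper
names are hypothetical and the model only encodes what is stated in this
thread: without INVPCID_SINGLE the old code does INVLPG, and with kaiser
enabled Globals are not used at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-fix code falls back to INVLPG only when INVPCID_SINGLE is not set. */
static bool pre_fix_issues_invlpg(bool invpcid_single)
{
	return !invpcid_single;
}

/* Global TLB entries are only in play when kaiser is disabled. */
static bool globals_in_use(bool kaiser_enabled)
{
	return !kaiser_enabled;
}

/* True when the pre-fix code can leave a stale global entry behind. */
static bool pre_fix_can_leave_stale_global(bool invpcid_single,
					   bool kaiser_enabled)
{
	return globals_in_use(kaiser_enabled) &&
	       !pre_fix_issues_invlpg(invpcid_single);
}

/* Usage example: only one of the four combinations is affected. */
static void toy_check_configs(void)
{
	assert(!pre_fix_can_leave_stale_global(false, false));
	assert(!pre_fix_can_leave_stale_global(false, true));
	assert(!pre_fix_can_leave_stale_global(true, true));
	assert(pre_fix_can_leave_stale_global(true, false));
}
```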

Oh, btw, booting with "pti=on" "fixes" the issue too. And I tried
reproducing this on an Intel box with "pti=off" but it booted fine
so I'm probably missing some other aspect or triggering it there is
harder/different due to TLB differences or whatnot.

And Babu triggered the same issue on AMD bare metal yesterday.

> I do agree with Paolo that the PCID_ASID_KERN flush would be better
> moved under the "if (kaiser_enabled)" now.

Ok.

> (And if this were ongoing development, I'd want to rewrite the
> function altogether: but no, these old stable trees are not the place
> for that.)

Bah, it brought some very mixed memories, wading through that code
after years. And yeah, people should stop using all these dead kernels
already! So yeah, no, you don't want to clean up stuff there - let
sleeping dogs lie.

> Boris, may I leave both -stable fixes to you?
> Let me know if you'd prefer me to clean up my mess.

No worries, I'll take care of it.

> Thanks a lot for tracking this down,

Thanks for double-checking me so quickly, lemme whip up a patch.

Thx.