diff mbox series

[v12,19/46] x86/kernel: Make the .bss..decrypted section shared in RMP table

Message ID 20220307213356.2797205-20-brijesh.singh@amd.com (mailing list archive)
State New, archived
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Guest Support | expand

Commit Message

Brijesh Singh March 7, 2022, 9:33 p.m. UTC
The encryption attribute for the .bss..decrypted section is cleared in the
initial page table build. This is because the section contains the data
that need to be shared between the guest and the hypervisor.

When SEV-SNP is active, just clearing the encryption attribute in the
page table is not enough. The page state need to be updated in the RMP
table.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/head64.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Sean Christopherson June 14, 2022, 12:46 a.m. UTC | #1
s/Brijesh/Michael

On Mon, Mar 07, 2022, Brijesh Singh wrote:
> The encryption attribute for the .bss..decrypted section is cleared in the
> initial page table build. This is because the section contains the data
> that need to be shared between the guest and the hypervisor.
> 
> When SEV-SNP is active, just clearing the encryption attribute in the
> page table is not enough. The page state need to be updated in the RMP
> table.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kernel/head64.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 83514b9827e6..656d2f3e2cf0 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
>  	if (sme_get_me_mask()) {
>  		vaddr = (unsigned long)__start_bss_decrypted;
>  		vaddr_end = (unsigned long)__end_bss_decrypted;
> +
>  		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> +			/*
> +			 * On SNP, transition the page to shared in the RMP table so that
> +			 * it is consistent with the page table attribute change.
> +			 *
> +			 * __start_bss_decrypted has a virtual address in the high range
> +			 * mapping (kernel .text). PVALIDATE, by way of
> +			 * early_snp_set_memory_shared(), requires a valid virtual
> +			 * address but the kernel is currently running off of the identity
> +			 * mapping so use __pa() to get a *currently* valid virtual address.
> +			 */
> +			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);

This breaks SME on Rome and Milan when compiling with clang-13.  I haven't been
able to figure out exactly what goes wrong.  printk isn't functional at this point,
and interactive debug during boot on our test systems is beyond me.  I can't even
verify that the bug is specific to clang because the draconian build system for our
test systems apparently is stuck pointing at gcc-4.9.

I suspect the issue is related to relocation and/or encrypting memory, as skipping
the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
use of absolute addresses.

Forcing a VM through the same path doesn't fail.  I can't test an SEV guest at the
moment because INIT_EX is also broken.

The crash incurs a very, very slow reboot, and I was out of cycles to work on this
about three hours ago.  If someone on the AMD side can repro, it would be much
appreciated.
Sean Christopherson June 14, 2022, 3:43 p.m. UTC | #2
On Tue, Jun 14, 2022, Sean Christopherson wrote:
> s/Brijesh/Michael
> 
> On Mon, Mar 07, 2022, Brijesh Singh wrote:
> > The encryption attribute for the .bss..decrypted section is cleared in the
> > initial page table build. This is because the section contains the data
> > that need to be shared between the guest and the hypervisor.
> > 
> > When SEV-SNP is active, just clearing the encryption attribute in the
> > page table is not enough. The page state need to be updated in the RMP
> > table.
> > 
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > ---
> >  arch/x86/kernel/head64.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > index 83514b9827e6..656d2f3e2cf0 100644
> > --- a/arch/x86/kernel/head64.c
> > +++ b/arch/x86/kernel/head64.c
> > @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> >  	if (sme_get_me_mask()) {
> >  		vaddr = (unsigned long)__start_bss_decrypted;
> >  		vaddr_end = (unsigned long)__end_bss_decrypted;
> > +
> >  		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> > +			/*
> > +			 * On SNP, transition the page to shared in the RMP table so that
> > +			 * it is consistent with the page table attribute change.
> > +			 *
> > +			 * __start_bss_decrypted has a virtual address in the high range
> > +			 * mapping (kernel .text). PVALIDATE, by way of
> > +			 * early_snp_set_memory_shared(), requires a valid virtual
> > +			 * address but the kernel is currently running off of the identity
> > +			 * mapping so use __pa() to get a *currently* valid virtual address.
> > +			 */
> > +			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
> 
> This breaks SME on Rome and Milan when compiling with clang-13.  I haven't been
> able to figure out exactly what goes wrong.  printk isn't functional at this point,
> and interactive debug during boot on our test systems is beyond me.  I can't even
> verify that the bug is specific to clang because the draconian build system for our
> test systems apparently is stuck pointing at gcc-4.9.
> 
> I suspect the issue is related to relocation and/or encrypting memory, as skipping
> the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
> I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
> use of absolute addresses.
> 
> Forcing a VM through the same path doesn't fail.  I can't test an SEV guest at the
> moment because INIT_EX is also broken.

The SEV INIT_EX was a PEBKAC issue.  An SEV guest boots just fine with a clang-built
kernel, so either it's a finnicky relocation issue or something specific to SME.

> The crash incurs a very, very slow reboot, and I was out of cycles to work on this
> about three hours ago.  If someone on the AMD side can repro, it would be much
> appreciated.
Tom Lendacky June 14, 2022, 4:01 p.m. UTC | #3
On 6/14/22 10:43, Sean Christopherson wrote:
> On Tue, Jun 14, 2022, Sean Christopherson wrote:
>> s/Brijesh/Michael
>>
>> On Mon, Mar 07, 2022, Brijesh Singh wrote:
>>> The encryption attribute for the .bss..decrypted section is cleared in the
>>> initial page table build. This is because the section contains the data
>>> that need to be shared between the guest and the hypervisor.
>>>
>>> When SEV-SNP is active, just clearing the encryption attribute in the
>>> page table is not enough. The page state need to be updated in the RMP
>>> table.
>>>
>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>> ---
>>>   arch/x86/kernel/head64.c | 13 +++++++++++++
>>>   1 file changed, 13 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>>> index 83514b9827e6..656d2f3e2cf0 100644
>>> --- a/arch/x86/kernel/head64.c
>>> +++ b/arch/x86/kernel/head64.c
>>> @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
>>>   	if (sme_get_me_mask()) {
>>>   		vaddr = (unsigned long)__start_bss_decrypted;
>>>   		vaddr_end = (unsigned long)__end_bss_decrypted;
>>> +
>>>   		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>>> +			/*
>>> +			 * On SNP, transition the page to shared in the RMP table so that
>>> +			 * it is consistent with the page table attribute change.
>>> +			 *
>>> +			 * __start_bss_decrypted has a virtual address in the high range
>>> +			 * mapping (kernel .text). PVALIDATE, by way of
>>> +			 * early_snp_set_memory_shared(), requires a valid virtual
>>> +			 * address but the kernel is currently running off of the identity
>>> +			 * mapping so use __pa() to get a *currently* valid virtual address.
>>> +			 */
>>> +			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
>>
>> This breaks SME on Rome and Milan when compiling with clang-13.  I haven't been
>> able to figure out exactly what goes wrong.  printk isn't functional at this point,
>> and interactive debug during boot on our test systems is beyond me.  I can't even
>> verify that the bug is specific to clang because the draconian build system for our
>> test systems apparently is stuck pointing at gcc-4.9.
>>
>> I suspect the issue is related to relocation and/or encrypting memory, as skipping
>> the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
>> I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
>> use of absolute addresses.
>>
>> Forcing a VM through the same path doesn't fail.  I can't test an SEV guest at the
>> moment because INIT_EX is also broken.
> 
> The SEV INIT_EX was a PEBKAC issue.  An SEV guest boots just fine with a clang-built
> kernel, so either it's a finnicky relocation issue or something specific to SME.

I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue:

[    4.118226] Memory Encryption Features active: AMD SME

Maybe something with your kernel config? Can you send me your config?

Thanks,
Tom

> 
>> The crash incurs a very, very slow reboot, and I was out of cycles to work on this
>> about three hours ago.  If someone on the AMD side can repro, it would be much
>> appreciated.
Sean Christopherson June 14, 2022, 4:13 p.m. UTC | #4
On Tue, Jun 14, 2022, Tom Lendacky wrote:
> On 6/14/22 10:43, Sean Christopherson wrote:
> > On Tue, Jun 14, 2022, Sean Christopherson wrote:
> > > s/Brijesh/Michael
> > > 
> > > On Mon, Mar 07, 2022, Brijesh Singh wrote:
> > > > The encryption attribute for the .bss..decrypted section is cleared in the
> > > > initial page table build. This is because the section contains the data
> > > > that need to be shared between the guest and the hypervisor.
> > > > 
> > > > When SEV-SNP is active, just clearing the encryption attribute in the
> > > > page table is not enough. The page state need to be updated in the RMP
> > > > table.
> > > > 
> > > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > > ---
> > > >   arch/x86/kernel/head64.c | 13 +++++++++++++
> > > >   1 file changed, 13 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > > > index 83514b9827e6..656d2f3e2cf0 100644
> > > > --- a/arch/x86/kernel/head64.c
> > > > +++ b/arch/x86/kernel/head64.c
> > > > @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > > >   	if (sme_get_me_mask()) {
> > > >   		vaddr = (unsigned long)__start_bss_decrypted;
> > > >   		vaddr_end = (unsigned long)__end_bss_decrypted;
> > > > +
> > > >   		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> > > > +			/*
> > > > +			 * On SNP, transition the page to shared in the RMP table so that
> > > > +			 * it is consistent with the page table attribute change.
> > > > +			 *
> > > > +			 * __start_bss_decrypted has a virtual address in the high range
> > > > +			 * mapping (kernel .text). PVALIDATE, by way of
> > > > +			 * early_snp_set_memory_shared(), requires a valid virtual
> > > > +			 * address but the kernel is currently running off of the identity
> > > > +			 * mapping so use __pa() to get a *currently* valid virtual address.
> > > > +			 */
> > > > +			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
> > > 
> > > This breaks SME on Rome and Milan when compiling with clang-13.  I haven't been
> > > able to figure out exactly what goes wrong.  printk isn't functional at this point,
> > > and interactive debug during boot on our test systems is beyond me.  I can't even
> > > verify that the bug is specific to clang because the draconian build system for our
> > > test systems apparently is stuck pointing at gcc-4.9.
> > > 
> > > I suspect the issue is related to relocation and/or encrypting memory, as skipping
> > > the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
> > > I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
> > > use of absolute addresses.
> > > 
> > > Forcing a VM through the same path doesn't fail.  I can't test an SEV guest at the
> > > moment because INIT_EX is also broken.
> > 
> > The SEV INIT_EX was a PEBKAC issue.  An SEV guest boots just fine with a clang-built
> > kernel, so either it's a finnicky relocation issue or something specific to SME.
> 
> I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue:
> 
> [    4.118226] Memory Encryption Features active: AMD SME

Phooey. 

> Maybe something with your kernel config? Can you send me your config?

Attached.  If you can't repro, I'll find someone on our end to work on this.

Thanks!
Tom Lendacky June 14, 2022, 7 p.m. UTC | #5
On 6/14/22 11:13, Sean Christopherson wrote:
> On Tue, Jun 14, 2022, Tom Lendacky wrote:
>> On 6/14/22 10:43, Sean Christopherson wrote:
>>> On Tue, Jun 14, 2022, Sean Christopherson wrote:
>>>> s/Brijesh/Michael
>>>>
>>>> On Mon, Mar 07, 2022, Brijesh Singh wrote:
>>>>> The encryption attribute for the .bss..decrypted section is cleared in the
>>>>> initial page table build. This is because the section contains the data
>>>>> that need to be shared between the guest and the hypervisor.
>>>>>
>>>>> When SEV-SNP is active, just clearing the encryption attribute in the
>>>>> page table is not enough. The page state need to be updated in the RMP
>>>>> table.
>>>>>
>>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>>>> ---
>>>>>    arch/x86/kernel/head64.c | 13 +++++++++++++
>>>>>    1 file changed, 13 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>>>>> index 83514b9827e6..656d2f3e2cf0 100644
>>>>> --- a/arch/x86/kernel/head64.c
>>>>> +++ b/arch/x86/kernel/head64.c
>>>>> @@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
>>>>>    	if (sme_get_me_mask()) {
>>>>>    		vaddr = (unsigned long)__start_bss_decrypted;
>>>>>    		vaddr_end = (unsigned long)__end_bss_decrypted;
>>>>> +
>>>>>    		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>>>>> +			/*
>>>>> +			 * On SNP, transition the page to shared in the RMP table so that
>>>>> +			 * it is consistent with the page table attribute change.
>>>>> +			 *
>>>>> +			 * __start_bss_decrypted has a virtual address in the high range
>>>>> +			 * mapping (kernel .text). PVALIDATE, by way of
>>>>> +			 * early_snp_set_memory_shared(), requires a valid virtual
>>>>> +			 * address but the kernel is currently running off of the identity
>>>>> +			 * mapping so use __pa() to get a *currently* valid virtual address.
>>>>> +			 */
>>>>> +			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
>>>>
>>>> This breaks SME on Rome and Milan when compiling with clang-13.  I haven't been
>>>> able to figure out exactly what goes wrong.  printk isn't functional at this point,
>>>> and interactive debug during boot on our test systems is beyond me.  I can't even
>>>> verify that the bug is specific to clang because the draconian build system for our
>>>> test systems apparently is stuck pointing at gcc-4.9.
>>>>
>>>> I suspect the issue is related to relocation and/or encrypting memory, as skipping
>>>> the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
>>>> I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
>>>> use of absolute addresses.
>>>>
>>>> Forcing a VM through the same path doesn't fail.  I can't test an SEV guest at the
>>>> moment because INIT_EX is also broken.
>>>
>>> The SEV INIT_EX was a PEBKAC issue.  An SEV guest boots just fine with a clang-built
>>> kernel, so either it's a finnicky relocation issue or something specific to SME.
>>
>> I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue:
>>
>> [    4.118226] Memory Encryption Features active: AMD SME
> 
> Phooey.
> 
>> Maybe something with your kernel config? Can you send me your config?
> 
> Attached.  If you can't repro, I'll find someone on our end to work on this.

I was able to repro. It dies in the cc_platform_has() code, where it is
trying to do an indirect jump based on the attribute (actually in the
amd_cc_platform_has() which I think has been optimized in):

bool cc_platform_has(enum cc_attr attr)
{
ffffffff81002140:       55                      push   %rbp
ffffffff81002141:       48 89 e5                mov    %rsp,%rbp
         switch (vendor) {
ffffffff81002144:       8b 05 c6 e9 3a 01       mov    0x13ae9c6(%rip),%eax        # ffffffff823b0b10 <vendor>
ffffffff8100214a:       83 f8 03                cmp    $0x3,%eax
ffffffff8100214d:       74 25                   je     ffffffff81002174 <cc_platform_has+0x34>
ffffffff8100214f:       83 f8 02                cmp    $0x2,%eax
ffffffff81002152:       74 2f                   je     ffffffff81002183 <cc_platform_has+0x43>
ffffffff81002154:       83 f8 01                cmp    $0x1,%eax
ffffffff81002157:       75 26                   jne    ffffffff8100217f <cc_platform_has+0x3f>
         switch (attr) {
ffffffff81002159:       83 ff 05                cmp    $0x5,%edi
ffffffff8100215c:       77 21                   ja     ffffffff8100217f <cc_platform_has+0x3f>
ffffffff8100215e:       89 f8                   mov    %edi,%eax
ffffffff81002160:       ff 24 c5 c0 01 00 82    jmp    *-0x7dfffe40(,%rax,8)

This last line is what causes the reset. I'm guessing that the jump isn't
valid at this point because we are running in identity mapped mode and not
with a kernel virtual address at this point.

Trying to see what the difference was between your config and mine, the
indirect jump lead me to check the setting of CONFIG_RETPOLINE. Your config
did not have it enabled, so I set CONFIG_RETPOLINE=y, and with that, the
kernel boots successfully. With retpolines, the code is completely different
around here:

bool cc_platform_has(enum cc_attr attr)
{
ffffffff81001f30:       55                      push   %rbp
ffffffff81001f31:       48 89 e5                mov    %rsp,%rbp
         switch (vendor) {
ffffffff81001f34:       8b 05 26 8f 37 01       mov    0x1378f26(%rip),%eax        # ffffffff8237ae60 <vendor>
ffffffff81001f3a:       83 f8 03                cmp    $0x3,%eax
ffffffff81001f3d:       74 29                   je     ffffffff81001f68 <cc_platform_has+0x38>
ffffffff81001f3f:       83 f8 02                cmp    $0x2,%eax
ffffffff81001f42:       74 33                   je     ffffffff81001f77 <cc_platform_has+0x47>
ffffffff81001f44:       83 f8 01                cmp    $0x1,%eax
ffffffff81001f47:       75 2a                   jne    ffffffff81001f73 <cc_platform_has+0x43>
ffffffff81001f49:       31 c0                   xor    %eax,%eax
         switch (attr) {
ffffffff81001f4b:       83 ff 02                cmp    $0x2,%edi
ffffffff81001f4e:       7f 2f                   jg     ffffffff81001f7f <cc_platform_has+0x4f>
ffffffff81001f50:       85 ff                   test   %edi,%edi
ffffffff81001f52:       74 47                   je     ffffffff81001f9b <cc_platform_has+0x6b>
ffffffff81001f54:       83 ff 01                cmp    $0x1,%edi
ffffffff81001f57:       74 5b                   je     ffffffff81001fb4 <cc_platform_has+0x84>
ffffffff81001f59:       83 ff 02                cmp    $0x2,%edi
ffffffff81001f5c:       75 08                   jne    ffffffff81001f66 <cc_platform_has+0x36>
                 return sev_status & MSR_AMD64_SEV_ENABLED;
ffffffff81001f5e:       8a 05 44 3f 64 01       mov    0x1643f44(%rip),%al        # ffffffff82645ea8 <sev_status>
ffffffff81001f64:       24 01                   and    $0x1,%al
         case CC_VENDOR_HYPERV:
                 return hyperv_cc_platform_has(attr);
         default:
                 return false;
         }
}
ffffffff81001f66:       5d                      pop    %rbp
ffffffff81001f67:       c3                      ret
         switch (attr) {
ffffffff81001f68:       83 ff 07                cmp    $0x7,%edi
ffffffff81001f6b:       73 06                   jae    ffffffff81001f73 <cc_platform_has+0x43>
ffffffff81001f6d:       40 f6 c7 01             test   $0x1,%dil
ffffffff81001f71:       eb 07                   jmp    ffffffff81001f7a <cc_platform_has+0x4a>
ffffffff81001f73:       31 c0                   xor    %eax,%eax
}
ffffffff81001f75:       5d                      pop    %rbp
ffffffff81001f76:       c3                      ret
         return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
ffffffff81001f77:       83 ff 02                cmp    $0x2,%edi
ffffffff81001f7a:       0f 94 c0                sete   %al
}
ffffffff81001f7d:       5d                      pop    %rbp
ffffffff81001f7e:       c3                      ret
         switch (attr) {
.
.
.

I'm not sure if there's a way to remove the jump table optimization for
the arch/x86/coco/core.c file when retpolines aren't configured.

Thanks,
Tom

> 
> Thanks!
Sean Christopherson June 14, 2022, 7:52 p.m. UTC | #6
On Tue, Jun 14, 2022, Tom Lendacky wrote:
> On 6/14/22 11:13, Sean Christopherson wrote:
> > > > > This breaks SME on Rome and Milan when compiling with clang-13.  I haven't been
> > > > > able to figure out exactly what goes wrong.  printk isn't functional at this point,
> > > > > and interactive debug during boot on our test systems is beyond me.  I can't even
> > > > > verify that the bug is specific to clang because the draconian build system for our
> > > > > test systems apparently is stuck pointing at gcc-4.9.
> > > > > 
> > > > > I suspect the issue is related to relocation and/or encrypting memory, as skipping
> > > > > the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
> > > > > I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
> > > > > use of absolute addresses.
> > > > > 
> > > > > Forcing a VM through the same path doesn't fail.  I can't test an SEV guest at the
> > > > > moment because INIT_EX is also broken.
> > > > 
> > > > The SEV INIT_EX was a PEBKAC issue.  An SEV guest boots just fine with a clang-built
> > > > kernel, so either it's a finnicky relocation issue or something specific to SME.
> > > 
> > > I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue:
> > > 
> > > [    4.118226] Memory Encryption Features active: AMD SME
> > 
> > Phooey.
> > 
> > > Maybe something with your kernel config? Can you send me your config?
> > 
> > Attached.  If you can't repro, I'll find someone on our end to work on this.
> 
> I was able to repro. It dies in the cc_platform_has() code, where it is
> trying to do an indirect jump based on the attribute (actually in the
> amd_cc_platform_has() which I think has been optimized in):
> 
> bool cc_platform_has(enum cc_attr attr)

...

> ffffffff81002160:       ff 24 c5 c0 01 00 82    jmp    *-0x7dfffe40(,%rax,8)
> 
> This last line is what causes the reset. I'm guessing that the jump isn't
> valid at this point because we are running in identity mapped mode and not
> with a kernel virtual address at this point.
> 
> Trying to see what the difference was between your config and mine, the
> indirect jump lead me to check the setting of CONFIG_RETPOLINE. Your config
> did not have it enabled, so I set CONFIG_RETPOLINE=y, and with that, the
> kernel boots successfully.

That would explain why my VMs didn't fail, I build those kernels with CONFIG_RETPOLINE=y.

> With retpolines, the code is completely different around here:

...

> I'm not sure if there's a way to remove the jump table optimization for
> the arch/x86/coco/core.c file when retpolines aren't configured.

And for post-boot I don't think we'd want to disable any such optimizations.

A possibled "fix" would be to do what sme_encrypt_kernel() does and just query
sev_status directly.  But even that works, the fragility of the boot code is
terrifying :-(  I can't think of any clever solutions though.

Many thanks again Tom!

---
 arch/x86/include/asm/sev.h |  4 ++++
 arch/x86/kernel/head64.c   | 10 +++++++---
 arch/x86/kernel/sev.c      | 16 +++++++++++-----
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 19514524f0f8..701c561fdf08 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -193,6 +193,8 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
 void setup_ghcb(void);
 void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
 					 unsigned int npages);
+void __init __early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+					  unsigned int npages);
 void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
 					unsigned int npages);
 void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
@@ -214,6 +216,8 @@ static inline void setup_ghcb(void) { }
 static inline void __init
 early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
 static inline void __init
+__early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+static inline void __init
 early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
 static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
 static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index bd4a34100ed0..5efab0d8e49d 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -127,7 +127,9 @@ static bool __head check_la57_support(unsigned long physaddr)
 }
 #endif

-static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
+						    pmdval_t *pmd,
+						    unsigned long physaddr)
 {
 	unsigned long vaddr, vaddr_end;
 	int i;
@@ -156,7 +158,9 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
 			 * address but the kernel is currently running off of the identity
 			 * mapping so use __pa() to get a *currently* valid virtual address.
 			 */
-			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+			if (sev_status & MSR_AMD64_SEV_SNP_ENABLED_BIT)
+				__early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr),
+							      PTRS_PER_PMD);

 			i = pmd_index(vaddr);
 			pmd[i] -= sme_get_me_mask();
@@ -316,7 +320,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 */
 	*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();

-	return sme_postprocess_startup(bp, pmd);
+	return sme_postprocess_startup(bp, pmd, physaddr);
 }

 /* Wipe all early page tables except for the kernel symbol map */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index c05f0124c410..48966ecc520e 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -714,12 +714,9 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
 	pvalidate_pages(vaddr, npages, true);
 }

-void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
-					unsigned int npages)
+void __init __early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+					  unsigned int npages)
 {
-	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
-		return;
-
 	/* Invalidate the memory pages before they are marked shared in the RMP table. */
 	pvalidate_pages(vaddr, npages, false);

@@ -727,6 +724,15 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
 	early_set_pages_state(paddr, npages, SNP_PAGE_STATE_SHARED);
 }

+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+					unsigned int npages)
+{
+	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+		return;
+
+	__early_snp_set_memory_shared(vaddr, paddr, npages);
+}
+
 void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op)
 {
 	unsigned long vaddr, npages;

base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
--
Tom Lendacky June 16, 2022, 4:17 p.m. UTC | #7
On 6/14/22 14:52, Sean Christopherson wrote:
> On Tue, Jun 14, 2022, Tom Lendacky wrote:
>> On 6/14/22 11:13, Sean Christopherson wrote:
>>>>>> This breaks SME on Rome and Milan when compiling with clang-13.  I haven't been
>>>>>> able to figure out exactly what goes wrong.  printk isn't functional at this point,
>>>>>> and interactive debug during boot on our test systems is beyond me.  I can't even
>>>>>> verify that the bug is specific to clang because the draconian build system for our
>>>>>> test systems apparently is stuck pointing at gcc-4.9.
>>>>>>
>>>>>> I suspect the issue is related to relocation and/or encrypting memory, as skipping
>>>>>> the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
>>>>>> I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
>>>>>> use of absolute addresses.
>>>>>>
>>>>>> Forcing a VM through the same path doesn't fail.  I can't test an SEV guest at the
>>>>>> moment because INIT_EX is also broken.
>>>>>

> 
>> I'm not sure if there's a way to remove the jump table optimization for
>> the arch/x86/coco/core.c file when retpolines aren't configured.
> 
> And for post-boot I don't think we'd want to disable any such optimizations.
> 
> A possibled "fix" would be to do what sme_encrypt_kernel() does and just query
> sev_status directly.  But even that works, the fragility of the boot code is
> terrifying :-(  I can't think of any clever solutions though.

I worry that another use of cc_platform_has() could creep in at some point 
and cause the same issue. Not sure how bad it would be, performance-wise, 
to remove the jump table optimization for arch/x86/coco/core.c.

I guess we can wait for Boris to get back and chime in.

> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index bd4a34100ed0..5efab0d8e49d 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -127,7 +127,9 @@ static bool __head check_la57_support(unsigned long physaddr)
>   }
>   #endif
> 
> -static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
> +static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
> +						    pmdval_t *pmd,
> +						    unsigned long physaddr)

I noticed that you added the physaddr parameter but never use it...

Thanks,
Tom

>   {
>   	unsigned long vaddr, vaddr_end;
>   	int i;
> @@ -156,7 +158,9 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
>   			 * address but the kernel is currently running off of the identity
>   			 * mapping so use __pa() to get a *currently* valid virtual address.
>   			 */
> -			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
> +			if (sev_status & MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +				__early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr),
> +							      PTRS_PER_PMD);
> 
>   			i = pmd_index(vaddr);
>   			pmd[i] -= sme_get_me_mask();
> @@ -316,7 +320,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>   	 */
>   	*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
> 
> -	return sme_postprocess_startup(bp, pmd);
> +	return sme_postprocess_startup(bp, pmd, physaddr);
>   }
> 
>   /* Wipe all early page tables except for the kernel symbol map */
Sean Christopherson June 16, 2022, 4:41 p.m. UTC | #8
On Thu, Jun 16, 2022, Tom Lendacky wrote:
> On 6/14/22 14:52, Sean Christopherson wrote:
> > On Tue, Jun 14, 2022, Tom Lendacky wrote:
> > > On 6/14/22 11:13, Sean Christopherson wrote:
> > > > > > > This breaks SME on Rome and Milan when compiling with clang-13.  I haven't been
> > > > > > > able to figure out exactly what goes wrong.  printk isn't functional at this point,
> > > > > > > and interactive debug during boot on our test systems is beyond me.  I can't even
> > > > > > > verify that the bug is specific to clang because the draconian build system for our
> > > > > > > test systems apparently is stuck pointing at gcc-4.9.
> > > > > > > 
> > > > > > > I suspect the issue is related to relocation and/or encrypting memory, as skipping
> > > > > > > the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
> > > > > > > I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
> > > > > > > use of absolute addresses.
> > > > > > > 
> > > > > > > Forcing a VM through the same path doesn't fail.  I can't test an SEV guest at the
> > > > > > > moment because INIT_EX is also broken.
> > > > > > 
> 
> > 
> > > I'm not sure if there's a way to remove the jump table optimization for
> > > the arch/x86/coco/core.c file when retpolines aren't configured.
> > 
> > And for post-boot I don't think we'd want to disable any such optimizations.
> > 
> > A possibled "fix" would be to do what sme_encrypt_kernel() does and just query
> > sev_status directly.  But even that works, the fragility of the boot code is
> > terrifying :-(  I can't think of any clever solutions though.
> 
> I worry that another use of cc_platform_has() could creep in at some point
> and cause the same issue. Not sure how bad it would be, performance-wise, to
> remove the jump table optimization for arch/x86/coco/core.c.

One thought would be to initialize "vendor" to a bogus value, disallow calls to
cc_set_vendor() until after the kernel as gotten to a safe point, and then WARN
(or panic?) if cc_platform_has() is called before "vendor" is explicitly set.
New calls can still get in, but they'll be much easier to detect and less likely
to escape initial testing.

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 49b44f881484..803220cd34a6 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -13,7 +13,11 @@
 #include <asm/coco.h>
 #include <asm/processor.h>

-static enum cc_vendor vendor __ro_after_init;
+/*
+ * Initialize the vendor to garbage to detect usage of cc_platform_has() before
+ * the vendor has been set.
+ */
+static enum cc_vendor vendor = CC_NR_VENDORS __ro_after_init;
 static u64 cc_mask __ro_after_init;

 static bool intel_cc_platform_has(enum cc_attr attr)
@@ -90,7 +94,10 @@ bool cc_platform_has(enum cc_attr attr)
                return intel_cc_platform_has(attr);
        case CC_VENDOR_HYPERV:
                return hyperv_cc_platform_has(attr);
+       case CC_VENDOR_NONE:
+               return false;
        default:
+               WARN_ONCE(1, "blah blah blah");
                return false;
        }
 }
diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index 3d98c3a60d34..adfd2fbce7ac 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -9,6 +9,7 @@ enum cc_vendor {
        CC_VENDOR_AMD,
        CC_VENDOR_HYPERV,
        CC_VENDOR_INTEL,
+       CC_NR_VENDORS,
 };

 void cc_set_vendor(enum cc_vendor v);

> I guess we can wait for Boris to get back and chime in.
> 
> > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > index bd4a34100ed0..5efab0d8e49d 100644
> > --- a/arch/x86/kernel/head64.c
> > +++ b/arch/x86/kernel/head64.c
> > @@ -127,7 +127,9 @@ static bool __head check_la57_support(unsigned long physaddr)
> >   }
> >   #endif
> > 
> > -static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
> > +static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
> > +						    pmdval_t *pmd,
> > +						    unsigned long physaddr)
> 
> I noticed that you added the physaddr parameter but never use it...

Likely just garbage on my end, I was trying various ideas.
Borislav Petkov July 1, 2022, 4:51 p.m. UTC | #9
On Thu, Jun 16, 2022 at 04:41:05PM +0000, Sean Christopherson wrote:
> > I worry that another use of cc_platform_has() could creep in at some point
> > and cause the same issue. Not sure how bad it would be, performance-wise, to
> > remove the jump table optimization for arch/x86/coco/core.c.

Is there a gcc switch for that?

> One thought would be to initialize "vendor" to a bogus value, disallow calls to
> cc_set_vendor() until after the kernel as gotten to a safe point, and then WARN
> (or panic?) if cc_platform_has() is called before "vendor" is explicitly set.
> New calls can still get in, but they'll be much easier to detect and less likely
> to escape initial testing.

The invalid vendor thing makes sense but I don't think it'll help in
this case.

We set vendor in sme_enable() which comes before the

__startup_64 -> sme_postprocess_startup

path you're hitting.

We could do only the aspect of checking whether it hasn't been set yet
and warn then, in order to make the usage more robust...
Sean Christopherson July 7, 2022, 8:43 p.m. UTC | #10
On Fri, Jul 01, 2022, Borislav Petkov wrote:
> On Thu, Jun 16, 2022 at 04:41:05PM +0000, Sean Christopherson wrote:
> > > I worry that another use of cc_platform_has() could creep in at some point
> > > and cause the same issue. Not sure how bad it would be, performance-wise, to
> > > remove the jump table optimization for arch/x86/coco/core.c.
> 
> Is there a gcc switch for that?

I believe -fno-jump-tables will do the trick.  That also reminds me exactly why
CONFIG_RETPOLINE=y isn't broken, jump tables are disabled when retpolines are enabled[*].

[*] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952

> > One thought would be to initialize "vendor" to a bogus value, disallow calls to
> > cc_set_vendor() until after the kernel as gotten to a safe point, and then WARN
> > (or panic?) if cc_platform_has() is called before "vendor" is explicitly set.
> > New calls can still get in, but they'll be much easier to detect and less likely
> > to escape initial testing.
> 
> The invalid vendor thing makes sense but I don't think it'll help in
> this case.
> 
> We set vendor in sme_enable() which comes before the
> 
> __startup_64 -> sme_postprocess_startup
> 
> path you're hitting.

Right, but that's easily solved, no?  E.g.

diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index e8f7953fda83..ed3118f5bf62 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -487,6 +487,8 @@ void __init sme_early_init(void)
        if (!sme_me_mask)
                return;

+       cc_set_vendor(CC_VENDOR_AMD);
+
        early_pmd_flags = __sme_set(early_pmd_flags);

        __supported_pte_mask = __sme_set(__supported_pte_mask);
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index f415498d3175..6b1c60032400 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -611,7 +611,6 @@ void __init sme_enable(struct boot_params *bp)
 out:
        if (sme_me_mask) {
                physical_mask &= ~sme_me_mask;
-               cc_set_vendor(CC_VENDOR_AMD);
                cc_set_mask(sme_me_mask);
        }
 }

And disallow cc_set_vendor() before x86_64_start_kernel(), then fix any fallout.

> We could do only the aspect of checking whether it hasn't been set yet
> and warn then, in order to make the usage more robust...
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
diff mbox series

Patch

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 83514b9827e6..656d2f3e2cf0 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -143,7 +143,20 @@  static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
 	if (sme_get_me_mask()) {
 		vaddr = (unsigned long)__start_bss_decrypted;
 		vaddr_end = (unsigned long)__end_bss_decrypted;
+
 		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+			/*
+			 * On SNP, transition the page to shared in the RMP table so that
+			 * it is consistent with the page table attribute change.
+			 *
+			 * __start_bss_decrypted has a virtual address in the high range
+			 * mapping (kernel .text). PVALIDATE, by way of
+			 * early_snp_set_memory_shared(), requires a valid virtual
+			 * address but the kernel is currently running off of the identity
+			 * mapping so use __pa() to get a *currently* valid virtual address.
+			 */
+			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+
 			i = pmd_index(vaddr);
 			pmd[i] -= sme_get_me_mask();
 		}