diff mbox series

[v8,13/40] x86/kernel: Make the bss.decrypted section shared in RMP table

Message ID 20211210154332.11526-14-brijesh.singh@amd.com (mailing list archive)
State New, archived
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Guest Support | expand

Commit Message

Brijesh Singh Dec. 10, 2021, 3:43 p.m. UTC
The encryption attribute for the bss.decrypted region is cleared in the
initial page table build. This is because the section contains the data
that need to be shared between the guest and the hypervisor.

When SEV-SNP is active, just clearing the encryption attribute in the
page table is not enough. The page state need to be updated in the RMP
table.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/head64.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Borislav Petkov Dec. 28, 2021, 11:53 a.m. UTC | #1
On Fri, Dec 10, 2021 at 09:43:05AM -0600, Brijesh Singh wrote:
> The encryption attribute for the bss.decrypted region is cleared in the

s/region/section/

s/bss.decrypted/.bss..decrypted/g

if you're going to call it by its name, use the correct one pls.

Ditto in the Subject.
Venu Busireddy Jan. 4, 2022, 5:56 p.m. UTC | #2
On 2021-12-10 09:43:05 -0600, Brijesh Singh wrote:
> The encryption attribute for the bss.decrypted region is cleared in the
> initial page table build. This is because the section contains the data
> that need to be shared between the guest and the hypervisor.
> 
> When SEV-SNP is active, just clearing the encryption attribute in the
> page table is not enough. The page state need to be updated in the RMP
> table.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kernel/head64.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index fa02402dcb9b..72c5082a3ba4 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -143,7 +143,14 @@ static unsigned long sme_postprocess_startup(struct boot_params *bp, pmdval_t *p
>  	if (sme_get_me_mask()) {
>  		vaddr = (unsigned long)__start_bss_decrypted;
>  		vaddr_end = (unsigned long)__end_bss_decrypted;
> +
>  		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> +			/*
> +			 * When SEV-SNP is active then transition the page to shared in the RMP
> +			 * table so that it is consistent with the page table attribute change.
> +			 */
> +			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);

Shouldn't the first argument be vaddr as below?

   			early_snp_set_memory_shared(vaddr, __pa(vaddr), PTRS_PER_PMD);

Venu
Brijesh Singh Jan. 5, 2022, 7:52 p.m. UTC | #3
On 1/4/22 11:56 AM, Venu Busireddy wrote:
> On 2021-12-10 09:43:05 -0600, Brijesh Singh wrote:
>> The encryption attribute for the bss.decrypted region is cleared in the
>> initial page table build. This is because the section contains the data
>> that need to be shared between the guest and the hypervisor.
>>
>> When SEV-SNP is active, just clearing the encryption attribute in the
>> page table is not enough. The page state need to be updated in the RMP
>> table.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>   arch/x86/kernel/head64.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index fa02402dcb9b..72c5082a3ba4 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -143,7 +143,14 @@ static unsigned long sme_postprocess_startup(struct boot_params *bp, pmdval_t *p
>>   	if (sme_get_me_mask()) {
>>   		vaddr = (unsigned long)__start_bss_decrypted;
>>   		vaddr_end = (unsigned long)__end_bss_decrypted;
>> +
>>   		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>> +			/*
>> +			 * When SEV-SNP is active then transition the page to shared in the RMP
>> +			 * table so that it is consistent with the page table attribute change.
>> +			 */
>> +			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
> 
> Shouldn't the first argument be vaddr as below?
> 

Nope, sme_postprocess_startup() is called while we are fixing the 
initial page table and running with identity mapping (so va == pa).

thanks

>     			early_snp_set_memory_shared(vaddr, __pa(vaddr), PTRS_PER_PMD);
> 
> Venu
>
Dave Hansen Jan. 5, 2022, 8:27 p.m. UTC | #4
On 1/5/22 11:52, Brijesh Singh wrote:
>>>           for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>>> +            /*
>>> +             * When SEV-SNP is active then transition the page to shared in the RMP
>>> +             * table so that it is consistent with the page table attribute change.
>>> +             */
>>> +            early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
>>
>> Shouldn't the first argument be vaddr as below?
> 
> Nope, sme_postprocess_startup() is called while we are fixing the 
> initial page table and running with identity mapping (so va == pa).

I'm not sure I've ever seen a line of code that wanted a comment so badly.
Brijesh Singh Jan. 5, 2022, 9:39 p.m. UTC | #5
On 1/5/22 2:27 PM, Dave Hansen wrote:
> On 1/5/22 11:52, Brijesh Singh wrote:
>>>>           for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>>>> +            /*
>>>> +             * When SEV-SNP is active then transition the page to 
>>>> shared in the RMP
>>>> +             * table so that it is consistent with the page table 
>>>> attribute change.
>>>> +             */
>>>> +            early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), 
>>>> PTRS_PER_PMD);
>>>
>>> Shouldn't the first argument be vaddr as below?
>>
>> Nope, sme_postprocess_startup() is called while we are fixing the 
>> initial page table and running with identity mapping (so va == pa).
> 
> I'm not sure I've ever seen a line of code that wanted a comment so badly.

The early_snp_set_memory_shared() call the PVALIDATE instruction to 
clear the validated bit from the BSS region. The PVALIDATE instruction 
needs a virtual address, so we need to use the identity mapped virtual 
address so that PVALIDATE can clear the validated bit. I will add more 
comments to clarify it.

-Brijesh
Venu Busireddy Jan. 6, 2022, 5:40 p.m. UTC | #6
On 2022-01-05 15:39:22 -0600, Brijesh Singh wrote:
> 
> 
> On 1/5/22 2:27 PM, Dave Hansen wrote:
> > On 1/5/22 11:52, Brijesh Singh wrote:
> > > > >           for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> > > > > +            /*
> > > > > +             * When SEV-SNP is active then transition the
> > > > > page to shared in the RMP
> > > > > +             * table so that it is consistent with the page
> > > > > table attribute change.
> > > > > +             */
> > > > > +            early_snp_set_memory_shared(__pa(vaddr),
> > > > > __pa(vaddr), PTRS_PER_PMD);
> > > > 
> > > > Shouldn't the first argument be vaddr as below?
> > > 
> > > Nope, sme_postprocess_startup() is called while we are fixing the
> > > initial page table and running with identity mapping (so va == pa).
> > 
> > I'm not sure I've ever seen a line of code that wanted a comment so badly.
> 
> The early_snp_set_memory_shared() call the PVALIDATE instruction to clear
> the validated bit from the BSS region. The PVALIDATE instruction needs a
> virtual address, so we need to use the identity mapped virtual address so
> that PVALIDATE can clear the validated bit. I will add more comments to
> clarify it.

Looking forward to see your final comments explaining this. I can't
still follow why, when PVALIDATE needs the virtual address, we are doing
a __pa() on the vaddr.

Venu
Tom Lendacky Jan. 6, 2022, 7:06 p.m. UTC | #7
On 1/6/22 11:40 AM, Venu Busireddy wrote:
> On 2022-01-05 15:39:22 -0600, Brijesh Singh wrote:
>>
>>
>> On 1/5/22 2:27 PM, Dave Hansen wrote:
>>> On 1/5/22 11:52, Brijesh Singh wrote:
>>>>>>            for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>>>>>> +            /*
>>>>>> +             * When SEV-SNP is active then transition the
>>>>>> page to shared in the RMP
>>>>>> +             * table so that it is consistent with the page
>>>>>> table attribute change.
>>>>>> +             */
>>>>>> +            early_snp_set_memory_shared(__pa(vaddr),
>>>>>> __pa(vaddr), PTRS_PER_PMD);
>>>>>
>>>>> Shouldn't the first argument be vaddr as below?
>>>>
>>>> Nope, sme_postprocess_startup() is called while we are fixing the
>>>> initial page table and running with identity mapping (so va == pa).
>>>
>>> I'm not sure I've ever seen a line of code that wanted a comment so badly.
>>
>> The early_snp_set_memory_shared() call the PVALIDATE instruction to clear
>> the validated bit from the BSS region. The PVALIDATE instruction needs a
>> virtual address, so we need to use the identity mapped virtual address so
>> that PVALIDATE can clear the validated bit. I will add more comments to
>> clarify it.
> 
> Looking forward to see your final comments explaining this. I can't
> still follow why, when PVALIDATE needs the virtual address, we are doing
> a __pa() on the vaddr.

It's because of the phase of booting that the kernel is in. At this point, 
the kernel is running in identity mapped mode (VA == PA). The 
__start_bss_decrypted address is a regular kernel address, e.g. for the 
kernel I'm on it is 0xffffffffa7600000. Since the PVALIDATE instruction 
requires a valid virtual address, the code needs to perform a __pa() 
against __start_bss_decrypted to get the identity mapped virtual address 
that is currently in place.

It is not until the .Ljump_to_C_code label in head_64.S that the 
addressing switches from identity mapped to kernel addresses.

Thanks,
Tom

> 
> Venu
>
Venu Busireddy Jan. 6, 2022, 8:16 p.m. UTC | #8
On 2022-01-06 13:06:13 -0600, Tom Lendacky wrote:
> On 1/6/22 11:40 AM, Venu Busireddy wrote:
> > On 2022-01-05 15:39:22 -0600, Brijesh Singh wrote:
> > > 
> > > 
> > > On 1/5/22 2:27 PM, Dave Hansen wrote:
> > > > On 1/5/22 11:52, Brijesh Singh wrote:
> > > > > > >            for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> > > > > > > +            /*
> > > > > > > +             * When SEV-SNP is active then transition the
> > > > > > > page to shared in the RMP
> > > > > > > +             * table so that it is consistent with the page
> > > > > > > table attribute change.
> > > > > > > +             */
> > > > > > > +            early_snp_set_memory_shared(__pa(vaddr),
> > > > > > > __pa(vaddr), PTRS_PER_PMD);
> > > > > > 
> > > > > > Shouldn't the first argument be vaddr as below?
> > > > > 
> > > > > Nope, sme_postprocess_startup() is called while we are fixing the
> > > > > initial page table and running with identity mapping (so va == pa).
> > > > 
> > > > I'm not sure I've ever seen a line of code that wanted a comment so badly.
> > > 
> > > The early_snp_set_memory_shared() call the PVALIDATE instruction to clear
> > > the validated bit from the BSS region. The PVALIDATE instruction needs a
> > > virtual address, so we need to use the identity mapped virtual address so
> > > that PVALIDATE can clear the validated bit. I will add more comments to
> > > clarify it.
> > 
> > Looking forward to see your final comments explaining this. I can't
> > still follow why, when PVALIDATE needs the virtual address, we are doing
> > a __pa() on the vaddr.
> 
> It's because of the phase of booting that the kernel is in. At this point,
> the kernel is running in identity mapped mode (VA == PA). The
> __start_bss_decrypted address is a regular kernel address, e.g. for the
> kernel I'm on it is 0xffffffffa7600000. Since the PVALIDATE instruction
> requires a valid virtual address, the code needs to perform a __pa() against
> __start_bss_decrypted to get the identity mapped virtual address that is
> currently in place.

Perhaps  my confusion stems from the fact that __pa(x) is defined either
as "((unsigned long ) (x))" (for the cases where paddr and vaddr are
same), or as "__phys_addr((unsigned long )(x))", where a vaddr needs to
be converted to a paddr. If the paddr and vaddr are same in our case,
what exactly is the _pa(vaddr) doing to the vaddr?

Venu
Tom Lendacky Jan. 6, 2022, 8:50 p.m. UTC | #9
On 1/6/22 2:16 PM, Venu Busireddy wrote:
> On 2022-01-06 13:06:13 -0600, Tom Lendacky wrote:
>> On 1/6/22 11:40 AM, Venu Busireddy wrote:
>>> On 2022-01-05 15:39:22 -0600, Brijesh Singh wrote:
>>>>
>>>>
>>>> On 1/5/22 2:27 PM, Dave Hansen wrote:
>>>>> On 1/5/22 11:52, Brijesh Singh wrote:
>>>>>>>>             for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>>>>>>>> +            /*
>>>>>>>> +             * When SEV-SNP is active then transition the
>>>>>>>> page to shared in the RMP
>>>>>>>> +             * table so that it is consistent with the page
>>>>>>>> table attribute change.
>>>>>>>> +             */
>>>>>>>> +            early_snp_set_memory_shared(__pa(vaddr),
>>>>>>>> __pa(vaddr), PTRS_PER_PMD);
>>>>>>>
>>>>>>> Shouldn't the first argument be vaddr as below?
>>>>>>
>>>>>> Nope, sme_postprocess_startup() is called while we are fixing the
>>>>>> initial page table and running with identity mapping (so va == pa).
>>>>>
>>>>> I'm not sure I've ever seen a line of code that wanted a comment so badly.
>>>>
>>>> The early_snp_set_memory_shared() call the PVALIDATE instruction to clear
>>>> the validated bit from the BSS region. The PVALIDATE instruction needs a
>>>> virtual address, so we need to use the identity mapped virtual address so
>>>> that PVALIDATE can clear the validated bit. I will add more comments to
>>>> clarify it.
>>>
>>> Looking forward to see your final comments explaining this. I can't
>>> still follow why, when PVALIDATE needs the virtual address, we are doing
>>> a __pa() on the vaddr.
>>
>> It's because of the phase of booting that the kernel is in. At this point,
>> the kernel is running in identity mapped mode (VA == PA). The
>> __start_bss_decrypted address is a regular kernel address, e.g. for the
>> kernel I'm on it is 0xffffffffa7600000. Since the PVALIDATE instruction
>> requires a valid virtual address, the code needs to perform a __pa() against
>> __start_bss_decrypted to get the identity mapped virtual address that is
>> currently in place.
> 
> Perhaps  my confusion stems from the fact that __pa(x) is defined either
> as "((unsigned long ) (x))" (for the cases where paddr and vaddr are
> same), or as "__phys_addr((unsigned long )(x))", where a vaddr needs to
> be converted to a paddr. If the paddr and vaddr are same in our case,
> what exactly is the _pa(vaddr) doing to the vaddr?

But they are not the same and the head64.c file is compiled without 
defining a value for __pa(), so __pa() is __phys_addr((unsigned long)(x)). 
The virtual address value of __start_bss_decrypted, for me, is 
0xffffffffa7600000, and that does not equal the physical address (take a 
look at your /proc/kallsyms). However, since the code is running identity 
mapped and with a page table without kernel virtual addresses, it cannot 
use that value. It needs to convert that value to the identity mapped 
virtual address and that is done using __pa(). Only after using __pa() on 
__start_bss_decrypted, do you get a virtual address that maps to and is 
equal to the physical address.

You may want to step through the boot code using KVM to see what the 
environment is and why things are done the way they are.

Thanks,
Tom

> 
> Venu
>
diff mbox series

Patch

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fa02402dcb9b..72c5082a3ba4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -143,7 +143,14 @@  static unsigned long sme_postprocess_startup(struct boot_params *bp, pmdval_t *p
 	if (sme_get_me_mask()) {
 		vaddr = (unsigned long)__start_bss_decrypted;
 		vaddr_end = (unsigned long)__end_bss_decrypted;
+
 		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+			/*
+			 * When SEV-SNP is active then transition the page to shared in the RMP
+			 * table so that it is consistent with the page table attribute change.
+			 */
+			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+
 			i = pmd_index(vaddr);
 			pmd[i] -= sme_get_me_mask();
 		}