[PATCHv5,06/19] mm/khugepaged: Handle encrypted pages

Message ID	20180717112029.42378-7-kirill.shutemov@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <owner-linux-mm@kvack.org> Received-SPF: pass (google.com: best guess record for domain of kirill.shutemov@linux.intel.com designates 134.134.136.65 as permitted sender) client-ip=134.134.136.65; From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> To: Ingo Molnar <mingo@redhat.com>, x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Tom Lendacky <thomas.lendacky@amd.com> Cc: Dave Hansen <dave.hansen@intel.com>, Kai Huang <kai.huang@linux.intel.com>, Jacob Pan <jacob.jun.pan@linux.intel.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Subject: [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages Date: Tue, 17 Jul 2018 14:20:16 +0300 Message-Id: <20180717112029.42378-7-kirill.shutemov@linux.intel.com> In-Reply-To: <20180717112029.42378-1-kirill.shutemov@linux.intel.com> References: <20180717112029.42378-1-kirill.shutemov@linux.intel.com> Sender: owner-linux-mm@kvack.org Precedence: bulk

Message ID

20180717112029.42378-7-kirill.shutemov@linux.intel.com (mailing list archive)

State

New, archived

Headers

Received-SPF: pass (google.com: best guess record for domain of
	kirill.shutemov@linux.intel.com designates 134.134.136.65 as
	permitted sender) client-ip=134.134.136.65; 
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Ingo Molnar <mingo@redhat.com>, x86@kernel.org,
	Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Kai Huang <kai.huang@linux.intel.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv5 06/19] mm/khugepaged: Handle encrypted pages
Date: Tue, 17 Jul 2018 14:20:16 +0300
Message-Id: <20180717112029.42378-7-kirill.shutemov@linux.intel.com>
In-Reply-To: <20180717112029.42378-1-kirill.shutemov@linux.intel.com>
References: <20180717112029.42378-1-kirill.shutemov@linux.intel.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Commit Message

Kirill A . Shutemov July 17, 2018, 11:20 a.m. UTC

khugepaged allocates page in advance, before we found a VMA for
collapse. We don't yet know which KeyID to use for the allocation.

The page is allocated with KeyID-0. Once we know that the VMA is
suitable for collapsing, we prepare the page for KeyID we need, based on
vma_keyid().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/khugepaged.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Dave Hansen July 18, 2018, 11:11 p.m. UTC | #1

On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> khugepaged allocates page in advance, before we found a VMA for
> collapse. We don't yet know which KeyID to use for the allocation.

That's not really true.  We have the VMA and the address in the caller
(khugepaged_scan_pmd()), but we drop the lock and have to revalidate the
VMA.


> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 5ae34097aed1..d116f4ebb622 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 */
>  	anon_vma_unlock_write(vma->anon_vma);
>  
> +	/*
> +	 * At this point new_page is allocated as non-encrypted.
> +	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
> +	 * before coping data.
> +	 */
> +	if (vma_keyid(vma)) {
> +		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
> +				vma_keyid(vma), false);
> +	}

I guess this isn't horribly problematic now, but if we ever keep pools
of preassigned-keyids, this won't work any more.

Kirill A. Shutemov July 19, 2018, 8:59 a.m. UTC | #2

On Wed, Jul 18, 2018 at 04:11:57PM -0700, Dave Hansen wrote:
> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> > khugepaged allocates page in advance, before we found a VMA for
> > collapse. We don't yet know which KeyID to use for the allocation.
> 
> That's not really true.  We have the VMA and the address in the caller
> (khugepaged_scan_pmd()), but we drop the lock and have to revalidate the
> VMA.

For !NUMA we allocate the page in khugepaged_do_scan(), well before we
know VMA.

> 
> 
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 5ae34097aed1..d116f4ebb622 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
> >  	 */
> >  	anon_vma_unlock_write(vma->anon_vma);
> >  
> > +	/*
> > +	 * At this point new_page is allocated as non-encrypted.
> > +	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
> > +	 * before coping data.
> > +	 */
> > +	if (vma_keyid(vma)) {
> > +		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
> > +				vma_keyid(vma), false);
> > +	}
> 
> I guess this isn't horribly problematic now, but if we ever keep pools
> of preassigned-keyids, this won't work any more.

I don't get this. What pools of preassigned-keyids are you talking about?

Dave Hansen July 19, 2018, 2:13 p.m. UTC | #3

On 07/19/2018 01:59 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 18, 2018 at 04:11:57PM -0700, Dave Hansen wrote:
>> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
>>> khugepaged allocates page in advance, before we found a VMA for
>>> collapse. We don't yet know which KeyID to use for the allocation.
>>
>> That's not really true.  We have the VMA and the address in the caller
>> (khugepaged_scan_pmd()), but we drop the lock and have to revalidate the
>> VMA.
> 
> For !NUMA we allocate the page in khugepaged_do_scan(), well before we
> know VMA.

Ahh, thanks for clarifying.  That's some more very good information
about the design and progression of your patch that belongs in the
changelog.

>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 5ae34097aed1..d116f4ebb622 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
>>>  	 */
>>>  	anon_vma_unlock_write(vma->anon_vma);
>>>  
>>> +	/*
>>> +	 * At this point new_page is allocated as non-encrypted.
>>> +	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
>>> +	 * before coping data.
>>> +	 */
>>> +	if (vma_keyid(vma)) {
>>> +		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
>>> +				vma_keyid(vma), false);
>>> +	}
>>
>> I guess this isn't horribly problematic now, but if we ever keep pools
>> of preassigned-keyids, this won't work any more.
> 
> I don't get this. What pools of preassigned-keyids are you talking about?

My point was that if we ever teach the allocator or something _near_ the
allocator to keep pools of pre-zeroed and/or pre-cache-cleared pages,
this approach will need to get changed otherwise we will double-prep pages.

My overall concern with prep_encrypted_page() in this patch set is that
it's inserted pretty ad-hoc.  It seems easy to miss spots where it
should be.  I'm also unsure of the failure mode and anything we've done
to ensure that if we get this wrong, we scream clearly and loudly about
what happened.  Do we do something like that?

Kirill A. Shutemov July 20, 2018, 12:29 p.m. UTC | #4

On Thu, Jul 19, 2018 at 07:13:39AM -0700, Dave Hansen wrote:
> On 07/19/2018 01:59 AM, Kirill A. Shutemov wrote:
> > On Wed, Jul 18, 2018 at 04:11:57PM -0700, Dave Hansen wrote:
> >> On 07/17/2018 04:20 AM, Kirill A. Shutemov wrote:
> >>> khugepaged allocates page in advance, before we found a VMA for
> >>> collapse. We don't yet know which KeyID to use for the allocation.
> >>
> >> That's not really true.  We have the VMA and the address in the caller
> >> (khugepaged_scan_pmd()), but we drop the lock and have to revalidate the
> >> VMA.
> > 
> > For !NUMA we allocate the page in khugepaged_do_scan(), well before we
> > know VMA.
> 
> Ahh, thanks for clarifying.  That's some more very good information
> about the design and progression of your patch that belongs in the
> changelog.

Okay.

> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>> index 5ae34097aed1..d116f4ebb622 100644
> >>> --- a/mm/khugepaged.c
> >>> +++ b/mm/khugepaged.c
> >>> @@ -1056,6 +1056,16 @@ static void collapse_huge_page(struct mm_struct *mm,
> >>>  	 */
> >>>  	anon_vma_unlock_write(vma->anon_vma);
> >>>  
> >>> +	/*
> >>> +	 * At this point new_page is allocated as non-encrypted.
> >>> +	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
> >>> +	 * before coping data.
> >>> +	 */
> >>> +	if (vma_keyid(vma)) {
> >>> +		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
> >>> +				vma_keyid(vma), false);
> >>> +	}
> >>
> >> I guess this isn't horribly problematic now, but if we ever keep pools
> >> of preassigned-keyids, this won't work any more.
> > 
> > I don't get this. What pools of preassigned-keyids are you talking about?
> 
> My point was that if we ever teach the allocator or something _near_ the
> allocator to keep pools of pre-zeroed and/or pre-cache-cleared pages,
> this approach will need to get changed otherwise we will double-prep pages.

It shouldn't be a problem here. It's pretty slow path. We often wait
memory to be compacted before page for khugepaged gets allocated.
Double-prep shouldn't have visible impact.

> My overall concern with prep_encrypted_page() in this patch set is that
> it's inserted pretty ad-hoc.  It seems easy to miss spots where it
> should be.  I'm also unsure of the failure mode and anything we've done
> to ensure that if we get this wrong, we scream clearly and loudly about
> what happened.  Do we do something like that?

I have debugging patch that puts BUG_ONs around set_pte_at() to check if
the page's keyid matches VMA's keyid. But that's not very systematic.
We would need something better than this.

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5ae34097aed1..d116f4ebb622 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1056,6 +1056,16 @@  static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	anon_vma_unlock_write(vma->anon_vma);
 
+	/*
+	 * At this point new_page is allocated as non-encrypted.
+	 * If VMA's KeyID is non-zero, we need to prepare it to be encrypted
+	 * before coping data.
+	 */
+	if (vma_keyid(vma)) {
+		prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
+				vma_keyid(vma), false);
+	}
+
 	__collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
 	pte_unmap(pte);
 	__SetPageUptodate(new_page);

[PATCHv5,06/19] mm/khugepaged: Handle encrypted pages

Commit Message

Comments

Patch