diff mbox series

[v2,01/42] mm:gup/writeback: add callbacks for inaccessible pages

Message ID 20200214222658.12946-2-borntraeger@de.ibm.com (mailing list archive)
State New, archived
Headers show
Series KVM: s390: Add support for protected VMs | expand

Commit Message

Christian Borntraeger Feb. 14, 2020, 10:26 p.m. UTC
From: Claudio Imbrenda <imbrenda@linux.ibm.com>

With the introduction of protected KVM guests on s390 there is now a
concept of inaccessible pages. These pages need to be made accessible
before the host can access them.

While cpu accesses will trigger a fault that can be resolved, I/O
accesses will just fail.  We need to add a callback into architecture
code for places that will do I/O, namely when writeback is started or
when a page reference is taken.

This is not only to enable paging, file backing etc, it is also
necessary to protect the host against a malicious user space. For
example a bad QEMU could simply start direct I/O on such protected
memory.  We do not want userspace to be able to trigger I/O errors and
thus we the logic is "whenever somebody accesses that page (gup) or
doing I/O, make sure that this page can be accessed. When the guest
tries to access that page we will wait in the page fault handler for
writeback to have finished and for the page_ref to be the expected
value.

If wanted by others, the callbacks can be extended with error handlin
and a parameter from where this is called.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/linux/gfp.h | 6 ++++++
 mm/gup.c            | 2 ++
 mm/page-writeback.c | 1 +
 3 files changed, 9 insertions(+)

Comments

David Hildenbrand Feb. 17, 2020, 9:14 a.m. UTC | #1
On 14.02.20 23:26, Christian Borntraeger wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> With the introduction of protected KVM guests on s390 there is now a
> concept of inaccessible pages. These pages need to be made accessible
> before the host can access them.
> 
> While cpu accesses will trigger a fault that can be resolved, I/O
> accesses will just fail.  We need to add a callback into architecture
> code for places that will do I/O, namely when writeback is started or
> when a page reference is taken.
> 
> This is not only to enable paging, file backing etc, it is also
> necessary to protect the host against a malicious user space. For
> example a bad QEMU could simply start direct I/O on such protected
> memory.  We do not want userspace to be able to trigger I/O errors and
> thus we the logic is "whenever somebody accesses that page (gup) or
> doing I/O, make sure that this page can be accessed. When the guest
> tries to access that page we will wait in the page fault handler for
> writeback to have finished and for the page_ref to be the expected
> value.
> 
> If wanted by others, the callbacks can be extended with error handlin
> and a parameter from where this is called.

s/handlin/handling/

One last question from my side:

Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2
39/42] example for future extension: mm:gup/writeback: add callbacks for
inaccessible pages: error cases" into this patch.

I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management
functions for protected KVM guests", that the call can fail for various
reasons. That puzzles me a bit - what would happen if any of that fails?
Or will it actually never fail for s390x (and all that error handling in
arch_make_page_accessible() is essentially dead code in real life?)
Christian Borntraeger Feb. 17, 2020, 11:10 a.m. UTC | #2
On 17.02.20 10:14, David Hildenbrand wrote:
> On 14.02.20 23:26, Christian Borntraeger wrote:
>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>
>> With the introduction of protected KVM guests on s390 there is now a
>> concept of inaccessible pages. These pages need to be made accessible
>> before the host can access them.
>>
>> While cpu accesses will trigger a fault that can be resolved, I/O
>> accesses will just fail.  We need to add a callback into architecture
>> code for places that will do I/O, namely when writeback is started or
>> when a page reference is taken.
>>
>> This is not only to enable paging, file backing etc, it is also
>> necessary to protect the host against a malicious user space. For
>> example a bad QEMU could simply start direct I/O on such protected
>> memory.  We do not want userspace to be able to trigger I/O errors and
>> thus we the logic is "whenever somebody accesses that page (gup) or
>> doing I/O, make sure that this page can be accessed. When the guest
>> tries to access that page we will wait in the page fault handler for
>> writeback to have finished and for the page_ref to be the expected
>> value.
>>
>> If wanted by others, the callbacks can be extended with error handlin
>> and a parameter from where this is called.
> 
> s/handlin/handling/

ack. 
> 
> One last question from my side:
> 
> Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2
> 39/42] example for future extension: mm:gup/writeback: add callbacks for
> inaccessible pages: error cases" into this patch.

I can certainly squash this in. But I have never heard feedback from the 
memory management people if they would prefer the patch as-is (no overhead
at all for !s390) or already with the error handling. 
> 
> I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management
> functions for protected KVM guests", that the call can fail for various
> reasons. That puzzles me a bit - what would happen if any of that fails?

The convert _to_ secure has error handling. uv_convert_from_secure (the
one that is used for arch_make_page_accessible can only fail if we mess up,
e.g. an "export" on memory that we have donated to the ultravisor.


> Or will it actually never fail for s390x (and all that error handling in
> arch_make_page_accessible() is essentially dead code in real life?)

So yes, if everything is setup properly this should not fail in real life
and only we have a kernel (or firmware) bug.
David Hildenbrand Feb. 18, 2020, 8:27 a.m. UTC | #3
On 17.02.20 12:10, Christian Borntraeger wrote:
> 
> 
> On 17.02.20 10:14, David Hildenbrand wrote:
>> On 14.02.20 23:26, Christian Borntraeger wrote:
>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>
>>> With the introduction of protected KVM guests on s390 there is now a
>>> concept of inaccessible pages. These pages need to be made accessible
>>> before the host can access them.
>>>
>>> While cpu accesses will trigger a fault that can be resolved, I/O
>>> accesses will just fail.  We need to add a callback into architecture
>>> code for places that will do I/O, namely when writeback is started or
>>> when a page reference is taken.
>>>
>>> This is not only to enable paging, file backing etc, it is also
>>> necessary to protect the host against a malicious user space. For
>>> example a bad QEMU could simply start direct I/O on such protected
>>> memory.  We do not want userspace to be able to trigger I/O errors and
>>> thus we the logic is "whenever somebody accesses that page (gup) or
>>> doing I/O, make sure that this page can be accessed. When the guest
>>> tries to access that page we will wait in the page fault handler for
>>> writeback to have finished and for the page_ref to be the expected
>>> value.
>>>
>>> If wanted by others, the callbacks can be extended with error handlin
>>> and a parameter from where this is called.
>>
>> s/handlin/handling/
> 
> ack. 
>>
>> One last question from my side:
>>
>> Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2
>> 39/42] example for future extension: mm:gup/writeback: add callbacks for
>> inaccessible pages: error cases" into this patch.
> 
> I can certainly squash this in. But I have never heard feedback from the 
> memory management people if they would prefer the patch as-is (no overhead
> at all for !s390) or already with the error handling. 
>>
>> I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management
>> functions for protected KVM guests", that the call can fail for various
>> reasons. That puzzles me a bit - what would happen if any of that fails?
> 
> The convert _to_ secure has error handling. uv_convert_from_secure (the
> one that is used for arch_make_page_accessible can only fail if we mess up,
> e.g. an "export" on memory that we have donated to the ultravisor.

Okay, so more like a BUG_ON() in case it really fails.

> 
> 
>> Or will it actually never fail for s390x (and all that error handling in
>> arch_make_page_accessible() is essentially dead code in real life?)
> 
> So yes, if everything is setup properly this should not fail in real life
> and only we have a kernel (or firmware) bug.
> 

Then, without feedback from other possible users, this should be a void
function. So either introduce error handling or convert it to a void for
now (and add e.g., BUG_ON and a comment inside the s390x implementation).

As discussed in the other patch, I'd prefer a
CONFIG_HAVE_ARCH_MAKE_PAGE_ACCESSIBLE and handle it via kconfig, but not
sure what other people here prefer.
Sean Christopherson Feb. 18, 2020, 3:46 p.m. UTC | #4
On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
> On 17.02.20 12:10, Christian Borntraeger wrote:
> > So yes, if everything is setup properly this should not fail in real life
> > and only we have a kernel (or firmware) bug.
> > 
> 
> Then, without feedback from other possible users, this should be a void
> function. So either introduce error handling or convert it to a void for
> now (and add e.g., BUG_ON and a comment inside the s390x implementation).

My preference would also be for a void function (versus ignoring an int
return).
Will Deacon Feb. 18, 2020, 4:02 p.m. UTC | #5
On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote:
> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
> > On 17.02.20 12:10, Christian Borntraeger wrote:
> > > So yes, if everything is setup properly this should not fail in real life
> > > and only we have a kernel (or firmware) bug.
> > > 
> > 
> > Then, without feedback from other possible users, this should be a void
> > function. So either introduce error handling or convert it to a void for
> > now (and add e.g., BUG_ON and a comment inside the s390x implementation).
> 
> My preference would also be for a void function (versus ignoring an int
> return).

The gup code could certainly handle the error value, although the writeback
is a lot less clear (so a BUG_ON() would seem to be sufficient for now).

Will
Christian Borntraeger Feb. 18, 2020, 4:15 p.m. UTC | #6
On 18.02.20 17:02, Will Deacon wrote:
> On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote:
>> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
>>> On 17.02.20 12:10, Christian Borntraeger wrote:
>>>> So yes, if everything is setup properly this should not fail in real life
>>>> and only we have a kernel (or firmware) bug.
>>>>
>>>
>>> Then, without feedback from other possible users, this should be a void
>>> function. So either introduce error handling or convert it to a void for
>>> now (and add e.g., BUG_ON and a comment inside the s390x implementation).
>>
>> My preference would also be for a void function (versus ignoring an int
>> return).
> 
> The gup code could certainly handle the error value, although the writeback
> is a lot less clear (so a BUG_ON() would seem to be sufficient for now).

Sean, David. Can we agree on merging patch 39?
Sean Christopherson Feb. 18, 2020, 9:35 p.m. UTC | #7
On Tue, Feb 18, 2020 at 05:15:57PM +0100, Christian Borntraeger wrote:
> 
> 
> On 18.02.20 17:02, Will Deacon wrote:
> > On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote:
> >> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
> >>> On 17.02.20 12:10, Christian Borntraeger wrote:
> >>>> So yes, if everything is setup properly this should not fail in real life
> >>>> and only we have a kernel (or firmware) bug.
> >>>>
> >>>
> >>> Then, without feedback from other possible users, this should be a void
> >>> function. So either introduce error handling or convert it to a void for
> >>> now (and add e.g., BUG_ON and a comment inside the s390x implementation).
> >>
> >> My preference would also be for a void function (versus ignoring an int
> >> return).
> > 
> > The gup code could certainly handle the error value, although the writeback
> > is a lot less clear (so a BUG_ON() would seem to be sufficient for now).
> 
> Sean, David. Can we agree on merging patch 39?

I'm a-ok with adding error checking, ignoring the return value is the only
option I don't like :-)
David Hildenbrand Feb. 19, 2020, 8:31 a.m. UTC | #8
On 18.02.20 22:35, Sean Christopherson wrote:
> On Tue, Feb 18, 2020 at 05:15:57PM +0100, Christian Borntraeger wrote:
>>
>>
>> On 18.02.20 17:02, Will Deacon wrote:
>>> On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote:
>>>> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
>>>>> On 17.02.20 12:10, Christian Borntraeger wrote:
>>>>>> So yes, if everything is setup properly this should not fail in real life
>>>>>> and only we have a kernel (or firmware) bug.
>>>>>>
>>>>>
>>>>> Then, without feedback from other possible users, this should be a void
>>>>> function. So either introduce error handling or convert it to a void for
>>>>> now (and add e.g., BUG_ON and a comment inside the s390x implementation).
>>>>
>>>> My preference would also be for a void function (versus ignoring an int
>>>> return).
>>>
>>> The gup code could certainly handle the error value, although the writeback
>>> is a lot less clear (so a BUG_ON() would seem to be sufficient for now).
>>
>> Sean, David. Can we agree on merging patch 39?
> 
> I'm a-ok with adding error checking, ignoring the return value is the only
> option I don't like :-)

Same over here :)
diff mbox series

Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index e5b817cb86e7..be2754841369 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -485,6 +485,12 @@  static inline void arch_free_page(struct page *page, int order) { }
 #ifndef HAVE_ARCH_ALLOC_PAGE
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
+#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+static inline int arch_make_page_accessible(struct page *page)
+{
+	return 0;
+}
+#endif
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..a1c15d029f7c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -276,6 +276,7 @@  static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = ERR_PTR(-ENOMEM);
 			goto out;
 		}
+		arch_make_page_accessible(page);
 	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
@@ -1919,6 +1920,7 @@  static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
+		arch_make_page_accessible(page);
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2caf780a42e7..4c020e4ae71c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2807,6 +2807,7 @@  int __test_set_page_writeback(struct page *page, bool keep_write)
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
+	arch_make_page_accessible(page);
 	return ret;
 
 }