Message ID | 20200214222658.12946-2-borntraeger@de.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: s390: Add support for protected VMs | expand |
On 14.02.20 23:26, Christian Borntraeger wrote: > From: Claudio Imbrenda <imbrenda@linux.ibm.com> > > With the introduction of protected KVM guests on s390 there is now a > concept of inaccessible pages. These pages need to be made accessible > before the host can access them. > > While cpu accesses will trigger a fault that can be resolved, I/O > accesses will just fail. We need to add a callback into architecture > code for places that will do I/O, namely when writeback is started or > when a page reference is taken. > > This is not only to enable paging, file backing etc, it is also > necessary to protect the host against a malicious user space. For > example a bad QEMU could simply start direct I/O on such protected > memory. We do not want userspace to be able to trigger I/O errors and > thus we the logic is "whenever somebody accesses that page (gup) or > doing I/O, make sure that this page can be accessed. When the guest > tries to access that page we will wait in the page fault handler for > writeback to have finished and for the page_ref to be the expected > value. > > If wanted by others, the callbacks can be extended with error handlin > and a parameter from where this is called. s/handlin/handling/ One last question from my side: Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases" into this patch. I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests", that the call can fail for various reasons. That puzzles me a bit - what would happen if any of that fails? Or will it actually never fail for s390x (and all that error handling in arch_make_page_accessible() is essentially dead code in real life?)
On 17.02.20 10:14, David Hildenbrand wrote: > On 14.02.20 23:26, Christian Borntraeger wrote: >> From: Claudio Imbrenda <imbrenda@linux.ibm.com> >> >> With the introduction of protected KVM guests on s390 there is now a >> concept of inaccessible pages. These pages need to be made accessible >> before the host can access them. >> >> While cpu accesses will trigger a fault that can be resolved, I/O >> accesses will just fail. We need to add a callback into architecture >> code for places that will do I/O, namely when writeback is started or >> when a page reference is taken. >> >> This is not only to enable paging, file backing etc, it is also >> necessary to protect the host against a malicious user space. For >> example a bad QEMU could simply start direct I/O on such protected >> memory. We do not want userspace to be able to trigger I/O errors and >> thus we the logic is "whenever somebody accesses that page (gup) or >> doing I/O, make sure that this page can be accessed. When the guest >> tries to access that page we will wait in the page fault handler for >> writeback to have finished and for the page_ref to be the expected >> value. >> >> If wanted by others, the callbacks can be extended with error handlin >> and a parameter from where this is called. > > s/handlin/handling/ ack. > > One last question from my side: > > Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2 > 39/42] example for future extension: mm:gup/writeback: add callbacks for > inaccessible pages: error cases" into this patch. I can certainly squash this in. But I have never heard feedback from the memory management people if they would prefer the patch as-is (no overhead at all for !s390) or already with the error handling. > > I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management > functions for protected KVM guests", that the call can fail for various > reasons. That puzzles me a bit - what would happen if any of that fails? The convert _to_ secure has error handling. uv_convert_from_secure (the one that is used for arch_make_page_accessible can only fail if we mess up, e.g. an "export" on memory that we have donated to the ultravisor. > Or will it actually never fail for s390x (and all that error handling in > arch_make_page_accessible() is essentially dead code in real life?) So yes, if everything is setup properly this should not fail in real life and only we have a kernel (or firmware) bug.
On 17.02.20 12:10, Christian Borntraeger wrote: > > > On 17.02.20 10:14, David Hildenbrand wrote: >> On 14.02.20 23:26, Christian Borntraeger wrote: >>> From: Claudio Imbrenda <imbrenda@linux.ibm.com> >>> >>> With the introduction of protected KVM guests on s390 there is now a >>> concept of inaccessible pages. These pages need to be made accessible >>> before the host can access them. >>> >>> While cpu accesses will trigger a fault that can be resolved, I/O >>> accesses will just fail. We need to add a callback into architecture >>> code for places that will do I/O, namely when writeback is started or >>> when a page reference is taken. >>> >>> This is not only to enable paging, file backing etc, it is also >>> necessary to protect the host against a malicious user space. For >>> example a bad QEMU could simply start direct I/O on such protected >>> memory. We do not want userspace to be able to trigger I/O errors and >>> thus we the logic is "whenever somebody accesses that page (gup) or >>> doing I/O, make sure that this page can be accessed. When the guest >>> tries to access that page we will wait in the page fault handler for >>> writeback to have finished and for the page_ref to be the expected >>> value. >>> >>> If wanted by others, the callbacks can be extended with error handlin >>> and a parameter from where this is called. >> >> s/handlin/handling/ > > ack. >> >> One last question from my side: >> >> Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2 >> 39/42] example for future extension: mm:gup/writeback: add callbacks for >> inaccessible pages: error cases" into this patch. > > I can certainly squash this in. But I have never heard feedback from the > memory management people if they would prefer the patch as-is (no overhead > at all for !s390) or already with the error handling. >> >> I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management >> functions for protected KVM guests", that the call can fail for various >> reasons. That puzzles me a bit - what would happen if any of that fails? > > The convert _to_ secure has error handling. uv_convert_from_secure (the > one that is used for arch_make_page_accessible can only fail if we mess up, > e.g. an "export" on memory that we have donated to the ultravisor. Okay, so more like a BUG_ON() in case it really fails. > > >> Or will it actually never fail for s390x (and all that error handling in >> arch_make_page_accessible() is essentially dead code in real life?) > > So yes, if everything is setup properly this should not fail in real life > and only we have a kernel (or firmware) bug. > Then, without feedback from other possible users, this should be a void function. So either introduce error handling or convert it to a void for now (and add e.g., BUG_ON and a comment inside the s390x implementation). As discussed in the other patch, I'd prefer a CONFIG_HAVE_ARCH_MAKE_PAGE_ACCESSIBLE and handle it via kconfig, but not sure what other people here prefer.
On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote: > On 17.02.20 12:10, Christian Borntraeger wrote: > > So yes, if everything is setup properly this should not fail in real life > > and only we have a kernel (or firmware) bug. > > > > Then, without feedback from other possible users, this should be a void > function. So either introduce error handling or convert it to a void for > now (and add e.g., BUG_ON and a comment inside the s390x implementation). My preference would also be for a void function (versus ignoring an int return).
On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote: > On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote: > > On 17.02.20 12:10, Christian Borntraeger wrote: > > > So yes, if everything is setup properly this should not fail in real life > > > and only we have a kernel (or firmware) bug. > > > > > > > Then, without feedback from other possible users, this should be a void > > function. So either introduce error handling or convert it to a void for > > now (and add e.g., BUG_ON and a comment inside the s390x implementation). > > My preference would also be for a void function (versus ignoring an int > return). The gup code could certainly handle the error value, although the writeback is a lot less clear (so a BUG_ON() would seem to be sufficient for now). Will
On 18.02.20 17:02, Will Deacon wrote: > On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote: >> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote: >>> On 17.02.20 12:10, Christian Borntraeger wrote: >>>> So yes, if everything is setup properly this should not fail in real life >>>> and only we have a kernel (or firmware) bug. >>>> >>> >>> Then, without feedback from other possible users, this should be a void >>> function. So either introduce error handling or convert it to a void for >>> now (and add e.g., BUG_ON and a comment inside the s390x implementation). >> >> My preference would also be for a void function (versus ignoring an int >> return). > > The gup code could certainly handle the error value, although the writeback > is a lot less clear (so a BUG_ON() would seem to be sufficient for now). Sean, David. Can we agree on merging patch 39?
On Tue, Feb 18, 2020 at 05:15:57PM +0100, Christian Borntraeger wrote: > > > On 18.02.20 17:02, Will Deacon wrote: > > On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote: > >> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote: > >>> On 17.02.20 12:10, Christian Borntraeger wrote: > >>>> So yes, if everything is setup properly this should not fail in real life > >>>> and only we have a kernel (or firmware) bug. > >>>> > >>> > >>> Then, without feedback from other possible users, this should be a void > >>> function. So either introduce error handling or convert it to a void for > >>> now (and add e.g., BUG_ON and a comment inside the s390x implementation). > >> > >> My preference would also be for a void function (versus ignoring an int > >> return). > > > > The gup code could certainly handle the error value, although the writeback > > is a lot less clear (so a BUG_ON() would seem to be sufficient for now). > > Sean, David. Can we agree on merging patch 39? I'm a-ok with adding error checking, ignoring the return value is the only option I don't like :-)
On 18.02.20 22:35, Sean Christopherson wrote: > On Tue, Feb 18, 2020 at 05:15:57PM +0100, Christian Borntraeger wrote: >> >> >> On 18.02.20 17:02, Will Deacon wrote: >>> On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote: >>>> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote: >>>>> On 17.02.20 12:10, Christian Borntraeger wrote: >>>>>> So yes, if everything is setup properly this should not fail in real life >>>>>> and only we have a kernel (or firmware) bug. >>>>>> >>>>> >>>>> Then, without feedback from other possible users, this should be a void >>>>> function. So either introduce error handling or convert it to a void for >>>>> now (and add e.g., BUG_ON and a comment inside the s390x implementation). >>>> >>>> My preference would also be for a void function (versus ignoring an int >>>> return). >>> >>> The gup code could certainly handle the error value, although the writeback >>> is a lot less clear (so a BUG_ON() would seem to be sufficient for now). >> >> Sean, David. Can we agree on merging patch 39? > > I'm a-ok with adding error checking, ignoring the return value is the only > option I don't like :-) Same over here :)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h index e5b817cb86e7..be2754841369 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { } #ifndef HAVE_ARCH_ALLOC_PAGE static inline void arch_alloc_page(struct page *page, int order) { } #endif +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE +static inline int arch_make_page_accessible(struct page *page) +{ + return 0; +} +#endif struct page * __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, diff --git a/mm/gup.c b/mm/gup.c index 1b521e0ac1de..a1c15d029f7c 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -276,6 +276,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, page = ERR_PTR(-ENOMEM); goto out; } + arch_make_page_accessible(page); } if (flags & FOLL_TOUCH) { if ((flags & FOLL_WRITE) && @@ -1919,6 +1920,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON_PAGE(compound_head(page) != head, page); + arch_make_page_accessible(page); SetPageReferenced(page); pages[*nr] = page; (*nr)++; diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 2caf780a42e7..4c020e4ae71c 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2807,6 +2807,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write) inc_zone_page_state(page, NR_ZONE_WRITE_PENDING); } unlock_page_memcg(page); + arch_make_page_accessible(page); return ret; }