mbox series

[0/3] Use arch_make_folio_accessible() everywhere

Message ID 20230915172829.2632994-1-willy@infradead.org (mailing list archive)
Headers show
Series Use arch_make_folio_accessible() everywhere | expand

Message

Matthew Wilcox Sept. 15, 2023, 5:28 p.m. UTC
We introduced arch_make_folio_accessible() a couple of years
ago, and it's in use in the page writeback path.  GUP still uses
arch_make_page_accessible(), which means that we can succeed in making
a single page of a folio accessible, then fail to make the rest of the
folio accessible when it comes time to do writeback and it's too late
to do anything about it.  I'm not sure how much of a real problem this is.

Switching everything around to arch_make_folio_accessible() also lets
us switch the page flag to be per-folio instead of per-page, which is
a good step towards dynamically allocated folios.

Build-tested only.

Matthew Wilcox (Oracle) (3):
  mm: Use arch_make_folio_accessible() in gup_pte_range()
  mm: Convert follow_page_pte() to use a folio
  s390: Convert arch_make_page_accessible() to
    arch_make_folio_accessible()

 arch/s390/include/asm/page.h |  5 ++--
 arch/s390/kernel/uv.c        | 46 +++++++++++++++++++++++-------------
 arch/s390/mm/fault.c         | 15 ++++++------
 include/linux/mm.h           | 20 ++--------------
 mm/gup.c                     | 22 +++++++++--------
 5 files changed, 54 insertions(+), 54 deletions(-)

Comments

Claudio Imbrenda Sept. 15, 2023, 5:54 p.m. UTC | #1
On Fri, 15 Sep 2023 18:28:25 +0100
"Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> We introduced arch_make_folio_accessible() a couple of years
> ago, and it's in use in the page writeback path.  GUP still uses
> arch_make_page_accessible(), which means that we can succeed in making
> a single page of a folio accessible, then fail to make the rest of the
> folio accessible when it comes time to do writeback and it's too late
> to do anything about it.  I'm not sure how much of a real problem this is.
> 
> Switching everything around to arch_make_folio_accessible() also lets
> us switch the page flag to be per-folio instead of per-page, which is
> a good step towards dynamically allocated folios.
> 
> Build-tested only.
> 
> Matthew Wilcox (Oracle) (3):
>   mm: Use arch_make_folio_accessible() in gup_pte_range()
>   mm: Convert follow_page_pte() to use a folio
>   s390: Convert arch_make_page_accessible() to
>     arch_make_folio_accessible()
> 
>  arch/s390/include/asm/page.h |  5 ++--
>  arch/s390/kernel/uv.c        | 46 +++++++++++++++++++++++-------------
>  arch/s390/mm/fault.c         | 15 ++++++------
>  include/linux/mm.h           | 20 ++--------------
>  mm/gup.c                     | 22 +++++++++--------
>  5 files changed, 54 insertions(+), 54 deletions(-)
> 

if I understand correctly, this will as a matter of fact move the
security property from pages to folios.

this means that trying to access a page will (try to) make the whole
folio accessible, even though that might be counterproductive.... 

and there is no way to simply split a folio

I don't like this

there are also other reasons, but I don't have time to go into the
details on a Friday evening (will elaborate more on Monday)
Matthew Wilcox Sept. 15, 2023, 6:17 p.m. UTC | #2
On Fri, Sep 15, 2023 at 07:54:50PM +0200, Claudio Imbrenda wrote:
> On Fri, 15 Sep 2023 18:28:25 +0100
> "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> 
> > We introduced arch_make_folio_accessible() a couple of years
> > ago, and it's in use in the page writeback path.  GUP still uses
> > arch_make_page_accessible(), which means that we can succeed in making
> > a single page of a folio accessible, then fail to make the rest of the
> > folio accessible when it comes time to do writeback and it's too late
> > to do anything about it.  I'm not sure how much of a real problem this is.
> > 
> > Switching everything around to arch_make_folio_accessible() also lets
> > us switch the page flag to be per-folio instead of per-page, which is
> > a good step towards dynamically allocated folios.
> 
> if I understand correctly, this will as a matter of fact move the
> security property from pages to folios.

Correct.

> this means that trying to access a page will (try to) make the whole
> folio accessible, even though that might be counterproductive.... 
> 
> and there is no way to simply split a folio
> 
> I don't like this

As I said in the cover letter, we already make the entire folio
accessible in the writeback path.  I suppose if you never write the
folio back, this is new ...

Anyway, looking forward to a more substantial discussion on Monday.
Claudio Imbrenda Sept. 18, 2023, 11:01 a.m. UTC | #3
On Fri, 15 Sep 2023 19:17:51 +0100
Matthew Wilcox <willy@infradead.org> wrote:

> On Fri, Sep 15, 2023 at 07:54:50PM +0200, Claudio Imbrenda wrote:
> > On Fri, 15 Sep 2023 18:28:25 +0100
> > "Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:
> >   
> > > We introduced arch_make_folio_accessible() a couple of years
> > > ago, and it's in use in the page writeback path.  GUP still uses
> > > arch_make_page_accessible(), which means that we can succeed in making
> > > a single page of a folio accessible, then fail to make the rest of the
> > > folio accessible when it comes time to do writeback and it's too late
> > > to do anything about it.  I'm not sure how much of a real problem this is.
> > > 
> > > Switching everything around to arch_make_folio_accessible() also lets
> > > us switch the page flag to be per-folio instead of per-page, which is
> > > a good step towards dynamically allocated folios.  
> > 
> > if I understand correctly, this will as a matter of fact move the
> > security property from pages to folios.  
> 
> Correct.
> 
> > this means that trying to access a page will (try to) make the whole
> > folio accessible, even though that might be counterproductive.... 
> > 
> > and there is no way to simply split a folio
> > 
> > I don't like this  
> 
> As I said in the cover letter, we already make the entire folio
> accessible in the writeback path.  I suppose if you never write the
> folio back, this is new ...
> 
> Anyway, looking forward to a more substantial discussion on Monday.

this will be a wall of text, sorry

first some definitions:

* secure page
page belonging to a secure guest, accessible by the guest, not
accessible by the host

* shared page
page belonging to a secure guest, accessible by the guest and by the
host. the guest decides which pages to share

* pinned shared
the host can force a page to stay shared; when the guest wants to
unshare it, a vm-exit event happens. this allows the host to make sure
the page is allowed become secure again before allowing the page to be
unshared

* exported
page with guest content, encrypted and integrity protected, no longer
accessible by the guest, but accessible by the host

* made accessible
a page is pinned shared, or, if that fails, exported.


now let's see how we use the arch-bit in struct page:

the arch-bit that is used to track whether the page is secure or not.
the bit MUST be set whenever a page is secure, and MAY be set for
non-secure pages. in general it should not be set for non-secure pages,
for performance reasons. sometimes we have small windows where
non-secure pages can have the bit set.

sometimes the arch-bit is used to determine whether further processing
(e.g. export) is needed.

when a page transitions from non-secure to secure, we must make sure
that no I/O is possibly happening on it. we do this in 2 ways at the
same time:

* make sure the page is not in writeback, and if so wait until the
  writeback is over

* make sure the page does not have any extra references except for the
  mapping itself. this means no-one else is trying to use the page for
  any other purpose, e.g. direct I/O.

  >> If the page has extra references, we wait until they are dropped <<


each time a guest tries to touch an exported page, it gets imported.
it's important to track the security property of each page individually.


and this is the most important thing, and the root of all our issues:

	>> we MUST NOT do any kind of I/O on secure pages <<


now let's see what happens for writeback. if a folio is in writeback,
all of it is going to be written back. so all of it needs to be made
accessible. once the I/O is over, the pages can be accessed again.


let's see what happens for other types of I/O (e.g. direct I/O)

currently, when we try to do direct I/O on a specific page,
pin_user_pages() will cause the pages to be made accessible. the
other neighbouring pages will stay secure and keep the arch bit set.

if the security property tracking gets moved to a whole folio, then you
can have a situation where the guest asks for I/O on a shared page, and
then tries to access a neighbouring page... which will hang until the
I/O is done, since the rest of folio has now been exported.

even worse, virtio requires individual guest pages to be shared
with the host. the host needs to do pin_user_pages() on those pages to
make sure they stay there, since they will be accessed directly. the pin
stays as long as the corresponding virtio devices are configured (i.e.
until the guest shuts down or reboots). with your changes, the whole
folio will be made accessible, both the shared page (pin shared) and the
remaining pages in the folio (exported). when the guest tries to access
the remaining pages, it will fail, triggering an import in the host. the
host will then proceed to wait forever.....


if I understand correctly, we have no control over the size of the
folios, and no way to split folios, so there is no solution to this.