diff mbox series

[v5,3/6] x86 / pv: do not treat PGC_extra pages as RAM

Message ID 20200309102304.1251-4-paul@xen.org (mailing list archive)
State Superseded
Headers show
Series remove one more shared xenheap page: shared_info | expand

Commit Message

Paul Durrant March 9, 2020, 10:23 a.m. UTC
From: Paul Durrant <pdurrant@amazon.com>

This patch modifies several places walking the domain's page_list to make
them ignore PGC_extra pages:

- dump_pageframe_info() should ignore PGC_extra pages in its dump as it
  determines whether to dump using domain_tot_pages() which also ignores
  PGC_extra pages.

- arch_set_info_guest() is looking for an L4 page table which will
  definitely not be in a PGC_extra page.

- audit_p2m() should ignore PGC_extra pages as it is perfectly legitimate
  for them not to be present in the P2M.

- dump_nama() should ignore PGC_extra pages as they are essentially
  uninteresting in that context.

- dom0_construct_pv() should ignore PGC_extra pages when setting up the
  physmap as they are only created for special purposes and, if they need
  to be mapped, will be mapped explicitly for whatever purpose is relevant.

- tboot_gen_domain_integrity() should ignore PGC_extra pages as they should
  not form part of the measurement.

Signed-off-by: Paul Durrant <paul@xen.org>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>

v4:
 - Expand to cover more than just dom0_construct_pv()

v2:
 - New in v2
---
 xen/arch/x86/domain.c        | 6 +++++-
 xen/arch/x86/mm/p2m.c        | 3 +++
 xen/arch/x86/numa.c          | 3 +++
 xen/arch/x86/pv/dom0_build.c | 4 ++++
 xen/arch/x86/tboot.c         | 7 ++++++-
 5 files changed, 21 insertions(+), 2 deletions(-)

Comments

Jan Beulich March 9, 2020, 1:04 p.m. UTC | #1
On 09.03.2020 11:23, paul@xen.org wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This patch modifies several places walking the domain's page_list to make
> them ignore PGC_extra pages:
> 
> - dump_pageframe_info() should ignore PGC_extra pages in its dump as it
>   determines whether to dump using domain_tot_pages() which also ignores
>   PGC_extra pages.

This argument looks wrong to me: Let's take an example - a domain
almost fully cleaned up, with 8 "normal" and 3 "extra" pages left.
domain_tot_pages() returns 8 in this case, i.e. "normal" page
dumping doesn't get skipped. However, there now won't be any trace
of the "extra" pages, because they're also not on xenpage_list,
which gets all its pages dumped in all cases. Correct restoration
of original behavior would be to dump "normal" pages when there
are less than 10, and to dump all "extra" pages. (Same of course
goes for live domains, where "normal" page dumping would be
skipped in the common case, but xenheap pages would be dumped, and
hence so should be "extra" ones.) As indicated before, the removal
of the APIC assist page from xenpage_list was already slightly
regressing in this regard (as well as in at least one other way,
I'm afraid), and you're now deliberately making the regression
even bigger.

Jan
Paul Durrant March 10, 2020, 1:32 p.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 09 March 2020 13:04
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>
> Subject: Re: [PATCH v5 3/6] x86 / pv: do not treat PGC_extra pages as RAM
> 
> On 09.03.2020 11:23, paul@xen.org wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > This patch modifies several places walking the domain's page_list to make
> > them ignore PGC_extra pages:
> >
> > - dump_pageframe_info() should ignore PGC_extra pages in its dump as it
> >   determines whether to dump using domain_tot_pages() which also ignores
> >   PGC_extra pages.
> 
> This argument looks wrong to me: Let's take an example - a domain
> almost fully cleaned up, with 8 "normal" and 3 "extra" pages left.
> domain_tot_pages() returns 8 in this case, i.e. "normal" page
> dumping doesn't get skipped. However, there now won't be any trace
> of the "extra" pages, because they're also not on xenpage_list,
> which gets all its pages dumped in all cases. Correct restoration
> of original behavior would be to dump "normal" pages when there
> are less than 10, and to dump all "extra" pages. (Same of course
> goes for live domains, where "normal" page dumping would be
> skipped in the common case, but xenheap pages would be dumped, and
> hence so should be "extra" ones.) As indicated before, the removal
> of the APIC assist page from xenpage_list was already slightly
> regressing in this regard (as well as in at least one other way,
> I'm afraid), and you're now deliberately making the regression
> even bigger.

I thought the idea here was that the domheap dump loop should be dumping 'normal' pages so it seems reasonable to me that the number of pages dumped to match the value returned by domain_tot_pages().
Would you perhaps be happier if we put 'extra' pages on separate page list, which can be unconditionally dumped so as I transition xenheap pages to 'extra' pages they don't get missed? It would also get rid of some of the other checks for PGC_extra that I have to introduce because they currently end up on the domain's page list.

  Paul

> 
> Jan
Jan Beulich March 10, 2020, 2:58 p.m. UTC | #3
On 10.03.2020 14:32, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 09 March 2020 13:04
>> To: paul@xen.org
>> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>
>> Subject: Re: [PATCH v5 3/6] x86 / pv: do not treat PGC_extra pages as RAM
>>
>> On 09.03.2020 11:23, paul@xen.org wrote:
>>> From: Paul Durrant <pdurrant@amazon.com>
>>>
>>> This patch modifies several places walking the domain's page_list to make
>>> them ignore PGC_extra pages:
>>>
>>> - dump_pageframe_info() should ignore PGC_extra pages in its dump as it
>>>   determines whether to dump using domain_tot_pages() which also ignores
>>>   PGC_extra pages.
>>
>> This argument looks wrong to me: Let's take an example - a domain
>> almost fully cleaned up, with 8 "normal" and 3 "extra" pages left.
>> domain_tot_pages() returns 8 in this case, i.e. "normal" page
>> dumping doesn't get skipped. However, there now won't be any trace
>> of the "extra" pages, because they're also not on xenpage_list,
>> which gets all its pages dumped in all cases. Correct restoration
>> of original behavior would be to dump "normal" pages when there
>> are less than 10, and to dump all "extra" pages. (Same of course
>> goes for live domains, where "normal" page dumping would be
>> skipped in the common case, but xenheap pages would be dumped, and
>> hence so should be "extra" ones.) As indicated before, the removal
>> of the APIC assist page from xenpage_list was already slightly
>> regressing in this regard (as well as in at least one other way,
>> I'm afraid), and you're now deliberately making the regression
>> even bigger.
> 
> I thought the idea here was that the domheap dump loop should be
> dumping 'normal' pages so it seems reasonable to me that the number
> of pages dumped to match the value returned by domain_tot_pages().

I never thought of such a connection. To me the invocation of
domain_tot_pages() there is there only to avoid overly much output.

> Would you perhaps be happier if we put 'extra' pages on separate
> page list, which can be unconditionally dumped so as I transition
> xenheap pages to 'extra' pages they don't get missed? It would
> also get rid of some of the other checks for PGC_extra that I
> have to introduce because they currently end up on the domain's
> page list.

Hmm, wasn't it an intended side effect to have all pages on one
list now? Introducing a 3rd list (even if just temporarily, until
xenpage_list can be dropped) will be somewhat ugly because of how
arch_free_heap_page() works. In reply to patch 6 I did suggest to
have a separate list, but without taking these pages off
d->page_list, such that here you would skip them in the main
domain page dumping loop, but you would then traverse that second
list and dump all of its elements, just like xenpage_list gets
handled there.

Jan
Paul Durrant March 10, 2020, 3:05 p.m. UTC | #4
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 10 March 2020 14:59
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>
> Subject: Re: [PATCH v5 3/6] x86 / pv: do not treat PGC_extra pages as RAM
> 
> On 10.03.2020 14:32, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 09 March 2020 13:04
> >> To: paul@xen.org
> >> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Andrew Cooper
> >> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>
> >> Subject: Re: [PATCH v5 3/6] x86 / pv: do not treat PGC_extra pages as RAM
> >>
> >> On 09.03.2020 11:23, paul@xen.org wrote:
> >>> From: Paul Durrant <pdurrant@amazon.com>
> >>>
> >>> This patch modifies several places walking the domain's page_list to make
> >>> them ignore PGC_extra pages:
> >>>
> >>> - dump_pageframe_info() should ignore PGC_extra pages in its dump as it
> >>>   determines whether to dump using domain_tot_pages() which also ignores
> >>>   PGC_extra pages.
> >>
> >> This argument looks wrong to me: Let's take an example - a domain
> >> almost fully cleaned up, with 8 "normal" and 3 "extra" pages left.
> >> domain_tot_pages() returns 8 in this case, i.e. "normal" page
> >> dumping doesn't get skipped. However, there now won't be any trace
> >> of the "extra" pages, because they're also not on xenpage_list,
> >> which gets all its pages dumped in all cases. Correct restoration
> >> of original behavior would be to dump "normal" pages when there
> >> are less than 10, and to dump all "extra" pages. (Same of course
> >> goes for live domains, where "normal" page dumping would be
> >> skipped in the common case, but xenheap pages would be dumped, and
> >> hence so should be "extra" ones.) As indicated before, the removal
> >> of the APIC assist page from xenpage_list was already slightly
> >> regressing in this regard (as well as in at least one other way,
> >> I'm afraid), and you're now deliberately making the regression
> >> even bigger.
> >
> > I thought the idea here was that the domheap dump loop should be
> > dumping 'normal' pages so it seems reasonable to me that the number
> > of pages dumped to match the value returned by domain_tot_pages().
> 
> I never thought of such a connection. To me the invocation of
> domain_tot_pages() there is there only to avoid overly much output.
> 
> > Would you perhaps be happier if we put 'extra' pages on separate
> > page list, which can be unconditionally dumped so as I transition
> > xenheap pages to 'extra' pages they don't get missed? It would
> > also get rid of some of the other checks for PGC_extra that I
> > have to introduce because they currently end up on the domain's
> > page list.
> 
> Hmm, wasn't it an intended side effect to have all pages on one
> list now?

That would be nice, but I cannot reconcile that with unconditionally dumping all extra pages... otherwise dump_pageframe_info() would always have to walk the entire page_list no matter how long it was.

> Introducing a 3rd list (even if just temporarily, until
> xenpage_list can be dropped) will be somewhat ugly because of how
> arch_free_heap_page() works.

Yes, but it would at least be temporary.

> In reply to patch 6 I did suggest to
> have a separate list, but without taking these pages off
> d->page_list,

How would that work without adding an extra page_list_entry into struct page_info?

> such that here you would skip them in the main
> domain page dumping loop, but you would then traverse that second
> list and dump all of its elements, just like xenpage_list gets
> handled there.
> 

Well, that's what I'm trying to achieve, yes.

  Paul
Jan Beulich March 10, 2020, 3:12 p.m. UTC | #5
On 10.03.2020 16:05, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 10 March 2020 14:59
>>
>> In reply to patch 6 I did suggest to
>> have a separate list, but without taking these pages off
>> d->page_list,
> 
> How would that work without adding an extra page_list_entry into struct page_info?

As said there, it'd be a singly linked list using a __pdx_t
field just like there already is with "next_shadow", i.e.
you'd add another union member "next_extra" or some such. Of
course the list shouldn't grow too long, or else insertion
and removal may become a bottleneck. Not sure how well this
would fit Arm, though; maybe they wouldn't need this, but
that depends on whether the list would be used for purposes
beyond dumping.

Jan

>> such that here you would skip them in the main
>> domain page dumping loop, but you would then traverse that second
>> list and dump all of its elements, just like xenpage_list gets
>> handled there.
>>
> 
> Well, that's what I'm trying to achieve, yes.
> 
>   Paul
>
Paul Durrant March 10, 2020, 3:16 p.m. UTC | #6
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 10 March 2020 15:13
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>
> Subject: Re: [PATCH v5 3/6] x86 / pv: do not treat PGC_extra pages as RAM
> 
> On 10.03.2020 16:05, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 10 March 2020 14:59
> >>
> >> In reply to patch 6 I did suggest to
> >> have a separate list, but without taking these pages off
> >> d->page_list,
> >
> > How would that work without adding an extra page_list_entry into struct page_info?
> 
> As said there, it'd be a singly linked list using a __pdx_t
> field just like there already is with "next_shadow", i.e.
> you'd add another union member "next_extra" or some such. Of
> course the list shouldn't grow too long, or else insertion
> and removal may become a bottleneck. Not sure how well this
> would fit Arm, though; maybe they wouldn't need this, but
> that depends on whether the list would be used for purposes
> beyond dumping.
> 

That seems more obscure and bug-prone than an extra list head in struct domain. I'd really prefer to stick with that even if it does make things a little more ugly until xenpage_list goes away.

  Paul
Jan Beulich March 10, 2020, 3:33 p.m. UTC | #7
On 10.03.2020 16:16, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 10 March 2020 15:13
>> To: paul@xen.org
>> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Andrew Cooper'
>> <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>
>> Subject: Re: [PATCH v5 3/6] x86 / pv: do not treat PGC_extra pages as RAM
>>
>> On 10.03.2020 16:05, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 10 March 2020 14:59
>>>>
>>>> In reply to patch 6 I did suggest to
>>>> have a separate list, but without taking these pages off
>>>> d->page_list,
>>>
>>> How would that work without adding an extra page_list_entry into struct page_info?
>>
>> As said there, it'd be a singly linked list using a __pdx_t
>> field just like there already is with "next_shadow", i.e.
>> you'd add another union member "next_extra" or some such. Of
>> course the list shouldn't grow too long, or else insertion
>> and removal may become a bottleneck. Not sure how well this
>> would fit Arm, though; maybe they wouldn't need this, but
>> that depends on whether the list would be used for purposes
>> beyond dumping.
> 
> That seems more obscure and bug-prone than an extra list head
> in struct domain. I'd really prefer to stick with that even
> if it does make things a little more ugly until xenpage_list
> goes away.

Okay with me if there really was no property (other than
assign_pages() then needing to pick the right list, which is
not much different from needing to put the extra pages on two
lists) that you'd lose by no longer having the pages on the
same list.

Jan
Paul Durrant March 10, 2020, 3:54 p.m. UTC | #8
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 10 March 2020 15:33
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>
> Subject: Re: [PATCH v5 3/6] x86 / pv: do not treat PGC_extra pages as RAM
> 
> On 10.03.2020 16:16, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 10 March 2020 15:13
> >> To: paul@xen.org
> >> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Andrew Cooper'
> >> <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>
> >> Subject: Re: [PATCH v5 3/6] x86 / pv: do not treat PGC_extra pages as RAM
> >>
> >> On 10.03.2020 16:05, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 10 March 2020 14:59
> >>>>
> >>>> In reply to patch 6 I did suggest to
> >>>> have a separate list, but without taking these pages off
> >>>> d->page_list,
> >>>
> >>> How would that work without adding an extra page_list_entry into struct page_info?
> >>
> >> As said there, it'd be a singly linked list using a __pdx_t
> >> field just like there already is with "next_shadow", i.e.
> >> you'd add another union member "next_extra" or some such. Of
> >> course the list shouldn't grow too long, or else insertion
> >> and removal may become a bottleneck. Not sure how well this
> >> would fit Arm, though; maybe they wouldn't need this, but
> >> that depends on whether the list would be used for purposes
> >> beyond dumping.
> >
> > That seems more obscure and bug-prone than an extra list head
> > in struct domain. I'd really prefer to stick with that even
> > if it does make things a little more ugly until xenpage_list
> > goes away.
> 
> Okay with me if there really was no property (other than
> assign_pages() then needing to pick the right list, which is
> not much different from needing to put the extra pages on two
> lists) that you'd lose by no longer having the pages on the
> same list.

Just assign_pages() and arch_free_heap_page(), and my test patch defines:

+static inline struct page_list_head *page_to_list(
+    struct domain *d, const struct page_info *pg)
+{
+    if ( is_xen_heap_page(pg) )
+        return &d->xenpage_list;
+    else if ( pg->count_info & PGC_extra )
+        return &d->extra_page_list;
+
+    return &d->page_list;
+}

which they both use to select the right list.

Once xenpage_list goes away then this can be simplified to use a ternary operator.

  Paul

> 
> Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index bdcc0d972a..f6ed25e8ee 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -231,6 +231,9 @@  void dump_pageframe_info(struct domain *d)
             unsigned int index = MASK_EXTR(page->u.inuse.type_info,
                                            PGT_type_mask);
 
+            if ( page->count_info & PGC_extra )
+                continue;
+
             if ( ++total[index] > 16 )
             {
                 switch ( page->u.inuse.type_info & PGT_type_mask )
@@ -1044,7 +1047,8 @@  int arch_set_info_guest(
             {
                 struct page_info *page = page_list_remove_head(&d->page_list);
 
-                if ( page_lock(page) )
+                if ( !(page->count_info & PGC_extra) &&
+                     page_lock(page) )
                 {
                     if ( (page->u.inuse.type_info & PGT_type_mask) ==
                          PGT_l4_page_table )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 9f51370327..71d2fb9bbc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2843,6 +2843,9 @@  void audit_p2m(struct domain *d,
     spin_lock(&d->page_alloc_lock);
     page_list_for_each ( page, &d->page_list )
     {
+        if ( page->count_info & PGC_extra )
+            continue;
+
         mfn = mfn_x(page_to_mfn(page));
 
         P2M_PRINTK("auditing guest page, mfn=%#lx\n", mfn);
diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index f1066c59c7..7e5aa8dc95 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -428,6 +428,9 @@  static void dump_numa(unsigned char key)
         spin_lock(&d->page_alloc_lock);
         page_list_for_each(page, &d->page_list)
         {
+            if ( page->count_info & PGC_extra )
+                break;
+
             i = phys_to_nid(page_to_maddr(page));
             page_num_node[i]++;
         }
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index dc16ef2e79..f8f1bbe2f4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -792,6 +792,10 @@  int __init dom0_construct_pv(struct domain *d,
     {
         mfn = mfn_x(page_to_mfn(page));
         BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
+
+        if ( page->count_info & PGC_extra )
+            continue;
+
         if ( get_gpfn_from_mfn(mfn) >= count )
         {
             BUG_ON(is_pv_32bit_domain(d));
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 8c232270b4..6cc020cb71 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -220,7 +220,12 @@  static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
         spin_lock(&d->page_alloc_lock);
         page_list_for_each(page, &d->page_list)
         {
-            void *pg = __map_domain_page(page);
+            void *pg;
+
+            if ( page->count_info & PGC_extra )
+                continue;
+
+            pg = __map_domain_page(page);
             vmac_update(pg, PAGE_SIZE, &ctx);
             unmap_domain_page(pg);
         }