
[v4,2/4] x86/mem_sharing: copy a page_lock version to be internal to memshr

Message ID 20190502221345.18459-2-tamas@tklengyel.com (mailing list archive)
State Superseded
Series [v4,1/4] x86/mem_sharing: reorder when pages are unlocked and released

Commit Message

Tamas K Lengyel May 2, 2019, 10:13 p.m. UTC
Patch cf4b30dca0a "Add debug code to detect illegal page_lock and put_page_type
ordering" added extra sanity checking to page_lock/page_unlock for debug builds
with the assumption that no hypervisor path ever locks two pages at once.

This assumption doesn't hold during memory sharing, so we copy a version of
page_lock/unlock to be used exclusively in the memory sharing subsystem,
without the sanity checks.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Roger Pau Monne <roger.pau@citrix.com>
---
 xen/arch/x86/mm/mem_sharing.c | 43 +++++++++++++++++++++++++++++++----
 xen/include/asm-x86/mm.h      | 14 +-----------
 2 files changed, 40 insertions(+), 17 deletions(-)
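The lock/unlock pattern being copied can be sketched as standalone C11; the
PGT_* bit positions below are hypothetical stand-ins for Xen's type_info
flags, chosen only so the sketch compiles and runs outside the hypervisor:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout; Xen keeps these flags in high bits of type_info. */
#define PGT_locked     (1UL << 27)
#define PGT_validated  (1UL << 28)
#define PGT_count_mask ((1UL << 16) - 1)   /* low bits: type refcount */

struct page { _Atomic unsigned long type_info; };

static bool page_lock(struct page *pg)
{
    unsigned long x, nx;

    do {
        /* Spin until the lock bit is clear, then try to take it. */
        while ( (x = atomic_load(&pg->type_info)) & PGT_locked )
            ;
        nx = x + (1 | PGT_locked);          /* take a type ref + the lock */
        if ( !(x & PGT_validated) ||
             !(x & PGT_count_mask) ||       /* must already hold a type ref */
             !(nx & PGT_count_mask) )       /* refcount must not overflow */
            return false;
    } while ( !atomic_compare_exchange_weak(&pg->type_info, &x, nx) );

    return true;
}

static void page_unlock(struct page *pg)
{
    unsigned long x = atomic_load(&pg->type_info), nx;

    do {
        nx = x - (1 | PGT_locked);          /* drop the ref and the lock */
    } while ( !atomic_compare_exchange_weak(&pg->type_info, &x, nx) );
}
```

Note that taking the lock also takes a type reference, so unlock drops both
in a single cmpxchg; the lock attempt fails up front if the page holds no
type reference or PGT_validated is clear.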

Comments

Jan Beulich May 3, 2019, 8:27 a.m. UTC | #1
>>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
>  
>  #endif /* MEM_SHARING_AUDIT */
>  
> -static inline int mem_sharing_page_lock(struct page_info *pg)
> +/*
> + * Private implementations of page_lock/unlock to bypass PV-only
> + * sanity checks not applicable to mem-sharing.
> + */
> +static inline bool _page_lock(struct page_info *page)
>  {
> -    int rc;
> +    unsigned long x, nx;
> +
> +    do {
> +        while ( (x = page->u.inuse.type_info) & PGT_locked )
> +            cpu_relax();
> +        nx = x + (1 | PGT_locked);
> +        if ( !(x & PGT_validated) ||
> +             !(x & PGT_count_mask) ||
> +             !(nx & PGT_count_mask) )
> +            return false;

Just for my own understanding: Did you verify that the PGT_validated
check is indeed needed here, or did you copy it "just in case"? In the
latter case a comment may be worthwhile.

Furthermore, are there any mem-sharing specific checks reasonable
to do here in place of the PV ones you want to avoid? Like pages
making it here only ever being of PGT_shared_page type?

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -356,24 +356,12 @@ struct platform_bad_page {
>  const struct platform_bad_page *get_platform_badpages(unsigned int 
> *array_size);
>  
>  /* Per page locks:
> - * page_lock() is used for two purposes: pte serialization, and memory sharing.
> + * page_lock() is used for pte serialization.
>   *
>   * All users of page lock for pte serialization live in mm.c, use it
>  * to lock a page table page during pte updates, do not take other locks within
>   * the critical section delimited by page_lock/unlock, and perform no
>   * nesting.
> - *
> - * All users of page lock for memory sharing live in mm/mem_sharing.c. Page_lock
> - * is used in memory sharing to protect addition (share) and removal (unshare)
> - * of (gfn,domain) tupples to a list of gfn's that the shared page is currently
> - * backing. Nesting may happen when sharing (and locking) two pages -- deadlock
> - * is avoided by locking pages in increasing order.
> - * All memory sharing code paths take the p2m lock of the affected gfn before
> - * taking the lock for the underlying page. We enforce ordering between page_lock
> - * and p2m_lock using an mm-locks.h construct.
> - *
> - * These two users (pte serialization and memory sharing) do not collide, since
> - * sharing is only supported for hvm guests, which do not perform pv pte updates.
>   */
>  int page_lock(struct page_info *page);
>  void page_unlock(struct page_info *page);

I think it would be helpful to retain (in a slightly adjusted form) the last
sentence of the comment above, to clarify that the PGT_locked uses
are now what does not end up colliding. At this occasion "which do not
perform pv pte updates" would also better be re-worded to e.g.
"which do not have PV PTEs updated" (as PVH Dom0 is very much
expected to issue PV page table operations for PV DomU-s).

Jan
Tamas K Lengyel May 3, 2019, 1:43 p.m. UTC | #2
On Fri, May 3, 2019 at 2:27 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
> >
> >  #endif /* MEM_SHARING_AUDIT */
> >
> > -static inline int mem_sharing_page_lock(struct page_info *pg)
> > +/*
> > + * Private implementations of page_lock/unlock to bypass PV-only
> > + * sanity checks not applicable to mem-sharing.
> > + */
> > +static inline bool _page_lock(struct page_info *page)
> >  {
> > -    int rc;
> > +    unsigned long x, nx;
> > +
> > +    do {
> > +        while ( (x = page->u.inuse.type_info) & PGT_locked )
> > +            cpu_relax();
> > +        nx = x + (1 | PGT_locked);
> > +        if ( !(x & PGT_validated) ||
> > +             !(x & PGT_count_mask) ||
> > +             !(nx & PGT_count_mask) )
> > +            return false;
>
> Just for my own understanding: Did you verify that the PGT_validated
> check is indeed needed here, or did you copy it "just in case"? In the
> latter case a comment may be worthwhile.

This is an exact copy of page_lock, sans the asserts that break it
from mem_sharing. I didn't investigate which of these flags are
necessary for mem_sharing. Frankly, I don't fully understand their
meaning and I haven't come across documentation about it yet. I can
certainly add a comment saying TODO: figure out which of these flags
are actually needed.

> Furthermore, are there any mem-sharing specific checks reasonable
> to do here in place of the PV ones you want to avoid? Like pages
> making it here only ever being of PGT_shared_page type?

There are checks already in place after the lock is taken where
necessary. Those checks can't be moved in here because they don't
apply to all cases uniformly - for example, we also take the lock when
the page type is being converted between shared and not shared.

> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -356,24 +356,12 @@ struct platform_bad_page {
> >  const struct platform_bad_page *get_platform_badpages(unsigned int
> > *array_size);
> >
> >  /* Per page locks:
> > - * page_lock() is used for two purposes: pte serialization, and memory sharing.
> > + * page_lock() is used for pte serialization.
> >   *
> >   * All users of page lock for pte serialization live in mm.c, use it
> >  * to lock a page table page during pte updates, do not take other locks within
> >   * the critical section delimited by page_lock/unlock, and perform no
> >   * nesting.
> > - *
> > - * All users of page lock for memory sharing live in mm/mem_sharing.c. Page_lock
> > - * is used in memory sharing to protect addition (share) and removal (unshare)
> > - * of (gfn,domain) tupples to a list of gfn's that the shared page is currently
> > - * backing. Nesting may happen when sharing (and locking) two pages -- deadlock
> > - * is avoided by locking pages in increasing order.
> > - * All memory sharing code paths take the p2m lock of the affected gfn before
> > - * taking the lock for the underlying page. We enforce ordering between page_lock
> > - * and p2m_lock using an mm-locks.h construct.
> > - *
> > - * These two users (pte serialization and memory sharing) do not collide, since
> > - * sharing is only supported for hvm guests, which do not perform pv pte updates.
> >   */
> >  int page_lock(struct page_info *page);
> >  void page_unlock(struct page_info *page);
>
> I think it would be helpful to retain (in a slightly adjusted form) the last
> sentence of the comment above, to clarify that the PGT_locked uses
> are now what does not end up colliding. At this occasion "which do not
> perform pv pte updates" would also better be re-worded to e.g.
> "which do not have PV PTEs updated" (as PVH Dom0 is very much
> expected to issue PV page table operations for PV DomU-s).

Sure.

Thanks,
Tamas
Jan Beulich May 3, 2019, 1:57 p.m. UTC | #3
>>> On 03.05.19 at 15:43, <tamas@tklengyel.com> wrote:
> On Fri, May 3, 2019 at 2:27 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
>> > --- a/xen/arch/x86/mm/mem_sharing.c
>> > +++ b/xen/arch/x86/mm/mem_sharing.c
>> > @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
>> >
>> >  #endif /* MEM_SHARING_AUDIT */
>> >
>> > -static inline int mem_sharing_page_lock(struct page_info *pg)
>> > +/*
>> > + * Private implementations of page_lock/unlock to bypass PV-only
>> > + * sanity checks not applicable to mem-sharing.
>> > + */
>> > +static inline bool _page_lock(struct page_info *page)
>> >  {
>> > -    int rc;
>> > +    unsigned long x, nx;
>> > +
>> > +    do {
>> > +        while ( (x = page->u.inuse.type_info) & PGT_locked )
>> > +            cpu_relax();
>> > +        nx = x + (1 | PGT_locked);
>> > +        if ( !(x & PGT_validated) ||
>> > +             !(x & PGT_count_mask) ||
>> > +             !(nx & PGT_count_mask) )
>> > +            return false;
>>
>> Just for my own understanding: Did you verify that the PGT_validated
>> check is indeed needed here, or did you copy it "just in case"? In the
>> latter case a comment may be worthwhile.
> 
> This is an exact copy of page_lock, sans the asserts that break it
> from mem_sharing. I didn't investigate which of these flags are
> necessary for mem_sharing. Frankly, I don't fully understand their
> meaning and I haven't come across documentation about it yet. I can
> certainly add a comment saying TODO: figure out which of these flags
> are actually needed.

Yes something along these lines is what I'm after. But "these flags"
really is just PGT_validated. There's no question the PGT_locked
and PGT_count_mask operations need to remain as they are.

Jan
George Dunlap May 3, 2019, 2:35 p.m. UTC | #4
On 5/3/19 2:57 PM, Jan Beulich wrote:
>>>> On 03.05.19 at 15:43, <tamas@tklengyel.com> wrote:
>> On Fri, May 3, 2019 at 2:27 AM Jan Beulich <JBeulich@suse.com> wrote:
>>>
>>>>>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>> @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
>>>>
>>>>  #endif /* MEM_SHARING_AUDIT */
>>>>
>>>> -static inline int mem_sharing_page_lock(struct page_info *pg)
>>>> +/*
>>>> + * Private implementations of page_lock/unlock to bypass PV-only
>>>> + * sanity checks not applicable to mem-sharing.
>>>> + */
>>>> +static inline bool _page_lock(struct page_info *page)
>>>>  {
>>>> -    int rc;
>>>> +    unsigned long x, nx;
>>>> +
>>>> +    do {
>>>> +        while ( (x = page->u.inuse.type_info) & PGT_locked )
>>>> +            cpu_relax();
>>>> +        nx = x + (1 | PGT_locked);
>>>> +        if ( !(x & PGT_validated) ||
>>>> +             !(x & PGT_count_mask) ||
>>>> +             !(nx & PGT_count_mask) )
>>>> +            return false;
>>>
>>> Just for my own understanding: Did you verify that the PGT_validated
>>> check is indeed needed here, or did you copy it "just in case"? In the
>>> latter case a comment may be worthwhile.
>>
>> This is an exact copy of page_lock, sans the asserts that break it
>> from mem_sharing. I didn't investigate which of these flags are
>> necessary for mem_sharing. Frankly, I don't fully understand their
>> meaning and I haven't come across documentation about it yet.

I hope to add some documentation sometime in the next 6 months or so.
I've submitted a talk on the refcounting system to the XenSummit as well.

Short answer: PGT_validated means that the page in question has been
validated as safe to use as the designated type.  For PGT_writable, this
is instantaneous (i.e., the type is set to PGT_writable with the
PGT_validated bit set in the same cmpxchg operation).

Other types (such as PGT_l*_table) actually take time to verify, and so
you need to first change the type to the new type (so nobody tries to
use it as yet a different type), then verify that it's OK to use it as a
page table (by recursively checking all the entries), then set the
PGT_validated bit.
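The two paths can be modelled in plain C11; the type values, names, and the
trivial validation hook below are hypothetical, purely to illustrate the
fast (writable) versus slow (page-table) validation sequences:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type values and flag bit, not Xen's actual layout. */
#define PGT_none       0UL
#define PGT_writable   (1UL << 29)
#define PGT_l1_table   (1UL << 30)
#define PGT_validated  (1UL << 28)

struct page { _Atomic unsigned long type_info; };

/* Stand-in for a real validation pass over the page's contents. */
static bool always_ok(struct page *pg) { (void)pg; return true; }

static bool type_writable(struct page *pg)
{
    unsigned long expect = PGT_none;

    /* Fast path: type and PGT_validated published in one cmpxchg. */
    return atomic_compare_exchange_strong(&pg->type_info, &expect,
                                          PGT_writable | PGT_validated);
}

static bool type_l1_table(struct page *pg, bool (*contents_ok)(struct page *))
{
    unsigned long expect = PGT_none;

    /* Claim the type first so nobody can retype the page meanwhile... */
    if ( !atomic_compare_exchange_strong(&pg->type_info, &expect, PGT_l1_table) )
        return false;
    /* ...then validate the contents... */
    if ( !contents_ok(pg) )
    {
        atomic_store(&pg->type_info, PGT_none);   /* validation failed */
        return false;
    }
    /* ...and only then publish PGT_validated. */
    atomic_fetch_or(&pg->type_info, PGT_validated);
    return true;
}
```

In this model a page can be observed typed-but-not-yet-validated only on the
slow path, which is exactly the window the PGT_validated check guards against.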

> Yes something along these lines is what I'm after. But "these flags"
> really is just PGT_validated.

Here's my take:

The sorts of "needs validation" types are PV-only I believe; if
mem_sharing is only for HVM guests, then the types should only be
PGT_writable, for which PGT_validated should always be set.

*If* we somehow do get to a situation where the type count > 0 but
PGT_validated is clear, something has probably gone terribly wrong; and
it would probably be much safer to just fail the page lock.

But perhaps that implies there should also be an ASSERT(!(x &
PGT_validated))?

 -George
Tamas K Lengyel May 3, 2019, 3:23 p.m. UTC | #5
On Fri, May 3, 2019 at 8:35 AM George Dunlap <george.dunlap@citrix.com> wrote:
>
> On 5/3/19 2:57 PM, Jan Beulich wrote:
> >>>> On 03.05.19 at 15:43, <tamas@tklengyel.com> wrote:
> >> On Fri, May 3, 2019 at 2:27 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>>
> >>>>>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
> >>>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>>> @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
> >>>>
> >>>>  #endif /* MEM_SHARING_AUDIT */
> >>>>
> >>>> -static inline int mem_sharing_page_lock(struct page_info *pg)
> >>>> +/*
> >>>> + * Private implementations of page_lock/unlock to bypass PV-only
> >>>> + * sanity checks not applicable to mem-sharing.
> >>>> + */
> >>>> +static inline bool _page_lock(struct page_info *page)
> >>>>  {
> >>>> -    int rc;
> >>>> +    unsigned long x, nx;
> >>>> +
> >>>> +    do {
> >>>> +        while ( (x = page->u.inuse.type_info) & PGT_locked )
> >>>> +            cpu_relax();
> >>>> +        nx = x + (1 | PGT_locked);
> >>>> +        if ( !(x & PGT_validated) ||
> >>>> +             !(x & PGT_count_mask) ||
> >>>> +             !(nx & PGT_count_mask) )
> >>>> +            return false;
> >>>
> >>> Just for my own understanding: Did you verify that the PGT_validated
> >>> check is indeed needed here, or did you copy it "just in case"? In the
> >>> latter case a comment may be worthwhile.
> >>
> >> This is an exact copy of page_lock, sans the asserts that break it
> >> from mem_sharing. I didn't investigate which of these flags are
> >> necessary for mem_sharing. Frankly, I don't fully understand their
> >> meaning and I haven't came across documentation about it yet.
>
> I hope to add some documentation sometime in the next 6 months or so.
> I've submitted a talk on the refcounting system to the XenSummit as well.

Great, looking forward to it!

>
> Short answer: PGT_validated means that the page in question has been
> validated as safe to use as the designated type.  For PGT_writable, this
> is instantaneous (i.e., the type is set to PGT_writable with the
> PGT_validated bit set in the same cmpxchg operation).
>
> Other types (such as PGT_l*_table) actually take time to verify, and so
> you need to first change the type to the new type (so nobody tries to
> use it as yet a different type), then verify that it's OK to use it as a
> page table (by recursively checking all the entries), then set the
> PGT_validated bit.
>
> > Yes something along these lines is what I'm after. But "these flags"
> > really is just PGT_validated.
>
> Here's my take:
>
> The sorts of "needs validation" types are PV-only I believe; if
> mem_sharing is only for HVM guests, then the types should only be
> PGT_writable, for which PGT_validated should always be set.

In which case it does sound like it's a superfluous check but it's
also harmless.

>
> *If* we somehow do get to a situation where the type count > 0 but
> PGT_validated is clear, something has probably gone terribly wrong; and
> it would probably be much safer to just fail the page lock.

Isn't that exactly what happens?

    if ( !(x & PGT_validated) ||
         !(x & PGT_count_mask) ||
         !(nx & PGT_count_mask) )
        return false;

>
> But perhaps that implies there should also be an ASSERT(!(x &
> PGT_validated))?

Well, if the lock fails we already BUG out when it's during runtime
deduplication, and return an error when it's an operation from the
toolstack. So I don't think we need that extra ASSERT.

Tamas

Patch

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4b3a094481..baae7ceeda 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -112,13 +112,48 @@  static inline void page_sharing_dispose(struct page_info *page)
 
 #endif /* MEM_SHARING_AUDIT */
 
-static inline int mem_sharing_page_lock(struct page_info *pg)
+/*
+ * Private implementations of page_lock/unlock to bypass PV-only
+ * sanity checks not applicable to mem-sharing.
+ */
+static inline bool _page_lock(struct page_info *page)
 {
-    int rc;
+    unsigned long x, nx;
+
+    do {
+        while ( (x = page->u.inuse.type_info) & PGT_locked )
+            cpu_relax();
+        nx = x + (1 | PGT_locked);
+        if ( !(x & PGT_validated) ||
+             !(x & PGT_count_mask) ||
+             !(nx & PGT_count_mask) )
+            return false;
+    } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
+
+    return true;
+}
+
+static inline void _page_unlock(struct page_info *page)
+{
+    unsigned long x, nx, y = page->u.inuse.type_info;
+
+    do {
+        x = y;
+        ASSERT((x & PGT_count_mask) && (x & PGT_locked));
+
+        nx = x - (1 | PGT_locked);
+        /* We must not drop the last reference here. */
+        ASSERT(nx & PGT_count_mask);
+    } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
+}
+
+static inline bool mem_sharing_page_lock(struct page_info *pg)
+{
+    bool rc;
     pg_lock_data_t *pld = &(this_cpu(__pld));
 
     page_sharing_mm_pre_lock();
-    rc = page_lock(pg);
+    rc = _page_lock(pg);
     if ( rc )
     {
         preempt_disable();
@@ -135,7 +170,7 @@  static inline void mem_sharing_page_unlock(struct page_info *pg)
     page_sharing_mm_unlock(pld->mm_unlock_level, 
                            &pld->recurse_count);
     preempt_enable();
-    page_unlock(pg);
+    _page_unlock(pg);
 }
 
 static inline shr_handle_t get_next_handle(void)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 6faa563167..7dc7e33f73 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -356,24 +356,12 @@  struct platform_bad_page {
 const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
 
 /* Per page locks:
- * page_lock() is used for two purposes: pte serialization, and memory sharing.
+ * page_lock() is used for pte serialization.
  *
  * All users of page lock for pte serialization live in mm.c, use it
  * to lock a page table page during pte updates, do not take other locks within
  * the critical section delimited by page_lock/unlock, and perform no
  * nesting.
- *
- * All users of page lock for memory sharing live in mm/mem_sharing.c. Page_lock
- * is used in memory sharing to protect addition (share) and removal (unshare)
- * of (gfn,domain) tupples to a list of gfn's that the shared page is currently
- * backing. Nesting may happen when sharing (and locking) two pages -- deadlock
- * is avoided by locking pages in increasing order.
- * All memory sharing code paths take the p2m lock of the affected gfn before
- * taking the lock for the underlying page. We enforce ordering between page_lock
- * and p2m_lock using an mm-locks.h construct.
- *
- * These two users (pte serialization and memory sharing) do not collide, since
- * sharing is only supported for hvm guests, which do not perform pv pte updates.
  */
 int page_lock(struct page_info *page);
 void page_unlock(struct page_info *page);