[2/3] x86/mem_sharing: replace use of page_lock/unlock with our own lock

Message ID 20190425153252.14795-2-tamas@tklengyel.com (mailing list archive)
State New, archived
Series [1/3] x86/mem_sharing: acquire extra references for pages with correct domain

Commit Message

Tamas K Lengyel April 25, 2019, 3:32 p.m. UTC
The page_lock/unlock functions were designed to work with PV pagetable
updates. Their recycled use in mem_sharing is somewhat odd and results in
unnecessary complications. Replace them with a separate per-page lock.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Roger Pau Monne <roger.pau@citrix.com>
---
 xen/arch/x86/mm/mem_sharing.c | 138 ++++++++++++++--------------------
 xen/include/asm-x86/mm.h      |   5 +-
 2 files changed, 60 insertions(+), 83 deletions(-)

Comments

Andrew Cooper April 25, 2019, 6:58 p.m. UTC | #1
On 25/04/2019 16:32, Tamas K Lengyel wrote:
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index 6faa563167..594de6148f 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -133,7 +133,10 @@ struct page_info
>           * of sharing share the version they expect to.
>           * This list is allocated and freed when a page is shared/unshared.
>           */
> -        struct page_sharing_info *sharing;
> +        struct {
> +            struct page_sharing_info *info;
> +            rwlock_t lock;
> +        } sharing;
>      };
>  
>      /* Reference count and various PGC_xxx flags and fields. */

I'm afraid this is a no-go, but for some reasons which are rather more
subtle than they appear here.

There is one struct page_info per 4k frame, and you've added an extra 16
bytes, taking it from 32 bytes to 48 bytes.

Other than making it a non-power-of-2 (net diffstat grow/shrink: 256/27
up/down: 7750/-5696 (2054) due to less efficient arithmetic[1]), the
frametable has a fixed virtual size (128G by default), so you've ended up
dropping Xen's memory limit (16TB by default) by 1/3.

~Andrew

[1] The drop is almost all in pdx_group_valid datastructure, which is a
function of the max supported memory limit.
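
To make the limit arithmetic concrete, here is a minimal standalone sketch
(illustrative only, not part of the patch or the thread) of how a fixed
128 GiB frametable window bounds the supportable RAM:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long frametable = 128ULL << 30; /* 128 GiB window */
        const unsigned long long page_size = 4096;          /* one 4k frame per entry */

        /* 32-byte entries: 128 GiB / 32 = 2^32 entries -> 16 TiB of RAM. */
        printf("32 B/entry: %llu TiB\n", frametable / 32 * page_size >> 40);

        /* 48-byte entries: 1/3 fewer entries -> ~10 TiB of RAM. */
        printf("48 B/entry: %llu TiB\n", frametable / 48 * page_size >> 40);

        return 0;
    }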
Tamas K Lengyel April 25, 2019, 7:57 p.m. UTC | #2
On Thu, Apr 25, 2019 at 12:58 PM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 25/04/2019 16:32, Tamas K Lengyel wrote:
> > diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> > index 6faa563167..594de6148f 100644
> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -133,7 +133,10 @@ struct page_info
> >           * of sharing share the version they expect to.
> >           * This list is allocated and freed when a page is shared/unshared.
> >           */
> > -        struct page_sharing_info *sharing;
> > +        struct {
> > +            struct page_sharing_info *info;
> > +            rwlock_t lock;
> > +        } sharing;
> >      };
> >
> >      /* Reference count and various PGC_xxx flags and fields. */
>
> I'm afraid this is a no-go, but for some reasons which are rather more
> subtle than they appear here.
>
> There is one struct page_info per 4k frame, and you've added an extra 16
> bytes, taking it from 32 bytes to 48 bytes.
>
> Other than making it a non-power-of-2 (net diffstat grow/shrink: 256/27
> up/down: 7750/-5696 (2054) due to less efficient arithmetic[1]), the
> frametable has a fixed virtual size (128G by default), so you've ended up
> dropping Xen's memory limit (16TB by default) by 1/3.

Interesting, I'm not sure I completely follow but does this mean that
this structure is now not allowed to grow, like ever, or that I just
need to add padding to fix up its alignment?

Tamas
Andrew Cooper April 25, 2019, 8:01 p.m. UTC | #3
On 25/04/2019 20:57, Tamas K Lengyel wrote:
> On Thu, Apr 25, 2019 at 12:58 PM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 25/04/2019 16:32, Tamas K Lengyel wrote:
>>> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
>>> index 6faa563167..594de6148f 100644
>>> --- a/xen/include/asm-x86/mm.h
>>> +++ b/xen/include/asm-x86/mm.h
>>> @@ -133,7 +133,10 @@ struct page_info
>>>           * of sharing share the version they expect to.
>>>           * This list is allocated and freed when a page is shared/unshared.
>>>           */
>>> -        struct page_sharing_info *sharing;
>>> +        struct {
>>> +            struct page_sharing_info *info;
>>> +            rwlock_t lock;
>>> +        } sharing;
>>>      };
>>>
>>>      /* Reference count and various PGC_xxx flags and fields. */
>> I'm afraid this is a no-go, but for some reasons which are rather more
>> subtle than they appear here.
>>
>> There is one struct page_info per 4k frame, and you've added an extra 16
>> bytes, taking it from 32 bytes to 48 bytes.
>>
>> Other than making it a non-power-of-2 (net diffstat grow/shrink: 256/27
>> up/down: 7750/-5696 (2054) due to less efficient arithmetic[1]), the
>> frametable has a fixed virtual size (128G by default), so you've ended up
>> dropping Xen's memory limit (16TB by default) by 1/3.
> Interesting, I'm not sure I completely follow but does this mean that
> this structure is now not allowed to grow, like ever, or that I just
> need to add padding to fix up its alignment?

Basically, it needs exceedingly good reasons to grow.  Even "supporting
more than 16T of RAM" wasn't a good enough reason to grow it from 32 to
64 bytes, which is why CONFIG_BIGMEM exists and is off by default.

At the moment, I don't have a good solution for your problem to suggest.

~Andrew
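
A build-time assertion is the usual way to keep a size budget like the
32-byte one above from regressing silently; a sketch only (whether Xen
carries exactly this check is not shown in this thread), placed in some
init function:

    /* Sketch: fail the build if struct page_info outgrows its budget.
     * The 32/64-byte split mirrors the CONFIG_BIGMEM trade-off above. */
    #ifdef CONFIG_BIGMEM
        BUILD_BUG_ON(sizeof(struct page_info) > 64);
    #else
        BUILD_BUG_ON(sizeof(struct page_info) > 32);
    #endif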
Tamas K Lengyel April 26, 2019, 12:12 a.m. UTC | #4
On Thu, Apr 25, 2019 at 2:01 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 25/04/2019 20:57, Tamas K Lengyel wrote:
> > On Thu, Apr 25, 2019 at 12:58 PM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 25/04/2019 16:32, Tamas K Lengyel wrote:
> >>> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> >>> index 6faa563167..594de6148f 100644
> >>> --- a/xen/include/asm-x86/mm.h
> >>> +++ b/xen/include/asm-x86/mm.h
> >>> @@ -133,7 +133,10 @@ struct page_info
> >>>           * of sharing share the version they expect to.
> >>>           * This list is allocated and freed when a page is shared/unshared.
> >>>           */
> >>> -        struct page_sharing_info *sharing;
> >>> +        struct {
> >>> +            struct page_sharing_info *info;
> >>> +            rwlock_t lock;
> >>> +        } sharing;
> >>>      };
> >>>
> >>>      /* Reference count and various PGC_xxx flags and fields. */
> >> I'm afraid this is a no-go, but for some reasons which are rather more
> >> subtle than they appear here.
> >>
> >> There is one struct page_info per 4k frame, and you've added an extra 16
> >> bytes, taking it from 32 bytes to 48 bytes.
> >>
> >> Other than making it a non-power-of-2 (net diffstat grow/shrink: 256/27
> >> up/down: 7750/-5696 (2054) due to less efficient arithmetic[1]), the
> >> frametable has a fixed virtual size (128G by default), so you've ended up
> >> dropping Xen's memory limit (16TB by default) by 1/3.
> > Interesting, I'm not sure I completely follow but does this mean that
> > this structure is now not allowed to grow, like ever, or that I just
> > need to add padding to fix up its alignment?
>
> Basically, it needs exceedingly good reasons to grow.  Even "supporting
> more than 16T of RAM" wasn't a good enough reason to grow it from 32 to
> 64 bytes, which is why CONFIG_BIGMEM exists and is off by default.
>
> At the moment, I don't have a good solution for your problem to suggest.
>

I would be OK with putting the whole thing behind
CONFIG_HAS_MEM_SHARING and having that be off by default. Is that a
feasible route from your POV?

Tamas
Jan Beulich April 26, 2019, 9:24 a.m. UTC | #5
>>> On 26.04.19 at 02:12, <tamas@tklengyel.com> wrote:
> I would be OK with putting the whole thing behind
> CONFIG_HAS_MEM_SHARING and having that be off by default. Is that a
> feasible route from your POV?

So is there anything wrong with my earlier suggestion of
re-purposing the sharing field to attach a structure to the page
which contains the necessary lock? I.e. in the simplest case by
adding the lock to struct page_sharing_info itself?

As to your question above - that would be another option, of
course with the config option getting its HAS_ part dropped.
Possibly it could then even default to enabled when BIGMEM=y.
But you realize that by going this route you further increase
the risk of changes elsewhere breaking mem-sharing without
anyone noticing right away?

Jan
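
The layout Jan is suggesting, as a sketch (the field set is inferred from
the uses visible in the patch below; the lock placement is the new, assumed
part):

    struct page_sharing_info
    {
        struct page_info *pg;   /* Back-pointer to the shared page. */
        shr_handle_t handle;    /* Handle for this page's sharing state. */
        rwlock_t lock;          /* New: kept in the xmalloc'd structure, so
                                 * struct page_info itself stays at 32 bytes. */
        /* ... rcu_head, gfn list / hash table, audit list entry as before ... */
    };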
Jan Beulich April 26, 2019, 9:26 a.m. UTC | #6
>>> On 25.04.19 at 21:57, <tamas@tklengyel.com> wrote:
> On Thu, Apr 25, 2019 at 12:58 PM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>>
>> On 25/04/2019 16:32, Tamas K Lengyel wrote:
>> > diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
>> > index 6faa563167..594de6148f 100644
>> > --- a/xen/include/asm-x86/mm.h
>> > +++ b/xen/include/asm-x86/mm.h
>> > @@ -133,7 +133,10 @@ struct page_info
>> >           * of sharing share the version they expect to.
>> >           * This list is allocated and freed when a page is 
> shared/unshared.
>> >           */
>> > -        struct page_sharing_info *sharing;
>> > +        struct {
>> > +            struct page_sharing_info *info;
>> > +            rwlock_t lock;
>> > +        } sharing;
>> >      };
>> >
>> >      /* Reference count and various PGC_xxx flags and fields. */
>>
>> I'm afraid this is a no-go, but for some reasons which are rather more
>> subtle than they appear here.
>>
>> There is one struct page_info per 4k frame, and you've added an extra 16
>> bytes, taking it from 32 bytes to 48 bytes.
>>
>> Other than making it a non-power-of-2 (net diffstat grow/shrink: 256/27
>> up/down: 7750/-5696 (2054) due to less efficient arithmetic[1]), the
>> frametable has a fixed virtual size (128G by default), so you've ended up
>> dropping Xen's memory limit (16TB by default) by 1/3.
> 
> Interesting, I'm not sure I completely follow but does this mean that
> this structure is now not allowed to grow, like ever, or that I just
> need to add padding to fix up its alignment?

In addition to what Andrew has said, also notice how we've gone
through extra hoops to avoid growing the structure in the context
of XSA-280.

Jan
Tamas K Lengyel April 26, 2019, 12:24 p.m. UTC | #7
On Fri, Apr 26, 2019 at 3:24 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.04.19 at 02:12, <tamas@tklengyel.com> wrote:
> > I would be OK with putting the whole thing behind
> > CONFIG_HAS_MEM_SHARING and having that be off by default. Is that a
> > feasible route from your POV?
>
> So is there anything wrong with my earlier suggestion of
> re-purposing the sharing field to attach a structure to the page
> which contains the necessary lock? I.e. in the simplest case by
> adding the lock to struct page_sharing_info itself?

Yes, that won't work unfortunately. The lock is supposed to protect
updates made to the structure but also freeing it. If the lock lives
within the structure it obviously would have to be unlocked before it's
freed, but if it's unlocked before being freed then another thread waiting on
it could continue without realizing it is being freed.

>
> As to your question above - that would be another option, of
> course with the config option getting its HAS_ part dropped.
> Possibly it could then even default to enabled when BIGMEM=y.
> But you realize that by going this route you further increase
> the risk of changes elsewhere breaking mem-sharing without
> anyone noticing right away?

I do realize that, but considering how much breakage happened to
mem-sharing even with it enabled, I don't think it makes much difference.
Without any tests in the CI system, compile-testing by itself is not
sufficient.

Tamas
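
The race Tamas describes, laid out as a timeline (an illustrative
interleaving, not code from the patch, assuming the lock-in-struct layout
from Jan's suggestion above):

    /*
     * CPU0 (tearing down sharing)          CPU1 (concurrent user)
     * ---------------------------          ----------------------
     * write_lock(&pg->sharing->lock);
     *                                      write_lock(&pg->sharing->lock);
     *                                        ... blocks, waiting for CPU0 ...
     * ... drop the last sharing state ...
     * write_unlock(&pg->sharing->lock);
     * xfree(pg->sharing);
     *                                      acquires the just-freed lock and
     *                                      proceeds: use-after-free.
     */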
Tamas K Lengyel April 26, 2019, 12:24 p.m. UTC | #8
On Fri, Apr 26, 2019 at 3:26 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 25.04.19 at 21:57, <tamas@tklengyel.com> wrote:
> > On Thu, Apr 25, 2019 at 12:58 PM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >>
> >> On 25/04/2019 16:32, Tamas K Lengyel wrote:
> >> > diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> >> > index 6faa563167..594de6148f 100644
> >> > --- a/xen/include/asm-x86/mm.h
> >> > +++ b/xen/include/asm-x86/mm.h
> >> > @@ -133,7 +133,10 @@ struct page_info
> >> >           * of sharing share the version they expect to.
> >> >           * This list is allocated and freed when a page is
> > shared/unshared.
> >> >           */
> >> > -        struct page_sharing_info *sharing;
> >> > +        struct {
> >> > +            struct page_sharing_info *info;
> >> > +            rwlock_t lock;
> >> > +        } sharing;
> >> >      };
> >> >
> >> >      /* Reference count and various PGC_xxx flags and fields. */
> >>
> >> I'm afraid this is a no-go, but for some reasons which are rather more
> >> subtle than they appear here.
> >>
> >> There is one struct page_info per 4k frame, and you've added an extra 16
> >> bytes, taking it from 32 bytes to 48 bytes.
> >>
> >> Other than making it a non-power-of-2 (net diffstat grow/shrink: 256/27
> >> up/down: 7750/-5696 (2054) due to less efficient arithmetic[1]), the
> >> frametable has a fixed virtual size (128G by default), so you've ended up
> >> dropping Xen's memory limit (16TB by default) by 1/3.
> >
> > Interesting, I'm not sure I completely follow but does this mean that
> > this structure is now not allowed to grow, like ever, or that I just
> > need to add padding to fix up its alignment?
>
> In addition to what Andrew has said, also notice how we've gone
> through extra hoops to avoid growing the structure in the context
> of XSA-280.

It looks to me at this point that I won't be able to replace the use of
page_lock/page_unlock with a separate lock in a meaningful way.

Tamas
Jan Beulich April 26, 2019, 1:12 p.m. UTC | #9
>>> On 26.04.19 at 14:24, <tamas@tklengyel.com> wrote:
> On Fri, Apr 26, 2019 at 3:24 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 26.04.19 at 02:12, <tamas@tklengyel.com> wrote:
>> > I would be OK with putting the whole thing behind
>> > CONFIG_HAS_MEM_SHARING and having that be off by default. Is that a
>> > feasible route from your POV?
>>
>> So is there anything wrong with my earlier suggestion of
>> re-purposing the sharing field to attach a structure to the page
>> which contains the necessary lock? I.e. in the simplest case by
>> adding the lock to struct page_sharing_info itself?
> 
> Yes, that won't work unfortunately. The lock is supposed to protect
> updates made to the structure but also freeing it. If the lock lives
> within the structure it obviously would have to be unlocked before it's
> freed, but if it's unlocked before being freed then another thread waiting on
> it could continue without realizing it is being freed.

Can't you RCU-free the structure instead, after detaching it from
the main struct page_info instance? Of course all involved parties
then need to be aware that once they've acquired the lock, the
pointer in struct page_info may have become NULL, which
presumably would direct them to drop the lock again right away.

Jan
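
The teardown half of this suggestion, as a minimal sketch (assumed code,
not part of the patch; it reuses the call_rcu()/_free_pg_shared_info
pattern mem_sharing.c already has for the audit list):

    static void sharing_detach_and_free(struct page_info *pg)
    {
        struct page_sharing_info *info = pg->sharing;

        write_lock(&info->lock);
        pg->sharing = NULL;     /* new lookups now see no sharing state */
        write_unlock(&info->lock);

        /* The structure, lock included, is only freed once every RCU
         * reader that may still hold a local pointer to it has finished. */
        call_rcu(&info->rcu_head, _free_pg_shared_info);
    }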
Tamas K Lengyel April 26, 2019, 1:18 p.m. UTC | #10
On Fri, Apr 26, 2019 at 7:12 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.04.19 at 14:24, <tamas@tklengyel.com> wrote:
> > On Fri, Apr 26, 2019 at 3:24 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 26.04.19 at 02:12, <tamas@tklengyel.com> wrote:
> >> > I would be OK with putting the whole thing behind
> >> > CONFIG_HAS_MEM_SHARING and having that be off by default. Is that a
> >> > feasible route from your POV?
> >>
> >> So is there anything wrong with my earlier suggestion of
> >> re-purposing the sharing field to attach a structure to the page
> >> which contains the necessary lock? I.e. in the simplest case by
> >> adding the lock to struct page_sharing_info itself?
> >
> > Yes, that won't work unfortunately. The lock is supposed to protect
> > updates made to the structure but also freeing it. If the lock lives
> > within the structure it obviously would have to be unlocked before it's
> > freed, but if it's unlocked before being freed then another thread waiting on
> > it could continue without realizing it is being freed.
>
> Can't you RCU-free the structure instead, after detaching it from
> the main struct page_info instance? Of course all involved parties
> then need to be aware that once they've acquired the lock, the
> pointer in struct page_info may have become NULL, which
> presumably would direct them to drop the lock again right away.

It's a chicken-and-the-egg problem. It wouldn't be safe without a lock
to do anything with that structure. Having a caller be able to grab
the lock but have an understanding that the structure - including the
lock itself - may be freed is not a feasible route. If it was possible
to do that kind-of coordination then we wouldn't need a lock in the
first place.

Tamas
Jan Beulich April 26, 2019, 2:47 p.m. UTC | #11
>>> On 26.04.19 at 15:18, <tamas@tklengyel.com> wrote:
> On Fri, Apr 26, 2019 at 7:12 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 26.04.19 at 14:24, <tamas@tklengyel.com> wrote:
>> > On Fri, Apr 26, 2019 at 3:24 AM Jan Beulich <JBeulich@suse.com> wrote:
>> >>
>> >> >>> On 26.04.19 at 02:12, <tamas@tklengyel.com> wrote:
>> >> > I would be OK with putting the whole thing behind
>> >> > CONFIG_HAS_MEM_SHARING and having that be off by default. Is that a
>> >> > feasible route from your POV?
>> >>
>> >> So is there anything wrong with my earlier suggestion of
>> >> re-purposing the sharing field to attach a structure to the page
>> >> which contains the necessary lock? I.e. in the simplest case by
>> >> adding the lock to struct page_sharing_info itself?
>> >
>> > Yes, that won't work unfortunately. The lock is supposed to protect
>> > updates made to the structure but also freeing it. If the lock lives
>> > within the structure it obviously would have to be unlocked before it's
>> > freed, but if it's unlocked before being freed then another thread waiting on
>> > it could continue without realizing it is being freed.
>>
>> Can't you RCU-free the structure instead, after detaching it from
>> the main struct page_info instance? Of course all involved parties
>> then need to be aware that once they've acquired the lock, the
>> pointer in struct page_info may have become NULL, which
>> presumably would direct them to drop the lock again right away.
> 
> It's a chicken-and-the-egg problem. It wouldn't be safe without a lock
> to do anything with that structure. Having a caller be able to grab
> the lock but have an understanding that the structure - including the
> lock itself - may be freed is not a feasible route.

But by using RCU the structure can't be freed behind the back of
anyone holding a lock. Parties observing the pointer to become
NULL could still (using a local copy of the pointer) access the
structure (and hence the lock).

> If it was possible
> to do that kind-of coordination then we wouldn't need a lock in the
> first place.

I'm not convinced of this, but I'm also not an RCU specialist, so
there may well be ways of getting things to work that way without
any lock.

Jan
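
And the lookup half Jan describes, again as a sketch (sharing_rcu_lock is
an assumed global read-lock object; in Xen, rcu_read_lock() takes such an
object explicitly):

    static struct page_sharing_info *sharing_lock(struct page_info *pg)
    {
        struct page_sharing_info *info;

        rcu_read_lock(&sharing_rcu_lock);   /* pin the object's lifetime */
        info = pg->sharing;                 /* local copy of the pointer */
        if ( info )
        {
            write_lock(&info->lock);
            if ( pg->sharing != info )      /* detached while we waited */
            {
                write_unlock(&info->lock);
                info = NULL;                /* treat as no longer shared */
            }
        }
        rcu_read_unlock(&sharing_rcu_lock);
        return info;    /* if non-NULL, the caller holds info->lock */
    }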
Tamas K Lengyel April 26, 2019, 3:09 p.m. UTC | #12
On Fri, Apr 26, 2019 at 8:47 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.04.19 at 15:18, <tamas@tklengyel.com> wrote:
> > On Fri, Apr 26, 2019 at 7:12 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 26.04.19 at 14:24, <tamas@tklengyel.com> wrote:
> >> > On Fri, Apr 26, 2019 at 3:24 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >>
> >> >> >>> On 26.04.19 at 02:12, <tamas@tklengyel.com> wrote:
> >> >> > I would be OK with putting the whole thing behind
> >> >> > CONFIG_HAS_MEM_SHARING and having that be off by default. Is that a
> >> >> > feasible route from your POV?
> >> >>
> >> >> So is there anything wrong with my earlier suggestion of
> >> >> re-purposing the sharing field to attach a structure to the page
> >> >> which contains the necessary lock? I.e. in the simplest case by
> >> >> adding the lock to struct page_sharing_info itself?
> >> >
> >> > Yes, that won't work unfortunately. The lock is supposed to protect
> >> > updates made to the structure but also freeing it. If the lock lives
> >> > within the structure it obviously would have to be unlocked before it's
> >> > freed, but if it's unlocked before being freed then another thread waiting on
> >> > it could continue without realizing it is being freed.
> >>
> >> Can't you RCU-free the structure instead, after detaching it from
> >> the main struct page_info instance? Of course all involved parties
> >> then need to be aware that once they've acquired the lock, the
> >> pointer in struct page_info may have become NULL, which
> >> presumably would direct them to drop the lock again right away.
> >
> > It's a chicken-and-the-egg problem. It wouldn't be safe without a lock
> > to do anything with that structure. Having a caller be able to grab
> > the lock but have an understanding that the structure - including the
> > lock itself - may be freed is not a feasible route.
>
> But by using RCU the structure can't be freed behind the back of
> anyone holding a lock. Parties observing the pointer to become
> NULL could still (using a local copy of the pointer) access the
> structure (and hence the lock).
>
> > If it was possible
> > to do that kind-of coordination then we wouldn't need a lock in the
> > first place.
>
> I'm not convinced of this, but I'm also not an RCU specialist, so
> there may well be ways of getting things to work that way without
> any lock.

Perhaps, but that's also way beyond what I have the bandwidth to
investigate at this time.

Tamas

Patch

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index dfc279d371..f63fe9a415 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -57,7 +57,7 @@  static DEFINE_PER_CPU(pg_lock_data_t, __pld);
 #define RMAP_HASHTAB_SIZE   \
         ((PAGE_SIZE << RMAP_HASHTAB_ORDER) / sizeof(struct list_head))
 #define RMAP_USES_HASHTAB(page) \
-        ((page)->sharing->hash_table.flag == NULL)
+        ((page)->sharing.info->hash_table.flag == NULL)
 #define RMAP_HEAVY_SHARED_PAGE   RMAP_HASHTAB_SIZE
 /* A bit of hysteresis. We don't want to be mutating between list and hash
  * table constantly. */
@@ -77,9 +77,9 @@  static void _free_pg_shared_info(struct rcu_head *head)
 
 static inline void audit_add_list(struct page_info *page)
 {
-    INIT_LIST_HEAD(&page->sharing->entry);
+    INIT_LIST_HEAD(&page->sharing.info->entry);
     spin_lock(&shr_audit_lock);
-    list_add_rcu(&page->sharing->entry, &shr_audit_list);
+    list_add_rcu(&page->sharing.info->entry, &shr_audit_list);
     spin_unlock(&shr_audit_lock);
 }
 
@@ -88,14 +88,14 @@  static inline void page_sharing_dispose(struct page_info *page)
 {
     /* Unlikely given our thresholds, but we should be careful. */
     if ( unlikely(RMAP_USES_HASHTAB(page)) )
-        free_xenheap_pages(page->sharing->hash_table.bucket, 
-                            RMAP_HASHTAB_ORDER);
+        free_xenheap_pages(page->sharing.info->hash_table.bucket,
+                           RMAP_HASHTAB_ORDER);
 
     spin_lock(&shr_audit_lock);
-    list_del_rcu(&page->sharing->entry);
+    list_del_rcu(&page->sharing.info->entry);
     spin_unlock(&shr_audit_lock);
-    INIT_RCU_HEAD(&page->sharing->rcu_head);
-    call_rcu(&page->sharing->rcu_head, _free_pg_shared_info);
+    INIT_RCU_HEAD(&page->sharing.info->rcu_head);
+    call_rcu(&page->sharing.info->rcu_head, _free_pg_shared_info);
 }
 
 #else
@@ -105,27 +105,22 @@  static inline void page_sharing_dispose(struct page_info *page)
 {
     /* Unlikely given our thresholds, but we should be careful. */
     if ( unlikely(RMAP_USES_HASHTAB(page)) )
-        free_xenheap_pages(page->sharing->hash_table.bucket, 
-                            RMAP_HASHTAB_ORDER);
-    xfree(page->sharing);
+        free_xenheap_pages(page->sharing.info->hash_table.bucket,
+                           RMAP_HASHTAB_ORDER);
+    xfree(page->sharing.info);
 }
 
 #endif /* MEM_SHARING_AUDIT */
 
-static inline int mem_sharing_page_lock(struct page_info *pg)
+static inline void mem_sharing_page_lock(struct page_info *pg)
 {
-    int rc;
     pg_lock_data_t *pld = &(this_cpu(__pld));
 
     page_sharing_mm_pre_lock();
-    rc = page_lock(pg);
-    if ( rc )
-    {
-        preempt_disable();
-        page_sharing_mm_post_lock(&pld->mm_unlock_level, 
-                                  &pld->recurse_count);
-    }
-    return rc;
+    write_lock(&pg->sharing.lock);
+    preempt_disable();
+    page_sharing_mm_post_lock(&pld->mm_unlock_level,
+                              &pld->recurse_count);
 }
 
 static inline void mem_sharing_page_unlock(struct page_info *pg)
@@ -135,7 +130,7 @@  static inline void mem_sharing_page_unlock(struct page_info *pg)
     page_sharing_mm_unlock(pld->mm_unlock_level, 
                            &pld->recurse_count);
     preempt_enable();
-    page_unlock(pg);
+    write_unlock(&pg->sharing.lock);
 }
 
 static inline shr_handle_t get_next_handle(void)
@@ -172,7 +167,7 @@  static inline void
 rmap_init(struct page_info *page)
 {
     /* We always start off as a doubly linked list. */
-    INIT_LIST_HEAD(&page->sharing->gfns);
+    INIT_LIST_HEAD(&page->sharing.info->gfns);
 }
 
 /* Exceedingly simple "hash function" */
@@ -194,7 +189,7 @@  rmap_list_to_hash_table(struct page_info *page)
     for ( i = 0; i < RMAP_HASHTAB_SIZE; i++ )
         INIT_LIST_HEAD(b + i);
 
-    list_for_each_safe(pos, tmp, &page->sharing->gfns)
+    list_for_each_safe(pos, tmp, &page->sharing.info->gfns)
     {
         gfn_info_t *gfn_info = list_entry(pos, gfn_info_t, list);
         struct list_head *bucket = b + HASH(gfn_info->domain, gfn_info->gfn);
@@ -202,8 +197,8 @@  rmap_list_to_hash_table(struct page_info *page)
         list_add(pos, bucket);
     }
 
-    page->sharing->hash_table.bucket = b;
-    page->sharing->hash_table.flag   = NULL;
+    page->sharing.info->hash_table.bucket = b;
+    page->sharing.info->hash_table.flag   = NULL;
 
     return 0;
 }
@@ -212,9 +207,9 @@  static inline void
 rmap_hash_table_to_list(struct page_info *page)
 {
     unsigned int i;
-    struct list_head *bucket = page->sharing->hash_table.bucket;
+    struct list_head *bucket = page->sharing.info->hash_table.bucket;
 
-    INIT_LIST_HEAD(&page->sharing->gfns);
+    INIT_LIST_HEAD(&page->sharing.info->gfns);
 
     for ( i = 0; i < RMAP_HASHTAB_SIZE; i++ )
     {
@@ -222,7 +217,7 @@  rmap_hash_table_to_list(struct page_info *page)
         list_for_each_safe(pos, tmp, head)
         {
             list_del(pos);
-            list_add(pos, &page->sharing->gfns);
+            list_add(pos, &page->sharing.info->gfns);
         }
     }
 
@@ -268,9 +263,9 @@  rmap_add(gfn_info_t *gfn_info, struct page_info *page)
         (void)rmap_list_to_hash_table(page);
 
     head = (RMAP_USES_HASHTAB(page)) ?
-        page->sharing->hash_table.bucket + 
+        page->sharing.info->hash_table.bucket +
                             HASH(gfn_info->domain, gfn_info->gfn) :
-        &page->sharing->gfns;
+        &page->sharing.info->gfns;
 
     INIT_LIST_HEAD(&gfn_info->list);
     list_add(&gfn_info->list, head);
@@ -284,8 +279,8 @@  rmap_retrieve(uint16_t domain_id, unsigned long gfn,
     struct list_head *le, *head;
 
     head = (RMAP_USES_HASHTAB(page)) ?
-        page->sharing->hash_table.bucket + HASH(domain_id, gfn) :
-        &page->sharing->gfns;
+        page->sharing.info->hash_table.bucket + HASH(domain_id, gfn) :
+        &page->sharing.info->gfns;
 
     list_for_each(le, head)
     {
@@ -322,8 +317,8 @@  static inline void
 rmap_seed_iterator(struct page_info *page, struct rmap_iterator *ri)
 {
     ri->curr = (RMAP_USES_HASHTAB(page)) ?
-                page->sharing->hash_table.bucket :
-                &page->sharing->gfns;
+                page->sharing.info->hash_table.bucket :
+                &page->sharing.info->gfns;
     ri->next = ri->curr->next; 
     ri->bucket = 0;
 }
@@ -332,8 +327,8 @@  static inline gfn_info_t *
 rmap_iterate(struct page_info *page, struct rmap_iterator *ri)
 {
     struct list_head *head = (RMAP_USES_HASHTAB(page)) ?
-                page->sharing->hash_table.bucket + ri->bucket :
-                &page->sharing->gfns;
+                page->sharing.info->hash_table.bucket + ri->bucket :
+                &page->sharing.info->gfns;
 
 retry:
     if ( ri->next == head)
@@ -344,7 +339,7 @@  retry:
             if ( ri->bucket >= RMAP_HASHTAB_SIZE )
                 /* No more hash table buckets */
                 return NULL;
-            head = page->sharing->hash_table.bucket + ri->bucket;
+            head = page->sharing.info->hash_table.bucket + ri->bucket;
             ri->curr = head;
             ri->next = ri->curr->next;
             goto retry;
@@ -398,12 +393,8 @@  static struct page_info* mem_sharing_lookup(unsigned long mfn)
         struct page_info* page = mfn_to_page(_mfn(mfn));
         if ( page_get_owner(page) == dom_cow )
         {
-            /* Count has to be at least two, because we're called
-             * with the mfn locked (1) and this is supposed to be 
-             * a shared page (1). */
             unsigned long t = read_atomic(&page->u.inuse.type_info);
             ASSERT((t & PGT_type_mask) == PGT_shared_page);
-            ASSERT((t & PGT_count_mask) >= 2);
             ASSERT(SHARED_M2P(get_gpfn_from_mfn(mfn)));
             return page;
         }
@@ -437,14 +428,7 @@  static int audit(void)
         pg = pg_shared_info->pg;
         mfn = page_to_mfn(pg);
 
-        /* If we can't lock it, it's definitely not a shared page */
-        if ( !mem_sharing_page_lock(pg) )
-        {
-           MEM_SHARING_DEBUG("mfn %lx in audit list, but cannot be locked (%lx)!\n",
-                              mfn_x(mfn), pg->u.inuse.type_info);
-           errors++;
-           continue;
-        }
+        mem_sharing_page_lock(pg);
 
         /* Check if the MFN has correct type, owner and handle. */ 
         if ( (pg->u.inuse.type_info & PGT_type_mask) != PGT_shared_page )
@@ -472,7 +456,7 @@  static int audit(void)
         }
 
         /* Check we have a list */
-        if ( (!pg->sharing) || !rmap_has_entries(pg) )
+        if ( (!pg->sharing.info) || !rmap_has_entries(pg) )
         {
            MEM_SHARING_DEBUG("mfn %lx shared, but empty gfn list!\n",
                              mfn_x(mfn));
@@ -517,8 +501,8 @@  static int audit(void)
             put_domain(d);
             nr_gfns++;
         }
-        /* The type count has an extra ref because we have locked the page */
-        if ( (nr_gfns + 1) != (pg->u.inuse.type_info & PGT_count_mask) )
+
+        if ( nr_gfns != (pg->u.inuse.type_info & PGT_count_mask) )
         {
             MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx."
                               "nr_gfns in list %lu, in type_info %lx\n",
@@ -622,6 +606,7 @@  static int page_make_sharable(struct domain *d,
         return -E2BIG;
     }
 
+    rwlock_init(&page->sharing.lock);
     page_set_owner(page, dom_cow);
     drop_dom_ref = !domain_adjust_tot_pages(d, -1);
     page_list_del(page, &d->page_list);
@@ -648,11 +633,7 @@  static int page_make_private(struct domain *d, struct page_info *page)
         return -EBUSY;
     }
 
-    /* We can only change the type if count is one */
-    /* Because we are locking pages individually, we need to drop
-     * the lock here, while the page is typed. We cannot risk the 
-     * race of page_unlock and then put_page_type. */
-    expected_type = (PGT_shared_page | PGT_validated | PGT_locked | 2);
+    expected_type = (PGT_shared_page | PGT_validated | 1);
     if ( page->u.inuse.type_info != expected_type )
     {
         spin_unlock(&d->page_alloc_lock);
@@ -688,10 +669,7 @@  static inline struct page_info *__grab_shared_page(mfn_t mfn)
         return NULL;
     pg = mfn_to_page(mfn);
 
-    /* If the page is not validated we can't lock it, and if it's  
-     * not validated it's obviously not shared. */
-    if ( !mem_sharing_page_lock(pg) )
-        return NULL;
+    mem_sharing_page_lock(pg);
 
     if ( mem_sharing_lookup(mfn_x(mfn)) == NULL )
     {
@@ -720,8 +698,7 @@  static int debug_mfn(mfn_t mfn)
             page->u.inuse.type_info,
             page_get_owner(page)->domain_id);
 
-    /* -1 because the page is locked and that's an additional type ref */
-    num_refs = ((int) (page->u.inuse.type_info & PGT_count_mask)) - 1;
+    num_refs = (int) (page->u.inuse.type_info & PGT_count_mask) ;
     mem_sharing_page_unlock(page);
     return num_refs;
 }
@@ -792,7 +769,7 @@  static int nominate_page(struct domain *d, gfn_t gfn,
                     gfn_x(gfn), mfn_x(mfn), d->domain_id);
             BUG();
         }
-        *phandle = pg->sharing->handle;
+        *phandle = pg->sharing.info->handle;
         ret = 0;
         mem_sharing_page_unlock(pg);
         goto out;
@@ -839,33 +816,30 @@  static int nominate_page(struct domain *d, gfn_t gfn,
     if ( ret ) 
         goto out;
 
-    /* Now that the page is validated, we can lock it. There is no 
-     * race because we're holding the p2m entry, so no one else 
+    /* There is no race because we're holding the p2m entry, so no one else
      * could be nominating this gfn */
-    ret = -ENOENT;
-    if ( !mem_sharing_page_lock(page) )
-        goto out;
+    mem_sharing_page_lock(page);
 
     /* Initialize the shared state */
     ret = -ENOMEM;
-    if ( (page->sharing = 
+    if ( (page->sharing.info =
             xmalloc(struct page_sharing_info)) == NULL )
     {
         /* Making a page private atomically unlocks it */
         BUG_ON(page_make_private(d, page) != 0);
         goto out;
     }
-    page->sharing->pg = page;
+    page->sharing.info->pg = page;
     rmap_init(page);
 
     /* Create the handle */
-    page->sharing->handle = get_next_handle();  
+    page->sharing.info->handle = get_next_handle();
 
     /* Create the local gfn info */
     if ( mem_sharing_gfn_alloc(page, d, gfn_x(gfn)) == NULL )
     {
-        xfree(page->sharing);
-        page->sharing = NULL;
+        xfree(page->sharing.info);
+        page->sharing.info = NULL;
         BUG_ON(page_make_private(d, page) != 0);
         goto out;
     }
@@ -879,7 +853,7 @@  static int nominate_page(struct domain *d, gfn_t gfn,
     /* Update m2p entry to SHARED_M2P_ENTRY */
     set_gpfn_from_mfn(mfn_x(mfn), SHARED_M2P_ENTRY);
 
-    *phandle = page->sharing->handle;
+    *phandle = page->sharing.info->handle;
     audit_add_list(page);
     mem_sharing_page_unlock(page);
     ret = 0;
@@ -949,14 +923,14 @@  static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
     ASSERT(cmfn_type == p2m_ram_shared);
 
     /* Check that the handles match */
-    if ( spage->sharing->handle != sh )
+    if ( spage->sharing.info->handle != sh )
     {
         ret = XENMEM_SHARING_OP_S_HANDLE_INVALID;
         mem_sharing_page_unlock(secondpg);
         mem_sharing_page_unlock(firstpg);
         goto err_out;
     }
-    if ( cpage->sharing->handle != ch )
+    if ( cpage->sharing.info->handle != ch )
     {
         ret = XENMEM_SHARING_OP_C_HANDLE_INVALID;
         mem_sharing_page_unlock(secondpg);
@@ -990,11 +964,11 @@  static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
         BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn));
         put_domain(d);
     }
-    ASSERT(list_empty(&cpage->sharing->gfns));
+    ASSERT(list_empty(&cpage->sharing.info->gfns));
 
     /* Clear the rest of the shared state */
     page_sharing_dispose(cpage);
-    cpage->sharing = NULL;
+    cpage->sharing.info = NULL;
 
     mem_sharing_page_unlock(secondpg);
     mem_sharing_page_unlock(firstpg);
@@ -1037,7 +1011,7 @@  int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
     ASSERT(smfn_type == p2m_ram_shared);
 
     /* Check that the handles match */
-    if ( spage->sharing->handle != sh )
+    if ( spage->sharing.info->handle != sh )
         goto err_unlock;
 
     /* Make sure the target page is a hole in the physmap. These are typically
@@ -1155,7 +1129,7 @@  int __mem_sharing_unshare_page(struct domain *d,
          * before destroying the rmap. */
         mem_sharing_gfn_destroy(page, d, gfn_info);
         page_sharing_dispose(page);
-        page->sharing = NULL;
+        page->sharing.info = NULL;
         atomic_dec(&nr_shared_mfns);
     }
     else
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 6faa563167..594de6148f 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -133,7 +133,10 @@  struct page_info
          * of sharing share the version they expect to.
          * This list is allocated and freed when a page is shared/unshared.
          */
-        struct page_sharing_info *sharing;
+        struct {
+            struct page_sharing_info *info;
+            rwlock_t lock;
+        } sharing;
     };
 
     /* Reference count and various PGC_xxx flags and fields. */