[PATCHv6,2/3] grant_table: convert grant table rwlock to percpu rwlock

Message ID 1453470107-27861-3-git-send-email-malcolm.crossley@citrix.com (mailing list archive)
State New, archived

Commit Message

Malcolm Crossley Jan. 22, 2016, 1:41 p.m. UTC
The per domain grant table read lock suffers from significant contention when
performing multi-queue block or network IO due to the parallel
grant maps/unmaps/copies occurring on the DomU's grant table.

On multi-socket systems, the contention results in the locked compare-and-swap
operation failing frequently, which results in a tight loop of retries of the
compare-and-swap operation. As the coherency fabric can only sustain a limited
rate of compare-and-swap operations on a particular data location, taking
the read lock itself becomes a bottleneck for grant operations.

Standard rwlock performance of a single VIF VM-VM transfer with 16 queues
configured was limited to approximately 15 Gbit/s on a 2 socket Haswell-EP
host.

Percpu rwlock performance with the same configuration is approximately
48 Gbit/s.

Oprofile was used to determine the initial overhead of the read-write locks
and to confirm the overhead was dramatically reduced by the percpu rwlocks.
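
To illustrate why a per-CPU scheme avoids the compare-and-swap bottleneck
described above, here is a minimal user-space sketch in C11. It is not the
Xen implementation: sketch_lock, reader_slot, NR_SLOTS and the sketch_*
functions are hypothetical names, the "CPU" is just an array index supplied
by the caller, and the try-lock form stands in for a real blocking lock. It
also assumes each slot is used by at most one reader at a time (no
recursion). The point it tries to show is that a reader only stores to its
own cache line, while the writer pays the cost of scanning every slot.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 4  /* hypothetical CPU count, for illustration only */

/* The shared part of the lock: readers only ever load this flag. */
struct sketch_lock {
    atomic_bool writer_active;
};

/* One slot per CPU, each aligned to its own (assumed 64 byte) cache line. */
struct reader_slot {
    _Alignas(64) _Atomic(struct sketch_lock *) holding;
};
static struct reader_slot slots[NR_SLOTS];

/* Reader fast path: publish in the local slot, then check for a writer. */
static bool sketch_read_trylock(struct sketch_lock *l, unsigned int cpu)
{
    atomic_store(&slots[cpu].holding, l);
    if (atomic_load(&l->writer_active)) {
        /* A writer is active: back out so it can make progress. */
        atomic_store(&slots[cpu].holding, NULL);
        return false;
    }
    return true;
}

static void sketch_read_unlock(unsigned int cpu)
{
    atomic_store(&slots[cpu].holding, NULL);
}

/* Writer slow path: claim the flag, then check that no slot holds the lock. */
static bool sketch_write_trylock(struct sketch_lock *l)
{
    bool expected = false;

    if (!atomic_compare_exchange_strong(&l->writer_active, &expected, true))
        return false;  /* another writer already holds the flag */

    for (unsigned int cpu = 0; cpu < NR_SLOTS; cpu++) {
        if (atomic_load(&slots[cpu].holding) == l) {
            /* A reader is still inside: release the flag and give up. */
            atomic_store(&l->writer_active, false);
            return false;
        }
    }
    return true;
}

static void sketch_write_unlock(struct sketch_lock *l)
{
    atomic_store(&l->writer_active, false);
}
```

In this sketch two readers on different slots never write to the same
location, so the read side needs no compare-and-swap at all; the only
compare-and-swap left is on the (rare) write side.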

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
--
Changes since v5:
 - None
Changes since v4:
 - Rename grant table rwlock wrappers and use grant table pointer as argument
Changes since v3:
 - None
Changes since v2:
 - Switched to using wrappers for taking percpu rwlock
 - Added percpu structure pointer to percpu rwlock initialisation
 - Added comment on removal of ASSERTS for grant table rw_is_locked()
Changes since v1:
 - Used new macros provided in updated percpu rwlock v2 patch
 - Converted grant table rwlock_t to percpu_rwlock_t
 - Patched a missed grant table rwlock_t usage site
---
 xen/arch/arm/mm.c             |   4 +-
 xen/arch/x86/mm.c             |   4 +-
 xen/common/grant_table.c      | 124 +++++++++++++++++++++++-------------------
 xen/include/xen/grant_table.h |  24 +++++++-
 4 files changed, 96 insertions(+), 60 deletions(-)

Comments

Jan Beulich Jan. 22, 2016, 3:15 p.m. UTC | #1
>>> On 22.01.16 at 14:41, <malcolm.crossley@citrix.com> wrote:
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -178,6 +178,8 @@ struct active_grant_entry {
>  #define _active_entry(t, e) \
>      ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
>  
> +DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
> +
>  static inline void gnttab_flush_tlb(const struct domain *d)
>  {
>      if ( !paging_mode_external(d) )
> @@ -208,7 +210,13 @@ active_entry_acquire(struct grant_table *t, grant_ref_t e)
>  {
>      struct active_grant_entry *act;
>  
> -    ASSERT(rw_is_locked(&t->lock));
> +    /* 
> +     * The grant table for the active entry should be locked but the 
> +     * percpu rwlock cannot be checked for read lock without race conditions
> +     * or high overhead so we cannot use an ASSERT 
> +     *
> +     *   ASSERT(rw_is_locked(&t->lock));
> +     */

There are a number of trailing blanks being added here (and further
down), which I'm fixing up as I'm in the process of applying this. The
reason I noticed though is that this hunk ...

> @@ -660,7 +668,13 @@ static int grant_map_exists(const struct domain *ld,
>  {
>      unsigned int ref, max_iter;
>  
> -    ASSERT(rw_is_locked(&rgt->lock));
> +    /* 
> +     * The remote grant table should be locked but the percpu rwlock
> +     * cannot be checked for read lock without race conditions or high 
> +     * overhead so we cannot use an ASSERT 
> +     *
> +     *   ASSERT(rw_is_locked(&rgt->lock));
> +     */
>  
>      max_iter = min(*ref_count + (1 << GNTTABOP_CONTINUATION_ARG_SHIFT),
>                     nr_grant_entries(rgt));

... doesn't apply at all due to being white space damaged (the line
immediately preceding the ASSERT() which gets removed actually
has four blanks on it in the source tree, which is wrong, but should
nevertheless be reflected in your patch). Given the other trailing
whitespace found above I can also rule out the mail system having
eaten that white space on the way here, so I really wonder which
tree this patch got created against.

Considering the hassle with the first commit attempt yesterday,
may I please ask that you apply a little more care?

Thanks, Jan
Ian Campbell Jan. 22, 2016, 3:22 p.m. UTC | #2
On Fri, 2016-01-22 at 08:15 -0700, Jan Beulich wrote:
> 
> There are a number of trailing blanks being added here (and further
> down), which I'm fixing up as I'm in the process of applying this.

Aside: Do you know about "git am --whitespace=fix" ? It automates the
removal of trailing whitespace...

Ian.
Jan Beulich Jan. 22, 2016, 3:34 p.m. UTC | #3
>>> On 22.01.16 at 16:22, <ian.campbell@citrix.com> wrote:
> On Fri, 2016-01-22 at 08:15 -0700, Jan Beulich wrote:
>> 
>> There are a number of trailing blanks being added here (and further
>> down), which I'm fixing up as I'm in the process of applying this.
> 
> Aside: Do you know about "git am --whitespace=fix" ? It automates the
> removal of trailing whitespace...

No, I didn't, but it'd be maximally useful only if I could store this as
the default into ~/.gitconfig (and maybe that's possible, just that
my git foo is too lame). Besides that I'm also not always using "git
am", not the least because what my mail frontend saves is not
always compatible with that command (leading to lost metadata).

Jan
Ian Campbell Jan. 22, 2016, 3:53 p.m. UTC | #4
On Fri, 2016-01-22 at 08:34 -0700, Jan Beulich wrote:
> >>> On 22.01.16 at 16:22, <ian.campbell@citrix.com> wrote:
> > On Fri, 2016-01-22 at 08:15 -0700, Jan Beulich wrote:
> >>
> >> There are a number of trailing blanks being added here (and further
> >> down), which I'm fixing up as I'm in the process of applying this.
> >
> > Aside: Do you know about "git am --whitespace=fix" ? It automates the
> > removal of trailing whitespace...
>
> No, I didn't, but it'd be maximally useful only if I could store this as
> the default into ~/.gitconfig (and maybe that's possible, just that
> my git foo is too lame).

It ends up being a git apply option, so it looks like

apply.whitespace = "fix"

is the answer.

I don't do that and tend to just rerun if git am complains about
whitespace (which is the default) and it looks like it is worth fixing,
or is correct to fix, since checked-in patches can have deliberate trailing
whitespace on the context lines which you don't want to squash.

IME trailing whitespace in patches is actually astonishingly rare in
practice.

>  Besides that I'm also not always using "git
> am", not the least because what my mail frontend saves is not
> always compatible with that command (leading to lost metadata).

I use a sneaky trick, which is that the bug tracker will serve up raw,
unadulterated messages sent to xen-devel by message-id:

curl --silent http://bugs.xenproject.org/xen/mid/<msg@id>/raw

Doesn't help if the patch didn't go to xen-devel, but most of the ones
I'm interested in do.

I actually use the skanky script below which has a little bit of smarts
to do series at a time if they were sent with git send-email...

	git-msgid -g '1 10' '<git-send-email-1-blah@example.com>' | git am

Ian.

8<------

#!/bin/bash

help()
{
    echo "help!" 1>&2
}

GIT=
while getopts g:h OPT ; do
        case $OPT in
            g)  GIT="$OPTARG" ;;
            h)  help ; exit 1 ;;
            \?) exit 1 ;;
        esac
done
shift $((OPTIND - 1))

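fetch_messages()
{
    for i in "$@" ; do
        echo "Fetching Message ID $i" 1>&2
        # X is not set in this script; when it is set in the environment,
        # read from the local mail drop over ssh instead of the bug tracker.
        if [ -n "$X" ] ; then
            ssh celaeno cat /srv/mldrop/xen-devel/"\"$i\""
        else
            #wget -O - -q http://bugs.xenproject.org/xen/mid/"$i"/raw
            # URL-encode every '+' in the Message ID.
            i=${i//+/%2B}
            curl --silent http://bugs.xenproject.org/xen/mid/"$i"/raw
        fi
    done
}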

if [ -z "$GIT" ] ; then
    fetch_messages "$@"
else
    #<1349427871-31195-4-git-send-email-anthony.perard@citrix.com>
    for i in "$@" ; do
        PATTERN=$(echo "$i" | sed -e 's/^\(<[0-9]*-[0-9]*-\)[0-9]*\(-.*>\)/\1@@NR@@\2/g')
        echo "GIT pattern $PATTERN" 1>&2
        # $GIT is deliberately unquoted so that '1 10' becomes "seq 1 10".
        for n in $(seq $GIT) ; do
            MSG=$(echo "$PATTERN" | sed -e "s/@@NR@@/$n/")
            fetch_messages "$MSG"
        done
    done
fi

Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 47bfb27..81f9e2e 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1055,7 +1055,7 @@  int xenmem_add_to_physmap_one(
     switch ( space )
     {
     case XENMAPSPACE_grant_table:
-        write_lock(&d->grant_table->lock);
+        grant_write_lock(d->grant_table);
 
         if ( d->grant_table->gt_version == 0 )
             d->grant_table->gt_version = 1;
@@ -1085,7 +1085,7 @@  int xenmem_add_to_physmap_one(
 
         t = p2m_ram_rw;
 
-        write_unlock(&d->grant_table->lock);
+        grant_write_unlock(d->grant_table);
         break;
     case XENMAPSPACE_shared_info:
         if ( idx != 0 )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b81d1fd..a5a9b6f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4671,7 +4671,7 @@  int xenmem_add_to_physmap_one(
                 mfn = virt_to_mfn(d->shared_info);
             break;
         case XENMAPSPACE_grant_table:
-            write_lock(&d->grant_table->lock);
+            grant_write_lock(d->grant_table);
 
             if ( d->grant_table->gt_version == 0 )
                 d->grant_table->gt_version = 1;
@@ -4693,7 +4693,7 @@  int xenmem_add_to_physmap_one(
                     mfn = virt_to_mfn(d->grant_table->shared_raw[idx]);
             }
 
-            write_unlock(&d->grant_table->lock);
+            grant_write_unlock(d->grant_table);
             break;
         case XENMAPSPACE_gmfn_range:
         case XENMAPSPACE_gmfn:
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 5d52d1e..6a536d2 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -178,6 +178,8 @@  struct active_grant_entry {
 #define _active_entry(t, e) \
     ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
 
+DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
+
 static inline void gnttab_flush_tlb(const struct domain *d)
 {
     if ( !paging_mode_external(d) )
@@ -208,7 +210,13 @@  active_entry_acquire(struct grant_table *t, grant_ref_t e)
 {
     struct active_grant_entry *act;
 
-    ASSERT(rw_is_locked(&t->lock));
+    /* 
+     * The grant table for the active entry should be locked but the 
+     * percpu rwlock cannot be checked for read lock without race conditions
+     * or high overhead so we cannot use an ASSERT 
+     *
+     *   ASSERT(rw_is_locked(&t->lock));
+     */
 
     act = &_active_entry(t, e);
     spin_lock(&act->lock);
@@ -270,23 +278,23 @@  double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
      */
     if ( lgt < rgt )
     {
-        write_lock(&lgt->lock);
-        write_lock(&rgt->lock);
+        grant_write_lock(lgt);
+        grant_write_lock(rgt);
     }
     else
     {
         if ( lgt != rgt )
-            write_lock(&rgt->lock);
-        write_lock(&lgt->lock);
+            grant_write_lock(rgt);
+        grant_write_lock(lgt);
     }
 }
 
 static inline void
 double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
 {
-    write_unlock(&lgt->lock);
+    grant_write_unlock(lgt);
     if ( lgt != rgt )
-        write_unlock(&rgt->lock);
+        grant_write_unlock(rgt);
 }
 
 static inline int
@@ -660,7 +668,13 @@  static int grant_map_exists(const struct domain *ld,
 {
     unsigned int ref, max_iter;
 
-    ASSERT(rw_is_locked(&rgt->lock));
+    /* 
+     * The remote grant table should be locked but the percpu rwlock
+     * cannot be checked for read lock without race conditions or high 
+     * overhead so we cannot use an ASSERT 
+     *
+     *   ASSERT(rw_is_locked(&rgt->lock));
+     */
 
     max_iter = min(*ref_count + (1 << GNTTABOP_CONTINUATION_ARG_SHIFT),
                    nr_grant_entries(rgt));
@@ -703,12 +717,12 @@  static unsigned int mapkind(
      * Must have the local domain's grant table write lock when
      * iterating over its maptrack entries.
      */
-    ASSERT(rw_is_write_locked(&lgt->lock));
+    ASSERT(percpu_rw_is_write_locked(&lgt->lock));
     /*
      * Must have the remote domain's grant table write lock while
      * counting its active entries.
      */
-    ASSERT(rw_is_write_locked(&rd->grant_table->lock));
+    ASSERT(percpu_rw_is_write_locked(&rd->grant_table->lock));
 
     for ( handle = 0; !(kind & MAPKIND_WRITE) &&
                       handle < lgt->maptrack_limit; handle++ )
@@ -796,7 +810,7 @@  __gnttab_map_grant_ref(
     }
 
     rgt = rd->grant_table;
-    read_lock(&rgt->lock);
+    grant_read_lock(rgt);
 
     /* Bounds check on the grant ref */
     if ( unlikely(op->ref >= nr_grant_entries(rgt)))
@@ -859,7 +873,7 @@  __gnttab_map_grant_ref(
     cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
 
     active_entry_release(act);
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
 
     /* pg may be set, with a refcount included, from __get_paged_frame */
     if ( !pg )
@@ -1006,7 +1020,7 @@  __gnttab_map_grant_ref(
         put_page(pg);
     }
 
-    read_lock(&rgt->lock);
+    grant_read_lock(rgt);
 
     act = active_entry_acquire(rgt, op->ref);
 
@@ -1029,7 +1043,7 @@  __gnttab_map_grant_ref(
     active_entry_release(act);
 
  unlock_out:
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
     op->status = rc;
     put_maptrack_handle(lgt, handle);
     rcu_unlock_domain(rd);
@@ -1080,18 +1094,18 @@  __gnttab_unmap_common(
 
     op->map = &maptrack_entry(lgt, op->handle);
 
-    read_lock(&lgt->lock);
+    grant_read_lock(lgt);
 
     if ( unlikely(!read_atomic(&op->map->flags)) )
     {
-        read_unlock(&lgt->lock);
+        grant_read_unlock(lgt);
         gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle);
         op->status = GNTST_bad_handle;
         return;
     }
 
     dom = op->map->domid;
-    read_unlock(&lgt->lock);
+    grant_read_unlock(lgt);
 
     if ( unlikely((rd = rcu_lock_domain_by_id(dom)) == NULL) )
     {
@@ -1113,7 +1127,7 @@  __gnttab_unmap_common(
 
     rgt = rd->grant_table;
 
-    read_lock(&rgt->lock);
+    grant_read_lock(rgt);
 
     op->flags = read_atomic(&op->map->flags);
     if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
@@ -1165,7 +1179,7 @@  __gnttab_unmap_common(
  act_release_out:
     active_entry_release(act);
  unmap_out:
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
 
     if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) )
     {
@@ -1220,7 +1234,7 @@  __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
     rcu_lock_domain(rd);
     rgt = rd->grant_table;
 
-    read_lock(&rgt->lock);
+    grant_read_lock(rgt);
     if ( rgt->gt_version == 0 )
         goto unlock_out;
 
@@ -1286,7 +1300,7 @@  __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
  act_release_out:
     active_entry_release(act);
  unlock_out:
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
 
     if ( put_handle )
     {
@@ -1585,7 +1599,7 @@  gnttab_setup_table(
     }
 
     gt = d->grant_table;
-    write_lock(&gt->lock);
+    grant_write_lock(gt);
 
     if ( gt->gt_version == 0 )
         gt->gt_version = 1;
@@ -1613,7 +1627,7 @@  gnttab_setup_table(
     }
 
  out3:
-    write_unlock(&gt->lock);
+    grant_write_unlock(gt);
  out2:
     rcu_unlock_domain(d);
  out1:
@@ -1655,13 +1669,13 @@  gnttab_query_size(
         goto query_out_unlock;
     }
 
-    read_lock(&d->grant_table->lock);
+    grant_read_lock(d->grant_table);
 
     op.nr_frames     = nr_grant_frames(d->grant_table);
     op.max_nr_frames = max_grant_frames;
     op.status        = GNTST_okay;
 
-    read_unlock(&d->grant_table->lock);
+    grant_read_unlock(d->grant_table);
 
  
  query_out_unlock:
@@ -1687,7 +1701,7 @@  gnttab_prepare_for_transfer(
     union grant_combo   scombo, prev_scombo, new_scombo;
     int                 retries = 0;
 
-    read_lock(&rgt->lock);
+    grant_read_lock(rgt);
 
     if ( unlikely(ref >= nr_grant_entries(rgt)) )
     {
@@ -1730,11 +1744,11 @@  gnttab_prepare_for_transfer(
         scombo = prev_scombo;
     }
 
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
     return 1;
 
  fail:
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
     return 0;
 }
 
@@ -1925,7 +1939,7 @@  gnttab_transfer(
         TRACE_1D(TRC_MEM_PAGE_GRANT_TRANSFER, e->domain_id);
 
         /* Tell the guest about its new page frame. */
-        read_lock(&e->grant_table->lock);
+        grant_read_lock(e->grant_table);
         act = active_entry_acquire(e->grant_table, gop.ref);
 
         if ( e->grant_table->gt_version == 1 )
@@ -1949,7 +1963,7 @@  gnttab_transfer(
             GTF_transfer_completed;
 
         active_entry_release(act);
-        read_unlock(&e->grant_table->lock);
+        grant_read_unlock(e->grant_table);
 
         rcu_unlock_domain(e);
 
@@ -1987,7 +2001,7 @@  __release_grant_for_copy(
     released_read = 0;
     released_write = 0;
 
-    read_lock(&rgt->lock);
+    grant_read_lock(rgt);
 
     act = active_entry_acquire(rgt, gref);
     sha = shared_entry_header(rgt, gref);
@@ -2029,7 +2043,7 @@  __release_grant_for_copy(
     }
 
     active_entry_release(act);
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
 
     if ( td != rd )
     {
@@ -2086,7 +2100,7 @@  __acquire_grant_for_copy(
 
     *page = NULL;
 
-    read_lock(&rgt->lock);
+    grant_read_lock(rgt);
 
     if ( unlikely(gref >= nr_grant_entries(rgt)) )
         PIN_FAIL(gt_unlock_out, GNTST_bad_gntref,
@@ -2168,20 +2182,20 @@  __acquire_grant_for_copy(
              * here and reacquire
              */
             active_entry_release(act);
-            read_unlock(&rgt->lock);
+            grant_read_unlock(rgt);
 
             rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
                                           readonly, &grant_frame, page,
                                           &trans_page_off, &trans_length, 0);
 
-            read_lock(&rgt->lock);
+            grant_read_lock(rgt);
             act = active_entry_acquire(rgt, gref);
 
             if ( rc != GNTST_okay ) {
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
                 active_entry_release(act);
-                read_unlock(&rgt->lock);
+                grant_read_unlock(rgt);
                 return rc;
             }
 
@@ -2194,7 +2208,7 @@  __acquire_grant_for_copy(
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
                 active_entry_release(act);
-                read_unlock(&rgt->lock);
+                grant_read_unlock(rgt);
                 put_page(*page);
                 return __acquire_grant_for_copy(rd, gref, ldom, readonly,
                                                 frame, page, page_off, length,
@@ -2258,7 +2272,7 @@  __acquire_grant_for_copy(
     *frame = act->frame;
 
     active_entry_release(act);
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
     return rc;
  
  unlock_out_clear:
@@ -2273,7 +2287,7 @@  __acquire_grant_for_copy(
     active_entry_release(act);
 
  gt_unlock_out:
-    read_unlock(&rgt->lock);
+    grant_read_unlock(rgt);
 
     return rc;
 }
@@ -2589,7 +2603,7 @@  gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     if ( gt->gt_version == op.version )
         goto out;
 
-    write_lock(&gt->lock);
+    grant_write_lock(gt);
     /*
      * Make sure that the grant table isn't currently in use when we
      * change the version number, except for the first 8 entries which
@@ -2702,7 +2716,7 @@  gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     gt->gt_version = op.version;
 
  out_unlock:
-    write_unlock(&gt->lock);
+    grant_write_unlock(gt);
 
  out:
     op.version = gt->gt_version;
@@ -2758,7 +2772,7 @@  gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
 
     op.status = GNTST_okay;
 
-    read_lock(&gt->lock);
+    grant_read_lock(gt);
 
     for ( i = 0; i < op.nr_frames; i++ )
     {
@@ -2767,7 +2781,7 @@  gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
             op.status = GNTST_bad_virt_addr;
     }
 
-    read_unlock(&gt->lock);
+    grant_read_unlock(gt);
 out2:
     rcu_unlock_domain(d);
 out1:
@@ -2817,7 +2831,7 @@  __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     struct active_grant_entry *act_b = NULL;
     s16 rc = GNTST_okay;
 
-    write_lock(&gt->lock);
+    grant_write_lock(gt);
 
     /* Bounds check on the grant refs */
     if ( unlikely(ref_a >= nr_grant_entries(d->grant_table)))
@@ -2865,7 +2879,7 @@  out:
         active_entry_release(act_b);
     if ( act_a != NULL )
         active_entry_release(act_a);
-    write_unlock(&gt->lock);
+    grant_write_unlock(gt);
 
     rcu_unlock_domain(d);
 
@@ -2936,12 +2950,12 @@  static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
 
     if ( d != owner )
     {
-        read_lock(&owner->grant_table->lock);
+        grant_read_lock(owner->grant_table);
 
         ret = grant_map_exists(d, owner->grant_table, mfn, ref_count);
         if ( ret != 0 )
         {
-            read_unlock(&owner->grant_table->lock);
+            grant_read_unlock(owner->grant_table);
             rcu_unlock_domain(d);
             put_page(page);
             return ret;
@@ -2961,7 +2975,7 @@  static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
         ret = 0;
 
     if ( d != owner )
-        read_unlock(&owner->grant_table->lock);
+        grant_read_unlock(owner->grant_table);
     unmap_domain_page(v);
     put_page(page);
 
@@ -3180,7 +3194,7 @@  grant_table_create(
         goto no_mem_0;
 
     /* Simple stuff. */
-    rwlock_init(&t->lock);
+    percpu_rwlock_resource_init(&t->lock, grant_rwlock);
     spin_lock_init(&t->maptrack_lock);
     t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
 
@@ -3282,7 +3296,7 @@  gnttab_release_mappings(
         }
 
         rgt = rd->grant_table;
-        read_lock(&rgt->lock);
+        grant_read_lock(rgt);
 
         act = active_entry_acquire(rgt, ref);
         sha = shared_entry_header(rgt, ref);
@@ -3343,7 +3357,7 @@  gnttab_release_mappings(
             gnttab_clear_flag(_GTF_reading, status);
 
         active_entry_release(act);
-        read_unlock(&rgt->lock);
+        grant_read_unlock(rgt);
 
         rcu_unlock_domain(rd);
 
@@ -3360,7 +3374,7 @@  void grant_table_warn_active_grants(struct domain *d)
 
 #define WARN_GRANT_MAX 10
 
-    read_lock(&gt->lock);
+    grant_read_lock(gt);
 
     for ( ref = 0; ref != nr_grant_entries(gt); ref++ )
     {
@@ -3382,7 +3396,7 @@  void grant_table_warn_active_grants(struct domain *d)
         printk(XENLOG_G_DEBUG "Dom%d has too many (%d) active grants to report\n",
                d->domain_id, nr_active);
 
-    read_unlock(&gt->lock);
+    grant_read_unlock(gt);
 
 #undef WARN_GRANT_MAX
 }
@@ -3432,7 +3446,7 @@  static void gnttab_usage_print(struct domain *rd)
     printk("      -------- active --------       -------- shared --------\n");
     printk("[ref] localdom mfn      pin          localdom gmfn     flags\n");
 
-    read_lock(&gt->lock);
+    grant_read_lock(gt);
 
     for ( ref = 0; ref != nr_grant_entries(gt); ref++ )
     {
@@ -3475,7 +3489,7 @@  static void gnttab_usage_print(struct domain *rd)
         active_entry_release(act);
     }
 
-    read_unlock(&gt->lock);
+    grant_read_unlock(gt);
 
     if ( first )
         printk("grant-table for remote domain:%5d ... "
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 1c29cee..b4f064e 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -51,13 +51,15 @@ 
 /* The maximum size of a grant table. */
 extern unsigned int max_grant_frames;
 
+DECLARE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
+
 /* Per-domain grant information. */
 struct grant_table {
     /*
      * Lock protecting updates to grant table state (version, active
      * entry list, etc.)
      */
-    rwlock_t              lock;
+    percpu_rwlock_t       lock;
     /* Table size. Number of frames shared with guest */
     unsigned int          nr_grant_frames;
     /* Shared grant table (see include/public/grant_table.h). */
@@ -82,6 +84,26 @@  struct grant_table {
     unsigned              gt_version;
 };
 
+static inline void grant_read_lock(struct grant_table *gt)
+{
+    percpu_read_lock(grant_rwlock, &gt->lock);
+}
+
+static inline void grant_read_unlock(struct grant_table *gt)
+{
+    percpu_read_unlock(grant_rwlock, &gt->lock);
+}
+
+static inline void grant_write_lock(struct grant_table *gt)
+{
+    percpu_write_lock(grant_rwlock, &gt->lock);
+}
+
+static inline void grant_write_unlock(struct grant_table *gt)
+{
+    percpu_write_unlock(grant_rwlock, &gt->lock);
+}
+
 /* Create/destroy per-domain grant table context. */
 int grant_table_create(
     struct domain *d);