[v2,2/2] x86/mm: fix a potential race condition in modify_xen_mappings().

Message ID 1510298286-30952-2-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Yu Zhang Nov. 10, 2017, 7:18 a.m. UTC
In modify_xen_mappings(), an L1/L2 page table may be freed if all
of its entries are empty, in which case the corresponding L2/L3 PTE
needs to be cleared as well.

However, the logic that scans the L1/L2 page table and clears the
corresponding L2/L3 PTE must be protected by a spinlock. Otherwise,
the paging structure may be freed more than once if the same routine
is invoked simultaneously on different CPUs.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/mm.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
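
A minimal sketch of the race being fixed (the interleaving below is
illustrative; identifiers as in xen/arch/x86/mm.c):

    /*
     * Without map_pgdir_lock, two CPUs running modify_xen_mappings()
     * over ranges covered by the same L1 table can interleave as:
     *
     *   CPU0: scans pl1e[0..511], finds every entry empty
     *   CPU1: scans pl1e[0..511], finds every entry empty
     *   CPU0: l2e_write_atomic(pl2e, l2e_empty());
     *         free_xen_pagetable(pl1e);
     *   CPU1: l2e_write_atomic(pl2e, l2e_empty());
     *         free_xen_pagetable(pl1e);   /* double free */
     *
     * The patch therefore takes the lock before the emptiness scan
     * and re-checks the L2E, which may have been zapped meanwhile.
     */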

Comments

Jan Beulich Nov. 10, 2017, 9:57 a.m. UTC | #1
>>> On 10.11.17 at 08:18, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5097,6 +5097,17 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>               */
>              if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
>                  continue;
> +            if ( locking )
> +                spin_lock(&map_pgdir_lock);
> +
> +            /* L2E may be cleared on another CPU. */
> +            if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )

I think you also need a PSE check here, or else the l2e_to_l1e() below
may be illegal.
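
A sketch of the combined check being suggested (an assumption about
what a v3 might look like, not code from this posting; flag names as
in Xen's page-table macros):

    /* Bail if the L2E was cleared, or now maps a 2M superpage: in the
     * latter case l2e_to_l1e() would not yield an L1 table pointer. */
    if ( (l2e_get_flags(*pl2e) & (_PAGE_PRESENT | _PAGE_PSE)) !=
         _PAGE_PRESENT )
    {
        if ( locking )
            spin_unlock(&map_pgdir_lock);
        goto check_l3;
    }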

> @@ -5105,11 +5116,16 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>              {
>                  /* Empty: zap the L2E and free the L1 page. */
>                  l2e_write_atomic(pl2e, l2e_empty());
> +                if ( locking )
> +                    spin_unlock(&map_pgdir_lock);
>                  flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
>                  free_xen_pagetable(pl1e);
>              }
> +            else if ( locking )
> +                spin_unlock(&map_pgdir_lock);
>          }
>  
> +check_l3:

Labels indented by at least one space please.

Jan
Yu Zhang Nov. 10, 2017, 2:02 p.m. UTC | #2
On 11/10/2017 5:57 PM, Jan Beulich wrote:
>>>> On 10.11.17 at 08:18, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5097,6 +5097,17 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>>                */
>>               if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
>>                   continue;
>> +            if ( locking )
>> +                spin_lock(&map_pgdir_lock);
>> +
>> +            /* L2E may be cleared on another CPU. */
>> +            if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
> I think you also need a PSE check here, or else the l2e_to_l1e() below
> may be illegal.

Hmm, interesting point, and thanks! :-)
I did not check the PSE flag, because modify_xen_mappings() does not
do the re-consolidation, and concurrent invocations of this routine
will not change that flag. But now I believe this assumption should
not be made, because the paging structures may be modified by other
routines, such as map_pages_to_xen() on other CPUs.

So yes, I think a _PAGE_PSE check is necessary here. And I suggest we
also check the _PAGE_PRESENT flag for the re-consolidation part in my
first patch for map_pages_to_xen(). Do you agree?
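
What that might look like in map_pages_to_xen()'s re-consolidation
path, once the lock is held (a sketch under the assumption above, not
the posted patch; the actual merge logic is omitted):

    ol2e = *pl2e;
    /* Only attempt to merge into a superpage if the L2E still points
     * at a present L1 table and was not already made a superpage. */
    if ( !(l2e_get_flags(ol2e) & _PAGE_PRESENT) ||
         (l2e_get_flags(ol2e) & _PAGE_PSE) )
    {
        if ( locking )
            spin_unlock(&map_pgdir_lock);
        continue;
    }
    /* ... check whether all L1 entries fold into one 2M mapping ... */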

>> @@ -5105,11 +5116,16 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>>               {
>>                   /* Empty: zap the L2E and free the L1 page. */
>>                   l2e_write_atomic(pl2e, l2e_empty());
>> +                if ( locking )
>> +                    spin_unlock(&map_pgdir_lock);
>>                   flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
>>                   free_xen_pagetable(pl1e);
>>               }
>> +            else if ( locking )
>> +                spin_unlock(&map_pgdir_lock);
>>           }
>>   
>> +check_l3:
> Labels indented by at least one space please.

Got it. Thanks.

Yu
Jan Beulich Nov. 13, 2017, 9:33 a.m. UTC | #3
>>> On 10.11.17 at 15:02, <yu.c.zhang@linux.intel.com> wrote:
> On 11/10/2017 5:57 PM, Jan Beulich wrote:
>>>>> On 10.11.17 at 08:18, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5097,6 +5097,17 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>>>                */
>>>               if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
>>>                   continue;
>>> +            if ( locking )
>>> +                spin_lock(&map_pgdir_lock);
>>> +
>>> +            /* L2E may be cleared on another CPU. */
>>> +            if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
>> I think you also need a PSE check here, or else the l2e_to_l1e() below
>> may be illegal.
> 
> Hmm, interesting point, and thanks! :-)
> I did not check the PSE flag, because modify_xen_mappings() does not
> do the re-consolidation, and concurrent invocations of this routine
> will not change that flag. But now I believe this assumption should
> not be made, because the paging structures may be modified by other
> routines, such as map_pages_to_xen() on other CPUs.
> 
> So yes, I think a _PAGE_PSE check is necessary here. And I suggest we
> also check the _PAGE_PRESENT flag for the re-consolidation part in my
> first patch for map_pages_to_xen(). Do you agree?

Oh, yes, definitely. I should have noticed this myself.

Jan

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 47855fb..c07c528 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5097,6 +5097,17 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
              */
             if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
                 continue;
+            if ( locking )
+                spin_lock(&map_pgdir_lock);
+
+            /* L2E may be cleared on another CPU. */
+            if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
+            {
+                if ( locking )
+                    spin_unlock(&map_pgdir_lock);
+                goto check_l3;
+            }
+
             pl1e = l2e_to_l1e(*pl2e);
             for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                 if ( l1e_get_intpte(pl1e[i]) != 0 )
@@ -5105,11 +5116,16 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             {
                 /* Empty: zap the L2E and free the L1 page. */
                 l2e_write_atomic(pl2e, l2e_empty());
+                if ( locking )
+                    spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
                 free_xen_pagetable(pl1e);
             }
+            else if ( locking )
+                spin_unlock(&map_pgdir_lock);
         }
 
+check_l3:
         /*
          * If we are not destroying mappings, or not done with the L3E,
          * skip the empty&free check.
@@ -5117,6 +5133,17 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( (nf & _PAGE_PRESENT) ||
              ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
             continue;
+        if ( locking )
+            spin_lock(&map_pgdir_lock);
+
+        /* L3E may be cleared on another CPU. */
+        if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
+        {
+            if ( locking )
+                spin_unlock(&map_pgdir_lock);
+            continue;
+        }
+
         pl2e = l3e_to_l2e(*pl3e);
         for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
             if ( l2e_get_intpte(pl2e[i]) != 0 )
@@ -5125,9 +5152,13 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         {
             /* Empty: zap the L3E and free the L2 page. */
             l3e_write_atomic(pl3e, l3e_empty());
+            if ( locking )
+                spin_unlock(&map_pgdir_lock);
             flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
             free_xen_pagetable(pl2e);
         }
+        else if ( locking )
+            spin_unlock(&map_pgdir_lock);
     }
 
     flush_area(NULL, FLUSH_TLB_GLOBAL);