
[v3,for-4.10,1/2] x86/mm: fix potential race conditions in map_pages_to_xen().

Message ID 1510642427-3629-1-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Yu Zhang Nov. 14, 2017, 6:53 a.m. UTC
From: Min He <min.he@intel.com>

In map_pages_to_xen(), an L2 page table entry may be reset to point to
a superpage, and the corresponding L1 page table needs to be freed in
that scenario, i.e. when its L1 page table entries map consecutive
page frames with identical mapping flags.

However, the variable `pl1e` is assigned before the lock is taken to
protect the enumeration of the L1 page table. A race condition can occur
if this code path is invoked simultaneously on different CPUs.

For example, `pl1e` on CPU0 may hold an obsolete value, pointing to a
page which has just been freed on CPU1. Moreover, until that page is
reused, it still holds the old PTEs, referencing consecutive page
frames. Consequently, `free_xen_pagetable(l2e_to_l1e(ol2e))` will be
triggered on CPU0, resulting in the unexpected freeing of a normal page.

This patch fixes the above problem by reading `pl1e` under the lock.

There are also other potential race conditions. For instance, the L2/L3
entry may be modified concurrently on different CPUs, by routines such as
map_pages_to_xen(), modify_xen_mappings() etc. To address these, this
patch checks the _PAGE_PRESENT and _PAGE_PSE flags of the corresponding
L2/L3 entry after the spinlock is obtained.

Signed-off-by: Min He <min.he@intel.com>
Signed-off-by: Yi Zhang <yi.z.zhang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes in v3:
According to comments from Jan Beulich:
  - use a local variable instead of dereferencing the PTE pointer when
    checking flags.
  - also check _PAGE_PRESENT for the L2E/L3E.
Others:
  - Commit message changes.

Changes in v2:
According to comments from Jan Beulich:
  - check PSE of pl2e and pl3e, and skip the re-consolidation if set.
  - commit message changes, e.g. add the "From:" tag etc.
  - code style changes.
  - introduce a separate patch to resolve the similar issue in
    modify_xen_mappings().
---
 xen/arch/x86/mm.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

Comments

Jan Beulich Nov. 14, 2017, 8:20 a.m. UTC | #1
>>> On 14.11.17 at 07:53, <yu.c.zhang@linux.intel.com> wrote:
> From: Min He <min.he@intel.com>
> 
> In map_pages_to_xen(), a L2 page table entry may be reset to point to
> a superpage, and its corresponding L1 page table need be freed in such
> scenario, when these L1 page table entries are mapping to consecutive
> page frames and having the same mapping flags.
> 
> However, variable `pl1e` is not protected by the lock before L1 page table
> is enumerated. A race condition may happen if this code path is invoked
> simultaneously on different CPUs.
> 
> For example, `pl1e` value on CPU0 may hold an obsolete value, pointing
> to a page which has just been freed on CPU1. Besides, before this page
> is reused, it will still be holding the old PTEs, referencing consecutive
> page frames. Consequently the `free_xen_pagetable(l2e_to_l1e(ol2e))` will
> be triggered on CPU0, resulting the unexpected free of a normal page.
> 
> This patch fixes the above problem by protecting the `pl1e` with the lock.
> 
> Also, there're other potential race conditions. For instance, the L2/L3
> entry may be modified concurrently on different CPUs, by routines such as
> map_pages_to_xen(), modify_xen_mappings() etc. To fix this, this patch will
> check the _PAGE_PRESENT and _PAGE_PSE flags, after the spinlock is obtained,
> for the corresponding L2/L3 entry.
> 
> Signed-off-by: Min He <min.he@intel.com>
> Signed-off-by: Yi Zhang <yi.z.zhang@intel.com>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall Nov. 14, 2017, 12:32 p.m. UTC | #2
Hi,

On 14/11/17 08:20, Jan Beulich wrote:
>>>> On 14.11.17 at 07:53, <yu.c.zhang@linux.intel.com> wrote:
>> From: Min He <min.he@intel.com>
>>
>> In map_pages_to_xen(), a L2 page table entry may be reset to point to
>> a superpage, and its corresponding L1 page table need be freed in such
>> scenario, when these L1 page table entries are mapping to consecutive
>> page frames and having the same mapping flags.
>>
>> However, variable `pl1e` is not protected by the lock before L1 page table
>> is enumerated. A race condition may happen if this code path is invoked
>> simultaneously on different CPUs.
>>
>> For example, `pl1e` value on CPU0 may hold an obsolete value, pointing
>> to a page which has just been freed on CPU1. Besides, before this page
>> is reused, it will still be holding the old PTEs, referencing consecutive
>> page frames. Consequently the `free_xen_pagetable(l2e_to_l1e(ol2e))` will
>> be triggered on CPU0, resulting the unexpected free of a normal page.
>>
>> This patch fixes the above problem by protecting the `pl1e` with the lock.
>>
>> Also, there're other potential race conditions. For instance, the L2/L3
>> entry may be modified concurrently on different CPUs, by routines such as
>> map_pages_to_xen(), modify_xen_mappings() etc. To fix this, this patch will
>> check the _PAGE_PRESENT and _PAGE_PSE flags, after the spinlock is obtained,
>> for the corresponding L2/L3 entry.
>>
>> Signed-off-by: Min He <min.he@intel.com>
>> Signed-off-by: Yi Zhang <yi.z.zhang@intel.com>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Please try to include a cover letter in the future when you have multiple 
patches. This will make it easier to give comments/release-acks for all 
the patches. Anyway, for the 2 patches:

Release-acked-by: Julien Grall <julien.grall@linaro.org>

Cheers,
Yu Zhang Nov. 14, 2017, 3:07 p.m. UTC | #3
On 11/14/2017 8:32 PM, Julien Grall wrote:
> Hi,
>
> On 14/11/17 08:20, Jan Beulich wrote:
>>>>> On 14.11.17 at 07:53, <yu.c.zhang@linux.intel.com> wrote:
>>> From: Min He <min.he@intel.com>
>>>
>>> In map_pages_to_xen(), a L2 page table entry may be reset to point to
>>> a superpage, and its corresponding L1 page table need be freed in such
>>> scenario, when these L1 page table entries are mapping to consecutive
>>> page frames and having the same mapping flags.
>>>
>>> However, variable `pl1e` is not protected by the lock before L1 page 
>>> table
>>> is enumerated. A race condition may happen if this code path is invoked
>>> simultaneously on different CPUs.
>>>
>>> For example, `pl1e` value on CPU0 may hold an obsolete value, pointing
>>> to a page which has just been freed on CPU1. Besides, before this page
>>> is reused, it will still be holding the old PTEs, referencing 
>>> consecutive
>>> page frames. Consequently the `free_xen_pagetable(l2e_to_l1e(ol2e))` 
>>> will
>>> be triggered on CPU0, resulting the unexpected free of a normal page.
>>>
>>> This patch fixes the above problem by protecting the `pl1e` with the 
>>> lock.
>>>
>>> Also, there're other potential race conditions. For instance, the L2/L3
>>> entry may be modified concurrently on different CPUs, by routines 
>>> such as
>>> map_pages_to_xen(), modify_xen_mappings() etc. To fix this, this 
>>> patch will
>>> check the _PAGE_PRESENT and _PAGE_PSE flags, after the spinlock is 
>>> obtained,
>>> for the corresponding L2/L3 entry.
>>>
>>> Signed-off-by: Min He <min.he@intel.com>
>>> Signed-off-by: Yi Zhang <yi.z.zhang@intel.com>
>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> Please try to have a cover letter in the future when you have multiple 
> patches. This will make easier to give comments/release-ack for the 
> all the patches. Anyway for the 2 patches:

Oh, got it. Thanks for the suggestion. :-)

Yu

>
> Release-acked-by: Julien Grall <julien.grall@linaro.org>
>
> Cheers,
>

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a20fdca..1697be9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4844,9 +4844,29 @@  int map_pages_to_xen(
             {
                 unsigned long base_mfn;
 
-                pl1e = l2e_to_l1e(*pl2e);
                 if ( locking )
                     spin_lock(&map_pgdir_lock);
+
+                ol2e = *pl2e;
+                /*
+                 * L2E may be already cleared, or set to a superpage, by
+                 * concurrent paging structure modifications on other CPUs.
+                 */
+                if ( !(l2e_get_flags(ol2e) & _PAGE_PRESENT) )
+                {
+                    if ( locking )
+                        spin_unlock(&map_pgdir_lock);
+                    continue;
+                }
+
+                if ( l2e_get_flags(ol2e) & _PAGE_PSE )
+                {
+                    if ( locking )
+                        spin_unlock(&map_pgdir_lock);
+                    goto check_l3;
+                }
+
+                pl1e = l2e_to_l1e(ol2e);
                 base_mfn = l1e_get_pfn(*pl1e) & ~(L1_PAGETABLE_ENTRIES - 1);
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++, pl1e++ )
                     if ( (l1e_get_pfn(*pl1e) != (base_mfn + i)) ||
@@ -4854,7 +4874,6 @@  int map_pages_to_xen(
                         break;
                 if ( i == L1_PAGETABLE_ENTRIES )
                 {
-                    ol2e = *pl2e;
                     l2e_write_atomic(pl2e, l2e_from_pfn(base_mfn,
                                                         l1f_to_lNf(flags)));
                     if ( locking )
@@ -4880,7 +4899,20 @@  int map_pages_to_xen(
 
             if ( locking )
                 spin_lock(&map_pgdir_lock);
+
             ol3e = *pl3e;
+            /*
+             * L3E may be already cleared, or set to a superpage, by
+             * concurrent paging structure modifications on other CPUs.
+             */
+            if ( !(l3e_get_flags(ol3e) & _PAGE_PRESENT) ||
+                (l3e_get_flags(ol3e) & _PAGE_PSE) )
+            {
+                if ( locking )
+                    spin_unlock(&map_pgdir_lock);
+                continue;
+            }
+
             pl2e = l3e_to_l2e(ol3e);
             base_mfn = l2e_get_pfn(*pl2e) & ~(L2_PAGETABLE_ENTRIES *
                                               L1_PAGETABLE_ENTRIES - 1);