diff mbox series

[MM-PART2,RESEND,v2,16/19] xen/arm: mm: Protect Xen page-table update with a spinlock

Message ID 20190514122456.28559-17-julien.grall@arm.com (mailing list archive)
State New, archived
Headers show
Series xen/arm: Clean-up & fixes in boot/mm code | expand

Commit Message

Julien Grall May 14, 2019, 12:24 p.m. UTC
The function create_xen_entries() may be called concurrently. For
instance, while the vmap allocation is protected by a spinlock, the
mapping is not.

The implementation create_xen_entries() contains quite a few TOCTOU
races such as when allocating the 3rd-level page-tables.

Thankfully, they are pretty hard to reach as page-tables are allocated
once and never released. Yet it is possible, so we need to protect with
a spinlock to avoid corrupting the page-tables.

Signed-off-by: Julien Grall <julien.grall@arm.com>

---
    Changes in v2:
        - Rework the commit message
---
 xen/arch/arm/mm.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Andrii Anisov May 21, 2019, 10:04 a.m. UTC | #1
On 14.05.19 15:24, Julien Grall wrote:
> The function create_xen_entries() may be called concurrently. For
> instance, while the vmap allocation is protected by a spinlock, the
> mapping is not.
> 
> The implementation create_xen_entries() contains quite a few TOCTOU
> races such as when allocating the 3rd-level page-tables.
> 
> Thankfully, they are pretty hard to reach as page-tables are allocated
> once and never released. Yet it is possible, so we need to protect with
> a spinlock to avoid corrupting the page-tables.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>      Changes in v2:
>          - Rework the commit message
> ---

Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Stefano Stabellini June 4, 2019, 11:11 p.m. UTC | #2
On Tue, 14 May 2019, Julien Grall wrote:
> The function create_xen_entries() may be called concurrently. For
> instance, while the vmap allocation is protected by a spinlock, the
> mapping is not.

Do you have an example of potential concurrent calls of
create_xen_entries() which doesn't involve concurrent vmaps (because
vmaps are already protected by their spinlock)? vmap + something_else
for instance?


> The implementation create_xen_entries() contains quite a few TOCTOU
> races such as when allocating the 3rd-level page-tables.
> 
> Thankfully, they are pretty hard to reach as page-tables are allocated
> once and never released. Yet it is possible, so we need to protect with
> a spinlock to avoid corrupting the page-tables.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> ---
>     Changes in v2:
>         - Rework the commit message
> ---
>  xen/arch/arm/mm.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 9a5f2e1c3f..7502a14760 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -974,6 +974,8 @@ enum xenmap_operation {
>      RESERVE
>  };
>  
> +static DEFINE_SPINLOCK(xen_pt_lock);
> +
>  static int create_xen_entries(enum xenmap_operation op,
>                                unsigned long virt,
>                                mfn_t mfn,
> @@ -985,6 +987,8 @@ static int create_xen_entries(enum xenmap_operation op,
>      lpae_t pte, *entry;
>      lpae_t *third = NULL;
>  
> +    spin_lock(&xen_pt_lock);
> +
>      for(; addr < addr_end; addr += PAGE_SIZE, mfn = mfn_add(mfn, 1))
>      {
>          entry = &xen_second[second_linear_offset(addr)];
> @@ -1059,6 +1063,8 @@ out:
>       */
>      flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
>  
> +    spin_unlock(&xen_pt_lock);
> +
>      return rc;
>  }
>  
> -- 
> 2.11.0
>
Julien Grall June 5, 2019, 10:36 a.m. UTC | #3
Hi Stefano,

On 05/06/2019 00:11, Stefano Stabellini wrote:
> On Tue, 14 May 2019, Julien Grall wrote:
>> The function create_xen_entries() may be called concurrently. For
>> instance, while the vmap allocation is protected by a spinlock, the
>> mapping is not.
> 
> Do you have an example of potential concurrent calls of
> create_xen_entries() which doesn't involve concurrent vmaps (because
> vmaps are already protected by their spinlock)? vmap + something_else
> for instance?
Well, I gave an example here. The vmap allocation (i.e vm_alloc) is protected by 
a spinlock. However, when the mapping is done there are no spinlock to protected 
against concurrent one.

So the following scenario could happen:

CPU0				      |	CPU1
				      |
vmap()				      |	vmap()
   vm_alloc()			      |   vm_alloc()
     spin_lock()			      |
     ...			  	      |
     spin_unlock()		      |
				      |	    spin_lock()
     * interrupt *		      |	    ...
				      |	    spin_unlock()
				      |
     map_pages_to_xen()		      |	    map_pages_to_xen()
	entry = &xen_second[X]	      |	 	entry = &xen_second[Y]
	* entry invalid *             |         * entry invalid *
	create_xen_table()	      |		create_xen_table()
	

Assuming X == Y (i.e we the xen second entry is the same), then one will win the 
race and therefore one mapping will be inexistent.

But as I wrote, the chance it is happening is very limited.

I can add that in the commit message.

Cheers,
Stefano Stabellini June 8, 2019, 12:17 a.m. UTC | #4
On Wed, 5 Jun 2019, Julien Grall wrote:
> Hi Stefano,
> 
> On 05/06/2019 00:11, Stefano Stabellini wrote:
> > On Tue, 14 May 2019, Julien Grall wrote:
> > > The function create_xen_entries() may be called concurrently. For
> > > instance, while the vmap allocation is protected by a spinlock, the
> > > mapping is not.
> > 
> > Do you have an example of potential concurrent calls of
> > create_xen_entries() which doesn't involve concurrent vmaps (because
> > vmaps are already protected by their spinlock)? vmap + something_else
> > for instance?
> Well, I gave an example here. The vmap allocation (i.e vm_alloc) is protected
> by a spinlock. However, when the mapping is done there are no spinlock to
> protected against concurrent one.
> 
> So the following scenario could happen:
> 
> CPU0				      |	CPU1
> 				      |
> vmap()				      |	vmap()
>   vm_alloc()			      |   vm_alloc()
>     spin_lock()			      |
>     ...			  	      |
>     spin_unlock()		      |
> 				      |	    spin_lock()
>     * interrupt *		      |	    ...
> 				      |	    spin_unlock()
> 				      |
>     map_pages_to_xen()		      |	    map_pages_to_xen()
> 	entry = &xen_second[X]	      |	 	entry = &xen_second[Y]
> 	* entry invalid *             |         * entry invalid *
> 	create_xen_table()	      |		create_xen_table()
> 	
> 
> Assuming X == Y (i.e we the xen second entry is the same), then one will win
> the race and therefore one mapping will be inexistent.
> 
> But as I wrote, the chance it is happening is very limited.
> 
> I can add that in the commit message.

Thanks for the detailed explanation, I am just trying to understand and
double-check the race. Wouldn't vm_alloc guarantee to return a different
va in the two cases (CPU0 and CPU1 above), given that the selection of
the va is done under spin_lock? But it would be still possible to have X
and Y so that they select the same &xen_second entry, hence, the race
with create_xen_table(). It looks like the race is there.

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall June 13, 2019, 12:06 p.m. UTC | #5
Hi Stefano,

On 08/06/2019 01:17, Stefano Stabellini wrote:
> On Wed, 5 Jun 2019, Julien Grall wrote:
>> On 05/06/2019 00:11, Stefano Stabellini wrote:
>>> On Tue, 14 May 2019, Julien Grall wrote:
>>>> The function create_xen_entries() may be called concurrently. For
>>>> instance, while the vmap allocation is protected by a spinlock, the
>>>> mapping is not.
>>>
>>> Do you have an example of potential concurrent calls of
>>> create_xen_entries() which doesn't involve concurrent vmaps (because
>>> vmaps are already protected by their spinlock)? vmap + something_else
>>> for instance?
>> Well, I gave an example here. The vmap allocation (i.e vm_alloc) is protected
>> by a spinlock. However, when the mapping is done there are no spinlock to
>> protected against concurrent one.
>>
>> So the following scenario could happen:
>>
>> CPU0				      |	CPU1
>> 				      |
>> vmap()				      |	vmap()
>>    vm_alloc()			      |   vm_alloc()
>>      spin_lock()			      |
>>      ...			  	      |
>>      spin_unlock()		      |
>> 				      |	    spin_lock()
>>      * interrupt *		      |	    ...
>> 				      |	    spin_unlock()
>> 				      |
>>      map_pages_to_xen()		      |	    map_pages_to_xen()
>> 	entry = &xen_second[X]	      |	 	entry = &xen_second[Y]
>> 	* entry invalid *             |         * entry invalid *
>> 	create_xen_table()	      |		create_xen_table()
>> 	
>>
>> Assuming X == Y (i.e we the xen second entry is the same), then one will win
>> the race and therefore one mapping will be inexistent.
>>
>> But as I wrote, the chance it is happening is very limited.
>>
>> I can add that in the commit message.
> 
> Thanks for the detailed explanation, I am just trying to understand and
> double-check the race. Wouldn't vm_alloc guarantee to return a different
> va in the two cases (CPU0 and CPU1 above), given that the selection of
> the va is done under spin_lock?

Yes vm_alloc() would guarantee you to have a different VA.

> But it would be still possible to have X
> and Y so that they select the same &xen_second entry, hence, the race
> with create_xen_table(). It looks like the race is there.

That's correct.

> 
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

Thank you!

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 9a5f2e1c3f..7502a14760 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -974,6 +974,8 @@  enum xenmap_operation {
     RESERVE
 };
 
+static DEFINE_SPINLOCK(xen_pt_lock);
+
 static int create_xen_entries(enum xenmap_operation op,
                               unsigned long virt,
                               mfn_t mfn,
@@ -985,6 +987,8 @@  static int create_xen_entries(enum xenmap_operation op,
     lpae_t pte, *entry;
     lpae_t *third = NULL;
 
+    spin_lock(&xen_pt_lock);
+
     for(; addr < addr_end; addr += PAGE_SIZE, mfn = mfn_add(mfn, 1))
     {
         entry = &xen_second[second_linear_offset(addr)];
@@ -1059,6 +1063,8 @@  out:
      */
     flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
 
+    spin_unlock(&xen_pt_lock);
+
     return rc;
 }