diff mbox series

[for-4.15,v3,3/3] xen/iommu: x86: Harden the IOMMU page-table allocator

Message ID 20210217142458.3769-4-julien@xen.org (mailing list archive)
State New, archived
Headers show
Series xen/iommu: Collection of bug fixes for IOMMU teadorwn | expand

Commit Message

Julien Grall Feb. 17, 2021, 2:24 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

At the moment, we are assuming that only iommu_map() can allocate
IOMMU page-table.

Given the complexity of the IOMMU framework, it would be sensible to
have a check closer to the IOMMU allocator. This would avoid to leak
IOMMU page-tables again in the future.

iommu_alloc_pgtable() is now checking if the domain is dying before
adding the page in the list. We are relying on &hd->arch.pgtables.lock
to synchronize d->is_dying.

Take the opportunity to add an ASSERT() in arch_iommu_domain_destroy()
to check if we freed all the IOMMU page tables.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

Changes in v3:
    - Rename the patch. This was originally "xen/iommu: x86: Don't leak
    the IOMMU page-tables"
    - Rework the commit message
    - Move the patch towards the end of the series

Changes in v2:
    - Rework the approach
    - Move the patch earlier in the series
---
 xen/drivers/passthrough/x86/iommu.c | 33 ++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

Comments

Jan Beulich Feb. 17, 2021, 3:13 p.m. UTC | #1
On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
Nit: s/not/no/ ?

> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
> +     * called unconditionally, so pgtables may be unitialized.
> +     */
> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>  }
>  
>  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>       */
>      hd->platform_ops->clear_root_pgtable(d);
>  
> +    /* After this barrier no new page allocations can occur. */
> +    spin_barrier(&hd->arch.pgtables.lock);

Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
the barrier? Why introduce another one (with a similar comment)
explicitly now?

> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>      unmap_domain_page(p);
>  
>      spin_lock(&hd->arch.pgtables.lock);
> -    page_list_add(pg, &hd->arch.pgtables.list);
> +    /*
> +     * The IOMMU page-tables are freed when relinquishing the domain, but
> +     * nothing prevent allocation to happen afterwards. There is no valid
> +     * reasons to continue to update the IOMMU page-tables while the
> +     * domain is dying.
> +     *
> +     * So prevent page-table allocation when the domain is dying.
> +     *
> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
> +     */
> +    if ( likely(!d->is_dying) )
> +    {
> +        alive = true;
> +        page_list_add(pg, &hd->arch.pgtables.list);
> +    }
>      spin_unlock(&hd->arch.pgtables.lock);
>  
> +    if ( unlikely(!alive) )
> +    {
> +        free_domheap_page(pg);
> +        pg = NULL;
> +    }
> +
>      return pg;
>  }

As before I'm concerned of this forcing error paths to be taken
elsewhere, in case an allocation still happens (e.g. from unmap
once super page mappings are supported). Considering some of the
error handling in the IOMMU code is to invoke domain_crash(), it
would be quite unfortunate if we ended up crashing a domain
while it is being cleaned up after.

Additionally, the (at present still hypothetical) unmap case, if
failing because of the change here, would then again chance to
leave mappings in place while the underlying pages get freed. As
this would likely require an XSA, the change doesn't feel like
"hardening" to me.

Jan
Julien Grall Feb. 17, 2021, 4:29 p.m. UTC | #2
Hi Jan,

On 17/02/2021 15:13, Jan Beulich wrote:
> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
> Nit: s/not/no/ ?
> 
>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>> +     * called unconditionally, so pgtables may be unitialized.
>> +     */
>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>   }
>>   
>>   static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>        */
>>       hd->platform_ops->clear_root_pgtable(d);
>>   
>> +    /* After this barrier no new page allocations can occur. */
>> +    spin_barrier(&hd->arch.pgtables.lock);
> 
> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
> the barrier? Why introduce another one (with a similar comment)
> explicitly now?
The barriers act differently, one will get against any IOMMU page-tables 
modification. The other one will gate against allocation.

There is no guarantee that the former will prevent the latter.

> 
>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>       unmap_domain_page(p);
>>   
>>       spin_lock(&hd->arch.pgtables.lock);
>> -    page_list_add(pg, &hd->arch.pgtables.list);
>> +    /*
>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>> +     * nothing prevent allocation to happen afterwards. There is no valid
>> +     * reasons to continue to update the IOMMU page-tables while the
>> +     * domain is dying.
>> +     *
>> +     * So prevent page-table allocation when the domain is dying.
>> +     *
>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>> +     */
>> +    if ( likely(!d->is_dying) )
>> +    {
>> +        alive = true;
>> +        page_list_add(pg, &hd->arch.pgtables.list);
>> +    }
>>       spin_unlock(&hd->arch.pgtables.lock);
>>   
>> +    if ( unlikely(!alive) )
>> +    {
>> +        free_domheap_page(pg);
>> +        pg = NULL;
>> +    }
>> +
>>       return pg;
>>   }
> 
> As before I'm concerned of this forcing error paths to be taken
> elsewhere, in case an allocation still happens (e.g. from unmap
> once super page mappings are supported). Considering some of the
> error handling in the IOMMU code is to invoke domain_crash(), it
> would be quite unfortunate if we ended up crashing a domain
> while it is being cleaned up after.

It is unfortunate, but I think this is better than having to leak page 
tables.

> 
> Additionally, the (at present still hypothetical) unmap case, if
> failing because of the change here, would then again chance to
> leave mappings in place while the underlying pages get freed. As
> this would likely require an XSA, the change doesn't feel like
> "hardening" to me.

I would agree with this if memory allocations could never fail. That's 
not that case and will become worse as we use IOMMU pool.

Do you have callers in mind that doesn't check the returns of iommu_unmap()?

Cheers,
Jan Beulich Feb. 18, 2021, 1:10 p.m. UTC | #3
On 17.02.2021 17:29, Julien Grall wrote:
> On 17/02/2021 15:13, Jan Beulich wrote:
>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>> Nit: s/not/no/ ?
>>
>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>> +     * called unconditionally, so pgtables may be unitialized.
>>> +     */
>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>   }
>>>   
>>>   static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>        */
>>>       hd->platform_ops->clear_root_pgtable(d);
>>>   
>>> +    /* After this barrier no new page allocations can occur. */
>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>
>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>> the barrier? Why introduce another one (with a similar comment)
>> explicitly now?
> The barriers act differently, one will get against any IOMMU page-tables 
> modification. The other one will gate against allocation.
> 
> There is no guarantee that the former will prevent the latter.

Oh, right - different locks. I got confused here because in both
cases the goal is to prevent allocations.

>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>       unmap_domain_page(p);
>>>   
>>>       spin_lock(&hd->arch.pgtables.lock);
>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>> +    /*
>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>> +     * reasons to continue to update the IOMMU page-tables while the
>>> +     * domain is dying.
>>> +     *
>>> +     * So prevent page-table allocation when the domain is dying.
>>> +     *
>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>> +     */
>>> +    if ( likely(!d->is_dying) )
>>> +    {
>>> +        alive = true;
>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>> +    }
>>>       spin_unlock(&hd->arch.pgtables.lock);
>>>   
>>> +    if ( unlikely(!alive) )
>>> +    {
>>> +        free_domheap_page(pg);
>>> +        pg = NULL;
>>> +    }
>>> +
>>>       return pg;
>>>   }
>>
>> As before I'm concerned of this forcing error paths to be taken
>> elsewhere, in case an allocation still happens (e.g. from unmap
>> once super page mappings are supported). Considering some of the
>> error handling in the IOMMU code is to invoke domain_crash(), it
>> would be quite unfortunate if we ended up crashing a domain
>> while it is being cleaned up after.
> 
> It is unfortunate, but I think this is better than having to leak page 
> tables.
> 
>>
>> Additionally, the (at present still hypothetical) unmap case, if
>> failing because of the change here, would then again chance to
>> leave mappings in place while the underlying pages get freed. As
>> this would likely require an XSA, the change doesn't feel like
>> "hardening" to me.
> 
> I would agree with this if memory allocations could never fail. That's 
> not that case and will become worse as we use IOMMU pool.
> 
> Do you have callers in mind that doesn't check the returns of iommu_unmap()?

The function is marked __must_check, so there won't be any direct
callers ignoring errors (albeit I may be wrong here - we used to
have cases where we simply suppressed the resulting compiler
diagnostic, without really handling errors; not sure if all of
these are gone by now). Risks might be elsewhere.

Jan
Julien Grall Feb. 18, 2021, 1:19 p.m. UTC | #4
On 18/02/2021 13:10, Jan Beulich wrote:
> On 17.02.2021 17:29, Julien Grall wrote:
>> On 17/02/2021 15:13, Jan Beulich wrote:
>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>> Nit: s/not/no/ ?
>>>
>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>> +     */
>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>    }
>>>>    
>>>>    static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>         */
>>>>        hd->platform_ops->clear_root_pgtable(d);
>>>>    
>>>> +    /* After this barrier no new page allocations can occur. */
>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>
>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>> the barrier? Why introduce another one (with a similar comment)
>>> explicitly now?
>> The barriers act differently, one will get against any IOMMU page-tables
>> modification. The other one will gate against allocation.
>>
>> There is no guarantee that the former will prevent the latter.
> 
> Oh, right - different locks. I got confused here because in both
> cases the goal is to prevent allocations.
> 
>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>        unmap_domain_page(p);
>>>>    
>>>>        spin_lock(&hd->arch.pgtables.lock);
>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>> +    /*
>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>> +     * domain is dying.
>>>> +     *
>>>> +     * So prevent page-table allocation when the domain is dying.
>>>> +     *
>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>> +     */
>>>> +    if ( likely(!d->is_dying) )
>>>> +    {
>>>> +        alive = true;
>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>> +    }
>>>>        spin_unlock(&hd->arch.pgtables.lock);
>>>>    
>>>> +    if ( unlikely(!alive) )
>>>> +    {
>>>> +        free_domheap_page(pg);
>>>> +        pg = NULL;
>>>> +    }
>>>> +
>>>>        return pg;
>>>>    }
>>>
>>> As before I'm concerned of this forcing error paths to be taken
>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>> once super page mappings are supported). Considering some of the
>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>> would be quite unfortunate if we ended up crashing a domain
>>> while it is being cleaned up after.
>>
>> It is unfortunate, but I think this is better than having to leak page
>> tables.
>>
>>>
>>> Additionally, the (at present still hypothetical) unmap case, if
>>> failing because of the change here, would then again chance to
>>> leave mappings in place while the underlying pages get freed. As
>>> this would likely require an XSA, the change doesn't feel like
>>> "hardening" to me.
>>
>> I would agree with this if memory allocations could never fail. That's
>> not that case and will become worse as we use IOMMU pool.
>>
>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
> 
> The function is marked __must_check, so there won't be any direct
> callers ignoring errors (albeit I may be wrong here - we used to
> have cases where we simply suppressed the resulting compiler
> diagnostic, without really handling errors; not sure if all of
> these are gone by now). Risks might be elsewhere.

But this is not a new risk. So I don't understand why you think my patch 
is the one that may lead to an XSA in the future.

What my patch could do is expose such issues more easily rather than 
waiting until an OOM condition.

Cheers,
Jan Beulich Feb. 18, 2021, 5:04 p.m. UTC | #5
On 18.02.2021 14:19, Julien Grall wrote:
> 
> 
> On 18/02/2021 13:10, Jan Beulich wrote:
>> On 17.02.2021 17:29, Julien Grall wrote:
>>> On 17/02/2021 15:13, Jan Beulich wrote:
>>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>>> Nit: s/not/no/ ?
>>>>
>>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>>> +     */
>>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>>    }
>>>>>    
>>>>>    static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>>         */
>>>>>        hd->platform_ops->clear_root_pgtable(d);
>>>>>    
>>>>> +    /* After this barrier no new page allocations can occur. */
>>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>>
>>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>>> the barrier? Why introduce another one (with a similar comment)
>>>> explicitly now?
>>> The barriers act differently, one will get against any IOMMU page-tables
>>> modification. The other one will gate against allocation.
>>>
>>> There is no guarantee that the former will prevent the latter.
>>
>> Oh, right - different locks. I got confused here because in both
>> cases the goal is to prevent allocations.
>>
>>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>>        unmap_domain_page(p);
>>>>>    
>>>>>        spin_lock(&hd->arch.pgtables.lock);
>>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>>> +    /*
>>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>>> +     * domain is dying.
>>>>> +     *
>>>>> +     * So prevent page-table allocation when the domain is dying.
>>>>> +     *
>>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>>> +     */
>>>>> +    if ( likely(!d->is_dying) )
>>>>> +    {
>>>>> +        alive = true;
>>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>>> +    }
>>>>>        spin_unlock(&hd->arch.pgtables.lock);
>>>>>    
>>>>> +    if ( unlikely(!alive) )
>>>>> +    {
>>>>> +        free_domheap_page(pg);
>>>>> +        pg = NULL;
>>>>> +    }
>>>>> +
>>>>>        return pg;
>>>>>    }
>>>>
>>>> As before I'm concerned of this forcing error paths to be taken
>>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>>> once super page mappings are supported). Considering some of the
>>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>>> would be quite unfortunate if we ended up crashing a domain
>>>> while it is being cleaned up after.
>>>
>>> It is unfortunate, but I think this is better than having to leak page
>>> tables.
>>>
>>>>
>>>> Additionally, the (at present still hypothetical) unmap case, if
>>>> failing because of the change here, would then again chance to
>>>> leave mappings in place while the underlying pages get freed. As
>>>> this would likely require an XSA, the change doesn't feel like
>>>> "hardening" to me.
>>>
>>> I would agree with this if memory allocations could never fail. That's
>>> not that case and will become worse as we use IOMMU pool.
>>>
>>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
>>
>> The function is marked __must_check, so there won't be any direct
>> callers ignoring errors (albeit I may be wrong here - we used to
>> have cases where we simply suppressed the resulting compiler
>> diagnostic, without really handling errors; not sure if all of
>> these are gone by now). Risks might be elsewhere.
> 
> But this is not a new risk. So I don't understand why you think my patch 
> is the one that may lead to an XSA in the future.

I didn't mean to imply it would _lead_ to an XSA (you're
right that the problem was there already before), but the term
"harden" suggests to me that the patch aims at eliminating
possible conditions. IOW the result here looks to me as if it
would yield a false sense of safety.

Jan
Julien Grall Feb. 18, 2021, 5:41 p.m. UTC | #6
On 18/02/2021 17:04, Jan Beulich wrote:
> On 18.02.2021 14:19, Julien Grall wrote:
>>
>>
>> On 18/02/2021 13:10, Jan Beulich wrote:
>>> On 17.02.2021 17:29, Julien Grall wrote:
>>>> On 17/02/2021 15:13, Jan Beulich wrote:
>>>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>>>> Nit: s/not/no/ ?
>>>>>
>>>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>>>> +     */
>>>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>>>     }
>>>>>>     
>>>>>>     static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>>>          */
>>>>>>         hd->platform_ops->clear_root_pgtable(d);
>>>>>>     
>>>>>> +    /* After this barrier no new page allocations can occur. */
>>>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>>>
>>>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>>>> the barrier? Why introduce another one (with a similar comment)
>>>>> explicitly now?
>>>> The barriers act differently, one will get against any IOMMU page-tables
>>>> modification. The other one will gate against allocation.
>>>>
>>>> There is no guarantee that the former will prevent the latter.
>>>
>>> Oh, right - different locks. I got confused here because in both
>>> cases the goal is to prevent allocations.
>>>
>>>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>>>         unmap_domain_page(p);
>>>>>>     
>>>>>>         spin_lock(&hd->arch.pgtables.lock);
>>>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>>>> +    /*
>>>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>>>> +     * domain is dying.
>>>>>> +     *
>>>>>> +     * So prevent page-table allocation when the domain is dying.
>>>>>> +     *
>>>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>>>> +     */
>>>>>> +    if ( likely(!d->is_dying) )
>>>>>> +    {
>>>>>> +        alive = true;
>>>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>>>> +    }
>>>>>>         spin_unlock(&hd->arch.pgtables.lock);
>>>>>>     
>>>>>> +    if ( unlikely(!alive) )
>>>>>> +    {
>>>>>> +        free_domheap_page(pg);
>>>>>> +        pg = NULL;
>>>>>> +    }
>>>>>> +
>>>>>>         return pg;
>>>>>>     }
>>>>>
>>>>> As before I'm concerned of this forcing error paths to be taken
>>>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>>>> once super page mappings are supported). Considering some of the
>>>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>>>> would be quite unfortunate if we ended up crashing a domain
>>>>> while it is being cleaned up after.
>>>>
>>>> It is unfortunate, but I think this is better than having to leak page
>>>> tables.
>>>>
>>>>>
>>>>> Additionally, the (at present still hypothetical) unmap case, if
>>>>> failing because of the change here, would then again chance to
>>>>> leave mappings in place while the underlying pages get freed. As
>>>>> this would likely require an XSA, the change doesn't feel like
>>>>> "hardening" to me.
>>>>
>>>> I would agree with this if memory allocations could never fail. That's
>>>> not that case and will become worse as we use IOMMU pool.
>>>>
>>>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
>>>
>>> The function is marked __must_check, so there won't be any direct
>>> callers ignoring errors (albeit I may be wrong here - we used to
>>> have cases where we simply suppressed the resulting compiler
>>> diagnostic, without really handling errors; not sure if all of
>>> these are gone by now). Risks might be elsewhere.
>>
>> But this is not a new risk. So I don't understand why you think my patch
>> is the one that may lead to an XSA in the future.
> 
> I didn't mean to imply it would _lead_ to an XSA (you're
> right that the problem was there already before), but the term
> "harden" suggests to me that the patch aims at eliminating
> possible conditions.

It elimitates the risk that someone inadvertently call 
iommu_alloc_pgtable() when the domain is dying. If this is happening 
after the page tables have been freed, then we would end up to leak memory.

> IOW the result here looks to me as if it
> would yield a false sense of safety.

So you are concerned about the wording rather than the code itself. Is 
that correct?

If so, how about "xen/iommu: Make the IOMMU page-table allocator 
slightly firmer"?

Cheers,
Jan Beulich Feb. 19, 2021, 8:46 a.m. UTC | #7
On 18.02.2021 18:41, Julien Grall wrote:
> 
> 
> On 18/02/2021 17:04, Jan Beulich wrote:
>> On 18.02.2021 14:19, Julien Grall wrote:
>>>
>>>
>>> On 18/02/2021 13:10, Jan Beulich wrote:
>>>> On 17.02.2021 17:29, Julien Grall wrote:
>>>>> On 17/02/2021 15:13, Jan Beulich wrote:
>>>>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>>>>> Nit: s/not/no/ ?
>>>>>>
>>>>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>>>>> +     */
>>>>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>>>>     }
>>>>>>>     
>>>>>>>     static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>>>>          */
>>>>>>>         hd->platform_ops->clear_root_pgtable(d);
>>>>>>>     
>>>>>>> +    /* After this barrier no new page allocations can occur. */
>>>>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>>>>
>>>>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>>>>> the barrier? Why introduce another one (with a similar comment)
>>>>>> explicitly now?
>>>>> The barriers act differently, one will get against any IOMMU page-tables
>>>>> modification. The other one will gate against allocation.
>>>>>
>>>>> There is no guarantee that the former will prevent the latter.
>>>>
>>>> Oh, right - different locks. I got confused here because in both
>>>> cases the goal is to prevent allocations.
>>>>
>>>>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>>>>         unmap_domain_page(p);
>>>>>>>     
>>>>>>>         spin_lock(&hd->arch.pgtables.lock);
>>>>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>>>>> +    /*
>>>>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>>>>> +     * domain is dying.
>>>>>>> +     *
>>>>>>> +     * So prevent page-table allocation when the domain is dying.
>>>>>>> +     *
>>>>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>>>>> +     */
>>>>>>> +    if ( likely(!d->is_dying) )
>>>>>>> +    {
>>>>>>> +        alive = true;
>>>>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>>>>> +    }
>>>>>>>         spin_unlock(&hd->arch.pgtables.lock);
>>>>>>>     
>>>>>>> +    if ( unlikely(!alive) )
>>>>>>> +    {
>>>>>>> +        free_domheap_page(pg);
>>>>>>> +        pg = NULL;
>>>>>>> +    }
>>>>>>> +
>>>>>>>         return pg;
>>>>>>>     }
>>>>>>
>>>>>> As before I'm concerned of this forcing error paths to be taken
>>>>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>>>>> once super page mappings are supported). Considering some of the
>>>>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>>>>> would be quite unfortunate if we ended up crashing a domain
>>>>>> while it is being cleaned up after.
>>>>>
>>>>> It is unfortunate, but I think this is better than having to leak page
>>>>> tables.
>>>>>
>>>>>>
>>>>>> Additionally, the (at present still hypothetical) unmap case, if
>>>>>> failing because of the change here, would then again chance to
>>>>>> leave mappings in place while the underlying pages get freed. As
>>>>>> this would likely require an XSA, the change doesn't feel like
>>>>>> "hardening" to me.
>>>>>
>>>>> I would agree with this if memory allocations could never fail. That's
>>>>> not that case and will become worse as we use IOMMU pool.
>>>>>
>>>>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
>>>>
>>>> The function is marked __must_check, so there won't be any direct
>>>> callers ignoring errors (albeit I may be wrong here - we used to
>>>> have cases where we simply suppressed the resulting compiler
>>>> diagnostic, without really handling errors; not sure if all of
>>>> these are gone by now). Risks might be elsewhere.
>>>
>>> But this is not a new risk. So I don't understand why you think my patch
>>> is the one that may lead to an XSA in the future.
>>
>> I didn't mean to imply it would _lead_ to an XSA (you're
>> right that the problem was there already before), but the term
>> "harden" suggests to me that the patch aims at eliminating
>> possible conditions.
> 
> It elimitates the risk that someone inadvertently call 
> iommu_alloc_pgtable() when the domain is dying. If this is happening 
> after the page tables have been freed, then we would end up to leak memory.
> 
>> IOW the result here looks to me as if it
>> would yield a false sense of safety.
> 
> So you are concerned about the wording rather than the code itself. Is 
> that correct?

In a way, yes. First of all I'd like us to settle on what to do
with late unmap requests, for 4.15 (and if need be longer term).

Jan

> If so, how about "xen/iommu: Make the IOMMU page-table allocator 
> slightly firmer"?
> 
> Cheers,
>
Julien Grall Feb. 19, 2021, 8:57 a.m. UTC | #8
On 19/02/2021 08:46, Jan Beulich wrote:
> On 18.02.2021 18:41, Julien Grall wrote:
>>
>>
>> On 18/02/2021 17:04, Jan Beulich wrote:
>>> On 18.02.2021 14:19, Julien Grall wrote:
>>>>
>>>>
>>>> On 18/02/2021 13:10, Jan Beulich wrote:
>>>>> On 17.02.2021 17:29, Julien Grall wrote:
>>>>>> On 17/02/2021 15:13, Jan Beulich wrote:
>>>>>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>>>>>> Nit: s/not/no/ ?
>>>>>>>
>>>>>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>>>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>>>>>> +     */
>>>>>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>>>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>>>>>      }
>>>>>>>>      
>>>>>>>>      static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>>>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>>>>>           */
>>>>>>>>          hd->platform_ops->clear_root_pgtable(d);
>>>>>>>>      
>>>>>>>> +    /* After this barrier no new page allocations can occur. */
>>>>>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>>>>>
>>>>>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>>>>>> the barrier? Why introduce another one (with a similar comment)
>>>>>>> explicitly now?
>>>>>> The barriers act differently, one will get against any IOMMU page-tables
>>>>>> modification. The other one will gate against allocation.
>>>>>>
>>>>>> There is no guarantee that the former will prevent the latter.
>>>>>
>>>>> Oh, right - different locks. I got confused here because in both
>>>>> cases the goal is to prevent allocations.
>>>>>
>>>>>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>>>>>          unmap_domain_page(p);
>>>>>>>>      
>>>>>>>>          spin_lock(&hd->arch.pgtables.lock);
>>>>>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>>>>>> +    /*
>>>>>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>>>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>>>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>>>>>> +     * domain is dying.
>>>>>>>> +     *
>>>>>>>> +     * So prevent page-table allocation when the domain is dying.
>>>>>>>> +     *
>>>>>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>>>>>> +     */
>>>>>>>> +    if ( likely(!d->is_dying) )
>>>>>>>> +    {
>>>>>>>> +        alive = true;
>>>>>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>>>>>> +    }
>>>>>>>>          spin_unlock(&hd->arch.pgtables.lock);
>>>>>>>>      
>>>>>>>> +    if ( unlikely(!alive) )
>>>>>>>> +    {
>>>>>>>> +        free_domheap_page(pg);
>>>>>>>> +        pg = NULL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>          return pg;
>>>>>>>>      }
>>>>>>>
>>>>>>> As before I'm concerned of this forcing error paths to be taken
>>>>>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>>>>>> once super page mappings are supported). Considering some of the
>>>>>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>>>>>> would be quite unfortunate if we ended up crashing a domain
>>>>>>> while it is being cleaned up after.
>>>>>>
>>>>>> It is unfortunate, but I think this is better than having to leak page
>>>>>> tables.
>>>>>>
>>>>>>>
>>>>>>> Additionally, the (at present still hypothetical) unmap case, if
>>>>>>> failing because of the change here, would then again chance to
>>>>>>> leave mappings in place while the underlying pages get freed. As
>>>>>>> this would likely require an XSA, the change doesn't feel like
>>>>>>> "hardening" to me.
>>>>>>
>>>>>> I would agree with this if memory allocations could never fail. That's
>>>>>> not that case and will become worse as we use IOMMU pool.
>>>>>>
>>>>>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
>>>>>
>>>>> The function is marked __must_check, so there won't be any direct
>>>>> callers ignoring errors (albeit I may be wrong here - we used to
>>>>> have cases where we simply suppressed the resulting compiler
>>>>> diagnostic, without really handling errors; not sure if all of
>>>>> these are gone by now). Risks might be elsewhere.
>>>>
>>>> But this is not a new risk. So I don't understand why you think my patch
>>>> is the one that may lead to an XSA in the future.
>>>
>>> I didn't mean to imply it would _lead_ to an XSA (you're
>>> right that the problem was there already before), but the term
>>> "harden" suggests to me that the patch aims at eliminating
>>> possible conditions.
>>
>> It elimitates the risk that someone inadvertently call
>> iommu_alloc_pgtable() when the domain is dying. If this is happening
>> after the page tables have been freed, then we would end up to leak memory.
>>
>>> IOW the result here looks to me as if it
>>> would yield a false sense of safety.
>>
>> So you are concerned about the wording rather than the code itself. Is
>> that correct?
> 
> In a way, yes. First of all I'd like us to settle on what to do
> with late unmap requests, for 4.15 (and if need be longer term).

iommu_unmap() doesn't allocate memory today and very unlikely going to 
do for the lifetime of 4.15. So why should we address the late unmap 
requests in 4.15?

At best, to me this looks like introduce a risk for fixing a so far 
inexistent bug.

Cheers,
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index faa0078db595..a67075f0045d 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -149,6 +149,13 @@  int arch_iommu_domain_init(struct domain *d)
 
 void arch_iommu_domain_destroy(struct domain *d)
 {
+    /*
+     * There should be not page-tables left allocated by the time the
+     * domain is destroyed. Note that arch_iommu_domain_destroy() is
+     * called unconditionally, so pgtables may be unitialized.
+     */
+    ASSERT(dom_iommu(d)->platform_ops == NULL ||
+           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
 }
 
 static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
@@ -279,6 +286,9 @@  int iommu_free_pgtables(struct domain *d)
      */
     hd->platform_ops->clear_root_pgtable(d);
 
+    /* After this barrier no new page allocations can occur. */
+    spin_barrier(&hd->arch.pgtables.lock);
+
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {
         free_domheap_page(pg);
@@ -296,6 +306,7 @@  struct page_info *iommu_alloc_pgtable(struct domain *d)
     unsigned int memflags = 0;
     struct page_info *pg;
     void *p;
+    bool alive = false;
 
 #ifdef CONFIG_NUMA
     if ( hd->node != NUMA_NO_NODE )
@@ -315,9 +326,29 @@  struct page_info *iommu_alloc_pgtable(struct domain *d)
     unmap_domain_page(p);
 
     spin_lock(&hd->arch.pgtables.lock);
-    page_list_add(pg, &hd->arch.pgtables.list);
+    /*
+     * The IOMMU page-tables are freed when relinquishing the domain, but
+     * nothing prevent allocation to happen afterwards. There is no valid
+     * reasons to continue to update the IOMMU page-tables while the
+     * domain is dying.
+     *
+     * So prevent page-table allocation when the domain is dying.
+     *
+     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
+     */
+    if ( likely(!d->is_dying) )
+    {
+        alive = true;
+        page_list_add(pg, &hd->arch.pgtables.list);
+    }
     spin_unlock(&hd->arch.pgtables.lock);
 
+    if ( unlikely(!alive) )
+    {
+        free_domheap_page(pg);
+        pg = NULL;
+    }
+
     return pg;
 }