diff mbox series

xen/arm: optee: Allocate anonymous domheap pages

Message ID 1630935741-17088-1-git-send-email-olekstysh@gmail.com (mailing list archive)
State New, archived
Headers show
Series xen/arm: optee: Allocate anonymous domheap pages | expand

Commit Message

Oleksandr Tyshchenko Sept. 6, 2021, 1:42 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Allocate anonymous domheap pages as there is no strict need to
account them to a particular domain.

Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
domU and dom0 can allocate" the dom0 cannot allocate memory outside
of the pre-allocated region. This means if we try to allocate
non-anonymous page to be accounted to dom0 we will get an
over-allocation issue when assigning that page to the domain.
The anonymous page, in turn, is not assigned to any domain.

CC: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/arch/arm/tee/optee.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Oleksandr Tyshchenko Sept. 23, 2021, 2:36 p.m. UTC | #1
Hello all.


Gentle reminder.


On 06.09.21 16:42, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Allocate anonymous domheap pages as there is no strict need to
> account them to a particular domain.
>
> Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
> domU and dom0 can allocate" the dom0 cannot allocate memory outside
> of the pre-allocated region. This means if we try to allocate
> non-anonymous page to be accounted to dom0 we will get an
> over-allocation issue when assigning that page to the domain.
> The anonymous page, in turn, is not assigned to any domain.
>
> CC: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/arch/arm/tee/optee.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> index 3453615..83b4994 100644
> --- a/xen/arch/arm/tee/optee.c
> +++ b/xen/arch/arm/tee/optee.c
> @@ -410,7 +410,7 @@ static struct shm_rpc *allocate_and_pin_shm_rpc(struct optee_domain *ctx,
>       if ( !shm_rpc )
>           return ERR_PTR(-ENOMEM);
>   
> -    shm_rpc->xen_arg_pg = alloc_domheap_page(current->domain, 0);
> +    shm_rpc->xen_arg_pg = alloc_domheap_page(NULL, 0);
>       if ( !shm_rpc->xen_arg_pg )
>       {
>           xfree(shm_rpc);
> @@ -774,7 +774,7 @@ static int translate_noncontig(struct optee_domain *ctx,
>        * - There is a plan to implement preemption in the code below, which
>        *   will allow use to increase default MAX_SHM_BUFFER_PG value.
>        */
> -    xen_pgs = alloc_domheap_pages(current->domain, order, 0);
> +    xen_pgs = alloc_domheap_pages(NULL, order, 0);
>       if ( !xen_pgs )
>           return -ENOMEM;
>   
> @@ -938,7 +938,7 @@ static bool copy_std_request(struct cpu_user_regs *regs,
>   
>       BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);
>   
> -    call->xen_arg_pg = alloc_domheap_page(current->domain, 0);
> +    call->xen_arg_pg = alloc_domheap_page(NULL, 0);
>       if ( !call->xen_arg_pg )
>       {
>           set_user_reg(regs, 0, OPTEE_SMC_RETURN_ENOMEM);
Bertrand Marquis Sept. 23, 2021, 2:51 p.m. UTC | #2
Hi Oleksandr,

> On 6 Sep 2021, at 14:42, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> 
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Allocate anonymous domheap pages as there is no strict need to
> account them to a particular domain.
> 
> Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
> domU and dom0 can allocate" the dom0 cannot allocate memory outside
> of the pre-allocated region. This means if we try to allocate
> non-anonymous page to be accounted to dom0 we will get an
> over-allocation issue when assigning that page to the domain.
> The anonymous page, in turn, is not assigned to any domain.
> 
> CC: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>

Cheers
Bertrand

> ---
> xen/arch/arm/tee/optee.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> index 3453615..83b4994 100644
> --- a/xen/arch/arm/tee/optee.c
> +++ b/xen/arch/arm/tee/optee.c
> @@ -410,7 +410,7 @@ static struct shm_rpc *allocate_and_pin_shm_rpc(struct optee_domain *ctx,
>     if ( !shm_rpc )
>         return ERR_PTR(-ENOMEM);
> 
> -    shm_rpc->xen_arg_pg = alloc_domheap_page(current->domain, 0);
> +    shm_rpc->xen_arg_pg = alloc_domheap_page(NULL, 0);
>     if ( !shm_rpc->xen_arg_pg )
>     {
>         xfree(shm_rpc);
> @@ -774,7 +774,7 @@ static int translate_noncontig(struct optee_domain *ctx,
>      * - There is a plan to implement preemption in the code below, which
>      *   will allow use to increase default MAX_SHM_BUFFER_PG value.
>      */
> -    xen_pgs = alloc_domheap_pages(current->domain, order, 0);
> +    xen_pgs = alloc_domheap_pages(NULL, order, 0);
>     if ( !xen_pgs )
>         return -ENOMEM;
> 
> @@ -938,7 +938,7 @@ static bool copy_std_request(struct cpu_user_regs *regs,
> 
>     BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);
> 
> -    call->xen_arg_pg = alloc_domheap_page(current->domain, 0);
> +    call->xen_arg_pg = alloc_domheap_page(NULL, 0);
>     if ( !call->xen_arg_pg )
>     {
>         set_user_reg(regs, 0, OPTEE_SMC_RETURN_ENOMEM);
> -- 
> 2.7.4
> 
>
Stefano Stabellini Sept. 23, 2021, 8:08 p.m. UTC | #3
On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Allocate anonymous domheap pages as there is no strict need to
> account them to a particular domain.
> 
> Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
> domU and dom0 can allocate" the dom0 cannot allocate memory outside
> of the pre-allocated region. This means if we try to allocate
> non-anonymous page to be accounted to dom0 we will get an
> over-allocation issue when assigning that page to the domain.
> The anonymous page, in turn, is not assigned to any domain.
> 
> CC: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

Only one question, which is more architectural: given that these pages
are "unlimited", could the guest exploit the interface somehow to force
Xen to allocate an very high number of anonymous pages?

E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
to exaust all memory pages?



> ---
>  xen/arch/arm/tee/optee.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> index 3453615..83b4994 100644
> --- a/xen/arch/arm/tee/optee.c
> +++ b/xen/arch/arm/tee/optee.c
> @@ -410,7 +410,7 @@ static struct shm_rpc *allocate_and_pin_shm_rpc(struct optee_domain *ctx,
>      if ( !shm_rpc )
>          return ERR_PTR(-ENOMEM);
>  
> -    shm_rpc->xen_arg_pg = alloc_domheap_page(current->domain, 0);
> +    shm_rpc->xen_arg_pg = alloc_domheap_page(NULL, 0);
>      if ( !shm_rpc->xen_arg_pg )
>      {
>          xfree(shm_rpc);
> @@ -774,7 +774,7 @@ static int translate_noncontig(struct optee_domain *ctx,
>       * - There is a plan to implement preemption in the code below, which
>       *   will allow use to increase default MAX_SHM_BUFFER_PG value.
>       */
> -    xen_pgs = alloc_domheap_pages(current->domain, order, 0);
> +    xen_pgs = alloc_domheap_pages(NULL, order, 0);
>      if ( !xen_pgs )
>          return -ENOMEM;
>  
> @@ -938,7 +938,7 @@ static bool copy_std_request(struct cpu_user_regs *regs,
>  
>      BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);
>  
> -    call->xen_arg_pg = alloc_domheap_page(current->domain, 0);
> +    call->xen_arg_pg = alloc_domheap_page(NULL, 0);
>      if ( !call->xen_arg_pg )
>      {
>          set_user_reg(regs, 0, OPTEE_SMC_RETURN_ENOMEM);
> -- 
> 2.7.4
>
Volodymyr Babchuk Sept. 23, 2021, 8:57 p.m. UTC | #4
Hi Stefano,

Stefano Stabellini <sstabellini@kernel.org> writes:

> On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> 
>> Allocate anonymous domheap pages as there is no strict need to
>> account them to a particular domain.
>> 
>> Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
>> domU and dom0 can allocate" the dom0 cannot allocate memory outside
>> of the pre-allocated region. This means if we try to allocate
>> non-anonymous page to be accounted to dom0 we will get an
>> over-allocation issue when assigning that page to the domain.
>> The anonymous page, in turn, is not assigned to any domain.
>> 
>> CC: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>
> Only one question, which is more architectural: given that these pages
> are "unlimited", could the guest exploit the interface somehow to force
> Xen to allocate an very high number of anonymous pages?
>
> E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
> to exaust all memory pages?

Generally, OP-TEE mediator tracks all resources allocated and imposes
limits on them.

OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is issued
not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece of
system we allow it to request as many buffers as it wants. Also, we know
that OP-TEE asks only for one such buffer per every standard call. And
number of simultaneous calls is limited by number of OP-TEE threads,
which is quite low: typically only two.
Oleksandr Tyshchenko Oct. 6, 2021, 1:23 p.m. UTC | #5
Hello all


Gentle reminder.


On 23.09.21 23:57, Volodymyr Babchuk wrote:
> Hi Stefano,
>
> Stefano Stabellini <sstabellini@kernel.org> writes:
>
>> On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> Allocate anonymous domheap pages as there is no strict need to
>>> account them to a particular domain.
>>>
>>> Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
>>> domU and dom0 can allocate" the dom0 cannot allocate memory outside
>>> of the pre-allocated region. This means if we try to allocate
>>> non-anonymous page to be accounted to dom0 we will get an
>>> over-allocation issue when assigning that page to the domain.
>>> The anonymous page, in turn, is not assigned to any domain.
>>>
>>> CC: Julien Grall <jgrall@amazon.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> Only one question, which is more architectural: given that these pages
>> are "unlimited", could the guest exploit the interface somehow to force
>> Xen to allocate an very high number of anonymous pages?
>>
>> E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
>> to exaust all memory pages?
> Generally, OP-TEE mediator tracks all resources allocated and imposes
> limits on them.
>
> OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is issued
> not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece of
> system we allow it to request as many buffers as it wants. Also, we know
> that OP-TEE asks only for one such buffer per every standard call. And
> number of simultaneous calls is limited by number of OP-TEE threads,
> which is quite low: typically only two.
>
Stefano Stabellini Oct. 6, 2021, 11:53 p.m. UTC | #6
On Wed, 6 Oct 2021, Oleksandr wrote:
> Hello all
> 
> Gentle reminder.
 
Many thanks for the ping, this patch fell off my radar.


 
> On 23.09.21 23:57, Volodymyr Babchuk wrote:
> > Hi Stefano,
> > 
> > Stefano Stabellini <sstabellini@kernel.org> writes:
> > 
> > > On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > 
> > > > Allocate anonymous domheap pages as there is no strict need to
> > > > account them to a particular domain.
> > > > 
> > > > Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
> > > > domU and dom0 can allocate" the dom0 cannot allocate memory outside
> > > > of the pre-allocated region. This means if we try to allocate
> > > > non-anonymous page to be accounted to dom0 we will get an
> > > > over-allocation issue when assigning that page to the domain.
> > > > The anonymous page, in turn, is not assigned to any domain.
> > > > 
> > > > CC: Julien Grall <jgrall@amazon.com>
> > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > > Only one question, which is more architectural: given that these pages
> > > are "unlimited", could the guest exploit the interface somehow to force
> > > Xen to allocate an very high number of anonymous pages?
> > > 
> > > E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
> > > to exaust all memory pages?
> > Generally, OP-TEE mediator tracks all resources allocated and imposes
> > limits on them.
> > 
> > OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is issued
> > not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece of
> > system we allow it to request as many buffers as it wants. Also, we know
> > that OP-TEE asks only for one such buffer per every standard call. And
> > number of simultaneous calls is limited by number of OP-TEE threads,
> > which is quite low: typically only two.

So let me repeat it differently to see if I understood correctly:

- OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
- OPTEE is trusted and only call it twice anyway

I am OK with this argument, but do we have a check to make sure a domU
cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?


Looking at the patch, there are other two places, in addition to
OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
allocated:

1) copy_std_request
2) translate_noncontig

We need to prove that neither 1) or 2) can result in a domU exausting
Xen memory.

In the case of 1), it looks like the memory is freed before returning to
the DomU, right? If so, it should be no problem?

In the case of 2), it looks like the memory could outlive the call where
it is allocated. Is there any kind of protection against issuing
something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
itself that would refuse the attempt? Thus, the idea is that
do_call_with_arg will return error and we'll just free the memory there?

I cannot see a check for errors returned by do_call_with_arg and memory
freeing done because of that. Sorry I am not super familiar with the
code, I am just trying to make sure we are not offering to DomUs an easy
way to crash the system.

It looks like they could be called from one of the OPTEE operations that
a domU could request? Is there a limit for them?
Volodymyr Babchuk Oct. 7, 2021, 5:57 p.m. UTC | #7
Hi Stefano,

Stefano Stabellini <sstabellini@kernel.org> writes:

> On Wed, 6 Oct 2021, Oleksandr wrote:
>> Hello all
>> 
>> Gentle reminder.
>  
> Many thanks for the ping, this patch fell off my radar.
>
>
>  
>> On 23.09.21 23:57, Volodymyr Babchuk wrote:
>> > Hi Stefano,
>> > 
>> > Stefano Stabellini <sstabellini@kernel.org> writes:
>> > 
>> > > On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
>> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> > > > 
>> > > > Allocate anonymous domheap pages as there is no strict need to
>> > > > account them to a particular domain.
>> > > > 
>> > > > Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
>> > > > domU and dom0 can allocate" the dom0 cannot allocate memory outside
>> > > > of the pre-allocated region. This means if we try to allocate
>> > > > non-anonymous page to be accounted to dom0 we will get an
>> > > > over-allocation issue when assigning that page to the domain.
>> > > > The anonymous page, in turn, is not assigned to any domain.
>> > > > 
>> > > > CC: Julien Grall <jgrall@amazon.com>
>> > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> > > > Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> > > Only one question, which is more architectural: given that these pages
>> > > are "unlimited", could the guest exploit the interface somehow to force
>> > > Xen to allocate an very high number of anonymous pages?
>> > > 
>> > > E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
>> > > to exaust all memory pages?
>> > Generally, OP-TEE mediator tracks all resources allocated and imposes
>> > limits on them.
>> > 
>> > OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is issued
>> > not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece of
>> > system we allow it to request as many buffers as it wants. Also, we know
>> > that OP-TEE asks only for one such buffer per every standard call. And
>> > number of simultaneous calls is limited by number of OP-TEE threads,
>> > which is quite low: typically only two.
>
> So let me repeat it differently to see if I understood correctly:
>
> - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
> - OPTEE is trusted and only call it twice anyway

Correct.

> I am OK with this argument, but do we have a check to make sure a domU
> cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?

domU can't issue any RPC, because all RPCs are issued from OP-TEE
side. This is the nature of RPC - OP-TEE requests Normal World for some
service. But of course, Normal World can perform certain actions that
will make OP-TEE to issue a RPC. I discuss this in depth below.

>
> Looking at the patch, there are other two places, in addition to
> OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
> allocated:
>
> 1) copy_std_request
> 2) translate_noncontig
>
> We need to prove that neither 1) or 2) can result in a domU exausting
> Xen memory.
>
> In the case of 1), it looks like the memory is freed before returning to
> the DomU, right? If so, it should be no problem?

Yes, mediator makes shadow copy of every request buffer to hide
translated addresses from the guest. Number of requests is limited by
number of OP-TEE threads.

> In the case of 2), it looks like the memory could outlive the call where
> it is allocated. Is there any kind of protection against issuing
> something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
> itself that would refuse the attempt? Thus, the idea is that
> do_call_with_arg will return error and we'll just free the memory there?

Well, translate_noncontig() calls allocate_optee_shm_buf() which counts
all allocated buffers. So you can't call it more than
MAX_SHM_BUFFER_COUNT times, without de-allocating previous memory. But,
thanks to your question, I have found a bug there: memory is not freed
if allocate_optee_shm_buf() fails. I'll prepare patch later today.

> I cannot see a check for errors returned by do_call_with_arg and memory
> freeing done because of that. Sorry I am not super familiar with the
> code, I am just trying to make sure we are not offering to DomUs an easy
> way to crash the system.

I tried to eliminate all possibilities for a guest to crash the
system. Of course, this does not mean that there are none of them.

And yes, code is a bit hard to understand, because calls to OP-TEE are
stateful and you need to account for that state. From NW and SW this
looks quite fine, because state is handled naturally. But mediator sits
in a middle, so it's implementation is a bit messy.

I'll try to explain what is going on, so you it will be easier to
understand logic in the mediator.

There are two types of OP-TEE calls: fast calls and standard calls. Fast
call is simple: call SMC and get result. It does not allocate thread
context in OP-TEE and is non-preemptive. So yes, it should be fast. It
is used for simple things like "get OP-TEE version" or "exchange
capabilities". It is easy to handle them in mediator: just forward
the call, check result, return it back to a guest.

Standard calls are stateful. OP-TEE allocates thread for each call. This
call can be preempted either by IRQ or by RPC. For consistency IRQ
return is also considered as special type of RPC. So, in general one
standard call can consist of series of SMCs:

--> SMC with request
<-- RPC return (like IRQ)
--> SMC "resume call"
<-- RPC return (like "read disk")
--> SMC "resume call"
<-- RPC return (like "send network packet")
--> SMC "resume call"
...
<-- Final return

There are many types of RPCs: "handle IRQ", additional shared buffer
allocation/de-allocation, RPMB access, disks access, network access,
synchronization primitives (when OP-TEE thread is gets blocked on a
mutex), etc.

Two more things that makes all this worse: Normal World can register
shared buffer with OP-TEE. Such buffer can live indefinitely
long. Also, Normal World decides when to resume call. For example,
calling process can be preempted and then resumed seconds
later. Misbehaving guest can decide to not resume call at all.

As I said, I tried to take all this things into account. There are
basically 3 types of objects that can lead to memory allocation on Xen
side:

1. Standard call context. Besides memory space for struct optee_std_call
itself it allocates page for a shadow buffer, where argument addresses
are translated by Xen. Number of this objects is limited by number of
OP-TEE threads:

    count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
    if ( count == max_optee_threads )
        return ERR_PTR(-ENOSPC);

2. Shared buffer. This is a buffer shared by guest with OP-TEE. It can
be either temporary buffer which is shared for one standard call
duration, or registered shared buffer, which is remains active until it
is de-registered. This is where translate_noncontig() comes into play.
Number of this buffers is limited in allocate_optee_shm_buf():

    count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
                              MAX_SHM_BUFFER_COUNT);
    if ( count == MAX_SHM_BUFFER_COUNT )
        return ERR_PTR(-ENOMEM);

3. Shared RPC buffer. This is very special kind of buffer. Basically,
OP-TEE needs some shared memory to provide RPC call parameters. So it
requests buffer from Normal World. There is no hard limit on this from
mediator side, because, as I told earlier, OP-TEE itself limits number
of this buffers. There is no cases when more that one buffer will be
allocated per OP-TEE thread. This type of buffer is used only to process
RPC requests themselves. OP-TEE can request more buffers via RPC, but
they will fall to p.2: NW uses separate request to register buffer and
then returns its handle in the preempted call.


Apart from those two limits, there is a limit on total number of pages
which is shared between guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
limit is for a case when guest tries to allocate few really BIG buffers.


> It looks like they could be called from one of the OPTEE operations that
> a domU could request? Is there a limit for them?

Yes, there are limits, as I described above.

Also, bear in mind that resources available to OP-TEE are also quite
limited. So, in case of some breach in mediator, OP-TEE will give up
first. This of course is not an excuse to have bugs in the mediator...
Stefano Stabellini Oct. 7, 2021, 9:14 p.m. UTC | #8
On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
> Hi Stefano,
> 
> Stefano Stabellini <sstabellini@kernel.org> writes:
> 
> > On Wed, 6 Oct 2021, Oleksandr wrote:
> >> Hello all
> >> 
> >> Gentle reminder.
> >  
> > Many thanks for the ping, this patch fell off my radar.
> >
> >
> >  
> >> On 23.09.21 23:57, Volodymyr Babchuk wrote:
> >> > Hi Stefano,
> >> > 
> >> > Stefano Stabellini <sstabellini@kernel.org> writes:
> >> > 
> >> > > On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
> >> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >> > > > 
> >> > > > Allocate anonymous domheap pages as there is no strict need to
> >> > > > account them to a particular domain.
> >> > > > 
> >> > > > Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
> >> > > > domU and dom0 can allocate" the dom0 cannot allocate memory outside
> >> > > > of the pre-allocated region. This means if we try to allocate
> >> > > > non-anonymous page to be accounted to dom0 we will get an
> >> > > > over-allocation issue when assigning that page to the domain.
> >> > > > The anonymous page, in turn, is not assigned to any domain.
> >> > > > 
> >> > > > CC: Julien Grall <jgrall@amazon.com>
> >> > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >> > > > Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> >> > > Only one question, which is more architectural: given that these pages
> >> > > are "unlimited", could the guest exploit the interface somehow to force
> >> > > Xen to allocate an very high number of anonymous pages?
> >> > > 
> >> > > E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
> >> > > to exaust all memory pages?
> >> > Generally, OP-TEE mediator tracks all resources allocated and imposes
> >> > limits on them.
> >> > 
> >> > OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is issued
> >> > not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece of
> >> > system we allow it to request as many buffers as it wants. Also, we know
> >> > that OP-TEE asks only for one such buffer per every standard call. And
> >> > number of simultaneous calls is limited by number of OP-TEE threads,
> >> > which is quite low: typically only two.
> >
> > So let me repeat it differently to see if I understood correctly:
> >
> > - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
> > - OPTEE is trusted and only call it twice anyway
> 
> Correct.
> 
> > I am OK with this argument, but do we have a check to make sure a domU
> > cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
> 
> domU can't issue any RPC, because all RPCs are issued from OP-TEE
> side. This is the nature of RPC - OP-TEE requests Normal World for some
> service. But of course, Normal World can perform certain actions that
> will make OP-TEE to issue a RPC. I discuss this in depth below.
> 
> >
> > Looking at the patch, there are other two places, in addition to
> > OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
> > allocated:
> >
> > 1) copy_std_request
> > 2) translate_noncontig
> >
> > We need to prove that neither 1) or 2) can result in a domU exausting
> > Xen memory.
> >
> > In the case of 1), it looks like the memory is freed before returning to
> > the DomU, right? If so, it should be no problem?
> 
> Yes, mediator makes shadow copy of every request buffer to hide
> translated addresses from the guest. Number of requests is limited by
> number of OP-TEE threads.
> 
> > In the case of 2), it looks like the memory could outlive the call where
> > it is allocated. Is there any kind of protection against issuing
> > something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
> > itself that would refuse the attempt? Thus, the idea is that
> > do_call_with_arg will return error and we'll just free the memory there?
> 
> Well, translate_noncontig() calls allocate_optee_shm_buf() which counts
> all allocated buffers. So you can't call it more than
> MAX_SHM_BUFFER_COUNT times, without de-allocating previous memory. But,
> thanks to your question, I have found a bug there: memory is not freed
> if allocate_optee_shm_buf() fails. I'll prepare patch later today.
> 
> > I cannot see a check for errors returned by do_call_with_arg and memory
> > freeing done because of that. Sorry I am not super familiar with the
> > code, I am just trying to make sure we are not offering to DomUs an easy
> > way to crash the system.
> 
> I tried to eliminate all possibilities for a guest to crash the
> system. Of course, this does not mean that there are none of them.
> 
> And yes, code is a bit hard to understand, because calls to OP-TEE are
> stateful and you need to account for that state. From NW and SW this
> looks quite fine, because state is handled naturally. But mediator sits
> in a middle, so it's implementation is a bit messy.
> 
> I'll try to explain what is going on, so you it will be easier to
> understand logic in the mediator.
> 
> There are two types of OP-TEE calls: fast calls and standard calls. Fast
> call is simple: call SMC and get result. It does not allocate thread
> context in OP-TEE and is non-preemptive. So yes, it should be fast. It
> is used for simple things like "get OP-TEE version" or "exchange
> capabilities". It is easy to handle them in mediator: just forward
> the call, check result, return it back to a guest.
> 
> Standard calls are stateful. OP-TEE allocates thread for each call. This
> call can be preempted either by IRQ or by RPC. For consistency IRQ
> return is also considered as special type of RPC. So, in general one
> standard call can consist of series of SMCs:
> 
> --> SMC with request
> <-- RPC return (like IRQ)
> --> SMC "resume call"
> <-- RPC return (like "read disk")
> --> SMC "resume call"
> <-- RPC return (like "send network packet")
> --> SMC "resume call"
> ...
> <-- Final return
> 
> There are many types of RPCs: "handle IRQ", additional shared buffer
> allocation/de-allocation, RPMB access, disks access, network access,
> synchronization primitives (when OP-TEE thread is gets blocked on a
> mutex), etc.
> 
> Two more things that makes all this worse: Normal World can register
> shared buffer with OP-TEE. Such buffer can live indefinitely
> long. Also, Normal World decides when to resume call. For example,
> calling process can be preempted and then resumed seconds
> later. Misbehaving guest can decide to not resume call at all.
> 
> As I said, I tried to take all this things into account. There are
> basically 3 types of objects that can lead to memory allocation on Xen
> side:
> 
> 1. Standard call context. Besides memory space for struct optee_std_call
> itself it allocates page for a shadow buffer, where argument addresses
> are translated by Xen. Number of this objects is limited by number of
> OP-TEE threads:
> 
>     count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
>     if ( count == max_optee_threads )
>         return ERR_PTR(-ENOSPC);
> 
> 2. Shared buffer. This is a buffer shared by guest with OP-TEE. It can
> be either temporary buffer which is shared for one standard call
> duration, or registered shared buffer, which is remains active until it
> is de-registered. This is where translate_noncontig() comes into play.
> Number of this buffers is limited in allocate_optee_shm_buf():
> 
>     count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
>                               MAX_SHM_BUFFER_COUNT);
>     if ( count == MAX_SHM_BUFFER_COUNT )
>         return ERR_PTR(-ENOMEM);
> 
> 3. Shared RPC buffer. This is very special kind of buffer. Basically,
> OP-TEE needs some shared memory to provide RPC call parameters. So it
> requests buffer from Normal World. There is no hard limit on this from
> mediator side, because, as I told earlier, OP-TEE itself limits number
> of this buffers. There is no cases when more that one buffer will be
> allocated per OP-TEE thread. This type of buffer is used only to process
> RPC requests themselves. OP-TEE can request more buffers via RPC, but
> they will fall to p.2: NW uses separate request to register buffer and
> then returns its handle in the preempted call.
> 
> 
> Apart from those two limits, there is a limit on total number of pages
> which is shared between guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
> limit is for a case when guest tries to allocate few really BIG buffers.
> 
> 
> > It looks like they could be called from one of the OPTEE operations that
> > a domU could request? Is there a limit for them?
> 
> Yes, there are limits, as I described above.
> 
> Also, bear in mind that resources available to OP-TEE are also quite
> limited. So, in case of some breach in mediator, OP-TEE will give up
> first. This of course is not an excuse to have bugs in the mediator...

OK, thanks for the explanation. The reasons for my questions is that if
the allocations are using the memory of DomU, then at worst DomU can run
out of memory. But if the allocations are using anonymous memory, then
the whole platform might run out of memory. We have issued XSAs for
things like that in the past.

This is why I am worried about this patch: if we apply it we really
become reliant on these limits being implemented correctly. A bug can
have much more severe consequences.

As you are the maintainer for this code, and this code is not security
supported, I'll leave it up to you (also see the other email about
moving optee to "supported, not security supported").

However, maybe a different solution would be to increase max_pages for a
domain when optee is enabled? Maybe just by a few pages (as many as
needed by the optee mediator)? Because if we did that, we wouldn't risk
exposing DOS attack vectors for every bug in the mediator limits checks.

The below adds a 10 pages slack.


diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
index 3964a8a5cd..a3105f1a9a 100644
--- a/xen/arch/arm/tee/tee.c
+++ b/xen/arch/arm/tee/tee.c
@@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
     return cur_mediator->ops->handle_call(regs);
 }
 
+#define TEE_SLACK (10)
 int tee_domain_init(struct domain *d, uint16_t tee_type)
 {
+    int ret;
+
     if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
         return 0;
 
@@ -49,7 +52,15 @@ int tee_domain_init(struct domain *d, uint16_t tee_type)
     if ( cur_mediator->tee_type != tee_type )
         return -EINVAL;
 
-    return cur_mediator->ops->domain_init(d);
+    ret = cur_mediator->ops->domain_init(d);
+    if ( ret < 0 )
+        return ret;
+
+    /* 
+     * Increase maxmem for domains with TEE, the extra pages are used by
+     * the mediator
+     */
+    d->max_pages += TEE_SLACK;
 }
 
 int tee_relinquish_resources(struct domain *d)
Volodymyr Babchuk Oct. 7, 2021, 11 p.m. UTC | #9
Hi,

Stefano Stabellini <sstabellini@kernel.org> writes:

> On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
>> Hi Stefano,
>> 
>> Stefano Stabellini <sstabellini@kernel.org> writes:
>> 
>> > On Wed, 6 Oct 2021, Oleksandr wrote:
>> >> Hello all
>> >> 
>> >> Gentle reminder.
>> >  
>> > Many thanks for the ping, this patch fell off my radar.
>> >
>> >
>> >  
>> >> On 23.09.21 23:57, Volodymyr Babchuk wrote:
>> >> > Hi Stefano,
>> >> > 
>> >> > Stefano Stabellini <sstabellini@kernel.org> writes:
>> >> > 
>> >> > > On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
>> >> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> >> > > > 
>> >> > > > Allocate anonymous domheap pages as there is no strict need to
>> >> > > > account them to a particular domain.
>> >> > > > 
>> >> > > > Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
>> >> > > > domU and dom0 can allocate" the dom0 cannot allocate memory outside
>> >> > > > of the pre-allocated region. This means if we try to allocate
>> >> > > > non-anonymous page to be accounted to dom0 we will get an
>> >> > > > over-allocation issue when assigning that page to the domain.
>> >> > > > The anonymous page, in turn, is not assigned to any domain.
>> >> > > > 
>> >> > > > CC: Julien Grall <jgrall@amazon.com>
>> >> > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> >> > > > Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> >> > > Only one question, which is more architectural: given that these pages
>> >> > > are "unlimited", could the guest exploit the interface somehow to force
>> >> > > Xen to allocate an very high number of anonymous pages?
>> >> > > 
>> >> > > E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
>> >> > > to exaust all memory pages?
>> >> > Generally, OP-TEE mediator tracks all resources allocated and imposes
>> >> > limits on them.
>> >> > 
>> >> > OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is issued
>> >> > not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece of
>> >> > system we allow it to request as many buffers as it wants. Also, we know
>> >> > that OP-TEE asks only for one such buffer per every standard call. And
>> >> > number of simultaneous calls is limited by number of OP-TEE threads,
>> >> > which is quite low: typically only two.
>> >
>> > So let me repeat it differently to see if I understood correctly:
>> >
>> > - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
>> > - OPTEE is trusted and only call it twice anyway
>> 
>> Correct.
>> 
>> > I am OK with this argument, but do we have a check to make sure a domU
>> > cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
>> 
>> domU can't issue any RPC, because all RPCs are issued from OP-TEE
>> side. This is the nature of RPC - OP-TEE requests Normal World for some
>> service. But of course, Normal World can perform certain actions that
>> will make OP-TEE to issue a RPC. I discuss this in depth below.
>> 
>> >
>> > Looking at the patch, there are other two places, in addition to
>> > OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
>> > allocated:
>> >
>> > 1) copy_std_request
>> > 2) translate_noncontig
>> >
>> > We need to prove that neither 1) or 2) can result in a domU exausting
>> > Xen memory.
>> >
>> > In the case of 1), it looks like the memory is freed before returning to
>> > the DomU, right? If so, it should be no problem?
>> 
>> Yes, mediator makes shadow copy of every request buffer to hide
>> translated addresses from the guest. Number of requests is limited by
>> number of OP-TEE threads.
>> 
>> > In the case of 2), it looks like the memory could outlive the call where
>> > it is allocated. Is there any kind of protection against issuing
>> > something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
>> > itself that would refuse the attempt? Thus, the idea is that
>> > do_call_with_arg will return error and we'll just free the memory there?
>> 
>> Well, translate_noncontig() calls allocate_optee_shm_buf() which counts
>> all allocated buffers. So you can't call it more than
>> MAX_SHM_BUFFER_COUNT times, without de-allocating previous memory. But,
>> thanks to your question, I have found a bug there: memory is not freed
>> if allocate_optee_shm_buf() fails. I'll prepare patch later today.
>> 
>> > I cannot see a check for errors returned by do_call_with_arg and memory
>> > freeing done because of that. Sorry I am not super familiar with the
>> > code, I am just trying to make sure we are not offering to DomUs an easy
>> > way to crash the system.
>> 
>> I tried to eliminate all possibilities for a guest to crash the
>> system. Of course, this does not mean that there are none of them.
>> 
>> And yes, code is a bit hard to understand, because calls to OP-TEE are
>> stateful and you need to account for that state. From NW and SW this
>> looks quite fine, because state is handled naturally. But mediator sits
>> in a middle, so it's implementation is a bit messy.
>> 
>> I'll try to explain what is going on, so you it will be easier to
>> understand logic in the mediator.
>> 
>> There are two types of OP-TEE calls: fast calls and standard calls. Fast
>> call is simple: call SMC and get result. It does not allocate thread
>> context in OP-TEE and is non-preemptive. So yes, it should be fast. It
>> is used for simple things like "get OP-TEE version" or "exchange
>> capabilities". It is easy to handle them in mediator: just forward
>> the call, check result, return it back to a guest.
>> 
>> Standard calls are stateful. OP-TEE allocates thread for each call. This
>> call can be preempted either by IRQ or by RPC. For consistency IRQ
>> return is also considered as special type of RPC. So, in general one
>> standard call can consist of series of SMCs:
>> 
>> --> SMC with request
>> <-- RPC return (like IRQ)
>> --> SMC "resume call"
>> <-- RPC return (like "read disk")
>> --> SMC "resume call"
>> <-- RPC return (like "send network packet")
>> --> SMC "resume call"
>> ...
>> <-- Final return
>> 
>> There are many types of RPCs: "handle IRQ", additional shared buffer
>> allocation/de-allocation, RPMB access, disks access, network access,
>> synchronization primitives (when OP-TEE thread is gets blocked on a
>> mutex), etc.
>> 
>> Two more things that makes all this worse: Normal World can register
>> shared buffer with OP-TEE. Such buffer can live indefinitely
>> long. Also, Normal World decides when to resume call. For example,
>> calling process can be preempted and then resumed seconds
>> later. Misbehaving guest can decide to not resume call at all.
>> 
>> As I said, I tried to take all this things into account. There are
>> basically 3 types of objects that can lead to memory allocation on Xen
>> side:
>> 
>> 1. Standard call context. Besides memory space for struct optee_std_call
>> itself it allocates page for a shadow buffer, where argument addresses
>> are translated by Xen. Number of this objects is limited by number of
>> OP-TEE threads:
>> 
>>     count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
>>     if ( count == max_optee_threads )
>>         return ERR_PTR(-ENOSPC);
>> 
>> 2. Shared buffer. This is a buffer shared by guest with OP-TEE. It can
>> be either temporary buffer which is shared for one standard call
>> duration, or registered shared buffer, which is remains active until it
>> is de-registered. This is where translate_noncontig() comes into play.
>> Number of this buffers is limited in allocate_optee_shm_buf():
>> 
>>     count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
>>                               MAX_SHM_BUFFER_COUNT);
>>     if ( count == MAX_SHM_BUFFER_COUNT )
>>         return ERR_PTR(-ENOMEM);
>> 
>> 3. Shared RPC buffer. This is very special kind of buffer. Basically,
>> OP-TEE needs some shared memory to provide RPC call parameters. So it
>> requests buffer from Normal World. There is no hard limit on this from
>> mediator side, because, as I told earlier, OP-TEE itself limits number
>> of this buffers. There is no cases when more that one buffer will be
>> allocated per OP-TEE thread. This type of buffer is used only to process
>> RPC requests themselves. OP-TEE can request more buffers via RPC, but
>> they will fall to p.2: NW uses separate request to register buffer and
>> then returns its handle in the preempted call.
>> 
>> 
>> Apart from those two limits, there is a limit on total number of pages
>> which is shared between guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
>> limit is for a case when guest tries to allocate few really BIG buffers.
>> 
>> 
>> > It looks like they could be called from one of the OPTEE operations that
>> > a domU could request? Is there a limit for them?
>> 
>> Yes, there are limits, as I described above.
>> 
>> Also, bear in mind that resources available to OP-TEE are also quite
>> limited. So, in case of some breach in mediator, OP-TEE will give up
>> first. This of course is not an excuse to have bugs in the mediator...
>
> OK, thanks for the explanation. The reasons for my questions is that if
> the allocations are using the memory of DomU, then at worst DomU can run
> out of memory. But if the allocations are using anonymous memory, then
> the whole platform might run out of memory. We have issued XSAs for
> things like that in the past.
>
> This is why I am worried about this patch: if we apply it we really
> become reliant on these limits being implemented correctly. A bug can
> have much more severe consequences.

Agree.

> As you are the maintainer for this code, and this code is not security
> supported, I'll leave it up to you (also see the other email about
> moving optee to "supported, not security supported").

Yes, I've seen this email. Just didn't had time to write followup.

> However, maybe a different solution would be to increase max_pages for a
> domain when optee is enabled? Maybe just by a few pages (as many as
> needed by the optee mediator)? Because if we did that, we wouldn't risk
> exposing DOS attack vectors for every bug in the mediator limits checks.
>
> The below adds a 10 pages slack.

Well, I didn't know that such option is available. If this is a valid
approach and there are no objections from other maintainers, I'd rather
use it.

Only one comment there is about number of pages. Maximal number of
domheap pages used per request is 6: one for request itself, one for RPC
buffer, 4 at most for request arguments. I checked OP-TEE configuration,
looks like some platforms allow up to 16 threads. This yields 6 * 16 = 96
pages in total. If this is acceptable I'd set TEE_SLACK to 96.

>
> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> index 3964a8a5cd..a3105f1a9a 100644
> --- a/xen/arch/arm/tee/tee.c
> +++ b/xen/arch/arm/tee/tee.c
> @@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
>      return cur_mediator->ops->handle_call(regs);
>  }
>  
> +#define TEE_SLACK (10)
>  int tee_domain_init(struct domain *d, uint16_t tee_type)
>  {
> +    int ret;
> +
>      if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
>          return 0;
>  
> @@ -49,7 +52,15 @@ int tee_domain_init(struct domain *d, uint16_t tee_type)
>      if ( cur_mediator->tee_type != tee_type )
>          return -EINVAL;
>  
> -    return cur_mediator->ops->domain_init(d);
> +    ret = cur_mediator->ops->domain_init(d);
> +    if ( ret < 0 )
> +        return ret;
> +
> +    /* 
> +     * Increase maxmem for domains with TEE, the extra pages are used by
> +     * the mediator
> +     */
> +    d->max_pages += TEE_SLACK;
>  }
>  
>  int tee_relinquish_resources(struct domain *d)
Stefano Stabellini Oct. 7, 2021, 11:40 p.m. UTC | #10
On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
> > On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
> >> Hi Stefano,
> >> 
> >> Stefano Stabellini <sstabellini@kernel.org> writes:
> >> 
> >> > On Wed, 6 Oct 2021, Oleksandr wrote:
> >> >> Hello all
> >> >> 
> >> >> Gentle reminder.
> >> >  
> >> > Many thanks for the ping, this patch fell off my radar.
> >> >
> >> >
> >> >  
> >> >> On 23.09.21 23:57, Volodymyr Babchuk wrote:
> >> >> > Hi Stefano,
> >> >> > 
> >> >> > Stefano Stabellini <sstabellini@kernel.org> writes:
> >> >> > 
> >> >> > > On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
> >> >> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >> >> > > > 
> >> >> > > > Allocate anonymous domheap pages as there is no strict need to
> >> >> > > > account them to a particular domain.
> >> >> > > > 
> >> >> > > > Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
> >> >> > > > domU and dom0 can allocate" the dom0 cannot allocate memory outside
> >> >> > > > of the pre-allocated region. This means if we try to allocate
> >> >> > > > non-anonymous page to be accounted to dom0 we will get an
> >> >> > > > over-allocation issue when assigning that page to the domain.
> >> >> > > > The anonymous page, in turn, is not assigned to any domain.
> >> >> > > > 
> >> >> > > > CC: Julien Grall <jgrall@amazon.com>
> >> >> > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >> >> > > > Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> >> >> > > Only one question, which is more architectural: given that these pages
> >> >> > > are "unlimited", could the guest exploit the interface somehow to force
> >> >> > > Xen to allocate an very high number of anonymous pages?
> >> >> > > 
> >> >> > > E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
> >> >> > > to exaust all memory pages?
> >> >> > Generally, OP-TEE mediator tracks all resources allocated and imposes
> >> >> > limits on them.
> >> >> > 
> >> >> > OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is issued
> >> >> > not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece of
> >> >> > system we allow it to request as many buffers as it wants. Also, we know
> >> >> > that OP-TEE asks only for one such buffer per every standard call. And
> >> >> > number of simultaneous calls is limited by number of OP-TEE threads,
> >> >> > which is quite low: typically only two.
> >> >
> >> > So let me repeat it differently to see if I understood correctly:
> >> >
> >> > - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
> >> > - OPTEE is trusted and only call it twice anyway
> >> 
> >> Correct.
> >> 
> >> > I am OK with this argument, but do we have a check to make sure a domU
> >> > cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
> >> 
> >> domU can't issue any RPC, because all RPCs are issued from OP-TEE
> >> side. This is the nature of RPC - OP-TEE requests Normal World for some
> >> service. But of course, Normal World can perform certain actions that
> >> will make OP-TEE to issue a RPC. I discuss this in depth below.
> >> 
> >> >
> >> > Looking at the patch, there are other two places, in addition to
> >> > OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
> >> > allocated:
> >> >
> >> > 1) copy_std_request
> >> > 2) translate_noncontig
> >> >
> >> > We need to prove that neither 1) or 2) can result in a domU exausting
> >> > Xen memory.
> >> >
> >> > In the case of 1), it looks like the memory is freed before returning to
> >> > the DomU, right? If so, it should be no problem?
> >> 
> >> Yes, mediator makes shadow copy of every request buffer to hide
> >> translated addresses from the guest. Number of requests is limited by
> >> number of OP-TEE threads.
> >> 
> >> > In the case of 2), it looks like the memory could outlive the call where
> >> > it is allocated. Is there any kind of protection against issuing
> >> > something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
> >> > itself that would refuse the attempt? Thus, the idea is that
> >> > do_call_with_arg will return error and we'll just free the memory there?
> >> 
> >> Well, translate_noncontig() calls allocate_optee_shm_buf() which counts
> >> all allocated buffers. So you can't call it more than
> >> MAX_SHM_BUFFER_COUNT times, without de-allocating previous memory. But,
> >> thanks to your question, I have found a bug there: memory is not freed
> >> if allocate_optee_shm_buf() fails. I'll prepare patch later today.
> >> 
> >> > I cannot see a check for errors returned by do_call_with_arg and memory
> >> > freeing done because of that. Sorry I am not super familiar with the
> >> > code, I am just trying to make sure we are not offering to DomUs an easy
> >> > way to crash the system.
> >> 
> >> I tried to eliminate all possibilities for a guest to crash the
> >> system. Of course, this does not mean that there are none of them.
> >> 
> >> And yes, code is a bit hard to understand, because calls to OP-TEE are
> >> stateful and you need to account for that state. From NW and SW this
> >> looks quite fine, because state is handled naturally. But mediator sits
> >> in a middle, so it's implementation is a bit messy.
> >> 
> >> I'll try to explain what is going on, so you it will be easier to
> >> understand logic in the mediator.
> >> 
> >> There are two types of OP-TEE calls: fast calls and standard calls. Fast
> >> call is simple: call SMC and get result. It does not allocate thread
> >> context in OP-TEE and is non-preemptive. So yes, it should be fast. It
> >> is used for simple things like "get OP-TEE version" or "exchange
> >> capabilities". It is easy to handle them in mediator: just forward
> >> the call, check result, return it back to a guest.
> >> 
> >> Standard calls are stateful. OP-TEE allocates thread for each call. This
> >> call can be preempted either by IRQ or by RPC. For consistency IRQ
> >> return is also considered as special type of RPC. So, in general one
> >> standard call can consist of series of SMCs:
> >> 
> >> --> SMC with request
> >> <-- RPC return (like IRQ)
> >> --> SMC "resume call"
> >> <-- RPC return (like "read disk")
> >> --> SMC "resume call"
> >> <-- RPC return (like "send network packet")
> >> --> SMC "resume call"
> >> ...
> >> <-- Final return
> >> 
> >> There are many types of RPCs: "handle IRQ", additional shared buffer
> >> allocation/de-allocation, RPMB access, disks access, network access,
> >> synchronization primitives (when OP-TEE thread is gets blocked on a
> >> mutex), etc.
> >> 
> >> Two more things that makes all this worse: Normal World can register
> >> shared buffer with OP-TEE. Such buffer can live indefinitely
> >> long. Also, Normal World decides when to resume call. For example,
> >> calling process can be preempted and then resumed seconds
> >> later. Misbehaving guest can decide to not resume call at all.
> >> 
> >> As I said, I tried to take all this things into account. There are
> >> basically 3 types of objects that can lead to memory allocation on Xen
> >> side:
> >> 
> >> 1. Standard call context. Besides memory space for struct optee_std_call
> >> itself it allocates page for a shadow buffer, where argument addresses
> >> are translated by Xen. Number of this objects is limited by number of
> >> OP-TEE threads:
> >> 
> >>     count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
> >>     if ( count == max_optee_threads )
> >>         return ERR_PTR(-ENOSPC);
> >> 
> >> 2. Shared buffer. This is a buffer shared by guest with OP-TEE. It can
> >> be either temporary buffer which is shared for one standard call
> >> duration, or registered shared buffer, which is remains active until it
> >> is de-registered. This is where translate_noncontig() comes into play.
> >> Number of this buffers is limited in allocate_optee_shm_buf():
> >> 
> >>     count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
> >>                               MAX_SHM_BUFFER_COUNT);
> >>     if ( count == MAX_SHM_BUFFER_COUNT )
> >>         return ERR_PTR(-ENOMEM);
> >> 
> >> 3. Shared RPC buffer. This is very special kind of buffer. Basically,
> >> OP-TEE needs some shared memory to provide RPC call parameters. So it
> >> requests buffer from Normal World. There is no hard limit on this from
> >> mediator side, because, as I told earlier, OP-TEE itself limits number
> >> of this buffers. There is no cases when more that one buffer will be
> >> allocated per OP-TEE thread. This type of buffer is used only to process
> >> RPC requests themselves. OP-TEE can request more buffers via RPC, but
> >> they will fall to p.2: NW uses separate request to register buffer and
> >> then returns its handle in the preempted call.
> >> 
> >> 
> >> Apart from those two limits, there is a limit on total number of pages
> >> which is shared between guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
> >> limit is for a case when guest tries to allocate few really BIG buffers.
> >> 
> >> 
> >> > It looks like they could be called from one of the OPTEE operations that
> >> > a domU could request? Is there a limit for them?
> >> 
> >> Yes, there are limits, as I described above.
> >> 
> >> Also, bear in mind that resources available to OP-TEE are also quite
> >> limited. So, in case of some breach in mediator, OP-TEE will give up
> >> first. This of course is not an excuse to have bugs in the mediator...
> >
> > OK, thanks for the explanation. The reasons for my questions is that if
> > the allocations are using the memory of DomU, then at worst DomU can run
> > out of memory. But if the allocations are using anonymous memory, then
> > the whole platform might run out of memory. We have issued XSAs for
> > things like that in the past.
> >
> > This is why I am worried about this patch: if we apply it we really
> > become reliant on these limits being implemented correctly. A bug can
> > have much more severe consequences.
> 
> Agree.
> 
> > As you are the maintainer for this code, and this code is not security
> > supported, I'll leave it up to you (also see the other email about
> > moving optee to "supported, not security supported").
> 
> Yes, I've seen this email. Just didn't had time to write followup.
> 
> > However, maybe a different solution would be to increase max_pages for a
> > domain when optee is enabled? Maybe just by a few pages (as many as
> > needed by the optee mediator)? Because if we did that, we wouldn't risk
> > exposing DOS attack vectors for every bug in the mediator limits checks.
> >
> > The below adds a 10 pages slack.
> 
> Well, I didn't know that such option is available. If this is a valid
> approach and there are no objections from other maintainers, I'd rather
> use it.

I think it is a valid approach, and it is "more secure" than the other
patch. I suggest that you send a patch for it so that if people can
voice their objections if any.


> Only one comment there is about number of pages. Maximal number of
> domheap pages used per request is 6: one for request itself, one for RPC
> buffer, 4 at most for request arguments. I checked OP-TEE configuration,
> looks like some platforms allow up to 16 threads. This yields 6 * 16 = 96
> pages in total. If this is acceptable I'd set TEE_SLACK to 96.

OK. Please add a good in-code comment explaining how you got to 96.


> > diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> > index 3964a8a5cd..a3105f1a9a 100644
> > --- a/xen/arch/arm/tee/tee.c
> > +++ b/xen/arch/arm/tee/tee.c
> > @@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
> >      return cur_mediator->ops->handle_call(regs);
> >  }
> >  
> > +#define TEE_SLACK (10)
> >  int tee_domain_init(struct domain *d, uint16_t tee_type)
> >  {
> > +    int ret;
> > +
> >      if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
> >          return 0;
> >  
> > @@ -49,7 +52,15 @@ int tee_domain_init(struct domain *d, uint16_t tee_type)
> >      if ( cur_mediator->tee_type != tee_type )
> >          return -EINVAL;
> >  
> > -    return cur_mediator->ops->domain_init(d);
> > +    ret = cur_mediator->ops->domain_init(d);
> > +    if ( ret < 0 )
> > +        return ret;
> > +
> > +    /* 
> > +     * Increase maxmem for domains with TEE, the extra pages are used by
> > +     * the mediator
> > +     */
> > +    d->max_pages += TEE_SLACK;
> >  }
> >  
> >  int tee_relinquish_resources(struct domain *d)
Julien Grall Oct. 8, 2021, 7:30 a.m. UTC | #11
Hi Stefano,

On 07/10/2021 23:14, Stefano Stabellini wrote:
> On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
>> Hi Stefano,
>>
>> Stefano Stabellini <sstabellini@kernel.org> writes:
>>
>>> On Wed, 6 Oct 2021, Oleksandr wrote:
>>>> Hello all
>>>>
>>>> Gentle reminder.
>>>   
>>> Many thanks for the ping, this patch fell off my radar.
>>>
>>>
>>>   
>>>> On 23.09.21 23:57, Volodymyr Babchuk wrote:
>>>>> Hi Stefano,
>>>>>
>>>>> Stefano Stabellini <sstabellini@kernel.org> writes:
>>>>>
>>>>>> On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>
>>>>>>> Allocate anonymous domheap pages as there is no strict need to
>>>>>>> account them to a particular domain.
>>>>>>>
>>>>>>> Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
>>>>>>> domU and dom0 can allocate" the dom0 cannot allocate memory outside
>>>>>>> of the pre-allocated region. This means if we try to allocate
>>>>>>> non-anonymous page to be accounted to dom0 we will get an
>>>>>>> over-allocation issue when assigning that page to the domain.
>>>>>>> The anonymous page, in turn, is not assigned to any domain.
>>>>>>>
>>>>>>> CC: Julien Grall <jgrall@amazon.com>
>>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>> Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>>>>> Only one question, which is more architectural: given that these pages
>>>>>> are "unlimited", could the guest exploit the interface somehow to force
>>>>>> Xen to allocate an very high number of anonymous pages?
>>>>>>
>>>>>> E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to force Xen
>>>>>> to exaust all memory pages?
>>>>> Generally, OP-TEE mediator tracks all resources allocated and imposes
>>>>> limits on them.
>>>>>
>>>>> OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is issued
>>>>> not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece of
>>>>> system we allow it to request as many buffers as it wants. Also, we know
>>>>> that OP-TEE asks only for one such buffer per every standard call. And
>>>>> number of simultaneous calls is limited by number of OP-TEE threads,
>>>>> which is quite low: typically only two.
>>>
>>> So let me repeat it differently to see if I understood correctly:
>>>
>>> - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
>>> - OPTEE is trusted and only call it twice anyway
>>
>> Correct.
>>
>>> I am OK with this argument, but do we have a check to make sure a domU
>>> cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
>>
>> domU can't issue any RPC, because all RPCs are issued from OP-TEE
>> side. This is the nature of RPC - OP-TEE requests Normal World for some
>> service. But of course, Normal World can perform certain actions that
>> will make OP-TEE to issue a RPC. I discuss this in depth below.
>>
>>>
>>> Looking at the patch, there are other two places, in addition to
>>> OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
>>> allocated:
>>>
>>> 1) copy_std_request
>>> 2) translate_noncontig
>>>
>>> We need to prove that neither 1) or 2) can result in a domU exausting
>>> Xen memory.
>>>
>>> In the case of 1), it looks like the memory is freed before returning to
>>> the DomU, right? If so, it should be no problem?
>>
>> Yes, mediator makes shadow copy of every request buffer to hide
>> translated addresses from the guest. Number of requests is limited by
>> number of OP-TEE threads.
>>
>>> In the case of 2), it looks like the memory could outlive the call where
>>> it is allocated. Is there any kind of protection against issuing
>>> something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
>>> itself that would refuse the attempt? Thus, the idea is that
>>> do_call_with_arg will return error and we'll just free the memory there?
>>
>> Well, translate_noncontig() calls allocate_optee_shm_buf() which counts
>> all allocated buffers. So you can't call it more than
>> MAX_SHM_BUFFER_COUNT times, without de-allocating previous memory. But,
>> thanks to your question, I have found a bug there: memory is not freed
>> if allocate_optee_shm_buf() fails. I'll prepare patch later today.
>>
>>> I cannot see a check for errors returned by do_call_with_arg and memory
>>> freeing done because of that. Sorry I am not super familiar with the
>>> code, I am just trying to make sure we are not offering to DomUs an easy
>>> way to crash the system.
>>
>> I tried to eliminate all possibilities for a guest to crash the
>> system. Of course, this does not mean that there are none of them.
>>
>> And yes, code is a bit hard to understand, because calls to OP-TEE are
>> stateful and you need to account for that state. From NW and SW this
>> looks quite fine, because state is handled naturally. But mediator sits
>> in a middle, so it's implementation is a bit messy.
>>
>> I'll try to explain what is going on, so you it will be easier to
>> understand logic in the mediator.
>>
>> There are two types of OP-TEE calls: fast calls and standard calls. Fast
>> call is simple: call SMC and get result. It does not allocate thread
>> context in OP-TEE and is non-preemptive. So yes, it should be fast. It
>> is used for simple things like "get OP-TEE version" or "exchange
>> capabilities". It is easy to handle them in mediator: just forward
>> the call, check result, return it back to a guest.
>>
>> Standard calls are stateful. OP-TEE allocates thread for each call. This
>> call can be preempted either by IRQ or by RPC. For consistency IRQ
>> return is also considered as special type of RPC. So, in general one
>> standard call can consist of series of SMCs:
>>
>> --> SMC with request
>> <-- RPC return (like IRQ)
>> --> SMC "resume call"
>> <-- RPC return (like "read disk")
>> --> SMC "resume call"
>> <-- RPC return (like "send network packet")
>> --> SMC "resume call"
>> ...
>> <-- Final return
>>
>> There are many types of RPCs: "handle IRQ", additional shared buffer
>> allocation/de-allocation, RPMB access, disks access, network access,
>> synchronization primitives (when OP-TEE thread is gets blocked on a
>> mutex), etc.
>>
>> Two more things that makes all this worse: Normal World can register
>> shared buffer with OP-TEE. Such buffer can live indefinitely
>> long. Also, Normal World decides when to resume call. For example,
>> calling process can be preempted and then resumed seconds
>> later. Misbehaving guest can decide to not resume call at all.
>>
>> As I said, I tried to take all this things into account. There are
>> basically 3 types of objects that can lead to memory allocation on Xen
>> side:
>>
>> 1. Standard call context. Besides memory space for struct optee_std_call
>> itself it allocates page for a shadow buffer, where argument addresses
>> are translated by Xen. Number of this objects is limited by number of
>> OP-TEE threads:
>>
>>      count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
>>      if ( count == max_optee_threads )
>>          return ERR_PTR(-ENOSPC);
>>
>> 2. Shared buffer. This is a buffer shared by guest with OP-TEE. It can
>> be either temporary buffer which is shared for one standard call
>> duration, or registered shared buffer, which is remains active until it
>> is de-registered. This is where translate_noncontig() comes into play.
>> Number of this buffers is limited in allocate_optee_shm_buf():
>>
>>      count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
>>                                MAX_SHM_BUFFER_COUNT);
>>      if ( count == MAX_SHM_BUFFER_COUNT )
>>          return ERR_PTR(-ENOMEM);
>>
>> 3. Shared RPC buffer. This is very special kind of buffer. Basically,
>> OP-TEE needs some shared memory to provide RPC call parameters. So it
>> requests buffer from Normal World. There is no hard limit on this from
>> mediator side, because, as I told earlier, OP-TEE itself limits number
>> of this buffers. There is no cases when more that one buffer will be
>> allocated per OP-TEE thread. This type of buffer is used only to process
>> RPC requests themselves. OP-TEE can request more buffers via RPC, but
>> they will fall to p.2: NW uses separate request to register buffer and
>> then returns its handle in the preempted call.
>>
>>
>> Apart from those two limits, there is a limit on total number of pages
>> which is shared between guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
>> limit is for a case when guest tries to allocate few really BIG buffers.
>>
>>
>>> It looks like they could be called from one of the OPTEE operations that
>>> a domU could request? Is there a limit for them?
>>
>> Yes, there are limits, as I described above.
>>
>> Also, bear in mind that resources available to OP-TEE are also quite
>> limited. So, in case of some breach in mediator, OP-TEE will give up
>> first. This of course is not an excuse to have bugs in the mediator...
> 
> OK, thanks for the explanation. The reasons for my questions is that if
> the allocations are using the memory of DomU, then at worst DomU can run
> out of memory. 

Not really. The worst outcome is still a DoS of the host because we 
don't pre-allocate memory or even check that the total allocation will 
not exhaust the memory.

The only difference is I would argue this would be a misconfiguration of 
the system.

> But if the allocations are using anonymous memory, then
> the whole platform might run out of memory. We have issued XSAs for
> things like that in the past.
> 
> This is why I am worried about this patch: if we apply it we really
> become reliant on these limits being implemented correctly. A bug can
> have much more severe consequences.

This is not a problem specific to OP-TEE. Any anymous allocation 
(xmalloc,...) done in Xen on behalf of the guest has, in theory, the 
same problem (see more below).

> 
> As you are the maintainer for this code, and this code is not security
> supported, I'll leave it up to you (also see the other email about
> moving optee to "supported, not security supported").
> 
> However, maybe a different solution would be to increase max_pages for a
> domain when optee is enabled? Maybe just by a few pages (as many as
> needed by the optee mediator)? Because if we did that, we wouldn't risk
> exposing DOS attack vectors for every bug in the mediator limits checks.
I think we need to differentiate two sorts of allocation:
   1) Memory used by Xen on behalf of the guest
   2) Memory used by the guest itself

d->max_pages is only meant to refer to the latter (in fact, a guest can 
balloon memory up to d->max_pages). In this case, we are discussing 
about the latter and therefore I think the should be accounted 
differently as the memory is not exposed to the guest.

Today, Xen doesn't have this facility. I know this has been discussed a 
few times in the past, but AFAIK, a patch series never materialized for it.

However, to me, this sounds more an hardening work for the whole Xen 
rather than OP-TEE itself. So I think the patch provided by Oleksandr is 
probably the best way to go for this release.

>
> The below adds a 10 pages slack.
> 
> 
> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> index 3964a8a5cd..a3105f1a9a 100644
> --- a/xen/arch/arm/tee/tee.c
> +++ b/xen/arch/arm/tee/tee.c
> @@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
>       return cur_mediator->ops->handle_call(regs);
>   }
>   
> +#define TEE_SLACK (10)
>   int tee_domain_init(struct domain *d, uint16_t tee_type)
>   {
> +    int ret;
> +
>       if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
>           return 0;
>   
> @@ -49,7 +52,15 @@ int tee_domain_init(struct domain *d, uint16_t tee_type)
>       if ( cur_mediator->tee_type != tee_type )
>           return -EINVAL;
>   
> -    return cur_mediator->ops->domain_init(d);
> +    ret = cur_mediator->ops->domain_init(d);
> +    if ( ret < 0 )
> +        return ret;
> +
> +    /*
> +     * Increase maxmem for domains with TEE, the extra pages are used by
> +     * the mediator
> +     */
> +    d->max_pages += TEE_SLACK;

Regardless what I wrote above, this change would be incorrect because 
TEE is initialized the when domain is created. However, d->max_pages is 
set afterwards via DOMCTL_max_mem, so the value will get overridden.

However, I don't think OP-TEE code should modify d->max_pages. Instead, 
this should be accounted by the toolstack (or domain_build for dom0/domU 
created by Xen).

Cheers,
Stefano Stabellini Oct. 8, 2021, 9:49 p.m. UTC | #12
On Fri, 8 Oct 2021, Julien Grall wrote:
> On 07/10/2021 23:14, Stefano Stabellini wrote:
> > On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
> > > Hi Stefano,
> > > 
> > > Stefano Stabellini <sstabellini@kernel.org> writes:
> > > 
> > > > On Wed, 6 Oct 2021, Oleksandr wrote:
> > > > > Hello all
> > > > > 
> > > > > Gentle reminder.
> > > >   Many thanks for the ping, this patch fell off my radar.
> > > > 
> > > > 
> > > >   
> > > > > On 23.09.21 23:57, Volodymyr Babchuk wrote:
> > > > > > Hi Stefano,
> > > > > > 
> > > > > > Stefano Stabellini <sstabellini@kernel.org> writes:
> > > > > > 
> > > > > > > On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
> > > > > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > > > > 
> > > > > > > > Allocate anonymous domheap pages as there is no strict need to
> > > > > > > > account them to a particular domain.
> > > > > > > > 
> > > > > > > > Since XSA-383 "xen/arm: Restrict the amount of memory that
> > > > > > > > dom0less
> > > > > > > > domU and dom0 can allocate" the dom0 cannot allocate memory
> > > > > > > > outside
> > > > > > > > of the pre-allocated region. This means if we try to allocate
> > > > > > > > non-anonymous page to be accounted to dom0 we will get an
> > > > > > > > over-allocation issue when assigning that page to the domain.
> > > > > > > > The anonymous page, in turn, is not assigned to any domain.
> > > > > > > > 
> > > > > > > > CC: Julien Grall <jgrall@amazon.com>
> > > > > > > > Signed-off-by: Oleksandr Tyshchenko
> > > > > > > > <oleksandr_tyshchenko@epam.com>
> > > > > > > > Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > > > > > > Only one question, which is more architectural: given that these
> > > > > > > pages
> > > > > > > are "unlimited", could the guest exploit the interface somehow to
> > > > > > > force
> > > > > > > Xen to allocate an very high number of anonymous pages?
> > > > > > > 
> > > > > > > E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to
> > > > > > > force Xen
> > > > > > > to exaust all memory pages?
> > > > > > Generally, OP-TEE mediator tracks all resources allocated and
> > > > > > imposes
> > > > > > limits on them.
> > > > > > 
> > > > > > OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is
> > > > > > issued
> > > > > > not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece
> > > > > > of
> > > > > > system we allow it to request as many buffers as it wants. Also, we
> > > > > > know
> > > > > > that OP-TEE asks only for one such buffer per every standard call.
> > > > > > And
> > > > > > number of simultaneous calls is limited by number of OP-TEE threads,
> > > > > > which is quite low: typically only two.
> > > > 
> > > > So let me repeat it differently to see if I understood correctly:
> > > > 
> > > > - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
> > > > - OPTEE is trusted and only call it twice anyway
> > > 
> > > Correct.
> > > 
> > > > I am OK with this argument, but do we have a check to make sure a domU
> > > > cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
> > > 
> > > domU can't issue any RPC, because all RPCs are issued from OP-TEE
> > > side. This is the nature of RPC - OP-TEE requests Normal World for some
> > > service. But of course, Normal World can perform certain actions that
> > > will make OP-TEE to issue a RPC. I discuss this in depth below.
> > > 
> > > > 
> > > > Looking at the patch, there are other two places, in addition to
> > > > OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
> > > > allocated:
> > > > 
> > > > 1) copy_std_request
> > > > 2) translate_noncontig
> > > > 
> > > > We need to prove that neither 1) or 2) can result in a domU exausting
> > > > Xen memory.
> > > > 
> > > > In the case of 1), it looks like the memory is freed before returning to
> > > > the DomU, right? If so, it should be no problem?
> > > 
> > > Yes, mediator makes shadow copy of every request buffer to hide
> > > translated addresses from the guest. Number of requests is limited by
> > > number of OP-TEE threads.
> > > 
> > > > In the case of 2), it looks like the memory could outlive the call where
> > > > it is allocated. Is there any kind of protection against issuing
> > > > something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
> > > > itself that would refuse the attempt? Thus, the idea is that
> > > > do_call_with_arg will return error and we'll just free the memory there?
> > > 
> > > Well, translate_noncontig() calls allocate_optee_shm_buf() which counts
> > > all allocated buffers. So you can't call it more than
> > > MAX_SHM_BUFFER_COUNT times, without de-allocating previous memory. But,
> > > thanks to your question, I have found a bug there: memory is not freed
> > > if allocate_optee_shm_buf() fails. I'll prepare patch later today.
> > > 
> > > > I cannot see a check for errors returned by do_call_with_arg and memory
> > > > freeing done because of that. Sorry I am not super familiar with the
> > > > code, I am just trying to make sure we are not offering to DomUs an easy
> > > > way to crash the system.
> > > 
> > > I tried to eliminate all possibilities for a guest to crash the
> > > system. Of course, this does not mean that there are none of them.
> > > 
> > > And yes, code is a bit hard to understand, because calls to OP-TEE are
> > > stateful and you need to account for that state. From NW and SW this
> > > looks quite fine, because state is handled naturally. But mediator sits
> > > in a middle, so it's implementation is a bit messy.
> > > 
> > > I'll try to explain what is going on, so you it will be easier to
> > > understand logic in the mediator.
> > > 
> > > There are two types of OP-TEE calls: fast calls and standard calls. Fast
> > > call is simple: call SMC and get result. It does not allocate thread
> > > context in OP-TEE and is non-preemptive. So yes, it should be fast. It
> > > is used for simple things like "get OP-TEE version" or "exchange
> > > capabilities". It is easy to handle them in mediator: just forward
> > > the call, check result, return it back to a guest.
> > > 
> > > Standard calls are stateful. OP-TEE allocates thread for each call. This
> > > call can be preempted either by IRQ or by RPC. For consistency IRQ
> > > return is also considered as special type of RPC. So, in general one
> > > standard call can consist of series of SMCs:
> > > 
> > > --> SMC with request
> > > <-- RPC return (like IRQ)
> > > --> SMC "resume call"
> > > <-- RPC return (like "read disk")
> > > --> SMC "resume call"
> > > <-- RPC return (like "send network packet")
> > > --> SMC "resume call"
> > > ...
> > > <-- Final return
> > > 
> > > There are many types of RPCs: "handle IRQ", additional shared buffer
> > > allocation/de-allocation, RPMB access, disks access, network access,
> > > synchronization primitives (when OP-TEE thread is gets blocked on a
> > > mutex), etc.
> > > 
> > > Two more things that makes all this worse: Normal World can register
> > > shared buffer with OP-TEE. Such buffer can live indefinitely
> > > long. Also, Normal World decides when to resume call. For example,
> > > calling process can be preempted and then resumed seconds
> > > later. Misbehaving guest can decide to not resume call at all.
> > > 
> > > As I said, I tried to take all this things into account. There are
> > > basically 3 types of objects that can lead to memory allocation on Xen
> > > side:
> > > 
> > > 1. Standard call context. Besides memory space for struct optee_std_call
> > > itself it allocates page for a shadow buffer, where argument addresses
> > > are translated by Xen. Number of this objects is limited by number of
> > > OP-TEE threads:
> > > 
> > >      count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
> > >      if ( count == max_optee_threads )
> > >          return ERR_PTR(-ENOSPC);
> > > 
> > > 2. Shared buffer. This is a buffer shared by guest with OP-TEE. It can
> > > be either temporary buffer which is shared for one standard call
> > > duration, or registered shared buffer, which is remains active until it
> > > is de-registered. This is where translate_noncontig() comes into play.
> > > Number of this buffers is limited in allocate_optee_shm_buf():
> > > 
> > >      count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
> > >                                MAX_SHM_BUFFER_COUNT);
> > >      if ( count == MAX_SHM_BUFFER_COUNT )
> > >          return ERR_PTR(-ENOMEM);
> > > 
> > > 3. Shared RPC buffer. This is very special kind of buffer. Basically,
> > > OP-TEE needs some shared memory to provide RPC call parameters. So it
> > > requests buffer from Normal World. There is no hard limit on this from
> > > mediator side, because, as I told earlier, OP-TEE itself limits number
> > > of this buffers. There is no cases when more that one buffer will be
> > > allocated per OP-TEE thread. This type of buffer is used only to process
> > > RPC requests themselves. OP-TEE can request more buffers via RPC, but
> > > they will fall to p.2: NW uses separate request to register buffer and
> > > then returns its handle in the preempted call.
> > > 
> > > 
> > > Apart from those two limits, there is a limit on total number of pages
> > > which is shared between guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
> > > limit is for a case when guest tries to allocate few really BIG buffers.
> > > 
> > > 
> > > > It looks like they could be called from one of the OPTEE operations that
> > > > a domU could request? Is there a limit for them?
> > > 
> > > Yes, there are limits, as I described above.
> > > 
> > > Also, bear in mind that resources available to OP-TEE are also quite
> > > limited. So, in case of some breach in mediator, OP-TEE will give up
> > > first. This of course is not an excuse to have bugs in the mediator...
> > 
> > OK, thanks for the explanation. The reasons for my questions is that if
> > the allocations are using the memory of DomU, then at worst DomU can run
> > out of memory. 
> 
> Not really. The worst outcome is still a DoS of the host because we don't
> pre-allocate memory or even check that the total allocation will not exhaust
> the memory.
> 
> The only difference is I would argue this would be a misconfiguration of the
> system.
> 
> > But if the allocations are using anonymous memory, then
> > the whole platform might run out of memory. We have issued XSAs for
> > things like that in the past.
> > 
> > This is why I am worried about this patch: if we apply it we really
> > become reliant on these limits being implemented correctly. A bug can
> > have much more severe consequences.
> 
> This is not a problem specific to OP-TEE. Any anymous allocation (xmalloc,...)
> done in Xen on behalf of the guest has, in theory, the same problem (see more
> below).
> 
> > 
> > As you are the maintainer for this code, and this code is not security
> > supported, I'll leave it up to you (also see the other email about
> > moving optee to "supported, not security supported").
> > 
> > However, maybe a different solution would be to increase max_pages for a
> > domain when optee is enabled? Maybe just by a few pages (as many as
> > needed by the optee mediator)? Because if we did that, we wouldn't risk
> > exposing DOS attack vectors for every bug in the mediator limits checks.
> I think we need to differentiate two sorts of allocation:
>   1) Memory used by Xen on behalf of the guest
>   2) Memory used by the guest itself
> 
> d->max_pages is only meant to refer to the latter (in fact, a guest can
> balloon memory up to d->max_pages). In this case, we are discussing about the
> latter and therefore I think the should be accounted differently as the memory
> is not exposed to the guest.

Yeah, I was thinking the same thing yesterday but I didn't suggest it
because we don't have a way to specify it today. I thought that rather
than using anonymous memory it would still be preferable from a security
standpoint to use d->max_pages, although I was aware of issues such as
the guest able to balloon up to d->max_pages.

BTW do you know on top of your head of other things that can cause
anonymous xmalloc (that lives longer than a single trap or hypercall) in
Xen on ARM today? I'll try to keep an eye on them.


> Today, Xen doesn't have this facility. I know this has been discussed a few
> times in the past, but AFAIK, a patch series never materialized for it.
>
>
> However, to me, this sounds more an hardening work for the whole Xen rather
> than OP-TEE itself. So I think the patch provided by Oleksandr is probably the
> best way to go for this release.

I am OK with that especially as OP-TEE is not security supported (I
recently suggested to upgrade its support status to "supported, not
security supported".)

I wrote a summary of the options at the boottom of the email.


> > The below adds a 10 pages slack.
> > 
> > 
> > diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> > index 3964a8a5cd..a3105f1a9a 100644
> > --- a/xen/arch/arm/tee/tee.c
> > +++ b/xen/arch/arm/tee/tee.c
> > @@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
> >       return cur_mediator->ops->handle_call(regs);
> >   }
> >   +#define TEE_SLACK (10)
> >   int tee_domain_init(struct domain *d, uint16_t tee_type)
> >   {
> > +    int ret;
> > +
> >       if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
> >           return 0;
> >   @@ -49,7 +52,15 @@ int tee_domain_init(struct domain *d, uint16_t
> > tee_type)
> >       if ( cur_mediator->tee_type != tee_type )
> >           return -EINVAL;
> >   -    return cur_mediator->ops->domain_init(d);
> > +    ret = cur_mediator->ops->domain_init(d);
> > +    if ( ret < 0 )
> > +        return ret;
> > +
> > +    /*
> > +     * Increase maxmem for domains with TEE, the extra pages are used by
> > +     * the mediator
> > +     */
> > +    d->max_pages += TEE_SLACK;
> 
> Regardless what I wrote above, this change would be incorrect because TEE is
> initialized the when domain is created. However, d->max_pages is set
> afterwards via DOMCTL_max_mem, so the value will get overridden.
> 
> However, I don't think OP-TEE code should modify d->max_pages. Instead, this
> should be accounted by the toolstack (or domain_build for dom0/domU created by
> Xen).

Good point, and I can see that libxl has already a slack_memkb concept
for PV x86 used to increase the memmap limit.


The best solution would be to introduce a generic framework for
accounting memory that Xen allocates on behalf of the guest. But of
course we don't want to ask Oleksandr to do that now 1 week from the
release and in response to a simple 3 lines patch.

With the best option not being available, we have to pick one of the
following:

1) current patch that uses anonymous memory
2) slack but done right (the toolstack and domain_build apply the slack
   to d->max_pages)

My preference is 2) for security reasons but it is a bit more work.
Taking into account that Julien, Volodymyr, and Bertrand all think that
1) is acceptable as is, then I will not insist. Option 1) is OK.
Julien Grall Oct. 14, 2021, 3:14 p.m. UTC | #13
Hi Stefano,

On 08/10/2021 22:49, Stefano Stabellini wrote:
> On Fri, 8 Oct 2021, Julien Grall wrote:
>> On 07/10/2021 23:14, Stefano Stabellini wrote:
>>> On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
>>>> Hi Stefano,
>>>>
>>>> Stefano Stabellini <sstabellini@kernel.org> writes:
>>>>
>>>>> On Wed, 6 Oct 2021, Oleksandr wrote:
>>>>>> Hello all
>>>>>>
>>>>>> Gentle reminder.
>>>>>    Many thanks for the ping, this patch fell off my radar.
>>>>>
>>>>>
>>>>>    
>>>>>> On 23.09.21 23:57, Volodymyr Babchuk wrote:
>>>>>>> Hi Stefano,
>>>>>>>
>>>>>>> Stefano Stabellini <sstabellini@kernel.org> writes:
>>>>>>>
>>>>>>>> On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
>>>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>>>
>>>>>>>>> Allocate anonymous domheap pages as there is no strict need to
>>>>>>>>> account them to a particular domain.
>>>>>>>>>
>>>>>>>>> Since XSA-383 "xen/arm: Restrict the amount of memory that
>>>>>>>>> dom0less
>>>>>>>>> domU and dom0 can allocate" the dom0 cannot allocate memory
>>>>>>>>> outside
>>>>>>>>> of the pre-allocated region. This means if we try to allocate
>>>>>>>>> non-anonymous page to be accounted to dom0 we will get an
>>>>>>>>> over-allocation issue when assigning that page to the domain.
>>>>>>>>> The anonymous page, in turn, is not assigned to any domain.
>>>>>>>>>
>>>>>>>>> CC: Julien Grall <jgrall@amazon.com>
>>>>>>>>> Signed-off-by: Oleksandr Tyshchenko
>>>>>>>>> <oleksandr_tyshchenko@epam.com>
>>>>>>>>> Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>>>>>>> Only one question, which is more architectural: given that these
>>>>>>>> pages
>>>>>>>> are "unlimited", could the guest exploit the interface somehow to
>>>>>>>> force
>>>>>>>> Xen to allocate an very high number of anonymous pages?
>>>>>>>>
>>>>>>>> E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to
>>>>>>>> force Xen
>>>>>>>> to exaust all memory pages?
>>>>>>> Generally, OP-TEE mediator tracks all resources allocated and
>>>>>>> imposes
>>>>>>> limits on them.
>>>>>>>
>>>>>>> OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it is
>>>>>>> issued
>>>>>>> not by domain, but by OP-TEE itself. As OP-TEE is more trusted piece
>>>>>>> of
>>>>>>> system we allow it to request as many buffers as it wants. Also, we
>>>>>>> know
>>>>>>> that OP-TEE asks only for one such buffer per every standard call.
>>>>>>> And
>>>>>>> number of simultaneous calls is limited by number of OP-TEE threads,
>>>>>>> which is quite low: typically only two.
>>>>>
>>>>> So let me repeat it differently to see if I understood correctly:
>>>>>
>>>>> - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
>>>>> - OPTEE is trusted and only call it twice anyway
>>>>
>>>> Correct.
>>>>
>>>>> I am OK with this argument, but do we have a check to make sure a domU
>>>>> cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
>>>>
>>>> domU can't issue any RPC, because all RPCs are issued from OP-TEE
>>>> side. This is the nature of RPC - OP-TEE requests Normal World for some
>>>> service. But of course, Normal World can perform certain actions that
>>>> will make OP-TEE to issue a RPC. I discuss this in depth below.
>>>>
>>>>>
>>>>> Looking at the patch, there are other two places, in addition to
>>>>> OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
>>>>> allocated:
>>>>>
>>>>> 1) copy_std_request
>>>>> 2) translate_noncontig
>>>>>
>>>>> We need to prove that neither 1) or 2) can result in a domU exausting
>>>>> Xen memory.
>>>>>
>>>>> In the case of 1), it looks like the memory is freed before returning to
>>>>> the DomU, right? If so, it should be no problem?
>>>>
>>>> Yes, mediator makes shadow copy of every request buffer to hide
>>>> translated addresses from the guest. Number of requests is limited by
>>>> number of OP-TEE threads.
>>>>
>>>>> In the case of 2), it looks like the memory could outlive the call where
>>>>> it is allocated. Is there any kind of protection against issuing
>>>>> something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is it OP-TEE
>>>>> itself that would refuse the attempt? Thus, the idea is that
>>>>> do_call_with_arg will return error and we'll just free the memory there?
>>>>
>>>> Well, translate_noncontig() calls allocate_optee_shm_buf() which counts
>>>> all allocated buffers. So you can't call it more than
>>>> MAX_SHM_BUFFER_COUNT times, without de-allocating previous memory. But,
>>>> thanks to your question, I have found a bug there: memory is not freed
>>>> if allocate_optee_shm_buf() fails. I'll prepare patch later today.
>>>>
>>>>> I cannot see a check for errors returned by do_call_with_arg and memory
>>>>> freeing done because of that. Sorry I am not super familiar with the
>>>>> code, I am just trying to make sure we are not offering to DomUs an easy
>>>>> way to crash the system.
>>>>
>>>> I tried to eliminate all possibilities for a guest to crash the
>>>> system. Of course, this does not mean that there are none of them.
>>>>
>>>> And yes, code is a bit hard to understand, because calls to OP-TEE are
>>>> stateful and you need to account for that state. From NW and SW this
>>>> looks quite fine, because state is handled naturally. But mediator sits
>>>> in a middle, so it's implementation is a bit messy.
>>>>
>>>> I'll try to explain what is going on, so you it will be easier to
>>>> understand logic in the mediator.
>>>>
>>>> There are two types of OP-TEE calls: fast calls and standard calls. Fast
>>>> call is simple: call SMC and get result. It does not allocate thread
>>>> context in OP-TEE and is non-preemptive. So yes, it should be fast. It
>>>> is used for simple things like "get OP-TEE version" or "exchange
>>>> capabilities". It is easy to handle them in mediator: just forward
>>>> the call, check result, return it back to a guest.
>>>>
>>>> Standard calls are stateful. OP-TEE allocates thread for each call. This
>>>> call can be preempted either by IRQ or by RPC. For consistency IRQ
>>>> return is also considered as special type of RPC. So, in general one
>>>> standard call can consist of series of SMCs:
>>>>
>>>> --> SMC with request
>>>> <-- RPC return (like IRQ)
>>>> --> SMC "resume call"
>>>> <-- RPC return (like "read disk")
>>>> --> SMC "resume call"
>>>> <-- RPC return (like "send network packet")
>>>> --> SMC "resume call"
>>>> ...
>>>> <-- Final return
>>>>
>>>> There are many types of RPCs: "handle IRQ", additional shared buffer
>>>> allocation/de-allocation, RPMB access, disks access, network access,
>>>> synchronization primitives (when OP-TEE thread is gets blocked on a
>>>> mutex), etc.
>>>>
>>>> Two more things that makes all this worse: Normal World can register
>>>> shared buffer with OP-TEE. Such buffer can live indefinitely
>>>> long. Also, Normal World decides when to resume call. For example,
>>>> calling process can be preempted and then resumed seconds
>>>> later. Misbehaving guest can decide to not resume call at all.
>>>>
>>>> As I said, I tried to take all this things into account. There are
>>>> basically 3 types of objects that can lead to memory allocation on Xen
>>>> side:
>>>>
>>>> 1. Standard call context. Besides memory space for struct optee_std_call
>>>> itself it allocates page for a shadow buffer, where argument addresses
>>>> are translated by Xen. Number of this objects is limited by number of
>>>> OP-TEE threads:
>>>>
>>>>       count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
>>>>       if ( count == max_optee_threads )
>>>>           return ERR_PTR(-ENOSPC);
>>>>
>>>> 2. Shared buffer. This is a buffer shared by guest with OP-TEE. It can
>>>> be either temporary buffer which is shared for one standard call
>>>> duration, or registered shared buffer, which is remains active until it
>>>> is de-registered. This is where translate_noncontig() comes into play.
>>>> Number of this buffers is limited in allocate_optee_shm_buf():
>>>>
>>>>       count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
>>>>                                 MAX_SHM_BUFFER_COUNT);
>>>>       if ( count == MAX_SHM_BUFFER_COUNT )
>>>>           return ERR_PTR(-ENOMEM);
>>>>
>>>> 3. Shared RPC buffer. This is very special kind of buffer. Basically,
>>>> OP-TEE needs some shared memory to provide RPC call parameters. So it
>>>> requests buffer from Normal World. There is no hard limit on this from
>>>> mediator side, because, as I told earlier, OP-TEE itself limits number
>>>> of this buffers. There is no cases when more that one buffer will be
>>>> allocated per OP-TEE thread. This type of buffer is used only to process
>>>> RPC requests themselves. OP-TEE can request more buffers via RPC, but
>>>> they will fall to p.2: NW uses separate request to register buffer and
>>>> then returns its handle in the preempted call.
>>>>
>>>>
>>>> Apart from those two limits, there is a limit on total number of pages
>>>> which is shared between guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
>>>> limit is for a case when guest tries to allocate few really BIG buffers.
>>>>
>>>>
>>>>> It looks like they could be called from one of the OPTEE operations that
>>>>> a domU could request? Is there a limit for them?
>>>>
>>>> Yes, there are limits, as I described above.
>>>>
>>>> Also, bear in mind that resources available to OP-TEE are also quite
>>>> limited. So, in case of some breach in mediator, OP-TEE will give up
>>>> first. This of course is not an excuse to have bugs in the mediator...
>>>
>>> OK, thanks for the explanation. The reasons for my questions is that if
>>> the allocations are using the memory of DomU, then at worst DomU can run
>>> out of memory.
>>
>> Not really. The worst outcome is still a DoS of the host because we don't
>> pre-allocate memory or even check that the total allocation will not exhaust
>> the memory.
>>
>> The only difference is I would argue this would be a misconfiguration of the
>> system.
>>
>>> But if the allocations are using anonymous memory, then
>>> the whole platform might run out of memory. We have issued XSAs for
>>> things like that in the past.
>>>
>>> This is why I am worried about this patch: if we apply it we really
>>> become reliant on these limits being implemented correctly. A bug can
>>> have much more severe consequences.
>>
>> This is not a problem specific to OP-TEE. Any anymous allocation (xmalloc,...)
>> done in Xen on behalf of the guest has, in theory, the same problem (see more
>> below).
>>
>>>
>>> As you are the maintainer for this code, and this code is not security
>>> supported, I'll leave it up to you (also see the other email about
>>> moving optee to "supported, not security supported").
>>>
>>> However, maybe a different solution would be to increase max_pages for a
>>> domain when optee is enabled? Maybe just by a few pages (as many as
>>> needed by the optee mediator)? Because if we did that, we wouldn't risk
>>> exposing DOS attack vectors for every bug in the mediator limits checks.
>> I think we need to differentiate two sorts of allocation:
>>    1) Memory used by Xen on behalf of the guest
>>    2) Memory used by the guest itself
>>
>> d->max_pages is only meant to refer to the latter (in fact, a guest can
>> balloon memory up to d->max_pages). In this case, we are discussing about the
>> latter and therefore I think the should be accounted differently as the memory
>> is not exposed to the guest.
> 
> Yeah, I was thinking the same thing yesterday but I didn't suggest it
> because we don't have a way to specify it today. I thought that rather
> than using anonymous memory it would still be preferable from a security
> standpoint to use d->max_pages, although I was aware of issues such as
> the guest able to balloon up to d->max_pages.
> 
> BTW do you know on top of your head of other things that can cause
> anonymous xmalloc (that lives longer than a single trap or hypercall) in
> Xen on ARM today? I'll try to keep an eye on them.

The biggest user on Arm is the P2M code today. The ITS code is probably 
going to be another one long term.

There are other example on x86 (shadow pool, IOMMU pool).

>>> The below adds a 10 pages slack.
>>>
>>>
>>> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
>>> index 3964a8a5cd..a3105f1a9a 100644
>>> --- a/xen/arch/arm/tee/tee.c
>>> +++ b/xen/arch/arm/tee/tee.c
>>> @@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
>>>        return cur_mediator->ops->handle_call(regs);
>>>    }
>>>    +#define TEE_SLACK (10)
>>>    int tee_domain_init(struct domain *d, uint16_t tee_type)
>>>    {
>>> +    int ret;
>>> +
>>>        if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
>>>            return 0;
>>>    @@ -49,7 +52,15 @@ int tee_domain_init(struct domain *d, uint16_t
>>> tee_type)
>>>        if ( cur_mediator->tee_type != tee_type )
>>>            return -EINVAL;
>>>    -    return cur_mediator->ops->domain_init(d);
>>> +    ret = cur_mediator->ops->domain_init(d);
>>> +    if ( ret < 0 )
>>> +        return ret;
>>> +
>>> +    /*
>>> +     * Increase maxmem for domains with TEE, the extra pages are used by
>>> +     * the mediator
>>> +     */
>>> +    d->max_pages += TEE_SLACK;
>>
>> Regardless what I wrote above, this change would be incorrect because TEE is
>> initialized the when domain is created. However, d->max_pages is set
>> afterwards via DOMCTL_max_mem, so the value will get overridden.
>>
>> However, I don't think OP-TEE code should modify d->max_pages. Instead, this
>> should be accounted by the toolstack (or domain_build for dom0/domU created by
>> Xen).
> 
> Good point, and I can see that libxl has already a slack_memkb concept
> for PV x86 used to increase the memmap limit.
> 
> 
> The best solution would be to introduce a generic framework for
> accounting memory that Xen allocates on behalf of the guest. But of
> course we don't want to ask Oleksandr to do that now 1 week from the
> release and in response to a simple 3 lines patch. >
> With the best option not being available, we have to pick one of the
> following:
> 
> 1) current patch that uses anonymous memory
> 2) slack but done right (the toolstack and domain_build apply the slack
>     to d->max_pages)
> 
> My preference is 2) for security reasons but it is a bit more work.
> Taking into account that Julien, Volodymyr, and Bertrand all think that
> 1) is acceptable as is, then I will not insist. Option 1) is OK.

I will commit this patch then.

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 3453615..83b4994 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -410,7 +410,7 @@  static struct shm_rpc *allocate_and_pin_shm_rpc(struct optee_domain *ctx,
     if ( !shm_rpc )
         return ERR_PTR(-ENOMEM);
 
-    shm_rpc->xen_arg_pg = alloc_domheap_page(current->domain, 0);
+    shm_rpc->xen_arg_pg = alloc_domheap_page(NULL, 0);
     if ( !shm_rpc->xen_arg_pg )
     {
         xfree(shm_rpc);
@@ -774,7 +774,7 @@  static int translate_noncontig(struct optee_domain *ctx,
      * - There is a plan to implement preemption in the code below, which
      *   will allow use to increase default MAX_SHM_BUFFER_PG value.
      */
-    xen_pgs = alloc_domheap_pages(current->domain, order, 0);
+    xen_pgs = alloc_domheap_pages(NULL, order, 0);
     if ( !xen_pgs )
         return -ENOMEM;
 
@@ -938,7 +938,7 @@  static bool copy_std_request(struct cpu_user_regs *regs,
 
     BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);
 
-    call->xen_arg_pg = alloc_domheap_page(current->domain, 0);
+    call->xen_arg_pg = alloc_domheap_page(NULL, 0);
     if ( !call->xen_arg_pg )
     {
         set_user_reg(regs, 0, OPTEE_SMC_RETURN_ENOMEM);