
optee: immediately free buffers that are released by OP-TEE

Message ID 20200506014246.3397490-1-volodymyr_babchuk@epam.com (mailing list archive)
State New, archived

Commit Message

Volodymyr Babchuk May 6, 2020, 1:44 a.m. UTC
Normal World can share a buffer with OP-TEE for two reasons:
1. Some client application wants to exchange data with a TA
2. OP-TEE asks for a shared buffer for its internal needs

The second case was handled more strictly than necessary:

1. In an RPC request OP-TEE asks for a buffer
2. NW allocates the buffer and provides it via the RPC response
3. Xen pins the pages and translates the data
4. Xen provides the buffer to OP-TEE
5. OP-TEE uses it
6. OP-TEE sends a request to free the buffer
7. NW frees the buffer and sends the RPC response
8. Xen unpins the pages and forgets about the buffer

The problem is that Xen should forget about the buffer between stages 6
and 7, i.e. the right flow should be:

6. OP-TEE sends a request to free the buffer
7. Xen unpins the pages and forgets about the buffer
8. NW frees the buffer and sends the RPC response

This is because OP-TEE internally frees the buffer before sending the
"free SHM buffer" request, so we have no reason to hold a reference to
this buffer anymore. Moreover, on multiprocessor systems NW has time
to reuse the buffer cookie for another buffer; Xen complained about
this and denied the new buffer registration. I have seen this issue
while running tests on an i.MX SoC.

This patch corrects that behavior by freeing the buffer earlier, when
handling the RPC return from OP-TEE.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

Comments

Julien Grall May 11, 2020, 9:34 a.m. UTC | #1
Hi Volodymyr,

On 06/05/2020 02:44, Volodymyr Babchuk wrote:
> Normal World can share buffer with OP-TEE for two reasons:
> 1. Some client application wants to exchange data with TA
> 2. OP-TEE asks for shared buffer for internal needs
> 
> The second case was handle more strictly than necessary:
> 
> 1. In RPC request OP-TEE asks for buffer
> 2. NW allocates buffer and provides it via RPC response
> 3. Xen pins pages and translates data
> 4. Xen provides buffer to OP-TEE
> 5. OP-TEE uses it
> 6. OP-TEE sends request to free the buffer
> 7. NW frees the buffer and sends the RPC response
> 8. Xen unpins pages and forgets about the buffer
> 
> The problem is that Xen should forget about buffer in between stages 6
> and 7. I.e. the right flow should be like this:
> 
> 6. OP-TEE sends request to free the buffer
> 7. Xen unpins pages and forgets about the buffer
> 8. NW frees the buffer and sends the RPC response
> 
> This is because OP-TEE internally frees the buffer before sending the
> "free SHM buffer" request. So we have no reason to hold reference for
> this buffer anymore. Moreover, in multiprocessor systems NW have time
> to reuse buffer cookie for another buffer. Xen complained about this
> and denied the new buffer registration. I have seen this issue while
> running tests on iMX SoC.
> 
> So, this patch basically corrects that behavior by freeing the buffer
> earlier, when handling RPC return from OP-TEE.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
>   1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> index 6a035355db..af19fc31f8 100644
> --- a/xen/arch/arm/tee/optee.c
> +++ b/xen/arch/arm/tee/optee.c
> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct optee_domain *ctx,
>           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
>               call->rpc_buffer_type = shm_rpc->xen_arg->params[0].u.value.a;
>   
> +        /*
> +         * OP-TEE signals that it frees the buffer that it requested
> +         * before. This is the right for us to do the same.
> +         */
> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
> +        {
> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
> +
> +            free_optee_shm_buf(ctx, cookie);
> +
> +            /*
> +             * This should never happen. We have a bug either in the
> +             * OP-TEE or in the mediator.
> +             */
> +            if ( call->rpc_data_cookie && call->rpc_data_cookie != cookie )
> +                gprintk(XENLOG_ERR,
> +                        "Saved RPC cookie does not corresponds to OP-TEE's (%"PRIx64" != %"PRIx64")\n",

s/corresponds/correspond/

> +                        call->rpc_data_cookie, cookie);

IIUC, if you free the wrong SHM buffer then your guest is likely to be 
running incorrectly afterwards. So shouldn't we crash the guest to avoid 
further issues?

> +            call->rpc_data_cookie = 0;
> +        }
>           unmap_domain_page(shm_rpc->xen_arg);
>       }
>   
> @@ -1464,10 +1484,6 @@ static void handle_rpc_cmd(struct optee_domain *ctx, struct cpu_user_regs *regs,
>               }
>               break;
>           case OPTEE_RPC_CMD_SHM_FREE:
> -            free_optee_shm_buf(ctx, shm_rpc->xen_arg->params[0].u.value.b);
> -            if ( call->rpc_data_cookie ==
> -                 shm_rpc->xen_arg->params[0].u.value.b )
> -                call->rpc_data_cookie = 0;
>               break;
>           default:
>               break;
> 

Cheers,
Andrew Cooper May 11, 2020, 10:10 a.m. UTC | #2
On 11/05/2020 10:34, Julien Grall wrote:
> Hi Volodymyr,
>
> On 06/05/2020 02:44, Volodymyr Babchuk wrote:
>> Normal World can share buffer with OP-TEE for two reasons:
>> 1. Some client application wants to exchange data with TA
>> 2. OP-TEE asks for shared buffer for internal needs
>>
>> The second case was handle more strictly than necessary:
>>
>> 1. In RPC request OP-TEE asks for buffer
>> 2. NW allocates buffer and provides it via RPC response
>> 3. Xen pins pages and translates data
>> 4. Xen provides buffer to OP-TEE
>> 5. OP-TEE uses it
>> 6. OP-TEE sends request to free the buffer
>> 7. NW frees the buffer and sends the RPC response
>> 8. Xen unpins pages and forgets about the buffer
>>
>> The problem is that Xen should forget about buffer in between stages 6
>> and 7. I.e. the right flow should be like this:
>>
>> 6. OP-TEE sends request to free the buffer
>> 7. Xen unpins pages and forgets about the buffer
>> 8. NW frees the buffer and sends the RPC response
>>
>> This is because OP-TEE internally frees the buffer before sending the
>> "free SHM buffer" request. So we have no reason to hold reference for
>> this buffer anymore. Moreover, in multiprocessor systems NW have time
>> to reuse buffer cookie for another buffer. Xen complained about this
>> and denied the new buffer registration. I have seen this issue while
>> running tests on iMX SoC.
>>
>> So, this patch basically corrects that behavior by freeing the buffer
>> earlier, when handling RPC return from OP-TEE.
>>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> ---
>>   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
>>   1 file changed, 20 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
>> index 6a035355db..af19fc31f8 100644
>> --- a/xen/arch/arm/tee/optee.c
>> +++ b/xen/arch/arm/tee/optee.c
>> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct
>> optee_domain *ctx,
>>           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
>>               call->rpc_buffer_type =
>> shm_rpc->xen_arg->params[0].u.value.a;
>>   +        /*
>> +         * OP-TEE signals that it frees the buffer that it requested
>> +         * before. This is the right for us to do the same.
>> +         */
>> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
>> +        {
>> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
>> +
>> +            free_optee_shm_buf(ctx, cookie);
>> +
>> +            /*
>> +             * This should never happen. We have a bug either in the
>> +             * OP-TEE or in the mediator.
>> +             */
>> +            if ( call->rpc_data_cookie && call->rpc_data_cookie !=
>> cookie )
>> +                gprintk(XENLOG_ERR,
>> +                        "Saved RPC cookie does not corresponds to
>> OP-TEE's (%"PRIx64" != %"PRIx64")\n",
>
> s/corresponds/correspond/
>
>> +                        call->rpc_data_cookie, cookie);
>
> IIUC, if you free the wrong SHM buffer then your guest is likely to be
> running incorrectly afterwards. So shouldn't we crash the guest to
> avoid further issue?

No - crashing the guest prohibits testing of the interface, and/or the
guest realising it screwed up and dumping enough state to usefully debug
what is going on.

Furthermore, if userspace could trigger this path, we'd have to issue an
XSA.

Crashing the guest is almost never the right thing to do, and definitely
not appropriate for a bad parameter.

~Andrew
Julien Grall May 11, 2020, 10:26 a.m. UTC | #3
Hi Andrew,

On 11/05/2020 11:10, Andrew Cooper wrote:
> On 11/05/2020 10:34, Julien Grall wrote:
>> Hi Volodymyr,
>>
>> On 06/05/2020 02:44, Volodymyr Babchuk wrote:
>>> Normal World can share buffer with OP-TEE for two reasons:
>>> 1. Some client application wants to exchange data with TA
>>> 2. OP-TEE asks for shared buffer for internal needs
>>>
>>> The second case was handle more strictly than necessary:
>>>
>>> 1. In RPC request OP-TEE asks for buffer
>>> 2. NW allocates buffer and provides it via RPC response
>>> 3. Xen pins pages and translates data
>>> 4. Xen provides buffer to OP-TEE
>>> 5. OP-TEE uses it
>>> 6. OP-TEE sends request to free the buffer
>>> 7. NW frees the buffer and sends the RPC response
>>> 8. Xen unpins pages and forgets about the buffer
>>>
>>> The problem is that Xen should forget about buffer in between stages 6
>>> and 7. I.e. the right flow should be like this:
>>>
>>> 6. OP-TEE sends request to free the buffer
>>> 7. Xen unpins pages and forgets about the buffer
>>> 8. NW frees the buffer and sends the RPC response
>>>
>>> This is because OP-TEE internally frees the buffer before sending the
>>> "free SHM buffer" request. So we have no reason to hold reference for
>>> this buffer anymore. Moreover, in multiprocessor systems NW have time
>>> to reuse buffer cookie for another buffer. Xen complained about this
>>> and denied the new buffer registration. I have seen this issue while
>>> running tests on iMX SoC.
>>>
>>> So, this patch basically corrects that behavior by freeing the buffer
>>> earlier, when handling RPC return from OP-TEE.
>>>
>>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>> ---
>>>    xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
>>>    1 file changed, 20 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
>>> index 6a035355db..af19fc31f8 100644
>>> --- a/xen/arch/arm/tee/optee.c
>>> +++ b/xen/arch/arm/tee/optee.c
>>> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct
>>> optee_domain *ctx,
>>>            if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
>>>                call->rpc_buffer_type =
>>> shm_rpc->xen_arg->params[0].u.value.a;
>>>    +        /*
>>> +         * OP-TEE signals that it frees the buffer that it requested
>>> +         * before. This is the right for us to do the same.
>>> +         */
>>> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
>>> +        {
>>> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
>>> +
>>> +            free_optee_shm_buf(ctx, cookie);
>>> +
>>> +            /*
>>> +             * This should never happen. We have a bug either in the
>>> +             * OP-TEE or in the mediator.
>>> +             */
>>> +            if ( call->rpc_data_cookie && call->rpc_data_cookie !=
>>> cookie )
>>> +                gprintk(XENLOG_ERR,
>>> +                        "Saved RPC cookie does not corresponds to
>>> OP-TEE's (%"PRIx64" != %"PRIx64")\n",
>>
>> s/corresponds/correspond/
>>
>>> +                        call->rpc_data_cookie, cookie);
>>
>> IIUC, if you free the wrong SHM buffer then your guest is likely to be
>> running incorrectly afterwards. So shouldn't we crash the guest to
>> avoid further issue?
> 
> No - crashing the guest prohibits testing of the interface, and/or the
> guest realising it screwed up and dumping enough state to usefully debug
> what is going on.

The comment in the code suggests it is a bug in the OP-TEE/mediator:

/*
  * This should never happen. We have a bug either in the
  * OP-TEE or in the mediator.
  */

So I am not sure why this would be the guest's fault here.

> 
> Furthermore, if userspace could trigger this path, we'd have to issue an
> XSA.

Why so? We don't issue XSAs for hypercalls issued through privcmd. These 
are not hypercalls, but they are close enough, as this path uses smc 
(Secure Monitor Call) and hvc. Both are only accessible from kernel mode.

> 
> Crashing the guest is almost never the right thing to do, and definitely
> not appropriate for a bad parameter.

AFAICT, the bad parameter comes not from the guest but from the OP-TEE 
firmware (or the mediator) itself. If OP-TEE/the mediator is returning a 
buggy value, it may mean the isolation is broken. So I don't think simply 
printing a message and continuing is the right thing to do.

Cheers,
Stefano Stabellini May 11, 2020, 10:37 p.m. UTC | #4
On Mon, 11 May 2020, Andrew Cooper wrote:
> On 11/05/2020 10:34, Julien Grall wrote:
> > Hi Volodymyr,
> >
> > On 06/05/2020 02:44, Volodymyr Babchuk wrote:
> >> Normal World can share buffer with OP-TEE for two reasons:
> >> 1. Some client application wants to exchange data with TA
> >> 2. OP-TEE asks for shared buffer for internal needs
> >>
> >> The second case was handle more strictly than necessary:
> >>
> >> 1. In RPC request OP-TEE asks for buffer
> >> 2. NW allocates buffer and provides it via RPC response
> >> 3. Xen pins pages and translates data
> >> 4. Xen provides buffer to OP-TEE
> >> 5. OP-TEE uses it
> >> 6. OP-TEE sends request to free the buffer
> >> 7. NW frees the buffer and sends the RPC response
> >> 8. Xen unpins pages and forgets about the buffer
> >>
> >> The problem is that Xen should forget about buffer in between stages 6
> >> and 7. I.e. the right flow should be like this:
> >>
> >> 6. OP-TEE sends request to free the buffer
> >> 7. Xen unpins pages and forgets about the buffer
> >> 8. NW frees the buffer and sends the RPC response
> >>
> >> This is because OP-TEE internally frees the buffer before sending the
> >> "free SHM buffer" request. So we have no reason to hold reference for
> >> this buffer anymore. Moreover, in multiprocessor systems NW have time
> >> to reuse buffer cookie for another buffer. Xen complained about this
> >> and denied the new buffer registration. I have seen this issue while
> >> running tests on iMX SoC.
> >>
> >> So, this patch basically corrects that behavior by freeing the buffer
> >> earlier, when handling RPC return from OP-TEE.
> >>
> >> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> >> ---
> >>   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
> >>   1 file changed, 20 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> >> index 6a035355db..af19fc31f8 100644
> >> --- a/xen/arch/arm/tee/optee.c
> >> +++ b/xen/arch/arm/tee/optee.c
> >> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct
> >> optee_domain *ctx,
> >>           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
> >>               call->rpc_buffer_type =
> >> shm_rpc->xen_arg->params[0].u.value.a;
> >>   +        /*
> >> +         * OP-TEE signals that it frees the buffer that it requested
> >> +         * before. This is the right for us to do the same.
> >> +         */
> >> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
> >> +        {
> >> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
> >> +
> >> +            free_optee_shm_buf(ctx, cookie);
> >> +
> >> +            /*
> >> +             * This should never happen. We have a bug either in the
> >> +             * OP-TEE or in the mediator.
> >> +             */
> >> +            if ( call->rpc_data_cookie && call->rpc_data_cookie !=
> >> cookie )
> >> +                gprintk(XENLOG_ERR,
> >> +                        "Saved RPC cookie does not corresponds to
> >> OP-TEE's (%"PRIx64" != %"PRIx64")\n",
> >
> > s/corresponds/correspond/
> >
> >> +                        call->rpc_data_cookie, cookie);
> >
> > IIUC, if you free the wrong SHM buffer then your guest is likely to be
> > running incorrectly afterwards. So shouldn't we crash the guest to
> > avoid further issue?
> 
> No - crashing the guest prohibits testing of the interface, and/or the
> guest realising it screwed up and dumping enough state to usefully debug
> what is going on.
> 
> Furthermore, if userspace could trigger this path, we'd have to issue an
> XSA.
> 
> Crashing the guest is almost never the right thing to do, and definitely
> not appropriate for a bad parameter.

Maybe we want to close the OP-TEE interface for the guest instead of
crashing the whole VM, i.e. free the OP-TEE context for the domain
(d->arch.tee)?

But I think the patch is good as it is honestly.
Volodymyr Babchuk May 18, 2020, 2:04 a.m. UTC | #5
Hi Julien,

On Mon, 2020-05-11 at 10:34 +0100, Julien Grall wrote:
> Hi Volodymyr,
> 
> On 06/05/2020 02:44, Volodymyr Babchuk wrote:
> > Normal World can share buffer with OP-TEE for two reasons:
> > 1. Some client application wants to exchange data with TA
> > 2. OP-TEE asks for shared buffer for internal needs
> > 
> > The second case was handle more strictly than necessary:
> > 
> > 1. In RPC request OP-TEE asks for buffer
> > 2. NW allocates buffer and provides it via RPC response
> > 3. Xen pins pages and translates data
> > 4. Xen provides buffer to OP-TEE
> > 5. OP-TEE uses it
> > 6. OP-TEE sends request to free the buffer
> > 7. NW frees the buffer and sends the RPC response
> > 8. Xen unpins pages and forgets about the buffer
> > 
> > The problem is that Xen should forget about buffer in between stages 6
> > and 7. I.e. the right flow should be like this:
> > 
> > 6. OP-TEE sends request to free the buffer
> > 7. Xen unpins pages and forgets about the buffer
> > 8. NW frees the buffer and sends the RPC response
> > 
> > This is because OP-TEE internally frees the buffer before sending the
> > "free SHM buffer" request. So we have no reason to hold reference for
> > this buffer anymore. Moreover, in multiprocessor systems NW have time
> > to reuse buffer cookie for another buffer. Xen complained about this
> > and denied the new buffer registration. I have seen this issue while
> > running tests on iMX SoC.
> > 
> > So, this patch basically corrects that behavior by freeing the buffer
> > earlier, when handling RPC return from OP-TEE.
> > 
> > Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > ---
> >   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
> >   1 file changed, 20 insertions(+), 4 deletions(-)
> > 
> > diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> > index 6a035355db..af19fc31f8 100644
> > --- a/xen/arch/arm/tee/optee.c
> > +++ b/xen/arch/arm/tee/optee.c
> > @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct optee_domain *ctx,
> >           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
> >               call->rpc_buffer_type = shm_rpc->xen_arg->params[0].u.value.a;
> >   
> > +        /*
> > +         * OP-TEE signals that it frees the buffer that it requested
> > +         * before. This is the right for us to do the same.
> > +         */
> > +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
> > +        {
> > +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
> > +
> > +            free_optee_shm_buf(ctx, cookie);
> > +
> > +            /*
> > +             * This should never happen. We have a bug either in the
> > +             * OP-TEE or in the mediator.
> > +             */
> > +            if ( call->rpc_data_cookie && call->rpc_data_cookie != cookie )
> > +                gprintk(XENLOG_ERR,
> > +                        "Saved RPC cookie does not corresponds to OP-TEE's (%"PRIx64" != %"PRIx64")\n",
> 
> s/corresponds/correspond/
Will fix in the next version.

> > +                        call->rpc_data_cookie, cookie);
> 
> IIUC, if you free the wrong SHM buffer then your guest is likely to be 
> running incorrectly afterwards. So shouldn't we crash the guest to avoid 
> further issue?
> 

Well, we freed the exact buffer that OP-TEE asked us to free, so the
guest didn't do anything bad. Moreover, the OP-TEE driver in the Linux
kernel does not have a similar check, so it will free this buffer
without any complaints. I'm just being overcautious here. Thus, I see
no reason to crash the guest.
Julien Grall May 18, 2020, 8:33 a.m. UTC | #6
On 18/05/2020 03:04, Volodymyr Babchuk wrote:
> Hi Julien,

Hi,

> On Mon, 2020-05-11 at 10:34 +0100, Julien Grall wrote:
>> Hi Volodymyr,
>>
>> On 06/05/2020 02:44, Volodymyr Babchuk wrote:
>>> Normal World can share buffer with OP-TEE for two reasons:
>>> 1. Some client application wants to exchange data with TA
>>> 2. OP-TEE asks for shared buffer for internal needs
>>>
>>> The second case was handle more strictly than necessary:
>>>
>>> 1. In RPC request OP-TEE asks for buffer
>>> 2. NW allocates buffer and provides it via RPC response
>>> 3. Xen pins pages and translates data
>>> 4. Xen provides buffer to OP-TEE
>>> 5. OP-TEE uses it
>>> 6. OP-TEE sends request to free the buffer
>>> 7. NW frees the buffer and sends the RPC response
>>> 8. Xen unpins pages and forgets about the buffer
>>>
>>> The problem is that Xen should forget about buffer in between stages 6
>>> and 7. I.e. the right flow should be like this:
>>>
>>> 6. OP-TEE sends request to free the buffer
>>> 7. Xen unpins pages and forgets about the buffer
>>> 8. NW frees the buffer and sends the RPC response
>>>
>>> This is because OP-TEE internally frees the buffer before sending the
>>> "free SHM buffer" request. So we have no reason to hold reference for
>>> this buffer anymore. Moreover, in multiprocessor systems NW have time
>>> to reuse buffer cookie for another buffer. Xen complained about this
>>> and denied the new buffer registration. I have seen this issue while
>>> running tests on iMX SoC.
>>>
>>> So, this patch basically corrects that behavior by freeing the buffer
>>> earlier, when handling RPC return from OP-TEE.
>>>
>>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>> ---
>>>    xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
>>>    1 file changed, 20 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
>>> index 6a035355db..af19fc31f8 100644
>>> --- a/xen/arch/arm/tee/optee.c
>>> +++ b/xen/arch/arm/tee/optee.c
>>> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct optee_domain *ctx,
>>>            if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
>>>                call->rpc_buffer_type = shm_rpc->xen_arg->params[0].u.value.a;
>>>    
>>> +        /*
>>> +         * OP-TEE signals that it frees the buffer that it requested
>>> +         * before. This is the right for us to do the same.
>>> +         */
>>> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
>>> +        {
>>> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
>>> +
>>> +            free_optee_shm_buf(ctx, cookie);
>>> +
>>> +            /*
>>> +             * This should never happen. We have a bug either in the
>>> +             * OP-TEE or in the mediator.
>>> +             */
>>> +            if ( call->rpc_data_cookie && call->rpc_data_cookie != cookie )
>>> +                gprintk(XENLOG_ERR,
>>> +                        "Saved RPC cookie does not corresponds to OP-TEE's (%"PRIx64" != %"PRIx64")\n",
>>
>> s/corresponds/correspond/
> Will fix in the next version.
> 
>>> +                        call->rpc_data_cookie, cookie);
>>
>> IIUC, if you free the wrong SHM buffer then your guest is likely to be
>> running incorrectly afterwards. So shouldn't we crash the guest to avoid
>> further issue?
>>
> 
> Well, we freed the exact buffer that OP-TEE asked us to free. So guest
> didn't anything bad. Moreover, optee driver in Linux kernel does not
> have similar check, so it will free this buffer without any complains.
> I'm just being overcautious here. Thus, I see no reason to crash the
> guest.

My point is not whether the guest did anything bad, but whether 
acknowledging a bug and continuing like nothing happened is the right 
thing to do.

I can't judge whether the bug is critical enough. However, I don't 
consider a single message on the console to be sufficient in the case 
of a bug. It is likely to be missed and may cause side effects that are 
only noticed a long time afterwards. The amount of debugging required 
to figure out the original problem may then be considerable.

The first suggestion would be to expand your comment and explain why it 
is fine to continue.

Secondly, if it is considered safe to continue but still needs 
attention, then I would suggest adding a WARN() to make it easier to 
spot in the log.

Cheers,
Stefano Stabellini June 18, 2020, 10:20 p.m. UTC | #7
Hi Paul, Julien,

Volodymyr hasn't come back with an update to this patch, but I think it
is good enough as-is as a bug fix, and I would rather have it in its
current form in 4.14 than not have it at all, leaving the bug unfixed.

I think Julien agrees.


Paul, are you OK with this?



On Mon, 11 May 2020, Stefano Stabellini wrote:
> On Mon, 11 May 2020, Andrew Cooper wrote:
> > On 11/05/2020 10:34, Julien Grall wrote:
> > > Hi Volodymyr,
> > >
> > > On 06/05/2020 02:44, Volodymyr Babchuk wrote:
> > >> Normal World can share buffer with OP-TEE for two reasons:
> > >> 1. Some client application wants to exchange data with TA
> > >> 2. OP-TEE asks for shared buffer for internal needs
> > >>
> > >> The second case was handle more strictly than necessary:
> > >>
> > >> 1. In RPC request OP-TEE asks for buffer
> > >> 2. NW allocates buffer and provides it via RPC response
> > >> 3. Xen pins pages and translates data
> > >> 4. Xen provides buffer to OP-TEE
> > >> 5. OP-TEE uses it
> > >> 6. OP-TEE sends request to free the buffer
> > >> 7. NW frees the buffer and sends the RPC response
> > >> 8. Xen unpins pages and forgets about the buffer
> > >>
> > >> The problem is that Xen should forget about buffer in between stages 6
> > >> and 7. I.e. the right flow should be like this:
> > >>
> > >> 6. OP-TEE sends request to free the buffer
> > >> 7. Xen unpins pages and forgets about the buffer
> > >> 8. NW frees the buffer and sends the RPC response
> > >>
> > >> This is because OP-TEE internally frees the buffer before sending the
> > >> "free SHM buffer" request. So we have no reason to hold reference for
> > >> this buffer anymore. Moreover, in multiprocessor systems NW have time
> > >> to reuse buffer cookie for another buffer. Xen complained about this
> > >> and denied the new buffer registration. I have seen this issue while
> > >> running tests on iMX SoC.
> > >>
> > >> So, this patch basically corrects that behavior by freeing the buffer
> > >> earlier, when handling RPC return from OP-TEE.
> > >>
> > >> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > >> ---
> > >>   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
> > >>   1 file changed, 20 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> > >> index 6a035355db..af19fc31f8 100644
> > >> --- a/xen/arch/arm/tee/optee.c
> > >> +++ b/xen/arch/arm/tee/optee.c
> > >> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct
> > >> optee_domain *ctx,
> > >>           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
> > >>               call->rpc_buffer_type =
> > >> shm_rpc->xen_arg->params[0].u.value.a;
> > >>   +        /*
> > >> +         * OP-TEE signals that it frees the buffer that it requested
> > >> +         * before. This is the right for us to do the same.
> > >> +         */
> > >> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
> > >> +        {
> > >> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
> > >> +
> > >> +            free_optee_shm_buf(ctx, cookie);
> > >> +
> > >> +            /*
> > >> +             * This should never happen. We have a bug either in the
> > >> +             * OP-TEE or in the mediator.
> > >> +             */
> > >> +            if ( call->rpc_data_cookie && call->rpc_data_cookie !=
> > >> cookie )
> > >> +                gprintk(XENLOG_ERR,
> > >> +                        "Saved RPC cookie does not corresponds to
> > >> OP-TEE's (%"PRIx64" != %"PRIx64")\n",
> > >
> > > s/corresponds/correspond/
> > >
> > >> +                        call->rpc_data_cookie, cookie);
> > >
> > > IIUC, if you free the wrong SHM buffer then your guest is likely to be
> > > running incorrectly afterwards. So shouldn't we crash the guest to
> > > avoid further issue?
> > 
> > No - crashing the guest prohibits testing of the interface, and/or the
> > guest realising it screwed up and dumping enough state to usefully debug
> > what is going on.
> > 
> > Furthermore, if userspace could trigger this path, we'd have to issue an
> > XSA.
> > 
> > Crashing the guest is almost never the right thing to do, and definitely
> > not appropriate for a bad parameter.
> 
> Maybe we want to close the OPTEE interface for the guest, instead of
> crashing the whole VM. I.e. freeing the OPTEE context for the domain
> (d->arch.tee)?
> 
> But I think the patch is good as it is honestly.
Stefano Stabellini June 18, 2020, 10:21 p.m. UTC | #8
Actually adding Paul


On Thu, 18 Jun 2020, Stefano Stabellini wrote:
> Hi Paul, Julien,
> 
> Volodymyr hasn't come back with an update to this patch, but I think it
> is good enough as-is as a bug fix and I would rather have it in its
> current form in 4.14 than not having it at all leaving the bug unfixed.
> 
> I think Julien agrees.
> 
> 
> Paul, are you OK with this?
> 
> 
> 
> On Mon, 11 May 2020, Stefano Stabellini wrote:
> > On Mon, 11 May 2020, Andrew Cooper wrote:
> > > On 11/05/2020 10:34, Julien Grall wrote:
> > > > Hi Volodymyr,
> > > >
> > > > On 06/05/2020 02:44, Volodymyr Babchuk wrote:
> > > >> Normal World can share buffer with OP-TEE for two reasons:
> > > >> 1. Some client application wants to exchange data with TA
> > > >> 2. OP-TEE asks for shared buffer for internal needs
> > > >>
> > > >> The second case was handle more strictly than necessary:
> > > >>
> > > >> 1. In RPC request OP-TEE asks for buffer
> > > >> 2. NW allocates buffer and provides it via RPC response
> > > >> 3. Xen pins pages and translates data
> > > >> 4. Xen provides buffer to OP-TEE
> > > >> 5. OP-TEE uses it
> > > >> 6. OP-TEE sends request to free the buffer
> > > >> 7. NW frees the buffer and sends the RPC response
> > > >> 8. Xen unpins pages and forgets about the buffer
> > > >>
> > > >> The problem is that Xen should forget about buffer in between stages 6
> > > >> and 7. I.e. the right flow should be like this:
> > > >>
> > > >> 6. OP-TEE sends request to free the buffer
> > > >> 7. Xen unpins pages and forgets about the buffer
> > > >> 8. NW frees the buffer and sends the RPC response
> > > >>
> > > >> This is because OP-TEE internally frees the buffer before sending the
> > > >> "free SHM buffer" request. So we have no reason to hold reference for
> > > >> this buffer anymore. Moreover, in multiprocessor systems NW have time
> > > >> to reuse buffer cookie for another buffer. Xen complained about this
> > > >> and denied the new buffer registration. I have seen this issue while
> > > >> running tests on iMX SoC.
> > > >>
> > > >> So, this patch basically corrects that behavior by freeing the buffer
> > > >> earlier, when handling RPC return from OP-TEE.
> > > >>
> > > >> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > > >> ---
> > > >>   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
> > > >>   1 file changed, 20 insertions(+), 4 deletions(-)
> > > >>
> > > >> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> > > >> index 6a035355db..af19fc31f8 100644
> > > >> --- a/xen/arch/arm/tee/optee.c
> > > >> +++ b/xen/arch/arm/tee/optee.c
> > > >> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct
> > > >> optee_domain *ctx,
> > > >>           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
> > > >>               call->rpc_buffer_type =
> > > >> shm_rpc->xen_arg->params[0].u.value.a;
> > > >>   +        /*
> > > >> +         * OP-TEE signals that it frees the buffer that it requested
> > > >> +         * before. This is the right for us to do the same.
> > > >> +         */
> > > >> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
> > > >> +        {
> > > >> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
> > > >> +
> > > >> +            free_optee_shm_buf(ctx, cookie);
> > > >> +
> > > >> +            /*
> > > >> +             * This should never happen. We have a bug either in the
> > > >> +             * OP-TEE or in the mediator.
> > > >> +             */
> > > >> +            if ( call->rpc_data_cookie && call->rpc_data_cookie !=
> > > >> cookie )
> > > >> +                gprintk(XENLOG_ERR,
> > > >> +                        "Saved RPC cookie does not corresponds to
> > > >> OP-TEE's (%"PRIx64" != %"PRIx64")\n",
> > > >
> > > > s/corresponds/correspond/
> > > >
> > > >> +                        call->rpc_data_cookie, cookie);
> > > >
> > > > IIUC, if you free the wrong SHM buffer then your guest is likely to be
> > > > running incorrectly afterwards. So shouldn't we crash the guest to
> > > > avoid further issue?
> > > 
> > > No - crashing the guest prohibits testing of the interface, and/or the
> > > guest realising it screwed up and dumping enough state to usefully debug
> > > what is going on.
> > > 
> > > Furthermore, if userspace could trigger this path, we'd have to issue an
> > > XSA.
> > > 
> > > Crashing the guest is almost never the right thing to do, and definitely
> > > not appropriate for a bad parameter.
> > 
> > Maybe we want to close the OPTEE interface for the guest, instead of
> > crashing the whole VM. I.e. freeing the OPTEE context for the domain
> > (d->arch.tee)?
> > 
> > But I think the patch is good as it is honestly.
Paul Durrant June 19, 2020, 8:40 a.m. UTC | #9
> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 18 June 2020 23:21
> To: xadimgnik@gmail.com; pdurrant@amazon.co.uk
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; xen-devel@lists.xenproject.org; tee-dev@lists.linaro.org
> Subject: Re: [PATCH] optee: immediately free buffers that are released by OP-TEE
> 
> Actually adding Paul
> 
> 
> On Thu, 18 Jun 2020, Stefano Stabellini wrote:
> > Hi Paul, Julien,
> >
> > Volodymyr hasn't come back with an update to this patch, but I think it
> > is good enough as-is as a bug fix and I would rather have it in its
> > current form in 4.14 than not having it at all leaving the bug unfixed.
> >
> > I think Julien agrees.
> >
> >
> > Paul, are you OK with this?

I will take my direction from the maintainers as to whether this fixes a critical issue and hence is a candidate for 4.14. If Volodymyr doesn't come back with a v2 then I would at least want a formal ack of this patch, and the cosmetic change requested by Julien fixed on commit, as well as...

> >
> >
> >
> > On Mon, 11 May 2020, Stefano Stabellini wrote:
> > > On Mon, 11 May 2020, Andrew Cooper wrote:
> > > > On 11/05/2020 10:34, Julien Grall wrote:
> > > > > Hi Volodymyr,
> > > > >
> > > > > On 06/05/2020 02:44, Volodymyr Babchuk wrote:
> > > > >> Normal World can share buffer with OP-TEE for two reasons:
> > > > >> 1. Some client application wants to exchange data with TA
> > > > >> 2. OP-TEE asks for shared buffer for internal needs
> > > > >>
> > > > >> The second case was handled more strictly than necessary:
> > > > >>
> > > > >> 1. In RPC request OP-TEE asks for buffer
> > > > >> 2. NW allocates buffer and provides it via RPC response
> > > > >> 3. Xen pins pages and translates data
> > > > >> 4. Xen provides buffer to OP-TEE
> > > > >> 5. OP-TEE uses it
> > > > >> 6. OP-TEE sends request to free the buffer
> > > > >> 7. NW frees the buffer and sends the RPC response
> > > > >> 8. Xen unpins pages and forgets about the buffer
> > > > >>
> > > > >> The problem is that Xen should forget about the buffer between stages 6
> > > > >> and 7. I.e. the right flow should be like this:
> > > > >>
> > > > >> 6. OP-TEE sends request to free the buffer
> > > > >> 7. Xen unpins pages and forgets about the buffer
> > > > >> 8. NW frees the buffer and sends the RPC response
> > > > >>
> > > > >> This is because OP-TEE internally frees the buffer before sending the
> > > > >> "free SHM buffer" request. So we have no reason to hold reference for
> > > > >> this buffer anymore. Moreover, in multiprocessor systems the NW has time
> > > > >> to reuse a buffer cookie for another buffer. Xen complained about this
> > > > >> and denied the new buffer registration. I have seen this issue while
> > > > >> running tests on iMX SoC.
> > > > >>
> > > > >> So, this patch basically corrects that behavior by freeing the buffer
> > > > >> earlier, when handling RPC return from OP-TEE.
> > > > >>
> > > > >> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > > > >> ---
> > > > >>   xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++----
> > > > >>   1 file changed, 20 insertions(+), 4 deletions(-)
> > > > >>
> > > > >> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
> > > > >> index 6a035355db..af19fc31f8 100644
> > > > >> --- a/xen/arch/arm/tee/optee.c
> > > > >> +++ b/xen/arch/arm/tee/optee.c
> > > > >> @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct
> > > > >> optee_domain *ctx,
> > > > >>           if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
> > > > >>               call->rpc_buffer_type =
> > > > >> shm_rpc->xen_arg->params[0].u.value.a;
> > > > >>   +        /*
> > > > >> +         * OP-TEE signals that it frees the buffer that it requested
> > > > >> +         * before. This is the right for us to do the same.
> > > > >> +         */

...this comment being re-worded:

"OP-TEE is signalling that it has freed the buffer that it requested before. This is the right time for us to do the same."

perhaps?

  Paul

> > > > >> +        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
> > > > >> +        {
> > > > >> +            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
> > > > >> +
> > > > >> +            free_optee_shm_buf(ctx, cookie);
> > > > >> +
> > > > >> +            /*
> > > > >> +             * This should never happen. We have a bug either in the
> > > > >> +             * OP-TEE or in the mediator.
> > > > >> +             */
> > > > >> +            if ( call->rpc_data_cookie && call->rpc_data_cookie !=
> > > > >> cookie )
> > > > >> +                gprintk(XENLOG_ERR,
> > > > >> +                        "Saved RPC cookie does not corresponds to
> > > > >> OP-TEE's (%"PRIx64" != %"PRIx64")\n",
> > > > >
> > > > > s/corresponds/correspond/
> > > > >
> > > > >> +                        call->rpc_data_cookie, cookie);
> > > > >
> > > > > IIUC, if you free the wrong SHM buffer then your guest is likely to be
> > > > > running incorrectly afterwards. So shouldn't we crash the guest to
> > > > > avoid further issue?
> > > >
> > > > No - crashing the guest prohibits testing of the interface, and/or the
> > > > guest realising it screwed up and dumping enough state to usefully debug
> > > > what is going on.
> > > >
> > > > Furthermore, if userspace could trigger this path, we'd have to issue an
> > > > XSA.
> > > >
> > > > Crashing the guest is almost never the right thing to do, and definitely
> > > > not appropriate for a bad parameter.
> > >
> > > Maybe we want to close the OPTEE interface for the guest, instead of
> > > crashing the whole VM. I.e. freeing the OPTEE context for the domain
> > > (d->arch.tee)?
> > >
> > > But I think the patch is good as it is honestly.
Julien Grall June 19, 2020, 8:53 a.m. UTC | #10
On 18/06/2020 23:20, Stefano Stabellini wrote:
> Hi Paul, Julien,
> 
> Volodymyr hasn't come back with an update to this patch, but I think it
> is good enough as-is as a bug fix and I would rather have it in its
> current form in 4.14 than not having it at all leaving the bug unfixed.
> 
> I think Julien agrees.

The approach is okay-ish, but it is not ideal, at least without an 
explanation of why ignoring a potential bug is fine. I could settle for an 
expanded commit message for now.

Therefore, I don't feel I should provide my Ack on this approach. That 
said, I am not the maintainers of this code. You are free to Ack and 
commit it.

Cheers,
Volodymyr Babchuk June 19, 2020, 9:01 a.m. UTC | #11
Julien, Paul,

Julien Grall writes:

> On 18/06/2020 23:20, Stefano Stabellini wrote:
>> Hi Paul, Julien,
>>
>> Volodymyr hasn't come back with an update to this patch, but I think it
>> is good enough as-is as a bug fix and I would rather have it in its
>> current form in 4.14 than not having it at all leaving the bug unfixed.
>>
>> I think Julien agrees.
>
> The approach is okayish but this is not ideal at least without any
> explanation why ignoring a potential bug is fine. I could settle with
> an expanded commit message for now.
>
> Therefore, I don't feel I should provide my Ack on this approach. That
> said, I am not the maintainers of this code. You are free to Ack and
> commit it.

Sorry for the delay. I'll provide v2 later today.
diff mbox series

Patch

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 6a035355db..af19fc31f8 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -1099,6 +1099,26 @@  static int handle_rpc_return(struct optee_domain *ctx,
         if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
             call->rpc_buffer_type = shm_rpc->xen_arg->params[0].u.value.a;
 
+        /*
+         * OP-TEE signals that it frees the buffer that it requested
+         * before. This is the right for us to do the same.
+         */
+        if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
+        {
+            uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
+
+            free_optee_shm_buf(ctx, cookie);
+
+            /*
+             * This should never happen. We have a bug either in the
+             * OP-TEE or in the mediator.
+             */
+            if ( call->rpc_data_cookie && call->rpc_data_cookie != cookie )
+                gprintk(XENLOG_ERR,
+                        "Saved RPC cookie does not corresponds to OP-TEE's (%"PRIx64" != %"PRIx64")\n",
+                        call->rpc_data_cookie, cookie);
+            call->rpc_data_cookie = 0;
+        }
         unmap_domain_page(shm_rpc->xen_arg);
     }
 
@@ -1464,10 +1484,6 @@  static void handle_rpc_cmd(struct optee_domain *ctx, struct cpu_user_regs *regs,
             }
             break;
         case OPTEE_RPC_CMD_SHM_FREE:
-            free_optee_shm_buf(ctx, shm_rpc->xen_arg->params[0].u.value.b);
-            if ( call->rpc_data_cookie ==
-                 shm_rpc->xen_arg->params[0].u.value.b )
-                call->rpc_data_cookie = 0;
             break;
         default:
             break;