diff mbox series

[v2,1/7] xen/evtchn: Make sure all buckets below d->valid_evtchns are allocated

Message ID 710e9e6477270212136d6f2047fd15a033fa7d71.1660902588.git.rahul.singh@arm.com (mailing list archive)
State Superseded
Headers show
Series xen/evtchn: implement static event channel signaling | expand

Commit Message

Rahul Singh Aug. 19, 2022, 10:02 a.m. UTC
From: Julien Grall <jgrall@amazon.com>

Since commit 01280dc19cf3 "evtchn: simplify port_is_valid()", the event
channels code assumes that all the buckets below d->valid_evtchns are
always allocated.

This assumption hold in most of the situation because a guest is not
allowed to chose the port. Instead, it will be the first free from port
0.

When using Guest Transparent Migration and LiveUpdate, we will only
preserve ports that are currently in use. As a guest can open/close
event channels, this means the ports may be sparse.

The existing implementation of evtchn_allocate_port() is not able to
deal with such situation and will end up to override bucket or/and leave
some bucket unallocated. The latter will result to a droplet crash if
the event channel belongs to an unallocated bucket.

This can be solved by making sure that all the buckets below
d->valid_evtchns are allocated. There should be no impact for most of
the situation but LM/LU as only one bucket would be allocated. For
LM/LU, we may end up to allocate multiple buckets if ports in use are
sparse.

A potential alternative is to check that the bucket is valid in
is_port_valid(). This should still possible to do it without taking
per-domain lock but will result a couple more of memory access.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
---
Changes in v2:
 - new patch in this version to fix the security issue
---
---
 xen/common/event_channel.c | 56 ++++++++++++++++++++++++--------------
 1 file changed, 35 insertions(+), 21 deletions(-)

Comments

Jan Beulich Aug. 22, 2022, 1:08 p.m. UTC | #1
On 19.08.2022 12:02, Rahul Singh wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Since commit 01280dc19cf3 "evtchn: simplify port_is_valid()", the event
> channels code assumes that all the buckets below d->valid_evtchns are
> always allocated.
> 
> This assumption hold in most of the situation because a guest is not
> allowed to chose the port. Instead, it will be the first free from port
> 0.
> 
> When using Guest Transparent Migration and LiveUpdate, we will only
> preserve ports that are currently in use. As a guest can open/close
> event channels, this means the ports may be sparse.
> 
> The existing implementation of evtchn_allocate_port() is not able to
> deal with such situation and will end up to override bucket or/and leave
> some bucket unallocated. The latter will result to a droplet crash if
> the event channel belongs to an unallocated bucket.
> 
> This can be solved by making sure that all the buckets below
> d->valid_evtchns are allocated. There should be no impact for most of
> the situation but LM/LU as only one bucket would be allocated. For
> LM/LU, we may end up to allocate multiple buckets if ports in use are
> sparse.
> 
> A potential alternative is to check that the bucket is valid in
> is_port_valid(). This should still possible to do it without taking
> per-domain lock but will result a couple more of memory access.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Rahul Singh <rahul.singh@arm.com>

While I'm mostly okay with the code, I think the description wants
changing / amending as long as the features talked about above aren't
anywhere near reaching upstream (afaict), to at least _also_ mention
the goal you have with this.

> Changes in v2:
>  - new patch in this version to fix the security issue

I guess you mean "avoid", not "fix".

> @@ -207,30 +216,35 @@ int evtchn_allocate_port(struct domain *d, evtchn_port_t port)
>      }
>      else
>      {
> -        struct evtchn *chn;
> -        struct evtchn **grp;
> -
> -        if ( !group_from_port(d, port) )
> +        do
>          {
> -            grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
> -            if ( !grp )
> -                return -ENOMEM;
> -            group_from_port(d, port) = grp;
> -        }
> +            struct evtchn *chn;
> +            struct evtchn **grp;
> +            unsigned int alloc_port = read_atomic(&d->valid_evtchns);
>  
> -        chn = alloc_evtchn_bucket(d, port);
> -        if ( !chn )
> -            return -ENOMEM;
> -        bucket_from_port(d, port) = chn;
> +            if ( !group_from_port(d, alloc_port) )
> +            {
> +                grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
> +                if ( !grp )
> +                    return -ENOMEM;
> +                group_from_port(d, alloc_port) = grp;
> +            }
>  
> -        /*
> -         * d->valid_evtchns is used to check whether the bucket can be
> -         * accessed without the per-domain lock. Therefore,
> -         * d->valid_evtchns should be seen *after* the new bucket has
> -         * been setup.
> -         */
> -        smp_wmb();
> -        write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
> +            chn = alloc_evtchn_bucket(d, alloc_port);
> +            if ( !chn )
> +                return -ENOMEM;
> +            bucket_from_port(d, alloc_port) = chn;
> +
> +            /*
> +             * d->valid_evtchns is used to check whether the bucket can be
> +             * accessed without the per-domain lock. Therefore,
> +             * d->valid_evtchns should be seen *after* the new bucket has
> +             * been setup.
> +             */
> +            smp_wmb();
> +            write_atomic(&d->valid_evtchns,
> +                         d->valid_evtchns + EVTCHNS_PER_BUCKET);
> +        } while ( port >= read_atomic(&d->valid_evtchns) );

This updating of d->valid_evtchns looks a little inconsistent to me,
wrt the uses of {read,write}_atomic(). To make obvious that there's
an implicit expectation that no 2nd invocation of this function
could race the updates, I'd recommend reading allocate_port ahead
of the loop and then never again. Here you'd then do

            smp_wmb();
            allocate_port += EVTCHNS_PER_BUCKET;
            write_atomic(&d->valid_evtchns, allocate_port);
        } while ( port >= allocate_port );

at the same time rendering the code (imo) a little more legible.

Jan
Julien Grall Aug. 23, 2022, 8:14 a.m. UTC | #2
Hi Jan,

On 22/08/2022 14:08, Jan Beulich wrote:
> On 19.08.2022 12:02, Rahul Singh wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Since commit 01280dc19cf3 "evtchn: simplify port_is_valid()", the event
>> channels code assumes that all the buckets below d->valid_evtchns are
>> always allocated.
>>
>> This assumption hold in most of the situation because a guest is not
>> allowed to chose the port. Instead, it will be the first free from port
>> 0.
>>
>> When using Guest Transparent Migration and LiveUpdate, we will only
>> preserve ports that are currently in use. As a guest can open/close
>> event channels, this means the ports may be sparse.
>>
>> The existing implementation of evtchn_allocate_port() is not able to
>> deal with such situation and will end up to override bucket or/and leave
>> some bucket unallocated. The latter will result to a droplet crash if
>> the event channel belongs to an unallocated bucket.
>>
>> This can be solved by making sure that all the buckets below
>> d->valid_evtchns are allocated. There should be no impact for most of
>> the situation but LM/LU as only one bucket would be allocated. For
>> LM/LU, we may end up to allocate multiple buckets if ports in use are
>> sparse.
>>
>> A potential alternative is to check that the bucket is valid in
>> is_port_valid(). This should still possible to do it without taking
>> per-domain lock but will result a couple more of memory access.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Rahul Singh <rahul.singh@arm.com>
> 
> While I'm mostly okay with the code, I think the description wants
> changing / amending as long as the features talked about above aren't
> anywhere near reaching upstream (afaict), to at least _also_ mention
> the goal you have with this.

Correct, neither Guest Transparent Migration nor Live-Update is going to 
reach Xen in 4.17 :). Also, if we decide to continue to mention it, then
we would need to s/Guest Transparent Migration/non-cooperative 
migration/ to match the name we decided to use in upstream (see 
docs/designs/non-cooperative-migration.md).

> 
>> Changes in v2:
>>   - new patch in this version to fix the security issue
> 
> I guess you mean "avoid", not "fix".
> 
>> @@ -207,30 +216,35 @@ int evtchn_allocate_port(struct domain *d, evtchn_port_t port)
>>       }
>>       else
>>       {
>> -        struct evtchn *chn;
>> -        struct evtchn **grp;
>> -
>> -        if ( !group_from_port(d, port) )
>> +        do
>>           {
>> -            grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
>> -            if ( !grp )
>> -                return -ENOMEM;
>> -            group_from_port(d, port) = grp;
>> -        }
>> +            struct evtchn *chn;
>> +            struct evtchn **grp;
>> +            unsigned int alloc_port = read_atomic(&d->valid_evtchns);
>>   
>> -        chn = alloc_evtchn_bucket(d, port);
>> -        if ( !chn )
>> -            return -ENOMEM;
>> -        bucket_from_port(d, port) = chn;
>> +            if ( !group_from_port(d, alloc_port) )
>> +            {
>> +                grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
>> +                if ( !grp )
>> +                    return -ENOMEM;
>> +                group_from_port(d, alloc_port) = grp;
>> +            }
>>   
>> -        /*
>> -         * d->valid_evtchns is used to check whether the bucket can be
>> -         * accessed without the per-domain lock. Therefore,
>> -         * d->valid_evtchns should be seen *after* the new bucket has
>> -         * been setup.
>> -         */
>> -        smp_wmb();
>> -        write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
>> +            chn = alloc_evtchn_bucket(d, alloc_port);
>> +            if ( !chn )
>> +                return -ENOMEM;
>> +            bucket_from_port(d, alloc_port) = chn;
>> +
>> +            /*
>> +             * d->valid_evtchns is used to check whether the bucket can be
>> +             * accessed without the per-domain lock. Therefore,
>> +             * d->valid_evtchns should be seen *after* the new bucket has
>> +             * been setup.
>> +             */
>> +            smp_wmb();
>> +            write_atomic(&d->valid_evtchns,
>> +                         d->valid_evtchns + EVTCHNS_PER_BUCKET);
>> +        } while ( port >= read_atomic(&d->valid_evtchns) );
> 
> This updating of d->valid_evtchns looks a little inconsistent to me,
> wrt the uses of {read,write}_atomic(). To make obvious that there's
> an implicit expectation that no 2nd invocation of this function
> could race the updates, I'd recommend reading allocate_port ahead
> of the loop and then never again. Here you'd then do
> 
>              smp_wmb();
>              allocate_port += EVTCHNS_PER_BUCKET;
>              write_atomic(&d->valid_evtchns, allocate_port);
>          } while ( port >= allocate_port );


I know it is my code. But I agree with this comment :).

> 
> at the same time rendering the code (imo) a little more legible.
> 
> Jan
Rahul Singh Aug. 23, 2022, 1:37 p.m. UTC | #3
Hi Jan.

> On 22 Aug 2022, at 2:08 pm, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 19.08.2022 12:02, Rahul Singh wrote:
>> From: Julien Grall <jgrall@amazon.com>
>> 
>> Since commit 01280dc19cf3 "evtchn: simplify port_is_valid()", the event
>> channels code assumes that all the buckets below d->valid_evtchns are
>> always allocated.
>> 
>> This assumption hold in most of the situation because a guest is not
>> allowed to chose the port. Instead, it will be the first free from port
>> 0.
>> 
>> When using Guest Transparent Migration and LiveUpdate, we will only
>> preserve ports that are currently in use. As a guest can open/close
>> event channels, this means the ports may be sparse.
>> 
>> The existing implementation of evtchn_allocate_port() is not able to
>> deal with such situation and will end up to override bucket or/and leave
>> some bucket unallocated. The latter will result to a droplet crash if
>> the event channel belongs to an unallocated bucket.
>> 
>> This can be solved by making sure that all the buckets below
>> d->valid_evtchns are allocated. There should be no impact for most of
>> the situation but LM/LU as only one bucket would be allocated. For
>> LM/LU, we may end up to allocate multiple buckets if ports in use are
>> sparse.
>> 
>> A potential alternative is to check that the bucket is valid in
>> is_port_valid(). This should still possible to do it without taking
>> per-domain lock but will result a couple more of memory access.
>> 
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Rahul Singh <rahul.singh@arm.com>
> 
> While I'm mostly okay with the code, I think the description wants
> changing / amending as long as the features talked about above aren't
> anywhere near reaching upstream (afaict), to at least _also_ mention
> the goal you have with this.

Ok. I will remove this and add that we need this patch to support static event channel.
Something like:
“ When static event channel support will be added for dom0less domains
  user can request to allocate the evtchn port numbers that are scattered
  in nature."

> 
>> Changes in v2:
>> - new patch in this version to fix the security issue
> 
> I guess you mean "avoid", not "fix".

Ack. 
> 
>> @@ -207,30 +216,35 @@ int evtchn_allocate_port(struct domain *d, evtchn_port_t port)
>>     }
>>     else
>>     {
>> -        struct evtchn *chn;
>> -        struct evtchn **grp;
>> -
>> -        if ( !group_from_port(d, port) )
>> +        do
>>         {
>> -            grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
>> -            if ( !grp )
>> -                return -ENOMEM;
>> -            group_from_port(d, port) = grp;
>> -        }
>> +            struct evtchn *chn;
>> +            struct evtchn **grp;
>> +            unsigned int alloc_port = read_atomic(&d->valid_evtchns);
>> 
>> -        chn = alloc_evtchn_bucket(d, port);
>> -        if ( !chn )
>> -            return -ENOMEM;
>> -        bucket_from_port(d, port) = chn;
>> +            if ( !group_from_port(d, alloc_port) )
>> +            {
>> +                grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
>> +                if ( !grp )
>> +                    return -ENOMEM;
>> +                group_from_port(d, alloc_port) = grp;
>> +            }
>> 
>> -        /*
>> -         * d->valid_evtchns is used to check whether the bucket can be
>> -         * accessed without the per-domain lock. Therefore,
>> -         * d->valid_evtchns should be seen *after* the new bucket has
>> -         * been setup.
>> -         */
>> -        smp_wmb();
>> -        write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
>> +            chn = alloc_evtchn_bucket(d, alloc_port);
>> +            if ( !chn )
>> +                return -ENOMEM;
>> +            bucket_from_port(d, alloc_port) = chn;
>> +
>> +            /*
>> +             * d->valid_evtchns is used to check whether the bucket can be
>> +             * accessed without the per-domain lock. Therefore,
>> +             * d->valid_evtchns should be seen *after* the new bucket has
>> +             * been setup.
>> +             */
>> +            smp_wmb();
>> +            write_atomic(&d->valid_evtchns,
>> +                         d->valid_evtchns + EVTCHNS_PER_BUCKET);
>> +        } while ( port >= read_atomic(&d->valid_evtchns) );
> 
> This updating of d->valid_evtchns looks a little inconsistent to me,
> wrt the uses of {read,write}_atomic(). To make obvious that there's
> an implicit expectation that no 2nd invocation of this function
> could race the updates, I'd recommend reading allocate_port ahead
> of the loop and then never again. Here you'd then do
> 
>            smp_wmb();
>            allocate_port += EVTCHNS_PER_BUCKET;
>            write_atomic(&d->valid_evtchns, allocate_port);
>        } while ( port >= allocate_port );
> 
> at the same time rendering the code (imo) a little more legible.
> 
> Jan

Ack. I will fix this as suggested by you I next version.

Regards,
Rahul
diff mbox series

Patch

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index c2c6f8c151..dbe0a27311 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -193,6 +193,15 @@  static struct evtchn *alloc_evtchn_bucket(struct domain *d, unsigned int port)
     return NULL;
 }
 
+/*
+ * Allocate a given port and ensure all the buckets up to that ports
+ * have been allocated.
+ *
+ * The last part is important because the rest of the event channel code
+ * relies on all the buckets up to d->valid_evtchns to be valid. However,
+ * event channels may be sparsed when restoring a domain during Guest
+ * Transparent Migration and Live Update.
+ */
 int evtchn_allocate_port(struct domain *d, evtchn_port_t port)
 {
     if ( port > d->max_evtchn_port || port >= max_evtchns(d) )
@@ -207,30 +216,35 @@  int evtchn_allocate_port(struct domain *d, evtchn_port_t port)
     }
     else
     {
-        struct evtchn *chn;
-        struct evtchn **grp;
-
-        if ( !group_from_port(d, port) )
+        do
         {
-            grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
-            if ( !grp )
-                return -ENOMEM;
-            group_from_port(d, port) = grp;
-        }
+            struct evtchn *chn;
+            struct evtchn **grp;
+            unsigned int alloc_port = read_atomic(&d->valid_evtchns);
 
-        chn = alloc_evtchn_bucket(d, port);
-        if ( !chn )
-            return -ENOMEM;
-        bucket_from_port(d, port) = chn;
+            if ( !group_from_port(d, alloc_port) )
+            {
+                grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
+                if ( !grp )
+                    return -ENOMEM;
+                group_from_port(d, alloc_port) = grp;
+            }
 
-        /*
-         * d->valid_evtchns is used to check whether the bucket can be
-         * accessed without the per-domain lock. Therefore,
-         * d->valid_evtchns should be seen *after* the new bucket has
-         * been setup.
-         */
-        smp_wmb();
-        write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
+            chn = alloc_evtchn_bucket(d, alloc_port);
+            if ( !chn )
+                return -ENOMEM;
+            bucket_from_port(d, alloc_port) = chn;
+
+            /*
+             * d->valid_evtchns is used to check whether the bucket can be
+             * accessed without the per-domain lock. Therefore,
+             * d->valid_evtchns should be seen *after* the new bucket has
+             * been setup.
+             */
+            smp_wmb();
+            write_atomic(&d->valid_evtchns,
+                         d->valid_evtchns + EVTCHNS_PER_BUCKET);
+        } while ( port >= read_atomic(&d->valid_evtchns) );
     }
 
     write_atomic(&d->active_evtchns, d->active_evtchns + 1);