diff mbox series

[RFC,2/4] xen: add bitmap to indicate per-domain state changes

Message ID 20210914123600.1626-3-jgross@suse.com (mailing list archive)
State New, archived
Headers show
Series remove libxenctrl usage from xenstored | expand

Commit Message

Jürgen Groß Sept. 14, 2021, 12:35 p.m. UTC
Add a bitmap with one bit per possible domid indicating the respective
domain has changed its state (created, deleted, dying, crashed,
shutdown).

Registering the VIRQ_DOM_EXC event will result in setting the bits for
all existing domains and resetting all other bits.

Resetting a bit will be done in a future patch.

This information is needed for Xenstore to keep track of all domains.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/domain.c        | 22 ++++++++++++++++++++++
 xen/common/event_channel.c |  2 ++
 xen/include/xen/sched.h    |  2 ++
 3 files changed, 26 insertions(+)

Comments

Julien Grall Sept. 23, 2021, 9:16 a.m. UTC | #1
Hi Juergen,

On 14/09/2021 17:35, Juergen Gross wrote:
> Add a bitmap with one bit per possible domid indicating the respective
> domain has changed its state (created, deleted, dying, crashed,
> shutdown).
> 
> Registering the VIRQ_DOM_EXC event will result in setting the bits for
> all existing domains and resetting all other bits.
> 
> Resetting a bit will be done in a future patch.
> 
> This information is needed for Xenstore to keep track of all domains.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>   xen/common/domain.c        | 22 ++++++++++++++++++++++
>   xen/common/event_channel.c |  2 ++
>   xen/include/xen/sched.h    |  2 ++
>   3 files changed, 26 insertions(+)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 755349b93f..14a427e2ef 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -87,6 +87,22 @@ bool __read_mostly vmtrace_available;
>   /* Unique domain identifier, protected by domctl lock. */
>   static uint64_t unique_id;
>   
> +static DECLARE_BITMAP(dom_state_changed, DOMID_MASK + 1);
> +
> +void domain_reset_states(void)
> +{
> +    struct domain *d;
> +
> +    bitmap_zero(dom_state_changed, DOMID_MASK + 1);
> +
> +    rcu_read_lock(&domlist_read_lock);
> +
> +    for_each_domain ( d )
> +        set_bit(d->domain_id, dom_state_changed);
> +
> +    rcu_read_unlock(&domlist_read_lock);
> +}
> +
>   static void __domain_finalise_shutdown(struct domain *d)
>   {
>       struct vcpu *v;
> @@ -101,6 +117,7 @@ static void __domain_finalise_shutdown(struct domain *d)
>               return;
>   
>       d->is_shut_down = 1;
> +    set_bit(d->domain_id, dom_state_changed);

Looking at the follow-up patch, I think you want an smp_wmb() before the 
set_bit(), otherwise the two writes could be re-ordered on Arm (set_bit() 
doesn't guarantee any ordering). This avoids any issue if the new 
hypercall runs concurrently.

>       if ( (d->shutdown_code == SHUTDOWN_suspend) && d->suspend_evtchn )
>           evtchn_send(d, d->suspend_evtchn);
>       else
> @@ -720,6 +737,8 @@ struct domain *domain_create(domid_t domid,
>           rcu_assign_pointer(domain_hash[DOMAIN_HASH(domid)], d);
>           spin_unlock(&domlist_update_lock);
>   
> +        set_bit(d->domain_id, dom_state_changed);

Here you should not need one because spin_unlock() contains one.

> +
>           memcpy(d->handle, config->handle, sizeof(d->handle));
>       }
>   
> @@ -954,6 +973,7 @@ int domain_kill(struct domain *d)
>           /* Mem event cleanup has to go here because the rings
>            * have to be put before we call put_domain. */
>           vm_event_cleanup(d);
> +        set_bit(d->domain_id, dom_state_changed);

You might want an smp_wmb() here as well.

>           put_domain(d);
>           send_global_virq(VIRQ_DOM_EXC);
>           /* fallthrough */
> @@ -1141,6 +1161,8 @@ static void complete_domain_destroy(struct rcu_head *head)
>   
>       xfree(d->vcpu);
>   
> +    set_bit(d->domain_id, dom_state_changed);
> +

I think this wants to be moved *after* _domain_destroy(), with an 
smp_wmb() between the two. That way, when the bit is set you know the 
domain is dead.

>       _domain_destroy(d);
>   
>       send_global_virq(VIRQ_DOM_EXC);
> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
> index da88ad141a..e2a416052b 100644
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -1218,6 +1218,8 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>           rc = evtchn_bind_virq(&bind_virq, 0);
>           if ( !rc && __copy_to_guest(arg, &bind_virq, 1) )
>               rc = -EFAULT; /* Cleaning up here would be a mess! */
> +        if ( !rc && bind_virq.virq == VIRQ_DOM_EXC )
> +            domain_reset_states();
>           break;
>       }
>   
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index b827c5af8d..2ae26bc50e 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -730,6 +730,8 @@ void domain_resume(struct domain *d);
>   
>   int domain_soft_reset(struct domain *d, bool resuming);
>   
> +void domain_reset_states(void);
> +
>   int vcpu_start_shutdown_deferral(struct vcpu *v);
>   void vcpu_end_shutdown_deferral(struct vcpu *v);
>   
> 

Cheers,
Jan Beulich Nov. 22, 2021, 10:41 a.m. UTC | #2
On 14.09.2021 14:35, Juergen Gross wrote:
> Add a bitmap with one bit per possible domid indicating the respective
> domain has changed its state (created, deleted, dying, crashed,
> shutdown).
> 
> Registering the VIRQ_DOM_EXC event will result in setting the bits for
> all existing domains and resetting all other bits.

Generally I view VIRQ_DOM_EXC as overly restrictive already - what if both
a xenstore domain and a control domain want to be notified? Hence,
similarly, I'm not convinced a single, global instance will do here. Which
in turn raises the question whether the chosen approach takes us far
enough, and whether we wouldn't instead want a more precise notification
model (i.e. not just "something has changed").

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -87,6 +87,22 @@ bool __read_mostly vmtrace_available;
>  /* Unique domain identifier, protected by domctl lock. */
>  static uint64_t unique_id;
>  
> +static DECLARE_BITMAP(dom_state_changed, DOMID_MASK + 1);

While not really making a difference to the size of the bitmap, afaict up
to DOMID_FIRST_RESERVED would do. Neither system domains nor idle ones
will reach, in particular, the set_bit() in domain_create(). And none of
them can be subject to destruction.

Also I think this could do with a brief comment as to what causes bits
to be set. This would avoid readers having to go hunt down all the
set_bit() (or the commit introducing the bitmap).

> +void domain_reset_states(void)
> +{
> +    struct domain *d;
> +
> +    bitmap_zero(dom_state_changed, DOMID_MASK + 1);

While this looks to be fine with the present updates of the bitmap, I
still wonder about the non-atomicity here vs the atomic updates
everywhere else. It feels like some locking is needed to be future-proof.
Since you come here from VIRQ_DOM_EXC binding, it could be that domain's
per-domain lock.

> @@ -1141,6 +1161,8 @@ static void complete_domain_destroy(struct rcu_head *head)
>  
>      xfree(d->vcpu);
>  
> +    set_bit(d->domain_id, dom_state_changed);
> +
>      _domain_destroy(d);
>  
>      send_global_virq(VIRQ_DOM_EXC);

Wouldn't this better be in domain_destroy() immediately after the domain
has been taken off the list, and hence is no longer "discoverable"?

Jan
Jürgen Groß Nov. 22, 2021, 12:46 p.m. UTC | #3
On 22.11.21 11:41, Jan Beulich wrote:
> On 14.09.2021 14:35, Juergen Gross wrote:
>> Add a bitmap with one bit per possible domid indicating the respective
>> domain has changed its state (created, deleted, dying, crashed,
>> shutdown).
>>
>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>> all existing domains and resetting all other bits.
> 
> Generally I view VIRQ_DOM_EXC as overly restrictive already - what if both
> a xenstore domain and a control domain want respective notification? Hence

The general idea was that in this case the control domain should
register a related Xenstore watch.

> similarly I'm not convinced a single, global instance will do here. Which
> in turn raises the question whether the approach chosen may not take us
> far enough, and we wouldn't instead want a more precise notification model
> (i.e. not just "something has changed").

Yes, that would be the job of Xenstore. It would provide the more
fine grained watches for that purpose.

> 
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -87,6 +87,22 @@ bool __read_mostly vmtrace_available;
>>   /* Unique domain identifier, protected by domctl lock. */
>>   static uint64_t unique_id;
>>   
>> +static DECLARE_BITMAP(dom_state_changed, DOMID_MASK + 1);
> 
> While not really making a difference to the size of the bitmap, afaict up
> to DOMID_FIRST_RESERVED would do. Neither system domains nor idle ones
> will reach, in particular, the set_bit() in domain_create(). And none of
> them can be subject to destruction.

I'd be fine either way.

> Also I think this could do with a brief comment as to what causes bits
> to be set. This would avoid readers having to go hunt down all the
> set_bit() (or the commit introducing the bitmap).

Okay.

> 
>> +void domain_reset_states(void)
>> +{
>> +    struct domain *d;
>> +
>> +    bitmap_zero(dom_state_changed, DOMID_MASK + 1);
> 
> While this looks to be fine with the present updates of the bitmap, I
> still wonder about the non-atomicity here vs the atomic updates
> everywhere else. It feels like there's some locking needed to be future
> proof. Since you come here from VIRQ_DOM_EXC binding, it could be that
> domain's per-domain lock.

Fine with me.

> 
>> @@ -1141,6 +1161,8 @@ static void complete_domain_destroy(struct rcu_head *head)
>>   
>>       xfree(d->vcpu);
>>   
>> +    set_bit(d->domain_id, dom_state_changed);
>> +
>>       _domain_destroy(d);
>>   
>>       send_global_virq(VIRQ_DOM_EXC);
> 
> Wouldn't this better be in domain_destroy() immediately after the domain
> has been taken off the list, and hence is no longer "discoverable"?

In this case the call of send_global_virq() should be moved, too,
I guess?


Juergen
Jan Beulich Nov. 22, 2021, 1:39 p.m. UTC | #4
On 22.11.2021 13:46, Juergen Gross wrote:
> On 22.11.21 11:41, Jan Beulich wrote:
>> On 14.09.2021 14:35, Juergen Gross wrote:
>>> Add a bitmap with one bit per possible domid indicating the respective
>>> domain has changed its state (created, deleted, dying, crashed,
>>> shutdown).
>>>
>>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>>> all existing domains and resetting all other bits.
>>
>> Generally I view VIRQ_DOM_EXC as overly restrictive already - what if both
>> a xenstore domain and a control domain want respective notification? Hence
> 
> The general idea was that in this case the control domain should
> register a related Xenstore watch.
> 
>> similarly I'm not convinced a single, global instance will do here. Which
>> in turn raises the question whether the approach chosen may not take us
>> far enough, and we wouldn't instead want a more precise notification model
>> (i.e. not just "something has changed").
> 
> Yes, that would be the job of Xenstore. It would provide the more
> fine grained watches for that purpose.

And the watch consumer still wouldn't have a way to distinguish two domain
instances using the same ID, would it?

>>> @@ -1141,6 +1161,8 @@ static void complete_domain_destroy(struct rcu_head *head)
>>>         xfree(d->vcpu);
>>>   +    set_bit(d->domain_id, dom_state_changed);
>>> +
>>>       _domain_destroy(d);
>>>         send_global_virq(VIRQ_DOM_EXC);
>>
>> Wouldn't this better be in domain_destroy() immediately after the domain
>> has been taken off the list, and hence is no longer "discoverable"?
> 
> In this case the call of send_global_virq() should be moved, too,
> I guess?

Possibly, albeit I'd see this as a separate change to make. It doesn't
seem outright wrong for the vIRQ to be sent late. But I agree with the
idea of keeping sending and bit-setting together (ideally, even if this
were to stay here, the two would better sit truly side by side).

Jan
Jürgen Groß Nov. 25, 2021, 6:30 a.m. UTC | #5
On 22.11.21 14:39, Jan Beulich wrote:
> On 22.11.2021 13:46, Juergen Gross wrote:
>> On 22.11.21 11:41, Jan Beulich wrote:
>>> On 14.09.2021 14:35, Juergen Gross wrote:
>>>> Add a bitmap with one bit per possible domid indicating the respective
>>>> domain has changed its state (created, deleted, dying, crashed,
>>>> shutdown).
>>>>
>>>> Registering the VIRQ_DOM_EXC event will result in setting the bits for
>>>> all existing domains and resetting all other bits.
>>>
>>> Generally I view VIRQ_DOM_EXC as overly restrictive already - what if both
>>> a xenstore domain and a control domain want respective notification? Hence
>>
>> The general idea was that in this case the control domain should
>> register a related Xenstore watch.
>>
>>> similarly I'm not convinced a single, global instance will do here. Which
>>> in turn raises the question whether the approach chosen may not take us
>>> far enough, and we wouldn't instead want a more precise notification model
>>> (i.e. not just "something has changed").
>>
>> Yes, that would be the job of Xenstore. It would provide the more
>> fine grained watches for that purpose.
> 
> And the watch consumer still wouldn't have a way to distinguish two domain
> instances using the same ID, would it?

My further plans include new watches for domain creation/destroy
which will include the domid in the watch path reported with the
watch event. So anyone registering for domain creation watches
would no longer need another way to distinguish domains with the
same domid.

The main question remaining here is whether we are okay to let
Xenstore be the instance providing that functionality to the rest
of the stack, or if we want that functionality in the hypervisor,
with all the related access control that would need (today Xenstore
allows controlling which other domains may use the create/destroy
watches).

>>>> @@ -1141,6 +1161,8 @@ static void complete_domain_destroy(struct rcu_head *head)
>>>>          xfree(d->vcpu);
>>>>    +    set_bit(d->domain_id, dom_state_changed);
>>>> +
>>>>        _domain_destroy(d);
>>>>          send_global_virq(VIRQ_DOM_EXC);
>>>
>>> Wouldn't this better be in domain_destroy() immediately after the domain
>>> has been taken off the list, and hence is no longer "discoverable"?
>>
>> In this case the call of send_global_virq() should be moved, too,
>> I guess?
> 
> Possibly, albeit I'd see this as a separate change to make. It doesn't
> seem outright wrong for the vIRQ to be sent late. But I agree with the
> idea of keeping sending and bit-setting together (ideally, even if this
> was to stay here, the two would better sit truly side by side).

Okay, I'll prepend another patch moving the call of send_global_virq()
to domain_destroy().


Juergen
diff mbox series

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 755349b93f..14a427e2ef 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -87,6 +87,22 @@  bool __read_mostly vmtrace_available;
 /* Unique domain identifier, protected by domctl lock. */
 static uint64_t unique_id;
 
+static DECLARE_BITMAP(dom_state_changed, DOMID_MASK + 1);
+
+void domain_reset_states(void)
+{
+    struct domain *d;
+
+    bitmap_zero(dom_state_changed, DOMID_MASK + 1);
+
+    rcu_read_lock(&domlist_read_lock);
+
+    for_each_domain ( d )
+        set_bit(d->domain_id, dom_state_changed);
+
+    rcu_read_unlock(&domlist_read_lock);
+}
+
 static void __domain_finalise_shutdown(struct domain *d)
 {
     struct vcpu *v;
@@ -101,6 +117,7 @@  static void __domain_finalise_shutdown(struct domain *d)
             return;
 
     d->is_shut_down = 1;
+    set_bit(d->domain_id, dom_state_changed);
     if ( (d->shutdown_code == SHUTDOWN_suspend) && d->suspend_evtchn )
         evtchn_send(d, d->suspend_evtchn);
     else
@@ -720,6 +737,8 @@  struct domain *domain_create(domid_t domid,
         rcu_assign_pointer(domain_hash[DOMAIN_HASH(domid)], d);
         spin_unlock(&domlist_update_lock);
 
+        set_bit(d->domain_id, dom_state_changed);
+
         memcpy(d->handle, config->handle, sizeof(d->handle));
     }
 
@@ -954,6 +973,7 @@  int domain_kill(struct domain *d)
         /* Mem event cleanup has to go here because the rings 
          * have to be put before we call put_domain. */
         vm_event_cleanup(d);
+        set_bit(d->domain_id, dom_state_changed);
         put_domain(d);
         send_global_virq(VIRQ_DOM_EXC);
         /* fallthrough */
@@ -1141,6 +1161,8 @@  static void complete_domain_destroy(struct rcu_head *head)
 
     xfree(d->vcpu);
 
+    set_bit(d->domain_id, dom_state_changed);
+
     _domain_destroy(d);
 
     send_global_virq(VIRQ_DOM_EXC);
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index da88ad141a..e2a416052b 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1218,6 +1218,8 @@  long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = evtchn_bind_virq(&bind_virq, 0);
         if ( !rc && __copy_to_guest(arg, &bind_virq, 1) )
             rc = -EFAULT; /* Cleaning up here would be a mess! */
+        if ( !rc && bind_virq.virq == VIRQ_DOM_EXC )
+            domain_reset_states();
         break;
     }
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b827c5af8d..2ae26bc50e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -730,6 +730,8 @@  void domain_resume(struct domain *d);
 
 int domain_soft_reset(struct domain *d, bool resuming);
 
+void domain_reset_states(void);
+
 int vcpu_start_shutdown_deferral(struct vcpu *v);
 void vcpu_end_shutdown_deferral(struct vcpu *v);