Message ID | 20221101152842.4257-7-jgross@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | tools/xenstore: do some cleanup and fixes | expand |
Hi Juergen, On 01/11/2022 15:28, Juergen Gross wrote: > When a domain has been recognized to have stopped, remove all its > registered watches. This avoids sending watch events to the dead domain > when all the nodes related to it are being removed by the Xen tools. From my understanding, shutdown doesn't mean dead. It may be used during migration (or snapshotting), where we don't want to touch the state in case of a cancellation (or resume). For instance, see the command XS_RESUME which will clear domain->shutdown. > > Signed-off-by: Juergen Gross <jgross@suse.com> > --- > tools/xenstore/xenstored_domain.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c > index aa86892fed..1516df71d8 100644 > --- a/tools/xenstore/xenstored_domain.c > +++ b/tools/xenstore/xenstored_domain.c > @@ -364,6 +364,12 @@ void check_domains(void) > && !domain->shutdown) { > domain->shutdown = true; > notify = 1; > + /* > + * Avoid triggering watch events when the > + * domain's nodes are being deleted. > + */ > + if (domain->conn) > + conn_delete_all_watches(domain->conn); > } > if (!dominfo.dying) > continue; Cheers,
On 06.11.22 22:18, Julien Grall wrote: > Hi Juergen, > > On 01/11/2022 15:28, Juergen Gross wrote: >> When a domain has been recognized to have stopped, remove all its >> registered watches. This avoids sending watch events to the dead domain >> when all the nodes related to it are being removed by the Xen tools. > > From my understanding, shutdown doesn't mean dead. It may be used during > migration (or snapshotting), where we don't want to touch the state in case of a > cancellation (or resume). > > For instance, see the command XS_RESUME which will clear domain->shutdown. Oh, good catch! I need to additionally check the "shutdown reason". I can remove the watches only in case of the reason not having been "suspend". Juergen
On 07/11/2022 07:54, Juergen Gross wrote: > On 06.11.22 22:18, Julien Grall wrote: >> Hi Juergen, >> >> On 01/11/2022 15:28, Juergen Gross wrote: >>> When a domain has been recognized to have stopped, remove all its >>> registered watches. This avoids sending watch events to the dead domain >>> when all the nodes related to it are being removed by the Xen tools. >> >> From my understanding, shutdown doesn't mean dead. It may be used >> during migration (or snapshotting), where we don't want to touch the >> state in case of a cancellation (or resume). >> >> For instance, see the command XS_RESUME which will clear >> domain->shutdown. > > Oh, good catch! > > I need to additionally check the "shutdown reason". I can remove the > watches only in case of the reason not having been "suspend". This is quite fragile because we may add new shutdown code in the future that could resume. I think it would be better to only delete the watches if the domain is effectively dying (I don't think it can come back from that state) Cheers,
On 07.11.22 19:33, Julien Grall wrote: > > > On 07/11/2022 07:54, Juergen Gross wrote: >> On 06.11.22 22:18, Julien Grall wrote: >>> Hi Juergen, >>> >>> On 01/11/2022 15:28, Juergen Gross wrote: >>>> When a domain has been recognized to have stopped, remove all its >>>> registered watches. This avoids sending watch events to the dead domain >>>> when all the nodes related to it are being removed by the Xen tools. >>> >>> From my understanding, shutdown doesn't mean dead. It may be used during >>> migration (or snapshotting), where we don't want to touch the state in case >>> of a cancellation (or resume). >>> >>> For instance, see the command XS_RESUME which will clear domain->shutdown. >> >> Oh, good catch! >> >> I need to additionally check the "shutdown reason". I can remove the >> watches only in case of the reason not having been "suspend". > > This is quite fragile because we may add new shutdown code in the future that > could resume. > > I think it would be better to only delete the watches if the domain is > effectively dying (I don't think it can come back from that state) This is how it is done today. The domain's Xenstore entries are removed before the domain is being destroyed. Juergen
Hi Juergen, On 08/11/2022 07:54, Juergen Gross wrote: > On 07.11.22 19:33, Julien Grall wrote: >> >> >> On 07/11/2022 07:54, Juergen Gross wrote: >>> On 06.11.22 22:18, Julien Grall wrote: >>>> Hi Juergen, >>>> >>>> On 01/11/2022 15:28, Juergen Gross wrote: >>>>> When a domain has been recognized to have stopped, remove all its >>>>> registered watches. This avoids sending watch events to the dead >>>>> domain >>>>> when all the nodes related to it are being removed by the Xen tools. >>>> >>>> From my understanding, shutdown doesn't mean dead. It may be used >>>> during migration (or snapshotting), where we don't want to touch the >>>> state in case of a cancellation (or resume). >>>> >>>> For instance, see the command XS_RESUME which will clear >>>> domain->shutdown. >>> >>> Oh, good catch! >>> >>> I need to additionally check the "shutdown reason". I can remove the >>> watches only in case of the reason not having been "suspend". >> >> This is quite fragile because we may add new shutdown code in the >> future that could resume. >> >> I think it would be better to only delete the watches if the domain is >> effectively dying (I don't think it can come back from that state) > > This is how it is done today. Not really. dominfo.dying is only set if the domain is dead (i.e. DOMDYING_dead). This is happening *after* everything has been relinquished. So this is quite late compare to what you seem to want. > > The domain's Xenstore entries are removed before the domain is being > destroyed. Do you mean before domain_kill() is called? If so, maybe we should call domain_kill() before hand. The other alternative, would be to notify the Xenstored that domain is never going to come back. Cheers,
On 09.11.22 11:46, Julien Grall wrote: > Hi Juergen, > > On 08/11/2022 07:54, Juergen Gross wrote: >> On 07.11.22 19:33, Julien Grall wrote: >>> >>> >>> On 07/11/2022 07:54, Juergen Gross wrote: >>>> On 06.11.22 22:18, Julien Grall wrote: >>>>> Hi Juergen, >>>>> >>>>> On 01/11/2022 15:28, Juergen Gross wrote: >>>>>> When a domain has been recognized to have stopped, remove all its >>>>>> registered watches. This avoids sending watch events to the dead domain >>>>>> when all the nodes related to it are being removed by the Xen tools. >>>>> >>>>> From my understanding, shutdown doesn't mean dead. It may be used during >>>>> migration (or snapshotting), where we don't want to touch the state in case >>>>> of a cancellation (or resume). >>>>> >>>>> For instance, see the command XS_RESUME which will clear domain->shutdown. >>>> >>>> Oh, good catch! >>>> >>>> I need to additionally check the "shutdown reason". I can remove the >>>> watches only in case of the reason not having been "suspend". >>> >>> This is quite fragile because we may add new shutdown code in the future that >>> could resume. >>> >>> I think it would be better to only delete the watches if the domain is >>> effectively dying (I don't think it can come back from that state) >> >> This is how it is done today. > > Not really. dominfo.dying is only set if the domain is dead (i.e. > DOMDYING_dead). This is happening *after* everything has been relinquished. So > this is quite late compare to what you seem to want. I meant that the watches are removed today when the domain has been detected to by dying (so when removing the struct domain in xenstored). >> The domain's Xenstore entries are removed before the domain is being >> destroyed. > Do you mean before domain_kill() is called? If so, maybe we should call > domain_kill() before hand. This would probably introduce a race: the domid would no longer be reserved in the hypervisor, so a new domain with the same domid could show up and removal of the old domain data and introduction of the new domain data could interfere. > The other alternative, would be to notify the Xenstored that domain is never > going to come back. Yes, this should really work. xs_release_domain() should probably do the job. Juergen
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c index aa86892fed..1516df71d8 100644 --- a/tools/xenstore/xenstored_domain.c +++ b/tools/xenstore/xenstored_domain.c @@ -364,6 +364,12 @@ void check_domains(void) && !domain->shutdown) { domain->shutdown = true; notify = 1; + /* + * Avoid triggering watch events when the + * domain's nodes are being deleted. + */ + if (domain->conn) + conn_delete_all_watches(domain->conn); } if (!dominfo.dying) continue;
When a domain has been recognized to have stopped, remove all its registered watches. This avoids sending watch events to the dead domain when all the nodes related to it are being removed by the Xen tools. Signed-off-by: Juergen Gross <jgross@suse.com> --- tools/xenstore/xenstored_domain.c | 6 ++++++ 1 file changed, 6 insertions(+)