diff mbox series

[06/20] tools/xenstore: remove all watches when a domain has stopped

Message ID 20221101152842.4257-7-jgross@suse.com (mailing list archive)
State Superseded
Headers show
Series tools/xenstore: do some cleanup and fixes | expand

Commit Message

Jürgen Groß Nov. 1, 2022, 3:28 p.m. UTC
When a domain has been recognized to have stopped, remove all its
registered watches. This avoids sending watch events to the dead domain
when all the nodes related to it are being removed by the Xen tools.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_domain.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Julien Grall Nov. 6, 2022, 9:18 p.m. UTC | #1
Hi Juergen,

On 01/11/2022 15:28, Juergen Gross wrote:
> When a domain has been recognized to have stopped, remove all its
> registered watches. This avoids sending watch events to the dead domain
> when all the nodes related to it are being removed by the Xen tools.

 From my understanding, shutdown doesn't mean dead. It may be used 
during migration (or snapshotting), where we don't want to touch the 
state in case of a cancellation (or resume).

For instance, see the command XS_RESUME which will clear domain->shutdown.

> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>   tools/xenstore/xenstored_domain.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
> index aa86892fed..1516df71d8 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -364,6 +364,12 @@ void check_domains(void)
>   			    && !domain->shutdown) {
>   				domain->shutdown = true;
>   				notify = 1;
> +				/*
> +				 * Avoid triggering watch events when the
> +				 * domain's nodes are being deleted.
> +				 */
> +				if (domain->conn)
> +					conn_delete_all_watches(domain->conn);
>   			}
>   			if (!dominfo.dying)
>   				continue;

Cheers,
Jürgen Groß Nov. 7, 2022, 7:54 a.m. UTC | #2
On 06.11.22 22:18, Julien Grall wrote:
> Hi Juergen,
> 
> On 01/11/2022 15:28, Juergen Gross wrote:
>> When a domain has been recognized to have stopped, remove all its
>> registered watches. This avoids sending watch events to the dead domain
>> when all the nodes related to it are being removed by the Xen tools.
> 
>  From my understanding, shutdown doesn't mean dead. It may be used during 
> migration (or snapshotting), where we don't want to touch the state in case of a 
> cancellation (or resume).
> 
> For instance, see the command XS_RESUME which will clear domain->shutdown.

Oh, good catch!

I need to additionally check the "shutdown reason". I can remove the
watches only in case of the reason not having been "suspend".


Juergen
Julien Grall Nov. 7, 2022, 6:33 p.m. UTC | #3
On 07/11/2022 07:54, Juergen Gross wrote:
> On 06.11.22 22:18, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 01/11/2022 15:28, Juergen Gross wrote:
>>> When a domain has been recognized to have stopped, remove all its
>>> registered watches. This avoids sending watch events to the dead domain
>>> when all the nodes related to it are being removed by the Xen tools.
>>
>>  From my understanding, shutdown doesn't mean dead. It may be used 
>> during migration (or snapshotting), where we don't want to touch the 
>> state in case of a cancellation (or resume).
>>
>> For instance, see the command XS_RESUME which will clear 
>> domain->shutdown.
> 
> Oh, good catch!
> 
> I need to additionally check the "shutdown reason". I can remove the
> watches only in case of the reason not having been "suspend".

This is quite fragile because we may add new shutdown code in the future 
that could resume.

I think it would be better to only delete the watches if the domain is 
effectively dying (I don't think it can come back from that state)

Cheers,
Jürgen Groß Nov. 8, 2022, 7:54 a.m. UTC | #4
On 07.11.22 19:33, Julien Grall wrote:
> 
> 
> On 07/11/2022 07:54, Juergen Gross wrote:
>> On 06.11.22 22:18, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 01/11/2022 15:28, Juergen Gross wrote:
>>>> When a domain has been recognized to have stopped, remove all its
>>>> registered watches. This avoids sending watch events to the dead domain
>>>> when all the nodes related to it are being removed by the Xen tools.
>>>
>>>  From my understanding, shutdown doesn't mean dead. It may be used during 
>>> migration (or snapshotting), where we don't want to touch the state in case 
>>> of a cancellation (or resume).
>>>
>>> For instance, see the command XS_RESUME which will clear domain->shutdown.
>>
>> Oh, good catch!
>>
>> I need to additionally check the "shutdown reason". I can remove the
>> watches only in case of the reason not having been "suspend".
> 
> This is quite fragile because we may add new shutdown code in the future that 
> could resume.
> 
> I think it would be better to only delete the watches if the domain is 
> effectively dying (I don't think it can come back from that state)

This is how it is done today.

The domain's Xenstore entries are removed before the domain is being
destroyed.


Juergen
Julien Grall Nov. 9, 2022, 10:46 a.m. UTC | #5
Hi Juergen,

On 08/11/2022 07:54, Juergen Gross wrote:
> On 07.11.22 19:33, Julien Grall wrote:
>>
>>
>> On 07/11/2022 07:54, Juergen Gross wrote:
>>> On 06.11.22 22:18, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 01/11/2022 15:28, Juergen Gross wrote:
>>>>> When a domain has been recognized to have stopped, remove all its
>>>>> registered watches. This avoids sending watch events to the dead 
>>>>> domain
>>>>> when all the nodes related to it are being removed by the Xen tools.
>>>>
>>>>  From my understanding, shutdown doesn't mean dead. It may be used 
>>>> during migration (or snapshotting), where we don't want to touch the 
>>>> state in case of a cancellation (or resume).
>>>>
>>>> For instance, see the command XS_RESUME which will clear 
>>>> domain->shutdown.
>>>
>>> Oh, good catch!
>>>
>>> I need to additionally check the "shutdown reason". I can remove the
>>> watches only in case of the reason not having been "suspend".
>>
>> This is quite fragile because we may add new shutdown code in the 
>> future that could resume.
>>
>> I think it would be better to only delete the watches if the domain is 
>> effectively dying (I don't think it can come back from that state)
> 
> This is how it is done today.

Not really. dominfo.dying is only set if the domain is dead (i.e. 
DOMDYING_dead). This is happening *after* everything has been 
relinquished. So this is quite late compare to what you seem to want.

> 
> The domain's Xenstore entries are removed before the domain is being
> destroyed.
Do you mean before domain_kill() is called? If so, maybe we should call 
domain_kill() before hand.

The other alternative, would be to notify the Xenstored that domain is 
never going to come back.

Cheers,
Jürgen Groß Nov. 9, 2022, 12:17 p.m. UTC | #6
On 09.11.22 11:46, Julien Grall wrote:
> Hi Juergen,
> 
> On 08/11/2022 07:54, Juergen Gross wrote:
>> On 07.11.22 19:33, Julien Grall wrote:
>>>
>>>
>>> On 07/11/2022 07:54, Juergen Gross wrote:
>>>> On 06.11.22 22:18, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> On 01/11/2022 15:28, Juergen Gross wrote:
>>>>>> When a domain has been recognized to have stopped, remove all its
>>>>>> registered watches. This avoids sending watch events to the dead domain
>>>>>> when all the nodes related to it are being removed by the Xen tools.
>>>>>
>>>>>  From my understanding, shutdown doesn't mean dead. It may be used during 
>>>>> migration (or snapshotting), where we don't want to touch the state in case 
>>>>> of a cancellation (or resume).
>>>>>
>>>>> For instance, see the command XS_RESUME which will clear domain->shutdown.
>>>>
>>>> Oh, good catch!
>>>>
>>>> I need to additionally check the "shutdown reason". I can remove the
>>>> watches only in case of the reason not having been "suspend".
>>>
>>> This is quite fragile because we may add new shutdown code in the future that 
>>> could resume.
>>>
>>> I think it would be better to only delete the watches if the domain is 
>>> effectively dying (I don't think it can come back from that state)
>>
>> This is how it is done today.
> 
> Not really. dominfo.dying is only set if the domain is dead (i.e. 
> DOMDYING_dead). This is happening *after* everything has been relinquished. So 
> this is quite late compare to what you seem to want.

I meant that the watches are removed today when the domain has been
detected to by dying (so when removing the struct domain in xenstored).

>> The domain's Xenstore entries are removed before the domain is being
>> destroyed.
> Do you mean before domain_kill() is called? If so, maybe we should call 
> domain_kill() before hand.

This would probably introduce a race: the domid would no longer be reserved
in the hypervisor, so a new domain with the same domid could show up and
removal of the old domain data and introduction of the new domain data could
interfere.

> The other alternative, would be to notify the Xenstored that domain is never 
> going to come back.

Yes, this should really work.

xs_release_domain() should probably do the job.


Juergen
diff mbox series

Patch

diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index aa86892fed..1516df71d8 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -364,6 +364,12 @@  void check_domains(void)
 			    && !domain->shutdown) {
 				domain->shutdown = true;
 				notify = 1;
+				/*
+				 * Avoid triggering watch events when the
+				 * domain's nodes are being deleted.
+				 */
+				if (domain->conn)
+					conn_delete_all_watches(domain->conn);
 			}
 			if (!dominfo.dying)
 				continue;