diff mbox series

[for-4.15] tools/xenstored: liveupdate: Properly check long transaction

Message ID 20210303170526.15903-1-julien@xen.org (mailing list archive)
State New, archived
Headers show
Series [for-4.15] tools/xenstored: liveupdate: Properly check long transaction | expand

Commit Message

Julien Grall March 3, 2021, 5:05 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

As XenStored is single-threaded, conn->ta_start_time will always be
smaller than now. As we substract the latter from the former, it means
a transaction will never be considered long running.

Invert the two operands of the substraction in both lu_reject_reason()
and lu_check_allowed(). In addition to that, the former also needs to
check that conn->ta_start_time is not 0 (i.e the transaction is not
active).

Take the opportunity to document the return condition of
lu_check_allowed().

Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no transaction active")
Reported-by: Bjoern Doebel <doebel@amazon.de>
Signed-off-by: Julien Grall <jgrall@amazon.com>

---

I am a bit puzzled on how -F is implemented. From my understanding we
will force LiveUpdate when one of the following conditions is met:
  1) All the active transactions are long running
  2) If we didn't manage to LiveUpdate after N sec

It is not quite clear why we need the both as 2) would indirectly cover
1). However 2) is probably unsafe as we may reset transactions for
"well-behaving" guest.

So I am thinking to send a patch to drop 2). Any opinions?

This patch is candidate for 4.15. Without it, the long-running
transactions are not properly accounted.
---
 tools/xenstore/xenstored_control.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Ian Jackson March 3, 2021, 5:41 p.m. UTC | #1
Julien Grall writes ("[PATCH for-4.15] tools/xenstored: liveupdate: Properly check long transaction"):
> From: Julien Grall <jgrall@amazon.com>
> 
> As XenStored is single-threaded, conn->ta_start_time will always be
> smaller than now. As we substract the latter from the former, it means
> a transaction will never be considered long running.
> 
> Invert the two operands of the substraction in both lu_reject_reason()
> and lu_check_allowed(). In addition to that, the former also needs to
> check that conn->ta_start_time is not 0 (i.e the transaction is not
> active).
> 
> Take the opportunity to document the return condition of
> lu_check_allowed().

AFAICT this only affects live updated which is not security-supported
in 4.15 and which won't block our tests.  So:

Release-Acked-by: Ian Jackson <iwj@xenproject.org>

Ian.
Julien Grall March 3, 2021, 6:08 p.m. UTC | #2
Hi Ian,

On 03/03/2021 17:41, Ian Jackson wrote:
> Julien Grall writes ("[PATCH for-4.15] tools/xenstored: liveupdate: Properly check long transaction"):
>> From: Julien Grall <jgrall@amazon.com>
>>
>> As XenStored is single-threaded, conn->ta_start_time will always be
>> smaller than now. As we substract the latter from the former, it means
>> a transaction will never be considered long running.
>>
>> Invert the two operands of the substraction in both lu_reject_reason()
>> and lu_check_allowed(). In addition to that, the former also needs to
>> check that conn->ta_start_time is not 0 (i.e the transaction is not
>> active).
>>
>> Take the opportunity to document the return condition of
>> lu_check_allowed().
> 
> AFAICT this only affects live updated which is not security-supported
> in 4.15 and which won't block our tests. 

That's correct.

> So:
> 
> Release-Acked-by: Ian Jackson <iwj@xenproject.org>

Thanks!

Cheers,
Ian Jackson March 3, 2021, 6:14 p.m. UTC | #3
Julien Grall writes ("Re: [PATCH for-4.15] tools/xenstored: liveupdate: Properly check long transaction"):
> On 03/03/2021 17:41, Ian Jackson wrote:
> > AFAICT this only affects live updated which is not security-supported
> > in 4.15 and which won't block our tests. 
> 
> That's correct.
> 
> > So:
> > 
> > Release-Acked-by: Ian Jackson <iwj@xenproject.org>
> 
> Thanks!

PS, notwithstanding the above, submissions of tests for this feature
would be very welcome.

(If they turn out to block things during the release, I always have
the force push hammer.)

Ian.
Jürgen Groß March 4, 2021, 9 a.m. UTC | #4
On 03.03.21 18:05, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> As XenStored is single-threaded, conn->ta_start_time will always be
> smaller than now. As we substract the latter from the former, it means
> a transaction will never be considered long running.
> 
> Invert the two operands of the substraction in both lu_reject_reason()
> and lu_check_allowed(). In addition to that, the former also needs to
> check that conn->ta_start_time is not 0 (i.e the transaction is not
> active).
> 
> Take the opportunity to document the return condition of
> lu_check_allowed().
> 
> Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no transaction active")
> Reported-by: Bjoern Doebel <doebel@amazon.de>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Juergen Gross <jgross@suse.com>

> 
> ---
> 
> I am a bit puzzled on how -F is implemented. From my understanding we
> will force LiveUpdate when one of the following conditions is met:
>    1) All the active transactions are long running
>    2) If we didn't manage to LiveUpdate after N sec
> 
> It is not quite clear why we need the both as 2) would indirectly cover
> 1). However 2) is probably unsafe as we may reset transactions for
> "well-behaving" guest.
> 
> So I am thinking to send a patch to drop 2). Any opinions?

This will enable two guests working together to block LU by using
overlapping transactions:

Guest 1: ----- ----- -----  ...
Guest 2: -- ----- ----- --- ...

There is always a transaction active, but none is regarded to be
long running.


Juergen
Julien Grall March 4, 2021, 9:39 a.m. UTC | #5
On 04/03/2021 09:00, Jürgen Groß wrote:
> On 03.03.21 18:05, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> As XenStored is single-threaded, conn->ta_start_time will always be
>> smaller than now. As we substract the latter from the former, it means
>> a transaction will never be considered long running.
>>
>> Invert the two operands of the substraction in both lu_reject_reason()
>> and lu_check_allowed(). In addition to that, the former also needs to
>> check that conn->ta_start_time is not 0 (i.e the transaction is not
>> active).
>>
>> Take the opportunity to document the return condition of
>> lu_check_allowed().
>>
>> Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no 
>> transaction active")
>> Reported-by: Bjoern Doebel <doebel@amazon.de>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
>>
>> ---
>>
>> I am a bit puzzled on how -F is implemented. From my understanding we
>> will force LiveUpdate when one of the following conditions is met:
>>    1) All the active transactions are long running
>>    2) If we didn't manage to LiveUpdate after N sec
>>
>> It is not quite clear why we need the both as 2) would indirectly cover
>> 1). However 2) is probably unsafe as we may reset transactions for
>> "well-behaving" guest.
>>
>> So I am thinking to send a patch to drop 2). Any opinions?
> 
> This will enable two guests working together to block LU by using
> overlapping transactions:
> 
> Guest 1: ----- ----- -----  ...
> Guest 2: -- ----- ----- --- ... >
> There is always a transaction active, but none is regarded to be
> long running.

Right, how do you know whether two guests are working together?

I understand that "-F" means that things could break... However, I am 
not entirely sure who can realistically use this option without risking 
breaking other guests. For instance, there might be a 3rd guest that has 
an active transaction and not cooperating with the first two.

Rather than forcing in this case, how about we quiesce the connection if 
it starts a transaction when LiveUpdate is pending?

Cheers,
Jürgen Groß March 4, 2021, 9:48 a.m. UTC | #6
On 04.03.21 10:39, Julien Grall wrote:
> 
> 
> On 04/03/2021 09:00, Jürgen Groß wrote:
>> On 03.03.21 18:05, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> As XenStored is single-threaded, conn->ta_start_time will always be
>>> smaller than now. As we substract the latter from the former, it means
>>> a transaction will never be considered long running.
>>>
>>> Invert the two operands of the substraction in both lu_reject_reason()
>>> and lu_check_allowed(). In addition to that, the former also needs to
>>> check that conn->ta_start_time is not 0 (i.e the transaction is not
>>> active).
>>>
>>> Take the opportunity to document the return condition of
>>> lu_check_allowed().
>>>
>>> Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no 
>>> transaction active")
>>> Reported-by: Bjoern Doebel <doebel@amazon.de>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
>>>
>>> ---
>>>
>>> I am a bit puzzled on how -F is implemented. From my understanding we
>>> will force LiveUpdate when one of the following conditions is met:
>>>    1) All the active transactions are long running
>>>    2) If we didn't manage to LiveUpdate after N sec
>>>
>>> It is not quite clear why we need the both as 2) would indirectly cover
>>> 1). However 2) is probably unsafe as we may reset transactions for
>>> "well-behaving" guest.
>>>
>>> So I am thinking to send a patch to drop 2). Any opinions?
>>
>> This will enable two guests working together to block LU by using
>> overlapping transactions:
>>
>> Guest 1: ----- ----- -----  ...
>> Guest 2: -- ----- ----- --- ... >
>> There is always a transaction active, but none is regarded to be
>> long running.
> 
> Right, how do you know whether two guests are working together?

We can't know that. And this is the reason why you have to use the -F
option to force a LU.

> I understand that "-F" means that things could break... However, I am 
> not entirely sure who can realistically use this option without risking 
> breaking other guests. For instance, there might be a 3rd guest that has 
> an active transaction and not cooperating with the first two.

Yes. OTOH the chances are rather low that multiple LU attempts are
failing due to transactions being active all the time.

> Rather than forcing in this case, how about we quiesce the connection if 
> it starts a transaction when LiveUpdate is pending?

Yes, this should work, too.


Juergen
Julien Grall March 4, 2021, 10:14 a.m. UTC | #7
Hi Juergen,

On 04/03/2021 09:48, Jürgen Groß wrote:
> On 04.03.21 10:39, Julien Grall wrote:
>>
>>
>> On 04/03/2021 09:00, Jürgen Groß wrote:
>>> On 03.03.21 18:05, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> As XenStored is single-threaded, conn->ta_start_time will always be
>>>> smaller than now. As we substract the latter from the former, it means
>>>> a transaction will never be considered long running.
>>>>
>>>> Invert the two operands of the substraction in both lu_reject_reason()
>>>> and lu_check_allowed(). In addition to that, the former also needs to
>>>> check that conn->ta_start_time is not 0 (i.e the transaction is not
>>>> active).
>>>>
>>>> Take the opportunity to document the return condition of
>>>> lu_check_allowed().
>>>>
>>>> Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no 
>>>> transaction active")
>>>> Reported-by: Bjoern Doebel <doebel@amazon.de>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>
>>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>>
>>>>
>>>> ---
>>>>
>>>> I am a bit puzzled on how -F is implemented. From my understanding we
>>>> will force LiveUpdate when one of the following conditions is met:
>>>>    1) All the active transactions are long running
>>>>    2) If we didn't manage to LiveUpdate after N sec
>>>>
>>>> It is not quite clear why we need the both as 2) would indirectly cover
>>>> 1). However 2) is probably unsafe as we may reset transactions for
>>>> "well-behaving" guest.
>>>>
>>>> So I am thinking to send a patch to drop 2). Any opinions?
>>>
>>> This will enable two guests working together to block LU by using
>>> overlapping transactions:
>>>
>>> Guest 1: ----- ----- -----  ...
>>> Guest 2: -- ----- ----- --- ... >
>>> There is always a transaction active, but none is regarded to be
>>> long running.
>>
>> Right, how do you know whether two guests are working together?
> 
> We can't know that. And this is the reason why you have to use the -F
> option to force a LU.

I understand that... But the consequence are potentially devastating on 
all the other connections, correct?

> 
>> I understand that "-F" means that things could break... However, I am 
>> not entirely sure who can realistically use this option without 
>> risking breaking other guests. For instance, there might be a 3rd 
>> guest that has an active transaction and not cooperating with the 
>> first two.
> 
> Yes. OTOH the chances are rather low that multiple LU attempts are
> failing due to transactions being active all the time.

Give me access to your server and I can run you a workload that prevent 
LiveUpdate without -F ;).

Joke aside, a guest crashing in the middle of the transaction can 
prevent LiveUpdate to succeed. As the guest owner may not be the host 
owner, you don't necessarily know when the problem will be remediated.

This is where I would expect -F to be useful as breaking transaction for 
such guest is low-risk. However, the side-effect of -F looks quite 
undesirable so far.

> 
>> Rather than forcing in this case, how about we quiesce the connection 
>> if it starts a transaction when LiveUpdate is pending?
> 
> Yes, this should work, too.

I will have a look. It is not going to be material for 4.15 though.

Cheers,
diff mbox series

Patch

diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
index d7ad112138b2..8e470f2b2056 100644
--- a/tools/xenstore/xenstored_control.c
+++ b/tools/xenstore/xenstored_control.c
@@ -459,11 +459,18 @@  static bool lu_check_lu_allowed(void)
 	list_for_each_entry(conn, &connections, list) {
 		if (conn->ta_start_time) {
 			ta_total++;
-			if (conn->ta_start_time - now >= lu_status->timeout)
+			if (now - conn->ta_start_time >= lu_status->timeout)
 				ta_long++;
 		}
 	}
 
+	/*
+	 * Allow LiveUpdate if one of the following conditions is met:
+	 *	- There is no active transactions
+	 *	- All transactions are long running (e.g. they have been
+	 *	active for more than lu_status->timeout sec) and the admin as
+	 *	requested to force the operation.
+	 */
 	return ta_total ? (lu_status->force && ta_long == ta_total) : true;
 }
 
@@ -474,11 +481,12 @@  static const char *lu_reject_reason(const void *ctx)
 	time_t now = time(NULL);
 
 	list_for_each_entry(conn, &connections, list) {
-		if (conn->ta_start_time - now >= lu_status->timeout) {
+		if (conn->ta_start_time &&
+		    (now - conn->ta_start_time >= lu_status->timeout)) {
 			ret = talloc_asprintf(ctx, "%s\nDomain %u: %ld s",
 					      ret ? : "Domains with long running transactions:",
 					      conn->id,
-					      conn->ta_start_time - now);
+					      now - conn->ta_start_time);
 		}
 	}