Message ID | 20210303170526.15903-1-julien@xen.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for-4.15] tools/xenstored: liveupdate: Properly check long transaction | expand |
Julien Grall writes ("[PATCH for-4.15] tools/xenstored: liveupdate: Properly check long transaction"): > From: Julien Grall <jgrall@amazon.com> > > As XenStored is single-threaded, conn->ta_start_time will always be > smaller than now. As we substract the latter from the former, it means > a transaction will never be considered long running. > > Invert the two operands of the substraction in both lu_reject_reason() > and lu_check_allowed(). In addition to that, the former also needs to > check that conn->ta_start_time is not 0 (i.e the transaction is not > active). > > Take the opportunity to document the return condition of > lu_check_allowed(). AFAICT this only affects live updated which is not security-supported in 4.15 and which won't block our tests. So: Release-Acked-by: Ian Jackson <iwj@xenproject.org> Ian.
Hi Ian, On 03/03/2021 17:41, Ian Jackson wrote: > Julien Grall writes ("[PATCH for-4.15] tools/xenstored: liveupdate: Properly check long transaction"): >> From: Julien Grall <jgrall@amazon.com> >> >> As XenStored is single-threaded, conn->ta_start_time will always be >> smaller than now. As we substract the latter from the former, it means >> a transaction will never be considered long running. >> >> Invert the two operands of the substraction in both lu_reject_reason() >> and lu_check_allowed(). In addition to that, the former also needs to >> check that conn->ta_start_time is not 0 (i.e the transaction is not >> active). >> >> Take the opportunity to document the return condition of >> lu_check_allowed(). > > AFAICT this only affects live updated which is not security-supported > in 4.15 and which won't block our tests. That's correct. > So: > > Release-Acked-by: Ian Jackson <iwj@xenproject.org> Thanks! Cheers,
Julien Grall writes ("Re: [PATCH for-4.15] tools/xenstored: liveupdate: Properly check long transaction"): > On 03/03/2021 17:41, Ian Jackson wrote: > > AFAICT this only affects live updated which is not security-supported > > in 4.15 and which won't block our tests. > > That's correct. > > > So: > > > > Release-Acked-by: Ian Jackson <iwj@xenproject.org> > > Thanks! PS, notwithstanding the above, submissions of tests for this feature would be very welcome. (If they turn out to block things during the release, I always have the force push hammer.) Ian.
On 03.03.21 18:05, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > As XenStored is single-threaded, conn->ta_start_time will always be > smaller than now. As we substract the latter from the former, it means > a transaction will never be considered long running. > > Invert the two operands of the substraction in both lu_reject_reason() > and lu_check_allowed(). In addition to that, the former also needs to > check that conn->ta_start_time is not 0 (i.e the transaction is not > active). > > Take the opportunity to document the return condition of > lu_check_allowed(). > > Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no transaction active") > Reported-by: Bjoern Doebel <doebel@amazon.de> > Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> > > --- > > I am a bit puzzled on how -F is implemented. From my understanding we > will force LiveUpdate when one of the following conditions is met: > 1) All the active transactions are long running > 2) If we didn't manage to LiveUpdate after N sec > > It is not quite clear why we need the both as 2) would indirectly cover > 1). However 2) is probably unsafe as we may reset transactions for > "well-behaving" guest. > > So I am thinking to send a patch to drop 2). Any opinions? This will enable two guests working together to block LU by using overlapping transactions: Guest 1: ----- ----- ----- ... Guest 2: -- ----- ----- --- ... There is always a transaction active, but none is regarded to be long running. Juergen
On 04/03/2021 09:00, Jürgen Groß wrote: > On 03.03.21 18:05, Julien Grall wrote: >> From: Julien Grall <jgrall@amazon.com> >> >> As XenStored is single-threaded, conn->ta_start_time will always be >> smaller than now. As we substract the latter from the former, it means >> a transaction will never be considered long running. >> >> Invert the two operands of the substraction in both lu_reject_reason() >> and lu_check_allowed(). In addition to that, the former also needs to >> check that conn->ta_start_time is not 0 (i.e the transaction is not >> active). >> >> Take the opportunity to document the return condition of >> lu_check_allowed(). >> >> Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no >> transaction active") >> Reported-by: Bjoern Doebel <doebel@amazon.de> >> Signed-off-by: Julien Grall <jgrall@amazon.com> > > Reviewed-by: Juergen Gross <jgross@suse.com> > >> >> --- >> >> I am a bit puzzled on how -F is implemented. From my understanding we >> will force LiveUpdate when one of the following conditions is met: >> 1) All the active transactions are long running >> 2) If we didn't manage to LiveUpdate after N sec >> >> It is not quite clear why we need the both as 2) would indirectly cover >> 1). However 2) is probably unsafe as we may reset transactions for >> "well-behaving" guest. >> >> So I am thinking to send a patch to drop 2). Any opinions? > > This will enable two guests working together to block LU by using > overlapping transactions: > > Guest 1: ----- ----- ----- ... > Guest 2: -- ----- ----- --- ... > > There is always a transaction active, but none is regarded to be > long running. Right, how do you know whether two guests are working together? I understand that "-F" means that things could break... However, I am not entirely sure who can realistically use this option without risking breaking other guests. For instance, there might be a 3rd guest that has an active transaction and not cooperating with the first two. Rather than forcing in this case, how about we quiesce the connection if it starts a transaction when LiveUpdate is pending? Cheers,
On 04.03.21 10:39, Julien Grall wrote: > > > On 04/03/2021 09:00, Jürgen Groß wrote: >> On 03.03.21 18:05, Julien Grall wrote: >>> From: Julien Grall <jgrall@amazon.com> >>> >>> As XenStored is single-threaded, conn->ta_start_time will always be >>> smaller than now. As we substract the latter from the former, it means >>> a transaction will never be considered long running. >>> >>> Invert the two operands of the substraction in both lu_reject_reason() >>> and lu_check_allowed(). In addition to that, the former also needs to >>> check that conn->ta_start_time is not 0 (i.e the transaction is not >>> active). >>> >>> Take the opportunity to document the return condition of >>> lu_check_allowed(). >>> >>> Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no >>> transaction active") >>> Reported-by: Bjoern Doebel <doebel@amazon.de> >>> Signed-off-by: Julien Grall <jgrall@amazon.com> >> >> Reviewed-by: Juergen Gross <jgross@suse.com> >> >>> >>> --- >>> >>> I am a bit puzzled on how -F is implemented. From my understanding we >>> will force LiveUpdate when one of the following conditions is met: >>> 1) All the active transactions are long running >>> 2) If we didn't manage to LiveUpdate after N sec >>> >>> It is not quite clear why we need the both as 2) would indirectly cover >>> 1). However 2) is probably unsafe as we may reset transactions for >>> "well-behaving" guest. >>> >>> So I am thinking to send a patch to drop 2). Any opinions? >> >> This will enable two guests working together to block LU by using >> overlapping transactions: >> >> Guest 1: ----- ----- ----- ... >> Guest 2: -- ----- ----- --- ... > >> There is always a transaction active, but none is regarded to be >> long running. > > Right, how do you know whether two guests are working together? We can't know that. And this is the reason why you have to use the -F option to force a LU. > I understand that "-F" means that things could break... However, I am > not entirely sure who can realistically use this option without risking > breaking other guests. For instance, there might be a 3rd guest that has > an active transaction and not cooperating with the first two. Yes. OTOH the chances are rather low that multiple LU attempts are failing due to transactions being active all the time. > Rather than forcing in this case, how about we quiesce the connection if > it starts a transaction when LiveUpdate is pending? Yes, this should work, too. Juergen
Hi Juergen, On 04/03/2021 09:48, Jürgen Groß wrote: > On 04.03.21 10:39, Julien Grall wrote: >> >> >> On 04/03/2021 09:00, Jürgen Groß wrote: >>> On 03.03.21 18:05, Julien Grall wrote: >>>> From: Julien Grall <jgrall@amazon.com> >>>> >>>> As XenStored is single-threaded, conn->ta_start_time will always be >>>> smaller than now. As we substract the latter from the former, it means >>>> a transaction will never be considered long running. >>>> >>>> Invert the two operands of the substraction in both lu_reject_reason() >>>> and lu_check_allowed(). In addition to that, the former also needs to >>>> check that conn->ta_start_time is not 0 (i.e the transaction is not >>>> active). >>>> >>>> Take the opportunity to document the return condition of >>>> lu_check_allowed(). >>>> >>>> Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no >>>> transaction active") >>>> Reported-by: Bjoern Doebel <doebel@amazon.de> >>>> Signed-off-by: Julien Grall <jgrall@amazon.com> >>> >>> Reviewed-by: Juergen Gross <jgross@suse.com> >>> >>>> >>>> --- >>>> >>>> I am a bit puzzled on how -F is implemented. From my understanding we >>>> will force LiveUpdate when one of the following conditions is met: >>>> 1) All the active transactions are long running >>>> 2) If we didn't manage to LiveUpdate after N sec >>>> >>>> It is not quite clear why we need the both as 2) would indirectly cover >>>> 1). However 2) is probably unsafe as we may reset transactions for >>>> "well-behaving" guest. >>>> >>>> So I am thinking to send a patch to drop 2). Any opinions? >>> >>> This will enable two guests working together to block LU by using >>> overlapping transactions: >>> >>> Guest 1: ----- ----- ----- ... >>> Guest 2: -- ----- ----- --- ... > >>> There is always a transaction active, but none is regarded to be >>> long running. >> >> Right, how do you know whether two guests are working together? > > We can't know that. And this is the reason why you have to use the -F > option to force a LU. I understand that... But the consequence are potentially devastating on all the other connections, correct? > >> I understand that "-F" means that things could break... However, I am >> not entirely sure who can realistically use this option without >> risking breaking other guests. For instance, there might be a 3rd >> guest that has an active transaction and not cooperating with the >> first two. > > Yes. OTOH the chances are rather low that multiple LU attempts are > failing due to transactions being active all the time. Give me access to your server and I can run you a workload that prevent LiveUpdate without -F ;). Joke aside, a guest crashing in the middle of the transaction can prevent LiveUpdate to succeed. As the guest owner may not be the host owner, you don't necessarily know when the problem will be remediated. This is where I would expect -F to be useful as breaking transaction for such guest is low-risk. However, the side-effect of -F looks quite undesirable so far. > >> Rather than forcing in this case, how about we quiesce the connection >> if it starts a transaction when LiveUpdate is pending? > > Yes, this should work, too. I will have a look. It is not going to be material for 4.15 though. Cheers,
diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c index d7ad112138b2..8e470f2b2056 100644 --- a/tools/xenstore/xenstored_control.c +++ b/tools/xenstore/xenstored_control.c @@ -459,11 +459,18 @@ static bool lu_check_lu_allowed(void) list_for_each_entry(conn, &connections, list) { if (conn->ta_start_time) { ta_total++; - if (conn->ta_start_time - now >= lu_status->timeout) + if (now - conn->ta_start_time >= lu_status->timeout) ta_long++; } } + /* + * Allow LiveUpdate if one of the following conditions is met: + * - There is no active transactions + * - All transactions are long running (e.g. they have been + * active for more than lu_status->timeout sec) and the admin as + * requested to force the operation. + */ return ta_total ? (lu_status->force && ta_long == ta_total) : true; } @@ -474,11 +481,12 @@ static const char *lu_reject_reason(const void *ctx) time_t now = time(NULL); list_for_each_entry(conn, &connections, list) { - if (conn->ta_start_time - now >= lu_status->timeout) { + if (conn->ta_start_time && + (now - conn->ta_start_time >= lu_status->timeout)) { ret = talloc_asprintf(ctx, "%s\nDomain %u: %ld s", ret ? : "Domains with long running transactions:", conn->id, - conn->ta_start_time - now); + now - conn->ta_start_time); } }