Message ID | 06ad4b3a67a15192fc986b35e3f2fcd35b2f4c2f.1669383767.git.per.bilse@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | ioreq_broadcast(): accept partial broadcast success | expand |
Hi, On 25/11/2022 14:15, Per Bilse wrote: > A change to XAPI varstored causes For those unfamiliar with XAPI (like me), can you explain what was the change made? > an unnecessary error message > to be logged in hypervisor.log whenever an RTC timeoffset update > is broadcast. > In extreme cases this could flood the log file. Which should be ratelimited as this is using guest error loglevel. But I think this is irrelevant here. It would be more relevant to explain why it is OK to allow a partial broadcast. > This patch modifies ioreq_broadcast() to allow partial success. The commit message is quite vague, so it is hard to know what you are trying to solve exactly. AFAIU, there are two reasons for ioreq_broadcast to fails: 1) The IOREQ server didn't register the bufioreq 2) The IOREQ buffer page is full While I would agree that the error message is not necessary for 1) (the IOREQ server doesn't care about the event), I would disagree for 2) because it would indicate something went horribly wrong in the IOREQ server and there is a chance your domain may misbehave afterwards. Cheers,
On 26.11.2022 23:19, Julien Grall wrote: > On 25/11/2022 14:15, Per Bilse wrote: >> A change to XAPI varstored causes > > For those unfamiliar with XAPI (like me), can you explain what was the > change made? > >> an unnecessary error message >> to be logged in hypervisor.log whenever an RTC timeoffset update >> is broadcast. >> In extreme cases this could flood the log file. > > Which should be ratelimited as this is using guest error loglevel. But I > think this is irrelevant here. It would be more relevant to explain why > it is OK to allow a partial broadcast. > >> This patch modifies ioreq_broadcast() to allow partial success. > > The commit message is quite vague, so it is hard to know what you are > trying to solve exactly. AFAIU, there are two reasons for > ioreq_broadcast to fails: > 1) The IOREQ server didn't register the bufioreq > 2) The IOREQ buffer page is full > > While I would agree that the error message is not necessary for 1) (the > IOREQ server doesn't care about the event), I would disagree for 2) > because it would indicate something went horribly wrong in the IOREQ > server and there is a chance your domain may misbehave afterwards. In addition I think ignoring failure (and, as said by Julien, only because of no bufioreq being registered) is (kind of implicitly) valid only for buffered requests. Hence I'm unconvinced of the need of a new boolean function parameter. Instead I think we need a new IOREQ_STATUS_... value representing the "not registered" case. And that could then be used by ioreq_broadcast() to skip incrementing of "failed". Jan
On 28/11/2022 08:21, Jan Beulich wrote: > On 26.11.2022 23:19, Julien Grall wrote: >> >> The commit message is quite vague, so it is hard to know what you are >> trying to solve exactly. AFAIU, there are two reasons for >> ioreq_broadcast to fails: >> 1) The IOREQ server didn't register the bufioreq >> 2) The IOREQ buffer page is full >> >> While I would agree that the error message is not necessary for 1) (the >> IOREQ server doesn't care about the event), I would disagree for 2) >> because it would indicate something went horribly wrong in the IOREQ >> server and there is a chance your domain may misbehave afterwards. > > In addition I think ignoring failure (and, as said by Julien, only because > of no bufioreq being registered) is (kind of implicitly) valid only for > buffered requests. Hence I'm unconvinced of the need of a new boolean > function parameter. Instead I think we need a new IOREQ_STATUS_... value > representing the "not registered" case. And that could then be used by > ioreq_broadcast() to skip incrementing of "failed". Hi guys, and thank you very much for the feedback. As I'm sure you've guessed I'm a newbie in Xen terms, so apologies for not getting things quite right. Varstored dropped support for buffered ioreqs, hence the persistent error message(s), and the proposed fix was derived from discussion in Citrix's hypervisor team. The 'partial' parameter could arguably be considered a case of (undesirable) special case handling, but ioreq_broadcast() is called from only two places in the code, so this seemed to be the lightest and simplest solution. I'll have to defer to more knowledgeable team members for further thoughts on this. Best, -- Per
On Mon, Nov 28, 2022 at 09:21:47AM +0100, Jan Beulich wrote: > On 26.11.2022 23:19, Julien Grall wrote: > > On 25/11/2022 14:15, Per Bilse wrote: > >> A change to XAPI varstored causes > > > > For those unfamiliar with XAPI (like me), can you explain what was the > > change made? One ioreq client used by XenServer doesn't handle buffered ioreqs, so the broadcasted TIMEOFFSET always triggers an error due to that ioreq not handling it. While not harmful, it's still annoying to get the messages on the hypervisor console for something that's not really an error. The description can indeed be reworded to not mention XenServer specific components, something like: "Avoid printing an error message when a broadcast buffered ioreq is not handled by all registered clients, as long as the failure is strictly because the client doesn't handle buffered ioreqs." > >> an unnecessary error message > >> to be logged in hypervisor.log whenever an RTC timeoffset update > >> is broadcast. > >> In extreme cases this could flood the log file. > > > > Which should be ratelimited as this is using guest error loglevel. But I > > think this is irrelevant here. It would be more relevant to explain why > > it is OK to allow a partial broadcast. > > > >> This patch modifies ioreq_broadcast() to allow partial success. > > > > The commit message is quite vague, so it is hard to know what you are > > trying to solve exactly. AFAIU, there are two reasons for > > ioreq_broadcast to fails: > > 1) The IOREQ server didn't register the bufioreq > > 2) The IOREQ buffer page is full > > > > While I would agree that the error message is not necessary for 1) (the > > IOREQ server doesn't care about the event), I would disagree for 2) > > because it would indicate something went horribly wrong in the IOREQ > > server and there is a chance your domain may misbehave afterwards. > > In addition I think ignoring failure (and, as said by Julien, only because > of no bufioreq being registered) is (kind of implicitly) valid only for > buffered requests. Hence I'm unconvinced of the need of a new boolean > function parameter. Instead I think we need a new IOREQ_STATUS_... value > representing the "not registered" case. And that could then be used by > ioreq_broadcast() to skip incrementing of "failed". So introduce an IOREQ_STATUS_UNREGISTERED return code and don't increase failed if buffered == true and UNREGISTERED is returned, would that be acceptable? Thanks, Roger.
On 28.11.2022 12:06, Roger Pau Monné wrote: > On Mon, Nov 28, 2022 at 09:21:47AM +0100, Jan Beulich wrote: >> On 26.11.2022 23:19, Julien Grall wrote: >>> On 25/11/2022 14:15, Per Bilse wrote: >>>> This patch modifies ioreq_broadcast() to allow partial success. >>> >>> The commit message is quite vague, so it is hard to know what you are >>> trying to solve exactly. AFAIU, there are two reasons for >>> ioreq_broadcast to fails: >>> 1) The IOREQ server didn't register the bufioreq >>> 2) The IOREQ buffer page is full >>> >>> While I would agree that the error message is not necessary for 1) (the >>> IOREQ server doesn't care about the event), I would disagree for 2) >>> because it would indicate something went horribly wrong in the IOREQ >>> server and there is a chance your domain may misbehave afterwards. >> >> In addition I think ignoring failure (and, as said by Julien, only because >> of no bufioreq being registered) is (kind of implicitly) valid only for >> buffered requests. Hence I'm unconvinced of the need of a new boolean >> function parameter. Instead I think we need a new IOREQ_STATUS_... value >> representing the "not registered" case. And that could then be used by >> ioreq_broadcast() to skip incrementing of "failed". > > So introduce an IOREQ_STATUS_UNREGISTERED return code and don't > increase failed if buffered == true and UNREGISTERED is returned, > would that be acceptable? Yes afaic, but Paul is the maintainer of this code. And of course the new error indicator shouldn't surprise any existing callers. Jan
On 28/11/2022 12:26, Jan Beulich wrote: > On 28.11.2022 12:06, Roger Pau Monné wrote: >> On Mon, Nov 28, 2022 at 09:21:47AM +0100, Jan Beulich wrote: >>> On 26.11.2022 23:19, Julien Grall wrote: >>>> On 25/11/2022 14:15, Per Bilse wrote: >>>>> This patch modifies ioreq_broadcast() to allow partial success. >>>> >>>> The commit message is quite vague, so it is hard to know what you are >>>> trying to solve exactly. AFAIU, there are two reasons for >>>> ioreq_broadcast to fails: >>>> 1) The IOREQ server didn't register the bufioreq >>>> 2) The IOREQ buffer page is full >>>> >>>> While I would agree that the error message is not necessary for 1) (the >>>> IOREQ server doesn't care about the event), I would disagree for 2) >>>> because it would indicate something went horribly wrong in the IOREQ >>>> server and there is a chance your domain may misbehave afterwards. >>> >>> In addition I think ignoring failure (and, as said by Julien, only because >>> of no bufioreq being registered) is (kind of implicitly) valid only for >>> buffered requests. Hence I'm unconvinced of the need of a new boolean >>> function parameter. Instead I think we need a new IOREQ_STATUS_... value >>> representing the "not registered" case. And that could then be used by >>> ioreq_broadcast() to skip incrementing of "failed". >> >> So introduce an IOREQ_STATUS_UNREGISTERED return code and don't >> increase failed if buffered == true and UNREGISTERED is returned, >> would that be acceptable? > > Yes afaic, but Paul is the maintainer of this code. And of course the > new error indicator shouldn't surprise any existing callers. > A new status code does indeed seem like the cleanest way forward. Paul
On 28/11/2022 08:21, Jan Beulich wrote: > In addition I think ignoring failure (and, as said by Julien, only because > of no bufioreq being registered) is (kind of implicitly) valid only for > buffered requests. Hence I'm unconvinced of the need of a new boolean > function parameter. Instead I think we need a new IOREQ_STATUS_... value > representing the "not registered" case. And that could then be used by > ioreq_broadcast() to skip incrementing of "failed". I think I have been thinking about this the wrong way. My thinking has been that dropping an update (buffered or not) would be correct only in special cases such as timeoffset, and it would therefore generally be an error if a buffered update was directed to a handler that hadn't registered for buffered updates. The thinking in this proposal suggests that handlers are generally free to choose whether or not to accept buffered updates. I wouldn't have suspected this, but I assume then that this is perfectly reasonable in the context of Xen IO handling, so I'll go with that. Best, -- Per
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index 0309d05cfd..c4022bf7c2 100644 --- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -60,7 +60,7 @@ void send_timeoffset_req(unsigned long timeoff) if ( timeoff == 0 ) return; - if ( ioreq_broadcast(&p, true) != 0 ) + if ( !ioreq_broadcast(&p, true, true) ) gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n"); } diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index 4617aef29b..1d6ca4d1ac 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -54,7 +54,7 @@ void ioreq_signal_mapcache_invalidate(void) .data = ~0UL, /* flush all */ }; - if ( ioreq_broadcast(&p, false) != 0 ) + if ( !ioreq_broadcast(&p, false, false) ) gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n"); } @@ -1309,11 +1309,11 @@ int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p, return IOREQ_STATUS_UNHANDLED; } -unsigned int ioreq_broadcast(ioreq_t *p, bool buffered) +bool ioreq_broadcast(ioreq_t *p, bool buffered, bool partial) { struct domain *d = current->domain; struct ioreq_server *s; - unsigned int id, failed = 0; + unsigned int id, sent = 0, failed = 0; FOR_EACH_IOREQ_SERVER(d, id, s) { @@ -1322,9 +1322,10 @@ unsigned int ioreq_broadcast(ioreq_t *p, bool buffered) if ( ioreq_send(s, p, buffered) == IOREQ_STATUS_UNHANDLED ) failed++; + sent++; } - return failed; + return failed == 0 || (partial && failed < sent); } void ioreq_domain_init(struct domain *d) diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index a26614d331..65457ca5ba 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -102,7 +102,7 @@ struct ioreq_server *ioreq_server_select(struct domain *d, ioreq_t *p); int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p, bool buffered); -unsigned int ioreq_broadcast(ioreq_t *p, bool buffered); +bool ioreq_broadcast(ioreq_t *p, bool buffered, bool partial); void ioreq_request_mapcache_invalidate(const struct domain *d); void ioreq_signal_mapcache_invalidate(void);
A change to XAPI varstored causes an unnecessary error message to be logged in hypervisor.log whenever an RTC timeoffset update is broadcast. In extreme cases this could flood the log file. This patch modifies ioreq_broadcast() to allow partial success. Signed-off-by: Per Bilse <per.bilse@citrix.com> --- xen/arch/x86/hvm/io.c | 2 +- xen/common/ioreq.c | 9 +++++---- xen/include/xen/ioreq.h | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-)