Message ID | 20200319204025.2649661-1-dwmw2@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating | expand |
On 19.03.20 21:40, David Woodhouse wrote: > From: David Woodhouse <dwmw@amazon.co.uk> > > The do_ls() function has somewhat inconsistent handling of errors. > > If reading the node's contents with xs_read() fails, then do_ls() will > just quietly not display the contents. > > If reading the node's permissions with xs_get_permissions() fails, then > do_ls() will print a warning, continue, and ultimately won't exit with > an error code (unless another error happens). > > If recursing into the node with xs_directory() fails, then do_ls() will > abort immediately, not printing any further nodes. > > For persistent failure modes — such as ENOENT because a node has been > removed, or EACCES because it has had its permisions changed since the > xs_directory() on the parent directory returned its name — it's > obviously quite likely that if either of the first two errors occur for > a given node, then so will the third and thus xenstore-ls will abort. > > The ENOENT one is actually a fairly common case, and has caused tools to > fail to clean up a network device because it *apparently* already > doesn't exist in xenstore. > > There is a school of thought that says, "Well, xenstore-ls returned an > error. So the tools should not trust its output." > > The natural corollary of this would surely be that the tools must re-run > xenstore-ls as many times as is necessary until its manages to exit > without hitting the race condition. I am not keen on that conclusion. > > For the specific case of ENOENT it seems reasonable to declare that, > but for the timing, we might as well just not have seen that node at > all when calling xs_directory() for the parent. By ignoring the error, > we give acceptable output. Have you thought about the possibility to do the complete handling in a single transaction? This would ensure a complete consistent picture from the time the operation has started. Any inconsistency should be reported as an error then. Juergen
On Fri, 2020-03-20 at 07:34 +0100, Jürgen Groß wrote: > Have you thought about the possibility to do the complete handling in a > single transaction? This would ensure a complete consistent picture > from the time the operation has started. Any inconsistency should be > reported as an error then. Hm, how would that work? Do I have to buffer *all* the output from do_ls() and then only print it if/when xs_transaction_end() succeeds, else try again?
On 20.03.20 12:19, David Woodhouse wrote: > On Fri, 2020-03-20 at 07:34 +0100, Jürgen Groß wrote: >> Have you thought about the possibility to do the complete handling in a >> single transaction? This would ensure a complete consistent picture >> from the time the operation has started. Any inconsistency should be >> reported as an error then. > > Hm, how would that work? Do I have to buffer *all* the output from > do_ls() and then only print it if/when xs_transaction_end() succeeds, > else try again? No, you just don't care for the transaction to succeed or fail (IMO it should never fail as you are reading only). So just wrap everything into a transaction. I might be wrong, of course. :-) Juergen
Jürgen Groß writes ("Re: [PATCH 1/2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"): > On 19.03.20 21:40, David Woodhouse wrote: > > From: David Woodhouse <dwmw@amazon.co.uk> ... > > For the specific case of ENOENT it seems reasonable to declare that, > > but for the timing, we might as well just not have seen that node at > > all when calling xs_directory() for the parent. By ignoring the error, > > we give acceptable output. Thanks. > Have you thought about the possibility to do the complete handling in a > single transaction? This would ensure a complete consistent picture > from the time the operation has started. Any inconsistency should be > reported as an error then. I think this would be a good idea (not least because it would mean that callers of xenstore-ls wouldn't see inconsistent data) but I think it would be an enhancement. For now, for David's original patch: Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> If and when we introduce a transaction, David's 1/ should be reverted as indeed then even ENOENT would indicate some kind of serious problem. Thanks, Ian.
On 20.03.20 15:11, Ian Jackson wrote: > Jürgen Groß writes ("Re: [PATCH 1/2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"): >> On 19.03.20 21:40, David Woodhouse wrote: >>> From: David Woodhouse <dwmw@amazon.co.uk> > ... >>> For the specific case of ENOENT it seems reasonable to declare that, >>> but for the timing, we might as well just not have seen that node at >>> all when calling xs_directory() for the parent. By ignoring the error, >>> we give acceptable output. > > Thanks. > >> Have you thought about the possibility to do the complete handling in a >> single transaction? This would ensure a complete consistent picture >> from the time the operation has started. Any inconsistency should be >> reported as an error then. > > I think this would be a good idea (not least because it would mean > that callers of xenstore-ls wouldn't see inconsistent data) but I > think it would be an enhancement. > > For now, for David's original patch: > > Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> > > If and when we introduce a transaction, David's 1/ should be reverted > as indeed then even ENOENT would indicate some kind of serious > problem. Fine with me. Juergen
Jürgen Groß writes ("Re: [Xen-devel] [PATCH 1/2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"): > No, you just don't care for the transaction to succeed or fail (IMO it > should never fail as you are reading only). > > So just wrap everything into a transaction. Yes. xenstored will do the needed buffering. I think in principle there is a risk here that the transaction might run for a long time, if the output from xenstore-ls goes to something that blocks (eg a pager) and can't be written all at once. But if this is a problem it is a problem afflicting xenstored, not xenstore-ls. Ian.
On Fri, 2020-03-20 at 14:12 +0000, Ian Jackson wrote: > Jürgen Groß writes ("Re: [Xen-devel] [PATCH 1/2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"): > > No, you just don't care for the transaction to succeed or fail (IMO it > > should never fail as you are reading only). > > > > So just wrap everything into a transaction. > > Yes. xenstored will do the needed buffering. > > I think in principle there is a risk here that the transaction might > run for a long time, if the output from xenstore-ls goes to something > that blocks (eg a pager) and can't be written all at once. > > But if this is a problem it is a problem afflicting xenstored, not > xenstore-ls. > > Ian. So if I do this... --- a/tools/xenstore/xenstore_client.c +++ b/tools/xenstore/xenstore_client.c @@ -658,7 +658,6 @@ main(int argc, char **argv) case MODE_write: transaction = (argc - switch_argv - optind) > 2; break; - case MODE_ls: case MODE_watch: transaction = 0; break; @@ -683,6 +682,7 @@ again: xth = xs_transaction_start(xsh); if (xth == XBT_NULL) errx(1, "couldn't start transaction"); + printf("started transaction\n"); } ret = perform(mode, optind, argc - switch_argv, argv + switch_argv, xsh, xth, prefix, tidy, upto, recurse, nr_watches, raw); ... and then repeat my test case as shown in [PATCH 1/1], I should no longer see... xenstore-ls: could not access permissions for 407: No such file or directory xenstore-ls: in xs_directory (/local/domain/2/foo/407): No such file or directory 0 But it does still happen. And even if I turn the errx() into a warn() to stop it aborting, and add a warn() when the xs_transaction_end() returns EAGAIN... that isn't happening either. I'm just getting inconsistent data, within a transaction.
On 20.03.20 15:58, David Woodhouse wrote: > On Fri, 2020-03-20 at 14:12 +0000, Ian Jackson wrote: >> Jürgen Groß writes ("Re: [Xen-devel] [PATCH 1/2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"): >>> No, you just don't care for the transaction to succeed or fail (IMO it >>> should never fail as you are reading only). >>> >>> So just wrap everything into a transaction. >> >> Yes. xenstored will do the needed buffering. >> >> I think in principle there is a risk here that the transaction might >> run for a long time, if the output from xenstore-ls goes to something >> that blocks (eg a pager) and can't be written all at once. >> >> But if this is a problem it is a problem afflicting xenstored, not >> xenstore-ls. >> >> Ian. > > > So if I do this... > > --- a/tools/xenstore/xenstore_client.c > +++ b/tools/xenstore/xenstore_client.c > @@ -658,7 +658,6 @@ main(int argc, char **argv) > case MODE_write: > transaction = (argc - switch_argv - optind) > 2; > break; > - case MODE_ls: > case MODE_watch: > transaction = 0; > break; > @@ -683,6 +682,7 @@ again: > xth = xs_transaction_start(xsh); > if (xth == XBT_NULL) > errx(1, "couldn't start transaction"); > + printf("started transaction\n"); > } > > ret = perform(mode, optind, argc - switch_argv, argv + switch_argv, xsh, xth, prefix, tidy, upto, recurse, nr_watches, raw); > > > ... and then repeat my test case as shown in [PATCH 1/1], I should no > longer see... > > xenstore-ls: > could not access permissions for 407: No such file or directory > xenstore-ls: in xs_directory (/local/domain/2/foo/407): No such file or directory > 0 > > > But it does still happen. And even if I turn the errx() into a warn() > to stop it aborting, and add a warn() when the xs_transaction_end() > returns EAGAIN... that isn't happening either. I'm just getting > inconsistent data, within a transaction. Hmm, yes, thinking more about it: a non-transactional write of a node which hasn't been written or read by an ongoing transaction is not handled in a special way. This could be changed, but would require some structural changes. Juergen
Jürgen Groß writes ("Re: [Xen-devel] [PATCH 1/2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"): > On 20.03.20 15:58, David Woodhouse wrote: > > But it does still happen. And even if I turn the errx() into a warn() > > to stop it aborting, and add a warn() when the xs_transaction_end() > > returns EAGAIN... that isn't happening either. I'm just getting > > inconsistent data, within a transaction. > > Hmm, yes, thinking more about it: a non-transactional write of a node > which hasn't been written or read by an ongoing transaction is not > handled in a special way. This could be changed, but would require some > structural changes. And making a node visible by XS_DIRECTORY[_PART] doesn't count as reading it. But it does count as reading the parent ? In principle adding or removing a node could be made to count as a change to the containing directory. But I don't think doing this as a response to David's issue is sensible. Ian.
Ian Jackson writes ("Re: [Xen-devel] [PATCH 1/2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"): > And making a node visible by XS_DIRECTORY[_PART] doesn't count as > reading it. But it does count as reading the parent ? > In principle adding or removing a node could be made to count as a > change to the containing directory. But I don't think doing this as a > response to David's issue is sensible. So, err, putting that together and reviewing the state of the world: I still think David's 1/ patch is good. I think my comments on 2/ are still applicable, apart from the bits where I suggest using a transaction will fix all this. David: do you now intend to revise 2/ according to our comments ? Everyone else: is there some reason we shouldn't commit 1/ immediately ? Ian.
On Mon, 2020-03-30 at 17:40 +0100, Ian Jackson wrote: > Ian Jackson writes ("Re: [Xen-devel] [PATCH 1/2] tools/xenstore: Do > not abort xenstore-ls if a node disappears while iterating"): > > And making a node visible by XS_DIRECTORY[_PART] doesn't count as > > reading it. But it does count as reading the parent ? > > In principle adding or removing a node could be made to count as a > > change to the containing directory. But I don't think doing this > > as a > > response to David's issue is sensible. > > So, err, putting that together and reviewing the state of the world: > > I still think David's 1/ patch is good. > > I think my comments on 2/ are still applicable, apart from the > bits where I suggest using a transaction will fix all this. > > David: do you now intend to revise 2/ according to our comments ? I confess to having slightly lost the will to live, but sure. If #1 gets applied and actually fixes the bug that was biting us in production and which I started trying to upstream in March 2019, I'll happily revisit those subsequent cleanups you asked for. > Everyone else: is there some reason we shouldn't commit 1/ > immediately ? It was deliberately split out so that it could indeed be applied immediately when it was posted in March.
On Fri, Aug 14, 2020 at 03:23:10PM +0100, David Woodhouse wrote: > On Mon, 2020-03-30 at 17:40 +0100, Ian Jackson wrote: > > Ian Jackson writes ("Re: [Xen-devel] [PATCH 1/2] tools/xenstore: Do > > not abort xenstore-ls if a node disappears while iterating"): > > > And making a node visible by XS_DIRECTORY[_PART] doesn't count as > > > reading it. But it does count as reading the parent ? > > > In principle adding or removing a node could be made to count as a > > > change to the containing directory. But I don't think doing this > > > as a > > > response to David's issue is sensible. > > > > So, err, putting that together and reviewing the state of the world: > > > > I still think David's 1/ patch is good. > > Everyone else: is there some reason we shouldn't commit 1/ > > immediately ? OK I will turn this into an ack and apply this patch. Sorry this fell off my radar somehow. Wei. > > It was deliberately split out so that it could indeed be applied > immediately when it was posted in March.
diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c index 3afc630ab8..ae7ed3eb9e 100644 --- a/tools/xenstore/xenstore_client.c +++ b/tools/xenstore/xenstore_client.c @@ -148,14 +148,20 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms int i; unsigned int num, len; + e = xs_directory(h, XBT_NULL, path, &num); + if (e == NULL) { + if (cur_depth && errno == ENOENT) { + /* If a node disappears while recursing, silently move on. */ + return; + } + + err(1, "xs_directory (%s)", path); + } + newpath = malloc(STRING_MAX); if (!newpath) err(1, "malloc in do_ls"); - e = xs_directory(h, XBT_NULL, path, &num); - if (e == NULL) - err(1, "xs_directory (%s)", path); - for (i = 0; i<num; i++) { char buf[MAX_STRLEN(unsigned int)+1]; struct xs_permissions *perms;