diff mbox series

[v2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating

Message ID 5aa0116479240e5c2751fbaa745a6071a98f9480.camel@infradead.org (mailing list archive)
State New, archived
Headers show
Series [v2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating | expand

Commit Message

David Woodhouse Aug. 7, 2019, 3:27 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

When recursing, a node sometimes disappears. Deal with it and move on
instead of aborting and failing to print the rest of what was
requested.

Either EACCES or ENOENT may occur as the result of race conditions with
updates; any other error should abort as before.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
And thus did an extremely sporadic "not going to delete that device
because it already doesn't exist" failure mode become painfully obvious
in retrospect...

 tools/xenstore/xenstore_client.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Comments

Ian Jackson Aug. 7, 2019, 3:37 p.m. UTC | #1
(Adding Juergen.)

Hi, thanks for the resend.

David Woodhouse writes ("[PATCH v2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"):
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When recursing, a node sometimes disappears. Deal with it and move on
> instead of aborting and failing to print the rest of what was
> requested.
> 
> Either EACCES or ENOENT may occur as the result of race conditions with
> updates; any other error should abort as before.

I think it is not safe to continue if we get a permissions error.  I
think that would mean "we were not able to determine whether this node
exists", not "this node does not exist".

So with a permissions error silently ignored, xenstore-ls might
silently produce partial output.

I reread the discussion and I'm afraid I still don't understand why
people thought that ignoring such errors would be OK.  The fact (if
true) that a spurious permissions error might result from a race is
not sufficient to justify ignoring permissions errors.

What is needed is to show that all permissions errors are fine to
ignore.  And it is obvious that there are permissions errors that are
*not* fine to ignore, so the proposition is false; and permissions
errors must cause complaint and a nonzero exit status.

Given that the code doesn't have a way to print an error message and
record the error code or exit status for later, I think the best
approach is probably exactly David's patch only without the mention of
EACCES.

Sorry to come to this so late.


On IRC, David suggested this patch for backporting.  Subject to
resolution of my comments above, that seems like a good idea to me.
(However, this comes rather too late for 4.12.1.)

Thanks,
Ian.
David Woodhouse Aug. 8, 2019, 3:12 p.m. UTC | #2
On Wed, 2019-08-07 at 16:37 +0100, Ian Jackson wrote:
> I think it is not safe to continue if we get a permissions error.  I
> think that would mean "we were not able to determine whether this node
> exists", not "this node does not exist".

Either way, I interpreted it more as "haha screw you, now I'm not going
to tell you whether any *other* node exists".

This is the test case I actually started with:

(dom0) # for a in `seq 1 1000` ; do xenstore-write /local/domain/2/foo/$a $a ; done

Now simultaneously:

(dom0) # for a in `seq 1 999` ; do xenstore-rm /local/domain/2/foo/$a  ; done
(dom2) # while true ; do ./xenstore-ls -p /local/domain/2/foo | grep -c 1000;  done

I do expect to see node 1000, every time. With my patch, that works.

> So with a permissions error silently ignored, xenstore-ls might
> silently produce partial output.

Even without the race condition, we get partial output. In this test
case, observe that node four is absent from what dom2 reports, and we
get *two* error messages about node three.

(dom0) # xenstore-ls -p /local/domain/2/foo
one = "one"  . . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)
two = "two"  . . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)
three = "three"  . . . . . . . . . . . . . . . . . . . . . .  (n0)
four = "four"  . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)

(dom2) # xenstore-ls -p /local/domain/2/foo
one = "one"  . . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)
two = "two"  . . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)
three:
xenstore-ls: 
could not access permissions for three: Permission denied

xenstore-ls: xs_directory (/local/domain/2/foo/three): Permission denied



The tool actually makes three calls for each node it touches. If the
xs_read() fails, it silently prints ":\n" and doesn't report the errno.

If the (optional) xs_get_permissions() fails, it *warns* but continues,
attempting to recurse into the node in question.

If the xs_directory() fails, it aborts and doesn't even continue.

My v2 patch makes the third case (xs_directory()) behave like the first
(xs_read()), and silently continue. It gives me:

(dom2) # ./xenstore-ls -p  /local/domain/2/foo
one = "one"  . . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)
two = "two"  . . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)
three:
xenstore-ls: 
could not access permissions for three: Permission denied

four = "four"  . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)


(dom2) # ./xenstore-ls /local/domain/2/foo
one = "one"
two = "two"
three:

four = "four"

> Given that the code doesn't have a way to print an error message and
> record the error code or exit status for later, I think the best
> approach is probably exactly David's patch only without the mention of
> EACCES.

Well... here's an incremental straw man patch based on the existing v2,
which will print an error for the *first* operation that fails, and
should cope with race conditions too.

Tested with

(dom0) # while true; do xenstore-chmod /local/domain/2/foo/three n0; xenstore-chmod /local/domain/2/foo/three n0 r2 ; done
(dom2) # while true ;  do echo ================= ; sudo ./xenstore-ls  -p  /local/domain/2/foo; done

diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c
index 9fcd3d2f9e..c5d0b18106 100644
--- a/tools/xenstore/xenstore_client.c
+++ b/tools/xenstore/xenstore_client.c
@@ -140,7 +140,7 @@ static int show_whole_path = 0;
 
 #define MIN(a, b) (((a) < (b))? (a) : (b))
 
-static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms)
+static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms, int ignore_errors)
 {
     char **e;
     char *newpath, *val;
@@ -151,7 +151,13 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
     e = xs_directory(h, XBT_NULL, path, &num);
     if (e == NULL) {
         if (cur_depth && (errno == ENOENT || errno == EACCES)) {
-            /* If a node disappears while recursing, silently move on. */
+            /*
+             * If a node disappears or becomes inaccessible while traversing,
+             * only print an error if previous operations on this node haven't
+             * done do. Then move on.
+             */
+            if (!ignore_errors)
+                warn("xs_directory (%s)", path);
             return;
         }
 
@@ -197,7 +203,7 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
 
         /* Print value */
         if (val == NULL) {
-            printf(":\n");
+            printf(": (%s)", strerror(errno));
         }
         else {
             if (max_width < (linewid + len + TAG_LEN)) {
@@ -222,7 +228,10 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
         if (show_perms) {
             perms = xs_get_permissions(h, XBT_NULL, newpath, &nperms);
             if (perms == NULL) {
-                warn("\ncould not access permissions for %s", e[i]);
+                /* Don't repeat an error message if xs_read() already failed */
+                if (val)
+                    warn("could not access permissions for %s", e[i]);
+                val = NULL;
             }
             else {
                 int i;
@@ -239,7 +248,7 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
 
         putchar('\n');
             
-        do_ls(h, newpath, cur_depth+1, show_perms); 
+        do_ls(h, newpath, cur_depth+1, show_perms, !val);
     }
     free(e);
     free(newpath);
@@ -448,7 +457,7 @@ perform(enum mode mode, int optind, int argc, char **argv, struct xs_handle *xsh
             break;
         }
         case MODE_ls: {
-            do_ls(xsh, argv[optind], 0, prefix);
+            do_ls(xsh, argv[optind], 0, prefix, 0);
             optind++;
             break;
         }
Ian Jackson Aug. 9, 2019, 11:27 a.m. UTC | #3
David Woodhouse writes ("Re: [PATCH v2] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating"):
> On Wed, 2019-08-07 at 16:37 +0100, Ian Jackson wrote:
> > So with a permissions error silently ignored, xenstore-ls might
> > silently produce partial output.
> 
> Even without the race condition, we get partial output. In this test
> case, observe that node four is absent from what dom2 reports, and we
> get *two* error messages about node three.

Sorry, perhaps I wasn't clear enough.  I am particularly concerned
about the relationship between possible partial output, and the exit
status.

> (dom2) # xenstore-ls -p /local/domain/2/foo
> one = "one"  . . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)
> two = "two"  . . . . . . . . . . . . . . . . . . . . . . . .  (n0,r2)
> three:
> xenstore-ls: 
> could not access permissions for three: Permission denied
> 
> xenstore-ls: xs_directory (/local/domain/2/foo/three): Permission denied

I'm pretty sure that here xenstore-ls exited nonzero.

> The tool actually makes three calls for each node it touches. If the
> xs_read() fails, it silently prints ":\n" and doesn't report the errno.

That's freaky but at the very least we shouldn't make it worse.

> If the (optional) xs_get_permissions() fails, it *warns* but continues,
> attempting to recurse into the node in question.

Presumably it fails to print permissions information, too.  So some
tool which is (1) checking the exit status and (2) using the output
will not be misled.

I think in both of these cases it exits zero (assuming nothing else
goes wrong).

> If the xs_directory() fails, it aborts and doesn't even continue.

In this situation it exits nonzero.  Partial output with a nonzero
exit status is not ideal, but it is not incorrect.  Omitted output,
with a zero exit status, is a potential data loss bug and must be
avoided.

(And I think this is true even if there is a message to stderr.)

> Well... here's an incremental straw man patch based on the existing v2,
> which will print an error for the *first* operation that fails, and
> should cope with race conditions too.

Can you explain to me what your needs are for all this ?  I want
xenstore-ls to be able to do what you want, but I don't want to impose
a lot of work on you.  I also want the correctness property I mention
above.

My view of the ideal situation would be for xenstore-ls to have
defined exit status values, like

  0    everything went well
  1    not used, reserved because some runtimes etc. invent 1
  2    permissions of at least one node could not be printed
  3    value of at least one node could not be printed
  4    children of at least one node could not be listed,
        nodes may be missing
 127   catastrophe, all bets are off

And maybe some command line options to control which of these
conditions print a message to stderr, and/or to turn some of them into
0 for the caller's convenience.

But this would involve xenstore-ls having a global variable where it
accumulates the error status, and going through the whole program and
fixing the error handling (which currently seems not very good to me).

If you want to do that, then great, let us talk about the details.

If that's too much work then maybe you can do something now which at
least goes in the right direction.

Does that make sense ?

Thanks,
Ian.
David Woodhouse Aug. 9, 2019, 11:48 a.m. UTC | #4
On Fri, 2019-08-09 at 12:27 +0100, Ian Jackson wrote:
> Can you explain to me what your needs are for all this ?  I want
> xenstore-ls to be able to do what you want, but I don't want to impose
> a lot of work on you.  I also want the correctness property I mention
> above.

Ultimately, I don't really care about anything but the ENOENT example
that I gave in my previous email:

(dom0) # for a in `seq 1 1000` ; do xenstore-write /local/domain/2/foo/$a $a ; done

Now simultaneously:

(dom0) # for a in `seq 1 999` ; do xenstore-rm /local/domain/2/foo/$a  ; done
(dom2) # while true ; do ./xenstore-ls -p /local/domain/2/foo | grep -c 1000;  done

I want to see node 1000, reliably, every single time. I absolutely
don't want it aborting because of the other nodes disappearing between
the xs_directory() on the parent, and on a (now removed) child that I
don't even care about.

> My view of the ideal situation would be for xenstore-ls to have
> defined exit status values, like
> 
>   0    everything went well
>   1    not used, reserved because some runtimes etc. invent 1
>   2    permissions of at least one node could not be printed
>   3    value of at least one node could not be printed
>   4    children of at least one node could not be listed,
>         nodes may be missing
>  127   catastrophe, all bets are off

Sure.

> And maybe some command line options to control which of these
> conditions print a message to stderr, and/or to turn some of them into
> 0 for the caller's convenience.

That part seems like overkill.

> But this would involve xenstore-ls having a global variable where it
> accumulates the error status, and going through the whole program and
> fixing the error handling (which currently seems not very good to me).

I don't think we need a global variable. The do_ls() function is
recursive and could return an error which then accumulates. The straw
man in my previous patch fixes up most of the error handling and makes
sure we're only printing one error per node, and could fairly easily be
extended to return an error value. Then we just need to define
priorities for ENOENT vs. EACCESS (or make it a bitfield, but let's
not).

> If you want to do that, then great, let us talk about the details.
> 
> If that's too much work then maybe you can do something now which at
> least goes in the right direction.

I'll send a v3 of the original patch which *only* ignores ENOENT. That
one really shouldn't need to cause a non-zero exit status because it's
equivalent to the case where the child node didn't exist when we called
xs_directory() on the parent. Then we can refine the rest of the
cleanup in a follow-on patch.
David Woodhouse Aug. 9, 2019, 12:13 p.m. UTC | #5
On Fri, 2019-08-09 at 12:48 +0100, David Woodhouse wrote:
> 
> I'll send a v3 of the original patch which *only* ignores ENOENT. That
> one really shouldn't need to cause a non-zero exit status because it's
> equivalent to the case where the child node didn't exist when we called
> xs_directory() on the parent. Then we can refine the rest of the
> cleanup in a follow-on patch.

How does this look?

https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=commitdiff;h=77b922bd027c321ff5c9426d5380f6d47c7022f1
diff mbox series

Patch

diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c
index 3afc630ab8..9fcd3d2f9e 100644
--- a/tools/xenstore/xenstore_client.c
+++ b/tools/xenstore/xenstore_client.c
@@ -148,14 +148,20 @@  static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
     int i;
     unsigned int num, len;
 
+    e = xs_directory(h, XBT_NULL, path, &num);
+    if (e == NULL) {
+        if (cur_depth && (errno == ENOENT || errno == EACCES)) {
+            /* If a node disappears while recursing, silently move on. */
+            return;
+        }
+
+        err(1, "xs_directory (%s)", path);
+    }
+
     newpath = malloc(STRING_MAX);
     if (!newpath)
       err(1, "malloc in do_ls");
 
-    e = xs_directory(h, XBT_NULL, path, &num);
-    if (e == NULL)
-        err(1, "xs_directory (%s)", path);
-
     for (i = 0; i<num; i++) {
         char buf[MAX_STRLEN(unsigned int)+1];
         struct xs_permissions *perms;