diff mbox

xfs_io: fix exitcode handling (was Re: generic/399 and xfs_io pwrite command)

Message ID 20171208141506.GA55826@bfoster.bfoster (mailing list archive)
State Not Applicable
Headers show

Commit Message

Brian Foster Dec. 8, 2017, 2:15 p.m. UTC
On Fri, Dec 08, 2017 at 10:46:31AM +1100, Dave Chinner wrote:
> On Thu, Dec 07, 2017 at 09:05:20AM -0500, Brian Foster wrote:
> > On Wed, Dec 06, 2017 at 11:26:38AM +1100, Dave Chinner wrote:
> > > On Wed, Dec 06, 2017 at 08:18:50AM +1100, Dave Chinner wrote:
> > > > On Tue, Dec 05, 2017 at 04:23:43PM +0200, Ari Sundholm wrote:
> > > > > On 12/05/2017 12:28 AM, Dave Chinner wrote:
> > > > > >On Thu, Nov 30, 2017 at 04:21:47PM +0200, Ari Sundholm wrote:
> > ...
> > > 
> > > xfs_io: set exitcode on failure appropriately
> > > 
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Many operations don't set the exitcode when they fail, resulting
> > > in xfs_io exiting with a zero (no failure) exit code despite the
> > > command failing and returning an error. Fix this.
> > > 
> > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > ---
> > ...
> > > diff --git a/io/copy_file_range.c b/io/copy_file_range.c
> > > index d1dfc5a5e33a..9b737eff4c02 100644
> > > --- a/io/copy_file_range.c
> > > +++ b/io/copy_file_range.c
> > > @@ -92,6 +92,7 @@ copy_range_f(int argc, char **argv)
> > >  	int ret;
> > >  	int fd;
> > >  
> > > +	exitcode = 1;
> > >  	while ((opt = getopt(argc, argv, "s:d:l:")) != -1) {
> > >  		switch (opt) {
> > >  		case 's':
> > > @@ -132,6 +133,8 @@ copy_range_f(int argc, char **argv)
> > >  
> > >  	ret = copy_file_range(fd, &src, &dst, len);
> > >  	close(fd);
> > > +	if (ret >= 0)
> > > +		exitcode = 0;
> > 
> > I don't think it's appropriate to blindly overwrite the global exitcode
> > value like this. What about the case where multiple commands are chained
> > together? Should we expect an error code if any of the commands failed
> > or only the last?
> > 
> > For example:
> > 
> >   xfs_io -c "pwrite ..." <file>
> > 
> > ... returns an error if the write fails, while:
> > 
> >   xfs_io -c "pwrite ..." -c "fadvise ..." <file>
> > 
> > ... would never so long as the fadvise succeeds.
> 
> Yup, I mentioned that this would be a problem on IRC. The patch fixes
> the interactive and the single command cases, but won't work for
> chained commands as you rightly point out.
> 
> To fix it properly, it's going to take a lot more than 15 minutes of
> time. We're going to have to change how errors are returned to and
> propagated through the libxcmd infrastruture, how we handle "fatal
> error, don't continue" conditions through the libxcmd command loop,
> etc. If we want to stop at the first error, then we've got to go
> change all the return values from xfs_io commands to return non-zero
> on error. We still have to set the exitcode as per this patch,
> because the non-zero return value simply says "stop parsing input"
> and doesn't affect the exit code of the program.
> 
> Given that xfs_quota, xfs_io, xfs_db, and xfs_spaceman all use the
> same libxcmd infrastructure, we've got to make the changes
> consistently across all of those utilities. This will require an
> audit of all the commands first to determine if there's anything
> special in any of those utilities, then changing all the code, then
> testing all the CLI parsing still works correctly.  xfs_quota, as
> usual, will be the problem child that causes us lots of pain here.
> 
> I'm not planning on doing all this in the near future, so I did the
> next best thing that would only affect xfs_io behaviour. i.e.
> directly manipulate the exitcode as many of the existing xfs_io
> commands already do.
> 

Sure, I don't dispute the broader work required to fix everything up
properly and I don't take issue with modifying exitcode directly in
principle. I just don't see how any of this necessitates breaking the
chained command case for the those commands that we are trying to fix
up, particularly when the problem seems easily avoidable. See below for
a tweak to the fadvise example..

(I notice that I mistakenly dropped the return code from
command_usage(), but you get the idea (and it hardcodes to 0)).

Brian

--- 8< ---

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Dave Chinner Dec. 8, 2017, 10:52 p.m. UTC | #1
On Fri, Dec 08, 2017 at 09:15:06AM -0500, Brian Foster wrote:
> On Fri, Dec 08, 2017 at 10:46:31AM +1100, Dave Chinner wrote:
> > On Thu, Dec 07, 2017 at 09:05:20AM -0500, Brian Foster wrote:
> > > On Wed, Dec 06, 2017 at 11:26:38AM +1100, Dave Chinner wrote:
> > > > On Wed, Dec 06, 2017 at 08:18:50AM +1100, Dave Chinner wrote:
> > > > > On Tue, Dec 05, 2017 at 04:23:43PM +0200, Ari Sundholm wrote:
> > > > > > On 12/05/2017 12:28 AM, Dave Chinner wrote:
> > > > > > >On Thu, Nov 30, 2017 at 04:21:47PM +0200, Ari Sundholm wrote:
> > > ...
> > > > 
> > > > xfs_io: set exitcode on failure appropriately
> > > > 
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Many operations don't set the exitcode when they fail, resulting
> > > > in xfs_io exiting with a zero (no failure) exit code despite the
> > > > command failing and returning an error. Fix this.
> > > > 
> > > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > ...
> > > > diff --git a/io/copy_file_range.c b/io/copy_file_range.c
> > > > index d1dfc5a5e33a..9b737eff4c02 100644
> > > > --- a/io/copy_file_range.c
> > > > +++ b/io/copy_file_range.c
> > > > @@ -92,6 +92,7 @@ copy_range_f(int argc, char **argv)
> > > >  	int ret;
> > > >  	int fd;
> > > >  
> > > > +	exitcode = 1;
> > > >  	while ((opt = getopt(argc, argv, "s:d:l:")) != -1) {
> > > >  		switch (opt) {
> > > >  		case 's':
> > > > @@ -132,6 +133,8 @@ copy_range_f(int argc, char **argv)
> > > >  
> > > >  	ret = copy_file_range(fd, &src, &dst, len);
> > > >  	close(fd);
> > > > +	if (ret >= 0)
> > > > +		exitcode = 0;
> > > 
> > > I don't think it's appropriate to blindly overwrite the global exitcode
> > > value like this. What about the case where multiple commands are chained
> > > together? Should we expect an error code if any of the commands failed
> > > or only the last?
> > > 
> > > For example:
> > > 
> > >   xfs_io -c "pwrite ..." <file>
> > > 
> > > ... returns an error if the write fails, while:
> > > 
> > >   xfs_io -c "pwrite ..." -c "fadvise ..." <file>
> > > 
> > > ... would never so long as the fadvise succeeds.
> > 
> > Yup, I mentioned that this would be a problem on IRC. The patch fixes
> > the interactive and the single command cases, but won't work for
> > chained commands as you rightly point out.
> > 
> > To fix it properly, it's going to take a lot more than 15 minutes of
> > time. We're going to have to change how errors are returned to and
> > propagated through the libxcmd infrastruture, how we handle "fatal
> > error, don't continue" conditions through the libxcmd command loop,
> > etc. If we want to stop at the first error, then we've got to go
> > change all the return values from xfs_io commands to return non-zero
> > on error. We still have to set the exitcode as per this patch,
> > because the non-zero return value simply says "stop parsing input"
> > and doesn't affect the exit code of the program.
> > 
> > Given that xfs_quota, xfs_io, xfs_db, and xfs_spaceman all use the
> > same libxcmd infrastructure, we've got to make the changes
> > consistently across all of those utilities. This will require an
> > audit of all the commands first to determine if there's anything
> > special in any of those utilities, then changing all the code, then
> > testing all the CLI parsing still works correctly.  xfs_quota, as
> > usual, will be the problem child that causes us lots of pain here.
> > 
> > I'm not planning on doing all this in the near future, so I did the
> > next best thing that would only affect xfs_io behaviour. i.e.
> > directly manipulate the exitcode as many of the existing xfs_io
> > commands already do.
> > 
> 
> Sure, I don't dispute the broader work required to fix everything up
> properly and I don't take issue with modifying exitcode directly in
> principle. I just don't see how any of this necessitates breaking the
> chained command case for the those commands that we are trying to fix
> up, particularly when the problem seems easily avoidable. See below for
> a tweak to the fadvise example..

You're welcome to do this - I just threw out a quick patch that
makes the code *less broken* rather than perfect. exit codes for
chained commands are already somewhat broken, so what I did doesn't
make the state of affairs any worse.

Cheers,

Dave.
diff mbox

Patch

diff --git a/io/fadvise.c b/io/fadvise.c
index 9cc91a57..f802ce1b 100644
--- a/io/fadvise.c
+++ b/io/fadvise.c
@@ -54,7 +54,6 @@  fadvise_f(
 	off64_t		offset = 0, length = 0;
 	int		c, range = 0, advise = POSIX_FADV_NORMAL;
 
-	exitcode = 1;
 	while ((c = getopt(argc, argv, "dnrsw")) != EOF) {
 		switch (c) {
 		case 'd':	/* Don't need these pages */
@@ -78,37 +77,43 @@  fadvise_f(
 			range = 1;
 			break;
 		default:
-			return command_usage(&fadvise_cmd);
+			goto usage;
 		}
 	}
 	if (range) {
 		size_t	blocksize, sectsize;
 
 		if (optind != argc - 2)
-			return command_usage(&fadvise_cmd);
+			goto usage;
 		init_cvtnum(&blocksize, &sectsize);
 		offset = cvtnum(blocksize, sectsize, argv[optind]);
 		if (offset < 0) {
 			printf(_("non-numeric offset argument -- %s\n"),
 				argv[optind]);
-			return 0;
+			goto seterror;
 		}
 		optind++;
 		length = cvtnum(blocksize, sectsize, argv[optind]);
 		if (length < 0) {
 			printf(_("non-numeric length argument -- %s\n"),
 				argv[optind]);
-			return 0;
+			goto seterror;
 		}
 	} else if (optind != argc) {
-		return command_usage(&fadvise_cmd);
+		goto usage;
 	}
 
 	if (posix_fadvise(file->fd, offset, length, advise) < 0) {
 		perror("fadvise");
-		return 0;
+		goto seterror;
 	}
-	exitcode = 0;
+
+	return 0;
+
+usage:
+	command_usage(&fadvise_cmd);
+seterror:
+	exitcode = 1;
 	return 0;
 }