diff mbox series

var: Do not add 1 to return value of strchrnul

Message ID Y7P33bSfbmpe7EI/@gondor.apana.org.au (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show
Series var: Do not add 1 to return value of strchrnul | expand

Commit Message

Herbert Xu Jan. 3, 2023, 9:39 a.m. UTC
наб <nabijaczleweli@nabijaczleweli.xyz> wrote:
>
> --- a/src/var.c
> +++ b/src/var.c
> @@ -266,7 +266,7 @@ struct var *setvareq(char *s, int flags)
>                        goto out;
> 
>                if (vp->func && (flags & VNOFUNC) == 0)
> -                       (*vp->func)(strchrnul(s, '=') + 1);
> +                       (*vp->func)(strchrnul(s, '=') + 1, flags);

Yes this was definitely broken.  strchrnul returns a pointer to the
final NUL character so adding one to it is just wrong.

However, I don't think we need to pass the flags to the action
function since none of them care about whether it's unset.  We
just need to pass a pointer to an empty string rather than some
bogus pointer.

---8<---
When a variable like OPTIND is unset dash may call the action
function with a bogus pointer because it tries to add one to
the return value of strchrnul unconditionally.

Use strchr and nullstr instead.

Link: https://bugs.debian.org/985478
Reported-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Ahelenia Ziemiańska Jan. 3, 2023, 12:26 p.m. UTC | #1
Hi!

On Tue, Jan 03, 2023 at 05:39:41PM +0800, Herbert Xu wrote:
> наб <nabijaczleweli@nabijaczleweli.xyz> wrote:
> > --- a/src/var.c
> > +++ b/src/var.c
> > @@ -266,7 +266,7 @@ struct var *setvareq(char *s, int flags)
> >                        goto out;
> > 
> >                if (vp->func && (flags & VNOFUNC) == 0)
> > -                       (*vp->func)(strchrnul(s, '=') + 1);
> > +                       (*vp->func)(strchrnul(s, '=') + 1, flags);
> Yes this was definitely broken.  strchrnul returns a pointer to the
> final NUL character so adding one to it is just wrong.
That originally gave me pause but I assumed this is an intentional
muddling for a known memory lay-out or w/e,
but that explains the corruption I saw sometimes, yeah.

> However, I don't think we need to pass the flags to the action
> function since none of them care about whether it's unset.  We
> just need to pass a pointer to an empty string rather than some
> bogus pointer.
This, unsurprisingly, produces the same error, because getoptsreset()
needs a valid number for OPTIND:
  $ src/dash -c 'unset OPTIND'
  src/dash: 1: unset: Illegal number:
and the empty string isn't a valid number
(in the simple case of src/dash -c 'unset OPTIND' &a. the original
 broken approach also produced an empty string on accident
 in most cases for me).

We do need to at least to /some/how signal to getoptsreset() that we're
unsetting while preserving the requirement that OPTIND is an integer,
and passing a flag is the only sensible way, I think.

Scissor-patch rebased on top of your v2 below.

Best,
-- >8 --
Subject: [PATCH] options: don't error when unsetting OPTIND

unset OPTIND ends up calling getoptsreset("") which errors out with
  sh: 1: unset: Illegal number:

Pass the current flags to struct var->func, set the getopts optind to 1
and continue with allowing the unset.

We still forbid OPTIND=, OPTIND=-1, OPTIND=abc, &c.

Fixes: https://bugs.debian.org/985478
---
 src/exec.c    | 2 +-
 src/exec.h    | 2 +-
 src/mail.c    | 2 +-
 src/mail.h    | 2 +-
 src/options.c | 5 ++---
 src/options.h | 2 +-
 src/var.c     | 4 ++--
 src/var.h     | 2 +-
 8 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/exec.c b/src/exec.c
index 87354d4..68fa8ab 100644
--- a/src/exec.c
+++ b/src/exec.c
@@ -565,7 +565,7 @@ hashcd(void)
  */
 
 void
-changepath(const char *newval)
+changepath(const char *newval, int unused)
 {
 	const char *new;
 	int idx;
diff --git a/src/exec.h b/src/exec.h
index 423b07e..0f74be4 100644
--- a/src/exec.h
+++ b/src/exec.h
@@ -69,7 +69,7 @@ int hashcmd(int, char **);
 void find_command(char *, struct cmdentry *, int, const char *);
 struct builtincmd *find_builtin(const char *);
 void hashcd(void);
-void changepath(const char *);
+void changepath(const char *, int);
 #ifdef notdef
 void getcmdentry(char *, struct cmdentry *);
 #endif
diff --git a/src/mail.c b/src/mail.c
index 8eacb2d..e81d2b4 100644
--- a/src/mail.c
+++ b/src/mail.c
@@ -109,7 +109,7 @@ chkmail(void)
 
 
 void
-changemail(const char *val)
+changemail(const char *val, int unused)
 {
 	changed++;
 }
diff --git a/src/mail.h b/src/mail.h
index 3c6b21d..70b54a4 100644
--- a/src/mail.h
+++ b/src/mail.h
@@ -35,4 +35,4 @@
  */
 
 void chkmail(void);
-void changemail(const char *);
+void changemail(const char *, int);
diff --git a/src/options.c b/src/options.c
index a46c23b..81f2c4b 100644
--- a/src/options.c
+++ b/src/options.c
@@ -390,10 +390,9 @@ setcmd(int argc, char **argv)
 
 
 void
-getoptsreset(value)
-	const char *value;
+getoptsreset(const char *value, int flags)
 {
-	shellparam.optind = number(value) ?: 1;
+	shellparam.optind = (flags & VUNSET) ? 1 : number(value) ?: 1;
 	shellparam.optoff = -1;
 }
 
diff --git a/src/options.h b/src/options.h
index 975fe33..10bcb88 100644
--- a/src/options.h
+++ b/src/options.h
@@ -83,4 +83,4 @@ int shiftcmd(int, char **);
 int setcmd(int, char **);
 int getoptscmd(int, char **);
 int nextopt(const char *);
-void getoptsreset(const char *);
+void getoptsreset(const char *, int);
diff --git a/src/var.c b/src/var.c
index b70d72c..13a946b 100644
--- a/src/var.c
+++ b/src/var.c
@@ -270,7 +270,7 @@ struct var *setvareq(char *s, int flags)
 			goto out;
 
 		if (vp->func && (flags & VNOFUNC) == 0)
-			(*vp->func)(varnull(s));
+			(*vp->func)(varnull(s), flags);
 
 		if ((vp->flags & (VTEXTFIXED|VSTACK)) == 0)
 			ckfree(vp->text);
@@ -535,7 +535,7 @@ poplocalvars(void)
 			unsetvar(vp->text);
 		} else {
 			if (vp->func)
-				(*vp->func)(varnull(lvp->text));
+				(*vp->func)(varnull(lvp->text), lvp->flags);
 			if ((vp->flags & (VTEXTFIXED|VSTACK)) == 0)
 				ckfree(vp->text);
 			vp->flags = lvp->flags;
diff --git a/src/var.h b/src/var.h
index aa7575a..4329e22 100644
--- a/src/var.h
+++ b/src/var.h
@@ -56,7 +56,7 @@ struct var {
 	struct var *next;		/* next entry in hash list */
 	int flags;			/* flags are defined above */
 	const char *text;		/* name=value */
-	void (*func)(const char *);
+	void (*func)(const char *, int flags);
 					/* function to be called when  */
 					/* the variable gets set/unset */
 };
Herbert Xu Jan. 4, 2023, 9:59 a.m. UTC | #2
On Tue, Jan 03, 2023 at 01:26:19PM +0100, наб wrote:
>
> > However, I don't think we need to pass the flags to the action
> > function since none of them care about whether it's unset.  We
> > just need to pass a pointer to an empty string rather than some
> > bogus pointer.
> This, unsurprisingly, produces the same error, because getoptsreset()
> needs a valid number for OPTIND:
>   $ src/dash -c 'unset OPTIND'
>   src/dash: 1: unset: Illegal number:

You get the same result with OPTIND="" so I don't see the issue.

Am I missing something with respect to POSIX?

Thanks,
Ahelenia Ziemiańska Jan. 4, 2023, 12:54 p.m. UTC | #3
Hi!

On Wed, Jan 04, 2023 at 05:59:47PM +0800, Herbert Xu wrote:
> On Tue, Jan 03, 2023 at 01:26:19PM +0100, наб wrote:
> > > However, I don't think we need to pass the flags to the action
> > > function since none of them care about whether it's unset.  We
> > > just need to pass a pointer to an empty string rather than some
> > > bogus pointer.
> > This, unsurprisingly, produces the same error, because getoptsreset()
> > needs a valid number for OPTIND:
> >   $ src/dash -c 'unset OPTIND'
> >   src/dash: 1: unset: Illegal number:
> You get the same result with OPTIND="" so I don't see the issue.
Which is an unrelated operation, and obviously nonsensical.
It's good that it fails.

> Am I missing something with respect to POSIX?
AFAICT, after trawling through the Issue 8 Draft 2.1,
there are no special provisions for unset/OPTIND,
like there are for readonly.
(See the other branch of this thread, but to summarise:
 setting LINENO, OLDPWD, OPTARG, OPTIND, or PWD readonly /must/ fail in:
 (a) readonly xor (b) the builtin that edits it xor (c) on assignment.)
This indicates that "unset OPTIND" must behave as-specified,
i.e. unset OPTIND and remove it from the environment.

In dash, this really only affects OPTIND, since the only other special
variable that parses an int is sethistsize(), which explicitly has
-- >8 --
        if (hist != NULL) {
                if (hs == NULL || *hs == '\0' ||
                   (histsize = atoi(hs)) < 0)
                        histsize = 100;
                history(hist, &he, H_SETSIZE, histsize);
        }
-- >8 --
which considers HISTSIZE= and unset HISTSIZE to be equivalent;
the only other setter function that inspects the new value is for PATH,
and working out to the empty string works for that case.

We could also pass NULL when unsetting, I guess, since sethistsize() is
already equipped for it, if you prefer to not have a flags argument?

Best,
наб
Herbert Xu Jan. 4, 2023, 2:12 p.m. UTC | #4
On Wed, Jan 04, 2023 at 01:54:21PM +0100, наб wrote:
> 
> AFAICT, after trawling through the Issue 8 Draft 2.1,
> there are no special provisions for unset/OPTIND,

Well it says this regarding OPTIND:

: If the application sets OPTIND to the value 1, a new set of parameters
: can be used: either the current positional parameters or new arg
: values. Any other attempt to invoke getopts multiple times in a single
: shell execution environment with parameters (positional parameters
: or arg operands) that are not the same in all invocations, or with an
: OPTIND value modified to be a value other than 1, produces unspecified
: results.

Unsetting OPTIND ought to do the same thing as setting it to "",
which produces unspecified results and I see nothing wrong with
printing an error in this case.

Cheers,
Ahelenia Ziemiańska Jan. 4, 2023, 4:05 p.m. UTC | #5
Hi!

On Wed, Jan 04, 2023 at 10:12:36PM +0800, Herbert Xu wrote:
> On Wed, Jan 04, 2023 at 01:54:21PM +0100, наб wrote:
> > AFAICT, after trawling through the Issue 8 Draft 2.1,
> > there are no special provisions for unset/OPTIND,
> Well it says this regarding OPTIND:
> 
> : If the application sets OPTIND to the value 1, a new set of parameters
> : can be used: either the current positional parameters or new arg
> : values. Any other attempt to invoke getopts multiple times in a single
> : shell execution environment with parameters (positional parameters
> : or arg operands) that are not the same in all invocations, or with an
> : OPTIND value modified to be a value other than 1, produces unspecified
> : results.
> 
> Unsetting OPTIND ought to do the same thing as setting it to "",
On the basis of? Those are fundamentally unrelated operations,
and when they are to do the same thing, this is explicitly noted.

Additionally, this is a description of /getopts/, and therefore only
applies if you do "OPTIND=dupa; getopts ..." ‒ the getopts call is
unspecified.

/Any/ operation on OPTIND is legal, and the current behaviour is wrong
(but it seems that no-one has actually cared) ‒ POSIX defines this in
terms of getopts inspecting OPTIND when it's run.

"OPTIND=whatever" is legal and must result in OPTIND becoming whatever.
bash, ksh, and zsh all reduce it to OPTIND=0, so it appears that no-one
actually cares about this so erroring out is fine, I guess.
(mksh does the same but folds to 1; posh refuses the assignment),

"unset OPTIND" is legal, and subsequent getopts calls are well-defined
(it's unclear to me what it oughta do /per se/, but it's not "been set",
 so it's unclear what it's well-defined /to/, but since "the index of
 the next argument to be processed in the shell variable OPTIND" and
 that value doesn't exist, erroring out is fine I guess),
and all the above shells agree
(as does mksh; posh ignores the unset).
and this fell out of a debbug where someone used it
(and additionally cites a few more shells),
so people actually care about this behaviour.

In testing the poster's shells,
I found that yash behaves close to correct here:
-- >8 --
nabijaczleweli@tarta dash 2$ OPTIND=dupa
nabijaczleweli@tarta dash 2$ getopts a a a
getopts: $OPTIND has an invalid value
nabijaczleweli@tarta dash 1 2$ OPTIND=300
nabijaczleweli@tarta dash 2$ getopts a a a
nabijaczleweli@tarta dash 1 2$ unset OPTIND
nabijaczleweli@tarta dash 2$ set | grep -i optind
nabijaczleweli@tarta dash 1 2$ getopts a a a
getopts: $OPTIND has an invalid value
-- >8 --

As does busybox ash:
-- >8 --
~/code/dash $ OPTIND=dupa
~/code/dash $ set | grep -i optind
OPTIND='dupa'
~/code/dash $ getopts a a a
~/code/dash $ set | grep -i optind
OPTIND='1'
~/code/dash $ unset OPTIND
~/code/dash $ set | grep -i optind
_='OPTIND'
~/code/dash $ getopts a a a
~/code/dash $ set | grep -i optind
OPTIND='1'
-- >8 --

The minimal change required to achieve compatibility is my patch,
or some equivalent variant of it, like passing NULL instead of flags.
The minimal change required to achieve conformance would be
to get parse OPTIND in getopts like yash.

There's code in the wild that uses "unset OPTIND; getopts ..."
instead of "OPTIND=1; getopts ...":
  https://codesearch.debian.net/search?q=unset+OPTIND&literal=1&perpkg=1
but those are all bash programs, so it's up to you whether they should
receive explicit provisions for this.

I'm including a PoC scissor-patch below; in particular the ternary
should either be turned into an equivalent of ': "1"' if you want to
support "unset OPTIND; getopts ..." or produce a better error than
"./dash: 1: getopts: Illegal number: unset", but you get the idea
(and shellparam should get fully boolified
 and dash.1 wants to have an explicit note that reset is OPTIND=1,
 but I can follow up with that if we're in agreement here).

Best,
-- >8 --
Subject: [PATCH] options: lazy-parse OPTIND in getopts, per POSIX

This allows "unset OPTIND" and "OPTIND=dupa" to execute as-written
(and as required by POSIX), and catches them as an error in getopts,
which exits 2 ("An error occurred.").

Fixes: https://bugs.debian.org/985478
---
 src/options.c | 30 +++++++++++++++++-------------
 src/options.h |  5 ++++-
 src/var.h     |  1 +
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/src/options.c b/src/options.c
index a46c23b..578a13f 100644
--- a/src/options.c
+++ b/src/options.c
@@ -164,7 +164,7 @@ setarg0:
 	shellparam.p = xargv;
 	shellparam.optind = 1;
 	shellparam.optoff = -1;
-	/* assert(shellparam.malloc == 0 && shellparam.nparam == 0); */
+	/* assert(!shellparam.malloc && !shellparam.optreparse && shellparam.nparam == 0); */
 	while (*xargv) {
 		shellparam.nparam++;
 		xargv++;
@@ -390,11 +390,9 @@ setcmd(int argc, char **argv)
 
 
 void
-getoptsreset(value)
-	const char *value;
+getoptsreset(const char *value)
 {
-	shellparam.optind = number(value) ?: 1;
-	shellparam.optoff = -1;
+	shellparam.optreparse = true;
 }
 
 /*
@@ -408,22 +406,28 @@ int
 getoptscmd(int argc, char **argv)
 {
 	char **optbase;
+	unsigned max;
 
 	if (argc < 3)
 		sh_error("Usage: getopts optstring var [arg]");
 	else if (argc == 3) {
 		optbase = shellparam.p;
-		if ((unsigned)shellparam.optind > shellparam.nparam + 1) {
-			shellparam.optind = 1;
-			shellparam.optoff = -1;
-		}
+		max = shellparam.nparam + 1;
 	}
 	else {
 		optbase = &argv[3];
-		if ((unsigned)shellparam.optind > argc - 2) {
-			shellparam.optind = 1;
-			shellparam.optoff = -1;
-		}
+		max = argc - 2;
+	}
+
+	if (shellparam.optreparse) {
+		shellparam.optind = number(optindset() ? optindval() : "unset") ?: 1;
+		shellparam.optoff = -1;
+		shellparam.optreparse = false;
+	}
+
+	if ((unsigned)shellparam.optind > max) {
+		shellparam.optind = 1;
+		shellparam.optoff = -1;
 	}
 
 	return getopts(argv[1], argv[2], optbase);
diff --git a/src/options.h b/src/options.h
index 975fe33..8aebf9f 100644
--- a/src/options.h
+++ b/src/options.h
@@ -34,9 +34,12 @@
  *	@(#)options.h	8.2 (Berkeley) 5/4/95
  */
 
+#include <stdbool.h>
+
 struct shparam {
 	int nparam;		/* # of positional parameters (without $0) */
-	unsigned char malloc;	/* if parameter list dynamically allocated */
+	bool malloc;		/* if parameter list dynamically allocated */
+	bool optreparse;	/* re-parse optind/optoff from $OPTIND */
 	char **p;		/* parameter list */
 	int optind;		/* next parameter to be processed by getopts */
 	int optoff;		/* used by getopts */
diff --git a/src/var.h b/src/var.h
index aa7575a..ca90a68 100644
--- a/src/var.h
+++ b/src/var.h
@@ -133,6 +133,7 @@ extern char linenovar[];
 #define attyset()	((vatty.flags & VUNSET) == 0)
 #endif
 #define mpathset()	((vmpath.flags & VUNSET) == 0)
+#define optindset()	((voptind.flags & VUNSET) == 0)
 
 void initvar(void);
 struct var *setvar(const char *name, const char *val, int flags);
Harald van Dijk Jan. 4, 2023, 4:50 p.m. UTC | #6
On 04/01/2023 16:05, наб wrote:
> Hi!
> 
> On Wed, Jan 04, 2023 at 10:12:36PM +0800, Herbert Xu wrote:
>> On Wed, Jan 04, 2023 at 01:54:21PM +0100, наб wrote:
>>> AFAICT, after trawling through the Issue 8 Draft 2.1,
>>> there are no special provisions for unset/OPTIND,
>> Well it says this regarding OPTIND:
>>
>> : If the application sets OPTIND to the value 1, a new set of parameters
>> : can be used: either the current positional parameters or new arg
>> : values. Any other attempt to invoke getopts multiple times in a single
>> : shell execution environment with parameters (positional parameters
>> : or arg operands) that are not the same in all invocations, or with an
>> : OPTIND value modified to be a value other than 1, produces unspecified
>> : results.
>>
>> Unsetting OPTIND ought to do the same thing as setting it to "",
> On the basis of? Those are fundamentally unrelated operations,
> and when they are to do the same thing, this is explicitly noted.

On the basis of common sense, keeping in mind that different people's 
ideas of common sense will be different. In this one, I'm tempted to 
agree with Herbert Xu, $OPTIND would expand to an empty string whether 
OPTIND is unset or set to an empty string, assuming set -u is not in 
effect, so for the purpose of the getopts command it would be good for 
it to behave the same way there too.

> Additionally, this is a description of /getopts/, and therefore only
> applies if you do "OPTIND=dupa; getopts ..." ‒ the getopts call is
> unspecified.
> 
> /Any/ operation on OPTIND is legal, and the current behaviour is wrong
> (but it seems that no-one has actually cared) ‒ POSIX defines this in
> terms of getopts inspecting OPTIND when it's run.

Indeed, but dash isn't the only shell that behaves this way (posh does 
too) and I'm not sure whether this is what POSIX intends to specify. I 
agree with your interpretation of what POSIX actually says, but it would 
be good for POSIX to reword it regardless of what is intended to be 
clearer that this case was actually considered.

Personally, I do think it is better if shells allow assigning arbitrary 
values to OPTIND, including unsetting it, and only have the getopts 
command raise an error if the value is non-numeric, but that is my 
personal opinion and I don't see much of a problem if dash decides to go 
the other way, unless POSIX makes it explicit that it is not permitted 
for shells to do this. FWIW, I did make that change for my version, 
<https://github.com/hvdijk/gwsh/commit/0df0ba33748eb3881b07cb724fd4fa54422ef2bc>, 
if that change is desired for dash it is easy to apply.

Cheers,
Harald van Dijk
diff mbox series

Patch

diff --git a/src/var.c b/src/var.c
index ef9c2bd..f42bfd7 100644
--- a/src/var.c
+++ b/src/var.c
@@ -266,7 +266,7 @@  struct var *setvareq(char *s, int flags)
 			goto out;
 
 		if (vp->func && (flags & VNOFUNC) == 0)
-			(*vp->func)(strchrnul(s, '=') + 1);
+			(*vp->func)((strchr(s, '=') ?: nullstr - 1) + 1);
 
 		if ((vp->flags & (VTEXTFIXED|VSTACK)) == 0)
 			ckfree(vp->text);
@@ -531,7 +531,8 @@  poplocalvars(void)
 			unsetvar(vp->text);
 		} else {
 			if (vp->func)
-				(*vp->func)(strchrnul(lvp->text, '=') + 1);
+				(*vp->func)((strchr(lvp->text, '=') ?:
+					     nullstr - 1) + 1);
 			if ((vp->flags & (VTEXTFIXED|VSTACK)) == 0)
 				ckfree(vp->text);
 			vp->flags = lvp->flags;