diff mbox series

[RFC,v2,12/12] clean: fix theoretical path corruption

Message ID 20190905154735.29784-13-newren@gmail.com (mailing list archive)
State New, archived
Headers show
Series Fix some git clean issues | expand

Commit Message

Elijah Newren Sept. 5, 2019, 3:47 p.m. UTC
cmd_clean() had the following code structure:

    struct strbuf abs_path = STRBUF_INIT;
    for_each_string_list_item(item, &del_list) {
        strbuf_addstr(&abs_path, prefix);
        strbuf_addstr(&abs_path, item->string);
        PROCESS(&abs_path);
        strbuf_reset(&abs_path);
    }

where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
represents a big chunk of code rather than an actual function call.  One
piece of PROCESS was:

    if (lstat(abs_path.buf, &st))
        continue;

which would cause the strbuf_reset() to be missed -- meaning that the
next path to be handled would have two paths concatenated.  This path
used to use die_errno() instead of continue prior to commit 396049e5fb62
("git-clean: refactor git-clean into two phases", 2013-06-25), but my
understanding of how correct_untracked_entries() works is that it will
prevent both dir/ and dir/file from being in the list to clean so this
should be dead code and the die_errno() should be safe.  But I hesitate
to remove it since I am not certain.  Instead, just fix it to avoid path
corruption in case it is possible to reach this continue statement.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

SZEDER Gábor Sept. 5, 2019, 7:27 p.m. UTC | #1
On Thu, Sep 05, 2019 at 08:47:35AM -0700, Elijah Newren wrote:
> cmd_clean() had the following code structure:
> 
>     struct strbuf abs_path = STRBUF_INIT;
>     for_each_string_list_item(item, &del_list) {
>         strbuf_addstr(&abs_path, prefix);
>         strbuf_addstr(&abs_path, item->string);
>         PROCESS(&abs_path);
>         strbuf_reset(&abs_path);
>     }
> 
> where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
> represents a big chunk of code rather than an actual function call.  One
> piece of PROCESS was:
> 
>     if (lstat(abs_path.buf, &st))
>         continue;
> 
> which would cause the strbuf_reset() to be missed -- meaning that the
> next path to be handled would have two paths concatenated.  This path
> used to use die_errno() instead of continue prior to commit 396049e5fb62
> ("git-clean: refactor git-clean into two phases", 2013-06-25), but my
> understanding of how correct_untracked_entries() works is that it will
> prevent both dir/ and dir/file from being in the list to clean so this
> should be dead code and the die_errno() should be safe.  But I hesitate
> to remove it since I am not certain.  Instead, just fix it to avoid path
> corruption in case it is possible to reach this continue statement.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  builtin/clean.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clean.c b/builtin/clean.c
> index 6030842f3a..ccb6e23f0b 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -1028,8 +1028,10 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
>  		 * recursive directory removal, so lstat() here could
>  		 * fail with ENOENT.
>  		 */
> -		if (lstat(abs_path.buf, &st))
> +		if (lstat(abs_path.buf, &st)) {
> +			strbuf_reset(&abs_path);
>  			continue;
> +		}

I wonder whether it would be safer to call strbuf_reset() at the start
of each loop iteration instead of before 'continue'.  That way we
wouldn't have to worry about another 'continue' statements forgetting
about it.

It probably doesn't really matter in this particular case (considering
that it's potentially dead code to begin with), but have a look at
e.g. diff.c:show_stats() and its several strbuf_reset(&out) calls
preceeding continue statements.

>  		if (S_ISDIR(st.st_mode)) {
>  			if (remove_dirs(&abs_path, prefix, rm_flags, dry_run, quiet, &gone))
> -- 
> 2.22.1.11.g45a39ee867
>
Elijah Newren Sept. 7, 2019, 12:34 a.m. UTC | #2
On Thu, Sep 5, 2019 at 12:27 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Thu, Sep 05, 2019 at 08:47:35AM -0700, Elijah Newren wrote:
> > cmd_clean() had the following code structure:
> >
> >     struct strbuf abs_path = STRBUF_INIT;
> >     for_each_string_list_item(item, &del_list) {
> >         strbuf_addstr(&abs_path, prefix);
> >         strbuf_addstr(&abs_path, item->string);
> >         PROCESS(&abs_path);
> >         strbuf_reset(&abs_path);
> >     }
> >
> > where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
> > represents a big chunk of code rather than an actual function call.  One
> > piece of PROCESS was:
> >
> >     if (lstat(abs_path.buf, &st))
> >         continue;
> >
> > which would cause the strbuf_reset() to be missed -- meaning that the
> > next path to be handled would have two paths concatenated.  This path
> > used to use die_errno() instead of continue prior to commit 396049e5fb62
> > ("git-clean: refactor git-clean into two phases", 2013-06-25), but my
> > understanding of how correct_untracked_entries() works is that it will
> > prevent both dir/ and dir/file from being in the list to clean so this
> > should be dead code and the die_errno() should be safe.  But I hesitate
> > to remove it since I am not certain.  Instead, just fix it to avoid path
> > corruption in case it is possible to reach this continue statement.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  builtin/clean.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/clean.c b/builtin/clean.c
> > index 6030842f3a..ccb6e23f0b 100644
> > --- a/builtin/clean.c
> > +++ b/builtin/clean.c
> > @@ -1028,8 +1028,10 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
> >                * recursive directory removal, so lstat() here could
> >                * fail with ENOENT.
> >                */
> > -             if (lstat(abs_path.buf, &st))
> > +             if (lstat(abs_path.buf, &st)) {
> > +                     strbuf_reset(&abs_path);
> >                       continue;
> > +             }
>
> I wonder whether it would be safer to call strbuf_reset() at the start
> of each loop iteration instead of before 'continue'.  That way we
> wouldn't have to worry about another 'continue' statements forgetting
> about it.
>
> It probably doesn't really matter in this particular case (considering
> that it's potentially dead code to begin with), but have a look at
> e.g. diff.c:show_stats() and its several strbuf_reset(&out) calls
> preceeding continue statements.

Ooh, I like that idea.  I think I'll apply that here.  I'll probably
leave diff.c:show_stats() as #leftoverbits for someone else, though I
really like the idea of fixing up other issues like this as you
suggest.
diff mbox series

Patch

diff --git a/builtin/clean.c b/builtin/clean.c
index 6030842f3a..ccb6e23f0b 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -1028,8 +1028,10 @@  int cmd_clean(int argc, const char **argv, const char *prefix)
 		 * recursive directory removal, so lstat() here could
 		 * fail with ENOENT.
 		 */
-		if (lstat(abs_path.buf, &st))
+		if (lstat(abs_path.buf, &st)) {
+			strbuf_reset(&abs_path);
 			continue;
+		}
 
 		if (S_ISDIR(st.st_mode)) {
 			if (remove_dirs(&abs_path, prefix, rm_flags, dry_run, quiet, &gone))