diff mbox series

sparse-checkout: use fdopen_lock_file() instead of xfdopen()

Message ID 20240905082749.GA88220@coredump.intra.peff.net (mailing list archive)
State New
Headers show
Series sparse-checkout: use fdopen_lock_file() instead of xfdopen() | expand

Commit Message

Jeff King Sept. 5, 2024, 8:27 a.m. UTC
When updating sparse patterns, we open a lock_file to write out the new
data. The lock_file struct holds the file descriptor, but we call
fdopen() to get a stdio handle to do the actual write.

After we finish writing, we fflush() so that all of the data is on disk,
and then call commit_lock_file() which closes the descriptor. But we
never fclose() the stdio handle, leaking it.

The obvious solution seems like it would be to just call fclose(). But
when? If we do it before commit_lock_file(), then the lock_file code is
left thinking it owns the now-closed file descriptor, and will do an
extra close() on the descriptor. But if we do it before, we have the
opposite problem: the lock_file code will close the descriptor, and
fclose() will do the extra close().

We can handle this correctly by using fdopen_lock_file(). That leaves
ownership of the stdio handle with the lock_file, which knows not to
double-close it.

We do have to adjust the code a bit:

  - we have to handle errors ourselves; we can just die(), since that's
    what xfdopen() would have done (and we can even provide a more
    specific error message).

  - we no longer need to call fflush(); committing the lock-file
    auto-closes it, which will now do the flush for us. As a bonus, this
    will actually check that the flush was successful before renaming
    the file into place. Let's likewise report when committing the lock
    fails (rather than quietly returning success from the command).

  - we can get rid of the local "fd" variable, since we never look at it
    ourselves now

Signed-off-by: Jeff King <peff@peff.net>
---
Curiously, this is not reported as a leak by LSan, and it is included in
the "reachable" set by valgrind. I imagine this is because libc has to
maintain a list of open writable handles, since fflush(NULL) is supposed
to flush all of them. I peeked at other fdopen(..., "w") calls to see if
there were other similar spots, but didn't notice any.

I found this because I was building git on an Android system, and they
have an "fdsan" that complains when the underlying descriptor of a
handle is closed. It flagged some other spots, too, but the rest that I
looked at involved the tempfile atexit() handler closing the descriptors
manually (we don't care about flushing there, as our goal is to just
prevent the "can't delete an open file" problem on Windows).

 builtin/sparse-checkout.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Patrick Steinhardt Sept. 5, 2024, 10:54 a.m. UTC | #1
On Thu, Sep 05, 2024 at 04:27:49AM -0400, Jeff King wrote:
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 2604ab04df..f1bd31b2f7 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -336,8 +335,7 @@ static int write_patterns_and_update(struct pattern_list *pl)
>  	if (safe_create_leading_directories(sparse_filename))
>  		die(_("failed to create directory for sparse-checkout file"));
>  
> -	fd = hold_lock_file_for_update(&lk, sparse_filename,
> -				      LOCK_DIE_ON_ERROR);
> +	hold_lock_file_for_update(&lk, sparse_filename, LOCK_DIE_ON_ERROR);
>  	free(sparse_filename);
>  
>  	result = update_working_directory(pl);

Okay, we die on an error so there is no need to check `fd`, either.
Makes sense.

> @@ -348,15 +346,17 @@ static int write_patterns_and_update(struct pattern_list *pl)
>  		return result;
>  	}
>  
> -	fp = xfdopen(fd, "w");
> +	fp = fdopen_lock_file(&lk, "w");
> +	if (!fp)
> +		die_errno(_("unable to fdopen %s"), get_lock_file_path(&lk));
>  
>  	if (core_sparse_checkout_cone)
>  		write_cone_to_file(fp, pl);
>  	else
>  		write_patterns_to_file(fp, pl);
>  
> -	fflush(fp);
> -	commit_lock_file(&lk);
> +	if (commit_lock_file(&lk))
> +		die_errno(_("unable to write %s"), get_locked_file_path(&lk));
>  
>  	clear_pattern_list(pl);

I think the error handling is broken. `commit_lock_file()` calls
`rename_tempfile()`, which deletes the temporary file even in the error
case. The consequence is that `lk->tempfile` will be set to the `NULL`
pointer. When we call `get_locked_file_path()` we then dereference it
unconditionally and would thus segfault.

Other than that this patch looks sensible. Also kind of curious that we
never checked that `commit_lock_file()` was successful, so we wouldn't
even notice when renaming the file into place failed.

Patrick
Junio C Hamano Sept. 5, 2024, 3:16 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> We do have to adjust the code a bit:
>
>   - we have to handle errors ourselves; we can just die(), since that's
>     what xfdopen() would have done (and we can even provide a more
>     specific error message).
>
>   - we no longer need to call fflush(); committing the lock-file
>     auto-closes it, which will now do the flush for us. As a bonus, this
>     will actually check that the flush was successful before renaming
>     the file into place. Let's likewise report when committing the lock
>     fails (rather than quietly returning success from the command).
>
>   - we can get rid of the local "fd" variable, since we never look at it
>     ourselves now

OK.  The neessary change is surprisingly small.

> I found this because I was building git on an Android system,...

Sounds like fun.

Will queue.  Thanks.
Junio C Hamano Sept. 5, 2024, 9:16 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

>> +	if (commit_lock_file(&lk))
>> +		die_errno(_("unable to write %s"), get_locked_file_path(&lk));
>>  
>>  	clear_pattern_list(pl);
>
> I think the error handling is broken. `commit_lock_file()` calls
> `rename_tempfile()`, which deletes the temporary file even in the error
> case. The consequence is that `lk->tempfile` will be set to the `NULL`
> pointer. When we call `get_locked_file_path()` we then dereference it
> unconditionally and would thus segfault.

Hmph.  Would this be sufficient as a band-aid, then?

 builtin/sparse-checkout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git c/builtin/sparse-checkout.c w/builtin/sparse-checkout.c
index f1bd31b2f7..60363fd056 100644
--- c/builtin/sparse-checkout.c
+++ w/builtin/sparse-checkout.c
@@ -356,7 +356,7 @@ static int write_patterns_and_update(struct pattern_list *pl)
 		write_patterns_to_file(fp, pl);
 
 	if (commit_lock_file(&lk))
-		die_errno(_("unable to write %s"), get_locked_file_path(&lk));
+		die_errno(_("unable to write %s"), sparse_filename);
 
 	clear_pattern_list(pl);
Junio C Hamano Sept. 5, 2024, 9:20 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>>> +	if (commit_lock_file(&lk))
>>> +		die_errno(_("unable to write %s"), get_locked_file_path(&lk));
>>>  
>>>  	clear_pattern_list(pl);
>>
>> I think the error handling is broken. `commit_lock_file()` calls
>> `rename_tempfile()`, which deletes the temporary file even in the error
>> case. The consequence is that `lk->tempfile` will be set to the `NULL`
>> pointer. When we call `get_locked_file_path()` we then dereference it
>> unconditionally and would thus segfault.
>
> Hmph.  Would this be sufficient as a band-aid, then?

Of course not.  That would refer to a piece of memory that we
already free'ed in this function.

Perhaps like this?

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index f1bd31b2f7..27d181a612 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -328,7 +328,7 @@ static int write_patterns_and_update(struct pattern_list *pl)
 	char *sparse_filename;
 	FILE *fp;
 	struct lock_file lk = LOCK_INIT;
-	int result;
+	int result = 0;
 
 	sparse_filename = get_sparse_checkout_filename();
 
@@ -336,19 +336,18 @@ static int write_patterns_and_update(struct pattern_list *pl)
 		die(_("failed to create directory for sparse-checkout file"));
 
 	hold_lock_file_for_update(&lk, sparse_filename, LOCK_DIE_ON_ERROR);
-	free(sparse_filename);
 
 	result = update_working_directory(pl);
 	if (result) {
 		rollback_lock_file(&lk);
 		clear_pattern_list(pl);
 		update_working_directory(NULL);
-		return result;
+		goto out;
 	}
 
 	fp = fdopen_lock_file(&lk, "w");
 	if (!fp)
-		die_errno(_("unable to fdopen %s"), get_lock_file_path(&lk));
+		die_errno(_("unable to fdopen %s"), sparse_filename);
 
 	if (core_sparse_checkout_cone)
 		write_cone_to_file(fp, pl);
@@ -356,11 +355,13 @@ static int write_patterns_and_update(struct pattern_list *pl)
 		write_patterns_to_file(fp, pl);
 
 	if (commit_lock_file(&lk))
-		die_errno(_("unable to write %s"), get_locked_file_path(&lk));
+		die_errno(_("unable to write %s"), sparse_filename);
 
 	clear_pattern_list(pl);
 
-	return 0;
+out:
+	free(sparse_filename);
+	return result;
 }
 
 enum sparse_checkout_mode {
Jeff King Sept. 6, 2024, 1:19 a.m. UTC | #5
On Thu, Sep 05, 2024 at 02:20:18PM -0700, Junio C Hamano wrote:

> >> I think the error handling is broken. `commit_lock_file()` calls
> >> `rename_tempfile()`, which deletes the temporary file even in the error
> >> case. The consequence is that `lk->tempfile` will be set to the `NULL`
> >> pointer. When we call `get_locked_file_path()` we then dereference it
> >> unconditionally and would thus segfault.
> >
> > Hmph.  Would this be sufficient as a band-aid, then?
> 
> Of course not.  That would refer to a piece of memory that we
> already free'ed in this function.
> 
> Perhaps like this?

That works, though...

>  	fp = fdopen_lock_file(&lk, "w");
>  	if (!fp)
> -		die_errno(_("unable to fdopen %s"), get_lock_file_path(&lk));
> +		die_errno(_("unable to fdopen %s"), sparse_filename);
>  
>  	if (core_sparse_checkout_cone)
>  		write_cone_to_file(fp, pl);
> @@ -356,11 +355,13 @@ static int write_patterns_and_update(struct pattern_list *pl)
>  		write_patterns_to_file(fp, pl);
>  
>  	if (commit_lock_file(&lk))
> -		die_errno(_("unable to write %s"), get_locked_file_path(&lk));
> +		die_errno(_("unable to write %s"), sparse_filename);

Note the difference between "get_lock" and "get_locked" in these two.
The first will mention the tempfile name, and the second the destination
filename (and sparse_filename is the latter).

I don't know that it really matters much in practice, though. If
fdopen() fails it probably has nothing to do with the file itself, and
is either lack of memory or a programming bug (e.g., bogus descriptor).

I'll pick up this change, but I'll split the whole error-handling fix
into its own patch, since it's getting more complicated.

Will send v2 later tonight. Thanks, Patrick, for noticing the problem in
the first place.

-Peff
Junio C Hamano Sept. 6, 2024, 2:55 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

>>  	fp = fdopen_lock_file(&lk, "w");
>>  	if (!fp)
>> -		die_errno(_("unable to fdopen %s"), get_lock_file_path(&lk));
>> +		die_errno(_("unable to fdopen %s"), sparse_filename);
>>  
>>  	if (core_sparse_checkout_cone)
>>  		write_cone_to_file(fp, pl);
>> @@ -356,11 +355,13 @@ static int write_patterns_and_update(struct pattern_list *pl)
>>  		write_patterns_to_file(fp, pl);
>>  
>>  	if (commit_lock_file(&lk))
>> -		die_errno(_("unable to write %s"), get_locked_file_path(&lk));
>> +		die_errno(_("unable to write %s"), sparse_filename);
>
> Note the difference between "get_lock" and "get_locked" in these two.
> The first will mention the tempfile name, and the second the destination
> filename (and sparse_filename is the latter).

I did consider ... to write %s.%s", sparse_filename, LOCK_SUFFIX)
but then thought that the final name is what is more relevant to the
end user.  Yes, while correcting unrelated error, I shouldn't have
tried to improve unrelated end-user experience ;-).

> Will send v2 later tonight. Thanks, Patrick, for noticing the problem in
> the first place.

Yeah, thanks, both.
diff mbox series

Patch

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 2604ab04df..f1bd31b2f7 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -327,7 +327,6 @@  static int write_patterns_and_update(struct pattern_list *pl)
 {
 	char *sparse_filename;
 	FILE *fp;
-	int fd;
 	struct lock_file lk = LOCK_INIT;
 	int result;
 
@@ -336,8 +335,7 @@  static int write_patterns_and_update(struct pattern_list *pl)
 	if (safe_create_leading_directories(sparse_filename))
 		die(_("failed to create directory for sparse-checkout file"));
 
-	fd = hold_lock_file_for_update(&lk, sparse_filename,
-				      LOCK_DIE_ON_ERROR);
+	hold_lock_file_for_update(&lk, sparse_filename, LOCK_DIE_ON_ERROR);
 	free(sparse_filename);
 
 	result = update_working_directory(pl);
@@ -348,15 +346,17 @@  static int write_patterns_and_update(struct pattern_list *pl)
 		return result;
 	}
 
-	fp = xfdopen(fd, "w");
+	fp = fdopen_lock_file(&lk, "w");
+	if (!fp)
+		die_errno(_("unable to fdopen %s"), get_lock_file_path(&lk));
 
 	if (core_sparse_checkout_cone)
 		write_cone_to_file(fp, pl);
 	else
 		write_patterns_to_file(fp, pl);
 
-	fflush(fp);
-	commit_lock_file(&lk);
+	if (commit_lock_file(&lk))
+		die_errno(_("unable to write %s"), get_locked_file_path(&lk));
 
 	clear_pattern_list(pl);