Message ID | Y8ijpJqtkDTi792i@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 590b63673747597ab458cb4414e68bb7c4a66aea |
Series | hash-object: fix descriptor leak with --literally |
Jeff King <peff@peff.net> writes:

> In hash_object(), we open a descriptor for each file to hash (whether we
> got the filename from the command line or --stdin-paths), but never
> close it. For the traditional code path which feeds the result to
> index_fd(), this is OK; it closes the descriptor for us.
>
> But 5ba9a93b39 (hash-object: add --literally option, 2014-09-11) a
> second code path which does not close the descriptor.

A sentence without verb? "5ba9 (hash-...) added a second code path,
which does not close the descriptor." or something?

> After this patch, it completes successfully. I didn't bother with a
> test, as it's a pain to deal with descriptor limits portably, and the
> fix is so trivial.

True. Will queue. Thanks.

> I do think the world would be less confusing if index_fd() didn't close
> the descriptor we pass it, and then hash_file() could just do:
>
>   fd = open();
>   hash_fd(fd);
>   close(fd);
>
> which is much more readable. But it has many other callers. So even if
> we wanted to untangle all that, I think it makes sense to do this
> obvious fix in the meantime.

Indeed, thanks.

>  builtin/hash-object.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> index b506381502..44db83f07f 100644
> --- a/builtin/hash-object.c
> +++ b/builtin/hash-object.c
> @@ -27,6 +27,7 @@ static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
>  	else
>  		ret = write_object_file_literally(buf.buf, buf.len, type, oid,
>  						 flags);
> +	close(fd);
>  	strbuf_release(&buf);
>  	return ret;
>  }
On Wed, Jan 18, 2023 at 10:26:40PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > In hash_object(), we open a descriptor for each file to hash (whether we
> > got the filename from the command line or --stdin-paths), but never
> > close it. For the traditional code path which feeds the result to
> > index_fd(), this is OK; it closes the descriptor for us.
> >
> > But 5ba9a93b39 (hash-object: add --literally option, 2014-09-11) a
> > second code path which does not close the descriptor.
>
> A sentence without verb? "5ba9 (hash-...) added a second code path,
> which does not close the descriptor." or something?

Yes, the missing word was "added". Thanks.

-Peff
diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index b506381502..44db83f07f 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -27,6 +27,7 @@ static int hash_literally(struct object_id *oid, int fd, const char *type, unsig
 	else
 		ret = write_object_file_literally(buf.buf, buf.len, type, oid,
 						 flags);
+	close(fd);
 	strbuf_release(&buf);
 	return ret;
 }
In hash_object(), we open a descriptor for each file to hash (whether we
got the filename from the command line or --stdin-paths), but never
close it. For the traditional code path which feeds the result to
index_fd(), this is OK; it closes the descriptor for us.

But 5ba9a93b39 (hash-object: add --literally option, 2014-09-11) a
second code path which does not close the descriptor. There we need to
do so ourselves.

You can see the problem in a clone of git.git like this:

  $ git ls-files -s |
    grep ^100644 |
    cut -f2 |
    git hash-object --stdin-paths --literally >/dev/null
  fatal: could not open 'builtin/var.c' for reading: Too many open files

After this patch, it completes successfully. I didn't bother with a
test, as it's a pain to deal with descriptor limits portably, and the
fix is so trivial.

Signed-off-by: Jeff King <peff@peff.net>
---
Something I ran into while testing my hash-object fsck series, but I
broke it off here because it's really an independent bug-fix.

I do think the world would be less confusing if index_fd() didn't close
the descriptor we pass it, and then hash_file() could just do:

  fd = open();
  hash_fd(fd);
  close(fd);

which is much more readable. But it has many other callers. So even if
we wanted to untangle all that, I think it makes sense to do this
obvious fix in the meantime.

 builtin/hash-object.c | 1 +
 1 file changed, 1 insertion(+)
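
For illustration only, here is a minimal standalone sketch of the "caller opens, caller closes" shape described in the note above. It is not git code: consume_fd() and consume_path() are hypothetical names, and counting bytes is a stand-in for the real hashing step. The only point is that the helper borrows the descriptor and the function that called open() is also the one that calls close().

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the hashing step: reads fd to EOF and
 * returns the number of bytes seen, or -1 on a read error.
 * It only borrows fd and never closes it.
 */
long consume_fd(int fd)
{
	char buf[4096];
	long total = 0;
	ssize_t n;

	assert(fd >= 0);
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	return n < 0 ? -1 : total;
}

/*
 * Hypothetical caller: open, use, and close all happen in one place,
 * so no code path can leave the descriptor open.
 */
long consume_path(const char *path)
{
	int fd = open(path, O_RDONLY);
	long ret;

	if (fd < 0)
		return -1;
	ret = consume_fd(fd);
	close(fd);
	return ret;
}
```

With this shape, calling consume_path() on thousands of paths in a row should not run into the "Too many open files" failure shown above, because each descriptor is released before the next one is opened.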