Message ID: dbef8168bc7044dbc8471c5ecc7146cf3e979263.1635515959.git.gitgitgadget@gmail.com (mailing list archive)
State: New, archived
Series: Allow clean/smudge filters to handle huge files in the LLP64 data model
"Matt Cooper via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
> +void *read_blob_entry(const struct cache_entry *ce, size_t *size)
>  {
>  	enum object_type type;
> -	void *blob_data = read_object_file(&ce->oid, &type, size);
> +	unsigned long ul;
> +	void *blob_data = read_object_file(&ce->oid, &type, &ul);
>
> +	*size = ul;

It is a bit of a curious place to draw the line; we want to make sure
that blob_entry can hold huge data, but in this step we do not mind
that read_object_file() is not capable of going full 64-bit?

I guess I'll see soon enough why by reading later steps.  I can see
that for the purpose of making write_entry() aware of the size_t,
this is necessary at the minimum.

Looking good.

>  	if (blob_data) {
>  		if (type == OBJ_BLOB)
>  			return blob_data;
> @@ -270,7 +272,7 @@ static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca
>  	int fd, ret, fstat_done = 0;
>  	char *new_blob;
>  	struct strbuf buf = STRBUF_INIT;
> -	unsigned long size;
> +	size_t size;
>  	ssize_t wrote;
>  	size_t newsize = 0;
>  	struct stat st;
> diff --git a/entry.h b/entry.h
> index b8c0e170dc7..61ee8c17604 100644
> --- a/entry.h
> +++ b/entry.h
> @@ -51,7 +51,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
>   */
>  void unlink_entry(const struct cache_entry *ce);
>
> -void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
> +void *read_blob_entry(const struct cache_entry *ce, size_t *size);
>  int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
>  void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
>  			   struct stat *st);
> diff --git a/parallel-checkout.c b/parallel-checkout.c
> index 6b1af32bb3d..b6f4a25642e 100644
> --- a/parallel-checkout.c
> +++ b/parallel-checkout.c
> @@ -261,7 +261,7 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
>  	struct stream_filter *filter;
>  	struct strbuf buf = STRBUF_INIT;
>  	char *blob;
> -	unsigned long size;
> +	size_t size;
>  	ssize_t wrote;
>
>  	/* Sanity check */
> diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
> index bff86c13208..8b23d862600 100755
> --- a/t/t1051-large-conversion.sh
> +++ b/t/t1051-large-conversion.sh
> @@ -85,7 +85,7 @@ test_expect_success 'ident converts on output' '
>
>  # This smudge filter prepends 5GB of zeros to the file it checks out. This
>  # ensures that smudging doesn't mangle large files on 64-bit Windows.
> -test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
>  'files over 4GB convert on output' '
>  	test_commit test small "a small file" &&
>  	test_config filter.makelarge.smudge \
Hi Junio,

On Fri, 29 Oct 2021, Junio C Hamano wrote:

> "Matt Cooper via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > -void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
> > +void *read_blob_entry(const struct cache_entry *ce, size_t *size)
> >  {
> >  	enum object_type type;
> > -	void *blob_data = read_object_file(&ce->oid, &type, size);
> > +	unsigned long ul;
> > +	void *blob_data = read_object_file(&ce->oid, &type, &ul);
> >
> > +	*size = ul;
>
> It is a bit of a curious place to draw the line; we want to make sure
> that blob_entry can hold huge data, but in this step we do not mind
> that read_object_file() is not capable of going full 64-bit?

Indeed that is the case. The consideration here is: how large a patch
series do I want to take on this late in the cycle?

Here is the call tree (for full details, look no further than
https://github.com/git-for-windows/git/pull/3487#issuecomment-950727616):

	read_object_file()
	  repo_read_object_file()
	    read_object_file_extended()
	      read_object()
	        oid_object_info_extended()
	          do_oid_object_info_extended()
	            loose_object_info()
	              parse_loose_header_extended()
	            packed_object_info()
	              cache_or_unpack_entry()
	                unpack_entry()
	                  unpack_object_header()
	                    unpack_object_header_buffer()
	                  get_size_from_delta()
	                    get_delta_hdr_size()

All three leaves have code that needs to be adjusted to use `size_t`
instead of `unsigned long`, and all of the other functions in that call
tree need to be adjusted for that. Some of the callers do not even pass
an `unsigned long` pointer around, but instead a pointer to `struct
object_info` (which, you guessed it, also has an `unsigned long` that
should have been a `size_t` from the beginning).

This is too big a change for me to be willing to work on, let alone
accept, this late in the cycle. Sure, it would fix the scenario where
clean/smudge filters are not even involved (read: `git add`ing a 5GB
file _directly_). But the potential for bugs to hide!

> I guess I'll see soon enough why by reading later steps.  I can see
> that for the purpose of making write_entry() aware of the size_t,
> this is necessary at the minimum.

I fear that you won't see more about this in the later steps ;-)

Ciao,
Dscho
diff --git a/entry.c b/entry.c
index 711ee0693c7..4cb3942dbdc 100644
--- a/entry.c
+++ b/entry.c
@@ -82,11 +82,13 @@ static int create_file(const char *path, unsigned int mode)
 	return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
 }
 
-void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
+void *read_blob_entry(const struct cache_entry *ce, size_t *size)
 {
 	enum object_type type;
-	void *blob_data = read_object_file(&ce->oid, &type, size);
+	unsigned long ul;
+	void *blob_data = read_object_file(&ce->oid, &type, &ul);
 
+	*size = ul;
 	if (blob_data) {
 		if (type == OBJ_BLOB)
 			return blob_data;
@@ -270,7 +272,7 @@ static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca
 	int fd, ret, fstat_done = 0;
 	char *new_blob;
 	struct strbuf buf = STRBUF_INIT;
-	unsigned long size;
+	size_t size;
 	ssize_t wrote;
 	size_t newsize = 0;
 	struct stat st;
diff --git a/entry.h b/entry.h
index b8c0e170dc7..61ee8c17604 100644
--- a/entry.h
+++ b/entry.h
@@ -51,7 +51,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
  */
 void unlink_entry(const struct cache_entry *ce);
 
-void *read_blob_entry(const struct cache_entry *ce, unsigned long *size);
+void *read_blob_entry(const struct cache_entry *ce, size_t *size);
 int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);
 void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
 			   struct stat *st);
diff --git a/parallel-checkout.c b/parallel-checkout.c
index 6b1af32bb3d..b6f4a25642e 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -261,7 +261,7 @@ static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
 	struct stream_filter *filter;
 	struct strbuf buf = STRBUF_INIT;
 	char *blob;
-	unsigned long size;
+	size_t size;
 	ssize_t wrote;
 
 	/* Sanity check */
diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index bff86c13208..8b23d862600 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -85,7 +85,7 @@ test_expect_success 'ident converts on output' '
 
 # This smudge filter prepends 5GB of zeros to the file it checks out. This
 # ensures that smudging doesn't mangle large files on 64-bit Windows.
-test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
 'files over 4GB convert on output' '
 	test_commit test small "a small file" &&
 	test_config filter.makelarge.smudge \