Message ID | patch-1.3-68a7709fe5-20210409T080534Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blob/object.c: trivial readability improvements | expand |
On Fri, Apr 09, 2021 at 10:07:27AM +0200, Ævar Arnfjörð Bjarmason wrote: > As noted in the comment introduced in 837d395a5c0 (Replace > parse_blob() with an explanatory comment, 2010-01-18) the old > parse_blob() function and the current parse_blob_buffer() exist merely > to provide consistency in the API. > > We're not going to parse blobs like we "parse" commits, trees or > tags. So let's not have the parse_blob_buffer() take arguments that > pretends that we do. Its only use is to set the "parsed" flag. > > See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with > parse_object(), 2005-05-06) for the introduction of parse_blob_buffer(). OK. Calling it parse_blob_buffer() is a little silly since it doesn't even take a buffer anymore. But I guess parse_blob() might imply that it actually loads the contents from disk to check them (which the other parse_foo() functions do), so that's not a good name. So this might be the least bad thing. Given that there are only two callers, just setting blob->object.parsed might not be unreasonable, either. But I don't think it's worth spending too much time on. > @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) > error(_("hash mismatch %s"), oid_to_hex(oid)); > return NULL; > } > - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); > + parse_blob_buffer(lookup_blob(r, oid)); > return lookup_object(r, oid); Not new in your patch, but I wondered if this could cause a segfault when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit this code path when either: - lookup_object() returns an object with type OBJ_BLOB, in which case lookup_blob() would return that same object - lookup_object() returned NULL, in which case lookup_blob() will call it again, get NULL again, and then auto-create the blob and return it So I think it is OK. But there are a bunch of duplicate hash lookups in this code. It would be clearer and more efficient as: diff --git a/object.c b/object.c index 2c32691dc4..2dfa038f13 100644 --- a/object.c +++ b/object.c @@ -262,12 +262,14 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || (!obj && repo_has_object_file(r, oid) && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { + if (!obj) + obj = create_object(r, oid, alloc_blob_node(r)); if (check_object_signature(r, repl, NULL, 0, NULL) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; } - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); - return lookup_object(r, oid); + parse_blob_buffer(obj, NULL, 0); + return obj; } buffer = repo_read_object_file(r, oid, &type, &size); but I doubt the efficiency matters much in practice. Those hash lookups will be lost in the noise of computing the hash of the blob contents. -Peff
Jeff King <peff@peff.net> writes: > OK. Calling it parse_blob_buffer() is a little silly since it doesn't > even take a buffer anymore. But I guess parse_blob() might imply that it > actually loads the contents from disk to check them (which the other > parse_foo() functions do), so that's not a good name. mark_object_as_parsed(), perhaps? > So this might be the least bad thing. Given that there are only two > callers, just setting blob->object.parsed might not be unreasonable, > either. But I don't think it's worth spending too much time on. Yup.
On Fri, Apr 09 2021, Jeff King wrote: > On Fri, Apr 09, 2021 at 10:07:27AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> As noted in the comment introduced in 837d395a5c0 (Replace >> parse_blob() with an explanatory comment, 2010-01-18) the old >> parse_blob() function and the current parse_blob_buffer() exist merely >> to provide consistency in the API. >> >> We're not going to parse blobs like we "parse" commits, trees or >> tags. So let's not have the parse_blob_buffer() take arguments that >> pretends that we do. Its only use is to set the "parsed" flag. >> >> See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with >> parse_object(), 2005-05-06) for the introduction of parse_blob_buffer(). > > OK. Calling it parse_blob_buffer() is a little silly since it doesn't > even take a buffer anymore. But I guess parse_blob() might imply that it > actually loads the contents from disk to check them (which the other > parse_foo() functions do), so that's not a good name. > > So this might be the least bad thing. Given that there are only two > callers, just setting blob->object.parsed might not be unreasonable, > either. But I don't think it's worth spending too much time on. > >> @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) >> error(_("hash mismatch %s"), oid_to_hex(oid)); >> return NULL; >> } >> - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); >> + parse_blob_buffer(lookup_blob(r, oid)); >> return lookup_object(r, oid); > > Not new in your patch, but I wondered if this could cause a segfault > when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit > this code path when either: > > - lookup_object() returns an object with type OBJ_BLOB, in which case > lookup_blob() would return that same object > > - lookup_object() returned NULL, in which case lookup_blob() will call > it again, get NULL again, and then auto-create the blob and return > it > > So I think it is OK. But there are a bunch of duplicate hash lookups in > this code. It would be clearer and more efficient as: > > diff --git a/object.c b/object.c > index 2c32691dc4..2dfa038f13 100644 > --- a/object.c > +++ b/object.c > @@ -262,12 +262,14 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) > if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || > (!obj && repo_has_object_file(r, oid) && > oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > + if (!obj) > + obj = create_object(r, oid, alloc_blob_node(r)); > if (check_object_signature(r, repl, NULL, 0, NULL) < 0) { > error(_("hash mismatch %s"), oid_to_hex(oid)); > return NULL; > } > - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); > - return lookup_object(r, oid); > + parse_blob_buffer(obj, NULL, 0); > + return obj; > } > > buffer = repo_read_object_file(r, oid, &type, &size); > > but I doubt the efficiency matters much in practice. Those hash lookups > will be lost in the noise of computing the hash of the blob contents. I was trying to keep the changes smaller, but what about just doing this?: diff --git a/blob.c b/blob.c index 182718aba9..69293e7d8e 100644 --- a/blob.c +++ b/blob.c @@ -5,16 +5,16 @@ const char *blob_type = "blob"; +struct blob *create_blob(struct repository *r, const struct object_id *oid) +{ + return create_object(r, oid, alloc_blob_node(r)); +} + struct blob *lookup_blob(struct repository *r, const struct object_id *oid) { struct object *obj = lookup_object(r, oid); if (!obj) - return create_object(r, oid, alloc_blob_node(r)); - return object_as_type(obj, OBJ_BLOB, 0); -} + return create_blob(r, oid); -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size) -{ - item->object.parsed = 1; - return 0; + return object_as_type(obj, OBJ_BLOB, 0); } diff --git a/blob.h b/blob.h index 1664872055..ad34f0e9cc 100644 --- a/blob.h +++ b/blob.h @@ -9,10 +9,9 @@ struct blob { struct object object; }; +struct blob *create_blob(struct repository *r, const struct object_id *oid); struct blob *lookup_blob(struct repository *r, const struct object_id *oid); -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size); - /** * Blobs do not contain references to other objects and do not have * structured data that needs parsing. However, code may use the diff --git a/object.c b/object.c index 78343781ae..2699431404 100644 --- a/object.c +++ b/object.c @@ -195,8 +195,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id if (type == OBJ_BLOB) { struct blob *blob = lookup_blob(r, oid); if (blob) { - if (parse_blob_buffer(blob, buffer, size)) - return NULL; + blob->object.parsed = 1; obj = &blob->object; } } else if (type == OBJ_TREE) { @@ -262,12 +261,16 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || (!obj && repo_has_object_file(r, oid) && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { + struct blob *blob; + if (!obj) + blob = create_blob(r, oid); if (check_object_signature(r, repl, NULL, 0, NULL) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; } - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); - return lookup_object(r, oid); + obj = &blob->object; + obj->parsed = 1; + return obj; } buffer = repo_read_object_file(r, oid, &type, &size);
On Sat, Apr 10 2021, Ævar Arnfjörð Bjarmason wrote: > On Fri, Apr 09 2021, Jeff King wrote: > >> On Fri, Apr 09, 2021 at 10:07:27AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >>> As noted in the comment introduced in 837d395a5c0 (Replace >>> parse_blob() with an explanatory comment, 2010-01-18) the old >>> parse_blob() function and the current parse_blob_buffer() exist merely >>> to provide consistency in the API. >>> >>> We're not going to parse blobs like we "parse" commits, trees or >>> tags. So let's not have the parse_blob_buffer() take arguments that >>> pretends that we do. Its only use is to set the "parsed" flag. >>> >>> See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with >>> parse_object(), 2005-05-06) for the introduction of parse_blob_buffer(). >> >> OK. Calling it parse_blob_buffer() is a little silly since it doesn't >> even take a buffer anymore. But I guess parse_blob() might imply that it >> actually loads the contents from disk to check them (which the other >> parse_foo() functions do), so that's not a good name. >> >> So this might be the least bad thing. Given that there are only two >> callers, just setting blob->object.parsed might not be unreasonable, >> either. But I don't think it's worth spending too much time on. >> >>> @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) >>> error(_("hash mismatch %s"), oid_to_hex(oid)); >>> return NULL; >>> } >>> - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); >>> + parse_blob_buffer(lookup_blob(r, oid)); >>> return lookup_object(r, oid); >> >> Not new in your patch, but I wondered if this could cause a segfault >> when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit >> this code path when either: >> >> - lookup_object() returns an object with type OBJ_BLOB, in which case >> lookup_blob() would return that same object >> >> - lookup_object() returned NULL, in which case lookup_blob() will call >> it again, get NULL again, and then auto-create the blob and return >> it >> >> So I think it is OK. But there are a bunch of duplicate hash lookups in >> this code. It would be clearer and more efficient as: >> >> diff --git a/object.c b/object.c >> index 2c32691dc4..2dfa038f13 100644 >> --- a/object.c >> +++ b/object.c >> @@ -262,12 +262,14 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) >> if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || >> (!obj && repo_has_object_file(r, oid) && >> oid_object_info(r, oid, NULL) == OBJ_BLOB)) { >> + if (!obj) >> + obj = create_object(r, oid, alloc_blob_node(r)); >> if (check_object_signature(r, repl, NULL, 0, NULL) < 0) { >> error(_("hash mismatch %s"), oid_to_hex(oid)); >> return NULL; >> } >> - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); >> - return lookup_object(r, oid); >> + parse_blob_buffer(obj, NULL, 0); >> + return obj; >> } >> >> buffer = repo_read_object_file(r, oid, &type, &size); >> >> but I doubt the efficiency matters much in practice. Those hash lookups >> will be lost in the noise of computing the hash of the blob contents. > > I was trying to keep the changes smaller, but what about just doing this?: Sent a bit too soon...: > diff --git a/blob.c b/blob.c > index 182718aba9..69293e7d8e 100644 > --- a/blob.c > +++ b/blob.c > @@ -5,16 +5,16 @@ > > const char *blob_type = "blob"; > > +struct blob *create_blob(struct repository *r, const struct object_id *oid) > +{ > + return create_object(r, oid, alloc_blob_node(r)); > +} > + > struct blob *lookup_blob(struct repository *r, const struct object_id *oid) > { > struct object *obj = lookup_object(r, oid); > if (!obj) > - return create_object(r, oid, alloc_blob_node(r)); > - return object_as_type(obj, OBJ_BLOB, 0); > -} > + return create_blob(r, oid); > > -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size) > -{ > - item->object.parsed = 1; > - return 0; > + return object_as_type(obj, OBJ_BLOB, 0); > } > diff --git a/blob.h b/blob.h > index 1664872055..ad34f0e9cc 100644 > --- a/blob.h > +++ b/blob.h > @@ -9,10 +9,9 @@ struct blob { > struct object object; > }; > > +struct blob *create_blob(struct repository *r, const struct object_id *oid); > struct blob *lookup_blob(struct repository *r, const struct object_id *oid); > > -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size); > - > /** > * Blobs do not contain references to other objects and do not have > * structured data that needs parsing. However, code may use the > diff --git a/object.c b/object.c > index 78343781ae..2699431404 100644 > --- a/object.c > +++ b/object.c > @@ -195,8 +195,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id > if (type == OBJ_BLOB) { > struct blob *blob = lookup_blob(r, oid); > if (blob) { > - if (parse_blob_buffer(blob, buffer, size)) > - return NULL; > + blob->object.parsed = 1; > obj = &blob->object; > } > } else if (type == OBJ_TREE) { > @@ -262,12 +261,16 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) > if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || > (!obj && repo_has_object_file(r, oid) && > oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > + struct blob *blob; > + if (!obj) > + blob = create_blob(r, oid); > > if (check_object_signature(r, repl, NULL, 0, NULL) < 0) { > error(_("hash mismatch %s"), oid_to_hex(oid)); > return NULL; > } > - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); > - return lookup_object(r, oid); > + obj = &blob->object; > + obj->parsed = 1; > + return obj; > } > > buffer = repo_read_object_file(r, oid, &type, &size); Well, aside from this segfault think-o introduced while experimenting. Needs to be: @@ -262,12 +261,16 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || (!obj && repo_has_object_file(r, oid) && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { + if (!obj) { + struct blob *blob = create_blob(r, oid); + obj = &blob->object; + } if (check_object_signature(r, repl, NULL, 0, NULL) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; } - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); - return lookup_object(r, oid); + obj->parsed = 1; + return obj; } buffer = repo_read_object_file(r, oid, &type, &size);
On Sat, Apr 10, 2021 at 02:57:12PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Not new in your patch, but I wondered if this could cause a segfault > > when lookup_blob() returns NULL. I _think_ the answer is "no". We'd hit > > this code path when either: > [...] > I was trying to keep the changes smaller, but what about just doing this?: > [...] Yeah, that seems pretty reasonable to me. It cleans up the extra lookups in parse_object() and gets rid of the funny-named "parse_blob_buffer()" that takes no buffer. -Peff
diff --git a/blob.c b/blob.c index 182718aba9..389a7546dc 100644 --- a/blob.c +++ b/blob.c @@ -13,7 +13,7 @@ struct blob *lookup_blob(struct repository *r, const struct object_id *oid) return object_as_type(obj, OBJ_BLOB, 0); } -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size) +int parse_blob_buffer(struct blob *item) { item->object.parsed = 1; return 0; diff --git a/blob.h b/blob.h index 1664872055..ac1d4804a5 100644 --- a/blob.h +++ b/blob.h @@ -11,8 +11,6 @@ struct blob { struct blob *lookup_blob(struct repository *r, const struct object_id *oid); -int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size); - /** * Blobs do not contain references to other objects and do not have * structured data that needs parsing. However, code may use the @@ -21,5 +19,6 @@ int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size); * parse_blob_buffer() is used (by object.c) to flag that the object * has been read successfully from the database. **/ +int parse_blob_buffer(struct blob *item); #endif /* BLOB_H */ diff --git a/object.c b/object.c index 78343781ae..63896abf01 100644 --- a/object.c +++ b/object.c @@ -195,7 +195,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id if (type == OBJ_BLOB) { struct blob *blob = lookup_blob(r, oid); if (blob) { - if (parse_blob_buffer(blob, buffer, size)) + if (parse_blob_buffer(blob)) return NULL; obj = &blob->object; } @@ -266,7 +266,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; } - parse_blob_buffer(lookup_blob(r, oid), NULL, 0); + parse_blob_buffer(lookup_blob(r, oid)); return lookup_object(r, oid); }
As noted in the comment introduced in 837d395a5c0 (Replace parse_blob() with an explanatory comment, 2010-01-18) the old parse_blob() function and the current parse_blob_buffer() exist merely to provide consistency in the API. We're not going to parse blobs like we "parse" commits, trees or tags. So let's not have the parse_blob_buffer() take arguments that pretends that we do. Its only use is to set the "parsed" flag. See bd2c39f58f9 ([PATCH] don't load and decompress objects twice with parse_object(), 2005-05-06) for the introduction of parse_blob_buffer(). I'm moving the prototype of parse_blob_buffer() below the comment added in 837d395a5c0 while I'm at it. That comment was originally meant to be a replacement for the missing parse_blob() function, but it's much less confusing to have it be above the parse_blob_buffer() function it refers to. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- blob.c | 2 +- blob.h | 3 +-- object.c | 4 ++-- 3 files changed, 4 insertions(+), 5 deletions(-)