Message ID | 20231002024034.2611-1-ebiederm@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | initial support for multiple hash functions | expand |
"Eric W. Biederman" <ebiederm@gmail.com> writes: > From: "Eric W. Biederman" <ebiederm@xmission.com> > > Two basic functions are provided: > - convert_object_file Takes an object file it's type and hash algorithm > and converts it into the equivalent object file that would > have been generated with hash algorithm "to". Should probably be - convert_object_file takes an object file, its type, and hash algorithm and converts it into the equivalent object file using hash algorithm "to". It would be nice if you gave the function name some clue of what sort of conversion is being done though. Something like "convert_object_file_hash" or "switch_object_file_hash". I like "switch" because the word "hash" itself means to convert some input bytes into another set of (aka "hashed") bytes, and the indirect shadowing of "convert" in this way can be avoided here by using a different word.The whole point of this function is to switch the hashing scheme from one to another, while still keeping everything else the same, so "switch" seems more appropriate. Unless, of course, we already pervasively use "convert" this way elsewhere in the codebase (I have not checked). > For blob objects there is no conversation to be done and it is an s/conversation/conversion > error to use this function on them. Could you explain why no conversion is needed for blob objects, and also why it should be an error (and not just a NOP)? In the code we can also call BUG() if the from/to algos are the same. It's probably worth mentioning in here as well? Also for such detailed explanations, I think it's much better to place them as comments directly above the function (and only mention the important bits about these helper functions, other than the fact that they will come in handy in later patches, in the log message). > For commit, tree, and tag objects embedded oids are replaced by the > oids of the objects they refer to with those objects and their > object ids reencoded in with the hash algorithm "to". That's a little wordy. I assume embedded oids just mean oids that these objects refer to (e.g., commit objects have oids, but for example the tree referred by a commit would be an example of an embedded oid). If so, then how about just For commit, tree, and tag objects both their oids and embedded (dependent) oids are converted using hash algorithm "to". I dropped "reencoded" in favor of "converted" btecause that's the verb you use in your function name "convert_object_file()". > Signatures > are rearranged so that they remain valid after the object has > been reencoded. Maybe s/reencoded/converted here as well? But also, it sounds odd to me that signatures are simply "rearranged" and not "regenerated" or "recreated" becaues "rearranged" means keeping most of the old stuff around but just repositioning them, which doesn't sound like it's doing justice to the meaning of a hash algo transition. > - repo_oid_to_algop which takes an oid that refers to an object file > and returns the oid of the equivalent object file generated > with the target hash algorithm. The name is odd to me because "repo_oid" doesn't make sense (a repo is not an object so it doesn't have an oid), but also the "to_algop" name (AFAICS "algop" just means "pointer to algorithm" in the codebase, for examle in <hash-ll.h>). Here's a possible rewording: - switch_oid_hash takes an object file's oid and returns a new one using hash algorithm "to". > The pair of files object-file-convert.c and object-file-convert.h are > introduced to hold as much of this logic as possible to keep this > conversion logic cleanly separated from everything else and in the > hopes that someday the code will be clean enough git can support Did you mean "clean enough so that Git ..."? > compiling out support for sha1 and the various conversion functions. FYI you can cut down this ambitious sentence into two for readability, like this: The new files object-file-convert.{c,h} hold as much of this logic as possible to keep this conversion logic cleanly separated from everything else. This separation will help us easily phase out SHA1 support (perhaps as a compile-time flag) in the future. > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > Makefile | 1 + > object-file-convert.c | 57 +++++++++++++++++++++++++++++++++++++++++++ > object-file-convert.h | 24 ++++++++++++++++++ > 3 files changed, 82 insertions(+) > create mode 100644 object-file-convert.c > create mode 100644 object-file-convert.h > > diff --git a/Makefile b/Makefile > index 577630936535..f7e824f25cda 100644 > --- a/Makefile > +++ b/Makefile > @@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o > LIB_OBJS += notes-merge.o > LIB_OBJS += notes-utils.o > LIB_OBJS += notes.o > +LIB_OBJS += object-file-convert.o > LIB_OBJS += object-file.o > LIB_OBJS += object-name.o > LIB_OBJS += object.o > diff --git a/object-file-convert.c b/object-file-convert.c > new file mode 100644 > index 000000000000..4777aba83636 > --- /dev/null > +++ b/object-file-convert.c > @@ -0,0 +1,57 @@ > +#include "git-compat-util.h" > +#include "gettext.h" > +#include "strbuf.h" > +#include "repository.h" > +#include "hash-ll.h" > +#include "object.h" > +#include "object-file-convert.h" > + > +int repo_oid_to_algop(struct repository *repo, const struct object_id *src, > + const struct git_hash_algo *to, struct object_id *dest) > +{ > + /* > + * If the source algorithm is not set, then we're using the > + * default hash algorithm for that object. > + */ > + const struct git_hash_algo *from = > + src->algo ? &hash_algos[src->algo] : repo->hash_algo; Hmm, if we're only using "repo" to grab its hash_algo member as a fallback, should this function be broken down into 2, one to get the fallback algo and another that has the meat of this function without the "repo" parameter? I haven't read the rest of the series yet, let me keep reading. > + > + if (from == to) { > + if (src != dest) > + oidcpy(dest, src); > + return 0; > + } > + return -1; > +} It's curious to me that you treat same-hash-algo as a NOP here, but as a BUG() in the other helper. Why the difference (perhaps add a comment)? > +int convert_object_file(struct strbuf *outbuf, > + const struct git_hash_algo *from, > + const struct git_hash_algo *to, > + const void *buf, size_t len, > + enum object_type type, > + int gentle) Is there a particular reason you chose to put the out parameter (outbuf) at the beginning, rather than at the end? In the other helper function the "dest" out param comes at the end. Style conflict...? Also, you don't use buf or len, so you could have kept them out from this patch (so that you only add them later as needed). > +{ > + int ret; > + > + /* Don't call this function when no conversion is necessary */ Please avoid double-negation. How about simply /* Refuse nonsensical conversion */ or simply drop the comment (as the BUG() description already serves the same purpose?) > + if ((from == to) || (type == OBJ_BLOB)) > + BUG("Refusing noop object file conversion"); I think it would be better if you separated these out and gave them different BUG() messages. If we barf with a BUG() it would be so much more helpful if the message we get is as specific as possible, rather than leaving us guessing whether (as in this case) we passed in identical hash algos or whether we tried to handle a blob object. Since you already have the switch statement below, the OBJ_BLOB case could go there easily enough. Nit: I think BUG() messages are not supposed to be capitalized. > + switch (type) { > + case OBJ_COMMIT: > + case OBJ_TREE: > + case OBJ_TAG: > + default: > + /* Not implemented yet, so fail. */ > + ret = -1; > + break; > + } > + if (!ret) > + return 0; > + if (gentle) { > + strbuf_release(outbuf); What does "gentle" mean here? But also if you are just freeing the outbuf, then why not just name this param "free_outbuf"? But also it seems a bit odd that the caller (who presumably owns the outbuf) is asking a conversion function to possibly free it before returning. > + return ret; > + } > + die(_("Failed to convert object from %s to %s"), > + from->name, to->name); > +} > diff --git a/object-file-convert.h b/object-file-convert.h > new file mode 100644 > index 000000000000..a4f802aa8eea > --- /dev/null > +++ b/object-file-convert.h > @@ -0,0 +1,24 @@ > +#ifndef OBJECT_CONVERT_H > +#define OBJECT_CONVERT_H > + > +struct repository; > +struct object_id; > +struct git_hash_algo; > +struct strbuf; > +#include "object.h" It looks a bit odd that the forward declarations of these structs come before the #include line. I don't think that's the pattern in our codebase (maybe I'm wrong?). In hindsight, the log message could have added that switching back and forth between different hash algos is a fundamental (but currently missing) operation, and that this is the reason why these conversion functions (currently unfinished) are necessary as the first step for the patch series. I realize that you've stated as much in your cover letter, but it is always nice to have the intent embedded in the log message(s) where applicable (such as the case in this patch that introduces a brand new header file) to save future developers the hassle of looking up the relevant cover letter. Thanks. > +int repo_oid_to_algop(struct repository *repo, const struct object_id *src, > + const struct git_hash_algo *to, struct object_id *dest); > + > +/* > + * Convert an object file from one hash algorithm to another algorithm. > + * Return -1 on failure, 0 on success. > + */ > +int convert_object_file(struct strbuf *outbuf, > + const struct git_hash_algo *from, > + const struct git_hash_algo *to, > + const void *buf, size_t len, > + enum object_type type, > + int gentle); > + > +#endif /* OBJECT_CONVERT_H */ > -- > 2.41.0
On Sun, Oct 01, 2023 at 09:40:05PM -0500, Eric W. Biederman wrote: > From: "Eric W. Biederman" <ebiederm@xmission.com> > > Two basic functions are provided: > - convert_object_file Takes an object file it's type and hash algorithm > and converts it into the equivalent object file that would > have been generated with hash algorithm "to". > > For blob objects there is no conversation to be done and it is an > error to use this function on them. > > For commit, tree, and tag objects embedded oids are replaced by the > oids of the objects they refer to with those objects and their > object ids reencoded in with the hash algorithm "to". Signatures > are rearranged so that they remain valid after the object has > been reencoded. > > - repo_oid_to_algop which takes an oid that refers to an object file > and returns the oid of the equivalent object file generated > with the target hash algorithm. > > The pair of files object-file-convert.c and object-file-convert.h are > introduced to hold as much of this logic as possible to keep this > conversion logic cleanly separated from everything else and in the > hopes that someday the code will be clean enough git can support > compiling out support for sha1 and the various conversion functions. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > Makefile | 1 + > object-file-convert.c | 57 +++++++++++++++++++++++++++++++++++++++++++ > object-file-convert.h | 24 ++++++++++++++++++ > 3 files changed, 82 insertions(+) > create mode 100644 object-file-convert.c > create mode 100644 object-file-convert.h > > diff --git a/Makefile b/Makefile > index 577630936535..f7e824f25cda 100644 > --- a/Makefile > +++ b/Makefile > @@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o > LIB_OBJS += notes-merge.o > LIB_OBJS += notes-utils.o > LIB_OBJS += notes.o > +LIB_OBJS += object-file-convert.o > LIB_OBJS += object-file.o > LIB_OBJS += object-name.o > LIB_OBJS += object.o > diff --git a/object-file-convert.c b/object-file-convert.c > new file mode 100644 > index 000000000000..4777aba83636 > --- /dev/null > +++ b/object-file-convert.c > @@ -0,0 +1,57 @@ > +#include "git-compat-util.h" > +#include "gettext.h" > +#include "strbuf.h" > +#include "repository.h" > +#include "hash-ll.h" > +#include "object.h" > +#include "object-file-convert.h" > + > +int repo_oid_to_algop(struct repository *repo, const struct object_id *src, > + const struct git_hash_algo *to, struct object_id *dest) > +{ > + /* > + * If the source algorithm is not set, then we're using the > + * default hash algorithm for that object. > + */ > + const struct git_hash_algo *from = > + src->algo ? &hash_algos[src->algo] : repo->hash_algo; > + > + if (from == to) { > + if (src != dest) > + oidcpy(dest, src); > + return 0; > + } > + return -1; > +} In it's current form, `repo_oid_to_algop()` basically never does anything except for copying over the object ID because we do not handle the case where object hashes are different. I assume this is intended, as we basically only provide stubs in this commit. But still, it would help to document this in-code as well with a comment. > +int convert_object_file(struct strbuf *outbuf, > + const struct git_hash_algo *from, > + const struct git_hash_algo *to, > + const void *buf, size_t len, > + enum object_type type, > + int gentle) > +{ > + int ret; > + > + /* Don't call this function when no conversion is necessary */ > + if ((from == to) || (type == OBJ_BLOB)) > + BUG("Refusing noop object file conversion"); The extra braces around comparisons are unneeded and to the best of my knowledge not customary in our code base. Also, error messages should start with a lower-case letter. > + switch (type) { > + case OBJ_COMMIT: > + case OBJ_TREE: > + case OBJ_TAG: > + default: > + /* Not implemented yet, so fail. */ > + ret = -1; > + break; > + } It's a bit weird that we handle all object types except for blobs separately, and then still have a `default` statement. I would've thought that we should handle the object types specifically and set `ret = -1` for all of them, and then the `default` case would instead call `BUG()` due to an unknown object type. > + if (!ret) > + return 0; > + if (gentle) { > + strbuf_release(outbuf); > + return ret; > + } Do you really intend to call `strbuf_release()` on the caller provided buffer, or should this rather be `strbuf_reset()`? Memory management of such an in/out parameter should typically be handled by the caller, not the callee. > + die(_("Failed to convert object from %s to %s"), > + from->name, to->name); > +} The error message should start with a lower-case letter. > diff --git a/object-file-convert.h b/object-file-convert.h > new file mode 100644 > index 000000000000..a4f802aa8eea > --- /dev/null > +++ b/object-file-convert.h > @@ -0,0 +1,24 @@ > +#ifndef OBJECT_CONVERT_H > +#define OBJECT_CONVERT_H > + > +struct repository; > +struct object_id; > +struct git_hash_algo; > +struct strbuf; > +#include "object.h" > + > +int repo_oid_to_algop(struct repository *repo, const struct object_id *src, > + const struct git_hash_algo *to, struct object_id *dest); > + > +/* > + * Convert an object file from one hash algorithm to another algorithm. > + * Return -1 on failure, 0 on success. > + */ > +int convert_object_file(struct strbuf *outbuf, > + const struct git_hash_algo *from, > + const struct git_hash_algo *to, > + const void *buf, size_t len, > + enum object_type type, > + int gentle); It would be nice to document what `gentle` does. Patrick > +#endif /* OBJECT_CONVERT_H */ > -- > 2.41.0 >
diff --git a/Makefile b/Makefile index 577630936535..f7e824f25cda 100644 --- a/Makefile +++ b/Makefile @@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o LIB_OBJS += notes-merge.o LIB_OBJS += notes-utils.o LIB_OBJS += notes.o +LIB_OBJS += object-file-convert.o LIB_OBJS += object-file.o LIB_OBJS += object-name.o LIB_OBJS += object.o diff --git a/object-file-convert.c b/object-file-convert.c new file mode 100644 index 000000000000..4777aba83636 --- /dev/null +++ b/object-file-convert.c @@ -0,0 +1,57 @@ +#include "git-compat-util.h" +#include "gettext.h" +#include "strbuf.h" +#include "repository.h" +#include "hash-ll.h" +#include "object.h" +#include "object-file-convert.h" + +int repo_oid_to_algop(struct repository *repo, const struct object_id *src, + const struct git_hash_algo *to, struct object_id *dest) +{ + /* + * If the source algorithm is not set, then we're using the + * default hash algorithm for that object. + */ + const struct git_hash_algo *from = + src->algo ? &hash_algos[src->algo] : repo->hash_algo; + + if (from == to) { + if (src != dest) + oidcpy(dest, src); + return 0; + } + return -1; +} + +int convert_object_file(struct strbuf *outbuf, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const void *buf, size_t len, + enum object_type type, + int gentle) +{ + int ret; + + /* Don't call this function when no conversion is necessary */ + if ((from == to) || (type == OBJ_BLOB)) + BUG("Refusing noop object file conversion"); + + switch (type) { + case OBJ_COMMIT: + case OBJ_TREE: + case OBJ_TAG: + default: + /* Not implemented yet, so fail. */ + ret = -1; + break; + } + if (!ret) + return 0; + if (gentle) { + strbuf_release(outbuf); + return ret; + } + die(_("Failed to convert object from %s to %s"), + from->name, to->name); +} diff --git a/object-file-convert.h b/object-file-convert.h new file mode 100644 index 000000000000..a4f802aa8eea --- /dev/null +++ b/object-file-convert.h @@ -0,0 +1,24 @@ +#ifndef OBJECT_CONVERT_H +#define OBJECT_CONVERT_H + +struct repository; +struct object_id; +struct git_hash_algo; +struct strbuf; +#include "object.h" + +int repo_oid_to_algop(struct repository *repo, const struct object_id *src, + const struct git_hash_algo *to, struct object_id *dest); + +/* + * Convert an object file from one hash algorithm to another algorithm. + * Return -1 on failure, 0 on success. + */ +int convert_object_file(struct strbuf *outbuf, + const struct git_hash_algo *from, + const struct git_hash_algo *to, + const void *buf, size_t len, + enum object_type type, + int gentle); + +#endif /* OBJECT_CONVERT_H */