diff mbox series

[27/44] builtin/show-index: provide options to determine hash algo

Message ID 20200513005424.81369-28-sandals@crustytoothpaste.net (mailing list archive)
State New, archived
Headers show
Series SHA-256 part 2/3: protocol functionality | expand

Commit Message

brian m. carlson May 13, 2020, 12:54 a.m. UTC
It's possible to use a variety of index formats with show-index, and we
need a way to indicate the hash algorithm which is in use for a
particular index we'd like to show.  Default to using the value for the
repository we're in by calling setup_git_directory_gently, and allow
overriding it by using a --hash argument.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/show-index.c | 29 ++++++++++++++++++++++++-----
 git.c                |  2 +-
 2 files changed, 25 insertions(+), 6 deletions(-)

Comments

Junio C Hamano May 18, 2020, 4:20 p.m. UTC | #1
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> It's possible to use a variety of index formats with show-index, and we
> need a way to indicate the hash algorithm which is in use for a
> particular index we'd like to show.  Default to using the value for the
> repository we're in by calling setup_git_directory_gently, and allow
> overriding it by using a --hash argument.

I think you meant to say that "show-index" does not autodetect what
hash algorithm is used from its input, and the new argument is a way
for the user to help the command when the hash algorithm is
different from what is used in the current repository?

I ask because I found that your version can be read to say that
"show-index" can show the contents of a given pack index using any
hash algorithm we support, and the user can specify --hash=SHA-256
when running the command on a pack .idx that uses SHA-1 object names
to auto-convert it, and readers wouldn't be able to guess which was
meant with only the above five lines.

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  builtin/show-index.c | 29 ++++++++++++++++++++++++-----
>  git.c                |  2 +-
>  2 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/show-index.c b/builtin/show-index.c
> index 0826f6a5a2..ebfa2e9abd 100644
> --- a/builtin/show-index.c
> +++ b/builtin/show-index.c
> @@ -1,9 +1,12 @@
>  #include "builtin.h"
>  #include "cache.h"
>  #include "pack.h"
> +#include "parse-options.h"
>  
> -static const char show_index_usage[] =
> -"git show-index";
> +static const char *const show_index_usage[] = {
> +	"git show-index [--hash=HASH]",
> +	NULL
> +};

Do we say --hash=SHA-1 etc. or --hash-algo=SHA-256 in other places?
Would the word "hash" alone clear enough that it does not refer to
a specific "hash" value but the name of an algorithm?

The generating side seems to use "index-pack --object-format=<algo>"
and the transport seems to use a capability "object-format=<algo>",
neither of which is directly visible to the end users, but I think
they follow "git init --object-format=<algo>", so we are consistent
there.

Perhaps we should follow suit here, too?

>  int cmd_show_index(int argc, const char **argv, const char *prefix)
>  {
> @@ -11,10 +14,26 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
>  	unsigned nr;
>  	unsigned int version;
>  	static unsigned int top_index[256];
> -	const unsigned hashsz = the_hash_algo->rawsz;
> +	unsigned hashsz;
> +	const char *hash_name = NULL;
> +	int hash_algo;
> +	const struct option show_index_options[] = {
> +		OPT_STRING(0, "hash", &hash_name, N_("hash"),
> +			   N_("specify the hash algorithm to use")),

init-db has an entry identical to this except for the second token
given to the macro is "object-format" instead of "hash".  Both may
want to change what's inside N_() to "hash algorithm".

> +		OPT_END()
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, show_index_options, show_index_usage, 0);
> +
> +	if (hash_name) {
> +		hash_algo = hash_algo_by_name(hash_name);
> +		if (hash_algo == GIT_HASH_UNKNOWN)
> +			die(_("Unknown hash algorithm"));
> +		repo_set_hash_algo(the_repository, hash_algo);
> +	}
> +
> +	hashsz = the_hash_algo->rawsz;
>  
> -	if (argc != 1)
> -		usage(show_index_usage);
>  	if (fread(top_index, 2 * 4, 1, stdin) != 1)
>  		die("unable to read header");
>  	if (top_index[0] == htonl(PACK_IDX_SIGNATURE)) {
> diff --git a/git.c b/git.c
> index 2e4efb4ff0..e53e8159a2 100644
> --- a/git.c
> +++ b/git.c
> @@ -573,7 +573,7 @@ static struct cmd_struct commands[] = {
>  	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
>  	{ "show", cmd_show, RUN_SETUP },
>  	{ "show-branch", cmd_show_branch, RUN_SETUP },
> -	{ "show-index", cmd_show_index },
> +	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },

Hmph, this is not necessary to support peeking an .idx file in
another repository that uses a different hash algorithm than ours
(we do need the --hash=<algo> override to tell that the algo is
different from what we read from our repository settings).  Is this
absolutely necessary?

Ah, I am misreading the patch.  We didn't even do setup but we now
optionally do, in order to see if we are in a repository and what
object format it uses to give the default value to --hash=<algo>
when the argument is not given.  The need for RUN_SETUP_GENTLY
is understandable.

As we do not take any path argument on the command line, the other
side effect of setup_git_directory() that takes us up to the top
level of the working tree does not hurt us, either, so this is a
good change, I think.

Thanks.



>  	{ "show-ref", cmd_show_ref, RUN_SETUP },
>  	{ "sparse-checkout", cmd_sparse_checkout, RUN_SETUP | NEED_WORK_TREE },
>  	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },
brian m. carlson May 19, 2020, 12:31 a.m. UTC | #2
On 2020-05-18 at 16:20:22, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > It's possible to use a variety of index formats with show-index, and we
> > need a way to indicate the hash algorithm which is in use for a
> > particular index we'd like to show.  Default to using the value for the
> > repository we're in by calling setup_git_directory_gently, and allow
> > overriding it by using a --hash argument.
> 
> I think you meant to say that "show-index" does not autodetect what
> hash algorithm is used from its input, and the new argument is a way
> for the user to help the command when the hash algorithm is
> different from what is used in the current repository?

Correct.

> I ask because I found that your version can be read to say that
> "show-index" can show the contents of a given pack index using any
> hash algorithm we support, and the user can specify --hash=SHA-256
> when running the command on a pack .idx that uses SHA-1 object names
> to auto-convert it, and readers wouldn't be able to guess which was
> meant with only the above five lines.

No, that's definitely not what I meant.  I'll adjust the commit message
to make this clearer.

> Do we say --hash=SHA-1 etc. or --hash-algo=SHA-256 in other places?
> Would the word "hash" alone clear enough that it does not refer to
> a specific "hash" value but the name of an algorithm?
> 
> The generating side seems to use "index-pack --object-format=<algo>"
> and the transport seems to use a capability "object-format=<algo>",
> neither of which is directly visible to the end users, but I think
> they follow "git init --object-format=<algo>", so we are consistent
> there.
> 
> Perhaps we should follow suit here, too?

Yeah, as I mentioned to Martin elsewhere in the thread, I'm going to
make this consistent and use --object-formta.

> > diff --git a/git.c b/git.c
> > index 2e4efb4ff0..e53e8159a2 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -573,7 +573,7 @@ static struct cmd_struct commands[] = {
> >  	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
> >  	{ "show", cmd_show, RUN_SETUP },
> >  	{ "show-branch", cmd_show_branch, RUN_SETUP },
> > -	{ "show-index", cmd_show_index },
> > +	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },
> 
> Hmph, this is not necessary to support peeking an .idx file in
> another repository that uses a different hash algorithm than ours
> (we do need the --hash=<algo> override to tell that the algo is
> different from what we read from our repository settings).  Is this
> absolutely necessary?
> 
> Ah, I am misreading the patch.  We didn't even do setup but we now
> optionally do, in order to see if we are in a repository and what
> object format it uses to give the default value to --hash=<algo>
> when the argument is not given.  The need for RUN_SETUP_GENTLY
> is understandable.

Yes, this is designed to make us do the right thing when we're in a
repository (e.g., with --stdin) by autodetecting the algorithm in use
but not fail when we're outside of a repository.  I'll update the commit
message to make this a lot clearer, since I clearly omitted a lot of
things that were in my head when writing this.
diff mbox series

Patch

diff --git a/builtin/show-index.c b/builtin/show-index.c
index 0826f6a5a2..ebfa2e9abd 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -1,9 +1,12 @@ 
 #include "builtin.h"
 #include "cache.h"
 #include "pack.h"
+#include "parse-options.h"
 
-static const char show_index_usage[] =
-"git show-index";
+static const char *const show_index_usage[] = {
+	"git show-index [--hash=HASH]",
+	NULL
+};
 
 int cmd_show_index(int argc, const char **argv, const char *prefix)
 {
@@ -11,10 +14,26 @@  int cmd_show_index(int argc, const char **argv, const char *prefix)
 	unsigned nr;
 	unsigned int version;
 	static unsigned int top_index[256];
-	const unsigned hashsz = the_hash_algo->rawsz;
+	unsigned hashsz;
+	const char *hash_name = NULL;
+	int hash_algo;
+	const struct option show_index_options[] = {
+		OPT_STRING(0, "hash", &hash_name, N_("hash"),
+			   N_("specify the hash algorithm to use")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, show_index_options, show_index_usage, 0);
+
+	if (hash_name) {
+		hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("Unknown hash algorithm"));
+		repo_set_hash_algo(the_repository, hash_algo);
+	}
+
+	hashsz = the_hash_algo->rawsz;
 
-	if (argc != 1)
-		usage(show_index_usage);
 	if (fread(top_index, 2 * 4, 1, stdin) != 1)
 		die("unable to read header");
 	if (top_index[0] == htonl(PACK_IDX_SIGNATURE)) {
diff --git a/git.c b/git.c
index 2e4efb4ff0..e53e8159a2 100644
--- a/git.c
+++ b/git.c
@@ -573,7 +573,7 @@  static struct cmd_struct commands[] = {
 	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
 	{ "show", cmd_show, RUN_SETUP },
 	{ "show-branch", cmd_show_branch, RUN_SETUP },
-	{ "show-index", cmd_show_index },
+	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },
 	{ "show-ref", cmd_show_ref, RUN_SETUP },
 	{ "sparse-checkout", cmd_sparse_checkout, RUN_SETUP | NEED_WORK_TREE },
 	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },