diff mbox series

[v2,03/10] scalar-diagnose: add directory to archiver more gently

Message ID 23349bfaf8fc5f5001f1ed1fa19e9b3909466ae3.1659577543.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' | expand

Commit Message

Victoria Dye Aug. 4, 2022, 1:45 a.m. UTC
From: Victoria Dye <vdye@github.com>

If a directory added to the 'scalar diagnose' archiver does not exist, warn
and return 0 from 'add_directory_to_archiver()' rather than failing with a
fatal error. This handles a failure edge case where the '.git/logs' has not
yet been created when running 'scalar diagnose', but extends to any
situation where a directory may be missing in the '.git' dir.

Now, when a directory is missing a warning is captured in the diagnostic
logs. This provides a user with more complete information than if 'scalar
diagnose' simply failed with an error.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 contrib/scalar/scalar.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Aug. 4, 2022, 6:19 a.m. UTC | #1
On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> If a directory added to the 'scalar diagnose' archiver does not exist, warn
> and return 0 from 'add_directory_to_archiver()' rather than failing with a
> fatal error. This handles a failure edge case where the '.git/logs' has not
> yet been created when running 'scalar diagnose', but extends to any
> situation where a directory may be missing in the '.git' dir.
>
> Now, when a directory is missing a warning is captured in the diagnostic
> logs. This provides a user with more complete information than if 'scalar
> diagnose' simply failed with an error.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  contrib/scalar/scalar.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 04046452284..b9092f0b612 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -266,14 +266,20 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
>  					  const char *path, int recurse)
>  {
>  	int at_root = !*path;
> -	DIR *dir = opendir(at_root ? "." : path);
> +	DIR *dir;
>  	struct dirent *e;
>  	struct strbuf buf = STRBUF_INIT;
>  	size_t len;
>  	int res = 0;
>  
> -	if (!dir)
> +	dir = opendir(at_root ? "." : path);
> +	if (!dir) {
> +		if (errno == ENOENT) {

Per [1] I think this is incorrect or overly strict. Let's not spew
warnings if the user "rm -rf .git/hooks" or whatever.

It might be valuable to note in some file in the archive such oddities
we find, but warning() won't give us that.

> +			warning(_("could not archive missing directory '%s'"), path);

In any case, this should be e.g.:

	warning_errno(_("could not archive directory '%s'"), path);

You already have an errno, so using *_errno() will add the standard
information about what the issue is.

> +			return 0;
> +		}
>  		return error_errno(_("could not open directory '%s'"), path);
> +	}
>  
>  	if (!at_root)
>  		strbuf_addf(&buf, "%s/", path);

1. https://lore.kernel.org/git/220610.86ilp9s1x7.gmgdl@evledraar.gmail.com/
Junio C Hamano Aug. 4, 2022, 5:12 p.m. UTC | #2
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +	dir = opendir(at_root ? "." : path);
>> +	if (!dir) {
>> +		if (errno == ENOENT) {
>
> Per [1]

"Per [1]" somehow sounds more like a reference to an authoritative
source, at least to me.  Every time you use it, I have to see what
it refers to, and after realizing that you used it as a replacement
of "I said it already in [1]" again, it leaves a funny feeling.

> I think this is incorrect or overly strict. Let's not spew
> warnings if the user "rm -rf .git/hooks" or whatever.

The above is doing the right thing even in that situation, doesn't
it?  If there is no ".git/hooks" that is fine.  We get ENOENT, give
a warning to indicate that we found an unusual situation, and return
without failing.  If we got something other than ENOENT, we fail with
error_errno(), because opendir() failed for a reason other than "No
such file or directory".

> You already have an errno, so using *_errno() will add the standard
> information about what the issue is.

Reading the code aloud, slowly, may help.  When errno says ENOENT,
we know opendir() failed because of "No such file or directory",
so "path" was missing.  So let's say 'not archiving a missing directory'".

ENOENT or "No such file or directory" is an implementation detail
that does not help the end user.

The other side, i.e. when the errno is *not* ENOENT, already uses
error_errno().

So, I am puzzled.
Ævar Arnfjörð Bjarmason Aug. 4, 2022, 8:12 p.m. UTC | #3
On Thu, Aug 04 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> +	dir = opendir(at_root ? "." : path);
>>> +	if (!dir) {
>>> +		if (errno == ENOENT) {
>>
>> Per [1]
>
> "Per [1]" somehow sounds more like a reference to an authoritative
> source, at least to me.  Every time you use it, I have to see what
> it refers to, and after realizing that you used it as a replacement
> of "I said it already in [1]" again, it leaves a funny feeling.

"Per" in the sense of "Per what I noted in [1]".

>> I think this is incorrect or overly strict. Let's not spew
>> warnings if the user "rm -rf .git/hooks" or whatever.
>
> The above is doing the right thing even in that situation, doesn't
> it?  If there is no ".git/hooks" that is fine.  We get ENOENT, give
> a warning to indicate that we found an unusual situation, and return
> without failing.  If we got something other than ENOENT, we fail with
> error_errno(), because opendir() failed for a reason other than "No
> such file or directory".

I'm mainly noting that the point of this step is to produce an archive
for the consumption of the remote end.

Therefore it seems to me like it would me much more useful to note these
"oddities" in some log that we're about to zip up, rather than issue a
warning().

The "per [1]" was a reference to the (paraphrasing) "maybe that's not
needed at all", but you seemed to think so. But for the purposes of the
discussion here let's assume we keep it.

>> You already have an errno, so using *_errno() will add the standard
>> information about what the issue is.
>
> Reading the code aloud, slowly, may help.  When errno says ENOENT,
> we know opendir() failed because of "No such file or directory",
> so "path" was missing.  So let's say 'not archiving a missing directory'".
>
> ENOENT or "No such file or directory" is an implementation detail
> that does not help the end user.
>
> The other side, i.e. when the errno is *not* ENOENT, already uses
> error_errno().
>
> So, I am puzzled.

I'm pointing out that we don't need to include that part in the message,
because warning_errno() will already give us that for free. I.e.:

	warning: could not archive directory '<some dir>': No such file or directory

v.s.:

	warning: could not archive missing directory '<some dir>'

The advantages of doing so being:

 * It's clear (at least to the keen eye) that it's using the "errno
   format", so you know it's not just saying "could not for <whatever
   reason>", it specifically got ENOENT.

 * The i18n for the strerror() comes from the C library, which will be
   translated already, whereas a new git.pot message won't be (but we'll
   hopefully bridge the gap eventually).

 * This way we we can share the message, whatever the errno happens to
   be, so we could e.g.:

      errno = ENOENT;
      warning_errno(_("could not archive directory '%s'"), "<some dir>");
      errno = ENOMEM;
      error_errno(_("could not archive directory '%s'"), "<some dir>");

   Whereas putting the reason for why we couldn't (which just duplicates
   the errno) in the message forces the messages & i18n to diverge.
Junio C Hamano Aug. 4, 2022, 9:09 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I'm mainly noting that the point of this step is to produce an archive
> for the consumption of the remote end.
>
> Therefore it seems to me like it would me much more useful to note these
> "oddities" in some log that we're about to zip up, rather than issue a
> warning().

Hmph, the receiving end that inspects the archive will know that a
directory did not get archived, but they cannot tell if that is
because it did not exist, or because it was unreadable, and the
trouble the user may be having can well be the result of having
the directory unreadable.  So from that point of view, in addition
to these warnings and errors, it would be helpful to record the
errors we encounter while we generate the diagnostic archive in the
archive itself for inspection.

But the warning on the local side has merit to warn the user of an
unusual situation.  "I am puzzled why the hook I thought I wrote did
not trigger, but the diag tool says I do not have .git/hooks at all"
is a welcome side effect, even though it may not be the primary
effect we are aiming to gain by having these warning messages.

> I'm pointing out that we don't need to include that part in the message,
> because warning_errno() will already give us that for free. I.e.:
>
> 	warning: could not archive directory '<some dir>': No such file or directory
>
> v.s.:
>
> 	warning: could not archive missing directory '<some dir>'
>
> The advantages of doing so being:
>
>  * It's clear (at least to the keen eye) that it's using the "errno
>    format", so you know it's not just saying "could not for <whatever
>    reason>", it specifically got ENOENT.

Funny.  I find it much clearer if we can use our own message without
having to rely on whatever strerror(error) gives us.  We know better
than the C library why we got ENOENT and be more readable.  They say
"No such file or directory" because from ENOENT alone they cannot
tell you if it was a file or a directory that was missing, but we
have a better context like "we were trying to create an archive" and
"we tried opendir, expecting that the thing is a directory".

>  * The i18n for the strerror() comes from the C library, which will be
>    translated already, whereas a new git.pot message won't be (but we'll
>    hopefully bridge the gap eventually).

As I said above, I do not think it is an advantage in this case that
strerror() is translated, as the point of having a separate message
is because we can be more to-the-point, and we do not need to use
the strerror() result in there.

>  * This way we we can share the message, whatever the errno happens to
>    be, so we could e.g.:
>
>       errno = ENOENT;
>       warning_errno(_("could not archive directory '%s'"), "<some dir>");
>       errno = ENOMEM;
>       error_errno(_("could not archive directory '%s'"), "<some dir>");
>
>    Whereas putting the reason for why we couldn't (which just duplicates
>    the errno) in the message forces the messages & i18n to diverge.

And an added clarity that we can use a separate message is something
I think is worth having, compared to the cost of having an extra
message over the more generic one for any errno.
diff mbox series

Patch

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 04046452284..b9092f0b612 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -266,14 +266,20 @@  static int add_directory_to_archiver(struct strvec *archiver_args,
 					  const char *path, int recurse)
 {
 	int at_root = !*path;
-	DIR *dir = opendir(at_root ? "." : path);
+	DIR *dir;
 	struct dirent *e;
 	struct strbuf buf = STRBUF_INIT;
 	size_t len;
 	int res = 0;
 
-	if (!dir)
+	dir = opendir(at_root ? "." : path);
+	if (!dir) {
+		if (errno == ENOENT) {
+			warning(_("could not archive missing directory '%s'"), path);
+			return 0;
+		}
 		return error_errno(_("could not open directory '%s'"), path);
+	}
 
 	if (!at_root)
 		strbuf_addf(&buf, "%s/", path);