diff mbox series

[1/8] refs: remove EINVAL specification from the errno sideband in read_raw_ref_fn

Message ID 7e8181e77d409af7595e357ad233b7781e026b78.1619710329.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series refs: cleanup errno sideband ref related functions | expand

Commit Message

Han-Wen Nienhuys April 29, 2021, 3:32 p.m. UTC
From: Han-Wen Nienhuys <hanwen@google.com>

A grep for EINVAL */*c reveals that no code inspects EINVAL after reading
references.

The files ref backend does use EINVAL so parse_loose_ref_contents() can
communicate to lock_raw_ref() about garbage following the hex SHA1, or a short
read in files_read_raw_ref(), but the files backend does not call into
refs_read_raw_ref(), so its EINVAL sideband error is unused.

As the errno sideband is unintuitive and error-prone, remove EINVAL
value, as a step towards getting rid of the errno sideband altogether.

Spotted by Ævar Arnfjörð Bjarmason <avarab@gmail.com>.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Comments

Junio C Hamano April 30, 2021, 2:38 a.m. UTC | #1
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> A grep for EINVAL */*c reveals that no code inspects EINVAL after reading
> references.

We often use a pattern (which is common) like this:

	if (some_func_in_ref_API(...) < 0) {
		if (errno == ENOENT || errno == EISDIR)
			... it is OK for the file to be missing ...
		else
			... error ...
	}

If a piece of code currently sets EINVAL to errno manually when
signalling a failure by returning a negative value to communicate
with such a caller, we wouldn't see EINVAL mentioned, so such a grep
alone would not help us guarantee the correctness of an update to
stop assignment of EINVAL at all.  The callers must be vetted more
carefully than "we are happy that nobody explicitly mentions EINVAL".

> The files ref backend does use EINVAL so parse_loose_ref_contents() can
> communicate to lock_raw_ref() about garbage following the hex SHA1, or a short
> read in files_read_raw_ref(), but the files backend does not call into
> refs_read_raw_ref(), so its EINVAL sideband error is unused.

This paragraph is confusing.  It says EINVAL is used to signal
lock_raw_ref(), and it says EINVAL is not used by the same files
backend.  Which is correct?  If one part of the backend uses it, and
other parts don't, wouldn't the backend as a whole still use it?

Having said that, ...

> - * Return 0 on success. If the ref doesn't exist, set errno to ENOENT
> - * and return -1. If the ref exists but is neither a symbolic ref nor
> - * an object ID, it is broken; set REF_ISBROKEN in type, set errno to
> - * EINVAL, and return -1. If there is another error reading the ref,
> - * set errno appropriately and return -1.

... the mention (and requirement) of EINVAL seems redundant, as it
sounds sufficient for the caller to inspect 'type' to see if it is
REF_ISBOKEN.  So it may be OK for the code that gives REF_ISBROKEN
to type *not* to set errno to EINVAL, as long as it won't leave it
as ENOENT (meaning, an unrelated system call failed earlier may have
set errno to ENOENT, and after having dealt with such an error, the
control may have reached to the codepath we are interested in
here---errno must be cleared to some value other than ENOENT, and
assigning EINVAL is as good as any).

That is because there is a codeflow like this:

	if (files_read_raw_ref(...)) {
		if (errno == ENOENT) {
			... do various things ...
		} else if (errno == EISDIR) {
			... do different and various things ...
		} else if (errno == EINVAL && (*type & REF_ISBROKEN)) {
			... deal with broken ref ...
		}
		 ...
	}

where errno is looked at.

> + * Return 0 on success. If the ref doesn't exist, set errno to ENOENT and return
> + * -1. If the ref exists but is neither a symbolic ref nor an object ID, it is
> + * broken; set REF_ISBROKEN in type, and return -1. If there is another error
> + * reading the ref, set errno appropriately and return -1.

So, this is not sufficient to let caller correctly and safely handle
errors.  "set REF_ISBROKEN in type, set errno to something other
than ENOENT or EISDIR, and then return -1" is necessary, I would
think.
Han-Wen Nienhuys May 19, 2021, 12:25 p.m. UTC | #2
On Fri, Apr 30, 2021 at 4:38 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Han-Wen Nienhuys <hanwen@google.com>
> >
> > A grep for EINVAL */*c reveals that no code inspects EINVAL after reading
> > references.
>
> We often use a pattern (which is common) like this:
>
>         if (some_func_in_ref_API(...) < 0) {
>                 if (errno == ENOENT || errno == EISDIR)
>                         ... it is OK for the file to be missing ...
>                 else
>                         ... error ...
>         }
>
> If a piece of code currently sets EINVAL to errno manually when
> signalling a failure by returning a negative value to communicate
> with such a caller, we wouldn't see EINVAL mentioned, so such a grep
> alone would not help us guarantee the correctness of an update to
> stop assignment of EINVAL at all.  The callers must be vetted more
> carefully than "we are happy that nobody explicitly mentions EINVAL".

Sure. I looked at the callers, and documented further what I looked
for. But how far should one go? It's a global variable, so
transitively, almost all of the code could be observing the EINVAL
value under very specific circumstances. But it would also be a
terrible, fragile coding style and use of undocumented behavior.

> > The files ref backend does use EINVAL so parse_loose_ref_contents() can
> > communicate to lock_raw_ref() about garbage following the hex SHA1, or a short
> > read in files_read_raw_ref(), but the files backend does not call into
> > refs_read_raw_ref(), so its EINVAL sideband error is unused.
>
> This paragraph is confusing.  It says EINVAL is used to signal
> lock_raw_ref(), and it says EINVAL is not used by the same files
> backend.  Which is correct?  If one part of the backend uses it, and
> other parts don't, wouldn't the backend as a whole still use it?

I tried to clarify the message. files-backend.c makes assumptions
about the errno return for files_read_raw_ref, but it's not making
assumptions about the abstracted API in refs.h

> That is because there is a codeflow like this:
>
>         if (files_read_raw_ref(...)) {
>                 if (errno == ENOENT) {
>                         ... do various things ...
>                 } else if (errno == EISDIR) {
>                         ... do different and various things ...
>                 } else if (errno == EINVAL && (*type & REF_ISBROKEN)) {
>                         ... deal with broken ref ...
>                 }
>                  ...
>         }

as mentioned above, this isn't calling refs_read_raw_ref, so it's not
affected by this patch.

> > + * Return 0 on success. If the ref doesn't exist, set errno to ENOENT and return
> > + * -1. If the ref exists but is neither a symbolic ref nor an object ID, it is
> > + * broken; set REF_ISBROKEN in type, and return -1. If there is another error
> > + * reading the ref, set errno appropriately and return -1.
>
> So, this is not sufficient to let caller correctly and safely handle
> errors.  "set REF_ISBROKEN in type, set errno to something other
> than ENOENT or EISDIR, and then return -1" is necessary, I would
> think.

I tweaked the comment. Note that the function has only a handful of
callers (and only one caller where this behavior is relevant), and
it's changed in a follow-on patch in this series. Is it worth the
effort to wordsmith this further?
Jonathan Tan June 3, 2021, 2:19 a.m. UTC | #3
> From: Han-Wen Nienhuys <hanwen@google.com>
> 
> A grep for EINVAL */*c reveals that no code inspects EINVAL after reading
> references.
> 
> The files ref backend does use EINVAL so parse_loose_ref_contents() can
> communicate to lock_raw_ref() about garbage following the hex SHA1, or a short
> read in files_read_raw_ref(), but the files backend does not call into
> refs_read_raw_ref(), so its EINVAL sideband error is unused.

Does this mean that there is some code that sets EINVAL, but no code
uses it? If yes, that seems to mean that we can't remove EINVAL from the
documentation, since some code still sets it.

> As the errno sideband is unintuitive and error-prone, remove EINVAL
> value, as a step towards getting rid of the errno sideband altogether.

How is removing one possible value a step towards getting rid of the
errno sideband? I would have thought that we would be working towards
transmitting all existing values to the new sideband (the out param).
Unless we are planning to get rid of all values in the sideband - in
which case, this should be described in the commit message.

[snip]

> ---
>  refs/refs-internal.h | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 467f4b3c936d..29728a339fed 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -617,11 +617,10 @@ typedef int reflog_expire_fn(struct ref_store *ref_store,
>   * properly-formatted or even safe reference name. NEITHER INPUT NOR
>   * OUTPUT REFERENCE NAMES ARE VALIDATED WITHIN THIS FUNCTION.
>   *
> - * Return 0 on success. If the ref doesn't exist, set errno to ENOENT
> - * and return -1. If the ref exists but is neither a symbolic ref nor
> - * an object ID, it is broken; set REF_ISBROKEN in type, set errno to
> - * EINVAL, and return -1. If there is another error reading the ref,
> - * set errno appropriately and return -1.
> + * Return 0 on success. If the ref doesn't exist, set errno to ENOENT and return
> + * -1. If the ref exists but is neither a symbolic ref nor an object ID, it is
> + * broken; set REF_ISBROKEN in type, and return -1. If there is another error
> + * reading the ref, set errno appropriately and return -1.

The rewrapping was unnecessary and makes it hard to see what changed.
Han-Wen Nienhuys June 9, 2021, 11:28 a.m. UTC | #4
On Thu, Jun 3, 2021 at 4:19 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > From: Han-Wen Nienhuys <hanwen@google.com>
> >
> > A grep for EINVAL */*c reveals that no code inspects EINVAL after reading
> > references.
> >
> > The files ref backend does use EINVAL so parse_loose_ref_contents() can
> > communicate to lock_raw_ref() about garbage following the hex SHA1, or a short
> > read in files_read_raw_ref(), but the files backend does not call into
> > refs_read_raw_ref(), so its EINVAL sideband error is unused.
>
> Does this mean that there is some code that sets EINVAL, but no code
> uses it? If yes, that seems to mean that we can't remove EINVAL from the
> documentation, since some code still sets it.

It means that an alternate ref backend doesn't have to return EINVAL
to work properly.

added to commit message.

> > As the errno sideband is unintuitive and error-prone, remove EINVAL
> > value, as a step towards getting rid of the errno sideband altogether.
>
> How is removing one possible value a step towards getting rid of the
> errno sideband? I would have thought that we would be working towards
> transmitting all existing values to the new sideband (the out param).
> Unless we are planning to get rid of all values in the sideband - in
> which case, this should be described in the commit message.

It means that we don't have to spend braincycles in further commits of
this series reasoning about EINVAL.

The existing functions potentially transmit all kinds of errors, eg.
EIO if there was a problem with the media, or ENOTCONN if the git repo
happens to be on a FUSE filesystem that crashed. These are not
relevant to the ref backend, so we don't treat them specially either.

> > - * set errno appropriately and return -1.
> > + * Return 0 on success. If the ref doesn't exist, set errno to ENOENT and return
> > + * -1. If the ref exists but is neither a symbolic ref nor an object ID, it is
> > + * broken; set REF_ISBROKEN in type, and return -1. If there is another error
> > + * reading the ref, set errno appropriately and return -1.
>
> The rewrapping was unnecessary and makes it hard to see what changed.

Fixed.  I suppose .clang-format settings should add some configuration
about the line size for wrapping comments.
diff mbox series

Patch

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 467f4b3c936d..29728a339fed 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -617,11 +617,10 @@  typedef int reflog_expire_fn(struct ref_store *ref_store,
  * properly-formatted or even safe reference name. NEITHER INPUT NOR
  * OUTPUT REFERENCE NAMES ARE VALIDATED WITHIN THIS FUNCTION.
  *
- * Return 0 on success. If the ref doesn't exist, set errno to ENOENT
- * and return -1. If the ref exists but is neither a symbolic ref nor
- * an object ID, it is broken; set REF_ISBROKEN in type, set errno to
- * EINVAL, and return -1. If there is another error reading the ref,
- * set errno appropriately and return -1.
+ * Return 0 on success. If the ref doesn't exist, set errno to ENOENT and return
+ * -1. If the ref exists but is neither a symbolic ref nor an object ID, it is
+ * broken; set REF_ISBROKEN in type, and return -1. If there is another error
+ * reading the ref, set errno appropriately and return -1.
  *
  * Backend-specific flags might be set in type as well, regardless of
  * outcome.