diff mbox series

[v2,3/8] refs: make errno output explicit for read_raw_ref_fn

Message ID 3e2831e59c8e4cb8aef416a41b55083887b4559c.1623329869.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series refs: cleanup errno sideband ref related functions | expand

Commit Message

Han-Wen Nienhuys June 10, 2021, 12:57 p.m. UTC
From: Han-Wen Nienhuys <hanwen@google.com>

read_raw_ref_fn needs to supply a credible errno for a number of cases. These
are primarily:

1) The files backend calls read_raw_ref from lock_raw_ref, and uses the
resulting error codes to create/remove directories as needed.

2) ENOENT should be translated in a zero OID, optionally with REF_ISBROKEN set,
returning the last successfully resolved symref. This is necessary so
read_raw_ref("HEAD") on an empty repo returns refs/heads/main (or the default branch
du-jour), and we know on which branch to create the first commit.

Make this information flow explicit by adding a failure_errno to the signature
of read_raw_ref. All errnos from the files backend are still propagated
unchanged, even though inspection suggests only ENOTDIR, EISDIR and ENOENT are
relevant.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
---
 refs.c                |  8 ++++++--
 refs/debug.c          |  4 ++--
 refs/files-backend.c  | 24 ++++++++++++------------
 refs/packed-backend.c |  8 ++++----
 refs/refs-internal.h  | 17 +++++++++--------
 5 files changed, 33 insertions(+), 28 deletions(-)

Comments

Ævar Arnfjörð Bjarmason July 1, 2021, 11:34 a.m. UTC | #1
On Thu, Jun 10 2021, Han-Wen Nienhuys via GitGitGadget wrote:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> read_raw_ref_fn needs to supply a credible errno for a number of cases. These
> are primarily:
>
> 1) The files backend calls read_raw_ref from lock_raw_ref, and uses the
> resulting error codes to create/remove directories as needed.
>
> 2) ENOENT should be translated in a zero OID, optionally with REF_ISBROKEN set,
> returning the last successfully resolved symref. This is necessary so
> read_raw_ref("HEAD") on an empty repo returns refs/heads/main (or the default branch
> du-jour), and we know on which branch to create the first commit.
>
> Make this information flow explicit by adding a failure_errno to the signature
> of read_raw_ref. All errnos from the files backend are still propagated
> unchanged, even though inspection suggests only ENOTDIR, EISDIR and ENOENT are
> relevant.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  refs.c                |  8 ++++++--
>  refs/debug.c          |  4 ++--
>  refs/files-backend.c  | 24 ++++++++++++------------
>  refs/packed-backend.c |  8 ++++----
>  refs/refs-internal.h  | 17 +++++++++--------
>  5 files changed, 33 insertions(+), 28 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 8c9490235ea6..bebe3f584da7 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1675,13 +1675,17 @@ int refs_read_raw_ref(struct ref_store *ref_store,
>  		      const char *refname, struct object_id *oid,
>  		      struct strbuf *referent, unsigned int *type)
>  {
> +	int result, failure;

Style nit: Different variables should be on different lines, except
where they're the same, so "int i, j", not "int ret, i". Perhaps "ret"
and "errno" are similar enough, but I'd split them, just my 0.02.

Also nit: s/failure/failure_errno/, just like the function signature.

>  	if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
>  		return refs_read_special_head(ref_store, refname, oid, referent,
>  					      type);
>  	}
>  
> -	return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
> -					   type);
> +	failure = 0;
> +	result = ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
> +					     type, &failure);
> +	errno = failure;
> +	return result;
>  }

Is there some subtlety here where this isn't equivalent to the more
simple/straightforward:

	diff --git a/refs.c b/refs.c
	index bebe3f584da..49ab7555de9 100644
	--- a/refs.c
	+++ b/refs.c
	@@ -1675,17 +1675,14 @@ int refs_read_raw_ref(struct ref_store *ref_store,
	 		      const char *refname, struct object_id *oid,
	 		      struct strbuf *referent, unsigned int *type)
	 {
	-	int result, failure;
	 	if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
	 		return refs_read_special_head(ref_store, refname, oid, referent,
	 					      type);
	 	}
	 
	-	failure = 0;
	-	result = ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
	-					     type, &failure);
	-	errno = failure;
	-	return result;
	+	errno = 0;
	+	return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
	+					     type, &errno);
	 }
	 
	 /* This function needs to return a meaningful errno on failure */

In that case adding the "failure_errno" to the signature won't be needed
either, but I'm assuming we're heading towards some reftable end-state
where that makes more sense.

I can only imagine a case where we think files_read_raw_ref() would
encounter a new errno after it assigned to *failure_errno, which is just
a couple of strbuf_release() calls.

if that is a case we're worried about then like in my comment on 2/8
shouldn't we be explicitly checking for such a lost/different errno?
I.e. something like (should probably be less fatal than BUG(...):
		
	diff --git a/refs.c b/refs.c
	index bebe3f584da..9584ddae392 100644
	--- a/refs.c
	+++ b/refs.c
	@@ -1675,16 +1675,22 @@ int refs_read_raw_ref(struct ref_store *ref_store,
	 		      const char *refname, struct object_id *oid,
	 		      struct strbuf *referent, unsigned int *type)
	 {
	-	int result, failure;
	+	int result, failure_errno = 0;
	+
	 	if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
	 		return refs_read_special_head(ref_store, refname, oid, referent,
	 					      type);
	 	}
	 
	-	failure = 0;
	+	errno = 0;
	 	result = ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
	-					     type, &failure);
	-	errno = failure;
	+					     type, &failure_errno);
	+	if (errno)
	+		BUG("Got another errno from read_raw_ref()?: %s, failure errno: %s",
	+		    strerror(errno),
	+		    strerror(failure_errno));
	+	else
	+		errno = failure_errno;
	 	return result;
	 }
	 
	diff --git a/refs/files-backend.c b/refs/files-backend.c
	index 8f969c8f711..57cfdf738da 100644
	--- a/refs/files-backend.c
	+++ b/refs/files-backend.c
	@@ -458,8 +458,10 @@ static int files_read_raw_ref(struct ref_store *ref_store, const char *refname,
	 	ret = parse_loose_ref_contents(buf, oid, referent, type);
	 
	 out:
	-	if (failure_errno)
	+	if (failure_errno) {
	 		*failure_errno = errno;
	+		errno = 0;
	+	}
	 	strbuf_release(&sb_path);
	 	strbuf_release(&sb_contents);
	 	return ret;
	diff --git a/refs/packed-backend.c b/refs/packed-backend.c
	index a457f18e93c..4bcb4777f0f 100644
	--- a/refs/packed-backend.c
	+++ b/refs/packed-backend.c
	@@ -740,6 +740,7 @@ static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname,
	 	if (!rec) {
	 		/* refname is not a packed reference. */
	 		*failure_errno = ENOENT;
	+		errno = 0;
	 		return -1;
	 	}

Which also passes all our tests, and the packed_read_raw_ref() suggests
how you may have come up with the current pattern.
> [...]
> - * Return 0 on success. If the ref doesn't exist, set errno to ENOENT
> - * and return -1. If the ref exists but is neither a symbolic ref nor
> - * an object ID, it is broken; set REF_ISBROKEN in type, and return -1
> - * (errno should not be ENOENT) If there is another error reading the
> - * ref, set errno appropriately and return -1.
> + * Return 0 on success. If the ref doesn't exist, set failure_errno to ENOENT
> + * and return -1. If the ref exists but is neither a symbolic ref nor an object
> + * ID, it is broken; set REF_ISBROKEN in type, and return -1 (failure_errno
> + * should not be ENOENT). The files backend may return EISDIR (if the ref name
> + * is a directory) and ENOTDIR (if a ref prefix is not a directory). If there is
> + * another error reading the ref, set failure_errno appropriately and return -1.
>   *

This documentation is a bit confusing in light of the above comments. I
assume that "set failure_errno appropriately" here means that it will do
it, but it's really worded like it's suggesting that you should do it,
and does the "another error" suggest that a caller may need to deal with
an arbitrary errno, not just the ENOENT|EISDIR|ENOTDIR?
Han-Wen Nienhuys July 5, 2021, 2:34 p.m. UTC | #2
On Thu, Jul 1, 2021 at 1:55 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > diff --git a/refs.c b/refs.c
> > index 8c9490235ea6..bebe3f584da7 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -1675,13 +1675,17 @@ int refs_read_raw_ref(struct ref_store *ref_store,
> >                     const char *refname, struct object_id *oid,
> >                     struct strbuf *referent, unsigned int *type)
> >  {
> > +     int result, failure;
>
> Style nit: Different variables should be on different lines, except
> where they're the same, so "int i, j", not "int ret, i". Perhaps "ret"
> and "errno" are similar enough, but I'd split them, just my 0.02.
>
> Also nit: s/failure/failure_errno/, just like the function signature.

done.

> >       if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
> >               return refs_read_special_head(ref_store, refname, oid, referent,
> >                                             type);
> >       }
> >
> > -     return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
> > -                                        type);
> > +     failure = 0;
> > +     result = ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
> > +                                          type, &failure);
> > +     errno = failure;
> > +     return result;
> >  }
>
> Is there some subtlety here where this isn't equivalent to the more
> simple/straightforward:
>
>         diff --git a/refs.c b/refs.c
>         index bebe3f584da..49ab7555de9 100644
>         --- a/refs.c
>         +++ b/refs.c
>         @@ -1675,17 +1675,14 @@ int refs_read_raw_ref(struct ref_store *ref_store,
>                               const char *refname, struct object_id *oid,
>                               struct strbuf *referent, unsigned int *type)
>          {
>         -       int result, failure;
>                 if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
>                         return refs_read_special_head(ref_store, refname, oid, referent,
>                                                       type);
>                 }
>
>         -       failure = 0;
>         -       result = ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
>         -                                            type, &failure);
>         -       errno = failure;
>         -       return result;
>         +       errno = 0;
>         +       return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
>         +                                            type, &errno);

There is a big difference: in this case, in read_raw_ref, the
failure_errno pointer is aliased to errno. Which means that things
like

   *failure_errno = 0
   some_syscall();

might mysteriously clobber *failure_errno again, which is the kind of
effect-at-a-distance we're trying to get rid of.

> I can only imagine a case where we think files_read_raw_ref() would
> encounter a new errno after it assigned to *failure_errno, which is just
> a couple of strbuf_release() calls.
>
> if that is a case we're worried about then like in my comment on 2/8
> shouldn't we be explicitly checking for such a lost/different errno?
> I.e. something like (should probably be less fatal than BUG(...):
>
>         diff --git a/refs.c b/refs.c
>         index bebe3f584da..9584ddae392 100644
>         --- a/refs.c
>         +++ b/refs.c
>         @@ -1675,16 +1675,22 @@ int refs_read_raw_ref(struct ref_store *ref_store,
>                               const char *refname, struct object_id *oid,
>                               struct strbuf *referent, unsigned int *type)
>          {
>         -       int result, failure;
>         +       int result, failure_errno = 0;
>         +
>                 if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
>                         return refs_read_special_head(ref_store, refname, oid, referent,
>                                                       type);
>                 }
>
>         -       failure = 0;
>         +       errno = 0;
>                 result = ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
>         -                                            type, &failure);
>         -       errno = failure;
>         +                                            type, &failure_errno);
>         +       if (errno)
>         +               BUG("Got another errno from read_raw_ref()?: %s, failure errno: %s",
>         +                   strerror(errno),
>         +                   strerror(failure_errno));

The whole idea is to make sure that important (ie. errno values that
influence downstream logic) are separate from unimportant (ie. used
for error messages). By adding a BUG() here, you're making a drastic
downstream logic change depending on errno,  so your example code is
going precisely in the opposite direction of where I want to go, and
forces implementers of read_raw_ref to walk on eggshells again wrt.
errno handling to avoid triggering error messages.

> > + * Return 0 on success. If the ref doesn't exist, set failure_errno to ENOENT
> > + * and return -1. If the ref exists but is neither a symbolic ref nor an object
> > + * ID, it is broken; set REF_ISBROKEN in type, and return -1 (failure_errno
> > + * should not be ENOENT). The files backend may return EISDIR (if the ref name
> > + * is a directory) and ENOTDIR (if a ref prefix is not a directory). If there is
> > + * another error reading the ref, set failure_errno appropriately and return -1.
> >   *
>
> This documentation is a bit confusing in light of the above comments. I
> assume that "set failure_errno appropriately" here means that it will do
> it, but it's really worded like it's suggesting that you should do it,
> and does the "another error" suggest that a caller may need to deal with
> an arbitrary errno, not just the ENOENT|EISDIR|ENOTDIR?

Tried to clarify this.
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index 8c9490235ea6..bebe3f584da7 100644
--- a/refs.c
+++ b/refs.c
@@ -1675,13 +1675,17 @@  int refs_read_raw_ref(struct ref_store *ref_store,
 		      const char *refname, struct object_id *oid,
 		      struct strbuf *referent, unsigned int *type)
 {
+	int result, failure;
 	if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
 		return refs_read_special_head(ref_store, refname, oid, referent,
 					      type);
 	}
 
-	return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
-					   type);
+	failure = 0;
+	result = ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
+					     type, &failure);
+	errno = failure;
+	return result;
 }
 
 /* This function needs to return a meaningful errno on failure */
diff --git a/refs/debug.c b/refs/debug.c
index 7db4abccc341..f12413a9bc0f 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -238,7 +238,7 @@  debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
 
 static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
 			      struct object_id *oid, struct strbuf *referent,
-			      unsigned int *type)
+			      unsigned int *type, int *failure_errno)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
 	int res = 0;
@@ -246,7 +246,7 @@  static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
 	oidcpy(oid, null_oid());
 	errno = 0;
 	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
-					    type);
+					    type, failure_errno);
 
 	if (res == 0) {
 		trace_printf_key(&trace_refs, "read_raw_ref: %s: %s (=> %s) type %x: %d\n",
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6aa0f5c41dd3..8f969c8f711f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -341,9 +341,9 @@  static struct ref_cache *get_loose_ref_cache(struct files_ref_store *refs)
 	return refs->loose;
 }
 
-static int files_read_raw_ref(struct ref_store *ref_store,
-			      const char *refname, struct object_id *oid,
-			      struct strbuf *referent, unsigned int *type)
+static int files_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			      struct object_id *oid, struct strbuf *referent,
+			      unsigned int *type, int *failure_errno)
 {
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
@@ -354,7 +354,6 @@  static int files_read_raw_ref(struct ref_store *ref_store,
 	struct stat st;
 	int fd;
 	int ret = -1;
-	int save_errno;
 	int remaining_retries = 3;
 
 	*type = 0;
@@ -459,10 +458,10 @@  static int files_read_raw_ref(struct ref_store *ref_store,
 	ret = parse_loose_ref_contents(buf, oid, referent, type);
 
 out:
-	save_errno = errno;
+	if (failure_errno)
+		*failure_errno = errno;
 	strbuf_release(&sb_path);
 	strbuf_release(&sb_contents);
-	errno = save_errno;
 	return ret;
 }
 
@@ -541,6 +540,7 @@  static int lock_raw_ref(struct files_ref_store *refs,
 	struct strbuf ref_file = STRBUF_INIT;
 	int attempts_remaining = 3;
 	int ret = TRANSACTION_GENERIC_ERROR;
+	int failure_errno = 0;
 
 	assert(err);
 	files_assert_main_repository(refs, "lock_raw_ref");
@@ -629,9 +629,9 @@  static int lock_raw_ref(struct files_ref_store *refs,
 	 * fear that its value will change.
 	 */
 
-	if (files_read_raw_ref(&refs->base, refname,
-			       &lock->old_oid, referent, type)) {
-		if (errno == ENOENT) {
+	if (files_read_raw_ref(&refs->base, refname, &lock->old_oid, referent,
+			       type, &failure_errno)) {
+		if (failure_errno == ENOENT) {
 			if (mustexist) {
 				/* Garden variety missing reference. */
 				strbuf_addf(err, "unable to resolve reference '%s'",
@@ -655,7 +655,7 @@  static int lock_raw_ref(struct files_ref_store *refs,
 				 *   reference named "refs/foo/bar/baz".
 				 */
 			}
-		} else if (errno == EISDIR) {
+		} else if (failure_errno == EISDIR) {
 			/*
 			 * There is a directory in the way. It might have
 			 * contained references that have been deleted. If
@@ -693,13 +693,13 @@  static int lock_raw_ref(struct files_ref_store *refs,
 					goto error_return;
 				}
 			}
-		} else if (errno == EINVAL && (*type & REF_ISBROKEN)) {
+		} else if (failure_errno == EINVAL && (*type & REF_ISBROKEN)) {
 			strbuf_addf(err, "unable to resolve reference '%s': "
 				    "reference broken", refname);
 			goto error_return;
 		} else {
 			strbuf_addf(err, "unable to resolve reference '%s': %s",
-				    refname, strerror(errno));
+				    refname, strerror(failure_errno));
 			goto error_return;
 		}
 
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index dfecdbc1db60..a457f18e93c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -724,9 +724,9 @@  static struct snapshot *get_snapshot(struct packed_ref_store *refs)
 	return refs->snapshot;
 }
 
-static int packed_read_raw_ref(struct ref_store *ref_store,
-			       const char *refname, struct object_id *oid,
-			       struct strbuf *referent, unsigned int *type)
+static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			       struct object_id *oid, struct strbuf *referent,
+			       unsigned int *type, int *failure_errno)
 {
 	struct packed_ref_store *refs =
 		packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
@@ -739,7 +739,7 @@  static int packed_read_raw_ref(struct ref_store *ref_store,
 
 	if (!rec) {
 		/* refname is not a packed reference. */
-		errno = ENOENT;
+		*failure_errno = ENOENT;
 		return -1;
 	}
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f4445e329045..904b2a9e51ae 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -617,11 +617,12 @@  typedef int reflog_expire_fn(struct ref_store *ref_store,
  * properly-formatted or even safe reference name. NEITHER INPUT NOR
  * OUTPUT REFERENCE NAMES ARE VALIDATED WITHIN THIS FUNCTION.
  *
- * Return 0 on success. If the ref doesn't exist, set errno to ENOENT
- * and return -1. If the ref exists but is neither a symbolic ref nor
- * an object ID, it is broken; set REF_ISBROKEN in type, and return -1
- * (errno should not be ENOENT) If there is another error reading the
- * ref, set errno appropriately and return -1.
+ * Return 0 on success. If the ref doesn't exist, set failure_errno to ENOENT
+ * and return -1. If the ref exists but is neither a symbolic ref nor an object
+ * ID, it is broken; set REF_ISBROKEN in type, and return -1 (failure_errno
+ * should not be ENOENT). The files backend may return EISDIR (if the ref name
+ * is a directory) and ENOTDIR (if a ref prefix is not a directory). If there is
+ * another error reading the ref, set failure_errno appropriately and return -1.
  *
  * Backend-specific flags might be set in type as well, regardless of
  * outcome.
@@ -635,9 +636,9 @@  typedef int reflog_expire_fn(struct ref_store *ref_store,
  * - in all other cases, referent will be untouched, and therefore
  *   refname will still be valid and unchanged.
  */
-typedef int read_raw_ref_fn(struct ref_store *ref_store,
-			    const char *refname, struct object_id *oid,
-			    struct strbuf *referent, unsigned int *type);
+typedef int read_raw_ref_fn(struct ref_store *ref_store, const char *refname,
+			    struct object_id *oid, struct strbuf *referent,
+			    unsigned int *type, int *failure_errno);
 
 struct ref_storage_be {
 	struct ref_storage_be *next;