diff mbox series

[v4,1/2] abspath: add a function to resolve paths with missing components

Message ID 20201206225349.3392790-2-sandals@crustytoothpaste.net (mailing list archive)
State Superseded
Headers show
Series rev-parse options for absolute or relative paths | expand

Commit Message

brian m. carlson Dec. 6, 2020, 10:53 p.m. UTC
We'd like to canonicalize paths such that we can preserve any number of
trailing components that may be missing.  Let's add a function to do
that, allowing either one or an unlimited number of components to
canonicalize, and make strbuf_realpath a wrapper around it that allows
just one component.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 abspath.c | 13 ++++++++++++-
 cache.h   |  2 ++
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

Eric Sunshine Dec. 7, 2020, 12:02 a.m. UTC | #1
On Sun, Dec 6, 2020 at 5:56 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> We'd like to canonicalize paths such that we can preserve any number of
> trailing components that may be missing.  Let's add a function to do
> that, allowing either one or an unlimited number of components to
> canonicalize, and make strbuf_realpath a wrapper around it that allows
> just one component.

This commit message is too barebones. As a justification, "We'd
like..." is insufficient; it doesn't help the reader understand why
this change is desirable.

Further, the lack of explanation about the seemingly arbitrary "one or
infinite" condition confuses the issue. The first question which
popped into this reader's head was "why those two specific choices?".
What makes one missing path component special as opposed to any number
of missing components? (These questions are mostly rhetorical; I can
figure out reasonable answers, but the commit message ought to do a
better job of explaining.)

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> +/*
> + * Just like strbuf_realpath, but allows specifying how many missing components
> + * are permitted.  If many_missing is true, an arbitrary number of path
> + * components may be missing; otherwise, only the last component may be missing.
> + */
> +char *strbuf_realpath_missing(struct strbuf *resolved, const char *path,
> +                             int many_missing, int die_on_error)

This interface feels odd. Why would a caller ever want to call this
function with many_missing=0 when it would be easier, shorter, more
direct to simply call strbuf_realpath()? Is the plan to retire
strbuf_realpath() down the road?

A more orthogonal-feeling interface would be to make this function
_always_ allow any number of missing trailing path components (that
is, drop `many_missing` altogether). Doing so would simplify the
documentation and the signature. If the caller needs the original
behavior of only allowing the final path component to be missing, then
strbuf_realpath() can be used, as usual.

The name of the function is somewhat confusing, especially if you take
the suggestion of dropping the `many_missing` argument. Perhaps a name
such as strbuf_realpath_forgiving() would be more understandable.

Note that the above comments are only about the API. It's perfectly
fine if the two functions share an underlying private implementation
(making them each one-liners calling the actual worker function).
brian m. carlson Dec. 7, 2020, 2:11 a.m. UTC | #2
On 2020-12-07 at 00:02:16, Eric Sunshine wrote:
> This commit message is too barebones. As a justification, "We'd
> like..." is insufficient; it doesn't help the reader understand why
> this change is desirable.
> 
> Further, the lack of explanation about the seemingly arbitrary "one or
> infinite" condition confuses the issue. The first question which
> popped into this reader's head was "why those two specific choices?".
> What makes one missing path component special as opposed to any number
> of missing components? (These questions are mostly rhetorical; I can
> figure out reasonable answers, but the commit message ought to do a
> better job of explaining.)

Sure, I can expand the commit message to be a little more descriptive.

> The name of the function is somewhat confusing, especially if you take
> the suggestion of dropping the `many_missing` argument. Perhaps a name
> such as strbuf_realpath_forgiving() would be more understandable.

Sure, I agree that's a better name.  It shouldn't be surprising to
anyone on the list that I am absolutely terrible at naming things, so I
appreciate the suggestion.
René Scharfe Dec. 7, 2020, 5:19 p.m. UTC | #3
Am 06.12.20 um 23:53 schrieb brian m. carlson:
> We'd like to canonicalize paths such that we can preserve any number of
> trailing components that may be missing.  Let's add a function to do
> that, allowing either one or an unlimited number of components to
> canonicalize, and make strbuf_realpath a wrapper around it that allows
> just one component.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  abspath.c | 13 ++++++++++++-
>  cache.h   |  2 ++
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/abspath.c b/abspath.c
> index 6f15a418bb..e6630b3618 100644
> --- a/abspath.c
> +++ b/abspath.c
> @@ -80,6 +80,17 @@ static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
>   */
>  char *strbuf_realpath(struct strbuf *resolved, const char *path,
>  		      int die_on_error)
> +{
> +	return strbuf_realpath_missing(resolved, path, 0, die_on_error);
> +}
> +
> +/*
> + * Just like strbuf_realpath, but allows specifying how many missing components
> + * are permitted.  If many_missing is true, an arbitrary number of path
> + * components may be missing; otherwise, only the last component may be missing.
> + */
> +char *strbuf_realpath_missing(struct strbuf *resolved, const char *path,
> +			      int many_missing, int die_on_error)

So this uses two 32-bit values to pass two bits.  Hmm.

I find the concept of a "real" path with imaginary components strangely
amusing.  But perhaps a name like strbuf_resolve_path() would fit better?

>  {
>  	struct strbuf remaining = STRBUF_INIT;
>  	struct strbuf next = STRBUF_INIT;
> @@ -129,7 +140,7 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path,
>
>  		if (lstat(resolved->buf, &st)) {
>  			/* error out unless this was the last component */
> -			if (errno != ENOENT || remaining.len) {
> +			if (errno != ENOENT || (!many_missing && remaining.len)) {
>  				if (die_on_error)
>  					die_errno("Invalid path '%s'",
>  						  resolved->buf);

So the original code errors out if there is a real error
(errno != ENOENT).  It also errors out if any component except the last
one is missing (errno == ENOENT && remaining.len); that's what the
comment is about.  This patch adds the ability to ignore ENOENT for all
components.

Perhaps convert many_missing and die_on_error into a single flags
parameter and implement the flags DIE_ON_ERR and REQUIRE_BASENAME or
similar?  Callers would be easier to read because such an interface is
self-documenting -- provided we find good flag names.

> diff --git a/cache.h b/cache.h
> index e986cf4ea9..a1386235fc 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1320,6 +1320,8 @@ static inline int is_absolute_path(const char *path)
>  int is_directory(const char *);
>  char *strbuf_realpath(struct strbuf *resolved, const char *path,
>  		      int die_on_error);
> +char *strbuf_realpath_missing(struct strbuf *resolved, const char *path,
> +			      int missing_components, int die_on_error);
>  char *real_pathdup(const char *path, int die_on_error);
>  const char *absolute_path(const char *path);
>  char *absolute_pathdup(const char *path);
>
brian m. carlson Dec. 8, 2020, 2:51 a.m. UTC | #4
On 2020-12-07 at 17:19:32, René Scharfe wrote:
> I find the concept of a "real" path with imaginary components strangely
> amusing.  But perhaps a name like strbuf_resolve_path() would fit better?

I think I'm going to take the strbuf_realpath_forgiving solution from
Eric Sunshine because I think having similar names for similar functions
helps discoverability.

> So the original code errors out if there is a real error
> (errno != ENOENT).  It also errors out if any component except the last
> one is missing (errno == ENOENT && remaining.len); that's what the
> comment is about.  This patch adds the ability to ignore ENOENT for all
> components.
> 
> Perhaps convert many_missing and die_on_error into a single flags
> parameter and implement the flags DIE_ON_ERR and REQUIRE_BASENAME or
> similar?  Callers would be easier to read because such an interface is
> self-documenting -- provided we find good flag names.

As discussed elsewhere in the thread, this will be moving to an internal
function, but I can make that function take two flag parameters.
diff mbox series

Patch

diff --git a/abspath.c b/abspath.c
index 6f15a418bb..e6630b3618 100644
--- a/abspath.c
+++ b/abspath.c
@@ -80,6 +80,17 @@  static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
  */
 char *strbuf_realpath(struct strbuf *resolved, const char *path,
 		      int die_on_error)
+{
+	return strbuf_realpath_missing(resolved, path, 0, die_on_error);
+}
+
+/*
+ * Just like strbuf_realpath, but allows specifying how many missing components
+ * are permitted.  If many_missing is true, an arbitrary number of path
+ * components may be missing; otherwise, only the last component may be missing.
+ */
+char *strbuf_realpath_missing(struct strbuf *resolved, const char *path,
+			      int many_missing, int die_on_error)
 {
 	struct strbuf remaining = STRBUF_INIT;
 	struct strbuf next = STRBUF_INIT;
@@ -129,7 +140,7 @@  char *strbuf_realpath(struct strbuf *resolved, const char *path,
 
 		if (lstat(resolved->buf, &st)) {
 			/* error out unless this was the last component */
-			if (errno != ENOENT || remaining.len) {
+			if (errno != ENOENT || (!many_missing && remaining.len)) {
 				if (die_on_error)
 					die_errno("Invalid path '%s'",
 						  resolved->buf);
diff --git a/cache.h b/cache.h
index e986cf4ea9..a1386235fc 100644
--- a/cache.h
+++ b/cache.h
@@ -1320,6 +1320,8 @@  static inline int is_absolute_path(const char *path)
 int is_directory(const char *);
 char *strbuf_realpath(struct strbuf *resolved, const char *path,
 		      int die_on_error);
+char *strbuf_realpath_missing(struct strbuf *resolved, const char *path,
+			      int missing_components, int die_on_error);
 char *real_pathdup(const char *path, int die_on_error);
 const char *absolute_path(const char *path);
 char *absolute_pathdup(const char *path);