[v2,1/2] path: add a function to check for path suffix

Message ID 20190811174748.33552-2-sandals@crustytoothpaste.net
State New
Series
  • Honor .gitattributes with rebase --am

Commit Message

brian m. carlson Aug. 11, 2019, 5:47 p.m. UTC
We have a function to strip the path suffix from a path, but we don't
have one to check for a path suffix. For a plain filename, we can use
basename, but that requires an allocation, since POSIX allows it to
modify its argument. Refactor strip_path_suffix into a helper function
and a new function, has_path_suffix, to meet this need.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 path.c | 39 ++++++++++++++++++++++++++++++---------
 path.h |  3 +++
 2 files changed, 33 insertions(+), 9 deletions(-)

Comments

Junio C Hamano Aug. 12, 2019, 12:32 a.m. UTC | #1
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> We have a function to strip the path suffix from a commit, but we don't
> have one to check for a path suffix. For a plain filename, we can use
> basename, but that requires an allocation, since POSIX allows it to
> modify its argument. Refactor strip_path_suffix into a helper function
> and a new function, has_path_suffix to meet this need.

I wish we did not use a crazy phrase like "path suffix", which would
inevitably be confused with things like ".exe".

>  /*
> + * If path ends with suffix (complete path components), returns the offset of
> + * the last character in the path before the suffix (sans trailing directory
> + * separators), and -1 otherwise.

i.e. this is the offset to the last path component.

> +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)

Perhaps

    static ssize_t last_path_component_offset(const char *path, const char *name)

I am tempted to also call the second parameter to this function
"basename", as we know from the proposed log message that you wish
"basename" were usable for this purpose, but basename itself has
another confusing factor (i.e. "are we stripping ".exe" extension?",
to which the answer is no in the context of these functions).

If we agree with the "last path component" phrasing, has_path_suffix()
would become something like:

    int last_path_component_equals(const char *path, const char *name);

perhaps.
brian m. carlson Aug. 12, 2019, 1:10 a.m. UTC | #2
On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
> 
> Perhaps
> 
>     static ssize_t last_path_component_offset(const char *path, const char *name)
> 
> I am tempted to also call the second parameter to this function
> "basename", as we know from the proposed log message that you wish
> "basename" were usable for this purpose, but basename itself has
> another confusing factor (i.e. "are we stripping ".exe" extension?",
> to which the answer is no in the context of these functions).
> 
> If we agree with the "last path component" phrasing, has_path_suffix()
> would become something like:
> 
>     int last_path_component_equals(const char *path, const char *name);

Except this is not necessarily the last path component. It could match
one or more path components with the way the function is written. If you
want to ignore that and name the function accordingly, I won't object,
but we could theoretically handle a name like "foo/.gitattributes" as
well.
SZEDER Gábor Aug. 12, 2019, 4:36 a.m. UTC | #3
On Mon, Aug 12, 2019 at 01:10:54AM +0000, brian m. carlson wrote:
> On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
> > 
> > Perhaps
> > 
> >     static ssize_t last_path_component_offset(const char *path, const char *name)
> > 
> > I am tempted to also call the second parameter to this function
> > "basename", as we know from the proposed log message that you wish
> > "basename" were usable for this purpose, but basename itself has
> > another confusing factor (i.e. "are we stripping ".exe" extension?",
> > to which the answer is no in the context of these functions).
> > 
> > If we agree with the "last path component" phrasing, has_path_suffix()
> > would become something like:
> > 
> >     int last_path_component_equals(const char *path, const char *name);
> 
> Except this is not necessarily the last path component. It could match
> one or more path components with the way the function is written. If you
> want to ignore that and name the function accordingly, I won't object,
> but we could theoretically handle a name like "foo/.gitattributes" as
> well.

ends_with_path_components(), perhaps?

I think having "path_component" in some form in the function name
would have avoided my confusion mentioned earlier in a reply to the
first version.
Junio C Hamano Aug. 12, 2019, 4:49 p.m. UTC | #4
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
>> 
>> Perhaps
>> 
>>     static ssize_t last_path_component_offset(const char *path, const char *name)
>> 
>> I am tempted to also call the second parameter to this function
>> "basename", as we know from the proposed log message that you wish
>> "basename" were usable for this purpose, but basename itself has
>> another confusing factor (i.e. "are we stripping ".exe" extension?",
>> to which the answer is no in the context of these functions).
>> 
>> If we agree with the "last path component" phrasing, has_path_suffix()
>> would become something like:
>> 
>>     int last_path_component_equals(const char *path, const char *name);
>
> Except this is not necessarily the last path component. It could match
> one or more path components with the way the function is written.

That's fair.  Is the ability that lets the function be called
ends_with_component*S*, as Szeder suggests, a designed one, i.e. with
the explicit purpose of supporting callers that pass "foo/bar" as the
"suffix", or does the implementation merely happen to work that way,
even though the only supported use is to pass a single component and
the implementation simply did not bother asserting that the "suffix"
has no slash in it?
brian m. carlson Aug. 12, 2019, 10:40 p.m. UTC | #5
On 2019-08-12 at 16:49:20, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
> >> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >> > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
> >> 
> >> Perhaps
> >> 
> >>     static ssize_t last_path_component_offset(const char *path, const char *name)
> >> 
> >> I am tempted to also call the second parameter to this function
> >> "basename", as we know from the proposed log message that you wish
> >> "basename" were usable for this purpose, but basename itself has
> >> another confusing factor (i.e. "are we stripping ".exe" extension?",
> >> to which the answer is no in the context of these functions).
> >> 
> >> If we agree with the "last path component" phrasing, has_path_suffix()
> >> would become something like:
> >> 
> >>     int last_path_component_equals(const char *path, const char *name);
> >
> > Except this is not necessarily the last path component. It could match
> > one or more path components with the way the function is written.
> 
> That's fair.  Is the feature that allows the function called
> ends_with_component*S* like Szeder suggests designed one, i.e. with
> an explicit purpose of supporting callers that pass "foo/bar" as the
> "suffix" to it, or is it merely that the implementation happens to
> work that way, even though the expected use that is supported is to
> pass only one level component but the implementation did not even
> bother asserting that the "suffix" does not have a slash in it?

Well, I split it out from a function that handles multiple path
components, mostly so that I could leverage existing work (and not have
to worry about getting it wrong). It wasn't explicitly intended that it
support multiple components, since I don't require that for my
implementation, but I could see future users taking advantage of that.

I think "ends_with_path_components" might be the way forward, unless
you think something else would be better.
Jeff King Aug. 13, 2019, 1:13 a.m. UTC | #6
On Mon, Aug 12, 2019 at 10:40:21PM +0000, brian m. carlson wrote:

> I think "ends_with_path_components" might be the way forward, unless
> you think something else would be better.

FWIW, having read the rest of the thread, that was the name that clicked
for me.

-Peff
SZEDER Gábor Aug. 13, 2019, 6:36 a.m. UTC | #7
On Mon, Aug 12, 2019 at 09:49:20AM -0700, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2019-08-12 at 00:32:26, Junio C Hamano wrote:
> >> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >> > +static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
> >> 
> >> Perhaps
> >> 
> >>     static ssize_t last_path_component_offset(const char *path, const char *name)
> >> 
> >> I am tempted to also call the second parameter to this function
> >> "basename", as we know from the proposed log message that you wish
> >> "basename" were usable for this purpose, but basename itself has
> >> another confusing factor (i.e. "are we stripping ".exe" extension?",
> >> to which the answer is no in the context of these functions).
> >> 
> >> If we agree with the "last path component" phrasing, has_path_suffix()
> >> would become something like:
> >> 
> >>     int last_path_component_equals(const char *path, const char *name);
> >
> > Except this is not necessarily the last path component. It could match
> > one or more path components with the way the function is written.
> 
> That's fair.  Is the feature that allows the function called
> ends_with_component*S* like Szeder suggests designed one, i.e. with
> an explicit purpose of supporting callers that pass "foo/bar" as the
> "suffix" to it, or is it merely that the implementation happens to
> work that way, even though the expected use that is supported is to
> pass only one level component but the implementation did not even
> bother asserting that the "suffix" does not have a slash in it?

The plural in the suggested function name was intentional on my part,
even though in this callsite in question we are only interested in the
filename, i.e. only a single path component.

I was hoping that the names of these related functions would be kept in
sync, and all would somehow contain the "path_components" substring,
e.g. strip_path_suffix() becomes strip_suffix_path_components() or
something.  And that function must be able to handle multiple path
components, because there is this callsite:

  exec-cmd.c:         !(prefix = strip_path_suffix(executable_dirname, GIT_EXEC_PATH)) &&

and the build sets '-DGIT_EXEC_PATH="libexec/git-core"' by default.
Junio C Hamano Aug. 13, 2019, 4:40 p.m. UTC | #8
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Well, I split it out from a function that handles multiple path
> components, mostly so that I could leverage existing work (and not have
> to worry about getting it wrong). It wasn't explicitly intended that it
> support multiple components, since I don't require that for my
> implementation, but I could see future users taking advantage of that.
>
> I think "ends_with_path_components" might be the way forward, unless
> you think something else would be better.

Good; thanks.
Junio C Hamano Aug. 13, 2019, 4:42 p.m. UTC | #9
SZEDER Gábor <szeder.dev@gmail.com> writes:

> ...  And that function must be able to handle multiple path
> components, because there is this callsite:
>
>   exec-cmd.c:         !(prefix = strip_path_suffix(executable_dirname, GIT_EXEC_PATH)) &&
>
> and the build sets '-DGIT_EXEC_PATH="libexec/git-core"' by default.

OK, that answers my earlier question.  We do want to support such a
caller with one or more components at the end.

Thanks.

Patch

diff --git a/path.c b/path.c
index 25e97b8c3f..e193c62b7d 100644
--- a/path.c
+++ b/path.c
@@ -1221,31 +1221,52 @@  static inline int chomp_trailing_dir_sep(const char *path, int len)
 }
 
 /*
- * If path ends with suffix (complete path components), returns the
- * part before suffix (sans trailing directory separators).
- * Otherwise returns NULL.
+ * If path ends with suffix (complete path components), returns the offset of
+ * the last character in the path before the suffix (sans trailing directory
+ * separators), and -1 otherwise.
  */
-char *strip_path_suffix(const char *path, const char *suffix)
+static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
 {
 	int path_len = strlen(path), suffix_len = strlen(suffix);
 
 	while (suffix_len) {
 		if (!path_len)
-			return NULL;
+			return -1;
 
 		if (is_dir_sep(path[path_len - 1])) {
 			if (!is_dir_sep(suffix[suffix_len - 1]))
-				return NULL;
+				return -1;
 			path_len = chomp_trailing_dir_sep(path, path_len);
 			suffix_len = chomp_trailing_dir_sep(suffix, suffix_len);
 		}
 		else if (path[--path_len] != suffix[--suffix_len])
-			return NULL;
+			return -1;
 	}
 
 	if (path_len && !is_dir_sep(path[path_len - 1]))
-		return NULL;
-	return xstrndup(path, chomp_trailing_dir_sep(path, path_len));
+		return -1;
+	return chomp_trailing_dir_sep(path, path_len);
+}
+
+/*
+ * Returns true if the path ends with suffix, considering only complete path
+ * components, and false otherwise.
+ */
+int has_path_suffix(const char *path, const char *suffix)
+{
+	return stripped_path_suffix_offset(path, suffix) != -1;
+}
+
+/*
+ * If path ends with suffix (complete path components), returns the
+ * part before suffix (sans trailing directory separators).
+ * Otherwise returns NULL.
+ */
+char *strip_path_suffix(const char *path, const char *suffix)
+{
+	ssize_t offset = stripped_path_suffix_offset(path, suffix);
+
+	return offset == -1 ? NULL : xstrndup(path, offset);
 }
 
 int daemon_avoid_alias(const char *p)
diff --git a/path.h b/path.h
index 2ba6ca58c8..c01d045786 100644
--- a/path.h
+++ b/path.h
@@ -193,4 +193,7 @@  const char *git_path_merge_head(struct repository *r);
 const char *git_path_fetch_head(struct repository *r);
 const char *git_path_shallow(struct repository *r);
 
+
+int has_path_suffix(const char *path, const char *suffix);
+
 #endif /* PATH_H */
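For experimenting with the behaviour outside Git, here is a minimal standalone sketch of the patched logic. It assumes `/` is the only directory separator (Git's `is_dir_sep()` also accepts `\` on Windows), substitutes POSIX `strndup()` for Git's `xstrndup()`, reimplements `chomp_trailing_dir_sep()` (named in the hunk header; its body here is an assumption), and uses the name `ends_with_path_components()` that the thread converged on in place of `has_path_suffix()`. It is a sketch, not Git's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Assumption: POSIX-only separator. */
static int is_dir_sep(char c)
{
	return c == '/';
}

/* Drop trailing directory separators; returns the new length. */
static int chomp_trailing_dir_sep(const char *path, int len)
{
	while (len && is_dir_sep(path[len - 1]))
		len--;
	return len;
}

/* Length of the part of path before suffix (sans trailing separators),
 * or -1 if path does not end with suffix as complete components. */
static ssize_t stripped_path_suffix_offset(const char *path, const char *suffix)
{
	int path_len = strlen(path), suffix_len = strlen(suffix);

	while (suffix_len) {
		if (!path_len)
			return -1;
		if (is_dir_sep(path[path_len - 1])) {
			/* A separator in path must line up with one in suffix. */
			if (!is_dir_sep(suffix[suffix_len - 1]))
				return -1;
			path_len = chomp_trailing_dir_sep(path, path_len);
			suffix_len = chomp_trailing_dir_sep(suffix, suffix_len);
		} else if (path[--path_len] != suffix[--suffix_len]) {
			return -1;
		}
	}
	/* The suffix must begin at a component boundary. */
	if (path_len && !is_dir_sep(path[path_len - 1]))
		return -1;
	return chomp_trailing_dir_sep(path, path_len);
}

/* Non-allocating check: true if path ends with the given components. */
int ends_with_path_components(const char *path, const char *components)
{
	return stripped_path_suffix_offset(path, components) != -1;
}

/* Allocating variant: the part before suffix, or NULL on mismatch. */
char *strip_path_suffix(const char *path, const char *suffix)
{
	ssize_t offset = stripped_path_suffix_offset(path, suffix);

	return offset == -1 ? NULL : strndup(path, offset);
}
```

Returning a length rather than a new string lets the boolean check avoid allocation entirely, while strip_path_suffix() keeps its allocating behaviour; the multi-component case (such as "libexec/git-core" in exec-cmd.c) works because separators are matched as part of the comparison.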