mbox series

[0/6] strbuf cleanups

Message ID 20230502211454.1673000-1-calvinwan@google.com (mailing list archive)
Headers show
Series strbuf cleanups | expand

Message

Calvin Wan May 2, 2023, 9:14 p.m. UTC
Strbuf is a basic structure that should function as a low-level library
due to how generic it is. Over time certain functions inside of
strbuf.[ch] have been added with dependencies to higher level objects
and functions. This series cleans up some of those higher level
dependencies by moving the offending functions to the files they
interact with. With the goal of eventually being able to stand up strbuf
as a libary, this series also removes the use of environment variables
from strbuf.

Calvin Wan (6):
  abspath: move related functions to abspath
  credential-store: move related functions to credential-store file
  object-name: move related functions to object-name
  path: move related function to path
  strbuf: clarify dependency
  strbuf: remove environment variables

 abspath.c                  |  36 +++++++++++++
 abspath.h                  |  21 ++++++++
 add-patch.c                |  12 +++--
 builtin/am.c               |   2 +-
 builtin/branch.c           |   4 +-
 builtin/commit.c           |   2 +-
 builtin/credential-store.c |  19 +++++++
 builtin/merge.c            |  10 ++--
 builtin/notes.c            |  16 +++---
 builtin/rebase.c           |   2 +-
 builtin/stripspace.c       |   6 ++-
 builtin/tag.c              |   9 ++--
 fmt-merge-msg.c            |   9 ++--
 gpg-interface.c            |   5 +-
 hook.c                     |   1 +
 object-name.c              |  15 ++++++
 object-name.h              |   9 ++++
 path.c                     |  20 +++++++
 path.h                     |   5 ++
 pretty.c                   |   1 +
 rebase-interactive.c       |  15 +++---
 sequencer.c                |  24 +++++----
 strbuf.c                   | 106 +++----------------------------------
 strbuf.h                   |  44 ++-------------
 tempfile.c                 |   1 +
 wt-status.c                |   6 +--
 26 files changed, 213 insertions(+), 187 deletions(-)

Comments

Junio C Hamano May 2, 2023, 10:20 p.m. UTC | #1
Calvin Wan <calvinwan@google.com> writes:

> Strbuf is a basic structure that should function as a low-level library
> due to how generic it is. Over time certain functions inside of
> strbuf.[ch] have been added with dependencies to higher level objects
> and functions. This series cleans up some of those higher level
> dependencies by moving the offending functions to the files they
> interact with. With the goal of eventually being able to stand up strbuf
> as a libary, this series also removes the use of environment variables
> from strbuf.

I touched a bit of the same in my review for 1/6, but the main
thrust of the series is that you want to make strbuf.[ch] mostly
about "string processing primitives".

Any meaningful features relate to more than one conceptual things,
e.g. pathname functions work on "paths" (that is one) and it may
store computed result in a "strbuf" (that is another).  We have
historically called them strbuf_do_something() and gave the pointer
to a strbuf that receives the result.  And this series wants to
reverse the course and want to see strbuf that is more pure.

That's fine.  The strbuf has become so successful a data structure,
its use has become as ubiquitous as a plain string or an integer in
our codebase.

But if we were moving in that direction, I have to wonder if some of
these functions also need to be renamed to lose their strbuf_
prefix.  A function that takes a pathname and creates a directory
there is not called string_mkdir() just because it takes a pathname
as a string.  A function that takes a string buffer and a path, and
reads the symbolic link content is not called string_readlink() just
because it learns the symlink target in a string buffer.  What their
parameters mean matters a lot more than what type these parameters
are represented as.

With some other topics in flight, merging this to 'seen' and merging
this to 'next' may require different merge fixes, but I'll manage.

Thanks.
Junio C Hamano May 2, 2023, 10:31 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> But if we were moving in that direction, I have to wonder if some of
> these functions also need to be renamed to lose their strbuf_
> prefix.

Just to avoid misunderstanding.  I do not mean to suggest renaming
these inside this series.  It would make things too noisy and even
more distracting.  But in the longer term, as we treat strbuf more
and more as one of our basic data structures, it would make sense to
lose strbuf_ from functions that are thrown out of strbuf.[ch] with
this series, and reserve the prefix to functions that are left in
strbuf.[ch], i.e. those that are about string operations.
Calvin Wan May 2, 2023, 10:36 p.m. UTC | #3
On Tue, May 2, 2023 at 3:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Calvin Wan <calvinwan@google.com> writes:
>
> > Strbuf is a basic structure that should function as a low-level library
> > due to how generic it is. Over time certain functions inside of
> > strbuf.[ch] have been added with dependencies to higher level objects
> > and functions. This series cleans up some of those higher level
> > dependencies by moving the offending functions to the files they
> > interact with. With the goal of eventually being able to stand up strbuf
> > as a libary, this series also removes the use of environment variables
> > from strbuf.
>
> I touched a bit of the same in my review for 1/6, but the main
> thrust of the series is that you want to make strbuf.[ch] mostly
> about "string processing primitives".

Thanks for clarifying the boundary I was attempting to explain in my
cover letter, but unsuccessfully did so.

>
> Any meaningful features relate to more than one conceptual things,
> e.g. pathname functions work on "paths" (that is one) and it may
> store computed result in a "strbuf" (that is another).  We have
> historically called them strbuf_do_something() and gave the pointer
> to a strbuf that receives the result.  And this series wants to
> reverse the course and want to see strbuf that is more pure.
>
> That's fine.  The strbuf has become so successful a data structure,
> its use has become as ubiquitous as a plain string or an integer in
> our codebase.
>
> But if we were moving in that direction, I have to wonder if some of
> these functions also need to be renamed to lose their strbuf_
> prefix.  A function that takes a pathname and creates a directory
> there is not called string_mkdir() just because it takes a pathname
> as a string.  A function that takes a string buffer and a path, and
> reads the symbolic link content is not called string_readlink() just
> because it learns the symlink target in a string buffer.  What their
> parameters mean matters a lot more than what type these parameters
> are represented as.

I agree that with the functions that are moved out from strbuf, they
should be refactored in the future to the form, taking absolute path
as an example, strbuf_addstr(sb, get_absolute_path(path)) or something
similar.

>
> With some other topics in flight, merging this to 'seen' and merging
> this to 'next' may require different merge fixes, but I'll manage.
>
> Thanks.
>
Felipe Contreras May 2, 2023, 11:51 p.m. UTC | #4
Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > But if we were moving in that direction, I have to wonder if some of
> > these functions also need to be renamed to lose their strbuf_
> > prefix.
> 
> Just to avoid misunderstanding.  I do not mean to suggest renaming
> these inside this series.  It would make things too noisy and even
> more distracting.  But in the longer term, as we treat strbuf more
> and more as one of our basic data structures, it would make sense to
> lose strbuf_ from functions that are thrown out of strbuf.[ch] with
> this series, and reserve the prefix to functions that are left in
> strbuf.[ch], i.e. those that are about string operations.

I thought precisely the same thing: 1) it's good to move them away, 2)
they should lose the "strbuf_" prefix, 3) that doesn't need to happen in
this series.
Elijah Newren May 3, 2023, 2:37 a.m. UTC | #5
On Tue, May 2, 2023 at 2:15 PM Calvin Wan <calvinwan@google.com> wrote:
>
> Strbuf is a basic structure that should function as a low-level library
> due to how generic it is. Over time certain functions inside of
> strbuf.[ch] have been added with dependencies to higher level objects
> and functions. This series cleans up some of those higher level
> dependencies by moving the offending functions to the files they
> interact with. With the goal of eventually being able to stand up strbuf
> as a libary, this series also removes the use of environment variables
> from strbuf.
>
> Calvin Wan (6):
>   abspath: move related functions to abspath
>   credential-store: move related functions to credential-store file
>   object-name: move related functions to object-name
>   path: move related function to path
>   strbuf: clarify dependency
>   strbuf: remove environment variables
>
>  abspath.c                  |  36 +++++++++++++
>  abspath.h                  |  21 ++++++++
>  add-patch.c                |  12 +++--
>  builtin/am.c               |   2 +-
>  builtin/branch.c           |   4 +-
>  builtin/commit.c           |   2 +-
>  builtin/credential-store.c |  19 +++++++
>  builtin/merge.c            |  10 ++--
>  builtin/notes.c            |  16 +++---
>  builtin/rebase.c           |   2 +-
>  builtin/stripspace.c       |   6 ++-
>  builtin/tag.c              |   9 ++--
>  fmt-merge-msg.c            |   9 ++--
>  gpg-interface.c            |   5 +-
>  hook.c                     |   1 +
>  object-name.c              |  15 ++++++
>  object-name.h              |   9 ++++
>  path.c                     |  20 +++++++
>  path.h                     |   5 ++
>  pretty.c                   |   1 +
>  rebase-interactive.c       |  15 +++---
>  sequencer.c                |  24 +++++----
>  strbuf.c                   | 106 +++----------------------------------
>  strbuf.h                   |  44 ++-------------
>  tempfile.c                 |   1 +
>  wt-status.c                |   6 +--
>  26 files changed, 213 insertions(+), 187 deletions(-)
>
> --
> 2.40.1.495.gc816e09b53d-goog

The series looks pretty good to me.  I left a couple small comments on
5/6 and 6/6.  One other high-level note:

As Ævar noted over at
https://lore.kernel.org/git/230501.86a5yohsme.gmgdl@evledraar.gmail.com/,
strbuf_add_separated_string_list() is currently only used by
merge-ort.c and merge-recursive.c (both of which include
merge-recursive.h).  It may make sense to move that function, and that
would permit you to drop the forward declaration of 'struct
string_list;' from strbuf.c as well.
Calvin Wan May 3, 2023, 6 p.m. UTC | #6
While moving strbuf_add_separated_string_list() to a separate file
would mean that strbuf would no longer have a dependency on
string-list, I don't think that dependency is problematic to begin
with. Widening the boundary for strbuf as a string manipulation
library to a string and string list manipulation library seems
reasonable to me.


On Tue, May 2, 2023 at 7:37 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, May 2, 2023 at 2:15 PM Calvin Wan <calvinwan@google.com> wrote:
> >
> > Strbuf is a basic structure that should function as a low-level library
> > due to how generic it is. Over time certain functions inside of
> > strbuf.[ch] have been added with dependencies to higher level objects
> > and functions. This series cleans up some of those higher level
> > dependencies by moving the offending functions to the files they
> > interact with. With the goal of eventually being able to stand up strbuf
> > as a libary, this series also removes the use of environment variables
> > from strbuf.
> >
> > Calvin Wan (6):
> >   abspath: move related functions to abspath
> >   credential-store: move related functions to credential-store file
> >   object-name: move related functions to object-name
> >   path: move related function to path
> >   strbuf: clarify dependency
> >   strbuf: remove environment variables
> >
> >  abspath.c                  |  36 +++++++++++++
> >  abspath.h                  |  21 ++++++++
> >  add-patch.c                |  12 +++--
> >  builtin/am.c               |   2 +-
> >  builtin/branch.c           |   4 +-
> >  builtin/commit.c           |   2 +-
> >  builtin/credential-store.c |  19 +++++++
> >  builtin/merge.c            |  10 ++--
> >  builtin/notes.c            |  16 +++---
> >  builtin/rebase.c           |   2 +-
> >  builtin/stripspace.c       |   6 ++-
> >  builtin/tag.c              |   9 ++--
> >  fmt-merge-msg.c            |   9 ++--
> >  gpg-interface.c            |   5 +-
> >  hook.c                     |   1 +
> >  object-name.c              |  15 ++++++
> >  object-name.h              |   9 ++++
> >  path.c                     |  20 +++++++
> >  path.h                     |   5 ++
> >  pretty.c                   |   1 +
> >  rebase-interactive.c       |  15 +++---
> >  sequencer.c                |  24 +++++----
> >  strbuf.c                   | 106 +++----------------------------------
> >  strbuf.h                   |  44 ++-------------
> >  tempfile.c                 |   1 +
> >  wt-status.c                |   6 +--
> >  26 files changed, 213 insertions(+), 187 deletions(-)
> >
> > --
> > 2.40.1.495.gc816e09b53d-goog
>
> The series looks pretty good to me.  I left a couple small comments on
> 5/6 and 6/6.  One other high-level note:
>
> As Ævar noted over at
> https://lore.kernel.org/git/230501.86a5yohsme.gmgdl@evledraar.gmail.com/,
> strbuf_add_separated_string_list() is currently only used by
> merge-ort.c and merge-recursive.c (both of which include
> merge-recursive.h).  It may make sense to move that function, and that
> would permit you to drop the forward declaration of 'struct
> string_list;' from strbuf.c as well.
Elijah Newren May 7, 2023, 12:14 a.m. UTC | #7
On Wed, May 3, 2023 at 11:00 AM Calvin Wan <calvinwan@google.com> wrote:
>
> While moving strbuf_add_separated_string_list() to a separate file
> would mean that strbuf would no longer have a dependency on
> string-list, I don't think that dependency is problematic to begin
> with. Widening the boundary for strbuf as a string manipulation
> library to a string and string list manipulation library seems
> reasonable to me.

Oh, the high level idea behind string-list might make sense at this
level, but I was assuming Peff would show up at some point and
highlight the evils of the current string-list API[1][2][3] and how we
should avoid using, depending on, or implementing something that acts
like it.  :-)

Of course, you're explicitly not trying to make any API or ABI
guarantees, so it's certainly fine to shelve or defer such cleanups
for later.

[1] https://lore.kernel.org/git/Y7lx1hUpZ7zOP1Lo@coredump.intra.peff.net/
[2] https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/
[3] https://lore.kernel.org/git/20200821200121.GF1165@coredump.intra.peff.net/
Jeff King May 7, 2023, 1:14 p.m. UTC | #8
On Sat, May 06, 2023 at 05:14:55PM -0700, Elijah Newren wrote:

> On Wed, May 3, 2023 at 11:00 AM Calvin Wan <calvinwan@google.com> wrote:
> >
> > While moving strbuf_add_separated_string_list() to a separate file
> > would mean that strbuf would no longer have a dependency on
> > string-list, I don't think that dependency is problematic to begin
> > with. Widening the boundary for strbuf as a string manipulation
> > library to a string and string list manipulation library seems
> > reasonable to me.
> 
> Oh, the high level idea behind string-list might make sense at this
> level, but I was assuming Peff would show up at some point and
> highlight the evils of the current string-list API[1][2][3] and how we
> should avoid using, depending on, or implementing something that acts
> like it.  :-)

You rang? :)

Yes, IMHO this strbuf_add_separated_string_list() is another example of
why string-list sucks: it doesn't degrade to a natural array type
(because the "util" magic requires a struct). If it were a strvec or
similar, we could just pass "const char **str, size_t len", which would
make the helper function simpler and more generally useful.

I know there may be other reasons to use a string-list in the caller
here, though (looks like it uses "nodup"). So as usual, the situation is
not so simple as "we should just switch types".

-Peff