Message ID | 20230502211454.1673000-1-calvinwan@google.com (mailing list archive) |
---|---|
Headers | show |
Series | strbuf cleanups | expand |
Calvin Wan <calvinwan@google.com> writes: > Strbuf is a basic structure that should function as a low-level library > due to how generic it is. Over time certain functions inside of > strbuf.[ch] have been added with dependencies to higher level objects > and functions. This series cleans up some of those higher level > dependencies by moving the offending functions to the files they > interact with. With the goal of eventually being able to stand up strbuf > as a libary, this series also removes the use of environment variables > from strbuf. I touched a bit of the same in my review for 1/6, but the main thrust of the series is that you want to make strbuf.[ch] mostly about "string processing primitives". Any meaningful features relate to more than one conceptual things, e.g. pathname functions work on "paths" (that is one) and it may store computed result in a "strbuf" (that is another). We have historically called them strbuf_do_something() and gave the pointer to a strbuf that receives the result. And this series wants to reverse the course and want to see strbuf that is more pure. That's fine. The strbuf has become so successful a data structure, its use has become as ubiquitous as a plain string or an integer in our codebase. But if we were moving in that direction, I have to wonder if some of these functions also need to be renamed to lose their strbuf_ prefix. A function that takes a pathname and creates a directory there is not called string_mkdir() just because it takes a pathname as a string. A function that takes a string buffer and a path, and reads the symbolic link content is not called string_readlink() just because it learns the symlink target in a string buffer. What their parameters mean matters a lot more than what type these parameters are represented as. With some other topics in flight, merging this to 'seen' and merging this to 'next' may require different merge fixes, but I'll manage. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > But if we were moving in that direction, I have to wonder if some of > these functions also need to be renamed to lose their strbuf_ > prefix. Just to avoid misunderstanding. I do not mean to suggest renaming these inside this series. It would make things too noisy and even more distracting. But in the longer term, as we treat strbuf more and more as one of our basic data structures, it would make sense to lose strbuf_ from functions that are thrown out of strbuf.[ch] with this series, and reserve the prefix to functions that are left in strbuf.[ch], i.e. those that are about string operations.
On Tue, May 2, 2023 at 3:20 PM Junio C Hamano <gitster@pobox.com> wrote: > > Calvin Wan <calvinwan@google.com> writes: > > > Strbuf is a basic structure that should function as a low-level library > > due to how generic it is. Over time certain functions inside of > > strbuf.[ch] have been added with dependencies to higher level objects > > and functions. This series cleans up some of those higher level > > dependencies by moving the offending functions to the files they > > interact with. With the goal of eventually being able to stand up strbuf > > as a libary, this series also removes the use of environment variables > > from strbuf. > > I touched a bit of the same in my review for 1/6, but the main > thrust of the series is that you want to make strbuf.[ch] mostly > about "string processing primitives". Thanks for clarifying the boundary I was attempting to explain in my cover letter, but unsuccessfully did so. > > Any meaningful features relate to more than one conceptual things, > e.g. pathname functions work on "paths" (that is one) and it may > store computed result in a "strbuf" (that is another). We have > historically called them strbuf_do_something() and gave the pointer > to a strbuf that receives the result. And this series wants to > reverse the course and want to see strbuf that is more pure. > > That's fine. The strbuf has become so successful a data structure, > its use has become as ubiquitous as a plain string or an integer in > our codebase. > > But if we were moving in that direction, I have to wonder if some of > these functions also need to be renamed to lose their strbuf_ > prefix. A function that takes a pathname and creates a directory > there is not called string_mkdir() just because it takes a pathname > as a string. A function that takes a string buffer and a path, and > reads the symbolic link content is not called string_readlink() just > because it learns the symlink target in a string buffer. What their > parameters mean matters a lot more than what type these parameters > are represented as. I agree that with the functions that are moved out from strbuf, they should be refactored in the future to the form, taking absolute path as an example, strbuf_addstr(sb, get_absolute_path(path)) or something similar. > > With some other topics in flight, merging this to 'seen' and merging > this to 'next' may require different merge fixes, but I'll manage. > > Thanks. >
Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > But if we were moving in that direction, I have to wonder if some of > > these functions also need to be renamed to lose their strbuf_ > > prefix. > > Just to avoid misunderstanding. I do not mean to suggest renaming > these inside this series. It would make things too noisy and even > more distracting. But in the longer term, as we treat strbuf more > and more as one of our basic data structures, it would make sense to > lose strbuf_ from functions that are thrown out of strbuf.[ch] with > this series, and reserve the prefix to functions that are left in > strbuf.[ch], i.e. those that are about string operations. I thought precisely the same thing: 1) it's good to move them away, 2) they should lose the "strbuf_" prefix, 3) that doesn't need to happen in this series.
On Tue, May 2, 2023 at 2:15 PM Calvin Wan <calvinwan@google.com> wrote: > > Strbuf is a basic structure that should function as a low-level library > due to how generic it is. Over time certain functions inside of > strbuf.[ch] have been added with dependencies to higher level objects > and functions. This series cleans up some of those higher level > dependencies by moving the offending functions to the files they > interact with. With the goal of eventually being able to stand up strbuf > as a libary, this series also removes the use of environment variables > from strbuf. > > Calvin Wan (6): > abspath: move related functions to abspath > credential-store: move related functions to credential-store file > object-name: move related functions to object-name > path: move related function to path > strbuf: clarify dependency > strbuf: remove environment variables > > abspath.c | 36 +++++++++++++ > abspath.h | 21 ++++++++ > add-patch.c | 12 +++-- > builtin/am.c | 2 +- > builtin/branch.c | 4 +- > builtin/commit.c | 2 +- > builtin/credential-store.c | 19 +++++++ > builtin/merge.c | 10 ++-- > builtin/notes.c | 16 +++--- > builtin/rebase.c | 2 +- > builtin/stripspace.c | 6 ++- > builtin/tag.c | 9 ++-- > fmt-merge-msg.c | 9 ++-- > gpg-interface.c | 5 +- > hook.c | 1 + > object-name.c | 15 ++++++ > object-name.h | 9 ++++ > path.c | 20 +++++++ > path.h | 5 ++ > pretty.c | 1 + > rebase-interactive.c | 15 +++--- > sequencer.c | 24 +++++---- > strbuf.c | 106 +++---------------------------------- > strbuf.h | 44 ++------------- > tempfile.c | 1 + > wt-status.c | 6 +-- > 26 files changed, 213 insertions(+), 187 deletions(-) > > -- > 2.40.1.495.gc816e09b53d-goog The series looks pretty good to me. I left a couple small comments on 5/6 and 6/6. One other high-level note: As Ævar noted over at https://lore.kernel.org/git/230501.86a5yohsme.gmgdl@evledraar.gmail.com/, strbuf_add_separated_string_list() is currently only used by merge-ort.c and merge-recursive.c (both of which include merge-recursive.h). It may make sense to move that function, and that would permit you to drop the forward declaration of 'struct string_list;' from strbuf.c as well.
While moving strbuf_add_separated_string_list() to a separate file would mean that strbuf would no longer have a dependency on string-list, I don't think that dependency is problematic to begin with. Widening the boundary for strbuf as a string manipulation library to a string and string list manipulation library seems reasonable to me. On Tue, May 2, 2023 at 7:37 PM Elijah Newren <newren@gmail.com> wrote: > > On Tue, May 2, 2023 at 2:15 PM Calvin Wan <calvinwan@google.com> wrote: > > > > Strbuf is a basic structure that should function as a low-level library > > due to how generic it is. Over time certain functions inside of > > strbuf.[ch] have been added with dependencies to higher level objects > > and functions. This series cleans up some of those higher level > > dependencies by moving the offending functions to the files they > > interact with. With the goal of eventually being able to stand up strbuf > > as a libary, this series also removes the use of environment variables > > from strbuf. > > > > Calvin Wan (6): > > abspath: move related functions to abspath > > credential-store: move related functions to credential-store file > > object-name: move related functions to object-name > > path: move related function to path > > strbuf: clarify dependency > > strbuf: remove environment variables > > > > abspath.c | 36 +++++++++++++ > > abspath.h | 21 ++++++++ > > add-patch.c | 12 +++-- > > builtin/am.c | 2 +- > > builtin/branch.c | 4 +- > > builtin/commit.c | 2 +- > > builtin/credential-store.c | 19 +++++++ > > builtin/merge.c | 10 ++-- > > builtin/notes.c | 16 +++--- > > builtin/rebase.c | 2 +- > > builtin/stripspace.c | 6 ++- > > builtin/tag.c | 9 ++-- > > fmt-merge-msg.c | 9 ++-- > > gpg-interface.c | 5 +- > > hook.c | 1 + > > object-name.c | 15 ++++++ > > object-name.h | 9 ++++ > > path.c | 20 +++++++ > > path.h | 5 ++ > > pretty.c | 1 + > > rebase-interactive.c | 15 +++--- > > sequencer.c | 24 +++++---- > > strbuf.c | 106 +++---------------------------------- > > strbuf.h | 44 ++------------- > > tempfile.c | 1 + > > wt-status.c | 6 +-- > > 26 files changed, 213 insertions(+), 187 deletions(-) > > > > -- > > 2.40.1.495.gc816e09b53d-goog > > The series looks pretty good to me. I left a couple small comments on > 5/6 and 6/6. One other high-level note: > > As Ævar noted over at > https://lore.kernel.org/git/230501.86a5yohsme.gmgdl@evledraar.gmail.com/, > strbuf_add_separated_string_list() is currently only used by > merge-ort.c and merge-recursive.c (both of which include > merge-recursive.h). It may make sense to move that function, and that > would permit you to drop the forward declaration of 'struct > string_list;' from strbuf.c as well.
On Wed, May 3, 2023 at 11:00 AM Calvin Wan <calvinwan@google.com> wrote: > > While moving strbuf_add_separated_string_list() to a separate file > would mean that strbuf would no longer have a dependency on > string-list, I don't think that dependency is problematic to begin > with. Widening the boundary for strbuf as a string manipulation > library to a string and string list manipulation library seems > reasonable to me. Oh, the high level idea behind string-list might make sense at this level, but I was assuming Peff would show up at some point and highlight the evils of the current string-list API[1][2][3] and how we should avoid using, depending on, or implementing something that acts like it. :-) Of course, you're explicitly not trying to make any API or ABI guarantees, so it's certainly fine to shelve or defer such cleanups for later. [1] https://lore.kernel.org/git/Y7lx1hUpZ7zOP1Lo@coredump.intra.peff.net/ [2] https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/ [3] https://lore.kernel.org/git/20200821200121.GF1165@coredump.intra.peff.net/
On Sat, May 06, 2023 at 05:14:55PM -0700, Elijah Newren wrote: > On Wed, May 3, 2023 at 11:00 AM Calvin Wan <calvinwan@google.com> wrote: > > > > While moving strbuf_add_separated_string_list() to a separate file > > would mean that strbuf would no longer have a dependency on > > string-list, I don't think that dependency is problematic to begin > > with. Widening the boundary for strbuf as a string manipulation > > library to a string and string list manipulation library seems > > reasonable to me. > > Oh, the high level idea behind string-list might make sense at this > level, but I was assuming Peff would show up at some point and > highlight the evils of the current string-list API[1][2][3] and how we > should avoid using, depending on, or implementing something that acts > like it. :-) You rang? :) Yes, IMHO this strbuf_add_separated_string_list() is another example of why string-list sucks: it doesn't degrade to a natural array type (because the "util" magic requires a struct). If it were a strvec or similar, we could just pass "const char **str, size_t len", which would make the helper function simpler and more generally useful. I know there may be other reasons to use a string-list in the caller here, though (looks like it uses "nodup"). So as usual, the situation is not so simple as "we should just switch types". -Peff