mbox series

[0/2] Simplify "commented" API functions

Message ID 20240912205301.1809355-1-gitster@pobox.com (mailing list archive)
Headers show
Series Simplify "commented" API functions | expand

Message

Junio C Hamano Sept. 12, 2024, 8:52 p.m. UTC
We [earlier] noticed that a few helper functions that format strings
into a strbuf, prefixed with an arbitrary comment leading character,
are forcing their callers to always pass the same argument.  Instead
of keeping this excess flexibility nobody wants in practice, narrow
the API so that all callers of these functions will get the same
comment leading character.

Superficially it may appear that this goes against one of the recent
trend, "war on global variables", but I doubt it matters much in the
longer run.

These call sites all have already been relying on the global
"comment_line_str" anyway, and passing the variable from the top of
arbitrary deep call chain, which may not care specifically about
this single variable comment_line_str, would not scale as a
solution.  If we were to make it a convention to pass something from
the very top of the call chain everywhere, it should not be an
individual variable that is overly specific, like comment_line_str.
Rather, it has to be something that allows access to a bag of all
the global settings (possibly wider than "the_repository" struct).
We can also limit our reliance to the globals by allowing a single
global function call to obtaion such a bag of all global settings,
i.e. "get_the_environment()", and allow everybody, including
functions at the leaf level, to ask "what is the comment_line_str in
the environment I am working in?".  As part of the libification, we
can plug in an appropriate shim for get_the_environment().


[earlier] https://lore.kernel.org/git/xmqq7cn4g3nx.fsf@gitster.g/

Junio C Hamano (2):
  strbuf: retire strbuf_commented_addf()
  strbuf: retire strbuf_commented_lines()

 add-patch.c          |  8 ++++----
 builtin/branch.c     |  2 +-
 builtin/merge.c      |  8 ++++----
 builtin/notes.c      | 10 +++++-----
 builtin/stripspace.c |  2 +-
 builtin/tag.c        |  4 ++--
 fmt-merge-msg.c      | 17 +++++++----------
 rebase-interactive.c |  8 ++++----
 sequencer.c          | 16 +++++++---------
 strbuf.c             | 11 +++++------
 strbuf.h             | 13 +++++--------
 wt-status.c          |  6 +++---
 12 files changed, 48 insertions(+), 57 deletions(-)

Comments

Junio C Hamano Sept. 12, 2024, 9:57 p.m. UTC | #1
Junio C Hamano <gitster@pobox.com> writes:

> [earlier] https://lore.kernel.org/git/xmqq7cn4g3nx.fsf@gitster.g/
>
> Junio C Hamano (2):
>   strbuf: retire strbuf_commented_addf()
>   strbuf: retire strbuf_commented_lines()

This of course has a semantic conflict with the change that hides
not just "the_repository" and functions that implicitly use that
variable but other unrelated globals from environment.h behind a
compatibility macro.

A trivial merge-fix to be used to annotate the merge to make it an
evil merge whose result compiles looks like this:

    merge-fix/jc/strbuf-commented-something

diff --git a/strbuf.c b/strbuf.c
index 05a6dc944b..cb972b07f7 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -7,6 +7,14 @@
 #include "utf8.h"
 #include "date.h"
 
+/*
+ * We do not need the_repository at all, but the comment_line_str is
+ * hidden behind USE_THE_REPOSITORY_VARIABLE CPP macro.  We could
+ * #define it before including "envirnoment.h".  Or we can make an
+ * external declaration ourselves here.
+ */
+extern const char *comment_line_str;
+
 int starts_with(const char *str, const char *prefix)
 {
 	for (; ; str++, prefix++)
Patrick Steinhardt Sept. 13, 2024, 6:15 a.m. UTC | #2
On Thu, Sep 12, 2024 at 01:52:59PM -0700, Junio C Hamano wrote:
> We [earlier] noticed that a few helper functions that format strings
> into a strbuf, prefixed with an arbitrary comment leading character,
> are forcing their callers to always pass the same argument.  Instead
> of keeping this excess flexibility nobody wants in practice, narrow
> the API so that all callers of these functions will get the same
> comment leading character.
> 
> Superficially it may appear that this goes against one of the recent
> trend, "war on global variables", but I doubt it matters much in the
> longer run.

I'm not quite sure I agree. The comment strings we have are in theory
broken, because they can be configured differently per repository. So if
you happen to have a Git command that operates on multiple repositories
at once and that needs to pay attention to that config then it would now
use the same comment character for both repositories, which I'd argue is
the wrong thing to do.

The recent patch series that makes "environment.c" aware of
`USE_THE_REPOSITORY_VARIABLE` already converted some of the global
config variables to be per-repository, because ultimately all of them
are broken in the described way. So from my point of view we should aim
to convert remaining ones to be per-repository, as well.

> These call sites all have already been relying on the global
> "comment_line_str" anyway, and passing the variable from the top of
> arbitrary deep call chain, which may not care specifically about
> this single variable comment_line_str, would not scale as a
> solution.  If we were to make it a convention to pass something from
> the very top of the call chain everywhere, it should not be an
> individual variable that is overly specific, like comment_line_str.
> Rather, it has to be something that allows access to a bag of all
> the global settings (possibly wider than "the_repository" struct).
> We can also limit our reliance to the globals by allowing a single
> global function call to obtaion such a bag of all global settings,
> i.e. "get_the_environment()", and allow everybody, including
> functions at the leaf level, to ask "what is the comment_line_str in
> the environment I am working in?".  As part of the libification, we
> can plug in an appropriate shim for get_the_environment().

This here I can definitely agree with. I think the best fit we have in
general is the "repo-settings.c" subsystem, which is also where I've
moved some of the previously-global settings into. It still has some
downsides in its current format:

  - It depends on a repository, but I'd argue it shouldn't such that we
    can also query configuration in repo-less settings.

  - `prepare_repo_settings()` makes up for a bad calling convention. I
    think that lazy accessors are way easier to use.

But it is a start, and something we can continue to build on.

Patrick
Junio C Hamano Sept. 13, 2024, 5:23 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> I'm not quite sure I agree. The comment strings we have are in theory
> broken, because they can be configured differently per repository. So if
> you happen to have a Git command that operates on multiple repositories
> at once and that needs to pay attention to that config then it would now
> use the same comment character for both repositories, which I'd argue is
> the wrong thing to do.

Correct.  It needs a move from global to a member in a repository
instance, but the same "hey do not keep passing the same parameter"
motivation behind these patches applies, as the existing call sites
most likely will instead pass "the_repository->comment_line_str" to
these two functions.  The simplification would move that reference
to "the_repository->comment_line_str" down to these two functions.

> The recent patch series that makes "environment.c" aware of
> `USE_THE_REPOSITORY_VARIABLE` already converted some of the global
> config variables to be per-repository, because ultimately all of them
> are broken in the described way. So from my point of view we should aim
> to convert remaining ones to be per-repository, as well.

Yes, I view the environement change somewhat incomplete and it was
annoying to see things other than the_repository itself and those
that implicitly refer to the_repository covered by the CPP macro.

But we need to step back a bit in order to make the environment
change better.  Not everything works inside a repository and you may
not even have a repository but want to refer to a comment character
(say, "git bugreport" run outside a repository, perhaps, and the
bugreport may want to honor end-user configuration for commentChar
to mark its various sections).  Earlier I said it may make sense to
reimplement the global as a member of a repository instance, but
that is not entirely true.  Such a member in a repository struct may
be a good implementation detail for anybody who asks "what comment
character should I be using in the context I am called?", and there
may be "const char *get_comment_line_str(struct repository *)" that
accepts which repository to work with, but such a function would
need to be prepared to work without any repository, working out of
the system and per-user configuration files.

>   - It depends on a repository, but I'd argue it shouldn't such that we
>     can also query configuration in repo-less settings.
>
>   - `prepare_repo_settings()` makes up for a bad calling convention. I
>     think that lazy accessors are way easier to use.
>
> But it is a start, and something we can continue to build on.

Yup.