[v3,7/7] git-sh-setup: don't mark trees not used in-tree for i18n

Message ID patch-v3-7.7-7a82b1fd005-20220326T171200Z-avarab@gmail.com (mailing list archive)
State New, archived
Series various: remove dead code, drop i18n not used in-tree

Commit Message

Ævar Arnfjörð Bjarmason March 26, 2022, 5:14 p.m. UTC
Partially revert d323c6b6410 (i18n: git-sh-setup.sh: mark strings for
translation, 2016-06-17).

These strings are no longer used in-tree, and we shouldn't be wasting
translator time on them for the benefit of a hypothetical out-of-tree
user of git-sh-setup.sh.

Since d03ebd411c6 (rebase: remove the rebase.useBuiltin setting,
2019-03-18) we've had no in-tree user of require_work_tree_exists(),
and since the more recent c1e10b2dce2 (git-sh-setup: remove messaging
supporting --preserve-merges, 2021-10-21) the only in-tree user of
require_clean_work_tree() is git-filter-branch.sh. Let's only
translate the message it uses, and revert the others to the pre-image
of d323c6b6410.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 git-sh-setup.sh | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

Comments

Johannes Sixt March 27, 2022, 10:47 a.m. UTC | #1
Am 26.03.22 um 18:14 schrieb Ævar Arnfjörð Bjarmason:
> Partially revert d323c6b6410 (i18n: git-sh-setup.sh: mark strings for
> translation, 2016-06-17).
> 
> These strings are no longer used in-tree, and we shouldn't be wasting
> translator time on them for the benefit of a hypothetical out-of-tree
> user of git-sh-setup.sh.

There is public documentation for these functions. Hence, you must
assume that they are used in scripts outside of Git. Castrating their
functionality like this patch does is unacceptable.

-- Hannes
Ævar Arnfjörð Bjarmason March 27, 2022, 11:15 a.m. UTC | #2
On Sun, Mar 27 2022, Johannes Sixt wrote:

> Am 26.03.22 um 18:14 schrieb Ævar Arnfjörð Bjarmason:
>> Partially revert d323c6b6410 (i18n: git-sh-setup.sh: mark strings for
>> translation, 2016-06-17).
>> 
>> These strings are no longer used in-tree, and we shouldn't be wasting
>> translator time on them for the benefit of a hypothetical out-of-tree
>> user of git-sh-setup.sh.
>
> There is public documentation for these functions. Hence, you must
> assume that they are used in scripts outside of Git. Castrating their
> functionality like this patch does is unacceptable.

For require_clean_work_tree() the public documentation for this function
states that it will emit a specific error message in English, and you're
expected to Lego-interpolate a string that we'll concatenate with it:

	[...]It emits an error message of the form `Cannot
        <action>: <reason>. <hint>`, and dies.  Example:
	+
	----------------
	require_clean_work_tree rebase "Please commit or stash them."

So I think that marking it for translation, as d323c6b6410 did, was
always broken in that it broke that documented promise.
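
A minimal sketch of that documented message shape, for illustration only.
`format_dirty_tree_error` is a hypothetical stand-in written for this note,
not the real require_clean_work_tree(): it does no work-tree checking and no
translation. It only shows that the "Cannot ...: ..." frame is fixed English
text into which the caller's <action> and <hint> are interpolated:

```shell
#!/bin/sh
# Hypothetical stand-in, NOT the real require_clean_work_tree():
# it only reproduces the documented message shape
# "Cannot <action>: <reason>. <hint>".
format_dirty_tree_error () {
	action=$1
	hint=$2
	# The English frame is fixed; only <action> comes from the caller.
	printf 'Cannot %s: You have unstaged changes.\n' "$action"
	# The hint is optional and printed verbatim on its own line.
	if test -n "$hint"
	then
		printf '%s\n' "$hint"
	fi
}

format_dirty_tree_error rebase "Please commit or stash them."
```

Because the caller supplies free-form English for both arguments, any
translation of the frame has to wrap text it cannot see.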

But that's just an argument for keeping the require_clean_work_tree()
part of this patch, not require_work_tree_exists().

For that one and others in git-sh-setup we've never said that we'd
provide these translated (and to the extent we've implied anything about
the rest, have strongly implied the opposite with
require_clean_work_tree()'s docs).

Nothing will break for out-of-tree users as a result of this
change. When we added these functions and their documentation their
output wouldn't be translated, then sometimes it was, now it's not
again.

We also need to be mindful of translator time, it's a *lot* of
strings to go through, and e.g. I've commented in the past on patches
that marked stuff in t/helper/ for translation.

Some hypothetical out-of-tree user is, I think, a much stronger
candidate for skipping translation than that.

Also keep in mind that we don't even translate in-tree contrib stuff
like contrib/subtree/ (the recent "not-really-contrib" scalar being an
exception).

So I really think this is fine as-is. Don't you think that if someone
out-of-tree had such strong expectations about the human-readable
strings these emit that they'd have long since stopped using them and
provided their own replacements?
Johannes Sixt March 28, 2022, 6:04 a.m. UTC | #3
Am 27.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Sun, Mar 27 2022, Johannes Sixt wrote:
> 
>> Am 26.03.22 um 18:14 schrieb Ævar Arnfjörð Bjarmason:
>>> Partially revert d323c6b6410 (i18n: git-sh-setup.sh: mark strings for
>>> translation, 2016-06-17).
>>>
>>> These strings are no longer used in-tree, and we shouldn't be wasting
>>> translator time on them for the benefit of a hypothetical out-of-tree
>>> user of git-sh-setup.sh.
>>
>> There is public documentation for these functions. Hence, you must
>> assume that they are used in scripts outside of Git. Castrating their
>> functionality like this patch does is unacceptable.
> 
> For require_clean_work_tree() the public documentation for this function
> states that it will emit a specific error message in English, and you're
> expected to Lego-interpolate a string that we'll concatenate with it:
> 
> 	[...]It emits an error message of the form `Cannot
>         <action>: <reason>. <hint>`, and dies.  Example:
> 	+
> 	----------------
> 	require_clean_work_tree rebase "Please commit or stash them."
> 
> So I think that marking it for translation like this as d323c6b6410 was
> always broken in that it broke that documented promise.

I can buy this argument. But then this must be a separate commit with
this justification.

> But that's just an argument for keeping the require_clean_work_tree()
> part of this patch, not require_work_tree_exists().
> 
> For that one and others in git-sh-setup we've never said that we'd
> provide these translated (and to the extent we've implied anything about
> the rest, have strongly implied the opposite with
> require_clean_work_tree()'s docs).
> 
> Nothing will break for out-of-tree users as a result of this
> change. When we added these functions and their documentation their
> output wouldn't be translated, then sometimes it was, now it's not
> again.

This does not sound convincing at all, but rather like "I want the code
to be so, and here is some handwaving to justify it". What is wrong with
the status quo?

> We also need to be mindful of translator time, it's a *lot* of
> strings to go through, and e.g. I've commented in the past on patches
> that marked stuff in t/helper/ for translation.

Translator's time is your concern? No translator sifts through 5000
strings on every release. There are tools that show only new strings to
translate. A text is translated once and then it lies under the radar
until someone changes it. Don't tell me that is time-consuming. On the
other hand, there is a lot of *reviewer* time that you are spending with
changes like this. *That* should be your concern.

-- Hannes
Ævar Arnfjörð Bjarmason March 28, 2022, 12:16 p.m. UTC | #4
On Mon, Mar 28 2022, Johannes Sixt wrote:

> Am 27.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>> 
>> On Sun, Mar 27 2022, Johannes Sixt wrote:
>> 
>>> Am 26.03.22 um 18:14 schrieb Ævar Arnfjörð Bjarmason:
>>>> Partially revert d323c6b6410 (i18n: git-sh-setup.sh: mark strings for
>>>> translation, 2016-06-17).
>>>>
>>>> These strings are no longer used in-tree, and we shouldn't be wasting
>>>> translator time on them for the benefit of a hypothetical out-of-tree
>>>> user of git-sh-setup.sh.
>>>
>>> There is public documentation for these functions. Hence, you must
>>> assume that they are used in scripts outside of Git. Castrating their
>>> functionality like this patch does is unacceptable.
>> 
>> For require_clean_work_tree() the public documentation for this function
>> states that it will emit a specific error message in English, and you're
>> expected to Lego-interpolate a string that we'll concatenate with it:
>> 
>> 	[...]It emits an error message of the form `Cannot
>>         <action>: <reason>. <hint>`, and dies.  Example:
>> 	+
>> 	----------------
>> 	require_clean_work_tree rebase "Please commit or stash them."
>> 
>> So I think that marking it for translation like this as d323c6b6410 was
>> always broken in that it broke that documented promise.
>
> I can buy this argument. But then this must be a separate commit with
> this justification.

Sure, I can elaborate on that point & split it up.

>> But that's just an argument for keeping the require_clean_work_tree()
>> part of this patch, not require_work_tree_exists().
>> 
>> For that one and others in git-sh-setup we've never said that we'd
>> provide these translated (and to the extent we've implied anything about
>> the rest, have strongly implied the opposite with
>> require_clean_work_tree()'s docs).
>> 
>> Nothing will break for out-of-tree users as a result of this
>> change. When we added these functions and their documentation their
>> output wouldn't be translated, then sometimes it was, now it's not
>> again.
>
> This does not sound convincing at all, but rather like "I want the code
> to be so, and here is some handwaving to justify it". What is wrong with
> the status quo?

The larger context for why I was looking at this again is that I'm
trying to slowly get us to the point where we can remove the
i18n-in-shellscript entirely.

But I thought that was a rather large digression for the commit message,
and that these being both unused, and not something the "public" API
affected ever promised it would do was sufficient.

>> We also need to be mindful of translator time, it's a *lot* of
>> strings to go through, and e.g. I've commented in the past on patches
>> that marked stuff in t/helper/ for translation.
>
> Translator's time is your concern? No translator sifts through 5000
> strings on every release. There are tools that show only new strings to
> translate.

Yes, I'm the person who added this entire i18n infrastructure to git, I
know how it works :)

> A text is translated once and then it lies under the radar
> until someone changes it. Don't tell me that is time-consuming.

Yes, individual orphaned strings aren't, but they add up.

Just like having that "USE_PIC" comment in configure.ac isn't much of a
big deal, but it makes sense to clean up unused code, just as we're
adding new code.

I will say that your implicit proposal of keeping this forever instead
is assuming that we won't have more translations for git, and every new
translator will look at this.

Context is critical for translators, so even if it's one string it's a
string you'll quickly grep for and find ... no uses for, and then likely
go hunting around for where it's used only to (hopefully, in that case)
find this thread. Better not to have it.

> On the other hand, there is a lot of *reviewer* time that you are
> spending with changes like this. *That* should be your concern.

I'd think most of that time, if any, will be spent on this
sub-thread you started, so ... :)

Which isn't to say it shouldn't have been brought up, but from my
perspective I was (and still am) making a rather small change that I
think won't harm anyone in practice, and gives us some incremental
tidiness & contributes to an eventual large "git rm git-sh-i18n.sh" et
al.

But on reflection I don't think it's worth worrying about, and we can
just do this change.
Johannes Sixt March 28, 2022, 8:06 p.m. UTC | #5
Am 28.03.22 um 14:16 schrieb Ævar Arnfjörð Bjarmason:
> On Mon, Mar 28 2022, Johannes Sixt wrote:
>> What is wrong with
>> the status quo?
> 
> The larger context for why I was looking at this again is that I'm
> trying to slowly get us to the point where we can remove the
> i18n-in-shellscript entirely.

Why? Again: what is wrong with the status quo?

> Just like having that "USE_PIC" comment in configure.ac isn't much of a
> big deal, but it makes sense to clean up unused code, just as we're
> adding new code.

There is a difference between "clean up unused code" and "change
observable behavior".

-- Hannes
Phillip Wood March 31, 2022, 10:23 a.m. UTC | #6
Hi Ævar

On 28/03/2022 13:16, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Mar 28 2022, Johannes Sixt wrote:
> 
>> Am 27.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Sun, Mar 27 2022, Johannes Sixt wrote:
>>>
>>>> Am 26.03.22 um 18:14 schrieb Ævar Arnfjörð Bjarmason:
>>>>> Partially revert d323c6b6410 (i18n: git-sh-setup.sh: mark strings for
>>>>> translation, 2016-06-17).
>>>>>
>>>>> These strings are no longer used in-tree, and we shouldn't be wasting
>>>>> translator time on them for the benefit of a hypothetical out-of-tree
>>>>> user of git-sh-setup.sh.

The out of tree users of git-sh-setup.sh are not hypothetical, they 
exist and objected when you recently tried to remove these functions 
entirely[1].

>>>> There is public documentation for these functions. Hence, you must
>>>> assume that they are used in scripts outside of Git. Castrating their
>>>> functionality like this patch does is unacceptable.
>>>
>>> For require_clean_work_tree() the public documentation for this function
>>> states that it will emit a specific error message in English, and you're
>>> expected to Lego-interpolate a string that we'll concatenate with it:

The documentation does not say whether the message is translated or not, 
probably because it was not updated when the translations were added six 
years ago.

>>> 	[...]It emits an error message of the form `Cannot
>>>          <action>: <reason>. <hint>`, and dies.  Example:

This is not promising a "specific error message in English".

>>> 	+
>>> 	----------------
>>> 	require_clean_work_tree rebase "Please commit or stash them."

This is an example message; you cannot use it to argue that we will
always show a message in English.

>>> So I think that marking it for translation like this as d323c6b6410 was
>>> always broken in that it broke that documented promise.
>>
>> I can buy this argument. But then this must be a separate commit with
>> this justification.
> 
> Sure, I can elaborate on that point & split it up.
> 
>>> But that's just an argument for keeping the require_clean_work_tree()
>>> part of this patch, not require_work_tree_exists().
>>>
>>> For that one and others in git-sh-setup we've never said that we'd
>>> provide these translated (and to the extent we've implied anything about
>>> the rest, have strongly implied the opposite with
>>> require_clean_work_tree()'s docs).
>>>
>>> Nothing will break for out-of-tree users as a result of this
>>> change.

The strings the user sees will change

>>> When we added these functions and their documentation their
>>> output wouldn't be translated,

Where does the documentation say "the output will not be translated"?

>>> then sometimes it was, now it's not
>>> again.
>>
>> This does not sound convincing at all, but rather like "I want the code
>> to be so, and here is some handwaving to justify it". What is wrong with
>> the status quo?
> 
> The larger context for why I was looking at this again is that I'm
> trying to slowly get us to the point where we can remove the
> i18n-in-shellscript entirely.
> 
> But I thought that was a rather large digression for the commit message,
> and that these being both unused, and not something the "public" API
> affected ever promised it would do was sufficient.

I think if that is what you want to do then you should propose a series 
that does just that and explains why it is desirable, rather than coming 
up with other reasons to justify the changes you want.

>>> We also need to be mindful of translator time, it's a *lot* of
>>> strings to go through, and e.g. I've commented in the past on patches
>>> that marked stuff in t/helper/ for translation.
>>
>> Translator's time is your concern? No translator sifts through 5000
>> strings on every release. There are tools that show only new strings to
>> translate.
> 
> Yes, I'm the person who added this entire i18n infrastructure to git, I
> know how it works :)
> 
>> A text is translated once and then it lies under the radar
>> until someone changes it. Don't tell me that is time-consuming.
> 
> Yes, individual orphaned strings aren't, but they add up.
> 
> Just like having that "USE_PIC" comment in configure.ac isn't much of a
> big deal, but it makes sense to clean up unused code, just as we're
> adding new code.
> 
> I will say that your implicit proposal of keeping this forever instead
> is assuming that we won't have more translations for git, and every new
> translator will look at this.
> 
> Context is critical for translators, so even if it's one string it's a
> string you'll quickly grep for and find ... no uses for, and then likely
> go hunting around for where it's used only to (hopefully, in that case)
> find this thread. Better not to have it.
> 
>> On the other hand, there is a lot of *reviewer* time that you are
>> spending with changes like this. *That* should be your concern.
> 
> I'd think most of that time, if any, will be spent on this
> sub-thread you started, so ... :)

This sub-thread exists because you posted this patch to the mailing list.
Blaming reviewers for asking perfectly reasonable questions is neither 
fair nor helpful.

This patch does not remove dead code as the rest of the series does but 
instead changes user facing messages in code that we recently 
established is part of the public api[2]. Nothing has changed since that 
recent discussion so I'm confused as to why you are proposing to modify 
the api again so soon.

Best Wishes

Phillip

[1] https://lore.kernel.org/git/CAJm9OHfN9iXDt-rzu-wb=67y4PPpmCUgMfmZPy1JMBJkHG256g@mail.gmail.com/
[2] https://lore.kernel.org/git/xmqq5yvik8bc.fsf@gitster.g/

> Which isn't to say it shouldn't have been brought up, but from my
> perspective I was (and still am) making a rather small change that I
> think won't harm anyone in practice, and gives us some incremental
> tidiness & contributes to an eventual large "git rm git-sh-i18n.sh" et
> al.
> 
> But on reflection I don't think it's worth worrying about, and we can
> just do this change.
>
Ævar Arnfjörð Bjarmason March 31, 2022, 11:15 a.m. UTC | #7
On Thu, Mar 31 2022, Phillip Wood wrote:

[tl;dr: Reply below, but this whole thing should be addressed by the v4
I sent last night:
https://lore.kernel.org/git/cover-v4-0.6-00000000000-20220331T014349Z-avarab@gmail.com/

I.e. the controversial patch has been ejected].

> On 28/03/2022 13:16, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Mar 28 2022, Johannes Sixt wrote:
>> 
>>> Am 27.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>>>>
>>>> On Sun, Mar 27 2022, Johannes Sixt wrote:
>>>>
>>>>> Am 26.03.22 um 18:14 schrieb Ævar Arnfjörð Bjarmason:
>>>>>> Partially revert d323c6b6410 (i18n: git-sh-setup.sh: mark strings for
>>>>>> translation, 2016-06-17).
>>>>>>
>>>>>> These strings are no longer used in-tree, and we shouldn't be wasting
>>>>>> translator time on them for the benefit of a hypothetical out-of-tree
>>>>>> user of git-sh-setup.sh.
>
> The out of tree users of git-sh-setup.sh are not hypothetical, they
> exist and objected when you recently tried to remove these functions 
> entirely[1].

I see that what I wrote there is ambiguous, but I'm aware of that &
remember that thread. I meant to say the hypothetical user that cares
about the i18n these functions exposed.

>>>>> There is public documentation for these functions. Hence, you must
>>>>> assume that they are used in scripts outside of Git. Castrating their
>>>>> functionality like this patch does is unacceptable.
>>>>
>>>> For require_clean_work_tree() the public documentation for this function
>>>> states that it will emit a specific error message in English, and you're
>>>> expected to Lego-interpolate a string that we'll concatenate with it:
>
> The documentation does not say whether the message is translated or
> not, probably because it was not updated when the translations were
> added six years ago.

It does say it. It uses the word "Cannot" at the beginning, and promises
to emit that specific string.

Yes we didn't update it at the time for i18n, and probably should.

But to the extent that the gordian knot in making any changes to these
whatsoever is because they've been publicly documented I don't think
anyone using these has been promised different behavior. So it's highly
relevant here.

>>>> 	[...]It emits an error message of the form `Cannot
>>>>          <action>: <reason>. <hint>`, and dies.  Example:
>
> This is not promising a "specific error message in English".

It really is. You cannot use this API to produce sensible output in any
other language. It was used like this:

    require_clean_work_tree "pull with rebase" "Please commit or stash them."

For which we'd emit:

    Cannot pull with rebase: You have unstaged changes.

    Please commit or stash them.

You can see e.g. in the Bulgarian translation that this was dealt with
by putting the interpolated string in double-quotes.

>>>> 	+
>>>> 	----------------
>>>> 	require_clean_work_tree rebase "Please commit or stash them."
>
> This is an example message; you cannot use it to argue that we will
> always show a message in English.

I'm saying that the documentation says it emits English, that it didn't
always do that, and now does so again.

And that to get it to emit anything sensible in cases where we're not
under LC_ALL=C would have required 1:1 matching the behavior of whatever
shellscript is using this to what git-sh-i18n does in picking the locale.

I don't think it's plausible that there's an out-of-tree user
maintaining their own set of i18n'd po/ files which expect to interact
with our translations in this way.

Any out-of-tree user of this (if they're using this at all) will either
not care, or they'll see more sensible output again.
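
To make the mixed-language problem concrete, here is a hedged sketch. Both
function names are hypothetical, the one-entry "catalog" stands in for a po/
file, and the German text is an illustrative translation written for this
example rather than a string copied from git's po/de.po. It assumes only the
frame is looked up per locale, while <action> and <hint> arrive as the
caller's English text:

```shell
#!/bin/sh
# Hypothetical sketch: a one-entry "catalog" standing in for a po/ file.
# The German text is an illustrative translation written for this example,
# not a string copied from git's po/de.po.
frame_for_locale () {
	case "$1" in
	de*)	printf '%s' 'Kann "%s" nicht ausführen: Es gibt nicht zum Commit vorgemerkte Änderungen.' ;;
	*)	printf '%s' 'Cannot %s: You have unstaged changes.' ;;
	esac
}

# Only the frame follows the locale; <action> and <hint> are whatever
# (English) text the calling script passed in.
emit_dirty_tree_error () {
	locale=$1 action=$2 hint=$3
	# The frame is deliberately used as the printf format string.
	printf "$(frame_for_locale "$locale")\n" "$action"
	printf '%s\n' "$hint"
}

emit_dirty_tree_error de_DE.UTF-8 "pull with rebase" "Please commit or stash them."
```

Under the de locale the frame comes out in German while "pull with rebase"
and the hint stay in English, unless the calling script ships its own
matching translations.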

>>>> So I think that marking it for translation like this as d323c6b6410 was
>>>> always broken in that it broke that documented promise.
>>>
>>> I can buy this argument. But then this must be a separate commit with
>>> this justification.
>> Sure, I can elaborate on that point & split it up.
>> 
>>>> But that's just an argument for keeping the require_clean_work_tree()
>>>> part of this patch, not require_work_tree_exists().
>>>>
>>>> For that one and others in git-sh-setup we've never said that we'd
>>>> provide these translated (and to the extent we've implied anything about
>>>> the rest, have strongly implied the opposite with
>>>> require_clean_work_tree()'s docs).
>>>>
>>>> Nothing will break for out-of-tree users as a result of this
>>>> change.
>
> The strings the user sees will change

Yes, and I'll admit that "nothing will break here" on my part isn't the
same as saying "there will be no observable change whatsoever". Sorry
about being unclear there.

As a general matter we don't promise that such strings won't change,
even for die(), error() etc. messages emitted by plumbing commands.

Except in some rare cases where they've been known to be used out of
tree extensively, e.g. the human-readable "merge" messages where we
have/had no other API to expose the same information.

Or, in the case of plumbing output where such strings are part of the
API contract.

But for these commands in the "Internal helper commands" category I
think this falls squarely in the category of changing a random error(),
die() etc. in the C code (which we do quite freely).

>>>> When we added these functions and their documentation their
>>>> output wouldn't be translated,
>
> Where does the documentation say "the output will not be translated"?

I think this was covered above, it's sufficient that it didn't promise
that it would be, and in the one case where we discuss it in passing
with an example we imply that it won't be.

>>>> then sometimes it was, now it's not
>>>> again.
>>>
>>> This does not sound convincing at all, but rather like "I want the code
>>> to be so, and here is some handwaving to justify it". What is wrong with
>>> the status quo?
>> The larger context for why I was looking at this again is that I'm
>> trying to slowly get us to the point where we can remove the
>> i18n-in-shellscript entirely.
>> But I thought that was a rather large digression for the commit
>> message,
>> and that these being both unused, and not something the "public" API
>> affected ever promised it would do was sufficient.
>
> I think if that is what you want to do then you should propose a
> series that does just that and explains why it is desirable, rather
> than coming up with other reasons to justify the changes you want.

Just because I start looking at some code for reason X that doesn't mean
that submitting a patch with rationale Y isn't a sufficient reason to
make that change.

I still think that in this case the fact that they're not used by our own
i18n effort is a perfectly sufficient reason to make the change, as we won't
waste translator time on it. I.e. I'll still stand behind the stated
rationale.

But aside from that most changes I make to git are with an eye to some
larger semi-related goal.

I do have some WIP changes to tear down most of the *.sh and *.perl i18n
infrastructure (the parts still in use would still have translations),
and IIRC it's at least a 2k line negative diffstat, and enables us to do
more interesting things in i18n (e.g. getting rid of the libintl
dependency).

But I also think that such a series is probably not possible in
the near term if we're going to insist that all shellscript output must
byte-for-byte be the same (for boring reasons I won't go into, but it's
mainly to do with sh-i18n--envsubst.c).

So it's also a bit of a chicken & egg problem. I wanted to send any such
UI changes in first, to see if it was even worth finishing up that work,
or if the whole thing would stall on not being able to change some
output someone somewhere might have relied on being byte-for-byte the
same.

>>>> We also need to be mindful of translator time, it's a *lot* of
>>>> strings to go through, and e.g. I've commented in the past on patches
>>>> that marked stuff in t/helper/ for translation.
>>>
>>> Translator's time is your concern? No translator sifts through 5000
>>> strings on every release. There are tools that show only new strings to
>>> translate.
>> Yes, I'm the person who added this entire i18n infrastructure to
>> git, I
>> know how it works :)
>> 
>>> A text is translated once and then it lies under the radar
>>> until someone changes it. Don't tell me that is time-consuming.
>> Yes, individual orphaned strings aren't, but they add up.
>> Just like having that "USE_PIC" comment in configure.ac isn't much
>> of a
>> big deal, but it makes sense to clean up unused code, just as we're
>> adding new code.
>> I will say that your implicit proposal of keeping this forever
>> instead
>> is assuming that we won't have more translations for git, and every new
>> translator will look at this.
>> Context is critical for translators, so even if it's one string it's
>> a
>> string you'll quickly grep for and find ... no uses for, and then likely
>> go hunting around for where it's used only to (hopefully, in that case)
>> find this thread. Better not to have it.
>> 
>>> On the other hand, there is a lot of *reviewer* time that you are
>>> spending with changes like this. *That* should be your concern.
>> I'd think most of that time, if any, will be spent on this
>> sub-thread you started, so ... :)
>
> This sub-thread exists because you posted this patch to the mailing
> list. Blaming reviewers for asking perfectly reasonable questions is
> neither fair nor helpful.

I didn't mean any offense there, but did mean to suggest (smiley and all)
that a mountain was being made out of a molehill in this case.

Yes translator time is my concern. I started the i18n effort in git, and
I think it's really important. We currently have 18 translations of git
in the po/ directory, 16 if you leave out "dialects". Which if you
compare it with
https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
is quite bad.

For comparison I worked extensively on MediaWiki in a past life, which
at the time had at least 100 such translations. I looked again and it's
up to around 600 (many incomplete, to be fair).

Is that our fault as a project? No, but we could definitely help it along.

I value the scarcity of translator time (including future translations)
much more than concerns that there *may be* someone somewhere who's got
a reliance on this particular output.

> This patch does not remove dead code as the rest of the series does
> but instead changes user facing messages in code that we recently 
> established is part of the public api[2]. Nothing has changed since
> that recent discussion so I'm confused as to why you are proposing to
> modify the api again so soon.

As noted above I don't think that previous discussion applies to these
changes as you describe, but in any case, ~8 hours before you sent this
reply I sent a v4 re-roll which left out this change:

    https://lore.kernel.org/git/cover-v4-0.6-00000000000-20220331T014349Z-avarab@gmail.com/

Which I hope will address your & Johannes Sixt's concerns here. Does the
rest of this series look good to you?
Johannes Sixt March 31, 2022, 8:27 p.m. UTC | #8
Am 31.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
> I do have some WIP changes to tear down most of the *.sh and *.perl i18n
> infrastructure (the parts still in use would still have translations),
> and IIRC it's at least a 2k line negative diffstat, and enables us to do
> more interesting things in i18n (e.g. getting rid of the libintl
> dependency).

Why? Why? Why? Does the status quo have a problem somewhere? All this
sounds like a change for the sake of change.

> But I also think that such a series is probably not possible in
> the near term if we're going to insist that all shellscript output must
> byte-for-byte be the same (for boring reasons I won't go into, but it's
> mainly to do with sh-i18n--envsubst.c).

Such an insistence can easily be lifted if the change is justified
sufficiently. I haven't seen such a justification, yet.

-- Hannes
Ævar Arnfjörð Bjarmason April 2, 2022, 10:44 a.m. UTC | #9
On Thu, Mar 31 2022, Johannes Sixt wrote:

> Am 31.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>> I do have some WIP changes to tear down most of the *.sh and *.perl i18n
>> infrastructure (the parts still in use would still have translations),
>> and IIRC it's at least a 2k line negative diffstat, and enables us to do
>> more interesting things in i18n (e.g. getting rid of the libintl
>> dependency).
>
> Why? Why? Why? Does the status quo have a problem somewhere? All this
> sounds like a change for the sake of change.

So this is quite the digression, but, hey, you asked for it.

We don't have translations universally available because libintl is a
rather heavy thing to ship.

I don't personally mind linking against it for my own builds, but grep
for NO_GETTEXT in our tree & history for some of the workarounds.

We're also heading towards being able to build a stand-alone git binary
for most things, which makes shipping in various setups much easier, but
libintl is more of an "old-school" *nix library.

You need to ferry around auxiliary *.mo files, and for the *.sh and
*.perl translations we need gettext.sh, /usr/bin/gettext and
Locale::Messages (and everything that brings in).

I'd like translations for Git to Just Work, including if you're in some
random docker image with someone's home-built git. Giving people fewer
reasons to disable it improves accessibility. A lot of people who use git
are not on their own personal laptop, but on some setup (corporate, CI
etc.) that they don't fully control.

The gettext model & libintl is also just bad at various use-cases I
think would make sense to support.

E.g. having a configurable option to emit output in two languages at the
same time, either because you'd both like to understand the output &
e.g. search errors online, or you'd understand more from a union of say
German and English than from just one or the other.

With libintl you'd need to juggle setlocale() in the middle of your
underlying sprintf implementation to do that, or pull other shenanigans
of bypassing its API (e.g. directly reading the *.mo files), which
pretty much amounts to the same thing.

So essentially I wanted to hack up something that would just
post-process output like this:

    msgunfmt --strict -s -w 0 -i -E --color=always po/build/locale/de/LC_MESSAGES/git.mo

And turn it into a lang-de.c file, for which we'd make a lang-de.o that
we'd link in. And then either binary search through it, or just generate
code we'd compile (one really big switch/case statement).
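
A minimal sketch of the first half of that idea, assuming the msgunfmt
output has already been reduced to single-line msgid/msgstr pairs. The
function name po_to_c_table and the { "msgid", "msgstr" } row layout are
hypothetical, not anything that exists in git.git, and plural forms and
multi-line strings are deliberately ignored:

```shell
#!/bin/sh
# Hypothetical helper, not part of git: read PO-style text on stdin and
# print one C initializer row per single-line msgid/msgstr pair.
po_to_c_table () {
	awk '
	# Remember the quoted msgid, including its surrounding double quotes.
	/^msgid "/ { id = substr($0, 7) }
	# Emit a row for each msgstr, skipping the header entry (empty msgid).
	/^msgstr "/ {
		if (id != "\"\"")
			printf "\t{ %s, %s },\n", id, substr($0, 8)
	}
	'
}

# Illustrative input only; the translation is a placeholder.
printf '%s\n' 'msgid ""' 'msgstr ""' '' 'msgid "usage"' 'msgstr "Verwendung"' |
po_to_c_table
```

Rows like these could then be wrapped in an array definition in a
generated lang-de.c; whether the lookup is a binary search or a generated
switch is a separate choice that this sketch does not make.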

Now, if you count the number of messages we translate in *.sh land on
your digits you won't even need to use all of our toes, and for the
*.perl it's similar, especially with add--interactive.perl going away
any day now.

There isn't any fundamental obstacle to making such a thing portable to
*.sh and *.perl, but having gotten that particular interop working once
in the past, needing to do that again would bring this (I think
worthwhile) project from a "maybe someday" to "nah".

>> But I also think that such a series is probably not possible in
>> the near term if we're going to insist that all shellscript output must
>> byte-for-byte be the same (for boring reasons I won't go into, but it's
>> mainly to do with sh-i18n--envsubst.c).
>
> Such an insistence can easily be lifted if the change is justified
> sufficiently. I haven't seen such a justification, yet.

Sure, but re the "chicken & egg" problem I might do all the work to do
all that, and someone such as yourself might rightly point out that it
would break someone's obscure use-case, scuttling the whole thing.

Which isn't an exaggeration b.t.w., if you e.g. look through our
remaining gettext.sh usage you'll find that we carry the entirety of
sh-i18n--envsubst.c and everything around it (at least ~1k lines) for
emitting a single word in a single message in git-sh-setup.sh, that's
it.

Because the whole reason eval_gettext exists, and everything to support
it, is to support the use-case of feeding *arbitrary input* into the
translation engine, i.e. not the string you yourself have in your source
code & trust (it avoids shell "eval").
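
To make the "avoids shell eval" point concrete, here is a sketch that
assumes a single placeholder named $action, as in git-sh-setup.sh. The
helper subst_action is hypothetical and written only for illustration; it
is not how gettext.sh or sh-i18n--envsubst.c are implemented. It only
shows that a template can be filled in by plain text replacement, so
nothing in the template or in the value is ever executed:

```shell
#!/bin/sh
# Hypothetical helper, not part of git: replace each literal "$action" in a
# message template with a value, without handing any text to "eval".
subst_action () {
	template=$1
	value=$2
	result=
	while :
	do
		case "$template" in
		*'$action'*)
			# Keep the text before the first placeholder, append the value.
			result=$result${template%%'$action'*}$value
			# Continue with the text after that placeholder.
			template=${template#*'$action'}
			;;
		*)
			break
			;;
		esac
	done
	printf '%s\n' "$result$template"
}

# The command substitution in the value stays literal text.
subst_action 'Cannot $action: You have unstaged changes.' '$(echo oops)'
```

Had the same template been expanded with eval, the text in the value (or
in a translated template) could have run as shell code.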

But if you think that's of paramount importance (that word is "usage"
b.t.w., and the equivalent in usage.c isn't even translated) there
wouldn't be any way to make forward progress towards the next step of
making the remaining shellscript translations call some "git sh--i18n"
helper to get their output.

So, to the extent that I was going to pursue the above plan at all I
wanted to do it in small steps, especially now as git-submodule.sh et al
are going away.

So.

It would be nice to get some leeway in some areas, especially for
something like this where I implemented this entire i18n system to begin
with, so I'd think it would be clear that it's not some drive-by
contribution. I clearly care about the end-goal, and have been sticking
with this particular topic for more than a decade.

Not everything can always be a single atomically understood patch that
carries all possible reasons to make the change with it; some things are
more of a longer term incremental effort.

And since we all have limited time on this spinning ball of mud
sometimes it can make sense to trickle in initial changes to see if some
larger end-goal is even attainable, or will be blocked due to some
unforeseen (or underestimated) reasons.

Thanks.
Johannes Sixt April 2, 2022, 2:16 p.m. UTC | #10
Am 02.04.22 um 12:44 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Thu, Mar 31 2022, Johannes Sixt wrote:
> 
>> Am 31.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>>> I do have some WIP changes to tear down most of the *.sh and *.perl i18n
>>> infrastructure (the parts still in use would still have translations),
>>> and IIRC it's at least a 2k line negative diffstat, and enables us to do
>>> more interesting things in i18n (e.g. getting rid of the libintl
>>> dependency).
>>
>> Why? Why? Why? Does the status quo have a problem somewhere? All this
>> sounds like a change for the sake of change.
> 
> So this is quite the digression, but, hey, you asked for it.

Oh, no, this is not a digression *at all*. What you write below is the
kind of text that is needed to judge the value of a change. Without it,
a change that does not have an otherwise obvious improvement[*] is just
for the change's sake.

[*] In my book, getting rid of a libintl dependency is not an obvious
improvement. I may be biased in this case, because that dependency was
never a problem for me. Might be because my personal builds all have
NO_GETTEXT set.

> We don't have translations universally available because libintl is a
> rather heavy thing to ship.
> 
> I don't personally mind linking against it for my own builds, but grep
> for NO_GETTEXT in our tree & history for some of the workarounds.
> 
> We're also heading towards being able to build a stand-alone git binary
> for most things, which makes shipping in various setups much easier, but
> libintl is more of an "old-school" *nix library.
> 
> You need to ferry around auxiliary *.mo files, and for the *.sh and
> *.perl translations we need gettext.sh, /usr/bin/gettext and
> Locale::Messages (and everything that brings in).
> 
> I'd like translations for Git to Just Work, including if you're in some
> random docker image with someone's home-built git. Giving people fewer
> reasons not to enable it improves accessibility. A lot of people who use git
> are not on their own personal laptop, but on some setup (corporate, CI
> etc.) that they don't fully control.
> 
> The gettext model & libintl is also just bad at various use-cases I
> think would make sense to support.
> 
> E.g. having a configurable option to emit output in two languages at the
> same time, either because you'd both like to understand the output &
> e.g. search errors online, or you'd understand more from a union of say
> German and English than from just one or the other.
> 
> With libintl you'd need to juggle setlocale() in the middle of your
> underlying sprintf implementation to do that, or pull other shenanigans
> of bypassing its API (e.g. directly reading the *.mo files), which
> pretty much amounts to the same thing.
> 
> So essentially I wanted to hack up something that would just
> post-process output like this:
> 
>     msgunfmt --strict -s -w 0 -i -E --color=always po/build/locale/de/LC_MESSAGES/git.mo
> 
> And turn it into a lang-de.c file, for which we'd make a lang-de.o that
> we'd link in. And then either binary search through it, or just generate
> code we'd compile (one really big switch/case statement).
> 
> Now, if you count the number of messages we translate in *.sh land on
> your digits you won't even need to use all of our toes, and for the
> *.perl it's similar, especially with add--interactive.perl going away
> any day now.
> 
> There isn't any fundamental obstacle to making such a thing portable to
> *.sh and *.perl, but having gotten that particular interop working once
> in the past, needing to do that again would bring this (I think
> worthwhile) project from a "maybe someday" to "nah".

Just to make it clear: I am totally neutral on your goal. It's on others
to tell whether this is worth doing.

>>> But I also think that such a series is probably not possible in
>>> the near term if we're going to insist that all shellscript output must
>>> byte-for-byte be the same (for boring reasons I won't go into, but it's
>>> mainly to do with sh-i18n--envsubst.c).
>>
>> Such an insistence can easily be lifted if the change is justified
>> sufficiently. I haven't seen such a justification, yet.
> 
> Sure, but re the "chicken & egg" problem I might do all the work to do
> all that, and someone such as yourself might rightly point out that it
> would break someone's obscure use-case, scuttling the whole thing.
> 
> Which isn't an exaggeration b.t.w., if you e.g. look through our
> remaining gettext.sh usage you'll find that we carry the entirety of
> sh-i18n--envsubst.c and everything around it (at least ~1k lines) for
> emitting a single word in a single message in git-sh-setup.sh, that's
> it.

See, someone thought it was a good idea to have i18n in shell scripts
and others agreed that it was worth having ~1k lines of code to do that.
So the code went in. From then on, these ~1k lines are *not a problem*
in themselves. From then on, the decision of having ~1k lines or not
having them can only be based on what effect they have, but no longer on
"oh, wow, that's 1k lines to write a single word; do we really want that"?

> 
> Because the whole reason eval_gettext exists, and everything to support
> it, is to support the use-case of feeding *arbitrary input* into the
> translation engine, i.e. not the string you yourself have in your source
> code & trust (it avoids shell "eval").
> 
> But if you think that's of paramount importance (that word is "usage"
> b.t.w., and the equivalent in usage.c isn't even translated) there
> wouldn't be any way to make forward progress towards the next step of
> making the remaining shellscript translations call some "git sh--i18n"
> helper to get their output.
> 
> So, to the extent that I was going to pursue the above plan at all I
> wanted to do it in small steps, especially now as git-submodule.sh et al
> are going away.
> 
> So.
> 
> It would be nice to get some leeway in some areas, especially for
> something like this where I implemented this entire i18n system to begin
> with, so I'd think it would be clear that it's not some drive-by
> contribution. I clearly care about the end-goal, and have been sticking
> with this particular topic for more than a decade.
> 
> Not everything can always be a single atomically understood patch that
> carries all possible reasons to make the change with it; some things are
> more of a longer term incremental effort.
> 
> And since we all have limited time on this spinning ball of mud
> sometimes it can make sense to trickle in initial changes to see if some
> larger end-goal is even attainable, or will be blocked due to some
> unforeseen (or underestimated) reasons.

You can't have leeway for a change whose conclusion is "removes some
minuscule feature". But if you add "Here is the secret plan to Scrat's
golden nut; let's start with this change, even though it removes some
minuscule feature", things are vastly different.

-- Hannes
Ævar Arnfjörð Bjarmason April 3, 2022, 3:22 p.m. UTC | #11
On Sat, Apr 02 2022, Johannes Sixt wrote:

> Am 02.04.22 um 12:44 schrieb Ævar Arnfjörð Bjarmason:
>> 
>> On Thu, Mar 31 2022, Johannes Sixt wrote:
>> 
>>> Am 31.03.22 um 13:15 schrieb Ævar Arnfjörð Bjarmason:
>>>> I do have some WIP changes to tear down most of the *.sh and *.perl i18n
>>>> infrastructure (the parts still in use would still have translations),
>>>> and IIRC it's at least a 2k line negative diffstat, and enables us to do
>>>> more interesting things in i18n (e.g. getting rid of the libintl
>>>> dependency).
>>>
>>> Why? Why? Why? Does the status quo have a problem somewhere? All this
>>> sounds like a change for the sake of change.
>> 
>> So this is quite the digression, but, hey, you asked for it.
>
> Oh, no, this is not a digression *at all*. What you write below is the
> kind of text that is needed to judge the value of a change. Without it,
> a change that does not have an otherwise obvious improvement[*] is just
> for the change's sake.

Well let's be clear here.

It's been your claim that the proposed change must not be worth doing
because you don't place the same value on having a 1:1 mapping between
strings we ask translators to work on, and those that we'll actually
present as part of git's UI.

Which is fair enough, and something we can respectfully disagree on.

But that's not the same as claiming that the stated reason for the
upthread patch is incomplete or insufficient.

I can tell you, as the person who implemented this whole i18n
facility, that providing translations for someone's random shellscript
was never the point, at all.

It just so happens that because we implemented some bits of
functionality of the porcelain as shellscripts, and at the same time had
a shellscript library which (regrettably or not) seems to invite both
in-tree and out-of-tree users to use it, that the two went hand-in-hand.

But now that they don't anymore I don't see anything "handwaving" about
simply removing the translation markings. I don't think they serve any
purpose anymore.

> [*] In my book, getting rid of a libintl dependency is not an obvious
> improvement. I may be biased in this case, because that dependency was
> never a problem for me. Might be because my personal builds all have
> NO_GETTEXT set.

So not only do you not use a translated version of git, but you don't
even compile git with gettext support?

Yes, I can imagine that hasn't exposed you to any of the problems with
it :)

>>>> But I also think that such a series is probably not possible in
>>>> the near term if we're going to insist that all shellscript output must
>>>> byte-for-byte be the same (for boring reasons I won't go into, but it's
>>>> mainly to do with sh-i18n--envsubst.c).
>>>
>>> Such an insistence can easily be lifted if the change is justified
>>> sufficiently. I haven't seen such a justification, yet.
>> 
>> Sure, but re the "chicken & egg" problem I might do all the work to do
>> all that, and someone such as yourself might rightly point out that it
>> would break someone's obscure use-case, scuttling the whole thing.
>> 
>> Which isn't an exaggeration b.t.w., if you e.g. look through our
>> remaining gettext.sh usage you'll find that we carry the entirety of
>> sh-i18n--envsubst.c and everything around it (at least ~1k lines) for
>> emitting a single word in a single message in git-sh-setup.sh, that's
>> it.
>
> See, someone thought it was a good idea to have i18n in shell scripts
> and others agreed that it was worth having ~1k lines of code to do that.
> So the code went in. From then on, these ~1k lines are *not a problem*
> in themselves. From then on, the decision of having ~1k lines or not
> having them can only be based on what effect they have, but no longer on
> "oh, wow, that's 1k lines to write a single word; do we really want that"?

Aside from i18n, I don't agree with that in general.

Yes, code that's in-tree and working needs to be under less scrutiny than
a new addition, and refactoring something isn't always worth it. We'll
also need to review the removals.

But there's also a cost to keeping things around, as you can e.g. see
from various portability and correctness fixes to this code we've
perma-forked from the GNU GPLv2 version.

There's some tipping point where a refactoring isn't worth it, but
emitting the word "usage" with ~1k lines is a pretty clear candidate in
my mind for a "git rm".

>> Because the whole reason eval_gettext exists, and everything to support
>> it, is to support the use-case of feeding *arbitrary input* into the
>> translation engine, i.e. not the string you yourself have in your source
>> code & trust (it avoids shell "eval").
>> 
>> But if you think that's of paramount importance (that word is "usage"
>> b.t.w., and the equivalent in usage.c isn't even translated) there
>> wouldn't be any way to make forward progress towards the next step of
>> making the remaining shellscript translations call some "git sh--i18n"
>> helper to get their output.
>> 
>> So, to the extent that I was going to pursue the above plan at all I
>> wanted to do it in small steps, especially now as git-submodule.sh et al
>> are going away.
>> 
>> So.
>> 
>> It would be nice to get some leeway in some areas, especially for
>> something like this where I implemented this entire i18n system to begin
>> with, so I'd think it would be clear that it's not some drive-by
>> contribution. I clearly care about the end-goal, and have been sticking
>> with this particular topic for more than a decade.
>> 
>> Not everything can always be a single atomically understood patch that
>> carries all possible reasons to make the change with it; some things are
>> more of a longer term incremental effort.
>> 
>> And since we all have limited time on this spinning ball of mud
>> sometimes it can make sense to trickle in initial changes to see if some
>> larger end-goal is even attainable, or will be blocked due to some
>> unforeseen (or underestimated) reasons.
>
> You can't have leeway for a change whose conclusion is "removes some
> minuscule feature". But if you add "Here is the secret plan to Scrat's
> golden nut; let's start with this change, even though it removes some
> minuscule feature", things are vastly different.

I mean leeway on the topic that I probably have some idea of what I'm
talking about when it comes to git's i18n support, and whether it's
worth the effort to keep certain things around or not.

I.e. you started this thread by claiming that the removal of these
translations would be "castrating [out-of-tree] functionality, [which
is] unacceptable.".

As noted above I don't think that assessment is correct, and if I'm
understanding you correctly you don't even use git's i18n mechanism at
all.

Which I think presents only two possible conclusions.

One is that I, the person who added the i18n mechanism in the first
place, am so clueless about how it works or what it's for, that I'm
(intentionally or not) submitting patches that "castrate" it.

The other is that you've understandably missed some of the nuance, such
as why we're even marking strings for translation, and who the intended
audience for them is.
Johannes Sixt April 4, 2022, 8:20 p.m. UTC | #12
Please excuse me for not replying to your arguments directly. I
think I have said all I can. Perhaps I am unable to express my concerns
sufficiently clearly.

-- Hannes

Patch

diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index d92df37e992..1abceaac8d3 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -187,8 +187,7 @@  cd_to_toplevel () {
 require_work_tree_exists () {
 	if test "z$(git rev-parse --is-bare-repository)" != zfalse
 	then
-		program_name=$0
-		die "$(eval_gettext "fatal: \$program_name cannot be used without a working tree.")"
+		die "fatal: $0 cannot be used without a working tree."
 	fi
 }
 
@@ -206,13 +205,13 @@  require_clean_work_tree () {
 
 	if ! git diff-files --quiet --ignore-submodules
 	then
-		action=$1
-		case "$action" in
+		case "$1" in
 		"rewrite branches")
 			gettextln "Cannot rewrite branches: You have unstaged changes." >&2
 			;;
 		*)
-			eval_gettextln "Cannot \$action: You have unstaged changes." >&2
+			# Some out-of-tree user of require_clean_work_tree()
+			echo "Cannot $1: You have unstaged changes." >&2
 			;;
 		esac
 		err=1
@@ -222,8 +221,15 @@  require_clean_work_tree () {
 	then
 		if test $err = 0
 		then
-			action=$1
-			eval_gettextln "Cannot \$action: Your index contains uncommitted changes." >&2
+			case "$1" in
+			"rewrite branches")
+				gettextln "Cannot rewrite branches: Your index contains uncommitted changes." >&2
+				;;
+			*)
+				# Some out-of-tree user of require_clean_work_tree()
+				echo "Cannot $1: Your index contains uncommitted changes." >&2
+				;;
+			esac
 		else
 		    gettextln "Additionally, your index contains uncommitted changes." >&2
 		fi