[1/1] Introduce "precious" file concept

Message ID	20190216114938.18843-2-pclouds@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41jIER1eQ==?= <pclouds@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmc=?= =?utf-8?b?4buNYyBEdXk=?= <pclouds@gmail.com> Subject: [PATCH 1/1] Introduce "precious" file concept Date: Sat, 16 Feb 2019 18:49:38 +0700 Message-Id: <20190216114938.18843-2-pclouds@gmail.com> In-Reply-To: <20190216114938.18843-1-pclouds@gmail.com> References: <20190216114938.18843-1-pclouds@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	Introduce "precious" file attribute \| expand [0/1] Introduce "precious" file attribute [1/1] Introduce "precious" file concept

Duy Nguyen Feb. 16, 2019, 11:49 a.m. UTC

A new attribute "precious" is added to indicate that certain files
have valuable content and should not be easily discarded even if they
are ignored or untracked.

So far there are one part of Git that are made aware of precious
files: "git clean" will leave precious files alone.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-clean.txt     |  3 ++-
 Documentation/gitattributes.txt | 11 +++++++++++
 Documentation/gitignore.txt     |  4 ++++
 attr.c                          | 12 ++++++++++++
 attr.h                          |  2 ++
 builtin/clean.c                 | 20 +++++++++++++++++---
 t/t7300-clean.sh                | 29 +++++++++++++++++++++++++++++
 7 files changed, 77 insertions(+), 4 deletions(-)

Ævar Arnfjörð Bjarmason Feb. 16, 2019, 7:36 p.m. UTC | #1

On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote:

[Re-CC some people involved the last time around]

> A new attribute "precious" is added to indicate that certain files
> have valuable content and should not be easily discarded even if they
> are ignored or untracked.
>
> So far there are one part of Git that are made aware of precious
> files: "git clean" will leave precious files alone.

Thanks for bringing this up again. There were also some patches recently
to save away clobbered files, do you/anyone else have any end goal in
mind here that combines this & that, or some other thing I may not have
kept up with?

My commentary on this whole thing is basically a repeat of what I said
in https://public-inbox.org/git/87wop0yvxv.fsf@evledraar.gmail.com/

I.e. we have a definite problem here somewhere, and there is some
solution, but this patch feels a bit like navigating that maze in the
dark without a map.

We had users report that the likes of "pull" were eating their data, but
now with this iteration of "precious" only impacting "clean" the only
problem anyone with the current semantics is still left unaddressed. My
memory (I may be wrong) is that "clean" was just brought up (by you?) as
a "what about this other related case?" in that whole discussion.

So as noted in the E-Mail linked above I think the first step should be
to enumerate/document/test the cases where we're now eating data
implicitly, and discuss how that relates to the semantics we desired
when the data-eating behavior was first introduced (as noted in E-Mails
linked from the above, my own preliminary digging seems to reveal there
isn't much of a relationship between the two).

Only when we have that list of XYZ cases we're supporting now, and can
see that XYZ is so important to maintain backwards compatibility for
that we can't change it should way say "we eat your data by default
because XYZ is so useful/backcompat, set 'precious' ...".

But right now we don't even have the list of XYZ or tests for them (as
my RFC "garbage" attribute patch revealed). So this whole thing still
feels like jumping three steps ahead to me in terms of addressing *that*
issue, but perhaps you have some orthogonal use-case in mind for this?

Duy Nguyen Feb. 17, 2019, 9:31 a.m. UTC | #2

On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote:
>
> [Re-CC some people involved the last time around]
>
> > A new attribute "precious" is added to indicate that certain files
> > have valuable content and should not be easily discarded even if they
> > are ignored or untracked.
> >
> > So far there are one part of Git that are made aware of precious
> > files: "git clean" will leave precious files alone.
>
> Thanks for bringing this up again. There were also some patches recently
> to save away clobbered files, do you/anyone else have any end goal in
> mind here that combines this & that, or some other thing I may not have
> kept up with?

I assume you mean the clobbering untracked files by merge/checkout.
Those files will be backed up [1] if backup-log is implemented. Even
files deleted by "git clean" could be saved but that might go a little
too far.

[1] https://public-inbox.org/git/20181209104419.12639-20-pclouds@gmail.com/

> My commentary on this whole thing is basically a repeat of what I said
> in https://public-inbox.org/git/87wop0yvxv.fsf@evledraar.gmail.com/
>
> I.e. we have a definite problem here somewhere, and there is some
> solution, but this patch feels a bit like navigating that maze in the
> dark without a map.
>
> We had users report that the likes of "pull" were eating their data, but
> now with this iteration of "precious" only impacting "clean" the only
> problem anyone with the current semantics is still left unaddressed. My
> memory (I may be wrong) is that "clean" was just brought up (by you?) as
> a "what about this other related case?" in that whole discussion.
>
> So as noted in the E-Mail linked above I think the first step should be
> to enumerate/document/test the cases where we're now eating data
> implicitly, and discuss how that relates to the semantics we desired
> when the data-eating behavior was first introduced (as noted in E-Mails
> linked from the above, my own preliminary digging seems to reveal there
> isn't much of a relationship between the two).
>
> Only when we have that list of XYZ cases we're supporting now, and can
> see that XYZ is so important to maintain backwards compatibility for
> that we can't change it should way say "we eat your data by default
> because XYZ is so useful/backcompat, set 'precious' ...".
>
> But right now we don't even have the list of XYZ or tests for them (as
> my RFC "garbage" attribute patch revealed). So this whole thing still
> feels like jumping three steps ahead to me in terms of addressing *that*
> issue, but perhaps you have some orthogonal use-case in mind for this?

I'm not addressing the accidentally losing data in this patch. My
answer for that would still be backup-log, if it ever gets merged. But
this patch is about _known_ files that I want to keep when doing "git
clean", no more.

Ævar Arnfjörð Bjarmason Feb. 18, 2019, 9:53 a.m. UTC | #3

On Sun, Feb 17 2019, Duy Nguyen wrote:

> On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote:
>>
>> [Re-CC some people involved the last time around]
>>
>> > A new attribute "precious" is added to indicate that certain files
>> > have valuable content and should not be easily discarded even if they
>> > are ignored or untracked.
>> >
>> > So far there are one part of Git that are made aware of precious
>> > files: "git clean" will leave precious files alone.
>>
>> Thanks for bringing this up again. There were also some patches recently
>> to save away clobbered files, do you/anyone else have any end goal in
>> mind here that combines this & that, or some other thing I may not have
>> kept up with?
>
> I assume you mean the clobbering untracked files by merge/checkout.
> Those files will be backed up [1] if backup-log is implemented. Even
> files deleted by "git clean" could be saved but that might go a little
> too far.

And I suppose if we have some mechanism for "don't backup but error out
if it was detectes as needed" we'll have the equivalent of inverting the
merge/checkout --force behavior now (& my "garbage" patch), i.e. we'd
stall on potential clobbering and need to carry on with --force.

> [1] https://public-inbox.org/git/20181209104419.12639-20-pclouds@gmail.com/
>
>> My commentary on this whole thing is basically a repeat of what I said
>> in https://public-inbox.org/git/87wop0yvxv.fsf@evledraar.gmail.com/
>>
>> I.e. we have a definite problem here somewhere, and there is some
>> solution, but this patch feels a bit like navigating that maze in the
>> dark without a map.
>>
>> We had users report that the likes of "pull" were eating their data, but
>> now with this iteration of "precious" only impacting "clean" the only
>> problem anyone with the current semantics is still left unaddressed. My
>> memory (I may be wrong) is that "clean" was just brought up (by you?) as
>> a "what about this other related case?" in that whole discussion.
>>
>> So as noted in the E-Mail linked above I think the first step should be
>> to enumerate/document/test the cases where we're now eating data
>> implicitly, and discuss how that relates to the semantics we desired
>> when the data-eating behavior was first introduced (as noted in E-Mails
>> linked from the above, my own preliminary digging seems to reveal there
>> isn't much of a relationship between the two).
>>
>> Only when we have that list of XYZ cases we're supporting now, and can
>> see that XYZ is so important to maintain backwards compatibility for
>> that we can't change it should way say "we eat your data by default
>> because XYZ is so useful/backcompat, set 'precious' ...".
>>
>> But right now we don't even have the list of XYZ or tests for them (as
>> my RFC "garbage" attribute patch revealed). So this whole thing still
>> feels like jumping three steps ahead to me in terms of addressing *that*
>> issue, but perhaps you have some orthogonal use-case in mind for this?
>
> I'm not addressing the accidentally losing data in this patch. My
> answer for that would still be backup-log, if it ever gets merged. But
> this patch is about _known_ files that I want to keep when doing "git
> clean", no more.

Indeed. My concern is that we're making incremental steps without a
clear idea of the end state, and once we get there we might find that
some steps along the way box us in or weren't what we wanted to solve
the overall UX issue.

More so in the CL / commit message not describing where we are overall,
where this patch fits in (backup log, etc.) than this whole thing
(backup log is already 24 patches) needing to be sent as one giant
series...

Duy Nguyen Feb. 18, 2019, 10:14 a.m. UTC | #4

On Mon, Feb 18, 2019 at 4:53 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Sun, Feb 17 2019, Duy Nguyen wrote:
>
> > On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >>
> >> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote:
> >>
> >> [Re-CC some people involved the last time around]
> >>
> >> > A new attribute "precious" is added to indicate that certain files
> >> > have valuable content and should not be easily discarded even if they
> >> > are ignored or untracked.
> >> >
> >> > So far there are one part of Git that are made aware of precious
> >> > files: "git clean" will leave precious files alone.
> >>
> >> Thanks for bringing this up again. There were also some patches recently
> >> to save away clobbered files, do you/anyone else have any end goal in
> >> mind here that combines this & that, or some other thing I may not have
> >> kept up with?
> >
> > I assume you mean the clobbering untracked files by merge/checkout.
> > Those files will be backed up [1] if backup-log is implemented. Even
> > files deleted by "git clean" could be saved but that might go a little
> > too far.
>
> And I suppose if we have some mechanism for "don't backup but error out
> if it was detectes as needed" we'll have the equivalent of inverting the
> merge/checkout --force behavior now (& my "garbage" patch), i.e. we'd
> stall on potential clobbering and need to carry on with --force.

Yes that's another way to go.

> > [1] https://public-inbox.org/git/20181209104419.12639-20-pclouds@gmail.com/
> >
> >> My commentary on this whole thing is basically a repeat of what I said
> >> in https://public-inbox.org/git/87wop0yvxv.fsf@evledraar.gmail.com/
> >>
> >> I.e. we have a definite problem here somewhere, and there is some
> >> solution, but this patch feels a bit like navigating that maze in the
> >> dark without a map.
> >>
> >> We had users report that the likes of "pull" were eating their data, but
> >> now with this iteration of "precious" only impacting "clean" the only
> >> problem anyone with the current semantics is still left unaddressed. My
> >> memory (I may be wrong) is that "clean" was just brought up (by you?) as
> >> a "what about this other related case?" in that whole discussion.
> >>
> >> So as noted in the E-Mail linked above I think the first step should be
> >> to enumerate/document/test the cases where we're now eating data
> >> implicitly, and discuss how that relates to the semantics we desired
> >> when the data-eating behavior was first introduced (as noted in E-Mails
> >> linked from the above, my own preliminary digging seems to reveal there
> >> isn't much of a relationship between the two).
> >>
> >> Only when we have that list of XYZ cases we're supporting now, and can
> >> see that XYZ is so important to maintain backwards compatibility for
> >> that we can't change it should way say "we eat your data by default
> >> because XYZ is so useful/backcompat, set 'precious' ...".
> >>
> >> But right now we don't even have the list of XYZ or tests for them (as
> >> my RFC "garbage" attribute patch revealed). So this whole thing still
> >> feels like jumping three steps ahead to me in terms of addressing *that*
> >> issue, but perhaps you have some orthogonal use-case in mind for this?
> >
> > I'm not addressing the accidentally losing data in this patch. My
> > answer for that would still be backup-log, if it ever gets merged. But
> > this patch is about _known_ files that I want to keep when doing "git
> > clean", no more.
>
> Indeed. My concern is that we're making incremental steps without a
> clear idea of the end state, and once we get there we might find that
> some steps along the way box us in or weren't what we wanted to solve
> the overall UX issue.
>
> More so in the CL / commit message not describing where we are overall,
> where this patch fits in (backup log, etc.) than this whole thing
> (backup log is already 24 patches) needing to be sent as one giant
> series...

Not seeing the overall picture is probably because I don't have any.
Both backup-log and this precious attribute are more at plumbing level
that could be combined to make something, but no I don't have the
whole final UX worked out. The end game for backup log may be "git
undo" (or just --undo option across commands). The "precious"
attribute does not exactly have any role in "git undo" picture. It's
just refining the file classification (untracked, tracked, ignored)
that we have although it could be used as hints for "git undo". That's
all I got.

Junio C Hamano Feb. 19, 2019, 6:08 p.m. UTC | #5

Duy Nguyen <pclouds@gmail.com> writes:

> On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote:
>>
>> [Re-CC some people involved the last time around]
>>
>> > A new attribute "precious" is added to indicate that certain files
>> > have valuable content and should not be easily discarded even if they
>> > are ignored or untracked.
>> >
>> > So far there are one part of Git that are made aware of precious
>> > files: "git clean" will leave precious files alone.
>>
>> Thanks for bringing this up again. There were also some patches recently
>> to save away clobbered files, do you/anyone else have any end goal in
>> mind here that combines this & that, or some other thing I may not have
>> kept up with?
>
> I assume you mean the clobbering untracked files by merge/checkout.
> Those files will be backed up [1] if backup-log is implemented. Even
> files deleted by "git clean" could be saved but that might go a little
> too far.

I agree with Ævar that it is a very good idea to ask what the
endgame should look like.  I would have expected that, with an
introduction of new "ignored but unexpendable" class of file
(i.e. "precious" here), operations such as merge and checkout will
be updated to keep them in situations where we would remove "ignored
and expendable" files (i.e. "ignored").  And it is perfectly OK if
the very first introduction of the "precious" support begins only
with a single operation, such as "clean", as long as the end-goal is
clear.

I personally do not believe in "backup log"; if we can screw up and
can fail to stop an operation that must avoid losing info, then we
can screw up the same way and fail to design and implement "backup"
to save info before an operation loses it.  If we do a good job in
supporting "precious" in various operations, we can rely less on
"backup log" and still be safe ;-)

Duy Nguyen Feb. 20, 2019, 1:35 a.m. UTC | #6

On Wed, Feb 20, 2019 at 1:08 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Duy Nguyen <pclouds@gmail.com> writes:
>
> > On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >>
> >> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote:
> >>
> >> [Re-CC some people involved the last time around]
> >>
> >> > A new attribute "precious" is added to indicate that certain files
> >> > have valuable content and should not be easily discarded even if they
> >> > are ignored or untracked.
> >> >
> >> > So far there are one part of Git that are made aware of precious
> >> > files: "git clean" will leave precious files alone.
> >>
> >> Thanks for bringing this up again. There were also some patches recently
> >> to save away clobbered files, do you/anyone else have any end goal in
> >> mind here that combines this & that, or some other thing I may not have
> >> kept up with?
> >
> > I assume you mean the clobbering untracked files by merge/checkout.
> > Those files will be backed up [1] if backup-log is implemented. Even
> > files deleted by "git clean" could be saved but that might go a little
> > too far.
>
> I agree with Ævar that it is a very good idea to ask what the
> endgame should look like.  I would have expected that, with an
> introduction of new "ignored but unexpendable" class of file
> (i.e. "precious" here), operations such as merge and checkout will
> be updated to keep them in situations where we would remove "ignored
> and expendable" files (i.e. "ignored").  And it is perfectly OK if
> the very first introduction of the "precious" support begins only
> with a single operation, such as "clean", as long as the end-goal is
> clear.

I think the sticking point is how to deal with the surprise factor and
"precious" will not help at all in this aspect. In my mind there are
three classes

 - total expectation, i know i want git to not touch some files, i
tell git so (e.g. with "precious")

 - surprises sometimes, but in known classes. This is the main use
case of backup log, where I may accidentally do "git commit
-amsomething" after carefully preparing the index. Saving overwritten
files by merge/checkout could be done here as an alternative to
"garbage" attribute.

> I personally do not believe in "backup log"; if we can screw up and
> can fail to stop an operation that must avoid losing info, then we
> can screw up the same way and fail to design and implement "backup"
> to save info before an operation loses it.  If we do a good job in
> supporting "precious" in various operations, we can rely less on
> "backup log" and still be safe ;-)

and this is the third class, something completely unexpected. Yes
backup-log can't help here, but I don't think "precious" can either.
And I have no good proposal for this case.

Clemens Buchacher Feb. 20, 2019, 8:31 a.m. UTC | #7

On February 20, 2019 2:35:41 AM GMT+01:00, Duy Nguyen <pclouds@gmail.com> wrote:
>On Wed, Feb 20, 2019 at 1:08 AM Junio C Hamano <gitster@pobox.com>
>wrote:
>>
>> Duy Nguyen <pclouds@gmail.com> writes:
>>
>> > On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >>
>> >>
>> >> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote:
>> >>
>> >> [Re-CC some people involved the last time around]
>> >>
>> >> > A new attribute "precious" is added to indicate that certain
>files
>> >> > have valuable content and should not be easily discarded even if
>they
>> >> > are ignored or untracked.
>> >> >
>> >> > So far there are one part of Git that are made aware of precious
>> >> > files: "git clean" will leave precious files alone.
>> >>
>> >> Thanks for bringing this up again. There were also some patches
>recently
>> >> to save away clobbered files, do you/anyone else have any end goal
>in
>> >> mind here that combines this & that, or some other thing I may not
>have
>> >> kept up with?
>> >
>> > I assume you mean the clobbering untracked files by merge/checkout.
>> > Those files will be backed up [1] if backup-log is implemented.
>Even
>> > files deleted by "git clean" could be saved but that might go a
>little
>> > too far.
>>
>> I agree with Ævar that it is a very good idea to ask what the
>> endgame should look like.  I would have expected that, with an
>> introduction of new "ignored but unexpendable" class of file
>> (i.e. "precious" here), operations such as merge and checkout will
>> be updated to keep them in situations where we would remove "ignored
>> and expendable" files (i.e. "ignored").  And it is perfectly OK if
>> the very first introduction of the "precious" support begins only
>> with a single operation, such as "clean", as long as the end-goal is
>> clear.
>
>I think the sticking point is how to deal with the surprise factor and
>"precious" will not help at all in this aspect. In my mind there are
>three classes
>
> - total expectation, i know i want git to not touch some files, i
>tell git so (e.g. with "precious")
>
> - surprises sometimes, but in known classes. This is the main use
>case of backup log, where I may accidentally do "git commit
>-amsomething" after carefully preparing the index. Saving overwritten
>files by merge/checkout could be done here as an alternative to
>"garbage" attribute.
>
>> I personally do not believe in "backup log"; if we can screw up and
>> can fail to stop an operation that must avoid losing info, then we
>> can screw up the same way and fail to design and implement "backup"
>> to save info before an operation loses it.  If we do a good job in
>> supporting "precious" in various operations, we can rely less on
>> "backup log" and still be safe ;-)
>
>and this is the third class, something completely unexpected. Yes
>backup-log can't help here, but I don't think "precious" can either.
>And I have no good proposal for this case.

Sorry for going off on a tangent here, but I have had this on my mind for a long time. For cases where merge can lead to loss of a non-ignored untracked file (t7607-merge-overwrite.sh), I have the following proposal:

1. Merge the ORIG_HEAD and MERGE_HEAD commits without touching the index or the work tree. This is where we do rename detection, recursive merge, and content (line-by-line) merge. The result is CHECKOUT_HEAD, a tree with possible merge conflicts. For the switch branch operation CHECKOUT_HEAD is the tree to switch to. The remaining steps are the same for merge and switch branch operations.
2. Merge CHECKOUT_HEAD and the index with ORIG_HEAD as the merge base. The result is the CHECKOUT_INDEX. Do this in order to keep staged changes which are not affected by the merge. Do not do rename detection or content merge. In case of conflict, rollback and error out.
3. Merge CHECKOUT_INDEX with the work tree with the original index as merge base. Do this to simulate the work tree update. Dp not do remame detection or content merge. A conflict means that the checkout operation would touch untracked files or files with unstaged changes. In case of such a conflict, rollback and error out.

I believe this algorithm would behave much like the current implementation. But it separates the rename/history/content aspects of the merge algorithm from the checkout operation. It greatly simplifies the implementation of the checkout operation and there are no special cases where we lose files. Implementing step 1 is the tricky part. But it may still be worthwhile because the merge algorithm does not have to worry about staged changes or unstaged changes. The merge algorithm could work on the hierarchical tree structure instead of the flattened index. This makes it trivial to detect directory/file conflicts (no need to do a lookahead when iterating index files). This is also a better fit for detecting directory renames. Maybe this will allow us to focus more on rename detection, such as directory renames or moved functions [*1*]. 

[*1*] Also: moved files where the original file is replaced with a wrapper for the moved file always fools rename detection because we don't detect renames for files which were not removed.

Ævar Arnfjörð Bjarmason Feb. 20, 2019, 9:19 a.m. UTC | #8

On Tue, Feb 19 2019, Junio C Hamano wrote:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason
>> <avarab@gmail.com> wrote:
>>>
>>>
>>> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote:
>>>
>>> [Re-CC some people involved the last time around]
>>>
>>> > A new attribute "precious" is added to indicate that certain files
>>> > have valuable content and should not be easily discarded even if they
>>> > are ignored or untracked.
>>> >
>>> > So far there are one part of Git that are made aware of precious
>>> > files: "git clean" will leave precious files alone.
>>>
>>> Thanks for bringing this up again. There were also some patches recently
>>> to save away clobbered files, do you/anyone else have any end goal in
>>> mind here that combines this & that, or some other thing I may not have
>>> kept up with?
>>
>> I assume you mean the clobbering untracked files by merge/checkout.
>> Those files will be backed up [1] if backup-log is implemented. Even
>> files deleted by "git clean" could be saved but that might go a little
>> too far.
>
> I agree with Ævar that it is a very good idea to ask what the
> endgame should look like.  I would have expected that, with an
> introduction of new "ignored but unexpendable" class of file
> (i.e. "precious" here), operations such as merge and checkout will
> be updated to keep them in situations where we would remove "ignored
> and expendable" files (i.e. "ignored").  And it is perfectly OK if
> the very first introduction of the "precious" support begins only
> with a single operation, such as "clean", as long as the end-goal is
> clear.

FWIW I'm in full agreement with that.

> I personally do not believe in "backup log"; if we can screw up and
> can fail to stop an operation that must avoid losing info, then we
> can screw up the same way and fail to design and implement "backup"
> to save info before an operation loses it.

Yes, there could be some unforseen interaction between git commands
where we should have such a backup log, but did not think to implement
it. I'd hope such cases would be reported, and we could fix them.

But those sorts of cases aren't why we started discussing this, rather
we *know* what the data shredding command interaction is, but there
wasn't a consensus for just not shredding data by default by making
users use "checkout -f" or "merge -f" to proceed. I.e. taking some
variant of my "trashable" patch[1].

> If we do a good job in
> supporting "precious" in various operations, we can rely less on
> "backup log" and still be safe ;-)

Is noted in previous discussions[2] I think that's entirely
implausible. I think at best the "precious" facility will be used to
mark e.g *.o files as "don't check in, but don't clean (Makefile handles
it)".

Most git users are at the level of only knowing very basic
add/commit/pull/push command interaction. I feel strongly that we need
to make our tools safe to use by default, and not require some
relatively advanced "precious"/attribute facility to be carefully
configured in advance so we don't throw away uncommitted work on the
likes of merge/checkout.

1. https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/
2. https://public-inbox.org/git/871s7r4wuv.fsf@evledraar.gmail.com/

Steffen Jost Feb. 20, 2019, 9:36 a.m. UTC | #9

On 20.02.19 10:19, Ævar Arnfjörð Bjarmason wrote:
> Most git users are at the level of only knowing very basic
> add/commit/pull/push command interaction. I feel strongly that we need
> to make our tools safe to use by default, and not require some
> relatively advanced "precious"/attribute facility to be carefully
> configured in advance so we don't throw away uncommitted work on the
> likes of merge/checkout.
> 
> 1. https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/
> 2. https://public-inbox.org/git/871s7r4wuv.fsf@evledraar.gmail.com/
> 

+1
Please consider that silently deleting files is a no-go.

I teach computer science, and our switch from subversion to git for our second year programming projects caused a lot of grief, so much that my colleagues consider switching back to subversion as the point of first contact with revisioning.

Silently deleting partially revisioned files is a major source: students regularly destroy IDE or OS specific config files that they cannot restore themselves. (Project participants use all kinds of different IDEs on different OSs and thus have all kinds of weird hidden files that always manage to get checked into the repository, wreaking havoc on another's machine. So they get deleted and thus disturb the student that needed those files.) We do provide a huge .gitignore that ought to prevent this, but despite numerous warnings they only add it later, which then causes previously checked-in files to be lost upon switching between branches.

Please, by default, issue at least a warning before files are irrevocably los - or maybe keep a local snapshot of everything for the last few checkout in order to undo them?

Thanks,
  Steffen.

Duy Nguyen Feb. 20, 2019, 9:41 a.m. UTC | #10

On Wed, Feb 20, 2019 at 4:19 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > I personally do not believe in "backup log"; if we can screw up and
> > can fail to stop an operation that must avoid losing info, then we
> > can screw up the same way and fail to design and implement "backup"
> > to save info before an operation loses it.
>
> Yes, there could be some unforseen interaction between git commands
> where we should have such a backup log, but did not think to implement
> it. I'd hope such cases would be reported, and we could fix them.
>
> But those sorts of cases aren't why we started discussing this, rather
> we *know* what the data shredding command interaction is, but there
> wasn't a consensus for just not shredding data by default by making
> users use "checkout -f" or "merge -f" to proceed. I.e. taking some
> variant of my "trashable" patch[1].
>
> > If we do a good job in
> > supporting "precious" in various operations, we can rely less on
> > "backup log" and still be safe ;-)
>
> Is noted in previous discussions[2] I think that's entirely
> implausible. I think at best the "precious" facility will be used to
> mark e.g *.o files as "don't check in, but don't clean (Makefile handles
> it)".
>
> Most git users are at the level of only knowing very basic
> add/commit/pull/push command interaction. I feel strongly that we need
> to make our tools safe to use by default, and not require some
> relatively advanced "precious"/attribute facility to be carefully
> configured in advance so we don't throw away uncommitted work on the
> likes of merge/checkout.

There is a trade off somewhere. "new user first" should not come at
the cost for more experienced users.

Making "git checkout/merge" abort while it's working before breaks
scripts. And requiring to mark trashable files manually duplicates a
lot of ignore patterns. Have a look at any .gitignore file, the
majority of them is for discardable files because "ignored" class was
created with those in mind (*.o and friends). So now you would need to
add more or less the same set of ignore rules in .gitattributes to
mark them trashable, and gitignore/gitattributes rules are not exactly
compatible, you can't just blindly copy them over. Every time you add
one more .gitignore rule, there's a good chance you need to add a
similar rule for trashable attribute.

Maybe we just add a new "newbie" config knob and turn on the safety
nets on. Leave the knob on by default. And I will turn it off in my
~/.gitconfig as soon as it's real.

Ævar Arnfjörð Bjarmason Feb. 20, 2019, 10:46 a.m. UTC | #11

On Wed, Feb 20 2019, Duy Nguyen wrote:

> On Wed, Feb 20, 2019 at 4:19 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> > I personally do not believe in "backup log"; if we can screw up and
>> > can fail to stop an operation that must avoid losing info, then we
>> > can screw up the same way and fail to design and implement "backup"
>> > to save info before an operation loses it.
>>
>> Yes, there could be some unforseen interaction between git commands
>> where we should have such a backup log, but did not think to implement
>> it. I'd hope such cases would be reported, and we could fix them.
>>
>> But those sorts of cases aren't why we started discussing this, rather
>> we *know* what the data shredding command interaction is, but there
>> wasn't a consensus for just not shredding data by default by making
>> users use "checkout -f" or "merge -f" to proceed. I.e. taking some
>> variant of my "trashable" patch[1].
>>
>> > If we do a good job in
>> > supporting "precious" in various operations, we can rely less on
>> > "backup log" and still be safe ;-)
>>
>> Is noted in previous discussions[2] I think that's entirely
>> implausible. I think at best the "precious" facility will be used to
>> mark e.g *.o files as "don't check in, but don't clean (Makefile handles
>> it)".
>>
>> Most git users are at the level of only knowing very basic
>> add/commit/pull/push command interaction. I feel strongly that we need
>> to make our tools safe to use by default, and not require some
>> relatively advanced "precious"/attribute facility to be carefully
>> configured in advance so we don't throw away uncommitted work on the
>> likes of merge/checkout.
>
> There is a trade off somewhere. "new user first" should not come at
> the cost for more experienced users.
>
> Making "git checkout/merge" abort while it's working before breaks
> scripts. And requiring to mark trashable files manually duplicates a
> lot of ignore patterns. Have a look at any .gitignore file, the
> majority of them is for discardable files because "ignored" class was
> created with those in mind (*.o and friends). So now you would need to
> add more or less the same set of ignore rules in .gitattributes to
> mark them trashable, and gitignore/gitattributes rules are not exactly
> compatible, you can't just blindly copy them over. Every time you add
> one more .gitignore rule, there's a good chance you need to add a
> similar rule for trashable attribute.
>
> Maybe we just add a new "newbie" config knob and turn on the safety
> nets on. Leave the knob on by default. And I will turn it off in my
> ~/.gitconfig as soon as it's real.

Oh yes, as noted upthread ("My commentary on this whole thing..."[1] )
my position on what we should do at this point is not that we should
definitely go one way or the other, but that more investigation is
needed.

As my "trashable"[2] patch makes clear we don't even have good tests or
documentation for these cases, which would be a good first step.

The one thing that *is* clear from my digging a few months back is that
the behavior we have now in git is overzealous when we look at the
initial case reported by Shawn way back when it was added.

Specifically, the intention back in 2007 was to fix a case where "git
checkout" ("read-tree -m", but whatever) would barf on a branch switch
where switching needed to replace a *tracked* "smth" with a *tracked*
"smth/file", or the other way around[3].

Does that mean we can just back that behavior out? No, because people
might have come to rely on it, but we should start with seeing exactly
what it *does* do, whether all those things are important or intended,
and maybe we can weight the shredding/backcompat trade-off for some of
those differently than others.

So the obvious thing to try would be to see if we can narrowly keep the
behavior where we end up shredding a file on disk, *but* are switching
between two trees A & B where that have/don't have that file.

Or more generously, try to "git hash-object" arbitrary files we're about
to shred, and check if it's already in the object database. That would
catch case where e.g. the user switching from A->B and would shred a
file, but it (or conflicting dir) is known to neither "A" nor "B", but
exists as a checked-in file in unrelated commit "C", which the user
recently had checked out (and e.g. their editor auto saved it as-is or
something...).

But it's entirely possible that after all that digging we'll come to the
conclusion that we can't change this at all, and we're just going to
live with all the current caveats.

That doesn't mean that having what amounts to a power user feature to
mitigate that damage if you know git well enough that it's going to be a
problem is going to help anything but a small minority of users. So
"dude, where's my data?" problem will still exist.

Even then there room to maneuver, e.g.:

 X. Perhaps after investigating it's not acceptable to change the
    default for script use, but could we require --force if we detect
    that we're connected to a terminal?

 Y. Or if even that is considered too much, we could have something like
    how help.autoCorrect works, where if we detect we're about to eat
    data we wait for 10 seconds, and invite the user to Ctrl+C now
    because we're about to clobber file "xyz".

 Z. It's for whatever reason still unacceptable to do X or Y (or some
    similar mitigation) for all cases of file shredding, but would be OK
    for some specific sub-cases (e.g. the not known to git-hash-object
    case above), and we have reason to suspect that such a narrow
    mitigation strikes the right trade-off between backwards
    compatibility and preventing the "dude, where's my data?" reports we
    get about this periodically.

1. https://public-inbox.org/git/87wolzo7a1.fsf@evledraar.gmail.com/
2. https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/
3. https://public-inbox.org/git/87wopj3661.fsf@evledraar.gmail.com/

Clemens Buchacher Feb. 20, 2019, 11:11 a.m. UTC | #12

On February 20, 2019 10:41:51 AM GMT+01:00, Duy Nguyen <pclouds@gmail.com> wrote:
>Making "git checkout/merge" abort while it's working before breaks
>scripts.

Change is always a trade-off. We should not reject change without considering the merits. Once we agree on the desired state, we can think about the migration strategy. 

>And requiring to mark trashable files manually duplicates a
>lot of ignore patterns. Have a look at any .gitignore file, the
>majority of them is for discardable files because "ignored" class was
>created with those in mind (*.o and friends). So now you would need to
>add more or less the same set of ignore rules in .gitattributes to
>mark them trashable, and gitignore/gitattributes rules are not exactly
>compatible, you can't just blindly copy them over. Every time you add
>one more .gitignore rule, there's a good chance you need to add a
>similar rule for trashable attribute.

I agree that ignored precious files are typically a small subset of the ignore files. Maintaining separate rules for ignored files and for trashable files would result in a lot of duplication.

On the other hand, how frequently do we really have to trash ignored files? Trashing a file should only be necessary if a tracked file overwrites an ignored file. When does this happen? I don't think it will happen for *.o files. So in most cases, there is simply no need to specify which files are precious. The default could simply be that all files are precious.

To support more complex use cases, we could specify precious files in addition to ignored files. Only if we specify precious files (and/or enable the ignored-are-trashable config option on a repository level), all other files become trashable.

Functionally this is equivalent the newbie option which you suggest, but I think it is not an issue of newbie vs experienced users but an issue of common vs special use cases.

>Maybe we just add a new "newbie" config knob and turn on the safety
>nets on. Leave the knob on by default. And I will turn it off in my
>~/.gitconfig as soon as it's real.

Junio C Hamano Feb. 20, 2019, 10:32 p.m. UTC | #13

Duy Nguyen <pclouds@gmail.com> writes:

>  - surprises sometimes, but in known classes. This is the main use
> case of backup log, where I may accidentally do "git commit
> -amsomething" after carefully preparing the index. Saving overwritten
> files by merge/checkout could be done here as an alternative to
> "garbage" attribute.

The problem with either of these is not the "saving" half, but
"restoring".  For different people, and for different cases even to
the same person, granularity of what is perceived as a single action
is different, so "give me the state of my working tree files before
the last operation" is a request that does not have a good
definition.  After finishing "git rebase" that replays 3 changes,
one of which needs manual conflict resolution, you may realize that
you made an incorrect resolution for one path but not others.  How
would you let the user say "no, I do not want to undo the whole
rebase, I want to go back to the state where I replayed the first
change, saw the conflicts while replaying the second change, and
resolved them in these files, but before touching that last one I
screwed up resolving, so that I can correct"?

Piling many "backups" (or "snapshots") on top of each other is the
easier part; I'd expect that it would be a much harder design
problem to let users make use of them in meaningful ways, and that
is the primary reason why I am skeptical.

Junio C Hamano Feb. 20, 2019, 10:39 p.m. UTC | #14

Duy Nguyen <pclouds@gmail.com> writes:

> There is a trade off somewhere. "new user first" should not come at
> the cost for more experienced users.

Probably.  Nobody will stay being newbie forever.

> Making "git checkout/merge" abort while it's working before breaks
> scripts. And requiring to mark trashable files manually duplicates a
> lot of ignore patterns. Have a look at any .gitignore file, the
> majority of them is for discardable files because "ignored" class was
> created with those in mind (*.o and friends).

Very true.  That is why we were OK for so long with "ignored" that
means "ignored and expendable".  We know in some situations we want
"ignored but precious", and that is why we are discussing this topic.

> So now you would need to
> add more or less the same set of ignore rules in .gitattributes to
> mark them trashable, and gitignore/gitattributes rules are not exactly
> compatible, you can't just blindly copy them over. Every time you add
> one more .gitignore rule, there's a good chance you need to add a
> similar rule for trashable attribute.

I am not sure why you would even need to _duplicate_.

Are you saying for each and every rule that specify "ignored and
expendable" in .gitignore there always will be "ignored but
precious" exception that match the pattern?  Given that we have been
OK for so long without even needing "precious", I find it somewhat
unrealistic to assume so.

Duy Nguyen Feb. 22, 2019, 9:35 a.m. UTC | #15

On Thu, Feb 21, 2019 at 5:39 AM Junio C Hamano <gitster@pobox.com> wrote:
> > So now you would need to
> > add more or less the same set of ignore rules in .gitattributes to
> > mark them trashable, and gitignore/gitattributes rules are not exactly
> > compatible, you can't just blindly copy them over. Every time you add
> > one more .gitignore rule, there's a good chance you need to add a
> > similar rule for trashable attribute.
>
> I am not sure why you would even need to _duplicate_.
>
> Are you saying for each and every rule that specify "ignored and
> expendable" in .gitignore there always will be "ignored but
> precious" exception that match the pattern?  Given that we have been
> OK for so long without even needing "precious", I find it somewhat
> unrealistic to assume so.

If all ignored files are now redefined as precious and we mark them
expendable with trashable attribute, then we need to duplicate most of
the rules. The "precious" attribute of course does not have this
problem since precious-and-ignored files should be rare.

Duy Nguyen Feb. 22, 2019, 9:46 a.m. UTC | #16

On Wed, Feb 20, 2019 at 6:11 PM Clemens Buchacher <drizzd@gmx.net> wrote:
> >And requiring to mark trashable files manually duplicates a
> >lot of ignore patterns. Have a look at any .gitignore file, the
> >majority of them is for discardable files because "ignored" class was
> >created with those in mind (*.o and friends). So now you would need to
> >add more or less the same set of ignore rules in .gitattributes to
> >mark them trashable, and gitignore/gitattributes rules are not exactly
> >compatible, you can't just blindly copy them over. Every time you add
> >one more .gitignore rule, there's a good chance you need to add a
> >similar rule for trashable attribute.
>
> I agree that ignored precious files are typically a small subset of the ignore files. Maintaining separate rules for ignored files and for trashable files would result in a lot of duplication.
>
> On the other hand, how frequently do we really have to trash ignored files? Trashing a file should only be necessary if a tracked file overwrites an ignored file. When does this happen? I don't think it will happen for *.o files. So in most cases, there is simply no need to specify which files are precious. The default could simply be that all files are precious.
>
> To support more complex use cases, we could specify precious files in addition to ignored files. Only if we specify precious files (and/or enable the ignored-are-trashable config option on a repository level), all other files become trashable.
>
> Functionally this is equivalent the newbie option which you suggest, but I think it is not an issue of newbie vs experienced users but an issue of common vs special use cases.

So far the two conflicting cases are "git checkout/merge" and "git
clean". Ignored files are valuable by default in the former, while
it's expendable by default in the latter.

So if you add this ignored-are-trashable config key (defaults to
false), git-clean -if will not do anything anymore. We _could_ advice
the user to turn the config on (with all the consequences). I don't
know if we have any other use cases that deserve the same advice.

Another option is simply leave the expendable/precious nature of
ignored files undefined like it is now and handle case by case:

 - git-clean learns to use a new attribute "clean". Undefined
attribute is seen as +clean. To keep some files "precious" you update
.gitattributes and add -clean rules.
 - git merge/checkout learns another attribute,
checkout-overwrite-ignore? Undefined attribute is seen as
-checkout-overwrite-ignore (i.e. abort the operation).

We stay away from any generic attribute name in this direction to make
clear it's only applicable to specific use cases.

Junio C Hamano Feb. 22, 2019, 6:07 p.m. UTC | #17

Duy Nguyen <pclouds@gmail.com> writes:

> On Thu, Feb 21, 2019 at 5:39 AM Junio C Hamano <gitster@pobox.com> wrote:
>> > So now you would need to
>> > add more or less the same set of ignore rules in .gitattributes to
>> > mark them trashable, and gitignore/gitattributes rules are not exactly
>> > compatible, you can't just blindly copy them over. Every time you add
>> > one more .gitignore rule, there's a good chance you need to add a
>> > similar rule for trashable attribute.
>>
>> I am not sure why you would even need to _duplicate_.
>>
>> Are you saying for each and every rule that specify "ignored and
>> expendable" in .gitignore there always will be "ignored but
>> precious" exception that match the pattern?  Given that we have been
>> OK for so long without even needing "precious", I find it somewhat
>> unrealistic to assume so.
>
> If all ignored files are now redefined as precious and we mark them
> expendable with trashable attribute, then we need to duplicate most of
> the rules. The "precious" attribute of course does not have this
> problem since precious-and-ignored files should be rare.

Ah, so you are saying that ignored (the traditional one we always
had) plus precious is a better combination than ignored (repurposed
to mean ignored-but-precious) plus trashable, because the latter
will cause people configure with a lot of duplication?

If that is the case, I'd agree ;-)

[1/1] Introduce "precious" file concept

Commit Message

Comments

Patch