Message ID | 20190216114938.18843-2-pclouds@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Introduce "precious" file attribute | expand |
On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote: [Re-CC some people involved the last time around] > A new attribute "precious" is added to indicate that certain files > have valuable content and should not be easily discarded even if they > are ignored or untracked. > > So far there are one part of Git that are made aware of precious > files: "git clean" will leave precious files alone. Thanks for bringing this up again. There were also some patches recently to save away clobbered files, do you/anyone else have any end goal in mind here that combines this & that, or some other thing I may not have kept up with? My commentary on this whole thing is basically a repeat of what I said in https://public-inbox.org/git/87wop0yvxv.fsf@evledraar.gmail.com/ I.e. we have a definite problem here somewhere, and there is some solution, but this patch feels a bit like navigating that maze in the dark without a map. We had users report that the likes of "pull" were eating their data, but now with this iteration of "precious" only impacting "clean" the only problem anyone with the current semantics is still left unaddressed. My memory (I may be wrong) is that "clean" was just brought up (by you?) as a "what about this other related case?" in that whole discussion. So as noted in the E-Mail linked above I think the first step should be to enumerate/document/test the cases where we're now eating data implicitly, and discuss how that relates to the semantics we desired when the data-eating behavior was first introduced (as noted in E-Mails linked from the above, my own preliminary digging seems to reveal there isn't much of a relationship between the two). Only when we have that list of XYZ cases we're supporting now, and can see that XYZ is so important to maintain backwards compatibility for that we can't change it should way say "we eat your data by default because XYZ is so useful/backcompat, set 'precious' ...". But right now we don't even have the list of XYZ or tests for them (as my RFC "garbage" attribute patch revealed). So this whole thing still feels like jumping three steps ahead to me in terms of addressing *that* issue, but perhaps you have some orthogonal use-case in mind for this?
On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote: > > [Re-CC some people involved the last time around] > > > A new attribute "precious" is added to indicate that certain files > > have valuable content and should not be easily discarded even if they > > are ignored or untracked. > > > > So far there are one part of Git that are made aware of precious > > files: "git clean" will leave precious files alone. > > Thanks for bringing this up again. There were also some patches recently > to save away clobbered files, do you/anyone else have any end goal in > mind here that combines this & that, or some other thing I may not have > kept up with? I assume you mean the clobbering untracked files by merge/checkout. Those files will be backed up [1] if backup-log is implemented. Even files deleted by "git clean" could be saved but that might go a little too far. [1] https://public-inbox.org/git/20181209104419.12639-20-pclouds@gmail.com/ > My commentary on this whole thing is basically a repeat of what I said > in https://public-inbox.org/git/87wop0yvxv.fsf@evledraar.gmail.com/ > > I.e. we have a definite problem here somewhere, and there is some > solution, but this patch feels a bit like navigating that maze in the > dark without a map. > > We had users report that the likes of "pull" were eating their data, but > now with this iteration of "precious" only impacting "clean" the only > problem anyone with the current semantics is still left unaddressed. My > memory (I may be wrong) is that "clean" was just brought up (by you?) as > a "what about this other related case?" in that whole discussion. > > So as noted in the E-Mail linked above I think the first step should be > to enumerate/document/test the cases where we're now eating data > implicitly, and discuss how that relates to the semantics we desired > when the data-eating behavior was first introduced (as noted in E-Mails > linked from the above, my own preliminary digging seems to reveal there > isn't much of a relationship between the two). > > Only when we have that list of XYZ cases we're supporting now, and can > see that XYZ is so important to maintain backwards compatibility for > that we can't change it should way say "we eat your data by default > because XYZ is so useful/backcompat, set 'precious' ...". > > But right now we don't even have the list of XYZ or tests for them (as > my RFC "garbage" attribute patch revealed). So this whole thing still > feels like jumping three steps ahead to me in terms of addressing *that* > issue, but perhaps you have some orthogonal use-case in mind for this? I'm not addressing the accidentally losing data in this patch. My answer for that would still be backup-log, if it ever gets merged. But this patch is about _known_ files that I want to keep when doing "git clean", no more.
On Sun, Feb 17 2019, Duy Nguyen wrote: > On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> >> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote: >> >> [Re-CC some people involved the last time around] >> >> > A new attribute "precious" is added to indicate that certain files >> > have valuable content and should not be easily discarded even if they >> > are ignored or untracked. >> > >> > So far there are one part of Git that are made aware of precious >> > files: "git clean" will leave precious files alone. >> >> Thanks for bringing this up again. There were also some patches recently >> to save away clobbered files, do you/anyone else have any end goal in >> mind here that combines this & that, or some other thing I may not have >> kept up with? > > I assume you mean the clobbering untracked files by merge/checkout. > Those files will be backed up [1] if backup-log is implemented. Even > files deleted by "git clean" could be saved but that might go a little > too far. And I suppose if we have some mechanism for "don't backup but error out if it was detectes as needed" we'll have the equivalent of inverting the merge/checkout --force behavior now (& my "garbage" patch), i.e. we'd stall on potential clobbering and need to carry on with --force. > [1] https://public-inbox.org/git/20181209104419.12639-20-pclouds@gmail.com/ > >> My commentary on this whole thing is basically a repeat of what I said >> in https://public-inbox.org/git/87wop0yvxv.fsf@evledraar.gmail.com/ >> >> I.e. we have a definite problem here somewhere, and there is some >> solution, but this patch feels a bit like navigating that maze in the >> dark without a map. >> >> We had users report that the likes of "pull" were eating their data, but >> now with this iteration of "precious" only impacting "clean" the only >> problem anyone with the current semantics is still left unaddressed. My >> memory (I may be wrong) is that "clean" was just brought up (by you?) as >> a "what about this other related case?" in that whole discussion. >> >> So as noted in the E-Mail linked above I think the first step should be >> to enumerate/document/test the cases where we're now eating data >> implicitly, and discuss how that relates to the semantics we desired >> when the data-eating behavior was first introduced (as noted in E-Mails >> linked from the above, my own preliminary digging seems to reveal there >> isn't much of a relationship between the two). >> >> Only when we have that list of XYZ cases we're supporting now, and can >> see that XYZ is so important to maintain backwards compatibility for >> that we can't change it should way say "we eat your data by default >> because XYZ is so useful/backcompat, set 'precious' ...". >> >> But right now we don't even have the list of XYZ or tests for them (as >> my RFC "garbage" attribute patch revealed). So this whole thing still >> feels like jumping three steps ahead to me in terms of addressing *that* >> issue, but perhaps you have some orthogonal use-case in mind for this? > > I'm not addressing the accidentally losing data in this patch. My > answer for that would still be backup-log, if it ever gets merged. But > this patch is about _known_ files that I want to keep when doing "git > clean", no more. Indeed. My concern is that we're making incremental steps without a clear idea of the end state, and once we get there we might find that some steps along the way box us in or weren't what we wanted to solve the overall UX issue. More so in the CL / commit message not describing where we are overall, where this patch fits in (backup log, etc.) than this whole thing (backup log is already 24 patches) needing to be sent as one giant series...
On Mon, Feb 18, 2019 at 4:53 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Sun, Feb 17 2019, Duy Nguyen wrote: > > > On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > >> > >> > >> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote: > >> > >> [Re-CC some people involved the last time around] > >> > >> > A new attribute "precious" is added to indicate that certain files > >> > have valuable content and should not be easily discarded even if they > >> > are ignored or untracked. > >> > > >> > So far there are one part of Git that are made aware of precious > >> > files: "git clean" will leave precious files alone. > >> > >> Thanks for bringing this up again. There were also some patches recently > >> to save away clobbered files, do you/anyone else have any end goal in > >> mind here that combines this & that, or some other thing I may not have > >> kept up with? > > > > I assume you mean the clobbering untracked files by merge/checkout. > > Those files will be backed up [1] if backup-log is implemented. Even > > files deleted by "git clean" could be saved but that might go a little > > too far. > > And I suppose if we have some mechanism for "don't backup but error out > if it was detectes as needed" we'll have the equivalent of inverting the > merge/checkout --force behavior now (& my "garbage" patch), i.e. we'd > stall on potential clobbering and need to carry on with --force. Yes that's another way to go. > > [1] https://public-inbox.org/git/20181209104419.12639-20-pclouds@gmail.com/ > > > >> My commentary on this whole thing is basically a repeat of what I said > >> in https://public-inbox.org/git/87wop0yvxv.fsf@evledraar.gmail.com/ > >> > >> I.e. we have a definite problem here somewhere, and there is some > >> solution, but this patch feels a bit like navigating that maze in the > >> dark without a map. > >> > >> We had users report that the likes of "pull" were eating their data, but > >> now with this iteration of "precious" only impacting "clean" the only > >> problem anyone with the current semantics is still left unaddressed. My > >> memory (I may be wrong) is that "clean" was just brought up (by you?) as > >> a "what about this other related case?" in that whole discussion. > >> > >> So as noted in the E-Mail linked above I think the first step should be > >> to enumerate/document/test the cases where we're now eating data > >> implicitly, and discuss how that relates to the semantics we desired > >> when the data-eating behavior was first introduced (as noted in E-Mails > >> linked from the above, my own preliminary digging seems to reveal there > >> isn't much of a relationship between the two). > >> > >> Only when we have that list of XYZ cases we're supporting now, and can > >> see that XYZ is so important to maintain backwards compatibility for > >> that we can't change it should way say "we eat your data by default > >> because XYZ is so useful/backcompat, set 'precious' ...". > >> > >> But right now we don't even have the list of XYZ or tests for them (as > >> my RFC "garbage" attribute patch revealed). So this whole thing still > >> feels like jumping three steps ahead to me in terms of addressing *that* > >> issue, but perhaps you have some orthogonal use-case in mind for this? > > > > I'm not addressing the accidentally losing data in this patch. My > > answer for that would still be backup-log, if it ever gets merged. But > > this patch is about _known_ files that I want to keep when doing "git > > clean", no more. > > Indeed. My concern is that we're making incremental steps without a > clear idea of the end state, and once we get there we might find that > some steps along the way box us in or weren't what we wanted to solve > the overall UX issue. > > More so in the CL / commit message not describing where we are overall, > where this patch fits in (backup log, etc.) than this whole thing > (backup log is already 24 patches) needing to be sent as one giant > series... Not seeing the overall picture is probably because I don't have any. Both backup-log and this precious attribute are more at plumbing level that could be combined to make something, but no I don't have the whole final UX worked out. The end game for backup log may be "git undo" (or just --undo option across commands). The "precious" attribute does not exactly have any role in "git undo" picture. It's just refining the file classification (untracked, tracked, ignored) that we have although it could be used as hints for "git undo". That's all I got.
Duy Nguyen <pclouds@gmail.com> writes: > On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> >> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote: >> >> [Re-CC some people involved the last time around] >> >> > A new attribute "precious" is added to indicate that certain files >> > have valuable content and should not be easily discarded even if they >> > are ignored or untracked. >> > >> > So far there are one part of Git that are made aware of precious >> > files: "git clean" will leave precious files alone. >> >> Thanks for bringing this up again. There were also some patches recently >> to save away clobbered files, do you/anyone else have any end goal in >> mind here that combines this & that, or some other thing I may not have >> kept up with? > > I assume you mean the clobbering untracked files by merge/checkout. > Those files will be backed up [1] if backup-log is implemented. Even > files deleted by "git clean" could be saved but that might go a little > too far. I agree with Ævar that it is a very good idea to ask what the endgame should look like. I would have expected that, with an introduction of new "ignored but unexpendable" class of file (i.e. "precious" here), operations such as merge and checkout will be updated to keep them in situations where we would remove "ignored and expendable" files (i.e. "ignored"). And it is perfectly OK if the very first introduction of the "precious" support begins only with a single operation, such as "clean", as long as the end-goal is clear. I personally do not believe in "backup log"; if we can screw up and can fail to stop an operation that must avoid losing info, then we can screw up the same way and fail to design and implement "backup" to save info before an operation loses it. If we do a good job in supporting "precious" in various operations, we can rely less on "backup log" and still be safe ;-)
On Wed, Feb 20, 2019 at 1:08 AM Junio C Hamano <gitster@pobox.com> wrote: > > Duy Nguyen <pclouds@gmail.com> writes: > > > On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > >> > >> > >> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote: > >> > >> [Re-CC some people involved the last time around] > >> > >> > A new attribute "precious" is added to indicate that certain files > >> > have valuable content and should not be easily discarded even if they > >> > are ignored or untracked. > >> > > >> > So far there are one part of Git that are made aware of precious > >> > files: "git clean" will leave precious files alone. > >> > >> Thanks for bringing this up again. There were also some patches recently > >> to save away clobbered files, do you/anyone else have any end goal in > >> mind here that combines this & that, or some other thing I may not have > >> kept up with? > > > > I assume you mean the clobbering untracked files by merge/checkout. > > Those files will be backed up [1] if backup-log is implemented. Even > > files deleted by "git clean" could be saved but that might go a little > > too far. > > I agree with Ævar that it is a very good idea to ask what the > endgame should look like. I would have expected that, with an > introduction of new "ignored but unexpendable" class of file > (i.e. "precious" here), operations such as merge and checkout will > be updated to keep them in situations where we would remove "ignored > and expendable" files (i.e. "ignored"). And it is perfectly OK if > the very first introduction of the "precious" support begins only > with a single operation, such as "clean", as long as the end-goal is > clear. I think the sticking point is how to deal with the surprise factor and "precious" will not help at all in this aspect. In my mind there are three classes - total expectation, i know i want git to not touch some files, i tell git so (e.g. with "precious") - surprises sometimes, but in known classes. This is the main use case of backup log, where I may accidentally do "git commit -amsomething" after carefully preparing the index. Saving overwritten files by merge/checkout could be done here as an alternative to "garbage" attribute. > I personally do not believe in "backup log"; if we can screw up and > can fail to stop an operation that must avoid losing info, then we > can screw up the same way and fail to design and implement "backup" > to save info before an operation loses it. If we do a good job in > supporting "precious" in various operations, we can rely less on > "backup log" and still be safe ;-) and this is the third class, something completely unexpected. Yes backup-log can't help here, but I don't think "precious" can either. And I have no good proposal for this case.
On February 20, 2019 2:35:41 AM GMT+01:00, Duy Nguyen <pclouds@gmail.com> wrote: >On Wed, Feb 20, 2019 at 1:08 AM Junio C Hamano <gitster@pobox.com> >wrote: >> >> Duy Nguyen <pclouds@gmail.com> writes: >> >> > On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason >> > <avarab@gmail.com> wrote: >> >> >> >> >> >> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote: >> >> >> >> [Re-CC some people involved the last time around] >> >> >> >> > A new attribute "precious" is added to indicate that certain >files >> >> > have valuable content and should not be easily discarded even if >they >> >> > are ignored or untracked. >> >> > >> >> > So far there are one part of Git that are made aware of precious >> >> > files: "git clean" will leave precious files alone. >> >> >> >> Thanks for bringing this up again. There were also some patches >recently >> >> to save away clobbered files, do you/anyone else have any end goal >in >> >> mind here that combines this & that, or some other thing I may not >have >> >> kept up with? >> > >> > I assume you mean the clobbering untracked files by merge/checkout. >> > Those files will be backed up [1] if backup-log is implemented. >Even >> > files deleted by "git clean" could be saved but that might go a >little >> > too far. >> >> I agree with Ævar that it is a very good idea to ask what the >> endgame should look like. I would have expected that, with an >> introduction of new "ignored but unexpendable" class of file >> (i.e. "precious" here), operations such as merge and checkout will >> be updated to keep them in situations where we would remove "ignored >> and expendable" files (i.e. "ignored"). And it is perfectly OK if >> the very first introduction of the "precious" support begins only >> with a single operation, such as "clean", as long as the end-goal is >> clear. > >I think the sticking point is how to deal with the surprise factor and >"precious" will not help at all in this aspect. In my mind there are >three classes > > - total expectation, i know i want git to not touch some files, i >tell git so (e.g. with "precious") > > - surprises sometimes, but in known classes. This is the main use >case of backup log, where I may accidentally do "git commit >-amsomething" after carefully preparing the index. Saving overwritten >files by merge/checkout could be done here as an alternative to >"garbage" attribute. > >> I personally do not believe in "backup log"; if we can screw up and >> can fail to stop an operation that must avoid losing info, then we >> can screw up the same way and fail to design and implement "backup" >> to save info before an operation loses it. If we do a good job in >> supporting "precious" in various operations, we can rely less on >> "backup log" and still be safe ;-) > >and this is the third class, something completely unexpected. Yes >backup-log can't help here, but I don't think "precious" can either. >And I have no good proposal for this case. Sorry for going off on a tangent here, but I have had this on my mind for a long time. For cases where merge can lead to loss of a non-ignored untracked file (t7607-merge-overwrite.sh), I have the following proposal: 1. Merge the ORIG_HEAD and MERGE_HEAD commits without touching the index or the work tree. This is where we do rename detection, recursive merge, and content (line-by-line) merge. The result is CHECKOUT_HEAD, a tree with possible merge conflicts. For the switch branch operation CHECKOUT_HEAD is the tree to switch to. The remaining steps are the same for merge and switch branch operations. 2. Merge CHECKOUT_HEAD and the index with ORIG_HEAD as the merge base. The result is the CHECKOUT_INDEX. Do this in order to keep staged changes which are not affected by the merge. Do not do rename detection or content merge. In case of conflict, rollback and error out. 3. Merge CHECKOUT_INDEX with the work tree with the original index as merge base. Do this to simulate the work tree update. Dp not do remame detection or content merge. A conflict means that the checkout operation would touch untracked files or files with unstaged changes. In case of such a conflict, rollback and error out. I believe this algorithm would behave much like the current implementation. But it separates the rename/history/content aspects of the merge algorithm from the checkout operation. It greatly simplifies the implementation of the checkout operation and there are no special cases where we lose files. Implementing step 1 is the tricky part. But it may still be worthwhile because the merge algorithm does not have to worry about staged changes or unstaged changes. The merge algorithm could work on the hierarchical tree structure instead of the flattened index. This makes it trivial to detect directory/file conflicts (no need to do a lookahead when iterating index files). This is also a better fit for detecting directory renames. Maybe this will allow us to focus more on rename detection, such as directory renames or moved functions [*1*]. [*1*] Also: moved files where the original file is replaced with a wrapper for the moved file always fools rename detection because we don't detect renames for files which were not removed.
On Tue, Feb 19 2019, Junio C Hamano wrote: > Duy Nguyen <pclouds@gmail.com> writes: > >> On Sun, Feb 17, 2019 at 2:36 AM Ævar Arnfjörð Bjarmason >> <avarab@gmail.com> wrote: >>> >>> >>> On Sat, Feb 16 2019, Nguyễn Thái Ngọc Duy wrote: >>> >>> [Re-CC some people involved the last time around] >>> >>> > A new attribute "precious" is added to indicate that certain files >>> > have valuable content and should not be easily discarded even if they >>> > are ignored or untracked. >>> > >>> > So far there are one part of Git that are made aware of precious >>> > files: "git clean" will leave precious files alone. >>> >>> Thanks for bringing this up again. There were also some patches recently >>> to save away clobbered files, do you/anyone else have any end goal in >>> mind here that combines this & that, or some other thing I may not have >>> kept up with? >> >> I assume you mean the clobbering untracked files by merge/checkout. >> Those files will be backed up [1] if backup-log is implemented. Even >> files deleted by "git clean" could be saved but that might go a little >> too far. > > I agree with Ævar that it is a very good idea to ask what the > endgame should look like. I would have expected that, with an > introduction of new "ignored but unexpendable" class of file > (i.e. "precious" here), operations such as merge and checkout will > be updated to keep them in situations where we would remove "ignored > and expendable" files (i.e. "ignored"). And it is perfectly OK if > the very first introduction of the "precious" support begins only > with a single operation, such as "clean", as long as the end-goal is > clear. FWIW I'm in full agreement with that. > I personally do not believe in "backup log"; if we can screw up and > can fail to stop an operation that must avoid losing info, then we > can screw up the same way and fail to design and implement "backup" > to save info before an operation loses it. Yes, there could be some unforseen interaction between git commands where we should have such a backup log, but did not think to implement it. I'd hope such cases would be reported, and we could fix them. But those sorts of cases aren't why we started discussing this, rather we *know* what the data shredding command interaction is, but there wasn't a consensus for just not shredding data by default by making users use "checkout -f" or "merge -f" to proceed. I.e. taking some variant of my "trashable" patch[1]. > If we do a good job in > supporting "precious" in various operations, we can rely less on > "backup log" and still be safe ;-) Is noted in previous discussions[2] I think that's entirely implausible. I think at best the "precious" facility will be used to mark e.g *.o files as "don't check in, but don't clean (Makefile handles it)". Most git users are at the level of only knowing very basic add/commit/pull/push command interaction. I feel strongly that we need to make our tools safe to use by default, and not require some relatively advanced "precious"/attribute facility to be carefully configured in advance so we don't throw away uncommitted work on the likes of merge/checkout. 1. https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/ 2. https://public-inbox.org/git/871s7r4wuv.fsf@evledraar.gmail.com/
On 20.02.19 10:19, Ævar Arnfjörð Bjarmason wrote: > Most git users are at the level of only knowing very basic > add/commit/pull/push command interaction. I feel strongly that we need > to make our tools safe to use by default, and not require some > relatively advanced "precious"/attribute facility to be carefully > configured in advance so we don't throw away uncommitted work on the > likes of merge/checkout. > > 1. https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/ > 2. https://public-inbox.org/git/871s7r4wuv.fsf@evledraar.gmail.com/ > +1 Please consider that silently deleting files is a no-go. I teach computer science, and our switch from subversion to git for our second year programming projects caused a lot of grief, so much that my colleagues consider switching back to subversion as the point of first contact with revisioning. Silently deleting partially revisioned files is a major source: students regularly destroy IDE or OS specific config files that they cannot restore themselves. (Project participants use all kinds of different IDEs on different OSs and thus have all kinds of weird hidden files that always manage to get checked into the repository, wreaking havoc on another's machine. So they get deleted and thus disturb the student that needed those files.) We do provide a huge .gitignore that ought to prevent this, but despite numerous warnings they only add it later, which then causes previously checked-in files to be lost upon switching between branches. Please, by default, issue at least a warning before files are irrevocably los - or maybe keep a local snapshot of everything for the last few checkout in order to undo them? Thanks, Steffen.
On Wed, Feb 20, 2019 at 4:19 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > I personally do not believe in "backup log"; if we can screw up and > > can fail to stop an operation that must avoid losing info, then we > > can screw up the same way and fail to design and implement "backup" > > to save info before an operation loses it. > > Yes, there could be some unforseen interaction between git commands > where we should have such a backup log, but did not think to implement > it. I'd hope such cases would be reported, and we could fix them. > > But those sorts of cases aren't why we started discussing this, rather > we *know* what the data shredding command interaction is, but there > wasn't a consensus for just not shredding data by default by making > users use "checkout -f" or "merge -f" to proceed. I.e. taking some > variant of my "trashable" patch[1]. > > > If we do a good job in > > supporting "precious" in various operations, we can rely less on > > "backup log" and still be safe ;-) > > Is noted in previous discussions[2] I think that's entirely > implausible. I think at best the "precious" facility will be used to > mark e.g *.o files as "don't check in, but don't clean (Makefile handles > it)". > > Most git users are at the level of only knowing very basic > add/commit/pull/push command interaction. I feel strongly that we need > to make our tools safe to use by default, and not require some > relatively advanced "precious"/attribute facility to be carefully > configured in advance so we don't throw away uncommitted work on the > likes of merge/checkout. There is a trade off somewhere. "new user first" should not come at the cost for more experienced users. Making "git checkout/merge" abort while it's working before breaks scripts. And requiring to mark trashable files manually duplicates a lot of ignore patterns. Have a look at any .gitignore file, the majority of them is for discardable files because "ignored" class was created with those in mind (*.o and friends). So now you would need to add more or less the same set of ignore rules in .gitattributes to mark them trashable, and gitignore/gitattributes rules are not exactly compatible, you can't just blindly copy them over. Every time you add one more .gitignore rule, there's a good chance you need to add a similar rule for trashable attribute. Maybe we just add a new "newbie" config knob and turn on the safety nets on. Leave the knob on by default. And I will turn it off in my ~/.gitconfig as soon as it's real.
On Wed, Feb 20 2019, Duy Nguyen wrote: > On Wed, Feb 20, 2019 at 4:19 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> > I personally do not believe in "backup log"; if we can screw up and >> > can fail to stop an operation that must avoid losing info, then we >> > can screw up the same way and fail to design and implement "backup" >> > to save info before an operation loses it. >> >> Yes, there could be some unforseen interaction between git commands >> where we should have such a backup log, but did not think to implement >> it. I'd hope such cases would be reported, and we could fix them. >> >> But those sorts of cases aren't why we started discussing this, rather >> we *know* what the data shredding command interaction is, but there >> wasn't a consensus for just not shredding data by default by making >> users use "checkout -f" or "merge -f" to proceed. I.e. taking some >> variant of my "trashable" patch[1]. >> >> > If we do a good job in >> > supporting "precious" in various operations, we can rely less on >> > "backup log" and still be safe ;-) >> >> Is noted in previous discussions[2] I think that's entirely >> implausible. I think at best the "precious" facility will be used to >> mark e.g *.o files as "don't check in, but don't clean (Makefile handles >> it)". >> >> Most git users are at the level of only knowing very basic >> add/commit/pull/push command interaction. I feel strongly that we need >> to make our tools safe to use by default, and not require some >> relatively advanced "precious"/attribute facility to be carefully >> configured in advance so we don't throw away uncommitted work on the >> likes of merge/checkout. > > There is a trade off somewhere. "new user first" should not come at > the cost for more experienced users. > > Making "git checkout/merge" abort while it's working before breaks > scripts. And requiring to mark trashable files manually duplicates a > lot of ignore patterns. Have a look at any .gitignore file, the > majority of them is for discardable files because "ignored" class was > created with those in mind (*.o and friends). So now you would need to > add more or less the same set of ignore rules in .gitattributes to > mark them trashable, and gitignore/gitattributes rules are not exactly > compatible, you can't just blindly copy them over. Every time you add > one more .gitignore rule, there's a good chance you need to add a > similar rule for trashable attribute. > > Maybe we just add a new "newbie" config knob and turn on the safety > nets on. Leave the knob on by default. And I will turn it off in my > ~/.gitconfig as soon as it's real. Oh yes, as noted upthread ("My commentary on this whole thing..."[1] ) my position on what we should do at this point is not that we should definitely go one way or the other, but that more investigation is needed. As my "trashable"[2] patch makes clear we don't even have good tests or documentation for these cases, which would be a good first step. The one thing that *is* clear from my digging a few months back is that the behavior we have now in git is overzealous when we look at the initial case reported by Shawn way back when it was added. Specifically, the intention back in 2007 was to fix a case where "git checkout" ("read-tree -m", but whatever) would barf on a branch switch where switching needed to replace a *tracked* "smth" with a *tracked* "smth/file", or the other way around[3]. Does that mean we can just back that behavior out? No, because people might have come to rely on it, but we should start with seeing exactly what it *does* do, whether all those things are important or intended, and maybe we can weight the shredding/backcompat trade-off for some of those differently than others. So the obvious thing to try would be to see if we can narrowly keep the behavior where we end up shredding a file on disk, *but* are switching between two trees A & B where that have/don't have that file. Or more generously, try to "git hash-object" arbitrary files we're about to shred, and check if it's already in the object database. That would catch case where e.g. the user switching from A->B and would shred a file, but it (or conflicting dir) is known to neither "A" nor "B", but exists as a checked-in file in unrelated commit "C", which the user recently had checked out (and e.g. their editor auto saved it as-is or something...). But it's entirely possible that after all that digging we'll come to the conclusion that we can't change this at all, and we're just going to live with all the current caveats. That doesn't mean that having what amounts to a power user feature to mitigate that damage if you know git well enough that it's going to be a problem is going to help anything but a small minority of users. So "dude, where's my data?" problem will still exist. Even then there room to maneuver, e.g.: X. Perhaps after investigating it's not acceptable to change the default for script use, but could we require --force if we detect that we're connected to a terminal? Y. Or if even that is considered too much, we could have something like how help.autoCorrect works, where if we detect we're about to eat data we wait for 10 seconds, and invite the user to Ctrl+C now because we're about to clobber file "xyz". Z. It's for whatever reason still unacceptable to do X or Y (or some similar mitigation) for all cases of file shredding, but would be OK for some specific sub-cases (e.g. the not known to git-hash-object case above), and we have reason to suspect that such a narrow mitigation strikes the right trade-off between backwards compatibility and preventing the "dude, where's my data?" reports we get about this periodically. 1. https://public-inbox.org/git/87wolzo7a1.fsf@evledraar.gmail.com/ 2. https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/ 3. https://public-inbox.org/git/87wopj3661.fsf@evledraar.gmail.com/
On February 20, 2019 10:41:51 AM GMT+01:00, Duy Nguyen <pclouds@gmail.com> wrote: >Making "git checkout/merge" abort while it's working before breaks >scripts. Change is always a trade-off. We should not reject change without considering the merits. Once we agree on the desired state, we can think about the migration strategy. >And requiring to mark trashable files manually duplicates a >lot of ignore patterns. Have a look at any .gitignore file, the >majority of them is for discardable files because "ignored" class was >created with those in mind (*.o and friends). So now you would need to >add more or less the same set of ignore rules in .gitattributes to >mark them trashable, and gitignore/gitattributes rules are not exactly >compatible, you can't just blindly copy them over. Every time you add >one more .gitignore rule, there's a good chance you need to add a >similar rule for trashable attribute. I agree that ignored precious files are typically a small subset of the ignore files. Maintaining separate rules for ignored files and for trashable files would result in a lot of duplication. On the other hand, how frequently do we really have to trash ignored files? Trashing a file should only be necessary if a tracked file overwrites an ignored file. When does this happen? I don't think it will happen for *.o files. So in most cases, there is simply no need to specify which files are precious. The default could simply be that all files are precious. To support more complex use cases, we could specify precious files in addition to ignored files. Only if we specify precious files (and/or enable the ignored-are-trashable config option on a repository level), all other files become trashable. Functionally this is equivalent the newbie option which you suggest, but I think it is not an issue of newbie vs experienced users but an issue of common vs special use cases. >Maybe we just add a new "newbie" config knob and turn on the safety >nets on. Leave the knob on by default. And I will turn it off in my >~/.gitconfig as soon as it's real.
Duy Nguyen <pclouds@gmail.com> writes: > - surprises sometimes, but in known classes. This is the main use > case of backup log, where I may accidentally do "git commit > -amsomething" after carefully preparing the index. Saving overwritten > files by merge/checkout could be done here as an alternative to > "garbage" attribute. The problem with either of these is not the "saving" half, but "restoring". For different people, and for different cases even to the same person, granularity of what is perceived as a single action is different, so "give me the state of my working tree files before the last operation" is a request that does not have a good definition. After finishing "git rebase" that replays 3 changes, one of which needs manual conflict resolution, you may realize that you made an incorrect resolution for one path but not others. How would you let the user say "no, I do not want to undo the whole rebase, I want to go back to the state where I replayed the first change, saw the conflicts while replaying the second change, and resolved them in these files, but before touching that last one I screwed up resolving, so that I can correct"? Piling many "backups" (or "snapshots") on top of each other is the easier part; I'd expect that it would be a much harder design problem to let users make use of them in meaningful ways, and that is the primary reason why I am skeptical.
Duy Nguyen <pclouds@gmail.com> writes: > There is a trade off somewhere. "new user first" should not come at > the cost for more experienced users. Probably. Nobody will stay being newbie forever. > Making "git checkout/merge" abort while it's working before breaks > scripts. And requiring to mark trashable files manually duplicates a > lot of ignore patterns. Have a look at any .gitignore file, the > majority of them is for discardable files because "ignored" class was > created with those in mind (*.o and friends). Very true. That is why we were OK for so long with "ignored" that means "ignored and expendable". We know in some situations we want "ignored but precious", and that is why we are discussing this topic. > So now you would need to > add more or less the same set of ignore rules in .gitattributes to > mark them trashable, and gitignore/gitattributes rules are not exactly > compatible, you can't just blindly copy them over. Every time you add > one more .gitignore rule, there's a good chance you need to add a > similar rule for trashable attribute. I am not sure why you would even need to _duplicate_. Are you saying for each and every rule that specify "ignored and expendable" in .gitignore there always will be "ignored but precious" exception that match the pattern? Given that we have been OK for so long without even needing "precious", I find it somewhat unrealistic to assume so.
On Thu, Feb 21, 2019 at 5:39 AM Junio C Hamano <gitster@pobox.com> wrote: > > So now you would need to > > add more or less the same set of ignore rules in .gitattributes to > > mark them trashable, and gitignore/gitattributes rules are not exactly > > compatible, you can't just blindly copy them over. Every time you add > > one more .gitignore rule, there's a good chance you need to add a > > similar rule for trashable attribute. > > I am not sure why you would even need to _duplicate_. > > Are you saying for each and every rule that specify "ignored and > expendable" in .gitignore there always will be "ignored but > precious" exception that match the pattern? Given that we have been > OK for so long without even needing "precious", I find it somewhat > unrealistic to assume so. If all ignored files are now redefined as precious and we mark them expendable with trashable attribute, then we need to duplicate most of the rules. The "precious" attribute of course does not have this problem since precious-and-ignored files should be rare.
On Wed, Feb 20, 2019 at 6:11 PM Clemens Buchacher <drizzd@gmx.net> wrote: > >And requiring to mark trashable files manually duplicates a > >lot of ignore patterns. Have a look at any .gitignore file, the > >majority of them is for discardable files because "ignored" class was > >created with those in mind (*.o and friends). So now you would need to > >add more or less the same set of ignore rules in .gitattributes to > >mark them trashable, and gitignore/gitattributes rules are not exactly > >compatible, you can't just blindly copy them over. Every time you add > >one more .gitignore rule, there's a good chance you need to add a > >similar rule for trashable attribute. > > I agree that ignored precious files are typically a small subset of the ignore files. Maintaining separate rules for ignored files and for trashable files would result in a lot of duplication. > > On the other hand, how frequently do we really have to trash ignored files? Trashing a file should only be necessary if a tracked file overwrites an ignored file. When does this happen? I don't think it will happen for *.o files. So in most cases, there is simply no need to specify which files are precious. The default could simply be that all files are precious. > > To support more complex use cases, we could specify precious files in addition to ignored files. Only if we specify precious files (and/or enable the ignored-are-trashable config option on a repository level), all other files become trashable. > > Functionally this is equivalent the newbie option which you suggest, but I think it is not an issue of newbie vs experienced users but an issue of common vs special use cases. So far the two conflicting cases are "git checkout/merge" and "git clean". Ignored files are valuable by default in the former, while it's expendable by default in the latter. So if you add this ignored-are-trashable config key (defaults to false), git-clean -if will not do anything anymore. We _could_ advice the user to turn the config on (with all the consequences). I don't know if we have any other use cases that deserve the same advice. Another option is simply leave the expendable/precious nature of ignored files undefined like it is now and handle case by case: - git-clean learns to use a new attribute "clean". Undefined attribute is seen as +clean. To keep some files "precious" you update .gitattributes and add -clean rules. - git merge/checkout learns another attribute, checkout-overwrite-ignore? Undefined attribute is seen as -checkout-overwrite-ignore (i.e. abort the operation). We stay away from any generic attribute name in this direction to make clear it's only applicable to specific use cases.
Duy Nguyen <pclouds@gmail.com> writes: > On Thu, Feb 21, 2019 at 5:39 AM Junio C Hamano <gitster@pobox.com> wrote: >> > So now you would need to >> > add more or less the same set of ignore rules in .gitattributes to >> > mark them trashable, and gitignore/gitattributes rules are not exactly >> > compatible, you can't just blindly copy them over. Every time you add >> > one more .gitignore rule, there's a good chance you need to add a >> > similar rule for trashable attribute. >> >> I am not sure why you would even need to _duplicate_. >> >> Are you saying for each and every rule that specify "ignored and >> expendable" in .gitignore there always will be "ignored but >> precious" exception that match the pattern? Given that we have been >> OK for so long without even needing "precious", I find it somewhat >> unrealistic to assume so. > > If all ignored files are now redefined as precious and we mark them > expendable with trashable attribute, then we need to duplicate most of > the rules. The "precious" attribute of course does not have this > problem since precious-and-ignored files should be rare. Ah, so you are saying that ignored (the traditional one we always had) plus precious is a better combination than ignored (repurposed to mean ignored-but-precious) plus trashable, because the latter will cause people configure with a lot of duplication? If that is the case, I'd agree ;-)
diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt index 03056dad0d..a9beadfb12 100644 --- a/Documentation/git-clean.txt +++ b/Documentation/git-clean.txt @@ -21,7 +21,8 @@ option is specified, ignored files are also removed. This can, for example, be useful to remove all build products. If any optional `<path>...` arguments are given, only those paths -are affected. +are affected. Ignored or untracked files with `precious` attributes +are not removed. OPTIONS ------- diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 9b41f81c06..1f2e830241 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -1192,6 +1192,17 @@ If this attribute is not set or has an invalid value, the value of the (See linkgit:git-config[1]). +Precious files +~~~~~~~~~~~~~~ + +`precious` +^^^^^^^^^^ + +This attribute is set on files to indicate that their content is +valuable. Some commands will behave slightly different on precious +files. linkgit:git-clean[1] will leave precious files alone. + + USING MACRO ATTRIBUTES ---------------------- diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt index 1c94f08ff4..0e9614289e 100644 --- a/Documentation/gitignore.txt +++ b/Documentation/gitignore.txt @@ -141,6 +141,10 @@ not tracked by Git remain untracked. To stop tracking a file that is currently tracked, use 'git rm --cached'. +Ignored files are generally considered discardable. See `precious` +attribute in linkgit:gitattributes[5] to change the behavior regarding +ignored files. + EXAMPLES -------- diff --git a/attr.c b/attr.c index fdd110bec5..6349410e14 100644 --- a/attr.c +++ b/attr.c @@ -1157,3 +1157,15 @@ void attr_start(void) pthread_mutex_init(&g_attr_hashmap.mutex, NULL); pthread_mutex_init(&check_vector.mutex, NULL); } + +int is_precious_file(struct index_state *istate, const char *path) +{ + static struct attr_check *check; + if (!check) + check = attr_check_initl("precious", NULL); + if (!check) + return 0; + + git_check_attr(istate, path, check); + return ATTR_TRUE(check->items[0].value); +} diff --git a/attr.h b/attr.h index b0378bfe5f..b9a9751a66 100644 --- a/attr.h +++ b/attr.h @@ -82,4 +82,6 @@ void git_attr_set_direction(enum git_attr_direction new_direction); void attr_start(void); +int is_precious_file(struct index_state *istate, const char *path); + #endif /* ATTR_H */ diff --git a/builtin/clean.c b/builtin/clean.c index aaba4af3c2..47ec5e58cf 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -18,6 +18,7 @@ #include "color.h" #include "pathspec.h" #include "help.h" +#include "attr.h" static int force = -1; /* unset */ static int interactive; @@ -31,6 +32,8 @@ static const char *const builtin_clean_usage[] = { static const char *msg_remove = N_("Removing %s\n"); static const char *msg_would_remove = N_("Would remove %s\n"); +static const char *msg_skip_precious = N_("Skipping precious file %s\n"); +static const char *msg_would_skip_precious = N_("Would skip precious file %s\n"); static const char *msg_skip_git_dir = N_("Skipping repository %s\n"); static const char *msg_would_skip_git_dir = N_("Would skip repository %s\n"); static const char *msg_warn_remove_failed = N_("failed to remove %s"); @@ -154,6 +157,7 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag, struct dirent *e; int res = 0, ret = 0, gone = 1, original_len = path->len, len; struct string_list dels = STRING_LIST_INIT_DUP; + const char *rel_path; *dir_gone = 1; @@ -193,9 +197,16 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag, strbuf_setlen(path, len); strbuf_addstr(path, e->d_name); - if (lstat(path->buf, &st)) + if (lstat(path->buf, &st)) { ; /* fall thru */ - else if (S_ISDIR(st.st_mode)) { + } else if ((!prefix && is_precious_file(&the_index, path->buf)) || + (prefix && skip_prefix(path->buf, prefix, &rel_path) && + is_precious_file(&the_index, rel_path))) { + quote_path_relative(path->buf, prefix, "ed); + printf(dry_run ? _(msg_would_skip_precious) : _(msg_skip_precious), quoted.buf); + *dir_gone = 0; + continue; + } else if (S_ISDIR(st.st_mode)) { if (remove_dirs(path, prefix, force_flag, dry_run, quiet, &gone)) ret = 1; if (gone) { @@ -1019,7 +1030,10 @@ int cmd_clean(int argc, const char **argv, const char *prefix) if (lstat(abs_path.buf, &st)) continue; - if (S_ISDIR(st.st_mode)) { + if (is_precious_file(&the_index, item->string)) { + qname = quote_path_relative(item->string, NULL, &buf); + printf(dry_run ? _(msg_would_skip_precious) : _(msg_skip_precious), qname); + } else if (S_ISDIR(st.st_mode)) { if (remove_dirs(&abs_path, prefix, rm_flags, dry_run, quiet, &gone)) errors++; if (gone && !quiet) { diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 7b36954d63..a478a08a27 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -669,4 +669,33 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files' test_path_is_missing foo/b/bb ' +test_expect_success 'git clean -xd leaves precious files alone' ' + git init precious && + ( + cd precious && + test_commit one && + cat >.gitignore <<-\EOF && + *.o + *.mak + EOF + cat >.gitattributes <<-\EOF && + *.mak precious + .gitattributes precious + *.precious precious + EOF + mkdir sub && + touch one.o sub/two.o one.mak sub/two.mak && + touch one.untracked two.precious sub/also.precious && + git clean -fdx && + test_path_is_missing one.o && + test_path_is_missing sub/two.o && + test_path_is_missing one.untracked && + test_path_is_file .gitattributes && + test_path_is_file one.mak && + test_path_is_file sub/two.mak && + test_path_is_file two.precious && + test_path_is_file sub/also.precious + ) +' + test_done
A new attribute "precious" is added to indicate that certain files have valuable content and should not be easily discarded even if they are ignored or untracked. So far there are one part of Git that are made aware of precious files: "git clean" will leave precious files alone. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- Documentation/git-clean.txt | 3 ++- Documentation/gitattributes.txt | 11 +++++++++++ Documentation/gitignore.txt | 4 ++++ attr.c | 12 ++++++++++++ attr.h | 2 ++ builtin/clean.c | 20 +++++++++++++++++--- t/t7300-clean.sh | 29 +++++++++++++++++++++++++++++ 7 files changed, 77 insertions(+), 4 deletions(-)