Message ID | 20181111095254.30473-1-pclouds@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] Introduce "precious" file concept | expand |
On Sun, Nov 11, 2018 at 10:53 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote: > > Also while "precious" is a fun name, but it does not sound serious. > Any suggestions? Perhaps "valuable"? "precious" is also used by POSIX make: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html Bert
[CC-ing some of the people involved in recent threads about this] On Sun, Nov 11 2018, Nguyễn Thái Ngọc Duy wrote: > Since this topic has come up twice recently, I revisited this > "precious" thingy that I started four years ago and tried to see if I > could finally finish it. There are a couple things to be sorted out... > > A new attribute "precious" is added to indicate that certain files > have valuable content and should not be easily discarded even if they > are ignored or untracked (*). > > So far there are two parts of Git that are made aware of precious > files: "git clean" will leave precious files alone and unpack-trees.c > (i.e. merges and branch switches) will not overwrite > ignored-but-precious files. > > Is there any other parts of Git that should be made aware of this > "precious" attribute? > > Also while "precious" is a fun name, but it does not sound serious. > Any suggestions? Perhaps "valuable"? > > Very lightly tested. The patch is more to have something to discuss > than is bug free and ready to use. > > (*) Note that tracked files could be marked "precious" in the future > too although the exact semantics is not very clear since tracked > files are by default precious. > > But something like "index log" could use this to record all > changes to precious files instead of just "git add -p" changes, > for example. So these files are in a sense more precious than > other tracked files. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > Documentation/git-clean.txt | 3 ++- > Documentation/gitattributes.txt | 13 +++++++++++++ > attr.c | 9 +++++++++ > attr.h | 2 ++ > builtin/clean.c | 19 ++++++++++++++++--- > unpack-trees.c | 3 ++- > 6 files changed, 44 insertions(+), 5 deletions(-) > > diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt > index 03056dad0d..a9beadfb12 100644 > --- a/Documentation/git-clean.txt > +++ b/Documentation/git-clean.txt > @@ -21,7 +21,8 @@ option is specified, ignored files are also removed. This can, for > example, be useful to remove all build products. > > If any optional `<path>...` arguments are given, only those paths > -are affected. > +are affected. Ignored or untracked files with `precious` attributes > +are not removed. > > OPTIONS > ------- > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index b8392fc330..c722479bdc 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -1188,6 +1188,19 @@ If this attribute is not set or has an invalid value, the value of the > (See linkgit:git-config[1]). > > > +Precious files > +~~~~~~~~~~~~~~~~~~~~~~~~ > + > +`precious` > +^^^^^^^^^^ > + > +This attribute is set on files to indicate that their content is > +valuable. Many commands will behave slightly different on precious > +files. linkgit:git-clean[1] will leave precious files alone. Merging > +and branch switching will not silently overwrite ignored files that > +are marked "precious". > + > + > USING MACRO ATTRIBUTES > ---------------------- > > diff --git a/attr.c b/attr.c > index 60d284796d..d06ca0ae4b 100644 > --- a/attr.c > +++ b/attr.c > @@ -1186,3 +1186,12 @@ void attr_start(void) > pthread_mutex_init(&check_vector.mutex, NULL); > #endif > } > + > +int is_precious_file(struct index_state *istate, const char *path) > +{ > + static struct attr_check *check; > + if (!check) > + check = attr_check_initl("precious", NULL); > + git_check_attr(istate, path, check); > + return check && ATTR_TRUE(check->items[0].value); > +} If we merge two branches is this using the merged post-image of .gitattributes as a source? > if (o->dir && > - is_excluded(o->dir, o->src_index, name, &dtype)) > + is_excluded(o->dir, o->src_index, name, &dtype) && > + !is_precious_file(o->src_index, name)) > /* > * ce->name is explicitly excluded, so it is Ok to > * overwrite it. I wonder if instead we should just be reverting c81935348b ("Fix switching to a branch with D/F when current branch has file D.", 2007-03-15), which these days (haven't dug deeply) would just be this, right?: > diff --git a/unpack-trees.c b/unpack-trees.c index 7570df481b..b3efaddd4f 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1894,13 +1894,6 @@ static int check_ok_to_remove(const char *name, int len, int dtype, if (ignore_case && icase_exists(o, name, len, st)) return 0; - if (o->dir && - is_excluded(o->dir, o->src_index, name, &dtype)) - /* - * ce->name is explicitly excluded, so it is Ok to - * overwrite it. - */ - return 0; if (S_ISDIR(st->st_mode)) { /* * We are checking out path "foo" and Something like the approach you're taking will absolutely work from a technical standpoint, but I fear that it's going to be useless in practice. The users who need protection against git deleting their files the most are exactly the sort of users who aren't expert-level enough to understand the nuances of how the semantics of .gitignore and "precious" are going to interact before git eats their data. This is pretty apparent from the bug reports we're getting about this. None of them are: "Hey, I 100% understood .gitignore semantics including this one part of the docs where you say you'll do this, but just forgot one day and deleted my work. Can we get some more safety?" But rather (with some hyperbole for effect): "ZOMG git deleted my file! Is this a bug??" So I think we should have the inverse of this "precious" attribute". Just a change to the docs to say that .gitignore doesn't imply these eager deletion semantics on tree unpacking anymore, and if users want it back they can define a "garbage" attribute (s/precious/garbage/). That will lose no data, and in the very rare cases where a checkout of tracked files would overwrite an ignored pattern, we can just error out (as we do with the "Ok to overwrite" branch removed) and tell the user to delete the files to proceed. Three tests in our test suite fail with that patch applied, and they're explicitly testing for exactly the sort of scenario where users are likely to lose data. I.e.: 1. Open a tracked file in an editor 2. Save it 3. Switch to a topic branch, that has different .gitignore semantics (e.g. let's say a build/ dir exists there) 4. Have their work deleted So actually in writing this out I've become convinced that this "precious" approach can't work either, because *even if* you're an expert who manages to perfectly define their .gitignore and "precious" rules in advance to avoid data deletion, those rules will *also* need to take into account switching between branches (or even different histories) where you have other sorts of rules. So really, if there's ambiguity let's just not delete stuff by default and ask the user to resolve it.
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > Also while "precious" is a fun name, but it does not sound serious. > Any suggestions? Perhaps "valuable"? FWIW, I am reasonably sure that I was the first in Git circle who used the term "precious" in discussions wrt .gitignore, i.e. "Git has ignored but not precious category". Since it was not my invention but was a borrowed term from tla (aka GNU arch), I'd suggest to keep using that term, unless there is a strong reason not to follow longstanding precedent.
On Sun, Nov 11 2018, Ævar Arnfjörð Bjarmason wrote: > [CC-ing some of the people involved in recent threads about this] > > On Sun, Nov 11 2018, Nguyễn Thái Ngọc Duy wrote: > >> Since this topic has come up twice recently, I revisited this >> "precious" thingy that I started four years ago and tried to see if I >> could finally finish it. There are a couple things to be sorted out... >> >> A new attribute "precious" is added to indicate that certain files >> have valuable content and should not be easily discarded even if they >> are ignored or untracked (*). >> >> So far there are two parts of Git that are made aware of precious >> files: "git clean" will leave precious files alone and unpack-trees.c >> (i.e. merges and branch switches) will not overwrite >> ignored-but-precious files. >> >> Is there any other parts of Git that should be made aware of this >> "precious" attribute? >> >> Also while "precious" is a fun name, but it does not sound serious. >> Any suggestions? Perhaps "valuable"? >> >> Very lightly tested. The patch is more to have something to discuss >> than is bug free and ready to use. >> >> (*) Note that tracked files could be marked "precious" in the future >> too although the exact semantics is not very clear since tracked >> files are by default precious. >> >> But something like "index log" could use this to record all >> changes to precious files instead of just "git add -p" changes, >> for example. So these files are in a sense more precious than >> other tracked files. >> >> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> >> --- >> Documentation/git-clean.txt | 3 ++- >> Documentation/gitattributes.txt | 13 +++++++++++++ >> attr.c | 9 +++++++++ >> attr.h | 2 ++ >> builtin/clean.c | 19 ++++++++++++++++--- >> unpack-trees.c | 3 ++- >> 6 files changed, 44 insertions(+), 5 deletions(-) >> >> diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt >> index 03056dad0d..a9beadfb12 100644 >> --- a/Documentation/git-clean.txt >> +++ b/Documentation/git-clean.txt >> @@ -21,7 +21,8 @@ option is specified, ignored files are also removed. This can, for >> example, be useful to remove all build products. >> >> If any optional `<path>...` arguments are given, only those paths >> -are affected. >> +are affected. Ignored or untracked files with `precious` attributes >> +are not removed. >> >> OPTIONS >> ------- >> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt >> index b8392fc330..c722479bdc 100644 >> --- a/Documentation/gitattributes.txt >> +++ b/Documentation/gitattributes.txt >> @@ -1188,6 +1188,19 @@ If this attribute is not set or has an invalid value, the value of the >> (See linkgit:git-config[1]). >> >> >> +Precious files >> +~~~~~~~~~~~~~~~~~~~~~~~~ >> + >> +`precious` >> +^^^^^^^^^^ >> + >> +This attribute is set on files to indicate that their content is >> +valuable. Many commands will behave slightly different on precious >> +files. linkgit:git-clean[1] will leave precious files alone. Merging >> +and branch switching will not silently overwrite ignored files that >> +are marked "precious". >> + >> + >> USING MACRO ATTRIBUTES >> ---------------------- >> >> diff --git a/attr.c b/attr.c >> index 60d284796d..d06ca0ae4b 100644 >> --- a/attr.c >> +++ b/attr.c >> @@ -1186,3 +1186,12 @@ void attr_start(void) >> pthread_mutex_init(&check_vector.mutex, NULL); >> #endif >> } >> + >> +int is_precious_file(struct index_state *istate, const char *path) >> +{ >> + static struct attr_check *check; >> + if (!check) >> + check = attr_check_initl("precious", NULL); >> + git_check_attr(istate, path, check); >> + return check && ATTR_TRUE(check->items[0].value); >> +} > > If we merge two branches is this using the merged post-image of > .gitattributes as a source? > >> if (o->dir && >> - is_excluded(o->dir, o->src_index, name, &dtype)) >> + is_excluded(o->dir, o->src_index, name, &dtype) && >> + !is_precious_file(o->src_index, name)) >> /* >> * ce->name is explicitly excluded, so it is Ok to >> * overwrite it. > > I wonder if instead we should just be reverting c81935348b ("Fix > switching to a branch with D/F when current branch has file D.", > 2007-03-15), which these days (haven't dug deeply) would just be this, > right?: > >> diff --git a/unpack-trees.c b/unpack-trees.c > index 7570df481b..b3efaddd4f 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -1894,13 +1894,6 @@ static int check_ok_to_remove(const char *name, int len, int dtype, > if (ignore_case && icase_exists(o, name, len, st)) > return 0; > > - if (o->dir && > - is_excluded(o->dir, o->src_index, name, &dtype)) > - /* > - * ce->name is explicitly excluded, so it is Ok to > - * overwrite it. > - */ > - return 0; > if (S_ISDIR(st->st_mode)) { > /* > * We are checking out path "foo" and > > Something like the approach you're taking will absolutely work from a > technical standpoint, but I fear that it's going to be useless in > practice. > > The users who need protection against git deleting their files the most > are exactly the sort of users who aren't expert-level enough to > understand the nuances of how the semantics of .gitignore and "precious" > are going to interact before git eats their data. > > This is pretty apparent from the bug reports we're getting about > this. None of them are: > > "Hey, I 100% understood .gitignore semantics including this one part > of the docs where you say you'll do this, but just forgot one day > and deleted my work. Can we get some more safety?" > > But rather (with some hyperbole for effect): > > "ZOMG git deleted my file! Is this a bug??" > > So I think we should have the inverse of this "precious" > attribute". Just a change to the docs to say that .gitignore doesn't > imply these eager deletion semantics on tree unpacking anymore, and if > users want it back they can define a "garbage" attribute > (s/precious/garbage/). > > That will lose no data, and in the very rare cases where a checkout of > tracked files would overwrite an ignored pattern, we can just error out > (as we do with the "Ok to overwrite" branch removed) and tell the user > to delete the files to proceed. > > Three tests in our test suite fail with that patch applied, and they're > explicitly testing for exactly the sort of scenario where users are likely to lose data. I.e.: > > 1. Open a tracked file in an editor > 2. Save it > 3. Switch to a topic branch, that has different .gitignore semantics > (e.g. let's say a build/ dir exists there) > 4. Have their work deleted > > So actually in writing this out I've become convinced that this > "precious" approach can't work either, because *even if* you're an > expert who manages to perfectly define their .gitignore and "precious" > rules in advance to avoid data deletion, those rules will *also* need to > take into account switching between branches (or even different > histories) where you have other sorts of rules. > > So really, if there's ambiguity let's just not delete stuff by default > and ask the user to resolve it. Here's a patch to implement that (which borrows from some of yours). It passes all of our tests: diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index b8392fc330..a6cad17899 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -1188,6 +1188,17 @@ If this attribute is not set or has an invalid value, the value of the (See linkgit:git-config[1]). +Trashable files +~~~~~~~~~~~~~~~ + +`trashable` +^^^^^^^^^^ + +Provides an escape hatch for re-enabling a potentially data destroying +feature which was enabled by default between Git versions 1.5.2 and +2.20. See the `NOTES` section of linkgit:gitignore[5] for details. + + USING MACRO ATTRIBUTES ---------------------- diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt index d107daaffd..39c6d5955a 100644 --- a/Documentation/gitignore.txt +++ b/Documentation/gitignore.txt @@ -140,6 +140,13 @@ not tracked by Git remain untracked. To stop tracking a file that is currently tracked, use 'git rm --cached'. +Between Git versions 1.5.2 and 2.20 untracked files or directories +which were ignored and conflicted with a file about to be checked out +(e.g. during linkgit:git-checkout[1] or linkgit:git-merge[1]) would be +deleted. This could lead to loss of user data and is no longer the +default, See `trashable` in linkgit:gitattributes[5]. for how to +selectively enable this behavior. + EXAMPLES -------- diff --git a/attr.c b/attr.c index 60d284796d..930af78650 100644 --- a/attr.c +++ b/attr.c @@ -1186,3 +1186,12 @@ void attr_start(void) pthread_mutex_init(&check_vector.mutex, NULL); #endif } + +int is_trashable_file(struct index_state *istate, const char *path) +{ + static struct attr_check *check; + if (!check) + check = attr_check_initl("trashable", NULL); + git_check_attr(istate, path, check); + return check && ATTR_TRUE(check->items[0].value); +} diff --git a/attr.h b/attr.h index b0378bfe5f..ccf4d4e6b5 100644 --- a/attr.h +++ b/attr.h @@ -82,4 +82,6 @@ void git_attr_set_direction(enum git_attr_direction new_direction); void attr_start(void); +int is_trashable_file(struct index_state *istate, const char *path); + #endif /* ATTR_H */ diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh index 016391723c..d2ceee33d2 100755 --- a/t/lib-submodule-update.sh +++ b/t/lib-submodule-update.sh @@ -844,6 +844,8 @@ test_submodule_switch_recursing_with_args () { git branch -t add_sub1 origin/add_sub1 && : >sub1 && echo sub1 >.git/info/exclude && + test_must_fail $command add_sub1 && + echo sub1 trashable >.gitattributes && $command add_sub1 && test_superproject_content origin/add_sub1 && test_submodule_content sub1 origin/add_sub1 diff --git a/t/t1004-read-tree-m-u-wf.sh b/t/t1004-read-tree-m-u-wf.sh index c13578a635..2243cd955e 100755 --- a/t/t1004-read-tree-m-u-wf.sh +++ b/t/t1004-read-tree-m-u-wf.sh @@ -63,8 +63,10 @@ test_expect_success 'two-way with incorrect --exclude-per-directory (2)' ' fi ' -test_expect_success 'two-way clobbering a ignored file' ' +test_expect_success 'two-way keeping a ignored file, trashing a trashable file' ' + read_tree_u_must_fail -m -u --exclude-per-directory=.gitignore master side && + echo file2 trashable >.gitattributes && read_tree_u_must_succeed -m -u --exclude-per-directory=.gitignore master side ' @@ -106,7 +108,7 @@ test_expect_success 'three-way not clobbering a working tree file' ' echo >.gitignore file3 -test_expect_success 'three-way not complaining on an untracked file' ' +test_expect_success 'three-way complaining on an untracked file, trashing a trashable file' ' git reset --hard && rm -f file2 subdir/file2 file3 subdir/file3 && @@ -114,6 +116,8 @@ test_expect_success 'three-way not complaining on an untracked file' ' echo >file3 file three created in master, untracked && echo >subdir/file3 file three created in master, untracked && + read_tree_u_must_fail -m -u --exclude-per-directory=.gitignore branch-point master side && + echo file3 trashable >.gitattributes && read_tree_u_must_succeed -m -u --exclude-per-directory=.gitignore branch-point master side ' diff --git a/unpack-trees.c b/unpack-trees.c index 7570df481b..e9a7fb6583 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1895,9 +1895,10 @@ static int check_ok_to_remove(const char *name, int len, int dtype, return 0; if (o->dir && - is_excluded(o->dir, o->src_index, name, &dtype)) + is_excluded(o->dir, o->src_index, name, &dtype) && + is_trashable_file(o->src_index, name)) /* - * ce->name is explicitly excluded, so it is Ok to + * ce->name is explicitly trashable, so it is Ok to * overwrite it. */ return 0;
On Sun, Nov 11, 2018 at 1:33 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > The users who need protection against git deleting their files the most > are exactly the sort of users who aren't expert-level enough to > understand the nuances of how the semantics of .gitignore and "precious" > are going to interact before git eats their data. > > This is pretty apparent from the bug reports we're getting about > this. None of them are: > > "Hey, I 100% understood .gitignore semantics including this one part > of the docs where you say you'll do this, but just forgot one day > and deleted my work. Can we get some more safety?" > > But rather (with some hyperbole for effect): > > "ZOMG git deleted my file! Is this a bug??" > > So I think we should have the inverse of this "precious" > attribute". Just a change to the docs to say that .gitignore doesn't > imply these eager deletion semantics on tree unpacking anymore, and if > users want it back they can define a "garbage" attribute > (s/precious/garbage/). > > That will lose no data, and in the very rare cases where a checkout of > tracked files would overwrite an ignored pattern, we can just error out > (as we do with the "Ok to overwrite" branch removed) and tell the user > to delete the files to proceed. There's also the other side of the coin. If this refuse to overwrite triggers too often, it can become an annoyance. So far I've seen two reports of accident overwriting which make me think turning precious to trashable may be too extreme. Plus ignored files are trashable by default (or at least by design so far), adding trashable attribute changes how we handle ignored files quite significantly.
On Sun, Nov 11 2018, Duy Nguyen wrote: > On Sun, Nov 11, 2018 at 1:33 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> The users who need protection against git deleting their files the most >> are exactly the sort of users who aren't expert-level enough to >> understand the nuances of how the semantics of .gitignore and "precious" >> are going to interact before git eats their data. >> >> This is pretty apparent from the bug reports we're getting about >> this. None of them are: >> >> "Hey, I 100% understood .gitignore semantics including this one part >> of the docs where you say you'll do this, but just forgot one day >> and deleted my work. Can we get some more safety?" >> >> But rather (with some hyperbole for effect): >> >> "ZOMG git deleted my file! Is this a bug??" >> >> So I think we should have the inverse of this "precious" >> attribute". Just a change to the docs to say that .gitignore doesn't >> imply these eager deletion semantics on tree unpacking anymore, and if >> users want it back they can define a "garbage" attribute >> (s/precious/garbage/). >> >> That will lose no data, and in the very rare cases where a checkout of >> tracked files would overwrite an ignored pattern, we can just error out >> (as we do with the "Ok to overwrite" branch removed) and tell the user >> to delete the files to proceed. > > There's also the other side of the coin. If this refuse to overwrite > triggers too often, it can become an annoyance. So far I've seen two > reports of accident overwriting which make me think turning precious > to trashable may be too extreme. Plus ignored files are trashable by > default (or at least by design so far), adding trashable attribute > changes how we handle ignored files quite significantly. Yeah I'm not trying to make the argument that we should just go with these user bug reports, clearly that's just going to give us selection bias and we could continue to flip between the two behaviors with that approach. Just that an advanced opt-in feature to prevent dataloss will not prevent it in practice. Is taking my patch the right thing? I don't know. I'm leaning in that direction, but more making a devil's advocate argument to see if anyone finds good cases that'll demonstrate how it's bad. I haven't read/seen them so far, and the test suite didn't have any. I did go through the list archives as Junio suggested in https://public-inbox.org/git/7viq39avay.fsf@alter.siamese.dyndns.org/ and found these two: https://public-inbox.org/git/?q=d%3A20070301..20070331+verify_absent It seems to me that the reason we ended up with this behavior is a bug report from Shawn that was describing a similar but not quite the same problem: "[...]a bug in read-tree -m that prevents him from switching branches when the type of a path changes between a directory and a file.[...]" That's not the same as when a now-tracked file clobbers a .gitignored file. As far as I can tell (but may not have read carefully enough) that wasn't a problem anyone reported, but was changed while fixing another bug in c81935348b ("Fix switching to a branch with D/F when current branch has file D.", 2007-03-15).
On 11/11/18 5:41 PM, Duy Nguyen wrote: > On Sun, Nov 11, 2018 at 1:33 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: > >> That will lose no data, and in the very rare cases where a checkout of >> tracked files would overwrite an ignored pattern, we can just error out >> (as we do with the "Ok to overwrite" branch removed) and tell the user >> to delete the files to proceed. > There's also the other side of the coin. If this refuse to overwrite > triggers too often, it can become an annoyance. Sure, but doing "git checkout -f some_ref" instead of "git checkout some_ref" isn't really _that_ annoying, is it? I think, people (because of not having read/studied the .gitignore semantics well enough) having their files being overwritten _without realizing it_ is a bigger danger. But obviously there is a bit of treading a thin line here. If we feel thrashable is stretching it too far (which I don't think it is), we could add a "core.ignore_files_are_trashable" setting that brings back the old semantics, for those who have a strong feeling about it. It's also quite possible to do it the other way around - i.e. set "core.ignore_files_are_trashable" to true by default, and let the "new" behavior be opt-in. However, this might "miss the mark" in that those people who would really benefit from the new semantics might miss this setting, just like they could risk missing the "precious" setting. (I also think "trashable" sounds better and is more clear & precise than "precious", for whatever that is worth.) -- Per Lundberg
"Per Lundberg" <per.lundberg@hibox.tv> wrote: > On 11/11/18 5:41 PM, Duy Nguyen wrote: > > On Sun, Nov 11, 2018 at 1:33 PM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > > >> That will lose no data, and in the very rare cases where a checkout of > >> tracked files would overwrite an ignored pattern, we can just error out > >> (as we do with the "Ok to overwrite" branch removed) and tell the user > >> to delete the files to proceed. > > There's also the other side of the coin. If this refuse to overwrite > > triggers too often, it can become an annoyance. I may have missed some cases, but to me the cases when checkout may try to overwrite an ignored file are essentially: * Someone "git add"ed a file meant to be ignored by mistake (e.g. "git add -f *.o"). * A file that was meant to be kept private (e.g. config.mak.dev) ends up being tracked. This may happen when we find a way to make per-developer settings the same for everyone. I both cases I'd want at least to be notified that something is going on, and in the second I'd probably want to keep my local file around. > If we feel thrashable is stretching it too far (which I don't think it > is), we could add a "core.ignore_files_are_trashable" setting that > brings back the old semantics, for those who have a strong feeling about it. May I remind an idea I sugested in an old thread: add an intermediate level where ignored files to be overwritten are renamed (eg. foo -> foo~ like Emacs' backup files): https://public-inbox.org/git/vpqd3t9656k.fsf@bauges.imag.fr/ One advantage of the "rename" behavior is that it's safer that the current, but still not very disturbing for people who like the current behavior. This makes it a good candidate for a default behavior. This could come in complement with this thread's "precious" concept: * If you know what you're doing and know that such or such file is precious, mark it as such and Git will never overwrite it. * If you don't know about precious files, just keep the default setting and the worse that can happen is to get your file overwritten with a bakup of the old version kept around. This would probably play better with a notion of "precious" files than with a notion of "trashable" files.
On Mon, Nov 12 2018, Matthieu Moy wrote: > "Per Lundberg" <per.lundberg@hibox.tv> wrote: > >> On 11/11/18 5:41 PM, Duy Nguyen wrote: >> > On Sun, Nov 11, 2018 at 1:33 PM Ævar Arnfjörð Bjarmason >> > <avarab@gmail.com> wrote: >> >> >> That will lose no data, and in the very rare cases where a checkout of >> >> tracked files would overwrite an ignored pattern, we can just error out >> >> (as we do with the "Ok to overwrite" branch removed) and tell the user >> >> to delete the files to proceed. >> > There's also the other side of the coin. If this refuse to overwrite >> > triggers too often, it can become an annoyance. > > I may have missed some cases, but to me the cases when checkout may try > to overwrite an ignored file are essentially: > > * Someone "git add"ed a file meant to be ignored by mistake (e.g. > "git add -f *.o"). > > * A file that was meant to be kept private (e.g. config.mak.dev) ends > up being tracked. This may happen when we find a way to make per-developer > settings the same for everyone. Yes, the cases under discussion here are all cases where a tracked file clobbers a file matching a pattern in in .gitignore. What I'd add to your list is: * Some projects (I've seen this in the wild) add e.g. *.mp3 or whatever else usually doesn't belong in the repo as a "soft ignore". This is something we've never recommended, but have implicitly supported since the only caveats are a) you need a one-off "git add -f" and then they're tracked b) them being missing from "status" before being tracked c) the issue under discussion here. * Cases where e.g. filename changes to a directory or globs because of that make this more complex. > I both cases I'd want at least to be notified that something is going on, > and in the second I'd probably want to keep my local file around. >> If we feel thrashable is stretching it too far (which I don't think it >> is), we could add a "core.ignore_files_are_trashable" setting that >> brings back the old semantics, for those who have a strong feeling about it. > > May I remind an idea I sugested in an old thread: add an intermediate level > where ignored files to be overwritten are renamed (eg. foo -> foo~ like Emacs' > backup files): > > https://public-inbox.org/git/vpqd3t9656k.fsf@bauges.imag.fr/ > > One advantage of the "rename" behavior is that it's safer that the current, > but still not very disturbing for people who like the current behavior. This > makes it a good candidate for a default behavior. > > This could come in complement with this thread's "precious" concept: > > * If you know what you're doing and know that such or such file is precious, > mark it as such and Git will never overwrite it. > > * If you don't know about precious files, just keep the default setting and > the worse that can happen is to get your file overwritten with a bakup > of the old version kept around. > > This would probably play better with a notion of "precious" files than with > a notion of "trashable" files. I used to think this foo -> foo~ approach made the most sense (and said as much in https://public-inbox.org/git/871s8qdzph.fsf@evledraar.gmail.com/) but I think it's probably best not to do it and just error out, because: * We'd still need to handle the cases where "tests" the file collides with "tests" the directory. Then where do we move the colliding file? ~/.git/lost+found/* ? We could handle the subdir case with another special-case though... * I think such silent action will just leave users more confused, and in many cases (e.g. a merge) whatever message we print out will be missed in a deluge of other messaging, and they'll probably miss it. I'd like to avoid a case where a bunch of *~ files get committed because the user's workflow is (and some beginner git users do this): git pull && git add . && git commit && git push As the "pull" would now invoke a merge that would do this rename. * If I have the "foo" file open in my editor (a plausible way to run into this) I switch to another terminal, do the merge, miss the message, then re-save "foo". Now I have both "foo" and "foo~" on-disk. Another case where we should just refuse until the user resolves the situation to avoid the confusion.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > What I'd add to your list is: > > * Some projects (I've seen this in the wild) add e.g. *.mp3 or whatever > else usually doesn't belong in the repo as a "soft ignore". This is > something we've never recommended, but have implicitly supported since > the only caveats are a) you need a one-off "git add -f" and then > they're tracked b) them being missing from "status" before being > tracked c) the issue under discussion here. Or only selected "*.o" (vendor supplied binary blob) kept tracked while everything else is built from the source. I do not know who you are referring to "we" in your sentence, but as far as I am concerned, it has been and still is a BCP recommendation on this list to deal with a case like that.
On Mon, Nov 12 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> What I'd add to your list is: >> >> * Some projects (I've seen this in the wild) add e.g. *.mp3 or whatever >> else usually doesn't belong in the repo as a "soft ignore". This is >> something we've never recommended, but have implicitly supported since >> the only caveats are a) you need a one-off "git add -f" and then >> they're tracked b) them being missing from "status" before being >> tracked c) the issue under discussion here. > > Or only selected "*.o" (vendor supplied binary blob) kept tracked > while everything else is built from the source. > > I do not know who you are referring to "we" in your sentence, but as > far as I am concerned, it has been and still is a BCP recommendation > on this list to deal with a case like that. I mean that this use-case of having a "soft" ignore by carrying it across the "git add" barrier with a one-off "-f" isn't something explicitly documented, and apparently not something many expect. I.e. you / Matthieu have mentioned .gitignore in the past for only-generated *.o use-case. But it also does get used for "mostly we don't want this file, but sometimes we do" use-case, so that's something we need to deal with in practice. Like many workflows in git it's not something that was forseen or intended, but does happen in the wild.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> Or only selected "*.o" (vendor supplied binary blob) kept tracked >> while everything else is built from the source. >> ... > But it also does get used for "mostly we don't want this file, but > sometimes we do" use-case, so that's something we need to deal with in > practice. Exactly. "Mostly we don't want *.o as we prefer to build from the source, but we have only object files for some selected ones" is an often cited use case where it is the BCP to have *.o in .gitignore and use "add -f" to add the "selected" ones initially.
On Mon, Nov 12, 2018 at 10:09 AM Matthieu Moy <git@matthieu-moy.fr> wrote: > May I remind an idea I sugested in an old thread: add an intermediate level > where ignored files to be overwritten are renamed (eg. foo -> foo~ like Emacs' > backup files): > > https://public-inbox.org/git/vpqd3t9656k.fsf@bauges.imag.fr/ > > One advantage of the "rename" behavior is that it's safer that the current, > but still not very disturbing for people who like the current behavior. This > makes it a good candidate for a default behavior. I have something else in the bag that does something like this. The idea is that we go ahead and do destructive things but we let the user undo. Some more background in [1] but basically we hash "every" change and store in the object database (in this case we store "foo" content before overwriting it). We maintain a list of these hashes so that undo is possible, but of course we don't keep infinite change history, eventually too old changes will be pruned. [1] talks about index changes (e.g. "git add -p") but it could apply to worktree changes as well (and I'm also eyeing $GIT_DIR/config changes). The upside: a similar undo mechanism that works for more than just this case and it allows undoing multiple times while foo~ only allow once. The downside: hashing is definitely heavier than renaming foo to foo~. So this will feature be opt-in in most cases. But for "dangerous" overwrite like this case, I think we value the file content more and make it opt-out. [1] https://public-inbox.org/git/CACsJy8A3QCYY6QeJQYkbCKYh=7Q7pj=rer_OQHLGoAMqTNomNA@mail.gmail.com/
On Sun, Nov 11, 2018 at 2:06 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > +Trashable files > +~~~~~~~~~~~~~~~ > + > +`trashable` > +^^^^^^^^^^ > + > +Provides an escape hatch for re-enabling a potentially data destroying > +feature which was enabled by default between Git versions 1.5.2 and > +2.20. See the `NOTES` section of linkgit:gitignore[5] for details. How does this interact with "git clean -x"? Most ignored files will not have trashable attribute, so we don't remove any of them? Making "git clean" completely ignore this attribute is also possible, I guess, if we rename it somehow to avoid confusion.
On Sun, Nov 11, 2018 at 01:33:44PM +0100, Ævar Arnfjörð Bjarmason wrote: > The users who need protection against git deleting their files the most > are exactly the sort of users who aren't expert-level enough to > understand the nuances of how the semantics of .gitignore and "precious" > are going to interact before git eats their data. > > This is pretty apparent from the bug reports we're getting about > this. None of them are: > > "Hey, I 100% understood .gitignore semantics including this one part > of the docs where you say you'll do this, but just forgot one day > and deleted my work. Can we get some more safety?" > > But rather (with some hyperbole for effect): > > "ZOMG git deleted my file! Is this a bug??" > > So I think we should have the inverse of this "precious" > attribute". Just a change to the docs to say that .gitignore doesn't > imply these eager deletion semantics on tree unpacking anymore, and if > users want it back they can define a "garbage" attribute > (s/precious/garbage/). > > That will lose no data, and in the very rare cases where a checkout of > tracked files would overwrite an ignored pattern, we can just error out > (as we do with the "Ok to overwrite" branch removed) and tell the user > to delete the files to proceed. This is going to totally hose automation. My last job had files which might move from tracked to untracked (a file that had become generated), and long-running CI and build systems would need to be able to check out one status and switch to the other. Your proposed change will prevent those systems from working, whereas they previously did. I agree that your proposal would have been a better design originally, but breaking the way automated systems currently work is probably going to be a dealbreaker.
On 11/13/18 1:22 AM, brian m. carlson wrote: > This is going to totally hose automation. My last job had files which > might move from tracked to untracked (a file that had become generated), > and long-running CI and build systems would need to be able to check out > one status and switch to the other. Your proposed change will prevent > those systems from working, whereas they previously did. > > I agree that your proposal would have been a better design originally, > but breaking the way automated systems currently work is probably going > to be a dealbreaker. How about something like this: 1. Introduce a concept with "garbage" files, which git is "permitted to delete" without prompting. 2. Retain the current default, i.e. "ignored files are garbage" for now, making the new behavior _opt in_ to avoid breaking automated systems/existing scripts for anyone. Put the setting for this behind a new core.* config flag. 3. In the plan for version 3.0 (a new major version where some breakage can be tolerable, according to Semantic Versioning), change the default so that "only explicit garbage is garbage". Include very clear notices of this in the release notes. The config flag is retained, but its default changes from true->false or vice versa. People who dislike the new behavior can easily change back to the 2.x semantics. Would this be a reasonable compromise for everybody?
On Mon, Nov 26 2018, Per Lundberg wrote: > On 11/13/18 1:22 AM, brian m. carlson wrote: >> This is going to totally hose automation. My last job had files which >> might move from tracked to untracked (a file that had become generated), >> and long-running CI and build systems would need to be able to check out >> one status and switch to the other. Your proposed change will prevent >> those systems from working, whereas they previously did. >> >> I agree that your proposal would have been a better design originally, >> but breaking the way automated systems currently work is probably going >> to be a dealbreaker. > > How about something like this: > > 1. Introduce a concept with "garbage" files, which git is "permitted to > delete" without prompting. > > 2. Retain the current default, i.e. "ignored files are garbage" for now, > making the new behavior _opt in_ to avoid breaking automated > systems/existing scripts for anyone. Put the setting for this behind a > new core.* config flag. > > 3. In the plan for version 3.0 (a new major version where some breakage > can be tolerable, according to Semantic Versioning), change the default > so that "only explicit garbage is garbage". Include very clear notices > of this in the release notes. The config flag is retained, but its > default changes from true->false or vice versa. People who dislike the > new behavior can easily change back to the 2.x semantics. > > Would this be a reasonable compromise for everybody? Possibly, but I think there's an earlier step zero there for anyone interested in pursuing this (and currently I can't make time for it), which is to submit a patch with tests and documentation showing exactly the sort of scenarios where we clobber or don't clobber existing files. As my https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/ shows we have tests for this, but they're not explicit, and some want to test some unrelated thing. I.e. to test the cases where we clobber foo.c because foo.c now explicitly exists, or cases where dir/foo.c is clobbered because "dir" is now a tracked text file etc., are those the only two cases? I vaguely suspect that there were other interesting cases, but at this point the information has been paged out of the working set of my wetware. The thread at https://public-inbox.org/git/87o9au39s7.fsf@evledraar.gmail.com/ has some notes about this. Then as noted in https://public-inbox.org/git/87wopj3661.fsf@evledraar.gmail.com/ the reason we have this behavior seems to be something that grew organically from a semi-related bugfix. So I don't think we're at a point where we're all dug into our trenches and some people want X and others want Y and in the name of backwards compatibility we're going to stay with X. It may turn out that we just want to retain 10% of X, and can get 99% of the safety of Y by doing that.
Per Lundberg <per.lundberg@hibox.tv> writes: > How about something like this: > ... > Would this be a reasonable compromise for everybody? I do not think you'd need to introduce such a deliberately breaking change at all. Just introduce a new "precious" class, perhaps mark them with the atttribute mechanism, and that would be the endgame. Early adopters would start marking ignored but not expendable paths with the "precious" attribute and they won't be clobbered. As the mechanism becomes widely known and mature, more and more people use it. And even after that happens, early adopters do not have to change any attribute setting, and late adopters would have plenty of examples to imitate. Those who do not need any "precious" class do not have to do anything and you won't break any existing automation that way.
On Mon, Nov 26, 2018 at 10:30 AM Per Lundberg <per.lundberg@hibox.tv> wrote: > > On 11/13/18 1:22 AM, brian m. carlson wrote: > > This is going to totally hose automation. My last job had files which > > might move from tracked to untracked (a file that had become generated), > > and long-running CI and build systems would need to be able to check out > > one status and switch to the other. Your proposed change will prevent > > those systems from working, whereas they previously did. > > > > I agree that your proposal would have been a better design originally, > > but breaking the way automated systems currently work is probably going > > to be a dealbreaker. > > How about something like this: > > 1. Introduce a concept with "garbage" files, which git is "permitted to > delete" without prompting. > > 2. Retain the current default, i.e. "ignored files are garbage" for now, > making the new behavior _opt in_ to avoid breaking automated > systems/existing scripts for anyone. Put the setting for this behind a > new core.* config flag. > > 3. In the plan for version 3.0 (a new major version where some breakage > can be tolerable, according to Semantic Versioning), change the default > so that "only explicit garbage is garbage". Include very clear notices > of this in the release notes. The config flag is retained, but its > default changes from true->false or vice versa. People who dislike the > new behavior can easily change back to the 2.x semantics. How does this garbage thing interact with "git clean -x"? My interpretation of this flag/attribute is that at version 3.0 by default all ignored files are _not_ garbage, so "git clean -x" should not remove any of them. Which is weird because most of ignored files are like *.o that should be removed. I also need to mark "precious" on untracked or even tracked files (*). Not sure how this "garbage" attribute interacts with that. (*) I was hoping I could get the idea [1] implemented in somewhat good shape before presenting here. But I'm a bit slow on that front. So yeah this "precious" on untracked/tracked thingy may be even irrelevant if the patch series will be rejected. [1] https://public-inbox.org/git/CACsJy8C3rOFv0kQeJrWufQQzbnfU4mSxJtphEYBGMmrroFFN-A@mail.gmail.com/ > Would this be a reasonable compromise for everybody?
On Mon, Nov 26 2018, Duy Nguyen wrote: > On Mon, Nov 26, 2018 at 10:30 AM Per Lundberg <per.lundberg@hibox.tv> wrote: >> >> On 11/13/18 1:22 AM, brian m. carlson wrote: >> > This is going to totally hose automation. My last job had files which >> > might move from tracked to untracked (a file that had become generated), >> > and long-running CI and build systems would need to be able to check out >> > one status and switch to the other. Your proposed change will prevent >> > those systems from working, whereas they previously did. >> > >> > I agree that your proposal would have been a better design originally, >> > but breaking the way automated systems currently work is probably going >> > to be a dealbreaker. >> >> How about something like this: >> >> 1. Introduce a concept with "garbage" files, which git is "permitted to >> delete" without prompting. >> >> 2. Retain the current default, i.e. "ignored files are garbage" for now, >> making the new behavior _opt in_ to avoid breaking automated >> systems/existing scripts for anyone. Put the setting for this behind a >> new core.* config flag. >> >> 3. In the plan for version 3.0 (a new major version where some breakage >> can be tolerable, according to Semantic Versioning), change the default >> so that "only explicit garbage is garbage". Include very clear notices >> of this in the release notes. The config flag is retained, but its >> default changes from true->false or vice versa. People who dislike the >> new behavior can easily change back to the 2.x semantics. > > How does this garbage thing interact with "git clean -x"? My > interpretation of this flag/attribute is that at version 3.0 by > default all ignored files are _not_ garbage, so "git clean -x" should > not remove any of them. Which is weird because most of ignored files > are like *.o that should be removed. > > I also need to mark "precious" on untracked or even tracked files (*). > Not sure how this "garbage" attribute interacts with that. > > (*) I was hoping I could get the idea [1] implemented in somewhat good > shape before presenting here. But I'm a bit slow on that front. So > yeah this "precious" on untracked/tracked thingy may be even > irrelevant if the patch series will be rejected. I think a garbage (or trashable) flag, if implemented, wouldn't need any special case in git-clean, i.e. -x would remove all untracked files, whether ignored or garbage/trashable. That's what my patch to implement it does: https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/ I think that makes sense. Users running "git clean" have "--dry-run" and unlike "checkout a branch" or "merge this commit" where we'll now shred data implicitly it's obvious that git-clean is going to shred your data.
On Mon, Nov 26, 2018 at 4:34 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Mon, Nov 26 2018, Duy Nguyen wrote: > > > On Mon, Nov 26, 2018 at 10:30 AM Per Lundberg <per.lundberg@hibox.tv> wrote: > >> > >> On 11/13/18 1:22 AM, brian m. carlson wrote: > >> > This is going to totally hose automation. My last job had files which > >> > might move from tracked to untracked (a file that had become generated), > >> > and long-running CI and build systems would need to be able to check out > >> > one status and switch to the other. Your proposed change will prevent > >> > those systems from working, whereas they previously did. > >> > > >> > I agree that your proposal would have been a better design originally, > >> > but breaking the way automated systems currently work is probably going > >> > to be a dealbreaker. > >> > >> How about something like this: > >> > >> 1. Introduce a concept with "garbage" files, which git is "permitted to > >> delete" without prompting. > >> > >> 2. Retain the current default, i.e. "ignored files are garbage" for now, > >> making the new behavior _opt in_ to avoid breaking automated > >> systems/existing scripts for anyone. Put the setting for this behind a > >> new core.* config flag. > >> > >> 3. In the plan for version 3.0 (a new major version where some breakage > >> can be tolerable, according to Semantic Versioning), change the default > >> so that "only explicit garbage is garbage". Include very clear notices > >> of this in the release notes. The config flag is retained, but its > >> default changes from true->false or vice versa. People who dislike the > >> new behavior can easily change back to the 2.x semantics. > > > > How does this garbage thing interact with "git clean -x"? My > > interpretation of this flag/attribute is that at version 3.0 by > > default all ignored files are _not_ garbage, so "git clean -x" should > > not remove any of them. Which is weird because most of ignored files > > are like *.o that should be removed. > > > > I also need to mark "precious" on untracked or even tracked files (*). > > Not sure how this "garbage" attribute interacts with that. > > > > (*) I was hoping I could get the idea [1] implemented in somewhat good > > shape before presenting here. But I'm a bit slow on that front. So > > yeah this "precious" on untracked/tracked thingy may be even > > irrelevant if the patch series will be rejected. > > I think a garbage (or trashable) flag, if implemented, wouldn't need any > special case in git-clean, i.e. -x would remove all untracked files, > whether ignored or garbage/trashable. That's what my patch to implement > it does: > https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/ > > I think that makes sense. Users running "git clean" have "--dry-run" and > unlike "checkout a branch" or "merge this commit" where we'll now shred > data implicitly it's obvious that git-clean is going to shred your data. Then that't not what I want. If I'm going to mark to keep "config.mak" around, I'm not going to carefully move it away before doing "git clean -fdx" then move it back. No "git clean --dry-run" telling me to make a backup of config.mak is no good.
On Mon, Nov 26 2018, Duy Nguyen wrote: > On Mon, Nov 26, 2018 at 4:34 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> >> On Mon, Nov 26 2018, Duy Nguyen wrote: >> >> > On Mon, Nov 26, 2018 at 10:30 AM Per Lundberg <per.lundberg@hibox.tv> wrote: >> >> >> >> On 11/13/18 1:22 AM, brian m. carlson wrote: >> >> > This is going to totally hose automation. My last job had files which >> >> > might move from tracked to untracked (a file that had become generated), >> >> > and long-running CI and build systems would need to be able to check out >> >> > one status and switch to the other. Your proposed change will prevent >> >> > those systems from working, whereas they previously did. >> >> > >> >> > I agree that your proposal would have been a better design originally, >> >> > but breaking the way automated systems currently work is probably going >> >> > to be a dealbreaker. >> >> >> >> How about something like this: >> >> >> >> 1. Introduce a concept with "garbage" files, which git is "permitted to >> >> delete" without prompting. >> >> >> >> 2. Retain the current default, i.e. "ignored files are garbage" for now, >> >> making the new behavior _opt in_ to avoid breaking automated >> >> systems/existing scripts for anyone. Put the setting for this behind a >> >> new core.* config flag. >> >> >> >> 3. In the plan for version 3.0 (a new major version where some breakage >> >> can be tolerable, according to Semantic Versioning), change the default >> >> so that "only explicit garbage is garbage". Include very clear notices >> >> of this in the release notes. The config flag is retained, but its >> >> default changes from true->false or vice versa. People who dislike the >> >> new behavior can easily change back to the 2.x semantics. >> > >> > How does this garbage thing interact with "git clean -x"? My >> > interpretation of this flag/attribute is that at version 3.0 by >> > default all ignored files are _not_ garbage, so "git clean -x" should >> > not remove any of them. Which is weird because most of ignored files >> > are like *.o that should be removed. >> > >> > I also need to mark "precious" on untracked or even tracked files (*). >> > Not sure how this "garbage" attribute interacts with that. >> > >> > (*) I was hoping I could get the idea [1] implemented in somewhat good >> > shape before presenting here. But I'm a bit slow on that front. So >> > yeah this "precious" on untracked/tracked thingy may be even >> > irrelevant if the patch series will be rejected. >> >> I think a garbage (or trashable) flag, if implemented, wouldn't need any >> special case in git-clean, i.e. -x would remove all untracked files, >> whether ignored or garbage/trashable. That's what my patch to implement >> it does: >> https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/ >> >> I think that makes sense. Users running "git clean" have "--dry-run" and >> unlike "checkout a branch" or "merge this commit" where we'll now shred >> data implicitly it's obvious that git-clean is going to shred your data. > > Then that't not what I want. If I'm going to mark to keep "config.mak" > around, I'm not going to carefully move it away before doing "git > clean -fdx" then move it back. No "git clean --dry-run" telling me to > make a backup of config.mak is no good. Understood. I mean this in the context of solving the problem users have with running otherwise non-data-destroying commands like "checkout" and "merge" and getting their data destroyed, which is overwhelmingly why this topic gets resurrected. Some of the solutions overlap with this thing you want, but I think it's worth keeping the distinction between the two in mind. I.e. I can imagine us finding some acceptable solution to the data shredding problem that doesn't implement this mode for "git-clean", or the other way around.
On Mon, Nov 26, 2018 at 4:47 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > >> >> How about something like this: > >> >> > >> >> 1. Introduce a concept with "garbage" files, which git is "permitted to > >> >> delete" without prompting. > >> >> > >> >> 2. Retain the current default, i.e. "ignored files are garbage" for now, > >> >> making the new behavior _opt in_ to avoid breaking automated > >> >> systems/existing scripts for anyone. Put the setting for this behind a > >> >> new core.* config flag. > >> >> > >> >> 3. In the plan for version 3.0 (a new major version where some breakage > >> >> can be tolerable, according to Semantic Versioning), change the default > >> >> so that "only explicit garbage is garbage". Include very clear notices > >> >> of this in the release notes. The config flag is retained, but its > >> >> default changes from true->false or vice versa. People who dislike the > >> >> new behavior can easily change back to the 2.x semantics. > >> > > >> > How does this garbage thing interact with "git clean -x"? My > >> > interpretation of this flag/attribute is that at version 3.0 by > >> > default all ignored files are _not_ garbage, so "git clean -x" should > >> > not remove any of them. Which is weird because most of ignored files > >> > are like *.o that should be removed. > >> > > >> > I also need to mark "precious" on untracked or even tracked files (*). > >> > Not sure how this "garbage" attribute interacts with that. > >> > > >> > (*) I was hoping I could get the idea [1] implemented in somewhat good > >> > shape before presenting here. But I'm a bit slow on that front. So > >> > yeah this "precious" on untracked/tracked thingy may be even > >> > irrelevant if the patch series will be rejected. > >> > >> I think a garbage (or trashable) flag, if implemented, wouldn't need any > >> special case in git-clean, i.e. -x would remove all untracked files, > >> whether ignored or garbage/trashable. That's what my patch to implement > >> it does: > >> https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/ > >> > >> I think that makes sense. Users running "git clean" have "--dry-run" and > >> unlike "checkout a branch" or "merge this commit" where we'll now shred > >> data implicitly it's obvious that git-clean is going to shred your data. > > > > Then that't not what I want. If I'm going to mark to keep "config.mak" > > around, I'm not going to carefully move it away before doing "git > > clean -fdx" then move it back. No "git clean --dry-run" telling me to > > make a backup of config.mak is no good. > > Understood. I mean this in the context of solving the problem users have > with running otherwise non-data-destroying commands like "checkout" and > "merge" and getting their data destroyed, which is overwhelmingly why > this topic gets resurrected. > > Some of the solutions overlap with this thing you want, but I think it's > worth keeping the distinction between the two in mind. On the other hand all use cases should be considered. It's going to be a mess to have "trashable" attribute that applies to some commands while "precious" to some others (and even worse when they overlap, imagine having to define both in .gitattributes) > I.e. I can > imagine us finding some acceptable solution to the data shredding > problem that doesn't implement this mode for "git-clean", or the other > way around.
On Mon, Nov 12, 2018 at 11:22:09PM +0000, brian m. carlson wrote: > This is going to totally hose automation. My last job had files which > might move from tracked to untracked (a file that had become generated), > and long-running CI and build systems would need to be able to check out > one status and switch to the other. Your proposed change will prevent > those systems from working, whereas they previously did. Wouldn't those systems not use -f right now? And shouldn't Git have the same semantic for -f to clobber everything in the proposed use case? Like it does right now for untracked files which are not ignored. So to be save going back and forth I would expect those systems to use -f anyway. Have I missed something here? Regards, Eckhard
On 11/26/18 5:55 PM, Duy Nguyen wrote: > On Mon, Nov 26, 2018 at 4:47 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> Some of the solutions overlap with this thing you want, but I think it's >> worth keeping the distinction between the two in mind. > > On the other hand all use cases should be considered. It's going to be > a mess to have "trashable" attribute that applies to some commands > while "precious" to some others (and even worse when they overlap, > imagine having to define both in .gitattributes) Agree - I think it would be a very bad idea to have a "mix" of both trashable and precious. IMO, we should try to find which one of these concepts suits most general use cases best and causes less churn for existing scripts/users' existing "semantic expectations", and pick that one. -- Per Lundberg
On Tue, Nov 27, 2018 at 1:45 AM Per Lundberg <per.lundberg@hibox.tv> wrote: > > On 11/26/18 5:55 PM, Duy Nguyen wrote: > > On Mon, Nov 26, 2018 at 4:47 PM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > >> Some of the solutions overlap with this thing you want, but I think it's > >> worth keeping the distinction between the two in mind. > > > > On the other hand all use cases should be considered. It's going to be > > a mess to have "trashable" attribute that applies to some commands > > while "precious" to some others (and even worse when they overlap, > > imagine having to define both in .gitattributes) > > Agree - I think it would be a very bad idea to have a "mix" of both > trashable and precious. IMO, we should try to find which one of these > concepts suits most general use cases best and causes less churn for > existing scripts/users' existing "semantic expectations", and pick that one. > -- > Per Lundberg Personally, I would rather err on the side which requires the least interaction from users to avoid silently clobbering an ignored file. Either Duy's solution with a sort of "untracked" reflog, or the garbage/trashable notion. I don't like the idea of precious because it means people have to know and remember to opt in, and it's quite possible they will not do so until after they've lost real data. I'd only have trashable apply in the case where it was implicit. i.e. git clean -fdx would still delete them, as this is an explicit operation that (hopefully?) users know will delete data. Thanks, Jake
On 11/27/18 2:55 PM, Jacob Keller wrote: > Personally, I would rather err on the side which requires the least > interaction from users to avoid silently clobbering an ignored file. > > [...] > > I don't like the idea of precious because it means people have to know > and remember to opt in, and it's quite possible they will not do so > until after they've lost real data. I agree strongly with this personally; if we must choose between "might break automation" and "might delete non-garbage files", I would say the former is the lesser evil of the two. But, if I had 10 000 000 servers set up using automated scripts that would break because of this, I might think differently. Quite likely so, in fact. What are these automation scenarios _more specifically_? Junio or Brian, would you care to elaborate? Is it for build servers where you want "git clean -dfx" to always reset the working copy to a pristine state or are we talking about some other scenarios? > I'd only have trashable apply in the case where it was implicit. i.e. > git clean -fdx would still delete them, as this is an explicit > operation that (hopefully?) users know will delete data. This is one of the tougher calls, unfortunately. If I was a user (which I am), and I was typing "git clean -dfx", what would I expect? The help text (currently) states "-x remove ignored files, too". Would it be safe to assume that people would understand that "ignored _does not_ mean trashable when doing "git checkout some-ref" BUT it _does_ mean trashable in the "git clean -dfx" context"? I'm not so certain. It would be one of those perceived inconsistencies that would make people scream in anger because they _presumed_ that with the new "trashable" concept, "git clean -dfx" would no longer hit them in the leg. And the other way around: if we change "git clean -dfx" to _not_ treat "ignored == trashable", it is likely to "hose automation" as it has been previously stated. People who might be using this syntax and _want_ it to remove ignored files would be upset, and rightfully so. So in my POV, it's a tough decision between two, less-than-optimal alternatives. But I would perhaps be able to live with the current semantics for "git clean -dfx" _as long as we update the help text_ so that "-x" indicates more clearly that non-trashable files can be deleted. It doesn't make things _worse_ than they currently are and if this is what it takes to get the trashable concept implemented and accepted by the community, it's a compromise I'd be willing to make. -- Per Lundberg
On Mon, Nov 26 2018, Junio C Hamano wrote: > Per Lundberg <per.lundberg@hibox.tv> writes: > >> How about something like this: >> ... >> Would this be a reasonable compromise for everybody? > > I do not think you'd need to introduce such a deliberately breaking > change at all. Just introduce a new "precious" class, perhaps mark > them with the atttribute mechanism, and that would be the endgame. > Early adopters would start marking ignored but not expendable paths > with the "precious" attribute and they won't be clobbered. As the > mechanism becomes widely known and mature, more and more people use > it. And even after that happens, early adopters do not have to change > any attribute setting, and late adopters would have plenty of examples > to imitate. Those who do not need any "precious" class do not have > to do anything and you won't break any existing automation that way. The patch I submitted in <87zhuf3gs0.fsf@evledraar.gmail.com>[1] changed the behavior for read-tree & checkout & merge etc. It was an RFC more in the spirit of showing what in our current tests had to change to spur some discussion. But I'm very sympathetic to this line of argument. I.e. in my patch I'm changing the semantics of read-tree, which is plumbing. What do you think about some patch like that which retains the plumbing behavior for things like read-tree, doesn't introduce "precious" or "trashable", and just makes you specify "[checkout|merge|...] --force" in cases where we'd have clobbering? This would give scripts which relied on our stable plumbing consistent behavior, while helping users who're using our main porcelain not to lose data. I could then add a --force option to the likes of read-tree (on by default), so you could get porcelain-like behavior with --no-force. 1. https://public-inbox.org/git/87zhuf3gs0.fsf@evledraar.gmail.com/
On Tue, Nov 27, 2018 at 1:56 PM Jacob Keller <jacob.keller@gmail.com> wrote: > > On Tue, Nov 27, 2018 at 1:45 AM Per Lundberg <per.lundberg@hibox.tv> wrote: > > > > On 11/26/18 5:55 PM, Duy Nguyen wrote: > > > On Mon, Nov 26, 2018 at 4:47 PM Ævar Arnfjörð Bjarmason > > > <avarab@gmail.com> wrote: > > >> Some of the solutions overlap with this thing you want, but I think it's > > >> worth keeping the distinction between the two in mind. > > > > > > On the other hand all use cases should be considered. It's going to be > > > a mess to have "trashable" attribute that applies to some commands > > > while "precious" to some others (and even worse when they overlap, > > > imagine having to define both in .gitattributes) > > > > Agree - I think it would be a very bad idea to have a "mix" of both > > trashable and precious. IMO, we should try to find which one of these > > concepts suits most general use cases best and causes less churn for > > existing scripts/users' existing "semantic expectations", and pick that one. > > -- > > Per Lundberg > > Personally, I would rather err on the side which requires the least > interaction from users to avoid silently clobbering an ignored file. > > Either Duy's solution with a sort of "untracked" reflog, or the > garbage/trashable notion. > > I don't like the idea of precious because it means people have to know > and remember to opt in, and it's quite possible they will not do so > until after they've lost real data. > > I'd only have trashable apply in the case where it was implicit. i.e. > git clean -fdx would still delete them, as this is an explicit > operation that (hopefully?) users know will delete data. Yes I know it will delete ignored files. But I don't want it to delete some files. There is no way I can tell Git to do that. It's the same with merge/checkout's overwriting problem. Once the initial surprise is over, I want control over what files I want Git to just delete and not annoy me, what Git should not delete.
On Tue, Nov 27, 2018 at 02:50:34PM +0000, Per Lundberg wrote: > I agree strongly with this personally; if we must choose between "might > break automation" and "might delete non-garbage files", I would say the > former is the lesser evil of the two. > > But, if I had 10 000 000 servers set up using automated scripts that > would break because of this, I might think differently. Quite likely so, > in fact. > > What are these automation scenarios _more specifically_? Junio or Brian, > would you care to elaborate? Is it for build servers where you want "git > clean -dfx" to always reset the working copy to a pristine state or are > we talking about some other scenarios? We had long-running CI servers, since bootstrapping a new system took an hour. These would check out the branch to test and run some process (essentially, a "make" and "make test"). Then, another branch would be tested, and so on. The old branch would likely not be merged at this point. The scenario I'm thinking of is when a file (say a CSS file) became built instead of stored in the repository. Then the file would be added to .gitignore in the new commit, and it would be generated as part of the make step. It would be important to blow away that file away when checking out a new commit, because not doing so would mean that the CI system would simply fail to work and require manual intervention. Moreover, a CI job might fail, due to either a flaky test or a legitimate failures, so the job might need to be re-run multiple times. Requiring human intervention, especially when such jobs might be running at odd hours, would be undesirable. Another thing we did was to use a specially named gitignore file in our build step. We created a new repository, copied the special gitignore file in as .gitignore, copied in the source and build products, ran git add and git commit, and then ran git clean -dfx to remove proprietary source code, packaging the result. A change to the behavior of git clean -dfx would be devastating here. I point this out to underscore how fundamental this change is. People overwhelmingly do not read the release notes, so expecting people to realize that a change has been made, especially when many people only upgrade Git because of a security issue, may result in unexpected consequences. Just because we don't think of this use of Git as normal or desirable doesn't mean people don't do it and don't expect it to keep working. People do read and rely on our documentation. I think any change we make here has to be opt-in, at least until Git 3.0. A config knob would probably be the right way to go. I realize that may not provide all the benefits we want out of the box, but it lets people turn the option on once and forget about it. It also lets people who don't desire this new behavior explicitly turn it off.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > What do you think about some patch like that which retains the plumbing > behavior for things like read-tree, doesn't introduce "precious" or > "trashable", and just makes you specify "[checkout|merge|...] --force" > in cases where we'd have clobbering? Whether you like it or not, don't people's automation use tons of invocations of "git merge", "git checkout", etc.? You'd be breaking them by such a change. Other than that, if we never had Git before and do not have to worry about existing users, I'd think it would be a lot closer to the ideal than today's system if "checkout <tree> foo.o" rejected overwriting "foo.o" that is not tracked in the current index but matches an ignore pattern, and required a "--force" option to overwrite it. A user, during a conflict resolution, may say "I want this 'git checkout foo/' to ignore conflicted paths in that directory, so I would give "--force" option to it, but now "--force" also implies that I am willing to clobber ignored paths, which means I cannot use it". I would think that a proper automation needs per-path hint from the user and/or the project, not just a single-size-fits-all --force option, and "unlike all the *.o ignored files that are expendable, this vendor-supplied-object.o is not" is one way to give such a per-path hint. > This would give scripts which relied on our stable plumbing consistent > behavior, while helping users who're using our main porcelain not to > lose data. I could then add a --force option to the likes of read-tree > (on by default), so you could get porcelain-like behavior with > --no-force. At that low level, I suspect that a single size fits all "--force" would work even less well.
On 11/28/18 3:21 AM, brian m. carlson wrote: Thanks for the elaboration, Brian - good to get things down to a practical, real-world level. > [...] > > I point this out to underscore how fundamental this change is. People > overwhelmingly do not read the release notes, so expecting people to > realize that a change has been made, especially when many people only > upgrade Git because of a security issue, may result in unexpected > consequences. This is one of the more important things of software engineering. _Don't mix security fixes with breaking changes_. They are very different things and like you say, we can't really expect people to real release notes for every little incremental release we do. That's an important part of the SemVer guarantee: a minor version bump/patch level increase means "bug fix" or "added functionality in a backwards-compatible way". So: no changing of default behavior or semantics, but adding a new behavior which is opt-in is perfectly fine. > Just because we don't think of this use of Git as normal or desirable > doesn't mean people don't do it and don't expect it to keep working. In other words, we need to be "bug-by-bug" compatible with previous versions. :-) What some people would consider a bug, others would consider a feature. > I think any change we make here has to be opt-in, at least until Git > 3.0. A config knob would probably be the right way to go. Agree. It's less than optimal but I think it's something that we all could live with. Deciding to switching the default (or not) is then rightfully postponed to a later time, and we can revisit the pros and cons then. The important thing now is to get the functionality implemented in a good way, tested and eventually merged. -- Per Lundberg
On Wed, Nov 28 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> What do you think about some patch like that which retains the plumbing >> behavior for things like read-tree, doesn't introduce "precious" or >> "trashable", and just makes you specify "[checkout|merge|...] --force" >> in cases where we'd have clobbering? > > Whether you like it or not, don't people's automation use tons of > invocations of "git merge", "git checkout", etc.? You'd be breaking > them by such a change. I'm so sympathetic to this argument that I tried to convince you of something like this around a year and a half ago: https://public-inbox.org/git/CACBZZX59KXPOEjiUKtZLN6zjO_xpiWve7Xga6q-53J2LwvfZyw@mail.gmail.com/ :) I was probing for what your current stance on this sort of thing is, because discussions like this tend to get bogged down in the irrelevant distraction of whether something is plumbing or porcelain, which almost none of our users care about, and we've effectively stopped caring about ourselves. But we must have some viable way to repair warts in the tools, and losing user data is a *big* wart. I don't think something like the endgame you've described in https://public-inbox.org/git/xmqqzhtwuhpc.fsf@gitster-ct.c.googlers.com/ is ever going to work. Novice git users (the vast majority) are not going to diligently update both .gitignore and some .gitattribute mechanism in lockstep. I'd bet most git users haven't read more than a few paragraphs of our entire documentation at best. So what's the way forward? I think ultimately we must move to something where we effectively version the entire CLI UI similar to stable API versions. I.e. for things like this that would break some things (or Duy's new "split checkout") introduce them as flags first, then bundle up all such flags and cut a major release "Git 3, 4, ...", and eventually remove old functionality. > Other than that, if we never had Git before and do not have to worry > about existing users, I'd think it would be a lot closer to the ideal > than today's system if "checkout <tree> foo.o" rejected overwriting > "foo.o" that is not tracked in the current index but matches an ignore > pattern, and required a "--force" option to overwrite it. > > A user, during a conflict resolution, may say "I want this 'git > checkout foo/' to ignore conflicted paths in that directory, so I > would give "--force" option to it, but now "--force" also implies > that I am willing to clobber ignored paths, which means I cannot use > it". > > I would think that a proper automation needs per-path hint from the > user and/or the project, not just a single-size-fits-all --force > option, and "unlike all the *.o ignored files that are expendable, > this vendor-supplied-object.o is not" is one way to give such a > per-path hint. > >> This would give scripts which relied on our stable plumbing consistent >> behavior, while helping users who're using our main porcelain not to >> lose data. I could then add a --force option to the likes of read-tree >> (on by default), so you could get porcelain-like behavior with >> --no-force. > > At that low level, I suspect that a single size fits all "--force" > would work even less well. Yeah I don't think the one-size-fits-all way out of this is a single --force flag.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > I don't think something like the endgame you've described in > https://public-inbox.org/git/xmqqzhtwuhpc.fsf@gitster-ct.c.googlers.com/ > is ever going to work. Novice git users (the vast majority) are not > going to diligently update both .gitignore and some .gitattribute > mechanism in lockstep. That goes both ways, no? Forcing people to add the same pattern, e.g. *.o, to .gitignore and .gitattribute to say they are expendable, when most of the time they are, is not something you should expect from the normal population. >> I would think that a proper automation needs per-path hint from the >> user and/or the project, not just a single-size-fits-all --force >> option, and "unlike all the *.o ignored files that are expendable, >> this vendor-supplied-object.o is not" is one way to give such a >> per-path hint. >> >>> This would give scripts which relied on our stable plumbing consistent >>> behavior, while helping users who're using our main porcelain not to >>> lose data. I could then add a --force option to the likes of read-tree >>> (on by default), so you could get porcelain-like behavior with >>> --no-force. >> >> At that low level, I suspect that a single size fits all "--force" >> would work even less well. > > Yeah I don't think the one-size-fits-all way out of this is a single > --force flag. Yes, indeed. That's why I prefer the "precious" bit. The system would behave the same way with or without it, but projects (not individual endusers) can take advantage of the feature if they wanted to.
On Wed, Nov 28, 2018 at 10:54 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > But we must have some viable way to repair warts in the tools, and > losing user data is a *big* wart. > > I don't think something like the endgame you've described in > https://public-inbox.org/git/xmqqzhtwuhpc.fsf@gitster-ct.c.googlers.com/ > is ever going to work. Novice git users (the vast majority) are not > going to diligently update both .gitignore and some .gitattribute > mechanism in lockstep. I'd bet most git users haven't read more than a > few paragraphs of our entire documentation at best. > > So what's the way forward? I think ultimately we must move to something > where we effectively version the entire CLI UI similar to stable API > versions. I.e. for things like this that would break some things (or > Duy's new "split checkout") introduce them as flags first, then bundle > up all such flags and cut a major release "Git 3, 4, ...", and > eventually remove old functionality. Related to Duy's new "split chekckout", I just realized that I added --overwrite-ignore (enabled by default) [1] years ago to allow to out out of this behavior. We could turn --no-overwrite-ignore by default on the new command "git switch-branch" to err on the safe side. Then the user could switch to --overwrite-ignore once they learn more about gitignore and gitattributes (or the coming "backup log"). I'm not sure if I really like this, but at least it's one of the options. [1] https://public-inbox.org/git/1322388933-6284-2-git-send-email-pclouds@gmail.com/
On Tue, Nov 27, 2018 at 1:56 PM Jacob Keller <jacob.keller@gmail.com> wrote: > Personally, I would rather err on the side which requires the least > interaction from users to avoid silently clobbering an ignored file. > > Either Duy's solution with a sort of "untracked" reflog, or the > garbage/trashable notion. The "untracked reflog" is partially functional now [1] if you want to have a look. I'm not going to post the series until post-2.20, but if you do look, I suggest the first patch that lays out the design and a plumbing command to manage it. Basically you'll do git backup-log --id=worktree log <some-path> then pick up the version you like with "git backup-log cat" and do whatever you want with it. High level UI is not there and will be a topic of discussion. The precious/trashable/garbage notion can be used to suppress making backups. [1] https://gitlab.com/pclouds/git/commits/backup-log
diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt index 03056dad0d..a9beadfb12 100644 --- a/Documentation/git-clean.txt +++ b/Documentation/git-clean.txt @@ -21,7 +21,8 @@ option is specified, ignored files are also removed. This can, for example, be useful to remove all build products. If any optional `<path>...` arguments are given, only those paths -are affected. +are affected. Ignored or untracked files with `precious` attributes +are not removed. OPTIONS ------- diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index b8392fc330..c722479bdc 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -1188,6 +1188,19 @@ If this attribute is not set or has an invalid value, the value of the (See linkgit:git-config[1]). +Precious files +~~~~~~~~~~~~~~~~~~~~~~~~ + +`precious` +^^^^^^^^^^ + +This attribute is set on files to indicate that their content is +valuable. Many commands will behave slightly different on precious +files. linkgit:git-clean[1] will leave precious files alone. Merging +and branch switching will not silently overwrite ignored files that +are marked "precious". + + USING MACRO ATTRIBUTES ---------------------- diff --git a/attr.c b/attr.c index 60d284796d..d06ca0ae4b 100644 --- a/attr.c +++ b/attr.c @@ -1186,3 +1186,12 @@ void attr_start(void) pthread_mutex_init(&check_vector.mutex, NULL); #endif } + +int is_precious_file(struct index_state *istate, const char *path) +{ + static struct attr_check *check; + if (!check) + check = attr_check_initl("precious", NULL); + git_check_attr(istate, path, check); + return check && ATTR_TRUE(check->items[0].value); +} diff --git a/attr.h b/attr.h index b0378bfe5f..b9a9751a66 100644 --- a/attr.h +++ b/attr.h @@ -82,4 +82,6 @@ void git_attr_set_direction(enum git_attr_direction new_direction); void attr_start(void); +int is_precious_file(struct index_state *istate, const char *path); + #endif /* ATTR_H */ diff --git a/builtin/clean.c b/builtin/clean.c index 8d9a7dc206..9e554448a6 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -17,6 +17,7 @@ #include "color.h" #include "pathspec.h" #include "help.h" +#include "attr.h" static int force = -1; /* unset */ static int interactive; @@ -30,6 +31,8 @@ static const char *const builtin_clean_usage[] = { static const char *msg_remove = N_("Removing %s\n"); static const char *msg_would_remove = N_("Would remove %s\n"); +static const char *msg_skip_precious = N_("Skipping precious file %s\n"); +static const char *msg_would_skip_precious = N_("Would skip precious file %s\n"); static const char *msg_skip_git_dir = N_("Skipping repository %s\n"); static const char *msg_would_skip_git_dir = N_("Would skip repository %s\n"); static const char *msg_warn_remove_failed = N_("failed to remove %s"); @@ -152,6 +155,7 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag, struct dirent *e; int res = 0, ret = 0, gone = 1, original_len = path->len, len; struct string_list dels = STRING_LIST_INIT_DUP; + const char *rel_path; *dir_gone = 1; @@ -191,9 +195,15 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag, strbuf_setlen(path, len); strbuf_addstr(path, e->d_name); - if (lstat(path->buf, &st)) + if (lstat(path->buf, &st)) { ; /* fall thru */ - else if (S_ISDIR(st.st_mode)) { + } else if ((!prefix || skip_prefix(path->buf, prefix, &rel_path)) && + is_precious_file(&the_index, rel_path)) { + quote_path_relative(path->buf, prefix, "ed); + printf(dry_run ? _(msg_would_skip_precious) : _(msg_skip_precious), quoted.buf); + *dir_gone = 0; + continue; + } else if (S_ISDIR(st.st_mode)) { if (remove_dirs(path, prefix, force_flag, dry_run, quiet, &gone)) ret = 1; if (gone) { @@ -1017,7 +1027,10 @@ int cmd_clean(int argc, const char **argv, const char *prefix) if (lstat(abs_path.buf, &st)) continue; - if (S_ISDIR(st.st_mode)) { + if (is_precious_file(&the_index, item->string)) { + qname = quote_path_relative(item->string, NULL, &buf); + printf(dry_run ? _(msg_would_skip_precious) : _(msg_skip_precious), qname); + } else if (S_ISDIR(st.st_mode)) { if (remove_dirs(&abs_path, prefix, rm_flags, dry_run, quiet, &gone)) errors++; if (gone && !quiet) { diff --git a/unpack-trees.c b/unpack-trees.c index 7570df481b..d49fe0f77e 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1895,7 +1895,8 @@ static int check_ok_to_remove(const char *name, int len, int dtype, return 0; if (o->dir && - is_excluded(o->dir, o->src_index, name, &dtype)) + is_excluded(o->dir, o->src_index, name, &dtype) && + !is_precious_file(o->src_index, name)) /* * ce->name is explicitly excluded, so it is Ok to * overwrite it.
Since this topic has come up twice recently, I revisited this "precious" thingy that I started four years ago and tried to see if I could finally finish it. There are a couple things to be sorted out... A new attribute "precious" is added to indicate that certain files have valuable content and should not be easily discarded even if they are ignored or untracked (*). So far there are two parts of Git that are made aware of precious files: "git clean" will leave precious files alone and unpack-trees.c (i.e. merges and branch switches) will not overwrite ignored-but-precious files. Is there any other parts of Git that should be made aware of this "precious" attribute? Also while "precious" is a fun name, but it does not sound serious. Any suggestions? Perhaps "valuable"? Very lightly tested. The patch is more to have something to discuss than is bug free and ready to use. (*) Note that tracked files could be marked "precious" in the future too although the exact semantics is not very clear since tracked files are by default precious. But something like "index log" could use this to record all changes to precious files instead of just "git add -p" changes, for example. So these files are in a sense more precious than other tracked files. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- Documentation/git-clean.txt | 3 ++- Documentation/gitattributes.txt | 13 +++++++++++++ attr.c | 9 +++++++++ attr.h | 2 ++ builtin/clean.c | 19 ++++++++++++++++--- unpack-trees.c | 3 ++- 6 files changed, 44 insertions(+), 5 deletions(-)