
precious-files.txt: new document proposing new precious file type

Message ID pull.1627.git.1703643931314.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Series precious-files.txt: new document proposing new precious file type

Commit Message

Elijah Newren Dec. 27, 2023, 2:25 a.m. UTC
From: Elijah Newren <newren@gmail.com>

We have traditionally considered all ignored files to be expendable, but
users occasionally want ignored files that are not considered
expendable.  Add a design document covering how to split ignored files
into two types: 'trashable' (what all ignored files are currently
considered) and 'precious' (the new type of ignored file).

Helped-by: Sebastian Thiel <sebastian.thiel@icloud.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
    precious-files.txt: new document proposing new precious file type
    
    A couple months ago, we had another in-depth discussion of precious
    files[1]. As with previous times, multiple strategies were discussed
    (including multiple new ones), meaning we keep making the possible
    solution space wider and never nail down an agreed path. I also got the
    feeling we were potentially pigeonholing on a subset of the problem
    space, and I thought it'd be good to better enumerate what areas of Git
    are affected.
    
    So, I went through the exercise of creating a design document to: (1)
    provide a specific design proposal and explore it, (2) cover at a high
    level the breadth of issues that an implementor needs to at least think
    about and which reviewers should be aware of in terms of readiness of a
    potential implementation, and (3) provide links to other discussions and
    alternative proposals for completeness.
    
    I had some off-list discussions with Sebastian about this proposal, and
    he provided some helpful feedback. The idea at this point is that, if
    folks agree with the general direction, he is going to implement at
    least the first-cut basic capability. I'll help review
    changes, but I'm mostly interested in avoiding unfortunate surprises.
    
    So...does the proposed direction seem reasonable to folks?
    
    [1]
    https://lore.kernel.org/git/79901E6C-9839-4AB2-9360-9EBCA1AAE549@icloud.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1627%2Fnewren%2Fprecious-files-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1627/newren/precious-files-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1627

 Documentation/technical/precious-files.txt | 540 +++++++++++++++++++++
 1 file changed, 540 insertions(+)
 create mode 100644 Documentation/technical/precious-files.txt


base-commit: 564d0252ca632e0264ed670534a51d18a689ef5d

Comments

Junio C Hamano Dec. 27, 2023, 5:28 a.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> We have traditionally considered all ignored files to be expendable, but
> users occasionally want ignored files that are not considered
> expendable.  Add a design document covering how to split ignored files
> into two types: 'trashable' (what all ignored files are currently
> considered) and 'precious' (the new type of ignored file).

The proposed syntax is a bit different from what I personally prefer
(which is Phillip's [P14] or something like it), but I consider that
the more valuable parts of this document are about how various
commands ought to interact with precious paths, which shouldn't
change regardless of the syntax.

Thanks for putting this together.
Elijah Newren Dec. 27, 2023, 6:54 a.m. UTC | #2
On Tue, Dec 26, 2023 at 9:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > We have traditionally considered all ignored files to be expendable, but
> > users occasionally want ignored files that are not considered
> > expendable.  Add a design document covering how to split ignored files
> > into two types: 'trashable' (what all ignored files are currently
> > considered) and 'precious' (the new type of ignored file).
>
> The proposed syntax is a bit different from what I personally prefer
> (which is Phillip's [P14] or something like it), but I consider that
> the more valuable parts of this document are about how various
> commands ought to interact with precious paths, which shouldn't
> change regardless of the syntax.

I agree that syntax and command behavior are mostly separate issues,
but unfortunately they are not orthogonal.  In particular, syntax of
precious file specification is directly tied to fallback behavior for
older Git clients, and it might potentially affect backward
compatibility of non-cone-mode sparse-checkout syntax as well.

I think fallback behavior is of particular importance.  There are
precisely two choices in our design for how older Git versions can
treat precious files:
  * ignored-and-expendable
  * untracked-and-precious
If we pick syntax that causes older Git versions to treat precious
files as ignored-and-expendable, we risk deleting important files.
Alternatively, if we pick syntax that causes older Git versions to
treat precious files as untracked-and-precious, they won't be ignored
by e.g. git-status, and are easier to accidentally add with git-add.

I felt the "precious" bit was much more important than the "ignored"
bit of "precious" files, so I thought untracked-and-precious was a
better fallback.  However, to get that, we have to rule out lots of
the syntax proposals, such as Phillip's [P14].

Anyway, I'm open to alternative syntax, but we need to measure it
against the relevant criteria, which I believe are:
  (A) ease for users to understand, remember, and use
  (B) size of backward compatibility break with .gitignore syntax
  (C) appropriateness of implied fallback behavior for older Git
      clients with precious files
  (D) room for additional extension in .gitignore files
  (E) potential effects on backward compatibility of non-cone
      sparse-checkout syntax
We probably also need to agree on the relative importance of these
criteria; personally, I would probably order them from most important
to least as C, B, E, A, D.

Phillip's P14 is better for D, and perhaps a little better for B, but
I thought slightly worse for A, and much worse for C.  (I think
there's no significant relative difference for E between his proposed
syntax and mine.)

> Thanks for putting this together.
Junio C Hamano Dec. 27, 2023, 10:15 p.m. UTC | #3
Elijah Newren <newren@gmail.com> writes:

> There are
> precisely two choices in our design for how older Git versions can
> treat precious files:
>   * ignored-and-expendable
>   * untracked-and-precious
> If we pick syntax that causes older Git versions to treat precious
> files as ignored-and-expendable, we risk deleting important files.

Yes but not really.  I'd expect the adoption of precious feature and
the adoption of versions of Git that support that feature will go
more or less hand in hand.  Projects that, for any reason, need to
keep their participants at pre-precious versions of Git would
naturally refrain from marking the "precious" paths in their "ignore"
mechanism before their participants are ready, so even if we chose
syntax that will make the precious ones mistaken as merely ignored,
the damage would be fairly small.
Sebastian Thiel Jan. 18, 2024, 7:51 a.m. UTC | #4
I thought it would be helpful to see the syntax being referred to here,
as first brought up by Phillip Wood:

#(keep)
/my-precious-file

The main benefit I see for it is that it's extensible, even though I
have trouble imagining what such an extension would be 10 years from now.
On top of that, since it's already using a comment, people will
be even more inclined to document the reason for the preciousness
of the file.

# The kernel configuration, typically created by running a TUI program
#(keep)
.config

As a side-effect of the syntax, it's obvious this is an 'upgrade', with
perfect backwards compatibility as old git does the same as always.
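To make this concrete, here is a rough C sketch of how a reader of the "#(keep)" syntax might behave. It is a hypothetical illustration, not Git's dir.c: `classify_keep` and `enum path_kind` are made-up names, patterns are matched with POSIX fnmatch(3), '!' negation and blank lines are not handled, and it assumes a "#(keep)" line applies only to the single pattern that follows it. With `knows_keep` set to 0 it behaves like an older Git, for which "#(keep)" is just a comment and ".config" is plain ignored (and therefore expendable); with 1 it marks ".config" precious.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical categories, named after the design document's terms. */
enum path_kind { PATH_UNTRACKED, PATH_IGNORED, PATH_PRECIOUS };

/*
 * Classify an untracked `path` against the lines of one .gitignore.
 * Simplified model only: no '!' negation, no directory/anchoring rules.
 * Assumption: "#(keep)" affects exactly the next pattern line.
 */
static enum path_kind classify_keep(const char *const *lines, size_t n,
				    const char *path, int knows_keep)
{
	enum path_kind kind = PATH_UNTRACKED;	/* no match: untracked */
	int keep_next = 0;	/* set by "#(keep)", consumed by next pattern */

	assert(lines != NULL && path != NULL);
	for (size_t i = 0; i < n; i++) {
		const char *line = lines[i];

		if (line[0] == '#') {
			/* An older Git skips every '#' line as a comment. */
			if (knows_keep && strcmp(line, "#(keep)") == 0)
				keep_next = 1;
			continue;
		}
		if (fnmatch(line, path, 0) == 0)
			kind = keep_next ? PATH_PRECIOUS : PATH_IGNORED;
		keep_next = 0;	/* the directive covers one pattern only */
	}
	return kind;
}
```

For the three-line example above, an older Git reports ".config" as ignored, while a Git that understands "#(keep)" reports it as precious.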

I'd love to take first steps into the implementation, and if the above
should be the syntax to use, I'd be happy to submit a patch for parsing
it, along with initial support for precious files in `git clean` and
`git status`.

Does that sound like a reasonable next step?


Junio C Hamano Jan. 18, 2024, 7:14 p.m. UTC | #5
Sebastian Thiel <sebastian.thiel@icloud.com> writes:

> #(keep)
> .config
>
> As a side-effect of the syntax, it's obvious this is an 'upgrade', with
> perfect backwards compatibility as old git does the same as always.

Yes but ...

The point Elijah makes is worth considering.  To existing versions
of git, having this entry for ".config" means that it is ignored
(i.e. "git add ." will not include it), but expendable (i.e. "git
clean" considers ".config" as a candidate for removal; "git checkout
other", if the "other" branch has it as a tracked path, will clobber
it).  Compare that with the case where ".config" is not mentioned in
".gitignore" at all: there it may be added by "git add .", but it
won't be clobbered by "git clean".

So this syntax having "perfect backward compatibility" is not quite
true.  It does have downsides when used by existing versions of Git.

If we use Elijah's syntax to say

	$.config

then the entry to existing versions of git is a no-op wrt a file
named ".config".  It simply does not match the pattern, so an
accidental "git add ." *will* add ".config" to the index, while "git
clean" may not touch it, simply because it is treated as "untracked
and precious".  In other words, its downside is the same as not
marking the path ".config" in any way in ".gitignore", as far as
existing versions of Git are concerned.

We of course discount the possibility that people keep a file whose
name literally is dollar-dot followed by "config", which older versions
of Git would start treating as ignored-and-expendable.  While
it *is* an additional downside compared to Phillip's "#(keep)"
approach, I do not think that particular downside is worth worrying
about.  Yet another downside compared to Phillip's is that it is
less extensible.  Over the years, however, ignored-but-precious
is the only category whose absence we have heard users say is hurting
them, so lack of extensibility may not be too huge a deal.

For projects that are currently listing these files in ".gitignore"
as "ignored-and-expendable" already and want to categorize them as
"ignored-and-precious" by changing ".config" to "$.config" (or
adding "#(keep)" comment before the existing entry), the
pros-and-cons equation may differ.  Their current participants are
protected from accidentally adding them with "git add ." but risk
losing them with "git clean -f".  They may even be trained to
check the "git clean -n" output carefully before actually running the
command with "-f".  Now, if their project ships a new version of
".gitignore" that marks these paths as "ignored-and-precious", both
approaches will have the intended effect for participants who have
upgraded to a version of Git that supports it.

To participants using the current version of Git:

 * Phillip's approach to add "#(keep)" will not change anything.
   They will be protected from accidental "git add ." as before, and
   they will still have to be careful about "git clean -f".

 * Elijah's approach to rewrite the existing '.config' to '$.config',
   however, will stop protecting them from "git add .", even though
   it will start protecting them from "git clean -f".

The devil you already know may be the lesser of two evils in such a
situation.

So, all it boils down to is these two questions.

 * Which one between "'git add .' adds '.config' that users did not
   want to add" and "'git clean -f' removes '.config' together with
   other files" is a larger problem for the users of older versions of
   Git that predate "precious" who participate in a project that has
   already decided to use the new .gitignore feature to mark ".config"
   as "precious"?

 * What are projects doing to paths that they want to make
   "precious" with the current system?  Do they leave them out of
   ".gitignore" and have them subject to accidental "git add ." to
   protect them from "git clean -f"?  Or do they list them in
   ".gitignore" to prevent "git add ." from touching, but leave them
   susceptible to accidental removal by "git clean -f"?

Thanks.
Sebastian Thiel Jan. 18, 2024, 9:33 p.m. UTC | #6
Thanks so much for the analysis, as seeing the problem of choosing
a syntax from the perspective of its effects when using common commands
like "git add" and "git clean -f" seems very promising!

When thinking about "git add ." vs "git clean -f" one difference comes to
mind: "git clean -f" is much less desirable because it's fatal. "git add ."
on the other hand leaves room for correction, even when used with "git commit -a"
(and with the exception of "git commit -am 'too late'").

From that point of view I'd naturally prefer the "$.config" syntax as it
will turn precious files into untracked ones for current Git.

>  * Which one between "'git add .' adds '.config' that users did not
>    want to add" and "'git clean -f' removes '.config' together with
>    other files" is a larger problem for the users of older versions of
>    Git that predate "precious" who participate in a project that has
>    already decided to use the new .gitignore feature to mark ".config"
>    as "precious"?
>

If users should have a choice, then both syntaxes could also be allowed
to let them choose what to optimise for.

Doing so might be less relevant in the `.config` case, but more relevant
for ignored files like ".env" or ".env.secret", which must not be tracked
under any circumstances.

>  * What are projects doing to paths that they want to make
>    "precious" with the current system?  Do they leave them out of
>    ".gitignore" and have them subject to accidental "git add ." to
>    protect them from "git clean -f"?  Or do they list them in
>    ".gitignore" to prevent "git add ." from touching, but leave them
>    susceptible to accidental removal by "git clean -f"?

I did hear that some projects use Makefiles with specially configured
"git clean" invocations that "--exclude" precious files.
So far I haven't encountered one that uses such a technique to prevent
"git add" from tracking too much, though.

To my mind, in order to support projects with both ".config" and
".env.secret" they would have to be given a choice of which syntax
to use, e.g.

    # This file shouldn't accidentally be deleted by `git clean`
    $.config

    # These files should never be accidentally tracked
    #(keep)
    .env*


Elijah Newren Jan. 19, 2024, 2:37 a.m. UTC | #7
On Thu, Jan 18, 2024 at 1:33 PM Sebastian Thiel
<sebastian.thiel@icloud.com> wrote:
>
> Thanks so much for the analysis, as seeing the problem of choosing
> a syntax from the perspective of its effects when using common commands
> like "git add" and "git clean -f" seems very promising!
>
> When thinking about "git add ." vs "git clean -f" one difference comes to
> mind: "git clean -f" is much less desirable because it's fatal. "git add ."
> on the other hand leaves room for correction, even when used with "git commit -a"
> (and with the exception of "git commit -am 'too late'").

"git commit -a" and "git commit -am 'too late'", by themselves, will
only commit changes to already-tracked files.  So they wouldn't be
problematic alone.

But perhaps the -a was distracting and you were thinking of "git add .
&& git commit -m whatever".  That does remove the chance to correct
before creating a commit, but I don't think it's too bad either.  Even
though it skips the chance to catch the problem pre-commit, there's
still time to review & correct before publishing for patch review (or
PR review or MR review or whatever you want to call it).  And, even if
published for patch review, it can still be caught & corrected by
those doing patch review as well.

So, I just don't see the "accidental add" problem as being very
severe; there are so many chances to catch and correct it.

> To my mind, in order to support projects with both ".config" and
> ".env.secret" they would have to be given a choice of which syntax
> to use, e.g.
>
>     # This file shouldn't accidentally be deleted by `git clean`
>     $.config
>
>     # These files should never be accidentally tracked
>     #(keep)
>     .env*

Reminds me of https://www.emacswiki.org/pics/static/TabsSpacesBoth.png

;-)

Besides, if for a specific file or filetype, accidental additions are
more important to protect against than accidental nuking, then can't
folks achieve that by simply using

    # Don't let older git versions add the file
    .env.secret

    # For newer git versions, override the above; treat it as precious
    # (i.e. don't add AND don't accidentally nuke)
    $.env.secret

In contrast, if protection against accidental nuking is more important
for certain files, one can use just the second line without the first.

And, whether you have a file with both lines or just the second line,
newer git versions will protect against both accidental nuking and
accidental adding.
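The effect of these two lines can be sketched in C. This is a simplified, hypothetical model rather than Git's actual matching code: `classify` and `enum path_kind` are invented names, patterns are matched with POSIX fnmatch(3), and '!' negation, '#' comments, directory rules and anchoring are all left out. It only demonstrates the "last matching pattern wins" behavior, with `knows_precious` choosing between an older Git (which sees '$' as an ordinary character) and a Git that understands the proposed '$' prefix.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical categories, named after the design document's terms. */
enum path_kind { PATH_UNTRACKED, PATH_IGNORED, PATH_PRECIOUS };

/*
 * Classify an untracked `path` against the pattern lines of one
 * .gitignore; a later matching line overrides an earlier one.
 * knows_precious == 0: older Git, a leading '$' is a literal character.
 * knows_precious == 1: proposed syntax, '$' marks the pattern precious.
 */
static enum path_kind classify(const char *const *lines, size_t n,
			       const char *path, int knows_precious)
{
	enum path_kind kind = PATH_UNTRACKED;	/* no match: untracked */

	assert(lines != NULL && path != NULL);
	for (size_t i = 0; i < n; i++) {
		const char *pattern = lines[i];
		enum path_kind on_match = PATH_IGNORED;

		if (knows_precious && pattern[0] == '$') {
			pattern++;	/* strip the marker before matching */
			on_match = PATH_PRECIOUS;
		}
		if (fnmatch(pattern, path, 0) == 0)
			kind = on_match;	/* last match wins */
	}
	return kind;
}
```

With both lines present, an older Git reports ".env.secret" as ignored and a newer Git reports it as precious. A file containing only "$.config" leaves ".config" untracked for an older Git and precious for a newer one.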

In contrast...

Phillip's syntax provides no way to achieve treating accidental nuking
as more important than accidental adding; it can only handle
protection against accidental adding in older Git versions.  And, as I
discussed above, the accidental add problem seems much less severe and
is thus the less important problem to protect against.
Elijah Newren Jan. 19, 2024, 2:58 a.m. UTC | #8
Hi,

On Thu, Jan 18, 2024 at 11:14 AM Junio C Hamano <gitster@pobox.com> wrote:
>
[...]
> So, all it boils down to is these two questions.

Thanks for summarizing this.

>  * Which one between "'git add .' adds '.config' that users did not
>    want to add" and "'git clean -f' removes '.config' together with
>    other files" is a larger problem for the users of older versions of
>    Git that predate "precious" who participate in a project that has
>    already decided to use the new .gitignore feature to mark ".config"
>    as "precious"?

Accidental "git add ." comes with 3 opportunities to correct the
problem before it becomes permanent: before committing, after
committing but before pushing, and after publishing for patch review
(where it can even be caught by third parties) but before the
patch/PR/MR is accepted and included.  At each stage there's a chance
to go back and correct the problem.

Accidental nuking of a file (via either git clean or git checkout or
git merge or whatever), cannot be reviewed or corrected; it's
immediately too late.  And given that we're calling this feature
"precious", that seems a little extra unfortunate.

>  * What are projects doing to paths that they want to make
>    "precious" with the current system?  Do they leave them out of
>    ".gitignore" and have them subject to accidental "git add ." to
>    protect them from "git clean -f"?  Or do they list them in
>    ".gitignore" to prevent "git add ." from touching, but leave them
>    susceptible to accidental removal by "git clean -f"?

Good questions; I have no answers to these.

However, on a closely related note, in my response to Sebastian I
point out that the '$' syntax permits individual teams to prioritize
avoiding either accidental deletions or accidental adds on a filename
or glob granularity, so if folks are concerned with handling by older
Git versions or are just extra concerned with certain files, they can
optimize accordingly.  Sadly, the '#(keep)' syntax does not permit
such prioritization and always treats avoiding accidental adds as the
priority (which, in my opinion, is the less important one to generally
prioritize).
Sebastian Thiel Jan. 19, 2024, 7:51 a.m. UTC | #9
Yes, indeed I was a little confused when making the "git commit..." based examples,
thanks for correcting them.

>
> Reminds me of https://www.emacswiki.org/pics/static/TabsSpacesBoth.png
>
> ;-)
>


Phillip Wood Jan. 19, 2024, 4:53 p.m. UTC | #10
Hi Elijah

On 19/01/2024 02:58, Elijah Newren wrote:
> On Thu, Jan 18, 2024 at 11:14 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
> [...]
>> So, all it boils down to is these two questions.
> 
> Thanks for summarizing this.

Yes, thank you Junio - I found it very helpful as well

>>   * Which one between "'git add .' adds '.config' that users did not
>>     want to add" and "'git clean -f' removes '.config' together with
>>     other files" is a larger problem for the users of older versions of
>>     Git that predate "precious" who participate in a project that has
>>     already decided to use the new .gitignore feature to mark ".config"
>>     as "precious"?
> 
> Accidental "git add ." comes with 3 opportunities to correct the
> problem before it becomes permanent: before committing, after
> committing but before pushing, and after publishing for patch review
> (where it can even be caught by third parties) but before the
> patch/PR/MR is accepted and included.  At each stage there's a chance
> to go back and correct the problem.

If you've added a secret then catching it after you've published the 
patch for review is likely to be too late. I agree there are a couple of 
chances to catch it before that though.

> Accidental nuking of a file (via either git clean or git checkout or
> git merge or whatever), cannot be reviewed or corrected; it's
> immediately too late.

Indeed, though "git clean" requires the user to pass a flag before it
will delete anything and does have a dry-run mode to check what's going to
happen, so there is an opportunity for users to avoid accidental deletions.

> [...] 
> However, on a closely related note, in my response to Sebastian I
> point out that the '$' syntax permits individual teams to prioritize
> avoiding either accidental deletions or accidental adds on a filename
> or glob granularity, so if folks are concerned with handling by older
> Git versions or are just extra concerned with certain files, they can
> optimize accordingly.

That is an advantage. I do worry that the '$' syntax is unintuitive and 
will further add to the impression that git is hard to use. I think the 
choice comes down to how much we are worried about the way older versions
of git treat ".gitignore" files with the new syntax.

While I can see it would be helpful to settle the syntax question I 
think parsing the new syntax is a relatively small part of the work that 
needs to be done to implement precious files.

Best Wishes

Phillip
Junio C Hamano Jan. 19, 2024, 5:17 p.m. UTC | #11
Phillip Wood <phillip.wood123@gmail.com> writes:

> If you've added a secret then catching it after you've published the
> patch for review is likely to be too late. I agree there are a couple
> of chances to catch it before that though.

Yes, this is one of the two remaining things that still make me a
bit worried about the "$.config" syntax.

> Indeed, though "git clean" requires the user to pass a flag before it
> will delete anything and does have a dry-run mode to check what's going to
> happen so there is an opportunity for users to avoid accidental
> deletions.

True, too.

The other one that still makes me a bit worried about the "$.config"
syntax is what I called "the devil you already know" that is
applicable only for participants of a project that currently mark
precious files as ignored, to avoid the accidental "git add ." of
secrets.

I think we already are in agreement that all other points (aside
from possible ergonomics preferences and future extensibility, both of
which feel a lot more minor) raised during this discussion are in favor
of the "$.config" syntax.

> While I can see it would be helpful to settle the syntax question I
> think parsing the new syntax is a relatively small part of the work
> that needs to be done to implement precious files.

True.  The parser can be isolated and it should be relatively easy
to revamp.  My current preference is to (at least) tentatively agree
on using the "$.config" syntax, which would allow us to update
dir.c:parse_path_pattern(), and that would make it possible for us
to start adjusting dir.c:is_excluded(), adding is_precious() next to
it, and adjusting all current callers of the former.

Thanks.
Junio C Hamano Jan. 19, 2024, 6:45 p.m. UTC | #12
Sebastian Thiel <sebastian.thiel@icloud.com> writes:

> I am glad I can pull my initial proposition of 'having both syntaxes' off
> the table to side with this version - it's gorgeous.
>
> It's easy to forget that the search-order when matching ignore patterns
> is back to front, which makes this 'trick' work.

The true gem is not the search-order, though.  It is the "last one
wins" rule.  Back to front search is merely an implementation detail
to optimize the search so that we can stop at the first hit ;-)
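A quick C sketch of why the search direction does not matter: scanning every pattern front to back and remembering the last hit gives the same answer as scanning back to front and stopping at the first hit. The function names are hypothetical and the patterns are matched with POSIX fnmatch(3); negation and the precious marker are left out since only the ordering is being illustrated.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* "Last one wins": visit every pattern, remember the last match.
 * Returns the index of that pattern, or -1 if nothing matches. */
static long last_match_forward(const char *const *pats, size_t n,
			       const char *path)
{
	long hit = -1;

	assert(path != NULL);
	for (size_t i = 0; i < n; i++)
		if (fnmatch(pats[i], path, 0) == 0)
			hit = (long)i;
	return hit;
}

/* Same result, but scan back to front and stop at the first hit. */
static long last_match_backward(const char *const *pats, size_t n,
				const char *path)
{
	assert(path != NULL);
	for (size_t i = n; i > 0; i--)
		if (fnmatch(pats[i - 1], path, 0) == 0)
			return (long)(i - 1);
	return -1;
}
```

With the patterns "*.env", "secret.env" and "*.log", both functions return 1 for "secret.env", i.e. the later of the two matching patterns; the backward scan simply gets there without looking at "*.env".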

> If the insights gained with the last couple of emails would see their digest
> in the user-facing documentation, I think precious files wouldn't only become
> usable but would also allow projects to make their choice during
> the transition period during which some users will inevitably access the repository
> with a Git that doesn't know about precious files yet.

OK.
Elijah Newren Jan. 24, 2024, 6:50 a.m. UTC | #13
Hi Phillip,

On Fri, Jan 19, 2024 at 8:53 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
[...]
> >>   * Which one between "'git add .' adds '.config' that users did not
> >>     want to add" and "'git clean -f' removes '.config' together with
> >>     other files" is a larger problem for the users of older versions of
> >>     Git that predate "precious" who participate in a project that has
> >>     already decided to use the new .gitignore feature to mark ".config"
> >>     as "precious"?
> >
> > Accidental "git add ." comes with 3 opportunities to correct the
> > problem before it becomes permanent: before committing, after
> > committing but before pushing, and after publishing for patch review
> > (where it can even be caught by third parties) but before the
> > patch/PR/MR is accepted and included.  At each stage there's a chance
> > to go back and correct the problem.
>
> If you've added a secret then catching it after you've published the
> patch for review is likely to be too late. I agree there are a couple of
> chances to catch it before that though.

Ah, good point.

> > Accidental nuking of a file (via either git clean or git checkout or
> > git merge or whatever), cannot be reviewed or corrected; it's
> > immediately too late.
>
> Indeed, though "git clean" requires the user to pass a flag before it
> will delete anything and does have a dry-run mode to check what's going to
> happen so there is an opportunity for users to avoid accidental deletions.

Yes, good point again for "git clean"; it does have one level of check
before the operation users can take advantage of.  The same cannot be
said for the files nuked by checkout/merge/rebase/cherry-pick, though.

> > [...]
> > However, on a closely related note, in my response to Sebastian I
> > point out that the '$' syntax permits individual teams to prioritize
> > avoiding either accidental deletions or accidental adds on a filename
> > or glob granularity, so if folks are concerned with handling by older
> > Git versions or are just extra concerned with certain files, they can
> > optimize accordingly.
>
> That is an advantage. I do worry that the '$' syntax is unintuitive and
> will further add to the impression that git is hard to use. I think the
> choice comes down how much we are worried about the way older versions
> of git treat ".gitignore" files with the new syntax.

Interesting, I thought the mixture of '!' as a prefix and '#(keep)' as
a previous-line directive would be somewhat inconsistent and add
further to the impression that git is hard to use, though I can also
see your point that '$' as a prefix can as well.

> While I can see it would be helpful to settle the syntax question I
> think parsing the new syntax is a relatively small part of the work that
> needs to be done to implement precious files.

Oh, I agree it's a small part of the work, but as stated previously,
I'm not doing that work (Sebastian is).  I'm just trying to help avoid
getting unintended consequences in the design, and to me this is an
important edge case to consider, get an agreement on, and document in
some fashion.

Anyway, Junio seems to have weighed in with a tentative path forward,
and everyone has been very good about bringing up additional
considerations around this issue that are worth documenting in the
design document, so I'll try to put together an update soon-ish.
Sebastian Thiel Feb. 11, 2024, 10:08 p.m. UTC | #14
I didn't know where I would best reply to give an update on my work
on precious file support, but here I go.

On my journey towards daring to implement precious files in Git, I decided
to implement them in Gitoxide first to ease myself into it.

After what felt like months of work on the Gitoxide-equivalent of
dir.c, it just took 2 days to cobble together a 'gix clean' with
precious files support.

You might say that something as destructive as a 'clean' subcommand
had better not be rushed, but it was surprisingly straightforward
to implement. It was so inviting, even, that I could spend the second
day, today, entirely on polishing, yielding a 'gix clean' which is
fun to use, with some extras I never knew I wanted until I had full
control over it and could play around easily.

By the way, what I immediately found myself doing was adjusting the
`.gitignore` files of the project to have precious declarations right
after their non-precious counterparts, for backwards compatibility.

It works perfectly, from what I can tell, and it is truly wonderful
to be able to wipe a repo clean without fear of destroying anything
valuable. I am aware that we all know that, but I wanted to write it
down to underline how psychologically valuable this feature is.

Without further ado, I invite you all to give it a go yourselves,
perhaps for a first experience with precious files:

    git clone https://github.com/Byron/gitoxide
    cd gitoxide
    cargo build --release --bin gix --no-default-features --features max-pure
    target/release/gix clean

This should do the trick - from there the program should guide the
user.

If you want to see some more interesting features besides precious
files, you can run 'cargo test -p gix' and follow the 'gix clean -xd'
instructions along with the `--debug` flag.

A word about performance: it is slower.
It started out only about 1% slower, even on the biggest repositories
and under optimal conditions (i.e. precomposeUnicode and ignoreCase off
and skipHash true). But as I improved correctness and added features,
that was lost, and it's now about 15% slower on bigger repositories.

I appended a benchmark run on the Linux kernel at the end, and it shows
that Gitoxide definitely spends more time in userland. I can only
assume that some performance was lost when I started to deviate from
the 'only do the work you need' recipe that I learned from Git to
'always provide a consistent set of information about directory entries'.

On top of that, there are multiple major shortcomings in this realm:

- Gitoxide doesn't actually get faster when reading indices with multiple
  threads for some reason.
- the icase-hashtable is created only with a single thread.
- the precompose-unicode conversion is very slow and easily costs 25%
  performance.

But those are details, some of which you can see yourself when running
'gix --trace -v clean'.

Now I hope you will have fun trying 'gix clean' with precious files in your
repositories. Also, I am particularly interested in learning how it fares
in situations where you know 'git clean' might have difficulties.
I tried very hard to achieve correctness, and any problem you find
will be fixed ASAP.

With this experience, I think I am in a good position to get precious
files support for 'git clean' implemented, once I get to make the start.

Cheers,
Sebastian

----

Here is the benchmark result (and before I forget, Gitoxide also uses about 25% more memory
for some reason, so it really has some catching up to do eventually):

linux (ffc2532) +369 -819 [!] took 2s
❯ hyperfine -N -w1 -r4  'gix clean -xd --skip-hidden-repositories=non-bare' 'gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare' 'git clean -nxd'
Benchmark 1: gix clean -xd --skip-hidden-repositories=non-bare
  Time (mean ± σ):     171.7 ms ±   3.0 ms    [User: 70.4 ms, System: 101.4 ms]
  Range (min … max):   167.4 ms … 174.2 ms    4 runs

Benchmark 2: gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
  Time (mean ± σ):     156.3 ms ±   3.1 ms    [User: 56.9 ms, System: 99.3 ms]
  Range (min … max):   154.1 ms … 160.8 ms    4 runs

Benchmark 3: git clean -nxd
  Time (mean ± σ):     138.4 ms ±   2.7 ms    [User: 40.5 ms, System: 103.7 ms]
  Range (min … max):   136.1 ms … 142.0 ms    4 runs

Summary
  git clean -nxd ran
    1.13 ± 0.03 times faster than gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
    1.24 ± 0.03 times faster than gix clean -xd --skip-hidden-repositories=non-bare


On 27 Dec 2023, at 6:28, Junio C Hamano wrote:

> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Elijah Newren <newren@gmail.com>
>>
>> We have traditionally considered all ignored files to be expendable, but
>> users occasionally want ignored files that are not considered
>> expendable.  Add a design document covering how to split ignored files
>> into two types: 'trashable' (what all ignored files are currently
>> considered) and 'precious' (the new type of ignored file).
>
> The proposed syntax is a bit different from what I personally prefer
> (which is Phillip's [P14] or something like it), but I consider that
> the more valuable parts of this document is about how various
> commands ought to interact with precious paths, which shouldn't
> change regardless of the syntax.
>
> Thanks for putting this together.

Patch

diff --git a/Documentation/technical/precious-files.txt b/Documentation/technical/precious-files.txt
new file mode 100644
index 00000000000..05c205b57bb
--- /dev/null
+++ b/Documentation/technical/precious-files.txt
@@ -0,0 +1,540 @@ 
+Precious Files Design Document
+==============================
+
+Table of Contents
+  * Objective
+  * Background
+    * File categorization exceptions
+  * Proposal
+    * Precious file specification
+    * Breakdown of suggested behaviors by command
+  * Backward compatibility notes
+    * Slightly incompatible syntax
+    * Interaction with sparse-checkout parsing
+    * Behavior of traditional flags
+    * Interaction with older Git clients
+    * Commands with modified meaning
+  * Implementation hints
+    * Data structures
+    * Code areas
+    * Minimum
+  * Out of scope
+  * Previous discussions
+  * Alternatives considered
+
+Objective
+---------
+Support "Precious" Files in git, a set of files which are considered
+ignored (e.g. do not show up in "git status" output) but are not expendable
+(thus won't be removed to make room for a file when switching or merging
+branches).
+
+Background
+----------
+In git we have different types of files, with various subdivisions:
+  * tracked
+    * present (i.e. part of sparse checkout)
+    * not present (i.e. not part of sparse checkout)
+  * not tracked
+    * ignored (also treated as expendable)
+    * untracked (more precisely, not-tracked-and-not-ignored, but often
+      referred to as simply "untracked" despite the fact that such a term
+      is easily mistaken as a synonym for "not tracked".  However, we haven't
+      been fully consistent, and some places like `git ls-files --others`
+      may use "untracked" to refer to the larger not-tracked category).
+      Not considered expendable.
+
+Over the years, the fact that ignored files are unconditionally treated as
+expendable (so that other operations like git checkout might wipe them out
+to make room for files on the other branch) has occasionally caused
+problems.  Many have expressed a desire for subdividing the ignored class,
+so that we have both ignored-and-expendable (possibly referred to as
+"trashable", covering the only type of ignored file we have today) and
+introducing ignored-and-not-expendable (often referred to as "precious").
+
+File categorization exceptions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Our division above into nice categories is actually a bit of a lie.
+
+Once upon a time untracked files were considered expendable[1].  Even after
+that changed, we still had lots of edge cases where untracked files were
+deleted when they shouldn't be, and ignored files weren't deleted when they
+should be[2].  While that has been (mostly) fixed, despite the general
+intent to preserve untracked files, we have special cases that are
+documented as not preserving them[4,5].  There are also a few codepaths
+with comments noting that they might (or definitely do) erroneously
+delete untracked paths[6].  There is also at least one code path, not yet
+commented, that is known to erroneously delete untracked paths:
+`git checkout <tree> <pathspec>`.  There may be others.
+
+[1] https://lore.kernel.org/git/CABPp-BFyR19ch71W10oJDFuRX1OHzQ3si971pMn6dPtHKxJDXQ@mail.gmail.com/
+[2] https://lore.kernel.org/git/pull.1036.v3.git.1632760428.gitgitgadget@gmail.com/
+[3] https://lore.kernel.org/git/de416f887d7ce24f20ad3ad4cc838394d6523635.1632760428.git.gitgitgadget@gmail.com/
+[4] https://lore.kernel.org/git/xmqqr1e2ejs9.fsf@gitster.g/
+[5] https://lore.kernel.org/git/de416f887d7ce24f20ad3ad4cc838394d6523635.1632760428.git.gitgitgadget@gmail.com/
+[6] https://lore.kernel.org/git/6b42a80bf3d46e16980d0724e8b07101225239d0.1632760428.git.gitgitgadget@gmail.com/
+
+This history and these exceptions matter to this proposal because:
+  * it highlights how much work can be involved in trying to treat a class
+    of files as not expendable
+  * the existing corner cases where untracked files are erroneously
+    treated as expendable will probably also double as corner cases where
+    precious files are treated as expendable
+  * the past fixes for treating untracked files as precious will likely
+    highlight the needed types of code changes to treat ignored files as
+    precious
+
+Proposal
+--------
+We propose adding another class of files: ignored-but-not-expendable,
+referred to by the shorthand of "precious".  The proposal is simple at a
+high level, but there are many details to consider:
+  * How to specify precious files (extended .gitignore syntax?  attributes?)
+  * Which commands should be modified, and how?
+  * How to handle flags that are essentially a partial implementation of
+    a precious capability (e.g. --[no-]overwrite-ignore)?
+  * How will older Git clients behave on a repo with precious files?
+The subsequent sections will try to address these questions in more detail.
+
+One thing to highlight here is that the class formerly called
+`ignored` now has two subtypes: (1) the type we already have,
+ignored-and-expendable (sometimes referred to below as "trashable")
+and (2) the new type, ignored-and-not-expendable (referred to as
+"precious").
+
+Precious file specification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As per [P2]:
+
+    """
+    Even though I referred to the precious _attribute_ in some of these
+    discussions, between the attribute mechanism and the ignore
+    mechanism, I am actually leaning toward suggesting to extend the
+    exclude/ignore mechanism to introduce the "precious" class.  That
+    way, we can avoid possible snafu arising from marking a path in
+    .gitignore as ignored, and in .gitattrbutes as precious, and have to
+    figure out how these two settings are to work together.
+    """
+
+we specify precious files via an extension to .gitignore.  In particular,
+lines starting with a '$' character specify that the file is precious.
+For example:
+  $.config
+would say the file `.config` is precious.
+
+Now that there are three types of files specified by .gitignore files --
+untracked, trashable (ignored-and-expendable), and precious
+(ignored-and-not-expendable), the meaning of `!` at the beginning of a line
+needs careful clarification.  It could be seen as "not ignored" or as "not
+trashable", given the subdivision of ignored files that has occurred.  We
+specifically take it to mean "not ignored", i.e. "untracked".
+
+This leaves us with a simple set of rules to provide to users about lines
+in their '.gitignore' file:
+  * No special prefix character => ignored-and-expendable ("trashable")
+  * A '$' prefix character      => ignored-and-not-expendable ("precious")
+  * A '!' prefix character      => not ignored, i.e. untracked
+
+It's worth noting that the traditional use of '!' as a negation
+character needs updating, given the introduction of a ternary state
+("not trashable" could mean either untracked or precious, which is
+ambiguous).  Refrain from referring to '!' as a negation character to
+avoid confusion.  To assist users in making this mindset shift, flag
+any line beginning with '!$' as an error. As always,
+backslash-escaping remains an option, allowing users to specify
+entries like '!\$foo' to mark a file named '$foo' as untracked.
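The prefix rules above can be sketched as a small C classifier for a single .gitignore line. This is a hypothetical illustration, not Git's actual parser in dir.c: `enum pattern_kind` and `classify_ignore_line()` are invented names, comment lines, trailing spaces and the other existing .gitignore rules are left out, and it assumes that, as with '\!' today, the pattern matcher treats a backslash-escaped '$' as a literal character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of classifying one .gitignore line. */
enum pattern_kind {
	PATTERN_TRASHABLE,  /* no prefix: ignored-and-expendable */
	PATTERN_PRECIOUS,   /* '$' prefix: ignored-and-not-expendable */
	PATTERN_UNTRACKED,  /* '!' prefix: not ignored */
	PATTERN_ERROR       /* '!$' prefix: rejected as an error */
};

/*
 * Classify `line` by its prefix and point *pattern at the remaining
 * pattern text (NULL on error).  The backslash in "\$foo" is not
 * stripped here; the matcher is assumed to treat "\$" as a literal '$'.
 */
enum pattern_kind classify_ignore_line(const char *line, const char **pattern)
{
	assert(line != NULL && pattern != NULL);
	*pattern = NULL;

	if (line[0] == '!') {
		if (line[1] == '$')
			return PATTERN_ERROR;  /* use "!\$foo" for a file named "$foo" */
		*pattern = line + 1;
		return PATTERN_UNTRACKED;
	}
	if (line[0] == '$') {
		*pattern = line + 1;
		return PATTERN_PRECIOUS;
	}
	*pattern = line;  /* includes backslash-escaped "\$..." lines */
	return PATTERN_TRASHABLE;
}
```

Checking '!' before '$' is what makes '!$' detectable as a unit, while '!\$foo' falls through to the ordinary "not ignored" case.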
+
+Breakdown of suggested behaviors by command
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+See also "Out of Scope" section below, particularly for:
+  * apply, am [without -3]
+  * checkout/restore
+  * checkout-index
+  * additional information on merge backends
+
+Documentation:
+  * audit for references to "ignore" and "ignored", to see which ones need
+    to now replace those with either "ignored-and-expendable" (or
+    "trashable"), and which can remain "ignored".
+  * audit for "exclude" and "excluded" (the older terminology for ignored
+    files) and update them as well.
+  * add references to "precious" (and perhaps "trashable") as needed (don't
+    forget the glossary)
+  * rm: update the documentation:
+      "Ignored files are deemed expendable and won't stop" ->
+      "Ignored files, unless specifically marked precious, are by default
+       deemed expendable and won't stop"
+  * ensure all codepaths touched by 0e29222e0c2 ("Documentation: call out
+    commands that nuke untracked files/directories", 2021-09-27) also call
+    out that they'll nuke precious files in addition to untracked ones.
+  * change the documentation for '!' in gitignore to stop using the term
+    'negates'; it's potentially misleading now (negating a ternary value
+    yields an ambiguous value).  Instead, the prefix is used to mark
+    untracked (or "not ignored") files.
+  * note that the --[no-]overwrite-ignore option is deprecated and, since
+    it predates the introduction of precious files, is also a misnomer.  The
+    correct name of the option would actually be --[no-]overwrite-trashable,
+    but it is too late to rename it.
+  * consider documenting that merge's --no-overwrite-ignore option is
+    virtually worthless (only works with the fast-forwarding backend).
+  * consider auditing the code for 'untracked' and fixing those to be
+    'not tracked' in cases where both 'untracked' and 'ignored' files
+    are meant
+
+checkout/switch:
+  * will need to not overwrite precious files when they are in the way of
+    switching branches, unless --force/-f is specified.
+
+checkout/restore:
+  * when passed a <tree> as a source, do not overwrite precious files
+    (NOR untracked files!), unless --force/-f is specified.  [Could be
+    considered a stretch goal...]
+
+merge:
+  * do not overwrite precious files when they are in the way of merging
+    branches.  (Must be handled in each and every merge strategy;
+    user-defined merge strategies may get this wrong.)
+
+read-tree:
+  * -u: do not overwrite precious files when they are in the way, unless...
+  * --reset and -u: overwrite precious files as well as untracked files.
+    Add to the warning under --reset about overwritten untracked files to
+    note that precious files are also overwritten.
+
+am -3, cherry-pick, rebase, revert: same as above for checkout/switch and
+  merge.
+
+add:
+  * same as today, just make sure when we split the ignored array (ignored &
+    ignored_nr) into multiple categories that it continues working
+
+rm:
+  * make sure submodules are not removed if precious files are present.
+    Currently, rm will remove submodules if only ignored files are present.
+
+check-ignore:
+  * since this command exists for debugging gitignore rules, there needs to
+    be some kind of mechanism for differentiating between trashable and
+    precious files.  It is okay if this comes with a new command-line flag,
+    but there should be some tests showing how it behaves both with and
+    without that flag when precious files are present.
+
+clean:
+  * clarify the meaning of -x and -X options: -X now means only remove
+    trashable files.  -x means remove both untracked and trashable files.
+    (See also [P17])
+  * add a --all option for removing all not-tracked files: untracked,
+    trashable, and precious.
+  * Other than --all, it is not worth adding flags for cleaning subsets of
+    not-tracked files that include precious files (thus, no flag for just
+    precious, or trashable and precious, or untracked and precious)
+  * Patterns with a leading '$' can be passed to --exclude, if wanted.
+
+ls-files:
+  * --ignored/-i: shows every kind of ignored file (thus behaving the same
+    as today, since there is no way to distinguish between the types of
+    ignored in the output)
+  * add new `--ignored=precious` and `--ignored=trashable` flags for
+    differentiating.  A plain `--ignored` is like having both
+    `--ignored=precious` and `--ignored=trashable` specified.
+  * --exclude,--exclude-from can now take patterns with a leading '$' and
+    the file will be considered precious rather than trashable.
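One way to model the proposed --ignored[=<mode>] values for ls-files is as a bitmask in which a plain --ignored sets both bits. The enum and `parse_ignored_arg()` below are hypothetical; a real implementation would presumably go through Git's parse-options machinery rather than a hand-written helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bitmask for the proposed ls-files --ignored[=<mode>]. */
enum ignored_mode {
	SHOW_IGNORED_NONE      = 0,
	SHOW_IGNORED_TRASHABLE = 1 << 0,
	SHOW_IGNORED_PRECIOUS  = 1 << 1,
	SHOW_IGNORED_ALL       = SHOW_IGNORED_TRASHABLE | SHOW_IGNORED_PRECIOUS
};

/*
 * Fold one occurrence of the option into *mode; `arg` is NULL for a
 * plain --ignored.  Returns 0 on success, -1 for an unknown value.
 */
int parse_ignored_arg(const char *arg, unsigned *mode)
{
	assert(mode != NULL);
	if (!arg)
		*mode |= SHOW_IGNORED_ALL;
	else if (!strcmp(arg, "trashable"))
		*mode |= SHOW_IGNORED_TRASHABLE;
	else if (!strcmp(arg, "precious"))
		*mode |= SHOW_IGNORED_PRECIOUS;
	else
		return -1;
	return 0;
}
```

Because repeated options are OR-ed together, `--ignored=precious --ignored=trashable` ends up identical to a plain `--ignored`, as the text requires.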
+
+status:
+  * --ignored (without additional parameters) continues behaving as-is: it
+    prints both trashable and precious files in its "Ignored" category with
+    no distinguishing.
+  * --ignored --short will continue showing trashable files with '!!', but
+    will now show precious files using '$$'.
+  * --ignored --porcelain={v1,v2} will continue showing precious files
+    with the '!' character, since scripts may not be prepared to parse a
+    leading '$'.  We can't break those scripts, even if it'd avoid the
+    off chance that those scripts act on the information about "ignored"
+    files and end up nuking precious files.
+  * --ignored --porcelain=v3 will need to be introduced to show precious
+    files with a leading '$'.
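The marker choices proposed above can be collected in one place. The names here are hypothetical, and the porcelain v3 marker reflects only this proposal; v1 keeps the existing '!!' and v2 the existing '!' for every ignored entry, precious or not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical split of the ignored category. */
enum ignored_kind { IGNORED_TRASHABLE, IGNORED_PRECIOUS };

/* Hypothetical output formats; porcelain v3 is only proposed. */
enum status_format {
	STATUS_SHORT, STATUS_PORCELAIN_V1, STATUS_PORCELAIN_V2, STATUS_PORCELAIN_V3
};

/* Marker printed before an ignored path in the given format. */
const char *ignored_marker(enum ignored_kind kind, enum status_format fmt)
{
	switch (fmt) {
	case STATUS_SHORT:
		return kind == IGNORED_PRECIOUS ? "$$" : "!!";
	case STATUS_PORCELAIN_V1:
		return "!!";  /* existing scripts only expect '!' */
	case STATUS_PORCELAIN_V2:
		return "!";   /* likewise unchanged */
	case STATUS_PORCELAIN_V3:
		return kind == IGNORED_PRECIOUS ? "$" : "!";
	}
	assert(!"unknown status format");
	return NULL;
}
```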
+
+sparse-checkout:
+  * the --rules-file option should be tested with a pattern with a leading
+    '$' to make sure it prints an expected error.
+  * it might be worth noting somewhere that sparse-checkout treats
+    ignored files as precious; when sparsifying, it attempts to remove
+    directories that do not match the sparse specification, but will
+    leave them present if any of the tracked files are modified, or if
+    there are any not-tracked files present.  That includes ignored
+    files.  That means no additional work is needed for precious
+    support; I just mention it for completeness.
+
+Backward compatibility notes
+----------------------------
+There are multiple issues that impinge on backward compatibility (either in
+terms of special care we need to take, or in terms of messaging we may need
+to send out about changes):
+  * Slightly incompatible syntax
+  * Interaction with sparse-checkout parsing
+  * Behavior of traditional flags
+  * Interaction with older Git clients
+  * Commands with modified meaning
+We'll discuss each in its own subsection below.
+
+Slightly incompatible syntax
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This new syntax obviously breaks backward compatibility in that an ignored
+path named `$.config` would now have to be specified as `\$.config`.  This
+is similar to how introducing `!` as a prefix in .gitignore files was a
+backward compatibility break.  We expect and hope that the fallout will be
+minor.  See also [P10].
+
+Interaction with sparse-checkout parsing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The $GIT_DIR/info/sparse-checkout file also makes use of gitignore syntax
+and the gitignore parsing to read the file.  It differs in that the files
+specified are considered the files to be included (i.e. present in the
+working copy) rather than which files should be excluded, but otherwise
+has until now used identical syntax and parsing.
+
+However, for sparse-checkout there is no third type of file, so the '$'
+prefix makes no sense for it.  As such, it should be an error for any
+lines to begin with '$' in a sparse-checkout file.
+
+(This also means that if anyone really did have a path beginning with '$'
+in sparse-checkout files previously, then they now need to backslash escape
+them, the same as with .gitignore files.)
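A minimal sketch of the proposed check for one line of $GIT_DIR/info/sparse-checkout follows. `check_sparse_checkout_line()` and its message wording are hypothetical; the point is only that a leading '$' is rejected while a backslash-escaped '\$' still names a path beginning with a literal '$'.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical validation of a single sparse-checkout line under this
 * proposal.  Returns 0 if the line is acceptable, -1 (with a message on
 * stderr) if it starts with the precious-only '$' prefix.
 */
int check_sparse_checkout_line(const char *line, int lineno)
{
	assert(line != NULL);
	if (line[0] == '$') {
		fprintf(stderr,
			"sparse-checkout line %d: '$' (precious) patterns are not "
			"allowed; write '\\$' for a literal '$'\n", lineno);
		return -1;
	}
	return 0;
}
```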
+
+While we could theoretically avoid this small backward compatibility break
+for sparse-checkout parsing by continuing to treat a leading '$' the way it
+has traditionally been treated, I am worried about the practicality of
+maintaining that solution:
+  * the gitignore parsing is peppered with references like 'exclude' that
+    are specific to the gitignore case
+  * because of the above, it is _heavily_ confusing to attempt to read and
+    understand the gitignore handling while considering the sparse-checkout
+    case.  I've been tripped up by it *many* times.
+  * I think trying to reuse the existing parsing engine and have it handle
+    both old and new syntax is a recipe for failure.  It'd be much cleaner
+    to have errors thrown if the processing turns up any "precious" files,
+    or perhaps if any line starts with '$'.
+  * I think making a copy of the existing parsing code would let the two
+    copies drift further apart over time, and we would need to duplicate
+    all the documentation about gitignore rules for sparse-checkout, all
+    for the non-default non-cone case we are already recommending users
+    away from.
+
+Behavior of traditional flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There are two flags to consider here: the --porcelain flag to git-status,
+and the --no-overwrite-ignore option to the checkout & merge commands.  For
+the --porcelain flag to git-status, see the "Breakdown of suggested
+behaviors by command" and look for git-status there.  The rest of this
+section will focus on --[no-]overwrite-ignore.
+
+People have wanted precious files long enough that they implemented an
+interim kludge of sorts -- a command line option that can be passed to
+various subcommands that treats all ignored files as precious:
+--no-overwrite-ignore.
+
+In particular, this flag can be passed to both git-checkout and git-merge.
+However, in merge's case, the support depended on the flag being passed to
+the backend and the backend supporting it.  The builtin/merge.c code only
+ever bothered to pass this flag down to the fast-forwarding merge handling
+code, so it never worked with any backends that actually create a merge
+commit.
+
+We do need to keep these flags working, at least as much as they did
+previously.  However, we don't want to consider them desired features,
+which would lead us to making related equivalents for precious files like
+--overwrite-precious.  Instead we will:
+  * Keep --[no-]overwrite-ignore working, as much as it already was.
+  * Recommend users mark precious files in their gitignore files instead of
+    using these flags
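The intended treatment of a not-tracked path that is in the way of a checkout or merge, including the legacy --no-overwrite-ignore flag, can be sketched as below. `may_remove_in_the_way()` is a hypothetical name, `preserve_ignored` mirrors the existing variable of that name, and `force` stands in for checkout's --force/-f.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical categories of not-tracked paths. */
enum not_tracked_kind { KIND_UNTRACKED, KIND_TRASHABLE, KIND_PRECIOUS };

/*
 * May checkout/merge remove a not-tracked path that is in the way?
 * `preserve_ignored` models --no-overwrite-ignore, which makes every
 * ignored file look untracked; `force` models checkout's --force/-f.
 */
bool may_remove_in_the_way(enum not_tracked_kind kind,
			   bool preserve_ignored, bool force)
{
	if (force)
		return true;
	switch (kind) {
	case KIND_TRASHABLE:
		return !preserve_ignored;  /* expendable unless the legacy flag is given */
	case KIND_UNTRACKED:
	case KIND_PRECIOUS:
		return false;              /* never expendable without --force */
	}
	assert(!"unknown not_tracked_kind");
	return false;
}
```

With precious files marked in .gitignore, the flag only matters for trashable files, which is why recommending gitignore markings over the flag loses nothing.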
+
+Interaction with older Git clients
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Older Git clients will not understand precious files.  This means that:
+  * precious files will be considered untracked and not ignored.
+  * most commands will preserve these files, since untracked-and-not-ignored
+    files are not considered expendable.
+  * git status will continue listing these files
+  * git add will add these files without requiring -f.
+
+This seems like a reasonable tradeoff that only has minor annoyances.  The
+alternative of having the precious files treated as ignored has the very
+risky trade-off of deleting files which the users marked as important for
+us to keep.
+
+Commands with modified meaning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In clean, we adjust the meaning of both -x and -X:
+  -X: remove only trashable files
+  -x: remove untracked and trashable files (but preserve precious ones)
+
+Implementation hints
+--------------------
+
+Data structures
+~~~~~~~~~~~~~~~
+  * We will want to add a `precious` and `precious_nr` in dir_struct,
+    similar to the current entries/nr or ignored/ignored_nr.
+  * We may want to rename `ignored` and `ignored_nr` in dir_struct to
+    `trashable` and `trashable_nr`.
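A simplified sketch of the proposed split follows. It is not Git's real `struct dir_struct`: the entries are plain strings instead of `struct dir_entry` pointers, growth uses `realloc()` instead of Git's ALLOC_GROW(), and `struct path_list`, `struct dir_results` and the helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list, standing in for entries/nr/alloc. */
struct path_list {
	char **names;
	int nr, alloc;
};

/* Sketch of dir_struct's result arrays after the proposed split. */
struct dir_results {
	struct path_list untracked;   /* today: entries / nr */
	struct path_list trashable;   /* today: ignored / ignored_nr */
	struct path_list precious;    /* proposed: precious / precious_nr */
};

/* Append a copy of `name`; returns 0 on success, -1 on allocation failure. */
int path_list_append(struct path_list *l, const char *name)
{
	size_t len;
	char *copy;

	assert(l != NULL && name != NULL);
	if (l->nr == l->alloc) {
		int alloc = l->alloc ? 2 * l->alloc : 8;
		char **grown = realloc(l->names, alloc * sizeof(*grown));
		if (!grown)
			return -1;
		l->names = grown;
		l->alloc = alloc;
	}
	len = strlen(name) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);
	l->names[l->nr++] = copy;
	return 0;
}

/* Free every entry and reset the list to empty. */
void path_list_clear(struct path_list *l)
{
	for (int i = 0; i < l->nr; i++)
		free(l->names[i]);
	free(l->names);
	l->names = NULL;
	l->nr = l->alloc = 0;
}
```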
+
+Code areas
+~~~~~~~~~~
+  * "preserve_ignored", a flag in the code for handling the
+    --[no-]overwrite-ignore flag, is a very helpful marker about what needs
+    to be tweaked and how to tweak it to preserve more files.  In particular,
+    note that --no-overwrite-ignore works by telling the machinery in dir.c
+    to not do the setup_standard_excludes() stuff, so that all ignored files
+    just look like untracked files.  We'll need something slightly smarter,
+    which makes precious files look like untracked while trashable files
+    still appear in ignored.  Shouldn't be too bad.
+  * we might need to add another entry to the unpack_trees_reset_type
+    enum.  Or perhaps we still keep both UNPACK_RESET_PROTECT_UNTRACKED
+    and UNPACK_RESET_OVERWRITE_UNTRACKED but rename them with
+    s/UNTRACKED/NOT_EXPENDABLE/ so it is clear they handle both untracked and
+    precious files.  Not sure which is needed yet.
+  * dir_struct->flags _might_ need new entries.
+  * ensure all relevant codepaths touched by 94b7f1563ac ("Comment important
+    codepaths regarding nuking untracked files/dirs", 2021-09-27) are either
+    fixed or also mention precious files
+  * am/rebase/checkout[without -f]: see 480d3d6bf90 ("Change unpack_trees'
+    'reset' flag into an enum", 2021-09-27)
+  * Merge backends:
+    * (see also "Out of scope" section)
+    * merge-ort can be fixed by fixing the checkout code.
+    * merge-resolve and merge-octopus can probably be fixed by fixing
+      git-reset.
+  * stash:
+    * there is an existing --include-untracked option.  There was no reason
+      to add a --include-ignored, because ignored files were trashable.  Do
+      we need to add a --include-precious, though?
+    * this is a sad pile of shell-reimplemented-in-C.  It's just awful.
+      See b34ab4a43ba ("stash: remove unnecessary process forking",
+      2020-12-01) and ba359fd5070 ("stash: fix stash application in
+      sparse-checkouts", 2020-12-01) and 94b7f1563ac ("Comment important
+      codepaths regarding nuking untracked files/dirs", 2021-09-27).
+      Fixing stash to not nuke precious files (and to not nuke untracked
+      files either) might mean expunging the stupid
+      shell-reimplemented-in-C design, or at least moving things more in
+      that direction.
+  * rebase (merge backend), revert, cherry-pick, am -3: should automatically
+    be handled by getting merge-ort to work, which should work by making
+    checkout/switch work.
+  * bisect: should work by making checkout work
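For the unpack_trees_reset_type item above, one option is the s/UNTRACKED/NOT_EXPENDABLE/ rename. The sketch below is simplified and hypothetical: the real enum in unpack-trees.h may carry additional values, `unpack_may_overwrite()` is an invented helper, and --no-overwrite-ignore is ignored for brevity. It encodes the read-tree rules proposed earlier: plain -u protects untracked and precious files, while --reset -u overwrites them.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, renamed sketch of enum unpack_trees_reset_type. */
enum unpack_trees_reset_type {
	UNPACK_RESET_NONE = 0,                 /* not a reset, e.g. read-tree -u */
	UNPACK_RESET_PROTECT_NOT_EXPENDABLE,   /* was ..._PROTECT_UNTRACKED */
	UNPACK_RESET_OVERWRITE_NOT_EXPENDABLE  /* was ..._OVERWRITE_UNTRACKED */
};

/* Hypothetical categories of not-tracked paths. */
enum not_tracked_kind { KIND_UNTRACKED, KIND_TRASHABLE, KIND_PRECIOUS };

/* May unpacking trees overwrite a not-tracked path of this kind? */
bool unpack_may_overwrite(enum unpack_trees_reset_type reset,
			  enum not_tracked_kind kind)
{
	if (kind == KIND_TRASHABLE)
		return true;  /* expendable in every mode in this sketch */
	assert(kind == KIND_UNTRACKED || kind == KIND_PRECIOUS);
	/* Untracked and precious files share the same protection. */
	return reset == UNPACK_RESET_OVERWRITE_NOT_EXPENDABLE;
}
```

Grouping untracked and precious files under one "not expendable" value is what would let the existing protection code cover precious files without a new enum entry; whether that suffices is the open question noted above.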
+
+Minimum
+~~~~~~~
+
+I think for a minimum implementation, we need to ensure that the following
+are handled:
+  * parsing:
+    * parsing of lines starting with '$' in .gitignore
+    * erroring on lines starting with '!$' in .gitignore
+    * erroring on lines starting with '$' in $GIT_DIR/info/sparse-checkout
+  * commands with support:
+    * switch/checkout
+    * merge when using the ort backend
+    * read-tree -u [without --reset] (due to internal use)
+    * ls-files
+
+Out of scope
+------------
+The following tasks are currently out of scope for this proposal:
+
+apply, am [without -3]: apply won't overwrite any file in the working
+  directory even when a new file is in the patch.  It should overwrite
+  trashable files.  We could log that bug via testcase, but make sure
+  there's a companion testcase that ensures overwriting untracked or
+  precious files continues to make apply throw an error.  However, since
+  apply/am don't misbehave for precious files, we can defer this to later.
+
+checkout-index: similar to apply; won't overwrite any existing files, but
+  trashable files should be overwritten
+
+reset --hard:
+  * `git reset --hard` is a little funny and we have thought about changing
+    it[4].  However, that can be left for later and will not be tackled as
+    part of the work of introducing "precious" files as a concept.
+
+merge backends:
+  * it may make sense to try to make --no-overwrite-ignore work with more
+    merge backends, both because it's technically documented behavior, and
+    because doing so may be a step towards getting precious files supported
+  * when multiple merge strategies are specified, builtin/merge.c will
+    stash and restore state between the attempt of different strategies.
+    Since the reset_hard() function invokes `read-tree --reset -u`, there
+    might be a way to cause it to trash untracked files or to trash
+    precious files, depending on what the merge strategies did.  It seems
+    unlikely (maybe the strategy handles D/F conflicts or rename
+    conflicts by renaming files in the way, and happens to rename a
+    precious file to a path that is considered either untracked or
+    precious -- merge-recursive certainly did something like this
+    once upon a time and still might); we can probably ignore it for now.
+  * merge-recursive is a lost cause; it'd be a _huge_ amount of effort to
+    fix, but we intend to deprecate and delete it soon anyway (making all
+    requests for recursive just trigger ort instead).
+  * user-defined merge strategies are up to their authors to get right.
+    Odds are they won't, but odds are they already incorrectly nuke
+    untracked files too because who'd pay attention to a special case
+    like files being in the way of a merge?  Anyway, "not our problem".  :-)
+
+Previous discussions
+--------------------
+
+A far from exhaustive sampling of various past conversations on the topic:
+
+[P1] https://lore.kernel.org/git/7vipsnar23.fsf@alter.siamese.dyndns.org/
+[P2] https://lore.kernel.org/git/xmqqttqytnqb.fsf@gitster.g/
+[P3] https://lore.kernel.org/git/79901E6C-9839-4AB2-9360-9EBCA1AAE549@icloud.com/
+[P4] https://lore.kernel.org/git/87a6q9kacx.fsf@evledraar.gmail.com/
+[P5] https://lore.kernel.org/git/20190216114938.18843-1-pclouds@gmail.com/
+[P6] https://lore.kernel.org/git/87ftsi68ke.fsf@evledraar.gmail.com/
+[P7] https://lore.kernel.org/git/xmqqo7ub4sfh.fsf@gitster.g/
+[P8] https://lore.kernel.org/git/7v4oepaup7.fsf@alter.siamese.dyndns.org/
+[P9] https://lore.kernel.org/git/20181112232209.GK890086@genre.crustytoothpaste.net/
+[P10] https://lore.kernel.org/git/xmqqttqvg4lw.fsf@gitster.g/
+[P11] https://lore.kernel.org/git/xmqqk1hrr91s.fsf@gitster-ct.c.googlers.com/
+[P12] https://lore.kernel.org/git/9C4A2AFD-AAA2-4ABA-8A8B-2133FD870366@icloud.com/
+[P13] https://lore.kernel.org/git/xmqqfs2e3292.fsf@gitster.g/
+[P14] https://lore.kernel.org/git/0deee2bc-1775-4459-906d-1d44b3103499@gmail.com/
+[P15] https://lore.kernel.org/git/ZSkpOc%2FdcGcrFQNU@ugly/
+[P16] https://lore.kernel.org/git/xmqqil79t82q.fsf@gitster.g/
+[P17] https://lore.kernel.org/git/xmqqo7h6tnib.fsf@gitster.g/
+
+Alternatives considered
+-----------------------
+There have been multiple alternatives considered, along a few different
+axes:
+  * .gitattributes instead of .gitignore
+  * leaving sparse-checkout alone
+  * Making ignored files precious by default, plus a new "trashable"
+    category [P9, P11]
+  * Alternative gitignore syntax
+
+The choice of .gitattributes vs .gitignore was already addressed in the
+"Precious file specification" section.
+
+The choice of whether to modify or leave alone the parsing of
+$GIT_DIR/info/sparse-checkout was already addressed in the "Interaction
+with sparse-checkout parsing" section.
+
+One alternative raised in the past was treating ignored files as not
+expendable by default, and then introducing a new category of
+ignored-but-expendable.  This new category has been dubbed "trashable" in
+the past.  That might have been a reasonable solution if Git did not
+already have a large user base, but moving in this direction would cause
+severe problems for existing builds everywhere [P9] and would require
+users to configure most ignored files twice (since ignored-but-expendable
+is expected to be a much larger class of files than ignored-but-precious).
+See also [P11].
+
+There have been multiple alternative suggestions for extending gitignore
+syntax to handle precious files and optionally future extensions as well.
+See, for example, [P10, P12, P13, P14, P15, P16].  However:
+  * Precious files have been requested on and off for about 14 years.
+  * We are not aware of any other extensions that are needed; there might
+    not be any.
+  * The alternatives all seem much more complex to explain to users than
+    the simple proposal here.
+
+In particular, we like the simplicity of giving users the straightforward
+mapping described in the penultimate paragraph of the "Precious file
+specification" section (the one regarding no prefix vs. '!' vs. '$').