[v3,3/4] doc: dissuade users from trying to ignore tracked files

Message ID 20191102192615.10013-4-sandals@crustytoothpaste.net
State New
Series
  • Documentation for common user misconceptions

Commit Message

brian m. carlson Nov. 2, 2019, 7:26 p.m. UTC
It is quite common for users to want to ignore the changes to a file
that Git tracks.  Common scenarios for this case are IDE settings and
configuration files, which should generally not be tracked and possibly
generated from tracked files using a templating mechanism.

However, users learn about the assume-unchanged and skip-worktree bits
and try to use them to do this anyway.  This is problematic, because
when these bits are set, many operations behave as the user expects, but
they usually do not help when git checkout needs to replace a file.

There is no sensible behavior in this case, because sometimes the data
is precious, such as certain configuration files, and sometimes it is
irrelevant data that the user would be happy to discard.

Since this is not a supported configuration and users are prone to
misuse the existing features for unintended purposes, causing general
sadness and confusion, let's document the existing behavior and the
pitfalls in the documentation for git update-index so that users know
they should explore alternate solutions.

In addition, let's provide a recommended solution to dealing with the
common case of configuration files, since there are well-known
approaches used successfully in many environments.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-update-index.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Randall S. Becker Nov. 2, 2019, 8:14 p.m. UTC | #1
On November 2, 2019 3:26 PM, brian m. carlson wrote:
> It is quite common for users to want to ignore the changes to a file that Git
> tracks.  Common scenarios for this case are IDE settings and configuration
> files, which should generally not be tracked and possibly generated from
> tracked files using a templating mechanism.
> 
> However, users learn about the assume-unchanged and skip-worktree bits
> and try to use them to do this anyway.  This is problematic, because when
> these bits are set, many operations behave as the user expects, but they
> usually do not help when git checkout needs to replace a file.
> 
> There is no sensible behavior in this case, because sometimes the data is
> precious, such as certain configuration files, and sometimes it is irrelevant
> data that the user would be happy to discard.
> 
> Since this is not a supported configuration and users are prone to misuse the
> existing features for unintended purposes, causing general sadness and
> confusion, let's document the existing behavior and the pitfalls in the
> documentation for git update-index so that users know they should explore
> alternate solutions.
> 
> In addition, let's provide a recommended solution to dealing with the
> common case of configuration files, since there are well-known approaches
> used successfully in many environments.

Just noodling about a potential solution. If we assume the use case that
files are modified by an IDE that have no real relevance, but should not
interfere with other git operations including checkout...

What if we introduce something like .gitignore.changes, with the same syntax
as .gitignore. The difference is files listed in this file will not show in
git status (or could show as "changes ignored" with an option to enable
that). The only way to have the changes considered would be git add -f, so
git add . and git commit -a would not pick up the changes. From checkout's
perspective, the file would be considered unmodified, so if a change is
incoming for that path, checkout replaces it instead of rejecting the
checkout; otherwise the file is untouched. Pull would act similarly. Branch
switching would be permitted without stashing the files; they would remain
unchanged unless the switch modified the files.

OTOH, this is a change that is most relevant to IDE users, so JGit would
have to implement it as well for it to bring any real benefit.

This does have some benefit in post-install situations as well as the IDE
use case, but for that I might want to consider finer granularity, like some
way to identify regions of files being ignored. That would be a pretty deep
rabbit hole to go down.

If this idea seems reasonable, it might make a nice small project for
someone, possibly me, if I could disentangle myself from my current hellish
$DAYJOB project.

Just my few coins of thought.

Randall
Jakub Narebski Nov. 3, 2019, 3:04 p.m. UTC | #2
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> It is quite common for users to want to ignore the changes to a file
> that Git tracks.  Common scenarios for this case are IDE settings and
> configuration files, which should generally not be tracked and possibly
> generated from tracked files using a templating mechanism.
>
> However, users learn about the assume-unchanged and skip-worktree bits
> and try to use them to do this anyway.  This is problematic, because
> when these bits are set, many operations behave as the user expects, but
> they usually do not help when git checkout needs to replace a file.
>
> There is no sensible behavior in this case, because sometimes the data
> is precious, such as certain configuration files, and sometimes it is
> irrelevant data that the user would be happy to discard.
>
> Since this is not a supported configuration and users are prone to
> misuse the existing features for unintended purposes, causing general
> sadness and confusion, let's document the existing behavior and the
> pitfalls in the documentation for git update-index so that users know
> they should explore alternate solutions.
>
> In additon, let's provide a recommended solution to dealing with the
> common case of configuration files, since there are well-known
> approaches used successfully in many environments.

All right, this looks sensible and is a good thing to have.

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  Documentation/git-update-index.txt | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
> index 1c4d146a41..11230376c8 100644
> --- a/Documentation/git-update-index.txt
> +++ b/Documentation/git-update-index.txt
> @@ -543,6 +543,22 @@ The untracked cache extension can be enabled by the
>  `core.untrackedCache` configuration variable (see
>  linkgit:git-config[1]).
>  
> +NOTES
> +-----
> +
> +Users often try to use the ``assume unchanged'' and skip-worktree bits

Why the change between formatting '``assume unchanged''' (with double
quotes and space separated) and 'skip-worktree' (without quotes, and
kebab-cased)?  In the commit message you write about assume-unchanged
and skip-worktree.

I guess that follows the inconsistency in git-update-index(1) headers,
namely we have

  USING ``ASSUME UNCHANGED'' BIT
  ------------------------------

but

  SKIP-WORKTREE BIT
  -----------------

This inconsistency is much more visible when both names are on the same
line, however.

This is a minor nit.

> +to tell Git to ignore changes to files that are tracked.  This does not
> +work as expected, since Git may still check working tree files against
> +the index when performing certain operations.  In general, Git does not
> +provide a way to ignore changes to tracked files, so alternate solutions
> +are recommended.

I'm not sure if this is the place for it, but the proposed text treats
assume-unchanged and skip-worktree as similarly unsuited for intended
purpose.  However, their failure modes are different: (ab)using the
assume-unchanged bit to "ignore changes to tracked files" may lead to data
loss (as changes are overwritten), while with skip-worktree the trouble
is that some operations that should succeed (like unstashing) are
unnecessarily blocked - but no data loss.

> +
> +If the file you want to change is some sort of configuration file (say,
> +for a build tool, IDE, or editor), a common solution is to use a
> +templating mechanism, such as Ruby's ERB, to generate the ignored
> +configuration file from a template stored in the repository and a source
> +of data using a script or build step.

I would really like to see a simple example of such a template, so that
even people who are unfamiliar with Ruby's ERB can think of equivalent
solution for their language or toolchain of choice.

> +
>  SEE ALSO
>  --------
>  linkgit:git-config[1],

Best,
Jakub Narebski Nov. 3, 2019, 3:46 p.m. UTC | #3
<rsbecker@nexbridge.com> writes:

> On November 2, 2019 3:26 PM, brian m. carlson wrote:
>> It is quite common for users to want to ignore the changes to a file that Git
>> tracks.  Common scenarios for this case are IDE settings and configuration
>> files, which should generally not be tracked and possibly generated from
>> tracked files using a templating mechanism.
>> 
>> However, users learn about the assume-unchanged and skip-worktree bits
>> and try to use them to do this anyway.  This is problematic, because when
>> these bits are set, many operations behave as the user expects, but they
>> usually do not help when git checkout needs to replace a file.
[...]
> Just noodling about a potential solution. If we assume the use case that
> files are modified by an IDE that have no real relevance, but should not
> interfere with other git operations including checkout...
>
> What if we introduce something like .gitignore.changes, with the same syntax
> as .gitignore. The difference is files listed in this file will not show in
> `git status` (or could show as "changes ignored" with an option to enable
> that). The only way to have the changes considered would be `git add -f`, so
> `git add .` and `git commit -a` would not pick up the changes.
[...]

I think it would be better and easier to add new attribute and use
.gitattributes instead of a new .gitignore.changes (and its
per-repository, per-user and system-wide versions).
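To make the attribute idea a little more concrete: Git can already report arbitrary attributes through `git check-attr`, which prints one `<path>: <attribute>: <info>` line per path and attribute. Here `<info>` is `set`, `unset`, `unspecified`, or a value. The Ruby sketch below assumes a hypothetical attribute named `ignore-changes`. Git defines no such attribute; a user might set it with a `.gitattributes` line such as `*.iml ignore-changes`. The sketch only shows how a prototype of the proposal could query the attribute. It does not handle the C-style quoting that `check-attr` applies to unusual path names.

```ruby
# Matches one line of `git check-attr` output: "<path>: <attribute>: <info>".
CHECK_ATTR_LINE = /\A(?<path>.*): (?<attr>[^:\s]+): (?<info>.*)\z/

# Return the paths for which +attr+ is set, given raw `git check-attr` output.
def paths_with_attr_set(output, attr)
  output.each_line
        .map { |line| CHECK_ATTR_LINE.match(line.chomp) }
        .compact
        .select { |m| m[:attr] == attr && m[:info] == 'set' }
        .map { |m| m[:path] }
end

# Ask Git about the given paths. "ignore-changes" is a hypothetical attribute
# used only to illustrate the proposal; Git itself attaches no meaning to it.
def ignore_changes_paths(paths, attr = 'ignore-changes')
  output = IO.popen(['git', 'check-attr', attr, '--', *paths], &:read)
  paths_with_attr_set(output, attr)
end
```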

> If this idea seems reasonable, it might make a nice small project for
> someone, possibly me, if I could disentangle myself from my current hellish
> $DAYJOB project.

I wish you luck.

The fact that it has not been done yet may mean that there are some
annoying corner cases in the concept, or that it is not commonly
useful... or maybe that the problem needs reframing.

Best,
brian m. carlson Nov. 3, 2019, 6:59 p.m. UTC | #4
On 2019-11-03 at 15:04:44, Jakub Narebski wrote:
> Why the change between formatting '``assume unchanged''' (with double
> quotes and space separated) and 'skip-worktree' (without quotes, and
> kebab-cased)?  In the commit message you write about assume-unchanged
> and skip-worktree.
> 
> I guess that follows the inconsistency in git-update-index(1) headers,
> namely we have
> 
>   USING ``ASSUME UNCHANGED'' BIT
>   ------------------------------
> 
> but
> 
>   SKIP-WORKTREE BIT
>   -----------------
> 
> This inconsistency is much more visible when both names are on the same
> line, however.

Yeah, I can change them to make them consistent.  I did preserve the
existing formatting for both, as you mentioned.

> I'm not sure if this is the place for it, but the proposed text treats
> assume-unchanged and skip-worktree as similarly unsuited for intended
> purpose.  However, their failure modes are different: (ab)using the
> assume-unchanged bit to "ignore changes to tracked files" may lead to data
> loss (as changes are overwritten), while with skip-worktree the trouble
> is that some operations that should succeed (like unstashing) are
> unnecessarily blocked - but no data loss.

I agree the failure modes can be different, but from my experience there
are people who have seen checkout failures with both bits set
independently.  I'm not exactly sure what those cases are, but folks do
see them on Stack Overflow quite frequently.

Even if there is a difference in failure modes, I'd rather encourage
people to just not use this mechanism rather than explain why or in
which cases it won't do what you want.

> I would really like to see a simple example of such a template, so that
> even people who are unfamiliar with Ruby's ERB can think of equivalent
> solution for their language or toolchain of choice.

I hesitated to mention ERB, but I wasn't sure that folks would know what
we meant by a "templating mechanism".  The reason I had chosen to
mention it is that someone could search for "Ruby ERB" and find out what
it looked like and then look for an option in their own language.

My concern with adding such a template is that an example will likely
grow this section quite a bit, and it's meant as a short aside to help
people avoid making a common mistake and guide them to a proper solution
rather than a comprehensive howto.  I'm planning on adding a FAQ
document in the future that covers a lot of these issues in more detail
and helps folks figure out solutions to common problems, and I'd prefer
to explain this more in depth there.
Jakub Narebski Nov. 3, 2019, 7:40 p.m. UTC | #5
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> On 2019-11-03 at 15:04:44, Jakub Narebski wrote:

>> Why the change between formatting '``assume unchanged''' (with double
>> quotes and space separated) and 'skip-worktree' (without quotes, and
>> kebab-cased)?  In the commit message you write about assume-unchanged
>> and skip-worktree.
>> 
>> I guess that follows the inconsistency in git-update-index(1) headers,
>> namely we have
>> 
>>   USING ``ASSUME UNCHANGED'' BIT
>>   ------------------------------
>> 
>> but
>> 
>>   SKIP-WORKTREE BIT
>>   -----------------
>> 
>> This inconsistency is much more visible when both names are on the same
>> line, however.
>
> Yeah, I can change them to make them consistent.  I did preserve the
> existing formatting for both, as you mentioned.

Thanks in advance.  This inconsistency just looks strange on one line.

>> I'm not sure if this is the place for it, but the proposed text treats
>> assume-unchanged and skip-worktree as similarly unsuited for intended
>> purpose.  However, their failure modes are different: (ab)using the
>> assume-unchanged bit to "ignore changes to tracked files" may lead to data
>> loss (as changes are overwritten), while with skip-worktree the trouble
>> is that some operations that should succeed (like unstashing) are
>> unnecessarily blocked - but no data loss.
>
> I agree the failure modes can be different, but from my experience there
> are people who have seen checkout failures with both bits set
> independently.  I'm not exactly sure what those cases are, but folks do
> see them on Stack Overflow quite frequently.
>
> Even if there is a difference in failure modes, I'd rather encourage
> people to just not use this mechanism rather than explain why or in
> which cases it won't do what you want.

What I wanted to say here is that if you really, really need to abuse
those mechanisms to "ignore changes to tracked files", it is better to use
the skip-worktree bit -- at least you wouldn't lose your data (your
precious changes).

Nb. I have experimented with both assume-unchanged and skip-worktree,
so I know their annoyances, but I have not had a need to use them.
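This note is for readers who have already set these bits and need to find them again. Per git-ls-files(1), `git ls-files -v` prints a one-letter tag before each path. It uses `S` for skip-worktree entries and a lowercase letter for entries marked ``assume unchanged''. Below is a minimal Ruby sketch that classifies that output. The helper names are made up for this sketch.

```ruby
# Classify `git ls-files -v` output. Per git-ls-files(1), the tag "S" marks a
# skip-worktree entry and a lowercase tag marks an "assume unchanged" entry
# (so "s" means both bits are set).
def flagged_files(ls_files_v_output)
  flags = { assume_unchanged: [], skip_worktree: [] }
  ls_files_v_output.each_line do |line|
    tag, path = line.chomp.split(' ', 2)
    next if path.nil?

    flags[:assume_unchanged] << path if tag.match?(/\A[a-z]\z/)
    flags[:skip_worktree] << path if tag.casecmp?('S')
  end
  flags
end

# Run against the current repository (requires git on PATH).
def flagged_files_in_repo
  flagged_files(IO.popen(%w[git ls-files -v], &:read))
end
```

Either bit can then be cleared with `git update-index --no-assume-unchanged <path>` or `git update-index --no-skip-worktree <path>`.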

>> I would really like to see a simple example of such a template, so that
>> even people who are unfamiliar with Ruby's ERB can think of equivalent
>> solution for their language or toolchain of choice.
>
> I hesitated to mention ERB, but I wasn't sure that folks would know what
> we meant by a "templating mechanism".  The reason I had chosen to
> mention it is that someone could search for "Ruby ERB" and find out what
> it looked like and then look for an option in their own language.

I would like to see not an example of a template, but an example of how
such a template can be used in place of "ignore changes to tracked files".

> My concern with adding such a template is that an example will likely
> grow this section quite a bit, and it's meant as a short aside to help
> people avoid making a common mistake and guide them to a proper solution
> rather than a comprehensive howto.  I'm planning on adding a FAQ
> document in the future that covers a lot of these issues in more detail
> and helps folks figure out solutions to common problems, and I'd prefer
> to explain this more in depth there.

Your proposed change:

  +If the file you want to change is some sort of configuration file (say,
  +for a build tool, IDE, or editor), a common solution is to use a
  +templating mechanism, such as Ruby's ERB, to generate the ignored
  +configuration file from a template stored in the repository and a source
  +of data using a script or build step.

I don't see how such a templating mechanism could be used.  You have some
kind of configuration file with placeholders committed, and you have a
version of this file with local changes -- how could a templating
mechanism solve this?  I would like to see a few lines of an example and
its use.


Alternatives:
~~~~~~~~~~~~~

In our build system, we have versioned Makefile, and not versioned
config.mak (with local configuration), which is included by Makefile.
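The same split between a tracked file and an optional untracked override also works outside make. Here is a minimal Ruby sketch. It assumes defaults kept in tracked code and a hypothetical untracked `config.local.yml` listed in `.gitignore`.

```ruby
require 'yaml'

# Tracked defaults, committed to the repository (values are illustrative).
DEFAULTS = { 'port' => 8080, 'log_level' => 'info' }.freeze

# Merge an optional, untracked override file over the tracked defaults.
# A missing file is not an error, much like make's "-include config.mak".
def load_config(local_path = 'config.local.yml')
  return DEFAULTS.dup unless File.exist?(local_path)

  overrides = YAML.safe_load(File.read(local_path)) || {}
  DEFAULTS.merge(overrides)
end
```

Because the override file is never tracked, it has no index entry, so there is nothing for checkout to compare against or overwrite.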

In many cases, using environment variables to provide local settings is
recommended, e.g.:

  AWS::S3::Base.establish_connection!(
    :access_key_id     => ENV['S3_KEY'],
    :secret_access_key => ENV['S3_SECRET']
  )
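One caveat with `ENV['S3_KEY']` is that it quietly yields `nil` when the variable is unset. The small variation below uses `fetch` so that a missing setting fails loudly instead. It is only a sketch: `required_env` is a made-up helper, not part of any library.

```ruby
# Fetch a required setting from the environment (or any Hash-like source),
# raising a descriptive KeyError when it is missing.
def required_env(name, env = ENV)
  env.fetch(name) do
    raise KeyError, "#{name} is not set; export it instead of editing a tracked file"
  end
end
```

The call above would then read `:access_key_id => required_env('S3_KEY')`.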


Best,
brian m. carlson Nov. 3, 2019, 9:46 p.m. UTC | #6
On 2019-11-03 at 19:40:36, Jakub Narebski wrote:
> Your proposed change:
> 
>   +If the file you want to change is some sort of configuration file (say,
>   +for a build tool, IDE, or editor), a common solution is to use a
>   +templating mechanism, such as Ruby's ERB, to generate the ignored
>   +configuration file from a template stored in the repository and a source
>   +of data using a script or build step.
> 
> I don't see how such a templating mechanism could be used.  You have some
> kind of configuration file with placeholders committed, and you have a
> version of this file with local changes -- how could a templating
> mechanism solve this?  I would like to see a few lines of an example and
> its use.
> 
> Alternatives:
> ~~~~~~~~~~~~~
> 
> In our build system, we have versioned Makefile, and not versioned
> config.mak (with local configuration), which is included by Makefile.

Essentially, make and shell support this by themselves, but if, for
example, I wanted to adjust my dotfiles to set the email address once
and for all, I could create the following files:

.muttrc.erb:
----
my_hdr From: brian m. carlson <<%= data["email"] -%>>
----

.gitconfig.erb:
----
[user]
name = brian m. carlson
email = <%= data["email"] -%>
----

template.rb:
----
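#!/usr/bin/env ruby

require 'erb'

class Template
  # Return a fresh output-buffer variable name for each render, so repeated
  # renders sharing this binding don't clobber each other's buffer.
  def name
    @name ||= 0
    @name += 1
    "name_a#{@name}"
  end

  # Expose the environment to templates with lower-cased keys, so that
  # EMAIL=... is available as data["email"].
  def data
    ENV.map { |k, v| [k.downcase, v] }.to_h
  end

  # Render the given ERB file; trim_mode '-' lets "-%>" swallow the newline.
  # Keyword arguments replace the positional ones deprecated in Ruby 2.6.
  def erb(file)
    ERB.new(File.read(file), trim_mode: '-', eoutvar: name).result(binding)
  end
end

puts Template.new.erb(ARGV[0])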
----

and then run:

EMAIL=sandals@crustytoothpaste.net ./template.rb .muttrc.erb >.muttrc
EMAIL=sandals@crustytoothpaste.net ./template.rb .gitconfig.erb >.gitconfig

The problem that folks tend to have is that they have a single editor or
IDE project file, such as an Xcode configuration file, that can't be
split among multiple files, some of which are checked in and some of
which are not.  Other situations are generating a configuration file for
a web server like nginx in development, which may of course differ
depending on where the user has checked out the repository.
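For the nginx case, the part that differs between developers is usually the checkout path. The minimal Ruby sketch below renders a tracked template into an ignored `nginx.conf` using the standard library's ERB. The directives and names are illustrative only, not a recommended nginx configuration.

```ruby
require 'erb'

# Tracked template; only the rendered output (e.g. nginx.conf) is ignored.
NGINX_TEMPLATE = <<~'TEMPLATE'
  server {
    listen <%= port %>;
    root <%= checkout %>/public;
  }
TEMPLATE

# Render the template for one particular checkout location and port.
def render_nginx(checkout:, port: 8080)
  ERB.new(NGINX_TEMPLATE).result_with_hash(checkout: checkout, port: port)
end
```

Each developer would then run something like `File.write('nginx.conf', render_nginx(checkout: Dir.pwd))` from their own checkout.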

Using some sort of file like a config.mak is a fine solution, but many
programs don't support that, so it's necessary to create a template for
the build process and add a script to generate it.  The actual
configuration values can come from the environment, the user's
gitconfig, a YAML file the user has configured, or anywhere else that
makes sense.
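Here is a rough Ruby sketch of one possible lookup order: the environment first, then the user's Git configuration, then an optional YAML file. The `myproject.*` configuration section and the `local-settings.yml` file name are hypothetical. `git config --get` itself is real; it prints nothing and exits non-zero when the key is unset.

```ruby
require 'yaml'

# Read "myproject.<key>" from the user's Git configuration, or nil.
# The "myproject" section name is hypothetical.
def git_config_value(key)
  value = IO.popen(['git', 'config', '--get', "myproject.#{key}"], err: File::NULL, &:read).strip
  value.empty? ? nil : value
rescue Errno::ENOENT # git is not installed
  nil
end

# Look up a setting: environment variable, then Git config, then YAML file.
def lookup(key, env: ENV, yaml_path: 'local-settings.yml')
  env[key.upcase] ||
    git_config_value(key) ||
    (File.exist?(yaml_path) ? (YAML.safe_load(File.read(yaml_path)) || {})[key] : nil)
end
```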

As you can see, the example is large and unwieldy, and would not make a
good inclusion in the man page.  I included that paragraph because Peff
stated that it would be nice if we could offer people a solution, but
I'd rather drop it if it's too confusing without an example.
Jeff King Nov. 4, 2019, 10:24 p.m. UTC | #7
On Sun, Nov 03, 2019 at 06:59:08PM +0000, brian m. carlson wrote:

> > I would really like to see a simple example of such a template, so that
> > even people who are unfamiliar with Ruby's ERB can think of equivalent
> > solution for their language or toolchain of choice.
> 
> I hesitated to mention ERB, but I wasn't sure that folks would know what
> we meant by a "templating mechanism".  The reason I had chosen to
> mention it is that someone could search for "Ruby ERB" and find out what
> it looked like and then look for an option in their own language.

I don't mind what is here, but I might even suggest going slightly in
the opposite direction. Say something like:

  For example, the repository can include a sample config file that
  can then be copied into the ignored name and modified.

which points people in the right direction without giving specifics. But
if you did want to go further, you can then say:

  The repository can even include a script to treat the sample file as a
  template, modifying and copying it automatically (e.g., a Ruby script
  using an ERB template).

or something.
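The first suggestion needs very little machinery. Below is a minimal Ruby sketch of such a setup step. It assumes a tracked sample file and an ignored target name; both names are up to the project.

```ruby
require 'fileutils'

# Copy the tracked sample (e.g. config.sample.yml) to the ignored name the
# program actually reads (e.g. config.yml). An existing copy holds the user's
# own edits, so it is never overwritten. Returns true if a copy was made.
def install_sample(sample, target)
  return false if File.exist?(target)

  FileUtils.cp(sample, target)
  true
end
```

Combined with a `config.yml` line in `.gitignore`, users can then edit their copy freely without Git ever seeing the changes.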

> My concern with adding such a template is that an example will likely
> grow this section quite a bit, and it's meant as a short aside to help
> people avoid making a common mistake and guide them to a proper solution
> rather than a comprehensive howto.

Yeah. I don't think we should get into best practices for using ERB.

-Peff
brian m. carlson Nov. 4, 2019, 11:52 p.m. UTC | #8
On 2019-11-04 at 22:24:16, Jeff King wrote:
> On Sun, Nov 03, 2019 at 06:59:08PM +0000, brian m. carlson wrote:
> 
> > > I would really like to see a simple example of such a template, so that
> > > even people who are unfamiliar with Ruby's ERB can think of equivalent
> > > solution for their language or toolchain of choice.
> > 
> > I hesitated to mention ERB, but I wasn't sure that folks would know what
> > we meant by a "templating mechanism".  The reason I had chosen to
> > mention it is that someone could search for "Ruby ERB" and find out what
> > it looked like and then look for an option in their own language.
> 
> I don't mind what is here, but I might even suggest going slightly in
> the opposite direction. Say something like:
> 
>   For example, the repository can include a sample config file that
>   can then be copied into the ignored name and modified.
> 
> which points people in the right direction without giving specifics. But
> if you did want to go further, you can then say:
> 
>   The repository can even include a script to treat the sample file as a
>   template, modifying and copying it automatically (e.g., a Ruby script
>   using an ERB template).
> 
> or something.

I think this is a nice change.  It summarizes the proposed solution
without needing to be specific.
Jakub Narebski Nov. 5, 2019, 12:21 a.m. UTC | #9
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> On 2019-11-03 at 19:40:36, Jakub Narebski wrote:
>>
>> Your proposed change:
>> 
>>   +If the file you want to change is some sort of configuration file (say,
>>   +for a build tool, IDE, or editor), a common solution is to use a
>>   +templating mechanism, such as Ruby's ERB, to generate the ignored
>>   +configuration file from a template stored in the repository and a source
>>   +of data using a script or build step.
>> 
>> I don't see how such a templating mechanism could be used.  You have some
>> kind of configuration file with placeholders committed, and you have a
>> version of this file with local changes -- how could a templating
>> mechanism solve this?  I would like to see a few lines of an example and
>> its use.
>> 
>> Alternatives:
>> ~~~~~~~~~~~~~
>> 
>> In our build system, we have versioned Makefile, and not versioned
>> config.mak (with local configuration), which is included by Makefile.
>
> Essentially, make and shell support this by themselves, but if, for
> example, I wanted to adjust my dotfiles to set the email address once
> and for all, I could create the following files:
>
> .muttrc.erb:
> ----
> my_hdr From: brian m. carlson <<%= data["email"] -%>>
> ----
>
> .gitconfig.erb:
> ----
> [user]
> name = brian m. carlson
> email = <%= data["email"] -%>
> ----

All right, the above might be useful as an example (well, one of those),
but might not be necessary if the description of the preferred solution is
stated in more detail.  I think it is better to start with generics,
i.e. track template, and generate untracked file, then provide examples
like Ruby's ERB.
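Starting from the generic form, the template does not have to be ERB at all. The minimal Ruby sketch below assumes a simple `@KEY@` placeholder convention in a tracked template. That convention is a common choice, but it is only an assumption here.

```ruby
# Replace @KEY@ placeholders in a tracked template with supplied values,
# raising KeyError for any placeholder that has no value.
def fill_template(text, values)
  text.gsub(/@([A-Z_]+)@/) do
    key = Regexp.last_match(1)
    values.fetch(key) { raise KeyError, "no value for placeholder @#{key}@" }
  end
end
```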

> template.rb:
> ----
> #!/usr/bin/env ruby
>
> require 'erb'
>
> class Template
>   def name
>     @name ||= 0
>     @name += 1
>     "name_a#{@name}"
>   end
>
>   def data
>     ENV.map { |k, v| [k.downcase, v] }.to_h
>   end
>
>   def erb(file)
>     ERB.new(File.read(file), trim_mode: '-', eoutvar: name).result(binding)
>   end
> end
>
> puts Template.new.erb(ARGV[0])
> ----

This is certainly too much detail.

> and then run:
>
> EMAIL=sandals@crustytoothpaste.net ./template.rb .muttrc.erb >.muttrc
> EMAIL=sandals@crustytoothpaste.net ./template.rb .gitconfig.erb >.gitconfig

That could be kept as an example, after simplification.
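Below is a simplified variant in that spirit. It assumes Ruby 2.6 or later, for the `trim_mode:` keyword and `ERB#result_with_hash`. It renders the ERB file named on the command line with the same lower-cased `data` hash, so the two command lines above would work unchanged.

```ruby
#!/usr/bin/env ruby
require 'erb'

# Render an ERB template, exposing the environment to it as `data` with
# lower-cased keys (so EMAIL=... becomes data["email"]).
def render(path, env = ENV)
  data = env.to_h.transform_keys(&:downcase)
  ERB.new(File.read(path), trim_mode: '-').result_with_hash(data: data)
end

puts render(ARGV[0]) if ARGV[0]
```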

> The problem that folks tend to have is that they have a single editor or
> IDE project file, such as an Xcode configuration file, that can't be
> split among multiple files, some of which are checked in and some of
> which are not.  Other situations are generating a configuration file for
> a web server like nginx in development, which may of course differ
> depending on where the user has checked out the repository.

All right.

> Using some sort of file like a config.mak is a fine solution, but many
> programs don't support that, so it's necessary to create a template for
> the build process and add a script to generate it.  The actual
> configuration values can come from the environment, the user's
> gitconfig, a YAML file the user has configured, or anywhere else that
> makes sense.

I still think that config.mak (or equivalent) solution might be worth
mentioning in passing.

> As you can see, the example is large and unwieldy, and would not make a
> good inclusion in the man page.  I included that paragraph because Peff
> stated that it would be nice if we could offer people a solution, but
> I'd rather drop it if it's too confusing without an example.

I think it can be done even without an example, but without one it
would need careful crafting of the wording.

Best,

Patch

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 1c4d146a41..11230376c8 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -543,6 +543,22 @@  The untracked cache extension can be enabled by the
 `core.untrackedCache` configuration variable (see
 linkgit:git-config[1]).
 
+NOTES
+-----
+
+Users often try to use the ``assume unchanged'' and skip-worktree bits
+to tell Git to ignore changes to files that are tracked.  This does not
+work as expected, since Git may still check working tree files against
+the index when performing certain operations.  In general, Git does not
+provide a way to ignore changes to tracked files, so alternate solutions
+are recommended.
+
+If the file you want to change is some sort of configuration file (say,
+for a build tool, IDE, or editor), a common solution is to use a
+templating mechanism, such as Ruby's ERB, to generate the ignored
+configuration file from a template stored in the repository and a source
+of data using a script or build step.
+
 SEE ALSO
 --------
 linkgit:git-config[1],