mbox series

[0/2] Add --no-filters option to git-add

Message ID pull.880.git.1613758333.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Add --no-filters option to git-add | expand

Message

Linus Arver via GitGitGadget Feb. 19, 2021, 6:12 p.m. UTC
It is possible for a user to disable attribute-based filtering when
committing by doing one of the following:

 * Create .git/info/attributes unapplying all possible transforming
   attributes.
 * Use git hash-object and git update-index to stage files manually.

Doing the former requires keeping an up-to-date list of all attributes which
can transform files when committing or checking out. Doing the latter is
difficult, error-prone and slow when done from scripts.

Instead, similarly to git hash-object, --no-filter can be added to git add
to enable temporarily disabling filtering in an easy to use way.

These patches:

 * Add new flag ADD_CACHE_RAW to add_to_index()
 * Add new flag HASH_RAW to index_fd()
 * Make git hash-object use the new HASH_RAW flag for consistency
 * Add tests for the new git-add option.

Andrej Shadura (2):
  add: add option --no-filters to disable attribute-based filtering
  hash-object: use the new HASH_RAW flag instead of setting path to NULL

 Documentation/git-add.txt |  7 +++++-
 builtin/add.c             |  3 +++
 builtin/hash-object.c     | 17 ++++++---------
 cache.h                   |  2 ++
 object-file.c             |  2 +-
 read-cache.c              |  3 +++
 t/t2205-add-no-filters.sh | 46 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 68 insertions(+), 12 deletions(-)
 create mode 100755 t/t2205-add-no-filters.sh


base-commit: 2283e0e9af55689215afa39c03beb2315ce18e83
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-880%2Fandrewshadura%2Fgit-add-no-filters-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-880/andrewshadura/git-add-no-filters-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/880

Comments

brian m. carlson Feb. 19, 2021, 10:36 p.m. UTC | #1
On 2021-02-19 at 18:12:11, Andrej Shadura via GitGitGadget wrote:
> It is possible for a user to disable attribute-based filtering when
> committing by doing one of the following:
> 
>  * Create .git/info/attributes unapplying all possible transforming
>    attributes.
>  * Use git hash-object and git update-index to stage files manually.
> 
> Doing the former requires keeping an up-to-date list of all attributes which
> can transform files when committing or checking out. Doing the latter is
> difficult, error-prone and slow when done from scripts.
> 
> Instead, similarly to git hash-object, --no-filter can be added to git add
> to enable temporarily disabling filtering in an easy to use way.
> 
> These patches:
> 
>  * Add new flag ADD_CACHE_RAW to add_to_index()
>  * Add new flag HASH_RAW to index_fd()
>  * Make git hash-object use the new HASH_RAW flag for consistency
>  * Add tests for the new git-add option.

I'm interested in your use cases here.  While I agree that this is an
interesting feature, it also means that practically, a user who checks
out a file that's added this way may find that git status marks it as
perpetually modified until a properly cleaned version is committed.
Moreover, even "git reset --hard" won't fix this situation.

We see this problem extremely frequently with Git LFS where people
change the .gitattributes file but don't run "git add --renormalize ."
and then end up with this problem.  However, it's not limited to Git LFS
in particular; anything that uses filters, working tree encodings, or
end of line attributes can be affected.

So I think that while this might be a useful escape hatch for users, I
definitely want to see a compelling rationale for it and a big warning
in the documentation and an update to the relevant entry in the Git FAQ
before we accept such a patch.
Andrej Shadura Feb. 20, 2021, 8:06 a.m. UTC | #2
On 19/02/2021 23:36, brian m. carlson wrote:
> On 2021-02-19 at 18:12:11, Andrej Shadura via GitGitGadget wrote:
>> It is possible for a user to disable attribute-based filtering when
>> committing by doing one of the following:
>>
>>  * Create .git/info/attributes unapplying all possible transforming
>>    attributes.
>>  * Use git hash-object and git update-index to stage files manually.
>>
>> Doing the former requires keeping an up-to-date list of all attributes which
>> can transform files when committing or checking out. Doing the latter is
>> difficult, error-prone and slow when done from scripts.
>>
>> Instead, similarly to git hash-object, --no-filter can be added to git add
>> to enable temporarily disabling filtering in an easy to use way.

> I'm interested in your use cases here.  While I agree that this is an
> interesting feature, it also means that practically, a user who checks
> out a file that's added this way may find that git status marks it as
> perpetually modified until a properly cleaned version is committed.
> Moreover, even "git reset --hard" won't fix this situation.
> 
> We see this problem extremely frequently with Git LFS where people
> change the .gitattributes file but don't run "git add --renormalize ."
> and then end up with this problem.  However, it's not limited to Git LFS
> in particular; anything that uses filters, working tree encodings, or
> end of line attributes can be affected.
> 
> So I think that while this might be a useful escape hatch for users, I
> definitely want to see a compelling rationale for it and a big warning
> in the documentation and an update to the relevant entry in the Git FAQ
> before we accept such a patch.

My use case here is mostly non-interactive use in scripts creating Git
trees that need to exactly correspond to a working directory regardless
of whether or not they have any .gitattributes files.

For example, for git-buildpackage or dgit, which facilitate Git
workflows with Debian packages, want to ensure the contents of the
packages can be exactly reproduced, which is difficult if the upstream’s
tarball has .gitattributes. It is possible to "defuse" the attributes as
demonstrated above, but this will break if the user modifies the
.git/i/a file *or* if Git adds more attribute-based conversions. This is
what dgit currently does, and this is what git-buildpackage will soon do
too.

Of course, this patch set only addresses staging files. While working on
a patch to git-buildpackage to reproducibly import the contents of
tarballs, I realised that the only realistic way seem to do that is to
use hash-object + update-index manually, which is likely to come with a
performance drop compared to git add (which is what gbp currently uses).
A workaround might be to use dulwich (which would allow to do
hash-object without fork/exec) or perhaps GitPython (which I haven’t
really looked into), or maybe to use git fast-import, but all of these
alternatives are quite complex and don’t guarantee the same performance.

Adding a new option to git add allows to keep the performance without
having to ensure attributes are set to the right values. The attributes
will likely have to be adjusted anyway for user’s convenience, but at
least if they modify them afterwards, the tools won’t break.
Andrej Shadura Feb. 20, 2021, 9:30 a.m. UTC | #3
On 20/02/2021 09:06, Andrej Shadura wrote:
> On 19/02/2021 23:36, brian m. carlson wrote:
>> So I think that while this might be a useful escape hatch for users, I
>> definitely want to see a compelling rationale for it and a big warning
>> in the documentation and an update to the relevant entry in the Git FAQ
>> before we accept such a patch.

Here’s my proposal for the updated manpage description of the option:

--no-filters::

Add the contents as is, ignoring any input filter that would have been
chosen by the attributes mechanism, including the end-of-line
conversion. Note that this option is not intended for interactive use,
since files added this way will always show up as modified if Git were
to apply transformations to them, making the situation potentially very
confusing.

And here the FAQ entry extended:

It is also possible for perpetually modified files to occur on any
platform if a smudge or clean filter is in use on your system but a file
was previously committed without running the smudge or clean filter.  To
fix this, run the following on an otherwise clean working tree:
+
----
$ git add --renormalize .
----
+
Another situation where perpetually modified may appear on any platform
is when a file has been committed without running any filters (including
the end-of-line conversion), but the `.gitattributes` file states that
this file requires a conversion.  In this case, you can either
renormalize the files if this happened by mistake, or modify
`.gitattributes` or `$GIT_DIR/info/attributes` as described above to
exempt the file from the conversion if this was intentional.

(I will send an updated patch set when we agree on the wording.)
brian m. carlson Feb. 20, 2021, 2:10 p.m. UTC | #4
On 2021-02-20 at 09:30:58, Andrej Shadura wrote:
> On 20/02/2021 09:06, Andrej Shadura wrote:
> > On 19/02/2021 23:36, brian m. carlson wrote:
> >> So I think that while this might be a useful escape hatch for users, I
> >> definitely want to see a compelling rationale for it and a big warning
> >> in the documentation and an update to the relevant entry in the Git FAQ
> >> before we accept such a patch.
> 
> Here’s my proposal for the updated manpage description of the option:
> 
> --no-filters::
> 
> Add the contents as is, ignoring any input filter that would have been
> chosen by the attributes mechanism, including the end-of-line
> conversion. Note that this option is not intended for interactive use,
> since files added this way will always show up as modified if Git were
> to apply transformations to them, making the situation potentially very
> confusing.
> 
> And here the FAQ entry extended:
> 
> It is also possible for perpetually modified files to occur on any
> platform if a smudge or clean filter is in use on your system but a file
> was previously committed without running the smudge or clean filter.  To
> fix this, run the following on an otherwise clean working tree:
> +
> ----
> $ git add --renormalize .
> ----
> +
> Another situation where perpetually modified may appear on any platform
> is when a file has been committed without running any filters (including
> the end-of-line conversion), but the `.gitattributes` file states that
> this file requires a conversion.  In this case, you can either
> renormalize the files if this happened by mistake, or modify
> `.gitattributes` or `$GIT_DIR/info/attributes` as described above to
> exempt the file from the conversion if this was intentional.
> 
> (I will send an updated patch set when we agree on the wording.)

This seems fine.  Thanks for being open to addressing my concerns, and I
agree that your use case seems like a good one and that this is a
valuable feature.