add: don't write objects with --dry-run

Message ID 0131d21f-dabd-3da5-34bd-a570e990f9e0@web.de (mailing list archive)
State New, archived

Commit Message

René Scharfe Oct. 12, 2021, 7:15 p.m. UTC
When the option --dry-run/-n is given, "git add" doesn't change the
index, but still writes out new object files.  Only hash the latter
without writing instead to make the run as dry as possible.

Use this opportunity to also make the hash_flags variable unsigned,
to match the index_path() parameter it is used as.

Reported-by: git.mexon@spamgourmet.com
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Am I missing something?  Do we sometimes rely on the written objects
within the "git add --dry-run" command?

 read-cache.c          | 2 +-
 t/t2200-add-update.sh | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

--
2.33.0

Comments

Junio C Hamano Oct. 12, 2021, 8:15 p.m. UTC | #1
René Scharfe <l.s.r@web.de> writes:

> When the option --dry-run/-n is given, "git add" doesn't change the
> index, but still writes out new object files.  Only hash the latter
> without writing instead to make the run as dry as possible.
>
> Use this opportunity to also make the hash_flags variable unsigned,
> to match the index_path() parameter it is used as.
>
> Reported-by: git.mexon@spamgourmet.com
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> Am I missing something?  Do we sometimes rely on the written objects
> within the "git add --dry-run" command?

Good question.  I cannot think of anything offhand, but this obvious
"omission" makes me suspect that we may be forgetting something.

Thanks.


>  read-cache.c          | 2 +-
>  t/t2200-add-update.sh | 3 +++
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index a78b88a41b..7fcc948077 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -738,7 +738,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
>  	int intent_only = flags & ADD_CACHE_INTENT;
>  	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
>  			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
> -	int hash_flags = HASH_WRITE_OBJECT;
> +	unsigned hash_flags = pretend ? 0 : HASH_WRITE_OBJECT;
>  	struct object_id oid;
>
>  	if (flags & ADD_CACHE_RENORMALIZE)
> diff --git a/t/t2200-add-update.sh b/t/t2200-add-update.sh
> index 45ca35d60a..94c4cb0672 100755
> --- a/t/t2200-add-update.sh
> +++ b/t/t2200-add-update.sh
> @@ -129,12 +129,15 @@ test_expect_success 'add -n -u should not add but just report' '
>  		echo "remove '\''top'\''"
>  	) >expect &&
>  	before=$(git ls-files -s check top) &&
> +	git count-objects -v >objects_before &&
>  	echo changed >>check &&
>  	rm -f top &&
>  	git add -n -u >actual &&
>  	after=$(git ls-files -s check top) &&
> +	git count-objects -v >objects_after &&
>
>  	test "$before" = "$after" &&
> +	test_cmp objects_before objects_after &&
>  	test_cmp expect actual
>
>  '
> --
> 2.33.0

Ævar Arnfjörð Bjarmason Oct. 12, 2021, 8:17 p.m. UTC | #2
On Tue, Oct 12 2021, René Scharfe wrote:

> When the option --dry-run/-n is given, "git add" doesn't change the
> index, but still writes out new object files.  Only hash the latter
> without writing instead to make the run as dry as possible.
>
> Use this opportunity to also make the hash_flags variable unsigned,
> to match the index_path() parameter it is used as.
>
> Reported-by: git.mexon@spamgourmet.com
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> Am I missing something?  Do we sometimes rely on the written objects
> within the "git add --dry-run" command?

Probably not. Here's a semi-related patch of mine that never got
integrated. For example, you'll probably find that even when we're not
writing objects, we still do things like zlib compression here (or not,
I haven't looked):
https://lore.kernel.org/git/20190520222932.22843-1-avarab@gmail.com/

I think the "git fetch --dry-run" command behaves like this too,
i.e. doesn't update refs, but fetches and writes objects.

For the patch I hacked up I think it's easy to argue that it shouldn't
do compression etc.

For this sort of thing and "fetch" I'm not so sure. Do we really know
that there aren't people who rely on this for say the performance of
seeing what an operation would do, and then not pay as much for the
"real one" that updates the index/refs/etc. later? Is that subsequent
"fetch" cheaper because of the --dry-run?

Maybe not, but it seems like something to look into.

Junio C Hamano Oct. 12, 2021, 8:37 p.m. UTC | #3
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think the "git fetch --dry-run" command behaves like this too,
> i.e. doesn't update refs, but fetches and writes objects.
>
> For the patch I hacked up I think it's easy to argue that it shouldn't
> do compression etc.
>
> For this sort of thing and "fetch" I'm not so sure. Do we really know
> that there aren't people who rely on this for say the performance of
> seeing what an operation would do, and then not pay as much for the
> "real one" that updates the index/refs/etc. later? Is that subsequent
> "fetch" cheaper because of the --dry-run?

The answer to the last one is an easy "yes".  Trying to gauge the
time it would take for a real fetch with "--dry-run" is a losing
battle, I would think, as the pre-fetching would make the "real" one
cheaper, so from that point of view, I think we can ignore those who
time "--dry-run" and try to figure out anything meaningful.

This in any case is an interesting area, as the definition of
correctness of what "dry-run" does can be quite fuzzy.  As long as
it does not change the index, "git add --dry-run", even if it writes
objects or detects filesystem corruption by noticing I/O error while
compressing the data taken from the working tree files, is still
correct and the patch in question is not technically a bugfix (it is
a performance thing).  "git fetch --dry-run" would fall into the
same category, so would "git hash-object" without "-w".

All can use performance enhancement without breaking existing users,
I would think.

Thanks.
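
For illustration, the difference between only hashing and actually
writing an object can be seen with "git hash-object", which computes an
object name without "-w" and additionally stores the object with "-w".
This is a minimal sketch, assuming a throwaway repository created with
mktemp; the file name and the variable names are made up for the example
and are not part of the patch.

```shell
#!/bin/sh
# Sketch: compare "git hash-object" with and without -w.
# Assumption: a scratch repository in a temporary directory; it is
# left behind afterwards so it can be inspected.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo content >file

# Hash only: prints the object name, stores nothing.
oid=$(git hash-object file)
if git cat-file -e "$oid" 2>/dev/null
then
	present_before=yes
else
	present_before=no
fi

# Hash and write: same object name, but the blob is now stored.
oid_written=$(git hash-object -w file)
if git cat-file -e "$oid_written" 2>/dev/null
then
	present_after=yes
else
	present_after=no
fi

echo "stored after hashing only: $present_before"
echo "stored after hashing with -w: $present_after"
```

Both invocations report the same object name; only the second one
leaves the blob in the object database.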

Patch

diff --git a/read-cache.c b/read-cache.c
index a78b88a41b..7fcc948077 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -738,7 +738,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	int intent_only = flags & ADD_CACHE_INTENT;
 	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
 			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
-	int hash_flags = HASH_WRITE_OBJECT;
+	unsigned hash_flags = pretend ? 0 : HASH_WRITE_OBJECT;
 	struct object_id oid;

 	if (flags & ADD_CACHE_RENORMALIZE)
diff --git a/t/t2200-add-update.sh b/t/t2200-add-update.sh
index 45ca35d60a..94c4cb0672 100755
--- a/t/t2200-add-update.sh
+++ b/t/t2200-add-update.sh
@@ -129,12 +129,15 @@ test_expect_success 'add -n -u should not add but just report' '
 		echo "remove '\''top'\''"
 	) >expect &&
 	before=$(git ls-files -s check top) &&
+	git count-objects -v >objects_before &&
 	echo changed >>check &&
 	rm -f top &&
 	git add -n -u >actual &&
 	after=$(git ls-files -s check top) &&
+	git count-objects -v >objects_after &&

 	test "$before" = "$after" &&
+	test_cmp objects_before objects_after &&
 	test_cmp expect actual

 '
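
The check that the test above adds can also be tried by hand, outside
the test suite. The following is a rough sketch, assuming a scratch
repository created with mktemp and a made-up file name. It takes a
"git count-objects -v" snapshot before and after "git add -n" and
compares the two. Whether they match depends on whether the installed
git includes this change, so no expected output is shown.

```shell
#!/bin/sh
# Sketch: does "git add --dry-run" leave loose objects behind?
# Assumption: a scratch repository in a temporary directory; it is
# left behind afterwards so it can be inspected.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo content >file

# Snapshot the object database, do a dry run, and snapshot again.
objects_before=$(git count-objects -v)
git add -n file
objects_after=$(git count-objects -v)

# The index must stay empty no matter what happens to the objects.
index_after=$(git ls-files -s)

if test "$objects_before" = "$objects_after"
then
	echo "dry run wrote no objects"
else
	echo "dry run wrote objects"
fi
```

The index check holds either way, since the dry run never updates the
index; only the object count is affected by the change.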