diff mbox series

pack-write: skip *.rev work when not writing *.rev

Message ID patch-1.1-366ba928bd-20210908T010743Z-avarab@gmail.com (mailing list archive)
State Accepted
Commit 1972c5931bb1cbd81470c527d22a5ed193465ec1
Headers show
Series pack-write: skip *.rev work when not writing *.rev | expand

Commit Message

Ævar Arnfjörð Bjarmason Sept. 8, 2021, 1:08 a.m. UTC
Fix a performance regression introduced in a587b5a786 (pack-write.c:
extract 'write_rev_file_order', 2021-03-30) and stop needlessly
allocating the "pack_order" array and sorting it with
"pack_order_cmp()", only to throw that work away when we discover that
we're not writing *.rev files after all.

This redundant work was not present in the original version of this
code added in 8ef50d9958 (pack-write.c: prepare to write 'pack-*.rev'
files, 2021-01-25). There we'd call write_rev_file() from
e.g. finish_tmp_packfile(), but we'd "return NULL" early in
write_rev_file() if not doing a "WRITE_REV" or "WRITE_REV_VERIFY".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 pack-write.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Carlo Marcelo Arenas Belón Sept. 8, 2021, 1:35 a.m. UTC | #1
On Tue, Sep 7, 2021 at 6:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> +       if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
> +               return NULL;

I see this expression matches exactly the logic from 8ef50d9958 which
is why I presume
you used it, but the simpler (and logically equivalent[1]) :

  if !((flags & WRITE_REV) || (flags & WRITE_REV_VERIFY))

is easier to read IMHO

Carlo

[1] https://en.wikipedia.org/wiki/De_Morgan%27s_laws
Taylor Blau Sept. 8, 2021, 2:42 a.m. UTC | #2
On Tue, Sep 07, 2021 at 06:35:10PM -0700, Carlo Arenas wrote:
> On Tue, Sep 7, 2021 at 6:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > +       if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
> > +               return NULL;
>
> I see this expression matches exactly the logic from 8ef50d9958 which
> is why I presume
> you used it, but the simpler (and logically equivalent[1]) :
>
>   if !((flags & WRITE_REV) || (flags & WRITE_REV_VERIFY))

Even simpler would be:

    if (!(flags & (WRITE_REV | WRITE_REV_VERIFY)))

although with optimization flags other than -O0, it seems that each of
these three produce the same result [1], so I don't think that it
matters much either way ;-).

Thanks,
Taylor

[1]: https://godbolt.org/z/fxxhzEz79
Taylor Blau Sept. 8, 2021, 2:50 a.m. UTC | #3
On Wed, Sep 08, 2021 at 03:08:03AM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a performance regression introduced in a587b5a786 (pack-write.c:
> extract 'write_rev_file_order', 2021-03-30) and stop needlessly
> allocating the "pack_order" array and sorting it with
> "pack_order_cmp()", only to throw that work away when we discover that
> we're not writing *.rev files after all.
>
> This redundant work was not present in the original version of this
> code added in 8ef50d9958 (pack-write.c: prepare to write 'pack-*.rev'
> files, 2021-01-25). There we'd call write_rev_file() from
> e.g. finish_tmp_packfile(), but we'd "return NULL" early in
> write_rev_file() if not doing a "WRITE_REV" or "WRITE_REV_VERIFY".
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  pack-write.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/pack-write.c b/pack-write.c
> index f1fc3ecafa..1883848e7c 100644
> --- a/pack-write.c
> +++ b/pack-write.c
> @@ -224,6 +224,9 @@ const char *write_rev_file(const char *rev_name,
>  	uint32_t i;
>  	const char *ret;
>
> +	if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
> +		return NULL;
> +

Great catch! This fix as-is is obviously correct, but it does make the
same checks in write_rev_file_order redundant as a result.

If we call write_rev_file() without WRITE_REV, then we'll never call
write_rev_file_order(). The other caller of write_rev_file_order() is
from midx.c:write_midx_reverse_index(), which is only called by
write_midx_internal() where it is guarded by a similar conditional.

So I think we could probably either:

  - remove the check from write_rev_file_order() altogether, moving it
    to write_rev_file() like you did here, or

  - remove the check from write_rev_file_order() and elevate it to the
    caller which is missing the check in finish_tmp_packfile()

Of the two, I think the former is more appealing (since no other
functions called by finish_tmp_packfile() are guarded like that; they
conditionally behave as noops depending on `flags`).

But what you wrote here is just fine as-is, so the above are just some
optional ideas for potential improvements.

Thanks,
Taylor
Taylor Blau Sept. 8, 2021, 3:50 a.m. UTC | #4
On Tue, Sep 07, 2021 at 10:50:58PM -0400, Taylor Blau wrote:
> Of the two, I think the former is more appealing (since no other
> functions called by finish_tmp_packfile() are guarded like that; they
> conditionally behave as noops depending on `flags`).

Sorry; this is nonsensical. The only other function we call is
write_idx_file() which merely changes its behavior based on flags, but
it never behaves as a noop.

That doesn't change my thinking about preferring the former of my two
suggestions, but just wanted to correct my error.

Thanks,
Taylor
Ævar Arnfjörð Bjarmason Sept. 8, 2021, 10:18 a.m. UTC | #5
On Tue, Sep 07 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 10:50:58PM -0400, Taylor Blau wrote:
>> Of the two, I think the former is more appealing (since no other
>> functions called by finish_tmp_packfile() are guarded like that; they
>> conditionally behave as noops depending on `flags`).
>
> Sorry; this is nonsensical. The only other function we call is
> write_idx_file() which merely changes its behavior based on flags, but
> it never behaves as a noop.
>
> That doesn't change my thinking about preferring the former of my two
> suggestions, but just wanted to correct my error.

I agree that this code is very confusing overall, but would prefer to
wait on refactoring further until the two topics in flight (this and the
other pack-write topic) settle.

As shown in
https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I think
the best thing to do is neither of the narrow fixes you suggest, but to
more deeply untangle the whole mess around how we choose to write these
files & with what options. A lot of it is bit-twiddling back and forth
for no real reason.

Once we do that it becomes impossible to land in a mode where these
functions need to in principle deal with writing a "real" file and the
"verify" mode, which as noted in the linked E-Mail is the case now, we
just need/want these "is more than one set?" checks & assertions because
we've made the interface overly confusing/general.
Junio C Hamano Sept. 8, 2021, 3:47 p.m. UTC | #6
Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Sep 07, 2021 at 06:35:10PM -0700, Carlo Arenas wrote:
>> On Tue, Sep 7, 2021 at 6:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> > +       if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
>> > +               return NULL;
>>
>> I see this expression matches exactly the logic from 8ef50d9958 which
>> is why I presume
>> you used it, but the simpler (and logically equivalent[1]) :
>>
>>   if !((flags & WRITE_REV) || (flags & WRITE_REV_VERIFY))
>
> Even simpler would be:
>
>     if (!(flags & (WRITE_REV | WRITE_REV_VERIFY)))
>
> although with optimization flags other than -O0, it seems that each of
> these three produce the same result [1], so I don't think that it
> matters much either way ;-).

If all result in the same binary, the only deciding factor would be
how readable the code is to human readers.

I too find that your version, "we care about these two bits---if
flags does not have either of them, then...", the easiest to follow.
But the original is not unreadable and is good enough once it has
already been written.

Thanks.
Taylor Blau Sept. 8, 2021, 4:32 p.m. UTC | #7
On Wed, Sep 08, 2021 at 12:18:38PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Sep 07 2021, Taylor Blau wrote:
>
> > On Tue, Sep 07, 2021 at 10:50:58PM -0400, Taylor Blau wrote:
> >> Of the two, I think the former is more appealing (since no other
> >> functions called by finish_tmp_packfile() are guarded like that; they
> >> conditionally behave as noops depending on `flags`).
> >
> > Sorry; this is nonsensical. The only other function we call is
> > write_idx_file() which merely changes its behavior based on flags, but
> > it never behaves as a noop.
> >
> > That doesn't change my thinking about preferring the former of my two
> > suggestions, but just wanted to correct my error.
>
> I agree that this code is very confusing overall, but would prefer to
> wait on refactoring further until the two topics in flight (this and the
> other pack-write topic) settle.

I'm fine to wait on any further refactorings. And I agree that this code
is confusing, since when I read it last night I thought that the check
in write_rev_file_order() was a duplicate of the one you introduced, but
it is not:

    if ((flags & WRITE_REV) && (flags & WRITE_REV_VERIFY))
      die(_("cannot both write and verify reverse index"));

and that check is different than the one you added, which I think is
appropriate.

So this patch looks good to me, and sorry for the confusion.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/pack-write.c b/pack-write.c
index f1fc3ecafa..1883848e7c 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -224,6 +224,9 @@  const char *write_rev_file(const char *rev_name,
 	uint32_t i;
 	const char *ret;
 
+	if (!(flags & WRITE_REV) && !(flags & WRITE_REV_VERIFY))
+		return NULL;
+
 	ALLOC_ARRAY(pack_order, nr_objects);
 	for (i = 0; i < nr_objects; i++)
 		pack_order[i] = i;