diff mbox series

[v1,2/3] pack-bitmap.c: add "break" statement in "open_pack_bitmap()"

Message ID 70500b63434f6e454631b3a8fde07c62b5adfef0.1648119652.git.dyroneteng@gmail.com (mailing list archive)
State Superseded
Headers show
Series trace2 output for bitmap decision path | expand

Commit Message

Teng Long March 24, 2022, 11:44 a.m. UTC
There will be only one operant ".bitmap" file in repo, so let's
add "break" statement in "open_pack_bitmap()" when looping all
the packs in repo.

Signed-off-by: Teng Long <dyroneteng@gmail.com>
---
 pack-bitmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Junio C Hamano March 24, 2022, 6:40 p.m. UTC | #1
Teng Long <dyroneteng@gmail.com> writes:

> There will be only one operant ".bitmap" file in repo, so let's
> add "break" statement in "open_pack_bitmap()" when looping all
> the packs in repo.

Googling "operant" only gives a psychology term X-<.

    : behavior (such as bar pressing by a rat to obtain food) that
    operates on the environment to produce rewarding and reinforcing
    effects.

I do not quite get this.  We expect that there may be more than one
.pack in the object store, and that is why we iterate over the packs
get_all_packs() gives us.  Then for each of these packs, we try to
open pack bitmap.

Ah, are you referring to the current limitation that we can have
only one .bitmap file, even when there are many packs in the object
store?

I wonder if that limitation is something we need to hardcode here.

IOW, when we ever extend the feature to allow more than one, we may
have to extend the "struct bitmap_index" so that we can have bitmap
info from more than one on-disk files, and the implementation of
open_pack_bitmap_1() (which is "read bitmap for this one pack" as
opposed to "read bitmpa for this repository" which is this function)
would certainly have to change, but would we need to revert this
change to the function?  If we do not omit this patch, i.e. even if
we know there can be only one pack that may have a corresponding
bitmap but if the code does not know it and has to try git_open()
only to see it fail, how much wasted effort are we talking about?

> Signed-off-by: Teng Long <dyroneteng@gmail.com>
> ---
>  pack-bitmap.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 931219adf0..b1357137bf 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -484,8 +484,10 @@ static int open_pack_bitmap(struct repository *r,
>  	assert(!bitmap_git->map);
>  
>  	for (p = get_all_packs(r); p; p = p->next) {
> -		if (open_pack_bitmap_1(bitmap_git, p) == 0)
> +		if (open_pack_bitmap_1(bitmap_git, p) == 0) {
>  			ret = 0;
> +			break;
> +		}
>  	}
>  
>  	return ret;
Taylor Blau March 24, 2022, 7:03 p.m. UTC | #2
On Thu, Mar 24, 2022 at 07:44:00PM +0800, Teng Long wrote:
> There will be only one operant ".bitmap" file in repo, so let's
> add "break" statement in "open_pack_bitmap()" when looping all
> the packs in repo.
>
> Signed-off-by: Teng Long <dyroneteng@gmail.com>
> ---
>  pack-bitmap.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 931219adf0..b1357137bf 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -484,8 +484,10 @@ static int open_pack_bitmap(struct repository *r,
>  	assert(!bitmap_git->map);
>
>  	for (p = get_all_packs(r); p; p = p->next) {
> -		if (open_pack_bitmap_1(bitmap_git, p) == 0)
> +		if (open_pack_bitmap_1(bitmap_git, p) == 0) {
>  			ret = 0;
> +			break;

The lack of a break here is intentional, I think, since having more than
one bitmap of the same kind in a repository is an error.

(This is behavior we inherited from the pre-MIDX bitmap days, when
having more than one pack bitmap caused Git to signal an error, since it
could only use the results from a single bitmap).

You can see in pack-bitmap.c::open_pack_bitmap_1() that we have a
condition saying:

    if (bitmap_git->pack || bitmap_git->midx) {
        /* ignore extra bitmap file; we can only handle one */
        warning("...")
        close(fd;)
        return -1;
    }

We do want to call that open_pack_bitmap_1() function on every pack we
know about to make sure that one and only one of them corresponds to a
.bitmap.

Thanks,
Taylor
Taylor Blau March 24, 2022, 7:06 p.m. UTC | #3
On Thu, Mar 24, 2022 at 11:40:02AM -0700, Junio C Hamano wrote:
> Teng Long <dyroneteng@gmail.com> writes:
>
> > There will be only one operant ".bitmap" file in repo, so let's
> > add "break" statement in "open_pack_bitmap()" when looping all
> > the packs in repo.
>
> Googling "operant" only gives a psychology term X-<.
>
>     : behavior (such as bar pressing by a rat to obtain food) that
>     operates on the environment to produce rewarding and reinforcing
>     effects.

I suspect that the author meant "operating", which I would probably have
written as "Git will only use information from a single .bitmap file in
the repository" or similar.

But...

> I do not quite get this.  We expect that there may be more than one
> .pack in the object store, and that is why we iterate over the packs
> get_all_packs() gives us.  Then for each of these packs, we try to
> open pack bitmap.
>
> Ah, are you referring to the current limitation that we can have
> only one .bitmap file, even when there are many packs in the object
> store?
>
> I wonder if that limitation is something we need to hardcode here.

...you're exactly right: if we see more than one pack bitmap in the
repository, we can and should warn there, since we'll only ever *use*
information from a single bitmap at a time.

That's the behavior before this patch which is guaranteed by t5310.74.
This patch causes that test to break, and I would suggest that is
telling us to drop this part of the series.

Thanks,
Taylor
Teng Long March 29, 2022, 2:49 a.m. UTC | #4
On Thu, 24 Mar 2022 15:03:09 -0400, Taylor Blau wrote:

> The lack of a break here is intentional, I think, since having more than
> one bitmap of the same kind in a repository is an error.

Yeah. I quite sure that it's intentional now.

Thanks for your explanations.

> (This is behavior we inherited from the pre-MIDX bitmap days, when
> having more than one pack bitmap caused Git to signal an error, since it
> could only use the results from a single bitmap).
> 
> You can see in pack-bitmap.c::open_pack_bitmap_1() that we have a
> condition saying:
> 
>     if (bitmap_git->pack || bitmap_git->midx) {
>         /* ignore extra bitmap file; we can only handle one */
>         warning("...")
>         close(fd;)
>         return -1;
>     }
> 
> We do want to call that open_pack_bitmap_1() function on every pack we
> know about to make sure that one and only one of them corresponds to a
> .bitmap.

Yeah, the condition and "warning" make it clear to me which is if already
exists a bitmap of the pack or MIDX is ready, we will give warnings and just
let it fail (return -1 means a return of NULL in "prepare_bitmap_git()", and
will then die() in usage cases I found).

In addition of above, I had a question that why we need "bitmap_git->midx"
in the condition? Because here in "open_pack_bitmap_1()" we intent to open the 
non-midx-bitmap and it's after we deal with "open_midx_bitmap()" in "open_bitmap()":

static int open_bitmap(struct repository *r,
              struct bitmap_index *bitmap_git)
{			      
	assert(!bitmap_git->map);

	if (!open_midx_bitmap(r, bitmap_git))
	   return 0;
	   return open_pack_bitmap(r, bitmap_git);
}

So if I understood correct, maybe we can made condition of "bitmap_git->midx" a little
earlier so that we can avoid to open every packfile,  maybe it's like:

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9c666cdb8b..38f53b8f1c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -483,11 +483,12 @@ static int open_pack_bitmap(struct repository *r,
 
        assert(!bitmap_git->map);
 
-       for (p = get_all_packs(r); p; p = p->next) {
-               if (open_pack_bitmap_1(bitmap_git, p) == 0)
-                       ret = 0;
+       if (!bitmap_git->midx) {
+               for (p = get_all_packs(r); p; p = p->next) {
+                       if (open_pack_bitmap_1(bitmap_git, p) == 0)
+                               ret = 0;
+               }
        }
-
        return ret;
 }


Thanks.
Taylor Blau March 30, 2022, 2:55 a.m. UTC | #5
On Tue, Mar 29, 2022 at 10:49:49AM +0800, Teng Long wrote:
> Yeah, the condition and "warning" make it clear to me which is if
> already exists a bitmap of the pack or MIDX is ready, we will give
> warnings and just let it fail (return -1 means a return of NULL in
> "prepare_bitmap_git()", and will then die() in usage cases I found).
>
> In addition of above, I had a question that why we need
> "bitmap_git->midx" in the condition? Because here in
> "open_pack_bitmap_1()" we intent to open the non-midx-bitmap and it's
> after we deal with "open_midx_bitmap()" in "open_bitmap()":

You're right; open_pack_bitmap_1() doesn't need to care about whether or
not bitmap_git->midx is or isn't non-NULL, since:

  - if we did open a MIDX bitmap (which we will always attempt first
    before trying to open single-pack bitmaps), then we won't even
    bother to call open_pack_bitmap() at all.

  - if we _do_ end up within open_pack_bitmap_1(), then we _know_ that
    no MIDX bitmap could be found/opened, so there is no need to check
    in that case, either.

So I think we realistically could do something like:

--- 8< ---

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 97909d48da..6e7c89826d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -387,3 +387,3 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git

-	if (bitmap_git->pack || bitmap_git->midx) {
+	if (bitmap_git->pack) {
 		/* ignore extra bitmap file; we can only handle one */

--- >8 ---

...but having the conditional there from the pre-image doesn't hurt,
either, and it makes the error clearer in case of an accidental
regression where we start looking for single-pack bitmaps after
successfully opening a multi-pack one.

> static int open_bitmap(struct repository *r,
>               struct bitmap_index *bitmap_git)
> {
> 	assert(!bitmap_git->map);
>
> 	if (!open_midx_bitmap(r, bitmap_git))
> 	   return 0;
> 	   return open_pack_bitmap(r, bitmap_git);
> }
>
> So if I understood correct, maybe we can made condition of "bitmap_git->midx" a little
> earlier so that we can avoid to open every packfile,  maybe it's like:
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 9c666cdb8b..38f53b8f1c 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -483,11 +483,12 @@ static int open_pack_bitmap(struct repository *r,
>
>         assert(!bitmap_git->map);
>
> -       for (p = get_all_packs(r); p; p = p->next) {
> -               if (open_pack_bitmap_1(bitmap_git, p) == 0)
> -                       ret = 0;
> +       if (!bitmap_git->midx) {
> +               for (p = get_all_packs(r); p; p = p->next) {
> +                       if (open_pack_bitmap_1(bitmap_git, p) == 0)
> +                               ret = 0;
> +               }
>         }
> -
>         return ret;
>  }

This shouldn't be necessary, since we don't bother calling
open_pack_bitmap() at all if open_midx_bitmap() returns success. In
other words, based on the way that open_bitmap() (which is the caller
for both of these functions) is written, we know that once we're in
open_pack_bitmap(), that `bitmap_git->midx` is definitely NULL, which
makes this change a noop.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 931219adf0..b1357137bf 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -484,8 +484,10 @@  static int open_pack_bitmap(struct repository *r,
 	assert(!bitmap_git->map);
 
 	for (p = get_all_packs(r); p; p = p->next) {
-		if (open_pack_bitmap_1(bitmap_git, p) == 0)
+		if (open_pack_bitmap_1(bitmap_git, p) == 0) {
 			ret = 0;
+			break;
+		}
 	}
 
 	return ret;