diff mbox series

[v2,3/4] builtin/pack-objects.c: ensure included `--stdin-packs` exist

Message ID cdc3265ec27f04accc433d9e4e54ac0edc3b3746.1653418457.git.me@ttaylorr.com (mailing list archive)
State New, archived
Series pack-objects: fix a pair of MIDX bitmap-related races

Commit Message

Taylor Blau May 24, 2022, 6:54 p.m. UTC
A subsequent patch will teach `want_object_in_pack()` to set its
`*found_pack` and `*found_offset` poitners to NULL when the provided
pack does not pass the `is_pack_valid()` check.

The `--stdin-packs` mode of `pack-objects` is not quite prepared to
handle this. To prepare it for this change, do the following two things:

  - Ensure provided packs pass the `is_pack_valid()` check when
    collecting the caller-provided packs into the "included" and
    "excluded" lists.

  - Gracefully handle any _invalid_ packs being passed to
    `want_object_in_pack()`.

Calling `is_pack_valid()` early on makes it substantially less likely
that we will have to deal with a pack going away, since we'll have an
open file descriptor on its contents much earlier.

But even packs with open descriptors can become invalid in the future if
we (a) hit our open descriptor limit, forcing us to close some open
packs, and (b) one of those just-closed packs has gone away in the
meantime.

`add_object_entry_from_pack()` depends on having a non-NULL
`*found_pack`, since it passes that pointer to `packed_object_info()`,
meaning that we would SEGV if the pointer became NULL (like we propose
to do in `want_object_in_pack()` in the following patch).

But avoiding calling `packed_object_info()` entirely is OK, too, since
its only purpose is to identify which objects in the included packs are
commits, so that they can form the tips of the advisory traversal used
to discover the object namehashes.

Failing to do this means that at worst we will produce lower-quality
deltas, but it does not prevent us from generating the pack as long as
we can find a copy of each object from the disappearing pack in some
other part of the repository.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)
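
The control flow described in the commit message can be modelled in isolation. The following is a minimal sketch, not git code: `toy_pack`, `toy_stats` and `toy_add_entry()` are hypothetical stand-ins, and the single `only_type` field replaces the real `packed_object_info()` lookup. It only illustrates that a NULL pack skips the type lookup and the "found" counter, while an entry is still created with `OBJ_NONE`.

```c
#include <stddef.h>

/* Same numeric values git uses for these object types. */
enum object_type { OBJ_NONE = 0, OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3 };

/* Hypothetical pack: every object in it is assumed to have one type. */
struct toy_pack {
	const char *name;
	enum object_type only_type;
};

/* Hypothetical bookkeeping, standing in for git's global counters. */
struct toy_stats {
	int found_nr;               /* models stdin_packs_found_nr */
	int tips_nr;                /* models commits handed to add_pending_oid() */
	int entries_nr;             /* models calls to create_object_entry() */
	enum object_type last_type; /* type passed to the last created entry */
};

/*
 * Models add_object_entry_from_pack() after the patch: `p` may be NULL
 * when the pack stopped being valid, in which case the type stays
 * OBJ_NONE and no traversal tip is recorded.
 */
int toy_add_entry(const struct toy_pack *p, struct toy_stats *st)
{
	enum object_type type = OBJ_NONE;

	if (p) {
		type = p->only_type;    /* stands in for packed_object_info() */
		if (type == OBJ_COMMIT)
			st->tips_nr++;  /* stands in for add_pending_oid() */
		st->found_nr++;
	}

	/* stands in for create_object_entry(oid, type, ...) */
	st->entries_nr++;
	st->last_type = type;
	return 0;
}
```

Calling `toy_add_entry(NULL, &st)` still bumps `entries_nr`, which mirrors the claim that losing a pack costs only the traversal tip (and thus delta quality), not the object entry itself.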

Comments

Ævar Arnfjörð Bjarmason May 24, 2022, 7:46 p.m. UTC | #1
On Tue, May 24 2022, Taylor Blau wrote:

> -	struct rev_info *revs = _data;
> -	struct object_info oi = OBJECT_INFO_INIT;
>  	off_t ofs;
> -	enum object_type type;
> +	enum object_type type = OBJ_NONE;
>  
>  	display_progress(progress_state, ++nr_seen);
>  
> @@ -3215,20 +3213,25 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  	if (!want_object_in_pack(oid, 0, &p, &ofs))
>  		return 0;
>  
> -	oi.typep = &type;
> -	if (packed_object_info(the_repository, p, ofs, &oi) < 0)
> -		die(_("could not get type of object %s in pack %s"),
> -		    oid_to_hex(oid), p->pack_name);
> -	else if (type == OBJ_COMMIT) {
> -		/*
> -		 * commits in included packs are used as starting points for the
> -		 * subsequent revision walk
> -		 */
> -		add_pending_oid(revs, NULL, oid, 0);
> +	if (p) {
> +		struct rev_info *revs = _data;
> +		struct object_info oi = OBJECT_INFO_INIT;
> +
> +		oi.typep = &type;
> +		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
> +			die(_("could not get type of object %s in pack %s"),
> +			    oid_to_hex(oid), p->pack_name);
> +		} else if (type == OBJ_COMMIT) {
> +			/*
> +			 * commits in included packs are used as starting points for the
> +			 * subsequent revision walk
> +			 */
> +			add_pending_oid(revs, NULL, oid, 0);
> +		}
> +
> +		stdin_packs_found_nr++;
>  	}
>  
> -	stdin_packs_found_nr++;
> -
>  	create_object_entry(oid, type, 0, 0, 0, p, ofs);

Not rhetorical, since I have no idea: Is the behavior change here to
make create_object_entry with type=OBJ_NONE desired? I.e. do we actually
want to create object entries for OBJ_NONE?

If that is the case I for one would find this a bit easier to follow
like this, even if it has some minor duplication, i.e. the intent is
clearer:
	
	diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
	index ffeaecd1d84..a447f6d5164 100644
	--- a/builtin/pack-objects.c
	+++ b/builtin/pack-objects.c
	@@ -3202,7 +3202,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
	 				      void *_data)
	 {
	 	off_t ofs;
	-	enum object_type type = OBJ_NONE;
	 
	 	display_progress(progress_state, ++nr_seen);
	 
	@@ -3216,6 +3215,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
	 	if (p) {
	 		struct rev_info *revs = _data;
	 		struct object_info oi = OBJECT_INFO_INIT;
	+		enum object_type type;
	 
	 		oi.typep = &type;
	 		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
	@@ -3230,9 +3230,11 @@ static int add_object_entry_from_pack(const struct object_id *oid,
	 		}
	 
	 		stdin_packs_found_nr++;
	-	}
	 
	-	create_object_entry(oid, type, 0, 0, 0, p, ofs);
	+		create_object_entry(oid, type, 0, 0, 0, p, ofs);
	+	} else  {
	+		create_object_entry(oid, OBJ_NONE, 0, 0, 0, p, ofs);
	+	}
	 
	 	return 0;
	 }

Or the same with adding "type = OBJ_NONE" to the "else" branch, leaving
the initial "type" uninitialized"?

Or perhaps this is a bug? I see some OBJ_NONE mentions in the code, but
do packfiles really have "none" objects in some fashion as far as
add_object_entry_from_pack() is concerned? (I'm not familiar enough with
this part of the codebase to know).
Taylor Blau May 24, 2022, 9:33 p.m. UTC | #2
On Tue, May 24, 2022 at 09:46:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, May 24 2022, Taylor Blau wrote:
>
> > -	struct rev_info *revs = _data;
> > -	struct object_info oi = OBJECT_INFO_INIT;
> >  	off_t ofs;
> > -	enum object_type type;
> > +	enum object_type type = OBJ_NONE;
> >
> >  	display_progress(progress_state, ++nr_seen);
> >
> > @@ -3215,20 +3213,25 @@ static int add_object_entry_from_pack(const struct object_id *oid,
> >  	if (!want_object_in_pack(oid, 0, &p, &ofs))
> >  		return 0;
> >
> > -	oi.typep = &type;
> > -	if (packed_object_info(the_repository, p, ofs, &oi) < 0)
> > -		die(_("could not get type of object %s in pack %s"),
> > -		    oid_to_hex(oid), p->pack_name);
> > -	else if (type == OBJ_COMMIT) {
> > -		/*
> > -		 * commits in included packs are used as starting points for the
> > -		 * subsequent revision walk
> > -		 */
> > -		add_pending_oid(revs, NULL, oid, 0);
> > +	if (p) {
> > +		struct rev_info *revs = _data;
> > +		struct object_info oi = OBJECT_INFO_INIT;
> > +
> > +		oi.typep = &type;
> > +		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
> > +			die(_("could not get type of object %s in pack %s"),
> > +			    oid_to_hex(oid), p->pack_name);
> > +		} else if (type == OBJ_COMMIT) {
> > +			/*
> > +			 * commits in included packs are used as starting points for the
> > +			 * subsequent revision walk
> > +			 */
> > +			add_pending_oid(revs, NULL, oid, 0);
> > +		}
> > +
> > +		stdin_packs_found_nr++;
> >  	}
> >
> > -	stdin_packs_found_nr++;
> > -
> >  	create_object_entry(oid, type, 0, 0, 0, p, ofs);
>
> Not rhetorical, since I have no idea: Is the behavior change here to
> make create_object_entry with type=OBJ_NONE desired? I.e. do we actually
> want to create object entries for OBJ_NONE?

This is intentional. OBJ_NONE tells create_object_entry() "we don't know
the type of this object yet", and then `check_object()` (which does the
bulk of the work in the "Counting objects" phase) goes through and fills
in any missing type information.

The caller in `builtin/pack-objects.c::read_object_list_from_stdin()` is
a good example of this (all of the objects created this way start out
with OBJ_NONE).

> If that is the case I for one would find this a bit easier to follow
> like this, even if it has some minor duplication, i.e. the intent is
> clearer:
>
> 	diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> 	index ffeaecd1d84..a447f6d5164 100644
> 	--- a/builtin/pack-objects.c
> 	+++ b/builtin/pack-objects.c
> 	@@ -3202,7 +3202,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
> 	 				      void *_data)
> 	 {
> 	 	off_t ofs;
> 	-	enum object_type type = OBJ_NONE;
>
> 	 	display_progress(progress_state, ++nr_seen);
>
> 	@@ -3216,6 +3215,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
> 	 	if (p) {
> 	 		struct rev_info *revs = _data;
> 	 		struct object_info oi = OBJECT_INFO_INIT;
> 	+		enum object_type type;
>
> 	 		oi.typep = &type;
> 	 		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
> 	@@ -3230,9 +3230,11 @@ static int add_object_entry_from_pack(const struct object_id *oid,
> 	 		}
>
> 	 		stdin_packs_found_nr++;
> 	-	}
>
> 	-	create_object_entry(oid, type, 0, 0, 0, p, ofs);
> 	+		create_object_entry(oid, type, 0, 0, 0, p, ofs);
> 	+	} else  {
> 	+		create_object_entry(oid, OBJ_NONE, 0, 0, 0, p, ofs);
> 	+	}
>
> 	 	return 0;
> 	 }
>
> Or the same with adding "type = OBJ_NONE" to the "else" branch, leaving
> the initial "type" uninitialized"?

I'd be fine with that (and don't really have a very strong opinion
either way). Let's see if anybody else has thoughts about it, and then
I'm happy to change it in a subsequent version.

Thanks,
Taylor
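
For readers unfamiliar with the "type unknown for now" convention explained above, here is a small self-contained sketch of the idea. It is not the real `create_object_entry()` or `check_object()`: the `toy_*` names and the id-based lookup are assumptions made up for illustration. Entries may be recorded with `OBJ_NONE`, and a later pass fills in whatever is still missing.

```c
#include <stddef.h>

/* Same numeric values git uses for these object types. */
enum object_type { OBJ_NONE = 0, OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3 };

struct toy_entry {
	int id;                 /* hypothetical stand-in for an object ID */
	enum object_type type;  /* OBJ_NONE means "not known yet" */
};

/* Hypothetical object lookup: odd ids are blobs, even ids are commits. */
static enum object_type toy_lookup_type(int id)
{
	return (id % 2) ? OBJ_BLOB : OBJ_COMMIT;
}

/* First pass: record the entry; the caller may not know the type yet. */
void toy_create_entry(struct toy_entry *e, int id, enum object_type type)
{
	e->id = id;
	e->type = type;
}

/*
 * Later pass, playing the role described for check_object(): resolve
 * every entry that is still OBJ_NONE. Returns how many were filled in.
 */
size_t toy_fill_missing_types(struct toy_entry *entries, size_t nr)
{
	size_t i, filled = 0;

	for (i = 0; i < nr; i++) {
		if (entries[i].type != OBJ_NONE)
			continue;
		entries[i].type = toy_lookup_type(entries[i].id);
		filled++;
	}
	return filled;
}
```

Entries that already carry a type are left untouched by the second pass, so passing a known type early is an optimization rather than a requirement in this model.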
Ævar Arnfjörð Bjarmason May 24, 2022, 9:49 p.m. UTC | #3
On Tue, May 24 2022, Taylor Blau wrote:

> On Tue, May 24, 2022 at 09:46:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, May 24 2022, Taylor Blau wrote:
>>
>> > -	struct rev_info *revs = _data;
>> > -	struct object_info oi = OBJECT_INFO_INIT;
>> >  	off_t ofs;
>> > -	enum object_type type;
>> > +	enum object_type type = OBJ_NONE;
>> >
>> >  	display_progress(progress_state, ++nr_seen);
>> >
>> > @@ -3215,20 +3213,25 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>> >  	if (!want_object_in_pack(oid, 0, &p, &ofs))
>> >  		return 0;
>> >
>> > -	oi.typep = &type;
>> > -	if (packed_object_info(the_repository, p, ofs, &oi) < 0)
>> > -		die(_("could not get type of object %s in pack %s"),
>> > -		    oid_to_hex(oid), p->pack_name);
>> > -	else if (type == OBJ_COMMIT) {
>> > -		/*
>> > -		 * commits in included packs are used as starting points for the
>> > -		 * subsequent revision walk
>> > -		 */
>> > -		add_pending_oid(revs, NULL, oid, 0);
>> > +	if (p) {
>> > +		struct rev_info *revs = _data;
>> > +		struct object_info oi = OBJECT_INFO_INIT;
>> > +
>> > +		oi.typep = &type;
>> > +		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
>> > +			die(_("could not get type of object %s in pack %s"),
>> > +			    oid_to_hex(oid), p->pack_name);
>> > +		} else if (type == OBJ_COMMIT) {
>> > +			/*
>> > +			 * commits in included packs are used as starting points for the
>> > +			 * subsequent revision walk
>> > +			 */
>> > +			add_pending_oid(revs, NULL, oid, 0);
>> > +		}
>> > +
>> > +		stdin_packs_found_nr++;
>> >  	}
>> >
>> > -	stdin_packs_found_nr++;
>> > -
>> >  	create_object_entry(oid, type, 0, 0, 0, p, ofs);
>>
>> Not rhetorical, since I have no idea: Is the behavior change here to
>> make create_object_entry with type=OBJ_NONE desired? I.e. do we actually
>> want to create object entries for OBJ_NONE?
>
> This is intentional. OBJ_NONE tells create_object_entry() "we don't know
> the type of this object yet", and then `check_object()` (which does the
> bulk of the work in the "Counting objects" phase) goes through and fills
> in any missing type information.

Ah, I didn't know that.

> The caller in `builtin/pack-objects.c::read_object_list_from_stdin()` is
> a good example of this (all of the objects created this way start out
> with OBJ_NONE).
>
>> If that is the case I for one would find this a bit easier to follow
>> like this, even if it has some minor duplication, i.e. the intent is
>> clearer:
>>
>> 	diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> 	index ffeaecd1d84..a447f6d5164 100644
>> 	--- a/builtin/pack-objects.c
>> 	+++ b/builtin/pack-objects.c
>> 	@@ -3202,7 +3202,6 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>> 	 				      void *_data)
>> 	 {
>> 	 	off_t ofs;
>> 	-	enum object_type type = OBJ_NONE;
>>
>> 	 	display_progress(progress_state, ++nr_seen);
>>
>> 	@@ -3216,6 +3215,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>> 	 	if (p) {
>> 	 		struct rev_info *revs = _data;
>> 	 		struct object_info oi = OBJECT_INFO_INIT;
>> 	+		enum object_type type;
>>
>> 	 		oi.typep = &type;
>> 	 		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
>> 	@@ -3230,9 +3230,11 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>> 	 		}
>>
>> 	 		stdin_packs_found_nr++;
>> 	-	}
>>
>> 	-	create_object_entry(oid, type, 0, 0, 0, p, ofs);
>> 	+		create_object_entry(oid, type, 0, 0, 0, p, ofs);
>> 	+	} else  {
>> 	+		create_object_entry(oid, OBJ_NONE, 0, 0, 0, p, ofs);
>> 	+	}
>>
>> 	 	return 0;
>> 	 }
>>
>> Or the same with adding "type = OBJ_NONE" to the "else" branch, leaving
>> the initial "type" uninitialized"?
>
> I'd be fine with that (and don't really have a very strong opinion
> either way). Let's see if anybody else has thoughts about it, and then
> I'm happy to change it in a subsequent version.

FWIW I think you should place a particularly low value on my suggestion
of this.

I.e. the last thing we should do is probably to optimize the code to be
read by someone who hadn't spent even 10 minutes finding out such
obvious major code-flow details, i.e. me not knowing about how OBJ_NONE
was used in this case....

So it's probably all fine as-is, but perhaps others will think it's
good or whatever...
Junio C Hamano May 24, 2022, 10:03 p.m. UTC | #4
Taylor Blau <me@ttaylorr.com> writes:

> Calling `is_pack_valid()` early on makes it substantially less likely
> that we will have to deal with a pack going away, since we'll have an
> open file descriptor on its contents much earlier.

Sorry for asking a stupid question (or two), but I am confused.

This does make sure that we can read and use the contents of the
packfile even when somebody else removes it from the disk by
ensuring that

 (1) we have an open file descriptor to it, so that we could open
     mmap window into it at will; or

 (2) we have a mmap window that covers all of it (this should be the
     norm on platforms with vast address space); or

 (3) we are in the same state as (1) by opening the packfile to
     validate the pack right now.
     
and during the pack-object we are running (aka "repack"), we can
continue to read from that pack that may have already disappeared
from the disk.

But is that sufficient?  Are we writing the resulting new pack(s)
out in such a way that the repository is healthy without the pack
we noticed is disappearing?  How do we ensure that?

Thanks.
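
The descriptor behaviour this question relies on can be checked outside of git. The sketch below assumes a POSIX system with a writable `/tmp`; `demo_unlinked_pack()` and the file name template are hypothetical and have nothing to do with git's packfile code. It shows that data stays readable through a descriptor opened before `unlink()`, and that once the descriptor is closed (as would happen when hitting the open descriptor limit) the file cannot be reopened by path.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 0 when both expected behaviours are observed, -1 otherwise:
 *   1. reading through an already-open descriptor works after unlink()
 *   2. after close(), open() on the same path fails with ENOENT
 */
int demo_unlinked_pack(void)
{
	char path[] = "/tmp/fake-pack-XXXXXX";
	char buf[4] = { 0 };
	int fd = mkstemp(path); /* creates the file and opens it read/write */

	if (fd < 0)
		return -1;
	if (write(fd, "PACK", 4) != 4) {
		close(fd);
		unlink(path);
		return -1;
	}

	/* "Somebody else" removes the file while we hold it open. */
	if (unlink(path) < 0) {
		close(fd);
		return -1;
	}

	/* The contents are still reachable through the open descriptor. */
	if (pread(fd, buf, 4, 0) != 4 || memcmp(buf, "PACK", 4) != 0) {
		close(fd);
		return -1;
	}

	/* Simulate being forced to give the descriptor up. */
	close(fd);

	/* Reopening by name must now fail: the directory entry is gone. */
	fd = open(path, O_RDONLY);
	if (fd >= 0) {
		close(fd);
		return -1;
	}
	return errno == ENOENT ? 0 : -1;
}
```

The second half of the demo corresponds to the "(a) close, then (b) pack has gone away" case from the commit message, which is why an early validity check reduces the race window but cannot remove it.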
Taylor Blau May 25, 2022, 12:14 a.m. UTC | #5
On Tue, May 24, 2022 at 03:03:11PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Calling `is_pack_valid()` early on makes it substantially less likely
> > that we will have to deal with a pack going away, since we'll have an
> > open file descriptor on its contents much earlier.
>
> Sorry for asking a stupid question (or two), but I am confused.

No such thing as a stupid question, so your apology is not necessary in
the slightest :).

> This does make sure that we can read and use the contents of the
> packfile even when somebody else removes it from the disk by
> ensuring that
>
>  (1) we have an open file descriptor to it, so that we could open
>      mmap window into it at will; or
>
>  (2) we have a mmap window that covers all of it (this should be the
>      norm on platforms with vast address space); or
>
>  (3) we are in the same state as (1) by opening the packfile to
>      validate the pack right now.
>
> and during the pack-object we are running (aka "repack"), we can
> continue to read from that pack that may have already disappeared
> from the disk.
>
> But is that sufficient?  Are we writing the resulting new pack(s)
> out in such a way that the repository is healthy without the pack
> we noticed is disappearing?  How do we ensure that?

It's sufficient in the sense that we're writing out all of the objects
we were asked to (from pack-objects's perspective). Of course, if the
"simultaneous writer" is just removing packs right after they are
opened, that will produce an obviously-broken state. But assuming that
repack isn't removing objects it shouldn't (which I think is a safe
assumption from pack-objects' perspective, since all it cares about is
writing packs that contain the desired set of objects), then we are OK.

Thanks,
Taylor
Victoria Dye May 26, 2022, 7:21 p.m. UTC | #6
Taylor Blau wrote:
> A subsequent patch will teach `want_object_in_pack()` to set its
> `*found_pack` and `*found_offset` poitners to NULL when the provided

s/poitners/pointers

> pack does not pass the `is_pack_valid()` check.
> 
> The `--stdin-packs` mode of `pack-objects` is not quite prepared to
> handle this. To prepare it for this change, do the following two things:
> 
>   - Ensure provided packs pass the `is_pack_valid()` check when
>     collecting the caller-provided packs into the "included" and
>     "excluded" lists.
> 

Is the 'is_pack_valid()' check happening for the "excluded" packs? It looks
like you only added it for the packs in the "included" list in this patch.

>   - Gracefully handle any _invalid_ packs being passed to
>     `want_object_in_pack()`.
> 
> Calling `is_pack_valid()` early on makes it substantially less likely
> that we will have to deal with a pack going away, since we'll have an
> open file descriptor on its contents much earlier.
> 
> But even packs with open descriptors can become invalid in the future if
> we (a) hit our open descriptor limit, forcing us to close some open
> packs, and (b) one of those just-closed packs has gone away in the
> meantime.
> 
> `add_object_entry_from_pack()` depends on having a non-NULL
> `*found_pack`, since it passes that pointer to `packed_object_info()`,
> meaning that we would SEGV if the pointer became NULL (like we propose
> to do in `want_object_in_pack()` in the following patch).
> 
> But avoiding calling `packed_object_info()` entirely is OK, too, since
> its only purpose is to identify which objects in the included packs are
> commits, so that they can form the tips of the advisory traversal used
> to discover the object namehashes.
> 
> Failing to do this means that at worst we will produce lower-quality
> deltas, but it does not prevent us from generating the pack as long as
> we can find a copy of each object from the disappearing pack in some
> other part of the repository.
> 

The rest of this makes sense and (as far as I can tell) lines up with the
implementation below.

> Co-authored-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/pack-objects.c | 35 ++++++++++++++++++++---------------
>  1 file changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index ec3193fd95..ffeaecd1d8 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3201,10 +3201,8 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  				      uint32_t pos,
>  				      void *_data)
>  {
> -	struct rev_info *revs = _data;
> -	struct object_info oi = OBJECT_INFO_INIT;
>  	off_t ofs;
> -	enum object_type type;
> +	enum object_type type = OBJ_NONE;
>  
>  	display_progress(progress_state, ++nr_seen);
>  
> @@ -3215,20 +3213,25 @@ static int add_object_entry_from_pack(const struct object_id *oid,
>  	if (!want_object_in_pack(oid, 0, &p, &ofs))
>  		return 0;
>  
> -	oi.typep = &type;
> -	if (packed_object_info(the_repository, p, ofs, &oi) < 0)
> -		die(_("could not get type of object %s in pack %s"),
> -		    oid_to_hex(oid), p->pack_name);
> -	else if (type == OBJ_COMMIT) {
> -		/*
> -		 * commits in included packs are used as starting points for the
> -		 * subsequent revision walk
> -		 */
> -		add_pending_oid(revs, NULL, oid, 0);
> +	if (p) {
> +		struct rev_info *revs = _data;
> +		struct object_info oi = OBJECT_INFO_INIT;
> +
> +		oi.typep = &type;
> +		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
> +			die(_("could not get type of object %s in pack %s"),
> +			    oid_to_hex(oid), p->pack_name);
> +		} else if (type == OBJ_COMMIT) {
> +			/*
> +			 * commits in included packs are used as starting points for the
> +			 * subsequent revision walk
> +			 */
> +			add_pending_oid(revs, NULL, oid, 0);
> +		}
> +
> +		stdin_packs_found_nr++;
>  	}
>  
> -	stdin_packs_found_nr++;
> -
>  	create_object_entry(oid, type, 0, 0, 0, p, ofs);
>  
>  	return 0;
> @@ -3346,6 +3349,8 @@ static void read_packs_list_from_stdin(void)
>  		struct packed_git *p = item->util;
>  		if (!p)
>  			die(_("could not find pack '%s'"), item->string);
> +		if (!is_pack_valid(p))
> +			die(_("packfile %s cannot be accessed"), p->pack_name);
>  	}
>  
>  	/*
Taylor Blau May 26, 2022, 8:05 p.m. UTC | #7
On Thu, May 26, 2022 at 12:21:48PM -0700, Victoria Dye wrote:
> > pack does not pass the `is_pack_valid()` check.
> >
> > The `--stdin-packs` mode of `pack-objects` is not quite prepared to
> > handle this. To prepare it for this change, do the following two things:
> >
> >   - Ensure provided packs pass the `is_pack_valid()` check when
> >     collecting the caller-provided packs into the "included" and
> >     "excluded" lists.
> >
>
> Is the 'is_pack_valid()' check happening for the "excluded" packs? It looks
> like you only added it for the packs in the "included" list in this patch.

You're right that we don't do it explicitly. That's OK, since we won't
use any objects in excluded packs, and thus don't need to eagerly grab
a descriptor on them to protect against the race we're handling here.

(In practice, we do end up calling is_pack_valid() on excluded packs
later on, via

  - want_found_object() (or one of its many callers), which itself calls
  - has_object_kept_pack(), which calls
  - find_kept_pack_entry(), which calls
  - fill_pack_entry(), which calls
  - is_pack_valid()

but that's a side-effect that doesn't help or hurt us.)

We _do_ need to be able to open the .idx (which happens in
`fill_pack_entry() -> find_pack_entry_one() -> open_pack_index()`), but
we'll fail appropriately when the index cannot be located.

> The rest of this makes sense and (as far as I can tell) lines up with the
> implementation below.

Thanks for taking a look!

Thanks,
Taylor

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ec3193fd95..ffeaecd1d8 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3201,10 +3201,8 @@  static int add_object_entry_from_pack(const struct object_id *oid,
 				      uint32_t pos,
 				      void *_data)
 {
-	struct rev_info *revs = _data;
-	struct object_info oi = OBJECT_INFO_INIT;
 	off_t ofs;
-	enum object_type type;
+	enum object_type type = OBJ_NONE;
 
 	display_progress(progress_state, ++nr_seen);
 
@@ -3215,20 +3213,25 @@  static int add_object_entry_from_pack(const struct object_id *oid,
 	if (!want_object_in_pack(oid, 0, &p, &ofs))
 		return 0;
 
-	oi.typep = &type;
-	if (packed_object_info(the_repository, p, ofs, &oi) < 0)
-		die(_("could not get type of object %s in pack %s"),
-		    oid_to_hex(oid), p->pack_name);
-	else if (type == OBJ_COMMIT) {
-		/*
-		 * commits in included packs are used as starting points for the
-		 * subsequent revision walk
-		 */
-		add_pending_oid(revs, NULL, oid, 0);
+	if (p) {
+		struct rev_info *revs = _data;
+		struct object_info oi = OBJECT_INFO_INIT;
+
+		oi.typep = &type;
+		if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
+			die(_("could not get type of object %s in pack %s"),
+			    oid_to_hex(oid), p->pack_name);
+		} else if (type == OBJ_COMMIT) {
+			/*
+			 * commits in included packs are used as starting points for the
+			 * subsequent revision walk
+			 */
+			add_pending_oid(revs, NULL, oid, 0);
+		}
+
+		stdin_packs_found_nr++;
 	}
 
-	stdin_packs_found_nr++;
-
 	create_object_entry(oid, type, 0, 0, 0, p, ofs);
 
 	return 0;
@@ -3346,6 +3349,8 @@  static void read_packs_list_from_stdin(void)
 		struct packed_git *p = item->util;
 		if (!p)
 			die(_("could not find pack '%s'"), item->string);
+		if (!is_pack_valid(p))
+			die(_("packfile %s cannot be accessed"), p->pack_name);
 	}
 
 	/*