diff mbox series

[v13,1/7] unpack-objects: low memory footprint for get_data() in dry_run mode

Message ID patch-v13-1.7-12873fc9915-20220604T095113Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series unpack-objects: support streaming blobs to disk | expand

Commit Message

Ævar Arnfjörð Bjarmason June 4, 2022, 10:10 a.m. UTC
From: Han Xin <hanxin.hx@alibaba-inc.com>

As the name implies, "get_data(size)" will allocate and return a given
amount of memory. Allocating memory for a large blob object may cause the
system to run out of memory. Before preparing to replace calling of
"get_data()" to unpack large blob objects in latter commits, refactor
"get_data()" to reduce memory footprint for dry_run mode.

Because in dry_run mode, "get_data()" is only used to check the
integrity of data, and the returned buffer is not used at all, we can
allocate a smaller buffer and reuse it as zstream output. Therefore,
in dry_run mode, "get_data()" will release the allocated buffer and
return NULL instead of returning garbage data.

The "find [...]objects/?? -type f | wc -l" test idiom being used here
is adapted from the same "find" use added to another test in
d9545c7f465 (fast-import: implement unpack limit, 2016-04-25).

Suggested-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/unpack-objects.c        | 34 ++++++++++++++++++---------
 t/t5351-unpack-large-objects.sh | 41 +++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 11 deletions(-)
 create mode 100755 t/t5351-unpack-large-objects.sh

Comments

Junio C Hamano June 6, 2022, 6:35 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> From: Han Xin <hanxin.hx@alibaba-inc.com>
>
> As the name implies, "get_data(size)" will allocate and return a given
> amount of memory. Allocating memory for a large blob object may cause the
> system to run out of memory. Before preparing to replace calling of
> "get_data()" to unpack large blob objects in latter commits, refactor
> "get_data()" to reduce memory footprint for dry_run mode.
>
> Because in dry_run mode, "get_data()" is only used to check the
> integrity of data, and the returned buffer is not used at all, we can
> allocate a smaller buffer and reuse it as zstream output. Therefore,

"reuse" -> "use"

> in dry_run mode, "get_data()" will release the allocated buffer and
> return NULL instead of returning garbage data.

It makes it sound as if we used to return garbage data, but I do not
think that is what happened in reality.  Perhaps rewrite the last
sentence like

	Make the function return NULL in the dry-run mode, as no
	callers use the returned buffer.

or something?

The overall logic sounds quite sensible.

> The "find [...]objects/?? -type f | wc -l" test idiom being used here
> is adapted from the same "find" use added to another test in
> d9545c7f465 (fast-import: implement unpack limit, 2016-04-25).


> +/*
> + * Decompress zstream from stdin and return specific size of data.

"specific size"?  The caller specifies the size of data (because it
knows a-priori how many bytes the zstream should inflate to), so

    Decompress zstream from the standard input into a newly
    allocated buffer of specified size and return the buffer.

or something, perhaps.  In any case, it needs to say that the caller
is responsible for giving the "right" size.

> + * The caller is responsible to free the returned buffer.
> + *
> + * But for dry_run mode, "get_data()" is only used to check the
> + * integrity of data, and the returned buffer is not used at all.
> + * Therefore, in dry_run mode, "get_data()" will release the small
> + * allocated buffer which is reused to hold temporary zstream output
> + * and return NULL instead of returning garbage data.
> + */
>  static void *get_data(unsigned long size)
>  {
>  	git_zstream stream;
> -	void *buf = xmallocz(size);
> +	unsigned long bufsize = dry_run && size > 8192 ? 8192 : size;
> +	void *buf = xmallocz(bufsize);

OK.

>  	memset(&stream, 0, sizeof(stream));
>  
>  	stream.next_out = buf;
> -	stream.avail_out = size;
> +	stream.avail_out = bufsize;
>  	stream.next_in = fill(1);
>  	stream.avail_in = len;
>  	git_inflate_init(&stream);
> @@ -125,8 +136,15 @@ static void *get_data(unsigned long size)

What's hidden in the pre-context is this bit:

		int ret = git_inflate(&stream, 0);
		use(len - stream.avail_in);
		if (stream.total_out == size && ret == Z_STREAM_END)
			break;
		if (ret != Z_OK) {
			error("inflate returned %d", ret);
			FREE_AND_NULL(buf);
			if (!recover)
				exit(1);
			has_errors = 1;
			break;
		}

and it is correct to use "size", not "bufsize", for this check.
Unless we receive exactly the caller-specified "size" bytes from the
inflated zstream with Z_STREAM_END, we want to detect an error and
bail out.

I am not sure if this is not loosening the error checking in the
dry-run case, though.  In the original code, we set the avail_out
to the total expected size so

 (1) if the caller gives too small a size, git_inflate() would stop
     at stream.total_out with ret that is not STREAM_END nor OK,
     bypassing the "break", and we catch the error.

 (2) if the caller gives too large a size, git_inflate() would stop
     at the true size of inflated zstream, with STREAM_END and would
     not hit this "break", and we catch the error.

With the new code, since we keep refreshing avail_out (see below),
git_inflate() does not even learn how many bytes we are _expecting_
to see.  Is the error checking in the loop, with the updated code,
catch the mismatch between expected and actual size (plausibly
caused by a corrupted zstream) the same way as we do in the 
non dry-run code path?

>  		}
>  		stream.next_in = fill(1);
>  		stream.avail_in = len;
> +		if (dry_run) {
> +			/* reuse the buffer in dry_run mode */
> +			stream.next_out = buf;
> +			stream.avail_out = bufsize;
> +		}
>  	}
>  	git_inflate_end(&stream);
> +	if (dry_run)
> +		FREE_AND_NULL(buf);
>  	return buf;
>  }
Han Xin June 9, 2022, 4:10 a.m. UTC | #2
On Tue, Jun 7, 2022 at 2:35 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
> > From: Han Xin <hanxin.hx@alibaba-inc.com>
> >
> > As the name implies, "get_data(size)" will allocate and return a given
> > amount of memory. Allocating memory for a large blob object may cause the
> > system to run out of memory. Before preparing to replace calling of
> > "get_data()" to unpack large blob objects in latter commits, refactor
> > "get_data()" to reduce memory footprint for dry_run mode.
> >
> > Because in dry_run mode, "get_data()" is only used to check the
> > integrity of data, and the returned buffer is not used at all, we can
> > allocate a smaller buffer and reuse it as zstream output. Therefore,
>
> "reuse" -> "use"
>
> > in dry_run mode, "get_data()" will release the allocated buffer and
> > return NULL instead of returning garbage data.
>
> It makes it sound as if we used to return garbage data, but I do not
> think that is what happened in reality.  Perhaps rewrite the last
> sentence like
>
>         Make the function return NULL in the dry-run mode, as no
>         callers use the returned buffer.
>
> or something?
>
> The overall logic sounds quite sensible.
>
> > The "find [...]objects/?? -type f | wc -l" test idiom being used here
> > is adapted from the same "find" use added to another test in
> > d9545c7f465 (fast-import: implement unpack limit, 2016-04-25).
>
>
> > +/*
> > + * Decompress zstream from stdin and return specific size of data.
>
> "specific size"?  The caller specifies the size of data (because it
> knows a-priori how many bytes the zstream should inflate to), so
>
>     Decompress zstream from the standard input into a newly
>     allocated buffer of specified size and return the buffer.
>
> or something, perhaps.  In any case, it needs to say that the caller
> is responsible for giving the "right" size.
>
> > + * The caller is responsible to free the returned buffer.
> > + *
> > + * But for dry_run mode, "get_data()" is only used to check the
> > + * integrity of data, and the returned buffer is not used at all.
> > + * Therefore, in dry_run mode, "get_data()" will release the small
> > + * allocated buffer which is reused to hold temporary zstream output
> > + * and return NULL instead of returning garbage data.
> > + */
> >  static void *get_data(unsigned long size)
> >  {
> >       git_zstream stream;
> > -     void *buf = xmallocz(size);
> > +     unsigned long bufsize = dry_run && size > 8192 ? 8192 : size;
> > +     void *buf = xmallocz(bufsize);
>
> OK.
>
> >       memset(&stream, 0, sizeof(stream));
> >
> >       stream.next_out = buf;
> > -     stream.avail_out = size;
> > +     stream.avail_out = bufsize;
> >       stream.next_in = fill(1);
> >       stream.avail_in = len;
> >       git_inflate_init(&stream);
> > @@ -125,8 +136,15 @@ static void *get_data(unsigned long size)
>
> What's hidden in the pre-context is this bit:
>
>                 int ret = git_inflate(&stream, 0);
>                 use(len - stream.avail_in);
>                 if (stream.total_out == size && ret == Z_STREAM_END)
>                         break;
>                 if (ret != Z_OK) {
>                         error("inflate returned %d", ret);
>                         FREE_AND_NULL(buf);
>                         if (!recover)
>                                 exit(1);
>                         has_errors = 1;
>                         break;
>                 }
>
> and it is correct to use "size", not "bufsize", for this check.
> Unless we receive exactly the caller-specified "size" bytes from the
> inflated zstream with Z_STREAM_END, we want to detect an error and
> bail out.
>
> I am not sure if this is not loosening the error checking in the
> dry-run case, though.  In the original code, we set the avail_out
> to the total expected size so
>
>  (1) if the caller gives too small a size, git_inflate() would stop
>      at stream.total_out with ret that is not STREAM_END nor OK,
>      bypassing the "break", and we catch the error.
>
>  (2) if the caller gives too large a size, git_inflate() would stop
>      at the true size of inflated zstream, with STREAM_END and would
>      not hit this "break", and we catch the error.
>
> With the new code, since we keep refreshing avail_out (see below),
> git_inflate() does not even learn how many bytes we are _expecting_
> to see.  Is the error checking in the loop, with the updated code,
> catch the mismatch between expected and actual size (plausibly
> caused by a corrupted zstream) the same way as we do in the
> non dry-run code path?
>

Unlike the original implementation, if we get a corrupted zstream, we
won't break at Z_BUFFER_ERROR, maybe until we've read all the
input. I think it can still catch the mismatch between expected and
actual size when "fill(1)" gets an EOF, if it's not too late.

Thanks.
-Han Xin

> >               }
> >               stream.next_in = fill(1);
> >               stream.avail_in = len;
> > +             if (dry_run) {
> > +                     /* reuse the buffer in dry_run mode */
> > +                     stream.next_out = buf;
> > +                     stream.avail_out = bufsize;
> > +             }
> >       }
> >       git_inflate_end(&stream);
> > +     if (dry_run)
> > +             FREE_AND_NULL(buf);
> >       return buf;
> >  }
Junio C Hamano June 9, 2022, 6:27 p.m. UTC | #3
Han Xin <chiyutianyi@gmail.com> writes:

>> I am not sure if this is not loosening the error checking in the
>> dry-run case, though.  In the original code, we set the avail_out
>> to the total expected size so
>>
>>  (1) if the caller gives too small a size, git_inflate() would stop
>>      at stream.total_out with ret that is not STREAM_END nor OK,
>>      bypassing the "break", and we catch the error.
>>
>>  (2) if the caller gives too large a size, git_inflate() would stop
>>      at the true size of inflated zstream, with STREAM_END and would
>>      not hit this "break", and we catch the error.
>>
>> With the new code, since we keep refreshing avail_out (see below),
>> git_inflate() does not even learn how many bytes we are _expecting_
>> to see.  Is the error checking in the loop, with the updated code,
>> catch the mismatch between expected and actual size (plausibly
>> caused by a corrupted zstream) the same way as we do in the
>> non dry-run code path?
>>
>
> Unlike the original implementation, if we get a corrupted zstream, we
> won't break at Z_BUFFER_ERROR, maybe until we've read all the
> input. I think it can still catch the mismatch between expected and
> actual size when "fill(1)" gets an EOF, if it's not too late.

That is only one half of the two possible failure cases, i.e. input
is shorter than the expected size.  If the caller specified size is
smaller than what the stream inflates to, I do not see the new code
to be limiting the .avail_out near the end of the iteration, which
would be necessary to catch such an error, even if we are not
interested in using the inflated contents, no?
Han Xin June 10, 2022, 1:50 a.m. UTC | #4
On Fri, Jun 10, 2022 at 2:27 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han Xin <chiyutianyi@gmail.com> writes:
>
> >> I am not sure if this is not loosening the error checking in the
> >> dry-run case, though.  In the original code, we set the avail_out
> >> to the total expected size so
> >>
> >>  (1) if the caller gives too small a size, git_inflate() would stop
> >>      at stream.total_out with ret that is not STREAM_END nor OK,
> >>      bypassing the "break", and we catch the error.
> >>
> >>  (2) if the caller gives too large a size, git_inflate() would stop
> >>      at the true size of inflated zstream, with STREAM_END and would
> >>      not hit this "break", and we catch the error.
> >>
> >> With the new code, since we keep refreshing avail_out (see below),
> >> git_inflate() does not even learn how many bytes we are _expecting_
> >> to see.  Is the error checking in the loop, with the updated code,
> >> catch the mismatch between expected and actual size (plausibly
> >> caused by a corrupted zstream) the same way as we do in the
> >> non dry-run code path?
> >>
> >
> > Unlike the original implementation, if we get a corrupted zstream, we
> > won't break at Z_BUFFER_ERROR, maybe until we've read all the
> > input. I think it can still catch the mismatch between expected and
> > actual size when "fill(1)" gets an EOF, if it's not too late.
>
> That is only one half of the two possible failure cases, i.e. input
> is shorter than the expected size.  If the caller specified size is
> smaller than what the stream inflates to, I do not see the new code
> to be limiting the .avail_out near the end of the iteration, which
> would be necessary to catch such an error, even if we are not
> interested in using the inflated contents, no?
>

Yes, you are right.

Instead of always using a fixed "bufsize" even if there is not enough
expected output remaining, we can get a more accurate one by comparing
"total_out" to "size", so we can catch problems early by getting
Z_BUFFER_ERROR.

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 64abba8dba..5d59144883 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -139,7 +139,8 @@ static void *get_data(unsigned long size)
                if (dry_run) {
                        /* reuse the buffer in dry_run mode */
                        stream.next_out = buf;
-                       stream.avail_out = bufsize;
+                       stream.avail_out = bufsize > size - stream.total_out ?
+                               size - stream.total_out : bufsize;
                }
        }
        git_inflate_end(&stream);

Thanks
-Han Xin
Ævar Arnfjörð Bjarmason June 10, 2022, 2:05 a.m. UTC | #5
On Fri, Jun 10 2022, Han Xin wrote:

> On Fri, Jun 10, 2022 at 2:27 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Han Xin <chiyutianyi@gmail.com> writes:
>>
>> >> I am not sure if this is not loosening the error checking in the
>> >> dry-run case, though.  In the original code, we set the avail_out
>> >> to the total expected size so
>> >>
>> >>  (1) if the caller gives too small a size, git_inflate() would stop
>> >>      at stream.total_out with ret that is not STREAM_END nor OK,
>> >>      bypassing the "break", and we catch the error.
>> >>
>> >>  (2) if the caller gives too large a size, git_inflate() would stop
>> >>      at the true size of inflated zstream, with STREAM_END and would
>> >>      not hit this "break", and we catch the error.
>> >>
>> >> With the new code, since we keep refreshing avail_out (see below),
>> >> git_inflate() does not even learn how many bytes we are _expecting_
>> >> to see.  Is the error checking in the loop, with the updated code,
>> >> catch the mismatch between expected and actual size (plausibly
>> >> caused by a corrupted zstream) the same way as we do in the
>> >> non dry-run code path?
>> >>
>> >
>> > Unlike the original implementation, if we get a corrupted zstream, we
>> > won't break at Z_BUFFER_ERROR, maybe until we've read all the
>> > input. I think it can still catch the mismatch between expected and
>> > actual size when "fill(1)" gets an EOF, if it's not too late.
>>
>> That is only one half of the two possible failure cases, i.e. input
>> is shorter than the expected size.  If the caller specified size is
>> smaller than what the stream inflates to, I do not see the new code
>> to be limiting the .avail_out near the end of the iteration, which
>> would be necessary to catch such an error, even if we are not
>> interested in using the inflated contents, no?
>>
>
> Yes, you are right.
>
> Instead of always using a fixed "bufsize" even if there is not enough
> expected output remaining, we can get a more accurate one by comparing
> "total_out" to "size", so we can catch problems early by getting
> Z_BUFFER_ERROR.
>
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index 64abba8dba..5d59144883 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -139,7 +139,8 @@ static void *get_data(unsigned long size)
>                 if (dry_run) {
>                         /* reuse the buffer in dry_run mode */
>                         stream.next_out = buf;
> -                       stream.avail_out = bufsize;
> +                       stream.avail_out = bufsize > size - stream.total_out ?
> +                               size - stream.total_out : bufsize;
>                 }
>         }
>         git_inflate_end(&stream);
>
> Thanks
> -Han Xin

Han, do you want to pick this up again for a v14? It looks like you're
very on top of it already, and I re-sent your patches because I saw that
your
https://lore.kernel.org/git/cover.1653015534.git.chiyutianyi@gmail.com/
wasn't picked up in the interim & you hadn't been active on-list
otherwise.

But it looks like there's some interest now, and that you have more time
to test & follow-up on this topic than I do at the moment, so if you
wanted to do the work of properly rebasing ot in tho recent fsync
changes that would be great. Thanks.
Han Xin June 10, 2022, 12:04 p.m. UTC | #6
On Fri, Jun 10, 2022 at 10:07 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Fri, Jun 10 2022, Han Xin wrote:
>
> > On Fri, Jun 10, 2022 at 2:27 AM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> Han Xin <chiyutianyi@gmail.com> writes:
> >>
> >> >> I am not sure if this is not loosening the error checking in the
> >> >> dry-run case, though.  In the original code, we set the avail_out
> >> >> to the total expected size so
> >> >>
> >> >>  (1) if the caller gives too small a size, git_inflate() would stop
> >> >>      at stream.total_out with ret that is not STREAM_END nor OK,
> >> >>      bypassing the "break", and we catch the error.
> >> >>
> >> >>  (2) if the caller gives too large a size, git_inflate() would stop
> >> >>      at the true size of inflated zstream, with STREAM_END and would
> >> >>      not hit this "break", and we catch the error.
> >> >>
> >> >> With the new code, since we keep refreshing avail_out (see below),
> >> >> git_inflate() does not even learn how many bytes we are _expecting_
> >> >> to see.  Is the error checking in the loop, with the updated code,
> >> >> catch the mismatch between expected and actual size (plausibly
> >> >> caused by a corrupted zstream) the same way as we do in the
> >> >> non dry-run code path?
> >> >>
> >> >
> >> > Unlike the original implementation, if we get a corrupted zstream, we
> >> > won't break at Z_BUFFER_ERROR, maybe until we've read all the
> >> > input. I think it can still catch the mismatch between expected and
> >> > actual size when "fill(1)" gets an EOF, if it's not too late.
> >>
> >> That is only one half of the two possible failure cases, i.e. input
> >> is shorter than the expected size.  If the caller specified size is
> >> smaller than what the stream inflates to, I do not see the new code
> >> to be limiting the .avail_out near the end of the iteration, which
> >> would be necessary to catch such an error, even if we are not
> >> interested in using the inflated contents, no?
> >>
> >
> > Yes, you are right.
> >
> > Instead of always using a fixed "bufsize" even if there is not enough
> > expected output remaining, we can get a more accurate one by comparing
> > "total_out" to "size", so we can catch problems early by getting
> > Z_BUFFER_ERROR.
> >
> > diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> > index 64abba8dba..5d59144883 100644
> > --- a/builtin/unpack-objects.c
> > +++ b/builtin/unpack-objects.c
> > @@ -139,7 +139,8 @@ static void *get_data(unsigned long size)
> >                 if (dry_run) {
> >                         /* reuse the buffer in dry_run mode */
> >                         stream.next_out = buf;
> > -                       stream.avail_out = bufsize;
> > +                       stream.avail_out = bufsize > size - stream.total_out ?
> > +                               size - stream.total_out : bufsize;
> >                 }
> >         }
> >         git_inflate_end(&stream);
> >
> > Thanks
> > -Han Xin
>
> Han, do you want to pick this up again for a v14? It looks like you're
> very on top of it already, and I re-sent your patches because I saw that
> your
> https://lore.kernel.org/git/cover.1653015534.git.chiyutianyi@gmail.com/
> wasn't picked up in the interim & you hadn't been active on-list
> otherwise.
>
> But it looks like there's some interest now, and that you have more time
> to test & follow-up on this topic than I do at the moment, so if you
> wanted to do the work of properly rebasing ot in tho recent fsync
> changes that would be great. Thanks.

OK, I am glad to do that.

Thank you very much.

-Han Xin
diff mbox series

Patch

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 56d05e2725d..64abba8dbac 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -97,15 +97,26 @@  static void use(int bytes)
 	display_throughput(progress, consumed_bytes);
 }
 
+/*
+ * Decompress zstream from stdin and return specific size of data.
+ * The caller is responsible to free the returned buffer.
+ *
+ * But for dry_run mode, "get_data()" is only used to check the
+ * integrity of data, and the returned buffer is not used at all.
+ * Therefore, in dry_run mode, "get_data()" will release the small
+ * allocated buffer which is reused to hold temporary zstream output
+ * and return NULL instead of returning garbage data.
+ */
 static void *get_data(unsigned long size)
 {
 	git_zstream stream;
-	void *buf = xmallocz(size);
+	unsigned long bufsize = dry_run && size > 8192 ? 8192 : size;
+	void *buf = xmallocz(bufsize);
 
 	memset(&stream, 0, sizeof(stream));
 
 	stream.next_out = buf;
-	stream.avail_out = size;
+	stream.avail_out = bufsize;
 	stream.next_in = fill(1);
 	stream.avail_in = len;
 	git_inflate_init(&stream);
@@ -125,8 +136,15 @@  static void *get_data(unsigned long size)
 		}
 		stream.next_in = fill(1);
 		stream.avail_in = len;
+		if (dry_run) {
+			/* reuse the buffer in dry_run mode */
+			stream.next_out = buf;
+			stream.avail_out = bufsize;
+		}
 	}
 	git_inflate_end(&stream);
+	if (dry_run)
+		FREE_AND_NULL(buf);
 	return buf;
 }
 
@@ -326,10 +344,8 @@  static void unpack_non_delta_entry(enum object_type type, unsigned long size,
 {
 	void *buf = get_data(size);
 
-	if (!dry_run && buf)
+	if (buf)
 		write_object(nr, type, buf, size);
-	else
-		free(buf);
 }
 
 static int resolve_against_held(unsigned nr, const struct object_id *base,
@@ -359,10 +375,8 @@  static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 		oidread(&base_oid, fill(the_hash_algo->rawsz));
 		use(the_hash_algo->rawsz);
 		delta_data = get_data(delta_size);
-		if (dry_run || !delta_data) {
-			free(delta_data);
+		if (!delta_data)
 			return;
-		}
 		if (has_object_file(&base_oid))
 			; /* Ok we have this one */
 		else if (resolve_against_held(nr, &base_oid,
@@ -398,10 +412,8 @@  static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 			die("offset value out of bound for delta base object");
 
 		delta_data = get_data(delta_size);
-		if (dry_run || !delta_data) {
-			free(delta_data);
+		if (!delta_data)
 			return;
-		}
 		lo = 0;
 		hi = nr;
 		while (lo < hi) {
diff --git a/t/t5351-unpack-large-objects.sh b/t/t5351-unpack-large-objects.sh
new file mode 100755
index 00000000000..8d84313221c
--- /dev/null
+++ b/t/t5351-unpack-large-objects.sh
@@ -0,0 +1,41 @@ 
+#!/bin/sh
+#
+# Copyright (c) 2022 Han Xin
+#
+
+test_description='git unpack-objects with large objects'
+
+. ./test-lib.sh
+
+prepare_dest () {
+	test_when_finished "rm -rf dest.git" &&
+	git init --bare dest.git
+}
+
+test_expect_success "create large objects (1.5 MB) and PACK" '
+	test-tool genrandom foo 1500000 >big-blob &&
+	test_commit --append foo big-blob &&
+	test-tool genrandom bar 1500000 >big-blob &&
+	test_commit --append bar big-blob &&
+	PACK=$(echo HEAD | git pack-objects --revs pack)
+'
+
+test_expect_success 'set memory limitation to 1MB' '
+	GIT_ALLOC_LIMIT=1m &&
+	export GIT_ALLOC_LIMIT
+'
+
+test_expect_success 'unpack-objects failed under memory limitation' '
+	prepare_dest &&
+	test_must_fail git -C dest.git unpack-objects <pack-$PACK.pack 2>err &&
+	grep "fatal: attempting to allocate" err
+'
+
+test_expect_success 'unpack-objects works with memory limitation in dry-run mode' '
+	prepare_dest &&
+	git -C dest.git unpack-objects -n <pack-$PACK.pack &&
+	test_stdout_line_count = 0 find dest.git/objects -type f &&
+	test_dir_is_empty dest.git/objects/pack
+'
+
+test_done