diff mbox series

[v13,5/7] object-file.c: add "stream_loose_object()" to handle large object

Message ID patch-v13-5.7-0b07b29836b-20220604T095113Z-avarab@gmail.com (mailing list archive)
State New, archived
Series unpack-objects: support streaming blobs to disk

Commit Message

Ævar Arnfjörð Bjarmason June 4, 2022, 10:10 a.m. UTC
From: Han Xin <hanxin.hx@alibaba-inc.com>

If we want to unpack and write a loose object using "write_loose_object()",
we have to feed it a buffer of the same size as the object, which
will consume a lot of memory and may cause OOM. This can be improved by
feeding data to "stream_loose_object()" in a stream.

Add a new function "stream_loose_object()", which is a streaming version
of "write_loose_object()" with a low memory footprint. We will use this
function to unpack large blob objects in a later commit.

Another difference from "write_loose_object()" is that we have no chance
to run "write_object_file_prepare()" to calculate the oid in advance.
In "write_loose_object()" we know the oid, so we can write the
temporary file in the same directory as the final object; but for an
object with an undetermined oid, we don't know the exact directory for
the object.

Still, we need to save the temporary file we're preparing
somewhere. We'll do that in the top-level ".git/objects/"
directory (or whatever "GIT_OBJECT_DIRECTORY" is set to). Once we've
streamed it we'll know the OID, and will move it to its canonical
path.

"freshen_packed_object()" or "freshen_loose_object()" will be called
inside "stream_loose_object()" after obtaining the "oid".

Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 100 +++++++++++++++++++++++++++++++++++++++++++++++++
 object-store.h |   8 ++++
 2 files changed, 108 insertions(+)

Comments

Junio C Hamano June 6, 2022, 7:44 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> From: Han Xin <hanxin.hx@alibaba-inc.com>
>
> If we want to unpack and write a loose object using "write_loose_object()",
> we have to feed it a buffer of the same size as the object, which
> will consume a lot of memory and may cause OOM. This can be improved by
> feeding data to "stream_loose_object()" in a stream.
>
> Add a new function "stream_loose_object()", which is a streaming version
> of "write_loose_object()" with a low memory footprint. We will use this
> function to unpack large blob objects in a later commit.

Yay.

> Another difference with "write_loose_object()" is that we have no chance
> to run "write_object_file_prepare()" to calculate the oid in advance.

That is somewhat curious.  Is it fundamentally impossible, or is it
just that this patch was written in such a way that conflates the
two and it is cumbersome to split the "we repeat the sequence of
reading and deflating just a bit until we process all" and the "we
compute the hash over the data first and then we write out for
real"?

> In "write_loose_object()", we know the oid and we can write the
> temporary file in the same directory as the final object, but for an
> object with an undetermined oid, we don't know the exact directory for
> the object.
>
> Still, we need to save the temporary file we're preparing
> somewhere. We'll do that in the top-level ".git/objects/"
> directory (or whatever "GIT_OBJECT_DIRECTORY" is set to). Once we've
> streamed it we'll know the OID, and will move it to its canonical
> path.

This may have negative implications on some filesystems where cross
directory links do not work atomically, but it is a small price to pay.

I am very tempted to ask why we do not do this to _all_ loose object
files.  Instead of running the machinery twice over the data (once to
compute the object name, then to compute the contents and write out),
if we can produce loose object files of any size with a single pass,
wouldn't that be an overall win?

Is the fixed overhead, i.e. the cost of setting up the streaming
interface, large enough to make it not worth doing for smaller objects?

> "freshen_packed_object()" or "freshen_loose_object()" will be called
> inside "stream_loose_object()" after obtaining the "oid".

That much we can read from the patch text.  Saying just "we do X"
without explaining "why we do so" in the proposed log message leaves
readers more confused than otherwise.  Why is it worth pointing out
in the proposed log message?  Does the reason we need to do so
involve something tricky?

> +int stream_loose_object(struct input_stream *in_stream, size_t len,
> +			struct object_id *oid)
> +{
> +	int fd, ret, err = 0, flush = 0;
> +	unsigned char compressed[4096];
> +	git_zstream stream;
> +	git_hash_ctx c;
> +	struct strbuf tmp_file = STRBUF_INIT;
> +	struct strbuf filename = STRBUF_INIT;
> +	int dirlen;
> +	char hdr[MAX_HEADER_LEN];
> +	int hdrlen;
> +
> +	/* Since oid is not determined, save tmp file to odb path. */
> +	strbuf_addf(&filename, "%s/", get_object_directory());
> +	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
> +
> +	/*
> +	 * Common steps for write_loose_object and stream_loose_object to
> +	 * start writing loose objects:
> +	 *
> +	 *  - Create tmpfile for the loose object.
> +	 *  - Setup zlib stream for compression.
> +	 *  - Start to feed header to zlib stream.
> +	 */
> +	fd = start_loose_object_common(&tmp_file, filename.buf, 0,
> +				       &stream, compressed, sizeof(compressed),
> +				       &c, hdr, hdrlen);
> +	if (fd < 0) {
> +		err = -1;
> +		goto cleanup;
> +	}
> +
> +	/* Then the data itself.. */
> +	do {
> +		unsigned char *in0 = stream.next_in;
> +
> +		if (!stream.avail_in && !in_stream->is_finished) {
> +			const void *in = in_stream->read(in_stream, &stream.avail_in);
> +			stream.next_in = (void *)in;
> +			in0 = (unsigned char *)in;
> +			/* All data has been read. */
> +			if (in_stream->is_finished)
> +				flush = 1;
> +		}
> +		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
> +						compressed, sizeof(compressed));
> +		/*
> +		 * Unlike write_loose_object(), we do not have the entire
> +		 * buffer. If we get Z_BUF_ERROR due to too few input bytes,
> +		 * then we'll replenish them in the next input_stream->read()
> +		 * call when we loop.
> +		 */
> +	} while (ret == Z_OK || ret == Z_BUF_ERROR);
>
> +	if (stream.total_in != len + hdrlen)
> +		die(_("write stream object %ld != %"PRIuMAX), stream.total_in,
> +		    (uintmax_t)len + hdrlen);

> +	/* Common steps for write_loose_object and stream_loose_object to

Style.

> +	 * end writing loose object:
> +	 *
> +	 *  - End the compression of zlib stream.
> +	 *  - Get the calculated oid.
> +	 */
> +	if (ret != Z_STREAM_END)
> +		die(_("unable to stream deflate new object (%d)"), ret);

Good to check this, after the loop exits above.  I was expecting to
see it immediately after the loop, but here is also OK.

> +	ret = end_loose_object_common(&c, &stream, oid);
> +	if (ret != Z_OK)
> +		die(_("deflateEnd on stream object failed (%d)"), ret);
> +	close_loose_object(fd, tmp_file.buf);
> +
> +	if (freshen_packed_object(oid) || freshen_loose_object(oid)) {
> +		unlink_or_warn(tmp_file.buf);
> +		goto cleanup;

So, we were told to write an object, we wrote to a temporary file,
and we wanted to mark the object to be recent and found that there
indeed is already the object.  We remove the temporary and do not
leave the new copy of the object, and the value of err at this point
is 0 (success) which is what is returned from cleanup: label.

Good.

> +	}
> +
> +	loose_object_path(the_repository, &filename, oid);
> +
> +	/* We finally know the object path, and create the missing dir. */
> +	dirlen = directory_size(filename.buf);
> +	if (dirlen) {
> +		struct strbuf dir = STRBUF_INIT;
> +		strbuf_add(&dir, filename.buf, dirlen);
> +
> +		if (mkdir_in_gitdir(dir.buf) && errno != EEXIST) {
> +			err = error_errno(_("unable to create directory %s"), dir.buf);
> +			strbuf_release(&dir);
> +			goto cleanup;
> +		}
> +		strbuf_release(&dir);
> +	}
> +
> +	err = finalize_object_file(tmp_file.buf, filename.buf);
> +cleanup:
> +	strbuf_release(&tmp_file);
> +	strbuf_release(&filename);
> +	return err;
> +}
> +
Junio C Hamano June 6, 2022, 8:02 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

>> Another difference with "write_loose_object()" is that we have no chance
>> to run "write_object_file_prepare()" to calculate the oid in advance.
>
> That is somewhat curious.  Is it fundamentally impossible, or is it
> just that this patch was written in such a way that conflates the
> two and it is cumbersome to split the "we repeat the sequence of
> reading and deflating just a bit until we process all" and the "we
> compute the hash over the data first and then we write out for
> real"?

OK, the answer lies somewhere in between.

The initial user of this streaming interface reads from an incoming
packfile and feeds the inflated bytestream to the interface, which
means we cannot seek.  That makes it "fundamentally impossible" for
that codepath (i.e. unpack-objects to read from packstream and write
to on-disk loose objects).

But if the input source is seekable (e.g. a file in the working
tree), there is no fundamental reason why the new interface has "no
chance to run prepare to calculate the oid in advance".  It's just
that such a caller is not added by this series, and we chose not to
allow the "prepare and then write" two-step process because we
currently do not need it when this series lands.

> I am very tempted to ask why we do not do this to _all_ loose object
> files.  Instead of running the machinery twice over the data (once to
> compute the object name, then to compute the contents and write out),
> if we can produce loose object files of any size with a single pass,
> wouldn't that be an overall win?

There is a patch later in the series whose proposed log message has
benchmarks to show that it is slower in general.  It still is
curious where the slowness comes from and if it is something we can
tune, though.

Thanks.
Neeraj Singh June 7, 2022, 7:53 p.m. UTC | #3
On 6/4/2022 3:10 AM, Ævar Arnfjörð Bjarmason wrote:
> From: Han Xin <hanxin.hx@alibaba-inc.com>
> 
> If we want to unpack and write a loose object using "write_loose_object()",
> we have to feed it a buffer of the same size as the object, which
> will consume a lot of memory and may cause OOM. This can be improved by
> feeding data to "stream_loose_object()" in a stream.
> 
> Add a new function "stream_loose_object()", which is a streaming version
> of "write_loose_object()" with a low memory footprint. We will use this
> function to unpack large blob objects in a later commit.
> 
> Another difference with "write_loose_object()" is that we have no chance
> to run "write_object_file_prepare()" to calculate the oid in advance.
> In "write_loose_object()", we know the oid and we can write the
> temporary file in the same directory as the final object, but for an
> object with an undetermined oid, we don't know the exact directory for
> the object.
> 
> Still, we need to save the temporary file we're preparing
> somewhere. We'll do that in the top-level ".git/objects/"
> directory (or whatever "GIT_OBJECT_DIRECTORY" is set to). Once we've
> streamed it we'll know the OID, and will move it to its canonical
> path.
> 

I think this new logic doesn't play well with batched fsync.  Even
though we don't know the final OID, we should still call
prepare_loose_object_bulk_checkin() to potentially create the bulk
checkin objdir.


> diff --git a/object-file.c b/object-file.c
> index 7946fa5e088..9fd449693c4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2119,6 +2119,106 @@ static int freshen_packed_object(const struct object_id *oid)
>   	return 1;
>   }
>   
> +int stream_loose_object(struct input_stream *in_stream, size_t len,
> +			struct object_id *oid)
> +{
> +	int fd, ret, err = 0, flush = 0;
> +	unsigned char compressed[4096];
> +	git_zstream stream;
> +	git_hash_ctx c;
> +	struct strbuf tmp_file = STRBUF_INIT;
> +	struct strbuf filename = STRBUF_INIT;
> +	int dirlen;
> +	char hdr[MAX_HEADER_LEN];
> +	int hdrlen;
> +
> +	/* Since oid is not determined, save tmp file to odb path. */
> +	strbuf_addf(&filename, "%s/", get_object_directory());
> +	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
> +
> +	/*
> +	 * Common steps for write_loose_object and stream_loose_object to
> +	 * start writing loose objects:
> +	 *
> +	 *  - Create tmpfile for the loose object.
> +	 *  - Setup zlib stream for compression.
> +	 *  - Start to feed header to zlib stream.
> +	 */
> +	fd = start_loose_object_common(&tmp_file, filename.buf, 0,
> +				       &stream, compressed, sizeof(compressed),
> +				       &c, hdr, hdrlen);
> +	if (fd < 0) {
> +		err = -1;
> +		goto cleanup;
> +	}
> +
> +	/* Then the data itself.. */
> +	do {
> +		unsigned char *in0 = stream.next_in;
> +
> +		if (!stream.avail_in && !in_stream->is_finished) {
> +			const void *in = in_stream->read(in_stream, &stream.avail_in);
> +			stream.next_in = (void *)in;
> +			in0 = (unsigned char *)in;
> +			/* All data has been read. */
> +			if (in_stream->is_finished)
> +				flush = 1;
> +		}
> +		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
> +						compressed, sizeof(compressed));
> +		/*
> +		 * Unlike write_loose_object(), we do not have the entire
> +		 * buffer. If we get Z_BUF_ERROR due to too few input bytes,
> +		 * then we'll replenish them in the next input_stream->read()
> +		 * call when we loop.
> +		 */
> +	} while (ret == Z_OK || ret == Z_BUF_ERROR);
> +
> +	if (stream.total_in != len + hdrlen)
> +		die(_("write stream object %ld != %"PRIuMAX), stream.total_in,
> +		    (uintmax_t)len + hdrlen);
> +
> +	/* Common steps for write_loose_object and stream_loose_object to
> +	 * end writing loose object:
> +	 *
> +	 *  - End the compression of zlib stream.
> +	 *  - Get the calculated oid.
> +	 */
> +	if (ret != Z_STREAM_END)
> +		die(_("unable to stream deflate new object (%d)"), ret);
> +	ret = end_loose_object_common(&c, &stream, oid);
> +	if (ret != Z_OK)
> +		die(_("deflateEnd on stream object failed (%d)"), ret);
> +	close_loose_object(fd, tmp_file.buf);
> +

If batch fsync is enabled, the close_loose_object call will refrain from 
syncing the tmp file.

> +	if (freshen_packed_object(oid) || freshen_loose_object(oid)) {
> +		unlink_or_warn(tmp_file.buf);
> +		goto cleanup;
> +	}
> +
> +	loose_object_path(the_repository, &filename, oid);
> +

We expect this loose_object_path call to return a path in the bulk fsync 
object directory. It might not do so if we don't call 
prepare_loose_object_bulk_checkin.

In the new test case introduced in (7/7), we seem to be getting lucky
in that there are some small objects (commits) earlier in the packfile,
so we go through write_loose_object first.

Thanks for including me on the review!

-Neeraj
Junio C Hamano June 8, 2022, 3:34 p.m. UTC | #4
Neeraj Singh <nksingh85@gmail.com> writes:

>> Still, we need to save the temporary file we're preparing
>> somewhere. We'll do that in the top-level ".git/objects/"
>> directory (or whatever "GIT_OBJECT_DIRECTORY" is set to). Once we've
>> streamed it we'll know the OID, and will move it to its canonical
>> path.
>> 
>
> I think this new logic doesn't play well with batched fsync.  Even
> though we don't know the final OID, we should still call
> prepare_loose_object_bulk_checkin() to potentially create the bulk
> checkin objdir.

Good point.  Careful sanity checks like this are very much
appreciated.

> Thanks for including me on the review!

Yes, indeed.

Thanks, both.
Han Xin June 9, 2022, 6:04 a.m. UTC | #5
On Tue, Jun 7, 2022 at 4:03 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > I am very tempted to ask why we do not do this to _all_ loose object
> > files.  Instead of running the machinery twice over the data (once to
> > compute the object name, then to compute the contents and write out),
> > if we can produce loose object files of any size with a single pass,
> > wouldn't that be an overall win?
>
> There is a patch later in the series whose proposed log message has
> benchmarks to show that it is slower in general.  It still is
> curious where the slowness comes from and if it is something we can
> tune, though.
>

Compared with having the whole object buffer, stream_loose_object()
uses a limited avail_in buffer and never refills it until the whole
avail_in has been deflated.  Due to the limited avail_out, this leaves
small avail_in fragments, and I think it is precisely these fragments
that generate the additional git_deflate() loops.

In "unpack-objects", we use a buffer size of 8192.  Increasing the
buffer can alleviate this problem, but maybe it's not worth it?

> Thanks.
Han Xin June 9, 2022, 6:14 a.m. UTC | #6
On Tue, Jun 7, 2022 at 3:44 AM Junio C Hamano <gitster@pobox.com> wrote:
>
>
> > "freshen_packed_object()" or "freshen_loose_object()" will be called
> > inside "stream_loose_object()" after obtaining the "oid".
>
> That much we can read from the patch text.  Saying just "we do X"
> without explaining "why we do so" in the proposed log message leaves
> readers more confused than otherwise.  Why is it worth pointing out
> in the proposed log message?  Does the reason we need to do so
> involve something tricky?
>

Yes, it really should be made clear why this is done here.

Thanks.
-Han Xin

> > +     ret = end_loose_object_common(&c, &stream, oid);
> > +     if (ret != Z_OK)
> > +             die(_("deflateEnd on stream object failed (%d)"), ret);
> > +     close_loose_object(fd, tmp_file.buf);
> > +
> > +     if (freshen_packed_object(oid) || freshen_loose_object(oid)) {
> > +             unlink_or_warn(tmp_file.buf);
> > +             goto cleanup;
>
> So, we were told to write an object, we wrote to a temporary file,
> and we wanted to mark the object to be recent and found that there
> indeed is already the object.  We remove the temporary and do not
> leave the new copy of the object, and the value of err at this point
> is 0 (success) which is what is returned from cleanup: label.
>
> Good.
>

Patch

diff --git a/object-file.c b/object-file.c
index 7946fa5e088..9fd449693c4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2119,6 +2119,106 @@  static int freshen_packed_object(const struct object_id *oid)
 	return 1;
 }
 
+int stream_loose_object(struct input_stream *in_stream, size_t len,
+			struct object_id *oid)
+{
+	int fd, ret, err = 0, flush = 0;
+	unsigned char compressed[4096];
+	git_zstream stream;
+	git_hash_ctx c;
+	struct strbuf tmp_file = STRBUF_INIT;
+	struct strbuf filename = STRBUF_INIT;
+	int dirlen;
+	char hdr[MAX_HEADER_LEN];
+	int hdrlen;
+
+	/* Since oid is not determined, save tmp file to odb path. */
+	strbuf_addf(&filename, "%s/", get_object_directory());
+	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
+
+	/*
+	 * Common steps for write_loose_object and stream_loose_object to
+	 * start writing loose objects:
+	 *
+	 *  - Create tmpfile for the loose object.
+	 *  - Setup zlib stream for compression.
+	 *  - Start to feed header to zlib stream.
+	 */
+	fd = start_loose_object_common(&tmp_file, filename.buf, 0,
+				       &stream, compressed, sizeof(compressed),
+				       &c, hdr, hdrlen);
+	if (fd < 0) {
+		err = -1;
+		goto cleanup;
+	}
+
+	/* Then the data itself.. */
+	do {
+		unsigned char *in0 = stream.next_in;
+
+		if (!stream.avail_in && !in_stream->is_finished) {
+			const void *in = in_stream->read(in_stream, &stream.avail_in);
+			stream.next_in = (void *)in;
+			in0 = (unsigned char *)in;
+			/* All data has been read. */
+			if (in_stream->is_finished)
+				flush = 1;
+		}
+		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
+						compressed, sizeof(compressed));
+		/*
+		 * Unlike write_loose_object(), we do not have the entire
+		 * buffer. If we get Z_BUF_ERROR due to too few input bytes,
+		 * then we'll replenish them in the next input_stream->read()
+		 * call when we loop.
+		 */
+	} while (ret == Z_OK || ret == Z_BUF_ERROR);
+
+	if (stream.total_in != len + hdrlen)
+		die(_("write stream object %ld != %"PRIuMAX), stream.total_in,
+		    (uintmax_t)len + hdrlen);
+
+	/* Common steps for write_loose_object and stream_loose_object to
+	 * end writing loose object:
+	 *
+	 *  - End the compression of zlib stream.
+	 *  - Get the calculated oid.
+	 */
+	if (ret != Z_STREAM_END)
+		die(_("unable to stream deflate new object (%d)"), ret);
+	ret = end_loose_object_common(&c, &stream, oid);
+	if (ret != Z_OK)
+		die(_("deflateEnd on stream object failed (%d)"), ret);
+	close_loose_object(fd, tmp_file.buf);
+
+	if (freshen_packed_object(oid) || freshen_loose_object(oid)) {
+		unlink_or_warn(tmp_file.buf);
+		goto cleanup;
+	}
+
+	loose_object_path(the_repository, &filename, oid);
+
+	/* We finally know the object path, and create the missing dir. */
+	dirlen = directory_size(filename.buf);
+	if (dirlen) {
+		struct strbuf dir = STRBUF_INIT;
+		strbuf_add(&dir, filename.buf, dirlen);
+
+		if (mkdir_in_gitdir(dir.buf) && errno != EEXIST) {
+			err = error_errno(_("unable to create directory %s"), dir.buf);
+			strbuf_release(&dir);
+			goto cleanup;
+		}
+		strbuf_release(&dir);
+	}
+
+	err = finalize_object_file(tmp_file.buf, filename.buf);
+cleanup:
+	strbuf_release(&tmp_file);
+	strbuf_release(&filename);
+	return err;
+}
+
 int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
 			    unsigned flags)
diff --git a/object-store.h b/object-store.h
index 539ea439046..5222ee54600 100644
--- a/object-store.h
+++ b/object-store.h
@@ -46,6 +46,12 @@  struct object_directory {
 	char *path;
 };
 
+struct input_stream {
+	const void *(*read)(struct input_stream *, unsigned long *len);
+	void *data;
+	int is_finished;
+};
+
 KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
 	struct object_directory *, 1, fspathhash, fspatheq)
 
@@ -269,6 +275,8 @@  static inline int write_object_file(const void *buf, unsigned long len,
 int write_object_file_literally(const void *buf, unsigned long len,
 				const char *type, struct object_id *oid,
 				unsigned flags);
+int stream_loose_object(struct input_stream *in_stream, size_t len,
+			struct object_id *oid);
 
 /*
  * Add an object file to the in-memory object store, without writing it