
[v2,05/10] cat-file: use delta_base_cache entries directly

Message ID 20240823224630.1180772-6-e@80x24.org
State New
Series cat-file speedups

Commit Message

Eric Wong Aug. 23, 2024, 10:46 p.m. UTC
For objects already in the delta_base_cache, we can safely use
one entry at-a-time directly to avoid the malloc+memcpy+free
overhead.  For a 1MB delta base object, this eliminates the
speed penalty of duplicating large objects into memory and
speeds up those 1MB delta base cached content retrievals by
roughly 30%.

While only 2-7% of objects are delta bases in repos I've looked
at, this avoids up to 96MB of duplicated memory in the worst
case with the default git config.

The new delta_base_cache_lock is a simple single-threaded
assertion to ensure cat-file (and similar) is the exclusive user
of the delta_base_cache.  In other words, we cannot have diff
or similar commands using two or more entries directly from the
delta base cache.  The new lock has nothing to do with parallel
access via multiple threads at the moment.

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/cat-file.c | 16 +++++++++++++++-
 object-file.c      |  5 +++++
 object-store-ll.h  |  8 ++++++++
 packfile.c         | 33 ++++++++++++++++++++++++++++++---
 packfile.h         |  4 ++++
 5 files changed, 62 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Aug. 26, 2024, 9:31 p.m. UTC | #1
Eric Wong <e@80x24.org> writes:

> For objects already in the delta_base_cache, we can safely use
> one entry at-a-time directly to avoid the malloc+memcpy+free
> overhead.  For a 1MB delta base object, this eliminates the
> speed penalty of duplicating large objects into memory and
> speeds up those 1MB delta base cached content retrievals by
> roughly 30%.
>
> While only 2-7% of objects are delta bases in repos I've looked
> at, this avoids up to 96MB of duplicated memory in the worst
> case with the default git config.
>
> The new delta_base_cache_lock is a simple single-threaded
> assertion to ensure cat-file (and similar) is the exclusive user
> of the delta_base_cache.  In other words, we cannot have diff
> or similar commands using two or more entries directly from the
> delta base cache.  The new lock has nothing to do with parallel
> access via multiple threads at the moment.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  builtin/cat-file.c | 16 +++++++++++++++-
>  object-file.c      |  5 +++++
>  object-store-ll.h  |  8 ++++++++
>  packfile.c         | 33 ++++++++++++++++++++++++++++++---
>  packfile.h         |  4 ++++
>  5 files changed, 62 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index bc4bb89610..8debcdca3e 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -386,7 +386,20 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
>  
>  	if (data->content) {
>  		batch_write(opt, data->content, data->size);
> -		FREE_AND_NULL(data->content);
> +		switch (data->info.whence) {
> +		case OI_CACHED:
> +			/*
> +			 * only blame uses OI_CACHED atm, so it's unlikely
> +			 * we'll ever hit this path
> +			 */
> +			BUG("TODO OI_CACHED support not done");
> +		case OI_LOOSE:
> +		case OI_PACKED:
> +			FREE_AND_NULL(data->content);
> +			break;
> +		case OI_DBCACHED:
> +			unlock_delta_base_cache();
> +		}
>  	} else if (data->type == OBJ_BLOB) {
>  		if (opt->buffer_output)
>  			fflush(stdout);
> @@ -815,6 +828,7 @@ static int batch_objects(struct batch_options *opt)
>  			data.info.sizep = &data.size;
>  			data.info.contentp = &data.content;
>  			data.info.content_limit = big_file_threshold;
> +			data.info.direct_cache = 1;
>  		}
>  	}
>  
> diff --git a/object-file.c b/object-file.c
> index 1cc29c3c58..19100e823d 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1586,6 +1586,11 @@ static int do_oid_object_info_extended(struct repository *r,
>  			oidclr(oi->delta_base_oid, the_repository->hash_algo);
>  		if (oi->type_name)
>  			strbuf_addstr(oi->type_name, type_name(co->type));
> +		/*
> +		 * Currently `blame' is the only command which creates
> +		 * OI_CACHED, and direct_cache is only used by `cat-file'.
> +		 */
> +		assert(!oi->direct_cache);

If "git cat-file" ever gets enhanced to also use
pretend_object_file(), this assumption gets violated.  

What happens then?  Would we return inconsistent answer to the
caller that may result in garbage output, breaking the downstream
process somehow?  If so, I'd prefer to see this guarded more
explicitly with "if (...)  BUG()" than assert() here.
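
For illustration, the more explicit guard could be as small as this
sketch (the message string is invented):

		if (oi->direct_cache)
			BUG("direct_cache is not supported for pretend_object_file() objects");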

>  		if (oi->contentp)
>  			*oi->contentp = xmemdupz(co->buf, co->size);

Regardless of the answer to the question above about assert(),
shouldn't step [02/10] of this series also have been made to pay
attention to the oi->content_limit mechanism?  It looks inconsistent.
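
For reference, one way the quoted OI_CACHED path could honor
content_limit, mirroring what packed_object_info() does later in this
patch (a sketch only, reusing the co->buf/co->size fields the
surrounding code already uses):

		if (oi->contentp) {
			if (!oi->content_limit || co->size <= oi->content_limit)
				*oi->contentp = xmemdupz(co->buf, co->size);
			else
				*oi->contentp = NULL; /* caller must stream */
		}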

> diff --git a/object-store-ll.h b/object-store-ll.h
> index b71a15f590..669bb93784 100644
> --- a/object-store-ll.h
> +++ b/object-store-ll.h
> @@ -298,6 +298,14 @@ struct object_info {
>  		OI_PACKED,
>  		OI_DBCACHED
>  	} whence;
> +
> +	/*
> +	 * Set if caller is able to use OI_DBCACHED entries without copying.
> +	 * This only applies to OI_DBCACHED entries at the moment,
> +	 * not OI_CACHED or any other type of entry.
> +	 */
> +	unsigned direct_cache:1;

This is a "response" from the API to the caller, but "Set if ..."
sounds as if you are telling the caller what to do.  Would the
caller set this bit when making a request to say "I am cat-file and
I use entry before making any other requests to the object layer so
I am sure object layer will not evict the entry I am using"?  No,
that is not a "response".

You seem to set this bit in batch_objects(), so it does sound like
that the bit is expected to be set by the caller to tell the API
something.  If that is the case, then (1) move it to "request" part
of the object_info structure, and (2) define what it means to be
"able to use ... without copying".  Mechanically, it may mean
"contentp directly points into the delta base cache", but what
implication does it have to callers?  If the caller obtains such a
pointer in .contentp, what is required for its access pattern to
make accessing the pointer safe?  The caller cannot use the pointed
memory after it accesses another object?  What is the definition of
"access" in the context of this discussion?  Does "checking for
existence of an object" count as an access?
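
To make the question concrete, a "request"-style definition of the
bit might read something like this sketch (the comment text is
illustrative, not part of the patch):

	/*
	 * Request: if set and the object is served from the delta base
	 * cache (whence == OI_DBCACHED), *contentp becomes a borrowed
	 * pointer into that cache instead of an xmemdupz() copy.  The
	 * caller must not free it, and must not access any other object
	 * until it calls unlock_delta_base_cache(), since further object
	 * access may add to or evict from the cache.
	 */
	unsigned direct_cache:1;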

> +static void lock_delta_base_cache(void)
> +{
> +	delta_base_cache_lock++;
> +	assert(delta_base_cache_lock == 1);
> +}
> +
> +void unlock_delta_base_cache(void)
> +{
> +	delta_base_cache_lock--;
> +	assert(delta_base_cache_lock == 0);
> +}

I view assert() as validating a precondition for the control flow to
pass that point in the code path, so the above looks somewhat
surprising.

Shouldn't we instead check the current value first, and then, only
if the caller is _allowed_ to lock/unlock from that state (i.e. the
lock is not held/the lock is held), increment/decrement the variable?
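
Something along these lines, i.e. a sketch of the check-first shape,
using BUG() so a violation is caught even in builds without
assertions:

static void lock_delta_base_cache(void)
{
	if (delta_base_cache_lock)
		BUG("delta_base_cache is already borrowed");
	delta_base_cache_lock = 1;
}

void unlock_delta_base_cache(void)
{
	if (!delta_base_cache_lock)
		BUG("delta_base_cache is not borrowed");
	delta_base_cache_lock = 0;
}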

> +			/* ignore content_limit if avoiding copy from cache */
> +			if (oi->direct_cache) {
> +				lock_delta_base_cache();
> +				*oi->contentp = ent->data;
> +			} else if (!oi->content_limit ||
> +					ent->size <= oi->content_limit) {
>  				*oi->contentp = xmemdupz(ent->data, ent->size);

This somehow looks like it makes the memory ownership rules confusing.

How does the caller know when it can free(*oi->contentp) after a call
returns?  If it sets direct_cache and got a non-NULL *oi->contentp
back, is it always pointing at a borrowed piece of memory, so that it
does not have to worry about freeing, but if it didn't set
direct_cache, it can be holding an allocated piece of memory and is
responsible for freeing it?
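
Spelled out as code, the rule the patch seems to imply would look
roughly like this hypothetical helper (release_content() is an
invented name, not part of the series):

static void release_content(struct object_info *oi, void **content)
{
	if (oi->direct_cache && oi->whence == OI_DBCACHED)
		unlock_delta_base_cache();	/* borrowed from the cache */
	else
		FREE_AND_NULL(*content);	/* our own copy */
}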

The following is a tangent that does not need to be addressed in
this series, but since I've already spent time thinking about it,
let me record it as #leftoverbits here.

Whatever the answers to the earlier questions on the comment on the
direct_cache member are, I wonder if the access pattern of a caller
that satisfies such a requirement allows a much simpler solution.
Would it give us a similar performance boost to (morally) revert
85fe35ab (cache_or_unpack_entry: drop keep_cache parameter,
2016-08-22) and resurrect the "pass ownership of a delta base cache
entry to the caller" feature?  We give the content to the caller of
object_info, and evict the cached delta base.  Ideally, we
shouldn't have to ask the caller to do anything special, other than
just setting contentp to non-NULL (and optionally setting
content_limit), and the choice between copying and passing ownership
should be made in the object access layer, which knows how big the
object is (e.g., small things may be easier to recreate), how deep
the delta chain of the entry in the delta base cache is (e.g., a delta
base object that is simply deflated can be recreated cheaply, while
one based on top of a 100-delta-deep chain is very expensive to
recreate), etc.
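
A rough sketch of that direction inside packed_object_info(): hand
the cached buffer over and drop the entry, with the policy left to
the object layer (want_ownership_transfer() is a made-up placeholder
for that policy):

		if (oi->contentp) {
			if (want_ownership_transfer(ent)) {
				/* caller now owns ent->data; the entry leaves the cache */
				*oi->contentp = ent->data;
				detach_delta_base_cache_entry(ent);
			} else if (!oi->content_limit ||
					ent->size <= oi->content_limit) {
				*oi->contentp = xmemdupz(ent->data, ent->size);
			} else {
				*oi->contentp = NULL; /* caller must stream */
			}
		}
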
Junio C Hamano Aug. 26, 2024, 11:05 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

>> +	/*
>> +	 * Set if caller is able to use OI_DBCACHED entries without copying.
>> +	 * This only applies to OI_DBCACHED entries at the moment,
>> +	 * not OI_CACHED or any other type of entry.
>> +	 */
>> +	unsigned direct_cache:1;
> ...
> You seem to set this bit in batch_objects(), so it does sound like
> the bit is expected to be set by the caller to tell the API
> something.  If that is the case, then (1) move it to the "request"
> part of the object_info structure, and (2) define what it means to be
> "able to use ... without copying".  Mechanically, it may mean
> "contentp directly points into the delta base cache", but what
> implication does it have for callers?  If the caller obtains such a
> pointer in .contentp, what is required of its access pattern to
> make accessing the pointer safe?  The caller cannot use the pointed-to
> memory after it accesses another object?  What is the definition of
> "access" in the context of this discussion?  Does "checking for
> existence of an object" count as an access?

Another thing that makes me worry about this approach (as opposed to
an alternative that is much simpler to reason about, like "transfer
ownership to the caller") is that it is very hard to guarantee that
another object access that is not under the caller's control will
never happen, and it is even harder to make sure that the code will
keep giving such a guarantee.

In other words, the arrangement smells a bit too brittle.

For example, would it be possible for lazy fetching of (unrelated)
objects to trigger after the caller asks about an object and borrows a
pointer into the delta base cache, and would that scramble the entries
of the delta base cache, silently invalidating the borrowed piece of
memory?  Would it be possible for the textconv and other filtering
mechanisms driven by the attribute system to trigger access to a
configured attribute blob, which has to be lazily fetched from
somewhere else?
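
As a concrete (hypothetical) example of a caller that would run into
this, under the API as posted in this patch:

#include "git-compat-util.h"
#include "object-store-ll.h"
#include "packfile.h"

static void borrowed_pointer_hazard(struct repository *r,
				    const struct object_id *a,
				    const struct object_id *b)
{
	struct object_info oi_a = OBJECT_INFO_INIT;
	struct object_info oi_b = OBJECT_INFO_INIT;
	void *content_a = NULL, *content_b = NULL;

	oi_a.contentp = &content_a;
	oi_a.direct_cache = 1;
	if (oid_object_info_extended(r, a, &oi_a, 0) < 0)
		return;

	/*
	 * If retrieving "b" has to unpack a delta (or lazily fetch the
	 * object), add_delta_base_cache() may run; with the patch as
	 * posted that trips assert(!delta_base_cache_lock) where
	 * assertions are enabled, and without assertions it could evict
	 * the entry that content_a borrows, leaving it dangling.
	 */
	oi_b.contentp = &content_b;
	if (!oid_object_info_extended(r, b, &oi_b, 0))
		free(content_b);

	if (oi_a.direct_cache && oi_a.whence == OI_DBCACHED)
		unlock_delta_base_cache();	/* content_a was borrowed */
	else
		free(content_a);
}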

Thanks.

Patch

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index bc4bb89610..8debcdca3e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -386,7 +386,20 @@  static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 
 	if (data->content) {
 		batch_write(opt, data->content, data->size);
-		FREE_AND_NULL(data->content);
+		switch (data->info.whence) {
+		case OI_CACHED:
+			/*
+			 * only blame uses OI_CACHED atm, so it's unlikely
+			 * we'll ever hit this path
+			 */
+			BUG("TODO OI_CACHED support not done");
+		case OI_LOOSE:
+		case OI_PACKED:
+			FREE_AND_NULL(data->content);
+			break;
+		case OI_DBCACHED:
+			unlock_delta_base_cache();
+		}
 	} else if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
@@ -815,6 +828,7 @@  static int batch_objects(struct batch_options *opt)
 			data.info.sizep = &data.size;
 			data.info.contentp = &data.content;
 			data.info.content_limit = big_file_threshold;
+			data.info.direct_cache = 1;
 		}
 	}
 
diff --git a/object-file.c b/object-file.c
index 1cc29c3c58..19100e823d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1586,6 +1586,11 @@  static int do_oid_object_info_extended(struct repository *r,
 			oidclr(oi->delta_base_oid, the_repository->hash_algo);
 		if (oi->type_name)
 			strbuf_addstr(oi->type_name, type_name(co->type));
+		/*
+		 * Currently `blame' is the only command which creates
+		 * OI_CACHED, and direct_cache is only used by `cat-file'.
+		 */
+		assert(!oi->direct_cache);
 		if (oi->contentp)
 			*oi->contentp = xmemdupz(co->buf, co->size);
 		oi->whence = OI_CACHED;
diff --git a/object-store-ll.h b/object-store-ll.h
index b71a15f590..669bb93784 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -298,6 +298,14 @@  struct object_info {
 		OI_PACKED,
 		OI_DBCACHED
 	} whence;
+
+	/*
+	 * Set if caller is able to use OI_DBCACHED entries without copying.
+	 * This only applies to OI_DBCACHED entries at the moment,
+	 * not OI_CACHED or any other type of entry.
+	 */
+	unsigned direct_cache:1;
+
 	union {
 		/*
 		 * struct {
diff --git a/packfile.c b/packfile.c
index 0a90a5ed67..40c6c2e387 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1362,6 +1362,14 @@  static enum object_type packed_to_object_type(struct repository *r,
 static struct hashmap delta_base_cache;
 static size_t delta_base_cached;
 
+/*
+ * Ensures only a single object is used at-a-time via oi->direct_cache.
+ * Using two objects directly at once (e.g. diff) would cause corruption
+ * since populating the cache may invalidate existing entries.
+ * This lock has nothing to do with parallelism at the moment.
+ */
+static int delta_base_cache_lock;
+
 static LIST_HEAD(delta_base_cache_lru);
 
 struct delta_base_cache_key {
@@ -1444,6 +1452,18 @@  static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent)
 	free(ent);
 }
 
+static void lock_delta_base_cache(void)
+{
+	delta_base_cache_lock++;
+	assert(delta_base_cache_lock == 1);
+}
+
+void unlock_delta_base_cache(void)
+{
+	delta_base_cache_lock--;
+	assert(delta_base_cache_lock == 0);
+}
+
 static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
 {
 	free(ent->data);
@@ -1453,6 +1473,7 @@  static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
 void clear_delta_base_cache(void)
 {
 	struct list_head *lru, *tmp;
+	assert(!delta_base_cache_lock);
 	list_for_each_safe(lru, tmp, &delta_base_cache_lru) {
 		struct delta_base_cache_entry *entry =
 			list_entry(lru, struct delta_base_cache_entry, lru);
@@ -1466,6 +1487,7 @@  static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	struct delta_base_cache_entry *ent;
 	struct list_head *lru, *tmp;
 
+	assert(!delta_base_cache_lock);
 	/*
 	 * Check required to avoid redundant entries when more than one thread
 	 * is unpacking the same object, in unpack_entry() (since its phases I
@@ -1520,11 +1542,16 @@  int packed_object_info(struct repository *r, struct packed_git *p,
 		if (oi->sizep)
 			*oi->sizep = ent->size;
 		if (oi->contentp) {
-			if (!oi->content_limit ||
-					ent->size <= oi->content_limit)
+			/* ignore content_limit if avoiding copy from cache */
+			if (oi->direct_cache) {
+				lock_delta_base_cache();
+				*oi->contentp = ent->data;
+			} else if (!oi->content_limit ||
+					ent->size <= oi->content_limit) {
 				*oi->contentp = xmemdupz(ent->data, ent->size);
-			else
+			} else {
 				*oi->contentp = NULL; /* caller must stream */
+			}
 		}
 	} else if (oi->contentp && !oi->content_limit) {
 		*oi->contentp = unpack_entry(r, p, obj_offset, &type,
diff --git a/packfile.h b/packfile.h
index eb18ec15db..94941bbe80 100644
--- a/packfile.h
+++ b/packfile.h
@@ -210,4 +210,8 @@  int is_promisor_object(const struct object_id *oid);
 int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 	     size_t idx_size, struct packed_git *p);
 
+/*
+ * release lock acquired via oi->direct_cache
+ */
+void unlock_delta_base_cache(void);
 #endif