diff mbox series

fetch-pack: do not take shallow lock unnecessarily

Message ID 20190110193645.34080-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series fetch-pack: do not take shallow lock unnecessarily | expand

Commit Message

Jonathan Tan Jan. 10, 2019, 7:36 p.m. UTC
When fetching using protocol v2, the remote may send a "shallow-info"
section if the client is shallow. If so, Git as the client currently
takes the shallow file lock, even if the "shallow-info" section is
empty.

This is not a problem except that Git does not support taking the
shallow file lock after modifying the shallow file, because
is_repository_shallow() stores information that is never cleared. And
this take-after-modify occurs when Git does a tag-following fetch from a
shallow repository on a transport that does not support tag following
(since in this case, 2 fetches are performed).

To solve this issue, take the shallow file lock (and perform all other
shallow processing) only if the "shallow-info" section is non-empty;
otherwise, behave as if it were empty.

A full solution (probably, ensuring that any action of committing
shallow file locks also includes clearing the information stored by
is_repository_shallow()) would solve the issue without need for this
patch, but this patch is independently useful (as an optimization to
prevent writing a file in an unnecessary case), hence why I wrote it. I
have included a NEEDSWORK outlining the full solution.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Sorry for sending out multiple solutions to this issue, but I think I
have found the simplest way to do this. Even if we end up needing one of
the more complicated ways, I think that this is independently useful (as
stated above), so I am sending out this patch for your consideration.

For reference, the other solutions I sent out are:

(1) https://public-inbox.org/git/20181218010811.143608-1-jonathantanmy@google.com/
This is the full solution described in the commit message above. Locking
and committing the shallow file is now abstracted behind an interface
that ensures that anything done by is_repository_shallow() is cleared
when the shallow file is committed.

(2) https://public-inbox.org/git/20181220195349.92214-1-jonathantanmy@google.com/
A partial solution - if the client did not make any depth requests (as
is the case above - the client is shallow but made a normal fetch
request), any "shallow" lines are first filtered before determining if
the lock needs to be taken. This solves the issue in practice because
there are no "shallow"s, so no lock is taken (and the filter is a
no-op).

The two prior versions do not have the annotated tag in the test. I
noticed that the second fetch occurs regardless of whether the annotated
tag is present, but I included it anyway - the second fetch is possibly
a bug if the annotated tag is not present, but that issue is outside the
scope of this patch.
---
 fetch-pack.c           | 11 +++++++++--
 shallow.c              |  7 +++++++
 t/t5702-protocol-v2.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

Comments

Stefan Beller Jan. 10, 2019, 11:17 p.m. UTC | #1
On Thu, Jan 10, 2019 at 11:36 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When fetching using protocol v2, the remote may send a "shallow-info"
> section if the client is shallow. If so, Git as the client currently
> takes the shallow file lock, even if the "shallow-info" section is
> empty.
>
> This is not a problem except that Git does not support taking the
> shallow file lock after modifying the shallow file, because
> is_repository_shallow() stores information that is never cleared. And
> this take-after-modify occurs when Git does a tag-following fetch from a
> shallow repository on a transport that does not support tag following
> (since in this case, 2 fetches are performed).
>
> To solve this issue, take the shallow file lock (and perform all other
> shallow processing) only if the "shallow-info" section is non-empty;
> otherwise, behave as if it were empty.

In other parts of the code we often have an early exit instead of
setting a variable and reacting to that in the end, i.e. what
do you think about:

static void receive_shallow_info(struct fetch_pack_args *args,
    struct packet_reader *reader)
{
     process_section_header(reader, "shallow-info", 0);
+    if (reader->status == PACKET_READ_FLUSH ||
+        reader->status == PACKET_READ_DELIM)
+            /* useful comment why empty sections appear */
+            return;
    while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
    ...

instead? This would allow us to keep the rest of the function
relatively simple as well as we'd have a dedicated space where
we can explain why empty sections need to be treated specially.

> A full solution (probably, ensuring that any action of committing
> shallow file locks also includes clearing the information stored by
> is_repository_shallow()) would solve the issue without need for this
> patch, but this patch is independently useful (as an optimization to
> prevent writing a file in an unnecessary case), hence why I wrote it. I
> have included a NEEDSWORK outlining the full solution.

I like this patch..

> +test_expect_success 'ensure that multiple fetches in same process from a shallow repo works' '
> +       rm -rf server client trace &&
> +
> +       test_create_repo server &&
> +       test_commit -C server one &&
> +       test_commit -C server two &&
> +       test_commit -C server three &&
> +       git clone --shallow-exclude two "file://$(pwd)/server" client &&
> +
> +       git -C server tag -a -m "an annotated tag" twotag two &&
> +
> +       # Triggers tag following (thus, 2 fetches in one process)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client -c protocol.version=2 \
> +               fetch --shallow-exclude one origin &&
> +       # Ensure that protocol v2 is used
> +       grep "fetch< version 2" trace
> +'

Would we also need to ensure tags 'one' and 'three',
but not 'two' are present?
(What error condition do we see without this patch?)
Jonathan Tan Jan. 14, 2019, 7:08 p.m. UTC | #2
> In other parts of the code we often have an early exit instead of
> setting a variable and reacting to that in the end, i.e. what
> do you think about:
> 
> static void receive_shallow_info(struct fetch_pack_args *args,
>     struct packet_reader *reader)
> {
>      process_section_header(reader, "shallow-info", 0);
> +    if (reader->status == PACKET_READ_FLUSH ||
> +        reader->status == PACKET_READ_DELIM)
> +            /* useful comment why empty sections appear */
> +            return;
>     while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
>     ...
> 
> instead? This would allow us to keep the rest of the function
> relatively simple as well as we'd have a dedicated space where
> we can explain why empty sections need to be treated specially.

Good idea. I'll do something like this in the next version, which will
be combined with another patch of mine into a series [1].

[1] https://public-inbox.org/git/xmqqwoncyvh5.fsf@gitster-ct.c.googlers.com/

> I like this patch..

Thanks.

> > +test_expect_success 'ensure that multiple fetches in same process from a shallow repo works' '
> > +       rm -rf server client trace &&
> > +
> > +       test_create_repo server &&
> > +       test_commit -C server one &&
> > +       test_commit -C server two &&
> > +       test_commit -C server three &&
> > +       git clone --shallow-exclude two "file://$(pwd)/server" client &&
> > +
> > +       git -C server tag -a -m "an annotated tag" twotag two &&
> > +
> > +       # Triggers tag following (thus, 2 fetches in one process)
> > +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client -c protocol.version=2 \
> > +               fetch --shallow-exclude one origin &&
> > +       # Ensure that protocol v2 is used
> > +       grep "fetch< version 2" trace
> > +'
> 
> Would we also need to ensure tags 'one' and 'three',
> but not 'two' are present?
> (What error condition do we see without this patch?)

Well, both "two" and "three" should be present, but not "one". The error
condition is that, previously, this would fail with "fatal: shallow file
has changed since we read it". I'll add a note about this in the commit
message.
diff mbox series

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index dd6700bda9..5885623ece 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1232,6 +1232,8 @@  static int process_acks(struct fetch_negotiator *negotiator,
 static void receive_shallow_info(struct fetch_pack_args *args,
 				 struct packet_reader *reader)
 {
+	int line_received = 0;
+
 	process_section_header(reader, "shallow-info", 0);
 	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
 		const char *arg;
@@ -1241,6 +1243,7 @@  static void receive_shallow_info(struct fetch_pack_args *args,
 			if (get_oid_hex(arg, &oid))
 				die(_("invalid shallow line: %s"), reader->line);
 			register_shallow(the_repository, &oid);
+			line_received = 1;
 			continue;
 		}
 		if (skip_prefix(reader->line, "unshallow ", &arg)) {
@@ -1253,6 +1256,7 @@  static void receive_shallow_info(struct fetch_pack_args *args,
 				die(_("error in object: %s"), reader->line);
 			if (unregister_shallow(&oid))
 				die(_("no shallow found: %s"), reader->line);
+			line_received = 1;
 			continue;
 		}
 		die(_("expected shallow/unshallow, got %s"), reader->line);
@@ -1262,8 +1266,11 @@  static void receive_shallow_info(struct fetch_pack_args *args,
 	    reader->status != PACKET_READ_DELIM)
 		die(_("error processing shallow info: %d"), reader->status);
 
-	setup_alternate_shallow(&shallow_lock, &alternate_shallow_file, NULL);
-	args->deepen = 1;
+	if (line_received) {
+		setup_alternate_shallow(&shallow_lock, &alternate_shallow_file,
+					NULL);
+		args->deepen = 1;
+	}
 }
 
 static void receive_wanted_refs(struct packet_reader *reader,
diff --git a/shallow.c b/shallow.c
index 02fdbfc554..ce45297940 100644
--- a/shallow.c
+++ b/shallow.c
@@ -43,6 +43,13 @@  int register_shallow(struct repository *r, const struct object_id *oid)
 
 int is_repository_shallow(struct repository *r)
 {
+	/*
+	 * NEEDSWORK: This function updates
+	 * r->parsed_objects->{is_shallow,shallow_stat} as a side effect but
+	 * there is no corresponding function to clear them when the shallow
+	 * file is updated.
+	 */
+
 	FILE *fp;
 	char buf[1024];
 	const char *path = r->parsed_objects->alternate_shallow_file;
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 0f2b09ebb8..fd164d414d 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -471,6 +471,24 @@  test_expect_success 'upload-pack respects client shallows' '
 	grep "fetch< version 2" trace
 '
 
+test_expect_success 'ensure that multiple fetches in same process from a shallow repo works' '
+	rm -rf server client trace &&
+
+	test_create_repo server &&
+	test_commit -C server one &&
+	test_commit -C server two &&
+	test_commit -C server three &&
+	git clone --shallow-exclude two "file://$(pwd)/server" client &&
+
+	git -C server tag -a -m "an annotated tag" twotag two &&
+
+	# Triggers tag following (thus, 2 fetches in one process)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client -c protocol.version=2 \
+		fetch --shallow-exclude one origin &&
+	# Ensure that protocol v2 is used
+	grep "fetch< version 2" trace
+'
+
 # Test protocol v2 with 'http://' transport
 #
 . "$TEST_DIRECTORY"/lib-httpd.sh