mbox series

[v4,0/6] cat-file: add remote-object-info to batch-command

Message ID 20241024205359.16376-1-eric.peijian@gmail.com (mailing list archive)
Headers show
Series cat-file: add remote-object-info to batch-command | expand

Message

Peijian Ju Oct. 24, 2024, 8:53 p.m. UTC
This is a continuation of Calvin Wan's (calvinwan@google.com)
patch series [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command at [1].

Sometimes it is useful to get information about an object without having to download
it completely. The server logic for retrieving size has already been implemented and merged in
"a2ba162cda (object-info: support for retrieving object info, 2021-04-20)"[2].
This patch series implement the client option for it.

This patch series add the `remote-object-info` command to `cat-file --batch-command`.
This command allows the client to make an object-info command request to a server
that supports protocol v2. If the server is v2, but does not have
object-info capability, the entire object is fetched and the
relevant object info is returned.

A few questions open for discussions please:

1. In the current implementation, if a user puts `remote-object-info` in protocol v1,
   `cat-file --batch-command` will die. Which way do we prefer? "error and exit (i.e. die)"
   or "warn and wait for new command".

2. Right now, only the size is supported. If the batch command format
   contains objectsize:disk or deltabase, it will die. The question
   is about objecttype. In the current implementation, it will die too.
   But dying on objecttype breaks the default format. We have changed the
   default format to %(objectname) %(objectsize) when remote-object-info is used.
   Any suggestions on this approach?

[1] https://lore.kernel.org/git/20220728230210.2952731-1-calvinwan@google.com/#t
[2] https://git.kernel.org/pub/scm/git/git.git/commit/?id=a2ba162cda2acc171c3e36acbbc854792b093cb7

V1 of the patch series can be found here:
https://lore.kernel.org/git/20240628190503.67389-1-eric.peijian@gmail.com/

v2 of the patch series can be found here:
https://lore.kernel.org/git/20240720034337.57125-1-eric.peijian@gmail.com/

Changes since V3
================

- Fix typos and formatting errors

- Add warning in the git-cat-file doc about the default format

- Add a new test lib file, lib-cat-file.sh. And put the shared code of
t1017-cat-file-remote-object-info.sh and  t1006-cat-file.sh in it.

Thank you.

Eric Ju

Calvin Wan (5):
  fetch-pack: refactor packet writing
  fetch-pack: move fetch initialization
  serve: advertise object-info feature
  transport: add client support for object-info
  cat-file: add remote-object-info to batch-command

Eric Ju (1):
  cat-file: add declaration of variable i inside its for loop

 Documentation/git-cat-file.txt         |  24 +-
 builtin/cat-file.c                     | 119 +++-
 fetch-pack.c                           |  49 +-
 fetch-pack.h                           |  10 +
 object-file.c                          |  11 +
 object-store-ll.h                      |   3 +
 serve.c                                |   4 +-
 t/lib-cat-file.sh                      |  16 +
 t/t1006-cat-file.sh                    |  13 +-
 t/t1017-cat-file-remote-object-info.sh | 739 +++++++++++++++++++++++++
 transport-helper.c                     |  11 +-
 transport.c                            | 115 +++-
 transport.h                            |  11 +
 13 files changed, 1081 insertions(+), 44 deletions(-)
 create mode 100644 t/lib-cat-file.sh
 create mode 100755 t/t1017-cat-file-remote-object-info.sh

Range-diff against v3:
1:  b570dee186 = 1:  41898fe23e fetch-pack: refactor packet writing
2:  e8777e8776 = 2:  b3a1bee551 fetch-pack: move fetch initialization
3:  d00d19cf2c = 3:  d363b0f768 serve: advertise object-info feature
4:  3e1773910c ! 4:  3118061b21 transport: add client support for object-info
    @@ transport-helper.c: static int fetch_refs(struct transport *transport,
      	 */
      	if (data->transport_options.acked_commits) {
      		warning(_("--negotiate-only requires protocol v2"));
    - 		return -1;
    +@@ transport-helper.c: static int fetch_refs(struct transport *transport,
    + 		free_refs(dummy);
      	}
      
     +	/* fail the command explicitly to avoid further commands input. */
     +	if (transport->smart_options->object_info)
     +		die(_("remote-object-info requires protocol v2"));
     +
    - 	if (!data->get_refs_list_called)
    - 		get_refs_list_using_list(transport, 0);
    - 
    ++	if (!data->get_refs_list_called)
    ++		get_refs_list_using_list(transport, 0);
    ++
    + 	count = 0;
    + 	for (i = 0; i < nr_heads; i++)
    + 		if (!(to_fetch[i]->status & REF_STATUS_UPTODATE))
     
      ## transport.c ##
     @@ transport.c: static struct ref *handshake(struct transport *transport, int for_push,
    @@ transport.c: static struct ref *handshake(struct transport *transport, int for_p
     @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
      	struct ref *refs = NULL;
      	struct fetch_pack_args args;
    - 	struct ref *refs_tmp = NULL;
    + 	struct ref *refs_tmp = NULL, **to_fetch_dup = NULL;
     +	struct ref *object_info_refs = NULL;
      
      	memset(&args, 0, sizeof(args));
    @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
     +			ref_itr->exact_oid = 1;
     +			if (i == transport->smart_options->object_info_oids->nr - 1)
     +				/* last element, no need to allocate to next */
    -+				ref_itr -> next = NULL;
    ++				ref_itr->next = NULL;
     +			else
     +				ref_itr->next = alloc_ref("");
      
    @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
      	close(data->fd[0]);
      	if (data->fd[1] >= 0)
      		close(data->fd[1]);
    - 	if (finish_connect(data->conn))
    - 		ret = -1;
    - 	data->conn = NULL;
    --
    - 	free_refs(refs_tmp);
    - 	free_refs(refs);
    - 	list_objects_filter_release(&args.filter_options);
     
      ## transport.h ##
     @@
5:  bb110fbc93 = 5:  2ae81acf2a cat-file: add declaration of variable i inside its for loop
6:  6dd143c164 ! 6:  b5aa6c1888 cat-file: add remote-object-info to batch-command
    @@ Commit message
                 - Get object info, print object info
     
         To summarize, `remote-object-info` gets object info from the remote and
    -    then loop through the object info passed in, print the info.
    +    then loop through the object info passed in, printing the info.
     
         In order for remote-object-info to avoid remote communication overhead
         in the non-buffer mode, the objects are passed in as such:
    @@ Documentation/git-cat-file.txt: newline. The available atoms are:
     -%(objecttype) %(objectsize)`.
     +%(objecttype) %(objectsize)`, except for `remote-object-info` commands which use
     +`%(objectname) %(objectsize)` for now because "%(objecttype)" is not supported yet.
    -+When "%(objecttype)" is supported, default format should be unified.
    ++WARNING: When "%(objecttype)" is supported, the default format WILL be unified, so
    ++DO NOT RELY on the current the default format to stay the same!!!
      
      If `--batch` is specified, or if `--batch-command` is used with the `contents`
      command, the object information is followed by the object contents (consisting
    @@ Documentation/git-cat-file.txt: scripting purposes.
      CAVEATS
      -------
      
    -+Note that since %(objecttype), %(objectsize:disk) and %(deltabase) are currently not supported by the
    -+`remote-object-info` command, we will error and exit when they are in the format string.
    ++Note that since %(objecttype), %(objectsize:disk) and %(deltabase) are
    ++currently not supported by the `remote-object-info` command, we will error
    ++and exit when they are in the format string.
     +
      Note that the sizes of objects on disk are reported accurately, but care
      should be taken in drawing conclusions about which refs or objects are
    @@ object-store-ll.h: int for_each_object_in_pack(struct packed_git *p,
     +
      #endif /* OBJECT_STORE_LL_H */
     
    - ## t/t1017-cat-file-remote-object-info.sh (new) ##
    + ## t/lib-cat-file.sh (new) ##
     @@
    -+#!/bin/sh
    -+
    -+test_description='git cat-file --batch-command with remote-object-info command'
    -+
    -+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    -+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    -+
    -+. ./test-lib.sh
    ++# Library of git-cat-file related functions.
     +
    ++# Print a string without a trailing newline
     +echo_without_newline () {
    -+    printf '%s' "$*"
    ++	printf '%s' "$*"
     +}
     +
    ++# Print a string without newlines and replaces them with a NULL character (\0).
     +echo_without_newline_nul () {
    -+    echo_without_newline "$@" | tr '\n' '\0'
    ++	echo_without_newline "$@" | tr '\n' '\0'
     +}
     +
    ++# Calculate the length of a string removing any leading spaces.
     +strlen () {
    -+    echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
    ++	echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
     +}
    +
    + ## t/t1006-cat-file.sh ##
    +@@ t/t1006-cat-file.sh: test_description='git cat-file'
    + 
    + TEST_PASSES_SANITIZE_LEAK=true
    + . ./test-lib.sh
    ++. "$TEST_DIRECTORY"/lib-cat-file.sh
    + 
    + test_cmdmode_usage () {
    + 	test_expect_code 129 "$@" 2>err &&
    +@@ t/t1006-cat-file.sh: do
    + 	'
    + done
    + 
    +-echo_without_newline () {
    +-    printf '%s' "$*"
    +-}
    +-
    +-echo_without_newline_nul () {
    +-	echo_without_newline "$@" | tr '\n' '\0'
    +-}
    +-
    +-strlen () {
    +-    echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
    +-}
    +-
    + run_tests () {
    +     type=$1
    +     oid=$2
    +
    + ## t/t1017-cat-file-remote-object-info.sh (new) ##
    +@@
    ++#!/bin/sh
    ++
    ++test_description='git cat-file --batch-command with remote-object-info command'
    ++
    ++GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    ++export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    ++
    ++. ./test-lib.sh
    ++. "$TEST_DIRECTORY"/lib-cat-file.sh
     +
     +hello_content="Hello World"
     +hello_size=$(strlen "$hello_content")

Comments

Taylor Blau Oct. 25, 2024, 8:56 p.m. UTC | #1
On Thu, Oct 24, 2024 at 04:53:53PM -0400, Eric Ju wrote:
> Calvin Wan (5):
>   fetch-pack: refactor packet writing
>   fetch-pack: move fetch initialization
>   serve: advertise object-info feature
>   transport: add client support for object-info
>   cat-file: add remote-object-info to batch-command
>
> Eric Ju (1):
>   cat-file: add declaration of variable i inside its for loop

Thanks. I just want to make sure that I have the right base here... this
was previously based on 3857aae53f (Git 2.47-rc0, 2024-09-25), but
applying the new round did not cleanly apply on top of that commit as
its merge base.

I applied the new round on top of the current tip of 'master', which is
6a11438f43 (The fifth batch, 2024-10-25) at the time of writing.

Please let me know if that was the right choice to make ;-).

Thanks,
Taylor