mbox series

[v6,0/6] cat-file: add remote-object-info to batch-command

Message ID 20241108162441.50736-1-eric.peijian@gmail.com (mailing list archive)
Headers show
Series cat-file: add remote-object-info to batch-command | expand

Message

Eric Ju Nov. 8, 2024, 4:24 p.m. UTC
This is a continuation of Calvin Wan's (calvinwan@google.com)
patch series [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command at [1].

Sometimes it is useful to get information about an object without having to download
it completely. The server logic for retrieving size has already been implemented and merged in
"a2ba162cda (object-info: support for retrieving object info, 2021-04-20)"[2].
This patch series implement the client option for it.

This patch series add the `remote-object-info` command to `cat-file --batch-command`.
This command allows the client to make an object-info command request to a server
that supports protocol v2. If the server is v2, but does not have
object-info capability, the entire object is fetched and the
relevant object info is returned.

A few questions open for discussions please:

1. In the current implementation, if a user puts `remote-object-info` in protocol v1,
   `cat-file --batch-command` will die. Which way do we prefer? "error and exit (i.e. die)"
   or "warn and wait for new command".

2. Right now, only the size is supported. If the batch command format
   contains objectsize:disk or deltabase, it will die. The question
   is about objecttype. In the current implementation, it will die too.
   But dying on objecttype breaks the default format. We have changed the
   default format to %(objectname) %(objectsize) when remote-object-info is used.
   Any suggestions on this approach?

[1] https://lore.kernel.org/git/20220728230210.2952731-1-calvinwan@google.com/#t
[2] https://git.kernel.org/pub/scm/git/git.git/commit/?id=a2ba162cda2acc171c3e36acbbc854792b093cb7

Changes since V5
================
- Move small changes commit to the very first commit
- Add more detailed description on what changes when moving write_fetch_command_and_capabilities() to connect.c
- Fix indentation problems

Thank you.

Eric Ju

Calvin Wan (4):
  fetch-pack: refactor packet writing
  fetch-pack: move fetch initialization
  serve: advertise object-info feature
  transport: add client support for object-info

Eric Ju (2):
  cat-file: add declaration of variable i inside its for loop
  cat-file: add remote-object-info to batch-command

 Documentation/git-cat-file.txt         |  24 +-
 Makefile                               |   1 +
 builtin/cat-file.c                     | 116 +++-
 connect.c                              |  34 ++
 connect.h                              |   4 +
 fetch-object-info.c                    |  95 ++++
 fetch-object-info.h                    |  18 +
 fetch-pack.c                           |  51 +-
 fetch-pack.h                           |   2 +
 object-file.c                          |  11 +
 object-store-ll.h                      |   3 +
 serve.c                                |   4 +-
 t/lib-cat-file.sh                      |  16 +
 t/t1006-cat-file.sh                    |  13 +-
 t/t1017-cat-file-remote-object-info.sh | 739 +++++++++++++++++++++++++
 transport-helper.c                     |  11 +-
 transport.c                            |  77 ++-
 transport.h                            |  11 +
 18 files changed, 1162 insertions(+), 68 deletions(-)
 create mode 100644 fetch-object-info.c
 create mode 100644 fetch-object-info.h
 create mode 100644 t/lib-cat-file.sh
 create mode 100755 t/t1017-cat-file-remote-object-info.sh

Range-diff against v5:
5:  0e533d6d0a ! 1:  858a864651 cat-file: add declaration of variable i inside its for loop
    @@ Metadata
      ## Commit message ##
         cat-file: add declaration of variable i inside its for loop
     
    -    Some code declares variable i and only uses it
    +    Some code used in this series declares variable i and only uses it
         in a for loop, not in any other logic outside the loop.
     
         Change the declaration of i to be inside the for loop for readability.
    @@ builtin/cat-file.c: static void batch_objects_command(struct batch_options *opt,
      			if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
      				continue;
      
    +
    + ## fetch-pack.c ##
    +@@ fetch-pack.c: static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
    + 	if (advertise_sid && server_supports_v2("session-id"))
    + 		packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
    + 	if (server_options && server_options->nr) {
    +-		int i;
    + 		ensure_server_supports_v2("server-option");
    +-		for (i = 0; i < server_options->nr; i++)
    ++		for (int i = 0; i < server_options->nr; i++)
    + 			packet_buf_write(req_buf, "server-option=%s",
    + 					 server_options->items[i].string);
    + 	}
1:  d5c93792d3 ! 2:  51707f08d1 fetch-pack: refactor packet writing
    @@ Metadata
      ## Commit message ##
         fetch-pack: refactor packet writing
     
    -    Refactor write_fetch_command_and_capabilities() to be a more general
    -    purpose function write_command_and_capabilities(), so that it can be
    -    used by both fetch and future commands.
    +    Refactor write_fetch_command_and_capabilities() to a more
    +    general-purpose function, write_command_and_capabilities(), enabling it
    +    to serve both fetch and additional commands.
     
    -    Here "command" means the "operations" supported by Git’s wire protocol
    -    https://git-scm.com/docs/protocol-v2. An example would be a
    -    git's subcommand, such as git-fetch(1); or an operation supported by
    -    the server side such as "object-info" implemented in "a2ba162cda
    -    (object-info: support for retrieving object info, 2021-04-20)".
    +    In this context, "command" refers to the "operations" supported by
    +    Git's wire protocol https://git-scm.com/docs/protocol-v2, such as a Git
    +    subcommand (e.g., git-fetch(1)) or a server-side operation like
    +    "object-info" as implemented in commit a2ba162c
    +    (object-info: support for retrieving object info, 2021-04-20).
     
    -    The new write_command_and_capabilities() function is also moved to
    -    connect.c, so that it becomes accessible to other commands.
    +    Furthermore, write_command_and_capabilities() is moved to connect.c,
    +    making it accessible to additional commands in the future.
    +
    +    To move write_command_and_capabilities() to connect.c, we need to
    +    adjust how `advertise_sid` is managed. Previously,
    +    in fetch_pack.c, `advertise_sid` was a static variable, modified using
    +    git_config_get_bool().
    +
    +    In connect.c, we now initialize `advertise_sid` at the beginning by
    +    directly using git_config_get_bool(). This change is safe because:
    +
    +    In the original fetch-pack.c code, there are only two places that
    +    write `advertise_sid` :
    +    1. In function do_fetch_pack:
    +            if (!server_supports("session-id"))
    +                    advertise_sid = 0;
    +    2. In function fetch_pack_config():
    +            git_config_get_bool("transfer.advertisesid", &advertise_sid);
    +
    +    About 1, since do_fetch_pack() is only relevant for protocol v1, this
    +    assignment can be ignored in our refactor, as
    +    write_command_and_capabilities() is only used in protocol v2.
    +
    +    About 2, git_config_get_bool() is from config.h and it is an out-of-box
    +    dependency of connect.c, so we can reuse it directly.
     
         Helped-by: Jonathan Tan <jonathantanmy@google.com>
         Helped-by: Christian Couder <chriscool@tuxfamily.org>
    @@ connect.c: int server_supports(const char *feature)
      }
      
     +void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
    -+									const struct string_list *server_options)
    ++				    const struct string_list *server_options)
     +{
     +	const char *hash_name;
     +	int advertise_sid;
    @@ connect.h: void check_stateless_delimiter(int stateless_rpc,
      			       const char *error);
      
     +void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
    -+									const struct string_list *server_options);
    ++				    const struct string_list *server_options);
     +
      #endif
     
    @@ fetch-pack.c: static int add_haves(struct fetch_negotiator *negotiator,
     -	if (advertise_sid && server_supports_v2("session-id"))
     -		packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
     -	if (server_options && server_options->nr) {
    --		int i;
     -		ensure_server_supports_v2("server-option");
    --		for (i = 0; i < server_options->nr; i++)
    +-		for (int i = 0; i < server_options->nr; i++)
     -			packet_buf_write(req_buf, "server-option=%s",
     -					 server_options->items[i].string);
     -	}
2:  4bf51150c0 = 3:  f02338e90e fetch-pack: move fetch initialization
3:  15b4095e28 = 4:  934bafb1db serve: advertise object-info feature
4:  68794ae57a ! 5:  e2e94d1a32 transport: add client support for object-info
    @@ fetch-object-info.c (new)
     + * and read the results from packets.
     + */
     +int fetch_object_info(const enum protocol_version version, struct object_info_args *args,
    -+					  struct packet_reader *reader, struct object_info *object_info_data,
    -+					  const int stateless_rpc, const int fd_out)
    ++		      struct packet_reader *reader, struct object_info *object_info_data,
    ++		      const int stateless_rpc, const int fd_out)
     +{
     +	int size_index = -1;
     +
    @@ fetch-object-info.h (new)
     +};
     +
     +int fetch_object_info(enum protocol_version version, struct object_info_args *args,
    -+					  struct packet_reader *reader, struct object_info *object_info_data,
    -+					  int stateless_rpc, int fd_out);
    ++		      struct packet_reader *reader, struct object_info *object_info_data,
    ++		      int stateless_rpc, int fd_out);
     +
     +#endif /* FETCH_OBJECT_INFO_H */
     
    @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
     +		}
      
     -	if (!data->finished_handshake) {
    --		int i;
     +		/*
     +		 * If the code reaches here, it means we can't retrieve object info from
     +		 * packets, and we will fallback to downland the pack files.
    @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
     +		transport->remote_refs = object_info_refs;
     +
     +	} else if (!data->finished_handshake) {
    + 		int i;
      		int must_list_refs = 0;
    --		for (i = 0; i < nr_heads; i++) {
    -+		for (int i = 0; i < nr_heads; i++) {
    - 			if (!to_fetch[i]->exact_oid) {
    - 				must_list_refs = 1;
    - 				break;
    + 		for (i = 0; i < nr_heads; i++) {
     @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
      			  &transport->pack_lockfiles, data->version);
      
6:  98f8248203 ! 6:  69eb091a6b cat-file: add remote-object-info to batch-command
    @@ Documentation/git-cat-file.txt: info <object>::
      	output of `--batch-check`.
      
     +remote-object-info <remote> <object>...::
    -+	Print object info for object references `<object>` at specified <remote> without
    -+	downloading objects from remote. If the object-info capability is not
    -+	supported by the server, the objects will be downloaded instead.
    ++	Print object info for object references `<object>` at specified <remote>
    ++	without downloading objects from remote. If the object-info capability
    ++	is not supported by the server, the objects will be downloaded instead.
     +	Error when no object references are provided.
     +	This command may be combined with `--buffer`.
     +
    @@ builtin/cat-file.c: struct batch_options {
      
      static struct string_list mailmap = STRING_LIST_INIT_NODUP;
      static int use_mailmap;
    -@@ builtin/cat-file.c: static void batch_one_object(const char *obj_name,
    - 	enum get_oid_result result;
    - 
    - 	result = get_oid_with_context(the_repository, obj_name,
    --				      flags, &data->oid, &ctx);
    -+								  flags, &data->oid, &ctx);
    - 	if (result != FOUND) {
    - 		switch (result) {
    - 		case MISSING_OBJECT:
     @@ builtin/cat-file.c: static void batch_one_object(const char *obj_name,
      	object_context_release(&ctx);
      }
    @@ builtin/cat-file.c: static void parse_cmd_info(struct batch_options *opt,
      }
      
     +static void parse_cmd_remote_object_info(struct batch_options *opt,
    -+			   const char *line,
    -+			   struct strbuf *output,
    -+			   struct expand_data *data)
    ++					 const char *line, struct strbuf *output,
    ++					 struct expand_data *data)
     +{
     +	int count;
     +	const char **argv;

Comments

Junio C Hamano Nov. 11, 2024, 4:38 a.m. UTC | #1
Eric Ju <eric.peijian@gmail.com> writes:

> This is a continuation of Calvin Wan's (calvinwan@google.com)
> patch series [PATCH v5 0/6] cat-file: add --batch-command remote-object-info command at [1].
>
> Sometimes it is useful to get information about an object without having to download
> it completely. The server logic for retrieving size has already been implemented and merged in
> "a2ba162cda (object-info: support for retrieving object info, 2021-04-20)"[2].
> This patch series implement the client option for it.
>
> This patch series add the `remote-object-info` command to `cat-file --batch-command`.
> This command allows the client to make an object-info command request to a server
> that supports protocol v2. If the server is v2, but does not have
> object-info capability, the entire object is fetched and the
> relevant object info is returned.
>
> A few questions open for discussions please:
>
> 1. In the current implementation, if a user puts `remote-object-info` in protocol v1,
>    `cat-file --batch-command` will die. Which way do we prefer? "error and exit (i.e. die)"
>    or "warn and wait for new command".

In the primary use case envisioned, would it be a program that is
driving the "cat-file --batch-command" process?  Can it sensibly
react to "warn and wait" and throw different commands to achieve
what it wanted to do with the remote-object-info command?

If the answer is "no", die would be more appropriate.

> 2. Right now, only the size is supported. If the batch command format
>    contains objectsize:disk or deltabase, it will die. The question
>    is about objecttype. In the current implementation, it will die too.
>    But dying on objecttype breaks the default format. We have changed the
>    default format to %(objectname) %(objectsize) when remote-object-info is used.
>    Any suggestions on this approach?

Why bend the default format to the shortcoming of the new feature?
What makes it impossible to learn what type of object it is?  If the
limitation that makes it impossible cannot be avoided, would it make
more sense to fall back to the "fetch and locally inspect" just like
"the other side does not know how to do object-info" case?

Another thing you did not list, which is related, is where the
"fetch and locally inspect" fallback fetch the object into.  Would
we use a quarantine mechanism, so that a mere request for remote
object info for an object will not contaminate our local object
store until the next gc realizes that such an object is dangling?

Thanks.