Message ID | 20210415212017.1407303-1-bga@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | object-info: support for retrieving object info | expand |
Bruno Albuquerque <bga@google.com> writes: > Sometimes it is useful to get information of an object without having to > download it completely. > > Add the "object-info" capability that lets the client ask for > object-related information with their full hexadecimal object names. > > Only sizes are returned for now. > > Signed-off-by: Bruno Albuquerque <bga@google.com> > --- > Documentation/technical/protocol-v2.txt | 31 +++++++ > Makefile | 1 + > protocol-caps.c | 115 ++++++++++++++++++++++++ > protocol-caps.h | 10 +++ > serve.c | 2 + > t/t5701-git-serve.sh | 26 ++++++ > 6 files changed, 185 insertions(+) > create mode 100644 protocol-caps.c > create mode 100644 protocol-caps.h > > Hello. > > This is my first git patch so I thought I would introduce myself. I am a > software engineer at Google and I have been involved with opensource for > a while (mostly with the Haiku OS project) and now I am working on some > Git changes that hopefully will be generally usefull. > > For this specific change, a clear usage scenario is implementing a VFS > on top of Git (something like https://github.com/microsoft/VFSForGit) in > a way that would not require someone to always fully download objects to > get information about them. Object size is the obvious one and what is > implemented here. Is the assumption that such an implementation of VFS would fetch individual tree object with the existing "fetch this single object by the object name" interface? What I am wondering is, as an ingredient for implementing VFS layer, if this is a bit too low level. To respond to "ls -l" when you only have a tree object name, you'd need two roundtrips, one to retrieve the tree, and then after parsing the tree to find out what objects the tree refers to with what pathname component, you'd issue the object-info for all of them in a single request. If a request takes a single (or multiple) tree object name, and lets you retrieve _both_ the tree object itself _and_ object-info for the objects the tree refers to, you can build "ls -l" with a single roundtrip instead. I do not know how much the latency matters (or more importantly, how much a naïve coutner-proposal like the above would help), but it is what immediately came to my mind. Assuming that we are good with an interface that needs two requests to obtain "object contents" and "object info" separately, I find what in this patch quite reasonable, though (admittedly, I've already read this patch during internal review number of times). Thanks. > diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt > index a7c806a73e..f4ed141774 100644 > --- a/Documentation/technical/protocol-v2.txt > +++ b/Documentation/technical/protocol-v2.txt > @@ -514,3 +514,34 @@ packet-line, and must not contain non-printable or whitespace characters. The > current implementation uses trace2 session IDs (see > link:api-trace2.html[api-trace2] for details), but this may change and users of > the session ID should not rely on this fact. > + > +object-info > +~~~~~~~~~~~ > + > +`object-info` is the command to retrieve information about one or more objects. > +Its main purpose is to allow a client to make decisions based on this > +information without having to fully fetch objects. Object size is the only > +information that is currently supported. > + > +An `object-info` request takes the following arguments: > + > + size > + Requests size information to be returned for each listed object id. > + > + oid <oid> > + Indicates to the server an object which the client wants to obtain > + information for. > + > +The response of `object-info` is a list of the the requested object ids > +and associated requested information, each separated by a single space. > + > + output = info flush-pkt > + > + info = PKT-LINE(attrs) LF) > + *PKT-LINE(obj-info LF) > + > + attrs = attr | attrs SP attrs > + > + attr = "size" > + > + obj-info = obj-id SP obj-size > diff --git a/Makefile b/Makefile > index 21c0bf1667..3225e37b63 100644 > --- a/Makefile > +++ b/Makefile > @@ -961,6 +961,7 @@ LIB_OBJS += progress.o > LIB_OBJS += promisor-remote.o > LIB_OBJS += prompt.o > LIB_OBJS += protocol.o > +LIB_OBJS += protocol-caps.o > LIB_OBJS += prune-packed.o > LIB_OBJS += quote.o > LIB_OBJS += range-diff.o > diff --git a/protocol-caps.c b/protocol-caps.c > new file mode 100644 > index 0000000000..c15e397756 > --- /dev/null > +++ b/protocol-caps.c > @@ -0,0 +1,115 @@ > +#include "git-compat-util.h" > +#include "protocol-caps.h" > +#include "gettext.h" > +#include "pkt-line.h" > +#include "strvec.h" > +#include "hash.h" > +#include "object.h" > +#include "object-store.h" > +#include "string-list.h" > +#include "strbuf.h" > + > +struct requested_info { > + unsigned size : 1; > +}; > + > +/* > + * Parses oids from the given line and collects them in the given > + * oid_str_list. Returns 1 if parsing was successful and 0 otherwise. > + */ > +static int parse_oid(const char *line, struct string_list *oid_str_list) > +{ > + const char *arg; > + > + if (!skip_prefix(line, "oid ", &arg)) > + return 0; > + > + string_list_append(oid_str_list, arg); > + > + return 1; > +} > + > +/* > + * Validates and send requested info back to the client. Any errors detected > + * are returned as they are detected. > + */ > +static void send_info(struct repository *r, struct packet_writer *writer, > + struct string_list *oid_str_list, > + struct requested_info *info) > +{ > + struct string_list_item *item; > + struct strbuf send_buffer = STRBUF_INIT; > + > + if (!oid_str_list->nr) > + return; > + > + if (info->size) > + packet_writer_write(writer, "size"); > + > + for_each_string_list_item (item, oid_str_list) { > + const char *oid_str = item->string; > + struct object_id oid; > + unsigned long object_size; > + > + if (get_oid_hex(oid_str, &oid) < 0) { > + packet_writer_error( > + writer, > + "object-info: protocol error, expected to get oid, not '%s'", > + oid_str); > + continue; > + } > + > + strbuf_addstr(&send_buffer, oid_str); > + > + if (info->size) { > + if (oid_object_info(r, &oid, &object_size) < 0) { > + strbuf_addstr(&send_buffer, " "); > + } else { > + strbuf_addf(&send_buffer, " %lu", object_size); > + } > + } > + > + packet_writer_write(writer, "%s", > + strbuf_detach(&send_buffer, NULL)); > + } > +} > + > +int cap_object_info(struct repository *r, struct strvec *keys, > + struct packet_reader *request) > +{ > + struct packet_writer writer; > + packet_writer_init(&writer, 1); > + int parsed_header; > + struct requested_info info; > + > + struct string_list oid_str_list = STRING_LIST_INIT_DUP; > + > + parsed_header = 0; > + while (packet_reader_read(request) == PACKET_READ_NORMAL) { > + if (!strcmp("size", request->line)) { > + info.size = 1; > + continue; > + } > + > + if (parse_oid(request->line, &oid_str_list)) > + continue; > + > + packet_writer_error(&writer, > + "object-info: unexpected line: '%s'", > + request->line); > + } > + > + if (request->status != PACKET_READ_FLUSH) { > + packet_writer_error( > + &writer, "object-info: expected flush after arguments"); > + die(_("object-info: expected flush after arguments")); > + } > + > + send_info(r, &writer, &oid_str_list, &info); > + > + string_list_clear(&oid_str_list, 1); > + > + packet_flush(1); > + > + return 0; > +} > diff --git a/protocol-caps.h b/protocol-caps.h > new file mode 100644 > index 0000000000..6351648e37 > --- /dev/null > +++ b/protocol-caps.h > @@ -0,0 +1,10 @@ > +#ifndef PROTOCOL_CAPS_H > +#define PROTOCOL_CAPS_H > + > +struct repository; > +struct strvec; > +struct packet_reader; > +int cap_object_info(struct repository *r, struct strvec *keys, > + struct packet_reader *request); > + > +#endif /* PROTOCOL_CAPS_H */ > \ No newline at end of file > diff --git a/serve.c b/serve.c > index ac20c72763..aa8209f147 100644 > --- a/serve.c > +++ b/serve.c > @@ -5,6 +5,7 @@ > #include "version.h" > #include "strvec.h" > #include "ls-refs.h" > +#include "protocol-caps.h" > #include "serve.h" > #include "upload-pack.h" > > @@ -78,6 +79,7 @@ static struct protocol_capability capabilities[] = { > { "server-option", always_advertise, NULL }, > { "object-format", object_format_advertise, NULL }, > { "session-id", session_id_advertise, NULL }, > + { "object-info", always_advertise, cap_object_info }, > }; > > static void advertise_capabilities(void) > diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh > index 509f379d49..73e74a9c54 100755 > --- a/t/t5701-git-serve.sh > +++ b/t/t5701-git-serve.sh > @@ -19,6 +19,7 @@ test_expect_success 'test capability advertisement' ' > fetch=shallow > server-option > object-format=$(test_oid algo) > + object-info > 0000 > EOF > > @@ -240,4 +241,29 @@ test_expect_success 'unexpected lines are not allowed in fetch request' ' > grep "unexpected line: .this-is-not-a-command." err > ' > > +# Test the basics of object-info > +# > +test_expect_success 'basics of object-info' ' > + test-tool pkt-line pack >in <<-EOF && > + command=object-info > + object-format=$(test_oid algo) > + 0001 > + size > + oid $(git rev-parse two:two.t) > + oid $(git rev-parse two:two.t) > + 0000 > + EOF > + > + cat >expect <<-EOF && > + size > + $(git rev-parse two:two.t) $(wc -c <two.t | xargs) > + $(git rev-parse two:two.t) $(wc -c <two.t | xargs) > + 0000 > + EOF > + > + test-tool serve-v2 --stateless-rpc <in >out && > + test-tool pkt-line unpack <out >actual && > + test_cmp expect actual > +' > + > test_done
Bruno Albuquerque <bga@google.com> writes: > +int cap_object_info(struct repository *r, struct strvec *keys, > + struct packet_reader *request) > +{ > + struct packet_writer writer; > + packet_writer_init(&writer, 1); This triggers -Wdeclaration-after-statement below. Move it down. > + int parsed_header; > + struct requested_info info; > + Puzzling blank line here. There does not seem to be a good reason why 'parsed_header' bit plus 'info' pair together closely than to the other 'oid_str_list' variable, so it does not make much sense as a grouping aid. > + struct string_list oid_str_list = STRING_LIST_INIT_DUP; > + Here, just before "parsed_header = 0;" after the blank line that separates the decls and the statements, is a good place to say "packet_writer_init()". Also it may make more sense to give initial value to parsed_header where it is declared. > + parsed_header = 0; > + while (packet_reader_read(request) == PACKET_READ_NORMAL) { > + if (!strcmp("size", request->line)) { > + info.size = 1; > + continue; And upon further inspection, nobody seems to use parsed_header at all. Let's lose it. Perhaps this minimum fix-up squashed in? Next time, perhaps try "make DEVELOPER=YesPlease test" to catch possible problems like these early? Thanks. protocol-caps.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git c/protocol-caps.c w/protocol-caps.c index c15e397756..922bfeb905 100644 --- c/protocol-caps.c +++ w/protocol-caps.c @@ -78,13 +78,10 @@ int cap_object_info(struct repository *r, struct strvec *keys, struct packet_reader *request) { struct packet_writer writer; - packet_writer_init(&writer, 1); - int parsed_header; struct requested_info info; - struct string_list oid_str_list = STRING_LIST_INIT_DUP; - parsed_header = 0; + packet_writer_init(&writer, 1); while (packet_reader_read(request) == PACKET_READ_NORMAL) { if (!strcmp("size", request->line)) { info.size = 1;
On Thu, Apr 15, 2021 at 2:53 PM Junio C Hamano <gitster@pobox.com> wrote: > > Is the assumption that such an implementation of VFS would fetch > individual tree object with the existing "fetch this single object > by the object name" interface? That is the general idea, yes. > What I am wondering is, as an ingredient for implementing VFS layer, > if this is a bit too low level. To respond to "ls -l" when you only > have a tree object name, you'd need two roundtrips, one to retrieve > the tree, and then after parsing the tree to find out what objects > the tree refers to with what pathname component, you'd issue the > object-info for all of them in a single request. Yes, as it is designed you would need to do that. There are a couple few reasons for this implementation to do this the way it does: 1 - Although it is currently only returning object sizes, it was designed in a way that it can be extended if need to return other object metadata. 2 - Doing it like this is a fully backwards-compatible change and older clients would still work without changes (just not make use of this). 3- In a real filesystem, it is common to have multiple directories under a tree so in practice you can optimize requests (if needed) by retrieving several tree objects and doing a single object-info request for all objects in those trees. Note I am not saying it could not be done in a different way, but it does look to me this change strikes a good balance between what it provides and its cost, > If a request takes a single (or multiple) tree object name, and lets > you retrieve _both_ the tree object itself _and_ object-info for the > objects the tree refers to, you can build "ls -l" with a single > roundtrip instead. That is true. But it seems to me that there is not a good place to fit size (and maybe other metadata eventually) information currently with the existing protocol. But it is not impossible that I might simply be missing something. > I do not know how much the latency matters (or more importantly, how > much a naïve coutner-proposal like the above would help), but it is > what immediately came to my mind. I do not expect the latency of the object-info request to be an issue especially because fetching the information can be done in batches and also prefetched in some cases (as we do not need to download the objects, we do not need to worry about downloading possibly gigabytes of data while just iterating through directories). But, of course, assuming there is a clean way to make things even better, I am all for it. > Assuming that we are good with an interface that needs two requests > to obtain "object contents" and "object info" separately, I find > what in this patch quite reasonable, though (admittedly, I've > already read this patch during internal review number of times). FWIIW, I gave this a lot of thought and for the purposes of doing a remote file system (and one can even optimize git ls-tree to be a lot faster for partial clones using this), I feel confident about the change (module missing something, of course). Thanks for your comments.
On 2021-04-15 at 21:20:17, Bruno Albuquerque wrote: > Sometimes it is useful to get information of an object without having to > download it completely. > > Add the "object-info" capability that lets the client ask for > object-related information with their full hexadecimal object names. > > Only sizes are returned for now. > > Signed-off-by: Bruno Albuquerque <bga@google.com> > --- > Documentation/technical/protocol-v2.txt | 31 +++++++ > Makefile | 1 + > protocol-caps.c | 115 ++++++++++++++++++++++++ > protocol-caps.h | 10 +++ > serve.c | 2 + > t/t5701-git-serve.sh | 26 ++++++ > 6 files changed, 185 insertions(+) > create mode 100644 protocol-caps.c > create mode 100644 protocol-caps.h > > Hello. > > This is my first git patch so I thought I would introduce myself. I am a > software engineer at Google and I have been involved with opensource for > a while (mostly with the Haiku OS project) and now I am working on some > Git changes that hopefully will be generally usefull. > > For this specific change, a clear usage scenario is implementing a VFS > on top of Git (something like https://github.com/microsoft/VFSForGit) in > a way that would not require someone to always fully download objects to > get information about them. Object size is the obvious one and what is > implemented here. I want to point out a few notes as someone who's worked on projects appurtenant to VFS for Git. If your goal is to create a VFS, then learning what Git thinks the object size is is not helpful. Git performs all sorts of operations on files when checking them out, such as smudge and clean filters (e.g., Git LFS), line ending conversion, and working tree encodings (e.g., to and from UTF-16 on Windows). Thus, your VFS must necessarily exclude all of this functionality to use Git's concept of an object size as a VFS parameter. For the average project, the worst functionality to lose is actually line ending conversion, since that means people will practically check in a mix of line endings unless you are very strict in CI. VFS for Git was originally designed to address the needs of a project that was not cross-platform and thus this was not a problem, but these limitations are substantial and I wouldn't recommend repeating them. So I'm not opposed to seeing this functionality come in if you have other plans for it (which is fine and I'm certainly interested in hearing about them if you do), but I don't think this is helpful for VFS scenarios and shouldn't be used for that purpose.
On Fri, Apr 16, 2021 at 3:02 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > I want to point out a few notes as someone who's worked on projects > appurtenant to VFS for Git. If your goal is to create a VFS, then > learning what Git thinks the object size is is not helpful. > > Git performs all sorts of operations on files when checking them out, > such as smudge and clean filters (e.g., Git LFS), line ending > conversion, and working tree encodings (e.g., to and from UTF-16 on > Windows). Thus, your VFS must necessarily exclude all of this > functionality to use Git's concept of an object size as a VFS parameter. This is, of course, completely valid from a Git-specific point of view but, because we are dealing with a hypothetical remote FS, it would not be unheard of (and, actually, this is most likely the norm more often than not) that when exposing git objects in an actual FS, we would not be doing any conversions whatsoever and would, instead, show everything as it is actually stored on the remote server. > For the average project, the worst functionality to lose is actually > line ending conversion, since that means people will practically check > in a mix of line endings unless you are very strict in CI. VFS for Git > was originally designed to address the needs of a project that was not > cross-platform and thus this was not a problem, but these limitations > are substantial and I wouldn't recommend repeating them. Although I understand the issue here, this would be fine in the context of a remote filesystem as I see it. In fact this is something that actually has precedent in git anyway ("git cat-file -s" without --filters will emit the size without any conversions). As long as the size we report match with the actual contents of the file we expose (i.e. without conversions), this looks ok to me. > So I'm not opposed to seeing this functionality come in if you have > other plans for it (which is fine and I'm certainly interested in > hearing about them if you do), but I don't think this is helpful for VFS > scenarios and shouldn't be used for that purpose. Thanks for your comments. This actually works for our purposes and, more generally, it would be useful even if you do care about all the conversions (in this case, as a reasonable approximation of the actual disk size I guess).
On Thu, Apr 15, 2021 at 3:15 PM Junio C Hamano <gitster@pobox.com> wrote: > > +int cap_object_info(struct repository *r, struct strvec *keys, > > + struct packet_reader *request) > > +{ > > + struct packet_writer writer; > > + packet_writer_init(&writer, 1); > > This triggers -Wdeclaration-after-statement below. Move it down. Done. > > + int parsed_header; > > + struct requested_info info; > > Puzzling blank line here. There does not seem to be a good reason > why 'parsed_header' bit plus 'info' pair together closely than to > the other 'oid_str_list' variable, so it does not make much sense as > a grouping aid. Done. > > + struct string_list oid_str_list = STRING_LIST_INIT_DUP; > > + > > Here, just before "parsed_header = 0;" after the blank line that > separates the decls and the statements, is a good place to say > "packet_writer_init()". Also it may make more sense to give initial > value to parsed_header where it is declared. Done. > > + parsed_header = 0; > > + while (packet_reader_read(request) == PACKET_READ_NORMAL) { > > + if (!strcmp("size", request->line)) { > > + info.size = 1; > > + continue; > > And upon further inspection, nobody seems to use parsed_header at > all. Let's lose it. Done. > Next time, perhaps try "make DEVELOPER=YesPlease test" to catch > possible problems like these early? I did. But I was coding on a MacOS machine (which defaults to clang) and it looks like the warnings are not triggered at all in clang even with -Werror being explicitly set. I switched to a Linux machine and got the warnings.
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index a7c806a73e..f4ed141774 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -514,3 +514,34 @@ packet-line, and must not contain non-printable or whitespace characters. The current implementation uses trace2 session IDs (see link:api-trace2.html[api-trace2] for details), but this may change and users of the session ID should not rely on this fact. + +object-info +~~~~~~~~~~~ + +`object-info` is the command to retrieve information about one or more objects. +Its main purpose is to allow a client to make decisions based on this +information without having to fully fetch objects. Object size is the only +information that is currently supported. + +An `object-info` request takes the following arguments: + + size + Requests size information to be returned for each listed object id. + + oid <oid> + Indicates to the server an object which the client wants to obtain + information for. + +The response of `object-info` is a list of the the requested object ids +and associated requested information, each separated by a single space. + + output = info flush-pkt + + info = PKT-LINE(attrs) LF) + *PKT-LINE(obj-info LF) + + attrs = attr | attrs SP attrs + + attr = "size" + + obj-info = obj-id SP obj-size diff --git a/Makefile b/Makefile index 21c0bf1667..3225e37b63 100644 --- a/Makefile +++ b/Makefile @@ -961,6 +961,7 @@ LIB_OBJS += progress.o LIB_OBJS += promisor-remote.o LIB_OBJS += prompt.o LIB_OBJS += protocol.o +LIB_OBJS += protocol-caps.o LIB_OBJS += prune-packed.o LIB_OBJS += quote.o LIB_OBJS += range-diff.o diff --git a/protocol-caps.c b/protocol-caps.c new file mode 100644 index 0000000000..c15e397756 --- /dev/null +++ b/protocol-caps.c @@ -0,0 +1,115 @@ +#include "git-compat-util.h" +#include "protocol-caps.h" +#include "gettext.h" +#include "pkt-line.h" +#include "strvec.h" +#include "hash.h" +#include "object.h" +#include "object-store.h" +#include "string-list.h" +#include "strbuf.h" + +struct requested_info { + unsigned size : 1; +}; + +/* + * Parses oids from the given line and collects them in the given + * oid_str_list. Returns 1 if parsing was successful and 0 otherwise. + */ +static int parse_oid(const char *line, struct string_list *oid_str_list) +{ + const char *arg; + + if (!skip_prefix(line, "oid ", &arg)) + return 0; + + string_list_append(oid_str_list, arg); + + return 1; +} + +/* + * Validates and send requested info back to the client. Any errors detected + * are returned as they are detected. + */ +static void send_info(struct repository *r, struct packet_writer *writer, + struct string_list *oid_str_list, + struct requested_info *info) +{ + struct string_list_item *item; + struct strbuf send_buffer = STRBUF_INIT; + + if (!oid_str_list->nr) + return; + + if (info->size) + packet_writer_write(writer, "size"); + + for_each_string_list_item (item, oid_str_list) { + const char *oid_str = item->string; + struct object_id oid; + unsigned long object_size; + + if (get_oid_hex(oid_str, &oid) < 0) { + packet_writer_error( + writer, + "object-info: protocol error, expected to get oid, not '%s'", + oid_str); + continue; + } + + strbuf_addstr(&send_buffer, oid_str); + + if (info->size) { + if (oid_object_info(r, &oid, &object_size) < 0) { + strbuf_addstr(&send_buffer, " "); + } else { + strbuf_addf(&send_buffer, " %lu", object_size); + } + } + + packet_writer_write(writer, "%s", + strbuf_detach(&send_buffer, NULL)); + } +} + +int cap_object_info(struct repository *r, struct strvec *keys, + struct packet_reader *request) +{ + struct packet_writer writer; + packet_writer_init(&writer, 1); + int parsed_header; + struct requested_info info; + + struct string_list oid_str_list = STRING_LIST_INIT_DUP; + + parsed_header = 0; + while (packet_reader_read(request) == PACKET_READ_NORMAL) { + if (!strcmp("size", request->line)) { + info.size = 1; + continue; + } + + if (parse_oid(request->line, &oid_str_list)) + continue; + + packet_writer_error(&writer, + "object-info: unexpected line: '%s'", + request->line); + } + + if (request->status != PACKET_READ_FLUSH) { + packet_writer_error( + &writer, "object-info: expected flush after arguments"); + die(_("object-info: expected flush after arguments")); + } + + send_info(r, &writer, &oid_str_list, &info); + + string_list_clear(&oid_str_list, 1); + + packet_flush(1); + + return 0; +} diff --git a/protocol-caps.h b/protocol-caps.h new file mode 100644 index 0000000000..6351648e37 --- /dev/null +++ b/protocol-caps.h @@ -0,0 +1,10 @@ +#ifndef PROTOCOL_CAPS_H +#define PROTOCOL_CAPS_H + +struct repository; +struct strvec; +struct packet_reader; +int cap_object_info(struct repository *r, struct strvec *keys, + struct packet_reader *request); + +#endif /* PROTOCOL_CAPS_H */ \ No newline at end of file diff --git a/serve.c b/serve.c index ac20c72763..aa8209f147 100644 --- a/serve.c +++ b/serve.c @@ -5,6 +5,7 @@ #include "version.h" #include "strvec.h" #include "ls-refs.h" +#include "protocol-caps.h" #include "serve.h" #include "upload-pack.h" @@ -78,6 +79,7 @@ static struct protocol_capability capabilities[] = { { "server-option", always_advertise, NULL }, { "object-format", object_format_advertise, NULL }, { "session-id", session_id_advertise, NULL }, + { "object-info", always_advertise, cap_object_info }, }; static void advertise_capabilities(void) diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh index 509f379d49..73e74a9c54 100755 --- a/t/t5701-git-serve.sh +++ b/t/t5701-git-serve.sh @@ -19,6 +19,7 @@ test_expect_success 'test capability advertisement' ' fetch=shallow server-option object-format=$(test_oid algo) + object-info 0000 EOF @@ -240,4 +241,29 @@ test_expect_success 'unexpected lines are not allowed in fetch request' ' grep "unexpected line: .this-is-not-a-command." err ' +# Test the basics of object-info +# +test_expect_success 'basics of object-info' ' + test-tool pkt-line pack >in <<-EOF && + command=object-info + object-format=$(test_oid algo) + 0001 + size + oid $(git rev-parse two:two.t) + oid $(git rev-parse two:two.t) + 0000 + EOF + + cat >expect <<-EOF && + size + $(git rev-parse two:two.t) $(wc -c <two.t | xargs) + $(git rev-parse two:two.t) $(wc -c <two.t | xargs) + 0000 + EOF + + test-tool serve-v2 --stateless-rpc <in >out && + test-tool pkt-line unpack <out >actual && + test_cmp expect actual +' + test_done
Sometimes it is useful to get information of an object without having to download it completely. Add the "object-info" capability that lets the client ask for object-related information with their full hexadecimal object names. Only sizes are returned for now. Signed-off-by: Bruno Albuquerque <bga@google.com> --- Documentation/technical/protocol-v2.txt | 31 +++++++ Makefile | 1 + protocol-caps.c | 115 ++++++++++++++++++++++++ protocol-caps.h | 10 +++ serve.c | 2 + t/t5701-git-serve.sh | 26 ++++++ 6 files changed, 185 insertions(+) create mode 100644 protocol-caps.c create mode 100644 protocol-caps.h Hello. This is my first git patch so I thought I would introduce myself. I am a software engineer at Google and I have been involved with opensource for a while (mostly with the Haiku OS project) and now I am working on some Git changes that hopefully will be generally usefull. For this specific change, a clear usage scenario is implementing a VFS on top of Git (something like https://github.com/microsoft/VFSForGit) in a way that would not require someone to always fully download objects to get information about them. Object size is the obvious one and what is implemented here.