Message ID | cover.1604006121.git.steadmon@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Advertise trace2 SID in protocol capabilities | expand |
On 10/29/20 5:32 PM, Josh Steadmon wrote: > In order to more easily debug remote operations, it is useful to be able > to inspect both client-side and server-side traces. This series allows > clients to record the server's trace2 session ID, and vice versa, by > advertising the SID in a new "trace2-sid" protocol capability. > Very nice! This should be very helpful when matching up client and server commands. > Two questions in particular for reviewers: > > 1) Is trace2/tr2_sid.h intended to be visible to the rest of the code, > or is the trace2/ directory supposed to be opaque implementation > detail? If the latter, would it be acceptable to move tr2_sid_get() > into trace2.h? I put all the trace2-private stuff in that sub-directory and gave everything tr2_ prefixes to hide the details as much as I could (and as an alternative to the usual tendency of having everything be static within a massive .c file). So, yeah, my intent was to make all of it opaque. So that just trace2.h contains the official API. Perhaps in trace2.c you could create: const char *trace2_session_id(void) { return tr2_sid_get(); } > > 2) upload-pack generally takes configuration via flags rather than > gitconfig. From offline discussions, it sounds like this is an > intentional choice to limit potential vulnerability from malicious > configs in local repositories accessed via the file:// URL scheme. Is > it reasonable to load the trace2.announceSID option from config files > in upload-pack, or should this be changed to a flag? I don't have the history to comment on this. One thing to consider is that the SID for a Git process is built up from the SID of the parent and the unique data for the current process. So for example, if `git fetch` has SID `<sid1>` and it spawns `git upload-pack`, the child will have SID `<sid1>/<sid2>` and if that spawns `git index-pack`, that child will have `<sid1>/<sid2>/<sid3>`. This is very helpful when tracking down parent/child relationships and perf hunting. This SID inheritance is passed via an environment variable to the child, which extends it and passes the longer version to its children. So the value being passed between client and server over the protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a single `<sid_x>` term. For your purposes, do you want or care if you get the single term or the full SID ? Also, there's nothing to stop someone from seeding that environment variable in their shell with some mischief before launching the top-level Git command. So the above example might see the SID as `<mischief>/<sid1>/<sid2>/<sid3>`. I'm not sure if this could be abused when inserted into the V0/V1/V2 protocol or your logging facility. $ GIT_TRACE2_EVENT=1 GIT_TRACE2_PARENT_SID=hello git version {"event":"version","sid":"hello/20201030T154143.660608Z-H86606a97- P00001d30",...} ... So maybe we want to have a public API to return a pointer to just the final `<sid_x>` term ? (Then again, maybe I worry too much.) Thanks, Jeff > > > > Josh Steadmon (10): > docs: new capability to advertise trace2 SIDs > docs: new trace2.advertiseSID option > upload-pack: advertise trace2 SID in v0 capabilities > receive-pack: advertise trace2 SID in v0 capabilities > serve: advertise trace2 SID in v2 capabilities > transport: log received server trace2 SID > fetch-pack: advertise trace2 SID in capabilities > upload-pack, serve: log received client trace2 SID > send-pack: advertise trace2 SID in capabilities > receive-pack: log received client trace2 SID > > Documentation/config/trace2.txt | 4 + > .../technical/protocol-capabilities.txt | 13 ++- > Documentation/technical/protocol-v2.txt | 9 +++ > builtin/receive-pack.c | 16 ++++ > fetch-pack.c | 11 +++ > send-pack.c | 9 +++ > serve.c | 19 +++++ > t/t5705-trace2-sid-in-capabilities.sh | 79 +++++++++++++++++++ > transport.c | 10 +++ > upload-pack.c | 23 +++++- > 10 files changed, 190 insertions(+), 3 deletions(-) > create mode 100755 t/t5705-trace2-sid-in-capabilities.sh >
Josh Steadmon <steadmon@google.com> writes: > 2) upload-pack generally takes configuration via flags rather than > gitconfig. From offline discussions, it sounds like this is an > intentional choice to limit potential vulnerability from malicious > configs in local repositories accessed via the file:// URL scheme. Is > it reasonable to load the trace2.announceSID option from config files > in upload-pack, or should this be changed to a flag? I do not know about your offline discussion, but it certainly would make it easier to reason about the attack surface if we know it never gets affected by any configuration files. Having said that, upload-pack.c::upload_pack_config() already reads a lot from the configuration file, many of these variables are named "allowSomething", so... IOW, I do not see why the announceSID (should it be in trace2.* hierarchy, though?) needs to be treated in any more sensitive than say uploadpack.allowrefinwant or *.allowfilter variables.
On 2020.10.30 11:54, Jeff Hostetler wrote: > > > On 10/29/20 5:32 PM, Josh Steadmon wrote: > > In order to more easily debug remote operations, it is useful to be able > > to inspect both client-side and server-side traces. This series allows > > clients to record the server's trace2 session ID, and vice versa, by > > advertising the SID in a new "trace2-sid" protocol capability. > > > > Very nice! This should be very helpful when matching up client and > server commands. > > > > Two questions in particular for reviewers: > > > > 1) Is trace2/tr2_sid.h intended to be visible to the rest of the code, > > or is the trace2/ directory supposed to be opaque implementation > > detail? If the latter, would it be acceptable to move tr2_sid_get() > > into trace2.h? > > I put all the trace2-private stuff in that sub-directory and gave > everything tr2_ prefixes to hide the details as much as I could > (and as an alternative to the usual tendency of having everything > be static within a massive .c file). > > So, yeah, my intent was to make all of it opaque. > So that just trace2.h contains the official API. > > Perhaps in trace2.c you could create: > > const char *trace2_session_id(void) > { > return tr2_sid_get(); > } Done in V2, thanks. > > 2) upload-pack generally takes configuration via flags rather than > > gitconfig. From offline discussions, it sounds like this is an > > intentional choice to limit potential vulnerability from malicious > > configs in local repositories accessed via the file:// URL scheme. Is > > it reasonable to load the trace2.announceSID option from config files > > in upload-pack, or should this be changed to a flag? > > I don't have the history to comment on this. > > One thing to consider is that the SID for a Git process is built up > from the SID of the parent and the unique data for the current process. > So for example, if `git fetch` has SID `<sid1>` and it spawns > `git upload-pack`, the child will have SID `<sid1>/<sid2>` and if that > spawns `git index-pack`, that child will have `<sid1>/<sid2>/<sid3>`. > This is very helpful when tracking down parent/child relationships > and perf hunting. > > This SID inheritance is passed via an environment variable to > the child, which extends it and passes the longer version to its > children. > > So the value being passed between client and server over the > protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a > single `<sid_x>` term. For your purposes, do you want or care if > you get the single term or the full SID ? I'm not sure we care too much one way or the other. A single component of the SID should be enough to join client & server logs, but it's easier to just send the whole thing. > Also, there's nothing to stop someone from seeding that environment > variable in their shell with some mischief before launching the > top-level Git command. So the above example might see the SID as > `<mischief>/<sid1>/<sid2>/<sid3>`. I'm not sure if this could be > abused when inserted into the V0/V1/V2 protocol or your logging > facility. > > $ GIT_TRACE2_EVENT=1 GIT_TRACE2_PARENT_SID=hello git version > {"event":"version","sid":"hello/20201030T154143.660608Z-H86606a97- > P00001d30",...} > ... > > So maybe we want to have a public API to return a pointer to just > the final `<sid_x>` term ? (Then again, maybe I worry too much.) Yeah, it's certainly possible to muck with the SID as you describe, but I'm not sure I see any big problems that could be caused. If someone points out an issue I've missed, I'll be happy to change this, though. > Thanks, > Jeff Thanks for the review! > > Josh Steadmon (10): > > docs: new capability to advertise trace2 SIDs > > docs: new trace2.advertiseSID option > > upload-pack: advertise trace2 SID in v0 capabilities > > receive-pack: advertise trace2 SID in v0 capabilities > > serve: advertise trace2 SID in v2 capabilities > > transport: log received server trace2 SID > > fetch-pack: advertise trace2 SID in capabilities > > upload-pack, serve: log received client trace2 SID > > send-pack: advertise trace2 SID in capabilities > > receive-pack: log received client trace2 SID > > > > Documentation/config/trace2.txt | 4 + > > .../technical/protocol-capabilities.txt | 13 ++- > > Documentation/technical/protocol-v2.txt | 9 +++ > > builtin/receive-pack.c | 16 ++++ > > fetch-pack.c | 11 +++ > > send-pack.c | 9 +++ > > serve.c | 19 +++++ > > t/t5705-trace2-sid-in-capabilities.sh | 79 +++++++++++++++++++ > > transport.c | 10 +++ > > upload-pack.c | 23 +++++- > > 10 files changed, 190 insertions(+), 3 deletions(-) > > create mode 100755 t/t5705-trace2-sid-in-capabilities.sh > >
Josh Steadmon <steadmon@google.com> writes: >> So the value being passed between client and server over the >> protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a >> single `<sid_x>` term. For your purposes, do you want or care if >> you get the single term or the full SID ? > > I'm not sure we care too much one way or the other. A single component > of the SID should be enough to join client & server logs, but it's > easier to just send the whole thing. It may be worth documenting this design decision in the protocol doc; even though protocol doc may say this should be treated as an opaque token, people may assume certain structure.
On 11/3/20 4:22 PM, Junio C Hamano wrote: > Josh Steadmon <steadmon@google.com> writes: > >>> So the value being passed between client and server over the >>> protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a >>> single `<sid_x>` term. For your purposes, do you want or care if >>> you get the single term or the full SID ? >> >> I'm not sure we care too much one way or the other. A single component >> of the SID should be enough to join client & server logs, but it's >> easier to just send the whole thing. > > It may be worth documenting this design decision in the protocol > doc; even though protocol doc may say this should be treated as an > opaque token, people may assume certain structure. > good point. let me make a note to revisit that text. thanks Jeff
On 2020.11.03 13:22, Junio C Hamano wrote: > Josh Steadmon <steadmon@google.com> writes: > > >> So the value being passed between client and server over the > >> protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a > >> single `<sid_x>` term. For your purposes, do you want or care if > >> you get the single term or the full SID ? > > > > I'm not sure we care too much one way or the other. A single component > > of the SID should be enough to join client & server logs, but it's > > easier to just send the whole thing. > > It may be worth documenting this design decision in the protocol > doc; even though protocol doc may say this should be treated as an > opaque token, people may assume certain structure. Done in V3.