[00/10] Advertise trace2 SID in protocol capabilities

Message ID	cover.1604006121.git.steadmon@google.com (mailing list archive)
Headers	show Return-Path: <SRS0=U/aV=EE=vger.kernel.org=git-owner@kernel.org> Sender: "steadmon via sendgmr" <steadmon@lunarfall.svl.corp.google.com> Date: Thu, 29 Oct 2020 14:32:30 -0700 Message-Id: <cover.1604006121.git.steadmon@google.com> Mime-Version: 1.0 Subject: [PATCH 00/10] Advertise trace2 SID in protocol capabilities From: Josh Steadmon <steadmon@google.com> To: git@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	Advertise trace2 SID in protocol capabilities \| expand [00/10] Advertise trace2 SID in protocol capabilities [01/10] docs: new capability to advertise trace2 SIDs [02/10] docs: new trace2.advertiseSID option [03/10] upload-pack: advertise trace2 SID in v0 capabilities [04/10] receive-pack: advertise trace2 SID in v0 capabilities [05/10] serve: advertise trace2 SID in v2 capabilities [06/10] transport: log received server trace2 SID [07/10] fetch-pack: advertise trace2 SID in capabilities [08/10] upload-pack, serve: log received client trace2 SID [09/10] send-pack: advertise trace2 SID in capabilities [10/10] receive-pack: log received client trace2 SID

Josh Steadmon Oct. 29, 2020, 9:32 p.m. UTC

In order to more easily debug remote operations, it is useful to be able
to inspect both client-side and server-side traces. This series allows
clients to record the server's trace2 session ID, and vice versa, by
advertising the SID in a new "trace2-sid" protocol capability.

Two questions in particular for reviewers:

1) Is trace2/tr2_sid.h intended to be visible to the rest of the code,
  or is the trace2/ directory supposed to be opaque implementation
  detail? If the latter, would it be acceptable to move tr2_sid_get()
  into trace2.h?

2) upload-pack generally takes configuration via flags rather than
  gitconfig. From offline discussions, it sounds like this is an
  intentional choice to limit potential vulnerability from malicious
  configs in local repositories accessed via the file:// URL scheme. Is
  it reasonable to load the trace2.announceSID option from config files
  in upload-pack, or should this be changed to a flag?



Josh Steadmon (10):
  docs: new capability to advertise trace2 SIDs
  docs: new trace2.advertiseSID option
  upload-pack: advertise trace2 SID in v0 capabilities
  receive-pack: advertise trace2 SID in v0 capabilities
  serve: advertise trace2 SID in v2 capabilities
  transport: log received server trace2 SID
  fetch-pack: advertise trace2 SID in capabilities
  upload-pack, serve: log received client trace2 SID
  send-pack: advertise trace2 SID in capabilities
  receive-pack: log received client trace2 SID

 Documentation/config/trace2.txt               |  4 +
 .../technical/protocol-capabilities.txt       | 13 ++-
 Documentation/technical/protocol-v2.txt       |  9 +++
 builtin/receive-pack.c                        | 16 ++++
 fetch-pack.c                                  | 11 +++
 send-pack.c                                   |  9 +++
 serve.c                                       | 19 +++++
 t/t5705-trace2-sid-in-capabilities.sh         | 79 +++++++++++++++++++
 transport.c                                   | 10 +++
 upload-pack.c                                 | 23 +++++-
 10 files changed, 190 insertions(+), 3 deletions(-)
 create mode 100755 t/t5705-trace2-sid-in-capabilities.sh

Jeff Hostetler Oct. 30, 2020, 3:54 p.m. UTC | #1

On 10/29/20 5:32 PM, Josh Steadmon wrote:
> In order to more easily debug remote operations, it is useful to be able
> to inspect both client-side and server-side traces. This series allows
> clients to record the server's trace2 session ID, and vice versa, by
> advertising the SID in a new "trace2-sid" protocol capability.
> 

Very nice!  This should be very helpful when matching up client and
server commands.

> Two questions in particular for reviewers:
> 
> 1) Is trace2/tr2_sid.h intended to be visible to the rest of the code,
>    or is the trace2/ directory supposed to be opaque implementation
>    detail? If the latter, would it be acceptable to move tr2_sid_get()
>    into trace2.h?

I put all the trace2-private stuff in that sub-directory and gave
everything tr2_ prefixes to hide the details as much as I could
(and as an alternative to the usual tendency of having everything
be static within a massive .c file).

So, yeah, my intent was to make all of it opaque.
So that just trace2.h contains the official API.

Perhaps in trace2.c you could create:

const char *trace2_session_id(void)
{
     return tr2_sid_get();
}

> 
> 2) upload-pack generally takes configuration via flags rather than
>    gitconfig. From offline discussions, it sounds like this is an
>    intentional choice to limit potential vulnerability from malicious
>    configs in local repositories accessed via the file:// URL scheme. Is
>    it reasonable to load the trace2.announceSID option from config files
>    in upload-pack, or should this be changed to a flag?

I don't have the history to comment on this.

One thing to consider is that the SID for a Git process is built up
from the SID of the parent and the unique data for the current process.
So for example, if `git fetch` has SID `<sid1>` and it spawns
`git upload-pack`, the child will have SID `<sid1>/<sid2>` and if that
spawns `git index-pack`, that child will have `<sid1>/<sid2>/<sid3>`.
This is very helpful when tracking down parent/child relationships
and perf hunting.

This SID inheritance is passed via an environment variable to
the child, which extends it and passes the longer version to its
children.

So the value being passed between client and server over the
protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a
single `<sid_x>` term.  For your purposes, do you want or care if
you get the single term or the full SID ?

Also, there's nothing to stop someone from seeding that environment
variable in their shell with some mischief before launching the
top-level Git command.  So the above example might see the SID as
`<mischief>/<sid1>/<sid2>/<sid3>`.  I'm not sure if this could be
abused when inserted into the V0/V1/V2 protocol or your logging
facility.

     $ GIT_TRACE2_EVENT=1 GIT_TRACE2_PARENT_SID=hello git version
     {"event":"version","sid":"hello/20201030T154143.660608Z-H86606a97- 
  P00001d30",...}
     ...

So maybe we want to have a public API to return a pointer to just
the final `<sid_x>` term ?  (Then again, maybe I worry too much.)

Thanks,
Jeff

> 
> 
> 
> Josh Steadmon (10):
>    docs: new capability to advertise trace2 SIDs
>    docs: new trace2.advertiseSID option
>    upload-pack: advertise trace2 SID in v0 capabilities
>    receive-pack: advertise trace2 SID in v0 capabilities
>    serve: advertise trace2 SID in v2 capabilities
>    transport: log received server trace2 SID
>    fetch-pack: advertise trace2 SID in capabilities
>    upload-pack, serve: log received client trace2 SID
>    send-pack: advertise trace2 SID in capabilities
>    receive-pack: log received client trace2 SID
> 
>   Documentation/config/trace2.txt               |  4 +
>   .../technical/protocol-capabilities.txt       | 13 ++-
>   Documentation/technical/protocol-v2.txt       |  9 +++
>   builtin/receive-pack.c                        | 16 ++++
>   fetch-pack.c                                  | 11 +++
>   send-pack.c                                   |  9 +++
>   serve.c                                       | 19 +++++
>   t/t5705-trace2-sid-in-capabilities.sh         | 79 +++++++++++++++++++
>   transport.c                                   | 10 +++
>   upload-pack.c                                 | 23 +++++-
>   10 files changed, 190 insertions(+), 3 deletions(-)
>   create mode 100755 t/t5705-trace2-sid-in-capabilities.sh
>

Junio C Hamano Oct. 30, 2020, 10:31 p.m. UTC | #2

Josh Steadmon <steadmon@google.com> writes:

> 2) upload-pack generally takes configuration via flags rather than
>   gitconfig. From offline discussions, it sounds like this is an
>   intentional choice to limit potential vulnerability from malicious
>   configs in local repositories accessed via the file:// URL scheme. Is
>   it reasonable to load the trace2.announceSID option from config files
>   in upload-pack, or should this be changed to a flag?

I do not know about your offline discussion, but it certainly would
make it easier to reason about the attack surface if we know it
never gets affected by any configuration files.

Having said that, upload-pack.c::upload_pack_config() already reads
a lot from the configuration file, many of these variables are named
"allowSomething", so...

IOW, I do not see why the announceSID (should it be in trace2.*
hierarchy, though?) needs to be treated in any more sensitive than
say uploadpack.allowrefinwant or *.allowfilter variables.

Josh Steadmon Nov. 2, 2020, 10:20 p.m. UTC | #3

On 2020.10.30 11:54, Jeff Hostetler wrote:
> 
> 
> On 10/29/20 5:32 PM, Josh Steadmon wrote:
> > In order to more easily debug remote operations, it is useful to be able
> > to inspect both client-side and server-side traces. This series allows
> > clients to record the server's trace2 session ID, and vice versa, by
> > advertising the SID in a new "trace2-sid" protocol capability.
> > 
> 
> Very nice!  This should be very helpful when matching up client and
> server commands.
> 
> 
> > Two questions in particular for reviewers:
> > 
> > 1) Is trace2/tr2_sid.h intended to be visible to the rest of the code,
> >    or is the trace2/ directory supposed to be opaque implementation
> >    detail? If the latter, would it be acceptable to move tr2_sid_get()
> >    into trace2.h?
> 
> I put all the trace2-private stuff in that sub-directory and gave
> everything tr2_ prefixes to hide the details as much as I could
> (and as an alternative to the usual tendency of having everything
> be static within a massive .c file).
> 
> So, yeah, my intent was to make all of it opaque.
> So that just trace2.h contains the official API.
> 
> Perhaps in trace2.c you could create:
> 
> const char *trace2_session_id(void)
> {
>     return tr2_sid_get();
> }

Done in V2, thanks.


> > 2) upload-pack generally takes configuration via flags rather than
> >    gitconfig. From offline discussions, it sounds like this is an
> >    intentional choice to limit potential vulnerability from malicious
> >    configs in local repositories accessed via the file:// URL scheme. Is
> >    it reasonable to load the trace2.announceSID option from config files
> >    in upload-pack, or should this be changed to a flag?
> 
> I don't have the history to comment on this.
> 
> One thing to consider is that the SID for a Git process is built up
> from the SID of the parent and the unique data for the current process.
> So for example, if `git fetch` has SID `<sid1>` and it spawns
> `git upload-pack`, the child will have SID `<sid1>/<sid2>` and if that
> spawns `git index-pack`, that child will have `<sid1>/<sid2>/<sid3>`.
> This is very helpful when tracking down parent/child relationships
> and perf hunting.
> 
> This SID inheritance is passed via an environment variable to
> the child, which extends it and passes the longer version to its
> children.
> 
> So the value being passed between client and server over the
> protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a
> single `<sid_x>` term.  For your purposes, do you want or care if
> you get the single term or the full SID ?

I'm not sure we care too much one way or the other. A single component
of the SID should be enough to join client & server logs, but it's
easier to just send the whole thing.


> Also, there's nothing to stop someone from seeding that environment
> variable in their shell with some mischief before launching the
> top-level Git command.  So the above example might see the SID as
> `<mischief>/<sid1>/<sid2>/<sid3>`.  I'm not sure if this could be
> abused when inserted into the V0/V1/V2 protocol or your logging
> facility.
> 
>     $ GIT_TRACE2_EVENT=1 GIT_TRACE2_PARENT_SID=hello git version
>     {"event":"version","sid":"hello/20201030T154143.660608Z-H86606a97-
> P00001d30",...}
>     ...
> 
> So maybe we want to have a public API to return a pointer to just
> the final `<sid_x>` term ?  (Then again, maybe I worry too much.)

Yeah, it's certainly possible to muck with the SID as you describe, but
I'm not sure I see any big problems that could be caused. If someone
points out an issue I've missed, I'll be happy to change this, though.

> Thanks,
> Jeff

Thanks for the review!


> > Josh Steadmon (10):
> >    docs: new capability to advertise trace2 SIDs
> >    docs: new trace2.advertiseSID option
> >    upload-pack: advertise trace2 SID in v0 capabilities
> >    receive-pack: advertise trace2 SID in v0 capabilities
> >    serve: advertise trace2 SID in v2 capabilities
> >    transport: log received server trace2 SID
> >    fetch-pack: advertise trace2 SID in capabilities
> >    upload-pack, serve: log received client trace2 SID
> >    send-pack: advertise trace2 SID in capabilities
> >    receive-pack: log received client trace2 SID
> > 
> >   Documentation/config/trace2.txt               |  4 +
> >   .../technical/protocol-capabilities.txt       | 13 ++-
> >   Documentation/technical/protocol-v2.txt       |  9 +++
> >   builtin/receive-pack.c                        | 16 ++++
> >   fetch-pack.c                                  | 11 +++
> >   send-pack.c                                   |  9 +++
> >   serve.c                                       | 19 +++++
> >   t/t5705-trace2-sid-in-capabilities.sh         | 79 +++++++++++++++++++
> >   transport.c                                   | 10 +++
> >   upload-pack.c                                 | 23 +++++-
> >   10 files changed, 190 insertions(+), 3 deletions(-)
> >   create mode 100755 t/t5705-trace2-sid-in-capabilities.sh
> >

Junio C Hamano Nov. 3, 2020, 9:22 p.m. UTC | #4

Josh Steadmon <steadmon@google.com> writes:

>> So the value being passed between client and server over the
>> protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a
>> single `<sid_x>` term.  For your purposes, do you want or care if
>> you get the single term or the full SID ?
>
> I'm not sure we care too much one way or the other. A single component
> of the SID should be enough to join client & server logs, but it's
> easier to just send the whole thing.

It may be worth documenting this design decision in the protocol
doc; even though protocol doc may say this should be treated as an
opaque token, people may assume certain structure.

Jeff Hostetler Nov. 5, 2020, 9:01 p.m. UTC | #5

On 11/3/20 4:22 PM, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
>>> So the value being passed between client and server over the
>>> protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a
>>> single `<sid_x>` term.  For your purposes, do you want or care if
>>> you get the single term or the full SID ?
>>
>> I'm not sure we care too much one way or the other. A single component
>> of the SID should be enough to join client & server logs, but it's
>> easier to just send the whole thing.
> 
> It may be worth documenting this design decision in the protocol
> doc; even though protocol doc may say this should be treated as an
> opaque token, people may assume certain structure.
> 

good point.  let me make a note to revisit that text.

thanks
Jeff

Josh Steadmon Nov. 10, 2020, 9:37 p.m. UTC | #6

On 2020.11.03 13:22, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
> >> So the value being passed between client and server over the
> >> protocol may look like `<sid1>/<sid2>/<sid3>` rather than just a
> >> single `<sid_x>` term.  For your purposes, do you want or care if
> >> you get the single term or the full SID ?
> >
> > I'm not sure we care too much one way or the other. A single component
> > of the SID should be enough to join client & server logs, but it's
> > easier to just send the whole thing.
> 
> It may be worth documenting this design decision in the protocol
> doc; even though protocol doc may say this should be treated as an
> opaque token, people may assume certain structure.

Done in V3.

[00/10] Advertise trace2 SID in protocol capabilities

Message

Comments