mbox series

[v3,0/6,Outreachy] Introduce os-version Capability with Configurable Options

Message ID 20250124122217.250925-1-usmanakinyemi202@gmail.com (mailing list archive)
Headers show
Series Introduce os-version Capability with Configurable Options | expand

Message

Usman Akinyemi Jan. 24, 2025, 12:21 p.m. UTC
For debugging, statistical analysis, and security purposes, it can
be valuable for Git servers to know the operating system the clients
are using.

For example:
- A server noticing that a client is using an old Git version with
security issues on one platform, like macOS, could verify if the
user is indeed running macOS before sending a message to upgrade."
- Similarly, a server identifying a client that could benefit from
an upgrade (e.g., for performance reasons) could better customize the
message it sends to nudge the client to upgrade.

So let's add a new 'os-version' capability to the v2 protocol, in the
same way as the existing 'agent' capability that lets clients and servers
exchange the Git version they are running.

Having the `os-version` protocol capability separately from other protocol
capabilities like `agent` is beneficial in ways like:

- It provides a clear separation between Git versioning and OS-specific,
concerns making troubleshooting and environment analysis more modular.
- It ensures we do not disrupt people's scripts that collect statistics
from other protocol capabilities like `agent`.
- It offers flexibility for possible future extensibility, allowing us to
add additional system-level details without modifying existing `agent`
parsing logic.
- It provides better control over privacy and security by allowing
selective exposure of OS information.

By default this sends similar info as `git bugreport` is already sending,
which uses uname(2). The difference is that it is sanitized in the same
way as the Git version sent by the 'agent' capability is sanitized
(by replacing characters having an ascii code less than 32 or more
than 127 with '.'). Also, it only sends the result of `uname -s` i.e
just only the operating system name (e.g "Linux").

Due to privacy issues and concerns, let's add the `transfer.advertiseOSVersion`
config option. This boolean option is enabled by default, but allows users to
disable this feature completely by setting it to "false".

Note that, due to differences between `uname(1)` (command-line
utility) and `uname(2)` (system call) outputs on Windows,
`transfer.advertiseOSVersion` is set to false on Windows during
testing. See the message part of patch 5/6 for more details.

My mentor, Christian Couder, sent a previous patch series about this
before. You can find it here
https://lore.kernel.org/git/20240619125708.3719150-1-christian.couder@gmail.com/

Changes since v2
================
  - Dropped the last patch which introduced `osversion.command`.
  - Use isprint() for checking printables byte in a preparatory
  patch. 
  - Add a few reasons why we should have `os-version` as a separate
  capability in the commit message that introduces it.
  - Improve how `printf` is used in the tests for better clarity.
  - Refactor documentation for improved clarity.
  - Retrieve and immediately sanitize the system information in the
  same function for simpler API surface.

Usman Akinyemi (6):
  version: replace manual ASCII checks with isprint() for clarity
  version: refactor redact_non_printables()
  version: refactor get_uname_info()
  version: extend get_uname_info() to hide system details
  t5701: add setup test to remove side-effect dependency
  connect: advertise OS version

 Documentation/config/transfer.txt |  7 +++
 Documentation/gitprotocol-v2.txt  | 17 +++++++
 builtin/bugreport.c               | 13 +-----
 connect.c                         |  3 ++
 serve.c                           | 14 ++++++
 t/t5555-http-smart-common.sh      | 10 ++++-
 t/t5701-git-serve.sh              | 30 +++++++++++--
 t/test-lib-functions.sh           |  8 ++++
 version.c                         | 73 ++++++++++++++++++++++++++++---
 version.h                         | 22 ++++++++++
 10 files changed, 175 insertions(+), 22 deletions(-)

Range-diff versus v2:

-:  ---------- > 1:  82b62c5e66 version: replace manual ASCII checks with isprint() for clarity
1:  97bccab6d5 ! 2:  0a7d7ce871 version: refactor redact_non_printables()
    @@ version.c
     +/*
     + * Trim and replace each character with ascii code below 32 or above
     + * 127 (included) using a dot '.' character.
    -+ * TODO: ensure consecutive non-printable characters are only replaced once
    -+*/
    ++ */
     +static void redact_non_printables(struct strbuf *buf)
     +{
     +	strbuf_trim(buf);
     +	for (size_t i = 0; i < buf->len; i++) {
    -+		if (buf->buf[i] <= 32 || buf->buf[i] >= 127)
    ++		if (!isprint(buf->buf[i]) || buf->buf[i] == ' ')
     +			buf->buf[i] = '.';
     +	}
     +}
    @@ version.c: const char *git_user_agent_sanitized(void)
      		strbuf_addstr(&buf, git_user_agent());
     -		strbuf_trim(&buf);
     -		for (size_t i = 0; i < buf.len; i++) {
    --			if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
    +-			if (!isprint(buf.buf[i]) || buf.buf[i] == ' ')
     -				buf.buf[i] = '.';
     -		}
     -		agent = buf.buf;
2:  1f8a4024a4 ! 3:  0187db59a4 version: refactor get_uname_info()
    @@ builtin/bugreport.c: static void get_system_info(struct strbuf *sys_info)
     
      ## version.c ##
     @@
    - #include "version.h"
      #include "version-def.h"
      #include "strbuf.h"
    + #include "sane-ctype.h"
     +#include "gettext.h"
      
      const char git_version_string[] = GIT_VERSION;
3:  962b42702f ! 4:  d3a3573594 version: extend get_uname_info() to hide system details
    @@ Commit message
         version: extend get_uname_info() to hide system details
     
         Currently, get_uname_info() function provides the full OS information.
    -    In a follwing commit, we will need it to provide only the OS name.
    +    In a following commit, we will need it to provide only the OS name.
     
         Let's extend it to accept a "full" flag that makes it switch between
         providing full OS information and providing only the OS name.
4:  7f0ec75a0d ! 5:  d9edd2ffc8 t5701: add setup test to remove side-effect dependency
    @@ t/t5701-git-serve.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
      
     -test_expect_success 'test capability advertisement' '
     +test_expect_success 'setup to generate files with expected content' '
    -+	printf "agent=git/$(git version | cut -d" " -f3)" >agent_and_osversion &&
    ++	printf "agent=git/%s\n" "$(git version | cut -d" " -f3)" >agent_and_osversion &&
     +
      	test_oid_cache <<-EOF &&
      	wrong_algo sha1:sha256
    @@ t/t5701-git-serve.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
      	ls-refs=unborn
      	fetch=shallow wait-for-done
      	server-option
    -@@ t/t5701-git-serve.sh: test_expect_success 'test capability advertisement' '
    - 	cat >expect.trailer <<-EOF &&
    + 	object-format=$(test_oid algo)
    + 	EOF
    +-	cat >expect.trailer <<-EOF &&
    ++	cat >expect.trailer <<-EOF
      	0000
      	EOF
     +'
5:  007f8582d9 ! 6:  8a936b25f7 connect: advertise OS version
    @@ Commit message
         a server is using.
     
         Let's introduce a new protocol (`os-version`) allowing Git clients and
    -    servers to exchange operating system information. The protocol is
    -    controlled by the new `transfer.advertiseOSVersion` config option.
    +    servers to exchange operating system information.
    +
    +    Having the `os-version` protocol capability separately from other protocol
    +    capabilities like `agent` is beneficial in ways like:
    +
    +    - It provides a clear separation between Git versioning and OS-specific,
    +    concerns making troubleshooting and environment analysis more modular.
    +    - It ensures we do not disrupt people's scripts that collect statistics
    +    from other protocol like `agent`.
    +    - It offers flexibility for possible future extensibility, allowing us to
    +    add additional system-level details without modifying existing `agent`
    +    parsing logic.
    +    - It provides better control over privacy and security by allowing
    +    selective exposure of OS information.
     
         Add the `transfer.advertiseOSVersion` config option to address
         privacy concerns. It defaults to `true` and can be changed to
         `false`. When enabled, this option makes clients and servers send each
         other the OS name (e.g., "Linux" or "Windows"). The information is
    -    retrieved using the 'sysname' field of the `uname(2)` system call.
    +    retrieved using the 'sysname' field of the `uname(2)` system call or its
    +    equivalent.
     
         However, there are differences between `uname(1)` (command-line utility)
         and `uname(2)` (system call) outputs on Windows. These discrepancies
    @@ Documentation/config/transfer.txt: transfer.bundleURI::
      	servers. Defaults to false.
     +
     +transfer.advertiseOSVersion::
    -+	When `true`, the `os-version` capability is advertised by clients and
    -+	servers. It makes clients and servers send to each other a string
    -+	representing the operating system name, like "Linux" or "Windows".
    -+	This string is retrieved from the `sysname` field of the struct returned
    -+	by the uname(2) system call. Defaults to true.
    ++	When set to `true` on the server, the server will advertise its
    ++	`os-version` capability to the client. On the client side, if set
    ++	to `true`, it will advertise its `os-version` capability to the
    ++	server only if the server also advertises its `os-version` capability.
    ++	Defaults to true.
     
      ## Documentation/gitprotocol-v2.txt ##
     @@ Documentation/gitprotocol-v2.txt: printable ASCII characters except space (i.e., the byte range 32 < x <
    @@ Documentation/gitprotocol-v2.txt: printable ASCII characters except space (i.e.,
     +In the same way as the `agent` capability above, the server can
     +advertise the `os-version` capability to notify the client the
     +kind of operating system it is running on. The client may optionally
    -+send its own `os-version` capability, to notify the server the kind of
    -+operating system it is also running on in its request to the server
    ++send its own `os-version` capability, to notify the server the kind
    ++of operating system it is also running on in its request to the server
     +(but it MUST NOT do so if the server did not advertise the os-version
     +capability). The value of this capability may consist of ASCII printable
    -+characters(from 33 to 126 inclusive) and are typically made from the result of
    -+`uname -s`(OS name e.g Linux). The os-version capability can be disabled
    -+entirely by setting the `transfer.advertiseOSVersion` config option
    -+to `false`. The `os-version` strings are purely informative for
    ++characters(from 33 to 126 inclusive) and are typically made from the
    ++result of `uname -s`(OS name e.g Linux). The os-version capability can
    ++be disabled entirely by setting the `transfer.advertiseOSVersion` config
    ++option to `false`. The `os-version` strings are purely informative for
     +statistics and debugging purposes, and MUST NOT be used to
    -+programmatically assume the presence or absence of particular
    -+features.
    ++programmatically assume the presence or absence of particular features.
     +
      ls-refs
      ~~~~~~~
    @@ t/t5555-http-smart-common.sh: test_expect_success 'git receive-pack --advertise-
      '
      
      test_expect_success 'git upload-pack --advertise-refs: v2' '
    -+	printf "agent=FAKE" >agent_and_osversion &&
    ++	printf "agent=FAKE\n" >agent_and_osversion &&
     +	if test_have_prereq WINDOWS
     +	then
     +		git config transfer.advertiseOSVersion false
     +	else
    -+		printf "\nos-version=%s\n" $(uname -s | test_redact_non_printables) >>agent_and_osversion
    ++		printf "os-version=%s\n" $(uname -s | test_redact_non_printables) >>agent_and_osversion
     +	fi &&
     +
      	cat >expect <<-EOF &&
    @@ t/t5555-http-smart-common.sh: test_expect_success 'git receive-pack --advertise-
     
      ## t/t5701-git-serve.sh ##
     @@ t/t5701-git-serve.sh: test_expect_success 'setup to generate files with expected content' '
    - 	cat >expect.trailer <<-EOF &&
    + 	server-option
    + 	object-format=$(test_oid algo)
    + 	EOF
    +-	cat >expect.trailer <<-EOF
    ++	cat >expect.trailer <<-EOF &&
      	0000
      	EOF
     +
    @@ t/t5701-git-serve.sh: test_expect_success 'setup to generate files with expected
     +	then
     +		git config transfer.advertiseOSVersion false
     +	else
    -+		printf "\nos-version=%s\n" $(uname -s | test_redact_non_printables) >>agent_and_osversion
    ++		printf "os-version=%s\n" $(uname -s | test_redact_non_printables) >>agent_and_osversion
     +	fi &&
     +
     +	cat >expect_osversion.base <<-EOF
    @@ t/test-lib-functions.sh: test_trailing_hash () {
     +# Trim and replace each character with ascii code below 32 or above
     +# 127 (included) using a dot '.' character.
     +# Octal intervals \001-\040 and \177-\377
    -+# corresponds to decimal intervals 1-32 and 127-255
    ++# correspond to decimal intervals 1-32 and 127-255
     +test_redact_non_printables () {
     +    tr -d "\n\r" | tr "[\001-\040][\177-\377]" "."
     +}
     
      ## version.c ##
     @@
    - #include "version-def.h"
      #include "strbuf.h"
    + #include "sane-ctype.h"
      #include "gettext.h"
     +#include "config.h"
      
    @@ version.c: int get_uname_info(struct strbuf *buf, unsigned int full)
      	return 0;
      }
     +
    -+const char *os_version(void)
    ++const char *os_version_sanitized(void)
     +{
     +	static const char *os = NULL;
     +
    @@ version.c: int get_uname_info(struct strbuf *buf, unsigned int full)
     +		struct strbuf buf = STRBUF_INIT;
     +
     +		get_uname_info(&buf, 0);
    ++		/* Sanitize the os information immediately */
    ++		redact_non_printables(&buf);
     +		os = strbuf_detach(&buf, NULL);
     +	}
     +
     +	return os;
     +}
     +
    -+const char *os_version_sanitized(void)
    -+{
    -+	static const char *os_sanitized = NULL;
    -+
    -+	if (!os_sanitized) {
    -+		struct strbuf buf = STRBUF_INIT;
    -+
    -+		strbuf_addstr(&buf, os_version());
    -+		redact_non_printables(&buf);
    -+		os_sanitized = strbuf_detach(&buf, NULL);
    -+	}
    -+
    -+	return os_sanitized;
    -+}
    -+
     +int advertise_os_version(struct repository *r)
     +{
     +	static int transfer_advertise_os_version = -1;
    @@ version.h: const char *git_user_agent_sanitized(void);
      int get_uname_info(struct strbuf *buf, unsigned int full);
      
     +/*
    -+  Retrieve and cache system information for subsequent calls.
    -+  Return a pointer to the cached system information string.
    -+*/
    -+const char *os_version(void);
    -+
    -+/*
    -+  Retrieve system information string from os_version(). Then
    -+  sanitize and cache it. Return a pointer to the sanitized
    -+  system information string.
    ++  Retrieve, sanitize and cache system information for subsequent
    ++  calls. Return a pointer to the sanitized system information
    ++  string.
     +*/
     +const char *os_version_sanitized(void);
     +
6:  10a07a3095 < -:  ---------- version: introduce osversion.command config for os-version output

Comments

Junio C Hamano Jan. 24, 2025, 6:39 p.m. UTC | #1
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

> For debugging, statistical analysis, and security purposes, it can
> be valuable for Git servers to know the operating system the clients
> are using.

OK.  I think the reorganization done in this round makes it much
easier to see what is going on in each step.  Very well done.

The only remaining issue from my point of view is if we really want
this as a separate and new knob with capability, or if we would be
better off to carry this kind of extra piece of information by
enhancing existing "agent" capability.  Given what Web Browsers do
in their UA strings, it does feel cumbersome for analitics tools to
pay attention to two separate input sources (os-version and agent).

Has somebody brought up any downsides of cramming the OS information
to the existing agent thing?  I have not thought of any possible
downsides since I made this suggestion in a previous review of this
topic, but I may be missing something obvious, so...

Thanks.
Christian Couder Jan. 27, 2025, 1:38 p.m. UTC | #2
On Fri, Jan 24, 2025 at 7:39 PM Junio C Hamano <gitster@pobox.com> wrote:

> The only remaining issue from my point of view is if we really want
> this as a separate and new knob with capability, or if we would be
> better off to carry this kind of extra piece of information by
> enhancing existing "agent" capability.  Given what Web Browsers do
> in their UA strings, it does feel cumbersome for analitics tools to
> pay attention to two separate input sources (os-version and agent).
>
> Has somebody brought up any downsides of cramming the OS information
> to the existing agent thing?  I have not thought of any possible
> downsides since I made this suggestion in a previous review of this
> topic, but I may be missing something obvious, so...

My opinion is that it isn't a good idea to enhance the existing
"agent" capability. Yeah, it goes in the same direction as what web
browsers have been doing with the User-Agent header, but I think web
browsers are an especially bad example that we should strive not to
follow.

According to Wikipedia
(https://en.wikipedia.org/wiki/User-Agent_header) the format for the
User-Agent header is now "Mozilla/[version] ([system and browser
information]) [platform] ([platform details]) [extensions]", for
example "Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us)
AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405". This is
obviously very difficult to parse for everyone including analytics
tools and is not very flexible either. It serves as a way to pass
information about available features, but leak some privacy
information in the process. The fact that it's used to pass
information about available features has led to a lot of user agent
spoofing which means that analytics, statistics and debugging are
likely harder than they need to be.

When Git developed capabilities and the "agent" capability, the doc
took care of saying things that it "MUST NOT be used to
programmatically assume the presence or absence of particular
features". This was done to go in the direction of not passing more
information through this "agent" capability but instead use separate
ones. So I think we should just avoid putting other things in the
"agent"  capability to avoid what happened to the User-Agent header in
browsers and to stay true to our original intent to have a different
capability for each advertised information or feature.
Junio C Hamano Jan. 27, 2025, 3:26 p.m. UTC | #3
Christian Couder <christian.couder@gmail.com> writes:

> information in the process. The fact that it's used to pass
> information about available features has led to a lot of user agent
> spoofing which means that analytics, statistics and debugging are
> likely harder than they need to be.

Yes, that is a valid viewpoint, but ...

> When Git developed capabilities and the "agent" capability, the doc
> took care of saying things that it "MUST NOT be used to
> programmatically assume the presence or absence of particular
> features".

... the proposed os-version thing has the same wording in its
documentation, doesn't it?  What is being added is not to be used
in a way that requires parsing and trusting the result.

So unless your point is that users (like those who parse User-Agent
string by browsers) will do the wrong thing and assume these strings
are usable for feature detection anyway so we should make it easier
to parse, I'd have to disagree.  If we are not aiming to make it
easier to parse and assume certain things that we do not want them
to, I do not see why we need to have the pieces of information in
two separate capabilities.

Thanks.
Christian Couder Jan. 31, 2025, 2:30 p.m. UTC | #4
On Mon, Jan 27, 2025 at 4:26 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > information in the process. The fact that it's used to pass
> > information about available features has led to a lot of user agent
> > spoofing which means that analytics, statistics and debugging are
> > likely harder than they need to be.
>
> Yes, that is a valid viewpoint, but ...
>
> > When Git developed capabilities and the "agent" capability, the doc
> > took care of saying things that it "MUST NOT be used to
> > programmatically assume the presence or absence of particular
> > features".
>
> ... the proposed os-version thing has the same wording in its
> documentation, doesn't it?

Yeah, we repeat it to make sure that users read it. I am fine with
refactoring that wording if we think that having it once is enough.

> What is being added is not to be used
> in a way that requires parsing and trusting the result.

Why not? If server people want to do OS stats on their clients, for
example, why shouldn't they parse and trust the result?

> So unless your point is that users (like those who parse User-Agent
> string by browsers) will do the wrong thing and assume these strings
> are usable for feature detection anyway so we should make it easier
> to parse, I'd have to disagree.

We should make it easy to parse because people will use this field
(otherwise why are we adding it?), and we want to make it easy to use
rather than hard just because we are nice with our users.

I think we should not assume that they will do the wrong thing,
especially if our docs are clear about how it shouldn't be used.

>  If we are not aiming to make it
> easier to parse and assume certain things that we do not want them
> to, I do not see why we need to have the pieces of information in
> two separate capabilities.

I think it's just the right thing to make it easy to parse. Doing OS
stats on the server side doesn't need to be unnecessarily hard.

By the way, if we put the OS information in the "agent" capability,
how do we separate it from the existing "package/version" content and
make it easy to parse? I don't see a good solution because
GIT_USER_AGENT could be used, and the config option to not show the OS
name could be used too.

Also we don't know what could be in the "version" part. The doc says
that the agent part is typically of the form "package/version" but
doesn't require it.

Thanks.
Junio C Hamano Jan. 31, 2025, 4:37 p.m. UTC | #5
Christian Couder <christian.couder@gmail.com> writes:

> By the way, if we put the OS information in the "agent" capability,
> how do we separate it from the existing "package/version" content and
> make it easy to parse?

Do NOT parse, period.

If three "things" that talk the Git protocol on the other end of the
connection gives "Linux git/2.48.0", and "macOS libgit2/1.9.0", and
"Windows git/2.47.1" as their (enhanced) "agent" strings, there is
no "ah, this one is 1.9.0 which way older than 2.47.1 so it must be
missing features X and Y" the users of the information are allowed
to infer.

Just take it as a single opaque string, and group identical ones.

In the above scenario, we found three different kinds now.  Maybe
we'll accumulate the counts and notice that there are N times as
many connections whose agent string begins with "Windows" as "Linux"
and "macOS" combined or something.  That would be an offline
analysis, and forcing users to do the stats offline would reduce the
temptation to use it for purposes other than its intended one.

You may find "ImNotTellingYou" and may wonder what OS the user is
really using, but they do not want to tell you, so you honor their
wish.

> I don't see a good solution because
> GIT_USER_AGENT could be used, and the config option to not show the OS
> name could be used too.

That is a good privacy measure.

> Also we don't know what could be in the "version" part. The doc says
> that the agent part is typically of the form "package/version" but
> doesn't require it.

Exactly.  I would think it is a feature, and the way to treat the
string in line with the philosophy behind that feature is to take it
as a single opaque thing.
Usman Akinyemi Jan. 31, 2025, 7:42 p.m. UTC | #6
On Fri, Jan 31, 2025 at 10:07 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > By the way, if we put the OS information in the "agent" capability,
> > how do we separate it from the existing "package/version" content and
> > make it easy to parse?
>
> Do NOT parse, period.
>
> If three "things" that talk the Git protocol on the other end of the
> connection gives "Linux git/2.48.0", and "macOS libgit2/1.9.0", and
> "Windows git/2.47.1" as their (enhanced) "agent" strings, there is
> no "ah, this one is 1.9.0 which way older than 2.47.1 so it must be
> missing features X and Y" the users of the information are allowed
> to infer.
Hi Junio,

Do you have any concerns "git/2.47.1 Windows" instead of
"Windows git/2.47.1" ?

Thank you.
>
> Just take it as a single opaque string, and group identical ones.
>
> In the above scenario, we found three different kinds now.  Maybe
> we'll accumulate the counts and notice that there are N times as
> many connections whose agent string begins with "Windows" as "Linux"
> and "macOS" combined or something.  That would be an offline
> analysis, and forcing users to do the stats offline would reduce the
> temptation to use it for purposes other than its intended one.
>
> You may find "ImNotTellingYou" and may wonder what OS the user is
> really using, but they do not want to tell you, so you honor their
> wish.
>
> > I don't see a good solution because
> > GIT_USER_AGENT could be used, and the config option to not show the OS
> > name could be used too.
>
> That is a good privacy measure.
>
> > Also we don't know what could be in the "version" part. The doc says
> > that the agent part is typically of the form "package/version" but
> > doesn't require it.
>
> Exactly.  I would think it is a feature, and the way to treat the
> string in line with the philosophy behind that feature is to take it
> as a single opaque thing.
>
>
Usman Akinyemi Jan. 31, 2025, 7:46 p.m. UTC | #7
On Fri, Jan 31, 2025 at 10:07 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > By the way, if we put the OS information in the "agent" capability,
> > how do we separate it from the existing "package/version" content and
> > make it easy to parse?
>
> Do NOT parse, period.
>
> If three "things" that talk the Git protocol on the other end of the
> connection gives "Linux git/2.48.0", and "macOS libgit2/1.9.0", and
> "Windows git/2.47.1" as their (enhanced) "agent" strings, there is
> no "ah, this one is 1.9.0 which way older than 2.47.1 so it must be
> missing features X and Y" the users of the information are allowed
> to infer.
>
> Just take it as a single opaque string, and group identical ones.
>
> In the above scenario, we found three different kinds now.  Maybe
> we'll accumulate the counts and notice that there are N times as
> many connections whose agent string begins with "Windows" as "Linux"
> and "macOS" combined or something.  That would be an offline
> analysis, and forcing users to do the stats offline would reduce the
> temptation to use it for purposes other than its intended one.
>
> You may find "ImNotTellingYou" and may wonder what OS the user is
> really using, but they do not want to tell you, so you honor their
> wish.
While the current implementation allows user to specify this form of string
 i.e "ImNotTellingYou", for agent value, it is not mentioned in the docs,
I will add in the next iteration.
>
> > I don't see a good solution because
> > GIT_USER_AGENT could be used, and the config option to not show the OS
> > name could be used too.
>
> That is a good privacy measure.
>
> > Also we don't know what could be in the "version" part. The doc says
> > that the agent part is typically of the form "package/version" but
> > doesn't require it.
>
> Exactly.  I would think it is a feature, and the way to treat the
> string in line with the philosophy behind that feature is to take it
> as a single opaque thing.
>
>
Junio C Hamano Jan. 31, 2025, 8:15 p.m. UTC | #8
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

> Do you have any concerns "git/2.47.1 Windows" instead of
> "Windows git/2.47.1" ?

Either is fine.  I expect that

 (1) Implementors on _our_ side will do the sensible thing and
     reviewers help them to make sure, where the definition of "the
     sensible thing" will be that whatever order we pick, we
     consistently use that same order.  If "git/2.47.1 Windows" is
     how GfW identifies itself, "git/2.48.1 Linux" or "git/2.49.0
     macOS" would be its contemporary counterparts, and _our_
     binaries would not identify themselves as "Linux git/2.49.0".

 (2) Implementors of third-party reimplementations of Git will just
     mimick what we will do, as long as we tell them our intention
     (i.e. this is a single opaque unparsable string to be collected
     for statistics, nothing more) clearly enough.

 (3) Most users are lazy and/or trusting enough that only a very few
     minority privacy conscious folks would configure it away,
     making their "IamNotTellingYou" agent string merely an
     insignificant noise in the statistics.
Junio C Hamano Jan. 31, 2025, 8:17 p.m. UTC | #9
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

>> You may find "ImNotTellingYou" and may wonder what OS the user is
>> really using, but they do not want to tell you, so you honor their
>> wish.
> While the current implementation allows user to specify this form of string
>  i.e "ImNotTellingYou", for agent value, it is not mentioned in the docs,
> I will add in the next iteration.

OK.  You may want to wait before hearing other's opinions, though,
for at least the time it takes for the earth to rotate once.

Thanks.