diff mbox series

[v4,6/6] agent: advertise OS name via agent capability

Message ID 20250205185246.111447-7-usmanakinyemi202@gmail.com (mailing list archive)
State Superseded
Headers show
Series extend agent capability to include OS name | expand

Commit Message

Usman Akinyemi Feb. 5, 2025, 6:52 p.m. UTC
As some issues that can happen with a Git client can be operating system
specific, it can be useful for a server to know which OS a client is
using. In the same way it can be useful for a client to know which OS
a server is using.

Our current agent capability is in the form of "package/version" (e.g.,
"git/1.8.3.1"). Let's extend it to include the operating system name (os)
i.e in the form "package/version os" (e.g., "git/1.8.3.1 Linux").

Including OS details in the agent capability simplifies implementation,
maintains backward compatibility, avoids introducing a new capability,
encourages adoption across Git-compatible software, and enhances
debugging by providing complete environment information without affecting
functionality.

Add the `transfer.advertiseOSInfo` config option to address privacy
concerns. It defaults to `true` and can be changed to `false`.
When `true`, both the client and server independently append their
operating system name(os) to the `agent` capability value. The `agent`
capability will now be in form of "package/version os" (e.g.,
"git/1.8.3.1 Linux"). When `false`, the `agent` capability will be
in the form of "package/version" e.g "git/1.8.3.1". The server's
configuration is independent of the client's. Defaults to `true`.
The operating system name is retrieved using the 'sysname' field of
the `uname(2)` system call or its equivalent.

However, there are differences between `uname(1)` (command-line utility)
and `uname(2)` (system call) outputs on Windows. These discrepancies
complicate testing on Windows platforms. For example:
  - `uname(1)` output: MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64\
  .2024-02-14.20:17.UTC.x86_64
  - `uname(2)` output: Windows.10.0.20348

On Windows, uname(2) is not actually system-supplied but is instead
already faked up by Git itself. We could have overcome the test issue
on Windows by implementing a new `uname` subcommand in `test-tool`
using uname(2), but except uname(2), which would be tested against
itself, there would be nothing platform specific, so it's just simpler
to disable the tests on Windows.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
---
 Documentation/config/transfer.txt |  8 +++++++
 Documentation/gitprotocol-v2.txt  | 15 ++++++++-----
 t/t5555-http-smart-common.sh      | 10 ++++++++-
 t/t5701-git-serve.sh              |  9 +++++++-
 t/test-lib-functions.sh           |  8 +++++++
 version.c                         | 37 +++++++++++++++++++++++++++++++
 version.h                         | 15 +++++++++++++
 7 files changed, 95 insertions(+), 7 deletions(-)

Comments

Junio C Hamano Feb. 5, 2025, 9:48 p.m. UTC | #1
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

> As some issues that can happen with a Git client can be operating system
> specific, it can be useful for a server to know which OS a client is
> using. In the same way it can be useful for a client to know which OS
> a server is using.
>
> Our current agent capability is in the form of "package/version" (e.g.,
> "git/1.8.3.1"). Let's extend it to include the operating system name (os)
> i.e in the form "package/version os" (e.g., "git/1.8.3.1 Linux").
>
> Including OS details in the agent capability simplifies implementation,
> maintains backward compatibility, avoids introducing a new capability,
> encourages adoption across Git-compatible software, and enhances
> debugging by providing complete environment information without affecting
> functionality.

I obviously agree with the benefits enumerated in the above
paragraph.  The simpler, the better.

I however wonder ...

> Add the `transfer.advertiseOSInfo` config option to address privacy
> concerns. It defaults to `true` and can be changed to `false`.

... if this configuration knob is at the right granularity.

For privacy concious folks, I would imagine that the distinction
between "git/1.8.3.1" vs "git/2.48.1" would be something they do not
want to reveal equally as, if not more than, which Operating System
they are on.  Such a privacy concious user may already be using
GIT_USER_AGENT environment variable to squelch it already, anyway.

If we were to give them an improvement in the area for privacy
features, I would think it would be to add a configuration variable
to turn the agent off, instead of having to leave GIT_USER_AGENT
environment variable set in the environment of their processes.

On the other hand, for the rest of us who think "git/1.8.3.1 Linux"
is not too much of a secret, we do not need a knob to configure it
between "git/1.8.3.1" and "git/1.8.3.1 Linux".

So, while I view some parts of the series would have been a good
exercise to use various features (like config subsystem) from our
API, I prefer if we kept the end-user interface not overly
customizable (iow, without a config-knob, we do not need to add a
code to inspect the new configuration variable).

After all, GIT_USER_AGENT let's you hide not just the OS part but
any other things from the user-agent string already.

I notice that unlike user_agent() vs user_agent_sanitized(), you
only have a single function for os_info(), which I think is a good
design.  But if we were to go that route, shouldn't we call the
function os_info(), not os_info_sanitized()?  The idea behind a
single function is that you cannot obtain unsanitized version of
os_info() out of the system at all, so what _sanitized() returns
would be what os_info() without _sanitized suffix would return to
the caller anyway.

Thanks.
Usman Akinyemi Feb. 6, 2025, 6:37 a.m. UTC | #2
On Thu, Feb 6, 2025 at 3:18 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Usman Akinyemi <usmanakinyemi202@gmail.com> writes:
>
> > As some issues that can happen with a Git client can be operating system
> > specific, it can be useful for a server to know which OS a client is
> > using. In the same way it can be useful for a client to know which OS
> > a server is using.
> >
> > Our current agent capability is in the form of "package/version" (e.g.,
> > "git/1.8.3.1"). Let's extend it to include the operating system name (os)
> > i.e in the form "package/version os" (e.g., "git/1.8.3.1 Linux").
> >
> > Including OS details in the agent capability simplifies implementation,
> > maintains backward compatibility, avoids introducing a new capability,
> > encourages adoption across Git-compatible software, and enhances
> > debugging by providing complete environment information without affecting
> > functionality.
>
> I obviously agree with the benefits enumerated in the above
> paragraph.  The simpler, the better.
>
> I however wonder ...
>
> > Add the `transfer.advertiseOSInfo` config option to address privacy
> > concerns. It defaults to `true` and can be changed to `false`.
>
> ... if this configuration knob is at the right granularity.
>
> For privacy concious folks, I would imagine that the distinction
> between "git/1.8.3.1" vs "git/2.48.1" would be something they do not
> want to reveal equally as, if not more than, which Operating System
> they are on.  Such a privacy concious user may already be using
> GIT_USER_AGENT environment variable to squelch it already, anyway.
>
> If we were to give them an improvement in the area for privacy
> features, I would think it would be to add a configuration variable
> to turn the agent off, instead of having to leave GIT_USER_AGENT
> environment variable set in the environment of their processes.
>
> On the other hand, for the rest of us who think "git/1.8.3.1 Linux"
> is not too much of a secret, we do not need a knob to configure it
> between "git/1.8.3.1" and "git/1.8.3.1 Linux".
>
> So, while I view some parts of the series would have been a good
> exercise to use various features (like config subsystem) from our
> API, I prefer if we kept the end-user interface not overly
> customizable (iow, without a config-knob, we do not need to add a
> code to inspect the new configuration variable).
>
> After all, GIT_USER_AGENT let's you hide not just the OS part but
> any other things from the user-agent string already.
Hi Junio,

The conclusion now is that we should not add any config option since
the GIT_USER_AGENT could actually allow the user to hide whatever
info they do not want to share ?
>
> I notice that unlike user_agent() vs user_agent_sanitized(), you
> only have a single function for os_info(), which I think is a good
> design.  But if we were to go that route, shouldn't we call the
> function os_info(), not os_info_sanitized()?  The idea behind a
> single function is that you cannot obtain unsanitized version of
> os_info() out of the system at all, so what _sanitized() returns
> would be what os_info() without _sanitized suffix would return to
> the caller anyway.
Yeah, we can change it to os_info, if in the future someone needs
the os information in some way, they could use the get_uname_info.

Thanks.
>
> Thanks.
Junio C Hamano Feb. 6, 2025, 3:13 p.m. UTC | #3
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

>> I obviously agree with the benefits enumerated in the above
>> paragraph.  The simpler, the better.
>>
>> I however wonder ...
>>
>> > Add the `transfer.advertiseOSInfo` config option to address privacy
>> > concerns. It defaults to `true` and can be changed to `false`.
>>
>> ... if this configuration knob is at the right granularity.
>
> The conclusion now is that we should not add any config option since
> the GIT_USER_AGENT could actually allow the user to hide whatever
> info they do not want to share ?

I wouldn't call that a conclusion (as you and I are the only people
who expressed their opinion on this so far), but that is my take on
it---tweaking only the (os) part in the agent string with a config
smells like the tweakability is at a wrong level.
Usman Akinyemi Feb. 7, 2025, 5:27 p.m. UTC | #4
On Thu, Feb 6, 2025 at 8:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Usman Akinyemi <usmanakinyemi202@gmail.com> writes:
>
> >> I obviously agree with the benefits enumerated in the above
> >> paragraph.  The simpler, the better.
> >>
> >> I however wonder ...
> >>
> >> > Add the `transfer.advertiseOSInfo` config option to address privacy
> >> > concerns. It defaults to `true` and can be changed to `false`.
> >>
> >> ... if this configuration knob is at the right granularity.
> >
> > The conclusion now is that we should not add any config option since
> > the GIT_USER_AGENT could actually allow the user to hide whatever
> > info they do not want to share ?
>
> I wouldn't call that a conclusion (as you and I are the only people
> who expressed their opinion on this so far), but that is my take on
> it---tweaking only the (os) part in the agent string with a config
> smells like the tweakability is at a wrong level.
>
Hi Junio,

I was actually thinking about this inside the bathroom when it
occurred to me that,
according to the current implementation, GIT_USER_AGENT will not allow the user
to specify an empty string at all. It is either you specify some value
or we decide for
you. I think we can add the config at a level that can disable the
agent capability completely
instead of only tweaking the (os) part.

With this, the user can disable the agent  capability completely,
share whatever string they want using the GIT_USER_AGENT.

What do you think ?

Thanks.
Junio C Hamano Feb. 7, 2025, 5:57 p.m. UTC | #5
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

> I was actually thinking about this inside the bathroom when it
> occurred to me that,
> according to the current implementation, GIT_USER_AGENT will not allow the user
> to specify an empty string at all. It is either you specify some value
> or we decide for
> you.

Yes.  GIT_USER_AGENT=ImNotTellingYou would work just fine for
privacy concious folks.

> I think we can add the config at a level that can disable the
> agent capability completely
> instead of only tweaking the (os) part.

Yes, go back to a few messages you received from me earlier; it is
already there ;-)

    If we were to give them an improvement in the area for privacy
    features, I would think it would be to add a configuration variable
    to turn the agent off, instead of having to leave GIT_USER_AGENT
    environment variable set in the environment of their processes.
Usman Akinyemi Feb. 7, 2025, 7:25 p.m. UTC | #6
On Thu, Feb 6, 2025 at 3:18 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Usman Akinyemi <usmanakinyemi202@gmail.com> writes:
>
> > As some issues that can happen with a Git client can be operating system
> > specific, it can be useful for a server to know which OS a client is
> > using. In the same way it can be useful for a client to know which OS
> > a server is using.
> >
> > Our current agent capability is in the form of "package/version" (e.g.,
> > "git/1.8.3.1"). Let's extend it to include the operating system name (os)
> > i.e in the form "package/version os" (e.g., "git/1.8.3.1 Linux").
> >
> > Including OS details in the agent capability simplifies implementation,
> > maintains backward compatibility, avoids introducing a new capability,
> > encourages adoption across Git-compatible software, and enhances
> > debugging by providing complete environment information without affecting
> > functionality.
>
> I obviously agree with the benefits enumerated in the above
> paragraph.  The simpler, the better.
>
> I however wonder ...
>
> > Add the `transfer.advertiseOSInfo` config option to address privacy
> > concerns. It defaults to `true` and can be changed to `false`.
>
> ... if this configuration knob is at the right granularity.
>
> For privacy concious folks, I would imagine that the distinction
> between "git/1.8.3.1" vs "git/2.48.1" would be something they do not
> want to reveal equally as, if not more than, which Operating System
> they are on.  Such a privacy concious user may already be using
> GIT_USER_AGENT environment variable to squelch it already, anyway.
>
> If we were to give them an improvement in the area for privacy
> features, I would think it would be to add a configuration variable
> to turn the agent off, instead of having to leave GIT_USER_AGENT
> environment variable set in the environment of their processes.
>
> On the other hand, for the rest of us who think "git/1.8.3.1 Linux"
> is not too much of a secret, we do not need a knob to configure it
> between "git/1.8.3.1" and "git/1.8.3.1 Linux".
>
> So, while I view some parts of the series would have been a good
> exercise to use various features (like config subsystem) from our
> API, I prefer if we kept the end-user interface not overly
> customizable (iow, without a config-knob, we do not need to add a
> code to inspect the new configuration variable).
Hi Junio,

Yeah, I believe appending the (os) to the agent might attract the attention
of some set of privacy conscious users who might not really be worried
about the agent when it was just a string like "git/1.8.3.1". While
GIT_USER_AGENT
can be used to suppress it, I believe having a dedicated config option to
completely disable the agent is a more user-friendly and persistent approach
than relying solely on environment variables.
Requiring users to manually set GIT_USER_AGENT (since it cannot be empty)
can feel cumbersome, making a config option a cleaner and more
intuitive alternative.
Additionally, having a config option would provide a consistent
mechanism in case
similar privacy-related features are introduced in the future.

This is me convincing you that having a config option to disable the
agent is more user friendly than having only the environment variable for
users who do not want to share anything at all.

What do you think ? Or maybe there is strong reason for having the
GIT_USER_AGENT in the first place and not having a config to disable
the agent capability?

We could also wait for input from other community members.

Thanks,
Usman

>
> After all, GIT_USER_AGENT let's you hide not just the OS part but
> any other things from the user-agent string already.
>
> I notice that unlike user_agent() vs user_agent_sanitized(), you
> only have a single function for os_info(), which I think is a good
> design.  But if we were to go that route, shouldn't we call the
> function os_info(), not os_info_sanitized()?  The idea behind a
> single function is that you cannot obtain unsanitized version of
> os_info() out of the system at all, so what _sanitized() returns
> would be what os_info() without _sanitized suffix would return to
> the caller anyway.
>
> Thanks.
diff mbox series

Patch

diff --git a/Documentation/config/transfer.txt b/Documentation/config/transfer.txt
index f1ce50f4a6..1e1dc849ef 100644
--- a/Documentation/config/transfer.txt
+++ b/Documentation/config/transfer.txt
@@ -125,3 +125,11 @@  transfer.bundleURI::
 transfer.advertiseObjectInfo::
 	When `true`, the `object-info` capability is advertised by
 	servers. Defaults to false.
+
+transfer.advertiseOSInfo::
+	When `true`, both the client and server independently append their
+	operating system name (os) to the `agent` capability value. The `agent`
+	capability will now be in form of "package/version os" (e.g.,
+	"git/1.8.3.1 Linux"). When `false`, the `agent` capability will be
+	in the form of "package/version" e.g "git/1.8.3.1". The server's
+	configuration is independent of the client's. Defaults to `true`.
diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
index 1652fef3ae..8fab7d7d52 100644
--- a/Documentation/gitprotocol-v2.txt
+++ b/Documentation/gitprotocol-v2.txt
@@ -184,11 +184,16 @@  form `agent=X`) to notify the client that the server is running version
 the `agent` capability with a value `Y` (in the form `agent=Y`) in its
 request to the server (but it MUST NOT do so if the server did not
 advertise the agent capability). The `X` and `Y` strings may contain any
-printable ASCII characters except space (i.e., the byte range 32 < x <
-127), and are typically of the form "package/version" (e.g.,
-"git/1.8.3.1"). The agent strings are purely informative for statistics
-and debugging purposes, and MUST NOT be used to programmatically assume
-the presence or absence of particular features.
+printable ASCII characters (i.e., the byte range 32 < x < 127), and are
+typically of the form "package/version os" (e.g., "git/1.8.3.1 Linux")
+where `os` is the operating system name (e.g., "Linux"). `X` and `Y` can
+be configured using the GIT_USER_AGENT environment variable and it takes
+priority. If `transfer.advertiseOSInfo` is `false` on the server, the server
+omits the `os` from X. If it is `false` on the client, the client omits the
+`os` from `Y`. The `os` is retrieved using the 'sysname' field of the `uname(2)`
+system call or its equivalent. The agent strings are purely informative for
+statistics and debugging purposes, and MUST NOT be used to programmatically
+assume the presence or absence of particular features.
 
 ls-refs
 ~~~~~~~
diff --git a/t/t5555-http-smart-common.sh b/t/t5555-http-smart-common.sh
index e47ea1ad10..140a7f0ffb 100755
--- a/t/t5555-http-smart-common.sh
+++ b/t/t5555-http-smart-common.sh
@@ -123,9 +123,17 @@  test_expect_success 'git receive-pack --advertise-refs: v1' '
 '
 
 test_expect_success 'git upload-pack --advertise-refs: v2' '
+	printf "agent=FAKE" >agent_capability &&
+	if test_have_prereq WINDOWS
+	then
+		printf "\n" >>agent_capability &&
+		git config transfer.advertiseOSInfo false
+	else
+		printf " %s\n" $(uname -s | test_redact_non_printables) >>agent_capability
+	fi &&
 	cat >expect <<-EOF &&
 	version 2
-	agent=FAKE
+	$(cat agent_capability)
 	ls-refs=unborn
 	fetch=shallow wait-for-done
 	server-option
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 4c24a188b9..a4c12372f8 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -8,13 +8,20 @@  export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . ./test-lib.sh
 
 test_expect_success 'setup to generate files with expected content' '
-	printf "agent=git/%s\n" "$(git version | cut -d" " -f3)" >agent_capability &&
+	printf "agent=git/%s" "$(git version | cut -d" " -f3)" >agent_capability &&
 
 	test_oid_cache <<-EOF &&
 	wrong_algo sha1:sha256
 	wrong_algo sha256:sha1
 	EOF
 
+	if test_have_prereq WINDOWS
+	then
+		printf "\n" >>agent_capability &&
+		git config transfer.advertiseOSInfo false
+	else
+		printf " %s\n" $(uname -s | test_redact_non_printables) >>agent_capability
+	fi &&
 	cat >expect.base <<-EOF &&
 	version 2
 	$(cat agent_capability)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 78e054ab50..3465904323 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -2007,3 +2007,11 @@  test_trailing_hash () {
 		test-tool hexdump |
 		sed "s/ //g"
 }
+
+# Trim and replace each character with ascii code below 32 or above
+# 127 (included) using a dot '.' character.
+# Octal intervals \001-\040 and \177-\377
+# correspond to decimal intervals 1-32 and 127-255
+test_redact_non_printables () {
+    tr -d "\n\r" | tr "[\001-\040][\177-\377]" "."
+}
diff --git a/version.c b/version.c
index d95221a72a..f0f936a75e 100644
--- a/version.c
+++ b/version.c
@@ -1,9 +1,12 @@ 
+#define USE_THE_REPOSITORY_VARIABLE
+
 #include "git-compat-util.h"
 #include "version.h"
 #include "version-def.h"
 #include "strbuf.h"
 #include "sane-ctype.h"
 #include "gettext.h"
+#include "config.h"
 
 const char git_version_string[] = GIT_VERSION;
 const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT;
@@ -43,6 +46,12 @@  const char *git_user_agent_sanitized(void)
 
 		strbuf_addstr(&buf, git_user_agent());
 		redact_non_printables(&buf);
+		/* Add os name if the transfer.advertiseosinfo config is true */
+		if (advertise_os_info()) {
+			/* Add space to space character after git version string */
+			strbuf_addch(&buf, ' ');
+			strbuf_addstr(&buf, os_info_sanitized());
+		}
 		agent = strbuf_detach(&buf, NULL);
 	}
 
@@ -69,3 +78,31 @@  int get_uname_info(struct strbuf *buf, unsigned int full)
 	     strbuf_addf(buf, "%s\n", uname_info.sysname);
 	return 0;
 }
+
+const char *os_info_sanitized(void)
+{
+	static const char *os = NULL;
+
+	if (!os) {
+		struct strbuf buf = STRBUF_INIT;
+
+		get_uname_info(&buf, 0);
+		/* Sanitize the os information immediately */
+		redact_non_printables(&buf);
+		os = strbuf_detach(&buf, NULL);
+	}
+
+	return os;
+}
+
+int advertise_os_info(void)
+{
+	static int transfer_advertise_os_info= -1;
+
+	if (transfer_advertise_os_info == -1) {
+		repo_config_get_bool(the_repository, "transfer.advertiseosinfo", &transfer_advertise_os_info);
+		/* enabled by default */
+		transfer_advertise_os_info = !!transfer_advertise_os_info;
+	}
+	return transfer_advertise_os_info;
+}
diff --git a/version.h b/version.h
index 5eb586c0bd..b2325865d7 100644
--- a/version.h
+++ b/version.h
@@ -1,6 +1,8 @@ 
 #ifndef VERSION_H
 #define VERSION_H
 
+struct repository;
+
 extern const char git_version_string[];
 extern const char git_built_from_commit_string[];
 
@@ -14,4 +16,17 @@  const char *git_user_agent_sanitized(void);
 */
 int get_uname_info(struct strbuf *buf, unsigned int full);
 
+/*
+  Retrieve, sanitize and cache operating system info for subsequent
+  calls. Return a pointer to the sanitized operating system info
+  string.
+*/
+const char *os_info_sanitized(void);
+
+/*
+  Retrieve and cache transfer.advertiseosinfo config value. Return 1
+  if true, 0 if false.
+*/
+int advertise_os_info(void);
+
 #endif /* VERSION_H */