Message ID | 20250205185246.111447-7-usmanakinyemi202@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | extend agent capability to include OS name | expand |
Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > As some issues that can happen with a Git client can be operating system > specific, it can be useful for a server to know which OS a client is > using. In the same way it can be useful for a client to know which OS > a server is using. > > Our current agent capability is in the form of "package/version" (e.g., > "git/1.8.3.1"). Let's extend it to include the operating system name (os) > i.e in the form "package/version os" (e.g., "git/1.8.3.1 Linux"). > > Including OS details in the agent capability simplifies implementation, > maintains backward compatibility, avoids introducing a new capability, > encourages adoption across Git-compatible software, and enhances > debugging by providing complete environment information without affecting > functionality. I obviously agree with the benefits enumerated in the above paragraph. The simpler, the better. I however wonder ... > Add the `transfer.advertiseOSInfo` config option to address privacy > concerns. It defaults to `true` and can be changed to `false`. ... if this configuration knob is at the right granularity. For privacy concious folks, I would imagine that the distinction between "git/1.8.3.1" vs "git/2.48.1" would be something they do not want to reveal equally as, if not more than, which Operating System they are on. Such a privacy concious user may already be using GIT_USER_AGENT environment variable to squelch it already, anyway. If we were to give them an improvement in the area for privacy features, I would think it would be to add a configuration variable to turn the agent off, instead of having to leave GIT_USER_AGENT environment variable set in the environment of their processes. On the other hand, for the rest of us who think "git/1.8.3.1 Linux" is not too much of a secret, we do not need a knob to configure it between "git/1.8.3.1" and "git/1.8.3.1 Linux". So, while I view some parts of the series would have been a good exercise to use various features (like config subsystem) from our API, I prefer if we kept the end-user interface not overly customizable (iow, without a config-knob, we do not need to add a code to inspect the new configuration variable). After all, GIT_USER_AGENT let's you hide not just the OS part but any other things from the user-agent string already. I notice that unlike user_agent() vs user_agent_sanitized(), you only have a single function for os_info(), which I think is a good design. But if we were to go that route, shouldn't we call the function os_info(), not os_info_sanitized()? The idea behind a single function is that you cannot obtain unsanitized version of os_info() out of the system at all, so what _sanitized() returns would be what os_info() without _sanitized suffix would return to the caller anyway. Thanks.
On Thu, Feb 6, 2025 at 3:18 AM Junio C Hamano <gitster@pobox.com> wrote: > > Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > > > As some issues that can happen with a Git client can be operating system > > specific, it can be useful for a server to know which OS a client is > > using. In the same way it can be useful for a client to know which OS > > a server is using. > > > > Our current agent capability is in the form of "package/version" (e.g., > > "git/1.8.3.1"). Let's extend it to include the operating system name (os) > > i.e in the form "package/version os" (e.g., "git/1.8.3.1 Linux"). > > > > Including OS details in the agent capability simplifies implementation, > > maintains backward compatibility, avoids introducing a new capability, > > encourages adoption across Git-compatible software, and enhances > > debugging by providing complete environment information without affecting > > functionality. > > I obviously agree with the benefits enumerated in the above > paragraph. The simpler, the better. > > I however wonder ... > > > Add the `transfer.advertiseOSInfo` config option to address privacy > > concerns. It defaults to `true` and can be changed to `false`. > > ... if this configuration knob is at the right granularity. > > For privacy concious folks, I would imagine that the distinction > between "git/1.8.3.1" vs "git/2.48.1" would be something they do not > want to reveal equally as, if not more than, which Operating System > they are on. Such a privacy concious user may already be using > GIT_USER_AGENT environment variable to squelch it already, anyway. > > If we were to give them an improvement in the area for privacy > features, I would think it would be to add a configuration variable > to turn the agent off, instead of having to leave GIT_USER_AGENT > environment variable set in the environment of their processes. > > On the other hand, for the rest of us who think "git/1.8.3.1 Linux" > is not too much of a secret, we do not need a knob to configure it > between "git/1.8.3.1" and "git/1.8.3.1 Linux". > > So, while I view some parts of the series would have been a good > exercise to use various features (like config subsystem) from our > API, I prefer if we kept the end-user interface not overly > customizable (iow, without a config-knob, we do not need to add a > code to inspect the new configuration variable). > > After all, GIT_USER_AGENT let's you hide not just the OS part but > any other things from the user-agent string already. Hi Junio, The conclusion now is that we should not add any config option since the GIT_USER_AGENT could actually allow the user to hide whatever info they do not want to share ? > > I notice that unlike user_agent() vs user_agent_sanitized(), you > only have a single function for os_info(), which I think is a good > design. But if we were to go that route, shouldn't we call the > function os_info(), not os_info_sanitized()? The idea behind a > single function is that you cannot obtain unsanitized version of > os_info() out of the system at all, so what _sanitized() returns > would be what os_info() without _sanitized suffix would return to > the caller anyway. Yeah, we can change it to os_info, if in the future someone needs the os information in some way, they could use the get_uname_info. Thanks. > > Thanks.
Usman Akinyemi <usmanakinyemi202@gmail.com> writes: >> I obviously agree with the benefits enumerated in the above >> paragraph. The simpler, the better. >> >> I however wonder ... >> >> > Add the `transfer.advertiseOSInfo` config option to address privacy >> > concerns. It defaults to `true` and can be changed to `false`. >> >> ... if this configuration knob is at the right granularity. > > The conclusion now is that we should not add any config option since > the GIT_USER_AGENT could actually allow the user to hide whatever > info they do not want to share ? I wouldn't call that a conclusion (as you and I are the only people who expressed their opinion on this so far), but that is my take on it---tweaking only the (os) part in the agent string with a config smells like the tweakability is at a wrong level.
On Thu, Feb 6, 2025 at 8:43 PM Junio C Hamano <gitster@pobox.com> wrote: > > Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > > >> I obviously agree with the benefits enumerated in the above > >> paragraph. The simpler, the better. > >> > >> I however wonder ... > >> > >> > Add the `transfer.advertiseOSInfo` config option to address privacy > >> > concerns. It defaults to `true` and can be changed to `false`. > >> > >> ... if this configuration knob is at the right granularity. > > > > The conclusion now is that we should not add any config option since > > the GIT_USER_AGENT could actually allow the user to hide whatever > > info they do not want to share ? > > I wouldn't call that a conclusion (as you and I are the only people > who expressed their opinion on this so far), but that is my take on > it---tweaking only the (os) part in the agent string with a config > smells like the tweakability is at a wrong level. > Hi Junio, I was actually thinking about this inside the bathroom when it occurred to me that, according to the current implementation, GIT_USER_AGENT will not allow the user to specify an empty string at all. It is either you specify some value or we decide for you. I think we can add the config at a level that can disable the agent capability completely instead of only tweaking the (os) part. With this, the user can disable the agent capability completely, share whatever string they want using the GIT_USER_AGENT. What do you think ? Thanks.
Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > I was actually thinking about this inside the bathroom when it > occurred to me that, > according to the current implementation, GIT_USER_AGENT will not allow the user > to specify an empty string at all. It is either you specify some value > or we decide for > you. Yes. GIT_USER_AGENT=ImNotTellingYou would work just fine for privacy concious folks. > I think we can add the config at a level that can disable the > agent capability completely > instead of only tweaking the (os) part. Yes, go back to a few messages you received from me earlier; it is already there ;-) If we were to give them an improvement in the area for privacy features, I would think it would be to add a configuration variable to turn the agent off, instead of having to leave GIT_USER_AGENT environment variable set in the environment of their processes.
On Thu, Feb 6, 2025 at 3:18 AM Junio C Hamano <gitster@pobox.com> wrote: > > Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > > > As some issues that can happen with a Git client can be operating system > > specific, it can be useful for a server to know which OS a client is > > using. In the same way it can be useful for a client to know which OS > > a server is using. > > > > Our current agent capability is in the form of "package/version" (e.g., > > "git/1.8.3.1"). Let's extend it to include the operating system name (os) > > i.e in the form "package/version os" (e.g., "git/1.8.3.1 Linux"). > > > > Including OS details in the agent capability simplifies implementation, > > maintains backward compatibility, avoids introducing a new capability, > > encourages adoption across Git-compatible software, and enhances > > debugging by providing complete environment information without affecting > > functionality. > > I obviously agree with the benefits enumerated in the above > paragraph. The simpler, the better. > > I however wonder ... > > > Add the `transfer.advertiseOSInfo` config option to address privacy > > concerns. It defaults to `true` and can be changed to `false`. > > ... if this configuration knob is at the right granularity. > > For privacy concious folks, I would imagine that the distinction > between "git/1.8.3.1" vs "git/2.48.1" would be something they do not > want to reveal equally as, if not more than, which Operating System > they are on. Such a privacy concious user may already be using > GIT_USER_AGENT environment variable to squelch it already, anyway. > > If we were to give them an improvement in the area for privacy > features, I would think it would be to add a configuration variable > to turn the agent off, instead of having to leave GIT_USER_AGENT > environment variable set in the environment of their processes. > > On the other hand, for the rest of us who think "git/1.8.3.1 Linux" > is not too much of a secret, we do not need a knob to configure it > between "git/1.8.3.1" and "git/1.8.3.1 Linux". > > So, while I view some parts of the series would have been a good > exercise to use various features (like config subsystem) from our > API, I prefer if we kept the end-user interface not overly > customizable (iow, without a config-knob, we do not need to add a > code to inspect the new configuration variable). Hi Junio, Yeah, I believe appending the (os) to the agent might attract the attention of some set of privacy conscious users who might not really be worried about the agent when it was just a string like "git/1.8.3.1". While GIT_USER_AGENT can be used to suppress it, I believe having a dedicated config option to completely disable the agent is a more user-friendly and persistent approach than relying solely on environment variables. Requiring users to manually set GIT_USER_AGENT (since it cannot be empty) can feel cumbersome, making a config option a cleaner and more intuitive alternative. Additionally, having a config option would provide a consistent mechanism in case similar privacy-related features are introduced in the future. This is me convincing you that having a config option to disable the agent is more user friendly than having only the environment variable for users who do not want to share anything at all. What do you think ? Or maybe there is strong reason for having the GIT_USER_AGENT in the first place and not having a config to disable the agent capability? We could also wait for input from other community members. Thanks, Usman > > After all, GIT_USER_AGENT let's you hide not just the OS part but > any other things from the user-agent string already. > > I notice that unlike user_agent() vs user_agent_sanitized(), you > only have a single function for os_info(), which I think is a good > design. But if we were to go that route, shouldn't we call the > function os_info(), not os_info_sanitized()? The idea behind a > single function is that you cannot obtain unsanitized version of > os_info() out of the system at all, so what _sanitized() returns > would be what os_info() without _sanitized suffix would return to > the caller anyway. > > Thanks.
diff --git a/Documentation/config/transfer.txt b/Documentation/config/transfer.txt index f1ce50f4a6..1e1dc849ef 100644 --- a/Documentation/config/transfer.txt +++ b/Documentation/config/transfer.txt @@ -125,3 +125,11 @@ transfer.bundleURI:: transfer.advertiseObjectInfo:: When `true`, the `object-info` capability is advertised by servers. Defaults to false. + +transfer.advertiseOSInfo:: + When `true`, both the client and server independently append their + operating system name (os) to the `agent` capability value. The `agent` + capability will now be in form of "package/version os" (e.g., + "git/1.8.3.1 Linux"). When `false`, the `agent` capability will be + in the form of "package/version" e.g "git/1.8.3.1". The server's + configuration is independent of the client's. Defaults to `true`. diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt index 1652fef3ae..8fab7d7d52 100644 --- a/Documentation/gitprotocol-v2.txt +++ b/Documentation/gitprotocol-v2.txt @@ -184,11 +184,16 @@ form `agent=X`) to notify the client that the server is running version the `agent` capability with a value `Y` (in the form `agent=Y`) in its request to the server (but it MUST NOT do so if the server did not advertise the agent capability). The `X` and `Y` strings may contain any -printable ASCII characters except space (i.e., the byte range 32 < x < -127), and are typically of the form "package/version" (e.g., -"git/1.8.3.1"). The agent strings are purely informative for statistics -and debugging purposes, and MUST NOT be used to programmatically assume -the presence or absence of particular features. +printable ASCII characters (i.e., the byte range 32 < x < 127), and are +typically of the form "package/version os" (e.g., "git/1.8.3.1 Linux") +where `os` is the operating system name (e.g., "Linux"). `X` and `Y` can +be configured using the GIT_USER_AGENT environment variable and it takes +priority. If `transfer.advertiseOSInfo` is `false` on the server, the server +omits the `os` from X. If it is `false` on the client, the client omits the +`os` from `Y`. The `os` is retrieved using the 'sysname' field of the `uname(2)` +system call or its equivalent. The agent strings are purely informative for +statistics and debugging purposes, and MUST NOT be used to programmatically +assume the presence or absence of particular features. ls-refs ~~~~~~~ diff --git a/t/t5555-http-smart-common.sh b/t/t5555-http-smart-common.sh index e47ea1ad10..140a7f0ffb 100755 --- a/t/t5555-http-smart-common.sh +++ b/t/t5555-http-smart-common.sh @@ -123,9 +123,17 @@ test_expect_success 'git receive-pack --advertise-refs: v1' ' ' test_expect_success 'git upload-pack --advertise-refs: v2' ' + printf "agent=FAKE" >agent_capability && + if test_have_prereq WINDOWS + then + printf "\n" >>agent_capability && + git config transfer.advertiseOSInfo false + else + printf " %s\n" $(uname -s | test_redact_non_printables) >>agent_capability + fi && cat >expect <<-EOF && version 2 - agent=FAKE + $(cat agent_capability) ls-refs=unborn fetch=shallow wait-for-done server-option diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh index 4c24a188b9..a4c12372f8 100755 --- a/t/t5701-git-serve.sh +++ b/t/t5701-git-serve.sh @@ -8,13 +8,20 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME . ./test-lib.sh test_expect_success 'setup to generate files with expected content' ' - printf "agent=git/%s\n" "$(git version | cut -d" " -f3)" >agent_capability && + printf "agent=git/%s" "$(git version | cut -d" " -f3)" >agent_capability && test_oid_cache <<-EOF && wrong_algo sha1:sha256 wrong_algo sha256:sha1 EOF + if test_have_prereq WINDOWS + then + printf "\n" >>agent_capability && + git config transfer.advertiseOSInfo false + else + printf " %s\n" $(uname -s | test_redact_non_printables) >>agent_capability + fi && cat >expect.base <<-EOF && version 2 $(cat agent_capability) diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 78e054ab50..3465904323 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -2007,3 +2007,11 @@ test_trailing_hash () { test-tool hexdump | sed "s/ //g" } + +# Trim and replace each character with ascii code below 32 or above +# 127 (included) using a dot '.' character. +# Octal intervals \001-\040 and \177-\377 +# correspond to decimal intervals 1-32 and 127-255 +test_redact_non_printables () { + tr -d "\n\r" | tr "[\001-\040][\177-\377]" "." +} diff --git a/version.c b/version.c index d95221a72a..f0f936a75e 100644 --- a/version.c +++ b/version.c @@ -1,9 +1,12 @@ +#define USE_THE_REPOSITORY_VARIABLE + #include "git-compat-util.h" #include "version.h" #include "version-def.h" #include "strbuf.h" #include "sane-ctype.h" #include "gettext.h" +#include "config.h" const char git_version_string[] = GIT_VERSION; const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT; @@ -43,6 +46,12 @@ const char *git_user_agent_sanitized(void) strbuf_addstr(&buf, git_user_agent()); redact_non_printables(&buf); + /* Add os name if the transfer.advertiseosinfo config is true */ + if (advertise_os_info()) { + /* Add space to space character after git version string */ + strbuf_addch(&buf, ' '); + strbuf_addstr(&buf, os_info_sanitized()); + } agent = strbuf_detach(&buf, NULL); } @@ -69,3 +78,31 @@ int get_uname_info(struct strbuf *buf, unsigned int full) strbuf_addf(buf, "%s\n", uname_info.sysname); return 0; } + +const char *os_info_sanitized(void) +{ + static const char *os = NULL; + + if (!os) { + struct strbuf buf = STRBUF_INIT; + + get_uname_info(&buf, 0); + /* Sanitize the os information immediately */ + redact_non_printables(&buf); + os = strbuf_detach(&buf, NULL); + } + + return os; +} + +int advertise_os_info(void) +{ + static int transfer_advertise_os_info= -1; + + if (transfer_advertise_os_info == -1) { + repo_config_get_bool(the_repository, "transfer.advertiseosinfo", &transfer_advertise_os_info); + /* enabled by default */ + transfer_advertise_os_info = !!transfer_advertise_os_info; + } + return transfer_advertise_os_info; +} diff --git a/version.h b/version.h index 5eb586c0bd..b2325865d7 100644 --- a/version.h +++ b/version.h @@ -1,6 +1,8 @@ #ifndef VERSION_H #define VERSION_H +struct repository; + extern const char git_version_string[]; extern const char git_built_from_commit_string[]; @@ -14,4 +16,17 @@ const char *git_user_agent_sanitized(void); */ int get_uname_info(struct strbuf *buf, unsigned int full); +/* + Retrieve, sanitize and cache operating system info for subsequent + calls. Return a pointer to the sanitized operating system info + string. +*/ +const char *os_info_sanitized(void); + +/* + Retrieve and cache transfer.advertiseosinfo config value. Return 1 + if true, 0 if false. +*/ +int advertise_os_info(void); + #endif /* VERSION_H */
As some issues that can happen with a Git client can be operating system specific, it can be useful for a server to know which OS a client is using. In the same way it can be useful for a client to know which OS a server is using. Our current agent capability is in the form of "package/version" (e.g., "git/1.8.3.1"). Let's extend it to include the operating system name (os) i.e in the form "package/version os" (e.g., "git/1.8.3.1 Linux"). Including OS details in the agent capability simplifies implementation, maintains backward compatibility, avoids introducing a new capability, encourages adoption across Git-compatible software, and enhances debugging by providing complete environment information without affecting functionality. Add the `transfer.advertiseOSInfo` config option to address privacy concerns. It defaults to `true` and can be changed to `false`. When `true`, both the client and server independently append their operating system name(os) to the `agent` capability value. The `agent` capability will now be in form of "package/version os" (e.g., "git/1.8.3.1 Linux"). When `false`, the `agent` capability will be in the form of "package/version" e.g "git/1.8.3.1". The server's configuration is independent of the client's. Defaults to `true`. The operating system name is retrieved using the 'sysname' field of the `uname(2)` system call or its equivalent. However, there are differences between `uname(1)` (command-line utility) and `uname(2)` (system call) outputs on Windows. These discrepancies complicate testing on Windows platforms. For example: - `uname(1)` output: MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64\ .2024-02-14.20:17.UTC.x86_64 - `uname(2)` output: Windows.10.0.20348 On Windows, uname(2) is not actually system-supplied but is instead already faked up by Git itself. We could have overcome the test issue on Windows by implementing a new `uname` subcommand in `test-tool` using uname(2), but except uname(2), which would be tested against itself, there would be nothing platform specific, so it's just simpler to disable the tests on Windows. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> --- Documentation/config/transfer.txt | 8 +++++++ Documentation/gitprotocol-v2.txt | 15 ++++++++----- t/t5555-http-smart-common.sh | 10 ++++++++- t/t5701-git-serve.sh | 9 +++++++- t/test-lib-functions.sh | 8 +++++++ version.c | 37 +++++++++++++++++++++++++++++++ version.h | 15 +++++++++++++ 7 files changed, 95 insertions(+), 7 deletions(-)