Message ID | 20250117104639.65608-2-usmanakinyemi202@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Introduce os-version Capability with Configurable Options | expand |
Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > The git_user_agent_sanitized() function performs some sanitizing to > avoid special characters being sent over the line and possibly messing > up with the protocol or with the parsing on the other side. > > Let's extract this sanitizing into a new redact_non_printables() function, > as we will want to reuse it in a following patch. > > For now the new redact_non_printables() function is still static as > it's only needed locally. > > While at it, let's use strbuf_detach() to explicitly detach the string > contained by the 'buf' strbuf. > > Mentored-by: Christian Couder <chriscool@tuxfamily.org> > Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> > --- > version.c | 22 ++++++++++++++++------ > 1 file changed, 16 insertions(+), 6 deletions(-) > > diff --git a/version.c b/version.c > index 4d763ab48d..78f025c808 100644 > --- a/version.c > +++ b/version.c > @@ -6,6 +6,20 @@ > const char git_version_string[] = GIT_VERSION; > const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT; > > +/* > + * Trim and replace each character with ascii code below 32 or above > + * 127 (included) using a dot '.' character. /* * Trim and replace each byte outside ASCII printable * (33 to 127, inclusive) with a dot '.'. */ perhaps? > + * TODO: ensure consecutive non-printable characters are only replaced once I am not sure what your plans are for this change. Has the list reached the consensus to squish consecutive redaction dots into one in the user-agent string? If not, let's not mention it. Making an incompatible change to the user-agent string is not the primary aim of this topic anyway. > +*/ Funny indentation. The asterisk should have a SP before it, just like on the previous lines. > +static void redact_non_printables(struct strbuf *buf) > +{ > + strbuf_trim(buf); > + for (size_t i = 0; i < buf->len; i++) { > + if (buf->buf[i] <= 32 || buf->buf[i] >= 127) <sane-ctype.h> defines isprint() we can use here. > + buf->buf[i] = '.'; > + } > +} Do we want to do anything special when the resulting buf->buf[] becomes empty or just full of dots without anything else? Should the caller be told about such a condition, or is it callers' responsibility to check if they care? I am inclined to say that it is the latter. > @@ -27,12 +41,8 @@ const char *git_user_agent_sanitized(void) > struct strbuf buf = STRBUF_INIT; > > strbuf_addstr(&buf, git_user_agent()); > - strbuf_trim(&buf); > - for (size_t i = 0; i < buf.len; i++) { > - if (buf.buf[i] <= 32 || buf.buf[i] >= 127) > - buf.buf[i] = '.'; > - } > - agent = buf.buf; > + redact_non_printables(&buf); > + agent = strbuf_detach(&buf, NULL); > } > > return agent;
Junio C Hamano <gitster@pobox.com> writes: > /* > * Trim and replace each byte outside ASCII printable > * (33 to 127, inclusive) with a dot '.'. > */ > > perhaps? "127" -> "126"; that is what an inclusive range should say. Sorry for a noise.
On Fri, Jan 17, 2025 at 11:56 PM Junio C Hamano <gitster@pobox.com> wrote: > > Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > > > The git_user_agent_sanitized() function performs some sanitizing to > > avoid special characters being sent over the line and possibly messing > > up with the protocol or with the parsing on the other side. > > > > Let's extract this sanitizing into a new redact_non_printables() function, > > as we will want to reuse it in a following patch. > > > > For now the new redact_non_printables() function is still static as > > it's only needed locally. > > > > While at it, let's use strbuf_detach() to explicitly detach the string > > contained by the 'buf' strbuf. > > > > Mentored-by: Christian Couder <chriscool@tuxfamily.org> > > Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> > > --- > > version.c | 22 ++++++++++++++++------ > > 1 file changed, 16 insertions(+), 6 deletions(-) > > > > diff --git a/version.c b/version.c > > index 4d763ab48d..78f025c808 100644 > > --- a/version.c > > +++ b/version.c > > @@ -6,6 +6,20 @@ > > const char git_version_string[] = GIT_VERSION; > > const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT; > > > > +/* > > + * Trim and replace each character with ascii code below 32 or above > > + * 127 (included) using a dot '.' character. > > /* > * Trim and replace each byte outside ASCII printable > * (33 to 127, inclusive) with a dot '.'. > */ > > perhaps? This sounds confusing, it sounds like the byte we are replacing with dot are in the range of 33 to 127 whereas, it is those outside these range. > > > + * TODO: ensure consecutive non-printable characters are only replaced once > > I am not sure what your plans are for this change. Has the list > reached the consensus to squish consecutive redaction dots into one > in the user-agent string? If not, let's not mention it. Making an > incompatible change to the user-agent string is not the primary aim > of this topic anyway. > > > +*/ > > Funny indentation. The asterisk should have a SP before it, just > like on the previous lines. Mistake, thanks for catching it, will make a change to it in the next patch series. > > > +static void redact_non_printables(struct strbuf *buf) > > +{ > > + strbuf_trim(buf); > > + for (size_t i = 0; i < buf->len; i++) { > > + if (buf->buf[i] <= 32 || buf->buf[i] >= 127) > > <sane-ctype.h> defines isprint() we can use here. I think it would be better to add this in another commit so that one commit does one thing. I will add it after this patch series got settled, what do you think ? > > > + buf->buf[i] = '.'; > > + } > > +} > > Do we want to do anything special when the resulting buf->buf[] > becomes empty or just full of dots without anything else? Should > the caller be told about such a condition, or is it callers' > responsibility to check if they care? I am inclined to say that it > is the latter. I agreed. > > > @@ -27,12 +41,8 @@ const char *git_user_agent_sanitized(void) > > struct strbuf buf = STRBUF_INIT; > > > > strbuf_addstr(&buf, git_user_agent()); > > - strbuf_trim(&buf); > > - for (size_t i = 0; i < buf.len; i++) { > > - if (buf.buf[i] <= 32 || buf.buf[i] >= 127) > > - buf.buf[i] = '.'; > > - } > > - agent = buf.buf; > > + redact_non_printables(&buf); > > + agent = strbuf_detach(&buf, NULL); > > } > > > > return agent;
On Mon, Jan 20, 2025 at 6:10 PM Usman Akinyemi <usmanakinyemi202@gmail.com> wrote: > > On Fri, Jan 17, 2025 at 11:56 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > > > +static void redact_non_printables(struct strbuf *buf) > > > +{ > > > + strbuf_trim(buf); > > > + for (size_t i = 0; i < buf->len; i++) { > > > + if (buf->buf[i] <= 32 || buf->buf[i] >= 127) > > > > <sane-ctype.h> defines isprint() we can use here. > I think it would be better to add this in another commit so that one commit > does one thing. I will add it after this patch series got settled, > what do you think ? Alternatively it could be done in its own preparatory patch at the beginning of this patch series. <sane-ctype.h> has: #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e) So if we wanted to use isprint() we would have to use something like: for (size_t i = 0; i < buf->len; i++) { if (!isprint(buf->buf[i]) || buf->buf[i] == ' ') buf->buf[i] = '.'; } It would have been nicer if we didn't need a special case for SP. So I would say it's likely a matter of taste if the result is nicer than the original. > > > > > + buf->buf[i] = '.'; > > > + } > > > +}
Christian Couder <christian.couder@gmail.com> writes: > On Mon, Jan 20, 2025 at 6:10 PM Usman Akinyemi > <usmanakinyemi202@gmail.com> wrote: >> >> On Fri, Jan 17, 2025 at 11:56 PM Junio C Hamano <gitster@pobox.com> wrote: >> > >> > Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > >> > > +static void redact_non_printables(struct strbuf *buf) >> > > +{ >> > > + strbuf_trim(buf); >> > > + for (size_t i = 0; i < buf->len; i++) { >> > > + if (buf->buf[i] <= 32 || buf->buf[i] >= 127) >> > >> > <sane-ctype.h> defines isprint() we can use here. >> I think it would be better to add this in another commit so that one commit >> does one thing. I will add it after this patch series got settled, >> what do you think ? > > Alternatively it could be done in its own preparatory patch at the > beginning of this patch series. Yup, a preliminary clean-up sounds fine, but so does a follow-up after all the dust settles. Thanks.
diff --git a/version.c b/version.c index 4d763ab48d..78f025c808 100644 --- a/version.c +++ b/version.c @@ -6,6 +6,20 @@ const char git_version_string[] = GIT_VERSION; const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT; +/* + * Trim and replace each character with ascii code below 32 or above + * 127 (included) using a dot '.' character. + * TODO: ensure consecutive non-printable characters are only replaced once +*/ +static void redact_non_printables(struct strbuf *buf) +{ + strbuf_trim(buf); + for (size_t i = 0; i < buf->len; i++) { + if (buf->buf[i] <= 32 || buf->buf[i] >= 127) + buf->buf[i] = '.'; + } +} + const char *git_user_agent(void) { static const char *agent = NULL; @@ -27,12 +41,8 @@ const char *git_user_agent_sanitized(void) struct strbuf buf = STRBUF_INIT; strbuf_addstr(&buf, git_user_agent()); - strbuf_trim(&buf); - for (size_t i = 0; i < buf.len; i++) { - if (buf.buf[i] <= 32 || buf.buf[i] >= 127) - buf.buf[i] = '.'; - } - agent = buf.buf; + redact_non_printables(&buf); + agent = strbuf_detach(&buf, NULL); } return agent;
The git_user_agent_sanitized() function performs some sanitizing to avoid special characters being sent over the line and possibly messing up with the protocol or with the parsing on the other side. Let's extract this sanitizing into a new redact_non_printables() function, as we will want to reuse it in a following patch. For now the new redact_non_printables() function is still static as it's only needed locally. While at it, let's use strbuf_detach() to explicitly detach the string contained by the 'buf' strbuf. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> --- version.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)