diff mbox series

[v2,1/6] version: refactor redact_non_printables()

Message ID 20250117104639.65608-2-usmanakinyemi202@gmail.com (mailing list archive)
State New
Headers show
Series Introduce os-version Capability with Configurable Options | expand

Commit Message

Usman Akinyemi Jan. 17, 2025, 10:46 a.m. UTC
The git_user_agent_sanitized() function performs some sanitizing to
avoid special characters being sent over the line and possibly messing
up with the protocol or with the parsing on the other side.

Let's extract this sanitizing into a new redact_non_printables() function,
as we will want to reuse it in a following patch.

For now the new redact_non_printables() function is still static as
it's only needed locally.

While at it, let's use strbuf_detach() to explicitly detach the string
contained by the 'buf' strbuf.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
---
 version.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

Comments

Junio C Hamano Jan. 17, 2025, 6:26 p.m. UTC | #1
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

> The git_user_agent_sanitized() function performs some sanitizing to
> avoid special characters being sent over the line and possibly messing
> up with the protocol or with the parsing on the other side.
>
> Let's extract this sanitizing into a new redact_non_printables() function,
> as we will want to reuse it in a following patch.
>
> For now the new redact_non_printables() function is still static as
> it's only needed locally.
>
> While at it, let's use strbuf_detach() to explicitly detach the string
> contained by the 'buf' strbuf.
>
> Mentored-by: Christian Couder <chriscool@tuxfamily.org>
> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
> ---
>  version.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/version.c b/version.c
> index 4d763ab48d..78f025c808 100644
> --- a/version.c
> +++ b/version.c
> @@ -6,6 +6,20 @@
>  const char git_version_string[] = GIT_VERSION;
>  const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT;
>  
> +/*
> + * Trim and replace each character with ascii code below 32 or above
> + * 127 (included) using a dot '.' character.

/*
 * Trim and replace each byte outside ASCII printable
 * (33 to 127, inclusive) with a dot '.'.
 */

perhaps?

> + * TODO: ensure consecutive non-printable characters are only replaced once

I am not sure what your plans are for this change.  Has the list
reached the consensus to squish consecutive redaction dots into one
in the user-agent string?  If not, let's not mention it.  Making an
incompatible change to the user-agent string is not the primary aim
of this topic anyway.

> +*/

Funny indentation.  The asterisk should have a SP before it, just
like on the previous lines.

> +static void redact_non_printables(struct strbuf *buf)
> +{
> +	strbuf_trim(buf);
> +	for (size_t i = 0; i < buf->len; i++) {
> +		if (buf->buf[i] <= 32 || buf->buf[i] >= 127)

<sane-ctype.h> defines isprint() we can use here.

> +			buf->buf[i] = '.';
> +	}
> +}

Do we want to do anything special when the resulting buf->buf[]
becomes empty or just full of dots without anything else?  Should
the caller be told about such a condition, or is it callers'
responsibility to check if they care?  I am inclined to say that it
is the latter.

> @@ -27,12 +41,8 @@ const char *git_user_agent_sanitized(void)
>  		struct strbuf buf = STRBUF_INIT;
>  
>  		strbuf_addstr(&buf, git_user_agent());
> -		strbuf_trim(&buf);
> -		for (size_t i = 0; i < buf.len; i++) {
> -			if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
> -				buf.buf[i] = '.';
> -		}
> -		agent = buf.buf;
> +		redact_non_printables(&buf);
> +		agent = strbuf_detach(&buf, NULL);
>  	}
>  
>  	return agent;
Junio C Hamano Jan. 17, 2025, 7:48 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> /*
>  * Trim and replace each byte outside ASCII printable
>  * (33 to 127, inclusive) with a dot '.'.
>  */
>
> perhaps?

"127" -> "126"; that is what an inclusive range should say.

Sorry for a noise.
Usman Akinyemi Jan. 20, 2025, 5:10 p.m. UTC | #3
On Fri, Jan 17, 2025 at 11:56 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Usman Akinyemi <usmanakinyemi202@gmail.com> writes:
>
> > The git_user_agent_sanitized() function performs some sanitizing to
> > avoid special characters being sent over the line and possibly messing
> > up with the protocol or with the parsing on the other side.
> >
> > Let's extract this sanitizing into a new redact_non_printables() function,
> > as we will want to reuse it in a following patch.
> >
> > For now the new redact_non_printables() function is still static as
> > it's only needed locally.
> >
> > While at it, let's use strbuf_detach() to explicitly detach the string
> > contained by the 'buf' strbuf.
> >
> > Mentored-by: Christian Couder <chriscool@tuxfamily.org>
> > Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
> > ---
> >  version.c | 22 ++++++++++++++++------
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/version.c b/version.c
> > index 4d763ab48d..78f025c808 100644
> > --- a/version.c
> > +++ b/version.c
> > @@ -6,6 +6,20 @@
> >  const char git_version_string[] = GIT_VERSION;
> >  const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT;
> >
> > +/*
> > + * Trim and replace each character with ascii code below 32 or above
> > + * 127 (included) using a dot '.' character.
>
> /*
>  * Trim and replace each byte outside ASCII printable
>  * (33 to 127, inclusive) with a dot '.'.
>  */
>
> perhaps?
This sounds confusing, it sounds like the byte we are replacing with dot are
in the range of 33 to 127 whereas, it is those outside these range.
>
> > + * TODO: ensure consecutive non-printable characters are only replaced once
>
> I am not sure what your plans are for this change.  Has the list
> reached the consensus to squish consecutive redaction dots into one
> in the user-agent string?  If not, let's not mention it.  Making an
> incompatible change to the user-agent string is not the primary aim
> of this topic anyway.
>
> > +*/
>
> Funny indentation.  The asterisk should have a SP before it, just
> like on the previous lines.
Mistake, thanks for catching it, will make a change to it in the next
patch series.
>
> > +static void redact_non_printables(struct strbuf *buf)
> > +{
> > +     strbuf_trim(buf);
> > +     for (size_t i = 0; i < buf->len; i++) {
> > +             if (buf->buf[i] <= 32 || buf->buf[i] >= 127)
>
> <sane-ctype.h> defines isprint() we can use here.
I think it would be better to add this in another commit so that one commit
does one thing. I will add it after this patch series got settled,
what do you think ?
>
> > +                     buf->buf[i] = '.';
> > +     }
> > +}
>
> Do we want to do anything special when the resulting buf->buf[]
> becomes empty or just full of dots without anything else?  Should
> the caller be told about such a condition, or is it callers'
> responsibility to check if they care?  I am inclined to say that it
> is the latter.
I agreed.
>
> > @@ -27,12 +41,8 @@ const char *git_user_agent_sanitized(void)
> >               struct strbuf buf = STRBUF_INIT;
> >
> >               strbuf_addstr(&buf, git_user_agent());
> > -             strbuf_trim(&buf);
> > -             for (size_t i = 0; i < buf.len; i++) {
> > -                     if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
> > -                             buf.buf[i] = '.';
> > -             }
> > -             agent = buf.buf;
> > +             redact_non_printables(&buf);
> > +             agent = strbuf_detach(&buf, NULL);
> >       }
> >
> >       return agent;
Christian Couder Jan. 21, 2025, 8:12 a.m. UTC | #4
On Mon, Jan 20, 2025 at 6:10 PM Usman Akinyemi
<usmanakinyemi202@gmail.com> wrote:
>
> On Fri, Jan 17, 2025 at 11:56 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

> > > +static void redact_non_printables(struct strbuf *buf)
> > > +{
> > > +     strbuf_trim(buf);
> > > +     for (size_t i = 0; i < buf->len; i++) {
> > > +             if (buf->buf[i] <= 32 || buf->buf[i] >= 127)
> >
> > <sane-ctype.h> defines isprint() we can use here.
> I think it would be better to add this in another commit so that one commit
> does one thing. I will add it after this patch series got settled,
> what do you think ?

Alternatively it could be done in its own preparatory patch at the
beginning of this patch series.

<sane-ctype.h> has:

#define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)

So if we wanted to use isprint() we would have to use something like:

    for (size_t i = 0; i < buf->len; i++) {
            if (!isprint(buf->buf[i]) || buf->buf[i] == ' ')
                    buf->buf[i] = '.';
    }

It would have been nicer if we didn't need a special case for SP. So I
would say it's likely a matter of taste if the result is nicer than
the original.

> >
> > > +                     buf->buf[i] = '.';
> > > +     }
> > > +}
Junio C Hamano Jan. 21, 2025, 6:01 p.m. UTC | #5
Christian Couder <christian.couder@gmail.com> writes:

> On Mon, Jan 20, 2025 at 6:10 PM Usman Akinyemi
> <usmanakinyemi202@gmail.com> wrote:
>>
>> On Fri, Jan 17, 2025 at 11:56 PM Junio C Hamano <gitster@pobox.com> wrote:
>> >
>> > Usman Akinyemi <usmanakinyemi202@gmail.com> writes:
>
>> > > +static void redact_non_printables(struct strbuf *buf)
>> > > +{
>> > > +     strbuf_trim(buf);
>> > > +     for (size_t i = 0; i < buf->len; i++) {
>> > > +             if (buf->buf[i] <= 32 || buf->buf[i] >= 127)
>> >
>> > <sane-ctype.h> defines isprint() we can use here.
>> I think it would be better to add this in another commit so that one commit
>> does one thing. I will add it after this patch series got settled,
>> what do you think ?
>
> Alternatively it could be done in its own preparatory patch at the
> beginning of this patch series.

Yup, a preliminary clean-up sounds fine, but so does a follow-up
after all the dust settles.

Thanks.
diff mbox series

Patch

diff --git a/version.c b/version.c
index 4d763ab48d..78f025c808 100644
--- a/version.c
+++ b/version.c
@@ -6,6 +6,20 @@ 
 const char git_version_string[] = GIT_VERSION;
 const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT;
 
+/*
+ * Trim and replace each character with ascii code below 32 or above
+ * 127 (included) using a dot '.' character.
+ * TODO: ensure consecutive non-printable characters are only replaced once
+*/
+static void redact_non_printables(struct strbuf *buf)
+{
+	strbuf_trim(buf);
+	for (size_t i = 0; i < buf->len; i++) {
+		if (buf->buf[i] <= 32 || buf->buf[i] >= 127)
+			buf->buf[i] = '.';
+	}
+}
+
 const char *git_user_agent(void)
 {
 	static const char *agent = NULL;
@@ -27,12 +41,8 @@  const char *git_user_agent_sanitized(void)
 		struct strbuf buf = STRBUF_INIT;
 
 		strbuf_addstr(&buf, git_user_agent());
-		strbuf_trim(&buf);
-		for (size_t i = 0; i < buf.len; i++) {
-			if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
-				buf.buf[i] = '.';
-		}
-		agent = buf.buf;
+		redact_non_printables(&buf);
+		agent = strbuf_detach(&buf, NULL);
 	}
 
 	return agent;