From patchwork Tue Apr 18 19:18:43 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13216117
Date: Tue, 18 Apr 2023 15:18:43 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Chris Torek, Junio C Hamano
Subject: [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()`
Message-ID: <6658b231a906dde6acbe7ce156da693ef7dc40e6.1681845518.git.me@ttaylorr.com>

Introduce a variant of the `string_list_split_in_place()` function that takes a string of accepted delimiters.

In contrast to its cousin `string_list_split_in_place()`, which splits the given string at every instance of the single character `delim`, the `_multi` variant splits the given string at any character appearing in the string `delim`.

Like `strtok()`, the `_multi` variant skips past sequential delimiting characters. For example:

    string_list_split_in_place_multi(&xs, xstrdup("foo::bar::baz"), ":", -1);

would place in `xs` the elements "foo", "bar", and "baz".

Instead of using `strchr(3)` to locate the first occurrence of the given delimiter character, `string_list_split_in_place_multi()` uses `strspn(3)` to skip past any run of delimiting characters, and `strcspn(3)` to find the length of the following segment, which contains no characters from the delimiting set.

When only a single delimiting character is provided, `strcspn(3)` should perform on par with `strchr(3)`: the generic `strcspn()` implementations in both glibc[1] and musl[2] treat an empty delimiter set or a single-character one as a special case and fall back to calling `strchrnul()`.

Since the `_multi` variant is a generalization of the original implementation, reimplement `string_list_split_in_place()` in terms of the more general function by providing a single-character string as the list of accepted delimiters (and without skipping runs of delimiters, so its behavior is unchanged).

To avoid regressions, update t0063 in this patch as well. Any "common" test cases (i.e., those that produce the same result whether you call `string_list_split()` or `string_list_split_in_place_multi()`) are grouped into a loop which is parameterized over the function to test. Any cases which aren't common (of which there is one existing case, and a handful of new ones added which are specific to the `_multi` variant) are tested independently.

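For illustration only, the splitting loop described above can be sketched as standalone C. This is a minimal sketch, not git's code: `struct parts`, `parts_append()`, `split_in_place()` and `MAX_PARTS` are hypothetical names, and a fixed-size array of pointers stands in for `struct string_list`. The `runs` flag plays the role of the `runs` parameter of the patch's `string_list_split_in_place_1()`: `strspn(3)` skips a run of delimiters, and `strcspn(3)` finds where the current element ends.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for git's string_list: a fixed-size array of
 * pointers into the caller's (writable) buffer. Not git's API.
 */
#define MAX_PARTS 16

struct parts {
	char *items[MAX_PARTS];
	int nr;
};

static void parts_append(struct parts *parts, char *s)
{
	assert(parts->nr < MAX_PARTS); /* this sketch never grows the array */
	parts->items[parts->nr++] = s;
}

/*
 * Split "string" in place at any byte found in "delim". When "runs" is
 * non-zero, a run of delimiters before each element is skipped with
 * strspn(); otherwise every delimiter ends an element, so adjacent
 * delimiters yield empty elements. A negative "maxsplit" means "no
 * limit"; otherwise the remainder after "maxsplit" splits is kept
 * whole. Returns the number of elements appended.
 */
static int split_in_place(struct parts *parts, char *string,
			  const char *delim, int maxsplit, int runs)
{
	int count = 0;
	char *p = string;

	for (;;) {
		char *end;

		if (runs)
			p += strspn(p, delim);

		count++;
		if (maxsplit >= 0 && count > maxsplit) {
			parts_append(parts, p);
			return count;
		}

		/* strcspn() returns the length up to the next delimiter (or NUL) */
		end = p + strcspn(p, delim);
		if (*end) {
			*end = '\0';
			parts_append(parts, p);
			p = end + 1;
		} else {
			parts_append(parts, p);
			return count;
		}
	}
}
```

Splitting a writable copy of "foo::bar::baz" on ":" with `runs` set yields "foo", "bar" and "baz"; with `runs` cleared it yields five elements, with an empty string wherever two colons are adjacent. One consequence of the loop's shape, unlike `strtok()`: a trailing delimiter still produces a final empty element even when runs are skipped, so "foo:" splits into "foo" and "".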
[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35 [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11 Signed-off-by: Taylor Blau --- string-list.c | 26 +++++++-- string-list.h | 7 +++ t/helper/test-string-list.c | 15 ++++++ t/t0063-string-list.sh | 105 +++++++++++++++++++++++++----------- 4 files changed, 118 insertions(+), 35 deletions(-) diff --git a/string-list.c b/string-list.c index db473f273e..b27a53f2e1 100644 --- a/string-list.c +++ b/string-list.c @@ -300,8 +300,9 @@ int string_list_split(struct string_list *list, const char *string, } } -int string_list_split_in_place(struct string_list *list, char *string, - int delim, int maxsplit) +static int string_list_split_in_place_1(struct string_list *list, char *string, + const char *delim, int maxsplit, + unsigned runs) { int count = 0; char *p = string, *end; @@ -310,13 +311,16 @@ int string_list_split_in_place(struct string_list *list, char *string, die("internal error in string_list_split_in_place(): " "list->strdup_strings must not be set"); for (;;) { + if (runs) + p += strspn(p, delim); + count++; if (maxsplit >= 0 && count > maxsplit) { string_list_append(list, p); return count; } - end = strchr(p, delim); - if (end) { + end = p + strcspn(p, delim); + if (end && *end) { *end = '\0'; string_list_append(list, p); p = end + 1; @@ -326,3 +330,17 @@ int string_list_split_in_place(struct string_list *list, char *string, } } } + +int string_list_split_in_place_multi(struct string_list *list, char *string, + const char *delim, int maxsplit) +{ + return string_list_split_in_place_1(list, string, delim, maxsplit, 1); +} + +int string_list_split_in_place(struct string_list *list, char *string, + int delim, int maxsplit) +{ + char delim_s[2] = { delim, 0 }; + + return string_list_split_in_place_1(list, string, delim_s, maxsplit, 0); +} diff --git a/string-list.h b/string-list.h index c7b0d5d000..f01bbb0bb6 100644 --- a/string-list.h +++ 
b/string-list.h @@ -268,7 +268,14 @@ int string_list_split(struct string_list *list, const char *string, * new string_list_items point into string (which therefore must not * be modified or freed while the string_list is in use). * list->strdup_strings must *not* be set. + * + * The "_multi" variant splits the given string on any character + * appearing in "delim", and the non-"_multi" variant splits only on the + * given character. The "_multi" variant behaves like `strtok()` where + * no element contains the delimiting byte(s). */ +int string_list_split_in_place_multi(struct string_list *list, char *string, + const char *delim, int maxsplit); int string_list_split_in_place(struct string_list *list, char *string, int delim, int maxsplit); #endif /* STRING_LIST_H */ diff --git a/t/helper/test-string-list.c b/t/helper/test-string-list.c index 2123dda85b..119bc9e1c9 100644 --- a/t/helper/test-string-list.c +++ b/t/helper/test-string-list.c @@ -73,6 +73,21 @@ int cmd__string_list(int argc, const char **argv) return 0; } + if (argc == 5 && !strcmp(argv[1], "split_in_place_multi")) { + struct string_list list = STRING_LIST_INIT_NODUP; + int i; + char *s = xstrdup(argv[2]); + const char *delim = argv[3]; + int maxsplit = atoi(argv[4]); + + i = string_list_split_in_place_multi(&list, s, delim, maxsplit); + printf("%d\n", i); + write_list(&list); + string_list_clear(&list, 0); + free(s); + return 0; + } + if (argc == 4 && !strcmp(argv[1], "filter")) { /* * Retain only the items that have the specified prefix. 
diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh index 46d4839194..9c5094616a 100755 --- a/t/t0063-string-list.sh +++ b/t/t0063-string-list.sh @@ -18,42 +18,53 @@ test_split () { " } -test_split "foo:bar:baz" ":" "-1" <expected && + test_expect_success "split_in_place_multi $1 at $2, max $3" " + test-tool string-list split_in_place_multi '$1' '$2' '$3' >actual && + test_cmp expected actual + " +} -test_split "foo:bar:baz" ":" "0" <

X-Patchwork-Id: 13216118
Date: Tue, 18 Apr 2023 15:18:46 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Chris Torek, Junio C Hamano
Subject: [PATCH v2 2/6] string-list: introduce `string_list_setlen()`
Message-ID: <2a20ad8bc5513aae912c53a294092ee5087e1873.1681845518.git.me@ttaylorr.com>

It is sometimes useful to reduce the size of a `string_list`'s list of items without having to re-allocate them. For example, doing the following:

    struct strbuf buf = STRBUF_INIT;
    struct string_list parts = STRING_LIST_INIT_NODUP;
    while (strbuf_getline(&buf, stdin) != EOF) {
      parts.nr = 0;
      string_list_split_in_place(&parts, buf.buf, ':', -1);
      /* ... */
    }
    string_list_clear(&parts, 0);
    strbuf_release(&buf);

is preferable to calling `string_list_clear()` on every iteration of the loop. This is because `string_list_clear()` frees our existing `items` array, which means that every call to `string_list_split_in_place()` makes the string-list internals re-allocate an array of the same size.

Since in the above example we do not care about the individual parts after processing each line, it is much more efficient to pretend that there aren't any elements in the `string_list` by setting `list->nr` to 0 while leaving the list of elements allocated as-is. This allows `string_list_split_in_place()` to overwrite any existing entries without needing to free and re-allocate them.

However, setting `list->nr` manually is not safe in all instances. There are a couple of cases worth worrying about:

  - If the `string_list` is initialized with `strdup_strings`, truncating the list can lead to overwriting strings which are allocated elsewhere.
    If there are no pointers to those strings other than the ones inside the `items` array, they will become unreachable and leak. (We could free the truncated items, from `list->items[nr]` up to but not including index `list->nr`, ourselves, but no present or future callers would benefit from this additional complexity.)

  - If the given `nr` is larger than the current value of `list->nr`, we'll trick the `string_list` into a state where it thinks there are more items allocated than there actually are, which can lead to undefined behavior if we try to read or write those entries.

Guard against both of these by introducing a helper function that checks each of the above conditions before assigning to `list->nr`.

Co-authored-by: Jeff King
Signed-off-by: Jeff King
Signed-off-by: Taylor Blau
--- string-list.c | 9 +++++++++ string-list.h | 10 ++++++++++ 2 files changed, 19 insertions(+) diff --git a/string-list.c b/string-list.c index b27a53f2e1..f0b3cdae94 100644 --- a/string-list.c +++ b/string-list.c @@ -203,6 +203,15 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c list->nr = list->alloc = 0; } +void string_list_setlen(struct string_list *list, size_t nr) +{ + if (list->strdup_strings) + BUG("cannot setlen a string_list which owns its entries"); + if (nr > list->nr) + BUG("cannot grow a string_list with setlen"); + list->nr = nr; +} + struct string_list_item *string_list_append_nodup(struct string_list *list, char *string) { diff --git a/string-list.h b/string-list.h index f01bbb0bb6..b41ecda6f4 100644 --- a/string-list.h +++ b/string-list.h @@ -134,6 +134,16 @@ typedef void (*string_list_clear_func_t)(void *p, const char *str); /** Call a custom clear function on each util pointer */ void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc); +/* + * Set the length of a string_list to `nr`, provided that (a) `list` + * does not own its own storage, and (b) that `nr` is no larger than + * `list->nr`.
+ * + * Useful when "shrinking" `list` to write over existing entries that + * are no longer used without reallocating. + */ +void string_list_setlen(struct string_list *list, size_t nr); + /** * Apply `func` to each item. If `func` returns nonzero, the * iteration aborts and the return value is propagated.

From patchwork Tue Apr 18 19:18:49 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13216119
Date: Tue, 18 Apr 2023 15:18:49 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Chris Torek, Junio C Hamano
Subject: [PATCH v2 3/6] t/helper/test-hashmap.c: avoid using `strtok()`
Message-ID: <0ae07dec3663d7cbb0f8662c47485c0667a879b9.1681845518.git.me@ttaylorr.com>

Avoid using the non-reentrant `strtok()` to separate the parts of each incoming command. Rather than replacing it with `strtok_r()`, use the friendlier `string_list_split_in_place_multi()`.

Signed-off-by: Taylor Blau
--- t/helper/test-hashmap.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c index 36ff07bd4b..5a3e74a3e5 100644 --- a/t/helper/test-hashmap.c +++ b/t/helper/test-hashmap.c @@ -2,6 +2,7 @@ #include "git-compat-util.h" #include "hashmap.h" #include "strbuf.h" +#include "string-list.h" struct test_entry { @@ -150,6 +151,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds) */ int cmd__hashmap(int argc, const char **argv) { + struct string_list parts = STRING_LIST_INIT_NODUP; struct strbuf line = STRBUF_INIT; int icase; struct hashmap map = HASHMAP_INIT(test_entry_cmp, &icase); @@ -159,21 +161,34 @@ int cmd__hashmap(int argc, const char **argv) /* process commands from stdin */ while (strbuf_getline(&line, stdin) != EOF) { - char *cmd, *p1 = NULL, *p2 = NULL; + char *cmd, *p1, *p2; unsigned int hash = 0; struct test_entry *entry; + /* + * Because we memdup() the arguments out of the + * string_list before inserting them into the hashmap, + * it's OK to set its length back to zero to avoid + *
re-allocating the items array once per line. + * + * By doing so, we'll instead overwrite the existing + * entries and avoid re-allocating. + */ + string_list_setlen(&parts, 0); /* break line into command and up to two parameters */ - cmd = strtok(line.buf, DELIM); + string_list_split_in_place_multi(&parts, line.buf, DELIM, 2); + /* ignore empty lines */ - if (!cmd || *cmd == '#') + if (!parts.nr) + continue; + if (!*parts.items[0].string || *parts.items[0].string == '#') continue; - p1 = strtok(NULL, DELIM); - if (p1) { + cmd = parts.items[0].string; + p1 = parts.nr > 1 ? parts.items[1].string : NULL; + p2 = parts.nr > 2 ? parts.items[2].string : NULL; + if (p1) hash = icase ? strihash(p1) : strhash(p1); - p2 = strtok(NULL, DELIM); - } if (!strcmp("add", cmd) && p1 && p2) { @@ -260,6 +275,7 @@ int cmd__hashmap(int argc, const char **argv) } } + string_list_clear(&parts, 0); strbuf_release(&line); hashmap_clear_and_free(&map, struct test_entry, ent); return 0;

From patchwork Tue Apr 18 19:18:52 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13216120
Date: Tue, 18 Apr 2023 15:18:52 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Chris Torek, Junio C Hamano
Subject: [PATCH v2 4/6] t/helper/test-oidmap.c: avoid using `strtok()`

Apply the same treatment as in the previous commit to remove usage of `strtok()` from the "oidmap" test helper.

Signed-off-by: Taylor Blau
--- t/helper/test-oidmap.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/t/helper/test-oidmap.c b/t/helper/test-oidmap.c index a7b7b38df1..bca2ca0e06 100644 --- a/t/helper/test-oidmap.c +++ b/t/helper/test-oidmap.c @@ -4,6 +4,7 @@ #include "oidmap.h" #include "setup.h" #include "strbuf.h" +#include "string-list.h" /* key is an oid and value is a name (could be a refname for example) */ struct test_entry { @@ -25,6 +26,7 @@ struct test_entry { */ int cmd__oidmap(int argc UNUSED, const char **argv UNUSED) { + struct string_list parts = STRING_LIST_INIT_NODUP; struct strbuf line = STRBUF_INIT; struct oidmap map = OIDMAP_INIT; @@ -35,19 +37,24 @@ int cmd__oidmap(int argc UNUSED, const char **argv UNUSED) /* process commands from stdin */ while (strbuf_getline(&line, stdin) != EOF) { - char *cmd, *p1 = NULL, *p2 = NULL; + char *cmd, *p1, *p2; struct test_entry *entry; struct object_id oid; + /* see the comment in cmd__hashmap() */ + string_list_setlen(&parts, 0); /* break line into command and up to two parameters */ - cmd = strtok(line.buf, DELIM); + string_list_split_in_place_multi(&parts, line.buf, DELIM, 2); + /* ignore empty lines */ - if (!cmd || *cmd == '#') + if (!parts.nr) + continue; + if (!*parts.items[0].string ||
*parts.items[0].string == '#') continue; - p1 = strtok(NULL, DELIM); - if (p1) - p2 = strtok(NULL, DELIM); + cmd = parts.items[0].string; + p1 = parts.nr > 1 ? parts.items[1].string : NULL; + p2 = parts.nr > 2 ? parts.items[2].string : NULL; if (!strcmp("put", cmd) && p1 && p2) { @@ -108,6 +115,7 @@ int cmd__oidmap(int argc UNUSED, const char **argv UNUSED) } } + string_list_clear(&parts, 0); strbuf_release(&line); oidmap_free(&map, 1); return 0;

From patchwork Tue Apr 18 19:18:55 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13216121
Date: Tue, 18 Apr 2023 15:18:55 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Chris Torek, Junio C Hamano
Subject: [PATCH v2 5/6] t/helper/test-json-writer.c: avoid using `strtok()`

Apply the same treatment as in the previous commits to remove usage of `strtok()` from the "json-writer" test helper.

Each of the different commands that the "json-writer" helper accepts pops the next space-delimited token from the current line and interprets it as a string, integer, or double (with the exception of the very first token, which is the command itself).

To accommodate this, split the line in place by the space character, and pass the resulting string_list to each of the specialized `get_s()`, `get_i()`, and `get_d()` functions.

`get_i()` and `get_d()` are thin wrappers around `get_s()` that convert its result into the appropriate type by calling `strtol()` or `strtod()`, respectively.

In `get_s()`, we mark the token as "consumed" by incrementing the `consumed_nr` counter, which records how many tokens we have read up to that point.

Because each of these functions needs the string-list parts, the number of tokens consumed, and the line number, these three are wrapped up into a `struct line` representing the line state.

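A stripped-down sketch of such a token cursor may help. It is not the helper's code: `struct line_state`, `next_token()` and `next_int()` are hypothetical, a plain `char **` array stands in for the `string_list`, and errors are reported through return values instead of `die()`. The key detail is the bounds check: once `consumed_nr` equals the number of tokens there is nothing left to read, so the comparison has to be `>=`. Unlike `get_i()` as shown in the diff, the sketch also clears `errno` before calling `strtoimax()`, so a stale `ERANGE` left by an earlier call cannot cause a spurious failure.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical cut-down line state, loosely modeled on the helper's
 * `struct line`: an array of tokens plus a read cursor.
 */
struct line_state {
	char **parts;       /* tokens produced by splitting the line */
	size_t parts_nr;    /* how many tokens there are */
	size_t consumed_nr; /* how many tokens have been read so far */
};

/*
 * Return the next unread token, or NULL once every token has been
 * read. The test must be ">=": when consumed_nr == parts_nr, the next
 * index would be one past the last valid token.
 */
static char *next_token(struct line_state *st)
{
	assert(st != NULL);
	if (st->consumed_nr >= st->parts_nr)
		return NULL;
	return st->parts[st->consumed_nr++];
}

/*
 * Read the next token as a base-10 integer, in the spirit of get_i().
 * Returns 0 on success and -1 if the token is missing or malformed.
 */
static int next_int(struct line_state *st, intmax_t *out)
{
	char *s = next_token(st);
	char *endptr;

	if (!s || !*s)
		return -1;
	errno = 0; /* strtoimax() only sets errno on failure */
	*out = strtoimax(s, &endptr, 10);
	if (*endptr || errno == ERANGE)
		return -1;
	return 0;
}
```

For a line split into `object-int`, `answer` and `42`, three reads return the verb, the key and the integer 42, and a fourth read returns `NULL` rather than reaching past the end of the array.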
Signed-off-by: Taylor Blau
---
 t/helper/test-json-writer.c | 76 +++++++++++++++++++++++--------------
 1 file changed, 48 insertions(+), 28 deletions(-)

diff --git a/t/helper/test-json-writer.c b/t/helper/test-json-writer.c
index 86887f5320..af0a34aa04 100644
--- a/t/helper/test-json-writer.c
+++ b/t/helper/test-json-writer.c
@@ -1,5 +1,6 @@
 #include "test-tool.h"
 #include "json-writer.h"
+#include "string-list.h"
 
 static const char *expect_obj1 = "{\"a\":\"abc\",\"b\":42,\"c\":true}";
 static const char *expect_obj2 = "{\"a\":-1,\"b\":2147483647,\"c\":0}";
@@ -394,35 +395,41 @@ static int unit_tests(void)
 	return 0;
 }
 
-static void get_s(int line_nr, char **s_in)
+struct line {
+	struct string_list *parts;
+	size_t consumed_nr;
+	int nr;
+};
+
+static void get_s(struct line *line, char **s_in)
 {
-	*s_in = strtok(NULL, " ");
-	if (!*s_in)
-		die("line[%d]: expected: ", line_nr);
+	if (line->consumed_nr >= line->parts->nr)
+		die("line[%d]: expected: ", line->nr);
+	*s_in = line->parts->items[line->consumed_nr++].string;
 }
 
-static void get_i(int line_nr, intmax_t *s_in)
+static void get_i(struct line *line, intmax_t *s_in)
 {
 	char *s;
 	char *endptr;
 
-	get_s(line_nr, &s);
+	get_s(line, &s);
 
 	*s_in = strtol(s, &endptr, 10);
 	if (*endptr || errno == ERANGE)
-		die("line[%d]: invalid integer value", line_nr);
+		die("line[%d]: invalid integer value", line->nr);
 }
 
-static void get_d(int line_nr, double *s_in)
+static void get_d(struct line *line, double *s_in)
 {
 	char *s;
 	char *endptr;
 
-	get_s(line_nr, &s);
+	get_s(line, &s);
 
 	*s_in = strtod(s, &endptr);
 	if (*endptr || errno == ERANGE)
-		die("line[%d]: invalid float value", line_nr);
+		die("line[%d]: invalid float value", line->nr);
 }
 
 static int pretty;
@@ -453,6 +460,7 @@ static char *get_trimmed_line(char *buf, int buf_size)
 
 static int scripted(void)
 {
+	struct string_list parts = STRING_LIST_INIT_NODUP;
 	struct json_writer jw = JSON_WRITER_INIT;
 	char buf[MAX_LINE_LENGTH];
 	char *line;
@@ -470,66 +478,77 @@ static int scripted(void)
 		die("expected first line to be 'object' or 'array'");
 
 	while ((line = get_trimmed_line(buf, MAX_LINE_LENGTH)) != NULL) {
+		struct line state = { 0 };
 		char *verb;
 		char *key;
 		char *s_value;
 		intmax_t i_value;
 		double d_value;
 
-		line_nr++;
+		state.parts = &parts;
+		state.nr = ++line_nr;
 
-		verb = strtok(line, " ");
+		/* see the comment in cmd__hashmap() */
+		string_list_setlen(&parts, 0);
+		/* break line into command and zero or more tokens */
+		string_list_split_in_place(&parts, line, ' ', -1);
+
+		/* ignore empty lines */
+		if (!parts.nr || !*parts.items[0].string)
+			continue;
+
+		verb = parts.items[state.consumed_nr++].string;
 
 		if (!strcmp(verb, "end")) {
 			jw_end(&jw);
 		}
 		else if (!strcmp(verb, "object-string")) {
-			get_s(line_nr, &key);
-			get_s(line_nr, &s_value);
+			get_s(&state, &key);
+			get_s(&state, &s_value);
 			jw_object_string(&jw, key, s_value);
 		}
 		else if (!strcmp(verb, "object-int")) {
-			get_s(line_nr, &key);
-			get_i(line_nr, &i_value);
+			get_s(&state, &key);
+			get_i(&state, &i_value);
 			jw_object_intmax(&jw, key, i_value);
 		}
 		else if (!strcmp(verb, "object-double")) {
-			get_s(line_nr, &key);
-			get_i(line_nr, &i_value);
-			get_d(line_nr, &d_value);
+			get_s(&state, &key);
+			get_i(&state, &i_value);
+			get_d(&state, &d_value);
 			jw_object_double(&jw, key, i_value, d_value);
 		}
 		else if (!strcmp(verb, "object-true")) {
-			get_s(line_nr, &key);
+			get_s(&state, &key);
 			jw_object_true(&jw, key);
 		}
 		else if (!strcmp(verb, "object-false")) {
-			get_s(line_nr, &key);
+			get_s(&state, &key);
 			jw_object_false(&jw, key);
 		}
 		else if (!strcmp(verb, "object-null")) {
-			get_s(line_nr, &key);
+			get_s(&state, &key);
 			jw_object_null(&jw, key);
 		}
 		else if (!strcmp(verb, "object-object")) {
-			get_s(line_nr, &key);
+			get_s(&state, &key);
 			jw_object_inline_begin_object(&jw, key);
 		}
 		else if (!strcmp(verb, "object-array")) {
-			get_s(line_nr, &key);
+			get_s(&state, &key);
 			jw_object_inline_begin_array(&jw, key);
 		}
 		else if (!strcmp(verb, "array-string")) {
-			get_s(line_nr, &s_value);
+			get_s(&state, &s_value);
 			jw_array_string(&jw, s_value);
 		}
 		else if (!strcmp(verb, "array-int")) {
-			get_i(line_nr, &i_value);
+			get_i(&state, &i_value);
 			jw_array_intmax(&jw, i_value);
 		}
 		else if (!strcmp(verb, "array-double")) {
-			get_i(line_nr, &i_value);
-			get_d(line_nr, &d_value);
+			get_i(&state, &i_value);
+			get_d(&state, &d_value);
 			jw_array_double(&jw, i_value, d_value);
 		}
 		else if (!strcmp(verb, "array-true"))
@@ -552,6 +571,7 @@ static int scripted(void)
 	printf("%s\n", jw.json.buf);
 
 	jw_release(&jw);
+	string_list_clear(&parts, 0);
 	return 0;
 }
 

From patchwork Tue Apr 18 19:18:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13216122
Date: Tue, 18 Apr 2023 15:18:59 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Chris Torek, Junio C Hamano
Subject: [PATCH v2 6/6] banned.h: mark `strtok()` as banned
Message-ID: <56d2318a6d0aa17e2d56cd7d7e755adae89f8d99.1681845518.git.me@ttaylorr.com>
X-Mailing-List: git@vger.kernel.org

`strtok_r()` is reentrant, but `strtok()` is not, which means that
using `strtok()` is not thread-safe.

`strtok()` has a couple of drawbacks that make any new instances of it
undesirable. In addition to being thread-unsafe, it also encourages
confusing data flows, where `strtok()` may be called from multiple
functions with NULL as its first argument, making it unclear from the
immediate context which string is being tokenized.

Now that we have removed all instances of `strtok()` from the tree,
let's ban `strtok()` to avoid introducing new ones in the future. If
new callers should arise, they can use one of:

  - `string_list_split_in_place()`,
  - `string_list_split_in_place_multi()`, or
  - `strtok_r()`.

Where appropriate, callers are encouraged to prefer the string_list
functions over `strtok_r()`, since the latter suffers from the same
confusing data-flow problem as `strtok()` does.

But callers may prefer `strtok_r()` when the number of tokens in a
given string is unknown and they want to split and process the tokens
one at a time, so `strtok_r()` is left off the banned.h list.
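As an illustration of the reentrancy point, a small sketch that
tokenizes nested fields ("key=value" items separated by commas) with
`strtok_r()`. `count_pairs()` is a hypothetical helper, and the sketch
assumes a POSIX system where `<string.h>` declares `strtok_r()` once
`_POSIX_C_SOURCE` is defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: count the items in a comma-separated list that
 * have both a key and a value, modifying `list` in place. Both levels
 * are tokenized with strtok_r(), and each level keeps its own save
 * pointer.
 */
static int count_pairs(char *list)
{
	char *outer_save = NULL;
	char *inner_save = NULL;
	char *item;
	int pairs = 0;

	for (item = strtok_r(list, ",", &outer_save); item;
	     item = strtok_r(NULL, ",", &outer_save)) {
		/* Starting a new inner tokenization leaves outer_save intact. */
		char *key = strtok_r(item, "=", &inner_save);
		char *value = strtok_r(NULL, "=", &inner_save);

		if (key && value)
			pairs++;
	}
	return pairs;
}
```

Rewritten with `strtok()`, the inner calls would overwrite the single
hidden state, and the outer loop would stop after the first item, even
in a single-threaded program.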
Signed-off-by: Taylor Blau
---
 banned.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/banned.h b/banned.h
index 6ccf46bc19..dd43ab3178 100644
--- a/banned.h
+++ b/banned.h
@@ -18,6 +18,8 @@
 #define strncpy(x,y,n) BANNED(strncpy)
 #undef strncat
 #define strncat(x,y,n) BANNED(strncat)
+#undef strtok
+#define strtok(x,y) BANNED(strtok)
 
 #undef sprintf
 #undef vsprintf
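A sketch of how such a ban takes effect. In git's banned.h,
`BANNED(func)` is defined as `sorry_##func##_is_a_banned_function`, so a
call such as `strtok(buf, " ")` expands to an identifier that is never
declared and the build fails with an error naming the function. The
standalone file below reproduces that pattern; `first_token()` is a
hypothetical caller, and the sketch assumes a POSIX `strtok_r()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/*
 * Same pattern as git's banned.h: a call to a banned function expands to
 * an identifier that is never declared, so compilation fails and the
 * error message names the offending function.
 */
#define BANNED(func) sorry_##func##_is_a_banned_function

/*
 * Defined after <string.h>, so that the header's own declaration of
 * strtok() is not rewritten by the macro.
 */
#undef strtok
#define strtok(x,y) BANNED(strtok)

/*
 * Hypothetical caller: return the first space-separated token of `buf`,
 * which is modified in place. Writing strtok(buf, " ") here instead would
 * expand to sorry_strtok_is_a_banned_function and fail to compile.
 */
static char *first_token(char *buf)
{
	char *save = NULL;

	/* strtok_r() is not banned, so this still compiles. */
	return strtok_r(buf, " ", &save);
}
```

Because the check happens in the preprocessor, a new `strtok()` caller
is rejected at build time rather than discovered in review.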