From patchwork Thu Apr 18 21:17:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10908059 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2414A13B5 for ; Thu, 18 Apr 2019 21:17:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A8A028ADC for ; Thu, 18 Apr 2019 21:17:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F152928BE2; Thu, 18 Apr 2019 21:17:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 91DC828ADC for ; Thu, 18 Apr 2019 21:17:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388301AbfDRVRE (ORCPT ); Thu, 18 Apr 2019 17:17:04 -0400 Received: from cloud.peff.net ([104.130.231.41]:34404 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1731565AbfDRVRE (ORCPT ); Thu, 18 Apr 2019 17:17:04 -0400 Received: (qmail 5556 invoked by uid 109); 18 Apr 2019 21:17:04 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Thu, 18 Apr 2019 21:17:04 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 4651 invoked by uid 111); 18 Apr 2019 21:17:34 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Thu, 18 Apr 2019 17:17:34 -0400 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Thu, 18 Apr 2019 17:17:02 -0400 Date: Thu, 18 Apr 2019 17:17:02 -0400 From: Jeff King To: Junio C Hamano Cc: Johannes Schindelin via GitGitGadget , git@vger.kernel.org, Johannes Schindelin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy Subject: [PATCH 1/3] untracked-cache: be defensive about missing NULs in index Message-ID: <20190418211701.GA18520@sigill.intra.peff.net> References: <20190418211408.GA18011@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20190418211408.GA18011@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The on-disk format for the untracked-cache extension contains NUL-terminated filenames. We parse these from the mmap'd file using string functions like strlen(). This works fine in the normal case, but if we see a malformed or corrupted index, we might read off the end of our mmap. Instead, let's use memchr() to find the trailing NUL within the bytes we know are available, and return an error if it's missing. Note that we can further simplify by folding another range check into our conditional. After we find the end of the string, we set "next" to the byte after the string and treat it as an error if there are no such bytes left. That saves us from having to do a range check at the beginning of each subsequent string (and works because there is always data after each string). We can do both range checks together by checking "!eos" (we didn't find a NUL) and "eos == end" (it was on the last available byte, meaning there's nothing after). This replaces the existing "next > end" checks. Note also that the decode_varint() calls have a similar problem (we don't even pass them "end"; they just keep parsing). These are probably OK in practice since varints have a finite length (we stop parsing when we'd overflow a uintmax_t), so the worst case is that we'd overflow into reading the trailing bytes of the index. Signed-off-by: Jeff King --- dir.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/dir.c b/dir.c index f5293a6536..7b0513c476 100644 --- a/dir.c +++ b/dir.c @@ -2733,6 +2733,7 @@ static int read_one_dir(struct untracked_cache_dir **untracked_, { struct untracked_cache_dir ud, *untracked; const unsigned char *next, *data = rd->data, *end = rd->end; + const unsigned char *eos; unsigned int value; int i, len; @@ -2756,21 +2757,24 @@ static int read_one_dir(struct untracked_cache_dir **untracked_, ALLOC_ARRAY(ud.dirs, ud.dirs_nr); data = next; - len = strlen((const char *)data); - next = data + len + 1; - if (next > rd->end) + eos = memchr(data, '\0', end - data); + if (!eos || eos == end) return -1; + len = eos - data; + next = eos + 1; + *untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1)); memcpy(untracked, &ud, sizeof(ud)); memcpy(untracked->name, data, len + 1); data = next; for (i = 0; i < untracked->untracked_nr; i++) { - len = strlen((const char *)data); - next = data + len + 1; - if (next > rd->end) + eos = memchr(data, '\0', end - data); + if (!eos || eos == end) return -1; - untracked->untracked[i] = xstrdup((const char*)data); + len = eos - data; + next = eos + 1; + untracked->untracked[i] = xmemdupz(data, len); data = next; } From patchwork Thu Apr 18 21:17:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10908061 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4C2513B5 for ; Thu, 18 Apr 2019 21:17:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8F17D28ADC for ; Thu, 18 Apr 2019 21:17:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 829D128BF5; Thu, 18 Apr 2019 21:17:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2D6F728BE2 for ; Thu, 18 Apr 2019 21:17:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732180AbfDRVRl (ORCPT ); Thu, 18 Apr 2019 17:17:41 -0400 Received: from cloud.peff.net ([104.130.231.41]:34416 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1728531AbfDRVRk (ORCPT ); Thu, 18 Apr 2019 17:17:40 -0400 Received: (qmail 5585 invoked by uid 109); 18 Apr 2019 21:17:40 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Thu, 18 Apr 2019 21:17:40 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 4671 invoked by uid 111); 18 Apr 2019 21:18:11 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Thu, 18 Apr 2019 17:18:11 -0400 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Thu, 18 Apr 2019 17:17:38 -0400 Date: Thu, 18 Apr 2019 17:17:38 -0400 From: Jeff King To: Junio C Hamano Cc: Johannes Schindelin via GitGitGadget , git@vger.kernel.org, Johannes Schindelin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy Subject: [PATCH 2/3] untracked-cache: simplify parsing by dropping "next" Message-ID: <20190418211738.GB18520@sigill.intra.peff.net> References: <20190418211408.GA18011@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20190418211408.GA18011@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When we parse an on-disk untracked cache, we have two pointers, "data" and "next". As we parse, we point "next" to the end of an element, and then later update "data" to match. But we actually don't need two pointers. Each parsing step can just update "data" directly from other variables we hold (and we don't have to worry about bailing in an intermediate state, since any parsing failure causes us to immediately discard "data" and return). Signed-off-by: Jeff King --- dir.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/dir.c b/dir.c index 7b0513c476..17865f44df 100644 --- a/dir.c +++ b/dir.c @@ -2732,50 +2732,44 @@ static int read_one_dir(struct untracked_cache_dir **untracked_, struct read_data *rd) { struct untracked_cache_dir ud, *untracked; - const unsigned char *next, *data = rd->data, *end = rd->end; + const unsigned char *data = rd->data, *end = rd->end; const unsigned char *eos; unsigned int value; int i, len; memset(&ud, 0, sizeof(ud)); - next = data; - value = decode_varint(&next); - if (next > end) + value = decode_varint(&data); + if (data > end) return -1; ud.recurse = 1; ud.untracked_alloc = value; ud.untracked_nr = value; if (ud.untracked_nr) ALLOC_ARRAY(ud.untracked, ud.untracked_nr); - data = next; - next = data; - ud.dirs_alloc = ud.dirs_nr = decode_varint(&next); - if (next > end) + ud.dirs_alloc = ud.dirs_nr = decode_varint(&data); + if (data > end) return -1; ALLOC_ARRAY(ud.dirs, ud.dirs_nr); - data = next; eos = memchr(data, '\0', end - data); if (!eos || eos == end) return -1; len = eos - data; - next = eos + 1; *untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1)); memcpy(untracked, &ud, sizeof(ud)); memcpy(untracked->name, data, len + 1); - data = next; + data = eos + 1; for (i = 0; i < untracked->untracked_nr; i++) { eos = memchr(data, '\0', end - data); if (!eos || eos == end) return -1; len = eos - data; - next = eos + 1; untracked->untracked[i] = xmemdupz(data, len); - data = next; + data = eos + 1; } rd->ucd[rd->index++] = untracked; From patchwork Thu Apr 18 21:18:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10908067 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6032213B5 for ; Thu, 18 Apr 2019 21:18:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 498D028ADC for ; Thu, 18 Apr 2019 21:18:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3A4E628BE2; Thu, 18 Apr 2019 21:18:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D626528ADC for ; Thu, 18 Apr 2019 21:18:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732251AbfDRVSh (ORCPT ); Thu, 18 Apr 2019 17:18:37 -0400 Received: from cloud.peff.net ([104.130.231.41]:34436 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1728531AbfDRVSh (ORCPT ); Thu, 18 Apr 2019 17:18:37 -0400 Received: (qmail 5630 invoked by uid 109); 18 Apr 2019 21:18:37 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Thu, 18 Apr 2019 21:18:37 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 4701 invoked by uid 111); 18 Apr 2019 21:19:08 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Thu, 18 Apr 2019 17:19:08 -0400 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Thu, 18 Apr 2019 17:18:35 -0400 Date: Thu, 18 Apr 2019 17:18:35 -0400 From: Jeff King To: Junio C Hamano Cc: Johannes Schindelin via GitGitGadget , git@vger.kernel.org, Johannes Schindelin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy Subject: [PATCH 3/3] untracked-cache: simplify parsing by dropping "len" Message-ID: <20190418211835.GC18520@sigill.intra.peff.net> References: <20190418211408.GA18011@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20190418211408.GA18011@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The code which parses untracked-cache extensions from disk keeps a "len" variable, which is the size of the string we are parsing. But since we now have an "end of string" variable, we can just use that to get the length when we need it. This eliminates the need to keep "len" up to date (and removes the possibility of any errors where "len" and "eos" get out of sync). As a bonus, it means we are not storing a string length in an "int", which is a potential source of overflows (though in this case it seems fairly unlikely for that to cause any memory problems). Signed-off-by: Jeff King --- dir.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/dir.c b/dir.c index 17865f44df..60438b2cdc 100644 --- a/dir.c +++ b/dir.c @@ -2735,7 +2735,7 @@ static int read_one_dir(struct untracked_cache_dir **untracked_, const unsigned char *data = rd->data, *end = rd->end; const unsigned char *eos; unsigned int value; - int i, len; + int i; memset(&ud, 0, sizeof(ud)); @@ -2756,28 +2756,25 @@ static int read_one_dir(struct untracked_cache_dir **untracked_, eos = memchr(data, '\0', end - data); if (!eos || eos == end) return -1; - len = eos - data; - *untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), len, 1)); + *untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), eos - data, 1)); memcpy(untracked, &ud, sizeof(ud)); - memcpy(untracked->name, data, len + 1); + memcpy(untracked->name, data, eos - data + 1); data = eos + 1; for (i = 0; i < untracked->untracked_nr; i++) { eos = memchr(data, '\0', end - data); if (!eos || eos == end) return -1; - len = eos - data; - untracked->untracked[i] = xmemdupz(data, len); + untracked->untracked[i] = xmemdupz(data, eos - data); data = eos + 1; } rd->ucd[rd->index++] = untracked; rd->data = data; for (i = 0; i < untracked->dirs_nr; i++) { - len = read_one_dir(untracked->dirs + i, rd); - if (len < 0) + if (read_one_dir(untracked->dirs + i, rd) < 0) return -1; } return 0; From patchwork Thu Apr 18 21:24:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10908073 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E836B1515 for ; Thu, 18 Apr 2019 21:24:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C518828CBB for ; Thu, 18 Apr 2019 21:24:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B912D28CD0; Thu, 18 Apr 2019 21:24:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5613B28CBB for ; Thu, 18 Apr 2019 21:24:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732180AbfDRVYH (ORCPT ); Thu, 18 Apr 2019 17:24:07 -0400 Received: from cloud.peff.net ([104.130.231.41]:34444 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1728264AbfDRVYH (ORCPT ); Thu, 18 Apr 2019 17:24:07 -0400 Received: (qmail 5777 invoked by uid 109); 18 Apr 2019 21:24:07 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Thu, 18 Apr 2019 21:24:07 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 4735 invoked by uid 111); 18 Apr 2019 21:24:38 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Thu, 18 Apr 2019 17:24:38 -0400 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Thu, 18 Apr 2019 17:24:05 -0400 Date: Thu, 18 Apr 2019 17:24:05 -0400 From: Jeff King To: Junio C Hamano Cc: Johannes Schindelin via GitGitGadget , git@vger.kernel.org, Johannes Schindelin , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy Subject: [PATCH 4/3] untracked-cache: use FLEX_ALLOC to create internal structs Message-ID: <20190418212405.GA18623@sigill.intra.peff.net> References: <20190410162029.GA30818@sigill.intra.peff.net> <20190418211408.GA18011@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20190418211408.GA18011@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Thu, Apr 18, 2019 at 05:14:08PM -0400, Jeff King wrote: > Just so we don't forget about it, I wrote this fix up as a patch. And in > fact it led to a few other cleanups. I think the first one is definitely > worth doing now, even if there are other similar cases lurking in the > rest of the index code. > > The other two are optional, though I think they are worth it (and not > too hard to verify that they are doing the right thing). > > These are on top of js/untracked-cache-allocfix (though they could > easily be ported to a separate topic if we want). > > [1/3]: untracked-cache: be defensive about missing NULs in index > [2/3]: untracked-cache: simplify parsing by dropping "next" > [3/3]: untracked-cache: simplify parsing by dropping "len" I also wondered if we could just accept the cost of calloc() here and use FLEX_ALLOC to simplify things. That resulted in the patch below, but I didn't include it with the initial 3, because I think it's too subtle/gross for my tastes. -- >8 -- Subject: untracked-cache: use FLEX_ALLOC to create internal structs The untracked_cache_dir struct has a FLEX_ARRAY in it. Let's use FLEX_ALLOC_MEM to allocate it, which saves us having to compute the length ourselves. In theory this could be slightly slower, since the FLEX_ALLOC macros use calloc (and we just memcpy over most of the contents anyway). But in practice this distinction is not generally measurable. Note that because we then fill in the pre-flex elements of the struct using a memcpy, we need to take care to use the exact size of that space and _not_ "sizeof(ud)", since the latter may include padding (or even an extra byte on systems where FLEX_ARRAY is 1). Signed-off-by: Jeff King --- If we wanted to go this route, I think it would make sense to provide a FLEX_ALLOC macro that takes a "template" set of bytes as a ptr/len pair, and writes it before we fill in the flex portion. Then we could do something like: FLEX_ALLOC_COPY(untracked, &ud, sizeof(ud), name, data, eos - data); If this is the only such case, it's probably not worth it (I didn't really look around for more, though). dir.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/dir.c b/dir.c index 60438b2cdc..7cd4eec198 100644 --- a/dir.c +++ b/dir.c @@ -2757,9 +2757,9 @@ static int read_one_dir(struct untracked_cache_dir **untracked_, if (!eos || eos == end) return -1; - *untracked_ = untracked = xmalloc(st_add3(sizeof(*untracked), eos - data, 1)); - memcpy(untracked, &ud, sizeof(ud)); - memcpy(untracked->name, data, eos - data + 1); + FLEX_ALLOC_MEM(untracked, name, data, eos - data); + memcpy(untracked, &ud, offsetof(struct untracked_cache_dir, name)); + *untracked_ = untracked; data = eos + 1; for (i = 0; i < untracked->untracked_nr; i++) {