From patchwork Tue Nov 20 09:46:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10690123 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9188A14BD for ; Tue, 20 Nov 2018 09:46:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 825E92A259 for ; Tue, 20 Nov 2018 09:46:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 76C822A769; Tue, 20 Nov 2018 09:46:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 160492A259 for ; Tue, 20 Nov 2018 09:46:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726917AbeKTUOy (ORCPT ); Tue, 20 Nov 2018 15:14:54 -0500 Received: from cloud.peff.net ([104.130.231.41]:45466 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1726250AbeKTUOx (ORCPT ); Tue, 20 Nov 2018 15:14:53 -0500 Received: (qmail 21328 invoked by uid 109); 20 Nov 2018 09:46:40 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Tue, 20 Nov 2018 09:46:40 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 27020 invoked by uid 111); 20 Nov 2018 09:46:02 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Tue, 20 Nov 2018 04:46:02 -0500 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 20 Nov 2018 04:46:38 -0500 Date: Tue, 20 Nov 2018 04:46:38 -0500 From: Jeff King To: git@vger.kernel.org Cc: Christian Couder , Derrick Stolee Subject: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants Message-ID: <20181120094638.GA22742@sigill.intra.peff.net> References: <20181120094451.GA21725@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20181120094451.GA21725@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Commit 108f530385 (pack-objects: move tree_depth into 'struct packing_data', 2018-08-16) dynamically manages a tree_depth array in packing_data that maintains one of these invariants: 1. tree_depth is NULL (i.e., the requested options don't require us to track tree depths) 2. tree_depth is non-NULL and has as many entries as the "objects" array We maintain (2) by: a. When the objects array grows, grow tree_depth to the same size (unless it's NULL, in which case we can leave it). b. When a caller asks to set a depth via oe_set_tree_depth(), if tree_depth is NULL we allocate it. But in (b), we use the number of stored objects, _not_ the allocated size of the objects array. So we can run into a situation like this: 1. packlist_alloc() needs to store the Nth object, so it grows the objects array to M, where M > N. 2. oe_set_tree_depth() wants to store a depth, so it allocates an array of length N. Now we've violated our invariant. 3. packlist_alloc() needs to store the N+1th object. But it _doesn't_ grow the objects array, since N <= M still holds. We try to assign to tree_depth[N+1], which is out of bounds. That doesn't happen in our test scripts, because the repositories they use are so small, but it's easy to trigger by running: echo HEAD | git pack-objects --revs --delta-islands --stdout >/dev/null in any reasonably-sized repo (like git.git). We can fix it by always growing the array to match pack->nr_alloc, not pack->nr_objects. Likewise for the "layer" array from fe0ac2fb7f (pack-objects: move 'layer' into 'struct packing_data', 2018-08-16), which has the same bug. Signed-off-by: Jeff King --- pack-objects.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pack-objects.h b/pack-objects.h index feb6a6a05e..f31ac1c81c 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -412,7 +412,7 @@ static inline void oe_set_tree_depth(struct packing_data *pack, unsigned int tree_depth) { if (!pack->tree_depth) - ALLOC_ARRAY(pack->tree_depth, pack->nr_objects); + ALLOC_ARRAY(pack->tree_depth, pack->nr_alloc); pack->tree_depth[e - pack->objects] = tree_depth; } @@ -429,7 +429,7 @@ static inline void oe_set_layer(struct packing_data *pack, unsigned char layer) { if (!pack->layer) - ALLOC_ARRAY(pack->layer, pack->nr_objects); + ALLOC_ARRAY(pack->layer, pack->nr_alloc); pack->layer[e - pack->objects] = layer; } From patchwork Tue Nov 20 09:48:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10690125 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 166EA14BD for ; Tue, 20 Nov 2018 09:49:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 057402A4E8 for ; Tue, 20 Nov 2018 09:49:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EDCE92A500; Tue, 20 Nov 2018 09:49:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8EA302A4E8 for ; Tue, 20 Nov 2018 09:49:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727585AbeKTURN (ORCPT ); Tue, 20 Nov 2018 15:17:13 -0500 Received: from cloud.peff.net ([104.130.231.41]:45476 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1726250AbeKTURM (ORCPT ); Tue, 20 Nov 2018 15:17:12 -0500 Received: (qmail 21432 invoked by uid 109); 20 Nov 2018 09:48:58 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Tue, 20 Nov 2018 09:48:58 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 27039 invoked by uid 111); 20 Nov 2018 09:48:21 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Tue, 20 Nov 2018 04:48:21 -0500 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 20 Nov 2018 04:48:57 -0500 Date: Tue, 20 Nov 2018 04:48:57 -0500 From: Jeff King To: git@vger.kernel.org Cc: Christian Couder , Derrick Stolee Subject: [PATCH 2/3] pack-objects: zero-initialize tree_depth/layer arrays Message-ID: <20181120094856.GB22742@sigill.intra.peff.net> References: <20181120094451.GA21725@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20181120094451.GA21725@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Commit 108f530385 (pack-objects: move tree_depth into 'struct packing_data', 2018-08-16) started maintaining a tree_depth array that matches the "objects" array. We extend the array when: 1. The objects array is extended, in which case we use realloc to extend the tree_depth array. 2. A caller asks to store a tree_depth for object N, and this is the first such request; we create the array from scratch and store the value for N. In the latter case, though, we use regular xmalloc(), and the depth values for any objects besides N is undefined. This happens to not trigger a bug with the current code, but the reasons are quite subtle: - we never ask about the depth for any object with index i < N. This is because we store the depth immediately for all trees and blobs. So any such "i" must be a non-tree, and therefore we will never need to care about its depth (in fact, we really only care about the depth of trees). - there are no objects at this point with index i > N, because we always fill in the depth for a tree immediately after its object entry is created (we may still allocate uninitialized depth entries, but they'll be initialized by packlist_alloc() when it initializes the entry in the "objects" array). So it works, but only by chance. To be defensive, let's zero the array, which matches the "unset" values which would be handed out by oe_tree_depth() already. Signed-off-by: Jeff King --- git-compat-util.h | 1 + pack-objects.h | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index f16058182f..09b0102cae 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -861,6 +861,7 @@ extern FILE *fopen_or_warn(const char *path, const char *mode); #define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0) #define ALLOC_ARRAY(x, alloc) (x) = xmalloc(st_mult(sizeof(*(x)), (alloc))) +#define CALLOC_ARRAY(x, alloc) (x) = xcalloc((alloc), sizeof(*(x))); #define REALLOC_ARRAY(x, alloc) (x) = xrealloc((x), st_mult(sizeof(*(x)), (alloc))) #define COPY_ARRAY(dst, src, n) copy_array((dst), (src), (n), sizeof(*(dst)) + \ diff --git a/pack-objects.h b/pack-objects.h index f31ac1c81c..dc869f26c2 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -412,7 +412,7 @@ static inline void oe_set_tree_depth(struct packing_data *pack, unsigned int tree_depth) { if (!pack->tree_depth) - ALLOC_ARRAY(pack->tree_depth, pack->nr_alloc); + CALLOC_ARRAY(pack->tree_depth, pack->nr_alloc); pack->tree_depth[e - pack->objects] = tree_depth; } @@ -429,7 +429,7 @@ static inline void oe_set_layer(struct packing_data *pack, unsigned char layer) { if (!pack->layer) - ALLOC_ARRAY(pack->layer, pack->nr_alloc); + CALLOC_ARRAY(pack->layer, pack->nr_alloc); pack->layer[e - pack->objects] = layer; } From patchwork Tue Nov 20 09:50:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10690127 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 216B714BD for ; Tue, 20 Nov 2018 09:50:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 141632A7CF for ; Tue, 20 Nov 2018 09:50:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 08B3E2A7F0; Tue, 20 Nov 2018 09:50:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D7E512A801 for ; Tue, 20 Nov 2018 09:50:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726569AbeKTUTJ (ORCPT ); Tue, 20 Nov 2018 15:19:09 -0500 Received: from cloud.peff.net ([104.130.231.41]:45484 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1725890AbeKTUTJ (ORCPT ); Tue, 20 Nov 2018 15:19:09 -0500 Received: (qmail 21531 invoked by uid 109); 20 Nov 2018 09:50:54 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Tue, 20 Nov 2018 09:50:54 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 27063 invoked by uid 111); 20 Nov 2018 09:50:17 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Tue, 20 Nov 2018 04:50:17 -0500 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 20 Nov 2018 04:50:53 -0500 Date: Tue, 20 Nov 2018 04:50:53 -0500 From: Jeff King To: git@vger.kernel.org Cc: Christian Couder , Derrick Stolee Subject: [PATCH 3/3] pack-objects: fix off-by-one in delta-island tree-depth computation Message-ID: <20181120095053.GC22742@sigill.intra.peff.net> References: <20181120094451.GA21725@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20181120094451.GA21725@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When delta-islands are in use, we need to record the deepest path at which we find each tree and blob. Our loop to do so counts slashes, so "foo" is depth 0, "foo/bar" is depth 1, and so on. However, this neglects root trees, which are represented by the empty string. Those also have depth 0, but are at a layer above "foo". Thus, "foo" should be 1, "foo/bar" at 2, and so on. We use this depth to topo-sort the trees in resolve_tree_islands(). As a result, we may fail to visit a root tree before the sub-trees it contains, and therefore not correctly pass down the island marks. That in turn could lead to missing some delta opportunities (objects are in the same island, but we didn't realize it) or creating unwanted cross-island deltas (one object is in an island another isn't, but we don't realize). In practice, it seems to have only a small effect. Some experiments on the real-world git/git fork network at GitHub showed an improvement of only 0.14% in the resulting clone size. Signed-off-by: Jeff King --- builtin/pack-objects.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index e7ea206c08..411aefd687 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2786,9 +2786,11 @@ static void show_object(struct object *obj, const char *name, void *data) if (use_delta_islands) { const char *p; - unsigned depth = 0; + unsigned depth; struct object_entry *ent; + /* the empty string is a root tree, which is depth 0 */ + depth = *name ? 1 : 0; for (p = strchr(name, '/'); p; p = strchr(p + 1, '/')) depth++;