From patchwork Tue Sep 10 02:28:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13797732
Message-Id:
In-Reply-To:
References:
Date: Tue, 10 Sep 2024 02:28:26 +0000
Subject: [PATCH 01/30] path-walk: introduce an object walk by path
Fcc: Sent
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

In anticipation of a few planned applications, introduce the most basic form
of a path-walk API. It currently assumes that there are no UNINTERESTING
objects, and does not include any complicated filters. It calls a function
pointer on groups of tree and blob objects as grouped by path. This only
includes objects the first time they are discovered, so an object that
appears at multiple paths will not be included in two batches.

There are many future adaptations that could be made, but they are left for
future updates when consumers are ready to take advantage of those features.

RFC TODO: It would be helpful to create a test-tool that allows printing of
each batch for strong testing.

Signed-off-by: Derrick Stolee
---
 Makefile    |   1 +
 path-walk.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 path-walk.h |  43 ++++++++++
 3 files changed, 279 insertions(+)
 create mode 100644 path-walk.c
 create mode 100644 path-walk.h

diff --git a/Makefile b/Makefile
index deb175a0408..e83f6de9a2c 100644
--- a/Makefile
+++ b/Makefile
@@ -1090,6 +1090,7 @@ LIB_OBJS += parse-options.o
 LIB_OBJS += patch-delta.o
 LIB_OBJS += patch-ids.o
 LIB_OBJS += path.o
+LIB_OBJS += path-walk.o
 LIB_OBJS += pathspec.o
 LIB_OBJS += pkt-line.o
 LIB_OBJS += preload-index.o
diff --git a/path-walk.c b/path-walk.c
new file mode 100644
index 00000000000..2edfa0572e4
--- /dev/null
+++ b/path-walk.c
@@ -0,0 +1,235 @@
+/*
+ * path-walk.c: implementation for path-based walks of the object graph.
+ */
+#include "git-compat-util.h"
+#include "path-walk.h"
+#include "blob.h"
+#include "commit.h"
+#include "dir.h"
+#include "hashmap.h"
+#include "hex.h"
+#include "object.h"
+#include "oid-array.h"
+#include "revision.h"
+#include "string-list.h"
+#include "strmap.h"
+#include "trace2.h"
+#include "tree.h"
+#include "tree-walk.h"
+
+struct type_and_oid_list
+{
+	enum object_type type;
+	struct oid_array oids;
+};
+
+#define TYPE_AND_OID_LIST_INIT { \
+	.type = OBJ_NONE, \
+	.oids = OID_ARRAY_INIT \
+}
+
+struct path_walk_context {
+	/**
+	 * Repeats of data in 'struct path_walk_info' for
+	 * access with fewer characters.
+	 */
+	struct repository *repo;
+	struct rev_info *revs;
+	struct path_walk_info *info;
+
+	/**
+	 * Map a path to a 'struct type_and_oid_list'
+	 * containing the objects discovered at that
+	 * path.
+	 */
+	struct strmap paths_to_lists;
+
+	/**
+	 * Store the current list of paths in a stack, to
+	 * facilitate depth-first-search without recursion.
+	 */
+	struct string_list path_stack;
+};
+
+static int add_children(struct path_walk_context *ctx,
+			const char *base_path,
+			struct object_id *oid)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	struct strbuf path = STRBUF_INIT;
+	size_t base_len;
+	struct tree *tree = lookup_tree(ctx->repo, oid);
+
+	if (!tree) {
+		error(_("failed to walk children of tree %s: not found"),
+		      oid_to_hex(oid));
+		return -1;
+	}
+
+	strbuf_addstr(&path, base_path);
+	base_len = path.len;
+
+	parse_tree(tree);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
+	while (tree_entry(&desc, &entry)) {
+		struct type_and_oid_list *list;
+		struct object *o;
+		/* Not actually true, but we will ignore submodules later. */
+		enum object_type type = S_ISDIR(entry.mode) ? OBJ_TREE : OBJ_BLOB;
+
+		/* Skip submodules. */
+		if (S_ISGITLINK(entry.mode))
+			continue;
+
+		if (type == OBJ_TREE) {
+			struct tree *child = lookup_tree(ctx->repo, &entry.oid);
+			o = child ? &child->object : NULL;
+		} else if (type == OBJ_BLOB) {
+			struct blob *child = lookup_blob(ctx->repo, &entry.oid);
+			o = child ? &child->object : NULL;
+		} else {
+			/* Wrong type? */
+			continue;
+		}
+
+		if (!o) /* report error?*/
+			continue;
+
+		/* Skip this object if already seen. */
+		if (o->flags & SEEN)
+			continue;
+		o->flags |= SEEN;
+
+		strbuf_setlen(&path, base_len);
+		strbuf_add(&path, entry.path, entry.pathlen);
+
+		/*
+		 * Trees will end with "/" for concatenation and distinction
+		 * from blobs at the same path.
+		 */
+		if (type == OBJ_TREE)
+			strbuf_addch(&path, '/');
+
+		if (!(list = strmap_get(&ctx->paths_to_lists, path.buf))) {
+			CALLOC_ARRAY(list, 1);
+			list->type = type;
+			strmap_put(&ctx->paths_to_lists, path.buf, list);
+			string_list_append(&ctx->path_stack, path.buf);
+		}
+		oid_array_append(&list->oids, &entry.oid);
+	}
+
+	free_tree_buffer(tree);
+	strbuf_release(&path);
+	return 0;
+}
+
+/*
+ * For each path in paths_to_explore, walk the trees another level
+ * and add any found blobs to the batch (but only if they don't
+ * exist and haven't been added yet).
+ */
+static int walk_path(struct path_walk_context *ctx,
+		     const char *path)
+{
+	struct type_and_oid_list *list;
+	int ret = 0;
+
+	list = strmap_get(&ctx->paths_to_lists, path);
+
+	/* Evaluate function pointer on this data. */
+	ret = ctx->info->path_fn(path, &list->oids, list->type,
+				 ctx->info->path_fn_data);
+
+	/* Expand data for children. */
+	if (list->type == OBJ_TREE) {
+		for (size_t i = 0; i < list->oids.nr; i++) {
+			ret |= add_children(ctx,
+					    path,
+					    &list->oids.oid[i]);
+		}
+	}
+
+	oid_array_clear(&list->oids);
+	strmap_remove(&ctx->paths_to_lists, path, 1);
+	return ret;
+}
+
+static void clear_strmap(struct strmap *map)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	hashmap_for_each_entry(&map->map, &iter, e, ent) {
+		struct type_and_oid_list *list = e->value;
+		oid_array_clear(&list->oids);
+	}
+	strmap_clear(map, 1);
+	strmap_init(map);
+}
+
+/**
+ * Given the configuration of 'info', walk the commits based on 'info->revs' and
+ * call 'info->path_fn' on each discovered path.
+ *
+ * Returns nonzero on an error.
+ */
+int walk_objects_by_path(struct path_walk_info *info)
+{
+	const char *root_path = "";
+	int ret = 0;
+	size_t commits_nr = 0, paths_nr = 0;
+	struct commit *c;
+	struct type_and_oid_list *root_tree_list;
+	struct path_walk_context ctx = {
+		.repo = info->revs->repo,
+		.revs = info->revs,
+		.info = info,
+		.path_stack = STRING_LIST_INIT_DUP,
+		.paths_to_lists = STRMAP_INIT
+	};
+
+	trace2_region_enter("path-walk", "commit-walk", info->revs->repo);
+
+	/* Insert a single list for the root tree into the paths. */
+	CALLOC_ARRAY(root_tree_list, 1);
+	root_tree_list->type = OBJ_TREE;
+	strmap_put(&ctx.paths_to_lists, root_path, root_tree_list);
+
+	if (prepare_revision_walk(info->revs))
+		die(_("failed to setup revision walk"));
+
+	while ((c = get_revision(info->revs))) {
+		struct object_id *oid = get_commit_tree_oid(c);
+		struct tree *t = lookup_tree(info->revs->repo, oid);
+		commits_nr++;
+
+		if (t)
+			oid_array_append(&root_tree_list->oids, oid);
+		else
+			warning("could not find tree %s", oid_to_hex(oid));
+	}
+
+	trace2_data_intmax("path-walk", ctx.repo, "commits", commits_nr);
+	trace2_region_leave("path-walk", "commit-walk", info->revs->repo);
+
+	string_list_append(&ctx.path_stack, root_path);
+
+	trace2_region_enter("path-walk", "path-walk", info->revs->repo);
+	while (!ret && ctx.path_stack.nr) {
+		char *path = ctx.path_stack.items[ctx.path_stack.nr - 1].string;
+		ctx.path_stack.nr--;
+		paths_nr++;
+
+		ret = walk_path(&ctx, path);
+
+		free(path);
+	}
+	trace2_data_intmax("path-walk", ctx.repo, "paths", paths_nr);
+	trace2_region_leave("path-walk", "path-walk", info->revs->repo);
+
+	clear_strmap(&ctx.paths_to_lists);
+	string_list_clear(&ctx.path_stack, 0);
+	return ret;
+}
diff --git a/path-walk.h b/path-walk.h
new file mode 100644
index 00000000000..c9e94a98bc8
--- /dev/null
+++ b/path-walk.h
@@ -0,0 +1,43 @@
+/*
+ * path-walk.h : Methods and structures for walking the object graph in batches
+ * by the paths that can reach those objects.
+ */
+#include "object.h" /* Required for 'enum object_type'. */
+
+struct rev_info;
+struct oid_array;
+
+/**
+ * The type of a function pointer for the method that is called on a list of
+ * objects reachable at a given path.
+ */
+typedef int (*path_fn)(const char *path,
+		       struct oid_array *oids,
+		       enum object_type type,
+		       void *data);
+
+struct path_walk_info {
+	/**
+	 * revs provides the definitions for the commit walk, including
+	 * which commits are UNINTERESTING or not.
+	 */
+	struct rev_info *revs;
+
+	/**
+	 * The caller wishes to execute custom logic on objects reachable at a
+	 * given path. Every reachable object will be visited exactly once, and
+	 * the first path to see an object wins. This may not be a stable choice.
+	 */
+	path_fn path_fn;
+	void *path_fn_data;
+};
+
+#define PATH_WALK_INFO_INIT { 0 }
+
+/**
+ * Given the configuration of 'info', walk the commits based on 'info->revs' and
+ * call 'info->path_fn' on each discovered path.
+ *
+ * Returns nonzero on an error.
+ */
+int walk_objects_by_path(struct path_walk_info *info);

From patchwork Tue Sep 10 02:28:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13797733
Message-Id: <41c49bba131ed014cc4f7ab579313527dd4c9d29.1725935335.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Tue, 10 Sep 2024 02:28:27 +0000
Subject: [PATCH 02/30] backfill: add builtin boilerplate
Fcc: Sent
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin.

RFC TODO: When preparing this for a full implementation, make sure it is
based on the newest standards introduced by [1].

[1] https://lore.kernel.org/git/xmqqjzfq2f0f.fsf@gitster.g/T/#m606036ea2e75a6d6819d6b5c90e729643b0ff7f7
    [PATCH 1/3] builtin: add a repository parameter for builtin functions

Signed-off-by: Derrick Stolee
---
 .gitignore                     |  1 +
 Documentation/git-backfill.txt | 23 +++++++++++++++++++++++
 Makefile                       |  1 +
 builtin.h                      |  1 +
 builtin/backfill.c             | 29 +++++++++++++++++++++++++++++
 command-list.txt               |  1 +
 git.c                          |  1 +
 7 files changed, 57 insertions(+)
 create mode 100644 Documentation/git-backfill.txt
 create mode 100644 builtin/backfill.c

diff --git a/.gitignore b/.gitignore
index 8caf3700c23..8f5cb938ecb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -19,6 +19,7 @@
 /git-apply
 /git-archimport
 /git-archive
+/git-backfill
 /git-bisect
 /git-blame
 /git-branch
diff --git a/Documentation/git-backfill.txt b/Documentation/git-backfill.txt
new file mode 100644
index 00000000000..640144187d3
--- /dev/null
+++ b/Documentation/git-backfill.txt
@@ -0,0 +1,23 @@
+git-backfill(1)
+===============
+
+NAME
+----
+git-backfill - Download missing objects in a partial clone
+
+
+SYNOPSIS
+--------
+[verse]
+'git backfill' [<options>]
+
+DESCRIPTION
+-----------
+
+SEE ALSO
+--------
+linkgit:git-clone[1].
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index e83f6de9a2c..4305474d96e 100644
--- a/Makefile
+++ b/Makefile
@@ -1198,6 +1198,7 @@ BUILTIN_OBJS += builtin/am.o
 BUILTIN_OBJS += builtin/annotate.o
 BUILTIN_OBJS += builtin/apply.o
 BUILTIN_OBJS += builtin/archive.o
+BUILTIN_OBJS += builtin/backfill.o
 BUILTIN_OBJS += builtin/bisect.o
 BUILTIN_OBJS += builtin/blame.o
 BUILTIN_OBJS += builtin/branch.o
diff --git a/builtin.h b/builtin.h
index 14fa0171607..73dd0ccbe8c 100644
--- a/builtin.h
+++ b/builtin.h
@@ -127,6 +127,7 @@ int cmd_am(int argc, const char **argv, const char *prefix);
 int cmd_annotate(int argc, const char **argv, const char *prefix);
 int cmd_apply(int argc, const char **argv, const char *prefix);
 int cmd_archive(int argc, const char **argv, const char *prefix);
+int cmd_backfill(int argc, const char **argv, const char *prefix);
 int cmd_bisect(int argc, const char **argv, const char *prefix);
 int cmd_blame(int argc, const char **argv, const char *prefix);
 int cmd_branch(int argc, const char **argv, const char *prefix);
diff --git a/builtin/backfill.c b/builtin/backfill.c
new file mode 100644
index 00000000000..77b05a2f838
--- /dev/null
+++ b/builtin/backfill.c
@@ -0,0 +1,29 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+#include "repository.h"
+#include "object.h"
+
+static const char * const builtin_backfill_usage[] = {
+	N_("git backfill [<options>]"),
+	NULL
+};
+
+int cmd_backfill(int argc, const char **argv, const char *prefix)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_backfill_usage, options);
+
+	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
+			     0);
+
+	git_config(git_default_config, NULL);
+
+	die(_("not implemented"));
+
+	return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index e0bb87b3b5c..c537114b468 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -60,6 +60,7 @@ git-annotate                            ancillaryinterrogators
 git-apply                               plumbingmanipulators            complete
 git-archimport                          foreignscminterface
 git-archive                             mainporcelain
+git-backfill                            mainporcelain           history
 git-bisect                              mainporcelain           info
 git-blame                               ancillaryinterrogators          complete
 git-branch                              mainporcelain           history
diff --git a/git.c b/git.c
index 9a618a2740f..4f2215e9c8b 100644
--- a/git.c
+++ b/git.c
@@ -509,6 +509,7 @@ static struct cmd_struct commands[] = {
 	{ "annotate", cmd_annotate, RUN_SETUP },
 	{ "apply", cmd_apply, RUN_SETUP_GENTLY },
 	{ "archive", cmd_archive, RUN_SETUP_GENTLY },
+	{ "backfill", cmd_backfill, RUN_SETUP },
 	{ "bisect", cmd_bisect, RUN_SETUP },
 	{ "blame", cmd_blame, RUN_SETUP },
 	{ "branch", cmd_branch, RUN_SETUP | DELAY_PAGER_CONFIG },

From patchwork Tue Sep 10 02:28:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13797735
Message-Id:
In-Reply-To:
References:
Date: Tue, 10 Sep 2024 02:28:28 +0000
Subject: [PATCH 03/30] backfill: basic functionality and tests
Fcc: Sent
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.

The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.

Signed-off-by: Derrick Stolee
---
 Documentation/git-backfill.txt |  24 ++++++++
 builtin/backfill.c             | 101 ++++++++++++++++++++++++++++++++-
 t/t5620-backfill.sh            |  97 +++++++++++++++++++++++++++++++
 3 files changed, 219 insertions(+), 3 deletions(-)
 create mode 100755 t/t5620-backfill.sh

diff --git a/Documentation/git-backfill.txt b/Documentation/git-backfill.txt
index 640144187d3..0e10f066fef 100644
--- a/Documentation/git-backfill.txt
+++ b/Documentation/git-backfill.txt
@@ -14,6 +14,30 @@ SYNOPSIS
 DESCRIPTION
 -----------
 
+Blobless partial clones are created using `git clone --filter=blob:none`
+and then configure the local repository such that the Git client avoids
+downloading blob objects unless they are required for a local operation.
+This initially means that the clone and later fetches download reachable
+commits and trees but no blobs. Later operations that change the `HEAD`
+pointer, such as `git checkout` or `git merge`, may need to download
+missing blobs in order to complete their operation.
+
+In the worst cases, commands that compute blob diffs, such as `git blame`,
+become very slow as they download the missing blobs in single-blob
+requests to satisfy the missing object as the Git command needs it. This
+leads to multiple download requests and no ability for the Git server to
+provide delta compression across those objects.
+
+The `git backfill` command provides a way for the user to request that
+Git downloads the missing blobs (with optional filters) such that the
+missing blobs representing historical versions of files can be downloaded
+in batches. The `backfill` command attempts to optimize the request by
+grouping blobs that appear at the same path, hopefully leading to good
+delta compression in the packfile sent by the server.
+
+By default, `git backfill` downloads all blobs reachable from the `HEAD`
+commit. This set can be restricted or expanded using various options.
+
 SEE ALSO
 --------
 linkgit:git-clone[1].
diff --git a/builtin/backfill.c b/builtin/backfill.c
index 77b05a2f838..23d40fc02a2 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -1,16 +1,113 @@
 #include "builtin.h"
+#include "git-compat-util.h"
 #include "config.h"
 #include "parse-options.h"
 #include "repository.h"
+#include "commit.h"
+#include "hex.h"
+#include "tree.h"
+#include "tree-walk.h"
 #include "object.h"
+#include "object-store-ll.h"
+#include "oid-array.h"
+#include "oidset.h"
+#include "promisor-remote.h"
+#include "strmap.h"
+#include "string-list.h"
+#include "revision.h"
+#include "trace2.h"
+#include "progress.h"
+#include "packfile.h"
+#include "path-walk.h"
 
 static const char * const builtin_backfill_usage[] = {
 	N_("git backfill [<options>]"),
 	NULL
 };
 
+struct backfill_context {
+	struct repository *repo;
+	struct oid_array current_batch;
+	size_t batch_size;
+};
+
+static void clear_backfill_context(struct backfill_context *ctx)
+{
+	oid_array_clear(&ctx->current_batch);
+}
+
+static void download_batch(struct backfill_context *ctx)
+{
+	promisor_remote_get_direct(ctx->repo,
+				   ctx->current_batch.oid,
+				   ctx->current_batch.nr);
+	oid_array_clear(&ctx->current_batch);
+
+	/*
+	 * We likely have a new packfile. Add it to the packed list to
+	 * avoid possible duplicate downloads of the same objects.
+	 */
+	reprepare_packed_git(ctx->repo);
+}
+
+static int fill_missing_blobs(const char *path,
+			      struct oid_array *list,
+			      enum object_type type,
+			      void *data)
+{
+	struct backfill_context *ctx = data;
+
+	if (type != OBJ_BLOB)
+		return 0;
+
+	for (size_t i = 0; i < list->nr; i++) {
+		off_t size = 0;
+		struct object_info info = OBJECT_INFO_INIT;
+		info.disk_sizep = &size;
+		if (oid_object_info_extended(the_repository,
+					     &list->oid[i],
+					     &info,
+					     OBJECT_INFO_FOR_PREFETCH) ||
+		    !size)
+			oid_array_append(&ctx->current_batch, &list->oid[i]);
+	}
+
+	if (ctx->current_batch.nr >= ctx->batch_size)
+		download_batch(ctx);
+
+	return 0;
+}
+
+static int do_backfill(struct backfill_context *ctx)
+{
+	struct rev_info revs;
+	struct path_walk_info info = PATH_WALK_INFO_INIT;
+	int ret;
+
+	repo_init_revisions(ctx->repo, &revs, "");
+	handle_revision_arg("HEAD", &revs, 0, 0);
+
+	info.revs = &revs;
+	info.path_fn = fill_missing_blobs;
+	info.path_fn_data = ctx;
+
+	ret = walk_objects_by_path(&info);
+
+	/* Download the objects that did not fill a batch. */
+	if (!ret)
+		download_batch(ctx);
+
+	clear_backfill_context(ctx);
+	return ret;
+}
+
 int cmd_backfill(int argc, const char **argv, const char *prefix)
 {
+	struct backfill_context ctx = {
+		.repo = the_repository,
+		.current_batch = OID_ARRAY_INIT,
+		.batch_size = 16000,
+	};
 	struct option options[] = {
 		OPT_END(),
 	};
@@ -23,7 +120,5 @@ int cmd_backfill(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
-	die(_("not implemented"));
-
-	return 0;
+	return do_backfill(&ctx);
 }
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
new file mode 100755
index 00000000000..43868a4a75f
--- /dev/null
+++ b/t/t5620-backfill.sh
@@ -0,0 +1,97 @@
+#!/bin/sh
+
+test_description='git backfill on partial clones'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=0
+export TEST_PASSES_SANITIZE_LEAK
+
+. ./test-lib.sh
+
+# We create objects in the 'src' repo.
+test_expect_success 'setup repo for object creation' '
+	echo "{print \$1}" >print_1.awk &&
+	echo "{print \$2}" >print_2.awk &&
+
+	git init src &&
+
+	mkdir -p src/a/b/c &&
+	mkdir -p src/d/e &&
+
+	for i in 1 2
+	do
+		for n in 1 2 3 4
+		do
+			echo "Version $i of file $n" > src/file.$n.txt &&
+			echo "Version $i of file a/$n" > src/a/file.$n.txt &&
+			echo "Version $i of file a/b/$n" > src/a/b/file.$n.txt &&
+			echo "Version $i of file a/b/c/$n" > src/a/b/c/file.$n.txt &&
+			echo "Version $i of file d/$n" > src/d/file.$n.txt &&
+			echo "Version $i of file d/e/$n" > src/d/e/file.$n.txt &&
+			git -C src add . &&
+			git -C src commit -m "Iteration $n" || return 1
+		done
+	done
+'
+
+# Clone 'src' into 'srv.bare' so we have a bare repo to be our origin
+# server for the partial clone.
+test_expect_success 'setup bare clone for server' '
+	git clone --bare "file://$(pwd)/src" srv.bare &&
+	git -C srv.bare config --local uploadpack.allowfilter 1 &&
+	git -C srv.bare config --local uploadpack.allowanysha1inwant 1
+'
+
+# do basic partial clone from "srv.bare"
+test_expect_success 'do partial clone 1, backfill gets all objects' '
+	git clone --no-checkout --filter=blob:none \
+		--single-branch --branch=main \
+		"file://$(pwd)/srv.bare" backfill1 &&
+
+	# Backfill with no options gets everything reachable from HEAD.
+	GIT_TRACE2_EVENT="$(pwd)/backfill-file-trace" git \
+		-C backfill1 backfill &&
+
+	# We should have engaged the partial clone machinery
+	test_trace2_data promisor fetch_count 48 revs2 &&
+	test_line_count = 0 revs2
+'
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'create a partial clone over HTTP' '
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	rm -rf "$SERVER" repo &&
+	git clone --bare "file://$(pwd)/src" "$SERVER" &&
+	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
+
+	git clone --no-checkout --filter=blob:none \
+		"$HTTPD_URL/smart/server" backfill-http
+'
+
+test_expect_success 'backfilling over HTTP succeeds' '
+	GIT_TRACE2_EVENT="$(pwd)/backfill-http-trace" git \
+		-C backfill-http backfill &&
+
+	# We should have engaged the partial clone machinery
+	test_trace2_data promisor fetch_count 48 rev-list-out &&
+	awk "{print \$1;}" <rev-list-out >oids &&
+	GIT_TRACE2_EVENT="$(pwd)/walk-trace" git -C backfill-http \
+		cat-file --batch-check <oids >batch-out &&
+	! grep missing batch-out
+'
+
+# DO NOT add non-httpd-specific tests here, because the last part of this
+# test script is only executed when httpd is available and enabled.
+
+test_done

From patchwork Tue Sep 10 02:28:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13797736
(PDT) Message-Id: In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:29 +0000 Subject: [PATCH 04/30] backfill: add --batch-size= option Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee Users may want to specify a minimum batch size for their needs. This is only a minimum: the path-walk API provides a list of OIDs that correspond to the same path, and thus it is optimal to allow delta compression across those objects in a single server request. We could consider limiting the request to have a maximum batch size in the future. Signed-off-by: Derrick Stolee --- Documentation/git-backfill.txt | 10 +++++++++- builtin/backfill.c | 4 +++- t/t5620-backfill.sh | 18 ++++++++++++++++++ 3 files changed, 30 insertions(+), 2 deletions(-) diff --git a/Documentation/git-backfill.txt b/Documentation/git-backfill.txt index 0e10f066fef..9b0bae04e9d 100644 --- a/Documentation/git-backfill.txt +++ b/Documentation/git-backfill.txt @@ -9,7 +9,7 @@ git-backfill - Download missing objects in a partial clone SYNOPSIS -------- [verse] -'git backfill' [] +'git backfill' [--batch-size=] DESCRIPTION ----------- @@ -38,6 +38,14 @@ delta compression in the packfile sent by the server. By default, `git backfill` downloads all blobs reachable from the `HEAD` commit. This set can be restricted or expanded using various options. +OPTIONS +------- + +--batch-size=:: + Specify a minimum size for a batch of missing objects to request + from the server. This size may be exceeded by the last set of + blobs seen at a given path. Default batch size is 16,000. + SEE ALSO -------- linkgit:git-clone[1]. 
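The "minimum, not maximum" behavior described in the commit message above can be illustrated with a small stand-alone sketch. It is not code from this series: `sketch_batch_sizes` and its parameters are hypothetical names, and it assumes the walk reports one list of missing blobs per path and never splits such a list across requests.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of Git: given the number of missing
 * blobs found at each path (in walk order), record the size of every
 * request that a "minimum batch size" policy would send.
 *
 * Returns the number of requests. The first out_cap request sizes are
 * stored in out[].
 */
size_t sketch_batch_sizes(const size_t *missing_per_path, size_t nr_paths,
			  size_t min_batch, size_t *out, size_t out_cap)
{
	size_t pending = 0, nr_requests = 0;

	for (size_t i = 0; i < nr_paths; i++) {
		/* All blobs at one path stay together in the same request. */
		pending += missing_per_path[i];

		/* Flush only once the minimum has been reached. */
		if (pending && pending >= min_batch) {
			if (nr_requests < out_cap)
				out[nr_requests] = pending;
			nr_requests++;
			pending = 0;
		}
	}

	/* Request whatever did not fill a batch (skipped here when empty). */
	if (pending) {
		if (nr_requests < out_cap)
			out[nr_requests] = pending;
		nr_requests++;
	}
	return nr_requests;
}
```

Under these assumptions, 24 paths with two missing blobs each and a minimum of 20 produce requests of 20, 20, and 8 objects, which lines up with the fetch counts checked in this patch's test. Two paths with 15 missing blobs each produce a single request of 30, showing how the last path added can push a batch past the minimum.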
diff --git a/builtin/backfill.c b/builtin/backfill.c index 23d40fc02a2..50006f15740 100644 --- a/builtin/backfill.c +++ b/builtin/backfill.c @@ -21,7 +21,7 @@ #include "path-walk.h" static const char * const builtin_backfill_usage[] = { - N_("git backfill []"), + N_("git backfill [--batch-size=]"), NULL }; @@ -109,6 +109,8 @@ int cmd_backfill(int argc, const char **argv, const char *prefix) .batch_size = 16000, }; struct option options[] = { + OPT_INTEGER(0, "batch-size", &ctx.batch_size, + N_("Minimum number of objects to request at a time")), OPT_END(), }; diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh index 43868a4a75f..2d81559d8e9 100755 --- a/t/t5620-backfill.sh +++ b/t/t5620-backfill.sh @@ -62,6 +62,24 @@ test_expect_success 'do partial clone 1, backfill gets all objects' ' test_line_count = 0 revs2 ' +test_expect_success 'do partial clone 2, backfill batch size' ' + git clone --no-checkout --filter=blob:none \ + --single-branch --branch=main \ + "file://$(pwd)/srv.bare" backfill2 && + + GIT_TRACE2_EVENT="$(pwd)/batch-trace" git \ + -C backfill2 backfill --batch-size=20 && + + # Batches were used + test_trace2_data promisor fetch_count 20 matches && + test_line_count = 2 matches && + test_trace2_data promisor fetch_count 8 revs2 && + test_line_count = 0 revs2 +' + . 
"$TEST_DIRECTORY"/lib-httpd.sh start_httpd From patchwork Tue Sep 10 02:28:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797738 Received: from mail-ed1-f42.google.com (mail-ed1-f42.google.com [209.85.208.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B5C51514E4 for ; Tue, 10 Sep 2024 02:29:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935347; cv=none; b=U6I62ZwLwPgZpV1h/aIkLU+j7TybWvgymjr82qxFssH/Ukb/vFhDD37NGc5uhnVmmefbhBS2Bk0DvnAfM65NuN0DohRpf+jelJ/SJGGbpejMyUJQG114HGytc0YKRjjqtbcCh5wQ0ML8UrsNAixm5pVUMI907vjBkfq89GnXDKo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935347; c=relaxed/simple; bh=ndtY29naIc8nZq8JNhBDL5UNuWm+2MTjMgy3IS7TlEc=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=BCY1HX7b+XO3KN8RMb5clw3ftyfTiY/9zcFJe8U3Vy7isFi6qO31QBSQlBK/YXJLeUYluMLkmkaffiTCsnANONXv55/wL/oJyrsRrfAWxPHbddMtOCKATyLeHOyEV97cuRSl/H+Pobi+4fXnt2i2ULOU5Zxpq0lWcSXf1vZSp1Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Hx49Pedp; arc=none smtp.client-ip=209.85.208.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Hx49Pedp" Received: by mail-ed1-f42.google.com with SMTP id 4fb4d7f45d1cf-5c3cdbe4728so4968813a12.2 for ; Mon, 09 
Message-Id:
In-Reply-To:
References:
Date: Tue, 10 Sep 2024 02:28:30 +0000
Subject: [PATCH 05/30] backfill: add --sparse option
Fcc: Sent
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout also reduces the number of objects that need to be downloaded from a promisor remote.

However, history investigations can be expensive, as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time when the user is willing to pay that extra cost.

Note that this is distinctly different from the '--filter=sparse:' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted.

This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer.

Be sure to test this in both cone mode and non-cone mode.
Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Signed-off-by: Derrick Stolee --- Documentation/git-backfill.txt | 6 +++- builtin/backfill.c | 13 +++++++- path-walk.c | 18 +++++++++++ path-walk.h | 11 +++++++ t/t5620-backfill.sh | 55 ++++++++++++++++++++++++++++++++++ 5 files changed, 101 insertions(+), 2 deletions(-) diff --git a/Documentation/git-backfill.txt b/Documentation/git-backfill.txt index 9b0bae04e9d..ecf2ac428ce 100644 --- a/Documentation/git-backfill.txt +++ b/Documentation/git-backfill.txt @@ -9,7 +9,7 @@ git-backfill - Download missing objects in a partial clone SYNOPSIS -------- [verse] -'git backfill' [--batch-size=] +'git backfill' [--batch-size=] [--[no-]sparse] DESCRIPTION ----------- @@ -46,6 +46,10 @@ OPTIONS from the server. This size may be exceeded by the last set of blobs seen at a given path. Default batch size is 16,000. +--[no-]sparse:: + Only download objects if they appear at a path that matches the + current sparse-checkout. + SEE ALSO -------- linkgit:git-clone[1]. 
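The cone versus non-cone distinction drawn in the commit message can be sketched as a small decision function. This is an illustration only: the enum and function names below are stand-ins invented for the sketch and are not Git's own types. It assumes a pattern matcher has already classified the entry's path.

```c
#include <stdbool.h>

/* Stand-in types for this sketch; Git's own enums are not used here. */
enum sketch_type { SKETCH_TREE, SKETCH_BLOB };
enum sketch_match { SKETCH_UNDECIDED, SKETCH_NOT_MATCHED, SKETCH_MATCHED };

/*
 * Decide whether a child entry should be dropped from the walk.
 * Returns true when the entry should be skipped.
 */
bool sketch_skip_entry(bool cone_mode, enum sketch_type type,
		       enum sketch_match match)
{
	if (cone_mode) {
		/*
		 * Cone patterns are directory-based, so an unmatched
		 * entry of any type can be pruned. Pruning a tree means
		 * nothing beneath it is visited.
		 */
		return match == SKETCH_NOT_MATCHED;
	}

	/*
	 * Without cone patterns a deeper path may still match, so trees
	 * are always kept and only blobs that do not match are dropped.
	 */
	return type == SKETCH_BLOB && match != SKETCH_MATCHED;
}
```

The practical consequence is that cone mode can stop descending at the first directory outside the cone, while non-cone mode still walks every tree and filters only at the blob level.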
diff --git a/builtin/backfill.c b/builtin/backfill.c index 50006f15740..de75471cf44 100644 --- a/builtin/backfill.c +++ b/builtin/backfill.c @@ -4,6 +4,7 @@ #include "parse-options.h" #include "repository.h" #include "commit.h" +#include "dir.h" #include "hex.h" #include "tree.h" #include "tree-walk.h" @@ -21,7 +22,7 @@ #include "path-walk.h" static const char * const builtin_backfill_usage[] = { - N_("git backfill [--batch-size=]"), + N_("git backfill [--batch-size=] [--[no-]sparse]"), NULL }; @@ -29,6 +30,7 @@ struct backfill_context { struct repository *repo; struct oid_array current_batch; size_t batch_size; + int sparse; }; static void clear_backfill_context(struct backfill_context *ctx) @@ -84,6 +86,12 @@ static int do_backfill(struct backfill_context *ctx) struct path_walk_info info = PATH_WALK_INFO_INIT; int ret; + if (ctx->sparse) { + CALLOC_ARRAY(info.pl, 1); + if (get_sparse_checkout_patterns(info.pl)) + return error(_("problem loading sparse-checkout")); + } + repo_init_revisions(ctx->repo, &revs, ""); handle_revision_arg("HEAD", &revs, 0, 0); @@ -107,10 +115,13 @@ int cmd_backfill(int argc, const char **argv, const char *prefix) .repo = the_repository, .current_batch = OID_ARRAY_INIT, .batch_size = 16000, + .sparse = 0, }; struct option options[] = { OPT_INTEGER(0, "batch-size", &ctx.batch_size, N_("Minimum number of objects to request at a time")), + OPT_BOOL(0, "sparse", &ctx.sparse, + N_("Restrict the missing objects to the current sparse-checkout")), OPT_END(), }; diff --git a/path-walk.c b/path-walk.c index 2edfa0572e4..dc2390dd9ea 100644 --- a/path-walk.c +++ b/path-walk.c @@ -10,6 +10,7 @@ #include "hex.h" #include "object.h" #include "oid-array.h" +#include "repository.h" #include "revision.h" #include "string-list.h" #include "strmap.h" @@ -111,6 +112,23 @@ static int add_children(struct path_walk_context *ctx, if (type == OBJ_TREE) strbuf_addch(&path, '/'); + if (ctx->info->pl) { + int dtype; + enum pattern_match_result match; + match = 
path_matches_pattern_list(path.buf, path.len, + path.buf + base_len, &dtype, + ctx->info->pl, + ctx->repo->index); + + if (ctx->info->pl->use_cone_patterns && + match == NOT_MATCHED) + continue; + else if (!ctx->info->pl->use_cone_patterns && + type == OBJ_BLOB && + match != MATCHED) + continue; + } + if (!(list = strmap_get(&ctx->paths_to_lists, path.buf))) { CALLOC_ARRAY(list, 1); list->type = type; diff --git a/path-walk.h b/path-walk.h index c9e94a98bc8..bc1ebba5081 100644 --- a/path-walk.h +++ b/path-walk.h @@ -6,6 +6,7 @@ struct rev_info; struct oid_array; +struct pattern_list; /** * The type of a function pointer for the method that is called on a list of @@ -30,6 +31,16 @@ struct path_walk_info { */ path_fn path_fn; void *path_fn_data; + + /** + * Specify a sparse-checkout definition to match our paths to. Do not + * walk outside of this sparse definition. If the patterns are in + * cone mode, then the search may prune directories that are outside + * of the cone. If not in cone mode, then all tree paths will be + * explored but the path_fn will only be called when the path matches + * the sparse-checkout patterns. + */ + struct pattern_list *pl; }; #define PATH_WALK_INFO_INIT { 0 } diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh index 2d81559d8e9..c7bb27b72c1 100755 --- a/t/t5620-backfill.sh +++ b/t/t5620-backfill.sh @@ -80,6 +80,61 @@ test_expect_success 'do partial clone 2, backfill batch size' ' test_line_count = 0 revs2 ' +test_expect_success 'backfill --sparse' ' + git clone --sparse --filter=blob:none \ + --single-branch --branch=main \ + "file://$(pwd)/srv.bare" backfill3 && + + # Initial checkout includes four files at root. + git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing && + test_line_count = 44 missing && + + # Initial sparse-checkout is just the files at root, so we get the + # older versions of the four files at tip. 
+ GIT_TRACE2_EVENT="$(pwd)/sparse-trace1" git \ + -C backfill3 backfill --sparse && + test_trace2_data promisor fetch_count 4 missing && + test_line_count = 40 missing && + + # Expand the sparse-checkout to include 'd' recursively. This + # engages the algorithm to skip the trees for 'a'. Note that + # the "sparse-checkout set" command downloads the objects at tip + # to satisfy the current checkout. + git -C backfill3 sparse-checkout set d && + GIT_TRACE2_EVENT="$(pwd)/sparse-trace2" git \ + -C backfill3 backfill --sparse && + test_trace2_data promisor fetch_count 8 missing && + test_line_count = 24 missing +' + +test_expect_success 'backfill --sparse without cone mode' ' + git clone --no-checkout --filter=blob:none \ + --single-branch --branch=main \ + "file://$(pwd)/srv.bare" backfill4 && + + # No blobs yet + git -C backfill4 rev-list --quiet --objects --missing=print HEAD >missing && + test_line_count = 48 missing && + + # Define sparse-checkout by filename regardless of parent directory. + # This downloads 6 blobs to satisfy the checkout. + git -C backfill4 sparse-checkout set --no-cone "**/file.1.txt" && + git -C backfill4 checkout main && + + GIT_TRACE2_EVENT="$(pwd)/no-cone-trace1" git \ + -C backfill4 backfill --sparse && + test_trace2_data promisor fetch_count 6 missing && + test_line_count = 36 missing +' + . 
"$TEST_DIRECTORY"/lib-httpd.sh start_httpd From patchwork Tue Sep 10 02:28:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797737 Received: from mail-lj1-f176.google.com (mail-lj1-f176.google.com [209.85.208.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 180D46A332 for ; Tue, 10 Sep 2024 02:29:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935346; cv=none; b=UPpRrQgDNIGqyBv3X0m75U+E835v8SBVy3jiG3J7W9eXsRWbmaz+Z0L9frh+Rq5H5G4TDCfnUtgzGsmjEXfgs/p28ZD8HNV7DbJUgNQpvesAwNNm91ZR6cxVWPiUvIRKTRsgrFwyGSMn18TF09OLA9me2iRqC/Blmk55Ix2qe0Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935346; c=relaxed/simple; bh=+6bdnFUJ/2pDVt2eVytKz9FfIJNlxUgODrxummiYGj8=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=fHOBWA6ZB5Pub6EWUiuxv2bDWAzEoWTgCKxX1HDCNDL1MjhybNQB3i2S6IGQa+eh58rsLyBfmbqW6Vdnw1C9HRnCHMNo+Ya+E4mAPjrV9J6uVzvom3L6Y/BMvnIyb0EsXY34ILUtTONT7IR8aFA9/ImFJ/wC7rrQPFCXMk765B8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=R7JWl2xf; arc=none smtp.client-ip=209.85.208.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="R7JWl2xf" Received: by mail-lj1-f176.google.com with SMTP id 38308e7fff4ca-2f760f7e25bso21846861fa.2 for ; 
Mon, 09 Sep 2024 19:29:02 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:31 +0000 Subject: [PATCH 06/30] backfill: assume --sparse when sparse-checkout is enabled Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee The previous change introduced the '--[no-]sparse' option for the 'git backfill' command, but did not assume it as enabled by default. However, this is likely the behavior that users will most often want to happen. Without this default, users with a small sparse-checkout may be confused when 'git backfill' downloads every version of every object in the full history. However, this is left as a separate change so this decision can be reviewed independently of the value of the '--[no-]sparse' option. Add a test of adding the '--sparse' option to a repo without sparse-checkout to make it clear that supplying it without a sparse-checkout is an error. Signed-off-by: Derrick Stolee --- Documentation/git-backfill.txt | 3 ++- builtin/backfill.c | 4 ++++ t/t5620-backfill.sh | 13 ++++++++++++- 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/Documentation/git-backfill.txt b/Documentation/git-backfill.txt index ecf2ac428ce..066ec6b161a 100644 --- a/Documentation/git-backfill.txt +++ b/Documentation/git-backfill.txt @@ -48,7 +48,8 @@ OPTIONS --[no-]sparse:: Only download objects if they appear at a path that matches the - current sparse-checkout. + current sparse-checkout. If the sparse-checkout feature is enabled, + then `--sparse` is assumed and can be disabled with `--no-sparse`. 
SEE ALSO -------- diff --git a/builtin/backfill.c b/builtin/backfill.c index de75471cf44..82a18e58a41 100644 --- a/builtin/backfill.c +++ b/builtin/backfill.c @@ -5,6 +5,7 @@ #include "repository.h" #include "commit.h" #include "dir.h" +#include "environment.h" #include "hex.h" #include "tree.h" #include "tree-walk.h" @@ -133,5 +134,8 @@ int cmd_backfill(int argc, const char **argv, const char *prefix) git_config(git_default_config, NULL); + if (ctx.sparse < 0) + ctx.sparse = core_apply_sparse_checkout; + return do_backfill(&ctx); } diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh index c7bb27b72c1..1fa2e90f8cf 100755 --- a/t/t5620-backfill.sh +++ b/t/t5620-backfill.sh @@ -80,6 +80,12 @@ test_expect_success 'do partial clone 2, backfill batch size' ' test_line_count = 0 revs2 ' +test_expect_success 'backfill --sparse without sparse-checkout fails' ' + git init not-sparse && + test_must_fail git -C not-sparse backfill --sparse 2>err && + grep "problem loading sparse-checkout" err +' + test_expect_success 'backfill --sparse' ' git clone --sparse --filter=blob:none \ --single-branch --branch=main \ @@ -108,7 +114,12 @@ test_expect_success 'backfill --sparse' ' test_trace2_data promisor fetch_count 8 missing && - test_line_count = 24 missing + test_line_count = 24 missing && + + # Disabling the --sparse option (on by default) will download everything + git -C backfill3 backfill --no-sparse && + git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing && + test_line_count = 0 missing ' test_expect_success 'backfill --sparse without cone mode' ' From patchwork Tue Sep 10 02:28:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797739 Received: from mail-ed1-f54.google.com (mail-ed1-f54.google.com [209.85.208.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org 
+N44eKE7MNHoJenAFRH4LV/jJHR3ekpoMcK5RpYIbKFqz6iDppcKAz53pLjla/L+EHUL HysNpTiNjXGYD625NkCEggV8Hgzk6/gsPNLBScaIjAIxZK6URH8UZTRQN2Cq+B42nK1b q/WtS/8LoYvbWwrLG4uLSLjWhZjaQYAMkVq9fG4vMwOBgUzLLm0ONfEK2YyeiVdYzeKH V3xK8p8wjZM5chL06lM3rmf0rtqX76J/JftKOZEBYIjpktrYZQWWb2FUsJ88ZhRA6pKh s6lw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935344; x=1726540144; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NbVTtYaEPCjJM/2vZZq/HednK1w+JHK8tzmD/VLle5o=; b=t0TOzeMGhjBVdT8Dv5FCmufQ6fNm2qLojZ57772m/Qkz2MuqEfp3X9XGftHefeJyAP 5011mB1MFEvOhN5WH5X8R2cJ1GMuOppMjxb0Ra6Ihhf3CNS/VpxyxLyNAU5vyuBOz3cv oE4b88CRYgVgYBj4tWBkS5M4GYJr1E/8Ijyy10A9mMX9CW1gIl9eTgySImUvjAhIUuHC 9d7brHB8AJ2uEHgvCPbHHDDkRsxwQkyUwTlafts8cy1OdmrQnRLCGxIMe9MqzB6ju+J0 or6c3lzVZ4B1IxRt9Go1xnvWNNlqmgypmshFaaHtvFtXsmdCw+QgJcCmKbXeWLBr0x5p /pEQ== X-Gm-Message-State: AOJu0YwzOUgwlSLmMdchA/Mf1eNj2PqRvpU6Hn7hB6F0DYl//VdRB5Cf nJ59XpsoPHC29JxNVLZIC5zHGv2O4ZcgBzOCPLkMKC+SeGPYu1ROtlJ9Ig== X-Google-Smtp-Source: AGHT+IH6iJc6IJM2REftbErvoUcKKbD/+lQHT0iUush/Izw9TQEYGD3eHAbl1W9Nl9S0kY0oOdPIyg== X-Received: by 2002:a17:907:3181:b0:a86:aa57:57b8 with SMTP id a640c23a62f3a-a8a888d0570mr1090913866b.63.1725935343807; Mon, 09 Sep 2024 19:29:03 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25ced17asm420038366b.170.2024.09.09.19.29.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:03 -0700 (PDT) Message-Id: <2829fe3875438f3a9907f36d825d6c24952abded.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:32 +0000 Subject: [PATCH 07/30] path-walk: allow consumer to specify object types Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: 
gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee This adds the ability to ask for the commits as a single list. This will also reduce the calls in 'git backfill' to be a BUG() statement if called with anything other than blobs. Signed-off-by: Derrick Stolee --- builtin/backfill.c | 2 +- path-walk.c | 40 ++++++++++++++++++++++++++++++++++------ path-walk.h | 12 +++++++++++- 3 files changed, 46 insertions(+), 8 deletions(-) diff --git a/builtin/backfill.c b/builtin/backfill.c index 82a18e58a41..2a1b043f188 100644 --- a/builtin/backfill.c +++ b/builtin/backfill.c @@ -61,7 +61,7 @@ static int fill_missing_blobs(const char *path, struct backfill_context *ctx = data; if (type != OBJ_BLOB) - return 0; + BUG("fill_missing_blobs only takes blob objects"); for (size_t i = 0; i < list->nr; i++) { off_t size = 0; diff --git a/path-walk.c b/path-walk.c index dc2390dd9ea..d70e6840fb5 100644 --- a/path-walk.c +++ b/path-walk.c @@ -83,6 +83,10 @@ static int add_children(struct path_walk_context *ctx, if (S_ISGITLINK(entry.mode)) continue; + /* If the caller doesn't want blobs, then don't bother. */ + if (!ctx->info->blobs && type == OBJ_BLOB) + continue; + if (type == OBJ_TREE) { struct tree *child = lookup_tree(ctx->repo, &entry.oid); o = child ? &child->object : NULL; @@ -156,9 +160,11 @@ static int walk_path(struct path_walk_context *ctx, list = strmap_get(&ctx->paths_to_lists, path); - /* Evaluate function pointer on this data. */ - ret = ctx->info->path_fn(path, &list->oids, list->type, - ctx->info->path_fn_data); + /* Evaluate function pointer on this data, if requested. */ + if ((list->type == OBJ_TREE && ctx->info->trees) || + (list->type == OBJ_BLOB && ctx->info->blobs)) + ret = ctx->info->path_fn(path, &list->oids, list->type, + ctx->info->path_fn_data); /* Expand data for children. 
*/ if (list->type == OBJ_TREE) { @@ -200,6 +206,7 @@ int walk_objects_by_path(struct path_walk_info *info) size_t commits_nr = 0, paths_nr = 0; struct commit *c; struct type_and_oid_list *root_tree_list; + struct type_and_oid_list *commit_list; struct path_walk_context ctx = { .repo = info->revs->repo, .revs = info->revs, @@ -210,28 +217,49 @@ int walk_objects_by_path(struct path_walk_info *info) trace2_region_enter("path-walk", "commit-walk", info->revs->repo); + CALLOC_ARRAY(commit_list, 1); + commit_list->type = OBJ_COMMIT; + /* Insert a single list for the root tree into the paths. */ CALLOC_ARRAY(root_tree_list, 1); root_tree_list->type = OBJ_TREE; strmap_put(&ctx.paths_to_lists, root_path, root_tree_list); - if (prepare_revision_walk(info->revs)) die(_("failed to setup revision walk")); while ((c = get_revision(info->revs))) { - struct object_id *oid = get_commit_tree_oid(c); - struct tree *t = lookup_tree(info->revs->repo, oid); + struct object_id *oid; + struct tree *t; commits_nr++; + if (info->commits) + oid_array_append(&commit_list->oids, + &c->object.oid); + + /* If we only care about commits, then skip trees. */ + if (!info->trees && !info->blobs) + continue; + + oid = get_commit_tree_oid(c); + t = lookup_tree(info->revs->repo, oid); + if (t) oid_array_append(&root_tree_list->oids, oid); else warning("could not find tree %s", oid_to_hex(oid)); + } trace2_data_intmax("path-walk", ctx.repo, "commits", commits_nr); trace2_region_leave("path-walk", "commit-walk", info->revs->repo); + /* Track all commits. 
*/ + if (info->commits) + ret = info->path_fn("", &commit_list->oids, OBJ_COMMIT, + info->path_fn_data); + oid_array_clear(&commit_list->oids); + free(commit_list); + string_list_append(&ctx.path_stack, root_path); trace2_region_enter("path-walk", "path-walk", info->revs->repo); diff --git a/path-walk.h b/path-walk.h index bc1ebba5081..49b982dade6 100644 --- a/path-walk.h +++ b/path-walk.h @@ -32,6 +32,14 @@ struct path_walk_info { path_fn path_fn; void *path_fn_data; + /** + * Initialize which object types the path_fn should be called on. This + * could also limit the walk to skip blobs if not set. + */ + int commits; + int trees; + int blobs; + /** * Specify a sparse-checkout definition to match our paths to. Do not * walk outside of this sparse definition. If the patterns are in @@ -43,7 +51,9 @@ struct path_walk_info { struct pattern_list *pl; }; -#define PATH_WALK_INFO_INIT { 0 } +#define PATH_WALK_INFO_INIT { \ + .blobs = 1, \ +} /** * Given the configuration of 'info', walk the commits based on 'info->revs' and From patchwork Tue Sep 10 02:28:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797740 Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DAFA116D33F for ; Tue, 10 Sep 2024 02:29:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935349; cv=none; b=bfOLddMHBjy2xcOBVHZF4N8DeH4FID4JFa1aJDD/EsrmDoRM/rD8iVpRzNdxWVQ23Res1Ap3iMqpT3V1nse7oH1yhhRFQrvbjAIVdjmifimAo534wVD6WF1pk/PnC40v9nuyVKUFu59/pMbz5N9jb6rBRBm2iQ4fsTEtY8E55Xk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935349; c=relaxed/simple; 
bh=buS50GJxBqC3txCyMW9cio0rAAX/4QU+Hh9maMXHxxE=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=NTLJafPFVHUn5Az4oHA6EE+h+O0/QCG+tOlutjLmngpxkS8uvsVlCVn5Ra21gn4p4KhTyKTqJf4LGPYYTax4Rw+MiLhHRTeY40ijYRbIVOPm7RxQpSwp/kVtLhSPE9pDessHMew25F1KKD+hSFFkc5CHzl2ArYFK2yMHskHgy9U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=RPygyZIo; arc=none smtp.client-ip=209.85.218.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="RPygyZIo" Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-a8d64b27c45so175540866b.3 for ; Mon, 09 Sep 2024 19:29:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935346; x=1726540146; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=JrInDEHn9+e1E/ZYKB/gh9cxwB913Z/rr/mttm9VHnM=; b=RPygyZIotEdjBqTpG3xx/5up3h+Lt9QtPW6GMUpy1oeRYjgiQyEJBubsl5OVZ13xWX 7KaJRdFOhlBAornrY/gXyb2y1xJ09INHM96LHC0/U/6cfg/QVj/uEBmjg/Mye2XhEXWD 9mr4KSUQTXLrXWDpx0MER2zIwKRVGgPoTKWlSiUeT561vnGJk6u7B3qXiBi/N0xAiODT bBS73owXzcpRRhWk0JTO5kpZ2FoOcSRlvMyWN5pA1hWGV/+HByRqf7Okqql66d/qg8Xb 1YRaVFyA9T8eNJnHYHzoJpqHHEYXII51gZ2EKzrAoon/j2bryeexW6JmUyOBIGE+vySN leEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935346; x=1726540146; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=JrInDEHn9+e1E/ZYKB/gh9cxwB913Z/rr/mttm9VHnM=; b=UtNaonGvOtpTzfCRKvpwHcpeGIFbRZG2HZXJhKYikXgh3BHsCzJu+xHcpU8YHvvToF ZfVWcx+k2NlvyPSOkmk6V8NdGBPTsBB6OWNwnTY8UHbWBbDxpaDWjk/vBT9eBX1WfEIJ IND6wXH256QJ+3LcmuvksPeUCtFQuA9fbcW+4yLJD3TYrdhKR0TpdyC3FJHHuaOP9Tic 05vXEHkes6NDv8WIS48HSriEQdeSsvJjh4aunTk9Cbr11CXi9AzcSqnfQ9jyNcVcCXOY 4PTR4kyn078V1gjvHx9H+EcG6gU8faFiyifFeP2yeqr9Gk07rIiiUF3O00ZIU4f2cswd mOuA== X-Gm-Message-State: AOJu0YwmbLcLAWlcJwWvQeBoBytKbj8lAZFDTLTUy/GlPXdjslML5GSZ WYtxfNx1pDOuS1uMHWezv0unIrnQRSGJNRR6EZkBaUilooUvGNv02L5IEA== X-Google-Smtp-Source: AGHT+IFzXK+x41L0VARFeeV8p1ItgSw9KmnkLgW08E1eik9afRRvY95RNhBLhMWjEh1kW0Eck7M7Rg== X-Received: by 2002:a17:907:5011:b0:a8a:8915:6bf7 with SMTP id a640c23a62f3a-a8a89156c92mr733095466b.51.1725935345753; Mon, 09 Sep 2024 19:29:05 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25d65396sm414836066b.217.2024.09.09.19.29.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:04 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:33 +0000 Subject: [PATCH 08/30] path-walk: allow visiting tags Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee In anticipation of using the path-walk API to analyze tags or include them in a pack-file, add the ability to walk the tags that were included in the revision walk. 
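For readers following along, the tag handling in this patch amounts to peeling each pending object through any chain of annotated tags and then bucketing whatever is left by type. Below is a minimal standalone sketch of that idea. It does not use Git's real object model: `toy_object`, `toy_walk_counts` and `toy_peel_pending()` are hypothetical names made up for illustration. It also counts commits where the patch simply skips them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Git's object types; not the real enum. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
	const struct toy_object *tagged; /* only meaningful for TOY_TAG */
};

struct toy_walk_counts {
	size_t tags;
	size_t trees;
	size_t blobs;
	size_t commits;
};

/*
 * Peel one pending object: count every tag in the chain, then
 * classify the object the chain finally points at.
 */
static void toy_peel_pending(const struct toy_object *obj,
			     struct toy_walk_counts *out)
{
	assert(obj && out);

	while (obj->type == TOY_TAG) {
		out->tags++;
		/* This sketch assumes every tag has a loaded target. */
		assert(obj->tagged);
		obj = obj->tagged;
	}

	switch (obj->type) {
	case TOY_TREE:
		out->trees++;
		break;
	case TOY_BLOB:
		out->blobs++;
		break;
	case TOY_COMMIT:
		/* The patch skips commits here; the sketch just counts them. */
		out->commits++;
		break;
	default:
		assert(0 && "tags were peeled above");
	}
}
```

A tag pointing at a tag pointing at a blob would add two to `tags` and one to `blobs`. In the sketch the target pointer is checked before it is dereferenced; whether the real walk needs a similar guard depends on how the pending tags were parsed, which this sketch does not model.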
Signed-off-by: Derrick Stolee --- path-walk.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++ path-walk.h | 1 + 2 files changed, 59 insertions(+) diff --git a/path-walk.c b/path-walk.c index d70e6840fb5..65f9856afa2 100644 --- a/path-walk.c +++ b/path-walk.c @@ -14,6 +14,7 @@ #include "revision.h" #include "string-list.h" #include "strmap.h" +#include "tag.h" #include "trace2.h" #include "tree.h" #include "tree-walk.h" @@ -215,6 +216,9 @@ int walk_objects_by_path(struct path_walk_info *info) .paths_to_lists = STRMAP_INIT }; + struct oid_array tagged_tree_list = OID_ARRAY_INIT; + struct oid_array tagged_blob_list = OID_ARRAY_INIT; + trace2_region_enter("path-walk", "commit-walk", info->revs->repo); CALLOC_ARRAY(commit_list, 1); @@ -260,6 +264,60 @@ int walk_objects_by_path(struct path_walk_info *info) oid_array_clear(&commit_list->oids); free(commit_list); + if (info->tags) { + struct oid_array tags = OID_ARRAY_INIT; + + trace2_region_enter("path-walk", "tag-walk", info->revs->repo); + + /* + * Walk any pending objects at this point, but they should only + * be tags. 
+ */ + for (size_t i = 0; i < info->revs->pending.nr; i++) { + struct object_array_entry *pending = info->revs->pending.objects + i; + struct object *obj = pending->item; + + while (obj->type == OBJ_TAG) { + struct tag *tag = lookup_tag(info->revs->repo, + &obj->oid); + oid_array_append(&tags, &obj->oid); + obj = tag->tagged; + } + + switch (obj->type) { + case OBJ_TREE: + oid_array_append(&tagged_tree_list, &obj->oid); + break; + + case OBJ_BLOB: + oid_array_append(&tagged_blob_list, &obj->oid); + break; + + case OBJ_COMMIT: + /* skip */ + break; + + default: + BUG("should not see any other type here"); + } + } + + info->path_fn("initial", &tags, OBJ_TAG, info->path_fn_data); + + if (tagged_tree_list.nr) + info->path_fn("tagged-trees", &tagged_tree_list, OBJ_TREE, + info->path_fn_data); + if (tagged_blob_list.nr) + info->path_fn("tagged-blobs", &tagged_blob_list, OBJ_BLOB, + info->path_fn_data); + + trace2_data_intmax("path-walk", ctx.repo, "tags", tags.nr); + trace2_region_leave("path-walk", "tag-walk", info->revs->repo); + oid_array_clear(&tags); + oid_array_clear(&tagged_tree_list); + oid_array_clear(&tagged_blob_list); + } + string_list_append(&ctx.path_stack, root_path); trace2_region_enter("path-walk", "path-walk", info->revs->repo); diff --git a/path-walk.h b/path-walk.h index 49b982dade6..637d3b0cabb 100644 --- a/path-walk.h +++ b/path-walk.h @@ -39,6 +39,7 @@ struct path_walk_info { int commits; int trees; int blobs; + int tags; /** * Specify a sparse-checkout definition to match our paths to. 
Do not From patchwork Tue Sep 10 02:28:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Hostetler X-Patchwork-Id: 13797741 Received: from mail-ed1-f41.google.com (mail-ed1-f41.google.com [209.85.208.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A352D170A0C for ; Tue, 10 Sep 2024 02:29:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935350; cv=none; b=A0a8oqHw5xmyEfk73dzfzlv/jH1c2qGaCa0C4VGxWvSuep8dYnVVoxM9IGHiJXCb556JdRTZhp9t1LmPdfP6+03DKpczQJo/p82kxg9Fp+A1oZ4Wm7hIjfsUes44A/Z5fuj+u0J234uGqeOJsRQHYF+NEBwe+pF1bGObWaQtL28= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935350; c=relaxed/simple; bh=Go1XUKLY7c0eb5W9KHBv7FO3C997HZdEcDN77UNK3Mk=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=DHz3lxbZPQ8BZICVaDy3FhrknSknomOta4OvoUo6mrwKABCDEFZQqxB7rrJFjObSIBmiXwo2S49hqKjZ9Q31zdQe6fUps8A8SvAI+1WWDbgvZNtENt0HUVJS6cjK7PapiUPj4Q+BeuWIHitnWJetaMm2Gej5M4K4nzHnWALquEo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=J/XlDF8R; arc=none smtp.client-ip=209.85.208.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="J/XlDF8R" Received: by mail-ed1-f41.google.com with SMTP id 4fb4d7f45d1cf-5c3cdba33b0so5322758a12.1 for ; Mon, 09 Sep 2024 19:29:08 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935346; x=1726540146; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=6TIY02C2h2gxt3+JNHhmVHI27oCIFDdgGPNie6FNZlY=; b=J/XlDF8Rh1ZyWG+qLR0Q1i31tt+SejTpU3nTzi2Mp0miqmSHF6IE6ZfkDSRBhwLEEq O3vHrHnHwviE48J/cTlI77W+MBpm1KMiqqovPoUksXaAzt5dj7vBGja5ySSBeMZl277Z zQHtqCPonXapdM2C1sKkmJrgjvewyTeO3jBIYrvSPj4QJapv7x8+RcFCKukjf/Q/yG4f QoOvtm28irPCwnGb5j2nWv32yVvJta2u2I5cwf1kFttYHpo0otrd2/oudVNHJ71DMhBX km7soh2fvIoTjnH/+vbLvvJYxBggf+FMmW/VHtPQIv1BNiqcseQP81/u0dA76MTendCE 6beA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935346; x=1726540146; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6TIY02C2h2gxt3+JNHhmVHI27oCIFDdgGPNie6FNZlY=; b=ecVIbhosBY4BoGQvh2cseEbl1l+v7gft9ry+ZqrDYM21FOZ1tgLqM9lxx5Xvg6S/54 pcPpz60u22A2kp2Kx62xuhKRMG3wcfVMYci9C+zJGTwfRN5U+TVZ2lDq8yu+s5KrTYdh qE62SVoz6tlVTVryIa+sxIEjfE2KFhRWPK9BaJ/2yszH7tu9y3PUe5ocSjk0P5I05m5f TgkopEOmuobmV4jNBtVZzUR2oT0uIS/1lRhWRYto45dEFFoFN9hLbaLT1KDsHRajv+Rp rp+6LAX/P+ntbD9j+W56WyMN8hGId41L2K6TRPLnPcKM6wWqWYV3MkYBHucvjQRuiME/ OfOQ== X-Gm-Message-State: AOJu0YwA2lKuTl+BamlIl7CeahcdknmGxt0wjPRsnoNxHrKF9CNUhgob dmkmwDY86/i4LW7E8NMv8SpciRR8uH9Cl8LalZ16CqLr4Qpyr9LORV1J0w== X-Google-Smtp-Source: AGHT+IFMVYE7IqRGpHP6rSA5w0SZzhVo7gGtKuHDpGV31hpDgmwQi7XCGxC9cIuRlA/ZDnPlIiJ06Q== X-Received: by 2002:a17:906:fd8b:b0:a8a:809b:14e0 with SMTP id a640c23a62f3a-a8a88871d22mr1362627066b.48.1725935346401; Mon, 09 Sep 2024 19:29:06 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25a2539bsm415256266b.85.2024.09.09.19.29.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:06 -0700 
(PDT) Message-Id: <7d43a1634bbe2d2efa96a806e3de1f1fd480041b.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:34 +0000 Subject: [PATCH 09/30] survey: stub in new experimental `git-survey` command Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Jeff Hostetler From: Jeff Hostetler From: Jeff Hostetler Start work on a new `git survey` command to scan the repository for monorepo performance and scaling problems. The goal is to measure the various known "dimensions of scale" and serve as a foundation for adding additional measurements as we learn more about Git monorepo scaling problems. The initial goal is to complement the scanning and analysis performed by the Go-based `git-sizer` (https://github.com/github/git-sizer) tool. It is hoped that by creating a builtin command, we may be able to take advantage of internal Git data structures and code that is not accessible from Go to gain further insight into potential scaling problems. RFC TODO: Adapt this boilerplate to match the upcoming changes to builtin methods that include a 'struct repository' pointer. 
Co-authored-by: Derrick Stolee Signed-off-by: Jeff Hostetler Signed-off-by: Derrick Stolee --- .gitignore | 1 + Documentation/git-survey.txt | 36 ++++++++++++++++++++++ Makefile | 1 + builtin.h | 1 + builtin/survey.c | 60 ++++++++++++++++++++++++++++++++++++ command-list.txt | 1 + git.c | 1 + t/t8100-git-survey.sh | 18 +++++++++++ 8 files changed, 119 insertions(+) create mode 100644 Documentation/git-survey.txt create mode 100644 builtin/survey.c create mode 100755 t/t8100-git-survey.sh diff --git a/.gitignore b/.gitignore index 8f5cb938ecb..3f6fdb31a5e 100644 --- a/.gitignore +++ b/.gitignore @@ -165,6 +165,7 @@ /git-submodule /git-submodule--helper /git-subtree +/git-survey /git-svn /git-switch /git-symbolic-ref diff --git a/Documentation/git-survey.txt b/Documentation/git-survey.txt new file mode 100644 index 00000000000..cdd1ec4358b --- /dev/null +++ b/Documentation/git-survey.txt @@ -0,0 +1,36 @@ +git-survey(1) +============= + +NAME +---- +git-survey - EXPERIMENTAL: Measure various repository dimensions of scale + +SYNOPSIS +-------- +[verse] +(EXPERIMENTAL!) `git survey` + +DESCRIPTION +----------- + +Survey the repository and measure various dimensions of scale. + +As repositories grow to "monorepo" size, certain data shapes can cause +performance problems. `git-survey` attempts to measure and report on +known problem areas. + +OPTIONS +------- + +--progress:: + Show progress. This is automatically enabled when interactive. + +OUTPUT +------ + +By default, `git survey` will print information about the repository in a +human-readable format that includes overviews and tables. 
+ +GIT +--- +Part of the linkgit:git[1] suite diff --git a/Makefile b/Makefile index 4305474d96e..154de6e01d0 100644 --- a/Makefile +++ b/Makefile @@ -1303,6 +1303,7 @@ BUILTIN_OBJS += builtin/sparse-checkout.o BUILTIN_OBJS += builtin/stash.o BUILTIN_OBJS += builtin/stripspace.o BUILTIN_OBJS += builtin/submodule--helper.o +BUILTIN_OBJS += builtin/survey.o BUILTIN_OBJS += builtin/symbolic-ref.o BUILTIN_OBJS += builtin/tag.o BUILTIN_OBJS += builtin/unpack-file.o diff --git a/builtin.h b/builtin.h index 73dd0ccbe8c..d4e8cf3b97b 100644 --- a/builtin.h +++ b/builtin.h @@ -239,6 +239,7 @@ int cmd_status(int argc, const char **argv, const char *prefix); int cmd_stash(int argc, const char **argv, const char *prefix); int cmd_stripspace(int argc, const char **argv, const char *prefix); int cmd_submodule__helper(int argc, const char **argv, const char *prefix); +int cmd_survey(int argc, const char **argv, const char *prefix); int cmd_switch(int argc, const char **argv, const char *prefix); int cmd_symbolic_ref(int argc, const char **argv, const char *prefix); int cmd_tag(int argc, const char **argv, const char *prefix); diff --git a/builtin/survey.c b/builtin/survey.c new file mode 100644 index 00000000000..4cfd0f0293c --- /dev/null +++ b/builtin/survey.c @@ -0,0 +1,60 @@ +#include "builtin.h" +#include "config.h" +#include "parse-options.h" + +static const char * const survey_usage[] = { + N_("(EXPERIMENTAL!) 
git survey "), + NULL, +}; + +struct survey_opts { + int verbose; + int show_progress; +}; + +static struct survey_opts survey_opts = { + .verbose = 0, + .show_progress = -1, /* defaults to isatty(2) */ +}; + +static struct option survey_options[] = { + OPT__VERBOSE(&survey_opts.verbose, N_("verbose output")), + OPT_BOOL(0, "progress", &survey_opts.show_progress, N_("show progress")), + OPT_END(), +}; + +static int survey_load_config_cb(const char *var, const char *value, + const struct config_context *ctx, void *pvoid) +{ + if (!strcmp(var, "survey.verbose")) { + survey_opts.verbose = git_config_bool(var, value); + return 0; + } + if (!strcmp(var, "survey.progress")) { + survey_opts.show_progress = git_config_bool(var, value); + return 0; + } + + return git_default_config(var, value, ctx, pvoid); +} + +static void survey_load_config(void) +{ + git_config(survey_load_config_cb, NULL); +} + +int cmd_survey(int argc, const char **argv, const char *prefix) +{ + if (argc == 2 && !strcmp(argv[1], "-h")) + usage_with_options(survey_usage, survey_options); + + prepare_repo_settings(the_repository); + survey_load_config(); + + argc = parse_options(argc, argv, prefix, survey_options, survey_usage, 0); + + if (survey_opts.show_progress < 0) + survey_opts.show_progress = isatty(2); + + return 0; +} diff --git a/command-list.txt b/command-list.txt index c537114b468..ecc9d2281a0 100644 --- a/command-list.txt +++ b/command-list.txt @@ -187,6 +187,7 @@ git-stash mainporcelain git-status mainporcelain info git-stripspace purehelpers git-submodule mainporcelain +git-survey mainporcelain git-svn foreignscminterface git-switch mainporcelain history git-symbolic-ref plumbingmanipulators diff --git a/git.c b/git.c index 4f2215e9c8b..98e90838e42 100644 --- a/git.c +++ b/git.c @@ -630,6 +630,7 @@ static struct cmd_struct commands[] = { { "status", cmd_status, RUN_SETUP | NEED_WORK_TREE }, { "stripspace", cmd_stripspace }, { "submodule--helper", cmd_submodule__helper, RUN_SETUP }, + { 
"survey", cmd_survey, RUN_SETUP }, { "switch", cmd_switch, RUN_SETUP | NEED_WORK_TREE }, { "symbolic-ref", cmd_symbolic_ref, RUN_SETUP }, { "tag", cmd_tag, RUN_SETUP | DELAY_PAGER_CONFIG }, diff --git a/t/t8100-git-survey.sh b/t/t8100-git-survey.sh new file mode 100755 index 00000000000..2df7fa83629 --- /dev/null +++ b/t/t8100-git-survey.sh @@ -0,0 +1,18 @@ +#!/bin/sh + +test_description='git survey' + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +TEST_PASSES_SANITIZE_LEAK=0 +export TEST_PASSES_SANITIZE_LEAK + +. ./test-lib.sh + +test_expect_success 'git survey -h shows experimental warning' ' + test_expect_code 129 git survey -h 2>usage && + grep "EXPERIMENTAL!" usage +' + +test_done From patchwork Tue Sep 10 02:28:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Hostetler X-Patchwork-Id: 13797742 Received: from mail-ed1-f49.google.com (mail-ed1-f49.google.com [209.85.208.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 608D5175D5E for ; Tue, 10 Sep 2024 02:29:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935352; cv=none; b=FTr78Jx9Q51VIqzpRu33w174fou9J63ElvjpK3HdpfR/QLaBchwCh/xJI9kkt4aSQZLD1Im28MfA1PgwHydr4BqqX3a2kzNbsCOjJe4DeYVLsTzeVA1eDl6EGVreYfHtRoVmpeX3YDeVSeapfp2Te0hon0I3cXjjvapbO9u8UaM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935352; c=relaxed/simple; bh=QkNCjheZJf97d3eT/AcgVlvKzl8U8AevTnI/fn6r8Ww=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; 
b=S7cYtEKPQwolOFUQpk5YXgP22aPxzemBq4NVtHubQYx28W/zuMPa/cOIW65ob1WcfjebN9AVqncJ1de/667ywdW0qXq1ahlN7keo0kL+yAvgKKdjyDa7aCBw92xV5YJTBAL7U/6OTHWLPsAtNd3++kDSqMFpap3uou2W6pqFtl8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=HmKdTxaH; arc=none smtp.client-ip=209.85.208.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="HmKdTxaH" Received: by mail-ed1-f49.google.com with SMTP id 4fb4d7f45d1cf-5c26311c6f0so6100060a12.3 for ; Mon, 09 Sep 2024 19:29:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935348; x=1726540148; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=y+b6DMYyerKciEpsy0G6dMm8vyY33u469IiHYEKCKfQ=; b=HmKdTxaHFdRHWmmL6XyTceQSMpDAaL5H0jgJOJl1tfM9Hl6euDsZIZNSqdQRtx4Xt6 mqXwyxCSA2YHEEcymyjblPZ3djInCfT/C/4BN+LmvNaNa1JVvNBnHXFRL5c0ZwAa8YsG xOTiBSdrtBbBrjUkbJEnbD12yeABawK+Z7OEYZg/a1UqMsdgE7XdivKYmhvWmbx0t9wP 49vzX5XzAdvLq0JuTQTNkJ+HkY6kbmS9m7W2dt8wBGPvsKYcpm6RvzVbMttViQxQxw2J 7+vfz1cU+cpDeVxDZ6iZJNafRWrIzJcylQQRwbTD9RlXWRjNl2TgOUCwNutETiGy1u15 Posw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935348; x=1726540148; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=y+b6DMYyerKciEpsy0G6dMm8vyY33u469IiHYEKCKfQ=; b=EX1tj9F+Elf2asLnQF1cUx1E6FCjswzpPtodz9QPLDkwJIHAguI4slZlddvyV/lT6z 
wb0aXM/5pwveYxm3InxtdSFChT+89HhP4veK9JGm8+RuNqpNbCdFqjyr1kC+e5HeEB87 J+Pe+xoP/wKNMvNkUNjKtHZvBmhSyZuGmXB+rkgzPKNcWR1nQr6FT/m1VPmgDeBvsaIF Do6YEGlkVpB/v6te4OhN/U0SAzdknvKSkY8KjywJfEHIfSp3F9MiyFIdyFcn5ed2wfZx r9rtaZuV91OuUgnknyxUzkQpxnkJ0wSKoMjF5l2tX4H4g6bS/8OdWmP+vN1o5T/IoEzN hfyw== X-Gm-Message-State: AOJu0YxtW/zObGVqe61dfrabnxa881Xld0O9eoOIsC718rFaXbLry7ME 4as0+T6/PegRpKj4boYAKYBI+5XplFNWs2qPfFW4A7K/0r1EwSPYZjiiKw== X-Google-Smtp-Source: AGHT+IEE4yRz88wzOsDWv1UA5+RKY/5Z4lL/THwv8/rCA5pdJHFidMoPPiwN6Cz4zaajZviPKGHIJw== X-Received: by 2002:a05:6402:34c4:b0:5be:cdaf:1c09 with SMTP id 4fb4d7f45d1cf-5c3dc7baef3mr9661238a12.28.1725935348055; Mon, 09 Sep 2024 19:29:08 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5c3ebd46824sm3653939a12.23.2024.09.09.19.29.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:07 -0700 (PDT) Message-Id: <90986876381e4ccc10c5e191a3928407181e6a04.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:35 +0000 Subject: [PATCH 10/30] survey: add command line opts to select references Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Jeff Hostetler From: Jeff Hostetler From: Jeff Hostetler By default we will scan all references in "refs/heads/", "refs/tags/" and "refs/remotes/". Add command line options that let the user ask for all refs or a subset of them and to include a detached HEAD. 
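The selection rule can be read as a small tri-state resolution: each flag starts out "not given" (-1), an explicit `--all-refs` turns everything on, no flags at all falls back to the defaults (branches, tags, remotes), and otherwise anything not explicitly requested is turned off. The sketch below restates that rule as standalone C. The names `toy_refs_wanted`, `TOY_REFS_UNSET` and `toy_resolve_refs_wanted()` are hypothetical and only mirror the shape of the patch. Unlike the patch, the sketch also normalizes the `all_refs` field itself.

```c
#include <assert.h>

/* Tri-state flags: -1 = not given, 0 = off, 1 = on. */
struct toy_refs_wanted {
	int all_refs;
	int branches;
	int tags;
	int remotes;
	int detached;
	int other;
};

#define TOY_REFS_UNSET { -1, -1, -1, -1, -1, -1 }

static void toy_resolve_refs_wanted(struct toy_refs_wanted *rw)
{
	assert(rw);

	/* An explicit "all refs" request overrides everything else. */
	if (rw->all_refs == 1) {
		rw->branches = rw->tags = rw->remotes = 1;
		rw->detached = rw->other = 1;
		return;
	}
	rw->all_refs = 0;

	/* Nothing requested: fall back to branches, tags and remotes. */
	if (rw->branches == -1 && rw->tags == -1 && rw->remotes == -1 &&
	    rw->detached == -1 && rw->other == -1) {
		rw->branches = rw->tags = rw->remotes = 1;
		rw->detached = rw->other = 0;
		return;
	}

	/* Something was requested: whatever is still unset becomes off. */
	if (rw->branches == -1)
		rw->branches = 0;
	if (rw->tags == -1)
		rw->tags = 0;
	if (rw->remotes == -1)
		rw->remotes = 0;
	if (rw->detached == -1)
		rw->detached = 0;
	if (rw->other == -1)
		rw->other = 0;
}
```

Under this rule, asking for tags alone ends with only `tags` set, while giving no options at all ends with `branches`, `tags` and `remotes` set. The -1 sentinel is what makes "given as on" distinguishable from "never mentioned", which is presumably why the options are registered without a negated form.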
Signed-off-by: Jeff Hostetler Signed-off-by: Derrick Stolee --- Documentation/git-survey.txt | 34 +++++++++++++ builtin/survey.c | 99 ++++++++++++++++++++++++++++++++++++ 2 files changed, 133 insertions(+) diff --git a/Documentation/git-survey.txt b/Documentation/git-survey.txt index cdd1ec4358b..c648ef704e3 100644 --- a/Documentation/git-survey.txt +++ b/Documentation/git-survey.txt @@ -19,12 +19,46 @@ As repositories grow to "monorepo" size, certain data shapes can cause performance problems. `git-survey` attempts to measure and report on known problem areas. +Ref Selection and Reachable Objects +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In this first analysis phase, `git survey` will iterate over the set of +requested branches, tags, and other refs and treewalk over all of the +reachable commits, trees, and blobs and generate various statistics. + OPTIONS ------- --progress:: Show progress. This is automatically enabled when interactive. +Ref Selection +~~~~~~~~~~~~~ + +The following options control the set of refs that `git survey` will examine. +By default, `git survey` will look at tags, local branches, and remote refs. +If any of the following options are given, the default set is cleared and +only refs for the given options are added. + +--all-refs:: + Use all refs. This includes local branches, tags, remote refs, + notes, and stashes. This option overrides all of the following. + +--branches:: + Add local branches (`refs/heads/`) to the set. + +--tags:: + Add tags (`refs/tags/`) to the set. + +--remotes:: + Add remote branches (`refs/remotes/`) to the set. + +--detached:: + Add HEAD to the set. + +--other:: + Add notes (`refs/notes/`) and stashes (`refs/stash/`) to the set. 
+ OUTPUT ------ diff --git a/builtin/survey.c b/builtin/survey.c index 4cfd0f0293c..e0e844201de 100644 --- a/builtin/survey.c +++ b/builtin/survey.c @@ -7,19 +7,117 @@ static const char * const survey_usage[] = { NULL, }; +struct survey_refs_wanted { + int want_all_refs; /* special override */ + + int want_branches; + int want_tags; + int want_remotes; + int want_detached; + int want_other; /* see FILTER_REFS_OTHERS -- refs/notes/, refs/stash/ */ +}; + +/* + * The set of refs that we will search if the user doesn't select + * any on the command line. + */ +static struct survey_refs_wanted refs_if_unspecified = { + .want_all_refs = 0, + + .want_branches = 1, + .want_tags = 1, + .want_remotes = 1, + .want_detached = 0, + .want_other = 0, +}; + struct survey_opts { int verbose; int show_progress; + struct survey_refs_wanted refs; }; static struct survey_opts survey_opts = { .verbose = 0, .show_progress = -1, /* defaults to isatty(2) */ + + .refs.want_all_refs = -1, + + .refs.want_branches = -1, /* default these to undefined */ + .refs.want_tags = -1, + .refs.want_remotes = -1, + .refs.want_detached = -1, + .refs.want_other = -1, }; +/* + * After parsing the command line arguments, figure out which refs we + * should scan. + * + * If ANY were given in positive sense, then we ONLY include them and + * do not use the builtin values. + */ +static void fixup_refs_wanted(void) +{ + struct survey_refs_wanted *rw = &survey_opts.refs; + + /* + * `--all-refs` overrides and enables everything. + */ + if (rw->want_all_refs == 1) { + rw->want_branches = 1; + rw->want_tags = 1; + rw->want_remotes = 1; + rw->want_detached = 1; + rw->want_other = 1; + return; + } + + /* + * If none of the `--` were given, we assume all + * of the builtin unspecified values. 
+ */ + if (rw->want_branches == -1 && + rw->want_tags == -1 && + rw->want_remotes == -1 && + rw->want_detached == -1 && + rw->want_other == -1) { + *rw = refs_if_unspecified; + return; + } + + /* + * Since we only allow positive boolean values on the command + * line, we will only have true values where they specified + * a `--`. + * + * So anything that still has an unspecified value should be + * set to false. + */ + if (rw->want_branches == -1) + rw->want_branches = 0; + if (rw->want_tags == -1) + rw->want_tags = 0; + if (rw->want_remotes == -1) + rw->want_remotes = 0; + if (rw->want_detached == -1) + rw->want_detached = 0; + if (rw->want_other == -1) + rw->want_other = 0; +} + static struct option survey_options[] = { OPT__VERBOSE(&survey_opts.verbose, N_("verbose output")), OPT_BOOL(0, "progress", &survey_opts.show_progress, N_("show progress")), + + OPT_BOOL_F(0, "all-refs", &survey_opts.refs.want_all_refs, N_("include all refs"), PARSE_OPT_NONEG), + + OPT_BOOL_F(0, "branches", &survey_opts.refs.want_branches, N_("include branches"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "tags", &survey_opts.refs.want_tags, N_("include tags"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "remotes", &survey_opts.refs.want_remotes, N_("include all remotes refs"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "detached", &survey_opts.refs.want_detached, N_("include detached HEAD"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "other", &survey_opts.refs.want_other, N_("include notes and stashes"), PARSE_OPT_NONEG), + OPT_END(), }; @@ -55,6 +153,7 @@ int cmd_survey(int argc, const char **argv, const char *prefix) if (survey_opts.show_progress < 0) survey_opts.show_progress = isatty(2); + fixup_refs_wanted(); return 0; } From patchwork Tue Sep 10 02:28:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Hostetler X-Patchwork-Id: 13797743 Received: from mail-ed1-f53.google.com (mail-ed1-f53.google.com [209.85.208.53]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37962176FA5 for ; Tue, 10 Sep 2024 02:29:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.53 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935353; cv=none; b=BUGq2KhOFdb3Im2g9mLauG33hS+rQnTELnNCuqnUKvgdZWr2V0aPJpx6PgGMxsgYnBeuOejah/Ps4szwPqbbNCAe21McE0nbPl1IeQ7bnH0+3cAmVhEMJYyRv7r2Mg/TCSakm3lky7VpIe6j8KxbzsTFOxEZSRddKm7e/j9XeIo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935353; c=relaxed/simple; bh=cpXSFDQ571W8jKACYpRAozMOaFxC0Dt5V1aitd3Aq4g=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=IDmy64Lk4VkyLFfbqMTi53eA7x56X6/qfyPaxEArvYENKctkvr5GBEuA84EOZKyCYN843p7QXditf6Jkmq1hTcSScgn3LEq148YzEAMZnt1UaNmZOI9zwoO2TjzefXgtHHQ7qTreLokOdvJSlTqA9LCyE/XZH3R/FEX3V6snEB4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=EcMWV5ax; arc=none smtp.client-ip=209.85.208.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="EcMWV5ax" Received: by mail-ed1-f53.google.com with SMTP id 4fb4d7f45d1cf-5c24c92f699so5544090a12.2 for ; Mon, 09 Sep 2024 19:29:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935349; x=1726540149; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; 
bh=7MFLD0kB3zXHEO4hN2bbn+KjTFRmzPx4JwLlLJ1ZJZ4=; b=EcMWV5axpMHXQB+Ek2iUJe94SfkAYxq+TUoAcpBIDjDqly7Th+01LvcAaCziVY8qFh OBD/FB/G7D6uCpeeUnK4SFfVLH2gVuM6he8tPKMx95RwsUlQBuefx7KGkQFW7uxWemc7 SXcNUVxah1WSFnm9xyjE9uUqvOsoahbXaKeU+mWur7lN9M+fxXPGuL0io2JqEsC51v9H sTNIPFNlshub3utHKXBgvLmOsL7Upd+Dpg2riY57/jo1qHqwEEtkvTwvLRXQK0kF3xG6 8wtl/CUcmDtUkJYrnN8oYfiDo00kz9h8qZa9rdmmjcKFF9daRWgAxqTJN4VxazvT84Lv ra2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935349; x=1726540149; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7MFLD0kB3zXHEO4hN2bbn+KjTFRmzPx4JwLlLJ1ZJZ4=; b=QYfSWKXJh7KbZp/0DmQ5byNHadEdqeAEi6Zn+knAmkbNgy/Bi2V8Yu6a9DOsbA23Zr Tfv8VyF4wWuuITuFi80yeM+C5klvaTNHmuY2YsZ5bJGJPnY2ND/iSHvGjAkXOiY6bOWW iVugazXx5YTsU1ya4nsqs/cPKIuJwE23G34k1VPWWMdZqT5OCU+E/5Q0CK+pEsCWYh2D jIOLu4P0NKT19vf6F/sWIhe1WMb0xGjnojNSwLLLK7m7ZDzVnHfdRqC3kuOwif20Q05T BsSYaHOGRtVNyWopblPtBNh4sVt9oOuB0vzI1Mhctnh59UFdqzRjvPNPCeUJZkmXWqy6 EMrA== X-Gm-Message-State: AOJu0YyE4dZ4C4fJ2FqhNGu37wflR1ERbjgIfWrs8so1y+2DmMKDxBen IIsLDdrWoEo7HYw+bkszAdZtoZXq+lwSmSlAteQfGddSredBv2YyciPVqA== X-Google-Smtp-Source: AGHT+IGRr2SEcS0Bu4ogigNbnyO3qo6+H18DruWB0p2VaJIkKs8VFDywLVpTheFjDk4Ez7kAKq0SGQ== X-Received: by 2002:a05:6402:2751:b0:5c2:5075:7d1 with SMTP id 4fb4d7f45d1cf-5c3dc785341mr11495936a12.7.1725935348811; Mon, 09 Sep 2024 19:29:08 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5c3ebd76eb6sm3652946a12.61.2024.09.09.19.29.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:08 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:36 +0000 Subject: [PATCH 11/30] survey: collect the set of requested refs Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: 
git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Jeff Hostetler From: Jeff Hostetler From: Jeff Hostetler Collect the set of requested branches, tags, etc. into a ref_array and collect the set of requested patterns into a strvec. RFC TODO: This patch has some changes that should be in the previous patch, to make the diff look a lot better. Co-authored-by: Derrick Stolee Signed-off-by: Jeff Hostetler Signed-off-by: Derrick Stolee --- builtin/survey.c | 258 ++++++++++++++++++++++++++++++++++-------- t/t8100-git-survey.sh | 9 ++ 2 files changed, 217 insertions(+), 50 deletions(-) diff --git a/builtin/survey.c b/builtin/survey.c index e0e844201de..1b4fe591e59 100644 --- a/builtin/survey.c +++ b/builtin/survey.c @@ -1,6 +1,12 @@ #include "builtin.h" #include "config.h" +#include "object.h" +#include "object-store-ll.h" #include "parse-options.h" +#include "progress.h" +#include "ref-filter.h" +#include "strvec.h" +#include "trace2.h" static const char * const survey_usage[] = { N_("(EXPERIMENTAL!) git survey "), @@ -17,18 +23,8 @@ struct survey_refs_wanted { int want_other; /* see FILTER_REFS_OTHERS -- refs/notes/, refs/stash/ */ }; -/* - * The set of refs that we will search if the user doesn't select - * any on the command line. 
- */ -static struct survey_refs_wanted refs_if_unspecified = { - .want_all_refs = 0, - - .want_branches = 1, - .want_tags = 1, - .want_remotes = 1, - .want_detached = 0, - .want_other = 0, +static struct survey_refs_wanted default_ref_options = { + .want_all_refs = 1, }; struct survey_opts { @@ -37,19 +33,51 @@ struct survey_opts { struct survey_refs_wanted refs; }; -static struct survey_opts survey_opts = { - .verbose = 0, - .show_progress = -1, /* defaults to isatty(2) */ +struct survey_report_ref_summary { + size_t refs_nr; + size_t branches_nr; + size_t remote_refs_nr; + size_t tags_nr; + size_t tags_annotated_nr; + size_t others_nr; + size_t unknown_nr; +}; + +/** + * This struct contains all of the information that needs to be printed + * at the end of the exploration of the repository and its references. + */ +struct survey_report { + struct survey_report_ref_summary refs; +}; + +struct survey_context { + /* Options that control what is done. */ + struct survey_opts opts; + + /* Info for output only. */ + struct survey_report report; - .refs.want_all_refs = -1, + /* + * The rest of the members are about enabling the activity + * of the 'git survey' command, including ref listings, object + * pointers, and progress. + */ + + struct repository *repo; + + struct progress *progress; + size_t progress_nr; + size_t progress_total; - .refs.want_branches = -1, /* default these to undefined */ - .refs.want_tags = -1, - .refs.want_remotes = -1, - .refs.want_detached = -1, - .refs.want_other = -1, + struct strvec refs; }; +static void clear_survey_context(struct survey_context *ctx) +{ + strvec_clear(&ctx->refs); +} + /* * After parsing the command line arguments, figure out which refs we * should scan. @@ -57,9 +85,9 @@ static struct survey_opts survey_opts = { * If ANY were given in positive sense, then we ONLY include them and * do not use the builtin values. 
*/ -static void fixup_refs_wanted(void) +static void fixup_refs_wanted(struct survey_context *ctx) { - struct survey_refs_wanted *rw = &survey_opts.refs; + struct survey_refs_wanted *rw = &ctx->opts.refs; /* * `--all-refs` overrides and enables everything. @@ -82,7 +110,7 @@ static void fixup_refs_wanted(void) rw->want_remotes == -1 && rw->want_detached == -1 && rw->want_other == -1) { - *rw = refs_if_unspecified; + *rw = default_ref_options; return; } @@ -106,54 +134,184 @@ static void fixup_refs_wanted(void) rw->want_other = 0; } -static struct option survey_options[] = { - OPT__VERBOSE(&survey_opts.verbose, N_("verbose output")), - OPT_BOOL(0, "progress", &survey_opts.show_progress, N_("show progress")), - - OPT_BOOL_F(0, "all-refs", &survey_opts.refs.want_all_refs, N_("include all refs"), PARSE_OPT_NONEG), - - OPT_BOOL_F(0, "branches", &survey_opts.refs.want_branches, N_("include branches"), PARSE_OPT_NONEG), - OPT_BOOL_F(0, "tags", &survey_opts.refs.want_tags, N_("include tags"), PARSE_OPT_NONEG), - OPT_BOOL_F(0, "remotes", &survey_opts.refs.want_remotes, N_("include all remotes refs"), PARSE_OPT_NONEG), - OPT_BOOL_F(0, "detached", &survey_opts.refs.want_detached, N_("include detached HEAD"), PARSE_OPT_NONEG), - OPT_BOOL_F(0, "other", &survey_opts.refs.want_other, N_("include notes and stashes"), PARSE_OPT_NONEG), - - OPT_END(), -}; - static int survey_load_config_cb(const char *var, const char *value, - const struct config_context *ctx, void *pvoid) + const struct config_context *cctx, void *pvoid) { + struct survey_context *sctx = pvoid; if (!strcmp(var, "survey.verbose")) { - survey_opts.verbose = git_config_bool(var, value); + sctx->opts.verbose = git_config_bool(var, value); return 0; } if (!strcmp(var, "survey.progress")) { - survey_opts.show_progress = git_config_bool(var, value); + sctx->opts.show_progress = git_config_bool(var, value); return 0; } - return git_default_config(var, value, ctx, pvoid); + return git_default_config(var, value, cctx, 
pvoid); } -static void survey_load_config(void) +static void survey_load_config(struct survey_context *ctx) { - git_config(survey_load_config_cb, NULL); + git_config(survey_load_config_cb, ctx); +} + +static void do_load_refs(struct survey_context *ctx, + struct ref_array *ref_array) +{ + struct ref_filter filter = REF_FILTER_INIT; + struct ref_sorting *sorting; + struct string_list sorting_options = STRING_LIST_INIT_DUP; + + string_list_append(&sorting_options, "objectname"); + sorting = ref_sorting_options(&sorting_options); + + if (ctx->opts.refs.want_detached) + strvec_push(&ctx->refs, "HEAD"); + + if (ctx->opts.refs.want_all_refs) { + strvec_push(&ctx->refs, "refs/"); + } else { + if (ctx->opts.refs.want_branches) + strvec_push(&ctx->refs, "refs/heads/"); + if (ctx->opts.refs.want_tags) + strvec_push(&ctx->refs, "refs/tags/"); + if (ctx->opts.refs.want_remotes) + strvec_push(&ctx->refs, "refs/remotes/"); + if (ctx->opts.refs.want_other) { + strvec_push(&ctx->refs, "refs/notes/"); + strvec_push(&ctx->refs, "refs/stash/"); + } + } + + filter.name_patterns = ctx->refs.v; + filter.ignore_case = 0; + filter.match_as_path = 1; + + if (ctx->opts.show_progress) { + ctx->progress_total = 0; + ctx->progress = start_progress(_("Scanning refs..."), 0); + } + + filter_refs(ref_array, &filter, FILTER_REFS_KIND_MASK); + + if (ctx->opts.show_progress) { + ctx->progress_total = ref_array->nr; + display_progress(ctx->progress, ctx->progress_total); + } + + ref_array_sort(sorting, ref_array); + + stop_progress(&ctx->progress); + ref_filter_clear(&filter); + ref_sorting_release(sorting); +} + +/* + * The REFS phase: + * + * Load the set of requested refs and assess them for scalablity problems. + * Use that set to start a treewalk to all reachable objects and assess + * them. + * + * This data will give us insights into the repository itself (the number + * of refs, the size and shape of the DAG, the number and size of the + * objects). 
+ * + * Theoretically, this data is independent of the on-disk representation + * (e.g. independent of packing concerns). + */ +static void survey_phase_refs(struct survey_context *ctx) +{ + struct ref_array ref_array = { 0 }; + + trace2_region_enter("survey", "phase/refs", ctx->repo); + do_load_refs(ctx, &ref_array); + + ctx->report.refs.refs_nr = ref_array.nr; + for (size_t i = 0; i < ref_array.nr; i++) { + size_t size; + struct ref_array_item *item = ref_array.items[i]; + + switch (item->kind) { + case FILTER_REFS_TAGS: + ctx->report.refs.tags_nr++; + if (oid_object_info(ctx->repo, + &item->objectname, + &size) == OBJ_TAG) + ctx->report.refs.tags_annotated_nr++; + break; + + case FILTER_REFS_BRANCHES: + ctx->report.refs.branches_nr++; + break; + + case FILTER_REFS_REMOTES: + ctx->report.refs.remote_refs_nr++; + break; + + case FILTER_REFS_OTHERS: + ctx->report.refs.others_nr++; + break; + + default: + ctx->report.refs.unknown_nr++; + break; + } + } + + trace2_region_leave("survey", "phase/refs", ctx->repo); + + ref_array_clear(&ref_array); } int cmd_survey(int argc, const char **argv, const char *prefix) { + static struct survey_context ctx = { + .opts = { + .verbose = 0, + .show_progress = -1, /* defaults to isatty(2) */ + + .refs.want_all_refs = -1, + + .refs.want_branches = -1, /* default these to undefined */ + .refs.want_tags = -1, + .refs.want_remotes = -1, + .refs.want_detached = -1, + .refs.want_other = -1, + }, + .refs = STRVEC_INIT, + }; + + static struct option survey_options[] = { + OPT__VERBOSE(&ctx.opts.verbose, N_("verbose output")), + OPT_BOOL(0, "progress", &ctx.opts.show_progress, N_("show progress")), + + OPT_BOOL_F(0, "all-refs", &ctx.opts.refs.want_all_refs, N_("include all refs"), PARSE_OPT_NONEG), + + OPT_BOOL_F(0, "branches", &ctx.opts.refs.want_branches, N_("include branches"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "tags", &ctx.opts.refs.want_tags, N_("include tags"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "remotes", &ctx.opts.refs.want_remotes, 
N_("include all remotes refs"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "detached", &ctx.opts.refs.want_detached, N_("include detached HEAD"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "other", &ctx.opts.refs.want_other, N_("include notes and stashes"), PARSE_OPT_NONEG), + + OPT_END(), + }; + if (argc == 2 && !strcmp(argv[1], "-h")) usage_with_options(survey_usage, survey_options); - prepare_repo_settings(the_repository); - survey_load_config(); + ctx.repo = the_repository; + prepare_repo_settings(ctx.repo); + survey_load_config(&ctx); argc = parse_options(argc, argv, prefix, survey_options, survey_usage, 0); - if (survey_opts.show_progress < 0) - survey_opts.show_progress = isatty(2); - fixup_refs_wanted(); + if (ctx.opts.show_progress < 0) + ctx.opts.show_progress = isatty(2); + fixup_refs_wanted(&ctx); + + survey_phase_refs(&ctx); + clear_survey_context(&ctx); return 0; } diff --git a/t/t8100-git-survey.sh b/t/t8100-git-survey.sh index 2df7fa83629..5903c90cb57 100755 --- a/t/t8100-git-survey.sh +++ b/t/t8100-git-survey.sh @@ -15,4 +15,13 @@ test_expect_success 'git survey -h shows experimental warning' ' grep "EXPERIMENTAL!" 
usage ' +test_expect_success 'creat a semi-interesting repo' ' + test_commit_bulk 10 +' + +test_expect_success 'git survey (default)' ' + git survey >out 2>err && + test_line_count = 0 err +' + test_done From patchwork Tue Sep 10 02:28:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797744 Received: from mail-ed1-f44.google.com (mail-ed1-f44.google.com [209.85.208.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A567614B08C for ; Tue, 10 Sep 2024 02:29:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935354; cv=none; b=o+Mpw4llwb1FepYJ6XZmwEeozwkB8aNc7G/aiQFua8vp3gMOUOmqpvRQJwoOnmWzDQSG5xR+kuTvGgL5NA9D16xVf11JTy0XjmHeCPRsnqTvNmTqC/sC5dkrUjVsyK65Cst/AIp4aZetpijOvS4lL0yycIeMbwTPMr8dMmGJAQE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935354; c=relaxed/simple; bh=fpbnCbN3CgaANU2dyeExRT1HiGxuhgu2xpyYmuEnQCQ=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=XzZ5PmHznh1n9ehVfwbwp5cu9+abtgDnzI4tpQ4Byd1vnIk94owQI6eDYMsgl07L6QJr3fgZVk6P73vd+p9UmUDxiZ5OHC/kNUg1qHf/5ef3M5xmlFO1eRwrR2WW6lq/ssqUVr6z/0MdnjXziVYKoH903RXFdliTqtjOrhFVSbY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=k+aQoUfF; arc=none smtp.client-ip=209.85.208.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=gmail.com header.i=@gmail.com header.b="k+aQoUfF" Received: by mail-ed1-f44.google.com with SMTP id 4fb4d7f45d1cf-5c3cdbe4728so4968889a12.2 for ; Mon, 09 Sep 2024 19:29:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935350; x=1726540150; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=OxibNVfaacDMWZrY5jLbaQv9IP94XIEeA7Z2qrkWvho=; b=k+aQoUfFTnA6m7le36QSBxrtFCiqbWIlqfNG3RHopUIOVy6ZiPzRE19VxFWPcgNLQs dny59LwBroX+i3DiWAhsIKZuFwJx8NB9Vqk/a06C6+vQ4RvgcSx16p49n2Z2aCJsZu6o Jhw9BQtKiVcvEKZUqh255aCO51upeKDH8huOMtjbygv9pJt2jPsHuqv2kNMoU3X1nnT0 VSzq3v5eJl00D/u57yCSlGDn6sC4vPLSOvM4a52Okj8iSdUBm2tju+ORdmIdxhNg1wLX RJelPTrezk3Z02dhoYDtAw/2LbpaRhhGq18HJNiUaND2UKUFlsqac0PcO1b92ifVjBen g+Jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935350; x=1726540150; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OxibNVfaacDMWZrY5jLbaQv9IP94XIEeA7Z2qrkWvho=; b=ouVqaI1+MQfUZSdGa7cJCkB1e9KDgzB2rd05qsa446/5PynioOuu6cPt28gD4Rl4HJ P3faN2RIzYUz6x0jB0OYF72tqs2AJf7HfveWkiCR23Oi6whzYdqMCSc627FNO0r97B0o 8wPcHSLLC9JQp2/fMF9fXyPSwXMCm5UcxVhxw9fYB7MbnvHXrMgenYPupSd1NX5wQeiB sSMSU0Z1ydVK2hJJUhE93UCdujj35gHW+nnleqgnjH8YGrwqL6LF6PAswjA78J375wMJ SCSkurNftd0pgWB/zgXRXhW5CC8aRn2U0Xh7fkO9s70vrc1A2mYQ4eK+3dsSwx3VAOop 1AxQ== X-Gm-Message-State: AOJu0YwSEGbGAKuDKTbSBj32nINsQW3Rn9G7lOIWFz8iTaymtTpxaoKr UenF9VfRVSw011V4GYDIvp83eDp3uhI6IVoHpOIVLl/xAho8uuAeB/kMhw== X-Google-Smtp-Source: AGHT+IFVqhdYyj6tvy6nj2948diZL52/d3rUtm0upCqb5q7j8HS3PQb9kVRF3wX3bQ4ZBGeKFKZDVw== X-Received: by 2002:a17:907:d2c5:b0:a86:95ff:f3a0 with SMTP id a640c23a62f3a-a8d245139admr650988766b.3.1725935349616; Mon, 09 Sep 2024 19:29:09 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) 
by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25ced17asm420047966b.170.2024.09.09.19.29.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:09 -0700 (PDT) Message-Id: <44417cceddcaeec9e90acd0b058edd8c80627479.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:37 +0000 Subject: [PATCH 12/30] survey: start pretty printing data in table form Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee When 'git survey' provides information to the user, this will be presented in one of two formats: plaintext and JSON. The JSON implementation will be delayed until the functionality is complete for the plaintext format. The most important parts of the plaintext format are headers specifying the different sections of the report and tables providing concrete data. Create a custom table data structure that allows specifying a list of strings for the row values. When printing the table, check each column for the maximum width so we can create a table of the correct size from the start. The table structure is designed to be flexible enough for the different kinds of output that will be implemented in future changes. 
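The column-width pass described above can be sketched in isolation. This is a minimal standalone illustration, not the code from this patch: the helper names `column_widths` and `format_row` and the flat row-major cell array are assumptions made for the example, and plain C buffers stand in for the strvec and strbuf types the patch uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): store in widths[] the
 * length of the longest cell in each column. cells is a flat
 * row-major array of rows_nr * cols_nr strings; pass the header as
 * row 0 so that it takes part in the measurement.
 */
void column_widths(const char **cells, size_t rows_nr, size_t cols_nr,
		   size_t *widths)
{
	for (size_t c = 0; c < cols_nr; c++) {
		widths[c] = 0;
		for (size_t r = 0; r < rows_nr; r++) {
			size_t len = strlen(cells[r * cols_nr + c]);
			if (widths[c] < len)
				widths[c] = len;
		}
	}
}

/*
 * Hypothetical helper: format one row into out, right-aligning each
 * cell to its column width and joining cells with " | ". Returns the
 * number of characters written, not counting the trailing NUL.
 */
size_t format_row(char *out, size_t out_size, const char **row,
		  size_t cols_nr, const size_t *widths)
{
	size_t pos = 0;

	assert(out_size > 0);
	out[0] = '\0';
	for (size_t c = 0; c < cols_nr; c++) {
		int n = snprintf(out + pos, out_size - pos, "%s%*s",
				 c ? " | " : "", (int)widths[c], row[c]);
		/* This sketch treats truncation as a caller error. */
		assert(n >= 0 && (size_t)n < out_size - pos);
		pos += (size_t)n;
	}
	return pos;
}
```

Measuring every cell before printing anything is what lets the title underline, the header and the divider all be sized correctly on the first pass, at the cost of holding the whole table in memory until the report is emitted.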
Signed-off-by: Derrick Stolee --- builtin/survey.c | 175 ++++++++++++++++++++++++++++++++++++++++++ t/t8100-git-survey.sh | 17 +++- 2 files changed, 191 insertions(+), 1 deletion(-) diff --git a/builtin/survey.c b/builtin/survey.c index 1b4fe591e59..b2104e84d61 100644 --- a/builtin/survey.c +++ b/builtin/survey.c @@ -5,6 +5,7 @@ #include "parse-options.h" #include "progress.h" #include "ref-filter.h" +#include "strbuf.h" #include "strvec.h" #include "trace2.h" @@ -27,10 +28,16 @@ static struct survey_refs_wanted default_ref_options = { .want_all_refs = 1, }; +enum survey_format { + SURVEY_PLAINTEXT = 0, + SURVEY_JSON = 1, +}; + struct survey_opts { int verbose; int show_progress; struct survey_refs_wanted refs; + enum survey_format format; }; struct survey_report_ref_summary { @@ -78,6 +85,161 @@ static void clear_survey_context(struct survey_context *ctx) strvec_clear(&ctx->refs); } +struct survey_table { + const char *table_name; + struct strvec header; + struct strvec *rows; + size_t rows_nr; + size_t rows_alloc; +}; + +#define SURVEY_TABLE_INIT { \ + .header = STRVEC_INIT, \ +} + +static void clear_table(struct survey_table *table) +{ + strvec_clear(&table->header); + for (size_t i = 0; i < table->rows_nr; i++) + strvec_clear(&table->rows[i]); + free(table->rows); +} + +static void insert_table_rowv(struct survey_table *table, ...) 
+{ + va_list ap; + char *arg; + ALLOC_GROW(table->rows, table->rows_nr + 1, table->rows_alloc); + + memset(&table->rows[table->rows_nr], 0, sizeof(struct strvec)); + + va_start(ap, table); + while ((arg = va_arg(ap, char *))) + strvec_push(&table->rows[table->rows_nr], arg); + va_end(ap); + + table->rows_nr++; +} + +static void print_table_title(const char *name, size_t *widths, size_t nr) +{ + static struct strbuf lines = STRBUF_INIT; + size_t width = 0; + strbuf_setlen(&lines, 0); + + strbuf_addch(&lines, ' '); + strbuf_addstr(&lines, name); + strbuf_addch(&lines, '\n'); + + for (size_t i = 0; i < nr; i++) { + if (i) + width += 3; + width += widths[i]; + } + strbuf_addchars(&lines, '=', width); + printf("%s\n", lines.buf); +} + +static void print_row_plaintext(struct strvec *row, size_t *widths) +{ + static struct strbuf line = STRBUF_INIT; + strbuf_setlen(&line, 0); + + for (size_t i = 0; i < row->nr; i++) { + const char *str = row->v[i]; + size_t len = strlen(str); + if (i) + strbuf_add(&line, " | ", 3); + strbuf_addchars(&line, ' ', widths[i] - len); + strbuf_add(&line, str, len); + } + printf("%s\n", line.buf); +} + +static void print_divider_plaintext(size_t *widths, size_t nr) +{ + static struct strbuf line = STRBUF_INIT; + strbuf_setlen(&line, 0); + + for (size_t i = 0; i < nr; i++) { + if (i) + strbuf_add(&line, "-+-", 3); + strbuf_addchars(&line, '-', widths[i]); + } + printf("%s\n", line.buf); +} + +static void print_table_plaintext(struct survey_table *table) +{ + size_t *column_widths; + size_t columns_nr = table->header.nr; + CALLOC_ARRAY(column_widths, columns_nr); + + for (size_t i = 0; i < columns_nr; i++) { + column_widths[i] = strlen(table->header.v[i]); + + for (size_t j = 0; j < table->rows_nr; j++) { + size_t rowlen = strlen(table->rows[j].v[i]); + if (column_widths[i] < rowlen) + column_widths[i] = rowlen; + } + } + + print_table_title(table->table_name, column_widths, columns_nr); + print_row_plaintext(&table->header, column_widths); + 
print_divider_plaintext(column_widths, columns_nr); + + for (size_t j = 0; j < table->rows_nr; j++) + print_row_plaintext(&table->rows[j], column_widths); +} + +static void survey_report_plaintext_refs(struct survey_context *ctx) +{ + struct survey_report_ref_summary *refs = &ctx->report.refs; + struct survey_table table = SURVEY_TABLE_INIT; + + table.table_name = _("REFERENCES SUMMARY"); + + strvec_push(&table.header, _("Ref Type")); + strvec_push(&table.header, _("Count")); + + if (ctx->opts.refs.want_all_refs || ctx->opts.refs.want_branches) { + char *fmt = xstrfmt("%"PRIuMAX"", refs->branches_nr); + insert_table_rowv(&table, _("Branches"), fmt, NULL); + free(fmt); + } + + if (ctx->opts.refs.want_all_refs || ctx->opts.refs.want_remotes) { + char *fmt = xstrfmt("%"PRIuMAX"", refs->remote_refs_nr); + insert_table_rowv(&table, _("Remote refs"), fmt, NULL); + free(fmt); + } + + if (ctx->opts.refs.want_all_refs || ctx->opts.refs.want_tags) { + char *fmt = xstrfmt("%"PRIuMAX"", refs->tags_nr); + insert_table_rowv(&table, _("Tags (all)"), fmt, NULL); + free(fmt); + fmt = xstrfmt("%"PRIuMAX"", refs->tags_annotated_nr); + insert_table_rowv(&table, _("Tags (annotated)"), fmt, NULL); + free(fmt); + } + + print_table_plaintext(&table); + clear_table(&table); +} + +static void survey_report_plaintext(struct survey_context *ctx) +{ + printf("GIT SURVEY for \"%s\"\n", ctx->repo->worktree); + printf("-----------------------------------------------------\n"); + survey_report_plaintext_refs(ctx); +} + +static void survey_report_json(struct survey_context *ctx) +{ + /* TODO. */ +} + /* * After parsing the command line arguments, figure out which refs we * should scan. 
@@ -312,6 +474,19 @@ int cmd_survey(int argc, const char **argv, const char *prefix) survey_phase_refs(&ctx); + switch (ctx.opts.format) { + case SURVEY_PLAINTEXT: + survey_report_plaintext(&ctx); + break; + + case SURVEY_JSON: + survey_report_json(&ctx); + break; + + default: + BUG("Undefined format"); + } + clear_survey_context(&ctx); return 0; } diff --git a/t/t8100-git-survey.sh b/t/t8100-git-survey.sh index 5903c90cb57..a57f6ca7a59 100755 --- a/t/t8100-git-survey.sh +++ b/t/t8100-git-survey.sh @@ -21,7 +21,22 @@ test_expect_success 'creat a semi-interesting repo' ' test_expect_success 'git survey (default)' ' git survey >out 2>err && - test_line_count = 0 err + test_line_count = 0 err && + + cat >expect <<-EOF && + GIT SURVEY for "$(pwd)" + ----------------------------------------------------- + REFERENCES SUMMARY + ======================== + Ref Type | Count + -----------------+------ + Branches | 1 + Remote refs | 0 + Tags (all) | 0 + Tags (annotated) | 0 + EOF + + test_cmp expect out ' test_done From patchwork Tue Sep 10 02:28:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797745 Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C184D17ADE8 for ; Tue, 10 Sep 2024 02:29:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935354; cv=none; b=nSHa6ZVXW2HdL0SpPjKOnUJ9izopBrVs5vKIbmjk1xxQru2d+IRC85y6U/yNa/pxbufVqwXpkf9keeuYuuEtACP3fCJJSos8DIcAY3VzAHYqjlZlgouCS8aGiKa1huYUEUVHwaPH5jX+eH0A+6fQCph3BHHJ0kqMwgtyaDzC1P4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935354; c=relaxed/simple; 
bh=dR+6m0q3rgJ+G1PzzrpqW1BFtdPGHsljCvzGk2DJqAk=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=oCCQyWE8d3w1k4W8IN5JwMitcICdhEsEiKnvC9SSyenPaq3P2fCsfh3YCm703WO7M5LEI7Q5YjqbwwR7RsKPEXmI9GCupjb3ej1W/mZfRSE2sO+nD7ovuwB0fYtmRcioN8AQWLKgPIRRjmdL0IFYO18UuvyjJyR/Qn6H0ulqbU8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NbW2oBDL; arc=none smtp.client-ip=209.85.218.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NbW2oBDL" Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-a7aa086b077so453284566b.0 for ; Mon, 09 Sep 2024 19:29:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935350; x=1726540150; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=R8ZX/8mDjkErdUuE2YO4sefyn3ZqBWJZ6++TpOw+3Wc=; b=NbW2oBDL5alx91XgpLTELQRB/oe2zZ76y/Rf/fTpNCNCS9hQoHFR/oQ1+Yi2xaLBch cNBBhU3/i/9kd0vdd9dFqJgiAXiMJzJxjR1ZyLuKSfrqDo/I1/KOB2+mdnMxOc2+MFwK lVLPBAPBpApramVUn64iqnWlDM6xNcLL0DI7riX8j6vJ4BES5B3DIHXrg2QnGdvc+vM5 9nexVDV1PSWxOOizo59iCpbDh/kL9djdQXUZ0IIWPoPg2sg4fO080pQ1rKDXKlEupIeq P4BIl0PjNBwBJK/+GmQUL8R9fn3W4WsbWKUgc5LoRETQPvHXK76HY0yCCt++L73QFAQL t9jQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935350; x=1726540150; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=R8ZX/8mDjkErdUuE2YO4sefyn3ZqBWJZ6++TpOw+3Wc=; b=t/RpVmUmuXq5C6qrMu7Zg52kRMexsUm02uJIe+M9IQzEKNbkNC6LB8WwblKSu3v7HG Q6jEyZfIEtGN4rLrvmQQZJYaxJTXE7JzbCIIGSO6mTuvObAXDU/RWVMbD+6pHYWEqqmM NRP9Hzl10DpMVbh5rqgzE79w0TQcxitMhJkr1mL0/PHAw7xoaaI6ELTcrtkZXKC0IdiP z+2kKw+GfLeLLj18e32JYWS8yFF5LiiGQ+V1J52UkaKFPD4hchRHd3jClepGErg0ouqC x+Pb5SN8yTeUGeYS+mLxEMlb/ZlamsTZZCcjCeQWyUiQheYx8v7NQ3ol9xrKYCTnS8eh fbZQ== X-Gm-Message-State: AOJu0YwtiJ1oXcXUoCEsOBX9MyJ74qn7W6lRHzelWpvqAiWmI/n/cXsw fr05seH6ymqR6Giqn4aUKgkEfswcwYpMrThhcxuUUjsg4kbHuhWuWpuqRw== X-Google-Smtp-Source: AGHT+IE0yNhn5PW4ZCrbWWTkemth4kCFygGabZ2g7DyG0SMgjUu1HX9ucC+FJ7pK2dRaqB7ivdzTiw== X-Received: by 2002:a17:907:1c85:b0:a8d:2b86:d761 with SMTP id a640c23a62f3a-a8d2b86d95fmr633254066b.54.1725935350367; Mon, 09 Sep 2024 19:29:10 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25a2b281sm419679766b.92.2024.09.09.19.29.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:10 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:38 +0000 Subject: [PATCH 13/30] survey: add object count summary Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee At the moment, the reason for using the path-walk API is not obvious, but its use will become more prevalent in future iterations. For now, use the path-walk API to sum up the counts of each kind of object. 
For example, this is the reachable object summary output for my local repo:

REACHABLE OBJECT SUMMARY
========================
Object Type |  Count
------------+-------
       Tags |      0
    Commits | 178573
      Trees | 312745
      Blobs | 183035

(Note: the "Tags" are zero right now because the path-walk API has not been integrated to walk tags yet. This will be fixed in a later change.)

RFC TODO: make sure tags are walked before this change.

Signed-off-by: Derrick Stolee --- builtin/survey.c | 196 ++++++++++++++++++++++++++++++++++++++++-- t/t8100-git-survey.sh | 26 ++++-- 2 files changed, 209 insertions(+), 13 deletions(-) diff --git a/builtin/survey.c b/builtin/survey.c index b2104e84d61..504b4edafce 100644 --- a/builtin/survey.c +++ b/builtin/survey.c @@ -1,12 +1,19 @@ #include "builtin.h" #include "config.h" +#include "environment.h" +#include "hex.h" #include "object.h" +#include "object-name.h" #include "object-store-ll.h" #include "parse-options.h" +#include "path-walk.h" #include "progress.h" #include "ref-filter.h" +#include "refs.h" +#include "revision.h" #include "strbuf.h" #include "strvec.h" +#include "tag.h" #include "trace2.h" static const char * const survey_usage[] = { @@ -50,12 +57,20 @@ struct survey_report_ref_summary { size_t unknown_nr; }; +struct survey_report_object_summary { + size_t commits_nr; + size_t tags_nr; + size_t trees_nr; + size_t blobs_nr; +}; + /** * This struct contains all of the information that needs to be printed * at the end of the exploration of the repository and its references. 
*/ struct survey_report { struct survey_report_ref_summary refs; + struct survey_report_object_summary reachable_objects; }; struct survey_context { @@ -78,10 +93,12 @@ struct survey_context { size_t progress_total; struct strvec refs; + struct ref_array ref_array; }; static void clear_survey_context(struct survey_context *ctx) { + ref_array_clear(&ctx->ref_array); strvec_clear(&ctx->refs); } @@ -125,10 +142,12 @@ static void print_table_title(const char *name, size_t *widths, size_t nr) { static struct strbuf lines = STRBUF_INIT; size_t width = 0; + size_t min_width; strbuf_setlen(&lines, 0); - strbuf_addch(&lines, ' '); + strbuf_addch(&lines, '\n'); strbuf_addstr(&lines, name); + min_width = lines.len - 1; strbuf_addch(&lines, '\n'); for (size_t i = 0; i < nr; i++) { @@ -136,6 +155,10 @@ static void print_table_title(const char *name, size_t *widths, size_t nr) width += 3; width += widths[i]; } + + if (width < min_width) + width = min_width; + strbuf_addchars(&lines, '=', width); printf("%s\n", lines.buf); } @@ -228,11 +251,43 @@ static void survey_report_plaintext_refs(struct survey_context *ctx) clear_table(&table); } +static void survey_report_plaintext_reachable_object_summary(struct survey_context *ctx) +{ + struct survey_report_object_summary *objs = &ctx->report.reachable_objects; + struct survey_table table = SURVEY_TABLE_INIT; + char *fmt; + + table.table_name = _("REACHABLE OBJECT SUMMARY"); + + strvec_push(&table.header, _("Object Type")); + strvec_push(&table.header, _("Count")); + + fmt = xstrfmt("%"PRIuMAX"", objs->tags_nr); + insert_table_rowv(&table, _("Tags"), fmt, NULL); + free(fmt); + + fmt = xstrfmt("%"PRIuMAX"", objs->commits_nr); + insert_table_rowv(&table, _("Commits"), fmt, NULL); + free(fmt); + + fmt = xstrfmt("%"PRIuMAX"", objs->trees_nr); + insert_table_rowv(&table, _("Trees"), fmt, NULL); + free(fmt); + + fmt = xstrfmt("%"PRIuMAX"", objs->blobs_nr); + insert_table_rowv(&table, _("Blobs"), fmt, NULL); + free(fmt); + + 
print_table_plaintext(&table); + clear_table(&table); +} + static void survey_report_plaintext(struct survey_context *ctx) { printf("GIT SURVEY for \"%s\"\n", ctx->repo->worktree); printf("-----------------------------------------------------\n"); survey_report_plaintext_refs(ctx); + survey_report_plaintext_reachable_object_summary(ctx); } static void survey_report_json(struct survey_context *ctx) @@ -384,15 +439,13 @@ static void do_load_refs(struct survey_context *ctx, */ static void survey_phase_refs(struct survey_context *ctx) { - struct ref_array ref_array = { 0 }; - trace2_region_enter("survey", "phase/refs", ctx->repo); - do_load_refs(ctx, &ref_array); + do_load_refs(ctx, &ctx->ref_array); - ctx->report.refs.refs_nr = ref_array.nr; - for (size_t i = 0; i < ref_array.nr; i++) { + ctx->report.refs.refs_nr = ctx->ref_array.nr; + for (size_t i = 0; i < ctx->ref_array.nr; i++) { size_t size; - struct ref_array_item *item = ref_array.items[i]; + struct ref_array_item *item = ctx->ref_array.items[i]; switch (item->kind) { case FILTER_REFS_TAGS: @@ -422,8 +475,133 @@ static void survey_phase_refs(struct survey_context *ctx) } trace2_region_leave("survey", "phase/refs", ctx->repo); +} + +static void increment_object_counts( + struct survey_report_object_summary *summary, + enum object_type type, + size_t nr) +{ + switch (type) { + case OBJ_COMMIT: + summary->commits_nr += nr; + break; + + case OBJ_TREE: + summary->trees_nr += nr; + break; + + case OBJ_BLOB: + summary->blobs_nr += nr; + break; + + default: + break; + } +} + +static int survey_objects_path_walk_fn(const char *path, + struct oid_array *oids, + enum object_type type, + void *data) +{ + struct survey_context *ctx = data; + + increment_object_counts(&ctx->report.reachable_objects, + type, oids->nr); + + return 0; +} + +static int iterate_tag_chain(struct survey_context *ctx, + struct object_id *oid, + struct object_id *peeled) +{ + struct object *o = lookup_unknown_object(ctx->repo, oid); + struct tag *t; 
+ + if (o->type != OBJ_TAG) { + oidcpy(peeled, &o->oid); + return o->type != OBJ_COMMIT; + } + + t = lookup_tag(ctx->repo, oid); + while (t) { + parse_tag(t); + ctx->report.reachable_objects.tags_nr++; + + if (!t->tagged) + break; + + o = lookup_unknown_object(ctx->repo, &t->tagged->oid); + if (o && o->type == OBJ_TAG) + t = lookup_tag(ctx->repo, &t->tagged->oid); + else + break; + } + + if (!t || !t->tagged) + return -1; - ref_array_clear(&ref_array); + oidcpy(peeled, &t->tagged->oid); + o = lookup_unknown_object(ctx->repo, peeled); + if (o && o->type == OBJ_COMMIT) + return 0; + return -1; +} + +static void survey_phase_objects(struct survey_context *ctx) +{ + struct rev_info revs = REV_INFO_INIT; + struct path_walk_info info = PATH_WALK_INFO_INIT; + unsigned int add_flags = 0; + + trace2_region_enter("survey", "phase/objects", ctx->repo); + + info.revs = &revs; + info.path_fn = survey_objects_path_walk_fn; + info.path_fn_data = ctx; + + info.commits = 1; + info.trees = 1; + info.blobs = 1; + info.tags = 1; + + repo_init_revisions(ctx->repo, &revs, ""); + + for (size_t i = 0; i < ctx->ref_array.nr; i++) { + struct ref_array_item *item = ctx->ref_array.items[i]; + struct object_id peeled; + + switch (item->kind) { + case FILTER_REFS_TAGS: + if (!iterate_tag_chain(ctx, &item->objectname, &peeled)) + add_pending_oid(&revs, NULL, &peeled, add_flags); + break; + case FILTER_REFS_BRANCHES: + add_pending_oid(&revs, NULL, &item->objectname, add_flags); + break; + case FILTER_REFS_REMOTES: + add_pending_oid(&revs, NULL, &item->objectname, add_flags); + break; + case FILTER_REFS_OTHERS: + /* + * This may be a note, stash, or custom namespace branch. 
+ */ + add_pending_oid(&revs, NULL, &item->objectname, add_flags); + break; + case FILTER_REFS_DETACHED_HEAD: + add_pending_oid(&revs, NULL, &item->objectname, add_flags); + break; + default: + break; + } + } + + walk_objects_by_path(&info); + + release_revisions(&revs); + trace2_region_leave("survey", "phase/objects", ctx->repo); } int cmd_survey(int argc, const char **argv, const char *prefix) @@ -474,6 +652,8 @@ int cmd_survey(int argc, const char **argv, const char *prefix) survey_phase_refs(&ctx); + survey_phase_objects(&ctx); + switch (ctx.opts.format) { case SURVEY_PLAINTEXT: survey_report_plaintext(&ctx); diff --git a/t/t8100-git-survey.sh b/t/t8100-git-survey.sh index a57f6ca7a59..0da92eafa95 100755 --- a/t/t8100-git-survey.sh +++ b/t/t8100-git-survey.sh @@ -16,24 +16,40 @@ test_expect_success 'git survey -h shows experimental warning' ' ' test_expect_success 'creat a semi-interesting repo' ' - test_commit_bulk 10 + test_commit_bulk 10 && + git tag -a -m one one HEAD~5 && + git tag -a -m two two HEAD~3 && + git tag -a -m three three two && + git tag -a -m four four three && + git update-ref -d refs/tags/three && + git update-ref -d refs/tags/two ' test_expect_success 'git survey (default)' ' - git survey >out 2>err && + git survey --all-refs >out 2>err && test_line_count = 0 err && cat >expect <<-EOF && GIT SURVEY for "$(pwd)" ----------------------------------------------------- - REFERENCES SUMMARY + + REFERENCES SUMMARY ======================== Ref Type | Count -----------------+------ Branches | 1 Remote refs | 0 - Tags (all) | 0 - Tags (annotated) | 0 + Tags (all) | 2 + Tags (annotated) | 2 + + REACHABLE OBJECT SUMMARY + ======================== + Object Type | Count + ------------+------ + Tags | 0 + Commits | 10 + Trees | 10 + Blobs | 10 EOF test_cmp expect out From patchwork Tue Sep 10 02:28:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 
13797746 Received: from mail-ed1-f47.google.com (mail-ed1-f47.google.com [209.85.208.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1E97E14D280 for ; Tue, 10 Sep 2024 02:29:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935355; cv=none; b=LmqR/nTqstq8ZfzitohSTV6BXzvIRICzOOR1hoJt2iieKsTiDkiqphBSAXEDHO4gun8BEUJ4BekFVkCGXN9dUDKr7P/x8VJGsb0NFt1kbFm2c9xhLQ5W6acYXFSwzdFnkC2GTvcwpuzFhQp6MHLcrRPSOAV9QtfDZ+LP7ruiY1E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935355; c=relaxed/simple; bh=1PA5ohjmN/AsAyqQAApcY2mIpLTbT3btqcaE+eSbgos=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=MlG1WpSCaY1J2IU2xhrcnc4Qh4pu3F344WLJGFPa310ppUX8QtXWODvshomqSQRfVWWXHaytlO9zQl8EloXV5b3XMNVTIzrLFJ4SwpEjUdpjmiyncE5g0uYmZ9w5t82kwYCQp4QmPy1JtvbvcYwAnyX17gZXcVVupto9yHZGFJU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=hDoWHr11; arc=none smtp.client-ip=209.85.208.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="hDoWHr11" Received: by mail-ed1-f47.google.com with SMTP id 4fb4d7f45d1cf-5c263118780so286002a12.2 for ; Mon, 09 Sep 2024 19:29:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935352; x=1726540152; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from 
:references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=uVwOpzShaFdLEHIKShR7it9pXSaygDKlYMAMokAqElI=; b=hDoWHr11cvuvoS5m3UzA3MEdacFednK3YOClHUkiiLCajS1c0AxKkEp2NFrgu+9Ldw EH/kOvNDs4x9ev6MLZw0p6p6l3b2XBuibQvizTj8JUMDqqz3rA+5242LAFS5bqokZCxY 7hIie+hkmyzUXsMXudBwH4jUg+4aG+SBlJuxx+AFhaszGt1ghgD318SHjow89fEgxze3 39oicR/Qe2a4EAH2aGHUAWYp38nNU63yIXd7DtHP0og2eiFVo0qbbgMlhJhfeUnR0Hdg URIRSm3PleAfFc1uXmwbSeiIP1u1zUivx7wWlO73uPOjyQ3t3hb2rX7MAwyOf7JRtua1 Lobw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935352; x=1726540152; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uVwOpzShaFdLEHIKShR7it9pXSaygDKlYMAMokAqElI=; b=bLurQZfVypD9IXF0vEuagwlfYwq7Wf74naazvo8UTI65XTXsiCd3dv3JV9iHOVmhj0 6uQb6QkSLABUfT5eKNHwvxQAdlf88jujNh9q0Vf+JyBQTaQoxoFfkFmArGS1NjrMKOmU yCFQ7yc0b0Kx91nPxQ8vi5ks4FtzuFIKSrpTxPZng94MWg9xfk3MLIBFU1xuc6cGlBCi KRxmdsSn3G5qonaSMsFfMW2rV2tq3EWNpUvXIDsEAdOgGZGWB0egUGMZ9U6wLZssNhdb Jnh1cis0w2Rw2/3Ayx44C84ydDdXRVhlYH1DBe730qwB+M+YomZOs4gh1nmIBXu4wXql omsQ== X-Gm-Message-State: AOJu0YwBoFZyU4kKT+14CqjJf2J8ypJO3ZCKb30jz7PNeMGjUW3HW7hs ZmwuyHNZs+LEuvMJtYeT9+sFhW8irLxTTO60zz7Q0gTC38hQHbnCQpVzqw== X-Google-Smtp-Source: AGHT+IFqrDAhsf1LFMza8+lvewKeR8I/PmFzN5Al4sOe6u2BlGMqE/qH5qO42zvirTf60Mg41T3OgQ== X-Received: by 2002:a17:907:3f91:b0:a8d:14e4:f94a with SMTP id a640c23a62f3a-a8d14e4fbf1mr833468366b.38.1725935352050; Mon, 09 Sep 2024 19:29:12 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25a455besm419621966b.93.2024.09.09.19.29.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:10 -0700 (PDT) Message-Id: <462ca0b80d29218647ffd26c2ae22c359917f00c.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:39 +0000 Subject: [PATCH 14/30] survey: 
summarize total sizes by object type Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee Now that we have explored objects by count, we can expand that a bit more to summarize the on-disk and inflated sizes of those objects. This information helps diagnose both why disk space (and perhaps clone or fetch time) is growing and why certain operations are slow: the inflated size of the objects that must be processed may be very large. Signed-off-by: Derrick Stolee --- builtin/survey.c | 113 ++++++++++++++++++++++++++++++++++++++++++ t/t8100-git-survey.sh | 8 +++ 2 files changed, 121 insertions(+) diff --git a/builtin/survey.c b/builtin/survey.c index 504b4edafce..435c4bd452a 100644 --- a/builtin/survey.c +++ b/builtin/survey.c @@ -64,6 +64,19 @@ struct survey_report_object_summary { size_t blobs_nr; }; +/** + * For some category given by 'label', count the number of objects + * that match that label along with the on-disk size and the size + * after decompressing (both with delta bases and zlib). + */ +struct survey_report_object_size_summary { + char *label; + size_t nr; + size_t disk_size; + size_t inflated_size; + size_t num_missing; +}; + /** * This struct contains all of the information that needs to be printed * at the end of the exploration of the repository and its references. 
@@ -71,8 +84,15 @@ struct survey_report_object_summary { struct survey_report { struct survey_report_ref_summary refs; struct survey_report_object_summary reachable_objects; + + struct survey_report_object_size_summary *by_type; }; +#define REPORT_TYPE_COMMIT 0 +#define REPORT_TYPE_TREE 1 +#define REPORT_TYPE_BLOB 2 +#define REPORT_TYPE_COUNT 3 + struct survey_context { /* Options that control what is done. */ struct survey_opts opts; @@ -282,12 +302,41 @@ static void survey_report_plaintext_reachable_object_summary(struct survey_conte clear_table(&table); } +static void survey_report_object_sizes(const char *title, + const char *categories, + struct survey_report_object_size_summary *summary, + size_t summary_nr) +{ + struct survey_table table = SURVEY_TABLE_INIT; + table.table_name = title; + + strvec_push(&table.header, xstrdup(categories)); + strvec_push(&table.header, xstrdup(_("Count"))); + strvec_push(&table.header, xstrdup(_("Disk Size"))); + strvec_push(&table.header, xstrdup(_("Inflated Size"))); + + for (size_t i = 0; i < summary_nr; i++) { + insert_table_rowv(&table, xstrdup(summary[i].label), + xstrfmt("%"PRIuMAX, summary[i].nr), + xstrfmt("%"PRIuMAX, summary[i].disk_size), + xstrfmt("%"PRIuMAX, summary[i].inflated_size), + NULL); + } + + print_table_plaintext(&table); + clear_table(&table); +} + static void survey_report_plaintext(struct survey_context *ctx) { printf("GIT SURVEY for \"%s\"\n", ctx->repo->worktree); printf("-----------------------------------------------------\n"); survey_report_plaintext_refs(ctx); survey_report_plaintext_reachable_object_summary(ctx); + survey_report_object_sizes(_("TOTAL OBJECT SIZES BY TYPE"), + _("Object Type"), + ctx->report.by_type, + REPORT_TYPE_COUNT); } static void survey_report_json(struct survey_context *ctx) @@ -500,6 +549,64 @@ static void increment_object_counts( } } +static void increment_totals(struct survey_context *ctx, + struct oid_array *oids, + struct survey_report_object_size_summary *summary) +{ 
+ for (size_t i = 0; i < oids->nr; i++) { + struct object_info oi = OBJECT_INFO_INIT; + unsigned oi_flags = OBJECT_INFO_FOR_PREFETCH; + unsigned long object_length = 0; + off_t disk_sizep = 0; + enum object_type type; + + oi.typep = &type; + oi.sizep = &object_length; + oi.disk_sizep = &disk_sizep; + + if (oid_object_info_extended(ctx->repo, &oids->oid[i], + &oi, oi_flags) < 0) { + summary->num_missing++; + } else { + summary->nr++; + summary->disk_size += disk_sizep; + summary->inflated_size += object_length; + } + } +} + +static void increment_object_totals(struct survey_context *ctx, + struct oid_array *oids, + enum object_type type) +{ + struct survey_report_object_size_summary *total; + struct survey_report_object_size_summary summary = { 0 }; + + increment_totals(ctx, oids, &summary); + + switch (type) { + case OBJ_COMMIT: + total = &ctx->report.by_type[REPORT_TYPE_COMMIT]; + break; + + case OBJ_TREE: + total = &ctx->report.by_type[REPORT_TYPE_TREE]; + break; + + case OBJ_BLOB: + total = &ctx->report.by_type[REPORT_TYPE_BLOB]; + break; + + default: + BUG("No other type allowed"); + } + + total->nr += summary.nr; + total->disk_size += summary.disk_size; + total->inflated_size += summary.inflated_size; + total->num_missing += summary.num_missing; +} + static int survey_objects_path_walk_fn(const char *path, struct oid_array *oids, enum object_type type, @@ -509,6 +616,7 @@ static int survey_objects_path_walk_fn(const char *path, increment_object_counts(&ctx->report.reachable_objects, type, oids->nr); + increment_object_totals(ctx, oids, type); return 0; } @@ -567,6 +675,11 @@ static void survey_phase_objects(struct survey_context *ctx) info.blobs = 1; info.tags = 1; + CALLOC_ARRAY(ctx->report.by_type, REPORT_TYPE_COUNT); + ctx->report.by_type[REPORT_TYPE_COMMIT].label = xstrdup(_("Commits")); + ctx->report.by_type[REPORT_TYPE_TREE].label = xstrdup(_("Trees")); + ctx->report.by_type[REPORT_TYPE_BLOB].label = xstrdup(_("Blobs")); + repo_init_revisions(ctx->repo, 
&revs, ""); for (size_t i = 0; i < ctx->ref_array.nr; i++) { diff --git a/t/t8100-git-survey.sh b/t/t8100-git-survey.sh index 0da92eafa95..f8af9601214 100755 --- a/t/t8100-git-survey.sh +++ b/t/t8100-git-survey.sh @@ -50,6 +50,14 @@ test_expect_success 'git survey (default)' ' Commits | 10 Trees | 10 Blobs | 10 + + TOTAL OBJECT SIZES BY TYPE + =============================================== + Object Type | Count | Disk Size | Inflated Size + ------------+-------+-----------+-------------- + Commits | 10 | 1523 | 2153 + Trees | 10 | 495 | 1706 + Blobs | 10 | 191 | 101 EOF test_cmp expect out From patchwork Tue Sep 10 02:28:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797747 Received: from mail-wr1-f53.google.com (mail-wr1-f53.google.com [209.85.221.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 36F7817DFE8 for ; Tue, 10 Sep 2024 02:29:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.53 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935356; cv=none; b=renFHSD3NjoLAeRBG6Tp8ewYtOOBNuHSsipb7uFF0squtrRLsTXnVfsglfr8SN+MvLuZAT7j9wB1ZekBnqutUwUjtVKrAulKIxt61pxACqfqYibOYGTh/PldQ/fLNnty6gc8SAtOo3I4Iv85L8RF41+F7snHmhjlEK71BXvalGE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935356; c=relaxed/simple; bh=0xfSBK+gtVzJWzzFWtcbF7uOaXmJ7Zy/DKlx7MkylJI=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=R1qTGFNwFhfd81HOsSCIwL0m+oGA3CSQ4FQOtmyHxAyLsOJk54CWw8Z/H2dtTYB69FcnaWFkcN9/ejyN62WmARBqmkVfcgMbxyUc0PzS8jL/AsUqkkYFUb+HAgZ09tspRpgJDw1gDFrv/aCPwxYX5DyzwIEscikTW3HDKaPhzaw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass 
smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gW8jz9jl; arc=none smtp.client-ip=209.85.221.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gW8jz9jl" Received: by mail-wr1-f53.google.com with SMTP id ffacd0b85a97d-374bd0da617so147523f8f.3 for ; Mon, 09 Sep 2024 19:29:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935353; x=1726540153; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=9lB37MuKhQj/8cviwldin0IXWfEt5gYeFdN+KFF+dEE=; b=gW8jz9jljczjGIinZA90PO10l7d/JvV7okckGrqYjB8SagidifU8c4mIEt94xzdExE pJ4bCjwc5GV3RVD4i5WUr9yOw8Ffq6Z2DOkqTC9HIZpOd32bDc42SRD8hSufawwAXzsg 3tVNXOIXWbeBO+Kpz06M5Z7kQ/a92H1TTq1IdeBjmoVrJRcFQ5lI1cJ7gaARwHpTU8vd 2qRFiFxvGzHbuAs5Ea1ZLGYUR9LmFzMxzrZwWOB+CUof9MZ6pqUNynW6YtorOqU1jEJu f3BGGOFfHWRRzByoNCPTe0vwSMdP6ibNmV4WcacGjk/dK8VG8Cofhjwd8f9TrA9Wl1vB 39tQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935353; x=1726540153; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9lB37MuKhQj/8cviwldin0IXWfEt5gYeFdN+KFF+dEE=; b=P25I61DfPSjZEi+Mpl+pfGH8qrN0sa0PHZe6++Z+8byR+RKWtCdNsJNxNtF5r+kik6 t28y4InjgZ+4B+YeoCXX1JTGis+5ladfiVibK9CjwCOoKCmbfx3LZf6y9ySE/5woqTEf zS/s864qRODwT4n4HTYFYl5U1l3Tc+mqegE5ytUtWDUA+eVJpEauFc/h0IRwsB9e7iqB FmKKbxDHFPv5BhZSGvzJg5VmNk6NbOANCzqh9lMurn6XOXRrKNrfKuyHsrBXKEoZiiFN TW9FHnyQ4vDwtlaWk8NxgD4a9ggFWxhtNP9UCp0kH+SAxQcMkPb9RinovmxJ/FeBGv0x ovsg== X-Gm-Message-State: 
AOJu0YyiY4FdYJ41gEFKV/C71OqxsyqlRQG7+JO6JjO/bjC+Si5A+q4d YlClWJF33hROUHr39tbx1rKDMWvJsxuZeNmKuulbvEYjLK+nUWTurptviA== X-Google-Smtp-Source: AGHT+IE4yOwhcRMxGzyuLz1YURtR6C7n5LgqPIp9Yhpee0FtKqGRcG6j060IqCLyh2rFJVH/ziqi1Q== X-Received: by 2002:a5d:54c2:0:b0:374:bb00:31eb with SMTP id ffacd0b85a97d-378895c5c5amr7048628f8f.6.1725935353199; Mon, 09 Sep 2024 19:29:13 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25c72e56sm411892966b.124.2024.09.09.19.29.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:12 -0700 (PDT) Message-Id: <9c54c14435742927a66487df2862204aca8e6fc7.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:40 +0000 Subject: [PATCH 15/30] survey: show progress during object walk Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee Signed-off-by: Derrick Stolee --- builtin/survey.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/builtin/survey.c b/builtin/survey.c index 435c4bd452a..baaaf8a6374 100644 --- a/builtin/survey.c +++ b/builtin/survey.c @@ -618,6 +618,9 @@ static int survey_objects_path_walk_fn(const char *path, type, oids->nr); increment_object_totals(ctx, oids, type); + ctx->progress_nr += oids->nr; + display_progress(ctx->progress, ctx->progress_nr); + return 0; } @@ -682,6 +685,11 @@ static void survey_phase_objects(struct survey_context *ctx) repo_init_revisions(ctx->repo, &revs, ""); + ctx->progress_nr = 0; + ctx->progress_total = ctx->ref_array.nr; + if (ctx->opts.show_progress) + ctx->progress = start_progress(_("Preparing object walk"), + ctx->progress_total); for (size_t i = 0; i < 
ctx->ref_array.nr; i++) { struct ref_array_item *item = ctx->ref_array.items[i]; struct object_id peeled; @@ -709,9 +717,17 @@ static void survey_phase_objects(struct survey_context *ctx) default: break; } + + display_progress(ctx->progress, ++(ctx->progress_nr)); } + stop_progress(&ctx->progress); + ctx->progress_nr = 0; + ctx->progress_total = 0; + if (ctx->opts.show_progress) + ctx->progress = start_progress(_("Walking objects"), 0); walk_objects_by_path(&info); + stop_progress(&ctx->progress); release_revisions(&revs); trace2_region_leave("survey", "phase/objects", ctx->repo); From patchwork Tue Sep 10 02:28:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797748 Received: from mail-lj1-f181.google.com (mail-lj1-f181.google.com [209.85.208.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ABDB717E44A for ; Tue, 10 Sep 2024 02:29:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935358; cv=none; b=QkFquUC8J8HAJref+oTiX8MY1OAdADeJp/6aThGc3GcGve3/PyBETobcMhC7U/h/MxT1mOPkOrMWjLaJ7aotTUfhdk2aTl1akKCr5725rUUMkmNebAutmLSucU7GV5dkY5ZEB9WVy7wez8zVjTN5F550xybQUxNsCbZDZx82kD0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935358; c=relaxed/simple; bh=oxreYrDp9BdYmvGLbyhFl1o7XFz1FkY+tTYhhFHIrIE=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=rZfBN7Yz/+YOJ3eXws9v/Nar6Awrk1a6P5ttyCUSoFzIETxvIN8urN3iyszfvoWvfHutN1owuiWCo6ZlVZXJI8M9pssweLMkdDUGTDrGCWKEYMJANAlbyhWbp1V/znTQUQfy3TxTiFmUokHwGVsE7Wer5mfCnF8NSbaYNZdcJMQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass 
smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=U3wiWlmE; arc=none smtp.client-ip=209.85.208.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="U3wiWlmE" Received: by mail-lj1-f181.google.com with SMTP id 38308e7fff4ca-2f74e613a10so38460111fa.1 for ; Mon, 09 Sep 2024 19:29:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935354; x=1726540154; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=D90cTLe7P0X0gWvYLCHbiUHT89yrSeqOaitywQgyQyA=; b=U3wiWlmEu3Ve+S/WijStp/NJshBDrtX63tfEOP/IDyLivtbybovCMzexaPhK986Tee HLj0bxZNQraPbYMfB8E7uffk9poCX2FnSxYP8hjWueXSRTPrusmsaSwn3LbB62PWQBOM pXW175kIceXLNC9tcqYZMCRF8dKw4+vNukiequWwpMygdkejZAzzXh1AwWlUcQ8heVPL KxM7nReyhYp3odsHRLNTaLPkBON7CtZ1ODy210ViCpY5AoIK+mj5MWNI80z8Dg1JQw+m mgi20KQH5QjLe40+2c99fq6jKdSFGIu5dLYcTX+XQzKr1iHHS9YajNZ3fQM0cAQkGW5u 4TQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935354; x=1726540154; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=D90cTLe7P0X0gWvYLCHbiUHT89yrSeqOaitywQgyQyA=; b=q52CbI1magB+XPIJlhQFkqo1HMs+4jgA1wxttTcscQPwFBb2pRe+nDIEcdR/SB5kfI XkTS0xU6CeSZsb2Ew9FcbLh8wh6IF2HaTrYCVv3gwZMhHV90zAB4j0ZheDUy7gT6erK1 v4OPGLmiYxnD+JFNxI8DHycdyEQcn77+vRgG78UYA7LCOduf607j4PjkrUFZJlxYyMDV h161AdTSQKvW35zy3CJ33XlFmT1v05fDCV8nBLzpGKovdnbpxtsW1wnUL2kTeijPkLSF ySWV0uEzZ5tjrLafHjulIgxxdq3DPA+uvb3ZrTdqRv6i5BbCXK3Bzc42h29sB+syZZsU Hv/g== X-Gm-Message-State: 
AOJu0Yw55atus4SHt595Q8SkykdXmIIomMINOZxa4C+Ixp0bdAwea3L/ fu5jZlEkTE4UVdN9XXrT+OcHI3e7uMqPr2L0l5tXsf+fRVJiaIM/Tx23Aw== X-Google-Smtp-Source: AGHT+IEpv+x0QqQGxKt5p5y0QhN67uOi/cHgvGLY4cUJevKAJJ0srggYU/Sks7qdMCekW22iMNUyyQ== X-Received: by 2002:a2e:619:0:b0:2f7:712d:d08 with SMTP id 38308e7fff4ca-2f7749e9fb7mr3244931fa.23.1725935354288; Mon, 09 Sep 2024 19:29:14 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d258317d6sm414148566b.29.2024.09.09.19.29.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:13 -0700 (PDT) Message-Id: <3504abb269b5229d4aaff4db9f4d694d925ac1b2.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:41 +0000 Subject: [PATCH 16/30] survey: add ability to track prioritized lists Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee In future changes, we will make use of these methods. The intention is to keep track of the top contributors according to some metric. We don't want to store all of the entries and do a sort at the end, so track a constant-size table and remove rows that get pushed out depending on the chosen sorting algorithm. 
Co-authored-by: Jeff Hostetler Signed-off-by: Jeff Hostetler Signed-off-by: Derrick Stolee --- builtin/survey.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) diff --git a/builtin/survey.c b/builtin/survey.c index baaaf8a6374..ad467e9a88c 100644 --- a/builtin/survey.c +++ b/builtin/survey.c @@ -77,6 +77,102 @@ struct survey_report_object_size_summary { size_t num_missing; }; +typedef int (*survey_top_size_cmp)(struct survey_report_object_size_summary *s1, + struct survey_report_object_size_summary *s2); + +MAYBE_UNUSED +static int cmp_by_nr(struct survey_report_object_size_summary *s1, + struct survey_report_object_size_summary *s2) +{ + if (s1->nr < s2->nr) + return -1; + if (s1->nr > s2->nr) + return 1; + return 0; +} + +MAYBE_UNUSED +static int cmp_by_disk_size(struct survey_report_object_size_summary *s1, + struct survey_report_object_size_summary *s2) +{ + if (s1->disk_size < s2->disk_size) + return -1; + if (s1->disk_size > s2->disk_size) + return 1; + return 0; +} + +MAYBE_UNUSED +static int cmp_by_inflated_size(struct survey_report_object_size_summary *s1, + struct survey_report_object_size_summary *s2) +{ + if (s1->inflated_size < s2->inflated_size) + return -1; + if (s1->inflated_size > s2->inflated_size) + return 1; + return 0; +} + +/** + * Store a list of "top" categories by some sorting function. When + * inserting a new category, reorder the list and free the one that + * got ejected (if any). 
+ */ +struct survey_report_top_sizes { + const char *name; + survey_top_size_cmp cmp_fn; + struct survey_report_object_size_summary *data; + size_t nr; + size_t alloc; +}; + +MAYBE_UNUSED +static void init_top_sizes(struct survey_report_top_sizes *top, + size_t limit, const char *name, + survey_top_size_cmp cmp) +{ + top->name = name; + top->alloc = limit; + top->nr = 0; + CALLOC_ARRAY(top->data, limit); + top->cmp_fn = cmp; +} + +MAYBE_UNUSED +static void clear_top_sizes(struct survey_report_top_sizes *top) +{ + for (size_t i = 0; i < top->nr; i++) + free(top->data[i].label); + free(top->data); +} + +MAYBE_UNUSED +static void maybe_insert_into_top_size(struct survey_report_top_sizes *top, + struct survey_report_object_size_summary *summary) +{ + size_t pos = top->nr; + + /* Compare against list from the bottom. */ + while (pos > 0 && top->cmp_fn(&top->data[pos - 1], summary) < 0) + pos--; + + /* Not big enough! */ + if (pos >= top->alloc) + return; + + /* We need to shift the data. */ + if (top->nr == top->alloc) + free(top->data[top->nr - 1].label); + else + top->nr++; + + for (size_t i = top->nr - 1; i > pos; i--) + memcpy(&top->data[i], &top->data[i - 1], sizeof(*top->data)); + + memcpy(&top->data[pos], summary, sizeof(*summary)); + top->data[pos].label = xstrdup(summary->label); +} + /** * This struct contains all of the information that needs to be printed * at the end of the exploration of the repository and its references. 
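For readers following the series, the bounded "top N" list described in the commit message above can be sketched outside of Git roughly as follows. This is a simplified illustration under stated assumptions, not the code from the patch: top_entry, top_list, dup_label and the top_list_*() helpers are hypothetical names, a single size_t metric stands in for the comparison callback, and plain malloc()/calloc()/abort() replace Git's allocation wrappers. The sketch shifts entries with memmove() because the source and destination ranges overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not part of builtin/survey.c. */
struct top_entry {
	char *label;  /* owned copy of the category name */
	size_t value; /* the metric the list is ranked by */
};

struct top_list {
	struct top_entry *data; /* kept sorted, largest value first */
	size_t nr;              /* entries currently stored */
	size_t alloc;           /* fixed capacity (the "N" in top N) */
};

/* Duplicate a string, aborting on allocation failure. */
static char *dup_label(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

static void top_list_init(struct top_list *top, size_t limit)
{
	assert(limit > 0);
	top->data = calloc(limit, sizeof(*top->data));
	if (!top->data)
		abort();
	top->nr = 0;
	top->alloc = limit;
}

static void top_list_clear(struct top_list *top)
{
	for (size_t i = 0; i < top->nr; i++)
		free(top->data[i].label);
	free(top->data);
	top->data = NULL;
	top->nr = top->alloc = 0;
}

static void top_list_insert(struct top_list *top, const char *label,
			    size_t value)
{
	size_t pos = top->nr;

	/* Walk up from the bottom past every entry smaller than the new one. */
	while (pos > 0 && top->data[pos - 1].value < value)
		pos--;

	/* The list is full and the new entry ranks below all of it. */
	if (pos >= top->alloc)
		return;

	if (top->nr == top->alloc)
		free(top->data[top->nr - 1].label); /* evict the smallest */
	else
		top->nr++;

	/* Open a gap at 'pos'; the ranges overlap, hence memmove(). */
	memmove(&top->data[pos + 1], &top->data[pos],
		(top->nr - 1 - pos) * sizeof(*top->data));

	top->data[pos].label = dup_label(label);
	top->data[pos].value = value;
}
```

With a limit of 3, inserting (a, 5), (b, 10), (c, 1), (d, 7) and (e, 2) in that order leaves b, d, a in the list: c is evicted when d arrives, and e never ranks high enough to enter. Memory stays constant in the limit, at the cost of an O(N) shift per accepted insert.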
From patchwork Tue Sep 10 02:28:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797749 Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 253E517F4F7 for ; Tue, 10 Sep 2024 02:29:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935358; cv=none; b=PEjoqEZBnZ+B2EGSRyFqzD0lyg55ip19yBY6LXZV1yo4Ew60C4d7JxQ+Qgwquh98sf+fOYxT9WeKCSZiEu/U3WJUGr2NUZC1uqsLFKcmLaUJnH6frHKyXPNHGq7smz22AwZZq8pGjuIKpIPIXVewT+hk9gKALpEipiAYP4YC1h0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935358; c=relaxed/simple; bh=rKvKw3coFFkk+x6m6KQn3G/JMWJJ3SgCHNrLO/laWcQ=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=T0E1haJgWQroq+gNAOKkxellPHZqoynQqblnrmrVcqDs7+YusMuYOCOH4gDY/dzQLU2IxWbTEpUq9tjJ0zq3yVXkqASgTqy3hTA0lOUHue2imTgRYdt6Gs327cNVdSIXi12o4yT71I6eoOdbbop5ctTXvOH3UGjxR+ijT64n/r4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=GTxra8DM; arc=none smtp.client-ip=209.85.218.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="GTxra8DM" Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-a8d64b27c45so175555166b.3 for ; Mon, 09 Sep 2024 19:29:16 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935355; x=1726540155; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=E3DGRIZ4uknVZo5b3VkXakDV8jAIl0BZPL96PynTuv0=; b=GTxra8DMj+4y+L9prZfOv71ZpCT9PuVXvZASgmpimBN+2ljbGN9XaSCkeS62zHXzli FwkApgR6/4tJAKm0S2Kcfh/vGEt5wrKuPhW80YQiCPNTJLt7jucFJ64z/kgvSZwUmlag l5/QRgggLR9FQW7JXPgNG4mY78QZ+PHVB9Lvc55osm3noBUv48DFh3vNSZWuMcj8kvWW 34TmwExqPajowTGZLn389FG+R0shhuK5zNSiYZdrEM13F8C3yyNJarhYPd8MScLD+xHO MUXuftO7oZ8lVWNLhAMgk5n4Psb4XcDA/wAUQN0FbfWwBeWu41tKHyxJPD3pvxvUHM7v uw+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935355; x=1726540155; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=E3DGRIZ4uknVZo5b3VkXakDV8jAIl0BZPL96PynTuv0=; b=M2e1PVQiT1g7DbFWm0rFR7r97UvxmsfqMw1mUoxGniJG4KAfZAnJ1svLpfQav7XAV5 9dodPrH7Ia0P3HM1cp7A8/8fLyhLvDIdS+VOVOusWJeNJ12bp9uwaHS3BfKT9YoEU7Hp u+3viIHR6eCRNVCNCksVRO6WoROguxL+AjbMDMw/VNgLn3LMSZTrtyG2MYzBg7Ar8A5z CDXRJrTnsNSIRokERIUiFFjB5EaNsYoEF+AoLKATd3YbVsGlIAIvMyQzlaFeqpXO76B/ 5NjYBIcXRCY2bUPaUeSTBbUpy57xj9yyM7V+4qmHXSZAuBpinYD8HqYdI/Uxgyq84HFI KsrQ== X-Gm-Message-State: AOJu0YzBRXqI5MKJf62+2i1HEhWS2UhVeuXfNn4FADOVEM/RwsdcAc/B 5+aTBbUNovaYaj9wl0M+iiwCTzJ93T8CidAaAUMi23ltKHsMx2mCOVf4iA== X-Google-Smtp-Source: AGHT+IEZ8NjXTGiK2DvC8B76hyuHSk3I2kWPcWoCkji/xn05eZh61iPeqstTJQDsrSWtth3BJu6BRA== X-Received: by 2002:a17:907:934d:b0:a8a:6db5:7e42 with SMTP id a640c23a62f3a-a8a885c0057mr1135869166b.9.1725935355030; Mon, 09 Sep 2024 19:29:15 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25833955sm415520266b.32.2024.09.09.19.29.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:14 -0700 
(PDT) Message-Id: <9e95914d393ff83054ee419b58b9db4d3560a36c.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:42 +0000 Subject: [PATCH 17/30] survey: add report of "largest" paths Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee Since we are already walking our reachable objects using the path-walk API, let's now collect lists of the paths that contribute most to different metrics. Specifically, we care about * Number of versions. * Total size on disk. * Total inflated size (no delta or zlib compression). This information can be critical to discovering which parts of the repository are causing the most growth, especially on-disk size. Different packing strategies might help compress data more efficiently, but the total inflated size is a representation of the raw size of all snapshots of those paths. Even when stored efficiently on disk, that size represents how much information must be processed to complete a command such as 'git blame'. Since the on-disk size is likely to be fragile, stop testing the exact output of 'git survey' and check that the correct set of headers is output. 
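As a rough illustration of the bounded "top N" lists behind these reports, here is a minimal standalone sketch. It is not the code in builtin/survey.c: the names top_list, top_entry, top_list_insert, top_list_clear, and dup_string are hypothetical, the limit is fixed at 3 for readability (the patch uses 100), and each entry ranks a single size_t metric instead of a full summary struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit for this sketch; the patch uses a limit of 100. */
#define TOP_LIMIT 3

struct top_entry {
	char *path;   /* owned copy of the path name */
	size_t value; /* metric being ranked, e.g. inflated size */
};

struct top_list {
	struct top_entry data[TOP_LIMIT]; /* sorted, largest value first */
	size_t nr;
};

/* Portable stand-in for xstrdup() so the sketch builds outside Git. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	assert(copy);
	memcpy(copy, s, len);
	return copy;
}

/*
 * Insert (path, value) if it ranks among the TOP_LIMIT largest values.
 * Returns 1 if the entry was inserted and 0 if it was too small.
 */
static int top_list_insert(struct top_list *top, const char *path, size_t value)
{
	size_t pos = top->nr;

	/* Scan from the smallest entry upward to find the slot. */
	while (pos > 0 && top->data[pos - 1].value < value)
		pos--;

	/* The list is full and every entry is at least as large. */
	if (pos >= TOP_LIMIT)
		return 0;

	/* Either evict the smallest entry or grow the list by one. */
	if (top->nr == TOP_LIMIT)
		free(top->data[top->nr - 1].path);
	else
		top->nr++;

	/* Shift the smaller entries down by one slot. */
	for (size_t i = top->nr - 1; i > pos; i--)
		top->data[i] = top->data[i - 1];

	top->data[pos].path = dup_string(path);
	top->data[pos].value = value;
	return 1;
}

static void top_list_clear(struct top_list *top)
{
	for (size_t i = 0; i < top->nr; i++)
		free(top->data[i].path);
	top->nr = 0;
}
```

Because each insertion costs at most TOP_LIMIT comparisons and moves, keeping one such list per metric and per object type stays cheap even when the walk visits many paths.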
Signed-off-by: Derrick Stolee --- builtin/survey.c | 90 +++++++++++++++++++++++++++++++++++++------ t/t8100-git-survey.sh | 12 +++++- 2 files changed, 90 insertions(+), 12 deletions(-) diff --git a/builtin/survey.c b/builtin/survey.c index ad467e9a88c..90b041967c8 100644 --- a/builtin/survey.c +++ b/builtin/survey.c @@ -80,7 +80,6 @@ struct survey_report_object_size_summary { typedef int (*survey_top_size_cmp)(struct survey_report_object_size_summary *s1, struct survey_report_object_size_summary *s2); -MAYBE_UNUSED static int cmp_by_nr(struct survey_report_object_size_summary *s1, struct survey_report_object_size_summary *s2) { @@ -91,7 +90,6 @@ static int cmp_by_nr(struct survey_report_object_size_summary *s1, return 0; } -MAYBE_UNUSED static int cmp_by_disk_size(struct survey_report_object_size_summary *s1, struct survey_report_object_size_summary *s2) { @@ -102,7 +100,6 @@ static int cmp_by_disk_size(struct survey_report_object_size_summary *s1, return 0; } -MAYBE_UNUSED static int cmp_by_inflated_size(struct survey_report_object_size_summary *s1, struct survey_report_object_size_summary *s2) { @@ -126,7 +123,6 @@ struct survey_report_top_sizes { size_t alloc; }; -MAYBE_UNUSED static void init_top_sizes(struct survey_report_top_sizes *top, size_t limit, const char *name, survey_top_size_cmp cmp) @@ -146,7 +142,6 @@ static void clear_top_sizes(struct survey_report_top_sizes *top) free(top->data); } -MAYBE_UNUSED static void maybe_insert_into_top_size(struct survey_report_top_sizes *top, struct survey_report_object_size_summary *summary) { @@ -182,6 +177,10 @@ struct survey_report { struct survey_report_object_summary reachable_objects; struct survey_report_object_size_summary *by_type; + + struct survey_report_top_sizes *top_paths_by_count; + struct survey_report_top_sizes *top_paths_by_disk; + struct survey_report_top_sizes *top_paths_by_inflate; }; #define REPORT_TYPE_COMMIT 0 @@ -423,6 +422,13 @@ static void survey_report_object_sizes(const char *title, 
clear_table(&table); } +static void survey_report_plaintext_sorted_size( + struct survey_report_top_sizes *top) +{ + survey_report_object_sizes(top->name, _("Path"), + top->data, top->nr); +} + static void survey_report_plaintext(struct survey_context *ctx) { printf("GIT SURVEY for \"%s\"\n", ctx->repo->worktree); @@ -433,6 +439,21 @@ static void survey_report_plaintext(struct survey_context *ctx) _("Object Type"), ctx->report.by_type, REPORT_TYPE_COUNT); + + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_count[REPORT_TYPE_TREE]); + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_count[REPORT_TYPE_BLOB]); + + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_disk[REPORT_TYPE_TREE]); + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_disk[REPORT_TYPE_BLOB]); + + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_inflate[REPORT_TYPE_TREE]); + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_inflate[REPORT_TYPE_BLOB]); } static void survey_report_json(struct survey_context *ctx) @@ -673,7 +694,8 @@ static void increment_totals(struct survey_context *ctx, static void increment_object_totals(struct survey_context *ctx, struct oid_array *oids, - enum object_type type) + enum object_type type, + const char *path) { struct survey_report_object_size_summary *total; struct survey_report_object_size_summary summary = { 0 }; @@ -701,6 +723,27 @@ static void increment_object_totals(struct survey_context *ctx, total->disk_size += summary.disk_size; total->inflated_size += summary.inflated_size; total->num_missing += summary.num_missing; + + if (type == OBJ_TREE || type == OBJ_BLOB) { + int index = type == OBJ_TREE ? + REPORT_TYPE_TREE : REPORT_TYPE_BLOB; + struct survey_report_top_sizes *top; + + /* + * Temporarily store (const char *) here, but it will + * be duped if inserted and will not be freed. 
+ */ + summary.label = (char *)path; + + top = ctx->report.top_paths_by_count; + maybe_insert_into_top_size(&top[index], &summary); + + top = ctx->report.top_paths_by_disk; + maybe_insert_into_top_size(&top[index], &summary); + + top = ctx->report.top_paths_by_inflate; + maybe_insert_into_top_size(&top[index], &summary); + } } static int survey_objects_path_walk_fn(const char *path, @@ -712,7 +755,7 @@ static int survey_objects_path_walk_fn(const char *path, increment_object_counts(&ctx->report.reachable_objects, type, oids->nr); - increment_object_totals(ctx, oids, type); + increment_object_totals(ctx, oids, type, path); ctx->progress_nr += oids->nr; display_progress(ctx->progress, ctx->progress_nr); @@ -757,6 +800,34 @@ static int iterate_tag_chain(struct survey_context *ctx, return -1; } +static void initialize_report(struct survey_context *ctx) +{ + const int top_limit = 100; + + CALLOC_ARRAY(ctx->report.by_type, REPORT_TYPE_COUNT); + ctx->report.by_type[REPORT_TYPE_COMMIT].label = xstrdup(_("Commits")); + ctx->report.by_type[REPORT_TYPE_TREE].label = xstrdup(_("Trees")); + ctx->report.by_type[REPORT_TYPE_BLOB].label = xstrdup(_("Blobs")); + + CALLOC_ARRAY(ctx->report.top_paths_by_count, REPORT_TYPE_COUNT); + init_top_sizes(&ctx->report.top_paths_by_count[REPORT_TYPE_TREE], + top_limit, _("TOP DIRECTORIES BY COUNT"), cmp_by_nr); + init_top_sizes(&ctx->report.top_paths_by_count[REPORT_TYPE_BLOB], + top_limit, _("TOP FILES BY COUNT"), cmp_by_nr); + + CALLOC_ARRAY(ctx->report.top_paths_by_disk, REPORT_TYPE_COUNT); + init_top_sizes(&ctx->report.top_paths_by_disk[REPORT_TYPE_TREE], + top_limit, _("TOP DIRECTORIES BY DISK SIZE"), cmp_by_disk_size); + init_top_sizes(&ctx->report.top_paths_by_disk[REPORT_TYPE_BLOB], + top_limit, _("TOP FILES BY DISK SIZE"), cmp_by_disk_size); + + CALLOC_ARRAY(ctx->report.top_paths_by_inflate, REPORT_TYPE_COUNT); + init_top_sizes(&ctx->report.top_paths_by_inflate[REPORT_TYPE_TREE], + top_limit, _("TOP DIRECTORIES BY INFLATED SIZE"), 
cmp_by_inflated_size); + init_top_sizes(&ctx->report.top_paths_by_inflate[REPORT_TYPE_BLOB], + top_limit, _("TOP FILES BY INFLATED SIZE"), cmp_by_inflated_size); +} + static void survey_phase_objects(struct survey_context *ctx) { struct rev_info revs = REV_INFO_INIT; @@ -774,10 +845,7 @@ static void survey_phase_objects(struct survey_context *ctx) info.blobs = 1; info.tags = 1; - CALLOC_ARRAY(ctx->report.by_type, REPORT_TYPE_COUNT); - ctx->report.by_type[REPORT_TYPE_COMMIT].label = xstrdup(_("Commits")); - ctx->report.by_type[REPORT_TYPE_TREE].label = xstrdup(_("Trees")); - ctx->report.by_type[REPORT_TYPE_BLOB].label = xstrdup(_("Blobs")); + initialize_report(ctx); repo_init_revisions(ctx->repo, &revs, ""); diff --git a/t/t8100-git-survey.sh b/t/t8100-git-survey.sh index f8af9601214..c2dab0033f9 100755 --- a/t/t8100-git-survey.sh +++ b/t/t8100-git-survey.sh @@ -60,7 +60,17 @@ test_expect_success 'git survey (default)' ' Blobs | 10 | 191 | 101 EOF - test_cmp expect out + lines=$(wc -l <expect) && + head -n $lines out >out-trimmed && + test_cmp expect out-trimmed && + + for type in "DIRECTORIES" "FILES" + do + for metric in "COUNT" "DISK SIZE" "INFLATED SIZE" + do + grep "TOP $type BY $metric" out || return 1 + done || return 1 + done ' test_done From patchwork Tue Sep 10 02:28:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797750 Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 00B0818132A for ; Tue, 10 Sep 2024 02:29:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935359; cv=none; 
b=hZnWqSBW5PJPrDaN8pnFHQ/qbb5QuvmNc8JzOyREE9hB0IulKpyezu65I3CoD7BfPbF8yIPd0V63UnM9OCwbO8CbeWbwWeD2d56j1nsj0HeGVjm1y2KJ4VmaT7AeD43sUgoiR74LkZS1PIzPkDOLtOPwnjHoiUqNJp4MyRVltBc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935359; c=relaxed/simple; bh=VvZ16Ozx0DyoLT8whGnDIq5AaFJThd9mhJCzrRsC+nE=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=nePHrNLNjpZ7Gt57eqPFrCCQCkt4gkO9YGXlNaeb29gBHAcryNphR80hp7hEkTXVnP8LGE2ksmOZMfqxxfaZs8QP8HXIKkI6FZDW5TVP/Q8EwSWAHBZalqeWkO8TmsiT8PO8V9afJ7son/fDXK3UEyIbAhcnL3USXhtmr9foIXg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=U73UYKao; arc=none smtp.client-ip=209.85.218.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="U73UYKao" Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-a8d2daa2262so253320466b.1 for ; Mon, 09 Sep 2024 19:29:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935356; x=1726540156; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=5SrrzhqQHt1RgqpPZdritqFwaRXue5QjWL7rwmunmcI=; b=U73UYKao04nCErVjLvETqG8uNDko78+gOsEM35GV4caBM1v1TDllkOJZv3ZabVW1h+ K+9uj1fTAXAWZzUyaurTJJWIy9Ssf6w6Vet+yky4hetV7YkNPGNtJmGnPyJmmDgZYJse 0ZeJdgRHwgksnqDUwtZ89XkyYvLXvPKcVSkhlPPwPGqgNtFj6Y3xAy7FeLcFB61j7MZS saMTWtt8Qh6TFia3r2BqwoyCfNL6NzwWJOvjmGQaOy+aSZnLajBdx9JopoNQHW37yIOQ UAAHpkmmo2/cMAQDQD76s6p+jhUksU4WuYN5Xs1brv6d0/D/Q8wbRuE7YWANLXpN8dcf Lu0A== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935356; x=1726540156; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5SrrzhqQHt1RgqpPZdritqFwaRXue5QjWL7rwmunmcI=; b=P0tTLo2zBQNHeRP0tM6fZ17SGCYxRrYfM0Lg3M6gitQOK0wp5spMEw4LoMiGAN7DS2 rIFL0rRgHbgH3HgBS61gYiSJyxKL9XFv2vUKkN0F12D/IMBgIV38VHBPmjnmN7epqwwu YuTT0B68rhQjUZeFNEanTYZ9LaeXox+jx8lZTFK0pxTIi4Uf93HrzDNDoPVlioY52ipp 5mwG4gD1CG102b375LFV7njolJygr1rStaXf5ybnl8eVaDLkbh/zjBAvRfGERzt+tS8s ALpM5B4NE9Z6rR+PNcMY3PJ3GwejWThndvyhfw5SITry8ErH3bUcc/jj66L0WrEkZ+l5 C4Nw== X-Gm-Message-State: AOJu0Yx55RhpckUXcKfWuXPSwQFVfTns3yElk5oxMZqvxjKDlbU7Xs3H teHaNOmV4URV4drwPlkdsM4aZxNrrmYBndbYpnIcbsouTy+R3gQzXmS4ag== X-Google-Smtp-Source: AGHT+IHOCWKipwYFAZsginAZHWI8rKphguiL5aHAMis6qHQ+ZYHxM44MfWRRO1XkRJa9ocaHGTXMSg== X-Received: by 2002:a17:906:dc8e:b0:a89:f1b9:d391 with SMTP id a640c23a62f3a-a8a885f2d85mr993129366b.14.1725935355654; Mon, 09 Sep 2024 19:29:15 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25d40064sm416508366b.190.2024.09.09.19.29.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:15 -0700 (PDT) Message-Id: <98a854c4b542309269f56ba0ae8b9a7c1504e409.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:43 +0000 Subject: [PATCH 18/30] revision: create mark_trees_uninteresting_dense() Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee The sparse tree walk algorithm was created in d5d2e93577e (revision: implement sparse 
algorithm, 2019-01-16) and involves using the mark_trees_uninteresting_sparse() method. This method takes a repository and an oidset of tree IDs, some of which have the UNINTERESTING flag and some of which do not. Create a method that has an equivalent set of preconditions but uses a "dense" walk (recursively visits all reachable trees, as long as they have not previously been marked UNINTERESTING). This is an important difference from mark_tree_uninteresting(), which short-circuits if the given tree has the UNINTERESTING flag. A use of this method will be added in a later change, with a condition set whether the sparse or dense approach should be used. Signed-off-by: Derrick Stolee --- revision.c | 15 +++++++++++++++ revision.h | 1 + 2 files changed, 16 insertions(+) diff --git a/revision.c b/revision.c index ac94f8d4292..21c8b6d1bc0 100644 --- a/revision.c +++ b/revision.c @@ -219,6 +219,21 @@ static void add_children_by_path(struct repository *r, free_tree_buffer(tree); } +void mark_trees_uninteresting_dense(struct repository *r, + struct oidset *trees) +{ + struct object_id *oid; + struct oidset_iter iter; + + oidset_iter_init(trees, &iter); + while ((oid = oidset_iter_next(&iter))) { + struct tree *tree = lookup_tree(r, oid); + + if (tree->object.flags & UNINTERESTING) + mark_tree_contents_uninteresting(r, tree); + } +} + void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees) { diff --git a/revision.h b/revision.h index 0e470d1df19..6c3df8e42bf 100644 --- a/revision.h +++ b/revision.h @@ -487,6 +487,7 @@ void put_revision_mark(const struct rev_info *revs, void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit); void mark_tree_uninteresting(struct repository *r, struct tree *tree); +void mark_trees_uninteresting_dense(struct repository *r, struct oidset *trees); void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees); void show_object_with_name(FILE *, struct object *, const char *); From 
patchwork Tue Sep 10 02:28:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797751 Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6FEFB17DE16 for ; Tue, 10 Sep 2024 02:29:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935360; cv=none; b=D6CCh8OMEhm8vljFlFYtjZVNVVeJo+h7NQrxF0akaa2ofDBBbn31MhjuPkdL/rfw5yX0w20PV54vEls7BlVVmauJjeov89cPE+zNSU9rsqof7W6GAk91ZEz34Sff4qJ9vZlHqUE5t9W6nYLcXgU7ekhLoRnALoOjaHMZvSjS1Xg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935360; c=relaxed/simple; bh=ttDuIh7z4Xnl03pc9eNTRzaKs1ja7alLscoirLt4hJA=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=XUs5ZDC8yQHzW6HaGzdeSrk4lCLRNFmhxCtuia9ItU1CRPpWr9kn+rwKeb3D3hXZhfEU0G/3W2nHaIoYXJJ091/DZU1LakuaVg6efECLCLwtu3u8+e9RlSq4QHRo3pZOeGH0YuF8+j+I8Q6Ft5tTXVMev0Bfiq+PprwmORkj9H4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=U4B+t/5w; arc=none smtp.client-ip=209.85.218.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="U4B+t/5w" Received: by mail-ej1-f44.google.com with SMTP id a640c23a62f3a-a8b155b5e9eso22396966b.1 for ; Mon, 09 Sep 2024 19:29:18 -0700 (PDT) DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935356; x=1726540156; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=WMpP/BeX9sULLuqaiDSEQp87PU3BsZjmiWXIXZZTv0I=; b=U4B+t/5wCwoDvcRYWiMqeguAjMJ7PGSRgivW2A+YpDhpoQdOgXrL7iZOkc0HD1m4SU 7eiQ5bpHLKfjuwpWvn75y+F/n9vepG+nvXkI5mmtNlS/4S54RKpxvJumReNeslFaDAbX IqdVYLSMdnZsdjFp8uI+8thmCD3BAZmIUqlo2bnR8xLZvol9DSCFHKsmSlOPEpMXea+h Jbq8k9gYN0XAHFFhDAfp3V0nIJ4NdZyIEVKwEoBb3XmuATFmoXEwlCFr3/Ij/tldjR7D urowStUz9pBujPlnIYTOrndH38vmKQ9vTwunvz/x/z2y573kUtGtV6VmX1lBQ4s0ol4e BMFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935356; x=1726540156; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WMpP/BeX9sULLuqaiDSEQp87PU3BsZjmiWXIXZZTv0I=; b=P9mtyf7Isq4tbyxhAZ+UGDTvlxeojqCWwoSsHJ5pAP9mQHgTz7glqhGQ4g28xeFjWG 8k9mr0UarOYULO434/05BDcOkUDmpNjKeXJHHLxbl3iRbH07w0t4w1xz1W/qejBsrkDr VHpScTfo7t2/8mOTGlvqR4s/lfYW4TEbOUc79PyJdNAiGKJXIzvSuddsDO6IGkCz2FtY 0xsUHwzJ627FtJ5Dl1mFd1shOqHJtXSTUmP7kWYVNpA4bhG05VAwwX2FaWXZiP5DfdZn nBdakwtwZed40LOqbYwsXfO4wTmIlHI1TV2loWC7zM0uZbtO24BCxNYT7hM7Wd2ajMcQ V4qg== X-Gm-Message-State: AOJu0Yx2Zn/1g+vFF0Y2VzNGqDVmdlvnl2iDjLcppw7BP7nIfqM6WmJ+ +aGGuqpnjHMO8xUS65kOJt32s4lcRYBjxTRYrqqVj2/awSu8Fiqq2o0ZeQ== X-Google-Smtp-Source: AGHT+IEEZCB61l66b9Cts+iPAG7zacznUPZCX+lD4eIQU0sJcJHy2rFghn+0jP28NjvylpDL4mS9TA== X-Received: by 2002:a17:907:940a:b0:a8a:83e9:43e2 with SMTP id a640c23a62f3a-a8a885bdf9emr937945466b.12.1725935356321; Mon, 09 Sep 2024 19:29:16 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d2598b85bsm413200666b.77.2024.09.09.19.29.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:15 -0700 (PDT) Message-Id: 
<78168d98bfc0df7151eac5280e12b95c9fb694ec.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:44 +0000 Subject: [PATCH 19/30] path-walk: add prune_all_uninteresting option Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee This option causes the path-walk API to act like the sparse tree-walk algorithm implemented by mark_trees_uninteresting_sparse() in list-objects.c. Starting from the commits marked as UNINTERESTING, their root trees and all objects reachable from those trees are UNINTERESTING, at least as we walk path-by-path. When we reach a path where all objects associated with that path are marked UNINTERESTING, then do not continue walking the children of that path. We need to be careful to pass the UNINTERESTING flag in a deep way on the UNINTERESTING objects before we start the path-walk, or else the depth-first search for the path-walk API may accidentally report some objects as interesting. 
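To make the pruning rule concrete, here is a minimal standalone sketch of the per-path decision described above. It does not use Git's object structs: sketch_object, SKETCH_UNINTERESTING, path_maybe_interesting, and should_walk_path are hypothetical names chosen for illustration, and the flag value is arbitrary rather than the one Git defines in revision.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit for this sketch only. */
#define SKETCH_UNINTERESTING (1u << 1)

struct sketch_object {
	unsigned flags;
};

/* Return 1 if at least one object at this path is still interesting. */
static int path_maybe_interesting(const struct sketch_object *objs, size_t nr)
{
	assert(objs || !nr);

	for (size_t i = 0; i < nr; i++)
		if (!(objs[i].flags & SKETCH_UNINTERESTING))
			return 1;
	return 0;
}

/*
 * Return 1 if the walk should report this path and descend into its
 * children. Without pruning, every path is walked. With pruning, a
 * path is skipped only when all of its objects are UNINTERESTING.
 */
static int should_walk_path(int prune_all_uninteresting,
			    const struct sketch_object *objs, size_t nr)
{
	if (!prune_all_uninteresting)
		return 1;
	return path_maybe_interesting(objs, nr);
}
```

The check only gives the intended answer if the flag has already been propagated down the trees before the walk starts, which is why the ordering concern in the last paragraph matters.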
Signed-off-by: Derrick Stolee --- path-walk.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++--- path-walk.h | 8 +++++++ 2 files changed, 73 insertions(+), 3 deletions(-) diff --git a/path-walk.c b/path-walk.c index 65f9856afa2..08de29614f7 100644 --- a/path-walk.c +++ b/path-walk.c @@ -23,6 +23,7 @@ struct type_and_oid_list { enum object_type type; struct oid_array oids; + int maybe_interesting; }; #define TYPE_AND_OID_LIST_INIT { \ @@ -139,6 +140,9 @@ static int add_children(struct path_walk_context *ctx, list->type = type; strmap_put(&ctx->paths_to_lists, path.buf, list); string_list_append(&ctx->path_stack, path.buf); + + if (!(o->flags & UNINTERESTING)) + list->maybe_interesting = 1; } oid_array_append(&list->oids, &entry.oid); } @@ -161,6 +165,40 @@ static int walk_path(struct path_walk_context *ctx, list = strmap_get(&ctx->paths_to_lists, path); + if (ctx->info->prune_all_uninteresting) { + /* + * This is true if all objects were UNINTERESTING + * when added to the list. + */ + if (!list->maybe_interesting) + return 0; + + /* + * But it's still possible that the objects were set + * as UNINTERESTING after being added. Do a quick check. + */ + list->maybe_interesting = 0; + for (size_t i = 0; + !list->maybe_interesting && i < list->oids.nr; + i++) { + if (list->type == OBJ_TREE) { + struct tree *t = lookup_tree(ctx->repo, + &list->oids.oid[i]); + if (t && !(t->object.flags & UNINTERESTING)) + list->maybe_interesting = 1; + } else { + struct blob *b = lookup_blob(ctx->repo, + &list->oids.oid[i]); + if (b && !(b->object.flags & UNINTERESTING)) + list->maybe_interesting = 1; + } + } + + /* We have confirmed that all objects are UNINTERESTING. */ + if (!list->maybe_interesting) + return 0; + } + /* Evaluate function pointer on this data, if requested. 
*/ if ((list->type == OBJ_TREE && ctx->info->trees) || (list->type == OBJ_BLOB && ctx->info->blobs)) @@ -203,7 +241,7 @@ static void clear_strmap(struct strmap *map) int walk_objects_by_path(struct path_walk_info *info) { const char *root_path = ""; - int ret = 0; + int ret = 0, has_uninteresting = 0; size_t commits_nr = 0, paths_nr = 0; struct commit *c; struct type_and_oid_list *root_tree_list; @@ -215,6 +253,7 @@ int walk_objects_by_path(struct path_walk_info *info) .path_stack = STRING_LIST_INIT_DUP, .paths_to_lists = STRMAP_INIT }; + struct oidset root_tree_set = OIDSET_INIT; struct oid_array tagged_tree_list = OID_ARRAY_INIT; struct oid_array tagged_blob_list = OID_ARRAY_INIT; @@ -227,7 +266,9 @@ int walk_objects_by_path(struct path_walk_info *info) /* Insert a single list for the root tree into the paths. */ CALLOC_ARRAY(root_tree_list, 1); root_tree_list->type = OBJ_TREE; + root_tree_list->maybe_interesting = 1; strmap_put(&ctx.paths_to_lists, root_path, root_tree_list); + if (prepare_revision_walk(info->revs)) die(_("failed to setup revision walk")); @@ -247,11 +288,17 @@ int walk_objects_by_path(struct path_walk_info *info) oid = get_commit_tree_oid(c); t = lookup_tree(info->revs->repo, oid); - if (t) + if (t) { + oidset_insert(&root_tree_set, oid); oid_array_append(&root_tree_list->oids, oid); - else + } else { warning("could not find tree %s", oid_to_hex(oid)); + } + if (t && (c->object.flags & UNINTERESTING)) { + t->object.flags |= UNINTERESTING; + has_uninteresting = 1; + } } trace2_data_intmax("path-walk", ctx.repo, "commits", commits_nr); @@ -318,6 +365,21 @@ int walk_objects_by_path(struct path_walk_info *info) oid_array_clear(&tagged_blob_list); } + /* + * Before performing a DFS of our paths and emitting them as interesting, + * do a full walk of the trees to distribute the UNINTERESTING bit. Use + * the sparse algorithm if prune_all_uninteresting was set. 
+ */ + if (has_uninteresting) { + trace2_region_enter("path-walk", "uninteresting-walk", info->revs->repo); + if (info->prune_all_uninteresting) + mark_trees_uninteresting_sparse(ctx.repo, &root_tree_set); + else + mark_trees_uninteresting_dense(ctx.repo, &root_tree_set); + trace2_region_leave("path-walk", "uninteresting-walk", info->revs->repo); + } + oidset_clear(&root_tree_set); + string_list_append(&ctx.path_stack, root_path); trace2_region_enter("path-walk", "path-walk", info->revs->repo); diff --git a/path-walk.h b/path-walk.h index 637d3b0cabb..7c02bca7156 100644 --- a/path-walk.h +++ b/path-walk.h @@ -50,6 +50,14 @@ struct path_walk_info { * the sparse-checkout patterns. */ struct pattern_list *pl; + + /** + * When 'prune_all_uninteresting' is set and a path has all objects + * marked as UNINTERESTING, then the path-walk will not visit those + * objects. It will not call path_fn on those objects and will not + * walk the children of such trees. + */ + int prune_all_uninteresting; }; #define PATH_WALK_INFO_INIT { \ From patchwork Tue Sep 10 02:28:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797752 Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A0DFF17DFE9 for ; Tue, 10 Sep 2024 02:29:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935361; cv=none; b=TA8sgvF0rOFVDNk+rY9If4a6yNBa3f2+Bocp4q+9sXtYESshdeo5BAAF9KbPAE1OIXI7OU6ZqRvXnOIeiUngSXN3+20VP/+gZNdanpJYq5VQR2zBf3rKHxadMZ5NENtscX/imOB6bnuinqSDQ6iLWRLHHhDBkS0Qn/mPPbf70LU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935361; c=relaxed/simple; 
bh=CWAdBslJWl6GFvz/RSbl236IaaskIQhXAcQ7O7ETmuA=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=WiCGFRCrt7VaV7Y3SA3Kw4njtvpfWxLwsuReju0TQuA3zNc6BGqkmJLCVjPFZKaz9H0IfXm0ESgyVB24b4pBkxJQRQatNfjr8EOK//DcT5pqUc/yAFFR2H+El+9axvcUPNSK62+1x9J6yQoEy5OpS0khfMQhgJ7rmp+dJmXyPjA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=IPLSlpUc; arc=none smtp.client-ip=209.85.218.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="IPLSlpUc" Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-a7a843bef98so292835066b.2 for ; Mon, 09 Sep 2024 19:29:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935358; x=1726540158; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=OiaOfXbbtucRsfMahbpSduQIJc1ySiE1ouyPD34VKro=; b=IPLSlpUc2Su7bfIsiffRqUfiChsxw0ZIbpiE8fj8GOnVYAWgeMtrXl4meOx24R+Hic 0ZytJh2I8GcLZIRLMhnbeMtzLOuQ6Kc4nZmkldw6yCi4H4fnP931SQyAYMgdUBW9ZcxU v2+0b1NL1tiS9SJh4iGJih6tpl8wVs6kdHAwtlhFSEn827SXlFCONCfc5VBWNrI//8e2 a+J01JvF4BgbOEcVWJS7C0Eh1Ufyv2DpoP0oHVwpXEMpOSC6Em0+RNiqbEuNmDadixNF itvF2lCSYjds5k300p88F3nmiBStcMYUKhpsw9xJqFOcILljaBz7tEn03tzEsMNezrrU A0/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935358; x=1726540158; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=OiaOfXbbtucRsfMahbpSduQIJc1ySiE1ouyPD34VKro=; b=ZxNCpC+dvoL2sIGsuHacFDYYCTBHkhImH0IUXUa9dt6TsYZTnefjohHiL9yDhJbXkz gweAeWgTWwgglEHevrXIYIZuCadM7gw9rVb1RsA2PlJPfWFg0rnKeM229e9H2fFvyqCy IdTNC85jpHDhWAKxwRn8tPppezCmYMx9hNz0CWbW+n7D15kDG4rbavr2rkHsZb0Iw4qv BgJs3TTM+q2oDf0VsBIzgSF4NF0AHWWl7zHbc3kH15Opbq/jqdSDQvZs2iTRs5B4a6jZ E2P7lAVQFnMVtNKU+uDzdnQJqnzRXLTv4Re+TaSC7RTTs9S+gEkOMOSgpicfHQIhKWas Fv8w== X-Gm-Message-State: AOJu0Yx/5ezxoXXpRWL1r+aYmK2xW9BPLK/49B0skFBw+4szZBQmrgyC waTqg9ZRc9mjGRt7O81L2ZnEcgt3WLtZs8yZrK4oN/icxfHLTV1110O1kg== X-Google-Smtp-Source: AGHT+IEo6uUPXmIn00qP6nwTPVwLJv0/oEoFSzrCmXEPU16B+y6gzPAcm/+MXhP3r2E3wDIp7+QSpA== X-Received: by 2002:a17:906:da85:b0:a7a:9f0f:ab2c with SMTP id a640c23a62f3a-a8a8866090amr1175665166b.29.1725935357536; Mon, 09 Sep 2024 19:29:17 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25a073b1sm412734866b.82.2024.09.09.19.29.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:16 -0700 (PDT) Message-Id: <3455af21e1bea375f38d21cc3b1d718ca6e563e4.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:45 +0000 Subject: [PATCH 20/30] pack-objects: add --path-walk option Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. 
Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. RFC TODO: It is important to note that this option is inherently incompatible with using a bitmap index. This walk probably also does not work with other advanced features, such as delta islands. Getting ahead of myself, this option compares well with --full-name-hash when the packfile is large enough, but also performs at least as well as the default in all cases that I've seen. RFC TODO: this should probably be recording the batch locations to another list so they could be processed in a second phase using threads. RFC TODO: list some examples of how this outperforms previous pack-objects strategies. (This is coming in later commits that include performance test changes.) Signed-off-by: Derrick Stolee --- builtin/pack-objects.c | 209 ++++++++++++++++++++++++++++++++++------- path-walk.c | 2 +- 2 files changed, 177 insertions(+), 34 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 778be80f564..3d0bb33427d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -39,6 +39,9 @@ #include "promisor-remote.h" #include "pack-mtimes.h" #include "parse-options.h" +#include "blob.h" +#include "tree.h" +#include "path-walk.h" /* * Objects we are going to pack are collected in the `to_pack` structure. 
@@ -215,6 +218,7 @@ static int delta_search_threads; static int pack_to_stdout; static int sparse; static int thin; +static int path_walk; static int num_preferred_base; static struct progress *progress_state; @@ -3139,6 +3143,38 @@ static int add_ref_tag(const char *tag UNUSED, const char *referent UNUSED, cons return 0; } +static int should_attempt_deltas(struct object_entry *entry) +{ + if (DELTA(entry)) + /* This happens if we decided to reuse existing + * delta from a pack. "reuse_delta &&" is implied. + */ + return 0; + + if (!entry->type_valid || + oe_size_less_than(&to_pack, entry, 50)) + return 0; + + if (entry->no_try_delta) + return 0; + + if (!entry->preferred_base) { + if (oe_type(entry) < 0) + die(_("unable to get type of object %s"), + oid_to_hex(&entry->idx.oid)); + } else { + if (oe_type(entry) < 0) { + /* + * This object is not found, but we + * don't have to include it anyway. + */ + return 0; + } + } + + return 1; +} + static void prepare_pack(int window, int depth) { struct object_entry **delta_list; @@ -3169,33 +3205,11 @@ static void prepare_pack(int window, int depth) for (i = 0; i < to_pack.nr_objects; i++) { struct object_entry *entry = to_pack.objects + i; - if (DELTA(entry)) - /* This happens if we decided to reuse existing - * delta from a pack. "reuse_delta &&" is implied. - */ + if (!should_attempt_deltas(entry)) continue; - if (!entry->type_valid || - oe_size_less_than(&to_pack, entry, 50)) - continue; - - if (entry->no_try_delta) - continue; - - if (!entry->preferred_base) { + if (!entry->preferred_base) nr_deltas++; - if (oe_type(entry) < 0) - die(_("unable to get type of object %s"), - oid_to_hex(&entry->idx.oid)); - } else { - if (oe_type(entry) < 0) { - /* - * This object is not found, but we - * don't have to include it anyway. 
- */ - continue; - } - } delta_list[n++] = entry; } @@ -4110,6 +4124,117 @@ static void mark_bitmap_preferred_tips(void) } } +static inline int is_oid_interesting(struct repository *repo, + struct object_id *oid, + enum object_type type) +{ + if (type == OBJ_TAG) { + struct tag *t = lookup_tag(repo, oid); + return t && !(t->object.flags & UNINTERESTING); + } + + if (type == OBJ_COMMIT) { + struct commit *c = lookup_commit(repo, oid); + return c && !(c->object.flags & UNINTERESTING); + } + + if (type == OBJ_TREE) { + struct tree *t = lookup_tree(repo, oid); + return t && !(t->object.flags & UNINTERESTING); + } + + if (type == OBJ_BLOB) { + struct blob *b = lookup_blob(repo, oid); + return b && !(b->object.flags & UNINTERESTING); + } + + return 0; +} + +static int add_objects_by_path(const char *path, + struct oid_array *oids, + enum object_type type, + void *data) +{ + struct object_entry **delta_list; + size_t oe_start = to_pack.nr_objects; + size_t oe_end; + unsigned int sub_list_size; + unsigned int *processed = data; + + /* + * First, add all objects to the packing data, including the ones + * marked UNINTERESTING (translated to 'exclude') as they can be + * used as delta bases. + */ + for (size_t i = 0; i < oids->nr; i++) { + struct object_id *oid = &oids->oid[i]; + int exclude = !is_oid_interesting(the_repository, oid, type); + add_object_entry(oid, type, path, exclude); + } + + oe_end = to_pack.nr_objects; + + /* We can skip delta calculations if it is a no-op. */ + if (oe_end == oe_start || !window) + return 0; + + sub_list_size = 0; + ALLOC_ARRAY(delta_list, oe_end - oe_start); + + for (size_t i = 0; i < oe_end - oe_start; i++) { + struct object_entry *entry = to_pack.objects + oe_start + i; + + if (!should_attempt_deltas(entry)) + continue; + + delta_list[sub_list_size++] = entry; + } + + /* + * Find delta bases among this list of objects that all match the same + * path. 
This causes the delta compression to be interleaved in the + * object walk, which can lead to confusing progress indicators. This is + * also incompatible with threaded delta calculations. In the future, + * consider creating a list of regions in the full to_pack.objects array + * that could be picked up by the threaded delta computation. + */ + if (sub_list_size && window) { + QSORT(delta_list, sub_list_size, type_size_sort); + find_deltas(delta_list, &sub_list_size, window, depth, processed); + } + + free(delta_list); + return 0; +} + +static void get_object_list_path_walk(struct rev_info *revs) +{ + struct path_walk_info info = PATH_WALK_INFO_INIT; + unsigned int processed = 0; + + info.revs = revs; + + info.revs->tag_objects = 1; + info.tags = 1; + info.commits = 1; + info.trees = 1; + info.blobs = 1; + info.path_fn = add_objects_by_path; + info.path_fn_data = &processed; + + /* + * Allow the --[no-]sparse option to be interesting here, if only + * for testing purposes. Paths with no interesting objects will not + * contribute to the resulting pack, but only create noisy preferred + * base objects. 
+ */ + info.prune_all_uninteresting = sparse; + + if (walk_objects_by_path(&info)) + die(_("failed to pack objects via path-walk")); +} + static void get_object_list(struct rev_info *revs, int ac, const char **av) { struct setup_revision_opt s_r_opt = { @@ -4156,7 +4281,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av) warn_on_object_refname_ambiguity = save_warning; - if (use_bitmap_index && !get_object_list_from_bitmap(revs)) + if (use_bitmap_index && !path_walk && !get_object_list_from_bitmap(revs)) return; if (use_delta_islands) @@ -4165,15 +4290,19 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av) if (write_bitmap_index) mark_bitmap_preferred_tips(); - if (prepare_revision_walk(revs)) - die(_("revision walk setup failed")); - mark_edges_uninteresting(revs, show_edge, sparse); - if (!fn_show_object) fn_show_object = show_object; - traverse_commit_list(revs, - show_commit, fn_show_object, - NULL); + + if (path_walk) { + get_object_list_path_walk(revs); + } else { + if (prepare_revision_walk(revs)) + die(_("revision walk setup failed")); + mark_edges_uninteresting(revs, show_edge, sparse); + traverse_commit_list(revs, + show_commit, fn_show_object, + NULL); + } if (unpack_unreachable_expiration) { revs->ignore_missing_links = 1; @@ -4368,6 +4497,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) N_("use the sparse reachability algorithm")), OPT_BOOL(0, "thin", &thin, N_("create thin packs")), + OPT_BOOL(0, "path-walk", &path_walk, + N_("use the path-walk API to walk objects when possible")), OPT_BOOL(0, "shallow", &shallow, N_("create packs suitable for shallow fetches")), OPT_BOOL(0, "honor-pack-keep", &ignore_packed_keep_on_disk, @@ -4448,7 +4579,19 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) window = 0; strvec_push(&rp, "pack-objects"); - if (thin) { + + if (path_walk && filter_options.choice) { + warning(_("cannot use --filter with --path-walk")); 
+ path_walk = 0; + } + if (path_walk) { + strvec_push(&rp, "--boundary"); + /* + * We must disable the bitmaps because we are removing + * the --objects / --objects-edge[-aggressive] options. + */ + use_bitmap_index = 0; + } else if (thin) { use_internal_rev_list = 1; strvec_push(&rp, shallow ? "--objects-edge-aggressive" diff --git a/path-walk.c b/path-walk.c index 08de29614f7..9391e0579ae 100644 --- a/path-walk.c +++ b/path-walk.c @@ -306,7 +306,7 @@ int walk_objects_by_path(struct path_walk_info *info) /* Track all commits. */ if (info->commits) - ret = info->path_fn("", &commit_list->oids, OBJ_COMMIT, + ret = info->path_fn("initial", &commit_list->oids, OBJ_COMMIT, info->path_fn_data); oid_array_clear(&commit_list->oids); free(commit_list); From patchwork Tue Sep 10 02:28:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797753 Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4788B184521 for ; Tue, 10 Sep 2024 02:29:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935361; cv=none; b=Za70ysSUfYMhCdHSyXA5sysEpSIZIHpbpvXQ5UGM3K/wVyA6sBl62eBdq5mR1Zggt4IXbDCIRMWiTaIU2hdFRRsmDAycJm7TrArMZ0KrWKvp87/ml6ZJmJF/TQunwddiasppBAI9/4zSiIvLMJwbSdYwEXqhKMPEbDCdOt1Nfl4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935361; c=relaxed/simple; bh=+25DMw5JT39mPlKaeecjlM5vJOsWhF/gOktQ7wz7nRY=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; 
b=hvN3RVJe8wsgr2FOILRL/KjTdEfqUlwNMISXq3LVBpv/3l9tNGv1zfV0udYWHqxqatOlLn+o+JJMCo+yCjnruvEWoCqAR4sEcurDDAUEspbaNkl69knHgKkCgkzMdAun8AJPcgAHXIivYh3+LoWuS1vmK52vXzvrhC5KdSjwmwY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=aNXw2nR7; arc=none smtp.client-ip=209.85.218.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="aNXw2nR7" Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-a8d43657255so276571166b.0 for ; Mon, 09 Sep 2024 19:29:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935358; x=1726540158; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=/tbkxkoReMZi9SHv05I5ka/s+XtDoF2jN9IpBDQDIZA=; b=aNXw2nR756/FHJDY7ZIQTzUq+4HIB7oTX0qWersE80iuU6gbE7lSJgXywyOoNUskUC yjT3uVVatFixmGRapIBxkCUEc1t1+3oYuBQfgpNRPvhTsOX1EJL6jGVQydFRjpS8SrbL ++tR+Nf/Kbxty+2ZTlX6SGlOG/nr4DmpmsZ+VKbAHeEw6VH4dAJH5zIJmFFjceVqB9N8 R9rVIFdFwSMqpdWprEgUNCFIOypYPz7Zq2VqeWESiL/3yUfwMqJBiNor4h2MTIqJ+KOM 62VqrI9KsVmaDYBZltgMVmUQOpqJ6TYHePLTaIuieDF1Eez2Fkug9YvNwQ4N5kJsw5M0 /keQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935358; x=1726540158; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/tbkxkoReMZi9SHv05I5ka/s+XtDoF2jN9IpBDQDIZA=; b=WAIVrXbgFdMny6daoOAOGzPIQV7tgUN9v0YKuACBgpm+CJvbDJ6MDP60Mk9OspDGvS 
mlP1Y8HsBkiEbrYNSGLC6qQsd2DjhchpPLbgSC9GfMGGSfACnCzuvkojx0VY5zo2xrKk vC+5RoZagmN7cjtf+GYPeWUGk1KdqADat8xKe+DqpSinX4UvdgqmvNLTpeZZe7qpWnYH inrdyKarwf9E0O1wDzLlDUfb6MBmpp2pVf9JVKF3mlEzybpUNl9z+SCiF8j6OdOfD/ei oYFx3DmFigB7Hom4utAt7LyeMK9YkJEoxwBoWgqGCFK35Nq2NX4o89ePxgRw5rROFuKk BUKw== X-Gm-Message-State: AOJu0YxqnAO+uR4LBsu5siyLQ+a0/Eu4W6ZitrKsyGlMKiGNTHClH5yX ASN9nhOnNGd/hiaO7lYwkmY7N4b9xEpyHTMtyu3Vn+9NSnxhdqIA+IQljQ== X-Google-Smtp-Source: AGHT+IGl236GQxyeTpKQZRCxmjjlw4F5VsW2Y0N/JtFcCfpm+gHi5ZC3YlNa/Fgl+cIQGIWp7s8BQA== X-Received: by 2002:a17:907:1c08:b0:a8d:2671:4999 with SMTP id a640c23a62f3a-a8d26714e3amr635411866b.39.1725935358172; Mon, 09 Sep 2024 19:29:18 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25835abbsm415288766b.9.2024.09.09.19.29.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:17 -0700 (PDT) Message-Id: <502008bb7c57327bad65867a70871ef0cf8898b5.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:46 +0000 Subject: [PATCH 21/30] pack-objects: extract should_attempt_deltas() Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee Signed-off-by: Derrick Stolee --- builtin/pack-objects.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 3d0bb33427d..b1d684c3417 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3151,8 +3151,7 @@ static int should_attempt_deltas(struct object_entry *entry) */ return 0; - if (!entry->type_valid || - oe_size_less_than(&to_pack, entry, 50)) + if (!entry->type_valid || 
oe_size_less_than(&to_pack, entry, 50)) return 0; if (entry->no_try_delta) @@ -3162,14 +3161,12 @@ static int should_attempt_deltas(struct object_entry *entry) if (oe_type(entry) < 0) die(_("unable to get type of object %s"), oid_to_hex(&entry->idx.oid)); - } else { - if (oe_type(entry) < 0) { - /* - * This object is not found, but we - * don't have to include it anyway. - */ - return 0; - } + } else if (oe_type(entry) < 0) { + /* + * This object is not found, but we + * don't have to include it anyway. + */ + return 0; } return 1; From patchwork Tue Sep 10 02:28:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797754 Received: from mail-lf1-f54.google.com (mail-lf1-f54.google.com [209.85.167.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2405D18595E for ; Tue, 10 Sep 2024 02:29:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935362; cv=none; b=hJRgOaA8rovujLpEEA8yy8Hh3GAoldZ2R9g0dD15QBUT8cR8LdcgD+v/mfvscD6kxnoqRRFtcZ9HntOqWc2xKNiSg6NggewAfCnXjf5ViO/25gJroZexijSbSaXmAhWyz2iNy4O+tAr4gB5gE/k1o0jS8odF2SGqNwRMc6kXrq8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935362; c=relaxed/simple; bh=gId5JgM7izLt873jtAaSsdX7yjFb2X805k/gda2kMKk=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=PxNuf623+xAaabKba4sMiY5CJn/QrgfKrQt0lNHEpVe9VOwy9sQ+TJrNw8kdjp1B1EGX3gBFkV75v8oYCR7UVbprwlY/I1TFag6v0mLqBUdMCwVighbBERqxBO6TeAceJpR7TKZ6W7KlCudrq5K9CoA0Y3Cr28WF09TlFbPwBB4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com 
header.i=@gmail.com header.b=Eav9bwZM; arc=none smtp.client-ip=209.85.167.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Eav9bwZM" Received: by mail-lf1-f54.google.com with SMTP id 2adb3069b0e04-535dc4ec181so4578669e87.3 for ; Mon, 09 Sep 2024 19:29:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935359; x=1726540159; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=WeWlcwU9yAFb56Ubkr+pvO6A5aBn24BR3efejK8Q6RQ=; b=Eav9bwZM6F6Y89HHbMJNtF+x+CnJThakBNQqMb3435nL9ut+5GzDZoTeTWRMyEpp+P 1qW3l2mZqF1eurKt1xw3o3c4MInbZfgZendLNG2VO3gBgITpsC8FAI3EzhAo2tHZeeR0 Ee2Cl9miwAxgZ56IGzzp4ZxT+g5S3ghVKpfxoKcpe449PPqJNQ0a/2t8lOf66JiNBxUW AW6pRlIQUCpz6TaDWiVV5UwdzhyhYC2tXueKGcTn+QWlvdAg7HobiRWWQiJl/yQxOakv KPr1Ju4EhgLxh3BhhXIaO3f8Bqn//WYIr7RtqrsiOm19RalLvXNcMVqibrf+3kVGseXM Stow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935359; x=1726540159; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WeWlcwU9yAFb56Ubkr+pvO6A5aBn24BR3efejK8Q6RQ=; b=aDWQFfGgSDzMxY9X/j+13vOJ2ZzA9DjsDHzu01XgJiFu6MyfVrVnoVU8VsJehTWriv lyocMHIZdTEm7zBmARFcipBvhT9A5TQOc570XMSBAH8No2UGZjp6SpDbPuAROmqw72L/ n/vofDLmk8XeU7g3RDyH/sWUC9kpjObNo6zqhinRSnziPQNINQsS5FiKD6tGq66VJUro bmwj8W1j0rBOUv364diihVk6VV1soahBiv+GmZUmqnfGiYRTNa7FWkX00b/ne2odj7y3 UtBrfTM1YaQEfhZnV6BJ2fG38YGkVLEHDvgeBgR1g2ENjwhGi+j2tQApEaGYpibQfwDm Lcdg== X-Gm-Message-State: AOJu0YxKAcxxlICPNpQrinjVTxa6Dbw7xRCgLCG/OMX0ShZtz/slTG8S 
E4MyAaGkSsyouR4m2UK2C3DBI7WrxrS5z2JKC7xqEePfC5EwZOI89HSoiw== X-Google-Smtp-Source: AGHT+IFy0ZGMqpHvNLFzvifbE1PUolJy3/xw0Zb3hy+jxGyOL9Q7JSTrmGCiVO3Ae9HHVA7wcCAbEw== X-Received: by 2002:a05:6512:124b:b0:535:6892:3be3 with SMTP id 2adb3069b0e04-536587fc4c8mr7577395e87.41.1725935358798; Mon, 09 Sep 2024 19:29:18 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25835adfsm414164166b.44.2024.09.09.19.29.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:18 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:47 +0000 Subject: [PATCH 22/30] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee There are many tests that validate whether 'git pack-objects' works as expected. Instead of duplicating these tests, add a new test environment variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default when specified. This was useful in testing the --path-walk implementation, especially in conjunction with tests such as: - t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of the --sparse option and how it combines with --path-walk. RFC TODO: list other helpful test cases, as well as the ones where the behavior breaks if this is enabled...
Signed-off-by: Derrick Stolee --- builtin/pack-objects.c | 1 + t/README | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index b1d684c3417..b9fe1b1fbd5 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4534,6 +4534,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) disable_replace_refs(); + path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", 0); sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1); if (the_repository->gitdir) { prepare_repo_settings(the_repository); diff --git a/t/README b/t/README index 44c02d81298..a5d7d0239e0 100644 --- a/t/README +++ b/t/README @@ -433,6 +433,10 @@ GIT_TEST_PACK_SPARSE=<boolean> if disabled will default the pack-objects builtin to use the non-sparse object walk. This can still be overridden by the --sparse command-line argument. +GIT_TEST_PACK_PATH_WALK=<boolean> if enabled will default the pack-objects +builtin to use the path-walk API for the object walk. This can still be +overridden by the --no-path-walk command-line argument. + GIT_TEST_PRELOAD_INDEX=<boolean> exercises the preload-index code path by overriding the minimum number of cache entries required per thread.
From patchwork Tue Sep 10 02:28:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797755 Received: from mail-ed1-f43.google.com (mail-ed1-f43.google.com [209.85.208.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 90DBF185B76 for ; Tue, 10 Sep 2024 02:29:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935363; cv=none; b=LrM5HtPYt8d1/hYS3XGZrQ7K0pbky6p4Hlug/+P75X7VKwSL4suAHyypE6yMBv2bqddVingTJ2aTG5ALfafeL32UHyENtXp47sKliwWmto+Ew0DkuFs9YHGiq/JUdX2VqbLCrtFqV1EMwMrTEDrhksc4V20P8myFvbyizRDSc/E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935363; c=relaxed/simple; bh=KTtSSbyiJNA+n3owSfOQQhVgd5hIiKsUwGgXTh1PrnE=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=NhabeFOf4IfCrIkzpG8tslHRRwx7zr4A1Velyr0XOxxgBsuUh7rPpi8xPWioOA+Zo3SiWpfunDQLiWyxEf/Y1PZgWxERWdto1GFYASO7Z5GfVzBgmeJb7Y2whyatj7q3FjjsFCn6Bku3vbULANMwJSd2CEL2EPV5EivALVM0CAI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=iU95lzWi; arc=none smtp.client-ip=209.85.208.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="iU95lzWi" Received: by mail-ed1-f43.google.com with SMTP id 4fb4d7f45d1cf-5c255e3c327so5320473a12.1 for ; Mon, 09 Sep 2024 19:29:21 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935359; x=1726540159; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=tvBsYsvLR+bmYCeHxwTsXNEdbM76cKEWCFQQ2DVgrN0=; b=iU95lzWiCRoP0Llq+tKUPlSDz7+FCsj1WNx064FA8OQQ+v4lJ8Xh6dHs038I1ny3QR PU42Yx+LLC19FI2TVzibU+EPTrzW8APPuACiSodYRWAIPH9F4kWL5WE1gYPUk+8bFd6z kOSRE7JU6ts3whF4VnHNmwTWymQQgd2MXPEontmzEVzYaza6WdJq0vWYbhhOxzQvLQFe 15DLvMo7j5HkXUnBIqsBFOFho4+/jwNfMaokzQxQ/9LJg3dbtqjYa9vcI8/g5Zu7/wPx rl2BB2K+U703WOupopxPYF0FRgZJUD6oi/EIhynh8ppr6sPWb0r8M7MN8elEJLolS0Hi +GSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935359; x=1726540159; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tvBsYsvLR+bmYCeHxwTsXNEdbM76cKEWCFQQ2DVgrN0=; b=ETx9ntoLG2DSFNCVT1rlvnhroX2V4JXi9yqJxEVWG+67qkyhCeQm4Es5oDWaSNIDA9 lL1DSczmd7S7IL+xdPvAaDnU4KpVOX/2XhyPO8JmLYRwcuTOlLhHvm6a0TtW3NgCFpoy hzHokCe6fQcQAYQoVE5hMMeb/avWQwHfeD31Ocfe3br7P6ZIaB1UXAipgcHmuKr/BndH ZZTP0/+C4jnUfiqYZrRlI1O4YIZ9hnG+c5ZIoBeM/yvGfw1uL6F6btgbs5f499gC7gwk tWza8UrzE4Kuo3iZCT/HuBh54jarmvmTPcgwkSEOeR53ve0a/9ZG5PBfp7j4KcJOh5jy A+9A== X-Gm-Message-State: AOJu0Yx5GkhIfm8n9u07oV2z9E1mks5IrqYg7tXKSFA7AlLAi4FArAce dsXY5d5XBN9FtMEEAyZhezAlqrj/Rg7DH9EUjF2u0RbqkjdP8JVSEZU+pQ== X-Google-Smtp-Source: AGHT+IFmEHVLdzIQhuIgOo1zYgpmsagzamLfiVaWHoKTeqQIGtIQXX9KWYU6saIK0s48HodSCYAWXQ== X-Received: by 2002:a05:6402:50c6:b0:5c2:1043:b3e1 with SMTP id 4fb4d7f45d1cf-5c3dc7993e8mr7745844a12.18.1725935359449; Mon, 09 Sep 2024 19:29:19 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5c3ebd8cc28sm3677498a12.83.2024.09.09.19.29.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:19 -0700 
(PDT) Message-Id: <54bd80701fb9b55910d6d8453f235872fe549fdd.1725935335.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 10 Sep 2024 02:28:48 +0000 Subject: [PATCH 23/30] p5313: add size comparison test Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee

To test the benefits of the new --path-walk option in 'git pack-objects', create a performance test that times the process but also compares the size of the output.

Against a particular commit [2] of the microsoft/fluentui repo [1], this has reproducible results of a similar scale:

Test                                            this tree
---------------------------------------------------------------
5313.2: thin pack                               0.39(0.48+0.03)
5313.3: thin pack size                                     1.2M
5313.4: thin pack with --path-walk              0.09(0.07+0.01)
5313.5: thin pack size with --path-walk                   20.8K
5313.6: big recent pack                         2.13(8.29+0.26)
5313.7: big recent pack size                              17.7M
5313.8: big recent pack with --path-walk        3.18(4.21+0.22)
5313.9: big recent pack size with --path-walk             15.0M

[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08

RFC TODO: Note that the path-walk version is slower for the big case, but the delta calculation is single-threaded with the current implementation! It's still faster for the small case that mimics a typical push.
Signed-off-by: Derrick Stolee --- t/perf/p5313-pack-objects.sh | 55 ++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100755 t/perf/p5313-pack-objects.sh diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh new file mode 100755 index 00000000000..fdcdf188f95 --- /dev/null +++ b/t/perf/p5313-pack-objects.sh @@ -0,0 +1,55 @@ +#!/bin/sh + +test_description='Tests pack performance using bitmaps' +. ./perf-lib.sh + +GIT_TEST_PASSING_SANITIZE_LEAK=0 +export GIT_TEST_PASSING_SANITIZE_LEAK + +test_perf_large_repo + +test_expect_success 'create rev input' ' + cat >in-thin <<-EOF && + $(git rev-parse HEAD) + ^$(git rev-parse HEAD~1) + EOF + + cat >in-big-recent <<-EOF + $(git rev-parse HEAD) + ^$(git rev-parse HEAD~1000) + EOF +' + +test_perf 'thin pack' ' + git pack-objects --thin --stdout --revs --sparse <in-thin >out +' + +test_size 'thin pack size' ' + wc -c <out +' + +test_perf 'thin pack with --path-walk' ' + git pack-objects --thin --stdout --revs --sparse --path-walk <in-thin >out +' + +test_size 'thin pack size with --path-walk' ' + wc -c <out +' + +test_perf 'big recent pack' ' + git pack-objects --stdout --revs <in-big-recent >out +' + +test_size 'big recent pack size' ' + wc -c <out +' + +test_perf 'big recent pack with --path-walk' ' + git pack-objects --stdout --revs --path-walk <in-big-recent >out +' + +test_size 'big recent pack size with --path-walk' ' + wc -c <out +' + +test_done From patchwork Tue Sep 10 02:28:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13797756 Received: from mail-ej1-f46.google.com (mail-ej1-f46.google.com [209.85.218.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 38946189906 for ; Tue, 10 Sep 2024 02:29:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935363; cv=none; b=FRUXVRJdq63TA0TsCJheUAJP6lf9j41sMXACBRePB+sXBkb4FQVPUER3pePKkq2y+/b7NCz7ncjMBtOn9pOgCoipo/t3leRCCauD00y54iy9NSrnhFj/Ny0zWOFS/BpBYbDtw7iquRUQGj2gAt9/jzXbH297iy2VIc0+d/tlMSE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935363; c=relaxed/simple; bh=dKJz/bU0aUqqVivSFtqiUbLMRpABpe0JfV8yjnQpXtA=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type:
Date: Tue, 10 Sep 2024 02:28:49 +0000
Subject: [PATCH 24/30] repack: add --path-walk option
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
    ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
    Derrick Stolee
From: Derrick Stolee

Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.

For the microsoft/fluentui repo [1], the results are very interesting:

Test                                            this tree
-------------------------------------------------------------------
5313.10: full repack                            97.91(663.47+2.83)
5313.11: full repack size                       449.1K
5313.12: full repack with --path-walk           105.42(120.49+0.95)
5313.13: full repack size with --path-walk      159.1K

[1] https://github.com/microsoft/fluentui

This repo suffers from having a lot of paths that collide in the name
hash, so examining them in groups by path leads to better deltas. Also,
in this case, the single-threaded implementation is competitive with the
full repack. This is saving time diffing files that have significant
differences from each other.

Signed-off-by: Derrick Stolee
---
 builtin/repack.c             |  5 +++++
 t/perf/p5313-pack-objects.sh | 20 ++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 62cfa50c50f..9e39a1ea8f8 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -57,6 +57,7 @@ struct pack_objects_args {
 	int no_reuse_object;
 	int quiet;
 	int local;
+	int path_walk;
 	struct list_objects_filter_options filter_options;
 };
 
@@ -288,6 +289,8 @@ static void prepare_pack_objects(struct child_process *cmd,
 		strvec_pushf(&cmd->args, "--no-reuse-delta");
 	if (args->no_reuse_object)
 		strvec_pushf(&cmd->args, "--no-reuse-object");
+	if (args->path_walk)
+		strvec_pushf(&cmd->args, "--path-walk");
 	if (args->local)
 		strvec_push(&cmd->args, "--local");
 	if (args->quiet)
@@ -1158,6 +1161,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("pass --no-reuse-delta to git-pack-objects")),
 		OPT_BOOL('F', NULL, &po_args.no_reuse_object,
 				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL(0, "path-walk", &po_args.path_walk,
+				N_("pass --path-walk to git-pack-objects")),
 		OPT_NEGBIT('n', NULL, &run_update_server_info,
 				N_("do not run git-update-server-info"), 1),
 		OPT__QUIET(&po_args.quiet, N_("be quiet")),
diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
index fdcdf188f95..48fc05bb6c6 100755
--- a/t/perf/p5313-pack-objects.sh
+++ b/t/perf/p5313-pack-objects.sh
@@ -52,4 +52,24 @@ test_size 'big recent pack size with --path-walk' '
 	wc -c

X-Patchwork-Id: 13797757
Message-Id: <1942f7d03622f2740d83e766fca65938cb590f6a.1725935335.git.gitgitgadget@gmail.com>
Date: Tue, 10 Sep 2024 02:28:50 +0000
Subject: [PATCH 25/30] pack-objects: enable --path-walk via config
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
    ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
    Derrick Stolee
From: Derrick Stolee

Users may want to enable the --path-walk option for 'git pack-objects' by
default, especially underneath commands like 'git push' or 'git repack'.

This should be limited to client repositories, since the --path-walk option
disables bitmap walks, so would be bad to include in Git servers when
serving fetches and clones. There is potential that it may be helpful to
consider when repacking the repository, to take advantage of improved deltas
across historical versions of the same files.

Much like how "pack.useSparse" was introduced and included in
"feature.experimental" before being enabled by default, use the repository
settings infrastructure to make the new "pack.usePathWalk" config enabled by
"feature.experimental" and "feature.manyFiles".

Signed-off-by: Derrick Stolee
---
 Documentation/config/pack.txt | 8 ++++++++
 builtin/pack-objects.c        | 4 +++-
 repo-settings.c               | 3 +++
 repository.h                  | 1 +
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index da527377faf..08d06271177 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -155,6 +155,14 @@ pack.useSparse::
 	commits contain certain types of direct renames. Default is
 	`true`.
 
+pack.usePathWalk::
+	When true, git will default to using the '--path-walk' option in
+	'git pack-objects' when the '--revs' option is present. This
+	algorithm groups objects by path to maximize the ability to
+	compute delta chains across historical versions of the same
+	object. This may disable other options, such as using bitmaps to
+	enumerate objects.
+
 pack.preferBitmapTips::
 	When selecting which commits will receive bitmaps, prefer a
 	commit at the tip of any reference that is a suffix of any value
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b9fe1b1fbd5..e7a9d0349c3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4534,12 +4534,14 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	disable_replace_refs();
 
-	path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", 0);
+	path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", -1);
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
 		if (sparse < 0)
 			sparse = the_repository->settings.pack_use_sparse;
+		if (path_walk < 0)
+			path_walk = the_repository->settings.pack_use_path_walk;
 		if (the_repository->settings.pack_use_multi_pack_reuse)
 			allow_pack_reuse = MULTI_PACK_REUSE;
 	}
diff --git a/repo-settings.c b/repo-settings.c
index 2b4e68731be..d9597d84556 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -45,11 +45,13 @@ void prepare_repo_settings(struct repository *r)
 		r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
 		r->settings.pack_use_bitmap_boundary_traversal = 1;
 		r->settings.pack_use_multi_pack_reuse = 1;
+		r->settings.pack_use_path_walk = 1;
 	}
 	if (manyfiles) {
 		r->settings.index_version = 4;
 		r->settings.index_skip_hash = 1;
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
+		r->settings.pack_use_path_walk = 1;
 	}
 
 	/* Commit graph config or default, does not cascade (simple) */
@@ -64,6 +66,7 @@ void prepare_repo_settings(struct repository *r)
 
 	/* Boolean config or default, does not cascade (simple) */
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
+	repo_cfg_bool(r, "pack.usepathwalk", &r->settings.pack_use_path_walk, 0);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
 	repo_cfg_bool(r, "index.skiphash", &r->settings.index_skip_hash, r->settings.index_skip_hash);
diff --git a/repository.h b/repository.h
index af6ea0a62cd..2ae9c2b1741 100644
--- a/repository.h
+++ b/repository.h
@@ -62,6 +62,7 @@ struct repo_settings {
 	enum untracked_cache_setting core_untracked_cache;
 
 	int pack_use_sparse;
+	int pack_use_path_walk;
 	enum fetch_negotiation_setting fetch_negotiation_algorithm;
 
 	int core_multi_pack_index;

From patchwork Tue Sep 10 02:28:51 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13797758
Message-Id: <4c10f859c8dcc42c4d0470a1f295fba979aca336.1725935335.git.gitgitgadget@gmail.com>
Date: Tue, 10 Sep 2024 02:28:51 +0000
Subject: [PATCH 26/30] scalar: enable path-walk during push via config
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
    ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
    Derrick Stolee
From: Derrick Stolee

Repositories registered with Scalar are expected to be client-only
repositories that are rather large. This means that they are more likely to
be good candidates for using the --path-walk option when running 'git
pack-objects', especially under the hood of 'git push'. Enable this config
in Scalar repositories.

Signed-off-by: Derrick Stolee
---
 scalar.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scalar.c b/scalar.c
index 6166a8dd4c8..031d1ac179f 100644
--- a/scalar.c
+++ b/scalar.c
@@ -170,6 +170,7 @@ static int set_recommended_config(int reconfigure)
 		{ "core.autoCRLF", "false" },
 		{ "core.safeCRLF", "false" },
 		{ "fetch.showForcedUpdates", "false" },
+		{ "push.usePathWalk", "true" },
 		{ NULL, NULL },
 	};
 	int i;

From patchwork Tue Sep 10 02:28:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13797759
Date: Tue, 10 Sep 2024 02:28:52 +0000
Subject: [PATCH 27/30] pack-objects: add --full-name-hash option
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
    ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
    Derrick Stolee
From: Derrick Stolee

RFC NOTE: this is essentially the same as the patch introduced
independently of the RFC, but now is on top of the --path-walk option
instead. This is included in the RFC for comparison purposes.

RFC NOTE: As you can see from the details below, the --full-name-hash
option essentially attempts to do similar things as the --path-walk option,
but sometimes misses the mark. Collisions still happen with the
--full-name-hash option, leading to some misses. However, in cases where the
default name-hash algorithm has low collision rates and deltas are actually
desired across objects with similar names but different full names, the
--path-walk option can still take advantage of the default name hash
approach.

Here are the new performance details simulating a single push in an
internal monorepo using a lot of paths that collide in the default name
hash. We can see that --full-name-hash gets close to the --path-walk
option's size.

Test                                           this tree
--------------------------------------------------------------
5313.2: thin pack                              2.43(2.92+0.14)
5313.3: thin pack size                         4.5M
5313.4: thin pack with --full-name-hash        0.31(0.49+0.12)
5313.5: thin pack size with --full-name-hash   15.5K
5313.6: thin pack with --path-walk             0.35(0.31+0.04)
5313.7: thin pack size with --path-walk        14.2K

However, when simulating pushes on repositories that do not have issues
with name-hash collisions, the --full-name-hash option presents a potential
of worse delta calculations, such as this example using my local Git
repository:

Test                                           this tree
--------------------------------------------------------------
5313.2: thin pack                              0.03(0.01+0.01)
5313.3: thin pack size                         475
5313.4: thin pack with --full-name-hash        0.02(0.01+0.01)
5313.5: thin pack size with --full-name-hash   14.8K
5313.6: thin pack with --path-walk             0.02(0.01+0.01)
5313.7: thin pack size with --path-walk        475

Note that the path-walk option found the same delta bases as the default
options in this case.

In the full repack case, the --full-name-hash option may be preferable
because it interacts well with other advanced features, such as using
bitmap indexes and tracking delta islands.
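
To make the trade-off above concrete, here is a minimal standalone sketch
in C of the two hash functions side by side. pack_full_name_hash() follows
the pack-objects.h hunk of this patch (with an unsigned char cast added);
pack_name_hash() is reproduced from memory of Git's pack-objects.h and
should be treated as an assumption. The helper collides_only_in_default()
and the example paths are hypothetical and only serve as illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Default name-hash, as recalled from Git's pack-objects.h (assumption). */
static uint32_t pack_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	/*
	 * Each character enters at the top byte and is shifted right by two
	 * bits per later character, so mostly the last sixteen
	 * non-whitespace characters decide the result.
	 */
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/* Full-name hash, following the hunk in this patch. */
static uint32_t pack_full_name_hash(const char *name)
{
	const uint32_t bigp = 1234572167U;
	uint32_t c, hash = bigp;

	if (!name)
		return 0;

	/* Every character of the path is mixed in, then rotated right by 5. */
	while ((c = (unsigned char)*name++) != 0) {
		hash += c * bigp;
		hash = (hash >> 5) | (hash << 27);
	}
	return hash;
}

/*
 * Hypothetical helper: returns 1 when two paths share a default name-hash
 * but get different full-name hashes, 0 otherwise.
 */
static int collides_only_in_default(const char *a, const char *b)
{
	return pack_name_hash(a) == pack_name_hash(b) &&
	       pack_full_name_hash(a) != pack_full_name_hash(b);
}
```

With these definitions, collides_only_in_default() returns 1 for the pair
"packages/a/package-lock.json" and "packages/b/package-lock.json": the two
paths end in the same long file name, so the default hash cannot tell them
apart, while the full-name hash sees the differing directory. This is the
situation in which --full-name-hash helps, and also why it stops grouping
similarly named files from different directories.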

Signed-off-by: Derrick Stolee
---
 builtin/pack-objects.c       | 20 +++++++++++++++-----
 builtin/repack.c             |  5 +++++
 pack-objects.h               | 20 ++++++++++++++++++++
 t/perf/p5313-pack-objects.sh | 26 ++++++++++++++++++++++++++
 4 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e7a9d0349c3..5d5a57e6b1f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -270,6 +270,14 @@ struct configured_exclusion {
 static struct oidmap configured_exclusions;
 
 static struct oidset excluded_by_config;
+static int use_full_name_hash;
+
+static inline uint32_t pack_name_hash_fn(const char *name)
+{
+	if (use_full_name_hash)
+		return pack_full_name_hash(name);
+	return pack_name_hash(name);
+}
 
 /*
  * stats
@@ -1674,7 +1682,7 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
 		return 0;
 	}
 
-	create_object_entry(oid, type, pack_name_hash(name),
+	create_object_entry(oid, type, pack_name_hash_fn(name),
 			    exclude, name && no_try_delta(name),
 			    found_pack, found_offset);
 	return 1;
@@ -1888,7 +1896,7 @@ static void add_preferred_base_object(const char *name)
 {
 	struct pbase_tree *it;
 	size_t cmplen;
-	unsigned hash = pack_name_hash(name);
+	unsigned hash = pack_name_hash_fn(name);
 
 	if (!num_preferred_base || check_pbase_path(hash))
 		return;
@@ -3405,7 +3413,7 @@ static void show_object_pack_hint(struct object *object, const char *name,
 	 * here using a now in order to perhaps improve the delta selection
 	 * process.
 	 */
-	oe->hash = pack_name_hash(name);
+	oe->hash = pack_name_hash_fn(name);
 	oe->no_try_delta = name && no_try_delta(name);
 
 	stdin_packs_hints_nr++;
@@ -3555,7 +3563,7 @@ static void add_cruft_object_entry(const struct object_id *oid, enum object_type
 	entry = packlist_find(&to_pack, oid);
 	if (entry) {
 		if (name) {
-			entry->hash = pack_name_hash(name);
+			entry->hash = pack_name_hash_fn(name);
 			entry->no_try_delta = no_try_delta(name);
 		}
 	} else {
@@ -3578,7 +3586,7 @@ static void add_cruft_object_entry(const struct object_id *oid, enum object_type
 			return;
 		}
 
-		entry = create_object_entry(oid, type, pack_name_hash(name),
+		entry = create_object_entry(oid, type, pack_name_hash_fn(name),
 					    0, name && no_try_delta(name),
 					    pack, offset);
 	}
@@ -4526,6 +4534,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		OPT_STRING_LIST(0, "uri-protocol", &uri_protocols,
 				N_("protocol"),
 				N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
+		OPT_BOOL(0, "full-name-hash", &use_full_name_hash,
+			 N_("optimize delta compression across identical path names over time")),
 		OPT_END(),
 	};
 
diff --git a/builtin/repack.c b/builtin/repack.c
index 9e39a1ea8f8..a1ab103e62d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -58,6 +58,7 @@ struct pack_objects_args {
 	int quiet;
 	int local;
 	int path_walk;
+	int full_name_hash;
 	struct list_objects_filter_options filter_options;
 };
 
@@ -291,6 +292,8 @@ static void prepare_pack_objects(struct child_process *cmd,
 		strvec_pushf(&cmd->args, "--no-reuse-object");
 	if (args->path_walk)
 		strvec_pushf(&cmd->args, "--path-walk");
+	if (args->full_name_hash)
+		strvec_pushf(&cmd->args, "--full-name-hash");
 	if (args->local)
 		strvec_push(&cmd->args, "--local");
 	if (args->quiet)
@@ -1163,6 +1166,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("pass --no-reuse-object to git-pack-objects")),
 		OPT_BOOL(0, "path-walk", &po_args.path_walk,
 				N_("pass --path-walk to git-pack-objects")),
+		OPT_BOOL(0, "full-name-hash", &po_args.full_name_hash,
+				N_("pass --full-name-hash to git-pack-objects")),
 		OPT_NEGBIT('n', NULL, &run_update_server_info,
 				N_("do not run git-update-server-info"), 1),
 		OPT__QUIET(&po_args.quiet, N_("be quiet")),
diff --git a/pack-objects.h b/pack-objects.h
index b9898a4e64b..50097552d03 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -207,6 +207,26 @@ static inline uint32_t pack_name_hash(const char *name)
 	return hash;
 }
 
+static inline uint32_t pack_full_name_hash(const char *name)
+{
+	const uint32_t bigp = 1234572167U;
+	uint32_t c, hash = bigp;
+
+	if (!name)
+		return 0;
+
+	/*
+	 * Just do the dumbest thing possible: add random multiples of a
+	 * large prime number with a binary shift. Goal is not cryptographic,
+	 * but generally uniformly distributed.
+	 */
+	while ((c = *name++) != 0) {
+		hash += c * bigp;
+		hash = (hash >> 5) | (hash << 27);
+	}
+	return hash;
+}
+
 static inline enum object_type oe_type(const struct object_entry *e)
 {
 	return e->type_valid ? e->type_ : OBJ_BAD;
diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
index 48fc05bb6c6..b3b7fff8abf 100755
--- a/t/perf/p5313-pack-objects.sh
+++ b/t/perf/p5313-pack-objects.sh
@@ -28,6 +28,14 @@ test_size 'thin pack size' '
 	wc -c out
+'
+
+test_size 'thin pack size with --full-name-hash' '
+	wc -c out
 '
@@ -44,6 +52,14 @@ test_size 'big recent pack size' '
 	wc -c out
+'
+
+test_size 'big recent pack size with --full-name-hash' '
+	wc -c out
 '
@@ -62,6 +78,16 @@ test_size 'full repack size' '
 	sort -nr | head -n 1
 '
 
+test_perf 'full repack with --full-name-hash' '
+	git repack -adf --no-write-bitmap-index --full-name-hash
+'
+
+test_size 'full repack size with --full-name-hash' '
+	du -a .git/objects/pack | \
+	awk "{ print \$1; }" | \
+	sort -nr | head -n 1
+'
+
 test_perf 'full repack with --path-walk' '
 	git repack -adf --no-write-bitmap-index --path-walk
 '

From patchwork Tue Sep 10 02:28:53 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13797760
Message-Id: <8df39a432fa682212d53d31389d437e86b4513f6.1725935335.git.gitgitgadget@gmail.com>
Date: Tue, 10 Sep 2024 02:28:53 +0000
Subject: [PATCH 28/30] test-name-hash: add helper to compute name-hash functions
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
    ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
    Derrick Stolee
From: Derrick Stolee

Using this tool, we can count how many distinct name-hash values exist
within a list of paths. Examples include

	git ls-tree -r --name-only HEAD | \
		test-tool name-hash | \
		awk "{print \$1;}" | \
		sort -ns | uniq | wc -l

which outputs the number of distinct name-hash values that appear at HEAD.
Or, the following which presents the resulting name-hash values of maximum
multiplicity:

	git ls-tree -r --name-only HEAD | \
		test-tool name-hash | \
		awk "{print \$1;}" | \
		sort -n | uniq -c | sort -nr | head -n 25

For an internal monorepo with around a quarter million paths at HEAD, the
highest multiplicity for the standard name-hash function was 14,424 while
the full name-hash algorithm had only seven hash values with any collision,
with a maximum multiplicity of two.
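
The two shell pipelines above can be mirrored in a small standalone C
sketch, which may be handy when the hash values are already in memory. The
names count_distinct(), max_multiplicity() and cmp_u32() are hypothetical
and not part of Git; the input is assumed to be an array of 32-bit hash
values, such as the first column printed by 'test-tool name-hash'.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint32_t values; avoids subtraction overflow. */
static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * Rough equivalent of "sort -n | uniq | wc -l": sorts the array in place
 * and returns the number of distinct values.
 */
static size_t count_distinct(uint32_t *hashes, size_t nr)
{
	size_t i, distinct = 0;

	if (!nr)
		return 0;

	qsort(hashes, nr, sizeof(*hashes), cmp_u32);
	for (i = 0; i < nr; i++)
		if (i == 0 || hashes[i] != hashes[i - 1])
			distinct++;
	return distinct;
}

/*
 * Rough equivalent of "sort -n | uniq -c | sort -nr | head -n 1": sorts
 * the array in place and returns the length of the longest run of equal
 * values, that is, the highest multiplicity of any single hash.
 */
static size_t max_multiplicity(uint32_t *hashes, size_t nr)
{
	size_t i, run = 0, best = 0;

	if (!nr)
		return 0;

	qsort(hashes, nr, sizeof(*hashes), cmp_u32);
	for (i = 0; i < nr; i++) {
		run = (i > 0 && hashes[i] == hashes[i - 1]) ? run + 1 : 1;
		if (run > best)
			best = run;
	}
	return best;
}
```

For the array {5, 3, 5, 7, 3, 5}, count_distinct() returns 3 and
max_multiplicity() returns 3. A low distinct count relative to the number of
paths, or a high maximum multiplicity, is the signal that a hash function
collides heavily on a given set of paths.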
Signed-off-by: Derrick Stolee
---
 Makefile                  |  1 +
 t/helper/test-name-hash.c | 23 +++++++++++++++++++++++
 t/helper/test-tool.c      |  1 +
 t/helper/test-tool.h      |  1 +
 4 files changed, 26 insertions(+)
 create mode 100644 t/helper/test-name-hash.c

diff --git a/Makefile b/Makefile
index 154de6e01d0..462aff65a50 100644
--- a/Makefile
+++ b/Makefile
@@ -808,6 +808,7 @@ TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
 TEST_BUILTINS_OBJS += test-match-trees.o
 TEST_BUILTINS_OBJS += test-mergesort.o
 TEST_BUILTINS_OBJS += test-mktemp.o
+TEST_BUILTINS_OBJS += test-name-hash.o
 TEST_BUILTINS_OBJS += test-oid-array.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-pack-mtimes.o
diff --git a/t/helper/test-name-hash.c b/t/helper/test-name-hash.c
new file mode 100644
index 00000000000..c82ccd7cefd
--- /dev/null
+++ b/t/helper/test-name-hash.c
@@ -0,0 +1,23 @@
+/*
+ * test-name-hash.c: Read a list of paths over stdin and report on their
+ * name-hash and full name-hash.
+ */
+
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "pack-objects.h"
+#include "strbuf.h"
+
+int cmd__name_hash(int argc, const char **argv)
+{
+	struct strbuf line = STRBUF_INIT;
+
+	while (!strbuf_getline(&line, stdin)) {
+		uint32_t name_hash = pack_name_hash(line.buf);
+		uint32_t full_hash = pack_full_name_hash(line.buf);
+
+		printf("%10"PRIu32"\t%10"PRIu32"\t%s\n", name_hash, full_hash, line.buf);
+	}
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index f8a67df7de9..4a603921002 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -43,6 +43,7 @@ static struct test_cmd cmds[] = {
 	{ "match-trees", cmd__match_trees },
 	{ "mergesort", cmd__mergesort },
 	{ "mktemp", cmd__mktemp },
+	{ "name-hash", cmd__name_hash },
 	{ "oid-array", cmd__oid_array },
 	{ "online-cpus", cmd__online_cpus },
 	{ "pack-mtimes", cmd__pack_mtimes },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e74bc0ffd41..56a83bf3aac 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -37,6 +37,7 @@ int cmd__lazy_init_name_hash(int argc, const char **argv);
 int cmd__match_trees(int argc, const char **argv);
 int cmd__mergesort(int argc, const char **argv);
 int cmd__mktemp(int argc, const char **argv);
+int cmd__name_hash(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
 int cmd__pack_mtimes(int argc, const char **argv);
 int cmd__parse_options(int argc, const char **argv);

From patchwork Tue Sep 10 02:28:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13797761
Received: from mail-ej1-f48.google.com (mail-ej1-f48.google.com [209.85.218.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D3AA318DF8A for ; Tue, 10 Sep 2024 02:29:25 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.48
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935367; cv=none; b=WoNMn16LZqtW6R/G0JW5wqwOiiDu8F/hS13eDWDlZ3lCy0Dd6dw6l6djkWIZb1EUvExNe5yql1Gwxr6CE0UMmsZwi6akPzBcQ7o5q378JbngLOZb4jVfYFDkYgPvdiiQ5qqxe/5/8GOFSleQMaAuaEbNn4v2mJin7BG1fEDSQ7Y=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935367; c=relaxed/simple; bh=ifQ83hbRE5yqDTpJrNdcF14o3YllgMV5MBsIYbFKVxA=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=rzvjMT7v+sf9yXTKSkZmkFs74FNSZZPKnRWuww0UWjNmYuv0+otBtvBNjoDSrglEeUjlvA89IW8dIKGA2FndbP7vyoWFCx7w1kD6/B3mbzRNyEEr6lqYkOAaYvaEoUKLG0Tam92o7grm5J1N149RqwWCDTWKMmUAM2h/6EUPbOI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=GHCZ/WrT; arc=none smtp.client-ip=209.85.218.48
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="GHCZ/WrT" Received: by mail-ej1-f48.google.com with SMTP id a640c23a62f3a-a8d446adf6eso264418066b.2 for ; Mon, 09 Sep 2024 19:29:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935364; x=1726540164; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=K4Mnie54Dh3iZv+PphkgpjdEIdhOk5JVZexOv4i+0ow=; b=GHCZ/WrTqaWVLqDUQ9rX6+ofOQvEYPrSSXPqUjPxlvOEX6FYTA65/gfInGiox5Gnvb o8y71oXJgxDgyBcyR/uxsI9uq0z2cL+EzhWXQK5Yfg+ZwstI61MRAVBqOwcpxxpaR9r2 bEtcgUguPZLBKxDSwle9mIswfb7//LdhEM06eVcCjFL7j7ZRilKqDKoihN3nxdIGYm2z nE5kJftb1q0uUwCiTea7WPETEupGnePZLLXIDTrGGOoLA+5cad9Qb1RitMAXjhqe/eeB G5kvRoRnKqjaA+9+z6hnmk69jH80jPrYxf+ps9uC8TfnhRYF624Tp7/Hcokn39Os0orw wqnA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935364; x=1726540164; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=K4Mnie54Dh3iZv+PphkgpjdEIdhOk5JVZexOv4i+0ow=; b=EZFPjTsJQE885jEcaHLgDsbwx2tGhW+IQgaAzeHL6dRc6kv7xHoZxOu36z6MkCnSZ5 JhOHRCLn7V0t6KXyGMVW9CD2EQ5W8lO2sHP76BlITQdbnn3s7yMsxn9kblVZFkHu6fYO WEvj2rbgz+VK+GzSIN8DtBT3tbbdO39SsIcsAzGmT0qSZOOMZ2mqhg3YyZj6cIW+BPp7 DdkyxeSt0ju6AvD5dgFrU9MoaBwAtD1mncVZo5WhZ2PkvTMKkNsj+4tzPrkllvxM8cBp 5kk8rDH5ZPoisFJiCmCfUByre5jIn6TVfITX9bTZwyU0uDIZbgaeGUUTlAO9YE2i3cD7 RBcA== X-Gm-Message-State: AOJu0Yxt7Gql2xh9R6PW5t4Y4echaT9o+/pO4ZDyzjEJT41xLdxt5ixg l4JfHS+zgfAWw4wvZ3LVd3d5Tzj2cERePLjdx77aRt+zwUTL7VPqnj+PCw== X-Google-Smtp-Source: 
AGHT+IHjJ9hmJmrn8q+ft3O83udhA3wLHXFpQw57OcsPf9qsoX+OuxcpT2hQmpCttWA6U7m93d+Lsg==
X-Received: by 2002:a17:907:d2df:b0:a86:43c0:4270 with SMTP id a640c23a62f3a-a8d245135a8mr613095666b.13.1725935363588; Mon, 09 Sep 2024 19:29:23 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8d25926eddsm415283766b.49.2024.09.09.19.29.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:23 -0700 (PDT)
Message-Id: <5dcb20a1c5c1e6f5dd676c54fa6b001af9abe072.1725935335.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Tue, 10 Sep 2024 02:28:54 +0000
Subject: [PATCH 29/30] p5314: add a size test for name-hash collisions
Fcc: Sent
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

This test reports how the name-hash algorithms behave on a given
repository, based on the paths at HEAD.

For example, the microsoft/fluentui repo had these statistics at time of
committing:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                 19.6K
5314.2: number of distinct name-hashes                 8.2K
5314.3: number of distinct full-name-hashes           19.6K
5314.4: maximum multiplicity of name-hashes             279
5314.5: maximum multiplicity of fullname-hashes           1

This shows that the nearly twenty thousand path names are assigned only
around eight thousand distinct name-hash values. As many as 279 paths
share a single value, leading the packing algorithm to sort objects from
those paths together, by size.

In this repository, no collisions occur for the full-name-hash
algorithm.

In a more extreme example, an internal monorepo had a much worse
collision rate:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                221.6K
5314.2: number of distinct name-hashes                72.0K
5314.3: number of distinct full-name-hashes          221.6K
5314.4: maximum multiplicity of name-hashes           14.4K
5314.5: maximum multiplicity of fullname-hashes           2

Even in this repository, with many more paths at HEAD, the
full-name-hash collision rate was low: the maximum number of paths
grouped into a single bucket by the full-name-hash algorithm was two.

Signed-off-by: Derrick Stolee
---
 t/perf/p5314-name-hash.sh | 41 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100755 t/perf/p5314-name-hash.sh

diff --git a/t/perf/p5314-name-hash.sh b/t/perf/p5314-name-hash.sh
new file mode 100755
index 00000000000..9fe26612fac
--- /dev/null
+++ b/t/perf/p5314-name-hash.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+test_description='Tests pack performance using bitmaps'
+.
./perf-lib.sh + +GIT_TEST_PASSING_SANITIZE_LEAK=0 +export GIT_TEST_PASSING_SANITIZE_LEAK + +test_perf_large_repo + +test_size 'paths at head' ' + git ls-tree -r --name-only HEAD >path-list && + wc -l name-hashes && + cat name-hashes | awk "{ print \$1; }" | sort -n | uniq -c >name-hash-count && + wc -l full-name-hash-count && + wc -l X-Patchwork-Id: 13797762 Received: from mail-ej1-f51.google.com (mail-ej1-f51.google.com [209.85.218.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B70DB18E36E for ; Tue, 10 Sep 2024 02:29:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935368; cv=none; b=F+BhsYtM0RBV8aA3/MZhvM/L61eSZpxp0VuygW6UnkgS3Ki/7D4TVvoFQraLwEeOyFxtIgH9GsM9qMSJLlSozV6AppQ91apqNrc1ZbJB5Bic9O9OQHOPQ1Y9Nr9nqITySdjb1xE083uOmM3lTiv07mY+aGJgV01owTAIpAAratc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725935368; c=relaxed/simple; bh=m4MLzlye2TCdFcxQ5yXjI9cu8FgC9uqy1O7ylsOZ7E4=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=fY3shDNHx9xBpWd8sZl5ID32Z8JJb3NrDf1TonkBgplcdu3v6NxZgKA6l2i/rX4qlenFkd1p9E/nEoQugSkIbG1vTJPLWDnAUOZk7vIWHLVuyerk4y7mrWELmoY07S9zLFcPxlhoeZJfSa0pC/ZhW5zUOb4a+iE3xsMEM3BXnss= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=dt5n9PWX; arc=none smtp.client-ip=209.85.218.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com 
header.b="dt5n9PWX" Received: by mail-ej1-f51.google.com with SMTP id a640c23a62f3a-a8d56155f51so20067566b.2 for ; Mon, 09 Sep 2024 19:29:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725935364; x=1726540164; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=RthP71xYYsrFkCGp0OJL2Lstp0libMEfFDTaMFMxoWQ=; b=dt5n9PWXCEUeyaHuRVPQ+UFHCCXjB45JwnMS9ndlWrYGhpbvYfoLPP9/AHoV0iiYX2 pLAxOKg+j63XYmP5a7iWFxJhfCkC83c45726fI9I10p5iolRo8mmUE/F8HI8ETKAeIQ9 4GOqFMy2AGHAnIe2lItHW3DWkTlEvLredA+JKIr6S5jMJrBffkMezFgYnShFRNBsNO4y 4Z+gVNstMy+Q+iouI9eKiFLevBt+8U4IOGgoqq6YUH2jb5Ir6XuUIAIt+l7kGXqTOlbh PAFPr6vP9ZeX8hlnA0ggVDupF+DzORFI8UbyzXL+lv3KsoLNR5vAHM4FCq9VcvNgC+zQ 22rA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725935364; x=1726540164; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RthP71xYYsrFkCGp0OJL2Lstp0libMEfFDTaMFMxoWQ=; b=Stgmj4jmYrYXS4cL8WN6+7vo10Ipy35B0onWYbcJC/7iUZx88cCalUB8pBaHWLiqzB d2OaeENXjOKbn//HXh6y/4D+qRe02OckalxAZJkgasp6QVki5aP3MHwn0e6wQ6TY56sK d4F3HL/2q/vJfjiu9OIdFsm1lwRU/L2vItRvY8pgupnOQoJvaxEK5Hev/qgfa3QhkApk teZcykDMtYTBnJldUXT6roq17+TMx+0+vYwMiNdQ2zkQHzf0PQ9Ji9HFbYeeWOWF018B NFeBJDqCPNaCE6WcEsEGsLvRpBJKWpNy0GtZFOIk1DPa/oOm6XUQ3KX2duJvh15KrYtg OvOQ== X-Gm-Message-State: AOJu0YzRl1dmMtpttpPj3AcRccaTXJJyWsoZcdm1hBHOQ+CemIIwUV62 ipsUYmcxsK6kPfDwhhUzROagdN9y9k6GczlyuH+0PFHmNuHR9K1kHy/zlQ== X-Google-Smtp-Source: AGHT+IFSnl45fi6DQTSgP/TDnnv77vEVngJUWqC+rprlJRKQ+cLSffCR/hT9OgVK3b7ZNYNp8m42BQ== X-Received: by 2002:a17:907:60d4:b0:a8d:64af:dc2a with SMTP id a640c23a62f3a-a8d64afdcafmr334682566b.25.1725935364290; Mon, 09 Sep 2024 19:29:24 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 
a640c23a62f3a-a8d25ced18csm412235866b.161.2024.09.09.19.29.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Sep 2024 19:29:23 -0700 (PDT)
Message-Id: <460feef90fdd869b42e3663a1a1336a8ae663bc0.1725935335.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Tue, 10 Sep 2024 02:28:55 +0000
Subject: [PATCH 30/30] pack-objects: output debug info about deltas
Fcc: Sent
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

To help debug what is going on during delta calculations, add a
--delta-file=<file> option to 'git pack-objects'. When it is given, a
JSON-formatted description of the delta information is written to that
file.

Signed-off-by: Derrick Stolee
---
 builtin/pack-objects.c | 69 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5d5a57e6b1f..7d1dd5a6557 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -50,6 +50,9 @@
  */
 static struct packing_data to_pack;
 
+static FILE *delta_file;
+static int delta_file_nr;
+
 static inline struct object_entry *oe_delta(
 		const struct packing_data *pack,
 		const struct object_entry *e)
@@ -516,6 +519,14 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
 	hdrlen = encode_in_pack_object_header(header, sizeof(header),
 					      type, size);
 
+	if (delta_file) {
+		if (delta_file_nr++)
+			fprintf(delta_file, ",\n");
+		fprintf(delta_file, "\t\t{\n");
+		fprintf(delta_file, "\t\t\t\"oid\" : \"%s\",\n", oid_to_hex(&entry->idx.oid));
+		fprintf(delta_file, "\t\t\t\"size\" : %"PRIuMAX",\n", datalen);
+	}
+
 	if (type == OBJ_OFS_DELTA) {
 		/*
 		 * Deltas with relative base contain an additional
@@ -536,6 +547,11 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
 		hashwrite(f, header, hdrlen);
 		hashwrite(f, dheader + pos, sizeof(dheader) - pos);
 		hdrlen += sizeof(dheader) - pos;
+		if (delta_file) {
+			fprintf(delta_file, "\t\t\t\"delta_type\" : \"OFS\",\n");
+			fprintf(delta_file, "\t\t\t\"offset\" : %"PRIuMAX",\n", ofs);
+			fprintf(delta_file, "\t\t\t\"delta_base\" : \"%s\",\n", oid_to_hex(&DELTA(entry)->idx.oid));
+		}
 	} else if (type == OBJ_REF_DELTA) {
 		/*
 		 * Deltas with a base reference contain
@@ -550,6 +566,10 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
 		hashwrite(f, header, hdrlen);
 		hashwrite(f, DELTA(entry)->idx.oid.hash, hashsz);
 		hdrlen += hashsz;
+		if (delta_file) {
+			fprintf(delta_file, "\t\t\t\"delta_type\" : \"REF\",\n");
+			fprintf(delta_file, "\t\t\t\"delta_base\" : \"%s\",\n", oid_to_hex(&DELTA(entry)->idx.oid));
+		}
 	} else {
 		if (limit && hdrlen + datalen + hashsz >= limit) {
 			if (st)
@@ -559,6 +579,10 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent
 		}
 		hashwrite(f, header, hdrlen);
 	}
+
+	if (delta_file)
+		fprintf(delta_file, "\t\t\t\"reused\" : false\n\t\t}");
+
 	if (st) {
 		datalen = write_large_blob_data(st, f, &entry->idx.oid);
 		close_istream(st);
@@ -619,6 +643,14 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
 		return write_no_reuse_object(f, entry, limit, usable_delta);
 	}
 
+	if (delta_file) {
+		if (delta_file_nr++)
+			fprintf(delta_file, ",\n");
+		fprintf(delta_file, "\t\t{\n");
+		fprintf(delta_file, "\t\t\t\"oid\" : \"%s\",\n", oid_to_hex(&entry->idx.oid));
+		fprintf(delta_file, "\t\t\t\"size\" : %"PRIuMAX",\n", entry_size);
+	}
+
 	if (type == OBJ_OFS_DELTA) {
 		off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
 		unsigned pos = sizeof(dheader) - 1;
@@ -633,6 +665,12 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
 		hashwrite(f, dheader + pos, sizeof(dheader) - pos);
 		hdrlen += sizeof(dheader) - pos;
 		reused_delta++;
+
+		if (delta_file) {
+			fprintf(delta_file, "\t\t\t\"delta_type\" : \"OFS\",\n");
+			fprintf(delta_file, "\t\t\t\"offset\" : %"PRIuMAX",\n", ofs);
+			fprintf(delta_file, "\t\t\t\"delta_base\" : \"%s\",\n", oid_to_hex(&DELTA(entry)->idx.oid));
+		}
 	} else if (type == OBJ_REF_DELTA) {
 		if (limit && hdrlen + hashsz + datalen + hashsz >= limit) {
 			unuse_pack(&w_curs);
@@ -642,6 +680,10 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
 		hashwrite(f, DELTA(entry)->idx.oid.hash, hashsz);
 		hdrlen += hashsz;
 		reused_delta++;
+		if (delta_file) {
+			fprintf(delta_file, "\t\t\t\"delta_type\" : \"REF\",\n");
+			fprintf(delta_file, "\t\t\t\"delta_base\" : \"%s\",\n", oid_to_hex(&DELTA(entry)->idx.oid));
+		}
 	} else {
 		if (limit && hdrlen + datalen + hashsz >= limit) {
 			unuse_pack(&w_curs);
@@ -652,6 +694,10 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
 	copy_pack_data(f, p, &w_curs, offset, datalen);
 	unuse_pack(&w_curs);
 	reused++;
+
+	if (delta_file)
+		fprintf(delta_file, "\t\t\t\"reused\" : true\n\t\t}");
+
 	return hdrlen + datalen;
 }
 
@@ -1264,6 +1310,11 @@ static void write_pack_file(void)
 	ALLOC_ARRAY(written_list, to_pack.nr_objects);
 	write_order = compute_write_order();
 
+	if (delta_file) {
+		fprintf(delta_file, "{\n\t\"num_objects\" : %"PRIu32",\n", to_pack.nr_objects);
+		fprintf(delta_file, "\t\"objects\" : [\n");
+	}
+
 	do {
 		unsigned char hash[GIT_MAX_RAWSZ];
 		char *pack_tmp_name = NULL;
@@ -1412,6 +1463,9 @@ static void write_pack_file(void)
 		    written, nr_result);
 	trace2_data_intmax("pack-objects", the_repository,
 			   "write_pack_file/wrote", nr_result);
+
+	if (delta_file)
+		fprintf(delta_file, "\n\t]\n}");
 }
 
 static int no_try_delta(const char *path)
@@ -4430,6 +4484,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	struct list_objects_filter_options filter_options =
 		LIST_OBJECTS_FILTER_INIT;
+	const char *delta_file_name = NULL;
 
 	struct option pack_objects_options[] = {
 		OPT_CALLBACK_F('q', "quiet", &progress, NULL,
@@ -4536,6 +4591,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
 		OPT_BOOL(0, "full-name-hash", &use_full_name_hash,
 			 N_("optimize delta compression across identical path names over time")),
+		OPT_STRING(0, "delta-file", &delta_file_name,
+			   N_("filename"),
+			   N_("output delta compression details to the given file")),
 		OPT_END(),
 	};
 
@@ -4573,6 +4631,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (pack_to_stdout != !base_name || argc)
 		usage_with_options(pack_usage, pack_objects_options);
 
+	if (delta_file_name) {
+		delta_file = fopen(delta_file_name, "w");
+		if (!delta_file)
+			die_errno("failed to open '%s'", delta_file_name);
+		trace2_printf("opened '%s' for writing deltas", delta_file_name);
+	}
 	if (depth < 0)
 		depth = 0;
 	if (depth >= (1 << OE_DEPTH_BITS)) {
@@ -4796,5 +4860,10 @@ cleanup:
 	list_objects_filter_release(&filter_options);
 	strvec_clear(&rp);
 
+	if (delta_file) {
+		fflush(delta_file);
+		fclose(delta_file);
+	}
+
 	return 0;
 }
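
The fprintf() calls above are spread over several functions, so the
overall shape of the emitted JSON is easy to lose track of. The sketch
below gathers the same layout into one standalone function, using the
same idiom of writing ",\n" before every array element except the
first. It is an illustration only: struct delta_record and
write_delta_json() are hypothetical names, the delta_type, offset and
delta_base fields are left out, and none of this is code from the patch.

```c
#include <stdio.h>

/* Hypothetical record type; the patch itself uses struct object_entry. */
struct delta_record {
	const char *oid;	/* hex object id */
	unsigned long size;	/* size as written to the pack */
	int reused;		/* nonzero if existing pack data was reused */
};

/*
 * Sketch only: write 'nr' records to 'out' in the same overall layout
 * as the patch. The separator is written before each element after the
 * first, so the array never ends with a trailing comma.
 */
static void write_delta_json(FILE *out, const struct delta_record *recs,
			     unsigned int nr)
{
	unsigned int i;

	fprintf(out, "{\n\t\"num_objects\" : %u,\n", nr);
	fprintf(out, "\t\"objects\" : [\n");
	for (i = 0; i < nr; i++) {
		if (i)
			fprintf(out, ",\n");
		fprintf(out, "\t\t{\n");
		fprintf(out, "\t\t\t\"oid\" : \"%s\",\n", recs[i].oid);
		fprintf(out, "\t\t\t\"size\" : %lu,\n", recs[i].size);
		fprintf(out, "\t\t\t\"reused\" : %s\n\t\t}",
			recs[i].reused ? "true" : "false");
	}
	fprintf(out, "\n\t]\n}");
}
```

Calling write_delta_json(stdout, recs, nr) with a small array of
records prints one JSON object containing "num_objects" and an
"objects" array with one entry per record.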