From patchwork Sun Oct 20 13:43:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13843078
Message-Id: <98bdc94a773d797152bebfbfda88fa1bb0707821.1729431810.git.gitgitgadget@gmail.com>
Date: Sun, 20 Oct 2024 13:43:14 +0000
Subject: [PATCH v2 01/17] path-walk: introduce an object walk by path
X-Mailing-List: git@vger.kernel.org
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
 ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
 christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com,
 Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

In anticipation of a few planned applications, introduce the most basic
form of a path-walk API. It currently assumes that there are no
UNINTERESTING objects and does not include any complicated filters. It
calls a function pointer on groups of tree and blob objects as grouped
by path. This only includes objects the first time they are discovered,
so an object that appears at multiple paths will not be included in two
batches.

There are many future adaptations that could be made, but they are left
for future updates when consumers are ready to take advantage of those
features.

Signed-off-by: Derrick Stolee
---
 Documentation/technical/api-path-walk.txt |  54 +++++
 Makefile                                  |   1 +
 path-walk.c                               | 241 ++++++++++++++++++++++
 path-walk.h                               |  43 ++++
 4 files changed, 339 insertions(+)
 create mode 100644 Documentation/technical/api-path-walk.txt
 create mode 100644 path-walk.c
 create mode 100644 path-walk.h

diff --git a/Documentation/technical/api-path-walk.txt b/Documentation/technical/api-path-walk.txt
new file mode 100644
index 00000000000..6472222ae6d
--- /dev/null
+++ b/Documentation/technical/api-path-walk.txt
@@ -0,0 +1,54 @@
+Path-Walk API
+=============
+
+The path-walk API is used to walk reachable objects, but to visit objects
+in batches based on a common path they appear in, or by type.
+
+For example, all reachable commits are visited in a group. All tags are
+visited in a group. Then, all root trees are visited. At some point, all
+blobs reachable via a path `my/dir/to/A` are visited. When there are
+multiple paths possible to reach the same object, then only one of those
+paths is used to visit the object.
+
+When walking a range of commits with some `UNINTERESTING` objects, the
+objects with the `UNINTERESTING` flag are included in these batches. In
+order to walk `UNINTERESTING` objects, the `--boundary` option must be
+used in the commit walk in order to visit `UNINTERESTING` commits.
+
+Basics
+------
+
+To use the path-walk API, include `path-walk.h` and call
+`walk_objects_by_path()` with a customized `path_walk_info` struct. The
+struct is used to set all of the options for how the walk should proceed.
+Let's dig into the different options and their use.
+
+`path_fn` and `path_fn_data`::
+	The most important option is the `path_fn` option, which is a
+	function pointer to the callback that can execute logic on the
+	object IDs for objects grouped by type and path. This function
+	also receives a `data` value that corresponds to the
+	`path_fn_data` member, for providing custom data structures to
+	this callback function.
+
+`revs`::
+	To configure the exact details of the reachable set of objects,
+	use the `revs` member and initialize it using the revision
+	machinery in `revision.h`. Initialize `revs` using calls such as
+	`setup_revisions()` or `parse_revision_opt()`. Do not call
+	`prepare_revision_walk()`, as that will be called within
+	`walk_objects_by_path()`.
++
+It is also important that you do not specify the `--objects` flag for the
+`revs` struct. The revision walk should only be used to walk commits, and
+the objects will be walked in a separate way based on those starting
+commits.
++
+If you want the path-walk API to emit `UNINTERESTING` objects based on the
+commit walk's boundary, be sure to set `revs.boundary` so the boundary
+commits are emitted.
+
+Examples
+--------
+
+See example usages in future changes.
diff --git a/Makefile b/Makefile
index 7344a7f7257..d0d8d6888e3 100644
--- a/Makefile
+++ b/Makefile
@@ -1094,6 +1094,7 @@ LIB_OBJS += parse-options.o
 LIB_OBJS += patch-delta.o
 LIB_OBJS += patch-ids.o
 LIB_OBJS += path.o
+LIB_OBJS += path-walk.o
 LIB_OBJS += pathspec.o
 LIB_OBJS += pkt-line.o
 LIB_OBJS += preload-index.o
diff --git a/path-walk.c b/path-walk.c
new file mode 100644
index 00000000000..66840187e28
--- /dev/null
+++ b/path-walk.c
@@ -0,0 +1,241 @@
+/*
+ * path-walk.c: implementation for path-based walks of the object graph.
+ */
+#include "git-compat-util.h"
+#include "path-walk.h"
+#include "blob.h"
+#include "commit.h"
+#include "dir.h"
+#include "hashmap.h"
+#include "hex.h"
+#include "object.h"
+#include "oid-array.h"
+#include "revision.h"
+#include "string-list.h"
+#include "strmap.h"
+#include "trace2.h"
+#include "tree.h"
+#include "tree-walk.h"
+
+struct type_and_oid_list
+{
+	enum object_type type;
+	struct oid_array oids;
+};
+
+#define TYPE_AND_OID_LIST_INIT { \
+	.type = OBJ_NONE, \
+	.oids = OID_ARRAY_INIT \
+}
+
+struct path_walk_context {
+	/**
+	 * Repeats of data in 'struct path_walk_info' for
+	 * access with fewer characters.
+	 */
+	struct repository *repo;
+	struct rev_info *revs;
+	struct path_walk_info *info;
+
+	/**
+	 * Map a path to a 'struct type_and_oid_list'
+	 * containing the objects discovered at that
+	 * path.
+	 */
+	struct strmap paths_to_lists;
+
+	/**
+	 * Store the current list of paths in a stack, to
+	 * facilitate depth-first-search without recursion.
+	 */
+	struct string_list path_stack;
+};
+
+static int add_children(struct path_walk_context *ctx,
+			const char *base_path,
+			struct object_id *oid)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	struct strbuf path = STRBUF_INIT;
+	size_t base_len;
+	struct tree *tree = lookup_tree(ctx->repo, oid);
+
+	if (!tree) {
+		error(_("failed to walk children of tree %s: not found"),
+		      oid_to_hex(oid));
+		return -1;
+	} else if (parse_tree_gently(tree, 1)) {
+		die("bad tree object %s", oid_to_hex(oid));
+	}
+
+	strbuf_addstr(&path, base_path);
+	base_len = path.len;
+
+	parse_tree(tree);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
+	while (tree_entry(&desc, &entry)) {
+		struct type_and_oid_list *list;
+		struct object *o;
+		/* Not actually true, but we will ignore submodules later. */
+		enum object_type type = S_ISDIR(entry.mode) ? OBJ_TREE : OBJ_BLOB;
+
+		/* Skip submodules. */
+		if (S_ISGITLINK(entry.mode))
+			continue;
+
+		if (type == OBJ_TREE) {
+			struct tree *child = lookup_tree(ctx->repo, &entry.oid);
+			o = child ? &child->object : NULL;
+		} else if (type == OBJ_BLOB) {
+			struct blob *child = lookup_blob(ctx->repo, &entry.oid);
+			o = child ? &child->object : NULL;
+		} else {
+			/* Wrong type? */
+			continue;
+		}
+
+		if (!o) /* report error?*/
+			continue;
+
+		/* Skip this object if already seen. */
+		if (o->flags & SEEN)
+			continue;
+		o->flags |= SEEN;
+
+		strbuf_setlen(&path, base_len);
+		strbuf_add(&path, entry.path, entry.pathlen);
+
+		/*
+		 * Trees will end with "/" for concatenation and distinction
+		 * from blobs at the same path.
+		 */
+		if (type == OBJ_TREE)
+			strbuf_addch(&path, '/');
+
+		if (!(list = strmap_get(&ctx->paths_to_lists, path.buf))) {
+			CALLOC_ARRAY(list, 1);
+			list->type = type;
+			strmap_put(&ctx->paths_to_lists, path.buf, list);
+			string_list_append(&ctx->path_stack, path.buf);
+		}
+		oid_array_append(&list->oids, &entry.oid);
+	}
+
+	free_tree_buffer(tree);
+	strbuf_release(&path);
+	return 0;
+}
+
+/*
+ * For each path in paths_to_explore, walk the trees another level
+ * and add any found blobs to the batch (but only if they exist and
+ * haven't been added yet).
+ */
+static int walk_path(struct path_walk_context *ctx,
+		     const char *path)
+{
+	struct type_and_oid_list *list;
+	int ret = 0;
+
+	list = strmap_get(&ctx->paths_to_lists, path);
+
+	/* Evaluate function pointer on this data. */
+	ret = ctx->info->path_fn(path, &list->oids, list->type,
+				 ctx->info->path_fn_data);
+
+	/* Expand data for children. */
+	if (list->type == OBJ_TREE) {
+		for (size_t i = 0; i < list->oids.nr; i++) {
+			ret |= add_children(ctx,
+					    path,
+					    &list->oids.oid[i]);
+		}
+	}
+
+	oid_array_clear(&list->oids);
+	strmap_remove(&ctx->paths_to_lists, path, 1);
+	return ret;
+}
+
+static void clear_strmap(struct strmap *map)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	hashmap_for_each_entry(&map->map, &iter, e, ent) {
+		struct type_and_oid_list *list = e->value;
+		oid_array_clear(&list->oids);
+	}
+	strmap_clear(map, 1);
+	strmap_init(map);
+}
+
+/**
+ * Given the configuration of 'info', walk the commits based on 'info->revs' and
+ * call 'info->path_fn' on each discovered path.
+ *
+ * Returns nonzero on an error.
+ */
+int walk_objects_by_path(struct path_walk_info *info)
+{
+	const char *root_path = "";
+	int ret = 0;
+	size_t commits_nr = 0, paths_nr = 0;
+	struct commit *c;
+	struct type_and_oid_list *root_tree_list;
+	struct path_walk_context ctx = {
+		.repo = info->revs->repo,
+		.revs = info->revs,
+		.info = info,
+		.path_stack = STRING_LIST_INIT_DUP,
+		.paths_to_lists = STRMAP_INIT
+	};
+
+	trace2_region_enter("path-walk", "commit-walk", info->revs->repo);
+
+	/* Insert a single list for the root tree into the paths. */
+	CALLOC_ARRAY(root_tree_list, 1);
+	root_tree_list->type = OBJ_TREE;
+	strmap_put(&ctx.paths_to_lists, root_path, root_tree_list);
+
+	if (prepare_revision_walk(info->revs))
+		die(_("failed to setup revision walk"));
+
+	while ((c = get_revision(info->revs))) {
+		struct object_id *oid = get_commit_tree_oid(c);
+		struct tree *t = lookup_tree(info->revs->repo, oid);
+		commits_nr++;
+
+		if (t) {
+			if (t->object.flags & SEEN)
+				continue;
+			t->object.flags |= SEEN;
+			oid_array_append(&root_tree_list->oids, oid);
+		} else {
+			warning("could not find tree %s", oid_to_hex(oid));
+		}
+	}
+
+	trace2_data_intmax("path-walk", ctx.repo, "commits", commits_nr);
+	trace2_region_leave("path-walk", "commit-walk", info->revs->repo);
+
+	string_list_append(&ctx.path_stack, root_path);
+
+	trace2_region_enter("path-walk", "path-walk", info->revs->repo);
+	while (!ret && ctx.path_stack.nr) {
+		char *path = ctx.path_stack.items[ctx.path_stack.nr - 1].string;
+		ctx.path_stack.nr--;
+		paths_nr++;
+
+		ret = walk_path(&ctx, path);
+
+		free(path);
+	}
+	trace2_data_intmax("path-walk", ctx.repo, "paths", paths_nr);
+	trace2_region_leave("path-walk", "path-walk", info->revs->repo);
+
+	clear_strmap(&ctx.paths_to_lists);
+	string_list_clear(&ctx.path_stack, 0);
+	return ret;
+}
diff --git a/path-walk.h b/path-walk.h
new file mode 100644
index 00000000000..c9e94a98bc8
--- /dev/null
+++ b/path-walk.h
@@ -0,0 +1,43 @@
+/*
+ * path-walk.h : Methods and structures for walking the object graph in batches
+ * by the paths that can reach those objects.
+ */
+#include "object.h" /* Required for 'enum object_type'. */
+
+struct rev_info;
+struct oid_array;
+
+/**
+ * The type of a function pointer for the method that is called on a list of
+ * objects reachable at a given path.
+ */
+typedef int (*path_fn)(const char *path,
+		       struct oid_array *oids,
+		       enum object_type type,
+		       void *data);
+
+struct path_walk_info {
+	/**
+	 * revs provides the definitions for the commit walk, including
+	 * which commits are UNINTERESTING or not.
+	 */
+	struct rev_info *revs;
+
+	/**
+	 * The caller wishes to execute custom logic on objects reachable at a
+	 * given path. Every reachable object will be visited exactly once, and
+	 * the first path to see an object wins. This may not be a stable choice.
+	 */
+	path_fn path_fn;
+	void *path_fn_data;
+};
+
+#define PATH_WALK_INFO_INIT { 0 }
+
+/**
+ * Given the configuration of 'info', walk the commits based on 'info->revs' and
+ * call 'info->path_fn' on each discovered path.
+ *
+ * Returns nonzero on an error.
+ */
+int walk_objects_by_path(struct path_walk_info *info);

From patchwork Sun Oct 20 13:43:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13843079
Date: Sun, 20 Oct 2024 13:43:15 +0000
Subject: [PATCH v2 02/17] t6601: add helper for testing path-walk API
X-Mailing-List: git@vger.kernel.org
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
 ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
 christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com,
 Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.

Signed-off-by: Derrick Stolee
---
 Documentation/technical/api-path-walk.txt |   3 +-
 Makefile                                  |   1 +
 t/helper/test-path-walk.c                 |  86 ++++++++++++++
 t/helper/test-tool.c                      |   1 +
 t/helper/test-tool.h                      |   1 +
 t/t6601-path-walk.sh                      | 130 ++++++++++++++++++++++
 6 files changed, 221 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-path-walk.c
 create mode 100755 t/t6601-path-walk.sh

diff --git a/Documentation/technical/api-path-walk.txt b/Documentation/technical/api-path-walk.txt
index 6472222ae6d..e588897ab8d 100644
--- a/Documentation/technical/api-path-walk.txt
+++ b/Documentation/technical/api-path-walk.txt
@@ -51,4 +51,5 @@ commits are emitted.
 Examples
 --------
 
-See example usages in future changes.
+See example usages in:
+	`t/helper/test-path-walk.c`
diff --git a/Makefile b/Makefile
index d0d8d6888e3..50413d96492 100644
--- a/Makefile
+++ b/Makefile
@@ -818,6 +818,7 @@ TEST_BUILTINS_OBJS += test-parse-options.o
 TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
 TEST_BUILTINS_OBJS += test-partial-clone.o
 TEST_BUILTINS_OBJS += test-path-utils.o
+TEST_BUILTINS_OBJS += test-path-walk.o
 TEST_BUILTINS_OBJS += test-pcre2-config.o
 TEST_BUILTINS_OBJS += test-pkt-line.o
 TEST_BUILTINS_OBJS += test-proc-receive.o
diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c
new file mode 100644
index 00000000000..3c48f017fa0
--- /dev/null
+++ b/t/helper/test-path-walk.c
@@ -0,0 +1,86 @@
+#define USE_THE_REPOSITORY_VARIABLE
+
+#include "test-tool.h"
+#include "environment.h"
+#include "hex.h"
+#include "object-name.h"
+#include "object.h"
+#include "pretty.h"
+#include "revision.h"
+#include "setup.h"
+#include "parse-options.h"
+#include "path-walk.h"
+#include "oid-array.h"
+
+static const char * const path_walk_usage[] = {
+	N_("test-tool path-walk -- "),
+	NULL
+};
+
+struct path_walk_test_data {
+	uintmax_t tree_nr;
+	uintmax_t blob_nr;
+};
+
+static int emit_block(const char *path, struct oid_array *oids,
+		      enum object_type type, void *data)
+{
+	struct path_walk_test_data *tdata = data;
+	const char *typestr;
+
+	switch (type) {
+	case OBJ_TREE:
+		typestr = "TREE";
+		tdata->tree_nr += oids->nr;
+		break;
+
+	case OBJ_BLOB:
+		typestr = "BLOB";
+		tdata->blob_nr += oids->nr;
+		break;
+
+	default:
+		BUG("we do not understand this type");
+	}
+
+	for (size_t i = 0; i < oids->nr; i++)
+		printf("%s:%s:%s\n", typestr, path, oid_to_hex(&oids->oid[i]));
+
+	return 0;
+}
+
+int cmd__path_walk(int argc, const char **argv)
+{
+	int res;
+	struct rev_info revs = REV_INFO_INIT;
+	struct path_walk_info info = PATH_WALK_INFO_INIT;
+	struct path_walk_test_data data = { 0 };
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	initialize_repository(the_repository);
+	setup_git_directory();
+	revs.repo = the_repository;
+
+	argc = parse_options(argc, argv, NULL,
+			     options, path_walk_usage,
+			     PARSE_OPT_KEEP_UNKNOWN_OPT | PARSE_OPT_KEEP_ARGV0);
+
+	if (argc > 1)
+		setup_revisions(argc, argv, &revs, NULL);
+	else
+		usage(path_walk_usage[0]);
+
+	info.revs = &revs;
+	info.path_fn = emit_block;
+	info.path_fn_data = &data;
+
+	res = walk_objects_by_path(&info);
+
+	printf("trees:%" PRIuMAX "\n"
+	       "blobs:%" PRIuMAX "\n",
+	       data.tree_nr, data.blob_nr);
+
+	return res;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 1ebb69a5dc4..43676e7b93a 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -52,6 +52,7 @@ static struct test_cmd cmds[] = {
 	{ "parse-subcommand", cmd__parse_subcommand },
 	{ "partial-clone", cmd__partial_clone },
 	{ "path-utils", cmd__path_utils },
+	{ "path-walk", cmd__path_walk },
 	{ "pcre2-config", cmd__pcre2_config },
 	{ "pkt-line", cmd__pkt_line },
 	{ "proc-receive", cmd__proc_receive },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 21802ac27da..9cfc5da6e57 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -45,6 +45,7 @@ int cmd__parse_pathspec_file(int argc, const char** argv);
 int cmd__parse_subcommand(int argc, const char **argv);
 int cmd__partial_clone(int argc, const char **argv);
 int cmd__path_utils(int argc, const char **argv);
+int cmd__path_walk(int argc, const char **argv);
 int cmd__pcre2_config(int argc, const char **argv);
 int cmd__pkt_line(int argc, const char **argv);
 int cmd__proc_receive(int argc, const char **argv);
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
new file mode 100755
index 00000000000..ca18b61c3f1
--- /dev/null
+++ b/t/t6601-path-walk.sh
@@ -0,0 +1,130 @@
+#!/bin/sh
+
+test_description='direct path-walk API tests'
+
+. ./test-lib.sh
+
+test_expect_success 'setup test repository' '
+	git checkout -b base &&
+
+	mkdir left &&
+	mkdir right &&
+	echo a >a &&
+	echo b >left/b &&
+	echo c >right/c &&
+	git add . &&
+	git commit -m "first" &&
+
+	echo d >right/d &&
+	git add right &&
+	git commit -m "second" &&
+
+	echo bb >left/b &&
+	git commit -a -m "third" &&
+
+	git checkout -b topic HEAD~1 &&
+	echo cc >right/c &&
+	git commit -a -m "topic"
+'
+
+test_expect_success 'all' '
+	test-tool path-walk -- --all >out &&
+
+	cat >expect <<-EOF &&
+	TREE::$(git rev-parse topic^{tree})
+	TREE::$(git rev-parse base^{tree})
+	TREE::$(git rev-parse base~1^{tree})
+	TREE::$(git rev-parse base~2^{tree})
+	TREE:left/:$(git rev-parse base:left)
+	TREE:left/:$(git rev-parse base~2:left)
+	TREE:right/:$(git rev-parse topic:right)
+	TREE:right/:$(git rev-parse base~1:right)
+	TREE:right/:$(git rev-parse base~2:right)
+	trees:9
+	BLOB:a:$(git rev-parse base~2:a)
+	BLOB:left/b:$(git rev-parse base~2:left/b)
+	BLOB:left/b:$(git rev-parse base:left/b)
+	BLOB:right/c:$(git rev-parse base~2:right/c)
+	BLOB:right/c:$(git rev-parse topic:right/c)
+	BLOB:right/d:$(git rev-parse base~1:right/d)
+	blobs:6
+	EOF
+
+	sort expect >expect.sorted &&
+	sort out >out.sorted &&
+
+	test_cmp expect.sorted out.sorted
+'
+
+test_expect_success 'topic only' '
+	test-tool path-walk -- topic >out &&
+
+	cat >expect <<-EOF &&
+	TREE::$(git rev-parse topic^{tree})
+	TREE::$(git rev-parse base~1^{tree})
+	TREE::$(git rev-parse base~2^{tree})
+	TREE:left/:$(git rev-parse base~2:left)
+	TREE:right/:$(git rev-parse topic:right)
+	TREE:right/:$(git rev-parse base~1:right)
+	TREE:right/:$(git rev-parse base~2:right)
+	trees:7
+	BLOB:a:$(git rev-parse base~2:a)
+	BLOB:left/b:$(git rev-parse base~2:left/b)
+	BLOB:right/c:$(git rev-parse base~2:right/c)
+	BLOB:right/c:$(git rev-parse topic:right/c)
+	BLOB:right/d:$(git rev-parse base~1:right/d)
+	blobs:5
+	EOF
+
+	sort expect >expect.sorted &&
+	sort out >out.sorted &&
+
+	test_cmp expect.sorted out.sorted
+'
+
+test_expect_success 'topic, not base' '
+	test-tool path-walk -- topic --not base >out &&
+
+	cat >expect <<-EOF &&
+	TREE::$(git rev-parse topic^{tree})
+	TREE:left/:$(git rev-parse topic:left)
+	TREE:right/:$(git rev-parse topic:right)
+	trees:3
+	BLOB:a:$(git rev-parse topic:a)
+	BLOB:left/b:$(git rev-parse topic:left/b)
+	BLOB:right/c:$(git rev-parse topic:right/c)
+	BLOB:right/d:$(git rev-parse topic:right/d)
+	blobs:4
+	EOF
+
+	sort expect >expect.sorted &&
+	sort out >out.sorted &&
+
+	test_cmp expect.sorted out.sorted
+'
+
+test_expect_success 'topic, not base, boundary' '
+	test-tool path-walk -- --boundary topic --not base >out &&
+
+	cat >expect <<-EOF &&
+	TREE::$(git rev-parse topic^{tree})
+	TREE::$(git rev-parse base~1^{tree})
+	TREE:left/:$(git rev-parse base~1:left)
+	TREE:right/:$(git rev-parse topic:right)
+	TREE:right/:$(git rev-parse base~1:right)
+	trees:5
+	BLOB:a:$(git rev-parse base~1:a)
+	BLOB:left/b:$(git rev-parse base~1:left/b)
+	BLOB:right/c:$(git rev-parse base~1:right/c)
+	BLOB:right/c:$(git rev-parse topic:right/c)
+	BLOB:right/d:$(git rev-parse base~1:right/d)
+	blobs:5
+	EOF
+
+	sort expect >expect.sorted &&
+	sort out >out.sorted &&
+
+	test_cmp expect.sorted out.sorted
+'
+
+test_done

From patchwork Sun Oct 20 13:43:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13843080
Message-Id: <14375d19392c406d903f6cb26be5a39c4c2ff1e9.1729431810.git.gitgitgadget@gmail.com>
Date: Sun, 20 Oct 2024 13:43:16 +0000
Subject: [PATCH v2 03/17] path-walk: allow consumer to specify object types
X-Mailing-List: git@vger.kernel.org
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
 ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
 christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com,
 Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

We add the ability to filter the object types in the path-walk API so
the callback function is called fewer times.

This adds the ability to ask for the commits in a list, as well.
Future changes will add the ability to visit annotated tags. Signed-off-by: Derrick Stolee --- Documentation/technical/api-path-walk.txt | 9 +++ path-walk.c | 39 ++++++++++-- path-walk.h | 13 +++- t/helper/test-path-walk.c | 17 +++++- t/t6601-path-walk.sh | 72 +++++++++++++++++++++++ 5 files changed, 141 insertions(+), 9 deletions(-) diff --git a/Documentation/technical/api-path-walk.txt b/Documentation/technical/api-path-walk.txt index e588897ab8d..b7ae476ea0a 100644 --- a/Documentation/technical/api-path-walk.txt +++ b/Documentation/technical/api-path-walk.txt @@ -48,6 +48,15 @@ If you want the path-walk API to emit `UNINTERESTING` objects based on the commit walk's boundary, be sure to set `revs.boundary` so the boundary commits are emitted. +`commits`, `blobs`, `trees`:: + By default, these members are enabled and signal that the path-walk + API should call the `path_fn` on objects of these types. Specialized + applications could disable some options to make it simpler to walk + the objects or to have fewer calls to `path_fn`. ++ +While it is possible to walk only commits in this way, consumers would be +better off using the revision walk API instead. + Examples -------- diff --git a/path-walk.c b/path-walk.c index 66840187e28..22e1aa13f31 100644 --- a/path-walk.c +++ b/path-walk.c @@ -84,6 +84,10 @@ static int add_children(struct path_walk_context *ctx, if (S_ISGITLINK(entry.mode)) continue; + /* If the caller doesn't want blobs, then don't bother. */ + if (!ctx->info->blobs && type == OBJ_BLOB) + continue; + if (type == OBJ_TREE) { struct tree *child = lookup_tree(ctx->repo, &entry.oid); o = child ? &child->object : NULL; @@ -140,9 +144,11 @@ static int walk_path(struct path_walk_context *ctx, list = strmap_get(&ctx->paths_to_lists, path); - /* Evaluate function pointer on this data. */ - ret = ctx->info->path_fn(path, &list->oids, list->type, - ctx->info->path_fn_data); + /* Evaluate function pointer on this data, if requested. 
*/ + if ((list->type == OBJ_TREE && ctx->info->trees) || + (list->type == OBJ_BLOB && ctx->info->blobs)) + ret = ctx->info->path_fn(path, &list->oids, list->type, + ctx->info->path_fn_data); /* Expand data for children. */ if (list->type == OBJ_TREE) { @@ -184,6 +190,7 @@ int walk_objects_by_path(struct path_walk_info *info) size_t commits_nr = 0, paths_nr = 0; struct commit *c; struct type_and_oid_list *root_tree_list; + struct type_and_oid_list *commit_list; struct path_walk_context ctx = { .repo = info->revs->repo, .revs = info->revs, @@ -194,19 +201,32 @@ int walk_objects_by_path(struct path_walk_info *info) trace2_region_enter("path-walk", "commit-walk", info->revs->repo); + CALLOC_ARRAY(commit_list, 1); + commit_list->type = OBJ_COMMIT; + /* Insert a single list for the root tree into the paths. */ CALLOC_ARRAY(root_tree_list, 1); root_tree_list->type = OBJ_TREE; strmap_put(&ctx.paths_to_lists, root_path, root_tree_list); - if (prepare_revision_walk(info->revs)) die(_("failed to setup revision walk")); while ((c = get_revision(info->revs))) { - struct object_id *oid = get_commit_tree_oid(c); - struct tree *t = lookup_tree(info->revs->repo, oid); + struct object_id *oid; + struct tree *t; commits_nr++; + if (info->commits) + oid_array_append(&commit_list->oids, + &c->object.oid); + + /* If we only care about commits, then skip trees. */ + if (!info->trees && !info->blobs) + continue; + + oid = get_commit_tree_oid(c); + t = lookup_tree(info->revs->repo, oid); + if (t) { if (t->object.flags & SEEN) continue; @@ -220,6 +240,13 @@ int walk_objects_by_path(struct path_walk_info *info) trace2_data_intmax("path-walk", ctx.repo, "commits", commits_nr); trace2_region_leave("path-walk", "commit-walk", info->revs->repo); + /* Track all commits. 
*/ + if (info->commits) + ret = info->path_fn("", &commit_list->oids, OBJ_COMMIT, + info->path_fn_data); + oid_array_clear(&commit_list->oids); + free(commit_list); + string_list_append(&ctx.path_stack, root_path); trace2_region_enter("path-walk", "path-walk", info->revs->repo); diff --git a/path-walk.h b/path-walk.h index c9e94a98bc8..6ef372d8942 100644 --- a/path-walk.h +++ b/path-walk.h @@ -30,9 +30,20 @@ struct path_walk_info { */ path_fn path_fn; void *path_fn_data; + /** + * Initialize which object types the path_fn should be called on. This + * could also limit the walk to skip blobs if not set. + */ + int commits; + int trees; + int blobs; }; -#define PATH_WALK_INFO_INIT { 0 } +#define PATH_WALK_INFO_INIT { \ + .blobs = 1, \ + .trees = 1, \ + .commits = 1, \ +} /** * Given the configuration of 'info', walk the commits based on 'info->revs' and diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c index 3c48f017fa0..37c5e3e31e8 100644 --- a/t/helper/test-path-walk.c +++ b/t/helper/test-path-walk.c @@ -18,6 +18,7 @@ static const char * const path_walk_usage[] = { }; struct path_walk_test_data { + uintmax_t commit_nr; uintmax_t tree_nr; uintmax_t blob_nr; }; @@ -29,6 +30,11 @@ static int emit_block(const char *path, struct oid_array *oids, const char *typestr; switch (type) { + case OBJ_COMMIT: + typestr = "COMMIT"; + tdata->commit_nr += oids->nr; + break; + case OBJ_TREE: typestr = "TREE"; tdata->tree_nr += oids->nr; @@ -56,6 +62,12 @@ int cmd__path_walk(int argc, const char **argv) struct path_walk_info info = PATH_WALK_INFO_INIT; struct path_walk_test_data data = { 0 }; struct option options[] = { + OPT_BOOL(0, "blobs", &info.blobs, + N_("toggle inclusion of blob objects")), + OPT_BOOL(0, "commits", &info.commits, + N_("toggle inclusion of commit objects")), + OPT_BOOL(0, "trees", &info.trees, + N_("toggle inclusion of tree objects")), OPT_END(), }; @@ -78,9 +90,10 @@ int cmd__path_walk(int argc, const char **argv) res = 
walk_objects_by_path(&info); - printf("trees:%" PRIuMAX "\n" + printf("commits:%" PRIuMAX "\n" + "trees:%" PRIuMAX "\n" "blobs:%" PRIuMAX "\n", - data.tree_nr, data.blob_nr); + data.commit_nr, data.tree_nr, data.blob_nr); return res; } diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh index ca18b61c3f1..e4788664f93 100755 --- a/t/t6601-path-walk.sh +++ b/t/t6601-path-walk.sh @@ -31,6 +31,11 @@ test_expect_success 'all' ' test-tool path-walk -- --all >out && cat >expect <<-EOF && + COMMIT::$(git rev-parse topic) + COMMIT::$(git rev-parse base) + COMMIT::$(git rev-parse base~1) + COMMIT::$(git rev-parse base~2) + commits:4 TREE::$(git rev-parse topic^{tree}) TREE::$(git rev-parse base^{tree}) TREE::$(git rev-parse base~1^{tree}) @@ -60,6 +65,10 @@ test_expect_success 'topic only' ' test-tool path-walk -- topic >out && cat >expect <<-EOF && + COMMIT::$(git rev-parse topic) + COMMIT::$(git rev-parse base~1) + COMMIT::$(git rev-parse base~2) + commits:3 TREE::$(git rev-parse topic^{tree}) TREE::$(git rev-parse base~1^{tree}) TREE::$(git rev-parse base~2^{tree}) @@ -86,6 +95,8 @@ test_expect_success 'topic, not base' ' test-tool path-walk -- topic --not base >out && cat >expect <<-EOF && + COMMIT::$(git rev-parse topic) + commits:1 TREE::$(git rev-parse topic^{tree}) TREE:left/:$(git rev-parse topic:left) TREE:right/:$(git rev-parse topic:right) @@ -103,10 +114,71 @@ test_expect_success 'topic, not base' ' test_cmp expect.sorted out.sorted ' +test_expect_success 'topic, not base, only blobs' ' + test-tool path-walk --no-trees --no-commits \ + -- topic --not base >out && + + cat >expect <<-EOF && + commits:0 + trees:0 + BLOB:a:$(git rev-parse topic:a) + BLOB:left/b:$(git rev-parse topic:left/b) + BLOB:right/c:$(git rev-parse topic:right/c) + BLOB:right/d:$(git rev-parse topic:right/d) + blobs:4 + EOF + + sort expect >expect.sorted && + sort out >out.sorted && + + test_cmp expect.sorted out.sorted +' + +# No, this doesn't make a lot of sense for the path-walk API, 
+# but it is possible to do. +test_expect_success 'topic, not base, only commits' ' + test-tool path-walk --no-blobs --no-trees \ + -- topic --not base >out && + + cat >expect <<-EOF && + COMMIT::$(git rev-parse topic) + commits:1 + trees:0 + blobs:0 + EOF + + sort expect >expect.sorted && + sort out >out.sorted && + + test_cmp expect.sorted out.sorted +' + +test_expect_success 'topic, not base, only trees' ' + test-tool path-walk --no-blobs --no-commits \ + -- topic --not base >out && + + cat >expect <<-EOF && + commits:0 + TREE::$(git rev-parse topic^{tree}) + TREE:left/:$(git rev-parse topic:left) + TREE:right/:$(git rev-parse topic:right) + trees:3 + blobs:0 + EOF + + sort expect >expect.sorted && + sort out >out.sorted && + + test_cmp expect.sorted out.sorted +' + test_expect_success 'topic, not base, boundary' ' test-tool path-walk -- --boundary topic --not base >out && cat >expect <<-EOF && + COMMIT::$(git rev-parse topic) + COMMIT::$(git rev-parse base~1) + commits:2 TREE::$(git rev-parse topic^{tree}) TREE::$(git rev-parse base~1^{tree}) TREE:left/:$(git rev-parse base~1:left) From patchwork Sun Oct 20 13:43:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13843081 Received: from mail-ed1-f51.google.com (mail-ed1-f51.google.com [209.85.208.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1CE73193077 for ; Sun, 20 Oct 2024 13:43:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729431822; cv=none; b=kbj+S4uuQ2TpKQmdnJnVjQkUMGP0FgiIai0VUdZiJcoaSEAVquqU8kfA3uBgW1SGef08gz5l+He9gDYcYU4aWLNuNQdkgqT1V17C0oldPslXidUUvFIQlV/ou5JifWnu/P7bQP7WF1Sjhc4W/u1QifBm6CqvddzANpZWscJ1uU4= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1729431822; c=relaxed/simple; bh=iOXhD57a3Gb9t8HGtSThCj2OpN1u1rmdzLlejJH0A4o=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=YTygDgG+fcwJj07FveYBjqDNUpgwVu2RMKLWdGMi9/hLXqVG8e21a9K7KX7e2vryCMhGJRMsb9B+4udMXr4vaP82NRBPmAdATn+C+KTdzWqKSu0GnTMaPwmg/sT7HiK6FLf94n4JkqSqcEF0WJ3m8RCV2JU8U+EaQQPDklDOIzw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=O8srcwnp; arc=none smtp.client-ip=209.85.208.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="O8srcwnp" Received: by mail-ed1-f51.google.com with SMTP id 4fb4d7f45d1cf-5c96936065dso3941978a12.3 for ; Sun, 20 Oct 2024 06:43:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729431818; x=1730036618; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=boWyHHZyB9N5omC3+NSUZnQGb10fWVU7aHMWylJuAMk=; b=O8srcwnpkwyA/r4h9BTanujrxOhyX7keHXUfOMrhOcDxNiTI36Yolq4qrwrENFCldW 4Xrx8BbnDMwJS2nSioq2fvBPwQwGAZbUT5aEDDmU4GzwEANgqGYKJO4H75HMKY1//mST k0kk9TNuEKsvnNXc7yJVrJnRv5E237+vM5Hiky+80skd+kq0I7RA1r9gtQknZt5j/pHd C7GiZaCBYydaQHg+WqFxoJro+16Yl3o45Rg2MAYsd2sDjjv8PRWBQtpCc/56oKiM8K6i xLjpYWXhh4hwUs4IxloYd4CaY4AAnCD3aNAbGfbiTh9LusTbGmgYDIth4FuDWbb7xr/+ wJSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729431818; x=1730036618; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from 
:references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=boWyHHZyB9N5omC3+NSUZnQGb10fWVU7aHMWylJuAMk=; b=H5KSY2foaVa9zo4Iu/Z63RHZ0Ov6jzxqPKlF6+tFwmhdmDTFqvWqPgU+1wXfioy0uy qCFm+9hAUJdMof3nwLkhZaOFsC0BykSRW08sKVt256LW57+XNyBPYBSmxvBWFJyRYPgQ XZjVbOyjojtDgZOvch4QBTcHIyG5NtxiFWXghqHOu0Ec9q4UpBbFVabXL5jb6i7O/ihl kfidjFlXtS13nLNsF1WWMSk7v9YLzOzmQAZCosHSMGYMn5cJPbjG4FV8mYTHG2KlNS+l jF1F6q1nu+5s2FRSGR2r/KzgZuxjm3Ff5QWGSulobH9SipLz6LL8RzieGMeLvarOk5kK iVEg== X-Gm-Message-State: AOJu0YxL+nBsm7vU/ixxlJiyzU4dc9WFZ+NmH0KwwpMWwNZ1vFNow4q8 lCpOG0YKWqAd10gVT/lMx5VDVZKixV+ByZy3bumOzL5l85kjfmqs6QbSHQ== X-Google-Smtp-Source: AGHT+IFQWX9gVQq+NO86/9nlD40A7lDJqrZCFAnsmzaV8H6IzHEv40uqOXf+h9BdIMYhd/e8pg/BEw== X-Received: by 2002:a17:907:9813:b0:a9a:3cec:b322 with SMTP id a640c23a62f3a-a9a69c9c659mr974453166b.45.1729431817334; Sun, 20 Oct 2024 06:43:37 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a9a915a5629sm91660066b.223.2024.10.20.06.43.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 20 Oct 2024 06:43:36 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Sun, 20 Oct 2024 13:43:17 +0000 Subject: [PATCH v2 04/17] path-walk: allow visiting tags Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee In anticipation of using the path-walk API to analyze tags or include them in a pack-file, add the ability to walk the tags that were included in the revision walk. When these tag objects point to blobs or trees, we need to make sure those objects are also visited. 
Treat tagged trees as root trees, but put the tagged blobs in their own category. Be careful about objects that are referred to by multiple references. Co-authored-by: Johannes Schindelin Signed-off-by: Johannes Schindelin Signed-off-by: Derrick Stolee --- Documentation/technical/api-path-walk.txt | 2 +- path-walk.c | 78 +++++++++++++++++++++ path-walk.h | 2 + t/helper/test-path-walk.c | 13 +++- t/t6601-path-walk.sh | 85 +++++++++++++++++++++-- 5 files changed, 172 insertions(+), 8 deletions(-) diff --git a/Documentation/technical/api-path-walk.txt b/Documentation/technical/api-path-walk.txt index b7ae476ea0a..5fea1d1db17 100644 --- a/Documentation/technical/api-path-walk.txt +++ b/Documentation/technical/api-path-walk.txt @@ -48,7 +48,7 @@ If you want the path-walk API to emit `UNINTERESTING` objects based on the commit walk's boundary, be sure to set `revs.boundary` so the boundary commits are emitted. -`commits`, `blobs`, `trees`:: +`commits`, `blobs`, `trees`, `tags`:: By default, these members are enabled and signal that the path-walk API should call the `path_fn` on objects of these types. Specialized applications could disable some options to make it simpler to walk diff --git a/path-walk.c b/path-walk.c index 22e1aa13f31..55758f50abd 100644 --- a/path-walk.c +++ b/path-walk.c @@ -13,6 +13,7 @@ #include "revision.h" #include "string-list.h" #include "strmap.h" +#include "tag.h" #include "trace2.h" #include "tree.h" #include "tree-walk.h" @@ -204,13 +205,90 @@ int walk_objects_by_path(struct path_walk_info *info) CALLOC_ARRAY(commit_list, 1); commit_list->type = OBJ_COMMIT; + if (info->tags) + info->revs->tag_objects = 1; + /* Insert a single list for the root tree into the paths. */ CALLOC_ARRAY(root_tree_list, 1); root_tree_list->type = OBJ_TREE; strmap_put(&ctx.paths_to_lists, root_path, root_tree_list); + + /* + * Set these values before preparing the walk to catch + * lightweight tags pointing to non-commits. 
+ */ + info->revs->blob_objects = info->blobs; + info->revs->tree_objects = info->trees; + if (prepare_revision_walk(info->revs)) die(_("failed to setup revision walk")); + info->revs->blob_objects = info->revs->tree_objects = 0; + + if (info->tags) { + struct oid_array tagged_blob_list = OID_ARRAY_INIT; + struct oid_array tags = OID_ARRAY_INIT; + + trace2_region_enter("path-walk", "tag-walk", info->revs->repo); + + /* + * Walk any pending objects at this point, but they should only + * be tags. + */ + for (size_t i = 0; i < info->revs->pending.nr; i++) { + struct object_array_entry *pending = info->revs->pending.objects + i; + struct object *obj = pending->item; + + if (obj->type == OBJ_COMMIT || obj->flags & SEEN) + continue; + + while (obj->type == OBJ_TAG) { + struct tag *tag = lookup_tag(info->revs->repo, + &obj->oid); + if (!(obj->flags & SEEN)) { + obj->flags |= SEEN; + oid_array_append(&tags, &obj->oid); + } + obj = tag->tagged; + } + + if ((obj->flags & SEEN)) + continue; + obj->flags |= SEEN; + + switch (obj->type) { + case OBJ_TREE: + if (info->trees) + oid_array_append(&root_tree_list->oids, &obj->oid); + break; + + case OBJ_BLOB: + if (info->blobs) + oid_array_append(&tagged_blob_list, &obj->oid); + break; + + case OBJ_COMMIT: + /* Make sure it is in the object walk */ + add_pending_object(info->revs, obj, ""); + break; + + default: + BUG("should not see any other type here"); + } + } + + info->path_fn("", &tags, OBJ_TAG, info->path_fn_data); + + if (tagged_blob_list.nr && info->blobs) + info->path_fn("/tagged-blobs", &tagged_blob_list, OBJ_BLOB, + info->path_fn_data); + + trace2_data_intmax("path-walk", ctx.repo, "tags", tags.nr); + trace2_region_leave("path-walk", "tag-walk", info->revs->repo); + oid_array_clear(&tags); + oid_array_clear(&tagged_blob_list); + } + while ((c = get_revision(info->revs))) { struct object_id *oid; struct tree *t; diff --git a/path-walk.h b/path-walk.h index 6ef372d8942..3f3b63180ef 100644 --- a/path-walk.h +++ 
b/path-walk.h @@ -37,12 +37,14 @@ struct path_walk_info { int commits; int trees; int blobs; + int tags; }; #define PATH_WALK_INFO_INIT { \ .blobs = 1, \ .trees = 1, \ .commits = 1, \ + .tags = 1, \ } /** diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c index 37c5e3e31e8..c6c60d68749 100644 --- a/t/helper/test-path-walk.c +++ b/t/helper/test-path-walk.c @@ -21,6 +21,7 @@ struct path_walk_test_data { uintmax_t commit_nr; uintmax_t tree_nr; uintmax_t blob_nr; + uintmax_t tag_nr; }; static int emit_block(const char *path, struct oid_array *oids, @@ -45,6 +46,11 @@ static int emit_block(const char *path, struct oid_array *oids, tdata->blob_nr += oids->nr; break; + case OBJ_TAG: + typestr = "TAG"; + tdata->tag_nr += oids->nr; + break; + default: BUG("we do not understand this type"); } @@ -66,6 +72,8 @@ int cmd__path_walk(int argc, const char **argv) N_("toggle inclusion of blob objects")), OPT_BOOL(0, "commits", &info.commits, N_("toggle inclusion of commit objects")), + OPT_BOOL(0, "tags", &info.tags, + N_("toggle inclusion of tag objects")), OPT_BOOL(0, "trees", &info.trees, N_("toggle inclusion of tree objects")), OPT_END(), @@ -92,8 +100,9 @@ int cmd__path_walk(int argc, const char **argv) printf("commits:%" PRIuMAX "\n" "trees:%" PRIuMAX "\n" - "blobs:%" PRIuMAX "\n", - data.commit_nr, data.tree_nr, data.blob_nr); + "blobs:%" PRIuMAX "\n" + "tags:%" PRIuMAX "\n", + data.commit_nr, data.tree_nr, data.blob_nr, data.tag_nr); return res; } diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh index e4788664f93..7758e2529ee 100755 --- a/t/t6601-path-walk.sh +++ b/t/t6601-path-walk.sh @@ -7,24 +7,55 @@ test_description='direct path-walk API tests' test_expect_success 'setup test repository' ' git checkout -b base && + # Make some objects that will only be reachable + # via non-commit tags. 
+ mkdir child && + echo file >child/file && + git add child && + git commit -m "will abandon" && + git tag -a -m "tree" tree-tag HEAD^{tree} && + echo file2 >file2 && + git add file2 && + git commit --amend -m "will abandon" && + git tag tree-tag2 HEAD^{tree} && + + echo blob >file && + blob_oid=$(git hash-object -t blob -w --stdin file2 && + blob2_oid=$(git hash-object -t blob -w --stdin a && echo b >left/b && echo c >right/c && git add . && - git commit -m "first" && + git commit --amend -m "first" && + git tag -m "first" first HEAD && echo d >right/d && git add right && git commit -m "second" && + git tag -a -m "second (under)" second.1 HEAD && + git tag -a -m "second (top)" second.2 second.1 && + # Set up file/dir collision in history. + rm a && + mkdir a && + echo a >a/a && echo bb >left/b && - git commit -a -m "third" && + git add a left && + git commit -m "third" && + git tag -a -m "third" third && git checkout -b topic HEAD~1 && echo cc >right/c && - git commit -a -m "topic" + git commit -a -m "topic" && + git tag -a -m "fourth" fourth ' test_expect_success 'all' ' @@ -40,19 +71,35 @@ test_expect_success 'all' ' TREE::$(git rev-parse base^{tree}) TREE::$(git rev-parse base~1^{tree}) TREE::$(git rev-parse base~2^{tree}) + TREE::$(git rev-parse refs/tags/tree-tag^{}) + TREE::$(git rev-parse refs/tags/tree-tag2^{}) + TREE:a/:$(git rev-parse base:a) TREE:left/:$(git rev-parse base:left) TREE:left/:$(git rev-parse base~2:left) TREE:right/:$(git rev-parse topic:right) TREE:right/:$(git rev-parse base~1:right) TREE:right/:$(git rev-parse base~2:right) - trees:9 + TREE:child/:$(git rev-parse refs/tags/tree-tag^{}:child) + trees:13 BLOB:a:$(git rev-parse base~2:a) + BLOB:file2:$(git rev-parse refs/tags/tree-tag2^{}:file2) BLOB:left/b:$(git rev-parse base~2:left/b) BLOB:left/b:$(git rev-parse base:left/b) BLOB:right/c:$(git rev-parse base~2:right/c) BLOB:right/c:$(git rev-parse topic:right/c) BLOB:right/d:$(git rev-parse base~1:right/d) - blobs:6 + 
BLOB:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{}) + BLOB:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{}) + BLOB:child/file:$(git rev-parse refs/tags/tree-tag^{}:child/file) + blobs:10 + TAG::$(git rev-parse refs/tags/first) + TAG::$(git rev-parse refs/tags/second.1) + TAG::$(git rev-parse refs/tags/second.2) + TAG::$(git rev-parse refs/tags/third) + TAG::$(git rev-parse refs/tags/fourth) + TAG::$(git rev-parse refs/tags/tree-tag) + TAG::$(git rev-parse refs/tags/blob-tag) + tags:7 EOF sort expect >expect.sorted && @@ -83,6 +130,7 @@ test_expect_success 'topic only' ' BLOB:right/c:$(git rev-parse topic:right/c) BLOB:right/d:$(git rev-parse base~1:right/d) blobs:5 + tags:0 EOF sort expect >expect.sorted && @@ -106,6 +154,7 @@ test_expect_success 'topic, not base' ' BLOB:right/c:$(git rev-parse topic:right/c) BLOB:right/d:$(git rev-parse topic:right/d) blobs:4 + tags:0 EOF sort expect >expect.sorted && @@ -126,6 +175,7 @@ test_expect_success 'topic, not base, only blobs' ' BLOB:right/c:$(git rev-parse topic:right/c) BLOB:right/d:$(git rev-parse topic:right/d) blobs:4 + tags:0 EOF sort expect >expect.sorted && @@ -145,6 +195,7 @@ test_expect_success 'topic, not base, only commits' ' commits:1 trees:0 blobs:0 + tags:0 EOF sort expect >expect.sorted && @@ -164,6 +215,7 @@ test_expect_success 'topic, not base, only trees' ' TREE:right/:$(git rev-parse topic:right) trees:3 blobs:0 + tags:0 EOF sort expect >expect.sorted && @@ -191,6 +243,7 @@ test_expect_success 'topic, not base, boundary' ' BLOB:right/c:$(git rev-parse topic:right/c) BLOB:right/d:$(git rev-parse base~1:right/d) blobs:5 + tags:0 EOF sort expect >expect.sorted && @@ -199,4 +252,26 @@ test_expect_success 'topic, not base, boundary' ' test_cmp expect.sorted out.sorted ' +test_expect_success 'trees are reported exactly once' ' + test_when_finished "rm -rf unique-trees" && + test_create_repo unique-trees && + ( + cd unique-trees && + mkdir initial && + test_commit initial/file && + + git switch -c 
move-to-top && + git mv initial/file.t ./ && + test_tick && + git commit -m moved && + + git update-ref refs/heads/other HEAD + ) && + + test-tool -C unique-trees path-walk -- --all >out && + tree=$(git -C unique-trees rev-parse HEAD:) && + grep "$tree" out >out-filtered && + test_line_count = 1 out-filtered +' + test_done From patchwork Sun Oct 20 13:43:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13843082 Received: from mail-ed1-f50.google.com (mail-ed1-f50.google.com [209.85.208.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A75A193419 for ; Sun, 20 Oct 2024 13:43:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729431824; cv=none; b=seq98KiobLqEEsPeOEx7a1XSyEIf+zR0vZBEZg1+wnS0fwB0pX5R24npMC1Rag7l7MwtkIfoNe7EBg9UII3f2U9IZcYIgQY3UR53QFrDb44CAFPLZRlrjkPs4qNTptntUidWCAPCKRZ3cEUNIbLvDQfxyGBHuSsX4SIlTYBxLNc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729431824; c=relaxed/simple; bh=8/c/yA7Z4chcFAC86FyGgWEKtP39z4MIdZaXvM2xvd8=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=oGCcXxgy/stB3EWEvPWXHv/jsEM9jH2x/y7ULQTq/bh/8zCqgSHt591smvsvR/gX4K2vzDJeF5szcEDgzHoxgAu3CyrOvVFzRmtvCr5cvtrz5KcA1hsCDyL/ldHGgEJNSrTwAgjraaGM51CAXIiXjrOsGJs9imti/Pjrcot9BhE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=K0duO3aU; arc=none smtp.client-ip=209.85.208.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="K0duO3aU" Received: by mail-ed1-f50.google.com with SMTP id 4fb4d7f45d1cf-5c984352742so3907300a12.1 for ; Sun, 20 Oct 2024 06:43:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729431819; x=1730036619; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=CiRsSr9N+34ulobRz7mRSLnLQayzsrwlctft/WmYog0=; b=K0duO3aU5PQnaJuD4wIXcD0Y2DnR1koU/iV5Tn70kVR4E13F/sfIVdrEadvSE0ZI1u jm2iZH3yaJ/yV9osljGoFYoDPaw/pAyvYlgIgaJYyUmLtzM8Xe1eq53krLvMiOUth8WO 5hQIQlLiigQNh/qWc9+l7T9trrfRNyt8uXnMUzjVV8yxPh6Vijp7NjfVjuwooR8kkRAQ WqblKUdviGT5DQQT8B5iuQER7m189ohSqNUq/feomiUPY642BVRv89l2Y0Jfdf76PsRh YTcFOL+o96uKAlVSPyaWAKIGdJ9fBU2cwKp5nLVolFWUE+q8km6yrjkXjqirffmAaZPQ Tz0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729431819; x=1730036619; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CiRsSr9N+34ulobRz7mRSLnLQayzsrwlctft/WmYog0=; b=JokU6oalk/gM1xxayOJjSRQ3fZiAIFlrgiyaNt+vMFyBUCJ/tRU/6I8NJQGjXlxWi2 pVrzbQl5Oq7XDFooDPcMqPSPXoKlX/8Pboj+ICb7POG781BSLcs0bZKNNw1zcJihnpYY uW9HMEB8xk4yj703V9MxZloe/mUq1masgTzC/bd/x4HcZh9zquuJu7ZS0wxQQ5QgCRe0 hkcQ7hVhycghq3K89kN6lnMvdV2Ou0kgZFcK+cJMvqHfy/COiTEnu8hD/AaSv1O0/NJ6 wA1fZvMf41FvaAcDd8Rut1Rp7YVjRK0ChFtYEf3iKl4wrcDlPMFTYq0z8I0P4UCnuBsv mx6w== X-Gm-Message-State: AOJu0YzY4nQMZni/dumLL9P0jI0BVux/p9+/8P8Hp5fFmcfou4SHba6j HhsHxX5HMUqwcOJUf0LgL4kmu7PT4vDNAnSMiiwvnajy5dvHkMhlEavSNg== X-Google-Smtp-Source: AGHT+IEaI3ytqAcanX8jeQcydv19aO9rCXTIGzktbuDDSj5R/7eW9Gx7Q2rSNgF7MlDMwWNVR73Ftw== X-Received: by 2002:a05:6402:845:b0:5c8:a01c:e511 with SMTP id 
4fb4d7f45d1cf-5ca0ac86021mr7627303a12.10.1729431819233; Sun, 20 Oct 2024 06:43:39 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5cb6696b53bsm897607a12.17.2024.10.20.06.43.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 20 Oct 2024 06:43:37 -0700 (PDT) Message-Id: <6e89fb219b55316c6c67a6a4b87b6928ff5b6f93.1729431810.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 20 Oct 2024 13:43:18 +0000 Subject: [PATCH v2 05/17] revision: create mark_trees_uninteresting_dense() Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee The sparse tree walk algorithm was created in d5d2e93577e (revision: implement sparse algorithm, 2019-01-16) and involves using the mark_trees_uninteresting_sparse() method. This method takes a repository and an oidset of tree IDs, some of which have the UNINTERESTING flag and some of which do not. Create a method that has an equivalent set of preconditions but uses a "dense" walk (recursively visits all reachable trees, as long as they have not previously been marked UNINTERESTING). This is an important difference from mark_tree_uninteresting(), which short-circuits if the given tree has the UNINTERESTING flag. A use of this method will be added in a later change, with a condition set whether the sparse or dense approach should be used. 
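To illustrate the difference described above, here is a minimal, self-contained sketch using toy types. It is not git's code: `struct toy_tree`, `TOY_UNINTERESTING`, and every `toy_*` function are hypothetical stand-ins, and the set of trees is modeled as a plain array rather than an oidset. It only assumes the behavior stated in this message: the single-tree helper returns early when the flag is already set, while the dense helper descends from every tree in the set that already carries the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for git's object flags and tree objects (hypothetical). */
#define TOY_UNINTERESTING (1u << 1)
#define TOY_MAX_CHILDREN 4

struct toy_tree {
	unsigned flags;
	size_t nr;
	struct toy_tree *children[TOY_MAX_CHILDREN];
};

static void toy_add_child(struct toy_tree *parent, struct toy_tree *child)
{
	assert(parent->nr < TOY_MAX_CHILDREN);
	parent->children[parent->nr++] = child;
}

static void toy_mark_tree(struct toy_tree *t);

/* Mark everything below 't', without looking at the flag on 't' itself. */
static void toy_mark_contents(struct toy_tree *t)
{
	for (size_t i = 0; i < t->nr; i++)
		toy_mark_tree(t->children[i]);
}

/* Short-circuiting variant: an already-flagged tree is assumed finished. */
static void toy_mark_tree(struct toy_tree *t)
{
	if (t->flags & TOY_UNINTERESTING)
		return;
	t->flags |= TOY_UNINTERESTING;
	toy_mark_contents(t);
}

/* Dense variant: descend from every tree in the set that is already flagged. */
static void toy_mark_trees_dense(struct toy_tree **set, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (set[i]->flags & TOY_UNINTERESTING)
			toy_mark_contents(set[i]);
}
```

In this toy model, calling `toy_mark_tree()` on a root that was flagged ahead of time leaves its children untouched, whereas `toy_mark_trees_dense()` on a set containing that root flags the children and leaves unflagged members of the set alone.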
Signed-off-by: Derrick Stolee --- revision.c | 15 +++++++++++++++ revision.h | 1 + 2 files changed, 16 insertions(+) diff --git a/revision.c b/revision.c index 2d7ad2bddff..bdc312f1538 100644 --- a/revision.c +++ b/revision.c @@ -219,6 +219,21 @@ static void add_children_by_path(struct repository *r, free_tree_buffer(tree); } +void mark_trees_uninteresting_dense(struct repository *r, + struct oidset *trees) +{ + struct object_id *oid; + struct oidset_iter iter; + + oidset_iter_init(trees, &iter); + while ((oid = oidset_iter_next(&iter))) { + struct tree *tree = lookup_tree(r, oid); + + if (tree && (tree->object.flags & UNINTERESTING)) + mark_tree_contents_uninteresting(r, tree); + } +} + void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees) { diff --git a/revision.h b/revision.h index 71e984c452b..8938b2db112 100644 --- a/revision.h +++ b/revision.h @@ -487,6 +487,7 @@ void put_revision_mark(const struct rev_info *revs, void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit); void mark_tree_uninteresting(struct repository *r, struct tree *tree); +void mark_trees_uninteresting_dense(struct repository *r, struct oidset *trees); void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees); void show_object_with_name(FILE *, struct object *, const char *); From patchwork Sun Oct 20 13:43:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13843083
Message-Id: <238d7d95715d3e161489a5ef8d788c0cac4f7a03.1729431810.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 20 Oct 2024 13:43:19 +0000 Subject: [PATCH v2 06/17] path-walk: add prune_all_uninteresting option Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com, Derrick Stolee , Derrick
Stolee From: Derrick Stolee From: Derrick Stolee This option causes the path-walk API to act like the sparse tree-walk algorithm implemented by mark_trees_uninteresting_sparse() in revision.c. Starting from the commits marked as UNINTERESTING, their root trees and all objects reachable from those trees are UNINTERESTING, at least as we walk path-by-path. When we reach a path where all objects associated with that path are marked UNINTERESTING, then do not continue walking the children of that path. We need to be careful to pass the UNINTERESTING flag in a deep way on the UNINTERESTING objects before we start the path-walk, or else the depth-first search for the path-walk API may accidentally report some objects as interesting. Signed-off-by: Derrick Stolee --- Documentation/technical/api-path-walk.txt | 8 +++ path-walk.c | 64 ++++++++++++++++++++++- path-walk.h | 8 +++ t/helper/test-path-walk.c | 10 +++- t/t6601-path-walk.sh | 40 +++++++++++--- 5 files changed, 118 insertions(+), 12 deletions(-) diff --git a/Documentation/technical/api-path-walk.txt b/Documentation/technical/api-path-walk.txt index 5fea1d1db17..c51f92cd649 100644 --- a/Documentation/technical/api-path-walk.txt +++ b/Documentation/technical/api-path-walk.txt @@ -57,6 +57,14 @@ commits are emitted. While it is possible to walk only commits in this way, consumers would be better off using the revision walk API instead. +`prune_all_uninteresting`:: + By default, all reachable paths are emitted by the path-walk API. + This option allows consumers to declare that they are not + interested in paths where all included objects are marked with the + `UNINTERESTING` flag. This requires using the `boundary` option in + the revision walk so that the walk emits commits marked with the + `UNINTERESTING` flag.
+ Examples -------- diff --git a/path-walk.c b/path-walk.c index 55758f50abd..910dfc6fdc9 100644 --- a/path-walk.c +++ b/path-walk.c @@ -22,6 +22,7 @@ struct type_and_oid_list { enum object_type type; struct oid_array oids; + int maybe_interesting; }; #define TYPE_AND_OID_LIST_INIT { \ @@ -124,6 +125,8 @@ static int add_children(struct path_walk_context *ctx, strmap_put(&ctx->paths_to_lists, path.buf, list); string_list_append(&ctx->path_stack, path.buf); } + if (!(o->flags & UNINTERESTING)) + list->maybe_interesting = 1; oid_array_append(&list->oids, &entry.oid); } @@ -145,6 +148,40 @@ static int walk_path(struct path_walk_context *ctx, list = strmap_get(&ctx->paths_to_lists, path); + if (ctx->info->prune_all_uninteresting) { + /* + * This is true if all objects were UNINTERESTING + * when added to the list. + */ + if (!list->maybe_interesting) + return 0; + + /* + * But it's still possible that the objects were set + * as UNINTERESTING after being added. Do a quick check. + */ + list->maybe_interesting = 0; + for (size_t i = 0; + !list->maybe_interesting && i < list->oids.nr; + i++) { + if (list->type == OBJ_TREE) { + struct tree *t = lookup_tree(ctx->repo, + &list->oids.oid[i]); + if (t && !(t->object.flags & UNINTERESTING)) + list->maybe_interesting = 1; + } else { + struct blob *b = lookup_blob(ctx->repo, + &list->oids.oid[i]); + if (b && !(b->object.flags & UNINTERESTING)) + list->maybe_interesting = 1; + } + } + + /* We have confirmed that all objects are UNINTERESTING. */ + if (!list->maybe_interesting) + return 0; + } + /* Evaluate function pointer on this data, if requested. 
*/ if ((list->type == OBJ_TREE && ctx->info->trees) || (list->type == OBJ_BLOB && ctx->info->blobs)) @@ -187,7 +224,7 @@ static void clear_strmap(struct strmap *map) int walk_objects_by_path(struct path_walk_info *info) { const char *root_path = ""; - int ret = 0; + int ret = 0, has_uninteresting = 0; size_t commits_nr = 0, paths_nr = 0; struct commit *c; struct type_and_oid_list *root_tree_list; @@ -199,6 +236,7 @@ int walk_objects_by_path(struct path_walk_info *info) .path_stack = STRING_LIST_INIT_DUP, .paths_to_lists = STRMAP_INIT }; + struct oidset root_tree_set = OIDSET_INIT; trace2_region_enter("path-walk", "commit-walk", info->revs->repo); @@ -211,6 +249,7 @@ int walk_objects_by_path(struct path_walk_info *info) /* Insert a single list for the root tree into the paths. */ CALLOC_ARRAY(root_tree_list, 1); root_tree_list->type = OBJ_TREE; + root_tree_list->maybe_interesting = 1; strmap_put(&ctx.paths_to_lists, root_path, root_tree_list); /* @@ -306,10 +345,16 @@ int walk_objects_by_path(struct path_walk_info *info) t = lookup_tree(info->revs->repo, oid); if (t) { + if ((c->object.flags & UNINTERESTING)) { + t->object.flags |= UNINTERESTING; + has_uninteresting = 1; + } + if (t->object.flags & SEEN) continue; t->object.flags |= SEEN; - oid_array_append(&root_tree_list->oids, oid); + if (!oidset_insert(&root_tree_set, oid)) + oid_array_append(&root_tree_list->oids, oid); } else { warning("could not find tree %s", oid_to_hex(oid)); } @@ -325,6 +370,21 @@ int walk_objects_by_path(struct path_walk_info *info) oid_array_clear(&commit_list->oids); free(commit_list); + /* + * Before performing a DFS of our paths and emitting them as interesting, + * do a full walk of the trees to distribute the UNINTERESTING bit. Use + * the sparse algorithm if prune_all_uninteresting was set. 
+ */ + if (has_uninteresting) { + trace2_region_enter("path-walk", "uninteresting-walk", info->revs->repo); + if (info->prune_all_uninteresting) + mark_trees_uninteresting_sparse(ctx.repo, &root_tree_set); + else + mark_trees_uninteresting_dense(ctx.repo, &root_tree_set); + trace2_region_leave("path-walk", "uninteresting-walk", info->revs->repo); + } + oidset_clear(&root_tree_set); + string_list_append(&ctx.path_stack, root_path); trace2_region_enter("path-walk", "path-walk", info->revs->repo); diff --git a/path-walk.h b/path-walk.h index 3f3b63180ef..3e44c4b8a58 100644 --- a/path-walk.h +++ b/path-walk.h @@ -38,6 +38,14 @@ struct path_walk_info { int trees; int blobs; int tags; + + /** + * When 'prune_all_uninteresting' is set and a path has all objects + * marked as UNINTERESTING, then the path-walk will not visit those + * objects. It will not call path_fn on those objects and will not + * walk the children of such trees. + */ + int prune_all_uninteresting; }; #define PATH_WALK_INFO_INIT { \ diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c index c6c60d68749..06b103d8760 100644 --- a/t/helper/test-path-walk.c +++ b/t/helper/test-path-walk.c @@ -55,8 +55,12 @@ static int emit_block(const char *path, struct oid_array *oids, BUG("we do not understand this type"); } - for (size_t i = 0; i < oids->nr; i++) - printf("%s:%s:%s\n", typestr, path, oid_to_hex(&oids->oid[i])); + for (size_t i = 0; i < oids->nr; i++) { + struct object *o = lookup_unknown_object(the_repository, + &oids->oid[i]); + printf("%s:%s:%s%s\n", typestr, path, oid_to_hex(&oids->oid[i]), + o->flags & UNINTERESTING ? 
":UNINTERESTING" : ""); + } return 0; } @@ -76,6 +80,8 @@ int cmd__path_walk(int argc, const char **argv) N_("toggle inclusion of tag objects")), OPT_BOOL(0, "trees", &info.trees, N_("toggle inclusion of tree objects")), + OPT_BOOL(0, "prune", &info.prune_all_uninteresting, + N_("toggle pruning of uninteresting paths")), OPT_END(), }; diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh index 7758e2529ee..943adc6c8f1 100755 --- a/t/t6601-path-walk.sh +++ b/t/t6601-path-walk.sh @@ -229,19 +229,19 @@ test_expect_success 'topic, not base, boundary' ' cat >expect <<-EOF && COMMIT::$(git rev-parse topic) - COMMIT::$(git rev-parse base~1) + COMMIT::$(git rev-parse base~1):UNINTERESTING commits:2 TREE::$(git rev-parse topic^{tree}) - TREE::$(git rev-parse base~1^{tree}) - TREE:left/:$(git rev-parse base~1:left) + TREE::$(git rev-parse base~1^{tree}):UNINTERESTING + TREE:left/:$(git rev-parse base~1:left):UNINTERESTING TREE:right/:$(git rev-parse topic:right) - TREE:right/:$(git rev-parse base~1:right) + TREE:right/:$(git rev-parse base~1:right):UNINTERESTING trees:5 - BLOB:a:$(git rev-parse base~1:a) - BLOB:left/b:$(git rev-parse base~1:left/b) - BLOB:right/c:$(git rev-parse base~1:right/c) + BLOB:a:$(git rev-parse base~1:a):UNINTERESTING + BLOB:left/b:$(git rev-parse base~1:left/b):UNINTERESTING + BLOB:right/c:$(git rev-parse base~1:right/c):UNINTERESTING BLOB:right/c:$(git rev-parse topic:right/c) - BLOB:right/d:$(git rev-parse base~1:right/d) + BLOB:right/d:$(git rev-parse base~1:right/d):UNINTERESTING blobs:5 tags:0 EOF @@ -252,6 +252,30 @@ test_expect_success 'topic, not base, boundary' ' test_cmp expect.sorted out.sorted ' +test_expect_success 'topic, not base, boundary with pruning' ' + test-tool path-walk --prune -- --boundary topic --not base >out && + + cat >expect <<-EOF && + COMMIT::$(git rev-parse topic) + COMMIT::$(git rev-parse base~1):UNINTERESTING + commits:2 + TREE::$(git rev-parse topic^{tree}) + TREE::$(git rev-parse base~1^{tree}):UNINTERESTING + 
TREE:right/:$(git rev-parse topic:right) + TREE:right/:$(git rev-parse base~1:right):UNINTERESTING + trees:4 + BLOB:right/c:$(git rev-parse base~1:right/c):UNINTERESTING + BLOB:right/c:$(git rev-parse topic:right/c) + blobs:2 + tags:0 + EOF + + sort expect >expect.sorted && + sort out >out.sorted && + + test_cmp expect.sorted out.sorted +' + test_expect_success 'trees are reported exactly once' ' test_when_finished "rm -rf unique-trees" && test_create_repo unique-trees && From patchwork Sun Oct 20 13:43:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13843084
Message-Id: <3fdb57edbc504fbad572f21f309b8ff6a0ce1a72.1729431810.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 20 Oct 2024 13:43:20 +0000 Subject: [PATCH v2 07/17] pack-objects: extract should_attempt_deltas() Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee This will be helpful in a future change that introduces a new way to compute deltas. Be careful to preserve the nr_deltas counting logic in the existing method, but take the rest of the logic wholesale.
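To illustrate the shape of this split, here is a minimal standalone sketch, not the patch itself. It assumes a hypothetical, simplified `struct entry_sketch` in place of git's `struct object_entry`, and the names `should_attempt_deltas_sketch()` and `collect_delta_candidates()` are invented for the illustration. Unlike the real helper, which calls die() when the type of a non-preferred-base object cannot be read, the sketch reports that case with a negative return value so that it stays self-contained. The point it tries to show is that the predicate is free of side effects while the caller keeps the nr_deltas counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for git's struct object_entry. */
struct entry_sketch {
	int has_delta;       /* stands in for DELTA(entry) being set */
	int type_valid;
	unsigned long size;  /* stands in for the oe_size_less_than() check */
	int no_try_delta;
	int preferred_base;
	int type;            /* negative: the object type could not be read */
};

/*
 * Side-effect-free predicate: 1 means "try deltas", 0 means "skip".
 * Assumption for this sketch only: -1 replaces the die() of the real code.
 */
static int should_attempt_deltas_sketch(const struct entry_sketch *e)
{
	if (e->has_delta)
		return 0;
	if (!e->type_valid || e->size < 50)
		return 0;
	if (e->no_try_delta)
		return 0;
	if (e->type < 0)
		return e->preferred_base ? 0 : -1;
	return 1;
}

/*
 * The caller keeps the counting: only entries that are not preferred
 * bases contribute to *nr_deltas, but every accepted entry is listed.
 * Returns the number of candidates written to out, or -1 on error.
 */
static int collect_delta_candidates(const struct entry_sketch *entries,
				    size_t nr,
				    const struct entry_sketch **out,
				    size_t *nr_deltas)
{
	size_t n = 0;

	assert(out && nr_deltas);
	*nr_deltas = 0;
	for (size_t i = 0; i < nr; i++) {
		int r = should_attempt_deltas_sketch(&entries[i]);

		if (r < 0)
			return -1;
		if (!r)
			continue;
		if (!entries[i].preferred_base)
			(*nr_deltas)++;
		out[n++] = &entries[i];
	}
	return (int)n;
}
```

Keeping the counter in the caller means a later caller can reuse the predicate on a different set of entries without disturbing the progress total.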
Signed-off-by: Derrick Stolee --- builtin/pack-objects.c | 53 +++++++++++++++++++++++------------------- 1 file changed, 29 insertions(+), 24 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0fc0680b402..82f4ca04000 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3167,6 +3167,33 @@ static int add_ref_tag(const char *tag UNUSED, const char *referent UNUSED, cons return 0; } +static int should_attempt_deltas(struct object_entry *entry) +{ + if (DELTA(entry)) + return 0; + + if (!entry->type_valid || + oe_size_less_than(&to_pack, entry, 50)) + return 0; + + if (entry->no_try_delta) + return 0; + + if (!entry->preferred_base) { + if (oe_type(entry) < 0) + die(_("unable to get type of object %s"), + oid_to_hex(&entry->idx.oid)); + } else if (oe_type(entry) < 0) { + /* + * This object is not found, but we + * don't have to include it anyway. + */ + return 0; + } + + return 1; +} + static void prepare_pack(int window, int depth) { struct object_entry **delta_list; @@ -3197,33 +3224,11 @@ static void prepare_pack(int window, int depth) for (i = 0; i < to_pack.nr_objects; i++) { struct object_entry *entry = to_pack.objects + i; - if (DELTA(entry)) - /* This happens if we decided to reuse existing - * delta from a pack. "reuse_delta &&" is implied. - */ - continue; - - if (!entry->type_valid || - oe_size_less_than(&to_pack, entry, 50)) + if (!should_attempt_deltas(entry)) continue; - if (entry->no_try_delta) - continue; - - if (!entry->preferred_base) { + if (!entry->preferred_base) nr_deltas++; - if (oe_type(entry) < 0) - die(_("unable to get type of object %s"), - oid_to_hex(&entry->idx.oid)); - } else { - if (oe_type(entry) < 0) { - /* - * This object is not found, but we - * don't have to include it anyway. 
- */ - continue; - } - } delta_list[n++] = entry; } From patchwork Sun Oct 20 13:43:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13843086
Message-Id: In-Reply-To: References: Date: Sun, 20 Oct 2024 13:43:21 +0000 Subject: [PATCH v2 08/17] pack-objects: add --path-walk option Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im, me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com, christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. The current implementation performs delta calculations while walking objects, which is not ideal for a few reasons. First, this will cause the "Enumerating objects" phase to be much longer than usual. Second, it does not take advantage of threading during the path-scoped delta calculations. Even with this lack of threading, the path-walk option is sometimes faster than the usual approach. Future changes will refactor this code to allow for threading, but that complexity is deferred until later to keep this patch as simple as possible.
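The name-hash collisions mentioned above can be made concrete with a small standalone sketch. The function below, `name_hash_sketch()`, is modeled on my understanding of the default name-hash (each non-whitespace character shifts the running value right by two bits and adds itself in the top byte); treat the exact constants as an assumption rather than a copy of git's code. `paths_collide()` is a hypothetical helper for the illustration. Because only the trailing characters of a path survive the shifting, two files with the same long suffix in different top-level directories can hash identically, whereas batching by full path keeps them apart.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of a name-hash in the style described above. Assumption: the
 * shift amounts (2 and 24) mirror the default name-hash; verify against
 * the git source before relying on them.
 */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	/* Later characters weigh more; early ones are shifted away. */
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/* Hypothetical helper: do two paths land in the same hash bucket? */
static int paths_collide(const char *a, const char *b)
{
	assert(a && b);
	return name_hash_sketch(a) == name_hash_sketch(b);
}
```

With this sketch, "a/src/main/module/util.c" and "b/src/main/module/util.c" produce the same value even though they are different paths, while "util.c" and "util.h" differ because the final character dominates the hash. A walk that groups objects by their full path never confuses the first pair.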
This new walk is incompatible with some features and is ignored by others: * Object filters, such as sparse-checkout or tree depth, are not currently integrated with the path-walk API. A blobless packfile could be integrated easily, but that is deferred for later. * Server-focused features such as delta islands, shallow packs, and using a bitmap index are incompatible with the path-walk API. * The path-walk API is only compatible with the --revs option, not taking object lists or pack lists over stdin. These alternative ways to specify the objects currently ignore the --path-walk option without even a warning. Future changes will create performance tests that demonstrate the power of this approach. Signed-off-by: Derrick Stolee --- Documentation/git-pack-objects.txt | 13 +- Documentation/technical/api-path-walk.txt | 3 +- builtin/pack-objects.c | 147 ++++++++++++++++++++-- t/t5300-pack-object.sh | 17 +++ 4 files changed, 169 insertions(+), 11 deletions(-) diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index e32404c6aae..f2fda800a43 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -15,7 +15,8 @@ SYNOPSIS [--revs [--unpacked | --all]] [--keep-pack=] [--cruft] [--cruft-expiration=