From patchwork Sun Dec 3 13:39:08 2023
X-Patchwork-Submitter: Han Young
X-Patchwork-Id: 13477332
From: Han Young
To: git@vger.kernel.org
Cc: Han Young
Subject: [RFC PATCH 1/4] symlinks: add and export threaded rmdir variants
Date: Sun, 3 Dec 2023 21:39:08 +0800
Message-ID: <20231203133911.41594-2-hanyoung@protonmail.com>
In-Reply-To: <20231203133911.41594-1-hanyoung@protonmail.com>
References: <20231203133911.41594-1-hanyoung@protonmail.com>

Add and export threaded variants of the directory-removal functions;
these functions will be used by parallel unlink.
---
Most of the code in threaded_schedule_dir_for_removal() and
threaded_do_remove_scheduled_dirs() is duplicated. We can remove the
duplication either by breaking the functions into smaller helpers, or
by passing the cache as a parameter. If we choose to pass the cache
explicitly, the default caches in both entry.c and symlinks.c probably
need to be moved to unpack-trees.c.

I'm not satisfied with using a mutex-guarded hashset to ensure every
dir is removed, but I can't come up with a better way.
 symlinks.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 symlinks.h |   6 +++
 2 files changed, 123 insertions(+), 3 deletions(-)

diff --git a/symlinks.c b/symlinks.c
index b29e340c2d..c8cb0a7eb7 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -2,9 +2,9 @@
 #include "gettext.h"
 #include "setup.h"
 #include "symlinks.h"
+#include "hashmap.h"
+#include "pthread.h"

-static int threaded_check_leading_path(struct cache_def *cache, const char *name,
-				       int len, int warn_on_lstat_err);
 static int threaded_has_dirs_only_path(struct cache_def *cache, const char *name,
 				       int len, int prefix_len);

 /*
@@ -229,7 +229,7 @@ int check_leading_path(const char *name, int len, int warn_on_lstat_err)
  * directory, or if we were unable to lstat() it. If warn_on_lstat_err is true,
  * also emit a warning for this error.
  */
-static int threaded_check_leading_path(struct cache_def *cache, const char *name,
+int threaded_check_leading_path(struct cache_def *cache, const char *name,
 				int len, int warn_on_lstat_err)
 {
 	int flags;
@@ -277,6 +277,51 @@ static int threaded_has_dirs_only_path(struct cache_def *cache, const char *name
 }

 static struct strbuf removal = STRBUF_INIT;
+static struct hashmap dir_set;
+pthread_mutex_t dir_set_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+struct rmdir_hash_entry {
+	struct hashmap_entry hash;
+	char *dir;
+	size_t dirlen;
+};
+
+/* rmdir_hashmap comparison function */
+static int rmdir_hash_entry_cmp(const void *cmp_data UNUSED,
+				const struct hashmap_entry *eptr,
+				const struct hashmap_entry *entry_or_key UNUSED,
+				const void *keydata)
+{
+	const struct rmdir_hash_entry *a, *b;
+
+	a = container_of(eptr, const struct rmdir_hash_entry, hash);
+	return strcmp(a->dir, (char *)keydata);
+}
+
+void threaded_init_remove_scheduled_dirs(void)
+{
+	unsigned flags = 0;
+	hashmap_init(&dir_set, rmdir_hash_entry_cmp, &flags, 0);
+}
+
+static void add_dir_to_rmdir_hash(char *dir, size_t dirlen)
+{
+	struct rmdir_hash_entry *e;
+	struct hashmap_entry *ent;
+	int hash = strhash(dir);
+	pthread_mutex_lock(&dir_set_mutex);
+	ent = hashmap_get_from_hash(&dir_set, hash, dir);
+
+	if (!ent) {
+		e = xmalloc(sizeof(struct rmdir_hash_entry));
+		hashmap_entry_init(&e->hash, hash);
+		char *_dir = xmallocz(dirlen);
+		memcpy(_dir, dir, dirlen + 1);
+		e->dir = _dir;
+		e->dirlen = dirlen;
+		hashmap_put_entry(&dir_set, e, hash);
+	}
+	pthread_mutex_unlock(&dir_set_mutex);
+}

 static void do_remove_scheduled_dirs(int new_len)
 {
@@ -294,6 +339,26 @@ static void do_remove_scheduled_dirs(int new_len)
 	removal.len = new_len;
 }

+
+static void threaded_do_remove_scheduled_dirs(int new_len, struct strbuf *removal)
+{
+	while (removal->len > new_len) {
+		removal->buf[removal->len] = '\0';
+		if (startup_info->original_cwd &&
+		    !strcmp(removal->buf, startup_info->original_cwd))
+			break;
+		if (rmdir(removal->buf)) {
+			add_dir_to_rmdir_hash(removal->buf, removal->len);
+			break;
+		}
+		do {
+			removal->len--;
+		} while (removal->len > new_len &&
+			 removal->buf[removal->len] != '/');
+	}
+	removal->len = new_len;
+}
+
 void schedule_dir_for_removal(const char *name, int len)
 {
 	int match_len, last_slash, i, previous_slash;
@@ -327,11 +392,60 @@ void schedule_dir_for_removal(const char *name, int len)
 	strbuf_add(&removal, &name[match_len], last_slash - match_len);
 }

+void threaded_schedule_dir_for_removal(const char *name, int len, struct strbuf *removal_cache)
+{
+	int match_len, last_slash, i, previous_slash;
+
+	if (startup_info->original_cwd &&
+	    !strcmp(name, startup_info->original_cwd))
+		return; /* Do not remove the current working directory */
+
+	match_len = last_slash = i =
+		longest_path_match(name, len, removal_cache->buf, removal_cache->len,
+				   &previous_slash);
+	/* Find last slash inside 'name' */
+	while (i < len) {
+		if (name[i] == '/')
+			last_slash = i;
+		i++;
+	}
+
+	/*
+	 * If we are about to go down the directory tree, we check if
+	 * we must first go upwards the tree, such that we then can
+	 * remove possible empty directories as we go upwards.
+	 */
+	if (match_len < last_slash && match_len < removal_cache->len)
+		threaded_do_remove_scheduled_dirs(match_len, removal_cache);
+	/*
+	 * If we go deeper down the directory tree, we only need to
+	 * save the new path components as we go down.
+	 */
+	if (match_len < last_slash)
+		strbuf_add(removal_cache, &name[match_len], last_slash - match_len);
+}
+
 void remove_scheduled_dirs(void)
 {
 	do_remove_scheduled_dirs(0);
 }

+void threaded_remove_scheduled_dirs_clean_up(void)
+{
+	struct hashmap_iter iter;
+	const struct rmdir_hash_entry *entry;
+
+	hashmap_for_each_entry(&dir_set, &iter, entry, hash /* member name */) {
+		schedule_dir_for_removal(entry->dir, entry->dirlen);
+	}
+	remove_scheduled_dirs();
+}
+
+void threaded_remove_scheduled_dirs(struct strbuf *removal_cache)
+{
+	threaded_do_remove_scheduled_dirs(0, removal_cache);
+}
+
 void invalidate_lstat_cache(void)
 {
 	reset_lstat_cache(&default_cache);
diff --git a/symlinks.h b/symlinks.h
index 7ae3d5b856..7898eae941 100644
--- a/symlinks.h
+++ b/symlinks.h
@@ -20,9 +20,15 @@ static inline void cache_def_clear(struct cache_def *cache)
 int has_symlink_leading_path(const char *name, int len);
 int threaded_has_symlink_leading_path(struct cache_def *, const char *, int);
 int check_leading_path(const char *name, int len, int warn_on_lstat_err);
+int threaded_check_leading_path(struct cache_def *cache, const char *name,
+				int len, int warn_on_lstat_err);
 int has_dirs_only_path(const char *name, int len, int prefix_len);
 void invalidate_lstat_cache(void);
 void schedule_dir_for_removal(const char *name, int len);
+void threaded_schedule_dir_for_removal(const char *name, int len, struct strbuf *removal_cache);
 void remove_scheduled_dirs(void);
+void threaded_remove_scheduled_dirs(struct strbuf *removal_cache);
+void threaded_init_remove_scheduled_dirs(void);
+void threaded_remove_scheduled_dirs_clean_up(void);

 #endif /* SYMLINKS_H */

From patchwork Sun Dec 3 13:39:09 2023
X-Patchwork-Submitter: Han Young
X-Patchwork-Id: 13477333
From: Han Young
To: git@vger.kernel.org
Cc: Han Young
Subject: [RFC PATCH 2/4] entry: add threaded_unlink_entry function
Date: Sun, 3 Dec 2023 21:39:09 +0800
Message-ID: <20231203133911.41594-3-hanyoung@protonmail.com>
In-Reply-To: <20231203133911.41594-1-hanyoung@protonmail.com>
References: <20231203133911.41594-1-hanyoung@protonmail.com>

Add a threaded_unlink_entry() function; the threaded function uses the
caches passed in as arguments instead of the default caches. It also
calls the threaded variant of schedule_dir_for_removal() to ensure
dirs are removed correctly during multithreaded unlink.
---
Another duplicated function. Because the default removal cache and the
default lstat cache live in different source files, the threaded
variants of check_leading_path() and schedule_dir_for_removal() must be
called here, instead of letting the callee choose between an explicit
or the default cache.
 entry.c | 16 ++++++++++++++++
 entry.h |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/entry.c b/entry.c
index 076e97eb89..04440beb2b 100644
--- a/entry.c
+++ b/entry.c
@@ -567,6 +567,22 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
 	return write_entry(ce, path.buf, ca, state, 0, nr_checkouts);
 }

+void threaded_unlink_entry(const struct cache_entry *ce, const char *super_prefix,
+			   struct strbuf *removal, struct cache_def *cache)
+{
+	const struct submodule *sub = submodule_from_ce(ce);
+	if (sub) {
+		/* state.force is set at the caller. */
+		submodule_move_head(ce->name, super_prefix, "HEAD", NULL,
+				    SUBMODULE_MOVE_HEAD_FORCE);
+	}
+	if (threaded_check_leading_path(cache, ce->name, ce_namelen(ce), 1) >= 0)
+		return;
+	if (remove_or_warn(ce->ce_mode, ce->name))
+		return;
+	threaded_schedule_dir_for_removal(ce->name, ce_namelen(ce), removal);
+}
+
 void unlink_entry(const struct cache_entry *ce, const char *super_prefix)
 {
 	const struct submodule *sub = submodule_from_ce(ce);
diff --git a/entry.h b/entry.h
index ca3ed35bc0..413ca3822d 100644
--- a/entry.h
+++ b/entry.h
@@ -2,6 +2,7 @@
 #define ENTRY_H

 #include "convert.h"
+#include "symlinks.h"

 struct cache_entry;
 struct index_state;
@@ -56,6 +57,8 @@ int finish_delayed_checkout(struct checkout *state, int show_progress);
  * down from "read-tree" et al.
  */
 void unlink_entry(const struct cache_entry *ce, const char *super_prefix);
+void threaded_unlink_entry(const struct cache_entry *ce, const char *super_prefix,
+			   struct strbuf *removal, struct cache_def *cache);
 void *read_blob_entry(const struct cache_entry *ce, size_t *size);
 int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st);

From patchwork Sun Dec 3 13:39:10 2023
X-Patchwork-Submitter: Han Young
X-Patchwork-Id: 13477334
From: Han Young
To: git@vger.kernel.org
Cc: Han Young
Subject: [RFC PATCH 3/4] parallel-checkout: add parallel_unlink
Date: Sun, 3 Dec 2023 21:39:10 +0800
Message-ID: <20231203133911.41594-4-hanyoung@protonmail.com>
In-Reply-To: <20231203133911.41594-1-hanyoung@protonmail.com>
References: <20231203133911.41594-1-hanyoung@protonmail.com>

Add parallel_unlink to parallel-checkout; it uses multiple threads to
unlink entries. Because the paths to be removed are sorted, each thread
iterates through the entry list in an interleaved fashion to distribute
the workload as evenly as possible.

Due to the multithreaded nature, it is not possible to remove all the
dirs in one pass: the dir one thread is about to remove may contain
items that are still being removed by another thread. Whenever we fail
to remove a dir, we save it in a hashset. When every thread has
finished its job, we remove all the entries in the hashset.
---
Note that we display progress after the threads join, so the progress
count is updated once per thread instead of once per path. During
testing, the threads almost always finished at around the same time,
which caused an abrupt progress update. We could use a mutex to display
the progress, but that nullifies the optimization on environments with
fast file deletion times.

 parallel-checkout.c | 80 +++++++++++++++++++++++++++++++++++++++++++++
 parallel-checkout.h | 25 ++++++++++++++
 2 files changed, 105 insertions(+)

diff --git a/parallel-checkout.c b/parallel-checkout.c
index b5a714c711..6e62e044d8 100644
--- a/parallel-checkout.c
+++ b/parallel-checkout.c
@@ -328,6 +328,24 @@ static int close_and_clear(int *fd)
 	return ret;
 }

+void *parallel_unlink_proc(void *_data)
+{
+	struct parallel_unlink_data *data = _data;
+	struct cache_def cache = CACHE_DEF_INIT;
+	int i = data->start;
+	data->cnt = 0;
+
+	while (i < data->len) {
+		const struct cache_entry *ce = data->cache[i];
+		if (ce->ce_flags & CE_WT_REMOVE) {
+			++data->cnt;
+			threaded_unlink_entry(ce, data->super_prefix, data->removal_cache, &cache);
+		}
+		i += data->step;
+	}
+	return &data->cnt;
+}
+
 void write_pc_item(struct parallel_checkout_item *pc_item,
 		   struct checkout *state)
 {
@@ -678,3 +696,65 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold
 	finish_parallel_checkout();
 	return ret;
 }
+
+unsigned run_parallel_unlink(struct index_state *index,
+			     struct progress *progress,
+			     const char *super_prefix, int num_workers, int threshold,
+			     unsigned cnt)
+{
+	int i, use_parallel = 0, errs = 0;
+	if (num_workers > 1 && index->cache_nr >= threshold) {
+		int unlink_cnt = 0;
+		for (i = 0; i < index->cache_nr; i++) {
+			const struct cache_entry *ce = index->cache[i];
+			if (ce->ce_flags & CE_WT_REMOVE) {
+				unlink_cnt++;
+			}
+		}
+		if (unlink_cnt >= threshold) {
+			use_parallel = 1;
+		}
+	}
+	if (use_parallel) {
+		struct parallel_unlink_data *unlink_data;
+		CALLOC_ARRAY(unlink_data, num_workers);
+		threaded_init_remove_scheduled_dirs();
+		struct strbuf removal_caches[num_workers];
+		for (i = 0; i < num_workers; i++) {
+			struct parallel_unlink_data *data = &unlink_data[i];
+			strbuf_init(&removal_caches[i], 50);
+			data->start = i;
+			data->cache = index->cache;
+			data->len = index->cache_nr;
+			data->step = num_workers;
+			data->super_prefix = super_prefix;
+			data->removal_cache = &removal_caches[i];
+			errs = pthread_create(&data->pthread, NULL, parallel_unlink_proc, data);
+			if (errs)
+				die(_("unable to create parallel_checkout thread: %s"), strerror(errs));
+		}
+		for (i = 0; i < num_workers; i++) {
+			void *t_cnt;
+			if (pthread_join(unlink_data[i].pthread, &t_cnt))
+				die("unable to join parallel_unlink_thread");
+			cnt += *((unsigned *)t_cnt);
+			display_progress(progress, cnt);
+		}
+		threaded_remove_scheduled_dirs_clean_up();
+		for (i = 0; i < num_workers; i++) {
+			threaded_remove_scheduled_dirs(&removal_caches[i]);
+		}
+		remove_marked_cache_entries(index, 0);
+	} else {
+		for (i = 0; i < index->cache_nr; i++) {
+			const struct cache_entry *ce = index->cache[i];
+			if (ce->ce_flags & CE_WT_REMOVE) {
+				display_progress(progress, ++cnt);
+				unlink_entry(ce, super_prefix);
+			}
+		}
+		remove_marked_cache_entries(index, 0);
+		remove_scheduled_dirs();
+	}
+	return cnt;
+}
diff --git a/parallel-checkout.h b/parallel-checkout.h
index c575284005..e851b773d9 100644
--- a/parallel-checkout.h
+++ b/parallel-checkout.h
@@ -43,6 +43,18 @@ size_t pc_queue_size(void);
 int run_parallel_checkout(struct checkout *state, int num_workers, int threshold,
 			  struct progress *progress, unsigned int *progress_cnt);

+/*
+ * Unlink all the unlink entries in the index, returning the number of entries
+ * unlinked plus the original value of cnt. If the number of entries
+ * to be removed is smaller than the specified threshold, the operation
+ * is performed sequentially.
+ */
+unsigned run_parallel_unlink(struct index_state *index,
+			     struct progress *progress,
+			     const char *super_prefix,
+			     int num_workers, int threshold,
+			     unsigned cnt);
+
 /****************************************************************
  * Interface with checkout--worker
  ****************************************************************/
@@ -76,6 +88,19 @@ struct parallel_checkout_item {
 	struct stat st;
 };

+struct parallel_unlink_data {
+	pthread_t pthread;
+	struct cache_entry **cache;
+	struct strbuf *removal_cache;
+	size_t len;
+	int start;
+	size_t step;
+	unsigned cnt;
+	const char *super_prefix;
+};
+
+void *parallel_unlink_proc(void *_data);
+
 /*
  * The fixed-size portion of `struct parallel_checkout_item` that is sent to the
  * workers. Following this will be 2 strings: ca.working_tree_encoding and
From patchwork Sun Dec 3 13:39:11 2023
X-Patchwork-Submitter: Han Young
X-Patchwork-Id: 13477335
From: Han Young
To: git@vger.kernel.org
Cc: Han Young
Subject: [RFC PATCH 4/4] unpack-trees: introduce parallel_unlink
Date: Sun, 3 Dec 2023 21:39:11 +0800
Message-ID: <20231203133911.41594-5-hanyoung@protonmail.com>
In-Reply-To: <20231203133911.41594-1-hanyoung@protonmail.com>
References: <20231203133911.41594-1-hanyoung@protonmail.com>

We have had the parallel_checkout option since 04155bdad, but unlink is
still executed single-threaded. On very large repos, a checkout across
a directory-rename or restructuring commit can lead to a large number
of unlinked entries. In some instances, the unlink operation can be
slower than the parallel checkout. This commit adds parallel unlink
support; parallel unlink uses multithreaded removal of entries.
---
The unlink operation by itself is far faster than checkout, so the
default threshold should be far higher than parallel_checkout's. I
hardcoded the threshold to be 100 times higher; we probably need to
introduce a new config option with a sensible default.

Discovering how many entries need to be removed requires iterating over
index->cache, but this is fast even for a large number of entries
compared to the filesystem operations. I think we can reuse
checkout.workers as the main switch for parallel_unlink, since it is
also part of the checkout process.
 unpack-trees.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index c2b20b80d5..53589cde8a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -452,17 +452,8 @@ static int check_updates(struct unpack_trees_options *o,
 	if (should_update_submodules())
 		load_gitmodules_file(index, NULL);

-	for (i = 0; i < index->cache_nr; i++) {
-		const struct cache_entry *ce = index->cache[i];
-
-		if (ce->ce_flags & CE_WT_REMOVE) {
-			display_progress(progress, ++cnt);
-			unlink_entry(ce, o->super_prefix);
-		}
-	}
-
-	remove_marked_cache_entries(index, 0);
-	remove_scheduled_dirs();
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+	cnt = run_parallel_unlink(index, progress, o->super_prefix, pc_workers, pc_threshold * 100, cnt);

 	if (should_update_submodules())
 		load_gitmodules_file(index, &state);
@@ -474,8 +465,6 @@ static int check_updates(struct unpack_trees_options *o,
 	 */
 	prefetch_cache_entries(index, must_checkout);

-	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
-
 	enable_delayed_checkout(&state);
 	if (pc_workers > 1)
 		init_parallel_checkout();