From patchwork Tue Mar 5 01:21:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Atneya Nair X-Patchwork-Id: 13581411 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D32812B71 for ; Tue, 5 Mar 2024 01:22:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709601750; cv=none; b=kedOyXOJXoHSQeZFNNfOgI4KWcgxq1eDDN+Rh5LtzZ6uaLXOFcMI8iI6FGnVhyA2Wi7B1SPf0UQFrLqhnI0cAPYek1qXg/b2FFP6RmdkW94jZVvAsixdmZ/w0XxvO8b+Y54Fs1tgAfv0rT37Azn6eG4Oz7isAgEeA3CCDvCvZqI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709601750; c=relaxed/simple; bh=3c2ZLP4ElaTIWQ1MwsygQRdMHcyYe/0LHge4IZg7Xzk=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=udLC72xa+t/0iFBjwuAj4AOtA7Y+r/l9YbLaMdhivWCkWZ2Lxfc+5NVfmble/mWcQvC0dDAHa2vLnP3A5KRLTpi1BkuaD+lTBbiN2Qpd98+wu//o8mN8HE6n8kbGQJeHg82FDMsFwkd4e7zHJYK1f2JyRp2zESqKnU2BcVWNMt8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--atneya.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=uYrZneq5; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--atneya.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="uYrZneq5" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-60987370f06so40320057b3.1 for ; Mon, 04 Mar 2024 17:22:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1709601747; x=1710206547; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=b1iKt0S10KykeU5net5hBhrTTpSl/3gXUvECXqShjT8=; b=uYrZneq5wfwWBhYypxcnyyu6xPZcFGO9vJSuvPhlEadoZC/D7OgkxX+nSzM4yTt3DX yU6GVILRxD2BX47h61+pKQMQsp9JZQaOyuHqM267tMpS8knI19nSzRW250jiUYayx/dw U6jMId75XdK/XpKbCanxEKWpnshC8NMwrg4P3ZvWoaISrI6jelePEvKfu8BHJTnDUl6j v9BvslSRUADjlJ9TlX3pYxVKO/jqSqV7sPlZD0Nt3BaTtqOOJquV/HQgJKdlmaHOdalf gGzrpx+Qta1hRgClK7opjwgtAAZoKvvELL3a0s/bOXs2sQtrl4+g2gB48C08jyGLLAP8 ZSqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709601747; x=1710206547; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=b1iKt0S10KykeU5net5hBhrTTpSl/3gXUvECXqShjT8=; b=h0H2OrCjxElWKpgIWOdOPla6pk+6L8VomwkzoAqcHC5RPUaHIxFYTE+QK9pvQgZ2IH JVAjiw6MisRqWsai+/TSM99LvMFVzuCTFPl9nn90Oji8qBiOAAK3+at6kGK+5zz7mrzO yT/L0qI8x0Wyp6SOlPsoQDX+Ls34Ch7il4xwAjYfAR/TicKaOjbOoebqavx8ZVz2afWj aIfakIYwV8PovsV4qfD55PLj+KNaXg4ZTsaTu8a6MSFtxladwqKy8cHo28DTiQ9cM1lh 5vNLgzeihsQTyWyxXtqjrNyb+cf35A6miOFpqxkuC4g+56GiBG3cJQ0a/BIesnRRzn7/ wd7A== X-Gm-Message-State: AOJu0Yy52pRQ03zKysEieHjCkegclZWQkwVb0vBKBByFiFFnRpV3+wyv mTesGCUoP+a4Hl96cF4/MmkGd7BhQkVt9YUV9wDlyqkCVgeNr/vv8SdObLlB45ySL9vjKIjWSDt PX4B/Y3/J31Y0Dd2OgrDvbG6RYDzAThRGkQp3ePl/Cb/3nY2GSietDdSDTZ2nCPJVUCtfcybnNO wrJThzjunmnK2jAAFWTDnSo4yA0F2F X-Google-Smtp-Source: AGHT+IHpJWsnObp1bmiCcGCqpFhXm6cSqGmTUUJEU5wTdWf7yhnoQyv4pLG0o6XzVB8bt1+mCfgVJOyoGxg= X-Received: from capy.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:151e]) (user=atneya job=sendgmr) by 2002:a05:690c:3387:b0:608:d499:3439 with SMTP id fl7-20020a05690c338700b00608d4993439mr2260950ywb.7.1709601746938; Mon, 04 Mar 2024 17:22:26 -0800 (PST) Date: Mon, 4 Mar 2024 17:21:11 -0800 In-Reply-To: <20240305012112.1598053-2-atneya@google.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240305012112.1598053-2-atneya@google.com> X-Mailer: git-send-email 2.44.0.rc1.240.g4c46232300-goog Message-ID: <20240305012112.1598053-3-atneya@google.com> Subject: [RFC PATCH 1/3] Make read_gitfile and resolve_gitfile thread safe From: Atneya Nair To: git@vger.kernel.org Cc: gitster@pobox.com, jeffhost@microsoft.com, me@ttaylorr.com, nasamuffin@google.com, Atneya Nair Continue the work in commit: 3d7747e318532a36a263c61cdf92f2decb6424ff to remove unsafe shared buffer usage in read_gitfile_gently. Migrate read_gitfile_gently and resolve_gitfile_gently to take strbuf out params to allocate their return values into, rather than returning a view into a shared buffer. Leave the shared buffer in case a caller passes null for this param (for cases we haven't cleaned up yet). Migrate callers of resolve_gitfile to resolve_gitfile_gently. Signed-off-by: Atneya Nair --- Notes: Open questions: - Is checking the return value of read_gitfile necessary if on error, we are supposed to die, or set the error field to a non-zero value? - Should we clean up the other call-sites of read_gitfile? builtin/init-db.c | 7 ++++--- builtin/rev-parse.c | 4 +++- repository.c | 9 +++++---- setup.c | 36 +++++++++++++++++++++++++----------- setup.h | 7 +++---- submodule.c | 32 +++++++++++++++++++++++--------- worktree.c | 27 +++++++++++++-------------- 7 files changed, 76 insertions(+), 46 deletions(-) diff --git a/builtin/init-db.c b/builtin/init-db.c index 0170469b84..9135d07a0d 100644 --- a/builtin/init-db.c +++ b/builtin/init-db.c @@ -198,11 +198,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix) */ if (real_git_dir) { int err; - const char *p; struct strbuf sb = STRBUF_INIT; + struct strbuf gitfile = STRBUF_INIT; - p = read_gitfile_gently(git_dir, &err); - if (p && get_common_dir(&sb, p)) { + read_gitfile_gently(git_dir, &err, &gitfile); + if (!err && get_common_dir(&sb, gitfile.buf)) { struct strbuf mainwt = STRBUF_INIT; strbuf_addbuf(&mainwt, &sb); @@ -213,6 +213,7 @@ int cmd_init_db(int argc, const char **argv, const char *prefix) git_dir = strbuf_detach(&sb, NULL); } strbuf_release(&sb); + strbuf_release(&gitfile); } if (is_bare_repository_cfg < 0) diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index d08987646a..e1db6b3231 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -728,12 +728,14 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) } if (!strcmp(arg, "--resolve-git-dir")) { const char *gitdir = argv[++i]; + struct strbuf resolved_gitdir = STRBUF_INIT; if (!gitdir) die(_("--resolve-git-dir requires an argument")); - gitdir = resolve_gitdir(gitdir); + gitdir = resolve_gitdir_gently(gitdir, NULL, &resolved_gitdir); if (!gitdir) die(_("not a gitdir '%s'"), argv[i]); puts(gitdir); + strbuf_release(&resolved_gitdir); continue; } } diff --git a/repository.c b/repository.c index 7aacb51b65..3ca6dbcf16 100644 --- a/repository.c +++ b/repository.c @@ -118,7 +118,7 @@ static int repo_init_gitdir(struct repository *repo, const char *gitdir) int ret = 0; int error = 0; char *abspath = NULL; - const char *resolved_gitdir; + struct strbuf resolved_gitdir = STRBUF_INIT; struct set_gitdir_args args = { NULL }; abspath = real_pathdup(gitdir, 0); @@ -128,15 +128,16 @@ static int repo_init_gitdir(struct repository *repo, const char *gitdir) } /* 'gitdir' must reference the gitdir directly */ - resolved_gitdir = resolve_gitdir_gently(abspath, &error); - if (!resolved_gitdir) { + resolve_gitdir_gently(abspath, &error, &resolved_gitdir); + if (error) { ret = -1; goto out; } - repo_set_gitdir(repo, resolved_gitdir, &args); + repo_set_gitdir(repo, resolved_gitdir.buf, &args); out: + strbuf_release(&resolved_gitdir); free(abspath); return ret; } diff --git a/setup.c b/setup.c index b69b1cbc2a..2e118cf216 100644 --- a/setup.c +++ b/setup.c @@ -397,14 +397,17 @@ int is_nonbare_repository_dir(struct strbuf *path) int ret = 0; int gitfile_error; size_t orig_path_len = path->len; + struct strbuf gitfile = STRBUF_INIT; + assert(orig_path_len != 0); strbuf_complete(path, '/'); strbuf_addstr(path, ".git"); - if (read_gitfile_gently(path->buf, &gitfile_error) || is_git_directory(path->buf)) + if (read_gitfile_gently(path->buf, &gitfile_error, &gitfile) || is_git_directory(path->buf)) ret = 1; if (gitfile_error == READ_GITFILE_ERR_OPEN_FAILED || gitfile_error == READ_GITFILE_ERR_READ_FAILED) ret = 1; + strbuf_release(&gitfile); strbuf_setlen(path, orig_path_len); return ret; } @@ -830,7 +833,8 @@ void read_gitfile_error_die(int error_code, const char *path, const char *dir) /* * Try to read the location of the git directory from the .git file, - * return path to git directory if found. The return value comes from + * return path to git directory if found. If passed a valid strbuf, the return + * value is is a ptr to within the buffer. If strbuf is null, the return value comes from * a shared buffer. * * On failure, if return_error_code is not NULL, return_error_code @@ -838,7 +842,7 @@ void read_gitfile_error_die(int error_code, const char *path, const char *dir) * return_error_code is NULL the function will die instead (for most * cases). */ -const char *read_gitfile_gently(const char *path, int *return_error_code) +const char *read_gitfile_gently(const char *path, int *return_error_code, struct strbuf* result_buf) { const int max_file_size = 1 << 20; /* 1MB */ int error_code = 0; @@ -848,7 +852,10 @@ const char *read_gitfile_gently(const char *path, int *return_error_code) struct stat st; int fd; ssize_t len; - static struct strbuf realpath = STRBUF_INIT; + static struct strbuf shared = STRBUF_INIT; + if (!result_buf) { + result_buf = &shared; + } if (stat(path, &st)) { /* NEEDSWORK: discern between ENOENT vs other errors */ @@ -900,8 +907,10 @@ const char *read_gitfile_gently(const char *path, int *return_error_code) goto cleanup_return; } - strbuf_realpath(&realpath, dir, 1); - path = realpath.buf; + strbuf_realpath(result_buf, dir, 1); + path = result_buf->buf; + // TODO is this valid? + if (!path) die(_("Unexpected null from realpath '%s'"), dir); cleanup_return: if (return_error_code) @@ -1316,12 +1325,13 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir, int offset = dir->len, error_code = 0; char *gitdir_path = NULL; char *gitfile = NULL; + struct strbuf gitdirenvbuf = STRBUF_INIT; if (offset > min_offset) strbuf_addch(dir, '/'); strbuf_addstr(dir, DEFAULT_GIT_DIR_ENVIRONMENT); gitdirenv = read_gitfile_gently(dir->buf, die_on_error ? - NULL : &error_code); + NULL : &error_code, &gitdirenvbuf); if (!gitdirenv) { if (die_on_error || error_code == READ_GITFILE_ERR_NOT_A_FILE) { @@ -1330,8 +1340,10 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir, gitdirenv = DEFAULT_GIT_DIR_ENVIRONMENT; gitdir_path = xstrdup(dir->buf); } - } else if (error_code != READ_GITFILE_ERR_STAT_FAILED) + } else if (error_code != READ_GITFILE_ERR_STAT_FAILED) { + strbuf_release(&gitdirenvbuf); return GIT_DIR_INVALID_GITFILE; + } } else gitfile = xstrdup(dir->buf); /* @@ -1365,9 +1377,10 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir, */ free(gitdir_path); free(gitfile); - + strbuf_release(&gitdirenvbuf); return ret; } + strbuf_release(&gitdirenvbuf); if (is_git_directory(dir->buf)) { trace2_data_string("setup", NULL, "implicit-bare-repository", dir->buf); @@ -1692,11 +1705,12 @@ const char *setup_git_directory(void) return setup_git_directory_gently(NULL); } -const char *resolve_gitdir_gently(const char *suspect, int *return_error_code) +const char *resolve_gitdir_gently(const char *suspect, int *return_error_code, + struct strbuf* return_buf) { if (is_git_directory(suspect)) return suspect; - return read_gitfile_gently(suspect, return_error_code); + return read_gitfile_gently(suspect, return_error_code, return_buf); } /* if any standard file descriptor is missing open it to /dev/null */ diff --git a/setup.h b/setup.h index 3599aec93c..cf5a63762b 100644 --- a/setup.h +++ b/setup.h @@ -36,10 +36,9 @@ int is_nonbare_repository_dir(struct strbuf *path); #define READ_GITFILE_ERR_NOT_A_REPO 7 #define READ_GITFILE_ERR_TOO_LARGE 8 void read_gitfile_error_die(int error_code, const char *path, const char *dir); -const char *read_gitfile_gently(const char *path, int *return_error_code); -#define read_gitfile(path) read_gitfile_gently((path), NULL) -const char *resolve_gitdir_gently(const char *suspect, int *return_error_code); -#define resolve_gitdir(path) resolve_gitdir_gently((path), NULL) +const char *read_gitfile_gently(const char *path, int *return_error_code, struct strbuf *buf); +#define read_gitfile(path) read_gitfile_gently((path), NULL, NULL) +const char *resolve_gitdir_gently(const char *suspect, int *return_error_code, struct strbuf *buf); void setup_work_tree(void); diff --git a/submodule.c b/submodule.c index 213da79f66..455444321b 100644 --- a/submodule.c +++ b/submodule.c @@ -316,9 +316,10 @@ int is_submodule_populated_gently(const char *path, int *return_error_code) int ret = 0; char *gitdir = xstrfmt("%s/.git", path); - if (resolve_gitdir_gently(gitdir, return_error_code)) + struct strbuf resolved_gitdir_buf = STRBUF_INIT; + if (resolve_gitdir_gently(gitdir, return_error_code, &resolved_gitdir_buf)) ret = 1; - + strbuf_release(&resolved_gitdir_buf); free(gitdir); return ret; } @@ -1879,22 +1880,25 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; struct strbuf buf = STRBUF_INIT; + struct strbuf gitdirbuf = STRBUF_INIT; FILE *fp; unsigned dirty_submodule = 0; const char *git_dir; int ignore_cp_exit_code = 0; strbuf_addf(&buf, "%s/.git", path); - git_dir = read_gitfile(buf.buf); + git_dir = read_gitfile_gently(buf.buf, NULL, &gitdirbuf); if (!git_dir) git_dir = buf.buf; if (!is_git_directory(git_dir)) { if (is_directory(git_dir)) die(_("'%s' not recognized as a git repository"), git_dir); strbuf_release(&buf); + strbuf_release(&gitdirbuf); /* The submodule is not checked out, so it is not modified */ return 0; } + strbuf_release(&gitdirbuf); strbuf_reset(&buf); strvec_pushl(&cp.args, "status", "--porcelain=2", NULL); @@ -1958,15 +1962,16 @@ int submodule_uses_gitfile(const char *path) { struct child_process cp = CHILD_PROCESS_INIT; struct strbuf buf = STRBUF_INIT; - const char *git_dir; + struct strbuf gitfilebuf = STRBUF_INIT; strbuf_addf(&buf, "%s/.git", path); - git_dir = read_gitfile(buf.buf); - if (!git_dir) { + read_gitfile_gently(buf.buf, NULL, &gitfilebuf); + if (!gitfilebuf.buf) { strbuf_release(&buf); return 0; } strbuf_release(&buf); + strbuf_release(&gitfilebuf); /* Now test that all nested submodules use a gitfile too */ strvec_pushl(&cp.args, @@ -2276,6 +2281,7 @@ static void relocate_single_git_dir_into_superproject(const char *path, { char *old_git_dir = NULL, *real_old_git_dir = NULL, *real_new_git_dir = NULL; struct strbuf new_gitdir = STRBUF_INIT; + struct strbuf gitfilebuf = STRBUF_INIT; const struct submodule *sub; if (submodule_uses_worktrees(path)) @@ -2283,9 +2289,12 @@ static void relocate_single_git_dir_into_superproject(const char *path, "more than one worktree not supported"), path); old_git_dir = xstrfmt("%s/.git", path); - if (read_gitfile(old_git_dir)) + if (read_gitfile_gently(old_git_dir, NULL, &gitfilebuf)) { /* If it is an actual gitfile, it doesn't need migration. */ + strbuf_release(&gitfilebuf); return; + } + strbuf_release(&gitfilebuf); real_old_git_dir = real_pathdup(old_git_dir, 1); @@ -2343,8 +2352,9 @@ void absorb_git_dir_into_superproject(const char *path, int err_code; const char *sub_git_dir; struct strbuf gitdir = STRBUF_INIT; + struct strbuf resolved_gitdir_buf = STRBUF_INIT; strbuf_addf(&gitdir, "%s/.git", path); - sub_git_dir = resolve_gitdir_gently(gitdir.buf, &err_code); + sub_git_dir = resolve_gitdir_gently(gitdir.buf, &err_code, &resolved_gitdir_buf); /* Not populated? */ if (!sub_git_dir) { @@ -2385,6 +2395,8 @@ void absorb_git_dir_into_superproject(const char *path, free(real_sub_git_dir); free(real_common_git_dir); } + + strbuf_release(&resolved_gitdir_buf); strbuf_release(&gitdir); absorb_git_dir_into_superproject_recurse(path, super_prefix); @@ -2484,17 +2496,19 @@ int submodule_to_gitdir(struct strbuf *buf, const char *submodule) const struct submodule *sub; const char *git_dir; int ret = 0; + struct strbuf gitfilebuf = STRBUF_INIT; strbuf_reset(buf); strbuf_addstr(buf, submodule); strbuf_complete(buf, '/'); strbuf_addstr(buf, ".git"); - git_dir = read_gitfile(buf->buf); + git_dir = read_gitfile_gently(buf->buf, NULL, &gitfilebuf); if (git_dir) { strbuf_reset(buf); strbuf_addstr(buf, git_dir); } + strbuf_release(&gitfilebuf); if (!is_git_directory(buf->buf)) { sub = submodule_from_path(the_repository, null_oid(), submodule); diff --git a/worktree.c b/worktree.c index b02a05a74a..a6f125c8da 100644 --- a/worktree.c +++ b/worktree.c @@ -309,7 +309,7 @@ int validate_worktree(const struct worktree *wt, struct strbuf *errmsg, { struct strbuf wt_path = STRBUF_INIT; struct strbuf realpath = STRBUF_INIT; - char *path = NULL; + struct strbuf gitfile = STRBUF_INIT; int err, ret = -1; strbuf_addf(&wt_path, "%s/.git", wt->path); @@ -353,21 +353,20 @@ int validate_worktree(const struct worktree *wt, struct strbuf *errmsg, goto done; } - path = xstrdup_or_null(read_gitfile_gently(wt_path.buf, &err)); - if (!path) { + if (!read_gitfile_gently(wt_path.buf, &err, &gitfile)) { strbuf_addf_gently(errmsg, _("'%s' is not a .git file, error code %d"), wt_path.buf, err); goto done; } strbuf_realpath(&realpath, git_common_path("worktrees/%s", wt->id), 1); - ret = fspathcmp(path, realpath.buf); + ret = fspathcmp(gitfile.buf, realpath.buf); if (ret) strbuf_addf_gently(errmsg, _("'%s' does not point back to '%s'"), wt->path, git_common_path("worktrees/%s", wt->id)); done: - free(path); + strbuf_release(&gitfile); strbuf_release(&wt_path); strbuf_release(&realpath); return ret; @@ -567,7 +566,7 @@ static void repair_gitfile(struct worktree *wt, { struct strbuf dotgit = STRBUF_INIT; struct strbuf repo = STRBUF_INIT; - char *backlink; + struct strbuf backlink = STRBUF_INIT; const char *repair = NULL; int err; @@ -582,13 +581,13 @@ static void repair_gitfile(struct worktree *wt, strbuf_realpath(&repo, git_common_path("worktrees/%s", wt->id), 1); strbuf_addf(&dotgit, "%s/.git", wt->path); - backlink = xstrdup_or_null(read_gitfile_gently(dotgit.buf, &err)); + read_gitfile_gently(dotgit.buf, &err, &backlink); if (err == READ_GITFILE_ERR_NOT_A_FILE) fn(1, wt->path, _(".git is not a file"), cb_data); else if (err) repair = _(".git file broken"); - else if (fspathcmp(backlink, repo.buf)) + else if (fspathcmp(backlink.buf, repo.buf)) repair = _(".git file incorrect"); if (repair) { @@ -596,7 +595,7 @@ static void repair_gitfile(struct worktree *wt, write_file(dotgit.buf, "gitdir: %s", repo.buf); } - free(backlink); + strbuf_release(&backlink); strbuf_release(&repo); strbuf_release(&dotgit); } @@ -685,7 +684,7 @@ void repair_worktree_at_path(const char *path, struct strbuf realdotgit = STRBUF_INIT; struct strbuf gitdir = STRBUF_INIT; struct strbuf olddotgit = STRBUF_INIT; - char *backlink = NULL; + struct strbuf backlink = STRBUF_INIT; const char *repair = NULL; int err; @@ -701,12 +700,12 @@ void repair_worktree_at_path(const char *path, goto done; } - backlink = xstrdup_or_null(read_gitfile_gently(realdotgit.buf, &err)); + read_gitfile_gently(realdotgit.buf, &err, &backlink); if (err == READ_GITFILE_ERR_NOT_A_FILE) { fn(1, realdotgit.buf, _("unable to locate repository; .git is not a file"), cb_data); goto done; } else if (err == READ_GITFILE_ERR_NOT_A_REPO) { - if (!(backlink = infer_backlink(realdotgit.buf))) { + if (!(backlink.buf = infer_backlink(realdotgit.buf))) { fn(1, realdotgit.buf, _("unable to locate repository; .git file does not reference a repository"), cb_data); goto done; } @@ -715,7 +714,7 @@ void repair_worktree_at_path(const char *path, goto done; } - strbuf_addf(&gitdir, "%s/gitdir", backlink); + strbuf_addf(&gitdir, "%s/gitdir", backlink.buf); if (strbuf_read_file(&olddotgit, gitdir.buf, 0) < 0) repair = _("gitdir unreadable"); else { @@ -729,7 +728,7 @@ void repair_worktree_at_path(const char *path, write_file(gitdir.buf, "%s", realdotgit.buf); } done: - free(backlink); + strbuf_release(&backlink); strbuf_release(&olddotgit); strbuf_release(&gitdir); strbuf_release(&realdotgit); From patchwork Tue Mar 5 01:21:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Atneya Nair X-Patchwork-Id: 13581412 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2D0B85664 for ; Tue, 5 Mar 2024 01:22:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709601751; cv=none; b=P1j/wTtDMi8qVLWiTpB9LAyEHimUf/w9rGOS8h1yeZJhkHr7jyQZa7s3yEnBIHc0Ficu35rkdJ2yhj8laZHKEJWpG9ZZGaZfxn64O646qYFtT3OYoWqMjG7tMjYmC3kyJIsA2qWPHhwEGs9CqTBC3COE8HOLG2h77+GJahiaiX4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709601751; c=relaxed/simple; bh=G+oW77TjnTjnTh+vSrJtmK7X607SJcN0eyPP69LQnzc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=vCIY5mdwwlfnKvHehW7OvnofjW0/VFUlHF6YnoonBzLJ4IyukhLhWRNwNt1hLPZUfOKBgBgAEKhBUQ0C/hO78tOgFdwxgT5gbN4PJji+a4P/Yl8Z9UmrBPtbSfnZj2ow5YqvrhXB7LjOQcjqwrZm1KOqwKCfFoNPib8WqdsWbS4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--atneya.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=u8lhISkm; arc=none smtp.client-ip=209.85.219.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--atneya.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="u8lhISkm" Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-dbe9e13775aso8809633276.1 for ; Mon, 04 Mar 2024 17:22:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1709601749; x=1710206549; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Mr9pwK0AlYnn4aBV1+s0A2I/JGdMoYX3va1Z9Jyvnlc=; b=u8lhISkm9YGUKkITp6tuklXimSXzTU33TvIMezXhTu+bWAbyzBBxowBp4MLiTkJypt Kzvw/CYWjiOZcuGZfmEAkQVi4mMbHgGo73EfkSUVNOah/Dtd38DYkVi08nNsygDmaYhP SGS1pSgazXgVu4eu7gY+C2vBRKWJ4NcMAbfS9xKS6u2nYQRzkDW7Bt/ki9MNOFExaQUJ 2nQFZtxBcgAAduMDqZbM7b9NZXSX4shVJCngXuHKNEGDK8HKZOUxvWQzd4ZCiFYCYIWr l7gzyoRe5PsW7cyfWLwrkBFyB6tgSRYYYHNsdNQUwjFic7mMMITmPb37qpOm1nvL7Jkc aXUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709601749; x=1710206549; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Mr9pwK0AlYnn4aBV1+s0A2I/JGdMoYX3va1Z9Jyvnlc=; b=xGUTxGoUVVftdvOh3L9E+gRgydNaK+2+SJUw9Rl5kTRFbKScYSRD32/hsePfInPLjj TdEH3haOfHTqMGdfRyo5tOK26U2UjG4NVjrb12zv8LYxqeKoWd08AcfKqlZOVD21+0Wp HcmV+aHhc9Ps2KlTBvz7JwCYGm9drBGC9m/WK+Ry42BHEeFZfZBjrIJ3DcSi3vCAZm9V jZttdjDJlN2udIqoTwaerscs/Q0bm8keYSbnB98WgJenUSPsPAS2yNsfYmKLKhKiN3Lk 80FujHeIjnPZYoAUqLVodINW9idUw9p8pRt77PsxUgmKhITtoU035o7bgWnfZj4+S+jL B7Zg== X-Gm-Message-State: AOJu0YysaRy4gSTOJm1HEBjh5RRpmlTjD+VHzpvbqN6kU0tZ7oYLv8Cv HJ5n1zmfJ3B45kYkdLxsX6SWVzGv3yXPlUKMIWa35LJhGKi+lj78W3MiwXbHHAZ8myme86EgDEt a8xIVTdJ8cW5J8t0C2mkfrCdBtny914PC/Wz+JnYpzO7+JIcM+htpunccHNxsord03T+aRyAD0K 8y8UDOCg4bnQFei6yOcojY8WpaJeFM X-Google-Smtp-Source: AGHT+IE+5oyMXQ9aElIIb+gB22osFL1vRB3oAyBcUysgiCFZNEmJA8gzuOtVY8nbMrSDjgogbbls0p7KC3c= X-Received: from capy.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:151e]) (user=atneya job=sendgmr) by 2002:a05:6902:1885:b0:dc6:207e:e8b1 with SMTP id cj5-20020a056902188500b00dc6207ee8b1mr2731068ybb.2.1709601749051; Mon, 04 Mar 2024 17:22:29 -0800 (PST) Date: Mon, 4 Mar 2024 17:21:12 -0800 In-Reply-To: <20240305012112.1598053-2-atneya@google.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240305012112.1598053-2-atneya@google.com> X-Mailer: git-send-email 2.44.0.rc1.240.g4c46232300-goog Message-ID: <20240305012112.1598053-4-atneya@google.com> Subject: [RFC PATCH 2/3] Make ce_compare_gitlink thread-safe From: Atneya Nair To: git@vger.kernel.org Cc: gitster@pobox.com, jeffhost@microsoft.com, me@ttaylorr.com, nasamuffin@google.com, Atneya Nair To enable parallel update of the read cache for submodules, ce_compare_gitlink must be thread safe (for different objects). Remove string interning in do_config_from (called from repo_submodule_init) and add locking around accessing the ref_store_map. Signed-off-by: Atneya Nair --- Notes: Chasing down thread unsafe code was done using tsan. Open questions: - Is there any additional thread-unsafety that was missed? - Is it safe to strdup in do_config_from (do the filenames need static lifetime)? What is the performance implication (it seems small)? - Is the locking around ref_store_map appropriate and/or can it be made more granular? config.c | 3 ++- config.h | 2 +- refs.c | 9 +++++++++ 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/config.c b/config.c index 3cfeb3d8bd..d7f73d8745 100644 --- a/config.c +++ b/config.c @@ -1017,7 +1017,7 @@ static void kvi_from_source(struct config_source *cs, enum config_scope scope, struct key_value_info *out) { - out->filename = strintern(cs->name); + out->filename = strdup(cs->name); out->origin_type = cs->origin_type; out->linenr = cs->linenr; out->scope = scope; @@ -1857,6 +1857,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn, strbuf_release(&top->value); strbuf_release(&top->var); + free(kvi.filename); return ret; } diff --git a/config.h b/config.h index 5dba984f77..b78f1b6667 100644 --- a/config.h +++ b/config.h @@ -118,7 +118,7 @@ struct config_options { /* Config source metadata for a given config key-value pair */ struct key_value_info { - const char *filename; + char *filename; int linenr; enum config_origin_type origin_type; enum config_scope scope; diff --git a/refs.c b/refs.c index c633abf284..cce8a31b22 100644 --- a/refs.c +++ b/refs.c @@ -2126,6 +2126,9 @@ struct ref_store *get_submodule_ref_store(const char *submodule) size_t len; struct repository *subrepo; + // TODO is this locking tolerable, and/or can we get any finer + static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; + if (!submodule) return NULL; @@ -2139,7 +2142,9 @@ struct ref_store *get_submodule_ref_store(const char *submodule) /* We need to strip off one or more trailing slashes */ submodule = to_free = xmemdupz(submodule, len); + pthread_mutex_lock(&lock); refs = lookup_ref_store_map(&submodule_ref_stores, submodule); + pthread_mutex_unlock(&lock); if (refs) goto done; @@ -2162,10 +2167,14 @@ struct ref_store *get_submodule_ref_store(const char *submodule) free(subrepo); goto done; } + + pthread_mutex_lock(&lock); + // TODO maybe lock this separately refs = ref_store_init(subrepo, submodule_sb.buf, REF_STORE_READ | REF_STORE_ODB); register_ref_store_map(&submodule_ref_stores, "submodule", refs, submodule); + pthread_mutex_unlock(&lock); done: strbuf_release(&submodule_sb); From patchwork Tue Mar 5 01:21:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Atneya Nair X-Patchwork-Id: 13581413 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3BE60134AC for ; Tue, 5 Mar 2024 01:22:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709601753; cv=none; b=qXYukO5SbIPnKjB1c0ruZ5M+izvdLOxf6BGZr9OFXmDDJQikv0unKUpEyl/JBRwH58NtM+IioaC4agBqVBcECrdvmV4KQp4/35h9WXE3Ju8n+sOi3kmv77mJzr/MESYHDn234M7HZFtS7sUIrUiCfRFxnhR7TZlrBhNSdCT3SL8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709601753; c=relaxed/simple; bh=zSWET9XHltTI4dv7J2oQE+rBaTTvhLwC5sDkBHBn51I=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=NblmtOCZN8N7dRg1xvC5u5Mch5D456DSxpLgSmNFr28qm3LKg43TyYYvPtXiHhOH23M0mUNDE9TQGofK4QAdImm68lJYC9aqGzgAGOxPxlZEXmUG+7xTClpaBXJvYwmILEH4jO7+yki5+k0VoLsUGUxvZofjzSHx1tafVA+0+eE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--atneya.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=EVQuUqzv; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--atneya.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="EVQuUqzv" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-60942936e31so91961107b3.1 for ; Mon, 04 Mar 2024 17:22:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1709601751; x=1710206551; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=MJMXaOH3wOHu7qLOMyVPppYQQizpgRw31SFU8Zn1pCc=; b=EVQuUqzvfft/AszcoNcVQHP7vO94PbwgIC6OvxLIwcF1VzOk09tEJX9yrXZ+MBY1BN TXJPBHfD2eFJQ8rOO7Nf3gv/1lwFoNCThuGgI4eFF4rOPCzH5588KLLsnhC/o8KeJqd+ q4GIoiHtiozZrW9QB9we5DJ82XwcVTOcILjgBkqi5WfacXZgE8wpHxQJq2OscfF0jAtK TcpUXj5HlDOJFKcTdxkieOSA1MGucV0nk9uDkrbbiut4/0LHnmRbEd+DoBAMkTFAxwpA eO0Mv8tbKb45zYK6LnXnSQKuESzgXJWI9CiyYJJ5G2XNg+DE8v17d7CzyupFWkKj5P5u D6SQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709601751; x=1710206551; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MJMXaOH3wOHu7qLOMyVPppYQQizpgRw31SFU8Zn1pCc=; b=QDH5Vp2BgeAZU/XQ1DoSz2UrzPfauVBqMDciA+L6a7Gz9PDVkgf9smJ5Pi8/T+pjdR xXsNnxeD3QqGNZfTlNvOfsVK3qFp+BYqkYnNM/8FmpFYfQ8O6DkhtBtqYTFbRD37XJUy b7OLhKTgDUFVGsC0bsCUzQEB1hWtFyC38vZo6Uj7Z3v9JtBeVhvOzgsqUWf5Kfxw9Xhn +iSqsBlcUKxS5bMZseUE/C2upuSggd9JcrWumbuRyEinA+0xQPA18geI3jO5qtX7Sn0S yjMJxdUDGShzRLfQ+1qWZ/vkeR953QExeG7sO/44sAjEHWdbm87Y1ZJj3iVvjWG69ZwM by0w== X-Gm-Message-State: AOJu0Yyfy+WGZLY+TS7dw6L8zVkmqIYl7PrLM59WiWWPt5WyUL+tedDg gBUZvQEyTSGmXgFUZh8JViuezeBjUy/lAuY/C5PdjMm/qYcYETDd1zysdhHhyQRQ9kTNH2ewFWc wFQ22yLnOJvrRlkbWTGQGZugneFUUHvUQ/dRO+neOU6bXXbD2JjcvodW1vCa2zlOEFKd+xkYBjp JDtvX10Xt8F0WLm4NhMEkzl6+l3ebz X-Google-Smtp-Source: AGHT+IGN562l15+umteGzUaDfamJc7ZxS1cG1ohhIvP3ZBG3t6yIlAjxwGRs71uhVjrfgg+eQ3Uc4ZP+fZI= X-Received: from capy.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:151e]) (user=atneya job=sendgmr) by 2002:a81:9809:0:b0:609:2031:1e09 with SMTP id p9-20020a819809000000b0060920311e09mr2955831ywg.6.1709601751128; Mon, 04 Mar 2024 17:22:31 -0800 (PST) Date: Mon, 4 Mar 2024 17:21:13 -0800 In-Reply-To: <20240305012112.1598053-2-atneya@google.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240305012112.1598053-2-atneya@google.com> X-Mailer: git-send-email 2.44.0.rc1.240.g4c46232300-goog Message-ID: <20240305012112.1598053-5-atneya@google.com> Subject: [RFC PATCH 3/3] Preload submodule state in refresh_index From: Atneya Nair To: git@vger.kernel.org Cc: gitster@pobox.com, jeffhost@microsoft.com, me@ttaylorr.com, nasamuffin@google.com, Atneya Nair refresh_index currently parallelizes updating cache_entries for regular files based on lstat. Expand preload to parallelize exploring the checked out state of submodules, which is substantially more expensive. Cache the state of the submodule in memory, to avoid unnecessary re-computation (similar to regular files). This speeds up git status, and other operations which examine the read index, especially in repositories with many submodules. Signed-off-by: Atneya Nair --- Notes: For now, I added a new field to store the submodule state. Open questions: - Where can we efficiently store the submodule state? I assume we can re-use some of the ce_flags which aren't used for submodules? - Why can threads only go up to 64? Can we make this user configurable? preload-index.c | 25 ++++++++++++++++++++++--- read-cache-ll.h | 1 + read-cache.c | 3 +++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/preload-index.c b/preload-index.c index 63fd35d64b..091b22fa4c 100644 --- a/preload-index.c +++ b/preload-index.c @@ -22,7 +22,7 @@ * to have at least 500 lstat's per thread for it to * be worth starting a thread. */ -#define MAX_PARALLEL (20) +#define MAX_PARALLEL (60) #define THREAD_COST (500) struct progress_data { @@ -59,8 +59,21 @@ static void *preload_thread(void *_data) if (ce_stage(ce)) continue; - if (S_ISGITLINK(ce->ce_mode)) + if (S_ISGITLINK(ce->ce_mode)) { + // This call evaluates the submodule HEAD for GITLINK, which really does determine + // if there is a change (for index purposes). We can't use the traditional path of + // marking as VALID, because valid can't be used for submodules due to other code + // paths in which valid may skip investigation of the worktree in the submodule. + // Gitlinks also aren't statable, or fsmonitorable, so caching doesn't have the same + // semantics. + // Use a special entry to mark the ref change state and its validity. Future calls + // to ce_compare_gitlink will leverage this. + if (lstat(ce->name, &st)) + continue; + ce->sub_ref_state = (!!(ie_match_stat(index, ce, &st, + CE_MATCH_RACY_IS_DIRTY|CE_MATCH_IGNORE_FSMONITOR) & DATA_CHANGED) << 1) | 0x1; continue; + } if (ce_uptodate(ce)) continue; if (ce_skip_worktree(ce)) @@ -107,11 +120,17 @@ void preload_index(struct index_state *index, struct thread_data data[MAX_PARALLEL]; struct progress_data pd; int t2_sum_lstat = 0; + int link_count = 0; if (!HAVE_THREADS || !core_preload_index) return; - threads = index->cache_nr / THREAD_COST; + for (i = 0; i < index->cache_nr; i++) { + link_count += (S_ISGITLINK(index->cache[i]->ce_mode)); + } + // Exploring gitlinks are much more expensive than lstat, so modify the cost + threads = (index->cache_nr / THREAD_COST) + (link_count / 25); + if ((index->cache_nr > 1) && (threads < 2) && git_env_bool("GIT_TEST_PRELOAD_INDEX", 0)) threads = 2; if (threads < 2) diff --git a/read-cache-ll.h b/read-cache-ll.h index 2a50a784f0..5555bb0ae9 100644 --- a/read-cache-ll.h +++ b/read-cache-ll.h @@ -27,6 +27,7 @@ struct cache_entry { unsigned int mem_pool_allocated; unsigned int ce_namelen; unsigned int index; /* for link extension */ + unsigned int sub_ref_state; /* TODO pack somewhere. Lowest bit valid, second lowest dirty. */ struct object_id oid; char name[FLEX_ARRAY]; /* more */ }; diff --git a/read-cache.c b/read-cache.c index f546cf7875..541d40ca30 100644 --- a/read-cache.c +++ b/read-cache.c @@ -271,6 +271,9 @@ static int ce_compare_gitlink(const struct cache_entry *ce) * * If so, we consider it always to match. */ + if (ce->sub_ref_state & 0x1) + /* Check the cached value */ + return ce->sub_ref_state >> 1; if (resolve_gitlink_ref(ce->name, "HEAD", &oid) < 0) return 0; return !oideq(&oid, &ce->oid);