Message ID | e193a45318244d9f8b05dfe2fb1ce57f6a4f6428.1696615769.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | Performance improvement & cleanup in loose ref iteration |
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > Unlike other existing usage of 'get_dtype', the 'follow_symlinks' arg is set > to 1 to replicate the existing handling of symlink dirents. This > unfortunately requires calling 'stat' on the associated entry regardless of > platform, but symlinks in the loose ref store are highly unlikely since > they'd need to be created manually by a user. Yeek. I wonder what breaks if we do not do this follow_symlinks() part, i.e., either just replace stat() with lstat() in the original without any of these four patches (which would be simple to figure out what breaks), or omit [3/4] and let get_dtype() yield DT_LNK. It seems that it comes from a7e66ae3 ([PATCH] Make do_each_ref() follow symlinks., 2005-08-16), and just like I commented on there in its log message back then, I still doubt that following a symbolic link is a great idea here in this codepath. But optimization without behaviour change is a good way to ensure that optimization does not introduce new bugs, and because keeping the historical behaviour like the patches [3/4] and this patch does is more work (meaning: if it proves unnecessary to dereference symbolic links, we can remove code instead of having to write new code to support the new behaviour), let's take the series as-is, and defer it to future developers to further clean-up the semantics. > Note that this patch also changes the condition for skipping creation of a > ref entry from "when 'stat' fails" to "when the d_type is anything other > than DT_REG or DT_DIR". If a dirent's d_type is DT_UNKNOWN (either because > the platform doesn't support d_type in dirents or some other reason) or > DT_LNK, 'get_dtype' will try to derive the underlying type with 'stat'. If > the 'stat' fails, the d_type will remain 'DT_UNKNOWN' and dirent will be > skipped. However, it will also be skipped if it is any other valid d_type > (e.g. DT_FIFO for named pipes, DT_LNK for a nested symlink). 
Git does not > handle these properly anyway, so we can safely constrain accepted types to > directories and regular files. Sounds good. > Signed-off-by: Victoria Dye <vdye@github.com> > --- > refs/files-backend.c | 14 +++++--------- > 1 file changed, 5 insertions(+), 9 deletions(-) Thanks.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 341354182bb..db5c0c7a724 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -246,10 +246,8 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
 	int dirnamelen = strlen(dirname);
 	struct strbuf refname;
 	struct strbuf path = STRBUF_INIT;
-	size_t path_baselen;
 
 	files_ref_path(refs, &path, dirname);
-	path_baselen = path.len;
 
 	d = opendir(path.buf);
 	if (!d) {
@@ -262,23 +260,22 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
 
 	while ((de = readdir(d)) != NULL) {
 		struct object_id oid;
-		struct stat st;
 		int flag;
+		unsigned char dtype;
 
 		if (de->d_name[0] == '.')
 			continue;
 		if (ends_with(de->d_name, ".lock"))
 			continue;
 		strbuf_addstr(&refname, de->d_name);
-		strbuf_addstr(&path, de->d_name);
-		if (stat(path.buf, &st) < 0) {
-			; /* silently ignore */
-		} else if (S_ISDIR(st.st_mode)) {
+
+		dtype = get_dtype(de, &path, 1);
+		if (dtype == DT_DIR) {
 			strbuf_addch(&refname, '/');
 			add_entry_to_dir(dir,
 					 create_dir_entry(dir->cache, refname.buf,
 							  refname.len));
-		} else {
+		} else if (dtype == DT_REG) {
 			if (!refs_resolve_ref_unsafe(&refs->base,
 						     refname.buf,
 						     RESOLVE_REF_READING,
@@ -308,7 +305,6 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
 					 create_ref_entry(refname.buf, &oid, flag));
 		}
 		strbuf_setlen(&refname, dirnamelen);
-		strbuf_setlen(&path, path_baselen);
 	}
 	strbuf_release(&refname);
 	strbuf_release(&path);
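
For readers who want to experiment with the d_type fallback that the commit message describes, here is a rough standalone sketch. It is not Git's get_dtype(): the names resolve_dtype(), mode_to_dtype() and classify_loose() are hypothetical, a fixed-size buffer stands in for Git's strbuf, and it assumes a platform whose struct dirent has a d_type member (glibc, the BSDs). It only illustrates the idea of trusting d_type when it is usable, falling back to stat() or lstat() otherwise, and accepting nothing but directories and regular files.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes d_type and the DT_* constants on glibc */
#endif
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: translate an st_mode into the matching DT_* value. */
static unsigned char mode_to_dtype(mode_t mode)
{
	if (S_ISREG(mode))
		return DT_REG;
	if (S_ISDIR(mode))
		return DT_DIR;
	if (S_ISLNK(mode))
		return DT_LNK;
	return DT_UNKNOWN;
}

/*
 * Hypothetical stand-in for the behaviour described above, not Git's
 * implementation.  Returns the dirent's own d_type when it is usable;
 * otherwise asks the filesystem.  With follow_symlinks set, a DT_LNK
 * entry is resolved through stat() so the caller sees the target's type.
 */
static unsigned char resolve_dtype(const char *dir, const struct dirent *de,
				   int follow_symlinks)
{
	unsigned char dtype = de->d_type;
	char path[4096];
	struct stat st;
	int n, rc;

	/* Fast path: no system call needed. */
	if (dtype != DT_UNKNOWN && !(follow_symlinks && dtype == DT_LNK))
		return dtype;

	n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return DT_UNKNOWN;

	rc = follow_symlinks ? stat(path, &st) : lstat(path, &st);
	if (rc < 0)
		return DT_UNKNOWN; /* e.g. a dangling symlink */
	return mode_to_dtype(st.st_mode);
}

enum loose_kind { LOOSE_SKIP, LOOSE_DIR, LOOSE_REF };

/* Only directories and regular files are accepted; everything else is skipped. */
static enum loose_kind classify_loose(unsigned char dtype)
{
	if (dtype == DT_DIR)
		return LOOSE_DIR;
	if (dtype == DT_REG)
		return LOOSE_REF;
	return LOOSE_SKIP;
}
```

Calling resolve_dtype(dir, de, 0) on a symlink entry reports DT_LNK, which classify_loose() would skip; passing 1 instead reports the type of the link's target. That is one way to see what dropping the follow_symlinks behaviour would change for a manually created symlink under refs/.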