From patchwork Tue Sep 14 03:38:40 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12491615
Date: Tue, 14 Sep 2021 03:38:40 +0000
Subject: [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler, Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker", "Neeraj K.
 Singh", Neeraj Singh
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'. This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.

Signed-off-by: Neeraj Singh
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b023d9959aa..f117d62c908 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
@@ -260,21 +260,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }

From patchwork Tue Sep 14 03:38:41 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12491619
Date: Tue, 14 Sep 2021 03:38:41 +0000
Subject: [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes
To:
 git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler, Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker", "Neeraj K. Singh", Neeraj Singh
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows,
macOS, and Linux each offer mechanisms to write data from the filesystem
page cache without initiating a hardware flush.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
takes advantage of the bulk-checkin infrastructure to batch up hardware
flushes.

When the new mode is enabled we do the following for new objects:

1. Create a tmp_obj_XXXX file and write the object data to it.
2. Issue a pagecache writeback request and wait for it to complete.
3. Record the tmp name and the final name in the bulk-checkin state for
   later rename.

At the end of the entire transaction we:

1. Issue an fsync against the lock file to flush the hardware writeback
   cache, which should by now have processed the tmp file writes.
2. Rename all of the temp files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc.), such as NTFS and HFS+, we
would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called.
Previously, on macOS there was no guarantee of durability since a simple
fsync(2) call does not flush any hardware caches.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
          This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                 true | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh
---
 Documentation/config/core.txt | 26 ++++++++---
 Makefile                      |  6 +++
 builtin/add.c                 |  3 +-
 bulk-checkin.c                | 81 ++++++++++++++++++++++++++++++++++-
 bulk-checkin.h                |  5 ++-
 cache.h                       |  8 +++-
 config.c                      |  8 +++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 +++
 object-file.c                 | 22 +---------
 wrapper.c                     | 36 ++++++++++++++++
 write-or-die.c                |  2 +-
 14 files changed, 182 insertions(+), 33 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..0006d90980d 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,26 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls the object store, so updates to any refs or the
+	index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write object data with a minimal set of FLUSH CACHE
+  (or equivalent) commands sent to the storage controller. If the operating
+  system interfaces are not available, this mode behaves the same as `true`.
+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
+  filesystems and on Windows for repos stored on NTFS or ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 429c276058d..326c7607e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/builtin/add.c b/builtin/add.c
index 2244311d485..dda4bf093a0 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -678,7 +678,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
-	unplug_bulk_checkin();
+
+	unplug_bulk_checkin(&lock_file);
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index f117d62c908..ddbab5e5c8c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,15 +3,19 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
 
+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
+
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
@@ -62,6 +66,32 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
+{
+	if (fsync_state->nr) {
+		struct string_list_item *rename;
+
+		/*
+		 * Issue a full hardware flush against the lock file to ensure
+		 * that all objects are durable before any renames occur.
+		 * The code in fsync_and_close_loose_object_bulk_checkin has
+		 * already ensured that writeout has occurred, but it has not
+		 * flushed any writeback cache in the storage hardware.
+		 */
+		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
+
+		for_each_string_list_item(rename, fsync_state) {
+			const char *src = rename->string;
+			const char *dst = rename->util;
+
+			if (finalize_object_file(src, dst))
+				die_errno(_("could not rename '%s' to '%s'"), src, dst);
+		}
+
+		string_list_clear(fsync_state, 1);
+	}
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+static void add_rename_bulk_checkin(struct string_list *fsync_state,
+				    const char *src, const char *dst)
+{
+	string_list_insert(fsync_state, src)->util = xstrdup(dst);
+}
+
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime)
+{
+	int do_finalize = 1;
+	int ret = 0;
+
+	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
+		/*
+		 * If we have a plugged bulk checkin, we issue a call that
+		 * cleans the filesystem page cache but avoids a hardware flush
+		 * command. Later on we will issue a single hardware flush
+		 * before renaming files as part of do_sync_and_rename.
+		 */
+		if (bulk_checkin_plugged &&
+		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
+			do_finalize = 0;
+
+		} else {
+			fsync_or_die(fd, "loose object file");
+		}
+	}
+
+	if (close(fd))
+		die_errno(_("error when closing loose object file"));
+
+	if (mtime) {
+		struct utimbuf utb;
+		utb.actime = mtime;
+		utb.modtime = mtime;
+		if (utime(tmpfile, &utb) < 0)
+			warning_errno(_("failed utime() on %s"), tmpfile);
+	}
+
+	if (do_finalize)
+		ret = finalize_object_file(tmpfile, filename);
+
+	return ret;
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -273,10 +350,12 @@ void plug_bulk_checkin(void)
 	bulk_checkin_plugged = 1;
 }
 
-void unplug_bulk_checkin(void)
+void unplug_bulk_checkin(struct lock_file *lock_file)
 {
 	assert(bulk_checkin_plugged);
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_sync_and_rename(&bulk_fsync_state, lock_file);
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..4a3309c1531 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,11 +6,14 @@
 
 #include "cache.h"
 
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
 
 void plug_bulk_checkin(void);
-void unplug_bulk_checkin(void);
+void unplug_bulk_checkin(struct lock_file *);
 
 #endif
diff --git a/cache.h b/cache.h
index d23de693680..39b3a88181a 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE {
+	FSYNC_OBJECT_FILES_OFF,
+	FSYNC_OBJECT_FILES_ON,
+	FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index cb4a8058bff..9fe3602e1c4 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,13 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (!value)
+			return config_error_nonbool(var);
+		if (!strcasecmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else
+			fsync_object_files = git_config_bool(var, value)
+				? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index d6b22ede7ea..3e23eafff80 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+	FSYNC_WRITEOUT_ONLY,
+	FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index a8be8994814..ea14c3a3483 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1859,15 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	return 0;
 }
 
-/* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd)
-{
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
-	if (close(fd) != 0)
-		die_errno(_("error when closing loose object file"));
-}
-
 /* Size of directory component, including the ending '/' */
 static inline int directory_size(const char *filename)
 {
@@ -1973,17 +1964,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd);
-
-	if (mtime) {
-		struct utimbuf utb;
-		utb.actime = mtime;
-		utb.modtime = mtime;
-		if (utime(tmp_file.buf, &utb) < 0)
-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
-	}
-
-	return finalize_object_file(tmp_file.buf, filename.buf);
+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
+							 filename.buf, mtime);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..cffe24d307a 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,42 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	if (action == FSYNC_WRITEOUT_ONLY) {
+#ifdef __APPLE__
+		/*
+		 * on Mac OS X, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+	}
+
+#ifdef __APPLE__
+	return fcntl(fd, F_FULLFSYNC);
+#else
+	return fsync(fd);
+#endif
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}

From patchwork Tue Sep 14 03:38:42 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12491617
Message-Id: <815a862e22940690b3db9a6fbbbda35029c88f66.1631590725.git.gitgitgadget@gmail.com>
Date: Tue, 14 Sep 2021 03:38:42 +0000
Subject: [PATCH v3 3/6] core.fsyncobjectfiles: add windows support for batch mode
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler, Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker", "Neeraj K. Singh", Neeraj Singh
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

This commit adds a win32 implementation for fsync_no_flush that is
called from git_fsync. The 'NtFlushBuffersFileEx' function being called
is available since Windows 8. If the function is not available, we
return -1 and Git falls back to doing a full fsync.

The operating system is told to flush data only, without a hardware
flush primitive. A later full fsync will cause the metadata log
to be flushed and then the disk cache to be flushed on NTFS and
ReFS. Other filesystems will treat this as a full flush operation.

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.
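
The fallback described above (try the writeout-only primitive, and do a
full flush when it reports failure) can be sketched in isolation. This is
only an illustration under stated assumptions, not code from the series:
flush_fn, flush_path, flush_with_fallback and the two stub primitives are
hypothetical names, and the stubs stand in for the platform calls so the
control flow can be exercised without touching a disk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these appear in the patch. */
typedef int (*flush_fn)(int fd);

enum flush_path {
	FLUSH_PATH_WRITEOUT_ONLY,	/* the cheap primitive succeeded */
	FLUSH_PATH_FULL,		/* fell back to a full flush */
	FLUSH_PATH_ERROR		/* the full flush failed as well */
};

/* Stand-in for a platform where the writeout-only call is missing. */
static int writeout_unavailable(int fd)
{
	(void)fd;
	errno = ENOSYS;
	return -1;
}

/* Stand-in for a flush primitive that always succeeds. */
static int flush_ok(int fd)
{
	(void)fd;
	return 0;
}

/*
 * Try the writeout-only primitive first. If it is absent or fails, use
 * the full flush instead, retrying while it is interrupted by a signal.
 */
static enum flush_path flush_with_fallback(int fd, flush_fn writeout_only,
					   flush_fn full_flush)
{
	assert(full_flush != NULL);

	if (writeout_only && writeout_only(fd) >= 0)
		return FLUSH_PATH_WRITEOUT_ONLY;

	while (full_flush(fd) < 0) {
		if (errno != EINTR)
			return FLUSH_PATH_ERROR;
	}
	return FLUSH_PATH_FULL;
}
```

In the series itself the same decision is made in
fsync_and_close_loose_object_bulk_checkin, which calls
git_fsync(fd, FSYNC_WRITEOUT_ONLY) and uses fsync_or_die when that
returns a negative value.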

Signed-off-by: Neeraj Singh
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 29 +++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..6074a3d3ced 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..c013920ce37
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,29 @@
+#include "../../git-compat-util.h"
+#include <winternl.h>
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+	IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+	DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			  HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			  PIO_STATUS_BLOCK IoStatusBlock);
+
+	if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+	}
+
+	/* See https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex */
+	memset(&io_status, 0, sizeof(io_status));
+	if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				 NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index e6d482fbcc6..34c93314a50 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -451,6 +451,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -626,6 +627,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..b573a5ee122 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index cffe24d307a..a9647018b68 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -565,6 +565,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 	}

From patchwork Tue Sep 14 03:38:43 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12491621
Message-Id: <6b5760389863d86fc15c69cfb31bafce5ad636e1.1631590725.git.gitgitgadget@gmail.com>
Date: Tue, 14 Sep 2021 03:38:43 +0000
Subject: [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler, Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker", "Neeraj K. Singh", Neeraj Singh
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.
This change enables the bulk-checkin infrastructure for update-index to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality. This mode is
enabled when passing paths to update-index via the --stdin flag, as is
done by 'git stash'.

There is some risk with this change: under batch fsync, the object
files are not available until update-index has completed entirely. A
tool that expects to see objects earlier would be affected, but such
usage is unlikely, since any tool invoking update-index that way would
have to snoop the output of --verbose to find out when update-index has
actually processed a given path. Additionally, the index is locked for
the duration of the update.

Signed-off-by: Neeraj Singh
---
 builtin/update-index.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..b0689f2cdf6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1150,6 +1151,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		struct strbuf unquoted = STRBUF_INIT;
 
 		setup_work_tree();
+		plug_bulk_checkin();
 		while (getline_fn(&buf, stdin) != EOF) {
 			char *p;
 			if (!nul_term_line && buf.buf[0] == '"') {
@@ -1164,6 +1166,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				chmod_path(set_executable_bit, p);
 			free(p);
 		}
+		unplug_bulk_checkin(&lock_file);
 		strbuf_release(&unquoted);
 		strbuf_release(&buf);
 	}

From patchwork Tue Sep 14 03:38:44 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12491623
Message-Id:
In-Reply-To:
References:
Date: Tue, 14 Sep 2021 03:38:44 +0000
Subject: [PATCH v3 5/6] core.fsyncobjectfiles: performance tests for add and stash
Fcc: Sent
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
    Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker",
    "Neeraj K. Singh", Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

Add basic performance tests for "git add" and "git stash" that create a
large number of new objects under various fsync settings.

Signed-off-by: Neeraj Singh
---
 t/perf/lib-unique-files.sh | 32 ++++++++++++++++++++++++++
 t/perf/p3700-add.sh        | 43 +++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh      | 46 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+)
 create mode 100644 t/perf/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/lib-unique-files.sh b/t/perf/lib-unique-files.sh
new file mode 100644
index 00000000000..10083395ae5
--- /dev/null
+++ b/t/perf/lib-unique-files.sh
@@ -0,0 +1,32 @@
+# Helper to create files with unique contents
+
+test_create_unique_files_base__=$(date -u)
+test_create_unique_files_counter__=0
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
+#				    each in the current directory, all
+#				    with unique contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir" > /dev/null
+		for j in $(test_seq $files)
+		do
+			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
+			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__" >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..4ca3224f364
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/perf/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..407b95c104b
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/perf/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done

From patchwork Tue Sep 14 03:38:45 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12491625
Message-Id: <55a40fc8fd59df6180c8a87d93fcc9a232ff8d0a.1631590725.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Tue, 14 Sep 2021 03:38:45 +0000
Subject: [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing
Fcc: Sent
To: git@vger.kernel.org
Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
    Christoph Hellwig, Ævar Arnfjörð Bjarmason, "Randall S. Becker",
    "Neeraj K. Singh", Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

Signed-off-by: Neeraj Singh
---
 environment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/environment.c b/environment.c
index 3e23eafff80..27d5e11267e 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
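
Note on the environment.c change above: a file-scope variable without an
initializer is zero-initialized in C, so before this patch the compiled-in
default was whichever enumerator has the value 0. The sketch below
illustrates that effect in isolation. The enumerator order, the variable
names before_patch and after_patch, and the helper mode_name() are
assumptions made for illustration; only FSYNC_OBJECT_FILES_BATCH and the
enum tag appear in the diff itself.

```c
/*
 * Sketch only: the enumerator list and its order are assumed, not taken
 * from the series. Only FSYNC_OBJECT_FILES_BATCH is visible in the diff.
 */
enum FSYNC_OBJECT_FILES_MODE {
	FSYNC_OBJECT_FILES_OFF,   /* assumed to be the zero value */
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* Pre-patch shape: no initializer, so the object starts out as 0. */
enum FSYNC_OBJECT_FILES_MODE before_patch;

/* Post-patch shape: batch mode becomes the compiled-in default. */
enum FSYNC_OBJECT_FILES_MODE after_patch = FSYNC_OBJECT_FILES_BATCH;

/* Hypothetical helper that maps a mode to a printable name. */
const char *mode_name(enum FSYNC_OBJECT_FILES_MODE mode)
{
	switch (mode) {
	case FSYNC_OBJECT_FILES_OFF:
		return "off";
	case FSYNC_OBJECT_FILES_ON:
		return "on";
	case FSYNC_OBJECT_FILES_BATCH:
		return "batch";
	}
	return "unknown";
}
```

Under these assumptions, a run that does not set core.fsyncObjectFiles
would exercise batch mode after the patch, which appears to be the point
of enabling it "for testing".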