From patchwork Tue Mar 15 21:30:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12781862
Message-Id:
In-Reply-To:
References:
Date: Tue, 15 Mar 2022 21:30:53 +0000
Subject: [PATCH 1/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, "Neeraj K. Singh" , Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_objdir'. This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.

Signed-off-by: Neeraj Singh
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index e988a388b65..93b1dc5138a 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_tmp_packfile(struct strbuf *basename,
 				const char *pack_tmp_name,
@@ -278,21 +278,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }

From patchwork Tue Mar 15 21:30:54 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12781863
Message-Id:
In-Reply-To:
References:
Date: Tue, 15 Mar 2022 21:30:54 +0000
Subject: [PATCH 2/7] core.fsyncmethod: batched disk flushes for loose-objects
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, "Neeraj K. Singh" , Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin plugging and unplugging functionality,
takes advantage of tmp-objdir, and uses the writeout-only support code.

When the new mode is enabled, we do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1. Issue an fsync against a dummy file to flush the hardware writeback
   cache, which should by now have seen the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the default
   today, but the user now has the option of syncing the index and there
   is a separate patch series to implement syncing of refs.

On a filesystem with a single journal that is updated during name
operations (e.g. create, link, rename), such as NTFS, HFS+, or XFS, we
would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

Batch mode is only enabled if core.fsyncObjectFiles is false or unset.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

object file syncing | Linux | Mac   | Windows
--------------------|-------|-------|--------
           disabled | 0.06  |  0.35 | 0.61
              fsync | 1.88  | 11.18 | 2.47
              batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh
---
 Documentation/config/core.txt |  5 +++
 bulk-checkin.c                | 67 +++++++++++++++++++++++++++++++++++
 bulk-checkin.h                |  2 ++
 cache.h                       |  8 ++++-
 config.c                      |  2 ++
 object-file.c                 |  2 ++
 6 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 062e5259905..c041ed33801 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -628,6 +628,11 @@ core.fsyncMethod::
 * `writeout-only` issues pagecache writeback requests, but depending on the
   filesystem and storage hardware, data added to the repository may not be
   durable in the event of a system crash. This is the default mode on macOS.
+* `batch` enables a mode that uses writeout-only flushes to stage multiple
+  updates in the disk writeback cache and then a single full fsync to trigger
+  the disk cache flush at the end of the operation. This mode is expected to
+  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
+  and on Windows for repos stored on NTFS or ReFS filesystems.
 
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
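
The batch sequence described in the commit message (stage each object, request writeout only, issue one full flush, then rename) can be sketched as a standalone C fragment. This is an illustration under stated assumptions, not the code from this series: `writeout_only`, `flush_hardware_cache`, `batch_write_objects`, and the `obj_N` file names are hypothetical, and the writeout-only step uses Linux `sync_file_range()` where it is available, falling back to a per-file `fsync()` elsewhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes sync_file_range() on Linux */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a writeout-only flush: push dirty pages to
 * the device without asking the device to flush its own cache.
 * Returns 0 on success, -1 if no such primitive is available.
 */
static int writeout_only(int fd)
{
#ifdef SYNC_FILE_RANGE_WRITE
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
					 SYNC_FILE_RANGE_WRITE |
					 SYNC_FILE_RANGE_WAIT_AFTER);
#else
	(void)fd;
	errno = ENOSYS;
	return -1;
#endif
}

/* One full fsync() against a throwaway file in `dir`. */
static int flush_hardware_cache(const char *dir)
{
	char path[4096];
	int fd, ret;

	snprintf(path, sizeof(path), "%s/bulk_fsync_XXXXXX", dir);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	ret = fsync(fd);
	close(fd);
	unlink(path);
	return ret;
}

/*
 * Write `count` files (obj_0, obj_1, ...) into stage_dir, flush once,
 * then rename them into final_dir. Returns 0 on success, -1 on error.
 */
static int batch_write_objects(const char *stage_dir, const char *final_dir,
			       int count)
{
	static const char payload[] = "example object contents\n";
	const ssize_t len = (ssize_t)(sizeof(payload) - 1);
	char from[4096], to[4096];
	int needs_batch_fsync = 0;
	int i;

	for (i = 0; i < count; i++) {
		int fd;

		snprintf(from, sizeof(from), "%s/obj_%d", stage_dir, i);
		fd = open(from, O_WRONLY | O_CREAT | O_EXCL, 0644);
		if (fd < 0)
			return -1;
		if (write(fd, payload, len) != len) {
			close(fd);
			return -1;
		}
		if (writeout_only(fd) == 0) {
			needs_batch_fsync = 1; /* defer the expensive flush */
		} else if (fsync(fd) < 0) { /* no writeout-only: sync now */
			close(fd);
			return -1;
		}
		if (close(fd) < 0)
			return -1;
	}

	/* A single full flush covers every file written out above. */
	if (needs_batch_fsync && flush_hardware_cache(final_dir) < 0)
		return -1;

	/* Only after the flush do the files get their final names. */
	for (i = 0; i < count; i++) {
		snprintf(from, sizeof(from), "%s/obj_%d", stage_dir, i);
		snprintf(to, sizeof(to), "%s/obj_%d", final_dir, i);
		if (rename(from, to) < 0)
			return -1;
	}
	return 0;
}
```

Calling `batch_write_objects(stage, final, 500)` with two existing directories on the same filesystem issues one full fsync when the writeout-only call succeeds for every file, and falls back to one fsync per file when it does not. The renames are deliberately last, so a crash before the flush leaves nothing under a final name.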
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 93b1dc5138a..5c13fe17802 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,14 +3,20 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
+#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
+static int needs_batch_fsync;
+
+static struct tmp_objdir *bulk_fsync_objdir;
 
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
@@ -80,6 +86,34 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+/*
+ * Cleanup after batch-mode fsync_object_files.
+ */
+static void do_batch_fsync(void)
+{
+	/*
+	 * Issue a full hardware flush against a temporary file to ensure
+	 * that all objects are durable before any renames occur. The code in
+	 * fsync_loose_object_bulk_checkin has already issued a writeout
+	 * request, but it has not flushed any writeback cache in the storage
+	 * hardware.
+	 */
+
+	if (needs_batch_fsync) {
+		struct strbuf temp_path = STRBUF_INIT;
+		struct tempfile *temp;
+
+		strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
+		temp = xmks_tempfile(temp_path.buf);
+		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
+		delete_tempfile(&temp);
+		strbuf_release(&temp_path);
+	}
+
+	if (bulk_fsync_objdir)
+		tmp_objdir_migrate(bulk_fsync_objdir);
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -274,6 +308,24 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+void fsync_loose_object_bulk_checkin(int fd)
+{
+	/*
+	 * If we have a plugged bulk checkin, we issue a call that
+	 * cleans the filesystem page cache but avoids a hardware flush
+	 * command. Later on we will issue a single hardware flush
+	 * before as part of do_batch_fsync.
+	 */
+	if (bulk_checkin_plugged &&
+	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+		assert(bulk_fsync_objdir);
+		if (!needs_batch_fsync)
+			needs_batch_fsync = 1;
+	} else {
+		fsync_or_die(fd, "loose object file");
+	}
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -288,6 +340,19 @@ int index_bulk_checkin(struct object_id *oid,
 void plug_bulk_checkin(void)
 {
 	assert(!bulk_checkin_plugged);
+
+	/*
+	 * A temporary object directory is used to hold the files
+	 * while they are not fsynced.
+	 */
+	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) {
+		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
+		if (!bulk_fsync_objdir)
+			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
+
+		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
+	}
+
 	bulk_checkin_plugged = 1;
 }
 
@@ -297,4 +362,6 @@ void unplug_bulk_checkin(void)
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_batch_fsync();
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..08f292379b6 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,6 +6,8 @@
 
 #include "cache.h"
 
+void fsync_loose_object_bulk_checkin(int fd);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
diff --git a/cache.h b/cache.h
index d347d0757f7..4d07691e791 100644
--- a/cache.h
+++ b/cache.h
@@ -1040,7 +1040,8 @@ extern int use_fsync;
 
 enum fsync_method {
 	FSYNC_METHOD_FSYNC,
-	FSYNC_METHOD_WRITEOUT_ONLY
+	FSYNC_METHOD_WRITEOUT_ONLY,
+	FSYNC_METHOD_BATCH
 };
 
 extern enum fsync_method fsync_method;
@@ -1766,6 +1767,11 @@ void fsync_or_die(int fd, const char *);
 int fsync_component(enum fsync_component component, int fd);
 void fsync_component_or_die(enum fsync_component component, int fd, const char *msg);
 
+static inline int batch_fsync_enabled(enum fsync_component component)
+{
+	return (fsync_components & component) && (fsync_method == FSYNC_METHOD_BATCH);
+}
+
 ssize_t read_in_full(int fd, void *buf, size_t count);
 ssize_t write_in_full(int fd, const void *buf, size_t count);
 ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
diff --git a/config.c b/config.c
index 261ee7436e0..0b28f90de8b 100644
--- a/config.c
+++ b/config.c
@@ -1688,6 +1688,8 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 			fsync_method = FSYNC_METHOD_FSYNC;
 		else if (!strcmp(value, "writeout-only"))
 			fsync_method = FSYNC_METHOD_WRITEOUT_ONLY;
+		else if (!strcmp(value, "batch"))
+			fsync_method = FSYNC_METHOD_BATCH;
 		else
 			warning(_("ignoring unknown core.fsyncMethod value '%s'"), value);
 
diff --git a/object-file.c b/object-file.c
index 295cb899e22..ef6621ffe56 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1894,6 +1894,8 @@ static void close_loose_object(int fd)
 
 	if (fsync_object_files > 0)
 		fsync_or_die(fd, "loose object file");
+	else if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
+		fsync_loose_object_bulk_checkin(fd);
 	else
 		fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd,
 				       "loose object file");

From patchwork Tue Mar 15 21:30:55 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12781865
Message-Id:
In-Reply-To:
References:
Date: Tue, 15 Mar 2022 21:30:55 +0000
Subject: [PATCH 3/7] update-index: use the bulk-checkin infrastructure
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, "Neeraj K. Singh" , Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin for update-index infrastructure to
speed up adding new objects to the object database by leveraging the
batch fsync functionality.

There is some risk with this change, since under batch fsync, the object
files will be in a tmp-objdir until update-index is complete. This
usage is unlikely, since any tool invoking update-index and expecting to
see objects would have to synchronize with the update-index process
after passing it a file path.

Signed-off-by: Neeraj Singh
---
 builtin/update-index.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 75d646377cc..38e9d7e88cb 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1110,6 +1111,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	the_index.updated_skipworktree = 1;
 
+	/* we might be adding many objects to the object database */
+	plug_bulk_checkin();
+
 	/*
 	 * Custom copy of parse_options() because we want to handle
 	 * filename arguments as they come.
@@ -1190,6 +1194,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	/* by now we must have added all of the new objects */
+	unplug_bulk_checkin();
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "

From patchwork Tue Mar 15 21:30:56 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12781864
Message-Id: <99e3a61b9191e4102529c8b90b36e0f96f2b23bf.1647379859.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Tue, 15 Mar 2022 21:30:56 +0000
Subject: [PATCH 4/7] unpack-objects: use the bulk-checkin infrastructure
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, "Neeraj K. Singh" , Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transferred data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.

Signed-off-by: Neeraj Singh
---
 builtin/unpack-objects.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dbeb0680a58..c55b6616aed 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@ static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)

From patchwork Tue Mar 15 21:30:57 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)"
X-Patchwork-Id: 12781866
Message-Id: <4e56c58c8cb812b244feaf814b43b7dc28879f9a.1647379859.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Tue, 15 Mar 2022 21:30:57 +0000
Subject: [PATCH 5/7] core.fsync: use batch mode and sync loose objects by default on Windows
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, "Neeraj K. Singh" , Neeraj Singh
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Neeraj Singh

From: Neeraj Singh

Git for Windows has defaulted to core.fsyncObjectFiles=true since
September 2017. We turn on syncing of loose object files with batch mode
in upstream Git so that we can get broad coverage of the new code
upstream.

We don't actually do fsyncs in the test suite, since GIT_TEST_FSYNC is
set to 0. However, we do exercise all of the surrounding batch mode code
since GIT_TEST_FSYNC merely makes the maybe_fsync wrapper always appear
to succeed.

Signed-off-by: Neeraj Singh

change fsyncmethod to batch as well
---
 cache.h           | 4 ++++
 compat/mingw.h    | 3 +++
 config.c          | 2 +-
 git-compat-util.h | 2 ++
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 4d07691e791..04193d87246 100644
--- a/cache.h
+++ b/cache.h
@@ -1031,6 +1031,10 @@ enum fsync_component {
 				  FSYNC_COMPONENT_INDEX | \
 				  FSYNC_COMPONENT_REFERENCE)
 
+#ifndef FSYNC_COMPONENTS_PLATFORM_DEFAULT
+#define FSYNC_COMPONENTS_PLATFORM_DEFAULT FSYNC_COMPONENTS_DEFAULT
+#endif
+
 /*
  * A bitmask indicating which components of the repo should be fsynced.
*/ diff --git a/compat/mingw.h b/compat/mingw.h index 6074a3d3ced..afe30868c04 100644 --- a/compat/mingw.h +++ b/compat/mingw.h @@ -332,6 +332,9 @@ int mingw_getpagesize(void); int win32_fsync_no_flush(int fd); #define fsync_no_flush win32_fsync_no_flush +#define FSYNC_COMPONENTS_PLATFORM_DEFAULT (FSYNC_COMPONENTS_DEFAULT | FSYNC_COMPONENT_LOOSE_OBJECT) +#define FSYNC_METHOD_DEFAULT (FSYNC_METHOD_BATCH) + struct rlimit { unsigned int rlim_cur; }; diff --git a/config.c b/config.c index 0b28f90de8b..c76443dc556 100644 --- a/config.c +++ b/config.c @@ -1342,7 +1342,7 @@ static const struct fsync_component_name { static enum fsync_component parse_fsync_components(const char *var, const char *string) { - enum fsync_component current = FSYNC_COMPONENTS_DEFAULT; + enum fsync_component current = FSYNC_COMPONENTS_PLATFORM_DEFAULT; enum fsync_component positive = 0, negative = 0; while (string) { diff --git a/git-compat-util.h b/git-compat-util.h index 0892e209a2f..fffe42ce7c1 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1257,11 +1257,13 @@ __attribute__((format (printf, 3, 4))) NORETURN void BUG_fl(const char *file, int line, const char *fmt, ...); #define BUG(...) 
BUG_fl(__FILE__, __LINE__, __VA_ARGS__) +#ifndef FSYNC_METHOD_DEFAULT #ifdef __APPLE__ #define FSYNC_METHOD_DEFAULT FSYNC_METHOD_WRITEOUT_ONLY #else #define FSYNC_METHOD_DEFAULT FSYNC_METHOD_FSYNC #endif +#endif enum fsync_action { FSYNC_WRITEOUT_ONLY, From patchwork Tue Mar 15 21:30:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12781867 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56049C433EF for ; Tue, 15 Mar 2022 21:31:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351989AbiCOVca (ORCPT ); Tue, 15 Mar 2022 17:32:30 -0400
Message-Id: <88e47047d790f76ffacaa61ed8041b733e30f45a.1647379859.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 15 Mar 2022 21:30:58 +0000 Subject: [PATCH 6/7] core.fsyncmethod: tests for batch mode Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, "Neeraj K. Singh" , Neeraj Singh Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Neeraj Singh From: Neeraj Singh Add test cases to exercise batch mode for: * 'git add' * 'git stash' * 'git update-index' * 'git unpack-objects' These tests ensure that the added data winds up in the object database. In this change we introduce a new test helper lib-unique-files.sh.
The goal of this library is to create a tree of files that have different oids from any other files that may have been created in the current test repo. This helps us avoid missing validation of an object being added due to it already being in the repo. We aren't actually issuing any fsyncs in these tests, since GIT_TEST_FSYNC is 0, but we still exercise all of the tmp_objdir logic in bulk-checkin. Signed-off-by: Neeraj Singh --- t/lib-unique-files.sh | 36 ++++++++++++++++++++++++++++++++++++ t/t3700-add.sh | 22 ++++++++++++++++++++++ t/t3903-stash.sh | 17 +++++++++++++++++ t/t5300-pack-object.sh | 32 +++++++++++++++++++++----------- 4 files changed, 96 insertions(+), 11 deletions(-) create mode 100644 t/lib-unique-files.sh diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh new file mode 100644 index 00000000000..a7de4ca8512 --- /dev/null +++ b/t/lib-unique-files.sh @@ -0,0 +1,36 @@ +# Helper to create files with unique contents + + +# Create multiple files with unique contents. Takes the number of +# directories, the number of files in each directory, and the base +# directory. +# +# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files +# each in my_dir, all with unique +# contents. + +test_create_unique_files() { + test "$#" -ne 3 && BUG "3 param" + + local dirs=$1 + local files=$2 + local basedir=$3 + local counter=0 + test_tick + local basedata=$test_tick + + + rm -rf $basedir + + for i in $(test_seq $dirs) + do + local dir=$basedir/dir$i + + mkdir -p "$dir" + for j in $(test_seq $files) + do + counter=$((counter + 1)) + echo "$basedata.$counter" >"$dir/file$j.txt" + done + done +} diff --git a/t/t3700-add.sh b/t/t3700-add.sh index b1f90ba3250..1f349f52ad3 100755 --- a/t/t3700-add.sh +++ b/t/t3700-add.sh @@ -8,6 +8,8 @@ test_description='Test of git add, including the -- option.' TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh +. $TEST_DIRECTORY/lib-unique-files.sh + # Test the file mode "$1" of the file "$2" in the index. 
test_mode_in_index () { case "$(git ls-files -s "$2")" in @@ -34,6 +36,26 @@ test_expect_success \ 'Test that "git add -- -q" works' \ 'touch -- -q && git add -- -q' +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch' + +test_expect_success 'git add: core.fsyncmethod=batch' " + test_create_unique_files 2 4 fsync-files && + git $BATCH_CONFIGURATION add -- ./fsync-files/ && + rm -f fsynced_files && + git ls-files --stage fsync-files/ > fsynced_files && + test_line_count = 8 fsynced_files && + awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e +" + +test_expect_success 'git update-index: core.fsyncmethod=batch' " + test_create_unique_files 2 4 fsync-files2 && + find fsync-files2 ! -type d -print | xargs git $BATCH_CONFIGURATION update-index --add -- && + rm -f fsynced_files2 && + git ls-files --stage fsync-files2/ > fsynced_files2 && + test_line_count = 8 fsynced_files2 && + awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e +" + test_expect_success \ 'git add: Test that executable bit is not used if core.filemode=0' \ 'git config core.filemode 0 && diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh index 4abbc8fccae..877276c1ca3 100755 --- a/t/t3903-stash.sh +++ b/t/t3903-stash.sh @@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME . ./test-lib.sh +. 
$TEST_DIRECTORY/lib-unique-files.sh test_expect_success 'usage on cmd and subcommand invalid option' ' test_expect_code 129 git stash --invalid-option 2>usage && @@ -1410,6 +1411,22 @@ test_expect_success 'stash handles skip-worktree entries nicely' ' git rev-parse --verify refs/stash:A.t ' + +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch' + +test_expect_success 'stash with core.fsyncmethod=batch' " + test_create_unique_files 2 4 fsync-files && + git $BATCH_CONFIGURATION stash push -u -- ./fsync-files/ && + rm -f fsynced_files && + + # The files were untracked, so use the third parent, + # which contains the untracked files + git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files && + test_line_count = 8 fsynced_files && + awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e +" + + test_expect_success 'git stash succeeds despite directory/file change' ' test_create_repo directory_file_switch_v1 && ( diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index a11d61206ad..8e2f73cc68f 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -162,23 +162,25 @@ test_expect_success 'pack-objects with bogus arguments' ' check_unpack () { test_when_finished "rm -rf git2" && - git init --bare git2 && - git -C git2 unpack-objects -n <"$1".pack && - git -C git2 unpack-objects <"$1".pack && - (cd .git && find objects -type f -print) | - while read path - do - cmp git2/$path .git/$path || { - echo $path differs. 
- return 1 - } - done + git $2 init --bare git2 && + ( + git $2 -C git2 unpack-objects -n <"$1".pack && + git $2 -C git2 unpack-objects <"$1".pack && + git $2 -C git2 cat-file --batch-check="%(objectname)" + ) <obj-list >current && + cmp obj-list current } test_expect_success 'unpack without delta' ' check_unpack test-1-${packname_1} ' +BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch' + +test_expect_success 'unpack without delta (core.fsyncmethod=batch)' ' + check_unpack test-1-${packname_1} "$BATCH_CONFIGURATION" +' + test_expect_success 'pack with REF_DELTA' ' packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) && check_deltas stderr -gt 0 @@ -188,6 +190,10 @@ test_expect_success 'unpack with REF_DELTA' ' check_unpack test-2-${packname_2} ' +test_expect_success 'unpack with REF_DELTA (core.fsyncmethod=batch)' ' + check_unpack test-2-${packname_2} "$BATCH_CONFIGURATION" +' + test_expect_success 'pack with OFS_DELTA' ' packname_3=$(git pack-objects --progress --delta-base-offset test-3 \ <obj-list 2>stderr) && @@ -198,6 +204,10 @@ test_expect_success 'unpack with OFS_DELTA' ' check_unpack test-3-${packname_3} ' +test_expect_success 'unpack with OFS_DELTA (core.fsyncmethod=batch)' ' + check_unpack test-3-${packname_3} "$BATCH_CONFIGURATION" +' + test_expect_success 'compare delta flavors' ' perl -e '\'' defined($_ = -s $_) or die for @ARGV; From patchwork Tue Mar 15 21:30:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Neeraj Singh (WINDOWS-SFS)" X-Patchwork-Id: 12781868 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5868C433FE for ; Tue, 15 Mar 2022 21:31:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351977AbiCOVcd (ORCPT ); Tue, 15 Mar 2022 17:32:33 -0400
Message-Id: <876741f1ef9a5b8af28f73948a3e9ddc16d88c6d.1647379859.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 15 Mar 2022 21:30:59 +0000 Subject: [PATCH 7/7] core.fsyncmethod: performance tests for add and stash Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Johannes.Schindelin@gmx.de, avarab@gmail.com, nksingh85@gmail.com, ps@pks.im, "Neeraj K. Singh" , Neeraj Singh Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Neeraj Singh From: Neeraj Singh Add basic performance tests for "git add" and "git stash" of a lot of new objects with various fsync settings. This shows the benefit of batch mode relative to an ordinary stash command. Signed-off-by: Neeraj Singh --- t/perf/p3700-add.sh | 59 ++++++++++++++++++++++++++++++++++++++++ t/perf/p3900-stash.sh | 62 +++++++++++++++++++++++++++++++++++++++++++ t/perf/perf-lib.sh | 4 +-- 3 files changed, 123 insertions(+), 2 deletions(-) create mode 100755 t/perf/p3700-add.sh create mode 100755 t/perf/p3900-stash.sh diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh new file mode 100755 index 00000000000..2ea78c9449d --- /dev/null +++ b/t/perf/p3700-add.sh @@ -0,0 +1,59 @@ +#!/bin/sh +# +# This test measures the performance of adding new files to the object database +# and index. The test was originally added to measure the effect of the +# core.fsyncMethod=batch mode, which is why we are testing different values +# of that setting explicitly and creating a lot of unique objects. + +test_description="Tests performance of add" + +# Fsync is normally turned off for the test suite.
+GIT_TEST_FSYNC=1 +export GIT_TEST_FSYNC + +. ./perf-lib.sh + +. $TEST_DIRECTORY/lib-unique-files.sh + +test_perf_default_repo +test_checkout_worktree + +dir_count=10 +files_per_dir=50 +total_files=$((dir_count * files_per_dir)) + +# We need to create the files each time we run the perf test, but +# we do not want to measure the cost of creating the files, so run +# the test once. +if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1 +then + echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2 + GIT_PERF_REPEAT_COUNT=1 +fi + +for m in false true batch +do + test_expect_success "create the files for object_fsyncing=$m" ' + git reset --hard && + # create files across directories + test_create_unique_files $dir_count $files_per_dir files + ' + + case $m in + false) + FSYNC_CONFIG='-c core.fsync=-loose-object -c core.fsyncmethod=fsync' + ;; + true) + FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=fsync' + ;; + batch) + FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=batch' + ;; + esac + + test_perf "add $total_files files (object_fsyncing=$m)" " + git $FSYNC_CONFIG add files + " +done + +test_done diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh new file mode 100755 index 00000000000..3526f06cef4 --- /dev/null +++ b/t/perf/p3900-stash.sh @@ -0,0 +1,62 @@ +#!/bin/sh +# +# This test measures the performance of adding new files to the object database +# and index. The test was originally added to measure the effect of the +# core.fsyncMethod=batch mode, which is why we are testing different values +# of that setting explicitly and creating a lot of unique objects. + +test_description="Tests performance of stash" + +# Fsync is normally turned off for the test suite. +GIT_TEST_FSYNC=1 +export GIT_TEST_FSYNC + +. ./perf-lib.sh + +. 
$TEST_DIRECTORY/lib-unique-files.sh + +test_perf_default_repo +test_checkout_worktree + +dir_count=10 +files_per_dir=50 +total_files=$((dir_count * files_per_dir)) + +# We need to create the files each time we run the perf test, but +# we do not want to measure the cost of creating the files, so run +# the test once. +if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1 +then + echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2 + GIT_PERF_REPEAT_COUNT=1 +fi + +for m in false true batch +do + test_expect_success "create the files for object_fsyncing=$m" ' + git reset --hard && + # create files across directories + test_create_unique_files $dir_count $files_per_dir files + ' + + case $m in + false) + FSYNC_CONFIG='-c core.fsync=-loose-object -c core.fsyncmethod=fsync' + ;; + true) + FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=fsync' + ;; + batch) + FSYNC_CONFIG='-c core.fsync=loose-object -c core.fsyncmethod=batch' + ;; + esac + + # We only stash files in the 'files' subdirectory since + # the perf test infrastructure creates files in the + # current working directory that need to be preserved + test_perf "stash $total_files files (object_fsyncing=$m)" " + git $FSYNC_CONFIG stash push -u -- files + " +done + +test_done diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh index 932105cd12c..d270d1d962a 100644 --- a/t/perf/perf-lib.sh +++ b/t/perf/perf-lib.sh @@ -98,8 +98,8 @@ test_perf_create_repo_from () { mkdir -p "$repo/.git" ( cd "$source" && - { cp -Rl "$objects_dir" "$repo/.git/" 2>/dev/null || - cp -R "$objects_dir" "$repo/.git/"; } && + { cp -Rl "$objects_dir" "$repo/.git/" || + cp -R "$objects_dir" "$repo/.git/" 2>/dev/null;} && # common_dir must come first here, since we want source_git to # take precedence and overwrite any overlapping files