From patchwork Wed Mar 23 14:18:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= X-Patchwork-Id: 12789805 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F40A5C433EF for ; Wed, 23 Mar 2022 14:18:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244690AbiCWOUS (ORCPT ); Wed, 23 Mar 2022 10:20:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237118AbiCWOUQ (ORCPT ); Wed, 23 Mar 2022 10:20:16 -0400 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 670C77C174 for ; Wed, 23 Mar 2022 07:18:46 -0700 (PDT) Received: by mail-wm1-x32a.google.com with SMTP id r64so1031063wmr.4 for ; Wed, 23 Mar 2022 07:18:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=NEl2TPXkGJPZMUlcZArsK9LrVbvSSsRHbbQiCADWqYg=; b=WkIJFB6eu+NzCCqGlsga4QQY3ZUhaBf3OyQBoh9kDI0VgvLrQVucAgcSKnADy/iGTb 8YWL2B7/cq+jJGKcTtfsYR+rrA2r35pndEF8915+o0l520RvC80TY7/aBvWcRD5v9riM CLEqhwXwv8a+wpJWC0kEjtH0fyEEfHGpsiMieGzvmaDqN7BO4MPpvlOZXQi+CsipgSaX PenoFg9rMiLGVhKvfIdnLIOtOxsV2Pot6MbgocPJobigEKv57kaQGlKGYy0LxTKoES25 GW5fu3kcC1eHc65ununu5Elvr28tmXx/VsH6xODrMN6NjfLvAtmbHCGopGrVfst9Xhwv eGkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=NEl2TPXkGJPZMUlcZArsK9LrVbvSSsRHbbQiCADWqYg=; b=tW8YXJKhMlFf0y1xTdWxwBFzH3wu5lxxbvAgVIZsfE0SuVwxOPz4dabKP89E/UHPQv z2VTBwju1KXztehK3JQDtTwqphVuYQczhgnKNVHkJKnDnoMEYXmpDBF2N/rUuLhCs2QK 6yty39sah2CF22XBcRztDWs5pMVe4MHQqEzMrOKJfCb6nrg4I2s8wWGh1vvwzfLMnT6v hAv3njoLc4/YqYMkOBL80vd/xL2feKbqNuxmSPofRBpLhbFeNzYn/PUz+qS98gCUv2P0 J598iehkC3Sbz71ULaYvil+m1XOWv5biIn/UUBcBQMFCKQTs5Gi/UBbej4KAGVPY7i8D c04Q== X-Gm-Message-State: AOAM530tSKzBwckIiQIMvlMMaer1eYKgE7qRxM/lAVCrANLKdX+yUy12 FPOu/eyt9GMYHG3sCEFKfZJoULcTHJbNvw== X-Google-Smtp-Source: ABdhPJxXLnLAmVEEa/V9pUk6CMivCRpHyPRYGg7ajEtrnTwP1jJlueY9tXoqsoL/DRgR3bjFnH6x6w== X-Received: by 2002:a7b:cd88:0:b0:38c:9d04:d794 with SMTP id y8-20020a7bcd88000000b0038c9d04d794mr87118wmj.140.1648045124535; Wed, 23 Mar 2022 07:18:44 -0700 (PDT) Received: from vm.nix.is (vm.nix.is. [2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id q14-20020a1cf30e000000b0038986a18ec8sm30592wmq.46.2022.03.23.07.18.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Mar 2022 07:18:44 -0700 (PDT) From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= To: git@vger.kernel.org Cc: Junio C Hamano , Neeraj Singh , Johannes Schindelin , Patrick Steinhardt , Bagas Sanjaya , Neeraj Singh , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBC?= =?utf-8?b?amFybWFzb24=?= Subject: [RFC PATCH v2 1/7] unpack-objects: add skeleton HASH_N_OBJECTS{,_{FIRST,LAST}} flags Date: Wed, 23 Mar 2022 15:18:25 +0100 Message-Id: X-Mailer: git-send-email 2.35.1.1428.g1c1a0152d61 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In preparation for making the bulk-checkin.c logic operate from object-file.c itself in some common cases let's add HASH_N_OBJECTS{,_{FIRST,LAST}} flags. This will allow us to adjust for-loops that add N objects to just pass down whether they have >1 objects (HASH_N_OBJECTS), as well as passing down flags for whether we have the first or last object. 
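To illustrate the flag passing described above, here is a minimal sketch of what a for-loop over N objects would hand down for object number "i". It mirrors the ternary that unpack_all() uses in the diff below. The batch_flags() helper is hypothetical (it is not part of this patch), and the flag values are local stand-ins for the cache.h defines, parenthesized here so the sketch is safe in any expression.

```c
#include <assert.h>

/* Local stand-ins for the flags this patch adds to cache.h. */
#define HASH_N_OBJECTS       (1u << 4)
#define HASH_N_OBJECTS_FIRST (1u << 5)
#define HASH_N_OBJECTS_LAST  (1u << 6)

/*
 * Hypothetical helper, not in the patch: compute the flags a loop
 * over "n" objects would pass down for the object at index "i".
 */
static unsigned batch_flags(unsigned i, unsigned n)
{
	/* Set for every object when the loop handles more than one. */
	unsigned oflags = n > 1 ? HASH_N_OBJECTS : 0;
	/* Same precedence as unpack_all(): FIRST is checked before LAST. */
	unsigned f = i == 0 ? HASH_N_OBJECTS_FIRST :
		     i + 1 == n ? HASH_N_OBJECTS_LAST : 0;

	assert(i < n);
	return oflags | f;
}
```

Note that with this ordering a lone object (n == 1) only ever gets HASH_N_OBJECTS_FIRST, and never HASH_N_OBJECTS or HASH_N_OBJECTS_LAST.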
We'll thus be able to drive any sort of batch-object mechanism from write_object_file_flags() directly, which until now didn't know if it was doing one object, or some arbitrary N. Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/unpack-objects.c | 60 +++++++++++++++++++++++----------------- cache.h | 3 ++ 2 files changed, 37 insertions(+), 26 deletions(-) diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index c55b6616aed..ec40c6fd966 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -233,7 +233,8 @@ static void write_rest(void) } static void added_object(unsigned nr, enum object_type type, - void *data, unsigned long size); + void *data, unsigned long size, + unsigned oflags); /* * Write out nr-th object from the list, now we know the contents @@ -241,21 +242,21 @@ static void added_object(unsigned nr, enum object_type type, * to be checked at the end. */ static void write_object(unsigned nr, enum object_type type, - void *buf, unsigned long size) + void *buf, unsigned long size, unsigned oflags) { if (!strict) { - if (write_object_file(buf, size, type, - &obj_list[nr].oid) < 0) + if (write_object_file_flags(buf, size, type, + &obj_list[nr].oid, oflags) < 0) die("failed to write object"); - added_object(nr, type, buf, size); + added_object(nr, type, buf, size, oflags); free(buf); obj_list[nr].obj = NULL; } else if (type == OBJ_BLOB) { struct blob *blob; - if (write_object_file(buf, size, type, - &obj_list[nr].oid) < 0) + if (write_object_file_flags(buf, size, type, + &obj_list[nr].oid, oflags) < 0) die("failed to write object"); - added_object(nr, type, buf, size); + added_object(nr, type, buf, size, oflags); free(buf); blob = lookup_blob(the_repository, &obj_list[nr].oid); @@ -269,7 +270,7 @@ static void write_object(unsigned nr, enum object_type type, int eaten; hash_object_file(the_hash_algo, buf, size, type, &obj_list[nr].oid); - added_object(nr, type, buf, size); + added_object(nr, type, buf, size, oflags); obj = 
parse_object_buffer(the_repository, &obj_list[nr].oid, type, size, buf, &eaten); @@ -283,7 +284,7 @@ static void write_object(unsigned nr, enum object_type type, static void resolve_delta(unsigned nr, enum object_type type, void *base, unsigned long base_size, - void *delta, unsigned long delta_size) + void *delta, unsigned long delta_size, unsigned oflags) { void *result; unsigned long result_size; @@ -294,7 +295,7 @@ static void resolve_delta(unsigned nr, enum object_type type, if (!result) die("failed to apply delta"); free(delta); - write_object(nr, type, result, result_size); + write_object(nr, type, result, result_size, oflags); } /* @@ -302,7 +303,7 @@ static void resolve_delta(unsigned nr, enum object_type type, * resolve all the deltified objects that are based on it. */ static void added_object(unsigned nr, enum object_type type, - void *data, unsigned long size) + void *data, unsigned long size, unsigned oflags) { struct delta_info **p = &delta_list; struct delta_info *info; @@ -313,7 +314,7 @@ static void added_object(unsigned nr, enum object_type type, *p = info->next; p = &delta_list; resolve_delta(info->nr, type, data, size, - info->delta, info->size); + info->delta, info->size, oflags); free(info); continue; } @@ -322,18 +323,19 @@ static void added_object(unsigned nr, enum object_type type, } static void unpack_non_delta_entry(enum object_type type, unsigned long size, - unsigned nr) + unsigned nr, unsigned oflags) { void *buf = get_data(size); if (!dry_run && buf) - write_object(nr, type, buf, size); + write_object(nr, type, buf, size, oflags); else free(buf); } static int resolve_against_held(unsigned nr, const struct object_id *base, - void *delta_data, unsigned long delta_size) + void *delta_data, unsigned long delta_size, + unsigned oflags) { struct object *obj; struct obj_buffer *obj_buffer; @@ -344,12 +346,12 @@ static int resolve_against_held(unsigned nr, const struct object_id *base, if (!obj_buffer) return 0; resolve_delta(nr, obj->type, 
obj_buffer->buffer, - obj_buffer->size, delta_data, delta_size); + obj_buffer->size, delta_data, delta_size, oflags); return 1; } static void unpack_delta_entry(enum object_type type, unsigned long delta_size, - unsigned nr) + unsigned nr, unsigned oflags) { void *delta_data, *base; unsigned long base_size; @@ -366,7 +368,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size, if (has_object_file(&base_oid)) ; /* Ok we have this one */ else if (resolve_against_held(nr, &base_oid, - delta_data, delta_size)) + delta_data, delta_size, oflags)) return; /* we are done */ else { /* cannot resolve yet --- queue it */ @@ -428,7 +430,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size, } } - if (resolve_against_held(nr, &base_oid, delta_data, delta_size)) + if (resolve_against_held(nr, &base_oid, delta_data, delta_size, oflags)) return; base = read_object_file(&base_oid, &type, &base_size); @@ -440,11 +442,11 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size, has_errors = 1; return; } - resolve_delta(nr, type, base, base_size, delta_data, delta_size); + resolve_delta(nr, type, base, base_size, delta_data, delta_size, oflags); free(base); } -static void unpack_one(unsigned nr) +static void unpack_one(unsigned nr, unsigned oflags) { unsigned shift; unsigned char *pack; @@ -472,11 +474,11 @@ static void unpack_one(unsigned nr) case OBJ_TREE: case OBJ_BLOB: case OBJ_TAG: - unpack_non_delta_entry(type, size, nr); + unpack_non_delta_entry(type, size, nr, oflags); return; case OBJ_REF_DELTA: case OBJ_OFS_DELTA: - unpack_delta_entry(type, size, nr); + unpack_delta_entry(type, size, nr, oflags); return; default: error("bad object type %d", type); @@ -491,6 +493,7 @@ static void unpack_all(void) { int i; struct pack_header *hdr = fill(sizeof(struct pack_header)); + unsigned oflags; nr_objects = ntohl(hdr->hdr_entries); @@ -505,9 +508,14 @@ static void unpack_all(void) progress = 
start_progress(_("Unpacking objects"), nr_objects); CALLOC_ARRAY(obj_list, nr_objects); plug_bulk_checkin(); + oflags = nr_objects > 1 ? HASH_N_OBJECTS : 0; for (i = 0; i < nr_objects; i++) { - unpack_one(i); - display_progress(progress, i + 1); + int nth = i + 1; + unsigned f = i == 0 ? HASH_N_OBJECTS_FIRST : + nr_objects == nth ? HASH_N_OBJECTS_LAST : 0; + + unpack_one(i, oflags | f); + display_progress(progress, nth); } unplug_bulk_checkin(); stop_progress(&progress); diff --git a/cache.h b/cache.h index 84fafe2ed71..72c91c91286 100644 --- a/cache.h +++ b/cache.h @@ -896,6 +896,9 @@ int ie_modified(struct index_state *, const struct cache_entry *, struct stat *, #define HASH_FORMAT_CHECK 2 #define HASH_RENORMALIZE 4 #define HASH_SILENT 8 +#define HASH_N_OBJECTS 1<<4 +#define HASH_N_OBJECTS_FIRST 1<<5 +#define HASH_N_OBJECTS_LAST 1<<6 int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags); int index_path(struct index_state *istate, struct object_id *oid, const char *path, struct stat *st, unsigned flags); From patchwork Wed Mar 23 14:18:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= X-Patchwork-Id: 12789807 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 653BEC433FE for ; Wed, 23 Mar 2022 14:18:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244702AbiCWOUZ (ORCPT ); Wed, 23 Mar 2022 10:20:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244689AbiCWOUS (ORCPT ); Wed, 23 Mar 2022 10:20:18 -0400 Received: from 
mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 711FC7C795 for ; Wed, 23 Mar 2022 07:18:47 -0700 (PDT) Received: by mail-wm1-x331.google.com with SMTP id r190-20020a1c2bc7000000b0038a1013241dso1016097wmr.1 for ; Wed, 23 Mar 2022 07:18:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=FzrDEvrT/Yf3XGyhzV3PfcpHzsagn6QzDf2xficoLuU=; b=cA3wTAMPgraLYl4Ym1l1TDbYD66rPmuDewtCgXwt0MjfWbTPiJdx7hdD3IB11YJrLy +jM4CrYsb1aa/ZWFSXlobIo6VnH8tEH+YEHqVtLF9I6c54LPHbRShPukZzQsrfuWl0CM gLpRiAzM2YORUHAH/1R7bs3Gpr2ou6kH/y3VO30kixk5TwYlTbGE1C6gTDrtoN+cdFp1 HzxqM6b+udBmQz0ZOclxXZK+QYkaNIvGf7cPD3xvQyU9SYhTrN0cn+svS1eTRxGMSfzt 0cvn2o1nXLZgRLl4pxcXELu9A5/xQfAgVfrPggmPFty1w9v+5NDHpkTqfjyJWHJry8qm Xvsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=FzrDEvrT/Yf3XGyhzV3PfcpHzsagn6QzDf2xficoLuU=; b=pKBpnqTcdqy818mxLqq7A0jorKZ4E4nGJuoCULTTyrdNntDD46t/WzMAu885l7LNFy QXWn8gXUBzYQjpqdP7l1IIgslRNjvLRokGSMURfF4WIPe2XiRsZHIlGROQKATtLY++O+ xB/2CF8zv/uIOZeGP4ijNoIjcy0aAzB5aAawzHhzh78G0YlPF5DUtCtOzSdnRZXUroH+ x2ACwoevrW5lP8IVZ4KdId+BOcf//bLGewP1q5L3byGmZkLmWdeBcAzeDo1Wn4ud9pjA 89a9KFgpI4BSRu7VvxnE3IcSKjPvvs5HEKr9JzRedY1jSeqoxCeamiwvGtgKWQy3zxmk iqSg== X-Gm-Message-State: AOAM532Ok0oChkHRQxbOJirixkYmZ3y66H6+bi58YLwyhoVOCzG6S+Aw 60LDGclCNgiECOqYqGYEjB17OeoFh4nZOw== X-Google-Smtp-Source: ABdhPJwRGYvL4zSNOSBFHTobJ4RQyxK88RDP+I2XYrfaSPhF2/Jljm8aDZLpICxBbDZPw+V99rVJbQ== X-Received: by 2002:a05:600c:3016:b0:38c:8786:d3b6 with SMTP id j22-20020a05600c301600b0038c8786d3b6mr51783wmh.135.1648045125579; Wed, 23 Mar 2022 07:18:45 -0700 (PDT) Received: from vm.nix.is (vm.nix.is. 
[2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id q14-20020a1cf30e000000b0038986a18ec8sm30592wmq.46.2022.03.23.07.18.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Mar 2022 07:18:44 -0700 (PDT) From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= To: git@vger.kernel.org Cc: Junio C Hamano , Neeraj Singh , Johannes Schindelin , Patrick Steinhardt , Bagas Sanjaya , Neeraj Singh , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBC?= =?utf-8?b?amFybWFzb24=?= Subject: [RFC PATCH v2 2/7] object-file: pass down unpack-objects.c flags for "bulk" checkin Date: Wed, 23 Mar 2022 15:18:26 +0100 Message-Id: X-Mailer: git-send-email 2.35.1.1428.g1c1a0152d61 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Remove much of this as a POC for exploring some of what I mentioned in https://lore.kernel.org/git/220322.86mthinxnn.gmgdl@evledraar.gmail.com/ This commit is obviously not what we *should* do as end-state, but demonstrates what's needed (I think) for a bare-minimum implementation of just the "bulk" syncing method for loose objects without the part where we do the tmp-objdir.c dance. Performance with this is already quite promising. Benchmarking with: git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3' \ -p 'rm -rf r.git && git init --bare r.git' \ './git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects 0" branch. Note: This commit reverts much of "core.fsyncmethod: batched disk flushes for loose-objects". We'll set up new structures to bring what it was doing back in a different way. I.e. 
to do the tmp-objdir plug-in in object-file.c Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/unpack-objects.c | 2 -- builtin/update-index.c | 4 --- bulk-checkin.c | 74 ---------------------------------------- bulk-checkin.h | 3 -- cache.h | 5 --- object-file.c | 37 ++++++++++++++------ 6 files changed, 26 insertions(+), 99 deletions(-) diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index ec40c6fd966..93da436581b 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -507,7 +507,6 @@ static void unpack_all(void) if (!quiet) progress = start_progress(_("Unpacking objects"), nr_objects); CALLOC_ARRAY(obj_list, nr_objects); - plug_bulk_checkin(); oflags = nr_objects > 1 ? HASH_N_OBJECTS : 0; for (i = 0; i < nr_objects; i++) { int nth = i + 1; @@ -517,7 +516,6 @@ static void unpack_all(void) unpack_one(i, oflags | f); display_progress(progress, nth); } - unplug_bulk_checkin(); stop_progress(&progress); if (delta_list) diff --git a/builtin/update-index.c b/builtin/update-index.c index cbd2b0d633b..95ed3c47b2e 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -1118,8 +1118,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) parse_options_start(&ctx, argc, argv, prefix, options, PARSE_OPT_STOP_AT_NON_OPTION); - /* optimize adding many objects to the object database */ - plug_bulk_checkin(); while (ctx.argc) { if (parseopt_state != PARSE_OPT_DONE) parseopt_state = parse_options_step(&ctx, options, @@ -1194,8 +1192,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) strbuf_release(&buf); } - /* by now we must have added all of the new objects */ - unplug_bulk_checkin(); if (split_index > 0) { if (git_config_get_split_index() == 0) warning(_("core.splitIndex is set to false; " diff --git a/bulk-checkin.c b/bulk-checkin.c index a0dca79ba6a..577b135e39c 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -3,20 +3,15 @@ */ #include "cache.h" #include "bulk-checkin.h" -#include 
"lockfile.h" #include "repository.h" #include "csum-file.h" #include "pack.h" #include "strbuf.h" -#include "string-list.h" -#include "tmp-objdir.h" #include "packfile.h" #include "object-store.h" static int bulk_checkin_plugged; -static struct tmp_objdir *bulk_fsync_objdir; - static struct bulk_checkin_state { char *pack_tmp_name; struct hashfile *f; @@ -85,40 +80,6 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state) reprepare_packed_git(the_repository); } -/* - * Cleanup after batch-mode fsync_object_files. - */ -static void do_batch_fsync(void) -{ - struct strbuf temp_path = STRBUF_INIT; - struct tempfile *temp; - - if (!bulk_fsync_objdir) - return; - - /* - * Issue a full hardware flush against a temporary file to ensure - * that all objects are durable before any renames occur. The code in - * fsync_loose_object_bulk_checkin has already issued a writeout - * request, but it has not flushed any writeback cache in the storage - * hardware or any filesystem logs. This fsync call acts as a barrier - * to ensure that the data in each new object file is durable before - * the final name is visible. - */ - strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory()); - temp = xmks_tempfile(temp_path.buf); - fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp)); - delete_tempfile(&temp); - strbuf_release(&temp_path); - - /* - * Make the object files visible in the primary ODB after their data is - * fully durable. 
- */ - tmp_objdir_migrate(bulk_fsync_objdir); - bulk_fsync_objdir = NULL; -} - static int already_written(struct bulk_checkin_state *state, struct object_id *oid) { int i; @@ -313,26 +274,6 @@ static int deflate_to_pack(struct bulk_checkin_state *state, return 0; } -void prepare_loose_object_bulk_checkin(void) -{ - if (bulk_checkin_plugged && !bulk_fsync_objdir) - bulk_fsync_objdir = tmp_objdir_create("bulk-fsync"); -} - -void fsync_loose_object_bulk_checkin(int fd, const char *filename) -{ - /* - * If we have a plugged bulk checkin, we issue a call that - * cleans the filesystem page cache but avoids a hardware flush - * command. Later on we will issue a single hardware flush - * before as part of do_batch_fsync. - */ - if (!bulk_fsync_objdir || - git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0) { - fsync_or_die(fd, filename); - } -} - int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags) @@ -347,19 +288,6 @@ int index_bulk_checkin(struct object_id *oid, void plug_bulk_checkin(void) { assert(!bulk_checkin_plugged); - - /* - * A temporary object directory is used to hold the files - * while they are not fsynced. 
- */ - if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) { - bulk_fsync_objdir = tmp_objdir_create("bulk-fsync"); - if (!bulk_fsync_objdir) - die(_("Could not create temporary object directory for core.fsyncMethod=batch")); - - tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0); - } - bulk_checkin_plugged = 1; } @@ -369,6 +297,4 @@ void unplug_bulk_checkin(void) bulk_checkin_plugged = 0; if (bulk_checkin_state.f) finish_bulk_checkin(&bulk_checkin_state); - - do_batch_fsync(); } diff --git a/bulk-checkin.h b/bulk-checkin.h index 181d3447ff9..b26f3dc3b74 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -6,9 +6,6 @@ #include "cache.h" -void prepare_loose_object_bulk_checkin(void); -void fsync_loose_object_bulk_checkin(int fd, const char *filename); - int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags); diff --git a/cache.h b/cache.h index 72c91c91286..2f3831fa853 100644 --- a/cache.h +++ b/cache.h @@ -1772,11 +1772,6 @@ void fsync_or_die(int fd, const char *); int fsync_component(enum fsync_component component, int fd); void fsync_component_or_die(enum fsync_component component, int fd, const char *msg); -static inline int batch_fsync_enabled(enum fsync_component component) -{ - return (fsync_components & component) && (fsync_method == FSYNC_METHOD_BATCH); -} - ssize_t read_in_full(int fd, void *buf, size_t count); ssize_t write_in_full(int fd, const void *buf, size_t count); ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset); diff --git a/object-file.c b/object-file.c index cd0ddb49e4b..dbeb3df502d 100644 --- a/object-file.c +++ b/object-file.c @@ -1886,19 +1886,37 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf, hash_object_file_literally(algo, buf, len, type_name(type), oid); } +static void sync_loose_object_batch(int fd, const char *filename, + const unsigned oflags) +{ + const int last = oflags & HASH_N_OBJECTS_LAST; + + /* + * We're 
doing a sync_file_range() (or equivalent) for 1..N-1 + * objects, and then a "real" fsync() for N. On some OS's + * enabling core.fsync=loose-object && core.fsyncMethod=batch + * improves the performance by a lot. + */ + if (last || (!last && git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0)) + fsync_or_die(fd, filename); +} + /* Finalize a file on disk, and close it. */ -static void close_loose_object(int fd, const char *filename) +static void close_loose_object(int fd, const char *filename, + const unsigned oflags) { + int fsync_loose; + if (the_repository->objects->odb->will_destroy) goto out; - if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) - fsync_loose_object_bulk_checkin(fd, filename); - else if (fsync_object_files > 0) + fsync_loose = fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT; + + if (oflags & HASH_N_OBJECTS && fsync_loose && + fsync_method == FSYNC_METHOD_BATCH) + sync_loose_object_batch(fd, filename, oflags); + else if (fsync_object_files > 0 || fsync_loose) fsync_or_die(fd, filename); - else - fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd, - filename); out: if (close(fd) != 0) @@ -1962,9 +1980,6 @@ static int write_loose_object(const struct object_id *oid, char *hdr, static struct strbuf tmp_file = STRBUF_INIT; static struct strbuf filename = STRBUF_INIT; - if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) - prepare_loose_object_bulk_checkin(); - loose_object_path(the_repository, &filename, oid); fd = create_tmpfile(&tmp_file, filename.buf); @@ -2015,7 +2030,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr, die(_("confused by unstable object source data for %s"), oid_to_hex(oid)); - close_loose_object(fd, tmp_file.buf); + close_loose_object(fd, tmp_file.buf, flags); if (mtime) { struct utimbuf utb; From patchwork Wed Mar 23 14:18:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= 
X-Patchwork-Id: 12789808 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45054C433EF for ; Wed, 23 Mar 2022 14:19:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244704AbiCWOU0 (ORCPT ); Wed, 23 Mar 2022 10:20:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42440 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244692AbiCWOUS (ORCPT ); Wed, 23 Mar 2022 10:20:18 -0400 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6295D7C174 for ; Wed, 23 Mar 2022 07:18:48 -0700 (PDT) Received: by mail-wr1-x430.google.com with SMTP id r13so2336706wrr.9 for ; Wed, 23 Mar 2022 07:18:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=WiTfAcJdJXo8A5uaC2Wd1+NkjJOfiwE6jVqaPd2koL0=; b=aBKgYwv6sOUSV4AlP/QNX76uIUaYXRTCb5LfPb0VpHCZxtxj/K44m0iUII1iyVQDyq 0oFwRNS4XMFQrYcEAP+BZnj+M6bHanXSXO8TQpm1/1WLpNfzk3I/TQkkvpOJlt6NVXtd 9lDQQXb0UCsnjFcAvTX+OidTWZSig4SFH/9yZ+yMKcddRMmagqrx0CNQ8vK9ZcagzxrK b39olKrJkbga9/yaCHkwtGFqhXHeinQJ3ovWpqy09vy8ks3ei8hV1yzJtw8Th4ygzGXU aVviP19xe1P+p1GIV4xfE0AeacoCnVbyKYO83CLd1MXtTomxkzAG4lj7C9zE2s93Y6xb p7wA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WiTfAcJdJXo8A5uaC2Wd1+NkjJOfiwE6jVqaPd2koL0=; b=7nYPN9fcNQLMgyRA7hzwLm5RETMns2jcloyZsOVzEiLvkWyMV1HOKCkfcFNoOjOb0s moTqF+wrFgnjW7piADMaQtSGKQHm10DcOv8K6ZDhDMvFdr/bDPQLy87iE+5AqEjqD09m 
/iHuNJ93JLEUEigCmI33uQOUPUsHlDoslLNpBCO3RXl17PazzSIwZR3zbCzYfNPrSgqt vCnZbAfBwAvBYHRpStOl1hIYsNm7Zf+L8rGaoAYoLm+YlXQfqMx7qiEWt/PSOk7OaE7U ocajMH3hbpsCDQayJHkVyoI84mCTb5+gzgivNf8cD9i+e2GTX6Habnvk76LbqKuuTu3y Hjmw== X-Gm-Message-State: AOAM532KJNCgUFdwBIhsPULuN0XouGMeRiQu/IiRwSWlvNJPQRipZgbs O9AsBiQlp0jN5/W9UMhfYARapDRAymg7oQ== X-Google-Smtp-Source: ABdhPJyRxaRUv68L1HcRUvKIOYtOYwqIACuk912ybc0x+3zeYc/pXtE6l4gSNbKT3R3LLC6a4yr/tQ== X-Received: by 2002:adf:fac8:0:b0:203:fb08:ff7 with SMTP id a8-20020adffac8000000b00203fb080ff7mr24158wrs.648.1648045126577; Wed, 23 Mar 2022 07:18:46 -0700 (PDT) Received: from vm.nix.is (vm.nix.is. [2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id q14-20020a1cf30e000000b0038986a18ec8sm30592wmq.46.2022.03.23.07.18.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Mar 2022 07:18:45 -0700 (PDT) From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= To: git@vger.kernel.org Cc: Junio C Hamano , Neeraj Singh , Johannes Schindelin , Patrick Steinhardt , Bagas Sanjaya , Neeraj Singh , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBC?= =?utf-8?b?amFybWFzb24=?= Subject: [RFC PATCH v2 3/7] update-index: pass down skeleton "oflags" argument Date: Wed, 23 Mar 2022 15:18:27 +0100 Message-Id: X-Mailer: git-send-email 2.35.1.1428.g1c1a0152d61 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org As with a preceding change to "unpack-objects" add an "oflags" going from cmd_update_index() all the way down to the code in object-file.c. Note also how index_mem() will now call write_object_file_flags(). 
Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/update-index.c | 32 ++++++++++++++++++-------------- object-file.c | 2 +- 2 files changed, 19 insertions(+), 15 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 95ed3c47b2e..34aaaa16c20 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -267,10 +267,12 @@ static int process_lstat_error(const char *path, int err) return error("lstat(\"%s\"): %s", path, strerror(err)); } -static int add_one_path(const struct cache_entry *old, const char *path, int len, struct stat *st) +static int add_one_path(const struct cache_entry *old, const char *path, + int len, struct stat *st, const unsigned oflags) { int option; struct cache_entry *ce; + unsigned f; /* Was the old index entry already up-to-date? */ if (old && !ce_stage(old) && !ce_match_stat(old, st, 0)) @@ -283,8 +285,8 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len fill_stat_cache_info(&the_index, ce, st); ce->ce_mode = ce_mode_from_stat(old, st->st_mode); - if (index_path(&the_index, &ce->oid, path, st, - info_only ? 0 : HASH_WRITE_OBJECT)) { + f = oflags | (info_only ? 0 : HASH_WRITE_OBJECT); + if (index_path(&the_index, &ce->oid, path, st, f)) { discard_cache_entry(ce); return -1; } @@ -320,7 +322,8 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len * - it doesn't exist at all in the index, but it is a valid * git directory, and it should be *added* as a gitlink. 
*/ -static int process_directory(const char *path, int len, struct stat *st) +static int process_directory(const char *path, int len, struct stat *st, + const unsigned oflags) { struct object_id oid; int pos = cache_name_pos(path, len); @@ -334,7 +337,7 @@ static int process_directory(const char *path, int len, struct stat *st) if (resolve_gitlink_ref(path, "HEAD", &oid) < 0) return 0; - return add_one_path(ce, path, len, st); + return add_one_path(ce, path, len, st, oflags); } /* Should this be an unconditional error? */ return remove_one_path(path); @@ -358,13 +361,14 @@ static int process_directory(const char *path, int len, struct stat *st) /* No match - should we add it as a gitlink? */ if (!resolve_gitlink_ref(path, "HEAD", &oid)) - return add_one_path(NULL, path, len, st); + return add_one_path(NULL, path, len, st, oflags); /* Error out. */ return error("%s: is a directory - add files inside instead", path); } -static int process_path(const char *path, struct stat *st, int stat_errno) +static int process_path(const char *path, struct stat *st, int stat_errno, + const unsigned oflags) { int pos, len; const struct cache_entry *ce; @@ -395,9 +399,9 @@ static int process_path(const char *path, struct stat *st, int stat_errno) return process_lstat_error(path, stat_errno); if (S_ISDIR(st->st_mode)) - return process_directory(path, len, st); + return process_directory(path, len, st, oflags); - return add_one_path(ce, path, len, st); + return add_one_path(ce, path, len, st, oflags); } static int add_cacheinfo(unsigned int mode, const struct object_id *oid, @@ -446,7 +450,7 @@ static void chmod_path(char flip, const char *path) die("git update-index: cannot chmod %cx '%s'", flip, path); } -static void update_one(const char *path) +static void update_one(const char *path, const unsigned oflags) { int stat_errno = 0; struct stat st; @@ -485,7 +489,7 @@ static void update_one(const char *path) report("remove '%s'", path); return; } - if (process_path(path, &st, 
stat_errno)) + if (process_path(path, &st, stat_errno, oflags)) die("Unable to process path %s", path); report("add '%s'", path); } @@ -776,7 +780,7 @@ static int do_reupdate(int ac, const char **av, */ save_nr = active_nr; path = xstrdup(ce->name); - update_one(path); + update_one(path, 0); free(path); discard_cache_entry(old); if (save_nr != active_nr) @@ -1138,7 +1142,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) setup_work_tree(); p = prefix_path(prefix, prefix_length, path); - update_one(p); + update_one(p, 0); if (set_executable_bit) chmod_path(set_executable_bit, p); free(p); @@ -1183,7 +1187,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) strbuf_swap(&buf, &unquoted); } p = prefix_path(prefix, prefix_length, buf.buf); - update_one(p); + update_one(p, 0); if (set_executable_bit) chmod_path(set_executable_bit, p); free(p); diff --git a/object-file.c b/object-file.c index dbeb3df502d..8999fce2b15 100644 --- a/object-file.c +++ b/object-file.c @@ -2211,7 +2211,7 @@ static int index_mem(struct index_state *istate, } if (write_object) - ret = write_object_file(buf, size, type, oid); + ret = write_object_file_flags(buf, size, type, oid, flags); else hash_object_file(the_hash_algo, buf, size, type, oid); if (re_allocated) From patchwork Wed Mar 23 14:18:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= X-Patchwork-Id: 12789809 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D1D1C433F5 for ; Wed, 23 Mar 2022 14:19:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244710AbiCWOUh (ORCPT ); Wed, 23 Mar 2022 10:20:37 -0400 Received: from lindbergh.monkeyblade.net 
([23.128.96.19]:42548 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244695AbiCWOUU (ORCPT ); Wed, 23 Mar 2022 10:20:20 -0400 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5540F7C799 for ; Wed, 23 Mar 2022 07:18:49 -0700 (PDT) Received: by mail-wm1-x32b.google.com with SMTP id p26-20020a05600c1d9a00b0038ccbff1951so33651wms.1 for ; Wed, 23 Mar 2022 07:18:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=hCxgYGY5sx044c3hn7+9Re/0syPBMtAsicc7ETFjbG4=; b=cRlOfryIQNmgoIUjrVyxnp30ZzMqcG/mSwDtucidlJ75Ip4jUqgXOzQ77FyZIIgcwg dIfFl3zYStP+l2UshRv6C1Z5bSPxe2rYK1nY6BIL8bvNIdAdJ4lyI8kHVoFKR3EHvNbq LV79mQ1wcl9tf728HpNPJDmvNjnGTK1HNCvc1zSRNuFCCqO8Hdf7hoGVUeDsoMFeyHd0 0E0E8CCTsGLmyt5zVGLqSCkLerw+7lVB3OXKXAz91cxEQWmI3X1hSuaJm8DUTKPiVZul 43T7q7deAlb9imd11vhVvC0bTR5C+Xi09AsOyB3AE8EtQ+i5woOs7LZ/k9XEEEgnQUNd HhgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=hCxgYGY5sx044c3hn7+9Re/0syPBMtAsicc7ETFjbG4=; b=noqfyin+dIw8BaunMCR9WpiVllFNDplaWWMOBL63daFMMnMGKYVcFGGB8rSQlgiJhn nsHMTkVLmUdj44Inpg1I8zqkDn1Now4dwWGm98w7AGovbEITSj1I/M6xHVnr5m1H+68J yclhsKzvgUPwIoupbhTYBFvL+magkrIyxY+BIpoOwO0spWNy1I/b/W/Ri5x1uKBAgeLc 6iQElC1BGwUX8bqJChbrAC00b9D9ifelTP0iZb0cecqRyCwaimd1nrMWMCYUajXIWGcz R/5FBH3E2hm3hfhNOekSRFUG7awjVBaypJUGPPPzFDzRGqIDB4risLlQ68I3SChTlXGk gyeQ== X-Gm-Message-State: AOAM532ArARvKd0T/MJQbAILHG0/Uzb1dPPD/ueqELyQNkMsZ/6KaEhP 3VAT8p3OC734rnBh+6cm8+E/R3Q0ldBXwA== X-Google-Smtp-Source: ABdhPJw/Nci8rhY+tEroNZmZsQkF+JLBC3snTrSCJih8MhibpH1SltOCsqdLQsrUXYcEyziAnFXokg== X-Received: by 2002:a05:600c:1f17:b0:38b:b2b3:9faa with SMTP id 
bd23-20020a05600c1f1700b0038bb2b39faamr9678652wmb.190.1648045127547; Wed, 23 Mar 2022 07:18:47 -0700 (PDT)
Received: from vm.nix.is (vm.nix.is. [2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id q14-20020a1cf30e000000b0038986a18ec8sm30592wmq.46.2022.03.23.07.18.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Mar 2022 07:18:46 -0700 (PDT)
From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?=
To: git@vger.kernel.org
Cc: Junio C Hamano , Neeraj Singh , Johannes Schindelin , Patrick Steinhardt , Bagas Sanjaya , Neeraj Singh , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBC?= =?utf-8?b?amFybWFzb24=?=
Subject: [RFC PATCH v2 4/7] update-index: have the index fsync() flush the loose objects
Date: Wed, 23 Mar 2022 15:18:28 +0100
Message-Id:
X-Mailer: git-send-email 2.35.1.1428.g1c1a0152d61
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

As with unpack-objects in a preceding commit, have update-index.c make
use of the HASH_N_OBJECTS{,_{FIRST,LAST}} flags. We now have a "batch"
mode again for "update-index".

Adding the t/* directory from git.git on a Linux ramdisk is a bit
faster than with the tmp-objdir indirection:

    $ git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/ && git ls-files -- t >repo/.git/to-add.txt' -p 'rm -rf repo/.git/objects/* repo/.git/index' './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin

The flow of this code isn't quite set up for re-plugging the
tmp-objdir back in. In particular we no longer pass
HASH_N_OBJECTS_FIRST (but doing so would be trivial), and there's no
HASH_N_OBJECTS_LAST.

So this and other callers would need some light transaction-y API, or
to otherwise pass a "yes, I'd like to flush it" down to
finalize_hashfile(), but doing so will be trivial.
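
To illustrate the "pass the flush request down" idea, here is a
minimal sketch of how a write_locked_index() flag could be turned into
the set of fsync components an index write stands in for. The flag and
constant names are taken from this series; the bit values of the
FSYNC_COMPONENT_* constants and the should_fsync() helper are
assumptions made for this example, not git's actual definitions.

```c
#include <assert.h>

/*
 * Illustrative stand-ins: only the names come from the patch, the
 * FSYNC_COMPONENT_* bit values are assumptions for this sketch.
 */
enum {
	FSYNC_COMPONENT_LOOSE_OBJECT = 1 << 0,
	FSYNC_COMPONENT_INDEX        = 1 << 1
};
enum fsync_method { FSYNC_METHOD_FSYNC, FSYNC_METHOD_BATCH };

#define COMMIT_LOCK          (1 << 0)
#define WLI_NEED_LOOSE_FSYNC (1 << 2)

/* Which components does this index write have to make durable? */
unsigned index_write_components(enum fsync_method method, unsigned wli_flags)
{
	unsigned wflags = FSYNC_COMPONENT_INDEX;

	/* In batch mode the index fsync() also flushes queued loose objects. */
	if (method == FSYNC_METHOD_BATCH && (wli_flags & WLI_NEED_LOOSE_FSYNC))
		wflags |= FSYNC_COMPONENT_LOOSE_OBJECT;
	return wflags;
}

/* Hypothetical helper: fsync if any requested component is configured. */
int should_fsync(unsigned configured, unsigned wflags)
{
	assert(wflags != 0);
	return (configured & wflags) != 0;
}
```

With only "loose-object" configured, an index write in batch mode that
carries WLI_NEED_LOOSE_FSYNC would then be synced even though "index"
itself is not configured, which is the behavior this patch is after.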

And since we've started structuring it this way it'll become easy to
do any arbitrary number of things down the line that would "bulk
fsync" before the final fsync(). Now we write some objects and fsync()
on the index, but between those two we could do any number of other
things where we'd defer the fsync().

This sort of thing might be especially interesting for "git repack"
when it writes e.g. a *.bitmap, *.rev, *.pack and *.idx. In that case
we could skip the fsync() on all of those, and only do it on the *.idx
before we rename it in-place. I *think* nothing cares about a *.pack
without an *.idx, but even then we could fsync *.idx, rename *.pack,
rename *.idx and still safely do only one fsync(). See "git show
--first-parent" on 62874602032 (Merge branch
'tb/pack-finalize-ordering' into maint, 2021-10-12) for a good
overview of the code involved in that.

1. https://lore.kernel.org/git/220323.86sfr9ndpr.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason
---
 builtin/update-index.c |  7 ++++---
 cache.h                |  1 +
 read-cache.c           | 29 ++++++++++++++++++++++++++++-
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 34aaaa16c20..6cfec6efb38 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1142,7 +1142,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 			setup_work_tree();
 			p = prefix_path(prefix, prefix_length, path);
-			update_one(p, 0);
+			update_one(p, HASH_N_OBJECTS);
 			if (set_executable_bit)
 				chmod_path(set_executable_bit, p);
 			free(p);
@@ -1187,7 +1187,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				strbuf_swap(&buf, &unquoted);
 			}
 			p = prefix_path(prefix, prefix_length, buf.buf);
-			update_one(p, 0);
+			update_one(p, HASH_N_OBJECTS);
 			if (set_executable_bit)
 				chmod_path(set_executable_bit, p);
 			free(p);
@@ -1263,7 +1263,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				exit(128);
 			unable_to_lock_die(get_index_file(), lock_error);
 		}
-		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
+		if (write_locked_index(&the_index, &lock_file,
+				       COMMIT_LOCK | WLI_NEED_LOOSE_FSYNC))
 			die("Unable to write new index file");
 	}
 
diff --git a/cache.h b/cache.h
index 2f3831fa853..7542e009a34 100644
--- a/cache.h
+++ b/cache.h
@@ -751,6 +751,7 @@ void ensure_full_index(struct index_state *istate);
 /* For use with `write_locked_index()`. */
 #define COMMIT_LOCK		(1 << 0)
 #define SKIP_IF_UNCHANGED	(1 << 1)
+#define WLI_NEED_LOOSE_FSYNC	(1 << 2)
 
 /*
  * Write the index while holding an already-taken lock. Close the lock,
diff --git a/read-cache.c b/read-cache.c
index 3e0e7d41837..275f6308c32 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2860,6 +2860,33 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	int ieot_entries = 1;
 	struct index_entry_offset_table *ieot = NULL;
 	int nr, nr_threads;
+	unsigned int wflags = FSYNC_COMPONENT_INDEX;
+
+
+	/*
+	 * TODO: This is abuse of the recently modified
+	 * finalize_hashfile() API, which reveals a shortcoming of its
+	 * "fsync" design.
+	 *
+	 * I.e. it expects an "enum fsync_component component" label,
+	 * but here we're passing it an OR of the two, knowing that
+	 * it'll call fsync_component_or_die() which (in
+	 * write-or-die.c) will do "(fsync_components & wflags)" (with
+	 * our "wflags" here).
+	 *
+	 * But the API really should be changed to explicitly take
+	 * such flags, because in this case we'd like to fsync() the
+	 * index if we're in the bulk mode, *even if* our
+	 * "core.fsync=index" isn't configured.
+	 *
+	 * That's because at this point we've been queuing up object
+	 * writes that we didn't fsync(), and are going to use this
+	 * fsync() to "flush" the whole thing. Doing it this way
+	 * avoids redundantly calling fsync() twice when once will do.
+ */ + if (fsync_method == FSYNC_METHOD_BATCH && + flags & WLI_NEED_LOOSE_FSYNC) + wflags |= FSYNC_COMPONENT_LOOSE_OBJECT; f = hashfd(tempfile->fd, tempfile->filename.buf); @@ -3094,7 +3121,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (!alternate_index_output && (flags & COMMIT_LOCK)) csum_fsync_flag = CSUM_FSYNC; - finalize_hashfile(f, istate->oid.hash, FSYNC_COMPONENT_INDEX, + finalize_hashfile(f, istate->oid.hash, wflags, CSUM_HASH_IN_STREAM | csum_fsync_flag); if (close_tempfile_gently(tempfile)) { From patchwork Wed Mar 23 14:18:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= X-Patchwork-Id: 12789810 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0DEAC433EF for ; Wed, 23 Mar 2022 14:19:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244757AbiCWOUo (ORCPT ); Wed, 23 Mar 2022 10:20:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42546 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244696AbiCWOUU (ORCPT ); Wed, 23 Mar 2022 10:20:20 -0400 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 39E5B7C79B for ; Wed, 23 Mar 2022 07:18:50 -0700 (PDT) Received: by mail-wm1-x330.google.com with SMTP id p12-20020a05600c430c00b0038cbdf52227so1014391wme.2 for ; Wed, 23 Mar 2022 07:18:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=aiCX82v4JkbKLcnp7TNkHQh6J48cLvQ+k4Zd2qEfSN8=; 
b=bwJC5IJ4DN/wEvzENoMXM5Pk9y7JMbCdr+zMGUmTyrqDIax8gPFqdub1+7nHtskK22 oHMBIkiteDiQ5gD41Cu1s3/zL56eIc98szYYqVBK6GrtX98cCopH/HMkREX1eOz47ZoK SAO3dJck0Tl94vVmqY7znAoZYSFvEEzaN/sZjiW0v7sX3xmRWPbFARoC/MNLH1YQYcHt BiO8ETZVJ+d9mroRJYg87cYFc/f5m42ZgsJ4E/WZilju58Nbea0CqH0Mfxbp14vyddeX C1eIY+LHgABTIDOt/GWaPVglb7JLTxjb8ID2USEXWAeIggTe2J1LqKjWjfbiix2gcqtE X8zg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=aiCX82v4JkbKLcnp7TNkHQh6J48cLvQ+k4Zd2qEfSN8=; b=2Fk6hV6YP9pED38vKFq6qqafpBZbba37tDLjcdSZ5JrjzJRDblB9XknS6NBDWInlfd gKOXhZ5uTNXrv0hYtWS2R/yB859kisrwysyQ9fbn/pNIKMHYW0o3SS7XvbctAJ4N0lFJ wFsj+u9+yAABlo+yzSL/FCPRN+vaG69FvKaCxxAL0UecA8EXvPXueq1ome1WStTcKpn2 5O3A9QBT6k8soSeRpMxOW3j5G8N/FOTk91419z/Cpz8lAV1po4xNO37rAySK03QzoMHN OGIU7FsftGsFbSi5x8PO4kAThy5CZWq5NHKEmPG1dJEgA1DU2LccVeP23fQYChjLKpin clNQ== X-Gm-Message-State: AOAM530C3n/r8CxQmrAviNwp4agUDxYhhTM4DcFAacvlOffW2d8T6HjI aoXeiV9bOYdGEuBLaTapOSP5u+bfsk06gA== X-Google-Smtp-Source: ABdhPJw3fCEsDluLch1z72EN/SBdW/Ww18SqH5+vww/KbDcxWtjKcxX3aRGq9mXnsq1l2hTk4gcsXA== X-Received: by 2002:a05:600c:2188:b0:38c:9a21:9c95 with SMTP id e8-20020a05600c218800b0038c9a219c95mr9040207wme.87.1648045128486; Wed, 23 Mar 2022 07:18:48 -0700 (PDT) Received: from vm.nix.is (vm.nix.is. 
[2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id q14-20020a1cf30e000000b0038986a18ec8sm30592wmq.46.2022.03.23.07.18.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Mar 2022 07:18:47 -0700 (PDT)
From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?=
To: git@vger.kernel.org
Cc: Junio C Hamano , Neeraj Singh , Johannes Schindelin , Patrick Steinhardt , Bagas Sanjaya , Neeraj Singh , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBC?= =?utf-8?b?amFybWFzb24=?=
Subject: [RFC PATCH v2 5/7] add: use WLI_NEED_LOOSE_FSYNC for new "only the index" bulk fsync()
Date: Wed, 23 Mar 2022 15:18:29 +0100
Message-Id:
X-Mailer: git-send-email 2.35.1.1428.g1c1a0152d61
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

We can now bring "bulk" syncing back to "git add" using a mechanism
discussed in the preceding commit where we fsync() on the index, not
the last object we write.

On a ramdisk:

    $ git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/' -p 'rm -rf repo/.git/objects/* repo/.git/index' './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' --warmup 1
    Benchmark 1: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'ns/batched-fsync
      Time (mean ± σ):     299.5 ms ±   1.6 ms    [User: 193.4 ms, System: 103.7 ms]
      Range (min … max):   296.6 ms … 301.6 ms    10 runs

    Benchmark 2: ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'HEAD
      Time (mean ± σ):     282.8 ms ±   2.1 ms    [User: 193.8 ms, System: 86.6 ms]
      Range (min … max):   279.1 ms … 285.6 ms    10 runs

    Summary
      './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'HEAD' ran
        1.06 ± 0.01 times faster than './git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'ns/batched-fsync'

My times on my spinning disk are too fuzzy to quote with confidence,
but I have seen it go as much as 15-30% faster.

FWIW when run under "strace --summary-only" on the ramdisk it is ~20%
faster:

    $ git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3 && rm -rf repo && git init repo && cp -R t repo/' -p 'rm -rf repo/.git/objects/* repo/.git/index' 'strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' --warmup 1
    Benchmark 1: strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'ns/batched-fsync
      Time (mean ± σ):     917.4 ms ±  18.8 ms    [User: 388.7 ms, System: 672.1 ms]
      Range (min … max):   885.3 ms … 948.1 ms    10 runs

    Benchmark 2: strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'HEAD
      Time (mean ± σ):     769.0 ms ±   9.2 ms    [User: 358.2 ms, System: 521.2 ms]
      Range (min … max):   760.7 ms … 792.6 ms    10 runs

    Summary
      'strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'HEAD' ran
        1.19 ± 0.03 times faster than 'strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo add .' in 'ns/batched-fsync'

Signed-off-by: Ævar Arnfjörð Bjarmason
---
 builtin/add.c | 6 ++++--
 cache.h       | 1 +
 read-cache.c  | 8 ++++++++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 3ffb86a4338..6ef18b6246c 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -580,7 +580,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		 (intent_to_add ? ADD_CACHE_INTENT : 0) |
 		 (ignore_add_errors ? ADD_CACHE_IGNORE_ERRORS : 0) |
 		 (!(addremove || take_worktree_changes)
-		  ? ADD_CACHE_IGNORE_REMOVAL : 0));
+		  ? ADD_CACHE_IGNORE_REMOVAL : 0)) |
+		ADD_CACHE_HASH_N_OBJECTS;
 
 	if (read_cache_preload(&pathspec) < 0)
 		die(_("index file corrupt"));
@@ -686,7 +687,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
-			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
+			       COMMIT_LOCK | SKIP_IF_UNCHANGED |
+			       WLI_NEED_LOOSE_FSYNC))
 		die(_("Unable to write new index file"));
 
 	dir_clear(&dir);
diff --git a/cache.h b/cache.h
index 7542e009a34..d57af938cbc 100644
--- a/cache.h
+++ b/cache.h
@@ -857,6 +857,7 @@ int remove_file_from_index(struct index_state *, const char *path);
 #define ADD_CACHE_IGNORE_ERRORS	4
 #define ADD_CACHE_IGNORE_REMOVAL 8
 #define ADD_CACHE_INTENT 16
+#define ADD_CACHE_HASH_N_OBJECTS 32
 /*
  * These two are used to add the contents of the file at path
  * to the index, marking the working tree up-to-date by storing
diff --git a/read-cache.c b/read-cache.c
index 275f6308c32..788423b6dde 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -755,6 +755,14 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	unsigned hash_flags = pretend ? 0 : HASH_WRITE_OBJECT;
 	struct object_id oid;
 
+	/*
+	 * TODO: Can't we also set HASH_N_OBJECTS_FIRST as a function
+	 * of !(ce->ce_flags & CE_ADDED) or something? I'm not too
+	 * familiar with the cache API...
+ */ + if (flags & ADD_CACHE_HASH_N_OBJECTS) + hash_flags |= HASH_N_OBJECTS; + if (flags & ADD_CACHE_RENORMALIZE) hash_flags |= HASH_RENORMALIZE; From patchwork Wed Mar 23 14:18:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= X-Patchwork-Id: 12789812 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5DE06C433EF for ; Wed, 23 Mar 2022 14:19:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244700AbiCWOUr (ORCPT ); Wed, 23 Mar 2022 10:20:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244697AbiCWOUU (ORCPT ); Wed, 23 Mar 2022 10:20:20 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3936D7CDC0 for ; Wed, 23 Mar 2022 07:18:51 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id m30so2377166wrb.1 for ; Wed, 23 Mar 2022 07:18:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=HIwBEui+mBuahqG4gyWGQKaoY0BcWnAnpQS93bYyQrA=; b=NALxmF1I9XcTIbiTUvUzBilyD9Hjncpn7CJs6qsapc9nhzXLdk71BqsxbyQ7wml5pM FyflGjp/6jWGAghA9DwHdQHs50fLKfkyVXwauy+Pnc1pUMmfWXQyi0kKIlv5EV0tfc0X kGu0GJsnpf7wcfqVGlKuVpQfUaiZCY6tyESr6C4IRrVe3DG4ftoFUMKj96kbtINFn7QI de1/W+VRgZQC7FyoZxj78v+L6/5/ZZYM6OXuSs73ByO6BJiXyi/QL1UMqFkbNoEmANfI zF/nj/cEOBnyZ+Dhe+U/kPPMS1MRzOoIKVG177VZ0ihaAaLsxVI4C+j3DvUA1aBrYI+l QAuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=HIwBEui+mBuahqG4gyWGQKaoY0BcWnAnpQS93bYyQrA=; b=P64aYmDb2YE5fQEDlankMuCZ9RAebfelGkv/VREmJq19IgC6qEWO/Deqj6ApEt9Zau fFNLVvPifvdQA2kSqwB9Qfy/KCfjKyxmqWtjck0pPEMWu3xN81Mbn+dWmEbJAbQVYxY5 Yaz8LIof4Dni0LBAXw4laU59FSkU1oD6W09D+Q0S+fFDdKGOsyPIRL5aLU7MiPLva9K6 6MOhd2KMleNTZ9qw+vhJVnp22SpJIM+Rs8GoqDay/zoEZ6ycoMJB+HUdLxRY0JOi2c// JeUf+2g8bInW8iEOo/srpApLch5o3Magd2ubX2KN9lYzG+AYelotf6RA2av9f7p2tS/J KtDA== X-Gm-Message-State: AOAM531Qknw5zsq9w/3UxXlVh+NcW6+bWai0dR+HeLO/o7MGI0HVDWNb fzfdiZypuSRvPqzWR5hSAi5TCjPZbCBJgQ== X-Google-Smtp-Source: ABdhPJwyBIeRupjT71R+CEr5khJgmvFBGfmuGllL/pP3EcgfHnLLThckxHHKrULrSVjR6CvlD8MbuQ== X-Received: by 2002:adf:fa87:0:b0:203:f28e:76c3 with SMTP id h7-20020adffa87000000b00203f28e76c3mr23283665wrr.579.1648045129543; Wed, 23 Mar 2022 07:18:49 -0700 (PDT) Received: from vm.nix.is (vm.nix.is. [2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id q14-20020a1cf30e000000b0038986a18ec8sm30592wmq.46.2022.03.23.07.18.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Mar 2022 07:18:48 -0700 (PDT) From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= To: git@vger.kernel.org Cc: Junio C Hamano , Neeraj Singh , Johannes Schindelin , Patrick Steinhardt , Bagas Sanjaya , Neeraj Singh , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBC?= =?utf-8?b?amFybWFzb24=?= Subject: [RFC PATCH v2 6/7] fsync docs: update for new syncing semantics Date: Wed, 23 Mar 2022 15:18:30 +0100 Message-Id: X-Mailer: git-send-email 2.35.1.1428.g1c1a0152d61 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/config/core.txt | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt index cf0e9b8b088..f598925b597 100644 --- a/Documentation/config/core.txt 
+++ b/Documentation/config/core.txt
@@ -596,12 +596,23 @@ core.fsyncMethod::
   filesystem and storage hardware, data added to the repository may not be
   durable in the event of a system crash. This is the default mode on macOS.
 * `batch` enables a mode that uses writeout-only flushes to stage multiple
-  updates in the disk writeback cache and then does a single full fsync of
-  a dummy file to trigger the disk cache flush at the end of the operation.
-  Currently `batch` mode only applies to loose-object files. Other repository
-  data is made durable as if `fsync` was specified. This mode is expected to
-  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
-  and on Windows for repos stored on NTFS or ReFS filesystems.
+  updates in the disk writeback cache and then does a single full fsync()
+  on the "last" file to trigger the disk cache flush at the end of the
+  operation.
++
+Other repository data is made durable as if `fsync` was
+specified. This mode is expected to be as safe as `fsync` on macOS for
+repos stored on HFS+ or APFS filesystems and on Windows for repos
+stored on NTFS or ReFS filesystems.
++
+The `batch` is currently only applies to loose-object files and will
+kick in when using the linkgit:git-unpack-objects[1] and
+linkgit:update-index[1] commands. Note that the "last" file to be
+synced may be the last object, as in the case of
+linkgit:git-unpack-objects[1], or relevant "index" (or in the future,
+"ref") update, as in the case of linkgit:git-update-index[1]. I.e. the
+batch syncing of the loose objects may be deferred until a subsequent
+fsync() to a file that makes them "active".
 
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
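
The ordering documented above (write several objects without a full
sync, then sync once on the file that makes them "active") can be
sketched roughly as below. This is a simplified illustration and not
git's implementation: write_file(), write_batch() and the file names
are hypothetical, and the real `batch` mode additionally issues a
writeout-only flush for each object. Plain POSIX does not promise that
the single fsync() makes the earlier files durable; that depends on
the filesystem, as the core.fsyncMethod documentation notes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write buf to path; issue a full fsync() only if do_fsync is set. */
int write_file(const char *path, const char *buf, int do_fsync)
{
	size_t len = strlen(buf);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, buf, len);
	if (n < 0 || (size_t)n != len || (do_fsync && fsync(fd) < 0)) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Write nr "objects" with no fsync(), then write the file that refers
 * to them ("index" here) with the only fsync(). Returns 0 on success.
 */
int write_batch(const char *dir, int nr)
{
	char path[4096];
	int i;

	assert(dir != NULL && nr >= 0);
	for (i = 0; i < nr; i++) {
		snprintf(path, sizeof(path), "%s/object-%d", dir, i);
		if (write_file(path, "payload\n", 0) < 0)
			return -1;
	}
	/* The deferred sync: one fsync() on the file written last. */
	snprintf(path, sizeof(path), "%s/index", dir);
	return write_file(path, "index\n", 1);
}
```

Calling write_batch(".", 3) would create object-0 to object-2 and
index in the current directory while issuing a single fsync().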
From patchwork Wed Mar 23 14:18:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= X-Patchwork-Id: 12789811 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15EA9C433FE for ; Wed, 23 Mar 2022 14:19:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244730AbiCWOUq (ORCPT ); Wed, 23 Mar 2022 10:20:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42630 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244699AbiCWOUW (ORCPT ); Wed, 23 Mar 2022 10:20:22 -0400 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D64F7CDC4 for ; Wed, 23 Mar 2022 07:18:52 -0700 (PDT) Received: by mail-wm1-x329.google.com with SMTP id k124-20020a1ca182000000b0038c9cf6e2a6so1028183wme.0 for ; Wed, 23 Mar 2022 07:18:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=jzcblGX8ZxIKXRkxRRlf46XrZ6d0iUVgHKsq+qVMr6o=; b=OJCkvNRFb4em2Muj+YD0IJMlXl36M1PU6FdpUYqSyBXrU8h8pgKifqAN+YJQ9avG+r ulBEz52EcyOTWBaTAVMkUtV2A4jAGpNIXARXeiR7fCAd+lcIMK/cC8DiRLictJE+/clA sUjXb5Oo+Xut7m4lrIXT85hASqSgMrqvJ2yG3JAlF4ChNVdI49krghGVKgFtNGOX8vhH RA78Jf3Ds13pm4v4CP4WeTurAa/t4N/RevU+yBy3mwLqJxd/1bcQJNGwn3Gl6/oEsbXC V+I7e9eqqvdjfkSnb83GJjNTnbPuKYU5yNsfwrEIttmH4mmsAybNRvPf7nlx3BKssCak JxVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=jzcblGX8ZxIKXRkxRRlf46XrZ6d0iUVgHKsq+qVMr6o=; b=5Btma1XWVjN/aamQ+XTh8s8xBd6Tl9KykB5P0CT4lyJ6n61Yh3RB+gpCai9mJdsAit SmkWva+xn5dzPqo2G3nxMHt+kVxr/9YDJXaKasCUJm5zR4ms0g8IBu0o0IQx7PND5Vm6 ekuISlPN19JD94jlNVQjjgYuj2z+hFUD6KwOGLUeCMtVFxyD4vdm/sqpxKpZlR+ykGGb R7ZLNdZNHdghBeu0v7eEz3filHkOQsYIZbzlm79pJ219sCOqIxCm6geg03aPn1/r56nN /3P72v1cSZn8IHkytPkUpbPSFu+aVeCPie5Vt8M4qZ0k3WE6mTM4i5ZuJAxtKyeyM1uI n1Dg== X-Gm-Message-State: AOAM532T8N88E+Xy23ARNTJxCis1Z5Gl5G56bgtCrdZtQJlPqnQlxR5i XVjYsk8/KUrxSXBsZSrSj8tsLCLqesrdgQ== X-Google-Smtp-Source: ABdhPJyUN64F5FIGkdeuYsU4agUO6AvYkmmr+ZW9SD5vBqEsbJSX7M2Z7bgt1QP6nWEzUbhhwOMoxw== X-Received: by 2002:a05:600c:35cc:b0:38c:73e8:7dd5 with SMTP id r12-20020a05600c35cc00b0038c73e87dd5mr9684391wmq.196.1648045130493; Wed, 23 Mar 2022 07:18:50 -0700 (PDT) Received: from vm.nix.is (vm.nix.is. [2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id q14-20020a1cf30e000000b0038986a18ec8sm30592wmq.46.2022.03.23.07.18.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Mar 2022 07:18:49 -0700 (PDT) From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= To: git@vger.kernel.org Cc: Junio C Hamano , Neeraj Singh , Johannes Schindelin , Patrick Steinhardt , Bagas Sanjaya , Neeraj Singh , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBC?= =?utf-8?b?amFybWFzb24=?= Subject: [RFC PATCH v2 7/7] fsync docs: add new fsyncMethod.batch.quarantine, elaborate on old Date: Wed, 23 Mar 2022 15:18:31 +0100 Message-Id: X-Mailer: git-send-email 2.35.1.1428.g1c1a0152d61 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Add a new fsyncMethod.batch.quarantine setting which defaults to "false". Preceding (RFC, and not meant to flip-flop like that eventually) commits ripped out the "tmp-objdir" part of the core.fsyncMethod=batch. 
This documentation proposes to keep that as the default for the
reasons discussed in it, while allowing users to set
"fsyncMethod.batch.quarantine=true".

Furthermore update the discussion of "core.fsyncObjectFiles" with
information about what it *really* does, why you probably shouldn't
use it, and how to safely emulate most of what it gave users in the
past in terms of performance benefit.

Signed-off-by: Ævar Arnfjörð Bjarmason
---
 Documentation/config/core.txt | 80 +++++++++++++++++++++++++++++++----
 1 file changed, 72 insertions(+), 8 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index f598925b597..365a12dc7ae 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -607,21 +607,85 @@ stored on NTFS or ReFS filesystems.
 +
 The `batch` is currently only applies to loose-object files and will
 kick in when using the linkgit:git-unpack-objects[1] and
-linkgit:update-index[1] commands. Note that the "last" file to be
+linkgit:git-update-index[1] commands. Note that the "last" file to be
 synced may be the last object, as in the case of
 linkgit:git-unpack-objects[1], or relevant "index" (or in the future,
 "ref") update, as in the case of linkgit:git-update-index[1]. I.e. the
 batch syncing of the loose objects may be deferred until a subsequent
 fsync() to a file that makes them "active".
 
+fsyncMethod.batch.quarantine::
+	A boolean which if set to `true` will cause "batched" writes
+	to objects to be "quarantined" if
+	`core.fsyncMethod=batch`. This is `false` by default.
++
+The primary objective of these fsync() settings is to protect against
+corruption of things in the repository which are "reachable", i.e.
+via references, the index etc., not merely objects that are present in
+the object store.
++
+Historically setting `core.fsyncObjectFiles=false` assumed that on a
+filesystem where an fsync() would flush all preceding outstanding
+I/O we might end up with a corrupt loose object, but that this was OK
+as long as no reference referred to it. We'd eventually prune the corrupt
+object with linkgit:git-gc[1], and linkgit:git-fsck[1] would only
+report it as a minor annoyance.
++
+Setting `fsyncMethod.batch.quarantine=true` takes the view that
+something like a corrupt *unreferenced* loose object in the object
+store is something we'd like to avoid, at the cost of reduced
+performance when using `core.fsyncMethod=batch`.
++
+Currently this uses the same mechanism described in the "QUARANTINE
+ENVIRONMENT" section of the linkgit:git-receive-pack[1] documentation, but
+that's subject to change. The performance loss is because we need to
+"stage" the objects in that quarantine environment, fsync() it, and
+once that's done rename() or link() it in-place into the main object
+store, possibly with an fsync() of the index or ref at the end.
++
+With `fsyncMethod.batch.quarantine=false` we'll "stage" things in the
+main object store, and then do one fsync() at the very end, either on
+the last object we write, or on the file (index or ref) that'll make it
+"reachable".
++
+The bad thing about setting this to `true` is lost performance, as
+well as not being able to access the objects as they're written (which
+e.g. consumers of linkgit:git-update-index[1]'s `--verbose` mode might
+want to do).
++
+The good thing is that you should be guaranteed not to get e.g. short
+or otherwise corrupt loose objects if you pull your power cord. In
+practice various git commands deal quite badly with discovering such a
+stray corrupt object (including perhaps assuming it's valid based on
+its existence, or hard dying on an error rather than replacing
+it). Repairing such "unreachable corruption" can require manual
+intervention.
+
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-	This setting is deprecated. Use core.fsync instead.
-+
-This setting affects data added to the Git repository in loose-object
-form. When set to true, Git will issue an fsync or similar system call
-to flush caches so that loose-objects remain consistent in the face
-of a unclean system shutdown.
+	This boolean will enable 'fsync()' when writing loose object
+	files.
++
+This setting is the historical fsync configuration setting. It's now
+*deprecated*; you should use `core.fsync` instead, perhaps in
+combination with `core.fsyncMethod=batch`.
++
+The `core.fsyncObjectFiles` setting was initially added based on integrity
+assumptions that early (pre-ext4) versions of Linux's "ext"
+filesystems provided.
++
+I.e. that a write of file `A` without an `fsync()` followed by a write
+of file `B` with `fsync()` would implicitly guarantee that `A` would
+be `fsync()`'d by calling `fsync()` on `B`. This assumption is *not*
+backed up by any standard (e.g. POSIX), but worked in practice on some
+Linux setups.
++
+Nowadays you almost certainly want to use
+`core.fsync=loose-object` instead in combination with
+`core.fsyncMethod=batch`, and possibly with
+`fsyncMethod.batch.quarantine=true`, see above. On modern OSes (Linux,
+macOS, Windows) that gives you most of the performance benefit of
+`core.fsyncObjectFiles=false` with all of the safety of the old
+`core.fsyncObjectFiles=true`.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'