From patchwork Fri May 20 23:17:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12857536 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62CEFC433EF for ; Fri, 20 May 2022 23:17:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354028AbiETXRy (ORCPT ); Fri, 20 May 2022 19:17:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49256 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354029AbiETXRm (ORCPT ); Fri, 20 May 2022 19:17:42 -0400 Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com [IPv6:2607:f8b0:4864:20::82e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ED3131957A3 for ; Fri, 20 May 2022 16:17:34 -0700 (PDT) Received: by mail-qt1-x82e.google.com with SMTP id hh4so8859421qtb.10 for ; Fri, 20 May 2022 16:17:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=J9A7QUwSMtUTE9V9ep9knuOFm86Sksaj+/ci+zTakCc=; b=ELj77lHKB+eEgmx5zcJ5UahCNJWNcWgulpiVmFlA/hOLicfeX9Olg/zdI5jkegbdaj nHgqhSWw3tdkIKq0fvOxtj62WF9qa5c1LXWUaFJytvrnLd1eDY6laZfOiPgVczZGYIJx mBe27WaiAbcn6EGYfzXYFUb2ntKZ4D1sYj0dlZdxGVNhFoQe4twAVBcd3RmylJvJZtjg KzqvLY/WE2X4d0pYpHqJtm2GBZ82xGkCnQqXX+fLmTVTZKiuP5HbtOo+Bv/HfUsJb7gr +9k95etOOK8NZam9bDaVl9JscUF1Pvdqmr9mzTeqhJ2eCva0zwI4msFTTe02wQXa6u1r N+aQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=J9A7QUwSMtUTE9V9ep9knuOFm86Sksaj+/ci+zTakCc=; b=Tg61Sr5OehQcmHfJ03uPKsisemsOtEXru40UA4Sv7OMcp5+f5s9XDgRkqUbgrw6ui4 NQ2yyM52RdX4MYuylTQtfHZNZq2UNk19kE5n5kiww4JWlozHCZdqSiioZTvVXPwykRKB jIJ3Yo7Cjf1DOLNwWgABFT0tp1L4YKfDGim22lvcWrKLp8GkRihfiiG9QB2Njh1Vt6qF Aq6CxeoPD04qNUsHyH0VXGKA6fd82e9FkTDfmc+c4pkwqJwB+qX66Q+BWjntLveyCOkF t7OiLU+9m6B4IHmpeuHY8/JOMx+ztnVjl2D08iNgD1az5KlB6xbOLeCQ9wWh8qLCVF4h lK6Q== X-Gm-Message-State: AOAM531cdCPbuGyuwi+Cnq4F6yfBRJpxnxmE/qBXf7by/PCmPeM1RJcA iPOJGZJ7Fe9kVKDjeo7XGp2LMbHQw57QY5fI X-Google-Smtp-Source: ABdhPJzd+7Kz+CYzjW9B1wY+YC70u3l3HLo1QWFY+wY5RoFrN/XUJ+H/XDkz62CR2mn5lPOzQsuOqw== X-Received: by 2002:a05:622a:651:b0:2f3:b1b2:4702 with SMTP id a17-20020a05622a065100b002f3b1b24702mr9487320qtb.239.1653088653691; Fri, 20 May 2022 16:17:33 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id u20-20020a05620a121400b0069fc13ce223sm430584qkj.84.2022.05.20.16.17.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 May 2022 16:17:33 -0700 (PDT) Date: Fri, 20 May 2022 19:17:32 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu Subject: [PATCH v5 01/17] Documentation/technical: add cruft-packs.txt Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Create a technical document to explain cruft packs. It contains a brief overview of the problem, some background, details on the implementation, and a couple of alternative approaches not considered here. Signed-off-by: Taylor Blau --- Documentation/Makefile | 1 + Documentation/technical/cruft-packs.txt | 123 ++++++++++++++++++++++++ 2 files changed, 124 insertions(+) create mode 100644 Documentation/technical/cruft-packs.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index adb2f1b50a..2faffb52ab 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -94,6 +94,7 @@ TECH_DOCS += MyFirstContribution TECH_DOCS += MyFirstObjectWalk TECH_DOCS += SubmittingPatches TECH_DOCS += technical/bundle-format +TECH_DOCS += technical/cruft-packs TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol TECH_DOCS += technical/index-format diff --git a/Documentation/technical/cruft-packs.txt b/Documentation/technical/cruft-packs.txt new file mode 100644 index 0000000000..c0f583cd48 --- /dev/null +++ b/Documentation/technical/cruft-packs.txt @@ -0,0 +1,123 @@ += Cruft packs + +The cruft packs feature offer an alternative to Git's traditional mechanism of +removing unreachable objects. This document provides an overview of Git's +pruning mechanism, and how a cruft pack can be used instead to accomplish the +same. + +== Background + +To remove unreachable objects from your repository, Git offers `git repack -Ad` +(see linkgit:git-repack[1]). Quoting from the documentation: + +[quote] +[...] unreachable objects in a previous pack become loose, unpacked objects, +instead of being left in the old pack. [...] loose unreachable objects will be +pruned according to normal expiry rules with the next 'git gc' invocation. + +Unreachable objects aren't removed immediately, since doing so could race with +an incoming push which may reference an object which is about to be deleted. +Instead, those unreachable objects are stored as loose object and stay that way +until they are older than the expiration window, at which point they are removed +by linkgit:git-prune[1]. + +Git must store these unreachable objects loose in order to keep track of their +per-object mtimes. If these unreachable objects were written into one big pack, +then either freshening that pack (because an object contained within it was +re-written) or creating a new pack of unreachable objects would cause the pack's +mtime to get updated, and the objects within it would never leave the expiration +window. Instead, objects are stored loose in order to keep track of the +individual object mtimes and avoid a situation where all cruft objects are +freshened at once. + +This can lead to undesirable situations when a repository contains many +unreachable objects which have not yet left the grace period. Having large +directories in the shards of `.git/objects` can lead to decreased performance in +the repository. But given enough unreachable objects, this can lead to inode +starvation and degrade the performance of the whole system. Since we +can never pack those objects, these repositories often take up a large amount of +disk space, since we can only zlib compress them, but not store them in delta +chains. + +== Cruft packs + +A cruft pack eliminates the need for storing unreachable objects in a loose +state by including the per-object mtimes in a separate file alongside a single +pack containing all loose objects. + +A cruft pack is written by `git repack --cruft` when generating a new pack. +linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft` +is a classic all-into-one repack, meaning that everything in the resulting pack is +reachable, and everything else is unreachable. Once written, the `--cruft` +option instructs `git repack` to generate another pack containing only objects +not packed in the previous step (which equates to packing all unreachable +objects together). This progresses as follows: + + 1. Enumerate every object, marking any object which is (a) not contained in a + kept-pack, and (b) whose mtime is within the grace period as a traversal + tip. + + 2. Perform a reachability traversal based on the tips gathered in the previous + step, adding every object along the way to the pack. + + 3. Write the pack out, along with a `.mtimes` file that records the per-object + timestamps. + +This mode is invoked internally by linkgit:git-repack[1] when instructed to +write a cruft pack. Crucially, the set of in-core kept packs is exactly the set +of packs which will not be deleted by the repack; in other words, they contain +all of the repository's reachable objects. + +When a repository already has a cruft pack, `git repack --cruft` typically only +adds objects to it. An exception to this is when `git repack` is given the +`--cruft-expiration` option, which allows the generated cruft pack to omit +expired objects instead of waiting for linkgit:git-gc[1] to expire those objects +later on. + +It is linkgit:git-gc[1] that is typically responsible for removing expired +unreachable objects. + +== Caution for mixed-version environments + +Repositories that have cruft packs in them will continue to work with any older +version of Git. Note, however, that previous versions of Git which do not +understand the `.mtimes` file will use the cruft pack's mtime as the mtime for +all of the objects in it. In other words, do not expect older (pre-cruft pack) +versions of Git to interpret or even read the contents of the `.mtimes` file. + +Note that having mixed versions of Git GC-ing the same repository can lead to +unreachable objects never being completely pruned. This can happen under the +following circumstances: + + - An older version of Git running GC explodes the contents of an existing + cruft pack loose, using the cruft pack's mtime. + - A newer version running GC collects those loose objects into a cruft pack, + where the .mtime file reflects the loose object's actual mtimes, but the + cruft pack mtime is "now". + +Repeating this process will lead to unreachable objects not getting pruned as a +result of repeatedly resetting the objects' mtimes to the present time. + +If you are GC-ing repositories in a mixed version environment, consider omitting +the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and +leaving the `gc.cruftPacks` configuration unset until all writers understand +cruft packs. + +== Alternatives + +Notable alternatives to this design include: + + - The location of the per-object mtime data, and + - Storing unreachable objects in multiple cruft packs. + +On the location of mtime data, a new auxiliary file tied to the pack was chosen +to avoid complicating the `.idx` format. If the `.idx` format were ever to gain +support for optional chunks of data, it may make sense to consolidate the +`.mtimes` format into the `.idx` itself. + +Storing unreachable objects among multiple cruft packs (e.g., creating a new +cruft pack during each repacking operation including only unreachable objects +which aren't already stored in an earlier cruft pack) is significantly more +complicated to construct, and so aren't pursued here. The obvious drawback to +the current implementation is that the entire cruft pack must be re-written from +scratch. From patchwork Fri May 20 23:17:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12857542 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3C73C433F5 for ; Fri, 20 May 2022 23:18:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354037AbiETXSH (ORCPT ); Fri, 20 May 2022 19:18:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49616 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347331AbiETXRq (ORCPT ); Fri, 20 May 2022 19:17:46 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ADF02197F4F for ; Fri, 20 May 2022 16:17:37 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id x7so5667777qta.6 for ; Fri, 20 May 2022 16:17:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=vpVrKS7+fN+BtOnK0GINvFj8vmLh0kOEvIdfu03/qZ8=; b=EYMW3gNAk23GrJx1zqzxJNuAfppbKPuid/RbDnXPqRXRthlqiEbh+XNi25tXJM90fc 60zA6tNrBfHsuFZnSJ54EjvK7pp6SOp+8ReOzjRODzVnCZHNliAO299RdBXwEWguCWnf At9epmQWAqV0QHJwPCKG2s5G7Ac1lYV2Vn9g498RIcRFocXTyxkoLPSQEZSd1PGqPWhw EsFGSkehICCoYZaHeIGuvznmK2PW2mZtEcDEmvM/FDddAUABzSLwfxTmNYlrTs6yW3pb 8S1nVQb2JLRbrTkXBMradchLsiNK+Xh3yWPmfIl582/K0rH7Sk5mLxWuViGAlGqKnkZD dhDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=vpVrKS7+fN+BtOnK0GINvFj8vmLh0kOEvIdfu03/qZ8=; b=YHYQKLfXyMhqX77t/LiUW9dqrZhVfsaYkfPGL9hKrqi+Io5iAH/+i2g/9tqMyVDxQe qQcapj4qaDjsVdPvWNJRpLobPVNgpzxZxbFmAbMUyWEpI89TPw7FeRS20LExOZi60Yh5 5SG3Ua23YiB/nAU2gc/odP6DfExYrX8UdyXzMT1c5Fua4T8Id7pZGFJQkiLzyqo2R2j6 G9la6nHbLZ58R7MZDmHtjy77wWg2xVrFPkk4ZqC6XHEWYkOEtbh01HlOA7MV/Uw4ybYf NEPswuYYcMCanwxogPaKugqWi3H0PHWpakcFy18TsOEc5yd3lTcBckd4ZCQQit2NT0YV PZDQ== X-Gm-Message-State: AOAM5336kc7WtzTvgeQrL9Tdp3z35ItKM9chVC04AwC5zm0wYl+ghm5h ya31i+6/gUJchobekeXegvqjcZ9mCp6fxv08 X-Google-Smtp-Source: ABdhPJxe+RRe6hX4MzEMvb1NPyKeqUlWCy1+2C+CqKnQRilvZee8tCmZuXZUORCqrdnSBThCFIQ/VA== X-Received: by 2002:ac8:5f47:0:b0:2f9:2bf3:a540 with SMTP id y7-20020ac85f47000000b002f92bf3a540mr341487qta.254.1653088656589; Fri, 20 May 2022 16:17:36 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id br11-20020a05622a1e0b00b002f39b99f6acsm478803qtb.70.2022.05.20.16.17.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 May 2022 16:17:36 -0700 (PDT) Date: Fri, 20 May 2022 19:17:35 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu Subject: [PATCH v5 02/17] pack-mtimes: support reading .mtimes files Message-ID: <91a9d21b0b7d99023083c0bbb6f91ccdc1782736.1653088640.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To store the individual mtimes of objects in a cruft pack, introduce a new `.mtimes` format that can optionally accompany a single pack in the repository. The format is defined in Documentation/technical/pack-format.txt, and stores a 4-byte network order timestamp for each object in name (index) order. This patch prepares for cruft packs by defining the `.mtimes` format, and introducing a basic API that callers can use to read out individual mtimes. Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau --- Documentation/technical/pack-format.txt | 19 ++++ Makefile | 1 + builtin/repack.c | 1 + object-store.h | 5 +- pack-mtimes.c | 126 ++++++++++++++++++++++++ pack-mtimes.h | 15 +++ packfile.c | 19 +++- 7 files changed, 183 insertions(+), 3 deletions(-) create mode 100644 pack-mtimes.c create mode 100644 pack-mtimes.h diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 6d3efb7d16..b520aa9c45 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -294,6 +294,25 @@ Pack file entry: <+ All 4-byte numbers are in network order. +== pack-*.mtimes files have the format: + +All 4-byte numbers are in network byte order. + + - A 4-byte magic number '0x4d544d45' ('MTME'). + + - A 4-byte version identifier (= 1). + + - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256). + + - A table of 4-byte unsigned integers. The ith value is the + modification time (mtime) of the ith object in the corresponding + pack by lexicographic (index) order. The mtimes count standard + epoch seconds. + + - A trailer, containing a checksum of the corresponding packfile, + and a checksum of all of the above (each having length according + to the specified hash function). + == multi-pack-index (MIDX) files have the following format: The multi-pack-index files refer to multiple pack-files and loose objects. diff --git a/Makefile b/Makefile index 61aadf3ce8..a299580b7c 100644 --- a/Makefile +++ b/Makefile @@ -993,6 +993,7 @@ LIB_OBJS += oidtree.o LIB_OBJS += pack-bitmap-write.o LIB_OBJS += pack-bitmap.o LIB_OBJS += pack-check.o +LIB_OBJS += pack-mtimes.o LIB_OBJS += pack-objects.o LIB_OBJS += pack-revindex.o LIB_OBJS += pack-write.o diff --git a/builtin/repack.c b/builtin/repack.c index d1a563d5b6..e7a3920c6d 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -217,6 +217,7 @@ static struct { } exts[] = { {".pack"}, {".rev", 1}, + {".mtimes", 1}, {".bitmap", 1}, {".promisor", 1}, {".idx"}, diff --git a/object-store.h b/object-store.h index 53996018c1..2c4671ed7a 100644 --- a/object-store.h +++ b/object-store.h @@ -115,12 +115,15 @@ struct packed_git { freshened:1, do_not_close:1, pack_promisor:1, - multi_pack_index:1; + multi_pack_index:1, + is_cruft:1; unsigned char hash[GIT_MAX_RAWSZ]; struct revindex_entry *revindex; const uint32_t *revindex_data; const uint32_t *revindex_map; size_t revindex_size; + const uint32_t *mtimes_map; + size_t mtimes_size; /* something like ".git/objects/pack/xxxxx.pack" */ char pack_name[FLEX_ARRAY]; /* more */ }; diff --git a/pack-mtimes.c b/pack-mtimes.c new file mode 100644 index 0000000000..46ad584af1 --- /dev/null +++ b/pack-mtimes.c @@ -0,0 +1,126 @@ +#include "pack-mtimes.h" +#include "object-store.h" +#include "packfile.h" + +static char *pack_mtimes_filename(struct packed_git *p) +{ + size_t len; + if (!strip_suffix(p->pack_name, ".pack", &len)) + BUG("pack_name does not end in .pack"); + /* NEEDSWORK: this could reuse code from pack-revindex.c. */ + return xstrfmt("%.*s.mtimes", (int)len, p->pack_name); +} + +#define MTIMES_HEADER_SIZE (12) +#define MTIMES_MIN_SIZE (MTIMES_HEADER_SIZE + (2 * the_hash_algo->rawsz)) + +struct mtimes_header { + uint32_t signature; + uint32_t version; + uint32_t hash_id; +}; + +static int load_pack_mtimes_file(char *mtimes_file, + uint32_t num_objects, + const uint32_t **data_p, size_t *len_p) +{ + int fd, ret = 0; + struct stat st; + void *data = NULL; + size_t mtimes_size; + struct mtimes_header header; + uint32_t *hdr; + + fd = git_open(mtimes_file); + + if (fd < 0) { + ret = -1; + goto cleanup; + } + if (fstat(fd, &st)) { + ret = error_errno(_("failed to read %s"), mtimes_file); + goto cleanup; + } + + mtimes_size = xsize_t(st.st_size); + + if (mtimes_size < MTIMES_MIN_SIZE) { + ret = error(_("mtimes file %s is too small"), mtimes_file); + goto cleanup; + } + + if (mtimes_size - MTIMES_MIN_SIZE != st_mult(sizeof(uint32_t), num_objects)) { + ret = error(_("mtimes file %s is corrupt"), mtimes_file); + goto cleanup; + } + + data = hdr = xmmap(NULL, mtimes_size, PROT_READ, MAP_PRIVATE, fd, 0); + + header.signature = ntohl(hdr[0]); + header.version = ntohl(hdr[1]); + header.hash_id = ntohl(hdr[2]); + + if (header.signature != MTIMES_SIGNATURE) { + ret = error(_("mtimes file %s has unknown signature"), mtimes_file); + goto cleanup; + } + + if (header.version != 1) { + ret = error(_("mtimes file %s has unsupported version %"PRIu32), + mtimes_file, header.version); + goto cleanup; + } + + if (!(header.hash_id == 1 || header.hash_id == 2)) { + ret = error(_("mtimes file %s has unsupported hash id %"PRIu32), + mtimes_file, header.hash_id); + goto cleanup; + } + +cleanup: + if (ret) { + if (data) + munmap(data, mtimes_size); + } else { + *len_p = mtimes_size; + *data_p = (const uint32_t *)data; + } + + close(fd); + return ret; +} + +int load_pack_mtimes(struct packed_git *p) +{ + char *mtimes_name = NULL; + int ret = 0; + + if (!p->is_cruft) + return ret; /* not a cruft pack */ + if (p->mtimes_map) + return ret; /* already loaded */ + + ret = open_pack_index(p); + if (ret < 0) + goto cleanup; + + mtimes_name = pack_mtimes_filename(p); + ret = load_pack_mtimes_file(mtimes_name, + p->num_objects, + &p->mtimes_map, + &p->mtimes_size); +cleanup: + free(mtimes_name); + return ret; +} + +uint32_t nth_packed_mtime(struct packed_git *p, uint32_t pos) +{ + if (!p->mtimes_map) + BUG("pack .mtimes file not loaded for %s", p->pack_name); + if (p->num_objects <= pos) + BUG("pack .mtimes out-of-bounds (%"PRIu32" vs %"PRIu32")", + pos, p->num_objects); + + return get_be32(p->mtimes_map + pos + 3); +} diff --git a/pack-mtimes.h b/pack-mtimes.h new file mode 100644 index 0000000000..38ddb9f893 --- /dev/null +++ b/pack-mtimes.h @@ -0,0 +1,15 @@ +#ifndef PACK_MTIMES_H +#define PACK_MTIMES_H + +#include "git-compat-util.h" + +#define MTIMES_SIGNATURE 0x4d544d45 /* "MTME" */ +#define MTIMES_VERSION 1 + +struct packed_git; + +int load_pack_mtimes(struct packed_git *p); + +uint32_t nth_packed_mtime(struct packed_git *p, uint32_t pos); + +#endif diff --git a/packfile.c b/packfile.c index 835b2d2716..fc0245fbab 100644 --- a/packfile.c +++ b/packfile.c @@ -334,12 +334,22 @@ static void close_pack_revindex(struct packed_git *p) p->revindex_data = NULL; } +static void close_pack_mtimes(struct packed_git *p) +{ + if (!p->mtimes_map) + return; + + munmap((void *)p->mtimes_map, p->mtimes_size); + p->mtimes_map = NULL; +} + void close_pack(struct packed_git *p) { close_pack_windows(p); close_pack_fd(p); close_pack_index(p); close_pack_revindex(p); + close_pack_mtimes(p); oidset_clear(&p->bad_objects); } @@ -363,7 +373,7 @@ void close_object_store(struct raw_object_store *o) void unlink_pack_path(const char *pack_name, int force_delete) { - static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor"}; + static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"}; int i; struct strbuf buf = STRBUF_INIT; size_t plen; @@ -718,6 +728,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local) if (!access(p->pack_name, F_OK)) p->pack_promisor = 1; + xsnprintf(p->pack_name + path_len, alloc - path_len, ".mtimes"); + if (!access(p->pack_name, F_OK)) + p->is_cruft = 1; + xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack"); if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) { free(p); @@ -869,7 +883,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len, ends_with(file_name, ".pack") || ends_with(file_name, ".bitmap") || ends_with(file_name, ".keep") || - ends_with(file_name, ".promisor")) + ends_with(file_name, ".promisor") || + ends_with(file_name, ".mtimes")) string_list_append(data->garbage, full_name); else report_garbage(PACKDIR_FILE_GARBAGE, full_name); From patchwork Fri May 20 23:17:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12857539 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23048C433EF for ; Fri, 20 May 2022 23:18:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354043AbiETXSA (ORCPT ); Fri, 20 May 2022 19:18:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50226 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354032AbiETXRw (ORCPT ); Fri, 20 May 2022 19:17:52 -0400 Received: from mail-qk1-x733.google.com (mail-qk1-x733.google.com [IPv6:2607:f8b0:4864:20::733]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A6CEA195906 for ; Fri, 20 May 2022 16:17:40 -0700 (PDT) Received: by mail-qk1-x733.google.com with SMTP id i68so7659270qke.11 for ; Fri, 20 May 2022 16:17:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Bo+ivTzf0EDZiLZ3Ib1QjcIVdR0ks/eiqC5eD2VZ25s=; b=A080MEWTWPJ3N5/sT9LA1gn8zOWtE+1l6IjWEj3pJxKWFfkY0iiZOogXMmTeE12GG1 s5zkArxW+AMSGdG/RpveuArC//JnotvxlViFZNZsGLZbnra8XOSm7jyu+EAIUtMAM4b1 hwLIH7+rmxrpbO6TvOPAJJbONfLh2lfl7Z/XmAE/MFd7/HKxyQkpFAyygYNcIyRDKqSz LbVeV/N0l2GbTKc2GmnFEtXCrdIXHBNiQwoWuRpL7P92tru1BV1byvw434YbjN6xpTBg rDTfLdM2f8wPJZ+HHQZfRVZEL3N6ASIgOd5DsDJ/1rrtyKXKIXZVuQNWZcELrvTg30zf xe+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Bo+ivTzf0EDZiLZ3Ib1QjcIVdR0ks/eiqC5eD2VZ25s=; b=Y4cn/JXruJVhRplp+4haOX5oBqA5qbv5L/yQ+TzXjQh/TPo7eaIj/4qGiaFbB2y+Iq 5veuF5W/flZ/oaxES6WP7DALydcd2aYP3NICjq/lRR90g9/54EKfpuWZHF6elazl7pdb pkFgU0XI224XGpGhS9FD2B7MMx9FkmZ2V0h2m7IOwIfc3e1z21pB0g2XYaH4vN4xRYQa EtPKhIzD2Q1yu1yavqMkwNm1Mm4bfYEk6y0p4gf6TAuWaXgngg6MmS8GkmY3K7kywBM7 +kj1VcF5J3Rd/Cp/j8oNbNb8SJZqyYUfDGFoTDWo9h4cvQYqhBcXmhg+f5Pvpn7s/3Dk hubQ== X-Gm-Message-State: AOAM531Fk4EirwaupdPIuvddFCpcc8NKAbD5dAmDM9jneMLSBrukH48x fb57dRogmme+V0gwSSaltdVNqOKtZf/EeFG+ X-Google-Smtp-Source: ABdhPJxtjOAUu666LbHoEtpRZ3xh5UoUzmO2Enj1XOI5PQlRRCqfTBSw3HINZzY/stm5x9/sFTEx9Q== X-Received: by 2002:a05:620a:424e:b0:67e:4c1b:c214 with SMTP id w14-20020a05620a424e00b0067e4c1bc214mr7853173qko.651.1653088659660; Fri, 20 May 2022 16:17:39 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id d25-20020ac84e39000000b002f9273e3980sm437927qtw.77.2022.05.20.16.17.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 May 2022 16:17:39 -0700 (PDT) Date: Fri, 20 May 2022 19:17:38 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu Subject: [PATCH v5 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Message-ID: <67c4e7209d510c09b5ddf018204f84ad0d0ebf9b.1653088640.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org This structure will be used to communicate the per-object mtimes when writing a cruft pack. Here, we need the full packing_data structure because the mtime information is stored in an array there, not on the individual object_entry's themselves (to avoid paying the overhead in structure width for operations which do not generate a cruft pack). We haven't passed this information down before because one of the two callers (in bulk-checkin.c) does not have a packing_data structure at all. In that case (where no cruft pack will be generated), NULL is passed instead. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 3 ++- bulk-checkin.c | 2 +- pack-write.c | 1 + pack.h | 3 +++ 4 files changed, 7 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 014dcd4bc9..6ac927047c 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1262,7 +1262,8 @@ static void write_pack_file(void) stage_tmp_packfiles(&tmpname, pack_tmp_name, written_list, nr_written, - &pack_idx_opts, hash, &idx_tmp_name); + &to_pack, &pack_idx_opts, hash, + &idx_tmp_name); if (write_bitmap_index) { size_t tmpname_len = tmpname.len; diff --git a/bulk-checkin.c b/bulk-checkin.c index 6d6c37171c..e988a388b6 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -33,7 +33,7 @@ static void finish_tmp_packfile(struct strbuf *basename, char *idx_tmp_name = NULL; stage_tmp_packfiles(basename, pack_tmp_name, written_list, nr_written, - pack_idx_opts, hash, &idx_tmp_name); + NULL, pack_idx_opts, hash, &idx_tmp_name); rename_tmp_packfile_idx(basename, &idx_tmp_name); free(idx_tmp_name); diff --git a/pack-write.c b/pack-write.c index 51812cb129..a2adc565f4 100644 --- a/pack-write.c +++ b/pack-write.c @@ -484,6 +484,7 @@ void stage_tmp_packfiles(struct strbuf *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, + struct packing_data *to_pack, struct pack_idx_option *pack_idx_opts, unsigned char hash[], char **idx_tmp_name) diff --git a/pack.h b/pack.h index b22bfc4a18..fd27cfdfd7 100644 --- a/pack.h +++ b/pack.h @@ -109,11 +109,14 @@ int encode_in_pack_object_header(unsigned char *hdr, int hdr_len, #define PH_ERROR_PROTOCOL (-3) int read_pack_header(int fd, struct pack_header *); +struct packing_data; + struct hashfile *create_tmp_packfile(char **pack_tmp_name); void stage_tmp_packfiles(struct strbuf *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, + struct packing_data *to_pack, struct pack_idx_option *pack_idx_opts, unsigned char hash[], char **idx_tmp_name); From patchwork Fri May 20 23:17:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12857538 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42D46C433F5 for ; Fri, 20 May 2022 23:18:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354035AbiETXR5 (ORCPT ); Fri, 20 May 2022 19:17:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50228 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354045AbiETXRw (ORCPT ); Fri, 20 May 2022 19:17:52 -0400 Received: from mail-qt1-x833.google.com (mail-qt1-x833.google.com [IPv6:2607:f8b0:4864:20::833]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C7B51A15F8 for ; Fri, 20 May 2022 16:17:43 -0700 (PDT) Received: by mail-qt1-x833.google.com with SMTP id y15so4592034qtx.4 for ; Fri, 20 May 2022 16:17:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=hbjZya/lcmOcKNQZnKnTdQ7PP9mvuviLGWXtq/9S3xc=; b=uSqBhBuH7JCBkN6L4sSNHIAHExllcITXwU1oFmOm3qglfoec2weHLG9w96Z+RYBgzO pip+UPhntj160V6ePlXp90ohb1ZqEDU8WeSnWU5FT55B9RNAZ4Oai3MNFWoHuUKePr/A N+piSEV/ZEx56AVwgHrjujVNa44OvTZoymf+M8NP5SLw7DjoCrVVFrTFGEY6ima7UPK+ 9Ro/T0XfzGOiAquKH2dn1agz02BqnKuYZFGMB19DHzvtPN1RI1iGLKTZroSkJRgibISd on96JcEqWQu4MeTOhiVBcHKaVw32o0ft9rNiSz80WESidvl2uymTE1jVpxylm4x5qFqU x5oQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=hbjZya/lcmOcKNQZnKnTdQ7PP9mvuviLGWXtq/9S3xc=; b=m0NpooJw2BCP+Q5oO5ElZhJ262sf98QQuDXC5hY/v6TY0wToPP2OePzXI9y4t59Irk lpObqx5F06q+ZYAu9RkjkvvD2HXv4NJsB1AkU56rwFI7k/sg2MU7jBIY+4ovXMn55NIC RYftkMeZg/DUoAEXYKsBz5BUnFPH6R7dZQScleMpk707P3A0PVuNwFUnnDnlUz0a/HHb l9T/iqDBSUkk68KpyLz1xWh1+1jQmyxHXKR5M1i6fb00EwZWzjuoibKuAf2KZJBe+/Rw Fa1cXinGM0HfJMDp9I+TKo8vy2DZyzbdxQNCPBxnseEX8aKImIouj8cgsBxdgg2aRntI cDwQ== X-Gm-Message-State: AOAM532qRD9ZbQuFcZHBJQqmFydw54Y3waWskyz0z4mfwQfNhqW73kor Rggu6JPS/zKQUYcgmB7/t/PQF2zFkAkicc7p X-Google-Smtp-Source: ABdhPJzknmRkgxoSy5RJQQf4OOYQDnu7KAiT2Fb5db3I4NENKwjKYKwMIJPwtBtud4B5CypVbfjd1w== X-Received: by 2002:a05:622a:1489:b0:2f3:e404:f707 with SMTP id t9-20020a05622a148900b002f3e404f707mr9395922qtx.414.1653088662391; Fri, 20 May 2022 16:17:42 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 10-20020ac84e8a000000b002f39b99f69asm555626qtp.52.2022.05.20.16.17.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 May 2022 16:17:42 -0700 (PDT) Date: Fri, 20 May 2022 19:17:41 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu Subject: [PATCH v5 04/17] chunk-format.h: extract oid_version() Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org There are three definitions of an identical function which converts `the_hash_algo` into either 1 (for SHA-1) or 2 (for SHA-256). There is a copy of this function for writing both the commit-graph and multi-pack-index file, and another inline definition used to write the .rev header. Consolidate these into a single definition in chunk-format.h. It's not clear that this is the best header to define this function in, but it should do for now. (Worth noting, the .rev caller expects a 4-byte unsigned, but the other two callers work with a single unsigned byte. The consolidated version uses the latter type, and lets the compiler widen it when required). Another caller will be added in a subsequent patch. Signed-off-by: Taylor Blau --- chunk-format.c | 12 ++++++++++++ chunk-format.h | 3 +++ commit-graph.c | 18 +++--------------- midx.c | 18 +++--------------- pack-write.c | 15 ++------------- 5 files changed, 23 insertions(+), 43 deletions(-) diff --git a/chunk-format.c b/chunk-format.c index 1c3dca62e2..0275b74a89 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -181,3 +181,15 @@ int read_chunk(struct chunkfile *cf, return CHUNK_NOT_FOUND; } + +uint8_t oid_version(const struct git_hash_algo *algop) +{ + switch (hash_algo_by_ptr(algop)) { + case GIT_HASH_SHA1: + return 1; + case GIT_HASH_SHA256: + return 2; + default: + die(_("invalid hash version")); + } +} diff --git a/chunk-format.h b/chunk-format.h index 9ccbe00377..7885aa0848 100644 --- a/chunk-format.h +++ b/chunk-format.h @@ -2,6 +2,7 @@ #define CHUNK_FORMAT_H #include "git-compat-util.h" +#include "hash.h" struct hashfile; struct chunkfile; @@ -65,4 +66,6 @@ int read_chunk(struct chunkfile *cf, chunk_read_fn fn, void *data); +uint8_t oid_version(const struct git_hash_algo *algop); + #endif diff --git a/commit-graph.c b/commit-graph.c index 06107beedc..066d82ed6a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -193,18 +193,6 @@ char *get_commit_graph_chain_filename(struct object_directory *odb) return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path); } -static uint8_t oid_version(void) -{ - switch (hash_algo_by_ptr(the_hash_algo)) { - case GIT_HASH_SHA1: - return 1; - case GIT_HASH_SHA256: - return 2; - default: - die(_("invalid hash version")); - } -} - static struct commit_graph *alloc_commit_graph(void) { struct commit_graph *g = xcalloc(1, sizeof(*g)); @@ -365,9 +353,9 @@ struct commit_graph *parse_commit_graph(struct repository *r, } hash_version = *(unsigned char*)(data + 5); - if (hash_version != oid_version()) { + if (hash_version != oid_version(the_hash_algo)) { error(_("commit-graph hash version %X does not match version %X"), - hash_version, oid_version()); + hash_version, oid_version(the_hash_algo)); return NULL; } @@ -1924,7 +1912,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) hashwrite_be32(f, GRAPH_SIGNATURE); hashwrite_u8(f, GRAPH_VERSION); - hashwrite_u8(f, oid_version()); + hashwrite_u8(f, oid_version(the_hash_algo)); hashwrite_u8(f, get_num_chunks(cf)); hashwrite_u8(f, ctx->num_commit_graphs_after - 1); diff --git a/midx.c b/midx.c index 3db0e47735..c617c51cd0 100644 --- a/midx.c +++ b/midx.c @@ -41,18 +41,6 @@ #define PACK_EXPIRED UINT_MAX -static uint8_t oid_version(void) -{ - switch (hash_algo_by_ptr(the_hash_algo)) { - case GIT_HASH_SHA1: - return 1; - case GIT_HASH_SHA256: - return 2; - default: - die(_("invalid hash version")); - } -} - const unsigned char *get_midx_checksum(struct multi_pack_index *m) { return m->data + m->data_len - the_hash_algo->rawsz; @@ -134,9 +122,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local m->version); hash_version = m->data[MIDX_BYTE_HASH_VERSION]; - if (hash_version != oid_version()) { + if (hash_version != oid_version(the_hash_algo)) { error(_("multi-pack-index hash version %u does not match version %u"), - hash_version, oid_version()); + hash_version, oid_version(the_hash_algo)); goto cleanup_fail; } m->hash_len = the_hash_algo->rawsz; @@ -420,7 +408,7 @@ static size_t write_midx_header(struct hashfile *f, { hashwrite_be32(f, MIDX_SIGNATURE); hashwrite_u8(f, MIDX_VERSION); - hashwrite_u8(f, oid_version()); + hashwrite_u8(f, oid_version(the_hash_algo)); hashwrite_u8(f, num_chunks); hashwrite_u8(f, 0); /* unused */ hashwrite_be32(f, num_packs); diff --git a/pack-write.c b/pack-write.c index a2adc565f4..27b171e440 100644 --- a/pack-write.c +++ b/pack-write.c @@ -2,6 +2,7 @@ #include "pack.h" #include "csum-file.h" #include "remote.h" +#include "chunk-format.h" void reset_pack_idx_option(struct pack_idx_option *opts) { @@ -181,21 +182,9 @@ static int pack_order_cmp(const void *va, const void *vb, void *ctx) static void write_rev_header(struct hashfile *f) { - uint32_t oid_version; - switch (hash_algo_by_ptr(the_hash_algo)) { - case GIT_HASH_SHA1: - oid_version = 1; - break; - case GIT_HASH_SHA256: - oid_version = 2; - break; - default: - die("write_rev_header: unknown hash version"); - } - hashwrite_be32(f, RIDX_SIGNATURE); hashwrite_be32(f, RIDX_VERSION); - hashwrite_be32(f, oid_version); + hashwrite_be32(f, oid_version(the_hash_algo)); } static void write_rev_index_positions(struct hashfile *f, From patchwork Fri May 20 23:17:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12857537 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58C30C433F5 for ; Fri, 20 May 2022 23:17:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235516AbiETXR4 (ORCPT ); Fri, 20 May 2022 19:17:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49268 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354012AbiETXRx (ORCPT ); Fri, 20 May 2022 19:17:53 -0400 Received: from mail-qk1-x72d.google.com (mail-qk1-x72d.google.com [IPv6:2607:f8b0:4864:20::72d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E2CF1A29C6 for ; Fri, 20 May 2022 16:17:46 -0700 (PDT) Received: by mail-qk1-x72d.google.com with SMTP id m1so8901686qkn.10 for ; Fri, 20 May 2022 16:17:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=P+Hq+TxYueHz3mILps4sx45Lxjp1F6wSHdMmiOm9nUM=; b=OexmsDDGsXEUnNgmgmycqB9lL0pokIJSgH7OzBlX4J1pXix/TRM5gw//WJOmg6AfFH HTMjTcf3xR1NYfooydsvHp4wu9bKwQLllWaCG2jttJSO1uj07jccJhWbWHnDjSbxkuVk RwAabA6H51+3sadKvEtHVTjxOZEjmKmj6U8vFtKIuSxUYw1vXvFc2vGYT/q9hUaZNzf3 GRFS8ceYI5Pwz/psVgztrHgzqpynQLKIG2jAvnPflNS3fc1VGlPyZhbV9h2c56g91zvl JZdbdIkbKewzrcQaj2FGxa6opKSBnYzvNBNi52SyPTRNZEj7ULp+oOpOajJvUtL7nYeH SFwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=P+Hq+TxYueHz3mILps4sx45Lxjp1F6wSHdMmiOm9nUM=; b=hzjLqlW4Ens5MVobW6bQWwCWB/Z9mctXoW2XSSPJoiAWE9alxS9/dgrd5ea+M2ALFD cIiiuaBnO4VT0wE93Dl/Kcw0Lvj+AsYsIqqudTkpqIKRZdy6Ca5gyocS2Bbzrd0zPrnJ AX8G75EZh9KqCKIH3z5nHpAehVoEUDzDOcs6KKWQTNCogtc9ARRgaxLAvvoNkTSFxpZs pHhObrPJdmL4eeGbbPFj0zFav2u99apqpiLCoS48PurQO5CZ3d/MdAmNBJyprcIiGSV/ YjjUJGV1SB1ANkNjGVr1AogsGRI29ZqqGdutmRXhrnUp1+GesnGPSsxox5vJ7hUCxgpg 3wTw== X-Gm-Message-State: AOAM530KfDigVe7QzbhsMClUTKP8CQe2tiwYw8LbdLZA8atoVtMHLlIh 2P84GhOrKbMsGftXnW57P/CSBiX43GIJm0wI X-Google-Smtp-Source: ABdhPJyc01iCsWVxceh71ogFX0GCukbfj3g5bkaTCUKvDjUR/akjVtqJXzFXtFj+B0r5PLR1KvMV4w== X-Received: by 2002:a37:9c01:0:b0:6a3:51f9:10a7 with SMTP id f1-20020a379c01000000b006a351f910a7mr2567159qke.242.1653088665230; Fri, 20 May 2022 16:17:45 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id cj19-20020a05622a259300b002f39b99f6aasm471566qtb.68.2022.05.20.16.17.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 May 2022 16:17:44 -0700 (PDT) Date: Fri, 20 May 2022 19:17:43 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu Subject: [PATCH v5 05/17] pack-mtimes: support writing pack .mtimes files Message-ID: <788d1f96f22323f93dec8cf37385865984eb246b.1653088640.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Now that the `.mtimes` format is defined, supplement the pack-write API to be able to conditionally write an `.mtimes` file along with a pack by setting an additional flag and passing an oidmap that contains the timestamps corresponding to each object in the pack. Signed-off-by: Taylor Blau --- pack-objects.c | 6 ++++ pack-objects.h | 25 ++++++++++++++++ pack-write.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++ pack.h | 1 + 4 files changed, 109 insertions(+) diff --git a/pack-objects.c b/pack-objects.c index fe2a4eace9..272e8d4517 100644 --- a/pack-objects.c +++ b/pack-objects.c @@ -170,6 +170,9 @@ struct object_entry *packlist_alloc(struct packing_data *pdata, if (pdata->layer) REALLOC_ARRAY(pdata->layer, pdata->nr_alloc); + + if (pdata->cruft_mtime) + REALLOC_ARRAY(pdata->cruft_mtime, pdata->nr_alloc); } new_entry = pdata->objects + pdata->nr_objects++; @@ -198,6 +201,9 @@ struct object_entry *packlist_alloc(struct packing_data *pdata, if (pdata->layer) pdata->layer[pdata->nr_objects - 1] = 0; + if (pdata->cruft_mtime) + pdata->cruft_mtime[pdata->nr_objects - 1] = 0; + return new_entry; } diff --git a/pack-objects.h b/pack-objects.h index dca2351ef9..393b9db546 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -168,6 +168,14 @@ struct packing_data { /* delta islands */ unsigned int *tree_depth; unsigned char *layer; + + /* + * Used when writing cruft packs. + * + * Object mtimes are stored in pack order when writing, but + * written out in lexicographic (index) order. + */ + uint32_t *cruft_mtime; }; void prepare_packing_data(struct repository *r, struct packing_data *pdata); @@ -289,4 +297,21 @@ static inline void oe_set_layer(struct packing_data *pack, pack->layer[e - pack->objects] = layer; } +static inline uint32_t oe_cruft_mtime(struct packing_data *pack, + struct object_entry *e) +{ + if (!pack->cruft_mtime) + return 0; + return pack->cruft_mtime[e - pack->objects]; +} + +static inline void oe_set_cruft_mtime(struct packing_data *pack, + struct object_entry *e, + uint32_t mtime) +{ + if (!pack->cruft_mtime) + CALLOC_ARRAY(pack->cruft_mtime, pack->nr_alloc); + pack->cruft_mtime[e - pack->objects] = mtime; +} + #endif diff --git a/pack-write.c b/pack-write.c index 27b171e440..23c0342018 100644 --- a/pack-write.c +++ b/pack-write.c @@ -3,6 +3,10 @@ #include "csum-file.h" #include "remote.h" #include "chunk-format.h" +#include "pack-mtimes.h" +#include "oidmap.h" +#include "chunk-format.h" +#include "pack-objects.h" void reset_pack_idx_option(struct pack_idx_option *opts) { @@ -277,6 +281,70 @@ const char *write_rev_file_order(const char *rev_name, return rev_name; } +static void write_mtimes_header(struct hashfile *f) +{ + hashwrite_be32(f, MTIMES_SIGNATURE); + hashwrite_be32(f, MTIMES_VERSION); + hashwrite_be32(f, oid_version(the_hash_algo)); +} + +/* + * Writes the object mtimes of "objects" for use in a .mtimes file. + * Note that objects must be in lexicographic (index) order, which is + * the expected ordering of these values in the .mtimes file. + */ +static void write_mtimes_objects(struct hashfile *f, + struct packing_data *to_pack, + struct pack_idx_entry **objects, + uint32_t nr_objects) +{ + uint32_t i; + for (i = 0; i < nr_objects; i++) { + struct object_entry *e = (struct object_entry*)objects[i]; + hashwrite_be32(f, oe_cruft_mtime(to_pack, e)); + } +} + +static void write_mtimes_trailer(struct hashfile *f, const unsigned char *hash) +{ + hashwrite(f, hash, the_hash_algo->rawsz); +} + +static const char *write_mtimes_file(const char *mtimes_name, + struct packing_data *to_pack, + struct pack_idx_entry **objects, + uint32_t nr_objects, + const unsigned char *hash) +{ + struct hashfile *f; + int fd; + + if (!to_pack) + BUG("cannot call write_mtimes_file with NULL packing_data"); + + if (!mtimes_name) { + struct strbuf tmp_file = STRBUF_INIT; + fd = odb_mkstemp(&tmp_file, "pack/tmp_mtimes_XXXXXX"); + mtimes_name = strbuf_detach(&tmp_file, NULL); + } else { + unlink(mtimes_name); + fd = xopen(mtimes_name, O_CREAT|O_EXCL|O_WRONLY, 0600); + } + f = hashfd(fd, mtimes_name); + + write_mtimes_header(f); + write_mtimes_objects(f, to_pack, objects, nr_objects); + write_mtimes_trailer(f, hash); + + if (adjust_shared_perm(mtimes_name) < 0) + die(_("failed to make %s readable"), mtimes_name); + + finalize_hashfile(f, NULL, FSYNC_COMPONENT_PACK_METADATA, + CSUM_HASH_IN_STREAM | CSUM_CLOSE | CSUM_FSYNC); + + return mtimes_name; +} + off_t write_pack_header(struct hashfile *f, uint32_t nr_entries) { struct pack_header hdr; @@ -479,6 +547,7 @@ void stage_tmp_packfiles(struct strbuf *name_buffer, char **idx_tmp_name) { const char *rev_tmp_name = NULL; + const char *mtimes_tmp_name = NULL; if (adjust_shared_perm(pack_tmp_name)) die_errno("unable to make temporary pack file readable"); @@ -491,9 +560,17 @@ void stage_tmp_packfiles(struct strbuf *name_buffer, rev_tmp_name = write_rev_file(NULL, written_list, nr_written, hash, pack_idx_opts->flags); + if (pack_idx_opts->flags & WRITE_MTIMES) { + mtimes_tmp_name = write_mtimes_file(NULL, to_pack, written_list, + nr_written, + hash); + } + rename_tmp_packfile(name_buffer, pack_tmp_name, "pack"); if (rev_tmp_name) rename_tmp_packfile(name_buffer, rev_tmp_name, "rev"); + if (mtimes_tmp_name) + rename_tmp_packfile(name_buffer, mtimes_tmp_name, "mtimes"); } void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought) diff --git a/pack.h b/pack.h index fd27cfdfd7..01d385903a 100644 --- a/pack.h +++ b/pack.h @@ -44,6 +44,7 @@ struct pack_idx_option { #define WRITE_IDX_STRICT 02 #define WRITE_REV 04 #define WRITE_REV_VERIFY 010 +#define WRITE_MTIMES 020 uint32_t version; uint32_t off32_limit; From patchwork Fri May 20 23:17:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12857540 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B012C433EF for ; Fri, 20 May 2022 23:18:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354049AbiETXSD (ORCPT ); Fri, 20 May 2022 19:18:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354014AbiETXRx (ORCPT ); Fri, 20 May 2022 19:17:53 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B0411A35B5 for ; Fri, 20 May 2022 16:17:48 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id x7so5667777qta.6 for ; Fri, 20 May 2022 16:17:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=SqASYvmKa442CO3IwI5FZCBpEfKsU3INt4MwBobT+bg=; b=htLC5MA43u28HNR4ohEvDj7hFUmSMVf6ZpuS+mT/qkKifoX2v8nKZmCx6H2d5DAkrR sx55oOPzdJQqxk8l19hRfW6Ny84Xuf1JF45n9obcraHYK7Cimo1eba1hHfoRF2kHA6YO 5aX273Xac8vC7KH15h5KtkKFXVSvrEGngi1uU5YPzbik0Qo/d0lVKhPUlV9RXVy4Eatz NjlAjr6FUKyrrgVV6o3bjSX18DtfXxwGGUo6tgX0n2uBYZfWrOTibV3vaQgdhFlJRkqq aC01pv4Tx7E8nWmU0O7re0cnPMgfpGrbcW8vl3K1FjSAscH22yTWgva5k8fm8/VVsiNE TD7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=SqASYvmKa442CO3IwI5FZCBpEfKsU3INt4MwBobT+bg=; b=Mx56ckFwKD7ItZIZIN6GUJ+SduejpSLe3Qg4XoPZ3wc3hKNJaLveT2pMdgXOjQkK6P bj2rSvqrE9mkOhKclc+VZ4iKHe5OP5EjJu7ppeGvoePpw88bZQw2Ul5SqRnfHERzV0iX Br43CBc8yjSW49Vw3TVF0XNB0c4fWnNYfFnC8msCi0Mz9sjaIMwichHrFLBiC6K90CKv +bfhQDo/I4vcneLmtcQvbBGsyHyUF988y3B4QDEno683kDekvvkVheRbP9nOD5Bt4sDT 3VOfUUfO6xdKh/rUSyBa0yrztqbv+QEh8FvdrGwrYNZ97T6sXxGjgCT0+RSA/i5lT02l Doaw== X-Gm-Message-State: AOAM530chUsevarPjqxbmGF/t6s+xeuBD2TKHdb1VwzCkYvrC3XwMjVB bmNK5LW+mznFZurUtHllTLu73jR6wEMzoVzo X-Google-Smtp-Source: ABdhPJyURXU0lq1+2erhYRBtEfpxR/SsNJ8VWqgWo1k2Z2xpbuJ1zJKfmxY2k4YpO9Yhm7UncdN21Q== X-Received: by 2002:ac8:5b51:0:b0:2f3:d5a9:fde with SMTP id n17-20020ac85b51000000b002f3d5a90fdemr9467372qtw.60.1653088667967; Fri, 20 May 2022 16:17:47 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id z18-20020ac87cb2000000b002f39b99f6b6sm424700qtv.80.2022.05.20.16.17.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 May 2022 16:17:47 -0700 (PDT) Date: Fri, 20 May 2022 19:17:46 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu Subject: [PATCH v5 06/17] t/helper: add 'pack-mtimes' test-tool Message-ID: <2a6cfb00bf287380f44f324549db274a1b022499.1653088640.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In the next patch, we will implement and test support for writing a cruft pack via a special mode of `git pack-objects`. To make sure that objects are written with the correct timestamps, and a new test-tool that can dump the object names and corresponding timestamps from a given `.mtimes` file. Signed-off-by: Taylor Blau --- Makefile | 1 + t/helper/test-pack-mtimes.c | 56 +++++++++++++++++++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + 4 files changed, 59 insertions(+) create mode 100644 t/helper/test-pack-mtimes.c diff --git a/Makefile b/Makefile index a299580b7c..0b6eab0453 100644 --- a/Makefile +++ b/Makefile @@ -738,6 +738,7 @@ TEST_BUILTINS_OBJS += test-oid-array.o TEST_BUILTINS_OBJS += test-oidmap.o TEST_BUILTINS_OBJS += test-oidtree.o TEST_BUILTINS_OBJS += test-online-cpus.o +TEST_BUILTINS_OBJS += test-pack-mtimes.o TEST_BUILTINS_OBJS += test-parse-options.o TEST_BUILTINS_OBJS += test-parse-pathspec-file.o TEST_BUILTINS_OBJS += test-partial-clone.o diff --git a/t/helper/test-pack-mtimes.c b/t/helper/test-pack-mtimes.c new file mode 100644 index 0000000000..f7b79daf4c --- /dev/null +++ b/t/helper/test-pack-mtimes.c @@ -0,0 +1,56 @@ +#include "git-compat-util.h" +#include "test-tool.h" +#include "strbuf.h" +#include "object-store.h" +#include "packfile.h" +#include "pack-mtimes.h" + +static void dump_mtimes(struct packed_git *p) +{ + uint32_t i; + if (load_pack_mtimes(p) < 0) + die("could not load pack .mtimes"); + + for (i = 0; i < p->num_objects; i++) { + struct object_id oid; + if (nth_packed_object_id(&oid, p, i) < 0) + die("could not load object id at position %"PRIu32, i); + + printf("%s %"PRIu32"\n", + oid_to_hex(&oid), nth_packed_mtime(p, i)); + } +} + +static const char *pack_mtimes_usage = "\n" +" test-tool pack-mtimes "; + +int cmd__pack_mtimes(int argc, const char **argv) +{ + struct strbuf buf = STRBUF_INIT; + struct packed_git *p; + + setup_git_directory(); + + if (argc != 2) + usage(pack_mtimes_usage); + + for (p = get_all_packs(the_repository); p; p = p->next) { + strbuf_addstr(&buf, basename(p->pack_name)); + strbuf_strip_suffix(&buf, ".pack"); + strbuf_addstr(&buf, ".mtimes"); + + if (!strcmp(buf.buf, argv[1])) + break; + + strbuf_reset(&buf); + } + + strbuf_release(&buf); + + if (!p) + die("could not find pack '%s'", argv[1]); + + dump_mtimes(p); + + return 0; +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 0424f7adf5..d2eacd302d 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -48,6 +48,7 @@ static struct test_cmd cmds[] = { { "oidmap", cmd__oidmap }, { "oidtree", cmd__oidtree }, { "online-cpus", cmd__online_cpus }, + { "pack-mtimes", cmd__pack_mtimes }, { "parse-options", cmd__parse_options }, { "parse-pathspec-file", cmd__parse_pathspec_file }, { "partial-clone", cmd__partial_clone }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index c876e8246f..960cc27ef7 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -38,6 +38,7 @@ int cmd__mktemp(int argc, const char **argv); int cmd__oidmap(int argc, const char **argv); int cmd__oidtree(int argc, const char **argv); int cmd__online_cpus(int argc, const char **argv); +int cmd__pack_mtimes(int argc, const char **argv); int cmd__parse_options(int argc, const char **argv); int cmd__parse_pathspec_file(int argc, const char** argv); int cmd__partial_clone(int argc, const char **argv); From patchwork Fri May 20 23:17:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12857541 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54F43C433F5 for ; Fri, 20 May 2022 23:18:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353276AbiETXSG (ORCPT ); Fri, 20 May 2022 19:18:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50502 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354041AbiETXRy (ORCPT ); Fri, 20 May 2022 19:17:54 -0400 Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B69DB1A4907 for ; Fri, 20 May 2022 16:17:51 -0700 (PDT) Received: by mail-qt1-x832.google.com with SMTP id k2so8899982qtp.1 for ; Fri, 20 May 2022 16:17:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=eGZdPFnS0lM71M7JI9AHcwK6IUXvVZUam/Aut+ZfQZg=; b=yui52nYCeASRK55n8/Y+ShUU406zEwNBpkbkAFlWJV8hQlcLlEmqpmJtgilMLzB2Nd hBaU1Q1/3T6GFbawHLUJdN6aL1jaFoZ29OXU6sWEIwJCz+hNOPNHvFSuiN0p90j/vklP E/gVHObihkY+hfV63Yxhgz/rN3eUZzyBrDz7Ez4+K8V4HsWMF3JZSGB5KADNiCnq/aCy 9wJ7voXFyiE0rHv+pUJkB+oy07XpwOr3fgIbd9reGTRL7Qi+VGAa0/sapv2TxIuSavV9 GyhIEI1Rpl+xlDNRxh3VTvFXu5MkE5nkNph6iBID565TgxENMLW4xeVz6ElSHU07A1oh PijQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=eGZdPFnS0lM71M7JI9AHcwK6IUXvVZUam/Aut+ZfQZg=; b=yBu7JQdyI7Ial5FEm6BeS6MkMRNfiCPqyT6hp6BfRB6Xn3sDUvo+tTSi8OQ0DyMrOq NSj+4ED36H+V86neNR52UEicsbs1+GCO5T5yJ3rZxz+AqKQfij7y827p144S+nLuFyOJ cP9zJ3YnnXg0qmWMXifgUTXVoxwb9JPvAT9SLFLJFZ5fn+ElZ+cNTwkAtmnB9lcgAr3x IJ/K9d2WZQPMupQI3vQCMTI7kTTGVUKaEv3V4XxFcs+sp0Ee3UkGwqRpPJhTsWmVuIQf fssloN//3homuEzK98LFDMhy7elXAtHyzXHXLoPTl5FObtIh0C6l35enKyRenjJYkMaI Uk3g== X-Gm-Message-State: AOAM532ufeh+I0Ft8ZX4E4fEfMShjH/mixKCoHtkU8ZxgzfL+DCpq9+e wR7YrRbwmsc1o6tM9m6L2NvdvU9Ew+6dJy1z X-Google-Smtp-Source: ABdhPJz/ARuA1qXIdjoTTWVxcyLYPtVohzPs68ObtZZwiQi61rqFz+CVvXymmSNtUikaU9562AfquQ== X-Received: by 2002:a05:622a:493:b0:2f3:e77b:640b with SMTP id p19-20020a05622a049300b002f3e77b640bmr9455784qtx.236.1653088670699; Fri, 20 May 2022 16:17:50 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id cm26-20020a05622a251a00b002f39b99f6a6sm459072qtb.64.2022.05.20.16.17.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 May 2022 16:17:50 -0700 (PDT) Date: Fri, 20 May 2022 19:17:49 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu Subject: [PATCH v5 07/17] builtin/pack-objects.c: return from create_object_entry() Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A new caller in the next commit will want to immediately modify the object_entry structure created by create_object_entry(). Instead of forcing that caller to wastefully look-up the entry we just created, return it from create_object_entry() instead. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 6ac927047c..c6d16872ee 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1516,13 +1516,13 @@ static int want_object_in_pack(const struct object_id *oid, return 1; } -static void create_object_entry(const struct object_id *oid, - enum object_type type, - uint32_t hash, - int exclude, - int no_try_delta, - struct packed_git *found_pack, - off_t found_offset) +static struct object_entry *create_object_entry(const struct object_id *oid, + enum object_type type, + uint32_t hash, + int exclude, + int no_try_delta, + struct packed_git *found_pack, + off_t found_offset) { struct object_entry *entry; @@ -1539,6 +1539,8 @@ static void create_object_entry(const struct object_id *oid, } entry->no_try_delta = no_try_delta; + + return entry; } static const char no_closure_warning[] = N_( From patchwork Fri May 20 23:17:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12857543 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4892C433EF for ; Fri, 20 May 2022 23:18:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354039AbiETXSK (ORCPT ); Fri, 20 May 2022 19:18:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238696AbiETXR4 (ORCPT ); Fri, 20 May 2022 19:17:56 -0400 Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C9B51957B8 for ; Fri, 20 May 2022 16:17:54 -0700 (PDT) Received: by mail-qt1-x82c.google.com with SMTP id v6so5847308qtx.12 for ; Fri, 20 May 2022 16:17:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=J/R31/ebmQ4lbVegmp4wIsp5bWpx1GJe4uyhjKvlOFQ=; b=B7Z2ItjefzCcqk4qTm9FTB7yxRsYInXnWwSsvk+9kz97HAk+q45SL9Lj3jIqKhxisx U1PjTvxyvCwqT1b6ukjQ5REMPjrjMF57KNePADDmhPj7z+kdqrWq8F+KXfCLKAyR3ttq NtgjVMLs88Oy6NE1KsEKhl080B+x1+ym2WYT4tXacpD3W8U6f+JbBx+7+k+AxSgWlNIb 4jk2UcDxmv4/7xWU1GmRc0TNv1BLOD/3nITqBJQHS8SzUz8lk7bq/iN4f1rqH7BSp34Y HGMm7uN7GilCHvNn4aqNPBPvGNygJgfxuSeVGl7nrmP8JcJxv2XFsuKabJ/fAMe4hGIL /fAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=J/R31/ebmQ4lbVegmp4wIsp5bWpx1GJe4uyhjKvlOFQ=; b=lcHhHJswcDiGH1fj53m4w5BdF5Ad7oJy7tVjDe6YAdtm0lpBJ8nsgTJNRB3in2f7WP +wZtMderFjiJPxZ5jMbLFTVkwlsJpZCPHfy6wdC6bDQmR7WBfgXOfDJA9SWevxPRI/RP k/IMe1hvzyHRqYN2PdS0w1t3YAQF7QsNpcoAMoNtZNaAwZ6GknCubhKjsmob+BGMOnaH a4w6CAB6dNmB9loZD0zcYfcQIDqgMNkoYwxrsoy0MLJxWdrol45I6Ua1HCSn5f8YkmFK pqZuY8qWNts3HAOjt1GYJC8DaKiEAjYscQUgKXzgjR+CaFMSDAVqX6HFCYNCmCyU6WRa l7cg== X-Gm-Message-State: AOAM530Gbno9as9cWzsvuu4FKojBq4CCQm8+BD3x2PzEBWxR7cZeYGc7 TbkL9ED3Qur/QWBBPHK4AbNwf89qkxAdAQuQ X-Google-Smtp-Source: ABdhPJzXyNGbpflHifiUNSugljXd8adH3krg3urAzA+BgwNmtS3ONngyI2gFNS5SfmkzRLGUPXhscg== X-Received: by 2002:ac8:590c:0:b0:2f3:e1b7:5d1d with SMTP id 12-20020ac8590c000000b002f3e1b75d1dmr9508879qty.191.1653088673328; Fri, 20 May 2022 16:17:53 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id f4-20020ac86ec4000000b002f906fc8530sm483899qtv.46.2022.05.20.16.17.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 May 2022 16:17:53 -0700 (PDT) Date: Fri, 20 May 2022 19:17:52 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu Subject: [PATCH v5 08/17] builtin/pack-objects.c: --cruft without expiration Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Teach `pack-objects` how to generate a cruft pack when no objects are dropped (i.e., `--cruft-expiration=never`). Later patches will teach `pack-objects` how to generate a cruft pack that prunes objects. When generating a cruft pack which does not prune objects, we want to collect all unreachable objects into a single pack (noting and updating their mtimes as we accumulate them). Ordinary use will pass the result of a `git repack -A` as a kept pack, so when this patch says "kept pack", readers should think "reachable objects". Generating a non-expiring cruft packs works as follows: - Callers provide a list of every pack they know about, and indicate which packs are about to be removed. - All packs which are going to be removed (we'll call these the redundant ones) are marked as kept in-core. Any packs the caller did not mention (but are known to the `pack-objects` process) are also marked as kept in-core. Packs not mentioned by the caller are assumed to be unknown to them, i.e., they entered the repository after the caller decided which packs should be kept and which should be discarded. Since we do not want to include objects in these "unknown" packs (because we don't know which of their objects are or aren't reachable), these are also marked as kept in-core. - Then, we enumerate all objects in the repository, and add them to our packing list if they do not appear in an in-core kept pack. This results in a new cruft pack which contains all known objects that aren't included in the kept packs. When the kept pack is the result of `git repack -A`, the resulting pack contains all unreachable objects. Signed-off-by: Taylor Blau --- Documentation/git-pack-objects.txt | 30 ++++ builtin/pack-objects.c | 201 +++++++++++++++++++++++++- object-file.c | 2 +- object-store.h | 2 + t/t5329-pack-objects-cruft.sh | 218 +++++++++++++++++++++++++++++ 5 files changed, 448 insertions(+), 5 deletions(-) create mode 100755 t/t5329-pack-objects-cruft.sh diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index f8344e1e5b..a9995a932c 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -13,6 +13,7 @@ SYNOPSIS [--no-reuse-delta] [--delta-base-offset] [--non-empty] [--local] [--incremental] [--window=] [--depth=] [--revs [--unpacked | --all]] [--keep-pack=] + [--cruft] [--cruft-expiration=