From patchwork Fri May 6 00:26:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Hatton
X-Patchwork-Id: 12840380
From: Jason Hatton
To: Philip Oakley, René Scharfe, "git@vger.kernel.org"
CC: Junio C Hamano
Subject: [PATCH] Prevent git from rehashing 4GBi files
Date: Fri, 6 May 2022 00:26:53 +0000
X-Mailing-List: git@vger.kernel.org

The git cache (the index) stores file sizes using uint32_t.
This causes any file whose size is a nonzero multiple of 2^32 bytes to
have a cached file size of zero. Zero is a special value used by the
racily clean handling, so git rehashes every such file each time git
status or git commit is run.

This patch mitigates the problem by making every file whose size is a
nonzero multiple of 2^32 appear to have a size of 1<<31 instead of zero.
The value 1<<31 is chosen because it is as far from zero as possible,
which helps prevent mix-ups with unpatched versions of git.

For example, suppose a file of exactly 2^32 bytes is in an index written
by a patched git, which records its size as 2^31. An unpatched git
reading that index sees a size mismatch and rehashes the file, which is
safe. For an unpatched git to miss a real change, the file would have to
grow or shrink by exactly 2^31 bytes while retaining its ctime, mtime,
and other attributes.

This patch does not change the behavior for any file whose size is not
an exact multiple of 2^32.

Signed-off-by: Jason D. Hatton
---
 cache.h      |  1 +
 read-cache.c | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 4b666b2848..74e983227b 100644
--- a/cache.h
+++ b/cache.h
@@ -898,6 +898,7 @@ int ie_modified(struct index_state *, const struct cache_entry *, struct stat *,
 #define HASH_SILENT 8
 int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 int index_path(struct index_state *istate, struct object_id *oid, const char *path, struct stat *st, unsigned flags);
+unsigned int munge_st_size(off_t st_size);
 
 /*
  * Record to sd the data from st that we use to check whether a file
diff --git a/read-cache.c b/read-cache.c
index ea6150ea28..b0a1b505db 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,6 +163,18 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
 	add_index_entry(istate, new_entry, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
 }
 
+/*
+ * Munge st_size into an unsigned int.
+ */
+unsigned int munge_st_size(off_t st_size) {
+	unsigned int sd_size = st_size;
+
+	if(!sd_size && st_size)
+		return 0x80000000;
+	else
+		return sd_size;
+}
+
 void fill_stat_data(struct stat_data *sd, struct stat *st)
 {
 	sd->sd_ctime.sec = (unsigned int)st->st_ctime;
@@ -173,7 +185,7 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
 	sd->sd_ino = st->st_ino;
 	sd->sd_uid = st->st_uid;
 	sd->sd_gid = st->st_gid;
-	sd->sd_size = st->st_size;
+	sd->sd_size = munge_st_size(st->st_size);
 }
 
 int match_stat_data(const struct stat_data *sd, struct stat *st)
@@ -212,7 +224,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 		changed |= INODE_CHANGED;
 #endif
 
-	if (sd->sd_size != (unsigned int) st->st_size)
+	if (sd->sd_size != munge_st_size(st->st_size))
 		changed |= DATA_CHANGED;
 
 	return changed;
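
The wrap-around described in the commit message can be checked outside of git with a small standalone sketch. It is not part of the patch: the names truncated_size(), munged_size() and check_examples() are hypothetical, and int64_t stands in for off_t so the result does not depend on the platform's off_t width.

```c
#include <assert.h>
#include <stdint.h>

/* What an unpatched git would cache: only the low 32 bits of the size. */
uint32_t truncated_size(int64_t st_size)
{
	return (uint32_t)st_size;
}

/*
 * Same idea as the patch's munge_st_size(): a nonzero size whose low
 * 32 bits are all zero is reported as 1<<31 instead of 0.
 */
uint32_t munged_size(int64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size;

	if (!sd_size && st_size)
		return UINT32_C(0x80000000);
	return sd_size;
}

/* A few sizes worked through by hand. */
void check_examples(void)
{
	const int64_t four_gib = INT64_C(1) << 32;

	assert(truncated_size(four_gib) == 0);               /* the problem case */
	assert(munged_size(four_gib) == UINT32_C(0x80000000));
	assert(munged_size(3 * four_gib) == UINT32_C(0x80000000));
	assert(munged_size(0) == 0);                         /* empty file stays 0 */
	assert(munged_size(four_gib + 5) == 5);              /* other sizes unchanged */
}
```

Note that a file of exactly 2^31 bytes also maps to 0x80000000 in this sketch, which is the collision the commit message refers to when it says a file would have to grow or shrink by exactly 2^31 to go unnoticed.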