From patchwork Thu Nov 9 22:34:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13451836 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9443C38FB3 for ; Thu, 9 Nov 2023 22:34:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="iCg3XMj4" Received: from mail-qk1-x735.google.com (mail-qk1-x735.google.com [IPv6:2607:f8b0:4864:20::735]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C1CD2D63 for ; Thu, 9 Nov 2023 14:34:14 -0800 (PST) Received: by mail-qk1-x735.google.com with SMTP id af79cd13be357-778a47bc09aso90110785a.3 for ; Thu, 09 Nov 2023 14:34:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1699569253; x=1700174053; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=k0GnV4Z+c6dZbNpid6RQ+HjufacG8wqK42c+ZwyHwlU=; b=iCg3XMj4IMFTqEIG1fP53hnZ4LNY7Uioom9dOtUDlg6YeoGXO1vDW1Zf2iQHlva51A GswDrUPWFpEYhswdmrfj7blVaOch0zcEUqc9gk58VE37NaY5MTlxcN4Wv7JJrpEIf5Cu mfipzHVGA4xorJYLxzslAUoKMyuYbAu4izBpBrmILXHwViPRefJ94lfsdETSi31aeJSO gh7KxzcBsmrlT4wEHyHBIQsGo20hDfY2iG0pHFIXnWk26eNqu01rd6+K04kcDpZkEhki dNey09ajRtGmz39OBlbhqtk6VRO9b9Dkwemthh0SUh1sdFjRe8jFKLDEk5HpJDLCdQRv HJ+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699569253; x=1700174053; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=k0GnV4Z+c6dZbNpid6RQ+HjufacG8wqK42c+ZwyHwlU=; b=pDlK8Hrdh6UOPWluZMd50ut79Ov/ECadLD0BB5Er6Dd2hEOAq7KrYyArByOMxDN1lw SL2SPwgkFl9fS14GSRBrcqHK73dTTKNjfsdY8KOha/TJjGtLN+9UYbEypJcrvmUnQdzK dVvNuXMJ8A1qWQvZ1/smZ32XMfbLdT56981nSy5FqAI2hujvUizxOZE3X6GPUtbZdruQ kv6Z3GntwLffcDZJPsNxIDh8xkVaAV9FVrNzk1QGrZ8kDD3+IWNbwWVN2tIR2cyaysOV EjukjBkfMfayoDTXdlpe8+rHxzVHOSY5seXYS6aKVbD8YtVdi14IGdqnQ2tyO3cW3BpP +Q3g== X-Gm-Message-State: AOJu0Yxkjuo5CQoTxTCOz5/eQ1y+XMweEcYJC3rtR3WlEeISX1Mtl+6d uD/wxz5BmU0IDebXSvuLuxytIhEV0oVkWdlbMa5SLw== X-Google-Smtp-Source: AGHT+IFknJ1TNULMkhYf2/dZXsP/r/ACSCik2uNjZYiO4YBvfdiRGGjYc99i5ZRkWbZzg0n7YjTRig== X-Received: by 2002:a05:620a:408b:b0:779:e90c:f203 with SMTP id f11-20020a05620a408b00b00779e90cf203mr6715192qko.9.1699569252890; Thu, 09 Nov 2023 14:34:12 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id z20-20020ae9c114000000b0076cb3690ae7sm248902qki.68.2023.11.09.14.34.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Nov 2023 14:34:12 -0800 (PST) Date: Thu, 9 Nov 2023 17:34:11 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Junio C Hamano Subject: [PATCH 1/7] chunk-format: introduce `pair_chunk_expect()` helper Message-ID: References: <20231109070310.GA2697602@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In 570b8b8836 (chunk-format: note that pair_chunk() is unsafe, 2023-10-09), the pair_chunk() interface grew a required "size" pointer, so that the caller is forced to at least have a handle on the actual size of the given chunk. Many callers were converted to the new interface. A handful were instead converted from the unsafe version of pair_chunk() to read_chunk() so that they could check their expected size. This led to a lot of code like: static int graph_read_oid_lookup(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = data; g->chunk_oid_lookup = chunk_start; if (chunk_size / g->hash_len != g->num_commits) return error(_("commit-graph OID lookup chunk is the wrong size")); return 0; } , leaving the caller to use read_chunk(), like so: read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph); The callback to read_chunk() (in the above, `graph_read_oid_lookup()`) does nothing more than (a) assign a pointer to the location of the start of the chunk in the mmap'd file, and (b) assert that it has the correct size. For callers that know the expected size of their chunk(s) up-front (most often because they are made up of a known number of fixed-size records), we can simplify this by teaching the chunk-format API itself to validate the expected size for us. This is wrapped in a new function, called `pair_chunk_expect()` which takes a pair of "size_t"s (corresponding to the record size and count), instead of a "size_t *", and validates that the given chunk matches the expected size as given. This will allow us to reduce the number of lines of code it takes to perform these basic read_chunk() operations, by taking the above and replacing it with something like: if (pair_chunk_expect(cf, GRAPH_CHUNKID_OIDLOOKUP, &graph->chunk_oid_lookup, graph->hash_len, graph->num_commits)) error(_("commit-graph oid lookup chunk is wrong size")); We will perform those transformations in the following commits. Helped-by: Jeff King Signed-off-by: Taylor Blau --- chunk-format.c | 29 +++++++++++++++++++++++++++++ chunk-format.h | 13 ++++++++++++- 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/chunk-format.c b/chunk-format.c index cdc7f39b70..be078dcca8 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -163,6 +163,10 @@ int read_table_of_contents(struct chunkfile *cf, struct pair_chunk_data { const unsigned char **p; size_t *size; + + /* for pair_chunk_expect() only */ + size_t record_size; + size_t record_nr; }; static int pair_chunk_fn(const unsigned char *chunk_start, @@ -175,6 +179,17 @@ static int pair_chunk_fn(const unsigned char *chunk_start, return 0; } +static int pair_chunk_expect_fn(const unsigned char *chunk_start, + size_t chunk_size, + void *data) +{ + struct pair_chunk_data *pcd = data; + if (chunk_size / pcd->record_size != pcd->record_nr) + return -1; + *pcd->p = chunk_start; + return 0; +} + int pair_chunk(struct chunkfile *cf, uint32_t chunk_id, const unsigned char **p, @@ -184,6 +199,20 @@ int pair_chunk(struct chunkfile *cf, return read_chunk(cf, chunk_id, pair_chunk_fn, &pcd); } +int pair_chunk_expect(struct chunkfile *cf, + uint32_t chunk_id, + const unsigned char **p, + size_t record_size, + size_t record_nr) +{ + struct pair_chunk_data pcd = { + .p = p, + .record_size = record_size, + .record_nr = record_nr, + }; + return read_chunk(cf, chunk_id, pair_chunk_expect_fn, &pcd); +} + int read_chunk(struct chunkfile *cf, uint32_t chunk_id, chunk_read_fn fn, diff --git a/chunk-format.h b/chunk-format.h index 14b76180ef..10806d7a9a 100644 --- a/chunk-format.h +++ b/chunk-format.h @@ -17,7 +17,8 @@ struct chunkfile; * * If reading a file, use a NULL 'struct hashfile *' and then call * read_table_of_contents(). Supply the memory-mapped data to the - * pair_chunk() or read_chunk() methods, as appropriate. + * pair_chunk(), pair_chunk_expect(), or read_chunk() methods, as + * appropriate. * * DO NOT MIX THESE MODES. Use different 'struct chunkfile' instances * for reading and writing. @@ -54,6 +55,16 @@ int pair_chunk(struct chunkfile *cf, const unsigned char **p, size_t *size); +/* + * Similar to 'pair_chunk', but used for callers who are reading a chunk + * with a known number of fixed-width records. + */ +int pair_chunk_expect(struct chunkfile *cf, + uint32_t chunk_id, + const unsigned char **p, + size_t record_size, + size_t record_nr); + typedef int (*chunk_read_fn)(const unsigned char *chunk_start, size_t chunk_size, void *data); /*