From patchwork Thu Oct 19 17:28:42 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13429557
Date: Thu, 19 Oct 2023 13:28:42 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source`
Message-ID: <97bb6e9f59e5092f0519c7d1141d0673313fdc33.1697736516.git.me@ttaylorr.com>

A future commit will want to implement a very similar routine as in
`stream_blob_to_pack()` with two notable changes:

  - Instead of streaming just OBJ_BLOBs, this new function may want to
    stream objects of arbitrary type.

  - Instead of streaming the object's contents from an open
    file-descriptor, this new function may want to "stream" its contents
    from memory.

To avoid duplicating a significant chunk of code between the existing
`stream_blob_to_pack()` and that future routine, extract an abstract
`bulk_checkin_source`. This concept currently is a thin layer of
`lseek()` and `read_in_full()`, but will grow to understand how to
perform analogous operations when writing out an object's contents from
memory.

Suggested-by: Junio C Hamano
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 61 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 6ce62999e5..c05d06e1e1 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -140,8 +140,41 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
 	return 0;
 }
 
+struct bulk_checkin_source {
+	enum { SOURCE_FILE } type;
+
+	/* SOURCE_FILE fields */
+	int fd;
+
+	/* common fields */
+	size_t size;
+	const char *path;
+};
+
+static off_t bulk_checkin_source_seek_to(struct bulk_checkin_source *source,
+					 off_t offset)
+{
+	switch (source->type) {
+	case SOURCE_FILE:
+		return lseek(source->fd, offset, SEEK_SET);
+	default:
+		BUG("unknown bulk-checkin source: %d", source->type);
+	}
+}
+
+static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source,
+					void *buf, size_t nr)
+{
+	switch (source->type) {
+	case SOURCE_FILE:
+		return read_in_full(source->fd, buf, nr);
+	default:
+		BUG("unknown bulk-checkin source: %d", source->type);
+	}
+}
+
 /*
- * Read the contents from fd for size bytes, streaming it to the
+ * Read the contents from 'source' for 'size' bytes, streaming it to the
  * packfile in state while updating the hash in ctx. Signal a failure
  * by returning a negative value when the resulting pack would exceed
  * the pack size limit and this is not the first object in the pack,
@@ -157,7 +190,7 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
  */
 static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
 			       git_hash_ctx *ctx, off_t *already_hashed_to,
-			       int fd, size_t size, const char *path,
+			       struct bulk_checkin_source *source,
 			       unsigned flags)
 {
 	git_zstream s;
@@ -167,22 +200,28 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
 	int status = Z_OK;
 	int write_object = (flags & HASH_WRITE_OBJECT);
 	off_t offset = 0;
+	size_t size = source->size;
 
 	git_deflate_init(&s, pack_compression_level);
 
-	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, size);
+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB,
+					      size);
 	s.next_out = obuf + hdrlen;
 	s.avail_out = sizeof(obuf) - hdrlen;
 
 	while (status != Z_STREAM_END) {
 		if (size && !s.avail_in) {
 			ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
-			ssize_t read_result = read_in_full(fd, ibuf, rsize);
+			ssize_t read_result;
+
+			read_result = bulk_checkin_source_read(source, ibuf,
+							       rsize);
 			if (read_result < 0)
-				die_errno("failed to read from '%s'", path);
+				die_errno("failed to read from '%s'",
+					  source->path);
 			if (read_result != rsize)
 				die("failed to read %d bytes from '%s'",
-				    (int)rsize, path);
+				    (int)rsize, source->path);
 			offset += rsize;
 			if (*already_hashed_to < offset) {
 				size_t hsize = offset - *already_hashed_to;
@@ -258,6 +297,12 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	unsigned header_len;
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
+	struct bulk_checkin_source source = {
+		.type = SOURCE_FILE,
+		.fd = fd,
+		.size = size,
+		.path = path,
+	};
 
 	seekback = lseek(fd, 0, SEEK_CUR);
 	if (seekback == (off_t) -1)
@@ -283,7 +328,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 			crc32_begin(state->f);
 		}
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
-					 fd, size, path, flags))
+					 &source, flags))
 			break;
 		/*
 		 * Writing this object to the current pack will make
@@ -295,7 +340,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		hashfile_truncate(state->f, &checkpoint);
 		state->offset = checkpoint.offset;
 		flush_bulk_checkin_packfile(state);
-		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
+		if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1)
 			return error("cannot seek back");
 	}
 	the_hash_algo->final_oid_fn(result_oid, &ctx);

From patchwork Thu Oct 19 17:28:45 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13429558
Date: Thu, 19 Oct 2023 13:28:45 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v4 2/7] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
Message-ID: <9d633df339f2a769bcae4d6328ca47915184e4aa.1697736516.git.me@ttaylorr.com>

The existing `stream_blob_to_pack()` function is named based on the fact
that it knows only how to stream blobs into a bulk-checkin pack.

But there is no longer anything in this function which prevents us from
writing objects of arbitrary types to the bulk-checkin pack. Prepare to
write OBJ_TREEs by removing this assumption, adding an `enum
object_type` parameter to this function's argument list, and renaming it
to `stream_obj_to_pack()` accordingly.
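
An illustrative aside, not part of the patch: the type parameter matters
because the type is stored in the first byte of every in-pack object
header, next to the low bits of the size. The sketch below shows that
encoding in isolation. The names `sketch_encode_header()` and `enum
sketch_type` are hypothetical stand-ins (the series itself calls
`encode_in_pack_object_header()`, whose body is not shown here), and
returning -1 on a short buffer is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes as stored in pack entries; enum name is hypothetical. */
enum sketch_type { SK_COMMIT = 1, SK_TREE = 2, SK_BLOB = 3, SK_TAG = 4 };

/*
 * Write a pack object header for an object of 'type' and 'size' into
 * 'hdr'. The first byte carries the type in bits 4-6 and the low four
 * bits of the size; each following byte carries seven more size bits.
 * The high bit of a byte signals that another byte follows.
 *
 * Returns the number of bytes written, or -1 if 'hdr_len' is too small
 * (an assumption of this sketch).
 */
static int sketch_encode_header(unsigned char *hdr, size_t hdr_len,
				enum sketch_type type, uintmax_t size)
{
	int n = 1;
	unsigned char c;

	if (!hdr_len)
		return -1;

	c = (unsigned char)((type << 4) | (size & 15));
	size >>= 4;
	while (size) {
		if ((size_t)n == hdr_len)
			return -1;
		*hdr++ = c | 0x80;
		c = (unsigned char)(size & 0x7f);
		size >>= 7;
		n++;
	}
	*hdr = c;
	return n;
}
```

With this layout a 5-byte blob gets the single header byte 0x35, and a
300-byte tree gets the two bytes 0xAC 0x12. Only the type bits differ
between a blob and a tree of the same size.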

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index c05d06e1e1..7e6b52112e 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -188,10 +188,10 @@ static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source,
  * status before calling us just in case we ask it to call us again
  * with a new pack.
  */
-static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
-			       git_hash_ctx *ctx, off_t *already_hashed_to,
-			       struct bulk_checkin_source *source,
-			       unsigned flags)
+static int stream_obj_to_pack(struct bulk_checkin_packfile *state,
+			      git_hash_ctx *ctx, off_t *already_hashed_to,
+			      struct bulk_checkin_source *source,
+			      enum object_type type, unsigned flags)
 {
 	git_zstream s;
 	unsigned char ibuf[16384];
@@ -204,8 +204,7 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
 
 	git_deflate_init(&s, pack_compression_level);
 
-	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB,
-					      size);
+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
 	s.next_out = obuf + hdrlen;
 	s.avail_out = sizeof(obuf) - hdrlen;
 
@@ -327,8 +326,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 			idx->offset = state->offset;
 			crc32_begin(state->f);
 		}
-		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
-					 &source, flags))
+		if (!stream_obj_to_pack(state, &ctx, &already_hashed_to,
+					&source, OBJ_BLOB, flags))
 			break;
 		/*
 		 * Writing this object to the current pack will make

From patchwork Thu Oct 19 17:28:48 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13429559
Date: Thu, 19 Oct 2023 13:28:48 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v4 3/7] bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source`

Prepare for a future change where we will want to use a routine very
similar to the existing `deflate_blob_to_pack()` but over arbitrary
sources (i.e. either open file-descriptors, or a location in memory).

Extract out a common `deflate_obj_to_pack()` routine that acts on a
`bulk_checkin_source`, instead of a (int, size_t) pair. Then rewrite
`deflate_blob_to_pack()` in terms of it.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 52 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 7e6b52112e..28bc8d5ab4 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -285,30 +285,23 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state,
 		die_errno("unable to write pack header");
 }
 
-static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
-				struct object_id *result_oid,
-				int fd, size_t size,
-				const char *path, unsigned flags)
+
+static int deflate_obj_to_pack(struct bulk_checkin_packfile *state,
+			       struct object_id *result_oid,
+			       struct bulk_checkin_source *source,
+			       enum object_type type,
+			       off_t seekback,
+			       unsigned flags)
 {
-	off_t seekback, already_hashed_to;
+	off_t already_hashed_to = 0;
 	git_hash_ctx ctx;
 	unsigned char obuf[16384];
 	unsigned header_len;
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
-	struct bulk_checkin_source source = {
-		.type = SOURCE_FILE,
-		.fd = fd,
-		.size = size,
-		.path = path,
-	};
 
-	seekback = lseek(fd, 0, SEEK_CUR);
-	if (seekback == (off_t) -1)
-		return error("cannot find the current offset");
-
-	header_len = format_object_header((char *)obuf, sizeof(obuf),
-					  OBJ_BLOB, size);
+	header_len = format_object_header((char *)obuf, sizeof(obuf), type,
+					  source->size);
 	the_hash_algo->init_fn(&ctx);
 	the_hash_algo->update_fn(&ctx, obuf, header_len);
 	the_hash_algo->init_fn(&checkpoint.ctx);
@@ -317,8 +310,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	if ((flags & HASH_WRITE_OBJECT) != 0)
 		CALLOC_ARRAY(idx, 1);
 
-	already_hashed_to = 0;
-
 	while (1) {
 		prepare_to_stream(state, flags);
 		if (idx) {
@@ -327,7 +318,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 			crc32_begin(state->f);
 		}
 		if (!stream_obj_to_pack(state, &ctx, &already_hashed_to,
-					&source, OBJ_BLOB, flags))
+					source, type, flags))
 			break;
 		/*
 		 * Writing this object to the current pack will make
@@ -339,7 +330,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		hashfile_truncate(state->f, &checkpoint);
 		state->offset = checkpoint.offset;
 		flush_bulk_checkin_packfile(state);
-		if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1)
+		if (bulk_checkin_source_seek_to(source, seekback) == (off_t)-1)
 			return error("cannot seek back");
 	}
 	the_hash_algo->final_oid_fn(result_oid, &ctx);
@@ -361,6 +352,25 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	return 0;
 }
 
+static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
+				struct object_id *result_oid,
+				int fd, size_t size,
+				const char *path, unsigned flags)
+{
+	struct bulk_checkin_source source = {
+		.type = SOURCE_FILE,
+		.fd = fd,
+		.size = size,
+		.path = path,
+	};
+	off_t seekback = lseek(fd, 0, SEEK_CUR);
+	if (seekback == (off_t) -1)
+		return error("cannot find the current offset");
+
+	return deflate_obj_to_pack(state, result_oid, &source, OBJ_BLOB,
+				   seekback, flags);
+}
+
 void prepare_loose_object_bulk_checkin(void)
 {
 	/*

From patchwork Thu Oct 19 17:28:51 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13429560
Date: Thu, 19 Oct 2023 13:28:51 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`

Continue to prepare for streaming an object's contents directly from
memory by teaching `bulk_checkin_source` how to perform reads and seeks
based on an address in memory.

Unlike file descriptors, which manage their own offset internally, we
have to keep track of how many bytes we've read out of the buffer, and
make sure we don't read past the end of the buffer.
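
As a rough, standalone illustration of that bookkeeping (not the code in
this patch): an in-memory source needs its own cursor, a read that
clamps to the bytes remaining, and a seek that rejects out-of-range
offsets. The names `struct mem_source`, `mem_source_read()` and
`mem_source_seek_to()` are hypothetical, and allowing a seek to exactly
the end of the buffer is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for an in-memory source. */
struct mem_source {
	const void *buf;	/* start of the object's contents */
	size_t size;		/* total number of bytes in buf */
	size_t pos;		/* bytes consumed so far (our "file offset") */
};

/* Move the cursor; return the new offset, or -1 if it is out of range. */
static off_t mem_source_seek_to(struct mem_source *src, off_t offset)
{
	if (offset < 0 || (size_t)offset > src->size)
		return (off_t)-1;
	src->pos = (size_t)offset;
	return offset;
}

/* Copy up to 'nr' bytes into 'out', never reading past the end of buf. */
static ssize_t mem_source_read(struct mem_source *src, void *out, size_t nr)
{
	size_t left;

	assert(src->pos <= src->size);
	left = src->size - src->pos;
	if (nr > left)
		nr = left;
	memcpy(out, (const unsigned char *)src->buf + src->pos, nr);
	src->pos += nr;
	return (ssize_t)nr;
}
```

Reading 5 bytes from an 11-byte buffer and then asking for 100 more
returns 5 and then 6; a further read returns 0, much like a file
descriptor at end-of-file.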

Suggested-by: Junio C Hamano
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 28bc8d5ab4..60361b3e2e 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -141,11 +141,15 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
 }
 
 struct bulk_checkin_source {
-	enum { SOURCE_FILE } type;
+	enum { SOURCE_FILE, SOURCE_INCORE } type;
 
 	/* SOURCE_FILE fields */
 	int fd;
 
+	/* SOURCE_INCORE fields */
+	const void *buf;
+	size_t read;
+
 	/* common fields */
 	size_t size;
 	const char *path;
@@ -157,6 +161,11 @@ static off_t bulk_checkin_source_seek_to(struct bulk_checkin_source *source,
 	switch (source->type) {
 	case SOURCE_FILE:
 		return lseek(source->fd, offset, SEEK_SET);
+	case SOURCE_INCORE:
+		if (!(0 <= offset && offset < source->size))
+			return (off_t)-1;
+		source->read = offset;
+		return source->read;
 	default:
 		BUG("unknown bulk-checkin source: %d", source->type);
 	}
@@ -168,6 +177,13 @@ static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source,
 	switch (source->type) {
 	case SOURCE_FILE:
 		return read_in_full(source->fd, buf, nr);
+	case SOURCE_INCORE:
+		assert(source->read <= source->size);
+		if (nr > source->size - source->read)
+			nr = source->size - source->read;
+		memcpy(buf, (unsigned char *)source->buf + source->read, nr);
+		source->read += nr;
+		return nr;
 	default:
 		BUG("unknown bulk-checkin source: %d", source->type);
 	}

From patchwork Thu Oct 19 17:28:54 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13429561
Date: Thu, 19 Oct 2023 13:28:54 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v4 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
Message-ID: <48095afe80fa94a4e9b47f95e9e5821e690075c3.1697736516.git.me@ttaylorr.com>

Now that we have factored out many of the common routines necessary to
index a new object into a pack created by the bulk-checkin machinery, we
can introduce a variant of `index_blob_bulk_checkin()` that acts on
blobs whose contents we can fit in memory.

This will be useful in a couple more commits in order to provide the
`merge-tree` builtin with a mechanism to create a new pack containing
any objects it created during the merge, instead of storing those
objects individually as loose.

Similar to the existing `index_blob_bulk_checkin()` function, the
entrypoint delegates to `deflate_obj_to_pack_incore()`. That function in
turn delegates to `deflate_obj_to_pack()`, which is responsible for
formatting the pack header and then deflating the contents into the
pack.

Consistent with the rest of the bulk-checkin mechanism, there are no
direct tests here.
In future commits when we expose this new functionality via the
`merge-tree` builtin, we will test it indirectly there.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 29 +++++++++++++++++++++++++++++
 bulk-checkin.h |  4 ++++
 2 files changed, 33 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 60361b3e2e..655a583b06 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -368,6 +368,23 @@ static int deflate_obj_to_pack(struct bulk_checkin_packfile *state,
 	return 0;
 }
 
+static int deflate_obj_to_pack_incore(struct bulk_checkin_packfile *state,
+				      struct object_id *result_oid,
+				      const void *buf, size_t size,
+				      const char *path, enum object_type type,
+				      unsigned flags)
+{
+	struct bulk_checkin_source source = {
+		.type = SOURCE_INCORE,
+		.buf = buf,
+		.size = size,
+		.read = 0,
+		.path = path,
+	};
+
+	return deflate_obj_to_pack(state, result_oid, &source, type, 0, flags);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -431,6 +448,18 @@ int index_blob_bulk_checkin(struct object_id *oid,
 	return status;
 }
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
+						buf, size, path, OBJ_BLOB,
+						flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index aa7286a7b3..1b91daeaee 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid,
 			    int fd, size_t size,
 			    const char *path, unsigned flags);
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called

From patchwork Thu Oct 19 17:28:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13429562
Date: Thu, 19 Oct 2023 13:28:57 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v4 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
Message-ID: <60568f9281c2588f777e3886610a1b40730fcc0f.1697736516.git.me@ttaylorr.com>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
Content-Disposition: inline

The remaining missing piece in order to teach the `merge-tree` builtin
how to write the contents of a merge into a pack is a function to index
tree objects into a bulk-checkin pack.

This patch implements that missing piece, which is a thin wrapper around
all of the functionality introduced in previous commits.

If and when Git gains support for a "compatibility" hash algorithm, the
changes to support that here will be minimal. The bulk-checkin machinery
will need to convert the incoming tree to compute its length under the
compatibility hash, necessary to reconstruct its header. With that
information (and the converted contents of the tree), the bulk-checkin
machinery will have enough to keep track of the converted object's hash
in order to update the compatibility mapping.

Within some thin wrapper around `deflate_obj_to_pack_incore()` (perhaps
`deflate_tree_to_pack_incore()`), the changes should be limited to
something like:

    struct strbuf converted = STRBUF_INIT;
    if (the_repository->compat_hash_algo) {
      if (convert_object_file(&compat_obj,
                              the_repository->hash_algo,
                              the_repository->compat_hash_algo, ...) < 0)
        die(...);

      format_object_header_hash(the_repository->compat_hash_algo,
                                OBJ_TREE, size);
    }
    /* compute the converted tree's hash using the compat algorithm */
    strbuf_release(&converted);

, assuming related changes throughout the rest of the bulk-checkin
machinery necessary to update the hash of the converted object, which
are likewise minimal in size.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 12 ++++++++++++
 bulk-checkin.h |  4 ++++
 2 files changed, 16 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 655a583b06..c1faf75f5f 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -460,6 +460,18 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 	return status;
 }
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
+						buf, size, path, OBJ_TREE,
+						flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index 1b91daeaee..89786b3954 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -17,6 +17,10 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 				   const void *buf, size_t size,
 				   const char *path, unsigned flags);
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called

From patchwork Thu Oct 19 17:29:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13429563
Date: Thu, 19 Oct 2023 13:29:00 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v4 7/7] builtin/merge-tree.c: implement support for `--write-pack`
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
Content-Disposition: inline

When using merge-tree often within a repository[^1], it is possible to
generate a relatively large number of loose objects, which can result in
degraded performance, and inode exhaustion in extreme cases.

Building on the functionality introduced in previous commits, the
bulk-checkin machinery now has support to write arbitrary blob and tree
objects which are small enough to be held in-core.
We can use this to write any blob/tree objects generated by ORT into a
separate pack instead of writing them out individually as loose.

This functionality is gated behind a new `--write-pack` option to
`merge-tree` that works with the (non-deprecated) `--write-tree` mode.

The implementation is relatively straightforward. There are two spots
within the ORT mechanism where we call `write_object_file()`, one for
content differences within blobs, and another to assemble any new trees
necessary to construct the merge. In each of those locations,
conditionally replace calls to `write_object_file()` with
`index_blob_bulk_checkin_incore()` or `index_tree_bulk_checkin_incore()`
depending on which kind of object we are writing.

The only remaining task is to begin and end the transaction necessary to
initialize the bulk-checkin machinery, and move any new pack(s) it
created into the main object store.

[^1]: Such is the case at GitHub, where we run presumptive "test merges"
  on open pull requests to see whether or not we can light up the merge
  button green depending on whether or not the presumptive merge was
  conflicted.

  This is done in response to a number of user-initiated events,
  including viewing an open pull request whose last test merge is stale
  with respect to the current base and tip of the pull request. As a
  result, merge-tree can be run very frequently on large, active
  repositories.

Signed-off-by: Taylor Blau
---
 Documentation/git-merge-tree.txt |  4 ++
 builtin/merge-tree.c             |  5 ++
 merge-ort.c                      | 42 +++++++++++----
 merge-recursive.h                |  1 +
 t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++++++++++++++++
 5 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index ffc4fbf7e8..9d37609ef1 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -69,6 +69,10 @@ OPTIONS
 	specify a merge-base for the merge, and specifying multiple bases is
 	currently not supported. This option is incompatible with `--stdin`.
 
+--write-pack::
+	Write any new objects into a separate packfile instead of as
+	individual loose objects.
+
 [[OUTPUT]]
 OUTPUT
 ------
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 0de42aecf4..672ebd4c54 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -18,6 +18,7 @@
 #include "quote.h"
 #include "tree.h"
 #include "config.h"
+#include "bulk-checkin.h"
 
 static int line_termination = '\n';
 
@@ -414,6 +415,7 @@ struct merge_tree_options {
 	int show_messages;
 	int name_only;
 	int use_stdin;
+	int write_pack;
 };
 
 static int real_merge(struct merge_tree_options *o,
@@ -440,6 +442,7 @@ static int real_merge(struct merge_tree_options *o,
 	init_merge_options(&opt, the_repository);
 
 	opt.show_rename_progress = 0;
+	opt.write_pack = o->write_pack;
 
 	opt.branch1 = branch1;
 	opt.branch2 = branch2;
@@ -548,6 +551,8 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &merge_base,
 			   N_("commit"),
 			   N_("specify a merge-base for the merge")),
+		OPT_BOOL(0, "write-pack", &o.write_pack,
+			 N_("write new objects to a pack instead of as loose")),
 		OPT_END()
 	};
 
diff --git a/merge-ort.c b/merge-ort.c
index 3653725661..523577d71e 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -48,6 +48,7 @@
 #include "tree.h"
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
+#include "bulk-checkin.h"
 
 /*
  * We have many arrays of size 3. Whenever we have such an array, the
@@ -2108,10 +2109,19 @@ static int handle_content_merge(struct merge_options *opt,
 		if ((merge_status < 0) || !result_buf.ptr)
 			ret = error(_("failed to execute internal merge"));
 
-		if (!ret &&
-		    write_object_file(result_buf.ptr, result_buf.size,
-				      OBJ_BLOB, &result->oid))
-			ret = error(_("unable to add %s to database"), path);
+		if (!ret) {
+			ret = opt->write_pack
+				? index_blob_bulk_checkin_incore(&result->oid,
+								 result_buf.ptr,
+								 result_buf.size,
+								 path, 1)
+				: write_object_file(result_buf.ptr,
+						    result_buf.size,
+						    OBJ_BLOB, &result->oid);
+			if (ret)
+				ret = error(_("unable to add %s to database"),
+					    path);
+		}
 
 		free(result_buf.ptr);
 		if (ret)
@@ -3597,7 +3607,8 @@ static int tree_entry_order(const void *a_, const void *b_)
 				 b->string, strlen(b->string), bmi->result.mode);
 }
 
-static int write_tree(struct object_id *result_oid,
+static int write_tree(struct merge_options *opt,
+		      struct object_id *result_oid,
 		      struct string_list *versions,
 		      unsigned int offset,
 		      size_t hash_size)
@@ -3631,8 +3642,14 @@ static int write_tree(struct object_id *result_oid,
 	}
 
 	/* Write this object file out, and record in result_oid */
-	if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid))
+	ret = opt->write_pack
+		? index_tree_bulk_checkin_incore(result_oid,
+						 buf.buf, buf.len, "", 1)
+		: write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);
+
+	if (ret)
 		ret = -1;
+
 	strbuf_release(&buf);
 	return ret;
 }
@@ -3797,8 +3814,8 @@ static int write_completed_directory(struct merge_options *opt,
 		 */
 		dir_info->is_null = 0;
 		dir_info->result.mode = S_IFDIR;
-		if (write_tree(&dir_info->result.oid, &info->versions, offset,
-			       opt->repo->hash_algo->rawsz) < 0)
+		if (write_tree(opt, &dir_info->result.oid, &info->versions,
+			       offset, opt->repo->hash_algo->rawsz) < 0)
 			ret = -1;
 	}
 
@@ -4332,9 +4349,13 @@ static int process_entries(struct merge_options *opt,
 		fflush(stdout);
 		BUG("dir_metadata accounting completely off; shouldn't happen");
 	}
-	if (write_tree(result_oid, &dir_metadata.versions, 0,
+	if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
 		       opt->repo->hash_algo->rawsz) < 0)
 		ret = -1;
+
+	if (opt->write_pack)
+		end_odb_transaction();
+
 cleanup:
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
@@ -4878,6 +4899,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 */
 	strmap_init(&opt->priv->conflicts);
 
+	if (opt->write_pack)
+		begin_odb_transaction();
+
 	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
diff --git a/merge-recursive.h b/merge-recursive.h
index b88000e3c2..156e160876 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -48,6 +48,7 @@ struct merge_options {
 	unsigned renormalize : 1;
 	unsigned record_conflict_msgs_as_headers : 1;
 	const char *msg_header_prefix;
+	unsigned write_pack : 1;
 
 	/* internal fields used by the implementation */
 	struct merge_options_internal *priv;
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 250f721795..2d81ff4de5 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -922,4 +922,97 @@ test_expect_success 'check the input format when --stdin is passed' '
 	test_cmp expect actual
 '
 
+packdir=".git/objects/pack"
+
+test_expect_success 'merge-tree can pack its result with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	# base has lines [3, 4, 5]
+	#   - side adds to the beginning, resulting in [1, 2, 3, 4, 5]
+	#   - other adds to the end, resulting in [3, 4, 5, 6, 7]
+	#
+	# merging the two should result in a new blob object containing
+	# [1, 2, 3, 4, 5, 6, 7], along with a new tree.
+	test_commit -C repo base file "$(test_seq 3 5)" &&
+	git -C repo branch -M main &&
+	git -C repo checkout -b side main &&
+	test_commit -C repo side file "$(test_seq 1 5)" &&
+	git -C repo checkout -b other main &&
+	test_commit -C repo other file "$(test_seq 3 7)" &&
+
+	find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
+	tree="$(git -C repo merge-tree --write-pack \
+		refs/tags/side refs/tags/other)" &&
+	blob="$(git -C repo rev-parse $tree:file)" &&
+	find repo/$packdir -type f -name "pack-*.idx" >packs.after &&
+
+	test_must_be_empty packs.before &&
+	test_line_count = 1 packs.after &&
+
+	git show-index <$(cat packs.after) >objects &&
+	test_line_count = 2 objects &&
+	grep "^[1-9][0-9]* $tree" objects &&
+	grep "^[1-9][0-9]* $blob" objects
+'
+
+test_expect_success 'merge-tree can write multiple packs with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		git config pack.packSizeLimit 512 &&
+
+		test_seq 512 >f &&
+
+		# "f" contains roughly 2,000 bytes.
+		#
+		# Each side ("foo" and "bar") adds a small amount of data at the
+		# beginning and end of "base", respectively.
+		git add f &&
+		test_tick &&
+		git commit -m base &&
+		git branch -M main &&
+
+		git checkout -b foo main &&
+		{
+			echo foo && cat f
+		} >f.tmp &&
+		mv f.tmp f &&
+		git add f &&
+		test_tick &&
+		git commit -m foo &&
+
+		git checkout -b bar main &&
+		echo bar >>f &&
+		git add f &&
+		test_tick &&
+		git commit -m bar &&
+
+		find $packdir -type f -name "pack-*.idx" >packs.before &&
+		# Merging the two sides should result in a new blob which exceeds
+		# the 512-byte pack.packSizeLimit set above, thus the result
+		# should be split into two separate packs.
+		tree="$(git merge-tree --write-pack \
+			refs/heads/foo refs/heads/bar)" &&
+		blob="$(git rev-parse $tree:f)" &&
+		find $packdir -type f -name "pack-*.idx" >packs.after &&
+
+		test_must_be_empty packs.before &&
+		test_line_count = 2 packs.after &&
+		for idx in $(cat packs.after)
+		do
+			git show-index <$idx || return 1
+		done >objects &&
+
+		# The resulting set of packs should contain one copy of each
+		# object, each in a separate pack.
+		test_line_count = 2 objects &&
+		grep "^[1-9][0-9]* $tree" objects &&
+		grep "^[1-9][0-9]* $blob" objects
+
+	)
+'
+
 test_done
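
For readers following the transaction handling in this series, the
interplay between `begin_odb_transaction()`, the
`!odb_transaction_nesting` check in the `index_*_bulk_checkin_incore()`
wrappers, and `end_odb_transaction()` can be modelled in isolation. What
follows is a toy sketch, not Git code: every `toy_*` name is
hypothetical, the "pack" is reduced to a pair of counters, and the
assumption that ending the outermost transaction flushes the pending
pack is inferred from how merge-ort.c uses the API rather than shown in
these hunks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bulk-checkin pack state; not Git's type. */
struct toy_packfile {
	int pending;	/* objects written to the in-progress pack */
	int flushed;	/* number of packs finalized so far */
};

static struct toy_packfile toy_pack;
static int toy_nesting;

/* Finalize the in-progress pack, if it holds any objects. */
static void toy_flush(struct toy_packfile *p)
{
	if (!p->pending)
		return;
	p->flushed++;
	p->pending = 0;
}

static void toy_begin_transaction(void)
{
	toy_nesting += 1;
}

/* Assumption: only the outermost end flushes the pending pack. */
static void toy_end_transaction(void)
{
	toy_nesting -= 1;
	assert(toy_nesting >= 0);
	if (!toy_nesting)
		toy_flush(&toy_pack);
}

/*
 * Same shape as the index_*_bulk_checkin_incore() wrappers: "write" the
 * object, then flush right away unless a transaction is open.
 */
static int toy_index_object(const void *buf, size_t size)
{
	(void)buf;
	(void)size;
	toy_pack.pending++;
	if (!toy_nesting)
		toy_flush(&toy_pack);
	return 0;
}
```

In this model, two calls to `toy_index_object()` with no transaction
open finalize two packs, while the same two calls wrapped in
`toy_begin_transaction()` and `toy_end_transaction()` finalize one. That
is the property `--write-pack` leans on when `merge_start()` opens the
transaction and `process_entries()` closes it (leaving aside splits
forced by `pack.packSizeLimit`).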