From patchwork Sat Oct 9 08:20:58 2021
X-Patchwork-Submitter: Han Xin
X-Patchwork-Id: 12547359
From: Han Xin
To: Junio C Hamano, Git List, Jeff King, Jiang Xin
Cc: Han Xin
Subject: [PATCH] unpack-objects: unpack large object in stream
Date: Sat, 9 Oct 2021 16:20:58 +0800
Message-Id: <20211009082058.41138-1-chiyutianyi@gmail.com>
X-Mailer: git-send-email 2.33.0.1.g09a6bb964f.dirty
X-Mailing-List: git@vger.kernel.org

"unpack_non_delta_entry()" allocates a buffer for the entire unpacked
object and then writes that buffer to a loose file on disk. This can
lead to OOM in the git-unpack-objects process when unpacking a very
large object.

"unpack_delta_entry()" likewise allocates a buffer for the whole delta,
but since an object larger than "core.bigFileThreshold" is never stored
as a delta, the problem is less severe there.

To resolve the OOM issue in "git-unpack-objects", unpack large objects
to a file as a stream, and use the "core.bigFileThreshold" setting as
the threshold for what counts as a large object.

Reviewed-by: Jiang Xin
Signed-off-by: Han Xin
---
 builtin/unpack-objects.c          |  41 +++++++-
 object-file.c                     | 149 +++++++++++++++++++++++++++---
 object-store.h                    |   9 ++
 t/t5590-receive-unpack-objects.sh |  92 ++++++++++++++++++
 4 files changed, 279 insertions(+), 12 deletions(-)
 create mode 100755 t/t5590-receive-unpack-objects.sh

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a9466295b..8ac77e60a8 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -320,11 +320,50 @@ static void added_object(unsigned nr, enum object_type type,
 	}
 }
 
+static void fill_stream(struct git_zstream *stream)
+{
+	stream->next_in = fill(1);
+	stream->avail_in = len;
+}
+
+static void use_stream(struct git_zstream *stream)
+{
+	use(len - stream->avail_in);
+}
+
+static void write_stream_blob(unsigned nr, unsigned long size)
+{
+	struct git_zstream_reader reader;
+	struct object_id *oid = &obj_list[nr].oid;
+
+	reader.fill = &fill_stream;
+	reader.use = &use_stream;
+
+	if (write_stream_object_file(&reader, size, type_name(OBJ_BLOB),
+				     oid, dry_run))
+		die("failed to write object in stream");
+	if (strict && !dry_run) {
+		struct blob *blob = lookup_blob(the_repository, oid);
+		if (blob)
+			blob->object.flags |= FLAG_WRITTEN;
+		else
+			die("invalid blob object from stream");
+	}
+	obj_list[nr].obj = NULL;
+}
+
 static void unpack_non_delta_entry(enum object_type type, unsigned long size,
 				   unsigned nr)
 {
-	void *buf = get_data(size);
+	void *buf;
+
+	/* Write large blob in stream without allocating full buffer. */
+	if (type == OBJ_BLOB && size > big_file_threshold) {
+		write_stream_blob(nr, size);
+		return;
+	}
 
+	buf = get_data(size);
 	if (!dry_run && buf)
 		write_object(nr, type, buf, size);
 	else
diff --git a/object-file.c b/object-file.c
index a8be899481..06c1693675 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1913,6 +1913,28 @@ static int create_tmpfile(struct strbuf *tmp, const char *filename)
 	return fd;
 }
 
+static int write_object_buffer(struct git_zstream *stream, git_hash_ctx *c,
+			       int fd, unsigned char *compressed,
+			       int compressed_len, const void *buf,
+			       size_t len, int flush)
+{
+	int ret;
+
+	stream->next_in = (void *)buf;
+	stream->avail_in = len;
+	do {
+		unsigned char *in0 = stream->next_in;
+		ret = git_deflate(stream, flush);
+		the_hash_algo->update_fn(c, in0, stream->next_in - in0);
+		if (write_buffer(fd, compressed, stream->next_out - compressed) < 0)
+			die(_("unable to write loose object file"));
+		stream->next_out = compressed;
+		stream->avail_out = compressed_len;
+	} while (ret == Z_OK);
+
+	return ret;
+}
+
 static int write_loose_object(const struct object_id *oid, char *hdr,
 			      int hdrlen, const void *buf, unsigned long len,
 			      time_t mtime)
@@ -1949,17 +1971,9 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	the_hash_algo->update_fn(&c, hdr, hdrlen);
 
 	/* Then the data itself.. */
-	stream.next_in = (void *)buf;
-	stream.avail_in = len;
-	do {
-		unsigned char *in0 = stream.next_in;
-		ret = git_deflate(&stream, Z_FINISH);
-		the_hash_algo->update_fn(&c, in0, stream.next_in - in0);
-		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
-			die(_("unable to write loose object file"));
-		stream.next_out = compressed;
-		stream.avail_out = sizeof(compressed);
-	} while (ret == Z_OK);
+	ret = write_object_buffer(&stream, &c, fd, compressed,
+				  sizeof(compressed), buf, len,
+				  Z_FINISH);
 
 	if (ret != Z_STREAM_END)
 		die(_("unable to deflate new object %s (%d)"), oid_to_hex(oid),
@@ -2020,6 +2034,119 @@ int write_object_file(const void *buf, unsigned long len, const char *type,
 	return write_loose_object(oid, hdr, hdrlen, buf, len, 0);
 }
 
+int write_stream_object_file(struct git_zstream_reader *reader,
+			     unsigned long len, const char *type,
+			     struct object_id *oid,
+			     int dry_run)
+{
+	git_zstream istream, ostream;
+	unsigned char buf[8192], compressed[4096];
+	char hdr[MAX_HEADER_LEN];
+	int istatus, ostatus, fd = 0, hdrlen, dirlen, flush = 0;
+	int ret = 0;
+	git_hash_ctx c;
+	struct strbuf tmp_file = STRBUF_INIT;
+	struct strbuf filename = STRBUF_INIT;
+
+	/* Write tmpfile in objects dir, because oid is unknown */
+	if (!dry_run) {
+		strbuf_addstr(&filename, the_repository->objects->odb->path);
+		strbuf_addch(&filename, '/');
+		fd = create_tmpfile(&tmp_file, filename.buf);
+		if (fd < 0) {
+			if (errno == EACCES)
+				ret = error(_("insufficient permission for adding an object to repository database %s"),
+					get_object_directory());
+			else
+				ret = error_errno(_("unable to create temporary file"));
+			goto cleanup;
+		}
+	}
+
+	memset(&istream, 0, sizeof(istream));
+	istream.next_out = buf;
+	istream.avail_out = sizeof(buf);
+	git_inflate_init(&istream);
+
+	if (!dry_run) {
+		/* Set it up */
+		git_deflate_init(&ostream, zlib_compression_level);
+		ostream.next_out = compressed;
+		ostream.avail_out = sizeof(compressed);
+		the_hash_algo->init_fn(&c);
+
+		/* First header */
+		hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %" PRIuMAX, type,
+				   (uintmax_t)len) + 1;
+		ostream.next_in = (unsigned char *)hdr;
+		ostream.avail_in = hdrlen;
+		while (git_deflate(&ostream, 0) == Z_OK)
+			; /* nothing */
+		the_hash_algo->update_fn(&c, hdr, hdrlen);
+	}
+
+	/* Then the data itself */
+	do {
+		unsigned char *last_out = istream.next_out;
+		reader->fill(&istream);
+		istatus = git_inflate(&istream, 0);
+		if (istatus == Z_STREAM_END)
+			flush = Z_FINISH;
+		reader->use(&istream);
+		if (!dry_run)
+			ostatus = write_object_buffer(&ostream, &c, fd, compressed,
+						      sizeof(compressed), last_out,
+						      istream.next_out - last_out,
+						      flush);
+		istream.next_out = buf;
+		istream.avail_out = sizeof(buf);
+	} while (istatus == Z_OK);
+
+	if (istream.total_out != len || istatus != Z_STREAM_END)
+		die( _("inflate returned %d"), istatus);
+	git_inflate_end(&istream);
+
+	if (dry_run)
+		goto cleanup;
+
+	if (ostatus != Z_STREAM_END)
+		die(_("unable to deflate new object (%d)"), ostatus);
+	ostatus = git_deflate_end_gently(&ostream);
+	if (ostatus != Z_OK)
+		die(_("deflateEnd on object failed (%d)"), ostatus);
+	the_hash_algo->final_fn(oid->hash, &c);
+	close_loose_object(fd);
+
+	/* We get the oid now */
+	loose_object_path(the_repository, &filename, oid);
+
+	dirlen = directory_size(filename.buf);
+	if (dirlen) {
+		struct strbuf dir = STRBUF_INIT;
+		/*
+		 * Make sure the directory exists; note that the contents
+		 * of the buffer are undefined after mkstemp returns an
+		 * error, so we have to rewrite the whole buffer from
+		 * scratch.
+		 */
+		strbuf_add(&dir, filename.buf, dirlen - 1);
+		if (mkdir(dir.buf, 0777) && errno != EEXIST) {
+			unlink_or_warn(tmp_file.buf);
+			strbuf_release(&dir);
+			ret = -1;
+			goto cleanup;
+		}
+		strbuf_release(&dir);
+	}
+
+	ret = finalize_object_file(tmp_file.buf, filename.buf);
+
+cleanup:
+	strbuf_release(&tmp_file);
+	strbuf_release(&filename);
+	return ret;
+}
+
 int hash_object_file_literally(const void *buf, unsigned long len,
 			       const char *type, struct object_id *oid,
 			       unsigned flags)
diff --git a/object-store.h b/object-store.h
index d24915ced1..12b113ef93 100644
--- a/object-store.h
+++ b/object-store.h
@@ -33,6 +33,11 @@ struct object_directory {
 	char *path;
 };
 
+struct git_zstream_reader {
+	void (*fill)(struct git_zstream *);
+	void (*use)(struct git_zstream *);
+};
+
 KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
 	struct object_directory *, 1, fspathhash, fspatheq)
 
@@ -225,6 +230,10 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 int write_object_file(const void *buf, unsigned long len,
 		      const char *type, struct object_id *oid);
 
+int write_stream_object_file(struct git_zstream_reader *reader,
+			     unsigned long len, const char *type,
+			     struct object_id *oid, int dry_run);
+
 int hash_object_file_literally(const void *buf, unsigned long len,
 			       const char *type, struct object_id *oid,
 			       unsigned flags);
diff --git a/t/t5590-receive-unpack-objects.sh b/t/t5590-receive-unpack-objects.sh
new file mode 100755
index 0000000000..7e63dfc0db
--- /dev/null
+++ b/t/t5590-receive-unpack-objects.sh
@@ -0,0 +1,92 @@
+#!/bin/sh
+#
+# Copyright (c) 2021 Han Xin
+#
+
+test_description='Test unpack-objects when receive pack'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success "create commit with big blobs (1.5 MB)" '
+	test-tool genrandom foo 1500000 >big-blob &&
+	test_commit --append foo big-blob &&
+	test-tool genrandom bar 1500000 >big-blob &&
+	test_commit --append bar big-blob &&
+	(
+		cd .git &&
+		find objects/?? -type f | sort
+	) >expect &&
+	git repack -ad
+'
+
+test_expect_success 'setup GIT_ALLOC_LIMIT to 1MB' '
+	GIT_ALLOC_LIMIT=1m &&
+	export GIT_ALLOC_LIMIT
+'
+
+test_expect_success 'prepare dest repository' '
+	git init --bare dest.git &&
+	git -C dest.git config core.bigFileThreshold 2m &&
+	git -C dest.git config receive.unpacklimit 100
+'
+
+test_expect_success 'fail to push: cannot allocate' '
+	test_must_fail git push dest.git HEAD 2>err &&
+	test_i18ngrep "remote: fatal: attempting to allocate" err &&
+	(
+		cd dest.git &&
+		find objects/?? -type f | sort
+	) >actual &&
+	! test_cmp expect actual
+'
+
+test_expect_success 'set a lower bigfile threshold' '
+	git -C dest.git config core.bigFileThreshold 1m
+'
+
+test_expect_success 'unpack big object in stream' '
+	git push dest.git HEAD &&
+	git -C dest.git fsck &&
+	(
+		cd dest.git &&
+		find objects/?? -type f | sort
+	) >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'setup for unpack-objects dry-run test' '
+	PACK=$(echo main | git pack-objects --progress --revs test) &&
+	unset GIT_ALLOC_LIMIT &&
+	git init --bare unpack-test.git
+'
+
+test_expect_success 'unpack-objects dry-run with large threshold' '
+	(
+		cd unpack-test.git &&
+		git config core.bigFileThreshold 2m &&
+		git unpack-objects -n <../test-$PACK.pack
+	) &&
+	(
+		cd unpack-test.git &&
+		find objects/ -type f
+	) >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'unpack-objects dry-run with small threshold' '
+	(
+		cd unpack-test.git &&
+		git config core.bigFileThreshold 1m &&
+		git unpack-objects -n <../test-$PACK.pack
+	) &&
+	(
+		cd unpack-test.git &&
+		find objects/ -type f
+	) >actual &&
+	test_must_be_empty actual
+'
+
+test_done

From patchwork Fri Nov 12 09:40:06 2021
X-Patchwork-Submitter: Han Xin
X-Patchwork-Id: 12616417
From: Han Xin
To: Junio C Hamano, Git List, Jeff King, Jiang Xin, Philip Oakley
Cc: Han Xin
Subject: [PATCH v2 2/6] object-file.c: add dry_run mode for write_loose_object()
Date: Fri, 12 Nov 2021 17:40:06 +0800
Message-Id: <20211112094010.73468-2-chiyutianyi@gmail.com>
X-Mailer: git-send-email 2.33.1.44.g9344627884.agit.6.5.4
In-Reply-To: <20211009082058.41138-1-chiyutianyi@gmail.com>
References: <20211009082058.41138-1-chiyutianyi@gmail.com>
X-Mailing-List: git@vger.kernel.org

We will later use "write_loose_object()" to handle large blob objects,
which requires it to work in dry_run mode.

Helped-by: Jiang Xin
Signed-off-by: Han Xin
---
 object-file.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/object-file.c b/object-file.c
index 1ad2cb579c..b0838c847e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1880,9 +1880,10 @@ static const char *read_input_stream_from_buffer(void *data, unsigned long *len)
 
 static int write_loose_object(const struct object_id *oid, char *hdr,
 			      int hdrlen, struct input_stream *in_stream,
+			      int dry_run,
 			      time_t mtime, unsigned flags)
 {
-	int fd, ret;
+	int fd, ret = 0;
 	unsigned char compressed[4096];
 	git_zstream stream;
 	git_hash_ctx c;
@@ -1894,14 +1895,16 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 
 	loose_object_path(the_repository, &filename, oid);
 
-	fd = create_tmpfile(&tmp_file, filename.buf);
-	if (fd < 0) {
-		if (flags & HASH_SILENT)
-			return -1;
-		else if (errno == EACCES)
-			return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory());
-		else
-			return error_errno(_("unable to create temporary file"));
+	if (!dry_run) {
+		fd = create_tmpfile(&tmp_file, filename.buf);
+		if (fd < 0) {
+			if (flags & HASH_SILENT)
+				return -1;
+			else if (errno == EACCES)
+				return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory());
+			else
+				return error_errno(_("unable to create temporary file"));
+		}
 	}
 
 	/* Set it up */
@@ -1925,7 +1928,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		unsigned char *in0 = stream.next_in;
 		ret = git_deflate(&stream, Z_FINISH);
 		the_hash_algo->update_fn(&c, in0, stream.next_in - in0);
-		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
+		if (!dry_run && write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die(_("unable to write loose object file"));
 		stream.next_out = compressed;
 		stream.avail_out = sizeof(compressed);
@@ -1943,6 +1946,9 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
+	if (dry_run)
+		return 0;
+
 	close_loose_object(fd);
 
 	if (mtime) {
@@ -1996,7 +2002,7 @@ int write_object_file_flags(const void *buf, unsigned long len,
 				  &hdrlen);
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		return 0;
-	return write_loose_object(oid, hdr, hdrlen, &in_stream, 0, flags);
+	return write_loose_object(oid, hdr, hdrlen, &in_stream, 0, 0, flags);
 }
 
 int hash_object_file_literally(const void *buf, unsigned long len,
@@ -2023,7 +2029,7 @@ int hash_object_file_literally(const void *buf, unsigned long len,
 		goto cleanup;
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		goto cleanup;
-	status = write_loose_object(oid, header, hdrlen, &in_stream, 0, 0);
+	status = write_loose_object(oid, header, hdrlen, &in_stream, 0, 0, 0);
 
 cleanup:
 	free(header);
@@ -2052,7 +2058,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 	data.buf = buf;
 	data.len = len;
 	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX , type_name(type), (uintmax_t)len) + 1;
-	ret = write_loose_object(oid, hdr, hdrlen, &in_stream, mtime, 0);
+	ret = write_loose_object(oid, hdr, hdrlen, &in_stream, 0, mtime, 0);
 	free(buf);
 
 	return ret;

From patchwork Fri Nov 12 09:40:07 2021
X-Patchwork-Submitter: Han Xin
X-Patchwork-Id: 12616419
From: Han Xin
To: Junio C Hamano, Git List, Jeff King, Jiang Xin, Philip Oakley
Cc: Han Xin
Subject: [PATCH v2 3/6] object-file.c: handle nil oid in write_loose_object()
Date: Fri, 12 Nov 2021 17:40:07 +0800
Message-Id: <20211112094010.73468-3-chiyutianyi@gmail.com>
X-Mailer: git-send-email 2.33.1.44.g9344627884.agit.6.5.4
In-Reply-To: <20211009082058.41138-1-chiyutianyi@gmail.com>
References: <20211009082058.41138-1-chiyutianyi@gmail.com>
X-Mailing-List: git@vger.kernel.org

When reading from an input stream, the oid cannot be known until all of
the data has been read; it is filled in once reading completes.

Helped-by: Jiang Xin
Signed-off-by: Han Xin
---
 object-file.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index b0838c847e..8393659f0d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1893,7 +1893,13 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	const char *buf;
 	unsigned long len;
 
-	loose_object_path(the_repository, &filename, oid);
+	if (is_null_oid(oid)) {
+		/* When oid is not determined, save tmp file to odb path. */
+		strbuf_reset(&filename);
+		strbuf_addstr(&filename, the_repository->objects->odb->path);
+		strbuf_addch(&filename, '/');
+	} else
+		loose_object_path(the_repository, &filename, oid);
 
 	if (!dry_run) {
 		fd = create_tmpfile(&tmp_file, filename.buf);
@@ -1942,7 +1948,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid),
 		    ret);
 	the_hash_algo->final_oid_fn(&parano_oid, &c);
-	if (!oideq(oid, &parano_oid))
+	if (!is_null_oid(oid) && !oideq(oid, &parano_oid))
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
@@ -1951,6 +1957,30 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 
 	close_loose_object(fd);
 
+	if (is_null_oid(oid)) {
+		int dirlen;
+
+		/* copy oid */
+		oidcpy((struct object_id *)oid, &parano_oid);
+		/* We get the oid now */
+		loose_object_path(the_repository, &filename, oid);
+
+		dirlen = directory_size(filename.buf);
+		if (dirlen) {
+			struct strbuf dir = STRBUF_INIT;
+			/*
+			 * Make sure the directory exists; note that the
+			 * contents of the buffer are undefined after mkstemp
+			 * returns an error, so we have to rewrite the whole
+			 * buffer from scratch.
+ */ + strbuf_reset(&dir); + strbuf_add(&dir, filename.buf, dirlen - 1); + if (mkdir(dir.buf, 0777) && errno != EEXIST) + return -1; + } + } + if (mtime) { struct utimbuf utb; utb.actime = mtime; From patchwork Fri Nov 12 09:40:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han Xin X-Patchwork-Id: 12616421 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6FCEFC433F5 for ; Fri, 12 Nov 2021 09:42:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 46515600CC for ; Fri, 12 Nov 2021 09:42:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234831AbhKLJpA (ORCPT ); Fri, 12 Nov 2021 04:45:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234803AbhKLJo4 (ORCPT ); Fri, 12 Nov 2021 04:44:56 -0500 Received: from mail-pj1-x1036.google.com (mail-pj1-x1036.google.com [IPv6:2607:f8b0:4864:20::1036]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE0E3C061766 for ; Fri, 12 Nov 2021 01:42:05 -0800 (PST) Received: by mail-pj1-x1036.google.com with SMTP id w33-20020a17090a6ba400b001a722a06212so5861988pjj.0 for ; Fri, 12 Nov 2021 01:42:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=j/6iXnqxfJ6pUit3Gd/WT6piNiAGRGDrqssqtaQL7Iw=; b=khrEwNw3GN+OyYBasLXUwHArobA9HINwpx4UDBOId6sVUqSMHjxfF4gQ2u3//PXYKE ar6RYT7fHcU8VjKbhzGDEJMCeVI42P6zavmP91qp6i6pngHo2U+3YS7ttaYhwQLyqYcS IwDwgGB9sM8+MmH3xO43v/U938+uNNOUCfHa25Z/ew2CYcZEYmNOd0f5c4HbZCcz057s 
From: Han Xin
To: Junio C Hamano, Git List, Jeff King, Jiang Xin, Philip Oakley
Cc: Han Xin
Subject: [PATCH v2 4/6] object-file.c: read input stream repeatedly in write_loose_object()
Date: Fri, 12 Nov 2021 17:40:08 +0800
Message-Id: <20211112094010.73468-4-chiyutianyi@gmail.com>
X-Mailer: git-send-email 2.33.1.44.g9344627884.agit.6.5.4
In-Reply-To: <20211009082058.41138-1-chiyutianyi@gmail.com>
References: <20211009082058.41138-1-chiyutianyi@gmail.com>

From: Han Xin

Read the input stream repeatedly in write_loose_object() until it is
exhausted, so that the write of a large blob is divided into many
small blocks.

Signed-off-by: Han Xin
---
 object-file.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/object-file.c b/object-file.c
index 8393659f0d..e333448c54 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1891,7 +1891,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
 	const char *buf;
-	unsigned long len;
+	int flush = 0;
 
 	if (is_null_oid(oid)) {
 		/* When oid is not determined, save tmp file to odb path. */
@@ -1927,12 +1927,16 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	the_hash_algo->update_fn(&c, hdr, hdrlen);
 
 	/* Then the data itself.. */
-	buf = in_stream->read(in_stream->data, &len);
-	stream.next_in = (void *)buf;
-	stream.avail_in = len;
 	do {
 		unsigned char *in0 = stream.next_in;
-		ret = git_deflate(&stream, Z_FINISH);
+		if (!stream.avail_in) {
+			if ((buf = in_stream->read(in_stream->data, &stream.avail_in))) {
+				stream.next_in = (void *)buf;
+				in0 = (unsigned char *)buf;
+			} else
+				flush = Z_FINISH;
+		}
+		ret = git_deflate(&stream, flush);
 		the_hash_algo->update_fn(&c, in0, stream.next_in - in0);
 		if (!dry_run && write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die(_("unable to write loose object file"));

From patchwork Fri Nov 12 09:40:09 2021
X-Patchwork-Submitter: Han Xin
X-Patchwork-Id: 12616423
From: Han Xin
To: Junio C Hamano, Git List, Jeff King, Jiang Xin, Philip Oakley
Cc: Han Xin
Subject: [PATCH v2 5/6] object-store.h: add write_loose_object()
Date: Fri, 12 Nov 2021 17:40:09 +0800
Message-Id: <20211112094010.73468-5-chiyutianyi@gmail.com>
X-Mailer: git-send-email 2.33.1.44.g9344627884.agit.6.5.4
In-Reply-To: <20211009082058.41138-1-chiyutianyi@gmail.com>
References: <20211009082058.41138-1-chiyutianyi@gmail.com>

From: Han Xin

For large loose object files, it should be possible to stream the
data directly to disk with "write_loose_object()". Unlike
"write_object_file()", the caller must implement an "input_stream"
instead of passing a "void *buf".
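A minimal sketch of what implementing such an "input_stream" might
look like, assuming the struct carries a read callback and an opaque
data pointer, as its use elsewhere in this series suggests. The struct
below is a local stand-in, and "chunked_source", "read_next_chunk()"
and "drain()" are hypothetical names used only for illustration. End
of input is signalled by returning NULL with *len set to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the series' "struct input_stream" (layout assumed). */
struct input_stream {
	const char *(*read)(void *data, unsigned long *len);
	void *data;
};

/* Hypothetical source: hands out a memory region in fixed-size chunks. */
struct chunked_source {
	const char *base;     /* start of the data */
	unsigned long size;   /* total number of bytes */
	unsigned long pos;    /* bytes already handed out */
	unsigned long chunk;  /* maximum bytes returned per call */
};

/* Return the next chunk, or NULL (with *len == 0) once exhausted. */
static const char *read_next_chunk(void *data, unsigned long *len)
{
	struct chunked_source *src = data;
	unsigned long left;
	const char *p;

	assert(src->pos <= src->size);
	left = src->size - src->pos;
	if (!left) {
		*len = 0;
		return NULL;
	}
	*len = left < src->chunk ? left : src->chunk;
	p = src->base + src->pos;
	src->pos += *len;
	return p;
}

/* Pull from the stream until it ends; report bytes seen and calls made. */
static unsigned long drain(struct input_stream *in, unsigned long *calls)
{
	unsigned long total = 0, len = 0;

	*calls = 0;
	while (in->read(in->data, &len)) {
		total += len;
		(*calls)++;
	}
	return total;
}
```

A caller would fill in a "struct input_stream" with the callback and
its state and hand it to the consumer; the consumer never needs to
know the total size up front or hold more than one chunk at a time.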
Signed-off-by: Han Xin
---
 object-file.c  | 8 ++++----
 object-store.h | 5 +++++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/object-file.c b/object-file.c
index e333448c54..60eb29db97 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1878,10 +1878,10 @@ static const char *read_input_stream_from_buffer(void *data, unsigned long *len)
 	return input->buf;
 }
 
-static int write_loose_object(const struct object_id *oid, char *hdr,
-			      int hdrlen, struct input_stream *in_stream,
-			      int dry_run,
-			      time_t mtime, unsigned flags)
+int write_loose_object(const struct object_id *oid, char *hdr,
+		       int hdrlen, struct input_stream *in_stream,
+		       int dry_run,
+		       time_t mtime, unsigned flags)
 {
 	int fd, ret = 0;
 	unsigned char compressed[4096];
diff --git a/object-store.h b/object-store.h
index f1b67e9100..f6faa8d6d3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -228,6 +228,11 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 		     unsigned long len, const char *type,
 		     struct object_id *oid);
 
+int write_loose_object(const struct object_id *oid, char *hdr,
+		       int hdrlen, struct input_stream *in_stream,
+		       int dry_run,
+		       time_t mtime, unsigned flags);
+
 int write_object_file_flags(const void *buf, unsigned long len,
 			    const char *type, struct object_id *oid,
 			    unsigned flags);

From patchwork Fri Nov 12 09:40:10 2021
X-Patchwork-Submitter: Han Xin
X-Patchwork-Id: 12616425
From: Han Xin
To: Junio C Hamano, Git List, Jeff King, Jiang Xin, Philip Oakley
Cc: Han Xin
Subject: [PATCH v2 6/6] unpack-objects: unpack large object in stream
Date: Fri, 12 Nov 2021 17:40:10 +0800
Message-Id: <20211112094010.73468-6-chiyutianyi@gmail.com>
X-Mailer: git-send-email 2.33.1.44.g9344627884.agit.6.5.4
In-Reply-To: <20211009082058.41138-1-chiyutianyi@gmail.com>
References: <20211009082058.41138-1-chiyutianyi@gmail.com>

From: Han Xin

"unpack_non_delta_entry()" allocates a buffer for the full size of
the unpacked object and then writes that buffer to a loose file on
disk. This may lead to OOM for the git-unpack-objects process when
unpacking a very large object.

"unpack_delta_entry()" also allocates a buffer for the whole delta,
but since no delta is generated for an object larger than
"core.bigFileThreshold", the issue is less severe there.

To resolve the OOM issue in "git-unpack-objects", unpack large
objects to a file as a stream, and use "core.bigFileThreshold" as the
size above which blobs are streamed, so that "get_data()" is never
asked to allocate such an object in full.
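The bounded-memory idea can be sketched without zlib. This is only an
illustration, not the code in this series: "toy_stream" and
"toy_step()" are hypothetical stand-ins for a zlib-like stream that
may leave input unconsumed, and "chunk_list", "next_chunk()" and
"pump()" are likewise invented names. The loop refills the input only
when it has been fully consumed and switches to "finish" once the
reader returns NULL.

```c
#include <assert.h>
#include <string.h>

enum { STEP_MORE = 0, STEP_FINISH = 1 };

/* Hypothetical stand-in for a zlib-like stream state. */
struct toy_stream {
	const unsigned char *next_in;
	unsigned long avail_in;
	unsigned long total_in;
	int done;
};

/* Consume at most 3 bytes per call, so input is often left over. */
static void toy_step(struct toy_stream *s, int flush)
{
	unsigned long n = s->avail_in < 3 ? s->avail_in : 3;

	s->next_in += n;
	s->avail_in -= n;
	s->total_in += n;
	if (flush == STEP_FINISH && !s->avail_in)
		s->done = 1;
}

/* Hypothetical reader: returns one string per call, then NULL. */
struct chunk_list {
	const char **chunks;
	unsigned long count, next;
};

static const char *next_chunk(void *data, unsigned long *len)
{
	struct chunk_list *l = data;

	if (l->next == l->count) {
		*len = 0;
		return NULL;
	}
	*len = strlen(l->chunks[l->next]);
	return l->chunks[l->next++];
}

/* Pull input on demand; only one chunk is referenced at any time. */
static unsigned long pump(const char *(*read)(void *, unsigned long *),
			  void *data)
{
	struct toy_stream s;
	int flush = STEP_MORE;

	memset(&s, 0, sizeof(s));
	do {
		if (!s.avail_in) {
			const char *buf = read(data, &s.avail_in);
			if (buf)
				s.next_in = (const unsigned char *)buf;
			else
				flush = STEP_FINISH;
		}
		toy_step(&s, flush);
	} while (!s.done);
	return s.total_in;
}
```

Peak memory in such a loop is set by the reader's chunk size rather
than by the object size, which is the property the streaming path
relies on.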
Signed-off-by: Han Xin
---
 builtin/unpack-objects.c          | 76 ++++++++++++++++++++++++-
 t/t5590-receive-unpack-objects.sh | 92 +++++++++++++++++++++++++++++++
 2 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100755 t/t5590-receive-unpack-objects.sh

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a9466295b..6c757d823b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -320,11 +320,85 @@ static void added_object(unsigned nr, enum object_type type,
 	}
 }
 
+struct input_data_from_zstream {
+	git_zstream *zstream;
+	unsigned char buf[4096];
+	int status;
+};
+
+static const char *read_inflate_in_stream(void *data, unsigned long *readlen)
+{
+	struct input_data_from_zstream *input = data;
+	git_zstream *zstream = input->zstream;
+	void *in = fill(1);
+
+	if (!len || input->status == Z_STREAM_END) {
+		*readlen = 0;
+		return NULL;
+	}
+
+	zstream->next_out = input->buf;
+	zstream->avail_out = sizeof(input->buf);
+	zstream->next_in = in;
+	zstream->avail_in = len;
+
+	input->status = git_inflate(zstream, 0);
+	use(len - zstream->avail_in);
+	*readlen = sizeof(input->buf) - zstream->avail_out;
+
+	return (const char *)input->buf;
+}
+
+static void write_stream_blob(unsigned nr, unsigned long size)
+{
+	char hdr[32];
+	int hdrlen;
+	git_zstream zstream;
+	struct input_data_from_zstream data;
+	struct input_stream in_stream = {
+		.read = read_inflate_in_stream,
+		.data = &data,
+	};
+	struct object_id *oid = &obj_list[nr].oid;
+	int ret;
+
+	memset(&zstream, 0, sizeof(zstream));
+	memset(&data, 0, sizeof(data));
+	data.zstream = &zstream;
+	git_inflate_init(&zstream);
+
+	/* Generate the header */
+	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX, type_name(OBJ_BLOB), (uintmax_t)size) + 1;
+
+	if ((ret = write_loose_object(oid, hdr, hdrlen, &in_stream, dry_run, 0, 0)))
+		die(_("failed to write object in stream %d"), ret);
+
+	if (zstream.total_out != size || data.status != Z_STREAM_END)
+		die(_("inflate returned %d"), data.status);
+	git_inflate_end(&zstream);
+
+	if (strict && !dry_run) {
+		struct blob *blob = lookup_blob(the_repository, oid);
+		if (blob)
+			blob->object.flags |= FLAG_WRITTEN;
+		else
+			die("invalid blob object from stream");
+	}
+	obj_list[nr].obj = NULL;
+}
+
 static void unpack_non_delta_entry(enum object_type type, unsigned long size,
 				   unsigned nr)
 {
-	void *buf = get_data(size);
+	void *buf;
+
+	/* Write large blob in stream without allocating full buffer. */
+	if (type == OBJ_BLOB && size > big_file_threshold) {
+		write_stream_blob(nr, size);
+		return;
+	}
 
+	buf = get_data(size);
 	if (!dry_run && buf)
 		write_object(nr, type, buf, size);
 	else
diff --git a/t/t5590-receive-unpack-objects.sh b/t/t5590-receive-unpack-objects.sh
new file mode 100755
index 0000000000..7e63dfc0db
--- /dev/null
+++ b/t/t5590-receive-unpack-objects.sh
@@ -0,0 +1,92 @@
+#!/bin/sh
+#
+# Copyright (c) 2021 Han Xin
+#
+
+test_description='Test unpack-objects when receive pack'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success "create commit with big blobs (1.5 MB)" '
+	test-tool genrandom foo 1500000 >big-blob &&
+	test_commit --append foo big-blob &&
+	test-tool genrandom bar 1500000 >big-blob &&
+	test_commit --append bar big-blob &&
+	(
+		cd .git &&
+		find objects/?? -type f | sort
+	) >expect &&
+	git repack -ad
+'
+
+test_expect_success 'setup GIT_ALLOC_LIMIT to 1MB' '
+	GIT_ALLOC_LIMIT=1m &&
+	export GIT_ALLOC_LIMIT
+'
+
+test_expect_success 'prepare dest repository' '
+	git init --bare dest.git &&
+	git -C dest.git config core.bigFileThreshold 2m &&
+	git -C dest.git config receive.unpacklimit 100
+'
+
+test_expect_success 'fail to push: cannot allocate' '
+	test_must_fail git push dest.git HEAD 2>err &&
+	test_i18ngrep "remote: fatal: attempting to allocate" err &&
+	(
+		cd dest.git &&
+		find objects/?? -type f | sort
+	) >actual &&
+	! test_cmp expect actual
+'
+
+test_expect_success 'set a lower bigfile threshold' '
+	git -C dest.git config core.bigFileThreshold 1m
+'
+
+test_expect_success 'unpack big object in stream' '
+	git push dest.git HEAD &&
+	git -C dest.git fsck &&
+	(
+		cd dest.git &&
+		find objects/?? -type f | sort
+	) >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'setup for unpack-objects dry-run test' '
+	PACK=$(echo main | git pack-objects --progress --revs test) &&
+	unset GIT_ALLOC_LIMIT &&
+	git init --bare unpack-test.git
+'
+
+test_expect_success 'unpack-objects dry-run with large threshold' '
+	(
+		cd unpack-test.git &&
+		git config core.bigFileThreshold 2m &&
+		git unpack-objects -n <../test-$PACK.pack
+	) &&
+	(
+		cd unpack-test.git &&
+		find objects/ -type f
+	) >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'unpack-objects dry-run with small threshold' '
+	(
+		cd unpack-test.git &&
+		git config core.bigFileThreshold 1m &&
+		git unpack-objects -n <../test-$PACK.pack
+	) &&
+	(
+		cd unpack-test.git &&
+		find objects/ -type f
+	) >actual &&
+	test_must_be_empty actual
+'
+
+test_done