Message ID | 183353220fe.d7826593472673.3445243727369286065@elijahpepe.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | reftable: pass pq_entry by address | expand |
On Tue, Sep 13, 2022 at 6:53 AM Elijah Conners <business@elijahpepe.com> wrote: > > In merged_iter_pqueue_add, the pq_entry parameter is passed by value, > although it exceeds 64 bytes. LGTM.
Elijah Conners <business@elijahpepe.com> writes: > In merged_iter_pqueue_add, the pq_entry parameter is passed by value, > although it exceeds 64 bytes. Do we have any hard guidance like "do not pass an data item whose size is larger than 64 bytes" in our coding guidelines? If not, make sure that the reference to 64 bytes does not look like one. While this patch is not bad as a change, we are going to make a copy of the value with structure assignment at the leaf level, I am not sure how big a deal this is in practice. In any case, wouldn't it make sense to make the "we pass reference not because we want to let the callee modify the value, but because the callee deep in the callchain wants to copy the contents out of it" parameter a pointer to a constant? I.e. void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, const struct pq_entry *e) Other than that, looking good.
Junio C Hamano <gitster@pobox.com> writes: > Do we have any hard guidance like "do not pass an data item whose > size is larger than 64 bytes" in our coding guidelines? If not, > make sure that the reference to 64 bytes does not look like one. While we don't have hard guidance like that, putting an object that exceeds 64 bytes on the stack is dangerous. > In any case, wouldn't it make sense to make the "we pass reference > not because we want to let the callee modify the value, but because > the callee deep in the callchain wants to copy the contents out of > it" parameter a pointer to a constant? Yes. I overlooked that making this change. Feel free to make that change, otherwise I'll do it myself.
On Tue, Sep 13, 2022 at 7:34 PM Elijah Conners <business@elijahpepe.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > Do we have any hard guidance like "do not pass an data item whose > > size is larger than 64 bytes" in our coding guidelines? If not, > > make sure that the reference to 64 bytes does not look like one. > While we don't have hard guidance like that, putting an object that exceeds 64 bytes on the stack is dangerous. it might be a bit slower, but "dangerous"? How so?
Han-Wen Nienhuys <hanwen@google.com> writes:
> it might be a bit slower, but "dangerous"? How so?
In this context, dangerous is the wrong word, but in some cases large objects on the stack can cause stack overflows. In this case, slower is the right word here.
Elijah Conners <business@elijahpepe.com> writes: > Junio C Hamano <gitster@pobox.com> writes: > > Do we have any hard guidance like "do not pass an data item whose > > size is larger than 64 bytes" in our coding guidelines? If not, > > make sure that the reference to 64 bytes does not look like one. > While we don't have hard guidance like that, putting an object that exceeds 64 bytes on the stack is dangerous. > > > In any case, wouldn't it make sense to make the "we pass reference > > not because we want to let the callee modify the value, but because > > the callee deep in the callchain wants to copy the contents out of > > it" parameter a pointer to a constant? > Yes. I overlooked that making this change. Feel free to make that change, otherwise I'll do it myself. OK, will wait for an updated patch that corrects the proposed log message (i.e. not to say "size is larger than 64 bytes hence this is bad") with a const pointer. Note that this project tries to avoid piling "oops the previous one was wrong, and this is a fix" patches on top of earlier patch that are faulty or suboptimal. Instead "v2" and later patches are written as if an earlier iteration never happened, i.e. allowing the author to pretend to be perfect human ;-). Thanks.
On Tue, Sep 13, 2022 at 8:03 PM Elijah Conners <business@elijahpepe.com> wrote: > > Han-Wen Nienhuys <hanwen@google.com> writes: > > it might be a bit slower, but "dangerous"? How so? > In this context, dangerous is the wrong word, but in some cases large objects on the stack can cause stack overflows. In this case, slower is the right word here. I'll let you paint this bikeshed, but do note that the priority queue isn't actually optimal here, in a much bigger way. In the typical case, you'd have 1. large base reftable (created by GC) 2. small updates (created by individual ref updates) When you're iterating, most of the iteration entries will come from the large base reftable, and only occasionally, you have to get entries from the small tables. In this scenario, the current code will insert entries from the large table into the priority queue, have it filter up to the top at cost log(number-of-tables), for each of the entries to be read. With the current online compaction, number-of-tables = log(number-of-refs), so at log(log(number-of-refs)) it's not a huge cost, but certainly larger than the cost of copying the entry once in the function. JGit has an optimization here where it tries to get the next entry from the table that previously provided the minimum entry. I didn't implement it for simplicity's sake, but if you care about performance, you might want to try your hand at that.
diff --git a/reftable/merged.c b/reftable/merged.c index 2a6efa110d..5ded470c08 100644 --- a/reftable/merged.c +++ b/reftable/merged.c @@ -36,7 +36,7 @@ static int merged_iter_init(struct merged_iter *mi) .rec = rec, .index = i, }; - merged_iter_pqueue_add(&mi->pq, e); + merged_iter_pqueue_add(&mi->pq, &e); } } @@ -71,7 +71,7 @@ static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi, return 0; } - merged_iter_pqueue_add(&mi->pq, e); + merged_iter_pqueue_add(&mi->pq, &e); return 0; } diff --git a/reftable/pq.c b/reftable/pq.c index 96ca6dd37b..156f78a064 100644 --- a/reftable/pq.c +++ b/reftable/pq.c @@ -71,7 +71,7 @@ struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq) return e; } -void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e) +void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry *e) { int i = 0; @@ -81,7 +81,7 @@ void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e) pq->cap * sizeof(struct pq_entry)); } - pq->heap[pq->len++] = e; + pq->heap[pq->len++] = *e; i = pq->len - 1; while (i > 0) { int j = (i - 1) / 2; diff --git a/reftable/pq.h b/reftable/pq.h index 56fc1b6d87..e5e9234baf 100644 --- a/reftable/pq.h +++ b/reftable/pq.h @@ -26,7 +26,7 @@ struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq); int merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq); void merged_iter_pqueue_check(struct merged_iter_pqueue pq); struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq); -void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e); +void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry *e); void merged_iter_pqueue_release(struct merged_iter_pqueue *pq); int pq_less(struct pq_entry *a, struct pq_entry *b); diff --git a/reftable/pq_test.c b/reftable/pq_test.c index 7de5e886f3..011b5c7502 100644 --- a/reftable/pq_test.c +++ b/reftable/pq_test.c @@ -46,7 +46,7 @@ static void test_pq(void) .u.ref = { .refname = names[i], } } }; - merged_iter_pqueue_add(&pq, e); + merged_iter_pqueue_add(&pq, &e); merged_iter_pqueue_check(pq); i = (i * 7) % N; } while (i != 1);
In merged_iter_pqueue_add, the pq_entry parameter is passed by value, although it exceeds 64 bytes. Signed-off-by: Elijah Conners <business@elijahpepe.com> --- reftable/merged.c | 4 ++-- reftable/pq.c | 4 ++-- reftable/pq.h | 2 +- reftable/pq_test.c | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-)