diff mbox series

reftable: pass pq_entry by address

Message ID 183353220fe.d7826593472673.3445243727369286065@elijahpepe.com (mailing list archive)
State New, archived
Headers show
Series reftable: pass pq_entry by address | expand

Commit Message

Elijah Conners Sept. 13, 2022, 4:53 a.m. UTC
In merged_iter_pqueue_add, the pq_entry parameter is passed by value,
although it exceeds 64 bytes.

Signed-off-by: Elijah Conners <business@elijahpepe.com>
---
 reftable/merged.c  | 4 ++--
 reftable/pq.c      | 4 ++--
 reftable/pq.h      | 2 +-
 reftable/pq_test.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

Comments

Han-Wen Nienhuys Sept. 13, 2022, 9:11 a.m. UTC | #1
On Tue, Sep 13, 2022 at 6:53 AM Elijah Conners <business@elijahpepe.com> wrote:
>
> In merged_iter_pqueue_add, the pq_entry parameter is passed by value,
> although it exceeds 64 bytes.


LGTM.
Junio C Hamano Sept. 13, 2022, 3:55 p.m. UTC | #2
Elijah Conners <business@elijahpepe.com> writes:

> In merged_iter_pqueue_add, the pq_entry parameter is passed by value,
> although it exceeds 64 bytes.

Do we have any hard guidance like "do not pass an data item whose
size is larger than 64 bytes" in our coding guidelines?  If not,
make sure that the reference to 64 bytes does not look like one.

While this patch is not bad as a change, we are going to make a copy
of the value with structure assignment at the leaf level, I am not
sure how big a deal this is in practice.

In any case, wouldn't it make sense to make the "we pass reference
not because we want to let the callee modify the value, but because
the callee deep in the callchain wants to copy the contents out of
it" parameter a pointer to a constant?  I.e.

    void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, const struct pq_entry *e)

Other than that, looking good.
Elijah Conners Sept. 13, 2022, 5:34 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:
 > Do we have any hard guidance like "do not pass an data item whose
 > size is larger than 64 bytes" in our coding guidelines?  If not,
 > make sure that the reference to 64 bytes does not look like one.
While we don't have hard guidance like that, putting an object that exceeds 64 bytes on the stack is dangerous.

 > In any case, wouldn't it make sense to make the "we pass reference
 > not because we want to let the callee modify the value, but because
 > the callee deep in the callchain wants to copy the contents out of
 > it" parameter a pointer to a constant? 
Yes. I overlooked that making this change. Feel free to make that change, otherwise I'll do it myself.
Han-Wen Nienhuys Sept. 13, 2022, 5:36 p.m. UTC | #4
On Tue, Sep 13, 2022 at 7:34 PM Elijah Conners <business@elijahpepe.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>  > Do we have any hard guidance like "do not pass an data item whose
>  > size is larger than 64 bytes" in our coding guidelines?  If not,
>  > make sure that the reference to 64 bytes does not look like one.
> While we don't have hard guidance like that, putting an object that exceeds 64 bytes on the stack is dangerous.

it might be a bit slower, but "dangerous"? How so?
Elijah Conners Sept. 13, 2022, 6:03 p.m. UTC | #5
Han-Wen Nienhuys <hanwen@google.com> writes:
 > it might be a bit slower, but "dangerous"? How so?
In this context, dangerous is the wrong word, but in some cases large objects on the stack can cause stack overflows. In this case, slower is the right word here.
Junio C Hamano Sept. 15, 2022, 2:27 a.m. UTC | #6
Elijah Conners <business@elijahpepe.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>  > Do we have any hard guidance like "do not pass an data item whose
>  > size is larger than 64 bytes" in our coding guidelines?  If not,
>  > make sure that the reference to 64 bytes does not look like one.
> While we don't have hard guidance like that, putting an object that exceeds 64 bytes on the stack is dangerous.
>
>  > In any case, wouldn't it make sense to make the "we pass reference
>  > not because we want to let the callee modify the value, but because
>  > the callee deep in the callchain wants to copy the contents out of
>  > it" parameter a pointer to a constant? 
> Yes. I overlooked that making this change. Feel free to make that change, otherwise I'll do it myself.

OK, will wait for an updated patch that corrects the proposed log
message (i.e. not to say "size is larger than 64 bytes hence this is
bad") with a const pointer.

Note that this project tries to avoid piling "oops the previous one
was wrong, and this is a fix" patches on top of earlier patch that
are faulty or suboptimal.  Instead "v2" and later patches are
written as if an earlier iteration never happened, i.e. allowing the
author to pretend to be perfect human ;-).

Thanks.
Han-Wen Nienhuys Sept. 15, 2022, 7:49 a.m. UTC | #7
On Tue, Sep 13, 2022 at 8:03 PM Elijah Conners <business@elijahpepe.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>  > it might be a bit slower, but "dangerous"? How so?
> In this context, dangerous is the wrong word, but in some cases large objects on the stack can cause stack overflows. In this case, slower is the right word here.

I'll let you paint this bikeshed, but do note that the priority queue
isn't actually optimal here, in a much bigger way. In the typical
case, you'd have

1. large base reftable (created by GC)
2. small updates (created by individual ref updates)

When you're iterating, most of the iteration entries will come from
the large base reftable, and only occasionally, you have to get
entries from the small tables.
In this scenario, the current code will insert entries from the large
table into the priority queue, have it filter up to the top at cost
log(number-of-tables), for each of the entries to be read.

With the current online compaction, number-of-tables =
log(number-of-refs), so at log(log(number-of-refs)) it's not a huge
cost, but certainly larger than the cost of copying the entry once in
the function.

JGit has an optimization here where it tries to get the next entry
from the table that previously provided the minimum entry. I didn't
implement it for simplicity's sake, but if you care about performance,
you might want to try your hand at that.
diff mbox series

Patch

diff --git a/reftable/merged.c b/reftable/merged.c
index 2a6efa110d..5ded470c08 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -36,7 +36,7 @@  static int merged_iter_init(struct merged_iter *mi)
 				.rec = rec,
 				.index = i,
 			};
-			merged_iter_pqueue_add(&mi->pq, e);
+			merged_iter_pqueue_add(&mi->pq, &e);
 		}
 	}
 
@@ -71,7 +71,7 @@  static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
 		return 0;
 	}
 
-	merged_iter_pqueue_add(&mi->pq, e);
+	merged_iter_pqueue_add(&mi->pq, &e);
 	return 0;
 }
 
diff --git a/reftable/pq.c b/reftable/pq.c
index 96ca6dd37b..156f78a064 100644
--- a/reftable/pq.c
+++ b/reftable/pq.c
@@ -71,7 +71,7 @@  struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
 	return e;
 }
 
-void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry *e)
 {
 	int i = 0;
 
@@ -81,7 +81,7 @@  void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
 					    pq->cap * sizeof(struct pq_entry));
 	}
 
-	pq->heap[pq->len++] = e;
+	pq->heap[pq->len++] = *e;
 	i = pq->len - 1;
 	while (i > 0) {
 		int j = (i - 1) / 2;
diff --git a/reftable/pq.h b/reftable/pq.h
index 56fc1b6d87..e5e9234baf 100644
--- a/reftable/pq.h
+++ b/reftable/pq.h
@@ -26,7 +26,7 @@  struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
 int merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
 void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
 struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
-void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry *e);
 void merged_iter_pqueue_release(struct merged_iter_pqueue *pq);
 int pq_less(struct pq_entry *a, struct pq_entry *b);
 
diff --git a/reftable/pq_test.c b/reftable/pq_test.c
index 7de5e886f3..011b5c7502 100644
--- a/reftable/pq_test.c
+++ b/reftable/pq_test.c
@@ -46,7 +46,7 @@  static void test_pq(void)
 					       .u.ref = {
 						       .refname = names[i],
 					       } } };
-		merged_iter_pqueue_add(&pq, e);
+		merged_iter_pqueue_add(&pq, &e);
 		merged_iter_pqueue_check(pq);
 		i = (i * 7) % N;
 	} while (i != 1);