commit-slab: clarify slabname##_peek()'s return value
diff mbox series

Message ID 20200310153049.3482-1-szeder.dev@gmail.com
State New
Headers show
Series
  • commit-slab: clarify slabname##_peek()'s return value
Related show

Commit Message

SZEDER Gábor March 10, 2020, 3:30 p.m. UTC
Ever since 862e730ec1 (commit-slab: introduce slabname##_peek()
function, 2015-05-14) the slabname##_peek() function is documented as:

  This function is similar to indegree_at(), but it will return NULL
  until a call to indegree_at() was made for the commit.

This, however, is usually not the case.  If indegree_at() allocates
memory, then it will do so not only for the single commit it got as
parameter, but it will allocate a whole new, ~512kB slab.  Later on,
if any other commit's 'index' field happens to point into an already
allocated slab, then indegree_peek() for such a commit will return a
valid non-NULL pointer, pointing to a zero-initialized location in the
slab, even if no indegree_at() call has been made for that commit yet.

Update slabname##_peek()'s documentation to clarify this.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 commit-slab.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Jeff King March 10, 2020, 5:54 p.m. UTC | #1
On Tue, Mar 10, 2020 at 04:30:49PM +0100, SZEDER Gábor wrote:

> Ever since 862e730ec1 (commit-slab: introduce slabname##_peek()
> function, 2015-05-14) the slabname##_peek() function is documented as:
> 
>   This function is similar to indegree_at(), but it will return NULL
>   until a call to indegree_at() was made for the commit.
> 
> This, however, is usually not the case.  If indegree_at() allocates
> memory, then it will do so not only for the single commit it got as
> parameter, but it will allocate a whole new, ~512kB slab.  Later on,
> if any other commit's 'index' field happens to point into an already
> allocated slab, then indegree_peek() for such a commit will return a
> valid non-NULL pointer, pointing to a zero-initialized location in the
> slab, even if no indegree_at() call has been made for that commit yet.
> 
> Update slabname##_peek()'s documentation to clarify this.

Yeah, I agree the existing documentation is misleading. Your update
looks good to me.

I thought at first we might simply be able to say:

  This function is similar to indegree_at(), but it will avoid
  allocating new slab memory (so its result is suitable only for
  reading, not writing).

But I think it's worth mentioning that the caller needs to handle both
NULL or a possible zero-initialized value, as your patch does.

I also wondered if we could make life easier for the caller by
collapsing these cases. I.e., always returning a zero-initialized value,
and never NULL. All of the callers do something like:

  struct blame_origin *get_blame_suspects(struct commit *commit)
  {                               
          struct blame_origin **result;
                  
          result = blame_suspects_peek(&blame_suspects, commit);
  
          return result ? *result : NULL;
  }

all of which could be turned into a single blame_suspects_peek() call if
it just consistently returned a zero-initialized value (it's a little
confusing in this example because we're storing pointers, so the
zero-initialized value is _also_ NULL, but it's a different type).

But that would get a bit awkward, because peek() returns a pointer, not
a value (as it should, because the type we're storing may be a compound
type, which we generally avoid passing or returning by value).  So we'd
actually need to return a pointer to a zero-initialized dummy value. Not
impossible, but getting a bit odd.

-Peff
Jeff King March 10, 2020, 6:07 p.m. UTC | #2
On Tue, Mar 10, 2020 at 01:54:46PM -0400, Jeff King wrote:

> I also wondered if we could make life easier for the caller by
> collapsing these cases. I.e., always returning a zero-initialized value,
> and never NULL.
> [...]
> But that would get a bit awkward, because peek() returns a pointer, not
> a value (as it should, because the type we're storing may be a compound
> type, which we generally avoid passing or returning by value).  So we'd
> actually need to return a pointer to a zero-initialized dummy value. Not
> impossible, but getting a bit odd.

I was a little curious how it would look. Code-wise it's not too bad,
but if anybody ever wrote to the dummy entry, that would cause confusing
bugs. Possibly peek() could return a pointer-to-const.

The code savings in the callers are not all that much (I didn't convert
all sites, but you can see that it just saves a few lines in each case).

So it's probably not worth the churn, but if anybody is really excited
about it, I can polish it into a real patch.

diff --git a/blame.c b/blame.c
index 29770e5c81..3ae42a1edd 100644
--- a/blame.c
+++ b/blame.c
@@ -15,11 +15,7 @@ static struct blame_suspects blame_suspects;
 
 struct blame_origin *get_blame_suspects(struct commit *commit)
 {
-	struct blame_origin **result;
-
-	result = blame_suspects_peek(&blame_suspects, commit);
-
-	return result ? *result : NULL;
+	return *blame_suspects_peek(&blame_suspects, commit);
 }
 
 static void set_blame_suspects(struct commit *commit, struct blame_origin *origin)
diff --git a/commit-slab-decl.h b/commit-slab-decl.h
index adc7b46c83..4b951e7b2f 100644
--- a/commit-slab-decl.h
+++ b/commit-slab-decl.h
@@ -13,6 +13,7 @@ struct slabname {							\
 	unsigned stride;						\
 	unsigned slab_count;						\
 	elemtype **slab;						\
+	elemtype dummy;							\
 }
 
 /*
diff --git a/commit-slab-impl.h b/commit-slab-impl.h
index 5c0eb91a5d..505c21599a 100644
--- a/commit-slab-impl.h
+++ b/commit-slab-impl.h
@@ -22,6 +22,7 @@ scope void init_ ##slabname## _with_stride(struct slabname *s,		\
 	s->slab_size = COMMIT_SLAB_SIZE / elem_size;			\
 	s->slab_count = 0;						\
 	s->slab = NULL;							\
+	memset(&s->dummy, 0, sizeof(elemtype));				\
 }									\
 									\
 scope void init_ ##slabname(struct slabname *s)				\
@@ -50,15 +51,15 @@ scope elemtype *slabname## _at_peek(struct slabname *s,			\
 	if (s->slab_count <= nth_slab) {				\
 		unsigned int i;						\
 		if (!add_if_missing)					\
-			return NULL;					\
+			return &s->dummy;				\
 		REALLOC_ARRAY(s->slab, nth_slab + 1);			\
 		for (i = s->slab_count; i <= nth_slab; i++)		\
 			s->slab[i] = NULL;				\
 		s->slab_count = nth_slab + 1;				\
 	}								\
 	if (!s->slab[nth_slab]) {					\
 		if (!add_if_missing)					\
-			return NULL;					\
+			return &s->dummy;				\
 		s->slab[nth_slab] = xcalloc(s->slab_size,		\
 					    sizeof(**s->slab) * s->stride);		\
 	}								\
diff --git a/commit.c b/commit.c
index a6cfa41a4e..49d86435a5 100644
--- a/commit.c
+++ b/commit.c
@@ -289,11 +289,6 @@ const void *get_cached_commit_buffer(struct repository *r, const struct commit *
 {
 	struct commit_buffer *v = buffer_slab_peek(
 		r->parsed_objects->buffer_slab, commit);
-	if (!v) {
-		if (sizep)
-			*sizep = 0;
-		return NULL;
-	}
 	if (sizep)
 		*sizep = v->size;
 	return v->buffer;
Junio C Hamano March 10, 2020, 6:19 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> But that would get a bit awkward, because peek() returns a pointer, not
> a value (as it should, because the type we're storing may be a compound
> type, which we generally avoid passing or returning by value).  So we'd
> actually need to return a pointer to a zero-initialized dummy value. Not
> impossible, but getting a bit odd.

Do we have a guarantee that callers of the peek only look at, never
touch, the location?  As long as we make it return a "const *", it
might be OK, but a quick look at commit-slab.h tells me that we do
not say "const".
Junio C Hamano March 10, 2020, 6:26 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> But that would get a bit awkward, because peek() returns a pointer, not
>> a value (as it should, because the type we're storing may be a compound
>> type, which we generally avoid passing or returning by value).  So we'd
>> actually need to return a pointer to a zero-initialized dummy value. Not
>> impossible, but getting a bit odd.
>
> Do we have a guarantee that callers of the peek only look at, never
> touch, the location?  As long as we make it return a "const *", it
> might be OK, but a quick look at commit-slab.h tells me that we do
> not say "const".

Ah, should have read the other message ;-)
Jeff King March 11, 2020, 4:07 p.m. UTC | #5
On Tue, Mar 10, 2020 at 02:07:49PM -0400, Jeff King wrote:

> I was a little curious how it would look. Code-wise it's not too bad,
> but if anybody ever wrote to the dummy entry, that would cause confusing
> bugs. Possibly peek() could return a pointer-to-const.

I was also curious about this, but it gets ugly really quickly. So I
think this is a dead end, but read on if you're interested.

The first issue is that implicit const conversions are tricky around
pointers-to-pointers. So the actual commit-slab change ends up with
this:

diff --git a/commit-slab-decl.h b/commit-slab-decl.h
index 4b951e7b2f..2f67b08aa6 100644
--- a/commit-slab-decl.h
+++ b/commit-slab-decl.h
@@ -35,7 +35,7 @@ void init_ ##slabname(struct slabname *s);				\
 void clear_ ##slabname(struct slabname *s);				\
 elemtype *slabname## _at_peek(struct slabname *s, const struct commit *c, int add_if_missing); \
 elemtype *slabname## _at(struct slabname *s, const struct commit *c);	\
-elemtype *slabname## _peek(struct slabname *s, const struct commit *c)
+const elemtype *slabname## _peek(struct slabname *s, const struct commit *c)
 
 #define define_shared_commit_slab(slabname, elemtype) \
 	declare_commit_slab(slabname, elemtype); \
diff --git a/commit-slab-impl.h b/commit-slab-impl.h
index 505c21599a..f2d6ef97a2 100644
--- a/commit-slab-impl.h
+++ b/commit-slab-impl.h
@@ -72,10 +72,14 @@ scope elemtype *slabname## _at(struct slabname *s,			\
 	return slabname##_at_peek(s, c, 1);				\
 }									\
 									\
-scope elemtype *slabname## _peek(struct slabname *s,			\
+scope const elemtype *slabname## _peek(struct slabname *s,		\
 					     const struct commit *c)	\
 {									\
-	return slabname##_at_peek(s, c, 0);				\
+	/*								\
+	 * The cast is necessary here to handle pointer elemtypes.	\
+	 * E.g., (int **) cannot implicitly become (const int **).	\
+	 */								\
+	return (const elemtype *)slabname##_at_peek(s, c, 0);		\
 }									\
 									\
 struct slabname

Gross, but at least it's contained. The bigger problem, though, is that
many of the callers _do_ want to peek and then modify the result if it
exists and isn't set to zero. So in blame, for example, we might try to
do:

diff --git a/blame.c b/blame.c
index 3ae42a1edd..ef0526c100 100644
--- a/blame.c
+++ b/blame.c
@@ -13,7 +13,7 @@
 define_commit_slab(blame_suspects, struct blame_origin *);
 static struct blame_suspects blame_suspects;
 
-struct blame_origin *get_blame_suspects(struct commit *commit)
+const struct blame_origin *get_blame_suspects(struct commit *commit)
 {
 	return *blame_suspects_peek(&blame_suspects, commit);
 }
@@ -24,11 +24,12 @@ static void set_blame_suspects(struct commit *commit, struct blame_origin *origi
 }
 
 void blame_origin_decref(struct blame_origin *o)
 {
 	if (o && --o->refcnt <= 0) {
-		struct blame_origin *p, *l = NULL;
+		const struct blame_origin *p;
+		struct blame_origin *l = NULL;
 		if (o->previous)
 			blame_origin_decref(o->previous);
 		free(o->file.ptr);
 		/* Should be present exactly once in commit chain */
 		for (p = get_blame_suspects(o->commit); p; l = p, p = p->next) {

But that loop at the end of the context won't work, because we're using
a non-const pointer "l" to iterate. And that "l" has to be non-const,
because we actually modify its next pointer.

And here's an even more straightforward example, with an actual solution
to make it compile:

diff --git a/commit.c b/commit.c
index 49d86435a5..ee36a47db2 100644
--- a/commit.c
+++ b/commit.c
@@ -319,17 +319,19 @@ void repo_unuse_commit_buffer(struct repository *r,
 			      const struct commit *commit,
 			      const void *buffer)
 {
-	struct commit_buffer *v = buffer_slab_peek(
+	const struct commit_buffer *v = buffer_slab_peek(
 		r->parsed_objects->buffer_slab, commit);
 	if (!(v && v->buffer == buffer))
 		free((void *)buffer);
 }
 
 void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit)
 {
-	struct commit_buffer *v = buffer_slab_peek(
+	const struct commit_buffer *cv = buffer_slab_peek(
 		pool->buffer_slab, commit);
-	if (v) {
+	if (cv->buffer) {
+		/* we know this isn't a dummy because it has a real buffer */
+		struct commit_buffer *v = (struct commit_buffer *)cv;
 		FREE_AND_NULL(v->buffer);
 		v->size = 0;
 	}

The first hunk is straight-forward, because we really are treating the
slab value as read-only. But in the second we're trying to free the
value it points to.  Using peek there makes sense, since we wouldn't
want to allocate a slab entry just to see that it points to no buffer at
all. And the code is correct even without const, because the writing we
do is a noop in the case that the buffer struct was already empty. But
C's type system doesn't know that, of course.

So the workarounds we have to do here and elsewhere are much worse than
the few lines of "if (v)" that we got rid of by simplifying peek().

-Peff

Patch
diff mbox series

diff --git a/commit-slab.h b/commit-slab.h
index 69bf0c807c..05b3f2804e 100644
--- a/commit-slab.h
+++ b/commit-slab.h
@@ -24,7 +24,12 @@ 
  * - int *indegree_peek(struct indegree *, struct commit *);
  *
  *   This function is similar to indegree_at(), but it will return NULL
- *   until a call to indegree_at() was made for the commit.
+ *   if the location to store the data associated with the given commit
+ *   has not been allocated yet.
+ *   Note that the location to store the data might have already been
+ *   allocated even if no indegree_at() call has been made for that commit
+ *   yet; in this case this function returns a pointer to a
+ *   zero-initialized location.
  *
  * - void init_indegree(struct indegree *);
  *   void init_indegree_with_stride(struct indegree *, int);