Message ID | 20200310153049.3482-1-szeder.dev@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | commit-slab: clarify slabname##_peek()'s return value | expand |
On Tue, Mar 10, 2020 at 04:30:49PM +0100, SZEDER Gábor wrote: > Ever since 862e730ec1 (commit-slab: introduce slabname##_peek() > function, 2015-05-14) the slabname##_peek() function is documented as: > > This function is similar to indegree_at(), but it will return NULL > until a call to indegree_at() was made for the commit. > > This, however, is usually not the case. If indegree_at() allocates > memory, then it will do so not only for the single commit it got as > parameter, but it will allocate a whole new, ~512kB slab. Later on, > if any other commit's 'index' field happens to point into an already > allocated slab, then indegree_peek() for such a commit will return a > valid non-NULL pointer, pointing to a zero-initialized location in the > slab, even if no indegree_at() call has been made for that commit yet. > > Update slabname##_peek()'s documentation to clarify this. Yeah, I agree the existing documentation is misleading. Your update looks good to me. I thought at first we might simply be able to say: This function is similar to indegree_at(), but it will avoid allocating new slab memory (so its result is suitable only for reading, not writing). But I think it's worth mentioning that the caller needs to handle both NULL or a possible zero-initialized value, as your patch does. I also wondered if we could make life easier for the caller by collapsing these cases. I.e., always returning a zero-initialized value, and never NULL. All of the callers do something like: struct blame_origin *get_blame_suspects(struct commit *commit) { struct blame_origin **result; result = blame_suspects_peek(&blame_suspects, commit); return result ? *result : NULL; } all of which could be turned into a single blame_suspects_peek() call if it just consistently returned a zero-initialized value (it's a little confusing in this example because we're storing pointers, so the zero-initialized value is _also_ NULL, but it's a different type). But that would get a bit awkward, because peek() returns a pointer, not a value (as it should, because the type we're storing may be a compound type, which we generally avoid passing or returning by value). So we'd actually need to return a pointer to a zero-initialized dummy value. Not impossible, but getting a bit odd. -Peff
On Tue, Mar 10, 2020 at 01:54:46PM -0400, Jeff King wrote: > I also wondered if we could make life easier for the caller by > collapsing these cases. I.e., always returning a zero-initialized value, > and never NULL. > [...] > But that would get a bit awkward, because peek() returns a pointer, not > a value (as it should, because the type we're storing may be a compound > type, which we generally avoid passing or returning by value). So we'd > actually need to return a pointer to a zero-initialized dummy value. Not > impossible, but getting a bit odd. I was a little curious how it would look. Code-wise it's not too bad, but if anybody ever wrote to the dummy entry, that would cause confusing bugs. Possibly peek() could return a pointer-to-const. The code savings in the callers are not all that much (I didn't convert all sites, but you can see that it just saves a few lines in each case). So it's probably not worth the churn, but if anybody is really excited about it, I can polish it into a real patch. diff --git a/blame.c b/blame.c index 29770e5c81..3ae42a1edd 100644 --- a/blame.c +++ b/blame.c @@ -15,11 +15,7 @@ static struct blame_suspects blame_suspects; struct blame_origin *get_blame_suspects(struct commit *commit) { - struct blame_origin **result; - - result = blame_suspects_peek(&blame_suspects, commit); - - return result ? *result : NULL; + return *blame_suspects_peek(&blame_suspects, commit); } static void set_blame_suspects(struct commit *commit, struct blame_origin *origin) diff --git a/commit-slab-decl.h b/commit-slab-decl.h index adc7b46c83..4b951e7b2f 100644 --- a/commit-slab-decl.h +++ b/commit-slab-decl.h @@ -13,6 +13,7 @@ struct slabname { \ unsigned stride; \ unsigned slab_count; \ elemtype **slab; \ + elemtype dummy; \ } /* diff --git a/commit-slab-impl.h b/commit-slab-impl.h index 5c0eb91a5d..505c21599a 100644 --- a/commit-slab-impl.h +++ b/commit-slab-impl.h @@ -22,6 +22,7 @@ scope void init_ ##slabname## _with_stride(struct slabname *s, \ s->slab_size = COMMIT_SLAB_SIZE / elem_size; \ s->slab_count = 0; \ s->slab = NULL; \ + memset(&s->dummy, 0, sizeof(elemtype)); \ } \ \ scope void init_ ##slabname(struct slabname *s) \ @@ -50,15 +51,15 @@ scope elemtype *slabname## _at_peek(struct slabname *s, \ if (s->slab_count <= nth_slab) { \ unsigned int i; \ if (!add_if_missing) \ - return NULL; \ + return &s->dummy; \ REALLOC_ARRAY(s->slab, nth_slab + 1); \ for (i = s->slab_count; i <= nth_slab; i++) \ s->slab[i] = NULL; \ s->slab_count = nth_slab + 1; \ } \ if (!s->slab[nth_slab]) { \ if (!add_if_missing) \ - return NULL; \ + return &s->dummy; \ s->slab[nth_slab] = xcalloc(s->slab_size, \ sizeof(**s->slab) * s->stride); \ } \ diff --git a/commit.c b/commit.c index a6cfa41a4e..49d86435a5 100644 --- a/commit.c +++ b/commit.c @@ -289,11 +289,6 @@ const void *get_cached_commit_buffer(struct repository *r, const struct commit * { struct commit_buffer *v = buffer_slab_peek( r->parsed_objects->buffer_slab, commit); - if (!v) { - if (sizep) - *sizep = 0; - return NULL; - } if (sizep) *sizep = v->size; return v->buffer;
Jeff King <peff@peff.net> writes: > But that would get a bit awkward, because peek() returns a pointer, not > a value (as it should, because the type we're storing may be a compound > type, which we generally avoid passing or returning by value). So we'd > actually need to return a pointer to a zero-initialized dummy value. Not > impossible, but getting a bit odd. Do we have a guarantee that callers of the peek only look at, never touch, the location? As long as we make it return a "const *", it might be OK, but a quick look at commit-slab.h tells me that we do not say "const".
Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > >> But that would get a bit awkward, because peek() returns a pointer, not >> a value (as it should, because the type we're storing may be a compound >> type, which we generally avoid passing or returning by value). So we'd >> actually need to return a pointer to a zero-initialized dummy value. Not >> impossible, but getting a bit odd. > > Do we have a guarantee that callers of the peek only look at, never > touch, the location? As long as we make it return a "const *", it > might be OK, but a quick look at commit-slab.h tells me that we do > not say "const". Ah, should have read the other message ;-)
On Tue, Mar 10, 2020 at 02:07:49PM -0400, Jeff King wrote: > I was a little curious how it would look. Code-wise it's not too bad, > but if anybody ever wrote to the dummy entry, that would cause confusing > bugs. Possibly peek() could return a pointer-to-const. I was also curious about this, but it gets ugly really quickly. So I think this is a dead end, but read on if you're interested. The first issue is that implicit const conversions are tricky around pointers-to-pointers. So the actual commit-slab change ends up with this: diff --git a/commit-slab-decl.h b/commit-slab-decl.h index 4b951e7b2f..2f67b08aa6 100644 --- a/commit-slab-decl.h +++ b/commit-slab-decl.h @@ -35,7 +35,7 @@ void init_ ##slabname(struct slabname *s); \ void clear_ ##slabname(struct slabname *s); \ elemtype *slabname## _at_peek(struct slabname *s, const struct commit *c, int add_if_missing); \ elemtype *slabname## _at(struct slabname *s, const struct commit *c); \ -elemtype *slabname## _peek(struct slabname *s, const struct commit *c) +const elemtype *slabname## _peek(struct slabname *s, const struct commit *c) #define define_shared_commit_slab(slabname, elemtype) \ declare_commit_slab(slabname, elemtype); \ diff --git a/commit-slab-impl.h b/commit-slab-impl.h index 505c21599a..f2d6ef97a2 100644 --- a/commit-slab-impl.h +++ b/commit-slab-impl.h @@ -72,10 +72,14 @@ scope elemtype *slabname## _at(struct slabname *s, \ return slabname##_at_peek(s, c, 1); \ } \ \ -scope elemtype *slabname## _peek(struct slabname *s, \ +scope const elemtype *slabname## _peek(struct slabname *s, \ const struct commit *c) \ { \ - return slabname##_at_peek(s, c, 0); \ + /* \ + * The cast is necessary here to handle pointer elemtypes. \ + * E.g., (int **) cannot implicitly become (const int **). \ + */ \ + return (const elemtype *)slabname##_at_peek(s, c, 0); \ } \ \ struct slabname Gross, but at least it's contained. The bigger problem, though, is that many of the callers _do_ want to peek and then modify the result if it exists and isn't set to zero. So in blame, for example, we might try to do: diff --git a/blame.c b/blame.c index 3ae42a1edd..ef0526c100 100644 --- a/blame.c +++ b/blame.c @@ -13,7 +13,7 @@ define_commit_slab(blame_suspects, struct blame_origin *); static struct blame_suspects blame_suspects; -struct blame_origin *get_blame_suspects(struct commit *commit) +const struct blame_origin *get_blame_suspects(struct commit *commit) { return *blame_suspects_peek(&blame_suspects, commit); } @@ -24,11 +24,12 @@ static void set_blame_suspects(struct commit *commit, struct blame_origin *origi } void blame_origin_decref(struct blame_origin *o) { if (o && --o->refcnt <= 0) { - struct blame_origin *p, *l = NULL; + const struct blame_origin *p; + struct blame_origin *l = NULL; if (o->previous) blame_origin_decref(o->previous); free(o->file.ptr); /* Should be present exactly once in commit chain */ for (p = get_blame_suspects(o->commit); p; l = p, p = p->next) { But that loop at the end of the context won't work, because we're using a non-const pointer "l" to iterate. And that "l" has to be non-const, because we actually modify its next pointer. And here's an even more straightforward example, with an actual solution to make it compile: diff --git a/commit.c b/commit.c index 49d86435a5..ee36a47db2 100644 --- a/commit.c +++ b/commit.c @@ -319,17 +319,19 @@ void repo_unuse_commit_buffer(struct repository *r, const struct commit *commit, const void *buffer) { - struct commit_buffer *v = buffer_slab_peek( + const struct commit_buffer *v = buffer_slab_peek( r->parsed_objects->buffer_slab, commit); if (!(v && v->buffer == buffer)) free((void *)buffer); } void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit) { - struct commit_buffer *v = buffer_slab_peek( + const struct commit_buffer *cv = buffer_slab_peek( pool->buffer_slab, commit); - if (v) { + if (cv->buffer) { + /* we know this isn't a dummy because it has a real buffer */ + struct commit_buffer *v = (struct commit_buffer *)cv; FREE_AND_NULL(v->buffer); v->size = 0; } The first hunk is straight-forward, because we really are treating the slab value as read-only. But in the second we're trying to free the value it points to. Using peek there makes sense, since we wouldn't want to allocate a slab entry just to see that it points to no buffer at all. And the code is correct even without const, because the writing we do is a noop in the case that the buffer struct was already empty. But C's type system doesn't know that, of course. So the workarounds we have to do here and elsewhere are much worse than the few lines of "if (v)" that we got rid of by simplifying peek(). -Peff
diff --git a/commit-slab.h b/commit-slab.h index 69bf0c807c..05b3f2804e 100644 --- a/commit-slab.h +++ b/commit-slab.h @@ -24,7 +24,12 @@ * - int *indegree_peek(struct indegree *, struct commit *); * * This function is similar to indegree_at(), but it will return NULL - * until a call to indegree_at() was made for the commit. + * if the location to store the data associated with the given commit + * has not been allocated yet. + * Note that the location to store the data might have already been + * allocated even if no indegree_at() call has been made for that commit + * yet; in this case this function returns a pointer to a + * zero-initialized location. * * - void init_indegree(struct indegree *); * void init_indegree_with_stride(struct indegree *, int);
Ever since 862e730ec1 (commit-slab: introduce slabname##_peek() function, 2015-05-14) the slabname##_peek() function is documented as: This function is similar to indegree_at(), but it will return NULL until a call to indegree_at() was made for the commit. This, however, is usually not the case. If indegree_at() allocates memory, then it will do so not only for the single commit it got as parameter, but it will allocate a whole new, ~512kB slab. Later on, if any other commit's 'index' field happens to point into an already allocated slab, then indegree_peek() for such a commit will return a valid non-NULL pointer, pointing to a zero-initialized location in the slab, even if no indegree_at() call has been made for that commit yet. Update slabname##_peek()'s documentation to clarify this. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> --- commit-slab.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)