diff mbox series

[1/8] sha1-array: provide oid_array_filter

Message ID 20180921223558.65055-2-sbeller@google.com (mailing list archive)
State New, archived
Headers show
Series fetch: make sure submodule oids are fetched | expand

Commit Message

Stefan Beller Sept. 21, 2018, 10:35 p.m. UTC
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 sha1-array.c | 17 +++++++++++++++++
 sha1-array.h |  9 +++++++++
 2 files changed, 26 insertions(+)

Comments

Ævar Arnfjörð Bjarmason Sept. 22, 2018, 12:58 p.m. UTC | #1
On Fri, Sep 21 2018, Stefan Beller wrote:

> +/*
> + * Apply want to each entry in array, retaining only the entries for
> + * which the function returns true.  Preserve the order of the entries
> + * that are retained.
> + */
> +void oid_array_filter(struct oid_array *array,
> +		      for_each_oid_fn want,
> +		      void *cbdata);
> +
>  #endif /* SHA1_ARRAY_H */

The code LGTM, but this comment should instead be an update to the API
docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects
by type, then SHA-1", 2018-05-10) for an addition of a new function to
this API where I added some new docs.
Stefan Beller Sept. 25, 2018, 7:26 p.m. UTC | #2
On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Fri, Sep 21 2018, Stefan Beller wrote:
>
> > +/*
> > + * Apply want to each entry in array, retaining only the entries for
> > + * which the function returns true.  Preserve the order of the entries
> > + * that are retained.
> > + */
> > +void oid_array_filter(struct oid_array *array,
> > +                   for_each_oid_fn want,
> > +                   void *cbdata);
> > +
> >  #endif /* SHA1_ARRAY_H */
>
> The code LGTM, but this comment should instead be an update to the API
> docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects
> by type, then SHA-1", 2018-05-10) for an addition of a new function to
> this API where I added some new docs.

ok will fix for consistency (this whole API is there).

Longer term (I thought) we were trying to migrate API docs
to headers instead?

Stefan
Jeff King Sept. 26, 2018, 4:15 a.m. UTC | #3
On Tue, Sep 25, 2018 at 12:26:44PM -0700, Stefan Beller wrote:

> On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> >
> > On Fri, Sep 21 2018, Stefan Beller wrote:
> >
> > > +/*
> > > + * Apply want to each entry in array, retaining only the entries for
> > > + * which the function returns true.  Preserve the order of the entries
> > > + * that are retained.
> > > + */
> > > +void oid_array_filter(struct oid_array *array,
> > > +                   for_each_oid_fn want,
> > > +                   void *cbdata);
> > > +
> > >  #endif /* SHA1_ARRAY_H */
> >
> > The code LGTM, but this comment should instead be an update to the API
> > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects
> > by type, then SHA-1", 2018-05-10) for an addition of a new function to
> > this API where I added some new docs.
> 
> ok will fix for consistency (this whole API is there).
> 
> Longer term (I thought) we were trying to migrate API docs
> to headers instead?

Yes, please. I think it prevents exactly this sort of confusion. :)

-Peff
Junio C Hamano Sept. 26, 2018, 5:10 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> On Tue, Sep 25, 2018 at 12:26:44PM -0700, Stefan Beller wrote:
>
>> On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason
>> <avarab@gmail.com> wrote:
>> >
>> >
>> > On Fri, Sep 21 2018, Stefan Beller wrote:
>> >
>> > > +/*
>> > > + * Apply want to each entry in array, retaining only the entries for
>> > > + * which the function returns true.  Preserve the order of the entries
>> > > + * that are retained.
>> > > + */
>> > > +void oid_array_filter(struct oid_array *array,
>> > > +                   for_each_oid_fn want,
>> > > +                   void *cbdata);
>> > > +
>> > >  #endif /* SHA1_ARRAY_H */
>> >
>> > The code LGTM, but this comment should instead be an update to the API
>> > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects
>> > by type, then SHA-1", 2018-05-10) for an addition of a new function to
>> > this API where I added some new docs.
>> 
>> ok will fix for consistency (this whole API is there).
>> 
>> Longer term (I thought) we were trying to migrate API docs
>> to headers instead?
>
> Yes, please. I think it prevents exactly this sort of confusion. :)

CodingGuidelines or SubmittingPatches update, perhaps?

 Documentation/CodingGuidelines | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 48aa4edfbd..b54684e807 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -358,7 +358,11 @@ For C programs:
    string_list for sorted string lists, a hash map (mapping struct
    objects) named "struct decorate", amongst other things.
 
- - When you come up with an API, document it.
+ - When you come up with an API, document it.  It used to be
+   encouraged to do so in Documentation/technical/, and the birds-eye
+   level overview may still be more suitable there, but detailed
+   function-by-function level of documentation is done by comments in
+   corresponding .h files these days.
 
  - The first #include in C files, except in platform specific compat/
    implementations, must be either "git-compat-util.h", "cache.h" or
Ævar Arnfjörð Bjarmason Sept. 26, 2018, 5:49 p.m. UTC | #5
On Wed, Sep 26 2018, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
>> On Tue, Sep 25, 2018 at 12:26:44PM -0700, Stefan Beller wrote:
>>
>>> On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason
>>> <avarab@gmail.com> wrote:
>>> >
>>> >
>>> > On Fri, Sep 21 2018, Stefan Beller wrote:
>>> >
>>> > > +/*
>>> > > + * Apply want to each entry in array, retaining only the entries for
>>> > > + * which the function returns true.  Preserve the order of the entries
>>> > > + * that are retained.
>>> > > + */
>>> > > +void oid_array_filter(struct oid_array *array,
>>> > > +                   for_each_oid_fn want,
>>> > > +                   void *cbdata);
>>> > > +
>>> > >  #endif /* SHA1_ARRAY_H */
>>> >
>>> > The code LGTM, but this comment should instead be an update to the API
>>> > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects
>>> > by type, then SHA-1", 2018-05-10) for an addition of a new function to
>>> > this API where I added some new docs.
>>>
>>> ok will fix for consistency (this whole API is there).
>>>
>>> Longer term (I thought) we were trying to migrate API docs
>>> to headers instead?
>>
>> Yes, please. I think it prevents exactly this sort of confusion. :)
>
> CodingGuidelines or SubmittingPatches update, perhaps?
>
>  Documentation/CodingGuidelines | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> index 48aa4edfbd..b54684e807 100644
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -358,7 +358,11 @@ For C programs:
>     string_list for sorted string lists, a hash map (mapping struct
>     objects) named "struct decorate", amongst other things.
>
> - - When you come up with an API, document it.
> + - When you come up with an API, document it.  It used to be
> +   encouraged to do so in Documentation/technical/, and the birds-eye
> +   level overview may still be more suitable there, but detailed
> +   function-by-function level of documentation is done by comments in
> +   corresponding .h files these days.
>
>   - The first #include in C files, except in platform specific compat/
>     implementations, must be either "git-compat-util.h", "cache.h" or

Thanks. I had not looked at this closely and was under the false
impression that it was going in the other direction. Good to have it
clarified.
Junio C Hamano Sept. 26, 2018, 6:27 p.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Sep 26 2018, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>>
>>> Yes, please. I think it prevents exactly this sort of confusion. :)
>>
>> CodingGuidelines or SubmittingPatches update, perhaps?
>>
>>  Documentation/CodingGuidelines | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
>> index 48aa4edfbd..b54684e807 100644
>> --- a/Documentation/CodingGuidelines
>> +++ b/Documentation/CodingGuidelines
>> @@ -358,7 +358,11 @@ For C programs:
>>     string_list for sorted string lists, a hash map (mapping struct
>>     objects) named "struct decorate", amongst other things.
>>
>> - - When you come up with an API, document it.
>> + - When you come up with an API, document it.  It used to be
>> +   encouraged to do so in Documentation/technical/, and the birds-eye
>> +   level overview may still be more suitable there, but detailed
>> +   function-by-function level of documentation is done by comments in
>> +   corresponding .h files these days.
>>
>>   - The first #include in C files, except in platform specific compat/
>>     implementations, must be either "git-compat-util.h", "cache.h" or
>
> Thanks. I had not looked at this closely and was under the false
> impression that it was going in the other direction. Good to have it
> clarified.

Heh, I knew people were in favor of one over the other but until
Peff chimed in to this thread, I didn't recall which one was
preferred, partly because I personally do not see a huge advantage
in using in-code comments as docs for programmers, and do not like
having to read them as in-code comments.

If somebody wants to wordsmith the text and send in a patch with
good log message, please do so, as I myself am not sure if what I
wrote is the consensus position.  It could be that they want to have
even birds-eye overview in the header files.
Jeff King Sept. 26, 2018, 6:34 p.m. UTC | #7
On Wed, Sep 26, 2018 at 11:27:53AM -0700, Junio C Hamano wrote:

> >> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> >> index 48aa4edfbd..b54684e807 100644
> >> --- a/Documentation/CodingGuidelines
> >> +++ b/Documentation/CodingGuidelines
> >> @@ -358,7 +358,11 @@ For C programs:
> >>     string_list for sorted string lists, a hash map (mapping struct
> >>     objects) named "struct decorate", amongst other things.
> >>
> >> - - When you come up with an API, document it.
> >> + - When you come up with an API, document it.  It used to be
> >> +   encouraged to do so in Documentation/technical/, and the birds-eye
> >> +   level overview may still be more suitable there, but detailed
> >> +   function-by-function level of documentation is done by comments in
> >> +   corresponding .h files these days.
> >>
> >>   - The first #include in C files, except in platform specific compat/
> >>     implementations, must be either "git-compat-util.h", "cache.h" or
> >
> > Thanks. I had not looked at this closely and was under the false
> > impression that it was going in the other direction. Good to have it
> > clarified.
> 
> Heh, I knew people were in favor of one over the other but until
> Peff chimed in to this thread, I didn't recall which one was
> preferred, partly because I personally do not see a huge advantage
> in using in-code comments as docs for programmers, and do not like
> having to read them as in-code comments.
> 
> If somebody wants to wordsmith the text and send in a patch with
> good log message, please do so, as I myself am not sure if what I
> wrote is the consensus position.  It could be that they want to have
> even birds-eye overview in the header files.

Yes, I would say that everything should go into the header files. For
the same reason that the function descriptions should go there: by being
close to the thing being changed, it is more likely that authors will
remember to adjust the documentation. It's not exactly literate
programming, but it's a step in that direction.

That's just my opinion, of course. I've been very happy so far with the
documentation that we have transitioned. We talked a while ago about a
script to extract the comments into a "just documentation" and build it
as asciidoc, but I doubt I would use such a thing myself.

I do agree in general that mentioning this in CodingGuidelines is a good
idea.

-Peff
Ævar Arnfjörð Bjarmason Sept. 26, 2018, 6:43 p.m. UTC | #8
On Wed, Sep 26 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Wed, Sep 26 2018, Junio C Hamano wrote:
>>
>>> Jeff King <peff@peff.net> writes:
>>>>
>>>> Yes, please. I think it prevents exactly this sort of confusion. :)
>>>
>>> CodingGuidelines or SubmittingPatches update, perhaps?
>>>
>>>  Documentation/CodingGuidelines | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
>>> index 48aa4edfbd..b54684e807 100644
>>> --- a/Documentation/CodingGuidelines
>>> +++ b/Documentation/CodingGuidelines
>>> @@ -358,7 +358,11 @@ For C programs:
>>>     string_list for sorted string lists, a hash map (mapping struct
>>>     objects) named "struct decorate", amongst other things.
>>>
>>> - - When you come up with an API, document it.
>>> + - When you come up with an API, document it.  It used to be
>>> +   encouraged to do so in Documentation/technical/, and the birds-eye
>>> +   level overview may still be more suitable there, but detailed
>>> +   function-by-function level of documentation is done by comments in
>>> +   corresponding .h files these days.
>>>
>>>   - The first #include in C files, except in platform specific compat/
>>>     implementations, must be either "git-compat-util.h", "cache.h" or
>>
>> Thanks. I had not looked at this closely and was under the false
>> impression that it was going in the other direction. Good to have it
>> clarified.
>
> Heh, I knew people were in favor of one over the other but until
> Peff chimed in to this thread, I didn't recall which one was
> preferred, partly because I personally do not see a huge advantage
> in using in-code comments as docs for programmers, and do not like
> having to read them as in-code comments.
>
> If somebody wants to wordsmith the text and send in a patch with
> good log message, please do so, as I myself am not sure if what I
> wrote is the consensus position.  It could be that they want to have
> even birds-eye overview in the header files.

While we're on that subject, I'd much prefer if these technical docs and
other asciidoc-ish stuff we would be manpages build as part of our
normal "make doc" target. So e.g. this one would be readable via
something like "man 3 gitapi-oid-array".

They could still be maintained along with the code given a sufficiently
smart doc build system, e.g. perl does that:
https://github.com/Perl/perl5/blob/v5.26.0/av.c#L294-L387

Although doing that in some readable and sane way would mean getting rid
of your beloved multi-line /* ... */ comment formatting, at least for
that case. It would be a pain to have everyone configure their editor to
word-wrap lines with leading "*" without screwing up the asciidoc
format.
Jeff King Sept. 26, 2018, 6:58 p.m. UTC | #9
On Wed, Sep 26, 2018 at 08:43:14PM +0200, Ævar Arnfjörð Bjarmason wrote:

> While we're on that subject, I'd much prefer if these technical docs and
> other asciidoc-ish stuff we would be manpages build as part of our
> normal "make doc" target. So e.g. this one would be readable via
> something like "man 3 gitapi-oid-array".

I'm mildly negative on this, just because it introduces extra work on
people writing the documentation. Now it has to follow special
formatting rules, and sometimes the source is uglier (e.g., the horrible
+-continuation in lists). Are authors now responsible for formatting any
changes they make to make sure they look good in asciidoc? Or are people
who care about the formatted output going to come along afterwards and
submit fixup patches? Either way it seems like make-work.

Now I'll admit it seems like make-work to me because I would not plan to
ever look at the formatted output myself. But I guess I don't understand
the audience for this formatted output. These are APIs internal to Git
itself. We would not generally want to install gitapi-oid-array into
/usr/share/man, because only people actually working on Git would be
able to use it. So it sounds like a convenience for a handful of
developers (who like to look at this manpage versus the source). It
doesn't seem like the cost/benefit is there.

And if we were going to generate something external, would it make more
sense to write in a structured format like doxygen? I am not a big fan
of it myself, but at least from there you can generate a more richly
interconnected set of documentation.

-Peff
Stefan Beller Sept. 26, 2018, 7:39 p.m. UTC | #10
On Wed, Sep 26, 2018 at 11:58 AM Jeff King <peff@peff.net> wrote:

> Now I'll admit it seems like make-work to me because I would not plan to
> ever look at the formatted output myself. But I guess I don't understand
> the audience for this formatted output.

I have been watching from the side lines so far, as I have a view that is very
much like Peffs, if I were to dictate my opinion onto the project, we'd:
* put all internal API docs into headerfiles.
* get rid of all Documentation/technical/ that describes things
   at a function level. So the remaining things are high level docs such
   as protocol, hash transition plan, partial clones...

But I'd want to understand why we are not doing this (obvious to me)
best thing, I have the impression other people use the docs differently.

How are these docs (for example api-oid-array or
api-revision-walking) used?

I usually tend to use git-grep a lot when looking for a function, just to
jump to the header file and read the comment there and go around the
header file looking if we have a better suited thing.
(Also I tend to like to have fewer files open, such that I prefer
a header file of reasonable size over 2 files, one docs and one header)

So when I find a function documented in Docs/technical, I would first
read there, then go to the header to see the signature and then make
use of it.

If I recall an earlier discussion on this topic, I recall Junio saying that
this *may* pollute the header files, as for example sha1-array.h
is easy to grok in its entirety, and makes it easy to see what
functions regarding oid_arrays are available. A counter example would
be hashmap.h, as that requires some scrolling and skimming.

> These are APIs internal to Git
> itself. We would not generally want to install gitapi-oid-array into
> /usr/share/man, because only people actually working on Git would be
> able to use it. So it sounds like a convenience for a handful of
> developers (who like to look at this manpage versus the source). It
> doesn't seem like the cost/benefit is there.

If this type of docs makes Ævar more productive, I am all for keeping it.

> And if we were going to generate something external, would it make more
> sense to write in a structured format like doxygen? I am not a big fan
> of it myself, but at least from there you can generate a more richly
> interconnected set of documentation.

That sounds sensible to me as the additional burden of yet-another-tool
is only imposed to the developers, but doxygen would provide
a better output in my experience.

Stefan
Junio C Hamano Sept. 26, 2018, 7:49 p.m. UTC | #11
Jeff King <peff@peff.net> writes:

> Now I'll admit it seems like make-work to me because I would not plan to
> ever look at the formatted output myself. But I guess I don't understand
> the audience for this formatted output. These are APIs internal to Git
> itself. We would not generally want to install gitapi-oid-array into
> /usr/share/man, because only people actually working on Git would be
> able to use it. So it sounds like a convenience for a handful of
> developers (who like to look at this manpage versus the source). It
> doesn't seem like the cost/benefit is there.
>
> And if we were going to generate something external, would it make more
> sense to write in a structured format like doxygen? I am not a big fan
> of it myself, but at least from there you can generate a more richly
> interconnected set of documentation.

I agree on both counts.  I just like to read these in plain text
while I am coding for Git (or reviewing patches coded for Git).  

The reason why I have mild preference to D/technical/ over in-header
doc is only because I find even these asterisks at the left-side-end
distracting; it is not that materials in D/technical could be passed
through AsciiDoc.
Stefan Beller Sept. 26, 2018, 7:59 p.m. UTC | #12
> I agree on both counts.  I just like to read these in plain text
> while I am coding for Git (or reviewing patches coded for Git).
>
> The reason why I have mild preference to D/technical/ over in-header
> doc is only because I find even these asterisks at the left-side-end
> distracting; it is not that materials in D/technical could be passed
> through AsciiDoc.

    /**
    Would we be open to consider another commenting style?
    AFAICT the asterisks are a cultural thing and not enforced by
    and old compiler not understanding these comments.

    Just like ending the comment in an asymetric fashion ;-) */

(I am not seriously proposing unbalanced comments, but for
longer documenting comments we may want to consider,
omitting the asterisk?)
Junio C Hamano Sept. 26, 2018, 8:19 p.m. UTC | #13
Stefan Beller <sbeller@google.com> writes:

>> I agree on both counts.  I just like to read these in plain text
>> while I am coding for Git (or reviewing patches coded for Git).
>>
>> The reason why I have mild preference to D/technical/ over in-header
>> doc is only because I find even these asterisks at the left-side-end
>> distracting; it is not that materials in D/technical could be passed
>> through AsciiDoc.
>
>     /**
>     Would we be open to consider another commenting style?

No.  You still cannot write */ in that comment section, for example,
so you cannot do "plain text" still---that is not a great trade off
to deviate from the common comment style used for other stuff.

When I say I want plain text, I mean it.  Anything you write in .h
cannot be plain text, unless you make all readers agree with some
convention, and "there is always '*' at the left edge" is such an
acceptable convention can learn to live with, just like everybody
else does ; at least, that is consistent with other comments.

> (I am not seriously proposing unbalanced comments, but for
> longer documenting comments we may want to consider,
> omitting the asterisk?)

If this is not a serious proposal, then I'll stop wasting
everybody's time.

Now it is clear that the only pieces of consensus we got out of this
discussion so far is that we want to add to CodingGuidelines, that
my earlier "something like this" is not the text we want and that we
want everything in the header as comment.

So let's see if we can have usable alternative, apply that and move
on.
Ævar Arnfjörð Bjarmason Sept. 26, 2018, 8:44 p.m. UTC | #14
[Changing subject because this stopped being about Stefan's patches a
while ago]

On Wed, Sep 26 2018, Jeff King wrote:

> On Wed, Sep 26, 2018 at 08:43:14PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> While we're on that subject, I'd much prefer if these technical docs and
>> other asciidoc-ish stuff we would be manpages build as part of our
>> normal "make doc" target. So e.g. this one would be readable via
>> something like "man 3 gitapi-oid-array".

[responding out of order]

> Now I'll admit it seems like make-work to me because I would not plan to
> ever look at the formatted output myself. But I guess I don't understand
> the audience for this formatted output. These are APIs internal to Git
> itself. We would not generally want to install gitapi-oid-array into
> /usr/share/man, because only people actually working on Git would be
> able to use it. So it sounds like a convenience for a handful of
> developers (who like to look at this manpage versus the source). It
> doesn't seem like the cost/benefit is there.
>
> And if we were going to generate something external, would it make more
> sense to write in a structured format like doxygen? I am not a big fan
> of it myself, but at least from there you can generate a more richly
> interconnected set of documentation.

It's useful to have a single authoritative source for all documentation
that's easy to search through.

I don't really care where that text actually lives, i.e. whether it's
all in *.txt to begin with, or in *.c or *.h and pulled together by some
"make" step, or much how it's formatted, beyond it being easy to work
with. I'm not a big fan of asciidoc, but it's what we've got if we want
formatting, inter-linking etc.

My bias here is that I've also contributed actively to the perl project
in the past, and with that project you can get an overview of *all* of
the docs by typing:

    man perl

That includes stuff like perl585delta(1) which we'd stick in
Documentation/RelNotes, and "Internals and C Language Interface". Most
of what we'd put in Documentation/technical/api-* & headers is in
perlapi(1).

I spent an embarrassingly long time submitting patches to git before
discovering that Documentation/technical/api-*.txt even existed, I think
something like 1-2 years ago.

We have very well documented stuff like strbuf.h (mostly thanks to you),
but how is such documentation supposed to be discovered? Something like:

    git grep -A20 '^/\*$' -- *.h
    <hold in page down>

?

In terms of getting an overview it's indistinguishable from
comments. I.e. there's nothing like an index of:

    man git-api-strbuf     ==> working with strings
    man git-api-sha1-array ==> list, iterate and binary lookup SHA1s

etc.

Sometimes it's obvious where to look, but as someone who uses most of
these APIs very occasionally (because I contribute less) I keep
(re-)discovering various internal APIs.

I'm also not in the habit of opening the *.h file for something, I
usually just start reading the *.c and only later discover there's some
API docs in the relevant *.h (or not).

So in terms of implementation I'm a big fan of the perl.git approach of
having these docs as comments before the function implementation in the
*.c file.

It means you can't avoid seeing it or updating it when source
spelunking, and it leaves header files short, which is useful when you'd
like to get a general API overview without all the docs. Our documented
headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but
less than 20% when you strip the docs.

> I'm mildly negative on this, just because it introduces extra work on
> people writing the documentation. Now it has to follow special
> formatting rules, and sometimes the source is uglier (e.g., the horrible
> +-continuation in lists). Are authors now responsible for formatting any
> changes they make to make sure they look good in asciidoc? Or are people
> who care about the formatted output going to come along afterwards and
> submit fixup patches? Either way it seems like make-work.

This part I'm slightly confused by. If you're just saying "let's
document stuff in *.h files in free-flowing text", fine. But if we're
talking about the difference between Documentation/technical/api-*.txt
and having the same stuff in manpages, then the stuff in api-*.txt *is*
already asciidoc, and we have a target to format it all up and ship it
with git, and distros ship that.

E.g. on Debian you can "apt install git-doc" and browse stuff like
file:///usr/share/doc/git-doc/technical/api-argv-array.html which is the
HTML formatted version, same for all the other api-*.txt docs.

We just don't have some central index of those, and they're not a
default target which means the likes of our own https://git-scm.com
don't build them, so they're harder to find.

Every perl installation also ships perlapi and a MB or so of obscure
internal docs to everyone, which makes it easier to find via Google et
al, which I find useful. So I guess I'm more on the side fence of
dropping a hypothetical gitapi-oid-array into /usr/share/man than you
are.

ANYWAY

This E-mail is much longer than I intended, sorry about that. I didn't
just mean to bitch about how we ship docs, but I was wondering if there
was a desire to move to something like what I've outlined above, or
whether the status quo was mostly by design and intended.

I just thought I'd write this rather lengthy E-Mails because this is one
of the rare areas where you can fully describe an idea in E-Mail without
writing any patches.

If the consensus is that something like the exhaustive index "perldoc
perl" provides wouldn't be useful for git I can just drop this, but if
people are enthusiastic about having that it would be useful to work on
it...
Jeff King Sept. 26, 2018, 8:51 p.m. UTC | #15
On Wed, Sep 26, 2018 at 01:19:20PM -0700, Junio C Hamano wrote:

> >     /**
> >     Would we be open to consider another commenting style?
> 
> No.  You still cannot write */ in that comment section, for example,
> so you cannot do "plain text" still---that is not a great trade off
> to deviate from the common comment style used for other stuff.

We can do:

#if HEADER_DOC

Write anything here except #endif!

#endif /* HEADER_DOC */

But I actually think that is more jarring than a big comment (when I am
reading C code, I expect to see C-like things mostly.

I do agree with you that a true plain text file is nicer in terms of
freedom to format, etc. But IMHO the tradeoff is easily worth it to keep
the related content (function declarations and their documentation)
close together.

-Peff
Stefan Beller Sept. 26, 2018, 9:22 p.m. UTC | #16
> > (I am not seriously proposing unbalanced comments, but for
> > longer documenting comments we may want to consider,
> > omitting the asterisk?)
>
> If this is not a serious proposal, then I'll stop wasting
> everybody's time.

I proposed two things, one of which I was not serious about.

The other has downsides as both you and Peff point out.
So living with comments may not be the worst after all.

I'll read&reply to Aevars proposal now.
Junio C Hamano Sept. 26, 2018, 9:40 p.m. UTC | #17
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> So in terms of implementation I'm a big fan of the perl.git approach of
> having these docs as comments before the function implementation in the
> *.c file.
>
> It means you can't avoid seeing it or updating it when source
> spelunking, and it leaves header files short, which is useful when you'd
> like to get a general API overview without all the docs. Our documented
> headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but
> less than 20% when you strip the docs.

Just like you I do not care too much where the authoritative source
lives, and I do agree with Peff that separate text file is much more
likely to go stale wrt the code, and that it is an acceptable trade
off to write docs in comment and live with slightly less freedom
than writing in plain text in order to keep the docs and source
closer together.  I do not think between Peff and me, neither of us
were contrasting in-header vs in-code comments.

And I tend to agree that it makes it even harder to let doc and code
drift apart to have the doc near the implementation (as opposed to
in the header).  It probably makes the result less useful, unless
the project invests in making the result accessible like "man
perlapi" does, than having them in the header for people who view
headers as guide to users of API, though.

> We just don't have some central index of those, and they're not a
> default target which means the likes of our own https://git-scm.com
> don't build them, so they're harder to find.
>
> Every perl installation also ships perlapi and a MB or so of obscure
> internal docs to everyone, which makes it easier to find via Google et
> al, which I find useful. So I guess I'm more on the side fence of
> dropping a hypothetical gitapi-oid-array into /usr/share/man than you
> are.

Regardless of where we place docs (between .c/.h or .txt), making
them indexable and investing in make target to actually produce
these docs with ToC is different matter.  I do not think people who
actually spent time on our docs had enough inclination to do so
themselves, but that should not prevent you or other like-minded
people from leading by example.
Stefan Beller Sept. 26, 2018, 11:21 p.m. UTC | #18
On Wed, Sep 26, 2018 at 1:44 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:

> > And if we were going to generate something external, would it make more
> > sense to write in a structured format like doxygen? I am not a big fan
> > of it myself, but at least from there you can generate a more richly
> > interconnected set of documentation.
>
> It's useful to have a single authoritative source for all documentation
> that's easy to search through.

If that is the case I would propose to keep it all in header files and
organize the headers.

> That includes stuff like perl585delta(1) which we'd stick in
> Documentation/RelNotes, and "Internals and C Language Interface". Most
> of what we'd put in Documentation/technical/api-* & headers is in
> perlapi(1).

This seems cool, but was also a recent introduction?
perl400delta seems to yield nothing for me (which may be because
I do not have an old version of perl installed?)

>
> Sometimes it's obvious where to look, but as someone who uses most of
> these APIs very occasionally (because I contribute less) I keep
> (re-)discovering various internal APIs.

Sometimes I have the same feeling. Maybe more structure in the
source files would help (e.g. datastructures/{strbuf, string-list}.h
and objects/{packfile.h, sha1*} ?

> Every perl installation also ships perlapi and a MB or so of obscure
> internal docs to everyone, which makes it easier to find via Google et
> al, which I find useful. So I guess I'm more on the side fence of
> dropping a hypothetical gitapi-oid-array into /usr/share/man than you
> are.
>

Personally I would not want to ship our internal docs everywhere
as it seems like overly wasteful. But then, only my early days
of childhood were guided by Internet that is not available anywhere
and everywhere. Today I would just take for granted that I can lookup
things in github/git/git when I am in not at my regular desk.

> ANYWAY
>
> This E-mail is much longer than I intended, sorry about that. I didn't
> just mean to bitch about how we ship docs, but I was wondering if there
> was a desire to move to something like what I've outlined above, or
> whether the status quo was mostly by design and intended.
>
> I just thought I'd write this rather lengthy E-Mails because this is one
> of the rare areas where you can fully describe an idea in E-Mail without
> writing any patches.
>
> If the consensus is that something like the exhaustive index "perldoc
> perl" provides wouldn't be useful for git I can just drop this, but if
> people are enthusiastic about having that it would be useful to work on
> it...

I agree with Junio, as that the discoverability is unrelated to where to store
the actual docs, Documentation/technical/* has the advantage that it
only has "good" stuff, i.e. some topic that someone cared enough to
write about and it is easy to guess if it is relevant to your need.
In *.h, we have a lot of false positives (xdiff-interface.h or cache.h
just pollute the searching space when looking for suitable storage
classes.)

So I wonder if we'd want to have a list (similar as in
command-list.txt) that describes the header files and gives
some classification of them, for example one class could be the
data structures (strbuf, stringlist, hashmap), algorithms
(diff, range-diff), OS specific stuff (run-command, trace, alloc)
or Git specific things (apply, fetch, submodule)
Jeff King Sept. 27, 2018, 6:20 a.m. UTC | #19
On Wed, Sep 26, 2018 at 10:44:33PM +0200, Ævar Arnfjörð Bjarmason wrote:

> My bias here is that I've also contributed actively to the perl project
> in the past, and with that project you can get an overview of *all* of
> the docs by typing:
> 
>     man perl
> 
> That includes stuff like perl585delta(1) which we'd stick in
> Documentation/RelNotes, and "Internals and C Language Interface". Most
> of what we'd put in Documentation/technical/api-* & headers is in
> perlapi(1).

I like the perl documentation, too. But I think most of what you're
talking about there is the public API, where you have many readers. But
here we're talking about internal functions (albeit ones we hope to see
reused across the codebase). My concern is mostly: is the work in
maintaining this documentation-formatting system worth the number of
readers it will get?

There are two numbers there, and I'm guessing at both of them. ;)

If you're interested in pulling documentation out of the header files
and generating asciidoc from it, I'm happy to at least try keeping it up
to date. When we started putting this information into header files, we
used "/**" to start the comment, as a special marker to indicate it was
worth pulling out. I don't know how well we've maintained that
convention, but it's a starting point.

> I spent an embarrassingly long time submitting patches to git before
> discovering that Documentation/technical/api-*.txt even existed, I think
> something like 1-2 years ago.

That's another thing that I think is improved by keeping the
documentation and code together. If you find one, you find the other.
You just have to "somehow" find one, which is what you're getting at
below.

> We have very well documented stuff like strbuf.h (mostly thanks to you),
> but how is such documentation supposed to be discovered? Something like:
> 
>     git grep -A20 '^/\*$' -- *.h
>     <hold in page down>
> 
> ?
> 
> In terms of getting an overview it's indistinguishable from
> comments. I.e. there's nothing like an index of:
> 
>     man git-api-strbuf     ==> working with strings
>     man git-api-sha1-array ==> list, iterate and binary lookup SHA1s

I agree that is a problem, especially for contributors less familiar
with the code base. But I think generating an index is a separate (and
much easier) problem than formatting all of the documentation.

We already have the /** convention I mentioned above. Could we have
another micro-format like:

  /** API:strbuf - working with strings */

in each header file? That would make generating such an index pretty
trivial.

> I'm also not in the habit of opening the *.h file for something, I
> usually just start reading the *.c and only later discover there's some
> API docs in the relevant *.h (or not).

Interesting. I'm not totally opposed to putting the documentation
alongside the actual code. It does feel a bit cluttered to me, but I
think you're right that it keeps everything as close together as
possible.

> It means you can't avoid seeing it or updating it when source
> spelunking, and it leaves header files short, which is useful when you'd
> like to get a general API overview without all the docs. Our documented
> headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but
> less than 20% when you strip the docs.

I don't use folds in my editor, and I guess nobody else in this thread
does, either. But they may be a reasonable tool for "wow, there are
comments, declarations, and definitions all together and I just want to
view one of them". In vim, try "set foldmethod=syntax" and then "zc" to
close the folds.

I kind of hate it myself, but just another option to explore.

> > I'm mildly negative on this, just because it introduces extra work on
> > people writing the documentation. Now it has to follow special
> > formatting rules, and sometimes the source is uglier (e.g., the horrible
> > +-continuation in lists). Are authors now responsible for formatting any
> > changes they make to make sure they look good in asciidoc? Or are people
> > who care about the formatted output going to come along afterwards and
> > submit fixup patches? Either way it seems like make-work.
> 
> This part I'm slightly confused by. If you're just saying "let's
> document stuff in *.h files in free-flowing text", fine. But if we're
> talking about the difference between Documentation/technical/api-*.txt
> and having the same stuff in manpages, then the stuff in api-*.txt *is*
> already asciidoc, and we have a target to format it all up and ship it
> with git, and distros ship that.

Yes, the "free-flowing text" thing is more or less what I'm saying. I
suspect what's in technical/* currently probably has formatting
problems, and people update it (if we're lucky!) without regard to what
the result looks like.

> E.g. on Debian you can "apt install git-doc" and browse stuff like
> file:///usr/share/doc/git-doc/technical/api-argv-array.html which is the
> HTML formatted version, same for all the other api-*.txt docs.

IMHO this is just silly. What am I as an end user going to do with
api-argv-array.html?

> This E-mail is much longer than I intended, sorry about that. I didn't
> just mean to bitch about how we ship docs, but I was wondering if there
> was a desire to move to something like what I've outlined above, or
> whether the status quo was mostly by design and intended.

I think we're actually half-way between status quos. The stuff in
technical/api-* has existed for a long time. We've been slowly moving
stuff over to header files, but there's still a bit of stuff there.

Building those files as asciidoc is one of those things that just
cropped up along the way. I have never thought it was useful myself, but
somebody bothered to write the patches, so OK. The person who did so is
not even somebody who has otherwise worked on the project, AFAICT.

-Peff
Jonathan Nieder Sept. 27, 2018, 6:34 a.m. UTC | #20
Hi,

Jeff King wrote:
> On Wed, Sep 26, 2018 at 10:44:33PM +0200, Ævar Arnfjörð Bjarmason wrote:

>> In terms of getting an overview it's indistinguishable from
>> comments. I.e. there's nothing like an index of:
>>
>>     man git-api-strbuf     ==> working with strings
>>     man git-api-sha1-array ==> list, iterate and binary lookup SHA1s
>
> I agree that is a problem, especially for contributors less familiar
> with the code base. But I think generating an index is a separate (and
> much easier) problem than formatting all of the documentation.
>
> We already have the /** convention I mentioned above. Could we have
> another micro-format like:
>
>   /** API:strbuf - working with strings */
>
> in each header file? That would make generating such an index pretty
> trivial.

Can you spell out the problem for me a little more?  E.g. if we had a
convention that the first comment in strbuf.h should say

	/* strbuf - Git's standard string type */

or even just

	/* Git's standard string type */

would that do the trick?

>> I'm also not in the habit of opening the *.h file for something, I
>> usually just start reading the *.c and only later discover there's some
>> API docs in the relevant *.h (or not).
>
> Interesting. I'm not totally opposed to putting the documentation
> alongside the actual code. It does feel a bit cluttered to me, but I
> think you're right that it keeps everything as close together as
> possible.

I've experienced projects following both conventions.  One thing I like
a lot about documentation in the header file is that it encourages
people to make the API documentation self-contained.  That is, you
describe the contract in a way that doesn't require reading the
implementation for details.

It took me a while to get used to this kind of convention but now I
really like it.  So I really do prefer to keep putting API
documentation in the header files in Git.  (Implementation
documentation in the source files is of course also very welcome.)

>> It means you can't avoid seeing it or updating it when source
>> spelunking, and it leaves header files short, which is useful when you'd
>> like to get a general API overview without all the docs. Our documented
>> headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but
>> less than 20% when you strip the docs.
>
> I don't use folds in my editor, and I guess nobody else in this thread
> does, either. But they may be a reasonable tool for "wow, there are
> comments, declarations, and definitions all together and I just want to
> view one of them". In vim, try "set foldmethod=syntax" and then "zc" to
> close the folds.

I use kythe to get an outline view of the header files.

[...]
>> E.g. on Debian you can "apt install git-doc" and browse stuff like
>> file:///usr/share/doc/git-doc/technical/api-argv-array.html which is the
>> HTML formatted version, same for all the other api-*.txt docs.
>
> IMHO this is just silly. What am I as an end user going to do with
> api-argv-array.html?

In Debian we just ship all the docs.  For something like
technical/pack-heuristics, it's quite nice.  For the API docs, I think
they belong in the headers.

Thanks,
Jonathan
Jeff King Sept. 27, 2018, 6:40 a.m. UTC | #21
On Wed, Sep 26, 2018 at 11:34:04PM -0700, Jonathan Nieder wrote:

> > We already have the /** convention I mentioned above. Could we have
> > another micro-format like:
> >
> >   /** API:strbuf - working with strings */
> >
> > in each header file? That would make generating such an index pretty
> > trivial.
> 
> Can you spell out the problem for me a little more?  E.g. if we had a
> convention that the first comment in strbuf.h should say
> 
> 	/* strbuf - Git's standard string type */
> 
> or even just
> 
> 	/* Git's standard string type */
> 
> would that do the trick?

I assumed that not all header files would want to advertise themselves
as a public API, so we'd want to some way to differentiate these from
first comments in "other" header files (and I assumed we did not want a
separate list of "these are the API files to go into the index", at
which point you might as well just write "standard string type" in that
list).

But I'm happy with any convention that lets Ævar generate the index he
wants.

> > Interesting. I'm not totally opposed to putting the documentation
> > alongside the actual code. It does feel a bit cluttered to me, but I
> > think you're right that it keeps everything as close together as
> > possible.
> 
> I've experienced projects following both conventions.  One thing I like
> a lot about documentation in the header file is that it encourages
> people to make the API documentation self-contained.  That is, you
> describe the contract in a way that doesn't require reading the
> implementation for details.
> 
> It took me a while to get used to this kind of convention but now I
> really like it.  So I really do prefer to keep putting API
> documentation in the header files in Git.  (Implementation
> documentation in the source files is of course also very welcome.)

Yeah, I'd agree with all of that.

> I use kythe to get an outline view of the header files.

Looks neat. Doesn't seem to be package for Debian, though. ;)

> In Debian we just ship all the docs.  For something like
> technical/pack-heuristics, it's quite nice.  For the API docs, I think
> they belong in the headers.

Yes, I'd agree with that. About half of technical/* is design docs, etc,
that might be of use to random folks. But the internal code APIs are
really only useful if you have an actual clone of git.git.

-Peff
Jeff King Sept. 27, 2018, 6:48 a.m. UTC | #22
On Wed, Sep 26, 2018 at 02:58:12PM -0400, Jeff King wrote:

> And if we were going to generate something external, would it make more
> sense to write in a structured format like doxygen? I am not a big fan
> of it myself, but at least from there you can generate a more richly
> interconnected set of documentation.

By the way, I tried running Doxygen on the current code-base. I _still_
hate the presentation of it (mostly because it's jarring to jump out of
my terminal to a web browser). But it actually does an OK job of showing
strbuf.h, with our overview attached to the struct and individual
functions correctly documented. There are a couple of formatting
hiccups, but it's actually pretty close.

-Peff
Ævar Arnfjörð Bjarmason Sept. 27, 2018, 8:58 a.m. UTC | #23
On Wed, Sep 26 2018, Stefan Beller wrote:

> On Wed, Sep 26, 2018 at 1:44 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>
>> > And if we were going to generate something external, would it make more
>> > sense to write in a structured format like doxygen? I am not a big fan
>> > of it myself, but at least from there you can generate a more richly
>> > interconnected set of documentation.
>>
>> It's useful to have a single authoritative source for all documentation
>> that's easy to search through.
>
> If that is the case I would propose to keep it all in header files and
> organize the headers.
>
>> That includes stuff like perl585delta(1) which we'd stick in
>> Documentation/RelNotes, and "Internals and C Language Interface". Most
>> of what we'd put in Documentation/technical/api-* & headers is in
>> perlapi(1).
>
> This seems cool, but was also a recent introduction?
> perl400delta seems to yield nothing for me (which may be because
> I do not have an old version of perl installed?)

Depends on what you think is "recent" I suppose. Perl 5.4 is the first
version where deltas in POD format started being maintained
consistently, that version was released in mid-1997. Perl 4 was released
in 1991, see "perldoc perlhist". So ~everything consistently in POD has
been the case for ~20 years.
Junio C Hamano Sept. 27, 2018, 5:36 p.m. UTC | #24
Jeff King <peff@peff.net> writes:

> If you're interested in pulling documentation out of the header files
> and generating asciidoc from it, I'm happy to at least try keeping it up
> to date. When we started putting this information into header files, we
> used "/**" to start the comment, as a special marker to indicate it was
> worth pulling out. I don't know how well we've maintained that
> convention, but it's a starting point.

I noticed some people add these extra asterisk at the beginning of
comment, but I do not recall that we declared it is a convention we
adopt, so I'd rather be surprised if we've "maintained" it.

Please have it in CodingGuidelines or somewhere once this thread
settles and we decide to keep that convention (I have no problem
with the convention; it is just I personally didn't think it was
worth doing myself at least until now that we might have found
somebody who wants to make use of the markings).
Jeff King Sept. 27, 2018, 6:25 p.m. UTC | #25
On Thu, Sep 27, 2018 at 10:36:11AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > If you're interested in pulling documentation out of the header files
> > and generating asciidoc from it, I'm happy to at least try keeping it up
> > to date. When we started putting this information into header files, we
> > used "/**" to start the comment, as a special marker to indicate it was
> > worth pulling out. I don't know how well we've maintained that
> > convention, but it's a starting point.
> 
> I noticed some people add these extra asterisk at the beginning of
> comment, but I do not recall that we declared it is a convention we
> adopt, so I'd rather be surprised if we've "maintained" it.

True. _I_ declared it as a convention and used it when I migrated some
of the initial api-* documents, but I don't know if anybody else
followed that lead.

FWIW, it is not my own convention, but one used by other tools like
doxygen (which in turn got it from javadoc, I think).

> Please have it in CodingGuidelines or somewhere once this thread
> settles and we decide to keep that convention (I have no problem
> with the convention; it is just I personally didn't think it was
> worth doing myself at least until now that we might have found
> somebody who wants to make use of the markings).

Yeah, this is basically why I hadn't pushed harder for it. If nobody is
actually extracting based on the convention, there is not much point. So
I was waiting for somebody to show up with an interest in using an
extraction tool.

-Peff
diff mbox series

Patch

diff --git a/sha1-array.c b/sha1-array.c
index b94e0ec0f5e..d922e94e3fc 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,20 @@  int oid_array_for_each_unique(struct oid_array *array,
 	}
 	return 0;
 }
+
+void oid_array_filter(struct oid_array *array,
+		      for_each_oid_fn want,
+		      void *cb_data)
+{
+	unsigned nr = array->nr, src, dst;
+	struct object_id *oids = array->oid;
+
+	for (src = dst = 0; src < nr; src++) {
+		if (want(&oids[src], cb_data)) {
+			if (src != dst)
+				oidcpy(&oids[dst], &oids[src]);
+			dst++;
+		}
+	}
+	array->nr = dst;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf950172..275e5b02f5e 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -23,4 +23,13 @@  int oid_array_for_each_unique(struct oid_array *array,
 			      for_each_oid_fn fn,
 			      void *data);
 
+/*
+ * Apply want to each entry in array, retaining only the entries for
+ * which the function returns true.  Preserve the order of the entries
+ * that are retained.
+ */
+void oid_array_filter(struct oid_array *array,
+		      for_each_oid_fn want,
+		      void *cbdata);
+
 #endif /* SHA1_ARRAY_H */