diff mbox series

CodingGuidelines: discourage arbitrary suffixes in function names

Message ID 20241021124145.636561-1-karthik.188@gmail.com (mailing list archive)
State Superseded
Headers show
Series CodingGuidelines: discourage arbitrary suffixes in function names | expand

Commit Message

karthik nayak Oct. 21, 2024, 12:41 p.m. UTC
We often name functions with arbitrary suffixes like `_1` as an
extension of another existing function. This created confusion and
doesn't provide good clarity into the functions purpose. Let's document
good function naming etiquette in our CodingGuidelines.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

This is mostly in response to an ongoing thread [1] where I ran into one of
these functions and it really took me a while to wrap my head around what the
function does.

[1]: https://lore.kernel.org/git/CAOLa=ZREg3xuaT6mbM8+Havn3regZDhK45kGy0+Fw8t56c7Mpg@mail.gmail.com/#R

 Documentation/CodingGuidelines | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Patrick Steinhardt Oct. 21, 2024, 12:59 p.m. UTC | #1
On Mon, Oct 21, 2024 at 02:41:45PM +0200, Karthik Nayak wrote:
> We often name functions with arbitrary suffixes like `_1` as an
> extension of another existing function. This created confusion and

Micro-nit: s/created/creates/

[snip]
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> index 30fda4142c..b8e2d30567 100644
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -621,6 +621,11 @@ For C programs:
>      - `S_free()` releases a structure's contents and frees the
>        structure.
>  
> + - Function names should be self-explanatory, clearly reflecting their
> +   purpose or behavior. To maintain clarity and avoid confusion,
> +   arbitrary suffixes such as _1 are discouraged, as they provide no
> +   meaningful insight into the function's role.

Names being self-explanatory is certainly a worthwhile goal, even though
I guess that it's a bit more on the idealized side of things. Functions
will often not be fully self-explanatory, which is likely just a matter
of reality. I mostly just don't want us to end on the other side of the
spectrum where we go militant on "Every function must be no longer than
2 lines of code such that it can be self-explanatory".

And yes, I'm of course stretching what you are saying quite a bit, I
know that this is not what you want to say. I'm merely writing down my
own thoughts while thinking it through.

So, that being said, I agree that we shouldn't use arbitrary suffixes,
as these are quite hard to understand indeed and typically don't really
provide any context. So as long as we interpret this rule leniently I'm
happy with it.

Patrick
Kristoffer Haugsbakk Oct. 21, 2024, 4:51 p.m. UTC | #2
On Mon, Oct 21, 2024, at 14:41, Karthik Nayak wrote:
> We often name functions with arbitrary suffixes like `_1` as an
> extension of another existing function. This created confusion and
> doesn't provide good clarity into the functions purpose. Let's document
> good function naming etiquette in our CodingGuidelines.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>
> This is mostly in response to an ongoing thread [1] where I ran into one of
> these functions and it really took me a while to wrap my head around what the
> function does.
>
> [1]:
> https://lore.kernel.org/git/CAOLa=ZREg3xuaT6mbM8+Havn3regZDhK45kGy0+Fw8t56c7Mpg@mail.gmail.com/#R

I was wondering whether it would make sense to use that link in this
document for the context.  But I see that there is only one instance
of that in the current document.
Taylor Blau Oct. 21, 2024, 8:02 p.m. UTC | #3
On Mon, Oct 21, 2024 at 02:59:00PM +0200, Patrick Steinhardt wrote:
> On Mon, Oct 21, 2024 at 02:41:45PM +0200, Karthik Nayak wrote:
> > We often name functions with arbitrary suffixes like `_1` as an
> > extension of another existing function. This created confusion and
>
> Micro-nit: s/created/creates/
>
> [snip]
> > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> > index 30fda4142c..b8e2d30567 100644
> > --- a/Documentation/CodingGuidelines
> > +++ b/Documentation/CodingGuidelines
> > @@ -621,6 +621,11 @@ For C programs:
> >      - `S_free()` releases a structure's contents and frees the
> >        structure.
> >
> > + - Function names should be self-explanatory, clearly reflecting their
> > +   purpose or behavior. To maintain clarity and avoid confusion,
> > +   arbitrary suffixes such as _1 are discouraged, as they provide no
> > +   meaningful insight into the function's role.
>
> Names being self-explanatory is certainly a worthwhile goal, even though
> I guess that it's a bit more on the idealized side of things. Functions
> will often not be fully self-explanatory, which is likely just a matter
> of reality. I mostly just don't want us to end on the other side of the
> spectrum where we go militant on "Every function must be no longer than
> 2 lines of code such that it can be self-explanatory".
>
> And yes, I'm of course stretching what you are saying quite a bit, I
> know that this is not what you want to say. I'm merely writing down my
> own thoughts while thinking it through.
>
> So, that being said, I agree that we shouldn't use arbitrary suffixes,
> as these are quite hard to understand indeed and typically don't really
> provide any context. So as long as we interpret this rule leniently I'm
> happy with it.

I am not sure here... I think that using a "_1()" suffix means that the
function is processing one of a number of elements that all need to be
handled similarly, but where both the processing of an individual
element, and the handling of a group of elements are both complicated
enough to be placed in their own function.

It's also a helpful convention when you have a recursive function that
does some setup during its initial call, but subsequent layers of
recursion don't need or want to repeat that setup.

So I'm not sure I agree that "_1()" is always a bad idea as this changes
suggests (i.e. by writing that "they provide no meaningful insight into
the function's role").

Perhaps we could rephrase what is written here to suggest a couple of
instances where we wouldn't want to apply this rule, and the two that I
have written above could perhaps be a useful starting point. But I lean
more towards not adjusting our CodingGuidelines at all here.

Thanks,
Taylor
karthik nayak Oct. 22, 2024, 8:34 a.m. UTC | #4
Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Oct 21, 2024 at 02:41:45PM +0200, Karthik Nayak wrote:
>> We often name functions with arbitrary suffixes like `_1` as an
>> extension of another existing function. This created confusion and
>
> Micro-nit: s/created/creates/
>
> [snip]
>> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
>> index 30fda4142c..b8e2d30567 100644
>> --- a/Documentation/CodingGuidelines
>> +++ b/Documentation/CodingGuidelines
>> @@ -621,6 +621,11 @@ For C programs:
>>      - `S_free()` releases a structure's contents and frees the
>>        structure.
>>
>> + - Function names should be self-explanatory, clearly reflecting their
>> +   purpose or behavior. To maintain clarity and avoid confusion,
>> +   arbitrary suffixes such as _1 are discouraged, as they provide no
>> +   meaningful insight into the function's role.
>
> Names being self-explanatory is certainly a worthwhile goal, even though
> I guess that it's a bit more on the idealized side of things. Functions
> will often not be fully self-explanatory, which is likely just a matter
> of reality. I mostly just don't want us to end on the other side of the
> spectrum where we go militant on "Every function must be no longer than
> 2 lines of code such that it can be self-explanatory".
>
> And yes, I'm of course stretching what you are saying quite a bit, I
> know that this is not what you want to say. I'm merely writing down my
> own thoughts while thinking it through.
>

I agree, going the other way doesn't help either.

> So, that being said, I agree that we shouldn't use arbitrary suffixes,
> as these are quite hard to understand indeed and typically don't really
> provide any context. So as long as we interpret this rule leniently I'm
> happy with it.
>
> Patrick

I tried to keep the wording to also not just say "it is not allowed to
use such suffixes", but mostly to discourage it. I guess we can also
iterate on this as needed with time.

Karthik
karthik nayak Oct. 22, 2024, 8:45 a.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

> On Mon, Oct 21, 2024 at 02:59:00PM +0200, Patrick Steinhardt wrote:
>> On Mon, Oct 21, 2024 at 02:41:45PM +0200, Karthik Nayak wrote:
>> > We often name functions with arbitrary suffixes like `_1` as an
>> > extension of another existing function. This created confusion and
>>
>> Micro-nit: s/created/creates/
>>
>> [snip]
>> > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
>> > index 30fda4142c..b8e2d30567 100644
>> > --- a/Documentation/CodingGuidelines
>> > +++ b/Documentation/CodingGuidelines
>> > @@ -621,6 +621,11 @@ For C programs:
>> >      - `S_free()` releases a structure's contents and frees the
>> >        structure.
>> >
>> > + - Function names should be self-explanatory, clearly reflecting their
>> > +   purpose or behavior. To maintain clarity and avoid confusion,
>> > +   arbitrary suffixes such as _1 are discouraged, as they provide no
>> > +   meaningful insight into the function's role.
>>
>> Names being self-explanatory is certainly a worthwhile goal, even though
>> I guess that it's a bit more on the idealized side of things. Functions
>> will often not be fully self-explanatory, which is likely just a matter
>> of reality. I mostly just don't want us to end on the other side of the
>> spectrum where we go militant on "Every function must be no longer than
>> 2 lines of code such that it can be self-explanatory".
>>
>> And yes, I'm of course stretching what you are saying quite a bit, I
>> know that this is not what you want to say. I'm merely writing down my
>> own thoughts while thinking it through.
>>
>> So, that being said, I agree that we shouldn't use arbitrary suffixes,
>> as these are quite hard to understand indeed and typically don't really
>> provide any context. So as long as we interpret this rule leniently I'm
>> happy with it.
>
> I am not sure here... I think that using a "_1()" suffix means that the
> function is processing one of a number of elements that all need to be
> handled similarly, but where both the processing of an individual
> element, and the handling of a group of elements are both complicated
> enough to be placed in their own function.
>

The crux here is that this meaning is internalized by people who know
the code through in and out. But for anyone new to the project, this is
not evident from the naming.

> It's also a helpful convention when you have a recursive function that
> does some setup during its initial call, but subsequent layers of
> recursion don't need or want to repeat that setup.
>

I get the usecase of having such functions. I totally see the need, it's
mostly the naming that I'm trying to change.

Let's consider two of such functions:

1. mark_remote_island_1: This function is called to do _some_ work on a
single remote island when iterating over a list.
2. find_longest_prefixes_1: This is a recursive function which is used
to find the longest prefix.

If you notice, both use the "_1" suffix and do different things (operate
on a single instance from a list vs provide recursive behavior). So the
user has to read the code to understand, which makes the "_1" a bit
confusing.

Second, this could have instead been named:
1. mark_single_remote_island: Which reads better, giving the idea that
we are really working on a single remote island. Whereas having a "_1"
doesn't easily imply the same.
2. find_longest_prefixes_recursively: Which also reads better, and gives
a hint on how the function operates and why it is split out from the
base function.

> So I'm not sure I agree that "_1()" is always a bad idea as this changes
> suggests (i.e. by writing that "they provide no meaningful insight into
> the function's role").
>
> Perhaps we could rephrase what is written here to suggest a couple of
> instances where we wouldn't want to apply this rule, and the two that I
> have written above could perhaps be a useful starting point. But I lean
> more towards not adjusting our CodingGuidelines at all here.
>

I hope my reasoning convinces you of the benefits, this is mostly coming
from me spending more time than I'd like to admit on one of these
functions, trying to understand what they do :D

> Thanks,
> Taylor

Thanks,
Karthik
karthik nayak Oct. 22, 2024, 8:47 a.m. UTC | #6
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes:

> On Mon, Oct 21, 2024, at 14:41, Karthik Nayak wrote:
>> We often name functions with arbitrary suffixes like `_1` as an
>> extension of another existing function. This created confusion and
>> doesn't provide good clarity into the functions purpose. Let's document
>> good function naming etiquette in our CodingGuidelines.
>>
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
>>
>> This is mostly in response to an ongoing thread [1] where I ran into one of
>> these functions and it really took me a while to wrap my head around what the
>> function does.
>>
>> [1]:
>> https://lore.kernel.org/git/CAOLa=ZREg3xuaT6mbM8+Havn3regZDhK45kGy0+Fw8t56c7Mpg@mail.gmail.com/#R
>
> I was wondering whether it would make sense to use that link in this
> document for the context.  But I see that there is only one instance
> of that in the current document.

Yeah, there are a few such functions in our codebase. I'd be happy to
make any improvements, but also think this is simple and clear at the
moment.
Taylor Blau Oct. 22, 2024, 4:41 p.m. UTC | #7
On Tue, Oct 22, 2024 at 03:45:38AM -0500, karthik nayak wrote:
> >> So, that being said, I agree that we shouldn't use arbitrary suffixes,
> >> as these are quite hard to understand indeed and typically don't really
> >> provide any context. So as long as we interpret this rule leniently I'm
> >> happy with it.
> >
> > I am not sure here... I think that using a "_1()" suffix means that the
> > function is processing one of a number of elements that all need to be
> > handled similarly, but where both the processing of an individual
> > element, and the handling of a group of elements are both complicated
> > enough to be placed in their own function.
>
> The crux here is that this meaning is internalized by people who know
> the code through in and out. But for anyone new to the project, this is
> not evident from the naming.

Right. I think that with that in mind, a good goal might be to document
that convention to make sure that newcomers to the project can easily
interpret what foo() and foo_1() mean. Another approach is the one you
pursue here, which is to change the existing convention in the name of
making the code more approachable for newcomers.

Both approaches meet the same end, but the former does not involve
changing existing conventions, but instead documenting them. That seems
like a more reasonable path to me.

> > It's also a helpful convention when you have a recursive function that
> > does some setup during its initial call, but subsequent layers of
> > recursion don't need or want to repeat that setup.
> >
>
> I get the usecase of having such functions. I totally see the need, it's
> mostly the naming that I'm trying to change.
>
> Let's consider two of such functions:
>
> 1. mark_remote_island_1: This function is called to do _some_ work on a
> single remote island when iterating over a list.
> 2. find_longest_prefixes_1: This is a recursive function which is used
> to find the longest prefix.
>
> If you notice, both use the "_1" suffix and do different things (operate
> on a single instance from a list vs provide recursive behavior). So the
> user has to read the code to understand, which makes the "_1" a bit
> confusing.
>
> Second, this could have instead been named:
> 1. mark_single_remote_island: Which reads better, giving the idea that
> we are really working on a single remote island. Whereas having a "_1"
> doesn't easily imply the same.
> 2. find_longest_prefixes_recursively: Which also reads better, and gives
> a hint on how the function operates and why it is split out from the
> base function.

I don't disagree that writing "single" or "recursively" can be
considered clearer. I think that the convention to suffix such functions
with "_1()" is more terse, but saves characters and can avoid awkward
line wrapping.

Thanks,
Taylor
karthik nayak Oct. 23, 2024, 7:44 a.m. UTC | #8
Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Oct 22, 2024 at 03:45:38AM -0500, karthik nayak wrote:
>> >> So, that being said, I agree that we shouldn't use arbitrary suffixes,
>> >> as these are quite hard to understand indeed and typically don't really
>> >> provide any context. So as long as we interpret this rule leniently I'm
>> >> happy with it.
>> >
>> > I am not sure here... I think that using a "_1()" suffix means that the
>> > function is processing one of a number of elements that all need to be
>> > handled similarly, but where both the processing of an individual
>> > element, and the handling of a group of elements are both complicated
>> > enough to be placed in their own function.
>>
>> The crux here is that this meaning is internalized by people who know
>> the code through in and out. But for anyone new to the project, this is
>> not evident from the naming.
>
> Right. I think that with that in mind, a good goal might be to document
> that convention to make sure that newcomers to the project can easily
> interpret what foo() and foo_1() mean. Another approach is the one you
> pursue here, which is to change the existing convention in the name of
> making the code more approachable for newcomers.
>
> Both approaches meet the same end, but the former does not involve
> changing existing conventions, but instead documenting them. That seems
> like a more reasonable path to me.
>

I would disagree though. I think some conventions even though we've been
using them for a while, should be changed. I guess a good middle ground
is to document current behavior while also discouraging future use. I'll
add that in and push a new version.

>> > It's also a helpful convention when you have a recursive function that
>> > does some setup during its initial call, but subsequent layers of
>> > recursion don't need or want to repeat that setup.
>> >
>>
>> I get the usecase of having such functions. I totally see the need, it's
>> mostly the naming that I'm trying to change.
>>
>> Let's consider two of such functions:
>>
>> 1. mark_remote_island_1: This function is called to do _some_ work on a
>> single remote island when iterating over a list.
>> 2. find_longest_prefixes_1: This is a recursive function which is used
>> to find the longest prefix.
>>
>> If you notice, both use the "_1" suffix and do different things (operate
>> on a single instance from a list vs provide recursive behavior). So the
>> user has to read the code to understand, which makes the "_1" a bit
>> confusing.
>>
>> Second, this could have instead been named:
>> 1. mark_single_remote_island: Which reads better, giving the idea that
>> we are really working on a single remote island. Whereas having a "_1"
>> doesn't easily imply the same.
>> 2. find_longest_prefixes_recursively: Which also reads better, and gives
>> a hint on how the function operates and why it is split out from the
>> base function.
>
> I don't disagree that writing "single" or "recursively" can be
> considered clearer. I think that the convention to suffix such functions
> with "_1()" is more terse, but saves characters and can avoid awkward
> line wrapping.
>

But are those pros worth the ambiguity it brings?

> Thanks,
> Taylor

- Karthik
Junio C Hamano Oct. 24, 2024, 12:50 a.m. UTC | #9
Taylor Blau <me@ttaylorr.com> writes:

> I don't disagree that writing "single" or "recursively" can be
> considered clearer. I think that the convention to suffix such functions
> with "_1()" is more terse, but saves characters and can avoid awkward
> line wrapping.

I am reasonably sure that I was the first user of the _1()
convention, or at least I was one of them.  The reason for the
choice of suffix was only because there wasn't anything suitable
when refactoring an existing function foo() into a set-up part and
its recursive body, so I just kept the set-up part and the single
call into the new function in the original foo(), and had to give a
name to the new function that holds the body of the original logic
that was moved from foo().

Neither foo_helper() or foo_recursive() were descriptive enough to
warrant such longer suffixes than a simple _1().  They easily can
get "help by doing what?" and "recursively doing what?" reaction,
which is a sure sign that the suffixes are not descriptive enough.

That was the only reason why I picked that "short-and-sweet but
cryptic" suffix.

Surely all of _1(), _helper(), _recursive() are meaningless.  If we
were to replace existing uses of them, the replacement has to be 10x
better.

Having said all that, as an aspirational goal, I think it is good to
encourage people to find a name that is descriptive when writing a
new function.  I'd refrain from judging if it is way too obvious to
be worth documenting (as I am officially on vacation and shouldn't
be thinking too much about the project).

Thanks.
Taylor Blau Oct. 24, 2024, 4:50 p.m. UTC | #10
On Wed, Oct 23, 2024 at 05:50:17PM -0700, Junio C Hamano wrote:
> Surely all of _1(), _helper(), _recursive() are meaningless.  If we
> were to replace existing uses of them, the replacement has to be 10x
> better.

Well put. Each of the three are more or less equally meaningless, but
_1() is an accepted (?) project convention and has the fewest
characters, so I think is a good choice personally.

> Having said all that, as an aspirational goal, I think it is good to
> encourage people to find a name that is descriptive when writing a
> new function.  I'd refrain from judging if it is way too obvious to
> be worth documenting (as I am officially on vacation and shouldn't
> be thinking too much about the project).

Yeah, go back to vacation ;-).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 30fda4142c..b8e2d30567 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -621,6 +621,11 @@  For C programs:
     - `S_free()` releases a structure's contents and frees the
       structure.
 
+ - Function names should be self-explanatory, clearly reflecting their
+   purpose or behavior. To maintain clarity and avoid confusion,
+   arbitrary suffixes such as _1 are discouraged, as they provide no
+   meaningful insight into the function's role.
+
 For Perl programs:
 
  - Most of the C guidelines above apply.