
[net-next,3/3] net: phylink: remove "using_mac_select_pcs"

Message ID E1syBPE-006Unh-TL@rmk-PC.armlinux.org.uk (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series Removing more phylink cruft

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 6 this patch: 6
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 6 this patch: 6
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5 this patch: 5
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 54 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 19 this patch: 19
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-10-10--09-00 (tests: 775)

Commit Message

Russell King (Oracle) Oct. 8, 2024, 2:41 p.m. UTC
With DSA's implementation of the mac_select_pcs() method removed, we
can now remove the detection of mac_select_pcs() implementation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/phylink.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

Comments

Vladimir Oltean Oct. 9, 2024, 12:29 p.m. UTC | #1
On Tue, Oct 08, 2024 at 03:41:44PM +0100, Russell King (Oracle) wrote:
> With DSA's implementation of the mac_select_pcs() method removed, we
> can now remove the detection of mac_select_pcs() implementation.
> 
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
>  drivers/net/phy/phylink.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 4309317de3d1..8f86599d3d78 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -79,7 +79,6 @@ struct phylink {
>  	unsigned int pcs_state;
>  
>  	bool mac_link_dropped;
> -	bool using_mac_select_pcs;
>  
>  	struct sfp_bus *sfp_bus;
>  	bool sfp_may_have_phy;
> @@ -661,12 +660,12 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
>  	int ret;
>  
>  	/* Get the PCS for this interface mode */
> -	if (pl->using_mac_select_pcs) {
> +	if (pl->mac_ops->mac_select_pcs) {
>  		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
>  		if (IS_ERR(pcs))
>  			return PTR_ERR(pcs);
>  	} else {
> -		pcs = pl->pcs;
> +		pcs = NULL;

The assignment from the "else" branch could have been folded into the
variable initialization.

Also, maybe a word in the commit message would be good about why the
"pcs = pl->pcs" line became "pcs = NULL". I get the impression that
these are 2 logical changes in one patch. This second aspect I'm
highlighting seems to be cleaning up the last remnants of phylink_set_pcs().
Since all phylink users have been converted to mac_select_pcs(), there's
no other possible value for "pl->pcs" than NULL if "using_mac_select_pcs"
is true.

I'm not 100% sure that this is the case, but cross-checking with the git
history, it seems to be the case. Commit 1054457006d4 ("net: phy:
phylink: fix DSA mac_select_pcs() introduction") was merged on Feb 21, 2022,
and commit a5081bad2eac ("net: phylink: remove phylink_set_pcs()") on
Feb 26. So it seems plausible that this fixup could have been made as
soon as Feb 26, 2022. Please confirm.

>  	}
>  
>  	if (pcs) {
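
The folding suggested in comment #1 can be sketched outside the kernel as follows. This is a minimal stand-alone model, not the phylink code: the struct definitions are reduced stand-ins, pcs_for_validation(), demo_pcs and demo_select() are hypothetical names, the hook takes only an interface number, and the IS_ERR()/PTR_ERR() handling of the real function is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures (hypothetical, illustration only). */
struct phylink_pcs { int id; };

struct phylink_mac_ops {
	/* Optional hook; NULL when the MAC driver does not implement it. */
	struct phylink_pcs *(*mac_select_pcs)(int interface);
};

struct phylink {
	const struct phylink_mac_ops *mac_ops;
};

/*
 * Pick the PCS to validate against. Initialising "pcs" to NULL at its
 * declaration makes a separate "else pcs = NULL;" branch unnecessary.
 */
static struct phylink_pcs *pcs_for_validation(const struct phylink *pl,
					      int interface)
{
	struct phylink_pcs *pcs = NULL;

	assert(pl && pl->mac_ops);

	if (pl->mac_ops->mac_select_pcs)
		pcs = pl->mac_ops->mac_select_pcs(interface);

	return pcs;
}

/* Hypothetical driver hook used to exercise the sketch. */
static struct phylink_pcs demo_pcs = { 1 };

static struct phylink_pcs *demo_select(int interface)
{
	(void)interface;	/* this demo hook returns the same PCS for every mode */
	return &demo_pcs;
}
```

With no hook installed the function returns NULL; with demo_select() installed it returns &demo_pcs. Both outcomes match what the if/else form in the patch would produce.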
Vladimir Oltean Oct. 9, 2024, 12:33 p.m. UTC | #2
On Wed, Oct 09, 2024 at 03:29:38PM +0300, Vladimir Oltean wrote:
> there's no other possible value for "pl->pcs" than NULL if
> "using_mac_select_pcs" is true.

I mean if it is false, sorry.
Paolo Abeni Oct. 10, 2024, 9:47 a.m. UTC | #3
On 10/9/24 14:29, Vladimir Oltean wrote:
> On Tue, Oct 08, 2024 at 03:41:44PM +0100, Russell King (Oracle) wrote:
>> With DSA's implementation of the mac_select_pcs() method removed, we
>> can now remove the detection of mac_select_pcs() implementation.
>>
>> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
>> ---
>>   drivers/net/phy/phylink.c | 14 +++-----------
>>   1 file changed, 3 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
>> index 4309317de3d1..8f86599d3d78 100644
>> --- a/drivers/net/phy/phylink.c
>> +++ b/drivers/net/phy/phylink.c
>> @@ -79,7 +79,6 @@ struct phylink {
>>   	unsigned int pcs_state;
>>   
>>   	bool mac_link_dropped;
>> -	bool using_mac_select_pcs;
>>   
>>   	struct sfp_bus *sfp_bus;
>>   	bool sfp_may_have_phy;
>> @@ -661,12 +660,12 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
>>   	int ret;
>>   
>>   	/* Get the PCS for this interface mode */
>> -	if (pl->using_mac_select_pcs) {
>> +	if (pl->mac_ops->mac_select_pcs) {
>>   		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
>>   		if (IS_ERR(pcs))
>>   			return PTR_ERR(pcs);
>>   	} else {
>> -		pcs = pl->pcs;
>> +		pcs = NULL;
> 
> The assignment from the "else" branch could have been folded into the
> variable initialization.
> 
> Also, maybe a word in the commit message would be good about why the
> "pcs = pl->pcs" line became "pcs = NULL". I get the impression that
> these are 2 logical changes in one patch. This second aspect I'm
> highlighting seems to be cleaning up the last remnants of phylink_set_pcs().
> Since all phylink users have been converted to mac_select_pcs(), there's
> no other possible value for "pl->pcs" than NULL if "using_mac_select_pcs"
> is true.
> 
> I'm not 100% sure that this is the case, but cross-checking with the git
> history, it seems to be the case. Commit 1054457006d4 ("net: phy:
> phylink: fix DSA mac_select_pcs() introduction") was merged on Feb 21,2022,
> and commit a5081bad2eac ("net: phylink: remove phylink_set_pcs()") on
> Feb 26. So it seems plausible that this fixup could have been made as
> soon as Feb 26, 2022. Please confirm.

I agree with Vladimir, I think at least expanding the reasoning in the 
commit message would be useful.

Thanks,

Paolo
Russell King (Oracle) Oct. 10, 2024, 11:21 a.m. UTC | #4
On Wed, Oct 09, 2024 at 03:29:38PM +0300, Vladimir Oltean wrote:
> On Tue, Oct 08, 2024 at 03:41:44PM +0100, Russell King (Oracle) wrote:
> > With DSA's implementation of the mac_select_pcs() method removed, we
> > can now remove the detection of mac_select_pcs() implementation.
> > 
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> >  drivers/net/phy/phylink.c | 14 +++-----------
> >  1 file changed, 3 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > index 4309317de3d1..8f86599d3d78 100644
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -79,7 +79,6 @@ struct phylink {
> >  	unsigned int pcs_state;
> >  
> >  	bool mac_link_dropped;
> > -	bool using_mac_select_pcs;
> >  
> >  	struct sfp_bus *sfp_bus;
> >  	bool sfp_may_have_phy;
> > @@ -661,12 +660,12 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
> >  	int ret;
> >  
> >  	/* Get the PCS for this interface mode */
> > -	if (pl->using_mac_select_pcs) {
> > +	if (pl->mac_ops->mac_select_pcs) {
> >  		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
> >  		if (IS_ERR(pcs))
> >  			return PTR_ERR(pcs);
> >  	} else {
> > -		pcs = pl->pcs;
> > +		pcs = NULL;
> 
> The assignment from the "else" branch could have been folded into the
> variable initialization.
> 
> Also, maybe a word in the commit message would be good about why the
> "pcs = pl->pcs" line became "pcs = NULL". I get the impression that
> these are 2 logical changes in one patch. This second aspect I'm
> highlighting seems to be cleaning up the last remnants of phylink_set_pcs().
> Since all phylink users have been converted to mac_select_pcs(), there's
> no other possible value for "pl->pcs" than NULL if "using_mac_select_pcs"
> is true.

Hmm. Looking at this again, we're getting into quite a mess because of
one of your previous review comments from a number of years back.

You stated that you didn't see the need to support a transition from
having-a-PCS to having-no-PCS. I don't have a link to that discussion.
However, it is why we've ended up with phylink_major_config() having
the extra complexity here, effectively preventing mac_select_pcs()
from being able to remove a PCS that was previously added:

		pcs_changed = pcs && pl->pcs != pcs;

because if mac_select_pcs() returns NULL, it was decided that any
in-use PCS would not be removed. It seems (at least to me) to be a
silly decision now.

However, if mac_select_pcs() in phylink_major_config() returns NULL,
we don't do any validation of the PCS.

So this, today, before these patches, is already an inconsistent mess.

To fix this, I think:

	struct phylink_pcs *pcs = NULL;
...
	if (pl->mac_ops->mac_select_pcs) {
		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
		if (IS_ERR(pcs))
			return PTR_ERR(pcs);
	}
	}

	if (!pcs)
		pcs = pl->pcs;

is needed to give consistent behaviour.

Alternatively, we could allow mac_select_pcs() to return NULL, which
would then allow the PCS to be removed.

Let me know if you've changed your mind on what behaviour we should
have, because this affects what I do to sort this out.

However, it is true that if mac_select_pcs() is NULL, since commit
a5081bad2eac ("net: phylink: remove phylink_set_pcs()") there is no
way pl->pcs can be non-NULL, since there is no other way for it to
be set.
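
The inconsistency described in comment #4, and the fallback proposed there, can be reproduced with a small stand-alone model. Everything below is an assumption made for illustration: the structs are reduced stand-ins, model_major_config(), model_validation_pcs(), demo_pcs and demo_select() are hypothetical names, and error pointers are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures (hypothetical). */
struct phylink_pcs { int id; };

struct phylink {
	struct phylink_pcs *pcs;	/* PCS currently in use, may be NULL */
	struct phylink_pcs *(*mac_select_pcs)(int interface);	/* may be NULL */
};

/* Models the rule quoted above: a NULL selection never replaces pl->pcs. */
static void model_major_config(struct phylink *pl, int interface)
{
	struct phylink_pcs *pcs = NULL;

	assert(pl);

	if (pl->mac_select_pcs)
		pcs = pl->mac_select_pcs(interface);

	if (pcs && pl->pcs != pcs)	/* same shape as "pcs_changed" */
		pl->pcs = pcs;
}

/* Models the proposed validation rule: fall back to the in-use PCS. */
static struct phylink_pcs *model_validation_pcs(const struct phylink *pl,
						int interface)
{
	struct phylink_pcs *pcs = NULL;

	assert(pl);

	if (pl->mac_select_pcs)
		pcs = pl->mac_select_pcs(interface);

	if (!pcs)
		pcs = pl->pcs;

	return pcs;
}

/* Hypothetical driver hook: a PCS for interface 1, none for anything else. */
static struct phylink_pcs demo_pcs = { 1 };

static struct phylink_pcs *demo_select(int interface)
{
	return interface == 1 ? &demo_pcs : NULL;
}
```

After configuring interface 1 and then interface 2, the model still holds demo_pcs in pl->pcs. With the fallback in place, validation for interface 2 therefore checks the same PCS that would actually be used, which is the consistency the proposal is after.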
Russell King (Oracle) Oct. 10, 2024, 1 p.m. UTC | #5
On Thu, Oct 10, 2024 at 12:21:43PM +0100, Russell King (Oracle) wrote:
> On Wed, Oct 09, 2024 at 03:29:38PM +0300, Vladimir Oltean wrote:
> > On Tue, Oct 08, 2024 at 03:41:44PM +0100, Russell King (Oracle) wrote:
> > > With DSA's implementation of the mac_select_pcs() method removed, we
> > > can now remove the detection of mac_select_pcs() implementation.
> > > 
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > ---
> > >  drivers/net/phy/phylink.c | 14 +++-----------
> > >  1 file changed, 3 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > > index 4309317de3d1..8f86599d3d78 100644
> > > --- a/drivers/net/phy/phylink.c
> > > +++ b/drivers/net/phy/phylink.c
> > > @@ -79,7 +79,6 @@ struct phylink {
> > >  	unsigned int pcs_state;
> > >  
> > >  	bool mac_link_dropped;
> > > -	bool using_mac_select_pcs;
> > >  
> > >  	struct sfp_bus *sfp_bus;
> > >  	bool sfp_may_have_phy;
> > > @@ -661,12 +660,12 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
> > >  	int ret;
> > >  
> > >  	/* Get the PCS for this interface mode */
> > > -	if (pl->using_mac_select_pcs) {
> > > +	if (pl->mac_ops->mac_select_pcs) {
> > >  		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
> > >  		if (IS_ERR(pcs))
> > >  			return PTR_ERR(pcs);
> > >  	} else {
> > > -		pcs = pl->pcs;
> > > +		pcs = NULL;
> > 
> > The assignment from the "else" branch could have been folded into the
> > variable initialization.
> > 
> > Also, maybe a word in the commit message would be good about why the
> > "pcs = pl->pcs" line became "pcs = NULL". I get the impression that
> > these are 2 logical changes in one patch. This second aspect I'm
> > highlighting seems to be cleaning up the last remnants of phylink_set_pcs().
> > Since all phylink users have been converted to mac_select_pcs(), there's
> > no other possible value for "pl->pcs" than NULL if "using_mac_select_pcs"
> > is true.
> 
> Hmm. Looking at this again, we're getting into quite a mess because of
> one of your previous review comments from a number of years back.
> 
> You stated that you didn't see the need to support a transition from
> having-a-PCS to having-no-PCS. I don't have a link to that discussion.
> However, it is why we've ended up with phylink_major_config() having
> the extra complexity here, effectively preventing mac_select_pcs()
> from being able to remove a PCS that was previously added:
> 
> 		pcs_changed = pcs && pl->pcs != pcs;
> 
> because if mac_select_pcs() returns NULL, it was decided that any
> in-use PCS would not be removed. It seems (at least to me) to be a
> silly decision now.
> 
> However, if mac_select_pcs() in phylink_major_config() returns NULL,
> we don't do any validation of the PCS.
> 
> So this, today, before these patches, is already an inconsistent mess.
> 
> To fix this, I think:
> 
> 	struct phylink_pcs *pcs = NULL;
> ...
>         if (pl->mac_ops->mac_select_pcs) {
>                 pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
>                 if (IS_ERR(pcs))
>                         return PTR_ERR(pcs);
> 	}
> 
> 	if (!pcs)
> 		pcs = pl->pcs;
> 
> is needed to give consistent behaviour.
> 
> Alternatively, we could allow mac_select_pcs() to return NULL, which
> would then allow the PCS to be removed.
> 
> Let me know if you've changed your mind on what behaviour we should
> have, because this affects what I do to sort this out.

Here's a link to the original discussion from November 2021:

https://lore.kernel.org/all/E1mpSba-00BXp6-9e@rmk-PC.armlinux.org.uk/

Google uselessly refused to find it, so I searched my own mailboxes
to find the message ID.
Vladimir Oltean Oct. 11, 2024, 10:39 a.m. UTC | #6
On Thu, Oct 10, 2024 at 02:00:32PM +0100, Russell King (Oracle) wrote:
> On Thu, Oct 10, 2024 at 12:21:43PM +0100, Russell King (Oracle) wrote:
> > Hmm. Looking at this again, we're getting into quite a mess because of
> > one of your previous review comments from a number of years back.
> > 
> > You stated that you didn't see the need to support a transition from
> > having-a-PCS to having-no-PCS. I don't have a link to that discussion.
> > However, it is why we've ended up with phylink_major_config() having
> > the extra complexity here, effectively preventing mac_select_pcs()
> > from being able to remove a PCS that was previously added:
> > 
> > 		pcs_changed = pcs && pl->pcs != pcs;
> > 
> > because if mac_select_pcs() returns NULL, it was decided that any
> > in-use PCS would not be removed. It seems (at least to me) to be a
> > silly decision now.
> > 
> > However, if mac_select_pcs() in phylink_major_config() returns NULL,
> > we don't do any validation of the PCS.
> > 
> > So this, today, before these patches, is already an inconsistent mess.
> > 
> > To fix this, I think:
> > 
> > 	struct phylink_pcs *pcs = NULL;
> > ...
> >         if (pl->mac_ops->mac_select_pcs) {
> >                 pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
> >                 if (IS_ERR(pcs))
> >                         return PTR_ERR(pcs);
> > 	}
> > 
> > 	if (!pcs)
> > 		pcs = pl->pcs;
> > 
> > is needed to give consistent behaviour.
> > 
> > Alternatively, we could allow mac_select_pcs() to return NULL, which
> > would then allow the PCS to be removed.
> > 
> > Let me know if you've changed your mind on what behaviour we should
> > have, because this affects what I do to sort this out.
> 
> Here's a link to the original discussion from November 2021:
> 
> https://lore.kernel.org/all/E1mpSba-00BXp6-9e@rmk-PC.armlinux.org.uk/
> 
> Google uselessly refused to find it, so I searched my own mailboxes
> to find the message ID.

Important note: I cannot find any discussion on any mailing list which
fills the gap between me asking what is the real world applicability of
mac_select_pcs() returning NULL after it has returned non-NULL, and the
current phylink behavior, as described above by you. That behavior was
first posted here:
https://lore.kernel.org/netdev/Ybiue1TPCwsdHmV4@shell.armlinux.org.uk/
in patches 1/7 and 2/7. I did not state that phylink should keep the old
PCS around, and I do not take responsibility for that.

Keeping in mind that I don't know whether anything has changed since
2021 which would make this condition any less theoretical than it was
back then, I guess if I were maintaining the code involved, I'd choose
between 2 options (whichever is easiest):

- Imagine a purely theoretical scenario where phylink transitions
  between a state->interface requiring a phylink_pcs, and one not
  requiring a phylink_pcs. I'm not even saying a serial PCS hardware
  block isn't present, just that it isn't modeled as a phylink_pcs
  (for reasons which may be valid or not). Probably the most logical
  thing to do in this scenario is allow the old phylink_pcs to be
  removed, and its ops never to be used for the new state->interface.

- Validate, possibly at phylink_validate_phy() time, that for all
  phy->possible_interfaces, mac_select_pcs() either returns NULL for
  all of them, or non-NULL for all of them. The idea would be to leave
  room for the use case to define itself (and the restriction to be
  lifted whenever necessary), instead of giving a predefined behavior
  for the transition when in reality we have no idea of the use case
  behind it. I don't know whether checking phy->possible_interfaces
  would be sufficient in ensuring that such a transition cannot occur.
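
The second option could look roughly like the sketch below. It is only a stand-alone model under stated assumptions: pcs_selection_is_uniform() and the demo hooks are hypothetical names, the interface set is a plain array rather than the bitmap behind phy->possible_interfaces, and error pointers returned by the hook are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for the kernel structure (hypothetical). */
struct phylink_pcs { int id; };

typedef struct phylink_pcs *(*select_pcs_fn)(int interface);

/*
 * Returns true when the hook yields a PCS for every listed interface or
 * for none of them, i.e. when no PCS <-> no-PCS transition can occur
 * within the listed set.
 */
static bool pcs_selection_is_uniform(select_pcs_fn select,
				     const int *interfaces, size_t n)
{
	size_t with_pcs = 0;

	assert(select);

	for (size_t i = 0; i < n; i++)
		if (select(interfaces[i]))
			with_pcs++;

	return with_pcs == 0 || with_pcs == n;
}

/* Hypothetical driver hooks used to exercise the check. */
static struct phylink_pcs demo_pcs = { 1 };

static struct phylink_pcs *demo_select_always(int interface)
{
	(void)interface;
	return &demo_pcs;
}

static struct phylink_pcs *demo_select_mixed(int interface)
{
	return interface == 1 ? &demo_pcs : NULL;
}
```

For the interface set { 1, 2 }, demo_select_always() passes the check and demo_select_mixed() fails it, since the latter provides a PCS for only one of the two modes. Whether such a check over the possible interfaces is enough to rule out the transition is the open question raised above.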

I find no contradiction between my replies (mostly questions, actually)
in Nov 2021 and my current agreement that phylink's behavior of keeping
the old PCS and using it for the new state->interface doesn't make much
sense.
Russell King (Oracle) Oct. 11, 2024, 10:58 a.m. UTC | #7
On Fri, Oct 11, 2024 at 01:39:12PM +0300, Vladimir Oltean wrote:
> On Thu, Oct 10, 2024 at 02:00:32PM +0100, Russell King (Oracle) wrote:
> > On Thu, Oct 10, 2024 at 12:21:43PM +0100, Russell King (Oracle) wrote:
> > > Hmm. Looking at this again, we're getting into quite a mess because of
> > > one of your previous review comments from a number of years back.
> > > 
> > > You stated that you didn't see the need to support a transition from
> > > having-a-PCS to having-no-PCS. I don't have a link to that discussion.
> > > However, it is why we've ended up with phylink_major_config() having
> > > the extra complexity here, effectively preventing mac_select_pcs()
> > > from being able to remove a PCS that was previously added:
> > > 
> > > 		pcs_changed = pcs && pl->pcs != pcs;
> > > 
> > > because if mac_select_pcs() returns NULL, it was decided that any
> > > in-use PCS would not be removed. It seems (at least to me) to be a
> > > silly decision now.
> > > 
> > > However, if mac_select_pcs() in phylink_major_config() returns NULL,
> > > we don't do any validation of the PCS.
> > > 
> > > So this, today, before these patches, is already an inconsistent mess.
> > > 
> > > To fix this, I think:
> > > 
> > > 	struct phylink_pcs *pcs = NULL;
> > > ...
> > >         if (pl->mac_ops->mac_select_pcs) {
> > >                 pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
> > >                 if (IS_ERR(pcs))
> > >                         return PTR_ERR(pcs);
> > > 	}
> > > 
> > > 	if (!pcs)
> > > 		pcs = pl->pcs;
> > > 
> > > is needed to give consistent behaviour.
> > > 
> > > Alternatively, we could allow mac_select_pcs() to return NULL, which
> > > would then allow the PCS to be removed.
> > > 
> > > Let me know if you've changed your mind on what behaviour we should
> > > have, because this affects what I do to sort this out.
> > 
> > Here's a link to the original discussion from November 2021:
> > 
> > https://lore.kernel.org/all/E1mpSba-00BXp6-9e@rmk-PC.armlinux.org.uk/
> > 
> > Google uselessly refused to find it, so I searched my own mailboxes
> > to find the message ID.
> 
> Important note: I cannot find any discussion on any mailing list which
> fills the gap between me asking what is the real world applicability of
> mac_select_pcs() returning NULL after it has returned non-NULL, and the
> current phylink behavior, as described above by you. That behavior was
> first posted here:
> https://lore.kernel.org/netdev/Ybiue1TPCwsdHmV4@shell.armlinux.org.uk/
> in patches 1/7 and 2/7. I did not state that phylink should keep the old
> PCS around, and I do not take responsibility for that.

I wanted to add support for phylink_set_pcs() to remove the current
PCS and submitted a patch for it. You didn't see a use case and objected
to the patch, which wasn't merged. I've kept that behaviour ever since
on the grounds of your objection - as per the link that I posted.
This has been carried forward into the mac_select_pcs() implementation
where it explicitly does not allow a PCS to be removed. Pointing to
the introduction of mac_select_pcs() is later than your objection.

Let me restate it. As a *direct* result of your comments on patch 8/8
which starts here:
https://lore.kernel.org/netdev/E1mpSba-00BXp6-9e@rmk-PC.armlinux.org.uk/
I took your comments as meaning that you saw no reason why we should
allow a PCS to ever be removed. phylink_set_pcs() needed to be modified
to allow that to happen. Given your objection, that behaviour has been
carried forward by having explicit additional code to ensure that a
PCS can't be removed from phylink without replacing it with a different
PCS. In other words, mac_select_pcs() returning NULL when it has
previously returned a PCS does *nothing* to remove the previous PCS.

Maybe this was not your intention when reviewing patch 8/8, but that's
how your comments came over, and led me to the conclusion that we
need to enforce that - and that is enforced by:

                pcs_changed = pcs && pl->pcs != pcs;

so pcs_changed will always be false if pcs == NULL, thus preventing the
replacement of the pcs:

        if (pcs_changed) {
                phylink_pcs_disable(pl->pcs);

                if (pl->pcs)
                        pl->pcs->phylink = NULL;

                pcs->phylink = pl;

                pl->pcs = pcs;
        }

I wouldn't have coded it this way had you not objected to patch 8/8
which led me to believe we should not allow a PCS to be removed.

Review comments do have implications for future patches... maybe it
wasn't what you intended, but this is a great example of cause and
(possibly unintended) effect.
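
The effect of the guard quoted in comment #7 can be compared with a removal-friendly alternative in a few lines. This is a sketch only: pcs_changed_current() mirrors the quoted expression, while pcs_changed_allow_removal() is a hypothetical variant that is not taken from any posted patch.

```c
#include <assert.h>	/* for the checks described in the note below */
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for the kernel structure (hypothetical). */
struct phylink_pcs { int id; };

/* Same shape as the quoted guard: a NULL selection is never a change. */
static bool pcs_changed_current(const struct phylink_pcs *in_use,
				const struct phylink_pcs *selected)
{
	return selected && in_use != selected;
}

/*
 * Hypothetical variant: any difference counts, so a NULL selection would
 * remove the in-use PCS. Swap code built on this would have to guard the
 * "pcs->phylink = pl" assignment against pcs == NULL.
 */
static bool pcs_changed_allow_removal(const struct phylink_pcs *in_use,
				      const struct phylink_pcs *selected)
{
	return in_use != selected;
}
```

With a PCS in use and a NULL selection, the first helper reports no change and the second reports a change. For every non-NULL selection the two helpers agree.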
Vladimir Oltean Oct. 11, 2024, 12:54 p.m. UTC | #8
On Fri, Oct 11, 2024 at 11:58:07AM +0100, Russell King (Oracle) wrote:
> On Fri, Oct 11, 2024 at 01:39:12PM +0300, Vladimir Oltean wrote:
> > On Thu, Oct 10, 2024 at 02:00:32PM +0100, Russell King (Oracle) wrote:
> > > On Thu, Oct 10, 2024 at 12:21:43PM +0100, Russell King (Oracle) wrote:
> > > > Hmm. Looking at this again, we're getting into quite a mess because of
> > > > one of your previous review comments from a number of years back.
> > > > 
> > > > You stated that you didn't see the need to support a transition from
> > > > having-a-PCS to having-no-PCS. I don't have a link to that discussion.
> > > > However, it is why we've ended up with phylink_major_config() having
> > > > the extra complexity here, effectively preventing mac_select_pcs()
> > > > from being able to remove a PCS that was previously added:
> > > > 
> > > > 		pcs_changed = pcs && pl->pcs != pcs;
> > > > 
> > > > because if mac_select_pcs() returns NULL, it was decided that any
> > > > in-use PCS would not be removed. It seems (at least to me) to be a
> > > > silly decision now.
> > > > 
> > > > However, if mac_select_pcs() in phylink_major_config() returns NULL,
> > > > we don't do any validation of the PCS.
> > > > 
> > > > So this, today, before these patches, is already an inconsistent mess.
> > > > 
> > > > To fix this, I think:
> > > > 
> > > > 	struct phylink_pcs *pcs = NULL;
> > > > ...
> > > >         if (pl->mac_ops->mac_select_pcs) {
> > > >                 pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
> > > >                 if (IS_ERR(pcs))
> > > >                         return PTR_ERR(pcs);
> > > > 	}
> > > > 
> > > > 	if (!pcs)
> > > > 		pcs = pl->pcs;
> > > > 
> > > > is needed to give consistent behaviour.
> > > > 
> > > > Alternatively, we could allow mac_select_pcs() to return NULL, which
> > > > would then allow the PCS to be removed.
> > > > 
> > > > Let me know if you've changed your mind on what behaviour we should
> > > > have, because this affects what I do to sort this out.
> > > 
> > > Here's a link to the original discussion from November 2021:
> > > 
> > > https://lore.kernel.org/all/E1mpSba-00BXp6-9e@rmk-PC.armlinux.org.uk/
> > > 
> > > Google uselessly refused to find it, so I searched my own mailboxes
> > > to find the message ID.
> > 
> > Important note: I cannot find any discussion on any mailing list which
> > fills the gap between me asking what is the real world applicability of
> > mac_select_pcs() returning NULL after it has returned non-NULL, and the
> > current phylink behavior, as described above by you. That behavior was
> > first posted here:
> > https://lore.kernel.org/netdev/Ybiue1TPCwsdHmV4@shell.armlinux.org.uk/
> > in patches 1/7 and 2/7. I did not state that phylink should keep the old
> > PCS around, and I do not take responsibility for that.
> 
> I wanted to add support for phylink_set_pcs() to remove the current
> PCS and submitted a patch for it. You didn't see a use case and objected
> to the patch, which wasn't merged.

It was an RFC, it wasn't a candidate for merging anyway.

> I've kept that behaviour ever since on the grounds of your objection -
> as per the link that I posted. This has been carried forward into the
> mac_select_pcs() implementation where it explicitly does not allow a
> PCS to be removed. Pointing to the introduction of mac_select_pcs() is
> later than your objection.

This does not invalidate my previous point in any way. The phylink_set_pcs()
API at that time implied a voluntary call from the driver. "pcs" was not
allowed to be NULL, and your patch was the one introducing phylink_set_pcs(NULL)
as a valid call. That's what I objected to, as I did not see its purpose.

Whereas the new mac_select_pcs() has "return NULL" already baked into it
from day one (the one-to-one equivalent of it being: don't call
phylink_set_pcs()). It becomes inevitable, in this new model, to handle
somehow the "what if" scenario of returning NULL after non-NULL, whereas
that was not necessary before. It's a different API, which automatically
implies a new set of rules.

My point is that it was impossible for me to consciously contribute to
the definition of the mac_select_pcs() API rules. Saying that the
introduction of mac_select_pcs() came later than my comment proves
exactly that. I couldn't possibly have known that my review comment would
be used for a different API than phylink_set_pcs(), because the new API
hadn't been posted.

Whereas what you are saying is that the mac_select_pcs() posting took
place after my comment => of course you took my comment into consideration.
You seem to be omitting that I did not have all information.

> Let me restate it. As a *direct* result of your comments on patch 8/8
> which starts here:
> https://lore.kernel.org/netdev/E1mpSba-00BXp6-9e@rmk-PC.armlinux.org.uk/
> I took your comments as meaning that you saw no reason why we should
> allow a PCS to ever be removed.

I still stand by that statement, in that context. You took it out of
context.

> phylink_set_pcs() needed to be modified to allow that to happen. Given
> your objection, that behaviour has been carried forward by having
> explicit additional code to ensure that a PCS can't be removed from
> phylink without replacing it with a different PCS. In other words,
> mac_select_pcs() returning NULL when it has previously returned a PCS
> does *nothing* to remove the previous PCS.

The carrying over of old behavior from one API to another API is merely
one of the design choices that can be made, and far from the only
option. In general, you introduce new API exactly _because_ you want to
change the behavior in some conditions.

> Maybe this was not your intention when reviewing patch 8/8, but that's
> how your comments came over, and lead me to the conclusion that we
> need to enforce that - and that is enforced by:
> 
>                 pcs_changed = pcs && pl->pcs != pcs;
> 
> so pcs_change will always be false if pcs == NULL, thus preventing the
> replacement of the pcs:
> 
>         if (pcs_changed) {
>                 phylink_pcs_disable(pl->pcs);
> 
>                 if (pl->pcs)
>                         pl->pcs->phylink = NULL;
> 
>                 pcs->phylink = pl;
> 
>                 pl->pcs = pcs;
>         }
> 
> I wouldn't have coded it this way had you not objected to patch 8/8
> which lead me to believe we should not allow a PCS to be removed.
> 
> Review comments do have implications for future patches... maybe it
> wasn't want you intended, but this is a great example of cause and
> (possibly unintended) effect.

This restatement was not necessary, as I believe I got the point from
your previous message.

I still fundamentally reject any responsibility you wish to attribute to
me here. For one, my review feedback was in a different context. But
let's even assume it was directly related, and I knew mac_select_pcs()
would come. If I were maintaining a piece of code and received some
review feedback, I would not translate it into code that carries
exclusively _my_ sign-off, unless _I_ agree with it and can justify it.
I would not seek to transfer responsibility to somebody else. In fact,
if I were to be held accountable for patches signed off by others, I
wouldn't even be reviewing anything. So I think it's a very fair
collaboration rule, and I'm only asking you to apply it to me as well.
Russell King (Oracle) Oct. 11, 2024, 5:51 p.m. UTC | #9
On Fri, Oct 11, 2024 at 03:54:21PM +0300, Vladimir Oltean wrote:
> On Fri, Oct 11, 2024 at 11:58:07AM +0100, Russell King (Oracle) wrote:
> > On Fri, Oct 11, 2024 at 01:39:12PM +0300, Vladimir Oltean wrote:
> > > On Thu, Oct 10, 2024 at 02:00:32PM +0100, Russell King (Oracle) wrote:
> > > > On Thu, Oct 10, 2024 at 12:21:43PM +0100, Russell King (Oracle) wrote:
> > > > > Hmm. Looking at this again, we're getting into quite a mess because of
> > > > > one of your previous review comments from a number of years back.
> > > > > 
> > > > > You stated that you didn't see the need to support a transition from
> > > > > having-a-PCS to having-no-PCS. I don't have a link to that discussion.
> > > > > However, it is why we've ended up with phylink_major_config() having
> > > > > the extra complexity here, effectively preventing mac_select_pcs()
> > > > > from being able to remove a PCS that was previously added:
> > > > > 
> > > > > 		pcs_changed = pcs && pl->pcs != pcs;
> > > > > 
> > > > > because if mac_select_pcs() returns NULL, it was decided that any
> > > > > in-use PCS would not be removed. It seems (at least to me) to be a
> > > > > silly decision now.
> > > > > 
> > > > > However, if mac_select_pcs() in phylink_major_config() returns NULL,
> > > > > we don't do any validation of the PCS.
> > > > > 
> > > > > So this, today, before these patches, is already an inconsistent mess.
> > > > > 
> > > > > To fix this, I think:
> > > > > 
> > > > > 	struct phylink_pcs *pcs = NULL;
> > > > > ...
> > > > >         if (pl->mac_ops->mac_select_pcs) {
> > > > >                 pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
> > > > >                 if (IS_ERR(pcs))
> > > > >                         return PTR_ERR(pcs);
> > > > > 	}
> > > > > 
> > > > > 	if (!pcs)
> > > > > 		pcs = pl->pcs;
> > > > > 
> > > > > is needed to give consistent behaviour.
> > > > > 
> > > > > Alternatively, we could allow mac_select_pcs() to return NULL, which
> > > > > would then allow the PCS to be removed.
> > > > > 
> > > > > Let me know if you've changed your mind on what behaviour we should
> > > > > have, because this affects what I do to sort this out.
> > > > 
> > > > Here's a link to the original discussion from November 2021:
> > > > 
> > > > https://lore.kernel.org/all/E1mpSba-00BXp6-9e@rmk-PC.armlinux.org.uk/
> > > > 
> > > > Google uselessly refused to find it, so I searched my own mailboxes
> > > > to find the message ID.
> > > 
> > > Important note: I cannot find any discussion on any mailing list which
> > > fills the gap between me asking what is the real world applicability of
> > > mac_select_pcs() returning NULL after it has returned non-NULL, and the
> > > current phylink behavior, as described above by you. That behavior was
> > > first posted here:
> > > https://lore.kernel.org/netdev/Ybiue1TPCwsdHmV4@shell.armlinux.org.uk/
> > > in patches 1/7 and 2/7. I did not state that phylink should keep the old
> > > PCS around, and I do not take responsibility for that.
> > 
> > I wanted to add support for phylink_set_pcs() to remove the current
> > PCS and submitted a patch for it. You didn't see a use case and objected
> > to the patch, which wasn't merged.
> 
> It was an RFC, it wasn't a candidate for merging anyway.

What does that have to do with it????????????

An idea is put forward (the idea of allowing a PCS to be removed). It's
put forward as an RFC. It gets shot down. The author then goes away
believing that there is no desire to allow a PCS to be removed. That idea
gets carried forward into future patches.

_That_ is exactly what happened. I'm not attributing blame for it,
merely explaining how we got to where we are with this, and how we've
ended up in the mess we have, with a PCS able to be used outside of its
validated set.

You want me to provide more explanation on the patch, but I've
identified a fundamental error here caused as an effect of a previous
review comment.

I'm now wondering what to do about it and how to solve this in a way
that won't cause us to go around another long confrontational discussion
but it seems that's not possible.

So, do I ignore your review comments and just do what I think is the
right thing, or do I attempt to discuss it with you? I think, given
_this_ debacle, I ignore you. I would much rather involve you but it
seems that's a mistake.
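
The inconsistency described in this message can be sketched in isolation.
What follows is a minimal model, not phylink code: the sketch_* types and
functions are hypothetical stand-ins, interface modes are plain bit
numbers, and the ERR_PTR handling from the quoted snippet is left out.
With fallback disabled, a NULL return from the selection callback skips
validation entirely, even though the in-use PCS would be kept. With
fallback enabled (the "if (!pcs) pcs = pl->pcs;" proposal quoted above),
the in-use PCS is validated against the new mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel's types. */
struct sketch_pcs {
	unsigned long supported;	/* bitmask of interface modes it accepts */
};

struct sketch_link {
	struct sketch_pcs *pcs;				/* PCS currently in use */
	struct sketch_pcs *(*select_pcs)(int iface);	/* models mac_select_pcs() */
};

/* A selection callback that never provides a PCS. */
static struct sketch_pcs *sketch_select_none(int iface)
{
	(void)iface;
	return NULL;
}

/* Choose the PCS to validate; optionally fall back to the in-use PCS. */
static struct sketch_pcs *sketch_pick_pcs(const struct sketch_link *l,
					  int iface, bool fallback)
{
	struct sketch_pcs *pcs = NULL;

	if (l->select_pcs)
		pcs = l->select_pcs(iface);

	if (!pcs && fallback)
		pcs = l->pcs;

	return pcs;
}

/* No PCS means nothing to check; otherwise the mode must be supported. */
static bool sketch_validate(const struct sketch_link *l, int iface,
			    bool fallback)
{
	struct sketch_pcs *pcs = sketch_pick_pcs(l, iface, fallback);

	if (!pcs)
		return true;

	return (pcs->supported >> iface) & 1UL;
}
```

In this model, a link whose in-use PCS supports only mode 2, paired with
sketch_select_none(), passes validation for mode 3 when fallback is false
and fails it when fallback is true.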
Vladimir Oltean Oct. 12, 2024, 10:27 a.m. UTC | #10
On Fri, Oct 11, 2024 at 06:51:51PM +0100, Russell King (Oracle) wrote:
> On Fri, Oct 11, 2024 at 03:54:21PM +0300, Vladimir Oltean wrote:
> > On Fri, Oct 11, 2024 at 11:58:07AM +0100, Russell King (Oracle) wrote:
> > > I wanted to add support for phylink_set_pcs() to remove the current
> > > PCS and submitted a patch for it. You didn't see a use case and objected
> > > to the patch, which wasn't merged.
> > 
> > It was an RFC, it wasn't a candidate for merging anyway.
> 
> What does that have to do with it????????????
> 
> An idea is put forward (the idea of allowing a PCS to be removed). It's
> put forward as an RFC. It gets shot down. The author then goes away
> believing that there is no desire to allow a PCS to be removed. That idea
> gets carried forward into future patches.
> 
> _That_ is exactly what happened. I'm not attributing blame for it,
> merely explaining how we got to where we are with this, and how we've
> ended up in the mess we have, with a PCS able to be used outside of its
> validated set.
> 
> You want me to provide more explanation on the patch, but I've
> identified a fundamental error here caused as an effect of a previous
> review comment.
> 
> I'm now wondering what to do about it and how to solve this in a way
> that won't cause us to go around another long confrontational discussion
> but it seems that's not possible.
> 
> So, do I ignore your review comments and just do what I think is the
> right thing, or do I attempt to discuss it with you? I think, given
> _this_ debacle, I ignore you. I would much rather involve you but it
> seems that's a mistake.

My technical answer was already provided 2 replies ago:

| Keeping in mind that I don't know whether anything has changed since
| 2021 which would make this condition any less theoretical than it was
| back then, I guess if I were maintaining the code involved, I'd choose
| between 2 options (whichever is easiest):
| 
| - Imagine a purely theoretical scenario where phylink transitions
|   between a state->interface requiring a phylink_pcs, and one not
|   requiring a phylink_pcs. I'm not even saying a serial PCS hardware
|   block isn't present, just that it isn't modeled as a phylink_pcs
|   (for reasons which may be valid or not). Probably the most logical
|   thing to do in this scenario is allow the old phylink_pcs to be
|   removed, and its ops never to be used for the new state->interface.
| 
| - Validate, possibly at phylink_validate_phy() time, that for all
|   phy->possible_interfaces, mac_select_pcs() either returns NULL for
|   all of them, or non-NULL for all of them. The idea would be to leave
|   room for the use case to define itself (and the restriction to be
|   lifted whenever necessary), instead of giving a predefined behavior
|   for the transition when in reality we have no idea of the use case
|   behind it. I don't know whether checking phy->possible_interfaces
|   would be sufficient in ensuring that such a transition cannot occur.

I have nothing more to add to this discussion.
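
For reference, the second option quoted above (the selection callback
returns NULL for every candidate interface, or non-NULL for every one)
could be checked roughly as follows. This is only a sketch under stated
assumptions: the demo_* names are hypothetical, interface modes are plain
integers, and the callback takes a single argument rather than the real
mac_select_pcs() signature. It does not answer the open question of
whether phy->possible_interfaces is a sufficient set to check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel's types. */
struct demo_pcs {
	int id;
};

typedef struct demo_pcs *(*demo_select_fn)(int iface);

/*
 * Return true when the callback gives NULL for all listed interfaces or
 * non-NULL for all of them, i.e. no PCS <-> no-PCS transition can occur
 * within the set.
 */
static bool demo_pcs_use_is_uniform(demo_select_fn select,
				    const int *ifaces, size_t n)
{
	bool seen_null = false, seen_pcs = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (select(ifaces[i]))
			seen_pcs = true;
		else
			seen_null = true;
	}

	return !(seen_null && seen_pcs);
}

static struct demo_pcs demo_serdes_pcs = { 1 };

/* Example callback: a PCS is modelled for interface 1 only. */
static struct demo_pcs *demo_select_mixed(int iface)
{
	return iface == 1 ? &demo_serdes_pcs : NULL;
}

/* Example callback: every interface uses the same PCS. */
static struct demo_pcs *demo_select_always(int iface)
{
	(void)iface;
	return &demo_serdes_pcs;
}
```

With these example callbacks, the set {0, 1} is rejected for
demo_select_mixed() and accepted for demo_select_always(). An empty set
is accepted trivially.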

Patch

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 4309317de3d1..8f86599d3d78 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -79,7 +79,6 @@  struct phylink {
 	unsigned int pcs_state;
 
 	bool mac_link_dropped;
-	bool using_mac_select_pcs;
 
 	struct sfp_bus *sfp_bus;
 	bool sfp_may_have_phy;
@@ -661,12 +660,12 @@  static int phylink_validate_mac_and_pcs(struct phylink *pl,
 	int ret;
 
 	/* Get the PCS for this interface mode */
-	if (pl->using_mac_select_pcs) {
+	if (pl->mac_ops->mac_select_pcs) {
 		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
 		if (IS_ERR(pcs))
 			return PTR_ERR(pcs);
 	} else {
-		pcs = pl->pcs;
+		pcs = NULL;
 	}
 
 	if (pcs) {
@@ -1182,7 +1181,7 @@  static void phylink_major_config(struct phylink *pl, bool restart,
 						state->interface,
 						state->advertising);
 
-	if (pl->using_mac_select_pcs) {
+	if (pl->mac_ops->mac_select_pcs) {
 		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
 		if (IS_ERR(pcs)) {
 			phylink_err(pl,
@@ -1698,7 +1697,6 @@  struct phylink *phylink_create(struct phylink_config *config,
 			       phy_interface_t iface,
 			       const struct phylink_mac_ops *mac_ops)
 {
-	bool using_mac_select_pcs = false;
 	struct phylink *pl;
 	int ret;
 
@@ -1709,11 +1707,6 @@  struct phylink *phylink_create(struct phylink_config *config,
 		return ERR_PTR(-EINVAL);
 	}
 
-	if (mac_ops->mac_select_pcs &&
-	    mac_ops->mac_select_pcs(config, PHY_INTERFACE_MODE_NA) !=
-	      ERR_PTR(-EOPNOTSUPP))
-		using_mac_select_pcs = true;
-
 	pl = kzalloc(sizeof(*pl), GFP_KERNEL);
 	if (!pl)
 		return ERR_PTR(-ENOMEM);
@@ -1732,7 +1725,6 @@  struct phylink *phylink_create(struct phylink_config *config,
 		return ERR_PTR(-EINVAL);
 	}
 
-	pl->using_mac_select_pcs = using_mac_select_pcs;
 	pl->phy_state.interface = iface;
 	pl->link_interface = iface;
 	if (iface == PHY_INTERFACE_MODE_MOCA)