
[net-next] net: phylib: fix phy_read*_poll_timeout()

Message ID: E1q4kX6-00BNuM-Mx@rmk-PC.armlinux.org.uk (mailing list archive)
State: Accepted
Commit: 4ec7329517027db28c5683675ab3b3842ad60324
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 408 this patch: 408
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 287 this patch: 287
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 394 this patch: 394
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 31 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Russell King (Oracle) June 1, 2023, 3:48 p.m. UTC
Dan Carpenter reported a signedness bug in genphy_loopback(). Andrew
reports that:

"It is common to get this wrong in general with PHY drivers. Dan
regularly posts fixes like this soon after a PHY driver patch is
merged. I really wish we could somehow get the compiler to warn when
the result from phy_read() is stored into an unsigned type. It would
save Dan a lot of work."

Let's make phy_read*_poll_timeout() immune to further issues when "val"
is an unsigned type by storing the read function's result in a signed
int as well as "val", and using the signed variable both to check for
an error and for propagating that error to the caller.

The advantage of this method is we don't change where the cast from
the signed return code to the user's variable occurs - so users will
see no change.

Previously Heiner changed phy_read_poll_timeout() to check for an error
before evaluating the user supplied condition, but didn't update
phy_read_mmd_poll_timeout(). Make that change there too.

Link: https://lore.kernel.org/r/d7bb312e-2428-45f6-b9b3-59ba544e8b94@kili.mountain
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 include/linux/phy.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
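
The signedness problem described in the commit message can be reproduced
outside the kernel. The following is a minimal user-space sketch, not kernel
code: fake_phy_read() is a hypothetical stand-in for phy_read() that always
fails, and val_signed plays the role of the patch's "__val".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for phy_read(): always fails with -EIO. */
static int fake_phy_read(void)
{
	return -EIO;
}

/* Buggy pattern: the error check is done on the caller's unsigned variable. */
static int check_via_user_val(void)
{
	uint16_t val = fake_phy_read();	/* -EIO wraps to a large positive value */

	return val < 0 ? -1 : 0;	/* never true, so the error is missed */
}

/* Patched pattern: keep a signed copy and test that instead. */
static int check_via_signed_copy(void)
{
	int val_signed;
	uint16_t val;

	val = val_signed = fake_phy_read();
	(void)val;			/* the caller's variable is still written */
	return val_signed < 0 ? val_signed : 0;
}

/* Self-check of the two behaviours described above. */
void self_check(void)
{
	assert(check_via_user_val() == 0);		/* error silently lost */
	assert(check_via_signed_copy() == -EIO);	/* error propagated */
}
```

The point of the sketch is only that the error test no longer depends on the
type the caller picked for "val".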

Comments

Jakub Kicinski June 2, 2023, 4:33 a.m. UTC | #1
On Thu, 01 Jun 2023 16:48:12 +0100 Russell King (Oracle) wrote:
> +	__ret = read_poll_timeout(__val = phy_read, val, \
                                                    ^^^
Is this not __val on purpose?
Jakub Kicinski June 2, 2023, 4:35 a.m. UTC | #2
On Thu, 1 Jun 2023 21:33:45 -0700 Jakub Kicinski wrote:
> On Thu, 01 Jun 2023 16:48:12 +0100 Russell King (Oracle) wrote:
> > +	__ret = read_poll_timeout(__val = phy_read, val, \  
>                                                     ^^^
> Is this not __val on purpose?

Yes it is :)  All this to save the single line of assignment
after the read_poll_timeout() "call" ?
Russell King (Oracle) June 2, 2023, 8:53 a.m. UTC | #3
On Thu, Jun 01, 2023 at 09:35:09PM -0700, Jakub Kicinski wrote:
> On Thu, 1 Jun 2023 21:33:45 -0700 Jakub Kicinski wrote:
> > On Thu, 01 Jun 2023 16:48:12 +0100 Russell King (Oracle) wrote:
> > > +	__ret = read_poll_timeout(__val = phy_read, val, \  
> >                                                     ^^^
> > Is this not __val on purpose?
> 
> Yes it is :)  All this to save the single line of assignment
> after the read_poll_timeout() "call" ?

Okay, so it seems you don't like it. We can't fix it then, and we'll
have to go with the BUILD_BUG_ON() forcing all users to use a signed
variable (which better be larger than an s8 so negative errnos can fit)
or we just rely on Dan to report the problems.
Jakub Kicinski June 2, 2023, 4:05 p.m. UTC | #4
On Fri, 2 Jun 2023 09:53:09 +0100 Russell King (Oracle) wrote:
> > Yes it is :)  All this to save the single line of assignment
> > after the read_poll_timeout() "call" ?  
> 
> Okay, so it seems you don't like it. We can't fix it then, and we'll
> have to go with the BUILD_BUG_ON() forcing all users to use a signed
> variable (which better be larger than an s8 so negative errnos can fit)
> or we just rely on Dan to report the problems.

Wait, did the version I proposed not work?

https://lore.kernel.org/all/20230530121910.05b9f837@kernel.org/

I just think the assignment inside the first argument is unnecessarily
unreadable. Maybe it's just me.
Russell King (Oracle) June 2, 2023, 4:17 p.m. UTC | #5
On Fri, Jun 02, 2023 at 09:05:39AM -0700, Jakub Kicinski wrote:
> On Fri, 2 Jun 2023 09:53:09 +0100 Russell King (Oracle) wrote:
> > > Yes it is :)  All this to save the single line of assignment
> > > after the read_poll_timeout() "call" ?  
> > 
> > Okay, so it seems you don't like it. We can't fix it then, and we'll
> > have to go with the BUILD_BUG_ON() forcing all users to use a signed
> > variable (which better be larger than an s8 so negative errnos can fit)
> > or we just rely on Dan to report the problems.
> 
> Wait, did the version I proposed not work?
> 
> https://lore.kernel.org/all/20230530121910.05b9f837@kernel.org/

If we're into the business of throwing web URLs at each other for
messages we've already read, here's my one for you which contains
the explanation why your one is broken, and proposing my solution.

https://lore.kernel.org/all/ZHZmBBDSVMf1WQWI@shell.armlinux.org.uk/

To see exactly why yours is broken, see the paragraph starting
"The elephant in the room..."

If it needs yet more explanation, which clearly it does, then let's
look at what genphy_loopback is doing:

                ret = phy_read_poll_timeout(phydev, MII_BMSR, val,
                                            val & BMSR_LSTATUS,
                                    5000, 500000, true);

Now, with your supposed "fix" of:

+	int __ret, __val;						\
+									\
+	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
 		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \

This ends up being:

	int __ret, __val;

	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (val & BMSR_LSTATUS),
 		sleep_us, timeout_us, sleep_before_read, phydev, regnum);

and that expands to something that does this:

	__val = phy_read(phydev, regnum);
	if (__val < 0 || (val & BMSR_LSTATUS))
		break;

Can you spot the bug yet? Where does "val" for the test "val & BMSR_LSTATUS"
come from?

A bigger hint. With the existing code, this would have been:

	val = phy_read(phydev, regnum);
	if (val < 0 || (val & BMSR_LSTATUS))
		break;

See the difference? val & BMSR_LSTATUS is checking the value that was
returned from phy_read() here, but in yours, it's checking an
uninitialised variable.

With my proposal, this becomes:

	val = __val = phy_read(phydev, regnum);
	if (__val < 0 || (val & BMSR_LSTATUS))
		break;

where "val" is whatever type the user chose, which has absolutely _no_
bearing whatsoever on whether the test for __val < 0 can be correctly
evaluated, and makes that test totally independent of whatever type the
user chose.
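
The two expansions above can be compared in a small user-space model. This is
a sketch under stated assumptions, not the kernel macro: poll_model() is a
hypothetical, simplified read_poll_timeout() with no sleeping or timeout,
fake_read() and LINK_BIT are made-up stand-ins for phy_read() and
BMSR_LSTATUS, and "val" is initialised to 0 only so the demo stays well
defined (in the rejected variant the real code would read it uninitialised).

```c
#include <assert.h>

#define LINK_BIT 0x0004	/* hypothetical stand-in for BMSR_LSTATUS */

/* Like the kernel macro, "op" is pasted directly before the argument list. */
#define poll_model(op, val, cond, tries, ...)		\
do {							\
	int left_ = (tries);				\
	while (left_-- > 0) {				\
		(val) = op(__VA_ARGS__);		\
		if (cond)				\
			break;				\
	}						\
} while (0)

static int reads;

/* Hypothetical register read: the link bit appears on the third read. */
static int fake_read(int regnum)
{
	(void)regnum;
	return ++reads < 3 ? 0 : LINK_BIT;
}

/* Patched form: op is "sval = fake_read", so val and sval are both written. */
static int poll_patched(unsigned short *out)
{
	int sval = 0;
	unsigned short val = 0;

	reads = 0;
	poll_model(sval = fake_read, val, sval < 0 || (val & LINK_BIT), 10, 1);
	*out = val;
	return reads;
}

/* Rejected form: only sval is written, yet the condition still tests val. */
static int poll_rejected(unsigned short *out)
{
	int sval = 0;
	unsigned short val = 0;

	reads = 0;
	poll_model(fake_read, sval, sval < 0 || (val & LINK_BIT), 10, 1);
	*out = val;
	return reads;
}

/* Self-check: the patched form stops on read 3, the rejected one never does. */
void self_check(void)
{
	unsigned short v;

	assert(poll_patched(&v) == 3 && v == LINK_BIT);
	assert(poll_rejected(&v) == 10 && v == 0);
}
```

In this model the rejected form exhausts all ten tries because "val" is never
updated, which mirrors the missing assignment pointed out above.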
Russell King (Oracle) June 2, 2023, 4:34 p.m. UTC | #6
On Fri, Jun 02, 2023 at 05:17:59PM +0100, Russell King (Oracle) wrote:
> On Fri, Jun 02, 2023 at 09:05:39AM -0700, Jakub Kicinski wrote:
> > On Fri, 2 Jun 2023 09:53:09 +0100 Russell King (Oracle) wrote:
> > > > Yes it is :)  All this to save the single line of assignment
> > > > after the read_poll_timeout() "call" ?  
> > > 
> > > Okay, so it seems you don't like it. We can't fix it then, and we'll
> > > have to go with the BUILD_BUG_ON() forcing all users to use a signed
> > > variable (which better be larger than an s8 so negative errnos can fit)
> > > or we just rely on Dan to report the problems.
> > 
> > Wait, did the version I proposed not work?
> > 
> > https://lore.kernel.org/all/20230530121910.05b9f837@kernel.org/
> 
> If we're into the business of throwing web URLs at each other for
> messages we've already read, here's my one for you which contains
> the explanation why your one is broken, and proposing my solution.
> 
> https://lore.kernel.org/all/ZHZmBBDSVMf1WQWI@shell.armlinux.org.uk/
> 
> To see exactly why yours is broken, see the paragraph starting
> "The elephant in the room..."
> 
> If it needs yet more explanation, which clearly it does, then let's
> look at what genphy_loopback is doing:
> 
>                 ret = phy_read_poll_timeout(phydev, MII_BMSR, val,
>                                             val & BMSR_LSTATUS,
>                                     5000, 500000, true);
> 
> Now, with your supposed "fix" of:
> 
> +	int __ret, __val;						\
> +									\
> +	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
>  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> 
> This ends up being:
> 
> 	int __ret, __val;
> 
> 	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (val & BMSR_LSTATUS),
>  		sleep_us, timeout_us, sleep_before_read, phydev, regnum);
> 
> and that expands to something that does this:
> 
> 	__val = phy_read(phydev, regnum);
> 	if (__val < 0 || (val & BMSR_LSTATUS))
> 		break;
> 
> Can you spot the bug yet? Where does "val" for the test "val & BMSR_LSTATUS"
> come from?
> 
> A bigger hint. With the existing code, this would have been:
> 
> 	val = phy_read(phydev, regnum);
> 	if (val < 0 || (val & BMSR_LSTATUS))
> 		break;
> 
> See the difference? val & BMSR_LSTATUS is checking the value that was
> returned from phy_read() here, but in yours, it's checking an
> uninitialised variable.
> 
> With my proposal, this becomes:
> 
> 	val = __val = phy_read(phydev, regnum);
> 	if (__val < 0 || (val & BMSR_LSTATUS))
> 		break;
> 
> where "val" is whatever type the user chose, which has absolutely _no_
> bearing whatsoever on whether the test for __val < 0 can be correctly
> evaluated, and makes that test totally independent of whatever type the
> user chose.

If you don't like my solution, then I suppose another possibility would
be:

#define __phy_poll_read(phydev, regnum, val) \
	({ \
		int __err; \
		__err = phy_read(phydev, regnum); \
		if (__err >= 0) \
			val = __err; \
		__err; \
	})

#define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
                                timeout_us, sleep_before_read) \
({ \
	int __ret, __err; \
	__ret = read_poll_timeout(__phy_poll_read, __err, \
				  __err < 0 || (cond), \
		sleep_us, timeout_us, sleep_before_read, phydev, regnum, val); \
	if (__err < 0) \
		__ret = __err; \
...

but that brings with it the possibility of using an uninitialised
"val" (e.g. if phy_read() returns an error on the first iteration.)
and is way more horrid and even less easy to understand.

Remember that we default to *not* warning about uninitialised variables
when building the kernel, so this won't produce a warning - which I
guess is probably why you didn't notice that your suggestion left "val"
uninitialised.
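
The drawback of the "assign val only on success" helper can be seen in a small
user-space model. This is only a sketch: poll_read_model() is a hypothetical
function-form analogue of __phy_poll_read(), and fake_read_fail() is a made-up
stand-in for a phy_read() that fails on the first call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a phy_read() that fails immediately. */
static int fake_read_fail(int regnum)
{
	(void)regnum;
	return -EIO;
}

/* Model of the helper: *val is written only when the read succeeds. */
static int poll_read_model(int regnum, unsigned short *val)
{
	int err = fake_read_fail(regnum);

	if (err >= 0)
		*val = (unsigned short)err;
	return err;
}

/* Self-check: after a failed first read, *val still holds its old contents. */
void self_check(void)
{
	unsigned short val = 0xABCD;	/* stands in for stale or unset data */

	assert(poll_read_model(1, &val) == -EIO);
	assert(val == 0xABCD);
}
```

If the caller had not initialised "val", any later use of it after such a
failure would read an indeterminate value.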
Jakub Kicinski June 2, 2023, 5:10 p.m. UTC | #7
On Fri, 2 Jun 2023 17:34:31 +0100 Russell King (Oracle) wrote:
> On Fri, Jun 02, 2023 at 05:17:59PM +0100, Russell King (Oracle) wrote:
> > On Fri, Jun 02, 2023 at 09:05:39AM -0700, Jakub Kicinski wrote:  
> > > Wait, did the version I proposed not work?
> > > 
> > > https://lore.kernel.org/all/20230530121910.05b9f837@kernel.org/  
> > 
> > If we're into the business of throwing web URLs at each other for
> > messages we've already read, here's my one for you which contains
> > the explanation why your one is broken, and proposing my solution.
> > 
> > https://lore.kernel.org/all/ZHZmBBDSVMf1WQWI@shell.armlinux.org.uk/
> > 
> > To see exactly why yours is broken, see the paragraph starting
> > "The elephant in the room..."

Ah, yes, sorry, I'll admit I didn't get what you mean by the elephant
paragraph when I read that.

> If you don't like my solution, then I suppose another possibility would
> be:
> 
> #define __phy_poll_read(phydev, regnum, val) \
> 	({ \
> 		int __err; \
> 		__err = phy_read(phydev, regnum); \
> 		if (__err >= 0) \
> 			val = __err; \
> 		__err; \
> 	})
> 
> #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
>                                 timeout_us, sleep_before_read) \
> ({ \
> 	int __ret, __err; \
> 	__ret = read_poll_timeout(__phy_poll_read, __err, \
> 				  __err < 0 || (cond), \
> 		sleep_us, timeout_us, sleep_before_read, phydev, regnum, val); \
> 	if (__err < 0) \
> 		__ret = __err; \
> ...
> 
> but that brings with it the possibility of using an uninitialised
> "val" (e.g. if phy_read() returns an error on the first iteration.)
> and is way more horrid and even less easy to understand.
> 
> Remember that we default to *not* warning about uninitialised variables
> when building the kernel, so this won't produce a warning - which I
> guess is probably why you didn't notice that your suggestion left "val"
> uninitialised.

Right :(  Let's keep the patch as is.
patchwork-bot+netdevbpf@kernel.org June 3, 2023, 6:40 a.m. UTC | #8
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 01 Jun 2023 16:48:12 +0100 you wrote:
> Dan Carpenter reported a signedness bug in genphy_loopback(). Andrew
> reports that:
> 
> "It is common to get this wrong in general with PHY drivers. Dan
> regularly posts fixes like this soon after a PHY driver patch is
> merged. I really wish we could somehow get the compiler to warn when
> the result from phy_read() is stored into an unsigned type. It would
> save Dan a lot of work."
> 
> [...]

Here is the summary with links:
  - [net-next] net: phylib: fix phy_read*_poll_timeout()
    https://git.kernel.org/netdev/net-next/c/4ec732951702

You are awesome, thank you!

Patch

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7addde5d14c0..11c1e91563d4 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1206,10 +1206,12 @@  static inline int phy_read(struct phy_device *phydev, u32 regnum)
 #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
 				timeout_us, sleep_before_read) \
 ({ \
-	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
+	int __ret, __val; \
+	__ret = read_poll_timeout(__val = phy_read, val, \
+				  __val < 0 || (cond), \
 		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
-	if (val < 0) \
-		__ret = val; \
+	if (__val < 0) \
+		__ret = __val; \
 	if (__ret) \
 		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
 	__ret; \
@@ -1302,11 +1304,13 @@  int phy_read_mmd(struct phy_device *phydev, int devad, u32 regnum);
 #define phy_read_mmd_poll_timeout(phydev, devaddr, regnum, val, cond, \
 				  sleep_us, timeout_us, sleep_before_read) \
 ({ \
-	int __ret = read_poll_timeout(phy_read_mmd, val, (cond) || val < 0, \
+	int __ret, __val; \
+	__ret = read_poll_timeout(__val = phy_read_mmd, val, \
+				  __val < 0 || (cond), \
 				  sleep_us, timeout_us, sleep_before_read, \
 				  phydev, devaddr, regnum); \
-	if (val <  0) \
-		__ret = val; \
+	if (__val < 0) \
+		__ret = __val; \
 	if (__ret) \
 		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
 	__ret; \