diff mbox series

clkdev: report over-sized strings when creating clkdev entries

Message ID E1rl62V-004UFh-Te@rmk-PC.armlinux.org.uk (mailing list archive)
State Changes Requested, archived
Headers show
Series clkdev: report over-sized strings when creating clkdev entries | expand

Commit Message

Russell King (Oracle) March 15, 2024, 11:47 a.m. UTC
Report an error when an attempt to register a clkdev entry results in a
truncated string so the problem can be easily spotted.

Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/clk/clkdev.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Stephen Boyd April 8, 2024, 3:48 a.m. UTC | #1
Quoting Russell King (Oracle) (2024-03-15 04:47:55)
> Report an error when an attempt to register a clkdev entry results in a
> truncated string so the problem can be easily spotted.
> 
> Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---

Russell, are you taking this through your tree? I took the last one
because it was small and you reviewed it instead of applied it. If
you're taking it please add:

Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Stephen Boyd May 2, 2024, 12:59 a.m. UTC | #2
Quoting Russell King (Oracle) (2024-03-15 04:47:55)
> Report an error when an attempt to register a clkdev entry results in a
> truncated string so the problem can be easily spotted.
> 
> Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---

Applied to clk-next
Stephen Boyd May 2, 2024, 1:02 a.m. UTC | #3
Quoting Stephen Boyd (2024-05-01 17:59:16)
> Quoting Russell King (Oracle) (2024-03-15 04:47:55)
> > Report an error when an attempt to register a clkdev entry results in a
> > truncated string so the problem can be easily spotted.
> > 
> > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> 
> Applied to clk-next
> 

And backed out because I get a compilation failure

drivers/clk/clkdev.c: In function 'vclkdev_alloc':
drivers/clk/clkdev.c:182:17: error: function 'vclkdev_alloc' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
  182 |                 res = vsnprintf(cla->dev_id, sizeof(cla->dev_id), dev_fmt, ap);
      |                 ^~~
cc1: all warnings being treated as errors
make[5]: *** [scripts/Makefile.build:244: drivers/clk/clkdev.o] Error 1
make[4]: *** [scripts/Makefile.build:485: drivers/clk] Error 2
Russell King (Oracle) May 2, 2024, 8:02 a.m. UTC | #4
On Wed, May 01, 2024 at 06:02:54PM -0700, Stephen Boyd wrote:
> Quoting Stephen Boyd (2024-05-01 17:59:16)
> > Quoting Russell King (Oracle) (2024-03-15 04:47:55)
> > > Report an error when an attempt to register a clkdev entry results in a
> > > truncated string so the problem can be easily spotted.
> > > 
> > > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > ---
> > 
> > Applied to clk-next
> > 
> 
> And backed out because I get a compilation failure
> 
> drivers/clk/clkdev.c: In function 'vclkdev_alloc':
> drivers/clk/clkdev.c:182:17: error: function 'vclkdev_alloc' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
>   182 |                 res = vsnprintf(cla->dev_id, sizeof(cla->dev_id), dev_fmt, ap);
>       |                 ^~~
> cc1: all warnings being treated as errors
> make[5]: *** [scripts/Makefile.build:244: drivers/clk/clkdev.o] Error 1
> make[4]: *** [scripts/Makefile.build:485: drivers/clk] Error 2

It builds fine for me. I don't get this _error_, and it's really no
different from what it originally was - instead of using vcsnprintf()
we're now using vsnprintf(). That should make no difference what so
ever.
Russell King (Oracle) May 2, 2024, 8:22 a.m. UTC | #5
On Thu, May 02, 2024 at 09:02:16AM +0100, Russell King (Oracle) wrote:
> On Wed, May 01, 2024 at 06:02:54PM -0700, Stephen Boyd wrote:
> > Quoting Stephen Boyd (2024-05-01 17:59:16)
> > > Quoting Russell King (Oracle) (2024-03-15 04:47:55)
> > > > Report an error when an attempt to register a clkdev entry results in a
> > > > truncated string so the problem can be easily spotted.
> > > > 
> > > > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > > ---
> > > 
> > > Applied to clk-next
> > > 
> > 
> > And backed out because I get a compilation failure
> > 
> > drivers/clk/clkdev.c: In function 'vclkdev_alloc':
> > drivers/clk/clkdev.c:182:17: error: function 'vclkdev_alloc' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
> >   182 |                 res = vsnprintf(cla->dev_id, sizeof(cla->dev_id), dev_fmt, ap);
> >       |                 ^~~
> > cc1: all warnings being treated as errors
> > make[5]: *** [scripts/Makefile.build:244: drivers/clk/clkdev.o] Error 1
> > make[4]: *** [scripts/Makefile.build:485: drivers/clk] Error 2
> 
> It builds fine for me. I don't get this _error_, and it's really no
> different from what it originally was - instead of using vcsnprintf()
> we're now using vsnprintf(). That should make no difference what so
> ever.

... and I've just checked, and it builds entirely cleanly for me.

I'll merge it.
Russell King (Oracle) May 2, 2024, 11:08 a.m. UTC | #6
On Thu, May 02, 2024 at 09:22:39AM +0100, Russell King (Oracle) wrote:
> On Thu, May 02, 2024 at 09:02:16AM +0100, Russell King (Oracle) wrote:
> > On Wed, May 01, 2024 at 06:02:54PM -0700, Stephen Boyd wrote:
> > > Quoting Stephen Boyd (2024-05-01 17:59:16)
> > > > Quoting Russell King (Oracle) (2024-03-15 04:47:55)
> > > > > Report an error when an attempt to register a clkdev entry results in a
> > > > > truncated string so the problem can be easily spotted.
> > > > > 
> > > > > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > > > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > > > ---
> > > > 
> > > > Applied to clk-next
> > > > 
> > > 
> > > And backed out because I get a compilation failure
> > > 
> > > drivers/clk/clkdev.c: In function 'vclkdev_alloc':
> > > drivers/clk/clkdev.c:182:17: error: function 'vclkdev_alloc' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
> > >   182 |                 res = vsnprintf(cla->dev_id, sizeof(cla->dev_id), dev_fmt, ap);
> > >       |                 ^~~
> > > cc1: all warnings being treated as errors
> > > make[5]: *** [scripts/Makefile.build:244: drivers/clk/clkdev.o] Error 1
> > > make[4]: *** [scripts/Makefile.build:485: drivers/clk] Error 2
> > 
> > It builds fine for me. I don't get this _error_, and it's really no
> > different from what it originally was - instead of using vcsnprintf()
> > we're now using vsnprintf(). That should make no difference what so
> > ever.
> 
> ... and I've just checked, and it builds entirely cleanly for me.
> 
> I'll merge it.

It should be in tonight's linux-next tree.

I also note... the kernel build bot never flagged the above, not even
a warning for it, and it would've built the patch as it was Cc'd to
linux-kernel. So my conclusion is the error you are seeing is somehow
specific to your environment. Maybe you're building with an additional
set of warning flags and -Werror?
Stephen Boyd May 2, 2024, 10:18 p.m. UTC | #7
Quoting Russell King (Oracle) (2024-05-02 04:08:38)
> On Thu, May 02, 2024 at 09:22:39AM +0100, Russell King (Oracle) wrote:
> > On Thu, May 02, 2024 at 09:02:16AM +0100, Russell King (Oracle) wrote:
> > > On Wed, May 01, 2024 at 06:02:54PM -0700, Stephen Boyd wrote:
> > > > Quoting Stephen Boyd (2024-05-01 17:59:16)
> > > > > Quoting Russell King (Oracle) (2024-03-15 04:47:55)
> > > > > > Report an error when an attempt to register a clkdev entry results in a
> > > > > > truncated string so the problem can be easily spotted.
> > > > > > 
> > > > > > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > > > > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > > > > ---
> > > > > 
> > > > > Applied to clk-next
> > > > > 
> > > > 
> > > > And backed out because I get a compilation failure
> > > > 
> > > > drivers/clk/clkdev.c: In function 'vclkdev_alloc':
> > > > drivers/clk/clkdev.c:182:17: error: function 'vclkdev_alloc' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
> > > >   182 |                 res = vsnprintf(cla->dev_id, sizeof(cla->dev_id), dev_fmt, ap);
> > > >       |                 ^~~
> > > > cc1: all warnings being treated as errors
> > > > make[5]: *** [scripts/Makefile.build:244: drivers/clk/clkdev.o] Error 1
> > > > make[4]: *** [scripts/Makefile.build:485: drivers/clk] Error 2
> > > 
> > > It builds fine for me. I don't get this _error_, and it's really no
> > > different from what it originally was - instead of using vcsnprintf()
> > > we're now using vsnprintf(). That should make no difference what so
> > > ever.
> > 
> > ... and I've just checked, and it builds entirely cleanly for me.
> > 
> > I'll merge it.
> 
> It should be in tonight's linux-next tree.

Ok thanks

> 
> I also note... the kernel build bot never flagged the above, not even
> a warning for it, and it would've built the patch as it was Cc'd to
> linux-kernel.

Indeed. I see a report here[1] but it's only a warning. And there was
some work a few years ago[2] that I forgot about. Seems there was a
possible fix[3] as well.

> So my conclusion is the error you are seeing is somehow
> specific to your environment. Maybe you're building with an additional
> set of warning flags and -Werror?
> 

Yes I build with W=1 but I didn't think that turned warnings into
errors. Maybe that part is new.

[1] https://lore.kernel.org/all/202403110643.27JXEVCI-lkp@intel.com/
[2] https://lore.kernel.org/all/20210310085937.GF4931@dell/
[3] https://lore.kernel.org/all/20221111071809.3440-1-liubo03@inspur.com/
Guenter Roeck May 17, 2024, 10:09 p.m. UTC | #8
Hi,

On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
> Report an error when an attempt to register a clkdev entry results in a
> truncated string so the problem can be easily spotted.
> 
> Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Reviewed-by: Stephen Boyd <sboyd@kernel.org>

With this patch in the mainline kernel, I get

10000000.clock-controller:corepll: device ID is greater than 24
sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
...
platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready

when trying to boot sifive_u in qemu.

Apparently, "10000000.clock-controller" is too long. Any suggestion on
how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
for clk_hw_register_clkdev() is not or no longer a good idea.
What else should be used instead ?

Thanks,
Guenter
Russell King (Oracle) May 17, 2024, 10:22 p.m. UTC | #9
On Fri, May 17, 2024 at 03:09:12PM -0700, Guenter Roeck wrote:
> Hi,
> 
> On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
> > Report an error when an attempt to register a clkdev entry results in a
> > truncated string so the problem can be easily spotted.
> > 
> > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > Reviewed-by: Stephen Boyd <sboyd@kernel.org>
> 
> With this patch in the mainline kernel, I get
> 
> 10000000.clock-controller:corepll: device ID is greater than 24
> sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
> sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
> sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
> ...
> platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> 
> when trying to boot sifive_u in qemu.
> 
> Apparently, "10000000.clock-controller" is too long. Any suggestion on
> how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
> for clk_hw_register_clkdev() is not or no longer a good idea.
> What else should be used instead ?

It was *never* a good idea. clkdev uses a fixed buffer size of 20
characters including the NUL character, and "10000000.clock-controller"
would have been silently truncated to "10000000.clock-cont", and thus

                        if (!dev_id || strcmp(p->dev_id, dev_id))

would never have matched.

We need to think about (a) whether your use of clk_hw_register_clkdev()
is still appropriate, and (b) whether we need to increase the size of
the strings.
Guenter Roeck May 17, 2024, 11:34 p.m. UTC | #10
On 5/17/24 15:22, Russell King (Oracle) wrote:
> On Fri, May 17, 2024 at 03:09:12PM -0700, Guenter Roeck wrote:
>> Hi,
>>
>> On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
>>> Report an error when an attempt to register a clkdev entry results in a
>>> truncated string so the problem can be easily spotted.
>>>
>>> Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
>>> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
>>> Reviewed-by: Stephen Boyd <sboyd@kernel.org>
>>
>> With this patch in the mainline kernel, I get
>>
>> 10000000.clock-controller:corepll: device ID is greater than 24
>> sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
>> sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
>> sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
>> ...
>> platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>> platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>> platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>> platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>> platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>> platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>
>> when trying to boot sifive_u in qemu.
>>
>> Apparently, "10000000.clock-controller" is too long. Any suggestion on
>> how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
>> for clk_hw_register_clkdev() is not or no longer a good idea.
>> What else should be used instead ?
> 
> It was *never* a good idea. clkdev uses a fixed buffer size of 20
> characters including the NUL character, and "10000000.clock-controller"
> would have been silently truncated to "10000000.clock-cont", and thus
> 
>                          if (!dev_id || strcmp(p->dev_id, dev_id))
> 
> would never have matched.
> 
> We need to think about (a) whether your use of clk_hw_register_clkdev()
> is still appropriate, and (b) whether we need to increase the size of
> the strings.
>

It isn't _my_ use, really. I only run a variety of boot tests with qemu.
I expect we'll see reports from others trying to boot the mainline kernel
on real sifive_u hardware or other hardware using the same driver or other
drivers using dev_name() as dev_id parameter. Coccinelle finds the
following callers:

./sound/soc/codecs/rt5682s.c: dev_name() used as dev_id at line 2833
./drivers/clk/sifive/sifive-prci.c: dev_name() used as dev_id at line 540
./sound/soc/codecs/tlv320aic32x4-clk.c: dev_name() used as dev_id at line 475
./drivers/i2c/busses/i2c-bcm2835.c: dev_name() used as dev_id at line 189
./sound/soc/codecs/rt5682.c: dev_name() used as dev_id at line 2909

Guenter
Russell King (Oracle) May 17, 2024, 11:37 p.m. UTC | #11
On Fri, May 17, 2024 at 04:34:06PM -0700, Guenter Roeck wrote:
> On 5/17/24 15:22, Russell King (Oracle) wrote:
> > On Fri, May 17, 2024 at 03:09:12PM -0700, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
> > > > Report an error when an attempt to register a clkdev entry results in a
> > > > truncated string so the problem can be easily spotted.
> > > > 
> > > > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > > Reviewed-by: Stephen Boyd <sboyd@kernel.org>
> > > 
> > > With this patch in the mainline kernel, I get
> > > 
> > > 10000000.clock-controller:corepll: device ID is greater than 24
> > > sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
> > > sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
> > > sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
> > > ...
> > > platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > 
> > > when trying to boot sifive_u in qemu.
> > > 
> > > Apparently, "10000000.clock-controller" is too long. Any suggestion on
> > > how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
> > > for clk_hw_register_clkdev() is not or no longer a good idea.
> > > What else should be used instead ?
> > 
> > It was *never* a good idea. clkdev uses a fixed buffer size of 20
> > characters including the NUL character, and "10000000.clock-controller"
> > would have been silently truncated to "10000000.clock-cont", and thus
> > 
> >                          if (!dev_id || strcmp(p->dev_id, dev_id))
> > 
> > would never have matched.
> > 
> > We need to think about (a) whether your use of clk_hw_register_clkdev()
> > is still appropriate, and (b) whether we need to increase the size of
> > the strings.
> > 
> 
> It isn't _my_ use, really. I only run a variety of boot tests with qemu.
> I expect we'll see reports from others trying to boot the mainline kernel
> on real sifive_u hardware or other hardware using the same driver or other
> drivers using dev_name() as dev_id parameter. Coccinelle finds the
> following callers:

Using dev_name() is not an issue. It's when dev_name() exceeds 19
characters that it becomes an issue (and always has been an issue
due to the truncation.) clk_get(dev, ...) uses dev_name(dev) to match
against its entry in the table.

As I say, dev_name() itself is not an issue. The length used for the
name is.
Guenter Roeck May 18, 2024, 3:24 a.m. UTC | #12
On 5/17/24 16:37, Russell King (Oracle) wrote:
> On Fri, May 17, 2024 at 04:34:06PM -0700, Guenter Roeck wrote:
>> On 5/17/24 15:22, Russell King (Oracle) wrote:
>>> On Fri, May 17, 2024 at 03:09:12PM -0700, Guenter Roeck wrote:
>>>> Hi,
>>>>
>>>> On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
>>>>> Report an error when an attempt to register a clkdev entry results in a
>>>>> truncated string so the problem can be easily spotted.
>>>>>
>>>>> Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
>>>>> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
>>>>> Reviewed-by: Stephen Boyd <sboyd@kernel.org>
>>>>
>>>> With this patch in the mainline kernel, I get
>>>>
>>>> 10000000.clock-controller:corepll: device ID is greater than 24
>>>> sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
>>>> sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
>>>> sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
>>>> ...
>>>> platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>> platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>> platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>> platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>> platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>> platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>
>>>> when trying to boot sifive_u in qemu.
>>>>
>>>> Apparently, "10000000.clock-controller" is too long. Any suggestion on
>>>> how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
>>>> for clk_hw_register_clkdev() is not or no longer a good idea.
>>>> What else should be used instead ?
>>>
>>> It was *never* a good idea. clkdev uses a fixed buffer size of 20
>>> characters including the NUL character, and "10000000.clock-controller"
>>> would have been silently truncated to "10000000.clock-cont", and thus
>>>
>>>                           if (!dev_id || strcmp(p->dev_id, dev_id))
>>>
>>> would never have matched.
>>>
>>> We need to think about (a) whether your use of clk_hw_register_clkdev()
>>> is still appropriate, and (b) whether we need to increase the size of
>>> the strings.
>>>
>>
>> It isn't _my_ use, really. I only run a variety of boot tests with qemu.
>> I expect we'll see reports from others trying to boot the mainline kernel
>> on real sifive_u hardware or other hardware using the same driver or other
>> drivers using dev_name() as dev_id parameter. Coccinelle finds the
>> following callers:
> 
> Using dev_name() is not an issue. It's when dev_name() exceeds 19
> characters that it becomes an issue (and always has been an issue
> due to the truncation.) clk_get(dev, ...) uses dev_name(dev) to match
> against its entry in the table.
> 
> As I say, dev_name() itself is not an issue. The length used for the
> name is.
> 

Maybe, but the existence of best_dev_name() suggests that this has been seen
before and that, as you mentioned, it is not a good idea. Anyway, the patch
below fixes the problem for me. I don't know if it is acceptable / correct,
so it might serve as guidance for others when fixing the problem for real.

Thanks,
Guenter

---
diff --git a/drivers/clk/sifive/sifive-prci.c b/drivers/clk/sifive/sifive-prci.c
index 25b8e1a80ddc..20cc8f42d9eb 100644
--- a/drivers/clk/sifive/sifive-prci.c
+++ b/drivers/clk/sifive/sifive-prci.c
@@ -537,7 +537,7 @@ static int __prci_register_clocks(struct device *dev, struct __prci_data *pd,
                         return r;
                 }

-               r = clk_hw_register_clkdev(&pic->hw, pic->name, dev_name(dev));
+               r = clk_hw_register_clkdev(&pic->hw, pic->name, "prci");
                 if (r) {
                         dev_warn(dev, "Failed to register clkdev for %s: %d\n",
                                  init.name, r);
Russell King (Oracle) May 18, 2024, 7:01 a.m. UTC | #13
On Fri, May 17, 2024 at 08:24:19PM -0700, Guenter Roeck wrote:
> On 5/17/24 16:37, Russell King (Oracle) wrote:
> > On Fri, May 17, 2024 at 04:34:06PM -0700, Guenter Roeck wrote:
> > > On 5/17/24 15:22, Russell King (Oracle) wrote:
> > > > On Fri, May 17, 2024 at 03:09:12PM -0700, Guenter Roeck wrote:
> > > > > Hi,
> > > > > 
> > > > > On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
> > > > > > Report an error when an attempt to register a clkdev entry results in a
> > > > > > truncated string so the problem can be easily spotted.
> > > > > > 
> > > > > > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > > > > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > > > > Reviewed-by: Stephen Boyd <sboyd@kernel.org>
> > > > > 
> > > > > With this patch in the mainline kernel, I get
> > > > > 
> > > > > 10000000.clock-controller:corepll: device ID is greater than 24
> > > > > sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
> > > > > sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
> > > > > sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
> > > > > ...
> > > > > platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > > > platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > > > platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > > > platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > > > platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > > > platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> > > > > 
> > > > > when trying to boot sifive_u in qemu.
> > > > > 
> > > > > Apparently, "10000000.clock-controller" is too long. Any suggestion on
> > > > > how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
> > > > > for clk_hw_register_clkdev() is not or no longer a good idea.
> > > > > What else should be used instead ?
> > > > 
> > > > It was *never* a good idea. clkdev uses a fixed buffer size of 20
> > > > characters including the NUL character, and "10000000.clock-controller"
> > > > would have been silently truncated to "10000000.clock-cont", and thus
> > > > 
> > > >                           if (!dev_id || strcmp(p->dev_id, dev_id))
> > > > 
> > > > would never have matched.
> > > > 
> > > > We need to think about (a) whether your use of clk_hw_register_clkdev()
> > > > is still appropriate, and (b) whether we need to increase the size of
> > > > the strings.
> > > > 
> > > 
> > > It isn't _my_ use, really. I only run a variety of boot tests with qemu.
> > > I expect we'll see reports from others trying to boot the mainline kernel
> > > on real sifive_u hardware or other hardware using the same driver or other
> > > drivers using dev_name() as dev_id parameter. Coccinelle finds the
> > > following callers:
> > 
> > Using dev_name() is not an issue. It's when dev_name() exceeds 19
> > characters that it becomes an issue (and always has been an issue
> > due to the truncation.) clk_get(dev, ...) uses dev_name(dev) to match
> > against its entry in the table.
> > 
> > As I say, dev_name() itself is not an issue. The length used for the
> > name is.
> > 
> 
> Maybe, but the existence of best_dev_name() suggests that this has been seen
> before and that, as you mentioned, it is not a good idea. Anyway, the patch
> below fixes the problem for me. I don't know if it is acceptable / correct,
> so it might serve as guidance for others when fixing the problem for real.

I get the impression that there's a communication problem here, so I'm
not going to continue replying. Thanks.
Guenter Roeck May 18, 2024, 1:44 p.m. UTC | #14
On Fri, May 17, 2024 at 03:09:14PM -0700, Guenter Roeck wrote:
> Hi,
> 
> On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
> > Report an error when an attempt to register a clkdev entry results in a
> > truncated string so the problem can be easily spotted.
> > 
> > Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > Reviewed-by: Stephen Boyd <sboyd@kernel.org>
> 
> With this patch in the mainline kernel, I get
> 
> 10000000.clock-controller:corepll: device ID is greater than 24
> sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
> sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
> sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
> ...
> platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready
> 
> when trying to boot sifive_u in qemu.
> 
> Apparently, "10000000.clock-controller" is too long. Any suggestion on
> how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
> for clk_hw_register_clkdev() is not or no longer a good idea.
> What else should be used instead ?
> 

Copying regzbot.

#regzbot ^introduced 8d532528ff6a
#regzbot title sifive_u boot failure
#regzbot ignore-activity
Linux regression tracking (Thorsten Leemhuis) May 22, 2024, 6:53 a.m. UTC | #15
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 18.05.24 09:01, Russell King (Oracle) wrote:
> On Fri, May 17, 2024 at 08:24:19PM -0700, Guenter Roeck wrote:
>> On 5/17/24 16:37, Russell King (Oracle) wrote:
>>> On Fri, May 17, 2024 at 04:34:06PM -0700, Guenter Roeck wrote:
>>>> On 5/17/24 15:22, Russell King (Oracle) wrote:
>>>>> On Fri, May 17, 2024 at 03:09:12PM -0700, Guenter Roeck wrote:
>>>>>> On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
>>>>>>> Report an error when an attempt to register a clkdev entry results in a
>>>>>>> truncated string so the problem can be easily spotted.
>>>>>>>
>>>>>>> Reported by: Duanqiang Wen <duanqiangwen@net-swift.com>
>>>>>>> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
>>>>>>> Reviewed-by: Stephen Boyd <sboyd@kernel.org>
>>>>>>
>>>>>> With this patch in the mainline kernel, I get
>>>>>>
>>>>>> 10000000.clock-controller:corepll: device ID is greater than 24
>>>>>> sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
>>>>>> sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
>>>>>> sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
>>>>>> ...
>>>>>> platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>>
>>>>>> when trying to boot sifive_u in qemu.
>>>>>>
>>>>>> Apparently, "10000000.clock-controller" is too long. Any suggestion on
>>>>>> how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
>>>>>> for clk_hw_register_clkdev() is not or no longer a good idea.
>>>>>> What else should be used instead ?
>>>>>
>>>>> It was *never* a good idea. clkdev uses a fixed buffer size of 20
>>>>> characters including the NUL character, and "10000000.clock-controller"
>>>>> would have been silently truncated to "10000000.clock-cont", and thus
>>>>>
>>>>>                           if (!dev_id || strcmp(p->dev_id, dev_id))
>>>>>
>>>>> would never have matched.
>>>>>
>>>>> We need to think about (a) whether your use of clk_hw_register_clkdev()
>>>>> is still appropriate, and (b) whether we need to increase the size of
>>>>> the strings.
>>>>
>>>> It isn't _my_ use, really. I only run a variety of boot tests with qemu.
>>>> I expect we'll see reports from others trying to boot the mainline kernel
>>>> on real sifive_u hardware or other hardware using the same driver or other
>>>> drivers using dev_name() as dev_id parameter. Coccinelle finds the
>>>> following callers:
>>>
>>> Using dev_name() is not an issue. It's when dev_name() exceeds 19
>>> characters that it becomes an issue (and always has been an issue
>>> due to the truncation.) clk_get(dev, ...) uses dev_name(dev) to match
>>> against its entry in the table.
>>>
>>> As I say, dev_name() itself is not an issue. The length used for the
>>> name is.
>>>
>>
>> Maybe, but the existence of best_dev_name() suggests that this has been seen
>> before and that, as you mentioned, it is not a good idea. Anyway, the patch
>> below fixes the problem for me. I don't know if it is acceptable / correct,
>> so it might serve as guidance for others when fixing the problem for real.
> 
> I get the impression that there's a communication problem here, so I'm
> not going to continue replying. Thanks.

Hmmm. Communication problem aside, this in the end seems to be a
regression that is caused by a change of yours. Maybe not a major one
that is making a fuzz about, but still one that would be good to get
fixed. So who will take care of that? Alternatively:

Guenter, I assume we could simply fix this by reverting the culprit if
Linus wanted to?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke
Russell King (Oracle) May 22, 2024, 9:34 a.m. UTC | #16
On Wed, May 22, 2024 at 08:53:18AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hmmm. Communication problem aside, this in the end seems to be a
> regression that is caused by a change of yours. Maybe not a major one
> that is making a fuzz about, but still one that would be good to get
> fixed. So who will take care of that?

I have suggested several approaches to fixing it, and each time I'm
being ignored by Guenter, who seems to have some other agenda -
because he seems to believe that using dev_name() when registering
the clk with clkdev is wrong... despite the fact that clkdev uses
dev_name().

What I am uncertain about is:
1) whether clkdev is even necessary here, or whether it is pure noise.
   I think it's pure noise.  Why? The dev_name() that is being used#
   to register the clk seems to be the _source_ device of the clock,
   whereas the name given should be the _consumer_ of the clock (when
   clk_get(dev, con_id) is called, dev is the _consumer_ device, and
   this is the device that dev_name() is used internally with.) Thus,
   if _that_ device is not the same as the struct device that is being
   passed to dev_name() when registering the clk, the entry in clkdev
   is utterly useless.

2) why someone would think that using best_dev_name() to work around
   this would be a good idea. One might as well pass the string
   "hahaha" when registering the clk - because if the device name is
   truncated, clk_get() is not going to find it. So, by registering
   it with clkdev, we're just eating memory for no reason.

Therefore, this change is finding bugs elsewhere. Should it cause a
boot failure? No, and I'm happy to make clkdev just warn about it.
However, reverting the change means we're not going to find these
issues.

Why was the change originally proposed (by Duanqiang Wen) ? The reason
was because of this truncation causing clk_get() to fail unexpectedly.

I am all for a _sensible_ discussion over this - not one that seems to
have an agenda about "should dev_name() be used when registering a
clk" that seems to be Guenter's approach because _that_ is not the root
cause of the issue and I've already explained that _that_ is not the
issue here. Yet, Guenter insists on that.
Russell King (Oracle) May 22, 2024, 9:37 a.m. UTC | #17
On Wed, May 22, 2024 at 10:34:22AM +0100, Russell King (Oracle) wrote:
> On Wed, May 22, 2024 at 08:53:18AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > Hmmm. Communication problem aside, this in the end seems to be a
> > regression that is caused by a change of yours. Maybe not a major one
> > that is making a fuzz about, but still one that would be good to get
> > fixed. So who will take care of that?
> 
> I have suggested several approaches to fixing it, and each time I'm
> being ignored by Guenter, who seems to have some other agenda -
> because he seems to believe that using dev_name() when registering
> the clk with clkdev is wrong... despite the fact that clkdev uses
> dev_name().
> 
> What I am uncertain about is:
> 1) whether clkdev is even necessary here, or whether it is pure noise.
>    I think it's pure noise.  Why? The dev_name() that is being used#
>    to register the clk seems to be the _source_ device of the clock,
>    whereas the name given should be the _consumer_ of the clock (when
>    clk_get(dev, con_id) is called, dev is the _consumer_ device, and
>    this is the device that dev_name() is used internally with.) Thus,
>    if _that_ device is not the same as the struct device that is being
>    passed to dev_name() when registering the clk, the entry in clkdev
>    is utterly useless.
> 
> 2) why someone would think that using best_dev_name() to work around
>    this would be a good idea. One might as well pass the string
>    "hahaha" when registering the clk - because if the device name is
>    truncated, clk_get() is not going to find it. So, by registering
>    it with clkdev, we're just eating memory for no reason.
> 
> Therefore, this change is finding bugs elsewhere. Should it cause a
> boot failure? No, and I'm happy to make clkdev just warn about it.
> However, reverting the change means we're not going to find these
> issues.
> 
> Why was the change originally proposed (by Duanqiang Wen) ? The reason
> was because of this truncation causing clk_get() to fail unexpectedly.
> 
> I am all for a _sensible_ discussion over this - not one that seems to
> have an agenda about "should dev_name() be used when registering a
> clk" that seems to be Guenter's approach because _that_ is not the root
> cause of the issue and I've already explained that _that_ is not the
> issue here. Yet, Guenter insists on that.

... and I'll also add that I'm up to my eyeballs with an issue at
Oracle, and thus have very very little time to deal with mainline
kernel issues right now, so if people appear to be intentionally
obtuse and difficult, I will end the discussion as I did here
_purely_ because I _do_ _not_ _have_ _the_ _time_ to waste my time
with them.
Guenter Roeck May 22, 2024, 9:32 p.m. UTC | #18
On 5/22/24 02:34, Russell King (Oracle) wrote:
> On Wed, May 22, 2024 at 08:53:18AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> Hmmm. Communication problem aside, this in the end seems to be a
>> regression that is caused by a change of yours. Maybe not a major one
>> that is making a fuzz about, but still one that would be good to get
>> fixed. So who will take care of that?
> 
> I have suggested several approaches to fixing it, and each time I'm
> being ignored by Guenter, who seems to have some other agenda -
> because he seems to believe that using dev_name() when registering
> the clk with clkdev is wrong... despite the fact that clkdev uses
> dev_name().
> 

I don't have an agenda. I did not intentionally ignore anyone.
Misunderstand, maybe, but not ignore. I don't care how this is fixed.
All I intended to do was to report it. My apologies to anyone who
I managed to offend.

> What I am uncertain about is:
> 1) whether clkdev is even necessary here, or whether it is pure noise.
>     I think it's pure noise.  Why? The dev_name() that is being used#
>     to register the clk seems to be the _source_ device of the clock,
>     whereas the name given should be the _consumer_ of the clock (when
>     clk_get(dev, con_id) is called, dev is the _consumer_ device, and
>     this is the device that dev_name() is used internally with.) Thus,
>     if _that_ device is not the same as the struct device that is being
>     passed to dev_name() when registering the clk, the entry in clkdev
>     is utterly useless.
> 
> 2) why someone would think that using best_dev_name() to work around
>     this would be a good idea. One might as well pass the string
>     "hahaha" when registering the clk - because if the device name is
>     truncated, clk_get() is not going to find it. So, by registering
>     it with clkdev, we're just eating memory for no reason.
> 
> Therefore, this change is finding bugs elsewhere. Should it cause a
> boot failure? No, and I'm happy to make clkdev just warn about it.
> However, reverting the change means we're not going to find these
> issues.
> 
> Why was the change originally proposed (by Duanqiang Wen) ? The reason
> was because of this truncation causing clk_get() to fail unexpectedly.
> 
> I am all for a _sensible_ discussion over this - not one that seems to
> have an agenda about "should dev_name() be used when registering a
> clk" that seems to be Guenter's approach because _that_ is not the root
> cause of the issue and I've already explained that _that_ is not the
> issue here. Yet, Guenter insists on that.
> 

I don't know if using dev_name() is wrong or not. I don't know much
if anything about the clock subsystem. Someone else said that using
dev_name() would be a "bad idea", or at least that was my understanding.
That is all. Again, my apologies for any misunderstanding and for
getting involved beyond the initial report.

Either case, I really don't want to be involved further, and
_please_ don't make any assumptions or decisions based on anything
I may have said out of ignorance.

Thanks,
Guenter
diff mbox series

Patch

diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index ee37d0be6877..3146f26ab997 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -158,6 +158,10 @@  vclkdev_alloc(struct clk_hw *hw, const char *con_id, const char *dev_fmt,
 	va_list ap)
 {
 	struct clk_lookup_alloc *cla;
+	struct va_format fmt;
+	const char *failure;
+	size_t max_size;
+	ssize_t res;
 
 	cla = kzalloc(sizeof(*cla), GFP_KERNEL);
 	if (!cla)
@@ -165,16 +169,34 @@  vclkdev_alloc(struct clk_hw *hw, const char *con_id, const char *dev_fmt,
 
 	cla->cl.clk_hw = hw;
 	if (con_id) {
-		strscpy(cla->con_id, con_id, sizeof(cla->con_id));
+		res = strscpy(cla->con_id, con_id, sizeof(cla->con_id));
+		if (res < 0) {
+			max_size = sizeof(cla->con_id);
+			failure = "connection";
+			goto fail;
+		}
 		cla->cl.con_id = cla->con_id;
 	}
 
 	if (dev_fmt) {
-		vscnprintf(cla->dev_id, sizeof(cla->dev_id), dev_fmt, ap);
+		res = vsnprintf(cla->dev_id, sizeof(cla->dev_id), dev_fmt, ap);
+		if (res >= sizeof(cla->dev_id)) {
+			max_size = sizeof(cla->dev_id);
+			failure = "device";
+			goto fail;
+		}
 		cla->cl.dev_id = cla->dev_id;
 	}
 
 	return &cla->cl;
+
+fail:
+	fmt.fmt = dev_fmt;
+	fmt.va = &ap;
+	pr_err("%pV:%s: %s ID is greater than %zu\n",
+	       &fmt, con_id, failure, max_size);
+	kfree(cla);
+	return NULL;
 }
 
 static struct clk_lookup *