Message ID | 20140408171417.GB27776@obsidianresearch.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Jason, On Tue, Apr 08, 2014 at 11:14:17AM -0600, Jason Gunthorpe wrote: > The mbus hardware requires a power of two size, if a non-power > of two is passed in to the low level routines they configure the > register in a way that results in undefined behaviour. > > - WARN_ON to make this invalid usage really obvious > - Round down to the nearest power of two so we only stuff a valid > size into the HW So if I understand it right, for example when my first NIC registers e0000000-e02fffff, then we'll truncate it down to e0000000-e01fffff, won't that cause any issue, for example if the NIC needs to access data beyond that limit ? BTW I can try your patch with the myricom NIC which started to work when rounding up, I'll quickly see if that fixes the issue as well, but I'm now a little bit confused. Regards, Willy
Dear Jason Gunthorpe, On Tue, 8 Apr 2014 11:14:17 -0600, Jason Gunthorpe wrote: > > mvebu-mbus/devices output: > > > > # cat /sys/kernel/debug/mvebu-mbus/devices > > [10] 00000000e0000000 - 00000000e0900000 : 0004:00e8 > > > I'm not sure why I don't have the non-power-of-two problem for the MBus > > windows. > > Well, you do, 0x900000 is not aligned. It converts to size=0b10001111, > which is undefined behavior for mbus. Ah correct. Though I'm still puzzled as to why the undefined behavior works for me, and not for Willy, who I believe has the same NIC as me. > What do you think about this: > > From 235b0bf637242a50ec45c8766d18a942bff601cf Mon Sep 17 00:00:00 2001 > From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> > Date: Tue, 8 Apr 2014 11:12:41 -0600 > Subject: [PATCH] bus: mvebu-mbus: Avoid setting an undefined window size > > The mbus hardware requires a power of two size, if a non-power > of two is passed in to the low level routines they configure the > register in a way that results in undefined behaviour. > > - WARN_ON to make this invalid usage really obvious > - Round down to the nearest power of two so we only stuff a valid > size into the HW I perfectly fine with those two. > - When reading interpret undefined values in a conservative way, > the value is assumed to be the lowest power of two. This avoids > the debugfs output showing a value that looks 'correct' But I am not sure with this one. Since now you're anyway rounding down sizes, how is it possible to get a non-power-of-two size in the registers? I would probably prefer to have mvebu_devs_debug_show() do something like: seq_printf(seq, "[%02d] %016llx - %016llx : %04x:%04x%s", win, (unsigned long long)wbase, (unsigned long long)(wbase + wsize), wtarget, wattr, (!is_power_of_2(wsize)) ? " non-pow2 undefined behavior!" : ""); or something like that. This way at least the function does not "hide" the fact that the configured value is invalid. See one more comment below. > All together this makes the recent problems with silent failure > of PCI very obvious, noisy and debuggable. > > Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> > --- > drivers/bus/mvebu-mbus.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c > index 2ac754e..d26f63c 100644 > --- a/drivers/bus/mvebu-mbus.c > +++ b/drivers/bus/mvebu-mbus.c > @@ -56,6 +56,7 @@ > #include <linux/of.h> > #include <linux/of_address.h> > #include <linux/debugfs.h> > +#include <linux/log2.h> > > /* > * DDR target is the same on all platforms. > @@ -147,7 +148,7 @@ static void mvebu_mbus_read_window(struct mvebu_mbus_state *mbus, > *enabled = 1; > *base = ((u64)basereg & WIN_BASE_HIGH) << 32; > *base |= (basereg & WIN_BASE_LOW); > - *size = (ctrlreg | ~WIN_CTRL_SIZE_MASK) + 1; > + *size = 1 << (ffs(~(ctrlreg | ~WIN_CTRL_SIZE_MASK)) - 1); > > if (target) > *target = (ctrlreg & WIN_CTRL_TGT_MASK) >> WIN_CTRL_TGT_SHIFT; > @@ -266,6 +267,9 @@ static int mvebu_mbus_setup_window(struct mvebu_mbus_state *mbus, > mbus->soc->win_cfg_offset(win); > u32 ctrl, remap_addr; > > + WARN_ON(!is_power_of_2(size)); > + size = rounddown_pow_of_two(size); Maybe something like: if (!is_power_of_2(size)) { WARN(true, "Invalid MBus window size: 0x%x, rounding down to 0x%x\n", size, rounddown_pow_of_two(size)); size = rounddown_pow_of_two(size); } And while we're adding checks, why not also verify that the base address is a multiple of the window size? I think it's the other requirement. That being said, this warning doesn't really solve the problem that the PCI core may allocate BARs whose size cannot be represented through MBus windows :-) Thomas
On Tue, Apr 08, 2014 at 08:01:40PM +0200, Thomas Petazzoni wrote: > > Well, you do, 0x900000 is not aligned. It converts to size=0b10001111, > > which is undefined behavior for mbus. > > Ah correct. Though I'm still puzzled as to why the undefined behavior > works for me, and not for Willy, who I believe has the same NIC as me. If we knew the algorithm in the HW it would probably make sense :) > > - When reading interpret undefined values in a conservative way, > > the value is assumed to be the lowest power of two. This avoids > > the debugfs output showing a value that looks 'correct' > > But I am not sure with this one. Since now you're anyway rounding down > sizes, how is it possible to get a non-power-of-two size in the > registers? I agree, it should not happen if everything is correct, but the apparently correct debugfs output obscured the root cause of the problem.. > I would probably prefer to have mvebu_devs_debug_show() do something > like: > > seq_printf(seq, "[%02d] %016llx - %016llx : %04x:%04x%s", > win, (unsigned long long)wbase, > (unsigned long long)(wbase + wsize), wtarget, wattr, > (!is_power_of_2(wsize)) ? " non-pow2 undefined behavior!" : ""); That sounds good > > + WARN_ON(!is_power_of_2(size)); > > + size = rounddown_pow_of_two(size); > > Maybe something like: > > if (!is_power_of_2(size)) { > WARN(true, "Invalid MBus window size: 0x%x, rounding down to 0x%x\n", > size, rounddown_pow_of_two(size)); > size = rounddown_pow_of_two(size); > } > > And while we're adding checks, why not also verify that the base > address is a multiple of the window size? I think it's the other > requirement. Yes, agree, sounds good > That being said, this warning doesn't really solve the problem that the > PCI core may allocate BARs whose size cannot be represented through > MBus windows :-) Right, but Will pointed out it took 3 months to get to the root cause, so it might save that time in future :) I'll revise/resend the patch. Jason
Dear Jason Gunthorpe, On Tue, 8 Apr 2014 12:22:30 -0600, Jason Gunthorpe wrote: > > Ah correct. Though I'm still puzzled as to why the undefined behavior > > works for me, and not for Willy, who I believe has the same NIC as me. > > If we knew the algorithm in the HW it would probably make sense :) Indeed :) > > But I am not sure with this one. Since now you're anyway rounding down > > sizes, how is it possible to get a non-power-of-two size in the > > registers? > > I agree, it should not happen if everything is correct, but the > apparently correct debugfs output obscured the root cause of the > problem.. Absolutely. > > That being said, this warning doesn't really solve the problem that the > > PCI core may allocate BARs whose size cannot be represented through > > MBus windows :-) > > Right, but Will pointed out it took 3 months to get to the root cause, > so it might save that time in future :) Yes, I completely agree, and I'm totally in favor of making things more robust in mvebu-mbus. Actually, should we simply return an error if something tries to set up a window with an invalid size? But maybe it's a bit too strong for the moment, until the power of two size problem gets resolved on the PCI side. But the other day, I also had funky issues with another window, and it turned out to be caused by a non-power-of-two window as well... > I'll revise/resend the patch. Thanks! Thomas
diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c index 2ac754e..d26f63c 100644 --- a/drivers/bus/mvebu-mbus.c +++ b/drivers/bus/mvebu-mbus.c @@ -56,6 +56,7 @@ #include <linux/of.h> #include <linux/of_address.h> #include <linux/debugfs.h> +#include <linux/log2.h> /* * DDR target is the same on all platforms. @@ -147,7 +148,7 @@ static void mvebu_mbus_read_window(struct mvebu_mbus_state *mbus, *enabled = 1; *base = ((u64)basereg & WIN_BASE_HIGH) << 32; *base |= (basereg & WIN_BASE_LOW); - *size = (ctrlreg | ~WIN_CTRL_SIZE_MASK) + 1; + *size = 1 << (ffs(~(ctrlreg | ~WIN_CTRL_SIZE_MASK)) - 1); if (target) *target = (ctrlreg & WIN_CTRL_TGT_MASK) >> WIN_CTRL_TGT_SHIFT; @@ -266,6 +267,9 @@ static int mvebu_mbus_setup_window(struct mvebu_mbus_state *mbus, mbus->soc->win_cfg_offset(win); u32 ctrl, remap_addr; + WARN_ON(!is_power_of_2(size)); + size = rounddown_pow_of_two(size); + ctrl = ((size - 1) & WIN_CTRL_SIZE_MASK) | (attr << WIN_CTRL_ATTR_SHIFT) | (target << WIN_CTRL_TGT_SHIFT) |