Message ID | 53074584.5010202@keymile.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Dear Gerlando Falauto, On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote: > So I restored the total aperture size to 192MB. > I had to rework your patch a bit because: > > a) I'm running an older kernel and driver > b) sizes are actually 1-byte offset Hum, right. This is a bit weird, maybe I should change that, I don't think the mvebu-mbus driver should accept 1-byte offset sizes. > Here's the assignment (same as before): > > pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff] > pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] > pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] > pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] > pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] > pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] > pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] > > And here's the output I get from: > > # cat /sys/kernel/debug/mvebu-mbus/devices > [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000) > [01] disabled > [02] disabled > [03] disabled > [04] 00000000ff000000 - 00000000ff010000 : nand > [05] 00000000f4000000 - 00000000f8000000 : vpcie > [06] 00000000fe000000 - 00000000fe010000 : dragonite > [07] 00000000e0000000 - 00000000e8000000 : pcie0.0 This seems correct: we have two windows pointing to the same device, and they have consecutive addresses. > I did not get to test the whole address space thoroughly, but all the > BARs are still accessible (mainly BAR0 which contains the control space > and is mapped on the "new" MBUS window, and BAR1 which is the "big" > one). So at least, the issues we had before are now gone. Did you check that what you read from BAR0 (which is mapped on the new MBUS window) is really what you expect, and not just the same thing as BAR1 accessible for the big window? I just want to make sure that the hardware indeed properly handles two windows for the same device. > So I'd say this looks like a very promising approach. :-) Indeed. However, I don't think this approach solves the entire problem, for two reasons: *) For small BARs that are not power-of-two sized, we may not want to consume two windows, but instead consume a little bit more address space. Using two windows to map a 96 KB BAR would be a waste of windows: using a single 128 KB window is much more efficient. *) I don't know if the algorithm to split the BAR into multiple windows is going to be trivial. Thomas
On Friday 21 February 2014 14:47:08 Thomas Petazzoni wrote: > > > So I'd say this looks like a very promising approach. > > Indeed. However, I don't think this approach solves the entire problem, > for two reasons: > > *) For small BARs that are not power-of-two sized, we may not want to > consume two windows, but instead consume a little bit more address > space. Using two windows to map a 96 KB BAR would be a waste of > windows: using a single 128 KB window is much more efficient. definitely. > *) I don't know if the algorithm to split the BAR into multiple > windows is going to be trivial. The easiest solution would be to special case 'size is between 128MB+1 and 192MB' if that turns out to be the most interesting case. It's easy enough to make the second window smaller than 64MB if we want. If we want things to be a little fancier, we could use: switch (size) { case (SZ_32M+1) ... (SZ_32M+SZ_16M): size2 = size - SZ_32M; size -= SZ_32M; break; case (SZ_64M+1) ... (SZ_64M+SZ_32M): size2 = size - SZ_64M; size -= SZ_64M; break; case (SZ_128M+1) ... (SZ_128M+SZ_64M): size2 = size - SZ_128M; size -= SZ_128M; break; }; Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dear Arnd Bergmann, On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote: > > *) I don't know if the algorithm to split the BAR into multiple > > windows is going to be trivial. > > The easiest solution would be to special case 'size is between > 128MB+1 and 192MB' if that turns out to be the most interesting > case. It's easy enough to make the second window smaller than 64MB > if we want. > > If we want things to be a little fancier, we could use: > > switch (size) { > case (SZ_32M+1) ... (SZ_32M+SZ_16M): > size2 = size - SZ_32M; > size -= SZ_32M; > break; > case (SZ_64M+1) ... (SZ_64M+SZ_32M): > size2 = size - SZ_64M; > size -= SZ_64M; > break; > case (SZ_128M+1) ... (SZ_128M+SZ_64M): > size2 = size - SZ_128M; > size -= SZ_128M; > break; > }; What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need three windows, and your algorithm doesn't work :-) Thomas
On Friday 21 February 2014 16:11:08 Thomas Petazzoni wrote: > On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote: > > > > *) I don't know if the algorithm to split the BAR into multiple > > > windows is going to be trivial. > > > > The easiest solution would be to special case 'size is between > > 128MB+1 and 192MB' if that turns out to be the most interesting > > case. It's easy enough to make the second window smaller than 64MB > > if we want. > > > > If we want things to be a little fancier, we could use: > > > > switch (size) { > > case (SZ_32M+1) ... (SZ_32M+SZ_16M): > > size2 = size - SZ_32M; > > size -= SZ_32M; > > break; > > case (SZ_64M+1) ... (SZ_64M+SZ_32M): > > size2 = size - SZ_64M; > > size -= SZ_64M; > > break; > > case (SZ_128M+1) ... (SZ_128M+SZ_64M): > > size2 = size - SZ_128M; > > size -= SZ_128M; > > break; > > }; > > What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need > three windows, and your algorithm doesn't work I was hoping we could avoid using more than two windows. With the algorithm above we would round up to 256MB and fail if that doesn't fit, which is the same thing that happens when you run out of space. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dear Arnd Bergmann, On Fri, 21 Feb 2014 16:20:45 +0100, Arnd Bergmann wrote: > > What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need > > three windows, and your algorithm doesn't work > > I was hoping we could avoid using more than two windows. > With the algorithm above we would round up to 256MB and > fail if that doesn't fit, which is the same thing that > happens when you run out of space. The problem is precisely that we currently don't have any well to tell the Linux PCI core that we need to round up the size of a BAR. That's the whole starting point of the discussion :-) Thomas
On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote: > *) I don't know if the algorithm to split the BAR into multiple > windows is going to be trivial. physaddr_t base,size; while (size != 0) { physaddr_t window_size = 1 << log2_round_down(size); create_window(base,window_size); base += window_size; size -= window_size; } At the very worst log2_round_down is approxmiately unsigned int log2_round_down(unsigned int val) { unsigned int res = 0; while ((1<<res) <= val) res++; return res - 1; } Minimum PCI required alignment for windows is 1MB so it will always work out into some number of mbus windows.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dear Jason Gunthorpe, On Fri, 21 Feb 2014 09:39:02 -0700, Jason Gunthorpe wrote: > On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote: > > *) I don't know if the algorithm to split the BAR into multiple > > windows is going to be trivial. > > physaddr_t base,size; > > while (size != 0) { > physaddr_t window_size = 1 << log2_round_down(size); > create_window(base,window_size); > base += window_size; > size -= window_size; > } > > At the very worst log2_round_down is approxmiately > > unsigned int log2_round_down(unsigned int val) > { > unsigned int res = 0; > while ((1<<res) <= val) > res++; > return res - 1; > } > > Minimum PCI required alignment for windows is 1MB so it will always > work out into some number of mbus windows.. Interesting! Thanks! Now I have another question: our mvebu_pcie_align_resource() function makes sure that the base address of the BAR is aligned on its size, because it is a requirement of MBus windows. However, if you later split the BAR into multiple windows, will this continue to work out? Let's take an example: a 96 MB BAR. If it gets put at 0xe0000000, then no problem: we create one 64 MB window at 0xe0000000 and a 32 MB window at 0xe4000000. Both base addresses are aligned on the size of the window. However, if the 96 MB BAR gets put at 0xea000000 (which is aligned on a 96 MB boundary, as required by our mvebu_pcie_align_resource). We create one 64 MB window at 0xea000000, and one 32 MB window at 0xee000000. Unfortunately, while 0xea000000 is aligned on a 96 MB boundary, it is not aligned on a 64 MB boundary, so the 64 MB window we have created is wrong. Which also makes me think that our mvebu_pcie_align_resource() function uses round_up(start, size), which most likely doesn't work with non power-of-two sizes. Thomas
On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote: > Now I have another question: our mvebu_pcie_align_resource() function > makes sure that the base address of the BAR is aligned on its size, > because it is a requirement of MBus windows. However, if you later > split the BAR into multiple windows, will this continue to work out? No, you must align to (1 << log2_round_down(size)) - that will always be the largest mbus window created and thus the highest starting alignment requirement. I looked for a bit to see if I could find why the core is rounding up to 196MB and it wasn't clear to me either. Gerlando, if you instrument the code in setup-bus.c, particularly pbus_size_mem, you will probably find out. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote: > On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote: > > > Now I have another question: our mvebu_pcie_align_resource() function > > makes sure that the base address of the BAR is aligned on its size, > > because it is a requirement of MBus windows. However, if you later > > split the BAR into multiple windows, will this continue to work out? > > No, you must align to (1 << log2_round_down(size)) - that will always > be the largest mbus window created and thus the highest starting > alignment requirement. Unless you allow reordering the two windows. If you want a 96MB window, you only need 32MB alignment because you can either put the actual 64MB window first if you have 64MB alignment, or you put the 32MB window first if you don't and then the following 64MB will be aligned. It gets more complicated if you want to allow a 72MB window (16MB+64MB), as that could either be 64MB aligned or start 16MB before the next multiple of 64MB. I don't think there is any reason why code anywhere should align the window to a multiple of the size though if the size is not power-of-two, such as aligning to multiples of 96MB. That wouldn't help anyone. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dear Thomas, On 02/21/2014 02:47 PM, Thomas Petazzoni wrote: > Dear Gerlando Falauto, > > On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote: > >> So I restored the total aperture size to 192MB. >> I had to rework your patch a bit because: >> >> a) I'm running an older kernel and driver >> b) sizes are actually 1-byte offset > > Hum, right. This is a bit weird, maybe I should change that, I don't > think the mvebu-mbus driver should accept 1-byte offset sizes. I don't know anything about this, I only know the size dumped is of the form 0x...ffff, that's all. >> Here's the assignment (same as before): >> >> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff] >> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff] >> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff] >> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff] >> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff] >> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff] >> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff] >> >> And here's the output I get from: >> >> # cat /sys/kernel/debug/mvebu-mbus/devices >> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000) >> [01] disabled >> [02] disabled >> [03] disabled >> [04] 00000000ff000000 - 00000000ff010000 : nand >> [05] 00000000f4000000 - 00000000f8000000 : vpcie >> [06] 00000000fe000000 - 00000000fe010000 : dragonite >> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0 > > This seems correct: we have two windows pointing to the same device, > and they have consecutive addresses. I don't know how to interpret the (remap ... ) bit, but yes, this looks right to me as well. I just don't know why mbus window 7 gets picked before 0, but apart from that, it looks nice. >> I did not get to test the whole address space thoroughly, but all the >> BARs are still accessible (mainly BAR0 which contains the control space >> and is mapped on the "new" MBUS window, and BAR1 which is the "big" >> one). So at least, the issues we had before are now gone. > > Did you check that what you read from BAR0 (which is mapped on the new > MBUS window) is really what you expect, and not just the same thing as > BAR1 accessible for the big window? I just want to make sure that the > hardware indeed properly handles two windows for the same device. Yes, there's no way the two BARs could be aliased. It's a fairly complex FPGA design, where BAR1 is the huge address space for a PCI-to-localbus bridge (whose connected devices are recognized correctly) and BAR0 is the control BAR (and its registers are read and written without a problem). >> So I'd say this looks like a very promising approach. :-) > > Indeed. However, I don't think this approach solves the entire problem, > for two reasons: > > *) For small BARs that are not power-of-two sized, we may not want to > consume two windows, but instead consume a little bit more address > space. Using two windows to map a 96 KB BAR would be a waste of > windows: using a single 128 KB window is much more efficient. > > *) I don't know if the algorithm to split the BAR into multiple > windows is going to be trivial. I see others have already replied and I pretty much agree with them. Thanks, Gerlando -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/21/2014 07:05 PM, Arnd Bergmann wrote: > On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote: >> On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote: >> >>> Now I have another question: our mvebu_pcie_align_resource() function >>> makes sure that the base address of the BAR is aligned on its size, >>> because it is a requirement of MBus windows. However, if you later >>> split the BAR into multiple windows, will this continue to work out? >> >> No, you must align to (1 << log2_round_down(size)) - that will always >> be the largest mbus window created and thus the highest starting >> alignment requirement. > > Unless you allow reordering the two windows. If you want a 96MB > window, you only need 32MB alignment because you can either put > the actual 64MB window first if you have 64MB alignment, or you > put the 32MB window first if you don't and then the following > 64MB will be aligned. > > It gets more complicated if you want to allow a 72MB window > (16MB+64MB), as that could either be 64MB aligned or start 16MB > before the next multiple of 64MB. > > I don't think there is any reason why code anywhere should align > the window to a multiple of the size though if the size is not > power-of-two, such as aligning to multiples of 96MB. That wouldn't > help anyone. I also don't see why in the world there would be a requirement of having a given "oddly-sized" range (e.g. 96MB) aligned to a multiple of its size. In the end, AFAIK aligment requirements' only purpose is to make hardware simpler. Cant'see how aligning to an "odd" number would help achieving this purpose. But that's just me, of course. Thanks guys! Gerlando -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dear Gerlando Falauto, On Fri, 21 Feb 2014 19:18:25 +0100, Gerlando Falauto wrote: > > Hum, right. This is a bit weird, maybe I should change that, I don't > > think the mvebu-mbus driver should accept 1-byte offset sizes. > > I don't know anything about this, I only know the size dumped is of the > form 0x...ffff, that's all. I'll have to look into this. > >> # cat /sys/kernel/debug/mvebu-mbus/devices > >> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000) > >> [01] disabled > >> [02] disabled > >> [03] disabled > >> [04] 00000000ff000000 - 00000000ff010000 : nand > >> [05] 00000000f4000000 - 00000000f8000000 : vpcie > >> [06] 00000000fe000000 - 00000000fe010000 : dragonite > >> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0 > > > > This seems correct: we have two windows pointing to the same device, > > and they have consecutive addresses. > > I don't know how to interpret the (remap ... ) bit, but yes, this looks > right to me as well. I just don't know why mbus window 7 gets picked > before 0, but apart from that, it looks nice. Basically, some windows have an additional capability: they are "remappable". On Kirkwood, the first 4 windows are remappable, and the last 4 are not. Therefore, unless you request a remappable window, we allocate a non-remappable one, which is why window 4 to 7 get used first. And then, even though we don't need the remappable feature for the last window, there are no more non-remappable windows available, so window 0 gets allocated for our second PCIe window. It matches fine with the expected behavior of the mvebu-mbus driver. > > Did you check that what you read from BAR0 (which is mapped on the new > > MBUS window) is really what you expect, and not just the same thing as > > BAR1 accessible for the big window? I just want to make sure that the > > hardware indeed properly handles two windows for the same device. > > Yes, there's no way the two BARs could be aliased. It's a fairly complex > FPGA design, where BAR1 is the huge address space for a PCI-to-localbus > bridge (whose connected devices are recognized correctly) and BAR0 is > the control BAR (and its registers are read and written without a problem). Great, so it means that it really works! Thomas
diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c index dd4445f..27fe162 100644 --- a/drivers/bus/mvebu-mbus.c +++ b/drivers/bus/mvebu-mbus.c @@ -251,11 +251,13 @@ static int mvebu_mbus_window_conflicts(struct mvebu_mbus_state *mbus, if ((u64)base < wend && end > wbase) return 0; +#if 0 /* * Check if target/attribute conflicts */ if (target == wtarget && attr == wattr) return 0; +#endif } return 1; diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c index c8397c4..120a822 100644 --- a/drivers/pci/host/pci-mvebu.c +++ b/drivers/pci/host/pci-mvebu.c @@ -332,10 +332,21 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port) (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) - port->memwin_base; - mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base, - port->memwin_size, - MVEBU_MBUS_NO_REMAP, - MVEBU_MBUS_PCI_MEM); + if (port->memwin_size + 1 == (SZ_128M + SZ_64M)) { + mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base, + SZ_128M - 1, + MVEBU_MBUS_NO_REMAP, + MVEBU_MBUS_PCI_MEM); + mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base + SZ_128M, + SZ_64M - 1, + MVEBU_MBUS_NO_REMAP, + MVEBU_MBUS_PCI_MEM); + } else { + mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base, + port->memwin_size, + MVEBU_MBUS_NO_REMAP, + MVEBU_MBUS_PCI_MEM); + } }