diff mbox

pci-mvebu driver on km_kirkwood

Message ID 53074584.5010202@keymile.com (mailing list archive)
State New, archived
Headers show

Commit Message

Gerlando Falauto Feb. 21, 2014, 12:24 p.m. UTC
Hi Thomas,

On 02/21/2014 10:39 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,

[...]

>> I guess it would then also be useful to restore my previous setup, where
>> the total PCIe aperture is 192MB, right?
>
> Yes, that's the case I'm interested in at the moment. If you could try
> the above (ugly) patch, and see if you can access all your device BARs,
> it would be interesting. It would tell us if two separate windows
> having the same target/attribute and consecutive placement in the
> physical address space can actually work to address a given PCIe
> device. As you will see, the patch makes a very ugly special case for
> 192 MB :-)
>

So I restored the total aperture size to 192MB.
I had to rework your patch a bit because:

a) I'm running an older kernel and driver
b) sizes are actually 1-byte offset

So here it is:


  /*


Here's the assignment (same as before):

pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]

And here's the output I get from:

# cat /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
[01] disabled
[02] disabled
[03] disabled
[04] 00000000ff000000 - 00000000ff010000 : nand
[05] 00000000f4000000 - 00000000f8000000 : vpcie
[06] 00000000fe000000 - 00000000fe010000 : dragonite
[07] 00000000e0000000 - 00000000e8000000 : pcie0.0

I did not get to test the whole address space thoroughly, but all the 
BARs are still accessible (mainly BAR0 which contains the control space 
and is mapped on the "new" MBUS window, and BAR1 which is the "big" 
one). So at least, the issues we had before are now gone.
So I'd say this looks like a very promising approach. :-)

Thank you,
Gerlando

Comments

Thomas Petazzoni Feb. 21, 2014, 1:47 p.m. UTC | #1
Dear Gerlando Falauto,

On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote:

> So I restored the total aperture size to 192MB.
> I had to rework your patch a bit because:
> 
> a) I'm running an older kernel and driver
> b) sizes are actually 1-byte offset

Hum, right. This is a bit weird, maybe I should change that, I don't
think the mvebu-mbus driver should accept 1-byte offset sizes.

> Here's the assignment (same as before):
> 
> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> 
> And here's the output I get from:
> 
> # cat /sys/kernel/debug/mvebu-mbus/devices
> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
> [01] disabled
> [02] disabled
> [03] disabled
> [04] 00000000ff000000 - 00000000ff010000 : nand
> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0

This seems correct: we have two windows pointing to the same device,
and they have consecutive addresses.

> I did not get to test the whole address space thoroughly, but all the 
> BARs are still accessible (mainly BAR0 which contains the control space 
> and is mapped on the "new" MBUS window, and BAR1 which is the "big" 
> one). So at least, the issues we had before are now gone.

Did you check that what you read from BAR0 (which is mapped on the new
MBUS window) is really what you expect, and not just the same thing as
BAR1 accessible for the big window? I just want to make sure that the
hardware indeed properly handles two windows for the same device.

> So I'd say this looks like a very promising approach. :-)

Indeed. However, I don't think this approach solves the entire problem,
for two reasons:

 *) For small BARs that are not power-of-two sized, we may not want to
    consume two windows, but instead consume a little bit more address
    space. Using two windows to map a 96 KB BAR would be a waste of
    windows: using a single 128 KB window is much more efficient.

 *) I don't know if the algorithm to split the BAR into multiple
    windows is going to be trivial.

Thomas
Arnd Bergmann Feb. 21, 2014, 3:05 p.m. UTC | #2
On Friday 21 February 2014 14:47:08 Thomas Petazzoni wrote:
> 
> > So I'd say this looks like a very promising approach. 
> 
> Indeed. However, I don't think this approach solves the entire problem,
> for two reasons:
> 
>  *) For small BARs that are not power-of-two sized, we may not want to
>     consume two windows, but instead consume a little bit more address
>     space. Using two windows to map a 96 KB BAR would be a waste of
>     windows: using a single 128 KB window is much more efficient.

definitely.
 
>  *) I don't know if the algorithm to split the BAR into multiple
>     windows is going to be trivial.

The easiest solution would be to special case 'size is between
128MB+1 and 192MB' if that turns out to be the most interesting
case. It's easy enough to make the second window smaller than 64MB
if we want.

If we want things to be a little fancier, we could use:

	switch (size) {
		case (SZ_32M+1) ... (SZ_32M+SZ_16M):
			size2 = size - SZ_32M;
			size -= SZ_32M;
			break;
		case (SZ_64M+1) ... (SZ_64M+SZ_32M):
			size2 = size - SZ_64M;
			size -= SZ_64M;
			break;
		case (SZ_128M+1) ... (SZ_128M+SZ_64M):
			size2 = size - SZ_128M;
			size -= SZ_128M;
			break;
	};


	Arnd
Thomas Petazzoni Feb. 21, 2014, 3:11 p.m. UTC | #3
Dear Arnd Bergmann,

On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote:

> >  *) I don't know if the algorithm to split the BAR into multiple
> >     windows is going to be trivial.
> 
> The easiest solution would be to special case 'size is between
> 128MB+1 and 192MB' if that turns out to be the most interesting
> case. It's easy enough to make the second window smaller than 64MB
> if we want.
> 
> If we want things to be a little fancier, we could use:
> 
> 	switch (size) {
> 		case (SZ_32M+1) ... (SZ_32M+SZ_16M):
> 			size2 = size - SZ_32M;
> 			size -= SZ_32M;
> 			break;
> 		case (SZ_64M+1) ... (SZ_64M+SZ_32M):
> 			size2 = size - SZ_64M;
> 			size -= SZ_64M;
> 			break;
> 		case (SZ_128M+1) ... (SZ_128M+SZ_64M):
> 			size2 = size - SZ_128M;
> 			size -= SZ_128M;
> 			break;
> 	};

What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
three windows, and your algorithm doesn't work :-)

Thomas
Arnd Bergmann Feb. 21, 2014, 3:20 p.m. UTC | #4
On Friday 21 February 2014 16:11:08 Thomas Petazzoni wrote:
> On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote:
> 
> > >  *) I don't know if the algorithm to split the BAR into multiple
> > >     windows is going to be trivial.
> > 
> > The easiest solution would be to special case 'size is between
> > 128MB+1 and 192MB' if that turns out to be the most interesting
> > case. It's easy enough to make the second window smaller than 64MB
> > if we want.
> > 
> > If we want things to be a little fancier, we could use:
> > 
> >       switch (size) {
> >               case (SZ_32M+1) ... (SZ_32M+SZ_16M):
> >                       size2 = size - SZ_32M;
> >                       size -= SZ_32M;
> >                       break;
> >               case (SZ_64M+1) ... (SZ_64M+SZ_32M):
> >                       size2 = size - SZ_64M;
> >                       size -= SZ_64M;
> >                       break;
> >               case (SZ_128M+1) ... (SZ_128M+SZ_64M):
> >                       size2 = size - SZ_128M;
> >                       size -= SZ_128M;
> >                       break;
> >       };
> 
> What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
> three windows, and your algorithm doesn't work 

I was hoping we could avoid using more than two windows.
With the algorithm above we would round up to 256MB and
fail if that doesn't fit, which is the same thing that
happens when you run out of space.

	Arnd
Thomas Petazzoni Feb. 21, 2014, 3:37 p.m. UTC | #5
Dear Arnd Bergmann,

On Fri, 21 Feb 2014 16:20:45 +0100, Arnd Bergmann wrote:

> > What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
> > three windows, and your algorithm doesn't work 
> 
> I was hoping we could avoid using more than two windows.
> With the algorithm above we would round up to 256MB and
> fail if that doesn't fit, which is the same thing that
> happens when you run out of space.

The problem is precisely that we currently don't have any well to tell
the Linux PCI core that we need to round up the size of a BAR. That's
the whole starting point of the discussion :-)

Thomas
Jason Gunthorpe Feb. 21, 2014, 4:39 p.m. UTC | #6
On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote:
>  *) I don't know if the algorithm to split the BAR into multiple
>     windows is going to be trivial.

physaddr_t base,size;

while (size != 0) {
   physaddr_t window_size = 1 << log2_round_down(size);
   create_window(base,window_size);
   base += window_size;
   size -= window_size;
}

At the very worst log2_round_down is approxmiately

unsigned int log2_round_down(unsigned int val)
{
	unsigned int res = 0;
	while ((1<<res) <= val)
		res++;
	return res - 1;
}

Minimum PCI required alignment for windows is 1MB so it will always
work out into some number of mbus windows..

Jason
Thomas Petazzoni Feb. 21, 2014, 5:05 p.m. UTC | #7
Dear Jason Gunthorpe,

On Fri, 21 Feb 2014 09:39:02 -0700, Jason Gunthorpe wrote:
> On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote:
> >  *) I don't know if the algorithm to split the BAR into multiple
> >     windows is going to be trivial.
> 
> physaddr_t base,size;
> 
> while (size != 0) {
>    physaddr_t window_size = 1 << log2_round_down(size);
>    create_window(base,window_size);
>    base += window_size;
>    size -= window_size;
> }
> 
> At the very worst log2_round_down is approxmiately
> 
> unsigned int log2_round_down(unsigned int val)
> {
> 	unsigned int res = 0;
> 	while ((1<<res) <= val)
> 		res++;
> 	return res - 1;
> }
> 
> Minimum PCI required alignment for windows is 1MB so it will always
> work out into some number of mbus windows..

Interesting! Thanks!

Now I have another question: our mvebu_pcie_align_resource() function
makes sure that the base address of the BAR is aligned on its size,
because it is a requirement of MBus windows. However, if you later
split the BAR into multiple windows, will this continue to work out?

Let's take an example: a 96 MB BAR. If it gets put at 0xe0000000, then
no problem: we create one 64 MB window at 0xe0000000 and a 32 MB window
at 0xe4000000. Both base addresses are aligned on the size of the
window.

However, if the 96 MB BAR gets put at 0xea000000 (which is aligned on a
96 MB boundary, as required by our mvebu_pcie_align_resource). We
create one 64 MB window at 0xea000000, and one 32 MB window at
0xee000000. Unfortunately, while 0xea000000 is aligned on a 96 MB
boundary, it is not aligned on a 64 MB boundary, so the 64 MB window we
have created is wrong.

Which also makes me think that our mvebu_pcie_align_resource()
function uses round_up(start, size), which most likely doesn't work with
non power-of-two sizes.

Thomas
Jason Gunthorpe Feb. 21, 2014, 5:31 p.m. UTC | #8
On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
 
> Now I have another question: our mvebu_pcie_align_resource() function
> makes sure that the base address of the BAR is aligned on its size,
> because it is a requirement of MBus windows. However, if you later
> split the BAR into multiple windows, will this continue to work out?

No, you must align to (1 << log2_round_down(size)) - that will always
be the largest mbus window created and thus the highest starting
alignment requirement.

I looked for a bit to see if I could find why the core is rounding up
to 196MB and it wasn't clear to me either.

Gerlando, if you instrument the code in setup-bus.c, particularly
pbus_size_mem, you will probably find out.

Jason
Arnd Bergmann Feb. 21, 2014, 6:05 p.m. UTC | #9
On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote:
> On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
>  
> > Now I have another question: our mvebu_pcie_align_resource() function
> > makes sure that the base address of the BAR is aligned on its size,
> > because it is a requirement of MBus windows. However, if you later
> > split the BAR into multiple windows, will this continue to work out?
> 
> No, you must align to (1 << log2_round_down(size)) - that will always
> be the largest mbus window created and thus the highest starting
> alignment requirement.

Unless you allow reordering the two windows. If you want a 96MB
window, you only need 32MB alignment because you can either put
the actual 64MB window first if you have 64MB alignment, or you
put the 32MB window first if you don't and then the following
64MB will be aligned.

It gets more complicated if you want to allow a 72MB window
(16MB+64MB), as that could either be 64MB aligned or start 16MB
before the next multiple of 64MB.

I don't think there is any reason why code anywhere should align
the window to a multiple of the size though if the size is not
power-of-two, such as aligning to multiples of 96MB. That wouldn't
help anyone.

	Arnd
Gerlando Falauto Feb. 21, 2014, 6:18 p.m. UTC | #10
Dear Thomas,

On 02/21/2014 02:47 PM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote:
>
>> So I restored the total aperture size to 192MB.
>> I had to rework your patch a bit because:
>>
>> a) I'm running an older kernel and driver
>> b) sizes are actually 1-byte offset
>
> Hum, right. This is a bit weird, maybe I should change that, I don't
> think the mvebu-mbus driver should accept 1-byte offset sizes.

I don't know anything about this, I only know the size dumped is of the 
form 0x...ffff, that's all.

>> Here's the assignment (same as before):
>>
>> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
>> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
>> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
>> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
>> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
>> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
>> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
>>
>> And here's the output I get from:
>>
>> # cat /sys/kernel/debug/mvebu-mbus/devices
>> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
>> [01] disabled
>> [02] disabled
>> [03] disabled
>> [04] 00000000ff000000 - 00000000ff010000 : nand
>> [05] 00000000f4000000 - 00000000f8000000 : vpcie
>> [06] 00000000fe000000 - 00000000fe010000 : dragonite
>> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0
>
> This seems correct: we have two windows pointing to the same device,
> and they have consecutive addresses.

I don't know how to interpret the (remap ... ) bit, but yes, this looks 
right to me as well. I just don't know why mbus window 7 gets picked 
before 0, but apart from that, it looks nice.

>> I did not get to test the whole address space thoroughly, but all the
>> BARs are still accessible (mainly BAR0 which contains the control space
>> and is mapped on the "new" MBUS window, and BAR1 which is the "big"
>> one). So at least, the issues we had before are now gone.
>
> Did you check that what you read from BAR0 (which is mapped on the new
> MBUS window) is really what you expect, and not just the same thing as
> BAR1 accessible for the big window? I just want to make sure that the
> hardware indeed properly handles two windows for the same device.

Yes, there's no way the two BARs could be aliased. It's a fairly complex 
FPGA design, where BAR1 is the huge address space for a PCI-to-localbus 
bridge (whose connected devices are recognized correctly) and BAR0 is 
the control BAR (and its registers are read and written without a problem).


>> So I'd say this looks like a very promising approach. :-)
>
> Indeed. However, I don't think this approach solves the entire problem,
> for two reasons:
>
>   *) For small BARs that are not power-of-two sized, we may not want to
>      consume two windows, but instead consume a little bit more address
>      space. Using two windows to map a 96 KB BAR would be a waste of
>      windows: using a single 128 KB window is much more efficient.
>
>   *) I don't know if the algorithm to split the BAR into multiple
>      windows is going to be trivial.

I see others have already replied and I pretty much agree with them.

Thanks,
Gerlando
Gerlando Falauto Feb. 21, 2014, 6:29 p.m. UTC | #11
On 02/21/2014 07:05 PM, Arnd Bergmann wrote:
> On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote:
>> On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
>>
>>> Now I have another question: our mvebu_pcie_align_resource() function
>>> makes sure that the base address of the BAR is aligned on its size,
>>> because it is a requirement of MBus windows. However, if you later
>>> split the BAR into multiple windows, will this continue to work out?
>>
>> No, you must align to (1 << log2_round_down(size)) - that will always
>> be the largest mbus window created and thus the highest starting
>> alignment requirement.
>
> Unless you allow reordering the two windows. If you want a 96MB
> window, you only need 32MB alignment because you can either put
> the actual 64MB window first if you have 64MB alignment, or you
> put the 32MB window first if you don't and then the following
> 64MB will be aligned.
>
> It gets more complicated if you want to allow a 72MB window
> (16MB+64MB), as that could either be 64MB aligned or start 16MB
> before the next multiple of 64MB.
>
> I don't think there is any reason why code anywhere should align
> the window to a multiple of the size though if the size is not
> power-of-two, such as aligning to multiples of 96MB. That wouldn't
> help anyone.

I also don't see why in the world there would be a requirement of having 
a given "oddly-sized" range (e.g. 96MB) aligned to a multiple of its 
size. In the end, AFAIK aligment requirements' only purpose is to make 
hardware simpler. Cant'see how aligning to an "odd" number would help 
achieving this purpose. But that's just me, of course.

Thanks guys!
Gerlando
Thomas Petazzoni Feb. 21, 2014, 6:45 p.m. UTC | #12
Dear Gerlando Falauto,

On Fri, 21 Feb 2014 19:18:25 +0100, Gerlando Falauto wrote:

> > Hum, right. This is a bit weird, maybe I should change that, I don't
> > think the mvebu-mbus driver should accept 1-byte offset sizes.
> 
> I don't know anything about this, I only know the size dumped is of the 
> form 0x...ffff, that's all.

I'll have to look into this.

> >> # cat /sys/kernel/debug/mvebu-mbus/devices
> >> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
> >> [01] disabled
> >> [02] disabled
> >> [03] disabled
> >> [04] 00000000ff000000 - 00000000ff010000 : nand
> >> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> >> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> >> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0
> >
> > This seems correct: we have two windows pointing to the same device,
> > and they have consecutive addresses.
> 
> I don't know how to interpret the (remap ... ) bit, but yes, this looks 
> right to me as well. I just don't know why mbus window 7 gets picked 
> before 0, but apart from that, it looks nice.

Basically, some windows have an additional capability: they are
"remappable". On Kirkwood, the first 4 windows are remappable, and the
last 4 are not. Therefore, unless you request a remappable window, we
allocate a non-remappable one, which is why window 4 to 7 get used
first. And then, even though we don't need the remappable feature for
the last window, there are no more non-remappable windows available, so
window 0 gets allocated for our second PCIe window.

It matches fine with the expected behavior of the mvebu-mbus driver.

> > Did you check that what you read from BAR0 (which is mapped on the new
> > MBUS window) is really what you expect, and not just the same thing as
> > BAR1 accessible for the big window? I just want to make sure that the
> > hardware indeed properly handles two windows for the same device.
> 
> Yes, there's no way the two BARs could be aliased. It's a fairly complex 
> FPGA design, where BAR1 is the huge address space for a PCI-to-localbus 
> bridge (whose connected devices are recognized correctly) and BAR0 is 
> the control BAR (and its registers are read and written without a problem).

Great, so it means that it really works!

Thomas
diff mbox

Patch

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index dd4445f..27fe162 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -251,11 +251,13 @@  static int mvebu_mbus_window_conflicts(struct 
mvebu_mbus_state *mbus,
  		if ((u64)base < wend && end > wbase)
  			return 0;

+#if 0
  		/*
  		 * Check if target/attribute conflicts
  		 */
  		if (target == wtarget && attr == wattr)
  			return 0;
+#endif
  	}

  	return 1;
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index c8397c4..120a822 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -332,10 +332,21 @@  static void 
mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
  		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
  		port->memwin_base;

-	mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
-					  port->memwin_size,
-					  MVEBU_MBUS_NO_REMAP,
-					  MVEBU_MBUS_PCI_MEM);
+	if (port->memwin_size + 1 == (SZ_128M + SZ_64M)) {
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
+						  SZ_128M - 1,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base + 
SZ_128M,
+						  SZ_64M - 1,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+	} else {
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
+						  port->memwin_size,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+	}
  }