
[6/6] PCI: Try to allocate mem64 above 4G at first

Message ID CAE9FiQXgvaqErd1oiHmt6G8GwbW92C490+0sYyLUMcMFMMNRAg@mail.gmail.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Yinghai Lu Nov. 21, 2013, 8:18 p.m. UTC
On Thu, Nov 21, 2013 at 2:30 AM, Guo Chao <yan@linux.vnet.ibm.com> wrote:
> Hi:
>
> On Tue, Nov 19, 2013 at 05:51:57PM -0800, Yinghai Lu wrote:
>> It will fall back to below 4G if it cannot find any space above 4G.
>>
>
>> On x86 32-bit without X86_PAE support, bottom will be set to 0, because
>> resource_size_t is 32-bit.
>>
>> Also, a 32-bit kernel with a 64-bit resource_size_t on a machine with PAE
>> support is safe, because iomem_resource is limited to 32-bit according to
>> x86_phys_bits.
>>
>> -v2: update the bottom assignment to make it clear for machines without PAE
>>       support.
>> -v3: Bjorn's changes:
>>         use MAX_RESOURCE instead of -1
>>         use start/end instead of bottom/max
>>         for all arches instead of just x86_64
>> -v4: updated after the PCI_MAX_RESOURCE_32 change.
>> -v5: restore I/O handling to use PCI_MAX_RESOURCE_32 as the limit.
>> -v6: check the pcibios_resource_to_bus() return for every bus resource to
>>       decide whether we need to try high first.
>>      It supports all arches instead of just x86_64.
>>

> Works fine on our systems if '[RFC PATCH 3/3] PCI: do not reset bridge's
> IORESOURCE_MEM_64 flag for ROM BAR' is applied.
>
> Otherwise, on one system, the 32-bit window is too small to provide
> fallback space for the root bridge's prefetchable windows, causing all
> prefetchable resources to fail to get addresses.
>
> Any comments about that patch?

No, that patch is not right.
It could prevent the ROM BAR from being allocated under 4G.

Possible solutions:
1. Just remove pref on ROM BAR allocation, as in the attached rom_no_pref.patch.
2. Or treat a prefetchable ROM as an optional resource, as in the attached
   rom_option_1_xxx.patch.
3. Or, more generically, treat all 32-bit prefetchable PCI BARs as normal
   MMIO32 during allocation. That is, the prefetchable MMIO window will be
   used only for PCI bridges that support 64-bit prefetchable MMIO, and a
   bridge's 32-bit-only prefetchable window register will be left blank.

Bjorn, Linus,
Are you happy with No 3?

Thanks

Yinghai
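A minimal userspace sketch of the allocation order described in the quoted changelog: try the part of the bus window at or above 4G first, and fall back to below 4G. It assumes a toy single-window, first-fit allocator; `res_size_t`, `struct window`, `alloc_in_range()` and `alloc_mem64()` are hypothetical names, not kernel API, and the v6 `pcibios_resource_to_bus()` check is not modeled. Building with `-DRES_32BIT` models a 32-bit `resource_size_t`, where 4G is not representable, so the high attempt is skipped and the effective bottom is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for resource_size_t; define RES_32BIT to model
 * a 32-bit kernel without X86_PAE. */
#ifdef RES_32BIT
typedef uint32_t res_size_t;
#else
typedef uint64_t res_size_t;
#endif

/* Hypothetical free window [start, end] of a bus resource. */
struct window {
	res_size_t start;
	res_size_t end;
};

/* First fit: place a size-aligned block of 'size' bytes (a power of two)
 * inside both 'win' and [lo, hi]. On success, store its base in *out. */
bool alloc_in_range(const struct window *win, res_size_t size,
		    res_size_t lo, res_size_t hi, res_size_t *out)
{
	res_size_t start = win->start > lo ? win->start : lo;
	res_size_t end = win->end < hi ? win->end : hi;
	res_size_t base;

	assert(size != 0 && (size & (size - 1)) == 0);
	if (start > end)
		return false;
	base = (start + size - 1) & ~(size - 1);	/* align up */
	if (base < start)				/* wrapped past max */
		return false;
	if (base > end || end - base < size - 1)	/* does not fit */
		return false;
	*out = base;
	return true;
}

/* Try at or above 4G first; fall back to below 4G. */
bool alloc_mem64(const struct window *win, res_size_t size, res_size_t *out)
{
	const uint64_t four_g = (uint64_t)1 << 32;
	const res_size_t max = (res_size_t)-1;	/* like MAX_RESOURCE */

	/* 4G only fits in a 64-bit res_size_t; otherwise bottom is 0. */
	if ((uint64_t)max >= four_g &&
	    alloc_in_range(win, size, (res_size_t)four_g, max, out))
		return true;
	return alloc_in_range(win, size, 0, (res_size_t)(four_g - 1), out);
}
```

Trying the high range first keeps the space below 4G free for resources that can only live there.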

Comments

Linus Torvalds Nov. 21, 2013, 9:05 p.m. UTC | #1
On Thu, Nov 21, 2013 at 12:18 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> 3. Or, more generically, treat all 32-bit prefetchable PCI BARs as normal
> MMIO32 during allocation. That is, the prefetchable MMIO window will be
> used only for PCI bridges that support 64-bit prefetchable MMIO, and a
> bridge's 32-bit-only prefetchable window register will be left blank.
>
> Bjorn, Linus,
> Are you happy with No 3?

I don't think I understand your #3. If there's a 32-bit device behind
the bridge, it will need to be in the low 4G, so you can't use the
64-bit window because it will be out of range. So that wouldn't work.
So you'd need to have a bridge window in the low 32-bit area too if
there are any 32-bit devices behind the bridge.

Also, haven't we tried the "try to allocate 64-bit devices in high
memory" and it has always failed because there are broken devices?

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Nov. 21, 2013, 9:34 p.m. UTC | #2
On Thu, Nov 21, 2013 at 1:05 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Nov 21, 2013 at 12:18 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> 3. Or, more generically, treat all 32-bit prefetchable PCI BARs as normal
>> MMIO32 during allocation. That is, the prefetchable MMIO window will be
>> used only for PCI bridges that support 64-bit prefetchable MMIO, and a
>> bridge's 32-bit-only prefetchable window register will be left blank.
>>
>> Bjorn, Linus,
>> Are you happy with No 3?
>
> I don't think I understand your #3. If there's a 32-bit device behind
> the bridge, it will need to be in the low 4G, so you can't use the
> 64-bit window because it will be out of range. So that wouldn't work.
> So you'd need to have a bridge window in the low 32-bit area too if
> there are any 32-bit devices behind the bridge.

A bridge should have two MMIO windows:
  mmio nonpref: can only be 32-bit
  mmio pref: can be 32-bit only or 64-bit
and it is OK to put a child's prefetchable range inside the bridge's
non-prefetchable range.

Here is my suggestion:
assume the mmio pref window will be used for 64-bit prefetchable only, and if
the mmio pref window does not support 64-bit, we just ignore it.
If devices under that bridge need prefetchable space, we will just use a range
from the bridge's non-prefetchable MMIO window.
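A minimal sketch of the window choice this suggestion implies for each child memory BAR. The flag values, `enum bridge_win` and `pick_window()` are hypothetical illustrations, not the kernel's `IORESOURCE_*` flags or its sizing code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical BAR flags, not the kernel's IORESOURCE_* values. */
enum {
	BAR_MEM   = 1u << 0,
	BAR_PREF  = 1u << 1,
	BAR_MEM64 = 1u << 2,
};

enum bridge_win { WIN_NONPREF, WIN_PREF };

/* Pick the bridge window for a child memory BAR under proposal No 3:
 * the prefetchable window is used only when it is 64-bit capable and the
 * BAR is 64-bit prefetchable; everything else, including 32-bit
 * prefetchable BARs, goes into the (always 32-bit) non-prefetchable
 * window. */
enum bridge_win pick_window(unsigned int bar_flags, bool pref_win_is_64)
{
	bool pref64 = (bar_flags & BAR_PREF) && (bar_flags & BAR_MEM64);

	assert(bar_flags & BAR_MEM);	/* only memory BARs are considered */
	if (pref_win_is_64 && pref64)
		return WIN_PREF;
	return WIN_NONPREF;
}
```

This relies on the point above that a prefetchable BAR may be placed in a non-prefetchable window, so a 32-bit prefetchable BAR still lands below 4G.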

>
> Also, haven't we tried the "try to allocate 64-bit devices in high
> memory" and it has always failed because there are broken devices?

In my test setups, it always works, though only with Intel network devices,
Mellanox InfiniBand cards, and storage cards from QLogic and Emulex.

Thanks

Yinghai
Linus Torvalds Nov. 21, 2013, 9:42 p.m. UTC | #3
On Thu, Nov 21, 2013 at 1:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> Here is my suggestion:
> assume the mmio pref window will be used for 64-bit prefetchable only, and if
> the mmio pref window does not support 64-bit, we just ignore it.
> If devices under that bridge need prefetchable space, we will just use a range
> from the bridge's non-prefetchable MMIO window.

Since we always scan devices behind the bus, why don't we take that
into account? If we find devices with 32-bit mmio, we try to make the
prefetchable one be in the 32-bit range.
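A minimal sketch of this scan-based alternative: look at the flags of the BARs found behind the bridge and cap the prefetchable window below 4G if any prefetchable BAR is 32-bit only. It assumes only prefetchable BARs go into that window; the flag values and `pref_window_limit()` are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BAR flags, not the kernel's IORESOURCE_* values. */
#define BAR_PREF  0x1u
#define BAR_MEM64 0x2u

/* Highest address the bridge's prefetchable window may use, given the
 * flags of the child BARs found while scanning the bus. */
uint64_t pref_window_limit(const unsigned int *bar_flags, size_t n)
{
	assert(bar_flags != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* A 32-bit prefetchable BAR must be reachable below 4G. */
		if ((bar_flags[i] & BAR_PREF) && !(bar_flags[i] & BAR_MEM64))
			return 0xFFFFFFFFull;
	}
	return UINT64_MAX;	/* all 64-bit: window may go above 4G */
}
```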

> In my test setups, it always works, though only with Intel network devices,
> Mellanox InfiniBand cards, and storage cards from QLogic and Emulex.

Right. And remember how many times your testing did *not* show
problems that others saw?

"Works for me" doesn't work for PCI resource management.

              Linus

Patch

Subject: [PATCH] PCI: Treat ROM resource as optional during assignment.

Try to allocate ROM resources together with the requested ones; if they
cannot be assigned, go with the requested ones only and just skip the ROM
resource.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/setup-bus.c |   21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -303,18 +303,10 @@  static void assign_requested_resources_s
 		idx = pci_dev_resource_idx(dev_res->dev, res);
 		if (resource_size(res) &&
 		    pci_assign_resource_fit(dev_res->dev, idx, fit)) {
-			if (fail_head) {
-				/*
-				 * if the failed res is for ROM BAR, and it will
-				 * be enabled later, don't add it to the list
-				 */
-				if (!((idx == PCI_ROM_RESOURCE) &&
-				      (!(res->flags & IORESOURCE_ROM_ENABLE))))
-					add_to_list(fail_head,
-						    dev_res->dev, res,
-						    0 /* don't care */,
-						    0 /* don't care */);
-			}
+			if (fail_head)
+				add_to_list(fail_head, dev_res->dev, res,
+					    0 /* don't care */,
+					    0 /* don't care */);
 			reset_resource(res);
 		}
 	}
@@ -903,8 +895,9 @@  static int pbus_size_mem(struct pci_bus
 				continue;
 			r_size = resource_size(r);
 
-			/* put SRIOV requested res to the optional list */
-			if (realloc_head && is_pci_iov_resource_idx(i)) {
+			/* put SRIOV/ROM requested res to the optional list */
+			if (realloc_head && (is_pci_iov_resource_idx(i) ||
+					     is_pci_rom_resource_idx(i))) {
 				r->end = r->start - 1;
 				add_to_list(realloc_head, dev, r, r_size, 0/* dont' care */);
 				children_add_size += r_size;
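The patch puts ROM BARs on the same optional (`realloc_head`) list already used for SR-IOV BARs in `pbus_size_mem()`: assign them together with the requested resources if possible, otherwise proceed without them. A minimal size-only sketch of that fallback, assuming hypothetical `struct req` and `assign_with_optional()` names and ignoring alignment and the details of the kernel's real reassignment logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: a BAR size and whether it is optional
 * (e.g. a ROM or SR-IOV BAR under this patch). */
struct req {
	uint64_t size;
	bool optional;
	bool assigned;
};

/* Fit all requests into 'avail' bytes if possible; otherwise assign only
 * the required ones and skip the optional ones. Returns false only if the
 * required requests alone do not fit. Alignment is ignored. */
bool assign_with_optional(struct req *r, size_t n, uint64_t avail)
{
	uint64_t required = 0, optional = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		r[i].assigned = false;
		if (r[i].optional)
			optional += r[i].size;
		else
			required += r[i].size;
	}
	if (required > avail)
		return false;

	/* First choice: required + optional; fallback: required only. */
	bool with_optional = optional <= avail - required;

	for (size_t i = 0; i < n; i++)
		r[i].assigned = !r[i].optional || with_optional;
	return true;
}
```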