diff mbox

[RFC] IB/hfi1: Fix port ordering issue in a multiport device

Message ID 148409267200.13402.16060755922068447437.stgit@tstruk-mobl1.ra.intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Tadeusz Struk Jan. 10, 2017, 11:57 p.m. UTC
Some hardware have multiple HFIs within the same ASIC. In some devices the
numbers labeled on the faceplate of the device don't match the PCI bus
order, and the result is that the devices (ports) are probed in the opposite
order of their port numbers. The result is IB device unit numbers are in
reverse order from the faceplate numbering. This leads to confusion, and
errors.
We can fix this by enforcing the correct port order at module load time.
To reorder the ports to match the numbering labels on the back of the
device we need to delay registering devices with the ib_core until we
receive a probe for the affected devices, reorder them and start
initialization in the correct order later. We use a timer with 1 second
timeout. On hfi1 probes devices are ordered and stored in a list.
Each hfi1 probe updates the timer. When the timer timeouts all devices are
initialized and registered with ib_core in the right order.
When there will more than one second time gap between hfi1 probes,
the timer will be update in each call so there is no danger that a device
will be "missed".

A new module param called port_reorder is introduced to cover users, who
already prewired their clusters in the 'invalid' order. The default
behavior does not change and results in devices ordered in the PCI bus
order. Port_reorder=1 is used to apply special reordering to these devices.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
---
 drivers/infiniband/hw/hfi1/chip.c |    3 -
 drivers/infiniband/hw/hfi1/chip.h |    4 +
 drivers/infiniband/hw/hfi1/init.c |  170 +++++++++++++++++++++++++++++++++++++
 3 files changed, 173 insertions(+), 4 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Leon Romanovsky Jan. 11, 2017, 7:12 a.m. UTC | #1
On Tue, Jan 10, 2017 at 03:57:52PM -0800, Tadeusz Struk wrote:
> Some hardware have multiple HFIs within the same ASIC. In some devices the
> numbers labeled on the faceplate of the device don't match the PCI bus
> order, and the result is that the devices (ports) are probed in the opposite
> order of their port numbers. The result is IB device unit numbers are in
> reverse order from the faceplate numbering. This leads to confusion, and
> errors.

Can't you recognize such device at the initialization phase?
This is definitely specific HW implementation bug/issue/limitation.

> We can fix this by enforcing the correct port order at module load time.
> To reorder the ports to match the numbering labels on the back of the
> device we need to delay registering devices with the ib_core until we
> receive a probe for the affected devices, reorder them and start
> initialization in the correct order later. We use a timer with 1 second
> timeout. On hfi1 probes devices are ordered and stored in a list.
> Each hfi1 probe updates the timer. When the timer timeouts all devices are
> initialized and registered with ib_core in the right order.
> When there will more than one second time gap between hfi1 probes,
> the timer will be update in each call so there is no danger that a device
> will be "missed".
>
> A new module param called port_reorder is introduced to cover users, who
> already prewired their clusters in the 'invalid' order. The default
> behavior does not change and results in devices ordered in the PCI bus
> order. Port_reorder=1 is used to apply special reordering to these devices.

I always had an impression that module parameters are rarely beneficial
in upstream kernel for driver modules and adding them for new driver
code should be explicitly prohibited in CodingStyle guide.

You are adding new module parameter which will be with us forever to fix
special HW bug/implementation in some legacy installations. It will be
insane to add such thing to upstream kernel, errata and out-of-tree
implementations are best places for such things.

Thanks
Tadeusz Struk Jan. 11, 2017, 5:20 p.m. UTC | #2
Hi Leon,
On 01/10/2017 11:12 PM, Leon Romanovsky wrote:
> Can't you recognize such device at the initialization phase?

Yes, this is exactly what it does. It checks the device GUID
and reorders the ports during insmod.

> This is definitely specific HW implementation bug/issue/limitation.

Correct. As the commit message says this is to fix a HW issue.

> 
> I always had an impression that module parameters are rarely beneficial
> in upstream kernel for driver modules and adding them for new driver
> code should be explicitly prohibited in CodingStyle guide.
> 
> You are adding new module parameter which will be with us forever to fix
> special HW bug/implementation in some legacy installations. It will be
> insane to add such thing to upstream kernel, errata and out-of-tree
> implementations are best places for such things.

Agree, but the reality is that there are 5505 $(git grep module_param drivers/ | wc -l)
module parameters in the kernel already, and even mlx4 and mlx5 drivers use them.
I really considered every other possible solution, but module parameter is the
only possible way to do such things during insmod.
Thanks,
Leon Romanovsky Jan. 11, 2017, 5:59 p.m. UTC | #3
On Wed, Jan 11, 2017 at 09:20:54AM -0800, Tadeusz Struk wrote:
> Hi Leon,
> On 01/10/2017 11:12 PM, Leon Romanovsky wrote:
> > Can't you recognize such device at the initialization phase?
>
> Yes, this is exactly what it does. It checks the device GUID
> and reorders the ports during insmod.

Why don't your firmware report the need to revert the ports?

>
> > This is definitely specific HW implementation bug/issue/limitation.
>
> Correct. As the commit message says this is to fix a HW issue.
>
> >
> > I always had an impression that module parameters are rarely beneficial
> > in upstream kernel for driver modules and adding them for new driver
> > code should be explicitly prohibited in CodingStyle guide.
> >
> > You are adding new module parameter which will be with us forever to fix
> > special HW bug/implementation in some legacy installations. It will be
> > insane to add such thing to upstream kernel, errata and out-of-tree
> > implementations are best places for such things.
>
> Agree, but the reality is that there are 5505 $(git grep module_param drivers/ | wc -l)
> module parameters in the kernel already, and even mlx4 and mlx5 drivers use them.

Regarding mlx4/mlx5, I'm not proud of that code and if it was dependent
on me, I would remove it without thinking twice.

Previous copy-paste can not be an excuse to add another module parameter.

> I really considered every other possible solution, but module parameter is the
> only possible way to do such things during insmod.

This is another problem with the module parameters - assumption that
everyone is running modules, but this is simply incorrect, most of my
smoke tests are performed with monolithic kernel.

> Thanks,
> --
> Tadeusz
Jason Gunthorpe Jan. 11, 2017, 6:10 p.m. UTC | #4
On Tue, Jan 10, 2017 at 03:57:52PM -0800, Tadeusz Struk wrote:
> We can fix this by enforcing the correct port order at module load time.
> To reorder the ports to match the numbering labels on the back of the
> device we need to delay registering devices with the ib_core until
> we

Sorry, no way - this is horrifying.

If you need stable names for RDMA devices then you need to add proper
infrastructure to the kernel to rename RDMA devices from user space
via udev. ala netdev.

or change the ib_core to allow your driver to specify the full name
and manage things in your driver.

No way on this insane block probe approach.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Jan. 18, 2017, 9:01 p.m. UTC | #5
On Wed, 2017-01-11 at 11:10 -0700, Jason Gunthorpe wrote:
> On Tue, Jan 10, 2017 at 03:57:52PM -0800, Tadeusz Struk wrote:
> > 
> > We can fix this by enforcing the correct port order at module load
> > time.
> > To reorder the ports to match the numbering labels on the back of
> > the
> > device we need to delay registering devices with the ib_core until
> > we
> 
> Sorry, no way - this is horrifying.

Agree.

> If you need stable names for RDMA devices then you need to add proper
> infrastructure to the kernel to rename RDMA devices from user space
> via udev. ala netdev.

This has its own problems in this particular case, namely having to
ship files that know which parts are effected and then modify the
names.  Or requiring that users create all of the rename rules when
they really shouldn't be bothered with anything.  Although, the module
option to turn this fix off for existing clusters that have been wired
up wrong is just as bad as creating persistent naming rules...

> or change the ib_core to allow your driver to specify the full name
> and manage things in your driver.
> 
> No way on this insane block probe approach.

Isn't there already code in the code device layer to handle this kind
of thing?  I seem to recall backporting it from upstream to a rhel
kernel many years ago.  Lemme go look...

OK, sure, as far as the reordering stuff is concerned, all you need to
do is to make use of the EPROBE_DEFER return option to your PCI probe
routine.  That way, when you get the probe for the out of order port,
the first time you pass EPROBE_DEFER as your return, then you get the
second port, your register it with the IB layer which will make it
appear as the first port (optionally, and as I haven't tried this I
don't know if it will work or if it's necessary, you can save the
pointer to the first port's device struct off, and when you get the
second one, you can tell the driver layer to splice the second port in
front of the first port in the device child list on the parent device,
but I think this is optional and really has no lasting effect on the
outcome), then later the first port gets retried and ends up being the
second port.

So, scrap all of this hand done junk and use the provided
infrastructure as it was intended to be used.  I *really* don't want to
do a kernel module option for this either.  Do you know for a fact that
this is wired up wrong in the field and the people can't just swap
hfi1_0<->hfi1_1 in their config files and have it all work without
being recabled?
Jason Gunthorpe Jan. 18, 2017, 9:08 p.m. UTC | #6
On Wed, Jan 18, 2017 at 04:01:00PM -0500, Doug Ledford wrote:

> OK, sure, as far as the reordering stuff is concerned, all you need to
> do is to make use of the EPROBE_DEFER return option to your PCI probe
> routine.  That way, when you get the probe for the out of order
> port,

EPROBE_DEFER is intended when waiting on other components the driver
may need, not for setting stable names.

This is a 'stable device naming' problem, which we have never tried to
solve in RDMA.

udev is the expected kernel way to solve this. Trying to hack stable
names by forcing device bind order is horrible.

hfi runs smack into this because it is the first scheme to actually
typically operate in a multi-struct ib_device world, which means they
get to solve it properly :\

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tadeusz Struk Jan. 18, 2017, 10:03 p.m. UTC | #7
On 01/18/2017 01:08 PM, Jason Gunthorpe wrote:
> EPROBE_DEFER is intended when waiting on other components the driver
> may need, not for setting stable names.

Also EPROBE_DEFER only works during boot time.
After boot when a module gets rmmod/insmod EPROBE_DEFER has no effect.
Thanks,
Doug Ledford Jan. 19, 2017, 12:16 a.m. UTC | #8
On Wed, 2017-01-18 at 14:08 -0700, Jason Gunthorpe wrote:
> On Wed, Jan 18, 2017 at 04:01:00PM -0500, Doug Ledford wrote:
> 
> > 
> > OK, sure, as far as the reordering stuff is concerned, all you need
> > to
> > do is to make use of the EPROBE_DEFER return option to your PCI
> > probe
> > routine.  That way, when you get the probe for the out of order
> > port,
> 
> EPROBE_DEFER is intended when waiting on other components the driver
> may need, not for setting stable names.

What it's intended for, and what it can be used for, need not be the
same.

> This is a 'stable device naming' problem, which we have never tried
> to
> solve in RDMA.

No, that would imply they must be hfi1_0 and hfi1_1, when this is not
the case.  If you had two cards in this system, both dual port, then
you may be wanting to rename hfi_3 to hfi1_2 and vice versa.  It is a
relative name problem, not a stable name problem.  We need only reverse
the order that the ports are probed in, their names are whatever they
end up being once reversed.

> udev is the expected kernel way to solve this. Trying to hack stable
> names by forcing device bind order is horrible.

This is a manufacturing defect.  Something I'm sure Intel wants to
resolve without requiring users to go in and manually name their ports.
 I have no doubt that they would prefer that the user remain blissfully
unaware of the issue, all except for the ones that probably reported it
and already have their system cabled up wrong as a result.

> hfi runs smack into this because it is the first scheme to actually
> typically operate in a multi-struct ib_device world, which means they
> get to solve it properly :\

No, mlx5 could have easily hit this too as their ports are separate PCI
functions.  As I said, it's a manufacturing defect and that means I'm
sure Intel would prefer to solve it silently and automatically.  I
would if I were them anyway.
Doug Ledford Jan. 19, 2017, 12:17 a.m. UTC | #9
On Wed, 2017-01-18 at 14:03 -0800, Tadeusz Struk wrote:
> On 01/18/2017 01:08 PM, Jason Gunthorpe wrote:
> > 
> > EPROBE_DEFER is intended when waiting on other components the
> > driver
> > may need, not for setting stable names.
> 
> Also EPROBE_DEFER only works during boot time.
> After boot when a module gets rmmod/insmod EPROBE_DEFER has no
> effect.

Are you positive about this?  And even if you are, my answer is to fix
the core to honor EPROBE_DEFER post boot.
Tadeusz Struk Jan. 19, 2017, 4:51 p.m. UTC | #10
On 01/18/2017 04:17 PM, Doug Ledford wrote:
>>> EPROBE_DEFER is intended when waiting on other components the
>>> driver
>>> may need, not for setting stable names.
>> Also EPROBE_DEFER only works during boot time.
>> After boot when a module gets rmmod/insmod EPROBE_DEFER has no
>> effect.
> Are you positive about this?  And even if you are, my answer is to fix
> the core to honor EPROBE_DEFER post boot.

Yes, the driver_deferred_probe_trigger() function is called from
late_initcall(). After boot there is nothing to call it again.
I'm investigating the udev path now as Jason suggested, but
udev has its own limitations.
I will see what it would take to extend the EPROBE_DEFER
functionality after boot.
Thanks,
Jason Gunthorpe Jan. 19, 2017, 5:58 p.m. UTC | #11
On Wed, Jan 18, 2017 at 07:16:29PM -0500, Doug Ledford wrote:

> > This is a 'stable device naming' problem, which we have never
> > tried to solve in RDMA.
> 
> No, that would imply they must be hfi1_0 and hfi1_1, when this is not
> the case.  If you had two cards in this system, both dual port, then
> you may be wanting to rename hfi_3 to hfi1_2 and vice versa.  It is a
> relative name problem, not a stable name problem.  We need only reverse
> the order that the ports are probed in, their names are whatever they
> end up being once reversed.

Eh? If *users* expect RDMA names to be meaningful/stable then it *IS*
the stable naming problem. We have never guarenteed stable device
names in RDMA, but it does happen to work out by luck in many cases.

Linux does have a guarentee of PCI driver bind order. For instance
the parallel probe patch series randomizes driver bind order, so any
driver relying on this for 'stable names' is broken.

> > udev is the expected kernel way to solve this. Trying to hack stable
> > names by forcing device bind order is horrible.
> 
> This is a manufacturing defect.  Something I'm sure Intel wants to
> resolve without requiring users to go in and manually name their ports.
>  I have no doubt that they would prefer that the user remain blissfully
> unaware of the issue, all except for the ones that probably reported it
> and already have their system cabled up wrong as a result.

Modern udev models do not require manual naming by users, look at what
netdev is doing to solve this problem these days.

hif_slot#_port# can be generated automatically by udev based on
information from the driver and the BIOS. This is what is being done
for netdev.

That is where we really need to go as well.

As you say, this is a oops on Intels part, so that may be too long
term - so they should solve this temporarily and imperfectly *in their
driver* by assinging RDMA device names manually, eg make it so that
hfiX has X be even for port 0 and X be odd for port 1.

Never any need for any kind of defered binding approach.

> No, mlx5 could have easily hit this too as their ports are separate
> PCI functions.

Sound like intel and mellanox should collaborate on getting udev
stable naming working right for RDMA... They are eventually going to
get burned.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Jan. 22, 2017, 8:16 a.m. UTC | #12
On Thu, Jan 19, 2017 at 10:58:02AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 18, 2017 at 07:16:29PM -0500, Doug Ledford wrote:
>
>
> > No, mlx5 could have easily hit this too as their ports are separate
> > PCI functions.
>
> Sound like intel and mellanox should collaborate on getting udev
> stable naming working right for RDMA... They are eventually going to
> get burned.

Send patches and we will be happy to review and test them.

Thanks

>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index ef72bc2..d9aadac 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -129,9 +129,6 @@  struct flag_table {
 #define NUM_MAP_ENTRIES		256
 #define NUM_MAP_REGS             32
 
-/* Bit offset into the GUID which carries HFI id information */
-#define GUID_HFI_INDEX_SHIFT     39
-
 /* extract the emulation revision */
 #define emulator_rev(dd) ((dd)->irev >> 8)
 /* parallel and serial emulation versions are 3 and 4 respectively */
diff --git a/drivers/infiniband/hw/hfi1/chip.h b/drivers/infiniband/hw/hfi1/chip.h
index 043fd21..aa322b0 100644
--- a/drivers/infiniband/hw/hfi1/chip.h
+++ b/drivers/infiniband/hw/hfi1/chip.h
@@ -575,6 +575,10 @@  enum {
 #define LOOPBACK_LCB	2
 #define LOOPBACK_CABLE	3	/* external cable */
 
+/* Bit offset into the GUID which carries HFI id information */
+#define GUID_HFI_INDEX_SHIFT     39
+#define HFI_ASIC_GUID(guid) ((guid) & ~(1ULL << GUID_HFI_INDEX_SHIFT))
+
 /* read and write hardware registers */
 u64 read_csr(const struct hfi1_devdata *dd, u32 offset);
 void write_csr(const struct hfi1_devdata *dd, u32 offset, u64 value);
diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index e3b5bc9..93436bd 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -116,12 +116,29 @@  unsigned int user_credit_return_threshold = 33;	/* default is 33% */
 module_param(user_credit_return_threshold, uint, S_IRUGO);
 MODULE_PARM_DESC(user_credit_return_threshold, "Credit return threshold for user send contexts, return when unreturned credits passes this many blocks (in percent of allocated blocks, 0 is off)");
 
+static bool port_reorder;
+module_param(port_reorder, bool, S_IRUGO);
+MODULE_PARM_DESC(port_reorder, "Device port reorder: 1 - order HFIs on the same ASIC in increasing port order, or 0 - order exactly as the kernel enumerates (default)");
+
 static inline u64 encode_rcv_header_entry_size(u16);
 
 static struct idr hfi1_unit_table;
 u32 hfi1_cpulist_count;
 unsigned long *hfi1_cpulist;
 
+struct hfi1_init_dev {
+	struct list_head list;
+	struct pci_dev *pdev;
+	const struct pci_device_id *ent;
+	u64 guid;
+};
+
+static struct timer_list init_timer;
+static LIST_HEAD(init_list);
+static DEFINE_MUTEX(init_mutex); /* protects init_list */
+static struct workqueue_struct *hfi1_init_wq;
+static void hfi1_deferred_init(struct work_struct *work);
+
 /*
  * Common code for creating the receive context array.
  */
@@ -1165,6 +1182,7 @@  void hfi1_disable_after_error(struct hfi1_devdata *dd)
 
 static void remove_one(struct pci_dev *);
 static int init_one(struct pci_dev *, const struct pci_device_id *);
+static void hfi1_probe_done(unsigned long data);
 
 #define DRIVER_LOAD_MSG "Intel " DRIVER_NAME " loaded: "
 #define PFX DRIVER_NAME ": "
@@ -1209,6 +1227,16 @@  static int __init hfi1_mod_init(void)
 	if (ret)
 		goto bail;
 
+	if (port_reorder) {
+		hfi1_init_wq = alloc_workqueue("hfi1_init_wq", WQ_MEM_RECLAIM,
+					       0);
+		if (!hfi1_init_wq) {
+			pr_err("Failed to create deffered dev init wq\n");
+			ret = -ENOMEM;
+			goto bail;
+		}
+	}
+
 	/* validate max MTU before any devices start */
 	if (!valid_opa_max_mtu(hfi1_max_mtu)) {
 		pr_err("Invalid max_mtu 0x%x, using 0x%x instead\n",
@@ -1264,6 +1292,9 @@  static int __init hfi1_mod_init(void)
 	ret = hfi1_wss_init();
 	if (ret < 0)
 		goto bail_wss;
+
+	setup_timer(&init_timer, hfi1_probe_done, 0);
+
 	ret = pci_register_driver(&hfi1_pci_driver);
 	if (ret < 0) {
 		pr_err("Unable to register driver: error %d\n", -ret);
@@ -1288,6 +1319,8 @@  module_init(hfi1_mod_init);
  */
 static void __exit hfi1_mod_cleanup(void)
 {
+	struct hfi1_init_dev *pos, *tmp;
+
 	pci_unregister_driver(&hfi1_pci_driver);
 	node_affinity_destroy();
 	hfi1_wss_exit();
@@ -1298,6 +1331,18 @@  static void __exit hfi1_mod_cleanup(void)
 	idr_destroy(&hfi1_unit_table);
 	dispose_firmware();	/* asymmetric with obtain_firmware() */
 	dev_cleanup();
+
+	mutex_lock(&init_mutex);
+	list_for_each_entry_safe(pos, tmp, &init_list, list) {
+		list_del(&pos->list);
+		kfree(pos);
+	}
+	mutex_unlock(&init_mutex);
+	del_timer(&init_timer);
+	if (hfi1_init_wq) {
+		destroy_workqueue(hfi1_init_wq);
+		hfi1_init_wq = NULL;
+	}
 }
 
 module_exit(hfi1_mod_cleanup);
@@ -1415,7 +1460,7 @@  static int init_validate_rcvhdrcnt(struct device *dev, uint thecnt)
 	return 0;
 }
 
-static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
+static int do_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	int ret = 0, j, pidx, initfail;
 	struct hfi1_devdata *dd;
@@ -1545,6 +1590,129 @@  static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return ret;
 }
 
+static void hfi1_deferred_init(struct work_struct *work)
+{
+	struct hfi1_init_dev *pos, *tmp;
+	int ret;
+
+	mutex_lock(&init_mutex);
+	list_for_each_entry_safe(pos, tmp, &init_list, list) {
+		ret = do_init_one(pos->pdev, pos->ent);
+		if (ret)
+			hfi1_early_err(&pos->pdev->dev, "%s: dev init failed, err %d\n",
+				       __func__, ret);
+
+		list_del(&pos->list);
+		kfree(pos);
+	}
+	mutex_unlock(&init_mutex);
+	kfree(work);
+}
+
+static void hfi1_probe_done(unsigned long data)
+{
+	struct work_struct *work;
+
+	if (!hfi1_init_wq)
+		return;
+
+	/* Got probe for all devices. Now invoke hfi1_deferred_init */
+	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	if (!work)
+		return;
+
+	INIT_WORK(work, hfi1_deferred_init);
+	queue_work(hfi1_init_wq, work);
+}
+
+static int get_guid(struct pci_dev *pdev, u64 *guid)
+{
+	void __iomem *bar;
+
+	bar = ioremap_nocache(pci_resource_start(pdev, 0),
+			      DC_DC8051_CFG_LOCAL_GUID + 8);
+	if (!bar) {
+		hfi1_early_err(&pdev->dev, "%s: unable to ioremap bar\n",
+			       __func__);
+		return -ENOMEM;
+	}
+	/* make sure the DC is not in reset so the GUID is readable */
+	writeq(0ull, bar + CCE_DC_CTRL);
+	(void)readq(bar + CCE_DC_CTRL); /* force the write */
+
+	/* read device guid */
+	*guid = readq(bar + DC_DC8051_CFG_LOCAL_GUID);
+	iounmap(bar);
+	return 0;
+}
+
+static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct hfi1_init_dev *probed_pdev;
+	struct hfi1_init_dev *pos;
+	u64 guid;
+	int ret;
+
+	if (!port_reorder)
+		return do_init_one(pdev, ent);
+
+	probed_pdev = kzalloc(sizeof(*probed_pdev), GFP_KERNEL);
+	if (!probed_pdev)
+		return -ENOMEM;
+
+	ret = pci_enable_device(pdev);
+	if (ret) {
+		hfi1_early_err(&pdev->dev, "%s: cannot enable device, err %d\n",
+			       __func__, ret);
+		goto fail;
+	}
+
+	ret = get_guid(pdev, &guid);
+	if (ret)
+		goto fail_get_guid;
+
+	probed_pdev->pdev = pdev;
+	probed_pdev->ent = ent;
+	probed_pdev->guid = guid;
+
+	mutex_lock(&init_mutex);
+	if (list_empty(&init_list)) {
+		list_add_tail(&probed_pdev->list, &init_list);
+	} else {
+		list_for_each_entry(pos, &init_list, list)
+			if (HFI_ASIC_GUID(pos->guid) == HFI_ASIC_GUID(guid))
+				break;
+
+		/* Match found, pos and entry are valid */
+		if (pos->guid == probed_pdev->guid) {
+			/* Something wrong - same hardware id */
+			hfi1_early_err(&pdev->dev, "%s: duplicate guid 0x%llx\n",
+				       __func__, guid);
+			mutex_unlock(&init_mutex);
+			goto fail_get_guid;
+		}
+
+		/*
+		 * Order around the first of the pair that was enumerated.
+		 * Insert before or after the one already in the list.
+		 */
+		if ((pos->guid >> GUID_HFI_INDEX_SHIFT) & 1)
+			list_add_tail(&probed_pdev->list, &pos->list);
+		else
+			list_add(&probed_pdev->list, &pos->list);
+	}
+	mutex_unlock(&init_mutex);
+	mod_timer(&init_timer, jiffies + msecs_to_jiffies(1000));
+	pci_disable_device(pdev);
+	return 0;
+
+fail_get_guid:
+	pci_disable_device(pdev);
+fail:
+	kfree(probed_pdev);
+	return ret;
+}
+
 static void wait_for_clients(struct hfi1_devdata *dd)
 {
 	/*