diff mbox series

[v5,03/14] PCI: cadence: Convert all r/w accessors to perform only 32-bit accesses

Message ID 20200522033631.32574-4-kishon@ti.com (mailing list archive)
State Superseded, archived
Delegated to: Lorenzo Pieralisi
Headers show
Series Add PCIe support to TI's J721E SoC | expand

Commit Message

Kishon Vijay Abraham I May 22, 2020, 3:36 a.m. UTC
Certain platforms like TI's J721E using Cadence PCIe IP can perform only
32-bit accesses for reading or writing to Cadence registers. Convert all
read and write accesses to 32-bit in Cadence PCIe driver in preparation
for adding PCIe support in TI's J721E SoC.

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
---
 drivers/pci/controller/cadence/pcie-cadence.h | 71 ++++++++++++++-----
 1 file changed, 53 insertions(+), 18 deletions(-)

Comments

Rob Herring May 22, 2020, 3:54 p.m. UTC | #1
On Thu, May 21, 2020 at 9:37 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>
> Certain platforms like TI's J721E using Cadence PCIe IP can perform only
> 32-bit accesses for reading or writing to Cadence registers. Convert all
> read and write accesses to 32-bit in Cadence PCIe driver in preparation
> for adding PCIe support in TI's J721E SoC.

Looking more closely I don't think cdns_pcie_ep_assert_intx is okay
with this and never can be given the PCI_COMMAND and PCI_STATUS
registers are in the same word (IIRC, that's the main reason 32-bit
config space accesses are broken). So this isn't going to work at
least for EP accesses. And maybe you need a custom .raise_irq() hook
to minimize any problems (such as making the RMW atomic at least from
the endpoint's perspective).

Rob
Kishon Vijay Abraham I May 25, 2020, 3:30 a.m. UTC | #2
Hi Rob,

On 5/22/2020 9:24 PM, Rob Herring wrote:
> On Thu, May 21, 2020 at 9:37 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>
>> Certain platforms like TI's J721E using Cadence PCIe IP can perform only
>> 32-bit accesses for reading or writing to Cadence registers. Convert all
>> read and write accesses to 32-bit in Cadence PCIe driver in preparation
>> for adding PCIe support in TI's J721E SoC.
> 
> Looking more closely I don't think cdns_pcie_ep_assert_intx is okay
> with this and never can be given the PCI_COMMAND and PCI_STATUS
> registers are in the same word (IIRC, that's the main reason 32-bit
> config space accesses are broken). So this isn't going to work at

right, PCI_STATUS has write '1' to clear bits and there's a chance that it
could be reset while raising legacy interrupt. While this cannot be avoided for
TI's J721E, other platforms doesn't have to have this limitation.
> least for EP accesses. And maybe you need a custom .raise_irq() hook
> to minimize any problems (such as making the RMW atomic at least from
> the endpoint's perspective).

This is to make sure EP doesn't update in-consistent state when RC is updating
the PCI_STATUS register? Since this involves two different systems, how do we
make this atomic?

Thanks
Kishon
Rob Herring May 26, 2020, 3:12 p.m. UTC | #3
On Sun, May 24, 2020 at 9:30 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>
> Hi Rob,
>
> On 5/22/2020 9:24 PM, Rob Herring wrote:
> > On Thu, May 21, 2020 at 9:37 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
> >>
> >> Certain platforms like TI's J721E using Cadence PCIe IP can perform only
> >> 32-bit accesses for reading or writing to Cadence registers. Convert all
> >> read and write accesses to 32-bit in Cadence PCIe driver in preparation
> >> for adding PCIe support in TI's J721E SoC.
> >
> > Looking more closely I don't think cdns_pcie_ep_assert_intx is okay
> > with this and never can be given the PCI_COMMAND and PCI_STATUS
> > registers are in the same word (IIRC, that's the main reason 32-bit
> > config space accesses are broken). So this isn't going to work at
>
> right, PCI_STATUS has write '1' to clear bits and there's a chance that it
> could be reset while raising legacy interrupt. While this cannot be avoided for
> TI's J721E, other platforms doesn't have to have this limitation.
> > least for EP accesses. And maybe you need a custom .raise_irq() hook
> > to minimize any problems (such as making the RMW atomic at least from
> > the endpoint's perspective).
>
> This is to make sure EP doesn't update in-consistent state when RC is updating
> the PCI_STATUS register? Since this involves two different systems, how do we
> make this atomic?

You can't make it atomic WRT both systems, but is there locking around
each RMW? Specifically, are preemption and interrupts disabled to
ensure time between a read and write are minimized? You wouldn't want
interrupts disabled during the delay too though (i.e. around
.raise_irq()).

BTW, I've asked this question before, but aren't PCI legacy interrupts
level triggered? If so, isn't generating a pulse wrong?

Rob
Kishon Vijay Abraham I May 27, 2020, 10:49 a.m. UTC | #4
Hi Rob,

On 5/26/2020 8:42 PM, Rob Herring wrote:
> On Sun, May 24, 2020 at 9:30 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>
>> Hi Rob,
>>
>> On 5/22/2020 9:24 PM, Rob Herring wrote:
>>> On Thu, May 21, 2020 at 9:37 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>>
>>>> Certain platforms like TI's J721E using Cadence PCIe IP can perform only
>>>> 32-bit accesses for reading or writing to Cadence registers. Convert all
>>>> read and write accesses to 32-bit in Cadence PCIe driver in preparation
>>>> for adding PCIe support in TI's J721E SoC.
>>>
>>> Looking more closely I don't think cdns_pcie_ep_assert_intx is okay
>>> with this and never can be given the PCI_COMMAND and PCI_STATUS
>>> registers are in the same word (IIRC, that's the main reason 32-bit
>>> config space accesses are broken). So this isn't going to work at
>>
>> right, PCI_STATUS has write '1' to clear bits and there's a chance that it
>> could be reset while raising legacy interrupt. While this cannot be avoided for
>> TI's J721E, other platforms doesn't have to have this limitation.
>>> least for EP accesses. And maybe you need a custom .raise_irq() hook
>>> to minimize any problems (such as making the RMW atomic at least from
>>> the endpoint's perspective).
>>
>> This is to make sure EP doesn't update in-consistent state when RC is updating
>> the PCI_STATUS register? Since this involves two different systems, how do we
>> make this atomic?
> 
> You can't make it atomic WRT both systems, but is there locking around
> each RMW? Specifically, are preemption and interrupts disabled to
> ensure time between a read and write are minimized? You wouldn't want
> interrupts disabled during the delay too though (i.e. around
> .raise_irq()).

Okay, I'll add spin spin_lock_irqsave() in cdns_pcie_write_sz(). As you also
pointed below that delay for legacy interrupt is wrong and it has to be fixed
(with a later series).

How do you want to handle cdns_pcie_ep_fn_writew() now? Because now we are
changing the default implementation to perform only 32-bit access (used for
legacy interrupt, msi-x interrupt and while writing standard headers) and it's
not okay only for legacy interrupts for platforms other than TI.

So just for legacy interrupt, you want me to add a different accessor which
does not perform 32-bit writes (while we add a different .raise_irq for TI
platform?
> 
> BTW, I've asked this question before, but aren't PCI legacy interrupts
> level triggered? If so, isn't generating a pulse wrong?

You are right. This is wrong and it has to be fixed. I'll work on this later.

Thanks
Kishon
Rob Herring May 27, 2020, 4:37 p.m. UTC | #5
On Wed, May 27, 2020 at 4:49 AM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>
> Hi Rob,
>
> On 5/26/2020 8:42 PM, Rob Herring wrote:
> > On Sun, May 24, 2020 at 9:30 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
> >>
> >> Hi Rob,
> >>
> >> On 5/22/2020 9:24 PM, Rob Herring wrote:
> >>> On Thu, May 21, 2020 at 9:37 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
> >>>>
> >>>> Certain platforms like TI's J721E using Cadence PCIe IP can perform only
> >>>> 32-bit accesses for reading or writing to Cadence registers. Convert all
> >>>> read and write accesses to 32-bit in Cadence PCIe driver in preparation
> >>>> for adding PCIe support in TI's J721E SoC.
> >>>
> >>> Looking more closely I don't think cdns_pcie_ep_assert_intx is okay
> >>> with this and never can be given the PCI_COMMAND and PCI_STATUS
> >>> registers are in the same word (IIRC, that's the main reason 32-bit
> >>> config space accesses are broken). So this isn't going to work at
> >>
> >> right, PCI_STATUS has write '1' to clear bits and there's a chance that it
> >> could be reset while raising legacy interrupt. While this cannot be avoided for
> >> TI's J721E, other platforms doesn't have to have this limitation.
> >>> least for EP accesses. And maybe you need a custom .raise_irq() hook
> >>> to minimize any problems (such as making the RMW atomic at least from
> >>> the endpoint's perspective).
> >>
> >> This is to make sure EP doesn't update in-consistent state when RC is updating
> >> the PCI_STATUS register? Since this involves two different systems, how do we
> >> make this atomic?
> >
> > You can't make it atomic WRT both systems, but is there locking around
> > each RMW? Specifically, are preemption and interrupts disabled to
> > ensure time between a read and write are minimized? You wouldn't want
> > interrupts disabled during the delay too though (i.e. around
> > .raise_irq()).
>
> Okay, I'll add spin spin_lock_irqsave() in cdns_pcie_write_sz(). As you also
> pointed below that delay for legacy interrupt is wrong and it has to be fixed
> (with a later series).

But you don't need a lock everywhere. You need locks in the callers
(and only sometimes).

> How do you want to handle cdns_pcie_ep_fn_writew() now? Because now we are
> changing the default implementation to perform only 32-bit access (used for
> legacy interrupt, msi-x interrupt and while writing standard headers) and it's
> not okay only for legacy interrupts for platforms other than TI.

Now I'm wondering how set_msi is not racy in the current code with the
host setting/clearing PCI_MSI_FLAGS_ENABLE? Maybe that bit is RO from
the EP side?

Ultimately I think you're going to have to provide your own endpoint
functions or you need accessors for specific registers like
PCI_MSI_FLAGS. Then for example, you just rely on the 2 bytes before
PCI_MSI_FLAGS being reserved and do a 32-bit access without a RMW.
Trying to abstract this at the register read/write level is going to
be fragile.

Rob
Kishon Vijay Abraham I May 27, 2020, 10:06 p.m. UTC | #6
Hi Rob,

On 5/27/2020 10:07 PM, Rob Herring wrote:
> On Wed, May 27, 2020 at 4:49 AM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>
>> Hi Rob,
>>
>> On 5/26/2020 8:42 PM, Rob Herring wrote:
>>> On Sun, May 24, 2020 at 9:30 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>>
>>>> Hi Rob,
>>>>
>>>> On 5/22/2020 9:24 PM, Rob Herring wrote:
>>>>> On Thu, May 21, 2020 at 9:37 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>>>>
>>>>>> Certain platforms like TI's J721E using Cadence PCIe IP can perform only
>>>>>> 32-bit accesses for reading or writing to Cadence registers. Convert all
>>>>>> read and write accesses to 32-bit in Cadence PCIe driver in preparation
>>>>>> for adding PCIe support in TI's J721E SoC.
>>>>>
>>>>> Looking more closely I don't think cdns_pcie_ep_assert_intx is okay
>>>>> with this and never can be given the PCI_COMMAND and PCI_STATUS
>>>>> registers are in the same word (IIRC, that's the main reason 32-bit
>>>>> config space accesses are broken). So this isn't going to work at
>>>>
>>>> right, PCI_STATUS has write '1' to clear bits and there's a chance that it
>>>> could be reset while raising legacy interrupt. While this cannot be avoided for
>>>> TI's J721E, other platforms doesn't have to have this limitation.
>>>>> least for EP accesses. And maybe you need a custom .raise_irq() hook
>>>>> to minimize any problems (such as making the RMW atomic at least from
>>>>> the endpoint's perspective).
>>>>
>>>> This is to make sure EP doesn't update in-consistent state when RC is updating
>>>> the PCI_STATUS register? Since this involves two different systems, how do we
>>>> make this atomic?
>>>
>>> You can't make it atomic WRT both systems, but is there locking around
>>> each RMW? Specifically, are preemption and interrupts disabled to
>>> ensure time between a read and write are minimized? You wouldn't want
>>> interrupts disabled during the delay too though (i.e. around
>>> .raise_irq()).
>>
>> Okay, I'll add spin spin_lock_irqsave() in cdns_pcie_write_sz(). As you also
>> pointed below that delay for legacy interrupt is wrong and it has to be fixed
>> (with a later series).
> 
> But you don't need a lock everywhere. You need locks in the callers
> (and only sometimes).

Okay, the locks should be added only for registers where HOST can also write to
the same register? Maybe only raise_irq then..

> 
>> How do you want to handle cdns_pcie_ep_fn_writew() now? Because now we are
>> changing the default implementation to perform only 32-bit access (used for
>> legacy interrupt, msi-x interrupt and while writing standard headers) and it's
>> not okay only for legacy interrupts for platforms other than TI.
> 
> Now I'm wondering how set_msi is not racy in the current code with the
> host setting/clearing PCI_MSI_FLAGS_ENABLE? Maybe that bit is RO from
> the EP side?

set_msi/set_msix is a one time configuration that is invoked before the host
establishes the link with the endpoint. I don't think we have to consider this
as racy.

Thanks
Kishon

> 
> Ultimately I think you're going to have to provide your own endpoint
> functions or you need accessors for specific registers like
> PCI_MSI_FLAGS. Then for example, you just rely on the 2 bytes before
> PCI_MSI_FLAGS being reserved and do a 32-bit access without a RMW.
> Trying to abstract this at the register read/write level is going to
> be fragile
> 
> Rob
>
Kishon Vijay Abraham I June 1, 2020, 1:16 a.m. UTC | #7
Hi Rob,

On 5/28/2020 3:36 AM, Kishon Vijay Abraham I wrote:
> Hi Rob,
> 
> On 5/27/2020 10:07 PM, Rob Herring wrote:
>> On Wed, May 27, 2020 at 4:49 AM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>
>>> Hi Rob,
>>>
>>> On 5/26/2020 8:42 PM, Rob Herring wrote:
>>>> On Sun, May 24, 2020 at 9:30 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>>>
>>>>> Hi Rob,
>>>>>
>>>>> On 5/22/2020 9:24 PM, Rob Herring wrote:
>>>>>> On Thu, May 21, 2020 at 9:37 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>>>>>
>>>>>>> Certain platforms like TI's J721E using Cadence PCIe IP can perform only
>>>>>>> 32-bit accesses for reading or writing to Cadence registers. Convert all
>>>>>>> read and write accesses to 32-bit in Cadence PCIe driver in preparation
>>>>>>> for adding PCIe support in TI's J721E SoC.
>>>>>>
>>>>>> Looking more closely I don't think cdns_pcie_ep_assert_intx is okay
>>>>>> with this and never can be given the PCI_COMMAND and PCI_STATUS
>>>>>> registers are in the same word (IIRC, that's the main reason 32-bit
>>>>>> config space accesses are broken). So this isn't going to work at
>>>>>
>>>>> right, PCI_STATUS has write '1' to clear bits and there's a chance that it
>>>>> could be reset while raising legacy interrupt. While this cannot be avoided for
>>>>> TI's J721E, other platforms doesn't have to have this limitation.
>>>>>> least for EP accesses. And maybe you need a custom .raise_irq() hook
>>>>>> to minimize any problems (such as making the RMW atomic at least from
>>>>>> the endpoint's perspective).
>>>>>
>>>>> This is to make sure EP doesn't update in-consistent state when RC is updating
>>>>> the PCI_STATUS register? Since this involves two different systems, how do we
>>>>> make this atomic?
>>>>
>>>> You can't make it atomic WRT both systems, but is there locking around
>>>> each RMW? Specifically, are preemption and interrupts disabled to
>>>> ensure time between a read and write are minimized? You wouldn't want
>>>> interrupts disabled during the delay too though (i.e. around
>>>> .raise_irq()).
>>>
>>> Okay, I'll add spin spin_lock_irqsave() in cdns_pcie_write_sz(). As you also
>>> pointed below that delay for legacy interrupt is wrong and it has to be fixed
>>> (with a later series).
>>
>> But you don't need a lock everywhere. You need locks in the callers
>> (and only sometimes).
> 
> Okay, the locks should be added only for registers where HOST can also write to
> the same register? Maybe only raise_irq then..
> 
>>
>>> How do you want to handle cdns_pcie_ep_fn_writew() now? Because now we are
>>> changing the default implementation to perform only 32-bit access (used for
>>> legacy interrupt, msi-x interrupt and while writing standard headers) and it's
>>> not okay only for legacy interrupts for platforms other than TI.
>>
>> Now I'm wondering how set_msi is not racy in the current code with the
>> host setting/clearing PCI_MSI_FLAGS_ENABLE? Maybe that bit is RO from
>> the EP side?
> 
> set_msi/set_msix is a one time configuration that is invoked before the host
> establishes the link with the endpoint. I don't think we have to consider this
> as racy.

Can we try to close on this discussion please?

Thanks
Kishon

> 
> Thanks
> Kishon
> 
>>
>> Ultimately I think you're going to have to provide your own endpoint
>> functions or you need accessors for specific registers like
>> PCI_MSI_FLAGS. Then for example, you just rely on the 2 bytes before
>> PCI_MSI_FLAGS being reserved and do a 32-bit access without a RMW.
>> Trying to abstract this at the register read/write level is going to
>> be fragile
>>
>> Rob
>>
Kishon Vijay Abraham I June 8, 2020, 3:58 p.m. UTC | #8
Hi Rob,

On 6/1/2020 6:46 AM, Kishon Vijay Abraham I wrote:
> Hi Rob,
> 
> On 5/28/2020 3:36 AM, Kishon Vijay Abraham I wrote:
>> Hi Rob,
>>
>> On 5/27/2020 10:07 PM, Rob Herring wrote:
>>> On Wed, May 27, 2020 at 4:49 AM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>>
>>>> Hi Rob,
>>>>
>>>> On 5/26/2020 8:42 PM, Rob Herring wrote:
>>>>> On Sun, May 24, 2020 at 9:30 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>>>>
>>>>>> Hi Rob,
>>>>>>
>>>>>> On 5/22/2020 9:24 PM, Rob Herring wrote:
>>>>>>> On Thu, May 21, 2020 at 9:37 PM Kishon Vijay Abraham I <kishon@ti.com> wrote:
>>>>>>>>
>>>>>>>> Certain platforms like TI's J721E using Cadence PCIe IP can perform only
>>>>>>>> 32-bit accesses for reading or writing to Cadence registers. Convert all
>>>>>>>> read and write accesses to 32-bit in Cadence PCIe driver in preparation
>>>>>>>> for adding PCIe support in TI's J721E SoC.
>>>>>>>
>>>>>>> Looking more closely I don't think cdns_pcie_ep_assert_intx is okay
>>>>>>> with this and never can be given the PCI_COMMAND and PCI_STATUS
>>>>>>> registers are in the same word (IIRC, that's the main reason 32-bit
>>>>>>> config space accesses are broken). So this isn't going to work at
>>>>>>
>>>>>> right, PCI_STATUS has write '1' to clear bits and there's a chance that it
>>>>>> could be reset while raising legacy interrupt. While this cannot be avoided for
>>>>>> TI's J721E, other platforms doesn't have to have this limitation.
>>>>>>> least for EP accesses. And maybe you need a custom .raise_irq() hook
>>>>>>> to minimize any problems (such as making the RMW atomic at least from
>>>>>>> the endpoint's perspective).
>>>>>>
>>>>>> This is to make sure EP doesn't update in-consistent state when RC is updating
>>>>>> the PCI_STATUS register? Since this involves two different systems, how do we
>>>>>> make this atomic?
>>>>>
>>>>> You can't make it atomic WRT both systems, but is there locking around
>>>>> each RMW? Specifically, are preemption and interrupts disabled to
>>>>> ensure time between a read and write are minimized? You wouldn't want
>>>>> interrupts disabled during the delay too though (i.e. around
>>>>> .raise_irq()).
>>>>
>>>> Okay, I'll add spin spin_lock_irqsave() in cdns_pcie_write_sz(). As you also
>>>> pointed below that delay for legacy interrupt is wrong and it has to be fixed
>>>> (with a later series).
>>>
>>> But you don't need a lock everywhere. You need locks in the callers
>>> (and only sometimes).
>>
>> Okay, the locks should be added only for registers where HOST can also write to
>> the same register? Maybe only raise_irq then..
>>
>>>
>>>> How do you want to handle cdns_pcie_ep_fn_writew() now? Because now we are
>>>> changing the default implementation to perform only 32-bit access (used for
>>>> legacy interrupt, msi-x interrupt and while writing standard headers) and it's
>>>> not okay only for legacy interrupts for platforms other than TI.
>>>
>>> Now I'm wondering how set_msi is not racy in the current code with the
>>> host setting/clearing PCI_MSI_FLAGS_ENABLE? Maybe that bit is RO from
>>> the EP side?
>>
>> set_msi/set_msix is a one time configuration that is invoked before the host
>> establishes the link with the endpoint. I don't think we have to consider this
>> as racy.
> 
> Can we try to close on this discussion please?

Should we just try to handle .raise_irq() separately for TI platform and all
the other accesses remain as 32-bit access?

Thanks
Kishon

> 
> Thanks
> Kishon
> 
>>
>> Thanks
>> Kishon
>>
>>>
>>> Ultimately I think you're going to have to provide your own endpoint
>>> functions or you need accessors for specific registers like
>>> PCI_MSI_FLAGS. Then for example, you just rely on the 2 bytes before
>>> PCI_MSI_FLAGS being reserved and do a 32-bit access without a RMW.
>>> Trying to abstract this at the register read/write level is going to
>>> be fragile
>>>
>>> Rob
>>>
diff mbox series

Patch

diff --git a/drivers/pci/controller/cadence/pcie-cadence.h b/drivers/pci/controller/cadence/pcie-cadence.h
index bc49c22e48a9..737e9561092b 100644
--- a/drivers/pci/controller/cadence/pcie-cadence.h
+++ b/drivers/pci/controller/cadence/pcie-cadence.h
@@ -319,50 +319,88 @@  struct cdns_pcie_ep {
 
 
 /* Register access */
-static inline void cdns_pcie_writeb(struct cdns_pcie *pcie, u32 reg, u8 value)
+static inline void cdns_pcie_writel(struct cdns_pcie *pcie, u32 reg, u32 value)
 {
-	writeb(value, pcie->reg_base + reg);
+	writel(value, pcie->reg_base + reg);
 }
 
-static inline void cdns_pcie_writew(struct cdns_pcie *pcie, u32 reg, u16 value)
+static inline u32 cdns_pcie_readl(struct cdns_pcie *pcie, u32 reg)
 {
-	writew(value, pcie->reg_base + reg);
+	return readl(pcie->reg_base + reg);
 }
 
-static inline void cdns_pcie_writel(struct cdns_pcie *pcie, u32 reg, u32 value)
+static inline u32 cdns_pcie_read_sz(void __iomem *addr, int size)
 {
-	writel(value, pcie->reg_base + reg);
+	void __iomem *aligned_addr = PTR_ALIGN_DOWN(addr, 0x4);
+	unsigned int offset = (unsigned long)addr & 0x3;
+	u32 val = readl(aligned_addr);
+
+	if (!IS_ALIGNED((uintptr_t)addr, size)) {
+		WARN(1, "Address %p and size %d are not aligned\n", addr, size);
+		return 0;
+	}
+
+	if (size > 2)
+		return val;
+
+	return (val >> (8 * offset)) & ((1 << (size * 8)) - 1);
 }
 
-static inline u32 cdns_pcie_readl(struct cdns_pcie *pcie, u32 reg)
+static inline void cdns_pcie_write_sz(void __iomem *addr, int size, u32 value)
 {
-	return readl(pcie->reg_base + reg);
+	void __iomem *aligned_addr = PTR_ALIGN_DOWN(addr, 0x4);
+	unsigned int offset = (unsigned long)addr & 0x3;
+	u32 mask;
+	u32 val;
+
+	if (!IS_ALIGNED((uintptr_t)addr, size)) {
+		WARN(1, "Address %p and size %d are not aligned\n", addr, size);
+		return;
+	}
+
+	if (size > 2) {
+		writel(value, addr);
+		return;
+	}
+
+	mask = ~(((1 << (size * 8)) - 1) << (offset * 8));
+	val = readl(aligned_addr) & mask;
+	val |= value << (offset * 8);
+	writel(val, aligned_addr);
 }
 
 /* Root Port register access */
 static inline void cdns_pcie_rp_writeb(struct cdns_pcie *pcie,
 				       u32 reg, u8 value)
 {
-	writeb(value, pcie->reg_base + CDNS_PCIE_RP_BASE + reg);
+	void __iomem *addr = pcie->reg_base + CDNS_PCIE_RP_BASE + reg;
+
+	cdns_pcie_write_sz(addr, 0x1, value);
 }
 
 static inline void cdns_pcie_rp_writew(struct cdns_pcie *pcie,
 				       u32 reg, u16 value)
 {
-	writew(value, pcie->reg_base + CDNS_PCIE_RP_BASE + reg);
+	void __iomem *addr = pcie->reg_base + CDNS_PCIE_RP_BASE + reg;
+
+	cdns_pcie_write_sz(addr, 0x2, value);
 }
 
 /* Endpoint Function register access */
 static inline void cdns_pcie_ep_fn_writeb(struct cdns_pcie *pcie, u8 fn,
 					  u32 reg, u8 value)
 {
-	writeb(value, pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
+	void __iomem *addr = pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg;
+
+	cdns_pcie_write_sz(addr, 0x1, value);
 }
 
 static inline void cdns_pcie_ep_fn_writew(struct cdns_pcie *pcie, u8 fn,
 					  u32 reg, u16 value)
 {
-	writew(value, pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
+	void __iomem *addr = pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg;
+
+	cdns_pcie_write_sz(addr, 0x2, value);
 }
 
 static inline void cdns_pcie_ep_fn_writel(struct cdns_pcie *pcie, u8 fn,
@@ -371,14 +409,11 @@  static inline void cdns_pcie_ep_fn_writel(struct cdns_pcie *pcie, u8 fn,
 	writel(value, pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
 }
 
-static inline u8 cdns_pcie_ep_fn_readb(struct cdns_pcie *pcie, u8 fn, u32 reg)
-{
-	return readb(pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
-}
-
 static inline u16 cdns_pcie_ep_fn_readw(struct cdns_pcie *pcie, u8 fn, u32 reg)
 {
-	return readw(pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
+	void __iomem *addr = pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg;
+
+	return cdns_pcie_read_sz(addr, 0x2);
 }
 
 static inline u32 cdns_pcie_ep_fn_readl(struct cdns_pcie *pcie, u8 fn, u32 reg)