diff mbox series

[v4,7/8] mtd: spi-nor: Enhance locking to support reads while writes

Message ID 20230201113603.293758-8-miquel.raynal@bootlin.com (mailing list archive)
State New, archived
Series mtd: spi-nor: read while write support

Commit Message

Miquel Raynal Feb. 1, 2023, 11:36 a.m. UTC
On devices featuring several banks, the Read While Write (RWW) feature
is here to improve the overall performance when performing parallel
reads and writes at different locations (different banks). The following
constraints have to be taken into account:
1#: A single operation can be performed in a given bank.
2#: Only a single program or erase operation can happen on the entire
    chip (common hardware limitation to limit costs)
3#: Reads must remain serialized even though reads on different banks
    might occur at the same time.
4#: The I/O bus is unique and thus is the most constrained resource, all
    spi-nor operations requiring access to the spi bus (through the spi
    controller) must be serialized until the bus exchanges are over. So
    we must ensure a single operation can be "sent" at a time.
5#: Any other operation that is not a read, a write or an erase is
    considered to require access to the full chip and cannot be
    parallelized; we then need to ensure the full chip is in the idle
    state when this occurs.

All these constraints can easily be managed with a proper locking model:
1#: Is enforced by a bitfield of the in-use banks, so that only a single
    operation can happen in a specific bank at any time.
2#: Is handled by the ongoing_pe boolean which is set before any write
    or erase, and is released only at the very end of the
    operation. This way, no other destructive operation on the chip can
    start during this time frame.
3#: An ongoing_rd boolean allows tracking the ongoing reads, so that
    only one can be performed at a time.
4#: An ongoing_io boolean is introduced in order to capture and serialize
    bus accesses. This is the one being released "sooner" than before,
    because we only need to protect the chip against other SPI accesses
    during the I/O phase, which for the destructive operations is the
    beginning of the operation (when we send the command cycles and
    possibly the data), while the second part of the operation (the
    erase delay or the programming delay) is when we can do something
    else in another bank.
5#: Is handled by the three booleans presented above, if any of them is
    set, the chip is not yet ready for the operation and must wait.

All these internal variables are protected by the existing lock, so that
changes in this structure are atomic. The serialization is handled with
a wait queue.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/mtd/spi-nor/core.c  | 319 ++++++++++++++++++++++++++++++++++--
 include/linux/mtd/spi-nor.h |  13 ++
 2 files changed, 317 insertions(+), 15 deletions(-)

Comments

Tudor Ambarus March 17, 2023, 5:59 a.m. UTC | #1
Hi, Miquel,

I find the overall idea good.

On 2/1/23 11:36, Miquel Raynal wrote:
> On devices featuring several banks, the Read While Write (RWW) feature
> is here to improve the overall performance when performing parallel
> reads and writes at different locations (different banks). The following
> constraints have to be taken into account:
> 1#: A single operation can be performed in a given bank.
> 2#: Only a single program or erase operation can happen on the entire
>     chip (common hardware limitation to limit costs)
> 3#: Reads must remain serialized even though reads on different banks
>     might occur at the same time.

3# is unclear if one just reads the commit message. Are the
reads serialized per bank or per flash?

After reading the code, it looks like all the reads are serialized per
flash, regardless of whether they read registers or memory. I assume you
meant that crossing a bank boundary with a single read is fine. But can
you really read from bank 1 and bank 3 at the same time? The code
doesn't take this into consideration.

> 4#: The I/O bus is unique and thus is the most constrained resource, all
>     spi-nor operations requiring access to the spi bus (through the spi
>     controller) must be serialized until the bus exchanges are over. So
>     we must ensure a single operation can be "sent" at a time.
> 5#: Any other operation that would not be either a read or a write or an
>     erase is considered requiring access to the full chip and cannot be
>     parallelized, we then need to ensure the full chip is in the idle
>     state when this occurs.
> 
> All these constraints can easily be managed with a proper locking model:
> 1#: Is enforced by a bitfield of the in-use banks, so that only a single
>     operation can happen in a specific bank at any time.
> 2#: Is handled by the ongoing_pe boolean which is set before any write
>     or erase, and is released only at the very end of the
>     operation. This way, no other destructive operation on the chip can
>     start during this time frame.
> 3#: An ongoing_rd boolean allows to track the ongoing reads, so that
>     only one can be performed at a time.
> 4#: An ongoing_io boolean is introduced in order to capture and serialize
>     bus accessed. This is the one being released "sooner" than before,
>     because we only need to protect the chip against other SPI accesses
>     during the I/O phase, which for the destructive operations is the
>     beginning of the operation (when we send the command cycles and
>     possibly the data), while the second part of the operation (the
>     erase delay or the programmation delay) is when we can do something
>     else in another bank.
> 5#: Is handled by the three booleans presented above, if any of them is
>     set, the chip is not yet ready for the operation and must wait.
> 
> All these internal variables are protected by the existing lock, so that
> changes in this structure are atomic. The serialization is handled with
> a wait queue.
> 
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  drivers/mtd/spi-nor/core.c  | 319 ++++++++++++++++++++++++++++++++++--
>  include/linux/mtd/spi-nor.h |  13 ++
>  2 files changed, 317 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
> index ac4627e0d6c2..ad2436e3688f 100644
> --- a/drivers/mtd/spi-nor/core.c
> +++ b/drivers/mtd/spi-nor/core.c
> @@ -589,6 +589,66 @@ int spi_nor_sr_ready(struct spi_nor *nor)
>  	return !(nor->bouncebuf[0] & SR_WIP);
>  }
>  
> +/**
> + * spi_nor_parallel_locking() - Checks if the RWW locking scheme shall be used
> + * @nor:	pointer to 'struct spi_nor'.
> + *
> + * Return: true if parallel locking is enabled, false otherwise.
> + */
> +static bool spi_nor_parallel_locking(struct spi_nor *nor)
> +{
> +	if (nor->controller_ops &&
> +	    (nor->controller_ops->prepare || nor->controller_ops->unprepare))
> +		return false;

We won't allow controller drivers in spi-nor/controllers to benefit from
this feature, so just do:
	if (nor->controller_ops)
		return false;
> +
> +	return nor->info->n_banks > 1 && nor->info->no_sfdp_flags & SPI_NOR_RWW;

we don't play with flash info flags throughout the core. Introduce a
SNOR_F equivalent flag, see how they're used. You'll be able to get rid
of the n_banks check as well.

> +}
> +
> +/* Locking helpers for status read operations */
> +static int spi_nor_rww_start_rdst(struct spi_nor *nor)
> +{
> +	int ret = -EAGAIN;

you can have a pointer to rww here, you'll avoid all those dereferences
from nor. I would add such a pointer wherever there is more than one
dereference, so the comment is for the entire patch.
> +
> +	mutex_lock(&nor->lock);
> +
> +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd)
> +		goto busy;
> +
> +	nor->rww.ongoing_io = true;
> +	nor->rww.ongoing_rd = true;
> +	ret = 0;
> +
> +busy:
> +	mutex_unlock(&nor->lock);
> +	return ret;
> +}
> +
> +static void spi_nor_rww_end_rdst(struct spi_nor *nor)
> +{
> +	mutex_lock(&nor->lock);
> +
> +	nor->rww.ongoing_io = false;
> +	nor->rww.ongoing_rd = false;
> +
> +	mutex_unlock(&nor->lock);
> +}
> +
> +static int spi_nor_lock_rdst(struct spi_nor *nor)
> +{
> +	if (spi_nor_parallel_locking(nor))
> +		return spi_nor_rww_start_rdst(nor);
> +
> +	return 0;
> +}
> +
> +static void spi_nor_unlock_rdst(struct spi_nor *nor)
> +{
> +	if (spi_nor_parallel_locking(nor)) {
> +		spi_nor_rww_end_rdst(nor);
> +		wake_up(&nor->rww.wait);
> +	}
> +}
> +
>  /**
>   * spi_nor_ready() - Query the flash to see if it is ready for new commands.
>   * @nor:	pointer to 'struct spi_nor'.
> @@ -597,11 +657,21 @@ int spi_nor_sr_ready(struct spi_nor *nor)
>   */
>  static int spi_nor_ready(struct spi_nor *nor)
>  {
> +	int ret;
> +
> +	ret = spi_nor_lock_rdst(nor);
> +	if (ret)
> +		return 0;
> +
>  	/* Flashes might override the standard routine. */
>  	if (nor->params->ready)
> -		return nor->params->ready(nor);
> +		ret = nor->params->ready(nor);
> +	else
> +		ret = spi_nor_sr_ready(nor);
>  
> -	return spi_nor_sr_ready(nor);
> +	spi_nor_unlock_rdst(nor);
> +
> +	return ret;
>  }
>  
>  /**
> @@ -1087,7 +1157,81 @@ static void spi_nor_unprep(struct spi_nor *nor)
>  		nor->controller_ops->unprepare(nor);
>  }
>  
> +static void spi_nor_offset_to_banks(struct spi_nor *nor, loff_t start, size_t len,

Pass the bank_size directly instead of the pointer to nor; you'll avoid
the double dereference.

> +				    unsigned int *first, unsigned int *last)

unsigned long long *first, *last ?

> +{
> +	*first = DIV_ROUND_DOWN_ULL(start, nor->params->bank_size);
> +	*last = DIV_ROUND_DOWN_ULL(start + len - 1, nor->params->bank_size);
> +}
> +
>  /* Generic helpers for internal locking and serialization */
> +static bool spi_nor_rww_start_io(struct spi_nor *nor)
> +{
> +	bool start = false;
> +
> +	mutex_lock(&nor->lock);
> +
> +	if (nor->rww.ongoing_io)
> +		goto busy;
> +
> +	nor->rww.ongoing_io = true;
> +	start = true;
> +
> +busy:
> +	mutex_unlock(&nor->lock);
> +	return start;
> +}
> +
> +static void spi_nor_rww_end_io(struct spi_nor *nor)
> +{
> +	mutex_lock(&nor->lock);
> +	nor->rww.ongoing_io = false;
> +	mutex_unlock(&nor->lock);
> +}
> +
> +static int spi_nor_lock_device(struct spi_nor *nor)
> +{
> +	if (!spi_nor_parallel_locking(nor))
> +		return 0;
> +
> +	return wait_event_killable(nor->rww.wait, spi_nor_rww_start_io(nor));
> +}
> +
> +static void spi_nor_unlock_device(struct spi_nor *nor)
> +{
> +	if (spi_nor_parallel_locking(nor))
> +		spi_nor_rww_end_io(nor);

shall we wake_up here too?

> +}
> +
> +/* Generic helpers for internal locking and serialization */
> +static bool spi_nor_rww_start_exclusive(struct spi_nor *nor)
> +{
> +	bool start = false;
> +
> +	mutex_lock(&nor->lock);
> +
> +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd || nor->rww.ongoing_pe)
> +		goto busy;
> +
> +	nor->rww.ongoing_io = true;
> +	nor->rww.ongoing_rd = true;
> +	nor->rww.ongoing_pe = true;
> +	start = true;
> +
> +busy:
> +	mutex_unlock(&nor->lock);
> +	return start;
> +}
> +
> +static void spi_nor_rww_end_exclusive(struct spi_nor *nor)
> +{
> +	mutex_lock(&nor->lock);
> +	nor->rww.ongoing_io = false;
> +	nor->rww.ongoing_rd = false;
> +	nor->rww.ongoing_pe = false;
> +	mutex_unlock(&nor->lock);
> +}
> +
>  int spi_nor_prep_and_lock(struct spi_nor *nor)
>  {
>  	int ret;
> @@ -1096,19 +1240,71 @@ int spi_nor_prep_and_lock(struct spi_nor *nor)
>  	if (ret)
>  		return ret;
>  
> -	mutex_lock(&nor->lock);
> +	if (!spi_nor_parallel_locking(nor))
> +		mutex_lock(&nor->lock);
> +	else
> +		ret = wait_event_killable(nor->rww.wait,
> +					  spi_nor_rww_start_exclusive(nor));
>

No, don't touch spi_nor_prep_and_lock() and spi_nor_unlock_and_unprep();
you're giving the impression that users of it (OTP, SWP) are safe to use
them while reads or PE are in progress, which is not the case, because
you don't guard the ops that they're using. You'll also have to document
the flash info RWW flag and say it is mutually exclusive with the SWP
and OTP features.

> -	return 0;
> +	return ret;
>  }
>  
>  void spi_nor_unlock_and_unprep(struct spi_nor *nor)
>  {
> -	mutex_unlock(&nor->lock);
> +	if (!spi_nor_parallel_locking(nor)) {
> +		mutex_unlock(&nor->lock);
> +	} else {
> +		spi_nor_rww_end_exclusive(nor);
> +		wake_up(&nor->rww.wait);
> +	}
>  
>  	spi_nor_unprep(nor);
>  }
>  
>  /* Internal locking helpers for program and erase operations */
> +static bool spi_nor_rww_start_pe(struct spi_nor *nor, loff_t start, size_t len)
> +{
> +	unsigned int first, last;
> +	bool started = false;
> +	int bank;
> +
> +	mutex_lock(&nor->lock);
> +
> +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd || nor->rww.ongoing_pe)
> +		goto busy;
> +
> +	spi_nor_offset_to_banks(nor, start, len, &first, &last);
> +	for (bank = first; bank <= last; bank++)
> +		if (nor->rww.used_banks & BIT(bank))
> +			goto busy;
> +
> +	for (bank = first; bank <= last; bank++)

You can avoid this second loop by introducing a local used_banks
variable and setting the mask in the previous loop. Then you'll just do
a single update at this point.

> +		nor->rww.used_banks |= BIT(bank);
> +
> +	nor->rww.ongoing_pe = true;
> +	started = true;
> +
> +busy:
> +	mutex_unlock(&nor->lock);
> +	return started;
> +}
> +
> +static void spi_nor_rww_end_pe(struct spi_nor *nor, loff_t start, size_t len)
> +{
> +	unsigned int first, last;
> +	int bank;
> +
> +	mutex_lock(&nor->lock);
> +
> +	spi_nor_offset_to_banks(nor, start, len, &first, &last);
> +	for (bank = first; bank <= last; bank++)
> +		nor->rww.used_banks &= ~BIT(bank);
> +
> +	nor->rww.ongoing_pe = false;
> +
> +	mutex_unlock(&nor->lock);
> +}
> +
>  static int spi_nor_prep_and_lock_pe(struct spi_nor *nor, loff_t start, size_t len)
>  {
>  	int ret;
> @@ -1117,19 +1313,73 @@ static int spi_nor_prep_and_lock_pe(struct spi_nor *nor, loff_t start, size_t le
>  	if (ret)
>  		return ret;
>  
> -	mutex_lock(&nor->lock);
> +	if (!spi_nor_parallel_locking(nor))
> +		mutex_lock(&nor->lock);
> +	else
> +		ret = wait_event_killable(nor->rww.wait,
> +					  spi_nor_rww_start_pe(nor, start, len));
>  
> -	return 0;
> +	return ret;
>  }
>  
>  static void spi_nor_unlock_and_unprep_pe(struct spi_nor *nor, loff_t start, size_t len)
>  {
> -	mutex_unlock(&nor->lock);
> +	if (!spi_nor_parallel_locking(nor)) {
> +		mutex_unlock(&nor->lock);
> +	} else {
> +		spi_nor_rww_end_pe(nor, start, len);
> +		wake_up(&nor->rww.wait);
> +	}
>  
>  	spi_nor_unprep(nor);
>  }
>  
>  /* Internal locking helpers for read operations */
> +static bool spi_nor_rww_start_rd(struct spi_nor *nor, loff_t start, size_t len)
> +{
> +	unsigned int first, last;
> +	bool started = false;
> +	int bank;
> +
> +	mutex_lock(&nor->lock);
> +
> +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd)
> +		goto busy;
> +
> +	spi_nor_offset_to_banks(nor, start, len, &first, &last);
> +	for (bank = first; bank <= last; bank++)
> +		if (nor->rww.used_banks & BIT(bank))
> +			goto busy;
> +
> +	for (bank = first; bank <= last; bank++)

mask to avoid 2nd loop

Cheers,
ta
Miquel Raynal March 24, 2023, 5:41 p.m. UTC | #2
Hi Tudor,

tudor.ambarus@linaro.org wrote on Fri, 17 Mar 2023 05:59:08 +0000:

> Hi, Miquel,
> 
> I find the overall idea good.

Thanks a lot for the detailed review!

> On 2/1/23 11:36, Miquel Raynal wrote:
> > On devices featuring several banks, the Read While Write (RWW) feature
> > is here to improve the overall performance when performing parallel
> > reads and writes at different locations (different banks). The following
> > constraints have to be taken into account:
> > 1#: A single operation can be performed in a given bank.
> > 2#: Only a single program or erase operation can happen on the entire
> >     chip (common hardware limitation to limit costs)
> > 3#: Reads must remain serialized even though reads on different banks
> >     might occur at the same time.  
> 
> 3# is unclear if one limits just at reading the commit message. Are the
> reads serialized per bank or per flash?

Per flash.

> After reading the code, it looks like all the reads are serialized per
> flash regardless if it reads registers or memory. I assume you meant
> that crossing a bank boundary with a single read is fine.

Yes, I will update that item to clarify.

> But can you
> really read from bank 1 and bank 3 at the same time? The code doesn't
> take this into consideration.

Yes this is taken into account and supported, a read can cross a bank
boundary.

> 
> > 4#: The I/O bus is unique and thus is the most constrained resource, all
> >     spi-nor operations requiring access to the spi bus (through the spi
> >     controller) must be serialized until the bus exchanges are over. So
> >     we must ensure a single operation can be "sent" at a time.
> > 5#: Any other operation that would not be either a read or a write or an
> >     erase is considered requiring access to the full chip and cannot be
> >     parallelized, we then need to ensure the full chip is in the idle
> >     state when this occurs.
> > 
> > All these constraints can easily be managed with a proper locking model:
> > 1#: Is enforced by a bitfield of the in-use banks, so that only a single
> >     operation can happen in a specific bank at any time.
> > 2#: Is handled by the ongoing_pe boolean which is set before any write
> >     or erase, and is released only at the very end of the
> >     operation. This way, no other destructive operation on the chip can
> >     start during this time frame.
> > 3#: An ongoing_rd boolean allows to track the ongoing reads, so that
> >     only one can be performed at a time.
> > 4#: An ongoing_io boolean is introduced in order to capture and serialize
> >     bus accessed. This is the one being released "sooner" than before,
> >     because we only need to protect the chip against other SPI accesses
> >     during the I/O phase, which for the destructive operations is the
> >     beginning of the operation (when we send the command cycles and
> >     possibly the data), while the second part of the operation (the
> >     erase delay or the programmation delay) is when we can do something
> >     else in another bank.
> > 5#: Is handled by the three booleans presented above, if any of them is
> >     set, the chip is not yet ready for the operation and must wait.
> > 
> > All these internal variables are protected by the existing lock, so that
> > changes in this structure are atomic. The serialization is handled with
> > a wait queue.
> > 
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---
> >  drivers/mtd/spi-nor/core.c  | 319 ++++++++++++++++++++++++++++++++++--
> >  include/linux/mtd/spi-nor.h |  13 ++
> >  2 files changed, 317 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
> > index ac4627e0d6c2..ad2436e3688f 100644
> > --- a/drivers/mtd/spi-nor/core.c
> > +++ b/drivers/mtd/spi-nor/core.c
> > @@ -589,6 +589,66 @@ int spi_nor_sr_ready(struct spi_nor *nor)
> >  	return !(nor->bouncebuf[0] & SR_WIP);
> >  }
> >  
> > +/**
> > + * spi_nor_parallel_locking() - Checks if the RWW locking scheme shall be used
> > + * @nor:	pointer to 'struct spi_nor'.
> > + *
> > + * Return: true if parallel locking is enabled, false otherwise.
> > + */
> > +static bool spi_nor_parallel_locking(struct spi_nor *nor)
> > +{
> > +	if (nor->controller_ops &&
> > +	    (nor->controller_ops->prepare || nor->controller_ops->unprepare))
> > +		return false;  
> 
> We won't allow controller drivers in spi-nor/controllers to benefit of
> this feature, just do:
> 	if (nor->controller_ops)
> 		return false;

That is also checked in the spi-nor init helper, where SNOR_F_RWW is
now set, so no need to check it again and again.

> > +	return nor->info->n_banks > 1 && nor->info->no_sfdp_flags & SPI_NOR_RWW;  
> 
> we don't play with flash info flags throughout the core. Introduce a
> SNOR_F equivalent flag, see how they're used. You'll be able to get rid
> of the n_banks check as well.

Thanks for the clear pointers, looks much nicer now!

> > +}
> > +
> > +/* Locking helpers for status read operations */
> > +static int spi_nor_rww_start_rdst(struct spi_nor *nor)
> > +{
> > +	int ret = -EAGAIN;  
> 
> you can have a pointer to rww here, you'll avoid all those dereferences
> from nor. I would add such a pointer wherever there is more than one
> dereference, so the comment is for the entire patch.

haha, I guess this is a matter of taste, I'm not bothered by those, but
ok, I'll make the change here and after :-)

> > +
> > +	mutex_lock(&nor->lock);
> > +
> > +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd)
> > +		goto busy;
> > +
> > +	nor->rww.ongoing_io = true;
> > +	nor->rww.ongoing_rd = true;
> > +	ret = 0;
> > +
> > +busy:
> > +	mutex_unlock(&nor->lock);
> > +	return ret;
> > +}
> > +
> > +static void spi_nor_rww_end_rdst(struct spi_nor *nor)
> > +{
> > +	mutex_lock(&nor->lock);
> > +
> > +	nor->rww.ongoing_io = false;
> > +	nor->rww.ongoing_rd = false;
> > +
> > +	mutex_unlock(&nor->lock);
> > +}
> > +
> > +static int spi_nor_lock_rdst(struct spi_nor *nor)
> > +{
> > +	if (spi_nor_parallel_locking(nor))
> > +		return spi_nor_rww_start_rdst(nor);
> > +
> > +	return 0;
> > +}
> > +
> > +static void spi_nor_unlock_rdst(struct spi_nor *nor)
> > +{
> > +	if (spi_nor_parallel_locking(nor)) {
> > +		spi_nor_rww_end_rdst(nor);
> > +		wake_up(&nor->rww.wait);
> > +	}
> > +}
> > +
> >  /**
> >   * spi_nor_ready() - Query the flash to see if it is ready for new commands.
> >   * @nor:	pointer to 'struct spi_nor'.
> > @@ -597,11 +657,21 @@ int spi_nor_sr_ready(struct spi_nor *nor)
> >   */
> >  static int spi_nor_ready(struct spi_nor *nor)
> >  {
> > +	int ret;
> > +
> > +	ret = spi_nor_lock_rdst(nor);
> > +	if (ret)
> > +		return 0;
> > +
> >  	/* Flashes might override the standard routine. */
> >  	if (nor->params->ready)
> > -		return nor->params->ready(nor);
> > +		ret = nor->params->ready(nor);
> > +	else
> > +		ret = spi_nor_sr_ready(nor);
> >  
> > -	return spi_nor_sr_ready(nor);
> > +	spi_nor_unlock_rdst(nor);
> > +
> > +	return ret;
> >  }
> >  
> >  /**
> > @@ -1087,7 +1157,81 @@ static void spi_nor_unprep(struct spi_nor *nor)
> >  		nor->controller_ops->unprepare(nor);
> >  }
> >  
> > +static void spi_nor_offset_to_banks(struct spi_nor *nor, loff_t start, size_t len,  
> 
> pass directly the bank_size instead of the pointer to nor, you'll avoid
> the double dereference.

Done

> 
> > +				    unsigned int *first, unsigned int *last)  
> 
> unsigned long long *first, *last ?

Actually I want these to remain unsigned int; the ULL suffix just means
the input might be a 64-bit value, but it is quite common to treat the
output as 32-bit. Here we do not expect values greater than 4.

> > +{
> > +	*first = DIV_ROUND_DOWN_ULL(start, nor->params->bank_size);
> > +	*last = DIV_ROUND_DOWN_ULL(start + len - 1, nor->params->bank_size);
> > +}
> > +
> >  /* Generic helpers for internal locking and serialization */
> > +static bool spi_nor_rww_start_io(struct spi_nor *nor)
> > +{
> > +	bool start = false;
> > +
> > +	mutex_lock(&nor->lock);
> > +
> > +	if (nor->rww.ongoing_io)
> > +		goto busy;
> > +
> > +	nor->rww.ongoing_io = true;
> > +	start = true;
> > +
> > +busy:
> > +	mutex_unlock(&nor->lock);
> > +	return start;
> > +}
> > +
> > +static void spi_nor_rww_end_io(struct spi_nor *nor)
> > +{
> > +	mutex_lock(&nor->lock);
> > +	nor->rww.ongoing_io = false;
> > +	mutex_unlock(&nor->lock);
> > +}
> > +
> > +static int spi_nor_lock_device(struct spi_nor *nor)
> > +{
> > +	if (!spi_nor_parallel_locking(nor))
> > +		return 0;
> > +
> > +	return wait_event_killable(nor->rww.wait, spi_nor_rww_start_io(nor));
> > +}
> > +
> > +static void spi_nor_unlock_device(struct spi_nor *nor)
> > +{
> > +	if (spi_nor_parallel_locking(nor))
> > +		spi_nor_rww_end_io(nor);  
> 
> shall we wake_up here too?

True

> 
> > +}
> > +
> > +/* Generic helpers for internal locking and serialization */
> > +static bool spi_nor_rww_start_exclusive(struct spi_nor *nor)
> > +{
> > +	bool start = false;
> > +
> > +	mutex_lock(&nor->lock);
> > +
> > +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd || nor->rww.ongoing_pe)
> > +		goto busy;
> > +
> > +	nor->rww.ongoing_io = true;
> > +	nor->rww.ongoing_rd = true;
> > +	nor->rww.ongoing_pe = true;
> > +	start = true;
> > +
> > +busy:
> > +	mutex_unlock(&nor->lock);
> > +	return start;
> > +}
> > +
> > +static void spi_nor_rww_end_exclusive(struct spi_nor *nor)
> > +{
> > +	mutex_lock(&nor->lock);
> > +	nor->rww.ongoing_io = false;
> > +	nor->rww.ongoing_rd = false;
> > +	nor->rww.ongoing_pe = false;
> > +	mutex_unlock(&nor->lock);
> > +}
> > +
> >  int spi_nor_prep_and_lock(struct spi_nor *nor)
> >  {
> >  	int ret;
> > @@ -1096,19 +1240,71 @@ int spi_nor_prep_and_lock(struct spi_nor *nor)
> >  	if (ret)
> >  		return ret;
> >  
> > -	mutex_lock(&nor->lock);
> > +	if (!spi_nor_parallel_locking(nor))
> > +		mutex_lock(&nor->lock);
> > +	else
> > +		ret = wait_event_killable(nor->rww.wait,
> > +					  spi_nor_rww_start_exclusive(nor));
> >  
> 
> No, don't touch spi_nor_prep_and_lock() and spi_nor_unlock_and_unprep(),
> you're giving the impresion that users of it (OTP, SWP) are safe to use
> them while reads or PE are in progress, which is not the case, because
> you don't guard the ops that they're using. You'll also have to document
> the flash info RWW flag ands say it is mutual exclusive with SWP and OTP
> features.

Actually I'm not getting what you mean here. Any access that is not
a pure read, write or erase (like register accesses, otp and swp
accesses) is automatically treated as needing full and exclusive
access to the chip. So everything works as before with these, even
if RWW is enabled, because that's not where we want performance
improvements. So I do need to implement a dedicated "parallel_locking"
path inside these helpers to clearly enforce exclusive access to
the chip.

> > -	return 0;
> > +	return ret;
> >  }
> >  
> >  void spi_nor_unlock_and_unprep(struct spi_nor *nor)
> >  {
> > -	mutex_unlock(&nor->lock);
> > +	if (!spi_nor_parallel_locking(nor)) {
> > +		mutex_unlock(&nor->lock);
> > +	} else {
> > +		spi_nor_rww_end_exclusive(nor);
> > +		wake_up(&nor->rww.wait);
> > +	}
> >  
> >  	spi_nor_unprep(nor);
> >  }
> >  
> >  /* Internal locking helpers for program and erase operations */
> > +static bool spi_nor_rww_start_pe(struct spi_nor *nor, loff_t start, size_t len)
> > +{
> > +	unsigned int first, last;
> > +	bool started = false;
> > +	int bank;
> > +
> > +	mutex_lock(&nor->lock);
> > +
> > +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd || nor->rww.ongoing_pe)
> > +		goto busy;
> > +
> > +	spi_nor_offset_to_banks(nor, start, len, &first, &last);
> > +	for (bank = first; bank <= last; bank++)
> > +		if (nor->rww.used_banks & BIT(bank))
> > +			goto busy;
> > +
> > +	for (bank = first; bank <= last; bank++)  
> you can avoid this second look by introducing a local used_banks
> variable and have the mask set in the previous loop. Then you'll just do
> an init at this point.

Done.

> 
> > +		nor->rww.used_banks |= BIT(bank);
> > +
> > +	nor->rww.ongoing_pe = true;
> > +	started = true;
> > +
> > +busy:
> > +	mutex_unlock(&nor->lock);
> > +	return started;
> > +}
> > +
> > +static void spi_nor_rww_end_pe(struct spi_nor *nor, loff_t start, size_t len)
> > +{
> > +	unsigned int first, last;
> > +	int bank;
> > +
> > +	mutex_lock(&nor->lock);
> > +
> > +	spi_nor_offset_to_banks(nor, start, len, &first, &last);
> > +	for (bank = first; bank <= last; bank++)
> > +		nor->rww.used_banks &= ~BIT(bank);
> > +
> > +	nor->rww.ongoing_pe = false;
> > +
> > +	mutex_unlock(&nor->lock);
> > +}
> > +
> >  static int spi_nor_prep_and_lock_pe(struct spi_nor *nor, loff_t start, size_t len)
> >  {
> >  	int ret;
> > @@ -1117,19 +1313,73 @@ static int spi_nor_prep_and_lock_pe(struct spi_nor *nor, loff_t start, size_t le
> >  	if (ret)
> >  		return ret;
> >  
> > -	mutex_lock(&nor->lock);
> > +	if (!spi_nor_parallel_locking(nor))
> > +		mutex_lock(&nor->lock);
> > +	else
> > +		ret = wait_event_killable(nor->rww.wait,
> > +					  spi_nor_rww_start_pe(nor, start, len));
> >  
> > -	return 0;
> > +	return ret;
> >  }
> >  
> >  static void spi_nor_unlock_and_unprep_pe(struct spi_nor *nor, loff_t start, size_t len)
> >  {
> > -	mutex_unlock(&nor->lock);
> > +	if (!spi_nor_parallel_locking(nor)) {
> > +		mutex_unlock(&nor->lock);
> > +	} else {
> > +		spi_nor_rww_end_pe(nor, start, len);
> > +		wake_up(&nor->rww.wait);
> > +	}
> >  
> >  	spi_nor_unprep(nor);
> >  }
> >  
> >  /* Internal locking helpers for read operations */
> > +static bool spi_nor_rww_start_rd(struct spi_nor *nor, loff_t start, size_t len)
> > +{
> > +	unsigned int first, last;
> > +	bool started = false;
> > +	int bank;
> > +
> > +	mutex_lock(&nor->lock);
> > +
> > +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd)
> > +		goto busy;
> > +
> > +	spi_nor_offset_to_banks(nor, start, len, &first, &last);
> > +	for (bank = first; bank <= last; bank++)
> > +		if (nor->rww.used_banks & BIT(bank))
> > +			goto busy;
> > +
> > +	for (bank = first; bank <= last; bank++)  
> 
> mask to avoid 2nd loop

Done as well.

Thanks a lot!
Miquèl
Tudor Ambarus March 27, 2023, 9:29 a.m. UTC | #3
On 3/24/23 17:41, Miquel Raynal wrote:
> Hi Tudor,
> 

Hi!

> tudor.ambarus@linaro.org wrote on Fri, 17 Mar 2023 05:59:08 +0000:
> 
>> Hi, Miquel,
>>
>> I find the overall idea good.
> 
> Thanks a lot for the detailed review!
> 
>> On 2/1/23 11:36, Miquel Raynal wrote:
>>> On devices featuring several banks, the Read While Write (RWW) feature
>>> is here to improve the overall performance when performing parallel
>>> reads and writes at different locations (different banks). The following
>>> constraints have to be taken into account:
>>> 1#: A single operation can be performed in a given bank.
>>> 2#: Only a single program or erase operation can happen on the entire
>>>     chip (common hardware limitation to limit costs)
>>> 3#: Reads must remain serialized even though reads on different banks
>>>     might occur at the same time.  
>>
>> 3# is unclear if one limits just at reading the commit message. Are the
>> reads serialized per bank or per flash?
> 
> Per flash.
> 
>> After reading the code, it looks like all the reads are serialized per
>> flash regardless if it reads registers or memory. I assume you meant
>> that crossing a bank boundary with a single read is fine.
> 
> Yes, I will update that item to clarify.

thanks!

> 
>> But can you
>> really read from bank 1 and bank 3 at the same time? The code doesn't
>> take this into consideration.
> 
> Yes this is taken into account and supported, a read can cross a bank
> boundary.

No, I meant that you can't do a read from bank 1 and, while the first
read is in progress, start a second read from the 3rd bank and
process both reads in parallel, reading from both banks at the same
time. At least not with the current code, because you set
rww.{ongoing_io, ongoing_rd} to true and the second read will wait.
Cross-boundary reads on successive banks should work with the current
code, yes. So what does the hw support?

> 
>>
>>> 4#: The I/O bus is unique and thus is the most constrained resource, all
>>>     spi-nor operations requiring access to the spi bus (through the spi
>>>     controller) must be serialized until the bus exchanges are over. So
>>>     we must ensure a single operation can be "sent" at a time.
>>> 5#: Any other operation that would not be either a read or a write or an
>>>     erase is considered requiring access to the full chip and cannot be
>>>     parallelized, we then need to ensure the full chip is in the idle
>>>     state when this occurs.
>>>
>>> All these constraints can easily be managed with a proper locking model:
>>> 1#: Is enforced by a bitfield of the in-use banks, so that only a single
>>>     operation can happen in a specific bank at any time.
>>> 2#: Is handled by the ongoing_pe boolean which is set before any write
>>>     or erase, and is released only at the very end of the
>>>     operation. This way, no other destructive operation on the chip can
>>>     start during this time frame.
>>> 3#: An ongoing_rd boolean allows to track the ongoing reads, so that
>>>     only one can be performed at a time.
>>> 4#: An ongoing_io boolean is introduced in order to capture and serialize
>>>     bus accesses. This is the one being released "sooner" than before,
>>>     because we only need to protect the chip against other SPI accesses
>>>     during the I/O phase, which for the destructive operations is the
>>>     beginning of the operation (when we send the command cycles and
>>>     possibly the data), while the second part of the operation (the
>>>     erase delay or the programming delay) is when we can do something
>>>     else in another bank.
>>> 5#: Is handled by the three booleans presented above, if any of them is
>>>     set, the chip is not yet ready for the operation and must wait.
>>>
>>> All these internal variables are protected by the existing lock, so that
>>> changes in this structure are atomic. The serialization is handled with
>>> a wait queue.
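Side note: the five admission rules above can be modeled as a small standalone C sketch. This is only a userspace illustration of the state machine, not the kernel code — there is no mutex/waitqueue here, `rww_state` and the helpers are made-up names, and a plain uint32_t stands in for the bank bitmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the rww fields added to struct spi_nor. */
struct rww_state {
	bool ongoing_io;     /* 4#: the bus is busy */
	bool ongoing_rd;     /* 3#: a read is in flight */
	bool ongoing_pe;     /* 2#: a program/erase is in flight */
	uint32_t used_banks; /* 1#: one bit per busy bank */
};

static bool banks_free(const struct rww_state *s, uint32_t mask)
{
	return !(s->used_banks & mask);
}

/* May a program/erase touching the banks in 'mask' start now? */
static bool pe_may_start(struct rww_state *s, uint32_t mask)
{
	if (s->ongoing_io || s->ongoing_rd || s->ongoing_pe ||
	    !banks_free(s, mask))
		return false;
	s->used_banks |= mask;
	s->ongoing_pe = true;
	s->ongoing_io = true; /* bus busy while command/data cycles run */
	return true;
}

/* The PE frees the bus once its command/data cycles are done; the
 * erase/program delay then runs with only ongoing_pe + banks held. */
static void pe_io_done(struct rww_state *s)
{
	s->ongoing_io = false;
}

/* May a read touching the banks in 'mask' start now? Rule 3# blocks a
 * second read chip-wide, but a read may start during a PE delay as
 * long as the bank ranges don't overlap. */
static bool rd_may_start(struct rww_state *s, uint32_t mask)
{
	if (s->ongoing_io || s->ongoing_rd || !banks_free(s, mask))
		return false;
	s->used_banks |= mask;
	s->ongoing_io = true;
	s->ongoing_rd = true;
	return true;
}
```

(Rule 5#, the exclusive path, would simply check and set all three booleans, as spi_nor_rww_start_exclusive() does in the patch.)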
>>>
>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
>>> ---
>>>  drivers/mtd/spi-nor/core.c  | 319 ++++++++++++++++++++++++++++++++++--
>>>  include/linux/mtd/spi-nor.h |  13 ++
>>>  2 files changed, 317 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
>>> index ac4627e0d6c2..ad2436e3688f 100644
>>> --- a/drivers/mtd/spi-nor/core.c
>>> +++ b/drivers/mtd/spi-nor/core.c
>>> @@ -589,6 +589,66 @@ int spi_nor_sr_ready(struct spi_nor *nor)
>>>  	return !(nor->bouncebuf[0] & SR_WIP);
>>>  }
>>>  
>>> +/**
>>> + * spi_nor_parallel_locking() - Checks if the RWW locking scheme shall be used
>>> + * @nor:	pointer to 'struct spi_nor'.
>>> + *
>>> + * Return: true if parallel locking is enabled, false otherwise.
>>> + */
>>> +static bool spi_nor_parallel_locking(struct spi_nor *nor)
>>> +{
>>> +	if (nor->controller_ops &&
>>> +	    (nor->controller_ops->prepare || nor->controller_ops->unprepare))
>>> +		return false;  
>>
>> We won't allow controller drivers in spi-nor/controllers to benefit from
>> this feature, just do:
>> 	if (nor->controller_ops)
>> 		return false;
> 
> That is also checked in the spi-nor init helper, where SNOR_F_RWW is
> now set, so no need to check it again and again.
> 
>>> +	return nor->info->n_banks > 1 && nor->info->no_sfdp_flags & SPI_NOR_RWW;  
>>
>> we don't play with flash info flags throughout the core. Introduce a
>> SNOR_F equivalent flag, see how they're used. You'll be able to get rid
>> of the n_banks check as well.
> 
> Thanks for the clear pointers, looks much nicer now!
> 
>>> +}
>>> +
>>> +/* Locking helpers for status read operations */
>>> +static int spi_nor_rww_start_rdst(struct spi_nor *nor)
>>> +{
>>> +	int ret = -EAGAIN;  
>>
>> you can have a pointer to rww here, you'll avoid all those dereferences
>> from nor. I would add such a pointer wherever there is more than one
>> dereference, so the comment is for the entire patch.
> 
> haha, I guess this is a matter of taste, I'm not bothered by those, but
> ok, I'll make the change here and after :-)
>
>>> +
>>> +	mutex_lock(&nor->lock);
>>> +
>>> +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd)
>>> +		goto busy;
>>> +
>>> +	nor->rww.ongoing_io = true;
>>> +	nor->rww.ongoing_rd = true;
>>> +	ret = 0;
>>> +
>>> +busy:
>>> +	mutex_unlock(&nor->lock);
>>> +	return ret;
>>> +}
>>> +
>>> +static void spi_nor_rww_end_rdst(struct spi_nor *nor)
>>> +{
>>> +	mutex_lock(&nor->lock);
>>> +
>>> +	nor->rww.ongoing_io = false;
>>> +	nor->rww.ongoing_rd = false;
>>> +
>>> +	mutex_unlock(&nor->lock);
>>> +}
>>> +
>>> +static int spi_nor_lock_rdst(struct spi_nor *nor)
>>> +{
>>> +	if (spi_nor_parallel_locking(nor))
>>> +		return spi_nor_rww_start_rdst(nor);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void spi_nor_unlock_rdst(struct spi_nor *nor)
>>> +{
>>> +	if (spi_nor_parallel_locking(nor)) {
>>> +		spi_nor_rww_end_rdst(nor);
>>> +		wake_up(&nor->rww.wait);
>>> +	}
>>> +}
>>> +
>>>  /**
>>>   * spi_nor_ready() - Query the flash to see if it is ready for new commands.
>>>   * @nor:	pointer to 'struct spi_nor'.
>>> @@ -597,11 +657,21 @@ int spi_nor_sr_ready(struct spi_nor *nor)
>>>   */
>>>  static int spi_nor_ready(struct spi_nor *nor)
>>>  {
>>> +	int ret;
>>> +
>>> +	ret = spi_nor_lock_rdst(nor);
>>> +	if (ret)
>>> +		return 0;
>>> +
>>>  	/* Flashes might override the standard routine. */
>>>  	if (nor->params->ready)
>>> -		return nor->params->ready(nor);
>>> +		ret = nor->params->ready(nor);
>>> +	else
>>> +		ret = spi_nor_sr_ready(nor);
>>>  
>>> -	return spi_nor_sr_ready(nor);
>>> +	spi_nor_unlock_rdst(nor);
>>> +
>>> +	return ret;
>>>  }
>>>  
>>>  /**
>>> @@ -1087,7 +1157,81 @@ static void spi_nor_unprep(struct spi_nor *nor)
>>>  		nor->controller_ops->unprepare(nor);
>>>  }
>>>  
>>> +static void spi_nor_offset_to_banks(struct spi_nor *nor, loff_t start, size_t len,  
>>
>> pass directly the bank_size instead of the pointer to nor, you'll avoid
>> the double dereference.
> 
> Done
> 
>>
>>> +				    unsigned int *first, unsigned int *last)  
>>
>> unsigned long long *first, *last ?
> 
> Actually I want these to remain unsigned int, the ULL suffix just means
> the input might be a 64-bit value, but it is quite common to treat the
> output as 32-bit. Here we do not expect values greater than 4.

Ok. Then maybe we should match how we define nbanks in NOR. Was it a u8?

> 
>>> +{
>>> +	*first = DIV_ROUND_DOWN_ULL(start, nor->params->bank_size);
>>> +	*last = DIV_ROUND_DOWN_ULL(start + len - 1, nor->params->bank_size);
>>> +}
>>> +
>>>  /* Generic helpers for internal locking and serialization */
>>> +static bool spi_nor_rww_start_io(struct spi_nor *nor)
>>> +{
>>> +	bool start = false;
>>> +
>>> +	mutex_lock(&nor->lock);
>>> +
>>> +	if (nor->rww.ongoing_io)
>>> +		goto busy;
>>> +
>>> +	nor->rww.ongoing_io = true;
>>> +	start = true;
>>> +
>>> +busy:
>>> +	mutex_unlock(&nor->lock);
>>> +	return start;
>>> +}
>>> +
>>> +static void spi_nor_rww_end_io(struct spi_nor *nor)
>>> +{
>>> +	mutex_lock(&nor->lock);
>>> +	nor->rww.ongoing_io = false;
>>> +	mutex_unlock(&nor->lock);
>>> +}
>>> +
>>> +static int spi_nor_lock_device(struct spi_nor *nor)
>>> +{
>>> +	if (!spi_nor_parallel_locking(nor))
>>> +		return 0;
>>> +
>>> +	return wait_event_killable(nor->rww.wait, spi_nor_rww_start_io(nor));
>>> +}
>>> +
>>> +static void spi_nor_unlock_device(struct spi_nor *nor)
>>> +{
>>> +	if (spi_nor_parallel_locking(nor))
>>> +		spi_nor_rww_end_io(nor);  
>>
>> shall we wake_up here too?
> 
> True
> 
>>
>>> +}
>>> +
>>> +/* Generic helpers for internal locking and serialization */
>>> +static bool spi_nor_rww_start_exclusive(struct spi_nor *nor)
>>> +{
>>> +	bool start = false;
>>> +
>>> +	mutex_lock(&nor->lock);
>>> +
>>> +	if (nor->rww.ongoing_io || nor->rww.ongoing_rd || nor->rww.ongoing_pe)
>>> +		goto busy;
>>> +
>>> +	nor->rww.ongoing_io = true;
>>> +	nor->rww.ongoing_rd = true;
>>> +	nor->rww.ongoing_pe = true;
>>> +	start = true;
>>> +
>>> +busy:
>>> +	mutex_unlock(&nor->lock);
>>> +	return start;
>>> +}
>>> +
>>> +static void spi_nor_rww_end_exclusive(struct spi_nor *nor)
>>> +{
>>> +	mutex_lock(&nor->lock);
>>> +	nor->rww.ongoing_io = false;
>>> +	nor->rww.ongoing_rd = false;
>>> +	nor->rww.ongoing_pe = false;
>>> +	mutex_unlock(&nor->lock);
>>> +}
>>> +
>>>  int spi_nor_prep_and_lock(struct spi_nor *nor)
>>>  {
>>>  	int ret;
>>> @@ -1096,19 +1240,71 @@ int spi_nor_prep_and_lock(struct spi_nor *nor)
>>>  	if (ret)
>>>  		return ret;
>>>  
>>> -	mutex_lock(&nor->lock);
>>> +	if (!spi_nor_parallel_locking(nor))
>>> +		mutex_lock(&nor->lock);
>>> +	else
>>> +		ret = wait_event_killable(nor->rww.wait,
>>> +					  spi_nor_rww_start_exclusive(nor));
>>>  
>>
>> No, don't touch spi_nor_prep_and_lock() and spi_nor_unlock_and_unprep(),
>> you're giving the impresion that users of it (OTP, SWP) are safe to use
>> them while reads or PE are in progress, which is not the case, because
>> you don't guard the ops that they're using. You'll also have to document
>> the flash info RWW flag ands say it is mutual exclusive with SWP and OTP
>> features.
> 
> Actually I'm not getting what you mean here. Any access that is not
> a pure read, write, or erase (like register accesses, OTP and SWP
> accesses) is automatically treated as needing full and exclusive
> access to the chip. So everything works as before with these, even
> if RWW is enabled, because that's not where we want performance

oh, yes, all SWP and OTP ops are guarded with spi_nor_prep_and_lock()
and spi_nor_unlock_and_unprep(), you should be good to go.

> improvements. So I need to implement a dedicated "parallel_locking"
> path inside these helpers to clearly enforce exclusive access to
> the chip.
>
Miquel Raynal March 28, 2023, 8:22 a.m. UTC | #4
Hi Tudor,

tudor.ambarus@linaro.org wrote on Mon, 27 Mar 2023 10:29:03 +0100:

> On 3/24/23 17:41, Miquel Raynal wrote:
> > Hi Tudor,
> >   
> 
> Hi!
> 
> > tudor.ambarus@linaro.org wrote on Fri, 17 Mar 2023 05:59:08 +0000:
> >   
> >> Hi, Miquel,
> >>
> >> I find the overall idea good.  
> > 
> > Thanks a lot for the detailed review!
> >   
> >> On 2/1/23 11:36, Miquel Raynal wrote:  
> >>> On devices featuring several banks, the Read While Write (RWW) feature
> >>> is here to improve the overall performance when performing parallel
> >>> reads and writes at different locations (different banks). The following
> >>> constraints have to be taken into account:
> >>> 1#: A single operation can be performed in a given bank.
> >>> 2#: Only a single program or erase operation can happen on the entire
> >>>     chip (common hardware limitation to limit costs)
> >>> 3#: Reads must remain serialized even though reads on different banks
> >>>     might occur at the same time.    
> >>
> >> 3# is unclear if one limits oneself to just reading the commit message. Are the
> >> reads serialized per bank or per flash?  
> > 
> > Per flash.
> >   
> >> After reading the code, it looks like all the reads are serialized per
> >> flash regardless of whether they read registers or memory. I assume you
> >> meant that crossing a bank boundary with a single read is fine.
> > 
> > Yes, I will update that item to clarify.  
> 
> thanks!
> 
> >   
> >> But can you
> >> really read from bank 1 and bank 3 at the same time? The code doesn't
> >> take this into consideration.  
> > 
> > Yes this is taken into account and supported, a read can cross a bank
> > boundary.  
> 
> No, I meant that you can't start a read from bank 1 and, while the
> first read is in progress, start a second read from the 3rd bank and
> process both reads in parallel, reading from both banks at the same
> time. At least not with the current code, because you set
> rww.{ongoing_io, ongoing_rd} to true and the second read will wait.
> Cross-boundary reads on successive banks should work with the current
> code, yes. So what does the hw support?

Ok, sorry for the confusion. So, I think I remember a discussion where
I was told that this was not supported, even though it would not be
extremely complex to support at the physical level ("just" by
increasing the current source). But IIRC right now this is not
supported. Anyhow, the main target of RWW is to perform a read during
a write, which is very handy for e.g. system updates, besides reducing
the overall latency. But I don't think we want to bring even more
parallelism between reads: the current implementation would not work
for that, and a whole mtd I/O scheduler would be needed, which is yet
another task.
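For reference, the bank-disjointness check that makes a read-during-write legal can be sketched in standalone userspace C. Plain integer division stands in for DIV_ROUND_DOWN_ULL, and ranges_overlap() is an illustrative helper, not something in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of spi_nor_offset_to_banks(): which banks does an
 * operation on [start, start + len) touch? */
static void offset_to_banks(uint64_t bank_size, uint64_t start, uint64_t len,
			    unsigned int *first, unsigned int *last)
{
	*first = (unsigned int)(start / bank_size);
	*last = (unsigned int)((start + len - 1) / bank_size);
}

/* A read may run beside a program/erase only if their bank ranges are
 * disjoint; ranges are contiguous, so comparing endpoints suffices. */
static int ranges_overlap(unsigned int first1, unsigned int last1,
			  unsigned int first2, unsigned int last2)
{
	return first1 <= last2 && first2 <= last1;
}
```

So a single read crossing the bank 0/1 boundary maps to first=0, last=1 and is admitted as long as no program/erase holds either of those banks.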


[...]

> >>> @@ -1087,7 +1157,81 @@ static void spi_nor_unprep(struct spi_nor *nor)
> >>>  		nor->controller_ops->unprepare(nor);
> >>>  }
> >>>  
> >>> +static void spi_nor_offset_to_banks(struct spi_nor *nor, loff_t start, size_t len,    
> >>
> >> pass directly the bank_size instead of the pointer to nor, you'll avoid
> >> the double dereference.  
> > 
> > Done
> >   
> >>  
> >>> +				    unsigned int *first, unsigned int *last)    
> >>
> >> unsigned long long *first, *last ?  
> > 
> > Actually I want these to remain unsigned int, the ULL suffix just means
> > the input might be a 64-bit value, but it is quite common to treat the
> > output as 32-bit. Here we do not expect values greater than 4.  
> 
> Ok. Then maybe we should match how we define nbanks in NOR. Was it a u8?

Why not.

> 
> >   
> >>> +{
> >>> +	*first = DIV_ROUND_DOWN_ULL(start, nor->params->bank_size);
> >>> +	*last = DIV_ROUND_DOWN_ULL(start + len - 1, nor->params->bank_size);
> >>> +}
> >>> +

Thanks,
Miquèl
diff mbox series

Patch

diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index ac4627e0d6c2..ad2436e3688f 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -589,6 +589,66 @@  int spi_nor_sr_ready(struct spi_nor *nor)
 	return !(nor->bouncebuf[0] & SR_WIP);
 }
 
+/**
+ * spi_nor_parallel_locking() - Checks if the RWW locking scheme shall be used
+ * @nor:	pointer to 'struct spi_nor'.
+ *
+ * Return: true if parallel locking is enabled, false otherwise.
+ */
+static bool spi_nor_parallel_locking(struct spi_nor *nor)
+{
+	if (nor->controller_ops &&
+	    (nor->controller_ops->prepare || nor->controller_ops->unprepare))
+		return false;
+
+	return nor->info->n_banks > 1 && nor->info->no_sfdp_flags & SPI_NOR_RWW;
+}
+
+/* Locking helpers for status read operations */
+static int spi_nor_rww_start_rdst(struct spi_nor *nor)
+{
+	int ret = -EAGAIN;
+
+	mutex_lock(&nor->lock);
+
+	if (nor->rww.ongoing_io || nor->rww.ongoing_rd)
+		goto busy;
+
+	nor->rww.ongoing_io = true;
+	nor->rww.ongoing_rd = true;
+	ret = 0;
+
+busy:
+	mutex_unlock(&nor->lock);
+	return ret;
+}
+
+static void spi_nor_rww_end_rdst(struct spi_nor *nor)
+{
+	mutex_lock(&nor->lock);
+
+	nor->rww.ongoing_io = false;
+	nor->rww.ongoing_rd = false;
+
+	mutex_unlock(&nor->lock);
+}
+
+static int spi_nor_lock_rdst(struct spi_nor *nor)
+{
+	if (spi_nor_parallel_locking(nor))
+		return spi_nor_rww_start_rdst(nor);
+
+	return 0;
+}
+
+static void spi_nor_unlock_rdst(struct spi_nor *nor)
+{
+	if (spi_nor_parallel_locking(nor)) {
+		spi_nor_rww_end_rdst(nor);
+		wake_up(&nor->rww.wait);
+	}
+}
+
 /**
  * spi_nor_ready() - Query the flash to see if it is ready for new commands.
  * @nor:	pointer to 'struct spi_nor'.
@@ -597,11 +657,21 @@  int spi_nor_sr_ready(struct spi_nor *nor)
  */
 static int spi_nor_ready(struct spi_nor *nor)
 {
+	int ret;
+
+	ret = spi_nor_lock_rdst(nor);
+	if (ret)
+		return 0;
+
 	/* Flashes might override the standard routine. */
 	if (nor->params->ready)
-		return nor->params->ready(nor);
+		ret = nor->params->ready(nor);
+	else
+		ret = spi_nor_sr_ready(nor);
 
-	return spi_nor_sr_ready(nor);
+	spi_nor_unlock_rdst(nor);
+
+	return ret;
 }
 
 /**
@@ -1087,7 +1157,81 @@  static void spi_nor_unprep(struct spi_nor *nor)
 		nor->controller_ops->unprepare(nor);
 }
 
+static void spi_nor_offset_to_banks(struct spi_nor *nor, loff_t start, size_t len,
+				    unsigned int *first, unsigned int *last)
+{
+	*first = DIV_ROUND_DOWN_ULL(start, nor->params->bank_size);
+	*last = DIV_ROUND_DOWN_ULL(start + len - 1, nor->params->bank_size);
+}
+
 /* Generic helpers for internal locking and serialization */
+static bool spi_nor_rww_start_io(struct spi_nor *nor)
+{
+	bool start = false;
+
+	mutex_lock(&nor->lock);
+
+	if (nor->rww.ongoing_io)
+		goto busy;
+
+	nor->rww.ongoing_io = true;
+	start = true;
+
+busy:
+	mutex_unlock(&nor->lock);
+	return start;
+}
+
+static void spi_nor_rww_end_io(struct spi_nor *nor)
+{
+	mutex_lock(&nor->lock);
+	nor->rww.ongoing_io = false;
+	mutex_unlock(&nor->lock);
+}
+
+static int spi_nor_lock_device(struct spi_nor *nor)
+{
+	if (!spi_nor_parallel_locking(nor))
+		return 0;
+
+	return wait_event_killable(nor->rww.wait, spi_nor_rww_start_io(nor));
+}
+
+static void spi_nor_unlock_device(struct spi_nor *nor)
+{
+	if (spi_nor_parallel_locking(nor))
+		spi_nor_rww_end_io(nor);
+}
+
+/* Generic helpers for internal locking and serialization */
+static bool spi_nor_rww_start_exclusive(struct spi_nor *nor)
+{
+	bool start = false;
+
+	mutex_lock(&nor->lock);
+
+	if (nor->rww.ongoing_io || nor->rww.ongoing_rd || nor->rww.ongoing_pe)
+		goto busy;
+
+	nor->rww.ongoing_io = true;
+	nor->rww.ongoing_rd = true;
+	nor->rww.ongoing_pe = true;
+	start = true;
+
+busy:
+	mutex_unlock(&nor->lock);
+	return start;
+}
+
+static void spi_nor_rww_end_exclusive(struct spi_nor *nor)
+{
+	mutex_lock(&nor->lock);
+	nor->rww.ongoing_io = false;
+	nor->rww.ongoing_rd = false;
+	nor->rww.ongoing_pe = false;
+	mutex_unlock(&nor->lock);
+}
+
 int spi_nor_prep_and_lock(struct spi_nor *nor)
 {
 	int ret;
@@ -1096,19 +1240,71 @@  int spi_nor_prep_and_lock(struct spi_nor *nor)
 	if (ret)
 		return ret;
 
-	mutex_lock(&nor->lock);
+	if (!spi_nor_parallel_locking(nor))
+		mutex_lock(&nor->lock);
+	else
+		ret = wait_event_killable(nor->rww.wait,
+					  spi_nor_rww_start_exclusive(nor));
 
-	return 0;
+	return ret;
 }
 
 void spi_nor_unlock_and_unprep(struct spi_nor *nor)
 {
-	mutex_unlock(&nor->lock);
+	if (!spi_nor_parallel_locking(nor)) {
+		mutex_unlock(&nor->lock);
+	} else {
+		spi_nor_rww_end_exclusive(nor);
+		wake_up(&nor->rww.wait);
+	}
 
 	spi_nor_unprep(nor);
 }
 
 /* Internal locking helpers for program and erase operations */
+static bool spi_nor_rww_start_pe(struct spi_nor *nor, loff_t start, size_t len)
+{
+	unsigned int first, last;
+	bool started = false;
+	int bank;
+
+	mutex_lock(&nor->lock);
+
+	if (nor->rww.ongoing_io || nor->rww.ongoing_rd || nor->rww.ongoing_pe)
+		goto busy;
+
+	spi_nor_offset_to_banks(nor, start, len, &first, &last);
+	for (bank = first; bank <= last; bank++)
+		if (nor->rww.used_banks & BIT(bank))
+			goto busy;
+
+	for (bank = first; bank <= last; bank++)
+		nor->rww.used_banks |= BIT(bank);
+
+	nor->rww.ongoing_pe = true;
+	started = true;
+
+busy:
+	mutex_unlock(&nor->lock);
+	return started;
+}
+
+static void spi_nor_rww_end_pe(struct spi_nor *nor, loff_t start, size_t len)
+{
+	unsigned int first, last;
+	int bank;
+
+	mutex_lock(&nor->lock);
+
+	spi_nor_offset_to_banks(nor, start, len, &first, &last);
+	for (bank = first; bank <= last; bank++)
+		nor->rww.used_banks &= ~BIT(bank);
+
+	nor->rww.ongoing_pe = false;
+
+	mutex_unlock(&nor->lock);
+}
+
 static int spi_nor_prep_and_lock_pe(struct spi_nor *nor, loff_t start, size_t len)
 {
 	int ret;
@@ -1117,19 +1313,73 @@  static int spi_nor_prep_and_lock_pe(struct spi_nor *nor, loff_t start, size_t le
 	if (ret)
 		return ret;
 
-	mutex_lock(&nor->lock);
+	if (!spi_nor_parallel_locking(nor))
+		mutex_lock(&nor->lock);
+	else
+		ret = wait_event_killable(nor->rww.wait,
+					  spi_nor_rww_start_pe(nor, start, len));
 
-	return 0;
+	return ret;
 }
 
 static void spi_nor_unlock_and_unprep_pe(struct spi_nor *nor, loff_t start, size_t len)
 {
-	mutex_unlock(&nor->lock);
+	if (!spi_nor_parallel_locking(nor)) {
+		mutex_unlock(&nor->lock);
+	} else {
+		spi_nor_rww_end_pe(nor, start, len);
+		wake_up(&nor->rww.wait);
+	}
 
 	spi_nor_unprep(nor);
 }
 
 /* Internal locking helpers for read operations */
+static bool spi_nor_rww_start_rd(struct spi_nor *nor, loff_t start, size_t len)
+{
+	unsigned int first, last;
+	bool started = false;
+	int bank;
+
+	mutex_lock(&nor->lock);
+
+	if (nor->rww.ongoing_io || nor->rww.ongoing_rd)
+		goto busy;
+
+	spi_nor_offset_to_banks(nor, start, len, &first, &last);
+	for (bank = first; bank <= last; bank++)
+		if (nor->rww.used_banks & BIT(bank))
+			goto busy;
+
+	for (bank = first; bank <= last; bank++)
+		nor->rww.used_banks |= BIT(bank);
+
+	nor->rww.ongoing_io = true;
+	nor->rww.ongoing_rd = true;
+	started = true;
+
+busy:
+	mutex_unlock(&nor->lock);
+	return started;
+}
+
+static void spi_nor_rww_end_rd(struct spi_nor *nor, loff_t start, size_t len)
+{
+	unsigned int first, last;
+	int bank;
+
+	mutex_lock(&nor->lock);
+
+	spi_nor_offset_to_banks(nor, start, len, &first, &last);
+	for (bank = first; bank <= last; bank++)
+		nor->rww.used_banks &= ~BIT(bank);
+
+	nor->rww.ongoing_io = false;
+	nor->rww.ongoing_rd = false;
+
+	mutex_unlock(&nor->lock);
+}
+
 static int spi_nor_prep_and_lock_rd(struct spi_nor *nor, loff_t start, size_t len)
 {
 	int ret;
@@ -1138,14 +1388,23 @@  static int spi_nor_prep_and_lock_rd(struct spi_nor *nor, loff_t start, size_t le
 	if (ret)
 		return ret;
 
-	mutex_lock(&nor->lock);
+	if (!spi_nor_parallel_locking(nor))
+		mutex_lock(&nor->lock);
+	else
+		ret = wait_event_killable(nor->rww.wait,
+					  spi_nor_rww_start_rd(nor, start, len));
 
-	return 0;
+	return ret;
 }
 
 static void spi_nor_unlock_and_unprep_rd(struct spi_nor *nor, loff_t start, size_t len)
 {
-	mutex_unlock(&nor->lock);
+	if (!spi_nor_parallel_locking(nor)) {
+		mutex_unlock(&nor->lock);
+	} else {
+		spi_nor_rww_end_rd(nor, start, len);
+		wake_up(&nor->rww.wait);
+	}
 
 	spi_nor_unprep(nor);
 }
@@ -1454,11 +1713,18 @@  static int spi_nor_erase_multi_sectors(struct spi_nor *nor, u64 addr, u32 len)
 			dev_vdbg(nor->dev, "erase_cmd->size = 0x%08x, erase_cmd->opcode = 0x%02x, erase_cmd->count = %u\n",
 				 cmd->size, cmd->opcode, cmd->count);
 
-			ret = spi_nor_write_enable(nor);
+			ret = spi_nor_lock_device(nor);
 			if (ret)
 				goto destroy_erase_cmd_list;
 
+			ret = spi_nor_write_enable(nor);
+			if (ret) {
+				spi_nor_unlock_device(nor);
+				goto destroy_erase_cmd_list;
+			}
+
 			ret = spi_nor_erase_sector(nor, addr);
+			spi_nor_unlock_device(nor);
 			if (ret)
 				goto destroy_erase_cmd_list;
 
@@ -1511,11 +1777,18 @@  static int spi_nor_erase(struct mtd_info *mtd, struct erase_info *instr)
 	if (len == mtd->size && !(nor->flags & SNOR_F_NO_OP_CHIP_ERASE)) {
 		unsigned long timeout;
 
-		ret = spi_nor_write_enable(nor);
+		ret = spi_nor_lock_device(nor);
 		if (ret)
 			goto erase_err;
 
+		ret = spi_nor_write_enable(nor);
+		if (ret) {
+			spi_nor_unlock_device(nor);
+			goto erase_err;
+		}
+
 		ret = spi_nor_erase_chip(nor);
+		spi_nor_unlock_device(nor);
 		if (ret)
 			goto erase_err;
 
@@ -1540,11 +1813,18 @@  static int spi_nor_erase(struct mtd_info *mtd, struct erase_info *instr)
 	/* "sector"-at-a-time erase */
 	} else if (spi_nor_has_uniform_erase(nor)) {
 		while (len) {
-			ret = spi_nor_write_enable(nor);
+			ret = spi_nor_lock_device(nor);
 			if (ret)
 				goto erase_err;
 
+			ret = spi_nor_write_enable(nor);
+			if (ret) {
+				spi_nor_unlock_device(nor);
+				goto erase_err;
+			}
+
 			ret = spi_nor_erase_sector(nor, addr);
+			spi_nor_unlock_device(nor);
 			if (ret)
 				goto erase_err;
 
@@ -1837,11 +2117,18 @@  static int spi_nor_write(struct mtd_info *mtd, loff_t to, size_t len,
 
 		addr = spi_nor_convert_addr(nor, addr);
 
-		ret = spi_nor_write_enable(nor);
+		ret = spi_nor_lock_device(nor);
 		if (ret)
 			goto write_err;
 
+		ret = spi_nor_write_enable(nor);
+		if (ret) {
+			spi_nor_unlock_device(nor);
+			goto write_err;
+		}
+
 		ret = spi_nor_write_data(nor, addr, page_remain, buf + i);
+		spi_nor_unlock_device(nor);
 		if (ret < 0)
 			goto write_err;
 		written = ret;
@@ -3111,6 +3398,8 @@  int spi_nor_scan(struct spi_nor *nor, const char *name,
 	nor->info = info;
 
 	mutex_init(&nor->lock);
+	if (spi_nor_parallel_locking(nor))
+		init_waitqueue_head(&nor->rww.wait);
 
 	/* Init flash parameters based on flash_info struct and SFDP */
 	ret = spi_nor_init_params(nor);
diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h
index 25765556223a..ec338463d563 100644
--- a/include/linux/mtd/spi-nor.h
+++ b/include/linux/mtd/spi-nor.h
@@ -344,6 +344,12 @@  struct spi_nor_flash_parameter;
  * struct spi_nor - Structure for defining the SPI NOR layer
  * @mtd:		an mtd_info structure
  * @lock:		the lock for the read/write/erase/lock/unlock operations
+ * @rww:		Read-While-Write (RWW) sync lock
+ * @rww.wait:		wait queue for the RWW sync
+ * @rww.ongoing_io:	the bus is busy
+ * @rww.ongoing_rd:	a read is ongoing on the chip
+ * @rww.ongoing_pe:	a program/erase is ongoing on the chip
+ * @rww.used_banks:	bitmap of the banks in use
  * @dev:		pointer to an SPI device or an SPI NOR controller device
  * @spimem:		pointer to the SPI memory device
  * @bouncebuf:		bounce buffer used when the buffer passed by the MTD
@@ -377,6 +383,13 @@  struct spi_nor_flash_parameter;
 struct spi_nor {
 	struct mtd_info		mtd;
 	struct mutex		lock;
+	struct {
+		wait_queue_head_t wait;
+		bool		ongoing_io;
+		bool		ongoing_rd;
+		bool		ongoing_pe;
+		unsigned int	used_banks;
+	} rww;
 	struct device		*dev;
 	struct spi_mem		*spimem;
 	u8			*bouncebuf;