Message ID | 20230412061700.1492474-5-AVKrasnov@sberdevices.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | refactoring and fix for Meson NAND | expand |
Hi Arseniy, AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > This NAND reads only few user's bytes in ECC mode (not full OOB), so "This NAND reads" does not look right, do you mean "Subpage reads do not retrieve all the OOB bytes,"? > fill OOB buffer with zeroes to not return garbage from previous reads > to user. > Otherwise 'nanddump' utility prints something like this for just erased > page: > > ... > 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > > Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > --- > drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > index f84a10238e4d..f2f2472cb511 100644 > --- a/drivers/mtd/nand/raw/meson_nand.c > +++ b/drivers/mtd/nand/raw/meson_nand.c > @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > int oob_required, int page) > { > + struct mtd_info *mtd = nand_to_mtd(nand); > u8 *oob_buf = nand->oob_poi; > int ret; > > + memset(oob_buf, 0, mtd->oobsize); I'm surprised raw reads do not read the entire OOB? > + > ret = meson_nfc_read_page_sub(nand, page, 1); > if (ret) > return ret; > @@ -881,6 +884,8 @@ static int meson_nfc_read_page_hwecc(struct nand_chip *nand, u8 *buf, > u8 *oob_buf = nand->oob_poi; > int ret, i; > > + memset(oob_buf, 0, mtd->oobsize); > + > ret = meson_nfc_read_page_sub(nand, page, 0); > if (ret) > return ret; Thanks, Miquèl
Hi again, miquel.raynal@bootlin.com wrote on Wed, 12 Apr 2023 09:44:00 +0200: > Hi Arseniy, > > AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > > > This NAND reads only few user's bytes in ECC mode (not full OOB), so > > "This NAND reads" does not look right, do you mean "Subpage reads do > not retrieve all the OOB bytes,"? > > > fill OOB buffer with zeroes to not return garbage from previous reads > > to user. > > Otherwise 'nanddump' utility prints something like this for just erased > > page: > > > > ... > > 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > > OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > > OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > > OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > > > > Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > --- > > drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > > index f84a10238e4d..f2f2472cb511 100644 > > --- a/drivers/mtd/nand/raw/meson_nand.c > > +++ b/drivers/mtd/nand/raw/meson_nand.c > > @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > > static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > > int oob_required, int page) > > { > > + struct mtd_info *mtd = nand_to_mtd(nand); > > u8 *oob_buf = nand->oob_poi; > > int ret; > > > > + memset(oob_buf, 0, mtd->oobsize); Should use 0xff instead of 0x00, that's the default state of a NAND cell. And also this memset should be conditioned to 'oob_required' I guess? > > I'm surprised raw reads do not read the entire OOB? > > > + > > ret = meson_nfc_read_page_sub(nand, page, 1); > > if (ret) > > return ret; > > @@ -881,6 +884,8 @@ static int meson_nfc_read_page_hwecc(struct nand_chip *nand, u8 *buf, > > u8 *oob_buf = nand->oob_poi; > > int ret, i; > > > > + memset(oob_buf, 0, mtd->oobsize); > > + > > ret = meson_nfc_read_page_sub(nand, page, 0); > > if (ret) > > return ret; > > > Thanks, > Miquèl Thanks, Miquèl
On 12.04.2023 10:44, Miquel Raynal wrote: > Hi Arseniy, > > AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > >> This NAND reads only few user's bytes in ECC mode (not full OOB), so > > "This NAND reads" does not look right, do you mean "Subpage reads do > not retrieve all the OOB bytes,"? > >> fill OOB buffer with zeroes to not return garbage from previous reads >> to user. >> Otherwise 'nanddump' utility prints something like this for just erased >> page: >> >> ... >> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >> >> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >> --- >> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >> index f84a10238e4d..f2f2472cb511 100644 >> --- a/drivers/mtd/nand/raw/meson_nand.c >> +++ b/drivers/mtd/nand/raw/meson_nand.c >> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >> int oob_required, int page) >> { >> + struct mtd_info *mtd = nand_to_mtd(nand); >> u8 *oob_buf = nand->oob_poi; >> int ret; >> >> + memset(oob_buf, 0, mtd->oobsize); > > I'm surprised raw reads do not read the entire OOB? Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read still depends on ECC parameters: for each portion of data covered with ECC code we can read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. Thanks, Arseniy > >> + >> ret = meson_nfc_read_page_sub(nand, page, 1); >> if (ret) >> return ret; >> @@ -881,6 +884,8 @@ static int meson_nfc_read_page_hwecc(struct nand_chip *nand, u8 *buf, >> u8 *oob_buf = nand->oob_poi; >> int ret, i; >> >> + memset(oob_buf, 0, mtd->oobsize); >> + >> ret = meson_nfc_read_page_sub(nand, page, 0); >> if (ret) >> return ret; > > > Thanks, > Miquèl
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > On 12.04.2023 10:44, Miquel Raynal wrote: > > Hi Arseniy, > > > > AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > > > >> This NAND reads only few user's bytes in ECC mode (not full OOB), so > > > > "This NAND reads" does not look right, do you mean "Subpage reads do > > not retrieve all the OOB bytes,"? > > > >> fill OOB buffer with zeroes to not return garbage from previous reads > >> to user. > >> Otherwise 'nanddump' utility prints something like this for just erased > >> page: > >> > >> ... > >> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > >> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > >> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > >> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > >> > >> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > >> --- > >> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > >> 1 file changed, 5 insertions(+) > >> > >> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > >> index f84a10238e4d..f2f2472cb511 100644 > >> --- a/drivers/mtd/nand/raw/meson_nand.c > >> +++ b/drivers/mtd/nand/raw/meson_nand.c > >> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > >> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > >> int oob_required, int page) > >> { > >> + struct mtd_info *mtd = nand_to_mtd(nand); > >> u8 *oob_buf = nand->oob_poi; > >> int ret; > >> > >> + memset(oob_buf, 0, mtd->oobsize); > > > > I'm surprised raw reads do not read the entire OOB? > > Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > still depends on ECC parameters: for each portion of data covered with ECC code we can > read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. In all modes, when you read OOB, you should get the full OOB. The fact that ECC correction is enabled or disabled does not matter. If the NAND features OOB sections of 64 bytes, you should get the 64 bytes. What happens sometimes, is that some of the bytes are not protected against bitflips, but the policy is to return the full buffer. > > Thanks, Arseniy > > > > >> + > >> ret = meson_nfc_read_page_sub(nand, page, 1); > >> if (ret) > >> return ret; > >> @@ -881,6 +884,8 @@ static int meson_nfc_read_page_hwecc(struct nand_chip *nand, u8 *buf, > >> u8 *oob_buf = nand->oob_poi; > >> int ret, i; > >> > >> + memset(oob_buf, 0, mtd->oobsize); > >> + > >> ret = meson_nfc_read_page_sub(nand, page, 0); > >> if (ret) > >> return ret; > > > > > > Thanks, > > Miquèl Thanks, Miquèl
On 12.04.2023 12:36, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > >> On 12.04.2023 10:44, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>> >>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>> >>> "This NAND reads" does not look right, do you mean "Subpage reads do >>> not retrieve all the OOB bytes,"? >>> >>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>> to user. >>>> Otherwise 'nanddump' utility prints something like this for just erased >>>> page: >>>> >>>> ... >>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>> >>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>> --- >>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>> 1 file changed, 5 insertions(+) >>>> >>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>> index f84a10238e4d..f2f2472cb511 100644 >>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>> int oob_required, int page) >>>> { >>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>> u8 *oob_buf = nand->oob_poi; >>>> int ret; >>>> >>>> + memset(oob_buf, 0, mtd->oobsize); >>> >>> I'm surprised raw reads do not read the entire OOB? >> >> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >> still depends on ECC parameters: for each portion of data covered with ECC code we can >> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > > In all modes, when you read OOB, you should get the full OOB. The fact > that ECC correction is enabled or disabled does not matter. If the NAND > features OOB sections of 64 bytes, you should get the 64 bytes. > > What happens sometimes, is that some of the bytes are not protected > against bitflips, but the policy is to return the full buffer. Ok, so to clarify case for this NAND controller: 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes + ECC codes)? 2) If I have access to only 32 bytes of OOB (in case above), I must report that size of OOB is only 32 bytes during initialization? Thanks, Arseniy > >> >> Thanks, Arseniy >> >>> >>>> + >>>> ret = meson_nfc_read_page_sub(nand, page, 1); >>>> if (ret) >>>> return ret; >>>> @@ -881,6 +884,8 @@ static int meson_nfc_read_page_hwecc(struct nand_chip *nand, u8 *buf, >>>> u8 *oob_buf = nand->oob_poi; >>>> int ret, i; >>>> >>>> + memset(oob_buf, 0, mtd->oobsize); >>>> + >>>> ret = meson_nfc_read_page_sub(nand, page, 0); >>>> if (ret) >>>> return ret; >>> >>> >>> Thanks, >>> Miquèl > > > Thanks, > Miquèl
Hi Arseniy and Miquel, On 2023/4/12 18:14, Arseniy Krasnov wrote: > [ EXTERNAL EMAIL ] > > > > On 12.04.2023 12:36, Miquel Raynal wrote: >> Hi Arseniy, >> >> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >> >>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>> Hi Arseniy, >>>> >>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>> >>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>> >>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>> not retrieve all the OOB bytes,"? >>>> >>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>> to user. >>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>> page: >>>>> >>>>> ... >>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>> >>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>> --- >>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>> 1 file changed, 5 insertions(+) >>>>> >>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>> int oob_required, int page) >>>>> { >>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>> u8 *oob_buf = nand->oob_poi; >>>>> int ret; >>>>> >>>>> + memset(oob_buf, 0, mtd->oobsize); >>>> >>>> I'm surprised raw reads do not read the entire OOB? >>> >>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >> >> In all modes, when you read OOB, you should get the full OOB. The fact >> that ECC correction is enabled or disabled does not matter. If the NAND >> features OOB sections of 64 bytes, you should get the 64 bytes. >> >> What happens sometimes, is that some of the bytes are not protected >> against bitflips, but the policy is to return the full buffer. > > Ok, so to clarify case for this NAND controller: > 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > + ECC codes)? > 2) If I have access to only 32 bytes of OOB (in case above), I must report that size > of OOB is only 32 bytes during initialization? > > Thanks, Arseniy Yes. it should return all the OOB data. i make a mistake on raw read and there is wrong code in meson_nfc_read_page_raw(). meson_nfc_get_data_oob(nand, buf, oob_buf); changed to: if (oob_required) memcpy(oob_buf, buf + mtd->writesize, mtd->oobsize) for the ECC mode, i define the meson_ooblayout_ops in host driver. > >> >>> >>> Thanks, Arseniy >>> >>>> >>>>> + >>>>> ret = meson_nfc_read_page_sub(nand, page, 1); >>>>> if (ret) >>>>> return ret; >>>>> @@ -881,6 +884,8 @@ static int meson_nfc_read_page_hwecc(struct nand_chip *nand, u8 *buf, >>>>> u8 *oob_buf = nand->oob_poi; >>>>> int ret, i; >>>>> >>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>> + >>>>> ret = meson_nfc_read_page_sub(nand, page, 0); >>>>> if (ret) >>>>> return ret; >>>> >>>> >>>> Thanks, >>>> Miquèl >> >> >> Thanks, >> Miquèl >
Hi, On 2023/4/12 18:51, Liang Yang wrote: >>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c >>>>>> b/drivers/mtd/nand/raw/meson_nand.c >>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct >>>>>> nand_chip *nand, >>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>> int oob_required, int page) >>>>>> { >>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>> u8 *oob_buf = nand->oob_poi; >>>>>> int ret; >>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>> >>>>> I'm surprised raw reads do not read the entire OOB? >>>> >>>> Yes! Seems in case of raw access (what i see in this driver) number >>>> of OOB bytes read >>>> still depends on ECC parameters: for each portion of data covered >>>> with ECC code we can >>>> read it's ECC code and "user bytes" from OOB - it is what i see by >>>> dumping DMA buffer by >>>> printk(). For example I'm working with 2K NAND pages, each page has >>>> 2 x 1K ECC blocks. >>>> For each ECC block I have 16 OOB bytes which I can access by >>>> read/write. Each 16 bytes >>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I >>>> read page in raw mode >>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is >>>> reported as 64 bytes. >>> >>> In all modes, when you read OOB, you should get the full OOB. The fact >>> that ECC correction is enabled or disabled does not matter. If the NAND >>> features OOB sections of 64 bytes, you should get the 64 bytes. >>> >>> What happens sometimes, is that some of the bytes are not protected >>> against bitflips, but the policy is to return the full buffer. >> >> Ok, so to clarify case for this NAND controller: >> 1) In both ECC and raw modes i need to return the same raw OOB data >> (e.g. user bytes >> + ECC codes)? >> 2) If I have access to only 32 bytes of OOB (in case above), I must >> report that size >> of OOB is only 32 bytes during initialization? >> >> Thanks, Arseniy > > Yes. it should return all the OOB data. i make a mistake on raw read and > there is wrong code in meson_nfc_read_page_raw(). > meson_nfc_get_data_oob(nand, buf, oob_buf); > changed to: > if (oob_required) > memcpy(oob_buf, buf + mtd->writesize, mtd->oobsize) Sorry, please ignore this. the previous code is right. the controller changes the layout of one page; the physical layout is 2048 main data + 64 oob data. after writing into NAND page, it is stored like this: 1024 main data + 2 user bytes + 14 ECC parity bytes + 1024 main data + 2 user bytes + 14 ECC parity bytes. so that is right we only get 4 user bytes and 28 ECC parity bytes, total 32 bytes. that is the behavior of the controller that transferring one ECC page(1KB) brings back only 2 user bytes. because layout is changed by controller, so go back to the function. meson_nfc_get_data_oob(nand, buf, oob_buf) try to get the right user and ecc parity bytes from the right pos. after that, the other oob bytes is not reading from NAND flash.
Hello Liang, On Wed, Apr 12, 2023 at 07:36:30PM +0800, Liang Yang wrote: > Hi, > > On 2023/4/12 18:51, Liang Yang wrote: > > > > > > > diff --git a/drivers/mtd/nand/raw/meson_nand.c > > > > > > > b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > index f84a10238e4d..f2f2472cb511 100644 > > > > > > > --- a/drivers/mtd/nand/raw/meson_nand.c > > > > > > > +++ b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > @@ -858,9 +858,12 @@ static int > > > > > > > meson_nfc_read_page_sub(struct nand_chip *nand, > > > > > > > static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > > > > > > > int oob_required, int page) > > > > > > > { > > > > > > > + struct mtd_info *mtd = nand_to_mtd(nand); > > > > > > > u8 *oob_buf = nand->oob_poi; > > > > > > > int ret; > > > > > > > + memset(oob_buf, 0, mtd->oobsize); > > > > > > > > > > > > I'm surprised raw reads do not read the entire OOB? > > > > > > > > > > Yes! Seems in case of raw access (what i see in this driver) > > > > > number of OOB bytes read > > > > > still depends on ECC parameters: for each portion of data > > > > > covered with ECC code we can > > > > > read it's ECC code and "user bytes" from OOB - it is what i > > > > > see by dumping DMA buffer by > > > > > printk(). For example I'm working with 2K NAND pages, each > > > > > page has 2 x 1K ECC blocks. > > > > > For each ECC block I have 16 OOB bytes which I can access by > > > > > read/write. Each 16 bytes > > > > > contains 2 bytes of user's data and 14 bytes ECC codes. So > > > > > when I read page in raw mode > > > > > controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB > > > > > is reported as 64 bytes. > > > > > > > > In all modes, when you read OOB, you should get the full OOB. The fact > > > > that ECC correction is enabled or disabled does not matter. If the NAND > > > > features OOB sections of 64 bytes, you should get the 64 bytes. > > > > > > > > What happens sometimes, is that some of the bytes are not protected > > > > against bitflips, but the policy is to return the full buffer. > > > > > > Ok, so to clarify case for this NAND controller: > > > 1) In both ECC and raw modes i need to return the same raw OOB data > > > (e.g. user bytes > > > + ECC codes)? > > > 2) If I have access to only 32 bytes of OOB (in case above), I must > > > report that size > > > of OOB is only 32 bytes during initialization? > > > > > > Thanks, Arseniy > > > > Yes. it should return all the OOB data. i make a mistake on raw read and > > there is wrong code in meson_nfc_read_page_raw(). > > meson_nfc_get_data_oob(nand, buf, oob_buf); > > changed to: > > if (oob_required) > > memcpy(oob_buf, buf + mtd->writesize, mtd->oobsize) > > Sorry, please ignore this. the previous code is right. > > the controller changes the layout of one page; the physical layout is 2048 > main data + 64 oob data. after writing into NAND page, it is stored > like this: 1024 main data + 2 user bytes + 14 ECC parity bytes + 1024 main > data + 2 user bytes + 14 ECC parity bytes. so that is right we only get 4 > user bytes and 28 ECC parity bytes, total 32 bytes. that is the behavior of > the controller that transferring one ECC page(1KB) brings back only 2 user > bytes. > > because layout is changed by controller, so go back to the function. > meson_nfc_get_data_oob(nand, buf, oob_buf) try to get the right user and ecc > parity bytes from the right pos. after that, the other oob bytes is not > reading from NAND flash. I have always been under the impression that NAND OOB layout falls under the responsibility of the flash driver. Is this specific to the Amlogic NAND, and does it map the flash layout to the internal controller layout? For example, different OOB layouts exist between Macronix and ESMT. Apologies for any confusion, and thank you in advance for any help in clarifying this matter.
On 12.04.2023 14:43, Dmitry Rokosov wrote: > Hello Liang, > > On Wed, Apr 12, 2023 at 07:36:30PM +0800, Liang Yang wrote: >> Hi, >> >> On 2023/4/12 18:51, Liang Yang wrote: >>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> @@ -858,9 +858,12 @@ static int >>>>>>>> meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>> int oob_required, int page) >>>>>>>> { >>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>> int ret; >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>> >>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>> >>>>>> Yes! Seems in case of raw access (what i see in this driver) >>>>>> number of OOB bytes read >>>>>> still depends on ECC parameters: for each portion of data >>>>>> covered with ECC code we can >>>>>> read it's ECC code and "user bytes" from OOB - it is what i >>>>>> see by dumping DMA buffer by >>>>>> printk(). For example I'm working with 2K NAND pages, each >>>>>> page has 2 x 1K ECC blocks. >>>>>> For each ECC block I have 16 OOB bytes which I can access by >>>>>> read/write. Each 16 bytes >>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So >>>>>> when I read page in raw mode >>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB >>>>>> is reported as 64 bytes. >>>>> >>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>> >>>>> What happens sometimes, is that some of the bytes are not protected >>>>> against bitflips, but the policy is to return the full buffer. >>>> >>>> Ok, so to clarify case for this NAND controller: >>>> 1) In both ECC and raw modes i need to return the same raw OOB data >>>> (e.g. user bytes >>>> + ECC codes)? >>>> 2) If I have access to only 32 bytes of OOB (in case above), I must >>>> report that size >>>> of OOB is only 32 bytes during initialization? >>>> >>>> Thanks, Arseniy >>> >>> Yes. it should return all the OOB data. i make a mistake on raw read and >>> there is wrong code in meson_nfc_read_page_raw(). >>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>> changed to: >>> if (oob_required) >>> memcpy(oob_buf, buf + mtd->writesize, mtd->oobsize) >> >> Sorry, please ignore this. the previous code is right. >> >> the controller changes the layout of one page; the physical layout is 2048 >> main data + 64 oob data. after writing into NAND page, it is stored >> like this: 1024 main data + 2 user bytes + 14 ECC parity bytes + 1024 main >> data + 2 user bytes + 14 ECC parity bytes. so that is right we only get 4 >> user bytes and 28 ECC parity bytes, total 32 bytes. that is the behavior of >> the controller that transferring one ECC page(1KB) brings back only 2 user >> bytes. So i perform the following test, using file with the following pattern: [ 1024 x 0x11 ][ 1024 x 0x22 ][ 16 x 0x33 ][ 16 x 0x44 ][ 16 x 0x55 ][ 16 x 0x66 ] Total 2112 bytes (2048 data + 64 oob) I write it using 'nandwrite' tool with flags '--oob' and '--noecc' - e.g. raw mode. meson_nfc_set_data_oob() places to DMA buffer only parts with 0x11, 0x22, 0x33, 0x44. After write data bytes will be 0x11 and 0x22, first 32 bytes of OOB will be 0x33 and 0x44. Parts with 0x55 and 0x66 will be obviously missed. So it is Amlogic limitation for 32 bytes of OOB (in case of 1K ECC as You mentioned above)? And there is no way to access to the whole 64 bytes of OOB even in raw mode (and therefore in ECC mode)? Thanks, Arseniy >> >> because layout is changed by controller, so go back to the function. >> meson_nfc_get_data_oob(nand, buf, oob_buf) try to get the right user and ecc >> parity bytes from the right pos. after that, the other oob bytes is not >> reading from NAND flash. > > I have always been under the impression that NAND OOB layout falls under the > responsibility of the flash driver. Is this specific to the Amlogic NAND, > and does it map the flash layout to the internal controller layout? > For example, different OOB layouts exist between Macronix and ESMT. > > Apologies for any confusion, and thank you in advance for any help in > clarifying this matter. >
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > On 12.04.2023 12:36, Miquel Raynal wrote: > > Hi Arseniy, > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > > > >> On 12.04.2023 10:44, Miquel Raynal wrote: > >>> Hi Arseniy, > >>> > >>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > >>> > >>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so > >>> > >>> "This NAND reads" does not look right, do you mean "Subpage reads do > >>> not retrieve all the OOB bytes,"? > >>> > >>>> fill OOB buffer with zeroes to not return garbage from previous reads > >>>> to user. > >>>> Otherwise 'nanddump' utility prints something like this for just erased > >>>> page: > >>>> > >>>> ... > >>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > >>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > >>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > >>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > >>>> > >>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > >>>> --- > >>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > >>>> 1 file changed, 5 insertions(+) > >>>> > >>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > >>>> index f84a10238e4d..f2f2472cb511 100644 > >>>> --- a/drivers/mtd/nand/raw/meson_nand.c > >>>> +++ b/drivers/mtd/nand/raw/meson_nand.c > >>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > >>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > >>>> int oob_required, int page) > >>>> { > >>>> + struct mtd_info *mtd = nand_to_mtd(nand); > >>>> u8 *oob_buf = nand->oob_poi; > >>>> int ret; > >>>> > >>>> + memset(oob_buf, 0, mtd->oobsize); > >>> > >>> I'm surprised raw reads do not read the entire OOB? > >> > >> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > >> still depends on ECC parameters: for each portion of data covered with ECC code we can > >> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > >> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > >> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > >> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > >> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > > > > In all modes, when you read OOB, you should get the full OOB. The fact > > that ECC correction is enabled or disabled does not matter. If the NAND > > features OOB sections of 64 bytes, you should get the 64 bytes. > > > > What happens sometimes, is that some of the bytes are not protected > > against bitflips, but the policy is to return the full buffer. > > Ok, so to clarify case for this NAND controller: > 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > + ECC codes)? Well, you need to cover the same amount of data, yes. But in the ECC case the data won't be raw (at least not all of it). > 2) If I have access to only 32 bytes of OOB (in case above), I must report that size > of OOB is only 32 bytes during initialization? I believe it's just an implementation error in the controller driver. This controller can be used on NAND chips with other geometries, please don't play with it. Thanks, Miquèl
On 12.04.2023 15:18, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > >> On 12.04.2023 12:36, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>> >>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>> Hi Arseniy, >>>>> >>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>> >>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>> >>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>> not retrieve all the OOB bytes,"? >>>>> >>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>> to user. >>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>> page: >>>>>> >>>>>> ... >>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>> >>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>> --- >>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>> 1 file changed, 5 insertions(+) >>>>>> >>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>> int oob_required, int page) >>>>>> { >>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>> u8 *oob_buf = nand->oob_poi; >>>>>> int ret; >>>>>> >>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>> >>>>> I'm surprised raw reads do not read the entire OOB? >>>> >>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>> >>> In all modes, when you read OOB, you should get the full OOB. The fact >>> that ECC correction is enabled or disabled does not matter. If the NAND >>> features OOB sections of 64 bytes, you should get the 64 bytes. >>> >>> What happens sometimes, is that some of the bytes are not protected >>> against bitflips, but the policy is to return the full buffer. >> >> Ok, so to clarify case for this NAND controller: >> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >> + ECC codes)? > > Well, you need to cover the same amount of data, yes. But in the ECC > case the data won't be raw (at least not all of it). So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? Thanks, Arseniy > >> 2) If I have access to only 32 bytes of OOB (in case above), I must report that size >> of OOB is only 32 bytes during initialization? > > I believe it's just an implementation error in the controller driver. > This controller can be used on NAND chips with other geometries, please > don't play with it. > > Thanks, > Miquèl
Hi Dmitry, On 2023/4/12 19:43, Dmitry Rokosov wrote: > [ EXTERNAL EMAIL ] > > Hello Liang, > > On Wed, Apr 12, 2023 at 07:36:30PM +0800, Liang Yang wrote: >> Hi, >> >> On 2023/4/12 18:51, Liang Yang wrote: >>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> @@ -858,9 +858,12 @@ static int >>>>>>>> meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>> int oob_required, int page) >>>>>>>> { >>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>> int ret; >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>> >>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>> >>>>>> Yes! Seems in case of raw access (what i see in this driver) >>>>>> number of OOB bytes read >>>>>> still depends on ECC parameters: for each portion of data >>>>>> covered with ECC code we can >>>>>> read it's ECC code and "user bytes" from OOB - it is what i >>>>>> see by dumping DMA buffer by >>>>>> printk(). For example I'm working with 2K NAND pages, each >>>>>> page has 2 x 1K ECC blocks. >>>>>> For each ECC block I have 16 OOB bytes which I can access by >>>>>> read/write. Each 16 bytes >>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So >>>>>> when I read page in raw mode >>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB >>>>>> is reported as 64 bytes. >>>>> >>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>> >>>>> What happens sometimes, is that some of the bytes are not protected >>>>> against bitflips, but the policy is to return the full buffer. >>>> >>>> Ok, so to clarify case for this NAND controller: >>>> 1) In both ECC and raw modes i need to return the same raw OOB data >>>> (e.g. user bytes >>>> + ECC codes)? >>>> 2) If I have access to only 32 bytes of OOB (in case above), I must >>>> report that size >>>> of OOB is only 32 bytes during initialization? >>>> >>>> Thanks, Arseniy >>> >>> Yes. it should return all the OOB data. i make a mistake on raw read and >>> there is wrong code in meson_nfc_read_page_raw(). >>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>> changed to: >>> if (oob_required) >>> memcpy(oob_buf, buf + mtd->writesize, mtd->oobsize) >> >> Sorry, please ignore this. the previous code is right. >> >> the controller changes the layout of one page; the physical layout is 2048 >> main data + 64 oob data. after writing into NAND page, it is stored >> like this: 1024 main data + 2 user bytes + 14 ECC parity bytes + 1024 main >> data + 2 user bytes + 14 ECC parity bytes. so that is right we only get 4 >> user bytes and 28 ECC parity bytes, total 32 bytes. that is the behavior of >> the controller that transferring one ECC page(1KB) brings back only 2 user >> bytes. >> >> because layout is changed by controller, so go back to the function. >> meson_nfc_get_data_oob(nand, buf, oob_buf) try to get the right user and ecc >> parity bytes from the right pos. after that, the other oob bytes is not >> reading from NAND flash. > > I have always been under the impression that NAND OOB layout falls under the > responsibility of the flash driver. Is this specific to the Amlogic NAND, > and does it map the flash layout to the internal controller layout? > For example, different OOB layouts exist between Macronix and ESMT. > > Apologies for any confusion, and thank you in advance for any help in > clarifying this matter. > This is the behavior of Amlogic NAND controller, not related to NAND device. this design of Amlogic NAND controller is reducing the fifo(sram) size inside the controller. To the NAND device with <2KB(main) + 64B(oob)> page, usually if the controller place user bytes and ECC parity bytes after 2048(column address). to finish the ECC calculation and checking, the controller has to read the data and its ECC parity bytes. if they are not placed continuously, the controllers have to read the whole page data including the oob data together or send random read command with column address. so consider that the controller needs a fifo of at least (2048 + 64) bytes inside. if the NAND device has bigger size of NAND page, the size of the fifo should be bigger. in Amlogic NAND controller, after changing the data layout of the NAND page, it just needs <1KB + 16B> fifo size. the controller firstly sends only <1024 + 16> clocks and get 1st ECC page data and its ECC parity bytes, and then it can start ECC calculation and checking. after that or at the same time, loop next ECC page data and its ECC parity.
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > On 12.04.2023 15:18, Miquel Raynal wrote: > > Hi Arseniy, > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > > > >> On 12.04.2023 12:36, Miquel Raynal wrote: > >>> Hi Arseniy, > >>> > >>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > >>> > >>>> On 12.04.2023 10:44, Miquel Raynal wrote: > >>>>> Hi Arseniy, > >>>>> > >>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > >>>>> > >>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so > >>>>> > >>>>> "This NAND reads" does not look right, do you mean "Subpage reads do > >>>>> not retrieve all the OOB bytes,"? > >>>>> > >>>>>> fill OOB buffer with zeroes to not return garbage from previous reads > >>>>>> to user. > >>>>>> Otherwise 'nanddump' utility prints something like this for just erased > >>>>>> page: > >>>>>> > >>>>>> ... > >>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > >>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > >>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > >>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > >>>>>> > >>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > >>>>>> --- > >>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > >>>>>> 1 file changed, 5 insertions(+) > >>>>>> > >>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > >>>>>> index f84a10238e4d..f2f2472cb511 100644 > >>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c > >>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c > >>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > >>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > >>>>>> int oob_required, int page) > >>>>>> { > >>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); > >>>>>> u8 *oob_buf = nand->oob_poi; > >>>>>> int ret; > >>>>>> > >>>>>> + memset(oob_buf, 0, mtd->oobsize); > >>>>> > >>>>> I'm surprised raw reads do not read the entire OOB? > >>>> > >>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > >>>> still depends on ECC parameters: for each portion of data covered with ECC code we can > >>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > >>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > >>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > >>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > >>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > >>> > >>> In all modes, when you read OOB, you should get the full OOB. The fact > >>> that ECC correction is enabled or disabled does not matter. If the NAND > >>> features OOB sections of 64 bytes, you should get the 64 bytes. > >>> > >>> What happens sometimes, is that some of the bytes are not protected > >>> against bitflips, but the policy is to return the full buffer. > >> > >> Ok, so to clarify case for this NAND controller: > >> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > >> + ECC codes)? > > > > Well, you need to cover the same amount of data, yes. But in the ECC > > case the data won't be raw (at least not all of it). > > So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? If the page are 2k+64B you should read 2k+64B when OOB are requested. If the controller only returns 2k+32B, then perform a random read to just move the read pointer to mtd->size + mtd->oobsize - 32 and retrieve the missing 32 bytes? This applies to the two modes, the only difference is: - with correction (commonly named "ECC mode"): the user bytes and ECC bytes should be fixed if there are any bitflips - without correction (commonly referred as "raw mode"): no correction applies, if there are bitflips, give them Please mind the raw mode can be slow, it's meant for debugging and testing, mainly. Page reads however should be fast, so if just moving the column pointer works, then do it, otherwise we'll consider returning FFs. Thanks, Miquèl
Hi Miquel and Arseniy, On 2023/4/12 20:57, Miquel Raynal wrote: > [ EXTERNAL EMAIL ] > > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > >> On 12.04.2023 15:18, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>> Hi Arseniy, >>>>> >>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>> Hi Arseniy, >>>>>>> >>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>>>> >>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>>>> not retrieve all the OOB bytes,"? >>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>>>> to user. >>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>>>> page: >>>>>>>> >>>>>>>> ... >>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>> >>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>> --- >>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>> int oob_required, int page) >>>>>>>> { >>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>> int ret; >>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>> >>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>> >>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>>>> >>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>> >>>>> What happens sometimes, is that some of the bytes are not protected >>>>> against bitflips, but the policy is to return the full buffer. >>>> >>>> Ok, so to clarify case for this NAND controller: >>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >>>> + ECC codes)? >>> >>> Well, you need to cover the same amount of data, yes. But in the ECC >>> case the data won't be raw (at least not all of it). >> >> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. >> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct >> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > > If the page are 2k+64B you should read 2k+64B when OOB are requested. > > If the controller only returns 2k+32B, then perform a random read to > just move the read pointer to mtd->size + mtd->oobsize - 32 and > retrieve the missing 32 bytes? 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. { ...... meson_nfc_read_page_sub(nand, page, 1); meson_nfc_get_data_oob(nand, buf, oob_buf); oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); } 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > > This applies to the two modes, the only difference is: > - with correction (commonly named "ECC mode"): the user bytes and ECC > bytes should be fixed if there are any bitflips > - without correction (commonly referred as "raw mode"): no correction > applies, if there are bitflips, give them > > Please mind the raw mode can be slow, it's meant for debugging and > testing, mainly. Page reads however should be fast, so if just moving > the column pointer works, then do it, otherwise we'll consider > returning FFs. > > Thanks, > Miquèl >
Hello, liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: > Hi Miquel and Arseniy, > > On 2023/4/12 20:57, Miquel Raynal wrote: > > [ EXTERNAL EMAIL ] > > > > Hi Arseniy, > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > > > >> On 12.04.2023 15:18, Miquel Raynal wrote: > >>> Hi Arseniy, > >>> > >>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > >>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: > >>>>> Hi Arseniy, > >>>>> > >>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > >>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: > >>>>>>> Hi Arseniy, > >>>>>>> > >>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > >>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so > >>>>>>> > >>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do > >>>>>>> not retrieve all the OOB bytes,"? > >>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads > >>>>>>>> to user. > >>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased > >>>>>>>> page: > >>>>>>>> > >>>>>>>> ... > >>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > >>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > >>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > >>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > >>>>>>>> > >>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > >>>>>>>> --- > >>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > >>>>>>>> 1 file changed, 5 insertions(+) > >>>>>>>> > >>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>> index f84a10238e4d..f2f2472cb511 100644 > >>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > >>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > >>>>>>>> int oob_required, int page) > >>>>>>>> { > >>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); > >>>>>>>> u8 *oob_buf = nand->oob_poi; > >>>>>>>> int ret; > >>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); > >>>>>>> > >>>>>>> I'm surprised raw reads do not read the entire OOB? > >>>>>> > >>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > >>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can > >>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > >>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > >>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > >>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > >>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > >>>>> > >>>>> In all modes, when you read OOB, you should get the full OOB. The fact > >>>>> that ECC correction is enabled or disabled does not matter. If the NAND > >>>>> features OOB sections of 64 bytes, you should get the 64 bytes. > >>>>> > >>>>> What happens sometimes, is that some of the bytes are not protected > >>>>> against bitflips, but the policy is to return the full buffer. > >>>> > >>>> Ok, so to clarify case for this NAND controller: > >>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > >>>> + ECC codes)? > >>> > >>> Well, you need to cover the same amount of data, yes. But in the ECC > >>> case the data won't be raw (at least not all of it). > >> > >> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > >> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > >> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > > > > If the page are 2k+64B you should read 2k+64B when OOB are requested. > > > > If the controller only returns 2k+32B, then perform a random read to > > just move the read pointer to mtd->size + mtd->oobsize - 32 and > > retrieve the missing 32 bytes? > > 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: > cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. > { > ...... > meson_nfc_read_page_sub(nand, page, 1); > meson_nfc_get_data_oob(nand, buf, oob_buf); > oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); > > } > 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. And then (if oob_required) you can bring the missing bytes with something along: nand_change_read_column_op(chip, mtd->writesize + oob_len, oob_buf + oob_len, mtd->oobsize - oob_len, false); Should not be a huge performance hit. > > > > > This applies to the two modes, the only difference is: > > - with correction (commonly named "ECC mode"): the user bytes and ECC > > bytes should be fixed if there are any bitflips > > - without correction (commonly referred as "raw mode"): no correction > > applies, if there are bitflips, give them > > > > Please mind the raw mode can be slow, it's meant for debugging and > > testing, mainly. Page reads however should be fast, so if just moving > > the column pointer works, then do it, otherwise we'll consider > > returning FFs. > > > > Thanks, > > Miquèl > > > Thanks, Miquèl
On Wed, Apr 12, 2023 at 10:04:28PM +0800, Liang Yang wrote: > Hi Miquel and Arseniy, > > On 2023/4/12 20:57, Miquel Raynal wrote: > > [ EXTERNAL EMAIL ] > > > > Hi Arseniy, > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > > > > > On 12.04.2023 15:18, Miquel Raynal wrote: > > > > Hi Arseniy, > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > > > > > On 12.04.2023 12:36, Miquel Raynal wrote: > > > > > > Hi Arseniy, > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > > > > > > > On 12.04.2023 10:44, Miquel Raynal wrote: > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > > > > > > > > > This NAND reads only few user's bytes in ECC mode (not full OOB), so > > > > > > > > > > > > > > > > "This NAND reads" does not look right, do you mean "Subpage reads do > > > > > > > > not retrieve all the OOB bytes,"? > > > > > > > > > fill OOB buffer with zeroes to not return garbage from previous reads > > > > > > > > > to user. > > > > > > > > > Otherwise 'nanddump' utility prints something like this for just erased > > > > > > > > > page: > > > > > > > > > > > > > > > > > > ... > > > > > > > > > 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > > > > > > > > OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > > > > > > > > > OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > > > > > > > > > OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > > > > > > > > > OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > > > > > > > > > > > > > > > > > > Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > > > > > > > --- > > > > > > > > > drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > > > > > > > > > 1 file changed, 5 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > index f84a10238e4d..f2f2472cb511 100644 > > > > > > > > > --- a/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > +++ b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > > > > > > > > > static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > > > > > > > > > int oob_required, int page) > > > > > > > > > { > > > > > > > > > + struct mtd_info *mtd = nand_to_mtd(nand); > > > > > > > > > u8 *oob_buf = nand->oob_poi; > > > > > > > > > int ret; > > > > > > > > > + memset(oob_buf, 0, mtd->oobsize); > > > > > > > > > > > > > > > > I'm surprised raw reads do not read the entire OOB? > > > > > > > > > > > > > > Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > > > > > > > still depends on ECC parameters: for each portion of data covered with ECC code we can > > > > > > > read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > > > > > > > printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > > > > > > > For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > > > > > > > contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > > > > > > > controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > > > > > > > > > > > > In all modes, when you read OOB, you should get the full OOB. The fact > > > > > > that ECC correction is enabled or disabled does not matter. If the NAND > > > > > > features OOB sections of 64 bytes, you should get the 64 bytes. > > > > > > > > > > > > What happens sometimes, is that some of the bytes are not protected > > > > > > against bitflips, but the policy is to return the full buffer. > > > > > > > > > > Ok, so to clarify case for this NAND controller: > > > > > 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > > > > > + ECC codes)? > > > > > > > > Well, you need to cover the same amount of data, yes. But in the ECC > > > > case the data won't be raw (at least not all of it). > > > > > > So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > > > OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > > > behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > > > > If the page are 2k+64B you should read 2k+64B when OOB are requested. > > > > If the controller only returns 2k+32B, then perform a random read to > > just move the read pointer to mtd->size + mtd->oobsize - 32 and > > retrieve the missing 32 bytes? > > 1) raw read can read out the whole page data 2k+64B, decided by the len in > the controller raw read command: > cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > after that, the missing oob bytes(not used) can be copied from > meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is > like this if need. > { > ...... > meson_nfc_read_page_sub(nand, page, 1); > meson_nfc_get_data_oob(nand, buf, oob_buf); > oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - > oob_len); > > } > 2) In ECC mode, the controller can't bring back the missing OOB bytes. it > can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > How does the Meson controller know the actual NAND flash layout when the OOB is split into protected and unprotected areas, such as Free and ECC areas? If the controller has a static OOB layout, where is the mapping located? > > > > This applies to the two modes, the only difference is: > > - with correction (commonly named "ECC mode"): the user bytes and ECC > > bytes should be fixed if there are any bitflips > > - without correction (commonly referred as "raw mode"): no correction > > applies, if there are bitflips, give them > > > > Please mind the raw mode can be slow, it's meant for debugging and > > testing, mainly. Page reads however should be fast, so if just moving > > the column pointer works, then do it, otherwise we'll consider > > returning FFs.
Hi Dmitry, ddrokosov@sberdevices.ru wrote on Wed, 12 Apr 2023 22:15:48 +0300: > On Wed, Apr 12, 2023 at 10:04:28PM +0800, Liang Yang wrote: > > Hi Miquel and Arseniy, > > > > On 2023/4/12 20:57, Miquel Raynal wrote: > > > [ EXTERNAL EMAIL ] > > > > > > Hi Arseniy, > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > > > > > > > On 12.04.2023 15:18, Miquel Raynal wrote: > > > > > Hi Arseniy, > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > > > > > > On 12.04.2023 12:36, Miquel Raynal wrote: > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > > > > > > > > On 12.04.2023 10:44, Miquel Raynal wrote: > > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > > > AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > > > > > > > > > > This NAND reads only few user's bytes in ECC mode (not full OOB), so > > > > > > > > > > > > > > > > > > "This NAND reads" does not look right, do you mean "Subpage reads do > > > > > > > > > not retrieve all the OOB bytes,"? > > > > > > > > > > fill OOB buffer with zeroes to not return garbage from previous reads > > > > > > > > > > to user. > > > > > > > > > > Otherwise 'nanddump' utility prints something like this for just erased > > > > > > > > > > page: > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > > > > > > > > > OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > > > > > > > > > > OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > > > > > > > > > > OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > > > > > > > > > > OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > > > > > > > > > > > > > > > > > > > > Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > > > > > > > > --- > > > > > > > > > > drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > > > > > > > > > > 1 file changed, 5 insertions(+) > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > index f84a10238e4d..f2f2472cb511 100644 > > > > > > > > > > --- a/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > +++ b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > > > > > > > > > > static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > > > > > > > > > > int oob_required, int page) > > > > > > > > > > { > > > > > > > > > > + struct mtd_info *mtd = nand_to_mtd(nand); > > > > > > > > > > u8 *oob_buf = nand->oob_poi; > > > > > > > > > > int ret; > > > > > > > > > > + memset(oob_buf, 0, mtd->oobsize); > > > > > > > > > > > > > > > > > > I'm surprised raw reads do not read the entire OOB? > > > > > > > > > > > > > > > > Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > > > > > > > > still depends on ECC parameters: for each portion of data covered with ECC code we can > > > > > > > > read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > > > > > > > > printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > > > > > > > > For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > > > > > > > > contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > > > > > > > > controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > > > > > > > > > > > > > > In all modes, when you read OOB, you should get the full OOB. The fact > > > > > > > that ECC correction is enabled or disabled does not matter. If the NAND > > > > > > > features OOB sections of 64 bytes, you should get the 64 bytes. > > > > > > > > > > > > > > What happens sometimes, is that some of the bytes are not protected > > > > > > > against bitflips, but the policy is to return the full buffer. > > > > > > > > > > > > Ok, so to clarify case for this NAND controller: > > > > > > 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > > > > > > + ECC codes)? > > > > > > > > > > Well, you need to cover the same amount of data, yes. But in the ECC > > > > > case the data won't be raw (at least not all of it). > > > > > > > > So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > > > > OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > > > > behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > > > > > > If the page are 2k+64B you should read 2k+64B when OOB are requested. > > > > > > If the controller only returns 2k+32B, then perform a random read to > > > just move the read pointer to mtd->size + mtd->oobsize - 32 and > > > retrieve the missing 32 bytes? > > > > 1) raw read can read out the whole page data 2k+64B, decided by the len in > > the controller raw read command: > > cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > > after that, the missing oob bytes(not used) can be copied from > > meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is > > like this if need. > > { > > ...... > > meson_nfc_read_page_sub(nand, page, 1); > > meson_nfc_get_data_oob(nand, buf, oob_buf); > > oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > > memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - > > oob_len); > > > > } > > 2) In ECC mode, the controller can't bring back the missing OOB bytes. it > > can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > > > > How does the Meson controller know the actual NAND flash layout when the > OOB is split into protected and unprotected areas, such as Free and ECC > areas? If the controller has a static OOB layout, where is the mapping > located? It's usually a set of values hardcoded in the driver. It's a per geometry set. > > > > > > > This applies to the two modes, the only difference is: > > > - with correction (commonly named "ECC mode"): the user bytes and ECC > > > bytes should be fixed if there are any bitflips > > > - without correction (commonly referred as "raw mode"): no correction > > > applies, if there are bitflips, give them > > > > > > Please mind the raw mode can be slow, it's meant for debugging and > > > testing, mainly. Page reads however should be fast, so if just moving > > > the column pointer works, then do it, otherwise we'll consider > > > returning FFs. > Thanks, Miquèl
Hi Miquel, On 2023/4/12 22:32, Miquel Raynal wrote: > [ EXTERNAL EMAIL ] > > Hello, > > liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: > >> Hi Miquel and Arseniy, >> >> On 2023/4/12 20:57, Miquel Raynal wrote: >>> [ EXTERNAL EMAIL ] >>> >>> Hi Arseniy, >>> >>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: >>> >>>> On 12.04.2023 15:18, Miquel Raynal wrote: >>>>> Hi Arseniy, >>>>> >>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>>>> Hi Arseniy, >>>>>>> >>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>>>> Hi Arseniy, >>>>>>>>> >>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>>>>>> >>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>>>>>> not retrieve all the OOB bytes,"? >>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>>>>>> to user. >>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>>>>>> page: >>>>>>>>>> >>>>>>>>>> ... >>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>>>> >>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>>>> --- >>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>>>> int oob_required, int page) >>>>>>>>>> { >>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>>>> int ret; >>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>>>> >>>>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>>>> >>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>>>>>> >>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>>>> >>>>>>> What happens sometimes, is that some of the bytes are not protected >>>>>>> against bitflips, but the policy is to return the full buffer. >>>>>> >>>>>> Ok, so to clarify case for this NAND controller: >>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >>>>>> + ECC codes)? >>>>> >>>>> Well, you need to cover the same amount of data, yes. But in the ECC >>>>> case the data won't be raw (at least not all of it). >>>> >>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. >>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct >>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? >>> >>> If the page are 2k+64B you should read 2k+64B when OOB are requested. >>> >>> If the controller only returns 2k+32B, then perform a random read to >>> just move the read pointer to mtd->size + mtd->oobsize - 32 and >>> retrieve the missing 32 bytes? >> >> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: >> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); >> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. >> { >> ...... >> meson_nfc_read_page_sub(nand, page, 1); >> meson_nfc_get_data_oob(nand, buf, oob_buf); >> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; >> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); >> >> } >> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > > And then (if oob_required) you can bring the missing bytes with > something along: > nand_change_read_column_op(chip, mtd->writesize + oob_len, > oob_buf + oob_len, > mtd->oobsize - oob_len, > false); > Should not be a huge performance hit. After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. > >> >>> >>> This applies to the two modes, the only difference is: >>> - with correction (commonly named "ECC mode"): the user bytes and ECC >>> bytes should be fixed if there are any bitflips >>> - without correction (commonly referred as "raw mode"): no correction >>> applies, if there are bitflips, give them >>> >>> Please mind the raw mode can be slow, it's meant for debugging and >>> testing, mainly. Page reads however should be fast, so if just moving >>> the column pointer works, then do it, otherwise we'll consider >>> returning FFs. >>> >>> Thanks, >>> Miquèl >>> >> > > > Thanks, > Miquèl >
On 2023/4/13 13:32, Liang Yang wrote: > Hi Miquel, > > On 2023/4/12 22:32, Miquel Raynal wrote: >> [ EXTERNAL EMAIL ] >> >> Hello, >> >> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: >> >>> Hi Miquel and Arseniy, >>> >>> On 2023/4/12 20:57, Miquel Raynal wrote: >>>> [ EXTERNAL EMAIL ] >>>> >>>> Hi Arseniy, >>>> >>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: >>>>> On 12.04.2023 15:18, Miquel Raynal wrote: >>>>>> Hi Arseniy, >>>>>> >>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>>>>> Hi Arseniy, >>>>>>>> >>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>>>>> Hi Arseniy, >>>>>>>>>> >>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 >>>>>>>>>> +0300: >>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC >>>>>>>>>> mode (not full OOB), so >>>>>>>>>> >>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage >>>>>>>>>> reads do >>>>>>>>>> not retrieve all the OOB bytes,"? >>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return >>>>>>>>>> garbage from previous reads >>>>>>>>>>> to user. >>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for >>>>>>>>>>> just erased >>>>>>>>>>> page: >>>>>>>>>>> >>>>>>>>>>> ... >>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>>>>> --- >>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>> b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>> @@ -858,9 +858,12 @@ static int >>>>>>>>>>> meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, >>>>>>>>>>> u8 *buf, >>>>>>>>>>> int oob_required, int page) >>>>>>>>>>> { >>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>>>>> int ret; >>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>>>>> >>>>>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>>>>> >>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) >>>>>>>>> number of OOB bytes read >>>>>>>>> still depends on ECC parameters: for each portion of data >>>>>>>>> covered with ECC code we can >>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see >>>>>>>>> by dumping DMA buffer by >>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page >>>>>>>>> has 2 x 1K ECC blocks. >>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by >>>>>>>>> read/write. Each 16 bytes >>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when >>>>>>>>> I read page in raw mode >>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is >>>>>>>>> reported as 64 bytes. >>>>>>>> >>>>>>>> In all modes, when you read OOB, you should get the full OOB. >>>>>>>> The fact >>>>>>>> that ECC correction is enabled or disabled does not matter. If >>>>>>>> the NAND >>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>>>>> >>>>>>>> What happens sometimes, is that some of the bytes are not protected >>>>>>>> against bitflips, but the policy is to return the full buffer. >>>>>>> >>>>>>> Ok, so to clarify case for this NAND controller: >>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB >>>>>>> data (e.g. user bytes >>>>>>> + ECC codes)? >>>>>> >>>>>> Well, you need to cover the same amount of data, yes. But in the ECC >>>>>> case the data won't be raw (at least not all of it). >>>>> >>>>> So "same amount of data", in ECC mode current implementation >>>>> returns only user OOB bytes (e.g. >>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC >>>>> codes. IIUC correct >>>>> behaviour is to always return user bytes + ECC codes as OOB data >>>>> even in ECC mode ? >>>> >>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. >>>> >>>> If the controller only returns 2k+32B, then perform a random read to >>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and >>>> retrieve the missing 32 bytes? >>> >>> 1) raw read can read out the whole page data 2k+64B, decided by the >>> len in the controller raw read command: >>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); >>> after that, the missing oob bytes(not used) can be copied from >>> meson_chip->data_buf. so the implementation of >>> meson_nfc_read_page_raw() is like this if need. >>> { >>> ...... >>> meson_nfc_read_page_sub(nand, page, 1); >>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; >>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, >>> mtd->oobsize - oob_len); >>> >>> } >>> 2) In ECC mode, the controller can't bring back the missing OOB >>> bytes. it can read out the user bytes and ecc bytes per >>> meson_ooblayout_ops define. >> >> And then (if oob_required) you can bring the missing bytes with >> something along: >> nand_change_read_column_op(chip, mtd->writesize + oob_len, >> oob_buf + oob_len, >> mtd->oobsize - oob_len, >> false); >> Should not be a huge performance hit. > > After finishing ECC mode reading, the column address internal in NAND > device should be the right pos; it doesn't need to change the column > again. so adding controller raw read for the missing bytes after ECC > reading may works. > use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? >> >>> >>>> >>>> This applies to the two modes, the only difference is: >>>> - with correction (commonly named "ECC mode"): the user bytes and ECC >>>> bytes should be fixed if there are any bitflips >>>> - without correction (commonly referred as "raw mode"): no correction >>>> applies, if there are bitflips, give them >>>> >>>> Please mind the raw mode can be slow, it's meant for debugging and >>>> testing, mainly. Page reads however should be fast, so if just moving >>>> the column pointer works, then do it, otherwise we'll consider >>>> returning FFs. >>>> >>>> Thanks, >>>> Miquèl >>> >> >> >> Thanks, >> Miquèl >> >
On 13.04.2023 09:11, Liang Yang wrote: > > On 2023/4/13 13:32, Liang Yang wrote: >> Hi Miquel, >> >> On 2023/4/12 22:32, Miquel Raynal wrote: >>> [ EXTERNAL EMAIL ] >>> >>> Hello, >>> >>> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: >>> >>>> Hi Miquel and Arseniy, >>>> >>>> On 2023/4/12 20:57, Miquel Raynal wrote: >>>>> [ EXTERNAL EMAIL ] >>>>> >>>>> Hi Arseniy, >>>>> >>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: >>>>>> On 12.04.2023 15:18, Miquel Raynal wrote: >>>>>>> Hi Arseniy, >>>>>>> >>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>>>>>> Hi Arseniy, >>>>>>>>> >>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>>>>>> Hi Arseniy, >>>>>>>>>>> >>>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>>>>>>>> >>>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>>>>>>>> not retrieve all the OOB bytes,"? >>>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>>>>>>>> to user. >>>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>>>>>>>> page: >>>>>>>>>>>> >>>>>>>>>>>> ... >>>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>>>>>> --- >>>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>>>>>> int oob_required, int page) >>>>>>>>>>>> { >>>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>>>>>> int ret; >>>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>>>>>> >>>>>>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>>>>>> >>>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>>>>>>>> >>>>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>>>>>> >>>>>>>>> What happens sometimes, is that some of the bytes are not protected >>>>>>>>> against bitflips, but the policy is to return the full buffer. >>>>>>>> >>>>>>>> Ok, so to clarify case for this NAND controller: >>>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >>>>>>>> + ECC codes)? >>>>>>> >>>>>>> Well, you need to cover the same amount of data, yes. But in the ECC >>>>>>> case the data won't be raw (at least not all of it). >>>>>> >>>>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. >>>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct >>>>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? >>>>> >>>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. >>>>> >>>>> If the controller only returns 2k+32B, then perform a random read to >>>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and >>>>> retrieve the missing 32 bytes? >>>> >>>> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: >>>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); >>>> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. >>>> { >>>> ...... >>>> meson_nfc_read_page_sub(nand, page, 1); >>>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; >>>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); >>>> >>>> } >>>> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. >>> >>> And then (if oob_required) you can bring the missing bytes with >>> something along: >>> nand_change_read_column_op(chip, mtd->writesize + oob_len, >>> oob_buf + oob_len, >>> mtd->oobsize - oob_len, >>> false); >>> Should not be a huge performance hit. >> >> After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. >> > use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in: ECC mode it is user bytes(corrected by ECC, read from info buffer) + ECC + missing bytes. ECC and missing bytes read in RAW mode. RAW mode it is user bytes(not corrected by ECC) + ECC + missing bytes Also @Liang, is this valid code (drivers/mtd/nand/raw/meson_nand.c)? ret = nand_check_erased_ecc_chunk(data, ecc->size, oob, ecc->bytes + 2, NULL, 0, ecc->strength); It confused me, because 'oob' buffer contains both user bytes and ECC code, 'ecc->bytes + 2' is 16. May be it should be: ret = nand_check_erased_ecc_chunk(data, ecc->size, oob + 2, ecc->bytes, NULL, 0, ecc->strength); For example let's look on Tegra's driver (drivers/mtd/nand/raw/tegra_nand.c): u8 *oob = chip->oob_poi + nand->ecc.offset + (chip->ecc.bytes * bit); ret = nand_check_erased_ecc_chunk(data, chip->ecc.size, oob, chip->ecc.bytes, NULL, 0, chip->ecc.strength); 'oob' contains 'nand->ecc.offset', and ECC length does not account user bytes length (e.g. 2) - it is just 'chip->ecc.bytes' Please correct me if i'm wrong. Thanks, Arseniy > >>> >>>> >>>>> >>>>> This applies to the two modes, the only difference is: >>>>> - with correction (commonly named "ECC mode"): the user bytes and ECC >>>>> bytes should be fixed if there are any bitflips >>>>> - without correction (commonly referred as "raw mode"): no correction >>>>> applies, if there are bitflips, give them >>>>> >>>>> Please mind the raw mode can be slow, it's meant for debugging and >>>>> testing, mainly. Page reads however should be fast, so if just moving >>>>> the column pointer works, then do it, otherwise we'll consider >>>>> returning FFs. >>>>> >>>>> Thanks, >>>>> Miquèl >>>> >>> >>> >>> Thanks, >>> Miquèl >>> >> >
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 10:00:24 +0300: > On 13.04.2023 09:11, Liang Yang wrote: > > > > On 2023/4/13 13:32, Liang Yang wrote: > >> Hi Miquel, > >> > >> On 2023/4/12 22:32, Miquel Raynal wrote: > >>> [ EXTERNAL EMAIL ] > >>> > >>> Hello, > >>> > >>> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: > >>> > >>>> Hi Miquel and Arseniy, > >>>> > >>>> On 2023/4/12 20:57, Miquel Raynal wrote: > >>>>> [ EXTERNAL EMAIL ] > >>>>> > >>>>> Hi Arseniy, > >>>>> > >>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > >>>>>> On 12.04.2023 15:18, Miquel Raynal wrote: > >>>>>>> Hi Arseniy, > >>>>>>> > >>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > >>>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: > >>>>>>>>> Hi Arseniy, > >>>>>>>>> > >>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > >>>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: > >>>>>>>>>>> Hi Arseniy, > >>>>>>>>>>> > >>>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > >>>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so > >>>>>>>>>>> > >>>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do > >>>>>>>>>>> not retrieve all the OOB bytes,"? > >>>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads > >>>>>>>>>>>> to user. > >>>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased > >>>>>>>>>>>> page: > >>>>>>>>>>>> > >>>>>>>>>>>> ... > >>>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >>>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > >>>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > >>>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > >>>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > >>>>>>>>>>>> > >>>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > >>>>>>>>>>>> --- > >>>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > >>>>>>>>>>>> 1 file changed, 5 insertions(+) > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 > >>>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > >>>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > >>>>>>>>>>>> int oob_required, int page) > >>>>>>>>>>>> { > >>>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); > >>>>>>>>>>>> u8 *oob_buf = nand->oob_poi; > >>>>>>>>>>>> int ret; > >>>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); > >>>>>>>>>>> > >>>>>>>>>>> I'm surprised raw reads do not read the entire OOB? > >>>>>>>>>> > >>>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > >>>>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can > >>>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > >>>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > >>>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > >>>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > >>>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > >>>>>>>>> > >>>>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact > >>>>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND > >>>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. > >>>>>>>>> > >>>>>>>>> What happens sometimes, is that some of the bytes are not protected > >>>>>>>>> against bitflips, but the policy is to return the full buffer. > >>>>>>>> > >>>>>>>> Ok, so to clarify case for this NAND controller: > >>>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > >>>>>>>> + ECC codes)? > >>>>>>> > >>>>>>> Well, you need to cover the same amount of data, yes. But in the ECC > >>>>>>> case the data won't be raw (at least not all of it). > >>>>>> > >>>>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > >>>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > >>>>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > >>>>> > >>>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. > >>>>> > >>>>> If the controller only returns 2k+32B, then perform a random read to > >>>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and > >>>>> retrieve the missing 32 bytes? > >>>> > >>>> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: > >>>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > >>>> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. > >>>> { > >>>> ...... > >>>> meson_nfc_read_page_sub(nand, page, 1); > >>>> meson_nfc_get_data_oob(nand, buf, oob_buf); > >>>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > >>>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); > >>>> > >>>> } > >>>> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > >>> > >>> And then (if oob_required) you can bring the missing bytes with > >>> something along: > >>> nand_change_read_column_op(chip, mtd->writesize + oob_len, > >>> oob_buf + oob_len, > >>> mtd->oobsize - oob_len, > >>> false); > >>> Should not be a huge performance hit. > >> > >> After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. > >> > > use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? > > IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in: > ECC mode it is user bytes(corrected by ECC, read from info buffer) + ECC + missing bytes. ECC and missing bytes read in RAW mode. I believe the ECC bytes you'll get will be corrected. You can check this by using nandflipbits in mtd-utils. > RAW mode it is user bytes(not corrected by ECC) + ECC + missing bytes > > > Also @Liang, is this valid code (drivers/mtd/nand/raw/meson_nand.c)? > > ret = nand_check_erased_ecc_chunk(data, ecc->size, > oob, ecc->bytes + 2, > NULL, 0, > ecc->strength); > > It confused me, because 'oob' buffer contains both user bytes and ECC code, > 'ecc->bytes + 2' is 16. May be it should be: > > ret = nand_check_erased_ecc_chunk(data, ecc->size, > oob + 2, ecc->bytes, > NULL, 0, > ecc->strength); When you check for an erased chunk you should probably check the whole OOB area. > > For example let's look on Tegra's driver (drivers/mtd/nand/raw/tegra_nand.c): > > u8 *oob = chip->oob_poi + nand->ecc.offset + > (chip->ecc.bytes * bit); > > ret = nand_check_erased_ecc_chunk(data, chip->ecc.size, > oob, chip->ecc.bytes, > NULL, 0, > chip->ecc.strength); > > 'oob' contains 'nand->ecc.offset', and ECC length does not account user bytes length > (e.g. 2) - it is just 'chip->ecc.bytes' I haven't looked carefully, but be aware there are two user bytes reserved at the beginning of the OOB area for marking bad blocks. There *may* be a confusion somewhere. I am not saying there is one, just a hint if you can't find an explanation. Thanks, Miquèl
Hi Liang, liang.yang@amlogic.com wrote on Thu, 13 Apr 2023 13:32:03 +0800: > Hi Miquel, > > On 2023/4/12 22:32, Miquel Raynal wrote: > > [ EXTERNAL EMAIL ] > > > > Hello, > > > > liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: > > > >> Hi Miquel and Arseniy, > >> > >> On 2023/4/12 20:57, Miquel Raynal wrote: > >>> [ EXTERNAL EMAIL ] > >>> > >>> Hi Arseniy, > >>> > >>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > >>> >>>> On 12.04.2023 15:18, Miquel Raynal wrote: > >>>>> Hi Arseniy, > >>>>> > >>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > >>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: > >>>>>>> Hi Arseniy, > >>>>>>> > >>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > >>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: > >>>>>>>>> Hi Arseniy, > >>>>>>>>> > >>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > >>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so > >>>>>>>>> > >>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do > >>>>>>>>> not retrieve all the OOB bytes,"? > >>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads > >>>>>>>>>> to user. > >>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased > >>>>>>>>>> page: > >>>>>>>>>> > >>>>>>>>>> ... > >>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > >>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > >>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > >>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > >>>>>>>>>> > >>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > >>>>>>>>>> --- > >>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > >>>>>>>>>> 1 file changed, 5 insertions(+) > >>>>>>>>>> > >>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 > >>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > >>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > >>>>>>>>>> int oob_required, int page) > >>>>>>>>>> { > >>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); > >>>>>>>>>> u8 *oob_buf = nand->oob_poi; > >>>>>>>>>> int ret; > >>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); > >>>>>>>>> > >>>>>>>>> I'm surprised raw reads do not read the entire OOB? > >>>>>>>> > >>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > >>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can > >>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > >>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > >>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > >>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > >>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > >>>>>>> > >>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact > >>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND > >>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. > >>>>>>> > >>>>>>> What happens sometimes, is that some of the bytes are not protected > >>>>>>> against bitflips, but the policy is to return the full buffer. > >>>>>> > >>>>>> Ok, so to clarify case for this NAND controller: > >>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > >>>>>> + ECC codes)? > >>>>> > >>>>> Well, you need to cover the same amount of data, yes. But in the ECC > >>>>> case the data won't be raw (at least not all of it). > >>>> > >>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > >>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > >>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > >>> > >>> If the page are 2k+64B you should read 2k+64B when OOB are requested. > >>> > >>> If the controller only returns 2k+32B, then perform a random read to > >>> just move the read pointer to mtd->size + mtd->oobsize - 32 and > >>> retrieve the missing 32 bytes? > >> > >> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: > >> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > >> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. > >> { > >> ...... > >> meson_nfc_read_page_sub(nand, page, 1); > >> meson_nfc_get_data_oob(nand, buf, oob_buf); > >> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > >> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); > >> > >> } > >> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > > > > And then (if oob_required) you can bring the missing bytes with > > something along: > > nand_change_read_column_op(chip, mtd->writesize + oob_len, > > oob_buf + oob_len, > > mtd->oobsize - oob_len, > > false); > > Should not be a huge performance hit. > > After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. Yes, if the last byte read is the one right before the "missing" bytes, then that will work. Thanks, Miquèl
On Wed, Apr 12, 2023 at 10:56:03PM +0200, Miquel Raynal wrote: > Hi Dmitry, > > ddrokosov@sberdevices.ru wrote on Wed, 12 Apr 2023 22:15:48 +0300: > > > On Wed, Apr 12, 2023 at 10:04:28PM +0800, Liang Yang wrote: > > > Hi Miquel and Arseniy, > > > > > > On 2023/4/12 20:57, Miquel Raynal wrote: > > > > [ EXTERNAL EMAIL ] > > > > > > > > Hi Arseniy, > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > > > > > > > > > On 12.04.2023 15:18, Miquel Raynal wrote: > > > > > > Hi Arseniy, > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > > > > > > > On 12.04.2023 12:36, Miquel Raynal wrote: > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > > > > > > > > > On 12.04.2023 10:44, Miquel Raynal wrote: > > > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > > > > > AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > > > > > > > > > > > This NAND reads only few user's bytes in ECC mode (not full OOB), so > > > > > > > > > > > > > > > > > > > > "This NAND reads" does not look right, do you mean "Subpage reads do > > > > > > > > > > not retrieve all the OOB bytes,"? > > > > > > > > > > > fill OOB buffer with zeroes to not return garbage from previous reads > > > > > > > > > > > to user. > > > > > > > > > > > Otherwise 'nanddump' utility prints something like this for just erased > > > > > > > > > > > page: > > > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > > > > > > > > > > OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > > > > > > > > > > > OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > > > > > > > > > > > OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > > > > > > > > > > > OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > > > > > > > > > --- > > > > > > > > > > > drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > > > > > > > > > > > 1 file changed, 5 insertions(+) > > > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > index f84a10238e4d..f2f2472cb511 100644 > > > > > > > > > > > --- a/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > +++ b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > > > > > > > > > > > static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > > > > > > > > > > > int oob_required, int page) > > > > > > > > > > > { > > > > > > > > > > > + struct mtd_info *mtd = nand_to_mtd(nand); > > > > > > > > > > > u8 *oob_buf = nand->oob_poi; > > > > > > > > > > > int ret; > > > > > > > > > > > + memset(oob_buf, 0, mtd->oobsize); > > > > > > > > > > > > > > > > > > > > I'm surprised raw reads do not read the entire OOB? > > > > > > > > > > > > > > > > > > Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > > > > > > > > > still depends on ECC parameters: for each portion of data covered with ECC code we can > > > > > > > > > read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > > > > > > > > > printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > > > > > > > > > For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > > > > > > > > > contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > > > > > > > > > controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > > > > > > > > > > > > > > > > In all modes, when you read OOB, you should get the full OOB. The fact > > > > > > > > that ECC correction is enabled or disabled does not matter. If the NAND > > > > > > > > features OOB sections of 64 bytes, you should get the 64 bytes. > > > > > > > > > > > > > > > > What happens sometimes, is that some of the bytes are not protected > > > > > > > > against bitflips, but the policy is to return the full buffer. > > > > > > > > > > > > > > Ok, so to clarify case for this NAND controller: > > > > > > > 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > > > > > > > + ECC codes)? > > > > > > > > > > > > Well, you need to cover the same amount of data, yes. But in the ECC > > > > > > case the data won't be raw (at least not all of it). > > > > > > > > > > So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > > > > > OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > > > > > behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > > > > > > > > If the page are 2k+64B you should read 2k+64B when OOB are requested. > > > > > > > > If the controller only returns 2k+32B, then perform a random read to > > > > just move the read pointer to mtd->size + mtd->oobsize - 32 and > > > > retrieve the missing 32 bytes? > > > > > > 1) raw read can read out the whole page data 2k+64B, decided by the len in > > > the controller raw read command: > > > cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > > > after that, the missing oob bytes(not used) can be copied from > > > meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is > > > like this if need. > > > { > > > ...... > > > meson_nfc_read_page_sub(nand, page, 1); > > > meson_nfc_get_data_oob(nand, buf, oob_buf); > > > oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > > > memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - > > > oob_len); > > > > > > } > > > 2) In ECC mode, the controller can't bring back the missing OOB bytes. it > > > can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > > > > > > > How does the Meson controller know the actual NAND flash layout when the > > OOB is split into protected and unprotected areas, such as Free and ECC > > areas? If the controller has a static OOB layout, where is the mapping > > located? > > It's usually a set of values hardcoded in the driver. It's a per > geometry set. > Sorry, I'm still confused. Before I developed spinand drivers, the OOB layout was located on the flash driver side. Do you mean if the OOB geometry in the rawnand subsystem is under the responsibility of the controller driver? > > > > > > > > > > This applies to the two modes, the only difference is: > > > > - with correction (commonly named "ECC mode"): the user bytes and ECC > > > > bytes should be fixed if there are any bitflips > > > > - without correction (commonly referred as "raw mode"): no correction > > > > applies, if there are bitflips, give them > > > > > > > > Please mind the raw mode can be slow, it's meant for debugging and > > > > testing, mainly. Page reads however should be fast, so if just moving > > > > the column pointer works, then do it, otherwise we'll consider > > > > returning FFs. > >
On 13.04.2023 11:22, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 10:00:24 +0300: > >> On 13.04.2023 09:11, Liang Yang wrote: >>> >>> On 2023/4/13 13:32, Liang Yang wrote: >>>> Hi Miquel, >>>> >>>> On 2023/4/12 22:32, Miquel Raynal wrote: >>>>> [ EXTERNAL EMAIL ] >>>>> >>>>> Hello, >>>>> >>>>> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: >>>>> >>>>>> Hi Miquel and Arseniy, >>>>>> >>>>>> On 2023/4/12 20:57, Miquel Raynal wrote: >>>>>>> [ EXTERNAL EMAIL ] >>>>>>> >>>>>>> Hi Arseniy, >>>>>>> >>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: >>>>>>>> On 12.04.2023 15:18, Miquel Raynal wrote: >>>>>>>>> Hi Arseniy, >>>>>>>>> >>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>>>>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>>>>>>>> Hi Arseniy, >>>>>>>>>>> >>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>> >>>>>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>>>>>>>>>> >>>>>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>>>>>>>>>> not retrieve all the OOB bytes,"? >>>>>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>>>>>>>>>> to user. >>>>>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>>>>>>>>>> page: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ... >>>>>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>>>>>>>> int oob_required, int page) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>>>>>>>> int ret; >>>>>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>>>>>>>> >>>>>>>>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>>>>>>>> >>>>>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>>>>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>>>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>>>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>>>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>>>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>>>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>>>>>>>>>> >>>>>>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>>>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>>>>>>>> >>>>>>>>>>> What happens sometimes, is that some of the bytes are not protected >>>>>>>>>>> against bitflips, but the policy is to return the full buffer. >>>>>>>>>> >>>>>>>>>> Ok, so to clarify case for this NAND controller: >>>>>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >>>>>>>>>> + ECC codes)? >>>>>>>>> >>>>>>>>> Well, you need to cover the same amount of data, yes. But in the ECC >>>>>>>>> case the data won't be raw (at least not all of it). >>>>>>>> >>>>>>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. >>>>>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct >>>>>>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? >>>>>>> >>>>>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. >>>>>>> >>>>>>> If the controller only returns 2k+32B, then perform a random read to >>>>>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and >>>>>>> retrieve the missing 32 bytes? >>>>>> >>>>>> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: >>>>>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); >>>>>> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. >>>>>> { >>>>>> ...... >>>>>> meson_nfc_read_page_sub(nand, page, 1); >>>>>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>>>>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; >>>>>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); >>>>>> >>>>>> } >>>>>> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. >>>>> >>>>> And then (if oob_required) you can bring the missing bytes with >>>>> something along: >>>>> nand_change_read_column_op(chip, mtd->writesize + oob_len, >>>>> oob_buf + oob_len, >>>>> mtd->oobsize - oob_len, >>>>> false); >>>>> Should not be a huge performance hit. >>>> >>>> After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. >>>> >>> use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? >> >> IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in: >> ECC mode it is user bytes(corrected by ECC, read from info buffer) + ECC + missing bytes. ECC and missing bytes read in RAW mode. > > I believe the ECC bytes you'll get will be corrected. > You can check this by using nandflipbits in mtd-utils. Sorry, didn't get it, i'm new in NAND area. Bytes of ECC codes are available only in raw mode (at least in this NAND driver) also as missing bytes of OOB. So IIUC ECC codes are metadata to correct data bytes, and thus couldn't be corrected. Thanks, Arseniy > >> RAW mode it is user bytes(not corrected by ECC) + ECC + missing bytes >> >> >> Also @Liang, is this valid code (drivers/mtd/nand/raw/meson_nand.c)? >> >> ret = nand_check_erased_ecc_chunk(data, ecc->size, >> oob, ecc->bytes + 2, >> NULL, 0, >> ecc->strength); >> >> It confused me, because 'oob' buffer contains both user bytes and ECC code, >> 'ecc->bytes + 2' is 16. May be it should be: >> >> ret = nand_check_erased_ecc_chunk(data, ecc->size, >> oob + 2, ecc->bytes, >> NULL, 0, >> ecc->strength); > > When you check for an erased chunk you should probably check the whole > OOB area. Ok, I see > >> >> For example let's look on Tegra's driver (drivers/mtd/nand/raw/tegra_nand.c): >> >> u8 *oob = chip->oob_poi + nand->ecc.offset + >> (chip->ecc.bytes * bit); >> >> ret = nand_check_erased_ecc_chunk(data, chip->ecc.size, >> oob, chip->ecc.bytes, >> NULL, 0, >> chip->ecc.strength); >> >> 'oob' contains 'nand->ecc.offset', and ECC length does not account user bytes length >> (e.g. 2) - it is just 'chip->ecc.bytes' > > I haven't looked carefully, but be aware there are two user bytes > reserved at the beginning of the OOB area for marking bad blocks. There > *may* be a confusion somewhere. I am not saying there is one, just a > hint if you can't find an explanation. > > > Thanks, > Miquèl
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 12:36:24 +0300: > On 13.04.2023 11:22, Miquel Raynal wrote: > > Hi Arseniy, > > > > avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 10:00:24 +0300: > > > >> On 13.04.2023 09:11, Liang Yang wrote: > >>> > >>> On 2023/4/13 13:32, Liang Yang wrote: > >>>> Hi Miquel, > >>>> > >>>> On 2023/4/12 22:32, Miquel Raynal wrote: > >>>>> [ EXTERNAL EMAIL ] > >>>>> > >>>>> Hello, > >>>>> > >>>>> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: > >>>>> > >>>>>> Hi Miquel and Arseniy, > >>>>>> > >>>>>> On 2023/4/12 20:57, Miquel Raynal wrote: > >>>>>>> [ EXTERNAL EMAIL ] > >>>>>>> > >>>>>>> Hi Arseniy, > >>>>>>> > >>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > >>>>>>>> On 12.04.2023 15:18, Miquel Raynal wrote: > >>>>>>>>> Hi Arseniy, > >>>>>>>>> > >>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > >>>>>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: > >>>>>>>>>>> Hi Arseniy, > >>>>>>>>>>> > >>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > >>>>>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: > >>>>>>>>>>>>> Hi Arseniy, > >>>>>>>>>>>>> > >>>>>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > >>>>>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so > >>>>>>>>>>>>> > >>>>>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do > >>>>>>>>>>>>> not retrieve all the OOB bytes,"? > >>>>>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads > >>>>>>>>>>>>>> to user. > >>>>>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased > >>>>>>>>>>>>>> page: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ... > >>>>>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >>>>>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > >>>>>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > >>>>>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > >>>>>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > >>>>>>>>>>>>>> --- > >>>>>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > >>>>>>>>>>>>>> 1 file changed, 5 insertions(+) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 > >>>>>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c > >>>>>>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > >>>>>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > >>>>>>>>>>>>>> int oob_required, int page) > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); > >>>>>>>>>>>>>> u8 *oob_buf = nand->oob_poi; > >>>>>>>>>>>>>> int ret; > >>>>>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); > >>>>>>>>>>>>> > >>>>>>>>>>>>> I'm surprised raw reads do not read the entire OOB? > >>>>>>>>>>>> > >>>>>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > >>>>>>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can > >>>>>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > >>>>>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > >>>>>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > >>>>>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > >>>>>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > >>>>>>>>>>> > >>>>>>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact > >>>>>>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND > >>>>>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. > >>>>>>>>>>> > >>>>>>>>>>> What happens sometimes, is that some of the bytes are not protected > >>>>>>>>>>> against bitflips, but the policy is to return the full buffer. > >>>>>>>>>> > >>>>>>>>>> Ok, so to clarify case for this NAND controller: > >>>>>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > >>>>>>>>>> + ECC codes)? > >>>>>>>>> > >>>>>>>>> Well, you need to cover the same amount of data, yes. But in the ECC > >>>>>>>>> case the data won't be raw (at least not all of it). > >>>>>>>> > >>>>>>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > >>>>>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > >>>>>>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > >>>>>>> > >>>>>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. > >>>>>>> > >>>>>>> If the controller only returns 2k+32B, then perform a random read to > >>>>>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and > >>>>>>> retrieve the missing 32 bytes? > >>>>>> > >>>>>> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: > >>>>>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > >>>>>> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. > >>>>>> { > >>>>>> ...... > >>>>>> meson_nfc_read_page_sub(nand, page, 1); > >>>>>> meson_nfc_get_data_oob(nand, buf, oob_buf); > >>>>>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > >>>>>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); > >>>>>> > >>>>>> } > >>>>>> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > >>>>> > >>>>> And then (if oob_required) you can bring the missing bytes with > >>>>> something along: > >>>>> nand_change_read_column_op(chip, mtd->writesize + oob_len, > >>>>> oob_buf + oob_len, > >>>>> mtd->oobsize - oob_len, > >>>>> false); > >>>>> Should not be a huge performance hit. > >>>> > >>>> After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. > >>>> > >>> use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? > >> > >> IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in: > >> ECC mode it is user bytes(corrected by ECC, read from info buffer) + ECC + missing bytes. ECC and missing bytes read in RAW mode. > > > > I believe the ECC bytes you'll get will be corrected. > > You can check this by using nandflipbits in mtd-utils. > > Sorry, didn't get it, i'm new in NAND area. Bytes of ECC codes are available only in raw mode (at least in this NAND > driver) also as missing bytes of OOB. Gasp. Yeah that's a controller limitation, okay. > So IIUC ECC codes are metadata to correct data bytes, and thus > couldn't be corrected. We consider them metadata, but they are fully part of the ECC scheme and thus their correction is part of the process, bitflips in the ECC bytes will count as data bitflips actually. I talked a bit about ECC engines at a previous conference if it can help: https://elinux.org/ELC_Europe_2020_Presentations 'Understand ECC Support for NAND Flash Devices in Linux' And also wrote a blog post with a chapter about ECC engines: https://bootlin.com/blog/supporting-a-misbehaving-nand-ecc-engine/ Thanks, Miquèl
Hi Dmitry, ddrokosov@sberdevices.ru wrote on Thu, 13 Apr 2023 12:27:06 +0300: > On Wed, Apr 12, 2023 at 10:56:03PM +0200, Miquel Raynal wrote: > > Hi Dmitry, > > > > ddrokosov@sberdevices.ru wrote on Wed, 12 Apr 2023 22:15:48 +0300: > > > > > On Wed, Apr 12, 2023 at 10:04:28PM +0800, Liang Yang wrote: > > > > Hi Miquel and Arseniy, > > > > > > > > On 2023/4/12 20:57, Miquel Raynal wrote: > > > > > [ EXTERNAL EMAIL ] > > > > > > > > > > Hi Arseniy, > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > > > > > > > > > > > On 12.04.2023 15:18, Miquel Raynal wrote: > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > > > > > > > > On 12.04.2023 12:36, Miquel Raynal wrote: > > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > > > > > > > > > > On 12.04.2023 10:44, Miquel Raynal wrote: > > > > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > > > > > > > AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > > > > > > > > > > > > This NAND reads only few user's bytes in ECC mode (not full OOB), so > > > > > > > > > > > > > > > > > > > > > > "This NAND reads" does not look right, do you mean "Subpage reads do > > > > > > > > > > > not retrieve all the OOB bytes,"? > > > > > > > > > > > > fill OOB buffer with zeroes to not return garbage from previous reads > > > > > > > > > > > > to user. > > > > > > > > > > > > Otherwise 'nanddump' utility prints something like this for just erased > > > > > > > > > > > > page: > > > > > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > > > > > > > > > > > OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > > > > > > > > > > > > OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > > > > > > > > > > > > OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > > > > > > > > > > > > OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > > > > > > > > > > --- > > > > > > > > > > > > drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > > > > > > > > > > > > 1 file changed, 5 insertions(+) > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > > index f84a10238e4d..f2f2472cb511 100644 > > > > > > > > > > > > --- a/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > > +++ b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > > @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > > > > > > > > > > > > static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > > > > > > > > > > > > int oob_required, int page) > > > > > > > > > > > > { > > > > > > > > > > > > + struct mtd_info *mtd = nand_to_mtd(nand); > > > > > > > > > > > > u8 *oob_buf = nand->oob_poi; > > > > > > > > > > > > int ret; > > > > > > > > > > > > + memset(oob_buf, 0, mtd->oobsize); > > > > > > > > > > > > > > > > > > > > > > I'm surprised raw reads do not read the entire OOB? > > > > > > > > > > > > > > > > > > > > Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > > > > > > > > > > still depends on ECC parameters: for each portion of data covered with ECC code we can > > > > > > > > > > read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > > > > > > > > > > printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > > > > > > > > > > For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > > > > > > > > > > contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > > > > > > > > > > controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > > > > > > > > > > > > > > > > > > In all modes, when you read OOB, you should get the full OOB. The fact > > > > > > > > > that ECC correction is enabled or disabled does not matter. If the NAND > > > > > > > > > features OOB sections of 64 bytes, you should get the 64 bytes. > > > > > > > > > > > > > > > > > > What happens sometimes, is that some of the bytes are not protected > > > > > > > > > against bitflips, but the policy is to return the full buffer. > > > > > > > > > > > > > > > > Ok, so to clarify case for this NAND controller: > > > > > > > > 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > > > > > > > > + ECC codes)? > > > > > > > > > > > > > > Well, you need to cover the same amount of data, yes. But in the ECC > > > > > > > case the data won't be raw (at least not all of it). > > > > > > > > > > > > So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > > > > > > OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > > > > > > behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > > > > > > > > > > If the page are 2k+64B you should read 2k+64B when OOB are requested. > > > > > > > > > > If the controller only returns 2k+32B, then perform a random read to > > > > > just move the read pointer to mtd->size + mtd->oobsize - 32 and > > > > > retrieve the missing 32 bytes? > > > > > > > > 1) raw read can read out the whole page data 2k+64B, decided by the len in > > > > the controller raw read command: > > > > cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > > > > after that, the missing oob bytes(not used) can be copied from > > > > meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is > > > > like this if need. > > > > { > > > > ...... > > > > meson_nfc_read_page_sub(nand, page, 1); > > > > meson_nfc_get_data_oob(nand, buf, oob_buf); > > > > oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > > > > memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - > > > > oob_len); > > > > > > > > } > > > > 2) In ECC mode, the controller can't bring back the missing OOB bytes. it > > > > can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > > > > > > > > > > How does the Meson controller know the actual NAND flash layout when the > > > OOB is split into protected and unprotected areas, such as Free and ECC > > > areas? If the controller has a static OOB layout, where is the mapping > > > located? > > > > It's usually a set of values hardcoded in the driver. It's a per > > geometry set. > > > > Sorry, I'm still confused. Before I developed spinand drivers, the OOB layout > was located on the flash driver side. The spinand subsystem is different, most of the chips have on-die ECC, which means the ECC layout is most of the time per-chip. In the raw NAND layer, most of the time the ECC engine is merged with the NAND controller and thus the list of available layouts depend on the controller. But these layouts often adapt to the NAND chip geometry, so you can only now which layout to use after identifying the chip (there is an ->attach_chip() hook to handle things related to geometry after controller base initialization. > Do you mean if the OOB geometry in the rawnand subsystem is under the > responsibility of the controller driver? The geometry is NAND chip specific. The OOB layout depends on the chip geometry, the ECC engine capabilities and configuration (like the strength). The discovery of the geometry is performed by the core (using raw NAND controller driver callbacks of course). If on-host ECC engine is picked (the default in the raw NAND subsystem), then the controller driver picks the correct OOB layout and refuses to probe otherwise. Hope its clearer now :) Thanks, Miquèl
On 13.04.2023 13:22, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 12:36:24 +0300: > >> On 13.04.2023 11:22, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>> avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 10:00:24 +0300: >>> >>>> On 13.04.2023 09:11, Liang Yang wrote: >>>>> >>>>> On 2023/4/13 13:32, Liang Yang wrote: >>>>>> Hi Miquel, >>>>>> >>>>>> On 2023/4/12 22:32, Miquel Raynal wrote: >>>>>>> [ EXTERNAL EMAIL ] >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: >>>>>>> >>>>>>>> Hi Miquel and Arseniy, >>>>>>>> >>>>>>>> On 2023/4/12 20:57, Miquel Raynal wrote: >>>>>>>>> [ EXTERNAL EMAIL ] >>>>>>>>> >>>>>>>>> Hi Arseniy, >>>>>>>>> >>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: >>>>>>>>>> On 12.04.2023 15:18, Miquel Raynal wrote: >>>>>>>>>>> Hi Arseniy, >>>>>>>>>>> >>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>>>>>>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>> >>>>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>>>>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>>>>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>>>>>>>>>>>> not retrieve all the OOB bytes,"? >>>>>>>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>>>>>>>>>>>> to user. >>>>>>>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>>>>>>>>>>>> page: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>>>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>>>>>>>>>> int oob_required, int page) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>>>>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>>>>>>>>>> int ret; >>>>>>>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>>>>>>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>>>>>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>>>>>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>>>>>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>>>>>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>>>>>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>>>>>>>>>>>> >>>>>>>>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>>>>>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>>>>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>>>>>>>>>> >>>>>>>>>>>>> What happens sometimes, is that some of the bytes are not protected >>>>>>>>>>>>> against bitflips, but the policy is to return the full buffer. >>>>>>>>>>>> >>>>>>>>>>>> Ok, so to clarify case for this NAND controller: >>>>>>>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >>>>>>>>>>>> + ECC codes)? >>>>>>>>>>> >>>>>>>>>>> Well, you need to cover the same amount of data, yes. But in the ECC >>>>>>>>>>> case the data won't be raw (at least not all of it). >>>>>>>>>> >>>>>>>>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. >>>>>>>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct >>>>>>>>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? >>>>>>>>> >>>>>>>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. >>>>>>>>> >>>>>>>>> If the controller only returns 2k+32B, then perform a random read to >>>>>>>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and >>>>>>>>> retrieve the missing 32 bytes? >>>>>>>> >>>>>>>> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: >>>>>>>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); >>>>>>>> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. >>>>>>>> { >>>>>>>> ...... >>>>>>>> meson_nfc_read_page_sub(nand, page, 1); >>>>>>>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>>>>>>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; >>>>>>>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); >>>>>>>> >>>>>>>> } >>>>>>>> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. >>>>>>> >>>>>>> And then (if oob_required) you can bring the missing bytes with >>>>>>> something along: >>>>>>> nand_change_read_column_op(chip, mtd->writesize + oob_len, >>>>>>> oob_buf + oob_len, >>>>>>> mtd->oobsize - oob_len, >>>>>>> false); >>>>>>> Should not be a huge performance hit. >>>>>> >>>>>> After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. >>>>>> >>>>> use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? >>>> >>>> IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in: >>>> ECC mode it is user bytes(corrected by ECC, read from info buffer) + ECC + missing bytes. ECC and missing bytes read in RAW mode. >>> >>> I believe the ECC bytes you'll get will be corrected. >>> You can check this by using nandflipbits in mtd-utils. >> >> Sorry, didn't get it, i'm new in NAND area. Bytes of ECC codes are available only in raw mode (at least in this NAND >> driver) also as missing bytes of OOB. > > Gasp. Yeah that's a controller limitation, okay. > >> So IIUC ECC codes are metadata to correct data bytes, and thus >> couldn't be corrected. > > We consider them metadata, but they are fully part of the ECC scheme > and thus their correction is part of the process, bitflips in the ECC > bytes will count as data bitflips actually. > > I talked a bit about ECC engines at a previous conference if it can > help: > https://elinux.org/ELC_Europe_2020_Presentations > 'Understand ECC Support for NAND Flash Devices in Linux' > And also wrote a blog post with a chapter about ECC engines: > https://bootlin.com/blog/supporting-a-misbehaving-nand-ecc-engine/ Thanks for this! Thanks, Arseniy > > Thanks, > Miquèl
On Thu, Apr 13, 2023 at 12:29:16PM +0200, Miquel Raynal wrote: > Hi Dmitry, > > ddrokosov@sberdevices.ru wrote on Thu, 13 Apr 2023 12:27:06 +0300: > > > On Wed, Apr 12, 2023 at 10:56:03PM +0200, Miquel Raynal wrote: > > > Hi Dmitry, > > > > > > ddrokosov@sberdevices.ru wrote on Wed, 12 Apr 2023 22:15:48 +0300: > > > > > > > On Wed, Apr 12, 2023 at 10:04:28PM +0800, Liang Yang wrote: > > > > > Hi Miquel and Arseniy, > > > > > > > > > > On 2023/4/12 20:57, Miquel Raynal wrote: > > > > > > [ EXTERNAL EMAIL ] > > > > > > > > > > > > Hi Arseniy, > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: > > > > > > > > > > > > > On 12.04.2023 15:18, Miquel Raynal wrote: > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: > > > > > > > > > On 12.04.2023 12:36, Miquel Raynal wrote: > > > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > > > > > avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: > > > > > > > > > > > On 12.04.2023 10:44, Miquel Raynal wrote: > > > > > > > > > > > > Hi Arseniy, > > > > > > > > > > > > > > > > > > > > > > > > AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: > > > > > > > > > > > > > This NAND reads only few user's bytes in ECC mode (not full OOB), so > > > > > > > > > > > > > > > > > > > > > > > > "This NAND reads" does not look right, do you mean "Subpage reads do > > > > > > > > > > > > not retrieve all the OOB bytes,"? > > > > > > > > > > > > > fill OOB buffer with zeroes to not return garbage from previous reads > > > > > > > > > > > > > to user. > > > > > > > > > > > > > Otherwise 'nanddump' utility prints something like this for just erased > > > > > > > > > > > > > page: > > > > > > > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > > > > > > > > > > > > > OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be > > > > > > > > > > > > > OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e > > > > > > > > > > > > > OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff > > > > > > > > > > > > > OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > > > > > > > > > > > --- > > > > > > > > > > > > > drivers/mtd/nand/raw/meson_nand.c | 5 +++++ > > > > > > > > > > > > > 1 file changed, 5 insertions(+) > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > > > index f84a10238e4d..f2f2472cb511 100644 > > > > > > > > > > > > > --- a/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > > > +++ b/drivers/mtd/nand/raw/meson_nand.c > > > > > > > > > > > > > @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, > > > > > > > > > > > > > static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, > > > > > > > > > > > > > int oob_required, int page) > > > > > > > > > > > > > { > > > > > > > > > > > > > + struct mtd_info *mtd = nand_to_mtd(nand); > > > > > > > > > > > > > u8 *oob_buf = nand->oob_poi; > > > > > > > > > > > > > int ret; > > > > > > > > > > > > > + memset(oob_buf, 0, mtd->oobsize); > > > > > > > > > > > > > > > > > > > > > > > > I'm surprised raw reads do not read the entire OOB? > > > > > > > > > > > > > > > > > > > > > > Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read > > > > > > > > > > > still depends on ECC parameters: for each portion of data covered with ECC code we can > > > > > > > > > > > read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by > > > > > > > > > > > printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. > > > > > > > > > > > For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes > > > > > > > > > > > contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode > > > > > > > > > > > controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. > > > > > > > > > > > > > > > > > > > > In all modes, when you read OOB, you should get the full OOB. The fact > > > > > > > > > > that ECC correction is enabled or disabled does not matter. If the NAND > > > > > > > > > > features OOB sections of 64 bytes, you should get the 64 bytes. > > > > > > > > > > > > > > > > > > > > What happens sometimes, is that some of the bytes are not protected > > > > > > > > > > against bitflips, but the policy is to return the full buffer. > > > > > > > > > > > > > > > > > > Ok, so to clarify case for this NAND controller: > > > > > > > > > 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes > > > > > > > > > + ECC codes)? > > > > > > > > > > > > > > > > Well, you need to cover the same amount of data, yes. But in the ECC > > > > > > > > case the data won't be raw (at least not all of it). > > > > > > > > > > > > > > So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. > > > > > > > OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct > > > > > > > behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? > > > > > > > > > > > > If the page are 2k+64B you should read 2k+64B when OOB are requested. > > > > > > > > > > > > If the controller only returns 2k+32B, then perform a random read to > > > > > > just move the read pointer to mtd->size + mtd->oobsize - 32 and > > > > > > retrieve the missing 32 bytes? > > > > > > > > > > 1) raw read can read out the whole page data 2k+64B, decided by the len in > > > > > the controller raw read command: > > > > > cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); > > > > > after that, the missing oob bytes(not used) can be copied from > > > > > meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is > > > > > like this if need. > > > > > { > > > > > ...... > > > > > meson_nfc_read_page_sub(nand, page, 1); > > > > > meson_nfc_get_data_oob(nand, buf, oob_buf); > > > > > oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; > > > > > memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - > > > > > oob_len); > > > > > > > > > > } > > > > > 2) In ECC mode, the controller can't bring back the missing OOB bytes. it > > > > > can read out the user bytes and ecc bytes per meson_ooblayout_ops define. > > > > > > > > > > > > > How does the Meson controller know the actual NAND flash layout when the > > > > OOB is split into protected and unprotected areas, such as Free and ECC > > > > areas? If the controller has a static OOB layout, where is the mapping > > > > located? > > > > > > It's usually a set of values hardcoded in the driver. It's a per > > > geometry set. > > > > > > > Sorry, I'm still confused. Before I developed spinand drivers, the OOB layout > > was located on the flash driver side. > > The spinand subsystem is different, most of the chips have on-die ECC, > which means the ECC layout is most of the time per-chip. > > In the raw NAND layer, most of the time the ECC engine is merged with > the NAND controller and thus the list of available layouts depend on > the controller. But these layouts often adapt to the NAND chip > geometry, so you can only now which layout to use after identifying the > chip (there is an ->attach_chip() hook to handle things related to > geometry after controller base initialization. > > > Do you mean if the OOB geometry in the rawnand subsystem is under the > > responsibility of the controller driver? > > The geometry is NAND chip specific. > The OOB layout depends on the chip geometry, the ECC engine > capabilities and configuration (like the strength). > The discovery of the geometry is performed by the core (using raw NAND > controller driver callbacks of course). > If on-host ECC engine is picked (the default in the raw NAND > subsystem), then the controller driver picks the correct OOB layout and > refuses to probe otherwise. > > Hope its clearer now :) Yeah, pretty clear! I wanted to express my special thanks to you for sharing such detailed information about the raw NAND layer! Your insights will be incredibly helpful for me and will provide valuable context that I would not have otherwise had. Thank you so much for taking the time to share your expertise with me!
On 13.04.2023 13:35, Arseniy Krasnov wrote: > > > On 13.04.2023 13:22, Miquel Raynal wrote: >> Hi Arseniy, >> >> avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 12:36:24 +0300: >> >>> On 13.04.2023 11:22, Miquel Raynal wrote: >>>> Hi Arseniy, >>>> >>>> avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 10:00:24 +0300: >>>> >>>>> On 13.04.2023 09:11, Liang Yang wrote: >>>>>> >>>>>> On 2023/4/13 13:32, Liang Yang wrote: >>>>>>> Hi Miquel, >>>>>>> >>>>>>> On 2023/4/12 22:32, Miquel Raynal wrote: >>>>>>>> [ EXTERNAL EMAIL ] >>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: >>>>>>>> >>>>>>>>> Hi Miquel and Arseniy, >>>>>>>>> >>>>>>>>> On 2023/4/12 20:57, Miquel Raynal wrote: >>>>>>>>>> [ EXTERNAL EMAIL ] >>>>>>>>>> >>>>>>>>>> Hi Arseniy, >>>>>>>>>> >>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: >>>>>>>>>>> On 12.04.2023 15:18, Miquel Raynal wrote: >>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>> >>>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>>>>>>>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>>> >>>>>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>>>>>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>>>>>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>>>>>>>>>>>>> not retrieve all the OOB bytes,"? >>>>>>>>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>>>>>>>>>>>>> to user. >>>>>>>>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>>>>>>>>>>>>> page: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>>>>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>>>>>>>>>>> int oob_required, int page) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>>>>>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>>>>>>>>>>> int ret; >>>>>>>>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>>>>>>>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>>>>>>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>>>>>>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>>>>>>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>>>>>>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>>>>>>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>>>>>>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>>>>>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> What happens sometimes, is that some of the bytes are not protected >>>>>>>>>>>>>> against bitflips, but the policy is to return the full buffer. >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, so to clarify case for this NAND controller: >>>>>>>>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >>>>>>>>>>>>> + ECC codes)? >>>>>>>>>>>> >>>>>>>>>>>> Well, you need to cover the same amount of data, yes. But in the ECC >>>>>>>>>>>> case the data won't be raw (at least not all of it). >>>>>>>>>>> >>>>>>>>>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. >>>>>>>>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct >>>>>>>>>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? >>>>>>>>>> >>>>>>>>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. >>>>>>>>>> >>>>>>>>>> If the controller only returns 2k+32B, then perform a random read to >>>>>>>>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and >>>>>>>>>> retrieve the missing 32 bytes? >>>>>>>>> >>>>>>>>> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: >>>>>>>>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); >>>>>>>>> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. >>>>>>>>> { >>>>>>>>> ...... >>>>>>>>> meson_nfc_read_page_sub(nand, page, 1); >>>>>>>>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>>>>>>>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; >>>>>>>>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); >>>>>>>>> >>>>>>>>> } >>>>>>>>> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. >>>>>>>> >>>>>>>> And then (if oob_required) you can bring the missing bytes with >>>>>>>> something along: >>>>>>>> nand_change_read_column_op(chip, mtd->writesize + oob_len, >>>>>>>> oob_buf + oob_len, >>>>>>>> mtd->oobsize - oob_len, >>>>>>>> false); >>>>>>>> Should not be a huge performance hit. >>>>>>> >>>>>>> After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. >>>>>>> >>>>>> use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? >>>>> >>>>> IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in: >>>>> ECC mode it is user bytes(corrected by ECC, read from info buffer) + ECC + missing bytes. ECC and missing bytes read in RAW mode. >>>> >>>> I believe the ECC bytes you'll get will be corrected. >>>> You can check this by using nandflipbits in mtd-utils. >>> >>> Sorry, didn't get it, i'm new in NAND area. Bytes of ECC codes are available only in raw mode (at least in this NAND >>> driver) also as missing bytes of OOB. >> >> Gasp. Yeah that's a controller limitation, okay. >> >>> So IIUC ECC codes are metadata to correct data bytes, and thus >>> couldn't be corrected. >> >> We consider them metadata, but they are fully part of the ECC scheme >> and thus their correction is part of the process, bitflips in the ECC >> bytes will count as data bitflips actually. >> >> I talked a bit about ECC engines at a previous conference if it can >> help: >> https://elinux.org/ELC_Europe_2020_Presentations >> 'Understand ECC Support for NAND Flash Devices in Linux' >> And also wrote a blog post with a chapter about ECC engines: >> https://bootlin.com/blog/supporting-a-misbehaving-nand-ecc-engine/ > > > Thanks for this! > > Thanks, Arseniy Hello again @Liang @Miquel! One more question about OOB access, as I can see current driver uses the following callbacks: nand->ecc.write_oob_raw = nand_write_oob_std; nand->ecc.write_oob = nand_write_oob_std; Function 'nand_write_oob_std()' writes data to the end of the page. But as I can see by dumping 'data_buf' during read, physical layout of each page is the following (1KB ECC): 0x000: [ 1 KB of data ] 0x400: [ 2B user data] [ 14B ECC code] 0x410: [ 1 KB of data ] (A) 0x810: [ 2B user data] [ 14B ECC code] 0x820: [ 32B unused ] So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), page will look like this: 0x000: [ 0xFF ] 0x400: [ ........ ] 0x7f0: [ 0xFF ] 0x800: [ 00 ....................... ] 0x830: [ ........................ 3f ] Here we have two problems: 1) Attempt to display raw data by 'nanddump' utility produces a little bit invalid output, as driver relies on layout (A) from above. E.g. OOB data is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty state (I mean was written at least once) and NAND controller tries to use ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus we have strange situation: OOB seems written without any errors, but we can't read this page. First idea is to write OOB data to 0x400 and 0x810 in raw mode, but this does not work - if there is some data, NAND controller will try to use ECC engine to check these user bytes on next attempt to read this page. But as these 4 bytes were written in raw mode, ECC codes are missed. We suggest the following thing: use only area at 0x820 for OOB (see A) - it is not covered by ECC engine, so write to this zone won't conflict with ECC in future. In this case we change 'meson_ooblayout_free()' function which instead of describing 2 user bytes for each ECC block, will return 16 tail bytes for each ECC block. What to You think? > >> >> Thanks, >> Miquèl
Hi Arseniy and Miquel, On 2023/4/18 13:12, Arseniy Krasnov wrote: > [ EXTERNAL EMAIL ] > > > > On 13.04.2023 13:35, Arseniy Krasnov wrote: >> >> >> On 13.04.2023 13:22, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>> avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 12:36:24 +0300: >>> >>>> On 13.04.2023 11:22, Miquel Raynal wrote: >>>>> Hi Arseniy, >>>>> >>>>> avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 10:00:24 +0300: >>>>> >>>>>> On 13.04.2023 09:11, Liang Yang wrote: >>>>>>> >>>>>>> On 2023/4/13 13:32, Liang Yang wrote: >>>>>>>> Hi Miquel, >>>>>>>> >>>>>>>> On 2023/4/12 22:32, Miquel Raynal wrote: >>>>>>>>> [ EXTERNAL EMAIL ] >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: >>>>>>>>> >>>>>>>>>> Hi Miquel and Arseniy, >>>>>>>>>> >>>>>>>>>> On 2023/4/12 20:57, Miquel Raynal wrote: >>>>>>>>>>> [ EXTERNAL EMAIL ] >>>>>>>>>>> >>>>>>>>>>> Hi Arseniy, >>>>>>>>>>> >>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: >>>>>>>>>>>> On 12.04.2023 15:18, Miquel Raynal wrote: >>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>> >>>>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>>>>>>>>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>>>>>>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>>>>>>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>>>>>>>>>>>>>> not retrieve all the OOB bytes,"? >>>>>>>>>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>>>>>>>>>>>>>> to user. >>>>>>>>>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>>>>>>>>>>>>>> page: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>>>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>>>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>>>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>>>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>>>>>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>>>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>>>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>>>>>>>>>>>> int oob_required, int page) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>>>>>>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>>>>>>>>>>>> int ret; >>>>>>>>>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>>>>>>>>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>>>>>>>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>>>>>>>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>>>>>>>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>>>>>>>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>>>>>>>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>>>>>>>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>>>>>>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> What happens sometimes, is that some of the bytes are not protected >>>>>>>>>>>>>>> against bitflips, but the policy is to return the full buffer. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ok, so to clarify case for this NAND controller: >>>>>>>>>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >>>>>>>>>>>>>> + ECC codes)? >>>>>>>>>>>>> >>>>>>>>>>>>> Well, you need to cover the same amount of data, yes. But in the ECC >>>>>>>>>>>>> case the data won't be raw (at least not all of it). >>>>>>>>>>>> >>>>>>>>>>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. >>>>>>>>>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct >>>>>>>>>>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? >>>>>>>>>>> >>>>>>>>>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. >>>>>>>>>>> >>>>>>>>>>> If the controller only returns 2k+32B, then perform a random read to >>>>>>>>>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and >>>>>>>>>>> retrieve the missing 32 bytes? >>>>>>>>>> >>>>>>>>>> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: >>>>>>>>>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); >>>>>>>>>> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. >>>>>>>>>> { >>>>>>>>>> ...... >>>>>>>>>> meson_nfc_read_page_sub(nand, page, 1); >>>>>>>>>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>>>>>>>>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; >>>>>>>>>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); >>>>>>>>>> >>>>>>>>>> } >>>>>>>>>> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. >>>>>>>>> >>>>>>>>> And then (if oob_required) you can bring the missing bytes with >>>>>>>>> something along: >>>>>>>>> nand_change_read_column_op(chip, mtd->writesize + oob_len, >>>>>>>>> oob_buf + oob_len, >>>>>>>>> mtd->oobsize - oob_len, >>>>>>>>> false); >>>>>>>>> Should not be a huge performance hit. >>>>>>>> >>>>>>>> After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. >>>>>>>> >>>>>>> use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? >>>>>> >>>>>> IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in: >>>>>> ECC mode it is user bytes(corrected by ECC, read from info buffer) + ECC + missing bytes. ECC and missing bytes read in RAW mode. >>>>> >>>>> I believe the ECC bytes you'll get will be corrected. >>>>> You can check this by using nandflipbits in mtd-utils. >>>> >>>> Sorry, didn't get it, i'm new in NAND area. Bytes of ECC codes are available only in raw mode (at least in this NAND >>>> driver) also as missing bytes of OOB. >>> >>> Gasp. Yeah that's a controller limitation, okay. >>> >>>> So IIUC ECC codes are metadata to correct data bytes, and thus >>>> couldn't be corrected. >>> >>> We consider them metadata, but they are fully part of the ECC scheme >>> and thus their correction is part of the process, bitflips in the ECC >>> bytes will count as data bitflips actually. >>> >>> I talked a bit about ECC engines at a previous conference if it can >>> help: >>> https://elinux.org/ELC_Europe_2020_Presentations >>> 'Understand ECC Support for NAND Flash Devices in Linux' >>> And also wrote a blog post with a chapter about ECC engines: >>> https://bootlin.com/blog/supporting-a-misbehaving-nand-ecc-engine/ >> >> >> Thanks for this! >> >> Thanks, Arseniy > > Hello again @Liang @Miquel! > > One more question about OOB access, as I can see current driver uses the following > callbacks: > > nand->ecc.write_oob_raw = nand_write_oob_std; > nand->ecc.write_oob = nand_write_oob_std; > > > Function 'nand_write_oob_std()' writes data to the end of the page. But as I > can see by dumping 'data_buf' during read, physical layout of each page is the > following (1KB ECC): > > 0x000: [ 1 KB of data ] > 0x400: [ 2B user data] [ 14B ECC code] > 0x410: [ 1 KB of data ] (A) > 0x810: [ 2B user data] [ 14B ECC code] > 0x820: [ 32B unused ] > > > > So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), > page will look like this: > > 0x000: [ 0xFF ] > 0x400: [ ........ ] > 0x7f0: [ 0xFF ] > 0x800: [ 00 ....................... ] > 0x830: [ ........................ 3f ] > > Here we have two problems: > 1) Attempt to display raw data by 'nanddump' utility produces a little bit > invalid output, as driver relies on layout (A) from above. E.g. OOB data > is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): > > 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| > OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| > OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > Hi Arseniy, I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). > > 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty > state (I mean was written at least once) and NAND controller tries to use > ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus As i said above, write_oob_raw() and write_oob() should be reworked. i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). > we have strange situation: OOB seems written without any errors, but we can't > read this page. First idea is to write OOB data to 0x400 and 0x810 in raw mode, > but this does not work - if there is some data, NAND controller will try to > use ECC engine to check these user bytes on next attempt to read this page. But > as these 4 bytes were written in raw mode, ECC codes are missed. > > We suggest the following thing: use only area at 0x820 for OOB (see A) - it is not covered > by ECC engine, so write to this zone won't conflict with ECC in future. In this case > we change 'meson_ooblayout_free()' function which instead of describing 2 user bytes > for each ECC block, will return 16 tail bytes for each ECC block. > > What to You think? the key point is that the data 0x820-0x840 is not protected by Host ECC.so i don't think we have to change it. Thanks, Liang > > >> >>> >>> Thanks, >>> Miquèl >
On 18.04.2023 15:24, Liang Yang wrote: > Hi Arseniy and Miquel, > > On 2023/4/18 13:12, Arseniy Krasnov wrote: >> [ EXTERNAL EMAIL ] >> >> >> >> On 13.04.2023 13:35, Arseniy Krasnov wrote: >>> >>> >>> On 13.04.2023 13:22, Miquel Raynal wrote: >>>> Hi Arseniy, >>>> >>>> avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 12:36:24 +0300: >>>> >>>>> On 13.04.2023 11:22, Miquel Raynal wrote: >>>>>> Hi Arseniy, >>>>>> >>>>>> avkrasnov@sberdevices.ru wrote on Thu, 13 Apr 2023 10:00:24 +0300: >>>>>> >>>>>>> On 13.04.2023 09:11, Liang Yang wrote: >>>>>>>> >>>>>>>> On 2023/4/13 13:32, Liang Yang wrote: >>>>>>>>> Hi Miquel, >>>>>>>>> >>>>>>>>> On 2023/4/12 22:32, Miquel Raynal wrote: >>>>>>>>>> [ EXTERNAL EMAIL ] >>>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> liang.yang@amlogic.com wrote on Wed, 12 Apr 2023 22:04:28 +0800: >>>>>>>>>> >>>>>>>>>>> Hi Miquel and Arseniy, >>>>>>>>>>> >>>>>>>>>>> On 2023/4/12 20:57, Miquel Raynal wrote: >>>>>>>>>>>> [ EXTERNAL EMAIL ] >>>>>>>>>>>> >>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>> >>>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 15:22:26 +0300: >>>>>>>>>>>>> On 12.04.2023 15:18, Miquel Raynal wrote: >>>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>>> >>>>>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 13:14:52 +0300: >>>>>>>>>>>>>> >>>> On 12.04.2023 12:36, Miquel Raynal wrote: >>>>>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> avkrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 12:20:55 +0300: >>>>>>>>>>>>>>>> >>>>>> On 12.04.2023 10:44, Miquel Raynal wrote: >>>>>>>>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> AVKrasnov@sberdevices.ru wrote on Wed, 12 Apr 2023 09:16:58 +0300: >>>>>>>>>>>>>>>>>> >>>>>>>> This NAND reads only few user's bytes in ECC mode (not full OOB), so >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "This NAND reads" does not look right, do you mean "Subpage reads do >>>>>>>>>>>>>>>>>> not retrieve all the OOB bytes,"? >>>>>>>>>>>>>>>>>> >>>>>>>> fill OOB buffer with zeroes to not return garbage from previous reads >>>>>>>>>>>>>>>>>>> to user. >>>>>>>>>>>>>>>>>>> Otherwise 'nanddump' utility prints something like this for just erased >>>>>>>>>>>>>>>>>>> page: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>>>> 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >>>>>>>>>>>>>>>>>>> OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be >>>>>>>>>>>>>>>>>>> OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e >>>>>>>>>>>>>>>>>>> OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff >>>>>>>>>>>>>>>>>>> OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >>>>>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>>>> drivers/mtd/nand/raw/meson_nand.c | 5 +++++ >>>>>>>>>>>>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>>>> index f84a10238e4d..f2f2472cb511 100644 >>>>>>>>>>>>>>>>>>> --- a/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>>>> +++ b/drivers/mtd/nand/raw/meson_nand.c >>>>>>>>>>>>>>>>>>> @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, >>>>>>>>>>>>>>>>>>> static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, >>>>>>>>>>>>>>>>>>> int oob_required, int page) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> + struct mtd_info *mtd = nand_to_mtd(nand); >>>>>>>>>>>>>>>>>>> u8 *oob_buf = nand->oob_poi; >>>>>>>>>>>>>>>>>>> int ret; >>>>>>>>>>>>>>>>>>> >>>>>>>> + memset(oob_buf, 0, mtd->oobsize); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm surprised raw reads do not read the entire OOB? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes! Seems in case of raw access (what i see in this driver) number of OOB bytes read >>>>>>>>>>>>>>>>> still depends on ECC parameters: for each portion of data covered with ECC code we can >>>>>>>>>>>>>>>>> read it's ECC code and "user bytes" from OOB - it is what i see by dumping DMA buffer by >>>>>>>>>>>>>>>>> printk(). For example I'm working with 2K NAND pages, each page has 2 x 1K ECC blocks. >>>>>>>>>>>>>>>>> For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes >>>>>>>>>>>>>>>>> contains 2 bytes of user's data and 14 bytes ECC codes. So when I read page in raw mode >>>>>>>>>>>>>>>>> controller returns 32 bytes (2 x (2 + 14)) of OOB. While OOB is reported as 64 bytes. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In all modes, when you read OOB, you should get the full OOB. The fact >>>>>>>>>>>>>>>> that ECC correction is enabled or disabled does not matter. If the NAND >>>>>>>>>>>>>>>> features OOB sections of 64 bytes, you should get the 64 bytes. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What happens sometimes, is that some of the bytes are not protected >>>>>>>>>>>>>>>> against bitflips, but the policy is to return the full buffer. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ok, so to clarify case for this NAND controller: >>>>>>>>>>>>>>> 1) In both ECC and raw modes i need to return the same raw OOB data (e.g. user bytes >>>>>>>>>>>>>>> + ECC codes)? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Well, you need to cover the same amount of data, yes. But in the ECC >>>>>>>>>>>>>> case the data won't be raw (at least not all of it). >>>>>>>>>>>>> >>>>>>>>>>>>> So "same amount of data", in ECC mode current implementation returns only user OOB bytes (e.g. >>>>>>>>>>>>> OOB data excluding ECC codes), in raw it returns user bytes + ECC codes. IIUC correct >>>>>>>>>>>>> behaviour is to always return user bytes + ECC codes as OOB data even in ECC mode ? >>>>>>>>>>>> >>>>>>>>>>>> If the page are 2k+64B you should read 2k+64B when OOB are requested. >>>>>>>>>>>> >>>>>>>>>>>> If the controller only returns 2k+32B, then perform a random read to >>>>>>>>>>>> just move the read pointer to mtd->size + mtd->oobsize - 32 and >>>>>>>>>>>> retrieve the missing 32 bytes? >>>>>>>>>>> >>>>>>>>>>> 1) raw read can read out the whole page data 2k+64B, decided by the len in the controller raw read command: >>>>>>>>>>> cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); >>>>>>>>>>> after that, the missing oob bytes(not used) can be copied from meson_chip->data_buf. so the implementation of meson_nfc_read_page_raw() is like this if need. >>>>>>>>>>> { >>>>>>>>>>> ...... >>>>>>>>>>> meson_nfc_read_page_sub(nand, page, 1); >>>>>>>>>>> meson_nfc_get_data_oob(nand, buf, oob_buf); >>>>>>>>>>> oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps; >>>>>>>>>>> memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len); >>>>>>>>>>> >>>>>>>>>>> } >>>>>>>>>>> 2) In ECC mode, the controller can't bring back the missing OOB bytes. it can read out the user bytes and ecc bytes per meson_ooblayout_ops define. >>>>>>>>>> >>>>>>>>>> And then (if oob_required) you can bring the missing bytes with >>>>>>>>>> something along: >>>>>>>>>> nand_change_read_column_op(chip, mtd->writesize + oob_len, >>>>>>>>>> oob_buf + oob_len, >>>>>>>>>> mtd->oobsize - oob_len, >>>>>>>>>> false); >>>>>>>>>> Should not be a huge performance hit. >>>>>>>>> >>>>>>>>> After finishing ECC mode reading, the column address internal in NAND device should be the right pos; it doesn't need to change the column again. so adding controller raw read for the missing bytes after ECC reading may works. >>>>>>>>> >>>>>>>> use raw read for the missing bytes, but they are not protected by host ECC. to the NAND type of storage, is it ok or missing bytes better to be filled with 0xff? >>>>>>> >>>>>>> IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in: >>>>>>> ECC mode it is user bytes(corrected by ECC, read from info buffer) + ECC + missing bytes. ECC and missing bytes read in RAW mode. >>>>>> >>>>>> I believe the ECC bytes you'll get will be corrected. >>>>>> You can check this by using nandflipbits in mtd-utils. >>>>> >>>>> Sorry, didn't get it, i'm new in NAND area. Bytes of ECC codes are available only in raw mode (at least in this NAND >>>>> driver) also as missing bytes of OOB. >>>> >>>> Gasp. Yeah that's a controller limitation, okay. >>>> >>>>> So IIUC ECC codes are metadata to correct data bytes, and thus >>>>> couldn't be corrected. >>>> >>>> We consider them metadata, but they are fully part of the ECC scheme >>>> and thus their correction is part of the process, bitflips in the ECC >>>> bytes will count as data bitflips actually. >>>> >>>> I talked a bit about ECC engines at a previous conference if it can >>>> help: >>>> https://elinux.org/ELC_Europe_2020_Presentations >>>> 'Understand ECC Support for NAND Flash Devices in Linux' >>>> And also wrote a blog post with a chapter about ECC engines: >>>> https://bootlin.com/blog/supporting-a-misbehaving-nand-ecc-engine/ >>> >>> >>> Thanks for this! >>> >>> Thanks, Arseniy >> >> Hello again @Liang @Miquel! >> >> One more question about OOB access, as I can see current driver uses the following >> callbacks: >> >> nand->ecc.write_oob_raw = nand_write_oob_std; >> nand->ecc.write_oob = nand_write_oob_std; >> >> >> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >> can see by dumping 'data_buf' during read, physical layout of each page is the >> following (1KB ECC): >> >> 0x000: [ 1 KB of data ] >> 0x400: [ 2B user data] [ 14B ECC code] >> 0x410: [ 1 KB of data ] (A) >> 0x810: [ 2B user data] [ 14B ECC code] >> 0x820: [ 32B unused ] >> >> >> >> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >> page will look like this: >> >> 0x000: [ 0xFF ] >> 0x400: [ ........ ] >> 0x7f0: [ 0xFF ] >> 0x800: [ 00 ....................... ] >> 0x830: [ ........................ 3f ] >> >> Here we have two problems: >> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >> invalid output, as driver relies on layout (A) from above. E.g. OOB data >> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >> >> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >> > Hi Arseniy, > > I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly written data (we can't update bits from 0 to 1). > > >> >> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >> state (I mean was written at least once) and NAND controller tries to use >> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus > > As i said above, write_oob_raw() and write_oob() should be reworked. > i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. Thanks, Arseniy > >> we have strange situation: OOB seems written without any errors, but we can't >> read this page. First idea is to write OOB data to 0x400 and 0x810 in raw mode, >> but this does not work - if there is some data, NAND controller will try to >> use ECC engine to check these user bytes on next attempt to read this page. But >> as these 4 bytes were written in raw mode, ECC codes are missed. >> >> We suggest the following thing: use only area at 0x820 for OOB (see A) - it is not covered >> by ECC engine, so write to this zone won't conflict with ECC in future. In this case >> we change 'meson_ooblayout_free()' function which instead of describing 2 user bytes >> for each ECC block, will return 16 tail bytes for each ECC block. >> >> What to You think? > > the key point is that the data 0x820-0x840 is not protected by Host ECC.so i don't think we have to change it. > > Thanks, > Liang >> >> >>> >>>> >>>> Thanks, >>>> Miquèl >>
Hi Arseniy, > >> Hello again @Liang @Miquel! > >> > >> One more question about OOB access, as I can see current driver uses the following > >> callbacks: > >> > >> nand->ecc.write_oob_raw = nand_write_oob_std; > >> nand->ecc.write_oob = nand_write_oob_std; > >> > >> > >> Function 'nand_write_oob_std()' writes data to the end of the page. But as I > >> can see by dumping 'data_buf' during read, physical layout of each page is the > >> following (1KB ECC): > >> > >> 0x000: [ 1 KB of data ] > >> 0x400: [ 2B user data] [ 14B ECC code] > >> 0x410: [ 1 KB of data ] (A) > >> 0x810: [ 2B user data] [ 14B ECC code] > >> 0x820: [ 32B unused ] > >> > >> > >> > >> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), > >> page will look like this: > >> > >> 0x000: [ 0xFF ] > >> 0x400: [ ........ ] > >> 0x7f0: [ 0xFF ] > >> 0x800: [ 00 ....................... ] > >> 0x830: [ ........................ 3f ] > >> > >> Here we have two problems: > >> 1) Attempt to display raw data by 'nanddump' utility produces a little bit > >> invalid output, as driver relies on layout (A) from above. E.g. OOB data > >> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): > >> > >> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| > >> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| > >> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >> > > Hi Arseniy, > > > > I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). > > IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so > it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly > written data (we can't update bits from 0 to 1). > > > > > > >> > >> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty > >> state (I mean was written at least once) and NAND controller tries to use > >> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus > > > > As i said above, write_oob_raw() and write_oob() should be reworked. > > i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). > > Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example > JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. A page is written after two steps: - loading the data into the NAND chip cache (that's when you use the bus) - programming the NAND array with the data loaded in cache (that's when you wait) In theory it does not matter where you write in the cache, it's regular DRAM, you can make random writes there with the appropriate NAND commands. Of course when using embedded hardware ECC engines, the controllers usually expect to be fed in a certain way in order to produce the ECC bytes and put them at the right location in cache. And then, when you actually send the "program" command, the NAND cells actually get programmed based on what has been loaded in cache. Thanks, Miquèl
On 18.04.2023 16:25, Miquel Raynal wrote: > Hi Arseniy, > >>>> Hello again @Liang @Miquel! >>>> >>>> One more question about OOB access, as I can see current driver uses the following >>>> callbacks: >>>> >>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>> nand->ecc.write_oob = nand_write_oob_std; >>>> >>>> >>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>> following (1KB ECC): >>>> >>>> 0x000: [ 1 KB of data ] >>>> 0x400: [ 2B user data] [ 14B ECC code] >>>> 0x410: [ 1 KB of data ] (A) >>>> 0x810: [ 2B user data] [ 14B ECC code] >>>> 0x820: [ 32B unused ] >>>> >>>> >>>> >>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>> page will look like this: >>>> >>>> 0x000: [ 0xFF ] >>>> 0x400: [ ........ ] >>>> 0x7f0: [ 0xFF ] >>>> 0x800: [ 00 ....................... ] >>>> 0x830: [ ........................ 3f ] >>>> >>>> Here we have two problems: >>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>> >>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>> >>> Hi Arseniy, >>> >>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >> >> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >> written data (we can't update bits from 0 to 1). >> >>> >>> >>>> >>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>> state (I mean was written at least once) and NAND controller tries to use >>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>> >>> As i said above, write_oob_raw() and write_oob() should be reworked. >>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >> >> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. @Liang, I'll describe current test case in details: 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes at 0x400 and 0x810 are updated. All other bytes still 0xff. 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this happens because from controller point of view ECC codes are invalid for current data (all ECCs are 0xff). Is this behaviour is ok? 5) Ok, don't care on these ECC errors, let's go further. 6) I'm going to write same page in ECC mode - how to do it correctly? There is already 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). That's why I asked Your opinion about moving OOB data to nonprotected by ECC area (and leave user's bytes untouched). In this case OOB access is free and not linked with ECC codes which also covers data. Thanks, Arseniy > > A page is written after two steps: > - loading the data into the NAND chip cache (that's when you use the > bus) > - programming the NAND array with the data loaded in cache (that's when > you wait) > > In theory it does not matter where you write in the cache, it's regular > DRAM, you can make random writes there with the appropriate NAND > commands. Of course when using embedded hardware ECC engines, the > controllers usually expect to be fed in a certain way in order to > produce the ECC bytes and put them at the right location in cache. > > And then, when you actually send the "program" command, the NAND cells > actually get programmed based on what has been loaded in cache. Thanks for this details! Very interesting! > > Thanks, > Miquèl
On 18.04.2023 17:57, Arseniy Krasnov wrote: > > > On 18.04.2023 16:25, Miquel Raynal wrote: >> Hi Arseniy, >> >>>>> Hello again @Liang @Miquel! >>>>> >>>>> One more question about OOB access, as I can see current driver uses the following >>>>> callbacks: >>>>> >>>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>>> nand->ecc.write_oob = nand_write_oob_std; >>>>> >>>>> >>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>>> following (1KB ECC): >>>>> >>>>> 0x000: [ 1 KB of data ] >>>>> 0x400: [ 2B user data] [ 14B ECC code] >>>>> 0x410: [ 1 KB of data ] (A) >>>>> 0x810: [ 2B user data] [ 14B ECC code] >>>>> 0x820: [ 32B unused ] >>>>> >>>>> >>>>> >>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>>> page will look like this: >>>>> >>>>> 0x000: [ 0xFF ] >>>>> 0x400: [ ........ ] >>>>> 0x7f0: [ 0xFF ] >>>>> 0x800: [ 00 ....................... ] >>>>> 0x830: [ ........................ 3f ] >>>>> >>>>> Here we have two problems: >>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>>> >>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>> >>>> Hi Arseniy, >>>> >>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >>> >>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >>> written data (we can't update bits from 0 to 1). >>> >>>> >>>> >>>>> >>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>>> state (I mean was written at least once) and NAND controller tries to use >>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>>> >>>> As i said above, write_oob_raw() and write_oob() should be reworked. >>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >>> >>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. > > @Liang, I'll describe current test case in details: > 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). > 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. > 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes > at 0x400 and 0x810 are updated. All other bytes still 0xff. > 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this > happens because from controller point of view ECC codes are invalid for current > data (all ECCs are 0xff). Is this behaviour is ok? > 5) Ok, don't care on these ECC errors, let's go further. > 6) I'm going to write same page in ECC mode - how to do it correctly? There is already > 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). > > That's why I asked Your opinion about moving OOB data to nonprotected by ECC area (and > leave user's bytes untouched). In this case OOB access is free and not linked with ECC > codes which also covers data. > > Thanks, Arseniy For example steps 1-4 happens during work with JFFS2, it tries to write OOB, then, sometimes later read this page in ECC mode - and get ECC errors. Thanks, Arseniy > > >> >> A page is written after two steps: >> - loading the data into the NAND chip cache (that's when you use the >> bus) >> - programming the NAND array with the data loaded in cache (that's when >> you wait) >> >> In theory it does not matter where you write in the cache, it's regular >> DRAM, you can make random writes there with the appropriate NAND >> commands. Of course when using embedded hardware ECC engines, the >> controllers usually expect to be fed in a certain way in order to >> produce the ECC bytes and put them at the right location in cache. >> >> And then, when you actually send the "program" command, the NAND cells >> actually get programmed based on what has been loaded in cache. > > Thanks for this details! Very interesting! > > >> >> Thanks, >> Miquèl
Hi Arseniy, On 2023/4/18 22:57, Arseniy Krasnov wrote: > [ EXTERNAL EMAIL ] > > > > On 18.04.2023 16:25, Miquel Raynal wrote: >> Hi Arseniy, >> >>>>> Hello again @Liang @Miquel! >>>>> >>>>> One more question about OOB access, as I can see current driver uses the following >>>>> callbacks: >>>>> >>>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>>> nand->ecc.write_oob = nand_write_oob_std; >>>>> >>>>> >>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>>> following (1KB ECC): >>>>> >>>>> 0x000: [ 1 KB of data ] >>>>> 0x400: [ 2B user data] [ 14B ECC code] >>>>> 0x410: [ 1 KB of data ] (A) >>>>> 0x810: [ 2B user data] [ 14B ECC code] >>>>> 0x820: [ 32B unused ] >>>>> >>>>> >>>>> >>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>>> page will look like this: >>>>> >>>>> 0x000: [ 0xFF ] >>>>> 0x400: [ ........ ] >>>>> 0x7f0: [ 0xFF ] >>>>> 0x800: [ 00 ....................... ] >>>>> 0x830: [ ........................ 3f ] >>>>> >>>>> Here we have two problems: >>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>>> >>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>> >>>> Hi Arseniy, >>>> >>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >>> >>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >>> written data (we can't update bits from 0 to 1). >>> >>>> >>>> >>>>> >>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>>> state (I mean was written at least once) and NAND controller tries to use >>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>>> >>>> As i said above, write_oob_raw() and write_oob() should be reworked. >>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >>> >>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. > > @Liang, I'll describe current test case in details: > 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). > 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. > 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes > at 0x400 and 0x810 are updated. All other bytes still 0xff. > 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this > happens because from controller point of view ECC codes are invalid for current > data (all ECCs are 0xff). Is this behaviour is ok? Yes, it is exactly reported ECC errors. > 5) Ok, don't care on these ECC errors, let's go further. > 6) I'm going to write same page in ECC mode - how to do it correctly? There is already > 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? > > That's why I asked Your opinion about moving OOB data to nonprotected by ECC area (and > leave user's bytes untouched). In this case OOB access is free and not linked with ECC > codes which also covers data. > > Thanks, Arseniy > > >> >> A page is written after two steps: >> - loading the data into the NAND chip cache (that's when you use the >> bus) >> - programming the NAND array with the data loaded in cache (that's when >> you wait) >> >> In theory it does not matter where you write in the cache, it's regular >> DRAM, you can make random writes there with the appropriate NAND >> commands. Of course when using embedded hardware ECC engines, the >> controllers usually expect to be fed in a certain way in order to >> produce the ECC bytes and put them at the right location in cache. >> >> And then, when you actually send the "program" command, the NAND cells >> actually get programmed based on what has been loaded in cache. > > Thanks for this details! Very interesting! > > >> >> Thanks, >> Miquèl >
On 19.04.2023 06:05, Liang Yang wrote: > Hi Arseniy, > > On 2023/4/18 22:57, Arseniy Krasnov wrote: >> [ EXTERNAL EMAIL ] >> >> >> >> On 18.04.2023 16:25, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>>>>> Hello again @Liang @Miquel! >>>>>> >>>>>> One more question about OOB access, as I can see current driver uses the following >>>>>> callbacks: >>>>>> >>>>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>>>> nand->ecc.write_oob = nand_write_oob_std; >>>>>> >>>>>> >>>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>>>> following (1KB ECC): >>>>>> >>>>>> 0x000: [ 1 KB of data ] >>>>>> 0x400: [ 2B user data] [ 14B ECC code] >>>>>> 0x410: [ 1 KB of data ] (A) >>>>>> 0x810: [ 2B user data] [ 14B ECC code] >>>>>> 0x820: [ 32B unused ] >>>>>> >>>>>> >>>>>> >>>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>>>> page will look like this: >>>>>> >>>>>> 0x000: [ 0xFF ] >>>>>> 0x400: [ ........ ] >>>>>> 0x7f0: [ 0xFF ] >>>>>> 0x800: [ 00 ....................... ] >>>>>> 0x830: [ ........................ 3f ] >>>>>> >>>>>> Here we have two problems: >>>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>>>> >>>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>> >>>>> Hi Arseniy, >>>>> >>>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >>>> >>>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >>>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >>>> written data (we can't update bits from 0 to 1). >>>> >>>>> >>>>> >>>>>> >>>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>>>> state (I mean was written at least once) and NAND controller tries to use >>>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>>>> >>>>> As i said above, write_oob_raw() and write_oob() should be reworked. >>>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >>>> >>>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >>>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. >> >> @Liang, I'll describe current test case in details: >> 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). >> 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. >> 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes >> at 0x400 and 0x810 are updated. All other bytes still 0xff. >> 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this >> happens because from controller point of view ECC codes are invalid for current >> data (all ECCs are 0xff). Is this behaviour is ok? > > Yes, it is exactly reported ECC errors. I see, so if we write OOB (e.g. using raw mode), there is no way to read this page in ECC mode later? And the only way to make it readable is to write it in ECC mode, but before this write, we need to read it's user's byte (from previous OOB write) in raw mode, put it to info buf (as user's bytes) and write this page. In this case NAND controller will generate ECC codes including user's byte and page become readable in ECC mode again. > >> 5) Ok, don't care on these ECC errors, let's go further. >> 6) I'm going to write same page in ECC mode - how to do it correctly? There is already >> 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). > > If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. > > I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? I just checked JFFS2 mount/umount again, here is what i see: 0) First attempt to mount JFFS2. 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. 2) Umount JFFS2. Done. 3) Second attempt to mount JFFS2. 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. 5) It reads page N in ECC mode, and i get: jffs2: mtd->read(0x100 bytes from N) returned ECC error 6) Mount failed. We already had problem which looks like this on another device. Solution was to use OOB area which is not covered by ECC for JFFS2 cleanmarkers. Thanks, Arseniy > >> >> That's why I asked Your opinion about moving OOB data to nonprotected by ECC area (and >> leave user's bytes untouched). In this case OOB access is free and not linked with ECC >> codes which also covers data. >> >> Thanks, Arseniy >> >>> >>> A page is written after two steps: >>> - loading the data into the NAND chip cache (that's when you use the >>> bus) >>> - programming the NAND array with the data loaded in cache (that's when >>> you wait) >>> >>> In theory it does not matter where you write in the cache, it's regular >>> DRAM, you can make random writes there with the appropriate NAND >>> commands. Of course when using embedded hardware ECC engines, the >>> controllers usually expect to be fed in a certain way in order to >>> produce the ECC bytes and put them at the right location in cache. >>> >>> And then, when you actually send the "program" command, the NAND cells >>> actually get programmed based on what has been loaded in cache. >> >> Thanks for this details! Very interesting! >> >> >>> >>> Thanks, >>> Miquèl >>
On 19.04.2023 09:41, Arseniy Krasnov wrote: > > > On 19.04.2023 06:05, Liang Yang wrote: >> Hi Arseniy, >> >> On 2023/4/18 22:57, Arseniy Krasnov wrote: >>> [ EXTERNAL EMAIL ] >>> >>> >>> >>> On 18.04.2023 16:25, Miquel Raynal wrote: >>>> Hi Arseniy, >>>> >>>>>>> Hello again @Liang @Miquel! >>>>>>> >>>>>>> One more question about OOB access, as I can see current driver uses the following >>>>>>> callbacks: >>>>>>> >>>>>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>>>>> nand->ecc.write_oob = nand_write_oob_std; >>>>>>> >>>>>>> >>>>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>>>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>>>>> following (1KB ECC): >>>>>>> >>>>>>> 0x000: [ 1 KB of data ] >>>>>>> 0x400: [ 2B user data] [ 14B ECC code] >>>>>>> 0x410: [ 1 KB of data ] (A) >>>>>>> 0x810: [ 2B user data] [ 14B ECC code] >>>>>>> 0x820: [ 32B unused ] >>>>>>> >>>>>>> >>>>>>> >>>>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>>>>> page will look like this: >>>>>>> >>>>>>> 0x000: [ 0xFF ] >>>>>>> 0x400: [ ........ ] >>>>>>> 0x7f0: [ 0xFF ] >>>>>>> 0x800: [ 00 ....................... ] >>>>>>> 0x830: [ ........................ 3f ] >>>>>>> >>>>>>> Here we have two problems: >>>>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>>>>> >>>>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>> >>>>>> Hi Arseniy, >>>>>> >>>>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >>>>> >>>>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >>>>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >>>>> written data (we can't update bits from 0 to 1). >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>>>>> state (I mean was written at least once) and NAND controller tries to use >>>>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>>>>> >>>>>> As i said above, write_oob_raw() and write_oob() should be reworked. >>>>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >>>>> >>>>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >>>>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. >>> >>> @Liang, I'll describe current test case in details: >>> 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). >>> 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. >>> 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes >>> at 0x400 and 0x810 are updated. All other bytes still 0xff. >>> 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this >>> happens because from controller point of view ECC codes are invalid for current >>> data (all ECCs are 0xff). Is this behaviour is ok? >> >> Yes, it is exactly reported ECC errors. > > I see, so if we write OOB (e.g. using raw mode), there is no way to read this page in ECC mode later? And the > only way to make it readable is to write it in ECC mode, but before this write, we need to read it's > user's byte (from previous OOB write) in raw mode, put it to info buf (as user's bytes) and write this page. In this > case NAND controller will generate ECC codes including user's byte and page become readable in ECC mode > again. > >> >>> 5) Ok, don't care on these ECC errors, let's go further. >>> 6) I'm going to write same page in ECC mode - how to do it correctly? There is already >>> 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). >> >> If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. >> >> I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? > > I just checked JFFS2 mount/umount again, here is what i see: > 0) First attempt to mount JFFS2. > 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. > 2) Umount JFFS2. Done. > 3) Second attempt to mount JFFS2. > 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. > 5) It reads page N in ECC mode, and i get: > jffs2: mtd->read(0x100 bytes from N) returned ECC error > 6) Mount failed. > > We already had problem which looks like this on another device. Solution was to use OOB area which is > not covered by ECC for JFFS2 cleanmarkers. > > Thanks, Arseniy > @Liang, Small addition, if i'm trying to implement OOB read/write in ECC mode, then step 5) will success, but later, JFFS2 tries to write this page (in ECC mode of course), and in this case ECC codes will be broken, because we can't update them properly without erasing whole page. Please take a look at this patch from my colleagues: https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru It is related with "We already had problem which looks like this on another device" from above: in 'f50l1g41lb_ooblayout_free()' we reserve 2 bytes in non-ECC area for bad block markers. Thanks, Arseniy >> >>> >>> That's why I asked Your opinion about moving OOB data to nonprotected by ECC area (and >>> leave user's bytes untouched). In this case OOB access is free and not linked with ECC >>> codes which also covers data. >>> >>> Thanks, Arseniy >>> >>>> >>>> A page is written after two steps: >>>> - loading the data into the NAND chip cache (that's when you use the >>>> bus) >>>> - programming the NAND array with the data loaded in cache (that's when >>>> you wait) >>>> >>>> In theory it does not matter where you write in the cache, it's regular >>>> DRAM, you can make random writes there with the appropriate NAND >>>> commands. Of course when using embedded hardware ECC engines, the >>>> controllers usually expect to be fed in a certain way in order to >>>> produce the ECC bytes and put them at the right location in cache. >>>> >>>> And then, when you actually send the "program" command, the NAND cells >>>> actually get programmed based on what has been loaded in cache. >>> >>> Thanks for this details! Very interesting! >>> >>> >>>> >>>> Thanks, >>>> Miquèl >>>
Hi Arseniy, On 2023/4/20 17:37, Arseniy Krasnov wrote: > [ EXTERNAL EMAIL ] > > On 19.04.2023 09:41, Arseniy Krasnov wrote: >> >> >> On 19.04.2023 06:05, Liang Yang wrote: >>> Hi Arseniy, >>> >>> On 2023/4/18 22:57, Arseniy Krasnov wrote: >>>> [ EXTERNAL EMAIL ] >>>> >>>> >>>> >>>> On 18.04.2023 16:25, Miquel Raynal wrote: >>>>> Hi Arseniy, >>>>> >>>>>>>> Hello again @Liang @Miquel! >>>>>>>> >>>>>>>> One more question about OOB access, as I can see current driver uses the following >>>>>>>> callbacks: >>>>>>>> >>>>>>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>>>>>> nand->ecc.write_oob = nand_write_oob_std; >>>>>>>> >>>>>>>> >>>>>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>>>>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>>>>>> following (1KB ECC): >>>>>>>> >>>>>>>> 0x000: [ 1 KB of data ] >>>>>>>> 0x400: [ 2B user data] [ 14B ECC code] >>>>>>>> 0x410: [ 1 KB of data ] (A) >>>>>>>> 0x810: [ 2B user data] [ 14B ECC code] >>>>>>>> 0x820: [ 32B unused ] >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>>>>>> page will look like this: >>>>>>>> >>>>>>>> 0x000: [ 0xFF ] >>>>>>>> 0x400: [ ........ ] >>>>>>>> 0x7f0: [ 0xFF ] >>>>>>>> 0x800: [ 00 ....................... ] >>>>>>>> 0x830: [ ........................ 3f ] >>>>>>>> >>>>>>>> Here we have two problems: >>>>>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>>>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>>>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>>>>>> >>>>>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>> >>>>>>> Hi Arseniy, >>>>>>> >>>>>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >>>>>> >>>>>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >>>>>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >>>>>> written data (we can't update bits from 0 to 1). >>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>>>>>> state (I mean was written at least once) and NAND controller tries to use >>>>>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>>>>>> >>>>>>> As i said above, write_oob_raw() and write_oob() should be reworked. >>>>>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >>>>>> >>>>>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >>>>>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. >>>> >>>> @Liang, I'll describe current test case in details: >>>> 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). >>>> 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. >>>> 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes >>>> at 0x400 and 0x810 are updated. All other bytes still 0xff. >>>> 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this >>>> happens because from controller point of view ECC codes are invalid for current >>>> data (all ECCs are 0xff). Is this behaviour is ok? >>> >>> Yes, it is exactly reported ECC errors. >> >> I see, so if we write OOB (e.g. using raw mode), there is no way to read this page in ECC mode later? And the Of course, there is no ECC parity bytes in it; or raw write the data with the ECC parity bytes per the layout (A) you describe above. >> only way to make it readable is to write it in ECC mode, but before this write, we need to read it's >> user's byte (from previous OOB write) in raw mode, put it to info buf (as user's bytes) and write this page. In this >> case NAND controller will generate ECC codes including user's byte and page become readable in ECC mode >> again. yes, you are right. >> >>> >>>> 5) Ok, don't care on these ECC errors, let's go further. >>>> 6) I'm going to write same page in ECC mode - how to do it correctly? There is already >>>> 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). >>> >>> If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. >>> >>> I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? > > >> >> I just checked JFFS2 mount/umount again, here is what i see: >> 0) First attempt to mount JFFS2. >> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >> 2) Umount JFFS2. Done. >> 3) Second attempt to mount JFFS2. >> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >> 5) It reads page N in ECC mode, and i get: >> jffs2: mtd->read(0x100 bytes from N) returned ECC error >> 6) Mount failed. >> >> We already had problem which looks like this on another device. Solution was to use OOB area which is >> not covered by ECC for JFFS2 cleanmarkers. ok, so there is not ECC parity bytes and mtd->read() returns ECC error. does it have to use raw write/read on step 1) and 4)? >> >> Thanks, Arseniy >> > > @Liang, > > Small addition, if i'm trying to implement OOB read/write in ECC mode, then step 5) will success, > but later, JFFS2 tries to write this page (in ECC mode of course), and in this case ECC codes will > be broken, because we can't update them properly without erasing whole page. > > Please take a look at this patch from my colleagues: > https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru > It is related with "We already had problem which looks like this on another device" from above: > in 'f50l1g41lb_ooblayout_free()' we reserve 2 bytes in non-ECC area for bad block markers. It is about the SPI NAND and use another controller called spifc. but in meson nfc, it is heavily depended on the pre-created BBT in NAND device. > > Thanks, Arseniy > > >>> >>>> >>>> That's why I asked Your opinion about moving OOB data to nonprotected by ECC area (and >>>> leave user's bytes untouched). In this case OOB access is free and not linked with ECC >>>> codes which also covers data. >>>> >>>> Thanks, Arseniy >>>> >>>>> >>>>> A page is written after two steps: >>>>> - loading the data into the NAND chip cache (that's when you use the >>>>> bus) >>>>> - programming the NAND array with the data loaded in cache (that's when >>>>> you wait) >>>>> >>>>> In theory it does not matter where you write in the cache, it's regular >>>>> DRAM, you can make random writes there with the appropriate NAND >>>>> commands. Of course when using embedded hardware ECC engines, the >>>>> controllers usually expect to be fed in a certain way in order to >>>>> produce the ECC bytes and put them at the right location in cache. >>>>> >>>>> And then, when you actually send the "program" command, the NAND cells >>>>> actually get programmed based on what has been loaded in cache. >>>> >>>> Thanks for this details! Very interesting! >>>> >>>> >>>>> >>>>> Thanks, >>>>> Miquèl >>>>
On 26.04.2023 16:51, Liang Yang wrote: > Hi Arseniy, > > On 2023/4/20 17:37, Arseniy Krasnov wrote: >> [ EXTERNAL EMAIL ] >> >> On 19.04.2023 09:41, Arseniy Krasnov wrote: >>> >>> >>> On 19.04.2023 06:05, Liang Yang wrote: >>>> Hi Arseniy, >>>> >>>> On 2023/4/18 22:57, Arseniy Krasnov wrote: >>>>> [ EXTERNAL EMAIL ] >>>>> >>>>> >>>>> >>>>> On 18.04.2023 16:25, Miquel Raynal wrote: >>>>>> Hi Arseniy, >>>>>> >>>>>>>>> Hello again @Liang @Miquel! >>>>>>>>> >>>>>>>>> One more question about OOB access, as I can see current driver uses the following >>>>>>>>> callbacks: >>>>>>>>> >>>>>>>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>>>>>>> nand->ecc.write_oob = nand_write_oob_std; >>>>>>>>> >>>>>>>>> >>>>>>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>>>>>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>>>>>>> following (1KB ECC): >>>>>>>>> >>>>>>>>> 0x000: [ 1 KB of data ] >>>>>>>>> 0x400: [ 2B user data] [ 14B ECC code] >>>>>>>>> 0x410: [ 1 KB of data ] (A) >>>>>>>>> 0x810: [ 2B user data] [ 14B ECC code] >>>>>>>>> 0x820: [ 32B unused ] >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>>>>>>> page will look like this: >>>>>>>>> >>>>>>>>> 0x000: [ 0xFF ] >>>>>>>>> 0x400: [ ........ ] >>>>>>>>> 0x7f0: [ 0xFF ] >>>>>>>>> 0x800: [ 00 ....................... ] >>>>>>>>> 0x830: [ ........................ 3f ] >>>>>>>>> >>>>>>>>> Here we have two problems: >>>>>>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>>>>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>>>>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>>>>>>> >>>>>>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>> >>>>>>>> Hi Arseniy, >>>>>>>> >>>>>>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >>>>>>> >>>>>>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >>>>>>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >>>>>>> written data (we can't update bits from 0 to 1). >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>>>>>>> state (I mean was written at least once) and NAND controller tries to use >>>>>>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>>>>>>> >>>>>>>> As i said above, write_oob_raw() and write_oob() should be reworked. >>>>>>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >>>>>>> >>>>>>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >>>>>>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. >>>>> >>>>> @Liang, I'll describe current test case in details: >>>>> 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). >>>>> 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. >>>>> 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes >>>>> at 0x400 and 0x810 are updated. All other bytes still 0xff. >>>>> 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this >>>>> happens because from controller point of view ECC codes are invalid for current >>>>> data (all ECCs are 0xff). Is this behaviour is ok? >>>> >>>> Yes, it is exactly reported ECC errors. >>> >>> I see, so if we write OOB (e.g. using raw mode), there is no way to read this page in ECC mode later? And the > > Of course, there is no ECC parity bytes in it; or raw write the data with the ECC parity bytes per the layout (A) you describe above. > But don't it looks like strange? Just writing OOB makes page unreadable? May be it is better to move OOB data out of ECC area as I suggested in v2? >>> only way to make it readable is to write it in ECC mode, but before this write, we need to read it's >>> user's byte (from previous OOB write) in raw mode, put it to info buf (as user's bytes) and write this page. In this >>> case NAND controller will generate ECC codes including user's byte and page become readable in ECC mode >>> again. > > yes, you are right. > >>> >>>> >>>>> 5) Ok, don't care on these ECC errors, let's go further. >>>>> 6) I'm going to write same page in ECC mode - how to do it correctly? There is already >>>>> 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). >>>> >>>> If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. >>>> >>>> I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? >> >> >>> >>> I just checked JFFS2 mount/umount again, here is what i see: >>> 0) First attempt to mount JFFS2. >>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >>> 2) Umount JFFS2. Done. >>> 3) Second attempt to mount JFFS2. >>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >>> 5) It reads page N in ECC mode, and i get: >>> jffs2: mtd->read(0x100 bytes from N) returned ECC error >>> 6) Mount failed. >>> >>> We already had problem which looks like this on another device. Solution was to use OOB area which is >>> not covered by ECC for JFFS2 cleanmarkers. > > ok, so there is not ECC parity bytes and mtd->read() returns ECC error. > does it have to use raw write/read on step 1) and 4)? > If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 writes to such pages later) - we can't update ECC codes properly without erasing whole page. Write operation will be done without problem, but read will trigger ECC errors due to broken ECC codes. In general problem that we discuss is that in current implementation data and OOB conflicts with each other by sharing same ECC codes, these ECC codes could be written only once (without erasing), while data and OOB has different callbacks to access and thus supposed to work separately. What do You think? Thanks, Arseniy >>> >>> Thanks, Arseniy >>> >> >> @Liang, >> >> Small addition, if i'm trying to implement OOB read/write in ECC mode, then step 5) will success, >> but later, JFFS2 tries to write this page (in ECC mode of course), and in this case ECC codes will >> be broken, because we can't update them properly without erasing whole page. >> >> Please take a look at this patch from my colleagues: >> https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru >> It is related with "We already had problem which looks like this on another device" from above: >> in 'f50l1g41lb_ooblayout_free()' we reserve 2 bytes in non-ECC area for bad block markers. > > It is about the SPI NAND and use another controller called spifc. > but in meson nfc, it is heavily depended on the pre-created BBT in NAND device. > > >> >> Thanks, Arseniy >> >> >>>> >>>>> >>>>> That's why I asked Your opinion about moving OOB data to nonprotected by ECC area (and >>>>> leave user's bytes untouched). In this case OOB access is free and not linked with ECC >>>>> codes which also covers data. >>>>> >>>>> Thanks, Arseniy >>>>> >>>>>> >>>>>> A page is written after two steps: >>>>>> - loading the data into the NAND chip cache (that's when you use the >>>>>> bus) >>>>>> - programming the NAND array with the data loaded in cache (that's when >>>>>> you wait) >>>>>> >>>>>> In theory it does not matter where you write in the cache, it's regular >>>>>> DRAM, you can make random writes there with the appropriate NAND >>>>>> commands. Of course when using embedded hardware ECC engines, the >>>>>> controllers usually expect to be fed in a certain way in order to >>>>>> produce the ECC bytes and put them at the right location in cache. >>>>>> >>>>>> And then, when you actually send the "program" command, the NAND cells >>>>>> actually get programmed based on what has been loaded in cache. >>>>> >>>>> Thanks for this details! Very interesting! >>>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Miquèl >>>>>
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Wed, 26 Apr 2023 17:46:19 +0300: > On 26.04.2023 16:51, Liang Yang wrote: > > Hi Arseniy, > > > > On 2023/4/20 17:37, Arseniy Krasnov wrote: > >> [ EXTERNAL EMAIL ] > >> > >> On 19.04.2023 09:41, Arseniy Krasnov wrote: > >>> > >>> > >>> On 19.04.2023 06:05, Liang Yang wrote: > >>>> Hi Arseniy, > >>>> > >>>> On 2023/4/18 22:57, Arseniy Krasnov wrote: > >>>>> [ EXTERNAL EMAIL ] > >>>>> > >>>>> > >>>>> > >>>>> On 18.04.2023 16:25, Miquel Raynal wrote: > >>>>>> Hi Arseniy, > >>>>>> > >>>>>>>>> Hello again @Liang @Miquel! > >>>>>>>>> > >>>>>>>>> One more question about OOB access, as I can see current driver uses the following > >>>>>>>>> callbacks: > >>>>>>>>> > >>>>>>>>> nand->ecc.write_oob_raw = nand_write_oob_std; > >>>>>>>>> nand->ecc.write_oob = nand_write_oob_std; > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I > >>>>>>>>> can see by dumping 'data_buf' during read, physical layout of each page is the > >>>>>>>>> following (1KB ECC): > >>>>>>>>> > >>>>>>>>> 0x000: [ 1 KB of data ] > >>>>>>>>> 0x400: [ 2B user data] [ 14B ECC code] > >>>>>>>>> 0x410: [ 1 KB of data ] (A) > >>>>>>>>> 0x810: [ 2B user data] [ 14B ECC code] > >>>>>>>>> 0x820: [ 32B unused ] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), > >>>>>>>>> page will look like this: > >>>>>>>>> > >>>>>>>>> 0x000: [ 0xFF ] > >>>>>>>>> 0x400: [ ........ ] > >>>>>>>>> 0x7f0: [ 0xFF ] > >>>>>>>>> 0x800: [ 00 ....................... ] > >>>>>>>>> 0x830: [ ........................ 3f ] > >>>>>>>>> > >>>>>>>>> Here we have two problems: > >>>>>>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit > >>>>>>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data > >>>>>>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): > >>>>>>>>> > >>>>>>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| > >>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >>>>>>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| > >>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >>>>>>>>> > >>>>>>>> Hi Arseniy, > >>>>>>>> > >>>>>>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). > >>>>>>> > >>>>>>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so > >>>>>>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly > >>>>>>> written data (we can't update bits from 0 to 1). > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty > >>>>>>>>> state (I mean was written at least once) and NAND controller tries to use > >>>>>>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus > >>>>>>>> > >>>>>>>> As i said above, write_oob_raw() and write_oob() should be reworked. > >>>>>>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). > >>>>>>> > >>>>>>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example > >>>>>>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. > >>>>> > >>>>> @Liang, I'll describe current test case in details: > >>>>> 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). > >>>>> 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. > >>>>> 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes > >>>>> at 0x400 and 0x810 are updated. All other bytes still 0xff. > >>>>> 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this > >>>>> happens because from controller point of view ECC codes are invalid for current > >>>>> data (all ECCs are 0xff). Is this behaviour is ok? > >>>> > >>>> Yes, it is exactly reported ECC errors. > >>> > >>> I see, so if we write OOB (e.g. using raw mode), there is no way to read this page in ECC mode later? And the > > > > Of course, there is no ECC parity bytes in it; or raw write the data with the ECC parity bytes per the layout (A) you describe above. > > > > But don't it looks like strange? Just writing OOB makes page unreadable? May be it is better to move OOB data > out of ECC area as I suggested in v2? > > >>> only way to make it readable is to write it in ECC mode, but before this write, we need to read it's > >>> user's byte (from previous OOB write) in raw mode, put it to info buf (as user's bytes) and write this page. In this > >>> case NAND controller will generate ECC codes including user's byte and page become readable in ECC mode > >>> again. > > > > yes, you are right. > > > >>> > >>>> > >>>>> 5) Ok, don't care on these ECC errors, let's go further. > >>>>> 6) I'm going to write same page in ECC mode - how to do it correctly? There is already > >>>>> 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). > >>>> > >>>> If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. > >>>> > >>>> I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? > >> > >> > >>> > >>> I just checked JFFS2 mount/umount again, here is what i see: > >>> 0) First attempt to mount JFFS2. > >>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. > >>> 2) Umount JFFS2. Done. > >>> 3) Second attempt to mount JFFS2. > >>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. > >>> 5) It reads page N in ECC mode, and i get: > >>> jffs2: mtd->read(0x100 bytes from N) returned ECC error > >>> 6) Mount failed. > >>> > >>> We already had problem which looks like this on another device. Solution was to use OOB area which is > >>> not covered by ECC for JFFS2 cleanmarkers. > > > > ok, so there is not ECC parity bytes and mtd->read() returns ECC error. > > does it have to use raw write/read on step 1) and 4)? > > > > If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then > steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 > writes to such pages later) - we can't update ECC codes properly without erasing whole page. > Write operation will be done without problem, but read will trigger ECC errors due to broken > ECC codes. > > In general problem that we discuss is that in current implementation data and OOB conflicts > with each other by sharing same ECC codes, these ECC codes could be written only once (without > erasing), while data and OOB has different callbacks to access and thus supposed to work > separately. The fact that there might be helpers just for writing OOB areas or just in-band areas are optimizations. NAND pages are meant to be written a single time, no matter what portion you write. In some cases, it is possible to perform subpage writes if the chip supports it. Pages may be split into several areas which cover a partial in-band area *and* a partial OOB area. If you write into the in-band *or* out-of-band areas of a given subpage, you *cannot* write the other part later without erasing. Thanks, Miquèl
On 02.05.2023 12:59, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Wed, 26 Apr 2023 17:46:19 +0300: > >> On 26.04.2023 16:51, Liang Yang wrote: >>> Hi Arseniy, >>> >>> On 2023/4/20 17:37, Arseniy Krasnov wrote: >>>> [ EXTERNAL EMAIL ] >>>> >>>> On 19.04.2023 09:41, Arseniy Krasnov wrote: >>>>> >>>>> >>>>> On 19.04.2023 06:05, Liang Yang wrote: >>>>>> Hi Arseniy, >>>>>> >>>>>> On 2023/4/18 22:57, Arseniy Krasnov wrote: >>>>>>> [ EXTERNAL EMAIL ] >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 18.04.2023 16:25, Miquel Raynal wrote: >>>>>>>> Hi Arseniy, >>>>>>>> >>>>>>>>>>> Hello again @Liang @Miquel! >>>>>>>>>>> >>>>>>>>>>> One more question about OOB access, as I can see current driver uses the following >>>>>>>>>>> callbacks: >>>>>>>>>>> >>>>>>>>>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>>>>>>>>> nand->ecc.write_oob = nand_write_oob_std; >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>>>>>>>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>>>>>>>>> following (1KB ECC): >>>>>>>>>>> >>>>>>>>>>> 0x000: [ 1 KB of data ] >>>>>>>>>>> 0x400: [ 2B user data] [ 14B ECC code] >>>>>>>>>>> 0x410: [ 1 KB of data ] (A) >>>>>>>>>>> 0x810: [ 2B user data] [ 14B ECC code] >>>>>>>>>>> 0x820: [ 32B unused ] >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>>>>>>>>> page will look like this: >>>>>>>>>>> >>>>>>>>>>> 0x000: [ 0xFF ] >>>>>>>>>>> 0x400: [ ........ ] >>>>>>>>>>> 0x7f0: [ 0xFF ] >>>>>>>>>>> 0x800: [ 00 ....................... ] >>>>>>>>>>> 0x830: [ ........................ 3f ] >>>>>>>>>>> >>>>>>>>>>> Here we have two problems: >>>>>>>>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>>>>>>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>>>>>>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>>>>>>>>> >>>>>>>>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>>>> >>>>>>>>>> Hi Arseniy, >>>>>>>>>> >>>>>>>>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >>>>>>>>> >>>>>>>>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >>>>>>>>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >>>>>>>>> written data (we can't update bits from 0 to 1). >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>>>>>>>>> state (I mean was written at least once) and NAND controller tries to use >>>>>>>>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>>>>>>>>> >>>>>>>>>> As i said above, write_oob_raw() and write_oob() should be reworked. >>>>>>>>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >>>>>>>>> >>>>>>>>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >>>>>>>>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. >>>>>>> >>>>>>> @Liang, I'll describe current test case in details: >>>>>>> 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). >>>>>>> 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. >>>>>>> 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes >>>>>>> at 0x400 and 0x810 are updated. All other bytes still 0xff. >>>>>>> 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this >>>>>>> happens because from controller point of view ECC codes are invalid for current >>>>>>> data (all ECCs are 0xff). Is this behaviour is ok? >>>>>> >>>>>> Yes, it is exactly reported ECC errors. >>>>> >>>>> I see, so if we write OOB (e.g. using raw mode), there is no way to read this page in ECC mode later? And the >>> >>> Of course, there is no ECC parity bytes in it; or raw write the data with the ECC parity bytes per the layout (A) you describe above. >>> >> >> But don't it looks like strange? Just writing OOB makes page unreadable? May be it is better to move OOB data >> out of ECC area as I suggested in v2? >> >>>>> only way to make it readable is to write it in ECC mode, but before this write, we need to read it's >>>>> user's byte (from previous OOB write) in raw mode, put it to info buf (as user's bytes) and write this page. In this >>>>> case NAND controller will generate ECC codes including user's byte and page become readable in ECC mode >>>>> again. >>> >>> yes, you are right. >>> >>>>> >>>>>> >>>>>>> 5) Ok, don't care on these ECC errors, let's go further. >>>>>>> 6) I'm going to write same page in ECC mode - how to do it correctly? There is already >>>>>>> 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). >>>>>> >>>>>> If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. >>>>>> >>>>>> I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? >>>> >>>> >>>>> >>>>> I just checked JFFS2 mount/umount again, here is what i see: >>>>> 0) First attempt to mount JFFS2. >>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >>>>> 2) Umount JFFS2. Done. >>>>> 3) Second attempt to mount JFFS2. >>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >>>>> 5) It reads page N in ECC mode, and i get: >>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error >>>>> 6) Mount failed. >>>>> >>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is >>>>> not covered by ECC for JFFS2 cleanmarkers. >>> >>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. >>> does it have to use raw write/read on step 1) and 4)? >>> >> >> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then >> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 >> writes to such pages later) - we can't update ECC codes properly without erasing whole page. >> Write operation will be done without problem, but read will trigger ECC errors due to broken >> ECC codes. >> >> In general problem that we discuss is that in current implementation data and OOB conflicts >> with each other by sharing same ECC codes, these ECC codes could be written only once (without >> erasing), while data and OOB has different callbacks to access and thus supposed to work >> separately. > > The fact that there might be helpers just for writing OOB areas or just > in-band areas are optimizations. NAND pages are meant to be written a > single time, no matter what portion you write. In some cases, it is > possible to perform subpage writes if the chip supports it. Pages may > be split into several areas which cover a partial in-band area *and* a > partial OOB area. If you write into the in-band *or* out-of-band areas > of a given subpage, you *cannot* write the other part later without Thanks for details! So in case of JFFS2 it looks like strange, that it tries to write page after writing clean markers to it before? In the old vendor's driver OOB write callback is suppressed by return 0 always and JFFS2 works correctly. Thanks, Arseniy > erasing. > > Thanks, > Miquèl
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 13:11:38 +0300: > On 02.05.2023 12:59, Miquel Raynal wrote: > > Hi Arseniy, > > > > avkrasnov@sberdevices.ru wrote on Wed, 26 Apr 2023 17:46:19 +0300: > > > >> On 26.04.2023 16:51, Liang Yang wrote: > >>> Hi Arseniy, > >>> > >>> On 2023/4/20 17:37, Arseniy Krasnov wrote: > >>>> [ EXTERNAL EMAIL ] > >>>> > >>>> On 19.04.2023 09:41, Arseniy Krasnov wrote: > >>>>> > >>>>> > >>>>> On 19.04.2023 06:05, Liang Yang wrote: > >>>>>> Hi Arseniy, > >>>>>> > >>>>>> On 2023/4/18 22:57, Arseniy Krasnov wrote: > >>>>>>> [ EXTERNAL EMAIL ] > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 18.04.2023 16:25, Miquel Raynal wrote: > >>>>>>>> Hi Arseniy, > >>>>>>>> > >>>>>>>>>>> Hello again @Liang @Miquel! > >>>>>>>>>>> > >>>>>>>>>>> One more question about OOB access, as I can see current driver uses the following > >>>>>>>>>>> callbacks: > >>>>>>>>>>> > >>>>>>>>>>> nand->ecc.write_oob_raw = nand_write_oob_std; > >>>>>>>>>>> nand->ecc.write_oob = nand_write_oob_std; > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I > >>>>>>>>>>> can see by dumping 'data_buf' during read, physical layout of each page is the > >>>>>>>>>>> following (1KB ECC): > >>>>>>>>>>> > >>>>>>>>>>> 0x000: [ 1 KB of data ] > >>>>>>>>>>> 0x400: [ 2B user data] [ 14B ECC code] > >>>>>>>>>>> 0x410: [ 1 KB of data ] (A) > >>>>>>>>>>> 0x810: [ 2B user data] [ 14B ECC code] > >>>>>>>>>>> 0x820: [ 32B unused ] > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), > >>>>>>>>>>> page will look like this: > >>>>>>>>>>> > >>>>>>>>>>> 0x000: [ 0xFF ] > >>>>>>>>>>> 0x400: [ ........ ] > >>>>>>>>>>> 0x7f0: [ 0xFF ] > >>>>>>>>>>> 0x800: [ 00 ....................... ] > >>>>>>>>>>> 0x830: [ ........................ 3f ] > >>>>>>>>>>> > >>>>>>>>>>> Here we have two problems: > >>>>>>>>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit > >>>>>>>>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data > >>>>>>>>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): > >>>>>>>>>>> > >>>>>>>>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| > >>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >>>>>>>>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| > >>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| > >>>>>>>>>>> > >>>>>>>>>> Hi Arseniy, > >>>>>>>>>> > >>>>>>>>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). > >>>>>>>>> > >>>>>>>>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so > >>>>>>>>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly > >>>>>>>>> written data (we can't update bits from 0 to 1). > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty > >>>>>>>>>>> state (I mean was written at least once) and NAND controller tries to use > >>>>>>>>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus > >>>>>>>>>> > >>>>>>>>>> As i said above, write_oob_raw() and write_oob() should be reworked. > >>>>>>>>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). > >>>>>>>>> > >>>>>>>>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example > >>>>>>>>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. > >>>>>>> > >>>>>>> @Liang, I'll describe current test case in details: > >>>>>>> 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). > >>>>>>> 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. > >>>>>>> 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes > >>>>>>> at 0x400 and 0x810 are updated. All other bytes still 0xff. > >>>>>>> 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this > >>>>>>> happens because from controller point of view ECC codes are invalid for current > >>>>>>> data (all ECCs are 0xff). Is this behaviour is ok? > >>>>>> > >>>>>> Yes, it is exactly reported ECC errors. > >>>>> > >>>>> I see, so if we write OOB (e.g. using raw mode), there is no way to read this page in ECC mode later? And the > >>> > >>> Of course, there is no ECC parity bytes in it; or raw write the data with the ECC parity bytes per the layout (A) you describe above. > >>> > >> > >> But don't it looks like strange? Just writing OOB makes page unreadable? May be it is better to move OOB data > >> out of ECC area as I suggested in v2? > >> > >>>>> only way to make it readable is to write it in ECC mode, but before this write, we need to read it's > >>>>> user's byte (from previous OOB write) in raw mode, put it to info buf (as user's bytes) and write this page. In this > >>>>> case NAND controller will generate ECC codes including user's byte and page become readable in ECC mode > >>>>> again. > >>> > >>> yes, you are right. > >>> > >>>>> > >>>>>> > >>>>>>> 5) Ok, don't care on these ECC errors, let's go further. > >>>>>>> 6) I'm going to write same page in ECC mode - how to do it correctly? There is already > >>>>>>> 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). > >>>>>> > >>>>>> If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. > >>>>>> > >>>>>> I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? > >>>> > >>>> > >>>>> > >>>>> I just checked JFFS2 mount/umount again, here is what i see: > >>>>> 0) First attempt to mount JFFS2. > >>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. > >>>>> 2) Umount JFFS2. Done. > >>>>> 3) Second attempt to mount JFFS2. > >>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. > >>>>> 5) It reads page N in ECC mode, and i get: > >>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error > >>>>> 6) Mount failed. > >>>>> > >>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is > >>>>> not covered by ECC for JFFS2 cleanmarkers. > >>> > >>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. > >>> does it have to use raw write/read on step 1) and 4)? > >>> > >> > >> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then > >> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 > >> writes to such pages later) - we can't update ECC codes properly without erasing whole page. > >> Write operation will be done without problem, but read will trigger ECC errors due to broken > >> ECC codes. > >> > >> In general problem that we discuss is that in current implementation data and OOB conflicts > >> with each other by sharing same ECC codes, these ECC codes could be written only once (without > >> erasing), while data and OOB has different callbacks to access and thus supposed to work > >> separately. > > > > The fact that there might be helpers just for writing OOB areas or just > > in-band areas are optimizations. NAND pages are meant to be written a > > single time, no matter what portion you write. In some cases, it is > > possible to perform subpage writes if the chip supports it. Pages may > > be split into several areas which cover a partial in-band area *and* a > > partial OOB area. If you write into the in-band *or* out-of-band areas > > of a given subpage, you *cannot* write the other part later without > > Thanks for details! So in case of JFFS2 it looks like strange, that it tries > to write page after writing clean markers to it before? In the old vendor's > driver OOB write callback is suppressed by return 0 always and JFFS2 works > correctly. Can you point the code you're mentioning? (both what JFFS2 which looks strange to you and the old vendor hack) Thanks, Miquèl
On 02.05.2023 14:27, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 13:11:38 +0300: > >> On 02.05.2023 12:59, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>> avkrasnov@sberdevices.ru wrote on Wed, 26 Apr 2023 17:46:19 +0300: >>> >>>> On 26.04.2023 16:51, Liang Yang wrote: >>>>> Hi Arseniy, >>>>> >>>>> On 2023/4/20 17:37, Arseniy Krasnov wrote: >>>>>> [ EXTERNAL EMAIL ] >>>>>> >>>>>> On 19.04.2023 09:41, Arseniy Krasnov wrote: >>>>>>> >>>>>>> >>>>>>> On 19.04.2023 06:05, Liang Yang wrote: >>>>>>>> Hi Arseniy, >>>>>>>> >>>>>>>> On 2023/4/18 22:57, Arseniy Krasnov wrote: >>>>>>>>> [ EXTERNAL EMAIL ] >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 18.04.2023 16:25, Miquel Raynal wrote: >>>>>>>>>> Hi Arseniy, >>>>>>>>>> >>>>>>>>>>>>> Hello again @Liang @Miquel! >>>>>>>>>>>>> >>>>>>>>>>>>> One more question about OOB access, as I can see current driver uses the following >>>>>>>>>>>>> callbacks: >>>>>>>>>>>>> >>>>>>>>>>>>> nand->ecc.write_oob_raw = nand_write_oob_std; >>>>>>>>>>>>> nand->ecc.write_oob = nand_write_oob_std; >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Function 'nand_write_oob_std()' writes data to the end of the page. But as I >>>>>>>>>>>>> can see by dumping 'data_buf' during read, physical layout of each page is the >>>>>>>>>>>>> following (1KB ECC): >>>>>>>>>>>>> >>>>>>>>>>>>> 0x000: [ 1 KB of data ] >>>>>>>>>>>>> 0x400: [ 2B user data] [ 14B ECC code] >>>>>>>>>>>>> 0x410: [ 1 KB of data ] (A) >>>>>>>>>>>>> 0x810: [ 2B user data] [ 14B ECC code] >>>>>>>>>>>>> 0x820: [ 32B unused ] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]), >>>>>>>>>>>>> page will look like this: >>>>>>>>>>>>> >>>>>>>>>>>>> 0x000: [ 0xFF ] >>>>>>>>>>>>> 0x400: [ ........ ] >>>>>>>>>>>>> 0x7f0: [ 0xFF ] >>>>>>>>>>>>> 0x800: [ 00 ....................... ] >>>>>>>>>>>>> 0x830: [ ........................ 3f ] >>>>>>>>>>>>> >>>>>>>>>>>>> Here we have two problems: >>>>>>>>>>>>> 1) Attempt to display raw data by 'nanddump' utility produces a little bit >>>>>>>>>>>>> invalid output, as driver relies on layout (A) from above. E.g. OOB data >>>>>>>>>>>>> is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44): >>>>>>>>>>>>> >>>>>>>>>>>>> 0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............| >>>>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>>>>>> OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............| >>>>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>>>>>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| >>>>>>>>>>>>> >>>>>>>>>>>> Hi Arseniy, >>>>>>>>>>>> >>>>>>>>>>>> I realized the write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them should be reworked and follow the format of meson nand controller. i.e. firstly format the data in Layout (A) and then write. reading is firstly reading the data of layout (A) and then compost the layout (B). >>>>>>>>>>> >>>>>>>>>>> IIUC after such writing only OOB (e.g. user bytes) according layout (A), hw will also write ECC codes, so >>>>>>>>>>> it will be impossible to write data to this page later, because we cannot update ECC codes properly for the newly >>>>>>>>>>> written data (we can't update bits from 0 to 1). >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty >>>>>>>>>>>>> state (I mean was written at least once) and NAND controller tries to use >>>>>>>>>>>>> ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus >>>>>>>>>>>> >>>>>>>>>>>> As i said above, write_oob_raw() and write_oob() should be reworked. >>>>>>>>>>>> i don't know what do you mean page was written at least once. anyway the page should be written once, even just write_oob_raw(). >>>>>>>>>>> >>>>>>>>>>> Sorry, You mean that after OOB write, we cannot write to the data area (e.g. 0x0 .. 0x810) until page will be erased? For example >>>>>>>>>>> JFFS2 writes to OOB own markers, then it tries to write to the data area of such page. >>>>>>>>> >>>>>>>>> @Liang, I'll describe current test case in details: >>>>>>>>> 1) I have erased page, I can read it in both raw and ecc modes - no problem (it is full of 0xFF). >>>>>>>>> 2) I (JFFS2 for example) want to write only OOB - let it be clean markers. >>>>>>>>> 3) I use raw write to the needed page (please correct me if i'm wrong). Four bytes >>>>>>>>> at 0x400 and 0x810 are updated. All other bytes still 0xff. >>>>>>>>> 4) Now, when i'm trying to read this page in ECC mode, I get ECC errors: IIUC this >>>>>>>>> happens because from controller point of view ECC codes are invalid for current >>>>>>>>> data (all ECCs are 0xff). Is this behaviour is ok? >>>>>>>> >>>>>>>> Yes, it is exactly reported ECC errors. >>>>>>> >>>>>>> I see, so if we write OOB (e.g. using raw mode), there is no way to read this page in ECC mode later? And the >>>>> >>>>> Of course, there is no ECC parity bytes in it; or raw write the data with the ECC parity bytes per the layout (A) you describe above. >>>>> >>>> >>>> But don't it looks like strange? Just writing OOB makes page unreadable? May be it is better to move OOB data >>>> out of ECC area as I suggested in v2? >>>> >>>>>>> only way to make it readable is to write it in ECC mode, but before this write, we need to read it's >>>>>>> user's byte (from previous OOB write) in raw mode, put it to info buf (as user's bytes) and write this page. In this >>>>>>> case NAND controller will generate ECC codes including user's byte and page become readable in ECC mode >>>>>>> again. >>>>> >>>>> yes, you are right. >>>>> >>>>>>> >>>>>>>> >>>>>>>>> 5) Ok, don't care on these ECC errors, let's go further. >>>>>>>>> 6) I'm going to write same page in ECC mode - how to do it correctly? There is already >>>>>>>>> 4 OOB bytes, considered to be covered by ECC (but in fact now - ECC area is FFed). >>>>>>>> >>>>>>>> If step 4 has excuted "program" command at the page (nand_write_oob_std() does), it can't be written again before erasing the page(block). so we have to read the whole page in the ddr and change the content, erase block, write it again. >>>>>>>> >>>>>>>> I don't think Jffs2 has the same steps (1-6) as you said above. are you sure that happes on Jffs2 or just an example? >>>>>> >>>>>> >>>>>>> >>>>>>> I just checked JFFS2 mount/umount again, here is what i see: >>>>>>> 0) First attempt to mount JFFS2. >>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >>>>>>> 2) Umount JFFS2. Done. >>>>>>> 3) Second attempt to mount JFFS2. >>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >>>>>>> 5) It reads page N in ECC mode, and i get: >>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error >>>>>>> 6) Mount failed. >>>>>>> >>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is >>>>>>> not covered by ECC for JFFS2 cleanmarkers. >>>>> >>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. >>>>> does it have to use raw write/read on step 1) and 4)? >>>>> >>>> >>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then >>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 >>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. >>>> Write operation will be done without problem, but read will trigger ECC errors due to broken >>>> ECC codes. >>>> >>>> In general problem that we discuss is that in current implementation data and OOB conflicts >>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without >>>> erasing), while data and OOB has different callbacks to access and thus supposed to work >>>> separately. >>> >>> The fact that there might be helpers just for writing OOB areas or just >>> in-band areas are optimizations. NAND pages are meant to be written a >>> single time, no matter what portion you write. In some cases, it is >>> possible to perform subpage writes if the chip supports it. Pages may >>> be split into several areas which cover a partial in-band area *and* a >>> partial OOB area. If you write into the in-band *or* out-of-band areas >>> of a given subpage, you *cannot* write the other part later without >> >> Thanks for details! So in case of JFFS2 it looks like strange, that it tries >> to write page after writing clean markers to it before? In the old vendor's >> driver OOB write callback is suppressed by return 0 always and JFFS2 works >> correctly. > > Can you point the code you're mentioning? (both what JFFS2 which looks > strange to you and the old vendor hack) Here is version of the old vendor's driver: https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 In my version there is no BUG() there, but it is same driver for the same chip. About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so may be it is unexpected behaviour of JFFS2? Thanks, Arseniy > > > Thanks, > Miquèl
Hi Arseniy, Richard, your input is welcome below :-) > >>>>>>> I just checked JFFS2 mount/umount again, here is what i see: > >>>>>>> 0) First attempt to mount JFFS2. > >>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. > >>>>>>> 2) Umount JFFS2. Done. > >>>>>>> 3) Second attempt to mount JFFS2. > >>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. > >>>>>>> 5) It reads page N in ECC mode, and i get: > >>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error > >>>>>>> 6) Mount failed. > >>>>>>> > >>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is > >>>>>>> not covered by ECC for JFFS2 cleanmarkers. > >>>>> > >>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. > >>>>> does it have to use raw write/read on step 1) and 4)? > >>>>> > >>>> > >>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then > >>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 > >>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. > >>>> Write operation will be done without problem, but read will trigger ECC errors due to broken > >>>> ECC codes. > >>>> > >>>> In general problem that we discuss is that in current implementation data and OOB conflicts > >>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without > >>>> erasing), while data and OOB has different callbacks to access and thus supposed to work > >>>> separately. > >>> > >>> The fact that there might be helpers just for writing OOB areas or just > >>> in-band areas are optimizations. NAND pages are meant to be written a > >>> single time, no matter what portion you write. In some cases, it is > >>> possible to perform subpage writes if the chip supports it. Pages may > >>> be split into several areas which cover a partial in-band area *and* a > >>> partial OOB area. If you write into the in-band *or* out-of-band areas > >>> of a given subpage, you *cannot* write the other part later without > >> > >> Thanks for details! So in case of JFFS2 it looks like strange, that it tries > >> to write page after writing clean markers to it before? In the old vendor's > >> driver OOB write callback is suppressed by return 0 always and JFFS2 works > >> correctly. > > > > Can you point the code you're mentioning? (both what JFFS2 which looks > > strange to you and the old vendor hack) > > Here is version of the old vendor's driver: > > https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 > > In my version there is no BUG() there, but it is same driver for the same chip. > > About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first > tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so > may be it is unexpected behaviour of JFFS2? TBH I am not knowledgeable about JFFS2, maybe Richard can help here. Are you sure you flash is recognized by JFFS2 as being a NAND device? Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because cleanmarker seem to be discarded when using a NAND device, and recognizing the device as a NAND device requires the above option to be set apparently. Thanks, Miquèl
On 02.05.2023 15:17, Miquel Raynal wrote: > Hi Arseniy, > > Richard, your input is welcome below :-) > >>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see: >>>>>>>>> 0) First attempt to mount JFFS2. >>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >>>>>>>>> 2) Umount JFFS2. Done. >>>>>>>>> 3) Second attempt to mount JFFS2. >>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >>>>>>>>> 5) It reads page N in ECC mode, and i get: >>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error >>>>>>>>> 6) Mount failed. >>>>>>>>> >>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is >>>>>>>>> not covered by ECC for JFFS2 cleanmarkers. >>>>>>> >>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. >>>>>>> does it have to use raw write/read on step 1) and 4)? >>>>>>> >>>>>> >>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then >>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 >>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. >>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken >>>>>> ECC codes. >>>>>> >>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts >>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without >>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work >>>>>> separately. >>>>> >>>>> The fact that there might be helpers just for writing OOB areas or just >>>>> in-band areas are optimizations. NAND pages are meant to be written a >>>>> single time, no matter what portion you write. In some cases, it is >>>>> possible to perform subpage writes if the chip supports it. Pages may >>>>> be split into several areas which cover a partial in-band area *and* a >>>>> partial OOB area. If you write into the in-band *or* out-of-band areas >>>>> of a given subpage, you *cannot* write the other part later without >>>> >>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries >>>> to write page after writing clean markers to it before? In the old vendor's >>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works >>>> correctly. >>> >>> Can you point the code you're mentioning? (both what JFFS2 which looks >>> strange to you and the old vendor hack) >> >> Here is version of the old vendor's driver: >> >> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 >> >> In my version there is no BUG() there, but it is same driver for the same chip. >> >> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first >> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so >> may be it is unexpected behaviour of JFFS2? > > TBH I am not knowledgeable about JFFS2, maybe Richard can help here. > > Are you sure you flash is recognized by JFFS2 as being a NAND device? > Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because > cleanmarker seem to be discarded when using a NAND device, and > recognizing the device as a NAND device requires the above option to be > set apparently. Yes, I have CONFIG_JFFS2_FS_WRITEBUFFER=y And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing: cleanmarkers are not discarded with NAND device. Thanks, Arseniy > > Thanks, > Miquèl
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300: > On 02.05.2023 15:17, Miquel Raynal wrote: > > Hi Arseniy, > > > > Richard, your input is welcome below :-) > > > >>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see: > >>>>>>>>> 0) First attempt to mount JFFS2. > >>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. > >>>>>>>>> 2) Umount JFFS2. Done. > >>>>>>>>> 3) Second attempt to mount JFFS2. > >>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. > >>>>>>>>> 5) It reads page N in ECC mode, and i get: > >>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error > >>>>>>>>> 6) Mount failed. > >>>>>>>>> > >>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is > >>>>>>>>> not covered by ECC for JFFS2 cleanmarkers. > >>>>>>> > >>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. > >>>>>>> does it have to use raw write/read on step 1) and 4)? > >>>>>>> > >>>>>> > >>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then > >>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 > >>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. > >>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken > >>>>>> ECC codes. > >>>>>> > >>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts > >>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without > >>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work > >>>>>> separately. > >>>>> > >>>>> The fact that there might be helpers just for writing OOB areas or just > >>>>> in-band areas are optimizations. NAND pages are meant to be written a > >>>>> single time, no matter what portion you write. In some cases, it is > >>>>> possible to perform subpage writes if the chip supports it. Pages may > >>>>> be split into several areas which cover a partial in-band area *and* a > >>>>> partial OOB area. If you write into the in-band *or* out-of-band areas > >>>>> of a given subpage, you *cannot* write the other part later without > >>>> > >>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries > >>>> to write page after writing clean markers to it before? In the old vendor's > >>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works > >>>> correctly. > >>> > >>> Can you point the code you're mentioning? (both what JFFS2 which looks > >>> strange to you and the old vendor hack) > >> > >> Here is version of the old vendor's driver: > >> > >> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 > >> > >> In my version there is no BUG() there, but it is same driver for the same chip. > >> > >> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first > >> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so > >> may be it is unexpected behaviour of JFFS2? > > > > TBH I am not knowledgeable about JFFS2, maybe Richard can help here. > > > > Are you sure you flash is recognized by JFFS2 as being a NAND device? > > Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because > > cleanmarker seem to be discarded when using a NAND device, and > > recognizing the device as a NAND device requires the above option to be > > set apparently. > > Yes, I have > > CONFIG_JFFS2_FS_WRITEBUFFER=y > > And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This > check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing: > cleanmarkers are not discarded with NAND device. Excellent. So when cleanmarker_size == 0, it means there is no cleanmarker. But if it is a NAND device, we write the marker anyway. Well I guess it used to work on old controllers using a Hamming ECC engine not protecting any user OOB bytes, so writing the clean markers would simply not lead to ECC bytes being produced/written. Or it might have worked as well on controller drivers not enabling the ECC engine when performing OOB-only writes. It also requires the chip to be old enough to support multiple writes on the same (sub)page as long as the written bits do not overlap? Perhaps that's what the hack in the old driver is for. But that's IMHO broken in case of unexpected reboot :-) Miquèl
On 02.05.2023 16:05, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300: > >> On 02.05.2023 15:17, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>> Richard, your input is welcome below :-) >>> >>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see: >>>>>>>>>>> 0) First attempt to mount JFFS2. >>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >>>>>>>>>>> 2) Umount JFFS2. Done. >>>>>>>>>>> 3) Second attempt to mount JFFS2. >>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >>>>>>>>>>> 5) It reads page N in ECC mode, and i get: >>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error >>>>>>>>>>> 6) Mount failed. >>>>>>>>>>> >>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is >>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers. >>>>>>>>> >>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. >>>>>>>>> does it have to use raw write/read on step 1) and 4)? >>>>>>>>> >>>>>>>> >>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then >>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 >>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. >>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken >>>>>>>> ECC codes. >>>>>>>> >>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts >>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without >>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work >>>>>>>> separately. >>>>>>> >>>>>>> The fact that there might be helpers just for writing OOB areas or just >>>>>>> in-band areas are optimizations. NAND pages are meant to be written a >>>>>>> single time, no matter what portion you write. In some cases, it is >>>>>>> possible to perform subpage writes if the chip supports it. Pages may >>>>>>> be split into several areas which cover a partial in-band area *and* a >>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas >>>>>>> of a given subpage, you *cannot* write the other part later without >>>>>> >>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries >>>>>> to write page after writing clean markers to it before? In the old vendor's >>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works >>>>>> correctly. >>>>> >>>>> Can you point the code you're mentioning? (both what JFFS2 which looks >>>>> strange to you and the old vendor hack) >>>> >>>> Here is version of the old vendor's driver: >>>> >>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 >>>> >>>> In my version there is no BUG() there, but it is same driver for the same chip. >>>> >>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first >>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so >>>> may be it is unexpected behaviour of JFFS2? >>> >>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here. >>> >>> Are you sure you flash is recognized by JFFS2 as being a NAND device? >>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because >>> cleanmarker seem to be discarded when using a NAND device, and >>> recognizing the device as a NAND device requires the above option to be >>> set apparently. >> >> Yes, I have >> >> CONFIG_JFFS2_FS_WRITEBUFFER=y >> >> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This >> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing: >> cleanmarkers are not discarded with NAND device. > > Excellent. So when cleanmarker_size == 0, it means there is no > cleanmarker. But if it is a NAND device, we write the marker anyway. > > Well I guess it used to work on old controllers using a Hamming ECC > engine not protecting any user OOB bytes, so writing the clean markers > would simply not lead to ECC bytes being produced/written. Or it might > have worked as well on controller drivers not enabling the ECC engine > when performing OOB-only writes. It also requires the chip to be old > enough to support multiple writes on the same (sub)page as long as the > written bits do not overlap? Yes, with controller which supports such modes there will be no problem here! What i see, is that this controller doesn't support multiple writes to the same page in ECC mode(e.g. it can't update ECC correctly). So in v2 i've added patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly. So for me main question here is: How JFFS2 should work with controllers where we can't update data and OOB independently? Driver of this filesystem knows nothing about this features of the controller. Or JFFS2 works incorrectly in my case when it tries to call write page callback after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above). Or may be it is better to suppress OOB write callback (or set it to NULL) in this driver as in vendor's driver? Thanks, Arseniy > > Perhaps that's what the hack in the old driver is for. But that's > IMHO broken in case of unexpected reboot :-) > > Miquèl
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 19:13:38 +0300: > On 02.05.2023 16:05, Miquel Raynal wrote: > > Hi Arseniy, > > > > avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300: > > > >> On 02.05.2023 15:17, Miquel Raynal wrote: > >>> Hi Arseniy, > >>> > >>> Richard, your input is welcome below :-) > >>> > >>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see: > >>>>>>>>>>> 0) First attempt to mount JFFS2. > >>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. > >>>>>>>>>>> 2) Umount JFFS2. Done. > >>>>>>>>>>> 3) Second attempt to mount JFFS2. > >>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. > >>>>>>>>>>> 5) It reads page N in ECC mode, and i get: > >>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error > >>>>>>>>>>> 6) Mount failed. > >>>>>>>>>>> > >>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is > >>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers. > >>>>>>>>> > >>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. > >>>>>>>>> does it have to use raw write/read on step 1) and 4)? > >>>>>>>>> > >>>>>>>> > >>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then > >>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 > >>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. > >>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken > >>>>>>>> ECC codes. > >>>>>>>> > >>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts > >>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without > >>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work > >>>>>>>> separately. > >>>>>>> > >>>>>>> The fact that there might be helpers just for writing OOB areas or just > >>>>>>> in-band areas are optimizations. NAND pages are meant to be written a > >>>>>>> single time, no matter what portion you write. In some cases, it is > >>>>>>> possible to perform subpage writes if the chip supports it. Pages may > >>>>>>> be split into several areas which cover a partial in-band area *and* a > >>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas > >>>>>>> of a given subpage, you *cannot* write the other part later without > >>>>>> > >>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries > >>>>>> to write page after writing clean markers to it before? In the old vendor's > >>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works > >>>>>> correctly. > >>>>> > >>>>> Can you point the code you're mentioning? (both what JFFS2 which looks > >>>>> strange to you and the old vendor hack) > >>>> > >>>> Here is version of the old vendor's driver: > >>>> > >>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 > >>>> > >>>> In my version there is no BUG() there, but it is same driver for the same chip. > >>>> > >>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first > >>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so > >>>> may be it is unexpected behaviour of JFFS2? > >>> > >>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here. > >>> > >>> Are you sure you flash is recognized by JFFS2 as being a NAND device? > >>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because > >>> cleanmarker seem to be discarded when using a NAND device, and > >>> recognizing the device as a NAND device requires the above option to be > >>> set apparently. > >> > >> Yes, I have > >> > >> CONFIG_JFFS2_FS_WRITEBUFFER=y > >> > >> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This > >> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing: > >> cleanmarkers are not discarded with NAND device. > > > > Excellent. So when cleanmarker_size == 0, it means there is no > > cleanmarker. But if it is a NAND device, we write the marker anyway. > > > > Well I guess it used to work on old controllers using a Hamming ECC > > engine not protecting any user OOB bytes, so writing the clean markers > > would simply not lead to ECC bytes being produced/written. Or it might > > have worked as well on controller drivers not enabling the ECC engine > > when performing OOB-only writes. It also requires the chip to be old > > enough to support multiple writes on the same (sub)page as long as the > > written bits do not overlap? > > Yes, with controller which supports such modes there will be no problem here! > What i see, is that this controller doesn't support multiple writes to the > same page in ECC mode(e.g. it can't update ECC correctly). I don't think this is a controller limitation. The NAND chip cannot write ECC bytes a first time and then overwrite other ECC bytes, that cannot work. The fact that we write ECC bytes in the first place is because the ECC engine covers the free OOB bytes used by JFFS2 to write its cleanmarkers. > So in v2 i've added > patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly. I am sorry but the above sentence is not clear to me. I believe you meant the free OOB bytes are moved outside of the area protected by the ECC engine. In this case I guess it should be fine. > So for me main question here is: > > How JFFS2 should work with controllers where we can't update data and OOB > independently? Driver of this filesystem knows nothing about this features of > the controller. > > Or JFFS2 works incorrectly in my case when it tries to call write page callback > after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above). > > Or may be it is better to suppress OOB write callback (or set it to NULL) in this > driver as in vendor's driver? I would assume using the unprotected free OOB bytes to store the cleanmarkers should work. But that's a bit fragile and very filesystem oriented. I don't like this much. But on the other side JFFS2 is legacy, you should use UBI (which does not play with OOB areas) :-) Thanks, Miquèl > > Thanks, Arseniy > > > > > Perhaps that's what the hack in the old driver is for. But that's > > IMHO broken in case of unexpected reboot :-) > > > > Miquèl
On 03.05.2023 11:03, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 19:13:38 +0300: > >> On 02.05.2023 16:05, Miquel Raynal wrote: >>> Hi Arseniy, >>> >>> avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300: >>> >>>> On 02.05.2023 15:17, Miquel Raynal wrote: >>>>> Hi Arseniy, >>>>> >>>>> Richard, your input is welcome below :-) >>>>> >>>>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see: >>>>>>>>>>>>> 0) First attempt to mount JFFS2. >>>>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >>>>>>>>>>>>> 2) Umount JFFS2. Done. >>>>>>>>>>>>> 3) Second attempt to mount JFFS2. >>>>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >>>>>>>>>>>>> 5) It reads page N in ECC mode, and i get: >>>>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error >>>>>>>>>>>>> 6) Mount failed. >>>>>>>>>>>>> >>>>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is >>>>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers. >>>>>>>>>>> >>>>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. >>>>>>>>>>> does it have to use raw write/read on step 1) and 4)? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then >>>>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 >>>>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. >>>>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken >>>>>>>>>> ECC codes. >>>>>>>>>> >>>>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts >>>>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without >>>>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work >>>>>>>>>> separately. >>>>>>>>> >>>>>>>>> The fact that there might be helpers just for writing OOB areas or just >>>>>>>>> in-band areas are optimizations. NAND pages are meant to be written a >>>>>>>>> single time, no matter what portion you write. In some cases, it is >>>>>>>>> possible to perform subpage writes if the chip supports it. Pages may >>>>>>>>> be split into several areas which cover a partial in-band area *and* a >>>>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas >>>>>>>>> of a given subpage, you *cannot* write the other part later without >>>>>>>> >>>>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries >>>>>>>> to write page after writing clean markers to it before? In the old vendor's >>>>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works >>>>>>>> correctly. >>>>>>> >>>>>>> Can you point the code you're mentioning? (both what JFFS2 which looks >>>>>>> strange to you and the old vendor hack) >>>>>> >>>>>> Here is version of the old vendor's driver: >>>>>> >>>>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 >>>>>> >>>>>> In my version there is no BUG() there, but it is same driver for the same chip. >>>>>> >>>>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first >>>>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so >>>>>> may be it is unexpected behaviour of JFFS2? >>>>> >>>>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here. >>>>> >>>>> Are you sure you flash is recognized by JFFS2 as being a NAND device? >>>>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because >>>>> cleanmarker seem to be discarded when using a NAND device, and >>>>> recognizing the device as a NAND device requires the above option to be >>>>> set apparently. >>>> >>>> Yes, I have >>>> >>>> CONFIG_JFFS2_FS_WRITEBUFFER=y >>>> >>>> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This >>>> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing: >>>> cleanmarkers are not discarded with NAND device. >>> >>> Excellent. So when cleanmarker_size == 0, it means there is no >>> cleanmarker. But if it is a NAND device, we write the marker anyway. >>> >>> Well I guess it used to work on old controllers using a Hamming ECC >>> engine not protecting any user OOB bytes, so writing the clean markers >>> would simply not lead to ECC bytes being produced/written. Or it might >>> have worked as well on controller drivers not enabling the ECC engine >>> when performing OOB-only writes. It also requires the chip to be old >>> enough to support multiple writes on the same (sub)page as long as the >>> written bits do not overlap? >> >> Yes, with controller which supports such modes there will be no problem here! >> What i see, is that this controller doesn't support multiple writes to the >> same page in ECC mode(e.g. it can't update ECC correctly). > > I don't think this is a controller limitation. The NAND chip cannot > write ECC bytes a first time and then overwrite other ECC bytes, that > cannot work. The fact that we write ECC bytes in the first place is > because the ECC engine covers the free OOB bytes used by JFFS2 to write > its cleanmarkers. > >> So in v2 i've added >> patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly. > > I am sorry but the above sentence is not clear to me. I believe you > meant the free OOB bytes are moved outside of the area protected by the > ECC engine. In this case I guess it should be fine. Exactly, free bytes which are reported by OOB layout callbacks were moved out of ECC area. > >> So for me main question here is: >> >> How JFFS2 should work with controllers where we can't update data and OOB >> independently? Driver of this filesystem knows nothing about this features of >> the controller. >> >> Or JFFS2 works incorrectly in my case when it tries to call write page callback >> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above). >> >> Or may be it is better to suppress OOB write callback (or set it to NULL) in this >> driver as in vendor's driver? > > I would assume using the unprotected free OOB bytes to store the > cleanmarkers should work. But that's a bit fragile and very filesystem > oriented. I don't like this much. But on the other side JFFS2 is > legacy, you should use UBI (which does not play with OOB areas) :-) Problem here is that we can't use UBI in this case, because it does not support small fs images. So the only way to make JFFS2 work is to move free OOB bytes to non protected area. Otherwise i think we have strange situation that JFFS2 can't work correctly on specific type on NAND controller. We already had same problem on another NAND controller, and solution was to move OOB free bytes no non-protected area: https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru/ Thanks, Arseniy > > Thanks, > Miquèl > >> >> Thanks, Arseniy >> >>> >>> Perhaps that's what the hack in the old driver is for. But that's >>> IMHO broken in case of unexpected reboot :-) >>> >>> Miquèl >
Sorry for joining the party late. On Wed, May 3, 2023 at 10:07 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > So for me main question here is: > > > > How JFFS2 should work with controllers where we can't update data and OOB > > independently? Driver of this filesystem knows nothing about this features of > > the controller. > > > > Or JFFS2 works incorrectly in my case when it tries to call write page callback > > after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above). > > > > Or may be it is better to suppress OOB write callback (or set it to NULL) in this > > driver as in vendor's driver? > > I would assume using the unprotected free OOB bytes to store the > cleanmarkers should work. But that's a bit fragile and very filesystem > oriented. I don't like this much. But on the other side JFFS2 is > legacy, you should use UBI (which does not play with OOB areas) :-) Please note that JFFS2's main use case is NOR flash. Support for NAND flash was never considered as fully complete/stable. That's why UBIFS (JFFS3 in the very beginning) was invented. Thanks, //richard
On 03.05.2023 13:23, Arseniy Krasnov wrote: > > > On 03.05.2023 11:03, Miquel Raynal wrote: >> Hi Arseniy, >> >> avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 19:13:38 +0300: >> >>> On 02.05.2023 16:05, Miquel Raynal wrote: >>>> Hi Arseniy, >>>> >>>> avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300: >>>> >>>>> On 02.05.2023 15:17, Miquel Raynal wrote: >>>>>> Hi Arseniy, >>>>>> >>>>>> Richard, your input is welcome below :-) >>>>>> >>>>>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see: >>>>>>>>>>>>>> 0) First attempt to mount JFFS2. >>>>>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >>>>>>>>>>>>>> 2) Umount JFFS2. Done. >>>>>>>>>>>>>> 3) Second attempt to mount JFFS2. >>>>>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >>>>>>>>>>>>>> 5) It reads page N in ECC mode, and i get: >>>>>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error >>>>>>>>>>>>>> 6) Mount failed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is >>>>>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers. >>>>>>>>>>>> >>>>>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. >>>>>>>>>>>> does it have to use raw write/read on step 1) and 4)? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then >>>>>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 >>>>>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. >>>>>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken >>>>>>>>>>> ECC codes. >>>>>>>>>>> >>>>>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts >>>>>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without >>>>>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work >>>>>>>>>>> separately. >>>>>>>>>> >>>>>>>>>> The fact that there might be helpers just for writing OOB areas or just >>>>>>>>>> in-band areas are optimizations. NAND pages are meant to be written a >>>>>>>>>> single time, no matter what portion you write. In some cases, it is >>>>>>>>>> possible to perform subpage writes if the chip supports it. Pages may >>>>>>>>>> be split into several areas which cover a partial in-band area *and* a >>>>>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas >>>>>>>>>> of a given subpage, you *cannot* write the other part later without >>>>>>>>> >>>>>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries >>>>>>>>> to write page after writing clean markers to it before? In the old vendor's >>>>>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works >>>>>>>>> correctly. >>>>>>>> >>>>>>>> Can you point the code you're mentioning? (both what JFFS2 which looks >>>>>>>> strange to you and the old vendor hack) >>>>>>> >>>>>>> Here is version of the old vendor's driver: >>>>>>> >>>>>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 >>>>>>> >>>>>>> In my version there is no BUG() there, but it is same driver for the same chip. >>>>>>> >>>>>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first >>>>>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so >>>>>>> may be it is unexpected behaviour of JFFS2? >>>>>> >>>>>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here. >>>>>> >>>>>> Are you sure you flash is recognized by JFFS2 as being a NAND device? >>>>>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because >>>>>> cleanmarker seem to be discarded when using a NAND device, and >>>>>> recognizing the device as a NAND device requires the above option to be >>>>>> set apparently. >>>>> >>>>> Yes, I have >>>>> >>>>> CONFIG_JFFS2_FS_WRITEBUFFER=y >>>>> >>>>> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This >>>>> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing: >>>>> cleanmarkers are not discarded with NAND device. >>>> >>>> Excellent. So when cleanmarker_size == 0, it means there is no >>>> cleanmarker. But if it is a NAND device, we write the marker anyway. >>>> >>>> Well I guess it used to work on old controllers using a Hamming ECC >>>> engine not protecting any user OOB bytes, so writing the clean markers >>>> would simply not lead to ECC bytes being produced/written. Or it might >>>> have worked as well on controller drivers not enabling the ECC engine >>>> when performing OOB-only writes. It also requires the chip to be old >>>> enough to support multiple writes on the same (sub)page as long as the >>>> written bits do not overlap? >>> >>> Yes, with controller which supports such modes there will be no problem here! >>> What i see, is that this controller doesn't support multiple writes to the >>> same page in ECC mode(e.g. it can't update ECC correctly). >> >> I don't think this is a controller limitation. The NAND chip cannot >> write ECC bytes a first time and then overwrite other ECC bytes, that >> cannot work. The fact that we write ECC bytes in the first place is >> because the ECC engine covers the free OOB bytes used by JFFS2 to write >> its cleanmarkers. >> >>> So in v2 i've added >>> patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly. >> >> I am sorry but the above sentence is not clear to me. I believe you >> meant the free OOB bytes are moved outside of the area protected by the >> ECC engine. In this case I guess it should be fine. > > Exactly, free bytes which are reported by OOB layout callbacks were moved out of > ECC area. > >> >>> So for me main question here is: >>> >>> How JFFS2 should work with controllers where we can't update data and OOB >>> independently? Driver of this filesystem knows nothing about this features of >>> the controller. >>> >>> Or JFFS2 works incorrectly in my case when it tries to call write page callback >>> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above). >>> >>> Or may be it is better to suppress OOB write callback (or set it to NULL) in this >>> driver as in vendor's driver? >> >> I would assume using the unprotected free OOB bytes to store the >> cleanmarkers should work. But that's a bit fragile and very filesystem >> oriented. I don't like this much. But on the other side JFFS2 is >> legacy, you should use UBI (which does not play with OOB areas) :-) > > Problem here is that we can't use UBI in this case, because it does not support > small fs images. So the only way to make JFFS2 work is to move free OOB bytes to > non protected area. Otherwise i think we have strange situation that JFFS2 can't > work correctly on specific type on NAND controller. We already had same problem > on another NAND controller, and solution was to move OOB free bytes no non-protected > area: > > https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru/ > > Thanks, Arseniy Upd: may be i can add option for this driver, which makes JFFS2 work correctly on this chip. This feature suppresses OOB writes as in old driver. By default it is disabled and OOB is ECC protected(current behaviour), if enabled - it prints WARN_ONCE() and always returns 0. What do You think? Or may be add an option, which moves free bytes of OOB to ECC non-protected area and it is disabled by default. Thanks, Arseniy > >> >> Thanks, >> Miquèl >> >>> >>> Thanks, Arseniy >>> >>>> >>>> Perhaps that's what the hack in the old driver is for. But that's >>>> IMHO broken in case of unexpected reboot :-) >>>> >>>> Miquèl >>
On 03.05.2023 22:48, Richard Weinberger wrote: > Sorry for joining the party late. > > On Wed, May 3, 2023 at 10:07 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: >>> So for me main question here is: >>> >>> How JFFS2 should work with controllers where we can't update data and OOB >>> independently? Driver of this filesystem knows nothing about this features of >>> the controller. >>> >>> Or JFFS2 works incorrectly in my case when it tries to call write page callback >>> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above). >>> >>> Or may be it is better to suppress OOB write callback (or set it to NULL) in this >>> driver as in vendor's driver? >> >> I would assume using the unprotected free OOB bytes to store the >> cleanmarkers should work. But that's a bit fragile and very filesystem >> oriented. I don't like this much. But on the other side JFFS2 is >> legacy, you should use UBI (which does not play with OOB areas) :-) > > Please note that JFFS2's main use case is NOR flash. Support for NAND > flash was never > considered as fully complete/stable. That's why UBIFS (JFFS3 in the > very beginning) > was invented. Hello Richard! Thanks for this details! But I think it will be useful to make JFFS2 work correctly on this chip :) Anyway i think it is not rare/exotic case to use JFFS2 on NAND. Thanks, Arseniy > > Thanks, > //richard
Hi Arseniy, avkrasnov@sberdevices.ru wrote on Thu, 4 May 2023 14:37:45 +0300: > On 03.05.2023 13:23, Arseniy Krasnov wrote: > > > > > > On 03.05.2023 11:03, Miquel Raynal wrote: > >> Hi Arseniy, > >> > >> avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 19:13:38 +0300: > >> > >>> On 02.05.2023 16:05, Miquel Raynal wrote: > >>>> Hi Arseniy, > >>>> > >>>> avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300: > >>>> > >>>>> On 02.05.2023 15:17, Miquel Raynal wrote: > >>>>>> Hi Arseniy, > >>>>>> > >>>>>> Richard, your input is welcome below :-) > >>>>>> > >>>>>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see: > >>>>>>>>>>>>>> 0) First attempt to mount JFFS2. > >>>>>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. > >>>>>>>>>>>>>> 2) Umount JFFS2. Done. > >>>>>>>>>>>>>> 3) Second attempt to mount JFFS2. > >>>>>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. > >>>>>>>>>>>>>> 5) It reads page N in ECC mode, and i get: > >>>>>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error > >>>>>>>>>>>>>> 6) Mount failed. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is > >>>>>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers. > >>>>>>>>>>>> > >>>>>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. > >>>>>>>>>>>> does it have to use raw write/read on step 1) and 4)? > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then > >>>>>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 > >>>>>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. > >>>>>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken > >>>>>>>>>>> ECC codes. > >>>>>>>>>>> > >>>>>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts > >>>>>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without > >>>>>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work > >>>>>>>>>>> separately. > >>>>>>>>>> > >>>>>>>>>> The fact that there might be helpers just for writing OOB areas or just > >>>>>>>>>> in-band areas are optimizations. NAND pages are meant to be written a > >>>>>>>>>> single time, no matter what portion you write. In some cases, it is > >>>>>>>>>> possible to perform subpage writes if the chip supports it. Pages may > >>>>>>>>>> be split into several areas which cover a partial in-band area *and* a > >>>>>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas > >>>>>>>>>> of a given subpage, you *cannot* write the other part later without > >>>>>>>>> > >>>>>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries > >>>>>>>>> to write page after writing clean markers to it before? In the old vendor's > >>>>>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works > >>>>>>>>> correctly. > >>>>>>>> > >>>>>>>> Can you point the code you're mentioning? (both what JFFS2 which looks > >>>>>>>> strange to you and the old vendor hack) > >>>>>>> > >>>>>>> Here is version of the old vendor's driver: > >>>>>>> > >>>>>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 > >>>>>>> > >>>>>>> In my version there is no BUG() there, but it is same driver for the same chip. > >>>>>>> > >>>>>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first > >>>>>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so > >>>>>>> may be it is unexpected behaviour of JFFS2? > >>>>>> > >>>>>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here. > >>>>>> > >>>>>> Are you sure you flash is recognized by JFFS2 as being a NAND device? > >>>>>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because > >>>>>> cleanmarker seem to be discarded when using a NAND device, and > >>>>>> recognizing the device as a NAND device requires the above option to be > >>>>>> set apparently. > >>>>> > >>>>> Yes, I have > >>>>> > >>>>> CONFIG_JFFS2_FS_WRITEBUFFER=y > >>>>> > >>>>> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This > >>>>> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing: > >>>>> cleanmarkers are not discarded with NAND device. > >>>> > >>>> Excellent. So when cleanmarker_size == 0, it means there is no > >>>> cleanmarker. But if it is a NAND device, we write the marker anyway. > >>>> > >>>> Well I guess it used to work on old controllers using a Hamming ECC > >>>> engine not protecting any user OOB bytes, so writing the clean markers > >>>> would simply not lead to ECC bytes being produced/written. Or it might > >>>> have worked as well on controller drivers not enabling the ECC engine > >>>> when performing OOB-only writes. It also requires the chip to be old > >>>> enough to support multiple writes on the same (sub)page as long as the > >>>> written bits do not overlap? > >>> > >>> Yes, with controller which supports such modes there will be no problem here! > >>> What i see, is that this controller doesn't support multiple writes to the > >>> same page in ECC mode(e.g. it can't update ECC correctly). > >> > >> I don't think this is a controller limitation. The NAND chip cannot > >> write ECC bytes a first time and then overwrite other ECC bytes, that > >> cannot work. The fact that we write ECC bytes in the first place is > >> because the ECC engine covers the free OOB bytes used by JFFS2 to write > >> its cleanmarkers. > >> > >>> So in v2 i've added > >>> patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly. > >> > >> I am sorry but the above sentence is not clear to me. I believe you > >> meant the free OOB bytes are moved outside of the area protected by the > >> ECC engine. In this case I guess it should be fine. > > > > Exactly, free bytes which are reported by OOB layout callbacks were moved out of > > ECC area. > > > >> > >>> So for me main question here is: > >>> > >>> How JFFS2 should work with controllers where we can't update data and OOB > >>> independently? Driver of this filesystem knows nothing about this features of > >>> the controller. > >>> > >>> Or JFFS2 works incorrectly in my case when it tries to call write page callback > >>> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above). > >>> > >>> Or may be it is better to suppress OOB write callback (or set it to NULL) in this > >>> driver as in vendor's driver? > >> > >> I would assume using the unprotected free OOB bytes to store the > >> cleanmarkers should work. But that's a bit fragile and very filesystem > >> oriented. I don't like this much. But on the other side JFFS2 is > >> legacy, you should use UBI (which does not play with OOB areas) :-) > > > > Problem here is that we can't use UBI in this case, because it does not support > > small fs images. So the only way to make JFFS2 work is to move free OOB bytes to > > non protected area. Otherwise i think we have strange situation that JFFS2 can't > > work correctly on specific type on NAND controller. We already had same problem > > on another NAND controller, and solution was to move OOB free bytes no non-protected > > area: > > > > https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru/ > > > > Thanks, Arseniy > > Upd: may be i can add option for this driver, which makes JFFS2 work correctly on this chip. > This feature suppresses OOB writes as in old driver. By default it is disabled and OOB is > ECC protected(current behaviour), if enabled - it prints WARN_ONCE() and always returns 0. > What do You think? > > Or may be add an option, which moves free bytes of OOB to ECC non-protected area and it is disabled > by default. I prefer having a single ooblayout where we expose unprotected user OOB bytes only. As of today, the only upstream user of user OOB bytes is JFFS2 anyway. Thanks, Miquèl
On 04.05.2023 15:17, Miquel Raynal wrote: > Hi Arseniy, > > avkrasnov@sberdevices.ru wrote on Thu, 4 May 2023 14:37:45 +0300: > >> On 03.05.2023 13:23, Arseniy Krasnov wrote: >>> >>> >>> On 03.05.2023 11:03, Miquel Raynal wrote: >>>> Hi Arseniy, >>>> >>>> avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 19:13:38 +0300: >>>> >>>>> On 02.05.2023 16:05, Miquel Raynal wrote: >>>>>> Hi Arseniy, >>>>>> >>>>>> avkrasnov@sberdevices.ru wrote on Tue, 2 May 2023 15:24:09 +0300: >>>>>> >>>>>>> On 02.05.2023 15:17, Miquel Raynal wrote: >>>>>>>> Hi Arseniy, >>>>>>>> >>>>>>>> Richard, your input is welcome below :-) >>>>>>>> >>>>>>>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see: >>>>>>>>>>>>>>>> 0) First attempt to mount JFFS2. >>>>>>>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done. >>>>>>>>>>>>>>>> 2) Umount JFFS2. Done. >>>>>>>>>>>>>>>> 3) Second attempt to mount JFFS2. >>>>>>>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done. >>>>>>>>>>>>>>>> 5) It reads page N in ECC mode, and i get: >>>>>>>>>>>>>>>> jffs2: mtd->read(0x100 bytes from N) returned ECC error >>>>>>>>>>>>>>>> 6) Mount failed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is >>>>>>>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error. >>>>>>>>>>>>>> does it have to use raw write/read on step 1) and 4)? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then >>>>>>>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2 >>>>>>>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page. >>>>>>>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken >>>>>>>>>>>>> ECC codes. >>>>>>>>>>>>> >>>>>>>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts >>>>>>>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without >>>>>>>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work >>>>>>>>>>>>> separately. >>>>>>>>>>>> >>>>>>>>>>>> The fact that there might be helpers just for writing OOB areas or just >>>>>>>>>>>> in-band areas are optimizations. NAND pages are meant to be written a >>>>>>>>>>>> single time, no matter what portion you write. In some cases, it is >>>>>>>>>>>> possible to perform subpage writes if the chip supports it. Pages may >>>>>>>>>>>> be split into several areas which cover a partial in-band area *and* a >>>>>>>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas >>>>>>>>>>>> of a given subpage, you *cannot* write the other part later without >>>>>>>>>>> >>>>>>>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries >>>>>>>>>>> to write page after writing clean markers to it before? In the old vendor's >>>>>>>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works >>>>>>>>>>> correctly. >>>>>>>>>> >>>>>>>>>> Can you point the code you're mentioning? (both what JFFS2 which looks >>>>>>>>>> strange to you and the old vendor hack) >>>>>>>>> >>>>>>>>> Here is version of the old vendor's driver: >>>>>>>>> >>>>>>>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260 >>>>>>>>> >>>>>>>>> In my version there is no BUG() there, but it is same driver for the same chip. >>>>>>>>> >>>>>>>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first >>>>>>>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so >>>>>>>>> may be it is unexpected behaviour of JFFS2? >>>>>>>> >>>>>>>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here. >>>>>>>> >>>>>>>> Are you sure you flash is recognized by JFFS2 as being a NAND device? >>>>>>>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because >>>>>>>> cleanmarker seem to be discarded when using a NAND device, and >>>>>>>> recognizing the device as a NAND device requires the above option to be >>>>>>>> set apparently. >>>>>>> >>>>>>> Yes, I have >>>>>>> >>>>>>> CONFIG_JFFS2_FS_WRITEBUFFER=y >>>>>>> >>>>>>> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This >>>>>>> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing: >>>>>>> cleanmarkers are not discarded with NAND device. >>>>>> >>>>>> Excellent. So when cleanmarker_size == 0, it means there is no >>>>>> cleanmarker. But if it is a NAND device, we write the marker anyway. >>>>>> >>>>>> Well I guess it used to work on old controllers using a Hamming ECC >>>>>> engine not protecting any user OOB bytes, so writing the clean markers >>>>>> would simply not lead to ECC bytes being produced/written. Or it might >>>>>> have worked as well on controller drivers not enabling the ECC engine >>>>>> when performing OOB-only writes. It also requires the chip to be old >>>>>> enough to support multiple writes on the same (sub)page as long as the >>>>>> written bits do not overlap? >>>>> >>>>> Yes, with controller which supports such modes there will be no problem here! >>>>> What i see, is that this controller doesn't support multiple writes to the >>>>> same page in ECC mode(e.g. it can't update ECC correctly). >>>> >>>> I don't think this is a controller limitation. The NAND chip cannot >>>> write ECC bytes a first time and then overwrite other ECC bytes, that >>>> cannot work. The fact that we write ECC bytes in the first place is >>>> because the ECC engine covers the free OOB bytes used by JFFS2 to write >>>> its cleanmarkers. >>>> >>>>> So in v2 i've added >>>>> patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly. >>>> >>>> I am sorry but the above sentence is not clear to me. I believe you >>>> meant the free OOB bytes are moved outside of the area protected by the >>>> ECC engine. In this case I guess it should be fine. >>> >>> Exactly, free bytes which are reported by OOB layout callbacks were moved out of >>> ECC area. >>> >>>> >>>>> So for me main question here is: >>>>> >>>>> How JFFS2 should work with controllers where we can't update data and OOB >>>>> independently? Driver of this filesystem knows nothing about this features of >>>>> the controller. >>>>> >>>>> Or JFFS2 works incorrectly in my case when it tries to call write page callback >>>>> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above). >>>>> >>>>> Or may be it is better to suppress OOB write callback (or set it to NULL) in this >>>>> driver as in vendor's driver? >>>> >>>> I would assume using the unprotected free OOB bytes to store the >>>> cleanmarkers should work. But that's a bit fragile and very filesystem >>>> oriented. I don't like this much. But on the other side JFFS2 is >>>> legacy, you should use UBI (which does not play with OOB areas) :-) >>> >>> Problem here is that we can't use UBI in this case, because it does not support >>> small fs images. So the only way to make JFFS2 work is to move free OOB bytes to >>> non protected area. Otherwise i think we have strange situation that JFFS2 can't >>> work correctly on specific type on NAND controller. We already had same problem >>> on another NAND controller, and solution was to move OOB free bytes no non-protected >>> area: >>> >>> https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@sberdevices.ru/ >>> >>> Thanks, Arseniy >> >> Upd: may be i can add option for this driver, which makes JFFS2 work correctly on this chip. >> This feature suppresses OOB writes as in old driver. By default it is disabled and OOB is >> ECC protected(current behaviour), if enabled - it prints WARN_ONCE() and always returns 0. >> What do You think? >> >> Or may be add an option, which moves free bytes of OOB to ECC non-protected area and it is disabled >> by default. > > I prefer having a single ooblayout where we expose unprotected user OOB > bytes only. As of today, the only upstream user of user OOB bytes is > JFFS2 anyway. Ok, so i'll prepare v3 on the next week with this update (it is already in v2, but i'm going to modify another patch with updates from @Liang). Thanks, Arseniy > > Thanks, > Miquèl
diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c index f84a10238e4d..f2f2472cb511 100644 --- a/drivers/mtd/nand/raw/meson_nand.c +++ b/drivers/mtd/nand/raw/meson_nand.c @@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand, static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf, int oob_required, int page) { + struct mtd_info *mtd = nand_to_mtd(nand); u8 *oob_buf = nand->oob_poi; int ret; + memset(oob_buf, 0, mtd->oobsize); + ret = meson_nfc_read_page_sub(nand, page, 1); if (ret) return ret; @@ -881,6 +884,8 @@ static int meson_nfc_read_page_hwecc(struct nand_chip *nand, u8 *buf, u8 *oob_buf = nand->oob_poi; int ret, i; + memset(oob_buf, 0, mtd->oobsize); + ret = meson_nfc_read_page_sub(nand, page, 0); if (ret) return ret;
This NAND reads only few user's bytes in ECC mode (not full OOB), so fill OOB buffer with zeroes to not return garbage from previous reads to user. Otherwise 'nanddump' utility prints something like this for just erased page: ... 0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> --- drivers/mtd/nand/raw/meson_nand.c | 5 +++++ 1 file changed, 5 insertions(+)