Skip to content

boards: spresense: Fix link errors in parallel build.#102

Closed
masayuki2009 wants to merge 1 commit into
apache:masterfrom
masayuki2009:fix_parallel_build_for_spresense
Closed

boards: spresense: Fix link errors in parallel build.#102
masayuki2009 wants to merge 1 commit into
apache:masterfrom
masayuki2009:fix_parallel_build_for_spresense

Conversation

@masayuki2009

Copy link
Copy Markdown
Contributor

Summary

  • In pararell build, current build system updates some '.a' files simultaniously and thus '.a' files are corrupted. To aoid this erros, ar command is locked by flock command.

Impact

  • Spresense build is only affected.

Limitations / TODO

  • flock command needs to be installed but on Ubuntu platform the command is installed by default.

  • There is no limitations for the current Spresense configurations. However, if you add apps/examples/posix_spawn, parallel build would fail. I think this is a Makefile problem under the directory.

Testing

  • I tested all the current configurations for Spresense

In parallel build, current build system updates some '.a' files
simultaneously and thus '.a' files are corrupted. To avoid this
errors, ar command is locked by flock command.

Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
@masayuki2009

Copy link
Copy Markdown
Contributor Author

@jerpelea I added flock command for parallel build. Do you have any concerns? As far as I confirmed (please see the summary), it works well.

@jerpelea

Copy link
Copy Markdown
Contributor

@MasayukiIshikawa can it be replicated on other devices?

I did not see this issue on my machine

@masayuki2009

Copy link
Copy Markdown
Contributor Author

@MasayukiIshikawa can it be replicated on other devices?

I did not see this issue on my machine

@jerpelea Though it's possible to apply this changes to other devices, how can I do build tests? Also, do we assume to use Ubuntu platform? I think we should wait for a new CI system.

@jerpelea

Copy link
Copy Markdown
Contributor

@MasayukiIshikawa i am building with ubuntu on a multicore mashine

How did you spot the issue ?
Are you using mac? windows? linux?

@masayuki2009

Copy link
Copy Markdown
Contributor Author

@MasayukiIshikawa i am building with ubuntu on a multicore mashine

How did you spot the issue ?
Are you using mac? windows? linux?

@jerpelea I can reproduce link errors for spresense:wifi configuration with -j4 build with my Ubuntu machine. Actually it totally depends on host processor performance (CPU/Memory/HDD or SSD). For example if I specify -j5 it has no problem. So you need to find a number which causes parallel build issues.

@jerpelea

Copy link
Copy Markdown
Contributor

strange since I can't reproduce
I am building j40 and j56

@patacongo

Copy link
Copy Markdown
Contributor

Is flock available on all platforms? Linux, Cygwin, MSYS, macOS, FreeBSD? I would think so. Windows native would be an issue.

@patacongo

Copy link
Copy Markdown
Contributor

We cannot merge this PR while there is ongoing discussion. We need for you to advice us when the discussion reaches a conclusion. You make close the PR yourself if you don't want to merge, or lets us know if you do want to merge.

Alin... since you are committer, you could also just "Rebase and merge" the change to master if you are happy with it.

@jerpelea

jerpelea commented Jan 15, 2020

Copy link
Copy Markdown
Contributor

I am trying to reproduce the issue before merging it and for now i tried all combinations from j1 to j56 without success.

@xiaoxiang781216

xiaoxiang781216 commented Jan 15, 2020

Copy link
Copy Markdown
Contributor

Hi all,
This is a common issue, please find a global solution to fix it(at least the file lock should move to common place), otherwise it is hard to speed up the automation build without the reliable parallel build.

@jerpelea

Copy link
Copy Markdown
Contributor

@xiaoxiang781216 can you provide a way to replicate it ?

@xiaoxiang781216

Copy link
Copy Markdown
Contributor

No, we hit the problem randomly, why we need provide a 100% repo step for a race condition issue? The problem is there: t is a bad thing that multiple thread write to a same library(libapps.a).
We either:
1.Add file lock or
2.Change how libapps.a get generated
Either method is fine, but the fix need put to the common place.

@masayuki2009

Copy link
Copy Markdown
Contributor Author

No, we hit the problem randomly, why we need provide a 100% repo step for a race condition issue? The problem is there: t is a bad thing that multiple thread write to a same library(libapps.a).
We either:
1.Add file lock or
2.Change how libapps.a get generated
Either method is fine, but the fix need put to the common place.

@xiaoxiang781216 Thanks and your comments are right.

@masayuki2009

Copy link
Copy Markdown
Contributor Author

Is flock available on all platforms? Linux, Cygwin, MSYS, macOS, FreeBSD? I would think so. Windows native would be an issue.

@patacongo as long as I checked other platforms, Linux/Cygwin/FreeBSD provide flock command. macOS does not provide the command but we can find a solution. However I was not able to find a solution for MSYS|MinGW.

http://www.polarhome.com/service/man/?qf=FLOCK&af=0&tf=2&of=Cygwin
https://config9.com/linux/bash/mac-os-x-equivalent-of-linux-flock1-command/
https://www.freebsd.org/cgi/man.cgi?query=flock&amp;sektion=2

@patacongo

Copy link
Copy Markdown
Contributor

flock() is available on both Cygwin and MSYS2. I checked Cygwin earlier. I am on MSYS2 now an I see:

$ which flock
/usr/bin/flock
$ flock --version
flock from util-linux 2.33.1

@patacongo

Copy link
Copy Markdown
Contributor

It appears that macOS does support flock(): https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/flock.2.html

But that is odd because flock() is part of util-linux. I also see at https://stackoverflow.com/questions/10526651/mac-os-x-equivalent-of-linux-flock1-command: "There is a cross-platform flock command here: https://github.com/discoteq/flock"

Perhaps that is the flock() used on maOS?

@patacongo

Copy link
Copy Markdown
Contributor

If we build the cross-platform version, we would be good everywhere and the behavior would always be the same, even on Window native. I think we should consider that.

@xiaoxiang781216

xiaoxiang781216 commented Jan 15, 2020

Copy link
Copy Markdown
Contributor

@patacongo since:
1.There are many methods to build NuttX on Windows: Cygwin, MSYS2, Unbuntu on Windows.
2.Makefile.win for Windows native doesn't update for a long time, I suppose it is broken now.
Do you consider that it's fine to remove Window native to save the maintain cost? I am asking this question because it will give the user VERY BAD impression if we document that NuttX support Windows native but fail to build in reality.

@patacongo

Copy link
Copy Markdown
Contributor

I think we should keep windows native support. It hasn't been used in sometime, but there have been users in the past and there will be users in the future. It is an important component for some environments and some SDKs.

It is the normal history of the Windows native build to languish for a year or two, then some one needs it and brings it up to date. That cycle has repeated many times.

I would vote to keep the hooks in place and bring it up to date when next needed. Notice that I gave you a reference to a platform independent version of flock() that can be used with Windows native.

I think that abandoning a platform is a very serious thing and should be considered very carefully. There should be no quick decisions to do that. It is a violation of one of the basic principles of the Inviolables and, I think we would need a full discussion and full vote before we did anything like that.

@patacongo

Copy link
Copy Markdown
Contributor

A simplifying thought occurs to me. Currently the Windows native build requires a few Unix like tools that it historically gets from GnuWin32. But I think we could use the MSYS2 tools instead. So it seems to me that we could reduce the native build to a special case MSYS2 build.

The special case is that it does not use the Bash shell. Rather, it uses a CMD.com shell of some kind. This is necessary because the Windows native build environment (probably Visual Studio) executes in the context of CMD.com.

I think if the few .bat files needed in the build were replaced with .c executables for Windows, then you should be to build essentially shell-less (although some CMD.com executables would still be required).

That is why, for example, there is a configure.c which is a work-alike for configure.sh (but much faster). There is also a configure.sh and a configure.bat We should get rid of both of those really and unify to configure.c which works in a POSIX environment as well as a Windows native environment.

There are five other .bat file in nuttx/tools and they are trivial.

Let me experiment a little with the native Windows build. I have not tried it in a long time. Let me see where things are at. Let's not make any hasty decisions.

@patacongo

Copy link
Copy Markdown
Contributor

Let's raise this issues in a discussion thread. Let's discuss for awhile and see if the group would like to remove Windows native support of not. I think there are reasons for keeping it as well as reasons for removing it. Let's get consensus and then we should probably have a vote.

@xiaoxiang781216

Copy link
Copy Markdown
Contributor

I am modifying all config related stuff to remove the hardcode arch and board list, and find that configure.c call opendir/readdir/closedir, I think that Windows just support FindFirstFile/FindNextFile API?
Since the build need kconfig-backend and gmake, both tools need a Unix like environment? I can't understand Windows native has any addtional value if the essential tools need Cygwin or MSYS to build anyway.

@xiaoxiang781216

Copy link
Copy Markdown
Contributor

Let's raise this issues in a discussion thread. Let's discuss for awhile and see if the group would like to remove Windows native support of not. I think there are reasons for keeping it as well as reasons for removing it. Let's get consensus and then we should probably have a vote.

Sure, this type of change definitely need to make a consensus in community.

@jerpelea

Copy link
Copy Markdown
Contributor

@xiaoxiang781216 thanks for the clarification

I think that instead of avoiding the race by adding a dependency we should fix
how the apps are generated. This approach will fix it for all platforms and ensure less friction.

@patacongo downstream projects use Windows native and we should not remove support to work around bugs

@Ouss4

Ouss4 commented Jan 18, 2020

Copy link
Copy Markdown
Member

Hi,
Please advice, should this PR be closed without getting merged?

@jerpelea

Copy link
Copy Markdown
Contributor

I think that we should fix the issue with apps instead of avoiding it on each platform

@patacongo

Copy link
Copy Markdown
Contributor

This has been open for a long time. Most people seem to be opposed to the change. @Ouss4 asked if we should close this PR without merging several days ago. No one responded.

I am closing it now. If anyone has strong feelings, you should re-open it. Let's make the default state closed so that we do not have to see this every day as an open PR.

@patacongo patacongo closed this Jan 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants