use rmutex inside of all repeated implementation by anjiahao1 · Pull Request #6320 · apache/nuttx

anjiahao1 · 2022-05-24T14:44:24Z

Signed-off-by: anjiahao <anjiahao@xiaomi.com>

Impact

Possibly a slight reduction in size and a slight increase in speed.

Testing

CI

pkarashchenko

In general this looks good. I left just a few comments.

anjiahao1 · 2022-05-26T04:04:51Z

What is the difference between _SEM_DESTROY and nxrmutex_destroy?

pkarashchenko · 2022-05-26T08:28:05Z

What is the difference between _SEM_DESTROY and nxrmutex_destroy?

#if !defined(CONFIG_BUILD_FLAT) && defined(__KERNEL__)
#  define _SEM_DESTROY(s)       nxsem_destroy(s)
#else
#  define _SEM_DESTROY(s)       sem_destroy(s)
#endif

So the libc layer should use _SEM_DESTROY rather than nxsem_destroy.
Otherwise we need to change mutex.h to use the _SEM_ macros instead of the nxsem_ functions.
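To illustrate the second option, here is a minimal sketch of a mutex wrapper written against the macro layer rather than against one concrete function. Every `demo_` and `DEMO_` name is a hypothetical stand-in invented for this example; the stubs only count calls so the selected path is visible, and nothing here is the actual mutex.h code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for nxsem_destroy() and sem_destroy().  They only
 * count calls so that the path chosen at compile time can be observed.
 */

static int g_demo_kernel_calls;
static int g_demo_user_calls;

static inline int demo_nxsem_destroy(int *sem)
{
  assert(sem != NULL);
  g_demo_kernel_calls++;
  return 0;
}

static inline int demo_sem_destroy(int *sem)
{
  assert(sem != NULL);
  g_demo_user_calls++;
  return 0;
}

/* Same selection rule as the _SEM_DESTROY definition quoted above */

#if !defined(CONFIG_BUILD_FLAT) && defined(__KERNEL__)
#  define DEMO_SEM_DESTROY(s)   demo_nxsem_destroy(s)
#else
#  define DEMO_SEM_DESTROY(s)   demo_sem_destroy(s)
#endif

/* A wrapper written against the macro follows whichever call the build
 * context selects, instead of hard-coding the kernel-internal one.
 */

static inline int demo_mutex_destroy(int *mutex)
{
  return DEMO_SEM_DESTROY(mutex);
}
```

Compiled without `__KERNEL__` defined, `demo_mutex_destroy()` resolves to the user-space stub; a non-flat kernel build would resolve to the other one.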

anjiahao1 · 2022-05-26T13:10:45Z

Maybe we need to rename MUTEX_INITIALIZER to NXMUTEX_INITIALIZER.

pkarashchenko · 2022-05-26T13:16:21Z

Wow, that is something new. I need to check the mainline builds.

pkarashchenko · 2022-05-27T17:35:42Z

@xiaoxiang781216 I see that we are attempting to solve the mutex issue with "small blood", but IMO we need to look deeper. Priority inheritance should apply to mutexes only, yet it is implemented at the semaphore level (which I believe is totally wrong). As a result, with priority inheritance enabled we have one holder for the mutex and another for the semaphore, and while the mutex is held both refer to the same process ID.
I truly believe that we need to implement nxmutex as a separate kernel object, similar to nxsem, and locate priority inheritance there, so we will "kill two rabbits with a single shot". Until then we will suffer from dozens of corner cases and from the complicated logic needed to cover them.

xiaoxiang781216 · 2022-05-27T17:56:12Z

Yes, I agree that it isn't good to use a semaphore for both signaling and locking. That's why we added nxmutex_t and nxrmutex_t:

Implement the recursive lock (nxrmutex_t) and migrate all usage to it
Migrate all non-recursive sem lock/unlock usage to nxmutex_t
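For readers unfamiliar with the idea, a recursive lock can be sketched as a holder plus a nesting count. The version below is a POSIX illustration, not the NuttX nxrmutex_t implementation: `demo_rmutex_s` and its functions are hypothetical names, and the state is guarded by an internal mutex and condition variable so the holder check is race-free.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical recursive mutex: holder + nesting count under a guard */

struct demo_rmutex_s
{
  pthread_mutex_t guard;   /* Protects holder and count */
  pthread_cond_t  freed;   /* Signaled when count drops to zero */
  pthread_t       holder;  /* Meaningful only while count > 0 */
  unsigned int    count;   /* Nesting depth of the holder */
};

static inline int demo_rmutex_init(struct demo_rmutex_s *rm)
{
  int ret;

  assert(rm != NULL);
  rm->count = 0;

  ret = pthread_mutex_init(&rm->guard, NULL);
  if (ret != 0)
    {
      return ret;
    }

  ret = pthread_cond_init(&rm->freed, NULL);
  if (ret != 0)
    {
      pthread_mutex_destroy(&rm->guard);
    }

  return ret;
}

static inline int demo_rmutex_lock(struct demo_rmutex_s *rm)
{
  pthread_t me = pthread_self();

  pthread_mutex_lock(&rm->guard);
  if (rm->count > 0 && pthread_equal(rm->holder, me))
    {
      rm->count++;  /* Nested acquisition by the current holder */
    }
  else
    {
      /* Wait until no thread holds the lock, then take ownership */

      while (rm->count > 0)
        {
          pthread_cond_wait(&rm->freed, &rm->guard);
        }

      rm->holder = me;
      rm->count  = 1;
    }

  pthread_mutex_unlock(&rm->guard);
  return 0;
}

static inline int demo_rmutex_unlock(struct demo_rmutex_s *rm)
{
  int ret = 0;

  pthread_mutex_lock(&rm->guard);
  if (rm->count == 0 || !pthread_equal(rm->holder, pthread_self()))
    {
      ret = EPERM;  /* Caller does not hold the lock */
    }
  else if (--rm->count == 0)
    {
      pthread_cond_signal(&rm->freed);  /* Fully released: wake one waiter */
    }

  pthread_mutex_unlock(&rm->guard);
  return ret;
}
```

The holder may call `demo_rmutex_lock()` repeatedly, and the lock is released to other threads only after the matching number of `demo_rmutex_unlock()` calls.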

With the above changes, we get a clean code base in which:

All code (except the implementation of [pthread_|nxr]mutex_t) uses [nx]sem_t for signaling only
All code uses [pthread_|nxr]mutex_t as the lock

The next steps are:

Change the default protocol from SEM_PRIO_INHERIT to SEM_PRIO_NONE
Remove sem_setprotocol(&sem, SEM_PRIO_NONE) from the code base
Add sem_setprotocol(&sem, SEM_PRIO_INHERIT) to the implementation of [pthread_|nxr]mutex_t

After this, all priority inheritance issues should be fixed. Of course, we can also move the implementation of priority inheritance from sem_t to mutex_t, so people never get unexpected behavior with semaphores.
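As a rough POSIX analogue of that end state (not NuttX code), priority inheritance can be requested only by the lock object at initialization, while plain semaphores carry no protocol. `demo_mutex_init_inherit` is a hypothetical name used for this sketch; `pthread_mutexattr_setprotocol()` and `PTHREAD_PRIO_INHERIT` are standard POSIX, but a platform may report that the protocol is unsupported.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialize a mutex that requests priority
 * inheritance.  Returns 0 on success or a positive errno value (for
 * example ENOTSUP where the protocol is unavailable).
 */

static inline int demo_mutex_init_inherit(pthread_mutex_t *mutex)
{
  pthread_mutexattr_t attr;
  int ret;

  assert(mutex != NULL);

  ret = pthread_mutexattr_init(&attr);
  if (ret != 0)
    {
      return ret;
    }

  /* The protocol is a property of the lock, chosen once at init time */

  ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
  if (ret == 0)
    {
      ret = pthread_mutex_init(mutex, &attr);
    }

  pthread_mutexattr_destroy(&attr);
  return ret;
}
```

With this split, code that uses a semaphore purely for signaling never has to opt out of priority inheritance, which is the effect the steps above aim for.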

pkarashchenko · 2022-05-27T17:58:04Z

Sounds like a plan :)

pkarashchenko · 2022-06-12T18:54:10Z

-static inline int syslog_dev_takesem(FAR struct syslog_dev_s *syslog_dev)
-{
-  pid_t me = getpid();
-  int ret;
-
-  /* Does this thread already hold the semaphore?  That could happen if
-   * we were called recursively, i.e., if the logic kicked off by
-   * file_write() where to generate more debug output.  Return an
-   * error in that case.
-   */
-
-  if (syslog_dev->sl_holder == me)
-    {
-      /* Return an error (instead of deadlocking) */
-
-      return -EWOULDBLOCK;
-    }
-
-  /* Either the semaphore is available or is currently held by another
-   * thread.  Wait for it to become available.
-   */
-
-  ret = nxsem_wait(&syslog_dev->sl_sem);
-  if (ret < 0)
-    {
-      return ret;
-    }
-
-  /* We hold the semaphore.  We can safely mark ourself as the holder
-   * of the semaphore.
-   */
-
-  syslog_dev->sl_holder = me;
-  return OK;
-}


@xiaoxiang781216 I observe a deadlock after migrating my application to the latest NuttX master. I narrowed down the issue and see that it happens when multiple tasks try to use syslog. I have a syslog configuration with 2 channels:

Channel attached to console

Channel attached to file on /mnt/sdcard0/nuttx.log

When the issue happens I observe the following picture:

I've noticed that before this change syslog_dev_takesem returned -EWOULDBLOCK with the comment /* Return an error (instead of deadlocking) */, and it seems this functionality is now broken.
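The behavior that was lost can be sketched in POSIX terms: a lock that fails with -EWOULDBLOCK when its owner tries to take it again, instead of blocking forever. This is only an illustration under assumptions, not the syslog driver code. `demo_dev_s` and the `demo_dev_*` functions are hypothetical, and the sketch relies on an error-checking pthread mutex rather than the explicit sl_holder comparison in the removed code.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device lock; not the layout of struct syslog_dev_s */

struct demo_dev_s
{
  pthread_mutex_t lock;
};

static inline int demo_dev_init(struct demo_dev_s *dev)
{
  pthread_mutexattr_t attr;
  int ret;

  assert(dev != NULL);

  ret = pthread_mutexattr_init(&attr);
  if (ret != 0)
    {
      return -ret;
    }

  /* An error-checking mutex reports a relock by its owner as EDEADLK */

  ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (ret == 0)
    {
      ret = pthread_mutex_init(&dev->lock, &attr);
    }

  pthread_mutexattr_destroy(&attr);
  return -ret;
}

static inline int demo_dev_take(struct demo_dev_s *dev)
{
  int ret = pthread_mutex_lock(&dev->lock);

  if (ret == EDEADLK)
    {
      /* Recursive call from the holder: fail instead of deadlocking */

      return -EWOULDBLOCK;
    }

  return -ret;
}

static inline int demo_dev_give(struct demo_dev_s *dev)
{
  return -pthread_mutex_unlock(&dev->lock);
}
```

A logging path that re-enters itself while holding the lock then gets an error it can drop, whereas a plain non-recursive lock would hang and a recursive lock would let the nested write proceed.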

In general I think that we are rushing the replacement of semaphores with mutexes. For example, the behavior of the FLAT build changed after this PR, since all nxsem_ API calls were replaced with sem_ API calls, which introduced unnecessary cancellation points in the kernel. I'm not sure about the side effects of this and am still investigating.
I think we need to revert part of the changes from this PR, especially the syslog part and the replacements in libc, and make nxmutex_ use only nxsem_ APIs again, restricting it to the kernel for now. Then we can migrate user space step by step.
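To make the cancellation-point concern concrete: POSIX sem_wait() is a cancellation point, so a thread with a pending cancel request can be terminated inside the wait. The sketch below shows one generic POSIX way to wait without acting on cancellation. It is an assumption-laden illustration with a hypothetical helper name, not how nxsem_wait() is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical helper: wait on a POSIX semaphore with cancellation
 * disabled for the duration of the wait.  Returns 0 or a negated errno
 * value (for example -EINTR).
 */

static inline int demo_sem_wait_nocancel(sem_t *sem)
{
  int oldstate;
  int ignored;
  int ret = 0;

  assert(sem != NULL);

  /* With cancellation disabled, a cancel request stays pending instead
   * of terminating the thread inside sem_wait().
   */

  pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

  if (sem_wait(sem) < 0)
    {
      ret = -errno;
    }

  /* Restore whatever cancelability state the caller had */

  pthread_setcancelstate(oldstate, &ignored);
  return ret;
}
```

Kernel-internal code generally wants this non-cancelable behavior by default, which is why swapping an internal wait for the public sem_ call is a functional change and not just a rename.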

I hope I did everything correctly in #6414.

anjiahao1 force-pushed the tihuan_rmutex branch from 7060e12 to 12fc010 Compare May 24, 2022 14:45

pkarashchenko reviewed May 24, 2022

Comment thread drivers/1wire/1wire.c Outdated

anjiahao1 force-pushed the tihuan_rmutex branch 4 times, most recently from ca0bc6f to 133f87a Compare May 25, 2022 05:20