diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-19 16:35:06 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-19 16:35:06 -0800 |
commit | bf9aa14fc523d2763fc9a10672a709224e8fcaf4 (patch) | |
tree | 7d9c0cad473dc27a0c9bb09c561511df9481b066 /kernel/time/posix-timers.c | |
parent | 035238752319a58244d86facd442c5f40b0e97e2 (diff) | |
parent | cdc905d16b07981363e53a21853ba1cf6cd8e92a (diff) | |
download | linux-bf9aa14fc523d2763fc9a10672a709224e8fcaf4.tar.gz linux-bf9aa14fc523d2763fc9a10672a709224e8fcaf4.tar.bz2 linux-bf9aa14fc523d2763fc9a10672a709224e8fcaf4.zip |
Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the
signal of the timer is ignored so that the timer signal can be
delivered once the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small
intervals and keeps the system pointlessly out of low power states
for no value. This is a long standing non-trivial problem due to
the lock order of posix-timer lock and the sighand lock along with
life time issues as the timer and the sigqueue have different life
time rules.
Cure this by:
- Embedding the sigqueue into the timer struct to have the same
life time rules. Aside of that this also avoids the lookup of
the timer in the signal delivery and rearm path as it's just a
always valid container_of() now.
- Queuing ignored timer signals onto a seperate ignored list.
- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.
- Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.
This also required to consolidate the signal delivery rules so they
are consistent across all situations. With that all self test
scenarios finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time
stamps by default and switch to fine grained time stamps when inode
attributes are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
- Move all sleep and timeout functions into one file
- Rework udelay() and ndelay() into proper documented inline
functions and replace the hardcoded magic numbers by proper
defines.
- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard coded time ranges which fail
to provide the requested accuracy on different HZ settings.
- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place
- Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as
that's the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They can't be accessed fast as they always go all the way out to
the hardware and they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the
kernel provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.
This allows to provide time accessors and adjtimex(2) functionality
for the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less
straight forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
- Drivers:
- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with
other clusters.
- Mostly boring cleanups, fixes, improvements and code movement"
* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...
Diffstat (limited to 'kernel/time/posix-timers.c')
-rw-r--r-- | kernel/time/posix-timers.c | 267 |
1 files changed, 129 insertions, 138 deletions
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 4576aaed13b2..881a9ce96af7 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -233,11 +233,12 @@ __initcall(init_posix_timers); * The siginfo si_overrun field and the return value of timer_getoverrun(2) * are of type int. Clamp the overrun value to INT_MAX */ -static inline int timer_overrun_to_int(struct k_itimer *timr, int baseval) +static inline int timer_overrun_to_int(struct k_itimer *timr) { - s64 sum = timr->it_overrun_last + (s64)baseval; + if (timr->it_overrun_last > (s64)INT_MAX) + return INT_MAX; - return sum > (s64)INT_MAX ? INT_MAX : (int)sum; + return (int)timr->it_overrun_last; } static void common_hrtimer_rearm(struct k_itimer *timr) @@ -249,62 +250,62 @@ static void common_hrtimer_rearm(struct k_itimer *timr) hrtimer_restart(timer); } +static bool __posixtimer_deliver_signal(struct kernel_siginfo *info, struct k_itimer *timr) +{ + guard(spinlock)(&timr->it_lock); + + /* + * Check if the timer is still alive or whether it got modified + * since the signal was queued. In either case, don't rearm and + * drop the signal. + */ + if (timr->it_signal_seq != timr->it_sigqueue_seq || WARN_ON_ONCE(!timr->it_signal)) + return false; + + if (!timr->it_interval || WARN_ON_ONCE(timr->it_status != POSIX_TIMER_REQUEUE_PENDING)) + return true; + + timr->kclock->timer_rearm(timr); + timr->it_status = POSIX_TIMER_ARMED; + timr->it_overrun_last = timr->it_overrun; + timr->it_overrun = -1LL; + ++timr->it_signal_seq; + info->si_overrun = timer_overrun_to_int(timr); + return true; +} + /* - * This function is called from the signal delivery code if - * info->si_sys_private is not zero, which indicates that the timer has to - * be rearmed. Restart the timer and update info::si_overrun. + * This function is called from the signal delivery code. It decides + * whether the signal should be dropped and rearms interval timers. The + * timer can be unconditionally accessed as there is a reference held on + * it. */ -void posixtimer_rearm(struct kernel_siginfo *info) +bool posixtimer_deliver_signal(struct kernel_siginfo *info, struct sigqueue *timer_sigq) { - struct k_itimer *timr; - unsigned long flags; - - timr = lock_timer(info->si_tid, &flags); - if (!timr) - return; + struct k_itimer *timr = container_of(timer_sigq, struct k_itimer, sigq); + bool ret; - if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) { - timr->kclock->timer_rearm(timr); + /* + * Release siglock to ensure proper locking order versus + * timr::it_lock. Keep interrupts disabled. + */ + spin_unlock(¤t->sighand->siglock); - timr->it_active = 1; - timr->it_overrun_last = timr->it_overrun; - timr->it_overrun = -1LL; - ++timr->it_requeue_pending; + ret = __posixtimer_deliver_signal(info, timr); - info->si_overrun = timer_overrun_to_int(timr, info->si_overrun); - } + /* Drop the reference which was acquired when the signal was queued */ + posixtimer_putref(timr); - unlock_timer(timr, flags); + spin_lock(¤t->sighand->siglock); + return ret; } -int posix_timer_queue_signal(struct k_itimer *timr) +void posix_timer_queue_signal(struct k_itimer *timr) { - int ret, si_private = 0; - enum pid_type type; - lockdep_assert_held(&timr->it_lock); - timr->it_active = 0; - if (timr->it_interval) - si_private = ++timr->it_requeue_pending; - - /* - * FIXME: if ->sigq is queued we can race with - * dequeue_signal()->posixtimer_rearm(). - * - * If dequeue_signal() sees the "right" value of - * si_sys_private it calls posixtimer_rearm(). - * We re-queue ->sigq and drop ->it_lock(). - * posixtimer_rearm() locks the timer - * and re-schedules it while ->sigq is pending. - * Not really bad, but not that we want. - */ - timr->sigq->info.si_sys_private = si_private; - - type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID; - ret = send_sigqueue(timr->sigq, timr->it_pid, type); - /* If we failed to send the signal the timer stops. */ - return ret > 0; + timr->it_status = timr->it_interval ? POSIX_TIMER_REQUEUE_PENDING : POSIX_TIMER_DISARMED; + posixtimer_send_sigqueue(timr); } /* @@ -317,62 +318,10 @@ int posix_timer_queue_signal(struct k_itimer *timr) static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer) { struct k_itimer *timr = container_of(timer, struct k_itimer, it.real.timer); - enum hrtimer_restart ret = HRTIMER_NORESTART; - unsigned long flags; - - spin_lock_irqsave(&timr->it_lock, flags); - - if (posix_timer_queue_signal(timr)) { - /* - * The signal was not queued due to SIG_IGN. As a - * consequence the timer is not going to be rearmed from - * the signal delivery path. But as a real signal handler - * can be installed later the timer must be rearmed here. - */ - if (timr->it_interval != 0) { - ktime_t now = hrtimer_cb_get_time(timer); - - /* - * FIXME: What we really want, is to stop this - * timer completely and restart it in case the - * SIG_IGN is removed. This is a non trivial - * change to the signal handling code. - * - * For now let timers with an interval less than a - * jiffy expire every jiffy and recheck for a - * valid signal handler. - * - * This avoids interrupt starvation in case of a - * very small interval, which would expire the - * timer immediately again. - * - * Moving now ahead of time by one jiffy tricks - * hrtimer_forward() to expire the timer later, - * while it still maintains the overrun accuracy - * for the price of a slight inconsistency in the - * timer_gettime() case. This is at least better - * than a timer storm. - * - * Only required when high resolution timers are - * enabled as the periodic tick based timers are - * automatically aligned to the next tick. - */ - if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS)) { - ktime_t kj = TICK_NSEC; - - if (timr->it_interval < kj) - now = ktime_add(now, kj); - } - - timr->it_overrun += hrtimer_forward(timer, now, timr->it_interval); - ret = HRTIMER_RESTART; - ++timr->it_requeue_pending; - timr->it_active = 1; - } - } - unlock_timer(timr, flags); - return ret; + guard(spinlock_irqsave)(&timr->it_lock); + posix_timer_queue_signal(timr); + return HRTIMER_NORESTART; } static struct pid *good_sigevent(sigevent_t * event) @@ -399,32 +348,27 @@ static struct pid *good_sigevent(sigevent_t * event) } } -static struct k_itimer * alloc_posix_timer(void) +static struct k_itimer *alloc_posix_timer(void) { struct k_itimer *tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL); if (!tmr) return tmr; - if (unlikely(!(tmr->sigq = sigqueue_alloc()))) { + + if (unlikely(!posixtimer_init_sigqueue(&tmr->sigq))) { kmem_cache_free(posix_timers_cache, tmr); return NULL; } - clear_siginfo(&tmr->sigq->info); + rcuref_init(&tmr->rcuref, 1); return tmr; } -static void k_itimer_rcu_free(struct rcu_head *head) -{ - struct k_itimer *tmr = container_of(head, struct k_itimer, rcu); - - kmem_cache_free(posix_timers_cache, tmr); -} - -static void posix_timer_free(struct k_itimer *tmr) +void posixtimer_free_timer(struct k_itimer *tmr) { put_pid(tmr->it_pid); - sigqueue_free(tmr->sigq); - call_rcu(&tmr->rcu, k_itimer_rcu_free); + if (tmr->sigq.ucounts) + dec_rlimit_put_ucounts(tmr->sigq.ucounts, UCOUNT_RLIMIT_SIGPENDING); + kfree_rcu(tmr, rcu); } static void posix_timer_unhash_and_free(struct k_itimer *tmr) @@ -432,7 +376,7 @@ static void posix_timer_unhash_and_free(struct k_itimer *tmr) spin_lock(&hash_lock); hlist_del_rcu(&tmr->t_hash); spin_unlock(&hash_lock); - posix_timer_free(tmr); + posixtimer_putref(tmr); } static int common_timer_create(struct k_itimer *new_timer) @@ -467,7 +411,7 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event, */ new_timer_id = posix_timer_add(new_timer); if (new_timer_id < 0) { - posix_timer_free(new_timer); + posixtimer_free_timer(new_timer); return new_timer_id; } @@ -485,18 +429,23 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event, goto out; } new_timer->it_sigev_notify = event->sigev_notify; - new_timer->sigq->info.si_signo = event->sigev_signo; - new_timer->sigq->info.si_value = event->sigev_value; + new_timer->sigq.info.si_signo = event->sigev_signo; + new_timer->sigq.info.si_value = event->sigev_value; } else { new_timer->it_sigev_notify = SIGEV_SIGNAL; - new_timer->sigq->info.si_signo = SIGALRM; - memset(&new_timer->sigq->info.si_value, 0, sizeof(sigval_t)); - new_timer->sigq->info.si_value.sival_int = new_timer->it_id; + new_timer->sigq.info.si_signo = SIGALRM; + memset(&new_timer->sigq.info.si_value, 0, sizeof(sigval_t)); + new_timer->sigq.info.si_value.sival_int = new_timer->it_id; new_timer->it_pid = get_pid(task_tgid(current)); } - new_timer->sigq->info.si_tid = new_timer->it_id; - new_timer->sigq->info.si_code = SI_TIMER; + if (new_timer->it_sigev_notify & SIGEV_THREAD_ID) + new_timer->it_pid_type = PIDTYPE_PID; + else + new_timer->it_pid_type = PIDTYPE_TGID; + + new_timer->sigq.info.si_tid = new_timer->it_id; + new_timer->sigq.info.si_code = SI_TIMER; if (copy_to_user(created_timer_id, &new_timer_id, sizeof (new_timer_id))) { error = -EFAULT; @@ -580,7 +529,14 @@ static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags) * 1) Set timr::it_signal to NULL with timr::it_lock held * 2) Release timr::it_lock * 3) Remove from the hash under hash_lock - * 4) Call RCU for removal after the grace period + * 4) Put the reference count. + * + * The reference count might not drop to zero if timr::sigq is + * queued. In that case the signal delivery or flush will put the + * last reference count. + * + * When the reference count reaches zero, the timer is scheduled + * for RCU removal after the grace period. * * Holding rcu_read_lock() accross the lookup ensures that * the timer cannot be freed. @@ -647,10 +603,10 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting) /* interval timer ? */ if (iv) { cur_setting->it_interval = ktime_to_timespec64(iv); - } else if (!timr->it_active) { + } else if (timr->it_status == POSIX_TIMER_DISARMED) { /* * SIGEV_NONE oneshot timers are never queued and therefore - * timr->it_active is always false. The check below + * timr->it_status is always DISARMED. The check below * vs. remaining time will handle this case. * * For all other timers there is nothing to update here, so @@ -667,7 +623,7 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting) * is a SIGEV_NONE timer move the expiry time forward by intervals, * so expiry is > now. */ - if (iv && (timr->it_requeue_pending & REQUEUE_PENDING || sig_none)) + if (iv && timr->it_status != POSIX_TIMER_ARMED) timr->it_overrun += kc->timer_forward(timr, now); remaining = kc->timer_remaining(timr, now); @@ -775,7 +731,7 @@ SYSCALL_DEFINE1(timer_getoverrun, timer_t, timer_id) if (!timr) return -EINVAL; - overrun = timer_overrun_to_int(timr, 0); + overrun = timer_overrun_to_int(timr); unlock_timer(timr, flags); return overrun; @@ -867,8 +823,6 @@ void posix_timer_set_common(struct k_itimer *timer, struct itimerspec64 *new_set else timer->it_interval = 0; - /* Prevent reloading in case there is a signal pending */ - timer->it_requeue_pending = (timer->it_requeue_pending + 2) & ~REQUEUE_PENDING; /* Reset overrun accounting */ timer->it_overrun_last = 0; timer->it_overrun = -1LL; @@ -886,8 +840,6 @@ int common_timer_set(struct k_itimer *timr, int flags, if (old_setting) common_timer_get(timr, old_setting); - /* Prevent rearming by clearing the interval */ - timr->it_interval = 0; /* * Careful here. On SMP systems the timer expiry function could be * active and spinning on timr->it_lock. @@ -895,7 +847,7 @@ int common_timer_set(struct k_itimer *timr, int flags, if (kc->timer_try_to_cancel(timr) < 0) return TIMER_RETRY; - timr->it_active = 0; + timr->it_status = POSIX_TIMER_DISARMED; posix_timer_set_common(timr, new_setting); /* Keep timer disarmed when it_value is zero */ @@ -908,7 +860,8 @@ int common_timer_set(struct k_itimer *timr, int flags, sigev_none = timr->it_sigev_notify == SIGEV_NONE; kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none); - timr->it_active = !sigev_none; + if (!sigev_none) + timr->it_status = POSIX_TIMER_ARMED; return 0; } @@ -936,6 +889,9 @@ retry: if (old_spec64) old_spec64->it_interval = ktime_to_timespec64(timr->it_interval); + /* Prevent signal delivery and rearming. */ + timr->it_signal_seq++; + kc = timr->kclock; if (WARN_ON_ONCE(!kc || !kc->timer_set)) error = -EINVAL; @@ -1004,17 +960,31 @@ int common_timer_del(struct k_itimer *timer) { const struct k_clock *kc = timer->kclock; - timer->it_interval = 0; if (kc->timer_try_to_cancel(timer) < 0) return TIMER_RETRY; - timer->it_active = 0; + timer->it_status = POSIX_TIMER_DISARMED; return 0; } +/* + * If the deleted timer is on the ignored list, remove it and + * drop the associated reference. + */ +static inline void posix_timer_cleanup_ignored(struct k_itimer *tmr) +{ + if (!hlist_unhashed(&tmr->ignored_list)) { + hlist_del_init(&tmr->ignored_list); + posixtimer_putref(tmr); + } +} + static inline int timer_delete_hook(struct k_itimer *timer) { const struct k_clock *kc = timer->kclock; + /* Prevent signal delivery and rearming. */ + timer->it_signal_seq++; + if (WARN_ON_ONCE(!kc || !kc->timer_del)) return -EINVAL; return kc->timer_del(timer); @@ -1040,12 +1010,18 @@ retry_delete: spin_lock(¤t->sighand->siglock); hlist_del(&timer->list); - spin_unlock(¤t->sighand->siglock); + posix_timer_cleanup_ignored(timer); /* * A concurrent lookup could check timer::it_signal lockless. It * will reevaluate with timer::it_lock held and observe the NULL. + * + * It must be written with siglock held so that the signal code + * observes timer->it_signal == NULL in do_sigaction(SIG_IGN), + * which prevents it from moving a pending signal of a deleted + * timer to the ignore list. */ WRITE_ONCE(timer->it_signal, NULL); + spin_unlock(¤t->sighand->siglock); unlock_timer(timer, flags); posix_timer_unhash_and_free(timer); @@ -1091,6 +1067,8 @@ retry_delete: } hlist_del(&timer->list); + posix_timer_cleanup_ignored(timer); + /* * Setting timer::it_signal to NULL is technically not required * here as nothing can access the timer anymore legitimately via @@ -1123,6 +1101,19 @@ void exit_itimers(struct task_struct *tsk) /* The timers are not longer accessible via tsk::signal */ while (!hlist_empty(&timers)) itimer_delete(hlist_entry(timers.first, struct k_itimer, list)); + + /* + * There should be no timers on the ignored list. itimer_delete() has + * mopped them up. + */ + if (!WARN_ON_ONCE(!hlist_empty(&tsk->signal->ignored_posix_timers))) + return; + + hlist_move_list(&tsk->signal->ignored_posix_timers, &timers); + while (!hlist_empty(&timers)) { + posix_timer_cleanup_ignored(hlist_entry(timers.first, struct k_itimer, + ignored_list)); + } } SYSCALL_DEFINE2(clock_settime, const clockid_t, which_clock, |