UPSTREAM: sched: Clean up active_mm reference counting

The current active_mm reference counting is confusing and sub-optimal.

Rewrite the code to explicitly consider the 4 separate cases:

    user -> user

	When switching between two user tasks, all we need to consider
	is switch_mm().

    user -> kernel

	When switching from a user task to a kernel task (which
	doesn't have an associated mm) we retain the last mm in our
	active_mm. Increment a reference count on active_mm.

  kernel -> kernel

	When switching between kernel threads, all we need to do is
	pass along the active_mm reference.

  kernel -> user

	When switching between a kernel and user task, we must switch
	from the last active_mm to the next mm, hoping of course that
	these are the same. Decrement a reference on the active_mm.

The code keeps a different order, because as you'll note, both 'to
user' cases require switch_mm().

And where the old code would increment/decrement for the 'kernel ->
kernel' case, the new code observes this is a neutral operation and
avoids touching the reference count.
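
Condensed, the new flow has the following shape (a sketch of the
resulting structure only; the full change is in the diff below, and the
mmdrop() for rq->prev_mm happens later in finish_task_switch()):

	if (!next->mm) {				// to kernel
		enter_lazy_tlb(prev->active_mm, next);
		next->active_mm = prev->active_mm;
		if (prev->mm)				// user -> kernel
			mmgrab(prev->active_mm);
		else					// kernel -> kernel
			prev->active_mm = NULL;
	} else {					// to user
		switch_mm_irqs_off(prev->active_mm, next->mm, next);
		if (!prev->mm) {			// kernel -> user
			rq->prev_mm = prev->active_mm;
			prev->active_mm = NULL;
		}
	}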

BUG=b:152605392
TEST=run power_VideoCall test

(cherry picked from commit 139d025cda1da)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: luto@kernel.org
Signed-off-by: Joel Fernandes <joelaf@google.com>
Change-Id: I9f26b68346c6d0eccf5b9c900c8b4d8dae4d4d42
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2186279
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3516,12 +3516,8 @@ static __always_inline struct rq *
 context_switch(struct rq *rq, struct task_struct *prev,
 	       struct task_struct *next, struct rq_flags *rf)
 {
-	struct mm_struct *mm, *oldmm;
-
 	prepare_task_switch(rq, prev, next);
 
-	mm = next->mm;
-	oldmm = prev->active_mm;
 	/*
 	 * For paravirt, this is coupled with an exit in switch_to to
 	 * combine the page table reload and the switch backend into
@@ -3530,22 +3526,37 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	arch_start_context_switch(prev);
 
 	/*
-	 * If mm is non-NULL, we pass through switch_mm(). If mm is
-	 * NULL, we will pass through mmdrop() in finish_task_switch().
-	 * Both of these contain the full memory barrier required by
-	 * membarrier after storing to rq->curr, before returning to
-	 * user-space.
+	 * kernel -> kernel   lazy + transfer active
+	 *   user -> kernel   lazy + mmgrab() active
+	 *
+	 * kernel ->   user   switch + mmdrop() active
+	 *   user ->   user   switch
 	 */
-	if (!mm) {
-		next->active_mm = oldmm;
-		mmgrab(oldmm);
-		enter_lazy_tlb(oldmm, next);
-	} else
-		switch_mm_irqs_off(oldmm, mm, next);
-
-	if (!prev->mm) {
-		prev->active_mm = NULL;
-		rq->prev_mm = oldmm;
+	if (!next->mm) {                                // to kernel
+		enter_lazy_tlb(prev->active_mm, next);
+
+		next->active_mm = prev->active_mm;
+		if (prev->mm)                           // from user
+			mmgrab(prev->active_mm);
+		else
+			prev->active_mm = NULL;
+	} else {                                        // to user
+		/*
+		 * sys_membarrier() requires an smp_mb() between setting
+		 * rq->curr and returning to userspace.
+		 *
+		 * The below provides this either through switch_mm(), or in
+		 * case 'prev->active_mm == next->mm' through
+		 * finish_task_switch()'s mmdrop().
+		 */
+		switch_mm_irqs_off(prev->active_mm, next->mm, next);
+
+		if (!prev->mm) {                        // from kernel
+			/* will mmdrop() in finish_task_switch(). */
+			rq->prev_mm = prev->active_mm;
+			prev->active_mm = NULL;
+		}
 	}
 
 	rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
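
For illustration, a minimal user-space model of the refcount flow above
(toy structures only, not the kernel's mm_struct/task_struct; the drop
that the kernel defers to finish_task_switch() is done inline here):

	/* Toy model: only user -> kernel grabs, only kernel -> user drops,
	 * kernel -> kernel just transfers the active_mm pointer. */
	#include <assert.h>
	#include <stddef.h>

	struct mm   { int refs; };
	struct task { struct mm *mm; struct mm *active_mm; };

	static void mmgrab(struct mm *mm) { mm->refs++; }
	static void mmdrop(struct mm *mm) { mm->refs--; }

	static void switch_to(struct task *prev, struct task *next)
	{
		if (!next->mm) {			/* to kernel */
			next->active_mm = prev->active_mm;
			if (prev->mm)			/* user -> kernel: grab */
				mmgrab(prev->active_mm);
			else				/* kernel -> kernel: transfer */
				prev->active_mm = NULL;
		} else {				/* to user */
			if (!prev->mm) {		/* kernel -> user: drop */
				mmdrop(prev->active_mm);
				prev->active_mm = NULL;
			}
		}
	}

	int main(void)
	{
		struct mm   user_mm  = { .refs = 1 };	/* ref held by the user task */
		struct task user     = { &user_mm, &user_mm };
		struct task kthread1 = { NULL, NULL };
		struct task kthread2 = { NULL, NULL };

		switch_to(&user, &kthread1);		/* user -> kernel:   refs 1 -> 2 */
		switch_to(&kthread1, &kthread2);	/* kernel -> kernel: refs stay 2 */
		switch_to(&kthread2, &user);		/* kernel -> user:   refs 2 -> 1 */

		assert(user_mm.refs == 1);		/* every grab matched by a drop */
		return 0;
	}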