Go to file
Daniel Sneddon 56cf3753a1 x86/speculation: Add RSB VM Exit protections
commit 2b1299322016731d56807aa49254a5ea3080b6b3 upstream.

tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.

== Background ==

Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.

To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced.  eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.

== Problem ==

Here's a simplification of how guests are run on Linux' KVM:

void run_kvm_guest(void)
{
	// Prepare to run guest
	VMRESUME();
	// Clean up after guest runs
}

The execution flow for that would look something like this to the
processor:

1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()

Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:

* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.

* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".

IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.

However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.

Balanced CALL/RET instruction pairs such as in step #5 are not affected.

== Solution ==

The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.

However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.

Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.

The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.

In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.

There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.

  [ bp: Massage, incorporate review comments from Andy Cooper. ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
[ bp: Adjust patch to account for kvm entry being in c ]
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23 07:53:47 +01:00
arch x86/speculation: Add RSB VM Exit protections 2022-11-23 07:53:47 +01:00
block block, bfq: protect 'bfqd->queued' by 'bfqd->lock' 2022-11-10 17:46:54 +01:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-25 11:49:16 +02:00
crypto crypto: drbg - make reseeding from get_random_bytes() synchronous 2022-06-25 11:49:13 +02:00
Documentation x86/speculation: Add RSB VM Exit protections 2022-11-23 07:53:47 +01:00
drivers intel_idle: Disable IBRS during long idle 2022-11-23 07:53:45 +01:00
firmware Fix built-in early-load Intel microcode alignment 2020-01-23 08:21:29 +01:00
fs ext4: fix warning in 'ext4_da_release_space' 2022-11-10 17:46:55 +01:00
include x86/bugs: Report AMD retbleed vulnerability 2022-11-23 07:53:44 +01:00
init random: handle latent entropy and command line from random_init() 2022-06-25 11:49:11 +02:00
ipc shm: extend forced shm destroy to support objects from several IPC nses 2021-12-08 08:50:11 +01:00
kernel kprobe: reverse kp->flags when arm_kprobe failed 2022-11-10 17:46:54 +01:00
lib dyndbg: let query-modname override actual module name 2022-10-26 13:19:31 +02:00
LICENSES LICENSES: Remove CC-BY-SA-4.0 license text 2018-10-18 11:28:50 +02:00
mm mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages 2022-11-03 23:52:29 +09:00
net tcp/udp: Make early_demux back namespacified. 2022-11-10 17:46:54 +01:00
samples samples/kretprobes: Fix return value if register_kretprobe() failed 2021-11-26 11:36:11 +01:00
scripts selinux: use "grep -E" instead of "egrep" 2022-10-26 13:19:24 +02:00
security capabilities: fix potential memleak on error path from vfs_getxattr_alloc() 2022-11-10 17:46:55 +01:00
sound ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices 2022-11-10 17:46:55 +01:00
tools x86/speculation: Add RSB VM Exit protections 2022-11-23 07:53:47 +01:00
usr initramfs: restore default compression behavior 2020-04-13 10:44:59 +02:00
virt KVM: arm64: vgic: Fix exit condition in scan_its_table() 2022-11-03 23:52:24 +09:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS MAINTAINERS: co-maintain random.c 2022-06-25 11:49:01 +02:00
Makefile Linux 4.19.265 2022-11-10 17:46:56 +01:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.