Add AMD Speculative Return Stack Overflow (SRSO) (#143)

Reference: https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
2 files changed
README.md

cpuid

Package cpuid provides information about the CPU running the current program.

CPU features are detected on startup and kept for fast access throughout the life of the application. Currently x86 / x64 (AMD64/i386) and ARM (ARM64) are supported. No external C (cgo) code is used, which should make the library very easy to use.

You can read the CPU information through the shared CPU variable of the cpuid library.

Package home: https://github.com/klauspost/cpuid


installing

With Go modules, run go get -u github.com/klauspost/cpuid/v2. Drop v2 if you are not using modules.

Installing binary:

go install github.com/klauspost/cpuid/v2/cmd/cpuid@latest

Or download binaries from release page: https://github.com/klauspost/cpuid/releases

Homebrew

macOS and Linux users can install via brew:

$ brew install cpuid

example

package main

import (
	"fmt"
	"strings"

	. "github.com/klauspost/cpuid/v2"
)

func main() {
	// Print basic CPU information:
	fmt.Println("Name:", CPU.BrandName)
	fmt.Println("PhysicalCores:", CPU.PhysicalCores)
	fmt.Println("ThreadsPerCore:", CPU.ThreadsPerCore)
	fmt.Println("LogicalCores:", CPU.LogicalCores)
	fmt.Println("Family", CPU.Family, "Model:", CPU.Model, "Vendor ID:", CPU.VendorID)
	fmt.Println("Features:", strings.Join(CPU.FeatureSet(), ","))
	fmt.Println("Cacheline bytes:", CPU.CacheLine)
	fmt.Println("L1 Data Cache:", CPU.Cache.L1D, "bytes")
	fmt.Println("L1 Instruction Cache:", CPU.Cache.L1I, "bytes")
	fmt.Println("L2 Cache:", CPU.Cache.L2, "bytes")
	fmt.Println("L3 Cache:", CPU.Cache.L3, "bytes")
	fmt.Println("Frequency", CPU.Hz, "hz")

	// Test if we have these specific features:
	if CPU.Supports(SSE, SSE2) {
		fmt.Println("We have Streaming SIMD 2 Extensions")
	}
}

Sample output:

>go run main.go
Name: AMD Ryzen 9 3950X 16-Core Processor
PhysicalCores: 16
ThreadsPerCore: 2
LogicalCores: 32
Family 23 Model: 113 Vendor ID: AMD
Features: ADX,AESNI,AVX,AVX2,BMI1,BMI2,CLMUL,CMOV,CX16,F16C,FMA3,HTT,HYPERVISOR,LZCNT,MMX,MMXEXT,NX,POPCNT,RDRAND,RDSEED,RDTSCP,SHA,SSE,SSE2,SSE3,SSE4,SSE42,SSE4A,SSSE3
Cacheline bytes: 64
L1 Data Cache: 32768 bytes
L1 Instruction Cache: 32768 bytes
L2 Cache: 524288 bytes
L3 Cache: 16777216 bytes
Frequency 0 hz
We have Streaming SIMD 2 Extensions

usage

cpuid.CPU provides access to CPU features. Use cpuid.CPU.Supports() to check for CPU features. A faster cpuid.CPU.Has() is also provided, which will usually be inlined by the gc compiler.

To test a larger number of features, combine them with f := CombineFeatures(CMOV, CMPXCHG8, X87, FXSR, MMX, SYSCALL, SSE, SSE2), etc. The result can be used with cpuid.CPU.HasAll(f) to quickly test whether all of the features are supported.
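To see why a pre-combined set makes the check cheap, here is a minimal sketch of the general bitmask idea. The featureID and featureSet types and the combine and hasAll helpers are hypothetical stand-ins written for this illustration; they are not the library's types, and the library's internal representation may differ.

```go
package main

import "fmt"

// featureID is a hypothetical feature identifier (not the library's FeatureID).
type featureID uint

const (
	featSSE featureID = iota
	featSSE2
	featAVX
	featAVX2
)

// featureSet stores one bit per feature; up to 64 features fit in one word.
type featureSet uint64

// combine folds several feature IDs into one mask. This is done once, up front.
func combine(ids ...featureID) featureSet {
	var s featureSet
	for _, id := range ids {
		s |= 1 << id
	}
	return s
}

// hasAll reports whether every bit set in want is also set in have.
func hasAll(have, want featureSet) bool {
	return have&want == want
}

func main() {
	cpu := combine(featSSE, featSSE2, featAVX) // pretend detection result

	fmt.Println("SSE+SSE2:", hasAll(cpu, combine(featSSE, featSSE2)))
	fmt.Println("AVX+AVX2:", hasAll(cpu, combine(featAVX, featAVX2)))
}
```

Once the mask is built, each check is a single AND and compare, regardless of how many features were combined.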

Note that for some CPU/OS combinations some features will not be detected. amd64 has rather good support and should work reliably on all platforms.

Note that hypervisors may not pass all CPU features through to the guest OS, so even if your host supports a feature it may not be visible on guests.
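Because detection can under-report, code that depends on an optional feature usually wants a portable fallback. The sketch below shows that selection pattern in isolation. The supports callback is a hypothetical stand-in for a feature query such as cpuid.CPU.Supports, and sumGeneric, sumUnrolled, and pickSum are made-up names for this illustration.

```go
package main

import "fmt"

// sumGeneric is the portable path that works on every CPU.
func sumGeneric(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sumUnrolled stands in for a path that would rely on an optional CPU
// feature. It is plain Go here so the sketch runs anywhere.
func sumUnrolled(xs []int) int {
	total, i := 0, 0
	for ; i+4 <= len(xs); i += 4 {
		total += xs[i] + xs[i+1] + xs[i+2] + xs[i+3]
	}
	for ; i < len(xs); i++ {
		total += xs[i]
	}
	return total
}

// pickSum returns the specialised path only when the feature is reported,
// and the portable path otherwise (for example in a guest that hides it).
func pickSum(supports func(feature string) bool) (string, func([]int) int) {
	if supports("AVX2") {
		return "unrolled", sumUnrolled
	}
	return "generic", sumGeneric
}

func main() {
	// Simulate a hypervisor that masks every feature.
	hidden := func(string) bool { return false }

	name, sum := pickSum(hidden)
	fmt.Println(name, sum([]int{1, 2, 3, 4, 5}))
}
```

Making the choice once at startup keeps the per-call cost to an indirect function call.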

arm64 feature detection

Not all operating systems expose ARM features directly, and on the rest there is no safe way to detect them.

Currently arm64/linux and arm64/freebsd should be quite reliable. arm64/darwin adds features expected from the M1 processor, but a lot remains undetected.

DetectARM() can be used if you control your deployment. It will detect CPU features, but may crash if the OS doesn't intercept the calls. A -cpu.arm flag for detecting unsafe ARM features can be added. See below.

Note that on ARM only features are currently detected; no additional information is available.

flags

It is possible to add flags that affect CPU detection.

For this, the Flags() function is provided.

It must be called before flag.Parse(), and Detect() must be called after the flags have been parsed.

This means that any detection used in init() functions will not reflect these flags.

Example:

package main

import (
	"flag"
	"fmt"

	"github.com/klauspost/cpuid/v2"
)

func main() {
	cpuid.Flags()
	flag.Parse()
	cpuid.Detect()

	// Test if we have these specific features:
	if cpuid.CPU.Supports(cpuid.SSE, cpuid.SSE2) {
		fmt.Println("We have Streaming SIMD 2 Extensions")
	}
}

commandline

Download as binary from: https://github.com/klauspost/cpuid/releases

Install from source:

go install github.com/klauspost/cpuid/v2/cmd/cpuid@latest

Example

λ cpuid
Name: AMD Ryzen 9 3950X 16-Core Processor
Vendor String: AuthenticAMD
Vendor ID: AMD
PhysicalCores: 16
Threads Per Core: 2
Logical Cores: 32
CPU Family 23 Model: 113
Features: ADX,AESNI,AVX,AVX2,BMI1,BMI2,CLMUL,CLZERO,CMOV,CMPXCHG8,CPBOOST,CX16,F16C,FMA3,FXSR,FXSROPT,HTT,HYPERVISOR,LAHF,LZCNT,MCAOVERFLOW,MMX,MMXEXT,MOVBE,NX,OSXSAVE,POPCNT,RDRAND,RDSEED,RDTSCP,SCE,SHA,SSE,SSE2,SSE3,SSE4,SSE42,SSE4A,SSSE3,SUCCOR,X87,XSAVE
Microarchitecture level: 3
Cacheline bytes: 64
L1 Instruction Cache: 32768 bytes
L1 Data Cache: 32768 bytes
L2 Cache: 524288 bytes
L3 Cache: 16777216 bytes

JSON Output:

λ cpuid --json
{
  "BrandName": "AMD Ryzen 9 3950X 16-Core Processor",
  "VendorID": 2,
  "VendorString": "AuthenticAMD",
  "PhysicalCores": 16,
  "ThreadsPerCore": 2,
  "LogicalCores": 32,
  "Family": 23,
  "Model": 113,
  "CacheLine": 64,
  "Hz": 0,
  "BoostFreq": 0,
  "Cache": {
    "L1I": 32768,
    "L1D": 32768,
    "L2": 524288,
    "L3": 16777216
  },
  "SGX": {
    "Available": false,
    "LaunchControl": false,
    "SGX1Supported": false,
    "SGX2Supported": false,
    "MaxEnclaveSizeNot64": 0,
    "MaxEnclaveSize64": 0,
    "EPCSections": null
  },
  "Features": [
    "ADX",
    "AESNI",
    "AVX",
    "AVX2",
    "BMI1",
    "BMI2",
    "CLMUL",
    "CLZERO",
    "CMOV",
    "CMPXCHG8",
    "CPBOOST",
    "CX16",
    "F16C",
    "FMA3",
    "FXSR",
    "FXSROPT",
    "HTT",
    "HYPERVISOR",
    "LAHF",
    "LZCNT",
    "MCAOVERFLOW",
    "MMX",
    "MMXEXT",
    "MOVBE",
    "NX",
    "OSXSAVE",
    "POPCNT",
    "RDRAND",
    "RDSEED",
    "RDTSCP",
    "SCE",
    "SHA",
    "SSE",
    "SSE2",
    "SSE3",
    "SSE4",
    "SSE42",
    "SSE4A",
    "SSSE3",
    "SUCCOR",
    "X87",
    "XSAVE"
  ],
  "X64Level": 3
}
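The JSON form is convenient for other programs to consume. As a minimal sketch, the snippet below decodes a subset of the fields shown above with encoding/json. The cpuInfo struct and the parseCPUInfo and hasFeature helpers are hypothetical names for this illustration, and the embedded sample is a trimmed copy of the output above rather than live data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cpuInfo mirrors a subset of the fields in the JSON output.
// Keys that are not listed here are ignored by encoding/json.
type cpuInfo struct {
	BrandName     string
	PhysicalCores int
	LogicalCores  int
	Cache         struct {
		L1I, L1D, L2, L3 int
	}
	Features []string
	X64Level int
}

// parseCPUInfo decodes JSON text in the shape printed by cpuid --json.
func parseCPUInfo(data []byte) (cpuInfo, error) {
	var info cpuInfo
	err := json.Unmarshal(data, &info)
	return info, err
}

// hasFeature reports whether name appears in the decoded feature list.
func hasFeature(info cpuInfo, name string) bool {
	for _, f := range info.Features {
		if f == name {
			return true
		}
	}
	return false
}

func main() {
	sample := []byte(`{
	  "BrandName": "AMD Ryzen 9 3950X 16-Core Processor",
	  "PhysicalCores": 16,
	  "LogicalCores": 32,
	  "Cache": {"L1I": 32768, "L1D": 32768, "L2": 524288, "L3": 16777216},
	  "Features": ["AVX", "AVX2", "SSE", "SSE2"],
	  "X64Level": 3
	}`)

	info, err := parseCPUInfo(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(info.BrandName, "level:", info.X64Level, "AVX2:", hasFeature(info, "AVX2"))
}
```

In practice the bytes would come from the command's standard output or a saved file instead of a literal.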

Check CPU microarch level

λ cpuid --check-level=3
2022/03/18 17:04:40 AMD Ryzen 9 3950X 16-Core Processor
2022/03/18 17:04:40 Microarchitecture level 3 is supported. Max level is 3.
Exit Code 0

λ cpuid --check-level=4
2022/03/18 17:06:18 AMD Ryzen 9 3950X 16-Core Processor
2022/03/18 17:06:18 Microarchitecture level 4 not supported. Max level is 3.
Exit Code 1
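Since the result is reported through the exit code, the check is easy to script. The following is a rough sketch of calling it from Go with os/exec. It assumes the cpuid binary is on PATH, and levelSupported is a hypothetical helper name. It treats any non-zero exit as "not supported", which is a simplification.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// levelSupported runs "<bin> --check-level=<level>" and maps the exit
// status to a bool. A failure to start the program is returned as an error.
func levelSupported(bin string, level int) (bool, error) {
	cmd := exec.Command(bin, fmt.Sprintf("--check-level=%d", level))
	err := cmd.Run()
	if err == nil {
		return true, nil // exit code 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return false, nil // the program ran but exited non-zero
	}
	return false, err // the program could not be started
}

func main() {
	ok, err := levelSupported("cpuid", 3)
	if err != nil {
		fmt.Println("could not run cpuid:", err)
		return
	}
	fmt.Println("level 3 supported:", ok)
}
```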

Available flags

x86 & amd64

Feature Flag        Description
ADX                 Intel ADX (Multi-Precision Add-Carry Instruction Extensions)
AESNI               Advanced Encryption Standard New Instructions
AMD3DNOW            AMD 3DNOW
AMD3DNOWEXT         AMD 3DNowExt
AMXBF16             Tile computational operations on BFLOAT16 numbers
AMXINT8             Tile computational operations on 8-bit integers
AMXFP16             Tile computational operations on FP16 numbers
AMXTILE             Tile architecture
APX_F               Intel APX
AVX                 AVX functions
AVX10               If set the Intel AVX10 Converged Vector ISA is supported
AVX10_128           If set indicates that AVX10 128-bit vector support is present
AVX10_256           If set indicates that AVX10 256-bit vector support is present
AVX10_512           If set indicates that AVX10 512-bit vector support is present
AVX2                AVX2 functions
AVX512BF16          AVX-512 BFLOAT16 Instructions
AVX512BITALG        AVX-512 Bit Algorithms
AVX512BW            AVX-512 Byte and Word Instructions
AVX512CD            AVX-512 Conflict Detection Instructions
AVX512DQ            AVX-512 Doubleword and Quadword Instructions
AVX512ER            AVX-512 Exponential and Reciprocal Instructions
AVX512F             AVX-512 Foundation
AVX512FP16          AVX-512 FP16 Instructions
AVX512IFMA          AVX-512 Integer Fused Multiply-Add Instructions
AVX512PF            AVX-512 Prefetch Instructions
AVX512VBMI          AVX-512 Vector Bit Manipulation Instructions
AVX512VBMI2         AVX-512 Vector Bit Manipulation Instructions, Version 2
AVX512VL            AVX-512 Vector Length Extensions
AVX512VNNI          AVX-512 Vector Neural Network Instructions
AVX512VP2INTERSECT  AVX-512 Intersect for D/Q
AVX512VPOPCNTDQ     AVX-512 Vector Population Count Doubleword and Quadword
AVXIFMA             AVX-IFMA instructions
AVXNECONVERT        AVX-NE-CONVERT instructions
AVXSLOW             Indicates the CPU performs two 128-bit operations instead of one
AVXVNNI             AVX (VEX encoded) VNNI neural network instructions
AVXVNNIINT8         AVX-VNNI-INT8 instructions
BHI_CTRL            Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598
BMI1                Bit Manipulation Instruction Set 1
BMI2                Bit Manipulation Instruction Set 2
CETIBT              Intel CET Indirect Branch Tracking
CETSS               Intel CET Shadow Stack
CLDEMOTE            Cache Line Demote
CLMUL               Carry-less Multiplication
CLZERO              CLZERO instruction supported
CMOV                i686 CMOV
CMPCCXADD           CMPCCXADD instructions
CMPSB_SCADBS_SHORT  Fast short CMPSB and SCASB
CMPXCHG8            CMPXCHG8 instruction
CPBOOST             Core Performance Boost
CPPC                AMD: Collaborative Processor Performance Control
CX16                CMPXCHG16B Instruction
EFER_LMSLE_UNS      AMD: Core::X86::Msr::EFER[LMSLE] is not supported, and MBZ
ENQCMD              Enqueue Command
ERMS                Enhanced REP MOVSB/STOSB
F16C                Half-precision floating-point conversion
FLUSH_L1D           Flush L1D cache
FMA3                Intel FMA 3. Does not imply AVX.
FMA4                Bulldozer FMA4 functions
FP128               AMD: When set, the internal FP/SIMD execution datapath is 128-bits wide
FP256               AMD: When set, the internal FP/SIMD execution datapath is 256-bits wide
FSRM                Fast Short Rep Mov
FXSR                FXSAVE, FXRSTOR instructions, CR4 bit 9
FXSROPT             FXSAVE/FXRSTOR optimizations
GFNI                Galois Field New Instructions. May require other features (AVX, AVX512VL, AVX512F) based on usage.
HLE                 Hardware Lock Elision
HRESET              If set CPU supports history reset and the IA32_HRESET_ENABLE MSR
HTT                 Hyperthreading (enabled)
HWA                 Hardware assert supported. Indicates support for MSRC001_10
HYBRID_CPU          This part has CPUs of more than one type.
HYPERVISOR          This bit has been reserved by Intel & AMD for use by hypervisors
IA32_ARCH_CAP       IA32_ARCH_CAPABILITIES MSR (Intel)
IA32_CORE_CAP       IA32_CORE_CAPABILITIES MSR
IBPB                Indirect Branch Restricted Speculation (IBRS) and Indirect Branch Predictor Barrier (IBPB)
IBRS                AMD: Indirect Branch Restricted Speculation
IBRS_PREFERRED      AMD: IBRS is preferred over software solution
IBRS_PROVIDES_SMP   AMD: IBRS provides Same Mode Protection
IBS                 Instruction Based Sampling (AMD)
IBSBRNTRGT          Instruction Based Sampling Feature (AMD)
IBSFETCHSAM         Instruction Based Sampling Feature (AMD)
IBSFFV              Instruction Based Sampling Feature (AMD)
IBSOPCNT            Instruction Based Sampling Feature (AMD)
IBSOPCNTEXT         Instruction Based Sampling Feature (AMD)
IBSOPSAM            Instruction Based Sampling Feature (AMD)
IBSRDWROPCNT        Instruction Based Sampling Feature (AMD)
IBSRIPINVALIDCHK    Instruction Based Sampling Feature (AMD)
IBS_FETCH_CTLX      AMD: IBS fetch control extended MSR supported
IBS_OPDATA4         AMD: IBS op data 4 MSR supported
IBS_OPFUSE          AMD: Indicates support for IbsOpFuse
IBS_PREVENTHOST     Disallowing IBS use by the host supported
IBS_ZEN4            Fetch and Op IBS support IBS extensions added with Zen4
IDPRED_CTRL         IPRED_DIS
INT_WBINVD          WBINVD/WBNOINVD are interruptible.
INVLPGB             INVLPGB and TLBSYNC instructions supported
KEYLOCKER           Key locker
KEYLOCKERW          Key locker wide
LAHF                LAHF/SAHF in long mode
LAM                 If set, CPU supports Linear Address Masking
LBRVIRT             LBR virtualization
LZCNT               LZCNT instruction
MCAOVERFLOW         MCA overflow recovery support.
MCDT_NO             Processor does not exhibit MXCSR Configuration Dependent Timing behavior and does not need to mitigate it.
MCOMMIT             MCOMMIT instruction supported
MD_CLEAR            VERW clears CPU buffers
MMX                 Standard MMX
MMXEXT              SSE integer functions or AMD MMX ext
MOVBE               MOVBE instruction (big-endian)
MOVDIR64B           Move 64 Bytes as Direct Store
MOVDIRI             Move Doubleword as Direct Store
MOVSB_ZL            Fast Zero-Length MOVSB
MPX                 Intel MPX (Memory Protection Extensions)
MOVU                MOVU SSE instructions are more efficient and should be preferred to SSE MOVL/MOVH. MOVUPS is more efficient than MOVLPS/MOVHPS. MOVUPD is more efficient than MOVLPD/MOVHPD
MSRIRC              Instruction Retired Counter MSR available
MSRLIST             Read/Write List of Model Specific Registers
MSR_PAGEFLUSH       Page Flush MSR available
NRIPS               Indicates support for NRIP save on VMEXIT
NX                  NX (No-Execute) bit
OSXSAVE             XSAVE enabled by OS
PCONFIG             PCONFIG for Intel Multi-Key Total Memory Encryption
POPCNT              POPCNT instruction
PPIN                AMD: Protected Processor Inventory Number support. Indicates that Protected Processor Inventory Number (PPIN) capability can be enabled
PREFETCHI           PREFETCHIT0/1 instructions
PSFD                Predictive Store Forward Disable
RDPRU               RDPRU instruction supported
RDRAND              RDRAND instruction is available
RDSEED              RDSEED instruction is available
RDTSCP              RDTSCP Instruction
RRSBA_CTRL          Restricted RSB Alternate
RTM                 Restricted Transactional Memory
RTM_ALWAYS_ABORT    Indicates that the loaded microcode is forcing RTM abort.
SERIALIZE           Serialize Instruction Execution
SEV                 AMD Secure Encrypted Virtualization supported
SEV_64BIT           AMD SEV guest execution only allowed from a 64-bit host
SEV_ALTERNATIVE     AMD SEV Alternate Injection supported
SEV_DEBUGSWAP       Full debug state swap supported for SEV-ES guests
SEV_ES              AMD SEV Encrypted State supported
SEV_RESTRICTED      AMD SEV Restricted Injection supported
SEV_SNP             AMD SEV Secure Nested Paging supported
SGX                 Software Guard Extensions
SGXLC               Software Guard Extensions Launch Control
SHA                 Intel SHA Extensions
SME                 AMD Secure Memory Encryption supported
SME_COHERENT        AMD Hardware cache coherency across encryption domains enforced
SPEC_CTRL_SSBD      Speculative Store Bypass Disable
SRBDS_CTRL          SRBDS mitigation MSR available
SSE                 SSE functions
SSE2                P4 SSE functions
SSE3                Prescott SSE3 functions
SSE4                Penryn SSE4.1 functions
SSE42               Nehalem SSE4.2 functions
SSE4A               AMD Barcelona microarchitecture SSE4a instructions
SSSE3               Conroe SSSE3 functions
STIBP               Single Thread Indirect Branch Predictors
STIBP_ALWAYSON      AMD: Single Thread Indirect Branch Prediction Mode has Enhanced Performance and may be left Always On
STOSB_SHORT         Fast short STOSB
SUCCOR              Software uncorrectable error containment and recovery capability.
SVM                 AMD Secure Virtual Machine
SVMDA               Indicates support for the SVM decode assists.
SVMFBASID           SVM, Indicates that TLB flush events, including CR3 writes and CR4.PGE toggles, flush only the current ASID's TLB entries. Also indicates support for the extended VMCBTLB_Control
SVML                AMD SVM lock. Indicates support for SVM-Lock.
SVMNP               AMD SVM nested paging
SVMPF               SVM pause intercept filter. Indicates support for the pause intercept filter
SVMPFT              SVM PAUSE filter threshold. Indicates support for the PAUSE filter cycle count threshold
SYSCALL             System-Call Extension (SCE): SYSCALL and SYSRET instructions.
SYSEE               SYSENTER and SYSEXIT instructions
TBM                 AMD Trailing Bit Manipulation
TDX_GUEST           Intel Trust Domain Extensions Guest
TLB_FLUSH_NESTED    AMD: Flushing includes all the nested translations for guest translations
TME                 Intel Total Memory Encryption. The following MSRs are supported: IA32_TME_CAPABILITY, IA32_TME_ACTIVATE, IA32_TME_EXCLUDE_MASK, and IA32_TME_EXCLUDE_BASE.
TOPEXT              TopologyExtensions: topology extensions support. Indicates support for CPUID Fn8000_001D_EAX_x[N:0]-CPUID Fn8000_001E_EDX.
TSCRATEMSR          MSR based TSC rate control. Indicates support for MSR TSC ratio MSRC000_0104
TSXLDTRK            Intel TSX Suspend Load Address Tracking
VAES                Vector AES. AVX(512) versions require additional checks.
VMCBCLEAN           VMCB clean bits. Indicates support for VMCB clean bits.
VMPL                AMD VM Permission Levels supported
VMSA_REGPROT        AMD VMSA Register Protection supported
VMX                 Virtual Machine Extensions
VPCLMULQDQ          Carry-Less Multiplication Quadword. Requires AVX for 3 register versions.
VTE                 AMD Virtual Transparent Encryption supported
WAITPKG             TPAUSE, UMONITOR, UMWAIT
WBNOINVD            Write Back and Do Not Invalidate Cache
WRMSRNS             Non-Serializing Write to Model Specific Register
X87                 FPU
XGETBV1             Supports XGETBV with ECX = 1
XOP                 Bulldozer XOP functions
XSAVE               XSAVE, XRSTOR, XSETBV, XGETBV
XSAVEC              Supports XSAVEC and the compacted form of XRSTOR.
XSAVEOPT            XSAVEOPT available
XSAVES              Supports XSAVES/XRSTORS and IA32_XSS

ARM features:

Feature Flag  Description
AESARM        AES instructions
ARMCPUID      Some CPU ID registers readable at user-level
ASIMD         Advanced SIMD
ASIMDDP       SIMD Dot Product
ASIMDHP       Advanced SIMD half-precision floating point
ASIMDRDM      Rounding Double Multiply Accumulate/Subtract (SQRDMLAH/SQRDMLSH)
ATOMICS       Large System Extensions (LSE)
CRC32         CRC32/CRC32C instructions
DCPOP         Data cache clean to Point of Persistence (DC CVAP)
EVTSTRM       Generic timer
FCMA          Floating point complex number addition and multiplication
FP            Single-precision and double-precision floating point
FPHP          Half-precision floating point
GPA           Generic Pointer Authentication
JSCVT         Javascript-style double->int convert (FJCVTZS)
LRCPC         Weaker release consistency (LDAPR, etc)
PMULL         Polynomial Multiply instructions (PMULL/PMULL2)
SHA1          SHA-1 instructions (SHA1C, etc)
SHA2          SHA-2 instructions (SHA256H, etc)
SHA3          SHA-3 instructions (EOR3, RAX1, XAR, BCAX)
SHA512        SHA512 instructions
SM3           SM3 instructions
SM4           SM4 instructions
SVE           Scalable Vector Extension

license

This code is published under an MIT license. See LICENSE file for more information.