commit | f42f5eaf0bbeabd3a1153651cd2a5989faac4f58 | [log] [tgz] |
---|---|---|
author | Ma Mingfei <mingfei.ma@intel.com> | Thu Mar 28 06:03:24 2024 |
committer | GitHub <noreply@github.com> | Thu Mar 28 06:03:24 2024 |
tree | ec36c21a9230b8a3e42fdfc9cde3aa511818cfbc | |
parent | 6543fec09b2f04ac4a666882998b534afc9c1349 [diff] |
Add detection for Intel Advanced Matrix Extensions (AMX) instructions (#231)

Tested using intel SDE: https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html

Test scripts:
```
bash scripts/local-build.sh

ISAS=()
OPTIONS=()
PLATFORMS=()

OPTIONS+=(-quark); PLATFORMS+=("Quark")
OPTIONS+=(-p4); PLATFORMS+=("Pentium4")
OPTIONS+=(-p4p); PLATFORMS+=("Pentium4 Prescott")
OPTIONS+=(-mrm); PLATFORMS+=("Merom")
OPTIONS+=(-pnr); PLATFORMS+=("Penryn")
OPTIONS+=(-nhm); PLATFORMS+=("Nehalem")
OPTIONS+=(-wsm); PLATFORMS+=("Westmere")
OPTIONS+=(-snb); PLATFORMS+=("Sandy Bridge")
OPTIONS+=(-ivb); PLATFORMS+=("Ivy Bridge")
OPTIONS+=(-hsw); PLATFORMS+=("Haswell")
OPTIONS+=(-bdw); PLATFORMS+=("Broadwell")
OPTIONS+=(-slt); PLATFORMS+=("Saltwell")
OPTIONS+=(-slm); PLATFORMS+=("Silvermont")
OPTIONS+=(-glm); PLATFORMS+=("Goldmont")
OPTIONS+=(-glp); PLATFORMS+=("Goldmont Plus")
OPTIONS+=(-tnt); PLATFORMS+=("Tremont")
OPTIONS+=(-snr); PLATFORMS+=("Snow Ridge")
OPTIONS+=(-skl); PLATFORMS+=("Skylake")
OPTIONS+=(-cnl); PLATFORMS+=("Cannon Lake")
OPTIONS+=(-icl); PLATFORMS+=("Ice Lake")
OPTIONS+=(-skx); PLATFORMS+=("Skylake server")
OPTIONS+=(-clx); PLATFORMS+=("Cascade Lake")
OPTIONS+=(-cpx); PLATFORMS+=("Cooper Lake")
OPTIONS+=(-icx); PLATFORMS+=("Ice Lake server")
OPTIONS+=(-knl); PLATFORMS+=("Knights landing")
OPTIONS+=(-knm); PLATFORMS+=("Knights mill")
OPTIONS+=(-tgl); PLATFORMS+=("Tiger Lake")
OPTIONS+=(-adl); PLATFORMS+=("Alder Lake")
OPTIONS+=(-mtl); PLATFORMS+=("Meteor Lake")
OPTIONS+=(-rpl); PLATFORMS+=("Raptor Lake")
OPTIONS+=(-spr); PLATFORMS+=("Sapphire Rapids")
OPTIONS+=(-gnr); PLATFORMS+=("Granite Rapids")
OPTIONS+=(-gnr256); PLATFORMS+=("Granite Rapids (AVX10.1 / 256VL)")
OPTIONS+=(-srf); PLATFORMS+=("Sierra Forest")
OPTIONS+=(-arl); PLATFORMS+=("Arrow Lake")
OPTIONS+=(-lnl); PLATFORMS+=("Lunar Lake")
OPTIONS+=(-future); PLATFORMS+=("Future chip")

ISAS+=("AMXBF16")
ISAS+=("AMXTILE")
ISAS+=("AMXINT8")
ISAS+=("AMXFP16")

SDE_BIN="/home/mingfeim/packages/sde-external-9.33.0-2024-01-07-lin/sde"

for I in "${!PLATFORMS[@]}"; do
  echo "${PLATFORMS["${I}"]}"
  for J in "${!ISAS[@]}"; do
    "${SDE_BIN}" "${OPTIONS[$I]}" -- ./build/local/isa-info | grep ${ISAS[$J]}
  done
done
```

Results:
```
Quark
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM or by the input cpuid definition file
Pentium4
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM4 or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM4 or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM4 or by the input cpuid definition file
SDE-ERROR: 64 bits applications are not supported by input chip: PENTIUM4 or by the input cpuid definition file
Pentium4 Prescott
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Merom
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Penryn
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Nehalem
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Westmere
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Sandy Bridge
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Ivy Bridge
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Haswell
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Broadwell
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Saltwell
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Silvermont
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Goldmont
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Goldmont Plus
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Tremont
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Snow Ridge
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Skylake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Cannon Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Ice Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Skylake server
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Cascade Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Cooper Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Ice Lake server
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Knights landing
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Knights mill
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Tiger Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Alder Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Meteor Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Raptor Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Sapphire Rapids
AMXBF16: yes
AMXTILE: yes
AMXINT8: yes
AMXFP16: no
Granite Rapids
AMXBF16: yes
AMXTILE: yes
AMXINT8: yes
AMXFP16: yes
Granite Rapids (AVX10.1 / 256VL)
AMXBF16: yes
AMXTILE: yes
AMXINT8: yes
AMXFP16: yes
Sierra Forest
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Arrow Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Lunar Lake
AMXBF16: no
AMXTILE: no
AMXINT8: no
AMXFP16: no
Future chip
AMXBF16: yes
AMXTILE: yes
AMXINT8: yes
AMXFP16: yes
```
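For readers wondering where the four AMX flags in the results come from, here is a minimal sketch of decoding them from CPUID leaf 7 register values. It is not the code added by this commit: the struct `amx_features` and the helper `decode_amx` are hypothetical names, and the helper only decodes raw values passed to it rather than executing CPUID. The bit positions follow Intel's published CPUID enumeration (leaf 7, sub-leaf 0, EDX bits 22, 24, 25; leaf 7, sub-leaf 1, EAX bit 21). A complete check would presumably also confirm that the OS has enabled tile state in XCR0, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four AMX flags reported by isa-info. */
struct amx_features {
    bool bf16; /* CPUID.(EAX=7,ECX=0):EDX bit 22 */
    bool tile; /* CPUID.(EAX=7,ECX=0):EDX bit 24 */
    bool int8; /* CPUID.(EAX=7,ECX=0):EDX bit 25 */
    bool fp16; /* CPUID.(EAX=7,ECX=1):EAX bit 21 */
};

/*
 * Hypothetical helper: decodes register values the caller already obtained
 * from CPUID. It does not execute CPUID and does not check OS support for
 * AMX tile state.
 */
struct amx_features decode_amx(uint32_t leaf7_sub0_edx, uint32_t leaf7_sub1_eax) {
    struct amx_features f;
    f.bf16 = ((leaf7_sub0_edx >> 22) & 1u) != 0;
    f.tile = ((leaf7_sub0_edx >> 24) & 1u) != 0;
    f.int8 = ((leaf7_sub0_edx >> 25) & 1u) != 0;
    f.fp16 = ((leaf7_sub1_eax >> 21) & 1u) != 0;
    return f;
}
```

Because AMX-FP16 is enumerated in a different sub-leaf than the other three flags, a chip can report BF16, TILE, and INT8 without FP16, which matches the Sapphire Rapids row in the results.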
cpuinfo is a library that detects information about the host CPU that is essential for performance optimization.
Log processor name:

    cpuinfo_initialize();
    printf("Running on %s CPU\n", cpuinfo_get_package(0)->name);
Detect if target is a 32-bit or 64-bit ARM system:

    #if CPUINFO_ARCH_ARM || CPUINFO_ARCH_ARM64
        /* 32-bit or 64-bit ARM-specific code here */
    #endif
Check if the host CPU supports ARM NEON:

    cpuinfo_initialize();
    if (cpuinfo_has_arm_neon()) {
        neon_implementation(arguments);
    }
Check if the host CPU supports x86 AVX:

    cpuinfo_initialize();
    if (cpuinfo_has_x86_avx()) {
        avx_implementation(arguments);
    }
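Feature checks like the two above are often done once, with the chosen implementation stored in a function pointer. The sketch below illustrates that pattern under stated assumptions: `sum_fn`, `sum_generic`, `sum_unrolled`, and `select_sum` are hypothetical names, the `has_avx` parameter stands in for the result of `cpuinfo_has_x86_avx()`, and the "fast" kernel is plain unrolled C rather than real AVX code so that the example builds on any target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kernel signature used only for this illustration. */
typedef float (*sum_fn)(const float* data, size_t n);

/* Portable fallback implementation. */
float sum_generic(const float* data, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

/* Stand-in for an ISA-specific kernel: plain C with four accumulators. */
float sum_unrolled(const float* data, size_t n) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    float acc = (a0 + a1) + (a2 + a3);
    for (; i < n; i++) {
        acc += data[i]; /* remaining tail elements */
    }
    return acc;
}

/* Pick a kernel once; has_avx stands in for cpuinfo_has_x86_avx(). */
sum_fn select_sum(bool has_avx) {
    return has_avx ? sum_unrolled : sum_generic;
}
```

In a program linked against cpuinfo, the selection could be made as `select_sum(cpuinfo_initialize() && cpuinfo_has_x86_avx())`, so that a failed initialization falls back to the generic kernel.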
Check if the thread runs on a Cortex-A53 core:

    cpuinfo_initialize();
    switch (cpuinfo_get_current_core()->uarch) {
        case cpuinfo_uarch_cortex_a53:
            cortex_a53_implementation(arguments);
            break;
        default:
            generic_implementation(arguments);
            break;
    }
Get the size of level 1 data cache on the fastest core in the processor (e.g. big core in big.LITTLE ARM systems):

    cpuinfo_initialize();
    const size_t l1_size = cpuinfo_get_processor(0)->cache.l1d->size;
Pin thread to cores sharing L2 cache with the current core (Linux or Android):

    cpuinfo_initialize();
    cpu_set_t cpu_set;
    CPU_ZERO(&cpu_set);
    const struct cpuinfo_cache* current_l2 = cpuinfo_get_current_processor()->cache.l2;
    for (uint32_t i = 0; i < current_l2->processor_count; i++) {
        CPU_SET(cpuinfo_get_processor(current_l2->processor_start + i)->linux_id, &cpu_set);
    }
    pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_set);
To provide your project's build environment with the necessary compiler and linker flags in a portable manner, the library enables CPUINFO_BUILD_PKG_CONFIG by default when built and generates a pkg-config manifest (libcpuinfo.pc). Here are several examples of how to use it:
If you used your distro's package manager to install the library, you can verify that it is available to your build environment like so:

    $ pkg-config --cflags --libs libcpuinfo
    -I/usr/include/x86_64-linux-gnu/ -L/lib/x86_64-linux-gnu/ -lcpuinfo
If you have installed the library from source into a non-standard prefix, pkg-config may need help finding it:

    $ PKG_CONFIG_PATH="/home/me/projects/cpuinfo/prefix/lib/pkgconfig/:$PKG_CONFIG_PATH" pkg-config --cflags --libs libcpuinfo
    -I/home/me/projects/cpuinfo/prefix/include -L/home/me/projects/cpuinfo/prefix/lib -lcpuinfo
To use with the GNU Autotools, include the following snippet in your project's configure.ac:

    # CPU INFOrmation library...
    PKG_CHECK_MODULES(
        [libcpuinfo], [libcpuinfo], [],
        [AC_MSG_ERROR([libcpuinfo missing...])])
    YOURPROJECT_CXXFLAGS="$YOURPROJECT_CXXFLAGS $libcpuinfo_CFLAGS"
    YOURPROJECT_LIBS="$YOURPROJECT_LIBS $libcpuinfo_LIBS"
To use with Meson, add dependency('libcpuinfo') as a dependency for your executable:

    project(
        'MyCpuInfoProject',
        'cpp',
        meson_version: '>=0.55.0'
    )

    executable(
        'MyCpuInfoExecutable',
        sources: 'main.cpp',
        dependencies: dependency('libcpuinfo')
    )
This project can be built using Bazel.

You can also use this library as a dependency of your Bazel project. Add to the WORKSPACE file:

    load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

    git_repository(
        name = "org_pytorch_cpuinfo",
        branch = "master",
        remote = "https://github.com/Vertexwahn/cpuinfo.git",
    )

And to your BUILD file:

    cc_binary(
        name = "cpuinfo_test",
        srcs = [
            # ...
        ],
        deps = [
            "@org_pytorch_cpuinfo//:cpuinfo",
        ],
    )
To use with CMake, use the FindPkgConfig module. Here is an example:

    cmake_minimum_required(VERSION 3.6)
    project("MyCpuInfoProject")

    find_package(PkgConfig)
    pkg_check_modules(CpuInfo REQUIRED IMPORTED_TARGET libcpuinfo)

    add_executable(${PROJECT_NAME} main.cpp)
    target_link_libraries(${PROJECT_NAME} PkgConfig::CpuInfo)
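To use within a vanilla makefile, you can call pkg-config directly to supply compiler and linker flags. In a variable assignment, a bare $(pkg-config ...) is read by make as a variable reference and expands to nothing, so wrap the call in the shell function (GNU Make):

    CFLAGS=-g3 -Wall -Wextra -Werror ...
    LDFLAGS=-lfoo ...
    ...
    CFLAGS+= $(shell pkg-config --cflags libcpuinfo)
    LDFLAGS+= $(shell pkg-config --libs libcpuinfo)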
- /proc/cpuinfo on ARM
- ro.chipname, ro.board.platform, ro.product.board, ro.mediatek.platform, ro.arch properties (Android)
- dmesg on ARM Linux
- /proc/cpuinfo on 32-bit ARM EABI (Linux)
- FPSID and WCID registers (32-bit ARM)
- getauxval (Linux/ARM)
- /proc/self/auxv (Android/ARM)
- /proc/cpuinfo (Linux/pre-ARMv7)
- sysctlbyname (Mach)
- topology directories (ARM/Linux)
- cache directories (Linux)
- GetLogicalProcessorInformationEx on ARM64 Windows
- /proc/cpuinfo (Linux)
- host_info (Mach)
- GetLogicalProcessorInformationEx (Windows)
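
Several of the sources above are text files such as /proc/cpuinfo, where each line has the form `key : value`. The following is a minimal sketch of extracting one value from such a line. It is not cpuinfo's parser: `parse_proc_cpuinfo_line` is a hypothetical helper, and the sample strings in the usage note are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` looks like "<key><spaces/tabs>:<value>",
 * copy the value (without leading blanks or the trailing newline) into
 * `out`, truncating to fit, and return true. Otherwise return false.
 */
bool parse_proc_cpuinfo_line(const char* line, const char* key, char* out, size_t out_size) {
    const size_t key_len = strlen(key);
    if (out_size == 0 || strncmp(line, key, key_len) != 0) {
        return false;
    }
    const char* p = line + key_len;
    while (*p == ' ' || *p == '\t') {
        p++; /* padding between the key and the colon */
    }
    if (*p != ':') {
        return false; /* key was only a prefix of a longer key */
    }
    p++;
    while (*p == ' ' || *p == '\t') {
        p++; /* blanks after the colon */
    }
    size_t len = strcspn(p, "\r\n");
    if (len >= out_size) {
        len = out_size - 1; /* truncate, leaving room for the terminator */
    }
    memcpy(out, p, len);
    out[len] = '\0';
    return true;
}
```

For example, calling it with the line `"Hardware\t: Example SoC 1234\n"` and the key `"Hardware"` stores `Example SoC 1234` in the output buffer, while the key `"model"` does not match a line that starts with `model name`.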