Performance Matters
(uintptr_t)begin & ~15UL; for (size_t i = 0; i < size; i += 16) { asm("icbi 0,%0" : : "r"(p)); p += 16; } asm("isync"); } DataHeapType* getDataHeap() { static char buf[sizeof(DataHeapType)]; ... 32) { asm("icbi 0,%0" : : "r"(p)); p += 32; } for (size_t i = 16; i < size; i += 32) { asm("icbi 0,%0" : : "r"(p)); p += 32; } asm("isync"); } DataHeapType*
0 points | 197 pages | 11.90 MB | 6 months ago
Data Is All You Need for Fusion
Performance code is about Hardware Matrix Multiply #define macro_n4 {\ b_pref = b_ptr + 4 * K;\ __asm__ __volatile__(\ "movq %7,%%r15; movq %1,%%r14; movq %6,%%r11; salq $4,%%r11;"\ "cmpq $24,%%r15; jb ... M * K; b_ptr += 4 * K; c_ptr += 4 * ldc - M;\ } #define macro_n2 {\ b_pref = b_ptr + 2 * K;\ __asm__ __volatile__(\ "movq %7,%%r15; movq %1,%%r14; movq %6,%%r11; salq $4,%%r11;"\ "cmpq $24,%%r15; jb
0 points | 151 pages | 9.90 MB | 6 months ago
Cross-Platform Floating-Point Determinism Out of the Box
testing various sixit::dmath IEEE floats: shared-lib, static-lib, soft-float, and inline-asm Victor Istomin 🇺🇦 vi@6it.dev Templatizing math and geometry libs and tests Serhii Iliukhin fallback Mykhailo Borovyk 🇺🇦 mbo@6it.dev Implementing support for RISC-V, including inline-asm Vladyslav Merais 🇺🇦 vmer@6it.dev Overall idea, and Insisting that it is doable, in spite ... sixit::dmath::ieee_float_static_lib (SHOULD work if no LTO) 3 2020 sixit::dmath::ieee_float_inline_asm (SHOULD work, but for MSVC we have to use intrinsics ➡ more shaky) 4 sixit::dmath::ieee_float_if_strict_fp
0 points | 31 pages | 3.88 MB | 6 months ago
Just-in-Time Compilation - J F Bastien - CppCon 2020
on the web as well, for example Python and Lua. What’s particularly neat about Emscripten is the asm.js approach that followed it. It has a clever use of JavaScript’s type system to efficiently represent ... nominally being a dynamic language. None of these features had originally been intended for the use asm.js made of them. Engineers from the four major browser vendors have risen to the challenge and collaboratively ... implementations. Bringing the Web up to Speed with WebAssembly — 2017 The marriage of PNaCl and Emscripten / asm.js. With a strong execution model: the virtual ISA is well defined. It pretends to be a modern CPU
0 points | 111 pages | 3.98 MB | 6 months ago
The Main Points of C++
import :A; export import :B; module : private; // starts the private part :2. Two points -> colon asm ("leal (%0,%0,4),%0" : "=r"(n) : "0"(n)); • Class inheritance • Member access specifiers • Member ... underlying type • Bit-fields • Attribute specifier • In modules (private fragment and partitions) • (asm declaration) :3. Three points -> ellipsis 6 ...3. Three points -> ellipsis • Variadic functions
0 points | 34 pages | 344.31 KB | 6 months ago
C++ Memory Model: from C++11 to C++23
value = very_long_calc() while (not done){} asm volatile("mfence" ::: "memory") asm volatile("mfence" ::: "memory") done = true
0 points | 112 pages | 5.17 MB | 6 months ago
Performance Engineering: Being Friendly to Your Hardware
00 00 00 00 nop DWORD PTR [rax+0x0] void *memcpy_erms(void *dst, const void *src, size_t n) { asm volatile ("rep movsb" : "=D" (dst), "=S" (src), "=c" (n) : "0" (dst), "1" (src), "2" (n) : "memory");
0 points | 111 pages | 2.23 MB | 6 months ago
Conan 1.16 Documentation
armv7hf, armv7s, armv7k, armv8, armv8_32, armv8.3, sparc, sparcv9, mips, mips64, avr, s390, s390x, asm.js, wasm] compiler: gcc: version: ["4.1", "4.4", "4.5", "4.6", "4.7", "4.8", "4.9", "5", "5.1", ... It should be possible to build packages for Emscripten (asm.js) via the following conan profile: include(default) [settings] os=Emscripten arch=asm.js compiler=clang compiler.version=6.0 compiler.libcxx=libc++ ... Ninja) work as well. As specified, os has been set to the Emscripten, and arch has been set to either asm.js or wasm (only these two are currently supported). And compiler setting has been set to match the
0 points | 545 pages | 4.34 MB | 1 year ago
Conan 1.15 Documentation
armv7hf, armv7s, armv7k, armv8, armv8_32, armv8.3, sparc, sparcv9, mips, mips64, avr, s390, s390x, asm.js, wasm] compiler: gcc: version: ["4.1", "4.4", "4.5", "4.6", "4.7", "4.8", "4.9", "5", "5.1", ... It should be possible to build packages for Emscripten (asm.js) via the following conan profile: include(default) [settings] os=Emscripten arch=asm.js compiler=clang compiler.version=6.0 compiler.libcxx=libc++ ... Ninja) work as well. As specified, os has been set to the Emscripten, and arch has been set to either asm.js or wasm (only these two are currently supported). And compiler setting has been set to match the
0 points | 540 pages | 4.22 MB | 1 year ago
Conan 2.1 Documentation
define("tools.build:compiler_executables", { "c": f"{toolchain}-gcc", "cpp": f"{toolchain}-g++", "asm": f"{toolchain}-as" }) Validating the toolchain package: settings, settings_build and settings_target ... define("tools.build:compiler_executables", { "c": f"{toolchain}-gcc", "cpp": f"{toolchain}-g++", "asm": f"{toolchain}-as" }) In this case, we need to define the following information: • Add directories ... compilers path to be used. Allowed keys {'c', 'cpp', 'cuda', 'objc', 'objcxx', 'rc', 'fortran', 'asm', 'hip', 'ispc'} tools.build:cxxflags: List of extra CXX flags used by different toolchains like
0 points | 694 pages | 4.13 MB | 1 year ago
73 results in total