Can Data-Oriented-Design be Improved?
… programming • Modules • 2000s • Template metaprogramming • Concurrency • 2020s • ??? … What is DoD about? • DoD ("Data-Oriented Design") • Not about cache lines, nor struct layout (at its core) • From Wikipedia: … Minimalist definition of DoD: Data_output = F(Data_input), a specific transformation from input data to output data, sitting between the previous transformation and the next one. … How DoD is used in actual … ChatGPT … That's cool, but it won't get us very far. … How can we improve it? (second try) • At its core DoD is just: Data_output = F(Data_input) • … with a heavy focus on the data. • What if we looked …
0 points | 39 pages | 1.18 MB | 6 months ago
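Not from the slides, but a minimal C++ sketch of the Data_output = F(Data_input) view described in that excerpt, assuming a plain transformation over a contiguous array that a previous step produced and a next step will consume (the Particle type and integrate function are illustrative names only):

    #include <vector>

    struct Particle { float x, y, vx, vy; };

    // One DoD-style transformation: the output data is a pure function of the
    // input data. A previous transformation produced `in`; the next one
    // consumes the returned vector.
    std::vector<Particle> integrate(const std::vector<Particle>& in, float dt) {
        std::vector<Particle> out;
        out.reserve(in.size());
        for (const Particle& p : in)
            out.push_back({p.x + p.vx * dt, p.y + p.vy * dt, p.vx, p.vy});
        return out;
    }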
Embracing an Adversarial Mindset for Cpp Security
… Malware Research @ Endgame • Malware Research @ FireEye • Computer Forensics @ DoD • Previous: Blackhat, RSA, DEFCON, 44Con, CanSecWest, BsidesSF, WiCyS …
0 points | 92 pages | 3.67 MB | 6 months ago
Just-in-Time Compilation - J F Bastien - CppCon 2020
… carry much more semantic information implicitly. Here, think of examples where a single bytecode instruction might do a full matrix multiplication, or change the prototype of a class. There are much higher … architecture on another architecture. Dynamic binary translation: execute the program from one Instruction Set Architecture in another (or the same) ISA, performing the translation dynamically. In other … dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. Focus its efforts on optimization opportunities that …
0 points | 111 pages | 3.98 MB | 6 months ago
C++ Memory Model: from C++11 to C++23
www.linkedin.com/in/alexdathskovsky … INO Execution • instruction fetch • if operands are available, execute it; if not, fetch them • the instruction is executed by the functional unit • the functional unit … Scheduling • instruction fetched • instruction dispatched to an instruction queue • the instruction waits in the queue until its input operands are available • if operands are available, the instruction is allowed to leave the queue before other instructions • the instruction is issued to a functional unit • only if all older instructions have completed the operation is the result written to the register file …
0 points | 112 pages | 5.17 MB | 6 months ago
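Not taken from those slides, but a minimal sketch of the acquire/release ordering that a C++ memory-model discussion like this covers, assuming a simple one-producer/one-consumer flag handoff:

    #include <atomic>
    #include <cassert>
    #include <thread>

    int data = 0;
    std::atomic<bool> ready{false};

    void producer() {
        data = 42;                                     // plain write
        ready.store(true, std::memory_order_release);  // publish: earlier writes may not sink below
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}  // spin until the flag is published
        assert(data == 42);  // acquire synchronizes with release, so the write to data is visible
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }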
Branchless Programming in C++
… • Data dependency: a = (v1 + v2)*(v1 - v2) • Pipeline increases CPU utilization • Multiple instruction streams run in parallel (dependencies within each stream, no data dependencies between streams) … executed • Conditional jumps (branches) disrupt that order • CPU must wait until it knows which instruction to fetch next: load v1[i]; load v2[i]; cmp v1[i] > v2[i]; jump if true; a += v2[i]; jump; a += v1[i] … load v1[i]…v3[i]; cmp v3[i] == 0; jump if true; a += v1[i] + v2[i]; jump; a += v1[i] * v2[i]
0 points | 61 pages | 9.08 MB | 6 months ago
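Not from the talk itself, but a small sketch of the branchy-versus-branchless pattern that the pseudo-instructions above describe (conditionally accumulating one of two values), assuming the comparison result is hard to predict:

    #include <cstddef>
    #include <vector>

    // Branchy: the conditional jump depends on the data, so an unpredictable
    // v1[i] > v2[i] causes pipeline flushes on mispredictions.
    long sum_branchy(const std::vector<int>& v1, const std::vector<int>& v2) {
        long a = 0;
        for (std::size_t i = 0; i < v1.size(); ++i) {
            if (v1[i] > v2[i]) a += v1[i]; else a += v2[i];
        }
        return a;
    }

    // Branchless: compute the condition as 0/1 and select arithmetically,
    // trading the jump for a data dependency the pipeline can absorb.
    long sum_branchless(const std::vector<int>& v1, const std::vector<int>& v2) {
        long a = 0;
        for (std::size_t i = 0; i < v1.size(); ++i) {
            const long c = v1[i] > v2[i];  // 0 or 1
            a += c * v1[i] + (1 - c) * v2[i];
        }
        return a;
    }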
How Meta Made Debugging Async Code Easier with Coroutines and Senders
… So what was that magic*? struct AsyncStackRoot { atomic<…> topFrame; AsyncStackRoot* nextRoot; frame_ptr stackFramePtr; instruction_ptr returnAddress; }; struct AsyncStackFrame { AsyncStackFrame* parentFrame; instruction_ptr instructionPointer; AsyncStackRoot* stackRoot; …
0 points | 131 pages | 907.41 KB | 6 months ago
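These are not the talk's exact definitions (the excerpt elides the atomic's template argument and the frame_ptr/instruction_ptr types), but a compilable sketch of the two structures and of how a debugger could walk parentFrame links to print an async backtrace, assuming void* stand-ins and AsyncStackFrame* as the atomic's element type:

    #include <atomic>
    #include <cstdio>

    struct AsyncStackRoot;

    struct AsyncStackFrame {
        AsyncStackFrame* parentFrame;   // the caller's async frame
        void* instructionPointer;       // resume/return address for this frame (assumed void*)
        AsyncStackRoot* stackRoot;      // root of the normal stack this frame runs on
    };

    struct AsyncStackRoot {
        std::atomic<AsyncStackFrame*> topFrame;  // most recent async frame (assumed element type)
        AsyncStackRoot* nextRoot;                // previous root further down the normal stack
        void* stackFramePtr;                     // frame pointer of the activating stack frame
        void* returnAddress;                     // return address of the activating stack frame
    };

    // Walk the async chain instead of the normal call stack.
    void printAsyncTrace(const AsyncStackRoot& root) {
        for (auto* f = root.topFrame.load(std::memory_order_acquire); f; f = f->parentFrame)
            std::printf("  at %p\n", f->instructionPointer);
    }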
Blazing Trails: Building the World's Fastest CameBoy Emulator in Modern C++
… completely memory mapped • Only the 8080's set of registers • Z80's extended bit-manipulation instruction set • Some additional new instructions … CPU Block Diagram … Memory & memory-mapped access • 8 KiB of … the CPU. Emulators with T-cycle accuracy simulate the exact number of clock ticks for every instruction, providing the highest level of timing precision. • M-Cycle: Memory Cycle. An M-cycle represents … a higher-level unit of time used by the Game Boy's CPU for executing instructions. Each instruction takes a specific number of M-cycles, with each M-cycle typically equating to 4 T-cycles. … T-Cycle …
0 points | 91 pages | 8.37 MB | 6 months ago
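A trivial sketch (not from the talk) of the M-cycle/T-cycle relationship stated in that excerpt; the constant and helper names are illustrative only:

    // The excerpt states that one M-cycle (memory cycle) equates to 4 T-cycles (clock ticks).
    constexpr int kTCyclesPerMCycle = 4;

    constexpr int toTCycles(int mCycles) { return mCycles * kTCyclesPerMCycle; }

    static_assert(toTCycles(1) == 4);   // a 1 M-cycle instruction lasts 4 clock ticks
    static_assert(toTCycles(3) == 12);  // a 3 M-cycle instruction lasts 12 clock ticks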
To Int or to Uint, This is the Question
… as all numbers are represented in binary form. Therefore, using n/2 is equal to n>>1. • Each instruction that is fetched from memory is pushed into a pipeline; one of the steps in the pipeline is …
0 points | 102 pages | 3.64 MB | 6 months ago
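Not from the slides, but a small sketch of why the n/2 == n>>1 equivalence really holds only for unsigned values, which is the int-versus-uint trade-off the title alludes to:

    #include <cstdint>
    #include <cstdio>

    // Unsigned: values are non-negative, so the compiler can lower n / 2 to n >> 1.
    std::uint32_t half_unsigned(std::uint32_t n) { return n / 2; }

    // Signed: n / 2 rounds toward zero while n >> 1 rounds toward negative
    // infinity, so the compiler must emit an extra adjustment for negative n.
    std::int32_t half_signed(std::int32_t n) { return n / 2; }

    int main() {
        // Prints "3 -3 -4": -7 >> 1 is the arithmetic shift (well defined since C++20),
        // which differs from -7 / 2.
        std::printf("%u %d %d\n", half_unsigned(7u), half_signed(-7), -7 >> 1);
    }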
Back to Basics: Concurrency
… of execution • One CPU core executes code sequentially, i.e. one instruction after the other: Instruction, Execute, … Instruction, Execute. This somewhat reflects how we write software … we have one main, sequential thread of execution • One CPU core executes code serially, i.e. one instruction after the other • We can abstract our visualization and just show the call stack (one function …
0 points | 141 pages | 6.02 MB | 6 months ago
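Not part of the excerpt, but a minimal sketch of moving from that single sequential thread of execution to two concurrent ones with std::thread:

    #include <cstdio>
    #include <thread>

    void work(const char* who) {
        // Each thread is its own sequential stream: one instruction after the other.
        for (int i = 0; i < 3; ++i) std::printf("%s: step %d\n", who, i);
    }

    int main() {
        std::thread worker(work, "worker");  // a second thread of execution starts here
        work("main");                        // the main thread keeps executing concurrently
        worker.join();                       // wait for the worker before exiting
    }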
Performance Engineering: Being Friendly to Your Hardware
… • Linear fetch • Incoming branch • Instruction alignment • Instruction fusing … Branch prediction • Governs fetching of next instruction blocks • A set of tables • Branch instructions … is: complex, serial, slow • Fetch block size • Linear fetch vs incoming branch … Instruction decoding • Decoded operations may get cached … Block-based cryptographic hash function • Simple bitwise operations, many of them • Instruction set equivalence may not be assumed • Vertical vs horizontal data layout … Latency, Throughput
0 points | 111 pages | 2.23 MB | 6 months ago
105 results in total