JVM Memory Model (JVM 内存模型) | 1 page | 48.42 KB | 1 year ago
C++20: An (Almost) Complete Overview | 85 pages | 512.18 KB | 6 months ago
Excerpt: constexpr functions can now use dynamic_cast() and typeid, perform dynamic memory allocation with new/delete, and contain try/catch blocks, but still cannot throw exceptions. constexpr string & vector: std::string and std::vector are now usable in constexpr contexts. The [[nodiscard]] attribute can now include a reason, for example: [[nodiscard("Ignoring the return value will result in memory leaks.")]] void* GetData() { /* ... */ }. Bit operations header: a set of global …
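Below is a minimal sketch, not taken from the slides, of two of the C++20 features this excerpt lists: dynamic allocation and try/catch inside a constexpr function, and [[nodiscard]] with a reason string. The function names are invented for illustration.

```cpp
#include <cstddef>
#include <new>

// C++20: constexpr functions may allocate with new/delete and contain
// try/catch blocks, but still cannot actually throw during constant evaluation.
constexpr std::size_t sum_first(std::size_t n) {
    std::size_t* buf = new std::size_t[n];   // transient constexpr allocation
    std::size_t total = 0;
    try {
        for (std::size_t i = 0; i < n; ++i) {
            buf[i] = i;
            total += buf[i];
        }
    } catch (...) {
        // allowed to appear; reaching a throw would end constant evaluation
    }
    delete[] buf;                            // must be freed before returning
    return total;
}
static_assert(sum_first(10) == 45);

// C++20: [[nodiscard]] can carry a reason that compilers print in the warning.
[[nodiscard("Ignoring the return value will result in memory leaks.")]]
void* GetData() { return ::operator new(64); }

int main() {
    void* p = GetData();       // discarding the result would trigger the message above
    ::operator delete(p);
}
```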
Working with Asynchrony Generically: A Tour of C++ Executors | 121 pages | 7.73 MB | 6 months ago
Excerpt: The operation state notifies the receiver by calling one of these [completion functions] exactly once. Conceptual building blocks of P2300: concept scheduler: schedule(scheduler) -> sender; concept sender: connect(sender, …
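To make the scheduler/sender/receiver vocabulary concrete, here is a hand-rolled sketch of the connect/start handshake the excerpt refers to. It is not the real P2300 (std::execution) API; the types only mimic the protocol, and every identifier is invented.

```cpp
#include <iostream>
#include <utility>

// A receiver: the operation state completes it by calling exactly one of its
// completion functions (only set_value is shown here, for brevity).
struct print_receiver {
    void set_value(int v) { std::cout << "got " << v << "\n"; }
};

// The operation state produced by connecting a sender to a receiver.
template <class Receiver>
struct just_operation {
    Receiver rcvr;
    int value;
    void start() { rcvr.set_value(value); }  // notify the receiver exactly once
};

// A sender describing work that immediately delivers a value.
struct just_sender {
    int value;
    template <class Receiver>
    just_operation<Receiver> connect(Receiver r) {
        return {std::move(r), value};
    }
};

int main() {
    // connect(sender, receiver) yields an operation state; start() runs it.
    auto op = just_sender{42}.connect(print_receiver{});
    op.start();
}
```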
Bringing Existing Code to CUDA Using constexpr and std::pmr | 51 pages | 3.68 MB | 6 months ago
Excerpt: Outline: Introduction, Memory, Host vs Device Functions, Return on Investment, Concluding Remarks. "I work in the RiskLab team at CSIRO on applied mathematics for financial risk. The aim of …" … add_gpu<<<…>>>(N, x, y); // … } OK, about the kernel parameters. Memory: "In a typical PC or cluster node today, the memories of the CPU and GPU are physically distinct" (https://developer.nvidia.com/blog/unified-memory-in-cuda-6/). CPU vs GPU memory: system memory vs GPU memory. "Unified Memory creates a pool of managed memory that is shared between the CPU and GPU, …"
Making Libraries Consumable for Non-C++ Developers | 29 pages | 1.21 MB | 6 months ago
Excerpt: On Windows, sizeof(wchar_t) == 2; on non-Windows, sizeof(wchar_t) == 4, so std::basic_string<wchar_t> has memory implications (more on that later). What assumptions are being made? void get_size(size_t dev, long* … … returns the struct in registers, but the get_data_from() member function returns in caller-provided memory. This is often unexpected, but occurs with the MSVC compiler for x86 with stdcall (callee cleanup) or cdecl (caller cleanup); for non-MSVC compilers, data_t is always returned in caller-provided memory. What else isn't being declared? struct data_t { int a; int b; }; /* Get data from device 'dev' …
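Not from the slides, but a sketch of one common way to sidestep the pitfalls this excerpt describes: use fixed-width integer types and let the caller provide the output object, so neither wchar_t's platform-dependent size nor the compiler's struct-return convention reaches the ABI boundary. All names are illustrative.

```cpp
#include <cstdint>

extern "C" {

// Plain data with explicitly sized fields: the layout is the same for every
// compiler and for every consuming language.
struct data_t {
    std::int32_t a;
    std::int32_t b;
};

// The caller supplies the output object, so no compiler-specific
// "struct in registers vs. hidden pointer" return convention applies.
// Returns 0 on success, nonzero on failure.
std::int32_t get_data_from(std::uint32_t dev, data_t* out) {
    if (out == nullptr) {
        return 1;
    }
    out->a = static_cast<std::int32_t>(dev);  // placeholder payload
    out->b = 0;
    return 0;
}

}  // extern "C"
```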
A Kubernetes Runtime Built on rust-vmm (基于 Rust-vmm 实现 Kubernetes 运行时) | 27 pages | 34.17 MB | 1 year ago
Excerpt: The docker cp vulnerability; pod isolation challenges: noisy neighbors impact performance on CPU, memory, bandwidth, buffer I/O, PIDs, and file descriptors (# kubectl run --rm -it bb --image=busybox sh, then / # …). [rust-vmm] abstracts the common virtualization components used to implement a Rust-based VMM. Written in Rust, a memory-safe language; secure, with minimal hardware emulation; flexible, easy to customize to fit various … … focus on correctness and performance; compiled to native code offering performance similar to C; memory management without garbage collection; designed for systems programming. Rust is a multi-paradigm …
C++ High-Performance Parallel Programming and Optimization, Lecture Slides 08: GPU Programming with CUDA (C++ 高性能并行编程与优化 - 课件 - 08 CUDA 开启的 GPU 编程) | 142 pages | 13.52 MB | 1 year ago
Excerpt: … performs the synchronization automatically, i.e. it is equivalent to cudaDeviceSynchronize(), so the earlier cudaDeviceSynchronize() can in fact be removed. Unified Memory: newer GPUs support managed (unified) memory; simply replace cudaMalloc with cudaMallocManaged, and freeing is still done with cudaFree. … [A GPU] is composed of multiple streaming multiprocessors (SMs); each SM can process one or more blocks. An SM in turn consists of multiple streaming processors (SPs), and each SP can handle one or more threads. Each SM has its own shared memory, which behaves like a CPU cache: small compared to main memory but fast, used to buffer temporary data (it also has some special properties, discussed later). The number of blocks is usually larger than the number of SMs, in which case the NVIDIA driver cycles the blocks across the SMs … there are no data dependencies inside the loop, so it can be parallelized (on a CPU that would be SIMD and instruction-level parallelism; the GPU does not have those, but the code is rewritten this way to introduce shared memory). Block shared memory: having written a dependency-free, parallelizable for loop, how do we actually make it parallel? That is what blocks are for: promote the previous threads to blocks and the for loop to threads, then take the previous local_sum …
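A host-side C++ sketch, not taken from the slides, of the cudaMalloc-to-cudaMallocManaged swap the excerpt describes. It assumes the CUDA runtime headers and library are available; the kernel launch itself is left as a comment so the block stays plain host C++.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    float* x = nullptr;

    // Unified (managed) memory: replace cudaMalloc with cudaMallocManaged and
    // the same pointer becomes usable from both the CPU and the GPU.
    cudaMallocManaged(&x, n * sizeof(float));

    for (int i = 0; i < n; ++i) x[i] = 1.0f;   // write directly from the host

    // A kernel would be launched here, e.g.  scale<<<grid, block>>>(n, x);
    // followed by cudaDeviceSynchronize() before reading results on the host.

    std::printf("x[0] = %f\n", x[0]);
    cudaFree(x);                                // managed memory is still freed with cudaFree
    return 0;
}
```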
Accelerating Tokio with Hardware (使用硬件加速 Tokio), Dai Xiang (戴翔) | 17 pages | 1.66 MB | 1 year ago
Excerpt: … Fly.io, and Embark), and it even has paid contributors! Tokio is good enough: Tokio's APIs are memory-safe, thread-safe, and misuse-resistant, which helps prevent common bugs such as unbounded queues. Software enqueue (diagram: one producer feeding multiple consumers): synchronization latency, memory/cache latency, CPU-cycle latency. DLB (Dynamic Load Balancer): enqueue logic, head and tail pointers, load balancer (diagram: multiple producers and consumers): no synchronization latency, no memory/cache latency, no CPU cycles spent. DLB-assisted channel intro (diagram: hardware-assisted senders and receiver) …
Chen Dong (陈东): Reshaping Mobile App Development with Rust (利用 Rust 重塑移动应用开发), 2023-06-18 | 22 pages | 2.10 MB | 1 year ago
Excerpt: Why Rust? Cross-platform, performance, thread safety, memory safety, and love. Rust's applications in mobile app development … PhoTto / image … [Rust on] mobile platforms is increasingly gaining attention from developers. With its impressive performance, memory safety, and concurrency features, Rust has become an ideal choice for building high-performance …
C++ High-Performance Parallel Programming and Optimization, Lecture Slides 07: A Gentle Deep Dive into Memory Access Optimization (C++ 高性能并行编程与优化 - 课件 - 07 深入浅出访存优化) | 147 pages | 18.88 MB | 1 year ago
Excerpt: Why is filling an int array with 1 twice as slow as filling it with 0? Chapter 1, memory bandwidth: CPU-bound vs memory-bound. In general, parallelism only speeds up the computation, not the memory reads and writes. So for a loop like fill, which does no computation and is pure memory access, parallelizing gives no speedup; this is a memory bottleneck (memory-bound). A loop like sine, whose body does a lot of computation per iteration (a Taylor expansion), … Case study: small-kernel convolution; unrolling the innermost loop helps. Glossary of English terms: cacheline; L1 cache, L2 cache, …; memory bandwidth; cache hit, cache miss; false sharing; prefetching; streaming (stores) … array-of-struct-of-array, Morton ordering on the Intel Xeon Phi. Modern server architectures rely on memory locality for optimal performance; data needs to be organized in a way that allows the CPUs to process the …
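A small C++/OpenMP sketch, not from the slides, of the fill-versus-sine contrast in the excerpt: the pure-memory fill loop gains little from extra threads because it is limited by memory bandwidth, while the compute-heavy sine loop scales with cores. The array size and the use of OpenMP are arbitrary choices for illustration.

```cpp
#include <cmath>
#include <vector>

// Pure memory traffic: no arithmetic per element, so extra threads mostly
// compete for the same memory bandwidth (memory-bound).
void fill(std::vector<float>& a) {
#pragma omp parallel for
    for (long i = 0; i < static_cast<long>(a.size()); ++i)
        a[i] = 1.0f;
}

// Heavy arithmetic per element: the cores, not the memory bus, are the
// bottleneck, so this loop does benefit from parallelization (CPU-bound).
void sine(std::vector<float>& a) {
#pragma omp parallel for
    for (long i = 0; i < static_cast<long>(a.size()); ++i)
        a[i] = std::sin(static_cast<float>(i));
}

int main() {
    std::vector<float> a(1 << 26);  // ~64M floats, well beyond cache capacity
    fill(a);
    sine(a);
    return static_cast<int>(a[1]);  // observe the data so the loops are not optimized away
}
```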