Designing an ultra low-overhead multithreading runtime for Nim
from? ◇ 3 years ago: started writing a tensor library in Nim. ◇ 2 threading APIs at the time: OpenMP and simple threadpool ◇ 1 year ago: complete refactoring of the internals 3 Agenda ◇ Understanding - Userland, lightweight context switches - Cannot use hardware threads Preemptive: - PThreads (OpenMP, TBB, Cilk, …) - Scheduled by the OS, heavier context switches - Need synchronization primitives: bus, caches, ... 14 Data parallelism Parallel for loop - Same instructions on multiple data - OpenMP - Use-cases - Vectors, matrices, multi-dimensional arrays and tensors - Challenges: - Nested
0 credits | 37 pages | 556.64 KB | 1 year ago
Modern C++ for Parallelism in High Performance Computing
what extent we achieve scaling for different parallelization strategies: C-style programming with OpenMP, native mechanisms in modern C++, as well as through Kokkos and Sycl. Discussion An important corner distributed memory, and OpenMP (Open Multi-Processing) for shared memory. In this project we focus mostly on the shared memory aspect and use OpenMP as the performance baseline. (1) OpenMP has standard bindings ‘C-style’ implementation based on simple loops and linear vectors for floating point data storage. (2) OpenMP also has bindings to C++, where it can exploit a random access iterator. This means that we reimplement
0 credits | 3 pages | 91.16 KB | 6 months ago
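Design and Application of Vector Databases in the Era of Large Models
• Faiss OpenMP threading rework • Parse the source with LLVM to find every OpenMP directive statement • Convert them into calls to a custom thread pool with lambda expressions • Replace shared variables and add concurrency protection PieCloudVector • Faiss OpenMP threading rework • Control the global thread count • Reduce thread lock contention • Reduce memory usage PieCloudVector • Faiss OpenMP threading rework • Avoid idle threads PieCloudVector • Faiss OpenMP threading rework • Large QPS increase PieCloudVector • Faiss OpenMP threading rework • Large reduction in memory footprint PieCloudVector • Integrating Faiss with the postgres kernel - special path for GPU search • Avoid concurrent GPU calls • Submit query requests in batches from a single thread PieCloudVector • Compatible with domestic hardware and operating systems
0 credits | 28 pages | 1.69 MB | 1 year ago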
Khronos APIs for Heterogeneous Compute and Safety: SYCL and SYCL SC
64); parallel_for_each(e, [=](index<2> idx) restrict(amp) { ptr[idx] *= 2.0f; }); Here we’re using OpenMP as an example float *h_a = { … }, d_a; cudaMalloc((void **)&d_a, size); cudaMemcpy(d_a, h_a, size 64>>>(a, b, c); cudaMemcpy(d_a, h_a, size, cudaMemcpyDeviceToHost); Examples: - OpenCL, CUDA, OpenMP, SYCL 2020 Implementation: - Data is moved to the device via explicit copy APIs Here we’re using
0 credits | 82 pages | 3.35 MB | 6 months ago
Heterogeneous Modern C++ with SYCL 2020
Chair of SYCL Heterogeneous Programming Language ● ISO C++ Directions Group past Chair ● Past CEO OpenMP ● ISOCPP.org Director, VP http://isocpp.org/wiki/faq/wg21#michael-wong ● michael@codeplay.com Application uses SYCL, Kokkos, Raja SYCL in HPC/Supercomputers CUDA/pthreads/ OpenACC/OpenCL OpenMP for C and Fortran Need Languages that allow control of these Data Issues Set Data affinity, Data
0 credits | 114 pages | 7.94 MB | 6 months ago
cppcon 2021 safety guidelines for C parallel and concurrency
Chair of SYCL Heterogeneous Programming Language ● ISO C++ Directions Group past Chair ● Past CEO OpenMP ● ISOCPP.org Director, VP http://isocpp.org/wiki/faq/wg21#michael-wong ● michael@codeplay.com ●
0 credits | 52 pages | 3.14 MB | 6 months ago
Interesting Upcoming Features from Low Latency, Parallelism and Concurrency
collection, and optimization processes. Useful for: ● Lock-free data structures ● Parallel reductions (OpenMP) ● Optimization algorithms ● Statistics collection Proposed interface namespace std { template
0 credits | 56 pages | 514.85 KB | 6 months ago
openEuler 22.09 Technical Whitepaper
Scheduler MM File System Network Syscall Proxy openEuler Linux Kernel Hardware HPC Tasks UI DDS IDE OpenMP MPI Linux Compatible Syscall Scenario-specific OS (lightweight) Scenario-specific OS base framework Noise migration CPU isolation openEuler 22.09 Technical Whitepaper 17 openEuler
0 credits | 13 pages | 1.39 MB | 1 year ago
openEuler OS Technical Whitepaper: Innovation Projects (June, 2023)
low-intrusive programming interface: GCC for openEuler will provide CUDA-like variable attributes and OpenMP lead extension. It will automatically generate code to streamline development and improve ecosystem
0 credits | 116 pages | 3.16 MB | 1 year ago
Conda 23.3.x Documentation
Intel OpenMP runtime libraries. This is almost always caused by one of two things: 1. The environment with NumPy has not been activated. 2. Another software vendor has installed MKL or Intel OpenMP (libiomp5md
0 credits | 370 pages | 2.94 MB | 8 months ago