-
PERFORMANCE
MATTERS
(joint work with Charlie Curtsinger, Grinnell College)
emeryberger.com, @emeryberger
Emery Berger
College of Information
and Computer Sciences
UMASS AMHERST
A short time ago
:
un.bmp
Ogle is too slow!
OGLE’84 is too slow!
Performance used to be easy
[Figure: Transistors (millions) and Clock Speed (MHz) by year, 1970 to 1995; vertical axis from 0.001 to 10,000]
gle
loading…
No mojitos for me…
Back to the present…
Performance not easy anymore
[Figure: Transistors (millions) and Clock Speed (MHz) by year, 1970 to 1995; vertical axis from 0.001 to 10,000]
0 credits | 197 pages | 11.90 MB | 6 months ago 3
-
Being Friendly to Your
Hardware
Performance Engineering
A gentle introduction to hardware for software engineers
Where does C++ run?
On an abstract C++ machine
On an abstract C++ machine?
In most practical cases at boot time only
Same capacity, different
composition => different
performance profile
From JESD 79-4 DDR4 specification
Memory
• Memory system is in the uncore
• Cores act
Multiple instructions resulting in fewer operations
• ISA restrictions may have an impact on performance
Imaginary ARM
mov r20, 0x123456789abcdef0
Register renaming
Branching
Fetch
Decode
Queue
0 credits | 111 pages | 2.23 MB | 6 months ago 3
-
Poster submission: Modern C++ for Parallelism in High
Performance Computing
Victor Eijkhout
CppCon 2024
Introduction
This poster reports on ‘D2D’, a benchmark that explores elegance of expression and context of a High Performance Computing ‘mini-application’. The same code has
been implemented using a number of different approaches to parallelism. Implementations are
discussed with performance results.
Relevance
multi-dimensional arrays through ‘mdspan’, it is interesting to explore
what C++ can offer for lower-level, performance-critical operations.
Scientific computing is an interesting test case since many algorithms are
0 credits | 3 pages | 91.16 KB | 6 months ago 3
-
Introduction
First steps
Context
Theoretical foundations
Outline of an implementation
Conclusion
High-Performance Numerical Integration in the Age of C++26
Vincent Reverdy
Laboratoire d’Annecy de Physique des
past, other languages do far better in terms of everything: functionality, ease of use, and even performance
This talk
The goal is NOT to revolutionize everything or show a library that beats everything
algorithms
Runge-Kutta Methods (RK)
$y_{n+1} = y_n + h \sum_{i=1}^{s} b_i k_i$
$k_i = f\bigl(t_n + c_i h,\; y_n + (a_{i1} k_1 + a_{i2} k_2 + \cdots + a_{i,i-1} k_{i-1})\,h\bigr)$
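The Runge-Kutta update above can be sketched in C++ as follows. This is only an illustrative sketch, not code from the talk: the names `ButcherTableau`, `rk_step`, `classic_rk4`, and `integrate` are hypothetical, the ODE is assumed to be scalar, and the step size is assumed fixed. The coefficients a_ij, b_i, and c_i are stored in a tableau, and the classical fourth-order method is used as one example choice of coefficients.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical container for the coefficients of an explicit s-stage method.
// a is strictly lower triangular: only a[i][j] with j < i is ever read.
struct ButcherTableau {
    std::vector<std::vector<double>> a;  // stage coefficients a_ij
    std::vector<double> b;               // weights b_i
    std::vector<double> c;               // nodes c_i
};

// One explicit Runge-Kutta step for the scalar ODE y' = f(t, y):
//   k_i     = f(t_n + c_i h, y_n + h * sum_{j<i} a_ij k_j)
//   y_{n+1} = y_n + h * sum_i b_i k_i
double rk_step(const ButcherTableau& tab,
               const std::function<double(double, double)>& f,
               double t, double y, double h) {
    const std::size_t s = tab.b.size();
    assert(tab.a.size() == s && tab.c.size() == s);
    std::vector<double> k(s);
    for (std::size_t i = 0; i < s; ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < i; ++j) acc += tab.a[i][j] * k[j];
        k[i] = f(t + tab.c[i] * h, y + h * acc);
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < s; ++i) sum += tab.b[i] * k[i];
    return y + h * sum;
}

// Classical fourth-order Runge-Kutta coefficients, used here as an example.
ButcherTableau classic_rk4() {
    return {
        {{0.0, 0.0, 0.0, 0.0},
         {0.5, 0.0, 0.0, 0.0},
         {0.0, 0.5, 0.0, 0.0},
         {0.0, 0.0, 1.0, 0.0}},
        {1.0 / 6.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 6.0},
        {0.0, 0.5, 0.5, 1.0},
    };
}

// Advance from t0 with n steps of fixed size h and return the final y.
double integrate(const ButcherTableau& tab,
                 const std::function<double(double, double)>& f,
                 double t0, double y0, double h, int n) {
    double t = t0, y = y0;
    for (int step = 0; step < n; ++step) {
        y = rk_step(tab, f, t, y, h);
        t += h;
    }
    return y;
}
```

For instance, calling `integrate(classic_rk4(), f, 0.0, 1.0, 0.1, 10)` with `f` returning `y` approximates e for the test problem y' = y, y(0) = 1. Keeping the coefficients in a tableau means other explicit methods can be tried by swapping the data rather than the stepping code.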
Linear Multistep Methods (LMM)
$y_{n+s} + a_{s-1} \cdot y_{n+s-1} + a_{s-2} \cdot y_{n+s-2} + \cdots$
0 credits | 57 pages | 4.14 MB | 6 months ago 3
-
University of Massachusetts Amherst
Powered by AI:
A Cambrian Explosion
for C++ Software Development Tools
Emery Berger
Cretaceous–Paleogene (K-Pg) extinction event
[Figure: column labels ALLOCATED MEMORY, USAGE, GPU UTIL %, PEAK MEMORY, (MB/s), MEMORY, PYTHON, NATIVE]
AI-powered optimizations!
AI-powered optimizations... COMING SOON!
evolve
profiler that suggests optimizations
0 credits | 128 pages | 23.40 MB | 6 months ago 3
-
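Cloud-Native Edge-Cloud Collaborative AI Framework in Practice
Pu Jie, Senior Engineer, Huawei Cloud Edge Cloud Innovation Lab; KubeEdge SIG AI Tech Lead
Contents
01 Edge AI: Current State and Trends
02 Sedna: An Edge-Cloud Collaborative AI Framework
03 Sedna-GM: K8S Operator
04 Case Studies
Part 1: Edge AI: Current State and Trends
Why Edge AI?
• The cloud-centric AI computing paradigm cannot meet the real-time, accuracy, and strong-interactivity demands of on-device AI applications
devices Edge AI
• With the development of large models, the compute demand of AI workloads is growing substantially and rapidly
AI is being applied in more and more edge scenarios
Distributed collaborative AI: concept
Deploying part of the AI-related tasks to edge devices and, building on edge devices, edge servers, and cloud servers, implementing AI in a distributed or even distributed-collaborative manner
Data is generated at the edge
The edge side is gradually acquiring AI capabilities
Distributed collaborative AI: core driving forces
• As edge-side compute power gradually strengthens, edge AI continues to evolve into distributed collaborative AI
Distributed collaborative AI: technical challenges
1. Fragmented edge resources
2. Edge data silos
3. Few edge samples
4. Heterogeneous edge data
Part 2: Edge-Cloud Collaborative AI Framework
Sedna, the first open-source project for distributed collaborative AI
Built on the edge-cloud collaboration capabilities provided by KubeEdge, it lets existing AI applications move seamlessly down to the edge
Serving distributed collaborative machine learning
✓ Lower build and deployment costs
✓ Improve model performance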
0 credits | 37 pages | 2.36 MB | 1 year ago 3
-
Nim - the first high performance language with full support for hot code-reloading at runtime
by Viktor