Optional Readings
- The microarchitecture of Intel, AMD and VIA CPUs: an optimization guide, by Agner Fog, 2014
- AMD Jaguar microarchitecture, in Real World Tech
- Intel Haswell microarchitecture, in Real World Tech
- Intel Phi microarchitecture
- PowerPC 604 RISC Microprocessor, in IEEE Micro, 1994, Vol 14, Issue 5
- Loop unrolling: Duff's device
- 2-bit Branch Predictors: IEEE Trans. Comput., 1995, Vol 44, Issue 5
- Gshare Branch Predictor: Combining Branch Predictors, by Scott McFarling, Technical Note TN-36, Digital (now HP) Western Research Laboratory, June 1993
- Trace cache: a low latency approach to high bandwidth instruction fetching, in Proc. 29th Annual IEEE/ACM Int'l Symposium on Microarchitecture, 1996, pp. 24-34
- What every programmer should know about memory, by Ulrich Drepper, 2007 (classic)
- Polymorphic Branch Prediction, ACM Trans. Archit. Code Optim., 2005, Vol 2, No 2
- Return Address Stack, Proc. Intl. Sym. on Microarch., 1998, Dallas, TX
- Store Data Buffer Design, IEEE Trans. Comput., 2009, Vol 48, No 10
- Data Prefetch, Intl. Sym. on Comp. Arch., 1990
- Cache Coherence Techniques for Multicore Processors, by Michael Marty, 2008
- SMT on the Alpha EV8, on realworldtech.com
- Open textbook: Programming on Parallel Machines, by Norm Matloff, UCD (includes CUDA)
- Multiprocessing Support for Hobby OSes Explained, by Ben Titzer
- Transactional Memory, by Herlihy and Moss, 1993
- IBM Power8 HTM: IBM Journal of R+D, 2015, Vol 59, Issue 1
- Transactional Memory Tutorial, by Liangzhen Lai, UCLA