ARM Embedded Systems Programming and Optimization (English Edition) / Classic Original Edition Series

  • List price: ¥79
  • ISBN: 9787111565284
  • Format: 16mo, paperback
  • Publisher: China Machine Press
  • Pages: 300
  • Author: Jason D. Bakos (USA)
  • 1st edition: 2017-04-01
  • 1st printing: 2017-04-01

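Summary

    ARM Embedded Systems Programming and Optimization (English Edition), by Jason D. Bakos, combines the ARM architecture with Linux tools and focuses on performance-oriented embedded programming. It explains in depth how optimizations at the data, algorithm, and memory levels lead to significant performance gains. The book first covers the fundamentals of the ARM architecture and embedded systems, then uses application case studies such as image transformation, fractal generation, and computer vision to describe the different optimization methods in detail. Readers can run and compare the algorithms hands-on on platforms such as the Raspberry Pi and pick up practical skills along the way. The book is suitable as a textbook for undergraduate or graduate courses on embedded systems and as a reference for programmers working in related development.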

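About the Author

    Jason D. Bakos is an associate professor in the Department of Computer Science and Engineering at the University of South Carolina. His research interests include high-performance computing, heterogeneous networks, and computer architecture. He received the National Science Foundation (NSF) CAREER Award in 2009 and is currently an associate editor of ACM Transactions on Reconfigurable Technology and Systems.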

Table of Contents

Preface
Acknowledgments
CHAPTER 1 The Linux/ARM embedded platform
  1.1 Performance-Oriented Programming
  1.2 ARM Technology
  1.3 Brief History of ARM
  1.4 ARM Programming
  1.5 ARM Instruction Set Architecture
    1.5.1 ARM general purpose registers
    1.5.2 Status register
    1.5.3 Memory addressing modes
    1.5.4 GNU ARM assembler
  1.6 Assembly Optimization #1: Sorting
    1.6.1 Reference implementation
    1.6.2 Assembly implementation
    1.6.3 Result verification
    1.6.4 Analysis of compiler-generated code
  1.7 Assembly Optimization #2: Bit Manipulation
  1.8 Code Optimization Objectives
    1.8.1 Reducing the number of executed instructions
    1.8.2 Reducing average CPI
  1.9 Runtime Profiling with Performance Counters
    1.9.1 ARM performance monitoring unit
    1.9.2 Linux Perf_Event
    1.9.3 Performance counter infrastructure
  1.10 Measuring Memory Bandwidth
  1.11 Performance Results
  1.12 Performance Bounds
  1.13 Basic ARM Instruction Set
    1.13.1 Integer arithmetic instructions
    1.13.2 Bitwise logical instructions
    1.13.3 Shift instructions
    1.13.4 Movement instructions
    1.13.5 Load and store instructions
    1.13.6 Comparison instructions
    1.13.7 Branch instructions
    1.13.8 Floating-point instructions
  1.14 Chapter Wrap-Up
  Exercises
CHAPTER 2 Multicore and data-level optimization: OpenMP and SIMD
  2.1 Optimization Techniques Covered by this Book
  2.2 Amdahl's Law
  2.3 Test Kernel: Polynomial Evaluation
  2.4 Using Multiple Cores: OpenMP
    2.4.1 OpenMP directives
    2.4.2 Scope
    2.4.3 Other OpenMP directives
    2.4.4 OpenMP synchronization
    2.4.5 Debugging OpenMP code
    2.4.6 The OpenMP parallel for pragma
    2.4.7 OpenMP with performance counters
    2.4.8 OpenMP support for the Horner kernel
  2.5 Performance Bounds
  2.6 Performance Analysis
  2.7 Inline Assembly Language in GCC
  2.8 Optimization #1: Reducing Instructions per Flop
  2.9 Optimization #2: Reducing CPI
    2.9.1 Software pipelining
    2.9.2 Software pipelining Horner's method
  2.10 Optimization #3: Multiple Flops per Instruction with Single Instruction, Multiple Data
    2.10.1 ARM11 VFP short vector instructions
    2.10.2 ARM Cortex NEON instructions
    2.10.3 NEON intrinsics
  2.11 Chapter Wrap-Up
  Exercises
CHAPTER 3 Arithmetic optimization and the Linux Framebuffer
  3.1 The Linux Framebuffer
  3.2 Affine Image Transformations
  3.3 Bilinear Interpolation
  3.4 Floating-Point Image Transformation
    3.4.1 Loading the image
    3.4.2 Rendering frames
  3.5 Analysis of Floating-Point Performance
  3.6 Fixed-Point Arithmetic
    3.6.1 Fixed point versus floating point: Accuracy
    3.6.2 Fixed point versus floating point: Range
    3.6.3 Fixed point versus floating point: Precision
    3.6.4 Using fixed point
    3.6.5 Efficient fixed-point addition
    3.6.6 Efficient fixed-point multiplication
    3.6.7 Determining radix point position
    3.6.8 Range and accuracy requirements for image transformation
    3.6.9 Converting from floating-point to fixed-point arithmetic
  3.7 Fixed-Point Performance
  3.8 Real-Time Fractal Generation
    3.8.1 Pixel coloring
    3.8.2 Zooming in
    3.8.3 Range and accuracy requirements
  3.9 Chapter Wrap-Up
  Exercises
CHAPTER 4 Memory optimization and video processing
  4.1 Stencil Loops
  4.2 Example Stencil: The Mean Filter
  4.3 Separable Filters
    4.3.1 Gaussian blur
    4.3.2 The Sobel filter
    4.3.3 The Harris corner detector
    4.3.4 Lucas-Kanade optical flow
  4.4 Memory Access Behavior of 2D Filters
    4.4.1 2D data representation
    4.4.2 Filtering along the row
    4.4.3 Filtering along the column
  4.5 Loop Tiling
  4.6 Tiling and the Stencil Halo Region
  4.7 Example 2D Filter Implementation
  4.8 Capturing and Converting Video Frames
    4.8.1 YUV and chroma subsampling
    4.8.2 Exporting tiles to the frame buffer
  4.9 Video4Linux Driver and API
  4.10 Applying the 2D Tiled Filter
  4.11 Applying the Separated 2D Tiled Filter
  4.12 Top-Level Loop
  4.13 Performance Results
  4.14 Chapter Wrap-Up
  Exercises
CHAPTER 5 Embedded heterogeneous programming with OpenCL
  5.1 GPU Microarchitecture
  5.2 OpenCL
  5.3 OpenCL Programming Model, Idioms, and Abstractions
    5.3.1 The host/device programming model
    5.3.2 Error checking
    5.3.3 Platform layer: Initializing the platforms
    5.3.4 Platform layer: Initializing the devices
    5.3.5 Platform layer: Initializing the context
    5.3.6 Platform layer: Kernel control
    5.3.7 Platform layer: Kernel compilation
    5.3.8 Platform layer: Device memory allocation
  5.4 Kernel Workload Distribution
    5.4.1 Device memory
    5.4.2 Kernel parameters
    5.4.3 Kernel vectorization
    5.4.4 Parameter space for Horner kernel
    5.4.5 Kernel attributes
    5.4.6 Kernel dispatch
  5.5 OpenCL Implementation of Horner's Method: Device Code
    5.5.1 Verification
  5.6 Performance Results
    5.6.1 Parameter exploration
    5.6.2 Number of workgroups
    5.6.3 Workgroup size
    5.6.4 Vector size
  5.7 Chapter Wrap-Up
  Exercises
Appendix A Adding PMU support to Raspbian for the Generation 1 Raspberry Pi
Appendix B NEON intrinsic reference
Appendix C OpenCL reference
Index