Designing Data-Intensive Applications (Reprint Edition) (English Edition)

  • List price: ¥99
  • ISBN: 9787564173852
  • Format: 16-kai (16开), paperback
  • Publisher: Southeast University Press
  • Pages: 590
  • Author: (UK) Martin Kleppmann...
  • 2017-10-01, 1st edition
  • 2017-10-01, 1st printing

Synopsis

  

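    Data is at the center of many challenges in system design today. Difficult issues need to be worked out, such as scalability, consistency, reliability, efficiency, and maintainability. On top of that, the choice of tools is overwhelming: relational databases, NoSQL databases, stream or batch processors, and message brokers. Which is the right choice for your application? How do you make sense of all these buzzwords?
    In this practical and comprehensive guide, Designing Data-Intensive Applications (Reprint Edition) (English Edition), Martin Kleppmann takes you through this diverse landscape, examining the pros and cons of the various tools for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those principles in practice and how to make full use of data in modern applications.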

About the Author

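    Martin Kleppmann is a researcher in distributed systems at the University of Cambridge, UK. Previously he was a software engineer and entrepreneur, working on large-scale data infrastructure at LinkedIn and Rapportive. Martin is a regular conference speaker, a blogger, and an open source contributor.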

Table of Contents

Part I. Foundations of Data Systems
  1. Reliable, Scalable, and Maintainable Applications
    Thinking About Data Systems
    Reliability
      Hardware Faults
      Software Errors
      Human Errors
      How Important Is Reliability?
    Scalability
      Describing Load
      Describing Performance
      Approaches for Coping with Load
    Maintainability
      Operability: Making Life Easy for Operations
      Simplicity: Managing Complexity
      Evolvability: Making Change Easy
    Summary
  2. Data Models and Query Languages
    Relational Model Versus Document Model
      The Birth of NoSQL
      The Object-Relational Mismatch
      Many-to-One and Many-to-Many Relationships
      Are Document Databases Repeating History?
      Relational Versus Document Databases Today
    Query Languages for Data
      Declarative Queries on the Web
      MapReduce Querying
    Graph-Like Data Models
      Property Graphs
      The Cypher Query Language
      Graph Queries in SQL
      Triple-Stores and SPARQL
      The Foundation: Datalog
    Summary
  3. Storage and Retrieval
    Data Structures That Power Your Database
      Hash Indexes
      SSTables and LSM-Trees
      B-Trees
      Comparing B-Trees and LSM-Trees
      Other Indexing Structures
    Transaction Processing or Analytics?
      Data Warehousing
      Stars and Snowflakes: Schemas for Analytics
    Column-Oriented Storage
      Column Compression
      Sort Order in Column Storage
      Writing to Column-Oriented Storage
      Aggregation: Data Cubes and Materialized Views
    Summary
  4. Encoding and Evolution
    Formats for Encoding Data
      Language-Specific Formats
      JSON, XML, and Binary Variants
      Thrift and Protocol Buffers
      Avro
      The Merits of Schemas
    Modes of Dataflow
      Dataflow Through Databases
      Dataflow Through Services: REST and RPC
      Message-Passing Dataflow
    Summary
Part II. Distributed Data
  5. Replication
    Leaders and Followers
      Synchronous Versus Asynchronous Replication
      Setting Up New Followers
      Handling Node Outages
      Implementation of Replication Logs
    Problems with Replication Lag
      Reading Your Own Writes
      Monotonic Reads
      Consistent Prefix Reads
      Solutions for Replication Lag
    Multi-Leader Replication
      Use Cases for Multi-Leader Replication
      Handling Write Conflicts
      Multi-Leader Replication Topologies
    Leaderless Replication
      Writing to the Database When a Node Is Down
      Limitations of Quorum Consistency
      Sloppy Quorums and Hinted Handoff
      Detecting Concurrent Writes
    Summary
  6. Partitioning
    Partitioning and Replication
    Partitioning of Key-Value Data
      Partitioning by Key Range
      Partitioning by Hash of Key
      Skewed Workloads and Relieving Hot Spots
    Partitioning and Secondary Indexes
      Partitioning Secondary Indexes by Document
      Partitioning Secondary Indexes by Term
    Rebalancing Partitions
      Strategies for Rebalancing
      Operations: Automatic or Manual Rebalancing
    Request Routing
      Parallel Query Execution
    Summary
  7. Transactions
    The Slippery Concept of a Transaction
      The Meaning of ACID
      Single-Object and Multi-Object Operations
    Weak Isolation Levels
      Read Committed
      Snapshot Isolation and Repeatable Read
      Preventing Lost Updates
      Write Skew and Phantoms
    Serializability
      Actual Serial Execution
      Two-Phase Locking (2PL)
      Serializable Snapshot Isolation (SSI)
    Summary
  8. The Trouble with Distributed Systems
    Faults and Partial Failures
      Cloud Computing and Supercomputing
    Unreliable Networks
      Network Faults in Practice
      Detecting Faults
      Timeouts and Unbounded Delays
      Synchronous Versus Asynchronous Networks
    Unreliable Clocks
      Monotonic Versus Time-of-Day Clocks
      Clock Synchronization and Accuracy
      Relying on Synchronized Clocks
      Process Pauses
    Knowledge, Truth, and Lies
      The Truth Is Defined by the Majority
      Byzantine Faults
      System Model and Reality
    Summary
  9. Consistency and Consensus
    Consistency Guarantees
    Linearizability
      What Makes a System Linearizable?
      Relying on Linearizability
      Implementing Linearizable Systems
      The Cost of Linearizability
    Ordering Guarantees
      Ordering and Causality
      Sequence Number Ordering
      Total Order Broadcast
    Distributed Transactions and Consensus
      Atomic Commit and Two-Phase Commit (2PC)
      Distributed Transactions in Practice
      Fault-Tolerant Consensus
      Membership and Coordination Services
    Summary
Part III. Derived Data
  10. Batch Processing
    Batch Processing with Unix Tools
      Simple Log Analysis
      The Unix Philosophy
    MapReduce and Distributed Filesystems
      MapReduce Job Execution
      Reduce-Side Joins and Grouping
      Map-Side Joins
      The Output of Batch Workflows
      Comparing Hadoop to Distributed Databases
    Beyond MapReduce
      Materialization of Intermediate State
      Graphs and Iterative Processing
      High-Level APIs and Languages
    Summary
  11. Stream Processing
    Transmitting Event Streams
      Messaging Systems
      Partitioned Logs
    Databases and Streams
      Keeping Systems in Sync
      Change Data Capture
      Event Sourcing
      State, Streams, and Immutability
    Processing Streams
      Uses of Stream Processing
      Reasoning About Time
      Stream Joins
      Fault Tolerance
    Summary
  12. The Future of Data Systems
    Data Integration
      Combining Specialized Tools by Deriving Data
      Batch and Stream Processing
    Unbundling Databases
      Composing Data Storage Technologies
      Designing Applications Around Dataflow
      Observing Derived State
    Aiming for Correctness
      The End-to-End Argument for Databases
      Enforcing Constraints
      Timeliness and Integrity
      Trust, but Verify
    Doing the Right Thing
      Predictive Analytics
      Privacy and Tracking
    Summary
Glossary
Index