paper mashup pre-spark

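Idea on big data: an RDBMS is designed for rich ore, that is, data such as ERP or financial records where every row is valuable and ACID is a basic requirement. Internet data such as clickstream and privacy data is worth little per record. It only becomes valuable when large volumes are combined and analyzed together.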

These changed requirements call for:

  1. Cheap commodity hardware
  2. The ability to scale
  3. Ease of use

Storage, batch computation, NoSQL, consistency.

Protocol Buffers k8s

  1. Not all APIs are equal; append is the major use case
  2. Redefine consistency

Resilience by checkpoint, not true HA.
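A minimal sketch of the checkpoint idea, under my own assumptions: a job saves its progress after each step, so a restart resumes from the last checkpoint instead of staying up through the failure. The names `run_with_checkpoint`, `fail_at`, and `demo` are hypothetical and not taken from any paper or system.

```python
import json
import os
import tempfile


def run_with_checkpoint(items, checkpoint_path, fail_at=None):
    """Sum items, saving progress after each one so a restart can resume."""
    state = {"next_index": 0, "total": 0}
    # On startup, pick up where the previous run stopped, if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated crash at item {i}")
        state["total"] += items[i]
        state["next_index"] = i + 1
        # Write to a temp file, then rename, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


def demo():
    """Crash partway through, then restart and finish from the checkpoint."""
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    data = [1, 2, 3, 4, 5]
    try:
        run_with_checkpoint(data, path, fail_at=3)
    except RuntimeError:
        pass  # the "process" died; the checkpoint file survives
    return run_with_checkpoint(data, path)
```

The job is unavailable between the crash and the restart, which is the trade-off: durable progress rather than continuous availability.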

Resilience by quick recovery.

Spark: resilience through immutable and recomputable data.
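A toy sketch of "immutable and recomputable", assuming nothing about Spark's real API: each dataset is immutable and remembers only its parent and the function that derives it, so a lost result can be rebuilt from that lineage. `Dataset`, `materialize`, and `demo` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Dataset:
    """Immutable dataset: either source rows, or a function of a parent."""
    source: Optional[tuple] = None
    parent: Optional["Dataset"] = None
    fn: Optional[Callable[[list], list]] = None

    def map(self, f):
        # Returns a new dataset; the parent is never modified.
        return Dataset(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        return Dataset(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def compute(self):
        # Walk the lineage back to the source and replay each step.
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.compute())


def materialize(ds, cache):
    """Return cached rows if present, otherwise recompute from lineage."""
    key = id(ds)
    if key not in cache:
        cache[key] = ds.compute()
    return cache[key]


def demo():
    base = Dataset(source=(1, 2, 3, 4, 5, 6))
    evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    cache = {}
    first = materialize(evens_squared, cache)
    cache.clear()  # simulate losing the node that held the result
    second = materialize(evens_squared, cache)
    return first, second
```

Because the inputs never change, replaying the same steps always yields the same rows, so nothing has to be replicated ahead of time to survive the loss.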

Written on May 15, 2024