Paper mashup, pre-Spark

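Idea on big data: RDBMS was designed for "rich ore" data such as ERP and financial records, where ACID is a basic requirement. Internet data such as clickstream or privacy data is not worth much per record. It only becomes valuable when large volumes are combined and analyzed together.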


  1. Cheap commodity hardware
  2. Can scale
  3. Easy to use

Storage, batch computation, NoSQL, consistency.

Protocol Buffers k8s

  1. Not all APIs are equal; append is the major use case
  2. Redefine consistency

Resilience by checkpoint, which is not really HA (high availability).
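
To make the checkpoint note concrete, here is a minimal sketch, assuming a hypothetical summing job. State is persisted every few steps, and after a crash the job restarts from the last checkpoint instead of failing over instantly, so some work is redone. The names `run_job`, `checkpoint_every`, `crash_at`, and `checkpoint_demo` are illustrative and do not come from any of the papers.

```python
import json
import os
import tempfile


def run_job(items, checkpoint_path, checkpoint_every=3, crash_at=None):
    """Sum `items`, persisting progress so a restart can resume.

    Hypothetical example: `crash_at` simulates a failure at that index.
    """
    state = {"next_index": 0, "total": 0}
    # Resume from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(items)):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += items[i]
        state["next_index"] = i + 1
        if state["next_index"] % checkpoint_every == 0:
            # Write to a temp file, then rename, so a crash mid-write
            # never leaves a corrupt checkpoint behind.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, checkpoint_path)
    return state["total"]


def checkpoint_demo():
    items = list(range(10))
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        run_job(items, path, crash_at=7)
    except RuntimeError:
        pass  # progress made after the last checkpoint is lost
    # The second run resumes at the checkpointed index, not at 0.
    return run_job(items, path)
```

The job is unavailable between the crash and the restart, and the item processed after the last checkpoint is recomputed. That gap is the sense in which this is resilient without being highly available.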

Resilience by quick recovery.

Spark: resilience through immutable, recomputable data.
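
A toy sketch of the immutable-and-recomputable idea, in plain Python rather than the real Spark API. The class `LineageDataset` and its methods are hypothetical stand-ins: each dataset is never modified in place, remembers its parent and the function that produced it, and can therefore be rebuilt if its materialized data is lost.

```python
class LineageDataset:
    """Toy, hypothetical stand-in for an RDD: immutable, knows its lineage."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = tuple(source) if source is not None else None
        self._parent = parent
        self._fn = fn
        self._cache = None  # materialized data; may be lost on failure

    def map(self, fn):
        # Transformations never mutate; they return a new dataset.
        return LineageDataset(parent=self, fn=fn)

    def collect(self):
        if self._cache is None:
            if self._parent is None:
                self._cache = self._source
            else:
                # Recompute from the parent by replaying the recorded function.
                self._cache = tuple(self._fn(x) for x in self._parent.collect())
        return list(self._cache)

    def lose_cache(self):
        """Simulate a node failure that drops the materialized data."""
        self._cache = None


def lineage_demo():
    base = LineageDataset(source=[1, 2, 3])
    doubled = base.map(lambda x: x * 2)
    plus_one = doubled.map(lambda x: x + 1)
    before = plus_one.collect()
    doubled.lose_cache()
    plus_one.lose_cache()
    after = plus_one.collect()  # rebuilt by replaying the lineage
    return before, after
```

Because the inputs are immutable and the transformations are deterministic, replaying the lineage yields the same result as before the loss, so no replica of the intermediate data is needed.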

Written on May 15, 2024