paper mashup pre-spark
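Idea on big data: the RDBMS was designed for rich ore, such as ERP and financial data, where ACID is a basic requirement. Internet data such as clickstream or privacy data is worth little per record; it only becomes valuable when analyzed together in large volume.
These changed requirements call for:
- cheap commodity hardware
- the ability to scale
- ease of use
Storage, batch computation, NoSQL, consistency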
Protocol Buffers, k8s
- not all APIs are equal; append is the major use case
- redefine consistency
Resilience by checkpoint. Not really HA.
Resilience by quick recovery.
Spark: resilience by immutable and recomputable data.
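
The "immutable and recomputable" idea can be sketched in a few lines of Python. This is a toy illustration, not Spark's API: the `Dataset` class and its `map`, `filter`, and `compute` methods are hypothetical names chosen for this note. Each dataset is frozen and only records how it was derived from its parent, so a lost result can be rebuilt by replaying that lineage instead of restoring a replica.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Dataset:
    """Toy immutable dataset that remembers its lineage (hypothetical, not Spark)."""
    source: Optional[tuple] = None        # base rows, set only on a root dataset
    parent: Optional["Dataset"] = None    # dataset this one was derived from
    transform: Optional[Callable] = None  # function applied to the parent's rows

    def map(self, fn):
        # Derive a new dataset; the parent is never modified.
        return Dataset(parent=self,
                       transform=lambda rows: tuple(fn(r) for r in rows))

    def filter(self, pred):
        return Dataset(parent=self,
                       transform=lambda rows: tuple(r for r in rows if pred(r)))

    def compute(self):
        # Walk the lineage back to the root and replay every transform.
        if self.parent is None:
            return self.source
        return self.transform(self.parent.compute())


# Assumed sample data standing in for cheap, per-record clickstream values.
clicks = Dataset(source=(1, 2, 3, 4, 5, 6))
evens_squared = clicks.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

result = evens_squared.compute()
del result                             # simulate losing the computed result
recovered = evens_squared.compute()    # rebuilt purely from lineage
```

Because nothing is mutated in place, recomputing gives the same answer every time, which is what makes recomputation a safe substitute for replication in this sketch.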
Written on May 15, 2024