ReadySet, or Partial State in Dataflow-Based Materialized Views

Although this query uses the relational model cleanly to fetch the data you require for your application, this is not a query you can run in a relational database. It’s just too expensive. You’d have to do something else to figure out the star count for your repositories. Maybe de-normalize the data. Migrate your database to keep a star count in the repository table. Or maybe keep the count in a Redis instance and update your application so that starring a repository writes to your database and to Redis.

PlanetScale Boost’s most fundamental operation also revolves around a plan.

With PlanetScale Boost, we’re trying to do the opposite. Instead of planning how to fetch the data every time you read from your database, we want to plan how to process the data every time you write to your database.
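As a rough illustration of doing the work at write time, here is a minimal Python sketch of a star count kept up to date as rows are written. The class and method names (`StarCounts`, `star`, `unstar`, `star_count`) are made up for this note and are not PlanetScale Boost's API; the point is only that the read becomes a single lookup.

```python
from collections import defaultdict


class StarCounts:
    """Hypothetical write-time maintenance of a per-repository star count."""

    def __init__(self):
        self.stars = set()               # base table: (user_id, repo_id) rows
        self.counts = defaultdict(int)   # derived result: repo_id -> star count

    def star(self, user_id, repo_id):
        # The counting work happens on the write path.
        if (user_id, repo_id) not in self.stars:
            self.stars.add((user_id, repo_id))
            self.counts[repo_id] += 1

    def unstar(self, user_id, repo_id):
        if (user_id, repo_id) in self.stars:
            self.stars.remove((user_id, repo_id))
            self.counts[repo_id] -= 1

    def star_count(self, repo_id):
        # The read is a single lookup; no scan or aggregation.
        return self.counts.get(repo_id, 0)
```

Every write now pays a small extra cost so that reads never have to aggregate.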

materialized view:

  • run the query and remember the result
  • how to maintain the materialization?
    • what happens if the underlying data changes?
    • maintain on write or on read?
  • primary target: analytics workloads, infrequent reads, maintain on read
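The "maintain on read" choice can be sketched as follows, assuming a toy in-memory list as the base table. `LazyView` and its fields are hypothetical names for this note, not any database's API: a write only marks the remembered result as stale, and the next read pays for the recomputation.

```python
class LazyView:
    """Maintain on read: writes mark the result stale, reads recompute it."""

    def __init__(self, base_rows, query):
        self.base_rows = base_rows   # list standing in for a base table
        self.query = query           # function: rows -> result
        self.result = None
        self.stale = True
        self.recomputes = 0          # counts how often the query actually ran

    def write(self, row):
        self.base_rows.append(row)
        self.stale = True            # cheap write, no recomputation here

    def read(self):
        if self.stale:
            self.result = self.query(self.base_rows)
            self.stale = False
            self.recomputes += 1
        return self.result


view = LazyView([3, 4], sum)
first = view.read()     # runs the query
view.write(5)
view.write(6)           # two writes, still no recomputation
second = view.read()    # runs the query once for both writes
```

With infrequent reads this batches many writes into one recomputation, which is why it suits analytics; with frequent reads, every read after a write pays the full query cost.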

dataflow-based (how to maintain?):

  • data moves to compute: push-based computation
  • data changes propagate through a graph of operators: relational operators like joins, aggregations, filters
  • an edge is a data dependency: e.g. a join depends on its inputs
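A minimal sketch of push-based dataflow, assuming each change is a `(row, sign)` pair where `+1` is an insert and `-1` is a delete. The operator classes here are hypothetical and far simpler than a real system: a base table forwards changes to a filter, which forwards the matching ones to a count aggregation that holds the view's state.

```python
class Operator:
    """A node in the dataflow graph; edges point to downstream dependents."""

    def __init__(self):
        self.children = []

    def connect(self, child):
        self.children.append(child)
        return child                 # returned so that edges can be chained

    def emit(self, delta):
        for child in self.children:
            child.on_input(delta)    # push the change downstream

    def on_input(self, delta):
        self.emit(delta)             # a base table just forwards changes


class Filter(Operator):
    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def on_input(self, delta):
        row, _sign = delta
        if self.predicate(row):
            self.emit(delta)


class CountBy(Operator):
    """Leaf aggregation; its counts dict plays the role of the view."""

    def __init__(self, key):
        super().__init__()
        self.key = key
        self.counts = {}

    def on_input(self, delta):
        row, sign = delta
        k = self.key(row)
        self.counts[k] = self.counts.get(k, 0) + sign


stars = Operator()  # base table
view = stars.connect(Filter(lambda r: r["public"])).connect(
    CountBy(lambda r: r["repo"])
)
stars.on_input(({"repo": "a", "public": True}, +1))
stars.on_input(({"repo": "a", "public": True}, +1))
stars.on_input(({"repo": "b", "public": False}, +1))
stars.on_input(({"repo": "a", "public": True}, -1))
```

Nothing is recomputed from scratch: each write touches only the operators on its path through the graph.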

partial state: learning to forget

  • chances are most entries in the view are never accessed
  • need to evict old entries and only add entries on demand
  • three main contributions:
    • notion of missing state in the materialized view
    • upquery to populate missing state using dataflow

view must know the query parameter(s)
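To make missing state, upqueries, and eviction concrete, here is a small sketch under stated assumptions: the view is keyed by the query parameter, the "upquery" is an ordinary function that recomputes one key from the base data, and eviction is least-recently-used. `PartialView` and its methods are hypothetical names for this note; dropping updates for missing keys is how this sketch behaves, not a claim about any particular implementation.

```python
from collections import OrderedDict


class PartialView:
    """Hypothetical partially materialized view keyed by the query parameter."""

    def __init__(self, upquery, capacity):
        self.upquery = upquery       # function: key -> result from upstream
        self.capacity = capacity     # max number of materialized keys
        self.state = OrderedDict()   # absent key means "missing", not "empty"
        self.upqueries = 0

    def read(self, key):
        if key not in self.state:    # missing state: fill it on demand
            self.state[key] = self.upquery(key)
            self.upqueries += 1
        self.state.move_to_end(key)  # mark as most recently used
        value = self.state[key]
        while len(self.state) > self.capacity:
            self.state.popitem(last=False)   # evict least recently used key
        return value

    def on_write(self, key, delta):
        # Updates for missing keys are dropped; a later read will upquery.
        if key in self.state:
            self.state[key] += delta


stars = {"a": {1, 2}, "b": {3}, "c": set()}          # toy base table
view = PartialView(lambda repo: len(stars[repo]), capacity=2)

a_first = view.read("a")                    # miss, upquery
stars["a"].add(4); view.on_write("a", +1)   # present, updated in place
stars["c"].add(5); view.on_write("c", +1)   # missing, dropped
c_first = view.read("c")                    # miss, upquery sees the new row
b_first = view.read("b")                    # miss, upquery, evicts "a"
a_again = view.read("a")                    # miss again, upquery
```

The key passed to `read` is exactly the query parameter, which is why the view has to know it: without the parameter there is no way to tell which entry is missing or what to ask upstream for.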

Written on February 17, 2024