notes on AWS

IAM policy

rules that, under the correct ‘condition’, define what ‘actions’ the policy ‘principal’ or holder can take to specify AWS ‘resources’

‘who’: user/group/role, role allow svc like EC2 to become the ‘who’ ‘what’: run instance/CRUD/plug to other svc ‘which’, I can access S3, but which one or all of them? ‘when’, when this rule should apply

Gotcha: If making and attaching policy to IAM principle, the principal can be optional and default to that ‘who’ but why we even need a principle? because some of policy are not attached to IAM users, could be resource. like S3 bucket, SNS, SQS, and Role trust policy. It is not ‘who can do which’, rather ‘which allow who’ bucket policy is just policy that need principal,

the who user and the who resouce what’s the difference between attaching a policy to an IAM user vs resouce like S3 bucket? permission slip: the user carry the paperwork can access VIP list: the user carry nothing but identity, the vault have the paperwork list.

iam role::assumeRole user and groups have actual ‘who’, but not iam. iam role will solid only after ‘trust policy’: who can assume this role. when creating a iam role in the aws console, a trust policy will generated when we select ‘Role Type’

passrole:

2022 review velox Language Frontend -> IR -> Optimizer -> Execution Engine -> Execution Runtime ——PostgreSQL —— (策略) velox (硬件调校)

ReadySet Jon Gjengset: Partial State in Dataflow-Based Materialized Views cache that understand SQL https://planetscale.com/blog/how-planetscale-boost-serves-your-sql-queries-instantly

Neon PostgreSQL 版的 PlanetScale, branching?

Google AlloyDB 基于 PostgreSQL 协议的 AlloyDB，整体架构是 AWS Aurora 的升级版 AlloyDB 采取了和 Aurora, Neon 类似的基于 WAL 的存算分离架构，不过 AlloyDB 在 TP 基础上先点了 AP 的技能树

原生 (vanilla) OLTP MySQL, PostgreSQL, SQL Server - Cloud SQL。
云原生数仓 OLAP - BigQuery。
云原生分布式 OLTP 数据库 - Cloud Spanner

Snowflake Unistore HTAP to TP 公司总是先有 TP 系统, TP 是在线系统，AP 是离线系统

LiteFS and Cloudflare D1

AP类的DuckDB

PostgreSQL = MySQL + 穷人版 (ClickHouse + MongoDB + Elasticsearch + InfluxDB) + Geospatial + Multi-tenancy

DataZone

SQL 语言统一，执行引擎统一，基于日志的存储统一

当下数据库痛点的瓶颈：如何基于云底座实现扎实的 serverless 形态，存算分离，spot instance，tiered storage，给云上的多租户提供高性价比的数据库服务。

如何把 TP，AP 甚至数据湖结合在一起。

如何让数据库变更的开发工作流更接近代码变更的工作流。

=== s3 bucket policy it is much easier to setup than the cross-account access. we could either add principals from other account id,Or we the bucket owner have to create a IAM role, a trust policy between the accounts and provide the user in the guest account the ability to assume the role. ACL is legacy, only user case is to grant permission to specific AWS services.

IAM is traditional role based access control, while s3 bucket policy is resource based access control.

trade-off on access control on s3: s3 bucket policy pro: all in one place s3, simple iam user policy pro: it has to be via IAM role, it is easier to mange in groups.

Full Stack Deep Learning: Chip: Design Machine Leraning System Udacity: MLE 李宏毅: Youtube 王喆: 书和极客时间专栏王树森：Youtube 李沐：b站和书 StatQuest ritvikmath 数学之美白板推导 Google的老论文: Hidden Technical Debt in Machine Learning Systems

====

作者：李庆超链接：https://www.zhihu.com/question/22314873/answer/2333079486 来源：知乎著作权归作者所有。商业转载请联系作者获得授权，非商业转载请注明出处。

在Amazon工作的四年间，有幸在AWS呆过两年。在此之前，对云一直很感兴趣，但没有具体的概念，也没有详细地了解过。日常听到的都是阿里云，腾讯云，国外的也就是AWS。来到AWS后，真正深入的去用云、做云、了解云服务的特性，才认识到云的博大精深。就行业现状，AWS当然是业界No.1，跟随其后就是Azure、GCP，国内以阿里云为龙头，后面就是腾讯云。虽然市场份额有差别，但我相信各个云厂商提供的服务都大同小异。在AWS两年所学甚多，这里我想从一个资深用户的角度去讲解下AWS。云的基础是计算、存储、网络，这三部分涵盖了互联网应用的各个方面，所有的云服务也是围绕这三部分去展开。把这三部分完全展开是不现实的，而且我也没有资深到了解AWS服务的各个方面。所以我打算从如何建立一个通用的互联网应用出发，去分解AWS的各个部分。AWS的服务是按照region来划分的，在部署自己的应用之前，需要选择region，比如美国有us-west, us-east regions, 中国有cn-north, cn-west regions。基本上按照服务用户所在的区域来选择region，服务中国的用户就选择中国的region，服务美国的用户就选择美国的region。否则这个网络传输的成本就非常高，而且中国区其独有的网络环境，导致其他地区的服务是无法访问的。一个region又划分为多个AZ (availability zone), 一般情况下，我们需要把服务器均匀分布在多个AZ，为了避免单点故障，也就是我们所说的灾备多活。选择好region后，就需要部署自己的VPC (virtual private cloud)，一个VPC定义了一个私有隔离的网络环境。在VPC里面，我们部署所有的计算、网络资源。计算资源就是我们的服务器，AWS最出名的就是EC2 (elastic compute cloud)。在部署EC2时，我们首先预估应用需要消耗的计算资源(cpu，磁盘，带宽等等)，选择EC2的型号和数量。然后将所有的EC2分割成多个ASG (auto scaling group), 一个ASG就相当于一个可弹性收缩的机器群，只要定义好扩容和缩容的指标，ASG就可以自己分配机器的数量。比如我们要求在EC2 CPU升到40%就扩容一倍，在CPU降到10%就缩容一倍，这样一个ASG里面机器CPU的消耗就一直均衡地保持在20%左右。具体分割成几个ASG，一般依据这个region有几个AZ来定，比如us-east-1 region有3个AZ，就分割成三个ASG，一个AZ部署一个ASG，这三个ASG在接受流量方面没有任何区别。接下来就是网络资源，每个VPC都有自己的ACL(acess control list)，一个ACL定义了inbound rules和outbound rules，分别限制了访问IP的限制和访出IP的限制。VPC的网络资源被划分成多个子网subnets，一个subnet是一组IP地址的集合，前面说到的ASG就部署在subnet里面。一般来说subnet的数据量跟region AZ的数据成正比，每个AZ分配一个public subnet和一个private subnet，那么us-east-1的VPC就会有6个subnets。我们将ASG部署在private subnet里面，只允许vpc内部的IP访问，用于保护机器资源。因为public subnet是对外的，所以我们在public subnet里面部署ELB (elastic load balancer)，用于接受vpc外的请求。有人会问如果我们想登录到ASG的EC2上面，应该怎么做？解决办法是在public subnet里面launch一个跳板机，我们先登录到跳板机，然后从跳板机里面登录到EC2上面。接下来就是存储资源，AWS提供多种选择，我们最熟悉的应该就是S3 (simple storage service)。S3是面向对象存储的服务，可以用来做数据归档、备份、恢复，或者作为数据分析、AI、Machine learning的数据湖来使用。通俗的理解就是我们的磁盘，它存储的key就是一个目录路径，相当于磁盘的目录，value是一个object，相当于文件或者子目录。在线的存储根据功能的不同也有很多选择，比较出名的而且是我用过的有三个。第一个是DynamoDB，它是document-based NoSQL DB。DynamoDB是Amazon内部使用最频繁的数据，几乎90%的存储都会选择DynamoDB，绝对地超过RDS。这个现象的原因在这篇文章Dynamo: Amazon’s Highly Available Key-value Store里面有解释，同时DDIA (Design data-intensive application) 这本书也给了同样的解释。总结就是：Amazon内部的存储90%以上都是单对多的关系，DynamoDB作为一个分布式的key-value DB，在可用性、扩展性方面非常适合这种单对多的存储结构。而且DynamoDB是最终一致性，进一步增加了它的可用性。关于DynamoDB我后面会单独出一篇介绍它的blog。而对于多对多的存储，我们就会用RDS (relational database service)，RDS字面理解就是关系型数据库。AWS RDS上面托管了多种关系型数据库，包括mysql、oracle、MS SQL、aurora、MariaDB和PostgreSQL这六种数据库，其中我使用过mysql和aurora。在Amazon内部，aurora使用的更频繁，它是兼容mysql和PostgreSQL的结合体，具体内容可见这篇文章Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases。我使用的最后一种存储是AWS elasticsearch ，当时为了实现一个搜索功能，因为涉及到模糊匹配，全文搜索，就采用了aws es。aws es跟主流的es已经分道扬镳，目前在aws上面称为openSearch service。介绍完计算、网络、存储，接下来我想从应用的角度，讲讲在实际应用中，我们应该怎样使用AWS的服务。首先说消息队列，这个广泛应用的基础功能，AWS提供了SQS和SNS。SQS是一个分布式的队列消息service，SNS是一个分布式的发布-订阅消息service。具体有人会问这两者有什么区别，这里给出了回答: What is the difference between Amazon SNS and Amazon SQS?。在我的实践中，SQS和SNS会结合起来使用，首先应用发布消息到SQS的queue里面，然后SNS消费这个queue的消息，放到自身的topic里面持久保存，然后其他的应用订阅这个topic，消费里面的消息。建设完一个应用，然后就是DevOps。AWS在托管代码、编译代码、部署应用、监控应用方面也提供了一整套服务，从前到后，codeCommit -> codeArtifact -> codeBuild -> codeDeploy -> codePipeline。这一套实现了应用的continuous integration和continuous deployment。在监控方面，AWS提供了cloudWatch，可以以应用维度收集日志，视图化监控指标。AWS还提供了一个服务叫cloudFormation，这个服务在应用迁移的过程中，非常有用。Amazon内部很多都是国际化业务，应用需要部署到各个region。cloudFormation以yaml文本的形式记录下一个应用涉及到的各个服务资源配置，放在一个template里面。迁移到不同的region时，只需要一键run coudFormation template, 就可以部署好所有的AWS资源。最近比较火的serverless function，AWS在这方面也有很深入的实践。大家最熟悉的应该是lamda，可以让用户在不提供服务器的情况下，将应用run起来。用户除了提供代码，其他的资源计算、网络、存储，devOps一概不需要关心，lambda会根据流量的大小自动扩缩容。在计费方面，lambda会根据请求消耗的计算资源来收费，当没有请求进来的时候，就不会收费。Serverless的另一个服务是stepFunction，它是一个任务状态管理服务，每一个step表示一个任务，上个step的输出是下个step的输入。使用者将输入传递给stepFunction，经历过所有step的处理后，将输出返回给使用者。step可以是lambda function，也可以是自己定义的代码。在我的实践中，stepfunction在异步计算任务中用的比较多。以上内容，我从搭建一个互联网应用的角度出发，从计算，网络，存储三个方面大概介绍了我所接触的AWS服务。当然还有很多其他的服务，比如安全，大数据，深度学习等等相关，都没有涉及，因为自己了解不多或者根本没用过。本篇文章提供的链接均真实有效，希望提供的内容对大家有所帮助。如果有错误不当之处，麻烦指出来，我会一一改进。

Security speciality and sysops have about loads of content overlapping, about 75-80% , config, iam policies, IAM roles, Integration of cloud watch agent installation on ec2 instances and installing ssm agent as well, system manager operations.

integration of security with developer: Understanding of cognito in depth, how cognito requests are logged into cloudtrail.

integration of sys ops and solutions architect: configuration with cloud watch , DynamoDB, aurora global databases, aurora server less.

Integration of Developer and Sysops: Apart from the developer tools and DynamoDB. I felt the sysops and developer had very similar content.

Written on February 1, 2023