openai o1
reward engineering
k8s up and runngin
分布式系统在互联网时代,尤其是大数据时代到来之后,成为了每个程序员的必备技能之一。分布式系统从上个世纪80年代就开始有了不少出色的研究和论文,我在这里只列举最近15年范围以内我觉得有重大影响意义的15篇论文(15 within 15)。
k8s up and running
k8s labs
lab 3
k8s tips
–dry-run: By default as soon as the command is run, the resource will be created. If you simply want to test your command , use the –dry-run=client option. This will not create the resource, instead, tell you whether the resource can be created and if your command is right.
Security Reflection
Security TODO: Pre-Meetup Questions from CSE 291
KVM and QEMU
TODO: please write a 300-word paper review report that summarizes what you learned from this paper, what impresses you the most, what the technical flaws are, and how it compares to other systems you knew before reading the paper.
Para-Virtualization Reflection
TODO: Pre-Meetup Questions from CSE 291
Unikernels
Happy Memorial Day weekend! ** reflection
Firecracker
Amazon Firecracker
Serverless Computing 2
TODO: Pre-Meetup Questions from CSE 291
paper mashup pre-spark
idea on big data rdbms是为富矿设计的,比如ERP,financial,ACID是基本要求 互联网的数据比如clickstream,privacy每条不值钱。一定要大量合起来分析才有价值。
Serverless Computing
TODO: Pre-Meetup Questions from CSE 291 Q: Today’s serverless functions are stateless. How do you think different functions can share data and communicate? A: lambda is stateless, so the state must be some other place, such as DB, Shared File/object, MessageQueue/event stream or HTTP API call chain.
k8s and gVisor
TODO: Pre-Meetup Questions from CSE 291
container meetup class 5
Pre-Meetup Questions from CSE 291
how gpu rendering works
scenegraph:
how gpu rendering works
B6(g%G%)RfhWhvP
note on architect design db
池化技术 create db connection take time. 定期检测连pool connection is still good. 它们的创建过程都比较耗时,也比较消耗系统资源 。所以,我们把它们放在一个池子里统一管理起来,以达到 提升性能和资源复用的目的
分布式系统经典论文
https://www.zhihu.com/question/30026369
note on architect design
高并发系统设计的三大目标:高性能、高可用、可扩展
reflection on the virtualization io
Paper Reflection: Q: Is virtio a full virtualization or a paravirtualization technique? What’s its main benefit? A: It is paravirtualization because the hosted OS was modified to talk to the hypervisor. Main benefit: simplify and offload virtualization workload to the guest OS. problem space from (n*m) => (n+m)
reflection on the x86 virtualization memory
Paper Reflection:
note on "Notes on GitLab Postgres Schema Design"
ref: https://shekhargulati.com/2022/07/08/my-notes-on-gitlabs-postgres-schema-design/
readyset or Partial State in Dataflow-Based Materialized Views
Although this query uses the relational model cleanly to fetch the data you require for your application, this is not a query you can run in a relational database. It’s just too expensive. You’d have to do something else to figure out the star count for your repositories. Maybe de-normalize the data. Migrate your database to keep a star count in the repository table. Or maybe keep the count in a Redis instance and update your application so that starring a repository writes to your database and to Redis.
reflection on the x86 virtualization, sw vs hw
from PPT:
调度和正义
RAM资源是有限的,必然有些应用程序需要委屈下,问题在于,到底委屈哪个? 以个体来讲,公平公正,程序正义是非常重要,且带来完全感的promise。 但是,公平不是唯一的尺度,比如swap是一个非常贵的操作。以总体来讲,要trade off:降低发生swap vs 公平公正
relfection on virtualization background
OSTEP
Chip ML design
ch1 Overview
when to use ML (1) lean (2) complex patterns with (3) existing data to (4) make predictions (5) on unseen data
overview of design
Transacation:
dp 套路
时间序列类: 套路I: 状态与前一个时间序列的状态有关,可以是一维或者二维DP,第一维表示序列,第二维表示状态数量。 LC 198. House Robber LC 213. House Robber II LC 123. Best Time to Buy and Sell Stock III LC 309. Best Time to Buy and Sell Stock with Cooldown LC 376. Wiggle Subsequence LC 276. Paint Fence LC 487. Max Consecutive Ones II LC 1186. Maximum Subarray Sum with One Deletion 903.Valid Permutations for DI Sequence
Spark
尝试回到这个spark问题集。
AWS network
VPC: a gated parking buiding. IGW: entrance in and out to internet subnet: level in parking lot route table: route signs DNS resolver: parking lot staff can give direction when you are lost, or you want to go another subnet.
线程 vs 协程?
- 线程由中断和特定调用分出/让出时间片和触发上下文切换。切分的是对CPU的占有
- 协程则由 yield/await 等语言结构或特定调用分出 continuations 并切换上下文。切分的是控制流。
python tmd 装哪里了?
Python 是如何寻找包的
why the mmu
2023-08-30-why-the-design-misc.md
why the design distributed system
- Worked on maintaining and adding features to an enterprise-wide machine learning platform that is used by various other teams at Capital One to streamline machine learning processes
- Built an internal API that provisioned and created Kubernetes applications for ML jobs. This was done to move away from using AWS EMR to reduce deployment time from 20 minutes to less than 1 minute.
- Migrated the API component of our machine learning platform to a newer AWS account to prevent our subnet running out of IP addresses and reducing clutter
why the design distributed system
three-tier front, app, db.
why the design os
why the design db
为什么 Redis 选择单线程模型
- IO multiplexing这么香, 用select监控多个连接。
- mutil-thread 的data racing 太麻烦了
- web server bottleneck 是IO不是CPU
- 新版本有些可以多线程来做,UNLINK、FLUSHALL ASYNC 和 FLUSHDB ASYNC. 特点是没有data race。
why the design network
newwork
- why 3 handshake
data engineering overview
Row based vs Column based
airflow
issue when using airflow
UEFI and BIOS or Bootloader
Linux –UEFI– BIOS –register– Hardware
design a chat system
https://twitter.com/yanggege007/status/1627919590994092033
notes on gRPC
IDL: interface defined lang auto generated server skeleton/ client stub
notes on AWS
IAM policy
notes on Amazon Aurora
http://muratbuffalo.blogspot.com/2022/03/amazon-aurora-design-considerations-and.html
notes on CockroachDB
http://muratbuffalo.blogspot.com/2022/03/cockroachdb-resilient-geo-distributed.html
notes on Distributed Systems in One Lesson by Tim Berglund
storage single master storage (-consistency) =>read replica - (data schema,relation join)=> sharding
notes on LSM tree paper
compaction
notes on CockroachDB
http://muratbuffalo.blogspot.com/2022/03/cockroachdb-resilient-geo-distributed.html
notes on HOW DISCORD STORES BILLIONS OF MESSAGES
https://discord.com/blog/how-discord-stores-billions-of-messages
k8s 学习杂记
- note from 白话Kubernetes入门实践
c++ 常用算法
// <algorithm>
pos = find(iter.begin(), iter.end(), value) -vec.begin(); // 找元素
pos = search(content.begin(), content.end(), sub.begin(), sub.end()) - content.begin(); // 找子串
fill(iter.begin(), iter.end(), value); // 填充
reverse(iter.begin(), iter.end()); // 反转元素
rotate(iter.begin(), iter.begin() + cnt, iter.end()); // 轮转 cnt 位:1 2 3 => 2 3 1 (1 位)
unique(iter.begin(), iter.end()); // 连续元素去重
bool fun(value){}
partition(iter.begin(), iter.end(), fun); // 分组(轴点),快排基础,返回值为第二组开始点
partition_point(iter.begin(), iter.end(), fun); // 返回轴点位置
sort(iter.begin(), iter.end()); // 排序,从小到大
sort(iter.being(), iter.end(), bool_fun); // 排序,自定义,true => 在前
partial_sort(iter.begin(), iter.begin() + cnt, iter.end()); // 部分排序,全部的前 n 小排序好,如 0 1 2 7 8 6 5 9 4 3
nth_element(iter.begin(), iter.begin() + nth, iter.end()); // 将有序的第 n 个放到对应位置,且前面都小于等于,后面都大于等于,但不保证有序
lower_bound(iter.begin(), iter.end(), value); // 不小于 value 的第一个值位置
upper_bound(iter.begin(), iter.end(), value); // 大于 value 的第一个位置
binary_search(iter.begin(), iter.end(), value); // 二叉搜索元素是否存在
merge(iter1.begin(), iter1.end(), iter2.begin(), iter2.end(), dest.begin()); // 合并有序的两个迭代器
inplace_merge(iter.begin(), iter.begin() + mid, iter.end()); // 将前后有序的两部分合并
includes set_difference set_intersection set_symmertric_difference set_union; // 集合用函数
is_heap(iter.begin(), iter.end()); // 判断是否是大根堆
make_heap(iter.begin(), iter.end()); // 大根堆化
push_heap(iter.begin(), iter.end()); // 将末尾元素放入前 n - 1 个元素构成的堆
pop_heap(iter.begin(), iter.end()); // 将堆顶放到末尾
sort_heap(iter.begin(), iter.end()); // 将大根堆转为升序排列
max_element(iter.begin(), iter.end()); // argmax
min_element(iter.begin(), iter.end()); // argmin
minmax_element(iter.begin(), iter.end()); // 同时返回 min max 的位置
clamp(value, lo, hi); // C++17 限制在 [lo, hi] 内
next_permutation(iter.begin(), iter.end()); // 返回下一个排序,“排序”间以字典序比较大小,字典序最大时返回 false
prev_permutation(iter.begin(), iter.end()); // 与前者相反,返回字典序更小的下一个
accumulate(iter.begin(), iter.end(), init, binary_op); // 对迭代器做聚合操作,默认 add => sum
partial_sum(iter.begin(), iter.end(), dest.begin()); // 求前缀和,可以类似前者换函数做其他聚合操作
raft vs kafka
map reduce
Word Count
Using map reduce to count word frequency
sort + pq 贪心解决问题
253. Meeting Rooms II 用pq思路
sort by begin, pq sort by end TODO https://leetcode.com/problems/meeting-rooms-ii/solution/
class Solution {
public:
int minMeetingRooms(vector<vector<int>>& intervals) {
sort(intervals.begin(), intervals.end());
priority_queue<int, vector<int>, std::greater<int>> pq;//min
pq.push(intervals[0][1]);
for (int i = 1; i < intervals.size(); i++) {
if (pq.top() <= intervals[i][0]){
pq.pop();
}
pq.push(intervals[i][1]);
}
return pq.size();
}
};
妙用BST Sweep time line
Sweep time line
和刚刚的diff问题一样,只是问题颗粒不在是每个index, 而是index of interest, 在输入sparse的情况下避免了一秒一秒的数,而是event发生了再处理, 利用了bst的里面的key也是sorted的特性
Dp Roadmap
时间序列类: 套路I: 状态与前一个时间序列的状态有关,可以是一维或者二维DP,第一维表示序列,第二维表示状态数量。 LC 198. House Robber LC 213. House Robber II LC 123. Best Time to Buy and Sell Stock III LC 309. Best Time to Buy and Sell Stock with Cooldown LC 376. Wiggle Subsequence LC 276. Paint Fence LC 487. Max Consecutive Ones II LC 1186. Maximum Subarray Sum with One Deletion 903.Valid Permutations for DI Sequence
diff 差分思维
这一系列的思路就是只记录init的值,之后全部记录变化量(diff).
Binary Index Tree
Prefix Sum
prefix Sum 用于cache了加法结果,使得sum of range立刻可以得到结果, 注意前面要加0, 值得prefix存的是[)
的方式,
每次query要prefix[right+1] - prefix[left]
leetcode contest biweekly 68, weekly 273
2115. Find All Possible Recipes from Given Supplies
it is basically topo sort, remember from->to when building the graph.
class Solution {
public:
vector<string> findAllRecipes(vector<string>& recipes, vector<vector<string>>& ingredients, vector<string>& supplies) {
unordered_map<string, vector<string>> graph;
unordered_map<string, int> indegree;
for (int i = 0; i < recipes.size(); i++) {
string to = recipes[i];
for (auto& from: ingredients[i]) {
graph[from].push_back(to); //<<--
indegree[to]++;
}
}
queue<string> q;
for (string& seed: supplies) {
q.push(seed);
}
unordered_set<string> need(recipes.begin(), recipes.end());
vector<string> order;
while (!q.empty()) {
string cur = q.front(); q.pop();
if (need.count(cur)){
order.push_back(cur);
}
for (string to: graph[cur]) {
indegree[to]--;
if (indegree[to] == 0) {
q.push(to);
}
}
}
return order;
}
};
12-25 leetcode retro
394. Decode String
- keep two stacks, one for int, one for all
- accumulate int/string
[
=> push k, decoded to stack]
=> pop k, decoded, thenwip string = decode+ str*k
Basic Calculator
class Solution {
public:
int calculate(string s) {
queue<char> q;
for (char c: s) {
if (c != ' ') q.push(c);
}
return eval(q);
}
int eval(queue<char>& q) {
stack<int> stk;
int num = 0;
char op = '+';
while(!q.empty()) {
char c = q.front();q.pop();
if (c == ' ') continue;
if (isdigit(c)) {
num = num*10 + (c - '0');
}
if (c == '(') {
num = eval(q);
}
// - 12 +
// op num i
//because we see another op
// stk.top() op num
//then we know what's num, and what's op
if (!isdigit(c) || q.empty()){
switch(op) {
case '+': stk.push(num); break;
case '-': stk.push(-num); break;
case '*': stk.top() *= num; break;
case '/': stk.top() /= num; break;
}
op = c;
num = 0;
}
if (c == ')') {
break;
}
}
int ans = 0;
while (!stk.empty()) {
ans += stk.top(); stk.pop();
}
return ans;
}
};
day 11 on stack and queue design
232. Implement Queue using Stacks
leetcode retro
##
leetcode daily review 12-18
416. Partition Equal Subset Sum
leetcode retro
##
leetcode retro
##
leetcode retro
##
leetcode retro
##
leetcode retro
##
leetcode retro
##
leetcode sliding window
643. Maximum Average Subarray I
leetcode binary search
704. Binary Search
leetcode retro
leetcode 12-04 retro
797. All Paths From Source to Target
basic dfs to find the end node note the backtracking won’t include the root node. so either add it first in the ans, move the mark/unmark to the scope of dfs
How to Solve it
How to solve it?
leetcode 01-01 weekly retro
973. K Closest Points to Origin
C++ and Singleton
在单线程模式下,最简单的办法就是只能通过一个method获取instance,在第一次access的时候建立instance
rabbit hole of rpath and origin
a little background on dynamic linking
the job of dynamic linker and loader(ld.so) is to resolve the executable’s dependencies on shared libraries and to load the required ones at run-time. readelf .bin to find the NEDDED headers and search the DSO with a matching SONAME
大话 design pattern
What’s design pattern?
a design pattern is a general reusable solution to a commonly occurring problem within a given context in software design
go and coroutine
coroutine 保存register然后跳转到另外一个function
How to DIY a GUI
how to render point, link and surface (QPainterDevice)
note on note on distribute system
TL;DR There are fundamential differences between local computing and distribute computing, those differences cannot be ignored, or project are doomed to failure
MVC in a context
Everyone know MVC, but not everyone agress with the seperation of the concerns.
How to make a DB? [Draft]
It will built in C as most the db in the world, and using b-tree or b+ tree to store. the keypoint here are:
How to make a debugger
When we debug, we want to forgot about all security issue, and just want to peek the state of running applicaion. well, first thing not to do is to put debugger and debuggie to same memory space:) so they have to in different process.
How to make a chat app
XMPP
tldr It is a IM protocol that use XML, it is decentialized, and XML itself contain rounting info, and it is the server that doing all the heavy work :P
Linux Graphic Stack and Wayland
3D rendering with OpenGL
- A library, “Mesa”, implements the OpenGL API. It uses card-specific drivers to translate the API into a hardware-specific form. If the driver uses Gallium internally, there’s a shared component that turns the OpenGL API into a common intermediate representation, TGSI. The API is passed through Gallium, and all the device-specific driver does is translate from TGSI into hardware commands,
- libdrm uses special secret card-specific ioctls to talk to the Linux kernel
- The Linux kernel, having special permissions, can allocate memory on and for the card.
- Back out at the Mesa level, Mesa uses DRI2 to talk to Xorg to make sure that buffer flips and window positions, etc. are synchronized.
Webkit revisited
Read MoreRandom note for Linux
- Describe direct memory access (DMA). Can a user level buffer / pointer be used by kernel or drivers?
Linker and Loader
overview
Web Scraping
or How to pull data from a website without API