# SRE (Site Reliability Engineering)
## Managing Machines
- Google Borg [Ver15]
  - similar to *Apache Mesos*
  - Kubernetes [Bur16] (open-source successor to Borg)
- Clos network fabric [Clos53][Sin15]
- B4 - backbone network [Jai13]
  - OpenFlow communication protocol
## Storage
- Colossus
  - successor to GFS (Google File System) [Ghe03]
  - comparable to *Lustre* or *Hadoop Distributed File System* (HDFS)
## DB
- BigTable [Cha06]
- Spanner [Cor12]
- Blobstore
## Network
- Bandwidth Enforcer (BwE) [Kum15]
- Global Software Load Balancer (GSLB)
  - 3 levels: Geo DNS, user service, RPC
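
The three GSLB levels listed above can be pictured as successive narrowing steps: first a region, then the pool for a service, then a single backend for one call. The following is only a toy sketch of that idea, not Google's implementation; the topology, the region and backend names, and the selection rules (own-region preference, fewest in-flight requests) are all assumptions made for illustration.

```python
# Toy model only: region names, services, and backends are made up.
TOPOLOGY = {
    "eu": {"maps": ["eu-maps-1", "eu-maps-2"]},
    "us": {"maps": ["us-maps-1"], "video": ["us-video-1", "us-video-2"]},
}
DEFAULT_REGION = "us"  # assumed fallback when the client's region is unknown


def pick_region(client_region):
    """Level 1 (geo DNS): answer with the client's own region if one exists."""
    return client_region if client_region in TOPOLOGY else DEFAULT_REGION


def pick_pool(region, service):
    """Level 2 (user service): find the backend pool that serves this service."""
    if service in TOPOLOGY[region]:
        return TOPOLOGY[region][service]
    # Fall back to any other region that runs the service.
    for services in TOPOLOGY.values():
        if service in services:
            return services[service]
    raise LookupError(f"no backends for service {service!r}")


def pick_backend(pool, in_flight):
    """Level 3 (RPC): choose the backend with the fewest in-flight requests."""
    return min(pool, key=lambda backend: in_flight.get(backend, 0))


def route(client_region, service, in_flight):
    """Run the three levels in order and return the chosen backend."""
    region = pick_region(client_region)
    pool = pick_pool(region, service)
    return pick_backend(pool, in_flight)
```

For example, `route("eu", "maps", {"eu-maps-1": 3, "eu-maps-2": 1})` picks the less loaded of the two hypothetical EU backends.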
## Other
- Lock Service - Chubby [Bur06]
  - similar to *ZooKeeper*
  - consistent DB
- Borgmon
  - similar to *BB GUTS*
- Stubby
  - similar to *gRPC* or *BB BAS*
  - uses *protobufs*, see also *Apache Thrift*
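
One common use of a lock service like Chubby is master election [Bur06]: whichever replica holds the lock acts as master until its lease runs out. The sketch below is a minimal in-memory stand-in that illustrates the lease idea only. It is not Chubby's real API; the class, its method names, and the lock path are hypothetical.

```python
import time


class ToyLockService:
    """In-memory stand-in for a lock service (hypothetical, not Chubby's API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so lease expiry can be simulated
        self._locks = {}     # path -> (holder, lease expiry time)

    def try_acquire(self, path, holder, lease_seconds):
        """Grant the lock if it is free, expired, or already held by `holder`."""
        now = self._clock()
        current = self._locks.get(path)
        if current is not None:
            owner, expires_at = current
            if owner != holder and expires_at > now:
                return False  # someone else holds a live lease
        self._locks[path] = (holder, now + lease_seconds)
        return True

    def holder(self, path):
        """Return the current holder, or None if the lock is free or expired."""
        current = self._locks.get(path)
        if current is None or current[1] <= self._clock():
            return None
        return current[0]
```

A replica would call `try_acquire` periodically on an agreed path (for instance a made-up `"/ls/cell/master"`) to take or renew mastership. A real service also has to handle replication, sessions, and clock skew, all of which this sketch ignores.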
## References
[Bur06]: http://research.google.com/archive/chubby-osdi06.pdf
[Bur16]: https://research.google/pubs/pub44843.pdf
[Cha06]: https://research.google/pubs/pub27898.pdf
[Clos53]: https://ia801901.us.archive.org/8/items/bstj32-2-406/bstj32-2-406_text.pdf
[Cor12]: https://research.google/pubs/pub39966.pdf
[Ghe03]: https://research.google.com/archive/gfs-sosp2003.pdf
[Jai13]: http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p3.pdf
[Kum15]: https://research.google/pubs/pub43838.pdf
[Sin15]: https://research.google.com/pubs/archive/43837.pdf
[Ver15]: https://research.google.com/pubs/archive/43438.pdf