notes/Distributed_Systems/site_reliability_engineering.md
2020-12-30 23:56:38 +00:00

SRE (Site Reliability Engineering)

Managing Machines

  • Google Borg [Ver15]
    • similar to Apache Mesos
    • Kubernetes [Bur16] (open-source successor to Borg)
  • Clos network fabric [Clos53][Sin15]
  • B4 - backbone network [Jai13]
  • OpenFlow communication protocol

Storage

  • Colossus
    • successor to GFS (Google File System) [Ghe03]
    • comparable to Lustre or Hadoop Distributed File System (HDFS)

DB

  • BigTable [Cha06]
  • Spanner [Cor12]
  • Blobstore

Network

  • Bandwidth Enforcer (BwE) [Kum15]
  • Global Software Load Balancer (GSLB)
    • 3 levels of load balancing: geographic (DNS requests), user service, RPC
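
The three GSLB levels can be pictured as three successive choices. The following is only a toy sketch under stated assumptions, not how GSLB is implemented: the topology, the names (`TOPOLOGY`, `pick_region`, `pick_cluster`, `pick_backend`, `route`) and the selection policies (own region first, most spare capacity, fewest active RPCs) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Backend:
    name: str
    active_rpcs: int = 0


@dataclass
class Cluster:
    name: str
    capacity_qps: int
    current_qps: int
    backends: List[Backend] = field(default_factory=list)


# Hypothetical topology: region -> clusters that serve one user service.
TOPOLOGY: Dict[str, List[Cluster]] = {
    "eu": [
        Cluster("eu-a", 100, 95, [Backend("eu-a-1", 4), Backend("eu-a-2", 1)]),
        Cluster("eu-b", 100, 40, [Backend("eu-b-1", 7), Backend("eu-b-2", 2)]),
    ],
    "us": [
        Cluster("us-a", 100, 10, [Backend("us-a-1", 0)]),
    ],
}

DEFAULT_REGION = "us"  # assumed fallback for clients from unserved regions


def pick_region(client_region: str) -> str:
    """Level 1 (geographic, DNS): use the client's region if it is served."""
    return client_region if client_region in TOPOLOGY else DEFAULT_REGION


def pick_cluster(region: str) -> Cluster:
    """Level 2 (user service): pick the cluster with the most spare capacity."""
    return max(TOPOLOGY[region], key=lambda c: c.capacity_qps - c.current_qps)


def pick_backend(cluster: Cluster) -> Backend:
    """Level 3 (RPC): pick the backend with the fewest active RPCs."""
    return min(cluster.backends, key=lambda b: b.active_rpcs)


def route(client_region: str) -> str:
    """Run the three levels in order and return the chosen backend name."""
    region = pick_region(client_region)
    cluster = pick_cluster(region)
    return pick_backend(cluster).name


if __name__ == "__main__":
    print(route("eu"))
    print(route("asia"))
```

Each level only narrows the choice made by the previous one, so the policies can be swapped independently (for example, replacing the level 3 rule with round robin) without touching the other two.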

Other

  • Lock Service - Chubby [Bur06]
    • similar to Zookeeper
    • consistent DB
  • Borgmon
    • similar to BB GUTS
  • Stubby
    • similar to gRPC or BB BAS
    • uses protobufs, check also Apache Thrift

Dev Environment

[Mor12b] and [Pot16]

References

[Bur06]: http://research.google.com/archive/chubby-osdi06.pdf
[Bur16]: https://research.google/pubs/pub44843.pdf
[Cha06]: https://research.google/pubs/pub27898.pdf
[Clos53]: https://ia801901.us.archive.org/8/items/bstj32-2-406/bstj32-2-406_text.pdf
[Cor12]: https://research.google/pubs/pub39966.pdf
[Ghe03]: https://research.google.com/archive/gfs-sosp2003.pdf
[Jai13]: http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p3.pdf
[Kum15]: https://research.google/pubs/pub43838.pdf
[Mor12b]: https://research.google/pubs/pub37755.pdf
[Pot16]: https://www.youtube.com/watch?v=W71BTkUbdqE&feature=emb_logo
[Sin15]: https://research.google.com/pubs/archive/43837.pdf
[Ver15]: https://research.google.com/pubs/archive/43438.pdf