SRE (Site Reliability Engineering)

Managing Machines

  • Google Borg Ver15
    • similar to Apache Mesos
    • Kubernetes Bur16 (open-source successor to Borg)
  • Clos network fabric Clos53
  • B4 - backbone network Jai13
  • OpenFlow communication protocol

Storage

  • Colossus
    • successor of GFS (Google File System) [Gheo03]
    • comparable to Lustre or Hadoop Distributed File System (HDFS)

DB

Network

  • Bandwidth Enforcer (BwE) Kum15
  • Global Software Load Balancer (GSLB)
    • load balancing at 3 levels: geographic (DNS), user service, RPC
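The following toy sketch shows how a request could pass through the three GSLB levels. It is not how GSLB is implemented. The selection policies are assumptions chosen for illustration: the geographic DNS level prefers a healthy cluster in the user's region, the user service level picks the least-loaded frontend, and the RPC level round-robins over backend replicas. All cluster, frontend, and backend names are hypothetical.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical model of a serving cluster; not a real GSLB data type.
    name: str
    region: str
    frontends: list[str]
    backends: list[str]
    healthy: bool = True


def dns_level(clusters: list[Cluster], user_region: str) -> Cluster:
    """Level 1 (geographic DNS): prefer a healthy cluster in the user's
    region, otherwise fall back to any healthy cluster (assumed policy)."""
    healthy = [c for c in clusters if c.healthy]
    if not healthy:
        raise RuntimeError("no healthy cluster")
    local = [c for c in healthy if c.region == user_region]
    return (local or healthy)[0]


def service_level(cluster: Cluster, load: dict[str, int]) -> str:
    """Level 2 (user service): pick the frontend with the lowest
    reported load inside the chosen cluster (assumed policy)."""
    return min(cluster.frontends, key=lambda fe: load.get(fe, 0))


class RpcBalancer:
    """Level 3 (RPC): spread RPCs over backend replicas round-robin
    (assumed policy)."""

    def __init__(self, backends: list[str]) -> None:
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)
```

Usage: with clusters `us-east-a` (region `us`) and `eu-west-b` (region `eu`), a user in `eu` is sent to `eu-west-b`. If that cluster is marked unhealthy, the DNS level falls back to `us-east-a`. Each level only decides among the candidates the level above handed it, which is what makes the scheme hierarchical.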

Other

  • Lock Service - Chubby Bur06
    • similar to Zookeeper
    • strongly consistent storage for small amounts of data (Paxos-based)
  • Borgmon
    • similar to BB GUTS
  • Stubby
    • similar to gRPC or BB BAS
    • uses Protocol Buffers (protobufs); see also Apache Thrift
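A common use of a coarse-grained lock service such as Chubby is leader election: replicas race for one lock, and the holder acts as leader while it keeps renewing its lease. The following minimal in-memory sketch shows only the lease semantics. The class and method names are hypothetical and are not Chubby's API. It also makes simplifying assumptions: time is passed in explicitly, and there is no replication. A real service keeps this state consistent across replicas with Paxos.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class ToyLockService:
    """Hypothetical single-process lock service with leases.
    Not replicated: a real lock service would agree on this state via Paxos."""

    def __init__(self) -> None:
        self._locks: dict[str, Lease] = {}

    def try_acquire(self, name: str, client: str, now: float, ttl: float) -> bool:
        """Grant (or renew) the lock if it is free, expired, or already
        held by this client; otherwise refuse."""
        lease = self._locks.get(name)
        if lease is None or lease.expires_at <= now or lease.holder == client:
            self._locks[name] = Lease(client, now + ttl)
            return True
        return False

    def holder(self, name: str, now: float) -> Optional[str]:
        """Return the current lock holder, or None if the lease has lapsed."""
        lease = self._locks.get(name)
        if lease is None or lease.expires_at <= now:
            return None
        return lease.holder
```

Usage: if replica `a` acquires a hypothetical `leader` lock at time 0 with a 10-second lease, replica `b` is refused at time 5. If `a` renews at time 8, its lease runs until 18, and `b` can only take over once `a` stops renewing and the lease lapses.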

Reference