interview questions facebook/google

Vahagn Khachatryan
2021-03-26 19:35:59 +00:00
parent 539e4b0077
commit 29dca37612
4 changed files with 469 additions and 0 deletions

# Google Interview
## Coding Interview 1
Input: `[['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]`
### Problem 1
Print all combinations, like:
```
a1b1c1
a1b1c2
a1b2c1
a1b2c2
a2b1c1
a2b1c2
a2b2c1
a2b2c2
a1bb1c1
a1bb1c2
a1bb2c1
a1bb2c2
```
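A minimal sketch in Python (mine, not from the interview) using itertools.product to generate the concatenations above:
```
from itertools import product

# Input groups from above: take one string from each group, in order, and concatenate.
groups = [['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]

for combo in product(*groups):
    print(''.join(combo))
```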
### Problem 2
Given a string such as 'a2b2c1', determine whether it is one of the combinations above.
Solved with dynamic programming.
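The notes only say "dynamic programming"; one possible memoized formulation (a sketch of my own, with hypothetical function names) checks whether the string can be split into exactly one option per group, in order:
```
from functools import lru_cache

groups = [['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]

def is_combination(s, groups):
    @lru_cache(maxsize=None)
    def dp(pos, group_idx):
        # Can s[pos:] be built from one option of each of groups[group_idx:]?
        if group_idx == len(groups):
            return pos == len(s)
        return any(
            s.startswith(option, pos) and dp(pos + len(option), group_idx + 1)
            for option in groups[group_idx]
        )
    return dp(0, 0)

print(is_combination('a2b2c1', groups))   # True
print(is_combination('a1bb2c2', groups))  # True: 'a1b' + 'b2' + 'c2'
print(is_combination('a3b1c1', groups))   # False
```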
## Coding Interview 2
### Problem 1
Given a tree (not a binary one):
```
      o
    / | \
   /  |  \
  o   o   o
 / \     / \
o   o   o   o
|       |
o       o
```
Find nodes with similar structure; e.g. all leaf nodes are similar.
My solution was to compute a signature for each node, like
sig(node) = <num children>,(<sig child 1>),(<sig child 2>),(<sig child 3>),...;
then build a map <sig> -> set of similar nodes.
If we know the nodes are diverse, we could actually use a hash of the signature instead.
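A sketch of the signature idea, assuming a hypothetical `Node` class with a `children` list (class and function names are my own):
```
from collections import defaultdict

class Node:
    def __init__(self, children=None):
        self.children = children or []

def group_similar(root):
    # Group nodes by structural signature:
    # sig(node) = <num children>,(<sig child 1>),(<sig child 2>),...
    groups = defaultdict(list)

    def signature(node):
        child_sigs = [signature(c) for c in node.children]
        # Sort child_sigs here instead if child order should not matter.
        sig = str(len(node.children)) + ',' + ','.join('(' + s + ')' for s in child_sigs)
        groups[sig].append(node)
        return sig

    signature(root)
    return groups  # every leaf node lands under the same signature "0,"
```
As noted above, when the nodes are diverse the signature string can be replaced by a hash of it to keep the map keys small.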
### Problem 2
Represent a number in base (-2), i.e. as a*(-2)^4 + b*(-2)^3 + c*(-2)^2 + d*(-2)^1 + e*(-2)^0, with digits in {0, 1}.
Place values: 1, -2, 4, -8, 16, -32, ...
 0 = 0000
 1 = 0001
-1 = 0011
 2 = 0110
-2 = 0010
 3 = 0111
-3 = 1101
...
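A small conversion sketch (repeated division by -2, folding negative remainders back into {0, 1}); the function name is mine:
```
def to_negabinary(n):
    # Return the base (-2) representation of an integer, most significant digit first.
    if n == 0:
        return '0'
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:        # keep each digit in {0, 1}
            n += 1
            r += 2
        digits.append(str(r))
    return ''.join(reversed(digits))

print(to_negabinary(-3))  # 1101
print(to_negabinary(3))   # 111 (same as 0111 above)
```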
## System Design 1
800 billion query log entries, 8 machines.
Find the top million queries by frequency.
Unique queries? Take samples to estimate the number of unique queries; assume the data shrinks by a factor of ten.
Distribute the queries evenly across the machines.
Go over the logs on each machine and build key-value (query -> count) pairs: ~100B entries per machine reduce to ~10B pairs (assuming the 10x reduction).
With an average query length of 50 bytes that is ~500 GB of data per machine (produced from ~5 TB of raw log), or ~4 TB for all machines.
(Use a hash map and split the data into chunks of ~0.5-1 TB;
each chunk produces about 50 GB of aggregated counts, so doing this 5-10 times covers all the data.)
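A rough sketch of the per-chunk hash-map aggregation; the log format, chunking, and in-memory return values are assumptions, since the notes leave the I/O details open:
```
from collections import Counter

def count_chunk(lines):
    # One chunk is a slice of the machine's log that fits the memory budget
    # (~0.5-1 TB on disk shrinking to ~50 GB of (query, count) pairs).
    counts = Counter()
    for line in lines:
        query = line.strip()          # assume one query per log line
        counts[query] += 1
    return counts                     # in practice, written back to disk

def count_machine(chunks):
    # Chunks may repeat the same query; the partial counts are merged later.
    return [count_chunk(chunk) for chunk in chunks]
```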
Reading from disk would take ~20 h at 200 MB/s, but with 6 disks per machine that drops to ~3.5 h to read everything.
Another way is to sort each chunk by query and merge the sorted chunks, producing the full 500 GB of aggregated data.
Idea: merge the data between machines in pairs (~1.5 h to send it over the wire).
Somehow sort the 500 GB on each machine and produce the top 1M queries per machine (~50 MB of data).
(That is about 10 chunks of 50 GB; build another map count -> set of queries.)
Do the chunks have overlapping queries? Yes, they will.
Send all the per-machine results to one machine (~400 MB of data).
Apply some sort of heap sort and merge the presorted arrays.
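A sketch of that final merge, assuming each machine ships its local top candidates as a list of (count, query) pairs presorted by count in descending order, and that the earlier redistribution hashed queries to machines so each query's count is complete on one machine (both are my assumptions):
```
import heapq
from itertools import islice

def global_top_k(per_machine_results, k=1_000_000):
    # per_machine_results: one presorted (descending by count) list per machine.
    # If each query lives on exactly one machine, a k-way merge of the presorted
    # lists yields the global top k directly.
    merged = heapq.merge(*per_machine_results, key=lambda cq: cq[0], reverse=True)
    return list(islice(merged, k))
```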
Correctness of the data?
How big are the machines?
32 cores at 3 GHz
120 GB of RAM
6 x 2 TB spinning disks each
10 Gbps network connection
You can store ~5 TB on each machine.
How to build the key-value pairs? Use a hash map.
Run Time
2M queries ≈ 100 MB of data per machine (~1 GB transferred for all machines); maybe another 1 GB for the second phase.
~4-5 h to process this data.