interview questions facebook/google

Vahagn Khachatryan
2021-03-26 19:35:59 +00:00
parent 539e4b0077
commit 29dca37612
4 changed files with 469 additions and 0 deletions

# Google Interview
## Coding Interview 1
Input: `[['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]`
### Problem 1
Print all combinations, like:
```
a1b1c1
a1b1c2
a1b2c1
a1b2c2
a2b1c1
a2b1c2
a2b2c1
a2b2c2
a1bb1c1
a1bb1c2
a1bb2c1
a1bb2c2
```
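A minimal sketch in Python (mine, not from the interview) using itertools.product to generate the concatenations above:
```
from itertools import product

# Input groups from above: take one string from each group, in order, and concatenate.
groups = [['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]

for combo in product(*groups):
    print(''.join(combo))
```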
### Problem 2
Given a string such as 'a2b2c1', determine whether it is one of the combinations above.
Solved with dynamic programming.
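The notes only say "dynamic programming"; one possible memoized formulation (a sketch of my own, with hypothetical function names) checks whether the string can be split into exactly one option per group, in order:
```
from functools import lru_cache

groups = [['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]

def is_combination(s, groups):
    @lru_cache(maxsize=None)
    def dp(pos, group_idx):
        # Can s[pos:] be built from one option of each of groups[group_idx:]?
        if group_idx == len(groups):
            return pos == len(s)
        return any(
            s.startswith(option, pos) and dp(pos + len(option), group_idx + 1)
            for option in groups[group_idx]
        )
    return dp(0, 0)

print(is_combination('a2b2c1', groups))   # True
print(is_combination('a1bb2c2', groups))  # True: 'a1b' + 'b2' + 'c2'
print(is_combination('a3b1c1', groups))   # False
```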
## Coding Interview 2
### Problem 1
Given a tree (not a binary one):
```
      o
    / | \
   /  |  \
  o   o   o
 / \     / \
o   o   o   o
|       |
o       o
```
Find nodes with similar structure; e.g. all leaf nodes are similar.
My solution was to compute a signature for each node, like
sig(node) = <num children>,(<sig child 1>),(<sig child 2>),(<sig child 3>),...;
then build a map <sig> -> set of similar nodes.
If we know the nodes are diverse, we could actually use a hash of the signature instead.
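A sketch of the signature idea, assuming a hypothetical `Node` class with a `children` list (class and function names are my own):
```
from collections import defaultdict

class Node:
    def __init__(self, children=None):
        self.children = children or []

def group_similar(root):
    # Group nodes by structural signature:
    # sig(node) = <num children>,(<sig child 1>),(<sig child 2>),...
    groups = defaultdict(list)

    def signature(node):
        child_sigs = [signature(c) for c in node.children]
        # Sort child_sigs here instead if child order should not matter.
        sig = str(len(node.children)) + ',' + ','.join('(' + s + ')' for s in child_sigs)
        groups[sig].append(node)
        return sig

    signature(root)
    return groups  # every leaf node lands under the same signature "0,"
```
As noted above, when the nodes are diverse the signature string can be replaced by a hash of it to keep the map keys small.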
### Problem 2
Represent a number in base (-2), i.e. as a*(-2)^4 + b*(-2)^3 + c*(-2)^2 + d*(-2)^1 + e*(-2)^0, with digits in {0, 1}.
Place values: 1, -2, 4, -8, 16, -32, ...
 0 = 0000
 1 = 0001
-1 = 0011
 2 = 0110
-2 = 0010
 3 = 0111
-3 = 1101
...
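A small conversion sketch (repeated division by -2, folding negative remainders back into {0, 1}); the function name is mine:
```
def to_negabinary(n):
    # Return the base (-2) representation of an integer, most significant digit first.
    if n == 0:
        return '0'
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:        # keep each digit in {0, 1}
            n += 1
            r += 2
        digits.append(str(r))
    return ''.join(reversed(digits))

print(to_negabinary(-3))  # 1101
print(to_negabinary(3))   # 111 (same as 0111 above)
```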
## System Design 1
800 billion query log entries, 8 machines.
Find the top million queries by frequency.
Unique queries? Take samples to estimate the number of unique queries; assume the data shrinks by a factor of ten.
Distribute the queries evenly across the machines.
Go over the logs on each machine and build key-value (query -> count) pairs: ~100B entries per machine reduce to ~10B pairs (assuming the 10x reduction).
With an average query length of 50 bytes that is ~500 GB of data per machine (produced from ~5 TB of raw log), or ~4 TB for all machines.
(Use a hash map and split the data into chunks of ~0.5-1 TB;
each chunk produces about 50 GB of aggregated counts, so doing this 5-10 times covers all the data.)
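A rough sketch of the per-chunk hash-map aggregation; the log format, chunking, and in-memory return values are assumptions, since the notes leave the I/O details open:
```
from collections import Counter

def count_chunk(lines):
    # One chunk is a slice of the machine's log that fits the memory budget
    # (~0.5-1 TB on disk shrinking to ~50 GB of (query, count) pairs).
    counts = Counter()
    for line in lines:
        query = line.strip()          # assume one query per log line
        counts[query] += 1
    return counts                     # in practice, written back to disk

def count_machine(chunks):
    # Chunks may repeat the same query; the partial counts are merged later.
    return [count_chunk(chunk) for chunk in chunks]
```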
Reading from disk would take ~20 h at 200 MB/s, but with 6 disks per machine that drops to ~3.5 h to read everything.
Another way is to sort each chunk by query and merge the sorted chunks, producing the full 500 GB of aggregated data.
Idea: merge the data between machines in pairs (~1.5 h to send it over the wire).
Somehow sort the 500 GB on each machine and produce the top 1M queries per machine (~50 MB of data).
(That is about 10 chunks of 50 GB; build another map count -> set of queries.)
Do the chunks have overlapping queries? Yes, they will.
Send all the per-machine results to one machine (~400 MB of data).
Apply some sort of heap sort and merge the presorted arrays.
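A sketch of that final merge, assuming each machine ships its local top candidates as a list of (count, query) pairs presorted by count in descending order, and that the earlier redistribution hashed queries to machines so each query's count is complete on one machine (both are my assumptions):
```
import heapq
from itertools import islice

def global_top_k(per_machine_results, k=1_000_000):
    # per_machine_results: one presorted (descending by count) list per machine.
    # If each query lives on exactly one machine, a k-way merge of the presorted
    # lists yields the global top k directly.
    merged = heapq.merge(*per_machine_results, key=lambda cq: cq[0], reverse=True)
    return list(islice(merged, k))
```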
Correctness of the data?
How big are the machines?
32 cores at 3 GHz
120 GB of RAM
6 x 2 TB spinning disks each
10 Gbps network connection
You can store ~5 TB on each machine.
How to build the key-value pairs? Use a hash map.
Run Time
2M queries ≈ 100 MB of data per machine (~1 GB transferred for all machines); maybe another 1 GB for the second phase.
~4-5 h to process this data.