# Google Interview
## Coding Interview 1

Input: `[['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]`

### Problem 1

Print all combinations, like:

```
a1b1c1
a1b1c2
a1b2c1
a1b2c2
a2b1c1
a2b1c2
a2b2c1
a2b2c2
a1bb1c1
a1bb1c2
a1bb2c1
a1bb2c2
```
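
A minimal sketch of one way to generate these combinations (a recursive product over the groups; the function name is my own):

```python
def combinations(groups):
    """Yield every string formed by picking one token from each group, in order."""
    if not groups:
        yield ''
        return
    first, rest = groups[0], groups[1:]
    for token in first:
        for suffix in combinations(rest):
            yield token + suffix

for s in combinations([['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]):
    print(s)
```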

### Problem 2

Given the string 'a2b2c1', determine whether it is one of the combinations above.

Solved by dynamic programming.
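
A minimal sketch of the DP, assuming the check is "can the string be split into exactly one token from each group, in order" (the function name is my own):

```python
def is_combination(s, groups):
    """reachable holds the string positions coverable using the groups seen so far."""
    reachable = {0}
    for group in groups:
        reachable = {pos + len(token)
                     for pos in reachable
                     for token in group
                     if s.startswith(token, pos)}
    return len(s) in reachable

groups = [['a1', 'a2', 'a1b'], ['b1', 'b2'], ['c1', 'c2']]
print(is_combination('a2b2c1', groups))  # True
print(is_combination('a1b2c3', groups))  # False
```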
## Coding Interview 2

### Problem 1

Given a tree (not a binary one):

```
       o
      /|\
     / | \
    o  o  o
   / \   / \
  o   o o   o
  |         |
  o         o
```

Find nodes with a similar structure; for example, all leaf nodes are similar.

My solution was to come up with a signature for each node, like

`sig(node) = <num children>,(<sig child 1>),(<sig child 2>),(<sig child 3>), ...`

then create a map `<sig> -> set of similar nodes`.

If we know that the nodes are diverse, we could actually create a hash of the signature instead of keeping the full string.
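
A minimal sketch of the signature idea (the `Node` class and the example tree are my own; children are kept in order here, so for an unordered tree you would sort the child signatures before joining):

```python
from collections import defaultdict

class Node:
    def __init__(self, children=None):
        self.children = children or []

def group_by_structure(root):
    """Map each structural signature to the list of nodes that share it."""
    groups = defaultdict(list)

    def sig(node):
        # <num children>,(<sig child 1>),(<sig child 2>),...
        child_sigs = [sig(c) for c in node.children]
        s = f"{len(node.children)}," + ",".join(f"({cs})" for cs in child_sigs)
        groups[s].append(node)
        return s

    sig(root)
    return groups

# All leaves share the signature "0," and therefore land in the same group.
root = Node([Node([Node(), Node([Node()])]), Node(), Node([Node(), Node([Node()])])])
for s, nodes in group_by_structure(root).items():
    print(s, len(nodes))
```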
### Problem 2

Represent a number in base -2, i.e. a*(-2)^4 + b*(-2)^3 + c*(-2)^2 + d*(-2)^1 + e*(-2)^0.

The place values are 1, -2, 4, -8, 16, -32, ...

For example:

- 0 = 0000
- 1 = 0001
- -1 = 0011
- 2 = 0110
- -2 = 0010
- 3 = 0111
- -3 = 1101
- ...
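
A minimal sketch of converting an integer to base -2 (just repeated division by -2 with the remainder forced into {0, 1}; not necessarily the approach the interviewer wanted):

```python
def to_negabinary(n):
    """Return the base -2 representation of n as a bit string."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:          # keep the digit in {0, 1}
            r += 2
            n += 1
        digits.append(str(r))
    return "".join(reversed(digits))

for x in (0, 1, -1, 2, -2, 3, -3):
    print(x, to_negabinary(x))   # 0, 1, 11, 110, 10, 111, 1101 (unpadded)
```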
## System Design 1

800 billion query log entries; 8 machines.

Find the top million queries by frequency.

How many unique queries? Take samples to estimate the number of unique queries; assume it shrinks by a factor of ten.

Distribute the queries evenly across the machines.

Go over the logs on each machine and create 10B key-value pairs per machine (assuming the 10x reduction).

For all machines that is ~4TB of key-value data, at an average query length of 50 bytes.

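A quick back-of-the-envelope check of those sizes (the 50-byte average and 10x reduction are the assumptions from the notes):

```python
entries = 800e9          # total query log entries
machines = 8
avg_query_bytes = 50     # assumed average query length
reduction = 10           # assumed total-to-unique reduction

raw_per_machine = entries / machines * avg_query_bytes       # 5e12 B  = 5 TB of raw logs
pairs_per_machine = entries / machines / reduction           # 10e9 key-value pairs
kv_per_machine = pairs_per_machine * avg_query_bytes         # 5e11 B  = 500 GB
kv_total = kv_per_machine * machines                         # 4e12 B  = 4 TB

print(raw_per_machine, pairs_per_machine, kv_per_machine, kv_total)
```
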
Data per machine = 500GB (produced from the 5TB of raw logs). Use a hash map; split the data into chunks of ~0.5-1TB, each producing about 50GB of counts, and repeat 5-10 times to cover all the data.

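A minimal sketch of the per-chunk aggregation step (the line-per-query input format and the output path are my own assumptions):

```python
from collections import Counter

def count_chunk(lines):
    """Aggregate one chunk of the log into query -> count using an in-memory hash map."""
    counts = Counter()
    for line in lines:
        counts[line.strip()] += 1
    return counts

def flush_chunk(counts, path):
    """Write the chunk's partial counts sorted by query, so chunks can be merged later."""
    with open(path, "w") as out:
        for query in sorted(counts):
            out.write(f"{query}\t{counts[query]}\n")
```
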
Reading from disk would be about 20h at 200MB/s, but we have 6 disks, so roughly 3.5h to read everything.

Another way to do this is to sort each chunk (by query) and merge them, producing all 500GB of data.

Idea: merge the data between machines in pairs (~1.5h to send over the wire).

Somehow sort the 500GB of data on each machine (about 10 chunks of 50GB) and produce the top 1M queries per machine, about 50MB of data. Create another hash from count -> set of queries.

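A minimal sketch of the per-machine top-M selection, assuming the aggregated (query, count) pairs can be streamed (heapq.nlargest keeps only M items in memory):

```python
import heapq

def top_m(query_count_pairs, m=1_000_000):
    """Keep the m most frequent queries from an iterable of (query, count) pairs."""
    return heapq.nlargest(m, query_count_pairs, key=lambda qc: qc[1])
```

With 1M queries at ~50 bytes each, the result is the ~50MB per machine mentioned above.
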
Do the chunks have overlapping queries? Yes, they will.

Send all the per-machine results to one machine (~400MB of data). Apply some sort of heap-based merge of the presorted arrays.

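A minimal sketch of that final merge, assuming each machine ships its list presorted by query (as in the chunk-merge step above); names are my own:

```python
import heapq
from itertools import groupby

def merge_and_top(per_machine_lists, m=1_000_000):
    """per_machine_lists: lists of (query, count), each presorted by query.

    Merge the presorted lists with a heap, sum the counts for the same query
    (it can appear in several lists), then keep the m most frequent.
    """
    merged = heapq.merge(*per_machine_lists)               # still sorted by query
    totals = ((q, sum(c for _, c in grp))
              for q, grp in groupby(merged, key=lambda qc: qc[0]))
    return heapq.nlargest(m, totals, key=lambda qc: qc[1])
```
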
Correctness of data
How big are the machines?

- 32 cores at 3 GHz
- 120 GB of RAM
- 6 x 2TB spinning disks
- 10Gbps network connection

You can store about 5TB of data on each machine.

How to do the key-value pairs? Have a hash map (query -> count).

Run time:

2M queries ≈ 100MB of data per machine (about 1GB transferred for all machines); maybe another 1GB for the second phase.

~4-5h to process this data.