구글 인터뷰 대학은 웹 개발자(컴퓨터공학 학위 없이 독학한)에서 구글의 소프트웨어 엔지니어가 되기 위한 나의 몇 달 간의 공부 계획이다.
이 기나긴 리스트는 구글 코칭 노트에서 선별되고 확장된 것으로 여러분이 알아야 할 내용이다. 맨 아래에는 인터뷰에 등장하거나 문제를 푸는 데에 도움이 될 만한 추가적인 내용이 있다. 많은 내용이 Steve Yegge의 "Get that job at Google"이라는 책에서 나왔으며, 때때로 구글 코칭 노트의 내용을 그대로 담고있기도 하다.
나는 Yegge의 추천으로부터 여러분이 알아야만 할 내용들을 추려내었다. 구글과의 연락으로 얻은 정보를 바탕으로 그의 추천내용을 수정하였다. 이 리스트는 신입 소프트웨어 엔지니어, 혹은 소프트웨어/웹 개발에서 소프트웨어 엔지니어링(컴퓨터과학 지식이 필요한)으로 전환하는 사람들을 위한 것이다
만약 당신이 여러 해의 소프트웨어 엔지니어링 경력이 있다면, 더 어려운 인터뷰가 예상된다. 더 보기.
만약 당신이 여러 해의 소프트웨어/웹 개발 경험을 가지고 있다면, 구글은 소프트웨어 엔지니어링을 소프트웨어/웹 개발과 다르게 바라보고 있으며 컴퓨터과학 지식을 요구한다는 사실에 주목하도록 하자.
신뢰할만한 엔지니어, 혹은 시스템 엔지니어가 되고 싶다면 선택적 주제 목록(네트워크, 보안 등)을 더 공부하도록 하자.
나는 구글 인터뷰를 준비하기 위해 이 계획을 따랐다. 1997년 부터 나는 웹과 서비스를 개발하고 스타트업을 세웠다. 나는 컴퓨터과학이 아닌 경제학 학위를 가지고 있다.
나의 커리어는 굉장히 성공적이어왔지만, 나는 구글에서 일하고 싶었다. 나는 더 큰 시스템을 다루고 컴퓨터 시스템, 알고리즘 효율, 자료구조 퍼포먼스, 저급 언어 등과 그 것들이 어떻게 작동하는지에 대하여
이해하고 싶었다. 그리고 당신이 그런 것들을 모른다면 구글은 당신을 채용하지 않을 것이다.
내가 이 프로젝트를 시작했을 때, 나는 힙스택, Big-O, 트리, 그래프 운행 등에 대하여 전혀 아는 바가 없었다.
만약 내가 정렬 알고리즘을 코딩해야했다면, 나는 그리 잘 하지 못했을 것이다.
모든 사용했던 모든 자료 구조는 언어 안에서 구현 되어 있던 것들이고, 나는 그 것들이 보이는 것 아래서 어떻게 작동하고 있는지 알지 못했다.
나는 진행 중인 프로세스가 메모리 부족 에러를 메세지를 보내지 않는 한 메모리를 관리할 필요가 없었고, 나는 회피방법을 찾아야만 했다.
나는 몇몇 다차원 배열이나 연관 배열을 사용해왔지만, 자료구조를 처음부터 구현해본 적은 없었다.
하지만 이 공부 계획을 진행하면서 나는 내가 고용될 것이라는 자신감을 갖게 되었다. 이 것은 내게 여러 달이 필요한 긴 계획이다.
만약 당신이 이 중 많은 내용에 익숙하다면 시간은 훨씬 덜 들 것이다.
How to use it
아래의 모든 것은 대략적인 개요이며 당신은 위에서 아래 순서대로 진행해야 한다.
진행상황을 확인하기 위한 목록를 포함하여, 나는 Github'special markdown flavor를 사용하고 있다.
The algorithm catalog portion is well beyond the scope of difficulty you'll get in an interview.
This book has 2 parts:
class textbook on data structures and algorithms
pros:
is a good review as any algorithms textbook would be
nice stories from his experiences solving problems in industry and academia
code examples in C
cons:
can be as dense or impenetrable as CLRS, and in some cases, CLRS may be a better alternative for some subjects
chapters 7, 8, 9 can be painful to try to follow, as some items are not explained well or require more brain than I have
don't get me wrong: I like Skiena, his teaching style, and mannerisms, but I may not be Stony Brook material.
algorithm catalog:
this is the real reason you buy this book.
about to get to this part. Will update here once I've made my way through it.
To quote Yegge: "More than any other book it helped me understand just how astonishingly commonplace
(and important) graph problems are – they should be part of every working programmer's toolkit. The book also
covers basic data structures and sorting algorithms, which is a nice bonus. But the gold mine is the second half
of the book, which is a sort of encyclopedia of 1-pagers on zillions of useful problems and various ways to solve
them, without too much detail. Almost every 1-pager has a simple picture, making it easy to remember. This is a
great way to learn how to identify hundreds of problem types."
Can rent it on kindle
Half.com is a great resource for textbooks at good prices.
Important: Reading this book will only have limited value. This book is a great review of algorithms and data structures, but won't teach you how to write good code. You have to be able to code a decent solution efficiently.
To quote Yegge: "But if you want to come into your interviews prepped, then consider deferring your application until you've made your way through that book."
Half.com is a great resource for textbooks at good prices.
aka CLR, sometimes CLRS, because Stein was late to the game
The first couple of chapters present clever solutions to programming problems (some very old using data tape) but
that is just an intro. This a guidebook on program design and architecture, much like Code Complete, but much shorter.
"Algorithms and Programming: Problems and Solutions" by Shen
A fine book, but after working through problems on several pages I got frustrated with the Pascal, do while loops, 1-indexed arrays, and unclear post-condition satisfaction results.
Would rather spend time on coding problems from another book or online coding problems.
시작하기 전에
이 문서는 몇 달간 계속 업데이트 되고 있으며, 그런 이유로, 내가 감당할 수 없어지기 시작한 듯하다.
내가 저지른 몇 가지 실수들을 소개한다. 이를 통해 당신은 이 과정을 좀 더 효과적으로 진행할 수 있기를 바란다.
1. 당신은 이것을 다 기억하지 못할 것이다.
나는 수 시간의 비디오를 보고 방대한 양의 노트를 작성했지만, 몇 달 뒤에는 대부분의 내용을 기억하지 못했다. 나는 3일 동안 내가 작성한 노트를 보고 flashcard를 만들면서 내용들을 다시 검토해야 했다.
Gotcha: you need pointer to pointer knowledge:
(for when you pass a pointer to a function that may change the address where that pointer points)
This page is just to get a grasp on ptr to ptr. I don't recommend this list traversal style. Readability and maintainability suffer due to cleverness.
dequeue() - returns value and removes least recently added element (front)
empty()
Implement using fixed-sized array:
enqueue(value) - adds item at end of available storage
dequeue() - returns value and removes least recently added element
empty()
full()
Cost:
a bad implementation using linked list where you enqueue at head and dequeue at tail would be O(n)
because you'd need the next to last element, causing a full traversal each dequeue
enqueue: O(1) (amortized, linked list and array [probing])
NOTE: DP is a valuable technique, but it is not mentioned on any of the prep material Google provides. But you could get a problem where DP provides an optimal solution. So I'm including it.
This subject can be pretty difficult, as each DP soluble problem must be defined as a recursion relation, and coming up with it can be tricky.
I suggest looking at many examples of DP problems until you have a solid understanding of the pattern involved.
Videos:
the Skiena videos can be hard to follow since he sometimes uses the whiteboard, which is too small to see
Know about the most famous classes of NP-complete problems, such as traveling salesman and the knapsack problem,
and be able to recognize them when an interviewer asks you them in disguise.
Reading all from end to end with full comprehension will likely take more time than you have. I recommend being selective on papers and their sections.
You can expect system design questions if you have 4+ years of experience.
Scalability and System Design are very large topics with many topics and resources, since
there is a lot to consider when designing a software/hardware system that can scale.
Expect to spend quite a bit of time on this.
For even more, see "Mining Massive Datasets" video series in the Video Series section.
Practicing the system design process: Here are some ideas to try working through on paper, each with some documentation on how it was handled in the real world:
집에 화이트보드가 없는가? 그럴 수 있다. 나는 커다란 화이트보드를 가진 괴짜이다. 화이트보드 대신에 상점에서 큰 도화지를 사오자.
소파에 앉아서 연습할 수 있다. 이 것은 내 "소파 화이트보드"이다. 크기 비교를 위해 사진에 펜을 추가하였다. 펜을 쓰면, 곧 지우고 싶어질 것이다.
금방 지저분해 진다.
*****************************************************************************************************
*****************************************************************************************************
아래의 모든 것들은 선택 사항이다. 이 것들은 Google의 권장사항이 아니라, 나의 추천사항이다.
당신은 이것들을 공부함으로써 더 많은 CS 개념들에 대해 알 수 있을 것이며, 소프트웨어 엔지니어링 직업을 준비하는 데에도 도움이 될 것
이다. 더불어 당신은 훨씬 더 균형 잡힌 소프트웨어 엔지니어가 될 것이다.
*****************************************************************************************************
*****************************************************************************************************
Know least one type of balanced binary tree (and know how it's implemented):
"Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular.
A particularly interesting self-organizing data structure is the splay tree, which uses rotations
to move any accessed key to the root." - Skiena
Of these, I chose to implement a splay tree. From what I've read, you won't implement a
balanced search tree in your interview. But I wanted exposure to coding one up
and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code.
splay tree: insert, search, delete functions
If you end up implementing red/black tree try just these:
search and insertion functions, skipping delete
I want to learn more about B-Tree since it's used so widely with very large data sets.
In practice:
From what I can tell, these aren't used much in practice, but I could see where they would be:
The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly
balanced than red–black trees, leading to slower insertion and removal but faster retrieval. This makes it
attractive for data structures that may be built once and loaded without reconstruction, such as language
dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter).
In practice:
Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors,
data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory,
networking and file system code) etc.
In practice:
Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time.
Not only does this make them valuable in time-sensitive applications such as real-time applications,
but it makes them valuable building blocks in other data structures which provide worst-case guarantees;
for example, many data structures used in computational geometry can be based on red–black trees, and
the Completely Fair Scheduler used in current Linux kernels uses red–black trees. In the version 8 of Java,
the Collection HashMap has been modified such that instead of using a LinkedList to store identical elements with poor
hashcodes, a Red-Black tree is used.
In practice:
For every 2-4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion
operations on 2-4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2-4 trees an
important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce
2-4 trees just before red–black trees, even though 2-4 trees are not often used in practice.
fun fact: it's a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor)
In Practice:
B-Trees are widely used in databases. Most modern filesystems use B-trees (or Variants). In addition to
its use in databases, the B-tree is also used in filesystems to allow quick random access to an arbitrary
block in a particular file. The basic problem is turning the file block i address into a disk block
(or perhaps to a cylinder-head-sector) address.
MIT 6.851 - Memory Hierarchy Models (video)
- covers cache-oblivious B-Trees, very interesting data structures
- the first 37 minutes are very technical, may be skipped (B is block size, cache line size)
k-D Trees
great for finding number of points in a rectangle or higher dimension object
I added these to reinforce some ideas already presented above, but didn't want to include them
above because it's just too much. It's easy to overdo it on a subject.
You want to get hired in this century, right?