As far as I can see, the idea of Hadoop is to split the work, map the pieces to different nodes, and then reduce the results. That is just a very naive understanding, but I guess it covers what Hadoop is doing for the most part. The nice thing is that Hadoop provides a scalable mechanism to implement this map-reduce idea by assigning responsibilities to different machines (which act as nodes like NameNode, DataNode, JobTracker, TaskTracker and so on). With a carefully configured cluster, these nodes can work together and finish some heavy-loaded tasks.
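The split/map/reduce flow above can be sketched in a few lines of plain Python. This is just a toy single-process illustration of the data flow, not Hadoop's actual API, and all the function names here are my own:

```python
from collections import defaultdict

def map_phase(chunk):
    # Mapper: emit (word, 1) pairs for each word in one chunk of input.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would do
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine the grouped values for each key into a final result.
    return {key: sum(values) for key, values in groups.items()}

def map_reduce(chunks):
    # The "split" is already done here: in a real cluster each chunk
    # would be sent to a different node to be mapped in parallel.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))

counts = map_reduce(["to be or", "not to be"])
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a real cluster the interesting part is exactly what this sketch hides: the chunks live on different machines, the mappers run in parallel, and the shuffle happens over the network.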
Then I talked about it with my friend, who is also doing some computing research right now. He suggested that I read some of Google's publications about this kind of map-reduce. Since Hadoop grew out of work at Yahoo, Google may have a different way of doing the same thing. Well, the problem is how I can get to know the secrets of Google's infrastructure. Everyone knows there is a Bigtable behind Google's search engine, but how does Google split the work and implement it? It is obviously more difficult to get this information from Google than from Hadoop :( Perhaps I have to read other people's research in this field instead of finding it out by myself.