Today it is common for engineers and business analysts to use tools such as Pig and Hive to query big datasets in Hadoop. Both tools offer an abstraction layer that encapsulates the complexity of MapReduce. However, any engineer who wants to understand how MapReduce works should start by writing MapReduce jobs in code before moving on to tools such as Pig and Hive. This data challenge assumes that you've spent some time learning about MapReduce and is simply an exercise you can use for practice.
For this data challenge, you are to work with a 12 MB dataset from the New York Stock Exchange (NYSE). The dataset contains the following data fields for every stock traded on the NYSE for every day from 1/1/2000 - 12/31/2001.
- stock_symbol
- date
- stock_price_open
- stock_price_high
- stock_price_low
- stock_price_close
The image below is a sample of the dataset for the stock "ASP" from 12/17/2001 - 12/31/2001.
You are to create a MapReduce job in Java that returns the date on which each stock reached its highest price and the date on which it reached its lowest price. Your output file should be in an easy-to-read format that allows me to quickly see the information I need.
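As a hint at the shape of a solution, the core of the reduce side could look like the sketch below. It leaves out the Hadoop boilerplate so it can run on its own: in a real job, the mapper would emit (stock_symbol, "date,close_price") pairs and this logic would live inside `Reducer.reduce()`. The class and method names, date format, and output layout are all illustrative, not a required design.

```java
import java.util.*;

// Sketch of the per-symbol reduce logic, outside Hadoop.
// Each value is "date,closePrice", e.g. "12/17/2001,11.60".
public class HighLowFinder {
    public static String findHighLow(String symbol, List<String> values) {
        String highDate = null, lowDate = null;
        double high = Double.NEGATIVE_INFINITY, low = Double.POSITIVE_INFINITY;
        for (String v : values) {
            String[] parts = v.split(",");
            double price = Double.parseDouble(parts[1]);
            // Track the date of the highest and lowest close seen so far.
            if (price > high) { high = price; highDate = parts[0]; }
            if (price < low)  { low = price;  lowDate = parts[0]; }
        }
        return symbol + "\thigh " + high + " on " + highDate
                      + "\tlow " + low + " on " + lowDate;
    }

    public static void main(String[] args) {
        List<String> asp = Arrays.asList(
            "12/17/2001,11.60", "12/18/2001,11.90", "12/19/2001,11.50");
        // Prints one tab-separated summary line for the symbol.
        System.out.println(findHighLow("ASP", asp));
    }
}
```

The same comparison could just as well use `stock_price_close` replaced by `stock_price_high`/`stock_price_low`, depending on how you interpret "highest price"; state your choice in your output.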
Extra Credit:
Modify your MapReduce job so that it returns one output file for the year 2000 and another output file for the year 2001.
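One common way to do this in Hadoop is `MultipleOutputs` (or a custom `Partitioner` with two reducers), writing each record to an output named after its year. The sketch below shows only the year-routing idea in plain, runnable Java; the record format and the assumption that the year is the last four characters of the date field are illustrative and would need to match the real dataset.

```java
import java.util.*;

// Sketch of the per-year split, outside Hadoop. In a real job this routing
// decision would pick the named output (e.g. "2000", "2001") that a
// MultipleOutputs instance writes to from the reducer.
public class YearSplitter {
    // Records look like "ASP,12/17/2001,11.60"; the year is assumed to be
    // the last four characters of the date field.
    public static Map<String, List<String>> splitByYear(List<String> records) {
        Map<String, List<String>> byYear = new TreeMap<>();
        for (String r : records) {
            String date = r.split(",")[1];
            String year = date.substring(date.length() - 4);
            byYear.computeIfAbsent(year, k -> new ArrayList<>()).add(r);
        }
        return byYear;
    }

    public static void main(String[] args) {
        List<String> recs = Arrays.asList(
            "ASP,03/01/2000,10.25", "ASP,12/17/2001,11.60");
        // Prints the set of year buckets the records were routed into.
        System.out.println(splitByYear(recs).keySet());
    }
}
```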
When you're done, zip up your output file(s) and email them to me at collindcouch@gmail.com. I'll compare your output to the solution and let you know how you did.
Good luck, and have fun!
-Collin Couch
