Map Reduce Bigdata

Evaluating bids
Your project must incorporate the following elements:
1. Consider a large dataset; its size should justify the complexity level (structured,
semi-structured or unstructured). Source datasets can be static (file or database),
programmatically retrieved from an API / web service / scrape, or a mixture of the two.

2. Utilisation of a distributed data processing environment (e.g., Hadoop MapReduce or Spark)
for some part of the analysis.
3. Source dataset(s) should be stored in an appropriate SQL / NoSQL database (HBase / Hive /
Spark SQL / Kudu / Cassandra / MongoDB / etc.) prior to processing by MapReduce / Spark.
The data should be loaded into the database using an appropriate tool (Sqoop / Spark /
Pentaho / Talend / etc.).
4. Post-MapReduce dataset(s) should be stored in an appropriate NoSQL database(s)
(the same choice as in step 3, or a different one).
5. Programmatically accessing the source data from the chosen NoSQL database using appropriate
MapReduce / Spark code (i.e. you should not extract the data to text files before running
the MapReduce / Spark task; the MapReduce / Spark task should read directly from the
database).
6. Programmatically storing the MapReduce/ Spark output data into the chosen NoSQL output
database (again, the MapReduce / Spark task should write directly to the database).
7. Follow-up analysis on the output data. It can be extracted from the NoSQL database into
another format, using an appropriate tool, if necessary (e.g. extract to CSV to import into
R / Python / Matlab / Qlik / Power BI / Tableau / SPSS).
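
To make the data flow in steps 3 to 6 concrete, here is a toy, single-process sketch in Python. It is not Hadoop or Spark: it only illustrates the required shape of the job, where the task reads its input directly from a database and writes its output directly to a database, with no intermediate text files. Everything named here is an assumption made for illustration: the standard-library `sqlite3` module stands in for the real SQL / NoSQL stores, and the `events` and `totals` tables, their columns, and the function names are hypothetical.

```python
import sqlite3
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (key, value) pair per input record."""
    for category, amount in rows:
        yield category, amount


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group into (key, total, record count)."""
    for key, values in groups.items():
        yield key, sum(values), len(values)


def run_job(source, sink):
    """Read straight from the source database, write straight to the sink."""
    rows = source.execute("SELECT category, amount FROM events")
    results = reduce_phase(shuffle(map_phase(rows)))
    sink.execute(
        "CREATE TABLE IF NOT EXISTS totals "
        "(category TEXT PRIMARY KEY, total REAL, n INTEGER)"
    )
    sink.executemany("INSERT OR REPLACE INTO totals VALUES (?, ?, ?)", results)
    sink.commit()


# Hypothetical source database, pre-loaded as in step 3.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (category TEXT, amount REAL)")
source.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 1.0), ("b", 2.5), ("a", 3.0)],
)

# Hypothetical output database, as in steps 4 and 6.
sink = sqlite3.connect(":memory:")
run_job(source, sink)

totals = sink.execute(
    "SELECT category, total, n FROM totals ORDER BY category"
).fetchall()
print(totals)
```

In an actual submission the same three stages would be expressed as Hadoop MapReduce or Spark code, and the two `sqlite3` connections would be replaced by the connectors for whichever databases are chosen.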

For example, you may initially utilise MySQL to store a large structured dataset (or any NoSQL
database for a semi-structured or unstructured dataset), and your Hadoop MapReduce / Spark
processing would then use that MySQL (or NoSQL) database as its input source. After processing
the data through MapReduce / Spark, you may store the processed data in HBase, Hive or MongoDB.
Following that, you may use Python / NumPy / Pandas / Matplotlib / Matlab or R / ggplot / plotly
to conduct further analysis of the MapReduce output data (e.g., statistical analysis) and
generate data visualisation plots for better presentation of the results. Alternatively, you
can import the generated output data into Microsoft Power BI / IBM SPSS to generate the analyses.
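
As a rough illustration of the follow-up analysis step, the sketch below exports a result table to CSV and computes a few summary statistics using only the Python standard library. The names are assumptions for illustration only: `sqlite3` again stands in for the real output database, the `totals` table and its values are made up, and `export_csv` and `summarise` are hypothetical helpers. A real analysis would more likely use Pandas or R on the exported file.

```python
import csv
import io
import sqlite3
import statistics


def export_csv(conn, query, out):
    """Write the result of `query` to `out` as CSV, with a header row."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


def summarise(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical output database holding the processed results.
output_db = sqlite3.connect(":memory:")
output_db.execute("CREATE TABLE totals (category TEXT, total REAL)")
output_db.executemany(
    "INSERT INTO totals VALUES (?, ?)",
    [("a", 4.0), ("b", 2.0), ("c", 6.0)],
)

# Export to an in-memory CSV buffer; a file opened with newline="" works the same way.
buffer = io.StringIO()
export_csv(output_db, "SELECT category, total FROM totals ORDER BY category", buffer)

# Read the CSV back, as an external analysis tool would, and summarise it.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
summary = summarise([float(row["total"]) for row in rows])
print(summary)
```

The exported CSV is the hand-off point: the same file could be loaded into Power BI, Tableau or SPSS instead of being analysed in Python.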

Category: IT & Programming
Subcategory: Other
Project size: Small
Is this a project or a position?: Project
I currently have: I have specifications
Required availability: As needed


USD 100 - 250






Days until project expiration: 18 days

Published: 2 weeks ago

Deadline: Not specified

