Seminar 12. Mock Test
The aim of this mock test is to perform data analysis using the Hadoop technology stack. You will apply big data tools to a sample dataset in practice. You need to work on Cloudera Quickstart VM version 5.12.0 and use different technologies (Hive, Spark, MapReduce, Pig) to perform the analysis. Instructions on how to install Cloudera Quickstart VM are provided on the module page in WIUT’s Intranet.
In the course of this assignment, you should analyze the dataset and implement solutions that answer the questions asked below. For the real test you will also produce a small report describing your findings.
You need to create a folder in HDFS, named after your ID number, to store all your data. Copy your dataset file(s) from the local file system to this HDFS folder.
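The folder creation and upload step can be scripted. The following is a minimal sketch, not a required solution: the base path `/user/cloudera`, the ID `00001234` and the file names are assumptions made for illustration, and the helper names are hypothetical. It only relies on the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put` commands, and is written to run on both Python 2 and Python 3.

```python
import subprocess


def build_hdfs_commands(student_id, local_files, base_dir="/user/cloudera"):
    """Return the hdfs commands that create the folder and upload the files.

    base_dir is an assumed HDFS home directory; adjust it to your VM.
    """
    target = "{0}/{1}".format(base_dir, student_id)
    # -p creates parent directories and does not fail if the folder exists.
    commands = [["hdfs", "dfs", "-mkdir", "-p", target]]
    for path in local_files:
        commands.append(["hdfs", "dfs", "-put", path, target])
    return commands


def run_commands(commands):
    """Execute each command and stop at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)


if __name__ == "__main__":
    # Hypothetical ID and file names; this only prints the commands.
    for cmd in build_hdfs_commands("00001234", ["races.csv", "results.csv"]):
        print(" ".join(cmd))
```

Calling `run_commands(...)` on the VM executes the printed commands; keeping the command list separate from its execution makes it easy to paste the same commands into the report.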
You should answer each question using both Hive and Spark, then compare the results to make sure you answered the questions properly. You need to work on the data you placed into the HDFS folder.
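One way to compare the two sets of answers is sketched below. It assumes the Hive result is available as tab-separated text (the default output of the Hive command line) and the Spark result as a list of tuples, such as the rows returned by `collect()`. The function names are hypothetical, and the sample values simply reuse the race counts quoted in this brief (7 races in 1950, 20 in 2012).

```python
def parse_hive_output(text):
    """Split tab-separated Hive output into rows, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def normalise(rows):
    """Turn rows into a sorted list of tuples of stripped strings."""
    return sorted(tuple(str(value).strip() for value in row) for row in rows)


def results_match(hive_rows, spark_rows):
    """Compare two result sets, ignoring row order and value types."""
    return normalise(hive_rows) == normalise(spark_rows)


if __name__ == "__main__":
    hive_text = "1950\t7\n2012\t20\n"
    spark_rows = [(2012, 20), (1950, 7)]
    print(results_match(parse_hive_output(hive_text), spark_rows))
```

The comparison is purely textual, so values that are formatted differently by the two engines (for example `20` and `20.0`) count as a mismatch and need to be rounded or cast first.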
You need to provide all the scripts to:
• create folders
You should submit a Jupyter notebook if you’re working with PySpark. You need to submit your map and reduce scripts for the MapReduce task(s).
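As an illustration of what map and reduce scripts for Hadoop Streaming can look like, here is a minimal sketch that counts races per year. It is not one of the test questions. It assumes a comma-separated `races.csv` whose second column holds the year, and it keeps both steps in one hypothetical file selected by a `map` or `reduce` argument; splitting it into two separate scripts is straightforward.

```python
import csv
import sys


def map_lines(lines):
    """Mapper: emit 'year<TAB>1' for every data row."""
    for row in csv.reader(lines):
        # Skip blank rows and the header; column 1 is assumed to be the year.
        if len(row) < 2 or not row[1].isdigit():
            continue
        yield "{0}\t1".format(row[1])


def reduce_lines(lines):
    """Reducer: sum the counts per key; input must be sorted by key."""
    current, total = None, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                yield "{0}\t{1}".format(current, total)
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield "{0}\t{1}".format(current, total)


if __name__ == "__main__":
    # Reads stdin only when called with 'map' or 'reduce'.
    if sys.argv[1:] == ["map"]:
        for out in map_lines(sys.stdin):
            print(out)
    elif sys.argv[1:] == ["reduce"]:
        for out in reduce_lines(sys.stdin):
            print(out)
```

Before submitting the job to Hadoop, the logic can be checked locally by piping the file through the map step, `sort`, and the reduce step, which mimics the shuffle that Hadoop performs between the two phases.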
Proper instructions on how to execute the scripts should be provided in the report during the real exam.
Formula One, also called F1 for short, is an international auto racing sport. F1 is the highest level of single-seat, open-wheel, open-cockpit professional motor racing.
The objective of a Formula 1 contest is to determine the winner of a race. The driver who crosses the finish line first after completing a pre-determined number of laps is declared the winner.
A series of Formula One races is held over a period of time, usually a year, called the ‘Formula One World Championship season’. Each race in a season is called a ‘Grand Prix’ (GP).
The number of Grands Prix in a season has varied through the years, starting with 7 races in 1950. This number kept increasing up to a maximum of 20 GPs a year (in 2012). Normally there are 19 to 20 GPs in a season now. The top 10 drivers at the end of each Grand Prix receive points based on their finishing positions, and these points contribute towards determining both the driver and the constructor champions.
The results of all the Grand Prix races in a season are taken together to determine the annual Championship awards. If you need more information about F1, you can check the following website: https://www.tutorialspoint.com/formula_one/formula_one_quick_guide.htm
The website kaggle.com has published a dataset of F1 race data. This dataset covers 1950 through the 2017 season and consists of tables describing constructors, race drivers, lap times, pit stops and more. You can find the dataset at https://www.kaggle.com/cjgdev/formula-1-race-data-19502017.
Download this dataset and use Big Data tools to answer the following questions: