Download the dataset from the link below:
https://drive.google.com/file/d/1pRcX-HFAvKhG7LYEYehhQYMl6hYt2MbQ/view?usp=sharing
Here are the questions that you need to solve. Once solved, please email your solution to suraz.hadoop@gmail.com
1. Find the total number of products in each category.
e.g.:
cat_id Total_Product
1 20
2 10
3 70
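One way to approach question 1 is a plain group-and-count. The sketch below is only an illustration: the dataset's real layout is not described here, so the `(product_id, category_id)` rows and the function name `products_per_category` are hypothetical stand-ins.

```python
from collections import Counter

# Hypothetical sample rows: (product_id, category_id).
# The real dataset's columns are not given in this assignment.
products = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 3)]

def products_per_category(rows):
    """Return {category_id: number of products in that category}."""
    return dict(Counter(cat_id for _prod_id, cat_id in rows))

totals = products_per_category(products)
print("cat_id Total_Product")
for cat_id in sorted(totals):
    print(cat_id, totals[cat_id])
```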
2. Find the top 10 most-ordered products, listed with the most-ordered first. Make sure the orders counted were not cancelled.
e.g.:
prod_id total_ordered
1 100
2 70
3 50
4 20
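A possible shape for question 2, sketched under assumptions: orders are a hypothetical `{order_id: status}` mapping, order items are hypothetical `(order_id, product_id)` pairs, and "total_ordered" is taken to mean the number of non-cancelled order lines per product. If the intended meaning is quantity, the counter would sum quantities instead.

```python
from collections import Counter

# Hypothetical sample data; names and layout are assumptions.
orders = {101: "COMPLETE", 102: "CANCELLED", 103: "CLOSED",
          104: "PENDING_PAYMENT", 105: "COMPLETE"}
order_items = [(101, 1), (101, 2), (102, 1), (102, 3),
               (103, 1), (103, 3), (104, 2), (105, 1)]

def top_ordered(orders, order_items, n=10):
    """Return up to n (prod_id, total_ordered) pairs, most-ordered first."""
    counts = Counter(
        prod_id
        for order_id, prod_id in order_items
        if orders.get(order_id) != "CANCELLED"  # skip cancelled orders
    )
    # Sort by count descending; break ties by product id for a stable result.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

for prod_id, total in top_ordered(orders, order_items):
    print(prod_id, total)
```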
3. Give me a summary of the orders across the whole dataset:
how many were CLOSED, PENDING_PAYMENT, COMPLETE, PROCESSING, ON_HOLD, and CANCELLED?
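The summary in question 3 amounts to counting orders per status. A minimal sketch, assuming the statuses are available as a simple list of strings (a hypothetical layout); statuses with no orders are reported as 0 so all six always appear.

```python
from collections import Counter

# The six statuses named in the question.
STATUSES = ["CLOSED", "PENDING_PAYMENT", "COMPLETE",
            "PROCESSING", "ON_HOLD", "CANCELLED"]

def status_summary(order_statuses):
    """Return {status: count} for every status in STATUSES."""
    counts = Counter(order_statuses)
    # Counter returns 0 for missing keys, so absent statuses show as 0.
    return {status: counts[status] for status in STATUSES}

# Hypothetical sample input.
sample = ["CLOSED", "COMPLETE", "COMPLETE", "CANCELLED"]
for status, count in status_summary(sample).items():
    print(status, count)
```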
4. List all the products whose orders were cancelled, along with the number of cancellations for each, e.g.:
prod_id count(*)
1 500
2 50
5. Find all the happy products. A product is a happy product if it was never part of a cancelled order.
6. Give me the most-sold product in terms of volume. (Display all products with their total sales volume.) Don't include cancelled orders.
7. List the products that made the most money (those whose price * quantity is highest). Don't include cancelled orders.
8. Give me the 10 products that made the least money. Don't include cancelled orders.
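The "happy product" definition in question 5 reads naturally as a set difference: all products minus those seen in any cancelled order. The sketch below assumes hypothetical inputs (a list of product ids, an `{order_id: status}` mapping, and `(order_id, product_id)` pairs) and treats a product that was never ordered as happy, which is one reading of "never cancelled".

```python
def happy_products(all_product_ids, orders, order_items):
    """Return sorted ids of products that never appear in a cancelled order."""
    cancelled = {
        prod_id
        for order_id, prod_id in order_items
        if orders.get(order_id) == "CANCELLED"
    }
    return sorted(set(all_product_ids) - cancelled)

# Hypothetical sample data.
orders = {101: "COMPLETE", 102: "CANCELLED"}
order_items = [(101, 1), (101, 2), (102, 2), (102, 3)]
print(happy_products([1, 2, 3, 4], orders, order_items))
```

Product 2 is excluded even though it also has a completed order, because a single cancellation is enough to disqualify it.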
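Questions 6 to 8 share one aggregation: per-product volume (sum of quantity) and revenue (sum of price * quantity) over non-cancelled orders, sorted in different ways. A minimal sketch, assuming hypothetical `(order_id, product_id, quantity, price)` rows and an `{order_id: status}` mapping:

```python
from collections import defaultdict

def sales_by_product(orders, order_items):
    """Return {prod_id: (total_quantity, total_revenue)} excluding cancelled orders."""
    totals = defaultdict(lambda: [0, 0.0])
    for order_id, prod_id, qty, price in order_items:
        if orders.get(order_id) == "CANCELLED":
            continue  # cancelled orders do not count toward sales
        totals[prod_id][0] += qty
        totals[prod_id][1] += qty * price
    return {prod_id: (vol, rev) for prod_id, (vol, rev) in totals.items()}

# Hypothetical sample data.
orders = {101: "COMPLETE", 102: "CANCELLED", 103: "CLOSED"}
order_items = [(101, 1, 2, 50.0), (101, 2, 1, 300.0), (102, 1, 5, 50.0),
               (103, 1, 1, 50.0), (103, 3, 4, 10.0)]

sales = sales_by_product(orders, order_items)
by_volume = sorted(sales, key=lambda p: -sales[p][0])        # question 6
by_revenue = sorted(sales, key=lambda p: -sales[p][1])       # question 7
least_revenue = sorted(sales, key=lambda p: sales[p][1])[:10]  # question 8
print(by_volume, by_revenue, least_revenue)
```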
9. Suppose we divide the 24 hours of a day into 8 parts:
00:00-03:00
03:00-06:00
06:00-09:00
09:00-12:00
12:00-15:00
15:00-18:00
18:00-21:00
21:00-24:00
Tell me the total sales that happened in each of these time periods.
Count only successful transactions toward the total sales.
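Since each part is 3 hours wide, the part a transaction falls into is simply its hour divided by 3 (integer division). The sketch below rests on several assumptions that the assignment does not settle: timestamps look like `YYYY-MM-DD HH:MM:SS`, each part includes its start time and excludes its end time, and "successful" means a status of COMPLETE or CLOSED (adjustable through the `successful` parameter). The sample rows are made up.

```python
from datetime import datetime

# "00:00-03:00", "03:00-06:00", ..., "21:00-24:00"
BUCKET_LABELS = [f"{start:02d}:00-{start + 3:02d}:00" for start in range(0, 24, 3)]

def sales_per_bucket(transactions, successful=("COMPLETE", "CLOSED")):
    """Sum amounts per 3-hour part for transactions with a successful status.

    transactions: iterable of (timestamp_str, status, amount); assumed layout.
    """
    totals = [0.0] * 8
    for timestamp, status, amount in transactions:
        if status not in successful:
            continue
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
        totals[hour // 3] += amount  # hours 0-2 -> part 0, 3-5 -> part 1, ...
    return dict(zip(BUCKET_LABELS, totals))

# Made-up sample rows for illustration only.
transactions = [
    ("2014-01-01 00:30:00", "COMPLETE", 100.0),
    ("2014-01-01 02:59:59", "CLOSED", 50.0),
    ("2014-01-01 03:00:00", "COMPLETE", 20.0),
    ("2014-01-01 12:00:00", "CANCELLED", 999.0),
    ("2014-01-01 23:15:00", "COMPLETE", 10.0),
]
result = sales_per_bucket(transactions)
for label, total in result.items():
    print(label, total)
```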
10. List all the order IDs where the order had 5 or more items, no product was repeated, and the total sales amount of the order was greater than or equal to 1000.
Suppose order no. 101 contains:
ProdId productQty price
1 1 200
2 1 200
3 1 200
4 1 200
5 1 200
Here no product is repeated, the total amount is 1000, and there are 5 or more products, so order 101 qualifies.
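The three conditions in question 10 can be checked per order after grouping the order lines. This is a sketch under assumptions: rows are hypothetical `(order_id, product_id, quantity, price)` tuples, "not repeated" is read as each product appearing on only one line of the order, and the item threshold follows the worked example (5 or more), exposed as the `min_items` parameter in case a stricter cutoff is intended.

```python
from collections import defaultdict

def qualifying_orders(order_items, min_items=5, min_total=1000):
    """Return sorted order ids meeting the item-count, no-repeat, and total rules."""
    by_order = defaultdict(list)
    for order_id, prod_id, qty, price in order_items:
        by_order[order_id].append((prod_id, qty, price))

    result = []
    for order_id, lines in by_order.items():
        prod_ids = [prod_id for prod_id, _qty, _price in lines]
        no_repeats = len(prod_ids) == len(set(prod_ids))
        total = sum(qty * price for _prod_id, qty, price in lines)
        if len(lines) >= min_items and no_repeats and total >= min_total:
            result.append(order_id)
    return sorted(result)

# Order 101 mirrors the example above; 102 and 103 are made-up counterexamples.
order_items = (
    [(101, p, 1, 200) for p in (1, 2, 3, 4, 5)]     # qualifies
    + [(102, p, 1, 200) for p in (1, 1, 2, 3, 4)]   # product 1 repeated
    + [(103, p, 1, 100) for p in (1, 2, 3, 4, 5)]   # total is only 500
)
print(qualifying_orders(order_items))
```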