Suraz seriously loves teaching Big data. At the beginning of 2022, he decides to embark on a learning venture and open an Online Learning Centre where he taught his 3 favorite courses: Hadoop, Spark, and Scala.
Suraz’s Learning venture is in need of your assistance to help the Online Learning Centre stay afloat - the Learning venture has captured some very basic data from its few months of operation but has no idea how to use their data to help them run the courses.
Suraz wants to use the data to answer a few simple questions about his students, especially about their learning patterns, how much money and time they’ve spent, and also which courses are their favorite. Having this deeper connection with his students will help him deliver a better and more personalized experience for his loyal students.
He plans on using these insights to help him decide whether he should expand the existing student referral program - additionally, he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.
Suraz has provided you with a sample of his overall student data due to privacy issues - but he hopes that these examples are enough for you to write fully functioning SQL queries to help him answer his questions!
Suraz has shared with you 3 key datasets for this case study:
sales
courses
students
You can inspect the entity-relationship diagram and example data below.
All datasets exist within the suraz_bigdata_program
database schema - be sure to include this reference within your SQL scripts as you start exploring the data and answering the case study questions.
The sales
table captures all student_id
level purchases with a corresponding order_date
and course_id
information for when and what menu items were ordered.
student_id | order_date | course_id |
---|---|---|
ABC | 2022-01-01 | 1 |
ABC | 2022-01-01 | 2 |
ABC | 2022-01-07 | 2 |
ABC | 2022-01-10 | 3 |
ABC | 2022-01-11 | 3 |
ABC | 2022-01-11 | 3 |
BCD | 2022-01-01 | 2 |
BCD | 2022-01-02 | 2 |
BCD | 2022-01-04 | 1 |
BCD | 2022-01-11 | 1 |
BCD | 2022-01-16 | 3 |
BCD | 2022-02-01 | 3 |
CDE | 2022-01-01 | 3 |
CDE | 2022-01-01 | 3 |
CDE | 2022-01-07 | 3 |
Table2: course
The course
table maps the course_id
to the actual course_name
and price
of each course item.
course_id | course_name | price |
---|---|---|
1 | Hadoop | 12000 |
2 | Spark | 15000 |
3 | Scala | 15000 |
Table 3: student
The final student
table captures the joining_date
when a student_id
joined any course of Suraz’s Big Data program.
student_id | joining_date |
---|---|
ABC | 2022-01-07 |
BCD | 2022-01-09 |
Interactive SQL Scripts
Below are SQL scripts to create the required data sets:
CREATE SCHEMA suraz_bigdata_program;
CREATE TABLE sales (
"stdent_id" VARCHAR(5),
"order_date" DATE,
"course_id" INTEGER
);
INSERT INTO sales
("stdent_id", "order_date", "course_id")
VALUES
('ABC', '2022-01-01', '1'),
('ABC', '2022-01-01', '2'),
('ABC', '2022-01-07', '2'),
('ABC', '2022-01-10', '3'),
('ABC', '2022-01-11', '3'),
('ABC', '2022-01-11', '3'),
('BCD', '2022-01-01', '2'),
('BCD', '2022-01-02', '2'),
('BCD', '2022-01-04', '1'),
('BCD', '2022-01-11', '1'),
('BCD', '2022-01-16', '3'),
('BCD', '2022-02-01', '3'),
('CDE', '2022-01-01', '3'),
('CDE', '2022-01-01', '3'),
('CDE', '2022-01-07', '3');
CREATE TABLE course (
"course_id" INTEGER,
"course_name" VARCHAR(10),
"price" INTEGER
);
INSERT INTO course
("course_id", "course_name", "price")
VALUES
('1', 'Hadoop', '12000'),
('2', 'Spark', '15000'),
('3', 'Scala', '15000');
CREATE TABLE student (
"student_id" VARCHAR(5),
"joining_date" DATE
);
INSERT INTO student
("student_id", "joining_date")
VALUES
('ABC', '2022-01-07'),
('BCD', '2022-01-09');
Each of the following case study questions can be answered using a single SQL statement:
- What is the total amount each student spent in the BigDataProgram?
- How many days has each student visited the BigDataProgram?
- What was the first item from the course purchased by each student?
- What is the most purchased course from the courses and how many times was it purchased by all students?
- Which course was the most popular for each student?
- Which course was purchased first by the user after they became students?
- Which course was purchased just before the user became a student?
- What are the total course and amounts spent for each user before they became a student?
- If each RS1000 spent equates to 10 points and Hadoop has a 2x points multiplier - how many points would each student have?
- In the first week after a student joins the program (including their joining date) they earn 2x points on all items, not just Hadoop - how many points do students ABC and BCD have at the end of January?
If you’d like to use this case study for one of your portfolio projects or in a personal blog post - please remember to link back to this URL also don’t forget to share some LinkedIn updates using the #SQLChallenge hashtag and remember to tag me!
I really hope you enjoyed this fun little case study - it definitely was fun for me to create!