Let's use the covid19 dataset to build a small PoC on Spark Streaming with Kafka.
Steps:
- Download the latest Kafka release from https://www.apache.org/dyn/closer.cgi?path=/kafka/2.5.0/kafka_2.13-2.5.0.tgz
- Extract it to a folder, say D:\kafka_2.13-2.5.0
- Navigate to D:\kafka_2.13-2.5.0\kafka_2.13-2.5.0\bin\windows
- Copy D:\kafka_2.13-2.5.0\kafka_2.13-2.5.0\config\zookeeper.properties to D:\kafka_2.13-2.5.0\kafka_2.13-2.5.0\bin\windows
- Copy D:\kafka_2.13-2.5.0\kafka_2.13-2.5.0\config\server.properties to D:\kafka_2.13-2.5.0\kafka_2.13-2.5.0\bin\windows
- Start ZooKeeper:
  $ zookeeper-server-start.bat zookeeper.properties
- Start the Kafka server:
  $ kafka-server-start.bat server.properties
- Create a Kafka topic:
  $ kafka-topics.bat --zookeeper localhost:2181 --create --topic covid19india --partitions 2 --replication-factor 1
- Create a MySQL table to store the data:

  CREATE TABLE `covid19india` (
    `sno` varchar(100) DEFAULT NULL,
    `date_of_identification` varchar(100) DEFAULT NULL,
    `current_status` varchar(100) DEFAULT NULL,
    `state` varchar(100) DEFAULT NULL,
    `num_of_cases` int(11) DEFAULT NULL
  );
- Run the Kafka consumer code.
- Run the Kafka producer code.
- Check that your MySQL database table is getting populated.
- Visualize it using Tableau.
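The consumer step above could look like the following Spark Structured Streaming sketch: it reads JSON records from the covid19india topic and appends each micro-batch to the MySQL table over JDBC. The database name (covid_db), credentials, JSON message format, and column names are assumptions for illustration; you would also need the MySQL Connector/J jar on the Spark classpath.

```python
# Sketch of a Spark Structured Streaming consumer: Kafka -> MySQL.
# Assumes Kafka on localhost:9092 and messages that are JSON objects
# matching the covid19india table columns.

def jdbc_url(host, port, db):
    """Build the MySQL JDBC URL used by the batch writer."""
    return f"jdbc:mysql://{host}:{port}/{db}"

def main():
    # pyspark imports kept inside main() so the module loads without Spark
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StringType, IntegerType

    # Schema mirroring the MySQL table created earlier
    schema = (StructType()
              .add("sno", StringType())
              .add("date_of_identification", StringType())
              .add("current_status", StringType())
              .add("state", StringType())
              .add("num_of_cases", IntegerType()))

    spark = SparkSession.builder.appName("Covid19KafkaConsumer").getOrCreate()

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "covid19india")
           .option("startingOffsets", "earliest")
           .load())

    # Kafka values arrive as bytes; cast to string and parse the JSON
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("r"))
              .select("r.*"))

    def write_batch(batch_df, batch_id):
        # Append each micro-batch to the MySQL table via JDBC
        (batch_df.write.format("jdbc")
         .option("url", jdbc_url("localhost", 3306, "covid_db"))  # placeholder DB
         .option("dbtable", "covid19india")
         .option("user", "root")          # placeholder credentials
         .option("password", "password")
         .mode("append")
         .save())

    (parsed.writeStream
     .foreachBatch(write_batch)
     .start()
     .awaitTermination())

if __name__ == "__main__":
    main()
```

foreachBatch is used here (rather than a streaming JDBC sink, which Spark does not provide) so the ordinary batch JDBC writer can handle the insert.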
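The producer step could be sketched as below: read the dataset CSV and publish one JSON record per row to the covid19india topic. The CSV file name, its column order, and the use of the kafka-python package are assumptions; adjust them to match your actual dataset file.

```python
# Sketch of a Kafka producer for the covid19 dataset.
# Assumes `pip install kafka-python` and a broker on localhost:9092.
import csv
import json

def row_to_record(row):
    """Map one CSV row (list of strings) onto the table's columns.
    Column order is an assumption matching the covid19india schema."""
    return {
        "sno": row[0],
        "date_of_identification": row[1],
        "current_status": row[2],
        "state": row[3],
        "num_of_cases": int(row[4]),
    }

def publish(csv_path, bootstrap="localhost:9092", topic="covid19india"):
    from kafka import KafkaProducer  # imported here so parsing works without Kafka
    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            producer.send(topic, row_to_record(row))
    producer.flush()

if __name__ == "__main__":
    publish("covid19india.csv")  # hypothetical file name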
Example use case:
https://www.dropbox.com/sh/c8l4uy57wisahub/AAAr9GQGtUb0pXzy5qykr9LZa?dl=0
Published By : Suraj Ghimire