Saipavan Konireddy
814-***-**** • **************@*****.***
Professional Summary
Around 3 years of IT experience in analysis, design, development, and Big Data engineering in Scala, PySpark, Hadoop, and HDFS environments, with additional experience in Python. Implemented Big Data solutions using the Hadoop technology stack, including PySpark, Hive, Sqoop, Avro, and Thrift.
Proficient in developing SQL against relational databases such as Oracle and SQL Server in support of data warehousing and data integration solutions. Strong experience migrating other databases to Snowflake, with in-depth knowledge of Snowflake database, schema, and table structures.
Experienced in requirement analysis, application development, application migration, and maintenance using the Software Development Life Cycle (SDLC) and Python technologies. Defined user stories, drove the Agile board in JIRA during project execution, and participated in sprint demos and retrospectives.
Strong working experience with SQL and NoSQL databases, data modeling, and data pipelines. Involved in end-to-end development and automation of ETL pipelines using SQL and Python. Managed multiple tasks under tight deadlines in fast-paced environments. Excellent analytical and communication skills that help in understanding business logic and building good relationships between stakeholders and team members. Strong work ethic, leadership skills, and the ability to work efficiently in a team. Implementation and support using Agile and Waterfall methodologies.
Skills
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, HBase
Programming and Query Languages: Java, SQL, Python (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), NoSQL, PySpark, PySpark SQL, SAS, R (Caret, Glmnet, XGBoost, rpart, Ggplot2, sqldf), RStudio, PL/SQL, Linux shell scripts, Scala
Machine Learning - Classification Algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor (KNN), Gradient Boosting Classifier, Extreme Gradient Boosting Classifier, Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayes Classifier, Extra Trees Classifier, Stochastic Gradient Descent, etc.
Machine Learning - Ensemble and Stacking: Averaged Ensembles, Weighted Averaging, Base Learning, Meta Learning, Majority Voting, Stacked Ensemble, AutoML (Scikit-Learn, MLjar), etc.
Cloud Technologies: AWS, Azure, Google Cloud Platform (GCP)
IDEs: IntelliJ, Eclipse, Spyder, Jupyter
Data Engineer - Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Oozie, Zookeeper, etc.; AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, GCP, Google Shell, Linux, PuTTY, Bash Shell, Unix, etc.; Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design
Work History
AWS Data Engineer, 06/2019 to 07/2022
Infotel Groups India
Implemented the installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2
Handled AWS management tools such as CloudWatch and CloudTrail
Stored the log files in AWS S3
Used versioning in S3 buckets where highly sensitive information is stored
Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
Gained working experience with data streaming processes using Kafka, Apache Spark, and Hive
Analyzed SQL scripts and designed solutions to implement them using Scala
Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas
Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing
Developed parallel reports using SQL and Python to validate the daily, monthly, and quarterly reports
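The parallel-report validation idea above can be sketched in plain Python: two independently produced sets of report rows are aggregated and compared, and any disagreeing period/metric pairs are flagged. The row layout and field names here are hypothetical, not taken from the actual reports:

```python
from collections import defaultdict

def summarize(rows):
    """Aggregate report rows into {(period, metric): total}."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["period"], row["metric"])] += row["value"]
    return dict(totals)

def compare_reports(sql_rows, python_rows, tolerance=0.01):
    """Return the (period, metric) keys where the two reports disagree."""
    a, b = summarize(sql_rows), summarize(python_rows)
    keys = set(a) | set(b)
    return sorted(k for k in keys if abs(a.get(k, 0.0) - b.get(k, 0.0)) > tolerance)

# Matching daily totals produce no mismatches; a drifted value would be flagged.
sql_side = [{"period": "2022-06-01", "metric": "orders", "value": 120.0}]
py_side = [{"period": "2022-06-01", "metric": "orders", "value": 120.0}]
print(compare_reports(sql_side, py_side))  # → []
```

The same comparison scales from daily to monthly or quarterly reports simply by changing what goes into the `period` field.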
Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs
Primarily responsible for converting a manual reporting system into a fully automated CI/CD data pipeline that ingests data from different marketing platforms into an AWS S3 data lake
Designed and developed AWS architecture, cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions
Conducted ETL data integration, cleansing, and transformations using AWS Glue Spark scripts
Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena
Developed AWS Lambda functions to trigger AWS Step Functions that process the scheduled EMR jobs
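A minimal sketch of such a trigger, assuming a hypothetical state machine ARN, bucket, and input layout (none of these names come from the actual project):

```python
import json
from datetime import date

# Hypothetical ARN; in practice this would come from an environment variable.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:emr-batch"

def build_execution_input(run_date):
    """Build the JSON input the state machine passes to the EMR step."""
    return json.dumps({
        "run_date": run_date.isoformat(),
        "input_prefix": f"s3://example-bucket/raw/{run_date:%Y/%m/%d}/",
        "job": "daily-aggregation",
    })

def handler(event, context):
    """Lambda entry point: start one Step Functions execution per invocation."""
    import boto3  # boto3 ships with the Lambda runtime
    sfn = boto3.client("stepfunctions")
    return sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=build_execution_input(date.today()),
    )

print(build_execution_input(date(2022, 6, 1)))
```

Keeping the input-building logic separate from the boto3 call makes the payload easy to unit test without touching AWS.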
Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables
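The table layout described above can be sketched as the HiveQL DDL a PySpark job would submit; the table, column, and partition names here are hypothetical:

```python
def partitioned_parquet_ddl(table, columns, partition_cols, buckets=16):
    """Render CREATE TABLE DDL for a partitioned, bucketed Parquet Hive table."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partition_cols)
    bucket_col = columns[0][0]  # bucket on the first (key) column
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS "
        "STORED AS PARQUET "
        "TBLPROPERTIES ('parquet.compression'='SNAPPY')"
    )

ddl = partitioned_parquet_ddl(
    "events_parquet",
    [("event_id", "BIGINT"), ("payload", "STRING")],
    [("event_date", "STRING")],
)
# A SparkSession would run this as spark.sql(ddl); here we just print it.
print(ddl)
```

The load from the Avro table would then be an INSERT OVERWRITE ... SELECT into this table, letting Hive handle the Parquet/Snappy encoding.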
Involved in designing and developing tables in HBase and storing aggregated data from Hive tables
Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance
Developed Spark applications in Python (PySpark) on a distributed environment to load huge numbers of CSV files with different schemas into Hive ORC tables
Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and MLlib libraries
Created PySpark code that uses Spark SQL to generate DataFrames from the Avro-formatted raw layer and writes them to data service layer internal tables in ORC format
Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into target systems from multiple sources
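A minimal sketch of that raw-to-service-layer hop, assuming hypothetical paths and table names, and that the Spark Avro reader is available on the cluster:

```python
def normalize_columns(names):
    """Service-layer convention assumed here: lowercase, underscores, no spaces."""
    return [n.strip().lower().replace(" ", "_") for n in names]

def raw_avro_to_service_orc(raw_path, service_table):
    """Read the Avro raw layer and rewrite it as an ORC internal table."""
    from pyspark.sql import SparkSession  # imported lazily: needs a cluster

    spark = SparkSession.builder.appName("raw-to-service").getOrCreate()
    df = spark.read.format("avro").load(raw_path)
    df = df.toDF(*normalize_columns(df.columns))
    df.write.mode("overwrite").format("orc").saveAsTable(service_table)

print(normalize_columns(["Customer ID", " Event Date "]))  # → ['customer_id', 'event_date']
```

Splitting out the column-renaming rule keeps the naming convention testable independently of Spark.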
Developed Airflow DAGs in Python by importing the Airflow libraries
Environment: AWS, JMeter, Kafka, Ansible, Jenkins, Docker, Maven, Linux, Red Hat, GIT, CloudWatch, Python, shell scripting, Golang, WebSphere, Splunk, Tomcat, SoapUI, Kubernetes, Terraform, PowerShell
Education
Master of Science: Data Science, May 2023
GANNON UNIVERSITY - ERIE, PA
GPA: 3.85/4
Bachelor of Technology: Electronics and Communication Engineering, May 2019
S.R.M INSTITUTE OF SCIENCE AND TECHNOLOGY - CHENNAI
Relevant coursework: Applied Machine Learning, Introduction to Statistics, Applied Algorithms, Linear Regression, Data Mining, Text Mining, AWS, Data Visualization and Analysis, Manage and Access Big Data, Advanced Database Management.
Accomplishments
Jan 2016-April 2016
Title: Signal Jammer
Description: A prototype that blocks the transmission of signals between a cell phone and a base station
By using the same frequency as the mobile handset, the jammer creates strong interference in the communication between caller and receiver
Depending on the frequencies to be blocked, the values of the inductor and capacitor can be altered
Title: Movement of a Car by Hand Gestures
Description: To control a car's movement in the desired direction, the wheels rotate with a specific velocity
The robotic car can move forward, reverse, left, or right in response to hand gestures and is designed using Arduino
Title: Speed of Moving Objects Using an Ultrasonic Sensor
Description: A prototype using an Ultrasonic HC-SR04 sensor and an Arduino UNO to calculate the distance between the ultrasonic device and an object
A Processing application is used to display the distance between the ultrasonic device and the object on the laptop's monitor
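The distance arithmetic behind that prototype can be sketched in Python: the HC-SR04 reports an echo round-trip time in microseconds, and 343 m/s is the assumed speed of sound at roughly 20 °C.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # 343 m/s expressed in cm per microsecond

def echo_to_distance_cm(echo_us):
    """Convert an HC-SR04 echo pulse width (microseconds) to distance in cm."""
    # The pulse covers the round trip to the object and back, so halve it.
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2

print(echo_to_distance_cm(5830))  # ≈ 100 cm: a 5830 µs round trip means the object is about 1 m away
```

On the Arduino side the same formula runs in the sketch; tracking distance over successive pings is what yields the object's speed.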
Title: Pollution and Weather Monitoring Using LoRa
Description: Measures the amount of pollution in the atmosphere caused by human activities and provides a daily weather report
The data collected from the sensors is transferred to an Arduino UNO via analog-to-digital converters and then sent to LoRa transmitters
The LoRa module is used for long-range coverage of wireless transmission
The data collected from the sensors can be stored and monitored by transferring it to cloud storage.
Additional Information
Hands-on experience with dimensional modeling using star schema and snowflake models.
Firm understanding of Hadoop architecture and its components, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce programming.
Involved in setting up a Jenkins master and multiple slaves for the entire team as a CI tool as part of the continuous development and deployment process.
Installed and configured Apache Airflow for workflow management; created workflows in Python and built DAGs in Airflow to run jobs sequentially and in parallel.
Experienced in optimizing PySpark jobs to run on a Kubernetes cluster for faster data processing.
Involved in converting Hive queries into Spark actions and transformations by creating RDDs and DataFrames from the required files in HDFS.
Experienced in supporting data analysts in running Hive queries and building ETL pipelines.
Performed importing and exporting of data into HDFS and Hive using Sqoop.
Worked on AWS Cloud: used S3 buckets for storing data, EMR for spinning up clusters and running Spark jobs, Athena for creating external tables, IAM for authentication, and AWS Glue for pulling data from sources.
Experienced in designing, architecting, and implementing scalable cloud-based web applications using AWS and Azure.
Experienced in migrating SQL databases to Azure Data Lake Storage, Azure Data Factory (ADF), Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, as well as controlling and granting database access and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Involved in software development, data warehousing, analytics, and data engineering projects using Hadoop, MapReduce, Hive, and other open-source tools and technologies.
Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
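The sequential-and-parallel scheduling idea behind those Airflow DAGs can be sketched without Airflow itself: given each task's upstream dependencies, compute the "waves" of tasks that may run in parallel. The task names below are hypothetical.

```python
def execution_waves(deps):
    """Group tasks into waves: tasks within a wave may run in parallel,
    and each wave waits for all previous waves (Kahn-style topological sort)."""
    remaining = {task: set(upstream) for task, upstream in deps.items()}
    waves = []
    while remaining:
        ready = sorted(t for t, ups in remaining.items() if not ups)
        if not ready:
            raise ValueError("cycle detected in task dependencies")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for ups in remaining.values():
            ups.difference_update(ready)
    return waves

# extract must finish first; the two loads run in parallel; report runs last.
dag = {
    "extract": [],
    "load_orders": ["extract"],
    "load_users": ["extract"],
    "report": ["load_orders", "load_users"],
}
print(execution_waves(dag))
# → [['extract'], ['load_orders', 'load_users'], ['report']]
```

In Airflow the same shape is expressed with operators and `>>` dependencies; the scheduler derives the parallelism automatically.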