
Data Engineer with AWS & Big Data Expertise

Location:
Chicago, IL
Salary:
80000
Posted:
December 17, 2025


Resume:

Vivek Vardhan Rao Sirikonda

Data Engineer

Email ID: ************************@*****.*** / Contact: +1-312-***-****

Summary:

Around 5 years of experience as a Data Engineer across the entire SDLC, including data science, data collection, data cleaning, data manipulation, data mining, data validation, data visualization, analytics modeling, and designing algorithms.

Extensively worked with Python and R using various packages, and with MATLAB using various toolboxes.

Solid ability to write and optimize diverse SQL queries, and working knowledge of RDBMS and NoSQL DBMS.

Experience working with Spark and its components, PySpark, Python, Scala, Git, CI/CD, Jenkins, Ansible, CloudFormation templates, SQL, NoSQL, Kafka, Hive, HBase, HDFS, and MapReduce.

Good knowledge of Amazon AWS services such as EMR, S3, Lambda, triggers, Glue, Step Functions, CloudWatch Events, Redshift, RDS, Athena, Kinesis Data Firehose, Kinesis Data Streams, and EC2 web services, which provide fast and efficient processing.

Strong experience in core Python, Scala, SQL, PL/SQL, and RESTful web services.

Worked with a wide range of data formats, including structured data (relational and non-relational data such as time-series, spatial, and graph data), semi-structured data, and unstructured data.

Strong knowledge of Big Data environments, using tools like Hive and cloud-based data warehouses such as Teradata, Snowflake, Amazon Redshift, Amazon S3, Google BigQuery, and Azure Synapse Analytics.

Proficient in using different AI/ML platforms like Anaconda (Jupyter Notebook, Spyder), Google Colab, TensorFlow, Alteryx, Dataiku, RapidMiner, KNIME, Google Vertex AI & AutoML for data pre-processing and model prediction.

Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, SNS, SQS).

Worked with AtScale on Snowflake and created live connections between various teams and data.

Experienced in MLOps concepts (deployment, monitoring, maintenance) using Amazon SageMaker, Databricks, Google Cloud AI Platform, and Azure Databricks.

Good experience writing MapReduce jobs using native Java code, Pig, and Hive for business use cases.

Experience implementing NoSQL database technologies, including MongoDB, Cassandra, and HBase.

Highly skilled in statistical methodologies, including experimental design and hypothesis tests such as ANOVA, t-test, F-test, chi-square test, and A/B testing, to validate business assumptions through data-driven analysis using Python and Excel.

Expertise in developing Spark applications using Spark SQL and PySpark in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch of this pattern appears at the end of this summary).

Familiarity with machine learning algorithms such as Linear Regression, Logistic Regression, Decision Tree, Random Forest, Naive Bayes, KNN, SVM, Neural Networks, CNN, RNN, K-means, and Hierarchical clustering.

Hands-on experience building sophisticated models to address business problems using machine learning, segmentation, time series, and other advanced analytical/statistical methodologies in retail, healthcare, telecom, insurance, and financial services.

Performed data analysis and applied ML models in various fields, including medical diagnosis, product recommendation, fraud detection, speech recognition, image recognition, and stock market trading.

Experienced in ML model testing, deployment, monitoring, and maintenance.

Excellent communication skills and understanding of the System Development Life Cycle (SDLC) with Agile, Scrum, and Waterfall methodologies.

Experienced with Integration Services (SSIS), Reporting Services (SSRS), and Analysis Services (SSAS).

Work successfully in fast-paced, multitasking environments, both independently and in collaborative teams; a self-motivated, enthusiastic learner.
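
For illustration, a minimal sketch of the Databricks-style PySpark/Spark SQL aggregation pattern described above; the file path, table, and column names are hypothetical and not taken from any specific project.

```python
# Hypothetical PySpark aggregation of customer usage events.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Read raw usage events (any supported file format would work similarly).
events = spark.read.parquet("s3://example-bucket/usage/events/")

# DataFrame API: aggregate usage per customer per day.
daily_usage = (
    events
    .withColumn("event_date", F.to_date("event_timestamp"))
    .groupBy("customer_id", "event_date")
    .agg(
        F.count("*").alias("event_count"),
        F.sum("duration_seconds").alias("total_duration_seconds"),
    )
)

# The same logic expressed with Spark SQL.
events.createOrReplaceTempView("events")
daily_usage_sql = spark.sql("""
    SELECT customer_id,
           to_date(event_timestamp) AS event_date,
           COUNT(*)                 AS event_count,
           SUM(duration_seconds)    AS total_duration_seconds
    FROM events
    GROUP BY customer_id, to_date(event_timestamp)
""")
```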

Technical Skills:

Big Data

HDFS, HBase, Hive, MapReduce, Spark, Cassandra, Scala, Git, CI/CD, Jenkins, Kafka, Ansible

Languages

Python (Jupyter Notebook, PyCharm IDE), R, C, C++, SAS

Cloud Computing Tools

Azure, AWS (S3, EC2, Lambda, Triggers, Glue, EMR, Step Functions, CloudWatch Events, Redshift, RDS, Athena, Kinesis Data Firehose, Kinesis Data Streams)

ETL tools

TensorFlow tf.data API, PySpark

Modelling and Architect Tools

Star-Schema, Snowflake-Schema Modelling, FACT and dimension tables, Pivot Tables

Databases

Snowflake Cloud Database, Oracle, MS SQL Server, Teradata, MySQL, DB2

Database Tools

SQL Server Data Tools, Visual Studio, Spotlight, SQL Server Management Studio, Query Analyzer

Reporting Tools

MS Excel, Tableau, Tableau server, Tableau Reader, Power BI

Machine Learning Algorithms

Logistic Regression, Linear Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors, Random Forests, Gradient Boosted Decision Trees, Naive Bayes, K-Means Clustering, and Hierarchical Clustering

Deep Learning

Convolutional Neural Networks (CNN), LSTM, GRU

Work Experience:

Client: Genesis Healthcare, Zanesville, OH Mar 2024 – Present

Role: Data Engineer

Responsibilities:

Gathered business requirements and worked with business users and co-workers to define the objectives and goals of the data analysis.

Worked with data engineers and ETL pipelines to collect the datasets required for job requirements and to solve business problems (databases, data warehouses, APIs, flat files).

Queried data retrieved from various databases and data warehouses using SQL and Pandas SQL, depending on the data structure and file format.

Primarily used Python to fetch data from data warehouses (AWS Redshift, Snowflake) and performed data cleaning, manipulation, profiling, and mining.

Worked in Google Colab and the Anaconda platform (Jupyter Notebook) with Python libraries such as NumPy, Pandas, SciPy, Matplotlib, Seaborn, and Scikit-learn for data preparation, data validation, advanced analytics, visualization, and building predictive models.

Developed Azure Databricks notebooks to apply business transformations and perform data cleansing operations.

Implemented statistical techniques and Machine learning algorithms covering Classification, Regression, Clustering, Time series Forecasting, Deep Learning and Natural Language Processing to create new, scalable solutions for business problems.

Designed and developed Azure Data Factory (ADF) pipelines extensively for ingesting data from different source systems, both relational and non-relational, to meet business functional requirements.

Built various predictive/explanatory models based on domain knowledge and customer business objectives.

Performed k-fold cross-validation and computed evaluation metrics to avoid overfitting and select the model best suited to the business requirements (see the cross-validation sketch after this list).

Worked on AWS EMR to run Spark/Python notebooks through various cloud data pipelines.

Used DAX queries to create calculated measures and calculated columns.

Used DevOps to integrate AWS services and SQL databases.

Parsed data from S3 through Python API calls via Amazon API Gateway, generating batch sources for processing (see the S3 batch-loading sketch after this list).

Also used Alteryx and Dataiku for data preparation, advanced analytics, and predictive modeling, including their visual interfaces for data preparation and a range of analytics tools and algorithms.

Lifted and shifted SSIS packages using ADF.

Worked with stakeholders to gain more insight into supply chain and risk management.

Conducted data/statistical analysis and generated Transaction Performance Reports on a monthly and quarterly basis for all transactional data from the U.S., Canada, and Latin America markets using Tableau and Power BI.

Migrated data from different sources to sinks with the help of Azure Data Factory.

Worked with the operations team on MLOps to deploy, monitor, and maintain machine learning models in a production environment using Amazon SageMaker or Databricks.

Collaborated with research analysts, state agencies, business stakeholders, and software/data engineers to deliver analytical solutions.
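
As referenced in the cross-validation bullet above, a minimal sketch of k-fold cross-validation with scikit-learn; the features, target, and model choice are placeholders rather than project specifics.

```python
# Hypothetical k-fold cross-validation to estimate generalization and avoid overfitting.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(500, 10)           # placeholder feature matrix
y = np.random.randint(0, 2, 500)      # placeholder binary target

model = RandomForestClassifier(n_estimators=200, random_state=42)

# 5-fold cross-validation; averaging over folds gives a less optimistic
# estimate of model quality than a single train/test split.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("AUC per fold:", scores)
print(f"Mean AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```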
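
Likewise, a hedged sketch of pulling objects from S3 into a batch for processing, as mentioned in the S3/API Gateway bullet; the bucket, prefix, and JSON layout are assumptions for illustration only.

```python
# Hypothetical batch loader: list S3 objects under a prefix and parse each JSON file.
import json
import boto3

s3 = boto3.client("s3")

def load_batch(bucket, prefix):
    """Collect all JSON objects under the given prefix into one in-memory batch."""
    batch = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            batch.append(json.loads(body))
    return batch

records = load_batch("example-bucket", "incoming/2024/")
```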

Domain: Retail (Descriptive Analysis, Predictive & Prescriptive Analysis, Exploratory Analysis).

Environment: Customer personalization & behavior analysis, customer segmentation, Azure Databricks, product grouping, price optimization, fraud detection, inventory optimization, demand forecasting, supply chain management, sales & marketing analysis, risk management, speech recognition in stores.

Client: Kanerika Inc, India Dec 2021 - July 2023

Role: Data Engineer

Responsibilities:

Involved in complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.

Assisted in building data models and ETL pipelines for extracting, transforming, and storing data in GCP (Google BigQuery) and Amazon Redshift data warehouses.

Performed custom data mapping, integration, and transformation processes using Apache Spark.

Analyzed and prepared data, identifying patterns in datasets by applying historical models.

Mentored a highly immersive data analytics program involving data fetching, data manipulation and preparation, data profiling, data mining, visualization, and machine learning models using SQL and Python libraries on the Anaconda (Jupyter Notebook) platform.

Responsible for writing data quality checks, based on the existing source code, using Python and the PySpark DataFrame framework on the Databricks platform, which improved processing time (a sketch of such checks follows this list).

Built pipelines to copy data from source to sink in Azure Data Factory (ADF).

Provided training on the application of various statistical distributions and machine learning algorithms such as logistic regression, decision trees, random forest, neural networks, SVM, and clustering to identify volume, using the scikit-learn package.

Built data pipelines in Airflow on GCP for ETL-related jobs (see the DAG sketch after this list).

Experience with GCP Dataproc, GCS, Cloud Functions, and Google BigQuery.

Mentored juniors in developing and deploying machine learning models using Google Cloud Vertex AI and Google AutoML.

Used Python and R to develop various machine learning algorithms for prediction, and improved efficiency and accuracy by evaluating models to forecast data for better results.

Documented project reporting requirements and translated them into functional specifications.

Analyzed sentiment data and detected trends in customer usage and other services to make forecasts using NLP.

Implemented Key Performance Indicator (KPI) Objects, Actions, Hierarchies and Attribute Relationships to measure the performance of an organization, department and project.

Worked with Agile methodologies (JIRA) and presented various analytical dashboards to higher management for insights using Power BI and Tableau.

Generated SQL and PL/SQL scripts to create and drop database objects, including tables, views, primary keys, indexes, and constraints.

Transformed data in Azure Data Factory using ADF transformations.

Collaborated with the operations team and developers in an MLOps environment on deployment, monitoring, and maintenance using Google Cloud AI or Databricks.
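
As noted in the data quality bullet above, a minimal sketch of PySpark data quality checks of the kind run in Databricks; the table name, columns, and rules are hypothetical.

```python
# Hypothetical PySpark data-quality checks before downstream processing.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.table("raw.customer_orders")   # placeholder table

checks = {
    "null_order_ids": df.filter(F.col("order_id").isNull()).count(),
    "duplicate_order_ids": df.count() - df.dropDuplicates(["order_id"]).count(),
    "negative_amounts": df.filter(F.col("order_amount") < 0).count(),
}

failures = {name: count for name, count in checks.items() if count > 0}
if failures:
    raise ValueError(f"Data quality checks failed: {failures}")
```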
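
And, as referenced in the Airflow bullet, a bare-bones Airflow DAG sketch for an ETL job; the DAG id, schedule, and task callables are placeholders.

```python
# Hypothetical daily ETL DAG: extract -> transform -> load.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    pass  # pull data from the source system

def transform():
    pass  # apply business transformations

def load():
    pass  # write results to the warehouse (e.g., BigQuery)

with DAG(
    dag_id="example_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```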

Environment: Telecom - customer personalization, behavior & churn analysis, customer segmentation, revenue analysis, network performance optimization, sales & marketing analysis; Finance - customer personalization & behavior analysis, customer segmentation, credit scoring, fraud detection, risk management.

Client: Envoy Global, India Mar 2020 - Nov 2021

Role: Data Engineer

Responsibilities:

Created a Redshift cluster, security groups, and a VPC endpoint, and handled multiple AWS Glue jobs to crawl data from S3 to Redshift clusters. Gathered business requirements and prepared technical design documents, target-to-source mapping documents, and mapping specification documents.

Participated in all phases of research including data collection, data cleaning, data mining, visualizations and developing models.

Collaborated with data engineers and the operations team to collect data from internal systems to fit the analytical requirements (databases, data warehouses, APIs, flat files).

Used AWS EMR to transform and move large amounts of data in and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Implemented AWS Step Functions to automate and orchestrate AWS Lambda functions for tasks such as publishing data to S3, databases, and APIs.

Migrated the existing architecture to Amazon Web Services, utilizing technologies such as Kinesis, Redshift, AWS Lambda, and CloudWatch metrics.

Worked in GCP (Google BigQuery) and the Snowflake data warehouse for data storage and management.

Performed data processing using NumPy, Pandas, SciPy and ML model prediction using Scikit-learn in Python libraries in Anaconda Jupyter notebook platform.

Involved in data modeling using Visio and in the ETL process of extracting, transforming, and loading data from source systems to the target system (using Talend or Informatica) with the help of the IT team.

Performed data analysis and maintenance on information stored in MySQL, MS SQL Server, Oracle, and NoSQL (MongoDB) databases, depending on the data structure and file format, using SQL / T-SQL / PL/SQL.

Redefined many attributes and relationships and cleansed unwanted tables/columns using SQL and MQL queries (MongoDB for NoSQL DBMS and SQL for RDBMS).

Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python for developing various machine learning models and utilized algorithms such as Decision Trees, Logistic regression, SVM and Random Forest.

Presented the gained insights and business solutions to business owners, executives, and stakeholders through data analysis and data-driven decision-making using Power BI and Tableau.

Worked in ETL pipelines and loaded data to Hadoop (HDFS), Hive and relational databases.

Worked with Python packages such as NumPy, Pandas, and Matplotlib for data preparation, including data cleaning, manipulation, and mining, and built various graphs for business decision-making.

Created ETL processes to extract and load data into warehouses while working within an Agile framework.

Developed and optimized SQL queries to extract insights from large healthcare data sets.

Implemented and maintained job scheduling and monitoring mechanisms using SQL Server Agent, ensuring timely execution and reliable operation of SSIS packages as per business requirements.

Implemented AWS Lambda functions and a Python script that pulls privacy files from AWS S3 buckets and posts them to the Malibu data privacy endpoints (see the Lambda sketch after this list).

Played an active role in end-to-end implementation of the Hadoop infrastructure using Sqoop, Hive, and Spark and migrated the data from legacy systems (like Teradata, MySQL) into HDFS.
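
As referenced in the Lambda bullet above, a hedged sketch of a Lambda handler that reads a newly arrived S3 object and forwards it to a downstream endpoint; the event shape follows the standard S3 trigger, while the endpoint URL is purely illustrative.

```python
# Hypothetical Lambda: pull the triggering S3 object and POST it to a privacy endpoint.
import json
import urllib.request

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Standard S3 put-event structure: locate the object that triggered the function.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Forward the payload to a downstream endpoint (illustrative URL).
    request = urllib.request.Request(
        "https://example.com/data-privacy-endpoint",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        status = response.status

    return {"statusCode": 200, "body": json.dumps({"forwarded": key, "status": status})}
```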

Environment: Customer’s personalization & behavior analysis, Azure Synapse, customer segmentation, Credit scoring, Fraud detection, Risk management, Chatbot & Customer service.

Education:

Master's in Information Technology and Management from Illinois Institute of Technology, Chicago.

Specialization in Data Engineering

Bachelor's in Computer Science from Lovely Professional University, Punjab.

Specialization in Cloud Transformation


