DATA ENGINEER ANALYST
Data Engineering professional with over *+ years of experience building data pipelines, working within agile methodology, interpreting and analyzing trends in data, and building machine learning models.
Knowledgeable in Azure Data Services, including Azure SQL Database, Azure Data Lake, and Azure Synapse Analytics, with demonstrated success in designing and implementing data solutions on the Azure platform.
Proficient in database management with Oracle, SQL Server, MySQL, and DB2, and specialized in creating Ab Initio graphs to transfer and fetch data from sources to Teradata and flat files.
Accomplished in developing complex SQL applications, managing cloud platforms (GCP, AWS), and deploying on Linux-based environments, complemented by proficiency in service-oriented architectures, API design, CI/CD, and testing frameworks for seamless system integration and reliability.
Strong skills in data visualization and business intelligence using Tableau, Power BI, and Data Studio, alongside Python and PySpark scripting for data analysis and insight communication.
AREAS OF EXPERTISE
Python Scripting Expertise
SQL Query Optimization
Data Integration Strategies
AWS, AZURE, GCP
AWS Cloud Services
ETL/ELT pipelines
Cloud Data Engineering
Azure Synapse Analytics
Data Visualization Strategies
Data Pipeline Development
AI and ML Integration
Data Security Protocols
PROFESSIONAL EXPERIENCE
Vanguard, Texas Jun 2023 – Present
Data Engineer
Developed end-to-end data pipelines using PySpark and Python on Azure Databricks, increasing data processing efficiency by 30%.
Built scalable workflows on Azure Databricks and AWS Glue, reducing migration time by 50%.
Automated data pipelines for parsing and storing raw data into partitioned Hive tables, enhancing data retrieval for reporting by 20%.
Designed and implemented batch and real-time ETL pipelines, improving data handling across Hadoop and AWS data lakes.
Facilitated complex transformations using PySpark and Spark SQL, resulting in faster processing and enhanced accuracy.
Optimized SQL queries and used Python libraries (Pandas, NumPy) for data manipulation, achieving a 40% reduction in execution times.
Orchestrated data workflows with Apache Airflow for efficient scheduling and monitoring across cloud platforms.
Streamlined data preparation tasks with Alteryx, improving data processing efficiency for reporting.
Formulated predictive models using Python and Spark MLlib, enhancing business decision-making through data-driven insights.
Integrated interactive Tableau dashboards connected to AWS RDS and Azure SQL, providing actionable insights tailored to business needs.
Designed and deployed data pipelines across AWS and Azure (AWS Glue, S3, Redshift, HDInsight, Data Factory, Synapse), supporting robust data movement and transformation across platforms.
JP Morgan Chase, India Apr 2019 – Jun 2021
Data Engineer
Automated ETL processes using SQL Server Integration Services (SSIS) and DevOps, optimizing performance and reducing manual intervention.
Migrated on-premises databases to AWS Redshift, configuring data loads from S3 with AWS Data Pipeline for improved scalability.
Collaborated with cross-functional teams to implement comprehensive data migration plans, ensuring efficient transitions across environments.
Crafted business intelligence solutions using SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS), enhancing data visualization for decision-making.
Implemented AWS Lambda for event-driven Python scripts, optimizing automation tasks through precise AWS CLI configurations.
Managed multi-cloud environments (AWS and GCP) for deploying SQL applications, ensuring effective resource management.
Created advanced SQL applications, optimizing queries for performance on cloud platforms.
Conducted system analysis to identify technical solutions aligned with business objectives, achieving a 15% increase in operational efficiency.
Wipro Limited, India Apr 2017 – Mar 2019
Data Analyst
Maintained ETL processes for integrating patient data from Electronic Health Records (EHR), ensuring timely and accurate data availability for clinical decision-making.
Conducted rigorous data quality checks, achieving a 99.9% accuracy rate in patient data processing, enhancing compliance with healthcare regulations.
Created a supervised machine learning model to predict customer wishlists, applying data cleaning and feature engineering to historical startup data to surface insights.
Leveraged AWS (Amazon Redshift, S3, Glue) for scalable data architecture, enabling effective storage, processing, and analysis.
Executed data workflows using AWS Step Functions, Lambda, and Batch, resulting in improved operational efficiency.
Engineered interactive dashboards in Tableau and Power BI to visualize key performance indicators (KPIs) for patient care, reducing report generation time by 30%.
Deployed a real-time data ingestion pipeline using Apache Kafka, enabling timely monitoring of patient vitals and treatment responses, decreasing response times by 20%.
Monitored system health and data processes with AWS CloudWatch and developed high-performance RESTful API endpoints for SageMaker models, ensuring data availability and protection.
Enhanced testing infrastructure, reducing test time by 50% while achieving over 90% coverage and streamlining data cleaning for NLP models.
Collaborated with data scientists to implement data encryption protocols, ensuring data integrity during analysis and model training.
PROJECTS
Real-time Data Processing System:
Developed a real-time data processing system utilizing AWS Kinesis and Lambda, along with Azure Functions for serverless computing, enabling real-time analytics and decision-making.
Predictive Maintenance Model:
Engineered a predictive maintenance model using Python and Azure services (Databricks, Synapse Analytics) to forecast equipment failures, reducing downtime by 25%.
Scalable Data Lake for Healthcare Analytics:
Designed and executed a scalable data lake using GCP Cloud Storage to centralize healthcare data.
Leveraged GCP Dataflow for real-time data capture and transformation, and applied BigQuery for large-scale data analysis, boosting query performance by 50%.
EDUCATION
Master of Science in Computer and Information Systems May 2023
Rivier University Nashua, NH
TECHNICAL SKILLS
Programming Languages: Python, R, Java, C, PySpark, SQL, HTML, Scala, Go, ReactJS, JavaScript
Competencies & Tools: Jupyter, Databricks, Docker, RStudio, Git, PyCharm, Terraform, VS Code, Power BI, Tableau
IDEs: PyCharm, Jupyter Notebook, Visual Studio
Big Data Ecosystem: Hadoop, Hive, MapReduce, Apache Airflow, Apache Kafka, Apache Spark, Apache Flink, Databricks, Kinesis
Cloud Technologies: AWS (EC2, S3, Lambda, Glue, Athena, AWS Data Pipeline, Redshift), Azure (ADF, Azure Data Lake, Databricks, Azure DevOps, Azure SQL, Azure Synapse), Alteryx, Ab Initio, GCP (BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage, Cloud Functions)
Visualizations: Tableau, Power BI, Excel
Packages & Data Processing: NumPy, Pandas, Matplotlib, Seaborn, TensorFlow, PySpark, Data Pipelines, Jenkins
Databases: SQL Server, PostgreSQL, MongoDB, Cassandra, MySQL, Snowflake, Oracle
Machine Learning: ANN, CNN, LLM models, Statistics (hypothesis testing, ANOVA), Regression (Linear, Logistic, Ridge, Lasso), Decision Tree, Random Forest, XGBoost, K-Means, KNN, Support Vector Machine, A/B Testing, NLP, scikit-learn, Seaborn, Plotly
CERTIFICATIONS
Microsoft Certified: Azure Data Engineer Associate
AWS Certified Data Engineer - Associate
SHRAVANI CH
Mobile: +1-470-***-**** Email: ********.*****@*****.*** LinkedIn: LinkedIn Profile