DATA ENGINEER ANALYST
Data Engineering professional with over *+ years of experience building data pipelines, working within agile methodology, interpreting and analyzing trends in data, and building machine learning models.
Knowledgeable in Azure Data Services, including Azure SQL Database, Azure Data Lake, and Azure Synapse Analytics, with demonstrated success in designing and implementing data solutions on the Azure platform.
Proficient in database management with Oracle, SQL Server, MySQL, and DB2, and specialized in creating Ab Initio graphs to transfer and fetch data from sources to Teradata and flat files.
Accomplished in developing complex SQL applications, managing cloud platforms (GCP, AWS), and deploying on Linux-based environments, complemented by proficiency in service-oriented architectures, API design, CI/CD, and testing frameworks for seamless system integration and reliability.
Strong skills in data visualization and business intelligence using Tableau, Power BI, and Data Studio, alongside Python and PySpark scripting for data analysis and insight communication.
AREAS OF EXPERTISE
Python Scripting Expertise
SQL Query Optimization
Data Integration Strategies
AWS, AZURE, GCP
AWS Cloud Services
ETL/ELT pipelines
Cloud Data Engineering
Azure Synapse Analytics
Data Visualization Strategies
Data Pipeline Development
AI and ML Integration
Data Security Protocols
PROFESSIONAL EXPERIENCE
Vanguard, Texas Jun 2023 – Present
Data Engineer
Developed end-to-end data pipelines using PySpark and Python on Azure Databricks, increasing data processing efficiency by 30%.
Built scalable workflows on Azure Databricks and AWS Glue, reducing migration time by 50%.
Automated data pipelines for parsing and storing raw data into partitioned Hive tables, enhancing data retrieval for reporting by 20%.
Designed and implemented batch and real-time ETL pipelines, improving data handling across Hadoop and AWS data lakes.
Facilitated complex transformations using PySpark and Spark SQL, resulting in faster processing and enhanced accuracy.
Optimized SQL queries and used Python libraries (Pandas, NumPy) for data manipulation, achieving a 40% reduction in execution times.
Orchestrated data workflows with Apache Airflow for efficient scheduling and monitoring across cloud platforms.
Streamlined data preparation tasks with Alteryx, improving data processing efficiency for reporting.
Formulated predictive models using Python and Spark MLlib, enhancing business decision-making through data-driven insights.
Integrated interactive Tableau dashboards connected to AWS RDS and Azure SQL, providing actionable insights tailored to business needs.
Designed and deployed data pipelines across AWS and Azure (AWS Glue, S3, Redshift, HDInsight, Data Factory, Synapse), supporting robust data movement and transformation across platforms.
JP Morgan Chase, India Apr 2019 – Jun 2021
Data Engineer
Automated ETL processes using SQL Server Integration Services (SSIS) and DevOps, optimizing performance and reducing manual intervention.
Migrated on-premises databases to AWS Redshift, configuring data loads from S3 with AWS Data Pipeline for improved scalability.
Collaborated with cross-functional teams to implement comprehensive data migration plans, ensuring efficient transitions across environments.
Crafted business intelligence solutions using SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS), enhancing data visualization for decision-making.
Implemented AWS Lambda for event-driven Python scripts, optimizing automation tasks through precise AWS CLI configurations.
Managed multi-cloud environments (AWS and GCP) for deploying SQL applications, ensuring effective resource management.
Created advanced SQL applications, optimizing queries for performance on cloud platforms.
Conducted system analysis to identify technical solutions aligned with business objectives, achieving a 15% increase in operational efficiency.
Wipro Limited, India Apr 2017 – Mar 2019
Data Analyst
Maintained ETL processes for integrating patient data from Electronic Health Records (EHR), ensuring timely and accurate data availability for clinical decision-making.
Conducted rigorous data quality checks, achieving a 99.9% accuracy rate in patient data processing, enhancing compliance with healthcare regulations.
Created a supervised machine learning model to predict customer wishlists, applying data cleaning and feature engineering to historical startup data to surface insights.
Leveraged AWS (Amazon Redshift, S3, Glue) for scalable data architecture, enabling effective storage, processing, and analysis.
Executed data workflows using AWS Step Functions, Lambda, and Batch, resulting in improved operational efficiency.
Engineered interactive dashboards in Tableau and Power BI to visualize key performance indicators (KPIs) for patient care, reducing report generation time by 30%.
Deployed a real-time data ingestion pipeline using Apache Kafka, enabling timely monitoring of patient vitals and treatment responses, decreasing response times by 20%.
Monitored system health and data processes with AWS CloudWatch and developed high-performance RESTful API endpoints for SageMaker models, ensuring data availability and protection.
Enhanced testing infrastructure, reducing test time by 50% while achieving over 90% coverage and streamlining data cleaning for NLP models.
Collaborated with data scientists to implement data encryption protocols, ensuring data integrity during analysis and model training.
PROJECTS
Real-time Data Processing System:
Developed a real-time data processing system utilizing AWS Kinesis and Lambda, along with Azure Functions for serverless computing, enabling real-time analytics and decision-making.
Predictive Maintenance Model:
Engineered a predictive maintenance model using Python and Azure services (Databricks, Synapse Analytics) to forecast equipment failures, reducing downtime by 25%.
Scalable Data Lake for Healthcare Analytics:
Designed and executed a scalable data lake using GCP Cloud Storage to centralize healthcare data.
Leveraged GCP Dataflow for real-time data capture and transformation, and applied BigQuery for large-scale data analysis, boosting query performance by 50%.
EDUCATION
Master of Science in Computer and Information Systems May 2023
Rivier University Nashua, NH
TECHNICAL SKILLS
Programming Languages: Python, R, Java, C, PySpark, SQL, HTML, Scala, Go, ReactJS, JavaScript
Competencies & Tools: Jupyter, Databricks, Docker, RStudio, Git, PyCharm, Terraform, VS Code, Power BI, Tableau
IDEs: PyCharm, Jupyter Notebook, Visual Studio
Big Data Ecosystem: Hadoop, Hive, MapReduce, Apache Airflow, Apache Kafka, Apache Spark, Apache Flink, Databricks, Kinesis
Cloud Technologies: AWS (EC2, S3, Lambda, Glue, Athena, AWS Data Pipeline, Redshift), Azure (ADF, Azure Data Lake, Databricks, Azure DevOps, Azure SQL, Azure Synapse), Alteryx, Ab Initio, GCP (BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage, Cloud Functions)
Visualizations: Tableau, Power BI, Excel
Packages & Data Processing: NumPy, Pandas, Matplotlib, Seaborn, TensorFlow, PySpark, Data Pipelines, Jenkins
Databases: SQL Server, PostgreSQL, MongoDB, Cassandra, MySQL, Snowflake, Oracle
Machine Learning: ANN, CNN, LLM models, Statistics (hypothesis testing, ANOVA), Regression (Linear, Logistic, Ridge, Lasso), Decision Tree, Random Forest, XGBoost, K-Means, KNN, Support Vector Machine, A/B Testing, NLP, scikit-learn, Seaborn, Plotly
CERTIFICATIONS
Microsoft Certified: Azure Data Engineer Associate
AWS Certified Data Engineer - Associate
SHRAVANI CH
Mobile: +1-470-***-**** Email: ********.*****@*****.*** LinkedIn: LinkedIn Profile