Post Job Free

Data Engineer Machine Learning

Location:
Hyderabad, Telangana, India
Posted:
October 15, 2025


Resume:

RAJESWARI R

Email: **************@*****.*** Mobile: +1-512-***-****

PROFESSIONAL SUMMARY

● 5+ years of experience as a Data Engineer with deep expertise in designing, building, and optimizing large-scale data pipelines and cloud-based data architectures across AWS and Azure environments.

● Advanced cloud data engineering skills with hands-on experience in AWS (Redshift, Glue, EMR, SageMaker, Kinesis) and Azure (ADF, Synapse, Databricks, Event Hub), delivering scalable ETL and real-time streaming solutions.

● Proven track record of performance optimization, reducing data processing times by up to 50%, improving query efficiency, and cutting storage and compute costs by millions annually.

● Expertise in big data, analytics, and machine learning deployment, leveraging Spark, PySpark, Delta Lake, and SageMaker/Azure ML to enable predictive analytics and data-driven decision-making.

● Strong focus on security, compliance, and automation, implementing IAM, RBAC, encryption, and CI/CD pipelines to ensure highly reliable, auditable, and secure data solutions.

TECHNICAL SKILLS

Cloud Platforms (AWS) – Redshift, Lambda, SageMaker, Step Functions, Glue, Glue Catalog, KMS, S3, Kinesis, EMR, Lake Formation, CodePipeline, IAM, CloudWatch

Cloud Platforms (Azure) – Cosmos DB, Azure SQL, Stream Analytics, Azure Databricks, Azure DevOps, AKS, Purview, Event Hub, Synapse Analytics, Azure Data Factory (ADF)

ETL & Data Warehousing – Snowflake, Dimensional Modeling, dbt, Change Data Capture (CDC), Apache Airflow, Azure Synapse Analytics, AWS Glue, Amazon Redshift, Google BigQuery, Azure Data Factory

Big Data & Analytics – Delta Lake, PySpark, Apache Spark, Parquet, Hadoop, Azure Databricks, Apache Airflow, AWS EMR

Databases – MySQL, PostgreSQL, Amazon Aurora, SQL Server, Amazon Redshift, Cosmos DB, NoSQL, Azure SQL Database

Programming Languages – Bash (Shell Scripting), Scala, SQL, Java, Python, PySpark

Business Intelligence Tools – QuickSight, Power BI, Tableau

Containerization & Deployment – Kubernetes (EKS, AKS), Jenkins, Docker, AWS CodePipeline, GitHub, GitHub Actions, CI/CD, Git

Methodologies – Infrastructure as Code (IaC), Scrum, Agile, DevOps

PROFESSIONAL EXPERIENCE

State Street May 2024 - Present

Data Engineer II Phoenix, AZ

● Architected and implemented data pipelines using AWS Glue, processing over 12 TB of financial data weekly and transforming it for analysis in Amazon S3.

● Optimized Apache Spark workflows on AWS EMR, cutting data processing time by 40% for daily financial transaction analysis.

● Migrated critical workloads to AWS Redshift, improving query performance for over 200 users and reducing storage costs by $3M annually.

● Developed real-time processing pipeline using AWS Kinesis and Lambda, handling 150K financial data events per minute with 99.99% availability.

● Utilized AWS SageMaker to deploy machine learning models, forecasting market trends with 95% accuracy and supporting financial decision-making.

● Monitored data pipeline health with AWS CloudWatch, reducing ETL job failures by 25% and increasing processing reliability.

● Automated deployment of big data solutions with AWS CodePipeline, reducing time-to-deployment from 5 hours to under 1 hour.

● Designed secure data environments using IAM, ensuring compliance with SOC 2 and ISO 27001 standards across cloud infrastructure.

● Improved data warehouse performance in Redshift with custom UDFs, reducing query latency for financial reporting by 30%.

● Implemented data lineage tracking, ensuring full visibility into the data flow from source to destination, supporting audit requirements.

Johnson & Johnson Oct 2021 - Aug 2023

Data Engineer I Bangalore, India

● Architected data pipelines in Azure Data Factory (ADF), processing 15 TB of healthcare data per month, integrating it into Azure Synapse and on-prem systems.

● Optimized Spark-based ETL jobs in Azure Synapse Analytics, processing 4 TB of data daily and reducing batch job execution time by 45%.

● Built real-time data streaming platform using Azure Event Hubs and Azure Functions, supporting 100+ healthcare data integration points with 99.99% uptime.

● Migrated 70% of on-prem data to Azure Blob Storage and Azure Data Lake Storage (ADLS), cutting data storage costs by $1.5M annually.

● Engineered Azure SQL Database and Azure SQL Data Warehouse (Synapse) solutions for operational and analytical workloads, handling 20 TB of data.

● Automated deployment of Spark-based data processing jobs in Azure Databricks using Azure DevOps, cutting deployment time from 4 hours to 30 minutes.

● Deployed machine learning models in Azure ML, providing predictive analytics for patient outcomes across 50+ healthcare facilities.

● Implemented data quality testing and validation workflows with Azure Monitor, reducing data processing errors by 20%.

● Enforced security compliance for data processing using Azure Active Directory and RBAC, meeting HIPAA standards for sensitive data.

● Managed Azure Key Vault to securely handle secrets, reducing risk of unauthorized access by 30% during data integration tasks.

Nationwide Insurance Sep 2019 – Sep 2021

Data Engineer I Bangalore, India

● Designed and implemented ETL pipelines with AWS Glue, processing over 10 TB of data monthly from multiple sources into Amazon S3 for storage and transformation.

● Optimized data models in Redshift and AWS RDS, reducing query latency by 40% and enhancing data access speed for 500+ business users.

● Engineered Apache Spark jobs on AWS EMR, processing 5 TB of data daily, cutting batch job execution time from 8 hours to 4 hours.

● Developed real-time streaming architecture using AWS Kinesis and Lambda, enabling processing of 100K events per minute for fraud detection.

● Migrated 80% of on-prem data infrastructure to AWS with AWS Migration Hub, reducing infrastructure costs by $2M annually.

● Automated scaling of data processing workloads using EC2 and EKS, reducing compute costs by 35% while maintaining high throughput.

● Deployed CI/CD pipelines via AWS CodePipeline and Jenkins, automating the deployment of 20+ data engineering solutions per month.

● Enabled ad-hoc querying with AWS Athena on 5 TB of structured data stored in S3, reducing query execution time from 30 minutes to 3 minutes.

● Orchestrated complex ETL workflows with AWS Step Functions, improving process efficiency and reducing data processing failures by 25%.

● Implemented IAM policies and encryption standards in AWS S3 and Redshift, ensuring compliance with GDPR and PCI DSS regulations.

● Enhanced data warehouse performance in Redshift by creating custom UDFs, reducing query processing time for large datasets by 50%.

● Integrated Kafka and Kinesis for real-time data streaming, supporting 100+ event-driven systems with zero data loss.

● Developed custom Spark jobs on EMR to process over 10 TB of transactional data, improving analytics turnaround time by 30%.

● Enabled data sharing across teams using Snowflake, facilitating collaboration and reducing data replication by 40%.

● Established comprehensive data lineage tracking, enhancing audit capabilities and ensuring traceability for 50+ data pipelines.

EDUCATION

Master of Science in Computer Science May 2025

Texas A&M University - Corpus Christi
