
Data Engineer Processing

Location:
Texas
Salary:
120000
Posted:
September 10, 2025


Resume:

Data Engineer

Deepthi

*******.*********@*****.***

+1 669 300 (9157)

Professional Summary

7+ years of experience designing, developing, and implementing strategic methods to efficiently solve big data processing requirements and orchestrate data pipelines.

Proficient in Google Cloud Platform services for data processing and analytics, including BigQuery and Dataflow.

Expertise in AWS services like S3, Redshift, and EMR for scalable data solutions in retail analytics.

Skilled in using Azure Data Lake, Azure SQL Database, and Azure Synapse for financial data analytics.

Implemented big data solutions using Hadoop and Spark, processing datasets of over 100TB.

Designed and implemented cloud-native ETL pipelines using AWS Glue and Azure Data Factory.

Managed NoSQL databases such as MongoDB, optimizing data storage for large-scale applications.

Created interactive dashboards using Tableau, providing actionable financial insights.

Ensured HIPAA compliance in healthcare data processing and analytics.

Implemented real-time data streaming using Kafka for immediate data insights in retail.

Ensured cloud security and compliance across GCP, AWS, and Azure environments.

Implemented data governance frameworks in banking, ensuring data accuracy and integrity.

Utilized Google BigQuery for large-scale data analytics across various industries.

Automated data pipelines for efficient data flow and processing in insurance.

Designed and managed data lakes in AWS, Azure, and GCP for centralized data storage.

Built risk assessment models for accurate insurance premium calculations.
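A minimal sketch of this kind of premium calculation, assuming a base-rate-times-risk-multipliers structure; the factor names and weights below are illustrative, not the actual models used:

```python
# Hypothetical premium model: base rate adjusted by simple risk multipliers.
# Factor names and weights are illustrative examples only.

def premium(base_rate: float, age: int, claims_last_5y: int, high_risk_region: bool) -> float:
    """Estimate an annual premium from a base rate and simple risk multipliers."""
    multiplier = 1.0
    if age < 25:
        multiplier += 0.30                    # younger policyholders priced higher
    if claims_last_5y > 0:
        multiplier += 0.15 * claims_last_5y   # surcharge per prior claim
    if high_risk_region:
        multiplier += 0.20
    return round(base_rate * multiplier, 2)

print(premium(1000.0, 30, 0, False))  # baseline: 1000.0
print(premium(1000.0, 22, 2, True))   # 1000 * (1 + 0.30 + 0.30 + 0.20) = 1800.0
```

Real actuarial models would draw factors and weights from historical claims data rather than fixed constants.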

Analyzed sensor data for optimizing food processing techniques.

Ensured high data quality standards in all phases of data processing.

Led cross-functional teams in large-scale technology projects, aligning with corporate goals and visions.

Championed Agile methodologies, leading to a 30% increase in project delivery speed and efficiency.

Managed multiple projects simultaneously, consistently delivering within time and budget constraints.

Built and mentored diverse tech teams, fostering an inclusive and innovative work environment.

Skilled in conflict resolution, ensuring team cohesion and effective collaboration.

Successfully engaged with stakeholders at all levels, ensuring alignment and satisfaction with project outcomes.

Led change management initiatives, introducing new technologies and practices with minimal disruption.

Proactively managed project risks, reducing potential impacts on timelines and quality.

Utilized data-driven approaches for strategic decision-making, enhancing project outcomes.

Maintained a strong focus on customer needs, resulting in higher customer satisfaction rates.

Advocated for and implemented continuous improvement practices in all projects.

Demonstrated strong creative problem-solving skills, tackling complex challenges effectively.

Fostered collaboration between technical and non-technical departments, enhancing project synergy.

Adapted quickly to changing technologies and market demands, keeping projects relevant and cutting-edge.

Effectively managed project budgets, optimizing resource allocation and expenditure.

Applied emotional intelligence in leadership, understanding and managing team dynamics for better performance.

Managed multicultural teams, leveraging diverse perspectives for enriched problem-solving.

Implemented effective feedback mechanisms, enabling continuous learning and development.

Optimized resource utilization across projects, maximizing team productivity and project outcomes.

Excelled in strategic planning and execution, ensuring long-term project success and alignment with business objectives.

Demonstrated excellent time management and prioritization skills, handling high workloads efficiently.

Maintained a strong focus on quality assurance, delivering projects that meet and exceed standards.

Initiated and supported professional development programs for team skill enhancement.

Practiced transparent communication, building trust and credibility with teams and stakeholders.

Successfully managed remote teams, ensuring productivity and engagement in a virtual environment.

Effective in crisis management, quickly resolving unforeseen issues with minimal impact on project continuity.

Education

Master of Business Administration from Osmania University, India

Bachelor of Science from SK University, India

Work Experience

MassMutual, Springfield, MA

July 2023 to Present

Senior Data Engineer

Designed and developed several data pipelines to streamline data ingestion and transformation in Azure, building end-to-end ETL and ELT workflows with tools such as Azure Data Factory, Azure Synapse Analytics, and Azure Databricks.

Developed Terraform scripts and deployed them via Cloud Deployment Manager to spin up resources such as cloud virtual networks.

Used Spark SQL to load JSON data, create SchemaRDDs, and load them into Hive tables, handling structured data with Spark SQL.

Fostered collaboration between development, operations, and healthcare client stakeholders, resulting in a 25% improvement in project delivery efficiency.

Successfully navigated healthcare compliance audits, achieving a 95% compliance score, and ensuring adherence to industry regulations.

Spearheaded innovative approaches to healthcare DevOps, ensuring the integration of the latest technologies and methodologies for continual improvement in healthcare systems.

Pioneered the integration of data engineering and DevOps practices, fostering a collaborative environment that resulted in a 25% increase in overall project efficiency.

Automated end-to-end data pipelines, reducing deployment times by 30% and ensuring seamless data flow for healthcare applications.

Azure Cloud Services Management: Extensively utilized Microsoft Azure services for deploying and optimizing finance applications, enhancing system performance and scalability.

Data Warehousing Design and Management: Designed and managed robust data warehousing solutions, supporting comprehensive finance and insurance data analysis.

Efficient Data Movement Strategies: Orchestrated efficient data movement processes, ensuring timely and accurate data availability for finance decision-making.

Data Transformation for Finance Insights: Performed complex data transformations, enabling deeper insights into finance and customer behavior.

Automated Data Ingestion Processes: Implemented automated data ingestion pipelines, enhancing the speed and reliability of data flow into finance and insurance systems.

Data Modeling for Finance Analytics: Developed sophisticated data models tailored to insurance business needs, facilitating effective data-driven strategies.

Data Encryption and Security Protocols: Ensured high levels of data security in finance environments through rigorous data encryption and security protocols.

Implemented scalable and cost-effective data storage solutions using Azure Data Lake Storage, ensuring seamless access and analysis of large datasets.

Finance Data Visualization with BI Tools: Created dynamic and interactive finance dashboards using business intelligence tools, providing actionable insights to management.

Cloud-Based ETL Pipeline Optimization: Optimized ETL pipelines in the cloud, improving data processing efficiency and accuracy.

Monitored and optimized the performance of data pipelines using Azure Monitor and Application Insights, enabling proactive identification and resolution of bottlenecks.

Inventory Optimization Algorithms: Created algorithms for inventory optimization, reducing overheads and improving stock turnover.
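One standard formulation for this kind of inventory optimization is the classic economic order quantity (EOQ); the sketch below is illustrative, as the specific algorithms used are not described here:

```python
import math

def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Classic economic order quantity: sqrt(2 * D * S / H), where D is annual
    demand, S is the fixed cost per order, and H is the annual holding cost
    per unit. Minimizes combined ordering and holding costs."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)

# Example: 10,000 units/year, $50 per order, $2.50/unit/year holding cost.
print(round(eoq(annual_demand=10_000, order_cost=50.0, holding_cost=2.5), 1))  # 632.5
```

Ordering roughly 632 units per order balances the cost of placing orders against the cost of carrying stock, which is how such algorithms reduce overheads and improve turnover.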

SQL and NoSQL Database Management: Expertly managed both SQL and NoSQL databases, catering to diverse data needs in the finance and insurance sector.

Orchestrated security measures for healthcare data, achieving a 20% reduction in vulnerabilities and ensuring compliance with healthcare data regulations.

Enabled continuous integration of data assets, reducing data integration time by 25% and providing real-time insights for healthcare analytics.

Instituted robust data quality checks, resulting in a 15% improvement in data accuracy and reliability for healthcare analytics.
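A minimal sketch of what such record-level quality checks can look like; the field names and rules below (completeness, type, range) are illustrative examples, not the production rule set:

```python
from typing import Any

def quality_check(record: dict[str, Any]) -> list[str]:
    """Return a list of rule violations for one record; empty means it passes.
    Field names and rules here are hypothetical examples."""
    errors = []
    for field in ("patient_id", "visit_date", "charge_amount"):   # completeness
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")
    amount = record.get("charge_amount")
    if isinstance(amount, (int, float)) and amount < 0:           # range check
        errors.append("negative charge_amount")
    return errors

good = {"patient_id": "P1", "visit_date": "2022-01-05", "charge_amount": 120.0}
bad = {"patient_id": "", "visit_date": "2022-01-05", "charge_amount": -5}
print(quality_check(good))  # []
print(quality_check(bad))   # ['missing patient_id', 'negative charge_amount']
```

Running checks like these at ingestion time, and quarantining failing records, is what drives measurable improvements in downstream accuracy.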

Applied DevOps principles to big data projects, achieving a 30% reduction in time-to-market for healthcare data-driven applications.

Implemented real-time data streaming solutions, providing instant insights for healthcare monitoring systems and reducing latency by 25%.

Established a data governance framework, ensuring data integrity and facilitating collaboration between data engineers and DevOps teams for healthcare projects.

Implemented Infrastructure as Data Code (IaDC) principles, treating data infrastructure as code to enhance reproducibility and scalability for healthcare data platforms.

Implemented advanced data monitoring and alerting systems, reducing downtime by 20% and ensuring the reliability of healthcare data pipelines.

Automated data ETL processes, resulting in a 25% reduction in manual intervention and ensuring timely delivery of accurate data for healthcare analytics.

Environment: Azure, Azure Data Lake Storage, Spark, Hive, Hadoop, Python, Scala, Shell Script, BigQuery, SQL, NoSQL, Terraform, Git, Jenkins, IaC, IaDC.

Change Healthcare – Nashville, TN

Nov 2020 to June 2023

Data Engineer

Built and architected multiple data pipelines, implementing end-to-end ETL and ELT processes for data ingestion and transformation using AWS services such as AWS Glue, Amazon S3, and AWS Lambda, while coordinating tasks among the team for seamless project execution.

Loaded data to BigQuery using Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts.

Developed Terraform scripts and deployed them via Cloud Deployment Manager to spin up resources such as cloud virtual networks.

Used Spark SQL to load JSON data, create SchemaRDDs, and load them into Hive tables, handling structured data with Spark SQL.

Deployed Compute Engine instances in public and private subnets, along with autoscalers, in Google Cloud Platform.

Created firewall rules to access Google DataProc from other machines.

Used Python to ingest data into BigQuery.

Involved in porting the existing on-premises Hive code migration to GCP (Google Cloud Platform) BigQuery.

Involved in extracting, transforming, and loading data sets from local to HDFS using Hive.

Imported and exported data from RDBMS to HDFS and vice-versa using Sqoop.

Wrote Python programs to maintain raw file archival in the GCS bucket.

Wrote Scala programs for Spark transformations in Dataproc.

Created complex schema and tables for analyzing using Hive.

Set up the required Hadoop environment for the cluster to run MapReduce jobs.

Formatted data using Hive queries and stored it on HDFS.

Built a Python and Apache Beam program, executed in Cloud Dataflow, to validate data between raw source files and BigQuery tables.
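The core of such a validation job is comparing row counts and row-level fingerprints between source and target. A standalone sketch of that comparison logic (in a real pipeline this would run as Beam transforms in Dataflow; the function and field names here are illustrative):

```python
import hashlib

def row_fingerprint(row: dict) -> str:
    """Stable hash of a row's sorted key=value pairs."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.md5(canonical.encode()).hexdigest()

def validate(source_rows: list, target_rows: list) -> dict:
    """Compare counts and row fingerprints between source and target datasets."""
    src = {row_fingerprint(r) for r in source_rows}
    tgt = {row_fingerprint(r) for r in target_rows}
    return {
        "count_match": len(source_rows) == len(target_rows),
        "missing_in_target": len(src - tgt),
        "unexpected_in_target": len(tgt - src),
    }

source = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]   # e.g. parsed raw file
target = [{"id": 2, "amt": 20}, {"id": 1, "amt": 10}]   # e.g. BigQuery rows
print(validate(source, target))  # all rows accounted for, counts match
```

Fingerprinting makes the comparison order-independent, which matters because BigQuery tables carry no inherent row order.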

Automated deployment and management of AWS infrastructure using Terraform and AWS CloudFormation templates, ensuring consistent and repeatable environments.

Built a configurable Scala and Spark based framework to connect to common data sources such as MySQL, Oracle, PostgreSQL, and Salesforce and load them into BigQuery.

Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments.

Opened an SSH tunnel to Google Dataproc to access the YARN manager and monitor Spark jobs.

Submitted Spark jobs via gsutil for execution on the Dataproc cluster.

Built complex data pipelines using PL/SQL scripts, Cloud REST APIs, Python scripts, GCP Composer, and GCP Dataflow.

Built ETL systems using Python and in-memory computing framework (Apache Spark), scheduling and maintaining data pipelines at regular intervals in Apache Airflow.

Worked on HBase and MySQL for optimizing the data.

Worked over File Sequence, AVRO, and Parquet file formats.

Followed Agile (Scrum) development methodology to develop the application.

Involved in managing the backup and disaster recovery for Hadoop data.

Performed data wrangling to clean, transform, and reshape data using the pandas library.

Analyzed data using SQL, Scala, Python, Apache Spark and presented analytical reports to management and technical teams.

Data Visualization with QuickSight: Created interactive dashboards using Amazon QuickSight for real-time visualization of patient data, improving clinical decision-making processes.

EHR Data Integration: Integrated Electronic Health Records (EHR) with AWS cloud services, streamlining patient data accessibility for healthcare professionals.

Automated Alert Systems: Implemented automated alert systems using AWS SNS for critical patient monitoring, reducing response times in emergency situations.

Serverless Architecture: Utilized AWS Lambda for building serverless applications, reducing operational costs and improving scalability.

Environment: GCP, AWS, Hadoop, Hive, Spark, Apache Beam, Dataproc, Dataflow, HBase, Python, Scala, PL/SQL, Shell Scripting, BigQuery, MySQL, PostgreSQL, Oracle, Salesforce, Apache Airflow, AVRO, Terraform, AWS Lambda

Wayfair – Boston, MA

Apr 2018 to Sep 2020

Data Engineer

Involved in the requirement gathering phase to align business user needs with data processing solutions, ensuring flexibility for evolving project requirements in AWS.

Responsible for managing data from multiple sources.

Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

Worked with Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).

Deployed the project on Amazon EMR with S3 connectivity to provide backup storage.

Designed data and ETL pipeline using Python and Scala with Spark.

Designed and optimized Spark SQL queries and DataFrames; imported data from data sources, performed transformations and read/write operations, and saved results to output directories in HDFS/AWS S3.

Migrated MapReduce programs into Spark transformations using PySpark.

Configured Spark streaming to get ongoing information from Kafka and stored the stream information to HDFS.

Experienced with SparkContext, Spark SQL, Spark, and YARN.

Implemented Spark-SQL with various data sources like JSON, Parquet, ORC and Hive.

Implemented PySpark Scripts, Spark-SQL to access Hive tables into Spark for faster processing of data.

Built dashboards using Tableau to allow internal and external teams to visualize and extract insights from big data platforms.

Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.

Worked on ETL processing, which consists of data transformation, data sourcing and mapping, conversion, and loading.

Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.

Used Amazon Elastic Cloud Compute (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as storage mechanism.

Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
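The map/reduce aggregation pattern over mixed file formats can be sketched in plain Python (the real programs ran as Hadoop MapReduce jobs; field names here are illustrative):

```python
import csv
import io
import json
from collections import defaultdict

def map_record(record: dict) -> tuple:
    """Map phase: emit a (category, amount) pair from one parsed record."""
    return record["category"], float(record["amount"])

def reduce_totals(pairs) -> dict:
    """Reduce phase: sum amounts per category key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Records arriving in two different formats, normalized to dicts.
csv_data = "category,amount\nbooks,10.5\ntoys,3.0\nbooks,4.5\n"
json_data = '[{"category": "toys", "amount": 2.0}]'

records = list(csv.DictReader(io.StringIO(csv_data))) + json.loads(json_data)
print(reduce_totals(map_record(r) for r in records))
# {'books': 15.0, 'toys': 5.0}
```

Because each format is first parsed into a common dict shape, the same map and reduce functions serve XML, JSON, and CSV inputs alike.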

Worked on MongoDB using CRUD (Create, Read, Update, Delete), indexing, replication, and sharding features.

Worked on SQL queries in dimensional data warehouses and relational data warehouses. Performed Data Analysis and Data Profiling using Complex SQL queries on various systems.

Environment: AWS (EC2, S3, EMR), Hadoop, Hive, Spark, Kafka, Python, SQL, MongoDB, Tableau, Shell Scripting, MapReduce, PySpark.


