
Senior Data Engineer

Location:
Wichita Falls, TX
Posted:
May 09, 2025

Chinthakunta Alekhya

+1-940-***-****

********************@*****.***

Senior Data Engineer

PROFESSIONAL SUMMARY:

Senior Data Engineer with 9+ years of professional experience in data infrastructure and system architecture.

Expertise in Google Cloud Platform (GCP) data engineering with GCP Dataproc, BigQuery, GCS, and Dataflow.

Strong experience in building ETL data pipelines using Apache Airflow (GCP Composer), Apache Hive, and Spark.

Expert in implementing robust data storage with technologies like Azure SQL Database and Azure Data Lake Storage, ensuring data security and scalability.

Adept at automating ETL processes using StreamSets, significantly improving data transformation and loading speeds.

Proficient in project management and CI/CD processes using Azure DevOps, enhancing team productivity and deployment efficiency.

Experienced with data lineage tools, providing comprehensive tracking and management of the data lifecycle within data-intensive applications.

Implements Agile methodologies within development teams to improve adaptability and responsiveness to business needs.

Advanced user of Unity Catalog for managing and securing data assets across distributed data environments.

Designs and deploys data integration workflows using Apache NiFi and Informatica, enhancing automation and efficiency in data handling.

Manages high-volume data storage solutions using AWS S3 and DynamoDB, optimizing data accessibility and durability.

Utilizes AWS Glue for serverless data integration and ETL, simplifying complex data transformations across various sources.

Deploys infrastructure as code using Docker and Terraform to ensure consistent and reproducible environments across stages.

6 years of experience developing and maintaining analytical dashboards using Tableau and AWS QuickSight to provide actionable insights and support decision-making.

Utilizes Python and SQL extensively to construct sophisticated data models and perform complex data analysis and manipulation tasks.

Experienced with Jira in Agile environments, enhancing project tracking and team coordination.

Manages large-scale data processing using Apache Hadoop and Apache Spark within AWS EMR, optimizing for performance and cost.

Implements and monitors data pipelines using AWS Data Pipeline and Apache Kafka to ensure robust data ingestion and stream processing.

Develops and maintains portable and scalable application environments using Docker and Terraform, enhancing development and production parity.

Employs AWS S3 for effective data storage and integrates Hadoop YARN to manage cluster resources efficiently.

Optimizes MapReduce jobs and develops Spark applications to handle large datasets with improved processing times.

Designs and implements complex ETL workflows using Python, SSIS, and Apache Airflow to streamline data extraction, transformation, and loading.

Manages data warehousing solutions with Amazon Redshift and MySQL, ensuring high availability and performance of storage systems.

Implements continuous integration and delivery pipelines using Jenkins and Git, fostering a culture of continuous improvement and agility.

TECHNICAL SKILLS:

Languages & Scripting: Python, SQL, Java, Scala, Shell scripting, CDK

Big Data Technologies: Apache Hadoop, HDFS, Apache Spark, AWS EMR, Azure Databricks, BigQuery, Google Cloud Storage (GCS), Dataflow

Data Integration Tools: SSIS, Apache Airflow, Apache NiFi, Informatica

Data Visualization: Tableau, AWS QuickSight

Database Management: PostgreSQL, MySQL, AWS DynamoDB, Azure SQL Database

Cloud Platforms: AWS, Azure, GCP, Cloudera

GCP Services: BigQuery, GCS, Dataproc, Dataflow, GCP Composer (Airflow)

DevOps & CI/CD: Docker, Jenkins, Azure DevOps, Git, Terraform, Bamboo

Data Security & Management: Azure Key Vault, Azure Active Directory, AWS Glue

Project Management: Agile, Jira, Git

Event-Driven Solutions: Kafka, Apache Flink, REST APIs

PROFESSIONAL EXPERIENCE

Client: Liberty Mutual, Boston, MA Apr 2023 to Present

Role: Senior Data Engineer

Roles & Responsibilities:

Designed and implemented ETL pipelines using Python, Talend, and GCP Composer for real-time data processing.

Developed and deployed robust data storage solutions with Azure SQL Database, Hadoop HDFS, and Azure Data Lake Storage, ensuring scalable data management.

Leveraged Databricks within the Azure platform, along with Cloudera, to perform efficient big data processing, significantly enhancing analytics capabilities.

Implemented Azure Machine Learning to construct and deploy advanced predictive models, improving decision-making processes.

Configured Azure Active Directory and Azure Key Vault, enhancing data security and compliance across all data platforms.

Automated ETL processes using StreamSets, which increased data transformation speed and accuracy.

Utilized Azure DevOps for project management, which improved team collaboration and increased deployment frequency.

Employed comprehensive data lineage tools to ensure transparency and accountability in data transformations and usage.

Optimized large-scale data pipelines using BigQuery and Google Cloud Storage, enhancing query performance.

Integrated data streaming solutions with Azure Event Hubs and Stream Analytics for real-time data ingestion and analysis.

Employed Azure Functions to create serverless applications that responded dynamically to events and triggers in the data pipeline.

Set up CI/CD pipelines using Azure Pipelines, enhancing the automation of build and release processes.

Developed Azure Logic Apps to automate workflows and business processes without writing code, increasing operational efficiency.

Configured and managed Azure Monitor and Log Analytics to oversee and analyze performance metrics across services.

Implemented Azure Blob Storage for massive, scalable object storage for data-intensive applications, improving data availability.

Used Azure Data Factory to orchestrate data movement and transformation, simplifying complex data integration challenges.

Deployed Azure Cognitive Services to enhance applications with AI capabilities, such as text analysis and image recognition.

Utilized Power BI for developing interactive visualizations and business intelligence reports, aiding strategic decision-making.

Implemented Azure Synapse Analytics to run big data analytics seamlessly across data warehouses and big data systems.

Developed custom scripts in Python and PowerShell to automate routine tasks and data operations within the Azure environment.

Engineered disaster recovery strategies using Azure Site Recovery, ensuring business continuity and data integrity.

Configured Azure ExpressRoute to establish a private connection to Azure services, enhancing network security and performance.

Utilized Azure Active Directory B2C to manage customer, partner, and employee identities at scale, securing access to applications.

Environment: Azure Data Factory, GCP, Azure SQL Database, Azure Databricks, Azure Machine Learning, Azure Active Directory, Azure DevOps, StreamSets, Azure Event Hubs, Azure Functions, Azure Pipelines, Azure Logic Apps, Azure Monitor, Azure Blob Storage, Azure Kubernetes Service, Azure Cognitive Services, Power BI, Azure Synapse Analytics, Python, PowerShell, Azure Site Recovery, and Azure ExpressRoute.

Client: HCSC, Chicago, IL. Jun 2021 to Mar 2023

Role: Data Engineer

Roles & Responsibilities:

Orchestrated data pipeline configurations with Apache Airflow, managing scheduled ETL tasks for healthcare data integration efficiently.

Managed Kafka streams to enable real-time health data processing and monitoring, vital for responsive healthcare services, and developed event-driven solutions using REST APIs and Kafka.

Integrated AWS S3 into the data ecosystem for scalable data storage solutions, essential for managing large volumes of healthcare data.

Implemented AWS Glue for serverless data cataloging and ETL operations, enhancing data accessibility and management within the healthcare domain.

Utilized AWS EMR to process large-scale health data efficiently, supporting analytics and decision-making in healthcare.

Configured AWS DynamoDB for high-performance NoSQL data storage, optimizing data retrieval for critical healthcare applications.

Employed Terraform to automate cloud infrastructure provisioning, ensuring robust and scalable environments for healthcare data operations.

Developed data flows with Apache NiFi, ensuring seamless data ingestion and routing, which enhanced data accuracy and timeliness in healthcare reporting.

Deployed microservices in AWS ECS to enhance data processing capabilities, supporting complex healthcare data operations effectively.

Leveraged Databricks for deploying data science and machine learning projects, advancing patient care through predictive analytics.

Applied MLflow for managing machine learning experiments and model tracking, integral to personalized healthcare solutions.

Wrote Python scripts to automate data transformation and cleansing processes, increasing efficiency and reducing errors in data handling.

Utilized SQL for complex data queries, analyzing healthcare databases to extract insights crucial for patient care optimization.

Managed version control with Git to maintain integrity and collaboration across healthcare data projects, ensuring consistent data handling practices.

Practiced Agile methodologies to expedite project deliverables and ensure responsiveness to evolving healthcare needs, improving project agility.

Provided technical documentation for system setups and data architectures, ensuring clarity and compliance in healthcare data management.

Ensured compliance with HIPAA regulations through secure data handling practices, protecting patient privacy and data integrity.

Coordinated with cross-functional teams to align data engineering with business objectives, enhancing healthcare services through technology.

Developed data integrity tests to validate healthcare datasets for accuracy, ensuring reliable data for critical care decisions.

Performed root cause analysis on data incidents to prevent future occurrences, maintaining system reliability and data quality.

Trained new team members on AWS services and data processing techniques, enhancing team capabilities and service quality.

Monitored data workflows for performance bottlenecks and implemented optimizations, ensuring efficient data handling and processing.

Created data visualizations to represent patient data trends and health outcomes, supporting proactive healthcare management and planning.

Environment: Apache Airflow, Kafka, AWS (S3, Glue, EMR, DynamoDB, ECS), Terraform, Apache NiFi, Databricks, MLflow, Python, SQL, Git, Agile methodologies, and data visualization tools.

Client: Safeway, Pleasanton, CA. Feb 2019 to May 2021

Role: Big Data Engineer

Roles & Responsibilities:

Administered large-scale data processing using Apache Hadoop and Apache Spark within the AWS EMR environment to handle complex datasets.

Managed and optimized data pipelines using AWS Data Pipeline and Apache Kafka for efficient data ingestion and streaming.

Employed Docker to containerize applications, ensuring consistent environments across development, testing, and production.

Utilized Terraform for infrastructure as code to automate the provisioning of cloud resources, enhancing operational efficiency.

Implemented AWS S3 for scalable and secure cloud storage solutions, optimizing data accessibility and durability.

Managed computing resources effectively using Hadoop YARN, optimizing resource allocation and job scheduling.

Developed and optimized MapReduce jobs to process large volumes of data quickly and efficiently.

Configured AWS Glue as a serverless data integration service to simplify ETL operations and data preparation.

Leveraged Apache Spark for advanced data processing tasks, including real-time analytics and machine learning.

Ensured high data availability and disaster recovery using AWS Elastic Block Store (EBS) for block-level storage.

Utilized Docker Compose to define and run multi-container Docker applications, streamlining deployment processes.

Automated deployment and scaling of containerized applications using AWS Elastic Container Service (ECS).

Enhanced data security and compliance by implementing AWS Identity and Access Management (IAM) policies.

Monitored application and infrastructure health using AWS CloudWatch to maintain system performance.

Developed data ingestion mechanisms using Sqoop to transfer data between Hadoop and relational databases efficiently.

Implemented continuous integration and continuous deployment (CI/CD) workflows using Jenkins, improving development and release speed.

Used AWS Lambda to run code in response to events, reducing the complexity of back-end code management.

Configured AWS CloudTrail to enable governance, compliance, and operational and risk auditing of the AWS account.

Employed Amazon Redshift to perform large-scale data analytics and gain deeper insights into data patterns.

Facilitated real-time data analysis by integrating AWS Kinesis with Apache Spark, enhancing the responsiveness to data-driven insights.

Environment: Apache Hadoop, Spark, Kafka, Docker, Terraform, AWS services (EMR, Data Pipeline, S3, Glue, ECS, EBS, IAM, CloudWatch, Lambda, CloudTrail, Redshift, Kinesis), and complementary technologies (Docker Compose, Jenkins, Sqoop).

Client: FuGenX Technologies, Hyderabad, India Jan 2017 to Nov 2018

Role: Python Developer

Roles & Responsibilities:

Developed complex ETL workflows using Python and SSIS, streamlining data extraction, transformation, and loading processes.

Managed data warehousing solutions using Amazon Redshift, optimizing data storage and retrieval operations.

Automated data pipelines using Apache Airflow, enhancing scheduling and monitoring of batch jobs.

Leveraged MySQL for database management, ensuring efficient data storage and high availability.

Utilized Git for version control, maintaining the integrity and traceability of development changes.

Implemented Jenkins for continuous integration and continuous deployment, improving development lifecycle efficiency.

Developed data processing scripts in PySpark, handling large datasets with improved performance.

Used SQL extensively for database querying, enhancing data retrieval and manipulation capabilities.

Configured Amazon S3 as a scalable storage solution for storing and retrieving various file formats.

Employed Docker for creating isolated environments, ensuring consistency across development, testing, and production.

Automated routine data operations, significantly increasing efficiency and reducing manual errors.

Optimized data handling and processing by implementing various Python libraries for data analysis.

Managed project source code and documentation using GitLab, enhancing collaboration and code management.

Developed APIs for data integration, facilitating seamless data exchange between systems.

Enhanced system security by integrating security patches and updates into development workflows.

Conducted performance tuning of SQL queries and databases to improve application responsiveness.

Facilitated team collaboration and project tracking using Agile methodologies to meet project deadlines.

Implemented data validation and cleaning processes, ensuring the accuracy and quality of data in reports.

Designed and developed REST APIs for efficient data communication between microservices.

Utilized XML and JSON formats for data serialization/deserialization, ensuring seamless data exchange.

Integrated external systems through secure RESTful web services, enhancing application interoperability.

Environment: Python, SSIS, Amazon Redshift, Apache Airflow, MySQL, Git, Jenkins, PySpark, SQL, Amazon S3, Docker, GitLab, REST APIs, XML, JSON.

Client: Newton Software Pvt Ltd, Pune, India Mar 2015 to Jan 2017

Role: Business Analyst

Roles & Responsibilities:

Analyzed business requirements and transformed them into detailed data models using SQL and Python.

Developed and maintained business intelligence solutions using Tableau, enabling advanced data visualization and reporting.

Managed PostgreSQL databases, optimizing schema designs and query performance for business applications.

Utilized R for statistical analysis and data processing, aiding in data-driven decision-making processes.

Leveraged Microsoft Excel for data manipulation and complex financial modeling, enhancing operational efficiencies.

Employed GitLab for version control, improving code collaboration and management among team members.

Configured Apache Hive to query and manage large datasets stored in distributed storage.

Developed automated reporting systems, significantly reducing the time required for report generation.

Performed data cleaning and preprocessing using Python, ensuring data quality and consistency.

Implemented Docker for deploying applications, simplifying configuration and dependencies management.

Enhanced data analysis capabilities by integrating SQL-based analytics into business workflows.

Conducted regular data audits to ensure compliance with data governance and standards.

Utilized advanced Excel functions and macros to automate repetitive tasks and enhance data analysis.

Streamlined data collection and entry processes, reducing errors and increasing data reliability.

Collaborated closely with business stakeholders to understand their data needs and deliver tailored solutions.

Environment: SQL, Python, Tableau, PostgreSQL, R, Microsoft Excel, GitLab, Apache Hive, Docker.

Education:

Bachelor of Technology (B.Tech) in Computer Science and Engineering from JNTUH, Hyderabad, Telangana, India - 2015


