
Data Engineer Senior

Location:
Trumbull Center, CT
Posted:
September 10, 2025


Resume:

MANASA KONAGANTI

Senior Data Engineer

Contact: +1-469-***-****  Email: *****************@*****.***

LinkedIn: https://www.linkedin.com/in/manasa-reddy-6a52b724b/

Professional Summary:

Data Engineer with 7+ years of experience in designing, building, and maintaining scalable data infrastructure and ETL/ELT pipelines across cloud and on-premises environments, supporting analytics and machine learning initiatives for enterprise-level organizations.

Extensive experience with Big Data technologies including Apache Spark, Hadoop ecosystem (HDFS, MapReduce, Hive), Apache Kafka, Apache Airflow, and Databricks for processing large-scale datasets and real-time streaming analytics.
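
For illustration, a minimal PySpark sketch of the kind of batch aggregation this experience covers; the S3 path, filter, and column names (event_date, event_type) are hypothetical placeholders rather than details from a specific project.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative batch job: aggregate daily event volumes from a partitioned Parquet dataset.
spark = SparkSession.builder.appName("daily_event_rollup").getOrCreate()

events = spark.read.parquet("s3a://example-bucket/events/")  # hypothetical location

daily_counts = (
    events
    .filter(F.col("event_date") >= "2025-01-01")
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Writing partitioned output keeps downstream reads pruned to only the dates they need.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://example-bucket/rollups/daily_event_counts/"
)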

Proficient in cloud data platforms such as AWS (S3, Redshift, EMR, Glue, Kinesis, Lambda), Azure (Data Factory, Synapse Analytics, Data Lake Storage, Event Hubs), and Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Storage) for building modern data architectures.

Experienced with Snowflake, building modern data transformation pipelines using dbt for SQL-based transformations, testing, and documentation while leveraging Snowflake's cloud data warehouse capabilities for scalable analytics workloads.

Strong programming skills in Python, Scala, Java, SQL, and R with expertise in data manipulation libraries including Pandas, NumPy, PySpark, Dask, and Apache Spark SQL for data processing and analysis.

Database expertise across multiple systems including PostgreSQL, MySQL, Oracle, MongoDB, Cassandra, Redis, Snowflake, and Amazon DynamoDB with experience in database design, optimization, and migration strategies.

Data warehouse and data lake architecture experience using Amazon Redshift, Google BigQuery, Snowflake, and Azure Synapse Analytics, implementing dimensional modeling, star/snowflake schemas, and modern lakehouse architectures.

Advanced ETL/ELT pipeline development using tools like Apache Airflow, Prefect, Luigi, AWS Glue, Azure Data Factory, Talend, and Informatica with focus on data quality, monitoring, and error handling mechanisms.
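
Below is an illustrative sketch of an Airflow DAG for a daily extract-and-load job with retries, in the spirit of the pipelines described above; the DAG id, task names, and callables are placeholders, and scheduling parameter names vary slightly across Airflow 2.x versions.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_orders(**context):
    # placeholder: pull yesterday's records from a source system
    ...

def load_orders(**context):
    # placeholder: write transformed records to the warehouse
    ...

default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_orders", python_callable=load_orders)
    extract >> load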

Real-time data streaming expertise with Apache Kafka, Amazon Kinesis, Azure Event Hubs, Apache Pulsar, and Apache Storm for building event-driven architectures and real-time analytics platforms.
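
A minimal consumer sketch for this kind of event-driven pipeline, using the kafka-python client; the topic name, broker address, and consumer group are placeholders.

import json

from kafka import KafkaConsumer  # kafka-python; confluent-kafka is a common alternative

consumer = KafkaConsumer(
    "policy-events",  # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    group_id="analytics-consumer",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # downstream handling (enrichment, routing, writing to a sink) would go here
    print(event.get("event_type"), message.offset)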

Container and orchestration technologies including Docker, Kubernetes, Apache Mesos for deploying and scaling data applications with experience in Helm charts and container registries.

Infrastructure as Code (IaC) proficiency using Terraform, CloudFormation, Azure Resource Manager (ARM), and Ansible for automated provisioning and management of data infrastructure.

Data modeling and schema design experience with dimensional modeling, data vault methodology, and modern approaches like dbt (data build tool) for maintaining data lineage and implementing data transformations.

CI/CD pipeline implementation for data engineering workflows using Jenkins, GitLab CI, Azure DevOps, GitHub Actions with automated testing, deployment, and monitoring of data pipelines.

Data governance and quality frameworks implementation including Apache Atlas, AWS Data Catalog, Azure Purview, Great Expectations for metadata management, data lineage tracking, and quality monitoring.
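
A minimal data-quality sketch using the classic pandas interface of Great Expectations (newer releases use a different, context-based API, so exact calls depend on the installed version); the file and column names are placeholders.

import great_expectations as ge
import pandas as pd

df = pd.read_csv("daily_load.csv")  # hypothetical daily extract
batch = ge.from_pandas(df)

# Declare expectations against the batch; each call records a validation result.
batch.expect_column_values_to_not_be_null("customer_id")
batch.expect_column_values_to_be_between("claim_amount", min_value=0, max_value=1_000_000)

results = batch.validate()
if not results.success:
    raise ValueError("Data quality checks failed for daily_load.csv")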

Performance optimization expertise in query tuning, partitioning strategies, indexing, and caching mechanisms across various database systems and big data platforms to handle high-volume data processing.

Machine Learning pipeline integration experience working with MLflow, Kubeflow, SageMaker, Azure ML to build end-to-end ML data pipelines and feature stores for data science teams.
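
A small sketch of the kind of MLflow tracking call a pipeline step might make when feeding a feature store or training job; the experiment name, parameters, and metric are placeholders.

import mlflow

mlflow.set_experiment("customer_segmentation")

with mlflow.start_run(run_name="weekly_feature_refresh"):
    # Record pipeline parameters and a simple output metric for lineage and tracking.
    mlflow.log_param("feature_window_days", 30)
    mlflow.log_param("source_table", "analytics.customer_features")
    mlflow.log_metric("rows_written", 1_250_000)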

Monitoring and observability tools proficiency including Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Datadog, and CloudWatch for pipeline monitoring and troubleshooting.

Agile and DevOps methodologies experience working in Scrum and Kanban environments, with strong collaboration skills across data science, analytics, and engineering teams to deliver data-driven solutions.

Technical Skills:

Big Data Frameworks: Apache Spark, Hadoop (HDFS, MapReduce, Hive), Databricks

Cloud Platforms: AWS (S3, Redshift, EMR, Glue, Kinesis, Lambda), Azure (Data Factory, Synapse Analytics, Data Lake Storage), GCP (BigQuery, Pub/Sub), Snowflake

Programming Languages: Python, Scala, Java, SQL, Shell Scripting

Data Processing Libraries: PySpark, Pandas, NumPy, Apache Spark SQL

Streaming Technologies: Apache Kafka, Amazon Kinesis, Azure Event Hubs, Confluent Platform

ETL/ELT Tools: Apache Airflow, AWS Glue, Azure Data Factory, dbt

Data Warehouses: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics

Data Lakes/Storage: Delta Lake, Amazon S3, Azure Data Lake Storage, Google Cloud Storage

Infrastructure as Code: Terraform

CI/CD Tools: Jenkins, GitLab CI, Azure DevOps, GitHub Actions

Version Control: Git, GitHub, GitLab, Bitbucket, Azure Repos

Monitoring & Observability: Prometheus, ELK Stack, CloudWatch

Development IDEs: PyCharm, IntelliJ IDEA, VS Code

Workflow Orchestration: Apache Airflow, AWS Step Functions, Azure Logic Apps

Data Quality Tools: Great Expectations, dbt tests

Methodologies: Agile, Scrum, Kanban

Professional Experience:

Client: General Motors, Columbus, OH  July 2023 - Present

Senior Data Engineer

Responsibilities:

Architected enterprise-wide data lake platform using AWS S3, Glue, and EMR to process 500TB+ of insurance claims data, implementing Delta Lake for ACID transactions and enabling real-time analytics for risk assessment models

Designed and implemented real-time streaming pipelines using Apache Kafka and Apache Flink to process 10M+ daily transactions from policy management systems, reducing data latency from hours to sub-second for fraud detection algorithms

Led data warehouse modernization initiative migrating legacy systems to Snowflake, implementing dbt for transformation logic and establishing data governance frameworks that improved query performance by 75% and reduced infrastructure costs by 40%

Implemented advanced data observability platform using Great Expectations with custom monitoring solutions, reducing data quality incidents by 80% and establishing automated data lineage tracking across 150+ data pipelines

Optimized big data processing workflows using Apache Spark on Databricks with PySpark and Scala, processing insurance claims data 10x faster through partitioning strategies and adaptive query execution

Established CI/CD practices for data pipelines using Jenkins, GitLab, and Terraform with automated testing and infrastructure as code, reducing deployment time from days to hours

Collaborated with data science teams to build feature stores and real-time inference pipelines supporting machine learning models for customer segmentation and risk analysis

Designed event-driven architecture using Apache Kafka Connect and AWS Kinesis to capture real-time policy changes and claims events for immediate processing and alerting

Environment: Python, Apache Spark, Databricks, Snowflake, dbt, Apache Kafka, Apache Flink, AWS (S3, EMR, Glue, Kinesis, Lambda), Apache Airflow, SageMaker, Terraform, Docker, Kubernetes, PostgreSQL, Delta Lake
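
As a rough illustration of the streaming ingestion described in this role, the sketch below reads JSON claim events from Kafka into a Delta table with Spark Structured Streaming; the topic, schema, and storage paths are placeholders, and it assumes a runtime (such as Databricks) with the Kafka and Delta Lake connectors available.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("claims_stream_to_delta").getOrCreate()

# Hypothetical event schema for incoming claim messages.
claim_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("policy_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "claim-events")
    .load()
)

# Kafka delivers raw bytes; parse the JSON payload into typed columns.
claims = (
    raw.select(F.from_json(F.col("value").cast("string"), claim_schema).alias("claim"))
    .select("claim.*")
)

query = (
    claims.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/claim_events")
    .outputMode("append")
    .start("/mnt/delta/claim_events")
)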

Client: Cenit IT Hub Pvt Ltd, Hyderabad, India Sept 2018 - July 2022

Data Engineer

Responsibilities:

Developed scalable ETL pipelines using Apache Airflow and Python to process 100GB+ daily data from multiple e-commerce sources, implementing error handling and data quality validation for production reliability

Built data warehouse solution on Amazon Redshift with optimized table design, compression strategies, and query optimization, improving dashboard load times by 60% and supporting analytics for 1M+ customers

Implemented streaming data processing using Apache Kafka and Spark Streaming to handle real-time inventory updates and customer behaviour tracking, enabling real-time recommendation engines

Designed dimensional data models for retail analytics using star schema methodology, creating fact and dimension tables that support complex analytical queries for business intelligence reporting

Created automated data quality monitoring using Python scripts and SQL to validate daily data loads, implement data profiling, and alert on anomalies, establishing foundational data governance practices

Developed API data integration solutions using Python and REST APIs to extract data from CRM, ERP, and third-party systems, consolidating data into centralized analytical databases

Built NoSQL data solutions using MongoDB and Cassandra for handling semi-structured customer interaction data and product catalogue information, implementing appropriate data modeling strategies

Implemented CI/CD for data workflows using Jenkins with automated testing and deployment, reducing manual deployment efforts and eliminating configuration errors

Optimized database performance on PostgreSQL and MySQL through indexing strategies, query tuning, and partitioning, improving analytical query response times by 50%

Collaborated with business stakeholders to gather requirements and translate business needs into technical data solutions, supporting cross-functional analytics initiatives

Environment: Python, Apache Airflow, Apache Spark, Apache Kafka, Amazon Redshift, PostgreSQL, MySQL, MongoDB, Cassandra, AWS (S3, EC2, RDS), Jenkins, Docker, Pandas, SQL
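
For illustration, a minimal sketch of the Redshift loading pattern behind work like this: a COPY from S3 issued through psycopg2. The cluster endpoint, credentials, table, S3 path, and IAM role are placeholders.

import psycopg2

conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="***",
)

# COPY is the idiomatic bulk-load path into Redshift; Parquet input keeps types intact.
copy_sql = """
    COPY staging.daily_orders
    FROM 's3://example-bucket/exports/daily_orders/2025-01-01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)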

Client: Indus Business Systems Ltd, India  Jan 2018 – Aug 2018

Data Engineer

Responsibilities:

Built foundational ETL processes using Python and Pandas to extract data from MySQL and Oracle databases, transforming and loading data into centralized reporting database for business analytics

Created automated reporting solutions using Python scripts and SQL queries to replace manual Excel-based processes, saving 15+ hours per week and improving data accuracy for financial reporting

Implemented basic data quality checks using SQL and Python to validate data consistency and completeness, creating alerts for data anomalies and establishing data monitoring practices

Supported database administration tasks including backup and recovery procedures for PostgreSQL and MySQL, ensuring data availability and implementing basic security measures

Created documentation for data sources, transformations, and business logic, establishing knowledge base for data lineage and supporting future development efforts

Participated in requirements gathering sessions with business users to understand reporting needs and translate them into technical data solutions

Developed basic web scraping solutions using Python and BeautifulSoup to collect external market data for competitive analysis and enrichment of internal datasets

Environment: Python, Pandas, PostgreSQL, MySQL, Oracle, Tableau, SQL, BeautifulSoup, Excel, Basic ETL tools
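
A small sketch of the pandas-based extract-transform-load pattern described in this role: read from an operational MySQL database and write a cleaned table to a reporting database via SQLAlchemy. Connection strings, table names, and columns are placeholders.

import pandas as pd
from sqlalchemy import create_engine

source = create_engine("mysql+pymysql://etl_user:***@source-host/sales")
target = create_engine("postgresql+psycopg2://etl_user:***@reporting-host/reporting")

orders = pd.read_sql("SELECT order_id, customer_id, amount, order_date FROM orders", source)

# Basic cleanup before loading into the reporting schema.
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders = orders.dropna(subset=["customer_id"])

orders.to_sql("orders_clean", target, schema="reporting", if_exists="replace", index=False)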

Client: Standalone IT Solutions, Hyderabad, India July 2016 - Dec 2017

Software Developer

Responsibilities:

Performed data extraction and analysis using SQL queries on Oracle 9i to support business reporting requirements and ad-hoc analytical requests from various departments

Created basic data reports using Excel and SQL to track business KPIs, customer metrics, and operational performance, providing insights to management teams

Developed simple data validation scripts using SQL to ensure data quality and identify inconsistencies in transactional systems, supporting data cleanup initiatives

Assisted in manual testing of data-related applications and documented test results, gaining foundational understanding of data quality and testing methodologies

Supported data migration projects by writing SQL scripts to transfer data between systems and validate data integrity during system upgrades

Created basic data visualizations using Excel charts and pivot tables to present analytical findings to business stakeholders in accessible formats

Participated in database maintenance activities including data archiving and cleanup procedures under senior supervision

Documented data sources and business rules to support knowledge transfer and maintain institutional knowledge about data systems

Environment: Oracle 9i, SQL, Excel, Basic reporting tools, Manual testing, Data documentation

Education:

New England College Oct 2023

Master’s in Computer & Information Systems

JNTU University May 2016

Bachelor of Technology, Computer Science and Engineering


