
Sr. Data Engineer

Name: Sravan K

Phone: +1-919-***-****

Email: *******.****@*****.***

LinkedIn URL: https://www.linkedin.com/in/gksravan/

Professional Summary:

Around 10 years of extensive experience in Data Engineering, ETL development, Big Data processing, and cloud-based data solutions across banking, healthcare, telecommunications, and retail industries.

Proficient in Python & SQL, developing and optimizing Python scripts and SQL queries to automate data extraction, transformation, and reporting tasks, significantly enhancing data processing efficiency across multiple domains.

Extensive experience in leveraging Azure cloud services such as Azure Data Factory, Azure Synapse Analytics, ADLS, and Databricks for scalable data pipelines, cloud storage, and real-time/batch data processing.

Designed and managed complex ETL pipelines using Azure Data Factory, Apache Airflow, and Azure Logic Apps, automating data workflows for banking, healthcare, and product-related projects.

Experience in data warehousing and database management, building and maintaining relational and NoSQL databases using MySQL, PostgreSQL, CockroachDB, and Snowflake, ensuring data integrity and optimizing database queries for reporting and analytics.

Hands-on expertise in Big Data technologies, utilizing PySpark, Spark SQL, SnowSQL, and Databricks to process and analyze large datasets, implementing performance optimizations to reduce processing times in healthcare and banking environments.

Integrated real-time data streams using Kafka, enabling timely fraud detection and transaction data analysis in banking systems.

Used DBT Cloud to connect to Snowflake and set up development and deployment environments.

Experience in machine learning and data analytics, assisting in creating data-driven reports and visualizations using Power BI, leveraging structured and semi-structured data to generate insights into financial, healthcare, and product data.

Processed MySQL and Athena tables to create datasets for visual insights in Amazon QuickSight.

Automated repetitive data processing tasks using Python libraries like Pandas and Matplotlib, and SQL-based transformations, ensuring accurate and consistent data for business decisions.

Implemented infrastructure-as-code using Terraform and Docker, deploying data pipelines and managing CI/CD pipelines with Jenkins.

Experience in Data Lineage using Excel and Collibra.

Extensive experience in leveraging AWS cloud services such as AWS Glue, Amazon S3, and Amazon Redshift for scalable data pipelines, cloud storage, and real-time/batch data processing.

Designed and managed complex ETL pipelines using AWS Glue, Apache Airflow, and AWS Lambda, automating data workflows for banking, healthcare, and product-related projects.

Ensured data security and compliance by managing secure access controls using Azure Active Directory, enforcing role-based access, and maintaining regulatory compliance (e.g., PCI DSS, HIPAA) across all data systems.

Implemented data validation and quality checks to ensure accurate ingestion and processing of sensitive data, mitigating risks and ensuring high-quality inputs for analytics.

Designed and implemented scalable data architectures, including data lakes and data warehouses using Azure Synapse, Snowflake, and ADLS, handling high volumes of transactional and customer data for efficient querying and analytics.

Designed and implemented scalable cloud architectures, including data lakes and data warehouses using AWS S3, Redshift, and Snowflake, handling high volumes of transactional and customer data for efficient querying and analytics.

Worked with BI and data visualization tools such as Tableau and Amazon QuickSight.

Collaborated with data scientists, engineers, and business analysts to deliver data-driven solutions, including fraud detection models, sales reporting, and customer insights.

Deployed cloud-based applications using AWS Elastic Beanstalk, ensuring seamless integration of banking and healthcare data services.

Led data migration projects from on-premises systems to Azure, ensuring smooth data transition and transformation from NoSQL databases like Cassandra and MongoDB to Azure-based storage solutions.

Enhanced SQL query performance in both relational and cloud databases, reducing query latency and improving user experiences for banking and healthcare applications.

Performed database performance tuning for large-scale databases in MySQL, PostgreSQL, and Snowflake, ensuring quick data retrieval for time-sensitive reports and analytics.

Created data visualizations and insights using Pandas, Matplotlib, and Power BI, generating visual reports and presentations to aid in understanding complex data trends in product development, sales, and customer transactions.

Developed automated reporting solutions using SQL queries and Python scripts, providing timely updates to stakeholders and reducing manual effort involved in generating financial and healthcare reports.

Gained hands-on experience across multiple domains (banking, healthcare, telecommunications), applying diverse technologies to address domain-specific challenges and deliver impactful data solutions.
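
As a generic illustration of the Python- and SQL-based reporting automation described above, the following sketch extracts data from a hypothetical PostgreSQL table, aggregates it with Pandas, and writes a report file; the connection string, table, and file names are illustrative and not taken from any project in this resume.

import pandas as pd
from sqlalchemy import create_engine, text

# Hypothetical connection string and table name, for illustration only.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/reporting")

# Extract: pull recent transactions (PostgreSQL date arithmetic).
query = text("""
    SELECT account_id, txn_date, amount
    FROM transactions
    WHERE txn_date >= CURRENT_DATE - INTERVAL '30 days'
""")
df = pd.read_sql(query, engine)

# Transform: aggregate daily totals per account.
daily_totals = (
    df.groupby(["account_id", "txn_date"], as_index=False)["amount"].sum()
)

# Load: write a report file that downstream dashboards can pick up.
daily_totals.to_csv("daily_totals_report.csv", index=False)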

Technical Skills:

Programming Languages: Python, SQL, Java (Basic)

Cloud Platforms: Azure (ADLS, ADF, Synapse, Databricks), AWS (Glue, S3, Redshift, Lambda)

Databases: PostgreSQL, MySQL, Snowflake, MongoDB, CassandraDB

Data Processing: PySpark, SparkSQL, Pandas, Matplotlib, Apache Airflow

CI/CD & DevOps: Jenkins, Docker, Terraform, Kubernetes

Big Data & Analytics: Amazon Redshift, Snowflake, Amazon Kinesis, Kafka

Business Intelligence: Power BI, Tableau

Security & Monitoring: AWS IAM, Azure Active Directory, AWS CloudWatch, Azure Monitor

Professional Experience:

Client: DaVita, Denver, CO Feb 2023 - Present

Role: Sr. Data Engineer

Responsibilities:

Designed and implemented Azure Data Factory (ADF) pipelines to ingest, transform, and load patient claims data from on-premises databases into Azure Data Lake Storage (ADLS) for downstream analytics.

Developed scalable ETL workflows using PySpark in Azure Databricks to clean and normalize healthcare provider and insurance claims data for compliance with HIPAA standards.

Optimized complex queries and data models in Snowflake, ensuring efficient processing of high-volume pharmaceutical research datasets and clinical trial records.

Established a secure and scalable Azure Synapse Analytics environment for performing advanced analytics on real-world evidence (RWE) datasets, reducing query latency by 30%.

Optimized SQL-based data models within DBT to ensure high performance and faster query execution for analytical reporting.

Integrated CockroachDB as a distributed transactional database for managing patient medical records, ensuring high availability and compliance with healthcare regulations.

Led data migration projects from on-premises systems to Azure, ensuring smooth data transition and transformation from NoSQL databases like Cassandra and MongoDB to Azure-based storage solutions.

Created data ingestion modules using AWS Glue to load data into various layers in S3, with reporting through Athena and QuickSight.

Created Logic Apps workflows to automate notifications and alerting mechanisms for real-time monitoring of drug supply chain disruptions.

Documented data lineage using Excel and Collibra.

Developed CI/CD pipelines using Terraform, Docker and Jenkins for provisioning and managing healthcare data pipelines across multiple Azure environments, ensuring infrastructure consistency.

Implemented real-time data streaming using Kafka to capture and process electronic health records (EHR) from various hospital management systems.

Built interactive Power BI dashboards leveraging curated datasets from Snowflake, providing insights into patient demographics, disease trends, and drug effectiveness.

Experienced in integrating MongoDB with Hadoop.

Designed and managed role-based access controls (RBAC) in Azure Active Directory (AAD) to ensure secure access to sensitive pharmaceutical data.

Developed SparkSQL-based transformations in Azure Databricks to process and standardize unstructured genomic sequencing data for predictive healthcare analytics.

Automated metadata-driven ingestion frameworks in ADF, enabling dynamic ingestion of diverse healthcare data sources without manual intervention.

Familiar with integrating Tableau with Python and R for advanced analytics and machine learning models within Tableau. Created Athena data sources on S3 buckets for ad hoc querying and business dashboarding using QuickSight and Tableau reporting tools.

Created a scalable batch processing framework in PySpark, optimizing the transformation of insurance claims data and reducing processing time by 40%.

Developed Snow SQL and Snowpark scripts for bulk loading structured clinical trial data into Snowflake, ensuring efficient storage and retrieval.

Performed regular MongoDB database backups, tested recoverability, and monitored overall database performance.

Developed and maintained data transformation workflows using DBT to convert raw data into structured models, improving data accessibility for analysis and reporting.

Established Airflow workflows to schedule and monitor data pipelines, ensuring timely updates to regulatory compliance reports.

Implemented Collibra to automate data management processes.

Implemented a PostgreSQL-based data warehouse to store and manage aggregated patient outcomes data for research and analytics teams.

Technologies and Tools: Azure (Data Factory, Synapse Analytics, ADLS, Active Directory, Logic Apps), Databricks, PySpark, Snowpark, Snowflake, CockroachDB, Airflow, DBT, Collibra, PostgreSQL, Power BI, QuickSight, Kafka, MongoDB, Jenkins, Docker, Terraform, SparkSQL, SnowSQL.
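
To illustrate the style of PySpark cleansing and normalization described in this role, the sketch below reads raw claims data from ADLS, applies a few standardization rules, and writes a curated layer; the paths, column names, and rules are hypothetical and are not taken from the actual DaVita pipelines.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_cleansing_sketch").getOrCreate()

# Hypothetical ADLS paths and columns, for illustration only.
raw_path = "abfss://raw@examplelake.dfs.core.windows.net/claims/"
curated_path = "abfss://curated@examplelake.dfs.core.windows.net/claims/"

claims = spark.read.parquet(raw_path)

cleaned = (
    claims
    .dropDuplicates(["claim_id"])                            # remove duplicate claim records
    .filter(F.col("claim_amount") > 0)                       # drop invalid amounts
    .withColumn("service_date", F.to_date("service_date"))   # normalize the date type
    .withColumn("provider_name", F.upper(F.trim("provider_name")))
)

# Write the curated layer back to ADLS, partitioned for downstream analytics.
cleaned.write.mode("overwrite").partitionBy("service_date").parquet(curated_path)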

Client: Huntington Bank, Columbus, OH Oct 2020 - Jan 2023

Role: Sr. Data Engineer

Responsibilities:

Built PySpark data transformation scripts in AWS Glue to cleanse and standardize credit card transaction data for risk modeling.

Optimized Snowflake-based financial reporting dashboards, reducing data processing time by 35% through indexing and partitioning strategies.

Established Amazon Redshift as a centralized platform for performing ad hoc queries on customer credit history and fraud detection datasets.

Designed and implemented CockroachDB as a high-availability database for storing real-time account balance updates across multiple banking branches.

Developed AWS Lambda workflows to automate loan approval notifications and customer credit limit updates based on predefined risk factors.

Built a CI/CD pipeline using Docker, Jenkins, and Terraform to automate deployment and scaling of banking data processing pipelines in AWS.

Configured Kafka for real-time streaming and anomaly detection of high-risk financial transactions, enhancing fraud prevention measures.

Created Power BI dashboards for executive reporting, providing real-time insights into mortgage loan performance and customer risk segmentation.

Managed access controls and data security policies using AWS IAM to ensure compliance with financial regulations like PCI DSS.

Engineered Spark SQL transformations in AWS Glue to aggregate and analyze multi-year customer transaction trends for personalized banking services.

Automated data ingestion processes in AWS Glue to seamlessly integrate third-party financial data sources for credit risk assessment models.

Implemented scalable batch processing frameworks in PySpark to handle large-scale financial transactions and reduce ETL execution time.

Developed Snow SQL and Snowpark scripts for efficient bulk loading of customer financial statements into Snowflake for analytics and compliance reporting.

Implemented Amazon CloudWatch and AWS CloudTrail to track real-time performance metrics of financial data pipelines, ensuring proactive issue detection and optimizing system uptime.

Orchestrated Airflow workflows for scheduling and monitoring nightly ETL jobs, ensuring seamless updates to customer account balances.

Designed and implemented a PostgreSQL-based transaction data warehouse to store structured banking data for regulatory compliance and auditing purposes.

Technologies and Tools: AWS (Glue, Redshift, IAM, Lambda, CloudWatch, CloudTrail), PySpark, Snowpark, CockroachDB, Snowflake, Airflow, PostgreSQL, Power BI, Kafka, Jenkins, Docker, Terraform, Spark SQL, SnowSQL.
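
A minimal Python sketch of the kind of Kafka-based transaction screening described in this role, using the kafka-python client; the broker address, topic names, and threshold rule are hypothetical stand-ins rather than the bank's actual fraud logic.

import json
from kafka import KafkaConsumer, KafkaProducer

# Hypothetical brokers, topics, and threshold, for illustration only.
BROKERS = ["localhost:9092"]
HIGH_RISK_THRESHOLD = 10_000.00

consumer = KafkaConsumer(
    "card-transactions",
    bootstrap_servers=BROKERS,
    group_id="fraud-screening-sketch",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
)

for message in consumer:
    txn = message.value
    # Flag unusually large transactions for downstream review.
    if txn.get("amount", 0) > HIGH_RISK_THRESHOLD:
        producer.send(
            "high-risk-transactions",
            {
                "transaction_id": txn.get("transaction_id"),
                "amount": txn.get("amount"),
                "reason": "amount_over_threshold",
            },
        )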

Client: Byteridge Software, India Jul 2018 - Dec 2019

Role: Data Engineer

Responsibilities:

Developed AWS Glue ETL pipelines to ingest, transform, and load financial transaction data from multiple banking systems into Amazon S3 for centralized storage.

Designed and optimized Amazon Redshift data models to support high-performance analytics on customer transactions, improving query efficiency by 35%.

Implemented PySpark transformations in AWS Glue to cleanse and standardize structured and semi-structured banking data for compliance reporting.

Built CI/CD pipelines using Docker, Jenkins, and Terraform to automate infrastructure provisioning and deployment of banking data pipelines.

Configured Kafka for real-time streaming of credit card transactions, enabling fraud detection and anomaly detection with minimal latency.

Developed AWS Lambda functions to trigger alerts on high-risk transactions, reducing fraud detection response time by 50%.

Designed a PostgreSQL-based data warehouse in Amazon RDS to store structured financial data for regulatory reporting and risk assessment.

Implemented Snow SQL and Snowpark scripts for efficient data ingestion into Snowflake, ensuring seamless analytics and reporting capabilities.

Built and scheduled ETL workflows using Apache Airflow, automating daily financial data updates and ensuring data consistency.

Designed AWS IAM roles and policies to enforce fine-grained access controls on sensitive banking data, ensuring compliance with PCI DSS regulations.

Created Power BI dashboards leveraging Amazon Redshift data to provide executive insights into loan approvals, customer credit behavior, and financial risk assessment.

Established AWS CloudWatch monitoring and logging for end-to-end data pipelines, enabling proactive troubleshooting and performance optimization.

Developed MySQL-based transaction processing systems in Amazon RDS to store real-time customer payment history for auditing and reconciliation.

Implemented a Terraform-driven infrastructure-as-code approach to manage and scale AWS resources efficiently for data processing and analytics workloads.

Technologies and Tools: AWS (Glue, Redshift, S3, IAM, Lambda, CloudWatch), PySpark, Snowpark, PostgreSQL, MySQL, Snowflake, Airflow, Power BI, Kafka, Jenkins, Docker, Terraform, Spark SQL, SnowSQL.
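
For context, a small AWS Lambda handler sketch in Python showing the style of high-risk transaction alerting described in this role; the topic ARN, threshold, and event shape are hypothetical and do not reflect the actual Byteridge implementation.

import json
import os

import boto3

sns = boto3.client("sns")

# Hypothetical configuration, for illustration only.
ALERT_TOPIC_ARN = os.environ.get(
    "ALERT_TOPIC_ARN", "arn:aws:sns:us-east-1:123456789012:high-risk-alerts"
)
HIGH_RISK_THRESHOLD = 10_000.00

def lambda_handler(event, context):
    """Inspect incoming transaction records and publish an SNS alert for high-risk ones."""
    alerts = 0
    for record in event.get("Records", []):
        txn = json.loads(record["body"])  # assumes an SQS-style event source
        if txn.get("amount", 0) > HIGH_RISK_THRESHOLD:
            sns.publish(
                TopicArn=ALERT_TOPIC_ARN,
                Subject="High-risk transaction detected",
                Message=json.dumps(txn),
            )
            alerts += 1
    return {"alerts_published": alerts}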

Client: Origin Software, India Aug 2016 - Jun 2018

Role: Data Engineer

Responsibilities:

Developed and maintained Python-based ETL scripts to automate the extraction and transformation of banking transaction data, reducing manual processing time.

Assisted in setting up Apache NiFi workflows to streamline data pipelines, enabling efficient movement of large volumes of customer transaction data into on-premises HDFS for further processing.

Worked with Teradata to implement data models for reporting on financial transactions, improving query performance through effective table partitioning and indexing strategies.

Supported the development of a PostgreSQL database on an on-premises server for managing transactional data, ensuring high availability and compliance with financial regulations.

Helped configure Ansible playbooks to automate the deployment of banking data infrastructure, reducing manual setup errors and improving deployment efficiency.

Assisted in creating data pipelines using Apache Kafka to stream real-time banking transaction data, enabling timely fraud detection and financial reporting.

Collaborated in the design and implementation of a MySQL-based reporting database on bare-metal servers, providing business analysts with accurate and real-time transaction data.

Set up and managed Active Directory roles and policies to ensure proper access controls and security across multiple banking systems in the on-premises environment.

Developed and managed Nagios alerts to monitor the health and performance of on-premises banking applications, ensuring system uptime and smooth operation.

Assisted in building Power BI dashboards by integrating data from Teradata, providing insights into customer transactions, loan performance, and financial risk factors.

Contributed to the implementation of automated ETL workflows using Apache Airflow, improving data processing and analytics delivery cycles.

Supported the integration of HDFS with Apache Hive to build a scalable on-premises data warehouse for storing and analyzing structured and semi-structured banking data.

Assisted in the setup of Tomcat servers for deploying banking applications, allowing for smooth scaling and management of applications in the on-premises environment.

Worked with HDFS lifecycle policies to manage the retention and archival of historical banking transaction data, ensuring compliance with regulatory requirements.

Technologies and Tools: Python, Apache NiFi, Apache Kafka, Apache Airflow, Apache Hive, HDFS, PostgreSQL, MySQL, Teradata, Ansible, Active Directory, Nagios, Power BI, Tomcat, On-Premises Servers.
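
Below is a minimal Apache Airflow DAG sketch of the kind of scheduled ETL workflow described in this role, written against the Airflow 2.x API with placeholder task logic; the production jobs at the time would have used an earlier Airflow release, and the DAG and task names are hypothetical.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_transactions(**context):
    # Placeholder for pulling the day's banking transactions from the source system.
    print("extracting transactions for", context["ds"])

def load_to_warehouse(**context):
    # Placeholder for loading cleansed records into the reporting warehouse.
    print("loading transactions for", context["ds"])

with DAG(
    dag_id="daily_transactions_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_transactions", python_callable=extract_transactions
    )
    load = PythonOperator(
        task_id="load_to_warehouse", python_callable=load_to_warehouse
    )

    extract >> load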

Client: Qualcomm, Hyderabad, India Sep 2014 - Jul 2016

Role: Data Analyst

Responsibilities:

Developed Python scripts to automate data extraction and transformation from relational databases, reducing manual data handling by 30%.

Wrote SQL queries to retrieve, filter, and analyze data from large datasets, generating insights for business decision-making.

Assisted in designing and implementing database schemas in MySQL to store structured data for research and development projects.

Collaborated with the team to optimize SQL queries, reducing query execution time by 20% and improving database performance.

Developed Python-based tools for data cleaning and preprocessing, ensuring high-quality input for analysis and machine learning projects.

Created SQL-based reports to track product development progress, customer feedback, and sales metrics, helping stakeholders make informed decisions.

Participated in data validation activities, ensuring that incoming data from various sources met quality standards and was compatible with the system.

Built Python scripts to visualize trends in product performance using libraries like Matplotlib and Pandas, supporting analysis for new product features.

Assisted in setting up basic data models and writing SQL queries for retrieving test results from large-scale product simulations.

Participated in database maintenance tasks, including data migration and updates, ensuring minimal disruption to ongoing research projects.

Technologies and Tools: Python, SQL, MySQL, Pandas, Matplotlib, Data Modeling, Database Optimization, Data Cleaning, Data Visualization.
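
As an illustration of the Pandas and Matplotlib trend visualizations described in this role, the sketch below aggregates a hypothetical product-metrics export by month and plots the result; the CSV layout, column names, and metric are illustrative only.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of product test results, for illustration only.
df = pd.read_csv("product_metrics.csv", parse_dates=["test_date"])

# Aggregate the average performance score per month for each product line.
monthly = (
    df.groupby([pd.Grouper(key="test_date", freq="M"), "product_line"])["score"]
    .mean()
    .unstack("product_line")
)

# Plot the monthly trend for each product line and save the figure.
ax = monthly.plot(figsize=(10, 5), marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Average performance score")
ax.set_title("Product performance trend by product line")
plt.tight_layout()
plt.savefig("product_performance_trend.png")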


