
Data Engineer IT Infrastructure

Location:
Chicago, IL
Posted:
October 19, 2025


Resume:

Surya P Data Engineer

************@*****.*** Mobile: 331-***-**** LinkedIn

Professional Summary

Experienced and versatile Data Engineer with 3+ years and a proven record in Software Development, IT Infrastructure, and Banking/Finance domains. Expert at building real-time, secure, and scalable data systems across cloud environments. Adept at automating ETL pipelines, optimizing data warehouses, and driving data quality and compliance for analytics and AI workloads.

Technical Skills

• Languages: Python, SQL, Scala, Java, Bash

• Big Data & ETL: Apache Spark, Hadoop, Airflow, Kafka, Flink, dbt, NiFi, Informatica, Talend

• Databases: PostgreSQL, Oracle, MongoDB, SQL Server, Cassandra, Teradata, Snowflake

• Cloud Platforms: AWS (Glue, Redshift, EMR, S3), Azure (Data Factory, Synapse), GCP (BigQuery, Dataflow)

• Infra & DevOps: Docker, Kubernetes, Jenkins, Terraform, Ansible, GitOps, ELK Stack, Prometheus

• Security & Compliance: Data Masking, GDPR, PCI-DSS, Encryption, AWS KMS

• Visualization: Power BI, Tableau

Professional Experience

Citibank New York, NY

Data Engineer May 2024 – Present

• Developed robust ETL pipelines using Apache Spark and Airflow, automating data movement from 10+ financial systems (see the DAG sketch at the end of this section).

• Built real-time fraud detection workflows with Kafka and Flink, processing millions of transactions daily in near real time.

• Designed and deployed data lakehouse architecture in Snowflake, improving credit risk analytics by 35%.

• Automated regulatory and compliance reports (RBI, PCI-DSS) using Python, reducing manual effort by 75%.

• Implemented data masking and encryption using AWS KMS and Lake Formation to ensure full data compliance.

• Created data lineage tracking via Apache Atlas, boosting traceability and audit efficiency.

• Partnered with Data Science teams to build feature stores for fraud and credit scoring models, cutting model training time.

• Enhanced data reliability by implementing Great Expectations to validate 100% of critical datasets.

• Optimized query performance on Snowflake, reducing data aggregation time from hours to minutes.

• Developed executive dashboards in Power BI to visualize risk, compliance, and credit metrics for management.

• Migrated on-prem Teradata systems to Snowflake on AWS, ensuring zero data loss during transition.

• Designed and automated ETL monitoring workflows using Jenkins and Prometheus for proactive alerting.
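
For illustration, a minimal Airflow DAG in the spirit of the ETL bullets above, assuming Airflow 2.4+; the dag_id, task callables, schedule, and retry settings are hypothetical placeholders, not the production pipeline.

```python
# Illustrative daily extract -> load DAG; names and callables are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_transactions(**context):
    # Placeholder: pull a day's transactions from a source system.
    print("extracting transactions for", context["ds"])


def load_to_snowflake(**context):
    # Placeholder: write the validated batch to a Snowflake table.
    print("loading batch for", context["ds"])


with DAG(
    dag_id="transactions_etl",  # hypothetical pipeline name
    start_date=datetime(2024, 5, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_transactions)
    load = PythonOperator(task_id="load", python_callable=load_to_snowflake)
    extract >> load  # simple linear dependency
```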

Cognizant Hyderabad, India

Data Engineer May 2022 – July 2023

• Designed data ingestion pipelines using Azure Data Factory & Databricks, enabling multi-source data integration for monitoring systems.

• Developed real-time telemetry aggregation using Kafka and NiFi, capturing performance metrics from 500+ servers daily (see the consumer sketch at the end of this section).

• Built a centralized ELK-based data hub, allowing IT teams to analyze system health, logs, and uptime trends in real time.

• Automated data workflows with Jenkins and Terraform, achieving 100% repeatable deployments across environments.

• Implemented data lineage and cataloging through Apache Atlas and Collibra for enhanced governance.

• Migrated traditional data warehouse workloads to Snowflake, reducing compute costs and increasing throughput.

• Created predictive analytics dashboards using Power BI to identify system anomalies and performance bottlenecks.

• Set up proactive monitoring using Prometheus + Grafana, enabling 99.9% pipeline uptime through automated alerts.

• Developed a metadata-driven ETL framework that dynamically adapts to SLA changes and reduces maintenance overhead.

• Partnered with DevOps teams to create cross-cloud data synchronization between Azure and AWS for business continuity.

• Enhanced data reliability by implementing Airflow DAG dependency management and auto-retry logic.

• Mentored a team of junior data engineers, improving team delivery speed and code quality through GitOps and reviews.
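
As a sketch of the telemetry ingestion described above, a minimal Kafka consumer using the kafka-python client; the topic name, broker address, message shape, and alert threshold are all assumptions for illustration.

```python
# Minimal telemetry consumer; topic, brokers, and threshold are assumed.
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "server-telemetry",  # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
    group_id="telemetry-aggregator",
)

for message in consumer:
    metric = message.value  # e.g. {"host": "web-01", "cpu_pct": 42.5}
    if metric.get("cpu_pct", 0) > 90:  # assumed alert threshold
        print(f"ALERT: {metric['host']} cpu at {metric['cpu_pct']}%")
```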

Techcore Hyderabad, India

Data Engineer Sep 2021 – Apr 2022

• Built real-time streaming pipelines using Kafka + Spark Structured Streaming, handling 3M+ events per day for analytics (see the streaming sketch at the end of this section).

• Designed and implemented a data lake on AWS (S3 + Glue + Athena) to centralize product and user data analytics.

• Automated ETL workflows using Apache Airflow, reducing manual scheduling and improving data freshness.

• Created modular transformation layers using dbt, improving maintainability and version control of analytics models.

• Developed RESTful data APIs to deliver insights to engineering and product dashboards in real time.

• Built CI/CD pipelines using Jenkins + Docker, ensuring seamless deployment of data projects and ETL updates.

• Designed data quality checks with Great Expectations to detect schema drift and data anomalies proactively.

• Optimized AWS Glue jobs and Redshift queries, achieving 40% faster data processing times.

• Integrated schema registry in Kafka to maintain data integrity and schema evolution across microservices.

• Collaborated with ML teams to build feature stores for predictive analytics, improving model serving efficiency.

• Created data observability dashboards using Prometheus and Grafana for real-time data pipeline monitoring.

• Reduced AWS infrastructure cost by 30% through data partitioning and lifecycle management strategies.
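
A minimal PySpark Structured Streaming sketch in the spirit of the first bullet; the topic, event schema, and one-minute window are illustrative, and it assumes the Spark-Kafka connector package is on the classpath.

```python
# Sketch: consume JSON events from Kafka, count per type in 1-minute windows.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events_stream").getOrCreate()

event_schema = StructType([  # assumed event shape
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "product-events")  # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Windowed aggregation streamed to the console for demonstration.
counts = events.groupBy(F.window("ts", "1 minute"), "event_type").count()

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```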


