
Senior Data Engineer

Location:
St. Augustine, FL
Posted:
July 29, 2025


Resume:

Vishnu Mannam

Email: ***********@*****.*** | Contact: +1-904-***-**** | LinkedIn: Vishnu

Senior Data Engineer

Summary of Experience

* ***** ** ********** ** data engineering and big data solutions development across airline, healthcare, and financial sectors.

Expertise in designing and deploying scalable ETL pipelines using Hadoop, Spark, Hive, and Python.

Strong proficiency in cloud platforms including AWS and Azure for data processing and analytics.

Skilled in handling structured and unstructured data using tools like Kafka, Flume, Sqoop, and NiFi.

Extensive experience with DevOps tools such as Jenkins, Docker, and Kubernetes for CI/CD.

Hands-on with Snowflake for data warehousing and analytics integration.

Experienced in building secure, compliant systems adhering to HIPAA and SOX.

Proven ability in real-time data processing and visualization using Looker, Tableau, and OLAP models.

Technical Skills

Programming Languages: Python, Java, Scala, PL/SQL, SQL, Shell Scripting

Big Data & Distributed Systems: Hadoop, Hive, HDFS, MapReduce, Spark, Kafka, Flume, Sqoop, Oozie, Airflow, Zookeeper, NiFi, Storm

Cloud & DevOps: AWS (EC2, EMR, S3, Lambda, CloudFormation, CloudWatch), Azure (Data Lake, Data Factory, Blob Storage), Jenkins, Docker, Kubernetes, Git, GitHub, CodePipeline, CodeDeploy

Databases: Oracle, Snowflake, SQL Server, PostgreSQL, MySQL, MongoDB, Cassandra, HBase

ETL & BI Tools: Informatica, Talend, SSIS, SSAS, SSRS, Tableau, Looker, SAS

Data Modeling & Warehousing: Star Schema, Snowflake Schema, Erwin, Oracle Data Modeler, OLAP

Operating Systems: Linux, Unix, Windows

Project Experience

Client: American Airlines, Fort Worth, TX | Duration: April 2023 – Present

Role: Senior Data Engineer

Key Contributions:

Built real-time data ingestion pipelines using Kafka, Flume, and Spark to analyze airline operations data (see the sketch after this list).

Developed Snowflake workflows to automate ETL and generate business reports via Looker.

Implemented HIPAA-compliant pipelines for PII/PHI data from partner systems.

Designed Spark-based batch processing to aggregate booking data for analytics dashboards.
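
A minimal sketch of the kind of Kafka-to-Spark streaming ingestion described above. The broker address, topic, schema fields, and storage paths are hypothetical placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

# Minimal sketch (hypothetical broker, topic, schema, and paths): Kafka ->
# Spark Structured Streaming ingestion of booking events, aggregated per route.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("booking-ingest").getOrCreate()

schema = StructType([
    StructField("booking_id", StringType()),
    StructField("route", StringType()),
    StructField("booked_at", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "bookings")                    # hypothetical topic
       .load())

bookings = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(F.from_json("json", schema).alias("b"))
            .select("b.*"))

# Count bookings per route in 10-minute windows, tolerating 15 minutes of lateness.
counts = (bookings
          .withWatermark("booked_at", "15 minutes")
          .groupBy(F.window("booked_at", "10 minutes"), "route")
          .count())

query = (counts.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "s3a://example-bucket/booking_counts/")      # hypothetical path
         .option("checkpointLocation", "s3a://example-bucket/_chk/")  # hypothetical path
         .start())
query.awaitTermination()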

Backend Development:

Developed Hive and Pig scripts to support analytic workflows in Hadoop.

Extended Hive with custom UDFs/UDTFs and managed complex workflows via Oozie.

Used Sqoop to migrate RDBMS data to Hadoop and coordinated batch loads with Zookeeper.

Implemented Python modules for branch and job control using Kafka consumers.

Wrote MapReduce jobs for data cleansing of airline booking feeds.

Automated transformation and enrichment jobs with Scala and SparkSQL.

Ingested structured/unstructured data into HBase and Hive.

Scheduled daily workflows with Oozie and handled streaming data with Kafka.

DevOps & CI/CD:

Developed CI/CD pipelines using Jenkins, GitHub, Docker, and Kubernetes.

Configured scalable infrastructure using AWS EC2, EMR, CloudFormation templates.

Monitored performance with CloudWatch and scaled systems with Auto Scaling Groups.

Containerized Python applications for testing and production deployment.

Implemented role-based access and job-level security policies.

Integrated Prometheus/Grafana for real-time job metrics.

Maintained infrastructure versioning across dev, test, and prod environments.

Automated delivery and logging with ELK Stack.

Database:

Migrated airline booking and ticketing data from Oracle to Snowflake (a loading sketch follows this list).

Created external/managed tables for semi-structured logs and JSON records.

Built stored procedures and pipelines in PL/SQL for real-time job control.

Partitioned Snowflake tables for high-volume analytic queries.

Applied cost optimization with virtual warehouse scaling.

Validated schemas and source connectivity with SQL Developer.

Queried audit logs for lineage and reconciliation analysis.

Developed stored functions for data transformation and filtering.
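
As a rough sketch of loading Oracle extracts into Snowflake with the Python connector, the snippet below issues a COPY INTO from an external stage; the account, credentials, stage, and table names are placeholders, not taken from the project.

# Minimal sketch: load staged booking extracts into Snowflake via the Python connector.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical account identifier
    user="LOADER",
    password="***",              # use a secrets manager in practice
    warehouse="LOAD_WH",
    database="AIRLINE",
    schema="BOOKINGS",
)

try:
    cur = conn.cursor()
    # COPY the latest extract files from an external stage into the target table.
    cur.execute("""
        COPY INTO BOOKINGS.TICKETS
        FROM @BOOKINGS.ORACLE_EXTRACTS/tickets/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
    print(cur.fetchall())        # per-file load results
finally:
    conn.close()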

Environment: AWS, Snowflake, Hive, Pig, Sqoop, Spark, Python, Kafka, HDFS, Jenkins, Docker, Kubernetes, Looker

Client: State of Texas (DSHS), Austin, TX | Duration: June 2021 – March 2023

Role: Big Data Engineer

Key Contributions:

Migrated large-scale healthcare workloads to Azure Databricks using Spark/Scala.

Integrated Kafka-streamed EHR data into HDFS with near real-time pipelines.

Ensured HIPAA compliance with audit-controlled Azure environments.

Built dashboards in Tableau and alerts for SLA monitoring of Kafka queues.

Backend Development:

Rewrote MapReduce programs as Spark jobs for better efficiency.

Built Oozie workflows to schedule multi-stage Hive and Pig jobs.

Created HiveQL for summarizing claims and patient metrics.

Converted batch ingestion to streaming using Kafka and Spark Streaming.

Developed shell scripts for task scheduling and job dependencies.

Parsed unstructured EHR data using Python scripts.

Loaded structured healthcare data into HBase for fast retrieval.

Built custom data validation modules with Python.

DevOps & CI/CD:

Automated infrastructure with Jenkins and Azure DevOps.

Managed Docker containers and Kubernetes deployments on Azure VMs.

Controlled source code via Git and integrated Jira for agile tracking.

Used Azure Monitor for metrics and custom alerts.

Created reusable Azure templates for workspace creation.

Performed daily environment health checks and access audits.

Automated rollouts with container registries and blob syncing.

Monitored ETL pipelines using Kafka lag metrics.

Database:

Created Snowflake schemas to optimize healthcare data access.

Defined warehouse sizing and scheduling rules.

Ingested data from SAP HANA using PySpark and ODBC connectors.

Wrote Python transformations to clean, validate, and deduplicate JSON data (see the sketch after this list).

Used PL/SQL to maintain historical transaction data.

Queried Oracle views for pre-stage health record insights.

Optimized partitions for reporting and cube creation.

Used Azure Blob Storage to stage incoming EHR feeds.
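
As a rough illustration of the JSON cleaning and deduplication work above, the following PySpark sketch drops incomplete records and keeps only the latest version of each patient/encounter pair; the storage paths and field names (patient_id, encounter_id, updated_at) are assumptions, not taken from the project.

# Illustrative sketch: clean, validate, and deduplicate JSON patient records.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("ehr-cleanup").getOrCreate()

# Hypothetical staging container for incoming EHR feeds.
records = spark.read.json("wasbs://staging@example.blob.core.windows.net/ehr/")

# Drop records missing required identifiers and normalise the patient key.
cleaned = (records
           .filter(F.col("patient_id").isNotNull() & F.col("encounter_id").isNotNull())
           .withColumn("patient_id", F.upper(F.trim("patient_id"))))

# Keep only the most recent version of each patient/encounter pair.
latest = Window.partitionBy("patient_id", "encounter_id").orderBy(F.col("updated_at").desc())
deduped = (cleaned
           .withColumn("rn", F.row_number().over(latest))
           .filter(F.col("rn") == 1)
           .drop("rn"))

deduped.write.mode("overwrite").parquet("wasbs://curated@example.blob.core.windows.net/ehr/")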

Environment: Azure, Spark, Kafka, Snowflake, Hive, Pig, HDFS, Tableau, Python, Scala, Docker, Kubernetes

Client: Wells Fargo, Charlotte, NC | Duration: Sep 2019 – May 2021

Role: Big Data Engineer

Key Contributions:

Built data ingestion tools in Scala and Spark for heterogeneous source systems.

Created reusable ETL framework in Python to load finance datasets.

Designed Redshift and S3 interactions with Glue and Lambda.

Ensured encryption and SOX-compliance for critical financial transactions.

Backend Development:

Integrated with Kafka for real-time account transaction processing.

Used Python Flask to expose lightweight endpoints for ingestion triggers (sketched after this list).

Handled transformation logic using Hive/Pig and Spark RDDs.

Ingested bank ledger files using NiFi and custom validation scripts.

Set up batch scheduling using Oozie for data exports.

Optimized SQL joins and partition pruning on Redshift.

Used Spark Streaming for log monitoring and alert generation.

Built user access logging for SOX audits.
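
A hedged sketch of a lightweight Flask endpoint that starts an AWS Glue ingestion run via boto3; the route, Glue job name, and region are hypothetical.

# Minimal sketch: Flask endpoint that triggers a Glue job run for a dataset.
import boto3
from flask import Flask, jsonify

app = Flask(__name__)
glue = boto3.client("glue", region_name="us-east-1")  # assumed region

@app.route("/ingest/<dataset>", methods=["POST"])
def trigger_ingest(dataset):
    # Forward the requested dataset to the Glue job as a job argument.
    run = glue.start_job_run(
        JobName="finance-ingest-job",        # hypothetical Glue job name
        Arguments={"--dataset": dataset},
    )
    return jsonify({"dataset": dataset, "run_id": run["JobRunId"]}), 202

if __name__ == "__main__":
    app.run(port=8080)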

DevOps & CI/CD:

Automated testing and deployment using Jenkins pipelines.

Used Maven and custom shell scripts for version control and packaging.

Deployed services to AWS using EC2 with secure IAM roles.

Monitored services with CloudWatch and Kibana dashboards.

Used Elasticsearch for indexing critical banking logs.

Created regression test suites for CI automation.

Scanned images and logs with vulnerability scanners.

Maintained integration with ticketing systems for DevOps workflows.

Database:

Ingested data to Redshift using Python and AWS Glue jobs (see the sketch after this list).

Built Glue crawlers for schema discovery and SQL querying.

Imported daily logs into Hive from Oracle mainframes.

Wrote shell scripts to move snapshot tables into HDFS.

Extracted data into Elasticsearch for performance dashboards.

Used partitioned tables to handle daily snapshots efficiently.

Scheduled table archiving and vacuuming jobs.

Automated SSIS jobs for SQL Server reconciliation.
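
The following is a minimal sketch of a Glue job that reads a crawler-catalogued table and writes it to Redshift through a catalogued JDBC connection; the database, table, connection, and S3 staging names are hypothetical, and the awsglue imports only resolve inside the Glue runtime.

# Minimal sketch: Glue job loading a catalogued finance dataset into Redshift.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table discovered by a Glue crawler (hypothetical names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="finance_raw", table_name="daily_ledger")

# Write into Redshift through a catalogued JDBC connection, staging via S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "public.daily_ledger", "database": "analytics"},
    redshift_tmp_dir="s3://example-bucket/glue-temp/")

job.commit()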

Environment: AWS, Redshift, Hive, Pig, Spark, Python, Flask, SQL Server, Jenkins, Elasticsearch, Docker, NiFi

Client: TCS, India | Duration: May 2018 – Aug 2019

Role: Hadoop Developer

Key Contributions:

Built scalable data workflows for enterprise ETL using Hive and Pig.

Managed Hadoop clusters and ensured job orchestration using Oozie.

Worked across multiple Hadoop components for parallel data processing.

Gained foundational knowledge of Cassandra and the Avro and Parquet formats.

Backend Development:

Built MapReduce programs for retail data enrichment.

Imported RDBMS sources to Hadoop using Sqoop.

Wrote Hive UDFs to process semi-structured sales feeds.

Parsed system logs using Python and uploaded them to HDFS (see the sketch after this list).

Created Pig scripts to restructure marketing campaign results.

Validated data pipelines with HiveQL and custom counters.

Set up workflow dependencies using Oozie coordinator.

Implemented HDFS archival with compression techniques.
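
A small illustrative Python script for the log-parsing and HDFS-upload step mentioned above; the log format, regular expression, and paths are assumptions, and the upload relies on the standard hdfs dfs CLI being available on the node.

# Minimal sketch: parse application logs to CSV and push the result to HDFS.
import csv
import re
import subprocess
import tempfile

LOG_PATTERN = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>\w+) (?P<msg>.*)$")  # assumed format

def parse_logs(src_path, hdfs_dir):
    with open(src_path) as src, tempfile.NamedTemporaryFile(
            "w", suffix=".csv", delete=False, newline="") as tmp:
        writer = csv.writer(tmp)
        writer.writerow(["timestamp", "level", "message"])
        for line in src:
            match = LOG_PATTERN.match(line.strip())
            if match:  # skip lines that do not match the expected format
                writer.writerow([match["ts"], match["level"], match["msg"]])
        staged = tmp.name
    # Upload the parsed file with the HDFS CLI; assumes the client is installed.
    subprocess.run(["hdfs", "dfs", "-put", "-f", staged, hdfs_dir], check=True)

if __name__ == "__main__":
    parse_logs("/var/log/app/app.log", "/data/raw/logs/")  # hypothetical paths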

DevOps & CI/CD:

Maintained source control using SVN.

Wrote Ant scripts for daily builds.

Configured logs with Flume for consistent ingestion.

Implemented table-level backups and restores.

Managed jobs with Hadoop JobTracker.

Tuned performance via job-level configuration.

Documented batch cycle timing for SLA tracking.

Automated Flume startup monitoring via scripts.

Database:

Stored data in Hive and HBase.

Used Avro and Parquet for format compatibility.

Validated Hive schemas against RDBMS views.

Queried data using HiveQL and Pig Latin.

Cleaned up staging tables using custom cron jobs.

Created schemas in Cassandra for time-series data.

Audited record counts between Hive and Oracle.

Used shell scripting for job-to-table mapping.

Environment: Hadoop, Hive, Pig, Oozie, HDFS, Cassandra, Flume, Python, MapReduce, Sqoop, SVN

Education Details

B.Tech in Mechanical Engineering, Sri Krishnadevaraya University College of Engineering and Technology.


