
Data Engineer Big

Location:
Cincinnati, OH
Posted:
October 15, 2025


Gosula Reddy

DATA ENGINEER

513-***-**** | ****************@*****.*** | LinkedIn | Cincinnati, OH

SUMMARY

Results-driven Data Engineer with 4+ years of experience designing, developing, and deploying scalable data pipelines, ETL workflows, and model-ready datasets across the healthcare, financial, and consulting domains. Skilled in cloud platforms including Azure (ADF, Synapse, Databricks, Blob), AWS (S3, Redshift, EMR, Glue, Lambda, DynamoDB, CodePipeline), and GCP (BigQuery, Dataflow, Dataproc, GCS, Cloud Functions). Proficient in Python, SQL, PL/SQL, PySpark, Scala, dbt, REST APIs, and Big Data technologies, with expertise in Snowflake, Hadoop, Kafka, HBase, Spark Streaming, and Airflow. Experienced in MLOps with MLflow, feature engineering, TensorFlow, scikit-learn, and GenAI workflows such as embedding stores, vector databases, and prompt-tuning datasets for LLM applications. Hands-on with Docker, Kubernetes, Terraform, Informatica, Talend, SaaS deployments, and CI/CD automation, while ensuring strict data governance, security, and measurable business impact in Agile environments.

EDUCATION

• Master of Science in Information Technology from the University of Cincinnati, Cincinnati, OH, USA.

TECHNICAL SKILLS

• Languages: Python, SQL, PL/SQL, Scala, R, XML, XHTML, HTML, AJAX, CSS, HiveQL, Unix, Shell Scripting

• Big Data Technologies: HDFS, YARN, MapReduce, Hive, Spark, Apache Kafka, Zookeeper, MongoDB, Cassandra, Puppet, Avro, Parquet.

• NoSQL Databases: Postgres, HBase, Cassandra, MongoDB, Amazon DynamoDB, Redis

• Databases: Snowflake, MySQL, PostgreSQL, Oracle, Microsoft SQL Server, Redshift, BigQuery

• Cloud Platforms: AWS (S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch, CloudFront), Microsoft Azure, GCP

• ETL Tools: Talend, AWS Glue, Azure Data Factory, Informatica, Apache NiFi, Ab Initio, SSIS

• Business Intelligence Tools: Power BI, Tableau, Looker

• Data Security & Compliance: HIPAA, GDPR, Data Masking, Encryption, IAM

• DevOps & Version Control: Airflow, Jenkins, Docker, Terraform, Git, GitHub, Bitbucket

• AI & ML: RAG, Hugging Face, OpenAI, LangChain, Prompt Engineering, Vector Databases

• Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, Toad, NetBeans

• Other Tools: REST APIs, JSON, JIRA, Agile/Scrum, SDLC

PROFESSIONAL WORK EXPERIENCE

AWS Data Engineer Dec 2024 – Present

Citi USA

• Integrated a real-time trade and market data ingestion pipeline using Apache Kafka, Apache Spark Structured Streaming, and Apache Airflow, processing 1+ million events per hour across Citi’s equities and FX desks and reducing latency by 30% (see the sketch at the end of this section).

• Built an AWS-based data lake architecture using S3, AWS Glue & Athena, managing compliance & settlement datasets stored in Parquet & ORC formats with schema evolution tracked using JSON/XML schema registries.

• Integrated dbt with Snowflake and Redshift, automating data transformations and improving ETL efficiency by 40%.

• Involved in the ingestion, transformation, manipulation, and computation of data using Kinesis, SQL, AWS Glue, and Spark.

• Constructed end-to-end ETL workflows with Talend, PySpark, and SQL, integrating multi-source financial data into Snowflake, improving delivery speed for risk analysis and predictive modeling by 35%.

• Optimized Snowflake performance through clustering, partitioning, and query tuning, reducing query runtime by 45% and enhancing BI/reporting scalability with Tableau and Power BI.

• Streamlined scalable data workflows in AWS using S3, Glue, Lambda, and Step Functions, orchestrated with Airflow and provisioned via Terraform, cutting ETL runtime by 40% through task parallelization and optimized transformations.

• Used Apache Airflow with AWS to orchestrate multi-stage machine learning processes with Amazon SageMaker tasks.

• Implemented MLOps pipelines and GenAI workflows with MLflow, Databricks, TensorFlow, and vector databases to support feature engineering, regulatory compliance automation, and fraud detection systems.

• Integrated REST APIs to ingest market and reference data from Bloomberg, Reuters, and internal platforms into PostgreSQL and Snowflake with end-to-end encryption, SLA adherence, and governance.

• Deployed containerized data workflows with Docker, Kubernetes, and Jenkins CI/CD, ensuring scalable deployments, governance, and compliance in Agile/DevOps environments.

• Worked on Informatica tools such as PowerCenter, MDM, Repository Manager, and Workflow Monitor.

• Collaborated with clients and the solution architect to maintain quality data points at the source, carrying out cleansing, transformation, and integrity maintenance in a relational environment.
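
A minimal sketch of the Kafka-to-Spark Structured Streaming ingestion pattern referenced in the first bullet of this section; the broker address, topic name, event schema, and S3 paths are hypothetical placeholders rather than actual Citi systems.

# Illustrative sketch: read trade events from Kafka, parse them, and land them as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("trade-ingest").getOrCreate()

# Hypothetical schema for a trade event
trade_schema = StructType([
    StructField("trade_id", StringType()),
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the raw event stream from Kafka (broker and topic are placeholders)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "trades")
       .load())

# Parse the JSON payload and keep only the typed trade columns
trades = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), trade_schema).alias("t"))
          .select("t.*"))

# Write the parsed stream to S3 as Parquet with checkpointing (paths are placeholders)
query = (trades.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/trades/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/trades/")
         .outputMode("append")
         .start())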

Azure Data Engineer Nov 2022 – Nov 2023

Deloitte India

• Engineered batch and real-time data pipelines using Apache Spark and Kafka for a global banking client, enhancing regulatory reporting accuracy for Basel III and IFRS-9 compliance across 10 million record datasets.

• Developed robust ETL processes using Informatica, Python, and Azure Data Factory (ADF), transforming raw loan and credit card data, reducing data processing time by 30% while ensuring data quality for downstream analytics.

• Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to HBase.

• Designed modular, testable data models in Snowflake and SQL Server using dbt, cutting model debugging time by 40% and accelerating regulatory report delivery across 5+ financial products.

• Used Python libraries such as Beautiful Soup to extract data from websites.

• Led the migration of 50+ legacy ETL jobs and large datasets into Databricks and Azure Data Factory, orchestrated with Azure DevOps and Git, cutting deployment time by 40% and enabling scalable data integration from ADLS Gen2.

• Dockerized ETL components and deployed them to data-specific ECS clusters using Jenkins and Git.

• Achieved performance improvement on Azure Synapse loading by implementing a dynamic partition switch.

• Loaded data from Oracle and Teradata databases into HDFS using Sqoop.

• Configured, administered, and monitored Azure IaaS and PaaS services, including Azure AD, Azure VMs, and networking (VNets, Load Balancers, App Gateway).

• Automated monitoring and observability by integrating DataDog, CloudWatch, and Elasticsearch with Snowflake, Databricks, and ETL pipelines, reducing downtime and improving system efficiency.

• Built and maintained Docker container clusters managed by Kubernetes, using Linux, Bash, Git, and Docker.

• Designed RAG-based document search systems and ML-ready pipelines, writing 500+ lines of code in Databricks with Python and SQL, enabling accurate regulatory document retrieval and supporting compliance automation initiatives (see the sketch below).
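
A minimal, self-contained sketch of the retrieval step behind a RAG-style document search like the one in the bullet above; in a real pipeline the vectors would come from an embedding model and a vector database, so the toy embeddings and document names here are hypothetical.

# Illustrative sketch: rank stored document embeddings by cosine similarity to a query embedding.
import numpy as np

def top_k_documents(query_vec, doc_vecs, documents, k=3):
    """Return the k documents whose embeddings are most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                        # cosine similarity against every document
    best = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return [(documents[i], float(scores[i])) for i in best]

# Toy example: 4 hypothetical documents with random 8-dimensional embeddings
rng = np.random.default_rng(0)
docs = ["policy_a.pdf", "policy_b.pdf", "basel_iii_summary.pdf", "ifrs9_note.pdf"]
doc_embeddings = rng.normal(size=(4, 8))
query_embedding = rng.normal(size=8)

print(top_k_documents(query_embedding, doc_embeddings, docs, k=2))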

Data Engineer Jan 2021 – Oct 2022

Optum India

• Automated healthcare ETL workflows using Cloud Composer and Dataflow, reducing manual intervention by 35% and improving SLA adherence for monthly claims data refreshes.

• Built scalable data pipelines in Dataproc with PySpark and Delta Lake, ingesting 5M+ healthcare records daily and cutting processing time from 4 hours to under 2.

• Experienced with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.

• Optimized BigQuery datasets for large-scale medical records and designed data marts in Snowflake and SQL Server, reducing query runtime by 40% and accelerating analytics in Looker and Tableau.

• Wrote Airflow DAGs orchestrating end-to-end pipelines across Cloud Storage, Dataflow, and BigQuery, ensuring reliable multi-application data delivery (see the sketch at the end of this section).

• Worked with query languages such as SQL, programming languages such as Python and C#, and scripting languages such as PowerShell, M-Query (Power Query), and Windows batch commands.

• Configured Google VPC to create a public-facing subnet for web servers with internet access and private subnets for backend databases and application servers without external exposure.

• Helped automate deployments with Cloud Build and Git-based CI/CD, integrating Snowflake and SQL Server optimizations to boost dashboard responsiveness by 25%.

• Developed multiple notebooks using PySpark and Spark SQL in Databricks to extract, analyze, and transform data according to business requirements.

• Created cost-optimization dashboards in Data Studio for billing and usage monitoring, enabling query tuning and measurable GCP savings.

• Engineered data validation pipelines using Python, Apache Beam, and SQL to reconcile raw source files with BigQuery tables, detecting 250+ anomalies monthly.
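
A minimal sketch of an Airflow DAG of the shape described in the Airflow bullet above (validate files in Cloud Storage, run a Dataflow transform, load into BigQuery); the DAG id, schedule, and task callables are hypothetical placeholders.

# Illustrative sketch: three-stage GCS -> Dataflow -> BigQuery pipeline as an Airflow DAG.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def validate_gcs_files(**context):
    """Placeholder: check that the period's claim files landed in Cloud Storage."""
    ...

def launch_dataflow_job(**context):
    """Placeholder: kick off the Dataflow job that cleans and conforms the files."""
    ...

def load_to_bigquery(**context):
    """Placeholder: load the conformed output into the BigQuery claims dataset."""
    ...

with DAG(
    dag_id="claims_refresh",          # hypothetical DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@monthly",     # matches a monthly claims refresh cadence
    catchup=False,
) as dag:
    validate = PythonOperator(task_id="validate_gcs_files", python_callable=validate_gcs_files)
    transform = PythonOperator(task_id="launch_dataflow_job", python_callable=launch_dataflow_job)
    load = PythonOperator(task_id="load_to_bigquery", python_callable=load_to_bigquery)

    validate >> transform >> load     # run the three stages in order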

CERTIFICATIONS

• Google Associate Cloud Engineer (ACE)


