SAI SREE
Data Engineer
Email: ************@*****.*** PH: 913-***-****
LinkedIn: https://www.linkedin.com/in/saisree-chowdhary-2aa939231/
AWS Certified Solutions Architect
Summary
Experienced Data Engineer with over 5 years of success in designing and delivering scalable data pipelines and analytics platforms across AWS and Azure. Proven track record of improving data processing performance by 40% through Apache Spark, Python, and SQL-based optimizations. Specialized in real-time data ingestion using Kafka and Flink, and workflow orchestration using Airflow and AWS Glue. Strong focus on data quality with tools like Great Expectations and Apache Atlas. Skilled in CI/CD integration via Jenkins and GitHub Actions, ensuring stable and high-integrity data deployments. Adept at collaborating in Agile environments, mentoring junior engineers, and aligning technical solutions with business and compliance requirements. Holds a Master’s in Computer Science and AWS Solutions Architect & Azure Data Engineer certifications.
Professional Skills
Programming Languages
Python, SQL, Scala
Data Warehousing & Storage
Amazon Redshift, Google BigQuery, Snowflake, PostgreSQL, MySQL, MongoDB, Cassandra, HBase
Big Data Frameworks
Apache Hadoop, Apache Spark, Apache Kafka, Apache Airflow
ETL Tools
Talend, Apache NiFi, dbt (Data Build Tool)
Cloud Platforms & Services
AWS (S3, Glue, Lambda, Redshift), GCP (BigQuery, Cloud Storage, Dataflow), Microsoft Azure (Data Lake, Synapse)
Data Modeling & Schema Mgmt
dbt, ERwin Data Modeler, Parquet, ORC
Version Control & Collaboration
Git (GitHub, GitLab, Bitbucket), Jira, Confluence
Data Integration & APIs
REST APIs, GraphQL
Monitoring & Logging
Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), DataDog
Containerization & Infrastructure as Code
Docker, Kubernetes, Terraform
Data Security & Compliance
Apache Ranger, AWS IAM, Data Encryption (TLS/SSL, KMS)
DevOps & Automation
Jenkins, CircleCI, Ansible, Chef, Puppet
Professional Experience
Data Engineer Jan 2025 to Present
JPMorgan Chase, Phoenix, AZ
Built scalable ETL pipelines using PySpark on AWS Glue to process terabytes of security analytics data, optimizing for throughput and cost.
Integrated structured and semi-structured data from diverse sources into AWS S3, Redshift, and Glue Catalog, enabling robust analytics.
Orchestrated end-to-end workflows using AWS Glue Workflows, Step Functions, and Triggers for modular and maintainable pipelines.
Created external tables in Glue Catalog with Parquet/ORC formats, supporting ad-hoc queries via Athena and Redshift Spectrum.
Automated infrastructure provisioning using Terraform for Glue jobs, IAM roles, S3 buckets, and Redshift access policies.
Streamed real-time logs using Kafka and Kinesis into S3-based data lakes with schema evolution support and efficient partitioning.
Developed robust error handling, logging, and alerting using CloudWatch, Lambda, and custom monitoring scripts.
Built reusable Python modules for S3 I/O, data validation, and config management across environments.
Conducted performance tuning in Spark and PySpark with optimized memory, partitioning, and caching strategies.
Designed and implemented dbt models in Snowflake for staging, cleansing, and transformation layers with CI/CD via GitHub Actions.
Used Databricks notebooks for exploratory data processing and collaborative development on semi-structured datasets.
Developed complex SQL transformations in Redshift to support dashboards, reporting layers, and downstream analytics tools.
Collaborated with analysts, product managers, and QA teams to align data models and pipeline logic with business requirements.
Supported UAT and QA testing with synthetic data generation, schema validation, and pipeline output comparison.
Documented pipeline design, failure scenarios, operational procedures, and best practices for cross-team handover.
Participated in Agile ceremonies, sprint planning, and demos, ensuring timely and transparent delivery of data solutions.
Data Engineer Feb 2024 to Jan 2025
Regions Bank, Alabama, USA
Developed robust data pipelines in AWS Glue using PySpark to ingest, transform, and partition raw data from S3 into Parquet for analytical consumption.
Utilized AWS Glue Catalog and Crawlers to register dynamic metadata for external tables queried via Athena and Redshift Spectrum.
Wrote advanced Spark SQL and PySpark DataFrame queries for complex business logic including aggregations, window functions, and conditional joins.
Automated ETL workflows using AWS Step Functions and Lambda, coordinating multi-stage data processing pipelines with conditional branching.
Tuned PySpark jobs with memory optimizations, repartitioning, and caching, reducing job runtimes on large datasets.
Processed semi-structured JSON, CSV, and XML data using custom schema mapping and Spark’s schema inference features.
Loaded transformed datasets into Amazon Redshift using COPY commands with manifest file support and column mapping.
Built reusable Python modules for data cleaning, null handling, date normalization and custom string parsing logic.
Implemented data reconciliation scripts using SQL and Python, validating counts and quality between source and target layers.
Conducted performance tuning of DataStage jobs by optimizing partitioning and memory settings, reducing job runtime significantly.
Implemented CI/CD pipelines for dbt using GitHub Actions to validate and deploy transformations automatically.
Managed data extraction from SQL Server, PostgreSQL, and Oracle using AWS DMS, JDBC connections, and PySpark connectors.
Designed and executed SQL queries for data profiling, validation, and complex joins to support data modeling efforts.
Supported on-premises data pipelines using Apache Hive on Hadoop, processing large volumes of legacy operational data.
Built and maintained Hive tables on HDFS, leveraging partitioning, bucketing, and ORC formats for performance.
Used Hue and Beeline for running exploratory HiveQL queries and managing job diagnostics in Hadoop environments.
Migrated legacy Hadoop pipelines to AWS Glue, replicating existing logic in PySpark and improving execution times.
Extracted large batch datasets from Hadoop using Sqoop, transforming them for cloud ingestion.
Configured and used S3 lifecycle policies and KMS encryption for secure and cost-effective storage management.
Scheduled and orchestrated dbt runs via Airflow and dbt Cloud for production workflows.
Set up CloudWatch log streams and metrics alarms to monitor Glue job health and trigger recovery logic on failure.
Applied Athena views for cross-source reporting and federated querying across S3 and on-premise SQL systems.
Created custom AWS Glue classifiers and transformations for non-standard file parsing and ETL staging.
Used Git and Jenkins to implement CI/CD for Glue and Python job deployments across dev, staging, and prod environments.
Performed performance tuning and cost optimization across data workflows using Spark UI, CloudWatch, and Redshift WLM queues.
Ensured governance compliance using row-level access controls, data masking, and encryption across staging and production zones.
Data Engineer Jun 2019 to Jul 2022
Wipro, Hyderabad, India
Designed and built scalable, high-performance data pipelines using Apache Spark, Python, and SQL, supporting batch and real-time data processing.
Created robust ETL frameworks to ingest, clean, and transform data into AWS Redshift and Snowflake, enhancing analytics readiness.
Led full migration of on-prem systems to cloud platforms (AWS & Azure), cutting infrastructure costs by 30% and boosting uptime.
Developed real-time data solutions using Apache Kafka and Spark Streaming, enabling low-latency insights for critical business processes.
Automated data ingestion and transformation with Python, eliminating manual effort and saving over 20 hours/week in team resources.
Managed complex data workflows using Apache Airflow, increasing reliability and visibility of scheduled data jobs.
Partnered with cross-functional teams to define data models, KPIs, and reporting pipelines based on business objectives.
Worked closely with data scientists to prepare structured datasets, accelerating model development and deployment.
Built data validation layers using SQL and Python to ensure data quality, completeness, and consistency.
Tuned SQL queries and optimized warehouse performance, reducing reporting latency by over 40%.
Documented data workflows, transformation logic, and architecture diagrams using Confluence and Lucidchart for team onboarding.
Integrated third-party APIs for real-time data synchronization with client-facing dashboards and analytics tools.
Conducted performance tuning of Spark jobs and cloud resources, reducing compute costs and improving pipeline speed.
Created modular, reusable code libraries in Python to standardize transformation logic across different projects.
Mentored junior engineers in data engineering best practices, coding standards, and cloud-native tool adoption.
Data Analyst Mar 2018 to Apr 2019
Cognizant, Hyderabad, India
Analyzed large datasets to identify key insights and trends, enhancing data-driven decision-making processes.
Utilized advanced Excel, SQL, and Python to manipulate data, generate reports, and automate data workflows.
Designed and developed interactive dashboards using Tableau and Power BI to visualize key business metrics and improve decision-making.
Collaborated with cross-functional teams to define business requirements and provide actionable insights tailored to specific organizational needs.
Conducted ad-hoc data analysis to support strategic initiatives, identifying opportunities for process improvements and operational efficiency.
Performed data cleaning and transformation tasks to ensure data accuracy, consistency, and integrity for actionable insights.
Presented findings and recommendations to senior management to drive business strategy and facilitate key decisions.
Developed and maintained data pipelines for efficient data extraction, transformation, and loading (ETL) processes.
Provided ongoing support for data systems, identifying potential issues and troubleshooting data discrepancies.
Automated repetitive data tasks using Python and SQL, freeing up time for more strategic analysis and business development.
Coordinated with external vendors and third-party tools to ensure seamless data integration and consistent reporting across platforms.
Conducted trend analysis and developed predictive models to support long-term forecasting and business planning.
Participated in the design and implementation of data governance frameworks to ensure compliance with data privacy and regulatory standards.
Collaborated with marketing teams to analyze customer data and improve segmentation strategies, resulting in more targeted marketing campaigns.
Developed training materials and conducted workshops to upskill team members on data analysis tools and techniques, promoting knowledge sharing.
Academics
Master’s in Computer Science, Northern Arizona University, Flagstaff, AZ Aug 2022 to Dec 2023
Bachelor’s in Computer Science & Engineering, Anurag Group of Institutions, India Aug 2015 to Apr 2019
Certifications
AWS Solutions Architect
Azure Data Engineer
Database Programming with SQL