
Data Engineer Big

Location:
Irving, TX
Posted:
October 15, 2025


SRIRAM TEJA

AWS DATA ENGINEER

Phone: 469-***-**** | Email: ************@*****.*** | LinkedIn: https://www.linkedin.com/in/sriramteja8/

PROFILE SUMMARY:

Passionate Data Engineer with proven expertise in building scalable, high-performance data pipelines and ETL/ELT workflows that transform raw data into actionable business insights.

Skilled in leveraging modern data tools, cloud platforms, and big data technologies to solve complex problems, optimize operations, and drive data-driven decision-making.

Designed and optimized end-to-end ETL pipelines to extract, transform, and load data from multiple sources, ensuring data quality, consistency, and reliability.

Modernized legacy ETL processes to cloud-native ELT architectures, improving scalability, maintainability, and processing speed.

Managed ETL workflows in Informatica PowerCenter, using XML-based exports/imports to enable smooth migration across environments.

Hands-on experience with Snowflake (SnowSQL, Snowpipe) and Python for building Big Data models and automating data ingestion workflows.
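
For illustration, a minimal sketch of this kind of ingestion automation using the snowflake-connector-python package; the account, credentials, warehouse, stage, and table names are hypothetical placeholders, and Snowpipe would run the same COPY INTO automatically on file-arrival notifications:

    import snowflake.connector

    # Connection values and object names (ETL_WH, ANALYTICS.RAW, RAW_EVENTS,
    # MY_STAGE) are hypothetical placeholders.
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="***",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # COPY INTO loads files from a stage; Snowpipe automates the same
        # statement whenever new files land.
        cur.execute("COPY INTO RAW_EVENTS FROM @MY_STAGE FILE_FORMAT = (TYPE = PARQUET)")
        print(cur.fetchall())  # per-file load status
    finally:
        conn.close()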

Leveraged Databricks with PySpark for distributed processing, Delta Lake for optimized storage, and Unity Catalog for centralized data governance.

Developed reusable Python modules to interact with cloud services (AWS Boto3, Azure SDK), APIs, and databases, reducing manual intervention.
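
As a sketch of such a reusable module, the following boto3 helpers wrap two common S3 operations; bucket and prefix values are supplied by the caller:

    import boto3

    def upload_file(bucket: str, key: str, path: str) -> None:
        """Upload a local file to S3."""
        boto3.client("s3").upload_file(path, bucket, key)

    def list_keys(bucket: str, prefix: str = "") -> list:
        """Return every object key under a prefix, handling pagination."""
        s3 = boto3.client("s3")
        keys = []
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        return keys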

Extensive experience with HDFS, MapReduce, Sqoop, Flume, Hive, HBase, Apache Pig, Oozie, and Zookeeper for batch and streaming data processing.

Orchestrated and monitored ETL pipelines using Apache Airflow to enhance reliability and operational efficiency.
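
A minimal sketch of an Airflow DAG of the kind described above; the DAG id, schedule, and task callables are hypothetical placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder callables standing in for real extract/transform/load logic.
    def extract():
        print("pull from sources")

    def transform():
        print("clean and model")

    def load():
        print("write to warehouse")

    with DAG(
        dag_id="daily_etl",            # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",             # "schedule_interval" on Airflow < 2.4
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2 >> t3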

Engineered scalable ETL pipelines using AWS Glue, EMR, and Redshift for high-performance data transformation and warehousing.

Developed real-time streaming pipelines with PySpark and Kafka for efficient ingestion and analytics.
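
For illustration, a minimal Spark Structured Streaming job that reads from Kafka and lands Parquet files, in the spirit of the pipelines above; the broker, topic, and S3 locations are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Broker, topic, and S3 locations below are hypothetical.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "telemetry-events")
        .load()
        .select(col("value").cast("string").alias("payload"))
    )

    query = (
        events.writeStream.format("parquet")
        .option("path", "s3://bucket/events/")
        .option("checkpointLocation", "s3://bucket/checkpoints/events/")
        .start()
    )
    query.awaitTermination()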

Implemented robust data workflows in Scala, optimizing performance for both batch and streaming applications.

Designed and optimized relational database schemas, queries, and indexing (MySQL, Oracle, PostgreSQL) to ensure data integrity and performance.

Built and maintained NoSQL solutions using MongoDB for flexible and efficient data retrieval.

Hands-on experience with a wide range of AWS services: EC2, RDS, VPC, IAM, ELB, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and more.

Built event-driven pipelines using AWS Lambda, SQS, and SNS for asynchronous processing.
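
A sketch of that event-driven pattern: a Lambda handler drains an SQS batch and publishes results to SNS; the topic ARN and message fields are hypothetical:

    import json

    import boto3

    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:processed"  # hypothetical

    def handler(event, context):
        # Each invocation receives a batch of SQS messages.
        for record in event["Records"]:
            body = json.loads(record["body"])
            # ... process the message ...
            sns.publish(
                TopicArn=TOPIC_ARN,
                Message=json.dumps({"id": body.get("id"), "status": "done"}),
            )
        return {"processed": len(event["Records"])}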

Developed interactive analytics and data warehouses with Redshift and Athena.

Automated infrastructure provisioning with Terraform for consistent, version-controlled deployments.

Implemented CI/CD pipelines with Jenkins, AWS CodePipeline/Build/Deploy, and Docker, integrating infrastructure-as-code practices.

Used Git and Bitbucket for version control and collaborative development within Agile teams.

Collaborated with cross-functional teams to gather requirements, troubleshoot issues, and maintain comprehensive technical documentation.

TECHNICAL SKILLS

Programming Languages

Python, SQL, Java, Scala, R

Databases & Data Warehousing

MySQL, PostgreSQL, SQL Server, MongoDB, Cassandra, HBase, Snowflake, Amazon Redshift, Google BigQuery

Big Data Technologies

Hadoop (HDFS, MapReduce, YARN, Sqoop, Flume, Oozie, Zookeeper), Hive, Pig, Impala, Kafka, Spark, PySpark, Apache Airflow

Cloud Platforms

AWS: S3, Lambda, EMR, EC2, RDS, Redshift, SNS, SQS, IAM, Kinesis

Azure: Data Lake, Synapse, Databricks

Google Cloud Platform: BigQuery, Dataflow, Cloud Storage

ETL & Data Integration Tools

Informatica, Talend, AWS Glue

Data Visualization & BI Tools

Tableau, Power BI

CI/CD & Version Control

Git, Bitbucket, Maven, SBT, GitHub Actions, AWS CodePipeline, AWS CodeBuild, AWS CodeDeploy, Azure DevOps Pipelines

Data Processing Frameworks

Apache Hadoop, Apache Spark, Apache Flink, Apache Beam

Analytical Skills

Data Modeling, Data Quality, Root Cause Analysis, Trend Analysis, Forecasting, Business KPI Reporting

Project & Workflow Management

Agile Methodologies, Cross-Functional Collaboration, Data Governance, Quality Assurance

WORK EXPERIENCE

Ryder – Miami, FL

AWS Data Engineer September 2023 – Present

Responsibilities:

Designed and implemented an internal Fleet Analytics Dashboard using Amazon QuickSight and a ReactJS + AWS API Gateway application to track vehicle utilization, fuel efficiency, driver performance, and delivery delays.

Developed real-time and batch data ingestion pipelines using AWS Glue, AWS Lambda, Amazon Kinesis, and Apache Kafka to integrate telematics, GPS, and IoT datasets from multiple sources.

Built and optimized ETL/ELT workflows using PySpark on AWS EMR and Databricks (Delta Lake, Unity Catalog) to cleanse, aggregate, and model high-volume fleet data.

Modeled and stored processed datasets in Amazon Redshift and Snowflake to enable low-latency querying and analytics.

Created SPICE datasets and parameterized reports in QuickSight to improve dashboard performance by 40% and enhance end-user experience.

Integrated backend APIs with AWS API Gateway and AWS Lambda, enabling dynamic data retrieval for the React-based UI.

Developed custom Python scripts for KPI computation, anomaly detection, and trend analysis, improving operational insight accuracy by 30%.

Implemented data governance and access control using AWS IAM, KMS encryption, and VPC configurations to protect sensitive driver and operational data.

Automated data processing workflows using AWS Step Functions and Apache Airflow, improving pipeline reliability and reducing manual intervention.

Configured CloudWatch, CloudTrail, and QuickSight anomaly detection for proactive monitoring, alerting, and root cause analysis of data issues.

Built and deployed infrastructure using Terraform and AWS CloudFormation, ensuring consistent and version-controlled resource provisioning.

Developed CI/CD pipelines using AWS CodePipeline and AWS CodeBuild for automated deployment of data pipelines and application updates.

Tuned Redshift and Snowflake queries for complex analytical workloads, reducing query execution time by up to 35%.

Integrated streaming analytics to provide near real-time visibility into fleet KPIs using Kinesis Data Analytics and Spark Structured Streaming.

Applied data quality frameworks to validate incoming datasets, ensuring accuracy, completeness, and compliance with business rules.

Collaborated with cross-functional teams including data analysts, operations managers, and BI developers to refine KPIs and deliver actionable insights.

Conducted performance testing and optimization for both frontend (React) and backend APIs to ensure scalability for enterprise-wide use.

Delivered Agile sprint demos and maintained technical documentation for data pipelines, architecture, and dashboard features to support long-term maintainability.

Ensured compliance with internal data security policies and industry standards by implementing encryption, masking, and secure network configurations.

Braintree – Chicago, IL

Cloud Data Engineer April 2021 – August 2023

Roles & Responsibilities:

Designed and implemented an automated daily reconciliation pipeline to match merchant payouts with bank transaction reports, eliminating manual finance team intervention.

Built ETL workflows using AWS Glue to extract payout and bank transaction data from Amazon S3, transform datasets, and prepare them for reconciliation analysis.

Utilized Amazon Athena to run optimized SQL queries on transformed data for settlement matching, discrepancy identification, and financial accuracy checks.

Developed automated reconciliation reports in Amazon QuickSight and scheduled CSV exports for the Finance team, reducing reporting time by 70%.

Integrated Python scripts to handle data parsing, enrichment, and discrepancy flagging, ensuring high accuracy in settlement records.

Leveraged AWS Lambda functions for event-driven execution of reconciliation jobs and automated report generation.

Implemented S3 event triggers to initiate Glue jobs upon data arrival, enabling near real-time reconciliation processing.
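
For illustration, a minimal Lambda handler for this trigger pattern: it reads the S3 event and starts a Glue job run, passing the new object's path as a job argument; the job name is hypothetical:

    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        # One invocation may carry several S3 object-created records.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            glue.start_job_run(
                JobName="reconciliation-etl",  # hypothetical job name
                Arguments={"--input_path": f"s3://{bucket}/{key}"},
            )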

Applied data validation rules within Glue ETL to ensure completeness, accuracy, and compliance with financial data governance standards.

Secured financial datasets using AWS IAM roles, KMS encryption, and VPC configurations, ensuring PCI DSS compliance.

Configured CloudWatch for monitoring pipeline execution, error alerts, and performance metrics.

Optimized Athena queries using partitioning and compression techniques to improve reconciliation performance by 40%.
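
A sketch of that optimization: an Athena CTAS statement, issued through boto3, that rewrites raw data as Snappy-compressed Parquet partitioned by date; the database, table, and output locations are hypothetical:

    import boto3

    athena = boto3.client("athena")

    # Database, table, and locations are hypothetical; partition columns
    # must come last in the SELECT list.
    CTAS = """
    CREATE TABLE recon.payouts_parquet
    WITH (format = 'PARQUET',
          parquet_compression = 'SNAPPY',
          partitioned_by = ARRAY['settlement_date'])
    AS SELECT merchant_id, amount, settlement_date
    FROM recon.payouts_raw
    """

    athena.start_query_execution(
        QueryString=CTAS,
        ResultConfiguration={"OutputLocation": "s3://bucket/athena-results/"},
    )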

Developed version-controlled infrastructure using Terraform for consistent deployment of AWS resources.

Created CI/CD pipelines with AWS CodePipeline and CodeBuild to automate ETL deployment and report updates.

Collaborated with Finance, Data Analytics, and Operations teams to define reconciliation logic, reporting formats, and KPI tracking.

Conducted unit testing and QA validation on settlement logic to ensure zero mismatches before production rollout.

Documented end-to-end pipeline architecture, reconciliation workflows, and data mappings for compliance and audit readiness.

Delivered training sessions to Finance teams on using QuickSight dashboards for payout tracking and discrepancy resolution.

Supported Agile sprints with regular status updates, backlog refinement, and demo sessions to business stakeholders.

Best Buy – Richfield, MN

Big Data Engineer March 2018 – March 2021

Roles & Responsibilities:

Designed and implemented end-to-end data pipelines for ingesting, processing, and transforming inventory and supply chain data from multiple sources, including ERP, POS, and warehouse management systems.

Utilized AWS services such as S3 for data storage, Glue for ETL workflows, Athena for ad-hoc querying, and Redshift for data warehousing to enable real-time analytics.

Developed Spark/PySpark jobs for large-scale data processing, data cleansing, and enrichment to ensure high-quality, reliable data for reporting and forecasting.
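
For illustration, a minimal PySpark cleansing job in the spirit of the above; the column names and S3 paths are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("inventory-cleanse").getOrCreate()

    raw = spark.read.parquet("s3://bucket/raw/inventory/")  # hypothetical path
    clean = (
        raw.dropDuplicates(["sku", "store_id", "snapshot_date"])
        .filter(F.col("on_hand_qty") >= 0)  # drop impossible counts
        .withColumn("snapshot_date", F.to_date("snapshot_date"))
        .fillna({"bin_location": "UNKNOWN"})
    )
    clean.write.mode("overwrite").partitionBy("snapshot_date").parquet(
        "s3://bucket/curated/inventory/"
    )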

Implemented data modeling and schema design to optimize storage and query performance for inventory and supply chain datasets.

Built automated dashboards and reports using QuickSight to provide insights on inventory levels, demand trends, and supply chain efficiency for business stakeholders.

Collaborated with supply chain analysts and business teams to understand business requirements, translate them into technical specifications, and implement data solutions that support strategic decisions.

Monitored, debugged, and optimized ETL pipelines to ensure data accuracy, consistency, and timeliness.

Applied version control using Git and participated in CI/CD processes to deploy ETL workflows and analytics solutions efficiently.

Ensured data security and compliance by implementing IAM roles, access policies, and encryption for sensitive supply chain data.

Provided insights on inventory replenishment, demand forecasting, and supply chain bottlenecks, leading to improved operational efficiency and cost reduction.

Automated routine data validation and reconciliation processes to minimize manual intervention and reduce errors.

Designed and implemented alerting mechanisms for data anomalies and pipeline failures to proactively address issues.

Leveraged AWS Lambda to trigger serverless ETL workflows for real-time inventory updates and notifications.

Implemented distribution keys, sort keys, and compression encodings in Redshift to optimize query performance and reduce costs, as sketched below.
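
A sketch of that tuning approach: a Redshift table defined with a distribution key, sort key, and column encodings, created through psycopg2; connection details, table, and column names are hypothetical:

    import psycopg2

    # The distribution key co-locates rows joined on sku; the sort key speeds
    # date-range scans; ENCODE picks per-column compression.
    DDL = """
    CREATE TABLE inventory_fact (
        sku           VARCHAR(32) ENCODE zstd,
        store_id      INTEGER     ENCODE az64,
        on_hand_qty   INTEGER     ENCODE az64,
        snapshot_date DATE        ENCODE az64
    )
    DISTSTYLE KEY DISTKEY (sku)
    SORTKEY (snapshot_date);
    """

    with psycopg2.connect(
        host="cluster.example.redshift.amazonaws.com",  # hypothetical endpoint
        port=5439, dbname="analytics", user="etl_user", password="***",
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)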

Conducted root cause analysis for data discrepancies and implemented corrective actions to improve pipeline reliability.

Performed data enrichment by integrating external datasets, improving forecasting accuracy and decision-making.

Applied role-based access control and auditing for sensitive supply chain data to meet compliance standards.

Optimized pipeline performance using parallel processing, caching, and memory tuning in Spark/PySpark.

Developed reusable Python and PySpark scripts for ETL automation, reducing development time for new pipelines.

Prepared technical documentation, runbooks, and knowledge-sharing materials to ensure smooth handover and maintainability.

EDUCATION:

Master's in Computer Science, University of North Carolina at Charlotte, NC.


