Hruday Kumar Reddy Poreddy
Email: *************@*****.***
Mobile: +1-716-***-****
Data Engineer
PROFESSIONAL SUMMARY:
Data Engineer with 4+ years of experience delivering data solutions, combining analytical thinking, problem-solving, and attention to detail with strong communication skills for presenting technical information to diverse audiences.
Proficient in connecting the dots across applications to build end-to-end views, communicating both technical detail and executive summaries, and writing and analyzing complex PL/SQL queries and stored procedures; works well with minimal supervision.
Experienced with Oracle Exadata and Oracle 10g and above, adept with the Microsoft Office suite, and skilled at working within Agile/Scrum teams on prioritization and resource assignment; a team player who can influence and guide others.
Delivered integrated data solutions while identifying priorities and managing multiple projects simultaneously, with working knowledge of effort and financials estimation and a willingness to ask questions and seek assistance when needed.
Designed end-to-end data solutions that improved observability and automated failure handling across data platforms.
Developed robust architectures supporting parallel processing of large datasets for real-time analytics, with careful attention to data governance and performance optimization.
Automated ingestion workflows for real-time event processing, improving SLA adherence and reducing latency in enterprise dashboards; proficient with query tools and PL/SQL for data analysis.
Designed partitioned datasets to optimize query cost and performance, implemented access control through authorized views, and enforced governance in cross-functional analytical collaboration.
Prototyped customer segmentation workflows, including preprocessing pipelines and feature engineering, improving accuracy while maintaining data quality.
Built centralized data lakes with ingestion pipelines feeding transformation layers and dashboards, enabling unified reporting for business users.
Hands-on experience enabling hybrid-cloud architectures and seamless data migration across platforms for regulatory compliance.
Developed PySpark and Delta Lake pipelines implementing modular, ACID-compliant data transformations that significantly improved job runtimes and reduced reconciliation issues in financial reporting (an illustrative sketch follows this summary).
Experienced in logical and physical data modeling with star and snowflake schemas to support OLAP queries and high-performance analytics on structured datasets.
Strong history of collaboration with data scientists and DevOps teams to detect inefficiencies, redesign workflows, and implement scalable pipeline architectures that reduced SLA breaches and system load.
Implemented CI/CD pipelines for automated deployment of ETL workflows, including Terraform-based infrastructure provisioning, increasing team agility in production rollouts.
Applied data quality validation to enforce schema rules, detect anomalies, and ensure accuracy in large-scale datasets before consumption.
Skilled in Python, SQL, and Shell scripting for ETL automation, data preprocessing, and Airflow-based orchestration, with monitoring in place to ensure uptime and performance.
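A minimal sketch of the ACID-compliant Delta Lake upsert pattern referenced above, assuming a PySpark session configured with the Delta extensions and the delta-spark package available; the S3 paths, column names, and the updates source are hypothetical placeholders, not code from the projects below.

    # Minimal sketch of an ACID-compliant upsert with Delta Lake (hypothetical paths and columns).
    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = (SparkSession.builder
             .appName("delta-upsert-sketch")
             .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
             .config("spark.sql.catalog.spark_catalog",
                     "org.apache.spark.sql.delta.catalog.DeltaCatalog")
             .getOrCreate())

    # Incoming batch of transaction updates (hypothetical staging path).
    updates_df = spark.read.parquet("s3://example-bucket/staging/transactions/")

    # Merge into the curated Delta table so concurrent readers always see a consistent snapshot.
    target = DeltaTable.forPath(spark, "s3://example-bucket/curated/transactions/")
    (target.alias("t")
     .merge(updates_df.alias("u"), "t.transaction_id = u.transaction_id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())

The MERGE applies late-arriving records atomically, which is the property that helps reduce reconciliation issues in downstream financial reporting.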
TECHNICAL SKILLS:
Databases - PostgreSQL, MySQL, Oracle, Microsoft SQL Server, DynamoDB, MongoDB, Cassandra, Oracle Exadata
Programming Languages - Python, SQL, PySpark, Scala, Shell Scripting, Java, PL/SQL
DevOps & Orchestration - GitHub Actions, AWS CodePipeline, Jenkins, Terraform, Docker, Kubernetes, CI/CD, CloudFormation, Git, Bitbucket
Others - Microsoft Office suite
PROFESSIONAL EXPERIENCE:
Capital One September 2024 – Present
Data Engineer
Responsibilities:
Designed scalable data ingestion pipelines using AWS Glue and S3, consolidating financial data for real-time compliance analytics and improving the end-to-end view of financial transactions.
Integrated Amazon Redshift with downstream tools to enable real-time forecasting and fraud analysis, using strong PL/SQL skills to optimize queries and stored procedures for analytical teams.
Developed modular PySpark ETL workflows in AWS Glue to convert semi-structured JSON and CSV banking feeds into schema-compliant tables, reducing transformation complexity and processing time (an illustrative sketch follows this section).
Implemented AWS Step Functions to orchestrate multi-stage ETL pipelines with conditional logic, failure recovery, and alerting, improving the reliability and auditability of data workflows supporting customer portfolio analytics.
Built real-time streaming pipelines using Amazon Kinesis and Lambda to capture and enrich customer transaction events, reducing fraud-alert latency while integrating historical data patterns into anomaly detection (a simplified handler sketch follows this section).
Applied validation at transformation stages using AWS Deequ and Python logic to enforce business rules, ensure referential integrity, and prevent ingestion of incomplete records into Redshift and analytical data marts.
Engineered reusable, parameterized Glue jobs supporting schema evolution and dynamic table transformations across multiple clients, reducing deployment artifacts and maintenance overhead.
Collaborated with financial analysts and compliance leads to establish secure data zones using IAM policies, encrypted S3 buckets, and Redshift access layers aligned with APRA and SOX regulatory standards.
Used S3 lifecycle rules and Glacier storage to optimize storage costs for low-frequency regulatory snapshots while ensuring on-demand retrieval for historical audits and compliance reviews.
Built CI/CD pipelines using GitHub Actions and CloudFormation templates to deploy ETL workflows and data infrastructure, enforcing infrastructure-as-code and reducing lead time for production changes.
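An illustrative sketch of the kind of Glue PySpark transformation and validation described above: enforce an explicit schema on a raw JSON feed, quarantine incomplete records before they reach Redshift or the data marts, and write schema-compliant, partitioned Parquet. It is written as plain PySpark, which Glue can run as a job script; the bucket names, columns, and rejection rule are hypothetical.

    # Sketch of a Glue-style PySpark job (hypothetical buckets, columns, and rules).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("banking-feed-etl-sketch").getOrCreate()

    # Explicit schema so malformed feed records cannot silently change column types.
    feed_schema = StructType([
        StructField("account_id", StringType()),
        StructField("txn_amount", DoubleType()),
        StructField("txn_ts", TimestampType()),
        StructField("channel", StringType()),
    ])

    raw = spark.read.schema(feed_schema).json("s3://example-raw-zone/banking-feed/")

    # Basic quality gate: mandatory fields must be present before curation.
    incomplete = F.col("account_id").isNull() | F.col("txn_ts").isNull()
    valid = raw.filter(~incomplete)
    rejected = raw.filter(incomplete)

    # Write curated, date-partitioned Parquet and quarantine the rejects for review.
    (valid.withColumn("load_date", F.to_date("txn_ts"))
          .write.mode("append").partitionBy("load_date")
          .parquet("s3://example-curated-zone/transactions/"))
    rejected.write.mode("append").parquet("s3://example-quarantine-zone/transactions/")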
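Likewise, a simplified sketch of a Lambda handler enriching Kinesis transaction events before they feed anomaly detection; the delivery stream name, field names, and the single high-value threshold are hypothetical stand-ins for the richer, history-based enrichment described above.

    # Simplified Lambda handler for a Kinesis event source (hypothetical names and threshold).
    import base64
    import json
    import boto3

    firehose = boto3.client("firehose")

    def handler(event, context):
        enriched = []
        for record in event["Records"]:
            # Kinesis payloads arrive base64-encoded in the Lambda event.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            # Coarse enrichment: flag unusually large transactions for the anomaly-detection job.
            payload["high_value_flag"] = payload.get("amount", 0) > 10_000
            enriched.append({"Data": (json.dumps(payload) + "\n").encode("utf-8")})
        if enriched:
            # Forward enriched events to a delivery stream feeding S3/Redshift (name is a placeholder).
            firehose.put_record_batch(
                DeliveryStreamName="example-enriched-transactions",
                Records=enriched,
            )
        return {"records_processed": len(enriched)}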
Pfizer August 2023 – July 2024
Data Engineer
Responsibilities:
Collaborated with cross-functional business teams to gather data requirements for clinical trial monitoring and pharmaceutical product analytics, translating them into technical data ingestion pipelines.
Designed batch processing workflows in AWS Glue that extracted, normalized, and enriched patient and trial metadata from internal health data systems, feeding reporting dashboards.
Used Amazon Athena and Redshift for analytical querying of structured datasets in S3, enabling rapid exploratory analysis of research KPIs, drug efficacy patterns, and laboratory data trends.
Engineered PySpark transformation scripts to clean, standardize, and join multi-format datasets, including XML, JSON, and delimited files from external CRO partners, ensuring consistent schema ingestion.
Leveraged AWS Step Functions to sequence Glue jobs, integrating business-logic checkpoints, schema validation stages, and downstream notification alerts for error detection and SLA tracking during nightly data loads.
Created data profiling and quality assurance scripts to validate clinical and manufacturing attributes such as dosage accuracy, patient demographics, and trial phase, integrating the rules into automated ETL pipelines.
Used AWS Lambda for lightweight transformation of inbound data files and automated metadata capture, feeding logs into S3 and triggering Amazon SNS notifications for successful or failed ingestions (a simplified handler sketch follows this section).
Participated in data lineage documentation using the AWS Glue Data Catalog, tagging datasets with business metadata and defining classification schemas to support internal data governance and audit readiness.
Worked with Redshift Spectrum to create external tables referencing S3-based datasets, significantly improving performance and cost-efficiency for ad hoc analysis queries run by internal analytics stakeholders.
Conducted root cause analysis of data drift and ingestion failures, applying corrective transformations, regex-based filters, and schema realignments to restore consistency and minimize downstream data rework.
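A simplified sketch of the Lambda-based metadata capture and SNS alerting pattern mentioned above: an S3-triggered handler records basic file metadata, writes an audit log, and notifies subscribers of success or failure. The bucket names, topic ARN, and metadata fields are hypothetical placeholders.

    # Hypothetical S3-triggered Lambda: capture inbound-file metadata and notify via SNS.
    import json
    import urllib.parse
    import boto3

    s3 = boto3.client("s3")
    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-ingestion-alerts"  # placeholder ARN

    def handler(event, context):
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        try:
            head = s3.head_object(Bucket=bucket, Key=key)
            metadata = {
                "bucket": bucket,
                "key": key,
                "size_bytes": head["ContentLength"],
                "last_modified": head["LastModified"].isoformat(),
                "status": "RECEIVED",
            }
        except Exception as exc:  # the object could not be read: report the ingestion failure
            metadata = {"bucket": bucket, "key": key, "status": "FAILED", "error": str(exc)}
        # Persist an audit record in a separate bucket and alert subscribers either way.
        s3.put_object(
            Bucket="example-audit-bucket",
            Key=f"ingestion-metadata/{key}.json",
            Body=json.dumps(metadata).encode("utf-8"),
        )
        sns.publish(TopicArn=TOPIC_ARN, Subject="Ingestion event", Message=json.dumps(metadata))
        return metadata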
Costco Wholesale Corporation June 2021 – July 2023
Data Analyst
Responsibilities:
Developed and maintained scalable ETL pipelines using AWS Glue, PySpark, and Step Functions to ingest, transform, and load structured and semi-structured data into Redshift and S3 for analytics consumption.
Created real-time ingestion pipelines using AWS Kinesis, Lambda, and Firehose to process retail transactions and log data from POS systems across distributed locations for operational monitoring.
Implemented Spark-based batch processing on AWS EMR clusters, tuning memory management and parallelization to accelerate transformation workflows for large-scale historical purchase and inventory datasets.
Designed high-performance data warehouse models in Amazon Redshift using star and snowflake schemas to support reporting on product sales, logistics operations, and customer behavior.
Orchestrated multi-step workflows with AWS Step Functions and Lambda to automate failure handling and integrate disparate services, including Redshift, S3, Glue jobs, and CloudWatch Logs, for traceability.
Built a data lake architecture on Amazon S3 with optimized partitioning and lifecycle policies, integrating it with Athena and the Glue Data Catalog for fast, serverless querying by business analysts (an illustrative sketch follows this section).
Wrote custom PySpark scripts in AWS Glue to cleanse, validate, and enrich raw JSON and Parquet data, ensuring accuracy and consistency for downstream analytics pipelines.
Managed secure data transfers across regions and accounts using AWS DataSync and S3 replication, adhering to compliance requirements and enabling low-latency data access for business partners.
Designed and deployed parameterized Glue workflows for ingesting dynamic source datasets using reusable components and a configuration-driven architecture, increasing productivity and reducing technical debt.
Migrated legacy on-prem Oracle and SQL Server databases to Amazon RDS and Aurora using AWS DMS, performing schema conversion and data validation to ensure parity and minimize downtime.
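An illustrative sketch of the partitioned S3 data-lake write described above, registering the dataset so Athena can query it through the Glue Data Catalog. Database, table, path, and column names are hypothetical, and the cluster's Hive metastore is assumed to be backed by the Glue Data Catalog.

    # Sketch: write date-partitioned Parquet to S3 and register it for Athena via the Glue Catalog.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("retail-datalake-sketch")
             .enableHiveSupport()  # assumes the Glue Data Catalog serves as the Hive metastore
             .getOrCreate())

    # Raw POS transaction events landed in the raw zone (hypothetical path and fields).
    events = spark.read.json("s3://example-raw-zone/pos-transactions/")

    (events
     .withColumn("sale_date", F.to_date("transaction_ts"))
     .repartition("sale_date")          # keep each partition's files together
     .write
     .mode("append")
     .partitionBy("sale_date")          # date partitions keep Athena scans small and cheap
     .format("parquet")
     .option("path", "s3://example-lake-zone/pos_transactions/")
     .saveAsTable("retail_analytics.pos_transactions"))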
EDUCATION:
Master of Science in Data Science - University at Buffalo
Bachelor of Technology in Electronics and Communication Engineering - Gandhi Institute of Technology and Management (GITAM)