Data Engineer - Power BI

Location:
Coppell, TX
Posted:
January 21, 2025

Ajitha Pagadala

Phone No: 469-***-****

Email: ******************@*****.***

LinkedIn: https://www.linkedin.com/in/ajitha-p-4b7768258/

Professional Summary

Over 10 years of experience as a Data Validation Engineer and Data Engineer with expertise in data integration, validation, and competitor intelligence using manual methods, ETL processes, SQL, and Snowflake. Proven track record of optimizing data pipelines, ensuring data accuracy, and delivering actionable insights that drive business decision-making. Demonstrated success in automating workflows, leveraging AWS services (including Lambda, Glue, Redshift, and Athena), and supporting cloud-based data storage and processing. Skilled in Scala, Spark, SQL, and Hadoop, with hands-on experience using Spark SQL and HiveQL for large-scale data processing. Experienced with CI/CD workflows using GitHub for version control. Seeking to apply this expertise in a data-focused role that prioritizes data integrity, operational efficiency, and scalable data solutions.

Expert in assortment and white gap analysis, competitor promotions, and pricing strategies, providing insights that drive a competitive edge in retail e-commerce markets.

Proficient in ETL operations, data pipeline automation, data transformation, and cloud storage solutions (AWS S3, Redshift, Athena), ensuring scalable and reliable data management.

Experienced in SQL, Snowflake, Hadoop Trino, and advanced Excel for data extraction, validation, and quality assurance.

Adept in Power BI and Tableau for creating visualizations that improve stakeholder communication and decision-making.

Skilled in stakeholder management, risk management, and Agile project delivery, collaborating with teams to ensure timely and impactful solutions.

Hands-on experience using PySpark and Python for data processing on platforms like Databricks and AWS EMR.

Skilled in CI/CD workflows and Agile practices, including unit testing for data pipelines to ensure seamless deployments.

Technical Skills

ETL Tools: AWS Lambda, AWS Glue, Apache Airflow, Python, Spark Streaming

Cloud Platforms: AWS (S3, Redshift, Athena, Lambda, Glue, RDS, CloudWatch, EMR, EC2)

Data Warehousing & Storage: AWS Redshift, Snowflake, RDS, S3

Databases & Query Engines: SQL (MySQL, PostgreSQL), Snowflake, Hadoop Trino, HiveQL, Spark SQL

DevOps Tools: GitHub, Jira, Confluence

Data Validation: Manual and automated validation techniques, data integrity checks, anomaly detection

Reporting & Visualization: Power BI, Tableau

Programming Languages: SQL, Python

Agile Methodologies: Jira (Task tracking, Issue management)

Version Control & Workflow Automation: CI/CD (Continuous Integration/Delivery), GitHub

Big Data & Orchestration: Spark, Hadoop, Hive, Apache Airflow

Shell Scripting: Basic familiarity with Linux/UNIX environments for task automation

Key Accomplishments

Improved match accuracy and pricing strategies at Lowe’s, contributing to a 15% improvement in competitive intelligence and product offerings.

Automated data pipelines using AWS Lambda and Glue, reducing manual intervention and processing errors by 25% at Dell Technologies.

Led a team of 30+ members, achieving a 98% validation accuracy rate and improving data quality by 20% at Lowe’s through enhanced validation techniques.

Optimized SQL queries and Python scripts, boosting pipeline performance by 25% and reducing storage and access costs at Dell Technologies.

Leveraged Hadoop Trino to process large datasets and identify anomalies, improving automated crawling processes and data accuracy at Boomerang Commerce.

Conducted assortment and white gap analysis at Lowe’s, identifying 500+ product expansion opportunities, driving strategic product offerings and enhancing competitive positioning.

Professional Experience

Project: Scalable Data Pipeline Automation & Integration Platform

Client: Dell Technologies

Role: AWS Data Engineer

Duration: June 2022 to Present

Location: Round Rock, Texas

Project Overview:

The Scalable Data Pipeline Automation & Integration Platform at Dell Technologies focused on automating and optimizing data pipelines to enhance data management and analytical capabilities. As an AWS Data Engineer, I integrated data from SQL databases, REST APIs, and flat files into a unified pipeline using Scala, Spark, SQL, and Hadoop for scalable processing, ensuring data consistency and accuracy. The project aimed to improve data retrieval, reduce operational overhead, and enable on-demand business analysis with tools such as AWS Redshift, Athena, and Power BI, all while maintaining scalability and data quality.
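
Illustrative sketch (not taken from the actual Dell pipeline): a minimal PySpark job that combines a JDBC table and flat files landed in S3 into one curated dataset, roughly the integration pattern described above. The connection URL, bucket paths, table, and column names are hypothetical placeholders, and a JDBC driver for the source database is assumed to be on the Spark classpath.

```python
# Minimal PySpark sketch: unify a relational table and S3 flat files into one dataset.
# All connection details, paths, and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("unified_ingest").getOrCreate()

# Source 1: relational table pulled over JDBC (hypothetical orders table)
orders_db = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://example-host:5432/sales")  # placeholder URL
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "****")
    .load()
)

# Source 2: flat files landed in S3 (hypothetical CSV drop); inferSchema keeps the
# column types compatible with the JDBC source for the union below
orders_csv = (
    spark.read.option("header", True)
    .option("inferSchema", True)
    .csv("s3://example-bucket/landing/orders/*.csv")
)

# Align schemas, de-duplicate on the business key, and stamp the load time
unified = (
    orders_db.select("order_id", "customer_id", "amount", "order_date")
    .unionByName(orders_csv.select("order_id", "customer_id", "amount", "order_date"))
    .dropDuplicates(["order_id"])
    .withColumn("ingest_ts", F.current_timestamp())
)

# Write a single curated Parquet dataset for downstream consumers
unified.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
```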

Responsibilities:

Designed and maintained data pipelines using Scala, Spark, and Hadoop, ensuring efficient processing of large datasets.

Developed workflows within the Hadoop ecosystem, integrating Spark SQL and HiveQL for optimized data transformations and queries.

Automated ETL processes with AWS Glue and Spark Streaming, improving the efficiency of data ingestion workflows.

Leveraged AWS EMR and EC2 to process large-scale data pipelines efficiently in cloud environments.

Conducted data transformations using SQL, Spark SQL, and HiveQL to ensure accurate, clean data for downstream applications.

Implemented CI/CD pipelines using GitHub, enabling seamless deployment and version control in the data pipeline process.

Utilized Apache Airflow for scheduling and monitoring workflows, ensuring pipeline reliability and automation (a minimal DAG sketch follows this list).

Developed interactive dashboards in Power BI, providing actionable insights to stakeholders from large datasets.

Built foundational ETL pipelines using AWS Kinesis, S3, and DynamoDB to process structured and semi-structured data.

Developed and tested REST APIs with Python Flask, integrating AWS services like SNS and SQS for event-driven workflows.

Implemented small-scale storage solutions using AWS S3 and DynamoDB, applying basic design patterns such as MVC to improve reliability.

Monitored pipeline performance using AWS CloudWatch, implementing improvements to enhance data processing efficiency.

Collaborated in Agile sprints using Jira for task tracking and worked cross-functionally to meet project deadlines.

Used Terraform to automate the provisioning and management of AWS resources such as EC2, S3, IAM roles, and VPCs, ensuring consistency and scalability across environments.

Implemented Infrastructure as Code (IaC) with Terraform, reducing manual intervention and improving the speed and accuracy of cloud infrastructure deployments.

Integrated Terraform into CI/CD pipelines, enabling automated infrastructure provisioning and ensuring seamless application deployment across different stages.

Optimized cloud resource management using Terraform scripts, resulting in cost reduction and improved operational efficiency in AWS environments.

Collaborated with cross-functional teams to automate and manage cloud infrastructure, ensuring alignment with security and governance standards using Terraform.

Developed and maintained Terraform scripts to provision and update AWS infrastructure, improving system reliability and reducing human error in the cloud deployment process.

Developed financial dashboards using Power BI and Tableau to support budget optimization and forecasting processes.

Automated financial data pipelines using AWS Glue and SQL, improving data quality and accuracy for financial reporting.

Conducted data analysis to support budget variance analysis and recommended corrective actions for enhanced decision-making.

Collaborated with stakeholders to analyze financial trends and provide actionable insights for strategic planning.

Developed scalable data processing workflows using Python and PySpark for large-scale data transformations, ensuring efficient data manipulation and processing within cloud environments.

Built foundational Infrastructure as Code (IaC) skills with Terraform, writing scripts to provision AWS resources such as S3, EC2, and IAM roles and practicing integration of Terraform into CI/CD workflows.

Worked with AWS Glue, AWS Redshift, and AWS Lambda to build data pipelines, orchestrate workflows, and process large-scale data, and developed AWS Step Functions workflows for automation and seamless integration of AWS services.

Designed and implemented data visualizations using Power BI, creating interactive dashboards to provide business insights and real-time analytics.

Utilized Erwin for data modeling, creating conceptual, logical, and physical data models, ensuring data integrity and alignment with business requirements.

Managed ETL processes using Informatica, ensuring data quality and consistency across various environments. Implemented data governance frameworks to maintain compliance and best practices.

Applied SDLC principles for end-to-end project delivery, including requirement gathering, design, development, and deployment phases, with a focus on data lifecycle management.

Implemented data security practices and ensured adherence to GDPR compliance, safeguarding sensitive information and maintaining regulatory standards.

Modeled data for OLTP systems, Data Lakes, and Big Data environments, optimizing performance and ensuring scalability for large datasets.
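
As referenced in the Apache Airflow bullet above, the sketch below shows one way such orchestration can be wired: a minimal Airflow DAG that triggers a Glue job and then loads the result into Redshift through the Redshift Data API. The DAG id, Glue job name, cluster identifier, IAM role, and COPY statement are all placeholder assumptions, not the production workflow.

```python
# Minimal Airflow DAG sketch: run a Glue job, then COPY the output into Redshift.
# Names, region, cluster, role ARN, and SQL are illustrative placeholders only.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def run_glue_job(**_):
    # Fire-and-forget start; a real pipeline would poll the run status afterwards
    glue = boto3.client("glue", region_name="us-east-1")
    glue.start_job_run(JobName="example-transform-job")  # hypothetical Glue job


def load_to_redshift(**_):
    # Redshift Data API call with a placeholder COPY statement
    rsd = boto3.client("redshift-data", region_name="us-east-1")
    rsd.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sql=(
            "COPY curated.orders FROM 's3://example-bucket/curated/orders/' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/example-copy-role' "
            "FORMAT AS PARQUET;"
        ),
    )


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    transform = PythonOperator(task_id="run_glue_job", python_callable=run_glue_job)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)
    transform >> load
```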

Project: Competitive Intelligence and Pricing Optimization

Client: Lowe’s

Role: Sr Data Validation Engineer

Duration: Sep 2019 to Feb 2022

Location: Bangalore, India

Project Overview:

This project focused on enhancing Lowe's competitive intelligence capabilities by improving pricing strategies and identifying assortment gaps. The work involved analyzing competitor data, optimizing pricing based on market trends, and validating match accuracy to ensure data integrity. The goal was to support pricing decisions, improve product offerings, and maintain Lowe's competitive edge in the retail market through data-driven insights.

Responsibilities:

Conducted assortment and white gap analyses, identifying over 500 potential product expansion opportunities to address competitive gaps and improve product offerings.

Performed competitor analysis on current promotions and offers, providing insights for strategic pricing and promotional alignment.

Verified and validated data science-generated match candidates across competitors, using internal tools to support data-driven pricing decisions.

Participated actively in product line reviews, contributing insights on assortment gaps, competitor offerings, and potential white space opportunities.

Extracted and analyzed complex datasets from Snowflake, identifying scraping anomalies and improving automated crawling processes.

Enhanced competitive data quality through match maintenance and quality control measures, achieving a 15% increase in accuracy.

Led and trained a team of 30+ members, driving high performance and maintaining a 98% validation accuracy rate.

Developed and communicated monthly and quarterly reports for leadership, supporting data-driven decision-making.

Implemented SQL queries for data extraction, validation, and manipulation, maintaining high standards of data accuracy.

Engaged in ad hoc benchmarking analyses and manual matches for high-revenue SKUs, impacting revenue optimization.

Utilized Jira for task tracking and project management, improving collaboration across ongoing validation projects.

Integrated large datasets into the validation pipeline using AWS Glue, improving data processing efficiency by 20%.

Validated product and pricing data transformations by running automated tests with CI/CD pipelines in GitHub, ensuring seamless deployment and version control.

Collaborated with Data Engineers to implement Hadoop and AWS tools like Redshift and Athena, enhancing the speed and reliability of data validation processes.

Used Scala for automating data extraction processes, improving data validation efficiency.

Leveraged Spark SQL to process and validate large datasets, ensuring data accuracy (see the sketch after this list).

Utilized Hadoop for scalable data processing, ensuring consistency across platforms.
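
As noted in the Spark SQL bullet above, the sketch below illustrates one form a match-validation query can take: comparing model-generated match candidates against a manually verified reference set and flagging price deviations. Table names, columns, bucket paths, and the 2% tolerance are hypothetical assumptions, not Lowe's internal tooling.

```python
# Minimal Spark SQL sketch: flag match candidates whose competitor price deviates
# from a verified reference match. Schema, paths, and threshold are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("match_validation").getOrCreate()

# Hypothetical inputs: model-proposed matches and a manually verified reference set
matches = spark.read.parquet("s3://example-bucket/matches/candidates/")
reference = spark.read.parquet("s3://example-bucket/matches/verified/")

matches.createOrReplaceTempView("candidates")
reference.createOrReplaceTempView("verified")

# Keep only candidates whose price deviates more than 2% from the verified match
report = spark.sql("""
    SELECT c.sku,
           c.competitor_sku,
           c.competitor_price,
           v.competitor_price AS verified_price,
           ABS(c.competitor_price - v.competitor_price) / v.competitor_price AS deviation
    FROM candidates c
    JOIN verified v
      ON c.sku = v.sku AND c.competitor_sku = v.competitor_sku
    WHERE ABS(c.competitor_price - v.competitor_price) / v.competitor_price > 0.02
""")

# Persist the exceptions for manual review by the validation team
report.write.mode("overwrite").parquet("s3://example-bucket/matches/validation_report/")
```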

Project: Competitive Match Validation for E-Commerce Platforms

Client: Boomerang Commerce

Role: Data Validation Engineer

Duration: April 2015 to July 2019

Location: Bangalore, India

Project Overview:

This project aimed to validate and optimize competitive product data for e-commerce platforms to enhance pricing accuracy. By leveraging big data tools like Hadoop Trino and AI-driven data scraping, the project focused on matching products across different platforms, identifying anomalies, and ensuring data consistency. The insights derived from the validation process helped clients improve pricing strategies and product offerings, driving business value in e-commerce.

Responsibilities:

Validated data science-generated match candidates across competitors using an internal tool, contributing to competitive product pricing.

Conducted manual matching for high-priority SKUs and proof-of-concept projects, ensuring precision in pricing decisions.

Performed quality assurance and validation checks, working with stakeholders to maintain accurate and complete datasets.

Conducted brand analysis on competitor websites (e.g., Amazon, BestBuy, Bloomingdale's) to identify brand trends and variations.

Leveraged Hadoop Trino to extract and analyze large-scale datasets, identifying anomalies and refining automated scraping processes.

Scraped and analyzed competitor data using AI-lite tools, providing insights into product variants like color and size.

Collaborated on weekly and monthly product matching exercises, contributing to pricing decisions for new product categories.

Produced detailed validation reports and documentation, ensuring process transparency and enabling continuous improvement.

Used Jira for task management and issue tracking, streamlining project workflows and deliverables.

Validated product match data using Hadoop and Spark, ensuring accurate product identification across different e-commerce platforms.

Automated the validation process by using AWS Glue and Spark Streaming to handle large-scale data ingestions and transformations, reducing manual effort by 30%.

Applied SQL queries for data validation and cleaning in the Hadoop ecosystem, ensuring data consistency across platforms like Amazon, BestBuy, and Bloomingdale’s.

Assisted in automating data validation tasks using CI/CD workflows in GitHub, improving pipeline performance and version control.

Conducted data anomaly detection on competitor match data, flagging discrepancies using custom SQL queries (a sketch of this type of check follows this list).

Worked with stakeholders to monitor and optimize product matching workflows using Apache Airflow, ensuring timely and accurate data delivery.

Created and maintained Power BI dashboards to provide insights to the leadership team, ensuring that validated data was visualized in a way that supported actionable decision-making.

Used Scala to process and transform large product datasets efficiently.

Used HiveQL to query and validate large datasets, ensuring data consistency across sources.

Automated data validation tasks with CI/CD workflows using GitHub, enhancing pipeline performance.
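
As mentioned in the anomaly-detection bullet above, the sketch below shows a simple rule-based check of the kind described: flagging scraped price records that deviate sharply from the average for the same SKU. The schema, paths, and the 50% deviation threshold are assumptions made for illustration only.

```python
# Minimal PySpark sketch: flag scraped competitor prices that look anomalous
# relative to the per-SKU average. Schema, paths, and thresholds are illustrative.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scrape_anomalies").getOrCreate()

# Hypothetical daily scrape output with columns: sku, competitor, price, scraped_at
scraped = spark.read.parquet("s3://example-bucket/scraped/daily_prices/")

# Average price per SKU across the scraped observations
w = Window.partitionBy("sku")
with_stats = scraped.withColumn("avg_price", F.avg("price").over(w))

# Flag non-positive prices and rows more than 50% away from the SKU average
anomalies = with_stats.filter(
    (F.col("price") <= 0)
    | (F.abs(F.col("price") - F.col("avg_price")) / F.col("avg_price") > 0.5)
)

# Write flagged rows for review and for tuning the automated crawlers
anomalies.write.mode("overwrite").parquet("s3://example-bucket/scraped/anomalies/")
```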

Education: B.Tech in Computer Science and Engineering (CSE), JNTUA

Certifications:

AWS Solutions Architect Associate (SAA)

Career Essentials in Generative AI by Microsoft and LinkedIn

Databricks Lakehouse Fundamentals

Generative AI Fundamentals by Databricks


