Data Engineer Power BI

Location:
Dallas, TX
Salary:
70000
Posted:
September 18, 2025

Harshith Chilukuri

DATA ENGINEER

Location: TX | Email: **************@*****.*** | Phone: 945-***-**** | LinkedIn: https://www.linkedin.com/in/harshith-c-b41624305/

PROFESSIONAL SUMMARY

Accomplished Data Engineer with over 4 years of experience developing and optimizing data architectures. Applied a diverse set of tools and technologies to craft scalable, efficient solutions, contributing significantly to the success of data-driven initiatives.

Employed Python for advanced data engineering tasks, demonstrating expertise in data manipulation and analysis, with a focus on optimizing data workflows and delivering robust, scalable solutions.

Demonstrated advanced proficiency in SQL and the Microsoft BI stack (SSIS, SSMS, SSRS), excelling in designing database schemas, optimizing queries, and modeling data. Designed, developed, and maintained robust Extract, Transform, Load (ETL) processes using SSIS for efficient data integration, and leveraged SAS for data exploration, analysis, and manipulation.

Strong Hadoop and platform support experience with the entire suite of tools and services in major Hadoop distributions, including Cloudera, Amazon EMR, Azure HDInsight, and Hortonworks.

Applied data manipulation and analysis libraries extensively, including NumPy, Pandas, Scikit-Learn, SciPy, TensorFlow, and OpenCV.

Utilized a strong database skill set, engaging with tools such as SQL Server Enterprise Manager, Microsoft Management Console, MySQL, T-SQL, Shell Scripting, and MongoDB.

Applied skills effectively using reporting tools like Power BI, Tableau, SharePoint, and MS Office for impactful data visualization and communication.

Showcased advanced proficiency in Hadoop, encompassing a deep understanding of distributed file systems, adept use of MapReduce and Hive for parallel processing and analytics, and skillful implementation of scalable, fault-tolerant big data solutions in the data engineering domain.

Proficiently employed cloud technologies, particularly AWS services such as EC2, S3, ELB, CloudWatch, RDS, DynamoDB, and AWS Athena.

Leveraged expertise in data warehouse and business intelligence tools such as SQL Server Integration Services (SSIS), Visual Studio, and MS Data Tools. Expert in re-architecting on-premises SQL data warehouses onto AWS cloud platforms using AWS and third-party services; proficient in ETL processes, data analysis, migration, and SQL programming, with hands-on experience using SSIS to build efficient data integration and workflow solutions.

Proficient in analyzing over 20 Business KPIs utilizing advanced tools such as Hive, Pig, and Spark.

Illustrated proficiency in reporting and ETL tools, specifically Informatica, Talend, Tableau, and Power BI, and demonstrated the ability to generate insightful visualizations and optimize ETL processes for efficiency.

EDUCATION

University of North Texas, Denton, TX
Master's in Advanced Data Analytics, Aug 2020 – May 2023

Amrita University, Bengaluru, India
Bachelor's in Computer Science, Aug 2015 – May 2019

TECHNICAL SKILLS

Languages: Python, PySpark, Spark-SQL, T-SQL, PL/SQL

Data Quality and Governance: Apache Atlas, Collibra, Informatica Data Quality, Talend Data Quality, Trifacta

Cloud Services: AWS (S3, EC2, EMR, Redshift, Athena, IAM), Azure (Data Factory, Synapse data warehouses, Stream Analytics)

Databases: MySQL, PostgreSQL, Snowflake, Oracle, DB2, Hive, Redshift, Azure SQL DW, Databricks Database

Data Visualization and Reporting: Tableau, Power BI, Looker, QlikView, Google Data Studio

CI/CD and Version Control: Databricks, Kafka, Jenkins, Git, GitLab, Bitbucket

Python Libraries: NumPy, Pandas, Beautiful Soup, PySpark, SQLAlchemy, PyTest, Apache Airflow

Big Data Technologies: Hadoop (HDFS, MapReduce, YARN), Apache Spark, Apache Kafka, Apache Hive, Sqoop

Additional Data Engineering Skills: Data mining, data cleaning, data governance, data integrity, A/B testing

PROFESSIONAL EXPERIENCE

MCKINSEY & COMPANY TX, USA

Data Engineer Feb 2024 – Present

Crafted Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation, enhancing data processing speed by 25%.
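
As an illustrative sketch of this kind of PySpark and Spark-SQL extraction-and-aggregation job (the bucket paths, table, and columns below are placeholders, not details from this role):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("order-aggregation").getOrCreate()

# Extract: read raw events from a placeholder S3 location.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
orders.createOrReplaceTempView("orders")

# Transform and aggregate with Spark-SQL.
daily_totals = spark.sql("""
    SELECT order_date,
           customer_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count
    FROM orders
    GROUP BY order_date, customer_id
""")

# Load: write the curated aggregate back to S3.
daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")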

Utilized Azure Cloud Services including HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, and SQL DWH.

Employed IBM DataStage Director for job scheduling, validation, execution, and monitoring, achieving a 98% success rate in seamless pipeline execution.

Engineered and optimized complex MapReduce and Spark jobs for data processing, transformation, and enrichment, maximizing distributed computing and reducing costs by 20%.

Incorporated advanced analytics and ML libraries like PyTorch into Hadoop, improving machine learning accuracy.

Executed POCs for Delta table package integration and Active Directory group management, driving a 25% improvement in data management practices.

Formulated automated Databricks workflows for parallel data loads, improving processing speed by 40% and reliability by 30%. Implemented and optimized SSIS, SSMS, and SSRS solutions for efficient data integration, management, and reporting.

Created database objects (tables, views, stored procedures, triggers, packages, and functions) using T-SQL, ensuring robust data management and structure.

Established an ETL framework using Spark and Hive, reducing data processing time by 30% and enhancing retrieval efficiency.
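
A minimal sketch of such a Spark-to-Hive ETL step, assuming a Hive-enabled Spark session; the staging path, database, table, and partition column are hypothetical:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("etl-to-hive")
         .enableHiveSupport()
         .getOrCreate())

# Read staged data, deduplicate, and load it into a partitioned Hive table.
staged = spark.read.parquet("s3://example-bucket/staging/transactions/")

(staged
 .dropDuplicates(["transaction_id"])
 .write
 .mode("overwrite")
 .partitionBy("load_date")
 .saveAsTable("analytics.transactions_curated"))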

Produced Tableau visualizations (bar graphs, scatter plots, geo maps), increasing stakeholder engagement by 35% and enabling 20% more data-driven decisions.

Synchronized Kafka with Spark Streaming for real-time analytics, enabling a 50% faster response rate in decision-making processes.
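
For illustration, a minimal Spark Structured Streaming sketch of a Kafka-fed pipeline; the broker address, topic, message schema, and sink paths are assumptions rather than project specifics:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming").getOrCreate()

# Schema of the incoming JSON messages (assumed fields).
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Subscribe to a Kafka topic and parse each message value.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

# Write micro-batches to S3 for downstream analytics; checkpointing enables recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/streaming/events/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()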

KPMG India

Data Engineer Jul 2020 – Jul 2022

Revitalized a 10GB+ Data Warehouse infrastructure, automating 60% of processes, minimizing downtime, and improving data accessibility.

Utilized Apache Spark (in-memory processing) on AWS S3 Data Lake to handle large datasets; loaded data into S3 buckets, then filtered and ingested into Hive external tables.

Applied strong hands-on SQL expertise, creating and modifying stored procedures, functions, views, indexes, and triggers for optimized database performance.

Performed ETL operations using Python, SparkSQL, AWS S3, and Amazon Redshift on terabytes of data to extract valuable customer insights.
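
A simplified sketch of that S3-to-Redshift flow, assuming the Redshift JDBC driver is available to Spark; the connection URL, credentials handling, and table names are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-to-redshift").getOrCreate()

# Read raw clickstream data from S3 and summarize it with Spark SQL.
clicks = spark.read.json("s3://example-bucket/raw/clickstream/")
clicks.createOrReplaceTempView("clicks")

summary = spark.sql("""
    SELECT customer_id,
           COUNT(*)        AS sessions,
           MAX(event_time) AS last_seen
    FROM clicks
    GROUP BY customer_id
""")

# Load the summary into Redshift over JDBC (driver class and URL are placeholders;
# credentials would normally come from a secrets manager rather than plain text).
(summary.write
 .format("jdbc")
 .option("driver", "com.amazon.redshift.jdbc42.Driver")
 .option("url", "jdbc:redshift://example-cluster:5439/analytics")
 .option("dbtable", "public.customer_sessions")
 .option("user", "etl_user")
 .option("password", "example-password")
 .mode("overwrite")
 .save())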

Implemented CI/CD pipelines using Jenkins, Terraform, and AWS, streamlining deployment and integration.

Conducted end-to-end architecture and implementation assessments across AWS services including Amazon EMR, Redshift, and S3.

Leveraged AWS EMR to transform and migrate large datasets into and out of Amazon S3 and Amazon DynamoDB.

Demonstrated a strong understanding of and practical experience with AWS services (S3, EC2, IAM, RDS) and orchestration tools such as AWS Step Functions, AWS Data Pipeline, and AWS Glue.

Transformed data using AWS Glue dynamic frames with PySpark, cataloged data using Crawlers, and scheduled ETL jobs through AWS Workflows.
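
A minimal sketch of a Glue ETL job of this shape; the catalog database, table, field mappings, and output path are assumptions, not the original job's configuration:

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a crawler-cataloged table as a dynamic frame (database/table are placeholders).
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_orders")

# Rename and cast fields before writing the curated output.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "string"),
              ("amt", "string", "amount", "double")])

glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet")

job.commit()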

Designed and managed cloud infrastructures using AWS (EC2, S3, CloudFront, Elastic File System, IAM), enabling automated operations.

Cipla India

Data Analyst Sep 2019 – Jun 2020

Developed Power BI dashboards to visualize key mobility KPIs and operational performance, enhancing real-time decision-making for stakeholders.

Designed and automated Python-based ETL workflows using AWS Glue, ingesting and transforming 5,000+ records monthly across multiple business domains.

Developed and maintained Power BI dashboards using advanced DAX calculations, providing real-time KPI visibility and reducing executive reporting time by 40%.

Collaborated with finance and operations teams to integrate Oracle Cloud ERP with AWS S3 and Athena, enhancing financial data accuracy and reporting speed.

Built Alteryx data pipelines to automate recurring reports, eliminating 30% of manual work and significantly boosting team productivity.

Implemented data integration pipelines connecting Oracle databases with cloud platforms, improving data consistency across systems.

Applied data mining techniques to uncover hidden patterns in rider behavior and improve model predictions.

Led data cleaning and standardization processes to ensure high data quality and reliability across analytics projects.

Supported data governance initiatives by implementing validation rules, access controls, and metadata documentation to maintain data integrity and compliance.

Performed A/B testing on marketing campaigns to evaluate effectiveness and drive improvements in customer engagement and conversion rates.
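
As a simple illustration of the kind of A/B significance check involved, a sketch using SciPy's chi-square test; the counts below are made up:

from scipy.stats import chi2_contingency

# Hypothetical counts: [converted, not converted] for the control and variant campaigns.
control = [420, 9580]
variant = [505, 9495]

chi2, p_value, dof, expected = chi2_contingency([control, variant])
print(f"chi2={chi2:.2f}, p-value={p_value:.4f}")
# A p-value below the chosen significance level (e.g. 0.05) indicates the variant's
# conversion rate differs meaningfully from the control's.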

PROJECTS

Financial Data Lake Integration for Regulatory Analytics

Led the integration of audit and financial data from 6 departments into a Delta Lake on AWS S3, enabling traceable lineage and time-travel queries (Financial Modeling, Data Warehousing).

Modeled star schemas in ER Studio and automated refresh schedules using Airflow, supporting near real-time dashboards for 20+ stakeholders (Data Orchestration).
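
A minimal Airflow sketch of an automated refresh schedule of this sort; the DAG name, interval, and task commands are illustrative assumptions:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="delta_lake_refresh",
    start_date=datetime(2024, 1, 1),
    schedule_interval="*/30 * * * *",  # assumed near-real-time refresh cadence
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:

    # Refresh the Delta tables, then rebuild the dashboard extracts.
    refresh_delta_tables = BashOperator(
        task_id="refresh_delta_tables",
        bash_command="spark-submit refresh_delta_tables.py",
    )

    update_dashboard_extracts = BashOperator(
        task_id="update_dashboard_extracts",
        bash_command="python update_dashboard_extracts.py",
    )

    refresh_delta_tables >> update_dashboard_extracts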

Collaborated with finance teams using NetSuite Budgeting & Planning to align data pipelines with regulatory compliance standards.

Metadata Catalog and Validation Automation

Built a metadata discovery framework using Informatica EDC, enabling lineage tracking and quick impact analysis across 10+ enterprise data systems (CRM, Data Collection).

Automated schema validations with PL/SQL and Python scripts, reducing QA dependency and data delivery delays across UAT and production environments (Root Cause Analysis).
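
A minimal Python sketch of an automated schema validation of this kind, using SQLAlchemy's inspector; the connection string, table name, and expected column types are placeholders:

from sqlalchemy import create_engine, inspect

# Expected columns and a prefix of their database types (illustrative only).
EXPECTED_COLUMNS = {
    "invoice_id": "NUMBER",
    "invoice_date": "DATE",
    "amount": "NUMBER",
}

# Placeholder DSN; in practice this would point at the UAT or production database.
engine = create_engine("oracle+oracledb://user:password@db-host:1521/?service_name=ORCLPDB")
inspector = inspect(engine)

actual = {col["name"].lower(): str(col["type"]) for col in inspector.get_columns("invoices")}

missing = set(EXPECTED_COLUMNS) - set(actual)
mismatched = {
    name: (expected_type, actual[name])
    for name, expected_type in EXPECTED_COLUMNS.items()
    if name in actual and not actual[name].upper().startswith(expected_type)
}

if missing or mismatched:
    raise ValueError(f"Schema drift detected: missing={missing}, mismatched={mismatched}")
print("invoices table matches the expected schema")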


