Anuroop Chinthireddy
*******************@*****.*** | +1-513-***-**** | Cincinnati, OH
Professional Summary
Results-driven Data Engineer with over 3 years of progressive experience designing and implementing scalable, high-performance data solutions across cloud-native and hybrid environments. Proven expertise in ETL/ELT pipeline development, data modeling, and big data processing, with hands-on experience across a broad technology stack including AWS, GCP, Azure, Snowflake, Apache Spark, Airflow, and Databricks. Proficient in scripting and automation using Python, Scala, and SQL, with a strong foundation in both structured and semi-structured data handling (CSV, JSON, Parquet). Adept at building robust data ingestion pipelines leveraging tools such as AWS Lambda, Glue, Redshift, DynamoDB, and Kafka, supporting real-time and batch processing requirements.
Demonstrated success in optimizing data workflows with Spark transformations, implementing star schema models using the Kimball methodology, and enabling advanced analytics through data warehouse solutions in Snowflake and BigQuery. Solid understanding of data governance, encryption, and cloud-based workflow orchestration using Apache Airflow.
Experienced in agile environments, delivering high-impact data solutions within cross-functional teams and partnering closely with business stakeholders to surface actionable insights through BI tools such as Tableau and Power BI. Committed to engineering excellence, continuous integration/deployment (CI/CD), and driving innovation through automation and scalable architectures.
Skills:
Programming Languages: Python, Java, SQL, Scala, C, C++, R
Database Management Systems (DBMS): MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, Cassandra
Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, Apache Hive
ETL Tools: Apache Spark, Apache Airflow, Azure Synapse, Talend, Informatica, Apache NiFi, dbt
Big Data Technologies: Apache Hadoop, Apache Kafka, Apache HBase, Apache Flink, Apache Storm
Cloud Platforms: AWS, Azure, GCP, including services like Amazon S3, Azure Data Lake Storage, Databricks
DevOps & Automation: CI/CD pipelines (GitHub Actions, Jenkins, Azure DevOps), Docker, Kubernetes, Terraform, ECR
Version Control Systems: Git, GitHub
Data Visualization: Tableau, Power BI
Machine Learning/AI: TensorFlow, PyTorch
Operating Systems: Linux/Unix, Windows, macOS
Containerization and Orchestration: Docker, Kubernetes
Tools & IDE: Git, IntelliJ, Visual Studio Code, Jupyter Notebook, PyCharm
Hadoop Ecosystem: HDFS, YARN, MapReduce
Monitoring and Logging: Prometheus, Grafana, ELK stack, Splunk, Instana
Analytical Tools: SAS, Microsoft Power BI
Experience:
Data Engineer Intern May 2024 – March 2025
MACHIT Group, Cincinnati, OH
Participated in all phases of software development, including requirements gathering and business analysis.
Devised PL/SQL stored procedures, functions, triggers, views, and packages.
Designed data models for AWS Lambda applications and analytical reports.
Built a full-service catalog system using Elasticsearch, Logstash, Kibana, Kinesis, and CloudWatch with the effective use of MapReduce.
Utilized Indexing, Aggregation, and Materialized views to optimize query performance.
Implemented Python and Scala code for data processing and analytics, leveraging built-in libraries.
Utilized various Spark transformations, including mapToPair, filter, flatMap, groupByKey, sortByKey, join, cogroup, union, repartition, coalesce, distinct, intersection, mapPartitions, and mapPartitionsWithIndex, along with actions, for cleansing input data (see the PySpark sketch after this role).
Developed PySpark code to compare data between HDFS and S3.
Developed data migration programs from DynamoDB to AWS Redshift (ETL process), using AWS Lambda functions written in Python and triggered by specific events based on the use case (see the Lambda sketch after this role).
Created scripts to read CSV, JSON, and Parquet files from S3 buckets using Python, executed SQL operations, and loaded data into AWS S3, DynamoDB, and Snowflake using AWS Glue with crawlers.
Designed the staging and Operational Data Store (ODS) environment for the enterprise data warehouse (Snowflake), including dimension and fact table design following Kimball's star schema approach.
Unit-tested data consistency between Redshift and Snowflake.
Implemented a Continuous Delivery pipeline with Docker and GitHub.
Installed and configured Apache Airflow for workflow management and created workflows in Python.
Developed Airflow DAGs to orchestrate sequential and parallel ETL jobs (see the DAG sketch after this role).
Moved data between GCP and Azure using Azure Data Factory.
Worked on a PySpark script to protect client-specified columns by applying hashing algorithms (see the hashing sketch after this role).
Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming data to uncover insights into customer usage patterns.
Reviewed system specifications related to DataStage ETL and developed functions in AWS Lambda for event-driven processing.
Wrote reports using Tableau Desktop to extract data for analysis using filters based on the business use case.
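The following is an illustrative, minimal PySpark sketch of the transformation-based cleansing described above; the bucket paths, record layout, and filter rules are hypothetical stand-ins rather than the actual pipeline code.

```python
from pyspark.sql import SparkSession

# Hypothetical cleansing job: paths, column positions, and rules are placeholders.
spark = SparkSession.builder.appName("cleanse-input").getOrCreate()

raw = spark.sparkContext.textFile("s3://example-bucket/raw/events.txt")

cleaned = (
    raw.filter(lambda line: line and not line.startswith("#"))  # drop blanks and comments
       .map(lambda line: line.split(","))                       # split CSV-style rows
       .filter(lambda cols: len(cols) == 3)                     # keep well-formed rows only
       .map(lambda cols: (cols[0], int(cols[2])))               # (key, amount) pairs
       .reduceByKey(lambda a, b: a + b)                         # aggregate per key
       .sortByKey()
       .coalesce(8)                                             # reduce output partitions
)

cleaned.saveAsTextFile("s3://example-bucket/clean/events")
```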
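Below is a hedged sketch of an event-driven Lambda that stages DynamoDB stream records to S3 and issues a Redshift COPY through the Redshift Data API; the bucket, cluster, table, and IAM role names are hypothetical.

```python
import json
import boto3

s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")

def lambda_handler(event, context):
    # Triggered by a DynamoDB Stream; keep only inserted or modified records.
    records = [
        rec["dynamodb"]["NewImage"]
        for rec in event.get("Records", [])
        if rec.get("eventName") in ("INSERT", "MODIFY")
    ]
    if not records:
        return {"staged": 0}

    # Stage the changed records to S3 as JSON lines (placeholder bucket and prefix).
    key = f"staging/orders/{context.aws_request_id}.json"
    body = "\n".join(json.dumps(r) for r in records)
    s3.put_object(Bucket="example-stage-bucket", Key=key, Body=body.encode("utf-8"))

    # Load the staged file into Redshift with a COPY issued via the Data API.
    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sql=(
            f"COPY staging.orders FROM 's3://example-stage-bucket/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/example-copy-role' "
            "FORMAT AS JSON 'auto';"
        ),
    )
    return {"staged": len(records)}
```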
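A minimal Airflow sketch of the sequential and parallel orchestration mentioned above; the DAG id, task names, and commands are illustrative assumptions, not the production jobs.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")

    # These two transforms run in parallel once extraction finishes.
    transform_orders = BashOperator(task_id="transform_orders", bash_command="echo orders")
    transform_users = BashOperator(task_id="transform_users", bash_command="echo users")

    load = BashOperator(task_id="load_snowflake", bash_command="echo load")

    # Sequential extract, parallel transforms, then a single load step.
    extract >> [transform_orders, transform_users] >> load
```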
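A short sketch of the column-level hashing described above, assuming SHA-256 via Spark's built-in sha2 function; the column list and paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

spark = SparkSession.builder.appName("hash-sensitive-columns").getOrCreate()

df = spark.read.parquet("s3://example-bucket/curated/customers/")

# Replace each client-specified column with its SHA-256 hash (placeholder column names).
sensitive_columns = ["email", "ssn", "phone"]
for name in sensitive_columns:
    df = df.withColumn(name, sha2(col(name).cast("string"), 256))

df.write.mode("overwrite").parquet("s3://example-bucket/protected/customers/")
```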
Data Engineer May 2021 – June 2023
Enshire, India
Utilized AWS services to architect, analyze, and develop enterprise data warehouse and business intelligence solutions, ensuring optimal architecture, scalability, flexibility, availability, and performance for better decision-making.
Developed Scala scripts and User Defined Functions (UDFs) using DataFrames/SQL and Resilient Distributed Datasets (RDDs) in Spark for data aggregation and querying, writing results back to the S3 bucket.
Executed data cleansing and data mining operations.
Built, compiled, and ran Apache Spark ETL jobs in Scala over ingested data.
Crafted Spark applications for data validation, cleansing, transformation, and custom aggregation, employing the Spark engine and Spark SQL for data analysis; delivered the results to data scientists for further analysis.
Automated ingestion processes using Python and Scala, pulling data from various sources such as APIs, AWS S3, Teradata, and Snowflake.
Designed and developed Spark workflows using Scala for data extraction from AWS S3 bucket and Snowflake, applying transformations.
Designed and implemented ETL pipelines between various relational databases and the data warehouse using Apache Airflow.
Developed custom ETL solutions and real-time data ingestion pipelines to move data in and out of Hadoop using Python and shell scripts.
Utilized GCP data-processing services, including GCS, Cloud Functions, and BigQuery.
Worked on data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
Implemented Spark RDD transformations to support business analysis and applied actions on top of those transformations.
Installed and configured Apache Airflow, automating resulting scripts to ensure daily execution in production.
Created Directed Acyclic Graphs (DAGs) utilizing the Email, Bash, and Spark Livy operators for execution on EC2.
Developed scripts to read CSV, JSON, and Parquet files from S3 buckets in Python and load them into AWS S3, DynamoDB, and Snowflake (see the multi-format read sketch after this role).
Ingested real-time data streams into the Spark Streaming platform, saving data to HDFS and Hive via GCP (see the streaming sketch after this role).
Implemented AWS Lambda functions to execute scripts in response to events in Amazon DynamoDB tables or S3 buckets, or to HTTP requests via Amazon API Gateway.
Worked on Snowflake schemas and data warehousing, processing batch and streaming data load pipelines using Snowpipe and Matillion from the Confidential data lake on AWS S3.
Profiled structured, unstructured, and semi-structured data across various sources to identify patterns, and implemented data quality metrics using queries or Python scripts appropriate to each source (see the profiling sketch after this role).
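An illustrative PySpark sketch of the multi-format S3 reads referenced above; bucket paths, table names, and the filter are hypothetical, and the Snowflake/DynamoDB load steps are omitted.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-multi-format-ingest").getOrCreate()

# Read the three common landing formats from S3 (placeholder paths).
csv_df = spark.read.option("header", "true").csv("s3://example-bucket/landing/customers.csv")
json_df = spark.read.json("s3://example-bucket/landing/events.json")
parquet_df = spark.read.parquet("s3://example-bucket/landing/orders/")

# Light SQL-style cleanup before handing off to downstream loaders.
csv_df.createOrReplaceTempView("customers")
active = spark.sql("SELECT customer_id, email FROM customers WHERE status = 'active'")

# Write curated output back to S3 as Parquet.
active.write.mode("overwrite").parquet("s3://example-bucket/curated/customers/")
```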
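A hedged Structured Streaming sketch of the real-time ingestion described above, assuming a Kafka source and Parquet output on HDFS; broker addresses, topic, and paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-to-hdfs").getOrCreate()

# Read a Kafka topic as a streaming DataFrame (placeholder broker and topic).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "clickstream")
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(key AS STRING) AS event_key", "CAST(value AS STRING) AS payload")
)

# Land micro-batches on HDFS as Parquet; a Hive external table can sit on this path.
query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/raw/clickstream/")
    .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```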
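A minimal Python sketch of the profiling and data quality checks mentioned above; the file, columns, and rules are hypothetical examples of the kinds of metrics computed.

```python
import pandas as pd

# Hypothetical source file and columns standing in for the real data sources.
df = pd.read_csv("customers.csv")

# Column-level profile: completeness, cardinality, and inferred types.
profile = pd.DataFrame({
    "null_pct": df.isna().mean().round(4),
    "distinct": df.nunique(),
    "dtype": df.dtypes.astype(str),
})
print(profile)

# Simple rule-based quality checks; failures would be logged or reported.
checks = {
    "customer_id_unique": df["customer_id"].is_unique,
    "email_populated": df["email"].notna().all(),
}
print(checks)
```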
Projects:
WVENTO August 2022 – December 2022
Led a team of three in developing a statewide event-tracking application.
Constructed a subscription model for targeted notifications based on users' specific event locations.
Utilized Android Studio for the front end, ensuring intuitive navigation.
Employed PHP on the backend, establishing a scalable server-side architecture for efficient data processing.
Integrated a Google Maps Activity so users could locate and explore events by proximity.
DATA ANALYSIS AND VISUALIZATION USING TABLEAU January 2023 - May 2023
Led and coordinated a team of five, ensuring effective planning and project execution.
Gathered and preprocessed data using Python, identifying features and target variables.
Built a Random Forest machine learning model and evaluated its performance on a held-out test set (see the sketch after this project).
Collaborated with peers to identify and address potential project risks early on.
Created an interactive Tableau dashboard to visualize model performance metrics.
Maintained project documentation, including project plans and reports.
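A brief scikit-learn sketch of the Random Forest training and evaluation described above; the dataset, target column, and hyperparameters are hypothetical, and metrics like these could feed the Tableau dashboard.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical preprocessed dataset with a "target" label column.
data = pd.read_csv("project_data.csv")
X = data.drop(columns=["target"])
y = data["target"]

# Hold out a test set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))
print(classification_report(y_test, preds))
```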
Education:
Master's in Computer Engineering August 2023 – May 2025
University of Cincinnati, Cincinnati, OH