Anuroop Chinthireddy
*******************@*****.*** | +1-513-***-**** | Cincinnati, OH
Professional Summary
Results-driven Data Engineer with over 3 years of progressive experience designing and implementing scalable, high-performance data solutions across cloud-native and hybrid environments. Proven expertise in ETL/ELT pipeline development, data modeling, and big data processing, with hands-on experience across a broad technology stack including AWS, GCP, Azure, Snowflake, Apache Spark, Airflow, and Databricks. Proficient in scripting and automation using Python, Scala, and SQL, with a strong foundation in both structured and semi-structured data handling (CSV, JSON, Parquet). Adept at building robust data ingestion pipelines leveraging tools such as AWS Lambda, Glue, Redshift, DynamoDB, and Kafka, supporting real-time and batch processing requirements.
Demonstrated success in optimizing data workflows with Spark transformations, implementing star schema models using the Kimball methodology, and enabling advanced analytics through data warehouse solutions in Snowflake and BigQuery. Solid understanding of data governance, encryption, and cloud-based workflow orchestration using Apache Airflow.
Experienced in agile environments, delivering high-impact data solutions within cross-functional teams and partnering closely with business stakeholders to surface actionable insights through BI tools such as Tableau and Power BI. Committed to engineering excellence, continuous integration/deployment (CI/CD), and driving innovation through automation and scalable architectures.
Skills:
Programming Languages: Python, Java, SQL, Scala, C, C++, R
Database Management Systems (DBMS): MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, Cassandra
Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, Apache Hive
ETL Tools: Apache Spark, Apache Airflow, Azure Synapse, Talend, Informatica, Apache NiFi, dbt
Big Data Technologies: Apache Hadoop, Apache Kafka, Apache HBase, Apache Flink, Apache Storm
Cloud Platforms: AWS, Azure, GCP, including services like Amazon S3, Azure Data Lake Storage, Databricks
DevOps & Automation: CI/CD pipelines (GitHub Actions, Jenkins, Azure DevOps), Docker, Kubernetes, Terraform, ECR
Version Control Systems: Git, GitHub
Data Visualization: Tableau, Power BI
Machine Learning/AI: TensorFlow, PyTorch
Operating Systems: Linux/Unix, Windows, macOS
Containerization and Orchestration: Docker, Kubernetes
Tools & IDE: Git, IntelliJ, Visual Studio Code, Jupyter Notebook, PyCharm
Hadoop Ecosystem: HDFS, YARN, MapReduce
Monitoring and Logging: Prometheus, Grafana, ELK stack, Splunk, Instana
Analytical Tools: SAS, Microsoft Power BI
Experience:
Data Engineer Intern May 2024 – March 2025
MACHIT Group, Cincinnati, OH
Participated in all phases of software development, including requirements gathering and business analysis.
Devised PL/SQL stored procedures, functions, triggers, views, and packages.
Designed data models for AWS Lambda applications and analytical reports.
Built a full-service catalog system using Elasticsearch, Logstash, Kibana, Kinesis, and CloudWatch with the effective use of MapReduce.
Utilized Indexing, Aggregation, and Materialized views to optimize query performance.
Implemented Python and Scala code for data processing and analytics, leveraging built-in libraries.
Utilized various Spark transformations, including mapToPair, filter, flatMap, groupByKey, sortByKey, join, cogroup, union, repartition, coalesce, distinct, intersection, mapPartitions, and mapPartitionsWithIndex, along with actions, for cleansing input data (see the PySpark sketch after this role).
Developed PySpark code to compare data between HDFS and S3.
Developed data migration programs from DynamoDB to AWS Redshift (ETL process), using AWS Lambda functions written in Python and triggered by specific events based on the use case (see the Lambda sketch after this role).
Created scripts to read CSV, JSON, and Parquet files from S3 buckets using Python, executed SQL operations, and loaded data into AWS S3, DynamoDB, and Snowflake using AWS Glue with crawlers.
Designed the staging and Operational Data Store (ODS) environment for the enterprise data warehouse (Snowflake), including dimension and fact table design following Kimball's star schema approach.
Unit-tested data consistency between Redshift and Snowflake.
Implemented a Continuous Delivery pipeline with Docker and GitHub.
Installed and configured Apache Airflow for workflow management and created workflows in Python.
Developed Airflow DAGs to orchestrate sequential and parallel ETL jobs (see the DAG sketch after this role).
Moved data between GCP and Azure using Azure Data Factory.
Worked on a PySpark script to protect client-specified columns by applying hashing algorithms (see the hashing sketch after this role).
Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming data to uncover insights into customer usage patterns.
Reviewed system specifications related to DataStage ETL and developed functions in AWS Lambda for event-driven processing.
Wrote reports using Tableau Desktop to extract data for analysis using filters based on the business use case.
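The following is an illustrative, minimal PySpark sketch of the transformation-based cleansing described above; the bucket paths, record layout, and filter rules are hypothetical stand-ins rather than the actual pipeline code.

```python
from pyspark.sql import SparkSession

# Hypothetical cleansing job: paths, column positions, and rules are placeholders.
spark = SparkSession.builder.appName("cleanse-input").getOrCreate()

raw = spark.sparkContext.textFile("s3://example-bucket/raw/events.txt")

cleaned = (
    raw.filter(lambda line: line and not line.startswith("#"))  # drop blanks and comments
       .map(lambda line: line.split(","))                       # split CSV-style rows
       .filter(lambda cols: len(cols) == 3)                     # keep well-formed rows only
       .map(lambda cols: (cols[0], int(cols[2])))               # (key, amount) pairs
       .reduceByKey(lambda a, b: a + b)                         # aggregate per key
       .sortByKey()
       .coalesce(8)                                             # reduce output partitions
)

cleaned.saveAsTextFile("s3://example-bucket/clean/events")
```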
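Below is a hedged sketch of an event-driven Lambda that stages DynamoDB stream records to S3 and issues a Redshift COPY through the Redshift Data API; the bucket, cluster, table, and IAM role names are hypothetical.

```python
import json
import boto3

s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")

def lambda_handler(event, context):
    # Triggered by a DynamoDB Stream; keep only inserted or modified records.
    records = [
        rec["dynamodb"]["NewImage"]
        for rec in event.get("Records", [])
        if rec.get("eventName") in ("INSERT", "MODIFY")
    ]
    if not records:
        return {"staged": 0}

    # Stage the changed records to S3 as JSON lines (placeholder bucket and prefix).
    key = f"staging/orders/{context.aws_request_id}.json"
    body = "\n".join(json.dumps(r) for r in records)
    s3.put_object(Bucket="example-stage-bucket", Key=key, Body=body.encode("utf-8"))

    # Load the staged file into Redshift with a COPY issued via the Data API.
    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sql=(
            f"COPY staging.orders FROM 's3://example-stage-bucket/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/example-copy-role' "
            "FORMAT AS JSON 'auto';"
        ),
    )
    return {"staged": len(records)}
```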
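A minimal Airflow sketch of the sequential and parallel orchestration mentioned above; the DAG id, task names, and commands are illustrative assumptions, not the production jobs.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")

    # These two transforms run in parallel once extraction finishes.
    transform_orders = BashOperator(task_id="transform_orders", bash_command="echo orders")
    transform_users = BashOperator(task_id="transform_users", bash_command="echo users")

    load = BashOperator(task_id="load_snowflake", bash_command="echo load")

    # Sequential extract, parallel transforms, then a single load step.
    extract >> [transform_orders, transform_users] >> load
```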
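A short sketch of the column-level hashing described above, assuming SHA-256 via Spark's built-in sha2 function; the column list and paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

spark = SparkSession.builder.appName("hash-sensitive-columns").getOrCreate()

df = spark.read.parquet("s3://example-bucket/curated/customers/")

# Replace each client-specified column with its SHA-256 hash (placeholder column names).
sensitive_columns = ["email", "ssn", "phone"]
for name in sensitive_columns:
    df = df.withColumn(name, sha2(col(name).cast("string"), 256))

df.write.mode("overwrite").parquet("s3://example-bucket/protected/customers/")
```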
Data Engineer May 2021 – June 2023
Enshire, India
Utilized AWS services to architect, analyze, and develop enterprise data warehouse and business intelligence solutions, ensuring optimal architecture, scalability, flexibility, availability, and performance for better decision-making.
Developed Scala scripts and User Defined Functions (UDFs) using DataFrames/SQL and Resilient Distributed Datasets (RDDs) in Spark for data aggregation and querying, writing results back to the S3 bucket.
Executed data cleansing and data mining operations.
Built, compiled, and ran Apache Spark ETL jobs in Scala over ingested data.
Crafted Spark applications for data validation, cleansing, transformation, and custom aggregation, employing the Spark engine and Spark SQL for data analysis; delivered the results to data scientists for further analysis.
Automated ingestion processes using Python and Scala, pulling data from various sources such as APIs, AWS S3, Teradata, and Snowflake.
Designed and developed Spark workflows using Scala for data extraction from AWS S3 bucket and Snowflake, applying transformations.
Designed and implemented ETL pipelines between various relational databases and the data warehouse using Apache Airflow.
Developed custom ETL solutions and real-time data ingestion pipelines to move data in and out of Hadoop using Python and shell scripts.
Utilized GCP data-processing services, including GCS, Cloud Functions, and BigQuery.
Worked on data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
Implemented Spark RDD transformations to support business analysis and applied actions on top of those transformations.
Installed and configured Apache Airflow, automating resulting scripts to ensure daily execution in production.
Created Directed Acyclic Graphs (DAGs) utilizing the Email, Bash, and Spark Livy operators for execution on EC2.
Developed scripts to read CSV, JSON, and Parquet files from S3 buckets in Python and load them into AWS S3, DynamoDB, and Snowflake (see the multi-format read sketch after this role).
Ingested real-time data streams into the Spark Streaming platform, saving data to HDFS and Hive via GCP (see the streaming sketch after this role).
Implemented AWS Lambda functions to execute scripts in response to events in Amazon DynamoDB tables or S3 buckets, or to HTTP requests via Amazon API Gateway.
Worked on Snowflake schemas and data warehousing, processing batch and streaming data load pipelines using Snowpipe and Matillion from the Confidential data lake on AWS S3.
Profiled structured, unstructured, and semi-structured data across various sources to identify patterns, and implemented data quality metrics using queries or Python scripts appropriate to each source (see the profiling sketch after this role).
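An illustrative PySpark sketch of the multi-format S3 reads referenced above; bucket paths, table names, and the filter are hypothetical, and the Snowflake/DynamoDB load steps are omitted.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-multi-format-ingest").getOrCreate()

# Read the three common landing formats from S3 (placeholder paths).
csv_df = spark.read.option("header", "true").csv("s3://example-bucket/landing/customers.csv")
json_df = spark.read.json("s3://example-bucket/landing/events.json")
parquet_df = spark.read.parquet("s3://example-bucket/landing/orders/")

# Light SQL-style cleanup before handing off to downstream loaders.
csv_df.createOrReplaceTempView("customers")
active = spark.sql("SELECT customer_id, email FROM customers WHERE status = 'active'")

# Write curated output back to S3 as Parquet.
active.write.mode("overwrite").parquet("s3://example-bucket/curated/customers/")
```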
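A hedged Structured Streaming sketch of the real-time ingestion described above, assuming a Kafka source and Parquet output on HDFS; broker addresses, topic, and paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-to-hdfs").getOrCreate()

# Read a Kafka topic as a streaming DataFrame (placeholder broker and topic).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "clickstream")
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(key AS STRING) AS event_key", "CAST(value AS STRING) AS payload")
)

# Land micro-batches on HDFS as Parquet; a Hive external table can sit on this path.
query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/raw/clickstream/")
    .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```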
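A minimal Python sketch of the profiling and data quality checks mentioned above; the file, columns, and rules are hypothetical examples of the kinds of metrics computed.

```python
import pandas as pd

# Hypothetical source file and columns standing in for the real data sources.
df = pd.read_csv("customers.csv")

# Column-level profile: completeness, cardinality, and inferred types.
profile = pd.DataFrame({
    "null_pct": df.isna().mean().round(4),
    "distinct": df.nunique(),
    "dtype": df.dtypes.astype(str),
})
print(profile)

# Simple rule-based quality checks; failures would be logged or reported.
checks = {
    "customer_id_unique": df["customer_id"].is_unique,
    "email_populated": df["email"].notna().all(),
}
print(checks)
```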
Projects:
WVENTO August 2022 – December 2022
Led a team of three in developing a statewide event-tracking application.
Constructed a subscription model for targeted notifications based on users' specific event locations.
Utilized Android Studio for the front end, ensuring intuitive navigation.
Employed PHP on the backend, establishing a scalable server-side architecture for efficient data processing.
Integrated a Google Maps Activity so users could locate and explore events by proximity.
DATA ANALYSIS AND VISUALIZATION USING TABLEAU January 2023 - May 2023
Led and coordinated a team of five, ensuring effective planning and project execution.
Gathered and preprocessed data using Python, identifying features and target variables.
Built a Random Forest machine learning model and evaluated its performance on a held-out test set (see the sketch after this project).
Collaborated with peers to identify and address potential project risks early on.
Created an interactive Tableau dashboard to visualize model performance metrics.
Maintained project documentation, including project plans and reports.
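A brief scikit-learn sketch of the Random Forest training and evaluation described above; the dataset, target column, and hyperparameters are hypothetical, and metrics like these could feed the Tableau dashboard.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical preprocessed dataset with a "target" label column.
data = pd.read_csv("project_data.csv")
X = data.drop(columns=["target"])
y = data["target"]

# Hold out a test set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))
print(classification_report(y_test, preds))
```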
Education:
Master's in Computer Engineering August 2023 – May 2025
University of Cincinnati, Cincinnati, OH