
Data Engineer

Location:
Dayton, NJ
Posted:
March 10, 2025


Praneeth Reddy

Email: *************.******@*****.***

Phone: +1-862-***-****

Professional Summary:

●Data Engineer with over 3 years of experience designing, developing, and optimizing data pipelines and ETL processes. Proficient with the Azure and AWS cloud platforms, with hands-on expertise in data warehousing and big data technologies. Skilled in SQL, Python, and Spark for data transformation and analysis. Adept at implementing scalable data solutions to support business intelligence and analytics needs.

●Proficient in building and maintaining data pipelines using Python, SQL, and Apache Airflow.

●Experience with Hadoop ecosystem tools like Hive, Spark, and Kafka for data processing and real-time streaming.

●Skilled in cloud platforms (AWS, Azure, GCP) and tools like AWS Glue, Azure Data Factory, and Google BigQuery.

●Strong understanding of data warehousing concepts, including star schema and snowflake schema.

●Proficient in SQL databases (MySQL, PostgreSQL, Oracle) and NoSQL databases (MongoDB, Cassandra).

●Familiar with data visualization tools like Tableau and Power BI for creating dashboards and reports.

Technical Skills

●Programming & Scripting: Python, SQL, Scala

●Big Data & Cloud Technologies: Azure Data Factory, Azure Databricks, AWS Glue, Apache Spark, Kafka

●Databases & Warehousing: SQL Server, Snowflake, Redshift, Hive, PostgreSQL

●ETL & Data Processing: Azure Data Factory (ADF), Airflow, PySpark

●Visualization & Reporting: Power BI, Tableau

●CI/CD & DevOps: Git, Jenkins, Docker

●Operating Systems: Linux, Windows

Professional Experience

Progress Solutions

Data Engineer Jan 2024 - Present

Responsibilities:

●Designed and implemented ETL pipelines using Azure Data Factory (ADF) to extract, transform, and load data from various sources such as Azure SQL Database, Blob Storage, and Azure Data Lake.

●Developed Python scripts to automate data processing tasks and integrate them into Azure Functions for serverless execution.

●Created and managed data lakes on Azure Data Lake Storage (ADLS Gen2), organizing structured and unstructured data for efficient querying and analysis.

●Built data pipelines to ingest data from on-premises SQL Server and Azure SQL Database into Azure Synapse Analytics for analytics and reporting.

●Optimized Azure Synapse performance by tuning queries and implementing distribution styles and partitioning strategies, resulting in a 20% improvement in query performance.

●Utilized Azure Data Factory to orchestrate and schedule data workflows, ensuring timely and accurate data delivery.

●Assisted in setting up real-time data streaming pipelines using Azure Stream Analytics and Event Hubs to process and analyze streaming data from IoT devices.

●Developed ARM templates to automate the provisioning of Azure resources, ensuring consistent and repeatable deployments.

●Collaborated with data analysts to create Synapse SQL views and materialized views for faster access to frequently queried data.

●Monitored and troubleshot data pipelines using Azure Monitor and Log Analytics, ensuring high availability and reliability.

●Implemented data security best practices by configuring Azure Active Directory (AAD), role-based access control (RBAC), and encryption for data at rest and in transit.

●Assisted in migrating on-premises data to Azure Data Lake and Azure SQL Database using Azure Data Migration Service (DMS), reducing migration time by 30%.

●Worked with Azure Databricks to process large datasets using PySpark, enabling advanced analytics and machine learning workflows (a PySpark sketch follows this section).

●Supported the development of machine learning models by preparing and cleaning datasets using Azure Databricks and Azure Machine Learning (AML).

●Documented data pipelines, workflows, and best practices to ensure knowledge sharing and team collaboration.

Environment: Azure Data Factory, Azure Data Lake, Azure Synapse, Azure Functions, Azure Stream Analytics, Azure Databricks, Azure SQL Database, Azure Monitor, Python, SQL, Azure DevOps.
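
The Databricks work above is the most code-centric piece of this role. As an illustration only, here is a minimal PySpark batch transform in the shape described: read raw Parquet from ADLS Gen2, apply basic cleansing, and write curated Delta output. The storage account, containers, paths, and column names (order_id, amount) are placeholder assumptions, not details from the resume.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks, `spark` is provided by the runtime; building a session here
# keeps the sketch self-contained.
spark = SparkSession.builder.appName("adls-batch-transform").getOrCreate()

# Hypothetical ADLS Gen2 source path.
raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/2024/"

df = (
    spark.read.parquet(raw_path)
    .dropDuplicates(["order_id"])                 # dedup on an assumed business key
    .filter(F.col("amount") > 0)                  # drop invalid rows
    .withColumn("ingest_date", F.current_date())  # audit column
)

# Curated Delta output, partitioned for downstream Synapse/BI queries.
(
    df.write.format("delta")
    .mode("overwrite")
    .partitionBy("ingest_date")
    .save("abfss://curated@examplestorage.dfs.core.windows.net/sales/")
)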

EP Soft, Hyderabad, India

Data Engineer March 2022 - December 2023

Responsibilities:

●Designed and implemented ETL pipelines using AWS Glue to extract, transform, and load data from various sources such as S3, RDS, and Redshift (a minimal Glue job skeleton follows this section).

●Developed Python scripts to automate data processing tasks and integrate them into AWS Lambda functions for serverless execution.

●Created and managed data lakes on AWS S3, organizing structured and unstructured data for efficient querying and analysis.

●Built data pipelines to ingest data from RDS (MySQL, PostgreSQL) and NoSQL databases (DynamoDB) into Redshift for analytics and reporting.

●Optimized Redshift performance by tuning queries and implementing sort keys and distribution styles, resulting in a 20% improvement in query performance.

●Utilized AWS Glue Crawlers to automatically discover and catalog data stored in S3, creating a centralized metadata repository for easy data discovery.

●Assisted in setting up real-time data streaming pipelines using Amazon Kinesis and Lambda to process and analyze streaming data from IoT devices.

●Developed CloudFormation templates to automate the provisioning of AWS resources, ensuring consistent and repeatable deployments.

●Collaborated with data analysts to create Redshift views and materialized views for faster access to frequently queried data.

●Monitored and troubleshot data pipelines using CloudWatch and AWS Glue Job Metrics, ensuring high availability and reliability.

●Implemented data security best practices by configuring IAM roles, S3 bucket policies, and encryption for data at rest and in transit.

●Assisted in migrating on-premises data to AWS S3 and Redshift using AWS DMS (Data Migration Service), reducing migration time by 30%.

●Worked with Athena to run SQL queries directly on data stored in S3, enabling ad-hoc analysis without the need for data movement.

●Supported the development of machine learning models by preparing and cleaning datasets using AWS Glue and SageMaker.

●Documented data pipelines, workflows, and best practices to ensure knowledge sharing and team collaboration.

Environment: AWS Glue, S3, Redshift, Lambda, Kinesis, CloudFormation, IAM, CloudWatch, Athena, DMS, Python, SQL, Jenkins, GitHub.
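
As an illustration of the Glue pipeline pattern above, here is a minimal AWS Glue (PySpark) job skeleton: read a table registered by a Glue Crawler, apply a simple transform, and write Parquet to S3. The catalog database, table, bucket, and column names are hypothetical placeholders.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate; JOB_NAME is supplied by the Glue runtime.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical catalog database/table created by a Glue Crawler.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Convert to a Spark DataFrame for SQL-style transforms, then write out to S3.
df = dyf.toDF().dropDuplicates(["order_id"]).filter("amount > 0")
df.write.mode("overwrite").parquet("s3://example-curated-bucket/orders/")

job.commit()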

Data Engineer March 2020 - December 2020

Responsibilities:

●Assisted in developing ETL workflows for data ingestion and transformation using AWS Glue and Redshift.

●Wrote Python scripts to clean, process, and analyze large datasets, improving data quality (a cleaning sketch follows this section).

●Participated in the migration of on-premises databases to cloud-based data warehouses.

●Supported performance tuning efforts for SQL queries and Spark jobs to enhance efficiency.

●Created Power BI dashboards to visualize key business metrics and data insights.
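
The resume does not name the libraries behind the Python cleaning scripts above; a plausible pandas-based sketch is shown below. The file name and column names are illustrative assumptions.

import pandas as pd

def clean_orders(path: str) -> pd.DataFrame:
    """Load a raw extract and apply basic quality rules.
    All column names here are hypothetical examples."""
    df = pd.read_csv(path, parse_dates=["order_date"])
    df = df.drop_duplicates(subset=["order_id"])           # remove duplicate keys
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df = df.dropna(subset=["order_id", "amount"])          # drop unusable rows
    df["region"] = df["region"].str.strip().str.upper()    # normalize categories
    return df

if __name__ == "__main__":
    cleaned = clean_orders("raw_orders.csv")
    print(cleaned.describe())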

Education:

●Master’s in Information Technology

Saint Peter’s University - May 2024

●Bachelor of Technology in Electronics and Communication Engineering

Jawaharlal Nehru Technological University, Hyderabad, India

Graduated: 2022

Projects:

Real-Time Data Streaming with Kafka and Spark Streaming

●Built a real-time data pipeline using Kafka and Spark Streaming to process and store streaming data in HDFS (a PySpark equivalent is sketched after this project).

●Used Scala to develop Spark jobs for data transformation and aggregation.

●Improved data processing speed by 20% through optimization of Spark jobs.
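
The project's Spark jobs were written in Scala, per the bullets above; a PySpark Structured Streaming equivalent of the Kafka-to-HDFS flow with a windowed aggregation might look like the sketch below. The broker address, topic, and HDFS paths are placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Requires the spark-sql-kafka connector (e.g., via --packages).
spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Kafka source; broker and topic are placeholders.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(value AS STRING) AS value", "timestamp")
)

# Event counts per 1-minute window; the watermark makes append-mode
# output to a file sink legal.
counts = (
    events.withWatermark("timestamp", "2 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

query = (
    counts.writeStream.format("parquet")
    .option("path", "hdfs:///data/event_counts/")
    .option("checkpointLocation", "hdfs:///checkpoints/event_counts/")
    .outputMode("append")
    .start()
)
query.awaitTermination()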

ETL Pipeline for Sales Data

●Developed an ETL pipeline using Apache Airflow to extract, transform, and load sales data from multiple sources into a data warehouse (a minimal DAG sketch follows this project).

●Used Python for data cleaning and transformation tasks.

●Automated the pipeline using Airflow to run daily, reducing manual effort by 30%.
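
A minimal Airflow DAG in the shape this project describes: three Python tasks chained to run daily. This sketch assumes Airflow 2.4+; the dag_id and task callables are illustrative, and the task bodies are stubbed.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    """Pull raw sales extracts; source systems are placeholders."""
    ...

def transform():
    """Apply cleaning and conforming logic (e.g., with pandas or SQL)."""
    ...

def load():
    """Load conformed data into the warehouse."""
    ...

with DAG(
    dag_id="daily_sales_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # runs once per day, as in the project
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3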

Data Migration to Azure Data Lake

●Assisted in migrating data from on-premises SQL Server to Azure Data Lake using Azure Data Factory.

●Created Hive tables and wrote HiveQL queries for data analysis (sketched after this project).

●Reduced data migration time by 25% through optimization of the Azure Data Factory pipelines.
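
One plausible shape for the Hive work above, assuming the tables were defined over files landed in the lake by the Data Factory copy: a Hive-backed Spark session, an external table, and a HiveQL analysis query. The table name, columns, and location are illustrative assumptions.

from pyspark.sql import SparkSession

# Hive-enabled session so spark.sql() runs against the Hive metastore.
spark = (
    SparkSession.builder.appName("hive-analysis")
    .enableHiveSupport()
    .getOrCreate()
)

# External table over files landed by the copy pipeline.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        order_id STRING,
        amount   DOUBLE,
        region   STRING
    )
    STORED AS PARQUET
    LOCATION '/mnt/datalake/sales_raw/'
""")

# Example HiveQL analysis query.
spark.sql("""
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales_raw
    GROUP BY region
""").show()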


