
Data Engineer Big

Location:
Mumbai, Maharashtra, India
Salary:
65000
Posted:
October 15, 2025


Sathwik Vardhan Reddy Pothireddy

Data Engineer

****************@*****.*** +1-630-***-**** Illinois, USA (Open to Relocate) LinkedIn

Professional Summary

• 3+ years of IT experience as a Data Engineer, with expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering (AWS), data visualization, data warehousing, reporting, and data quality solutions.

• Hands-on expertise with the Hadoop ecosystem, including strong knowledge of Big Data technologies such as HDFS, Spark, YARN, MapReduce, Apache Cassandra, HBase, Zookeeper, Hive, Oozie, Impala, Pig, and Flume.

• Extensive experience working with Base SAS (MACROS, Proc SQL, ODS), SAS/ACCESS, and SAS/GRAPH in UNIX, Windows, and Mainframe environments.

• Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance, Metadata Management, MDM, and Configuration Management.

• Experience in UNIX Korn Shell scripting for process automation.

• Thorough knowledge in SAS Programming, using Base SAS, Macro Facility, Proc SQL, SAS Procedures, SAS Functions, SAS Formats, and ODS facility in data scrubbing, manipulation, and preparation to produce summary datasets and reports.

• Worked extensively with PySpark, using SparkContext, the DataFrame API, Spark Streaming, and pair RDDs to improve the efficiency and optimization of existing Hadoop approaches.

• Good understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors, and Tasks.

• In-depth understanding and experience with real-time data streaming technologies such as Kafka and Spark Streaming.

• Experience in writing SQL queries, data integration, and performance tuning.
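As an illustrative sketch of this kind of SQL query writing and tuning (SQLite stands in for a production warehouse here; the orders table and every name in it are hypothetical):

```python
import sqlite3

# Hypothetical example: an indexed aggregation of the sort used in reporting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 50.0)])
# Indexing the grouping/filter column is a common first step in query tuning.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # [('acme', 200.0), ('globex', 50.0)]
```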

• Hands-on experience with AWS components such as EMR, EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, Redshift, and DynamoDB, used to build secure, scalable solutions in the AWS public cloud.

• Expertise in Azure infrastructure management (Azure Web Roles, Worker Roles, SQL Azure, Azure Storage).

• Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

• Good understanding of Spark architecture with Databricks and Structured Streaming; experienced in setting up Databricks on AWS and Microsoft Azure.

• Experience in integrating Hive queries into Spark environment using Spark SQL.

• Experienced in building Snowpipe, with in-depth knowledge of Snowflake data sharing and of Snowflake database, schema, and table structures.

• Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.

• Hands-on experience in handling database issues and connections with SQL and NoSQL databases such as MongoDB, HBase, and SQL Server.

Professional Experience

Cloud Data Engineer, Intellectsoft, TX Nov 2024 – Present

• Developed ETL pipelines to extract, transform, and load digital wallet transaction data into a centralized Snowflake data warehouse, optimizing accessibility for business intelligence by 35%.

• Implemented real-time data ingestion workflows using Snowpipe, enabling near-instantaneous processing of chargeback transactions and improving decision-making efficiency by 40%.

• Tuned and optimized SQL queries in Snowflake to enhance performance and reduce latency for chargeback reporting, accelerating operational insights.

• Built real-time ETL pipelines using Azure Data Factory and Snowflake to streamline transactional data processing from multiple payment systems.

• Extracted, transformed, and loaded data from source systems to Snowflake using native functionalities, SQL scripts, and Snowpipe for continuous data ingestion, ensuring 99% data accuracy and improving data freshness by 50%.

• Designed and implemented scalable semi-structured data models in Snowflake, ensuring high performance for analytical and reporting requirements.

• Collaborated with Data Science teams to integrate Machine Learning pipelines using Snowpark, reducing latency and improving processing efficiency.

• Configured Snowflake External Tables to query data directly from external Azure Blob Storage, minimizing storage duplication and improving processing speed by 35% while cutting storage costs by 20%.

• Deployed and managed distributed Apache Spark clusters in Azure Databricks, processing terabytes of data with high availability and fault tolerance.

• Developed and maintained ETL pipelines in Azure Data Lake Storage (ADLS), supporting real-time analytics and machine learning workflows.

• Automated monitoring, alerting, and data-driven workflows using Azure Logic Apps, Azure Functions, Microsoft Insights, and Grafana, reducing incident response times by 40%.

• Implemented backup and disaster recovery strategies using Azure Backup and Azure Site Recovery for critical workloads.

• Created PowerShell scripts to manage cloud storage operations, including uploading, archiving, and versioning files in Azure Blob Storage.

• Deployed code across SIT, UAT, and Production environments following DevOps practices, using Jira, Confluence, and Bitbucket for version control and collaboration, achieving 100% successful deployments with improved traceability.

• Designed pipelines to convert raw data into Parquet format, enabling efficient large-scale data analytics.

• Developed multiple Python DAGs and integrated ETL workflows in Airflow, improving overall workflow efficiency and maintainability.

Data Engineer Intern, EXCELERATE.ORG, IL May 2024 – Oct 2024

• Built and optimized real-time data pipelines using Azure Event Hubs, Snowflake (Snowpipe), and Azure Stream Analytics for grocery delivery and inventory monitoring, improving data refresh rates by 45% and reducing latency to under 10 seconds.

• Automated ETL workflows with Python, Azure Functions, and Data Factory, integrating Snowflake and Delta Lake for scalable data transformation, increasing processing throughput by 35%.

• Developed interactive Power BI dashboards and serverless APIs via Azure Functions and API Management, delivering actionable insights to stakeholders and reducing manual reporting time by 60%.

• Ensured secure data access and governance using Azure Entra ID, Azure Data Share, RBAC, and Microsoft Insights, enabling 100% compliance with internal audit standards and improving cross-team collaboration efficiency by 25%.

Data Engineer, COGNIZANT TECHNOLOGIES, India Jan 2022 – Mar 2023

• Migrated 15+ databases, 20+ applications, and multiple web servers from on-premises infrastructure to Microsoft Azure Cloud, improving scalability and system uptime by 42.6% while reducing infrastructure costs by 33.8%.

• Engineered and automated end-to-end ETL pipelines in Azure Synapse and Data Factory, integrating data from 12+ disparate sources and cutting manual data processing efforts by 40%, enhancing overall workflow efficiency and data reliability.

• Created large datasets by combining individual datasets using SQL joins, SAS/BASE merging techniques, and Python, optimizing data preparation for analytics.
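A minimal Python sketch of this dataset-combining step, written as a hash join that mirrors an SQL inner join (the records and field names are illustrative only):

```python
# Hypothetical datasets sharing a key, as in a customer/orders join.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Raj"}]
orders = [{"customer_id": 1, "amount": 50.0},
          {"customer_id": 1, "amount": 25.0},
          {"customer_id": 2, "amount": 10.0}]

# Build a lookup table (hash join) so each order matches in O(1).
by_id = {c["id"]: c for c in customers}
joined = [{**by_id[o["customer_id"]], **o} for o in orders
          if o["customer_id"] in by_id]
print(len(joined))  # 3 joined rows
```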

• Built real-time data ingestion pipelines using Spark Streaming and Kafka, processing terabytes of records efficiently for downstream applications.

• Developed Python notebooks in Azure Databricks and processed data into Azure SQL and ADLS Gen2, enabling robust analytics workflows.

• Managed and optimized Snowflake and Azure SQL databases, including data modeling, schema improvements, and query tuning, reducing query latency by 25%.

• Implemented dynamic pipelines, datasets, and linked services in Azure Data Factory, automating workflows and improving ETL efficiency.

• Developed and maintained NoSQL data integrations with HBase, Cassandra, and MongoDB, ensuring smooth Hadoop ecosystem operations.

• Applied CI/CD practices for Java-based data pipelines, automating deployment, testing, and monitoring, reducing errors and increasing release efficiency.

• Designed data warehouse solutions using Azure Synapse, PolyBase, external tables, and Azure Data Lake, improving analytics query performance by 33.7% and enabling faster reporting.
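The real-time Spark Streaming and Kafka ingestion described above can be illustrated, without either dependency, by the tumbling-window aggregation such a job typically performs (timestamps, keys, and window size here are invented for the example):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping time windows.

    A pure-Python sketch of the tumbling-window aggregation a Spark
    Streaming job would apply to a Kafka topic; timestamps are epoch
    seconds and all names are illustrative.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "click"), (3, "click"), (7, "view"), (12, "click")]
counts = tumbling_window_counts(events, 10)
print(counts)  # {(0, 'click'): 2, (0, 'view'): 1, (10, 'click'): 1}
```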

Data Engineer Intern, Anvizon (PwC), India Jul 2021 – Dec 2021

• Assisted in building ETL pipelines to extract, transform, and load data from multiple sources, improving workflow efficiency significantly.

• Processed and cleaned large datasets using Python and SQL, enabling accurate reporting and faster data-driven decision-making across teams.
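A toy sketch of this kind of Python cleaning step, covering normalization, type coercion, and deduplication (the field names and rules are assumptions, not the actual pipeline):

```python
def clean_records(rows):
    """Trim and lowercase names, coerce amounts to float, drop duplicates.

    Illustrative only: real cleaning rules would come from the source
    system's data contract.
    """
    seen, cleaned = set(), []
    for row in rows:
        name = row.get("name", "").strip().lower()
        try:
            amount = float(row.get("amount", ""))
        except ValueError:
            continue  # skip rows with unparseable amounts
        key = (name, amount)
        if name and key not in seen:
            seen.add(key)
            cleaned.append({"name": name, "amount": amount})
    return cleaned

raw = [{"name": " Ana ", "amount": "10.5"},
       {"name": "ana", "amount": "10.5"},   # duplicate after normalization
       {"name": "Raj", "amount": "n/a"}]    # dropped: bad amount
cleaned = clean_records(raw)
print(cleaned)  # [{'name': 'ana', 'amount': 10.5}]
```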

• Supported development of real-time data workflows using Spark Streaming and Kafka, handling over 1TB of high-velocity data daily for analytics.

• Created automated scripts to monitor data quality and pipeline health, reducing errors by 24% and ensuring reliable, consistent data delivery.

Skills

• Programming & Scripting: Java, Python, C, C++, SQL, Shell Scripting, Apache Spark (Scala/PySpark)

• Databases: Relational: MySQL, Oracle, Teradata, MS SQL Server, PostgreSQL, DB2; NoSQL: HBase, DynamoDB, MongoDB, Cassandra

• ETL / BI Tools: Informatica, Tableau, Power BI, Microsoft Excel

• Cloud Platforms: (AWS) EC2, S3, EMR, Kinesis, RDS, VPC, Elastic Load Balancing, IAM, CloudWatch, (Azure) Azure Data Factory, Azure Data Lake, Azure SQL Database, Azure Synapse Analytics

• Big Data Technologies: Hadoop, MapReduce, HDFS, YARN, Apache Spark, Spark MLlib, Hive, Pig, Sqoop, HBase, Flume, Kafka, Zookeeper, Oozie, NiFi, StreamSets, Apache Airflow

• DevOps & CI/CD Tools: Terraform, Docker, Kubernetes, Jenkins, Ansible

• Version Control: Git, GitHub, SVN

• Application Servers: Apache Tomcat (5.x, 6.0), JBoss (4.0)

• Monitoring & Others: Splunk, Jupyter Notebook, Jira, Kerberos

Education

Master of Science, Computer Science Aug 2023 – May 2025 Illinois Institute of Technology, Chicago, USA

Bachelor of Engineering, Electronics and Communications Engineering Aug 2018 – Jul 2022 Anna University, Tamil Nadu, India

Projects

Sentiment Analysis on Amazon Reviews (Machine Learning)

• Leveraged BERT and RNN/LSTM models to perform deep contextual sentiment analysis on real-world Amazon review data.

• Improved model accuracy and performance, enabling more precise insights for customer feedback and business decisions.

Inventory Management (Advanced Database Organization)

• Optimized checkout process and payment transaction handling to ensure secure, efficient operations and reduce errors.

• Streamlined purchase flow, enhancing user experience and overall system reliability for e-commerce operations.

Chicago House Price Prediction (Data Preparation and Analysis)

• Prepared and cleaned housing datasets using R, training predictive models to forecast prices accurately.

• Built trend visualizations to improve model interpretability and support data-driven decision-making for stakeholders.

SparklingScript: DSL for Generating Apache Spark Code (Big Data Technologies)

• Developed parsing and code generation modules in Python using ANTLR, enabling analysts to generate Spark code efficiently.

• Created user-friendly syntax with robust testing, error handling, and documentation for scalable deployment and adoption.

Publication

Fully Automated Smart Crop Field Protection System

• Designed a real-life crop protection solution using sensors and automation to detect animal intrusion in farmlands.

• Collected and processed sensor-generated data streams, applying data preprocessing and real-time monitoring techniques similar to data engineering workflows.

• Utilized basic tools such as SQL and Python for logging, filtering, and analyzing sensor data to ensure timely response mechanisms.

• Demonstrated the ability to bridge IoT data pipelines with data engineering principles, highlighting practical problem-solving and system integration skills.
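A minimal illustration of the sensor filtering and alerting logic described above (the threshold and readings are invented for the example, not taken from the actual system):

```python
def flag_intrusions(readings, threshold):
    """Return the timestamps whose motion reading exceeds a threshold.

    Toy sketch of the sensor-filtering step: readings are (timestamp,
    value) pairs, and any value above the threshold is treated as a
    possible animal intrusion. All numbers are illustrative.
    """
    return [ts for ts, value in readings if value > threshold]

readings = [(1, 0.2), (2, 0.9), (3, 0.1), (4, 0.95)]
alerts = flag_intrusions(readings, 0.8)
print(alerts)  # [2, 4]
```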
