
Data Engineer Azure

Location:
Hutto, TX, 78634
Posted:
April 28, 2025


Resume:

PAVANI SENAPATHI

DATA ENGINEER

Location: TX Phone: +1-512-***-**** Email: ***********@*****.*** LinkedIn: www.linkedin.com/in/pavanispaug

PROFESSIONAL SUMMARY

●Data Engineer with 5+ years of experience in data ingestion, ETL pipelines, and cloud platforms (AWS, Azure, GCP), creating scalable data solutions for finance, healthcare, and retail industries.

●Proficient in AWS tools (S3, Glue, Lambda, EMR, Lake Formation), Hadoop, Spark (PySpark, Scala), Kafka, and Azure Data Factory for efficient data storage, processing, and pipeline optimization.

●Knowledgeable in data modeling (Snowflake, relational), SQL/NoSQL (DynamoDB, MySQL), and storage (Azure Data Lake, Blob), plus Python-based ML with TensorFlow, Scikit-learn, and PyTorch for analytics.

●Certified Azure and GCP Data Engineer, excelling in data visualization (Power BI, Tableau), cloud architecture, governance, and security compliance.

SKILLS

Languages

Python, R, Java, C++, Bash, Unix Shell Scripting

Frameworks

Hadoop, EMR, Hive, BigQuery, PySpark, Kafka, ETL, Databricks, Kubernetes, Hortonworks, Spark, Scikit-learn, Risk Metrics Frameworks

Cloud

AWS, Microsoft Azure, GCP

Databases

MySQL, Oracle DB, JDBC, DynamoDB, MS-Access

Data Visualization

Tableau, Power BI, MS Excel

Libraries

Pandas, GeoPandas, Folium, Seaborn, Snowflake, NumPy, TensorFlow, Matplotlib, Keras, hmmlearn, sqlite3, PyTorch

Interests

Data visualization, Deep Learning, Computer Vision, Statistics, Data Mining, Data Analysis, Cloud Computing.

Financial & Risk Expertise:

Risk calculations, risk methodology enhancements, financial derivatives pricing models, high-concurrency distributed systems

Operating Systems

Windows, Linux, iOS

Tools

Git, PyCharm, VS Code, Jupyter Notebook, MATLAB,

Anaconda, Sublime Text, Docker, Informatica, Terraform

WORK EXPERIENCE

BANK OF NEW YORK MELLON, TX

SR. DATA ENGINEER SEP 2024 - PRESENT

●Engineered data ingestion pipelines handling 2M+ records monthly using Hadoop, MapReduce, HBase, Hive, and Sqoop, ensuring reliable data handling.

●Streamlined data workflows and infrastructure, enhancing collection, storage, and processing for large-scale datasets.

●Managed data pipelines from Azure Data Lake Storage to Spark DataFrames, performing essential transformations and actions on high-volume datasets.

●Constructed Python-based frameworks to calculate Risk Metrics by implementing a scalable risk computation module, reducing financial risk assessment time by 30%.

●Leveraged Microsoft Azure services (Azure VM, Blob Storage, SQL Database, Synapse Analytics, Data Factory, IAM, and Power BI) to manage and process data, improving cloud data operations.

●Improved data processing in Hive by implementing partitioning and bucketing, generating monthly data cubes for advanced business insights in Power BI.

●Maintained Python-based ETL pipelines to enhance the efficiency and accuracy of financial data reconciliation and reporting systems.

●Implemented secure CI/CD processes to safeguard sensitive financial data during integration and deployment, adhering to compliance frameworks like SOC 2 and GDPR.

●Developed Azure Functions to aggregate event-driven data, store processed information in Azure Cosmos DB, and handle large data sets.

●Engineered scalable, containerized solutions using Docker and Kubernetes for financial analytics applications, enabling seamless processing of high-volume transactional data.

●Led the design and deployment of ETL pipelines on Azure Databricks, streamlining data ingestion from sources to ensure smooth data processing.

●Applied Data Modeling techniques for snowflake schemas and optimized SQL queries to enhance data accuracy, quality, and integration processes.

●Established Azure Delta Merge operations to handle slowly changing dimensions (SCDs), ensuring data consistency and minimizing processing overhead.

●Enhanced data lake management using Delta Lake for efficient updates, upserts, and deletes, reducing redundancy and improving query performance.
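
As an illustration of the Delta Merge and slowly-changing-dimension work described above, the core SCD Type-2 upsert pattern can be sketched in plain Python. This is a minimal in-memory stand-in, not the production pipeline: the field names (customer_id, tier, valid_from, etc.) are hypothetical, and a real implementation would run as a Delta Lake MERGE on Databricks rather than over Python dicts.

```python
from datetime import date

def scd2_merge(target, updates, key="customer_id", today=None):
    """Illustrative SCD Type-2 merge: expire changed rows and append new
    current versions, mirroring what a Delta Lake MERGE does.
    Mutates rows in `target` that need to be expired."""
    today = today or date.today().isoformat()
    current = {row[key]: row for row in target if row["is_current"]}
    result = list(target)
    for upd in updates:
        old = current.get(upd[key])
        if old is None:
            # brand-new key: insert as the current version
            result.append({**upd, "valid_from": today,
                           "valid_to": None, "is_current": True})
        elif any(old[col] != upd[col] for col in upd if col != key):
            # attribute changed: expire the old row, append a new version
            old["valid_to"] = today
            old["is_current"] = False
            result.append({**upd, "valid_from": today,
                           "valid_to": None, "is_current": True})
        # rows with no changes are left untouched
    return result

# Hypothetical sample dimension and incoming changes
dim = [{"customer_id": 1, "tier": "gold",
        "valid_from": "2024-01-01", "valid_to": None, "is_current": True}]
changes = [{"customer_id": 1, "tier": "platinum"},
           {"customer_id": 2, "tier": "silver"}]
merged = scd2_merge(dim, changes, today="2024-06-01")
```

The same decision logic (no match → insert; match with changed attributes → expire and re-insert) maps one-to-one onto the WHEN MATCHED / WHEN NOT MATCHED branches of a Delta MERGE statement.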

MCKINSEY & COMPANY, REMOTE (US)

SR. QA DATA ENGINEER MAY 2024 - AUG 2024

●Integrated AWS Step Functions to orchestrate large-scale data processing workflows in AWS EMR using Apache Spark, optimizing analytics and computational workflows for diverse datasets.

●Optimized scalable Python-based data pipelines and SQL workflows using PyCharm to streamline data integration, transformation, and real-time analytics, enabling scenario planning and strategic decision-making aligned with client-specific business models.

●Configured Amazon RDS to store and manage relational datasets for ETL pipelines, ensuring efficient data retrieval and integration with analytical tools.

●Built interactive Power BI dashboards to visualize key performance indicators, enabling consultants to deliver actionable insights to industry clients.

●Devised Snowflake-based data pipelines for processing semi-structured and structured data, enabling centralized governance and scalable analytics workflows.

●Established Snowflake as a data warehouse solution to streamline legacy MapReduce jobs and facilitate faster query execution with Spark RDD transformations.

●Leveraged AWS Step Functions to automate cloud resource management workflows, integrating with PowerShell scripts to improve infrastructure efficiency for high-volume data operations.

●Enhanced ETL pipelines using AWS Glue and Snowflake integration, automating transformation processes and data ingestion for scalable and efficient analytics.

UBER, KY

SOFTWARE DEVELOPER/ DATA ENGINEER APR 2023 - AUG 2023

●Devised a real-time ingestion platform using Apache Spark, Kafka, and Databricks on Azure, processing approximately 500GB of data weekly and enhancing pipeline reliability.

●Wrote SQL queries to structure and extract data for predictive analytics, enabling precise demand forecasting and resource allocation across Uber's supply chain network.

●Enforced data governance policies and access controls utilizing GCP Identity and Access Management (IAM) to safeguard sensitive information and adhere to regulatory standards.

●Developed robust data storage and processing solutions using GCP services such as BigQuery, Cloud Storage, and Cloud Dataflow to support analytics and business intelligence initiatives.

●Automated Kafka topic creation with Python web services, saving 10+ hours per month in manual effort and ensuring consistent topic configurations for streaming workflows.

●Formulated scalable ETL workflows using Apache Spark with Scala and Python, processing 50K+ records per job, reducing runtime by 15 minutes per execution.

●Built an end-to-end metrics dashboard in Power BI, enabling monitoring of ingestion and processing workflows, reducing issue identification time to under 10 minutes.
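
The Kafka topic automation mentioned above can be sketched as a small helper that derives a standardized topic name and configuration from a spec. The naming scheme, partition count, and retention defaults here are assumptions for illustration; a real service would pass these specs to Kafka's AdminClient to create the topics against a broker.

```python
def build_topic_config(domain, dataset, env, partitions=6, retention_days=7):
    """Derive a standardized Kafka topic name and config from a spec,
    so every pipeline provisions topics consistently.
    Naming convention (env.domain.dataset) is an assumed example."""
    name = f"{env}.{domain}.{dataset}"
    # reject components that would produce an invalid topic name
    if not all(part.replace("-", "").isalnum() for part in (env, domain, dataset)):
        raise ValueError(f"invalid topic component in {name!r}")
    return {
        "name": name,
        "num_partitions": partitions,
        "replication_factor": 3,
        "config": {
            "retention.ms": str(retention_days * 24 * 60 * 60 * 1000),
            "cleanup.policy": "delete",
        },
    }

# Example spec for a hypothetical trip-events stream
spec = build_topic_config("rides", "trip-events", "prod")
```

Centralizing defaults like this is what makes the automation pay off: topic names and retention settings stop drifting between teams, and validation failures surface before anything touches the cluster.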

ADANI, INDIA

DATA ENGINEER JUL 2018 - JAN 2022

●Led Proof of Concept (PoC) initiatives, conducting feasibility studies and testing new technologies to support data-driven solutions for global clients across industries.

●Constructed Python-based data pipelines to process and analyze energy consumption patterns, enabling optimization of resource utilization and cost efficiency across utility networks.

●Implemented data models and database architectures using SQL and Snowflake, ensuring high-quality data management for complex enterprise systems.

●Architected real-time data streaming pipelines using Apache Kafka, optimizing processing and data ingestion for enterprise-level applications.

●Automated ETL workflows to extract, transform, and load large datasets, leveraging Python and Apache Spark for seamless data processing and faster analytics insights.

●Executed machine learning techniques and predictive analytics to enhance decision-making capabilities in diverse applications, from supply chain optimization to customer behavior analysis.

●Formulated data pipelines using Azure Data Factory, ensuring efficient integration of data sources and enabling real-time data processing capabilities for finance, retail, and healthcare clients.

●Collaborated with cross-functional teams in an Agile environment, using Jira for project management, and contributed to designing scalable data architectures for clients undergoing cloud transformation and digitalization.

●Strengthened data governance and security protocols by implementing best practices for version control using GitHub and data handling in Azure, ensuring compliance for large-scale enterprise applications.

KPMG, INDIA

DATA ANALYST FEB 2017 – JUN 2018

●Designed data pipelines using AWS S3 and AWS Lambda, automating data ingestion and processing to enhance efficiency and data accuracy.

●Designed Python scripts to automate data extraction and transformation processes, enhancing the accuracy and efficiency of financial models for energy sector investments and risk assessments.

●Conducted data mining to identify purchase patterns, uncover insights, and support impact analysis for strategic business experiments.

●Maintained Power BI dashboards, delivering actionable insights on customer complaints and resolutions to improve reporting.

●Created data models and performed advanced analytics to guide business campaign strategies, boosting operational efficiency.

●Wrote complex SQL queries to manipulate and extract data from large datasets, ensuring quality and enabling stakeholder insights.

●Applied exploratory data analysis (EDA) techniques to detect trends, anomalies, and correlations, driving informed decision-making.

●Collaborated with senior analysts to transform raw data into meaningful visualizations for stakeholders, improving clarity and usability.

●Conducted preliminary data analysis for client projects, providing insights that supported key business decisions.

●Formulated state flow logic and control for vehicle ECU replacements using dSPACE ControlDesk, MATLAB, and CAN architecture, enhancing active vehicle system control.
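
One basic check behind the exploratory data analysis work above is z-score outlier flagging, sketched here in plain Python; the threshold and sample values are illustrative, not taken from any client dataset.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.0):
    """Flag points more than `threshold` standard deviations from the
    mean -- a basic EDA check for anomalies in a numeric series."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # constant series has no outliers
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]

# Illustrative series with one spike at index 6
flagged = zscore_outliers([10, 11, 9, 10, 12, 10, 95])  # → [(6, 95)]
```

In practice this kind of check runs as a first pass to decide which segments of a dataset deserve a closer look before building models or dashboards on top of them.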

EDUCATION

University of the Cumberlands, US

Master of Science in Information Technology, 2023 - 2024

JNTUK University, India

Bachelor's in Electrical and Electronics Engineering, 2010 - 2014

CERTIFICATIONS

●Microsoft Certified: Azure Data Engineer Associate

●Google Cloud Certified Professional Data Engineer


