Data Engineer

Location:
Beaverton, OR
Posted:
February 18, 2025


Naresh Devasani

Sr. Data Engineer

***********@*****.***

972-***-****

PROFESSIONAL SUMMARY:

Accomplished Sr. Data Engineer with around 10 years of experience architecting cloud-based data pipelines, big data processing systems, and advanced analytics solutions across the Azure, AWS, and GCP ecosystems.

Proficient in designing and implementing scalable ETL pipelines for structured, semi-structured, and unstructured data using Apache Spark, Kafka, and Snowflake to support real-time and batch processing workflows.

Expertise in data warehousing solutions with platforms like Snowflake, BigQuery, and Teradata, ensuring optimized storage, efficient query performance, and large-scale analytics.

Hands-on experience in data processing and analysis using Python, PySpark, TensorFlow, Scikit-Learn, Pandas, and NumPy, driving actionable insights and enhanced decision-making.

Skilled in leveraging Azure Data Factory (ADF), Azure ML, and Databricks to design dynamic data pipelines for large-scale transformation and orchestration.

Strong experience in building and managing real-time data streaming pipelines using Apache Kafka, AWS Kinesis, and GCP Pub/Sub, delivering low-latency, scalable solutions.

Adept at deploying DevOps practices, utilizing tools like Docker, Kubernetes, Terraform, and Jenkins to automate infrastructure provisioning and CI/CD pipelines.

Proven expertise in migrating on-premises data to cloud platforms using Informatica, Apache Sqoop, and Talend, ensuring seamless transitions and operational excellence.

Experienced in database management with strong proficiency in SQL and NoSQL databases such as MySQL, MongoDB, and CosmosDB, ensuring efficient data storage and retrieval.

Extensive knowledge of implementing robust security practices, including role-based access controls using Apache Ranger, Azure Active Directory (AAD), and OAuth protocols to safeguard data.

Demonstrated success in creating and integrating RESTful APIs for seamless interoperability across diverse systems and platforms.

Proficient in monitoring and logging frameworks such as Prometheus, ELK Stack, and Grafana, ensuring system reliability and performance optimization.

Hands-on experience in deploying machine learning workflows, including feature engineering and model deployment using TensorFlow, Kubeflow, and Azure Synapse Analytics for predictive analytics.

Certified Snowflake Data Engineer, demonstrating expertise in Snowflake architecture and data engineering best practices.

Strong collaboration skills in Agile environments, working closely with stakeholders, project managers, and cross-functional teams to deliver data-driven solutions aligned with business objectives.

TECHNICAL SKILLS:

Azure: Azure Data Lake Storage, Azure Data Factory (ADF), Azure Active Directory (AAD), CosmosDB, Azure DevOps, Azure SQL DB/DW, Logic Apps, Key Vault.

AWS: AWS Lambda, AWS S3, AWS Step Functions, AWS Kinesis, AWS Aurora, AWS Redshift, AWS Athena, AWS EC2, AWS Data Pipeline, AWS CloudFormation.

GCP: Google Cloud Dataflow, GCP Pub/Sub, BigQuery, Apache Beam, Dataproc

Big Data Technologies: Apache Hadoop, Apache Spark, Apache Kafka, Kafka Streams, Apache Airflow, Flink, CDH, Apache Hive, Apache Sqoop, Pig, Informatica

Databases: MySQL, Oracle, MongoDB, SQL Server, PostgreSQL

Data Warehousing: Teradata, Snowflake, BigQuery

DevOps and CI/CD: Docker, Kubernetes, Jenkins, Ansible, Terraform, Git, GitHub

Other Technologies: REST APIs, ServiceNow, Jira, Agile, Scrum, Confluence

Data Processing and Analysis: Python, PySpark, Spark SQL, TensorFlow, Scikit-Learn, Pandas, NumPy, Matplotlib

Monitoring and Logging: ELK Stack, Prometheus, Grafana

Work Experience:

Fidelity Investments, Boston, MA April 2023 – Present

Sr. Data Engineer

Responsibilities:

Designed and developed data ingestion pipelines from on-premises systems to Azure Data Lake Storage (ADLS) using Azure Data Factory.

Built scalable and reusable ETL pipelines to connect data from sources like AWS S3 and Azure Blob Storage to Snowflake, leveraging Snowflake connectors.
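
A minimal sketch of this kind of S3-to-Snowflake load, assuming the Snowflake Spark connector is available on the cluster; the account URL, credentials, paths, and table names are placeholders:

# Illustrative only: read Parquet files from S3 and append them to a Snowflake table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3_to_snowflake_load").getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",  # placeholder account
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "STAGING",
    "sfWarehouse": "LOAD_WH",
}

orders = spark.read.parquet("s3a://example-bucket/orders/")  # hypothetical source path

(orders.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS_STG")
    .mode("append")
    .save())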

Orchestrated data integration workflows in ADF, utilizing components such as Integration Runtime, Linked Services, Datasets, and Pipelines.

Implemented dynamic pipelines in ADF to extract and load multiple files into multiple targets using a single pipeline framework.

Designed and optimized complex SQL queries and stored procedures in Teradata to enhance data retrieval performance, reducing query execution time by 40%.

Developed ETL pipelines and workflows leveraging Azure Synapse Analytics for massive parallel processing and efficient query execution.

Optimized data manipulation processes by applying Spark DataFrame API within Spark sessions, ensuring high-performance transformations.
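
As a generic illustration of the DataFrame-level transformations referred to here (not code from the engagement; the column names and storage paths are invented):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dataframe_transformations").getOrCreate()

# Read raw transactions, keep settled rows, and aggregate to daily account totals.
txns = spark.read.parquet("abfss://raw@exampleaccount.dfs.core.windows.net/transactions/")

daily_totals = (
    txns.filter(F.col("status") == "SETTLED")
        .withColumn("txn_date", F.to_date("txn_timestamp"))
        .groupBy("txn_date", "account_id")
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("txn_count"))
)

daily_totals.write.mode("overwrite").parquet("abfss://curated@exampleaccount.dfs.core.windows.net/daily_totals/")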

Designed and executed ETL pipelines for integrating data from multiple sources, including on-premises systems, with a focus on delta extraction methods.

Created and managed databases in Snowflake, facilitating efficient data analysis and storage solutions.

Implemented star and snowflake schema models for optimized query performance in data warehouses, improving reporting efficiency and scalability.

Worked extensively within the Spark ecosystem, optimizing Spark jobs for data processing and analytics workloads.

Managed Oracle databases, including writing and optimizing SQL queries, tuning performance, and integrating ETL processes for data warehousing solutions.

Developed and optimized ETL pipelines integrating Cosmos DB with Azure Data Factory, Synapse Analytics, and Functions, ensuring efficient data ingestion and transformation.

Developed and maintained Unix shell scripts to automate daily ETL workflows, reducing manual intervention and improving data processing efficiency.

Created shell scripts for log monitoring, alerting, and job scheduling using Cron, ensuring system reliability and proactive issue resolution.

Coordinated across parallel data engineering workstreams, conducted reviews of team deliverables, and ensured adherence to best practices.

Reviewed and optimized data pipelines and integration workflows to ensure data accuracy, consistency, and performance efficiency.

Texas Capital, Dallas, TX October 2021 – March 2023

Data Engineer

Responsibilities:

Developed robust ETL pipelines using Azure Data Factory (ADF), streamlining data integration and enabling real-time data processing, resulting in a 25% improvement in data synchronization speed.

Designed and developed scalable big data applications using Apache Spark and Databricks, optimizing data processing for large datasets and reducing data transformation time by 35%.

Applied Python, NumPy, and Pandas for data manipulation and advanced data analysis, enhancing decision-making with accurate insights and reducing analysis time by 15%.

Optimized Azure Data Lake Storage for large-scale data analytics, reducing data retrieval time and improving overall data access efficiency.

Ensured data access security by implementing role-based access controls using Apache Ranger, safeguarding sensitive data across cloud environments and reducing unauthorized access incidents by 20%.

Built and maintained data warehousing solutions on Snowflake, improving query performance by 30% and reducing query latency for large-scale data workloads.

Provisioned and managed cloud infrastructure with Terraform, automating the deployment of resources in Azure to improve development and operational efficiency.

Implemented Teradata BTEQ scripts for automating data extraction, transformation, and loading (ETL) processes, ensuring seamless data integration across enterprise systems.

Integrated Apache Kafka, Kafka Streams, and Kafka Connect for real-time data streaming and low-latency processing, supporting real-time analytics and event-driven architectures.
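
A hedged sketch of this kind of low-latency Kafka ingestion path using Spark Structured Streaming; the broker addresses, topic, and Delta paths are placeholders, not the production configuration:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka_stream_ingest").getOrCreate()

# Subscribe to a Kafka topic and land the decoded events in a Delta table.
events = (
    spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
        .option("subscribe", "payment-events")
        .option("startingOffsets", "latest")
        .load()
        .select(F.col("key").cast("string"),
                F.col("value").cast("string").alias("payload"),
                "timestamp")
)

query = (
    events.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/payment-events")
        .outputMode("append")
        .start("/mnt/delta/payment_events")
)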

Collaborated with cross-functional teams to ensure proper data transformation workflows in SQL Server, delivering high-quality business intelligence (BI) insights.

Leveraged CosmosDB for building globally distributed data solutions, enhancing scalability and performance for geographically dispersed applications.

Created and deployed machine learning models using TensorFlow and Kubeflow, enabling intelligent data-driven decisions and automating processes, improving operational efficiency by 25%.
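
For context, a minimal Keras model of the kind such a deployment pipeline would package and serve; the feature width, layer sizes, and training call are illustrative assumptions, not the production model:

import tensorflow as tf

# Simple binary classifier over 20 engineered features (placeholder width).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit(train_features, train_labels, epochs=10, validation_split=0.2)
# model.save("model/")  # the saved artifact would then be served, e.g. through a Kubeflow pipeline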

Maintained high data quality through data governance practices, ensuring compliance and traceability.

Led data migration projects for structured and unstructured data, improving data accessibility and streamlining transitions between platforms.

Delivered AI-driven projects by utilizing advanced feature engineering and model deployment pipelines, accelerating business intelligence capabilities and automation.

Orchestrated complex workflows with Luigi for effective task management and data pipeline automation, reducing manual intervention and enhancing data workflow reliability.

Monitored system performance using Prometheus, ensuring data pipeline reliability, tracking potential issues, and improving system uptime.

Environment: Apache Ranger, Snowflake, Terraform, Azure Data Factory (ADF), Apache Spark, Databricks, Python, NumPy, Pandas, Azure Data Lake Storage, Apache Kafka, Kafka Streams, Kafka Connect, SQL Server, CosmosDB, TensorFlow, Kubeflow, Azure Active Directory (AAD), Luigi, ServiceNow, Prometheus.

Oss Technologies, India. October 2018 – July 2021

Data Engineer

Responsibilities:

Led the development of real-time data streaming pipelines using GCP Pub/Sub and Google Cloud Dataflow, enabling timely data processing and reducing latency.
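
A generic Apache Beam sketch of this streaming pattern, reading from a Pub/Sub subscription and appending to an existing BigQuery table; the project, subscription, and table names are placeholders:

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/example-project/subscriptions/events-sub")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "example-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)  # table assumed to exist
    )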

Applied Python, Pandas, and NumPy to manipulate and analyze large datasets, generating insights that informed key business decisions.

Implemented data transformation and data extraction processes using SQL and Python, ensuring accurate and timely reporting.

Designed and implemented ETL workflows using Informatica, ensuring seamless data integration across on-premises and cloud environments and improving data accessibility.

Developed and optimized data warehousing solutions using BigQuery and HiveQL to support large-scale big data analytics and improve data query performance.

Managed and optimized Apache Hive for efficient data storage and retrieval, enhancing query performance.

Designed and built data pipelines for batch processing and real-time processing, leveraging Apache Flink to improve data throughput and reduce processing times.

Applied Maven for build automation and dependency management, ensuring smooth deployment and consistent integration across development environments.

Implemented machine learning algorithms using Scikit-Learn and TensorFlow to create predictive analytics models, improving forecast accuracy for business metrics.
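
A generic scikit-learn sketch of this kind of predictive workflow; the input file, feature set, and model choice are illustrative assumptions:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical dataset: monthly business metrics with a numeric target column.
df = pd.read_csv("monthly_metrics.csv")
X = df.drop(columns=["target_revenue"])
y = df["target_revenue"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))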

Monitored system performance and tracked key metrics using Grafana, identifying areas for improvement and optimizing pipelines for better resource usage.

Ensured data quality, accuracy, and compliance throughout the data lifecycle, reducing errors and ensuring adherence to regulatory standards.

Ensured data security and compliance by implementing robust OAuth authentication and authorization protocols across data systems.

Migrated large datasets from on-premises environments to cloud-based storage systems using Apache Sqoop, improving system scalability and reducing maintenance costs.

Developed and optimized data storage solutions in relational and NoSQL databases (including MySQL and MongoDB) to support scalable and flexible data models.

Utilized Cloudera CDH to build and maintain distributed computing environments, optimizing data processing tasks across high-throughput and low-latency pipelines.

Led Agile projects using Confluence for documentation and task management, fostering effective communication and on-time project delivery within the team.

Environment: GCP Pub/Sub, Google Cloud Dataflow, Python, Pandas, NumPy, SQL, Informatica, BigQuery, HiveQL, Apache Hive, Apache Flink, Maven, Scikit-Learn, TensorFlow, Grafana, REST APIs, OAuth, Apache Sqoop, MySQL, MongoDB, Cloudera CDH, Confluence.

Brill Mindz Technologies, India. January 2015 – September 2018

Data Engineer

Responsibilities:

Managed and scaled distributed environments using AWS EC2 instances to facilitate high-throughput data processing pipelines.

Leveraged Hadoop and HDFS for efficient big data storage and data retrieval, ensuring reliable and cost-effective data management.

Built reusable and modular ETL pipelines connecting AWS S3, Redshift, and DynamoDB, enhancing data integration and simplifying pipeline maintenance.

Engineered and maintained AWS Lambda serverless workflows to automate data processing, enhancing operational efficiency for event-driven data pipelines.
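
An illustrative handler for an S3-triggered Lambda step of this kind; the bucket layout and the downstream processing call are hypothetical:

import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Each record in an S3 event notification identifies one new object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        payload = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # process_payload(payload)  # hypothetical downstream transformation
        print(f"processed s3://{bucket}/{key} ({len(payload)} bytes)")
    return {"statusCode": 200, "body": json.dumps("ok")}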

Designed and optimized database schemas, queries, and stored procedures to support both transactional and analytical workloads in large-scale systems.

Developed and executed MapReduce algorithms for data aggregation and data transformation tasks, improving processing speeds for massive datasets.
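
A word-count-style mapper/reducer pair, as run under Hadoop Streaming, is a generic illustration of this aggregation pattern (not the original jobs):

import sys
from itertools import groupby

def mapper():
    # Emit one (token, 1) pair per word; Hadoop sorts the pairs between map and reduce.
    for line in sys.stdin:
        for token in line.strip().split():
            print(f"{token}\t1")

def reducer():
    # Sum counts for each key in the sorted map output.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{key}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()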

Orchestrated complex data workflows in Apache Airflow to automate data pipelines, improving data processing efficiency and reducing manual intervention.
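
A minimal Airflow DAG showing the shape of such orchestration; the task callables, DAG id, and schedule are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract step")       # placeholder for a source pull

def transform():
    print("transform step")     # placeholder for a Spark or SQL transformation

def load():
    print("load step")          # placeholder for a warehouse load

with DAG(
    dag_id="daily_etl_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load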

Implemented data pipeline automation with Shell scripting, reducing manual intervention and improving operational efficiency.

Collaborated with business intelligence and analytics teams to provide real-time, actionable insights from streaming data sources using Apache Beam.

Engineered batch data processing workflows to handle large volumes of data, ensuring smooth data transitions from source to target.

Created and managed ETL processes using Talend, ensuring seamless data integration from various sources while maintaining data governance and compliance standards.

Optimized SQL and PL/SQL queries for performance tuning, enhancing the efficiency of transactional workloads and reducing query execution time by 30%.

Implemented robust data security standards to protect sensitive information.

Developed and maintained scalable data pipelines to process large-scale datasets, ensuring maintainability and scalability for future growth.

Ensured the data accuracy and quality of all processed datasets, leading to a 15% improvement in data reliability for key business reports.

Utilized Git for version control in a collaborative development environment, ensuring streamlined and coordinated coding efforts among team members.

Environment: AWS EC2, Hadoop, HDFS, AWS S3, AWS Lambda, MapReduce, Shell scripting, Apache Beam, Talend, SQL, PL/SQL, Git.

EDUCATION:

Bachelor of Technology in Computer Science, Vardhaman College of Engineering, India.


