Data Engineering Azure, GCP, AWS, Java, Python, PySpark

Location:
United States
Salary:
$70
Posted:
March 27, 2024

Resume:

Prathyusha

Contact No: - +1-682-***-****

Email ID: - ad4maq@r.postjobfree.com

Sr. DATA ENGINEER

ABOUT MYSELF: -

Sr. Data Engineer with over 10 years of experience in Data Warehousing, Data Engineering, Feature Engineering, Big Data, ETL/ELT, and Business Intelligence.

Experienced as a big data architect and engineer, specializing in AWS and Azure frameworks, Cloudera, the Hadoop ecosystem, Spark/PySpark/Scala, Databricks, Hive, Redshift, Snowflake, relational databases, tools such as Tableau, Airflow, DBT, and Presto/Athena, and data DevOps frameworks/pipelines, with strong programming/scripting skills in Python. Expertise in designing and developing big data analytics platforms for the Retail, Logistics, Healthcare, and Banking industries using Big Data, Spark, real-time streaming, Kafka, Data Science, Machine Learning, NLP, and cloud technologies.

Worked in Data Engineering, Data Pipeline Design, Development, and Implementation as a Sr. Data Engineer and Data Modeler/Data Analyst using the AWS and Azure clouds.

PROFESSIONAL SUMMARY: -

Experienced in AWS, Azure DevOps, Continuous Integration, Continuous Deployment, and Cloud Implementations.

Extensive experience in Text Analytics, generating Data Visualization using Python and creating dashboards using tools like Tableau.

Developed Consumer-based custom features and applications using Python, Django, HTML, and CSS.

Orchestrated end-to-end data lifecycle management processes, encompassing data acquisition, transformation, storage, and reporting, with a focus on CCAR, LFI, CECL, IFRS9, and ALLL guidelines.

Experienced with the Software Development Life Cycle, database designs, agile methodologies, coding, and testing of enterprise applications, and IDEs such as Jupyter Notebook, PyCharm, Emacs, Spyder, and Visual Studio.

Managed the deployment of QlikView/Qlik Sense applications to production and performed ongoing maintenance and updates to ensure alignment with changing business needs.

Worked with ETL tools Including Talend Data Integration, Talend Big Data, Pentaho Data Integration, and Informatica.

Led end-to-end Data Engineering initiatives, overseeing the entire data lifecycle from ingestion to consumption.

Designed and implemented robust data pipelines, ensuring the smooth flow of data through various stages of the lifecycle.

Hands-on experience in using Informatica IICS for cloud-based data integration, application integration, and data quality.

Utilized UNIX scripting to automate file handling, data extraction, transformation, and loading processes.

Implemented error handling and logging mechanisms in UNIX scripts to ensure robust and reliable execution.

Implemented Continuous Integration/Continuous Deployment (CI/CD) pipelines to automate the testing and deployment processes.

Integrated Collibra with other data management tools and platforms, such as ETL tools and data warehouses, to streamline data governance processes and ensure data consistency.

Proficient in managing the entire data science project life cycle and actively involved in all the phases of the project life cycle including Data acquisition, Data cleaning, Feature scaling, Dimension reduction techniques, Feature engineering, Statistical modeling, and Ensemble learning.

Good understanding of Apache Zookeeper and Kafka for monitoring and managing Hadoop jobs and Cloudera CDH4 and CDH5 for monitoring and managing Hadoop clusters.

Experience in migrating data from Teradata to Amazon Redshift and S3, utilizing various data formats such as Parquet and CSV.

Leveraged QuickSight's AutoGraph feature to automatically choose optimal visualization types based on data characteristics, streamlining the creation of insightful visualizations.

Extensive experience in successfully implementing OBIEE solutions and integrating them with various data sources and data warehouses to enable insightful data analysis and reporting.

Experience and familiarity in building modern Spring applications with Spring Boot.

Good working experience on Spark (Spark Core Component, Spark SQL, Spark Streaming, Spark MLlib, Spark GraphX, SparkR) with Scala and Kafka.

Experienced in using Python and PySpark with cloud services like AWS (S3, EC2, EMR) and Google Cloud Platform for data storage and processing.

Hands-on experience on Google Cloud Platform (GCP) in all the big data products: BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer (Airflow as a service).

Used EXPLAIN plans to optimize Teradata SQL queries for better performance.

Used the Teradata tools like Teradata SQL Assistant, Administrator and PMON extensively.

Used DBT to test the data and ensure data quality.

Experience in setting up real-time data replication using HVR for databases like Oracle, SQL Server, PostgreSQL, and MySQL.

In-depth knowledge of SSIS, Power BI, Informatica, T-SQL, and reporting and analytics.

Experience in building and architecting multiple Data pipelines, end-to-end ETL, and ELT processes for Data ingestion and transformation in GCP and coordinating tasks among the team.

Understanding of structured datasets, data pipelines, and ETL tools, with extensive knowledge of tools such as DBT and DataStage.

Designed and maintained data pipelines connecting various Azure services to Azure Synapse for streamlined data workflows.

Maintained comprehensive documentation for SnapLogic workflows, ensuring that the integration processes are well-documented and easily understandable.

Conducted performance tuning on Azure Synapse SQL queries for optimal execution in a cloud-based environment.

Extensive experience in designing and implementing data warehousing solutions on Snowflake, harnessing the power of cloud-based architecture.

Understanding of Spark Architecture including Spark SQL, Data Frames, Spark Streaming.

Strong experience in analyzing Data with Spark while using Scala.

Hands-on experience in using other AWS (Amazon Web Services) offerings like S3, VPC, EC2, Auto Scaling, Redshift, DynamoDB, Route 53, RDS, Glacier, and EMR.

Experience in Analysis, Design, Development, and Big Data in Scala, Spark, Hadoop, Pig, and HDFS environments.

Proficient in using Jenkins for automating parts of the software development and testing process, including build automation, testing, and deployment, thereby enhancing development efficiency.

Developed data ingestion pipelines to efficiently extract, transform, and load (ETL) data from various sources into the data fabric, ensuring high data quality and consistency.

Used DataStage stages such as Sequential File, Transformer, Aggregator, Sort, Data Set, Join, Funnel, Row Generator, Remove Duplicates, Teradata Extender, and Copy extensively.

Built machine learning solutions using PySpark for large sets of data on the Hadoop ecosystem.

Expertise in statistical programming languages like Python and R including Big-Data technologies like Hadoop, HDFS, Spark, and Hive.

Extensive knowledge on designing Reports, Scorecards, and Dashboards using Power BI.

Worked on technical stack like Snowflake, SSIS, SSAS, and SSRS to design warehousing applications.

Integrated DBT with data warehouse technologies such as Snowflake, BigQuery, or Redshift to leverage their capabilities for efficient data processing and storage.

Experience in data mining, including predictive behavior analysis, Optimization and Customer Segmentation analysis using SAS and SQL.

Experience in Applied Statistics, Exploratory Data Analysis, and Visualization using Matplotlib, Tableau, Power BI, and Google Analytics.

TECHNICAL SKILLS: -

Hadoop Distributions: Cloudera, AWS EMR, and Azure Data Factory

Languages: Scala, Python, SQL, HiveQL, KSQL

IDE Tools: Eclipse, IntelliJ, PyCharm

Cloud Platforms: AWS, Azure, GCP

AWS Services: VPC, IAM, S3, Elastic Beanstalk, CloudFront, Redshift, Lambda, Kinesis, DynamoDB, Direct Connect, Storage Gateway, EKS, DMS, SMS, SNS, and SWF

Reporting and ETL Tools: Tableau, Power BI, Talend, AWS Glue

Databases: Oracle, SQL Server, MySQL, MS Access; NoSQL databases (HBase, Cassandra, MongoDB)

Big Data Technologies: Hadoop, HDFS, Hive, Pig, Oozie, Sqoop, Spark, Machine Learning, Pandas, NumPy, Seaborn, Impala, Zookeeper, Flume, Airflow, Informatica, Snowflake, Databricks, Kafka, Cloudera

Machine Learning and Statistics: Regression, Random Forest, Clustering, Time-Series Forecasting, Hypothesis Testing, Exploratory Data Analysis

Containerization: Docker, Kubernetes

CI/CD Tools: Jenkins, Bamboo, GitLab CI, UDeploy, Travis CI, Octopus

Operating Systems: UNIX, Linux, Ubuntu, CentOS

Other Software: Control-M, Eclipse, PyCharm, Jupyter, Apache, RESTful API, Jira, PuTTY, Advanced Excel, TOAD, Oracle SQL Developer, MS Office, FTP, SQL Assistant, Rally, GitHub, JSON

Frameworks: Django, Flask, WebApp2

EDUCATIONAL QUALIFICATIONS: -

Bachelor’s in Computer Science from JNTU.

CERTIFICATIONS: -

AWS Certified Data Analytics – Specialty

PROFESSIONAL EXPERIENCE: -

Client: - Kaiser Permanente, CO April 2022-Present

Role: - Sr. ETL Engineer

Responsibilities: -

Transformed business problems into Big Data solutions and defined Big Data strategy and Roadmap.

Installed, configured, and maintained Data Pipelines.

Led and implemented successful large-scale data migrations from Teradata to AWS platforms, leveraging tools like AWS Glue, AWS DMS (Database Migration Service), and AWS SCT (Schema Conversion Tool).

Developed and maintained automated test frameworks using industry-standard tools such as Selenium, Appium, or Cypress.

Designed and created compelling data visualizations, including charts, graphs, and dashboards, using Amazon QuickSight for effective presentation of key business metrics.

Implemented end-to-end data integration solutions using SnapLogic, connecting diverse data sources and destinations seamlessly.

Automated data workflows using Palantir Foundry to improve efficiency and reduce manual intervention in repetitive tasks.

Utilized distributed computing frameworks to parallelize and optimize NLP tasks, improving efficiency and reducing processing times.

Collaborated closely with data scientists to understand NLP model requirements, providing scalable infrastructure and data engineering support for model development and deployment.

Utilized Informatica DPM for comprehensive data mapping and discovery, identifying sensitive data across various systems, databases, and applications.

Implemented solutions that involved federated querying across diverse data sources, such as Hadoop, cloud storage, and relational databases, using Starburst Presto.

Designed and implemented scalable and resilient architectures using Starburst Presto, supporting the processing of large datasets, and accommodating increased demand for concurrent queries.

Configured API proxies in APIGEE Edge to manage and secure APIs, including setting up policies for authentication, authorization, and traffic management.

Proficient in utilizing Informatica Data Quality (IDQ) to ensure data accuracy, consistency, and integrity throughout the organization's data assets.

Created comprehensive API documentation using APIGEE's documentation tools, ensuring clear and concise information for developers and stakeholders.

Experienced in configuring, managing, and optimizing Apache Hadoop clusters for large-scale data processing, storage and analysis.

Led a team of data engineers in designing, constructing, installing, testing, and maintaining highly scalable data management systems.

Designed and maintained Hive tables to store structured data, enabling efficient querying and analysis.

Proficient in managing, tracking, and troubleshooting data migration processes using AWS Migration Hub.

Employed Pytest parametrize to create data-driven tests, covering multiple test scenarios with a single test function.
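For illustration, a minimal sketch of that parametrized, data-driven test pattern; normalize_code is a hypothetical transformation stand-in, not a function from any client codebase.

    import pytest

    def normalize_code(value: str) -> str:
        # Hypothetical transformation under test: trim whitespace and upper-case a code value.
        return value.strip().upper()

    @pytest.mark.parametrize(
        "raw, expected",
        [
            ("  abc ", "ABC"),
            ("xyz", "XYZ"),
            ("  MiXeD  ", "MIXED"),
        ],
    )
    def test_normalize_code(raw, expected):
        # A single test function covers every data-driven scenario listed above.
        assert normalize_code(raw) == expected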

Optimized data storage and retrieval mechanisms, ensuring efficiency and responsiveness at every stage of the data lifecycle.

Designed and implemented Extract, Transform, Load (ETL) processes to populate the OBIEE data repository, ensuring data accuracy and consistency for reporting and analysis.

Worked on integrating GCP-based applications with databases using Node.js.

Proficient in designing and developing RESTful APIs using the OpenAPI Specification (OAS) to enable efficient and standardized data access and integration.

Strongly advocate for the adoption of Data Mesh principles within the organization to enable decentralized data ownership, data autonomy, and better data collaboration.

Developed and fine-tuned Splunk data inputs, ensuring accurate and efficient ingestion of large volumes of machine data from various sources.

Effectively communicated complex data insights to non-technical stakeholders, enhancing data-driven decision-making processes.

Developed and deployed data integration tasks, data mappings, and workflows within the Informatica IICS environment.

Led the implementation and management of data clusters for efficient storage and processing of large datasets.

Implemented and managed CI/CD pipelines to automate the build, deployment, and testing processes, improving software delivery efficiency by 40%.

Implemented partitioning strategies in Hive for improved data organization and query speed.
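A minimal PySpark sketch of that Hive partitioning approach; the database, table, and column names are hypothetical, and Hive support is assumed to be enabled on the cluster.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()        # assumes a Hive metastore is configured
             .getOrCreate())

    claims = spark.table("raw_db.claims")            # hypothetical source table

    (claims.write
           .mode("overwrite")
           .partitionBy("claim_date")                # partition on a column used in most query filters
           .format("parquet")
           .saveAsTable("analytics_db.claims_by_date"))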

Optimized cluster configurations to enhance performance and meet specific project requirements.

Collaborated with cross-functional teams to ensure seamless integration of data clusters into the overall architecture.

Implemented data masking strategies using Informatica DPM to protect sensitive information during non-production data usage, maintaining data privacy in testing and development environments.

Implemented and managed data solutions using Azure Synapse Analytics, ensuring high-performance analytics and reporting capabilities.

Utilized Azure Synapse Studio for end-to-end development, including data preparation, exploration, and visualization.

Implemented data warehousing best practices within Azure Synapse for efficient querying and analysis.

Collaborated with Azure Synapse administrators to optimize data storage and processing for cost-effectiveness.

Implemented data transformations and cleansing operations within ADF V1 to enhance data quality and accuracy.

Implemented data integration solutions using ADF V2, demonstrating proficiency in handling evolving data engineering requirements.

Designed and implemented data pipelines using Azure Data Factory for efficient data orchestration and movement across various Azure services.

Familiarity in working with popular frameworks like Hibernate, Spring, Spring Boot and MVC.

Developed and deployed the outcome using Spark and Scala code in a Hadoop cluster running on GCP.

Designed the business requirement collection approach based on the project scope and SDLC methodology.

Developed custom data quality (CDQ) scripts and transformations using Informatica's transformation language to address unique data quality challenges and requirements.

Experience in working with Java, J2EE, React, NoSQL databases, and web technologies.

Experience in implementing data security measures in Python and PySpark, ensuring data privacy and compliance.

Managed Splunk data lifecycle, optimizing storage costs and ensuring data availability for long-term analysis.

Utilized Git for version control, maintaining a structured and organized repository of code and configurations.

Designed XML Schema Definitions (XSD) to validate and enforce XML structure and data types, ensuring data integrity during system interactions.

Developed and maintained WSDL files for SOAP web services, defining the structure, messages, and operations for service-oriented architecture (SOA) implementations.

Managed security protocols for computer networks, databases, systems, and servers to ensure protection of the company's digital assets.

Used different components in Pentaho, such as Database Lookup & Join, Generate Rows, Calculator, Row Normalizer & Denormalizer, JavaScript, Add Constant, and Add Sequence.

Created pipelines in Azure Data Factory using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.

Worked with Data governance and Data quality to design various models and processes.

Involved in all the steps and scope of the project reference data approach to MDM, have created a Data Dictionary and Mapping from Sources to the Target in MDM Data Model.

Worked on Python scripts to extract data from Netezza databases and move it to AWS S3.

Implemented query optimization techniques in Snowflake, ensuring fast and efficient querying of large datasets for analytics and reporting.

Ensured data integrity and consistency through the normalization process in Snowflake schema.

Developed, tested, and deployed Business feature set in Node.js with Express and MongoDB backend, incorporating APIs.

Collaborated with data scientists to integrate machine learning models into Azure Databricks workflows for advanced analytics.

Implemented data cataloging and metadata management systems to provide a centralized repository for data assets, enhancing data discoverability and lineage tracking.

Implemented data governance practices within the data mesh framework, ensuring data privacy, security, and compliance with regulatory requirements.

Implemented restful API to transfer data between systems.

Experience in analyzing and writing SQL queries to extract data in JSON format through rest API calls with API keys.
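A minimal sketch of that REST-to-tabular flow; the endpoint, header name, and payload shape are hypothetical placeholders.

    import requests
    import pandas as pd

    API_URL = "https://api.example.com/v1/orders"   # hypothetical endpoint
    API_KEY = "..."                                  # supplied from a secrets store in practice

    resp = requests.get(API_URL, headers={"x-api-key": API_KEY}, timeout=30)
    resp.raise_for_status()

    # Flatten the JSON payload into a DataFrame for downstream SQL-style analysis.
    records = resp.json().get("data", [])
    orders = pd.json_normalize(records)
    print(orders.head())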

Developed data models using Star and Snowflake schema for effective representation of relational databases.

Designed and coded different SQL statements in Teradata for generating reports.

Involved in query translation, optimization, and execution.

Automated the data processing with Oozie to automate data loading into the Hadoop Distributed File System.

Created new dashboards and implemented filters, dashboard parameters, and content linking using Pentaho Dashboards.

Established proactive monitoring and alerting mechanisms to identify potential bottlenecks or performance issues within the data fabric infrastructure.

Collaborated with cross-functional teams to understand data requirements, identify data sources, and create scalable and maintainable DBT pipelines for data extraction, transformation, and loading.

Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions, and Data Cleansing.

Implemented schema designs that facilitated complex analytical queries and reporting requirements.

Conducted in-depth AML risk assessments to identify vulnerabilities and potential areas of concern within the organization's operations, products, and customer relationships.

Conducted performance tuning of SQL queries and PL/SQL code, optimizing database operations for improved responsiveness.

Optimized data pipelines and processing jobs within the data mesh to enhance performance and scalability, resulting in reduced data processing costs and improved overall efficiency.

Performed data analysis, statistical analysis, generated reports, listings, and graphs using SAS tools, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.

Developed Spark applications using Scala and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats.

Created on-demand tables on S3 files using Lambda Functions and AWS Glue using Python and PySpark.
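One common pattern for the Lambda-plus-Glue piece is an S3-triggered function that starts a Glue crawler so new files surface as catalog tables; a minimal sketch with a hypothetical crawler name.

    import boto3

    glue = boto3.client("glue")
    CRAWLER_NAME = "raw-s3-on-demand-crawler"   # hypothetical crawler pointed at the landing prefix

    def lambda_handler(event, context):
        # Invoked by an S3 put event; re-crawl so the new objects become queryable tables.
        glue.start_crawler(Name=CRAWLER_NAME)
        return {"status": "crawler started",
                "records": len(event.get("Records", []))}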

Used Athena to transform and clean the data before it was loaded into data warehouses.

Used Data Build Tool (DBT) to debug complex chains of queries by splitting them into multiple models and macros that can be tested separately.

Automated the update process for Type 2 SCD changes to minimize manual intervention and ensure data accuracy.

Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
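A minimal sketch of that UDF-based cleaning and conforming style; the column, cleaning rule, and S3 path are hypothetical.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-conforming-sketch").getOrCreate()

    @F.udf(returnType=StringType())
    def clean_status(value):
        # Hypothetical cleaning rule: standardize free-text status codes.
        return "UNKNOWN" if value is None else value.strip().upper()

    orders = spark.read.parquet("s3://bucket/raw/orders/")      # hypothetical input path
    conformed = (orders
                 .withColumn("status", clean_status("status"))  # row-level cleanup
                 .groupBy("status")
                 .count())                                       # simple aggregation after conforming
    conformed.show()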

Designed and maintained DBT models, including defining the schemas, managing incremental data loads, and handling data lineage for efficient data processing.

Used ORC, Parquet file formats on HDInsight, Azure Blobs, and Azure tables to store raw data.

Involved in writing T-SQL working on SSIS, SSAS, Data Cleansing, Data Scrubbing, and Data Migration.

Proficient in utilizing PowerShell for various data-related tasks and automation.

Successfully utilized HVR in hybrid cloud environments, ensuring seamless data replication between on-premises and cloud databases.

Hands-on experience in provisioning, configuring, and scaling Amazon EMR clusters to process large datasets efficiently in the cloud.

Integrated data quality rules and data standards into Collibra, enabling data profiling and data quality monitoring, resulting in improved data accuracy and integrity.

Worked on Dimensional and Relational Data Modeling using Star and Snowflake Schemas, OLTP/OLAP system, Conceptual, Logical and Physical data modeling using Erwin.

Performed PoC for Big data solution using Hadoop for data loading and data querying.

Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.

Used Sqoop to move data between RDBMS sources and HDFS.

Involved in Normalization and De-Normalization of existing tables for faster query retrieval.

Managed the deployment of PySpark applications on clusters, utilizing tools like Hadoop YARN for resource allocation and job scheduling.

Designed and maintained SCD tables to capture historical changes in dimension attributes.

Developed and maintained data dictionary to create metadata reports for technical and business purposes.

Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.

Environment: - Python, Spring Boot, PowerShell, SnapLogic, Collibra, Snowflake, Java, HVR, Azure HDInsight, OBIEE, OAS, Graph DB, Pentaho, Hadoop (HDFS, MapReduce), YARN, Apache Hadoop, Spark, Spark Context, AWS, Azure, GCP, Spark-SQL, PySpark, DBT, Pair RDDs, EMR, Spark DataFrames, Spark YARN, Hive, Pig, HBase, Oozie, Hue, Sqoop, RESTful API, Flume, Oracle, NiFi, Kafka, Erwin 9.8, Big Data 3.0, Hadoop 3.0, Oracle 12c, Pig 0.17, Sqoop 1.4, Oozie 4.3.

Client: - FedEx, Minneapolis, MN Sep 2021-March 2022

Role: - Big Data Engineer / Analyst

Responsibilities: -

Expertise in designing and deployment of Hadoop clusters and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, and Impala.

Migrated data from on-prem to cloud databases using Teradata utilities & Informatica for better loading, and created data pipelines using Azure Data Factory & Azure Databricks.

Implemented a microservices architecture using the Spring Boot framework.

Ensured data security and compliance by implementing secure data transfer protocols and adhering to industry best practices in SnapLogic integrations.

Collaborated with cross-functional teams to build and deploy collaborative analytics solutions on Palantir Foundry.

Integrated NLP processes seamlessly into data lakes, enabling the extraction of valuable insights from unstructured textual data stored in distributed environments.

Integrated Informatica DPM into the broader data governance framework to ensure consistent enforcement of data privacy policies across the organization.

Developed and deployed the outcome using Spark and Scala code in Hadoop cluster running on GCP.

Used SnapLogic for child pipelines.

Designed and optimized data transformations using Azure Databricks, enhancing the efficiency and speed of analytical processes.

Successfully worked with Azure Synapse, leveraging knowledge of clusters and Hive tables for data processing.

Integrated Azure Data Lake with other Azure services, enabling seamless data movement and analytics across the entire data ecosystem.

Contributed to the implementation and optimization of data solutions within the Azure Synapse environment.

Collaborated with Azure Synapse administrators to address specific project requirements and optimize performance.

Leveraged Quantexa's Entity Resolution and Network Analytics capabilities to identify and mitigate fraud and financial crime risks by analyzing data from multiple sources and uncovering potential relationships.

Demonstrated the ability to troubleshoot and modify PowerShell scripts as needed.

Implemented strategies for managing Slowly Changing Dimensions (SCD) using Type 1 and Type 2 methodologies.

Conducted performance tuning on Hive queries to enhance overall data processing capabilities.

Created detailed data mapping documentation illustrating the flow of data relevant to CECL requirements, facilitating transparency and compliance.

Participated in the integration of Quantexa with existing systems and workflows, ensuring seamless adoption and effective utilization of the platform's features.

Collaborated with different departments including legal, finance, and risk management to ensure cohesive management of delegated authorities.

Assessed and managed the due diligence process for potential delegated authorities.

Implemented data transformation and mapping techniques to ensure data consistency and compatibility between different systems, adhering to OAS standards.

Proficient in developing dynamic dashboards, reports, and applications for visualizing data using QlikView and Qlik Sense, providing actionable insights to stakeholders.

Utilized IICS features such as mapping designer, data masking, data replication, and event-driven data synchronization.

Designed and implemented data models specifically tailored to CCAR requirements, ensuring the accuracy and completeness of financial data.

Integrated LFI guidelines into data engineering processes, ensuring that liquidity, funding, and interest rate risk considerations are reflected in data workflows.

Automated regulatory reporting processes for IFRS9, incorporating accounting standards into data engineering workflows for efficient and compliant reporting.

Implemented data quality measures and governance protocols to meet ALLL compliance, ensuring accurate assessment and provisioning for potential loan and lease losses.

Used Dell Boomi's drag-and-drop interface and pre-built connectors to integrate disparate systems quickly and without writing extensive custom code.

Developed and scheduled data processing workflows on Amazon EMR using Apache Airflow, orchestrating ETL tasks and optimizing data pipelines.
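A minimal Airflow 2.x DAG sketch of that orchestration style; the task bodies are placeholders where real steps would submit Spark jobs to EMR, and the DAG name is hypothetical.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**_):
        print("pull source data to S3")          # placeholder for the real extract step

    def transform(**_):
        print("run the Spark job on EMR")        # placeholder for the real EMR step

    with DAG(
        dag_id="emr_etl_sketch",                 # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task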

Implemented and managed Apache Zookeeper for distributed coordination and configuration management in highly available systems.

Designed and implemented RESTful APIs to allow for flexible, lightweight, and scalable integrations with web applications and mobile platforms.

Developed and implemented prototypes in PL/SQL, Java, and JavaScript to fulfill change requests.

Experience in integrating Graph Databases with other data storage and processing systems, such as relational databases, NoSQL databases, or big data processing systems like Hadoop and Spark.

Hands-on experience in using HVR for cloud migrations, data lake consolidations, and real-time analytics.

Managed Splunk clustering for high availability and fault tolerance, ensuring uninterrupted service and data integrity.

Implemented advanced procedures like text analytics and processing using in-memory computing capabilities like Apache Spark written in Python.

Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators, both legacy and newer ones.

Implemented Spark using Python and Spark SQL for faster testing and processing of data.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

Worked with Spark to create structured data from the pool of unstructured data received.

Implemented intermediate functionalities like event or record counts from Flume sinks or Kafka topics by writing Spark programs in Java and Python.

Exported data into CSV files, stored them in AWS S3 using AWS EC2, and loaded them into AWS Redshift.
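One way to sketch that CSV-to-S3-to-Redshift flow is with boto3 and the Redshift Data API; every bucket, cluster, table, and role name below is a hypothetical placeholder.

    import boto3

    s3 = boto3.client("s3")
    s3.upload_file("daily_extract.csv", "etl-staging-bucket", "incoming/daily_extract.csv")

    redshift_data = boto3.client("redshift-data")
    # Standard Redshift COPY from S3, issued through the Data API.
    redshift_data.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sql="""
            COPY public.daily_extract
            FROM 's3://etl-staging-bucket/incoming/daily_extract.csv'
            IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-copy-role'
            CSV IGNOREHEADER 1;
        """,
    )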

Experienced in optimizing HVR performance through techniques such as compression and parallel data streams.

Developed interactive reports using Tableau and the Pentaho BA tool, which help clients with monthly statistical analysis and decision making.

Developed Glue jobs to read data in CSV format from the raw layer and write data in Parquet format to the publish layer.
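A plain-PySpark sketch of that raw-to-publish step (a real Glue job would typically use GlueContext and DynamicFrames); the bucket and prefix names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("raw-to-publish-sketch").getOrCreate()

    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("s3://data-lake/raw/shipments/"))       # hypothetical raw-layer prefix

    (raw.write
        .mode("overwrite")
        .parquet("s3://data-lake/publish/shipments/"))  # hypothetical publish-layer prefix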

Optimized the data fabric infrastructure for scalability and performance, utilizing cloud-based solutions or distributed computing frameworks (e.g., Hadoop, Spark) to handle large-scale data processing.

Developed and implemented data engineering solutions using DBT to transform raw data into clean, structured, and analytics-ready formats.

Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS.
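A minimal Structured Streaming sketch of a Kafka-to-Parquet flow like the one above, using the DataFrame API rather than DStreams/RDDs; the broker, topic, and paths are hypothetical, and the spark-sql-kafka package is assumed to be available.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-parquet-sketch").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
              .option("subscribe", "events")                      # hypothetical topic
              .load())

    parsed = events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/events/parquet")       # hypothetical output path
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .outputMode("append")
             .start())
    query.awaitTermination()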

Experienced in transferring streaming data and data from different sources into HDFS and NoSQL databases.

Ensured optimal performance and security by incorporating best practices and standards for both REST and SOAP protocols.

Demonstrated ability to migrate data from other databases (SQL, NoSQL) to graph databases while ensuring data integrity and consistency.

Extensive experience in Data Warehousing projects using Talend, Informatica, Pentaho. Designing and developing complex mappings to extract data from various sources including flat files, RDBMS tables, and legacy systems.

Documented data mesh architecture, design patterns, and best practices, promoting knowledge sharing and facilitating onboarding for new team members.

Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into the target database.

Created Databricks notebooks using SQL,


