Azure Data Engineer

Location:
Salisbury, NC
Salary:
60
Posted:
February 05, 2024


Sr Data Engineer

Name: Manikhanta

Email ID: ad3dvu@r.postjobfree.com Contact No: +1-716-***-****

PROFESSIONAL SUMMARY

Experienced Data Engineer with around 10 years of expertise in the IT sector.

Extensive experience with different phases of the project lifecycle, including project initiation, requirement gathering, system design, coding, testing, and debugging client-server-based applications.

Experience in Azure Cloud Computing services, such as Azure Data Factory, Azure Storage accounts, Azure Synapse, Key Vaults, ADLS Gen2, Azure App Insights, Azure Databricks, ARM templates, Azure DevOps, Integration Runtimes, Azure Data Flows.

Proficient in applying SDLC software development processes and establishing a business analysis methodology.

Led data mapping, exploratory data analysis, and logical data modeling between source and destination systems.

Designed Azure architecture, cloud migration, DynamoDB, and event processing using Azure Data Factory.

Experience in managing and securing custom AMIs and Azure account access using IAM.

Led cross-functional teams through the entire SDLC, from project initiation to deployment.

Proficient in object-oriented analysis concepts, implementation, design, and programming.

Excellent understanding of Quality Assurance processes and SDLC.

Proficient in designing data pipelines for streaming web data and RDBMS source data.

Proven ability to build ETL architecture and source-to-target mappings for loading data into data warehouses.

Experience in manipulating data for data loads, extracts, statistical analysis, and modeling.

Good knowledge of data modeling techniques like Data Vault 2.0, star schema, and Kimball modeling.

Good exposure to activities in Azure Data Factory, such as Copy, ForEach, Lookup, Switch, If Condition, Execute Pipeline, Web, and Set Variable.

Orchestration of ETL pipelines in Azure Data Factory and Azure Synapse for loading data from various sources to Snowflake DW and Synapse Analytics.

Proficient in datasets, Integration runtimes, and Linked services in Azure Data Factory for data connection and access.

Adapted Agile practices to suit the specific needs and constraints of projects.

Designed and implemented end-to-end data pipelines for extracting, cleansing, processing, and analyzing behavioural and log data.

Ability to architect solutions, create Data Platform roadmaps, and enable scalability in Azure Data Lake, Azure Databricks, Azure Data Factory.

Strong knowledge of designing and building data models for Snowflake cloud data warehouse.

Experience in developing production-ready Spark applications using Spark RDD APIs, DataFrames, Spark-SQL, and Spark-Streaming API.
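
For illustration, a minimal PySpark sketch of the DataFrame and Spark SQL usage described above; the application name, paths, and column names are hypothetical placeholders, not project code.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-aggregation").getOrCreate()

    # Read a structured source into a DataFrame (path and columns are placeholders)
    orders = spark.read.parquet("s3://example-bucket/orders/")

    # DataFrame API: filter and aggregate
    daily = (orders
             .filter(F.col("status") == "COMPLETED")
             .groupBy("order_date")
             .agg(F.sum("amount").alias("total_amount")))

    # Spark SQL over the same data via a temporary view
    orders.createOrReplaceTempView("orders")
    top_customers = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_spend
        FROM orders
        GROUP BY customer_id
        ORDER BY total_spend DESC
        LIMIT 10
    """)

    daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")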

Worked on SQL implementations, complex queries, Views, CTEs, Stored Procedures, and window functions.

Experienced in using waterfall, Agile, and Scrum models of the software development process framework.

Worked with multiple join statements to retrieve data from multiple tables.

Proficient in creating reports, dashboards, and visualizations using Power BI, implementing role-based security and scheduled refresh, and performing data transformations using Power Query Editor.

Implemented data transformations using split columns, conditional columns, append and merge queries in Power Query Editor.

Active team player with excellent interpersonal skills, a keen learner, and innovative.

Superior communication skills, strong decision-making, organizational skills, and outstanding analytical and problem-solving skills.

Skills:

Hadoop Core Services: HDFS, MapReduce, Spark, YARN

Hadoop Distribution: Cloudera, Hortonworks, Apache Hadoop

Databases: HBase, Spark-Redis, Cassandra, Oracle, MySQL, Postgres, Teradata

Data Services: Hive, Pig, Impala, Sqoop, Flume, Kafka

Scheduling Tools: Zookeeper, Oozie

Monitoring Tools: Cloudera Manager

Cloud Computing Tools: Azure, AWS

Programming Languages: C, Java, Scala, Python, R, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB

Operating Systems: UNIX, Windows, Linux

Build Tools: Jenkins, Maven, ANT

Development Tools: Eclipse, NetBeans, Microsoft SQL Studio, Toad

Client: Sam’s Club, Bentonville, AR December 2020 to Present

Sr. Data Engineer

Design, deploy, and manage scalable and secure AWS cloud infrastructure, including provisioning and configuring EC2 instances, setting up networking components, and managing services such as S3, RDS, Lambda, and DynamoDB.

Most of the infrastructure is on AWS, including:

AWS EMR Distribution for Hadoop

AWS S3 for raw file storage

AWS EC2 for Kafka

Used AWS Lambda to perform data validation, filtering, sorting, or other transformations for every data change in a DynamoDB table and load the transformed data to another data store.
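
A minimal sketch of the pattern above, assuming a DynamoDB Streams trigger on the source table and a hypothetical target table; table names and attributes are illustrative only.

    import boto3

    dynamodb = boto3.resource("dynamodb")
    target = dynamodb.Table("transformed-records")  # hypothetical target table

    def handler(event, context):
        # Each record corresponds to one change captured from the source table's stream
        for record in event.get("Records", []):
            if record.get("eventName") not in ("INSERT", "MODIFY"):
                continue
            new_image = record["dynamodb"].get("NewImage", {})
            # Simple validation/filtering before loading to the target store
            if "order_id" not in new_image:
                continue
            target.put_item(Item={
                "order_id": new_image["order_id"]["S"],
                "amount": new_image.get("amount", {}).get("N", "0"),
            })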

Programmed ETL functions between Oracle and Amazon Redshift.

Maintained end-to-end ownership of data analysis, framework development, implementation, and communication for a range of customer analytics projects.

Good exposure to the IRI end-to-end analytics service engine and new big data platform (Hadoop loader framework, big data Spark framework, etc.).

Orchestrated complex infrastructure deployments using Terraform, streamlining the process and minimizing manual intervention.

Experience in building and optimizing ETL processes using Scala, including data cleaning, extraction, transformation, and loading.

Created reusable Terraform modules to standardize infrastructure components across projects.

Applied best practices for optimizing performance in DynamoDB, including proper indexing and partitioning.

Leveraged features such as DynamoDB Accelerator (DAX) for caching and improved query speed.

Explored clickstream event data with Spark SQL.

Used Spark SQL, as part of the Apache Spark big data framework, for processing structured Shipment, POS, Consumer, Household, Individual digital impression, and Household TV impression data.

Created DataFrames from different data sources like existing RDDs, structured data files, JSON datasets, Hive tables, and external databases.
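
For illustration, a sketch covering each source type listed above; paths, table names, and connection details are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Structured data files and JSON datasets
    events = spark.read.json("s3://example-bucket/raw/events/")
    sales = spark.read.option("header", "true").csv("s3://example-bucket/raw/sales.csv")

    # Hive table
    households = spark.table("analytics.household_dim")

    # External database over JDBC (connection details are placeholders)
    customers = (spark.read.format("jdbc")
                 .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
                 .option("dbtable", "SALES.CUSTOMERS")
                 .option("user", "etl_user")
                 .option("password", "****")
                 .load())

    # Existing RDD converted to a DataFrame
    rdd = spark.sparkContext.parallelize([("A123", 10), ("B456", 20)])
    impressions = rdd.toDF(["household_id", "impressions"])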

Experience in migrating data to Snowflake from various sources like on-premises databases, other cloud databases, or different data warehouse technologies.
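
A minimal sketch of one common migration step, staging extracted files and bulk loading them with the Snowflake Python connector; the account, stage, and table names are hypothetical, and credentials would normally come from a secrets store.

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",   # placeholder connection details
        user="etl_user",
        password="****",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    cur = conn.cursor()
    # Stage files exported from the source system, then bulk load with COPY INTO
    cur.execute("PUT file:///tmp/extract/orders_*.csv @orders_stage")
    cur.execute("""
        COPY INTO STAGING.ORDERS
        FROM @orders_stage
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
    """)
    cur.close()
    conn.close()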

Demonstrated expertise in optimizing SQL queries for both Oracle and MySQL databases.

Successfully rewrote and refactored complex SQL queries for better performance.

Proficient in troubleshooting and debugging performance issues in SQL queries.

Skilled in data modeling techniques optimized for Snowflake, including star schema, snowflake schema, and dimensional modeling.

Proficient in using Scala for developing efficient, scalable data processing and data pipeline applications.

Experience in leveraging Scala’s functional programming capabilities for complex data manipulation and processing tasks.

Demonstrated expertise in Agile methodologies such as Scrum, Kanban, or a hybrid approach.

Facilitated Agile ceremonies, including sprint planning, daily stand-ups, sprint reviews, and retrospectives.

Environment: MapReduce, HDFS, Hive, Python, Scala, Kafka, Spark, Spark SQL, Oracle, Informatica 9.6, SQL, Sqoop, Zookeeper, AWS EMR, AWS S3, Data Pipeline, Jenkins, GIT, JIRA, Unix/Linux, Agile Methodology, Scrum.

Client: Molina Healthcare, Bothell, WA October 2018 to November 2020

Data Engineer

Participated in daily Agile stand-up meetings, updating the internal Dev team on project statuses, and collaborated through Palantir Foundry for real-time data management and decision-making processes.

Proficient in implementing and managing big data solutions using Azure Databricks.

Experience in setting up Azure Databricks workspaces and clusters for data processing and analysis tasks.

Skilled in developing ETL pipelines using Databricks for data extraction, transformation, and loading.
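
For illustration, a minimal Databricks-style ETL sketch (extract raw JSON, cleanse, load to a Delta table); the storage paths, column names, and table name are assumptions, not project artifacts.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Extract: raw files landed in ADLS Gen2 (path and columns are placeholders)
    raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/claims/")

    # Transform: basic cleansing and typing
    clean = (raw.dropDuplicates(["claim_id"])
                .withColumn("claim_amount", F.col("claim_amount").cast("double"))
                .filter(F.col("claim_amount").isNotNull()))

    # Load: write to a Delta table partitioned by service date
    (clean.write.format("delta")
          .mode("overwrite")
          .partitionBy("service_date")
          .saveAsTable("curated.claims"))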

Expertise in optimizing Snowflake performance, including query tuning, warehouse sizing, and understanding of Snowflake's unique performance features like clustering keys and automatic clustering.

Experience in handling large-scale data ingestion, transformation, and aggregation using Scala.

Managed code deployment via Azure DevOps, handling feature branch merges, PRs, and CI/CD pipeline executions. Ensured that the deployments are reflected accurately in Palantir Foundry's version-controlled environment.

Generated case management reports using Azure Data Lake Gen2 and developed Power BI data models, facilitating transparent reporting for Stop Loss Insurance stakeholders.

Conducted manual testing, drafted test cases, and presented automated test result analyses during stand-up meetings, using Palantir Foundry's built-in capabilities for tracking data quality and validation results.

Experience in building and optimizing ETL pipelines for loading and transforming data into Snowflake.

Strong skills in monitoring ADF pipelines using Azure Monitor and Log Analytics.

Extensive experience in Apache Spark development using Scala for batch and real-time data processing.

Proficient in configuring ADF-linked services and datasets for optimal data integration.

Designed and implemented robust, scalable data pipelines using Scala, integrating with various data sources and sinks (like Kafka, HDFS, S3, SQL/NoSQL databases).
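
The bullet above describes Scala pipelines; for consistency with the other sketches, here is an equivalent PySpark Structured Streaming illustration of a Kafka source feeding a data lake sink. The broker, topic, and storage paths are placeholders, and the Spark Kafka connector package is assumed to be available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Read a Kafka topic as a stream (broker and topic names are placeholders)
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker-1:9092")
           .option("subscribe", "claims-events")
           .load())

    parsed = raw.select(F.col("key").cast("string"),
                        F.col("value").cast("string").alias("payload"))

    # Sink to a data lake path; checkpointing makes the stream restartable
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "abfss://curated@examplelake.dfs.core.windows.net/claims/")
             .option("checkpointLocation", "abfss://chk@examplelake.dfs.core.windows.net/claims/")
             .start())

    query.awaitTermination()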

Experienced in designing and implementing complex data integration solutions using Azure Data Factory.

Integrated Azure solutions, including Azure Data Factory, Synapse, and Databricks, to design and deploy complex data solutions aligned with the needs of Stop Loss Insurance data processing.

Proficient in building, scheduling, and orchestrating ETL/ELT pipelines in ADF.

Designed and implemented data models for DynamoDB, considering its NoSQL nature.

Utilized partition keys and sort keys effectively for optimal data retrieval.

Proficient in developing distributed systems for data processing using Scala and big data technologies like Hadoop and Spark.

Expertise in integrating Azure Data Factory with other Azure services like Azure Databricks, Azure SQL Data Warehouse/Synapse Analytics, and Azure Data Lake Storage for comprehensive data solutions.

Experience in utilizing ADF’s Data Flow and Mapping Data Flow for complex data transformation tasks.

Strong background in processing large datasets using Databricks and Spark, including batch and real-time data processing.

Managed file ingestion, data transformation, and loading into Snowflake DB using ADF and Azure Databricks, ensuring consistency with datasets within Palantir Foundry.

Demonstrated proficiency with Agile project management tools such as Jira and Trello.

Tools & Environment: Palantir Foundry, Azure, ETL, SSIS, Erwin, SSMS, SSRS, SSDT, Excel, Python, DAX, Power BI, Agile, TFS.

Client: Silicon Valley Bank, CA August 2016 to September 2018

Data Engineer

Developed functional / technical specifications for the project including business scope document and high-level design document.

Led data analytics projects from the requirements stage through scope analysis, model development, deployment, and support.

Supervised the entire gamut of data analysis, including data collection, data transformation, and data loading.

Proficient in implementing machine learning models and pipelines using Azure Databricks and MLflow.

Responsible for analyzing data from client extracts and identifying data quality issues.

Prepared data mapping documents for the data migration process based on an analysis of source databases.

Experience in integrating Azure Databricks with other Azure services like Azure Data Lake Storage, Azure Synapse Analytics, and Azure Cosmos DB.

Analyzed the source data and worked with business users and developers to develop the data model.

Created the dimensional model for the reporting system by identifying required dimensions and facts, utilizing Erwin.

Provided investigation and root cause analysis support for operational issues.

Used data analysis techniques to validate business rules and identify low-quality and missing data in the existing Enterprise Data Warehouse.

Knowledge of implementing Snowflake security features, including role-based access control, data encryption, and secure data sharing.

Developed exception handling mechanisms using Python exception handling.
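
A representative sketch of the Python exception-handling pattern; the function, target object, and error types are illustrative assumptions.

    import logging

    logger = logging.getLogger(__name__)

    def load_batch(records, target):
        """Load a batch of records, logging and re-raising unexpected failures."""
        try:
            target.write(records)
        except ValueError as err:
            # Expected data-quality problem: log and skip the batch
            logger.warning("Skipping malformed batch: %s", err)
        except Exception:
            # Unexpected failure: log with stack trace and surface to the caller
            logger.exception("Batch load failed")
            raise
        finally:
            logger.info("Processed %d records", len(records))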

Developed and optimized queries using DynamoDB Query Language to retrieve data efficiently.

Implemented pagination and conditional expressions for precise data retrieval.
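
A minimal boto3 sketch of key-based querying with a filter condition and pagination via LastEvaluatedKey; the table, key, and attribute names are hypothetical.

    import boto3
    from boto3.dynamodb.conditions import Key, Attr

    table = boto3.resource("dynamodb").Table("claims")  # hypothetical table

    def query_member_claims(member_id, min_amount=0):
        items, start_key = [], None
        while True:
            kwargs = {
                "KeyConditionExpression": Key("member_id").eq(member_id),
                "FilterExpression": Attr("amount").gte(min_amount),
            }
            if start_key:
                kwargs["ExclusiveStartKey"] = start_key
            resp = table.query(**kwargs)
            items.extend(resp["Items"])
            start_key = resp.get("LastEvaluatedKey")
            if not start_key:  # no more pages
                return items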

Created BI objects for the data warehouse to analyze data and serve as the basis for analytics and reporting.

Performed quality testing of converted data, identifying the root cause of issues and designing/documenting proposed solutions.

Developed the logical and physical data model and designed the data flow.

Involved with all phases of data warehouse development including data modeling, data analysis, data designing, testing and documentation.

Handled complex report creation in SSRS.

Prepared various SSRS report templates adhering to reporting standards for report development.

Developed SSIS packages for extracting data from different sources, transforming it using SSIS functionality, and loading it into the target SQL Server databases.

Handled different types of SQL queries while performing transformations in the SSIS ETL tool.

Tools & Environment: SSIS, ETL, SSMS, SSRS, Erwin

Client: Avon Technologies Pvt Ltd, Hyderabad, India April 2015 to May 2016

Big Data Developer

Migrated the existing data from Teradata/SQL Server to Hadoop and performed ETL operations on it.

Responsible for loading structured, unstructured, and semi-structured data into Hadoop by creating static and dynamic partitions.

Worked with different data formats such as JSON and applied machine learning algorithms in Python.

Created a task scheduling application to run in an EC2 environment on multiple servers.

Strong knowledge of various Data warehousing methodologies and Data modeling concepts.

Created Hive partitioned tables in Parquet and Avro formats to improve query performance and space utilization.
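
For illustration, a partitioned Parquet table definition of this kind issued through Spark SQL; the database, table, and column names are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partitioned table stored as Parquet; the partition column keeps scans pruned by date
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales.transactions (
            transaction_id STRING,
            customer_id    STRING,
            amount         DOUBLE
        )
        PARTITIONED BY (txn_date STRING)
        STORED AS PARQUET
    """)

    # Dynamic partition insert from a staging table
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales.transactions PARTITION (txn_date)
        SELECT transaction_id, customer_id, amount, txn_date
        FROM staging.transactions_raw
    """)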

Responsibilities included database design and creation of user databases.

Moved ETL pipelines from SQL Server to the Hadoop environment.

Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.

Implemented a CI/CD pipeline using Jenkins and Airflow for containers on Docker and Kubernetes.
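
A minimal Airflow DAG sketch of the kind of scheduled pipeline orchestrated alongside this setup; the DAG id, schedule, and commands are hypothetical examples (Airflow 2.x import paths assumed).

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator  # Airflow 2.x import path

    with DAG(
        dag_id="nightly_etl",              # hypothetical DAG
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 2 * * *",
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract",
                               bash_command="python /opt/etl/extract.py")
        transform = BashOperator(task_id="transform",
                                 bash_command="spark-submit /opt/etl/transform.py")
        load = BashOperator(task_id="load",
                            bash_command="python /opt/etl/load.py")

        extract >> transform >> load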

Used SSIS, NiFi, Python scripts, and Spark applications for ETL operations to create data flow pipelines, and transformed data from legacy tables to Hive tables, HBase tables, and S3 buckets for handoff to business and data scientists to build analytics over the data.

Supported current and new services that leverage AWS cloud computing architecture, including EC2, S3, and other managed service offerings.

Used advanced SQL methods to code, test, debug, and document complex database queries.

Designed relational database models for small and large applications.

Designed and developed Scala workflows for pulling data from cloud-based systems and applying transformations to it.

Developed reliable, maintainable, and efficient code in SQL, Linux shell, and Python.

Implemented Apache Spark code to read multiple tables from the real-time records and filter the data based on requirements.

Stored final computation results in Cassandra tables and used Spark SQL and Spark Datasets to perform data computations.
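
For illustration, a sketch of writing a computed result to Cassandra from Spark, assuming the spark-cassandra-connector is available on the cluster; the view, keyspace, and table names are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Aggregate records read earlier in the job ("real_time_records" is a placeholder temp view)
    metrics = spark.sql("""
        SELECT device_id, to_date(event_time) AS metric_date, count(*) AS events
        FROM real_time_records
        GROUP BY device_id, to_date(event_time)
    """)

    # Write the computed result to a Cassandra table via the connector
    (metrics.write
        .format("org.apache.spark.sql.cassandra")
        .options(table="daily_metrics", keyspace="analytics")
        .mode("append")
        .save())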

Used Spark for data analysis and stored final computation results in HBase tables.

Troubleshot and resolved complex production issues while providing data analysis and data validation.

Environment: Teradata, SQL Server, Hadoop, ETL operations, Data Warehousing, Data Modelling, Cassandra, AWS Cloud computing architecture, EC2, S3, Advanced SQL methods, NiFi, Python, Linux, Apache Spark, Scala, Spark-SQL, HBase

Client: IBing Software Solutions Private Limited, Hyderabad, India March 2014 to April 2015

Role: SQL & PL/SQL Developer

Responsibilities:

Created database objects like tables, views, materialized views, procedures, and packages using Oracle tools like PL/SQL and SQL*Plus.

Involved in database development by creating Functions, Procedures, Triggers, and Packages using TOAD for accessing, inserting, modifying and deleting data in the database.

Used ref cursors and collections for accessing complex data resulting from joins across a large number of tables.

Created sequences for the automatic generation of student registration numbers.

Created indexes on tables for optimizing stored procedure queries.

Participated in Performance Tuning of SQL queries using Explain Plan to improve the performance of the application.

Worked extensively on exception handling, using system-defined and user-defined exceptions to handle errors.

Developed a number of reports, such as parameterized reports, cross-tab reports, and sub-reports, according to client requirements using Reports 6i.

Documented all the technical work for future references and customizations.

Environment: Oracle 9i/8i, SQL Server 2000, SQL, PL/SQL, SQL*Plus, Developer 6i (Report 6i), TOAD 8.6, UNIX.


