Data Engineer Azure

Location: Olathe, KS
Posted: February 01, 2024


Sanjay Kumar

Senior Cloud Data Engineer

ad3ail@r.postjobfree.com • 402-***-**** • LinkedIn

Senior Cloud Data Engineer with 9+ years of industry experience, specializing in big data technologies and cloud platforms such as AWS, Azure, and GCP. Expertise extends to Informatica and ETL, with a focus on building and maintaining secure, scalable data lakes. Skilled at orchestrating real-time data pipelines to derive actionable insights and at implementing robust data security and compliance measures that stand up to stringent company audits.

Professional Summary

• Hands-on experience with the Hadoop ecosystem, encompassing Spark, Kafka, HBase, Scala, Pig, Impala, Sqoop, Oozie, Flume, Storm, and related big data technologies. Worked extensively with Spark SQL, Spark Streaming, and the core Spark API to construct effective data pipelines (a representative Kafka-to-data-lake sketch follows this summary).

• Proficient in designing and developing databases for Business Intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS Packages, SQL Server Analysis Services (SSAS), DAX, OLAP Cubes, Star Schema, and Snowflake Schema.

• Hands-on experience in data import/export using Sqoop, facilitating the transfer of data between HDFS and Relational Database Systems (RDBMS).

• Proficient in migrating on-premises SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse. Skilled in controlling and granting database access using Azure Data Factory.

• Practical experience with Azure Analytics Services, covering Azure Data Lake Store (ADLS), Azure Data Lake Analytics (ADLA), Azure SQL DW, Azure Data Factory (ADF), Azure Databricks (ADB), and more.

• Proficient in utilizing the Snowflake cloud data warehouse and AWS S3 bucket to integrate data from diverse source systems, including loading nested JSON formatted data into Snowflake tables.

• Proficient in Amazon Web Services, covering S3, IAM, EC2, EMR, Kinesis, VPC, DynamoDB, Redshift, Amazon RDS, Lambda, Athena, Glue, DMS, QuickSight, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS, and various other AWS services.

• Experienced in leveraging Informatica PowerCenter versions 9.x, 8.x, and 7.x, as well as Informatica Cloud, throughout the entire Software Development Life Cycle (SDLC), including requirements gathering, analysis, application design, development, testing, implementation, system maintenance, documentation, and support for Data Warehousing applications.

• Proficient in using Airflow to schedule ETL jobs and in leveraging Glue and Athena to extract data from the AWS data warehouse.

• Robust understanding of statistical methods (regression, time series, hypothesis testing, randomized experiment), machine learning, algorithms, data structures, and data infrastructure.

• Proficient in working with various data formats, including JSON, Avro, Parquet, RC, and ORC, and with compression codecs such as Snappy and Bzip2.

• Proficient in technical aspects of designing and data modeling for data warehouse/Business Intelligence applications.

• Robust skills in visualization tools, including Power BI, Tableau, Excel formulas, Pivot Tables, Charts, and DAX commands.
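
For illustration only, the sketch below shows the kind of Kafka-to-data-lake streaming pipeline referenced in the summary, written with PySpark Structured Streaming. The broker, topic, schema, and paths are hypothetical placeholders, not details of any project listed on this resume.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Hypothetical event schema -- placeholder fields, not from any listed project.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Requires the spark-sql-kafka connector package on the Spark classpath.
spark = SparkSession.builder.appName("kafka-to-lake-sketch").getOrCreate()

# Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers the payload as bytes; parse the JSON value into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Land the stream in the data lake as Parquet, with checkpointing for reliable append-only output.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/lake/events")
         .option("checkpointLocation", "/data/checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()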

Professional Experience

Sr Cloud Data Engineer

EMC Insurance, Iowa June 2022 – Present

Responsibilities:

• Analyzed source systems such as Azure SQL Server, ADLS, Blob Storage, and REST APIs to create efficient Linked Services within Azure Data Factory (ADF). This involved understanding data schemas, security requirements, and potential integration challenges.

• Built and deployed ETL pipelines in ADF to seamlessly extract data from on-premises systems and load it into Azure Data Lake Storage (ADLS). Leveraged tools like Sqoop or custom scripts for on-premises data access and optimized copy activities for hierarchical data handling.

• Demonstrated proficiency in various ADF activities including Lookups for reference data management, Stored Procedures for custom logic, conditional statements for branching workflows, loops for iterative processing, variables for dynamic configurations, filters for data selection, and Wait activities for sequencing tasks.

• Implemented Logic Apps triggered by ADF pipelines to send automated email notifications to users and stakeholders using web service integrations. This ensured timely communication of pipeline statuses and potential issues.

• Extensively employed Azure Key Vaults to securely store and manage connection strings and other sensitive data used in Linked Services, ensuring best practices for access control and security.

• Configured and scheduled ADF pipelines to run at specific intervals or based on events. Employed monitoring tools and alerts to proactively identify and address pipeline failures, minimizing downtime and data loss.

• Developed Azure Stream Analytics Jobs to replicate real-time data streams from external sources directly into Azure SQL Data Warehouse, providing near-instantaneous insights for critical decision-making.

• Deployed code updates to multiple environments with CI/CD processes, actively addressed code defects during testing phases, and provided data load support for SIT and UAT testing.

• Integrated Spark-Python notebooks into ADF pipelines using Azure Databricks, leveraging widgets for parameter passing to dynamically configure notebooks during pipeline execution (see the notebook sketch after this list).

• Implemented comprehensive logging frameworks within ADF pipelines, capturing events, errors, and performance metrics to facilitate continuous monitoring, troubleshooting, and pipeline maintenance.
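
For illustration only, a minimal Databricks notebook cell showing the widget-based parameter passing pattern mentioned above: ADF's Databricks Notebook activity supplies baseParameters that override the widget defaults. Parameter names, storage accounts, and paths are hypothetical.

# Databricks notebook cell -- dbutils and spark are provided by the Databricks runtime.
# Declare widgets with defaults; ADF baseParameters override them at run time.
dbutils.widgets.text("run_date", "2024-01-01")
dbutils.widgets.text("source_path", "abfss://raw@exampleaccount.dfs.core.windows.net/sales")

run_date = dbutils.widgets.get("run_date")
source_path = dbutils.widgets.get("source_path")

# Read only the requested partition and write a curated, de-duplicated copy.
df = spark.read.parquet(f"{source_path}/ingest_date={run_date}")
(df.dropDuplicates()
   .write.mode("overwrite")
   .parquet(f"abfss://curated@exampleaccount.dfs.core.windows.net/sales/ingest_date={run_date}"))

# Return a value to the calling ADF pipeline (available via the activity's runOutput).
dbutils.notebook.exit(str(df.count()))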

Sr Cloud Data Engineer

Ford, Dearborn, MI October 2019 – May 2022

Responsibilities:

• Built serverless and containerized EL/ELT/ETL pipelines using AWS Glue and AWS Batch with Python/PySpark, powering Charging/Energy Analytics with BigQuery-ready data (see the Glue job sketch after this list).

• Streamlined on-prem data integration with AWS DataSync, seamlessly fetching data from Hadoop & SQL Server for BigQuery analysis.

• Crafted RESTful APIs with AWS API Gateway and Lambda, fueling internal and external data consumption for the Charging/Energy Analytics product line.

• Partnered closely with data scientists, ensuring timely data delivery to BigQuery for impactful ML model development in the energy domain.

• Implemented AWS Auto Scaling & CloudWatch, maximizing pipeline scalability & cost-efficiency for Charging/Energy Analytics.

• Ensured data governance & compliance within BigQuery, implementing robust security measures for Charging/Energy Analytics data privacy.

• Provided strategic data transformation expertise and guidance to stakeholders, driving informed decision-making for Charging/Energy Analytics initiatives.
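
For illustration only, a minimal AWS Glue PySpark job skeleton of the serverless ETL pattern referenced in this role. The catalog database, table, and S3 bucket names are hypothetical placeholders.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job bootstrap; JOB_NAME is supplied by the Glue runtime.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from a hypothetical Glue Data Catalog table populated by a crawler.
sessions = glue_context.create_dynamic_frame.from_catalog(
    database="charging_raw", table_name="charge_sessions"
)

# Light clean-up in Spark, then partitioned Parquet output to a placeholder S3 bucket.
df = sessions.toDF().filter("energy_kwh > 0")
(df.write.mode("overwrite")
   .partitionBy("session_date")
   .parquet("s3://example-charging-analytics/curated/charge_sessions/"))

job.commit()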

Cloud Data Engineer

Global Atlantic Financial Group, Indianapolis, IN June 2017 – August 2019

Responsibilities:

• Constructed ETL pipelines to facilitate data movement into and out of the data warehouse and generated insightful reports using advanced SQL queries.

• Established S3 buckets and implemented IAM role-based policies to manage access and security.

• Authored PySpark code to execute data processing tasks within both AWS Glue and EMR environments.

• Acquired knowledge of AWS Redshift, a cloud-based data warehouse service.

• Developed data pipelines using Kafka to efficiently stream data from external systems and RDBMS sources into HDFS for storage and analysis.

• Implemented Apache Airflow to orchestrate, schedule, and monitor data pipelines, ensuring reliable data flow and automation (see the DAG sketch after this list).

• Established, implemented, and maintained standards, protocols, and documentation for data analytics practices, promoting consistency and quality.
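
For illustration only, a minimal Apache Airflow DAG sketch of the orchestration pattern described above. The DAG id, schedule, and task callables are hypothetical placeholders rather than pipelines from this role.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder for pulling data from a source system into staging.
    print("extracting...")

def load():
    # Placeholder for loading staged data into the warehouse.
    print("loading...")

with DAG(
    dag_id="daily_warehouse_load",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the load only after the extract succeeds.
    extract_task >> load_task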

ETL / Informatica Developer

Ibing Software Solutions Pvt Ltd, India September 2015 – March 2017

Responsibilities:

• Utilized Informatica Power Center for Extract, Transform, Load (ETL) processes, extracting and loading data from diverse source systems into the target database.

• Designed mappings using the Designer tool to extract data from various sources and transform it as per specified requirements.

• Engaged in extracting data from Flat Files and Relational databases, transferring it to the staging area.

• Managed the migration of Mappings, Sessions, and Workflows from the Development environment to Test and subsequently to the QA/UAT environment.

• Created Informatica Mappings and Reusable Transformations to load data into a star schema on schedule, using Aggregators, SQL overrides in Lookups, source filters in Source Qualifiers, and Routers to manage data flow into multiple targets.

• Applied various transformations, including Filter, Expression, Sequence Generator, Update Strategy, Joiner, Router, and Aggregator, to create robust mappings in the Informatica Power Center Designer.

• Imported heterogeneous files using Informatica Power Center 8.x Source Analyzer.

• Developed reusable transformations and mapplets for use in multiple mappings.

• Contributed to the creation of technical design documents and test cases for unit testing, addressing various bottlenecks encountered in the process.

ETL Data Analyst

Ceequence Technologies, India May 2014 – August 2015

Responsibilities:

• Collaborated with data architects to execute required modifications to data models, ensuring alignment with evolving business needs.

• Utilized Informatica mappings to successfully implement ETL processes within the Data Warehouse, effectively streamlining data integration.

• Demonstrated proficiency in advanced Excel functions, including Pivot Tables and V-Lookups, to manage, update, and manipulate report structures, enabling comprehensive data analysis and visualization.

• Conducted rigorous data testing by analysing logs generated post-data loading into the Data Warehouse.

• Implemented MDM practices to maintain accurate and consistent customer information, enforcing ETL rules to safeguard data integrity.

• Leveraged OLAP, ETL, data warehousing, and data modeling techniques to extract, transform, and load data between SQL Server and Oracle databases, employing Informatica/SSIS for seamless data integration.

• Actively participated in meetings with user groups to analyse requirements and provide recommendations for design and specification enhancements, ensuring solutions aligned with user needs.

• Conducted Detailed Data Analysis (DDA), Data Quality Analysis (DQA), and Data Profiling on source data to identify potential issues, ensure data quality, and facilitate informed decision-making.

Skills:

Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, Zookeeper, YARN, Spark

Databases: Oracle 8i/9i/10g/11g, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL

Cloud Platforms: AWS, Microsoft Azure, GCP

Scripting and Programming: Python, PySpark, Scala, Java, Shell script, Perl script, SQL

Data Warehousing: Informatica PowerCenter 9.x/8.x/7.x, Informatica Cloud, Snowflake, Synapse Analytics, Redshift, BigQuery

Database Modeling: Dimensional Modeling, ER Modeling, Star Schema, Snowflake Schema

Data Visualization and BI: Tableau, Matplotlib, Power BI, SSRS

Source Control and CI/CD: Git, Bitbucket, Jenkins, Maven

Others: Apache Kafka (Publisher, Consumer), YAML configuration (used in Azure Data Factory), Kubernetes, Kafka Publisher in Spark jobs, Docker

Education:

Bachelor of Technology in Computer Science, Malla Reddy Engineering College May 2014


