

Pramod Jakkula

Title: Azure Data Engineer

Phone: 913-***-****

Email: ad3ni1@r.postjobfree.com

LinkedIn: https://www.linkedin.com/in/pramod-jakkula-30b42217b/

PROFESSIONAL SUMMARY

• Experienced Data Engineer with a strong software-industry background and a focus on Azure cloud services and Big Data technologies; 10+ years of experience in the field, including 5+ years specializing in Azure cloud services and 3+ years in data warehouse implementations.

• In-depth knowledge of Azure services and their components, including Azure Data Factory (ADF), Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage Gen2 (ADLS Gen2), Azure Blob Storage, Azure Key Vault, Azure Logic Apps, Azure Function Apps, and Azure DevOps.

• Strong command of Azure Data Factory (ADF) for data loading and for orchestrating data pipelines and streamlining data workflows.

• Extensively utilized Azure Data Lake Storage Gen2 (ADLS Gen2) integrated with Azure Databricks for efficient data storage and processing, enabling advanced analytics and insight generation.

• Worked with Azure Synapse Analytics, which combines big data and data warehousing capabilities to streamline data integration, exploration, and analysis on a single platform.

• 5+ years of hands-on experience creating ETL data pipelines using Spark and PySpark on Azure Databricks.

• Utilized Azure Logic Apps integration for intricate workflows and implemented advanced analytics solutions on Azure Synapse, combining data warehousing and big data analytics capabilities.

• Utilized Azure Event Hubs to develop messaging and streaming applications, primarily using Scala for enhanced functionality and performance.

• Designed and implemented Azure Function Apps to deploy serverless, event-driven applications, using triggers and integrating with Azure Key Vault for secure management of cryptographic keys and secrets.

• Deployed Azure Functions, Azure Storage, and Service Bus queues, optimizing enterprise ERP integration systems for streamlined data processing and communication in complex environments.

• Experienced in creating and managing Azure DevOps tools for continuous integration and deployment (CI/CD) pipelines.

• Designed and implemented ETL data pipelines using PySpark, Spark SQL, and Scala, demonstrating proficiency in big data processing, and maintained corporate solutions facilitating seamless data extraction, transformation, and loading for effective integration.

• In-depth proficiency in crafting UNIX shell scripts tailored for Hadoop Big Data Development, contributing to efficient data processing and management in distributed computing environments.

• Extensive experience in crafting and optimizing Data Pipeline Development and Data Modelling strategies, crucial for driving efficient data processing and analysis workflows.

• Formulated data ingestion workflows for efficient storage and retrieval, working with Avro, Parquet, Sequence, JSON, and ORC file formats.

• Proficient in Hadoop ecosystem elements such as HDFS, MapReduce, Hive, Pig, complemented by adeptness in programming languages such as Java, Python, and PySpark, crucial for intricate big data processing and analytics workflows.

• Expert in building large-scale data pipelines using Spark and Hive, essential for managing and analysing vast datasets in complex environments.

• Leveraged Apache Sqoop proficiently to facilitate seamless data transfer operations between HDFS and Hive, enhancing efficiency in data import and export processes.

• Adept in configuring Apache Oozie workflows, orchestrating Hadoop jobs efficiently, and proficient in SQOOP for seamless HDFS to relational database data transfer.

• Implemented partitioning and bucketing strategies for performance tuning in data engineering workflows, enhancing data processing efficiency.

• Developed Spark scripts with Scala shell commands, tailored to project needs, demonstrating adeptness in advanced programming for efficient big data processing.

• Implemented real-time streaming data pipelines with Kafka, integrating Spark Streaming for continuous data processing and demonstrating expertise in event processing and distributed computing.

• Leveraged Informatica PowerCenter to orchestrate data integration, transformation, and ETL operations, ensuring uninterrupted data flow and precision for intricate corporate requirements.

• Experienced in infrastructure as code practices using Terraform, enabling automation and scalability in deploying and managing cloud infrastructure for data engineering projects, ensuring efficient resource utilization and reproducibility.

• Proficient in Teradata, leveraging its powerful data warehousing and analytics capabilities to manage and analyse large volumes of data effectively in complex enterprise environments.

• Extensive experience in developing, maintaining, and implementing Enterprise Data Warehouse (EDW), Data Marts, ODS, and Data warehouses with Star schema and Snowflake schema.

• Proficient in utilizing leading Business Intelligence (BI) reporting tools such as Tableau, Power BI, and Looker to create insightful dashboards and visualizations, enabling stakeholders to make data-driven decisions effectively.

• Utilized GitHub extensively for version control, demonstrating practical proficiency in managing code repositories and facilitating collaborative software development workflows.

• Demonstrated mastery in SDLC management, skilfully applying Agile Methodology to steer iterative development and continuous software project improvement.

• Led proof-of-concept (POC) initiatives utilizing Snowflake, Airflow, and DBT, exploring data warehousing, workflow orchestration, and transformation capabilities within a modern data engineering ecosystem (a brief Airflow sketch follows below).
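
Illustrative only: a minimal Airflow DAG sketch of the kind of orchestration used in the Snowflake/Airflow/DBT proof of concept mentioned above. The DAG name, schedule, and task callables are hypothetical placeholders, not the actual project code.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables standing in for the real extract/transform/load logic.
def extract(**context):
    pass

def transform(**context):
    pass

def load(**context):
    pass

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="poc_daily_pipeline",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency chain: extract -> transform -> load.
    t_extract >> t_transform >> t_load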

EDUCATION

• Master's in Computer Science, University of Missouri-Kansas City, December 2013.

• Bachelor's in Electronics and Communication, B V Raju Institute of Technology, June 2011.

Certifications

• DP-900 Microsoft Certified: Azure Data Fundamentals

• DP-203 Microsoft Certified: Azure Data Engineer Associate

TECHNICAL SKILLS

Cloud Services: Azure Data Factory (ADF), Azure Blob Storage, Azure Data Lake Storage (ADLS), Azure Databricks, SQL Server, Azure Synapse Analytics, Azure Stream Analytics, Azure Logic Apps, Function Apps, Azure Event Hubs, Azure Key Vault, Azure Monitoring, Purview and Entra ID/Active Directory.

Big Data Technologies: Hadoop (1.0X and 2.0X), Hortonworks HDP (2.4/2.6), HDFS, YARN, MapReduce, Pig, HBase, Hive, Sqoop, Flume, Spark, Oozie, Airflow, Ambari and Apache Kafka.

Programming Languages: MapReduce, Pig, Java, Python, PySpark, Spark SQL, Linux, Unix Shell Scripting, SQL, PL/SQL.

ETL Tools: IBM Information Server 11.5/9.1/8.7/8.5, IBM InfoSphere DataStage 8.1.0, Ascential DataStage 7.5.X, QualityStage, Talend 6.4, SSIS, SSRS, Informatica.

Data Modeling: Data Modelling, Star Schema Modelling, Snowflake Schema Modelling, FACT and Dimension Tables, Erwin 4.0/3.5, Slowly Changing Dimensions (SCD), Change Data Capture (CDC).

Business Intelligence: Power BI, SAP Business Objects 11.5, Qlik Sense, Tableau.

Scheduling: Control-M, Autosys, Oozie, Apache Airflow.

Version Control Tools: Git, CI/CD, Jenkins.

Databases: NoSQL: HBase and Cassandra

Row-Oriented: Oracle 11g/10g, MS SQL Server, MySQL, Teradata V2R5/V2R6, DB2.

Columnar: HP Vertica.

WORK EXPERIENCE

Client: HOMESITE INSURANCE, Boston, MA. Feb 2021 – Present
Role: Sr. Azure Data Engineer

Responsibilities:

• Developed and implemented a Personalized Customer Recommendations system, integrating data collection, processing, and analysis techniques to improve data utilization and deliver tailored recommendations for stronger customer engagement.

• Deployed a robust data ingestion pipeline in Azure Data Factory (ADF) to efficiently handle both streaming and batch data sources, ensuring seamless integration and continuous flow of data for further processing and analysis.

• Developed and maintained end-to-end ETL data pipeline operations, handling large datasets in Azure Data Factory (ADF).

• Utilized Azure Data Factory's Copy Activity to streamline data transfer operations, implementing optimized queries and indexing techniques for enhanced fetching efficiency.

• Leveraged Copy Activity to efficiently ingest and process streaming data, integrating Kafka and Spark Streaming for specific use cases.

• Integrated on-premises (MySQL, Cassandra) and cloud data storage (Blob Storage, Azure SQL DB) using Azure Data Factory (ADF) and applied transformations after loading into Snowflake.

• Modeled data in Snowflake using data warehousing techniques, performed data cleansing, managed Slowly Changing Dimensions, assigned Surrogate keys, and implemented change data capture.

• Used Azure Data Lake Storage Gen2 as a central repository, enabling scalable storage and efficient management of diverse data types for streamlined processing and analysis.

• Utilized a modular strategy based on the Medallion architecture to enhance scalability, fault tolerance, and adaptability in deploying customized customer recommendation systems, streamlining data flow and processing for maximum efficiency.

• Implemented Azure Databricks, leveraging its advanced analytics capabilities for efficient data processing, machine learning, and collaborative development, enhancing overall performance.

• Developed scalable event ingestion pipelines utilizing Azure Event Hubs to ingest, process, and analyse real-time streaming data, enabling timely insights and actionable outcomes within Azure data engineering solutions.

• Utilized Azure Data Factory (ADF), Data Lake, and Azure Synapse Analytics to solve business problems with an analytical approach.

• Skilled in leveraging Informatica PowerCenter for orchestrating data integration, transformation, and ETL operations, ensuring uninterrupted data flow and precision in intricate corporate requirements.

• Designed ELT/ETL pipelines to enable bidirectional data transfer to and from Snowflake, utilizing Snowflake SnowSQL to guarantee seamless integration and transformation processes.

• Implemented ETL transformations and validation using Spark SQL and Spark DataFrames with Azure Databricks and Azure Data Factory (ADF); a brief sketch follows this section.

• Collaborated closely with Azure Logic Apps administrators to proactively monitor, diagnose, and resolve process automation and data pipeline issues, ensuring seamless operation and performance optimization within complex data environments.

• Enhanced efficiency by optimizing Azure Function App code to perform data extraction, transformation, and loading from various sources, including databases, APIs, and file systems, ensuring seamless data integration.

• Implemented secure storage and management of cryptographic keys, secrets, and certificates using Azure Key Vault, leveraging features such as encryption at rest, access policies, and integration with other Azure services for enhanced data protection and compliance.

• Deployed data pipelines utilizing Apache Airflow to orchestrate end-to-end data science projects; integrated with Azure ML for predictive modelling, enabling identification of high-value customer patterns and contributing to a 15% revenue growth.

• Implemented Vertica, a high-performance analytics database, within Azure data engineering ecosystems to efficiently store, manage, and analyse large volumes of data, optimizing insights generation and decision-making processes.

• Implemented advanced Compression techniques, such as gzip and snappy, in Azure data engineering pipelines to optimize storage utilization, reduce data transfer costs, and enhance overall performance and scalability.

• Proficient in Apache PySpark for large-scale data processing and analytics, leveraging Python-based APIs and distributed computing for efficient data transformation and analysis.

• Developed and maintained data integration solutions within a hybrid Hadoop and relational database management system (RDBMS) environment, ensuring seamless data flow and interoperability across platforms.

• Orchestrated the implementation of a robust CI/CD framework for data pipelines utilizing Jenkins, collaborating closely with DevOps engineers to architect automated CI/CD and test-driven development pipelines in Azure cloud environment, aligning with client specifications and ensuring seamless deployment and scalability of data solutions.

• Wrote SQL queries (DDL, DML) and implemented indexes, triggers, views, stored procedures, functions, and packages.

• Leveraged JIRA to facilitate project management by generating detailed reports and creating sub-tasks for development, Quality Assurance (QA), and partner validation, ensuring streamlined collaboration and efficient workflow orchestration within Azure data engineering projects.

• Skilled in crafting visually compelling and informative dashboards and data visualizations using Tableau, with a focus on delivering intuitive and impactful data presentations.

• Proficient in data preparation, blending, and analysis within Tableau, adept at creating calculated fields, parameters, and custom visualizations to uncover trends, patterns, and insights from diverse datasets.

• Demonstrated expertise in Agile methodologies, adeptly participating in Agile ceremonies such as daily stand-ups and globally synchronized Program Increment (PI) Planning sessions, ensuring efficient collaboration and alignment within Azure data engineering initiatives.

Environment: Azure Databricks, Azure Event Hubs, Azure Data Factory, Azure Synapse Analytics, Key Vault, Logic Apps, Function App, Informatica, Snowflake, MS SQL, Vertica, Oracle, Cassandra, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, JIRA, Agile, Jenkins, Kafka, Apache Airflow, ADF Pipeline, Tableau.
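
Illustrative only: a minimal PySpark sketch of the kind of Databricks transformation referenced in this section, reading raw data from ADLS Gen2, cleansing it with DataFrame operations, and writing partitioned Parquet output. The storage account, container, and column names are hypothetical placeholders, not the actual project code.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("policy_etl_example").getOrCreate()

# Read raw CSV files landed in the ADLS Gen2 "raw" container (placeholder paths).
raw = (spark.read
       .option("header", "true")
       .csv("abfss://raw@<storage-account>.dfs.core.windows.net/policies/"))

# Basic cleansing: de-duplicate, cast types, stamp a load date.
cleaned = (raw
           .dropDuplicates(["policy_id"])
           .withColumn("premium", F.col("premium").cast("double"))
           .withColumn("load_date", F.current_date()))

# Write curated output partitioned by load date for efficient downstream queries.
(cleaned.write
 .mode("overwrite")
 .partitionBy("load_date")
 .parquet("abfss://curated@<storage-account>.dfs.core.windows.net/policies/"))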

Client: DELL TECHNOLOGIES, Austin, TX. Nov 2018 - Jan 2021
Role: Azure Data Engineer

Responsibilities:

• Architected Personalized Customer Recommendations platform, orchestrating data acquisition, transformation, and analytics, driving optimal data utilization and personalized recommendations to enhance customer engagement and satisfaction.

• Ingested data into various Azure services such as Azure Data Factory, Azure Data Lake Storage Gen 2, Azure SQL server, Azure Blob storage and Azure Data Warehouse, leveraging Azure Databricks for data processing.

• Performed ETL operations using Azure Databricks and successfully migrated on-premises Oracle ETL processes to Azure Synapse Analytics.

• Migrated SQL databases to Azure Data Factory (ADF), Azure Data Lake Gen 2 (ADLS GEN 2), Azure Synapse Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse.

• Managed database access control and facilitated the migration of On-premises databases to Azure Data Lake Storage Gen 2 (ADLS GEN 2) using Azure Data Factory (ADF).

• Proficient in utilizing Informatica PowerCenter for data integration, transformation, and ETL processes, ensuring seamless data flow and accuracy within complex business environments; experienced in designing and implementing scalable solutions for data warehousing and analytics.

• Leveraged Azure Synapse Analytics and Polybase for seamless and optimized data ingestion, integration, and transfer, enhancing data processing efficiency and scalability.

• Implemented real-time streaming data processing using Azure Event Hubs, enabling timely insights and proactive interventions for business decision-making, and applied techniques such as data partitioning, indexing, and stream processing for optimized performance and scalability.

• Developed enterprise-level solutions using batch processing and streaming frameworks, including Spark Streaming and Apache Kafka.

• Applied Scala and Spark to handle diverse data types, encompassing both structured and unstructured data formats, ensuring comprehensive processing capabilities for Azure Data engineering workflows with a focus on scalability and efficiency.

• Worked extensively with Data Lakes and big data ecosystems, including Hadoop, Spark, Hortonworks, and Cloudera.

• Loaded and transformed large datasets of structured, semi-structured, and unstructured data.

• Utilized Azure Active Directory (AAD) expertise to manage identities, authentication, and access control for applications and services, implementing single sign-on (SSO) and multi-factor authentication (MFA) solutions.

• Leveraged Azure Monitor for proactive monitoring and performance optimization: configured metrics, logs, alerts, and dashboards to track resource availability and usage, enabling timely issue identification, operational efficiency, and continuous improvement of cloud solutions.

• Integrated seamlessly with Azure Key Vault for secure access, encryption, decryption, and authentication (a brief sketch follows this section).

• Applied Apache PySpark to leverage Resilient Distributed Datasets (RDDs) and Data Frames within Spark SQL, optimizing data processing and analysis workflows, ensuring efficient handling of large-scale datasets.

• Experience in Azure SQL Database provisioning, configuration, performance tuning, security management, query optimization, monitoring, backup strategies, data encryption, access control, high availability, and disaster recovery planning.

• Developed Continuous Integration/Continuous Deployment (CI/CD) pipelines to automate the building and deployment processes within the Hadoop environment, ensuring efficient project delivery and deployment practices.

• Proficient in designing and developing interactive dashboards and reports using Power BI, leveraging its robust features for data visualization, data modelling, and advanced analytics.

• Experienced in creating complex data models, implementing DAX calculations, and connecting to diverse data sources to extract actionable insights and drive business decisions with Power BI.

• Managed project workflow and tracked issues utilizing JIRA, enabling agile project management and seamless collaboration among team members, ensuring meticulous progress tracking and timely issue resolution in data engineering endeavours.

• Used Git as a version control tool for code repository management.

Environment: Azure Databricks, Azure Event Hubs, Informatica, Azure Data Factory, Azure Synapse Analytics, Azure Monitoring, Key Vault, Logic Apps, Function App, Snowflake, MS SQL, Vertica, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI.
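
Illustrative only: a minimal sketch of reading a secret from Azure Key Vault with the azure-identity and azure-keyvault-secrets Python SDKs, as one way the Key Vault integration mentioned above can be wired up. The vault URL and secret name are hypothetical placeholders.

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential picks up a managed identity, service principal,
# or developer login, depending on where the code runs.
credential = DefaultAzureCredential()

client = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net/",   # placeholder vault URL
    credential=credential,
)

# Fetch a connection secret at runtime instead of hard-coding it.
sql_password = client.get_secret("sql-connection-password").value   # placeholder secret name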

Client: PUBLIX, Charlotte, NC. July 2017 - Oct 2018
Role: Big Data Engineer

Responsibilities:

• Utilized Sqoop for periodic ingestion of data from MySQL into HDFS, ensuring seamless integration and efficient data transfer within big data environments, facilitating robust data processing and analysis workflows.

• Performed aggregations on large amounts of data using Apache Spark and Scala and stored the data in Hive warehouse for further analysis.

• Engaged with Data Lakes and prominent big data ecosystems such as Hadoop, Spark, Hortonworks, and Cloudera, orchestrating data processing and analytics tasks within scalable and distributed computing environments.

• Ingested and transformed extensive volumes of Structured, Semi-structured, and Unstructured data, leveraging big data technologies to handle diverse data formats efficiently within scalable distributed computing environments.

• Implemented Apache Ambari for centralized management and monitoring of Big Data infrastructure, streamlining administration tasks and ensuring optimal performance across Hadoop clusters.

• Wrote Hive queries to meet business requirements and conducted data analysis; built HBase tables by leveraging HBase integration with Hive on the Analytics Zone.

• Gained practical proficiency in Kafka and Spark Streaming for real-time processing of streaming data in targeted big data analytics use cases within distributed computing environments (a brief sketch follows this section).

• Engineered a data pipeline employing Flume and Sqoop to ingest customer behavioural data into Hadoop Distributed File System (HDFS), facilitating comprehensive analysis within big data analytics frameworks.

• Utilized a range of big data analytics tools like Hive and MapReduce to analyze Hadoop clusters, alongside developing a robust data pipeline with Kafka, Spark, and Hive for end-to-end data ingestion, transformation, and analysis.

• Wrote Hive queries to meet specified business requirements, created Hive tables, and utilized Hive QL to simulate MapReduce functionalities.

• Implemented UNIX and YAML scripts for orchestrating use case workflows, automating data file processing, job execution, and deployment processes, enhancing efficiency and scalability within big data environments.

• Executed a seamless migration of data from Oracle RDBMS to Hadoop utilizing Sqoop, facilitating efficient data processing and integration within the big data ecosystem, optimizing scalability and performance.

• Developed Continuous Integration/Continuous Deployment (CI/CD) pipelines to automate the building and deployment processes within Hadoop environments, ensuring efficient project delivery and deployment practices in big data ecosystems.

• Utilized JIRA to manage project issues and workflow. Worked on Spark using Python (PySpark) and Spark SQL for faster data testing and processing. Used Spark Streaming to segment streaming data into batches as input for batch processing in the Spark engine.

• Utilized Zookeeper for coordinating, synchronizing, and serializing servers within clusters. Worked with the Oozie workflow engine for job scheduling.

• Employed Git as a distributed version control system to manage codebase repositories, ensuring efficient collaboration, tracking of changes, and versioning control within Azure Cloud, maintaining code integrity and facilitating seamless development workflows.

• Conducted advanced data analysis and processing utilizing Spark SQL and PySpark, enabling efficient manipulation and querying of large-scale datasets.

• Engaged in collaborative troubleshooting to address Java Virtual Machine (JVM) related challenges, ensuring optimal performance and stability within Azure data engineering ecosystems.

Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, HBase, Kafka, MapReduce, Zookeeper, Oozie, Data Pipelines, RDBMS, Python, PySpark, JVM, shell script, Flume, YAML, Unix, Cassandra, Ambari, JIRA, Git.
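
Illustrative only: a minimal Spark Structured Streaming sketch for the Kafka ingestion pattern mentioned above, assuming the spark-sql-kafka connector is available on the cluster. Broker addresses, topic, schema, and output paths are hypothetical placeholders, not the actual project code.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("customer_events_stream").getOrCreate()

# Expected JSON payload of each Kafka message (placeholder schema).
schema = (StructType()
          .add("customer_id", StringType())
          .add("event_type", StringType())
          .add("amount", DoubleType()))

# Read the Kafka topic as a streaming DataFrame and parse the JSON value column.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "customer-events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Continuously append parsed events as Parquet files in one-minute micro-batches.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/raw/customer_events")
         .option("checkpointLocation", "/data/checkpoints/customer_events")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()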

Client: ATENA, Albany, NY. Mar 2015 - Jun 2017
Role: Data Warehouse Developer

Responsibilities:

• Experience as a SQL Server Analyst/Developer/DBA specializing in SQL Server versions 2012, 2015, and 2016 within data warehousing environments.

• Developed jobs, configured SQL Mail Agent, set up Alerts, and scheduled DTS/SSIS Packages within data warehousing.

• Managed and updated the Erwin models (Logical/Physical Data Modeling) for the Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference DB according to user requirements.

• Proficient in designing and implementing dimensional modelling techniques, including star schema and snowflake schema, to optimize data storage, streamline querying processes, and enhance reporting efficiency in data warehousing environments.

• Utilized TFS for source control and tracking environment-specific script deployments, ensuring version management and traceability in data warehouse development processes.

• Experienced in designing and implementing both snowflake and star schema structures, optimizing data organization for efficient querying and reporting.

• Exported Data Models from Erwin to PDF format and published them on SharePoint, enabling access for diverse users in data warehouse development.

• Skilled in database normalization techniques, ensuring data integrity, minimizing redundancy, and optimizing database structure for efficient storage and retrieval.

• Developed, administered, and managed the corresponding databases: Consolidated Data Store, Reference Database (source for the codes/values of the legacy source systems), and Actuarial Data Mart.

• Wrote triggers, stored procedures, and functions using Transact-SQL (T-SQL), and created and maintained physical structures.

• Good working knowledge of developing SSAS cubes, aggregations, KPIs, measures, cube partitioning, and data mining models, and of deploying and processing SSAS objects.

• Experience in creating Ad hoc reports and reports with complex formulas and querying the database for Business Intelligence.

• Designed, implemented, and optimized data solutions utilizing SQL Server 2012/2015 Enterprise Edition, SSRS, SSIS, T-SQL, and Shell scripting, on Windows Server 2012.

• Proficient in PerformancePoint Server 2007, Oracle 12c, and Visual Studio 2010.

• Expertise in developing parameterized, chart, graph, linked, dashboard, and scorecard reports on SSAS cubes, using drill-down, drill-through, and cascading reports in SSRS.

• Deployed scripts in different environments according to Configuration Management and Playbook requirements; created and managed files/filegroups and table/index associations; performed query tuning and performance tuning.

• Managed defect tracking and resolution using Quality Center, maintaining user roles and permissions to ensure effective collaboration and quality assurance in data warehouse development.

Environment: SQL Server 2012/2015 Enterprise Edition, SSRS, SSIS, T-SQL, Shell script, Windows Server 2012, PerformancePoint Server 2007, Oracle 12c, Visual Studio 2010, Star Schema, Snowflake, Normalization, Dimensional modelling.

Client: UPS, Kansas City, MO. Jan 2014 - Feb 2015
Role: Data Warehouse Developer

Responsibilities:

• Experience in developing complex stored procedures, efficient triggers, and required functions, and creating indexes and indexed views for performance.

• Demonstrated expertise in monitoring and fine-tuning SQL Server performance to optimize data warehouse operations and enhance query efficiency.

• Expert in designing ETL data flows using SSIS, creating mappings/workflows to extract data from SQL Server, and performing data migration and transformation from Access/Excel sheets using SSIS.

• Efficient in dimensional data modeling for Data Mart design, identifying facts and dimensions, and developing fact and dimension tables using Slowly Changing Dimensions (SCD).

• Proficient in error and event handling techniques such as precedence constraints, breakpoints, checkpoints, and logging, ensuring robust error management and monitoring within data warehouse development processes.

• Experienced in Building Cubes and Dimensions with different Architectures and Data Sources for Business Intelligence and writing MDX Scripting.

• Utilized MS SQL Server 2014 and Visual Studio 2010/2013 to design and implement robust data solutions.

• Proficient in SSIS for ETL processes, SharePoint for collaboration, and MS Access for data management. Experienced with Team Foundation Server and GIT for version control and collaboration.

• Possess a comprehensive understanding of data mart concepts including features, structure, attributes, hierarchies, as well as star and snowflake schemas, essential for effective data warehouse development and modeling processes.

• Flexible, enthusiastic, and project-oriented team player with excellent written and verbal communication and leadership skills, able to develop creative solutions for challenging client needs.

Environment: MS SQL Server 2014, Visual Studio 2010/2013, SSIS, SharePoint, Git, Dimensional modelling, MDX Scripting, SQL Server.


