Azure Data Engineer

Location:
Hyderabad, Telangana, India
Posted:
April 29, 2024


ANUSHA BILLURI

Microsoft-certified Azure Data Engineer

Phone: 475-***-****

E-Mail: ad5c1f@r.postjobfree.com

LinkedIn: www.linkedin.com/in/anusha-billuri-27681a152

PROFESSIONAL SUMMARY

10+ years of experience in data warehousing, Azure with Snowflake, AWS services, and scalable data ingestion pipelines. Skilled in Azure Data Factory architecture, enabling seamless integration between on-premises systems and the Azure cloud using Python, PySpark, and Microsoft Azure cloud services.

Hands-on experience working with Azure and its components, including Azure Data Factory, Azure Data Lake Storage Gen2, Azure Blob Storage, Azure Databricks, Azure Synapse Analytics, Logic Apps, Function Apps, and Azure Key Vault.

Proficient in managing and configuring Azure Blob Storage, File Storage, Queue Storage, and Table Storage.

Skilled in developing robust Data Lake data ingestion pipelines, performing data extraction, transformation, and loading (ETL) processes to ensure data quality and availability.

Implemented data ingestion pipelines using Azure Synapse Analytics to efficiently extract, transform, and load (ETL) large volumes of structured and unstructured data into the data warehouse.

Proficient in using Databricks notebooks for data exploration with PySpark/Scala, scripting in Python/SQL, and deploying APIs for the analytics team.

Demonstrated understanding and proficiency in Agile methodologies, particularly SCRUM, for efficient project management and iterative development processes.

Hands-on experience using Azure Function Apps as API services to communicate with various databases.

Automated data flows using Azure Logic Apps and Power Automate (Flow), connecting different Azure services and Function Apps for customization.

Proficient in working with Hadoop ecosystem technologies such as HDFS, MapReduce, YARN, Sqoop, Cassandra, Pig, Kafka, Zookeeper, and Hive.

Implemented end-to-end ETL processes on Snowflake, transforming raw data into meaningful insights. Utilized Snowflake's features like stages, file formats, and COPY INTO command to efficiently load and process data.
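
By way of illustration, a minimal sketch of the stage/COPY INTO load pattern described above, using the snowflake-connector-python package; the connection parameters, file format, stage, file, and table names are hypothetical placeholders rather than actual project objects.

    import snowflake.connector

    # Connect to Snowflake (all parameters below are hypothetical placeholders).
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="********",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="RAW",
    )

    cur = conn.cursor()
    try:
        # Define a CSV file format and an internal stage, upload a local file, then bulk load it.
        cur.execute("CREATE FILE FORMAT IF NOT EXISTS csv_fmt TYPE = CSV SKIP_HEADER = 1")
        cur.execute("CREATE STAGE IF NOT EXISTS raw_stage FILE_FORMAT = csv_fmt")
        cur.execute("PUT file:///data/orders.csv @raw_stage AUTO_COMPRESS = TRUE")
        cur.execute(
            "COPY INTO RAW.ORDERS FROM @raw_stage/orders.csv.gz "
            "FILE_FORMAT = (FORMAT_NAME = csv_fmt) ON_ERROR = 'ABORT_STATEMENT'"
        )
    finally:
        cur.close()
        conn.close()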

Expertise in large-scale data processing, machine learning, and real-time analytics using Apache Spark.

Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.

Strong expertise in loading unstructured and semi-structured data from different sources into Hadoop clusters using Flume.

Extensive knowledge of relational and dimensional data modeling, star and snowflake schemas, fact and dimension tables, and process mapping using top-down and bottom-up approaches.

Performed complex data workflows using Apache Oozie for efficient data processing and workflow automation.

Strong understanding of developing MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform joins on the Map side using distributed cache.

Utilized Azure Delta Lake to establish a robust and efficient data lake architecture, ensuring data integrity, reliability, and optimal analytics performance.
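
As an illustration of the Delta Lake pattern mentioned above, a minimal PySpark sketch that upserts a batch of records into a Delta table on ADLS Gen2; the storage account, container, paths, and join key are hypothetical.

    # Illustrative Delta Lake upsert on ADLS Gen2 (paths and column names are hypothetical).
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-upsert").getOrCreate()

    target_path = "abfss://lake@mystorageacct.dfs.core.windows.net/silver/customers"
    updates = spark.read.parquet("abfss://lake@mystorageacct.dfs.core.windows.net/bronze/customers_batch")

    if DeltaTable.isDeltaTable(spark, target_path):
        # Merge the incoming batch into the existing Delta table on the business key.
        target = DeltaTable.forPath(spark, target_path)
        (target.alias("t")
               .merge(updates.alias("s"), "t.customer_id = s.customer_id")
               .whenMatchedUpdateAll()
               .whenNotMatchedInsertAll()
               .execute())
    else:
        # First load: create the Delta table from the batch.
        updates.write.format("delta").save(target_path)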

Participated in the development, improvement, and maintenance of Snowflake database applications.

Implemented CI/CD pipelines using Azure DevOps to streamline data engineering processes and ensure efficient and reliable delivery of data solutions.

EDUCATION

Bachelor’s in Computer Science, Acharya Nagarjuna University.

Certifications:

Microsoft Azure Data Engineer Associate: DP-203

TECHNICAL SKILLS

Cloud Services: Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure DevOps, AWS (Amazon Web Services), EC2, S3, ELB, RDS

Big Data Technologies: MapReduce, Hive, Tez, Python, PySpark, Scala, Kafka, Spark, Oozie, Sqoop, Zookeeper, Cassandra, Flume, Pig, Apache Spark Streaming

Hadoop Distributions: Cloudera, Hortonworks

Languages: SQL, PL/SQL, T-SQL, Python, HiveQL, Scala, U-SQL, NoSQL

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP

Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS

Build Automation Tools: Ant, Maven, PowerShell scripts

Version Control: Git, GitHub

IDE & Build Tools: Eclipse, Visual Studio

Databases: MS SQL Server 2016/2014/2012, Azure SQL Database, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB, MongoDB, Milvus Vector DB

WORK EXPERIENCE

Azure Databricks Engineer Feb 2023 – Present

Wells Fargo, Bradenton, FL

Responsibilities:

Designed and implemented scalable data ingestion pipelines using Azure Data Factory, ingesting data from various sources such as SQL databases, CSV files, and REST APIs.

Utilized PolyBase to establish seamless integration between heterogeneous data sources, enabling efficient querying and analysis across various platforms, such as SQL Server & Azure SQL Database.

Created Azure Databricks notebooks using Spark SQL, Scala, and Python, and automated notebook execution using Databricks jobs.

Developed data processing workflows using Azure Databricks, leveraging Spark for distributed data processing and transformation tasks.
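
A minimal, illustrative sketch of the kind of Databricks/PySpark transformation step described above; the storage paths, column names, and target table are hypothetical.

    # Illustrative Databricks-style transformation: read raw CSVs, standardize types,
    # deduplicate, and write a curated Delta table (paths and columns are hypothetical).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("transactions-clean").getOrCreate()

    raw = (spark.read
                .option("header", "true")
                .csv("abfss://lake@mystorageacct.dfs.core.windows.net/raw/transactions/"))

    clean = (raw.withColumn("amount", F.col("amount").cast("decimal(18,2)"))
                .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
                .dropDuplicates(["transaction_id"])
                .filter(F.col("amount").isNotNull()))

    (clean.write.mode("overwrite")
          .partitionBy("txn_date")
          .format("delta")
          .saveAsTable("silver.transactions"))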

Built custom operator tasks in Azure Integration Services for Python-based data pipeline use cases.

Developed data ingestion jobs using streaming services such as Apache Kafka, landing data into Azure storage services and other enterprise data stores.
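
A hedged sketch of a Kafka-to-data-lake ingestion job of the kind described above, using Spark Structured Streaming; the broker address, topic, and ADLS paths are hypothetical.

    # Illustrative streaming ingestion from Kafka into an ADLS Gen2 bronze layer
    # (broker, topic, and paths are hypothetical placeholders).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    events = (spark.readStream
                   .format("kafka")
                   .option("kafka.bootstrap.servers", "broker1:9092")
                   .option("subscribe", "payments")
                   .option("startingOffsets", "latest")
                   .load()
                   .select(F.col("key").cast("string").alias("key"),
                           F.col("value").cast("string").alias("payload"),
                           "timestamp"))

    query = (events.writeStream
                   .format("delta")
                   .option("checkpointLocation",
                           "abfss://lake@mystorageacct.dfs.core.windows.net/_checkpoints/payments")
                   .outputMode("append")
                   .start("abfss://lake@mystorageacct.dfs.core.windows.net/bronze/payments"))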

Leveraged Azure Logic Apps for orchestrating complex workflows, integrating various data services and triggering actions based on events.

Utilized Azure Function Apps for serverless computing in data engineering tasks, enabling the execution of discrete functions without managing infrastructure.
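
For illustration, a minimal Azure Functions (Python) HTTP trigger acting as a thin API over a SQL database, in the spirit of the serverless tasks described above; the query, table, and SQL_CONN_STR app setting are hypothetical, and the HTTP binding is assumed to be declared in function.json.

    # Illustrative Azure Functions HTTP trigger used as a thin API over a database.
    import json
    import os

    import azure.functions as func
    import pyodbc


    def main(req: func.HttpRequest) -> func.HttpResponse:
        customer_id = req.params.get("customer_id")
        if not customer_id:
            return func.HttpResponse("customer_id is required", status_code=400)

        # Connection string comes from an app setting; table and columns are hypothetical.
        with pyodbc.connect(os.environ["SQL_CONN_STR"]) as conn:
            row = conn.execute(
                "SELECT customer_id, name, status FROM dbo.Customers WHERE customer_id = ?",
                customer_id,
            ).fetchone()

        body = dict(zip(("customer_id", "name", "status"), row)) if row else {}
        return func.HttpResponse(json.dumps(body), mimetype="application/json")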

Proficient in leveraging Azure Machine Learning services to design, develop, and deploy machine learning models, demonstrating a strong understanding of the end-to-end machine learning lifecycle.

Managed and optimized OLTP systems to ensure real-time, high-speed processing of transactional data, enhancing the efficiency and responsiveness of critical business operations.

Proficient in writing and optimizing T-SQL queries for Microsoft SQL Server, demonstrating the ability to retrieve, manipulate, and analyze data efficiently.

Proficient in designing, developing, and maintaining reports using SQL Server Reporting Services (SSRS), creating visually appealing and insightful reports for business stakeholders.

Experienced with popular RDBMS platforms, such as MySQL, PostgreSQL, Oracle, and Microsoft SQL Server, enabling effective database management and development.

Managed and optimized resource allocation for big data processing using Apache YARN on Azure HDInsight.

Extensive experience in utilizing PySpark for ETL (Extract, Transform, Load) operations, enabling efficient data processing, cleaning, and transformation in big data environments.

Proficient in Azure Data Lake Storage Gen2 (ADLS Gen2), designing and implementing scalable data solutions, optimizing performance, and ensuring data integrity for efficient data processing and analysis.

Developed scalable data pipelines in Azure Databricks and ingested the enriched data into the gold layer of the data lake.

Proficient in implementing and managing Delta Lake on Azure, leveraging Azure Databricks, Azure Synapse Analytics, or other Azure services to create reliable and scalable data lakes.

Implemented CI/CD pipelines using Azure DevOps to streamline data engineering processes and ensure efficient and reliable delivery of data solutions.

Proficient in implementing and managing Azure Event Hubs for real-time event streaming and data ingestion in cloud-based solutions, ensuring scalability and reliability.

Created PowerShell scripts for managing Azure resources, orchestrating data workflows, and performing administrative tasks, improving operational efficiency.

Built data pipelines using Azure Data Factory to load data from legacy SQL Server systems into Azure SQL Database.

Proficient in PL/SQL, with a strong track record of leveraging its capabilities to develop and optimize database-driven applications, ensuring efficient data management and retrieval.

Experienced in writing and optimizing U-SQL scripts for data processing and analytics, enabling efficient ETL (Extract, Transform, Load) operations on large datasets within Azure Data Lake Analytics.

Demonstrated understanding and proficiency in Agile methodologies, particularly SCRUM, for efficient project management and iterative development processes.

Proficient in writing and optimizing MDX queries for querying multidimensional databases, extracting relevant data for analysis, and generating comprehensive reports for business intelligence purposes.

Proficient in utilizing DAX (Data Analysis Expressions) to create complex calculations, measures, and data models in Power BI, enabling accurate data analysis and visualization.

Proficient in Python Lambda functions for concise and efficient code execution in serverless computing environments.

Utilized Azure Cosmos DB to create a globally distributed and highly responsive NoSQL database solution, ensuring seamless data access and low-latency performance.
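
A minimal sketch of programmatic access to Azure Cosmos DB (Core/SQL API) with the azure-cosmos Python SDK; the endpoint and key environment variables, database, container, and item fields are hypothetical.

    # Illustrative Cosmos DB access: upsert a document and run a parameterized query.
    import os

    from azure.cosmos import CosmosClient

    client = CosmosClient(os.environ["COSMOS_ENDPOINT"], credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("retail").get_container_client("orders")

    container.upsert_item({"id": "order-1001", "customerId": "c-42", "total": 129.95})

    recent = container.query_items(
        query="SELECT c.id, c.total FROM c WHERE c.customerId = @cid",
        parameters=[{"name": "@cid", "value": "c-42"}],
        enable_cross_partition_query=True,
    )
    for item in recent:
        print(item)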

Environment: Azure Databricks, Azure Data Factory, Snowflake, Azure Logic Apps, Azure Function Apps, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Power BI, PowerShell, and Kafka.

Azure Snowflake Data Engineer Jul 2021 – Jan 2023

Verizon, Basking Ridge, NJ

Responsibilities:

Implemented end-to-end data pipelines using Azure Data Factory to extract, transform, and load (ETL) data from diverse sources into Snowflake.

Designed and implemented data processing workflows using Azure Databricks, leveraging Spark for large-scale data transformations.

Proficient in utilizing Spark SQL for querying and analyzing large-scale structured data within Apache Spark, enabling seamless integration of SQL queries with Spark's distributed computing capabilities for efficient data processing and analytics.
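
As a small illustration of the Spark SQL usage described above, a sketch that registers a DataFrame as a temporary view and queries it with SQL; the path, view, and column names are hypothetical.

    # Illustrative Spark SQL query over a temporary view (names are hypothetical).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

    usage = spark.read.parquet("abfss://lake@mystorageacct.dfs.core.windows.net/silver/usage")
    usage.createOrReplaceTempView("usage")

    top_plans = spark.sql("""
        SELECT plan_id, COUNT(*) AS subscribers, AVG(monthly_gb) AS avg_monthly_gb
        FROM usage
        GROUP BY plan_id
        ORDER BY subscribers DESC
        LIMIT 10
    """)
    top_plans.show()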

Applied Infrastructure as Code principles within Azure DevOps to manage and version infrastructure configurations for data solutions.

Integrated Azure Machine Learning with other Azure services, such as Azure Databricks, Azure Synapse Analytics, and Azure Data Factory, to create comprehensive data pipelines and enable seamless integration of machine learning solutions into broader data workflows.

Applied TDD methodologies to create high-quality, maintainable code by writing unit tests before implementing new features.

Proven ability to ensure data integrity, consistency, and security within RDBMS through user access control, backup and recovery strategies, and performance tuning.

Implemented security measures for DLT networks, including encryption, key management, and access controls, ensuring the integrity and confidentiality of distributed ledger data.

Designed and implemented data streaming pipelines using Confluent Kafka for real-time event processing.

Skilled in SnowSQL to develop and maintain data workflows, ensuring data integrity and accessibility for informed decision-making.

Implemented product navigation, search functionality, and interactive features within Unity Catalog, enhancing user engagement and satisfaction.

Demonstrated expertise in designing and maintaining Snowflake data warehouses, implementing data security best practices, and collaborating with cross-functional teams to ensure seamless data integration, storage, and retrieval for organizational needs.

Integrated Snowflake with Power BI and Azure Analysis Services for creating interactive dashboards and reports, enabling self-service analytics for business users.

Skilled in data modeling, DAX calculations, and data transformation within the Power BI ecosystem.

Environment: Azure Databricks, Azure Data Factory, Azure Logic Apps, Snowflake, Azure Function Apps, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Tableau, shell scripting, and Kafka.

AWS Data Engineer Jul 2019 – Jun 2021

CVS, Louisville, KY

Responsibilities:

Designed and implemented scalable and robust data architectures on AWS, utilizing services such as Amazon S3, Amazon Redshift, and AWS Glue.

Ensured efficient data storage, retrieval, and processing for diverse analytical and reporting needs.

Spearheaded the integration of Snowflake as a cloud-based data warehousing solution, providing a secure and flexible platform for data storage and analytics.

Implemented Snowflake features such as data sharing and multi-cluster, ensuring optimal performance for complex analytical queries.

Implemented and managed Amazon EMR clusters for distributed processing of large-scale data sets, utilizing Apache Spark and Hadoop frameworks to perform efficient data processing and analysis.

Designed and implemented scalable and durable storage solutions using Amazon S3, leveraging its object storage capabilities for efficient data storage and retrieval.

Implemented and managed Apache Kafka clusters using Amazon MSK, ensuring the availability, scalability, and durability of real-time streaming data infrastructure.

Proficient in healthcare data exchange standards including X12, HL7, and FHIR for streamlined interoperability and data integration in healthcare systems.

Engineered scalable and resilient compute solutions using Amazon EC2 instances, tailoring virtual machine configurations to meet specific application requirements and performance needs.

Configured and managed route tables in Amazon VPC, defining the routing policies to direct traffic between subnets and control communication within the virtual network.

Developed and trained machine learning models using Amazon SageMaker, leveraging its end-to-end capabilities for data preparation, model training, and deployment in a scalable and efficient manner.

Utilized AWS Glue's serverless architecture to automate data preparation and transformation tasks.
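
An illustrative AWS Glue PySpark job skeleton matching the serverless transformation work described above; the catalog database, table, column mappings, and S3 path are hypothetical.

    # Illustrative Glue job: read from the Data Catalog, apply a mapping, write parquet to S3.
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    source = glue_context.create_dynamic_frame.from_catalog(database="claims_db", table_name="raw_claims")

    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[("claim_id", "string", "claim_id", "string"),
                  ("amount", "string", "amount", "double")],
    )

    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/claims/"},
        format="parquet",
    )
    job.commit()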

Integrated Snowflake with other AWS services, including AWS Lambda for serverless computing and AWS Step Functions for orchestrating complex workflows.

Led successful data migration projects, migrating on-premises data warehouses or other cloud-based solutions to Snowflake on AWS.

Conducted data profiling, validation, and reconciliation to ensure data accuracy during migration processes.

Implemented cost-effective solutions for Snowflake on AWS, optimizing resource allocation based on workload patterns.

Conducted cost analyses and recommendations for right-sizing Snowflake clusters and storage.

Established monitoring solutions for Snowflake, utilizing AWS CloudWatch and Snowflake's native monitoring tools to track performance and resource utilization.

Environment: Sqoop, MySQL, EC2, IAM, HDFS, Apache Spark, Scala, Hive, Athena, SageMaker, AWS Redshift, Python, AWS Glue, KMS, EKS, Route 53, Elastic MapReduce, Kinesis, Kubernetes, EventBridge, SQS, DynamoDB, Amazon CloudWatch, and Snowflake.

Big Data Developer Jan 2017 – Jun 2019

PNC Bank, Dallas, TX

Responsibilities:

Imported data from MySQL to HDFS on a regular basis using Sqoop for efficient data loading.

Enhanced end-to-end development of data warehouses, data marts, and data lakes with ETL tools such as Informatica PowerCenter and big data platforms (PySpark, Hive, and the Hadoop ecosystem).

Performed aggregations on large volumes of data using Apache Spark and Scala and stored the results in the Hive data warehouse for further analysis.
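
A PySpark equivalent sketch of the aggregate-and-persist-to-Hive pattern described above (the original work used Scala); the database, table, and column names are hypothetical.

    # Illustrative aggregation of event-level data, persisted as a Hive table.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("daily-aggregates")
             .enableHiveSupport()
             .getOrCreate())

    events = spark.table("raw_db.card_transactions")

    daily = (events.groupBy("account_id", F.to_date("event_ts").alias("event_date"))
                   .agg(F.count("*").alias("txn_count"),
                        F.sum("amount").alias("total_amount")))

    daily.write.mode("overwrite").saveAsTable("analytics_db.daily_account_activity")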

Worked extensively with Data Lakes and big data ecosystems, including Hadoop, Spark, Hortonworks, and Cloudera.

Loaded and transformed structured, semi-structured, and unstructured data sets efficiently.

Developed Hive queries to analyze data and meet specific business requirements.

Leveraged HBase integration with Hive to build HBase tables in the analytics zone.

Utilized Kafka and Spark Streaming to process streaming data for specific use cases.

Developed data pipelines using Flume and Sqoop to ingest customer behavioral data into HDFS for analysis.

Utilized various big data analytic tools, such as Hive and MapReduce, to analyze Hadoop clusters.

Implemented a data pipeline using Kafka, Spark, and Hive for ingestion, transformation, and analysis of data.

Migrated data from RDBMS (Oracle) to Hadoop using Sqoop for efficient data processing.

Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and transformation processes.

Implemented CI/CD pipelines for building and deploying projects in the Hadoop environment.

Utilized JIRA for issue and project workflow management.

Utilized PySpark and Spark SQL for faster testing and processing of data in Spark.

Configured and customized Hadoop services using Ambari, ensuring optimal resource utilization and performance in data engineering processes.

Used Spark Streaming to process streaming data in batches for efficient batch processing.

Leveraged Zookeeper to coordinate, synchronize, and serialize servers within clusters.

Utilized the Oozie workflow engine for job scheduling in Hadoop.

Managed Agile boards, sprints, and backlogs within JIRA for improved project visibility and coordination.

Utilized PySpark in SparkSQL for data analysis and processing.

Used Git as a version control tool to maintain the code repository.

Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, Kafka, MapReduce, Zookeeper, Oozie, data pipelines, RDBMS, Python, PySpark, Ambari, and JIRA.

Data Warehouse Developer May 2014 – Mar 2016

Elegant Microweb, Hyderabad, India

Responsibilities:

Designed and implemented scalable and efficient data processing pipelines using technologies such as Apache Hadoop and Apache Spark.

Conducted in-depth data analysis to extract valuable insights and support data-driven decision-making processes.

Developed and maintained large-scale distributed databases, optimizing performance and ensuring data integrity.

Implemented data warehousing solutions for efficient storage, retrieval, and analysis of structured and unstructured data.

Ensured seamless data flow between SQL Server databases, Cosmos DB, and the Hadoop/Spark ecosystem for comprehensive analytics.

Proficient in programming languages such as Java, Python, and Scala for developing robust data applications.

Created and optimized scripts for data extraction, transformation, and loading (ETL) processes.

Extensive experience with big data technologies, including Apache Hadoop ecosystem components (HDFS, MapReduce) and Apache Spark for large-scale data processing.

Implemented security measures within Cloudera Manager to control access, ensure data integrity, and comply with regulatory requirements.

Implemented automated alerts within Ambari for proactive cluster management.

Implemented and managed Apache Zookeeper for distributed coordination and synchronization in the Hadoop and Spark ecosystem.

Managed Agile boards, sprints, and backlogs within Jira for improved project visibility and coordination.

Utilized shell scripting to automate system tasks, manage file manipulations, and orchestrate data processes in the Hadoop and Spark clusters.

Utilized tools like Apache Hive and Apache Pig for data transformation and analysis.

Designed and implemented data models to support business requirements and ensure effective data organization.

Implemented Sqoop connectors to ensure reliable and scalable data movement, contributing to a cohesive and interoperable big data ecosystem.

Developed and maintained data schemas and structures for optimal performance and scalability.

Implemented real-time data processing solutions using technologies such as Apache Kafka for streaming data ingestion.

Developed and maintained data processing applications using PySpark, combining the power of Python and Spark for ETL tasks.

Orchestrated end-to-end data workflows using Apache Oozie to schedule and manage complex data processing tasks.

Ensured the availability and reliability of real-time data streams for immediate business insights.

Utilized version control systems (e.g., Git) for managing codebase and ensuring collaboration efficiency.

Maintained comprehensive documentation for developed data solutions, ensuring knowledge transfer and team continuity.

Environment: SQL Server, Cosmos DB, Informatica, SSIS, Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, HBase, Kafka, MapReduce, Zookeeper, Oozie, data pipelines, RDBMS, Python, PySpark, shell scripting, Ambari, ETL, and JIRA.

ETL Developer Nov 2012 – Apr 2014

OG Software Solutions, Chennai, India

Responsibilities:

Designed, developed, and maintained end-to-end ETL processes, ensuring seamless data extraction, transformation, and loading from source to target systems.

Orchestrated data workflows to support business intelligence, analytics, and reporting requirements.

Demonstrated expertise in ETL tools such as Informatica, Talend, or Apache NiFi, utilizing their functionalities for data integration and transformation.

Developed and optimized ETL jobs and workflows to align with business objectives and data quality standards.

Created and maintained data models and mappings, defining the transformation logic to ensure accurate and consistent data representation.

Collaborated with data architects to design efficient and scalable data structures for ETL processes.

Conducted performance-tuning activities to optimize ETL job execution times, resource utilization, and overall system efficiency.

Proven expertise in designing, developing, and optimizing ETL processes using SQL Server Integration Services (SSIS).

Integrated ETL processes with diverse source systems, including databases, APIs, flat files, and cloud-based platforms.

Environment: Informatica PowerCenter 10.5, SQL Developer, MS SQL Server, flat files, XML files, Oracle 10g, DB2, SQL, PL/SQL, Unix/Linux, PuTTY, FileZilla.


