
Data Engineer Azure

Location: Chicago, IL
Salary: $120,000
Posted: March 07, 2024


Resume:

Harshavardhan Pothula

Data Engineer

Phone No: 872-***-**** Email ID: ad36tx@r.postjobfree.com

OBJECTIVE

More than six years of IT experience in Data Warehousing and Business Intelligence, with experience analyzing requirements and designing, implementing, and unit testing various Data Warehousing projects.

EDUCATION

Master's in Computer Science from Governors State University, USA

SKILLS

Cloud Services: Azure Cloud Services, AWS Cloud Services, Azure VMs, Azure Data Factory, Azure Databricks, Azure SQL Database, Azure Functions, Logic Apps, App Services, Azure Key Vaults, Managed Identities, AWS EC2, S3, RDS, Crawlers, IAM, VPC, AWS Glue, Data Catalog, Managed Apache Airflow
Data Warehouses: Azure Synapse Analytics, AWS Redshift, Snowflake & Salesforce NPSP
Databases: SQL Server 2018, PostgreSQL, Azure SQL
CI/CD: Jenkins, Azure DevOps
Source/Version Control: Git, GitHub, GitLab, Bitbucket
Project Management Tools: JIRA, ServiceNow, Confluence
Programming: Python
Big Data Technologies: Apache Spark, Hadoop, Hive
Containerization/Orchestration: Docker, Kubernetes
Streaming Data Technologies: Apache Kafka, Apache Flink
Data Quality/Governance: Data Quality Frameworks, Data Governance Practices
Operating Systems: UNIX Shell Scripting (via PuTTY client), Linux, Windows, macOS

SUMMARY

• Over six years of experience as a highly skilled and motivated Data Engineer, specializing in designing, developing, and implementing robust data pipelines, ETL processes, and data architectures.

• Proficient at developing sophisticated MapReduce systems that operate on a variety of file types, including Text, Sequence, XML, and JSON.

• Expert in Spark SQL and Spark Streaming and in developing DataFrames using Spark with Snowflake. Experienced with story grooming, sprint planning, daily stand-ups, and methodologies such as Agile and SAFe.

• Hands-on experience with Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.

• Experience with Cisco CloudCenter for securely deploying and managing applications across data center, private cloud, and public cloud environments. Experience working with NoSQL databases such as MongoDB and AWS DynamoDB for storing and retrieving JSON documents.

• Experience implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.

• Working knowledge of HDFS, Kafka, MapReduce, Spark, Pig, Hive, Sqoop, HBase, Flume, and ZooKeeper as tools for designing and deploying end-to-end big data ecosystems. Created Dockerfiles and Docker containers, built images, and hosted them in Artifactory.

• Expert in designing parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML. Experience migrating on-premises workloads to Windows Azure using Azure Site Recovery and Azure Backup.

• Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.

• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

• Experience with Windows Azure services such as PaaS and IaaS, and with storage offerings such as Blob storage (page and block) and SQL Azure. Well versed in deployment, configuration management, and virtualization.

• Expertise in building CI/CD in AWS environments using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, and experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure AWS infrastructure (a brief sketch follows this summary).
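
As a small, assumed illustration of the API Gateway plus Lambda pattern mentioned in the last bullet (not code from any of the engagements below): a Python handler that accepts an API Gateway proxy event and writes the JSON body to a hypothetical DynamoDB table named "orders".

    import json
    import boto3

    # Hypothetical table name; a real deployment would take this from configuration.
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("orders")

    def lambda_handler(event, context):
        """Handle an API Gateway proxy request and persist the JSON body to DynamoDB."""
        body = json.loads(event.get("body") or "{}")
        table.put_item(Item=body)
        return {
            "statusCode": 200,
            "body": json.dumps({"stored": True}),
        }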

WORK EXPERIENCE

Client: State Farm, Bloomington, Illinois, USA    Jan. 2023 – Present
Role: Azure Data Engineer

Description: State Farm is an insurance company that offers insurance and financial services and is the largest property, casualty, and auto insurance provider.

Responsibilities:

• Conducted performance tuning and optimization of the Snowflake data warehouse, resulting in improved query execution times and reduced operational costs. Designed and deployed a Kubernetes-based containerized infrastructure for data processing and analytics, leading to a 20% increase in data processing capacity.

• Wrote and executed MySQL database queries from Python using the MySQL Connector/Python and MySQLdb packages. Monitored and scheduled pipelines using triggers in Azure Data Factory.

• Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.

• Built and deployed code artifacts into the respective environments in the client's Azure cloud.

• Stored configuration data in the NoSQL database MongoDB and manipulated the configs using PyMongo.

• Used Kafka capabilities such as distribution, partitioning, and the replicated commit log for messaging by maintaining feeds, and loaded data from REST endpoints into Kafka. Used a continuous delivery pipeline to deploy microservices, including provisioning Azure environments, and developed modules using Python and shell scripting.

• Involved in all phases of the Software Development Life Cycle (SDLC) of the application, including requirements gathering, design, development, deployment, and analysis.

• Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects.

• Spearheaded HBase setup and utilized Spark and SparkSQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy.

• Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications, working with automation tools such as Git, Terraform, and Ansible.

• Created datasets from S3 using AWS Athena and built visual insights using AWS QuickSight. Monitored data quality and integrity through end-to-end testing, reverse engineered existing programs, and documented the existing code.

• Made extensive use of the Cloud SDK in GCP Cloud Shell to configure and deploy services using GCP BigQuery. Integrated Kubernetes with cloud-native managed services, such as AWS EKS and GCP GKE, for additional scalability.

• Migrated data from legacy data warehouses to the cloud (Snowflake and AWS) and developed the supporting infrastructure. Imported data from various sources, performed transformations using Pandas and Spark, and loaded the results into Hive as external tables on HDFS.

• Enhanced the system by adding Python XML/SOAP request and response handlers to add accounts, modify trades, and handle security updates.

• Used Pig as an ETL tool for transformations, joins, and pre-aggregations before storing the data on HDFS, and assisted the manager with automation strategies, Selenium/Cucumber automation, and JIRA reports.

• Ensured data quality and accuracy with custom SQL and Hive scripts and created data visualizations using Python and Tableau for improved insights and decision-making. Implemented data transformations and enrichment using Apache Spark Streaming to clean and structure the data for analysis (see the sketch below).

Environment: Python, Django, JavaScript, MySQL, NumPy, SciPy, Pandas API, PEP, pip, Jenkins, JSON, Git, AJAX, RESTful web services, PyUnit.
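
A minimal, illustrative sketch (not code from the State Farm project) of the Kafka-to-Spark Structured Streaming clean-up pattern described in the last bullet; the topic name, schema, broker address, and output paths are assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("policy-events-cleanup").getOrCreate()

    # Hypothetical schema for the incoming JSON events.
    schema = StructType([
        StructField("policy_id", StringType()),
        StructField("event_type", StringType()),
        StructField("premium", DoubleType()),
    ])

    # Read the raw stream from Kafka (broker and topic are placeholders).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "policy-events")
           .load())

    # Parse the JSON payload, drop rows missing the key, and keep only needed fields.
    cleaned = (raw.selectExpr("CAST(value AS STRING) AS json")
               .select(from_json(col("json"), schema).alias("e"))
               .select("e.*")
               .dropna(subset=["policy_id"]))

    # Write the structured stream as Parquet files (paths are placeholders).
    query = (cleaned.writeStream
             .format("parquet")
             .option("path", "/mnt/refined/policy_events")
             .option("checkpointLocation", "/mnt/checkpoints/policy_events")
             .start())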

Client: OSF Saint Francis Medical Center, Peoria, Illinois, USA    Apr. 2022 – Dec. 2022
Role: AWS Data Engineer

Description: OSF Saint Francis Medical Center is a not-for-profit Catholic health care organization that operates a medical group, a hospital system, and other health care facilities.

Responsibilities:

• Worked with Terraform templates on AWS to maintain infrastructure as code. Developed metrics based on SAS scripts on the legacy system and migrated the metrics to Snowflake (AWS).

• Wrote AWS Lambda functions for Spark workloads with cross-functional dependencies that generated custom libraries for delivering the Lambda functions in the cloud. Performed raw data ingestion that triggered a Lambda function and landed refined data in ADLS. Reviewed existing Java/Scala Spark processing and maintained and enhanced those jobs.

• Analyzed and developed a modern data solution with Azure PaaS services to enable data visualization. Assessed the application's current production state and the impact of the new installation on existing business processes.

• Built database models, APIs, and views using Python to deliver interactive web-based solutions.

• Developed PySpark modules for data ingestion and analytics, loading from Parquet, Avro, and JSON data and from database tables, and applied the Spark DataFrame API to complete data manipulation within a SparkSession (see the sketch at the end of this section).

• Created MapReduce programs to parse data for claim report generation and ran the JARs in Hadoop, coordinating with the Java team. Staged API and Kafka data (in JSON format) into Snowflake by flattening it for different functional services.

• Created Kubernetes replication controllers, clusters, and labeled services to deploy Dockerized microservices. Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

• Used Python to write data into JSON files for testing Django websites and created scripts for data modeling and data import/export. Consulted leadership and stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and suggest big data-based analytical solutions.

• Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications. Collected and aggregated large volumes of web log data from sources such as web servers, mobile, and network devices using Apache Flume and stored the data in HDFS for analysis.

Environment: Spark, Python, AWS, S3, Glue, Redshift, DynamoDB, Hive, Spark SQL, Docker, Kubernetes, Airflow, GCP, ETL workflows
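
A minimal sketch of the PySpark ingestion pattern referenced above (loading Parquet and JSON data plus a database table through the DataFrame API); the paths, JDBC URL, table names, and columns are hypothetical, not taken from the actual project.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("claims-ingestion").getOrCreate()

    # Batch reads from file-based sources (paths are placeholders).
    claims = spark.read.parquet("s3a://raw-bucket/claims/")
    events = spark.read.json("s3a://raw-bucket/events/")

    # Read a reference table over JDBC (the JDBC driver must be on the classpath;
    # connection details are placeholders).
    members = (spark.read.format("jdbc")
               .option("url", "jdbc:postgresql://db-host:5432/warehouse")
               .option("dbtable", "public.members")
               .option("user", "etl_user")
               .option("password", "***")
               .load())

    # Simple DataFrame-API manipulation: join, filter, and write curated output.
    curated = (claims.join(members, "member_id")
               .join(events, "claim_id", "left")
               .filter(claims.status == "OPEN"))

    curated.write.mode("overwrite").parquet("s3a://curated-bucket/open_claims/")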

Client: Equitas Small Finance Bank, Chennai, India    Nov. 2019 – Dec. 2021
Role: Data Engineer

Description: Equitas Small Finance Bank is a small finance bank operating through three segments: Treasury, Wholesale Banking, and Retail Banking. Its retail banking services focus on microfinance, commercial vehicle finance, and related products for individuals and micro and small enterprises.

Responsibilities:

• Used stages such as Transformer, Aggregator, Merge, Join, Lookup, Sort, Remove Duplicates, Funnel, Filter, and Pivot to develop jobs. Created pipelines to load data using ADF.

• Built and maintained Docker container clusters managed by Kubernetes, using Linux, Bash, Git, and Docker.

• Built data pipelines using Python and Apache Airflow for ETL jobs that insert data into Oracle (a minimal DAG sketch appears at the end of this section).

• Created job flows using Airflow in Python and automated the jobs; Airflow has a separate stack for developing DAGs and runs jobs on an EMR or EC2 cluster. Wrote queries in MySQL and native SQL.

• Created pipelines in Azure Data Factory using linked services to extract, transform, and load data from sources such as Azure SQL Data Warehouse, and to write data back.

• Configured Spark Streaming to consume ongoing data from Kafka and store the stream to DBFS.

• Deployed models as Python packages, as APIs for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers.

• Involved in the entire project lifecycle, including design, development, deployment, testing, implementation, and support. Analyzed SQL scripts and redesigned them using Spark SQL for faster performance.

• Worked on big data integration and analytics based on Hadoop, Solr, PySpark, Kafka, Storm, and webMethods.

• Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries. Processed image data through the Hadoop distributed system using MapReduce and stored the results in HDFS. Used AWS to create storage resources and define resource attributes, such as disk type or redundancy type, at the service level.

• Created data tables using PyQt to display customer and policy information and to add, delete, and update customer records.

• Worked on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence and analytics development. Developed tools using Python, shell scripting, and XML to automate tasks.

• Worked on database migration methodologies and integration conversion solutions to convert legacy ETL processes into an Azure Synapse-compatible architecture. Created clusters to classify control and test groups.

• Developed multiple notebooks using PySpark and Spark SQL in Databricks to extract, analyze, and transform data according to business requirements. Built Jenkins jobs for CI/CD infrastructure for GitHub repositories.

• Automated and monitored AWS infrastructure with Terraform for high availability and reliability, reducing infrastructure management time by 90% and improving system uptime. Integrated Azure Data Factory with Blob Storage to move data through Databricks for processing and then into Azure Data Lake Storage and Azure SQL Data Warehouse.

Environment: ER/Studio, Teradata, SSIS, SAS, Excel, T-SQL, SSRS, Tableau, SQL Server, Cognos, Pivot tables, Graphs, MDM, PL/SQL, ETL, DB2, Oracle, SQL, Informatica PowerCenter, etc.
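
Below is a minimal, assumed sketch of the Airflow-based ETL pattern referenced earlier in this section (a DAG with a single Python task loading rows into Oracle); the driver choice, connection details, table, and schedule are illustrative only, not the bank's actual setup.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def load_into_oracle():
        """Placeholder extract/load step; a real task would pull source rows and insert them into Oracle."""
        import oracledb  # hypothetical driver choice
        conn = oracledb.connect(user="etl", password="***", dsn="oracle-host/ORCLPDB1")
        with conn.cursor() as cur:
            cur.execute("INSERT INTO stg_customers (id, name) VALUES (:1, :2)", (1, "example"))
        conn.commit()
        conn.close()

    with DAG(
        dag_id="oracle_etl_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_task = PythonOperator(
            task_id="load_into_oracle",
            python_callable=load_into_oracle,
        )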

Client: TTK Group, Chennai, India    Jul. 2017 – Oct. 2019
Role: Data Analyst

Description: TTK Group manufactures home appliances and health care products; it produces and markets kitchenware, pharmaceuticals, medical devices, animal products, and food products. In this role, I designed, developed, and maintained scalable web applications.

Responsibilities:

• Created logical and physical data models with star and snowflake schema techniques using Erwin, in the data warehouse as well as in data marts. Migrated SQL Server packages to SSIS and created mappings for the initial load from MS SQL Server 2005 to Netezza while performing data cleansing.

• Performed data analysis and data profiling using complex SQL on various source systems, including Oracle, Netezza, and Teradata (a small profiling sketch appears at the end of this section). Performed match/merge and ran match rules to check the effectiveness of the MDM process on the data.

• Involved in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.

• Wrote and executed customized SQL code for ad-hoc reporting duties and used other tools for routine reporting.

• Extensively used SQL to develop reports from the existing relational data warehouse tables (queries, joins, filters, etc.).

• Converted SQL Server packages to Informatica PowerCenter mappings to be used with Netezza and worked on Informatica MDM processes, including batch-based and real-time processing.

• Performed ad-hoc queries using SQL, PL/SQL, MS Access, MS Excel, and UNIX to meet business analysts' needs.

• Responsible for reviewing the data model, physical database design, ETL design, and presentation layer design.

• Conducted meetings with business and development teams for data validation and end-to-end data mapping

• Developed logging for ETL loads at the package and task level using SSIS to record the number of records processed by each package and each task within a package. Wrote Oracle SQL, PL/SQL, and T-SQL queries and created objects such as stored procedures, packages, functions, triggers, tables, and views.

Environment: Erwin, MDM, ETL, Teradata, MS SQL Server 2005, PL/SQL, Netezza, DB2, Oracle, SSIS, IBM, etc.
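
As a rough, assumed illustration of the SQL-based profiling described in this section, the sketch below pushes a profiling query down to the source from Python via pandas and SQLAlchemy; the connection string, the customers table, and its columns are hypothetical.

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string; a real run would point at the actual source system.
    engine = create_engine("oracle+oracledb://profiler:***@oracle-host:1521/?service_name=ORCLPDB1")

    # Profiling query executed on the source (hypothetical customers table):
    # row count, distinct keys, and null counts in one pass.
    profile_sql = """
        SELECT COUNT(*)                                        AS row_count,
               COUNT(DISTINCT customer_id)                     AS distinct_customer_ids,
               SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END)  AS null_emails
        FROM customers
    """

    profile = pd.read_sql(profile_sql, engine)
    print(profile.to_string(index=False))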


