
Data Engineer Analyst

Location:
Rogers, AR
Posted:
November 21, 2023


Phani N

Data Scientist/Data engineer/Data analyst

Email: ad1ch8@r.postjobfree.com

Phone: 424-***-****

PROFESSIONAL SUMMARY:

Overall 8 years of experience designing and developing data engineering solutions, Big Data analytics and development, and administering database projects, including installing, upgrading, and configuring databases, performing deployments, working on capacity planning, and tuning databases to optimize application performance.

Experience in machine learning with large sets of structured and unstructured data, covering data acquisition, data validation, predictive modelling, and data visualization.

Experience working with varied forms of data infrastructure, including relational databases such as MySQL and SQL Server as well as distributed processing frameworks such as Hadoop and Spark.

Developed and implemented robust ETL processes using SSIS to extract, transform, and load data from various sources into a centralized data warehouse, ensuring data accuracy and consistency.

Experienced Azure Data Factory specialist with a proven track record of designing and implementing complex ETL pipelines to extract, transform, and load data from diverse sources.

Skilled in Azure Delta Lake and Blob Storage for managing and storing large volumes of structured and unstructured data with a strong emphasis on data security and compliance.

Utilized Azure SQL databases to store structured data, ensuring optimal performance, scalability, and data security.

Skilled in designing and implementing data ingestion pipelines using AWS Glue and AWS Lambda for extracting, transforming, and loading data from various sources into Amazon S3.

Hands-on experience with Snowflake cloud data warehouse, AWS S3 bucket, and AWS Redshift, integrating data from multiple source systems and loading nested JSON formatted data into Snowflake tables.
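
For illustration, a minimal sketch of the kind of nested-JSON load into Snowflake described above, shown here with the snowflake-connector-python client; the account, stage, table, and column names are hypothetical placeholders, not the actual project objects.

import snowflake.connector  # pip install snowflake-connector-python

# Hypothetical connection parameters and object names, for illustration only.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="********",
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
)
cur = conn.cursor()
try:
    # Land the raw nested JSON in a single VARIANT column.
    cur.execute("CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT)")
    cur.execute("""
        COPY INTO raw_events
        FROM @s3_events_stage/events/          -- external stage over the S3 bucket
        FILE_FORMAT = (TYPE = 'JSON')
    """)
    # Flatten the nested array into relational columns for reporting.
    cur.execute("""
        CREATE OR REPLACE VIEW rpt_order_items AS
        SELECT payload:order_id::STRING    AS order_id,
               item.value:sku::STRING      AS sku,
               item.value:quantity::NUMBER AS quantity
        FROM raw_events,
             LATERAL FLATTEN(input => payload:items) item
    """)
finally:
    cur.close()
    conn.close()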

Proficiency in data warehousing, including dimensional modelling concepts, and in scripting languages such as Python, Scala, and JavaScript.

Wrote AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters.
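
As a sketch only, one way such a handler might look, submitting a PySpark step to an existing EMR cluster through boto3; the cluster ID, script path, and bucket names are placeholders I introduce for illustration.

import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    """Submit a PySpark transformation step to a running EMR cluster.
    Cluster ID and S3 paths below are hypothetical placeholders."""
    cluster_id = event.get("cluster_id", "j-XXXXXXXXXXXXX")
    response = emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[{
            "Name": "nightly-transform",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit", "--deploy-mode", "cluster",
                    "s3://my-etl-bucket/scripts/transform.py",  # hypothetical script
                    "--input", "s3://my-etl-bucket/raw/",
                    "--output", "s3://my-etl-bucket/curated/",
                ],
            },
        }],
    )
    return {"step_ids": response["StepIds"]}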

Developed SSRS reports that connected to multidimensional and tabular models in Azure Analysis Services.

Demonstrated proficiency in PL/SQL for data manipulation and extraction.

Expertise in data visualization techniques, employing tools such as Microsoft Power BI to create intuitive and interactive visual representations of complex data sets.

Utilized PL/SQL for creating dynamic SQL queries, allowing for flexible data retrieval and manipulation.

Designed and developed Power BI reports and SSRS dashboards to align with specific business requirements.

Demonstrated proficiency in using data analytics tools such as Tableau, Power BI, or Excel for data visualization.

Exceptional skills in SQL Server Reporting Services, Analysis Services, Tableau, Power BI, and data visualization tools.

Hands-on experience in MS SQL Server 2016, 2014, 2012, 2008 R2, 2008, 2005, and 2000 with Business Intelligence in SQL Server Integration Services, SQL Server Analysis Services, and SQL Server Reporting Services.

Developed complex SQL queries to extract valuable insights from relational databases such as MySQL and DB2.

Designed and implemented a comprehensive data model using ER/Studio for a complex e-commerce platform, enhancing data integrity, and improving query performance.

Education: Master's in Computer Science, University of Missouri-Kansas City

TECHNICAL SKILLS

Programming Languages

R, Python, Java, SQL, JavaScript, C, C++, YAML, Scala, PL/SQL, J2EE, JDBC.

Databases

MySQL, SQL Server, Oracle, MS Access, PostgreSQL, MongoDB, Teradata.

Web Technologies

HTML, CSS, JSP, Bootstrap, Ajax, Hadoop

Tools and Technologies

Tableau, Power BI, ArcGIS, Gephi, QlikView, Microsoft Excel, Informatica, ETL.

Cloud Computing Tools

Amazon AWS (EMR, EC2, S3, RDS, Redshift, Snowflake), Microsoft Azure (Data Lake, Data Storage), Azure DevOps.

Azure Cloud Platform

ADFv2, Blob Storage, ADLS, Azure SQL DB, SQL Server, Azure Synapse, Azure Analysis Services, Databricks, Mapping Data Flow (MDF), Azure Data Lake (Gen1/Gen2), Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning, App Services, Logic Apps, Event Grid, Service Bus, Azure DevOps, Git repository management, ARM Templates

Big Data Ecosystem

HDFS, Hive, HBase, Sqoop, MapReduce, Spark (PySpark, Scala), Kafka, Hadoop, Cassandra

PROFESSIONAL EXPERIENCE

Sr. SQL Developer/Sr. Big Data Engineer, A & F, San Francisco, CA Apr 2023 – Present

Roles and Responsibilities:

Designed and implemented a scalable data processing pipeline using Azure Databricks, Azure Data Lake, and Azure Data Factory to ingest, process, and store data from various sources.

Designed and implemented end-to-end data pipelines on Azure using Azure Databricks, Azure Data Factory, and Azure Data Lake Storage, ensuring data ingestion, transformation, and loading.

Developed custom Python scripts to perform data transformations and data cleansing in Azure Databricks, improving data quality and accuracy.
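
For illustration, a minimal PySpark cleansing sketch of the kind of transformation described above, as it might run in a Databricks notebook; the ADLS Gen2 paths and column names are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-cleansing").getOrCreate()

# Hypothetical source and target paths in ADLS Gen2.
raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/customers/"
curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/customers/"

df = spark.read.json(raw_path)

cleansed = (
    df.dropDuplicates(["customer_id"])                       # remove duplicate records
      .filter(F.col("customer_id").isNotNull())              # drop rows missing the key
      .withColumn("email", F.lower(F.trim(F.col("email"))))  # normalize email casing/whitespace
      .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))
      .fillna({"country": "UNKNOWN"})                        # default missing country codes
)

cleansed.write.mode("overwrite").format("delta").save(curated_path)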

Created and managed Azure Data Lake Storage Gen2 for storing large volumes of structured and unstructured data, optimizing data access and retrieval.

Orchestrated complex data workflows in Azure Data Factory to automate data integration processes and ensure data consistency across the organization.

Implemented Snowflake data warehousing solutions on Azure to enable efficient data storage, retrieval, and analytics for business intelligence.

Utilized Azure Synapse (formerly SQL Data Warehouse) for high-performance querying and analytics on large datasets, enhancing real-time decision-making capabilities.

Leveraged Microsoft Azure services to build a comprehensive big data ecosystem, encompassing data lakes, data warehouses, and real-time data processing.

Set up continuous integration and continuous deployment (CI/CD) pipelines in Azure DevOps for data pipelines, improving development and deployment efficiency.

Designed and developed interactive dashboards using Microsoft Power BI to provide data-driven insights to stakeholders, improving data visualization and reporting.

Implemented data partitioning and clustering in Azure Data Lake and Azure Synapse to enhance query performance and reduce costs.
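
A brief sketch of the Data Lake side of this, assuming a hypothetical sales dataset; it writes Parquet partitioned by year and month so date-filtered queries prune whole folders (the Synapse clustering piece is not shown).

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("partitioned-write").getOrCreate()

# Hypothetical curated source table in ADLS Gen2.
sales = spark.read.format("delta").load(
    "abfss://curated@mydatalake.dfs.core.windows.net/sales/"
)

# Derive partition columns, then write partitioned Parquet to the serving zone.
(sales
    .withColumn("year", F.year("order_date"))
    .withColumn("month", F.month("order_date"))
    .write
    .partitionBy("year", "month")
    .mode("overwrite")
    .format("parquet")
    .save("abfss://serving@mydatalake.dfs.core.windows.net/sales_partitioned/"))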

Worked on the Hadoop ecosystem using PySpark, Python, and Hive, and replicated the same workloads using AWS Glue.

Integrated Apache Spark and PySpark into data processing workflows, enabling distributed data processing and machine learning capabilities.

Implemented ETL (Extract, Transform, Load) processes to transform and load data from various sources into a centralized data warehouse using Azure Data Factory.

Optimized SQL queries and data modelling techniques in data warehousing solutions to improve query performance and data retrieval speed.

Implemented security measures such as encryption, access controls, and role-based access control (RBAC) to ensure data protection and compliance.

Collaborated with cross-functional teams to gather requirements, define data schemas, and deliver end-to-end data solutions on the Microsoft Azure platform.

Developed automated data monitoring and alerting systems to proactively identify and resolve data quality and processing issues.

Built a reusable REST API framework for seamless data integration with MongoDB, enabling other teams to access and update data.
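
For illustration, a minimal sketch of such an API, shown here with Flask and PyMongo (not necessarily the framework actually used); the database, collection, and route names are hypothetical.

from flask import Flask, jsonify, request
from pymongo import MongoClient
from bson.objectid import ObjectId

app = Flask(__name__)
# Hypothetical connection string, database, and collection names.
collection = MongoClient("mongodb://localhost:27017")["analytics"]["customers"]

@app.route("/customers/<customer_id>", methods=["GET"])
def get_customer(customer_id):
    doc = collection.find_one({"_id": ObjectId(customer_id)})
    if doc is None:
        return jsonify({"error": "not found"}), 404
    doc["_id"] = str(doc["_id"])  # ObjectId is not JSON-serializable
    return jsonify(doc)

@app.route("/customers/<customer_id>", methods=["PATCH"])
def update_customer(customer_id):
    changes = request.get_json()
    result = collection.update_one({"_id": ObjectId(customer_id)}, {"$set": changes})
    return jsonify({"matched": result.matched_count, "modified": result.modified_count})

if __name__ == "__main__":
    app.run(port=8080)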

Integrated Azure Machine Learning to develop predictive models and gain actionable insights from data, contributing to data-driven decision-making.

Worked on performance tuning and optimization of Spark jobs and PySpark applications to achieve faster data processing.

Developed and maintained a centralized logging system using Shell Scripting, facilitating quick identification and resolution of system issues.

Created interactive dashboards and reports using Power BI for business intelligence reporting, providing actionable insights to stakeholders.

Utilized HDFS for distributed storage of large datasets, ensuring data reliability and fault tolerance.

Implemented Azure Data Share to securely share and collaborate on data with external partners, complying with data privacy and governance regulations.

Documented data pipelines, architecture, and best practices, enabling knowledge sharing and onboarding of new team members.

Sr. Data Engineer/Data Scientist, Molina Health Care, Kansas City, MO Sep 2022 – Mar 2023

Roles and Responsibilities:

Migrated the existing data from Teradata/SQL Server to Hadoop and performed ETL operations on it.

Responsible for loading structured, unstructured, and semi-structured data into Hadoop by creating static and dynamic partitions.

Designed and implemented end-to-end data pipelines for processing large datasets using R and Python, ensuring efficient data extraction, transformation, and loading (ETL) processes.

Ensured data quality and consistency by implementing data validation and cleansing processes within ETL pipelines.

Worked on different data formats such as JSON and performed machine learning algorithms in Python.

Created a task scheduling application to run in an EC2 environment on multiple servers.

Strong knowledge of various Data warehousing methodologies and Data modelling concepts.

Conducted performance tuning and optimization of SQL queries, ETL processes, and Databricks clusters to improve query response times and reduce costs.

Developed custom R scripts for data cleansing and transformation, enhancing data quality and accuracy in analytics.

Applied strong skills in Microsoft SSIS to streamline data extraction, transformation, and loading processes.

Created Physical Data Model from the Logical Data Model using Compare and Merge Utility in ER/Studio and worked with the naming standards utility.

Managed Power BI administration tasks, ensuring the platform's uptime and optimal performance.

Created data models that adhere to data privacy and security regulations (e.g., GDPR, HIPAA).

Documented technical specifications, data mappings, and ETL workflows for reference and compliance.

Successfully integrated data from older storage platforms such as MS SQL Server, Oracle, Teradata, ERP/SAP, ensuring compatibility and data consistency.

Exposure to Microsoft Azure in the process of moving on-prem data to the Azure cloud.

Used Azure DevOps to build and release different versions of code in different environments.

Collaborated with business teams to identify key performance indicators (KPIs) and metrics for monitoring.

Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services. Knowledge of U-SQL and how it can be used for data transformation as part of a cloud data integration strategy.

Implemented Spark and Apache Beam machine learning algorithms and successfully deployed them to production.

Used CI/CD tools Jenkins, Git/GitLab, Jira and Docker registry/daemon for configuration management and automation using Ansible.

Orchestrated data workflows using Airflow, ensuring data pipelines were reliable and timely.
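
For illustration, a minimal Airflow DAG of the kind used for such orchestration; the DAG ID, schedule, and task callables are hypothetical stubs, not the production pipeline.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; real tasks would call the actual ETL code.
def extract(**_):
    print("extract source data")

def transform(**_):
    print("apply transformations")

def load(**_):
    print("load into the warehouse")

with DAG(
    dag_id="daily_sales_pipeline",        # hypothetical pipeline name
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 6 * * *",        # run every day at 06:00
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load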

Created and managed storage accounts in the Azure Portal, along with pipelines, linked services, and datasets in Azure Data Factory (ADF).

Created containers in Docker, moved ETL pipelines from SQL Server to the Hadoop environment, and worked on GLBA compliance.

Designed and Developed ETL jobs using Talend Big Data ETL.

Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
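
A minimal sketch of that pattern, reading two hypothetical file formats and aggregating with Spark SQL; the paths, view names, and columns are placeholders for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Hypothetical input locations; the same pipeline can ingest several formats.
events = spark.read.json("hdfs:///data/raw/usage_events/")                  # JSON logs
members = spark.read.option("header", True).csv("hdfs:///data/raw/members.csv")  # CSV extract

events.createOrReplaceTempView("events")
members.createOrReplaceTempView("members")

# Spark SQL aggregation over the joined views.
usage_by_plan = spark.sql("""
    SELECT m.plan_type,
           COUNT(*)                    AS event_count,
           COUNT(DISTINCT e.member_id) AS active_members
    FROM events e
    JOIN members m ON e.member_id = m.member_id
    GROUP BY m.plan_type
""")

usage_by_plan.write.mode("overwrite").parquet("hdfs:///data/curated/usage_by_plan/")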

Packaged the application for deployment into a docker container using Docker and YAML config files.

Implemented a CI/CD pipeline using Jenkins and Airflow for Docker containers and Kubernetes.

Used advanced SQL methods to code, test, debug, and document complex database queries.

Implemented Docker containers to create portable and reproducible environments for data science and application deployment.

Designed and developed Scala workflows for data pull from cloud-based systems and applying transformations on it.

Ability to develop reliable, maintainable, efficient code in SQL, Linux shell, and Python.

Implemented Apache Spark code to read multiple tables from real-time records and filter the data based on requirements.

Assisted in the design and implementation of API integrations between different software systems.

Designed and maintained PL/SQL-based data validation and cleansing routines, improving data quality.

Integrated GIS data with business intelligence tools, such as Power BI, for reporting and visualization.

Conducted market segmentation analysis using R and presented findings to guide product development decisions.

Developed complex Snowflake queries to support various reporting and analytics needs, improving query performance and reducing costs.

Developed executive-level dashboards with KPIs and financial metrics in Tableau for C-level executives.

Created interactive and visually compelling reports using PowerBI, providing key stakeholders with actionable insights into business performance and trends.

Designed and implemented dynamic and parameterized reports in SSRS, allowing end-users to customize reports based on their specific requirements.

Developed a database monitoring dashboard using Python and Shell Scripting, providing real-time insights into system performance and health.

Stored final computation results in Cassandra tables and used Spark SQL and Spark Datasets to perform data computations.

Experienced in Agile processes; facilitated planning meetings and retrospectives.

Troubleshot and resolved complex production issues while providing data analysis and data validation; participated in Scrum team meetings.

Big Data Developer/Data Engineer, Motel 6, Kansas City, MO Jan 2022 – Aug 2022

Roles and Responsibilities:

Designed and developed data pipelines on the AWS cloud platform to efficiently collect, process, and store large volumes of data.

Maintained and improved existing data pipelines to ensure data quality and reliability for critical business operations.

Created solutions using the Snowflake database to support data storage and analytics needs, optimizing query performance and scalability.

Demonstrated hands-on troubleshooting experience by resolving software and data pipeline issues in a timely manner to minimize downtime and maintain data integrity.

Utilized strong analytical and problem-solving skills to address complex data engineering challenges and optimize data processing workflows.

Successfully managed and contributed to all phases of the software development life cycle, from requirements gathering to testing, ensuring the delivery of high-quality data solutions.

Worked with cross-functional teams to guarantee data accuracy and quality, facilitating data-driven decision-making across the organization.

Continuously monitored and optimized data pipeline performance to enhance data processing efficiency and reduce operational costs.

Enhanced data security by implementing encryption and access control mechanisms within AWS Glue ETL processes.

Led the design and implementation of RESTful APIs adhering to best practices, enabling seamless integration with external systems and partners.

Kept abreast of the latest technologies and industry trends in data engineering, integrating innovative solutions to improve data processing capabilities.

Demonstrated expertise in AWS cloud architecture, having hands-on experience with various AWS services, including S3, Redshift, Athena, DynamoDB, Lambda, Glue, EMR, Kinesis, and API Gateway.

Built and monitored CloudWatch alarms to proactively identify and address issues, ensuring the reliability and availability of data pipelines.
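
For illustration, a minimal boto3 sketch of one such alarm, alerting on errors from a data-load Lambda; the function name, SNS topic ARN, and thresholds are hypothetical placeholders.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical alarm: notify the team's SNS topic when the nightly data-load
# Lambda reports any errors within a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="data-pipeline-lambda-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "nightly-data-load"}],  # placeholder function
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-alerts"],      # placeholder topic ARN
)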

Developed and maintained data catalogs and metadata repositories in AWS Glue to facilitate data discovery and lineage tracking.

Proficiently utilized the AWS continuous integration and continuous deployment (CI/CD) suite, including CodeCommit, CodePipeline, and CloudFormation, to automate and streamline the deployment of data pipeline updates.

Strong experience with Apache Spark, preferred for its capabilities in processing large-scale data and enabling real-time analytics.

Implemented data versioning and change tracking mechanisms in MongoDB to enhance data governance and auditability.

Actively engaged in continuous learning to stay updated with the latest advancements in AWS, PySpark, and GIS tools.

Specialized in transforming data into user-friendly visualizations using Power BI, giving business users a complete view of their business.

Developed solutions using the Snowflake database, leveraging its features for data warehousing and analytics.

Excelled in troubleshooting complex SQL problems, demonstrating the ability to diagnose and resolve database-related issues efficiently.

Implemented data modeling best practices and architectural patterns to ensure efficient MongoDB schema design and data modeling.

Orchestrated complex workflows using Apache Airflow, optimizing scheduling and execution of data pipeline tasks.

Leveraged AWS DevOps tools to automate infrastructure provisioning, configuration management, and deployment processes for data applications.

Integrated AWS Glue with data visualization tools and BI platforms to enable data-driven decision-making.

Played a key role in creating and maintaining documentation for data pipelines, ensuring that the processes are well-documented and easy to understand for the team.

Collaborated with DevOps teams to implement data security best practices, including encryption and access controls, to protect sensitive data.

Data Analyst/ETL Developer, Coromandel International, Hyderabad, India Nov 2017 – Jul 2021

Roles and Responsibilities:

Developed Spark ETL scripts that collect application data and store it in HDFS.

Developed Spark jobs to load database log files from the Linux file system into HDFS for further processing, and developed bash scripts for importing Oracle database tables into HDFS using Sqoop.

Experienced in creating Hive databases and tables and writing Hive queries for data analysis to meet business reporting needs.

Developed Spark scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce for data aggregation, manipulation, and ordering, and loaded the results back into OLTP systems through Sqoop.
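
A minimal sketch of such a UDF-based aggregation, assuming a hypothetical Hive table of application transactions; the table names, columns, and banding logic are illustrative, and the downstream Sqoop export is not shown.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").enableHiveSupport().getOrCreate()

# Hypothetical Hive table of application transactions.
txns = spark.table("app_logs.transactions")

# A simple UDF that buckets transaction amounts into bands.
@F.udf(returnType=StringType())
def amount_band(amount):
    if amount is None:
        return "unknown"
    if amount < 100:
        return "low"
    if amount < 1000:
        return "medium"
    return "high"

summary = (
    txns.withColumn("band", amount_band(F.col("amount")))
        .groupBy("band")
        .agg(F.count("*").alias("txn_count"), F.sum("amount").alias("total_amount"))
        .orderBy("band")
)

# Write to a Hive staging table; a separate Sqoop export job (not shown)
# would push the result into the OLTP database.
summary.write.mode("overwrite").saveAsTable("staging.txn_band_summary")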

Involved in performance tuning of Informatica jobs.

Designed, developed, and deployed end-to-end data integration solutions.

Experienced with database engineering, creating data models, functions, and procedures, with an understanding of data pipelining through ETL jobs and customized shell script jobs.

Worked with application developers to design and architect applications from scratch, wrote database code, and suggested changes to their code to optimize application performance.

Implemented strategies for data archival and worked with application developer teams to purge unwanted data to increase the database performance.

Extensive experience with SQL, PL/SQL programming and query tuning.

Conducted performance tuning and optimization of MongoDB databases, enhancing data retrieval speed and overall system performance.

Created and scheduled PL/SQL jobs for automating data maintenance tasks and report generation.

Automated DBA monitoring tasks using bash scripts, such as monitoring ASM disk space, standby database log reports, and tablespace reports.

Jr. Data Analyst, Unify Technologies, Hyderabad, India Aug 2015 – Oct 2017

Roles and Responsibilities:

Designed an application for the in-house accounting system in MS Excel.

Enhanced an existing VBA application by writing new VBA code and by understanding and modifying existing VBA scripts.

Wrote data extraction and report generation modules for management use.

Developed macros and formulas in Excel 2007.

Created data-entry forms in MS Excel 2007 using data validation.

Created pivot tables and charts for reports.

Developed Excel VBA macros and database calls.

Troubleshot and maintained an existing MS Access database for the company's Data Inventory System department.

Extended the capabilities of an existing database used to track customer records, refining the data through additional queries.

Implemented PL/SQL-based security measures to protect sensitive data and ensure compliance with regulations.

Wrote SQL statements and stored procedures using SQL Server 2005.

Imported and exported data between Microsoft Access, Microsoft Excel, flat files, and SQL Server.

Created documentation for future reference and for training newly onboarded users.

Certifications:

1. Microsoft Certified Azure Data Engineer

2. AWS Certified Data Engineer

3. HackerRank Certified Python

4. HackerRank Certified SQL (Basic)

5. HackerRank Certified SQL (Intermediate)

6. HackerRank Certified SQL (Advanced)


