Data engineer

Location:

United States

Posted:

February 08, 2023

Resume:

Praneeth

Data Engineer

Email: ***************@*****.***

Phone: 469-***-****

Professional Summary:

Data Engineer with 8+ years of IT experience in data application development.

Strong development skills with Azure Data Lake, Azure Data Factory, SQL Data Warehouse, Azure Blob, and Azure Storage Explorer.

Architect and implement ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.

Fluent programming experience with PowerShell, Scala, Java, Python, SQL, T-SQL, and R.

Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark SQL, Kafka.

Experience developing data pipelines and data flows on cloud services such as AWS and Azure.

Worked on AWS Lambda, Athena, and Glue, as well as Azure SQL, Azure PowerShell, and Data Lake setup.

DevOps experience administering AWS instances and the QuickSight service through IAM and policy creation.

Adept at configuring and installing Hadoop/Spark Ecosystem Components.

Experienced working with IaC tools like AWS CloudFormation and Terraform.

Proficient with Spark Core, Spark SQL and Spark Streaming for processing and transforming complex data using in-memory computing capabilities written in Scala.

Worked with Spark to improve the efficiency of existing algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

Worked on Azure SQL and SQL administration under different management standards.

Extensive working experience on SaaS and IaaS platforms.

Expert in providing ETL solutions for any type of business model.

Experience working on Databricks platform for extensive use of Spark framework.

Develop Power BI reports & effective dashboards after gathering and translating end-user requirements.

Recreating existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data warehouse environment.

Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.

Strong virtualization experience with datacenter migration and Azure Data Services.

Experience in troubleshooting and resolving architecture problems including database and storage, network, security and applications.

Experience managing Big Data platform deployed in Azure Cloud.

Implemented Copy activity, Custom Azure Data Factory Pipeline Activities for On-cloud ETL processing.

Experience in Monitoring and Tuning SQL Server Performance.

Experience in configuration of report server and report manager for job scheduling, giving permissions to a different level of users in SQL Server Reporting Services (SSRS).

Expert in creating, debugging, configuring, and deploying ETL packages designed with MS SQL Server Integration Services (SSIS).

Configure SQL Azure firewall for a security mechanism.

Experienced in wearing multiple hats: Azure architecture/system engineering, network operations, and data engineering.

Design & implement migration strategies for traditional systems on Azure (Lift and shift/Azure Migrate, other third-party tools).

Collaborate with application architects on moving Infrastructure as a Service (IaaS) applications to Platform as a Service (PaaS).

Deploy Azure Resource Manager JSON Templates from PowerShell.

Experience in Performance Tuning and Optimization (PTO), Microsoft Hyper-V virtual infrastructure.

Experience in understanding the security requirements for Hadoop.

Extensive experience in working with Informatica PowerCenter

Implemented Integration solutions for cloud platforms with Informatica Cloud.

Worked with Java based ETL tool, Talend.

Proficient in SQL, PL/SQL and Python coding.

Experience developing on-premises and real-time processes.

Excellent understanding of best practices of Enterprise Data Warehouse and involved in Full life cycle development of Data Warehousing.

Expertise in DBMS concepts.

Experience integrating various data sources such as Oracle SE2, SQL Server, flat files, and unstructured files into a data warehouse.

Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.

Ensure compliance with server-specific architectural standards and implementation practices.

Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.

Implement ad-hoc analysis solutions using Azure Data Lake Analytics/Store, HDInsight.

Design and implement end-to-end data solutions (storage, integration, processing, visualization) in Azure.

Advise on cost efficiency for future usage and cost optimization for current infrastructure.

Establish, design, and support an enterprise Cloud SQL architecture that provides the required capacity, control and supports the long-term strategic goals for an enterprise database solution.

Technical Skills:

Programming Languages: Python, Spark, Scala, UNIX Shell Script, COBOL, SQL and PL/SQL

Tools: SSMS, Power BI, MS Office (Excel, SharePoint, Visio), Azure Data Factories, Azure Data Lake, Azure SQL Databases, Azure Management tools

Platforms: Windows, Linux

Databases: SQL Server, MySQL, Oracle

Version Control: SVN, CVS, TFS, Git, Bitbucket

Big Data Stack: Hadoop, Spark, MapReduce, Hive, Pig, Yarn, Sqoop, Flume, Kafka, Storm

Cloud: AWS Lambda, AWS Athena, AWS Glue, AWS QuickSight, Azure SQL, Azure PowerShell, Azure Data Lake, GCP

Operating Systems: Linux, Unix, z/OS and Windows

ETL Tools: IBM InfoSphere Information Server V8, V8.5 & V9.1

Professional Experience:

Client: AT&T, Dallas, Texas Mar 2022 – Present

Role: Azure Data Engineer

Responsibilities:

Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.

Extracted, transformed, and loaded data from various heterogeneous data sources and destinations using AWS Redshift.

Created Tables, Stored Procedures, and extracted data using T-SQL for business users whenever required.

Performed data analysis and design, and created and maintained large, complex logical and physical data models and metadata repositories using ERWIN and MB MDR.

Wrote shell scripts to trigger DataStage jobs.

Assist service developers in finding relevant content in the existing reference models.

Worked with sources such as Access, Excel, CSV, Oracle, and flat files using the connectors, tasks, and transformations provided by AWS Data Pipeline.

Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries.

Developed PySpark scripts to obfuscate raw data by applying hashing algorithms to client-specified columns.

Responsible for design, development, and testing of the database; developed stored procedures, views, and triggers.

Developed Python-based API (RESTful Web Service) to track revenue and perform revenue analysis.

Migrate data from traditional database systems to Azure databases.

Design and implement migration strategies for traditional systems on Azure (Lift and shift/Azure Migrate, other third-party tools).

Experience in DWH/BI project implementation using Azure Data Factory.

Interacts with Business Analysts, Users, and SMEs on elaborating requirements.

Design and implement end-to-end data solutions (storage, integration, processing, visualization) in Azure.

Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.

Set up and maintain Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.

Develop conceptual solutions & create proofs-of-concept to demonstrate viability of solutions.

Implement Copy activity, Custom Azure Data Factory Pipeline Activities.

Primarily involved in Data Migration using SQL, SQL Azure, Azure storage, and Azure Data Factory, SSIS, PowerShell.

Create C# applications to load data from Azure storage blob to Azure SQL, to load from web API to Azure SQL and scheduled web jobs for daily loads.

Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment. Experience in DWH/BI project implementation using Azure Data Factory and Databricks.

Architect, design and validate Azure infrastructure-as-a-Service (IaaS) environment

Develop dashboards and visualizations to help business users analyze data as well as providing data insight to upper management with a focus on Microsoft products like SQL Server Reporting Services (SSRS) and Power BI.

Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).

Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and process the data in Azure Databricks.

Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).

Used DirectQuery in Power BI to compare legacy data with current data, and generated and stored reports and dashboards.

Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS cubes (OLAP).

Created and formatted cross-tab, conditional, drill-down, Top N, summary, form, OLAP, sub-reports, ad-hoc reports, parameterized reports, interactive reports, and custom reports in SQL Server Reporting Services (SSRS).

Responsible for creating Requirements Documentation for various projects.

Strong analytical skills, proven ability to work well in a multi-disciplined team environment, and adept at learning new tools and processes with ease.

Environment: Azure SQL, Azure Storage Explorer, Azure Storage, Azure Blob Storage, Azure Backup, Azure Files, Azure Data Lake Storage, SQL Server Management Studio 2016, Visual Studio 2015, VSTS, Power BI, PowerShell, Spark, Python, ETL, Hive/Hadoop, Snowflake, AWS Data Pipeline.

Client: Comcast, Philadelphia, PA Feb 2021 – Feb 2022

Role: Data Engineer

Responsibilities:

Design & implement migration strategies for traditional systems on Azure (Lift and shift/Azure Migrate, other third-party tools).

Leveraged cloud and GPU computing technologies, such as AWS and GCP, for automated machine learning and analytics pipelines.

Worked with Confluence and Jira.

Designed and implemented configurable data delivery pipeline for scheduled updates to customer facing data stores built with Python

Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regressors) and Statistical Modeling

Experience in building and architecting multiple Data pipelines, end to end ETL and ELT process for Data ingestion and transformation in GCP

Strong understanding of AWS components such as EC2 and S3

Implemented a continuous delivery pipeline with Docker and GitHub.

Performed Data Migration to GCP

Responsible for data services and data movement infrastructures

Experienced in ETL concepts, building ETL solutions and Data modeling

Architected the ETL transformation layers and wrote Spark jobs to do the processing.

Aggregated daily sales team updates into reports for executives and organized jobs running on Spark clusters.

Used various sources to pull data into Power BI such as SQL Server, Excel, Oracle, SQL Azure, etc.

Implement ad-hoc analysis solutions using Azure Data Lake Analytics/Store, HDInsight.

Collaborate with application architects on moving Infrastructure as a Service (IaaS) applications to Platform as a Service (PaaS).

Built complex distributed systems involving large-scale data handling, metrics collection, data pipelines, and analytics.

Architect and implement ETL and data movement solutions using Azure Data Factory and SSIS; create and run SSIS packages on the ADF V2 Azure-SSIS IR.

Migrate data from traditional database systems to Azure databases.

Deployed Azure Resource Manager JSON templates from PowerShell. Worked on the Azure suite: Azure SQL Database, Azure Data Lake, Azure Data Factory, Azure SQL Data Warehouse, and Azure Analysis Services.

Engage with business users to gather requirements, design visualizations, and provide training to use self-service BI tools.

Design and implement end-to-end data solutions (storage, integration, processing, visualization) in Azure.

Implemented a continuous delivery pipeline with Docker, GitHub, and AWS.

Built performant, scalable ETL processes to load, cleanse and validate data

Participated in the full software development lifecycle with requirements, solution design, development, QA implementation, and product support using Scrum and other Agile methodologies

Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.

Identify potential problems and recommend alternative technical solutions.

Participating in Technical Architecture Documents, Project Design, and Implementation discussions.

Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.

Migration of on-premise data (Oracle/ SQL Server/ DB2/ MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).

Environment: SQL Server Management Studio 2014, Visual Studio 2015, VSTS, Azure SQL, Azure Storage Explorer, GCP, BigQuery, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, Python, Scala, Spark, Hive, Spark SQL.

Client: Amex, Phoenix, AZ Dec 2019 – Jan 2021

Role: Data Engineer

Responsibilities:

Perform software installations and upgrades to operating systems and layered software packages.

Monitor communications performance using visual, diagnostic equipment, status indicator checking methods, etc., to locate problems

Manage the lifecycle for technical support documentation including Standard Operating Procedures and work instructions.

Analyze and cleanse raw data using HiveQL.

Experience in data transformations using MapReduce and Hive for different file formats.

Involved in converting Hive/SQL queries into transformations using Python.

Performed complex joins on Hive tables with various optimization techniques.

Created Hive tables as per requirements, internal or external tables defined with appropriate static and dynamic partitions, intended for efficiency.

Experience in fact dimensional modeling (Star schema, Snowflake schema), transactional modeling and SCD (Slowly changing dimension)

Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and Materialized views to optimize query performance.

Developed logistic regression models (Python) to predict subscription response rate based on customers variables like past transactions, response to prior mailings, promotions, demographics, interests, and hobbies, etc.

Developed near-real-time data pipelines using Spark.

Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.

Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver.

Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines

Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regressors) and Statistical Modeling

Schedule installations and upgrades and maintain them in accordance with established IT policies and procedures.

Use network management tools such as the Cisco Identity Services Engine (ISE) and Cisco Prime to identify, troubleshoot, and resolve wired and wireless networking issues on LAN equipment and/or PCs and other end-user devices.

Track and report issues in our ticketing system.

Participate in statewide projects to plan and implement system operation, optimization, enhancements, and upgrades.

Set up, configure, and troubleshoot AV collaboration systems and software used by staff.

Ensure data/media recoverability by implementing a schedule of system backups and database archive operations. Implement and promote standard operating procedures.

Administer data backup systems at remote offices to ensure data availability, security, and recoverability. Quickly restore data accidentally deleted by customers.

Develop Informatica cloud real time processes (ICRT).

Work with WSDL, SOAP UI for APIs

Write SOQL queries and create test data in Salesforce for unit testing Informatica Cloud mappings.

Prepare TDDs and test case documents after each process has been developed.

Environment: SQL Server, Windows Server, Windows 2008 R2, Windows 2000 Advanced Server, Windows XP, Unix, Linux, Shell Script, Hadoop, HDFS, Hive, Pig, Cloudera, MapReduce, Python, Informatica Cloud Services, Salesforce, Unix scripts, Flat Files, XML files

Client: Bank of the West, Mission Viejo, CA Aug 2018 – Nov 2019

Role: Data Engineer

Responsibilities:

Heavily involved in data architecture and application design using cloud and big data solutions on AWS and Microsoft Azure.

Led the effort to migrate the legacy system to a Microsoft Azure cloud-based solution, redesigning the legacy application solutions with minimal changes to run on the cloud platform.

Built data pipelines to load data from the legacy SQL Server to Azure databases using Azure Data Factory, API gateway services, SSIS packages, Talend jobs, and custom .NET and Python code.

Built Azure WebJobs and Functions for Product Management teams to connect to different APIs and sources, extract data, and load it into Azure Data Warehouse.

Built various pipelines integrating Azure with AWS S3 to bring data into Azure databases.

Set up Hadoop and Spark clusters for various POCs, specifically to load cookie-level data and for real-time streaming. Integrated with other ecosystems like Hive, HBase, Spark, and HDFS/Data Lake/Blob Storage.

Set up a Spark cluster to process more than 2 TB of data and load it into SQL Server. In addition, built various Spark jobs to run data transformations and actions.

Wrote various APIs to connect to media data feeds such as Prisma, DoubleClick Management, Twitter, Facebook, Instagram, and Amnet, retrieving data using Azure WebJobs and Functions integrated with Cosmos DB.

Built a trigger-based mechanism to reduce the cost of resources such as WebJobs and Data Factories using Azure Logic Apps and Functions.

Worked extensively with relational databases such as PostgreSQL as well as MPP databases like Redshift.

Experience in custom process design of transformations via Azure Data Factory and automation pipelines. Extensively used Azure services such as Azure Data Factory and Logic Apps for ETL, moving data between databases and Blob storage, HDInsight HDFS, and Hive tables.

Client: Genpact, India Jul 2016 – Jul 2017

Role: Big Data Engineer

Responsibilities:

Imported data in various formats such as JSON, Sequential, Text, CSV, Avro, and Parquet into the HDFS cluster, with compression for optimization.

Worked on ingesting data from RDBMS sources such as Oracle, SQL Server, and Teradata into HDFS using Sqoop.

Loaded all datasets from source CSV files into Hive and Cassandra using Spark.

Created an environment to access loaded data via Spark SQL through JDBC and ODBC (via Spark Thrift Server).

Developed real-time data ingestion and analysis using Kafka and Spark Streaming.

Configured Hive and wrote Hive UDFs and UDAFs. Also created static and dynamic partitions with bucketing as required.

Worked on writing Scala programs using Spark on Yarn for analyzing data.

Managing and scheduling Jobs on a Hadoop cluster using Oozie.

Created Hive external tables, loaded data into them, and queried the data using HQL.

Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.

Developed Oozie workflow for scheduling and orchestrating the ETL process and worked on Oozie workflow engine for job scheduling.

Managed and reviewed the Hadoop log files using Shell scripts.

Migrated ETL jobs to Pig scripts to do transformations, even joins and some pre-aggregations before storing the data onto HDFS.

Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch.

Performed real-time streaming transformations on data using Kafka and Kafka Streams.

Built NiFi dataflows to consume data from Kafka, transform it, place it in HDFS, and expose a port to run Spark Streaming jobs.

Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform it, and insert it into HBase.

Implemented Spark using Scala and SparkSQL for faster testing and processing of data.

Experience in managing and reviewing huge Hadoop log files.

Collected log data from web servers and integrated it into HDFS using Flume.

Expertise in designing and creating various analytical reports and Automated Dashboards to help users to identify critical KPIs and facilitate strategic planning in the organization.

Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting.

Worked with Avro Data Serialization system to work with JSON data formats.

Used Amazon Web Services (AWS) S3 to store large amounts of data in identical/similar repositories.

Worked with the Data Science team to gather requirements for various data mining projects.

Automated the process of rolling day-to-day reporting by writing shell scripts.

Involved in building applications using Maven and integrating them with continuous integration servers like Jenkins to run build jobs.

Worked with BI tools such as Tableau to create dashboards, including daily, weekly, and monthly reports, using Tableau Desktop, and published them to the HDFS cluster.

Environment: Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hadoop, HDFS, Hive, Oozie, Pig, NiFi, Sqoop, AWS (EC2, S3, EMR), Shell Scripting, HBase, Jenkins, Tableau, Oracle, MySQL, and Teradata.

Client: Sutherland, India Apr 2013 – Jun 2016

Role: Data Analyst

Responsibilities:

Responsible for gathering requirements from Business Analyst and Operational Analyst and identifying the data sources required for the request.

Worked closely with a data architect to review all the conceptual, logical and physical database design models with respect to functions, definition, maintenance review and support data analysis, Data quality and ETL design that feeds the logical data models.

Maintained and developed complex SQL queries, stored procedures, views, functions, and reports that qualify customer requirements using SQL Server 2012.

Creating automated anomaly detection systems and constant tracking of its performance.

Supported Sales and Engagement management in planning and decision-making on sales incentives.

Used statistical analysis, simulations, predictive modelling to analyze information and develop practical solutions to business problems.

Extending the company's data with third-party sources of information when needed.

Developed several types of sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports using SSRS, delivered through mail server subscriptions and SharePoint server.

Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services (SSRS).

Developed the reports and visualizations based on the insights mainly using Tableau and dashboards for the company insight teams.

Environment: SQL Server 2012, SSRS, SSIS, SQL Profiler, Tableau, QlikView, Agile, ETL, Anomaly detection.

Education:

• Bachelor's in Computer Science from GITAM, India.

• Master's in Computer Science from UTA, Texas.
