Software Development Data Engineer

Location:
Hyderabad, Telangana, India
Posted:
February 22, 2024

Sr. Data Engineer/Analyst

Greeshma Sai

Contact No.: 713-***-****

Email ID: ad3t29@r.postjobfree.com

PROFESSIONAL SUMMARY:

Overall 9+ years of professional experience in the IT industry, including hands-on experience in various software development phases: requirement specification, design, development, and implementation of business applications using Java and Big Data.

Experienced in Software Development Life Cycle (SDLC) implementing Waterfall, V-Model and Agile methodologies.

Experience in migrating on-premises ETLs to GCP using cloud-native tools such as BigQuery, Cloud Dataproc, and Google Cloud Storage.

Experience in Hadoop and Java development and administration, with expertise in Big Data, Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, SQL tuning, ETL development, database development, and data modeling.

Good working experience in the Amazon Web Services (AWS) cloud platform, including services like EC2, S3, IAM, DynamoDB, CloudWatch, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, and CloudFormation.

Hands-on experience in installing, configuring, and monitoring HDFS clusters.

Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.

Hands-on experience in application development using Oracle (SQL, PL/SQL), Python, and shell scripting.

Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.

Built a Scala- and Spark-based configurable framework to connect common data sources like MySQL, Oracle, Postgres, SQL Server, Salesforce, and BigQuery and load them into BigQuery.

Performed deep analysis of SQL execution plans and recommended hints, restructuring, indexes, or materialized views for better performance.

UNIX scripting to manage Hadoop operations.

Created data connections and published them on Tableau Server for use with operational and monitoring dashboards.

Created Azure SQL database, performed monitoring and restoring of Azure SQL database. Performed migration of Microsoft SQL server to Azure SQL database.

Developed Job Scheduler scripts for data migration using UNIX Shell scripting.

Created interactive UI Design in OBIEE Answers and Dashboards, including experience in User Interface Design using CSS.

Good experience in Linux Bash scripting and following PEP 8 guidelines in Python.

Experience developing Pig Latin and HiveQL scripts for Data Analysis and ETL purposes and extended the default functionality by writing User Defined Functions (UDFs) for data specific processing.

Experience with migrating data to and from RDBMS and unstructured sources into HDFS using Sqoop & Flume.

Developed REST APIs using Python with the Flask and Django frameworks, and integrated various data sources including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
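
A minimal sketch of such a Flask endpoint, assuming a hypothetical route and an SQLite stand-in for the actual RDBMS sources:

    # Hypothetical Flask REST endpoint; table and route names are illustrative only.
    import sqlite3
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/customers", methods=["GET"])
    def get_customers():
        conn = sqlite3.connect("example.db")      # stand-in for the real JDBC/RDBMS source
        conn.row_factory = sqlite3.Row
        rows = conn.execute("SELECT id, name FROM customers LIMIT 100").fetchall()
        conn.close()
        return jsonify([dict(r) for r in rows])

    if __name__ == "__main__":
        app.run(port=5000)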

Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.

Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

Experience in designing connection pools and schemas like Star and Snowflake Schema on the OBIEE repository.

Well-versed database development knowledge using SQL data types, Indexing, Joins, Views, Transactions, Large Objects and Performance tuning.

Participated in the development, improvement, and maintenance of Snowflake database applications.

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.

Sound knowledge of Object-Oriented Programming (OOP) and Object-Oriented Analysis and Design (OOAD) concepts, with design experience using StarUML and Rational Rose to develop UML design artifacts such as class diagrams, sequence diagrams, and use case realizations.

Understanding of Snowflake cloud technology.

Experience in implementing/supporting data security at the group level and UI security using web groups in OBIEE WebLogic.

Hands-on experience with UNIX and shell scripting to automate tasks.

Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python.
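
For illustration, a small pandas sketch of the cleansing/import/export pattern described above; file names and rules are hypothetical:

    import pandas as pd

    # Hypothetical input file and cleansing rules.
    df = pd.read_csv("raw_orders.csv", parse_dates=["order_date"])
    df = df.drop_duplicates(subset=["order_id"])            # de-duplicate
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df = df.dropna(subset=["order_id", "amount"])           # drop incomplete rows
    df["region"] = df["region"].str.strip().str.upper()     # normalize text
    df.to_csv("clean_orders.csv", index=False)              # export for downstream load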

Defined virtual warehouse sizing for Snowflake for different types of workloads.

Hands-on experience architecting ETL transformation layers and writing Spark jobs to do the processing.
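
A minimal PySpark sketch of one such transformation layer, with hypothetical paths and column names:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-transform").getOrCreate()

    # Hypothetical raw and curated locations.
    raw = spark.read.option("header", True).csv("s3a://example-bucket/raw/sales/")

    curated = (raw
               .withColumn("amount", F.col("amount").cast("double"))
               .filter(F.col("amount") > 0)
               .withColumn("load_date", F.current_date())
               .dropDuplicates(["sale_id"]))

    curated.write.mode("overwrite").partitionBy("load_date") \
           .parquet("s3a://example-bucket/curated/sales/")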

Automating ETL process by using Unix/Perl/Python scripting languages.

Hands-on experience with Snowflake utilities, SnowSQL, SnowPipe, Big Data model techniques using Python / Java.

Design relational and non-relational data stores on Azure.

Experience with the Snowflake cloud data warehouse and GCP buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.

Strong knowledge of Google BigQuery and architecting data pipelines from on-prem to GCP.

Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.

Performance tuning of Big Data workloads.

Develop stored procedures/views in Snowflake and use in Talend for loading Dimensions and Facts.

Developed Framework scripts to automate Workflows in Data warehouse systems using Unix/Perl scripting language.

Implemented Azure Databricks clusters, notebooks, jobs, and autoscaling.

Good knowledge of ETL concepts and hands-on experience in ETL development.

Operationalize data ingestion, data transformation and data visualization for enterprise use.

Implemented innovative solutions using various Hadoop ecosystem tools like Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, and Zookeeper.

Hands-on experience in Azure Analytics Services: Azure Data Lake Store (ADLS), Azure Data Lake Analytics (ADLA), Azure SQL DW, Azure Data Factory (ADF), Azure Databricks (ADB), etc.

Performed logical and physical data structure designs and DDL generation to facilitate the implementation of database tables and columns out to the DB2, SQL Server, AWS Cloud (Snowflake) and Oracle DB schema environment using ERwin Data Modeler Model Mart Repository version 9.6.

Involved in migrating the client data warehouse architecture from on-premises into Azure cloud.

TECHNICAL SKILLS:

Hadoop Distribution

Cloudera (CDH5 and CDP)

Big Data Ecosystem

HDFS, YARN, Mesos, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, Zookeeper, and Oozie.

NoSQL Databases

HBase.

Languages

Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting, Python.

Java & J2EE Technologies

Core Java, Servlets, Hibernate, Spring, Struts.

Webservices API

Restful (JSON, XML)

Application Servers

WebLogic, WebSphere, JBoss, Tomcat.

Cloud Computing Tools

Azure, GCP, AWS

Databases

MySQL, Oracle, DB2

Operating Systems

UNIX, Windows, LINUX.

Development Tools

Microsoft SQL Studio, Toad, Eclipse, NetBeans.

Development Methodologies

Agile/Scrum, Kanban.

WORK EXPERIENCE:

Client: Charter Communications, Englewood, CO Duration: Nov 2022 to Present

Role: Sr. Data Engineer

Responsibilities:

Developed Spark jobs with the Java API to process data from AWS S3 to AWS RDS.

Developed AWS Lambdas using both Java and Python to automate the flow.
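
A minimal Python Lambda sketch of that automation pattern, assuming a hypothetical S3 event trigger and staging prefix:

    import json
    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        # Hypothetical event shape: an S3 put notification.
        record = event["Records"][0]["s3"]
        bucket, key = record["bucket"]["name"], record["object"]["key"]

        # Example action: copy the new object into a staging prefix.
        s3.copy_object(Bucket=bucket,
                       CopySource={"Bucket": bucket, "Key": key},
                       Key=f"staging/{key}")
        return {"statusCode": 200, "body": json.dumps({"staged": key})}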

Prepared execution plans using AWS Step Functions (state machines).

Processed raw data and loaded it into the staging layer using the AWS Glue ETL service.
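
For illustration, a short AWS Glue (PySpark) job sketch of a raw-to-staging load; the catalog database, table, and target path are hypothetical:

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glueContext = GlueContext(SparkContext.getOrCreate())
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # Hypothetical Glue Data Catalog source (raw zone).
    raw = glueContext.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="events")

    # Light typing before landing the data in the staging zone.
    staged = ApplyMapping.apply(
        frame=raw,
        mappings=[("event_id", "string", "event_id", "string"),
                  ("event_ts", "string", "event_ts", "timestamp")])

    glueContext.write_dynamic_frame.from_options(
        frame=staged, connection_type="s3",
        connection_options={"path": "s3://example-bucket/staging/events/"},
        format="parquet")

    job.commit()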

Implemented a Git branching strategy and JUnit test cases, achieving more than 80% code coverage in SonarQube reports.

Worked on action items like iBots in OBIEE to provide real-time, personalized, and actionable intelligence data.

Created intelligent UI Design on OBIEE Answers, Delivers, and Dashboards utilizing CSS, HTML, and JavaScript.

Developed AWS Glue jobs to process data from Redshift to S3 (master).

Developed AWS PySpark jobs to process master data (S3) into FHIR (Fast Healthcare Interoperability Resources) data and load it into S3.

Migrated the existing project to Microsoft Azure using Azure services.

Used Azure Resource Manager (ARM) security, auditing, and tagging features to manage resources after deployment.

Created reports for business clients utilizing data visualization tools like OBIEE, Oracle Business Intelligence Publisher, and Tableau.

Implemented AWS Lambda functions to process data from S3.

Prepared test scripts to validate the data from source to master and master to FHIR.

Involved in release plan creation and production deployments.

Worked on the installation, configuration, and set up of the OBIEE administration tool and platform.

Involved in Jenkins pipeline creation & automation scripts.

Worked on performance tuning to process big data.

Prepared data models to process data using dbt in Snowflake.

Prepared test cases, generated documentation, and deployed builds into environments.

Worked extensively with Snowflake (created file formats, tables, views, S3 integration connections, Snowpipes, stored procedures, tasks, time travel features, etc.).

Created the total security framework for reports in OBIEE, BI Publisher, and Tableau.

Implemented Snowflake Data Pipeline to configure Auto-Ingestion from AWS S3 to Snowflake table.

Hands-on loading and unloading of data using COPY, PUT, LIST, GET, and REMOVE commands in SnowSQL.
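
The same load pattern can be driven from the Python connector instead of the SnowSQL CLI; a sketch with hypothetical connection details, stage, and table names:

    import snowflake.connector

    # Hypothetical account, warehouse, and schema settings.
    conn = snowflake.connector.connect(
        account="example_account", user="loader", password="...",
        warehouse="LOAD_WH", database="ANALYTICS", schema="STAGING")

    cur = conn.cursor()
    # Stage a local file, then load it: PUT + COPY INTO, as in SnowSQL.
    cur.execute("PUT file:///tmp/orders.csv @%ORDERS AUTO_COMPRESS=TRUE")
    cur.execute("""
        COPY INTO ORDERS
        FROM @%ORDERS
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    cur.close()
    conn.close()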

Developed Splunk dashboards and Power BI reports for daily and monthly monitoring.

Environment: Spark Java, AWS (Lambda, Kubernetes, S3, SNS, Redshift, CloudWatch, RDS, Athena), Git repository, PostgreSQL, SQL, Python, Splunk, Bamboo CI/CD

Client: CVS PHARMACY, Woonsocket, RI Duration: Feb 2020 – Oct 2022

Role: Data Engineer

Responsibilities:

Experience with the Snowflake cloud data warehouse and GCP buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.

Responsible for managing data coming from different sources.

Extensive experience in IT data analytics projects, with hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.

Developed server-side code for Google Cloud Platform-based apps, building dependable, high-volume production applications and rapidly developing prototypes.

Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.

Built a Scala- and Spark-based configurable framework to connect common data sources like MySQL, Oracle, PostgreSQL, SQL Server, Salesforce, and BigQuery and load them into BigQuery.

Worked on migrating Oracle databases to BigQuery.

Worked on loading Salesforce data on an incremental basis to the BigQuery raw and UDM layers using Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts.

Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
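
A minimal Apache Beam sketch of such a row-count validation; the bucket, project, and table names are hypothetical:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    RAW_FILE = "gs://example-bucket/raw/orders.csv"      # hypothetical source file
    BQ_TABLE = "example-project:staging.orders"          # hypothetical target table

    def run():
        opts = PipelineOptions(
            runner="DataflowRunner", project="example-project",
            region="us-central1", temp_location="gs://example-bucket/tmp")
        with beam.Pipeline(options=opts) as p:
            raw_count = (p | "ReadRaw" >> beam.io.ReadFromText(RAW_FILE, skip_header_lines=1)
                           | "CountRaw" >> beam.combiners.Count.Globally())
            bq_count = (p | "ReadBQ" >> beam.io.ReadFromBigQuery(table=BQ_TABLE)
                          | "CountBQ" >> beam.combiners.Count.Globally())
            # Compare the two totals and report any mismatch.
            _ = (raw_count
                 | "Compare" >> beam.Map(
                       lambda raw, bq: f"raw={raw} bq={bq} match={raw == bq}",
                       bq=beam.pvalue.AsSingleton(bq_count))
                 | "Report" >> beam.Map(print))

    if __name__ == "__main__":
        run()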

Worked with professional software engineering practices for the full software development life cycle, including coding standards, source control management, and build processes.

Developed Python code using version control tools like GitHub on Vagrant machines.

Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (Google Cloud) using Sqoop and gsutil.

Worked with Azure compute services, Azure Web Apps, Azure Data Factory & Storage, Azure Media & Content Delivery, Azure Networking, Azure Hybrid Integration, and Azure Identity & Access Management.

Connected Azure Databricks to other Azure services (such as Azure Storage) in a more secure manner using service endpoints.

Created BigQuery authorized views for row-level security and for exposing data to other teams.

Parsed JSON files through Spark Core to extract schemas for production data using Spark SQL and PySpark.
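
A short PySpark sketch of that JSON parsing and schema extraction, with a hypothetical landing path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-schema").getOrCreate()

    # Hypothetical landing path for production JSON files.
    events = spark.read.option("multiLine", True).json("gs://example-bucket/landing/events/")

    events.printSchema()                     # inferred schema for downstream table design
    events.createOrReplaceTempView("events")
    spark.sql("SELECT COUNT(*) AS cnt FROM events").show()   # quick Spark SQL sanity check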

Development and implementation of applications using Java, Python, Oracle, SQL Server, Hadoop, Apache Hive and Google Cloud platform aligned to specific customer pain points.

Submitted Spark jobs using gsutil and spark-submit and executed them on the Dataproc cluster.

Used Cloud Functions with Python to load data into BigQuery for CSV files arriving in a GCS bucket.
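
A minimal sketch of such a Cloud Function (a first-generation background function triggered on object finalize); the project, dataset, and table names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()
    TABLE_ID = "example-project.staging.orders"   # hypothetical target table

    def gcs_to_bq(event, context):
        # Triggered when a file lands in the GCS bucket; skip non-CSV objects.
        if not event["name"].endswith(".csv"):
            return
        uri = f"gs://{event['bucket']}/{event['name']}"
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND)
        client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()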

Strong experience in Business and Data Analysis, Data Profiling, Data Migration, Data Integration, Data governance and Metadata Management, Master Data Management and Configuration Management.

Created views in GCP BigQuery and built views in Looker to model the data coming from the BigQuery views.

Wrote scripts in HiveQL for creating complex tables with high-performance techniques like partitioning, clustering, and skewing.

Collaborating with and supporting data science, marketing, and customer success teams in data acquisition and tool integration.

Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from enterprise data in BigQuery.

Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing-related analysis of BigQuery usage.

Environment: Hadoop, HDFS, Hive, Spark, MapReduce, ETL, REST, Java, Python, PL/SQL, Oracle 11g, Unix/Linux, Snowflake and GCP.

Client: Equifax, Atlanta, GA Duration: Nov 2018 to Jan 2020

Role: Data Engineer

Responsibilities:

Analyzed technical and functional requirements.

Loaded and transformed data into S3 from different sources (MSSQL, semi-structured and unstructured data) using AWS DMS.

Developed AWS Glue ETL jobs (PySpark) to produce analytical data and store it in the Redshift EDW / S3.

Prepared AWS Lambda functions for event-driven operations in the pipeline.

Prepared execution plans using AWS Step Functions (state machines).

Coordinated with the client and managed day-to-day activities.

Created queries in both Athena and the warehouse that helped business analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
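
For illustration, a boto3 sketch of kicking off one such comparison query in Athena; the database, tables, and output location are hypothetical:

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Hypothetical trend check: fresh data vs. an EDW reference table.
    QUERY = """
    SELECT f.region, COUNT(*) AS fresh_rows, MAX(r.baseline_rows) AS baseline_rows
    FROM fresh_events f
    LEFT JOIN edw_reference r ON f.region = r.region
    GROUP BY f.region
    """

    resp = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"})
    print("Started query:", resp["QueryExecutionId"])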

Developed master screens (Customer, Agent, Product, etc.).

Created U-SQL scripts for transformation activities and developed complex queries to transform data from multiple sources and output it into Azure Data Warehouse.

Configured Blob storage services in Azure for storing large amounts of unstructured object data which can be accessible via HTTP and HTTPS.

Involved in performance tuning of SQL queries.

Developed and implemented physical data models to define the database structure, optimizing database performance through efficient indexing and table relationships.

Wrote HiveQL with performance tuning and monitoring; used event filters and wrote JSON objects to define and match audit events.

Optimized performance for data access requirements by choosing appropriate native Hadoop file formats (Avro, Parquet, ORC, etc.) and compression codecs.

Implemented data governance processes and methods for managing metadata, access, and retention of data for internal and external users.

Created SVN and GIT repositories and specified branch strategies for development.

Environment: Python, PySpark, AWS (S3, Lambda, DMS, Redshift DW, Athena, SNS, RDS), MSSQL, Databricks, Spark 2.x, Notebooks, HDInsight, Airflow, REST APIs, Shell Script, Scala.

Client: Infinite Computer Solutions, India Duration: Apr 2015 – Jan 2018

Role: Data Engineer

Responsibilities:

Worked on Agile methodologies and SCRUM process.

Performed Data Profiling to learn about user behavior and merged data from multiple data sources.

Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.

Performed K-means clustering, Multivariate analysis and Support Vector Machines in Python and R.

Developed Clustering algorithms and Support Vector Machines that improved Customer segmentation and Market Expansion.

Professional Tableau user (Desktop, Online, and Server).

Data storyteller, mining data from different data sources such as SQL Server, Oracle, cube databases, web analytics, and Business Objects.

Worked on the Agile methodology, responsible for monitoring and managing development and production, and worked with the Azure Portal to provide IaaS resources to the client.

Worked with Azure PaaS and IaaS solutions like Azure Web Apps, Web Roles, Worker Roles, SQL Azure, and Azure Storage.

Provided ad hoc analysis and reports to the executive-level management team.

Data Manipulation and Aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.

In the UNIX development environment, built batch processes and models for financial application reports using Perl and Korn shell scripts, with partitions and sub-partitions on an Oracle database.

Developed analytics and strategy to integrate B2B analytics in outbound calling operations.

Implemented analytics delivery on cloud-based visualization using Shiny tools for the Business Objects and Google Analytics platforms.

Served as the main source of the business regression report.

Created various B2B Predictive and descriptive analytics using R and Tableau.

Created and automated ad hoc reports.

Worked on NoSQL databases like Cassandra.

Parsing data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format.

Worked on different data formats such as JSON, XML and performed machine learning algorithms in R and Python.

Served as the SPOC data engineer and predictive analyst to create annual and quarterly business forecast reports.

Responsible for planning & scheduling new product releases and promotional offers.

Environment: R, Tableau, Python, UNIX Scripting, Cassandra, Java, Oracle, Eclipse, Spark Java, AWS (Lambda, EMR, S3, SNS, Redshift, CloudWatch, RDS, Athena), Git repository, PostgreSQL, SQL.

Client: Adani Group - India Duration: July 2013 – March 2015

Role: Data Analyst

Responsibilities:

•Responsible for development, testing, deployment, support and enhancement on Daily, Weekly, Quarterly, Yearly and Adhoc BAUs (Business As Usual).

•Balanced multiple projects and priorities in a fast-paced environment using the Waterfall methodology.

•Designed Power BI dashboards to meet business requirements with interactive reporting.

•Worked with clients to understand business needs and translate them into actionable reports using SQL, MS Excel, and Power BI.

•Collaborate with key stakeholders to find solutions as per the business requirements

•Automated reports for the sales team using filters and parameters, and suggested improvements that increased the sales conversion rate by 20%.

•Used stored procedures, triggers, and views to provide the structured data to the business units from multiple sources.

•Communicating with stakeholders to provide continuous support.

•Building and maintaining SQL scripts and complex queries for data analysis and extraction after getting input from the Business user.

•Scheduled data refreshes and file handling through Python scripting (see the sketch after this list).

•Working closely with various functional teams to translate business requirements into optimized semantic layers, reports and dashboards.

•Developing business process flows and data models for new system development or for process redesign.
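
A minimal sketch of the scheduled refresh and file-handling script referenced above; the folder layout is hypothetical and the script would be run from a scheduler such as cron:

    import shutil
    from pathlib import Path
    from datetime import datetime

    # Hypothetical drop folder and archive location for scheduled refreshes.
    INBOX = Path("/data/reports/inbox")
    ARCHIVE = Path("/data/reports/archive")

    def refresh():
        stamp = datetime.now().strftime("%Y%m%d")
        for csv_file in INBOX.glob("*.csv"):
            target = ARCHIVE / stamp / csv_file.name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(csv_file), target)   # hand the file off for the next refresh
            print(f"moved {csv_file.name} -> {target}")

    if __name__ == "__main__":
        refresh()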

Environment: SQL, MS Excel, Power BI, Python, Git, Jenkins pipelines, data models, Shell scripts, Linux, PySpark, AWS (Lambda, Glue, S3, SQS, Redshift, CloudWatch, RDS, Athena).


