
Data Engineer Analyst

Location:
Louisville, KY
Salary:
60/Hr
Posted:
May 26, 2023


Shivasharana Reddy

Phone: +1-502-***-****

Email: adxcnd@r.postjobfree.com

LinkedIn: https://www.linkedin.com/in/shivasharan-reddy-a49a0818/

Summary

10+ years of IT experience in data engineering and analytics.

Worked on distributed and cloud-based big data technologies such as Apache Hadoop, AWS, Azure, Databricks, Spark, PySpark, and Oracle; proficient in programming languages such as Python and Scala.

Expertise in building ETL frameworks using various tools and technologies.

Worked with a variety of data storage formats (Parquet, ORC, Avro, XML, JSON, XLS, CSV) and data types (structured, unstructured, and semi-structured).

Good exposure to supervised and unsupervised learning, natural language processing (NLP), and mathematical and statistical methods.

Developed data warehouse solutions in Hadoop using HDFS, Hive, Pig, Sqoop, HBase, Oozie, Cloudera Hue, Cloudera Manager, Scala, Spark, Python, Java, Impala, Ambari, and Ranger.

Developed cloud-based solutions using AWS Redshift, Glue, Lambda, Athena, S3, Spectrum, Azure Data Factory, Synapse, and Databricks.

Supported the creation of technology standards and reusable components.

Proficient in other tools and technologies such as Oracle 11g/12c, SQL Server, MySQL, and Netezza.

Experienced in designing, documenting, and implementing data warehouse strategies, including building ETL, ELT, and data pipeline processes.

Translated business requirements into high-level and low-level design documents.

Participated in task breakdown and effort estimation for upcoming projects.

Part of the review team implementing new changes on the analytical data platform; identified and researched emerging technologies, conducted proofs of value, and converted them into full-fledged projects.

Collaborated with stakeholders such as product managers, architects, analytics teams, and project managers to deliver solutions.

Maintained a complete picture of software lifecycle management and followed best practices throughout.

Supervised team activities, including work scheduling, technical direction, and standard development practices.

Believing in continuous improvement, developed multiple frameworks to improve team efficiency across my career.

Education

Graduated in Industrial and Production Engineering from P.D.A. College of Engineering, affiliated with Visvesvaraya Technological University, Karnataka, India, in 2007.

Tools/Technologies:

Big Data Ecosystems: Apache Hadoop, MapReduce, Spark, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Kafka, NiFi, Airflow

Cloud Ecosystems: AWS Glue, Lambda, Athena, S3, EMR, Redshift, Spectrum, Databricks, Azure Data Factory, Synapse

Languages: Python, Scala, Java, PL/SQL

Machine learning: Scikit-learn, Pandas, Matplotlib, NumPy, NLTK

Databases: Oracle, Netezza, RedShift, SQL Server, MySQL

NoSQL Databases: HBase, Elasticsearch

Operating Systems: Windows, Red Hat Linux

Tools Used: Spyder, PyCharm, Toad, IntelliJ, Anaconda, Atlan Data Catalog

Streaming Tools (Real Time): Kafka, NiFi

Version Controls: SVN, TFS, Mercurial, Bitbucket, Git

Data Processing: Structured, unstructured, and semi-structured data

Experience:

Client: Humana, Louisville, KY Mar 2023 – Present

Role: Sr Data Analyst

Worked on Azure Databricks, Hyperscale, Azure Blob Storage, Data Factory, and Synapse to develop pipelines for insurance claims data.

Analyzed the complexity of the existing insurance claim management system and improved the performance of existing logic built on a Hyperscale database.

Technologies: SQL Server, Azure Blob Storage, Data Factory, Synapse, Azure Databricks, Python, PySpark

Clients: ChargePoint; Aspirion, Columbus, GA Jan 2022 – Dec 2022

Role: Lead Data Engineer

Employer: Happiest Minds Technologies Dec 2021 – Dec 2022

As the primary resource, understood the complete project and data architecture and worked closely with the client team to improve data collection, quality, reporting, and analytics capabilities.

Worked on AWS Glue, Glue Data Catalog, S3, Athena, and Lambda to build data pipelines.

Collaborated with different teams to understand requirements and resolve issues.

Worked on Azure Blob Storage, Data Factory, Synapse, and Azure Databricks to build ETL data pipelines.

Analyzed the complexity of the existing system and proposed new solutions to run the pipelines efficiently.

Optimized existing pipelines, resulting in reduced operational cost and improved scalability and usability.

Technologies: AWS Glue, Glue Data Catalog, S3, Athena, Lambda, MySQL, Azure Blob Storage, Data Factory, Synapse, Azure Databricks, Python, PySpark, Bitbucket, Spyder, Atlan Data Catalog

Client: Millipore Sigma, Burlington, MA Jun 2019 – Dec 2021

Role: Sr Data Engineer

Played the role of lead engineer and analyst; designed the end-to-end data lake architecture and helped analyze employee flight-risk and learning models.

Collaborated with different teams to streamline processes, resolve conflicts, and build efficient pipelines.

Proposed, designed, and developed a generic plug-and-play ETL framework on PySpark/Databricks that enables developers to configure and run any new pipeline quickly and efficiently, reducing development effort by more than 50% (see the sketch below).
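
For illustration, a minimal sketch of what such a config-driven pipeline can look like (the file name pipeline.json and its keys are hypothetical, not the actual framework):

    # Illustrative sketch of a config-driven PySpark pipeline, not the real framework.
    import json
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("generic-etl").getOrCreate()

    with open("pipeline.json") as f:    # hypothetical per-pipeline config file
        config = json.load(f)
    # e.g. {"source": {"format": "parquet", "path": "s3://bucket/in/"},
    #       "filters": ["status = 'ACTIVE'"],
    #       "target": {"format": "parquet", "path": "s3://bucket/out/"}}

    df = spark.read.format(config["source"]["format"]).load(config["source"]["path"])
    for predicate in config.get("filters", []):
        df = df.filter(predicate)       # SQL predicate strings taken straight from config
    (df.write.format(config["target"]["format"])
       .mode("overwrite")
       .save(config["target"]["path"]))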

Proposed and developed a log analyzer framework using NLP and Elasticsearch to analyze Hadoop cluster issues and recommend solutions, increasing support team efficiency by 40%.

Helped the team develop a Python-based data validation utility to detect data-related issues at an early stage of the pipeline.

Handled PII data; developed a secure architecture to encrypt and decrypt PII fields during data processing (a minimal sketch follows).
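
A minimal sketch of field-level encryption of that kind, assuming the cryptography library's Fernet scheme (the actual project may have used a different algorithm and key management):

    # Sketch: encrypt/decrypt PII columns with Fernet inside PySpark UDFs.
    from cryptography.fernet import Fernet
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    key = Fernet.generate_key()         # in practice, fetched from a KMS/secrets store

    def encrypt_value(v, key=key):
        return Fernet(key).encrypt(v.encode()).decode() if v is not None else None

    def decrypt_value(v, key=key):
        return Fernet(key).decrypt(v.encode()).decode() if v is not None else None

    encrypt_udf = F.udf(encrypt_value, StringType())
    decrypt_udf = F.udf(decrypt_value, StringType())

    # df = df.withColumn("ssn", encrypt_udf("ssn"))  # hypothetical PII column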

Used API responses to connect to the SAP database and extract JSON data into the Hadoop system.

Performed a variety of tasks to facilitate project completion, including coordinating with different teams and helping the team with complex production-related issues.

Read unstructured data using Grok patterns and converted it into structured formats such as CSV (sketch below).
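
Grok patterns compile down to regular expressions, so a minimal Python equivalent uses named groups (the log layout below is illustrative only):

    # Sketch: parse semi-structured log lines into CSV rows with a Grok-style regex.
    import csv
    import re

    # Named groups play the role of Grok's %{PATTERN:field} captures.
    LOG_PATTERN = re.compile(
        r"(?P<ts>\S+ \S+) (?P<level>\w+) (?P<component>\S+) - (?P<message>.*)"
    )

    with open("app.log") as logs, open("app.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["ts", "level", "component", "message"])
        for line in logs:
            m = LOG_PATTERN.match(line)
            if m:                       # skip lines that don't match the pattern
                writer.writerow([m["ts"], m["level"], m["component"], m["message"].strip()])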

Worked on strategizing an efficient migration from Hadoop to AWS and Palantir platforms.

Helped the team migrate data from Hadoop to AWS and the Palantir platform in an efficient manner.

Technologies: Hadoop, PySpark (Databricks), HDFS, Hive, Sqoop, Oozie, Bitbucket, PyCharm, AWS Glue, Lambda, Athena, S3, EMR, Spectrum, Elasticsearch, Scikit-learn, Pandas, NumPy, NLTK, data modeling, Dataiku, Palantir, Ambari

Client: Coupons.com, Salt Lake City, UT Dec 2017 – Sep 2018

Role: Sr Data Engineer

As a senior member of the team, strategized the migration plan, helped minimize the timeline, and implemented it efficiently.

Helped migrate legacy DataStage pipelines to the Hadoop/Spark ecosystem efficiently with the team.

Emphasized the importance of a file monitoring framework for all non-Hadoop pipelines and implemented it in Python, saving 5 hours of operations time weekly.

Built customer sessionization logic on the Coupons website clickstream data using PySpark, which eliminated the need for an HBase cluster (see the sketch below).
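
A minimal sketch of the standard window-function approach to sessionization (column names and the 30-minute gap are assumptions, not the production values):

    # Sketch: a gap of more than 30 minutes between clicks starts a new session.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()

    clicks = spark.createDataFrame(     # tiny illustrative clickstream
        [("u1", "2018-01-01 10:00:00"),
         ("u1", "2018-01-01 10:05:00"),
         ("u1", "2018-01-01 12:00:00")],  # > 30 min gap -> new session
        ["user_id", "event_ts"],
    ).withColumn("event_ts", F.to_timestamp("event_ts"))

    w = Window.partitionBy("user_id").orderBy("event_ts")

    sessions = (
        clicks
        .withColumn("prev_ts", F.lag("event_ts").over(w))
        .withColumn("new_session",
                    (F.col("event_ts").cast("long") - F.col("prev_ts").cast("long")
                     > 30 * 60).cast("int"))
        .fillna({"new_session": 1})     # the first click always opens a session
        .withColumn("session_num", F.sum("new_session").over(w))
        .withColumn("session_id",
                    F.concat_ws("-", F.col("user_id"), F.col("session_num").cast("string")))
    )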

Developed a configurable, generic PySpark utility that generates XML reports driven by a JSON configuration file, adaptable to different clients (sketch below).
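
A small illustration of that config-driven idea, with the mapping inlined here for brevity (in the real utility it came from a client-specific JSON file; all names are hypothetical):

    # Sketch: build an XML report whose tags and fields are driven by a JSON-style config.
    import xml.etree.ElementTree as ET

    config = {
        "root": "Report",
        "row_tag": "Claim",
        "fields": {"claim_id": "ClaimId", "amount": "Amount"},
    }

    def rows_to_xml(rows, cfg):
        root = ET.Element(cfg["root"])
        for row in rows:                # rows: list of dicts, e.g. from df.collect()
            item = ET.SubElement(root, cfg["row_tag"])
            for src_col, xml_tag in cfg["fields"].items():
                ET.SubElement(item, xml_tag).text = str(row.get(src_col, ""))
        return ET.tostring(root, encoding="unicode")

    print(rows_to_xml([{"claim_id": 1, "amount": 250.0}], config))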

Technologies: Hadoop, PySpark (Databricks), HDFS, Hive, Sqoop, Oozie, Mercurial, PyCharm, Python, Pandas, Apache Solr, Unix shell scripting, Cloudera Manager, Databricks

Client: Epsilon USA (an Alliance Data company) Sep 2015 – Dec 2017

Role: Sr Database Engineer

Played the role of architect; designed an end-to-end Hadoop data lake for the NBA client, owned the project development activity, and helped ensure a smooth transition.

Provided excellent leadership by recommending the right technologies and solutions for a given use case.

Provided technical support to resolve or assist in resolution of issues relating to production systems.

Involved in requirement gathering and designing end to end project architecture in Hadoop.

Built and automated a report using MS SSIS that reduced manual effort by 2 hours weekly.

Involved in migrating data pipelines from Netezza to AWS environment.

Initiated multiple process improvement activities.

Read complex XML data with 7 nested levels and stored it in Impala tables (see the sketch below).
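
Nested XML like that is commonly read with the spark-xml package and flattened before loading into metastore tables that Impala can query; a sketch under those assumptions (rowTag, paths, and column names are hypothetical):

    # Sketch: read nested XML and flatten one level; repeat explode() per nested level.
    # Assumes the com.databricks:spark-xml package is installed on the cluster.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    raw = (spark.read.format("com.databricks.spark.xml")
           .option("rowTag", "record")  # hypothetical row element
           .load("/data/in/complex.xml"))

    flat = (raw.select(F.col("id"), F.explode("items.item").alias("item"))
               .select("id", "item.*"))

    flat.write.mode("overwrite").saveAsTable("staging.records")  # visible to Impala via the metastore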

Technologies: Hadoop, PySpark, HDFS, Hive, Pig, HBase, Sqoop, Oozie, SVN, IntelliJ, Unix shell scripting, data modeling, AWS Glue, S3, Redshift, MS SSIS, SQL Server

Client: IMS Health (IQVIA), Danbury, CT Nov 2011 – May 2015

Role: Data Engineer

Built a data lake in Hadoop, migrating pipelines and data from Netezza to Hive and Pig scripts.

Worked on the top revenue-generating analytical project, which analyzed trends in pharmaceutical products and market segments.

Built a cost-effective, generic pipeline and data model that accommodates multiple client reports in a single data platform.

Worked on performance improvement of data pipelines and received the "Out of the Box Thinker" award.

Provided excellent leadership by recommending the right technologies and solutions for a given use case.

Designed best practices to support continuous process automation for data ingestion and data pipeline workflows.

Read X12-format data and converted it into CSV format (sketch below).
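
X12 is segment-oriented text: segments conventionally end with "~" and elements are separated by "*" (both can vary and are declared in the ISA header); a minimal conversion sketch:

    # Sketch: split an X12 document into segments/elements and write them as CSV.
    # Production code should read the actual delimiters from the ISA segment.
    import csv

    with open("claims.x12") as f:
        raw = f.read()

    segments = [s.strip() for s in raw.split("~") if s.strip()]

    with open("claims.csv", "w", newline="") as out:
        writer = csv.writer(out)
        for seg in segments:
            writer.writerow(seg.split("*"))  # first element is the segment ID (ISA, GS, CLM, ...)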

Prepared and presented reports and analyses to various stakeholders, including executives.

Technologies: Oracle PLSQL, Netezza, Hadoop, Python, HDFS, Hive, Pig, Sqoop, Unix Shell scripting, Data modeling, Toad, AWS Redshift, S3


