Shivasharana Reddy
Phone: +1-502-***-****
Email: adxcnd@r.postjobfree.com
LinkedIn: https://www.linkedin.com/in/shivasharan-reddy-a49a0818/
Summary
10+ years of IT experience in data engineering and analytics.
Worked on distributed and cloud computing big data technologies such as Apache Hadoop, AWS, Azure, Databricks, Spark, PySpark, and Oracle. Proficient in programming languages such as Python and Scala.
Expertise in building ETL frameworks using various tools and technologies.
Worked with a variety of data storage formats such as Parquet, ORC, Avro, XML, JSON, XLS, and CSV,
and with structured, unstructured, and semi-structured data.
Good exposure to supervised and unsupervised learning, natural language processing (NLP), and mathematical and statistical methods.
Developed data warehouse solutions in Hadoop using HDFS, Hive, Pig, Sqoop, HBase, Oozie, Cloudera Hue, Cloudera Manager, Scala, Spark, Python, Java, Impala, Ambari, and Ranger.
Developed cloud-based solutions using AWS Redshift, Glue, Lambda, Athena, S3, Spectrum, Azure Data Factory, Synapse, and Databricks.
Supported creation of technology standards and reusable components
Proficient in other tools and technologies such as Oracle 11g/12c, SQL Server, MySQL, and Netezza.
Experienced in designing, documenting, and implementing data warehouse strategies, including building ETL, ELT, and data pipeline processes.
Translated business requirements into high-level and low-level design documents.
Participated in task breakdown and effort estimation on upcoming projects.
Part of the review team implementing new changes on the analytical data platform; identified and researched emerging technologies, conducted proofs of value, and converted them into full-fledged projects.
Collaborated with stakeholders such as product managers, architects, analysts, and project managers to deliver solutions.
Thorough understanding of software lifecycle management, following best practices throughout.
Supervised team activities, including work scheduling, technical direction, and standard development practices.
Believing in continuous improvement, developed multiple frameworks to improve team efficiency across my career.
Education
Graduate in Industrial and Production Engineering from P.D.A College of Engineering, affiliated with Visveswaraiah Technological University, Karnataka, India, in 2007.
Tools/Technologies:
Big Data Ecosystems: Apache Hadoop, MapReduce, Spark, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Kafka, Nifi, Airflow
Cloud Ecosystems: AWS Glue, Lambda, Athena, S3, EMR, Redshift, Spectrum, Databricks, Azure Data Factory, Synapse
Languages: Python, Scala, Java, PL/SQL
Machine learning: Scikit-learn, Pandas, Matplotlib, NumPy, NLTK
Databases: Oracle, Netezza, RedShift, SQL Server, MySQL
NoSQL Database: HBase, Elastic Search
Operating Systems: Windows, Red Hat Linux
Tools Used: Spyder, PyCharm, Toad, IntelliJ, Anaconda, Atlan Data Catalog
Streaming tools (Real time): Kafka, Nifi
Version Controls: SVN, TFS, Mercurial, Bitbucket, Git
Data Processing: Structured, Unstructured, and Semi structured data
Experience:
Client: Humana, Louisville, KY Mar 2023 – present
Role: Sr Data Analyst
Worked on Azure Databricks, Hyperscale, Azure Blob, Data Factory, and Synapse to develop pipelines for insurance claims data.
Analyzed the complexity of the existing insurance claim management system and improved the performance of existing logic built on the Hyperscale database.
Technologies: SQL Server Database, Azure Blob, Data Factory, Synapse, and Azure Databricks, Python, PySpark
Clients: ChargePoint, Aspirion Jan 2022 – Dec 2022
Role: Lead Data Engineer
Employer: Happiest Minds Technologies Dec 2021 – Dec 2022
Clients: Aspirion, Columbus, GA
Lead Data Engineer
As the primary resource, understood the complete project and data architecture and worked closely with the client team to improve data collection, quality, reporting, and analytics capabilities.
Worked on AWS Glue, Data Catalog, S3, Athena, and Lambda to build data pipelines.
Collaborated with different teams to understand requirements and resolve issues.
Worked on Azure Blob, Data Factory, Synapse, and Azure Databricks to build the ETL data pipelines.
Analyzed the complexity of the existing system and proposed new solutions to run the pipelines efficiently.
Optimized existing pipelines, reducing operational cost and improving scalability and usability.
Technologies: AWS Glue, Data Catalog, S3, Athena, Lambda, MySQL, Azure Blob, Data Factory, Synapse, Azure Databricks, Python, PySpark, Bitbucket, Spyder, Atlan Data Catalog
Client: Millipore Sigma, Burlington, MA Jun 2019 – Dec 2021
Role: Sr Data Engineer
Played the role of lead engineer and analyst; designed the end-to-end data lake architecture and helped analyze employee flight-risk and learning models.
Collaborated with different teams to streamline processes, resolve conflicts, and build efficient pipelines.
Proposed, designed, and developed a generic plug-and-play ETL framework on PySpark/Databricks that enables developers to configure and run any new pipeline quickly and efficiently, reducing development effort by more than 50%.
Proposed and developed a log analyzer framework using NLP and Elasticsearch to analyze issues occurring in the Hadoop cluster and recommend solutions, increasing support team efficiency by 40%.
Helped the team develop a Python-based data validation utility to detect data issues in the early stages of the pipeline.
Handled PII data; developed a secure architecture to encrypt and decrypt PII fields during data processing.
Used API responses to connect to the SAP database and extract JSON data into the Hadoop system.
Performed a variety of tasks to facilitate project completion, including coordinating with different teams and helping the team with complex production issues.
Read unstructured data using Grok patterns and converted it into structured formats such as CSV.
Worked on strategizing an efficient migration from Hadoop to the AWS cloud and Palantir platforms.
Helped the team migrate data from Hadoop to the AWS and Palantir platforms in an efficient manner.
Technologies: Hadoop, PySpark, HDFS, Hive, Sqoop, Oozie, Bitbucket, PyCharm, AWS Glue, Lambda, Athena, S3, EMR, Spectrum, Elasticsearch, Scikit-learn, Pandas, NumPy, NLTK, Data modeling, Dataiku, Palantir, Ambari, Databricks.
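The Grok-based parsing mentioned above can be illustrated in plain Python: a Grok pattern is essentially a named regular expression, so named groups give the same effect. This is a minimal sketch with a hypothetical log line format; the actual patterns and fields on the project differed.

```python
import csv
import io
import re

# Hypothetical log format; a Grok pattern like
# "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
# expands to a named regex roughly like this one.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)

def logs_to_csv(lines):
    """Convert unstructured log lines into a CSV string."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["timestamp", "level", "message"])
    writer.writeheader()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # silently skip lines that do not fit the pattern
            writer.writerow(match.groupdict())
    return out.getvalue()
```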
Client: Coupons.com, Salt Lake City, UT Dec 2017 – Sep 2018
Role: Sr Data Engineer
As a senior team member, strategized the migration plan, helped minimize the timeline, and implemented it efficiently.
Helped migrate legacy DataStage pipelines to the Hadoop/Spark ecosystem efficiently with the team.
Championed a file monitoring framework for all non-Hadoop pipelines and implemented it in Python, saving 5 hours of operations time weekly.
Built customer sessionization logic on Coupons website clickstream data using PySpark, which eliminated the use of an HBase cluster.
Developed a configurable, generic PySpark utility to generate XML reports by reading a JSON file, which suits different clients based on configuration.
Technologies: Hadoop, PySpark(Databricks), HDFS, Hive, Sqoop, Oozie, Mercurial, PyCharm, Python, Pandas, Apache Solr, Unix Shell scripting, Cloudera Manager, Databricks.
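The production sessionization logic ran on PySpark window functions; the core idea can be sketched in plain Python. This sketch assumes a 30-minute inactivity gap as the session boundary (a common convention, not necessarily the threshold used on the project).

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity threshold

def sessionize(events):
    """Assign a session id to each (user, timestamp) click event.

    Events are sorted per user by time; a new session starts whenever
    the gap since the user's previous click exceeds SESSION_GAP.
    """
    events = sorted(events, key=lambda e: (e[0], e[1]))
    result = []
    last_seen = {}   # user -> timestamp of previous event
    session_no = {}  # user -> current session counter
    for user, ts in events:
        if user not in last_seen or ts - last_seen[user] > SESSION_GAP:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        result.append((user, ts, f"{user}-{session_no[user]}"))
    return result
```

In PySpark the same logic maps to a `lag` over a window partitioned by user and ordered by timestamp, followed by a running sum of new-session flags.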
Client: Epsilon USA (Alliance Data company) Sep 2015 – Dec 2017
Role: Sr Database Engineer
Played the role of architect; designed an end-to-end Hadoop data lake for an NBA client, owned project development, and helped ensure a smooth transition.
Provided leadership by recommending the right technologies and solutions for each use case.
Provided technical support to resolve or assist in resolving issues in production systems.
Involved in requirement gathering and designing end-to-end project architecture in Hadoop.
Built and automated a report using MS SSIS, reducing weekly manual effort by 2 hours.
Involved in migrating data pipelines from Netezza to the AWS environment.
Initiated multiple process improvement activities.
Read complex XML data with seven nested levels and stored it in Impala tables.
Technologies: Hadoop, PySpark, HDFS, Hive, Pig, HBase, Sqoop, Oozie, SVN, IntelliJ, Unix Shell scripting, Data modeling, AWS Glue, S3, Redshift, MS SSIS, SQL Server
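Flattening deeply nested XML (such as the seven-level documents above) before loading it into Impala tables can be sketched with the standard library; the tag names below are made up for illustration.

```python
import xml.etree.ElementTree as ET

def flatten(element, prefix="", row=None):
    """Recursively flatten an XML element into a single flat dict,
    joining nested tag names with dots (e.g. item.price.amount),
    so each key can become a column in a table."""
    if row is None:
        row = {}
    for child in element:
        key = f"{prefix}{child.tag}"
        if len(child):  # has sub-elements: recurse one level deeper
            flatten(child, key + ".", row)
        else:
            row[key] = (child.text or "").strip()
    return row

# Hypothetical two-level sample; real documents nested seven levels deep.
xml_doc = """<order>
  <id>42</id>
  <item><name>widget</name><price><amount>9.99</amount><currency>USD</currency></price></item>
</order>"""
```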
IMS Health (IQVIA), Danbury, CT Nov 2011 – May 2015
Role: Data Engineer
Built a data lake in Hadoop, migrating pipelines and data from Netezza to Hive and Pig scripts.
Worked on a top revenue-generating analytical project that analyzes trends in pharmaceutical products and market segments.
Built a cost-effective, generic pipeline and data model that accommodates multiple client reports in a single data platform.
Worked on performance improvement of data pipelines and received an out-of-the-box-thinker award.
Provided leadership by recommending the right technologies and solutions for each use case.
Designed best practices to support continuous process automation for data ingestion and data pipeline workflows.
Read X12-format data and converted it into CSV format.
Prepared and presented reports, analyses, and presentations to various stakeholders, including executives.
Technologies: Oracle PLSQL, Netezza, Hadoop, Python, HDFS, Hive, Pig, Sqoop, Unix Shell scripting, Data modeling, Toad, AWS Redshift, S3
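X12 interchanges are segment-delimited text, commonly with `~` between segments and `*` between elements. A minimal sketch of the X12-to-CSV conversion, assuming those default delimiters (real files declare their delimiters in the ISA header):

```python
import csv
import io

def x12_to_csv(x12_text, segment_sep="~", element_sep="*"):
    """Split an X12 interchange into segments and write each segment's
    elements as one CSV row, with the segment id as the first column.
    Delimiters default to the common '~' / '*' convention."""
    out = io.StringIO()
    writer = csv.writer(out)
    for segment in x12_text.strip().split(segment_sep):
        segment = segment.strip()
        if segment:
            writer.writerow(segment.split(element_sep))
    return out.getvalue()
```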