
Data Engineer

Location:
Santa Clara, CA
Posted:
February 21, 2021


Resume:

SATYA MISHRA

C: +1-669-***-**** adkdcx@r.postjobfree.com

https://www.linkedin.com/in/satyamishra2007/

Santa Clara, CA 95051

Professional Summary

Senior Data Engineer with 8+ years of experience in data warehouse design, modeling, implementation and management. Highly skilled at programming and configuring databases to meet business data needs. Overall 9+ years of experience spanning training delivery, end-to-end project development, support, project management and administration.

Technical Skills

● Programming Languages: Python, Scala, Core Java

● Big Data Ecosystem: Hadoop, Hive, Spark, Kafka, Oozie, Splunk

● ETL: Airflow, Rundeck, Azkaban

● Cloud Services: Google Cloud and AWS

● Query Languages: Oracle SQL, SQL Server, HiveQL

● NoSQL: Cassandra

● RDBMS: Oracle, SQL Server, Teradata

● Dev tools: IntelliJ, PyCharm, Jupyter Notebook

● Data Models: ER / Dimensional

● CI/CD: Git, Jenkins

Experience

Lead Data Engineer, Rodan and Fields

Santa Clara, CA Apr 2020 to Current

This project builds a financial reconciliation process that ensures there are no reconciliation differences between the data coming from the different systems in the order life cycle. It verifies that all paid orders have been fulfilled, cash has been received for all valid orders, refunds/credits are issued correctly, taxes are accurately calculated and accrued, and month-end accruals and revenue recognition are financially accurate.

Accomplishments

● Architected the EDW modernization of operational data sources onto the Google Cloud platform

● Built the conceptual, logical and physical data models along with the STM design

● Designed and developed ETL pipelines from various source systems (file-based, RDBMS and API) using Airflow and BigQuery, as sketched after this list

● Completely redesigned legacy SQL workloads as BigQuery workloads

● Designed and developed a daily reconciliation engine using all the data sources in the order life cycle

● Migrated the complete history data of all SQL Server objects into BigQuery with a lift-and-shift approach

● Improved the performance of SQL Server stored procedures/tables/views through the BigQuery implementation

● Drove adoption of advanced Google Cloud services and capabilities where appropriate

● Worked with various stakeholders and cross-platform teams to achieve business goals

● Provided technical mentorship to junior team members, performing code and design reviews and enforcing coding standards and best practices
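A minimal sketch of how such an Airflow-to-BigQuery reconciliation pipeline could look, assuming the Airflow Google provider operators; the project, dataset, bucket and table names below are placeholders for illustration, not the actual implementation.

# Hypothetical Airflow DAG: load a daily order extract from GCS into a BigQuery
# staging table, then run a reconciliation query. All names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_order_reconciliation",   # placeholder DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Stage the daily order file from Cloud Storage into a BigQuery staging table.
    load_orders = GCSToBigQueryOperator(
        task_id="load_orders_to_staging",
        bucket="example-landing-bucket",                       # placeholder bucket
        source_objects=["orders/{{ ds }}/orders.csv"],
        destination_project_dataset_table="example_project.staging.orders",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    # Compare paid orders against cash receipts and persist any differences.
    reconcile = BigQueryInsertJobOperator(
        task_id="reconcile_orders_vs_payments",
        configuration={
            "query": {
                "query": """
                    SELECT o.order_id, o.order_total, p.amount_received
                    FROM `example_project.staging.orders` o
                    LEFT JOIN `example_project.finance.payments` p USING (order_id)
                    WHERE p.amount_received IS NULL
                       OR ROUND(o.order_total, 2) != ROUND(p.amount_received, 2)
                """,
                "destinationTable": {
                    "projectId": "example_project",
                    "datasetId": "finance",
                    "tableId": "recon_differences_{{ ds_nodash }}",
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )

    load_orders >> reconcile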

Lead Data Engineer, Kohl’s

Milpitas, CA May 2019 to Mar 2020

Kohl’s developed a digital platform to migrate transaction and customer data from on-premises systems to a cloud ecosystem. Designing the framework, executing the cloud migration and building a data warehouse in the cloud were the key aspects of the project. Dashboard reports such as executive, customer, campaign and sales-projection views help the business in crucial decision making.

Accomplishments

● Architected the migration of on-premises MapReduce/Spark jobs to the Google Cloud BigQuery platform

● Completed the data migration and set up the incremental flow on Google Cloud services using BigQuery and Airflow

● Designed and developed multiple hourly, daily and weekly aggregates in BigQuery needed for BI reporting, as sketched after this list

● Designed and developed ETL pipelines using Airflow

● Performance-tuned SQL queries, gaining 50%-60% improvement for complex, high-volume jobs

● Automated data quality checks at various levels of the ETL process, with a Data Studio dashboard comparing the legacy environment to the cloud

● Played an important role in analysis, requirements gathering, functional/technical design, development, deployment and unit testing with Agile methodologies

● Worked with the CI/CD pipeline using Git and Jenkins in the cloud
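A minimal sketch of how one such BigQuery reporting aggregate could be rebuilt from Python, assuming the google-cloud-bigquery client library; the project, dataset, table and column names are placeholders.

# Hypothetical daily sales aggregate in BigQuery via the Python client.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")   # placeholder project

AGGREGATE_SQL = """
CREATE OR REPLACE TABLE `example-project.reporting.daily_sales_agg` AS
SELECT
  DATE(transaction_ts)     AS sales_date,
  store_id,
  COUNT(DISTINCT order_id) AS order_count,
  SUM(sale_amount)         AS gross_sales
FROM `example-project.dw.transactions`
GROUP BY sales_date, store_id
"""

# Run the aggregation job and block until it finishes.
job = client.query(AGGREGATE_SQL)
job.result()
print(f"Rebuilt daily_sales_agg, {job.total_bytes_processed} bytes processed")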

Data Engineer, Apple Inc

Sunnyvale, CA Aug 2011 to Jun 2019

GBI iTunes Apple Music is the GBI big data live application for music-streaming data in the Hadoop cluster. This application stores music data and the clickstream activity performed by all users while listening to and downloading songs. The project provides data warehousing with Hadoop and its ecosystem. The semantic process involves building suitable aggregations per business requirements and providing the necessary reporting files to downstream systems using the Apple framework and configuration.

Accomplishments

● Worked on various near-real-time and batch processing applications using Spark, Kafka and Cassandra

● Built Spark applications to perform data enrichment and transformations using Spark DataFrames with Cassandra lookups

● Used the DataStax Spark Cassandra Connector to extract and load data to/from Cassandra

● Developed a Spark application that uses the Kafka consumer and broker libraries to connect to Apache Kafka, consume data from topics and ingest it into Cassandra, as sketched after this list

● Worked on building the data ingestion layer with Sqoop from Oracle and Teradata for high-volume history and incremental data into the Hadoop ecosystem, then ran analytics for business reports

● Developed multiple aggregates using Hive and Oozie in Hadoop with unit testing, integration testing and performance benchmarking

● Experienced in writing MapReduce applications and Hive UDFs in Java

● Worked on migration and performance tuning of complex, high-volume Hive-driven workflows into Spark DataFrames

● Built pre-processing scripts using Python and shell scripting

● Developed Splunk applications for health monitoring, log analysis and alerting across the data quality monitoring systems in all processing layers

● Onboarded the DQM dashboard in Tableau to analyze growth and perform failure analysis on transactional data

● Worked across functions (development/testing, deployment, systems/infrastructure) with the CI/CD pipeline
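A minimal PySpark sketch of the Kafka-to-Cassandra flow described above, assuming the spark-sql-kafka and DataStax Spark Cassandra Connector packages are on the classpath; the broker, topic, keyspace and event schema are placeholders.

# Hypothetical PySpark job: consume clickstream events from Kafka and append
# them to a Cassandra table. All connection details and names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = (
    SparkSession.builder
    .appName("clickstream-to-cassandra")
    .config("spark.cassandra.connection.host", "cassandra-host")   # placeholder host
    .getOrCreate()
)

# Expected shape of each Kafka message (JSON), simplified for illustration.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("track_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "music_clickstream")           # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Micro-batch sink: each batch is appended to Cassandra via the Spark Cassandra Connector.
def write_to_cassandra(batch_df, batch_id):
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .mode("append")
     .options(keyspace="music", table="clickstream_events")   # placeholder keyspace/table
     .save())

query = (
    events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/tmp/checkpoints/clickstream")
    .start()
)
query.awaitTermination()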

Education

Bachelor’s degree

Instrumentation and Electronics – NIST (BPUT, Odisha) Aug 2007 to Jul 2011


