Data Engineer

Location:
Pleasanton, CA
Posted:
June 30, 2020

Resume:

BRINDA MAKWANA

347-***-**** add8i2@r.postjobfree.com

Profile Summary

Overall, 6 years of experience as a Big Data Engineer/Data Engineer and in Data Analysis, including designing and developing solutions using Big Data and ETL technologies.

Experienced in working with ETL tools such as SSIS and Informatica, and with reporting tools such as SQL Server Reporting Services (SSRS), Cognos and Business Objects.

Working knowledge of big data tools such as Hadoop, Azure Data Lake and Amazon Redshift.

Hands-on experience with Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.

Experience in applying Text Analytics and Data Mining solutions to various business problems and generating data visualizations using SAS and Python.

Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.

Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.

Solid knowledge of Data Marts, Operational Data Stores (ODS), OLAP and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension tables) using Analysis Services.

Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Informatica PowerCenter.

Experience in designing, building and implementing a complete Hadoop ecosystem comprising HDFS and Hive.

Strong experience architecting highly performant databases using MySQL and BigSQL.

Extensive experience in using ER modeling tools such as Erwin and ER/Studio, Teradata, BTEQ, MLDM and MDM.

Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc and Stackdriver.

Experienced in using Python for statistical computing.

Extensive experience in loading and analyzing large datasets with Hadoop framework (MapReduce, HDFS, HIVE).

Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.

Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.

Experienced in working with analysis tools such as Tableau for regression analysis, pie charts and bar graphs.

Functional Skills

Big Data technologies: Hadoop 3.0, Kafka, NiFi, SQL, Spark, Teradata, Zookeeper, HDFS, HBase.

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive 2.3, Impala, ZooKeeper, Sqoop, Oozie, Spark, Solr and Avro.

Cloud Architecture: Amazon AWS, EC2, S3, Elasticsearch, Elastic Load Balancing, basic MS Azure, GCP

OLAP Tools: SAP BO, SSAS, Business Objects, and Crystal Reports 9/7

Programming Languages: Python, Scala, Java

Databases: MS SQL, Oracle & IBM SQL, T-SQL, NoSQL

Machine learning: Decision trees, Random forest, Linear & Logistic regression, NLP, PCA, K-means, XGBoost, and predictive analytics-based algorithms

Data Streaming: Batch processing & real-time streaming using Kafka

Operating System: Windows, Unix & Linux.

Deep Learning & Database: TensorFlow, K-NN, SVM, LDA with BigSQL

Methodologies: Agile, RDD, TDD, System Development Life Cycle (SDLC), Waterfall Model.

Work Summary

Client: Unified Communications

Title: Big Data Engineer

Location: St. Louis, MO

Duration: January 2019 – Present

Roles & Responsibilities

o As a Big Data Engineer, primarily involved in the development of Big Data solutions focused on pattern matching and predictive modeling.

o Used scikit-learn, Pandas, NumPy and TensorFlow to derive insights from data and trained a detection model on batch data.

o Transformed Kafka-loaded data using Spark Streaming with Scala and Python (a sketch follows this list).

o Developed multiple MapReduce jobs in Python for preprocessing, applying an in-depth understanding of data structures and algorithms to in-memory data processing with Python and Scala.

o Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.

o Excellent experience with Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming; designed and implemented a MapReduce-based large-scale parallel relation-learning system.

o Experienced in managing and reviewing Hadoop log files, edits, and metadata in various file formats.

o Experienced in using Tidal Enterprise Scheduler and Oozie operational services for coordinating the cluster and scheduling workflows.

o Built data warehousing solutions for analytics/reporting using the Amazon Redshift service.

o Involved in creating SnowSQL scripts to load data from S3 buckets into Snowflake tables and transform it according to business requirements.

o Migrated data of various types, including streaming, structured and unstructured data, from various sources, as well as legacy data.

o Utilized AWS services with a focus on big data analytics, enterprise data warehousing and business intelligence solutions to ensure optimal architecture, scalability and flexibility.

o Designed AWS architecture and cloud migration covering AWS EMR, Redshift, and event processing using Lambda functions.

o Experienced in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.

o Experienced in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

o Experienced in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

o Proficient in object-oriented concepts; confident working with the Python and Scala object-oriented programming languages.

o Involved in data migration to Snowflake using AWS S3 buckets; also worked with BigQuery on GCP.

o Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.

o Experienced in importing and exporting data with Sqoop between HDFS and Hive and relational database systems.

o Created Hive external tables to stage data and then moved the data from staging to main tables.

o Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.

o Used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting.

o Used Presto's built-in connectors for Redshift and Hive to prepare datasets for applying advanced analytics (ML) to certain use cases.

o Implemented performance optimizations on queries to improve retrieval times.

o Used query execution plans in Presto to tune the queries integrated as data sources for dashboards.

o Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase as a NoSQL store.

o Designed the Redshift data model and worked on Redshift performance improvements that speed up query retrieval and improve the dependent reporting/analytics layers.

o Developed data transition programs from DynamoDB to Amazon Redshift (ETL process) using AWS Lambda, creating Python functions for certain events based on use cases.

o Created new DataStage jobs and sequences for data validation to detect inconsistencies in the data loaded by the existing ETL process, using DataStage stages such as Link Collector, Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Transformer and Aggregator.

o Solved execution incidents in the processes developed in DataStage.

o Found the root causes of data inconsistencies loaded by the ETL process developed in DataStage.
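
Illustrative sketch for the Kafka/Spark Streaming bullet above: a minimal PySpark Structured Streaming job, not the project's actual code. The broker address, topic name, event schema and HDFS paths are hypothetical placeholders, and the real pipelines also used Scala.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-transform").getOrCreate()

# Hypothetical schema for the JSON messages on the topic
schema = StructType([
    StructField("device_id", StringType()),
    StructField("event_type", StringType()),
    StructField("value", DoubleType()),
])

# Read the raw stream from Kafka (requires the spark-sql-kafka connector package;
# broker and topic names are placeholders)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "events")
       .load())

# Parse the message value and keep only the fields needed downstream
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Write the transformed stream to HDFS as Parquet with checkpointing
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events/clean")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()

The HDFS checkpoint location is what lets the streaming query restart without losing or reprocessing Kafka offsets.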

Tools: Hadoop 3.0, Spark 2.4.6, Amazon S3, HDFS, Hive 2.3, MapReduce, ETL, Kafka, Sqoop 1.4.7, Python, Scala and Linux

Client: Aisle 411 Inc

Location: St. Louis, MO

Title: Big Data Engineer

Duration: November 2017 to December 2018

Roles & Responsibilities

o As a Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.

o Worked closely with Business Analysts to review the business specifications of the project and to gather the ETL (Talend) requirements.

o Streamed cleaned and consistent data into Spark using Kafka and performed manipulations on real-time data with Python and Scala.

o Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.

o Used Agile (Scrum) methodologies for software development.

o Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.

o Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.

o Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors and tasks.

o Loaded Salesforce data every 15 minutes on an incremental basis into the BigQuery raw and UDM layers using SOQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python and shell scripts.

o Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.

o Designed efficient and robust Hadoop solutions for performance improvement and better end-user experience.

o Worked on Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.

o Conducted performance tuning of Hadoop clusters while monitoring and managing cluster job performance, capacity forecasting, and security.

o Created Hive external tables to stage data and then moved the data from staging to main tables.

o Used AWS Cloud for infrastructure provisioning and configuration.

o Used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting.

o Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into HDFS.

o Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.

o Created PL/SQL stored procedures and implemented them through the Stored Procedure transformation.

o Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

o Actively involved in design, new development, and SLA-based support tickets for Big Machines applications.

o Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries and writing data back into an RDBMS (see the sketch after this list).

o Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
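
Illustrative sketch for the Spark aggregation and RDBMS write-back bullet above: a minimal PySpark version (the production scripts were Scala), assuming a hypothetical Hive table sales_db.transactions, a hypothetical region-normalizing UDF, and placeholder JDBC connection details.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as sum_, udf
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("aggregate-and-writeback")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical UDF that normalizes a region code before aggregation
normalize_region = udf(lambda r: (r or "UNKNOWN").strip().upper(), StringType())

# Load a Hive table, apply the UDF, and aggregate sales per region
sales = spark.table("sales_db.transactions")
summary = (sales
           .withColumn("region", normalize_region(col("region")))
           .groupBy("region")
           .agg(sum_("amount").alias("total_amount")))

# Write the aggregate back to an RDBMS over JDBC (URL and credentials are
# placeholders; the matching JDBC driver must be on the Spark classpath)
(summary.write
 .format("jdbc")
 .option("url", "jdbc:mysql://dbhost:3306/reporting")
 .option("dbtable", "region_sales_summary")
 .option("user", "etl_user")
 .option("password", "etl_password")
 .mode("overwrite")
 .save())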

Tools: Apache Spark 2.3, Hive 2.3, HDFS, Scala, Apache NiFi 1.6, YARN, HBase, PL/SQL, Pig 0.16, Sqoop 1.2, Kafka

Client: United Airlines

Location: Chicago, IL

Title: ETL Data Engineer

Duration: January 2017 to October 2017

Roles & Responsibilities

o Analyzed Big Data using solutions such as Hadoop technologies through hands-on projects.

o Recreated existing SQL Server objects in Snowflake.

o Converted SQL Server mapping logic to SnowSQL queries.

o Involved in the Talend ETL tool and in data integration and migration, using Sqoop with Python to load data into HDFS on a regular basis.

o Experienced in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

o Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

o Wrote Hive queries for ad hoc data analysis to meet business requirements.

o Implemented partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance (a sketch follows this list).

o Solved performance issues in Hive scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.

o Created Hive tables and worked on them using HiveQL, implementing transformations, performing analysis and creating visualization reports.

o Performed ETL data cleaning, integration, and transformation using Sqoop with Python.

o Designed a data warehouse using Hive and created managed Hive tables in Hadoop.

o Worked on analyzing the Hadoop cluster and different big data analytics tools, including Hive, Sqoop, and Oozie.

o Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries.

o Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the R&D team.

o Managed the data by storing it in tables and created visualization reports.
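
Illustrative sketch for the Hive partitioning/bucketing bullet above: minimal HiveQL DDL issued through a PySpark session, with hypothetical table names, columns and HDFS location.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-table-design")
         .enableHiveSupport()
         .getOrCreate())

# External table staged over files already landed in HDFS, partitioned by load date
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS flights_stg (
        flight_id STRING,
        origin    STRING,
        dest      STRING,
        delay_min INT
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/staging/flights'
""")

# Managed table bucketed by flight_id to speed up joins and sampling
spark.sql("""
    CREATE TABLE IF NOT EXISTS flights_curated (
        flight_id STRING,
        origin    STRING,
        dest      STRING,
        delay_min INT
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (flight_id) INTO 16 BUCKETS
    STORED AS ORC
""")

# Register a newly landed partition so it becomes queryable
spark.sql("ALTER TABLE flights_stg ADD IF NOT EXISTS PARTITION (load_date='2017-06-01')")

Partitioning by load date prunes scans to only the dates a query touches, while bucketing on the join key keeps related rows co-located for faster joins and sampling.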

Tools: Hadoop, HDFS, Sqoop, Hive, Oozie, Unix Shell Scripting, SQL, PL/SQL, Toad

Client: Pheonom Health

Location: Hyderabad, India

Title: ETL Data Analyst

Duration: August 2014 to December 2016

Roles & Responsibilities

o Defined and modified standard design-pattern ETL frameworks, data model standards and guidelines, and ETL best practices.

o Designed physical/logical data models based on star and snowflake schemas using the Erwin modeler to build an integrated enterprise data warehouse.

o Performed detailed data investigation and analysis of known data quality issues in related databases through SQL (see the sketch after this list).

o Performed data validation, data profiling, data auditing and data cleansing activities to ensure high-quality Business Objects report deliveries.

o Configured sessions for different situations, including incremental aggregation, pipeline partitioning, etc.

o Created effective test cases and performed unit and integration testing to ensure the successful execution of the data loading process.

o Created SSIS packages to export and import data from CSV files, text files and Excel spreadsheets.

o Generated periodic reports based on statistical analysis of the data from various time frames and divisions using SQL Server Reporting Services (SSRS).

o Developed different kinds of reports, such as sub-reports, charts, matrix reports and linked reports.

o Analyzed the client data and business terms from a data quality and integrity perspective.

o Worked to ensure high levels of data consistency between diverse source systems, including flat files, XML and SQL databases.

o Developed and ran ad hoc data queries against multiple database types to identify systems of record, data inconsistencies and data quality issues.

o Maintained Excel workbooks, including developing pivot tables, exporting data from external SQL databases, producing reports and updating spreadsheet information.

o Worked with Quality Improvement, Claims and other operational business owners to ensure appropriate actions were taken to address rejections and that previously rejected data was reprocessed.

o Ensured the quality, consistency, and accuracy of data in a timely, effective and reliable manner.

o Worked with the Business Analyst to gather requirements.
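
Illustrative sketch for the SQL data-quality investigation bullet above: a small profiling query wrapped in Python with pyodbc; the server, database, table and column names are hypothetical placeholders.

import pyodbc

# Connection details are placeholders for a SQL Server source
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlhost;DATABASE=claims;UID=analyst;PWD=secret"
)
cursor = conn.cursor()

# Profile a staging table: row count, null rate on a key column, duplicate business keys
cursor.execute("""
    SELECT
        COUNT(*)                                            AS total_rows,
        SUM(CASE WHEN member_id IS NULL THEN 1 ELSE 0 END)  AS null_member_ids,
        COUNT(*) - COUNT(DISTINCT claim_number)             AS duplicate_claim_numbers
    FROM dbo.claims_staging
""")
total_rows, null_member_ids, duplicate_claim_numbers = cursor.fetchone()
print(total_rows, null_member_ids, duplicate_claim_numbers)

conn.close()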

Tools: SQL, SSIS, Data Analytics, MDM, TOAD, Erwin, Windows XP, Excel.

EDUCATION:

• MS - Computer Science from Monroe College, New Rochelle, NY

• BE – Electronics & Communication from Sal Engineering & Research College, Gujarat, India


