
Data Engineer

Location: DeKalb, IL
Posted: March 31, 2021


PARAG TETAMBE
Portfolio: https://www.paragtetambe.com

*** ************ ******, ********* **, Boston, MA, 02215
LinkedIn: https://www.linkedin.com/in/paragtetambe
Willing to relocate
Email: adlbmk@r.postjobfree.com
Phone: 815-***-****

PROFESSIONAL SUMMARY

• Every organization is generating data at an unprecedented pace and scale. My mission is to build data pipelines that collect data from multiple sources, transform it, and store it in a centralized, enriched, and efficiently usable form, while watching for trends or inconsistencies that affect business goals

• Data Engineer highly adept at consolidating large datasets from multiple sources into data warehouses using ETL pipelines, turning raw data into a story that drives business intelligence decisions, informs ongoing strategy, and optimizes IT resources to increase revenue and reduce costs

SKILLS & CERTIFICATION

Programming Languages & Databases: Python, Spark, SQL, CQL, MS SQL Server, PostgreSQL, Apache Cassandra (NoSQL Database), MySQL, UML, SAP ERP
DW/BI Tools: Power BI (DAX), Tableau, Alteryx, DBeaver, SAS Enterprise Miner, Google Analytics, Dataiku Data Science Studio, Talend, Qlik Sense, Visual Studio Code, SSIS, SSAS
Cloud Computing: AWS (S3, Lambda, Glue, Athena, Redshift Spectrum, QuickSight), GCP (BigQuery, Composer, Workflow, CloudSQL, Dataproc), Azure (Databricks, Data Factory, Cosmos DB)
Other: Git, Microsoft Project, Azure Synapse Analytics, Excel, ER Studio Data Architect, JIRA, Big Data Solutions, Amazon Web Services, Google Cloud Platform, Azure Data Services

BUSINESS EXPERIENCE

Data Engineer/Business Intelligence Consultant, (Startup: Opportunity DeKalb) Northern Illinois University, DeKalb, IL 10/2020 – 12/2020

• Coached, mentored & led a team of 5 in problem-solving, presentation, dashboard creation & Agile development of a communication & project plan, helping improve outreach by 73%

• Spent 70% of each week collecting, maintaining and processing raw data to support Business Intelligence, Data Engineering & Analytics services, utilizing ETL tools to improve data quality by 91%

• Removed inconsistencies through data cleaning & pre-processing, enriching the data and enabling its integration into a centralized, efficiently usable form for data analysis

• Performed ETL/ELT to migrate survey data from Qualtrics into Excel & Power BI, writing DAX queries to create visualizations that help non-technical audiences make important business decisions

• Solved data integrity issues & developed data migration templates to integrate and centralize data, improving data consistency and saving 10+ hours per week for 4+ months

ETL Engineer/Data Analyst Consultant, (Startup: Heel Strike) Northern Illinois University, DeKalb, IL 8/2020 – 10/2020

• Created reports & dashboards in Excel, Tableau & Power BI to track progress and deliver high-level analytics to stakeholders and executives, improving business performance by 60%

• Utilized Git and GitHub for continuous delivery of extract, transform & load (ETL) outputs into Tableau & Power BI, designing dashboards that inform intelligent business decision-making

• Delivered Subject Matter Expertise (SME) guidance to a team of 4 on developing & designing a website within a 4-week period, increasing customer retention by 43% & future sales by 70%

• Reported to stakeholders weekly as the lead for the internal & engineering teams, delivering project updates while keeping the project on track, which increased work efficiency by 30%

• Solved ETL & analytical issues while following BI best practices, producing insights & ad hoc reports in Power BI that improved marketing strategy & grew the client's customer base by 33%

NOTEWORTHY PROJECTS - https://paragtetambe.com/Projects

Data Engineering, Serverless ETL & BI on Amazon Cloud (S3 Data Lake, Redshift DWH, Lambda, Glue – PySpark & Python, Athena, Data Pipeline, Redshift Spectrum, QuickSight)

• Synced data for an ecommerce company from a MySQL transactional database in AWS RDS into a Redshift DWH, and captured external data from other sources, transforming & centralizing Parquet data into a single infrastructure for generating reports, KPIs and metrics for decision-making, machine learning & analysis on AWS infrastructure

• Designed a data warehouse on AWS by loading historical & current data using AWS Data Pipeline, automated incremental hourly loads from an S3 bucket with Lambda triggers that call Glue jobs (PySpark & Python), implemented Redshift cluster snapshots for backup & made use of IAM & VPC endpoints for security, database creation, data validation and data governance

• Imported data from external sources using Glue crawlers into Amazon Athena & the S3 data lake, centralized data in Redshift Spectrum by optimizing & fine-tuning it to improve data enrichment & centralization for business analysis, and built cloud dashboards in QuickSight with charts & reports to support intelligent business decisions and surface Key Performance Indicators

Business Data Architecture, Data Warehousing and Business Intelligence (Tableau, Power BI, Qlik Sense, Talend, ER Studio Data Architect, SQL Server)

• Performed data modeling with SQL Server, Oracle, and ER Studio; integrated data from multiple systems into target dimensions & facts to create a multi-dimensional data warehouse; implemented Slowly Changing Dimensions (SCDs) & outriggers in Talend (star schema) for efficient, proper data storage (Kimball approach) for business analysis

• Executed error handling & currency conversion for 5M rows of data, reducing Master Job execution time by 22% using serialization & parallelization, completing within 20 minutes while ensuring data quality

• Built BI dashboards, visualizations and reports in Qlik Sense & Power BI based on online product sales, helping stakeholders derive better business strategies & improving business performance by 87%

Real-Time Social Media Analytics using Sentiment Analysis and Topic Modelling (Twitter Development, ETL, Python, Alteryx, SAS Enterprise Miner, Power BI)

• Streamed & extracted tweets in real time for 3 days using Python & the Twitter API, then pre-processed & integrated data from multiple CSV files using Alteryx to produce a dataset of 34,000+ records

• Performed text analysis to score the sentiment of tweet text, converted the dataset to Excel for topic modelling in SAS Enterprise Miner (classifying text into topics and terms), & loaded the dataset into Power BI to deliver key performance indicators (KPIs) on sales, reviews & performance through visualizations, reports & dashboards that answer BI questions

Big Data Engineering on Google Cloud Platform (Hadoop, Dataproc HDFS, BigQuery, Apache Airflow, Google Composer, Workflow, Sqoop, CloudSQL, Hive, Spark, Kafka, IoT)

• Performed Avro & Parquet MapReduce data ingestion into BigQuery/GCS using Sqoop on Dataproc, used Google Workflow to run Sqoop jobs instead of CI/CD Jenkins, and performed data orchestration & job automation with Airflow, creating dependencies between ETL & batch-processing tasks and making use of Spark Structured Streaming & Hive on Dataproc

• Used Kafka as a messaging queue and Cloud Storage as a streaming file source, wrote functions that trigger automatically when new files are imported, and used BigQuery as the Data Warehouse (DWH)

• Pulled data from Kafka topics, handled late data using watermarking, and applied aggregations & transformations in micro-batches to IoT streaming data (energy consumption from different home appliances) using a foreachBatch sink, sending real-time alerts back to Kafka topics with big data technologies

Data Architecture, ETL, Automated BI and Analytics (Azure Data Lake, Databricks, Data Factory, Cosmos DB, Function, Logic, Cognitive Services, Active Directory, Power BI)

• Built a cloud solution architecture using Azure data services to help Santa's IT team automate children's requests for presents sent via Twitter and email, using an ETL pipeline

• Detected emails & tweets containing a specific keyword and stored them in an Azure Data Lake by sentiment (Cognitive Services) using Logic Apps, created Functions & Data Factory orchestration for repeatability, automated a pipeline feeding Databricks directly from the data lake using triggers, & centralized the data to create reports & publish dashboards in Power BI

Dimensional Modeling, Data Warehousing and Business Intelligence using SQL Server Analysis Services (Data Mart, MOLAP Cubes, SSIS, SSAS, Excel PivotTable, Data Model, Power BI)

• Defined Project Charter, profiled & modeled data to create high-level detailed dimension model & ROLAP architecture in MS SQL Server Integration Services & SSAS (Visual Studio)

• Utilized multidimensional techniques and the Tabular model, with SSIS for the ETL process transforming the database into a data warehouse using star & snowflake schemas; developed Data Marts & Data Cubes (Inmon approach) to design reports with Excel Power Pivot & Power BI DAX functions, providing actionable insights & identifying trends while exploring big data issues

Cloud ETL, Data Pipelines, Data Visualization & Automated Machine Learning (Dataiku Data Science Studio - DSS, ETL, AWS Cloud Data Lake, Python, AutoML, Visualization)

• Loaded large datasets from an S3 bucket to perform exploratory data analysis on training data leveraging DSS visualization features, cleaned & prepared data using visual and code recipes in the ETL flow, and deployed the trained AutoML model after designing and testing it to predict new outcomes, building a scalable ETL data pipeline within 20 minutes

Relational Data Modeling & Database Management System (SSMS, SSDT, MSBI, SQL Server Integration Services - SSIS, Toad Data Modeler, OLTP, Relational Databases)

• Led a team of 3 to build a relational database system & loaded master data through SSIS, increasing efficiency by 60% and saving 5+ hours

• Modeled an ER diagram in Toad Data Modeler and normalized tables to 3NF to avoid transactional data redundancy, devised a consolidated view of subscribers incorporating various functionalities, and used complex queries to display data from 77,590+ records, saving 100+ hours and enabling users to analyze different plans

Azure Synapse Analytics (Data Lake - Synapse SQL On-demand, Azure Synapse Spark, Data Warehouse – Azure Synapse Pipelines, Synapse SQL Pools, Power BI, Spark, Azure Databricks)

• Explored the Azure data lake using SQL On-demand and Azure Synapse Spark or SQL, and used a pipeline with parallel activities to bring data into the data lake, transform it, and load it into an Azure Synapse SQL Pool, performing data wrangling to build a modern data warehouse with Azure Synapse Pipelines via data flows and data orchestrations

• Visualized Power BI reports inside Synapse Studio, used advanced integration between Synapse services and SQL Data Warehousing services to provide high-performance, governed workloads, & leveraged a model trained in Databricks to make predictions with T-SQL in a dedicated SQL pool database table using Azure Machine Learning automated ML

Data Engineering, ETL, Data Pipelines & AWS Redshift Data Warehouse with ETLeap (ETLeap, Amazon Redshift, AWS S3 bucket/Data Lake, AWS Glue Catalog, MySQL)

• Performed database migration from an on-prem SQL database to an AWS cloud data warehouse by creating data pipelines that extract, transform & load data from multiple sources into an Amazon Redshift data warehouse and S3 data lake, using ETLeap to create models for improving runtime & monitoring ETL data pipelines

EDUCATION

Master of Science in Management Information Systems, Northern Illinois University, May 2021
Bachelor of Engineering in Information Technology, Mumbai University, May 2018


