Data Analyst/Data Engineer/BI Engineer, Python, R, Tableau, SQL, Hadoop

Location:

Hyattsville, MD

Posted:

January 27, 2023

Resume:

Jalvi Sheta

202-***-**** ● Hyattsville, MD ****3● ************@*****.***

https://www.linkedin.com/in/jalvi-sheta-834225124/

SUMMARY

● 3+ years of IT experience focused on data warehousing, data modeling, data analysis, data integration, data migration, ETL processes, and business intelligence.

● Expertise in designing and developing scalable Big Data solutions, data warehouse models on large-scale distributed data, performing a wide range of analytics to measure service performance.

● Designed and developed secure and scalable ETL Big data pipelines on the Hadoop ecosystem for diverse use cases.

● Worked on automated ETL processes using SSIS and SQL Server scripts.

● Supported the design, development, and ongoing support of Data Lake and Delta Lake environments.

● Developed advanced single response processing ETL scripts using SQL Server and Python for loading data from Qualtrics surveys into the data warehouse: reduced data refresh time by 2 hours

● Developed solutions using Spark SQL, Spark streaming, and Kafka to process web feeds and server logs.

● Worked with Hive and Pig for data analysis, with hands-on experience in Spark SQL.

● Experience developing large-scale batch and real-time data pipelines with data processing frameworks like Apache Storm, Flink, Spark, and Kafka on AWS.

● Development experience using AWS cloud technologies such as EC2, S3, EMR, VPC, Lambda, EBS, Redshift, Glue, and Athena.

● Created automated jobs and visualizations using Python to analyze Business KPIs

● Hands-on experience in the Hadoop ecosystem including Spark, Kafka, HBase, Hive, Pig, Sqoop, Oozie, Storm.

● Experience developing reports and dashboards using Visualization/Reporting tools like Power BI, Tableau and Spotfire.

● Experience using workflow management and scheduler tools like Apache Airflow, Oozie, and Autosys.

● Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Athena, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS, Lambda, Glue, Redshift, and other AWS services.

● Worked on data standardization, analysis and profiling, cleansing, parsing, and validation against business rules; improved data quality by eliminating and merging redundant data.

● Experience in Data Modeling using Dimensional Data Modeling techniques like Star Schema and Snowflake Modeling.

● Actively learning implementation of Data science concepts like Machine learning, Deep learning, NLP using Python and Anaconda Jupyter Notebooks.

TECHNICAL SKILL SET

Operating Systems UNIX, MAC OS X, Windows

Programming Languages Python, C, R, SQL, PL/SQL, HTML and XML

Databases Oracle, SQL Server, PostgreSQL, MySQL, MongoDB, Cassandra, HBase, Redshift

Big Data Ecosystem Hadoop, MapReduce, Pig, Hive, Sqoop, YARN, Spark, Storm, Kafka, HBase, PySpark, NiFi, Airflow

ETL Tools Informatica Power Center, Informatica IDQ, SSIS, Alteryx

Reporting Tools Power BI, Tableau

Version Control TFS, Git, SVN

EDUCATION

University of Maryland, Robert H. Smith School of Business College Park, MD, USA

Master of Science in Information Systems Dec 2020

WORK EXPERIENCE

Haynes & Company Washington DC, USA

Data Engineer Sep 2020 – Present

● Transform raw, unstructured data into well-structured, normalized data models for end-user consumption in Business Intelligence reporting tools

● Assist with activities required to launch new data related projects such as Data Mapping, Data Modeling and establishing Data Dictionaries

● Develop advanced single response processing ETL scripts using SQL Server and Python for loading data from Qualtrics surveys into the data warehouse: reduced data refresh time by 2 hours

● Design, build, test, and maintain large-scale batch and real-time data pipelines.

● Supported clients in building scalable solutions in the Hadoop ecosystem for diverse use cases.

● Optimize existing ETL pipelines to improve reliability and performance, adding data quality checks and alerting and improving SLA landing times for Tier-0 pipelines.

● Involved in loading data into HDFS and preprocessing it with Pig

● Hands-on experience in the Hadoop ecosystem including Spark, Kafka, HBase, Hive, Pig, Sqoop, Oozie, Storm

● Create automated client reports using SQL and Tableau: increased data load speed by 35%

● Work with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Athena, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS, Lambda, Glue, Redshift, and other AWS services

● Work with clients to create efficient Qualtrics surveys to enhance vendor experience and reduce survey time

● Created and tested automated jobs and visualizations using Python to analyze Business KPIs

● Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.

● Implemented a generic, highly available ETL framework for bringing related data into Hadoop from various sources using Spark.

● Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and PySpark concepts.

● Involved in error handling, debugging, and troubleshooting sessions using the Session logs, Debugger, and Workflow Monitor.

CU-Rise Analytics Pvt. Ltd. Ahmedabad, GJ, India

Data Analyst Jan 2019 – Jul 2019

● Collaborated with a team of five members to build a data warehouse called Data Analytical Model

● Worked on finance data to gain insights about the customers of credit unions.

● Implemented the model on the servers of various credit union clients, working with client data to create the data warehouse

● Analyzed 2000+ consumer surveys to evaluate customer satisfaction rates.

● Built Tableau dashboard to visualize core business KPIs, saving 12 hours of manual reporting work.

● Formulated SQL Server Integration Services packages; designed Extract-Transform-Load processes on client database to clean data and bring it into a single format

● Created Data Model and automated jobs using SSIS and Visual Studio to execute the ETL processes on a daily basis; reduced the load time by at least 25%.

● Worked with clients on a daily basis and developed Power BI reports along with BI team to create final reports for them.

● Developed Convolutional Neural Network (CNN) and Deep Neural Network (DNN) models with the Python modules Keras and TensorFlow.

● Developed PySpark applications using Data frames and Spark SQL API for faster processing of data.

● Involved in the entire data science project life cycle and actively involved in all the phases including data extraction, data cleaning, statistical modeling, and data visualization with large data sets of structured and unstructured data.

● Developed SQL queries and data pipelines to extract, transform and load (ETL) the data into the final schema.

● Used best practices and complex statistical and machine learning techniques to build models that address business needs and improved accuracy of data and data-driven decisions.

● Improved application load time of the manage downtime screen by 50% by using reusable fragments and enhanced application performance by ~25% by using helper classes within the Fiori screens

Motadata Ahmedabad, GJ, India

Machine Learning Intern May 2018 – Jun 2018

● Led a team of three interns to develop a forecasting model for predicting live CPU readings obtained through Datadog software

● Directed research into different algorithms; compared their efficiency and selected the Holt-Winters algorithm for forecasting, with accuracy of around 85%

● Performed data manipulation, data preparation, normalization, and predictive modeling. Improved efficiency and accuracy by evaluating models in Python.

● Generated reports and visualizations based on the insights and developed dashboards for the company insight teams.

● Wrote clean and performant code to train ML models, focusing on throughput, stability, and ML metrics

● Set up testing and engineering best practices for the research engineering team

● Collaborated with ML scientists and user researchers to unlock powerful new capabilities on which to build interactions

● Developed training and testing pipelines to assess the performance of these architectures on relevant image processing tasks

● Helped the Data Science team scale by building out a platform for training, deploying, and monitoring machine learning models in production

PROJECT EXPERIENCE

Cricket World Cup Predictive Model (Python: NumPy, pandas, Matplotlib, Plotly)

● Cleaned and worked on datasets containing batsman, bowler, and stadium data that formed the basis of the analysis

● Visually analyzed the performance of each team and its players over a period of four years to come up with a generic predictive model for the ICC World Cup Tournament

Sentiment Analysis (Anaconda, Python)

● In a group of two, implemented a model that classifies comments into binary or multiclass categories

● Compared multiple techniques to achieve better results; pre-processed text datasets by removing all stop words and applying stemming before feeding them into classifiers
