
Data Engineer Open Source

Location:
Hyderabad, Telangana, India
Posted:
November 16, 2023


Resume:

SHASHANK

Data Engineer (470-***-****)

ad07ph@r.postjobfree.com

PROFESSIONAL SUMMARY

●Around 8 years of IT experience across a variety of industries, including hands-on experience with Hadoop, MapReduce, Hive, Spark (PySpark), Sqoop, Airflow, Autosys, Snowflake, Teradata, Oracle, RDBMS, Python, Scala, Cassandra, and data visualization.

●6+ years of hands-on experience building ETL pipelines, visualizations, and analytics-based quality solutions in-house using AWS and open-source frameworks.

●Expertise in coding across multiple technologies, including Python, Scala, Java, and Linux shell scripting.

●Well versed in Amazon Web Services (AWS) cloud services such as EC2, EMR, S3, IAM, and CloudWatch, as well as Microsoft Azure and Google Cloud Platform.

●Worked on APIs with Spring Boot and Node.js applications, as well as Elasticsearch.

●Experienced with Airflow and Managed Airflow Platform workflows, automating jobs in Python on cron-based schedules (see the illustrative sketch at the end of this summary).

●Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.

●Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, PySpark, Spark SQL, DataFrames, pair RDDs, Spark on YARN, the Resource Manager, memory tuning, data serialization, and memory management.

●Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).

●Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries.

●Experience in process improvement, normalization/denormalization, data extraction, data cleansing, and data manipulation in Hive.

●Strong knowledge of dimensional star schema, snowflake schema, and Data Vault methodologies for developing data marts.

●Experience testing code and microservices using various open-source frameworks in Go (Golang).

●In-depth understanding of MapReduce Framework and Spark execution model.

●Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.

●Experience writing Sqoop commands to import data from relational databases into Hive.

●Experienced in performing analytics on structured and unstructured data using Hive queries.

●Experience handling different file formats, such as text files, SequenceFiles, Avro, and Parquet, using different SerDes in Hive.

●Proficient with columnar file formats such as Parquet, RCFile, and ORC.

●Expert in writing complex SQL queries and performing database analysis for good performance.

●Experience with SQL Server and Oracle databases and with writing queries against them.

●Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.

●Extensively used Pig for data cleansing.

●Developed Pig UDFs to pre-process data for analysis.

●Developed workflows in Oozie to automate loading data into HDFS and preprocessing it with Pig.

●Used Python scripts to build workflows in Autosys to automate tasks.

●Worked closely with business teams to gather and fully understand business requirements.

●Involved in the development of new features, such as Platform Upgrade Testing and Data Snowflake Telemetry, as part of Engineering Excellence.
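
For illustration, a minimal Airflow sketch of the cron-scheduled job automation described above. The DAG name, script path, and schedule are hypothetical placeholders, not taken from any of the engagements listed below.

# Illustrative Airflow DAG: run a PySpark batch job on a cron expression
# (the kind of expression typically built with crontab.guru).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_ingest_pipeline",        # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="30 2 * * *",        # daily at 02:30, cron syntax
    catchup=False,
) as dag:
    # Submit a PySpark script; spark-submit must be on the worker's PATH.
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit /opt/jobs/daily_ingest.py --run-date {{ ds }}",
    )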

TECHNICAL SKILLS

●Big data technologies: Hadoop, MapReduce, Hive, Presto, Spark (PySpark), Sqoop, Airflow, Autosys, Snowflake, Teradata, Oracle, RDBMS, Python, Scala, Kafka, ETL, and data visualization.

●Database/Data Warehouse: Oracle, SQL Server, Snowflake, Teradata, PostgreSQL

●Programming languages: Python, Scala, Java, and C/C++

●Scripting Languages: Linux shell scripting

●CI/CD Tools: Jenkins, GitHub, and Jira

●IDE & Tools: Eclipse, PyCharm, Hue, Greenplum, PuTTY, Autosys, WinSCP, and Athena

●Cloud Platforms: GCP, AWS, Cloudera, and Microsoft Azure

CERTIFICATIONS AND ONLINE COURSES

●The Linux Command Line Bootcamp: Beginner To Power User (Udemy)

●2022 Complete Python Bootcamp From Zero to Hero in Python (Udemy)

●Microsoft SQL from A to Z (Udemy)

PROFESSIONAL EXPERIENCE

Client: Fiserv  Feb 2023 - Present

Role: Data Engineer
Responsibilities:

●Built complex terabyte-scale data pipelines and presentation layers using Cascading for consumption by loan performance analysis models.

●Extracted TBs of data from SQL Server and Teradata to One Lake.

●Worked with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as Parquet and CSV files (see the illustrative sketch at the end of this section).

●Implemented Spark using Scala and Spark SQL for faster testing and processing of data, and was responsible for managing data from different sources.

●Implemented Data Vault modeling for creation of the EDW and used GitHub for code versioning.

●Solid experience with and understanding of architecting, designing, and operationalizing large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.

●Loaded data through various pipelines into Snowflake, segregated it into stages, and built curated tables to feed Tableau dashboards that present key metrics to the business.

●Assisted in migrating infrastructure from EC2 to EMR with the latest Hadoop stack.

●Built event-driven and scheduled data pipelines to ingest data into operational and analytical databases from various sources.

●Involved in implementing microservices based on RESTful APIs using Go (Golang).

●Enhanced and optimized product Spark code to aggregate, group, and run data-mining tasks using the Spark framework, and handled JSON data.

●Worked on a product team using Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the client's big data platform.

●Wrote data APIs and multi-server applications to meet product needs using Golang.

●Developed Spark code using Python and Spark SQL for faster testing and data processing.

●Responsible for batch processing of data sources using Apache Spark. Developed predictive analytics using Apache Spark APIs.
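
For illustration, a minimal PySpark sketch of the Parquet/CSV processing described above, written in Python rather than Scala; the bucket, paths, and column names are hypothetical placeholders.

# Illustrative PySpark job: read Parquet and CSV sources, join and aggregate
# with Spark SQL, and write the result back as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("loan_metrics_example").getOrCreate()

loans = spark.read.parquet("s3://example-bucket/curated/loans/")   # hypothetical path
rates = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("s3://example-bucket/reference/rates.csv"))          # hypothetical path

loans.createOrReplaceTempView("loans")
rates.createOrReplaceTempView("rates")

summary = spark.sql("""
    SELECT l.portfolio,
           r.rate_tier,
           COUNT(*)       AS loan_count,
           SUM(l.balance) AS total_balance
    FROM loans l
    JOIN rates r ON l.rate_id = r.rate_id
    GROUP BY l.portfolio, r.rate_tier
""")

summary.write.mode("overwrite").parquet("s3://example-bucket/presentation/loan_summary/")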

Environment: AWS, EMR, EC2, S3, Teradata, PySpark, Scala, Airflow, Python and shell scripting, Git, and SQL.

Client: Change Healthcare  Nov 2021 - Dec 2022
Role: Data Engineer

HEAPlus streamlines the application process for individuals and families by providing one electronic application that collects and records data, conducts a preliminary eligibility determination, and facilitates application submission for a range of health, nutrition and cash assistance programs.

Responsibilities:

●Developed and maintained multiple automated jobs that enroll applicants through the backend process, based on business requirements.

●Developed stored procedures and triggers to facilitate consistent data entry into the database.

●Created complex ETL packages using SSIS to extract data from staging tables into partitioned tables with incremental loads.

●Created reusable SSIS packages to extract data from multi-format flat files, Excel, and XML files into a SQL Server database.

●Worked on building an EDW for FSA using Data Vault modeling.

●Extensively used SSIS transformations and tasks such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task.

●Developed, deployed, and monitored SSIS Packages.

●Involved in a lift-and-shift cloud POC for migrating the database to the Azure cloud.

●Handled multiple projects, including data migrations, conversions, and interfaces.

●Performed data reconciliation, validation, and error handling after extracting data into SQL Server.

●Implemented RESTful web services in Go (Golang) for data transport between multiple systems.

●Worked on SSIS packages and DTS Import/Export for transferring data from databases (Oracle and text-format data) to SQL Server.

●Designed a data masking process to scramble PHI in order to set up the staging environment (see the illustrative sketch at the end of this section).

●Excellent technical and analytical skills, with a clear understanding of the design goals of ER modeling for OLTP systems and dimensional modeling for OLAP systems.

●Optimized query performance by modifying T-SQL queries, removing unnecessary cursors, eliminating redundant and inconsistent data, establishing joins, and creating indexes wherever necessary.

●Worked with application developers and cross-functional teams to design databases and tables to support development activities.

●Created SSRS reports using report parameters, drop-down parameters, and multi-valued parameters; debugged parameter issues; and built matrix reports and charts.

●Developed dashboard reports using Reporting Services and Report Models, and ad-hoc reports using Report Builder.

●Created report snapshots to improve the performance of SSRS.

●Managed, updated, and manipulated report layouts and structures using advanced Excel features, including pivot tables and VLOOKUPs.

●Actively worked with client to resolve Production tickets.
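
For illustration, a rough sketch of the kind of PHI scrambling used to build a masked staging copy. The resume does not specify the tooling (the project environment was SQL Server/SSIS), so this sketch uses Python with pandas and SQLAlchemy; the connection strings, table, and column names are hypothetical placeholders.

# Illustrative PHI masking: read a table, scramble sensitive columns with a
# keyed hash, and write the masked copy into a staging database.
import hashlib
import hmac

import pandas as pd
from sqlalchemy import create_engine

SECRET_KEY = b"replace-with-a-managed-secret"             # hypothetical key source
PHI_COLUMNS = ["ssn", "first_name", "last_name", "dob"]   # hypothetical columns

def mask_value(value):
    """Deterministically scramble a PHI value with a keyed hash."""
    if pd.isna(value):
        return value
    digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

source = create_engine("mssql+pyodbc://user:pass@prod_dsn")      # hypothetical DSN
staging = create_engine("mssql+pyodbc://user:pass@staging_dsn")  # hypothetical DSN

applicants = pd.read_sql("SELECT * FROM dbo.Applicants", source)
for col in PHI_COLUMNS:
    applicants[col] = applicants[col].map(mask_value)

applicants.to_sql("Applicants", staging, schema="dbo", if_exists="replace", index=False)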

Environment: MS SQL Server 2008/2012/2016/2017, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), Microsoft Visual SourceSafe (VSS), TFS.

Bidmath, Hyderabad, IND  Nov 2020 - Aug 2021

Role: Data Engineer
Responsibilities:

●Worked on large data files using PySpark (Parquet-format files).

●Experience developing and building scripts in Python and shell.

●Good hands-on experience launching AWS EMR clusters and EC2 instances, and a good understanding of AWS IAM.

●Experienced in selecting the right nodes and instance types within account limits to increase ETL performance.

●Good understanding of storing data in S3 buckets and of performing reads, transformations, and actions on S3 data using Spark DataFrames and the Spark SQL context in PySpark (see the illustrative sketch at the end of this section).

●Experience with Spark on the Hadoop platform for processing billions of records using in-memory data processing.

●Experience extracting data from HDFS using Hive and Presto, and performing data analysis using Spark with Python.

●Hands-on experience with NoSQL, Cassandra, and Elasticsearch.

●Experience creating Hive external tables, with a good understanding of partitioning and bucketing techniques in Hive and of performing joins on Hive tables.

●Involved in designing Hive schemas using performance-tuning techniques such as partitioning and bucketing.

●Worked on Spring Boot and Node.js APIs.

●Built common scripts to load data into final objects in Snowflake by creating external and managed tables, which helps product analysts and reporting developers read and extract data from Snowflake.

●Experience streaming real-time data using message-based topics between producers and consumers, and enabled Kafka in our data lakes.

●Used the Airflow scheduler for end-to-end data processing pipelines and workflow scheduling.

●Involved in the development of new features, such as Platform Upgrade Testing using CloudTrail data and Data Snowflake Telemetry, as part of Engineering Excellence.
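
For illustration, a minimal PySpark sketch of the S3 read/transform pattern and the partitioned external Hive table design mentioned above; the bucket, database, table, and column names are hypothetical placeholders.

# Illustrative PySpark job: read raw Parquet from S3, clean and partition it,
# then expose the curated data as a partitioned external Hive table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("s3_events_example")
         .enableHiveSupport()
         .getOrCreate())

events = spark.read.parquet("s3://example-bucket/raw/events/")   # hypothetical path
clean = (events
         .filter(F.col("event_type").isNotNull())
         .withColumn("event_date", F.to_date("event_ts")))

# Write partitioned Parquet back to S3, one directory per event_date.
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-bucket/curated/events/"))

# Register the curated data as a partitioned external table.
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
        event_id   STRING,
        event_type STRING,
        event_ts   TIMESTAMP
    )
    PARTITIONED BY (event_date DATE)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/curated/events/'
""")
spark.sql("MSCK REPAIR TABLE analytics.events")  # pick up the new partitions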

Environment: AWS, EMR, EC2, S3, IAM, PySpark, Scala, CloudTrail, CloudWatch, Presto, Airflow, Python and shell scripting, Teradata, Git, Jenkins, Athena, Kafka, Snowflake, and SQL.

Mediamint, Hyderabad, IND  June 2016 - Oct 2020

Role: Data Engineer
Responsibilities:

●Worked on the Reports module of the project as a developer on MS SQL Server 2008 (using SSRS, T-SQL, scripts, stored procedures, and views).

●Extracted data from many operational systems, including flat files, spreadsheets and RDBMSs like SQL Server and Oracle 8i.

●Gained hands-on experience with NoSQL, Cassandra, and the Kafka event mechanism.

●Worked on Spring Boot and Node.js APIs.

●Developed and automated ETL operations to extract data from multiple data sources, transform inconsistent and missing data into consistent and reliable data, and load it into the multidimensional data warehouse.

●Created reports from OLAP, subreports, bar charts, and matrix reports using SSRS.

●Deployed the SSRS reports to Microsoft Office SharePoint Server (MOSS) 2007.

●Worked on DTS/SSIS for transferring data from heterogeneous sources (Access databases and XML-format data) to SQL Server.

●Involved in data integration by identifying information needs within and across functional areas during an enterprise database upgrade and migration with the SQL Server export utility.

●Performance tuning of SQL queries and stored procedures using SQL Profiler and Index Tuning Wizard.

●Worked with application developers and cross-functional teams to design databases and tables to support development activities.

●Managed, updated, and manipulated report layouts and structures using advanced Excel features, including pivot tables and VLOOKUPs.

Environment: SQL Server 2005/2008, DTS, Microsoft Business Intelligence Development Studio, VBScript, SQL queries, stored procedures, Office, Excel, SSRS, SSIS.


