NAGENDRA SAI
Phone: 469-***-****
Email: *********@*********-***********.***
Senior Data Engineer
Professional Summary:
•Around 8 years of experience as a Data Engineer with a strong focus on analytical programming: data analysis and visualization, ETL/ELT processes, AWS/Azure cloud technologies, the Hadoop ecosystem, and data migration projects. Results-oriented, proactive, problem-solving engineer with leadership abilities who adapts to tight release dates and delivers.
•Profound experience in data ingestion and data processing (transformations, enrichment, and aggregations). Strong knowledge of distributed-systems architecture and parallel processing, with an in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
•Experienced in using Spark to improve the performance and optimization of existing Hadoop algorithms with SparkContext, Spark SQL, the DataFrame API, Spark Streaming, MLlib, and pair RDDs; worked extensively in PySpark and Scala (a short sketch follows this summary).
•Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
•Implemented security requirements for Hadoop and integrated with the Kerberos authentication infrastructure: KDC server setup, realm/domain creation, and ongoing management.
•Hands-on experience handling database issues and connections with SQL and NoSQL databases such as MongoDB, HBase, Cassandra, SQL Server, and PostgreSQL. Created Java applications to handle data in MongoDB and HBase, and used Apache Phoenix to provide a SQL layer on HBase.
•Experience in designing and creating RDBMS tables, views, user-defined data types, indexes, stored procedures, cursors, triggers, and transactions.
•Expert in designing ETL data flows: created mappings and workflows to extract data from SQL Server and performed data migration and transformation from Oracle, Access, and Excel sheets using SQL Server SSIS.
•Experienced in using Kafka: responsible for building and maintaining message configurations and flows and providing issue analysis on Kafka applications.
•Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
•Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services of the AWS family.
•Created and configured new batch jobs in the Denodo scheduler with email notification capabilities; implemented cluster settings for multiple Denodo nodes and set up load balancing to improve performance.
•Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications using tools such as Git, Terraform, and Ansible.
•Experienced with JSON-based RESTful web services and XML-based SOAP web services; worked on various applications using Python IDEs such as Sublime Text and PyCharm.
•Developed web-based applications using Python, Django, Qt, C++, XML, CSS3, HTML5, DHTML, JavaScript, and jQuery.
•Built and productionized predictive models on large datasets using advanced statistical modeling, machine learning, and other data mining techniques.
•Developed algorithms based on deep-dive statistical analysis and predictive data modeling that were used to deepen relationships, strengthen longevity, and personalize interactions with customers.
•Handled Machine Learning model development and data engineering using Spark and Python.
•Applied statistical techniques and big data technologies using Spark to solve business challenges.
•Provided technical leadership and guidance to interns on Spark project-related activities.
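A minimal PySpark sketch of the Spark SQL / DataFrame API work described above, assuming a Hive-backed Spark session; the table, column, and output path names are illustrative placeholders, not from any specific project.

from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily_aggregation")
    .enableHiveSupport()   # needed so spark.table() can read Hive tables
    .getOrCreate()
)

# Project only the needed columns early to cut shuffle volume.
orders = spark.table("warehouse.orders").select("customer_id", "order_date", "amount")

# Aggregate with the DataFrame API rather than a hand-written MapReduce job.
daily_totals = (
    orders.groupBy("customer_id", "order_date")
          .agg(F.sum("amount").alias("daily_amount"))
)

# Cache because the result feeds several downstream reports, then persist as Parquet.
daily_totals.cache()
daily_totals.write.mode("overwrite").parquet("/data/marts/daily_totals")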
Technical Skills:
Programming: Python, Shell scripting, PySpark, SQL
Frameworks/Tools: Kedro, dbt, Hadoop MapReduce, Informatica PowerCenter 10.2, IICS
Python Libraries: NumPy, Pandas, Matplotlib, urllib2
Databases: MySQL, DB2, AWS Redshift, NoSQL (MongoDB), HDFS, Hive
Version Control: Git, GitHub
Operating Systems: UNIX, Linux, Windows, macOS
Methodologies: Agile, ETL/ELT
Cloud Technologies: AWS, Azure, GCP
Education:
Master of Science in Computer Science, University of Southern Arkansas.
Professional Experience:
Frost Bank, San Antonio, Texas Oct 2021 – Present
Data Engineer
Responsibilities:
•Participated in architecture and requirements analysis discussions.
•Collaborated with internal and external business partners to gather requirements.
•Involved in designing the MVP plan for data migration from the Hadoop ecosystem to the AWS cloud.
•Deployed an AWS DataSync agent near the HDFS cluster hosted on an Azure VM to test the data copy process from HDFS to AWS S3.
•Developed technical design documents, target-to-source mapping documents, and mapping specification documents based on business requirements.
•Extensively worked on Informatica PowerCenter.
•Unpacked the existing mapping logic in Informatica PowerCenter to recreate the mappings in IICS.
•Worked on Informatica PowerCenter - Source Analyzer, Mapping Designer, Workflow Manager, Mapplet Designer and Transformation Developer.
•Created complex mappings using unconnected and connected Lookup, Aggregator, and Router transformations to populate target tables efficiently.
•Created mapplets and worklets and reused them in mappings and workflows.
•Developed Python 3 scripts to trigger the Informatica workflows (a short sketch follows this role's skills list).
•Used Hue to interact with the Hadoop Ecosystem.
•Experienced in using Hive to process and analyze data.
•Adept at following Agile methodology and successful in meeting fast-paced bi-weekly sprint deadlines.
Technical Skills: Python 3, DB2, AWS S3, AWS Redshift, AWS DataSync, AWS EC2, Control-M, Linux, shell, HDFS, Hue, Hive, Azure VM, Informatica PowerCenter 10.2, IICS, SQL, JIRA, Cognos, Tableau, PyCharm, Jupyter Notebooks.
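A minimal sketch of how such a Python 3 trigger script can look, assuming the Informatica pmcmd command-line utility is on the PATH; the service, domain, folder, and workflow names are placeholders.

import os
import subprocess
import sys

def start_workflow(workflow: str, folder: str = "DEMO_FOLDER") -> int:
    """Start an Informatica workflow via pmcmd and wait for it to finish."""
    cmd = [
        "pmcmd", "startworkflow",
        "-sv", "INT_SVC_DEV",            # integration service name (placeholder)
        "-d", "DOMAIN_DEV",              # Informatica domain (placeholder)
        "-u", os.environ["INFA_USER"],   # credentials pulled from the environment
        "-p", os.environ["INFA_PASS"],
        "-f", folder,                    # repository folder holding the workflow
        "-wait",                         # block until the workflow completes
        workflow,
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    sys.exit(start_workflow("wf_load_customer_dim"))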
Vanguard, Charlotte, NC Nov 2020 – Sep 2021
Data Engineer
Responsibilities:
•Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
•Performed data analysis, data migration, data cleansing, transformation, integration, data import, and data export using Python.
•Developed Python Pandas jobs to extract data from S3 folders, perform data quality checks, and write the target data back to AWS S3.
•Improved ETL pipeline performance by 90% by reimplementing Pandas jobs as PySpark jobs.
•Debugged the transformed output data for inconsistencies and optimized the datasets.
•Developed PySpark DataFrames to speed up data processing for very large datasets (a short sketch follows this role's skills list).
•Created technical mapping documents.
•Involved in file movement between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
•Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using Power BI.
•Developed visualizations and dashboards using Power BI.
•Used ELT to implement slowly changing dimension transformations to maintain historical data in the data warehouse.
•Implemented ETL jobs on the Kedro framework using Pandas to transform datasets loaded from AWS S3 buckets and write the output files back to AWS S3.
•Experienced in creating dbt jobs with Redshift as the data warehouse.
•Pushed code to GitHub for peer review to optimize the code and gather feedback.
Technical Skills: Python 3, Pandas, PySpark, GitHub, dbt, ETL/ELT, HDFS, Kedro, AWS S3, AWS Redshift, Snowflake, Shell, Linux, ALM, Power BI, PyCharm, Jupyter Notebooks.
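A minimal PySpark sketch of the S3 read / quality check / write pattern described above, assuming a Spark session with S3 access configured; the bucket, paths, and column rules are placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3_quality_check").getOrCreate()

# Read raw CSV files landed in S3 (placeholder bucket and prefix).
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/customers/")

# Basic quality checks: drop exact duplicates and rows missing the key column.
clean = raw.dropDuplicates().filter(F.col("customer_id").isNotNull())

# Write the curated output back to S3 as Parquet, partitioned by load date.
clean.write.mode("overwrite").partitionBy("load_date").parquet(
    "s3://example-bucket/curated/customers/"
)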
Coveo Info Solutions, India Jan 2017 – Jun 2019
Data Engineer
Responsibilities:
•Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
•Performed data ingestion into HDFS using tools such as Sqoop and HDFS client APIs.
•Created custom Python/shell scripts to import data from Oracle databases via Sqoop (a short sketch follows this role's skills list).
•Performed Data Analysis using Hive.
•Created Hive external tables, added partitions, and worked to improve Hive performance.
•Identified and validated data between source and target applications.
•Worked extensively with flat files, loading them into on-premises applications and retrieving data from applications into files.
•Verified data consistency between systems.
•Analyzed and cleansed raw data using HiveQL.
•Performed data transformations using MapReduce and Hive for different file formats.
•Involved in converting Hive/SQL queries into transformations using Python.
•Performed complex joins on Hive tables with various optimization techniques.
•Created Hive tables as per requirements: internal or external tables defined with appropriate static and dynamic partitions for efficiency.
•Worked extensively with Hive DDLs and Hive Query Language (HQL).
•Involved in loading data from edge node to HDFS using shell scripting.
•Understood and managed Hadoop log files.
•Managed Hadoop infrastructure with Cloudera Manager.
•Created and maintained technical documentation for launching Hadoop cluster and for executing Hive queries.
Technical Skills: Big Data ecosystems, Hadoop, HDFS, Hive, Cloudera, MapReduce, UNIX scripts, flat files, XML files, PyCharm, Jupyter Notebooks, Python, Pandas.
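A minimal sketch of the Sqoop-import and Hive external-table pattern described above, driven from Python; the JDBC URL, schema, table, and HDFS paths are placeholders.

import subprocess

# Import an Oracle table into HDFS with Sqoop (placeholder connection and paths).
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB",
    "--username", "etl_user",
    "--password-file", "/user/etl/.oracle_pwd",   # password kept in HDFS, not on the command line
    "--table", "SALES.ORDERS",
    "--target-dir", "/data/raw/orders/load_dt=2019-01-01",
    "--num-mappers", "4",
    "--as-textfile",
]
subprocess.run(sqoop_cmd, check=True)

# Expose the imported files through a partitioned Hive external table.
hive_ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS raw.orders (
  order_id BIGINT,
  customer_id BIGINT,
  amount DOUBLE
)
PARTITIONED BY (load_dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/orders';

ALTER TABLE raw.orders ADD IF NOT EXISTS
  PARTITION (load_dt='2019-01-01') LOCATION '/data/raw/orders/load_dt=2019-01-01';
"""
subprocess.run(["hive", "-e", hive_ddl], check=True)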
iAppsoft Solutions, India Aug 2014 – Dec 2017
Python Developer
Responsibilities:
•Participated in the entire lifecycle of projects, including design, development, deployment, testing, implementation, and support.
•Wrote Python routines to log in to websites and fetch data for selected options.
•Used Python modules such as requests, urllib, and urllib2 for web crawling (a short sketch follows this role's skills list).
•Performed Design, involved in code reviews and wrote unit tests in Python.
•Updated site with JavaScript, jQuery, Python, Django, and SQL.
•Created new SSRS reports based on Teradata and SQL data sources.
•Created SSIS packages to enable SSRS reporting.
•Developed tools using Python, shell scripting, and XML to automate repetitive tasks; interfaced with supervisors, artists, systems administrators, and production staff to ensure production deadlines were met.
•Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
Technical Skills: Python, Django, PyUnit, PyQt, jQuery, JavaScript, HTML, CSS, XML, JSON, AJAX, Shell scripting, GitHub, Agile, Jira, SQL, SSIS, T-SQL, SQLite3, and Windows.
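A minimal sketch of the kind of login-and-fetch routine described above, using the requests module; the URLs, form fields, and credentials are placeholders.

import requests

def fetch_report(option: str) -> str:
    """Log in to a site, then fetch the report page for the selected option."""
    with requests.Session() as session:
        # Post credentials to the (placeholder) login endpoint; the session keeps the cookies.
        session.post(
            "https://example.com/login",
            data={"username": "report_bot", "password": "change-me"},
            timeout=30,
        )
        # Fetch the data page for the chosen option and return its HTML.
        resp = session.get(
            "https://example.com/reports",
            params={"option": option},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.text

if __name__ == "__main__":
    print(fetch_report("daily_summary")[:500])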