Arun Kumar
*************@*****.***
Professional Summary
Data Engineer / Spark Developer with 3+ years of IT industry experience designing, developing, and maintaining large-scale systems and applications, with significant expertise in Python 2.7, 3.2, and 3.6.
Experienced with the full software development life cycle, architecting scalable platforms, object-oriented programming, database design, and Agile methodologies.
Expert knowledge of and experience in object-oriented design and programming concepts.
Experience writing data processing frameworks for large-scale applications.
Hands-on experience integrating Spark with services such as S3, RDS, Snowflake, REST APIs, and Kafka.
Experience working closely with data analysts and data scientists to convert POCs into production-grade software.
Experience consuming data from Kafka into Spark micro-batches (an ingestion sketch follows this list).
Worked on Spark application tuning and resource allocation based on use case.
Experience in shell scripting, SQL Server, UNIX, and Linux.
Experience building applications on AWS infrastructure: CloudFormation templates, CloudWatch alarms, S3, RDS, security groups, VPCs, and EC2.
Familiar with JSON-based REST web services and Amazon Web Services.
Automated continuous integration and deployment using Jenkins, AWS CloudFormation templates, and Lambda.
Experience with web scraping using Beautiful Soup; built a personal flight-logistics application from the scraped data (a scraping sketch also follows this list).
Experience in project deployment using Jenkins and AWS services such as EC2, CloudFormation templates, S3, and CloudWatch.
Exposure to the ML/DL ecosystem and algorithmic approaches, along with their design and performance constraints.
Experience building tools in Java as part of personal development.
Exposure to the TensorFlow and PyTorch frameworks and to AI hardware.
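A minimal sketch of the Kafka-to-Spark micro-batch pattern referenced above, using PySpark Structured Streaming; the broker address, topic name, schema, and output paths are illustrative placeholders rather than any actual production configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructType

# requires the spark-sql-kafka connector on the classpath
spark = SparkSession.builder.appName("kafka-microbatch").getOrCreate()

# hypothetical event schema
schema = StructType().add("account_id", StringType()).add("amount", DoubleType())

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "transactions")                # placeholder topic
       .load())

# Kafka values arrive as bytes; decode and parse into columns
events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), schema).alias("e"))
             .select("e.*"))

# one micro-batch per minute, written to a placeholder S3 location
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/landing/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/")
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```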
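And a small sketch of the Beautiful Soup scraping approach mentioned above; the URL and CSS selectors are hypothetical stand-ins for whatever flight-listing page was actually scraped.

```python
import requests
from bs4 import BeautifulSoup

# placeholder URL; selectors below are illustrative, not from a real site
resp = requests.get("https://example.com/flights?from=ORD&to=JFK", timeout=10)
soup = BeautifulSoup(resp.text, "html.parser")

flights = []
for row in soup.select("div.flight-row"):
    flights.append({
        "carrier": row.select_one(".carrier").get_text(strip=True),
        "price": row.select_one(".price").get_text(strip=True),
        "departure": row.select_one(".depart-time").get_text(strip=True),
    })

print(flights)
```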
Professional Skills
Programming Languages: Python, Java, JavaScript
Web services: RESTful
Databases: Oracle 10g/11g, MySQL, SQL Server
IDEs and tools: Eclipse, PyCharm, NetBeans
OS & environments: Windows XP, Windows, Linux, UNIX, Ubuntu
Scripting: UNIX shell scripting
Version control: GitHub
Development Methodologies: Agile, Scrum
Hadoop: HDFS, MapReduce, Spark
Machine Learning: KNN, Gradient Descent, Back Propagation.
Professional Experience
Capital One, Chicago, IL
Role: Software Engineer / Data Engineer June 2020 to Present
Collaborate with and across Agile teams to design, develop, test, implement, and support technical solutions in full-stack development tools and technologies
Work with a team of developers with deep experience in machine learning, distributed microservices, and full stack systems
Utilize programming languages such as Java, Scala, and Python; open-source RDBMS and NoSQL databases; and cloud-based data warehousing services such as Snowflake
Stay on top of tech trends, experiment with and learn new technologies, participate in internal and external technology communities, and mentor other members of the engineering community
Collaborate with digital product managers and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
Perform unit testing and conduct reviews with other team members to ensure code is rigorously designed, elegantly coded, and effectively tuned for performance
Environment: Linux (RHEL 7), PySpark, Snowflake, Data Lake, Hadoop, Teradata, PSQL, Python, JSON, AWS, CI/CD, risk management, Apache Arrow, credit card fraud prevention, REST, GitHub, notebooks, PyTorch.
Caterpillar Innovation Center, Champaign, IL 10/2019 to 05/2020
Data Engineer II
Worked on an initiative to build a data lake and an ecosystem of data-intensive applications to enable real-time insight into telemetry data from manufacturing and on-field machines. Built pipelines to transform and feature-engineer time-series data for visualization.
Reviewed design documents for the data lake ecosystem and tested alternatives for empirical proof.
Developed Python modules to support and extend an in-built ETL tool for ETL process visualization and development.
Maintained an existing linear regression model on telemetry data from pressure sensors using IoT data (see the sketch after this section).
Built a framework for data objects over time-series data.
Environment: Linux (RHEL 7), Data Lake, Python, JSON, AWS, CI/CD, IoT, React JS, Seaborn, REST, GitHub, notebooks, PyTorch.
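A minimal sketch of the kind of time-series feature engineering and linear regression maintenance described above; the file name, column names, and features are hypothetical examples, not the actual Caterpillar pipeline.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# hypothetical telemetry extract for one pressure sensor
df = (pd.read_csv("pressure_sensor.csv", parse_dates=["timestamp"])
        .set_index("timestamp")
        .sort_index())

# simple time-series feature engineering: a rolling mean and a lagged reading
df["pressure_roll_1h"] = df["pressure"].rolling("1H").mean()
df["pressure_lag_1"] = df["pressure"].shift(1)
df = df.dropna()

X = df[["pressure_roll_1h", "pressure_lag_1"]]
y = df["pressure"]

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```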
Capital One, Richmond, VA 12/2017 to 09/2019
Data Engineer
Capital One maintains an ecosystem of internal data processing applications designed to prevent fraud of various types, including transactional and application fraud with dynamic fraud patterns. The ecosystem comprises multiple highly available Spark clusters running rules on Kafka streams and batch data.
Developed a multi-platform PySpark framework that creates Spark jobs and provides an SQL-like interface for data analysts; the application is used both to create Spark jobs and to productionize them.
Responsible for migrating findings from data scientists and fraud investigators into existing fraud defenses, and for overseeing data science approaches so that functionality can be converted to the production environment.
Designed data flows for new business needs and participated in architectural and workflow discussions.
Responsible for maintaining, allocating, and tuning resources for faster performance under heavy data loads.
Resolved the organization's big-data small-files issue using Spark/Hadoop configurations and dependency configurations.
Designed infrastructure configurations for Spark resource management suited to the data processing needs.
As part of the Capital One fraud prevention team, maintained existing rule engines, developed new ones, and worked on application resiliency strategies. Created new fraud defenses using existing data processing patterns.
Responsible for development and production data security, along with upgrading the system with new software and infrastructure features.
Converted existing traditional Teradata fraud defenses into a cloud architecture and Spark SQL.
Created QA data by performing data analysis in Databricks notebooks while maintaining entropy.
Configured Apache Arrow for columnar data processing in PySpark 2.3 (see the sketch after this section).
Migrated existing fraud defenses from Teradata to the PySpark environment and tuned existing Spark jobs for performance.
Created modules for switching the cluster stack (active and inactive) using Jenkins and AWS Lambda against EC2 clusters.
Converted a monolithic application into a pip-installable package and incorporated it into scheduled Jupyter notebook runs using papermill (a papermill sketch also follows this section).
Built a CI/CD pipeline using Jenkins and AWS Lambda, built a secret management system, and built redundancy catch-up in fraud case creation.
Upgraded the Spark/Hadoop version and participated in building a custom AssumeRole credential provider JAR for automatic renewal of Spark session assume-role credentials.
Environment: Linux (RHEL 7), PySpark, Data Lake, Hadoop, Teradata, PSQL, Python, JSON, AWS, CI/CD, risk management, Apache Arrow, credit card fraud prevention, REST, GitHub, notebooks, PyTorch.
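A minimal sketch of enabling Arrow-backed columnar processing in PySpark 2.3, as referenced above; the configuration flag is the Spark 2.3 name, and the DataFrame and UDF are illustrative only.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("arrow-columnar").getOrCreate()

# Spark 2.3 flag that enables Arrow for pandas conversions and pandas UDFs
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

@pandas_udf("double", PandasUDFType.SCALAR)
def to_dollars(cents):
    # the column arrives as a pandas Series via Arrow record batches
    return cents / 100.0

# illustrative data only
df = spark.createDataFrame([(1, 1050.0), (2, 2075.0), (3, 99.0)], ["id", "cents"])
df.withColumn("dollars", to_dollars("cents")).show()
```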
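And a small sketch of scheduling a notebook run with papermill, as mentioned above; the notebook paths and parameters are hypothetical placeholders.

```python
import papermill as pm

# execute a parameterized notebook and write the executed copy to an output path;
# paths and parameter names are placeholders
pm.execute_notebook(
    "notebooks/fraud_case_summary.ipynb",
    "output/fraud_case_summary_2019-09-01.ipynb",
    parameters={"run_date": "2019-09-01", "env": "qa"},
)
```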
WAFTS Solutions 01/2017 to 12/2017
Python Developer
Built a web application for human resource management for small businesses. Provided data insights for small to mid-sized businesses and presented regular findings from the data.
Responsible for the complete SDLC process: gathering requirements, system analysis, design, development, testing, and deployment.
Developed web applications and RESTful web services and APIs using Python, Django and PHP.
Worked on HTML5, CSS3, JavaScript, AngularJS, Node.js, Git, REST APIs, MongoDB, and IntelliJ IDEA.
Designed and set up MongoDB environments with shards and replica sets (Dev/Test and Production).
Developed and tested many dashboard features using Python, Bootstrap, and JavaScript.
Spearheaded adoption of responsive web design principles and converted existing websites to responsive layouts using Bootstrap.
Worked with the team to create an Ansible script to deploy Sonar on an Amazon instance.
Worked with Python OpenStack APIs, used a NoSQL database, and followed Python test-driven development techniques.
Developed SQL code used in automated processes to identify revenue opportunities and financial issues.
Mined large datasets in relational databases to find emerging issues and root causes in provisioning, marketing, and billing systems.
Drove timely resolution of marketing issues for a seamless customer experience.
Proactively monitored daily processes and results to ensure consistent coverage.
Created REST APIs for web services using the Python Flask and Django frameworks (a Flask sketch follows this section).
Environment: Ubuntu, Python, SQL, JSON, AWS, REST, Node.js, MongoDB, Jupyter Notebooks, GitHub.
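A minimal Flask sketch of the kind of REST endpoint described above; the resource name, routes, and in-memory store are illustrative placeholders (a real service would back this with a database such as MongoDB).

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# in-memory store for illustration only; a real service would use a database
employees = {}

@app.route("/api/employees", methods=["POST"])
def create_employee():
    record = request.get_json()
    employees[record["id"]] = record
    return jsonify(record), 201

@app.route("/api/employees/<emp_id>", methods=["GET"])
def get_employee(emp_id):
    return jsonify(employees.get(emp_id, {}))

if __name__ == "__main__":
    app.run(debug=True)
```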
Repulsor Technologies, India May 2015 to Jul 2015
Jr Web Developer
Developed static websites for small scale restaurants.
Designed book covers and posters for technology events at undergraduate colleges.
Environment: Windows, HTML, CSS, Adobe Photoshop
Education
Master of Science in Internet and Web Design, Wilmington University, Wilmington DE, 2017
Bachelor of Technology in Electronics and Communication Engineering, India, 2015