Arun Kumar
*************@*****.***
Professional Summary
Data Engineer / Spark Developer with 3+ years of IT industry experience designing, developing, and maintaining large-scale systems and applications, with significant expertise in Python 2.7, 3.2, and 3.6.
Experienced with the full software development life cycle, architecting scalable platforms, object-oriented programming, database design, and Agile methodologies.
Expert knowledge of and experience in object-oriented design and programming concepts.
Experience writing data processing frameworks for large-scale applications.
Hands-on experience integrating Spark with services such as S3, RDS, Snowflake, REST APIs, and Kafka.
Experience working closely with data analysts and data scientists to convert POCs into production-grade software.
Experience consuming data from Kafka into Spark micro-batches (an ingestion sketch follows this list).
Worked on Spark application tuning and resource allocation based on use case.
Experience in shell scripting, SQL Server, UNIX, and Linux.
Experience building applications on AWS infrastructure: CloudFormation templates, CloudWatch alarms, S3, RDS, security groups, VPCs, and EC2.
Familiar with JSON-based REST web services and Amazon Web Services.
Automated continuous integration and deployment using Jenkins, AWS CloudFormation templates, and Lambda.
Experience with web scraping using Beautiful Soup; built a personal flight-logistics application from the scraped data (a scraping sketch also follows this list).
Experience in project deployment using Jenkins and AWS services such as EC2, CloudFormation templates, S3, and CloudWatch.
Exposure to the ML/DL ecosystem and algorithmic approaches, along with their design and performance constraints.
Experience building tools in Java as part of personal development.
Exposure to the TensorFlow and PyTorch frameworks and to AI hardware.
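A minimal sketch of the Kafka-to-Spark micro-batch pattern referenced above, using PySpark Structured Streaming; the broker address, topic name, schema, and output paths are illustrative placeholders rather than any actual production configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructType

# requires the spark-sql-kafka connector on the classpath
spark = SparkSession.builder.appName("kafka-microbatch").getOrCreate()

# hypothetical event schema
schema = StructType().add("account_id", StringType()).add("amount", DoubleType())

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "transactions")                # placeholder topic
       .load())

# Kafka values arrive as bytes; decode and parse into columns
events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), schema).alias("e"))
             .select("e.*"))

# one micro-batch per minute, written to a placeholder S3 location
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/landing/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/")
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```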
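And a small sketch of the Beautiful Soup scraping approach mentioned above; the URL and CSS selectors are hypothetical stand-ins for whatever flight-listing page was actually scraped.

```python
import requests
from bs4 import BeautifulSoup

# placeholder URL; selectors below are illustrative, not from a real site
resp = requests.get("https://example.com/flights?from=ORD&to=JFK", timeout=10)
soup = BeautifulSoup(resp.text, "html.parser")

flights = []
for row in soup.select("div.flight-row"):
    flights.append({
        "carrier": row.select_one(".carrier").get_text(strip=True),
        "price": row.select_one(".price").get_text(strip=True),
        "departure": row.select_one(".depart-time").get_text(strip=True),
    })

print(flights)
```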
Professional Skills
Programming Languages: Python, Java, JavaScript
Web services: RESTful
Databases: Oracle 10g/11g, MySQL, SQL Server
IDEs and tools: Eclipse, PyCharm, NetBeans
OS & environments: Windows XP, Windows, Linux, UNIX, Ubuntu
Scripting: UNIX shell scripting
Version control: GitHub
Development Methodologies: Agile, Scrum
Hadoop: HDFS, MapReduce, Spark
Machine Learning: KNN, Gradient Descent, Back Propagation.
Professional Experience
Capital One, Chicago, IL
Role: Software Engineer / Data Engineer June 2020 to Present
Collaborate with and across Agile teams to design, develop, test, implement, and support technical solutions in full-stack development tools and technologies
Work with a team of developers with deep experience in machine learning, distributed microservices, and full stack systems
Utilize programming languages such as Java, Scala, and Python; open-source RDBMS and NoSQL databases; and cloud-based data warehousing services such as Snowflake
Stay on top of tech trends, experiment with and learn new technologies, participate in internal and external technology communities, and mentor other members of the engineering community
Collaborate with digital product managers and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
Perform unit testing and conduct reviews with other team members to ensure code is rigorously designed, elegantly coded, and effectively tuned for performance
Environment: Linux (RHEL 7), PySpark, Snowflake, Data Lake, Hadoop, Teradata, PSQL, Python, JSON, AWS, CI/CD, risk management, Apache Arrow, credit card fraud prevention, REST, GitHub, notebooks, PyTorch.
Caterpillar Innovation Center, Champaign, IL 10/2019 to 05/2020
Data Engineer II
Worked on an initiative to build a data lake and an ecosystem of data-intensive applications to enable real-time insight into telemetry data from manufacturing and on-field machines. Built pipelines to transform and feature-engineer time-series data for visualization.
Reviewed design documents for the data lake ecosystem and tested alternatives for empirical proof.
Developed Python modules to support and extend an in-built ETL tool for ETL process visualization and development.
Maintained an existing linear regression model on telemetry data from pressure sensors using IoT data (see the sketch after this section).
Built a framework for data objects over time-series data.
Environment: Linux (RHEL 7), Data Lake, Python, JSON, AWS, CI/CD, IoT, React JS, Seaborn, REST, GitHub, notebooks, PyTorch.
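A minimal sketch of the kind of time-series feature engineering and linear regression maintenance described above; the file name, column names, and features are hypothetical examples, not the actual Caterpillar pipeline.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# hypothetical telemetry extract for one pressure sensor
df = (pd.read_csv("pressure_sensor.csv", parse_dates=["timestamp"])
        .set_index("timestamp")
        .sort_index())

# simple time-series feature engineering: a rolling mean and a lagged reading
df["pressure_roll_1h"] = df["pressure"].rolling("1H").mean()
df["pressure_lag_1"] = df["pressure"].shift(1)
df = df.dropna()

X = df[["pressure_roll_1h", "pressure_lag_1"]]
y = df["pressure"]

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```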
Capital One, Richmond, VA 12/2017 to 09/2019
Data Engineer
Capital One maintains an ecosystem of internal data processing applications designed to prevent fraud of various types, including transactional and application fraud with dynamic fraud patterns. The ecosystem comprises multiple highly available Spark clusters running rules on Kafka streams and batch data.
Developed a multi-platform PySpark framework that creates Spark jobs and provides an SQL-like interface for data analysts; the application is used both to create Spark jobs and to productionize them.
Responsible for migrating findings from data scientists and fraud investigators into existing fraud defenses, and for overseeing data science approaches so that functionality can be converted to the production environment.
Designed data flows for new business needs and participated in architectural and workflow discussions.
Responsible for maintaining, allocating, and tuning resources for faster performance under heavy data loads.
Resolved the organization's big-data small-files issue using Spark/Hadoop configurations and dependency configurations.
Designed infrastructure configurations for Spark resource management suited to the data processing needs.
As part of the Capital One fraud prevention team, maintained existing rule engines, developed new ones, and worked on application resiliency strategies. Created new fraud defenses using existing data processing patterns.
Responsible for development and production data security, along with upgrading the system with new software and infrastructure features.
Converted existing traditional Teradata fraud defenses into a cloud architecture and Spark SQL.
Created QA data by performing data analysis in Databricks notebooks while maintaining entropy.
Configured Apache Arrow for columnar data processing in PySpark 2.3 (see the sketch after this section).
Migrated existing fraud defenses from Teradata to the PySpark environment and tuned existing Spark jobs for performance.
Created modules for switching the cluster stack (active and inactive) using Jenkins and AWS Lambda against EC2 clusters.
Converted a monolithic application into a pip-installable package and incorporated it into scheduled Jupyter notebook runs using papermill (a papermill sketch also follows this section).
Built a CI/CD pipeline using Jenkins and AWS Lambda, built a secret management system, and built redundancy catch-up in fraud case creation.
Upgraded the Spark/Hadoop version and participated in building a custom AssumeRole credential provider JAR for automatic renewal of Spark session assume-role credentials.
Environment: Linux (RHEL 7), PySpark, Data Lake, Hadoop, Teradata, PSQL, Python, JSON, AWS, CI/CD, risk management, Apache Arrow, credit card fraud prevention, REST, GitHub, notebooks, PyTorch.
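A minimal sketch of enabling Arrow-backed columnar processing in PySpark 2.3, as referenced above; the configuration flag is the Spark 2.3 name, and the DataFrame and UDF are illustrative only.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("arrow-columnar").getOrCreate()

# Spark 2.3 flag that enables Arrow for pandas conversions and pandas UDFs
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

@pandas_udf("double", PandasUDFType.SCALAR)
def to_dollars(cents):
    # the column arrives as a pandas Series via Arrow record batches
    return cents / 100.0

# illustrative data only
df = spark.createDataFrame([(1, 1050.0), (2, 2075.0), (3, 99.0)], ["id", "cents"])
df.withColumn("dollars", to_dollars("cents")).show()
```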
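And a small sketch of scheduling a notebook run with papermill, as mentioned above; the notebook paths and parameters are hypothetical placeholders.

```python
import papermill as pm

# execute a parameterized notebook and write the executed copy to an output path;
# paths and parameter names are placeholders
pm.execute_notebook(
    "notebooks/fraud_case_summary.ipynb",
    "output/fraud_case_summary_2019-09-01.ipynb",
    parameters={"run_date": "2019-09-01", "env": "qa"},
)
```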
WAFTS Solutions 01/2017 to 12/2017
Python Developer
Built a web application for human resource management for small businesses. Provided data insights for small to mid-sized businesses and presented regular findings from the data.
Responsible for the complete SDLC process: gathering requirements, system analysis, design, development, testing, and deployment.
Developed web applications and RESTful web services and APIs using Python, Django and PHP.
Worked on HTML5, CSS3, JavaScript, AngularJS, Node.js, Git, REST APIs, MongoDB, and IntelliJ IDEA.
Designed and set up MongoDB environments with shards and replica sets (Dev/Test and Production).
Developed and tested many dashboard features using Python, Bootstrap, and JavaScript.
Spearheaded adoption of responsive web design principles and converted existing websites to responsive layouts using Bootstrap.
Worked with the team to create an Ansible script to deploy Sonar on an Amazon instance.
Worked with Python OpenStack APIs, used a NoSQL database, and followed Python test-driven development techniques.
Developed SQL code used in automated processes to identify revenue opportunities and financial issues.
Mined large datasets in relational databases to find emerging issues and root causes in provisioning, marketing, and billing systems.
Drove timely resolution of marketing issues for a seamless customer experience.
Proactively monitored daily processes and results to ensure consistent coverage.
Created REST APIs for web services using the Python Flask and Django frameworks (a Flask sketch follows this section).
Environment: Ubuntu, Python, SQL, JSON, AWS, REST, Node.js, MongoDB, Jupyter Notebooks, GitHub.
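A minimal Flask sketch of the kind of REST endpoint described above; the resource name, routes, and in-memory store are illustrative placeholders (a real service would back this with a database such as MongoDB).

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# in-memory store for illustration only; a real service would use a database
employees = {}

@app.route("/api/employees", methods=["POST"])
def create_employee():
    record = request.get_json()
    employees[record["id"]] = record
    return jsonify(record), 201

@app.route("/api/employees/<emp_id>", methods=["GET"])
def get_employee(emp_id):
    return jsonify(employees.get(emp_id, {}))

if __name__ == "__main__":
    app.run(debug=True)
```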
Repulsor Technologies, India May 2015 to Jul 2015
Jr Web Developer
Developed static websites for small scale restaurants.
Designed book covers and posters for technology events at undergraduate colleges.
Environment: Windows, HTML, CSS, Adobe Photoshop
Education
Master of Science in Internet and Web Design, Wilmington University, Wilmington DE, 2017
Bachelor of Technology in Electronics and Communication Engineering, India, 2015