Ajith Reddy
**********.***@*****.***
SUMMARY:
6+ years of experience in the Software Development Life Cycle and major aspects of technology projects, including Requirement Analysis, Design, Development, Database Design, Deployment, Testing, Debugging, and client-server enterprise applications using Python.
Extensive expertise in developing Spark data pipelines using modules such as Spark Core, Spark SQL, and Spark Streaming with PySpark and Scala.
Excellent knowledge of Azure cloud concepts like Azure Databricks, Data Lake Storage, SQL Database, Azure Storage, Azure Virtual Machines, and Azure Key Vault.
Excellent knowledge of Amazon cloud concepts like VPC, EC2, S3, CloudWatch, CloudTrail, CloudFormation, IAM, Redshift, RDS, SQS, Snowflake, and Lambda.
Expertise in various components of the Hadoop Ecosystem - MapReduce, Hive, Pig, Sqoop, Impala, Flume, Oozie, HBase, MongoDB, Cassandra, Spark, Kafka, YARN.
Excellent working experience in Big Data integration and analytics based on Hadoop, Spark, Kafka, Storm, and webMethods technologies.
Highly proficient in writing complex SQL queries, stored procedures, and triggers, with strong experience in PL/SQL and T-SQL.
Experience working with relational databases like Oracle, MySQL, and PostgreSQL.
Expertise in full life cycle application development, with good experience in unit testing, Test-Driven Development (TDD), and Behavior-Driven Development (BDD).
Strong experience in DevOps environments, enhancing Continuous Delivery and infrastructure change using Chef, Ansible, Kubernetes, and Docker to deploy code with Git and Jenkins.
Hands-on experience with continuous integration and automation using Jenkins.
Designed and developed end-to-end ETL pipelines using PySpark, efficiently extracting, transforming, and loading data from various sources into target data warehouses.
Proficiently scripted and automated data processing tasks using shell scripting, enhancing efficiency and streamlining data workflows.
Knowledgeable about utilizing NLP techniques including tokenization, text vectorization, sentiment analysis, and named entity recognition.
Utilized the Django framework to design and develop robust web applications, implementing MVC architecture, database management, and seamless user experiences.
Wrote and developed scripts to automate tasks using UNIX shell scripting.
Worked on various Operating Systems like UNIX, Linux, and Windows.
Experienced with version control systems like Git, GitHub, CVS, and SVN to keep the versions and configurations of the code organized.
Highly motivated, dedicated quick learner with a proven ability to work individually and as part of a team.
Excellent written and oral communication skills with results-oriented attitude.
TECHNICAL SKILLS:
Programming Languages:
Python, SQL, Java and PL/SQL
Web Technologies:
HTML, CSS, JavaScript, XML, JSON, jQuery, AngularJS, Node.js.
Frameworks:
Django, Flask, PySpark, PyJamas, Databricks, Talend, and Airflow.
Relational Databases:
HQL, MySQL, Teradata, Redshift, SQL Server, Oracle, Db2
NoSQL Databases:
HBase, DynamoDB, MongoDB, Cassandra, Elasticsearch
IDE's/Development Tools:
PyCharm, Jupyter Notebook, Eclipse, PyScripter, Visual Studio, Sublime Text, Git, GitHub, SVN, CVS.
Web Services:
AWS, RESTful Web Services, SOAP
Operating Systems:
UNIX, Linux, Windows
Methodologies:
Agile, Scrum, Waterfall, Kanban
Data Visualization:
Tableau, Power BI
ETL Pipeline Tools:
Talend, Informatica, Microsoft SSIS
Big Data/Hadoop:
Spark, Kafka, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie
EDUCATIONAL DETAILS:
Bachelor's in Information Technology from MLRIT, Dundigal, Hyderabad, in 2016.
WORK EXPERIENCE:
Client: Alfa Insurance, Montgomery, AL March 2023 to present
Role: Senior Python Developer – Data engineer
Responsibilities:
Experienced in ORM programming for converting data between incompatible type systems in object-oriented languages.
Working on Snowflake as a cloud-based data warehousing solution to manage and analyze large-scale datasets for improved business insights.
Applied data modeling and database design principles to create robust and scalable data structures, ensuring optimal data organization, integrity, and performance for efficient querying and reporting.
Developed and maintained RESTful APIs using Python, ensuring seamless data flow between the front-end and back-end components of the application.
Implemented CI/CD pipelines using Jenkins to automate the deployment process, reducing manual errors and increasing deployment speed.
Developed AWS Lambda functions in various programming languages to handle event-driven tasks and automate workflows.
Experience in project deployment on Amazon Web Services (AWS) and Amazon Simple Storage Service (S3).
Proficient in designing and developing ETL (Extract, Transform, Load) workflows using AWS Glue to process and transform data from various sources, including databases, S3, and streaming services.
Integrated various AWS services, such as S3, RDS, DynamoDB, Lambda, API Gateway, and more, to build efficient and serverless cloud-based solutions.
Experienced in setting up an AWS Databricks workspace and creating a cluster for executing Python code.
Used Amazon EC2 along with Amazon S3 buckets to upload and retrieve project history.
Extensive experience in designing, scheduling, and managing complex data workflows using Apache Airflow.
Developed and maintained DAGs to orchestrate data pipelines, defining task dependencies and execution sequences (see the illustrative sketch at the end of this section).
Implemented role-based access controls (RBAC) and data access policies, restricting unauthorized access to sensitive data and ensuring data security.
Experienced working with Airflow scheduling, backfilling, error handling, and logging.
Proficient in using PySpark for processing and transforming large-scale datasets, leveraging the power of distributed computing.
Experienced working with PySpark, with expertise in data cleaning and preprocessing, SQL integration, performance optimization, cloud integration, and data serialization in different formats.
Collaborated with internal teams to convert end user feedback into meaningful and improved solutions.
Developed and tested many features for the dashboard using Python, Java, Bootstrap, CSS, JavaScript, and jQuery.
Maintained the code using version control system Bitbucket.
Used RabbitMQ, Jira, Bitbucket, Crucible, Bamboo and SharePoint tools to maintain the code, user stories and fixing the bugs.
Resolved ongoing problems and accurately documented progress of a project.
Created environment-specific settings for a new deployment and updated the deployment-specific conditional checks in the code base.
Environment: Python, Django, MongoDB, AWS, Spark, Pandas, jQuery, Zookeeper, Bootstrap, MySQL, Linux, Ajax, JavaScript, JIRA, Cassandra, HTML5, CSS, AngularJS.
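Illustrative Airflow DAG sketch for the orchestration described above (a minimal sketch assuming Airflow 2.x; the dag_id daily_claims_etl, task callables, and schedule are hypothetical, not the actual project pipelines):

# Minimal Airflow 2.x DAG sketch; dag_id, callables, and schedule are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_claims(**context):
    # Placeholder: pull source data (e.g., from S3 or an RDS table) for the run date.
    pass

def transform_claims(**context):
    # Placeholder: clean and reshape the extracted data (e.g., with PySpark or Pandas).
    pass

def load_claims(**context):
    # Placeholder: write the transformed data to the target warehouse.
    pass

with DAG(
    dag_id="daily_claims_etl",           # hypothetical name
    start_date=datetime(2023, 3, 1),
    schedule_interval="@daily",
    catchup=True,                        # lets the scheduler backfill missed runs
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_claims)
    transform = PythonOperator(task_id="transform", python_callable=transform_claims)
    load = PythonOperator(task_id="load", python_callable=load_claims)

    extract >> transform >> load         # task dependencies / execution sequence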
Client: Simon Kucher and Partners, NY April 2022 to February 2023
Role: Senior Data Engineer
Responsibilities:
Wrote and executed various MySQL database queries from Python using the MySQL Connector/Python and MySQLdb packages.
Developed the company's internal CI system, providing a comprehensive API for CI/CD.
Worked with a team of developers on Python applications for risk management.
Generated Python Django Forms to record data of online users.
Integrated Kafka with various data sources and sinks, including databases, data lakes, and cloud services, ensuring seamless data synchronization and distribution.
Integrated Azure App Service and Azure Functions to build scalable and reliable web applications, optimizing for performance and cost efficiency.
Experienced with Azure SQL Database for managing large datasets, implementing robust security measures to protect sensitive information.
Used Azure Synapse's integration with Apache Spark for processing and analyzing large-scale datasets, accelerating data processing tasks and optimizing performance.
Proficient in working with Azure Blob Storage, Azure Table Storage, and Azure Queue Storage, understanding their features, capabilities, and use cases.
Managed Azure SQL Database and Azure Cosmos DB for data storage and retrieval, implementing data replication, backups, and disaster recovery solutions.
Skilled in using Azure Data Factory's data transformation activities and mapping data between different data sources, handling schema evolution and data cleansing.
Knowledgeable in using Azure Data Factory Data Flow to visually design and build data transformation logic using a code-free environment.
Skilled in managing Azure Data Factory pipelines using version control systems like Git and implementing CI/CD pipelines for automated deployment and continuous integration.
Led a successful data migration project from Cloudera to Azure cloud, ensuring smooth and secure transfer of data.
Understanding of data security practices and familiarity with encrypting data at rest and in transit in Azure Storage, including the use of HTTPS, encryption keys, and Azure Key Vault integration.
Developed serverless APIs using Azure Functions, leveraging event-driven architecture to handle asynchronous tasks and enable efficient data processing.
Set up an Azure Databricks workspace and created a cluster for executing Python code.
Experience in implementing RESTful APIs to enable data integration and exports from Azure data services like Azure SQL Database, Azure Cosmos DB, and Azure Data Lake Storage.
Contributed to major development initiatives with codebases utilizing Python, Django, MySQL, MongoDB, jQuery, and React.
Experienced in ingesting data from various sources such as Hadoop HDFS, Amazon S3, relational databases, and streaming platforms into PySpark DataFrames and RDDs.
Worked on cross-browser compatibility and responsive web design using HTML5, CSS3, and Bootstrap.
Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS, and JavaScript.
Created web application prototypes using jQuery and AngularJS.
Developed Flask RESTful API endpoints for access through the UI and uploading ETL data (see the illustrative sketch at the end of this section).
Developed entire frontend and backend modules using Python on Django Web Framework with GIT.
Worked on Oracle 11g databases and wrote SQL queries as well as stored procedures for the application.
Assisted with production support activities using JIRA, when necessary, to help identify and resolve escalated production issues based on the SLA.
In addition, when a fix required a code change, documented the Change Request, assigned it to the respective teams, and tracked it to closure.
Wrote documents in support of the SDLC phases. Documents include requirements and analysis reports, design documents, and technical documentation.
Used packages like unittest.mock (Mock, patch) and Beautiful Soup (bs4) to perform unit testing.
Created environment-specific settings for a new deployment and updated the deployment-specific conditional checks in the code base.
Environment: Python, Django, Azure, MongoDB, Pandas, Spark, jQuery, Zookeeper, Bootstrap, MySQL, Linux, Ajax, JavaScript, JIRA, Cassandra, HTML5, CSS, AngularJS.
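Illustrative Flask sketch of the kind of RESTful endpoint described above (a minimal sketch; the /api/etl-data route and in-memory store are hypothetical stand-ins for the project's actual endpoints and backing storage):

# Minimal Flask REST sketch for uploading and reading ETL data; names are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)
records = []  # stand-in for the real backing store (e.g., Azure SQL / Cosmos DB)

@app.route("/api/etl-data", methods=["GET"])
def list_records():
    # Return previously uploaded ETL records to the UI.
    return jsonify(records)

@app.route("/api/etl-data", methods=["POST"])
def upload_records():
    # Accept a JSON payload from the UI and append it to the store.
    payload = request.get_json(force=True)
    records.append(payload)
    return jsonify({"status": "ok", "count": len(records)}), 201

if __name__ == "__main__":
    app.run(debug=True)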
Client: KJT Groups, Rochester, NY June 2021 to April 2022
Role: Senior Data Engineer
Responsibilities:
Translated customer requirements into design specifications and ensured that the requirements translated into the software solution.
Set up full CI/CD pipelines so that each commit a developer makes goes through the standard software lifecycle and is tested thoroughly before it can reach production.
Utilized Azure Virtual Machines and Azure Kubernetes Service (AKS) to deploy and manage scalable applications, ensuring high availability and efficient resource utilization.
Used Azure Databricks to process and analyze massive datasets with Apache Spark, improving data processing efficiency and enabling advanced analytics.
Implemented data storage solutions using Azure Blob Storage, utilizing its cost-effective and highly scalable object storage capabilities for structured and unstructured data.
Generated Python Django Forms to record data of online financial users.
Worked on integration of Selenium RC/WebDriver with the existing API to test the framework.
Used the Pandas API to put the data in time-series and tabular format for easy timestamp manipulation and retrieval (see the illustrative sketch at the end of this section).
Cleaned and processed third-party spending data into manageable deliverables in specific formats with Excel macros and Python libraries. Used the TDD (Test-Driven Development) methodology.
Developed data pipelines using Databricks notebooks, efficiently transforming and ingesting data from various sources for downstream analysis and reporting.
Used the Python library Beautiful Soup for web scraping to extract data for building graphs.
Experience in writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
Experienced in connecting Power BI to various data sources, including databases, APIs, Excel files, and cloud services.
Designed and developed ETL processes used to integrate data between various sources using Talend and PowerCenter 9.
Designed and developed Oracle PL/SQL Procedures, Functions, and Database Triggers and involved in creating and updating Packages to meet business requirements.
Designed RESTful XML web services for handling AJAX requests.
Environment: Python, Django, Selenium, Azure, Spark, HTML5, CSS, Oracle DB, Pandas, MS SQL Server 2013, Jasper Reports, JavaScript, Ajax, Eclipse, Linux, Shell Scripting, RESTful, MVC3.
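Illustrative Pandas sketch of the time-series handling described above (a minimal sketch; the file name, column names, and dates are hypothetical):

# Minimal Pandas time-series sketch; file and column names are hypothetical.
import pandas as pd

# Load third-party spending data and index it by timestamp for easy slicing/resampling.
df = pd.read_csv("spend_data.csv", parse_dates=["timestamp"])
df = df.set_index("timestamp").sort_index()

# Example manipulations: slice one month by partial date string, then roll up to daily totals.
march = df.loc["2022-03"]
daily_total = march["amount"].resample("D").sum()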
Client: IRI, Chicago November 2020 to June 2021
Role: Senior Data Engineer
Responsibilities:
Utilized data build tools to automate ETL processes, enabling streamlined data extraction, transformation, and loading operations, resulting in enhanced data accuracy and reduced processing time.
Created Azure Databricks Python notebooks to consume data from Kafka and Elastic for real-time data processing.
Wrote scripts in Python for extracting data from HTML files.
Worked on development of applications in a UNIX environment and familiar with its commands.
Wrote and executed various MySQL database queries from Python using the MySQL Connector/Python and MySQLdb packages.
Thorough knowledge of various front-end tools like HTML, DHTML, CSS, JavaScript, XML, jQuery, AngularJS, and AJAX.
Experience in working with Drag and Drop Calculation, and Geographic Search by using Tableau.
Strong analytical skills with the ability to extract meaningful insights from data using PySpark.
Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations to build the common learner data model, and store the data in a NoSQL store (MongoDB); see the illustrative sketch at the end of this section.
Experienced in developing Web Services with Python programming language.
Good experience with Unix/Linux scripting, unit testing, load testing, performance stress/endurance testing, and integration testing; tools and frameworks include JUnit, PyUnit, HP VuGen/LoadRunner, Selenium WebDriver, Controller, Dynatrace, Performance Center, QC, and JIRA.
Built real-time data processing pipelines using Azure Stream Analytics, enabling the analysis and visualization of streaming data from various sources.
Developed serverless applications using Azure Functions, reducing infrastructure management overhead and enabling automatic scaling based on demand.
Designed and implemented Hive delta tables to store the processed results in a structured table format.
Built back-end applications with Python/Django and worked with Docker, RabbitMQ, Celery, and Jenkins.
Experience in debugging and troubleshooting programming related issues.
Developed PowerShell scripts to configure the upgraded Exchange environment, migrate users, then monitor and maintain the new environment.
Worked with MySQL and custom tools developed in Python and Bash.
Designed and developed the UI of the website using HTML, AJAX, CSS, jQuery, and JavaScript.
Developed user-friendly modals for form submissions using simplemodal.js, jQuery, Ajax, and JavaScript.
Environment: Python, Django, Azure, VBScript, Jasper, MySQL, Zookeeper, Unix, HTML, CSS, jQuery, JavaScript, Apache, Ajax, simplemodal.js, Git, PowerShell, Selenium, Linux, Bootstrap, GitHub, Beautiful Soup, JIRA, Pandas, Jenkins.
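Illustrative PySpark Structured Streaming sketch of the Kafka-to-MongoDB flow described above (a minimal sketch assuming the spark-sql-kafka package and the MongoDB Spark Connector 10.x are on the classpath; broker, topic, schema, and collection names are hypothetical):

# Structured Streaming sketch: Kafka -> transform -> MongoDB. Names/URIs are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, IntegerType

spark = SparkSession.builder.appName("learner-stream").getOrCreate()

schema = (StructType()
          .add("learner_id", StringType())
          .add("course", StringType())
          .add("score", IntegerType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "learner-events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_mongo(batch_df, batch_id):
    # Write each micro-batch to MongoDB via the Spark connector (10.x option keys assumed).
    (batch_df.write
     .format("mongodb")
     .mode("append")
     .option("connection.uri", "mongodb://mongo:27017")
     .option("database", "analytics")
     .option("collection", "learner_model")
     .save())

query = events.writeStream.foreachBatch(write_to_mongo).start()
query.awaitTermination()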
Client: Dynata, Hyd, India February 2018 to November 2020
Role: Data Analyst
Responsibilities:
Involved in the analysis, specification, design, implementation, and testing phases of the Software Development Life Cycle (SDLC) and used Agile methodology for developing the application.
Worked on Amazon Athena for interactive querying and analysis of data stored in Amazon S3, enabling ad-hoc queries and data exploration (see the illustrative sketch at the end of this section).
Developed RESTful APIs using Amazon API Gateway, providing a secure and scalable interface for applications to interact with services.
Utilized AWS Glue's transformation capabilities to clean, enrich, and structure data, ensuring its suitability for downstream analytics and reporting.
Designed, deployed, and managed scalable applications using Amazon Elastic Compute Cloud (EC2), optimizing resource allocation and ensuring high availability.
Implemented data storage solutions using Amazon Simple Storage Service (S3), leveraging its scalable and cost-effective object storage for various data types.
Experienced in connecting to various data sources, transforming data, and creating data pipelines within Tableau.
Applied Normalization and De-Normalization techniques for both OLTP and OLAP systems, creating database objects like tables, constraints, and indexes.
Used Jenkins for continuous integration and code quality inspection, and worked on building a local repository mirror and source code management using GitHub.
Developed Talend Big Data jobs for loading claim data from Unix to Hive through HBase.
Implemented queries with HiveQL and Pig for data processing and analysis.
Wrote unit tests using unittest, resolving bugs and other defects using Firebug.
Created and optimized Talend jobs to automate data workflows, ensuring the efficient and reliable movement of data between systems.
Proficient in designing, developing, and maintaining ETL processes using Talend to extract, transform, and load data from various source systems to target databases and data warehouses.
Implemented Performance tuning and improved the Performance of Stored Procedures and Queries.
Environment: Python, AWS, Django Web Framework, Java, HTML, CSS, NoSQL, JavaScript, jQuery, Sublime Text, Jira, Git, PyBuilder, unittest, Firebug, Web Services.
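Illustrative boto3 sketch of the ad-hoc Athena querying described above (a minimal sketch; the database, table, region, and results bucket are hypothetical):

# Minimal boto3 sketch for an ad-hoc Athena query over data in S3; names are hypothetical.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

run = athena.start_query_execution(
    QueryString="SELECT region, COUNT(*) AS claims FROM claims_raw GROUP BY region",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

# Poll until the query finishes, then fetch the result rows.
query_id = run["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]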
Client: Test Yantra, Bangalore, India June 2017 to January 2018
Role: Programmer Analyst
Responsibilities:
Created test cases and test scripts to validate user registration, login, product search, cart functionality, and the checkout process (see the illustrative sketch at the end of this section).
Tested responsive design by emulating different screen sizes and devices, ensuring a consistent user experience.
Collaborated with the development team to provide timely feedback on bug reports and test results.
Integrated the testing process into the development workflow using Git hooks and continuous integration tools.
Skilled in creating comprehensive test cases, test scenarios, and test data that cover various functional and non-functional aspects of the application.
Proficient in using popular Java testing frameworks like JUnit and TestNG to design and execute unit, integration, and functional tests.
Experienced in writing and executing complex SQL queries and scripts to verify data accuracy, consistency, and completeness.
Skilled in designing and executing test cases to validate data integrity and accuracy, ensuring data consistency between application and database.
Environment: Java, JavaScript, HTML, CSS, GIT, Cypress, SQL, Selenium WebDriver.
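Illustrative login-test sketch for the flows described above, rendered here in Python with Selenium WebDriver (the project itself used Java with JUnit/TestNG; the URL and element locators are hypothetical):

# Illustrative Selenium WebDriver login test; URL and locators are hypothetical.
import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By

class LoginTest(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Chrome()

    def test_valid_login_shows_account_page(self):
        d = self.driver
        d.get("https://example-shop.test/login")
        d.find_element(By.ID, "username").send_keys("test_user")
        d.find_element(By.ID, "password").send_keys("secret")
        d.find_element(By.ID, "login-button").click()
        # Assumed success criterion: the account page title appears after login.
        self.assertIn("My Account", d.title)

    def tearDown(self):
        self.driver.quit()

if __name__ == "__main__":
    unittest.main()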