Data Engineer Python Developer

Location:

Athens, GA

Posted:

April 30, 2024

Resume:

PRUDHVI ARUN CHEKKA

***********.***@*****.*** LinkedIn GitHub Portfolio 720-***-**** Athens, GA

A committed team player and enthusiastic learner with 5 years of experience as a developer on multiple Agile teams. Proficient in a wide range of programming languages, tools, and technologies, with a strong foundation in computer science concepts.

EDUCATION

University of Georgia, Athens, GA, USA January 2023 – May 2024

Master’s in Computer Science GPA: 4.0

Courses: Advanced Distributed System, Advanced Information Systems, Data Science, Algorithms, SE, DBMS, Computer Networks.

SKILLS

Programming Languages/Tools: Python, Java, JavaScript, HTML, CSS, React, SQL, AWS, Azure, Git, Jenkins, CI/CD, Docker, PySpark, Airflow.

Key Skills: Django REST Framework, Flask, Machine Learning, Data Modeling, Data Science.

WORK EXPERIENCE

Senior Python Developer, General Electric, Bangalore (India) July 2021 – December 2022

Built REST API functions using the Flask framework, relied on by all GE lines of business to execute over 500 ETL tasks daily in AWS.

Used React for frontend development, enhancements, and bug fixes to ensure a seamless user experience and optimal functionality.

Enhanced a JSON parser to efficiently ingest data from diverse sources including Splunk, Elasticsearch, Azure, Intune, MobileIron, Redshift, Amazon S3, LDAP, MySQL, Oracle, PostgreSQL, and Box.

Integrated and troubleshot an ETL integrity checker tool, leveraging regex patterns. Implemented custom notifications using AWS SNS and AWS API Gateway to send reports and updates on thresholds.

Worked with a comprehensive AWS pipeline management system built with Python, Boto3, and Docker, enabling deployment and orchestration of AWS ECS, EC2, Fargate, EventBridge, and Lambda functions.

Used Databricks and Apache Spark to architect and optimize data processing pipelines for high-throughput processing and advanced analytics.

Containerized and orchestrated batch and streaming job workflows using Apache Airflow, resulting in a 30% reduction in manual effort.

Employed multi-threading and lazy loading techniques to optimize code, resulting in a 40% reduction in execution time for data ingestion.

Implemented a continuous integration and deployment (CI/CD) pipeline using GitHub triggers and Jenkins, enabling automatic deployment on every push to the Git repository.

Thrived in an agile, fast-paced environment with distributed teams of both technical and non-technical members, fostering successful collaboration and communication.

Took end-to-end ownership of projects, steering them from the proof-of-concept stage to successful deployment in production.

Software Engineer, Tata Consultancy Services (Client: NYL), Hyderabad (India) June 2018 – July 2021

Designed and developed more than 50 Control Reports to test more than 450K policies and 80,000K transactions during a data migration employing SSIS and Python Automation.

Automated querying and comparison of data between Oracle and SQL Server databases using linked server functionality.

Contributed to the development of over 200 automation test cases leveraging the Orchestrator tool. This significantly streamlined the testing workflow, resulting in a 30% improvement in accuracy and a 40% acceleration in the policy validation process.

PROJECTS

House Price Predictions: Advanced Regression Techniques (Kaggle)

Led Exploratory Data Analysis (EDA) to unveil patterns and outliers on 79 feature variables.

Performed data preprocessing with NumPy and Pandas to handle missing values and categorical variables through imputation, normalization, and one-hot encoding, preparing the dataset for advanced regression analysis.

Developed models using Grid Search for hyperparameter optimization and K-fold cross-validation.

Enhanced predictive models using stacked regression and ensemble methods, combining the strengths of multiple models for better accuracy and stability and improving the RMSE score from 0.11 to 0.074.

ScalaTion Project: Computational Data Science

Contributed to the ScalaTion project as a Research Assistant, developing Regression Trees in Scala to support optimal data splitting at greater depths within a comprehensive modeling framework.

Implemented a feature bagging approach to enhance the versatility and performance of Random Forests, mitigating overfitting.

Engineered a gradient-boosting algorithm, showcasing adeptness in implementing and optimizing machine learning algorithms.

Conducted comparative assessments of ScalaTion against scikit-learn's libraries, achieving a correlation of 98%.

Text-to-SQL Generator: Optimizing an LLM for Querying Databases

Created a Text-to-SQL chatbot using LangChain, the OpenAI LLM APIs, and Streamlit, applying prompt engineering, chain-of-thought, and zero-shot learning techniques.
