Pavan M
***********@*****.***, 551-***-****, Jersey City, NJ, LinkedIn
EXPERIENCE
DATA ENGINEER / Mitaja Corporation (Full-time) - Maryland, USA 01/2023 - Current
● Ensured data consistency across systems through configuration management. Built optimized data pipelines with Apache Spark, Kafka, and AWS Lambda, reducing processing time by 30%.
● Developed robust data architectures for efficient storage and processing. Created and maintained ETL pipelines using Snowflake.
● Migrated an on-premises data warehouse to AWS Redshift, achieving a 25% cost reduction. Integrated Redshift with Glue, Lambda, and S3 for seamless data flow.
● Collaborated with UI/UX teams to align wireframes with requirements. Integrated PLC data into central databases.
● Designed data visualizations with Sisense, Tableau, and Power BI. Implemented MDM/PIM solutions with Stibo Systems to improve data governance.
● Validated data consistency using SQL for transaction testing. Applied SPC techniques to monitor and control manufacturing processes, ensuring consistent quality.
DATA ENGINEER / Uber (Intern) - San Francisco, USA 03/2023 - 12/2023
● Built scalable data pipelines with Apache NiFi, Talend, and Kafka. Implemented data governance for privacy and compliance.
● Ensured data availability and reliability through replication, backup, and disaster recovery. Developed real-time streaming apps integrated with downstream systems.
● Optimized operations using data-driven methodologies. Managed data sourcing for OFAC/BSA/AML teams.
● Created reports with Power BI and managed relational/non-relational databases for business applications.
● Automated CDC processes and data workflows in the data mart using SSIS/SQL and DevOps tools such as Bamboo and Bitbucket.
● Migrated SSIS packages to Azure Data Factory and used GCP services such as BigQuery and Dataflow.
● Improved data quality with automated validation and transformation. Enhanced pipeline efficiency by 40%, enabling better business decisions through real-time analytics.
DATA ANALYST / TCS (Full-time) - Hyderabad, India 06/2021 - 11/2022
● Identified relevant data and developed use cases with domain experts. Ensured accurate ERP data exchange.
● Assisted in creating predictive risk models and standardized data formats for system consistency.
● Managed big data clusters using Hadoop and Hive; analyzed data with Python libraries like Pandas and Matplotlib.
● Developed middleware solutions and data models for integration and reporting. Created ETL workflows with Informatica.
● Implemented data security, optimized SQL queries, and managed databases. Used Jira for ticket management and Teradata for database tasks.
● Enhanced query performance by 50% and integrated multiple data sources into a unified data warehouse.
TECHNICAL SKILLS
Programming: Python, MySQL, Java, R, NumPy, Pandas, CSS.
Big Data Tools: Machine Learning, AI Builder, Chatbots, LLMs, APIs, Regression Models, Kubernetes, Git, Jenkins, Apache Spark, Hadoop, Kafka.
Tools: Excel, Tableau, PostgreSQL, SAS, MongoDB, Talend, Apache NiFi, Informatica, Redshift.
Certifications: SQL, Python, AWS.
PROJECTS
Revolutionizing Conversational AI with RAG and Cohere Command R+
● Enhanced chatbot accuracy by 35% using Cohere's Command R+, Llama 3, and retrieval-augmented generation (RAG), integrated with the LlamaIndex library.
● Improved response times by 40% through optimized context retrieval and embedding utilization with LlamaIndex.
● Boosted user satisfaction by 25% with a Streamlit-based interface that enables seamless, code-free interactions.
Industry Capstone Project – Project Portfolio BI Reporting & Analysis for Studio Lab
● Currently developing business insights by analyzing and comparing initial estimated hours with actual hours worked on Studio Lab’s projects.
● Leveraged Jira for data access and utilized eazyBI for report generation, increasing project time-tracking accuracy by 30%.
Automatic PowerPoint Generator
● Led the development of an automated PPT tool, reducing preparation time by 30% with Python-based web scraping and NLP for topic-driven slide generation.
● Authored a thesis on the process, showcasing a 40% enhancement in content accuracy through integrated data extraction and text summarization.
Cloud Data Migration
● Led the migration of an on-premises data warehouse to AWS using AWS DMS, S3, and Redshift, improving scalability by 40% and reducing operational costs by 25%.
EDUCATION
Stevens Institute of Technology, New Jersey, USA
Master’s in Business Intelligence and Analytics. Dec 2022 - May 2024
JB Institute of Technology and Engineering, Hyderabad, India
Bachelor of Technology in Computer Science Engineering. July 2018 - May 2022