Vineela Velaga
Data Engineer
Bloomfield, NJ, USA +1-640-***-**** *****************@*****.*** LinkedIn
SUMMARY
Data Engineer with 3 years of experience building scalable ETL pipelines, data warehouses, and machine learning workflows. Skilled in Python, SQL, Apache Spark, and cloud platforms (AWS, GCP, Azure). Proficient in creating data models, automating pipelines, and building dashboards. Experienced in deploying machine learning models and optimizing CI/CD processes, improving efficiency by 30% and contributing to more reliable systems and faster data processing.
EXPERIENCE
Data Engineer, Doublene, USA 01/2024 – Present
• Spearheaded the development of a scalable ETL pipeline using PySpark, reducing data processing time by 25% across the organization.
• Designed and implemented Star Schema Data Models to support high-volume reporting systems, enhancing query efficiency by 40%.
• Worked with AWS Redshift for data warehousing solutions, optimizing storage and query performance to reduce costs by 25%.
• Created interactive Power BI dashboards, improving decision-making efficiency by 15% for operational teams.
• Collaborated with data science teams to prepare and validate datasets for machine learning models, ensuring 98% data completeness and 95% accuracy, improving model training efficiency.
• Automated deployment workflows using Git and Jenkins, reducing deployment errors and lead time by 30%.
• Conducted root cause analysis for production data anomalies, achieving 95% reliability for critical data systems.
• Improved data consistency and quality by writing complex SQL scripts to perform data validation and transformation tasks, achieving a 98% data accuracy rate.
Data Engineer, Techecy, India 06/2020 – 06/2022
• Strengthened ETL workflows using Python, SQL, and Apache Spark, increasing throughput by 20% and reducing error rates by 15%.
• Deployed predictive analytics models on Databricks, improving efficiency by 10% through faster processing and more accurate predictions.
• Adhered to data security standards, complying with GDPR and HIPAA, reducing security incidents by 40%.
• Collaborated with developers to create APIs and microservices for model deployment, reducing deployment times by 30% and improving scalability.
• Deployed machine learning models using scikit-learn, TensorFlow, and Keras, leading to a 25% improvement in prediction accuracy.
• Conducted data preprocessing, including cleaning, transformation, and normalization, improving dataset quality by 30%.
• Developed and managed CI/CD pipelines with Jenkins and Azure DevOps, following Agile and SDLC methodologies, leading to a 50% reduction in deployment time.
PROJECTS
Fake Job Detection (Python, NLP, pandas, data visualization, machine learning)
• Developed a machine learning model to detect fraudulent job postings, reducing fraud by 35%.
• Achieved 98% accuracy in fraud detection using Random Forest and Naive Bayes, automating the fraud prevention process and reducing false positives by 30%.
Understanding and Enhancing User Experience on Social Media Platforms (Python, NLP, pandas, machine learning)
• Built a sentiment classification model for Facebook comments, improving accuracy by 25% through data cleaning, feature extraction, and machine learning techniques, and enhancing user experience insights.
SKILLS
• Programming Languages: Python, SQL, HTML/CSS, JavaScript, PL/SQL, R
• Databases: Oracle, MySQL, Microsoft SQL Server, Data Modeling, PostgreSQL, MongoDB
• Frameworks & Libraries: TensorFlow, Keras, Pandas, NumPy, scikit-learn, Flask
• Big Data Technologies: Apache Spark, Apache Hive, Hadoop, HDFS, Apache Kafka
• ETL & Data Engineering: Informatica PowerCenter, Fivetran, Snowflake, ETL development, Data Warehousing
• Data Visualization & Analysis: Power BI, Excel, Plotly, Tableau
• Build & Deploy: Bazel, Terraform
• Cloud Platforms: GCP (BigQuery, GCS, Cloud Functions, Pub/Sub, Dataproc), AWS (S3, Redshift, Lambda, Glue), Azure (Data Factory, Synapse Analytics)
• Version Control & Development: Git, GitHub, PyCharm, Jupyter Notebook, VS Code
• Machine Learning: Deep Learning, Image Processing, NLP, Supervised Learning, Feature Engineering, EDA, Cluster Analysis
CERTIFICATIONS
• AI for Everyone
• Programming for Everybody
• Python Data Structures
EDUCATION
Master's in Computer Science 09/2022 – 05/2024
Montclair State University, Montclair, NJ, USA
Bachelor's in Electronics and Communication Engineering 07/2017 – 07/2021
Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India