Vamshi Sangireddy
Data Engineer
+1-724-***-**** | *****************@*****.*** | CA, 95050 | LinkedIn

SUMMARY
Results-driven Data Engineer with 5 years of experience designing, developing, and managing robust data pipelines and ETL processes. Proficient in handling large-scale datasets and ensuring seamless data integration, transformation, and loading to support business intelligence and analytics. Expert in SQL, Python, and big data tools such as Hadoop, Hive, and Spark for building efficient, scalable solutions. Skilled in leveraging cloud platforms such as AWS and Azure for data storage, processing, and orchestration, including services like S3, Redshift, Azure Data Factory, and Data Lakes. Strong understanding of data modeling, data governance, and data quality principles to ensure accurate, reliable data assets.
SKILLS
Methodologies: SDLC, Agile, Waterfall
ML Algorithms: Linear Regression, Logistic Regression, Decision Trees, Supervised Learning, Unsupervised Learning
Programming Languages: Python, SQL, Java, Scala
Big Data Technologies: Hadoop, Spark, Hive, Kafka
Data Warehousing: Snowflake, Teradata
Cloud Platforms: AWS (S3, EMR, Redshift, EC2), Azure (Data Lakes), GCP (BigQuery)
ETL Tools: Informatica, Talend, SSIS, Apache NiFi, Alteryx
Databases: SQL Server, MySQL, PostgreSQL, MongoDB
Data Visualization Tools: Tableau, Power BI, Looker
DevOps Tools: Docker, Kubernetes, Jenkins
Data Integration and Automation: Apache Airflow
Collaboration and Project Management Tools: JIRA, Confluence, Trello
Operating Systems: Linux, Windows
EDUCATION
Master's in Data Science, January 2023 – December 2024
The University of Memphis, Memphis, TN, 38111
Bachelor's in Computer Science and Engineering, August 2018 – August 2022
Vignana Bharathi Institute of Technology, Hyderabad, India

EXPERIENCE
Data Engineer USA
Freddie Mac June 2024 – Present
Implemented data pipelines using AWS services, ensuring efficient data ingestion, processing, and storage.
Optimized ETL processes to extract, transform, and load data from multiple sources into centralized data warehouses using Python and SQL.
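The shape of a Python-and-SQL ETL step like the one this bullet describes can be sketched in plain Python; the table name, columns, and in-memory SQLite warehouse below are illustrative assumptions, not details from the actual project.

```python
import sqlite3

def run_etl(rows):
    """Hypothetical extract-transform-load step: clean raw rows,
    load them into a SQL store, and verify with an aggregate query."""
    conn = sqlite3.connect(":memory:")  # stands in for a real warehouse
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    # Transform: drop non-positive amounts, round to cents
    cleaned = [(r["id"], round(r["amount"], 2)) for r in rows if r["amount"] > 0]
    # Load: bulk insert the cleaned rows
    conn.executemany("INSERT INTO orders VALUES (?, ?)", cleaned)
    total, = conn.execute("SELECT SUM(amount) FROM orders").fetchone()
    return total

total = run_etl([{"id": 1, "amount": 10.0},
                 {"id": 2, "amount": -3.0},   # filtered out by the transform
                 {"id": 3, "amount": 5.5}])
# total → 15.5
```

In a production pipeline the same extract/transform/load split would target a real warehouse connection instead of SQLite, but the flow is the same.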
Used Power BI's DAX (Data Analysis Expressions) language to perform complex calculations and create custom measures and KPIs.
Evaluated Azure services such as Blob Storage, Data Factory, and Synapse Analytics, and designed scalable data pipelines that improved data accessibility and reliability by 35%.
Developed MapReduce algorithms to process large-scale datasets, increasing data processing efficiency and reducing computation time by 30%.
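The map → shuffle → reduce pattern behind the MapReduce work above can be illustrated with a toy word count in plain Python; the real jobs ran on Hadoop, and the documents and keys here are made-up examples.

```python
from collections import defaultdict

def map_phase(doc):
    """Map: emit a (key, 1) pair for every word in a document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["spark hive spark", "hive hadoop"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
# counts → {"spark": 2, "hive": 2, "hadoop": 1}
```

On a cluster, the map and reduce functions run in parallel across workers and the shuffle happens over the network; the single-process version above only shows the data flow.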
Formulated complex SQL queries to retrieve data from relational databases, accelerating query performance.
Revamped data workflows with PySpark, reducing execution time by 35% and ensuring scalable processing for datasets up to 1.5TB, thereby enhancing the data team's productivity by 25%.
Managed Apache Airflow DAGs for orchestrating complex ETL workflows, increasing job scheduling efficiency by 25% and reducing workflow errors by 20%.
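The core idea behind orchestrating ETL with Airflow DAGs, as in the bullet above, is running tasks in dependency order; a minimal sketch using Python's standard-library topological sorter is below. The task names are hypothetical, and a real Airflow DAG would declare operators and schedules rather than a plain dict.

```python
from graphlib import TopologicalSorter

# Hypothetical task dependency graph: each task maps to the set of
# tasks that must finish before it can start.
deps = {
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# static_order() yields a valid execution order for the DAG.
order = list(TopologicalSorter(deps).static_order())
# e.g. extract runs before transform, transform before load, load before notify
```

Airflow additionally handles scheduling, retries, and parallel execution of independent tasks, but the ordering guarantee it provides is exactly this topological one.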
Data Engineer INDIA
Hellinex Cloud August 2018 – December 2022
Established a data platform from scratch, leading the project's requirements gathering and analysis phase and documenting the business requirements.
Performed data mining using R and SAS, running statistical tests including hypothesis testing to improve data accuracy.
Conducted machine learning experiments and built predictive models, uncovering insights that increased business forecasting accuracy by 25%.
Transitioned on-premises infrastructure to AWS using both lift-and-shift and re-architecture approaches, enhancing scalability and reducing operational costs.
Delivered technical guidance and support to cross-functional teams on AWS architecture and best practices, resulting in a 15% improvement in team efficiency.
Improved decision-making processes by building Power BI dashboards integrated with Azure services, providing real-time data insights.
Employed Hadoop-based solutions for processing and storing large datasets, enhancing data scalability and reducing data processing time.
Performed data processing and analysis using PySpark, optimizing the performance and scalability of data workflows.