Sneha G
770-***-**** *****.********@*****.***
PROFESSIONAL SUMMARY
Data Engineer with 5 years of experience in designing scalable data pipelines and ETL workflows using AWS, Azure, and Databricks. Demonstrated success in optimizing data systems for improved quality and performance, including automated processes and custom data flows. Proven expertise in Python, SQL, and Big Data technologies, delivering actionable insights through reliable dashboards and analytics.
TECHNICAL SKILLS
• Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Sqoop, Spark, Kafka, YARN, Nifi, Zookeeper
• Hadoop Distributions: Cloudera, Hortonworks
• Cloud Solutions: Microsoft Azure, ADF, AWS EMR, S3, Glue Data Catalog, Databricks
• Databases: SQL Server, MySQL, Oracle, Netezza, Teradata
• Reporting Tools: Tableau, Power BI, SAP Business Objects
WORK EXPERIENCE
PROINFY SOLUTIONS Dec 2024 - Present
Cloud Consultant Alpharetta, GA
• Developed and maintained scalable data pipelines using Databricks notebooks and workflows to support modern data solutions.
• Designed and implemented parameterized and dynamic ADF pipelines for automated multi-source data ingestion, enhancing data transformation and migration from on-prem SQL Server to Azure PaaS databases.
• Created Azure Databricks Notebooks to extract data from Data Lake and load into SQL Server databases, aligning with best practices for ETL workflows.
• Configured and managed ADF pipelines along with Azure Key Vault and integration runtimes to ensure secure and efficient data processing.
• Established CI/CD pipelines using Azure DevOps for source control and automated deployment of data pipelines, improving operational efficiency.
• Monitored pipeline performance with Azure Monitor and Log Analytics to resolve issues and optimize data workflows, ensuring data quality and reliability.
• Orchestrated data migration pipelines and maintained data transformations in the Azure environment, supporting robust scalable data systems.
• Created reusable ADF templates and parameterized components to streamline future migration projects and improve overall process efficiency.
• Configured Power BI to connect to Data Warehouse SQL databases, facilitating dashboard creation and data visualization.
CHARTER COMMUNICATIONS Apr 2019 - Jan 2020
Big Data Consultant Charlotte, NC
• Engineered data pipelines for ingestion, aggregation, and load of consumer response data into Hive external tables, supporting Tableau dashboard development.
• Utilized Hadoop clusters and Hive for efficient storage and retrieval of large datasets, ensuring data quality across various sources.
• Migrated large datasets from Netezza to Teradata using custom Sqoop scripts, streamlining data warehouse processes.
• Collaborated with cross-functional teams under Scrum Agile methodology to develop technical requirements, design specifications, and software enhancements.
• Executed query analysis and coding with Hadoop, Spark, Scala, Hive, and HBase to enhance data processing capabilities and generate actionable insights.
• Participated in production deployment activities and provided ongoing support to resolve issues, ensuring continuous delivery of client business requirements.
• Worked on ETL methods and data warehouse tools for reporting and analysis using Power BI, reinforcing data-driven decision-making.
CHANGE HEALTHCARE Nov 2018 - Mar 2019
Data Consultant – Big Data San Mateo, CA
• Collected member, provider, and claims data, identified personal health information (PHI) across various SQL Server instances, and ingested it into the Hadoop Distributed File System (HDFS).
• Worked on the Hadoop cluster and used Hive as the data querying tool to store and retrieve data.
• Developed Spark SQL queries for analysts via Zeppelin and implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
• Set up compute environments on AWS EC2 instances, used S3 for storage, and provisioned Elastic MapReduce (EMR) clusters.
• Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDDs, and administered the cluster, tuning memory based on RDD usage.
• Collected data using Spark Streaming from AWS S3 bucket in near-real-time, performed necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
• Implemented a mechanism for triggering Spark applications on EMR on file arrival from the client.
• Implemented Sqoop for large data transfer from RDBMS to S3.
• Worked within the Spark ecosystem using Spark SQL and Scala queries on multiple file formats, including text and CSV, and worked extensively with Parquet.
• Used Eclipse IDE and IntelliJ for the development of the entire application.
• Worked on continuous integration of the application using Jenkins, Rundeck, and CI/CD pipelines.
• Used Maven Build Tool for automation.
• Used Git for source control to manage and deliver the application.
• Collaborated with the team on key design decisions.
• Extensively used the Zeppelin tool for data analytics and data visualization.
GLOBAL ATLANTIC FINANCIAL GROUP Jun 2017 - Oct 2018
Data Consultant - Hadoop Boston, MA
• Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered BRD and TDD documents.
• Analyzed the Hadoop cluster and various Big Data analytics tools, including Hive, HBase, and Sqoop; implemented Spark using Scala and Spark SQL for faster data processing.
• Installed and configured Apache Hadoop clusters for application development along with Hadoop tools such as Hive, Zookeeper, and Sqoop.
• Developed Sqoop scripts to migrate data from source systems to the Big Data environment.
• Worked in a team managing a 30-node cluster and expanded the cluster by adding nodes.
• Imported data from MySQL and Oracle to HDFS using Sqoop. Loaded data from Linux file systems to HDFS and processed it using Hive queries.
• Imported and exported data from MySQL, IBM DB2 and Oracle to HDFS using Sqoop.
• Worked on Hive to transform files from various formats (Avro, Parquet, XML) to text files.
• Prepared Avro schema files for Hive table creation and loaded data into tables using HQL.
• Analyzed partitioned and bucketed data using Hive QL, executed Hive queries on Parquet tables, and utilized Tableau for daily reporting.
• Extended Hive core functionality using custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs).
• Performance tuned and optimized Hadoop clusters to achieve high performance.
• Monitored system health and logs and responded to warning or failure conditions.
• Managed test data coming from different sources.
• Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
• Used Oozie, ControlM, and Autosys workflow engines for managing and scheduling Hadoop jobs.
• Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
ACHIEVA IT INC Jan 2017 - May 2017
Software Trainee Alpharetta, GA
• Troubleshot performance issues and optimized system performance.
• Enhanced Hadoop security by suggesting custom access protocols.
• Created users in MySQL, Oracle, Microsoft SQL Server 2008, installed databases, and assigned privileges. Monitored alert log files regularly.
• Loaded data from Linux File System to HDFS and vice versa.
• Exported and imported sales data for a car manufacturing company between HDFS, Microsoft SQL Server, and MySQL Server.
• Developed Hive queries for data analysis and used SAP BO for reporting.
EDUCATION
University of Illinois Springfield Aug 2015 - Dec 2016
Master of Science (M.S.), Computer Science