Sri Harsha Ginjupalli
Data Engineer | Data Analyst | Big Data Engineer | Cloud Data Engineer
+1-647-***-**** | ************@*****.*** | Toronto, Ontario, Canada | Open to Remote | Open to Relocate | LinkedIn | Portfolio
SUMMARY
Experienced Data Engineer/Analyst adept in Big Data technologies and the Azure, GCP, and AWS cloud platforms. Proficient in building ETL pipelines with Azure Data Factory, DataFlow, AWS Glue, and EMR for data processing and analysis. Skilled in NoSQL databases such as MongoDB and HBase and in SQL databases including MySQL, Oracle, and SQL Server. Experienced in data visualization and data modeling with Power BI and Tableau, with working knowledge of Looker and QuickSight. Proven ability to integrate DevOps tools such as Jenkins and Docker for efficient deployment processes.
SKILLS
Languages: SQL, T-SQL, Python, Scala, Java
Big Data and Data Processing: Hadoop, Sqoop, MapReduce, Spark, Hive, Pig, Apache Kafka, Oozie
Cloud Services: Azure Blob Storage, HDInsight, Azure Data Factory, Azure SQL Database, Azure Stream Analytics, Azure Synapse Analytics, CosmosDB, DevOps Toolkit, Log Analytics, AWS EMR, EC2, S3, RDS, Amazon Redshift, GCP
ETL Tools: Azure Data Factory, AWS Glue, Informatica, SSIS
Database Management: MySQL, SQL Server, Oracle, Snowflake, PostgreSQL, HBase, MongoDB, DynamoDB, BigQuery
Business Intelligence and Visualization: Power BI, Tableau, Looker, Amazon QuickSight
CI/CD: Jenkins, Docker, Terraform, Kubernetes, Git
EXPERIENCE
Data Engineer
Client: NorthBridge Financial Company
Feb '23 — Present
Toronto, Canada
Designed and implemented data pipelines within Azure Data Factory for ETL processes, resulting in a 40% improvement in data processing efficiency.
Contributed to the development of scalable PySpark solutions for data transformation on Azure Databricks, resulting in a 20% improvement in pipeline execution efficiency.
Developed scalable Spark clusters that improved data preparation speed by 40% using Azure Databricks, contributing to the efficient processing of large datasets.
Designed and implemented scalable Snowpipes for extracting over 100TB of data from Azure Data Lake into Snowflake, optimizing data flow and ensuring efficient ETL processes.
Integrated and automated data ingestion from multiple sources into SQL Server using Azure Data Factory (ADF), enhancing data availability by 30% for analytics and reporting.
Scheduled triggers for batch pipelines in Azure Data Factory and wrote PL/SQL stored procedures and functions.
Automated ETL processes by building Airflow DAGs utilizing Kubernetes executors in the Azure cloud environment, reducing manual intervention by 40% and enhancing overall operational efficiency (see the DAG sketch following this role's bullets).
Integrated Power BI with Azure SQL DB to generate reports and charts, and implemented drill-down functionality.
Leveraged the Azure DevOps toolkit to automate testing and deployment processes, resulting in a 30% reduction in manual errors, improved system reliability, and minimized downtime.
Gained extensive experience with Snowflake data warehouses, Snowpipes, and SnowSQL.
Collaborated with cross-functional teams to integrate Azure Log Analytics with data visualization tools such as Power BI to generate comprehensive reports on system behavior and performance metrics.
Utilized Power Query and DAX functions within Power BI to clean, transform, and model data from various sources, ensuring data accuracy and consistency.
Worked within an Agile/SCRUM team, contributing to efficient sprint planning and regular stand-up meetings and achieving a 100% task completion rate within stipulated timelines.
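Note: a minimal, illustrative sketch of the Airflow automation referenced above. This is not the production pipeline; the DAG name, schedule, and task callables are hypothetical placeholders, and the Kubernetes executor itself is Airflow cluster configuration rather than DAG code.

```python
# Illustrative sketch only: a minimal Airflow DAG of the kind described above.
# DAG id, schedule, and task callables are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_data_lake(**context):
    """Placeholder for the extract step (e.g., copy source files to the data lake)."""
    print("extracting batch for", context["ds"])


def run_transformations(**context):
    """Placeholder for the transform step (e.g., trigger a Databricks or ADF job)."""
    print("transforming batch for", context["ds"])


with DAG(
    dag_id="nightly_etl",                      # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",             # nightly batch trigger
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_to_data_lake)
    transform = PythonOperator(task_id="transform", python_callable=run_transformations)

    extract >> transform   # task dependency: transform runs only after extract succeeds
```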
Data Analyst
Client: EngHouse
Jan '22 — Dec '22
Markham, Canada
Analyzed the current production state of applications and determined the impact of new implementations on existing business processes.
Utilized Azure Data Services such as ADF, Blob Storage, Azure SQL DW, Azure Data Lake Storage, and Azure Synapse Analytics for data integration, data warehousing, and big data analytics.
Created and maintained conceptual, logical, and physical data models to support business requirements.
Designed and implemented 10+ fault-tolerant data pipelines for extracting, transforming, and loading data from diverse source systems to Azure Data Storage services using Azure Data Factory, Azure Data Lake Analytics with T-SQL, ensuring continuous integration and scalability.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats, uncovering insights into customer usage patterns (see the sketch following this role's bullets).
Developed and maintained Power BI datasets for seamless integration with SQL Server Analysis Services (SSAS), contributing to improved reporting capabilities and business insights.
Implemented automated error handling within SSIS packages, reducing manual intervention by 90% and enhancing fault tolerance in ETL processes.
Designed and implemented fault-tolerant ETL processes using SSIS to generate drill-through and drill-down reports, meeting user requirements and enhancing data visualization.
Utilized conditional formatting with expressions in SSRS to improve data visualization accuracy, contributing to better insights for stakeholders.
Developed and deployed SSRS reports to SharePoint Server, utilizing Excel Services within the SharePoint application to enhance data visualization and accessibility for stakeholders.
Converted 150 Tableau reports to SSRS to support business user requirements, ensuring seamless data visualization across the organization's Microsoft Azure platform.
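Note: a minimal, illustrative PySpark/Spark SQL sketch of the extract-transform-aggregate pattern described above; the storage paths, column names, and aggregation logic are hypothetical placeholders, not the actual application.

```python
# Illustrative sketch only: PySpark + Spark SQL extraction, transformation,
# and aggregation. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Extract: read raw events (JSON shown here; CSV/Parquet sources read similarly).
events = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/usage/*.json")

# Transform: standardize types and derive a usage date.
cleaned = (
    events
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("usage_date", F.to_date("event_ts"))
)

# Aggregate with Spark SQL: daily usage per customer.
cleaned.createOrReplaceTempView("usage_events")
daily_usage = spark.sql("""
    SELECT customer_id, usage_date,
           COUNT(*) AS events,
           SUM(duration_sec) AS total_seconds
    FROM usage_events
    GROUP BY customer_id, usage_date
""")

# Load: write the aggregate back to the lake as Parquet for reporting.
daily_usage.write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/daily_usage"
)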
Data Analytics Engineer
Client: Cvent Technologies
Dec '20 — Dec '21
Hyderabad, India (Remote)
Designed and implemented data pipelines for both batch and real-time processing, transforming unstructured data into structured formats and improving data accuracy and efficiency by 40%.
Ran daily PySpark transformations within AWS Glue, processing over 10TB of data per month for consumption by data stakeholders.
Optimized the ETL process by configuring extraction from S3 into Parquet, Avro, and ORC formats, reducing processing time by 30% for loads into RDS via EC2 instances.
Collaborated with cross-functional teams to establish an automated ETL process leveraging AWS Glue and Lambda, saving $50k per quarter while ensuring seamless integration with diverse data sources.
Created a real-time data pipeline using Amazon Kinesis Data Firehose, with the data loaded into DynamoDB.
Designed, implemented, and optimized ETL processes using AWS EMR for large-scale datasets, improving data processing efficiency by 30%.
Automated data archival workflows from S3 to Glacier and Glacier Deep Archive, cutting storage costs by 50% (see the lifecycle-rule sketch following this role's bullets).
Streamlined the CI/CD pipeline by integrating GitHub with Jenkins, improving release cycle efficiency by 40% and tripling automated testing coverage.
Developed and scheduled Python-based jobs in Apache Airflow, creating Directed Acyclic Graphs (DAGs) for efficient task dependencies.
Utilized QuickSight to create interactive visual representations of data from RDS, Redshift, and DynamoDB, increasing stakeholders' understanding of complex datasets by 15%.
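Note: a minimal boto3 sketch of the S3-to-Glacier archival automation described above; the bucket name, prefix, and transition thresholds are hypothetical, and the real workflow may differ.

```python
# Illustrative sketch only: applying an S3 lifecycle rule with boto3 so objects
# transition to colder storage tiers. Bucket, prefix, and day counts are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-raw-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},        # cold tier
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # deep archive tier
                ],
            }
        ]
    },
)
```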
Python Developer
Client: NTT DATA
Nov '19 — Nov '20
Hyderabad, India (Remote)
Developed Python scripts and Django APIs to optimize database access, resulting in a 25% increase in data operation efficiency and responsiveness.
Designed and implemented RESTful web service APIs using Python, enhancing data accessibility and connectivity for seamless integration and reducing latency by 20% (see the API sketch following this role's bullets).
Collaborated on Python scripts to support the transformation of over 1TB of daily data on the enterprise big data platform (EDL), optimizing performance and scalability.
Worked with Data Architecture and QA teams to translate architecture into logical and physical data models adhering to evolving standards, increasing operational efficiency by 25%.
Used Python to optimize query performance on Apache Hive tables, reducing query response time by 20%.
Implemented best practices using JIRA ticketing tool and GitHub version control system, resulting in a 20% reduction in code integration errors.
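Note: a minimal, illustrative Django sketch of the RESTful API work described above; the endpoint, record fields, and URL layout are hypothetical and stand in for the real service.

```python
# Illustrative sketch only: a minimal Django JSON view and URL route.
# Endpoint name and fields are hypothetical; the lookup is stubbed.
# views.py
from django.http import JsonResponse


def customer_detail(request, customer_id):
    """Return one customer record as JSON (database lookup stubbed for illustration)."""
    record = {"id": customer_id, "name": "example", "status": "active"}
    return JsonResponse(record)


# urls.py
from django.urls import path
# from .views import customer_detail   # import path depends on the app layout

urlpatterns = [
    path("api/customers/<int:customer_id>/", customer_detail, name="customer-detail"),
]
```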
EDUCATION
Bachelor of Technology in Electronics and Communication Engineering, Malla Reddy Institute of Engineering and Technology, Hyderabad, India
Jun '15 — May '19
India
CERTIFICATIONS
Microsoft Certified: Azure Data Engineer Associate (DP-203), Microsoft