
Kranthi M

Azure Data Engineer

Location: Allen, TX

Phone: +1-469-***-****

Email: adzh22@r.postjobfree.com

LinkedIn: http://www.linkedin.com/in/kranthi-k-3767241a1

10+ years of experience in data engineering using Azure, Hadoop, Snowflake, and Informatica.

Results-driven data engineer with expertise in designing and implementing scalable data ingestion pipelines using Azure Data Factory, PySpark, Sqoop, and StreamSets.

Holds a certification in Microsoft Azure cloud services.

Implemented data pipelines using Delta Lake, Delta tables, Delta Live Tables, data catalogs, and the Delta Lake API.
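For illustration, a minimal PySpark sketch of the Delta Lake pattern described above (paths and table names are assumptions, not taken from a specific project):

```python
# Minimal Delta Lake sketch. Paths and table names are illustrative;
# assumes a Spark session with the Delta Lake libraries available
# (e.g., a Databricks cluster).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

# Ingest a raw CSV and persist it as a Delta table.
raw = spark.read.option("header", True).csv("/mnt/raw/orders.csv")
raw.write.format("delta").mode("overwrite").save("/mnt/delta/orders")

# Register the Delta location as a table and query it.
spark.sql("CREATE TABLE IF NOT EXISTS orders USING DELTA LOCATION '/mnt/delta/orders'")
spark.sql("SELECT COUNT(*) FROM orders").show()
```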

Implemented real-time streaming data pipelines using Azure Event Hub and Databricks Auto Loader.
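A hedged sketch of the Auto Loader half of that pattern (Databricks-specific; the storage paths are placeholders):

```python
# Databricks Auto Loader: incrementally ingest JSON files landing in ADLS
# into a Delta table. `spark` is predefined in Databricks notebooks; all
# paths below are placeholders.
stream = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/delta/_schemas/events")
          .load("abfss://landing@myaccount.dfs.core.windows.net/events/"))

(stream.writeStream.format("delta")
 .option("checkpointLocation", "/mnt/delta/_checkpoints/events")
 .outputMode("append")
 .start("/mnt/delta/events"))
```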

Implemented data pipelines using PySpark, Databricks and Databricks job API.

Implemented data pipelines using pandas DataFrames, Spark DataFrames, and RDDs.

Implemented Spark performance tuning, Spark SQL, and Spark Streaming in big data and Azure Databricks environments.

Strong expertise in optimizing Spark jobs and leveraging Azure Synapse Analytics for big data processing and analytics.
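For illustration, a few standard Spark tuning levers of the kind mentioned above (values and paths are assumptions, since the right settings depend on cluster size and data volume):

```python
# Common Spark tuning levers; values and paths are illustrative only.
from pyspark.sql import functions as F

spark.conf.set("spark.sql.shuffle.partitions", "200")  # right-size shuffle parallelism
spark.conf.set("spark.sql.adaptive.enabled", "true")   # adaptive query execution

facts = spark.read.format("delta").load("/mnt/delta/sales")
dims = spark.read.format("delta").load("/mnt/delta/stores")

# Broadcast the small dimension table to avoid a shuffle join.
joined = facts.join(F.broadcast(dims), "store_id")

# Cache a hot intermediate result reused by several downstream queries.
joined.cache()
joined.groupBy("region").agg(F.sum("amount").alias("revenue")).show()
```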

Implemented optimized data warehouse solutions using Azure Synapse.

Implemented serverless solutions using Azure Functions.

Implemented workflows using Azure Logic Apps.

Proficient in Python and Scala using Spark framework.

Hands-on experience using ORC, Parquet, Avro, and delimited file formats.

Implemented Hive performance tuning techniques and UDFs.

Implemented real-time data ingestion using Apache Kafka.
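A minimal Structured Streaming sketch of this kind of Kafka ingestion (broker address, topic, and paths are placeholders; requires the spark-sql-kafka connector package):

```python
# Read a Kafka topic as a stream and land it in a Delta table.
# Broker, topic, and paths are placeholders; `spark` is an existing session.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events-topic")
          .option("startingOffsets", "latest")
          .load())

parsed = events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

(parsed.writeStream.format("delta")
 .option("checkpointLocation", "/mnt/delta/_checkpoints/kafka_events")
 .start("/mnt/delta/kafka_events"))
```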

Hands-on experience implementing data pipeline solutions using Hadoop, Azure, ADF, Synapse, PySpark, MapReduce, Hive, Tez, Python, Scala, Azure Functions, Logic Apps, StreamSets, ADLS Gen2, and Snowflake.

Implemented data pipelines using SnowSQL, Snowflake integration services, and Snowpipe.

Experience in using Snowflake Clone and Time Travel.
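For illustration, a short sketch of Time Travel and zero-copy cloning using the snowflake-connector-python package (connection parameters and table names are placeholders):

```python
# Snowflake Time Travel and zero-copy clone via snowflake-connector-python.
# All connection parameters and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myaccount", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Query the table as it existed one hour ago (Time Travel).
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print(cur.fetchone())

# Zero-copy clone of that point-in-time state for a sandbox copy.
cur.execute("CREATE TABLE orders_backup CLONE orders AT(OFFSET => -3600)")
```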

Participated in the development, improvement, and maintenance of Snowflake database applications.

Built logical and physical data models for Snowflake as requirements changed.

Experience with Snowflake Multi-Cluster Warehouses.

Adept at designing cloud-based data warehouse solutions using Snowflake, optimizing schemas, tables, and views for efficient data storage and retrieval.

Experienced in the healthcare industry, with hands-on exposure to HHS agency projects and PII, PHI, and HL7 data.

Worked on healthcare domain projects with InternationalSOS.com and Change Healthcare.

Conducted thorough data security analysis for Personally Identifiable Information (PII) and Protected Health Information (PHI) data to identify vulnerabilities and mitigate potential risks.

Developed and implemented data protection strategies, ensuring compliance with privacy regulations such as HIPAA in handling sensitive healthcare information.

Collaborated with IT and security teams to implement encryption, access controls, and data masking techniques for safeguarding PII and PHI data.

Analyzed HL7 data messages to facilitate seamless integration and exchange of healthcare information between systems, promoting interoperability.

Leveraged HL7 standards to extract and interpret clinical data, facilitating effective communication and data sharing across healthcare applications.

Applied data mapping and transformation techniques to ensure the accuracy and consistency of HL7 data for meaningful insights and decision-making.
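As an illustration of HL7 v2 handling, a sketch using the third-party python-hl7 package (the message below is a fabricated, truncated example):

```python
# Parse an HL7 v2 ADT message with the `hl7` package (pip install hl7).
# The message is fabricated and truncated for illustration; real HL7 v2
# messages use \r as the segment separator.
import hl7

message = ("MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|202301011200||ADT^A01|MSG001|P|2.5\r"
           "PID|1||12345^^^HOSP^MR||DOE^JOHN")

parsed = hl7.parse(message)
pid = parsed.segment("PID")
print(str(pid[3]))  # patient identifier list
print(str(pid[5]))  # patient name
```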

Experienced in Informatica Master Data Management (MDM) 10.4, 10.3, and 10.1 Multi-Domain Edition, covering Analysis, Design, Testing, Maintenance, and Quality Assurance for MDM and Data Warehousing applications.

Proficient in the full software development life cycle, from requirement gathering to production support, utilizing Informatica MDM and Informatica Data Director.

Implemented Informatica MDM on Azure Cloud and proficient in designing MDM solutions, including landing tables, staging tables, base objects, hierarchies, foreign-key relationships, lookups, queries, and packages.

Highly skilled in Data modeling, Data Mappings, Data validation, Match and Merge rules, Hierarchy Manager, and configuring Informatica MDM solutions.

Proficient in Informatica Developer for understanding data and mappings.

Experience in Provisioning Tool 10.3 HF1 and C360 10.3 HF1, including UI screen and dashboard creation.

Familiarity with Oracle 10g, PL/SQL, and SQL Server for database tasks and data optimization. Highly proficient in Agile methodologies, including JIRA/ADO for project management and reporting.

Implemented CI/CD data pipelines and collaborated with DevOps teams on automated pipeline deployment.

Implemented Production data pipelines using Apache Airflow, YAML and Terraform scripts.

Hands-on experience with version control tools GitHub, Azure DevOps, Bitbucket, and GitLab, as well as ARM templates.

Implemented data visualization using Power BI & Power BI DAX.

Hands-on experience with shell scripting.

Implemented data warehouse solutions using Snowflake & star schemas.

Implemented data pipelines using Informatica & PL/SQL.

Proficient in creating and designing paginated reports, specialized in delivering pixel-perfect, structured layouts suitable for printing and PDF generation.

Skilled in utilizing the Report Definition Language (RDL) to craft precise and consistent report layouts, catering to various data sources including relational databases and multidimensional data.

Experienced in parameterizing paginated reports, enabling end-users to customize content, apply filters, and modify report settings for enhanced interactivity.

Well-versed in enterprise reporting solutions, adept at leveraging paginated reports for operational reporting needs, automation, and scheduled distribution, enhancing organizational efficiency.

Technical skills:

Cloud Services

Microsoft Azure, Snowflake cloud data warehouse.

Azure Services

Azure Data Factory, ADLS Gen2, Azure Blob Storage, Azure Databricks, Logic Apps, Purview, Azure AD, Synapse Analytics, Function App, Azure DevOps, CircleCI, Azure SQL, RBAC, IAM, Kubernetes, Docker, CI/CD pipelines, API, IAM tools, SailPoint, SOD, SOX, SOC controls.

Snowflake data warehouse

Snowflake, Snowpipe, Snowflake Tasks, Fail-safe, zero-copy cloning, data retention, BI integration, Snowsight

Big Data Technologies

MapReduce, Hive, Tez, Python, Apache Spark, Scala, Kafka, Matillion, Spark Streaming, Oozie, Sqoop, Zookeeper, HBase, YARN, Impala, Pig, Flume, ETL, SSIS, Informatica, Informatica MDM 10.1/10.2/10.3/10.4 (MDM Hub, Informatica Data Director, Message Queues), Informatica PowerCenter, IICS, IDQ, ITSOs, DAO, Data Perk, Informatica S360, Informatica C360, Fivetran, DBA, Splunk

Hadoop Distribution

Cloudera, Hortonworks

Programming Languages

SQL, PL/SQL, Python, Java, C#, HiveQL, Scala, R, SAS

Web Technologies

HTML/HTML5, CSS/CSS3, JavaScript, XML, XSLT, jQuery, JSON, Ajax, Bootstrap, AngularJS, JSP, Adobe PDF, QuickBase development, REST API, SOAP API, SAP, MicroStrategy, JBoss EAP 6.4/7.1, Spring Integration, MVC (Model-View-Controller), Hibernate, Spring Boot microservices, EWB

Operating Systems

Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.

Build Automation tools

Ant, Maven

Version Control

Git, Bitbucket, SVN

IDE &Build Tools, Design

Eclipse, MS Visual Studio, Visio, Erwin

Databases / web servers

Neo4j, MS SQL Server 2016/2014/2012, Oracle SQL Server, T-SQL, Azure SQL DB, MS Excel, MS Access, Oracle 11g, Oracle 12c, Cosmos DB, Cassandra, PostgreSQL, MongoDB, Teradata, Apache Web Server, Nginx, Tomcat, JBoss, WebLogic

Data Visualization tools

Power BI, PBRS (Power BI Report Scheduler), Power On (Power BI Report Automation), Tableau, Grafana, Cognos, SSRS, SSDT

Testing tools

Selenium, UiPath, JUnit, JIRA, Postman, SoapUI, JMeter, Cucumber

SDLC methodologies

Waterfall, Agile

Education:

Bachelor of Technology from Jawaharlal Nehru Technological University, India (August 2007 – May 2011).

Master of Science in Information Systems, USA (January 2012 – May 2013).

Work Experience

Sr. Azure Data Engineer Feb 2023 – Present

Cisco, San Jose

Responsibilities:

•Designed scalable data ingestion pipelines using Azure Data Factory for SQL databases, CSV files, and REST APIs.

•Developed efficient data processing workflows with Azure Databricks, utilizing Spark for distributed tasks.

•Ensured high data quality through validation, cleansing, and transformation using Azure Data Factory and Databricks.

•Implemented a cloud-based data warehouse with Snowflake on Azure, ensuring scalability and top-notch performance.

•Created optimized Snowflake schemas, tables, and views for efficient data storage and retrieval.

•Collaborated with stakeholders to design data models and structures in Snowflake.

•Utilized Azure Blob Storage and Azure SQL, implementing compression and encryption techniques for secure storage.

•Leveraged Azure Synapse Analytics for robust big data processing and analytics capabilities.

•Automated data pipelines with event-based triggers and scheduling for seamless operation.

•Proficiently employed ER Studio Data Vault Modeling to design and manage complex Data Vault models, ensuring scalability and adaptability to changing business requirements.

•Utilized ER Studio's visual modeling capabilities to create, modify, and visualize Data Vault components like Hubs, Links, and Satellites, streamlining the modeling process.

•Leveraged ER Studio's automated documentation features to generate comprehensive documentation for Data Vault models, enhancing communication among stakeholders.

•Applied ER Studio's impact analysis tools to assess the effects of proposed changes on existing Data Vault structures before implementation, ensuring smooth transitions.

•Utilized ER Studio's collaboration features to facilitate teamwork and collaboration among data architects, modelers, and administrators, promoting efficient model development and maintenance.

•Implemented data lineage and metadata management solutions for enhanced tracking and monitoring.

•Resolved performance bottlenecks, optimizing Snowflake query execution for faster results.

•Demonstrated excellent communication skills, conveying technical concepts to non-technical stakeholders effectively.

•Utilized partitioning, indexing, and caching strategies in Snowflake to improve query performance.
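A sketch of the Snowflake side of that tuning (Snowflake prunes micro-partitions via clustering keys rather than conventional indexes; connection parameters and object names are assumptions):

```python
# Define a clustering key so Snowflake can prune micro-partitions on the
# common filter columns, then inspect clustering quality. All connection
# parameters and object names are placeholders.
import snowflake.connector

cur = snowflake.connector.connect(
    account="myaccount", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
).cursor()

cur.execute("ALTER TABLE orders CLUSTER BY (order_date, region)")
cur.execute("SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)')")
print(cur.fetchone()[0])
```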

•Integrated diverse data sources into Snowflake, facilitating streamlined analytics.

•Implemented Azure Purview and AD for data security and governance, ensuring data protection and compliance.

•Hands-on experience with Hive on Spark, Kafka, and Spark Streaming for effective data handling.

•Led the implementation of a cloud-based data warehouse with Snowflake on Azure, expanding data warehousing expertise to NoSQL solutions such as Cassandra.

•Collaborated with stakeholders to design data models and structures in Snowflake, applying analytical thinking that carries over to Cassandra data modeling.

•Leveraged Azure Blob Storage and Azure SQL, demonstrating cloud storage proficiency that extends to Cassandra-related setups.

•Developed interactive dashboards with Tableau/Power BI/Looker, enabling data-driven insights.

•Conducted performance tuning for Snowflake virtual warehouses to enhance overall system responsiveness.

•Led successful migration to Snowflake on Azure, achieving enhanced scalability and cost-efficiency.

•Proficient in crafting complex SQL queries for data analysis and reporting, meeting tight deadlines.

•Assisted in seamless integration of multiple data sources into Snowflake for unified analytics.

•Created and executed caching mechanisms for rapid data retrieval and improved performance. Implemented comprehensive data retention policies in Azure environments, ensuring adherence to regulatory requirements and minimizing data retention risks. Configured retention settings in Azure Data Factory, Snowflake, and other relevant services to automatically manage the data lifecycle.

•Orchestrated data archiving strategies using Azure Blob Storage's tiering options, efficiently transitioning aging data to lower-cost storage tiers while keeping it accessible when needed.

•Leveraged Azure Data Lake Storage's time-based lifecycle management to automate data retention and expiration processes, effectively managing data sprawl and storage costs.

•Utilized Azure Purview (formerly Azure Data Catalog) to establish and enforce metadata management, allowing for efficient data discovery, lineage tracking, and maintaining data governance standards across the organization.

•Implemented data classification and labeling mechanisms within Azure Information Protection to ensure sensitive data is appropriately tagged, encrypted, and controlled, enabling better data security and compliance.

•Collaborated with security teams to implement data access controls, utilizing Azure Active Directory for fine-grained identity and access management. Implemented role-based access controls (RBAC) to enforce the principle of least privilege.

•Implemented data anonymization and masking techniques for non-production environments using Azure SQL Database Dynamic Data Masking and Snowflake's masking policies, safeguarding sensitive information during development and testing.
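For illustration, a minimal sketch of a Snowflake dynamic masking policy like those described above (role, table, and column names are assumptions):

```python
# Create a masking policy that reveals SSNs only to an authorized role,
# then bind it to a column. Connection parameters, the role, and object
# names are placeholders.
import snowflake.connector

cur = snowflake.connector.connect(
    account="myaccount", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
).cursor()

cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS ssn_mask AS (val STRING)
    RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PHI_ANALYST') THEN val
           ELSE 'XXX-XX-' || RIGHT(val, 4)
      END
""")
cur.execute("ALTER TABLE patients MODIFY COLUMN ssn SET MASKING POLICY ssn_mask")
```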

•Designed and implemented data retention and deletion workflows, leveraging Azure Logic Apps and Azure Functions to automate the process of purging expired data in accordance with retention policies.

•Created and enforced data encryption-at-rest and in-transit standards using Azure Key Vault and SSL/TLS protocols, ensuring data confidentiality and integrity throughout its lifecycle.

•Established data lineage tracking across the entire data ecosystem using tools like Azure Purview and third-party metadata management solutions, providing clear visibility into data movement and transformations.

•Conducted regular audits and assessments of data retention practices, collaborating with legal and compliance teams to ensure ongoing alignment with changing regulations and policies.

•Developed comprehensive disaster recovery strategies for data assets, including backup and replication mechanisms for critical components of the data pipeline, ensuring data availability and business continuity.

•Utilized Azure Policy and Azure Blueprints to enforce standardized data governance rules and configurations, maintaining consistency and compliance across various data engineering projects.

•Collaborated with Data Privacy Officers to establish data subject rights processes, enabling the organization to respond to data access and deletion requests in line with privacy regulations such as GDPR and CCPA.

•Managed massive data volumes in Snowflake efficiently.

•Proficient in IAM services, IAM tools, and Agile project management, leading cross-functional teams successfully.

•Created clear and concise technical documentation using diagramming tools for effective communication.

•Spearheaded data perk programs, leveraging data analytics for personalized customer rewards, enhancing engagement.

•Experience in performance tuning Snowflake queries for optimal efficiency.

•Implemented data quality and governance frameworks using Microsoft Purview and other tools.

•Contributed to data governance initiatives and access controls, ensuring data privacy and security.

•Proficient in identifying data security weaknesses and potential risks.

•Innovatively developed and optimized data processing workflows within Azure Databricks, harnessing the power of Spark for efficient distributed computing. Employed analytical thinking to ensure high data quality through validation, cleansing, and transformation, utilizing both Azure Data Factory and Databricks.

•Led the strategic implementation of a cloud-based data warehouse utilizing Snowflake on Azure. Designed and fine-tuned optimized Snowflake schemas, tables, and views, showcasing a deep understanding of data storage and retrieval strategies for enhanced performance.

•Leveraged PySpark to run complex, high-volume queries for data type conversions and in-memory computation.

•Collaborated closely with stakeholders to design intricate data models and structures within Snowflake. Applied an analytical mindset to translate business requirements into efficient data models, enabling streamlined analytics and actionable insights.

•Identified and assessed automation opportunities, developing RPA workflows using UiPath for data processing automation.

•Utilized Git's branching capabilities to experiment with new features while maintaining a stable main codebase.

•Leveraged Azure DevOps with Git repositories for efficient project management and enhanced collaboration.

•Employed remote Git repositories for secure backup and disaster recovery of critical data engineering scripts.

•Utilized GitHub's API to integrate the platform with external tools, enabling seamless automation and data synchronization.

Project: Cisco Unified Communications Manager (UCM), an IP-based telephony call-processing engine for businesses of all sizes; worked on the CRM-related business of UCM.

Environment: Azure Databricks, Data Factory, Snowflake, Snowpipe, Snowflake Tasks, Fail-safe, zero-copy cloning, SnowSQL, Logic Apps, Function App, Azure SQL, IAM tools, SailPoint, Oracle Identity Manager, CyberArk, RBAC, SOD, least-privilege access, SOX, SOC controls, Azure AD, Azure B2C, Azure B2B, Azure Key Vault, MIM, PAM, Azure DevOps, SSRS, SSAS, SSIS, Azure Purview, Synapse Analytics, Matillion, MS SQL, Oracle, Cassandra, HDFS, ER Studio, Data Vault modeling, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, UML modeling and OO modeling, Kafka, ADF pipelines, Power BI, CircleCI, ITSOs, DAO, Data Perk, RPA (Robotic Process Automation), UiPath, Visio, Erwin, Nessus, Qualys, data modeling, PBRS (Power BI Report Scheduler), Splunk, Power On (Power BI Report Automation), Tableau.

Azure Data Engineer May 2021 – Jan 2023

Internationalsos.com, Philadelphia

Responsibilities:

•Implemented end-to-end data pipelines using Azure Data Factory for diverse data sources into Snowflake.

•Designed data processing workflows with Azure Databricks, leveraging Spark for large-scale transformations.

•Expertly leveraged ER Studio Data Vault Modeling to design and manage intricate Data Vault models, ensuring scalability and adaptability to InternationalSOS's dynamic data landscape.

•Utilized ER Studio's visual modeling capabilities to create and optimize Data Vault components such as Hubs, Links, and Satellites, aligning with InternationalSOS's data architecture requirements.

•Built optimized Snowflake schemas, tables, and views for complex analytics queries.

•Developed real-time data ingestion pipelines using Azure Event Hubs and Functions.

•Collaborated cross-functionally to handle call data records for Consumer Lending.

•Utilized Azure Data Lake Storage for raw and processed data with partitioning and retention strategies.

•Integrated Azure Logic Apps with Azure Data Factory for complex workflows.

•Ensured data governance and quality using Azure Data Factory and Snowflake.

•Implemented data replication and synchronization strategies between Snowflake and other platforms.

•Leveraged ER Studio to establish reusable model components, promoting consistency and standardization across Data Vault modeling efforts.

•Integrated ER Studio with other data management tools to streamline end-to-end data engineering processes, enhancing the overall efficiency of data warehouse development.

•Utilized ER Studio's metadata management capabilities to enhance data lineage tracking and governance, ensuring transparency and compliance across various data operations.

•Utilized Azure Machine Learning for advanced analytics and machine learning.

•Designed data archiving and retention strategies using Azure Blob Storage and Snowflake.

•Implemented monitoring solutions using Azure Monitor and Snowflake QPM.

•Utilized Azure Policy and Azure Blueprints to enforce standardized data governance rules and configurations across the project.

•Integrated Snowflake with Power BI and Azure Analysis Services for interactive dashboards.

•Proficient in IAM services, IAM tools, and Agile project management.

•Developed and maintained technical documentation using diagramming tools.

•Strong experience in Healthcare Industry, understanding workflows and regulations.

•Expertise in data privacy and security regulations (e.g., HIPAA).

•Collaborated with cross-functional teams to deploy RPA solutions.

•Expert in leveraging Nessus for in-depth vulnerability assessments.

•Proficient at pinpointing security weaknesses and potential risks.

•Experienced in deploying Qualys to achieve efficient vulnerability management.

•Skilled in conducting meticulous security evaluations.

•Orchestrated data archival and deletion in Azure Data Factory to meet industry regulations and optimize storage costs.

•Enforced least privilege in Snowflake using fine-grained access controls, enhancing data integrity and security.

•Established sensitive data identification through Azure Purview's labeling, bolstering data protection and compliance.

•Secured sensitive data during analytics with Snowflake's dynamic masking and tokenization techniques.

•Utilized Snowflake's Time Travel for historical data snapshots, supporting audits and regulatory requirements.

•Enforced data governance via Azure Policy and Blueprints for consistent quality, privacy, and security.

•Aligned access policies with GDPR and HIPAA, integrating consent management in Azure Data Factory and Snowflake.

•Implemented data classification and labeling mechanisms within Azure Information Protection to ensure sensitive data is appropriately tagged, encrypted, and controlled.

•Collaborated with security teams to enforce data access controls using Azure Active Directory and role-based access controls (RBAC).

•Implemented robust data encryption-at-rest and in-transit standards using Azure Key Vault and SSL/TLS protocols.

•Collaborated with security teams to establish fine-grained identity and access controls using Azure Active Directory.

•Showcased data lineage through Azure Purview and Snowflake's metadata for audit transparency.

•Ensured data security with Azure Key Vault and Snowflake's encryption for rest and transit.

•Automated retention workflows with Azure Logic Apps and Data Factory, ensuring systematic data cleanup.

•Conducted proactive data governance audits with Azure Monitor and Snowflake QPM.

•Established cross-functional data stewardship roles for organization-wide data quality, security, and compliance.

•Prioritized and suggested effective remediation strategies.

•Utilized PySpark for high-volume query analytics and in-memory computing.
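A small example of the kind of PySpark analytics referred to here, using a window function for a latest-record-per-key query (paths and column names are assumptions):

```python
# Latest record per member via a window function, then an aggregate.
# `spark` is an existing session; paths and columns are placeholders.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

claims = spark.read.format("delta").load("/mnt/delta/claims")

w = Window.partitionBy("member_id").orderBy(F.col("claim_date").desc())
latest = (claims.withColumn("rn", F.row_number().over(w))
                .filter("rn = 1")
                .drop("rn"))

latest.groupBy("plan_type").agg(F.avg("amount").alias("avg_claim")).show()
```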

•Demonstrated an analytical approach by utilizing advanced indexing, partitioning, and caching strategies within data models to enhance query performance, resulting in expedited data retrieval.

•Utilized PBRS and Power On for automating reporting and visualizations and seamlessly integrated data with Fivetran, enhancing performance and scalability using DBA skills.

•Orchestrated intricate branching strategies in SVN, managing feature development, bug fixes, and releases efficiently across the codebase.

•Integrated SVN repositories with Azure DevOps, streamlining project management and cross-platform collaboration.

Environment: Snowflake, Snowpipe, Snowflake Tasks, Fail-safe, zero-copy cloning, Azure Databricks, Data Factory, Logic Apps, Kubernetes, Azure DevOps, Azure Purview, IAM tools, RBAC, SailPoint, ER Studio, Data Vault, Oracle Identity Manager, SOD, least-privilege access, SOX, SOC controls, Azure AD, Azure Key Vault, Azure B2C, Azure B2B, MIM, PAM, SSRS, SSAS, SSIS, Azure SQL, Synapse Analytics, Matillion, CI/CD pipelines, Function App, Cassandra, Oracle Server, Consumer Lending, API, MS SQL, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, SVN, healthcare, data modeling, HHS agency, PII data, PHI data, HL7 data, Kafka, ADF pipelines, ITSOs, DAO, Data Perk, Power BI, UML modeling and OO modeling, RPA (Robotic Process Automation), UiPath, Nessus, Qualys, PBRS (Power BI Report Scheduler), paginated reports, Power On (Power BI Report Automation), CircleCI, Fivetran, DBA, Splunk, Visio, Erwin, Tableau, GitHub.

Data Engineer/Cloud Apr 2020 – Apr 2021

Change HealthCare, Nashville, TN

Responsibilities:

•Designed SSIS ETL processes to extract data from various healthcare systems, such as Electronic Health Records (EHR), claims data, and billing systems.

•Spearheaded SSIS implementation for ETL processes, seamlessly integrating diverse data sources, resulting in efficient data workflows within the healthcare domain.

•Leveraged SSRS to design and develop dynamic reports that provided actionable insights to stakeholders, enhancing data-driven decision-making.

•Managed healthcare-specific data (PII, PHI, HL7) with meticulous attention to regulatory compliance, ensuring secure handling of sensitive patient information.

•Successfully migrated databases to Azure Cloud for scalability.

•Optimized the cost of data storage and processing in Azure cloud environments.

•Collaborated closely with Azure cloud services, Docker, and Kubernetes for scalable data solutions, optimizing system performance and flexibility.

•Used PySpark and SSIS to clean and preprocess healthcare data, ensuring data quality and integrity.

•Designed and maintained data warehouses and data lakes to store healthcare data efficiently.

•Proficiently managed Hadoop, Hive, PySpark, and related tools for efficient processing and analysis of large datasets, driving business insights.

•Employed Change Data Capture (CDC) strategies to synchronize data changes across systems, ensuring real-time data availability for reporting.
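One common way to apply CDC changes in this stack is a Delta Lake MERGE; a hedged sketch (paths, keys, and the `op` change-flag column are assumptions):

```python
# Apply CDC rows (insert/update/delete flags in an `op` column) to a
# Delta target with MERGE. `spark` is an existing session; paths and
# column names are placeholders.
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/mnt/delta/patients")
changes = spark.read.format("delta").load("/mnt/delta/cdc/patient_changes")

(target.alias("t")
 .merge(changes.alias("c"), "t.patient_id = c.patient_id")
 .whenMatchedDelete(condition="c.op = 'D'")
 .whenMatchedUpdateAll(condition="c.op = 'U'")
 .whenNotMatchedInsertAll(condition="c.op = 'I'")
 .execute())
```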

•Created seamless data pipelines with Sqoop, Flume, and Kafka for real-time data.

•Leveraged tools like Hadoop, PySpark, and Cloudera for diverse data types.

•Proficient in using Nessus and Qualys for vulnerability management.

•Designed complex data models in Cassandra and HBase for efficiency.

•Utilized Gitflow and multiple worktrees, and orchestrated an SVN-to-Git migration.

•Created structured paginated reports with RDL and Power BI dashboards for insights.

Environment: Azure services, Azure Data Lake, Azure Data Factory, Azure Databricks, Sqoop, MySQL, HDFS, Apache Spark (Scala), Hive, Hadoop, Docker, Kubernetes, Azure DevOps, Cloudera, CI/CD pipelines, HBase, Kafka, MapReduce, Zookeeper, Oozie, data pipelines, SSIS, SSRS, ER Studio, Data Vault, healthcare, HHS agency, PII data, PHI data, HL7 data, Git, RDBMS, MS Excel, Oracle SQL, Cassandra, consumer lending, Informatica, Change Data Capture (CDC), data modeling, Splunk, Fivetran, DBA, Alteryx, Python, PySpark, shell scripting, Ambari, JIRA, Visio, IICS, Erwin, Nessus, Qualys, Power BI, paginated reports, Tableau, GitHub

Data Engineer July 2018 – Mar 2020

Quotient, Mountain View, CA

Responsibilities:

•Developed SSIS packages for ETL processes, moving and transforming data from diverse sources and ensuring accurate data transformation and loading.

•Implemented SSRS reports that addressed specific business use cases, providing stakeholders with valuable insights for informed decision-making.

•Built an ETL framework with Sqoop, Pig, Hive, and Spark.

•Developed Spark Streaming apps for real-time analytics.

•Generated ad-hoc reports with Power BI, Splunk, and Tableau.

•Implemented Splunk for real-time metric visualization and log compliance.

•Designed efficient data models and optimized queries in Cassandra.

•Developed and fine-tuned PySpark applications and queries to perform complex analytics.

•Optimized Hive queries and leveraged PySpark for advanced in-memory ETL processing, improving data processing efficiency.

•Optimized PySpark and SSIS jobs for performance by tuning queries, improving data processing algorithms, and managing resource allocation.

•Conducted vulnerability scans using Nessus and Qualys.

•Implemented real-time data tracking solutions.

•Managed cursors and used bulk processing for efficient data manipulation.

•Utilized Git and GitHub for version control and collaboration.

•Utilized Tableau and Power BI to prepare dashboards and stories.

Environment: Hadoop, Hive, Azure cloud services, Apache Spark, Oracle Database, MS Excel, PySpark, Sqoop, Spark SQL, shell scripting, Cassandra, PL/SQL, YAML, ETL, MDM, Informatica PowerCenter, Informatica S360, Informatica C360, workflow design, metadata management, data governance framework, data privacy and security, Alteryx, Nessus, Qualys, Power BI, PBRS (Power BI Report Scheduler), paginated reports, Fivetran, DBA, SSIS, SSRS, Splunk, IICS, Change Data Capture (CDC), Visio, Erwin, Power On (Power BI Report Automation), JIRA, Tableau.

Data Engineer/ SSIS Developer May 2017 – June 2018

Experian, Costa Mesa, CA

Responsibilities:

•Designed and developed robust SSIS packages for seamless data integration, ensuring accurate extraction, transformation, and loading of data from various sources.

•Collaborated with cross-functional teams to understand business requirements and translate them into effective ETL solutions, catering to Experian's data processing needs.

•Implemented real-time data ingestion and processing with Apache NiFi and Kafka Connect.

•Optimized Hive queries and used Spark for advanced in-memory ETL processing.

•Employed dimensional data modeling and SCD techniques in Apache environments.

•Developed, deployed, and fine-tuned Spark applications for complex analytics.

•Utilized Apache Kafka for efficient real-time data streaming and processing.

•Designed optimized data models and managed Cassandra clusters.

•Used PySpark and SSIS to clean and preprocess data.


