
Data Engineer Lake

Location:
Edison, NJ
Posted:
May 23, 2024

Resume:

MANISHA PULIPAKALA

Email: ad5wk6@r.postjobfree.com

Ph: 937-***-****

LinkedIn: linkedin.com/in/manishapulipakala796395266

Professional Summary:

•10 years of experience in the analysis, design, development, testing, performance tuning, and documentation of database and client-server applications.

•Experienced as a Data Engineer solving business use cases for several clients.

•Experienced software professional with expertise in backend applications.

•Extensive experience in Informatica PowerCenter and Cloud data integration with SFDC and SNOW.

•Strong understanding of Distributed systems design, HDFS architecture, internal working details of MapReduce and Spark processing frameworks.

•Solid experience developing Spark applications for highly scalable data transformations using RDDs, DataFrames, Spark SQL, and Spark Streaming.

•Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

•Experienced in MVC and microservices architecture with Spring Boot, Docker, and Docker Swarm.

•Expertise in using Docker and setting up the ELK stack with Docker and Docker Compose; actively involved in deployments on Docker using Kubernetes.

•Experience developing applications with Model-View-Controller architecture (MVC2) using the Spring Framework and J2EE design patterns.

•Strong experience troubleshooting Spark failures and fine-tuning long-running Spark applications.

•Strong experience tuning Spark configurations such as broadcast join thresholds, shuffle partition counts, caching, and repartitioning to improve job performance (see the PySpark tuning sketch at the end of this summary).

•Experience using AWS cloud services such as EMR, S3, Lambda, Auto Scaling, CloudWatch, EC2, Redshift, and Athena.

•Configured Spark Streaming to receive real-time data from Kafka, store the streamed data to HDFS, and process it using Spark and Scala.

•Experience working with data lakes, i.e., repositories of data stored in their natural/raw format, usually as object blobs or files.

•Experience with Delta Lake, an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.

•Worked on Spark Streaming and Spark Structured Streaming with Kafka for real-time data processing.

•Strong experience operating in cloud environments such as Amazon Web Services (AWS), including EC2 and S3.

•Solid experience in using various file formats like CSV, TSV, Parquet, ORC, JSON, and AVRO.

•Experienced working with various Hadoop Distributions (Cloudera, Hortonworks, Amazon EMR) to fully implement and leverage various Hadoop services.

•In-depth knowledge of importing/exporting data between databases and HDFS using Sqoop.

•Well versed in writing complex hive queries using analytical functions.

•Knowledge in writing custom UDFs in Hive to support custom business requirements.

•Experienced in working with structured data using HiveQL, join operations, writing custom UDFs and optimizing Hive queries.

•Worked with data lakes serving as a single store of enterprise data, including raw copies of source-system data and transformed data used for reporting, visualization, advanced analytics, and machine learning.

•Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

•Highly experienced in the full cycle of DTS/SQL Server Integration Services packages (developing, deploying, scheduling, troubleshooting, and monitoring), along with SSAS, SSRS, Power BI, QlikView, and Tableau, for data transfer, ETL, and business analysis across different servers.

•Designed and implemented data lake architectures leveraging ADLS as the primary storage layer, enabling centralized storage of diverse data types and supporting various data processing and analytics workloads.

•Strong expertise in building scalable applications using various programming languages (Java, Scala, and Python).

•Continuous Delivery pipeline deployment experience with Maven, Ant, Jenkins, and AWS.

•Proficient in core Java concepts such as multithreading, collections, and exception handling.

•Strong experience working with databases such as Oracle, MySQL, Teradata, and Netezza, and proficiency in writing complex SQL queries.

•Experienced in version control tools like SVN, GitHub and CVS.

•Experienced working with Jira for project management, Git for source code management, Jenkins for continuous integration, and Crucible for code reviews.
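
The bullet on Spark configuration tuning above refers to a pattern like the following. This is a minimal PySpark sketch, not code from any specific engagement; the table paths, column names, and threshold/partition values are hypothetical examples chosen for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal tuning sketch: raise the broadcast threshold, widen shuffle
# partitions, repartition on the join key, and cache a reused DataFrame.
spark = (
    SparkSession.builder
    .appName("spark-tuning-sketch")
    # Hypothetical values: ~50 MB broadcast threshold, 800 shuffle partitions.
    .config("spark.sql.autoBroadcastJoinThreshold", str(50 * 1024 * 1024))
    .config("spark.sql.shuffle.partitions", "800")
    .getOrCreate()
)

# Placeholder Parquet paths standing in for real fact/dimension tables.
fact = spark.read.parquet("/data/raw/transactions")
dim = spark.read.parquet("/data/raw/customers")

# Repartition the large table on the join key to spread skew, and cache it
# because it feeds several downstream aggregations.
fact = fact.repartition(800, "customer_id").cache()

# Explicit broadcast hint for the small dimension table.
joined = fact.join(F.broadcast(dim), "customer_id")

daily_totals = (
    joined.groupBy("customer_id", F.to_date("txn_ts").alias("txn_date"))
          .agg(F.sum("amount").alias("total_amount"))
)
daily_totals.write.mode("overwrite").parquet("/data/curated/daily_totals")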

Technical Skills:

Languages: Shell scripting, SQL, PL/SQL, Python, R, PySpark, Scala

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Kafka, Apache Spark, Spark Streaming

Databases: Oracle 10g/11g/12c, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake, Netezza, NoSQL databases (HBase, MongoDB)

Cloud Services: Azure – ADF (Azure Data Factory), Blob Storage, Data Lake, Synapse, Databricks, Azure DevOps, Azure HDInsight; AWS – S3, EC2, EMR, Redshift, RDS, Lambda, IAM, CloudWatch

Data Modeling: ER (OLTP) and Dimensional (Star and Snowflake schemas)

ETL Tools/Visualization/Other Tools: DTS (Data Transformation Services), Informatica PowerCenter 10.x/9.x/8.x, SSIS, SSRS, Alteryx, MapReduce, Spring Boot, Tableau, SAS, Toad, Power BI, Jira, Tidal, Notepad++, SQL Navigator, AWS Glue, Teradata, Git, GitHub

Education Details:

Bachelor's degree, completed at JNTUA University in 2014

Professional Experience:

Wells Fargo, Charlotte, NC March 2022 – Present

Azure Data Engineer

Responsibilities:

•Wrote Databricks code and fully parameterized ADF pipelines for efficient code management.

•Created and maintained SQL Server scheduled jobs, executing stored procedures for the purpose of extracting data from Oracle into SQL Server. Extensively used Tableau for customer marketing data visualization.

•Worked with Informatica PowerCenter, created SQL to extract and load data as per the requirements.

•Transformed business problems into big data solutions and defined the big data strategy and roadmap; installed, configured, and maintained data pipelines.

•Integrated Azure DevOps with Azure Kubernetes Service (AKS) for containerized deployments, optimizing scalability and resource utilization for microservices architecture.

•Developed serverless applications using Azure Functions for real-time data processing and background task automation.

•Leveraged event-driven architectures by integrating with Azure Event Grid, Service Bus, and Event Hubs, improving system responsiveness.

•Built dashboards in Tableau; involved in performance tuning of reports and resolving issues within Tableau Server and reports.

•Developed Databricks Python notebooks to join, filter, pre-aggregate, and process files stored in Azure Data Lake Storage.

•Utilized Power Query in Power BI to pivot and un-pivot the data model for data cleansing and data massaging.

•Worked with real-time data processing and analytics using Structured Streaming, enabling continuous applications to read data from sources such as Kafka, AWS Kinesis, or Azure Event Hubs (see the streaming sketch at the end of this list).

•Designed and maintained ADF pipelines with activities such as Copy, Lookup, ForEach, Get Metadata, Execute Pipeline, Stored Procedure, If Condition, Web, Wait, and Delete.

•Involved in all phases of SDLC, i.e. Design, Development, Testing and Implementation of ETL processes end to end using Informatica PowerCenter and Informatica Cloud.

•Designed, developed, and tested dimensional data models using Star and Snowflake schema methodologies under the Kimball method.

•Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

•Designed ETL flows to implement using Informatica PowerCenter as per the mapping sheets provided.

•Published Workbooks and Dashboards to the Tableau server.

•Implemented Copy activity, Custom Azure Data Factory Pipeline Activities.

•Designed and developed business intelligence dashboards, analytical reports and data visualizations using Power BI by creating multiple measures using DAX expressions for user groups.

•Developed Kafka producers and consumers to efficiently ingest data from various data sources.

•Responsible for wide-ranging data ingestion using Sqoop and HDFS commands.

•Accumulated partitioned data in various storage formats such as text, JSON, and Parquet; involved in loading data from the Linux file system to HDFS.

•Primarily involved in Data Migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, PowerShell.

•Configured GitLab CI/CD pipelines to automate build, test, and deployment processes for software projects.

•Designed the business requirement collection approach based on the project scope and SDLC methodology.

•Managed Azure Data Lakes (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure Services. Knowledge of U-SQL.

•Utilized ADLS features such as hierarchical namespace and storage tiers to optimize data storage and retrieval performance, resulting in improved data processing efficiency and reduced latency.

•Created pipelines in ADF using linked services, datasets, and pipeline activities to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.

•Worked on tickets opened by users regarding various incidents and requests.

•Wrote production-level machine learning classification models and ensemble classification models from scratch using Python and PySpark to predict binary values for certain attributes within a given time frame.

•Implementing DevOps practices for continuous integration and continuous deployment (CI/CD) of data solutions within Azure Data Fabrics.

•Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and the MLlib libraries.

•Designed and documented the entire architecture of the Power BI POC.

•Expertise in writing complex DAX functions in Power BI and Power Pivot.

•Automated Power Query refresh using PowerShell scripts and Windows Task Scheduler.

•Used various sources to pull data into Power BI, such as SQL Server, SAP BW, and Oracle.

•Installed and configured Enterprise gateway and Personal gateway in Power BI service.

•Created Workspace and content packs for business users to view the developed reports.

•Configured automatic and scheduled data refreshes in the Power BI service.

•Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake Schemas.

•Developed various Mappings with the collection of all Sources, Targets, and Transformations using Informatica Designer.

•Used SQL Server Integrations Services (SSIS) for extraction, transformation, and loading data into target system from multiple sources.

•Worked closely with regulatory delivery leads to ensure robustness in prop trading control frameworks using Hadoop, Python Jupyter Notebook, Hive and NoSQL.

•Wrote UNIX shell scripts to automate jobs and scheduled them as cron jobs using crontab.
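
The Structured Streaming bullet above refers to a pipeline along these lines. This is a minimal PySpark sketch under stated assumptions: the broker address, topic name, JSON schema, and output/checkpoint paths are hypothetical, and it assumes the spark-sql-kafka connector is available on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Hypothetical schema for the JSON payload carried on the topic.
event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read a stream from Kafka (placeholder broker and topic).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "transactions")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Write the parsed stream to storage (HDFS/ADLS path is a placeholder),
# with a checkpoint location so the query can recover after restarts.
query = (
    events.writeStream.format("parquet")
    .option("path", "/data/stream/transactions")
    .option("checkpointLocation", "/checkpoints/transactions")
    .outputMode("append")
    .start()
)
query.awaitTermination()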

Environment: Hadoop, Kafka, Spark, Sqoop, Snowflake, Spark SQL, Spark Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, Zookeeper, Power BI, Databricks, Data Fabrics, Data Lake Storage, Data Factory, Unix/Linux shell scripting, Python, PyCharm, Tableau, Informatica PowerCenter

Charter Communications, Charlotte, NC September 2020 – February 2022

Azure Data Engineer

Responsibilities:

•Created dynamic linked services and datasets to reuse them in different pipelines by passing parameter values at runtime.

•Extensively worked with Azure Storage, Azure Data Lake, Azure File Share, and Azure Blob Storage to store and retrieve data files.

•Used Azure Key Vault to securely store secrets for connection strings and database passwords and used them in configuring linked services.

•Managed large-scale data lakes and data warehouses using Delta Lake for efficient storage, management, and processing of big data workloads (see the Delta Lake sketch at the end of this list).

•Integrated Logic Apps with various Azure services such as Azure Storage, Azure Service Bus, and Azure Functions.

•Designed and implemented data pipelines using Azure Data Factory to ingest, transform, and load data from various sources into Azure data services, making it accessible for analysis in Tableau.

•Used Informatica PowerCenter Designer to analyze source data and extract and transform it from source systems, incorporating business rules with the objects and functions the tool supports.

•Successfully integrated ADLS with Apache Spark, Azure Databricks, and Azure Data Factory to build end-to-end data processing pipelines, enabling efficient data ingestion, transformation, and analysis workflows.

•Created metadata-driven pipelines to dynamically load flat files and database tables from source to destination.

•Migrated an on-premises traditional warehouse to a Synapse warehouse.

•Used Microsoft Power BI to build and maintain reports and dashboards; used Power BI Desktop to develop data analysis against multiple data sources and visualize the reports.

•Integrated custom visuals based on business requirements using Power BI Desktop.

•Developed complex SQL queries using stored procedures, common table expressions (CTEs), and temporary tables to support Power BI and SSRS reports.

•Developed custom APIs for seamless integration with on-premises and cloud systems, enhancing data flow and process efficiency.

•Implemented robust error handling, logging, and monitoring strategies using Azure Monitor and Application Insights.

•Developed complex calculated measures using the Data Analysis Expressions language (DAX).

•Embedded Power BI reports on the SharePoint portal page and managed access to reports and data for individual users using roles.

•Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

•Utilized Azure Databricks and Azure Synapse Analytics for advanced data transformation and modeling tasks, preparing data for optimal visualization in Tableau dashboards.

•Used Python libraries such as Pandas and NumPy to perform advanced data processing operations, including data manipulation, filtering, and feature engineering, to prepare data for analysis and modeling.

•Performed data purging and applied changes using Databricks and Spark data analysis.

•Extensively utilized Databricks notebooks for interactive analysis using Spark APIs.

•Implemented granular access controls using Azure Active Directory integration and role-based access control (RBAC) in ADLS, ensuring data security and compliance with regulatory requirements such as GDPR and HIPAA.

•Performed data analytics and engineering on multiple Azure platforms such as Azure SQL, Azure SQL Data Warehouse, Azure Data Factory, and Azure Storage Accounts for source stream extraction, cleansing, consumption, and publishing across multiple user bases.

•Designed and developed SQL objects, writing complex SQL queries involving data analysis, data modeling, data profiling, T-SQL, and advanced functions.

•Designed and implemented a scalable data lake architecture on ADLS, leveraging best practices for data partitioning, folder structure, and metadata management to facilitate data discovery and exploration for data scientists and analysts.

•Experience with Azure transformation projects and Azure architecture decision making; architected and implemented ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.

•Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, and Azure SQL DW).

•Created comprehensive documentation and conducted training sessions for team members on Azure DevOps best practices and CI/CD methodologies, promoting knowledge sharing and skill development across the organization.

•Architected and implemented ETL and data movement solutions using Azure Data Factory and SSIS; created and ran SSIS packages in ADF V2 using the Azure-SSIS Integration Runtime.

•Developed Databricks notebooks (PySpark) to process data from different files and convert them at runtime into formatted records, enabling quick and effective file and data manipulation.

•Regularly wrote Python routines to log into the database and fetch data.

•Analyzed source system data and designed the star schema for the data warehouse to be developed.

•Developed fact tables, dimension tables, and cubes for the OLAP warehouse using Analysis Services.
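
The Delta Lake bullet above refers to an upsert pattern like the one below. This is a minimal sketch assuming a Databricks or delta-spark-enabled Spark session; the table paths and key column are hypothetical.

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("delta-upsert-sketch").getOrCreate()

# Incoming batch of changed records (placeholder staging path).
updates = spark.read.parquet("/data/staging/customer_updates")

# Target Delta table in the lake (placeholder path).
target = DeltaTable.forPath(spark, "/data/delta/customers")

# MERGE gives ACID upsert semantics: update rows with matching keys,
# insert rows that are new.
(
    target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)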

Environment: Azure SQL Server, Azure Synapse, Databricks, T-SQL, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Power BI, Git, Azure Data Lake Storage (ADLS), Data Factory, Azure Blob, Azure Storage Explorer, Informatica PowerCenter, PySpark, PowerShell, JIRA, DevOps.

Hexagon, Irving, TX September 2019 – August 2020

Data Engineer

Responsibilities:

•Wrote various data normalization jobs for new data ingested into Redshift.

•Developed SSRS reports, SSIS packages to Extract, Transform and Load data from various source systems.

•Implemented and managed ETL solutions and automated operational processes.

•Performed end-to-end delivery of PySpark ETL pipelines on Azure Databricks to transform orchestrated data.

•Designed and implemented Azure DevOps pipelines to automate the build, test, and deployment processes for a complex web application, resulting in a 50% reduction in manual deployment efforts.

•Developed serverless Python functions using Azure Functions to implement scalable data processing tasks, event-driven workflows, and real-time data processing solutions.

•Optimized Logic Apps for performance and cost-effectiveness by managing concurrency, batching, and polling triggers.

•Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.

•Used Informatica PowerCenter for ETL (extraction, transformation, and loading) of data from heterogeneous source systems into the target database.

•Migrated the on-premises database structure to the Confidential Redshift data warehouse.

•Defined facts, dimensions and designed the data marts using Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.

•Strong understanding of AWS components such as EC2 and S3.

•Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS.

•Built efficient, scalable ETL processes to load, cleanse and validate data.

•Participated in the full software development lifecycle with requirements, solution design, development, QA implementation, and product support using Scrum and other Agile methodologies.

•Worked on publishing interactive data visualizations dashboards, reports/workbooks on Tableau and SAS Visual Analytics.

•Worked on big data using AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.

•Created Entity Relationship Diagrams (ERD), functional diagrams, and data flow diagrams, enforced referential integrity constraints, and created logical and physical models using Erwin.

•Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).

•Analyzed the system for new enhancements/functionalities and performed Impact analysis of the application for implementing ETL changes.

•Implemented Python-based data validation scripts to ensure data integrity and quality throughout the data engineering pipelines, reducing the risk of errors and discrepancies in downstream analysis (see the validation sketch at the end of this list).

•Involved in forward engineering of the logical models to generate the physical models and data models using Erwin, with subsequent deployment to the enterprise data warehouse.

•Collaborated with team members and stakeholders in design and development of data environment.

•Wrote Python scripts to monitor Azure data pipelines, analyze performance metrics, and identify optimization opportunities, ensuring efficient utilization of resources and cost-effectiveness.

•Developed and deployed Azure Functions to implement serverless architecture for scalable and event-driven applications.

•Utilized triggers and bindings to seamlessly connect Azure Functions with other Azure services and third-party APIs.

•Prepared associated documentation for specifications, requirements, and testing.

•Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment using AWS Lambda and AWS CodePipeline.

•Created various complex SSIS/ETL packages to Extract, Transform and Load data.

•Designed, developed, and tested various Power BI and Tableau visualizations for dashboard and ad-hoc reporting solutions by connecting from different data sources and databases.

•As a Power BI and Tableau SME, was responsible for the design, development, and production support of interactive data visualizations used across the company.

•Was responsible for ETL and data validation using SQL Server Integration Services.

•Defined and deployed monitoring, metrics, and logging systems on AWS.

•Connected to Amazon Redshift through Tableau to extract live data for real time analysis.

•Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get each job done.

•Analyzed the existing application programs and tuned SQL queries using execution plan, query analyzer, SQL Profiler, and database engine tuning advisor to enhance performance.

•Optimized the TensorFlow Model for efficiency.
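
The data-validation bullet above refers to checks along these lines. This is a minimal Python sketch; the column names, rules, and file name are hypothetical examples, not the actual validation suite.

import pandas as pd

# Hypothetical rules for a batch of order records.
REQUIRED_COLUMNS = ["order_id", "customer_id", "amount", "order_date"]

def validate_orders(df: pd.DataFrame) -> list:
    """Return a list of human-readable validation errors (empty list means the batch is clean)."""
    errors = []

    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        errors.append("missing columns: %s" % missing)
        return errors  # no point checking rows if the schema is wrong

    if df["order_id"].duplicated().any():
        errors.append("duplicate order_id values found")
    if df["amount"].lt(0).any():
        errors.append("negative amounts found")
    if df["order_date"].isna().any():
        errors.append("null order_date values found")
    return errors

if __name__ == "__main__":
    # Placeholder file; in a pipeline this would be the freshly loaded batch.
    batch = pd.read_csv("orders_batch.csv", parse_dates=["order_date"])
    problems = validate_orders(batch)
    if problems:
        raise SystemExit("Validation failed: " + "; ".join(problems))
    print("Validation passed for %d rows" % len(batch))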

Environment: AWS, EC2, S3, Databricks, SQL Server, Erwin, Oracle, Redshift, Informatica PowerCenter, RDS, NoSQL, MySQL, DynamoDB, Docker, PostgreSQL, Tableau, GitHub.

Citigroup, Dallas, TX February 2017 – July 2019

Data Engineer

Responsibilities:

•Designing and configuring Azure cloud relational servers and databases, analyzing current and future business requirements.

•Designing and developing the CI/CD process using Azure DevOps.

•Designing and creating Data Marts, Databases, Indexes, Views, Aggregations, Stored Procedures, Partitions and Data Integrity.

•Migrating data from on-premises SQL Server to cloud databases: Azure Synapse Analytics (DW) and Azure SQL DB.

•Experienced in designing and implementing data engineering pipelines and ETL processes on Azure Databricks to transform raw data into valuable insights and actionable information.

•Maintained data migration projects from on-premises data stores to ADLS, utilizing Azure Data Box and Azure Data Factory to transfer large volumes of data securely and efficiently, with minimal downtime and data loss.

•Developing tabular models on Azure Analysis Services; developing Azure Data Factory pipelines to extract and manipulate data from Azure Blob Storage, Azure Data Lake Storage (ADLS), and SQL Server in the cloud.

•Extracted data from different ERP source systems, OLTP servers, Cloud storage using SQL Server Integration Services and Azure Data Factory.

•Integrated Python-based machine learning models into Azure Machine Learning pipelines for predictive analytics and automated decision-making processes, optimizing model performance and scalability.

•Transforming extracted data into multidimensional cubes and Azure Synapse data marts to develop reports using Power BI (see the data-mart sketch at the end of this list).

•Implemented CI/CD pipelines using Azure DevOps for automated testing, building, and deployment of Azure Functions.

•Analyzed the business requirements, framed the business logic for the ETL process, and maintained the ETL process using Informatica PowerCenter.

•Extensively worked on Informatica transformations such as Source Qualifier, Joiner, Filter, Router, Expression, Lookup, Aggregator, Sorter, Normalizer, Update Strategy, Sequence Generator and Stored Procedure transformations.

•Developing custom stored procedures, functions, and triggers for delta loads using SQL and T-SQL on cloud SQL Server/Azure Synapse.

•Performing research to identify the source and nature of data required for ETL solutions using Azure Databricks.

•Migrated the entire CRM database from IBM DB2 and Informix servers to the cloud-based Snowflake data warehouse using DataStage ETL.

•Acted as a subject matter expert for ADLS, providing technical guidance and training to cross-functional teams on best practices for data lake design, implementation, and management.

•Performing Data Validations of the DW using Power BI.

•Performance tuning of SQL queries, Data Pipelines, Tableau and Power BI Dashboards.

•Maintaining version control of code using Azure DevOps and Git repositories.
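
The data-mart bullet above refers to a transformation step like the following. This is a minimal PySpark sketch; the ADLS storage account, container names, and columns are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-datamart-sketch").getOrCreate()

# Placeholder ADLS Gen2 paths (abfss://<container>@<account>.dfs.core.windows.net/...).
source_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/"
mart_path = "abfss://curated@examplestorage.dfs.core.windows.net/sales_mart/"

sales = spark.read.parquet(source_path)

# Shape a simple mart: one row per product per month, partitioned for fast reads
# from Power BI or Synapse.
sales_mart = (
    sales.withColumn("sales_month", F.date_trunc("month", F.col("sale_date")))
         .groupBy("product_id", "sales_month")
         .agg(F.sum("net_amount").alias("total_sales"),
              F.countDistinct("order_id").alias("order_count"))
)

(
    sales_mart.write.mode("overwrite")
    .partitionBy("sales_month")
    .parquet(mart_path)
)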

Environment: Azure Data Factory, Azure Data Lake, Azure Databricks, Azure Synapse Analytics (DW), Azure DevOps, Snowflake, Power BI, SharePoint, ADLS, SSIS, SQL, Spark, Python, GitHub.

Magnaquest, India Jun 2015 – Jan 2017

Azure Data Engineer

Responsibilities:

•Analyze, design, and build Modern data solutions using Azure PaaS service to support visualization of data. Understand current Production state of application and determine the impact of new implementation on existing business processes.

•Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing of the data in Azure Databricks.

•Created pipelines in ADF using linked services, datasets, and pipeline activities to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.

•Successfully implemented and configured Azure Data Lake Storage (ADLS) to meet specific business requirements, including setting up storage accounts, defining access controls, and optimizing storage resources.

•Developed Spark applications using Spark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.

•Responsible for estimating the cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

•Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and appropriate memory settings.

•Designed and implemented data ingestion pipelines to efficiently ingest large volumes of structured and unstructured data into ADLS using tools such as Azure Data Factory, Azure Databricks, or custom scripts.

•Integrated Python-based data engineering pipelines into CI/CD workflows using tools like Azure DevOps or Jenkins, enabling automated testing, deployment, and monitoring of changes, fostering agility and collaboration.

•Wrote UDFs in Scala and PySpark to meet specific business requirements (see the UDF sketch at the end of this list). Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.

•Implemented data governance policies and security controls within Azure services to ensure data privacy and compliance with regulatory requirements, enhancing Tableau's data visualization capabilities.

•Utilized version control systems like Git and collaboration platforms like Azure DevOps for managing code repositories, facilitating collaborative Python-based development and ensuring code quality and consistency.

•Hands-on experience developing SQL scripts for automation purposes. Created build and release pipelines for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
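
The UDF bullet above refers to user-defined functions of this kind. The sketch below is a minimal PySpark example; the masking rule and column names are hypothetical stand-ins for the actual business logic.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Hypothetical business rule: keep only the last four characters of an account number.
def mask_account(account):
    if account is None:
        return None
    return "****" + account[-4:]

mask_account_udf = F.udf(mask_account, StringType())

df = spark.createDataFrame(
    [("1234567890", 120.5), ("9876543210", 42.0)],
    ["account_number", "amount"],
)

masked = df.withColumn("account_number", mask_account_udf("account_number"))
masked.show()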

Environment: Azure PaaS services, T-SQL, Tableau, Spark SQL, Azure Data Lake Analytics, Azure Storage, Azure SQL, ADLS, Azure DW, Blob Storage, PySpark, Azure Data Factory, Python, Power BI.

Concentrix, India June 2014 – Jun 2015

SQL Server Developer

Responsibilities:

•Created a new database logical and physical design to fit new business requirements and implemented a new design using SQL Server 2005.

•Migrated DTS packages to SSIS, modified the packages accordingly to the new features of SSIS, and carried out a migration of databases from SQL Server.

•Participated in database logical design to fit the new business requirements and implemented the new design in SQL Server 2005.

•Utilized Python's data analysis libraries such as Pandas and NumPy to perform exploratory data analysis (EDA) on SQL Server data, enabling deeper insights into the data and facilitating informed decision-making (see the EDA sketch at the end of this list).

•Filtered bad data from legacy systems using T-SQL statements and implemented various constraints and triggers for data consistency.

•Dropped and recreated indexes on the DSS system during the data migration. Wrote complex T-SQL queries to perform data validation and graph validation, making sure test results matched back to expected results based on business requirements.

•Integrated Python scripts with SQL Server queries to preprocess and cleanse data before loading it into databases, ensuring data quality and consistency.

•Created views to facilitate easy user interface implementation, and triggers on them to facilitate consistent data entry into the database.

•Scheduled the Reports to run daily and weekly in Report Manager and emailed them to the director and analysts to review in Excel Sheet.

•Identified, tested, and resolved database performance issues.
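
The EDA bullet above refers to analysis along these lines. This is a minimal Python sketch under stated assumptions: it uses the pyodbc driver, and the connection string, table, and row limit are hypothetical placeholders.

import pandas as pd
import pyodbc

# Placeholder connection details for a SQL Server instance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver01;DATABASE=SalesDB;Trusted_Connection=yes;"
)

# Pull a sample of the table into a DataFrame for exploratory analysis.
orders = pd.read_sql("SELECT TOP 10000 * FROM dbo.Orders", conn)

# Basic EDA: shape, dtypes, summary statistics, and null counts per column.
print(orders.shape)
print(orders.dtypes)
print(orders.describe(include="all"))
print(orders.isna().sum().sort_values(ascending=False))

conn.close()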


