Data Analyst

Location: Manhattan, NY, 10022
Posted: April 18, 2024

PROFESSIONAL SUMMARY:

Around *+ years of professional experience as a Data Analyst in Data Analysis, Data Profiling, Data Integration and Migration, Data Governance, Metadata Management, Master Data Management, and Configuration Management.

Expertise in data modeling, database design and development, data mining, and segmentation techniques. Strong experience in Data Analysis, Data Profiling, Data Cleansing and Quality, Data Migration, and Data Integration.

Hands-on experience using software packages such as SAS for analytical modeling and data management.

Solid experience with programming languages including Python, R, SQL, Scala, and SAS. Expert in backend testing with SQL to verify data integrity and perform data validation for client-server and web-based applications.

Hands-on experience performing statistical analysis, causal inference, and statistical modeling in R and Python; interpreting and analyzing hypothesis-test results; and providing data-driven recommendations.

Proficient in utilizing PostgreSQL for data analysis, demonstrating advanced SQL skills to extract, transform, and analyze large datasets.

Proficient in using Python to connect to relational databases like PostgreSQL for data extraction and analysis.
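
A minimal sketch of this kind of Python-to-PostgreSQL extraction using psycopg2; the connection details and the sales table are placeholders, not details from any engagement described here:

    import psycopg2

    # Placeholder credentials; substitute real connection details.
    conn = psycopg2.connect(
        host="localhost", dbname="analytics", user="analyst", password="secret"
    )

    # Aggregate query against a hypothetical sales table.
    with conn.cursor() as cur:
        cur.execute("""
            SELECT region, SUM(amount) AS total_sales
            FROM sales
            GROUP BY region
            ORDER BY total_sales DESC;
        """)
        for region, total_sales in cur.fetchall():
            print(region, total_sales)

    conn.close()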

Good experience and knowledge of Amazon Web Services, including S3, RDS, Redshift, Glue, Lambda, Athena, CloudTrail, CloudFront, and IAM.

Extensively worked with Azure services, including Azure Databricks, Data Lake, Data Factory, Data Lake Storage, Synapse Analytics, NoSQL DB, and Azure HDInsight.

Expertise in implementing data partitioning and indexing strategies for optimal performance in Azure Databricks, Data Lake, and Data Lake Analytics.

Implemented data transformation and cleansing activities within Azure Data Factory, guaranteeing the quality and integrity of data stored in Azure Blob Storage.

Skilled in automating data cleansing, enrichment, and transformation tasks with AWS Glue, AWS Lambda, and Amazon Athena, enhancing data quality and accuracy for analytical insights.

Skilled in utilizing Hadoop's MapReduce for data transformation, cleaning, and processing, ensuring data quality and consistency for analytical purposes.

Proficient in leveraging Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Pig, and YARN to efficiently process and analyze vast amounts of structured and unstructured data.
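
For illustration, a minimal Python map/reduce pair written for Hadoop Streaming (one common way to run Python logic under MapReduce); the tab-separated (id, category, amount) input layout is an assumed example:

    # mapper.py -- emits one (category, amount) pair per well-formed input row,
    # silently dropping malformed rows as a basic cleansing step.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:
            _, category, amount = fields
            print(f"{category}\t{amount}")

    # reducer.py -- Hadoop Streaming delivers mapper output sorted by key,
    # so per-category totals accumulate over contiguous key runs.
    import sys

    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")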

Proficient in implementing data partitioning strategies in MySQL for improved query performance and management of large datasets.

Proficient in advanced Excel functions, including VLOOKUP, HLOOKUP, INDEX-MATCH, and complex nested formulas, ensuring accurate data analysis.

Proficient in Excel Power Query and Power Pivot for seamless integration and manipulation of diverse datasets, enabling comprehensive data analysis.

Experienced in data visualization and reporting with QlikView and Qlik Sense.

Strong experience in extracting, blending, and joining various data sources to implement and optimize QVD files and load scripts in QlikView.

Strong expertise in data extraction, transformation and loading processes using Tableau Prep and Power BI tools.

Skilled in utilizing Tableau's mapping capabilities to visually represent geographical data, enabling geographic insights and trend analysis.

Proficient in creating compelling visualizations with Jupyter Notebooks and Matplotlib/Seaborn, translating complex data into clear, actionable insights for stakeholders.

Proficient in generating custom reports and dashboards in Google Analytics to extract actionable insights for business stakeholders.

TECHNICAL SKILLS:

Programming Languages

SAS, Python, R, SQL, Scala

Data Modeling and Database Design

Data models, database design, data mining, segmentation techniques, data analysis, data profiling, data cleansing, data quality, data migration, data integration

Database Technologies

PostgreSQL, MySQL, Oracle, Teradata, Hadoop (HDFS, MapReduce, Hive, Pig, YARN)

Clouds

AWS, Azure

Big Data Technologies

Azure HDInsight, Hadoop ecosystem tools (HDFS, MapReduce, Hive, Pig, YARN), Spark, AWS EMR, Delta Lake, Delta Tables

Data Visualization Tools

Tableau, Power BI, QlikView, Qlik Sense, Matplotlib, Seaborn, Plotly, Dash, ggplot2, SAS Visual Analytics, Power Query, Power Pivot

Web Scraping

Beautiful Soup, Scrapy

Documentation

Jupyter Notebooks, SAS Business Intelligence tools, Google Analytics reports, MySQL documentation, QlikView documentation

Data Quality

Data quality checks, validation processes, quality assurance measures, performance tuning for databases

PROFESSIONAL EXPERIENCE:

Client: AgFirst, Columbia, SC. Feb 2021 – Present

Role: Sr. Data Analyst

Responsibilities:

Developed and managed Azure-based big data and analytics solutions using technologies such as Azure HDInsight, Data Lake Storage and Stream Analytics.

Handled data ingestion into various Azure services such as Azure Data Lake Storage, Azure SQL, and Azure Data Warehouse, followed by data processing in Azure Databricks.

Implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, SQL DW, Databricks, and NoSQL DB).

Built SSIS packages to extract data from different sources such as SQL Server, MS Excel, and MS Access, transform it, and load it into dimension and fact tables in the data warehouse.

Created visually appealing and insightful reports using Excel charts and graphs to effectively communicate complex data trends to stakeholders.

Implemented and optimized data processing workflows on Hadoop and Spark clusters, ensuring efficient data ingestion, storage, and retrieval on the Hadoop (HDP) platform.

Developed and maintained real-time data processing pipelines using Hadoop ecosystem tools, ensuring the timely availability of insights from streaming data sources.

Implemented Tableau data-driven alerts and notifications to proactively address critical business changes.

Performed data analysis using visualization tools such as Tableau, Spotfire, and SharePoint to provide insights into the data.

Utilized Python libraries such as Beautiful Soup and Scrapy for web scraping, enabling the collection of relevant data from online sources.
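
A minimal scraping sketch with requests and Beautiful Soup; the URL and CSS selectors are illustrative placeholders:

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical listing page; swap in a real target URL.
    resp = requests.get("https://example.com/products", timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # Collect (name, price) pairs from assumed div.product markup.
    rows = []
    for item in soup.select("div.product"):
        name = item.select_one("h2.title")
        price = item.select_one("span.price")
        if name and price:
            rows.append((name.get_text(strip=True), price.get_text(strip=True)))

    print(rows)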

Managed and queried databases using Python (SQLAlchemy) and R (dbplyr), ensuring seamless integration of data from various sources for comprehensive analysis.
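
A sketch of parameterized querying through SQLAlchemy into pandas; the connection URL, the orders table, and the status filter are hypothetical:

    import pandas as pd
    from sqlalchemy import create_engine, text

    # Placeholder connection string.
    engine = create_engine("postgresql+psycopg2://analyst:secret@localhost/analytics")

    with engine.connect() as conn:
        df = pd.read_sql(
            text("SELECT * FROM orders WHERE status = :status"),
            conn,
            params={"status": "shipped"},  # bound parameter, not string formatting
        )

    print(df.shape)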

Implemented enhanced e-commerce tracking in Google Analytics, providing comprehensive insights into customer behavior, product performance, and revenue attribution.

Conducted performance tuning activities, such as index optimization and query profiling, to enhance the overall efficiency of MySQL databases for seamless data analysis.

Developed strategies for database scalability, anticipating future data growth and ensuring MySQL databases can seamlessly handle increased workloads as the organization expands.

Automated routine database tasks, including backups, updates, and monitoring, to ensure the reliability and availability of PostgreSQL databases.

Utilized QlikView’s associative data model to perform ad hoc analysis, enabling quick exploration and discovery of patterns, trends, and outliers in the data.

Created interactive and visually appealing data visualizations using Matplotlib and Seaborn in Jupyter Notebook to communicate insights effectively.
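
A small Matplotlib/Seaborn sketch of this kind of notebook visualization, using seaborn's bundled "tips" demo dataset rather than any project data:

    import matplotlib.pyplot as plt
    import seaborn as sns

    tips = sns.load_dataset("tips")  # demo data shipped with seaborn

    fig, ax = plt.subplots(figsize=(8, 5))
    sns.boxplot(data=tips, x="day", y="total_bill", ax=ax)
    ax.set_title("Total bill by day")
    ax.set_ylabel("Total bill (USD)")
    plt.tight_layout()
    plt.show()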

Utilized Jira as the primary project management tool, creating and managing tasks, user stories, and epics to track progress and ensure timely delivery of data engineering projects.

Actively engaged in sprint planning meetings, collaborating with the Agile team to select user stories and define tasks for the upcoming sprint.

Environment: Azure (HDInsight, Data Lake, Stream Analytics, Data Warehouse, Data Factory, Data Lake Analytics, Databricks, NoSQL DB), SQL, Excel, Hadoop, Spark, Tableau, Spotfire, SAS, Python, Beautiful Soup, Scrapy, SQLAlchemy, R, Google Analytics, MySQL, PostgreSQL, QlikView, Matplotlib, Seaborn, Jupyter Notebook.

Client: Chewy, Dania Beach, FL. Mar 2019 – Jan 2021

Role: Sr. Data Analyst

Responsibilities:

Utilized AWS Glue and Amazon EMR (Elastic MapReduce) for Extract, Transform, Load (ETL) processes, optimizing data transformation and preparation.

Used Amazon CloudWatch, AWS CloudTrail, and AWS X-Ray to ensure the reliability and performance of data pipelines.

Used Amazon Redshift, Redshift Spectrum, S3, and Athena to query large amounts of data stored in S3, creating a virtual AWS data lake without a separate ETL process.
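
A boto3 sketch of querying S3-resident data through Athena; the region, database, table, and results bucket are placeholders:

    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Submit the query; Athena writes results to the given S3 location.
    qid = athena.start_query_execution(
        QueryString="SELECT campaign, COUNT(*) AS n FROM events GROUP BY campaign",
        QueryExecutionContext={"Database": "marketing"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
            print([col.get("VarCharValue") for col in row["Data"]])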

Developed automated processes using Excel macros and VBA to streamline repetitive tasks, increasing efficiency in data handling.

Utilized Excel for data forecasting and trend analysis, providing valuable insights for future planning and resource allocation.

Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.

Generated financial business reports using SAS Business Intelligence (SAS/BI) tools and developed ad hoc reports using SAS Enterprise Guide.

Developed visualizations using Python's Matplotlib and Seaborn to effectively communicate complex data patterns to stakeholders.

Prepared comprehensive documentation for analysis methodologies, results, and insights using both Python and R, facilitating knowledge sharing and reproducibility.

Assessed the security requirements for Hadoop and integrated it with Kerberos authentication and authorization infrastructure.

Optimized Hadoop cluster performance by configuring resource allocation, partitioning data, and employing compression techniques to enhance query speed and efficiency.

Utilized Google Analytics to perform cohort analysis, uncovering long-term user trends and informing product development and marketing strategies.

Developed complex SQL queries to extract, manipulate, and analyze data from relational databases.

Created and managed user accounts and permissions to control access to MySQL databases.

Created custom functions and stored procedures in PostgreSQL to encapsulate complex logic, promoting code reusability and enhancing maintainability.

Implemented data storytelling techniques in QlikView to enhance the communication of insights to non-technical stakeholders.

Implemented and maintained data governance standards within QlikView, ensuring data integrity, security, and compliance with organizational policies.

Utilized Microsoft Word to create comprehensive technical documentation, including data engineering specifications, system architecture diagrams, and process flowcharts.

Leveraged Microsoft Excel for data analysis, using functions, formulas, and pivot tables to extract insights from datasets.

Created compelling data visualizations using SAS tools, such as SAS Visual Analytics, to effectively communicate complex trends and patterns, making data more accessible to non-technical stakeholders.

Utilized Jupyter Notebook for feature engineering, transforming raw data into meaningful variables for machine learning models.
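
A pandas feature-engineering sketch of this kind of work; the columns and derived variables are illustrative:

    import pandas as pd

    df = pd.DataFrame({
        "signup_date": pd.to_datetime(["2021-01-05", "2021-03-20", "2021-06-30"]),
        "last_order":  pd.to_datetime(["2021-02-01", "2021-07-15", "2021-07-01"]),
        "total_spend": [120.0, 340.5, 89.9],
        "n_orders":    [3, 8, 1],
    })

    # Derive model-ready variables from the raw fields.
    df["tenure_days"] = (df["last_order"] - df["signup_date"]).dt.days
    df["avg_order_value"] = df["total_spend"] / df["n_orders"]
    df["high_value"] = (df["total_spend"] > 200).astype(int)

    print(df)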

Environment: AWS (Glue, EMR, CloudWatch, CloudTrail, X-Ray, Redshift, S3, Athena), Excel, VBA, SQL, Oracle, Teradata, SAS, Python, Matplotlib, Seaborn, R, Hadoop, Kerberos, Tableau, Google Analytics, MySQL, PostgreSQL, QlikView, Jupyter Notebook, Machine Learning.

Client: Thomson Reuters, Chennai, India. Aug 2017 – Jan 2019

Role: Data Analyst

Responsibilities:

Utilized advanced data partitioning and indexing techniques in Azure Data Factory and Azure Synapse Analytics to improve query performance and reduce latency.

Implemented a Lakehouse architecture on Azure using Azure Data Lake, Delta Lake, Delta Tables, and Azure Databricks.

Worked with Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (SQL DW).

Utilized statistical functions in Excel to conduct hypothesis testing, regression analysis, and other statistical methods for robust data exploration.

Integrated diverse data sources seamlessly in Excel, ensuring a comprehensive view for analysis and supporting holistic decision-making processes.

Implemented numerous metadata-driven macros used across multiple applications and saved them in a permanent SAS catalog for reuse.

Created interactive dashboards and reports using Python libraries like Plotly and Dash to provide real-time insights to business stakeholders.
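
A minimal Plotly/Dash dashboard sketch; it uses Plotly's bundled gapminder demo data, not business data:

    import plotly.express as px
    from dash import Dash, dcc, html

    df = px.data.gapminder().query("year == 2007")
    fig = px.scatter(
        df, x="gdpPercap", y="lifeExp", size="pop", color="continent",
        log_x=True, title="Life expectancy vs. GDP per capita (2007)",
    )

    app = Dash(__name__)
    app.layout = html.Div([
        html.H2("Example dashboard"),
        dcc.Graph(figure=fig),
    ])

    if __name__ == "__main__":
        app.run(debug=True)  # older Dash releases use app.run_server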

Analyzed the Hadoop cluster with various big data analytics tools, including Pig, the HBase database, and Sqoop.

Implemented data quality checks and validation processes within Hadoop, ensuring the accuracy and reliability of analytical results by identifying and rectifying data anomalies.

Utilized Python in conjunction with Docker for containerization, facilitating the deployment and scaling of data analysis applications.

Implemented Python scripts to enforce data governance frameworks, ensuring consistency, integrity, and compliance with organizational policies.
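
A sketch of scripted governance checks in Python; the rules and column names are illustrative, not an actual policy:

    import pandas as pd

    def run_checks(df: pd.DataFrame) -> list:
        """Return a list of rule violations found in the frame."""
        problems = []
        if df["customer_id"].isna().any():
            problems.append("customer_id contains nulls")
        if df["customer_id"].duplicated().any():
            problems.append("customer_id is not unique")
        if (df["amount"] < 0).any():
            problems.append("amount contains negative values")
        return problems

    sample = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [10.0, -5.0, 7.5]})
    for issue in run_checks(sample):
        print("VIOLATION:", issue)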

Developed and implemented SQL scripts to normalize and reformat client data elements into company standards.

Monitored and optimized SQL queries for MySQL databases to enhance query performance.

Utilized MySQL monitoring tools to proactively identify and address performance bottlenecks, ensuring optimal database performance and responsiveness.

Utilized PostgreSQL partitioning features to enhance data organization and retrieval speed, optimizing large datasets for analytical queries.

Maintained detailed documentation of SAS code, ensuring transparency and replicability of analyses, and adhered to best practices for code management and version control.

Designed visualizations in Tableau and published and presented dashboards and stories on web and desktop platforms.

Created actions, parameters, local and global filters, and calculated sets for preparing dashboards and worksheets using Tableau Desktop.

Created robust data models in QlikView, optimizing structures for efficient data retrieval and analysis and streamlining reporting processes.

Implemented version control and documentation practices to maintain a structured and organized QlikView development environment.

Conducted code reviews and documentation in Jupyter Notebook to enhance team collaboration and knowledge sharing.

Generated customized reports in Google Analytics, extracting actionable insights to guide marketing and business strategies.

Environment: Azure (Data Factory, Synapse Analytics, Data Lake, Delta Lake, Databricks, Blob Storage), Excel, SAS, Python, Plotly, Dash, Docker, Hadoop, Pig, HBase, Sqoop, SQL, MySQL, PostgreSQL, Tableau, QlikView, Jupyter Notebook, Google Analytics.

Client: Merck, Mumbai, India. Jun 2015 – Jul 2017

Role: Data Analyst

Responsibilities:

Documented architecture, configurations, and procedures for AWS S3, Lambda, and AWS EMR implementations.

Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.

Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as AWS S3 and text files into Amazon Redshift.

Developed error-handling mechanisms in Excel formulas and scripts to enhance data accuracy and troubleshoot issues efficiently.

Created custom formulas in Excel to address specific analytical requirements, demonstrating a deep understanding of data manipulation and transformation.

Worked with various components of the Hadoop ecosystem, including Hive, Pig, HBase, and Impala, leveraging a diverse set of tools for specific data processing and analysis requirements.

Created advanced SQL queries, including window functions and CTEs, for complex data analysis tasks.
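
A self-contained illustration of a CTE combined with a window function, run here against an in-memory SQLite database (window functions require SQLite 3.25+); the sales table is made up:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (region TEXT, rep TEXT, amount REAL);
        INSERT INTO sales VALUES
            ('East', 'Ann', 100), ('East', 'Bob', 250),
            ('West', 'Cid', 300), ('West', 'Dee', 150);
    """)

    query = """
    WITH ranked AS (
        SELECT region, rep, amount,
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
        FROM sales
    )
    SELECT region, rep, amount FROM ranked WHERE rnk = 1;
    """
    for row in conn.execute(query):
        print(row)  # top rep per region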

Maintained comprehensive documentation for MySQL databases, including schema designs, configurations, and troubleshooting procedures.

Designed and maintained MySQL databases and created pipelines using user-defined functions and stored procedures for daily reporting tasks.

Utilized PostgreSQL's advanced features like window functions and common table expressions for complex data analysis.

Developed customized Python solutions for specific data challenges, demonstrating adaptability and problem-solving skills in diverse scenarios.

Created compelling visualizations using libraries such as Matplotlib, Seaborn (Python) and ggplot2 (R) to communicate complex data patterns and trends effectively.

Developed comprehensive reports and dashboards using SAS tools, facilitating clear communication of analytical findings to stakeholders and management.

Utilized SAS programming for efficient data transformation tasks, including merging datasets, creating new variables, and reshaping data structures as needed for analysis.

Implemented automated Tableau reporting solutions, reducing manual effort and improving efficiency.

Conducted Tableau user training sessions to foster self-service analytics and empower non-technical users.

Implemented robust data models within QlikView to organize and structure data for optimal performance, facilitating efficient data analysis and reporting.

Implemented data security measures to safeguard sensitive information in QlikView applications.

Environment: AWS (S3, Lambda, EMR, DynamoDB, Glue, Redshift), Excel, Hadoop, Hive, Pig, HBase, Impala, SQL, MySQL, PostgreSQL, Python, Matplotlib, Seaborn, ggplot2, R, SAS, Tableau, QlikView.

EDUCATION: Jawaharlal Nehru Technological University, Hyderabad, TS, India

BTech in Computer Science and Engineering, June 2011 - April 2015

Major in Computer Science


