Data Analyst Python Developer

Location:
Rogers, AR
Posted:
June 09, 2023

Shirleen Jones B

Sr. Data Analyst / Data Engineer / SQL Developer / Python Developer

Email: adxmad@r.postjobfree.com

Phone: +1-417-***-****

* ***** ** ********** ** the IT industry, I have specialized in designing, developing, and delivering business intelligence solutions using programming languages such as R and Python. My expertise includes working with big data technologies like SQL Server Data Tools, Tableau, Spark, and Power BI to create effective solutions for complex business problems.

PROFESSIONAL SUMMARY:

Proficient in using Spark in Python for advanced analytics, with experience in Spark-Hive and Spark-SQL in Databricks for data extraction, transformation, and aggregation to uncover customer usage patterns.
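
For illustration, a minimal PySpark sketch of the Spark-SQL extraction-and-aggregation pattern described above; the usage_events table, columns, and date filter are hypothetical placeholders, not the actual project schema:

    # Hypothetical Databricks/Spark-SQL aggregation to surface usage patterns.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

    events = spark.table("usage_events")  # assumed Hive/Databricks table
    patterns = (
        events
        .filter(F.col("event_date") >= "2023-01-01")
        .groupBy("user_id", "feature")
        .agg(F.count("*").alias("uses"))
        .orderBy(F.desc("uses"))
    )
    patterns.write.mode("overwrite").saveAsTable("usage_pattern_summary")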

Extensive experience in designing, deploying, and managing big data processing and analytics solutions using AWS EMR.

Skilled in developing web services with Python and processing large datasets with Scala and PySpark.

Designed and developed multiple Power BI dashboards and reports while ensuring data privacy and security, and worked on a Mantas POC to bring metadata from various source systems into technical lineage.

Proficient in working with various big data technologies such as Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and Apache HBase within the AWS EMR ecosystem.

Developed real-time data visualizations using Plotly and Dash, which allowed stakeholders to monitor key performance indicators (KPIs) and make data-driven decisions in real time.
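
As a hedged illustration of that pattern, a minimal Plotly Dash sketch of a KPI figure that refreshes on a timer; fetch_kpis() is an invented stand-in for the real data source:

    # Toy live-updating KPI chart; assumes the dash and plotly packages.
    import random
    import dash
    from dash import dcc, html
    from dash.dependencies import Input, Output
    import plotly.graph_objects as go

    app = dash.Dash(__name__)
    app.layout = html.Div([
        dcc.Graph(id="kpi-graph"),
        dcc.Interval(id="tick", interval=5000),  # poll every 5 seconds
    ])

    def fetch_kpis():
        # placeholder for a database or API call
        return [random.uniform(90, 110) for _ in range(10)]

    @app.callback(Output("kpi-graph", "figure"), Input("tick", "n_intervals"))
    def refresh(_):
        return go.Figure(go.Scatter(y=fetch_kpis(), mode="lines+markers"))

    if __name__ == "__main__":
        app.run_server(debug=True)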

Experienced in managing the full lifecycle of Data Science projects, including data collection, cleaning, preparation, validation, mining, and visualization using Tableau and Power BI, and exposure to Big Data topics like Hadoop, HDFS architecture, MapReduce, and PySpark.

Worked with NoSQL databases such as HBase, Cassandra, and MongoDB, including database performance tuning and data modeling.

Extensive experience in administering Oracle databases, including installation, configuration, backup and recovery, performance tuning, and security management.

Proficient in using various XML technologies and tools such as JAXP SAX/DOM, XSLT, XPath, and XQuery for XML document generation, parsing, and formatting.

Strong experience with the Mantas product, well-versed in various Mantas classes (AML, TC, BC) and the end-to-end lifecycle of the product including Ingestion Manager, Behavior detection, and paraphrase detection.

Extensive experience in administering PostgreSQL databases, including installation, configuration, performance tuning, and troubleshooting.

Extensive experience in administering Amazon Aurora databases, including deployment, configuration, and maintenance of Aurora database instances.

Skilled in creating and managing EMR clusters, including cluster configuration, scaling, and monitoring to ensure efficient and reliable processing of large-scale data workloads.

Created custom dashboards using Plotly to provide insights into complex data sets, including real-time financial data and machine learning models.

Proficient in developing complex PL/SQL scripts, stored procedures, functions, and triggers to support data processing, data manipulation, and business logic implementation.

Experienced in creating data quality scripts using SQL and Hive to validate the successful data load and quality of data.

Demonstrated ability to develop and optimize data processing and analytics workflows on AWS EMR, leveraging distributed computing capabilities to derive insights from large and complex datasets.

Extensive experience with regular expressions and core Python features such as lambda, map, and reduce; implemented logging using the Python logging library and profiling using cProfile.
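
A small self-contained sketch of those core-Python patterns (a regular expression, lambda with map and reduce, the logging library, and cProfile); the log format is invented for the example:

    import re
    import logging
    import cProfile
    from functools import reduce

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger(__name__)

    def total_error_codes(lines):
        # regex-extract numeric codes, then map/reduce them to a sum
        matches = filter(None, (re.search(r"ERROR (\d+)", ln) for ln in lines))
        codes = map(lambda m: int(m.group(1)), matches)
        return reduce(lambda a, b: a + b, codes, 0)

    sample = ["ok", "ERROR 500 upstream", "ERROR 404 missing"]
    log.info("error-code sum: %d", total_error_codes(sample))
    cProfile.run("total_error_codes(sample * 10000)")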

Worked with RESTful APIs and developed APIs using Flask and Django frameworks in Python.
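
A minimal Flask sketch of a RESTful endpoint of the kind described; the /orders resource and in-memory store are illustrative only, not a production API:

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    ORDERS = {1: {"id": 1, "status": "shipped"}}  # toy in-memory store

    @app.route("/orders/<int:order_id>", methods=["GET"])
    def get_order(order_id):
        order = ORDERS.get(order_id)
        return (jsonify(order), 200) if order else (jsonify(error="not found"), 404)

    @app.route("/orders", methods=["POST"])
    def create_order():
        order = request.get_json()
        order["id"] = max(ORDERS) + 1
        ORDERS[order["id"]] = order
        return jsonify(order), 201

    if __name__ == "__main__":
        app.run()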

Familiar with Agile methodology and worked in Agile teams, including participating in daily stand-up meetings, sprint planning, and retrospectives.

Knowledgeable about implementing high availability solutions in Oracle, such as Oracle Data Guard and Oracle Real Application Clusters (RAC), ensuring data redundancy, disaster recovery, and minimal downtime.

Experienced in designing efficient and scalable data models in PostgreSQL, including normalization, indexing strategies, and partitioning, to optimize database performance.

Knowledgeable in data warehouse models such as Star Schema and Snowflake schema, responsible for managing operations and support for Big Data Analytics platforms and Power BI visualizations.

Experienced in building business rules and creating ad-hoc reports using Teradata and SQL, as well as generating Ad Hoc reports with Oracle, Teradata, and UNIX.

Proficient in data migration testing and web interface testing, and automating data quality alerts using Databricks, Python, and HTML.

Proficient in using EMR along with tools like Apache Spark and Apache Hive to perform data transformation, extraction, and loading (ETL) tasks on large datasets, ensuring data quality and consistency.

Skilled in optimizing Oracle database performance by analyzing query execution plans, identifying bottlenecks, and implementing appropriate indexing, partitioning, and caching strategies.

Designed and implemented real-time streaming data visualizations using Plotly, allowing stakeholders to monitor live data feeds and react to changes in real-time.

Conducted load and performance testing on RESTful and SOAP services.

Proficient in working with various data formats such as JSON and XML for RESTful web services and XML for SOAP web services.

Experienced in the entire data science project life cycle, utilizing R, SAS Enterprise suite, Python, and Scala.

Experienced in designing and implementing backup and recovery strategies for Oracle MySQL databases, including full and incremental backups, point-in-time recovery, and disaster recovery planning.

Skilled in using Python libraries including NumPy, SciPy, matplotlib, python-twitter, and urllib2, as well as MySQL connectors for database connectivity.

Expertise in designing and implementing backup and recovery strategies using Oracle Recovery Manager (RMAN), ensuring data integrity and availability in case of system failures or data corruption.

Proficient in migrating SQL databases to Azure Data Lake and controlling database access, as well as migrating on-premises databases to Azure Data Lake store using Azure Data Factory.

Skilled in data visualization using Tableau and administering IBM MQ Series for asynchronous message-based communication between distributed systems.

Knowledgeable about implementing security measures for PostgreSQL databases, including user management, access controls, encryption, and auditing, to protect data and ensure compliance with regulatory requirements.

Skilled in planning and executing database upgrades, applying patches and fixes, and ensuring compatibility and stability of Oracle MySQL databases.

Familiarity with integrating AWS EMR with other AWS services such as Amazon S3, Amazon Redshift, and AWS Glue for seamless data ingestion, storage, and analysis.

Experienced in Agile methodologies, Scrum stories, and sprints in a Python-based environment, as well as conducting data analytics, data wrangling, and Excel data extracts.

Proficient in designing and implementing backup and recovery strategies for PostgreSQL databases, including full and incremental backups, point-in-time recovery, and disaster recovery planning.

Experienced in maintaining the Hadoop cluster on AWS EMR and implementing the Mantas 8.1 upgrade.

Reviewed the XML logical data model, developed an XML Schema (XSD) to validate the model, and used JAXB for XML-Java mapping and XML conversion.

Experience in automating EMR workflows using AWS services like AWS Step Functions, AWS Lambda, or Apache Airflow to streamline data processing pipelines and improve operational efficiency.
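
As one hedged example of such automation, a boto3 sketch that submits a Spark step to a running EMR cluster; the cluster ID, script path, and region are placeholders:

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    def submit_spark_step(cluster_id, script_s3_path):
        # add_job_flow_steps queues a step on an existing cluster
        resp = emr.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[{
                "Name": "nightly-etl",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_s3_path],
                },
            }],
        )
        return resp["StepIds"][0]

    # e.g. submit_spark_step("j-XXXXXXXXXXXXX", "s3://my-bucket/etl.py")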

EDUCATION: Master's in Science and Health Informatics, The University of Findlay, Findlay, USA.

TECHNICAL SKILLS:

Programming Languages: Python, SQL, SAS, HTML, XML, UNIX shell scripting, Java, R

BI and Big Data Tools: Spark, Hadoop MapReduce, Hive, Sqoop, Apache Storm, Apache Flink, Apache Beam, Apache Apex

RDBMS: Microsoft SQL Server, MySQL, Oracle, MS Access, PostgreSQL

Reporting Tools: Microsoft Power BI, Tableau, QlikView, SAP BusinessObjects

Operating Systems: Windows, Linux, Unix

Web Services: Apache, RESTful, SOAP

Databases: Oracle, Teradata, PostgreSQL, SQL, OLAP, Snowflake, MongoDB, Cassandra

Cloud Technologies: Azure, AWS, Google Cloud Platform, IBM Cloud

PROFESSIONAL EXPERIENCE

Walgreens May 2022 - March 2023

Sr. Data Analyst / SQL Developer / Data Engineer / Python Developer Chicago, IL.

Responsibilities:

Experienced in multiple Python libraries for data analytics and development, including NumPy, SciPy, Matplotlib, Beautiful Soup, and pandas.

Created and configured high concurrency Spark clusters using Azure Databricks for faster data preparation.

Deployed and managed applications on IBM WebSphere Commerce platform for e-commerce websites and applications.

Knowledgeable in Agile methodologies, Scrum stories, and sprints for Python-based development, data analytics, data wrangling, and Excel data extracts.

Migrated an existing application to AWS, using EC2 and S3 for processing and storage of small data sets, and maintained the Hadoop cluster on AWS EMR.

Built and maintained Snowflake data warehouses and data marts, integrating them with ETL tools like Talend and Apache Airflow for improved data accessibility and accuracy.

Developed web services using Windows Communication Foundation and deployed them as a Cloud Service on Microsoft Azure.

Experience in migrating data from other database platforms to Oracle MySQL, including schema conversion, data transfer, and compatibility testing, while ensuring data integrity and minimal downtime.

Building JavaScript functions to generate dynamic HTML components such as dynamic table lists, calendars, spreadsheets, and drop-down menus.

Implementing the UI tier with JSP, Servlets, JSF, XML, XML Schema, CSS, JavaScript, and AJAX.

Developing Unix shell scripts to copy, remove, and deploy Java applications to the application server.

Collaborated with cross-functional teams to develop custom Plotly dashboards and visualizations, providing real-time insights into business metrics and driving data-driven decision making.

Extensive knowledge of Azure Cloud environment, including implementing Azure tools like Data Factory.

Familiarity with PostGIS, the spatial extension for PostgreSQL, and its usage for storing and querying geospatial data, including spatial indexing, proximity analysis, and geospatial queries.

Demonstrated ability to optimize data processing workloads on AWS EMR by leveraging features such as instance types, spot instances, and auto-scaling, resulting in improved performance and reduced cost.

Proficient in Tableau desktop and server, managing sites and projects on the server.

Developed RESTful web services using Java and the Spring framework for seamless data exchange between applications.

Experience in ETL processing using SSIS for designing, testing, and deploying packages, and building data integration, workflow, and ETL solutions using SQL Server Integration Service (SSIS).

Trained end-users on Mantas 8.1 features and provided ongoing support.

Proficient in integrating AWS EMR with data visualization and reporting tools such as Amazon QuickSight or Tableau to create interactive dashboards and reports for data analysis and business intelligence purposes.

Proficient in scaling and optimizing Amazon Aurora database clusters for high performance and scalability, leveraging features such as read replicas, automatic scaling, and query optimization.

Skilled in planning and executing database upgrades, applying patches and fixes, and ensuring compatibility and stability of PostgreSQL databases.

Experienced in data visualization using various tools such as Tableau to produce tables, graphs, and listings.

Worked with senior architects and data modelers to plan and design the Actimize model and its underlying data structure.

Utilized Apache Spark with Python to develop and execute Big Data analytics and Machine learning applications.

Skilled in monitoring and troubleshooting AWS EMR clusters using tools like Amazon CloudWatch, AWS CloudTrail, and Apache Ambari to identify and resolve performance bottlenecks, errors, or failures.

Experienced in monitoring and optimizing PostgreSQL database performance using tools like pg_stat_monitor, EXPLAIN ANALYZE, and configuration tuning for memory, disk I/O, and parallelism.

Skilled in implementing high availability solutions for Oracle MySQL databases, such as replication, clustering, and failover mechanisms, to ensure continuous availability of critical systems.

Conducted user testing and gathered feedback to continuously improve the design and functionality of real-time data visualizations built with Plotly.

Experience in scripting and automating AWS EMR tasks using technologies such as Apache Spark, AWS CLI, or AWS SDKs (Python, Java, or Ruby), enabling efficient management and orchestration of EMR clusters.

Proficient in automating database administration tasks using scripting languages like Bash or Python, including routine maintenance, monitoring, and reporting, to improve efficiency and reduce manual effort.
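
An illustrative maintenance script of that kind, assuming psycopg2 and a reachable PostgreSQL instance; the DSN and the size-report task are placeholders for the real routines:

    import psycopg2

    def report_largest_tables(dsn="dbname=app user=admin"):
        # print the ten largest user tables, a typical monitoring routine
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute("""
                SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
                FROM pg_class
                WHERE relkind = 'r'
                  AND relnamespace = 'public'::regnamespace
                ORDER BY pg_total_relation_size(oid) DESC
                LIMIT 10;
            """)
            for name, size in cur.fetchall():
                print(f"{name}: {size}")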

Demonstrated commitment to continuous learning and staying updated with the latest advancements in big data technologies, AWS services, and best practices related to AWS EMR.

Developed a web application using Java with the Google Web Toolkit API with PostgreSQL.

Environment: Tableau, Python, SQL, Excel, Oracle, SSIS, ETL, Jupyter, Big Data, SAS, Java, XML, Apache, Spark, PL/SQL, AWS, Teradata, Hive, Sqoop, MLlib.

Verisk July 2020 - November 2021

Oracle PL/SQL Developer/Data Analyst/Python Developer Jersey City, NJ

Responsibilities:

Developed various machine learning algorithms using pandas, NumPy, seaborn, SciPy, and matplotlib in Python.
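
A toy sketch of that stack (pandas, SciPy, matplotlib) fitting and plotting a simple linear regression; the CSV path and column names are hypothetical:

    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy import stats

    df = pd.read_csv("claims.csv")  # hypothetical input file
    fit = stats.linregress(df["exposure"], df["losses"])
    print(f"slope={fit.slope:.3f}, r^2={fit.rvalue ** 2:.3f}")

    plt.scatter(df["exposure"], df["losses"], s=8)
    plt.plot(df["exposure"], fit.intercept + fit.slope * df["exposure"], "r")
    plt.xlabel("exposure")
    plt.ylabel("losses")
    plt.savefig("fit.png")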

Performed data analysis on an Oracle database using SQL and imported data from SQL Server and Excel into SAS datasets.

Experience in implementing real-time analytics solutions on AWS EMR, integrating streaming data sources such as Apache Kafka or Amazon Kinesis with Spark Streaming or Flink for near real-time data processing and analysis.

Deployed and maintained Django projects on cloud platforms like AWS, Azure, or Google Cloud Platform.

Built a Global Credit Underwriting solution on the Actimize platform to support new merchant applications as well as existing merchants with additional outlets.

Utilized Plotly's APIs and libraries to build custom visualizations and dashboards, allowing stakeholders to monitor real-time data from multiple sources in a single view.

Proficient in orchestrating end-to-end data pipelines using AWS EMR along with other AWS services like AWS Glue, AWS Data Pipeline, or Apache Airflow, ensuring smooth data flow and processing across various stages.

Designed and developed applications using Apache Spark, Scala, and Python to format, cleanse, validate, and create the schema.

Wrote various scripts to query several networking devices and databases to gather information and stored that data in MySQL and NoSQL databases.

Developed custom middleware components to handle cross-cutting concerns such as logging, auditing, and error handling for RESTful and SOAP services.

Knowledgeable about implementing security measures on AWS EMR, including encryption at rest and in transit, IAM roles, VPC configurations, and network access controls, to protect data and ensure compliance.

Extensive experience in administering PostgreSQL databases, including installation, configuration, performance tuning, and troubleshooting.

Implemented real-time data visualizations for IoT devices using Plotly, enabling users to monitor and control devices in real-time.

Used SAS ODS to produce HTML, PDF, and RTF format files. Developed models for several application systems, tested and implemented databases, and supported the ETL process.

Implemented layered architecture for Hadoop to modularize design. Developed framework scripts to enable quick development. Designed reusable shell scripts for Hive, Sqoop, Flink, and PIG jobs. Standardized error handling, logging, and metadata management processes.

Implemented advanced procedures like text analytics and processing using in-memory computing capabilities like Apache Spark written in Python.

Proficient in writing complex SQL queries and optimizing query performance by analyzing query plans, indexing strategies, and database statistics in PostgreSQL.

Tested complex SQL scripts for Teradata databases for creating a BI layer on the DW for Tableau reporting.

Integrated third-party APIs and services into RESTful and SOAP services, including authentication and authorization providers, SMS gateways, and email delivery services.

Experienced with Integration Services (SSIS), Reporting Services (SSRS), and Analysis Services (SSAS).

Environment: Oracle, Excel, SQL, SAS, NoSQL, Hadoop, UNIX, Python, SciPy, NumPy, Apache, Tableau, Power BI, Azure, AWS, Hive, Flink, SSRS, SSIS, OLAP.

Medicover Hospitals Jan 2019 - June 2020

Data Analyst / SQL Developer / Python Developer / Data Engineer Hyderabad, India

Responsibilities:

Created end-to-end Data Analytics and Automation systems, incorporating custom visualization tools utilizing Python, R, Tableau, and Power BI.

Analyzed logs and utilized various Python libraries to predict and forecast the next occurrence of an event.

Developed PySpark applications to extract data from multiple vendor sources to the Data Lake using Sqoop, performed transformations on raw data using Spark SQL, built data extracts, and stored them in Hive tables.
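
A rough PySpark sketch of that raw-to-Hive flow; the lake path, columns, and target table are placeholders for the vendor feeds landed via Sqoop:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("vendor-ingest")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.parquet("/data/lake/raw/vendor_feed/")
    raw.createOrReplaceTempView("vendor_raw")

    # Spark SQL transformation over the raw view
    clean = spark.sql("""
        SELECT vendor_id,
               CAST(txn_ts AS timestamp) AS txn_ts,
               UPPER(TRIM(status))       AS status
        FROM vendor_raw
        WHERE vendor_id IS NOT NULL
    """)
    clean.write.mode("append").format("hive").saveAsTable("curated.vendor_txns")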

Expertise in MapReduce, Apache Crunch, Hive, Pig, and Splunk for analyzing data and writing Hadoop Jobs.

Utilized Python packages & Spark SQL for initial data visualizations and exploratory data analysis.

Ingested data to Azure Services such as Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, and processed the data in Azure Databricks.

Analyzed metric dashboard reports, identified formulas, and functionality, and digitized the metric dashboards to the Tableau application.

Published dashboard reports to Tableau Server and scheduled them on a weekly basis.

Strong documentation skills, including creating technical documentation, runbooks, and best practices guides, as well as conducting knowledge-sharing sessions and training to enable effective utilization of AWS EMR within the organization.

Experience in data transformation, data mapping, and data cleansing procedures using SQL queries, regular-expression string manipulations, and data frames.
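
An illustrative cleansing pass combining regular expressions and pandas data frames; the columns and sample values are invented for the example:

    import pandas as pd

    df = pd.DataFrame({"phone": [" (417) 555-0101", "417.555.0102x9"],
                       "name": ["  DOE, jane ", "smith, al"]})

    df["phone"] = (df["phone"]
                   .str.replace(r"[^\d]", "", regex=True)  # keep digits only
                   .str.slice(0, 10))                      # drop extensions
    df["name"] = df["name"].str.strip().str.title()
    assert df["phone"].str.match(r"^\d{10}$").all()
    print(df)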

Experienced in ETL processes using Python, C#, and JavaScript.

Utilized EMR (Elastic MapReduce) to perform big data operations in AWS.

Experience in deploying Django applications using web servers such as Apache or Nginx.

Planned dashboards with Tableau 9.2 and produced complex reports including synopses, outlines, and diagrams to present findings to teams and stakeholders, also used Power BI.

Participated in all phases of data mining; data collection, cleaning, developing models, validation, visualization, and performed Gap analysis.

Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, and Kafka.

Project experience in Data mining, segmentation analysis, business forecasting, and association rule mining.

Environment: Tableau, Python, SQL, Excel, Oracle, SSIS, ETL, Jupyter, Big Data, SAS, Apache, Spark, PL/SQL, AWS, Teradata, Hive, Sqoop, MLlib, Power BI, MySQL, Django.

Micronet Solutions Feb 2016 - Dec 2018

Big Data Analyst/SQL Developer Hyderabad, India

Responsibilities:

Working in the Advanced Operational Analytics and Big Data Analysis team.

Creating reports using Power BI, Tableau, and SAP BusinessObjects (Crystal Reports, Web Intelligence).

Expertise in data profiling and designing Excel pivot tables and VLOOKUP formulas to analyze large data sets.

Creating Proof of concepts for different users in the Tableau server.

Designing and maintaining databases using Python and developing a Python-based RESTful API using Flask, SQLAlchemy, and PostgreSQL.
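
One plausible shape for such an API, sketched with the Flask-SQLAlchemy extension (an assumption; the bullet names Flask, SQLAlchemy, and PostgreSQL); the connection string and Item model are placeholders:

    from flask import Flask, jsonify
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@localhost/app"
    db = SQLAlchemy(app)

    class Item(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(80), nullable=False)

    @app.route("/items/<int:item_id>")
    def get_item(item_id):
        item = Item.query.get_or_404(item_id)
        return jsonify(id=item.id, name=item.name)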

Writing, testing, and implementing triggers, stored procedures, and functions at the Database level using PL/SQL.

Utilizing analytical applications like R, SPSS, Rattle, and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into risk management and marketing strategies that drive value.

Proficient in programming/query languages such as R and Python (NumPy, pandas, scikit-learn, seaborn, matplotlib, NLTK), NoSQL, PySpark, PySpark SQL, SQL, RStudio, and Jupyter Notebooks.

Troubleshooting and resolving complex issues with data pipelines, working with the Snowflake support team to identify and resolve root causes.

Installing and configuring Apache Hadoop, Hive, and Pig environments.

Extracting data from Teradata using Informatica PowerCenter ETL and DTS packages to the target database, including SQL Server, and using the data for reporting purposes.

Developing ETL pipelines to extract data from various source systems and load it into Snowflake, using Python and SQL to ensure data accuracy and completeness.
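
A minimal load-step sketch using the Snowflake Python connector; the account, credentials, file path, and table names are placeholders, not project values:

    import snowflake.connector

    conn = snowflake.connector.connect(
        user="ETL_USER", password="...", account="my_account",
        warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
    )
    cur = conn.cursor()
    # stage the extract on the table's internal stage, then COPY it in
    cur.execute("PUT file:///tmp/extract.csv @%ORDERS_RAW")
    cur.execute("COPY INTO ORDERS_RAW FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
    cur.execute("SELECT COUNT(*) FROM ORDERS_RAW")
    print("rows loaded:", cur.fetchone()[0])
    conn.close()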

Developing loading scripts for the Teradata database using the FastLoad utility.

Using Teradata SQL Assistant and Nexus client tools to access Teradata database, TOAD and SQL Navigator client tools for Oracle database, and SQL Query analyzer for SQL Server 2005/2008 database.

Assisting client's IT department in supporting different tasks such as data migration from Maximo 5.2 to Maximo 7.5 SQL Server DB, including the creation of a clustered WebSphere environment to support multiple nodes.

Experienced with AWS and Azure services to smoothly manage applications in the cloud and create or modify instances.

Knowledgeable about big data technologies, Hadoop, Hive, MapReduce, NoSQL, JavaScript, Linux, HTML/CSS, AWS, and the Salesforce Customer Relationship Management (CRM) platform.

Environment: Tableau, Python, SQL, Excel, Oracle, SSIS, ETL, Jupyter, Big Data, SAS, Apache, Spark, PL/SQL, AWS, Teradata, Hive, Sqoop, MLlib.


