Data Analyst Engineer

Location: Memphis, TN

Posted: November 13, 2024


Resume:

MANASA REDDY SEELAM

Collierville, TN 901-***-**** **********@*****.*** https://www.linkedin.com/in/manasa-s-43a1a12ba/

SUMMARY

• Data Engineer with over 3 years of experience developing and optimizing data architectures, with strong proficiency in Python for advanced data manipulation, analysis, and engineering applications.

• Advanced SQL skills, including the Microsoft SQL Server toolset (SSIS, SSMS, SSRS), for designing robust database schemas, optimizing queries, and managing ETL processes. Experienced in building insightful reports with SSRS and exploring data with SAS.

• Strong experience supporting Hadoop platforms, including Cloudera, Amazon EMR, Azure HDInsight, and Hortonworks, with advanced proficiency in MapReduce and Hive for big data processing.

• Proficient in cloud technologies, particularly AWS services such as EC2, S3, ELB, CloudWatch, RDS, DynamoDB, and Amazon Athena, as well as cloud data warehousing and migration.

• Expert in data manipulation and analysis using libraries such as NumPy, Pandas, Scikit-Learn, SciPy, TensorFlow, and OpenCV, and adept in tools like SQL Server, MySQL, T-SQL, Shell Scripting, and MongoDB.

• Skilled in data visualization and reporting with Power BI, Tableau, SharePoint, and MS Office, with a demonstrated ability to improve operational efficiency.

• Designed and implemented data pipelines for efficient data flow and processing, and analyzed key business KPIs using Hive, Pig, and Spark to support informed decision-making.

• Collaborative team player with a track record of identifying and addressing business challenges through innovative data solutions, boosting operational efficiency by 20%.

EDUCATION

Christian Brothers University, Memphis, TN Jun 2022 – Dec 2023

Master of Science in Computer Information Systems

Veltech University Jun 2016 – May 2020

Bachelor of Technology in Computer Information Systems

TECHNICAL SKILLS

Programming Languages: Python, SQL, Shell Scripting, R, T-SQL, Scala

Python Libraries: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow, Beautiful Soup

Databases: MySQL, PostgreSQL, Snowflake, Oracle, DB2, Hive, Redshift, Azure SQL DW, Databricks

Big Data Technologies: YARN, MapReduce, Pig, Hive, HBase, Cassandra, Oozie, Apache Spark, Scala, Impala, Kafka

NoSQL Databases: Cassandra, MongoDB

Reporting/ETL Tools: Informatica, Talend, SSIS, SSRS, SSAS, ER Studio, DBT, Tableau, Power BI, Grafana

Methodologies: Agile/Scrum, Waterfall

Data Visualization and Reporting: Tableau, Power BI, Looker, QlikView, Google Data Studio

Data Quality and Governance: Apache Atlas, Collibra, Informatica Data Quality, Talend Data Quality, Trifacta

CI/CD and Version Control: Bricks, Kafka, Jenkins, Git, GitLab, Bitbucket

PROFESSIONAL EXPERIENCE

BCBS, Data Engineer TN, USA Feb 2024 – Present

• Crafted Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation, resulting in a 25% improvement in data processing speed and a 99% data accuracy rate across healthcare datasets.

• Used Azure cloud services, including HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, and SQL DWH, to streamline cloud-based data storage and processing and improve system reliability by 30%.

• Managed third-party data sources and applied transformations using PySpark and custom-built packages, ensuring 95% data consistency and reducing computational costs by 20%.

• Engineered complex MapReduce and Spark jobs for data enrichment and transformation, increasing distributed computing efficiency by 20%, and incorporated advanced analytics models using PyTorch, boosting predictive accuracy by 15%.

• Built automated Databricks workflows for parallel data loads, improving processing speed by 40%, and delivered SSIS, SSMS, and SSRS solutions that achieved a 98% success rate in data pipeline execution.

• Integrated Kafka with Spark Streaming for real-time data analysis, reducing decision-making time by 50% and speeding up insights from clinical data.

• Established an ETL framework using Spark and Hive, cutting data processing time by 30%, and optimized SQL Server and MySQL databases for fast retrieval of patient information and medical records.

• Produced Tableau visualizations, such as bar graphs, scatter plots, and geographical maps, leading to a 35% increase in stakeholder engagement and driving 20% more data-informed decisions.

• Implemented machine learning models using Scikit-Learn and PyTorch, integrating predictive analytics into data pipelines to improve disease prediction accuracy and support personalized treatment plans.

Accenture Solutions Pvt Ltd, Data Engineer India Sep 2020 – Apr 2022

• Revitalized a 10GB+ data warehouse infrastructure, enhancing automation by 60%, minimizing downtime, and improving data accessibility, leading to seamless operational continuity across healthcare datasets.

• Crafted Spark applications using PySpark and Spark-SQL for ETL processes, resulting in a 25% improvement in data processing speed and optimized integration of third-party sources, achieving 99% data accuracy and 95% consistency.

• Used AWS and Azure cloud services, including S3, EC2, IAM, RDS, HDInsight, Databricks, Glue, Redshift, and Data Lake, to improve data storage and processing efficiency, reduce computational costs by 20%, and increase processing speed by 40%.

• Performed complex ETL operations with Python, Spark SQL, and Redshift on terabytes of data, and used Informatica PowerCenter and Talend for data integration, reducing batch processing time by 2 hours.

• Designed and automated Databricks and AWS workflows using PySpark and Glue dynamic frames, reducing data processing time by 30% and increasing reliability by 30%; implemented SSIS/SSRS to improve reporting accuracy.

• Enabled real-time data analysis by integrating Kafka with Spark Streaming, making decision-making 50% faster, and used AWS EMR to move and transform large datasets between S3 and DynamoDB.

• Developed Tableau and Power BI visualizations and dynamic reports that increased stakeholder engagement by 35% and improved data-driven decision-making by 20%, and automated reporting tasks with Excel VBA.

• Managed cloud infrastructure end to end with Terraform, Jenkins, and AWS CI/CD pipelines, improving deployment efficiency by 25%, and supported compliance with security standards through AWS CloudWatch monitoring.


Nokia, Data Analyst India Sep 2019 – Aug 2020

• Utilized Python to create and maintain scripts for data processing and analysis, leveraging libraries such as NumPy and Pandas to handle large datasets efficiently.

• Developed complex SQL queries to extract and manipulate data from MySQL databases, improving the accuracy and relevance of data for analysis and reducing data retrieval time from 30 minutes to 10 minutes.

• Managed and optimized Extract, Transform, Load (ETL) operations for accurate and efficient data processing, reviewing and refining data integration strategies to increase throughput and data quality, which significantly reduced processing times and errors.

• Designed and developed intuitive, interactive Power BI dashboards that give decision-makers critical insights and analytics, supporting informed decision-making across the organization.

• Implemented and managed automation solutions with Power Automate to streamline business processes and reduce manual effort, developing automation scripts that improved operational efficiency by over 30%.

• Regularly engaged with department managers and other key stakeholders to gather and analyze data requirements, and provided consistent updates on project progress and data insights to keep work aligned with business objectives and support data-driven decisions.

• Applied Agile (Scrum) methodologies to manage and execute data projects, ensuring timely delivery and adherence to business requirements through regular sprint reviews and planning sessions.

• Conducted data cleansing and transformation to improve data quality and consistency, which supported accurate and effective data analysis and reduced data-related errors by half.

PROJECT EXPERIENCE

Driver Drowsiness Detection System

The project aims to reduce accidents caused by driver drowsiness.

The system uses a camera to count how many times the driver's eyes close per minute and alerts the driver while driving.

Report Creation with Power BI and Task Automation with Power Automate

Generated reports on complex data using DAX formulas, and automated workflows for a range of scenarios.

Density-Based Traffic Controlling System

The project aims to monitor a patient's health condition remotely, without someone having to stay beside the patient.

It records readings periodically and sends an alert if it detects any abnormal condition.
