Data Engineer Processing

Location: Washington, DC
Salary: 85,000
Posted: February 05, 2024


Sai Sandeep Gollamudi

DATA ENGINEER

Denton, TX | Mobile: 301-***-**** | Email: ad3d2o@r.postjobfree.com

SUMMARY

• Seasoned data engineer with 4 years of experience designing, developing, and optimizing data solutions to support data-driven decision-making and business objectives.

• Proven expertise in enhancing data processing efficiency through ETL workflows, optimizing extraction, transformation, and loading processes.

• Adept at leveraging cloud platforms, including AWS and GCP, and utilizing Big Data technologies such as Apache Hadoop, Apache Spark, and Apache Kafka.

• Proficient in managing and optimizing databases, including MySQL, PostgreSQL, MongoDB, and T-SQL, with a focus on query performance and indexing.

• Skilled in using a variety of tools such as Apache NiFi, Tableau, Power BI, and Excel for comprehensive data processing, visualization, and analysis.

• Familiar with a broad range of tools including SSIS, SSRS, SSAS, Docker, Kubernetes, Jenkins, Terraform, Informatica, Talend, Amazon Redshift, Snowflake, and Google BigQuery.

• Applied machine learning algorithms, statistical methods, and advanced analytics techniques to derive meaningful insights from diverse datasets.

• Proven ability to approach complex data challenges with critical thinking and creative problem-solving, ensuring effective and efficient solutions.

SKILLS

Methodologies: SDLC, Agile, Waterfall

Programming Languages: Python, SQL, R

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn

Visualization Tools: Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP)

IDEs: Visual Studio Code, PyCharm, Jupyter Notebook

Cloud Platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP)

Databases: MySQL, PostgreSQL, MongoDB, T-SQL

Data Engineering Concepts: Apache Spark, Apache Hadoop, Apache Kafka, Apache Beam, ETL/ELT

Other Technical Skills: SSIS, SSRS, SSAS, Docker, Kubernetes, Jenkins, Terraform, Informatica, Talend, Amazon Redshift, Snowflake, Google BigQuery, Data Quality and Governance, Machine Learning Algorithms, Natural Language Processing, Big Data, Advanced Analytics, Statistical Methods, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving

Version Control Tools: Git, GitHub

Operating Systems: Windows, Linux, macOS

EDUCATION

Master of Science in Computer and Information Science - University of North Texas, Denton, USA

Bachelor of Technology in Computer Science and Engineering - K L University, Guntur, India

EXPERIENCE

Data Engineer | HCA Healthcare, TX | Sept 2022 - Present

• Enhance patient data processing at HCA Healthcare by implementing Apache NiFi for ETL workflows, optimizing data extraction, transformation, and loading processes.

• Utilize AWS cloud services, including S3 for scalable storage and AWS Glue for automated ETL job execution, resulting in a 20% reduction in storage costs.
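
A minimal sketch of what a PySpark-based AWS Glue ETL job of this kind can look like; the catalog database, table, and S3 bucket names here are hypothetical placeholders, not actual HCA resources:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw records registered in the Glue Data Catalog (hypothetical names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="patient_raw", table_name="encounters"
)

# Rename and cast columns during transformation.
mapped = ApplyMapping.apply(
    frame=raw,
    mappings=[
        ("patient_id", "string", "patient_id", "string"),
        ("admit_dt", "string", "admission_date", "timestamp"),
    ],
)

# Write the curated output back to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/encounters/"},
    format="parquet",
)
job.commit()
```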

• Optimize and tune the Redshift environment, enabling queries that feed Tableau and SAS Visual Analytics to run up to 100x faster.

• Design and develop a security framework that provides fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.

• Implement data quality checks using Python scripts, ensuring a 98% accuracy rate in patient records, and boosting data reliability.
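
A minimal sketch of the style of Python data-quality check this refers to, using pandas; the column names, input file, and threshold are hypothetical:

```python
import pandas as pd

REQUIRED_COLUMNS = ["patient_id", "admission_date", "diagnosis_code"]  # hypothetical schema

def quality_report(df: pd.DataFrame) -> dict:
    """Compute simple completeness, uniqueness, and validity metrics for a batch."""
    total = len(df)
    missing = int(df[REQUIRED_COLUMNS].isna().any(axis=1).sum())
    duplicates = int(df["patient_id"].duplicated().sum())
    valid = total - missing - duplicates
    return {
        "rows": total,
        "rows_missing_required_fields": missing,
        "duplicate_patient_ids": duplicates,
        "accuracy_rate": round(valid / total, 4) if total else 0.0,
    }

if __name__ == "__main__":
    batch = pd.read_csv("patient_records.csv")  # hypothetical extract
    report = quality_report(batch)
    assert report["accuracy_rate"] >= 0.98, f"Quality gate failed: {report}"
```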

• Extract data from multiple source systems (S3, Redshift, RDS) and create tables and databases in the Glue Data Catalog by configuring Glue Crawlers.

• Manage patient databases with PostgreSQL, optimizing queries and indexing for a 15% reduction in query response time.
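
A sketch of the kind of indexing and verification work this involves, driven from Python with psycopg2; the table, columns, and connection details are illustrative only:

```python
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="patients", user="etl_user", password="***")

with conn, conn.cursor() as cur:
    # Composite index matching the most common lookup pattern (hypothetical table/columns).
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_encounters_patient_date "
        "ON encounters (patient_id, encounter_date DESC);"
    )
    # Confirm the planner switches from a sequential scan to an index scan.
    cur.execute(
        "EXPLAIN ANALYZE SELECT * FROM encounters "
        "WHERE patient_id = %s ORDER BY encounter_date DESC LIMIT 20;",
        ("P-10042",),
    )
    for (plan_line,) in cur.fetchall():
        print(plan_line)
```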

• Use AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

• Introduce Apache Kafka for real-time data processing, reducing data latency by 25% and enabling timely medical interventions.
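
A minimal kafka-python sketch of this real-time pattern; the broker address, topic name, and alert threshold are hypothetical:

```python
import json
from kafka import KafkaConsumer, KafkaProducer

BROKER = "localhost:9092"   # hypothetical broker
TOPIC = "patient-vitals"    # hypothetical topic

# Producer side: publish vitals events as they arrive.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"patient_id": "P-10042", "heart_rate": 128})
producer.flush()

# Consumer side: react to events with low latency.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
for message in consumer:
    vitals = message.value
    if vitals.get("heart_rate", 0) > 120:
        print(f"Alert: elevated heart rate for patient {vitals['patient_id']}")
```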

• Load structured, semi-structured, and unstructured data into Hadoop using static and dynamic partitions, and migrate existing data from Teradata/SQL Server to Hadoop for ETL processing.

• Import and export databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages.

• Conduct data blending and data preparation using Alteryx and SQL for Tableau consumption, and publish data sources to Tableau Server.

• Transition to the ELK Stack for monitoring and logging, reducing system downtime by 25% and ensuring uninterrupted access to patient data.

Data Engineer | Cognizant, India | Jun 2018 - Jul 2021

• Spearheaded a financial project at Cognizant using Apache Hadoop and Spark, achieving a 30% improvement in data retrieval speed with Amazon Redshift for data warehousing.

• Utilized Python and Scala for ETL processes, incorporating SQL scripts for data transformations, automating workflows, and reducing manual effort by 25% for streamlined and error-free financial data processing.

• Used SSIS, NiFi, Python scripts, and Spark applications to build ETL data-flow pipelines, transforming data from legacy tables into Hive tables, HBase tables, and S3 buckets for handoff to business users and data scientists building analytics on the data.

• Developed, prototyped, and tested predictive algorithms; filtered and cleaned data, reviewed reports, and evaluated performance indicators.

• Implemented Power BI for interactive financial dashboards, connecting to SQL databases to facilitate SQL-based queries and contributing to a 25% improvement in decision-making speed for stakeholders.

• Performed incremental loads as well as full loads to transfer data from OLTP systems to a snowflake-schema data warehouse using different data flow and control flow tasks, and maintained existing jobs.
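
A watermark-based sketch of how an incremental load of this kind can work, using psycopg2; the source and warehouse table names are hypothetical:

```python
import psycopg2

def incremental_load(src_conn, dwh_conn, last_loaded_at):
    """Copy only rows changed since the previous run into the warehouse fact table."""
    with src_conn.cursor() as src, dwh_conn.cursor() as dwh:
        src.execute(
            "SELECT txn_id, account_id, amount, updated_at "
            "FROM transactions WHERE updated_at > %s",
            (last_loaded_at,),
        )
        rows = src.fetchall()
        dwh.executemany(
            "INSERT INTO fact_transactions (txn_id, account_id, amount, updated_at) "
            "VALUES (%s, %s, %s, %s) "
            "ON CONFLICT (txn_id) DO UPDATE SET "
            "amount = EXCLUDED.amount, updated_at = EXCLUDED.updated_at",
            rows,
        )
    dwh_conn.commit()
    # New watermark for the next run.
    return max((r[3] for r in rows), default=last_loaded_at)
```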

• Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.
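
A short sketch of this style of exploratory analysis with Matplotlib and Seaborn; the input file and column names are placeholders:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("transactions.csv")  # hypothetical extract

# Correlation heatmap across numeric features to spot related variables.
plt.figure(figsize=(8, 6))
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
plt.title("Feature correlations")
plt.tight_layout()
plt.show()

# Distribution of a key numeric column to check skew and outliers.
sns.histplot(df["amount"], bins=50, kde=True)
plt.title("Transaction amount distribution")
plt.show()
```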

• Established data governance with Apache Atlas, incorporating SQL-based metadata management for a 20% reduction in data errors and ensuring regulatory compliance in SQL-based financial reporting.

• Worked on various machine learning algorithms such as linear regression, logistic regression, decision trees, random forests, K-means clustering, support vector machines, and XGBoost, based on client requirements.
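
A compact scikit-learn sketch comparing a few of these model families on synthetic data; real work would use the client's datasets and tuned hyperparameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a client dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(n_estimators=200),
    SVC(kernel="rbf"),
]
for model in models:
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{type(model).__name__}: accuracy = {acc:.3f}")
```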

• Designed and developed end-to-end ETL processes from various source systems to the staging area and from staging to data marts, including data loads.

• Worked with Python NumPy, SciPy, Pandas, Matplotlib, and statistics packages to perform dataset manipulation, data mapping, data cleansing, and feature engineering. Built and analyzed datasets using R and Python.

CERTIFICATIONS

• AWS Certified Solutions Architect Associate.

• AWS Certified Cloud Practitioner.


