Harshitha Manam
Email: ******************@*****.***
Contact No: 336-***-****
Data Engineer
Professional Summary:
4+ years of professional IT experience in the analysis, design, development, deployment, and maintenance of critical software and big data applications.
Capable of processing large sets of structured and semi-structured data and supporting systems application architecture.
Strong experience in Python and shell scripting.
Experience in designing, implementing, and optimizing data pipelines on Google Cloud Platform.
Designed and implemented a real-time analytics platform using GCP services and Kafka.
Experience in retrieving data from databases such as MySQL, Teradata, and Oracle into HDFS using Sqoop and ingesting it into BigQuery.
Proficient in designing, implementing, and managing cloud infrastructure solutions on Azure.
Experienced in deploying virtual machines, storage, and networking components in Azure.
Strong experience in modeling data solutions for reporting and data warehouse purposes.
Good understanding of and experience with software development methodologies such as Agile and Waterfall.
Effective leadership qualities with strong skills in strategy, business development, client management, and project management.
Technical Skills:
Monitoring and Reporting: Tableau, Custom Shell Scripts, Splunk, Grafana
Build and Deployment Tools: Maven, Git, SVN, Jenkins
Programming and Scripting: SQL, JavaScript, Shell Scripting, Python, HiveQL
Databases: Oracle, MySQL, MS SQL Server, Teradata, PostgreSQL
ETL Tools: Informatica PowerCenter, Infoworks
Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003
AWS Services: S3, Redshift, EMR, Lambda
Kafka: Setup, configuration, data streaming, and integration
Professional Experience:
Sr. Data Engineer Nov 2023 – Present
Client: Brightspeed, Charlotte, NC
Responsibilities:
●Designed and developed end-to-end data pipelines on Google Cloud Platform, incorporating Kafka for real-time data streaming and Python for data ingestion.
●Designed and implemented Azure Virtual Networks for secure and efficient communication.
●Created custom Kafka producers and consumers to integrate external data sources with internal systems for real-time analytics (see the sketch following this list).
●Developed Python scripts for data extraction, transformation, and loading (ETL) from TXT and CSV files into Google BigQuery.
●Utilized Azure Log Analytics and Application Insights for diagnostics and performance optimization.
●Implemented efficient partitioning and clustering strategies in BigQuery for optimized query performance.
●Leveraged Airflow to design and orchestrate complex ETL workflows, including data extraction from MySQL databases and loading into BigQuery.
●Deployed and configured Azure Virtual Machines to host various applications and services.
●Developed Cloud Functions to trigger Airflow workflows when files are delivered to Cloud Storage (see the sketch after the Environment line below).
●Participated in designing Airflow DAGs for scheduling and orchestrating data workflows.
●Orchestrated containerized applications using Azure Kubernetes Service.
●Implemented Azure Monitor and Azure Security Center for proactive monitoring.
●Implemented high-availability solutions using Azure VM scale sets.
●Collaborated with cross-functional teams to gather requirements, analyze data needs, and design effective data models.
●Integrated Azure DevOps for continuous integration and continuous deployment (CI/CD).
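A minimal sketch of the Kafka producer/consumer pattern described in the list above, assuming the kafka-python client; the broker address, topic name, and payload are illustrative assumptions rather than the client's actual configuration.

import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["localhost:9092"]   # assumed broker address
TOPIC = "external-events"      # assumed topic name

# Producer: push records from an external source into Kafka as JSON.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)
producer.send(TOPIC, {"source": "external-api", "value": 42})
producer.flush()

# Consumer: read the same topic and hand records to the analytics layer.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)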
Environment: Python, BigQuery, Cloud Composer/Airflow, MySQL, Cloud Storage, GitHub.
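A minimal sketch of the Cloud Function trigger described above, assuming a 1st-gen Python function on a Cloud Storage finalize event calling the Cloud Composer 2 Airflow REST API; the web server URL, DAG id, and entry-point name are illustrative assumptions.

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Assumed Composer Airflow web server URL and DAG id (illustrative only).
WEB_SERVER_URL = "https://example-web-server.composer.googleusercontent.com"
DAG_ID = "ingest_gcs_files"

def on_file_arrival(event, context):
    """Entry point: fires when a file is finalized in the watched bucket."""
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    session = AuthorizedSession(credentials)
    # Pass the new file's bucket and object name to the DAG run config.
    payload = {"conf": {"bucket": event["bucket"], "object": event["name"]}}
    response = session.post(
        f"{WEB_SERVER_URL}/api/v1/dags/{DAG_ID}/dagRuns", json=payload
    )
    response.raise_for_status()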
Sr. Data Engineer May 2023 – Nov 2023
General Motors – Detroit, MI.
Responsibilities:
●Worked on automating ETL scripts using Python.
●Designed and built data solutions to migrate existing on-prem Hadoop jobs to Google Cloud Platform.
●Performed exploratory data analysis on datasets to check assumptions required for model fitting and hypothesis testing, handle missing values, and apply transformations.
●Transformed data from formats such as XML, JSON, and DSV to Parquet using PySpark (see the sketch following this list).
●Wrote shell scripts to automate pipelines and handle data errors.
●Scheduled jobs in Cloud Composer/Airflow (a minimal DAG sketch follows the Environment line below).
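A minimal sketch of the format conversions described in the list above, assuming pipe-delimited DSV files and GCS paths (both illustrative); XML would additionally require the external spark-xml package.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-conversion").getOrCreate()

# DSV: read with an explicit delimiter and header row, write as Parquet.
dsv_df = (spark.read.option("header", "true").option("delimiter", "|")
          .csv("gs://raw-bucket/input/*.dsv"))
dsv_df.write.mode("overwrite").parquet("gs://curated-bucket/dsv_parquet/")

# JSON: multiLine handles pretty-printed files; the schema is inferred.
json_df = spark.read.option("multiLine", "true").json("gs://raw-bucket/input/*.json")
json_df.write.mode("overwrite").parquet("gs://curated-bucket/json_parquet/")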
Environment: GCP, Sqoop, SQL, PySpark, Spark SQL, Astronomer/Airflow, Hive, GitHub.
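A minimal sketch of the kind of DAG scheduled in Composer/Airflow above; the dag_id, schedule, and task callable are illustrative assumptions.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_etl():
    # Placeholder for the PySpark/shell ETL step this DAG would launch.
    print("running ETL step")

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2023, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="run_etl", python_callable=run_etl)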
Sr. Data Engineer Jan 2020 – Aug 2022
Infor Global Solutions – IN
Responsibilities:
●Worked on automating ETL scripts using Python.
●Designed and built data solutions to migrate existing source data in Teradata and DB2 to BigQuery (Google Cloud Platform).
●Integrated Azure Cognitive Services for AI-driven capabilities in applications.
●Designed, developed, and automated pipelines to ingest data from Teradata and Oracle into Cloud Storage on GCP.
●Integrated Azure Functions with other Azure services for event-driven architectures.
●Transformed, cleansed, and backfilled data and created models in BigQuery to support business reporting for the EHRs.
●Deployed and managed web applications using Azure App Service.
●Utilized Azure Storage solutions, including Blob Storage, Azure Files, and Azure Disk Storage.
●Transformed data from formats such as XML, JSON, and DSV to Parquet using PySpark.
●Wrote shell scripts to automate pipelines and handle data errors.
●Scheduled jobs in Cloud Composer/Airflow and Tidal.
●Integrated Azure DevOps for continuous integration and continuous deployment (CI/CD).
●Implemented a one-time data migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL (see the sketch at the end of this section).
Environment: GCP, Sqoop, SQL, PySpark, Logstash, Teradata, GitHub.
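A minimal sketch of the one-time SQL Server-to-Snowflake migration pattern described above, assuming a pandas/pyodbc extract staged and loaded with the Snowflake Python connector; the DSN, credentials, and table names are illustrative assumptions.

import pandas as pd
import pyodbc
import snowflake.connector

# Extract the source rows from SQL Server (assumed DSN and table).
src = pyodbc.connect("DSN=sqlserver_src")
df = pd.read_sql("SELECT * FROM dbo.state_metrics", src)
df.to_csv("/tmp/state_metrics.csv", index=False)

# Stage the extract and load it with COPY INTO; assumes the target
# Snowflake table STATE_METRICS already exists.
conn = snowflake.connector.connect(
    account="my_account", user="loader", password="***",
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("PUT file:///tmp/state_metrics.csv @%STATE_METRICS")
cur.execute(
    "COPY INTO STATE_METRICS FROM @%STATE_METRICS "
    "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
)
conn.close()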
Education:
M.S in Computer Science, University of North Carolina Greensboro Aug 2022 – May 2024
Bachelor of Technology in Electronics and Communication Engineering
Sri Indu College of Engineering and Technology Aug 2017 – July 2021