
Data Engineer Real-Time

Location:
Fort Worth, TX
Salary:
70k
Posted:
February 06, 2025

Contact this candidate

Resume:

Manasa GV Data Engineer

*****************@*****.***

linkedin.com/in/manasa-gv-2142m807

214-***-****

github.com/ManasaGV12

PROFILE

Data Engineer with 5+ years of experience in cloud-based solutions, big data technologies, and ETL pipelines. Skilled in Python, Scala, and SQL, and in tools such as Spark, Kafka, AWS, and GCP. Expert in data modeling, performance tuning, and real-time analytics. Holds a Master’s in Data Science and delivers scalable, secure data solutions.

PROFESSIONAL EXPERIENCE

JP Morgan Chase, Pennsylvania

Data Engineer

Feb 2024 – present

•Developed Python-based GCP Cloud Functions to process incoming CSV files from Google Cloud Storage (GCS) into BigQuery, streamlining data ingestion workflows.

•Automated SQL dump downloads and integrated them into Cloud SQL (MySQL), transforming and migrating data to BigQuery using PySpark, Scala, Spark, and GCP Dataproc.

•Designed and implemented data pipelines with Apache Beam and Google Cloud Dataflow, efficiently processing bounded and unbounded data streams from Pub/Sub topics into BigQuery.

•Optimized production pipelines, reducing error rates by 15%, and collaborated with Google Support to enhance scalability for handling 500K+ daily events, monitored via MonViz dashboards.

•Built batch processing pipelines on GCP Dataproc using Spark and Scala, reducing query execution times by 30% and optimizing data storage for 1TB+ datasets using Hive partitioned tables and Parquet format.

•Designed real-time data pipelines leveraging Flume, Kafka, and Spark Streaming to process and transform web log data, improving data latency for client reporting systems.

•Enhanced Spark job performance with partitioned RDDs (e.g., hash, range) and developed automation tools using Python, PySpark, and Shell scripting to increase operational efficiency.

•Configured firewall rules to securely enable Dataproc access from external systems and scaled Hadoop-based log pipelines using Flume, HDFS, and custom sinks to Pub/Sub.

•Migrated data across systems using Sqoop and Scala, implemented fact-dimension modeling, and performed large-scale data transformations on GCP Dataproc.

•Deployed and executed Spark and Scala-based distributed pipelines on GCP Dataproc, processing datasets at scale and optimizing resource utilization.

•Conducted scalability tests on Kubernetes-deployed Airflow with a Cloud SQL backend and Hadoop components, ensuring seamless performance for log pipelines.

•Created tools using Python, PySpark, and Shell scripting to automate repetitive tasks, improving operational workflows and reducing manual effort.
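The GCS-to-BigQuery ingestion described above can be sketched in miniature. This is a local sketch of only the transformation step, with illustrative column names; a real Cloud Function would receive a GCS event, read the blob with the google-cloud-storage client, and load the parsed rows into BigQuery with the google-cloud-bigquery client.

```python
import csv
import io

def csv_text_to_records(csv_text: str) -> list:
    """Parse CSV text (as read from a GCS object) into row dicts,
    the shape expected by a BigQuery JSON row insert."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

# Inline sample standing in for a downloaded GCS object; the field
# names ("id", "amount") are hypothetical.
sample = "id,amount\n1,10.5\n2,7.25\n"
records = csv_text_to_records(sample)
```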

Infosys Private Limited, Hyderabad, India

Data Engineer

Sep 2021 – Dec 2022

•Designed and maintained the core website infrastructure for BMG, ensuring high availability, scalability, and seamless functionality.

•Enhanced user interfaces by developing and deploying Java-based solutions and delivering responsive web components using React.js.

•Scaled Hadoop and Spark workloads on GCP Dataproc to process datasets exceeding 1.5TB, improving data analytics speed by 20% with seamless BigQuery integration.

•Implemented real-time messaging and event-driven architectures using GCP Pub/Sub to streamline workflows.

•Built and automated DAG workflows using Apache Airflow for reliable and efficient data pipeline execution.

•Secured data workflows using Google Cloud Storage (GCS) and implemented data backup strategies to protect critical music publishing assets.

•Deployed GCP Cloud Functions to automate event-driven workflows, processing 50K+ events daily with minimal latency.

•Created Selenium-based automation scripts to test and validate application functionality, ensuring high-quality user experiences.

•Used insights from the technology stack to shape artists’ promotion and distribution strategies across various markets.
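The Airflow DAG workflows mentioned above amount to running tasks in dependency order. As a minimal sketch with hypothetical task names, the same ordering can be shown with a plain topological sort from the standard library; Airflow's scheduler resolves task dependencies in exactly this fashion.

```python
from graphlib import TopologicalSorter

# Hypothetical task names standing in for an Airflow DAG's tasks;
# each entry maps a task to the set of tasks it depends on.
dag = {
    "extract_gcs": set(),
    "load_bigquery": {"extract_gcs"},
    "publish_pubsub": {"load_bigquery"},
    "notify": {"load_bigquery"},
}

# static_order() yields a valid execution order for the DAG.
order = list(TopologicalSorter(dag).static_order())
```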

Envision Infotech Private Limited, Chennai, India

Data Analyst

Jun 2019 – Aug 2021

•Migrated data processing pipelines and applications to AWS Cloud, ensuring seamless integration and optimized performance for analytical workloads.


•Implemented CI/CD pipelines using Jenkins and AWS CodePipeline, reducing deployment time by 40% for 20+ data ingestion workflows.

•Designed and implemented serverless architectures leveraging API Gateway, AWS Lambda, and DynamoDB to process and analyze real-time data with high scalability and minimal infrastructure overhead.

•Automated the deployment of AWS Lambda functions via S3 buckets, enabling efficient event-driven data processing workflows.

•Streamlined data operations by integrating DevOps/Agile methodologies for code reviews, unit test automation, build and release automation, and incident management, improving delivery timelines and system reliability.

•Deployed and maintained Dockerized analytics applications on AWS, incorporating tools like Netflix Eureka for service discovery and Spring Ribbon for load balancing, ensuring high availability and fault tolerance.

•Utilized AWS CLI and Boto3 scripts to automate data pipeline tasks, including Amazon EMR setup and management for big data analytics.

•Built and managed serverless data workflows using AWS Lambda and Step Functions for efficient orchestration of analytical processes.

•Optimized real-time data pipelines with Spring Kafka and Zookeeper, improving processing reliability for 100K+ daily streaming events.

•Collaborated with cross-functional teams to deploy and monitor data applications, ensuring smooth operations and alignment with business requirements.
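The API Gateway + Lambda + DynamoDB pattern above can be sketched as a handler function. This is a local sketch with a hypothetical payload shape; the DynamoDB write is stubbed out (a real deployment would call `put_item` via boto3) so the routing and response logic can run anywhere.

```python
import json

def lambda_handler(event, context=None):
    """Minimal API Gateway -> Lambda handler sketch: validate the
    JSON body and return an API Gateway-style response dict."""
    body = json.loads(event.get("body") or "{}")
    item_id = body.get("id")
    if item_id is None:
        return {"statusCode": 400, "body": json.dumps({"error": "missing id"})}
    # Real write, stubbed here:
    # boto3.resource("dynamodb").Table("events").put_item(Item=body)
    return {"statusCode": 200, "body": json.dumps({"stored": item_id})}

resp = lambda_handler({"body": json.dumps({"id": "evt-1"})})
```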

Envision Infotech Private Limited, Chennai, India

Intern

Dec 2018 – May 2019

•Assisted in migrating data processing pipelines and analytical applications to AWS Cloud, ensuring seamless integration and optimized performance.

•Developed CI/CD pipelines using Jenkins, AWS CodePipeline, and Docker, enhancing deployment efficiency for data ingestion and transformation workflows.

•Designed and implemented serverless architectures with AWS Lambda, API Gateway, and DynamoDB for real-time data processing, automating event-driven workflows via S3.

•Gained hands-on experience deploying analytics applications in Dockerized environments and improving real-time data streaming with Spring Kafka and Zookeeper.
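The streaming work above follows a micro-batch shape: consume events, group them into small batches, aggregate per batch. As a minimal sketch, with the Kafka consumer replaced by an in-memory list and an illustrative "type" field, the core computation looks like this.

```python
from collections import Counter

def micro_batch_counts(events, batch_size=3):
    """Split a stream of event dicts into fixed-size micro-batches
    and count event types per batch -- the same aggregation shape a
    Spark Streaming job applies over records consumed from Kafka."""
    batches = []
    for i in range(0, len(events), batch_size):
        batch = events[i:i + batch_size]
        batches.append(Counter(e["type"] for e in batch))
    return batches

stream = [{"type": "click"}, {"type": "view"}, {"type": "click"},
          {"type": "view"}, {"type": "view"}]
counts = micro_batch_counts(stream, batch_size=3)
```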

SKILLS

Cloud Technologies: AWS (S3, Redshift, EC2, Lambda, Glue), GCP (BigQuery, Dataproc, Cloud Storage, Cloud Functions, Pub/Sub)

Languages: Python, SQL, Java, Scala, R, Shell Scripting, React.js

Big Data Technologies: Hadoop, Apache Spark, Hive, HBase, Flink, Pig, Cassandra, Presto

ETL Tools: Apache NiFi, Talend, Informatica, AWS Glue, Azure Data Factory

Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Teradata

Databases: MySQL, PostgreSQL, MongoDB, MS SQL Server, DynamoDB

Build/CI-CD Tools: Jenkins, Maven, Gradle, Docker, Kubernetes, Terraform, Airflow, DAGs

Data Streaming: Apache Kafka, Spark Streaming, Flink Streaming, AWS Kinesis, Google Pub/Sub

Data Visualization: Tableau, Power BI, Looker, QuickSight

Version Control: Git, GitHub, GitLab, Bitbucket

Operating Systems: Linux, UNIX, Windows

Testing/Development Tools: Selenium, Eclipse

Other Skills: Data Modeling, Data Pipeline Development, Performance Tuning, Data Governance, Data Security

Testing Frameworks: PyTest, JUnit, TestNG, Mockito

EDUCATION

University of North Texas

Master of Science in Data Science and Information Sciences Jan 2023 – Dec 2024 Denton, Texas

CGPA: 3.9/4

Kalasalingam Academy of Research and Education

Bachelor of Technology in Computer Science and Engineering Jun 2015 – May 2019 TN, India

CGPA: 3.2/4
