
Data Engineer – Machine Learning

Location: Jersey City, NJ
Posted: September 10, 2025


Resume:

Bhanu Prakash Ghanta

Phone: +1-414-***-**** | Email: *****.****@*****.***

LinkedIn: www.linkedin.com/in/bhanu-prakash-ghanta-645493370

Professional Summary:

• Data Engineer with 4+ years of professional experience in data analysis, design, coding, and the development of data warehousing solutions.

• Skilled in managing data analytics, data processing, and data-driven projects, with strong capabilities in Machine Learning and Artificial Intelligence.

• Proficient in Python, SQL, and UNIX/Linux shell scripting, applying these skills to build efficient Extract, Load, and Transform (ELT) operations.

• Experienced in real-time data processing using Apache Kafka, Amazon Kinesis, and Apache Spark, constructing robust data pipelines for high-volume transactions.

• Adept at developing and managing data warehousing solutions using Amazon Redshift, Snowflake, and MySQL, ensuring optimized data aggregation and analysis.

• Proficient with tools in the Hadoop Ecosystem including Hive, HDFS, MapReduce, Sqoop, Kafka, Yarn, and HBase, with expertise in handling terabytes of streaming and batch data.

• Experienced with Azure services such as Azure Data Lake, Azure Data Factory, Databricks, Azure Synapse Analytics, Azure Stream Analytics, and Azure Event Hub, as well as AWS, for scalable and efficient data architectures.

• Strong background in data governance and security protocols using Collibra and IAM, ensuring compliance and safeguarding sensitive information.

• Skilled at implementing CI/CD practices using Docker, Kubernetes, Terraform, and Jenkins, streamlining deployment processes and improving development efficiency.

• Adept in data visualization and reporting using Power BI and Tableau, creating interactive dashboards to support business decision-making.

• Extensive experience in database management and development using PostgreSQL, SQL Server, and MongoDB, optimizing data storage and retrieval performance.

• Experienced in Agile methodologies, including Scrum and SDLC best practices, ensuring high-quality and timely delivery of projects.

• Proven ability to lead collaborative efforts with cross-functional teams, data analysts, and business stakeholders to deliver tailored data solutions.

• Exceptional communication and problem-solving skills, with a strong ability to translate technical concepts and requirements for both technical and non-technical stakeholders.

Technical Skills:

• Languages & Scripting: Python, SQL, Scala, Java, Bash, HiveQL, UNIX Shell Scripting

• Big Data & Distributed Systems: Apache Spark, PySpark, Databricks, Hadoop, HDFS, Hive, YARN, HBase, MapReduce, Sqoop, Apache Kafka, Apache Flink, Delta Lake, Impala

• Databases & Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics, MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, DynamoDB, Cassandra

• ETL & Data Pipeline Tools: Apache Airflow, dbt, Apache NiFi, Informatica PowerCenter, Talend, AWS Glue, Azure Data Factory, SSIS, AWS Data Pipeline

• Cloud Platforms & Services: AWS, Azure, Google Cloud Platform, S3, Redshift, Glue, Lambda, RDS, EMR, Athena, CloudWatch, DynamoDB, SQS, Kinesis, Data Factory, Synapse Analytics, Databricks, Blob Storage, Cosmos DB, Event Hub, Stream Analytics, Azure DevOps, BigQuery, Dataflow, Cloud Composer, Cloud Storage, Pub/Sub

• DevOps & IaC: Docker, Kubernetes, Jenkins, GitHub Actions, GitLab CI/CD, Terraform, CloudFormation, Ansible, Helm

• Data Governance & Security: IAM, Apache Atlas, Apache Ranger, Collibra, RBAC, GDPR, CCPA, HIPAA

• Visualization & Reporting: Power BI, Tableau, Looker, Metabase, SSRS, Excel

• Monitoring & Observability: Prometheus, Grafana, ELK Stack, Elasticsearch, Logstash, Kibana

• Version Control & Collaboration: Git, GitHub, GitLab, Bitbucket, Azure Repos, Jira, Confluence, Microsoft Teams

• Development & Productivity: Visual Studio Code, IntelliJ IDEA, Eclipse, NetBeans, SSMS, Jupyter Notebook, Microsoft Office Suite.

Work Experience:

Goldman Sachs | Global Data Engineer | August 2024 – Present

Responsibilities:

• Architected and maintained ETL/ELT workflows using Python and Apache Spark, orchestrated with Apache Airflow, improving pipeline automation and reliability (first sketch after this list).

• Executed data cleansing, validation, and transformation on intricate financial datasets, ensuring superior data quality and consistency for downstream analytics using Snowflake.

• Worked closely with data analysts and business stakeholders to gather data requirements and deliver tailored solutions on Azure, following Agile methodologies.

• Constructed data pipelines for real-time processing using Apache Kafka and RESTful APIs, significantly improving performance and throughput (second sketch after this list).

• Administered and optimized NoSQL databases using Apache Cassandra, enhancing data storage and retrieval performance for high-volume transactions.

• Established robust data governance protocols to strengthen data security and meet regulatory compliance mandates, leveraging Collibra and Apache Spark.

• Implemented and managed data warehousing solutions using Amazon Redshift, reducing query times.

• Enhanced data models using star-schema design principles, improving query performance and facilitating agile analytics with Azure Synapse Analytics.

• Engineered data processing frameworks using Databricks, improving data processing efficiency by 40%.

• Utilized Git for version control and IntelliJ for development, ensuring efficient collaboration and development processes.

• Deployed CI/CD pipelines using Docker, Kubernetes, and Terraform for streamlined deployment and management.

• Created interactive dashboards and performed real-time data analytics using PySpark and Power BI, improving decision-making processes and reducing reporting times.

• Implemented comprehensive data lineage tracking mechanisms using Apache Atlas, improving transparency and accountability in data workflows.

• Applied advanced data compression techniques within Apache Spark workflows, significantly reducing storage costs and improving processing times (third sketch after this list).

• Leveraged Azure Stream Analytics to process and analyze streaming data in real time, enhancing data processing capabilities and providing instant insights.
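
To illustrate the Airflow orchestration bullet above (first sketch): a minimal DAG that submits a PySpark job, assuming Airflow 2.x with the Apache Spark provider installed. The DAG id, schedule, script path, and connection id are hypothetical placeholders, not details from this role.

```python
# Minimal sketch only: an Airflow 2.x DAG that submits a PySpark job.
# The DAG id, paths, and connection id below are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,                          # retry transient failures
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_trades_elt",             # hypothetical pipeline name
    start_date=datetime(2024, 8, 1),
    schedule="@daily",
    default_args=default_args,
    catchup=False,                         # do not backfill missed runs
) as dag:
    # Submit a PySpark script that extracts raw files, transforms them,
    # and loads the result into the warehouse.
    transform_trades = SparkSubmitOperator(
        task_id="transform_trades",
        application="/opt/jobs/transform_trades.py",  # hypothetical script path
        conn_id="spark_default",
        conf={"spark.sql.shuffle.partitions": "200"},
    )
```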
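
For the Kafka real-time pipeline bullet (second sketch), a minimal consumer loop built on the open-source kafka-python client; the topic name, broker address, consumer group, and record fields are all hypothetical.

```python
# Minimal sketch only: a streaming consumer built on the kafka-python client.
# Topic, broker, group id, and record fields are hypothetical placeholders.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transactions",                        # hypothetical topic
    bootstrap_servers=["localhost:9092"],  # placeholder broker address
    group_id="txn-enrichment",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    txn = message.value
    # Real pipelines would enrich, validate, and forward each record
    # (e.g. via a REST call); this sketch just drops malformed events.
    if txn.get("trade_id") is None:
        continue
    print(txn["trade_id"], txn.get("amount"))
```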
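
For the compression bullet (third sketch), one common technique is writing partitioned Parquet with a stronger codec such as zstd. A sketch assuming PySpark 3.x; the S3 paths and the as_of_date partition column are placeholders.

```python
# Minimal sketch only: partitioned Parquet output with zstd compression
# in PySpark 3.x. Paths and the partition column are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-positions").getOrCreate()

df = spark.read.parquet("s3://example-bucket/raw/positions/")

# zstd trades a little CPU for markedly smaller files and faster scans;
# partitioning by date keeps queries that filter on date cheap.
(df.repartition("as_of_date")
   .write
   .partitionBy("as_of_date")
   .option("compression", "zstd")
   .mode("overwrite")
   .parquet("s3://example-bucket/curated/positions/"))
```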

Sundaram Finance | Data Engineer | June 2020 – June 2023

Responsibilities:

• Developed and maintained data pipelines for member enrollment, claims processing, and risk management systems, employing Hive, Sqoop, and RESTful APIs for seamless data integration and querying (first sketch after this list).

• Formulated data models using PostgreSQL and SQL Server, enhancing data storage and retrieval performance.

• Enforced stringent data quality checks and monitoring procedures, ensuring data accuracy and consistency within the data pipelines, leveraging Scala.

• Orchestrated data workflows using Informatica and SQL Server Integration Services (SSIS), enhancing automation and scheduling capabilities for improved operational efficiency.

• Collaborated closely with data scientists to prepare and transform data for machine learning models deployed in fraud detection and healthcare cost prediction initiatives, following SDLC best practices.

• Assisted in migrating on-premises data infrastructure to the cloud (Azure Data Lake Storage (ADLS) and Azure Data Factory (ADF)), ensuring scalability and cost-effectiveness with a focus on HIPAA compliance practices.

• Leveraged Tableau for data visualization and reporting, creating interactive dashboards to support business decision-making.

• Utilized Bitbucket for version control, facilitating efficient collaboration and project management within the team.

• Managed databases using SQL Server Management Studio (SSMS), ensuring efficient data handling and processing.

• Employed Apache HBase for NoSQL database management, improving data storage flexibility and retrieval times.

• Integrated Grafana and Jenkins for monitoring and continuous integration, enhancing system reliability and deployment efficiency.

• Implemented and automated routine tasks using UNIX/Linux shell scripting to enhance operational efficiency.

• Deployed Azure Monitor to provide real-time insights into pipeline performance and system health, ensuring timely issue resolution and uptime consistency.

• Automated schema evolution for structured and semi-structured data using Avro and Hive, enabling smoother data model updates without disrupting pipelines (second sketch after this list).

• Implemented data governance protocols using Apache Atlas, ensuring data compliance and security.

• Conducted unit testing and integration testing to verify data accuracy and system performance, ensuring robust data processing pipelines.

• Integrated Azure Event Hub for event-driven data processing, improving real-time data handling and reducing latency.
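
The Hive integration bullet at the top of this list (first sketch): a minimal PySpark example of a Hive-backed join for claims data, assuming a cluster with a Hive metastore; the database, table, and column names are hypothetical.

```python
# Minimal sketch only: a Hive-backed join in PySpark, assuming a cluster
# with a Hive metastore. Database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("claims-quality-check")
    .enableHiveSupport()  # resolve tables registered in the Hive metastore
    .getOrCreate()
)

# Join claims to members and keep rows that pass basic validity checks.
valid_claims = spark.sql("""
    SELECT c.claim_id, c.member_id, c.claim_amount, m.plan_code
    FROM claims_db.claims c
    JOIN claims_db.members m ON c.member_id = m.member_id
    WHERE c.claim_amount > 0
      AND c.service_date IS NOT NULL
""")

valid_claims.write.mode("overwrite").saveAsTable("claims_db.valid_claims")
```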
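
For the Avro schema-evolution bullet (second sketch), a small example of how a reader schema with a defaulted field decodes records written under an older schema without breaking the pipeline, using the fastavro library; the record layout is hypothetical.

```python
# Minimal sketch only: backward-compatible Avro schema evolution with
# fastavro. The record layout is hypothetical.
import io

from fastavro import schemaless_reader, schemaless_writer

writer_schema = {
    "type": "record", "name": "Member",
    "fields": [
        {"name": "member_id", "type": "string"},
        {"name": "plan_code", "type": "string"},
    ],
}

# The reader schema adds a field with a default, so records written
# before the change still decode cleanly; no pipeline disruption.
reader_schema = {
    "type": "record", "name": "Member",
    "fields": [
        {"name": "member_id", "type": "string"},
        {"name": "plan_code", "type": "string"},
        {"name": "risk_tier", "type": "string", "default": "unknown"},
    ],
}

buf = io.BytesIO()
schemaless_writer(buf, writer_schema, {"member_id": "M123", "plan_code": "GOLD"})
buf.seek(0)

record = schemaless_reader(buf, writer_schema, reader_schema)
print(record)  # {'member_id': 'M123', 'plan_code': 'GOLD', 'risk_tier': 'unknown'}
```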

Certifications:

• Azure Data Engineer Associate

• Databricks Certified Data Engineer Associate

Education:

• Master’s in Computer Science

Concordia University of Wisconsin

• Bachelor’s in Computer Science and Engineering

SRM University, Chennai


