Data Engineer Senior

Location:
Hyattsville, MD
Posted:
February 24, 2025

Ram Gopal

Phone: +1-720-***-****

Email: ********@*****.***

Senior Data Engineer

PROFESSIONAL SUMMARY:

• 9+ years of overall experience as a Senior Data Engineer, adept at designing and implementing robust data solutions.

• Proficient in programming languages including Python, Scala, R, SQL, and Bash, with a strong foundation in data manipulation and analysis.

• Extensive expertise in cloud platforms such as AWS, Azure, and GCP, and skilled in managing data warehouses like Snowflake, Google BigQuery, and Amazon Redshift.

• Specialized in big data technologies, including Spark, PySpark, Hadoop, Hive, Pig, Sqoop, and Impala, for efficient data processing and analytics.

• Experienced in working with various databases such as SQL Server, CosmosDB, PostgreSQL, Oracle, Aurora, and DynamoDB.

• Well-versed in ETL tools like Informatica, Nifi, ADF, SSIS, and AWS Glue for seamless data extraction, transformation, and loading.

• Proficient in data processing and analytics libraries such as Pandas, NumPy, PyTorch, Scikit-Learn, TensorFlow, and Matplotlib.

• Skilled in workflow orchestration using AWS Step Functions and Apache Oozie, ensuring smooth data pipeline execution.

• Expertise in data modeling and visualization tools, including Erwin, QuickSight, Dataiku, Power BI, and Looker.

• Developed and maintained data transformation workflows using DBT, ensuring data quality and consistency across pipelines.

• Hands-on experience with containerization and orchestration tools like OpenShift, Docker, Kubernetes, ECS, and EKS.

• Familiar with infrastructure as code using AWS CloudFormation and Terraform for efficient deployment and management.

• Developed algorithms for processing and analyzing astronomical data, enhancing data processing efficiency and accuracy.

• Designed and implemented data pipelines within Palantir Foundry, facilitating seamless integration of diverse data sources to enhance operational efficiency.

• Proficient in using integrated development environments (IDEs) such as Eclipse, VS Code, and IntelliJ.

• Adept at monitoring and logging using CloudWatch, Prometheus, Splunk, Grafana, and Azure Monitor for proactive system management.

• Strong focus on security and identity management with expertise in IAM, KMS, JWT, and Azure AD.

• Well-versed in version control and collaboration tools like GitHub, Git, Bitbucket, and JIRA.

• Experienced in project management methodologies, including Agile, Scrum, and Kanban.

• Analyzed customer behavior patterns using CRM data to identify trends, helping to drive targeted marketing strategies and improve customer engagement.

• Designed and implemented scalable GraphQL APIs using Hasura to enable efficient data querying and manipulation, improving data accessibility for front-end applications.

• Familiar with Guidewire's data model, including key entities such as policies, claims, and customers.

TECHNICAL SKILLS:

Cloud Technologies: Azure (Data Factory V2, Data Lake, Databricks, Blob Storage, Data Box), Amazon EC2, IAM, Amazon S3, Amazon RDS, Elastic Load Balancing, AWS Lambda, Amazon EMR, AWS Glue, Amazon Kinesis
Programming Languages: Java, Python, SQL, Scala, Shell Script
Automation Tools: Azure Logic Apps, Crontab, Terraform
Big Data Technologies: Hadoop, Apache Spark, Apache Kafka, Apache HBase, HDFS, NiFi
Database Systems: MySQL, SQL Server, Oracle, PostgreSQL
NoSQL Databases: HBase, Cassandra, MongoDB, DynamoDB
Data Warehousing: Amazon Redshift, Snowflake

Data Pipeline Orchestration: Apache Airflow, Luigi

Data Visualization: Tableau, Power BI, Looker, QlikView

PROFESSIONAL EXPERIENCE:

Spencer Health Solutions, Aerial Center Pkwy, Morrisville    March 2023 - Present
Senior Data Engineer

Responsibilities:

• Implemented a real-time data ingestion pipeline using PostgreSQL and Apache Kafka to process and analyze streaming data (see the illustrative sketch after this role's Environment list).

• Experienced in working with various databases such as SQL Server, CosmosDB, PostgreSQL, Oracle, Aurora, and DynamoDB.

• Automated testing and documentation processes within DBT to improve deployment efficiency and stakeholder understanding.

• Integrated Snowflake with Azure services such as Azure Data Factory and Azure Blob Storage to facilitate seamless data migration and orchestration from on-premises to the cloud.

• Developed Azure Synapse dedicated SQL pools with materialized views and column store indexes, utilizing Python stored procedures and dynamic SQL queries, integrated with Azure Data Factory parallel copy activities.

• Utilized Hasura's real-time capabilities to create subscriptions for live data updates, enhancing user experience in applications requiring instant data reflection.

• Developed and maintained data transformation workflows using DBT, ensuring data quality and consistency across pipelines.

• Optimized resource utilization in Azure Synapse through workload importance classification and Azure Databricks cluster policies, managed by Azure Data Factory concurrent pipeline execution with Python monitoring.

• Developed comprehensive data profiling in Azure Databricks using Ganglia metrics with PySpark profiling functions and Python analysis of Azure Data Lake storage tiers through Azure Data Factory monitoring.

• Integrated Hasura with PostgreSQL databases, optimizing data modeling and access patterns to streamline ETL processes and improve data retrieval times.

• Collaborated with interdisciplinary teams of scientists and engineers to integrate data-driven approaches in astronomical projects.

• Proficient in Confluence for collaborative documentation and knowledge sharing. Created and maintained Confluence pages for project documentation, data dictionaries, and knowledge bases.

• Implemented row-level security in Azure Synapse with dynamic data masking and Azure Databricks table ACLs, orchestrating secure data movement through Azure Data Factory managed identities with Python encryption.

• Led a team of data engineers in the successful migration of legacy data systems to Palantir Foundry, ensuring minimal disruption to ongoing business processes.

• Implemented CI/CD pipelines in Azure DevOps for automated deployment of data solutions, enhancing development efficiency and reducing deployment time.

• Worked with Presto, Hive, Spark SQL, and BigQuery, leveraging Python client libraries to build efficient and interoperable analytics programs.

• Built payment platform leveraging Azure Databricks Delta Live Tables and Unity Catalog with MLflow, orchestrating complex Python and PySpark ML pipelines through Azure Synapse serverless pools and message queues.

• Conducted SQL development, unit testing, and performance tuning to ensure the resolution of testing issues based on defect reports.

• Utilized Informatica data transformations such as Normalizer, Lookup, Source Qualifier, Expression, Aggregator, Sorter, Rank, and Joiner to parse complex files and load them into databases.

• Leveraged DBT's version control features to manage and track changes, facilitating smooth collaboration among team members.

Environment: Airflow, Azure, Bigtable, BigQuery, Google Cloud ML Engine, TensorFlow, Kafka, Cloud Dataflow, Cloud Spanner, Cloud Data Fusion, Cloud Shell SDK, Slack, Jira, Lucidchart, G Suite, DBT, Composer, TOAD, SQL Developer, Netezza, Hadoop, Hive, Pig, Sqoop, Kubernetes, Oozie, Snowflake, MySQL, Python, DAG, Teradata, HDFS, Cassandra, Yarn.
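For illustration only, a minimal sketch of the Kafka-to-PostgreSQL ingestion pattern described in this role. The topic name, table, columns, and connection settings are hypothetical placeholders, not details from the actual project.

```python
# Minimal sketch: consume JSON events from a Kafka topic and upsert them into PostgreSQL.
# Assumes the kafka-python and psycopg2 packages; all names below are placeholders.
import json

import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "device_events",                       # hypothetical topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

conn = psycopg2.connect(dbname="analytics", user="etl", password="secret", host="localhost")
cur = conn.cursor()

for message in consumer:
    event = message.value
    # Upsert each event; assumes a unique constraint on device_id.
    cur.execute(
        """
        INSERT INTO device_readings (device_id, reading, recorded_at)
        VALUES (%s, %s, %s)
        ON CONFLICT (device_id) DO UPDATE
            SET reading = EXCLUDED.reading, recorded_at = EXCLUDED.recorded_at
        """,
        (event["device_id"], event["reading"], event["recorded_at"]),
    )
    conn.commit()
```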

Nationwide Insurance, Columbus, OH    May 2020 - Feb 2023
Senior Data Engineer

Responsibilities:

• Demonstrated proficiency in Informatica PowerCenter, utilizing its comprehensive set of tools and features to design, develop, and deploy ETL processes.

• Created reporting and analytics solutions using Guidewire data, integrating with BI tools such as Tableau and Power BI.

• Automated testing and documentation processes within DBT to improve deployment efficiency and stakeholder understanding.

• Developed financial data warehouse in Azure Synapse, implementing row-level security patterns and managing complex Azure Data Factory metadata through Python Airflow monitoring and alerting frameworks.

• Built enterprise IoT analytics platform using Azure Databricks job clusters and spot instances, implementing sophisticated PySpark window functions into Azure Data Lake medallion architecture for scalable data processing.

• Proficient in installing, configuring, and maintaining PostgreSQL databases for optimal performance and scalability.

• Leveraged DBT's version control features to manage and track changes, facilitating smooth collaboration among team members.

• Implemented and managed dynamic data access policies using Immuta, ensuring compliance with data governance regulations and enhancing security across multiple data sources.

• Engineered enterprise data pipelines in Azure Databricks using Delta Lake and Spark Structured Streaming, ensuring data quality and reliability while processing over 500TB of daily data volumes (see the illustrative sketch after this role's Environment list).

• Implemented and orchestrated Azure Data Factory pipelines integrated with Azure Databricks notebooks for complex ETL workflows, incorporating parallel processing and comprehensive error-handling mechanisms.

• Created robust real-time streaming solutions using Azure Event Hubs with Azure Databricks, processing over 1 million events per second while maintaining sub-second latency and fault tolerance.

• Implemented a real-time data ingestion pipeline using PostgreSQL and Apache Kafka to process and analyze streaming data.

• Utilized Immuta's data lineage capabilities to monitor and document data flows, enabling better understanding and auditing of data usage across teams.

• Developed comprehensive monitoring solutions using Azure Monitor and Log Analytics for Azure Databricks clusters, implementing automated alerting and custom dashboard creation for performance tracking.

• Implemented robust security frameworks through Azure Active Directory integration with Azure Databricks, establishing RBAC policies and managing access controls across multiple production environments.

• Built scalable data validation framework using Azure Databricks and Delta Lake, incorporating automated quality checks, schema validation, and performance optimization for enterprise data pipelines.

• Implemented Python solutions for data processing within Azure Functions and Event Hubs, creating fault-tolerant architectures that handled 200K+ events per second with consistent performance.

• Configured and managed Oracle Real Application Clusters (RAC) for high availability and failover, ensuring data availability through features such as Data Guard.

• Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

• Involved in developing a MapReduce framework that filters bad and unnecessary records.

• Responsible for migrating the code base to Amazon EMR and evaluating Amazon ecosystem components like Redshift.

• Developed NiFi flow topologies for data cleansing operations before moving data into HDFS.

• Worked on different file formats (ORCFILE, Parquet, Avro) and different Compression Codecs (GZIP, SNAPPY, LZO).

• Optimized Talend jobs for performance, including parallel processing, partitioning, and caching, to handle large volumes of data efficiently.

Environment: Azure, Glue, Kinesis, EMR, API Gateway, DynamoDB, HDFS, Oozie, Hive, Sqoop, Snowflake, Spark, Java, Python, Zookeeper, Apache NiFi, PySpark, Terraform, Redshift, SNS, SQS, S3, IAM, Talend, Jenkins, Netezza, PL/SQL, Git, Bitbucket.
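For illustration only, a minimal PySpark sketch of the Structured Streaming-into-Delta pattern mentioned in this role. It assumes a Delta-enabled Spark session (for example, Databricks); the paths, schema, and checkpoint location are hypothetical placeholders.

```python
# Minimal sketch: stream raw JSON files into a bronze Delta table.
# Assumes a Delta-enabled Spark session; paths and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_timestamp
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("events-to-delta").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_time", StringType()),
])

raw = (
    spark.readStream.format("json")
    .schema(schema)
    .load("/mnt/landing/events/")                 # hypothetical landing path
)

bronze = raw.withColumn("event_time", to_timestamp(col("event_time")))

query = (
    bronze.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/mnt/checkpoints/events_bronze")
    .start("/mnt/delta/bronze/events")            # hypothetical Delta table path
)
query.awaitTermination()
```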

AgFirst, Columbia, SC    Jan 2018 - April 2020

Data Engineer

Responsibilities:

• Participated in the complete Software Development Life Cycle (SDLC) for data engineering initiatives, overseeing activities from the initial concept to the final deployment.

• Designed and implemented highly available and scalable database solutions using Aurora, DynamoDB, and RDS on AWS.

• Orchestrated serverless workflows using AWS Step Functions and data streaming with Kinesis for real-time data processing.

• Implemented and optimized data storage and retrieval using AWS services such as S3, Redshift, and EMR.

• Developed serverless functions with AWS Lambda, automating tasks and improving overall system efficiency (see the illustrative sketch after this role's Environment list).

• Defined and provisioned infrastructure as code using AWS CloudFormation for consistent and reproducible environments.

• Implemented distributed data processing using Spark and PySpark for large-scale data analytics.

• Automated configuration management and deployment tasks using Ansible for increased efficiency.

• Demonstrated expertise in financial data management and data warehousing, utilizing advanced technologies such as Databricks Delta Lake, Delta Live Tables, and PySpark.

• Utilized Erwin for data modeling and schema design, ensuring data accuracy and consistency.

• Created interactive and visually appealing dashboards with QuickSight for data visualization and reporting.

• Implemented data serialization and deserialization using Avro for efficient data interchange.

• Applied machine learning techniques using Scikit-Learn and TensorFlow for predictive modeling and analysis.

• Orchestrated containerized applications using ECS and EKS for scalable and efficient deployment.

• Implemented data transfer between Hadoop and relational databases using Sqoop and Pig.

• Utilized AWS Glue for automated data cataloging, ETL, and data preparation.

• Orchestrated workflow scheduling and coordination using Oozie for Hadoop-based data processing.

• Integrated and utilized Dataiku for collaborative and end-to-end advanced analytics.

• Implemented CI/CD pipelines using AWS CodePipeline for automated testing and deployment.

• Managed version control and collaborative development using GitHub and tracked project progress using Jira.

• Executed and oversaw Identity and Access Management (IAM) policies and encryption through Key Management Service (KMS).

• Monitored and ensured system health using AWS CloudWatch for proactive issue resolution.

• Utilized CloudFront for content delivery and caching, optimizing data access and retrieval.

• Developed and debugged data engineering solutions using Eclipse IDE for efficient development.

Environment: AWS, Aurora, DynamoDB, Kinesis, CloudFormation, Spark, PySpark, Ansible, Erwin, QuickSight, Avro, Redshift, Pandas, NumPy, ECS, EKS, Sqoop, Pig, AWS Glue, Oozie, Dataiku, AWS CodePipeline, GitHub, Jira, IAM, KMS, CloudWatch, CloudFront, Eclipse.
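For illustration only, a minimal sketch of the Kinesis-to-Lambda-to-DynamoDB pattern described in this role. The table name and record fields are hypothetical placeholders.

```python
# Minimal sketch: an AWS Lambda handler that decodes Kinesis records and writes them to DynamoDB.
# Assumes boto3 is available in the Lambda runtime; table and field names are placeholders.
import base64
import json

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("transactions")            # hypothetical table name


def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded inside the event envelope.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        table.put_item(
            Item={
                "transaction_id": payload["transaction_id"],
                # Store numerics as strings in this sketch to avoid DynamoDB float restrictions.
                "amount": str(payload["amount"]),
                "arrived_at": str(record["kinesis"]["approximateArrivalTimestamp"]),
            }
        )
    return {"records_processed": len(event["Records"])}
```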

Aricent Technologies, Bangalore, India    June 2015 - Sept 2017
Data Analyst

Responsibilities:

• Designed and implemented an Enterprise Data Lake to facilitate diverse use cases, including analytics, processing, storage, and reporting of voluminous and rapidly changing data.

• Maintained high-quality reference data in the source by performing operations such as cleaning and transformation, ensuring integrity in a relational environment. Collaborated closely with stakeholders and solution architects.

• Conducted comprehensive architecture and implementation assessments of various AWS services, including Amazon EMR, Redshift and S3, ensuring optimal utilization and performance.

• Implemented machine learning algorithms in Python, leveraging technologies such as Kinesis Firehose and S3 Data Lake, to predict item quantities for users and provide automated suggestions.

• Leveraged AWS EMR to efficiently transform and transfer large volumes of data between AWS data stores and databases like Amazon S3 and Amazon DynamoDB.

• Utilized Spark SQL for Scala and Python interfaces, enabling seamless conversion of RDD case classes into schema RDDs for advanced data processing and analytics.

• Ingested data from various sources, such as HDFS and HBase, into Spark RDDs, performing computations using PySpark to generate response outputs (see the illustrative sketch after this role's Environment list).

• Conducted data blending and preparation using Alteryx and SQL, preparing data for consumption in Tableau and publishing data sources to Tableau Server.

• Designed and implemented Kibana dashboards based on Logstash data, integrating various source and target systems into Elasticsearch for real-time log analysis of end-to-end transactions.

• Leveraged AWS Step Functions to automate and orchestrate tasks related to Amazon SageMaker, including data publishing to S3, model training, and deployment for prediction.

Environment: AWS Lambda, Glue, Kinesis, EMR, API Gateway, DynamoDB, HDFS, Oozie, Hive, Sqoop, Snowflake, Spark, Zookeeper, Apache NiFi, PySpark, Terraform, Redshift, SNS, SQS, S3, IAM, Jenkins, Netezza, PL/SQL, Git, Bitbucket.
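For illustration only, a minimal PySpark sketch of the HDFS-to-RDD-to-Spark SQL flow described in this role. The HDFS path and column layout are hypothetical placeholders.

```python
# Minimal sketch: load delimited records from HDFS into an RDD, convert to a DataFrame,
# and query it with Spark SQL. Path and columns are placeholders.
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("hdfs-spark-sql").getOrCreate()

lines = spark.sparkContext.textFile("hdfs:///data/orders/*.csv")   # hypothetical path

orders = lines.map(lambda line: line.split(",")).map(
    lambda f: Row(order_id=f[0], customer_id=f[1], amount=float(f[2]))
)

# Convert the RDD of Rows into a DataFrame and expose it to Spark SQL.
orders_df = spark.createDataFrame(orders)
orders_df.createOrReplaceTempView("orders")

top_customers = spark.sql(
    """
    SELECT customer_id, SUM(amount) AS total_spend
    FROM orders
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
    """
)
top_customers.show()
```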


