
Data Engineer Business Intelligence

Location:
Montgomery, AL
Salary:
$60 - $70
Posted:
March 10, 2025


Resume:

Data Engineer

Harika Saladi

Phone: +* (***) - **0 - 6419

Email: **************@*****.***

LinkedIn: linkedin.com/in/saladi-harika-0345651a2

PROFESSIONAL SUMMARY:

●Data Engineer with 9+ years of experience in analysis, design, development, testing, and implementation of Big Data/Hadoop, Data Warehousing, Business Intelligence, and Application Development.

●Extensive hands-on experience with Hadoop, leading development of enterprise-level solutions utilizing components such as Apache Spark, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, ZooKeeper, and YARN.

●Well-versed in the design and development of code bases using Object-Oriented Programming (OOP) methodologies, design patterns, and multi-threading.

●Developed and maintained data pipelines, ETL processes, and data integrations using SQL, Python, PySpark, and Spark.

●Worked with databases including SQL Server, DB2, and MySQL for querying, optimization, and performance tuning.

●Collaborated on system migrations and data transformation tasks, working with legacy and cloud platforms.

●Well-versed in the development of REST APIs for data transfer in JSON and XML formats using AWS API Gateway, Flask, FastAPI, and Django.

●Expertise in Kimball and One Big Table data modeling approaches, with experience leveraging complex data types to design and implement efficient and scalable data models.

●Experienced in the design and architecture of Data Warehouses and OLAP systems, multi-dimensional and relational data modeling, Extraction-Transformation-Load (ETL) processes, and data model review and database performance tuning.

●Designed and optimized complex logical and physical data models, including intricate fact and dimension tables. Leveraged advanced techniques like Star Schema, Snowflake Schema, and Slowly Changing Dimensions for data integration, storage, retrieval, and analysis, empowering informed decision-making.

●Extensive working experience with Snowflake cloud data warehouse features such as Snowpipe, Streams & Tasks, Time Travel, zero-copy cloning, Secure Data Sharing, multi-cluster warehouses, admin activity, and credit usage.

●Built data pipelines in Snowflake using Snowpipe, Streams, and Tasks, and applied data masking policies in Snowflake (see the illustrative sketch at the end of this summary).

●Experienced with Informatica Intelligent Cloud Services (IICS) DI, IDQ, and MDM capabilities, including Cloud Mass Ingestion (CMI) and Cloud Data Integration (CDI), for building and maintaining cloud data pipelines on AWS. Used IICS MDM to create and manage a master data repository (MDR).

●Implemented Informatica Data Quality (IDQ) to discover, define, apply, and monitor data quality rules, ensuring that data ingested into AWS met the organization’s standards.

●Proficient in establishing and configuring Kafka clusters, comprising brokers, topics, partitions, and replication, to ensure exceptional high availability and fault tolerance.

●Skilled in formulating SQL queries, stored procedures, functions, packages, tables, views, and triggers for various relational databases, including MySQL, PostgreSQL, and MS SQL Server.

●Experienced in data manipulation using Python for loading and extraction, as well as with Python libraries such as NumPy, SciPy, Pandas, Dask, Keras, TensorFlow, PyTorch and Polars for data analysis, numerical computations, and parallel computing.

●Wrote complex SQL queries for data manipulation, cleansing, and reporting. Familiar with SQL/400 for DB2 and other relational databases.

●Worked directly with stakeholders to understand business requirements, ensuring the delivery of data solutions that met customer needs and enhanced business decision-making.

●Designed and developed interactive and visually appealing Tableau dashboards and reports to visualize key performance indicators (KPIs) and metrics.

●Proficient in transforming and processing data with AWS EMR, AWS Glue, and PySpark, enabling advanced data analytics and machine learning capabilities.

●Experience in designing and developing data warehouses using tools such as Amazon Redshift, and have worked on tasks such as data modeling, ETL, and data integration.

●Implemented cloud containerization using AWS ECS enabling automated scaling and efficient resource management of containerized applications in a production environment.

●Utilized Python and PySpark to develop data processing pipelines for ETL, data transformations, and integration with cloud-based services, improving data flow and processing efficiency.

●Designed and optimized complex SQL queries for data retrieval and reporting, ensuring system efficiency and high performance in database operations.

●Integrated Apache Kafka for real-time data streaming, enabling seamless communication between microservices and enhancing the responsiveness of the system.

●Configured and maintained CI/CD pipelines, automating build, test, and deployment processes using Jenkins and GitLab, resulting in faster delivery cycles.

●Experience creating data visualizations using Python libraries such as Matplotlib and Seaborn, and R libraries such as ggplot2 and Shiny. Built reports and dashboards using Tableau and Power BI.

●Experience in building machine learning models using Python and its libraries Scikit-Learn, TensorFlow, and PyTorch. Used pre-trained models and fine-tuned them for image classification and NLP tasks.

●Proficiently employed Kubernetes for the seamless coordination, scaling, and efficient administration of Docker Containers.

●Experience in DevOps and Continuous Integration/Continuous Deployment (CI/CD) tools such as Git and GitHub, Bitbucket, CloudBees Jenkins, Splunk, ELK Stack and Docker containers.
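
Illustrative sketch for the Snowpipe / Streams & Tasks bullet above (a minimal sketch, not code from any engagement; the account, warehouse, stage, table, and column names are placeholders, using the snowflake-connector-python client):

# Minimal sketch: wire up continuous ingestion (Snowpipe), change capture (Stream),
# and scheduled merging (Task) in Snowflake. All object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="RAW_DB", schema="PUBLIC",
)
cur = conn.cursor()

# Snowpipe: auto-load Parquet files landing in an external stage.
cur.execute("""
    CREATE PIPE IF NOT EXISTS events_pipe AUTO_INGEST = TRUE AS
    COPY INTO events_raw FROM @s3_events_stage
    FILE_FORMAT = (TYPE = 'PARQUET')
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")

# Stream: capture new rows arriving in the raw table.
cur.execute("CREATE STREAM IF NOT EXISTS events_stream ON TABLE events_raw")

# Task: every 5 minutes, move captured inserts into the curated table.
cur.execute("""
    CREATE TASK IF NOT EXISTS merge_events
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('EVENTS_STREAM')
    AS
      INSERT INTO events_curated (event_id, payload)
      SELECT event_id, payload FROM events_stream
      WHERE METADATA$ACTION = 'INSERT'
""")
cur.execute("ALTER TASK merge_events RESUME")  # tasks are created suspended
conn.close()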

TECHNICAL SKILLS:

Languages/Frameworks/Scripting

Scala, Java, Python, Spark, R, JavaScript, Node.js, Vue.js, SQL, PL/SQL, Django, Flask, FastAPI, Spring, Spring Boot, Bash, PowerShell

Databases

Hive, Presto, Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, Couchbase, HBase, Apache Cassandra, Informatica IDMC

Big Data Technologies

HDFS, MapReduce, Yarn, Hive, Sqoop, Oozie, Spark, Pig, Impala, Flume, Nifi, Flink, Informatica Cloud Data Integration (CDI)

AWS

S3, EC2, ECS, EKS, EBS, EMR, Redshift, Lambda, Glue, Athena, DynamoDB, Kinesis, Step Functions, SNS, SQS, Service Catalog, CloudWatch, IAM, QuickSight, SageMaker, RDS, ELB, ALB, VPC

Azure

Azure Synapse Analytics, Azure Data Factory, Azure Data Lake Storage, Logic Apps, Event Hubs, Azure SQL Database, HDInsight, Blob Storage, Cosmos DB, Event Grid, API Management

Data Visualization

Tableau, Power BI, SAS, data transformation and mapping refactoring

DevOps

Ansible, Jenkins, Docker, Kubernetes, Terraform, Git

Tools

Jira, IntelliJ, Jupyter/IPython notebooks, dbt, Delphix, Splunk, Informatica, IICS, IDMC, Airbyte, Talend, Erwin, GitHub, Bitbucket, Rally

PROFESSIONAL EXPERIENCE

Broadridge Financial, Lake Success, NY Jan 2023 – Present

Senior Data Engineer

Responsibilities:

●Designed and implemented ETL pipelines on AWS Databricks, integrated with AWS Glue, ensuring reliable data ingestion, transformation, and loading into S3 and Redshift data lakes/warehouses (an illustrative Glue/PySpark sketch follows this section).

●Led cloud-based migrations, transitioning legacy ETL workflows from on-prem systems to AWS and Azure platforms using Informatica Cloud Data Integration (CDI).

●Utilized Informatica IDMC to refactor and migrate complex ETL mappings and data transformations, ensuring high performance and scalability in cloud environments.

●Optimized SQL queries and configurations using AWS tools to identify bottlenecks and improve database performance metrics for better query execution.

●Built and managed data pipelines with AWS Glue and Lambda to process data from S3, applying complex business logic and transformations using PySpark and Spark/Scala.

●Configured automated backups for cloud databases such as RDS, Redshift, and Aurora, and implemented disaster recovery strategies.

●Integrated AWS Databricks with Amazon Redshift and QuickSight for seamless querying, analysis, and data visualization.

●Applied data governance and security best practices in cloud migrations to ensure compliance with GDPR, SOC2, and other industry standards.

●Led performance testing and optimization of cloud-based workflows, refining ETL performance and cloud execution efficiency using IDMC Operational Insights.

●Led the migration of legacy ETL workflows from on-premises data systems to cloud-based platforms, including AWS and Azure, using Informatica Cloud Data Integration (CDI) for automated migration, and successfully transitioned data flows and workflows to the cloud with scalability and performance optimization.

●Refactored complex ETL mappings and transformations to work in cloud-native environments such as Informatica IDMC.

●Conducted unit testing on migrated workflows to ensure data accuracy and integrity, and optimized job performance for cloud execution.

●Managed cloud connectivity and built data pipelines across diverse data environments, ensuring seamless flow and integration.

●Developed and implemented a migration strategy, including Lift & Shift and Optimization, ensuring minimal disruption during the transition.

●Ensured compliance with cloud security standards, including GDPR and SOC2, by applying robust data governance practices during migration.

●Delivered training to team members on new cloud-based interfaces and tools post-migration to ensure smooth adoption.

●Designed and managed YAML-based CI/CD pipelines in Azure DevOps to automate code deployments, integrating Terraform for infrastructure provisioning and ensuring consistent, error-free releases.

●Created and optimized GitHub Actions workflows using YAML to automate build, test, and deployment processes for AWS-based applications, improving release cycle efficiency.

●Integrated AWS Athena with Amazon S3 and AWS Glue for data ingestion, cataloging, and transformation, enabling seamless access to structured and unstructured data sources.

●Developed custom UDFs (user-defined functions) and aggregate functions in AWS Athena to perform advanced analytics and calculations on insurance data.

●Legacy languages: RPG/400, RPGLE, SQLRPGLE, COBOL/400, COBOL ILE, CL/400, CLLE.

●Developed and optimized stored procedures, triggers, and SQL queries in SQL Server for high-performance data manipulation.

●Utilized Neo4j for graph data modeling and Cypher query optimization, supporting the development of recommendation engines and real-time analytics applications.

●Worked with Active Directory for managing user identities, access permissions, and security groups, ensuring proper provisioning and de-provisioning protocols were followed.

●Collaborated with healthcare IT teams on data access and security protocols for EMR systems, ensuring privacy and compliance.

●Led database migrations into SQL Server, ensuring seamless data integrity across systems.

●Managed user access and permissions using cloud platforms such as AWS IAM, ensuring secure and compliant access to various systems and databases.
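
Illustrative Glue/PySpark sketch for the ETL-pipeline bullets above (a minimal sketch under assumed inputs; bucket paths, column names, and the business rule are placeholders, not taken from any production system):

# Minimal sketch of a Glue-style PySpark job: read raw Parquet from S3, apply a
# simple transformation, and write curated output back to S3.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw zone (placeholder path).
raw = spark.read.parquet("s3://example-raw-zone/trades/")

# Example business rule: keep settled trades and derive a notional amount.
curated = (
    raw.filter(F.col("status") == "SETTLED")
       .withColumn("notional", F.col("quantity") * F.col("price"))
)

# Write the curated zone, partitioned by trade date.
curated.write.mode("overwrite").partitionBy("trade_date").parquet(
    "s3://example-curated-zone/trades/"
)

job.commit()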

Change Healthcare, Nashville, TN Aug 2020 – Dec 2022

Senior Data Engineer

Responsibilities:

●Implemented data extraction and ingestion to raw, prep, and curated data lake zones (AWS S3) in Parquet format.

●Implemented end-to-end data integration using AWS Glue Data Catalog and Crawler to automate data discovery and cataloging from diverse sources in Amazon S3, while leveraging AWS DynamoDB (NoSQL) for efficient metadata management.

●Applied advanced SQL analytical functions and performance analysis to fix issues in ETL jobs, optimizing execution time and reducing resource costs by millions of dollars per annum.

●Hands-on experience creating Airflow DAGs, task instances, and scheduled data pipelines per requirements, executing them in parallel and sequentially.

●Implemented a callback function in Airflow to receive email notifications upon DAG failure, ensuring timely resolution of any issues (an illustrative Airflow sketch follows this section).

●Developed custom apps and dashboards utilizing data from Dynamics 365 or Dataverse, ensuring seamless data integration, reporting, and business logic.

●Designed and implemented an Airflow pipeline to schedule and monitor bash jobs that check directories and run downstream jobs based on the results.

●Implemented trigger rules to specify the order and dependency of jobs, ensuring that the pipeline runs correctly and efficiently.

●Executed optimized, advanced HiveQL queries using explode functions and lateral views.

●Managed the integration of diverse data sources, including structured and unstructured data, into RDS stores.

●Successfully imported Parquet data into Snowflake from Amazon S3 using a customized Python-based solution within Dataiku DSS, optimizing data transfer, and ensuring smooth integration.

●Proficient in seamlessly ingesting and integrating XML data into Snowflake, utilizing both built-in support and SQL techniques, and optimizing storage through efficient design of Snowflake tables with VARIANT columns.

●Built data pipelines to load data into Snowflake Data Warehouse using Apache Spark/Scala jobs running on Amazon EMR orchestrated using AWS Step Functions/Lambda Functions/CloudWatch.

●Utilized AWS Step Functions, to design and run workflows that stitch together multiple AWS services such as AWS Lambda and Amazon ECS into feature-rich applications.

●Worked with large healthcare datasets, such as insurance claims, patient records, or clinical data, to build data pipelines for processing and analysis. Designed and implemented ETL pipelines to extract, transform, and load patient data, improving the efficiency of health analytics and reporting systems.

●Used Snowflake's query profiling tools to identify and optimize performance bottlenecks in the data pipeline for a real-time analytics dashboard.

●Knowledge of PostgreSQL storage optimization features, such as GIN indexes and LZ4 compression, to reduce storage space and improve performance.

●Designed and implemented ETL pipelines and data integration workflows, utilizing SQL Server for efficient data storage and retrieval.

●Led data migration efforts from legacy systems to SQL Server, ensuring smooth transition and minimizing data discrepancies.

●Utilized SQL Server performance optimization techniques, improving query performance and system efficiency.

●Automated the validation of PDF authenticity, eliminating the need for manual intervention.

●Developed an intricate PDF fraud detection system, utilizing pattern recognition for altered documents, triggering meticulous manual reviews, and leveraging MySQL and PostgreSQL for secure data storage.

●Leveraged AWS Athena and IAM permissions to create secure, role-based views for data access, bolstering data security.

●Demonstrated expertise by organizing, compressing, and uploading Python scripts with shared functionalities to an S3 bucket, and then efficiently implemented an AWS Lambda LayerVersion using CloudFormation templates to enhance code reusability and deployment efficiency.

●Performed Data quality analysis on each data lake stage of developed pipelines and collaboratively worked in an Agile team.

●Worked on Elasticsearch, Logstash, and Kibana (ELK) for monitoring Lambda logs by generating visualizations.

●Developed dashboards in Kibana that visualize the data being processed and transferred to the data warehouse.
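
Illustrative Airflow sketch for the DAG, failure-callback, and trigger-rule bullets above (a minimal sketch; the DAG id, task names, paths, and alert address are placeholders):

# Minimal sketch: a DAG with an on-failure email callback and a trigger rule
# controlling when the final task runs.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.email import send_email
from airflow.utils.trigger_rule import TriggerRule


def notify_failure(context):
    # Invoked by Airflow whenever a task in this DAG fails.
    send_email(
        to="data-oncall@example.com",
        subject=f"DAG {context['dag'].dag_id} failed",
        html_content=f"Task {context['task_instance'].task_id} failed.",
    )


with DAG(
    dag_id="example_ingest_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"on_failure_callback": notify_failure},
) as dag:
    check_dir = BashOperator(
        task_id="check_landing_dir",
        bash_command='test -n "$(ls /data/landing 2>/dev/null)"',
    )
    load_raw = BashOperator(task_id="load_raw", bash_command="echo 'load raw zone'")
    cleanup = BashOperator(
        task_id="cleanup",
        bash_command="echo 'cleanup'",
        trigger_rule=TriggerRule.ALL_DONE,  # run even if upstream tasks fail
    )

    check_dir >> load_raw >> cleanup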

The Hartford Insurance, Hartford, Connecticut March 2018 – July 2020

Data Engineer

Responsibilities:

●Built complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks and Azure SQL Database.

●Transferred an already-running on-premises application to Azure. Processed and stored datasets using Azure services such as Databricks, Azure SQL, and Blob Storage. For routine data transfer from a traditional MySQL database to Azure SQL Data Warehouse, built ADF pipelines with event and time triggers.

●Created Azure functions to handle certain data transfer operations from Oracle systems to Azure SQL DB.

●Created and implemented an end-to-end ETL procedure for consuming massive data from blob storage utilizing ADF pipelines that can iterate across a changeable folder and sub-folder file system.

●Implemented data ingestion and data modeling from sources to the Azure Data Lake and performed data processing using Apache Spark transformations in Azure Databricks, stored the transformation results in Azure Synapse data warehouse which is accessed by Business for reporting.

●Replaced ADF pipelines with reusable Python code in Databricks that can be reused across multiple data engineering use cases. Integrated ECS with CI/CD pipelines for automated deployment.

●Extensive experience in developing data processing workflows, automating tasks, and building backend services using Python. Proficient in working with PySpark for big data processing and integration with cloud platforms like AWS. Proficient in writing optimized SQL queries and performing performance tuning on relational databases (such as MySQL and PostgreSQL).

●Experience in complex data transformations, ETL processes, and database migrations.

●Integrated Apache Kafka for real-time data streaming and messaging, enabling communication between distributed systems and ensuring reliable, high-performance data flow between microservices.

●Loaded data from Azure storage accounts to Azure Synapse to enable end users to perform analytical tasks.

●Automated dataflows using Logic Apps and Power Automate (Flow), which connect different Azure services and Function Apps for customizations.

●Created pipelines, data flows, and complex data transformations and manipulations using Azure Data Factory (ADF) and PySpark with Databricks.

●Designed and developed an Azure cloud-based ETL pipeline to extract customer data from APIs and efficiently process it into Azure SQL Database.

●Developed Databricks notebooks using PySpark and Spark SQL for comprehensive data transformation in Azure Data Lake across Raw, Stage, and Curated zones.

●Automated the daily data ingestion from a web service into Azure SQL DB, ensuring seamless and timely updates.

●Expertly deployed MapReduce solutions to manage unstructured data types, including XML, JSON, and Avro data files and sequence files, effectively handling complex log data scenarios.

●Implemented procedures in Azure SQL Data Warehouse to create final aggregate tables, crucial for generating informative dashboards.

●Proficiently designed and executed ad-hoc analysis solutions leveraging Azure Data Lake Analytics/Store and HDInsight, effectively delivering insights through agile data exploration.

●Devised JSON Scripts to enable the seamless deployment of data processing pipelines in ADF, effectively utilizing SQL activities for data manipulation.

●Created a Kafka producer API to send live streaming JSON data into various Kafka topics (an illustrative producer sketch follows this section).

●Used Azure DevOps to deploy ETL pipelines and trace changes from the development environment to the production environment over time, enabling look-back scenarios.

●Developed Azure Analysis Services Tabular cubes, Azure SQL DB, and ADF assets, ensuring seamless continuous integration practices.
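
Illustrative Kafka producer sketch for the JSON streaming bullet above (a minimal sketch using the kafka-python client; the broker address, topic name, and event fields are placeholders):

# Minimal sketch: publish JSON events to a Kafka topic.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Send a small batch of sample events to an illustrative topic.
for i in range(10):
    event = {"event_id": i, "source": "web", "ts": time.time()}
    producer.send("example-events", value=event)

producer.flush()
producer.close()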

KPMG, Hyderabad, India Sept 2014 – Nov 2017

Data Engineer

Responsibilities:

●Used Sqoop to import and export data between Oracle/PostgreSQL and HDFS for analysis.

●Migrated existing MapReduce programs to Spark using Python.

●Migrated data from the data lake (Hive) into S3 buckets.

●Performed data validation between the data lake and the S3 bucket.

●Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.

●Designed batch processing jobs using Apache Spark, achieving roughly ten-fold speedups over MapReduce jobs.

●Used Kafka for real-time data ingestion.

●Created different topics for reading the data in Kafka.

●Read data from different topics in Kafka.

●Involved in converting HQL queries into Spark transformations using Spark RDDs with Python and Scala.

●Moved data from S3 bucket to Snowflake Data Warehouse for generating the reports.

●Wrote Hive queries for data analysis to meet business requirements.

●Migrated an existing on-premises application to AWS.

●Used AWS cloud services for infrastructure provisioning and configuration.

●Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.

●Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

●Created many Spark UDFs and Hive UDAFs for functions not available out of the box in Hive and Spark SQL (an illustrative UDF sketch follows this section).

●Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

●Implemented performance optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.

●Good knowledge of Spark platform parameters such as memory, cores, and executors.

●Used ZooKeeper in the cluster to provide concurrent access to Hive tables with shared and exclusive locking.

●Configured the monitoring solutions for the project using Datadog for infrastructure and ELK for application logging.
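
Illustrative Spark UDF sketch for the custom-function bullet above (a minimal sketch; the masking function, table, and column names are placeholders):

# Minimal sketch: define a Python function, register it as a Spark UDF, and use it
# from both the DataFrame API and Spark SQL.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").enableHiveSupport().getOrCreate()


def mask_account(account_id):
    # Keep only the last four characters of an identifier.
    if account_id is None:
        return None
    return "****" + account_id[-4:]


mask_account_udf = udf(mask_account, StringType())               # DataFrame API
spark.udf.register("mask_account", mask_account, StringType())   # Spark SQL / Hive

df = spark.createDataFrame([("ACC1234567",), ("ACC7654321",)], ["account_id"])
df.withColumn("masked", mask_account_udf("account_id")).show()

df.createOrReplaceTempView("accounts")
spark.sql("SELECT account_id, mask_account(account_id) AS masked FROM accounts").show()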

Education (2010-2014)

Bachelor’s in Computer Science Engineering – KL University, Vijayawada, India


