
Data Engineer Business Intelligence

Location:
Bentonville, AR
Posted:
June 04, 2025


Resume:

Srinivas Dasari

DATA ENGINEER

**************@*****.***

+1-479-***-****

PROFESSIONAL SUMMARY

IT Consultant with 8 years of experience in Technology Consulting, specializing in Data Engineering and Data & Analytics. I have delivered end-to-end solutions involving Enterprise Data Warehousing (EDW), Data Lakes, and Business Intelligence (BI) projects for clients across the Finance, Healthcare, and Insurance sectors. My expertise lies in data transformation, with hands-on experience designing and developing robust DWH systems and BI applications using tools such as Informatica PowerCenter, Informatica BDM, IICS, Snowflake, PL/SQL, and other industry-leading BI platforms. I have worked extensively with databases including Oracle, SQL Server, DB2, Snowflake, Hive, and MySQL, using tools such as Oracle SQL Developer and Toad. Additionally, I bring strong experience with AWS cloud services, particularly S3, to support scalable big data development. My background also includes building and managing data streaming applications using Big Data technologies such as Hadoop, MapReduce, HDFS, Hive, Spark, and PySpark, making me well equipped to drive data-driven transformation initiatives.

Delivered large-scale data platform modernization by migrating pipelines from GCP BigQuery, Snowflake, and Teradata to AWS Databricks, streamlining cost, performance, and operational efficiency.

Designed modular PySpark frameworks integrated with Delta Lake to enable ACID transactions, schema evolution, and version-controlled data zones (bronze, silver, gold).
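
For illustration, a minimal PySpark/Delta Lake sketch of the bronze-to-silver promotion pattern described above; the paths, table names, and columns are assumed for the example, not taken from the resume:

```python
from pyspark.sql import SparkSession, functions as F

# Assumes a Databricks-style cluster where Delta Lake is already available.
spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze zone: land raw events as-is, append-only.
raw = spark.read.json("s3://example-bucket/raw/sales/")  # placeholder path
(raw.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")   # allow schema evolution for new columns
    .save("/delta/bronze/sales"))

# Silver zone: cleanse and deduplicate into a curated table.
bronze = spark.read.format("delta").load("/delta/bronze/sales")
silver = (bronze
          .dropDuplicates(["order_id"])            # assumed business key
          .withColumn("ingest_date", F.current_date()))
(silver.write.format("delta")
       .mode("overwrite")
       .save("/delta/silver/sales"))
```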

Orchestrated over 100 batch and streaming workflows using Apache Airflow and Databricks Workflows, with SLA monitoring, automated retries, and event-based triggers for critical supply chain and sales data pipelines.
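
An illustrative Airflow DAG skeleton showing the retry and SLA pattern referenced above; the DAG name, schedule, and callable are assumptions for the sketch:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "owner": "data-eng",
    "retries": 3,                          # automated retries on failure
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=2),             # SLA monitoring per task
}

def run_sales_load(**context):
    # Placeholder: submit the Databricks / PySpark job here.
    pass

with DAG(
    dag_id="sales_pipeline",               # illustrative DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
    default_args=default_args,
) as dag:
    load_sales = PythonOperator(
        task_id="load_sales",
        python_callable=run_sales_load,
    )
```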

Developed Kafka-based ingestion pipelines to capture and process real-time sales, shipment, and inventory events with minimal latency.
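
A minimal Spark Structured Streaming sketch of this kind of Kafka ingestion; the broker, topic, schema, and checkpoint locations are placeholders:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-demo").getOrCreate()

event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("store_id", StringType()),
    StructField("quantity", IntegerType()),
    StructField("event_ts", TimestampType()),
])

# Read sales events from Kafka as a streaming DataFrame.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "sales_events")                # placeholder topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Land the parsed events into a Delta table with checkpointed, append-only writes.
query = (events.writeStream.format("delta")
         .option("checkpointLocation", "/delta/_checkpoints/sales_events")
         .outputMode("append")
         .start("/delta/bronze/sales_events"))
```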

Tuned Spark workloads using AQE, broadcast joins, and partition pruning to reduce job runtime by up to 60% and optimize resource utilization in Databricks.
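
A sketch of the tuning levers named above (AQE, broadcast joins, partition pruning); the threshold value and table layout are assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-demo").getOrCreate()

# Adaptive Query Execution: let Spark re-optimize shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(50 * 1024 * 1024))  # illustrative 50 MB

sales = spark.read.format("delta").load("/delta/silver/sales")    # assumed partitioned by sale_date
stores = spark.read.format("delta").load("/delta/silver/stores")  # small dimension table

# Partition pruning: filtering on the partition column skips irrelevant partitions entirely.
recent = sales.where(F.col("sale_date") >= "2024-01-01")

# Broadcast join: ship the small dimension to every executor instead of shuffling the fact table.
joined = recent.join(F.broadcast(stores), "store_id")
```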

Embedded data quality frameworks with row-level validation, null and threshold checks, referential integrity, and audit controls across ingestion and transformation layers.
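
An illustrative row-level validation helper in PySpark showing the kind of null, threshold, and referential checks described; the column names are assumptions:

```python
from pyspark.sql import DataFrame, functions as F

def validate_sales(sales: DataFrame, stores: DataFrame) -> dict:
    """Return simple data-quality metrics; in practice these would feed an audit table."""
    total = sales.count()
    null_keys = sales.filter(F.col("order_id").isNull()).count()      # null check on the key
    bad_qty = sales.filter(F.col("quantity") <= 0).count()            # threshold check
    orphans = sales.join(stores, "store_id", "left_anti").count()     # referential integrity

    return {
        "row_count": total,
        "null_key_pct": null_keys / total if total else 0.0,
        "bad_quantity_rows": bad_qty,
        "orphan_store_rows": orphans,
    }
```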

Built dimensional and SCD2 data models on Snowflake and Delta Lake for scalable and consistent analytics, enabling seamless joins and reliable time-series reporting.
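
A simplified Delta Lake MERGE sketch of the SCD Type 2 pattern (expire the changed current row, then append the new version); table paths, the change-detection column, and keys are illustrative only:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-demo").getOrCreate()

dim = DeltaTable.forPath(spark, "/delta/gold/dim_customer")
updates = spark.read.format("delta").load("/delta/silver/customer_changes")

# Step 1: close out current rows whose tracked attribute changed.
(dim.alias("d")
    .merge(updates.alias("u"), "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.address <> u.address",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# Step 2: append the new current versions with open-ended effective dates.
new_rows = (updates
            .withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))
new_rows.write.format("delta").mode("append").save("/delta/gold/dim_customer")
```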

Partnered with cross-functional engineering, analytics, and ML teams to translate evolving business logic into reusable, production-grade PySpark and SQL transformation layers.

TECHNICAL SKILLS

Big Data & Analytics: Spark, PySpark, Databricks, Databricks Workflows, Delta Lake, Delta Live Tables (DLT), Hadoop, HDFS, YARN, MapReduce, Hive, Sqoop, Kafka, Airflow, Snowflake, Tableau, Power BI, ETL/ELT.

Cloud Platforms: AWS (Amazon S3, EC2, EMR, Athena); Azure (Data Factory, Data Lake Storage, Azure SQL, Azure Databricks); GCP (BigQuery, Cloud Storage)

Programming Languages & Scripting: Python, Java, Scala, SQL, Spark SQL, Snowflake SQL, HQL, Unix, Linux Shell Scripting

Data Engineering & Operations: Data Extraction, Ingestion, Cleaning, Transformation, Mapping, Aggregation, Joins, Validation, Standardization, Deduplication, Profiling, Imputation, Enrichment, Change Data Capture (CDC)

Databases: PostgreSQL, MySQL, Oracle, SQL Server, SAP HANA, Teradata, Amazon Aurora, Amazon RDS, Azure SQL.

Databricks Expertise: PySpark, Spark SQL, Streaming, Delta Lake, DLT, Workflows, Jobs, Notebooks, Unity Catalog, dbutils, Secret Scopes, Git Integration, Repos

Version Control & CI/CD: Bitbucket, Jenkins, GitHub Actions, CI/CD, Azure Pipelines, Bitbucket Pipelines, Databricks Repos, Airflow DAG Versioning

PROFESSIONAL EXPERIENCE

Senior Data Engineer

KANEXY LTD, UK (Remote)

Jan 2022 – April 2025

Performed data extraction, transformation, and loading (ETL) between systems using Informatica PowerCenter, Informatica Cloud (IICS), and Informatica BDM for data migration and legacy-to-cloud data integration projects.

Utilized Apache Spark and PySpark for distributed data processing and implemented complex algorithms using Spark's Resilient Distributed Datasets (RDDs).

Integrated the Snowflake Cloud Data Warehouse with AWS S3 buckets to access files, and implemented continuous data loading and ingestion into Snowflake using Snowpipe.
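
A minimal sketch of Snowpipe-style continuous loading from S3, issued through the Snowflake Python connector; the credentials, stage, pipe, table, and storage integration names are placeholders:

```python
import snowflake.connector

# Connection parameters below are placeholders.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="LOAD_WH", database="EDW", schema="STAGING",
)
cur = conn.cursor()

# External stage pointing at the S3 landing bucket.
cur.execute("""
    CREATE STAGE IF NOT EXISTS sales_stage
    URL = 's3://example-bucket/landing/sales/'
    STORAGE_INTEGRATION = s3_int
""")

# Snowpipe: auto-ingest new files from the stage as they arrive.
cur.execute("""
    CREATE PIPE IF NOT EXISTS sales_pipe AUTO_INGEST = TRUE AS
    COPY INTO STAGING.SALES_RAW
    FROM @sales_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
""")
```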

Developed PySpark applications to process and analyze large-scale datasets, utilizing the distributed computing capabilities of Apache Spark.

Stored extracted data in Hadoop HDFS by implementing ETL pipelines designed to handle big data volumes.

Worked with SCD Type 2 and with various Informatica concepts such as partitioning, performance tuning, bottleneck identification, deployment groups, and the Informatica scheduler.

Environment: PySpark, Spark SQL, Delta Lake, Apache Airflow, Databricks, Kafka, GCP BigQuery, Snowflake, Teradata, Hive, HQL, AWS (S3, EMR, EC2, Athena), Git, Jenkins, SQL, CI/CD, Data Quality Frameworks, Dimensional Modeling, SCD2.

Senior Software Engineer

Social Tek AI ML Business Solutions Pvt Ltd

Client: Walt Disney

Oct 2015 – Jan 2020

• Analyzed business requirements and framed the business logic for the ETL process; maintained the ETL process using Informatica PowerCenter, Informatica BDM, and IICS.

• Parsed high-level design specs into simple ETL coding and mapping standards. Worked with the team to convert Trillium processes into Informatica IDQ objects.

• Hands-on experience using AWS services including EC2, S3, CloudFront, SNS, CloudWatch, and CloudFormation, focusing on high availability.

• Proficient in extracting structured and unstructured data from various sources such as Salesforce, AWS, and GCP.

• Configured and managed AWS IAM (Identity and Access Management) roles, policies, and permissions, ensuring secure access control across AWS services.

• Worked in an Agile methodology, participated in daily/weekly team meetings, guided two groups of seven developers in Informatica PowerCenter/Data Quality (IDQ), peer-reviewed their development work, and provided technical solutions. Proposed ETL strategies based on requirements.

• Used Python and Django for graphics generation, XML processing, data exchange, and business logic implementation.

• Used the debugger to identify bugs in existing mappings by analyzing data flow and evaluating transformations, and created mapplets that provide reusability across mappings.

Environment: PySpark, Hive, SQL, Hadoop (HDFS), Apache Airflow, AWS EMR, S3, EC2, Excel, ETL Pipelines, Data Validation, Data Cleansing, Aggregation, Reporting & Insights, Compliance Analytics, Student Information Systems (SIS).

EDUCATION

Bachelor of Technology in Electronics and Instrumentation from JNTU Kakinada (L.B.R.C.E., Mylavaram), 2011–2015.

MSc (Distinction) in Cybersecurity with Project Management, University of Bedfordshire, United Kingdom (2020–2022).
