Lekhraj Rawal
Sr. Data Analyst/Engineer
Email: **************@*****.*** Phone: 385-***-**** linkedin.com/in/lv1001
Professional Summary:
● 7+ years of experience managing the complete data lifecycle, including data ingestion, transformation, storage, governance, visualization, and advanced analytics across diverse business domains.
● Proven expertise in building data pipelines and ELT/ETL workflows for both batch and real-time processing using Apache Airflow, AWS Glue, Azure Data Factory, and Informatica.
● Deep proficiency in Python, Scala, SQL, R, and Java, building robust and scalable data engineering solutions, with a strong emphasis on code optimization and performance tuning.
● Extensive experience with cloud platforms (AWS, Azure, and Google Cloud), leveraging services such as AWS Lambda, Kinesis, Azure Synapse Analytics, Azure Databricks, and Google Dataflow for distributed computing and real-time analytics; also experienced with enterprise IT and security tools including Oracle Identity Manager, ZScaler, SCCM, CrowdStrike, and SolarWinds.
● Advanced skills in Power BI, creating complex data models, DAX measures, paginated reports, and integrating Power BI with Azure services (e.g., Azure Analysis Services, Azure Synapse). Expertise in Row-Level Security (RLS) and Power BI Service administration for enterprise-scale deployments.
● Hands-on experience with Snowflake, Redshift, BigQuery, and Delta Lake for warehousing, and integrating modern lakehouse architectures using Databricks and Apache Hudi/Iceberg.
● Expertise in Big Data ecosystems: Apache Spark (PySpark & SparkSQL), Hadoop, Kafka, Flink, and Delta Live Tables, enabling high-throughput and low-latency data processing pipelines.
● Solid experience in containerization (Docker), orchestration (Kubernetes, AKS, EKS), and deploying microservices and data APIs, ensuring fault-tolerant and scalable applications.
● Advanced data science and AI experience: building and deploying machine learning models using TensorFlow, Scikit-learn, MLflow, and SageMaker, and integrating AI insights directly into BI dashboards.
● Strong experience with CI/CD pipelines (Azure DevOps, GitHub Actions, GitLab CI/CD) and automated testing frameworks, ensuring reliable and maintainable data workflows.
● Advanced visualization skills using Power BI, Tableau, Looker, and Python libraries (Matplotlib, Seaborn, Plotly), producing insightful, interactive dashboards for business intelligence and executive reporting.
● Well-versed in RESTful APIs, GraphQL, and event-driven architectures to integrate disparate data systems, enabling seamless data exchange.
● Familiarity with DataOps, DevSecOps practices, and tools such as Great Expectations for data validation, ensuring robust data engineering pipelines.
● Deep knowledge of Agile/Scrum, Kanban frameworks, and project management tools like Jira, Confluence, and Azure Boards, driving iterative delivery and cross-team collaboration.
● Strong communication and documentation skills, adept at mentoring junior engineers and promoting best practices in data engineering and analytics.
Technical Skills:
Programming Languages: Python, Scala, SQL, R, Java, Bash/Shell scripting
Big Data Technologies: Apache Hadoop, HDFS, Apache Spark (Core, SQL, Streaming), PySpark, Hive, Pig, Sqoop, Kafka, Apache Airflow, Delta Lake, Apache Flink, Apache Hudi, Apache Iceberg
Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP)
Data Storage and Databases: AWS RDS, DynamoDB, Azure SQL Database, Google Cloud Spanner, MySQL, PostgreSQL, Cassandra, MongoDB, Oracle Database, SQL Server, Cosmos DB
Data Warehouse & Lakehouse: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics, Databricks Lakehouse, Delta Lake
Infrastructure as Code (IaC): Terraform, AWS CloudFormation, Azure Resource Manager (ARM), Azure Bicep
Containerization & Orchestration: Docker, Kubernetes (AKS, EKS, GKE), Helm
CI/CD: Jenkins, Azure DevOps, GitHub Actions, GitLab CI/CD, CircleCI
ETL/ELT & Workflow Orchestration Tools: Informatica, Azure Data Factory (ADF), SSIS, dbt (Data Build Tool), Apache NiFi, Talend
Version Control & Automation: Git (GitHub, GitLab, Bitbucket), Ansible, Terraform
Monitoring & Alerting: AWS CloudWatch, Azure Monitor, Prometheus, Grafana, Nagios, Splunk, Datadog
Data Visualization & BI Tools: Power BI (incl. DAX, Power Query, RLS, Paginated Reports), Tableau, Looker, Qlik Sense, Seaborn, Matplotlib, Plotly
Data Formats & Messaging: JSON, XML, Parquet, Avro, ORC, CSV, Apache Kafka, RabbitMQ
Machine Learning & AI Libraries: Pandas, NumPy, Scikit-Learn, TensorFlow, Keras, PyTorch, SciPy, MLflow, XGBoost, LightGBM
Data Governance & Quality: Great Expectations, Apache Atlas, Collibra, Informatica Data Quality
API & Integration: RESTful APIs, GraphQL, gRPC, SOAP, Webhooks, Apache Camel
Security & Compliance: IAM (AWS, Azure), VPC, Encryption (KMS, SSL/TLS), GDPR, HIPAA
Project Management & Collaboration: Jira, Confluence, ServiceNow, Trello, Slack, Agile (Scrum, Kanban)
Operating Systems: macOS, Windows, Linux (Ubuntu, RedHat, CentOS), UNIX
Professional Experience:
Sr. Data Engineer
Caterpillar - Chicago, IL
July 2021 - Present
● Designed and developed complex Power BI dashboards and reports tailored for supply chain and operations teams using Clarity EPMO and ServiceNow datasets.
● Built robust data models and implemented Row-Level Security (RLS) in Power BI to manage data access by department and user roles.
● Connected and analyzed structured and semi-structured data from Hadoop using Hive and Impala to support executive reporting.
● Automated ETL pipelines using Informatica and Power Automate for efficient data movement across Oracle Identity Manager and ServiceNow.
● Enhanced dashboard performance by optimizing DAX measures, data refresh schedules, and incremental loading techniques.
● Conducted advanced data cleansing and transformation using SQL, Python, and Power Query to ensure data quality for analytics.
● Integrated external data sources including Splunk, ZScaler, and CrowdStrike for unified security and performance reporting.
● Provided end-user training for Power BI tools, enabling business teams to generate their own ad hoc reports.
● Collaborated with security and compliance teams to implement version control and manage updates in PowerApps and Power BI solutions.
● Managed the creation of audit-compliant reporting systems utilizing Asignet and SCCM logs for compliance tracking.
● Partnered with data science teams to structure input datasets for predictive modeling use cases.
● Proactively evaluated emerging technologies (Kafka, Flink, GraphQL) to propose enhancements to Caterpillar’s data engineering strategy.
Tech Stack: Python, PySpark, REST APIs, AWS (Redshift, Athena, ECS, EKS, RDS, DynamoDB, EMR, Glue, Lambda, QuickSight, CloudWatch, Lake Formation, Step Functions, Kinesis), Hadoop, HDFS, Kafka, Airflow, Sqoop, Hive, Pig, Spark, Power BI, Tableau, dbt, JSON, XML, Parquet, BitBucket, Pandas, NumPy, Scikit-Learn, TensorFlow, Jenkins, CloudFormation, Ansible, Kubernetes, Prometheus, Grafana, Agile, Scrum, Jira
Data Engineer
Optum - Eden Prairie, Minnesota
Nov 2019 - June 2021
● Developed dynamic Power BI dashboards for financial and operational metrics sourced from Clarity EPMO, ServiceNow, and Oracle DB.
● Reshaped and transformed raw data using SQL and Python to create analytics-ready datasets for business intelligence use.
● Built scalable data warehouse models to support self-service BI and enterprise reporting.
● Designed and executed ETL workflows with Informatica and Visual Studio to migrate data between on-prem systems and cloud environments.
● Applied Row-Level Security (RLS) in reports to restrict access based on user department and region.
● Tuned queries and datasets to ensure optimal report rendering performance and efficient data refresh cycles.
● Automated alerting and reporting systems integrated with SolarWinds, SCCM, and ZScaler for proactive system health monitoring.
● Maintained PowerApps components for internal tools, applying version control practices and CI/CD pipelines for deployment.
● Worked closely with stakeholders to gather business requirements and iterate on dashboard prototypes.
● Accessed data from Hadoop clusters using Impala for batch and real-time analysis.
● Provided support and documentation for Power BI adoption and knowledge sharing within business units.
● Engaged in continuous R&D of new data engineering technologies (Delta Lake, Airflow, dbt) to propose strategic improvements to Optum’s data platform.
Tech Stack: AWS (EC2, S3, Lambda, RDS, Step Functions, Glue, CloudWatch), Databricks (Spark SQL, PySpark, Scala), Snowflake, Kafka, Hadoop, Hive, Pig, Sqoop, Informatica, MySQL, Cassandra, Docker, Kubernetes, Terraform, Jenkins, GitHub, Nagios, Power BI, Tableau, JSON, XML, Parquet, Jira, Agile, Scrum
Data Analyst
Comcast
Aug 2017 - Oct 2019
● Proficient in Python for data manipulation, analysis, and scripting.
● Led development of enterprise-wide dashboards in Power BI with integrations from Splunk, ServiceNow, and Asignet for IT performance metrics.
● Built and maintained robust ETL pipelines using Informatica to transfer and cleanse data from multiple operational systems.
● Applied advanced DAX and Power Query transformations to handle complex business logic in reporting.
● Engineered data ingestion and enrichment processes using Hive and Impala on Hadoop infrastructure.
● Collaborated with stakeholders from marketing, finance, and operations to customize BI solutions that aligned with KPIs.
● Developed PowerApps solutions to streamline business processes, integrating connectors for Clarity EPMO and Oracle Identity Manager.
● Implemented security layers and user-based access control within Power BI and PowerApps using RLS and role mapping.
● Created self-service BI portals enabling users to interact with reports and export customized data views.
● Optimized performance of dashboards by reducing data model size, applying indexing, and scheduling asynchronous refreshes.
● Documented end-to-end data processes and conducted Power BI training workshops to ensure seamless transition to business users.
● Utilized SolarWinds and SCCM logs for automated asset tracking and IT operations monitoring.
● Wrote complex SQL queries for database management and data extraction.
Tech Stack: Python, SQL, Azure, Hadoop, Spark, CDH, Oracle, CosmosDB, Hive, Pig, REST API, Azure DevOps, Tableau, ETL, ADF, Git, Maven, Splunk, XML, Jira, Agile.
Education:
Master's - Business Analytics - Lewis University, Romeoville, IL.