Karthik Gollakore
Data Engineer
IL, USA +1-618-***-**** *******.***********@*****.***
SUMMARY
Results-driven Data Engineer with 5+ years of experience designing, developing, and maintaining scalable data solutions across cloud and on-premises environments. Proven expertise in architecting robust ETL pipelines, optimizing data processing workflows, and implementing big data solutions using Python, SQL, AWS, Azure, Databricks, and Snowflake. Skilled in integrating diverse data sources, applying advanced analytics, and delivering actionable insights through BI tools. Adept at ensuring data integrity, governance, and compliance while collaborating with cross-functional teams to achieve business objectives.

TECHNICAL SKILLS
Programming Languages: Python, SQL, Java, Scala, PySpark, R, Shell Scripting, JavaScript
Data Engineering: ETL Development, Data Modeling, Data Lakes, Data Migration, Data Quality Frameworks, Data Pipelines
Big Data Technologies: Apache Spark, Hadoop, Hive, Kafka, HDFS, Flume
Cloud Platforms: AWS (S3, Redshift, Glue, Lambda, EMR), Azure (ADF, Synapse, Databricks, ADLS), GCP (BigQuery, Dataflow)
Databases: MySQL, PostgreSQL, MongoDB, Oracle, Teradata, SQL Server, Cassandra
Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse
Data Visualization: Power BI, Tableau, Excel (Advanced), Power Query, Looker
DevOps & CI/CD: Docker, Kubernetes, Jenkins, Git, Azure DevOps, GitHub Actions
Data Governance: Data Lineage, GDPR Compliance, Data Security, Metadata Management
APIs & Integration: RESTful APIs, JSON, XML, YAML, GraphQL
Version Control & Collaboration: Git, Jira, Confluence, ServiceNow
Testing & Automation: Pytest, Unit Testing, Automated Data Validation Scripts, CI/CD Test Integration

PROFESSIONAL EXPERIENCE
Data Engineer Northern Trust USA Jan 2025 – Current
• Designed and maintained end-to-end ETL pipelines using AWS Glue, Python, SQL, and MongoDB, resulting in a 30% increase in processing efficiency and significantly improving analytics readiness.
• Integrated the ELK stack (Elasticsearch, Logstash, Kibana) to centralize log collection and visualization, improving operational visibility and enabling proactive troubleshooting that reduced pipeline issue resolution time by 35%.
• Automated end-to-end data workflows in Databricks and Azure Data Factory, streamlining ETL processes and reducing deployment time by 25% while maintaining consistent transformation logic and high data quality across environments.
• Developed interactive Power BI dashboards for KPI tracking, integrating real-time data sources and advanced DAX measures to provide business leaders with actionable insights, accelerating data-driven decision-making.
• Optimized PySpark transformations and complex data joins by refining partitioning, caching, and shuffle operations, reducing batch job execution time by 20% and significantly increasing overall pipeline throughput.

Data Engineer Mphasis India June 2020 – July 2023
• Architected and deployed large-scale data pipelines using Python, SQL, and Apache Spark to process multi-terabyte datasets, optimizing resource allocation and parallelization to enhance scalability and reduce processing delays.
• Implemented ETL solutions using Azure Data Factory and AWS Glue to orchestrate and automate data ingestion from multiple sources, increasing analytics data availability by 40% and improving the timeliness of business reporting.
• Optimized complex queries in Redshift and Snowflake, reducing execution times by 35% and lowering cloud computing costs.
• Integrated APIs, streaming data, and flat-file sources into a unified analytics environment, standardizing formats and automating ingestion processes to enable more comprehensive, real-time data analysis across business domains.
• Designed Power BI and Tableau dashboards to visualize KPIs and trends, improving data-driven decision-making across teams.

Jr. Data Engineer Hexaware Technologies India Nov 2018 – May 2020
• Automated ETL workflows using Python and SQL, reducing manual intervention by 50% and improving the reliability of critical data pipelines.
• Worked with Apache Spark and Hadoop ecosystems for large-scale data processing, enabling faster analysis of high-volume datasets.
• Developed and maintained analytical data models in Snowflake and Redshift, improving reporting accuracy and reducing query times.
• Supported migration of on-premise data warehouses to AWS cloud, ensuring minimal downtime and zero data loss during transitions.
• Prepared optimized datasets for Power BI dashboards, enabling faster load times and better visualization performance.
• Developed and maintained data integration solutions using SSIS and Informatica, facilitating data access for analytics and reporting, and used Git for version control in collaboration with data engineering and data science colleagues.
• Engineered data pipelines, improving processing efficiency by 30% through optimization techniques, including parallel processing and data partitioning.
EDUCATION
Master of Science in Computer Information Systems – Southern Illinois University, USA Aug 2023 – May 2025
Bachelor of Technology in Computer Science – Madanapalle Institute of Technology and Science, India May 2015 – Jul 2019