Data Scientist / Analytics Engineer
PropStream
All India, Solapur • 1 month ago
Experience: 5 to 9 Yrs
Job Description
As a senior Databricks Architect, your primary responsibility will be to design, build, and govern the Lakehouse data platform from scratch. You will be in charge of the entire data infrastructure, ensuring smooth data flows from raw ingestion through transformation to serving, and establishing engineering standards for the data organization. Your role is pivotal in driving the adoption of Databricks, Unity Catalog, and modern Lakehouse patterns across all data products and pipelines.
**Key Responsibilities:**
- Design and implement a production-grade Medallion Architecture (Bronze / Silver / Gold) across all data pipelines.
- Define data modeling standards and schema evolution policies for the Lakehouse.
- Architect end-to-end data flows covering ingestion (streaming and batch) through transformation and serving layers.
- Lead the setup, configuration, and rollout of Unity Catalog as the centralized governance layer for all data assets.
- Implement fine-grained access control, data masking policies, and audit logging.
- Establish data lineage tracking for end-to-end visibility across all pipelines.
- Define and enforce data classification and sensitivity frameworks for PII and regulated data assets.
- Build and maintain production-grade data pipelines using PySpark, Delta Live Tables (DLT), and Databricks Workflows / Jobs.
- Design modular, reusable pipeline patterns including incremental ingestion, CDC, and full-refresh strategies.
- Implement robust pipeline observability with logging, alerting, lineage tracking, and SLA monitoring.
- Leverage Databricks Repos for CI/CD integration, managing code promotion across environments.
- Optimize Spark execution plans to identify and resolve performance bottlenecks.
- Right-size cluster configurations for BI and ad hoc analytics workloads.
- Utilize Serverless Warehouses and SQL Warehouses to minimize cost and cold-start latency.
- Set up and maintain Databricks Repos with standardized project structures and Git integration.
- Define Python coding standards, notebook best practices, and modular library patterns.
- Establish unit testing and integration testing frameworks for Spark pipelines.
- Configure workspace-level and account-level security.
- Design and enforce network isolation for sensitive data workloads.
- Ensure compliance with data residency and access control requirements.
**Qualifications Required:**
- 5+ years of hands-on experience with Databricks, with at least 2 years in an architect or senior lead role.
- Deep expertise in Unity Catalog and the Medallion Architecture.
- Proven experience designing and deploying production pipelines with Databricks Jobs and Workflows.
- Hands-on experience with Databricks Repos and CI/CD integration.
- Experience configuring and operating Serverless SQL Warehouses and compute for Jobs.
- Proficiency in DataFrames, Spark SQL, window functions, broadcast joins, and UDFs.
- Experience with structured streaming and micro-batch processing patterns.
- Ability to diagnose and resolve Spark performance issues using Spark UI and event logs.
- Advanced Python skills with a software engineering background.
- Experience building modular Python libraries for data engineering use cases.
- Familiarity with common data engineering libraries like pandas, pydantic, and great_expectations.
- Experience deploying Databricks on AWS.
- Familiarity with cloud-native storage and infrastructure-as-code tooling.
- Relevant certifications such as Databricks Certified Data Engineer Professional.
Skills Required
Python
AWS
Elasticsearch
Apache Kafka
Databricks
Unity Catalog
Lakehouse patterns
PySpark
Delta Live Tables (DLT)
Databricks Workflows
Spark SQL
DataFrames
window functions
broadcast joins
UDFs
pandas
pydantic
great_expectations
S3/ADLS
Terraform
Databricks Certified Data Engineer Professional
Databricks Certified Associate Developer for Apache Spark
dbt (data build tool)
MLflow
Posted on: March 26, 2026