Senior Data Scientist
CertiK
Data Science
New York, NY, USA
Today, CertiK supports thousands of enterprise clients and Web3 projects globally, with a distributed international team spanning North America, Asia, and Europe. The company is backed by leading investors including Coatue, Goldman Sachs, Insight Partners, and Sequoia Capital, and has been recognized by organizations such as the World Economic Forum and CB Insights for its contributions to blockchain security innovation.
The primary responsibility of this role is to build and maintain ETL pipelines that process large datasets from APIs, databases, and third-party platforms, enabling real-time analytics for the team. The role also automates data preprocessing (cleaning, normalization, and validation) for client accounts, using rule-based logic and statistical checks to ensure data quality and to prepare analysis-ready datasets for modeling and reporting.
Responsibilities
- Analyze large-scale blockchain, transactional, and social media datasets to identify patterns, trends, anomalies, and risk indicators.
- Develop and apply machine learning models (graph-based algorithms and NLP techniques) for threat detection, behavioral analysis, and monitoring.
- Perform feature engineering, model training, testing, and validation to ensure accuracy, robustness, and interpretability.
- Design and implement scalable data pipelines, ETL processes, and CI/CD workflows for ingesting, preprocessing, and aggregating blockchain and social media data.
- Create dashboards and visualizations that deliver actionable insights and provide data-driven guidance for strategic planning.
- Collaborate with engineering, product, and business teams to translate analytical requirements into scalable data science solutions.
Requirements
- Master’s degree in Data Science, Statistics, or a related field.
- Sound knowledge of feature engineering, model evaluation, and validation, as well as on-chain patterns and risk-analysis and threat-detection methodologies.
- In-depth understanding of blockchain and distributed ledger data structures and analytics.
- Strong ability to apply machine learning and statistical modeling techniques to large-scale datasets.
- Expertise in analyzing graph-based, text-based, or transactional data.
- Familiar with cloud platforms (AWS, Azure, GCP) and Spark-based distributed computing systems (e.g., Databricks).
- Proficient in Python, SQL (PostgreSQL, MySQL), NoSQL databases, and ETL tools (Apache Airflow).
The target annual salary for this role is $110,000 to $125,000. The exact compensation at which this job is filled will be determined by the skills and experience of qualified candidates.