Senior Data Scientist / Machine Learning Engineer
1372 Peachtree St NE, Atlanta, GA 30309, USA
PubMatic is a digital advertising technology company for premium content creators. The PubMatic platform empowers independent app developers and publishers to control and maximize their digital advertising businesses. PubMatic’s publisher-first approach enables advertisers to maximize ROI by reaching and engaging their target audiences in brand-safe, premium environments across ad formats and devices. Since 2006, PubMatic has created an efficient, global infrastructure and remains at the forefront of programmatic innovation. Headquartered in Redwood City, California, PubMatic operates 13 offices and six data centers worldwide.
We are hiring immediately for a strong Data Scientist or Machine Learning Engineer to develop, implement, and extend data-intensive machine learning software for real-time auctioning, ad inventory estimation, and audience segmentation.
You will design and implement core components of our algorithms, as well as model and monetize the large amounts of data that PubMatic generates daily.
Working with our Data Science and AdServing teams, you will apply Machine Learning to help get things done.
- Development and implementation of data-intensive machine learning software for real-time auctioning, ad inventory estimation, audience segmentation, and other AdTech applications
- Working with data scientists, product managers, and software engineers to develop and support the software for new Machine Learning products
- Ensuring excellence in delivery to internal and external customers
- MS / PhD in STEM field
- 3+ years of hands-on industry experience designing and building large-scale ML algorithms and ETL pipelines that are well-designed, cleanly coded, well-documented, operationally stable, and delivered on time
- 5+ years of total analytical work, including academic research
Solid experience with a mix of the following:
- Python or R with ML libraries (scikit-learn, NumPy, caret, e1071), plus CPU/GPU parallelization, matrix algebra, vectorization, linear programming, lambda programming, OOP
- At least one of the DL frameworks (TensorFlow, PyTorch, Caffe, Theano, Keras, or similar)
- Graduate statistics and probability (inference, hypothesis testing, p-value, ANOVA, CLT, LLN, Bayes’ theorem, A/B testing, combinatorics, PDF/CDF, joint/conditional/marginal densities)
- Vector calculus (gradients, Jacobians, partial derivatives and integrals, optimization)
- Linear algebra (eigenvalues/eigenvectors, inverses, decompositions, orthogonality, multilinear algebra)
- Time series (ARIMA, GARCH, forecasting, Kalman filter)
- Shallow ML algorithms: regressions, SVM, k-means, kNN, NB, HMM, PCA, NMF, SVD, XGBoost, decision trees, ensemble methods (random forest)
- Deep NN algorithms: MLP, RNN, LSTM, CNN, GRU
- ML concepts: backprop, hyperparameter tuning (Bayesian optimization, grid/random search), regularization, learning rate, optimization
- Advanced work with SQL or NoSQL, including nested/join/aggregate queries, stored procedures, window functions (OVER / PARTITION BY), basic statistical functions
- Cloud compute engines (AWS, Azure, GCP, and similar), ML on clusters of GPUs, SageMaker, Jupyter
- Excellent communication skills, cultural fit, and natural curiosity about new ML developments and domain expertise
Nice to have:
- Experience in Programmatic advertising and RTB
- Deep reinforcement learning (Bellman equations, MDP, policy optimization, credit assignment, multi-agent, …)
- Proficiency with Spark (MLlib, GraphX), Hadoop, Kafka, Hive
- Scala, Java, C/C++
- Record of STEM publications in top journals or conferences
- High ranking in Kaggle competitions
PubMatic is proud to be an equal opportunity employer; we don’t just value diversity, we promote and celebrate it.
We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
All your information will be kept confidential according to EEO guidelines.