Data Engineer - Machine Learning Product Catalogue

Full-time

Job Description

The salary range for this position is (contract of employment):

mid: 14 200 - 19 690 PLN in gross terms
senior: 18 400 - 25 410 PLN in gross terms

A hybrid work model based on arrangements agreed between the leader and the team

We are looking for a Data Engineer with a focus on data processing and preparation, and on the deployment and maintenance of our ML/data projects. Join our team to build your skills in deploying data-based processes and MLOps approaches to Machine Learning, and to share those skills within the team.

We are looking for people who have:

2+ years of hands-on experience in Python and its data processing toolset (pandas, NumPy)
Experience in process/solution monitoring
Knowledge and experience in processing large datasets with Big Data tools, especially Spark (PySpark)
Proficiency in using development tools (git, issue tracking, pull requests, code reviews etc.) and familiarity with software engineering best practices (PEP8, documentation, CI/CD, testing, automation etc.)
DevOps experience
Experience in writing advanced and efficient SQL queries (especially in a GCP/BigQuery environment)
Experience in working on cloud solutions and architecture (GCP, AWS, Azure)
Understanding of AI-related concepts (classification vs clustering, modeling, precision/recall metrics, model evaluation etc.) and a demonstrated ability to use those metrics to back up assumptions and evaluate outcomes
Positive attitude and ability to work in a team
Good communication skills and proactivity in seeking, clarifying and understanding information from end users and stakeholders

An additional advantage would be:

Previous experience in building, evaluating or deploying ML/AI-based solutions
Knowledge of ML libraries (sklearn, xgboost, lgbm)
MLOps practical experience
Previous experience with GCP tools for data processing (e.g. BigQuery, Dataproc) and workflow automation solutions (e.g. Airflow)
GCP certifications and/or hands-on experience in GCP, including ML/AI tools (Vertex AI)

Our tech stack:

Python, BigQuery SQL, Spark
Google Cloud Platform (Airflow, BigQuery, Composer)
GitHub (code storage, CI/CD, hosting our own Data Science Python library)

What we offer:

A hybrid work model that you will agree on with your leader and the team. We have well-located offices (with fully equipped kitchens and bicycle parking facilities) and excellent working tools (height-adjustable desks, interactive conference rooms)
Annual bonus of up to 10% of your gross annual salary (depending on your annual assessment and the company’s results)
A wide selection of fringe benefits in a cafeteria plan – you choose what you like (e.g. medical, sports or lunch packages, insurance, purchase vouchers)
English classes that we pay for, tailored to the specific nature of your job
Working in a team you can always count on: we have top-class specialists and experts in their fields on board
A high degree of autonomy in terms of organizing your team’s work; we encourage you to develop continuously and try out new things
Hackathons, team tourism, a training budget and an internal educational platform, MindUp (including training courses on work organization, means of communication, motivation to work and various technologies and subject-matter issues)
A 16" or 14" MacBook Pro with an M1 processor and 32 GB RAM, or a corresponding Dell with Windows (if you don’t like Macs), and other gadgets that you may need

What will your responsibilities be?

You will be actively responsible for building data processing tools for modeling, analysis and ML – in close cooperation with the Data Science team
You will support the Data Science team in the development of data sources for ad-hoc analyses and Machine Learning projects
You will process terabytes of data using Google Cloud Platform tools (BigQuery, Composer, Dataflow) and PySpark, and optimize processes in terms of their performance and GCP cloud processing costs
You will collect process requirements from project groups and automate tasks related to preprocessing and data quality monitoring, prediction serving, as well as Machine Learning model monitoring, alerting and retraining
You will be responsible for the engineering quality of each project and you will cooperate with your colleagues on engineering excellence

Why is it worth working with us?

Through the supplied data and processes, you will have a meaningful impact on the operation of one of the largest e-commerce platforms in the world
Thanks to the wide range of projects we are involved in, you will never be without an interesting challenge to take on
You will have access to vast datasets (measured in petabytes)
You will get a chance to work in a team of experienced engineers and Big Data specialists who are willing to share their knowledge (incl. with the general public, as part of allegro.tech)
Your professional growth will follow the most recent open-source technological trends
You will have an actual impact on the directions of product development and on the selection of particular technologies – we use the most recent and best technological solutions available, because we align them closely with our needs
We are a full-stack provider – we design, code, test, deploy and maintain our solutions

Send us your CV and learn why it’s #goodtobehere