Principal Big Data Engineer

  • 153 Townsend St, San Francisco, CA 94107, USA
  • Full-time

Job Description

Ancestry’s Big Data Engineering team is expanding! Ancestry is creating groundbreaking technology for algorithmic search, machine learning, natural language processing, and artificial intelligence. The features we’re creating are redefining how people experience genealogy. We are growing our engineering team to meet the need for intelligence-driven user experiences.

The Opportunity

We are looking for a Principal Big Data Engineer to join the team and build cutting-edge technologies. You will be expected to stay up to date on software engineering solutions and be able to apply your vast knowledge and experience in Big Data engineering toward the design, development, and implementation of Ancestry's ML-AI-as-a-service platform.

The ideal candidate will thrive in our highly collaborative workplace and proactively engage with the team's architects, engineers, and primary client group, Ancestry's exceptional global Data Science team, which is in charge of developing groundbreaking intelligence-driven solutions. The platform you help to design and build will support both the deployment of models in production and model development, such as the preparation of data for use by our data scientists (i.e., ETL pipelining). The new ML-AI platform must provide the highly scalable, elastic, and fault-tolerant infrastructure upon which models will be deployed, as well as the interface through which our data science team can "push" their models into production (e.g., APIs and/or a client UI). Prior experience in designing high-capacity Big Data infrastructure is a must for this position! This is a truly unique career opportunity to lead and build a world-class ML-AI-as-a-service platform from the ground up.

If you’re a data engineer who lives and breathes machine learning, this may just be your dream job.

What you will do:

  • Design, develop, and deploy a high-volume ETL pipelining system to manage complex, real-time data collection.
  • Work with the team's architects to develop architectural blueprints, and a long-term technical roadmap, for our ML-AI-as-a-service platform. You must balance your focus on both the immediate needs and on the long-term view (i.e., projected future feature set, capacity, and scalability requirements).
  • Interpret and translate the needs of our global data science team into technical requirements.
  • As the senior-most engineer on your team, you will be the lead engineer of this highly complex production system.
  • Evaluate and recommend tools, technologies and processes to ensure that the services you provide achieve the highest standards of quality and performance.
  • Collaborate with peer organizations (e.g., DevOps, technical support) to prevent and resolve technical issues and provide technical guidance.
  • Focus on scalability, security and availability of all applications and processes.
  • Motivate and mentor team members on required coding standards and best practices through the code review process.
  • Communicate clearly, lead technical discussions, and engage with downstream teams on their rewrites.

Qualifications:

  • A bachelor's degree in computer science or a related field is required (master's preferred).
  • You have a minimum of 7 years of experience in the design, development, and deployment of large-scale, distributed, and cloud-deployed software services.
  • In addition to 7 years of more generalized, large-platform engineering experience, you must have a minimum of 2 years of experience in Big Data software development technologies (e.g., Hadoop, Hive, Spark, Kafka) and exposure to resource/cluster management technologies (e.g., Mesos, YARN).
  • Must be highly proficient in both Java and Python.
  • Minimum of 1 year of experience with AWS (e.g., EC2, S3, EMR, SNS, SQS, Aurora).
  • Exposure to machine learning and deep learning libraries (e.g., TensorFlow, scikit-learn) is a plus.
  • Experience with a range of software technologies and solutions, and an understanding of where to use each.
  • Experience with data-centric languages like R and Scala is a plus.
  • Experience with SOC, SOX, and GDPR is a plus.

Additional Information

We’re a cutting-edge tech company with a very human mission—to help every person discover, preserve, and share the story of what led to them. Combining the rich information in family trees and historical records with the genetic details revealed in DNA, we create unique experiences that give people a new understanding of their lives, because connecting all the pieces of our family story can give us the deepest sense of who we are.

For more information on what we do and why you would want to work at Ancestry, visit our careers

Ancestry is not accepting unsolicited assistance from search firms for this employment opportunity. All resumes submitted by search firms to any employee at Ancestry via email, the Internet, or in any form and/or method without a valid written search agreement in place for this position will be deemed the sole property of Ancestry. No fee will be paid in the event the candidate is hired by Ancestry as a result of the referral or through other means.

Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status.