Staff/Principal Big Data Engineer (Kafka/Kinesis) - Ancestry (Lehi, UT or San Francisco, CA)

  • Lehi, UT, USA
  • Full-time

Company Description

When you join Ancestry, you join our family tree. Backed by history, science, and technology, we’re creating a new world of connection, innovation, and understanding. Whether it’s reuniting long-lost relatives through DNA or unearthing new family stories from historical records, Ancestry empowers life-changing experiences. With over 10 billion digitized historical records, 100 million family trees, and 14 million DNA kits sold, Ancestry is bringing the power of personal discovery to people around the world.

For more information on what we do and why you would want to work at Ancestry, visit

Job Description

At Ancestry, we have an amazing opportunity to work with very interesting, massive data sets. We are looking for a passionate Data Engineer who thrives on challenges, has a deep understanding of distributed data systems and data architecture, and has a strong software background. This person will take a lead role in furthering the big data footprint at Ancestry and work closely with the Business Intelligence, Data Infrastructure, and Data Services teams in developing and maturing our data pipelines, which include a near-real-time enterprise data warehouse, an infrastructure for data analytics and machine learning, and real-time alerting and monitoring solutions. This position is located in Lehi, UT or San Francisco, CA. Telecommuting is not an option.

Key Responsibilities / Performance Requirements:

  • Technical lead on the data platform team, responsible for scaling the platform to meet Ancestry’s data growth.
  • Develop, deploy, and support real-time automated data streams from numerous sources into the data platform.
  • Develop and implement data auditing strategies and processes to ensure data accuracy and integrity.
  • Deploy IAC (Infrastructure as Code) builds to lay down the infrastructure that the data pipelines utilize.
  • Mentor and teach others.


  • BS or MS degree in Computer Science, IS or related field.
  • Expert with Big Data ecosystems, including Kafka and Kinesis
  • Expertise in building and deploying Spark streaming solutions in AWS
  • Proficiency in database technologies such as MySQL (Aurora), MSSQL, Redshift, or equivalent
  • Expert with Terraform, CloudFormation, or another infrastructure-as-code tool
  • Mastery of one of the following data formats: Parquet, Avro, ORC
  • Proficient in Java/Scala (preferred) or Python, with 5+ years of experience in an enterprise setting
  • Experience with test-driven development, SCM tools such as Git and SVN, and Jenkins build and deployment automation
  • Experience implementing open source technologies.
  • RESTful web service development
  • Experience with HBase or comparable NoSQL.
  • Strong grasp of algorithms and data structures
  • Good familiarity with Linux/Unix scripting and administration
  • Experience with AWS Cloud automated deployments.

Additional Information

Ancestry is not accepting unsolicited assistance from search firms for this employment opportunity. All resumes submitted by search firms to any employee at Ancestry via email, the Internet, or in any form and/or method without a valid written search agreement in place for this position will be deemed the sole property of Ancestry. No fee will be paid in the event the candidate is hired by Ancestry as a result of the referral or through other means.

Ancestry is an Equal Opportunity Employer that makes employment decisions without regard to race, color, religious creed, national origin, ancestry, sex, pregnancy, sexual orientation, gender, gender identity, gender expression, age, mental or physical disability, medical condition, military or veteran status, citizenship, marital status, genetic information, or any other characteristic protected by applicable law. In addition, Ancestry will provide reasonable accommodations for qualified individuals with disabilities.