Big Data DevOps Engineer

  • Full-time
  • Department: Data Center

Company Description

PubMatic (Nasdaq: PUBM) is an independent technology company maximizing customer value by delivering digital advertising’s supply chain of the future. PubMatic’s sell-side platform empowers the world’s leading digital content creators across the open internet to control access to their inventory and increase monetization by enabling marketers to drive return on investment and reach addressable audiences across ad formats and devices. Since 2006, our infrastructure-driven approach has allowed for the efficient processing and utilization of data in real time. By delivering scalable and flexible programmatic innovation, we improve outcomes for our customers while championing a vibrant and transparent digital advertising supply chain.

Job Description

Responsibilities:

  • Manage large-scale Hadoop cluster environments, including capacity planning, cluster setup, performance tuning, monitoring, and alerting.
  • Perform proofs of concept on scaling, reliability, performance, and manageability.
  • Work with core production support personnel in IT and Engineering to automate deployment and operation of the infrastructure. Manage, deploy, and configure infrastructure with Ansible or other automation toolsets.
  • Monitor Hadoop jobs and recommend optimizations:
    • Job Monitoring
    • Rerun jobs
    • Job Tuning
    • Spark Optimizations
  • Data Monitoring and Pruning
  • Creation of metrics and measures of utilization and performance.
  • Capacity planning for, and implementation of, new/upgraded hardware and software releases as well as storage infrastructure.
  • Work well with a global team of highly motivated and skilled personnel.
  • Research and recommend innovative and, where possible, automated approaches for system administration tasks.
  • Integrating ML libraries
  • Hardware accelerations
  • SQream / Kinetica / Wallaroo monitoring and maintenance
  • Develop and apply patches
  • Debug infrastructure issues (for example, underlying network issues or issues with the nodes)
  • Addition/replacement of Kafka clusters/consumers
  • Testing/support of infrastructure component changes (such as changing the load balancer to F5).
  • Deployment during the release.
  • Help the QA team with production parallel testing and performance testing.
  • Help the Dev team with POC/ad hoc execution of jobs for debugging and cost analysis.

Qualifications

  • 3 to 5 years of professional experience in Java, Scala and Python.
  • 2+ years of experience with Spark/MapReduce in a production environment.
  • A deep understanding of Hadoop design principles, cluster connectivity, security and the factors that affect distributed system performance.
  • Experience with Kafka, HBase, and Hortonworks is mandatory.
  • Prior experience with remote monitoring and event handling using Nagios and ELK.

Additional Information

Return to Office: PubMatic employees throughout the globe have returned to our offices via a hybrid work schedule (3 days “in office” and 2 days “working remotely”) that is intended to maximize collaboration, innovation, and productivity among teams and across functions. All PubMatic employees in the US and India are required to be fully vaccinated to return to our offices. Covid-19 boosters are not required at this point in time.

Benefits: Our benefits package includes the best of what leading organizations provide, such as stock options, paternity/maternity leave, healthcare insurance, and broadband reimbursement. In addition, when we’re in the office, we all benefit from a kitchen loaded with healthy snacks and drinks, catered lunches, and much more!

Diversity and Inclusion: PubMatic is proud to be an equal opportunity employer; we don’t just value diversity, we promote and celebrate it. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.