Senior Big Data Infrastructure Architect
- 305 Main St, Redwood City, CA 94063, USA
PubMatic is a publisher-focused sell-side platform for an open digital media future. We exist to help our clients succeed. We work tirelessly to optimize your performance while our SSP enables you to make smart, strategic decisions.
Featuring the leading omni-channel revenue automation platform for publishers and enterprise-grade programmatic tools for media buyers, PubMatic’s publisher-first approach enables advertisers to access premium inventory at scale.
Processing nearly one trillion ad impressions per month, PubMatic has created a global infrastructure to activate meaningful connections between consumers, content and brands.
Since 2006, PubMatic’s focus on data and technology innovation has fueled the growth of the programmatic industry as a whole. Headquartered in Redwood City, California, PubMatic operates 11 offices and six data centers worldwide.
PubMatic's data center team is looking for a Senior Big Data Infrastructure Architect who will be responsible for assisting with the design, implementation and ongoing support of the Big Data platforms.
In this role, you will be responsible for installing, configuring and maintaining multiple Hadoop clusters. You will also be responsible for the design and architecture of the Big Data platform, working with development teams to optimize Hadoop deployments and deploy code into multiple environments.
Duties and Tasks:
- Manage large-scale Hadoop environments and handle builds, including design, capacity planning, cluster setup, performance tuning and ongoing monitoring
- Evaluate and recommend systems software and hardware for the enterprise system, including capacity modeling
- Architect our Hadoop infrastructure to meet changing requirements for scaling, reliability, performance and manageability
- Work with core production support personnel in IT and Engineering to automate deployment and operations of the infrastructure
- Manage, deploy, and configure infrastructure with Ansible or other automation tool sets
- Create metrics and measures of utilization and performance
- Increase capacity by planning and implementing new or upgraded hardware and software, including storage infrastructure
- Work well with a global team of highly motivated and skilled personnel; interaction and dialogue are requisites in our dynamic environment
- Research and recommend innovative solutions, including automated approaches for system administration tasks where possible
- Identify approaches that leverage our resources, provide economies of scale, and simplify remote/global support issues
- Monitor and maintain cluster connectivity and performance
- Configure clusters to get the best performance for our requirements
- Identify faulty nodes and programmatically isolate them to avoid process/job failures
- Monitor the file system to maintain data locality and accessibility
- Keep track of all Hadoop jobs and recommend optimizations
- Alert on and terminate resource-intensive jobs
- Define and set up ACL policies
- Monitor clusters for data loss and protect against hacking
- Allocate and manage compute, memory, storage and the number of name-node objects for individual pools and user groups
- Build dashboards to identify security threats on the cluster
- Add and remove nodes as required
- Plan and optimize cluster capacity
- Maintain the latest software versions on the clusters
- Upgrade software and tools by coordinating with business, customer success and engineering teams
- Install and maintain software libraries based on project needs
- Constantly evaluate new technologies
- Hire and mentor junior Hadoop Administrators
- Attend daily scrums to coordinate with software development agile teams
- Document and articulate technical details with stakeholders
- Present project plans and status at steering committee meetings
Requirements:
- 9+ years of professional experience supporting production medium- to large-scale Linux environments.
- 5 years of professional experience working with Hadoop (HDFS & MapReduce) and the related technology stack.
- A deep understanding of Hadoop design principles, cluster connectivity, security and the factors that affect distributed system performance.
- Experience with Kafka, HBase and Hortonworks is mandatory.
- Solid understanding of automation tools (Puppet, Chef, Ansible)
- Expert-level experience with at least one, if not most, of the following languages: Python, Perl, Ruby, or Bash
- Prior experience with remote monitoring and event handling using Nagios and ELK.
- Solid ability to create automation with Chef, Puppet, Ansible or shell scripts
- Good collaboration & communication skills - the ability to participate in an interdisciplinary team
- Strong written communications and documentation experience
- Knowledge of best practices related to security, performance, and disaster recovery
- BE/BTech/BS/BCS/MCS/MCA in Computers or equivalent
- Excellent interpersonal and verbal communication skills
Nice to Have:
- MapR and MySQL experience is a plus
All your information will be kept confidential according to EEO guidelines.