Lead Associate – Site (Service) Reliability Engineering (SRE)

  • Reston, VA
  • Employees can work remotely
  • Full-time
  • Worker Classification: Remote

Company Description

At Fannie Mae, futures are made. The inspiring work we do makes an affordable home a reality and makes a difference in the lives of Americans. Every day offers compelling opportunities to modernize the nation's housing finance system while being part of an inclusive team using new, emerging technologies. Here, you will help lead our industry forward, enhance your technical expertise, and make your career.

Job Description

As a valued colleague on our team, you will act as a team lead in designing, producing, testing, or implementing software, technology, or processes, and you will lead processes for creating and maintaining IT architecture, large-scale data stores, and cloud-based systems.

You will apply your expertise in software and systems engineering to ensure that both our internally critical and externally visible systems meet the appropriate performance needs of our users. You will serve as a champion of service availability, efficiency, automation, monitoring, and capacity management. Specifically, you will leverage your skills and experience in Amazon Web Services, software development with Java and/or Python, customization in Splunk and/or Dynatrace, and automation in Selenium and/or Blue Prism (among others) to enable increased feature velocity and continuous improvement.


The Site (Service) Reliability Engineering (SRE) Lead Associate role will offer you the flexibility to make each day your own, while working alongside people who care, so that you can deliver on the following responsibilities:

  • Independently determine the needs of the customer and create solution frameworks.
  • Design and develop moderately complex software solutions to meet needs.
  • Use a process-driven approach in designing and developing solutions.
  • Implement new software technology and coordinate end-to-end tasks across the team.
  • Maintain or oversee the maintenance of existing software, as needed.



Minimum Required Experiences

  • 4+ years of relevant work experience

Desired Experience

  • Bachelor’s Degree in Computer Science, Management Information Systems (MIS), Systems Engineering, or related field
  • Certification as an AWS Solutions Architect Associate or Developer Associate, Splunk Certified Developer, or Sun Certified Java Developer
  • Experience with Scaled Agile Framework (SAFe) and Jira / Confluence
  • Experience with application production / operations support, including incident response, problem management, runbooks, and knowledge articles using tools such as ServiceNow, Moogsoft, StatusHub, and / or Blameless
  • Understanding of error budgeting and toil reduction
  • Experience with post-mortems, root-cause analysis (RCA), and / or AWS Correction of Error (COE)
  • Experience creating disaster recovery plans and executing failover tests
  • Experience with capacity planning and performance testing / engineering tools, such as JMeter and / or LoadRunner
  • Experience with Failure Mode and Effects Analysis (FMEA) and Chaos testing / engineering tools, such as Gremlin, Chaos Monkey, Chaos Toolkit, and / or AWS Fault Injection Service (FIS)
  • Experience with programming in Java and / or Python
  • Understanding of J2EE frameworks and related technologies, such as JavaScript, Spring Boot / Spring Cloud, and REST
  • Understanding of Java performance monitoring (JVM, GC, heap size, message broker)
  • Experience with building automation solutions using tools such as Blue Prism and / or Selenium
  • Understanding of fault-tolerant / resilient architectural design patterns, such as Bulkhead, Circuit Breaker, Retry, and Timeout


  • 3+ years of experience supporting AWS cloud applications and technologies, including containerization, virtualization, microservices, and serverless architecture
  • 2+ years of experience working in an Agile, Scrum, or Kanban environment
  • 2+ years of experience with application monitoring / observability, including building dashboards, establishing service level indicators / objectives / agreements (SLIs / SLOs / SLAs), and logging / tracing
  • 2+ years of experience with CI/CD / DevOps deployment tools
  • Excellent problem-solving skills and proactivity in resolving issues / blockers
  • Excellent verbal / written communication skills, relationship management skills, and ability to collaborate with multiple stakeholders
  • Eagerness to learn and ability to work independently with minimal guidance


  • Experience with AWS Elastic Container Service (ECS) and Fargate
  • Experience using tools such as AWS CloudWatch, Splunk, Dynatrace, CatchPoint, and / or Datadog
  • Excellent understanding of and demonstrated experience with DevOps / CI/CD tools such as Jenkins, Terraform, UrbanCode Deploy (UCD), and / or GitLab
  • Understanding of IT Service Management (ITSM)

Additional Information

Job REF ID: REF2759D

The future is what you make it to be. Discover compelling opportunities at Fanniemae.com/careers.

Fannie Mae is an Equal Opportunity Employer, which means we are committed to fostering a diverse and inclusive workplace. All qualified applicants will receive consideration for employment without regard to race, religion, national origin, gender, gender identity, sexual orientation, personal appearance, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation in the application process, email us at careers_mailbox@fanniemae.com.
