Site Reliability Engineer (SRE) / CloudOps / DevOps

Engineering | Remote / Berkeley / San Jose | Full Time

Nefeli Networks is an award-winning early-stage startup in Silicon Valley focused on simplifying multi- and hybrid-cloud networking. This is an opportunity to get in early as we begin to ramp revenue and expand our offering. You’ll be working with a talented team of innovative developers, making key new contributions to the cloud and networking fields.

We are looking for a highly motivated DevOps/Site Reliability Engineer to join our exceptional team. The right candidate is ready to design, automate, and support our cloud infrastructure and back-end systems, and to handle technical integration with our partners. The ideal candidate has experience operating and supporting cloud networking solutions and is familiar with automation tools and processes.

As a DevOps/SRE at Nefeli, you will play a critical role in helping us shape our software stack and hardware infrastructure. Your knowledge of cloud and automation frameworks will help our development team meet customers' business and functional requirements. You will also be instrumental in deploying our SaaS platform.

Responsibilities:

  • Work with engineering to improve the entire product lifecycle, from inception and design through deployment, operation, and refinement
  • Design, build, and operate cloud infrastructure to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations
  • Work with development teams to make sure applications are production ready, scalable, and reliable from the ground up
  • Identify and drive opportunities to improve automation for code deployment, management, and visibility of application services
  • Develop tools and frameworks to automate operational tasks and the deployment of machines, services, and applications
  • Write automation code for provisioning and operating infrastructure at scale
  • Establish end-to-end monitoring and alerting on all critical components of the applications, including availability, latency, and overall system health
  • Participate in the on-call rotation supporting the platform and/or the production application
  • Direct root cause and corrective action analysis of critical business and production issues
  • Develop standard methodologies for infrastructure orchestration and for troubleshooting application services in production
  • Represent DevOps/SRE in design reviews and work with engineering teams on operational readiness

Technical Qualifications:

  • BS in Computer Science, Engineering, or a related field, or equivalent professional experience
  • Experience with AWS, Azure, or GCP and their related cloud services
  • Experience with Unix/Linux operating system internals and administration
  • Expertise in cloud build-out using Terraform and system configuration management with a framework such as Ansible, Chef, or Puppet
  • Good understanding of networking technologies as they relate to the cloud
  • Good understanding of server and network virtualization, global infrastructure, distributed systems, load balancing, and security
  • Strong fundamentals working with REST APIs
  • Experience with CI/CD pipelines and Git
  • Ability to debug and optimize scripts
  • Passion for automation and monitoring
  • Knowledge of best practices related to security, performance, and disaster recovery

Other Qualifications:

  • Ability to communicate effectively and succinctly
  • Strong, systematic problem-solving skills and the ability to work amid ambiguity
  • Excellent written and verbal communication skills; able to collaborate and rally support
  • Excellent interpersonal skills and the ability to work well in a team
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive; a positive attitude; and the ability to quickly learn new technologies and effectively manage parallel projects
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
  • Passion for learning, understanding, and dissecting new technologies quickly and independently

Preferred Qualifications:

  • 5+ years of related experience
  • Experience with modern metrics/logging/tracing tools such as Grafana / Prometheus / Loki / Tempo
  • Experience with cluster orchestrators such as Kubernetes or Nomad
  • Experience with HashiCorp stack tools such as Nomad / Consul / Vault
  • Experience with operational tooling such as Teleport and Opsgenie
  • Experience with networking (e.g., TCP/IP, routing, network topologies and hardware, SDN, NFV)
  • Experience with etcd, NoSQL (e.g., MongoDB), and time-series databases
  • Familiarity with CI/CD tools such as Jenkins or GitHub Actions
  • Familiarity with build systems such as make
  • Familiarity with test frameworks such as pytest
  • Proven experience working with customers and vendors