Senior Engineer – Site Reliability

Open position

6+ years


We are seeking an experienced Senior Site Reliability Engineer to join our team. This position will be responsible for architecting and implementing the next generation of AppTrana WAF and Analytics backend and ensuring the reliability, scalability, and performance of our production systems. Working closely with developers, quality assurance engineers, product management, and other stakeholders, you will design, build, and operate our infrastructure with an emphasis on automation, resiliency, and self-healing.

Job Description:

  • Maintain and monitor the availability of cloud infrastructure; troubleshoot, identify, and resolve production-level infrastructure issues.
  • Using Infrastructure as Code (IaC) tools, develop and maintain automation for provisioning, configuration management, and deployment.
  • Establish and maintain monitoring and alerting systems to detect and respond to incidents.
  • Demonstrate strong customer focus: collaborate with internal teams and customers during incidents, explaining the issue, recommending immediate mitigations, and providing long-term solutions.
  • Investigate customer escalations and work closely with the engineering, support, and sales teams to implement a solution.
  • Perform postmortem analyses of system failures and implement corrective measures as necessary.
  • Participate in the on-call rotation and be available during emergencies.
  • Monitor and control the use of cloud resources, implement cost-saving measures, and provide recommendations for optimizing cloud costs. A demonstrated track record of optimizing cloud infrastructure costs is expected.
  • Implement security best practices and compliance measures in production environments. Experience with security audits, vulnerability assessments, and security controls that protect sensitive data and ensure regulatory compliance is expected.

Candidate Profile:

  • 6+ years’ experience with a focus on cloud infrastructure automation, configuration management, and deployment automation. A significant portion of that experience should be on AWS with mid- to large-size deployments.
  • Experience designing, architecting, and running large-scale cloud infrastructure.
  • Experience working with reverse proxies, web servers, load balancers, and CDN services.
  • Familiarity with security best practices and compliance frameworks such as PCI DSS
  • Strong interpersonal and communication skills (including oral, written, and listening skills)
  • Experience with stress testing and tuning production systems using tools such as k6 and Locust
  • Experience in using AWS Cost Explorer, AWS Budgets, and AWS Cost and Usage Reports and optimizing costs to ensure efficient resource use

Technical skills:

  • Experience with AWS in designing, deploying, and managing cloud infrastructure.
  • Experience with scripting languages such as Python and Bash
  • Experience managing reverse proxies and web servers in large-scale production environments.
  • Experience with Infrastructure as Code tools such as Terraform or CloudFormation
  • Experience working with Kafka, Elasticsearch, and RabbitMQ
  • Experience with observability tools such as Prometheus, Grafana, and Loki

Nice to have:

  • Experience in Linux performance tuning
  • Knowledge of Chaos Engineering and other resilience testing methodologies.
