Site Reliability Engineer
Requirements:
- A bachelor’s degree in computer science, engineering, or a related field, or equivalent experience
- Proficient in one or more programming languages, such as Python, Go, Java, or C++
- Proficient in one or more scripting languages, such as Bash, Perl, or Ruby
- Proficient in one or more cloud platforms, such as AWS, Azure, or GCP
- Proficient in one or more UNIX-like operating systems
- Proficient in one or more configuration management and deployment tools, such as Ansible, Chef, Puppet, or Terraform
- Proficient in one or more monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk
- Proficient in one or more container and orchestration tools, such as Docker or Kubernetes
- Proficient in one or more web servers and proxies, such as Apache, Nginx, or Envoy
- Proficient in one or more databases and data stores, such as MySQL, PostgreSQL, MongoDB, or Redis
- Proficient in one or more version control and collaboration tools, such as Git
- Knowledgeable in the concepts and principles of site reliability engineering, such as SLIs, SLOs, error budgets, incident management, postmortems, and blameless culture
- Knowledgeable in the concepts and principles of software engineering, such as design patterns, code quality, testing, debugging, and documentation
- Knowledgeable in the concepts and principles of performance engineering, such as profiling, benchmarking, load testing, and capacity planning
- Knowledgeable in the concepts and principles of distributed computing, such as concurrency, parallelism, synchronization, and consensus
Responsibilities:
- Assisting with the resources that facilitate engineering services and keeping them operational. This includes continuous integration systems, software deployment, basic troubleshooting of code, and the creation and management of software repositories
- Ensuring servers are patched against security exploits in good time, managing secure access to servers and repositories for partners and internal staff, and securing the interconnections between systems
- Ensuring servers are configured in a documented and repeatable way
- Ensuring system and server architecture is appropriate to the requirements of projects, is easily maintainable in the long term, and provides appropriate levels of redundancy
- Providing timely uptime assurance and supporting issue investigation and recovery procedures
Skills:
- AWS
- C++
- Docker
- Java
- MongoDB
- MySQL
- Perl
- PostgreSQL
- Python
- Ruby
Posted on 23 Aug 16:51. Closing date: 22 Sep