OVERALL SUMMARY:
As a Site Reliability Engineer at HBO, you will work to improve the reliability and performance of HBO’s premier digital properties, including HBO GO and HBO.com. This person will be deeply hands-on with infrastructure, systems, automation, monitoring and system telemetry, and operational processes. This person will understand the challenges of rapidly creating, scaling and managing distributed applications, and will collaborate with talented engineers across multiple disciplines to address those challenges.
PRIMARY RESPONSIBILITIES:
Troubleshoot issues across the entire stack (hardware, software, application and network) in both physical hardware and cloud-based environments.
Drive standardization efforts across multiple disciplines and services
Identify and drive opportunities to improve automation for the company
Manage timely resolution of all critical and/or complex problems meeting SLA requirements
Participate in a 24x7 on-call rotation
Communicate effectively with all levels of management and all stakeholders.
Develop, configure and optimize service and application monitoring and telemetry
REQUIREMENTS:
Proficient with TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures
Ability to actively participate in infrastructure design and implementation.
Solid knowledge of shell scripting and at least one scripting language (Python strongly preferred)
Must be adaptable and able to focus on the simplest, most efficient & reliable solutions
Track record of practical problem solving, with excellent written communication, interpersonal and documentation skills
Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software design practices
Must work well with and be able to influence a myriad of personalities at all levels