Job Description:
The DevOps Engineer supports high-availability 24/7 production systems of low to moderate complexity and risk. The role performs ongoing application support for live production systems by diagnosing and resolving issues; configuring, patching, and installing custom-developed and third-party package applications and upgrades; identifying and recommending options for improving performance, maintainability and operability; and updating existing practices and procedures, as defined by the supervisor. The role may also assist senior technical staff with architecting technical solutions; work on projects requiring technical decision making and input to delivery schedules; normally receive basic instruction on routine work and general instructions on new assignments; and have work periodically reviewed by more senior team members or supervisors for the soundness of technical judgment and overall accuracy.
Principal Responsibilities
- Acts as the single point of contact for development and product teams into Technical Operations and as the primary subject matter expert within Technical Operations for a limited number of applications and services
- Works with senior members of the team to ensure operational requirements (reliability, availability, scalability, performance, capacity, etc.) are met, and recommends operational improvements to them
- Understands the monitoring in place for owned applications and proactively monitors and manages their run state; responsible and accountable for ensuring all issues are addressed and resolved in a timely and robust fashion
- Performs technical work to accomplish tasks and projects within defined timelines and in a professional manner in alignment with active standards
- Installs, upgrades, configures, repairs and monitors high-availability 24/7 application services, third-party appliances, and applications; assists with automation scripts and tools to improve the team’s overall efficiency
- Works with others or independently on projects of low to medium complexity with minimal cross-team alignment needs
- Works with team members on project tasks that derive from release requirements; uses the ticketing system to accurately document tasks performed for assigned production systems; communicates technical issues across the entire product release
- Documents operations and manages resources related to procedures, including installation, maintenance, restart / recovery, monitoring and troubleshooting; performs ongoing revision and testing of established procedures
- Performs maintenance and service functions to support production infrastructure, including system installation, systems administration, patching and configuration and software upgrades
- Provides second-level support (application and host specific tasks) of Web services monitoring alerts; refers to resolution processes, and escalates and communicates outages/issues to senior team members and supervisor
- Participates in 24x7 on-call rotation and responds to production alerts involving multiple software system components, using background, experience and established procedures to resolve issues and restore services as quickly as possible
- Follows and updates the Tactical Run Books and SOPs, which define the processes and procedures for first-level support of Web Operations systems; continues troubleshooting beyond what is documented
- Keeps abreast of technical trends, and develops and incorporates them within assignments as appropriate; recommends improvements and changes to supervisor and team
Knowledge and Skills
- Bachelor’s degree or equivalent in computer science, electrical engineering, or a related field preferred, with 5+ years of directly related work experience
- Strong working knowledge of Unix/Linux systems administration and troubleshooting
- Proficiency in one or more administrative languages such as Bash, Ruby, Python preferred
- Knowledge of and exposure to network protocols and tools for system-side diagnostics
- Knowledge of and exposure to secure communication tools and principles (SSH, SSL, TLS, etc.)
- Experience with commercial or open source monitoring systems (Nagios, Zenoss, etc.) a plus
- Knowledge of distributed systems development, tools and designs
- Familiarity with failure mode analysis
- Experience with distributed computing and Software Registration systems such as Cassandra/ZooKeeper a plus
- Understanding of basic software lifecycle process
- Familiarity with version control tools (Git, Perforce, SVN, etc.) a plus
- Familiarity with common frameworks, languages and application servers for Webapps, Java, C/C++ or other languages a plus
- Experience with hosting applications in the cloud, and principles of elasticity using third-party cloud technologies (Amazon AWS, Azure, CloudStack, Rackspace, etc.) a plus
- Experience with automated host installation and configuration enforcement technologies (Chef, Cobbler, KickStart, Puppet, Foreman, etc.) a plus
- Experience with relational database technologies (SQL, schema design, etc.) a plus
- Knowledge and experience in the administration and operations of large-scale distributed computing environments; experience with standard system Operations methods and procedures; prior hosting experience a plus