A system administrator, or sysadmin, plays a key role in software development and system management. While the position is mainly administrative, sysadmins wear several hats, managing various parts of the software development process, the servers, and the overall infrastructure.
Sysadmins mainly work behind the scenes, ensuring the operations they supervise run as smoothly as possible by implementing best practices and meeting applicable regulatory requirements.
Required Skill Set
Depending on the scale, type, and goals of the system they manage, system administrators need a diverse, up-to-date skill set to understand its ins and outs.
Mastery of Automation and Management Tools
As development operations become more complex, it’s up to the system administrator to recognize recurring patterns in workflows and tasks and automate them using configuration management tools such as Puppet and Chef.
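Configuration management tools such as Puppet and Chef describe a desired state and apply it idempotently, so rerunning the same automation changes nothing once a machine already complies. The Python sketch below illustrates only that idea. `ensure_line` is a hypothetical helper, not part of either tool’s API, and the `PermitRootLogin no` setting is just an example value applied to a throwaway file.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Ensure `line` is present in the text file at `path`.

    Returns True if the file had to be changed, False if it already complied.
    Safe to run repeatedly: only the first run modifies the file.
    """
    lines = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    if line in lines:
        return False  # already in the desired state, nothing to do
    lines.append(line)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    # Demonstrate on a temporary file rather than a real system config.
    with tempfile.TemporaryDirectory() as tmp:
        config = os.path.join(tmp, "sshd_config")
        print(ensure_line(config, "PermitRootLogin no"))  # True: file changed
        print(ensure_line(config, "PermitRootLogin no"))  # False: no-op on rerun
```

Because the check comes before the change, the same task can run on a schedule across every server without piling up duplicate edits.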
Extensive Knowledge of Cloud Infrastructure
Integrating cloud infrastructure into working systems became unavoidable as users and organizations started expecting cloud-integrated systems and applications from their providers. A sysadmin is responsible for setting up the proper monitoring and alerting tools, servers, and internal network.
Programming
Since a sysadmin is responsible for managing and overseeing software and hardware, being able to script and program in relevant languages is an indispensable skill. Understanding the various coding languages allows the system administrator to work with developers from the early stages of development, identifying bugs and glitches and shortening the deployment life cycle.
Server Maintenance
Servers store and process most, if not all, of a system’s data, and they’re also how users access and use the app or service. They require regular maintenance and monitoring to stay in top condition and keep errors to a minimum. As operations grow more elaborate, sysadmins also need to recommend server improvements that keep performance up.
The Ability to Learn on the Job
Being a sysadmin is a demanding role that can bring unprecedented challenges. Learning on the job means adapting to changes in the industry and solving problems one wasn’t prepared for.
The Role of a SysAdmin
A system administrator is responsible for various aspects of the DevOps process. Their roles fall into two categories: administrative roles and security management.
Administrative Roles
Sysadmins’ roles vary depending on the system they’re responsible for, the number of people they oversee, and the tech-support resources within their reach, whether internal teams or outsourced talent. Administrative roles primarily include:
User Administration and Training
A system administrator’s purpose is to bridge the gap between a complex system and its users, which could be company employees, independent contractors, or clients. Their job is to ensure everyone uses the system efficiently and adequately. In some cases, sysadmins are the shared point of contact between the IT experts and average users regarding system and tech issues.
In complex systems, the system administrator might have to directly communicate with users to solve technical issues. But when the system is used by many individuals with no background in tech, contacting each one separately can put immense pressure on the sysadmin with unsatisfactory results. In that scenario, they might have to conduct training programs that familiarise users with the system, or, at the least, supervise the production of one. The end result is a training plan that meets users at their level and provides them with the needed tools and network access to use the system.
Monitoring and Alerts
While a sysadmin is not responsible for directly solving issues that pop up during development or update deployment, they are responsible for setting the requirements and standards for the system’s monitoring and alerts. Since every team, company, and project has its own unique set of variables, there is no one-size-fits-all solution or template for monitoring and alerts. Therefore, sysadmins need to be fully aware of their team’s abilities, its goals, and the hurdles it’s most likely to encounter, and set up alerts accordingly.
Monitoring is the process of collecting data from the sources you need information about. The collected data should allow developers to identify the root cause of a problem so it doesn’t recur. Having the right data also makes it possible to write fairly accurate corrective scripts that automatically resolve the issue when it occurs again, without human intervention or time wasted tracking it down.
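A minimal Python sketch of that collect-then-correct loop, assuming disk usage is the monitored data and a hypothetical `clean_old_logs` function stands in for the corrective script. `shutil.disk_usage` comes from Python’s standard library; `DiskSample`, `collect`, `check_and_remediate`, and the 90% threshold are illustrative choices, not a prescribed design.

```python
import shutil
from dataclasses import dataclass
from typing import Callable


@dataclass
class DiskSample:
    path: str
    used_bytes: int
    total_bytes: int

    @property
    def used_ratio(self) -> float:
        return self.used_bytes / self.total_bytes


def collect(path: str) -> DiskSample:
    """Collect one data point from the monitored source (a filesystem)."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free
    return DiskSample(path, usage.used, usage.total)


def check_and_remediate(sample: DiskSample, threshold: float,
                        remediate: Callable[[str], None]) -> bool:
    """Run the corrective action if usage is at or above `threshold`.

    Returns True when remediation was triggered.
    """
    if sample.used_ratio >= threshold:
        remediate(sample.path)
        return True
    return False


if __name__ == "__main__":
    def clean_old_logs(path: str) -> None:
        # Hypothetical corrective script; a real one might rotate or compress logs.
        print(f"would clean old logs under {path}")

    sample = collect(".")
    print(f"{sample.path}: {sample.used_ratio:.0%} used")
    check_and_remediate(sample, threshold=0.90, remediate=clean_old_logs)
```

Passing the corrective action in as a function keeps the check reusable: the same threshold logic can trigger log cleanup on one host and a page to a person on another.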
The same principles apply to alerts. Alerts that fire whenever something goes wrong are an excellent first step in keeping a system stable, but an alert needs to carry sufficient information about the error. Otherwise, the team or person in charge could waste a lot of time figuring out what triggered the alert instead of fixing the problem. An alert needs to be detailed enough to identify the issue easily, but not so cluttered that it wastes the developer’s time.
When it comes to alerts and monitoring, a sysadmin needs to find the right balance between too little and too much data while still keeping the system flexible enough for future modification as it evolves and the goals change.
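One way to strike that balance, sketched in Python under assumed field names: each alert carries just enough context (what failed, where, observed versus expected value, and a runbook link), and a throttle suppresses repeats of the same alert so a single problem doesn’t flood the team. `Alert`, `AlertThrottle`, the ten-minute window, and the `wiki.example.com` runbook URL are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Alert:
    service: str
    host: str
    check: str
    observed: str
    expected: str
    runbook: str  # link to instructions for fixing this kind of problem

    def message(self) -> str:
        # What failed, where, and by how much, plus where to look next.
        return (f"[{self.service}] {self.check} on {self.host}: "
                f"observed {self.observed}, expected {self.expected}. "
                f"Runbook: {self.runbook}")


@dataclass
class AlertThrottle:
    """Suppress repeats of the same alert within `window_s` seconds."""
    window_s: float
    _last_sent: Dict[Tuple[str, str, str], float] = field(
        default_factory=dict, init=False, repr=False)

    def should_send(self, alert: Alert, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        key = (alert.service, alert.host, alert.check)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_s:
            return False  # this problem was already reported recently
        self._last_sent[key] = now
        return True


if __name__ == "__main__":
    alert = Alert("checkout-api", "web-03", "p95 latency", "2.4s", "< 500ms",
                  "https://wiki.example.com/runbooks/latency")
    throttle = AlertThrottle(window_s=600)
    for _ in range(3):
        if throttle.should_send(alert):
            print(alert.message())  # printed once; the repeats are suppressed
```

Keying the throttle on service, host, and check means a new kind of failure still gets through immediately, while the same failure repeating every few seconds does not.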
Setting System Policies
Having the right system policies ensures that everything flows smoothly between users from different departments with varying access privileges and technical skills. The right policies also ensure the system remains as secure as possible, preventing users from performing specific actions that could put it at risk, such as installing third-party software or editing or accessing critical system files.
That’s not to say that the stricter the policies, the better. The sysadmin needs to give team members enough freedom to do their jobs without frequently asking for editing or access permissions, while limiting access to areas where they could accidentally damage the system or interfere with vital operations. Which system policies need setting depends on the complexity of the teams and how many people are on the network; common types include:
- Email policy.
- Encryption policy.
- Communication policy.
- Access control policy.
- Data disposal policy.
- Hardware control policy.
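As an illustration of an access control policy that balances freedom with protection, here is a deny-by-default, role-based check in Python. The roles and permission strings are hypothetical placeholders; a real policy would be defined by the organization and enforced by its directory or identity system rather than a hard-coded table.

```python
from typing import Dict, FrozenSet

# Hypothetical roles: each grants only what that job needs.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "employee": frozenset({"email:use", "docs:read"}),
    "developer": frozenset({"email:use", "docs:read", "repo:write",
                            "staging:deploy"}),
    "sysadmin": frozenset({"email:use", "docs:read", "repo:write",
                           "staging:deploy", "software:install",
                           "system_files:edit"}),
}


def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions are refused."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(is_allowed("developer", "repo:write"))       # True
    print(is_allowed("employee", "software:install"))  # False
    print(is_allowed("contractor", "docs:read"))       # False: role not defined
```

Installing third-party software and editing critical system files are reserved for one role here, while developers keep the permissions they use daily, so they rarely need to ask for more.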
Overseeing Software Installation and Updates
Whether the company uses native apps or third-party software, that software needs maintenance and upkeep just like hardware components. Depending on the number of devices and users on the network, managing software installation and updates can range from a simple manual task to a complex one that requires a degree of automation.
The type of automation a system administrator needs depends on the underlying infrastructure. On a bare-metal server, a scripted Kickstart (Red Hat-based distributions) or preseed (Debian-based distributions) configuration can automate the operating system installation, including disk partitioning, although the process sometimes produces errors that need fixing after the automated install. Luckily, most modern systems rely on some form of server virtualisation and cloud hosting, which start from a standard base image that the sysadmin can easily customize into the type of machine they need using a cloud-init script.
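When updates span many devices, part of the automation is simply knowing which machines are behind. A minimal Python sketch, assuming a hypothetical inventory of installed versions per host and a table of approved target versions. It also assumes purely numeric dotted version strings; real package managers such as apt and dnf apply their own, more elaborate version-comparison rules.

```python
from typing import Dict, List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '2.10.1' into (2, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def plan_updates(inventory: Dict[str, Dict[str, str]],
                 targets: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """List (host, package, installed, target) for every outdated package.

    `inventory` maps host -> {package: installed_version}; `targets` maps
    package -> approved version. Packages without a target are skipped.
    """
    plan = []
    for host in sorted(inventory):
        for pkg, installed in sorted(inventory[host].items()):
            target = targets.get(pkg)
            if target and parse_version(installed) < parse_version(target):
                plan.append((host, pkg, installed, target))
    return plan


if __name__ == "__main__":
    # Hypothetical inventory and approved versions.
    inventory = {"web-01": {"nginx": "1.24.0", "openssl": "3.0.2"},
                 "web-02": {"nginx": "1.26.1"}}
    targets = {"nginx": "1.26.1", "openssl": "3.0.13"}
    for host, pkg, have, want in plan_updates(inventory, targets):
        print(f"{host}: {pkg} {have} -> {want}")
```

Comparing tuples of integers avoids the classic string-comparison bug where "3.0.13" would sort before "3.0.2".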
Problem Solving
A sysadmin’s job is to prevent problems and errors with suitable policies and proper management. However, in the software industry, where every update is architecturally more complicated than the one before, solving problems is just as important as preventing them. Unlike other skills and roles, problem-solving depends on a sysadmin’s intuition and in-depth knowledge of the system they administer.
In-depth knowledge helps in recognizing the issue and determining its origin, whether that’s hardware, incompatible software, user error, or something else. It also matters whether the issue was noticed because the system crashed or because of an error message. If no error message flagged it, the system administrator needs to revisit the monitoring and alerts setup, since it missed the problem.
Understanding why the problem happened in the first place allows the sysadmin to prevent errors of a similar nature from happening again. That said, the process is still time-sensitive, with update deployments and milestones always close at hand.
System Scaling
Since a company’s capacity can change, whether it grows rapidly or cuts back on operations, its systems and infrastructure need to scale automatically with minimal issues. When building and implementing changes to the system, whether to storage, security, or access control, system administrators need to make sure their alterations can scale as needed. Scalability should also be built into application development and updates, as it’s much easier to design for flexibility in the early stages of development than to add it later.
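Automatic scaling is often implemented as target tracking: measure utilization, then resize the pool so utilization moves back toward a target. The sketch below applies the formula the Kubernetes Horizontal Pod Autoscaler documents, desired = ceil(current * observed / target), clamped to minimum and maximum replica counts. The function name and default limits are illustrative assumptions, and the real controller adds refinements, such as a tolerance band, that this sketch omits.

```python
import math


def desired_replicas(current: int, observed_pct: float, target_pct: float,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Target-tracking scale decision.

    desired = ceil(current * observed / target), clamped to
    [min_replicas, max_replicas]. Utilization is given in percent.
    """
    if current < 1 or target_pct <= 0:
        raise ValueError("need at least one replica and a positive target")
    desired = math.ceil(current * observed_pct / target_pct)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, observed_pct=90, target_pct=60))    # 6: scale out
    print(desired_replicas(4, observed_pct=20, target_pct=60))    # 2: scale in
    print(desired_replicas(10, observed_pct=300, target_pct=50))  # 20: capped
```

The same rule handles both growth and cutbacks, and the clamp keeps a traffic spike or a quiet night from pushing capacity past agreed limits.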
Ensuring Application Compatibility
As the IT team and developers work on the software, there needs to be a clear expectation for the final product’s compatibility with the company’s internal systems and infrastructure, the virtual machines it’s deployed on, and the clients using it. The sysadmin performs regular testing throughout the development stages to confirm compatibility and updates hardware and software requirements as needed.
Incompatibility becomes significantly harder to prevent when new updates and versions are released rapidly. The more changes the development team implements, the higher the chance of incompatibility, which can crash the OS, overheat the hardware to the point of irreversible damage, or interfere with other running software.
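A pre-deployment compatibility gate can catch some of these problems before a release ships. This Python sketch checks a release’s declared minimum requirements against each target environment. The `Requirements` and `Environment` fields, the numeric dotted OS versions, and the example hosts are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'22.04' -> (22, 4); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in v.split("."))


@dataclass
class Requirements:
    min_os_version: str
    min_memory_gb: int


@dataclass
class Environment:
    name: str
    os_version: str
    memory_gb: int


def incompatibilities(req: Requirements, env: Environment) -> List[str]:
    """Return reasons `env` can't run the release (empty list if it can)."""
    problems = []
    if parse_version(env.os_version) < parse_version(req.min_os_version):
        problems.append(
            f"OS {env.os_version} is older than required {req.min_os_version}")
    if env.memory_gb < req.min_memory_gb:
        problems.append(
            f"{env.memory_gb} GB RAM is below required {req.min_memory_gb} GB")
    return problems


if __name__ == "__main__":
    release = Requirements(min_os_version="22.04", min_memory_gb=8)
    targets = [Environment("prod-vm", "22.04", 16),
               Environment("legacy-host", "20.04", 4)]
    for env in targets:
        issues = incompatibilities(release, env)
        print(env.name, "OK" if not issues else "; ".join(issues))
```

Running a check like this on every release candidate turns compatibility from a post-deployment surprise into a failed build step.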
Documentation and Internal Wikis
Software documentation is the practice of keeping consistent and accurate records of the software’s development process, or of building an ‘internal wiki’ for future reference. The documents cover everything from the IT resources and dev teams involved to the programming languages used. They are typically kept with the product’s source code and describe in detail how the software was developed. In addition to helping devs with future versions and updates, software documentation is often a requirement for regulatory compliance.
Proper documentation covers various parts of the software, but there are four main types:
- User Documentation – covers the documents a user might need when using the application. Depending on the target user base, whether average users or tech insiders, user documentation includes tutorials, user manuals, troubleshooting guides, and installation and update guides.
- System Documentation – is a detailed description of the system and its components, from hardware requirements, OS and software compatibility, and user interface design to the overall layout, architecture, and source code.
- Development Documentation – includes all documents, guides, manuals, and READMEs that cover the application development process. It can be thorough and include early project plans, prototypes, devs’ notes, product standards, and the debugging process.
- Product Documentation – is the combination of system and user documentation, describing the final vision of the application being developed and guides on how to use it, rather than how it was made.
Cybersecurity and Crisis Management
In addition to administering and managing the development and updates of a system, the sysadmin is responsible for setting the procedures they deem necessary to keep the system’s databases and work in progress secure, and for communicating those instructions to the IT team or, more specifically, the DevSecOps team.
Backup and Recovery
System administrators are responsible for developing suitable backup and recovery plans. They need to make backups frequent enough to minimize data loss without overloading the system or getting in the way of operations and progress. They also need to make sure the data is well organized, secure, and ready for quick recovery when needed.
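A common way to balance backup frequency against storage is a grandfather-father-son rotation: keep recent daily backups, fewer weekly ones, and fewer still monthly ones. The Python sketch below decides which backup dates to keep under that scheme. The 7/4/6 retention counts are assumed defaults, the example dates are made up, and anything not returned would be a candidate for pruning.

```python
from datetime import date, timedelta
from typing import Callable, Hashable, Iterable, Set


def backups_to_keep(backups: Iterable[date], daily: int = 7,
                    weekly: int = 4, monthly: int = 6) -> Set[date]:
    """Grandfather-father-son retention.

    Keeps the `daily` most recent backup days, the newest backup in each of
    the `weekly` most recent ISO weeks, and the newest backup in each of the
    `monthly` most recent months. Everything else can be pruned.
    """
    newest_first = sorted(set(backups), reverse=True)
    keep: Set[date] = set()

    def keep_newest_per(bucket_of: Callable[[date], Hashable],
                        limit: int) -> None:
        seen = set()
        for d in newest_first:
            bucket = bucket_of(d)
            if bucket in seen:
                continue  # an even newer backup already represents this bucket
            if len(seen) == limit:
                break
            seen.add(bucket)
            keep.add(d)

    keep_newest_per(lambda d: d, daily)                        # sons
    keep_newest_per(lambda d: tuple(d.isocalendar()[:2]), weekly)  # fathers
    keep_newest_per(lambda d: (d.year, d.month), monthly)      # grandfathers
    return keep


if __name__ == "__main__":
    today = date(2024, 3, 31)
    nightly = [today - timedelta(days=i) for i in range(60)]
    keep = backups_to_keep(nightly)
    prune = sorted(set(nightly) - keep)
    print(f"keep {len(keep)}, prune {len(prune)}")
```

Recent days stay fully recoverable, while older history thins out to one point per week and then per month, which caps storage growth.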
Creating Incident Detection and Response Strategies
The system needs to be optimized for efficiency, speed, and incident detection. Depending on the size of the teams and network in question, the sysadmin either has to strike the right balance on their own or consult with the IT department to set suitable defence strategies that don’t sacrifice efficiency.
There also needs to be an incident response plan that specifies who to contact first when an incident is detected. The right response strategies get operations back to normal as soon as possible with minimal disruption.
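An escalation policy can be written down as data so there’s never doubt about who to contact first. In this Python sketch, the severities, contact roles, and 15-minute escalation step are all hypothetical; the page moves one step up the chain for each interval an incident goes unacknowledged.

```python
from typing import Dict, List

# Hypothetical escalation chains: first contact first, then upward.
ESCALATION: Dict[str, List[str]] = {
    "critical": ["on-call sysadmin", "security lead", "IT director"],
    "high": ["on-call sysadmin", "security lead"],
    "low": ["IT help desk"],
}


def contact_for(severity: str, minutes_unacknowledged: int,
                step_minutes: int = 15) -> str:
    """Return who should be paged now.

    Starts with the first contact and moves one step up the chain for every
    `step_minutes` the incident goes unacknowledged, stopping at the last one.
    Unknown severities fall back to the 'high' chain.
    """
    chain = ESCALATION.get(severity, ESCALATION["high"])
    step = minutes_unacknowledged // step_minutes
    return chain[min(step, len(chain) - 1)]


if __name__ == "__main__":
    print(contact_for("critical", minutes_unacknowledged=0))   # on-call sysadmin
    print(contact_for("critical", minutes_unacknowledged=20))  # security lead
    print(contact_for("critical", minutes_unacknowledged=90))  # IT director
```

Falling back to a defined chain for unrecognized severities means a mislabeled incident still reaches a person instead of going nowhere.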
Post-Incident Reports
Post-incident reports include information such as the damage caused by the attack, how long it took to detect, and how smoothly the recovery went. Staying on top of incident reports, and of how each incident affected every department and part of the system, helps the system administrator work with the IT team on an adjusted security strategy that prevents similar incidents while still maintaining a productive workflow.
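Two of the report figures mentioned above, detection time and recovery time, are straightforward to compute once incident timestamps are recorded. This Python sketch assumes each incident logs when it started, when it was detected, and when it was resolved, and it measures recovery from detection. Teams define these intervals differently, so treat the field names, definitions, and example timestamps as assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Incident:
    started: datetime   # when the attack or fault actually began
    detected: datetime  # when monitoring (or a person) noticed it
    resolved: datetime  # when normal operations were restored

    @property
    def time_to_detect(self) -> timedelta:
        return self.detected - self.started

    @property
    def time_to_recover(self) -> timedelta:
        return self.resolved - self.detected


def summarize(incidents: List[Incident]) -> Dict[str, timedelta]:
    """Mean detection and recovery times across a set of incidents."""
    if not incidents:
        raise ValueError("no incidents to summarize")
    n = len(incidents)
    return {
        "mean_time_to_detect":
            sum((i.time_to_detect for i in incidents), timedelta()) / n,
        "mean_time_to_recover":
            sum((i.time_to_recover for i in incidents), timedelta()) / n,
    }


if __name__ == "__main__":
    history = [
        Incident(datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 30),
                 datetime(2024, 5, 1, 11, 30)),
        Incident(datetime(2024, 5, 8, 14, 0), datetime(2024, 5, 8, 14, 10),
                 datetime(2024, 5, 8, 15, 10)),
    ]
    for name, value in summarize(history).items():
        print(name, value)
```

Tracking these averages across reports shows whether an adjusted security strategy is actually shortening detection and recovery.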
The Importance of a Qualified SysAdmin
System administrators are one of the primary pillars of any software development or system operation process; they link the various departments, ensuring reliable communication and collaboration. Knowledge of various technical fields, such as programming, infrastructure management, and server maintenance, is essential. But above all, the role of a sysadmin depends on the company’s current needs and future scaling aspirations.
Author Bio:
Eleanor Bennett is a technical copywriter & digital marketing specialist with Logit.io. Her research has previously been featured in the Financial Times, The Huffington Post & DZone across a range of topics.
Logit.io is a full-service log management & data analysis platform built on ELK that helps businesses scale and improves the observability of servers, applications and services by offering users a single centralised logging dashboard from which they can create dashboards, data visualisations and alerts.
The platform is used by engineers around the world to improve how they handle error resolution, data analysis and cross-team collaboration.