systemd Production Service Patterns: Practical Configuration and Troubleshooting
Target Audience
- Intermediate administrators operating systemd services in production environments (with basic systemctl command knowledge)
Key Points
- Configuration that prevents common dependency errors in production
- Stable operation through memory and CPU limits
- Automatic recovery and log management patterns for failures
Why This Problem is Critical Now
In production environments, service outages, memory leaks, and dependency errors directly affect business continuity. Writing a basic systemd unit is well understood, but many administrators struggle with the operational problems that follow: startup ordering between services, failures caused by resource exhaustion, and log management that grows complicated.
Solution Steps Overview
| Step | Content | Success Criteria |
|---|---|---|
| 1 | Explicit dependency configuration | Order guarantee with other services |
| 2 | Resource limitation implementation | Memory/CPU usage control |
| 3 | Auto-recovery and log management setup | Automatic recovery behavior during failures |
Step 1: Explicit Dependency Configuration
An implementation using a typical web application + database pattern:
[Unit]
Description=Web Application Service
After=network.target postgresql.service
Wants=postgresql.service
Requires=network.target
[Service]
Type=forking
User=webapp
Group=webapp
ExecStart=/opt/webapp/bin/start.sh
ExecStop=/opt/webapp/bin/stop.sh
PIDFile=/run/webapp/webapp.pid
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
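Key Configuration Explanation:
- After=: Start after the listed units (ordering only; it does not pull them in)
- Wants=: Attempt to start the dependency, and continue even if it fails
- Requires=: Hard dependency; combined with After=, this service does not start if the dependency fails to start, and it is stopped when the dependency is stopped explicitly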
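Because ordering (After=) and requirement (Wants=/Requires=) are independent, a unit can pull in a dependency without being ordered after it. The following is a minimal sketch of a lint check for that case; `missing_after` is a hypothetical helper, not a systemd tool, and it assumes single-line directives in one unit file with no drop-ins.

```shell
#!/bin/sh
# Hypothetical lint helper (not a systemd tool): print every unit that is
# pulled in by Wants=/Requires= but has no matching After= entry.
# Simplification: reads one unit file only and ignores drop-in overrides.
missing_after() {
    unit_file=$1
    # Collect all After= values into one space-separated string.
    after=$(sed -n 's/^After=//p' "$unit_file" | tr '\n' ' ')
    # Collect the units named in Wants= and Requires=.
    deps=$(sed -n -e 's/^Wants=//p' -e 's/^Requires=//p' "$unit_file")
    for dep in $deps; do
        case " $after " in
            *" $dep "*) ;;        # ordered after the dependency
            *) echo "$dep" ;;     # pulled in, but start order is undefined
        esac
    done
}
```

Running `missing_after` against the unit file above should print nothing, since both dependencies are also listed in After=. On a live host, `systemd-analyze verify` and `systemctl list-dependencies` give a fuller picture of the resolved dependency tree.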
Step 2: Resource Limitation Implementation
Limits that contain the damage from memory leaks and CPU overuse:
[Unit]
Description=Resource-Limited Web Service
After=network.target
[Service]
Type=simple
User=webapp
ExecStart=/opt/webapp/app
Restart=always
RestartSec=5
# Resource limits (MemoryMax= replaces the deprecated MemoryLimit=)
MemoryMax=512M
CPUQuota=50%
TasksMax=100
# Security hardening
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ReadWritePaths=/var/log/webapp /var/lib/webapp
[Install]
WantedBy=multi-user.target
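Effect: When the service's processes exceed 512 MB in total, the kernel OOM killer terminates processes inside this unit's cgroup only, and Restart=always brings the service back, so the rest of the system stays stable.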
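Which memory directive takes effect depends on the cgroup hierarchy in use: MemoryMax= is the cgroup v2 setting, while MemoryLimit= is the deprecated cgroup v1 name. A small sketch for checking the hierarchy before choosing a directive; `cgroup_version` is a hypothetical helper, and the mapping assumes the common layout where `/sys/fs/cgroup` is `cgroup2fs` on v2 and `tmpfs` on v1 or hybrid setups.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type mounted at /sys/fs/cgroup
# to a cgroup hierarchy version.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;        # unified hierarchy: use MemoryMax=
        tmpfs)     echo "v1" ;;        # legacy or hybrid hierarchy
        *)         echo "unknown" ;;
    esac
}

# GNU stat prints the filesystem type; fall back to "none" if the call fails.
fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo none)
echo "cgroup hierarchy: $(cgroup_version "$fs_type")"
```

Once the unit is running, `systemctl show -p MemoryMax -p TasksMax <unit>` reports the values systemd actually applied.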
Step 3: Auto-Recovery and Log Management Setup
Automatic recovery after failures and structured log output:
[Unit]
Description=Self-Healing API Service
After=network.target
# Start rate limiting belongs in [Unit]: give up after 5 starts within 300 seconds
StartLimitIntervalSec=300
StartLimitBurst=5
[Service]
Type=simple
User=apiuser
ExecStart=/usr/local/bin/api-server
StandardOutput=journal
StandardError=journal
SyslogIdentifier=api-service
# Auto-recovery configuration
Restart=on-failure
RestartSec=10
# Environment variables for log level control
Environment=LOG_LEVEL=INFO
Environment=LOG_FORMAT=json
[Install]
WantedBy=multi-user.target
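The restart delay and the start limit interact: if RestartSec= multiplied by StartLimitBurst= is not smaller than StartLimitIntervalSec=, a crash-looping service never trips the limit and restarts indefinitely. The sketch below is a rough estimate under the assumption that the service fails immediately on every start; `start_limit_reachable` is a hypothetical helper.

```shell
#!/bin/sh
# Rough estimate: if the service crashes immediately on every start, start
# attempts happen RestartSec seconds apart, so attempt number burst+1 comes
# burst * restart_sec seconds after the first one. The limit can only trip
# if that moment still falls inside the rate-limit interval.
start_limit_reachable() {
    restart_sec=$1 burst=$2 interval=$3
    [ $((burst * restart_sec)) -lt "$interval" ]
}

# Values from the unit above: RestartSec=10, StartLimitBurst=5, StartLimitIntervalSec=300
if start_limit_reachable 10 5 300; then
    echo "limit reachable: a crash loop stops after about $((5 * 10)) seconds"
else
    echo "limit unreachable: the service would restart forever"
fi
```

After the limit trips, the unit stays in the failed state until an operator clears it, for example with `systemctl reset-failed api-service` followed by a manual start (assuming the unit file is named `api-service.service`).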
Log Check Commands:
# Real-time monitoring
journalctl -u api-service -f
# Structured log search (-o json prints one record per line, so grep returns whole entries)
journalctl -u api-service -o json | grep ERROR
Common Pitfalls and Solutions
| Symptom | Cause | Immediate Solution |
|---|---|---|
| Service startup failure | Dependency service not started | Add the dependency to After= and to Wants= or Requires= |
| Killed due to memory shortage | Resource limits not set | Add a MemoryMax= setting |
| Logs not found | Output not reaching the journal (for example, the application logs to a file) | Log to stdout/stderr and set StandardOutput=journal |
Advanced Operational Patterns (For Large-Scale Environments)
### Multi-Instance Management
# /etc/systemd/system/webapp@.service
[Unit]
Description=Web App Instance %i
After=network.target
[Service]
Type=simple
User=webapp
ExecStart=/opt/webapp/start.sh %i
Environment=INSTANCE_ID=%i
PrivateTmp=yes
[Install]
WantedBy=multi-user.target
# Start individual instances
systemctl start webapp@1.service webapp@2.service
systemctl enable webapp@{1..3}.service
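The `{1..3}` brace expansion above is a bash feature and is not available in plain POSIX `sh`. A portable sketch for generating the same instance names; `instance_units` is a hypothetical helper name.

```shell
#!/bin/sh
# Print template instance names, one per line:
# <name>@1.service .. <name>@<count>.service
instance_units() {
    name=$1 count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "${name}@${i}.service"
        i=$((i + 1))
    done
}

instance_units webapp 3
```

The output can be passed straight to systemctl, for example `systemctl enable --now $(instance_units webapp 3)`.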
### Conditional Startup
[Unit]
Description=Production Only Service
ConditionPathExists=/etc/production.flag
ConditionKernelCommandLine=!rescue
[Service]
ExecStart=/opt/service/prod-service
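When a Condition*= check is not met, systemd skips the unit instead of marking it failed, which can make a missing flag file easy to overlook. The following sketch mirrors the two conditions above so they can be checked by hand before deployment; `conditions_met` is a hypothetical helper, and its optional arguments exist only so the check can be tried against other files.

```shell
#!/bin/sh
# Hypothetical pre-flight check mirroring the two Condition*= lines.
# Arguments default to the real paths; pass other files to experiment.
conditions_met() {
    flag=${1:-/etc/production.flag}
    cmdline=${2:-/proc/cmdline}
    # ConditionPathExists=/etc/production.flag
    [ -e "$flag" ] || return 1
    # ConditionKernelCommandLine=!rescue: fail if "rescue" appears as a
    # word or as the left-hand side of an assignment.
    for word in $(cat "$cmdline"); do
        case "$word" in
            rescue|rescue=*) return 1 ;;
        esac
    done
    return 0
}

if conditions_met; then
    echo "conditions met: the unit would start"
else
    echo "conditions not met: the unit would be skipped"
fi
```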