check celery is alive
15 Jan 2015
Celery is a brilliant piece of software, especially if you need to distribute and coordinate processing across nodes.
So I have multiple Celery instances running across many nodes; however, sometimes some of them simply stop picking up work from the AMQP server. It’s rare, but I want a way to positively check that Celery is functioning.
So to address this, I’ve created the script below, which simply calls Celery’s built-in ‘ping’ functionality to check that the nodes are responding. This doesn’t catch every case that could cause a Celery node to stop, but it’s a good start.
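    #!/bin/bash
    # Ping the local Celery worker and restart celeryd if it does not answer.
    WORKER="worker"
    NODE="$WORKER@$(hostname -f)"

    cd /var/sites/oversight || exit 1

    echo "Checking $NODE alive" | logger
    # Ask only this node to reply, waiting up to 30 seconds.
    if python manage.py celery inspect ping -t 30 -d "$NODE"; then
        echo "Node Responding"
    else
        echo "Restarting Celeryd due to missed ping" | logger
        # Give the restart 240 seconds, then SIGKILL it.
        if ! timeout -s 9 240s /etc/init.d/celeryd restart; then
            echo "Unable to restart, killall" | logger
            # Last resort: this kills every python process on the host.
            killall -9 python
            echo "Final restart" | logger
            /etc/init.d/celeryd restart
        fi
    fi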
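A single missed ping may be transient (a brief network or broker hiccup, say), so it can be worth retrying the check a few times before restarting anything. The following is only a sketch of that idea, not part of the script above: `retry_check` and its arguments are hypothetical names introduced here for illustration, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, pausing
# DELAY seconds between tries. Returns 0 as soon as one try succeeds,
# 1 if every try fails.
# Usage: retry_check ATTEMPTS DELAY COMMAND [ARGS...]
retry_check() {
    rc_attempts=$1
    rc_delay=$2
    shift 2
    rc_try=1
    while [ "$rc_try" -le "$rc_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$rc_try" -lt "$rc_attempts" ]; then
            sleep "$rc_delay"
        fi
        rc_try=$((rc_try + 1))
    done
    return 1
}

# Possible use in the script above (values are assumptions):
#   retry_check 3 10 python manage.py celery inspect ping -t 30 -d "$WORKER@$(hostname -f)"
```

With this in place, the restart branch would only run once all attempts have failed, at the cost of a slower reaction to a worker that is genuinely stuck.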