vrandom yet another random IT blog

Check Celery is alive

Overview

Celery is a brilliant piece of software, especially if you need to distribute and coordinate processing across nodes.

Problem

So, I have multiple Celery instances running across many nodes, but occasionally some of them simply stop picking up work from the AMQP server. It's rare, but I want a way to positively check that Celery is functioning.

Solution

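To address this, I've created the script below, which simply calls Celery's built-in 'ping' command to confirm the local node is responding. If the worker doesn't reply within 30 seconds, the script restarts celeryd, and if the init script fails or hangs for more than 240 seconds, it kills all Python processes and restarts once more. This doesn't catch every failure that could stop a Celery node, but it's a good start.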

#!/bin/bash
# Ping the local Celery worker and restart celeryd if it does not answer.
WORKER="worker"
NODE="$WORKER@$(hostname -f)"

cd /var/sites/oversight || exit 1
echo "Checking $NODE alive" | logger

if python manage.py celery inspect ping -t 30 -d "$NODE"; then
	echo "Node responding" | logger
else
	echo "Restarting celeryd due to missed ping" | logger
	# Give the init script 240s, then send SIGKILL (9) if it hangs
	if ! timeout -s 9 240s /etc/init.d/celeryd restart; then
		echo "Unable to restart, killall" | logger
		# Note: this kills every python process on the host, not just Celery
		killall -9 python
		echo "Final restart" | logger
		/etc/init.d/celeryd restart
	fi
fi
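To sanity-check the restart logic without a live broker, the same check-then-restart flow as the script above can be pulled into a function that takes the ping, restart and fallback commands as arguments. This is a minimal sketch: the function name "celery_watchdog", its argument order and the "responding"/"restarted"/"fallback" messages are all hypothetical, and the commands are run with eval, so they should only ever come from a trusted source such as the script itself.

```shell
# Hypothetical helper: the check-then-restart flow, with each step passed
# in as a command string so it can be dry-run with stand-ins like `true`
# and `false` instead of a real Celery node.
celery_watchdog() {
	local ping_cmd=$1 restart_cmd=$2 fallback_cmd=$3

	# Step 1: ping the worker; a zero exit status means it answered
	if eval "$ping_cmd"; then
		echo "responding"
		return 0
	fi

	# Step 2: ping failed, so try a normal restart first
	if eval "$restart_cmd"; then
		echo "restarted"
		return 0
	fi

	# Step 3: normal restart failed or timed out, run the heavy-handed fallback
	eval "$fallback_cmd"
	echo "fallback"
	# Non-zero so a caller can tell a forced recovery happened
	return 1
}
```

Wired up for real, the three arguments would be the ping command from the script ("python manage.py celery inspect ping -t 30 -d worker@$(hostname -f)"), "timeout -s 9 240s /etc/init.d/celeryd restart", and "killall -9 python; /etc/init.d/celeryd restart". Passing "true" or "false" instead lets you walk each branch without touching a running node.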