I have an Elastic Beanstalk worker that can only run one task at a time, and each task takes a while (from a few minutes to, hopefully, less than 30 minutes), so I'm queuing my tasks in an SQS queue.
In my worker configuration, I have:
HTTP connections: 1
Visibility timeout: 3600
Error visibility timeout: 300
(Under "Advanced")
Inactivity timeout: 1800
The problem is that there seems to be a 1-minute timeout (in nginx?) that overrides the "Inactivity timeout" and returns a 504 (Gateway Timeout).
This is what I find in the aws-sqsd.log file:
2016-02-03T16:16:27Z init: initializing aws-sqsd 2.0 (2015-02-18)
2016-02-03T16:16:27Z start: polling https://sqs.eu-central-1.amazonaws.com/855381918026/jitt-publisher-queue
2016-02-03T16:23:36Z message: sent to %[http://localhost:80]
2016-02-03T16:24:36Z http-err: 1444d1ba-ecb5-46f8-82d6-d0bf19b91fad (1) 504 - 60.006
2016-02-03T16:28:54Z message: sent to %[http://localhost:80]
2016-02-03T16:29:54Z http-err: 1b7514d3-689a-4e8b-a569-5ef1ac32ed0c (1) 504 - 60.029
2016-02-03T16:29:54Z message: sent to %[http://localhost:80]
2016-02-03T16:29:54Z http-err: 1444d1ba-ecb5-46f8-82d6-d0bf19b91fad (2) 500 - 0.006
2016-02-03T16:33:49Z message: sent to %[http://localhost:80]
2016-02-03T16:34:49Z http-err: 3a43e80f-a8d3-46b2-b2a0-9d898ad4f2a6 (1) 504 - 60.023
2016-02-03T16:34:54Z message: sent to %[http://localhost:80]
2016-02-03T16:34:54Z http-err: 1b7514d3-689a-4e8b-a569-5ef1ac32ed0c (2) 500 - 0.004
2016-02-03T16:34:54Z message: sent to %[http://localhost:80]
2016-02-03T16:34:54Z http-err: 1444d1ba-ecb5-46f8-82d6-d0bf19b91fad (3) 500 - 0.003
2016-02-03T16:39:49Z message: sent to %[http://localhost:80]
2016-02-03T16:40:49Z http-err: 3a43e80f-a8d3-46b2-b2a0-9d898ad4f2a6 (2) 504 - 60.019
Some things make sense here, such as the 5-minute delay between each 504/500 and the moment the message is re-sent to the worker, which matches the 300-second "Error visibility timeout" setting.
Those 500 codes match my current logic: the worker rejects a task by returning a 500 if another task is still running.
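The "reject if busy" logic described above can be sketched like this, assuming the worker handles each request synchronously and guards the long task with a non-blocking lock. SingleTaskGate and handle are hypothetical names, and the real worker's framework and language are not given in the question; this only illustrates the mechanism.

```python
import threading
from typing import Callable


class SingleTaskGate:
    """Hypothetical helper: runs at most one task at a time, rejects overlaps."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def handle(self, run_task: Callable[[], None]) -> int:
        # Non-blocking acquire: if a task is already running, reject immediately.
        if not self._lock.acquire(blocking=False):
            # The log suggests sqsd counts this as an error and retries the
            # message after the error visibility timeout.
            return 500
        try:
            run_task()  # long-running work; the HTTP request stays open meanwhile
            return 200
        finally:
            # Released even if run_task raises, so the next message can run.
            self._lock.release()
```

This also explains the mix of 504s and 500s in the log: when the proxy gives up after 60 seconds, the original task keeps running in the worker and still holds the lock, so the next delivery that arrives is rejected with a 500 until that task finishes.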
I have seen a lot of answers suggesting changing the load balancer's connection timeout, but since this is a worker pulling messages from an SQS queue, there is no load balancer.
Any idea what I should do to override that 1-minute timeout?