Storm and Kafka – parallelism is not magic

This came up when we were trying to get the most out of the parallelism factor in our Storm topology. From reading the docs and learning Storm, I had assumed that increasing the parallelism factor would give us more throughput.

We had a sample Storm topology that used a Kafka spout as its input feed. But when we increased the parallelism factor beyond 1, we didn't see much gain in throughput.

This led me to a discussion on the Storm group, which quotes Nathan Marz as saying:

The maximum parallelism you can have on a KafkaSpout is the number of partitions.

Any spout instances beyond the number of Kafka partitions for the topic we are subscribing to won't read any data.
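To see why the extra spout instances sit idle, here is a small sketch in plain Python. It does not use Storm or Kafka at all; it just models partitions being handed out to spout tasks round-robin, which is an assumption made for illustration and not necessarily how every KafkaSpout version assigns them. The names `assign_partitions` and `idle_tasks` are made up for this example.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_spout_tasks: int) -> Dict[int, List[int]]:
    """Hand out partition ids to spout tasks round-robin.

    Illustrative model only (assumed assignment scheme): task t gets
    partitions t, t + num_spout_tasks, t + 2 * num_spout_tasks, ...
    """
    return {
        task: list(range(task, num_partitions, num_spout_tasks))
        for task in range(num_spout_tasks)
    }


def idle_tasks(assignment: Dict[int, List[int]]) -> List[int]:
    """Return the spout tasks that ended up with no partition to read."""
    return [task for task, partitions in assignment.items() if not partitions]


if __name__ == "__main__":
    # A topic with 4 partitions, read by a spout with parallelism 6.
    assignment = assign_partitions(num_partitions=4, num_spout_tasks=6)
    for task, partitions in assignment.items():
        print(f"spout task {task}: partitions {partitions}")
    print("idle tasks:", idle_tasks(assignment))
```

With 4 partitions and 6 spout tasks, tasks 4 and 5 get nothing to read, so the effective parallelism stays at 4. Going the other way (4 partitions, 2 tasks), every task is busy and each one reads two partitions.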

So if you want to get the most out of Storm's parallelism factor, make sure the Kafka topic you are subscribing to has at least that many partitions. 🙂