Storm and Kafka – parallelism is not magic

This came up while we were trying to get the most out of the parallelism factor in our Storm topology. After going through the docs, I had it in my head that increasing the parallelism factor would simply give us more throughput out of Storm.

We had a sample Storm topology that used a Kafka spout as its input feed. But when we increased the parallelism factor beyond 1, we didn't get much of a gain in throughput from our Storm execution.

This led me to this Storm group discussion:

which quotes Nathan Marz saying

The maximum parallelism you can have on a KafkaSpout is the number of partitions.

So any spout instances beyond the number of Kafka partitions for the topic we are subscribing to won't read any data.
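To see why the extra instances sit idle, here is a small stand-alone sketch in plain Java. It is not the actual KafkaSpout code: the class `PartitionAssignment`, its methods, and the round-robin rule are all made up for illustration. The only thing it assumes about the real spout is that each partition is read by exactly one spout task, which is what the quote above implies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Storm or Kafka.
public class PartitionAssignment {

    // Hand out partition ids 0..numPartitions-1 to spout tasks round-robin,
    // so that every partition is owned by exactly one task.
    static List<List<Integer>> assign(int numPartitions, int numSpoutTasks) {
        List<List<Integer>> perTask = new ArrayList<>();
        for (int t = 0; t < numSpoutTasks; t++) {
            perTask.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            perTask.get(p % numSpoutTasks).add(p);
        }
        return perTask;
    }

    // Count the tasks that ended up with no partition to read from.
    static long idleTasks(int numPartitions, int numSpoutTasks) {
        return assign(numPartitions, numSpoutTasks).stream()
                .filter(List::isEmpty)
                .count();
    }

    public static void main(String[] args) {
        int partitions = 4; // assumed partition count for the topic
        for (int tasks : new int[] {1, 2, 4, 6}) {
            System.out.println(tasks + " spout tasks -> " + assign(partitions, tasks)
                    + ", idle: " + idleTasks(partitions, tasks));
        }
    }
}
```

With 4 partitions, going from 1 to 4 spout tasks spreads the partitions out, but the 5th and 6th tasks get an empty list: they are started and hold resources, yet have nothing to read.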

So if you are trying to get the most out of Storm's parallelism factor, make sure the Kafka topic you are subscribing to has at least that many partitions. 🙂
