KTable VS GlobalKTable
A KTable shards the data across all running Kafka Streams instances, while a GlobalKTable holds a full copy of all data on each instance. The disadvantage of a GlobalKTable is that it obviously needs more memory. The advantage is that you can do a KStream-GlobalKTable join on a non-key attribute of the stream. For a KStream-KTable join, joining on a non-key stream attribute is only possible by extracting the join attribute and setting it as the key before the join; this results in a repartitioning step of the stream before the join can be computed.
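To make the repartitioning argument concrete, here is a minimal plain-Java model, assuming a hash-based assignment of keys to instances. It is not the Kafka Streams API: the class `PartitionedLookupDemo`, its methods, and the `floorMod(hashCode)` partitioner are hypothetical stand-ins. In the real DSL, the GlobalKTable variant corresponds to `KStream#join(GlobalKTable, KeyValueMapper, ValueJoiner)`, where the `KeyValueMapper` extracts the lookup key, and the KTable variant typically re-keys the stream with `selectKey()` before calling `join()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of table placement, NOT the Kafka Streams API.
// Class and method names are hypothetical, chosen for illustration.
class PartitionedLookupDemo {
    // Stand-in partitioner (assumption): Kafka's default partitioner hashes the
    // serialized key with murmur2; floorMod(hashCode) just keeps the sketch simple.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // KTable-like placement: each row lives only on the instance owning its key.
    static List<Map<String, String>> shard(Map<String, String> table, int instances) {
        List<Map<String, String>> shards = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            shards.add(new HashMap<>());
        }
        table.forEach((k, v) -> shards.get(partitionFor(k, instances)).put(k, v));
        return shards;
    }

    // Without repartitioning, a stream record is processed on the instance that owns
    // its *stream key*, so it can only see that instance's shard of the table.
    static String lookupWithoutRepartition(List<Map<String, String>> shards,
                                           String streamKey, String joinAttr) {
        return shards.get(partitionFor(streamKey, shards.size())).get(joinAttr);
    }

    // After re-keying by the join attribute (the repartition step), the record is
    // routed to the instance that owns the matching table row.
    static String lookupAfterRepartition(List<Map<String, String>> shards, String joinAttr) {
        return shards.get(partitionFor(joinAttr, shards.size())).get(joinAttr);
    }

    // GlobalKTable-like placement: every instance holds a full copy, so any
    // attribute of the stream record can be used as the lookup key directly.
    static String globalLookup(Map<String, String> fullCopy, String joinAttr) {
        return fullCopy.get(joinAttr);
    }

    public static void main(String[] args) {
        Map<String, String> customers = new HashMap<>();
        customers.put("c1", "Alice");
        customers.put("c2", "Bob");
        List<Map<String, String>> shards = shard(customers, 4);
        // An order keyed by its order id, joined on its customer id "c1".
        System.out.println(lookupWithoutRepartition(shards, "order-7", "c1")); // "Alice" or null
        System.out.println(lookupAfterRepartition(shards, "c1"));
        System.out.println(globalLookup(customers, "c1"));
    }
}
```

The lookup without repartitioning finds the customer only if the order's own key happens to land on the same instance as `c1`; the other two lookups always succeed, at the cost of a repartition step or a full local copy, respectively.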
Note, though, that there is also a semantic difference: for a KStream-KTable join, Kafka Streams aligns record processing based on record timestamps. Thus, the updates to the table are aligned with the records of your stream. For a GlobalKTable, there is no time synchronization, so updates to the GlobalKTable are completely decoupled from the processing of the stream records (that is, you get weaker semantics).
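The timing difference can be illustrated with a simplified model, assuming one table key and an in-memory list of timestamped records. `TimeSyncDemo` and `Event` are hypothetical names, and the `decoupled` method shows only one possible interleaving (the global table is already ahead of the stream). In a real deployment, the timestamp alignment Kafka Streams does for stream-table joins is best effort rather than a strict global sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of join-time semantics, NOT how Kafka Streams is implemented.
class TimeSyncDemo {
    // One input record: a table changelog update or a stream record to be joined.
    static final class Event {
        final long ts;
        final String key;
        final String value;
        final boolean isTableUpdate;

        Event(long ts, String key, String value, boolean isTableUpdate) {
            this.ts = ts;
            this.key = key;
            this.value = value;
            this.isTableUpdate = isTableUpdate;
        }
    }

    // KTable-like: process both inputs in timestamp order, so each stream record
    // joins against the table state as of its own timestamp.
    static List<String> timestampAligned(List<Event> events) {
        List<Event> ordered = new ArrayList<>(events);
        ordered.sort(Comparator.comparingLong(e -> e.ts)); // stable sort: ties keep input order
        Map<String, String> table = new HashMap<>();
        List<String> joined = new ArrayList<>();
        for (Event e : ordered) {
            if (e.isTableUpdate) {
                table.put(e.key, e.value);
            } else {
                joined.add(e.value + "->" + table.get(e.key));
            }
        }
        return joined;
    }

    // GlobalKTable-like, one possible interleaving: the table side has already
    // applied all of its updates before the stream records are processed.
    static List<String> decoupled(List<Event> events) {
        Map<String, String> table = new HashMap<>();
        for (Event e : events) {
            if (e.isTableUpdate) {
                table.put(e.key, e.value);
            }
        }
        List<String> joined = new ArrayList<>();
        for (Event e : events) {
            if (!e.isTableUpdate) {
                joined.add(e.value + "->" + table.get(e.key));
            }
        }
        return joined;
    }
}
```

With a table update `A=v1` at timestamp 1, `A=v2` at timestamp 10, and a stream record `order` for key `A` at timestamp 5, `timestampAligned` yields `order->v1`, while `decoupled` yields `order->v2`: the stream record observes a table update from its "future".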
For further details, see KIP-99: Add Global Tables to Kafka Streams.
leftJoin() VS outerJoin()
About left and outer joins: they work like a left outer join and a full outer join in a database, respectively.
For a left outer join, you might "lose" records from your right input stream if they have no matching record on the left-hand side.
For a (full) outer join, no data is dropped: every input record from both streams appears in the result stream.
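The database analogy can be sketched in plain Java. This is a non-windowed, table-style model with hypothetical names (`JoinKindsDemo` and its static `leftJoin`/`outerJoin` helpers), not the Kafka Streams API; the real `KStream#leftJoin()` and `KStream#outerJoin()` for stream-stream joins are windowed. Each side is modeled as a map with unique keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Database-style, non-windowed model of the two join kinds, NOT the Kafka Streams API.
class JoinKindsDemo {
    // Left join: every left record is emitted; the right value is null when there
    // is no match. Right records without a left match are dropped.
    static List<String> leftJoin(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            out.add(e.getKey() + ":" + e.getValue() + "," + right.get(e.getKey()));
        }
        return out;
    }

    // Full outer join: the left-join result plus every right record that had no
    // left match, with null on the left side. No input record is dropped.
    static List<String> outerJoin(Map<String, String> left, Map<String, String> right) {
        List<String> out = leftJoin(left, right);
        for (Map.Entry<String, String> e : right.entrySet()) {
            if (!left.containsKey(e.getKey())) {
                out.add(e.getKey() + ":null," + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> left = new LinkedHashMap<>();
        left.put("k1", "L1");
        left.put("k2", "L2");
        Map<String, String> right = new LinkedHashMap<>();
        right.put("k2", "R2");
        right.put("k3", "R3");
        System.out.println(leftJoin(left, right));  // [k1:L1,null, k2:L2,R2]
        System.out.println(outerJoin(left, right)); // [k1:L1,null, k2:L2,R2, k3:null,R3]
    }
}
```

The right-side record `k3` has no left match, so it disappears from the left join but survives in the outer join.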