You have two problems. First, you want to make sure that you assign exactly one id to each token. To do that, sort and group the records by token and make the assignment in a reducer. Once you've made sure that the reduce method is called exactly once for each token, you can build the id from two parts: the partition number taken from the context, and a numeric counter maintained by the reducer (there is one reducer instance per partition). For the counter, just use an instance variable initialized to 1 in the setup method and incremented in the reduce method. The counter is unique only within its partition, so it is the pair (partition number, counter) that is unique across the whole job.
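As a rough sketch of the counter logic, here is a plain-Java stand-in for the reducer, with no Hadoop classes so that it runs on its own. The class name `TokenIdAssigner`, the `partition-counter` string format, and the way the partition number is passed in are all assumptions made for illustration. In a real job the partition number would come from the context (for example via `context.getTaskAttemptID().getTaskID().getId()`), and `setup`/`reduce` would be the overridden `Reducer` methods.

```java
// Hypothetical stand-in for a Hadoop Reducer; it only mimics the lifecycle
// (setup once per partition, reduce once per distinct token).
class TokenIdAssigner {
    private final int partition; // assumed to be read from the task context in a real job
    private long nextId;         // per-instance counter, one instance per partition

    TokenIdAssigner(int partition) {
        this.partition = partition;
    }

    // Mirrors Reducer.setup(): runs once before any reduce call.
    void setup() {
        nextId = 1;
    }

    // Mirrors Reducer.reduce(): called exactly once per token, because the
    // records were sorted and grouped by token before reaching the reducer.
    String reduce(String token) {
        String id = partition + "-" + nextId; // unique across partitions
        nextId++;
        return id;
    }

    public static void main(String[] args) {
        TokenIdAssigner reducer = new TokenIdAssigner(3);
        reducer.setup();
        for (String token : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(token + " -> " + reducer.reduce(token));
        }
    }
}
```

Two reducers running on different partitions can both hand out counter value 1, but the partition prefix keeps the resulting ids distinct.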