... can't really grasp how the checksum can be used as an identifier for the start of the packet (wouldn't the checksum change all the time?).
Yes, the checksum would change since it is derived from the data.
But even a fixed-value start-of-packet nibble would not, by itself, be enough to identify (or verify) packets when you first start reading the stream. Because this is binary data rather than text, any data nibble can take on the same value as a fixed start-of-packet marker, so a simple scan for that start nibble could easily mistake a data nibble for the start of a packet.
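To make the false-start problem concrete, here is a small sketch. It assumes, purely for illustration, a fixed marker value of `0xA` in the high nibble of the first byte of each 5-byte frame; neither the marker value nor the helper name `naive_starts` comes from the question. In random binary data any byte has a 1 in 16 chance of carrying that high nibble, so payload bytes will regularly look like packet starts.

```python
MARKER = 0xA      # hypothetical fixed start-of-packet nibble
FRAME_LEN = 5     # packet length in bytes

def naive_starts(stream: bytes) -> list:
    """Return every byte offset whose high nibble equals MARKER."""
    return [i for i, b in enumerate(stream) if (b >> 4) == MARKER]

# Two genuine frames; the payload of the first happens to contain 0xA3.
frame1 = bytes([0xA1, 0x22, 0xA3, 0x44, 0x55])
frame2 = bytes([0xA0, 0x10, 0x20, 0x30, 0x40])
stream = frame1 + frame2

true_starts = list(range(0, len(stream), FRAME_LEN))  # [0, 5]
print(naive_starts(stream))  # [0, 2, 5]: offset 2 is payload, not a packet start
```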
Is this a common way for identifying the start of a packet?
No, it isn't common, but given the high data rate it appears to be a scheme for minimizing packet size.
Does anyone have any idea how to solve this problem?
Initially, you will probably have to scan the stream five bytes at a time (the length of a packet frame), testing every possible starting byte.
Calculate the checksum of this frame, and compare it to the first nibble.
A match indicates that you (may) have frame alignment.
A mismatch means you should discard the first byte and test the next candidate frame, which starts at what was the second byte (i.e. shift the remaining 4 bytes down and append a new 5th byte).
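The hunting steps above can be sketched roughly as follows. The thread does not say how the checksum is computed, so `checksum` below is an assumption (the sum of every nibble except the checksum nibble itself, modulo 16); swap in the device's real algorithm. `frame_ok`, `find_alignment` and `make_frame` are hypothetical helper names. With a 4-bit checksum, a window of random bytes will still match about 1 time in 16, which is why a match only means you *may* have alignment.

```python
from typing import Optional

FRAME_LEN = 5  # bytes per packet frame

def checksum(frame: bytes) -> int:
    """Hypothetical 4-bit checksum: the sum of every nibble except the high
    nibble of byte 0 (where the checksum itself lives), modulo 16."""
    total = frame[0] & 0x0F
    for b in frame[1:FRAME_LEN]:
        total += (b >> 4) + (b & 0x0F)
    return total & 0x0F

def frame_ok(frame: bytes) -> bool:
    """True if the first nibble matches the checksum computed over the frame."""
    return len(frame) == FRAME_LEN and (frame[0] >> 4) == checksum(frame)

def find_alignment(stream: bytes, start: int = 0) -> Optional[int]:
    """Test each candidate 5-byte window, advancing one byte per mismatch,
    and return the offset of the first window whose checksum matches."""
    for i in range(start, len(stream) - FRAME_LEN + 1):
        if frame_ok(stream[i:i + FRAME_LEN]):
            return i  # a match means alignment *may* have been found
    return None

def make_frame(low_nibble: int, payload: bytes) -> bytes:
    """Hypothetical helper: build a valid frame from a low nibble and 4 payload bytes."""
    body = bytes([low_nibble & 0x0F]) + payload
    return bytes([(checksum(body) << 4) | body[0]]) + body[1:]

# Two junk bytes followed by one valid frame.
stream = b"\x07\x99" + make_frame(0x3, b"\x12\x34\x56\x78")
print(find_alignment(stream))  # 2
```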
Once frame alignment has been achieved (or assumed), you need to keep verifying the checksum of every packet, both to confirm data integrity and to confirm that alignment still holds. Any checksum error should force another hunt for correct frame alignment, starting at the 2nd byte of the current packet.
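For a live serial stream, hunting and tracking can share one loop, since both react to a bad checksum the same way: drop one byte and retry from the 2nd byte of the failed window. The sketch below assumes the same hypothetical nibble-sum checksum as before (not confirmed by the thread); `FrameSync`, `feed` and `sync_losses` are illustrative names, not an existing API.

```python
FRAME_LEN = 5  # bytes per packet frame

def checksum(frame: bytes) -> int:
    """Hypothetical 4-bit checksum (sum of all nibbles except the first, mod 16)."""
    total = frame[0] & 0x0F
    for b in frame[1:FRAME_LEN]:
        total += (b >> 4) + (b & 0x0F)
    return total & 0x0F

class FrameSync:
    """Incremental framer: hunts for alignment, then verifies every packet."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.synced = False   # True while the last window checked was valid
        self.sync_losses = 0  # checksum errors seen while aligned

    def feed(self, data: bytes) -> list:
        """Append newly received bytes; return any complete, verified frames."""
        self.buf.extend(data)
        frames = []
        while len(self.buf) >= FRAME_LEN:
            frame = bytes(self.buf[:FRAME_LEN])
            if (frame[0] >> 4) == checksum(frame):
                frames.append(frame)
                self.synced = True
                del self.buf[:FRAME_LEN]  # next frame starts right after this one
            else:
                if self.synced:
                    self.sync_losses += 1  # lost alignment or corrupted data
                self.synced = False
                del self.buf[0]  # resume the hunt at the 2nd byte
        return frames

# Usage: bytes arrive in arbitrary chunks, with two junk bytes up front.
sync = FrameSync()
for chunk in (b"\x07\x99\x73", b"\x12\x34\x56\x78\x10\x00\x00\x00\x01"):
    for f in sync.feed(chunk):
        print(f.hex())  # 7312345678, then 1000000001
```

Keeping the unconsumed bytes in a buffer means a frame split across two reads is still checked as a whole once its last byte arrives.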