Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
940 views
in Technique[技术] by (71.8m points)

arrays - How does PHP memory actually work

I've always heard and searched for new php 'good writing practice', for example: It's better (for performance) to check if array key exists than search in array, but also it seems better for memory too:

Assuming we have:

$array = array
(
    'one'   => 1,
    'two'   => 2,
    'three' => 3,
    'four'  => 4,
);

this allocates 1040 bytes of memory,

and

$array = array
(
    1 => 'one',
    2 => 'two',
    3 => 'three',
    4 => 'four',
);

requires 1136 bytes

I understand that the key and value surely will have different storing mechanism, but please can you actually point me to the principle how does it work?

Example 2 (for @teuneboon):

$array = array
(
    'one'   => '1',
    'two'   => '2',
    'three' => '3',
    'four'  => '4',
);

1168 bytes

$array = array
(
    '1' => 'one',
    '2' => 'two',
    '3' => 'three',
    '4' => 'four',
);

1136 bytes

consuming same memory:

  • 4 => 'four',
  • '4' => 'four',
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Note, answer below is applicable for PHP prior to version 7 as in PHP 7 major changes were introduced which also involve values structures.

TL;DR

Your question is not actually about "how memory works in PHP" (here, I assume, you meant "memory allocation"), but about "how arrays work in PHP" - and these two questions are different. To summarize what's written below:

  • PHP arrays aren't "arrays" in classical sense. They are hash-maps
  • Hash-map for PHP array has specific structure and uses many additional storage things, such as internal links pointers
  • Hash-map items for PHP hash-map also use additional fields to store information. And - yes, not only string/integer keys matters, but also what are strings themselves, which are used for your keys.
  • Option with string keys in your case will "win" in terms of memory amount because both options will be hashed into ulong (unsigned long) keys hash-map, so real difference will be in values, where string-keys option has integer (fixed-length) values, while integer-keys option has strings (chars-dependent length) values. But that may not always will be true due to possible collisions.
  • "String-numeric" keys, such as '4', will be treated as integer keys and translated into integer hash result as it was integer key. Thus, '4'=>'foo' and 4 => 'foo' are same things.

Also, important note: the graphics here are copyright of PHP internals book

Hash-map for PHP arrays

PHP arrays and C arrays

You should realize one very important thing: PHP is written on C, where such things as "associative array" simply does not exist. So, in C "array" is exactly what "array" is - i.e. it's just a consecutive area in memory which can be accessed by a consecutive offset. Your "keys" may be only numeric, integer and only consecutive, starting from zero. You can't have, for instance, 3,-6,'foo' as your "keys" there.

So to implement arrays, which are in PHP, there's hash-map option, it uses hash-function to hash your keys and transform them to integers, which can be used for C-arrays. That function, however, will never be able to create a bijection between string keys and their integer hashed results. And it's easy to understand why: because cardinality of strings set is much, much larger that cardinality of integer set. Let's illustrate with example: we'll recount all strings, up to length 10, which have only alphanumeric symbols (so, 0-9, a-z and A-Z, total 62): it's 6210 total strings possible. It's around 8.39E+17. Compare it with around 4E+9 which we have for unsigned integer (long integer, 32-bits) type and you'll get the idea - there will be collisions.

PHP hash-map keys & collisions

Now, to resolve collisions, PHP will just place items, which have same hash-function result, into one linked list. So, hash-map would not be just "list of hashed elements", but instead it will store pointers to lists of elements (each element in certain list will have same hash-function key). And this is where you have point to how it will affect memory allocation: if your array has string keys, which did not result in collisions, then no additional pointers inside those list would be needed, so memory amount will be reduced (actually, it's a very small overhead, but, since we're talking about precise memory allocation, this should be taken to account). And, same way, if your string keys will result into many collisions, then more additional pointers would be created, so total memory amount will be a bit more.

To illustrate those relations within those lists, here's a graphic:

enter image description here

Above there is how PHP will resolve collisions after applying hash-function. So one of your question parts lies here, pointers inside collision-resolution lists. Also, elements of linked lists are usually called buckets and the array, which contains pointers to heads of those lists is internally called arBuckets. Due to structure optimization (so, to make such things as element deletion, faster), real list element has two pointers, previous element and next element - but that's only will make difference in memory amount for non-collision/collision arrays little wider, but won't change concept itself.

One more list: order

To fully support arrays as they are in PHP, it's also needed to maintain order, so that is achieved with another internal list. Each element of arrays is a member of that list too. It won't make difference in terms of memory allocation, since in both options this list should be maintained, but for full picture, I'm mentioning this list. Here's the graphic:

enter image description here

In addition to pListLast and pListNext, pointers to order-list head and tail are stored. Again, it's not directly related to your question, but further I'll dump internal bucket structure, where these pointers are present.

Array element from inside

Now we're ready to look into: what is array element, so, bucket:

typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char *arKey;
} Bucket;

Here we are:

  • h is an integer (ulong) value of key, it's a result of hash-function. For integer keys it is just same as key itself (hash-function returns itself)
  • pNext / pLast are pointers inside collision-resolution linked list
  • pListNext/pListLast are pointers inside order-resolution linked list
  • pData is a pointer to the stored value. Actually, value isn't same as inserted at array creation, it's copy, but, to avoid unnecessary overhead, PHP uses pDataPtr (so pData = &pDataPtr)

From this viewpoint, you may get next thing to where difference is: since string key will be hashed (thus, h is always ulong and, therefore, same size), it will be a matter of what is stored in values. So for your string-keys array there will be integer values, while for integer-keys array there will be string values, and that makes difference. However - no, it isn't a magic: you can't "save memory" with storing string keys such way all the times, because if your keys would be large and there will be many of them, it will cause collisions overhead (well, with very high probability, but, of course, not guaranteed). It will "work" only for arbitrary short strings, which won't cause many collisions.

Hash-table itself

It's already been spoken about elements (buckets) and their structure, but there's also hash-table itself, which is, in fact, array data-structure. So, it's called _hashtable:

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

I won't describe all the fields, since I've already provided much info, which is only related to the question, but I'll describe this structure briefly:

  • arBuckets is what was described above, the buckets storage,
  • pListHead/pListTail are pointers to order-resolution list
  • nTableSize determines size of hash-table. And this is directly related to memory allocation: nTableSize is always power of 2. Thus, it's no matter if you'll have 13 or 14 elements in array: actual size will be 16. Take that to account when you want to estimate array size.

Conclusion

It's really difficult to predict, will one array be larger than another in your case. Yes, there are guidelines which are following from internal structure, but if string keys are comparable by their length to integer values (like 'four', 'one' in your sample) - real difference will be in such things as - how many collisions occurred, how many bytes were allocated to save the value.

But choosing proper structure should be matter of sense, not memory. If your intention is to build the corresponding indexed data, then choice always be obvious. Post above is only about one goal: to show how arrays actually work in PHP and where you can find the difference in memory allocation in your sample.

You may also check article about arrays & hash-tables in PHP: it's Hash-tables in PHP by PHP internals book: I've used some graphics from there. Also, to realize, how values are allocated in PHP, check zval Structure article, it may help you to understand, what will be differences between strings & integers allocation for values of your arrays. I didn't include explanations from it here, since much more important point for me - is to show array data structure and what may be difference in context of string keys/integer keys for your question.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...