| =head1 NAME |
| |
| perlguts - Introduction to the Perl API |
| |
| =head1 DESCRIPTION |
| |
| This document attempts to describe how to use the Perl API, as well as |
| to provide some info on the basic workings of the Perl core. It is far |
| from complete and probably contains many errors. Please refer any |
| questions or comments to the author below. |
| |
| =head1 Variables |
| |
| =head2 Datatypes |
| |
| Perl has three typedefs that handle Perl's three main data types: |
| |
| SV Scalar Value |
| AV Array Value |
| HV Hash Value |
| |
| Each typedef has specific routines that manipulate the various data types. |
| |
| =head2 What is an "IV"? |
| |
| Perl uses a special typedef IV which is a simple signed integer type that is |
| guaranteed to be large enough to hold a pointer (as well as an integer). |
| Additionally, there is the UV, which is simply an unsigned IV. |
| |
| Perl also uses two special typedefs, I32 and I16, which will always be at |
| least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16, |
| as well.) They will usually be exactly 32 and 16 bits long, but on Crays |
| they will both be 64 bits. |
| |
| =head2 Working with SVs |
| |
| An SV can be created and loaded with one command. There are five types of |
| values that can be loaded: an integer value (IV), an unsigned integer |
| value (UV), a double (NV), a string (PV), and another scalar (SV). |
| |
| The seven routines are: |
| |
| SV* newSViv(IV); |
| SV* newSVuv(UV); |
| SV* newSVnv(double); |
| SV* newSVpv(const char*, STRLEN); |
| SV* newSVpvn(const char*, STRLEN); |
| SV* newSVpvf(const char*, ...); |
| SV* newSVsv(SV*); |
| |
| C<STRLEN> is an integer type (Size_t, usually defined as size_t in |
| F<config.h>) guaranteed to be large enough to represent the size of |
| any string that perl can handle. |
| |
| In the unlikely case of an SV requiring more complex initialisation, you |
| can create an empty SV with newSV(len). If C<len> is 0 an empty SV of |
| type NULL is returned, else an SV of type PV is returned with len + 1 (for |
| the NUL) bytes of storage allocated, accessible via SvPVX. In both cases |
| the SV has the undef value. |
| |
| SV *sv = newSV(0); /* no storage allocated */ |
| SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage |
| * allocated */ |
| |
| To change the value of an I<already-existing> SV, there are eight routines: |
| |
| void sv_setiv(SV*, IV); |
| void sv_setuv(SV*, UV); |
| void sv_setnv(SV*, double); |
| void sv_setpv(SV*, const char*); |
| void sv_setpvn(SV*, const char*, STRLEN); |
| void sv_setpvf(SV*, const char*, ...); |
| void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, |
| SV **, I32, bool *); |
| void sv_setsv(SV*, SV*); |
| |
| Notice that you can choose to specify the length of the string to be |
| assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may |
| allow Perl to calculate the length by using C<sv_setpv> or by specifying |
| 0 as the second argument to C<newSVpv>. Be warned, though, that Perl will |
| determine the string's length by using C<strlen>, which depends on the |
| string terminating with a NUL character, and not otherwise containing |
| NULs. |
| |
| The arguments of C<sv_setpvf> are processed like C<sprintf>, and the |
| formatted output becomes the value. |
| |
| C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify |
| either a pointer to a variable argument list or the address and length of |
| an array of SVs. The last argument points to a boolean; on return, if that |
| boolean is true, then locale-specific information has been used to format |
| the string, and the string's contents are therefore untrustworthy (see |
| L<perlsec>). This pointer may be NULL if that information is not |
| important. Note that this function requires you to specify the length of |
| the format. |
| |
| The C<sv_set*()> functions are not generic enough to operate on values |
| that have "magic". See L<Magic Virtual Tables> later in this document. |
| |
| All SVs that contain strings should be terminated with a NUL character. |
| If a string is not NUL-terminated, there is a risk of |
| core dumps and corruptions from code which passes the string to C |
| functions or system calls which expect a NUL-terminated string. |
| Perl's own functions typically add a trailing NUL for this reason. |
| Nevertheless, you should be very careful when you pass a string stored |
| in an SV to a C function or system call. |
| |
| To access the actual value that an SV points to, you can use the macros: |
| |
| SvIV(SV*) |
| SvUV(SV*) |
| SvNV(SV*) |
| SvPV(SV*, STRLEN len) |
| SvPV_nolen(SV*) |
| |
| which will automatically coerce the actual scalar type into an IV, UV, double, |
| or string. |
| |
| In the C<SvPV> macro, the length of the string returned is placed into the |
| variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do |
| not care what the length of the data is, use the C<SvPV_nolen> macro. |
| Historically the C<SvPV> macro with the global variable C<PL_na> has been |
| used in this case. But that can be quite inefficient because C<PL_na> must |
| be accessed in thread-local storage in threaded Perl. In any case, remember |
| that Perl allows arbitrary strings of data that may both contain NULs and |
| might not be terminated by a NUL. |
| |
| Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len), |
| len);>. It might work with your compiler, but it won't work for everyone. |
| Break this sort of statement up into separate assignments: |
| |
| SV *s; |
| STRLEN len; |
| char *ptr; |
| ptr = SvPV(s, len); |
| foo(ptr, len); |
| |
| If you want to know if the scalar value is TRUE, you can use: |
| |
| SvTRUE(SV*) |
| |
| Although Perl will automatically grow strings for you, if you need to force |
| Perl to allocate more memory for your SV, you can use the macro |
| |
| SvGROW(SV*, STRLEN newlen) |
| |
| which will determine if more memory needs to be allocated. If so, it will |
| call the function C<sv_grow>. Note that C<SvGROW> can only increase, not |
| decrease, the allocated memory of an SV and that it does not automatically |
| add space for the trailing NUL byte (perl's own string functions typically do |
| C<SvGROW(sv, len + 1)>). |
| |
| If you have an SV and want to know what kind of data Perl thinks is stored |
| in it, you can use the following macros to check the type of SV you have. |
| |
| SvIOK(SV*) |
| SvNOK(SV*) |
| SvPOK(SV*) |
| |
| You can get and set the current length of the string stored in an SV with |
| the following macros: |
| |
| SvCUR(SV*) |
| SvCUR_set(SV*, I32 val) |
| |
| You can also get a pointer to the end of the string stored in the SV |
| with the macro: |
| |
| SvEND(SV*) |
| |
| But note that these last three macros are valid only if C<SvPOK()> is true. |
| |
| If you want to append something to the end of string stored in an C<SV*>, |
| you can use the following functions: |
| |
| void sv_catpv(SV*, const char*); |
| void sv_catpvn(SV*, const char*, STRLEN); |
| void sv_catpvf(SV*, const char*, ...); |
| void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, |
| I32, bool *); |
| void sv_catsv(SV*, SV*); |
| |
| The first function calculates the length of the string to be appended by |
| using C<strlen>. In the second, you specify the length of the string |
| yourself. The third function processes its arguments like C<sprintf> and |
| appends the formatted output. The fourth function works like C<vsprintf>. |
| You can specify the address and length of an array of SVs instead of the |
| va_list argument. The fifth function extends the string stored in the first |
| SV with the string stored in the second SV. It also forces the second SV |
| to be interpreted as a string. |
| |
| The C<sv_cat*()> functions are not generic enough to operate on values that |
| have "magic". See L<Magic Virtual Tables> later in this document. |
| |
| If you know the name of a scalar variable, you can get a pointer to its SV |
| by using the following: |
| |
| SV* get_sv("package::varname", 0); |
| |
| This returns NULL if the variable does not exist. |
| |
| If you want to know if this variable (or any other SV) is actually C<defined>, |
| you can call: |
| |
| SvOK(SV*) |
| |
| The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. |
| |
| Its address can be used whenever an C<SV*> is needed. Make sure that |
| you don't try to compare a random sv with C<&PL_sv_undef>. For example |
| when interfacing Perl code, it'll work correctly for: |
| |
| foo(undef); |
| |
| But won't work when called as: |
| |
| $x = undef; |
| foo($x); |
| |
| So, to repeat, always use SvOK() to check whether an sv is defined. |
| |
| Also you have to be careful when using C<&PL_sv_undef> as a value in |
| AVs or HVs (see L<AVs, HVs and undefined values>). |
| |
| There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain |
| boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their |
| addresses can be used whenever an C<SV*> is needed. |
| |
| Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>. |
| Take this code: |
| |
| SV* sv = (SV*) 0; |
| if (I-am-to-return-a-real-value) { |
| sv = sv_2mortal(newSViv(42)); |
| } |
| sv_setsv(ST(0), sv); |
| |
| This code tries to return a new SV (which contains the value 42) if it should |
| return a real value, or undef otherwise. Instead it has returned a NULL |
| pointer which, somewhere down the line, will cause a segmentation violation, |
| bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the |
| first line and all will be well. |
| |
| To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this |
| call is not necessary (see L<Reference Counts and Mortality>). |
| |
| =head2 Offsets |
| |
| Perl provides the function C<sv_chop> to efficiently remove characters |
| from the beginning of a string; you give it an SV and a pointer to |
| somewhere inside the PV, and it discards everything before the |
| pointer. The efficiency comes by means of a little hack: instead of |
| actually removing the characters, C<sv_chop> sets the flag C<OOK> |
| (offset OK) to signal to other functions that the offset hack is in |
| effect, and it puts the number of bytes chopped off into the IV field |
| of the SV. It then moves the PV pointer (called C<SvPVX>) forward that |
| many bytes, and adjusts C<SvCUR> and C<SvLEN>. |
| |
| Hence, at this point, the start of the buffer that we allocated lives |
| at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing |
| into the middle of this allocated storage. |
| |
| This is best demonstrated by example: |
| |
| % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)' |
| SV = PVIV(0x8128450) at 0x81340f0 |
| REFCNT = 1 |
| FLAGS = (POK,OOK,pPOK) |
| IV = 1 (OFFSET) |
| PV = 0x8135781 ( "1" . ) "2345"\0 |
| CUR = 4 |
| LEN = 5 |
| |
| Here the number of bytes chopped off (1) is put into IV, and |
| C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The |
| portion of the string between the "real" and the "fake" beginnings is |
| shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect |
| the fake beginning, not the real one. |
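| |
| The bookkeeping can be mimicked in plain C. The following is only a toy |
| model, assuming a simplified structure: C<ToyPV>, C<toy_pv_new>, |
| C<toy_pv_chop> and C<toy_pv_free> are hypothetical names that do not exist |
| in perl, and the real SV layout is different. It shows how chopping moves |
| the visible pointer while the offset remembers where the allocation began. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the string part of an SV; not perl's real layout. */
typedef struct {
    char  *pvx;     /* visible start of the string (role of SvPVX) */
    size_t cur;     /* visible string length (role of SvCUR) */
    size_t len;     /* visible allocated size (role of SvLEN) */
    size_t offset;  /* bytes chopped off the front (role of the IV slot) */
    int    ook;     /* nonzero once the offset hack is in effect */
} ToyPV;

/* Allocate a ToyPV holding a copy of the NUL-terminated string s. */
ToyPV toy_pv_new(const char *s)
{
    ToyPV t;
    t.cur = strlen(s);
    t.len = t.cur + 1;              /* room for the trailing NUL */
    t.pvx = malloc(t.len);
    assert(t.pvx != NULL);
    memcpy(t.pvx, s, t.len);
    t.offset = 0;
    t.ook = 0;
    return t;
}

/* Discard everything before ptr without moving any bytes. */
void toy_pv_chop(ToyPV *t, const char *ptr)
{
    size_t delta;
    assert(ptr >= t->pvx && ptr <= t->pvx + t->cur);
    delta = (size_t)(ptr - t->pvx);
    t->pvx    += delta;             /* the "fake" beginning moves forward */
    t->cur    -= delta;
    t->len    -= delta;
    t->offset += delta;             /* remember how far we moved */
    t->ook     = 1;
}

/* The real start of the buffer is needed only when freeing it. */
void toy_pv_free(ToyPV *t)
{
    free(t->pvx - t->offset);
    t->pvx = NULL;
}
```

| Chopping one byte off a C<ToyPV> made from "12345" leaves C<cur> at 4, |
| C<len> at 5 and C<offset> at 1, mirroring the C<Dump> output above. |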
| |
| Something similar to the offset hack is performed on AVs to enable |
| efficient shifting and splicing off the beginning of the array; while |
| C<AvARRAY> points to the first element in the array that is visible from |
| Perl, C<AvALLOC> points to the real start of the C array. These are |
| usually the same, but a C<shift> operation can be carried out by |
| increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>. |
| Again, the location of the real start of the C array only comes into |
| play when freeing the array. See C<av_shift> in F<av.c>. |
| |
| =head2 What's Really Stored in an SV? |
| |
| Recall that the usual method of determining the type of scalar you have is |
| to use C<Sv*OK> macros. Because a scalar can be both a number and a string, |
| these macros will usually return TRUE, and calling the C<Sv*V> |
| macros will do the appropriate conversion of string to integer/double or |
| integer/double to string. |
| |
| If you I<really> need to know if you have an integer, double, or string |
| pointer in an SV, you can use the following three macros instead: |
| |
| SvIOKp(SV*) |
| SvNOKp(SV*) |
| SvPOKp(SV*) |
| |
| These will tell you if you truly have an integer, double, or string pointer |
| stored in your SV. The "p" stands for private. |
| |
| There are various ways in which the private and public flags may differ. |
| For example, a tied SV may have a valid underlying value in the IV slot |
| (so SvIOKp is true), but the data should be accessed via the FETCH |
| routine rather than directly, so SvIOK is false. Another is when |
| numeric conversion has occurred and precision has been lost: only the |
| private flag is set on 'lossy' values. So when an NV is converted to an |
| IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be. |
| |
| In general, though, it's best to use the C<Sv*V> macros. |
| |
| =head2 Working with AVs |
| |
| There are two ways to create and load an AV. The first method creates an |
| empty AV: |
| |
| AV* newAV(); |
| |
| The second method both creates the AV and initially populates it with SVs: |
| |
| AV* av_make(I32 num, SV **ptr); |
| |
| The second argument points to an array containing C<num> C<SV*>'s. Once the |
| AV has been created, the SVs can be destroyed, if so desired. |
| |
| Once the AV has been created, the following operations are possible on it: |
| |
| void av_push(AV*, SV*); |
| SV* av_pop(AV*); |
| SV* av_shift(AV*); |
| void av_unshift(AV*, I32 num); |
| |
| These should be familiar operations, with the exception of C<av_unshift>. |
| This routine adds C<num> elements at the front of the array with the C<undef> |
| value. You must then use C<av_store> (described below) to assign values |
| to these new elements. |
| |
| Here are some other functions: |
| |
| I32 av_len(AV*); |
| SV** av_fetch(AV*, I32 key, I32 lval); |
| SV** av_store(AV*, I32 key, SV* val); |
| |
| The C<av_len> function returns the highest index value in an array (just |
| like $#array in Perl). If the array is empty, -1 is returned. The |
| C<av_fetch> function returns the value at index C<key>, but if C<lval> |
| is non-zero, then C<av_fetch> will store an undef value at that index. |
| The C<av_store> function stores the value C<val> at index C<key>, and does |
| not increment the reference count of C<val>. Thus the caller is responsible |
| for taking care of that, and if C<av_store> returns NULL, the caller will |
| have to decrement the reference count to avoid a memory leak. Note that |
| C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their |
| return value. |
| |
| A few more: |
| |
| void av_clear(AV*); |
| void av_undef(AV*); |
| void av_extend(AV*, I32 key); |
| |
| The C<av_clear> function deletes all the elements in the AV* array, but |
| does not actually delete the array itself. The C<av_undef> function will |
| delete all the elements in the array plus the array itself. The |
| C<av_extend> function extends the array so that it contains at least C<key+1> |
| elements. If C<key+1> is less than the currently allocated length of the array, |
| then nothing is done. |
| |
| If you know the name of an array variable, you can get a pointer to its AV |
| by using the following: |
| |
| AV* get_av("package::varname", 0); |
| |
| This returns NULL if the variable does not exist. |
| |
| See L<Understanding the Magic of Tied Hashes and Arrays> for more |
| information on how to use the array access functions on tied arrays. |
| |
| =head2 Working with HVs |
| |
| To create an HV, you use the following routine: |
| |
| HV* newHV(); |
| |
| Once the HV has been created, the following operations are possible on it: |
| |
| SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash); |
| SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval); |
| |
| The C<klen> parameter is the length of the key being passed in (Note that |
| you cannot pass 0 in as a value of C<klen> to tell Perl to measure the |
| length of the key). The C<val> argument contains the SV pointer to the |
| scalar being stored, and C<hash> is the precomputed hash value (zero if |
| you want C<hv_store> to calculate it for you). The C<lval> parameter |
| indicates whether this fetch is actually a part of a store operation, in |
| which case a new undefined value will be added to the HV with the supplied |
| key and C<hv_fetch> will return as if the value had already existed. |
| |
| Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just |
| C<SV*>. To access the scalar value, you must first dereference the return |
| value. However, you should check to make sure that the return value is |
| not NULL before dereferencing it. |
| |
| The first of these two functions checks if a hash table entry exists, and the |
| second deletes it. |
| |
| bool hv_exists(HV*, const char* key, U32 klen); |
| SV* hv_delete(HV*, const char* key, U32 klen, I32 flags); |
| |
| If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will |
| create and return a mortal copy of the deleted value. |
| |
| And more miscellaneous functions: |
| |
| void hv_clear(HV*); |
| void hv_undef(HV*); |
| |
| Like their AV counterparts, C<hv_clear> deletes all the entries in the hash |
| table but does not actually delete the hash table. The C<hv_undef> deletes |
| both the entries and the hash table itself. |
| |
| Perl keeps the actual data in a linked list of structures with a typedef of HE. |
| These contain the actual key and value pointers (plus extra administrative |
| overhead). The key is a string pointer; the value is an C<SV*>. However, |
| once you have an C<HE*>, to get the actual key and value, use the routines |
| specified below. |
| |
| I32 hv_iterinit(HV*); |
| /* Prepares starting point to traverse hash table */ |
| HE* hv_iternext(HV*); |
| /* Get the next entry, and return a pointer to a |
| structure that has both the key and value */ |
| char* hv_iterkey(HE* entry, I32* retlen); |
| /* Get the key from an HE structure and also return |
| the length of the key string */ |
| SV* hv_iterval(HV*, HE* entry); |
| /* Return an SV pointer to the value of the HE |
| structure */ |
| SV* hv_iternextsv(HV*, char** key, I32* retlen); |
| /* This convenience routine combines hv_iternext, |
| hv_iterkey, and hv_iterval. The key and retlen |
| arguments are return values for the key and its |
| length. The value is returned as the function's SV* result */ |
| |
| If you know the name of a hash variable, you can get a pointer to its HV |
| by using the following: |
| |
| HV* get_hv("package::varname", 0); |
| |
| This returns NULL if the variable does not exist. |
| |
| The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro: |
| |
| hash = 0; |
| while (klen--) |
| hash = (hash * 33) + *key++; |
| hash = hash + (hash >> 5); /* after 5.6 */ |
| |
| The last step was added in version 5.6 to improve distribution of |
| lower bits in the resulting hash value. |
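| |
| As a self-contained sketch, the loop above can be written as an ordinary C |
| function. This is a plain transcription for experimentation, not the |
| C<PERL_HASH> macro itself; the name C<toy_perl_hash>, the use of |
| C<uint32_t> in place of C<U32>, and the cast to C<unsigned char> are |
| assumptions made here for illustration. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-alone transcription of the loop shown above.
 * It is NOT the PERL_HASH macro itself. */
uint32_t toy_perl_hash(const char *key, size_t klen)
{
    uint32_t hash = 0;

    assert(key != NULL || klen == 0);
    while (klen--)
        hash = (hash * 33) + (unsigned char)*key++;  /* multiply-and-add */
    hash = hash + (hash >> 5);                       /* final mixing step */
    return hash;
}
```

| Because the length is passed explicitly, keys containing embedded NULs are |
| hashed over their full length. |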
| |
| See L<Understanding the Magic of Tied Hashes and Arrays> for more |
| information on how to use the hash access functions on tied hashes. |
| |
| =head2 Hash API Extensions |
| |
| Beginning with version 5.004, the following functions are also supported: |
| |
| HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash); |
| HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash); |
| |
| bool hv_exists_ent (HV* tb, SV* key, U32 hash); |
| SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash); |
| |
| SV* hv_iterkeysv (HE* entry); |
| |
| Note that these functions take C<SV*> keys, which simplifies writing |
| of extension code that deals with hash structures. These functions |
| also allow passing of C<SV*> keys to C<tie> functions without forcing |
| you to stringify the keys (unlike the previous set of functions). |
| |
| They also return and accept whole hash entries (C<HE*>), making their |
| use more efficient (since the hash number for a particular string |
| doesn't have to be recomputed every time). See L<perlapi> for detailed |
| descriptions. |
| |
| The following macros must always be used to access the contents of hash |
| entries. Note that the arguments to these macros must be simple |
| variables, since they may get evaluated more than once. See |
| L<perlapi> for detailed descriptions of these macros. |
| |
| HePV(HE* he, STRLEN len) |
| HeVAL(HE* he) |
| HeHASH(HE* he) |
| HeSVKEY(HE* he) |
| HeSVKEY_force(HE* he) |
| HeSVKEY_set(HE* he, SV* sv) |
| |
| These two lower level macros are defined, but must only be used when |
| dealing with keys that are not C<SV*>s: |
| |
| HeKEY(HE* he) |
| HeKLEN(HE* he) |
| |
| Note that both C<hv_store> and C<hv_store_ent> do not increment the |
| reference count of the stored C<val>, which is the caller's responsibility. |
| If these functions return a NULL value, the caller will usually have to |
| decrement the reference count of C<val> to avoid a memory leak. |
| |
| =head2 AVs, HVs and undefined values |
| |
| Sometimes you have to store undefined values in AVs or HVs. Although |
| this may be a rare case, it can be tricky. That's because you're |
| used to using C<&PL_sv_undef> if you need an undefined SV. |
| |
| For example, intuition tells you that this XS code: |
| |
| AV *av = newAV(); |
| av_store( av, 0, &PL_sv_undef ); |
| |
| is equivalent to this Perl code: |
| |
| my @av; |
| $av[0] = undef; |
| |
| Unfortunately, this isn't true. AVs use C<&PL_sv_undef> as a marker |
| for indicating that an array element has not yet been initialized. |
| Thus, C<exists $av[0]> would be true for the above Perl code, but |
| false for the array generated by the XS code. |
| |
| Other problems can occur when storing C<&PL_sv_undef> in HVs: |
| |
| hv_store( hv, "key", 3, &PL_sv_undef, 0 ); |
| |
| This will indeed make the value C<undef>, but if you try to modify |
| the value of C<key>, you'll get the following error: |
| |
| Modification of non-creatable hash value attempted |
| |
| In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders |
| in restricted hashes. This caused such hash entries not to appear |
| when iterating over the hash or when checking for the keys |
| with the C<hv_exists> function. |
| |
| You can run into similar problems when you store C<&PL_sv_yes> or |
| C<&PL_sv_no> into AVs or HVs. Trying to modify such elements |
| will give you the following error: |
| |
| Modification of a read-only value attempted |
| |
| To make a long story short, you can use the special variables |
| C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and |
| HVs, but you have to make sure you know what you're doing. |
| |
| Generally, if you want to store an undefined value in an AV |
| or HV, you should not use C<&PL_sv_undef>, but rather create a |
| new undefined value using the C<newSV> function, for example: |
| |
| av_store( av, 42, newSV(0) ); |
| hv_store( hv, "foo", 3, newSV(0), 0 ); |
| |
| =head2 References |
| |
| References are a special type of scalar that point to other data types |
| (including other references). |
| |
| To create a reference, use either of the following functions: |
| |
| SV* newRV_inc((SV*) thing); |
| SV* newRV_noinc((SV*) thing); |
| |
| The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The |
| functions are identical except that C<newRV_inc> increments the reference |
| count of the C<thing>, while C<newRV_noinc> does not. For historical |
| reasons, C<newRV> is a synonym for C<newRV_inc>. |
| |
| Once you have a reference, you can use the following macro to dereference |
| the reference: |
| |
| SvRV(SV*) |
| |
| then call the appropriate routines, casting the returned C<SV*> to either an |
| C<AV*> or C<HV*>, if required. |
| |
| To determine if an SV is a reference, you can use the following macro: |
| |
| SvROK(SV*) |
| |
| To discover what type of value the reference refers to, use the following |
| macro and then check the return value. |
| |
| SvTYPE(SvRV(SV*)) |
| |
| The most useful types that will be returned are: |
| |
| SVt_IV Scalar |
| SVt_NV Scalar |
| SVt_PV Scalar |
| SVt_RV Scalar |
| SVt_PVAV Array |
| SVt_PVHV Hash |
| SVt_PVCV Code |
| SVt_PVGV Glob (possibly a file handle) |
| SVt_PVMG Blessed or Magical Scalar |
| |
| See the F<sv.h> header file for more details. |
| |
| =head2 Blessed References and Class Objects |
| |
| References are also used to support object-oriented programming. In perl's |
| OO lexicon, an object is simply a reference that has been blessed into a |
| package (or class). Once blessed, the programmer may now use the reference |
| to access the various methods in the class. |
| |
| A reference can be blessed into a package with the following function: |
| |
| SV* sv_bless(SV* sv, HV* stash); |
| |
| The C<sv> argument must be a reference value. The C<stash> argument |
| specifies which class the reference will belong to. See |
| L<Stashes and Globs> for information on converting class names into stashes. |
| |
| /* Still under construction */ |
| |
| The following function upgrades C<rv> to a reference if it is not already |
| one, and creates a new SV for C<rv> to point to. If C<classname> is |
| non-null, the new SV is blessed into the specified class. The new SV is |
| returned. |
| |
| SV* newSVrv(SV* rv, const char* classname); |
| |
| The following three functions copy integer, unsigned integer or double |
| into an SV whose reference is C<rv>. SV is blessed if C<classname> is |
| non-null. |
| |
| SV* sv_setref_iv(SV* rv, const char* classname, IV iv); |
| SV* sv_setref_uv(SV* rv, const char* classname, UV uv); |
| SV* sv_setref_nv(SV* rv, const char* classname, NV nv); |
| |
| The following function copies the pointer value (I<the address, not the |
| string!>) into an SV whose reference is C<rv>. SV is blessed if C<classname> |
| is non-null. |
| |
| SV* sv_setref_pv(SV* rv, const char* classname, void* pv); |
| |
| The following function copies a string into an SV whose reference is C<rv>. |
| Set length to 0 to let Perl calculate the string length. SV is blessed if |
| C<classname> is non-null. |
| |
| SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, |
| STRLEN length); |
| |
| The following function tests whether the SV is blessed into the specified |
| class. It does not check inheritance relationships. |
| |
| int sv_isa(SV* sv, const char* name); |
| |
| The following function tests whether the SV is a reference to a blessed object. |
| |
| int sv_isobject(SV* sv); |
| |
| The following function tests whether the SV is derived from the specified |
| class. SV can be either a reference to a blessed object or a string |
| containing a class name. This is the function implementing the |
| C<UNIVERSAL::isa> functionality. |
| |
| bool sv_derived_from(SV* sv, const char* name); |
| |
| To check if you've got an object derived from a specific class you have |
| to write: |
| |
| if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... } |
| |
| =head2 Creating New Variables |
| |
| To create a new Perl variable with an undef value which can be accessed from |
| your Perl script, use the following routines, depending on the variable type. |
| |
| SV* get_sv("package::varname", GV_ADD); |
| AV* get_av("package::varname", GV_ADD); |
| HV* get_hv("package::varname", GV_ADD); |
| |
| Notice the use of GV_ADD as the second parameter. The new variable can now |
| be set, using the routines appropriate to the data type. |
| |
| There are additional macros whose values may be bitwise OR'ed with the |
| C<GV_ADD> argument to enable certain extra features. Those bits are: |
| |
| =over |
| |
| =item GV_ADDMULTI |
| |
| Marks the variable as multiply defined, thus preventing the: |
| |
| Name <varname> used only once: possible typo |
| |
| warning. |
| |
| =item GV_ADDWARN |
| |
| Issues the warning: |
| |
| Had to create <varname> unexpectedly |
| |
| if the variable did not exist before the function was called. |
| |
| =back |
| |
| If you do not specify a package name, the variable is created in the current |
| package. |
| |
| =head2 Reference Counts and Mortality |
| |
| Perl uses a reference count-driven garbage collection mechanism. SVs, |
| AVs, or HVs (xV for short in the following) start their life with a |
| reference count of 1. If the reference count of an xV ever drops to 0, |
| then it will be destroyed and its memory made available for reuse. |
| |
| This normally doesn't happen at the Perl level unless a variable is |
| undef'ed or the last variable holding a reference to it is changed or |
| overwritten. At the internal level, however, reference counts can be |
| manipulated with the following macros: |
| |
| int SvREFCNT(SV* sv); |
| SV* SvREFCNT_inc(SV* sv); |
| void SvREFCNT_dec(SV* sv); |
| |
| However, there is one other function which manipulates the reference |
| count of its argument. The C<newRV_inc> function, you will recall, |
| creates a reference to the specified argument. As a side effect, |
| it increments the argument's reference count. If this is not what |
| you want, use C<newRV_noinc> instead. |
| |
| For example, imagine you want to return a reference from an XSUB function. |
| Inside the XSUB routine, you create an SV which initially has a reference |
| count of one. Then you call C<newRV_inc>, passing it the just-created SV. |
| This returns the reference as a new SV, but the reference count of the |
| SV you passed to C<newRV_inc> has been incremented to two. Now you |
| return the reference from the XSUB routine and forget about the SV. |
| But Perl hasn't! Whenever the returned reference is destroyed, the |
| reference count of the original SV is decreased to one and nothing happens. |
| The SV will hang around without any way to access it until Perl itself |
| terminates. This is a memory leak. |
| |
| The correct procedure, then, is to use C<newRV_noinc> instead of |
| C<newRV_inc>. Then, if and when the last reference is destroyed, |
| the reference count of the SV will go to zero and it will be destroyed, |
| stopping any memory leak. |
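| |
| The difference between the two calls can be traced with a toy model in |
| plain C. Everything here is hypothetical and greatly simplified: C<ToySV>, |
| C<toy_sv_new>, C<toy_sv_dec>, C<toy_newRV_inc>, C<toy_newRV_noinc> and the |
| C<toy_live> counter are illustrative names only, not part of the Perl API. |

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted value; not perl's real SV. */
typedef struct ToySV {
    int refcnt;
    struct ToySV *rv;   /* referent if this is a reference, else NULL */
} ToySV;

int toy_live = 0;       /* number of ToySVs created and not yet freed */

/* New values start life with a reference count of 1. */
ToySV *toy_sv_new(void)
{
    ToySV *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    toy_live++;
    return sv;
}

/* Drop one reference; free the value when the count reaches zero. */
void toy_sv_dec(ToySV *sv)
{
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        if (sv->rv)
            toy_sv_dec(sv->rv);  /* a dying reference releases its referent */
        free(sv);
        toy_live--;
    }
}

/* Make a reference that takes over the caller's count on thing. */
ToySV *toy_newRV_noinc(ToySV *thing)
{
    ToySV *ref = toy_sv_new();
    ref->rv = thing;
    return ref;
}

/* Make a reference and add a new count on thing. */
ToySV *toy_newRV_inc(ToySV *thing)
{
    thing->refcnt++;
    return toy_newRV_noinc(thing);
}
```

| In this model, destroying a reference made with C<toy_newRV_inc> leaves the |
| referent alive with a count of one, which is the leak described above, |
| while destroying one made with C<toy_newRV_noinc> frees both values. |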
| |
| There are some convenience functions available that can help with the |
| destruction of xVs. These functions introduce the concept of "mortality". |
| An xV that is mortal has had its reference count marked to be decremented, |
| but not actually decremented, until "a short time later". Generally the |
| term "short time later" means a single Perl statement, such as a call to |
| an XSUB function. The actual determinant for when mortal xVs have their |
| reference count decremented depends on two macros, SAVETMPS and FREETMPS. |
| See L<perlcall> and L<perlxs> for more details on these macros. |
| |
| "Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>. |
| However, if you mortalize a variable twice, the reference count will |
| later be decremented twice. |
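| |
| A minimal sketch of that idea in plain C, assuming a fixed-size list of |
| pending decrements: C<ToyVal>, C<toy_tmps>, C<toy_2mortal> and |
| C<toy_freetmps> are hypothetical names, and perl's real temporaries stack |
| is more involved. The toy only counts; it never frees anything. |

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TMPS_MAX 16

/* Toy value carrying only a reference count. */
typedef struct { int refcnt; } ToyVal;

ToyVal *toy_tmps[TOY_TMPS_MAX];  /* values with a pending decrement */
size_t  toy_tmps_ix = 0;         /* number of pending decrements */

/* In the spirit of sv_2mortal: schedule one later decrement. */
ToyVal *toy_2mortal(ToyVal *v)
{
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = v;
    return v;
}

/* In the spirit of FREETMPS: perform every decrement scheduled
 * since the list held 'base' entries. */
void toy_freetmps(size_t base)
{
    while (toy_tmps_ix > base) {
        ToyVal *v = toy_tmps[--toy_tmps_ix];
        v->refcnt--;
    }
}
```

| Passing the same C<ToyVal> to C<toy_2mortal> twice records two entries, so |
| C<toy_freetmps> lowers its count by two. |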
| |
| "Mortal" SVs are mainly used for SVs that are placed on perl's stack. |
| For example an SV which is created just to pass a number to a called sub |
| is made mortal to have it cleaned up automatically when it's popped off |
| the stack. Similarly, results returned by XSUBs (which are pushed on the |
| stack) are often made mortal. |
| |
| To create a mortal variable, use the functions: |
| |
| SV* sv_newmortal() |
| SV* sv_2mortal(SV*) |
| SV* sv_mortalcopy(SV*) |
| |
| The first call creates a mortal SV (with no value), the second converts an existing |
| SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the |
| third creates a mortal copy of an existing SV. |
| Because C<sv_newmortal> gives the new SV no value, it must normally be given one |
| via C<sv_setpv>, C<sv_setiv>, etc.: |
| |
| SV *tmp = sv_newmortal(); |
| sv_setiv(tmp, an_integer); |
| |
| As that is multiple C statements, it is quite common to see this idiom instead: |
| |
| SV *tmp = sv_2mortal(newSViv(an_integer)); |
| |
| |
| You should be careful about creating mortal variables. Strange things |
| can happen if you make the same value mortal within multiple contexts, |
| or if you make a variable mortal multiple times. Thinking of "Mortalization" |
| as deferred C<SvREFCNT_dec> should help to minimize such problems. |
| For example if you are passing an SV which you I<know> has a high enough REFCNT |
| to survive its use on the stack you need not do any mortalization. |
| If you are not sure then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or |
| making a C<sv_mortalcopy> is safer. |
| |
| The mortal routines are not just for SVs; AVs and HVs can be |
| made mortal by passing their address (type-casted to C<SV*>) to the |
| C<sv_2mortal> or C<sv_mortalcopy> routines. |
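
The "deferred C<SvREFCNT_dec>" view of mortality can be made concrete with a toy model in plain C. This is only a sketch and not how perl itself is implemented: C<ToySV>, C<toy_2mortal>, C<toy_freetmps> and the fixed-size C<tmps_stack> are hypothetical names invented for the illustration. It assumes that mortalizing simply records an owed decrement, and that a later cleanup step (standing in for C<FREETMPS>) pays it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SV: only the reference count matters here. */
typedef struct { int refcnt; int freed; } ToySV;

#define TMPS_MAX 16
static ToySV *tmps_stack[TMPS_MAX];   /* decrements that are still owed */
static size_t tmps_top = 0;

/* Drop one reference; mark the value freed when the count hits zero. */
static void toy_refcnt_dec(ToySV *sv) {
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0)
        sv->freed = 1;
}

/* Modelled on sv_2mortal: note that one decrement is owed, but do not
 * perform it yet. */
static ToySV *toy_2mortal(ToySV *sv) {
    assert(tmps_top < TMPS_MAX);
    tmps_stack[tmps_top++] = sv;
    return sv;
}

/* Modelled on FREETMPS: perform every decrement recorded above `mark`. */
static void toy_freetmps(size_t mark) {
    while (tmps_top > mark)
        toy_refcnt_dec(tmps_stack[--tmps_top]);
}
```

In this model a value with a reference count of 1 survives C<toy_2mortal> and is freed only by C<toy_freetmps>. Mortalizing the same value twice records two entries, so its count later drops by two, which is why doing so by accident frees a value too early.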
| |
| =head2 Stashes and Globs |
| |
| A B<stash> is a hash that contains all variables that are defined |
| within a package. Each key of the stash is a symbol |
| name (shared by all the different types of objects that have the same |
| name), and each value in the hash table is a GV (Glob Value). This GV |
| in turn contains references to the various objects of that name, |
| including (but not limited to) the following: |
| |
| Scalar Value |
| Array Value |
| Hash Value |
| I/O Handle |
| Format |
| Subroutine |
| |
| There is a single stash called C<PL_defstash> that holds the items that exist |
| in the C<main> package. To get at the items in other packages, append the |
| string "::" to the package name. The items in the C<Foo> package are in |
| the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are |
| in the stash C<Baz::> in C<Bar::>'s stash. |
| |
| To get the stash pointer for a particular package, use one of the functions: |
| |
| HV* gv_stashpv(const char* name, I32 flags) |
| HV* gv_stashsv(SV*, I32 flags) |
| |
| The first function takes a literal string, the second uses the string stored |
| in the SV. Remember that a stash is just a hash table, so you get back an |
| C<HV*>. If the C<flags> argument is set to C<GV_ADD>, a new package is |
| created when none of that name exists yet. |
| |
| The name that C<gv_stash*v> wants is the name of the package whose symbol table |
| you want. The default package is called C<main>. If you have multiply nested |
| packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl |
| language itself. |
| |
| Alternately, if you have an SV that is a blessed reference, you can find |
| out the stash pointer by using: |
| |
| HV* SvSTASH(SvRV(SV*)); |
| |
| then use the following to get the package name itself: |
| |
| char* HvNAME(HV* stash); |
| |
| If you need to bless or re-bless an object you can use the following |
| function: |
| |
| SV* sv_bless(SV*, HV* stash) |
| |
| where the first argument, an C<SV*>, must be a reference, and the second |
| argument is a stash. The returned C<SV*> can now be used in the same way |
| as any other SV. |
| |
| For more information on references and blessings, consult L<perlref>. |
| |
| =head2 Double-Typed SVs |
| |
| Scalar variables normally contain only one type of value, an integer, |
| double, pointer, or reference. Perl will automatically convert the |
| actual scalar data from the stored type into the requested type. |
| |
| Some scalar variables contain more than one type of scalar data. For |
| example, the variable C<$!> contains either the numeric value of C<errno> |
| or its string equivalent from either C<strerror> or C<sys_errlist[]>. |
| |
| To force multiple data values into an SV, you must do two things: use the |
| C<sv_set*v> routines to add the additional scalar type, then set a flag |
| so that Perl will believe it contains more than one type of data. The |
| four macros to set the flags are: |
| |
| SvIOK_on |
| SvNOK_on |
| SvPOK_on |
| SvROK_on |
| |
| The particular macro you must use depends on which C<sv_set*v> routine |
| you called first. This is because every C<sv_set*v> routine turns on |
| only the bit for the particular type of data being set, and turns off |
| all the rest. |
| |
| For example, to create a new Perl variable called "dberror" that contains |
| both the numeric and descriptive string error values, you could use the |
| following code: |
| |
| extern int dberror; |
| extern char *dberror_list[]; |
| |
| SV* sv = get_sv("dberror", GV_ADD); |
| sv_setiv(sv, (IV) dberror); |
| sv_setpv(sv, dberror_list[dberror]); |
| SvIOK_on(sv); |
| |
| If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the |
| macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>. |
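
Why the order matters can be seen in a toy model in plain C. It is a sketch under stated assumptions, not perl's real data layout: C<ToySV>, C<TOY_IOK>, C<TOY_POK> and the C<toy_*> functions are hypothetical names. The one assumption carried over from the text is that each "set" routine leaves only its own flag bit on.

```c
#include <assert.h>
#include <string.h>

enum { TOY_IOK = 1, TOY_POK = 2 };    /* hypothetical flag bits */

typedef struct {
    unsigned flags;                   /* which slots are currently valid */
    long     iv;                      /* integer slot */
    char     pv[32];                  /* string slot */
} ToySV;

/* Like sv_setiv in the text: store the integer, keep only the IOK bit. */
static void toy_setiv(ToySV *sv, long i) {
    sv->iv = i;
    sv->flags = TOY_IOK;
}

/* Like sv_setpv in the text: store the string, keep only the POK bit. */
static void toy_setpv(ToySV *sv, const char *s) {
    assert(s != NULL);
    strncpy(sv->pv, s, sizeof sv->pv - 1);
    sv->pv[sizeof sv->pv - 1] = '\0';
    sv->flags = TOY_POK;
}

/* The dberror recipe: set both values, then re-enable the bit that the
 * second setter switched off. */
static void toy_make_dualvar(ToySV *sv, long code, const char *msg) {
    toy_setiv(sv, code);
    toy_setpv(sv, msg);       /* integer slot still holds code, IOK is off */
    sv->flags |= TOY_IOK;     /* the step that SvIOK_on performs */
}
```

After C<toy_setiv> followed by C<toy_setpv> only the string bit is set, even though the integer slot still holds its value. Turning the integer bit back on is what makes both values visible.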
| |
| =head2 Magic Variables |
| |
| [This section still under construction. Ignore everything here. Post no |
| bills. Everything not permitted is forbidden.] |
| |
| Any SV may be magical, that is, it has special features that a normal |
| SV does not have. These features are stored in the SV structure in a |
| linked list of C<struct magic>'s, typedef'ed to C<MAGIC>. |
| |
| struct magic { |
| MAGIC* mg_moremagic; |
| MGVTBL* mg_virtual; |
| U16 mg_private; |
| char mg_type; |
| U8 mg_flags; |
| I32 mg_len; |
| SV* mg_obj; |
| char* mg_ptr; |
| }; |
| |
| Note this is current as of patchlevel 0, and could change at any time. |
| |
| =head2 Assigning Magic |
| |
| Perl adds magic to an SV using the sv_magic function: |
| |
| void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); |
| |
| The C<sv> argument is a pointer to the SV that is to acquire a new magical |
| feature. |
| |
| If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to |
| convert C<sv> to type C<SVt_PVMG>. Perl then continues by adding new magic |
| to the beginning of the linked list of magical features. Any prior entry |
| of the same type of magic is deleted. Note that this can be overridden, |
| and multiple instances of the same type of magic can be associated with an |
| SV. |
| |
| The C<name> and C<namlen> arguments are used to associate a string with |
| the magic, typically the name of a variable. C<namlen> is stored in the |
| C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of |
| C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on |
| whether C<namlen> is greater than zero or equal to zero respectively. As a |
| special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed |
| to contain an C<SV*> and is stored as-is with its REFCNT incremented. |
| |
| The sv_magic function uses C<how> to determine which, if any, predefined |
| "Magic Virtual Table" should be assigned to the C<mg_virtual> field. |
| See the L</Magic Virtual Tables> section below. The C<how> argument is also |
| stored in the C<mg_type> field. The value of C<how> should be chosen |
| from the set of macros C<PERL_MAGIC_foo> found in F<perl.h>. Note that before |
| these macros were added, Perl internals used to directly use character |
| literals, so you may occasionally come across old code or documentation |
| referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example. |
| |
| The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC> |
| structure. If it is not the same as the C<sv> argument, the reference |
| count of the C<obj> object is incremented. If it is the same, or if |
| the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer, |
| then C<obj> is merely stored, without the reference count being incremented. |
| |
| See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic |
| to an SV. |
| |
| There is also a function to add magic to an C<HV>: |
| |
| void hv_magic(HV *hv, GV *gv, int how); |
| |
| This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>. |
| |
| To remove the magic from an SV, call the function sv_unmagic: |
| |
| int sv_unmagic(SV *sv, int type); |
| |
| The C<type> argument should be equal to the C<how> value when the C<SV> |
| was initially made magical. |
| |
| However, note that C<sv_unmagic> removes all magic of a certain C<type> from the |
| C<SV>. If you want to remove only certain magic of a C<type> based on the magic |
| virtual table, use C<sv_unmagicext> instead: |
| |
| int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl); |
| |
| =head2 Magic Virtual Tables |
| |
| The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an |
| C<MGVTBL> ("Magic Virtual Table"), a structure of function pointers |
| that handle the various operations that might be applied to that |
| variable. |
| |
| The C<MGVTBL> has five (or sometimes eight) pointers to the following |
| routine types: |
| |
| int (*svt_get)(SV* sv, MAGIC* mg); |
| int (*svt_set)(SV* sv, MAGIC* mg); |
| U32 (*svt_len)(SV* sv, MAGIC* mg); |
| int (*svt_clear)(SV* sv, MAGIC* mg); |
| int (*svt_free)(SV* sv, MAGIC* mg); |
| |
| int (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv, |
| const char *name, I32 namlen); |
| int (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param); |
| int (*svt_local)(SV *nsv, MAGIC *mg); |
| |
| |
| This MGVTBL structure is set at compile-time in F<perl.h> and there are |
| currently 32 types. These different structures contain pointers to various |
| routines that perform additional actions depending on which function is |
| being called. |
| |
| Function pointer Action taken |
| ---------------- ------------ |
| svt_get Do something before the value of the SV is |
| retrieved. |
| svt_set Do something after the SV is assigned a value. |
| svt_len Report on the SV's length. |
| svt_clear Clear something the SV represents. |
| svt_free Free any extra storage associated with the SV. |
| |
| svt_copy copy tied variable magic to a tied element |
| svt_dup duplicate a magic structure during thread cloning |
| svt_local copy magic to local value during 'local' |
| |
| For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds |
| to an C<mg_type> of C<PERL_MAGIC_sv>) contains: |
| |
| { magic_get, magic_set, magic_len, 0, 0 } |
| |
| Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>, |
| if a get operation is being performed, the routine C<magic_get> is |
| called. All the various routines for the various magical types begin |
| with C<magic_>. NOTE: the magic routines are not considered part of |
| the Perl API, and may not be exported by the Perl library. |
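
The dispatch described above can be sketched as a toy model in plain C. It does not use the real C<MGVTBL> or C<MAGIC> types: C<ToyVtbl>, C<ToySV>, C<toy_fetch> and C<toy_store> are hypothetical names, and the table is cut down to two slots. It assumes only what the table of actions states, namely that "get" runs before the value is retrieved, "set" runs after it is assigned, and an empty slot means no action.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sv;
typedef int (*toy_hook)(struct toy_sv *sv);

/* Cut-down virtual table: a NULL slot means "no special action". */
typedef struct { toy_hook get; toy_hook set; } ToyVtbl;

typedef struct toy_sv {
    long           value;
    const ToyVtbl *vtbl;       /* NULL when the value is not magical */
    int            get_calls;  /* counters so the hooks are observable */
    int            set_calls;
} ToySV;

static int count_get(ToySV *sv) { sv->get_calls++; return 0; }
static int count_set(ToySV *sv) { sv->set_calls++; return 0; }
static const ToyVtbl counting_vtbl = { count_get, count_set };

/* "get" magic runs before the value is retrieved. */
static long toy_fetch(ToySV *sv) {
    assert(sv != NULL);
    if (sv->vtbl && sv->vtbl->get)
        sv->vtbl->get(sv);
    return sv->value;
}

/* "set" magic runs after the value is assigned. */
static void toy_store(ToySV *sv, long v) {
    assert(sv != NULL);
    sv->value = v;
    if (sv->vtbl && sv->vtbl->set)
        sv->vtbl->set(sv);
}
```

A value whose C<vtbl> is NULL behaves like an ordinary variable. Pointing C<vtbl> at C<counting_vtbl> makes every fetch and store pass through the hooks without changing the calling code.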
| |
| The last three slots are a recent addition, and for source code |
| compatibility they are only checked for if one of the three flags |
| MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. This means that most |
| code can continue declaring a vtable as a 5-element value. These three are |
| currently used exclusively by the threading code, and are highly subject |
| to change. |
| |
| The current kinds of Magic Virtual Tables are: |
| |
| =for comment |
| This table is generated by regen/mg_vtable.pl. Any changes made here |
| will be lost. |
| |
| =for mg_vtable.pl begin |
| |
| mg_type |
| (old-style char and macro) MGVTBL Type of magic |
| -------------------------- ------ ------------- |
| \0 PERL_MAGIC_sv vtbl_sv Special scalar variable |
| # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary) |
| % PERL_MAGIC_rhash (none) extra data for restricted |
| hashes |
| . PERL_MAGIC_pos vtbl_pos pos() lvalue |
| : PERL_MAGIC_symtab (none) extra data for symbol |
| tables |
| < PERL_MAGIC_backref vtbl_backref for weak ref data |
| @ PERL_MAGIC_arylen_p (none) to move arylen out of |
| XPVAV |
| A PERL_MAGIC_overload vtbl_amagic %OVERLOAD hash |
| a PERL_MAGIC_overload_elem vtbl_amagicelem %OVERLOAD hash element |
| B PERL_MAGIC_bm vtbl_regexp Boyer-Moore |
| (fast string search) |
| c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table |
| (AMT) on stash |
| D PERL_MAGIC_regdata vtbl_regdata Regex match position data |
| (@+ and @- vars) |
| d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data |
| element |
| E PERL_MAGIC_env vtbl_env %ENV hash |
| e PERL_MAGIC_envelem vtbl_envelem %ENV hash element |
| f PERL_MAGIC_fm vtbl_regdata Formline |
| ('compiled' format) |
| G PERL_MAGIC_study vtbl_regexp study()ed string |
| g PERL_MAGIC_regex_global vtbl_mglob m//g target |
| H PERL_MAGIC_hints vtbl_hints %^H hash |
| h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element |
| I PERL_MAGIC_isa vtbl_isa @ISA array |
| i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element |
| k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue |
| L PERL_MAGIC_dbfile (none) Debugger %_<filename |
| l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename |
| element |
| N PERL_MAGIC_shared (none) Shared between threads |
| n PERL_MAGIC_shared_scalar (none) Shared between threads |
| o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation |
| P PERL_MAGIC_tied vtbl_pack Tied array or hash |
| p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element |
| q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle |
| r PERL_MAGIC_qr vtbl_regexp precompiled qr// regex |
| S PERL_MAGIC_sig (none) %SIG hash |
| s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element |
| t PERL_MAGIC_taint vtbl_taint Taintedness |
| U PERL_MAGIC_uvar vtbl_uvar Available for use by |
| extensions |
| u PERL_MAGIC_uvar_elem (none) Reserved for use by |
| extensions |
| V PERL_MAGIC_vstring vtbl_vstring SV was vstring literal |
| v PERL_MAGIC_vec vtbl_vec vec() lvalue |
| w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information |
| x PERL_MAGIC_substr vtbl_substr substr() lvalue |
| y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator |
| variable / smart parameter |
| vivification |
| ] PERL_MAGIC_checkcall (none) inlining/mutation of call |
| to this CV |
| ~ PERL_MAGIC_ext (none) Available for use by |
| extensions |
| |
| =for mg_vtable.pl end |
| |
| When an uppercase and lowercase letter both exist in the table, then the |
| uppercase letter is typically used to represent some kind of composite type |
| (a list or a hash), and the lowercase letter is used to represent an element |
| of that composite type. Some internals code makes use of this case |
| relationship. However, 'v' and 'V' (vec and v-string) are in no way related. |
| |
| The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined |
| specifically for use by extensions and will not be used by perl itself. |
| Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information |
| to variables (typically objects). This is especially useful because |
| there is no way for normal perl code to corrupt this private information |
| (unlike using extra elements of a hash object). |
| |
| Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a |
| C function any time a scalar's value is used or changed. The C<MAGIC>'s |
| C<mg_ptr> field points to a C<ufuncs> structure: |
| |
| struct ufuncs { |
| I32 (*uf_val)(pTHX_ IV, SV*); |
| I32 (*uf_set)(pTHX_ IV, SV*); |
| IV uf_index; |
| }; |
| |
| When the SV is read from or written to, the C<uf_val> or C<uf_set> |
| function will be called with C<uf_index> as the first arg and a pointer to |
| the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar> |
| magic is shown below. Note that the ufuncs structure is copied by |
| sv_magic, so you can safely allocate it on the stack. |
| |
| void |
| Umagic(sv) |
| SV *sv; |
| PREINIT: |
| struct ufuncs uf; |
| CODE: |
| uf.uf_val = &my_get_fn; |
| uf.uf_set = &my_set_fn; |
| uf.uf_index = 0; |
| sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf)); |
| |
| Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect. |
| |
| For hashes there is a specialized hook that gives control over hash |
| keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic |
| if the "set" function in the C<ufuncs> structure is NULL. The hook |
| is activated whenever the hash is accessed with a key specified as |
| an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>, |
| C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string |
| through the functions without the C<..._ent> suffix circumvents the |
| hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description. |
| |
| Note that because multiple extensions may be using C<PERL_MAGIC_ext> |
| or C<PERL_MAGIC_uvar> magic, it is important for extensions to take |
| extra care to avoid conflict. Typically only using the magic on |
| objects blessed into the same class as the extension is sufficient. |
| For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an |
| C<MGVTBL>, even if all its fields will be C<0>, so that individual |
| C<MAGIC> pointers can be identified as a particular kind of magic |
| using their magic virtual table. C<mg_findext> provides an easy way |
| to do that: |
| |
| STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 }; |
| |
| MAGIC *mg; |
| if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) { |
| /* this is really ours, not another module's PERL_MAGIC_ext */ |
| my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr; |
| ... |
| } |
| |
| Also note that the C<sv_set*()> and C<sv_cat*()> functions described |
| earlier do B<not> invoke 'set' magic on their targets. This must |
| be done by the user either by calling the C<SvSETMAGIC()> macro after |
| calling these functions, or by using one of the C<sv_set*_mg()> or |
| C<sv_cat*_mg()> functions. Similarly, generic C code must call the |
| C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV |
| obtained from external sources in functions that don't handle magic. |
| See L<perlapi> for a description of these functions. |
| For example, calls to the C<sv_cat*()> functions typically need to be |
| followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()> |
| since their implementation handles 'get' magic. |
| |
| =head2 Finding Magic |
| |
| MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that |
| * type */ |
| |
| This routine returns a pointer to a C<MAGIC> structure stored in the SV. |
| If the SV does not have that magical feature, C<NULL> is returned. If the |
| SV has multiple instances of that magical feature, the first one will be |
| returned. C<mg_findext> can be used to find a C<MAGIC> structure of an SV |
| based on both its magic type and its magic virtual table: |
| |
| MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl); |
| |
| Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type |
| SVt_PVMG, Perl may core dump. |
| |
| int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); |
| |
| This routine checks to see what types of magic C<sv> has. If the mg_type |
| field is an uppercase letter, then the mg_obj is copied to C<nsv>, but |
| the mg_type field is changed to be the lowercase letter. |
| |
| =head2 Understanding the Magic of Tied Hashes and Arrays |
| |
| Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied> |
| magic type. |
| |
| WARNING: As of the 5.004 release, proper usage of the array and hash |
| access functions requires understanding a few caveats. Some |
| of these caveats are actually considered bugs in the API, to be fixed |
| in later releases, and are bracketed with [MAYCHANGE] below. If |
| you find yourself actually applying such information in this section, be |
| aware that the behavior may change in the future, umm, without warning. |
| |
| The perl tie function associates a variable with an object that implements |
| the various GET, SET, etc methods. To perform the equivalent of the perl |
| tie function from an XSUB, you must mimic this behaviour. The code below |
| carries out the necessary steps - firstly it creates a new hash, and then |
| creates a second hash which it blesses into the class which will implement |
| the tie methods. Lastly it ties the two hashes together, and returns a |
| reference to the new tied hash. Note that the code below does NOT call the |
| TIEHASH method in the MyTie class - |
| see L</Calling Perl Routines from within C Programs> for details on how |
| to do this. |
| |
| SV* |
| mytie() |
| PREINIT: |
| HV *hash; |
| HV *stash; |
| SV *tie; |
| CODE: |
| hash = newHV(); |
| tie = newRV_noinc((SV*)newHV()); |
| stash = gv_stashpv("MyTie", GV_ADD); |
| sv_bless(tie, stash); |
| hv_magic(hash, (GV*)tie, PERL_MAGIC_tied); |
| RETVAL = newRV_noinc((SV*)hash); |
| OUTPUT: |
| RETVAL |
| |
| The C<av_store> function, when given a tied array argument, merely |
| copies the magic of the array onto the value to be "stored", using |
| C<mg_copy>. It may also return NULL, indicating that the value did not |
| actually need to be stored in the array. [MAYCHANGE] After a call to |
| C<av_store> on a tied array, the caller will usually need to call |
| C<mg_set(val)> to actually invoke the perl level "STORE" method on the |
| TIEARRAY object. If C<av_store> did return NULL, a call to |
| C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory |
| leak. [/MAYCHANGE] |
| |
| The previous paragraph is applicable verbatim to tied hash access using the |
| C<hv_store> and C<hv_store_ent> functions as well. |
| |
| C<av_fetch> and the corresponding hash functions C<hv_fetch> and |
| C<hv_fetch_ent> actually return an undefined mortal value whose magic |
| has been initialized using C<mg_copy>. Note the value so returned does not |
| need to be deallocated, as it is already mortal. [MAYCHANGE] But you will |
| need to call C<mg_get()> on the returned value in order to actually invoke |
| the perl level "FETCH" method on the underlying TIE object. Similarly, |
| you may also call C<mg_set()> on the return value after possibly assigning |
| a suitable value to it using C<sv_setsv>, which will invoke the "STORE" |
| method on the TIE object. [/MAYCHANGE] |
| |
| [MAYCHANGE] |
| In other words, the array or hash fetch/store functions don't really |
| fetch and store actual values in the case of tied arrays and hashes. They |
| merely call C<mg_copy> to attach magic to the values that were meant to be |
| "stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually |
| do the job of invoking the TIE methods on the underlying objects. Thus |
| the magic mechanism currently implements a kind of lazy access to arrays |
| and hashes. |
| |
| Currently (as of perl version 5.004), use of the hash and array access |
| functions requires the user to be aware of whether they are operating on |
| "normal" hashes and arrays, or on their tied variants. The API may be |
| changed to provide more transparent access to both tied and normal data |
| types in future versions. |
| [/MAYCHANGE] |
| |
| You would do well to understand that the TIEARRAY and TIEHASH interfaces |
| are mere sugar to invoke some perl method calls while using the uniform hash |
| and array syntax. The use of this sugar imposes some overhead (typically |
| about two to four extra opcodes per FETCH/STORE operation, in addition to |
| the creation of all the mortal variables required to invoke the methods). |
| This overhead will be comparatively small if the TIE methods are themselves |
| substantial, but if they are only a few statements long, the overhead |
| will not be insignificant. |
| |
| =head2 Localizing changes |
| |
| Perl has a very handy construction |
| |
| { |
| local $var = 2; |
| ... |
| } |
| |
| This construction is I<approximately> equivalent to |
| |
| { |
| my $oldvar = $var; |
| $var = 2; |
| ... |
| $var = $oldvar; |
| } |
| |
| The biggest difference is that the first construction would |
| reinstate the initial value of $var, irrespective of how control exits |
| the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit |
| more efficient as well. |
| |
| There is a way to achieve a similar task from C via the Perl API: create a |
| I<pseudo-block>, and arrange for some changes to be automatically |
| undone at the end of it, either explicitly or via a non-local exit (via |
| die()). A I<block>-like construct is created by a pair of |
| C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">). |
| Such a construct may be created specially for some important localized |
| task, or an existing one (like boundaries of enclosing Perl |
| subroutine/block, or an existing pair for freeing TMPs) may be |
| used. (In the second case the overhead of additional localization must |
| be almost negligible.) Note that any XSUB is automatically enclosed in |
| an C<ENTER>/C<LEAVE> pair. |
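
The idea of a pseudo-block whose changes are undone on exit can be sketched as a toy model in plain C. This is not perl's save stack: C<toy_enter>, C<toy_saveint>, C<toy_leave> and C<save_stack> are hypothetical names, and the model assumes that the "enter" step hands back a mark which the caller passes to the "leave" step, whereas the real macros track scopes internally.

```c
#include <assert.h>
#include <stddef.h>

#define SAVE_MAX 32
static struct { int *where; int old; } save_stack[SAVE_MAX];
static size_t save_top = 0;

/* Modelled on ENTER: remember where this pseudo-block's entries begin. */
static size_t toy_enter(void) {
    return save_top;
}

/* Modelled on SAVEINT: record the current value so it can be restored. */
static void toy_saveint(int *p) {
    assert(save_top < SAVE_MAX);
    save_stack[save_top].where = p;
    save_stack[save_top].old   = *p;
    save_top++;
}

/* Modelled on LEAVE: undo, newest first, everything saved since `mark`. */
static void toy_leave(size_t mark) {
    while (save_top > mark) {
        save_top--;
        *save_stack[save_top].where = save_stack[save_top].old;
    }
}
```

Nested pseudo-blocks unwind in order: leaving the inner block restores the value the variable had when the inner block saved it, and leaving the outer block restores the original.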
| |
| Inside such a I<pseudo-block> the following services are available: |
| |
| =over 4 |
| |
| =item C<SAVEINT(int i)> |
| |
| =item C<SAVEIV(IV i)> |
| |
| =item C<SAVEI32(I32 i)> |
| |
| =item C<SAVELONG(long i)> |
| |
| These macros arrange things to restore the value of integer variable |
| C<i> at the end of enclosing I<pseudo-block>. |
| |
| =item C<SAVESPTR(s)> |
| |
| =item C<SAVEPPTR(p)> |
| |
| These macros arrange things to restore the value of pointers C<s> and |
| C<p>. C<s> must be a pointer of a type which survives conversion to |
| C<SV*> and back, C<p> should be able to survive conversion to C<char*> |
| and back. |
| |
| =item C<SAVEFREESV(SV *sv)> |
| |
| The refcount of C<sv> will be decremented at the end of |
| I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a |
| mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal> |
| extends the lifetime of C<sv> until the beginning of the next statement, |
| C<SAVEFREESV> extends it until the end of the enclosing scope. These |
| lifetimes can be wildly different. |
| |
| Also compare C<SAVEMORTALIZESV>. |
| |
| =item C<SAVEMORTALIZESV(SV *sv)> |
| |
| Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current |
| scope instead of decrementing its reference count. This usually has the |
| effect of keeping C<sv> alive until the statement that called the currently |
| live scope has finished executing. |
| |
| =item C<SAVEFREEOP(OP *op)> |
| |
| The C<OP *> is op_free()ed at the end of I<pseudo-block>. |
| |
| =item C<SAVEFREEPV(p)> |
| |
| The chunk of memory which is pointed to by C<p> is Safefree()ed at the |
| end of I<pseudo-block>. |
| |
| =item C<SAVECLEARSV(SV *sv)> |
| |
| Clears a slot in the current scratchpad which corresponds to C<sv> at |
| the end of I<pseudo-block>. |
| |
| =item C<SAVEDELETE(HV *hv, char *key, I32 length)> |
| |
| The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The |
| string pointed to by C<key> is Safefree()ed. If one has a I<key> in |
| short-lived storage, the corresponding string may be reallocated like |
| this: |
| |
| SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf)); |
| |
| =item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)> |
| |
| At the end of I<pseudo-block> the function C<f> is called with the |
| only argument C<p>. |
| |
| =item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)> |
| |
| At the end of I<pseudo-block> the function C<f> is called with the |
| implicit context argument (if any), and C<p>. |
| |
| =item C<SAVESTACK_POS()> |
| |
| The current offset on the Perl internal stack (cf. C<SP>) is restored |
| at the end of I<pseudo-block>. |
| |
| =back |
| |
| The following API list contains functions, thus one needs to |
| provide pointers to the modifiable data explicitly (either C pointers, |
| or Perlish C<GV *>s). Where the above macros take C<int>, a similar |
| function takes C<int *>. |
| |
| =over 4 |
| |
| =item C<SV* save_scalar(GV *gv)> |
| |
| Equivalent to Perl code C<local $gv>. |
| |
| =item C<AV* save_ary(GV *gv)> |
| |
| =item C<HV* save_hash(GV *gv)> |
| |
| Similar to C<save_scalar>, but localize C<@gv> and C<%gv>. |
| |
| =item C<void save_item(SV *item)> |
| |
| Duplicates the current value of C<item>. On exit from the current |
| C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<item> is restored |
| from the stored copy. It doesn't handle magic. Use C<save_scalar> if |
| magic is affected. |
| |
| =item C<void save_list(SV **sarg, I32 maxsarg)> |
| |
| A variant of C<save_item> which takes multiple arguments via an array |
| C<sarg> of C<SV*> of length C<maxsarg>. |
| |
| =item C<SV* save_svref(SV **sptr)> |
| |
| Similar to C<save_scalar>, but will reinstate an C<SV *>. |
| |
| =item C<void save_aptr(AV **aptr)> |
| |
| =item C<void save_hptr(HV **hptr)> |
| |
| Similar to C<save_svref>, but localize C<AV *> and C<HV *>. |
| |
| =back |
| |
| The C<Alias> module implements localization of the basic types within the |
| I<caller's scope>. People who are interested in how to localize things in |
| the containing scope should take a look there too. |
| |
| =head1 Subroutines |
| |
| =head2 XSUBs and the Argument Stack |
| |
| The XSUB mechanism is a simple way for Perl programs to access C subroutines. |
| An XSUB routine will have a stack that contains the arguments from the Perl |
| program, and a way to map from the Perl data structures to a C equivalent. |
| |
| The stack arguments are accessible through the C<ST(n)> macro, which returns |
| the C<n>'th stack argument. Argument 0 is the first argument passed in the |
| Perl subroutine call. These arguments are C<SV*>, and can be used anywhere |
| an C<SV*> is used. |
| |
| Most of the time, output from the C routine can be handled through use of |
| the RETVAL and OUTPUT directives. However, there are some cases where the |
| argument stack is not already long enough to handle all the return values. |
| An example is the POSIX tzname() call, which takes no arguments, but returns |
| two, the local time zone's standard and summer time abbreviations. |
| |
| To handle this situation, the PPCODE directive is used and the stack is |
| extended using the macro: |
| |
| EXTEND(SP, num); |
| |
| where C<SP> is the macro that represents the local copy of the stack pointer, |
| and C<num> is the number of elements the stack should be extended by. |
| |
| Now that there is room on the stack, values can be pushed on it using the |
| C<PUSHs> macro. The pushed values will often need to be "mortal" (see |
| L</Reference Counts and Mortality>): |
| |
| PUSHs(sv_2mortal(newSViv(an_integer))) |
| PUSHs(sv_2mortal(newSVuv(an_unsigned_integer))) |
| PUSHs(sv_2mortal(newSVnv(a_double))) |
| PUSHs(sv_2mortal(newSVpv("Some String",0))) |
| /* Although the last example is better written as the more |
| * efficient: */ |
| PUSHs(newSVpvs_flags("Some String", SVs_TEMP)) |
| |
| Now, in the Perl program calling C<tzname>, the two values will be assigned |
| as in: |
| |
| ($standard_abbrev, $summer_abbrev) = POSIX::tzname; |
| |
| An alternate (and possibly simpler) method to pushing values on the stack is |
| to use the macro: |
| |
| XPUSHs(SV*) |
| |
| This macro automatically adjusts the stack for you, if needed. Thus, you |
| do not need to call C<EXTEND> to extend the stack. |
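
The difference between extending once and pushing unchecked, versus checking on every push, can be sketched as a toy model in plain C. This is not the Perl argument stack: C<ToyStack>, C<toy_extend>, C<toy_push> and C<toy_xpush> are hypothetical names, and plain integers stand in for C<SV*> values.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int *base; size_t len, cap; } ToyStack;

/* Modelled on EXTEND: make sure at least n more slots are available. */
static void toy_extend(ToyStack *st, size_t n) {
    if (st->cap - st->len >= n)
        return;
    size_t cap = st->len + n;
    int *p = realloc(st->base, cap * sizeof *p);
    assert(p != NULL);
    st->base = p;
    st->cap  = cap;
}

/* Modelled on PUSHs: store without growing; the caller must have
 * extended the stack beforehand. */
static void toy_push(ToyStack *st, int v) {
    assert(st->len < st->cap);
    st->base[st->len++] = v;
}

/* Modelled on XPUSHs: extend by one slot if needed, then push. */
static void toy_xpush(ToyStack *st, int v) {
    toy_extend(st, 1);
    toy_push(st, v);
}
```

When the number of return values is known in advance, one C<toy_extend> followed by several C<toy_push> calls does a single capacity check. C<toy_xpush> trades that for convenience by checking on every call.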
| |
| Despite suggestions in earlier versions of this document, the macros |
| C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results. |
| For that, either stick to the C<(X)PUSHs> macros shown above, or use the new |
| C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>. |
| |
| For more information, consult L<perlxs> and L<perlxstut>. |
| |
| =head2 Autoloading with XSUBs |
| |
| If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the |
| fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable |
| of the XSUB's package. |
| |
| But it also puts the same information in certain fields of the XSUB itself: |
| |
| HV *stash = CvSTASH(cv); |
| const char *subname = SvPVX(cv); |
| STRLEN name_length = SvCUR(cv); /* in bytes */ |
| U32 is_utf8 = SvUTF8(cv); |
| |
| C<SvPVX(cv)> contains just the sub name itself, not including the package. |
| For an AUTOLOAD routine in UNIVERSAL or one of its superclasses, |
| C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package. |
| |
| B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support |
| XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the |
| XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need |
| to support 5.8-5.14, use the XSUB's fields. |
| |
| =head2 Calling Perl Routines from within C Programs |
| |
| There are four routines that can be used to call a Perl subroutine from |
| within a C program. These four are: |
| |
| I32 call_sv(SV*, I32); |
| I32 call_pv(const char*, I32); |
| I32 call_method(const char*, I32); |
| I32 call_argv(const char*, I32, register char**); |
| |
| The routine most often used is C<call_sv>. The C<SV*> argument |
| contains either the name of the Perl subroutine to be called, or a |
| reference to the subroutine. The second argument consists of flags |
| that control the context in which the subroutine is called, whether |
| or not the subroutine is being passed arguments, how errors should be |
| trapped, and how to treat return values. |
| |
| All four routines return the number of values that the subroutine returned |
| on the Perl stack. |
| |
| These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0, |
| but those names are now deprecated; macros of the same name are provided for |
| compatibility. |
| |
| When using any of these routines (except C<call_argv>), the programmer |
| must manipulate the Perl stack. The macros and functions used for this |
| include the following: |
| |
| dSP |
| SP |
| PUSHMARK() |
| PUTBACK |
| SPAGAIN |
| ENTER |
| SAVETMPS |
| FREETMPS |
| LEAVE |
| XPUSH*() |
| POP*() |
| |
| For a detailed description of calling conventions from C to Perl, |
| consult L<perlcall>. |
| |
| =head2 Memory Allocation |
| |
| =head3 Allocation |
| |
| All memory meant to be used with the Perl API functions should be manipulated |
| using the macros described in this section. The macros provide the necessary |
| transparency between differences in the actual malloc implementation that is |
| used within perl. |
| |
| It is suggested that you enable the version of malloc that is distributed |
| with Perl. It keeps pools of various sizes of unallocated memory in |
| order to satisfy allocation requests more quickly. However, on some |
| platforms, it may cause spurious malloc or free errors. |
| |
| The following three macros are used to initially allocate memory: |
| |
| Newx(pointer, number, type); |
| Newxc(pointer, number, type, cast); |
| Newxz(pointer, number, type); |
| |
| The first argument C<pointer> should be the name of a variable that will |
| point to the newly allocated memory. |
| |
| The second and third arguments C<number> and C<type> specify how many of |
| the specified type of data structure should be allocated. The argument |
| C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>, |
| should be used if the C<pointer> argument is different from the C<type> |
| argument. |
| |
| Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero> |
| to zero out all the newly allocated memory. |
| |
| =head3 Reallocation |
| |
| Renew(pointer, number, type); |
| Renewc(pointer, number, type, cast); |
| Safefree(pointer) |
| |
| These three macros are used to change a memory buffer size or to free a |
| piece of memory no longer needed. The arguments to C<Renew> and C<Renewc> |
| match those of C<Newx> and C<Newxc>. |
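| |
| As a rough illustration of the allocate, grow, free lifecycle, the sketch |
| below models the macros in plain C on top of the C library. The C<TOY_*> |
| names and C<toy_grow_and_sum> are hypothetical stand-ins, not part of the |
| Perl API; the real macros go through whichever malloc perl was built with. |
| |
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Newx, Newxz, Renew and Safefree, built on the
 * C library.  They only mirror the argument order of the real macros. */
#define TOY_NEWX(p, n, t)   ((p) = (t *)malloc((size_t)(n) * sizeof(t)))
#define TOY_NEWXZ(p, n, t)  ((p) = (t *)calloc((size_t)(n), sizeof(t)))
#define TOY_RENEW(p, n, t)  ((p) = (t *)realloc((p), (size_t)(n) * sizeof(t)))
#define TOY_SAFEFREE(p)     free(p)

/* Allocate a zeroed buffer of `start` ints, grow it to `final` ints, fill
 * the new tail with its own indices, and return the sum of all elements. */
static long toy_grow_and_sum(size_t start, size_t final)
{
    int *buf;
    long sum = 0;
    size_t i;

    assert(start > 0 && final >= start);

    TOY_NEWXZ(buf, start, int);         /* zero-filled, like Newxz */
    if (!buf) abort();

    TOY_RENEW(buf, final, int);         /* same pointer/number/type order */
    if (!buf) abort();

    for (i = start; i < final; i++)     /* the grown tail is uninitialised */
        buf[i] = (int)i;

    for (i = 0; i < final; i++)
        sum += buf[i];

    TOY_SAFEFREE(buf);
    return sum;
}
```
| |
| Only the first C<start> elements are zeroed by the allocation; anything |
| added by the resize has to be initialised by the caller. |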
| |
| =head3 Moving |
| |
| Move(source, dest, number, type); |
| Copy(source, dest, number, type); |
| Zero(dest, number, type); |
| |
| These three macros are used to move, copy, or zero out previously allocated |
| memory. The C<source> and C<dest> arguments point to the source and |
| destination starting points. Perl will move, copy, or zero out C<number> |
| instances of the size of the C<type> data structure (using the C<sizeof> |
| function). |
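| |
| The difference between moving and copying matters when the two regions |
| overlap. The sketch below uses hypothetical C<TOY_*> macros that keep the |
| same argument order (source, destination, count, type) and map onto |
| C<memmove>, C<memcpy> and C<memset>; the helper functions are likewise |
| invented for illustration. |
| |
```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins with the same argument order as Move, Copy and
 * Zero: source first, then destination, then a count of `type` elements. */
#define TOY_MOVE(s, d, n, t)  memmove((d), (s), (size_t)(n) * sizeof(t))
#define TOY_COPY(s, d, n, t)  memcpy((d), (s), (size_t)(n) * sizeof(t))
#define TOY_ZERO(d, n, t)     memset((d), 0, (size_t)(n) * sizeof(t))

/* Open a gap at index `at` in an array holding `used` ints (capacity must
 * be at least used + 1) and store `value` there.  Source and destination
 * overlap, so this needs the Move flavour rather than Copy. */
static void toy_insert_at(int *arr, size_t used, size_t at, int value)
{
    assert(at <= used);
    TOY_MOVE(arr + at, arr + at + 1, used - at, int);
    arr[at] = value;
}

/* Duplicate `n` ints into a separate buffer, then wipe the original.
 * The buffers must not overlap, so Copy is sufficient here. */
static void toy_copy_then_clear(int *src, int *dst, size_t n)
{
    TOY_COPY(src, dst, n, int);
    TOY_ZERO(src, n, int);
}
```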
| |
| =head2 PerlIO |
| |
| The most recent development releases of Perl have been experimenting with |
| removing Perl's dependency on the "normal" standard I/O suite and allowing |
| other stdio implementations to be used. This involves creating a new |
| abstraction layer that then calls whichever implementation of stdio Perl |
| was compiled with. All XSUBs should now use the functions in the PerlIO |
| abstraction layer and not make any assumptions about what kind of stdio |
| is being used. |
| |
| For a complete description of the PerlIO abstraction, consult L<perlapio>. |
| |
| =head2 Putting a C value on Perl stack |
| |
| A lot of opcodes (an opcode is an elementary operation in the internal perl |
| stack machine) put an SV* on the stack. However, as an optimization |
| the corresponding SV is (usually) not recreated each time. The opcodes |
| reuse specially assigned SVs (I<target>s) which are (as a corollary) |
| not constantly freed/created. |
| |
| Each of the targets is created only once (but see |
| L</Scratchpads and recursion> below), and when an opcode needs to put |
| an integer, a double, or a string on stack, it just sets the |
| corresponding parts of its I<target> and puts the I<target> on stack. |
| |
| The macro to put this target on stack is C<PUSHTARG>, and it is |
| directly used in some opcodes, as well as indirectly in zillions of |
| others, which use it via C<(X)PUSH[iunp]>. |
| |
| Because the target is reused, you must be careful when pushing multiple |
| values on the stack. The following code will not do what you think: |
| |
| XPUSHi(10); |
| XPUSHi(20); |
| |
| This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto |
| the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack". |
| At the end of the operation, the stack does not contain the values 10 |
| and 20, but actually contains two pointers to C<TARG>, which we have set |
| to 20. |
| |
| If you need to push multiple different values then you should either use |
| the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros, |
| none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an |
| SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>, |
| will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make |
| this a little easier to achieve by creating a new mortal for you (via |
| C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary |
| in the case of the C<mXPUSH[iunp]> macros), and then setting its value. |
| Thus, instead of writing this to "fix" the example above: |
| |
| XPUSHs(sv_2mortal(newSViv(10))) |
| XPUSHs(sv_2mortal(newSViv(20))) |
| |
| you can simply write: |
| |
| mXPUSHi(10) |
| mXPUSHi(20) |
| |
| On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to |
| need a C<dTARG> in your variable declarations so that the C<*PUSH*> |
| macros can make use of the local variable C<TARG>. See also C<dTARGET> |
| and C<dXSTARG>. |
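| |
| The aliasing problem can be reproduced in a few lines of plain C. The |
| model below is only a sketch: C<ToySV>, C<ToyStack> and the two push |
| helpers are hypothetical and merely imitate what C<XPUSHi> and C<mXPUSHi> |
| do with the target and with fresh scalars respectively. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types: a scalar holding one integer, and a small
 * fixed-size stack of pointers to such scalars. */
typedef struct { long iv; } ToySV;

#define TOY_STACK_MAX 8
typedef struct {
    ToySV *slot[TOY_STACK_MAX];
    size_t depth;
} ToyStack;

/* Models XPUSHi: store the value in the shared target, push the target. */
static void toy_push_via_targ(ToyStack *st, ToySV *targ, long value)
{
    assert(st->depth < TOY_STACK_MAX);
    targ->iv = value;
    st->slot[st->depth++] = targ;
}

/* Models mXPUSHi: take a fresh scalar from a caller-supplied pool (which
 * must have a free element), set its value, and push it. */
static void toy_push_fresh(ToyStack *st, ToySV *pool, size_t *used,
                           long value)
{
    assert(st->depth < TOY_STACK_MAX);
    pool[*used].iv = value;
    st->slot[st->depth++] = &pool[(*used)++];
}
```
| |
| Pushing 10 and then 20 through C<toy_push_via_targ> leaves both stack |
| slots pointing at the same scalar, which holds 20. Doing the same through |
| C<toy_push_fresh> leaves two distinct scalars holding 10 and 20. |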
| |
| =head2 Scratchpads |
| |
| The question remains on when the SVs which are I<target>s for opcodes |
| are created. The answer is that they are created when the current |
| unit--a subroutine or a file (for opcodes for statements outside of |
| subroutines)--is compiled. During this time a special anonymous Perl |
| array is created, which is called a scratchpad for the current unit. |
| |
| A scratchpad keeps SVs which are lexicals for the current unit and are |
| targets for opcodes. One can deduce that an SV lives on a scratchpad |
| by looking at its flags: lexicals have C<SVs_PADMY> set, and |
| I<target>s have C<SVs_PADTMP> set. |
| |
| The correspondence between OPs and I<target>s is not 1-to-1. Different |
| OPs in the compile tree of the unit can use the same target, if this |
| would not conflict with the expected life of the temporary. |
| |
| =head2 Scratchpads and recursion |
| |
| It is not entirely true that a compiled unit contains a pointer to |
| the scratchpad AV. In fact it contains a pointer to an AV of |
| (initially) one element, and this element is the scratchpad AV. Why do |
| we need an extra level of indirection? |
| |
| The answer is B<recursion>, and maybe B<threads>. Both of |
| these can create several execution pointers going into the same |
| subroutine. So that the subroutine-child does not write over the |
| temporaries of the subroutine-parent (whose lifespan covers the call |
| to the child), the parent and the child should have different |
| scratchpads. (I<And> the lexicals should be separate anyway!) |
| |
| So each subroutine is born with an array of scratchpads (of length 1). |
| On each entry to the subroutine it is checked that the current |
| depth of the recursion is not more than the length of this array, and |
| if it is, a new scratchpad is created and pushed into the array. |
| |
| The I<target>s on this scratchpad are C<undef>s, but they are already |
| marked with correct flags. |
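| |
| The depth check described above can be sketched in plain C. Everything |
| here (C<ToyPad>, C<ToyPadList>, the function names, the fixed slot count) |
| is hypothetical; for brevity the toy array starts empty rather than with |
| one pad, and the real core stores pads as AVs rather than C structs. |
| |
```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAD_SLOTS 4            /* hypothetical: slots per scratchpad */

typedef struct { long slot[TOY_PAD_SLOTS]; } ToyPad;

/* Hypothetical model of the per-subroutine array of scratchpads. */
typedef struct {
    ToyPad **pads;                 /* pads[d - 1] is the pad for depth d */
    size_t   len;                  /* number of pads created so far */
} ToyPadList;

/* Return the pad for recursion depth `depth` (1 = outermost call),
 * creating and appending a new pad when the depth exceeds the length. */
static ToyPad *toy_pad_for_depth(ToyPadList *pl, size_t depth)
{
    assert(depth >= 1 && depth <= pl->len + 1);
    if (depth > pl->len) {
        ToyPad **grown = realloc(pl->pads, depth * sizeof *grown);
        ToyPad *pad = calloc(1, sizeof *pad);     /* fresh, empty slots */
        if (!grown || !pad) abort();
        grown[depth - 1] = pad;
        pl->pads = grown;
        pl->len = depth;
    }
    return pl->pads[depth - 1];
}

/* Recursive toy "subroutine": keeps n in its own pad across the nested
 * call, then checks that the child did not overwrite it. */
static long toy_sum_to(ToyPadList *pl, size_t depth, long n)
{
    ToyPad *pad = toy_pad_for_depth(pl, depth);
    long rest;
    pad->slot[0] = n;                             /* parent's temporary */
    rest = (n > 0) ? toy_sum_to(pl, depth + 1, n - 1) : 0;
    assert(pad->slot[0] == n);                    /* still intact */
    return pad->slot[0] + rest;
}

static void toy_padlist_free(ToyPadList *pl)
{
    size_t i;
    for (i = 0; i < pl->len; i++)
        free(pl->pads[i]);
    free(pl->pads);
    pl->pads = NULL;
    pl->len = 0;
}
```
| |
| A second call at the same depth reuses the pads created by the first, so |
| the array only grows when the recursion goes deeper than before. |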
| |
| =head1 Compiled code |
| |
| =head2 Code tree |
| |
| Here we describe the internal form your code is converted to by |
| Perl. Start with a simple example: |
| |
| $a = $b + $c; |
| |
| This is converted to a tree similar to this one: |
| |
| assign-to |
| / \ |
| + $a |
| / \ |
| $b $c |
| |
| (but slightly more complicated). This tree reflects the way Perl |
| parsed your code, but has nothing to do with the execution order. |
| There is an additional "thread" going through the nodes of the tree |
| which shows the order of execution of the nodes. In our simplified |
| example above it looks like: |
| |
| $b ---> $c ---> + ---> $a ---> assign-to |
| |
| But with the actual compile tree for C<$a = $b + $c> it is different: |
| some nodes are I<optimized away>. As a corollary, though the actual tree |
| contains more nodes than our simplified example, the execution order |
| is the same as in our example. |
| |
| =head2 Examining the tree |
| |
| If you have your perl compiled for debugging (usually done with |
| C<-DDEBUGGING> on the C<Configure> command line), you may examine the |
| compiled tree by specifying C<-Dx> on the Perl command line. The |
| output takes several lines per node, and for C<$b+$c> it looks like |
| this: |
| |
| 5 TYPE = add ===> 6 |
| TARG = 1 |
| FLAGS = (SCALAR,KIDS) |
| { |
| TYPE = null ===> (4) |
| (was rv2sv) |
| FLAGS = (SCALAR,KIDS) |
| { |
| 3 TYPE = gvsv ===> 4 |
| FLAGS = (SCALAR) |
| GV = main::b |
| } |
| } |
| { |
| TYPE = null ===> (5) |
| (was rv2sv) |
| FLAGS = (SCALAR,KIDS) |
| { |
| 4 TYPE = gvsv ===> 5 |
| FLAGS = (SCALAR) |
| GV = main::c |
| } |
| } |
| |
| This tree has 5 nodes (one per C<TYPE> specifier), but only 3 of them are |
| not optimized away (one per number in the left column). The immediate |
| children of the given node correspond to C<{}> pairs on the same level |
| of indentation, thus this listing corresponds to the tree: |
| |
| add |
| / \ |
| null null |
| | | |
| gvsv gvsv |
| |
| The execution order is indicated by C<===E<gt>> marks, thus it is C<3 |
| 4 5 6> (node C<6> is not included into above listing), i.e., |
| C<gvsv gvsv add whatever>. |
| |
| Each of these nodes represents an op, a fundamental operation inside the |
| Perl core. The code which implements each operation can be found in the |
| F<pp*.c> files; the function which implements the op with type C<gvsv> |
| is C<pp_gvsv>, and so on. As the tree above shows, different ops have |
| different numbers of children: C<add> is a binary operator, as one would |
| expect, and so has two children. To accommodate the various different |
| numbers of children, there are various types of op data structure, and |
| they link together in different ways. |
| |
| The simplest type of op structure is C<OP>: this has no children. Unary |
| operators, C<UNOP>s, have one child, and this is pointed to by the |
| C<op_first> field. Binary operators (C<BINOP>s) have not only an |
| C<op_first> field but also an C<op_last> field. The most complex type of |
| op is a C<LISTOP>, which has any number of children. In this case, the |
| first child is pointed to by C<op_first> and the last child by |
| C<op_last>. The children in between can be found by iteratively |
| following the C<op_sibling> pointer from the first child to the last. |
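| |
| The linkage just described can be sketched with a toy structure. The |
| C<ToyOp> type and both helper functions are hypothetical and model only |
| the three pointer fields named above; the real op structures carry many |
| more fields. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified op node: only the linkage fields
 * discussed above are modelled. */
typedef struct ToyOp {
    const char   *name;
    struct ToyOp *op_first;     /* first child, or NULL */
    struct ToyOp *op_last;      /* last child, or NULL */
    struct ToyOp *op_sibling;   /* next child of the same parent, or NULL */
} ToyOp;

/* Count the children of `parent` by starting at op_first and following
 * the op_sibling links until they run out. */
static size_t toy_count_kids(const ToyOp *parent)
{
    size_t n = 0;
    const ToyOp *kid;
    for (kid = parent->op_first; kid; kid = kid->op_sibling)
        n++;
    return n;
}

/* Append `kid` as the new last child of `parent`, list-op style. */
static void toy_append_kid(ToyOp *parent, ToyOp *kid)
{
    assert(kid->op_sibling == NULL);
    if (parent->op_last)
        parent->op_last->op_sibling = kid;
    else
        parent->op_first = kid;
    parent->op_last = kid;
}
```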
| |
| There are also two other op types: a C<PMOP> holds a regular expression, |
| and has no children, and a C<LOOP> may or may not have children. If the |
| C<op_children> field is non-zero, it behaves like a C<LISTOP>. To |
| complicate matters, if a C<UNOP> is actually a C<null> op after |
| optimization (see L</Compile pass 2: context propagation>) it will still |
| have children in accordance with its former type. |
| |
| Another way to examine the tree is to use a compiler back-end module, such |
| as L<B::Concise>. |
| |
| =head2 Compile pass 1: check routines |
| |
| The tree is created by the compiler while I<yacc> code feeds it |
| the constructions it recognizes. Since I<yacc> works bottom-up, so does |
| the first pass of perl compilation. |
| |
| What makes this pass interesting for perl developers is that some |
| optimization may be performed on this pass. This is optimization by |
| so-called "check routines". The correspondence between node names |
| and corresponding check routines is described in F<opcode.pl> (do not |
| forget to run C<make regen_headers> if you modify this file). |
| |
| A check routine is called when the node is fully constructed except |
| for the execution-order thread. Since at this time there are no |
| back-links to the currently constructed node, one can do almost any |
| operation to the top-level node, including freeing it and/or creating |
| new nodes above/below it. |
| |
| The check routine returns the node which should be inserted into the |
| tree (if the top-level node was not modified, check routine returns |
| its argument). |
| |
| By convention, check routines have names C<ck_*>. They are usually |
| called from C<new*OP> subroutines (or C<convert>) (which in turn are |
| called from F<perly.y>). |
| |
| =head2 Compile pass 1a: constant folding |
| |
| Immediately after the check routine is called the returned node is |
| checked for being compile-time executable. If it is (the value is |
| judged to be constant) it is immediately executed, and a I<constant> |
| node with the "return value" of the corresponding subtree is |
| substituted instead. The subtree is deleted. |
| |
| If constant folding was not performed, the execution-order thread is |
| created. |
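| |
| The idea of folding during bottom-up construction can be shown on a toy |
| expression tree. This is a sketch only: C<ToyNode> and the C<toy_*> |
| functions are hypothetical, and the toy handles just one operator where |
| the real compiler decides foldability per op. |
| |
```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } ToyKind;

typedef struct ToyNode {
    ToyKind kind;
    long value;                     /* meaningful for TOY_CONST only */
    struct ToyNode *left, *right;   /* meaningful for TOY_ADD only */
} ToyNode;

static ToyNode *toy_new(ToyKind kind, long value, ToyNode *l, ToyNode *r)
{
    ToyNode *n = malloc(sizeof *n);
    if (!n) abort();
    n->kind = kind;
    n->value = value;
    n->left = l;
    n->right = r;
    return n;
}

/* Build an addition node bottom-up.  If both operands are already
 * constants, "execute" the addition now, delete the subtree and return a
 * single constant node in its place; otherwise return the add node. */
static ToyNode *toy_new_add(ToyNode *l, ToyNode *r)
{
    assert(l != NULL && r != NULL);
    if (l->kind == TOY_CONST && r->kind == TOY_CONST) {
        long folded = l->value + r->value;
        free(l);
        free(r);
        return toy_new(TOY_CONST, folded, NULL, NULL);
    }
    return toy_new(TOY_ADD, 0, l, r);
}

static void toy_free(ToyNode *n)
{
    if (!n) return;
    toy_free(n->left);
    toy_free(n->right);
    free(n);
}
```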
| |
| =head2 Compile pass 2: context propagation |
| |
| When a context for a part of compile tree is known, it is propagated |
| down through the tree. At this time the context can have 5 values |
| (instead of 2 for runtime context): void, boolean, scalar, list, and |
| lvalue. In contrast with the pass 1 this pass is processed from top |
| to bottom: a node's context determines the context for its children. |
| |
| Additional context-dependent optimizations are performed at this time. |
| Since at this moment the compile tree contains back-references (via |
| "thread" pointers), nodes cannot be free()d now. To allow |
| optimized-away nodes at this stage, such nodes are null()ified instead |
| of free()ing (i.e. their type is changed to OP_NULL). |
| |
| =head2 Compile pass 3: peephole optimization |
| |
| After the compile tree for a subroutine (or for an C<eval> or a file) |
| is created, an additional pass over the code is performed. This pass |
| is neither top-down nor bottom-up, but in the execution order (with |
| additional complications for conditionals). Optimizations performed |
| at this stage are subject to the same restrictions as in the pass 2. |
| |
| Peephole optimizations are done by calling the function pointed to |
| by the global variable C<PL_peepp>. By default, C<PL_peepp> just |
| calls the function pointed to by the global variable C<PL_rpeepp>. |
| By default, that performs some basic op fixups and optimisations along |
| the execution-order op chain, and recursively calls C<PL_rpeepp> for |
| each side chain of ops (resulting from conditionals). Extensions may |
| provide additional optimisations or fixups, hooking into either the |
| per-subroutine or recursive stage, like this: |
| |
| static peep_t prev_peepp; |
| static void my_peep(pTHX_ OP *o) |
| { |
| /* custom per-subroutine optimisation goes here */ |
| prev_peepp(aTHX_ o); |
| /* custom per-subroutine optimisation may also go here */ |
| } |
| BOOT: |
| prev_peepp = PL_peepp; |
| PL_peepp = my_peep; |
| |
| static peep_t prev_rpeepp; |
| static void my_rpeep(pTHX_ OP *o) |
| { |
| OP *orig_o = o; |
| for(; o; o = o->op_next) { |
| /* custom per-op optimisation goes here */ |
| } |
| prev_rpeepp(aTHX_ orig_o); |
| } |
| BOOT: |
| prev_rpeepp = PL_rpeepp; |
| PL_rpeepp = my_rpeep; |
| |
| =head2 Pluggable runops |
| |
| The compile tree is executed in a runops function. There are two runops |
| functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used |
| with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine |
| control over the execution of the compile tree it is possible to provide |
| your own runops function. |
| |
| It's probably best to copy one of the existing runops functions and |
| change it to suit your needs. Then, in the BOOT section of your XS |
| file, add the line: |
| |
| PL_runops = my_runops; |
| |
| This function should be as efficient as possible to keep your programs |
| running as fast as possible. |
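| |
| To show the general shape of such a loop, here is a toy dispatch model in |
| plain C. All names (C<ToyInterp>, C<ToyOp>, C<toy_runops> and so on) are |
| hypothetical; the real runops functions work on the interpreter's current |
| op and context rather than on explicit arguments. |
| |
```c
#include <assert.h>
#include <stddef.h>

struct ToyOp;

typedef struct ToyInterp {
    long acc;          /* hypothetical: a single accumulator register */
    long ops_run;      /* incremented by the counting loop only */
} ToyInterp;

/* Each op's function does its work and returns the next op to run. */
typedef struct ToyOp *(*toy_pp_t)(ToyInterp *, struct ToyOp *);

typedef struct ToyOp {
    toy_pp_t      ppaddr;
    struct ToyOp *op_next;
    long          arg;
} ToyOp;

static ToyOp *toy_pp_add(ToyInterp *in, ToyOp *op)
{
    in->acc += op->arg;
    return op->op_next;
}

/* Minimal loop: keep calling the current op until one returns NULL. */
static void toy_runops_standard(ToyInterp *in, ToyOp *op)
{
    while (op) {
        assert(op->ppaddr != NULL);
        op = op->ppaddr(in, op);
    }
}

/* A "custom runops" of the same shape that also counts executed ops. */
static void toy_runops_counting(ToyInterp *in, ToyOp *op)
{
    while (op) {
        assert(op->ppaddr != NULL);
        in->ops_run++;
        op = op->ppaddr(in, op);
    }
}

/* Replaceable function pointer, in the spirit of PL_runops. */
typedef void (*toy_runops_t)(ToyInterp *, ToyOp *);
static toy_runops_t toy_runops = toy_runops_standard;
```
| |
| Swapping C<toy_runops> to C<toy_runops_counting> changes how the chain is |
| driven without touching the ops themselves, which is the point of making |
| the loop pluggable. |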
| |
| =head2 Compile-time scope hooks |
| |
| As of perl 5.14 it is possible to hook into the compile-time lexical |
| scope mechanism using C<Perl_blockhook_register>. This is used like |
| this: |
| |
| STATIC void my_start_hook(pTHX_ int full); |
| STATIC BHK my_hooks; |
| |
| BOOT: |
| BhkENTRY_set(&my_hooks, bhk_start, my_start_hook); |
| Perl_blockhook_register(aTHX_ &my_hooks); |
| |
| This will arrange to have C<my_start_hook> called at the start of |
| compiling every lexical scope. The available hooks are: |
| |
| =over 4 |
| |
| =item C<void bhk_start(pTHX_ int full)> |
| |
| This is called just after starting a new lexical scope. Note that Perl |
| code like |
| |
| if ($x) { ... } |
| |
| creates two scopes: the first starts at the C<(> and has C<full == 1>, |
| the second starts at the C<{> and has C<full == 0>. Both end at the |
| C<}>, so calls to C<start> and C<pre/post_end> will match. Anything |
| pushed onto the save stack by this hook will be popped just before the |
| scope ends (between the C<pre_> and C<post_end> hooks, in fact). |
| |
| =item C<void bhk_pre_end(pTHX_ OP **o)> |
| |
| This is called at the end of a lexical scope, just before unwinding the |
| stack. I<o> is the root of the optree representing the scope; it is a |
| double pointer so you can replace the OP if you need to. |
| |
| =item C<void bhk_post_end(pTHX_ OP **o)> |
| |
| This is called at the end of a lexical scope, just after unwinding the |
| stack. I<o> is as above. Note that it is possible for calls to C<pre_> |
| and C<post_end> to nest, if there is something on the save stack that |
| calls string eval. |
| |
| =item C<void bhk_eval(pTHX_ OP *const o)> |
| |
| This is called just before starting to compile an C<eval STRING>, C<do |
| FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the |
| OP that requested the eval, and will normally be an C<OP_ENTEREVAL>, |
| C<OP_DOFILE> or C<OP_REQUIRE>. |
| |
| =back |
| |
| Once you have your hook functions, you need a C<BHK> structure to put |
| them in. It's best to allocate it statically, since there is no way to |
| free it once it's registered. The function pointers should be inserted |
| into this structure using the C<BhkENTRY_set> macro, which will also set |
| flags indicating which entries are valid. If you do need to allocate |
| your C<BHK> dynamically for some reason, be sure to zero it before you |
| start. |
| |
| Once registered, there is no mechanism to switch these hooks off, so if |
| that is necessary you will need to do this yourself. An entry in C<%^H> |
| is probably the best way, so the effect is lexically scoped; however it |
| is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to |
| temporarily switch entries on and off. You should also be aware that |
| generally speaking at least one scope will have opened before your |
| extension is loaded, so you will see some C<pre/post_end> pairs that |
| didn't have a matching C<start>. |
| |
| =head1 Examining internal data structures with the C<dump> functions |
| |
| To aid debugging, the source file F<dump.c> contains a number of |
| functions which produce formatted output of internal data structures. |
| |
| The most commonly used of these functions is C<Perl_sv_dump>; it's used |
| for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls |
| C<sv_dump> to produce debugging output from Perl-space, so users of that |
| module should already be familiar with its format. |
| |
| C<Perl_op_dump> can be used to dump an C<OP> structure or any of its |
| derivatives, and produces output similar to C<perl -Dx>; in fact, |
| C<Perl_dump_eval> will dump the main root of the code being evaluated, |
| exactly like C<-Dx>. |
| |
| Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an |
| op tree; C<Perl_dump_packsubs>, which calls C<Perl_dump_sub> on all the |
| subroutines in a package like so (thankfully, these are all xsubs, so |
| there is no op tree): |
| |
| (gdb) print Perl_dump_packsubs(PL_defstash) |
| |
| SUB attributes::bootstrap = (xsub 0x811fedc 0) |
| |
| SUB UNIVERSAL::can = (xsub 0x811f50c 0) |
| |
| SUB UNIVERSAL::isa = (xsub 0x811f304 0) |
| |
| SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0) |
| |
| SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0) |
| |
| and C<Perl_dump_all>, which dumps all the subroutines in the stash and |
| the op tree of the main root. |
| |
| =head1 How multiple interpreters and concurrency are supported |
| |
| =head2 Background and PERL_IMPLICIT_CONTEXT |
| |
| The Perl interpreter can be regarded as a closed box: it has an API |
| for feeding it code or otherwise making it do things, but it also has |
| functions for its own use. This smells a lot like an object, and |
| there are ways for you to build Perl so that you can have multiple |
| interpreters, with one interpreter represented either as a C structure, |
| or inside a thread-specific structure. These structures contain all |
| the context, the state of that interpreter. |
| |
| One macro controls the major Perl build flavor: MULTIPLICITY. The |
| MULTIPLICITY build has a C structure that packages all the interpreter |
| state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also |
| normally defined, and enables the support for passing in a "hidden" first |
| argument that represents the interpreter's context. MULTIPLICITY makes |
| multi-threaded perls possible (with the ithreads threading model, related |
| to the macro USE_ITHREADS.) |
| |
| Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and |
| PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the |
| former turns on MULTIPLICITY.) The PERL_GLOBAL_STRUCT causes all the |
| internal variables of Perl to be wrapped inside a single global struct, |
| struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or |
| the function Perl_GetVars(). The PERL_GLOBAL_STRUCT_PRIVATE goes |
| one step further: there is still a single struct (allocated in main() |
| either from heap or from stack) but there are no global data symbols |
| pointing to it. In either case the global struct should be initialised |
| as the very first thing in main() using Perl_init_global_struct() and |
| correspondingly torn down after perl_free() using Perl_free_global_struct(); |
| please see F<miniperlmain.c> for usage details. You may also need |
| to use C<dVAR> in your coding to "declare the global variables" |
| when you are using them. dTHX does this for you automatically. |
| |
| To see whether you have non-const data you can use a BSD-compatible C<nm>: |
| |
| nm libperl.a | grep -v ' [TURtr] ' |
| |
| If this displays any C<D> or C<d> symbols, you have non-const data. |
| |
| For backward compatibility reasons defining just PERL_GLOBAL_STRUCT |
| doesn't actually hide all symbols inside a big global struct: some |
| PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE |
| then hides everything (see how the PERLIO_FUNCS_DECL is used). |
| |
| All this obviously requires a way for the Perl internal functions to be |
| either subroutines taking some kind of structure as the first |
| argument, or subroutines taking nothing as the first argument. To |
| enable these two very different ways of building the interpreter, |
| the Perl source (as it does in so many other situations) makes heavy |
| use of macros and subroutine naming conventions. |
| |
| First problem: deciding which functions will be public API functions and |
| which will be private. All functions whose names begin C<S_> are private |
| (think "S" for "secret" or "static"). All other functions begin with |
| "Perl_", but just because a function begins with "Perl_" does not mean it is |
| part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a |
| function is part of the API is to find its entry in L<perlapi>. |
| If it exists in L<perlapi>, it's part of the API. If it doesn't, and you |
| think it should be (i.e., you need it for your extension), send mail via |
| L<perlbug> explaining why you think it should be. |
| |
| Second problem: there must be a syntax so that the same subroutine |
| declarations and calls can pass a structure as their first argument, |
| or pass nothing. To solve this, the subroutines are named and |
| declared in a particular way. Here's a typical start of a static |
| function used within the Perl guts: |
| |
| STATIC void |
| S_incline(pTHX_ char *s) |
| |
| STATIC becomes "static" in C, and may be #define'd to nothing in some |
| configurations in the future. |
| |
| A public function (i.e. part of the internal API, but not necessarily |
| sanctioned for use in extensions) begins like this: |
| |
| void |
| Perl_sv_setiv(pTHX_ SV* dsv, IV num) |
| |
| C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the |
| details of the interpreter's context. THX stands for "thread", "this", |
| or "thingy", as the case may be. (And no, George Lucas is not involved. :-) |
| The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, |
| or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and |
| their variants. |
| |
| When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no |
| first argument containing the interpreter's context. The trailing underscore |
| in the pTHX_ macro indicates that the macro expansion needs a comma |
| after the context argument because other arguments follow it. If |
| PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the |
| subroutine is not prototyped to take the extra argument. The form of the |
| macro without the trailing underscore is used when there are no additional |
| explicit arguments. |
| |
| When a core function calls another, it must pass the context. This |
| is normally hidden via macros. Consider C<sv_setiv>. It expands into |
| something like this: |
| |
| #ifdef PERL_IMPLICIT_CONTEXT |
| #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b) |
| /* can't do this for vararg functions, see below */ |
| #else |
| #define sv_setiv Perl_sv_setiv |
| #endif |
| |
| This works well, and means that XS authors can gleefully write: |
| |
| sv_setiv(foo, bar); |
| |
| and still have it work under all the modes Perl could have been |
| compiled with. |
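| |
| The macro scheme can be imitated in miniature. The sketch below is not |
| the Perl source: C<ToyInterp>, C<TOY_IMPLICIT_CONTEXT>, the C<pTOY>/C<aTOY> |
| family and the functions are all hypothetical, chosen to mirror the roles |
| of C<pTHX>, C<pTHX_>, C<aTHX> and C<aTHX_>. |
| |
```c
#include <assert.h>

typedef struct ToyInterp { long counter; } ToyInterp;

/* Build with -DTOY_IMPLICIT_CONTEXT to pass the interpreter explicitly;
 * without it, a single global interpreter is used instead. */
#ifdef TOY_IMPLICIT_CONTEXT
#  define pTOY        ToyInterp *my_toy
#  define pTOY_       pTOY,
#  define aTOY        my_toy
#  define aTOY_       aTOY,
#  define TOY_VAR(v)  (my_toy->v)
#else
static ToyInterp toy_global;
#  define pTOY        void
#  define pTOY_
#  define aTOY
#  define aTOY_
#  define TOY_VAR(v)  (toy_global.v)
#endif

/* "Core" function: the same declaration compiles in both builds. */
static long Toy_bump(pTOY_ long by)
{
    assert(by > 0);                 /* toy precondition */
    TOY_VAR(counter) += by;
    return TOY_VAR(counter);
}

/* Short name for callers, in the style of the sv_setiv macro above. */
#define toy_bump(by)  Toy_bump(aTOY_ by)

/* A function with no explicit arguments uses the macro form without
 * the trailing underscore. */
static long toy_bump_twice(pTOY)
{
    toy_bump(1);
    return toy_bump(1);
}
```
| |
| Callers write C<toy_bump(1)> in either build; only the macro expansion |
| decides whether a context argument is actually passed. |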
| |
| This doesn't work so cleanly for varargs functions, though, as macros |
| imply that the number of arguments is known in advance. Instead we |
| either need to spell them out fully, passing C<aTHX_> as the first |
| argument (the Perl core tends to do this with functions like |
| Perl_warner), or use a context-free version. |
| |
| The context-free version of Perl_warner is called |
| Perl_warner_nocontext, and does not take the extra argument. Instead |
| it does dTHX; to get the context from thread-local storage. We |
| C<#define warner Perl_warner_nocontext> so that extensions get source |
| compatibility at the expense of performance. (Passing an arg is |
| cheaper than grabbing it from thread-local storage.) |
| |
| You can ignore [pad]THXx when browsing the Perl headers/sources. |
| Those are strictly for use within the core. Extensions and embedders |
| need only be aware of [pad]THX. |
| |
| =head2 So what happened to dTHR? |
| |
| C<dTHR> was introduced in perl 5.005 to support the older thread model. |
| The older thread model now uses the C<THX> mechanism to pass context |
| pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and |
| later still have it for backward source compatibility, but it is defined |
| to be a no-op. |
| |
| =head2 How do I use all this in extensions? |
| |
| When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call |
| any functions in the Perl API will need to pass the initial context |
| argument somehow. The kicker is that you will need to write it in |
| such a way that the extension still compiles when Perl hasn't been |
| built with PERL_IMPLICIT_CONTEXT enabled. |
| |
| There are three ways to do this. First, the easy but inefficient way, |
| which is also the default, in order to maintain source compatibility |
| with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX |
| and aTHX_ macros to call a function that will return the context. |
| Thus, something like: |
| |
| sv_setiv(sv, num); |
| |
| in your extension will translate to this when PERL_IMPLICIT_CONTEXT is |
| in effect: |
| |
| Perl_sv_setiv(Perl_get_context(), sv, num); |
| |
| or to this otherwise: |
| |
| Perl_sv_setiv(sv, num); |
| |
| You don't have to do anything new in your extension to get this; since |
| the Perl library provides Perl_get_context(), it will all just |
| work. |
| |
| The second, more efficient way is to use the following template for |
| your Foo.xs: |
| |
| #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
| #include "EXTERN.h" |
| #include "perl.h" |
| #include "XSUB.h" |
| |
| STATIC void my_private_function(int arg1, int arg2); |
| |
| STATIC void |
| my_private_function(int arg1, int arg2) |
| { |
| dTHX; /* fetch context */ |
| ... call many Perl API functions ... |
| } |
| |
| [... etc ...] |
| |
| MODULE = Foo PACKAGE = Foo |
| |
| /* typical XSUB */ |
| |
| void |
| my_xsub(arg) |
| int arg |
| CODE: |
| my_private_function(arg, 10); |
| |
| Note that the only two changes from the normal way of writing an |
| extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before |
| including the Perl headers, followed by a C<dTHX;> declaration at |
| the start of every function that will call the Perl API. (You'll |
| know which functions need this, because the C compiler will complain |
| that there's an undeclared identifier in those functions.) No changes |
| are needed for the XSUBs themselves, because the XS() macro is |
| correctly defined to pass in the implicit context if needed. |
| |
| The third, even more efficient way is to ape how it is done within |
| the Perl guts: |
| |
| |
| #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
| #include "EXTERN.h" |
| #include "perl.h" |
| #include "XSUB.h" |
| |
| /* pTHX_ only needed for functions that call Perl API */ |
| STATIC void my_private_function(pTHX_ int arg1, int arg2); |
| |
| STATIC void |
| my_private_function(pTHX_ int arg1, int arg2) |
| { |
| /* dTHX; not needed here, because THX is an argument */ |
| ... call Perl API functions ... |
| } |
| |
| [... etc ...] |
| |
| MODULE = Foo PACKAGE = Foo |
| |
| /* typical XSUB */ |
| |
| void |
| my_xsub(arg) |
| int arg |
| CODE: |
| my_private_function(aTHX_ arg, 10); |
| |
| This implementation never has to fetch the context using a function |
| call, since it is always passed as an extra argument. Depending on |
| your needs for simplicity or efficiency, you may mix the previous |
| two approaches freely. |
| |
| Never add a comma after C<pTHX> yourself--always use the form of the |
| macro with the underscore for functions that take explicit arguments, |
| or the form without the underscore for functions with no explicit arguments. |
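| |
| As a rough illustration of why the comma has to live inside the macro, |
| here is a self-contained C sketch that does not use the Perl headers. |
| The C<DEMO_*> macros and the C<demo_interp> type are hypothetical |
| stand-ins for C<pTHX>, C<aTHX> and the interpreter structure. In a |
| build without an implicit context the real underscore forms expand to |
| nothing, so a hand-written comma would leave a stray C<,> behind. |
| |
```c
#include <assert.h>

/* Hypothetical stand-in for the interpreter structure. */
typedef struct demo_interp { int calls; } demo_interp;

/* Hypothetical analogues of pTHX/pTHX_ and aTHX/aTHX_. The trailing
 * comma lives inside the underscore forms, so callers never type it. */
#define DEMO_pTHX   demo_interp *my_perl
#define DEMO_pTHX_  DEMO_pTHX,
#define DEMO_aTHX   my_perl
#define DEMO_aTHX_  DEMO_aTHX,

/* No explicit arguments: use the form without the underscore. */
static int demo_count(DEMO_pTHX)
{
    return my_perl->calls;
}

/* Explicit arguments: use the underscore form, with no comma after it. */
static int demo_add(DEMO_pTHX_ int a, int b)
{
    my_perl->calls++;
    return a + b;
}

/* A caller that already holds the context simply passes it along. */
static int demo_caller(DEMO_pTHX_ int x)
{
    int sum = demo_add(DEMO_aTHX_ x, 10);
    assert(demo_count(DEMO_aTHX) > 0);
    return sum;
}
```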
| |
| If Perl is compiled with C<-DPERL_GLOBAL_STRUCT>, a function that |
| accesses the Perl global variables (see F<perlvars.h> or |
| F<globvar.sym>) needs a C<dVAR> declaration, unless it already uses |
| C<dTHX> (which includes C<dVAR> when necessary). The need for C<dVAR> |
| shows up only with that compile-time define, because |
| otherwise the Perl global variables are visible as-is. |
| |
| =head2 Should I do anything special if I call perl from multiple threads? |
| |
| If you create interpreters in one thread and then proceed to call them in |
| another, you need to make sure perl's own Thread Local Storage (TLS) slot is |
| initialized correctly in each of those threads. |
| |
| The C<perl_alloc> and C<perl_clone> API functions will automatically set |
| the TLS slot to the interpreter they created, so that there is no need to do |
| anything special if the interpreter is always accessed in the same thread that |
| created it, and that thread did not create or call any other interpreters |
| afterwards. If that is not the case, you have to set the TLS slot of the |
| thread before calling any functions in the Perl API on that particular |
| interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that |
| thread as the first thing you do: |
| |
| /* do this before doing anything else with some_perl */ |
| PERL_SET_CONTEXT(some_perl); |
| |
| ... other Perl API calls on some_perl go here ... |
| |
| =head2 Future Plans and PERL_IMPLICIT_SYS |
| |
| Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything |
| that the interpreter knows about itself and pass it around, so too are |
| there plans to allow the interpreter to bundle up everything it knows |
| about the environment it's running on. This is enabled with the |
| PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on |
| Windows. |
| |
| This makes it possible to provide an extra pointer (called the "host" |
| environment) for all the system calls, so that each host can |
| maintain its own system-level state, broken down into |
| seven C structures. These are thin wrappers around the usual system |
| calls (see F<win32/perllib.c>) for the default perl executable, but for a |
| more ambitious host (like the one that would do fork() emulation) all |
| the extra work needed to pretend that different interpreters are |
| actually different "processes" would be done here. |
| |
| The Perl engine/interpreter and the host are orthogonal entities. |
| There could be one or more interpreters in a process, and one or |
| more "hosts", with free association between them. |
| |
| =head1 Internal Functions |
| |
| All of Perl's internal functions which will be exposed to the outside |
| world are prefixed by C<Perl_> so that they will not conflict with XS |
| functions or functions used in a program in which Perl is embedded. |
| Similarly, all global variables begin with C<PL_>. (By convention, |
| static functions start with C<S_>.) |
| |
| Inside the Perl core (C<PERL_CORE> defined), you can get at the functions |
| either with or without the C<Perl_> prefix, thanks to a bunch of defines |
| that live in F<embed.h>. Note that extension code should I<not> set |
| C<PERL_CORE>; this exposes the full perl internals, and is likely to cause |
| breakage of the XS in each new perl release. |
| |
| The file F<embed.h> is generated automatically from |
| F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping |
| header files for the internal functions, generates the documentation |
| and a lot of other bits and pieces. It's important that when you add |
| a new function to the core or change an existing one, you change the |
| data in the table in F<embed.fnc> as well. Here's a sample entry from |
| that table: |
| |
| Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval |
| |
| The second column is the return type, the third column the name. Columns |
| after that are the arguments. The first column is a set of flags: |
| |
| =over 3 |
| |
| =item A |
| |
| This function is a part of the public API. All such functions should also |
| have 'd'; very few do not. |
| |
| =item p |
| |
| This function has a C<Perl_> prefix; i.e. it is defined as |
| C<Perl_av_fetch>. |
| |
| =item d |
| |
| This function has documentation using the C<apidoc> feature which we'll |
| look at in a second. Some functions have 'd' but not 'A'; docs are good. |
| |
| =back |
| |
| Other available flags are: |
| |
| =over 3 |
| |
| =item s |
| |
| This is a static function and is defined as C<STATIC S_whatever>, and |
| usually called within the sources as C<whatever(...)>. |
| |
| =item n |
| |
| This does not need an interpreter context, so the definition has no |
| C<pTHX>, and it follows that callers don't use C<aTHX>. (See |
| L</Background and PERL_IMPLICIT_CONTEXT>.) |
| |
| =item r |
| |
| This function never returns; C<croak>, C<exit> and friends. |
| |
| =item f |
| |
| This function takes a variable number of arguments, C<printf> style. |
| The argument list should end with C<...>, like this: |
| |
| Afprd |void |croak |const char* pat|... |
| |
| =item M |
| |
| This function is part of the experimental development API, and may change |
| or disappear without notice. |
| |
| =item o |
| |
| This function should not have a compatibility macro to define, say, |
| C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>. |
| |
| =item x |
| |
| This function isn't exported out of the Perl core. |
| |
| =item m |
| |
| This is implemented as a macro. |
| |
| =item X |
| |
| This function is explicitly exported. |
| |
| =item E |
| |
| This function is visible to extensions included in the Perl core. |
| |
| =item b |
| |
| Binary backward compatibility; this function is a macro but also has |
| a C<Perl_> implementation (which is exported). |
| |
| =item others |
| |
| See the comments at the top of C<embed.fnc> for others. |
| |
| =back |
| |
| If you edit F<embed.pl> or F<embed.fnc>, you will need to run |
| C<make regen_headers> to force a rebuild of F<embed.h> and other |
| auto-generated files. |
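| |
| To make the column layout described above concrete, the following |
| standalone C sketch splits one entry on C<|> and tests its flags field. |
| It is not how F<embed.pl> works (that is a Perl script); the |
| C<demo_*> names are hypothetical and exist only for this illustration. |
| |
```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_FIELDS 8

/* Split one entry in place on '|' and trim blanks around each field.
 * Returns the number of fields stored in out[]. */
static int demo_split_entry(char *line, char *out[], int max)
{
    int n = 0;
    char *p = line;

    assert(line != NULL && out != NULL && max > 0);
    while (p != NULL && n < max) {
        char *bar = strchr(p, '|');
        char *end;

        if (bar != NULL)
            *bar = '\0';                 /* terminate this field */
        while (*p == ' ' || *p == '\t')
            p++;                         /* trim leading blanks */
        end = p + strlen(p);
        while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
            *--end = '\0';               /* trim trailing blanks */
        out[n++] = p;
        p = (bar != NULL) ? bar + 1 : NULL;
    }
    return n;
}

/* True if the flags field (the first column) contains the given letter. */
static int demo_has_flag(const char *flags, char flag)
{
    return strchr(flags, flag) != NULL;
}
```
| |
| For the sample entry, field 0 is the flags (C<Apd>), field 1 the return |
| type, field 2 the name, and the remaining fields are the arguments. |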
| |
| =head2 Formatted Printing of IVs, UVs, and NVs |
| |
| If you are printing IVs, UVs, or NVs, use the following macros for |
| portability instead of stdio(3)-style formatting codes like C<%d>, |
| C<%ld>, or C<%f>: |
| |
| IVdf IV in decimal |
| UVuf UV in decimal |
| UVof UV in octal |
| UVxf UV in hexadecimal |
| NVef NV %e-like |
| NVff NV %f-like |
| NVgf NV %g-like |
| |
| These will take care of 64-bit integers and long doubles. |
| For example: |
| |
| printf("IV is %"IVdf"\n", iv); |
| |
| The IVdf will expand to whatever is the correct format for the IVs. |
| |
| If you are printing addresses of pointers, use UVxf combined |
| with PTR2UV(); do not use %lx or %p. |
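| |
| The same string-concatenation idiom exists in standard C as the |
| F<inttypes.h> macros, which makes for a self-contained sketch. Here |
| C<demo_IV>, C<demo_UV>, C<DEMO_IVdf> and C<DEMO_UVxf> are hypothetical |
| stand-ins fixed at 64 bits; the real types and macros come from the |
| Perl headers and depend on how perl was configured. |
| |
```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for IV, UV, IVdf and UVxf. */
typedef int64_t  demo_IV;
typedef uint64_t demo_UV;
#define DEMO_IVdf PRId64
#define DEMO_UVxf PRIx64

/* Format an IV-like and a UV-like value into buf. Each macro expands to
 * a string literal that the compiler concatenates with its neighbours,
 * so the leading '%' has to be written by hand. */
static int demo_format(char *buf, size_t size, demo_IV iv, demo_UV uv)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "IV is %" DEMO_IVdf ", UV is 0x%" DEMO_UVxf,
                    iv, uv);
}
```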
| |
| =head2 Pointer-To-Integer and Integer-To-Pointer |
| |
| Because pointer size does not necessarily equal integer size, |
| use the following macros to do it right. |
| |
| PTR2UV(pointer) |
| PTR2IV(pointer) |
| PTR2NV(pointer) |
| INT2PTR(pointertotype, integer) |
| |
| For example: |
| |
| IV iv = ...; |
| SV *sv = INT2PTR(SV*, iv); |
| |
| and |
| |
| AV *av = ...; |
| UV uv = PTR2UV(av); |
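| |
| Outside the Perl headers, the closest standard C equivalent is a cast |
| through C<uintptr_t>. The sketch below uses hypothetical C<DEMO_PTR2UV> |
| and C<DEMO_INT2PTR> macros to show the round trip; it assumes nothing |
| about how the real macros are defined. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogues built on uintptr_t; the real PTR2UV and
 * INT2PTR are supplied by the Perl headers. */
typedef uintptr_t demo_UV;
#define DEMO_PTR2UV(p)        ((demo_UV)(uintptr_t)(p))
#define DEMO_INT2PTR(type, i) ((type)(uintptr_t)(i))

/* Store a pointer in an integer slot and recover it again. */
static int *demo_round_trip(int *ptr)
{
    demo_UV as_int = DEMO_PTR2UV(ptr);
    int *back = DEMO_INT2PTR(int *, as_int);

    assert(back == ptr);   /* the integer was wide enough to hold it */
    return back;
}
```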
| |
| =head2 Exception Handling |
| |
| There are a couple of macros to do very basic exception handling in XS |
| modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to |
| be able to use these macros: |
| |
| #define NO_XSLOCKS |
| #include "XSUB.h" |
| |
| You can use these macros if you call code that may croak, but you need |
| to do some cleanup before giving control back to Perl. For example: |
| |
| dXCPT; /* set up necessary variables */ |
| |
| XCPT_TRY_START { |
| code_that_may_croak(); |
| } XCPT_TRY_END |
| |
| XCPT_CATCH |
| { |
| /* do cleanup here */ |
| XCPT_RETHROW; |
| } |
| |
| Note that you always have to rethrow an exception that has been |
| caught. Using these macros, it is not possible to just catch the |
| exception and ignore it. If you have to ignore the exception, you |
| have to use one of the C<call_*> functions. |
| |
| The advantage of using the above macros is that you don't have |
| to set up an extra function for C<call_*>, and that using these |
| macros is faster than using C<call_*>. |
| |
| =head2 Source Documentation |
| |
| There's an effort going on to document the internal functions and |
| automatically produce reference manuals from them - L<perlapi> is one |
| such manual which details all the functions which are available to XS |
| writers. L<perlintern> is the autogenerated manual for the functions |
| which are not part of the API and are supposedly for internal use only. |
| |
| Source documentation is created by putting POD comments into the C |
| source, like this: |
| |
| /* |
| =for apidoc sv_setiv |
| |
| Copies an integer into the given SV. Does not handle 'set' magic. See |
| C<sv_setiv_mg>. |
| |
| =cut |
| */ |
| |
| Please try to supply some documentation if you add functions to the |
| Perl core. |
| |
| =head2 Backwards compatibility |
| |
| The Perl API changes over time. New functions are added or the interfaces |
| of existing functions are changed. The C<Devel::PPPort> module tries to |
| provide compatibility code for some of these changes, so XS writers don't |
| have to code it themselves when supporting multiple versions of Perl. |
| |
| C<Devel::PPPort> generates a C header file F<ppport.h> that can also |
| be run as a Perl script. To generate F<ppport.h>, run: |
| |
| perl -MDevel::PPPort -eDevel::PPPort::WriteFile |
| |
| Besides checking existing XS code, the script can also be used to retrieve |
| compatibility information for various API calls using the C<--api-info> |
| command line switch. For example: |
| |
| % perl ppport.h --api-info=sv_magicext |
| |
| For details, see C<perldoc ppport.h>. |
| |
| =head1 Unicode Support |
| |
| Perl 5.6.0 introduced Unicode support. It's important for porters and XS |
| writers to understand this support and make sure that the code they |
| write does not corrupt Unicode data. |
| |
| =head2 What B<is> Unicode, anyway? |
| |
| In the olden, less enlightened times, we all used to use ASCII. Most of |
| us did, anyway. The big problem with ASCII is that it's American. Well, |
| no, that's not actually the problem; the problem is that it's not |
| particularly useful for people who don't use the Roman alphabet. What |
| used to happen was that particular languages would stick their own |
| alphabet in the upper range of the sequence, between 128 and 255. Of |
| course, we then ended up with plenty of variants that weren't quite |
| ASCII, and the whole point of it being a standard was lost. |
| |
| Worse still, if you've got a language like Chinese or |
| Japanese that has hundreds or thousands of characters, then you really |
| can't fit them into a mere 256, so they had to forget about ASCII |
| altogether, and build their own systems using pairs of numbers to refer |
| to one character. |
| |
| To fix this, some people formed Unicode, Inc. and |
| produced a new character set containing all the characters you can |
| possibly think of and more. There are several ways of representing these |
| characters, and the one Perl uses is called UTF-8. UTF-8 uses |
| a variable number of bytes to represent a character. You can learn more |
| about Unicode and Perl's Unicode model in L<perlunicode>. |
| |
| =head2 How can I recognise a UTF-8 string? |
| |
| You can't. This is because UTF-8 data is stored in bytes just like |
| non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types), |
| capital E with a grave accent, is represented by the two bytes |
| C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)> |
| has that byte sequence as well. So you can't tell just by looking - this |
| is what makes Unicode input an interesting problem. |
| |
| In general, you either have to know what you're dealing with, or you |
| have to guess. The API function C<is_utf8_string> can help; it'll tell |
| you if a string contains only valid UTF-8 characters. However, it can't |
| do the work for you. On a character-by-character basis, C<is_utf8_char> |
| will tell you whether the current character in a string is valid UTF-8. |
| |
| =head2 How does UTF-8 represent Unicode characters? |
| |
| As mentioned above, UTF-8 uses a variable number of bytes to store a |
| character. Characters with values 0...127 are stored in one byte, just |
| like good ol' ASCII. Character 128 is stored as C<v194.128>; this |
| continues up to character 191, which is C<v194.191>. Now we've run out of |
| bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And |
| so it goes on, moving to three bytes at character 2048. |
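| |
| The byte values quoted above (128, 191, 192 and the switch to three |
| bytes at 2048) can be reproduced with a small standalone encoder. This |
| is a sketch rather than perl's own code: C<demo_encode_utf8> is a |
| hypothetical helper that handles only code points below 0x10000 and |
| makes no attempt to reject surrogates. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Encode one code point below 0x10000 into out[] (at least 3 bytes)
 * and return the number of bytes written. */
static size_t demo_encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL && cp < 0x10000UL);
    if (cp < 0x80) {                    /* 0..127: one byte, as in ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 128..2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 2048..65535: three bytes */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```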
| |
| Assuming you know you're dealing with a UTF-8 string, you can find out |
| how long the first character in it is with the C<UTF8SKIP> macro: |
| |
| char *utf = "\305\233\340\240\201"; |
| I32 len; |
| |
| len = UTF8SKIP(utf); /* len is 2 here */ |
| utf += len; |
| len = UTF8SKIP(utf); /* len is 3 here */ |
| |
| Another way to skip over characters in a UTF-8 string is to use |
| C<utf8_hop>, which takes a string and a number of characters to skip |
| over. You're on your own about bounds checking, though, so don't use it |
| lightly. |
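| |
| Since bounds checking is left to the caller, one possible approach is |
| to compare each character's length against the bytes remaining before |
| stepping over it. The standalone sketch below shows that idea with |
| hypothetical helpers; C<demo_utf8_skip> plays the role of C<UTF8SKIP> |
| for forms of up to four bytes and does not validate the bytes that follow. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the character introduced by lead byte b, or 0 if
 * b cannot start a character (for example a continuation byte). */
static size_t demo_utf8_skip(unsigned char b)
{
    if (b < 0x80)           return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Count the characters in [s, end) without ever stepping past end.
 * Returns -1 if a character is truncated or has a bad lead byte. */
static long demo_utf8_count(const unsigned char *s, const unsigned char *end)
{
    long n = 0;

    assert(s != NULL && end != NULL && s <= end);
    while (s < end) {
        size_t len = demo_utf8_skip(*s);

        if (len == 0 || (size_t)(end - s) < len)
            return -1;
        s += len;
        n++;
    }
    return n;
}
```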
| |
| All bytes in a multi-byte UTF-8 character will have the high bit set, |
| so you can test if you need to do something special with this |
| character like this (the UTF8_IS_INVARIANT() is a macro that tests |
| whether the byte can be encoded as a single byte even in UTF-8): |
| |
| U8 *utf; |
| U8 *utf_end; /* 1 beyond buffer pointed to by utf */ |
| UV uv; /* Note: a UV, not a U8, not a char */ |
| STRLEN len; /* length of character in bytes */ |
| |
| if (!UTF8_IS_INVARIANT(*utf)) |
| /* Must treat this as UTF-8 */ |
| uv = utf8_to_uvchr_buf(utf, utf_end, &len); |
| else |
| /* OK to treat this character as a byte */ |
| uv = *utf; |
| |
| You can also see in that example that we use C<utf8_to_uvchr_buf> to get the |
| value of the character; the inverse function C<uvchr_to_utf8> is available |
| for putting a UV into UTF-8: |
| |
| if (!UTF8_IS_INVARIANT(uv)) |
| /* Must treat this as UTF8 */ |
| utf8 = uvchr_to_utf8(utf8, uv); |
| else |
| /* OK to treat this character as a byte */ |
| *utf8++ = uv; |
| |
| You B<must> convert characters to UVs using the above functions if |
| you're ever in a situation where you have to match UTF-8 and non-UTF-8 |
| characters. You may not skip over UTF-8 characters in this case. If you |
| do this, you'll lose the ability to match hi-bit non-UTF-8 characters; |
| for instance, if your UTF-8 string contains C<v195.136>, and you skip |
| that character, you can never match a C<chr(200)> in a non-UTF-8 string. |
| So don't do that! |
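| |
| The reason for decoding first can be shown without the Perl API. In |
| this sketch, C<demo_decode_utf8> is a hypothetical, deliberately |
| limited decoder (one- and two-byte forms only), and C<demo_same_chars> |
| compares a UTF-8 string with a one-byte-per-character string by code |
| point, so that a C<chr(200)> on one side can match its two-byte |
| encoding on the other. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Decode one character of at most two bytes (code points 0..2047).
 * Stores the byte length in *len and returns the code point. Longer
 * or malformed sequences are outside the scope of this sketch. */
static unsigned long demo_decode_utf8(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) {
        *len = 1;
        return s[0];
    }
    assert((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80);
    *len = 2;
    return ((unsigned long)(s[0] & 0x1F) << 6) | (unsigned long)(s[1] & 0x3F);
}

/* True if the UTF-8 string u and the byte string b hold the same
 * sequence of code points. */
static int demo_same_chars(const unsigned char *u, size_t ulen,
                           const unsigned char *b, size_t blen)
{
    size_t ui = 0, bi = 0;

    while (ui < ulen && bi < blen) {
        size_t len;
        unsigned long cp;

        if (u[ui] >= 0x80 && ui + 1 >= ulen)
            return 0;                   /* truncated two-byte character */
        cp = demo_decode_utf8(u + ui, &len);
        if (cp != b[bi])
            return 0;
        ui += len;
        bi++;
    }
    return ui == ulen && bi == blen;
}
```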
| |
| =head2 How does Perl store UTF-8 strings? |
| |
| Currently, Perl deals with Unicode strings and non-Unicode strings |
| slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the |
| string is internally encoded as UTF-8. Without it, the byte value is the |
| codepoint number and vice versa (in other words, the string is encoded |
| as iso-8859-1, but C<use feature 'unicode_strings'> is needed to get iso-8859-1 |
| semantics). You can check and manipulate this flag with the |
| following macros: |
| |
| SvUTF8(sv) |
| SvUTF8_on(sv) |
| SvUTF8_off(sv) |
| |
| This flag has an important effect on Perl's treatment of the string: if |
| Unicode data is not properly distinguished, regular expressions, |
| C<length>, C<substr> and other string handling operations will have |
| undesirable results. |
| |
| The problem comes when you have, for instance, a string that isn't |
| flagged as UTF-8, and contains a byte sequence that could be UTF-8 - |
| especially when combining non-UTF-8 and UTF-8 strings. |
| |
| Never forget that the C<SVf_UTF8> flag is separate from the PV value; you |
| need to be sure you don't accidentally knock it off while you're |
| manipulating SVs. More specifically, you cannot expect to do this: |
| |
| SV *sv; |
| SV *nsv; |
| STRLEN len; |
| char *p; |
| |
| p = SvPV(sv, len); |
| frobnicate(p); |
| nsv = newSVpvn(p, len); |
| |
| The C<char*> string does not tell you the whole story, and you can't |
| copy or reconstruct an SV just by copying the string value. Check if the |
| old SV has the UTF8 flag set, and act accordingly: |
| |
| p = SvPV(sv, len); |
| frobnicate(p); |
| nsv = newSVpvn(p, len); |
| if (SvUTF8(sv)) |
| SvUTF8_on(nsv); |
| |
| In fact, your C<frobnicate> function should be made aware of whether or |
| not it's dealing with UTF-8 data, so that it can handle the string |
| appropriately. |
| |
| Since passing an SV to an XS function and copying only its data |
| does not copy the UTF8 flag, passing a bare C<char *> to an XS |
| function is even less adequate: the flag never reaches the function at all. |
| |
| =head2 How do I convert a string to UTF-8? |
| |
| If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade |
| one of the strings to UTF-8. If you've got an SV, the easiest way to do |
| this is: |
| |
| sv_utf8_upgrade(sv); |
| |
| However, you must not do this, for example: |
| |
| if (!SvUTF8(left)) |
| sv_utf8_upgrade(left); |
| |
| If you do this in a binary operator, you will actually change one of the |
| strings that came into the operator, and, while it shouldn't be noticeable |
| by the end user, it can cause problems in deficient code. |
| |
| Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its |
| string argument. This is useful for having the data available for |
| comparisons and so on, without harming the original SV. There's also |
| C<utf8_to_bytes> to go the other way, but naturally, this will fail if |
| the string contains any characters above 255 that can't be represented |
| in a single byte. |
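| |
| The idea of upgrading into a copy, rather than in place, looks roughly |
| like this in standalone C. C<demo_bytes_to_utf8> is a hypothetical |
| helper written for this illustration; it does not reproduce the |
| signature or memory management of the real C<bytes_to_utf8>. |
| |
```c
#include <assert.h>
#include <stdlib.h>

/* Return a newly allocated UTF-8 copy of len one-byte characters and
 * store its byte length in *newlen. The input is left untouched and
 * the caller frees the result. */
static unsigned char *demo_bytes_to_utf8(const unsigned char *s, size_t len,
                                         size_t *newlen)
{
    /* Each input byte needs at most two output bytes, plus a final NUL. */
    unsigned char *out = malloc(2 * len + 1);
    size_t i, n = 0;

    assert(out != NULL && newlen != NULL);
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            out[n++] = s[i];            /* invariant: same byte either way */
        } else {
            out[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    out[n] = '\0';
    *newlen = n;
    return out;
}
```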
| |
| =head2 Is there anything else I need to know? |
| |
| Not really. Just remember these things: |
| |
| =over 3 |
| |
| =item * |
| |
| There's no way to tell if a string is UTF-8 or not. You can tell if an SV |
| is UTF-8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if |
| something should be UTF-8. Treat the flag as part of the PV, even though |
| it's not - if you pass on the PV to somewhere, pass on the flag too. |
| |
| =item * |
| |
| If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value, |
| unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>. |
| |
| =item * |
| |
| When writing a character C<uv> to a UTF-8 string, B<always> use |
| C<uvchr_to_utf8>, unless C<UTF8_IS_INVARIANT(uv)> in which case |
| you can use C<*s = uv>. |
| |
| =item * |
| |
| Mixing UTF-8 and non-UTF-8 strings is tricky. Use C<bytes_to_utf8> to get |
| a new string which is UTF-8 encoded, and then combine them. |
| |
| =back |
| |
| =head1 Custom Operators |
| |
| Custom operator support is a new experimental feature that allows you to |
| define your own ops. This is primarily to allow the building of |
| interpreters for other languages in the Perl core, but it also allows |
| optimizations through the creation of "macro-ops" (ops which perform the |
| functions of multiple ops which are usually executed together, such as |
| C<gvsv, gvsv, add>.) |
| |
| This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl |
| core does not "know" anything special about this op type, and so it will |
| not be involved in any optimizations. This also means that you can |
| define your custom ops to be any op structure - unary, binary, list and |
| so on - you like. |
| |
| It's important to know what custom operators won't do for you. They |
| won't let you add new syntax to Perl, directly. They won't even let you |
| add new keywords, directly. In fact, they won't change the way Perl |
| compiles a program at all. You have to do those changes yourself, after |
| Perl has compiled the program. You do this either by manipulating the op |
| tree using a C<CHECK> block and the C<B::Generate> module, or by adding |
| a custom peephole optimizer with the C<optimize> module. |
| |
| When you do this, you replace ordinary Perl ops with custom ops by |
| creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own |
| PP function. This should be defined in XS code, and should look like |
| the PP ops in C<pp_*.c>. You are responsible for ensuring that your op |
| takes the appropriate number of values from the stack, and you are |
| responsible for adding stack marks if necessary. |
| |
| You should also "register" your op with the Perl interpreter so that it |
| can produce sensible error and warning messages. Since it is possible to |
| have multiple custom ops within the one "logical" op type C<OP_CUSTOM>, |
| Perl uses the value of C<< o->op_ppaddr >> to determine which custom op |
| it is dealing with. You should create an C<XOP> structure for each |
| ppaddr you use, set the properties of the custom op with |
| C<XopENTRY_set>, and register the structure against the ppaddr using |
| C<Perl_custom_op_register>. A trivial example might look like: |
| |
| static XOP my_xop; |
| static OP *my_pp(pTHX); |
| |
| BOOT: |
| XopENTRY_set(&my_xop, xop_name, "myxop"); |
| XopENTRY_set(&my_xop, xop_desc, "Useless custom op"); |
| Perl_custom_op_register(aTHX_ my_pp, &my_xop); |
| |
| The available fields in the structure are: |
| |
| =over 4 |
| |
| =item xop_name |
| |
| A short name for your op. This will be included in some error messages, |
| and will also be returned as C<< $op->name >> by the L<B|B> module, so |
| it will appear in the output of modules like L<B::Concise|B::Concise>. |
| |
| =item xop_desc |
| |
| A short description of the function of the op. |
| |
| =item xop_class |
| |
| Which of the various C<*OP> structures this op uses. This should be one of |
| the C<OA_*> constants from F<op.h>, namely |
| |
| =over 4 |
| |
| =item OA_BASEOP |
| |
| =item OA_UNOP |
| |
| =item OA_BINOP |
| |
| =item OA_LOGOP |
| |
| =item OA_LISTOP |
| |
| =item OA_PMOP |
| |
| =item OA_SVOP |
| |
| =item OA_PADOP |
| |
| =item OA_PVOP_OR_SVOP |
| |
| This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because |
| the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead. |
| |
| =item OA_LOOP |
| |
| =item OA_COP |
| |
| =back |
| |
| The other C<OA_*> constants should not be used. |
| |
| =item xop_peep |
| |
| This member is of type C<Perl_cpeep_t>, which expands to C<void |
| (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)>. If it is set, this function |
| will be called from C<Perl_rpeep> when ops of this type are encountered |
| by the peephole optimizer. I<o> is the OP that needs optimizing; |
| I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>. |
| |
| =back |
| |
| C<B::Generate> directly supports the creation of custom ops by name. |
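| |
| The registration scheme described in this section, one logical op type |
| whose members are told apart by their function pointer, can be pictured |
| with a small table keyed by ppaddr. The sketch below is standalone C |
| with hypothetical C<demo_*> types and a fixed-size array. It says |
| nothing about how perl stores the registrations internally, and unlike |
| a real PP function, C<demo_pp_noop> receives its op as an argument. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature types; the real OP and XOP structures are
 * declared in the Perl headers. */
typedef struct demo_op demo_op;
typedef demo_op *(*demo_ppaddr_t)(demo_op *);

struct demo_op {
    demo_ppaddr_t op_ppaddr;    /* function that implements the op */
    demo_op *op_next;           /* next op in execution order */
};

typedef struct demo_xop {
    const char *xop_name;
    const char *xop_desc;
} demo_xop;

#define DEMO_MAX_CUSTOM 16
static struct { demo_ppaddr_t ppaddr; const demo_xop *xop; }
    demo_registry[DEMO_MAX_CUSTOM];
static size_t demo_registered;

/* Associate a descriptor with the function that implements the op. */
static void demo_custom_op_register(demo_ppaddr_t ppaddr, const demo_xop *xop)
{
    assert(demo_registered < DEMO_MAX_CUSTOM);
    demo_registry[demo_registered].ppaddr = ppaddr;
    demo_registry[demo_registered].xop = xop;
    demo_registered++;
}

/* Find the descriptor for an op by its function pointer, or NULL. */
static const demo_xop *demo_custom_op_lookup(const demo_op *o)
{
    size_t i;

    for (i = 0; i < demo_registered; i++)
        if (demo_registry[i].ppaddr == o->op_ppaddr)
            return demo_registry[i].xop;
    return NULL;
}

/* A do-nothing op body: just hand back the next op to run. */
static demo_op *demo_pp_noop(demo_op *o)
{
    return o->op_next;
}
```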
| |
| =head1 AUTHORS |
| |
| Until May 1997, this document was maintained by Jeff Okamoto |
| E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl |
| itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>. |
| |
| With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, |
| Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil |
| Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, |
| Stephen McCamant, and Gurusamy Sarathy. |
| |
| =head1 SEE ALSO |
| |
| L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed> |