 =encoding utf8

 =for comment
 Consistent formatting of this file is achieved with:
   perl ./Porting/podtidy pod/perlhacktut.pod

 =head1 NAME

 perlhacktut - Walk through the creation of a simple C code patch

 =head1 DESCRIPTION

 This document takes you through a simple patch example.

 If you haven't read L<perlhack> yet, go do that first! You might also
 want to read through L<perlsource> too.

 Once you're done here, check out L<perlhacktips> next.

 =head1 EXAMPLE OF A SIMPLE PATCH

 Let's take a simple patch from start to finish.

 Here's something Larry suggested: if a C<U> is the first active format
 during a C<pack> (for example, C<pack "U3C8", @stuff>), then the
 resulting string should be treated as UTF-8 encoded.

 If you are working with a git clone of the Perl repository, you will
 want to create a branch for your changes. This will make creating a
 proper patch much simpler. See L<perlgit> for details on how to do
 this.

 =head2 Writing the patch

 How do we prepare to fix this up? First we locate the code in question
 - the C<pack> happens at runtime, so it's going to be in one of the
 F<pp> files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going
 to be altering this file, let's copy it to F<pp.c~>.

 [Well, it was in F<pp.c> when this tutorial was written. It has now
 been split off with C<pp_unpack> to its own file, F<pp_pack.c>.]

 Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
 loop over the pattern, taking each format character in turn into
 C<datumtype>. Then for each possible format character, we swallow up
 the other arguments in the pattern (a field width, an asterisk, and so
 on) and convert the next chunk of input into the specified format,
 adding it onto the output SV C<cat>.
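
 To make the shape of that loop concrete, here is a minimal standalone
 sketch in plain C. It is not the real C<pp_pack>: the names
 C<toy_item> and C<toy_scan> are hypothetical, it only recognises a
 decimal count or an asterisk after each format character, and it
 records what it finds instead of packing any data.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record for one parsed format item; not a Perl type. */
struct toy_item {
    char type;   /* format character, e.g. 'U' or 'C' */
    int  count;  /* repeat count: 1 if absent, -1 for '*' */
};

/* Scan pat into items; returns how many were found (at most max). */
static size_t toy_scan(const char *pat, struct toy_item *items, size_t max)
{
    size_t n = 0;

    assert(pat != NULL && items != NULL);
    while (*pat != '\0' && n < max) {
        char datumtype = *pat++;          /* take the format character */

        if (isspace((unsigned char)datumtype))
            continue;                     /* whitespace is not a format */

        items[n].type = datumtype;
        items[n].count = 1;
        if (*pat == '*') {                /* "as many as there are" */
            items[n].count = -1;
            pat++;
        } else if (isdigit((unsigned char)*pat)) {
            int len = 0;                  /* swallow the field width */
            while (isdigit((unsigned char)*pat))
                len = len * 10 + (*pat++ - '0');
            items[n].count = len;
        }
        n++;
    }
    return n;
}
```

 Scanning C<"U3C8"> with this sketch yields two items, C<U> with a
 count of 3 and C<C> with a count of 8.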

 How do we know if the C<U> is the first format in the C<pat>? Well, if
 we have a pointer to the start of C<pat> then, if we see a C<U> we can
 test whether we're still at the start of the string. So, here's where
 C<pat> is set up:

     STRLEN fromlen;
     register char *pat = SvPVx(*++MARK, fromlen);
     register char *patend = pat + fromlen;
     register I32 len;
     I32 datumtype;
     SV *fromstr;

 We'll have another string pointer in there:

     STRLEN fromlen;
     register char *pat = SvPVx(*++MARK, fromlen);
     register char *patend = pat + fromlen;
  +  char *patcopy;
     register I32 len;
     I32 datumtype;
     SV *fromstr;

 And just before we start the loop, we'll set C<patcopy> to be the start
 of C<pat>:

     items = SP - MARK;
     MARK++;
     sv_setpvn(cat, "", 0);
  +  patcopy = pat;
     while (pat < patend) {

 Now if we see a C<U> which was at the start of the string, we turn on
 the C<UTF8> flag for the output SV, C<cat>:

  +  if (datumtype == 'U' && pat==patcopy+1)
  +      SvUTF8_on(cat);
     if (datumtype == '#') {
         while (pat < patend && *pat != '\n')
             pat++;

 Remember that it has to be C<patcopy+1> because the first character of
 the string is the C<U> which has been swallowed into C<datumtype>!
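
 The off-by-one is easy to check outside of Perl. The following is a
 minimal sketch under the assumption that the pattern is a plain byte
 string with a known length; C<toy_starts_with_U> is a hypothetical
 helper, not part of the Perl source, and it deliberately has no
 whitespace handling yet.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if 'U' is the very first character of the pattern. */
static int toy_starts_with_U(const char *pat, size_t fromlen)
{
    const char *patend = pat + fromlen;
    const char *patcopy = pat;    /* remembers where the pattern began */

    assert(pat != NULL);
    while (pat < patend) {
        char datumtype = *pat++;  /* pat is now one past the char read */

        /* Only the first character read leaves pat at patcopy + 1. */
        if (datumtype == 'U' && pat == patcopy + 1)
            return 1;
    }
    return 0;
}
```

 With this version C<"U3C8"> is detected and C<"C0U*"> is not, but a
 pattern with leading spaces such as C<"  U*"> is also rejected.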

 Oops, we forgot one thing: what if there are spaces at the start of the
 pattern? C<pack("  U*", @stuff)> will have C<U> as the first active
 character, even though it's not the first thing in the pattern. In this
 case, we have to advance C<patcopy> along with C<pat> when we see
 spaces:

     if (isSPACE(datumtype))
         continue;

 needs to become

     if (isSPACE(datumtype)) {
         patcopy++;
         continue;
     }
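
 Here is the same idea as a standalone sketch with the whitespace fix
 applied. As before, this is plain C written for illustration only:
 C<toy_first_active_is_U> is a hypothetical name, and C<isspace> from
 the C library stands in for Perl's C<isSPACE> macro.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return 1 if 'U' is the first active (non-space) format character. */
static int toy_first_active_is_U(const char *pat, size_t fromlen)
{
    const char *patend = pat + fromlen;
    const char *patcopy = pat;
    int utf8 = 0;                 /* stands in for SvUTF8_on(cat) */

    assert(pat != NULL);
    while (pat < patend) {
        char datumtype = *pat++;

        if (isspace((unsigned char)datumtype)) {
            patcopy++;            /* keep patcopy level with pat */
            continue;
        }
        if (datumtype == 'U' && pat == patcopy + 1)
            utf8 = 1;
    }
    return utf8;
}
```

 Now C<"  U*"> and C<"U3C8"> are both detected, while C<"C0U*"> and
 C<"C U"> are not, because in those patterns C<pat> has already moved
 past a non-space character by the time the C<U> is read.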

 OK. That's the C part done. Now we must do two additional things before
 this patch is ready to go: we've changed the behaviour of Perl, and so
 we must document that change. We must also provide some more regression
 tests to make sure our patch works and doesn't create a bug somewhere
 else along the line.

 =head2 Testing the patch

 The regression tests for each operator live in F<t/op/>, and so we make
 a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our tests
 to the end. First, we'll test that the C<U> does indeed create Unicode
 strings.

 F<t/op/pack.t> has a sensible C<ok()> function, but if it didn't we
 could use the one from F<t/test.pl>:

  require './test.pl';
  plan( tests => 159 );

 So instead of this:

  print 'not ' unless "1.20.300.4000" eq sprintf "%vd",
                                                pack("U*",1,20,300,4000);
  print "ok $test\n"; $test++;

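 we can write the more sensible version below (see L<Test::More> for a
 full explanation of C<is()> and other testing functions). Note the
 parentheses around the C<sprintf> arguments: without them, C<sprintf>
 would swallow the test name as an extra argument.

  is( "1.20.300.4000", sprintf("%vd", pack("U*",1,20,300,4000)),
                                        "U* produces Unicode" );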

 Now we'll test that we got that space-at-the-beginning business right:

  is( "1.20.300.4000", sprintf("%vd", pack("  U*",1,20,300,4000)),
                                      "  with spaces at the beginning" );

 And finally we'll test that we don't make Unicode strings if C<U> is
 B<not> the first active format:

  isnt( "1.20.300.4000", sprintf("%vd", pack("C0U*",1,20,300,4000)),
                                        "U* not first isn't Unicode" );

 We mustn't forget to update the number of tests declared at the top of
 the file, or else the automated tester will get confused. The
 declaration will look either like this:

  print "1..156\n";

 or this:

  plan( tests => 156 );

 We now compile Perl, and run it through the test suite. Our new
 tests pass, hooray!

 =head2 Documenting the patch

 Finally, the documentation. The job is never done until the paperwork
 is over, so let's describe the change we've just made. The relevant
 place is F<pod/perlfunc.pod>; again, we make a copy, and then we'll
 insert this text in the description of C<pack>:

  =item *

  If the pattern begins with a C<U>, the resulting string will be treated
  as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
  with an initial C<U0>, and the bytes that follow will be interpreted as
  Unicode characters. If you don't want this to happen, you can begin
  your pattern with C<C0> (or anything else) to force Perl not to UTF-8
  encode your string, and then follow this with a C<U*> somewhere in your
  pattern.
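
 To make "UTF-8 encoded" a little more concrete, here is a minimal
 sketch of how a single code point becomes bytes. It is an illustration
 only: C<toy_utf8_encode> is a hypothetical helper rather than a Perl
 API, it handles only code points below 0x10000, and it does not reject
 surrogates. We are also assuming that, when the string is not flagged
 as Unicode, these bytes are what a C<%vd> format would show for the
 values 300 and 4000 used in our tests.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one code point below 0x10000 as UTF-8; returns the byte count. */
static size_t toy_utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL && cp < 0x10000UL);

    if (cp < 0x80) {                      /* one byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

 Under this sketch, 300 encodes to the two bytes 196 and 172, and 4000
 encodes to the three bytes 224, 190 and 160, while 1 and 20 stay as
 single bytes.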

 =head2 Submit

 See L<perlhack> for details on how to submit this patch.

 =head1 AUTHOR

 This document was originally written by Nathan Torkington, and is
 maintained by the perl5-porters mailing list.