clue / utf8-react
Streaming UTF-8 parser, built on top of ReactPHP.
Requires
- php: >=5.3
- react/stream: ^1.0 || ^0.7 || ^0.6 || ^0.5 || ^0.4 || ^0.3
Requires (Dev)
- phpunit/phpunit: ^9.6 || ^5.7 || ^4.8.36
- react/stream: ^1.0 || ^0.7
Support us
We invest a lot of time developing, maintaining and updating our awesome open-source projects. You can help us sustain the high quality of our work by becoming a sponsor on GitHub. Sponsors get numerous benefits in return; see our sponsoring page for details.
Let's take these projects to the next level together! 🚀
Usage
Sequencer
The Sequencer class can be used to make sure you only get back complete, valid UTF-8 byte sequences when reading from a stream. It wraps a given ReadableStreamInterface and exposes its data through the same interface.
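<?php

require __DIR__ . '/vendor/autoload.php';

use Clue\React\Utf8\Sequencer;
use React\Stream\ReadableResourceStream;

// Wrap STDIN in a readable stream and pass it through the sequencer.
$stdin = new ReadableResourceStream(STDIN);
$stream = new Sequencer($stdin);

// Each emitted chunk contains only complete, valid UTF-8 byte sequences.
$stream->on('data', function ($chunk) {
    var_dump($chunk);
});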
React's streams emit chunks of data strings and make no assumptions about their encoding. These chunks do not necessarily represent complete UTF-8 byte sequences, as a sequence may be broken up into multiple chunks. This class reassembles these sequences by buffering incomplete ones.
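To illustrate the buffering idea, here is a minimal plain-PHP sketch. It is not the library's implementation: `splitIncompleteTail()` is a hypothetical helper introduced only for this example, and it assumes otherwise well-formed input (it does not validate or replace invalid bytes). It holds back a trailing, incomplete multi-byte sequence so that it can be prepended to the next chunk.

```php
<?php

// Hypothetical helper, not part of clue/utf8-react: splits a byte buffer into
// the part ending on a complete UTF-8 sequence and an incomplete trailing part.
function splitIncompleteTail($buffer)
{
    $len = strlen($buffer);

    // An incomplete sequence can only begin within the last 3 bytes.
    for ($i = 1; $i <= 3 && $i <= $len; ++$i) {
        $code = ord($buffer[$len - $i]);

        if (($code & 0xC0) === 0x80) {
            continue; // continuation byte (10xxxxxx), keep looking for its lead byte
        }

        if ($code < 0x80) {
            break; // plain ASCII byte, nothing is pending
        }

        // Lead byte: 110xxxxx => 2 bytes, 1110xxxx => 3 bytes, otherwise 4 bytes.
        $expect = ($code >= 0xF0) ? 4 : (($code >= 0xE0) ? 3 : 2);

        if ($i < $expect) {
            // Fewer bytes available than the lead byte announces: hold them back.
            return array(substr($buffer, 0, $len - $i), substr($buffer, $len - $i));
        }

        break; // the last sequence is complete
    }

    return array($buffer, '');
}

// The two bytes of "ä" (0xC3 0xA4) arrive in separate chunks.
$pending = '';
foreach (array("h\xC3", "\xA4llo") as $chunk) {
    list($complete, $pending) = splitIncompleteTail($pending . $chunk);
    var_dump($complete);
}
```

In this sketch, the first chunk yields only "h" while the lone lead byte is held back, and the second chunk yields "ällo" once the sequence is complete. The Sequencer class takes care of this for you, so a helper like this is not needed when using the library.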
Also, if you're merely consuming a stream and are not in control of producing and ensuring valid UTF-8 data, it may well include invalid UTF-8 byte sequences. This class replaces any invalid bytes in the sequence with a ? character. This replacement character can be given as a second parameter to the constructor:
$stream = new Sequencer($stdin, 'X');
As such, you can be sure you never get an invalid UTF-8 byte sequence out of the resulting stream.
Note that the stream may still contain ASCII control characters or ANSI / VT100 control byte sequences, as they're valid UTF-8. This binary data will be left as-is, unless you filter this at a later stage.
Install
The recommended way to install this library is through Composer. New to Composer?
This project follows SemVer. This will install the latest supported version:
composer require clue/utf8-react:^1.3
See also the CHANGELOG for details about version upgrades.
This project aims to run on any platform and thus does not require any PHP extensions. It supports running on legacy PHP 5.3 through current PHP 8+ and HHVM. It's highly recommended to use the latest supported PHP version for this project.
Tests
To run the test suite, you first need to clone this repo and then install all dependencies through Composer:
composer install
To run the test suite, go to the project root and run:
vendor/bin/phpunit
License
This project is released under the permissive MIT license.
Did you know that I offer custom development services and issue invoices for sponsorships of releases and for contributions? Contact me (@clue) for details.
More
- If you want to learn more about processing streams of data, refer to the documentation of the underlying react/stream component.
- If you want to process ASCII control characters or ANSI / VT100 control byte sequences, you may want to use clue/reactphp-term on the raw input stream before passing the resulting stream to the UTF-8 sequencer.
- If you want to display or inspect the byte sequences, you may want to use clue/hexdump on the emitted byte sequences.