Planet eZ publish




mugo web

› eZ Find: How to return specific fields of indexed data

When working with eZ Find fetches, you may want to return only a specific sub-set of data for each of the search results, rather than the whole content object.

You can do that by using the eZ Find 'search' fetch's 'fields_to_return' parameter.

11/01/2017 11:30 pm (UTC)   Mugo Web

derick rethans

› Good Bye PHP 5

A few days ago I merged a patch into Xdebug that removes support for PHP 5 in Xdebug's master branch on GitHub. Maintaining PHP 5 and PHP 7 support in one code base is not particularly easy, and even more complicated for something like Xdebug, with its deep interactions with PHP's internals.

As PHP 5.6's active support ended on December 31st, I also felt it was no longer necessary to support PHP 5 in Xdebug. Removing it saves more than 5000 lines of code.

Many people were quite positive about that, while others were less keen.

Removing PHP 5 support from Xdebug's master branch does not mean that Xdebug suddenly stops working for PHP 5 installations. Xdebug 2.5, which was recently released, supports PHP 5.5 and 5.6, and is not going to go away.

Right now, Xdebug will no longer receive new features in the branch that also supports PHP 5. New features will only go into master (to become Xdebug 2.6). However, Xdebug 2.5 continues to receive bug fixes until Xdebug 2.6 comes out.

Once Xdebug 2.6 comes out, the Xdebug 2.5 branch will no longer get bug fixes, and hence support for PHP 5 goes away. That still does not mean that you can no longer use Xdebug with PHP 5. The releases of the 2.5 branch will still be available.

On the positive side, not having to implement lots of code twice also means that new features can be added faster, as less work is required. Xdebug 2.6 already has some new features lined up.

11/01/2017 10:44 am (UTC)   Derick Rethans

netgen

› December @ Netgen

The end of the year was marked by our annual Netgen Winter Meetup and a partnership between Web Summer Camp and Agent Conference. This is what we have been up to in December.

10/01/2017 2:19 pm (UTC)   http://www.netgenlabs.com/Blog

ez publish community gateway

› 2.x refactoring work, round one: Deciding on UI Technology

With 2017 here, and 1.7LTS out the door, we have started to work on the largest change planned this year, where we’ll need lots of involvement from the ecosystem: a 2.x refactoring project, starting with making UI technology choices.

09/01/2017 6:45 pm (UTC)   http://share.ez.no

ez publish community gateway

› The Week in Review: winners adapt fast, a new Studio tutorial and more

eZ Systems’s CEO and Co-founder Aleksander Farstad looks ahead to 2017 and reflects on the importance of fast adaptation. We also present a new Studio tutorial and lots of Symfony news this week.

06/01/2017 6:54 pm (UTC)   http://share.ez.no

netgen

› Agent Conference and Web Summer Camp – two conferences you should not miss in 2017!

Want to start the year skiing and improving your frontend know-how and end your summer with sun, sea, and web-related workshops? Here is how you can accomplish both!

20/12/2016 2:19 pm (UTC)   http://www.netgenlabs.com/Blog

mugo web

› What to consider when migrating WordPress content to another WordPress site

Migrating content from one WordPress site into a new site is typically relatively straightforward, but merging two existing sites together - especially sites with different content types, categories, and functionality - can actually be a labour-intensive process that requires planning, testing, and attention to detail.

19/12/2016 9:30 pm (UTC)   Mugo Web

ez publish community gateway

› The Week in Review: eZ Platform 1.7.0 release, name changes, PHP 7 benchmarks and more

An exciting week as eZ Platform and eZ Platform Enterprise Edition are released, important name changes in our product portfolio, a PHP 7 benchmark and more.

16/12/2016 7:37 pm (UTC)   http://share.ez.no

derick rethans

› Natural Language Sorting with MongoDB 3.4

Arranging English words in order is simple, most of the time. You simply arrange them in alphabetical order. Sorting a set of German words, or French words with all of their accents, or Chinese with their different characters is a lot harder than it looks. Sorting rules are specified through locales, which determine how accents are sorted, which order the characters are in, and how to do case-insensitive sorting. There is a good set of those sorting rules available through CLDR, and there is a neat example to play with all kinds of sorting at ICU's demo site. If you want to know how the algorithms work, have a look at the Unicode Consortium's report on the Unicode Collation Algorithm.

Years ago I wrote about collation and MongoDB. There is an old issue in MongoDB's JIRA tracker, SERVER-1920, to implement collation so that sorting and indexing could work depending on the different sorting orders as described for each language (locale).

Support for these collations has finally landed in MongoDB 3.4, and in this article we are going to have a look at how they work.

How Unicode Collation Works

Many computer languages have their own implementation of the Unicode Collation Algorithm, often implemented through ICU. PHP has an ICU-based implementation as part of the intl extension, in the form of the Collator class.

The Collator class encapsulates the Unicode Collation Algorithm to allow you to sort an array of text yourself. It also allows you to visualise the "sort key" to see how the algorithm works.

Take for example the following array of words:

$dictionary = [
    'boffey', 'bøhm', 'brown',
];

Which we can turn into sort keys, and sort using the en locale (English):

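$collator = new Collator( 'en' );
$dictionaryWithKey = [];
foreach ( $dictionary as $word )
{
    // The sort key is binary, so hex-encode it to use it as a readable array key
    $sortKey = $collator->getSortKey( $word );
    $dictionaryWithKey[ bin2hex( $sortKey ) ] = $word;
}

// Sorting by the hex-encoded sort keys puts the words in the locale's order
ksort( $dictionaryWithKey );
print_r( $dictionaryWithKey );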

Which outputs:

Array
(
    [2b4533333159010a010a] => boffey
    [2b453741014496060109] => bøhm
    [2b4b45554301090109] => brown
)

If we did this with the nb (Norwegian) locale instead, the output would have brown and bøhm reversed:

Array
(
    [2b4533333159010a010a] => boffey
    [2b4b45554301090109] => brown
    [2b5c6703374101080108] => bøhm
)

The sort key for bøhm has now changed, so that its numerical value makes it sort after brown instead of before it. In Norwegian, the ø is a distinct letter that sorts after z.
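If you only need the sorted result and not the sort keys, the Collator class can sort an array directly. The following is a minimal sketch that assumes the intl extension is loaded; the helper name sortWords() is made up for this illustration and is not part of any library.

```php
<?php
// Hypothetical helper: returns a copy of $words sorted by the rules of $locale.
function sortWords( array $words, string $locale ): array
{
    $collator = new Collator( $locale );
    // Collator::sort() sorts the array in place according to the locale
    $collator->sort( $words );
    return array_values( $words );
}

$dictionary = [ 'brown', 'bøhm', 'boffey' ];

echo implode( ' ', sortWords( $dictionary, 'en' ) ), "\n";
echo implode( ' ', sortWords( $dictionary, 'nb' ) ), "\n";
```

With the ICU data used in this article, the first line should follow the English order (bøhm before brown) and the second the Norwegian order (brown before bøhm).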

MongoDB 3.4

Before the release of MongoDB 3.4, it was not possible to do a locale-based search. As case-insensitivity is just another property of a locale, that was not supported either. Many users worked around this by storing a lower case version of the value in a separate field just to do a case-insensitive search. But this has now changed with the implementation of SERVER-1920.
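For comparison, the old workaround could look roughly like the sketch below. The field name word_lower and the helper withLowercaseCopy() are assumptions made for this illustration, and the code relies on the mbstring extension.

```php
<?php
// Hypothetical helper: adds a lower-cased copy of "word" next to the original value.
function withLowercaseCopy( array $document ): array
{
    $document['word_lower'] = mb_strtolower( $document['word'], 'UTF-8' );
    return $document;
}

// The document as it would be stored
$document = withLowercaseCopy( [ 'word' => 'Beer' ] );

// A case-insensitive lookup lower-cases the search term and queries the copy
$filter = [ 'word_lower' => mb_strtolower( 'BEER', 'UTF-8' ) ];

print_r( $document );
print_r( $filter );
```

The extra field has to be kept in sync on every write, which is the overhead that collations remove.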

In MongoDB 3.4 you may attach a default locale to a collection:

db.createCollection( 'dictionary', { collation: { locale: 'nb' } } );

The default locale is used for any query that does not specify a different locale. Compare the default (nb) locale:

> db.dictionary.find().sort( { word: 1 } );
{ "_id" : ObjectId("5846d65210d52027a50725f0"), "word" : "boffey" }
{ "_id" : ObjectId("5846d65210d52027a50725f1"), "word" : "brown" }
{ "_id" : ObjectId("5846d65210d52027a50725f2"), "word" : "bøhm" }

With the English (en) locale:

> db.dictionary.find().collation( { locale: 'en'} ).sort( { word: 1 } );
{ "_id" : ObjectId("5846d65210d52027a50725f0"), "word" : "boffey" }
{ "_id" : ObjectId("5846d65210d52027a50725f2"), "word" : "bøhm" }
{ "_id" : ObjectId("5846d65210d52027a50725f1"), "word" : "brown" }

The default locale of a collection is also inherited by an index when you create one:

db.dictionary.createIndex( { word: 1 } );

db.dictionary.getIndexes();
[
    …
    {
        "v" : 2,
        "key" : { "word" : 1 },
        "name" : "word_1",
        "ns" : "demo.dictionary",
        "collation" : {
            "locale" : "nb",
            "caseLevel" : false,
            "caseFirst" : "off",
            "strength" : 3,
            "numericOrdering" : false,
            "alternate" : "non-ignorable",
            "maxVariable" : "punct",
            "normalization" : false,
            "backwards" : false,
            "version" : "57.1"
        }
    }
]


From PHP

All the examples below are using the PHP driver for MongoDB (1.2.0) and the accompanying library (1.1.0). These are the minimum versions to work with locales.

To use the MongoDB PHP Library, you need to use Composer to install it, and include the Composer-generated autoloader to make the library available to the script. In short, that is:

composer require mongodb/mongodb=^1.1.0

And at the start of your script:

require 'vendor/autoload.php';

In this first example, we are going to drop the collection dictionary from the demo database, and create a collection with the default collation en. We also create an index on the word field and insert a couple of words.

First the set-up and assigning of the database handle ($demo):

$client = new \MongoDB\Client();
$demo = $client->demo;

Then we drop the dictionary collection:

$demo->dropCollection( 'dictionary' );

We create a new collection dictionary and set the default collation for this collection to the en locale:

$demo->createCollection(
    'dictionary',
    [
        'collation' => [ 'locale' => 'en' ],
    ]
);
$dictionary = $demo->dictionary;

We create the index, and we also give the index the name dictionary_en. MongoDB supports multiple indexes with the same field pattern, as long as they have a different name and have different collations (e.g. locale, or locale options):

$dictionary->createIndex(
    [ 'word' => 1 ],
    [ 'name' => 'dictionary_en' ]
);

And then we insert some words:

$dictionary->insertMany( [
    [ 'word' => 'beer' ],
    [ 'word' => 'Beer' ],
    [ 'word' => 'côte' ],
    [ 'word' => 'coté' ],
    [ 'word' => 'høme' ],
    [ 'word' => 'id_12' ],
    [ 'word' => 'id_4' ],
    [ 'word' => 'Home' ],
] );

When doing a query, you can specify the locale for that operation. Only one locale can be used for a single operation, which means that MongoDB uses the same locale for the find and the sort parts of a query. We do intend to provide more granular support for using collations on different parts of an operation. This is tracked in SERVER-25954.

Only the Base Character

There are many variants of locales. The strength option defines the number of levels that are used to perform a comparison of characters. At strength=1, only base characters are compared. This means that with the en locale: beer == Beer, coté == côte, and Home == høme.

You can specify the strength while doing each query. First we use the en locale and strength 1. This is equivalent to a case insensitive match:

showResults(
    "Match on base character only",
    $dictionary->find(
        [ 'word' => 'beer' ],
        [ 'collation' => [ 'locale' => 'en', 'strength' => 1 ] ]
    )
);

Which outputs:

Match on base character only:
beer Beer
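The showResults() helper used in these examples is not defined in this excerpt. A minimal sketch that produces the output format shown here could look as follows; the body is an assumption, not necessarily the original implementation. It accepts anything that can be iterated with foreach, such as the cursor returned by find() or a plain array.

```php
<?php
// Hypothetical helper: prints a title, then the "word" field of every result on one line.
function showResults( string $title, $results )
{
    $words = [];
    foreach ( $results as $result )
    {
        // Array access works for plain arrays and for the library's default document objects
        $words[] = $result['word'];
    }
    echo $title, ":\n", implode( ' ', $words ), "\n\n";
}

// Stand-alone usage with plain arrays instead of a MongoDB cursor
showResults( 'Example', [ [ 'word' => 'beer' ], [ 'word' => 'Beer' ] ] );
```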

Strength 1 also ignores accents on characters, such as in:

showResults(
    "Match on base character only, ignoring accents",
    $dictionary->find(
        [ 'word' => 'home' ],
        [ 'collation' => [ 'locale' => 'en', 'strength' => 1 ] ]
    )
);

Which outputs:

Match on base character only, ignoring accents:
høme Home

The strength option, like the other options we will see later, changes the sort key for a string. Because of this, MongoDB will only use an index if it was created with exactly the same locale options as the query.

Because we only have an index on word with the default en locale, all other examples do not make use of an index while matching or sorting. If you want to make an indexed lookup for the en/strength=1 example, you need to create an index with:

$dictionary->createIndex(
    [ 'word' => 1 ],
    [
        'name' => 'word_en_strength1',
        'collation' => [
            'locale' => 'en',
            'strength' => 1
        ],
    ]
);

Sorting Accents

Strength 2 takes into account accents on letters while matching and sorting. If we run the match on home in the English locale with strength 2, we get:

showResults(
    "Match on base character with accents",
    $dictionary->find(
        [ 'word' => 'home' ],
        [ 'collation' => [ 'locale' => 'en', 'strength' => 2 ] ]
    )
);

Which outputs:

Match on base character with accents:
Home

The word høme is no longer included. However, the case of characters is still not considered:

showResults(
    "Match on base character with accents (and not case sensitive)",
    $dictionary->find(
        [ 'word' => 'beer' ],
        [ 'collation' => [ 'locale' => 'en', 'strength' => 2 ] ]
    )
);

Which outputs:

Match on base character with accents (and not case sensitive):
beer Beer
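The strength levels can also be explored without a database, through PHP's Collator class. This is a sketch that assumes the intl extension is loaded and that its ICU data behaves like the version used by MongoDB in this article; equalAtStrength() is a made-up helper name. Note that PHP names the levels (PRIMARY, SECONDARY, TERTIARY) instead of numbering them 1 to 3.

```php
<?php
// Hypothetical helper: do $a and $b collate as equal in the en locale at this strength?
function equalAtStrength( string $a, string $b, int $strength ): bool
{
    $collator = new Collator( 'en' );
    $collator->setStrength( $strength );
    // Collator::compare() returns 0 when both strings are considered equal
    return $collator->compare( $a, $b ) === 0;
}

var_dump( equalAtStrength( 'home', 'høme', Collator::PRIMARY ) );   // strength 1: base characters only
var_dump( equalAtStrength( 'home', 'høme', Collator::SECONDARY ) ); // strength 2: accents count
var_dump( equalAtStrength( 'beer', 'Beer', Collator::SECONDARY ) ); // strength 2: case is still ignored
var_dump( equalAtStrength( 'beer', 'Beer', Collator::TERTIARY ) );  // strength 3: case counts
```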

Again, more fun can be had while sorting with accents, because languages do things differently. If we take the words côte and coté, we see a difference in sorting between the fr (French) and fr_CA (Canadian French) locales:

showResults(
    "Sorting accents in French (France)",
    $dictionary->find(
        [ 'word' => new \MongoDB\BSON\Regex( '^c' ) ],
        [
            'collation' => [ 'locale' => 'fr', 'strength' => 2 ],
            'sort' => [ 'word' => 1 ],
        ]
    )
);

showResults(
    "Sorting accents in Canadian French",
    $dictionary->find(
        [ 'word' => new \MongoDB\BSON\Regex( '^c' ) ],
        [
            'collation' => [ 'locale' => 'fr_CA', 'strength' => 2 ],
            'sort' => [ 'word' => 1 ],
        ]
    )
);

Which outputs:

Sorting accents in French (France):
coté côte

Sorting accents in Canadian French:
côte coté

In Canadian French, the accents sort from back to front. This is called Backward Secondary Sorting, and is an option you can set on any locale-based query. Some language locales have different default values for options. To make Canadian French sort the "wrong" way, we can explicitly set the backwards option to false:

showResults(
    "Sorting accents in Canadian French, the 'wrong' way",
    $dictionary->find(
        [ 'word' => new \MongoDB\BSON\Regex( '^c' ) ],
        [
            'collation' => [ 'locale' => 'fr_CA', 'strength' => 2, 'backwards' => false ],
            'sort' => [ 'word' => 1 ],
        ]
    )
);

Which outputs:

Sorting accents in Canadian French, the 'wrong' way:
coté côte
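PHP's Collator class exposes the same switch as the Collator::FRENCH_COLLATION attribute, so the effect can be reproduced locally. This is a sketch under the assumption that the intl extension is loaded with ICU data comparable to the version used in this article; sortAccents() is a made-up helper name.

```php
<?php
// Hypothetical helper: sorts $words for $locale, optionally overriding backward secondary sorting.
function sortAccents( array $words, string $locale, ?bool $backwards = null ): array
{
    $collator = new Collator( $locale );
    if ( $backwards !== null )
    {
        // FRENCH_COLLATION is the intl name for backward secondary sorting
        $collator->setAttribute(
            Collator::FRENCH_COLLATION,
            $backwards ? Collator::ON : Collator::OFF
        );
    }
    $collator->sort( $words );
    return array_values( $words );
}

$words = [ 'côte', 'coté' ];

echo implode( ' ', sortAccents( $words, 'fr' ) ), "\n";           // locale default
echo implode( ' ', sortAccents( $words, 'fr_CA' ) ), "\n";        // locale default
echo implode( ' ', sortAccents( $words, 'fr_CA', false ) ), "\n"; // backwards switched off
```

Passing null leaves the locale's own default in place, which is what makes fr and fr_CA differ.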

16/12/2016 12:51 pm (UTC)   Derick Rethans