Is a COBIT 5 Foundation certification worthwhile for Linux people?

In the server segment Linux is an established player today. Nevertheless, there may still be many projects – particularly in German SMEs and/or public administration – in which the introduction of Linux systems is tackled for the first time (e.g. for reasons of cost reduction). In some cases even the desktop may be affected (see, for example, the (still) ongoing LiMux project of the City of Munich).

In the context of Linux introductions a lot of persuasion has to be done in a first step before the corresponding strategic decisions are taken. Classical fields of IT governance are affected, and coordination with a responsible CIO will very probably end up on the agenda. At that point the world of corporate IT governance directly touches internal and external staff as well as consultants.

Last week I completed the COBIT 5 Foundation certification. COBIT 5 is a framework for the governance and management of enterprise IT. As I already hold a spectrum of personal certificates in the ITSM, ISM, risk management and ITIL areas, I asked myself whether the effort was worth it at all. After self-study of the book "Praxiswissen COBIT" and attending a corresponding course, I now think that it can be interesting especially for consultants in the Linux environment to deal with COBIT. Here are a few reasons:

  • 1) IT governance and Linux
    Introducing Linux and/or substantially expanding its use in a company is always also a top-management issue and thus a governance topic. For consultants (including technical consultants) and committed employees it is therefore well worth acquiring a basic understanding of governance topics – and of IT governance in particular. In the case of a technical (leadership) person, this may be useful if only to better understand the motives and guiding principles in the thinking of management – especially of a CIO. A 1.5-day Foundation course is, in my view, well suited for this.
    Especially technology-enthusiastic advocates of Linux, who sometimes cannot or do not want to free themselves from "ideological" motives, may find that the credo of the COBIT goals cascade – namely the alignment of IT with the support of enterprise goals – helps them provide decision makers with the "right" arguments. A relevant question is: Can a Linux-based solution support the enterprise goals better, more sustainably and possibly at lower cost than other solutions? Why? To what extent? What do the corresponding business cases look like?
    "Open" alone is not enough at the management level … this may be a triviality, but COBIT rightly and explicitly reminds Linux adherents, too, that IT is no end in itself. We have to show where, when and why exactly Linux-based solutions let the company reach its enterprise goals more effectively and efficiently than other approaches. And we have to re-examine this self-critically again and again.
  • 2) Projects / programmes as part of IT governance
    Anyone who knows ISO 20000 or ISO 21000 is aware that the conceptual worlds of these standards exclude projects and their processes. From the point of view of the general applicability of standards this is understandable – but still not really close to practice and reality. Starting from requirements management, new services usually emerge through projects or, for large strategic undertakings, through programmes. This is of course also and especially true in the Linux environment. Therefore a framework is needed which, in addition to the ITSM standards and ITSM/ISM best practices, also refers to best practices in the area of project management and integrates corresponding processes to support governance. This is the case with COBIT 5.

  • 3) Architecture from different perspectives
    In my view Linux naturally has a lot to do with building consistent, yet flexible and extensible IT system and software architectures. Here, too, the following applies: a governance framework should therefore incorporate architecture aspects. This is basically the case with COBIT 5, after an alignment with TOGAF has taken place.
  • 4) Policies and guidelines
    I hardly know any Linux project – and certainly no Linux introduction project – that did not require support from management. This calls for policies and guidelines. Anyone who has not yet dealt with this topic in the context of other management systems will get a good introduction from a COBIT Foundation course.
  • 5) Broad, comprehensive alignment with other frameworks
    From my current point of view COBIT 5 is a very broadly designed framework which has deliberately been aligned with other important frameworks and standards such as ITIL V3, ISO 20000, ISO 21000, ISO 31000, ISO 38500, COSO and TOGAF. Since COBIT 4 a great deal of diligent work has been done here, which puts the whole governance approach on a considerably broader and more comprehensive foundation than some other frameworks or standards I know. In my assessment this ultimately has more advantages than disadvantages. At the very least COBIT 5 makes clear at which points IT governance can and should draw on other management systems or best practices. As indicated above, this also offers a starting point for meshing the process worlds of projects/programmes and of IT service management in a sensible way – even if, to my mind, COBIT does not spell this out sufficiently. But that is a special aspect which interests me particularly because of my affinity for software projects.

Of course a Foundation course can at best convey basics on the points mentioned and give hints on how the corresponding processes can be integrated into a governance-driven process model. In my view that is already interesting enough – even if the subdivision of processes into so-called "practices" takes some getting used to for ITIL practitioners. Moreover, I find that, from a management perspective, COBIT provides just as good a motivation for engaging with further or specialized IT-related management frameworks and standards as ITIL does from a quite different perspective.

Compared with other certification exams I would personally rate the effort for the international, personal Foundation certification as really manageable. The book I chose, "Praxiswissen COBIT", is certainly well suited as a reference and for self-study – even if, after reading it, you cannot necessarily judge what is relevant for a Foundation exam. In future editions the author Markus Gaulke should perhaps give corresponding reading hints at the beginning of the book and not only on p. 363. The relevant chapters can easily be worked through over a long weekend. On this basis I found attending a roughly 1.5-day course in preparation for the exam very helpful, though not strictly necessary. The course I attended at mITSM was refreshingly lively and also straightened out one or two convictions formed too hastily on the basis of other frameworks or standards. A course furthermore helps to better gauge the often somewhat convoluted questions, the style, and the hooks and snags of the exams – especially since, after taking the exam, some questions appear to me to have been translated too literally from English.

One more warning: IT governance is not a topic for technology geeks – it is rather about steering processes to achieve enterprise or organizational goals. Of course the organization of technical tasks is touched upon as well. But technological questions are not a primary topic of a COBIT Foundation course – although some steering elements can be illustrated well with concrete examples.

But nobody has decreed that technology-minded employees and managers may not look beyond their own horizons. Given the manifold, heterogeneous and open range of potential Linux-based solutions for the tasks of a modern enterprise, stringent IT governance is certainly an essential factor through which the enormous dynamism of open source development can be put to better use for companies and aligned with prioritized goals in the first place. For me, leadership, agility and the principles of creative self-organization are not contradictions at all but complementary factors – especially in the open source environment. COBIT can help corporate leadership as well as IT management to set effective guard rails for the fruitful unfolding of agility and dynamism.

Character sets and Ajax, PHP, JSON – decode/encode your strings properly!

Ajax and PHP programs run in a more or less complex environment. Very often you want to transfer data from a browser client via Ajax to a PHP server and, after some manipulation, save them in a MariaDB or MySQL database. As you use Ajax, you expect an asynchronous response sent from the PHP server back to the client. This answer can have a complicated structure and may contain a combination of data from different sources – e.g. from the database or from your PHP programs.

If and when all components and interfaces (web pages, Ajax programs, the web server, files, PHP programs, PHP/MySQL interfaces, MySQL …) are set up for UTF-8 character encoding, you will probably not experience any problems regarding the transfer of POST data to a PHP server by Ajax and further on into a MySQL database via a suitable PHP/MySQL interface. The same is true for the Ajax response. In this article I shall assume that the Ajax response is expected as a JSON object, which we prepare by using the function json_encode() on the PHP side.

Due to provider restrictions or customer requirements you may not always find such an ideal “utf-8 only” situation where you can control all components. Instead, you may be forced to combine your PHP classes and methods with programs others have developed. E.g., your classes may be included into programs of others. And what you have to or should do with the Ajax data may depend on settings others have already performed in classes which are beyond your control. A simple example where a lack of communication may lead to trouble is the following:

You may find situations where the data transfer from the server-side PHP programs into a MySQL database is pre-configured by a (foreign) class controlling the PHP/MySQL interface for the western character set iso-8859-1 instead of utf-8. Related settings of the MySQL system (SET NAMES) affect the PHP mysql, mysqli and pdo_mysql interfaces for the controlling program. In such situations the following statement holds:

If your own classes and methods do not provide data encoded with the expected character set at your PHP/MySQL interface, you may get garbage inside the database. This may in particular lead to classical "Umlaut"-problems for German, French and other languages.

So, as a PHP developer you have to be prepared to decode the POST or GET data strings of an Ajax request properly before transferring such string data to the database! However, what one sometimes forgets is the following:

You have to encode all data contributing to your Ajax response – which you may deliver in a JSON format to your browser – properly, too. And this encoding may depend on the respective data source or its interface to PHP.

And even worse: for one Ajax request the response data may be fetched from multiple sources – each encoded for a different charset. If you want to use the JSON format for the response data, you will probably use the json_encode() function. But this function may react allergically to a combination of strings encoded in different charsets! So, a proper and suitable encoding of string data from different sources should be performed before starting the json_encode() process in your PHP program! This requires complete knowledge of and control over the encoding of data from all sources that contribute strings to an Ajax response!

Otherwise, you may never get any (reasonable) result data back to your Javascript function handling the Ajax response data. This happened to me lately, when I deployed classes which worked perfectly in a UTF-8 environment on a French LAMP system where the PHP/MySQL interfaces were set up for a latin-1 character set (corresponding to iso-8859-1). Due to proper decoding on the server side the Ajax data went correctly into the database – however, the expected complex response data comprising database data and data from files and programs were generated incorrectly or not at all.

As I found it somewhat difficult to analyze what happened, I provide below a short overview of some important steps for such Ajax situations.

Setting a character set for the PHP/MySQL interface(s)

The character set for the PHP/MySQL connection is set from the PHP side by issuing a SQL command. For the old mysql interface, e.g., this may look like:

$sql_unames = "SET NAMES 'latin1'";
mysql_query($sql_unames, $this->db);

Note that this setting for the PHP/MySQL interfaces has nothing to do with the MySQL character set settings for the database, a specific table or a table column! The SET NAMES statement actually prepares the database for the character set of incoming and outgoing data streams. The transformation of string data to (or from) the character set defined for your database/tables/columns is additionally and internally done inside the MySQL RDBMS.
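As a side note: with the newer mysqli interface the same setting can be made via mysqli::set_charset(), which also informs the client library itself about the charset (relevant e.g. for escaping). A minimal sketch, with placeholder connection parameters:

// Minimal sketch (mysqli) - host, user, password and database name are placeholders.
$mysqli = new mysqli('db_host', 'db_user', 'db_password', 'my_database');

// Preferable to a raw "SET NAMES" query, because the mysqli driver itself
// is informed about the connection charset as well.
$mysqli->set_charset('latin1');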

With such a PHP/MySQL setting you may arrive at situations like the one displayed in the following drawing:

[Image: ajax_encoding]

In the case sketched above I expect the result data to come back from the server in a JSON format.

Looking at the transfer processes, one of the first questions is: How does or should the Ajax transfer of POST data to the server work with respect to character sets?

Transfer POST data of Ajax-requests encoded with UTF-8

Normally, when you transfer data for a web form to a server you have to choose between the GET or the POST mechanism. This, of course, is also true for Ajax controlled data transfers. Before starting an Ajax request you have to set up the Ajax environment and objects in your Javascript programs accordingly. But potentially there are more things to configure. Via e.g. jQuery you may define an option "contentType" for the character encoding of the transfer data, the "type" of the HTTP request used to send the data to the server, and the "dataType" for the structural format of the response data:

$.ajaxSetup( { .....
    contentType : 'application/x-www-form-urlencoded; charset=UTF-8',
    type : 'POST',
    dataType : 'json',
...});

With the first option you could at least in principle change the charset for the encoding to iso-8859-1. However, I normally refrain from doing so, because it is not compliant with W3C-requirements. The jQuery/Ajax documentation says:

" The W3C XMLHttpRequest specification dictates that the charset is always UTF-8; specifying another charset will not force the browser to change the encoding."
(See: http://api.jquery.com/jquery.ajax/).

Therefore, I use the standard and send POST data in Ajax-requests utf-8 encoded. In our scenario this setting would lead to dramatic consequences on the PHP/MySQL side if you did not properly decode the sent data on the server before saving them into the database.

In case you have used the "SET NAMES" SQL command to activate a latin-1 encoded database connection, you must apply the function utf8_decode() to the utf-8 encoded strings in the $_POST array before saving these strings in some database table fields!

In case you want to deploy Ajax and PHP codes in an international environment where “SET NAMES” may vary from server to server it is wise to analyze your PHP/MySQL interface settings before deciding whether and how to decode. Therefore, the PHP/MySQL interface settings should be available information for your PHP methods dealing with Ajax data.
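A minimal sketch of such a check, sticking to the old mysql interface used above (function names and the simplified decision logic are my own assumptions):

// Sketch: decide whether incoming UTF-8 POST data must be decoded,
// depending on the charset the PHP/MySQL interface currently uses.
function get_connection_charset($db_link) {
    $res = mysql_query("SHOW VARIABLES LIKE 'character_set_client'", $db_link);
    $row = mysql_fetch_row($res);
    return strtolower($row[1]);          // e.g. 'latin1' or 'utf8'
}

function prepare_post_string_for_db($str, $db_link) {
    // Ajax POST data arrive UTF-8 encoded (W3C standard)
    if (get_connection_charset($db_link) == 'latin1') {
        return utf8_decode($str);        // convert to iso-8859-1 for the interface
    }
    return $str;                         // interface already expects utf-8
}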

Note that the function utf8_decode() decodes to the iso-8859-1 charset only. In some cases this may not be sufficient (think of the €-sign!). Then the more general function iconv() is your friend on the PHP side.
See: http://de1.php.net/manual/de/function.iconv.php.
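A small example of what such an iconv() based transformation could look like (iso-8859-15 is chosen here because, unlike iso-8859-1, it contains the €-sign):

// Sketch: convert a UTF-8 string to iso-8859-15 and back.
// '//TRANSLIT' lets iconv() approximate characters that have no
// representation in the target charset instead of failing.
$utf8_str     = "100 € für München";
$latin9_str   = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $utf8_str);
$back_to_utf8 = iconv('ISO-8859-15', 'UTF-8', $latin9_str);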

Now, you may think we have gained what we wanted for the “Ajax to database” transfer. Not quite:

The strings you eventually want to save in the database may be composed of substrings coming from different sources – not only from the $_POST array after an Ajax request. So, you need to know where the strings you compose come from and in which charset they are encoded. A very simple source is the program itself – but the program files (and/or includes) may use a different charset than the $_POST data! So, the individual strings may require different de- or encoding treatment! For that purpose the general "Multibyte String Functions" of PHP may be of help for testing or creating specific encodings. See e.g.: http://php.net/manual/de/function.mb-detect-encoding.php
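A rough sketch of such a test ($some_str is just a placeholder for a string of unknown origin; mb_detect_encoding() only returns a plausible guess, so treat the result with some care):

// Sketch: check/guess the charset of a string before composing it with others.
$is_utf8 = mb_check_encoding($_POST['muc'], 'UTF-8');

$charset = mb_detect_encoding($some_str, array('UTF-8', 'ISO-8859-1', 'ISO-8859-15'), true);
if ($charset !== false && $charset !== 'UTF-8') {
    $some_str = iconv($charset, 'UTF-8', $some_str);   // normalize to utf-8
}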

Do not forget to encode Ajax response data properly!

An Ajax request is answered asynchronously. I often use the JSON format for the response from the server to the browser. It is easy to handle and well suited for Javascript. On the PHP side the json_encode() function helps to create the required JSON object from the components of an array. However, the strings combined into a JSON-conformant Ajax response object may come from different sources. In my scenario I had to combine data defined

  • in data files,
  • in PHP class definition files,
  • in a MySQL database.

All of these sources may provide the data with a different character encoding. In the simplest case, think about a combination (inclusion) of PHP files which some other developers have encoded in UTF-8 whereas your own files are encoded in iso-8859-1. This may e.g. be due to different default settings in the Eclipse environments the programmers use.

Or let’s take another more realistic example fitting our scenario above:
Assume you have to work with some strings which contain a German "umlaut" such as "ü", "ö", "ä" (or the special character "ß"). E.g., in your $_POST array you may have received (via Ajax) the string "München" in W3C compliant UTF-8 format. Now, due to the database requirements discussed above you convert the "München" string in $_POST['muc'] with

$str_x = utf8_decode($_POST['muc']);

to iso-8859-1 before saving it into the database. Then the correct characters would appear in your database table (a fact which you could check by phpMyAdmin).

However, in some other part of your UTF-8 encoded PHP(5) program file (or in included files) you (or some other contributing programmer) may have defined a string variable $str_y that shall eventually also contribute to a JSON formatted Ajax response:

$str_y = "München";

Sooner or later, you prepare your Ajax response – maybe by something like :

$ay_ajax_response['x'] = $str_x;
$ay_ajax_response['y'] = $str_y;
$ajax_response = json_encode($ay_ajax_response);
echo $ajax_response;

(Of course I oversimplify; you would not use global data but much more sophisticated things … ). In such a situation you may never see your expected response values correctly. Depending on your concrete setup of the Ajax connection in your client Javascript/jQuery program you may not even get anything back on the client side. Why? Because the PHP function json_encode() will return "false"! The reason:

json_encode() expects all input strings to be utf-8 encoded !

But this is not the case for your decoded $str_x in our example! Now, think of string data coming from the database in our scenario:

For the same reason, weird things would also happen if you just retrieved some data from the database without thinking about the encoding of the PHP/MySQL interface. If you had used "SET NAMES" to set the PHP/MySQL interface to latin-1, then retrieved some string data from the database and injected them directly – i.e. without a transformation to utf-8 by utf8_encode() – into your Ajax response, you would run into the same trouble as described in the example above. Therefore:

Before using json_encode() make sure that all strings in your input array – from whichever source they may come – are properly encoded in UTF-8! Watch out for specific settings for the database connection which may have been set by database handling objects. If your original strings coming from the database are encoded in iso-8859-1 you can use the PHP function utf8_encode() to get proper UTF-8 strings!
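A minimal sketch of such a normalization step right before json_encode() – assuming, as in the scenario of this article, that all non-UTF-8 strings are iso-8859-1 encoded:

// Sketch: ensure every string in the response array is UTF-8 encoded
// before json_encode() is called.
array_walk_recursive($ay_ajax_response, function (&$val) {
    if (is_string($val) && !mb_check_encoding($val, 'UTF-8')) {
        $val = utf8_encode($val);     // iso-8859-1 -> utf-8 (assumption!)
    }
});

$ajax_response = json_encode($ay_ajax_response);
if ($ajax_response === false) {
    // still something wrong - log/debug instead of sending garbage to the client
}
echo $ajax_response;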

Some rules

The scenario and examples discussed above illustrate several important points for working with multiple sources that may use different charsets. I try to summarize these points as rules:

  • All program files should be written using the same character set encoding. (This rule seems natural but is not always guaranteed if the results of different developer groups have to be combined)
  • You should write your program statements such that you do not rely on some assumed charsets. Investigate the strings you deal with – e.g. with the PHP multibyte string functions “mb_….()” and test them for their (probable) charset.
  • When you actively use “SET NAMES” from your PHP code you should always make this information (i.e. the character code choice) available to the Ajax handling methods of your PHP objects dealing with the Ajax interface. This information is e.g. required to transform the POST input string data of Ajax requests into the right charset expected by your PHP/MySQL-interface.
  • In case you compose strings from different sources, align the character encoding over all sources. Relevant sources with different charsets may e.g. be: data files, databases, POST/GET data, …
  • In case you have used "SET NAMES" to select some specific character set for your MySQL database connection, do not forget to decode properly before saving into the database and to encode data fetched from the database properly into utf-8 if these data shall be part of the Ajax response. Relevant functions for utf-8/iso-8859-1 transformations are utf8_encode(), utf8_decode() and, for more general cases, iconv().
  • If you use strings in your program that are encoded in some charset other than utf-8, but which shall contribute to your JSON formatted Ajax response, encode all these strings in utf-8 before you apply json_encode()!
  • Always check the return value of json_encode() and react properly by something like
    if (json_encode($…) === false) {
        // ... error handling code ...
    }
  • Last but not least: When designing your classes and methods for the Ajax handling on the PHP side, always think about some internal debugging features, because due to restrictions and missing features you may not be able to fully debug variables on the server. You may need extra information in your Ajax response and you may need switches to change from an Ajax controlled situation to a standard synchronous client/server situation where you can directly see echo/print_r outputs from the server. Take into account that in some situations you may never get the Ajax response to the client …

I hope these rules help people who have to work with jQuery/Ajax, PHP and MySQL and are confronted with more than one character set.

PHP/OO/Schemata: Composite SLAVE objects and (X)HTML generator methods

Some time ago I wrote an article about how to treat groups of associated, related properties of a Real World Object [“RWO”] when representing it by an object [“PWO”] in PHP.

We assumed that the property description of a certain object class is done with the help of a so-called "SCHEMA" (a file with a list of definitions and rules, or a Schema program using definitions saved in a database Schema table). A SCHEMA is specific to a PWO class. A SCHEMA defines object relations, object properties, the relations between certain object properties, the relation of object properties to database fields and of course the type of each property, the property's own characteristics as well as associated constraints. Such "SCHEMATA" would play a central role in a web application as they encapsulate vital structural and detail information about objects and their properties.

Every object constructor would refer to an appropriate SCHEMA, as would web generators for the creation of e.g. form or web template elements. Actually, any reasonable (web) application would of course work with multiple SCHEMATA – one for each object class. SCHEMATA give us a flexible way to change or adapt a class's properties and relations. They also close the gap between the OO definitions and a relational database.

To handle groups of associated properties in such an environment I suggested the following:

  • Split the SCHEMA information for properties and related database fields into several "SLAVE Schemata" which would be built as sub-objects within the MAIN SCHEMA object. Each SLAVE SCHEMA would describe a certain group of closely associated object properties.
  • Create and use "SLAVE PWO SGL objects" as encapsulated sub-objects in a MAIN SGL PWO. Each SLAVE PWO object gets its own properties defined in a related SLAVE SCHEMA.
  • Each of the SLAVE PWO objects receives its knowledge about its individual SLAVE SCHEMA by an injection process during construction.

See:
PHP/OO/Schemata: Decouple property groups by combined composite patterns

Each PWO object representing a single RWO we shall call an SGL PWO. It comprises a series of sub-objects: SLAVE SGL PWOs that refer to corresponding SLAVE Schemata of a MAIN Schema. See the illustration from the article named above:

[Image: Slave_schemata]

In the named article I had discussed that one can iterate methods for complete database interactions over the MAIN and the SLAVE objects. The same is true for interactions with POST or SESSION data in a CMS-like web application. So, there is no need to rewrite core method code originally written for objects which comprise all object properties in just one property/field SCHEMA without any SLAVE objects – at least with respect to field checks and database interactions.

This works if and when an SGL object is derived from a base class which comprises all required methods to deal with the database, POST and SESSION data. Each MAIN or SLAVE SGL PWO knows itself how to store/fetch its property data into/from database tables (or $_POST or $_SESSION arrays) and uses the same methods inherited from common SGL base classes (= parent classes) to do so.

I have meanwhile introduced and realized this SLAVE PWO and SLAVE SCHEMA approach in a quite general form in my web application framework. In this article I briefly want to discuss some unexpected consequences for HTML
generator methods in a web or CMS context.

(X)HTML-Generator methods – where to place them ?

When you design CMS-like web applications you always have to deal with Template [TPL] structures and objects that fill the templates with reasonably formatted contents. The contents may be just text or images or links inserted into template placeholders. In more complicated cases (like e.g. the generation of maintenance masks with forms), however, you may have to generate complete HTML fragments which depend on property/field information given in your respective SCHEMA.

A basic design question is: Where do we place the generator methods? Should the SGL PWOs know how to generate their property contents or should rather a “Template Control Object” – a “TCO” – know what to do? I have always preferred the second approach. Reason:

TPL aspects may become very specific, so the methods may need to know something about the TPL structures – and this is nothing that I want to incorporate into the OO representation [PWO] of real world objects.

Over time I have developed a bunch of generator methods for all kinds of funny things required in customer projects. The methods are defined in base classes for Template Control Objects and/or in special-purpose subclasses injected into TCOs. A TCO knows about its type of TPL and works accordingly. (By the way: with PEAR ITX or Smarty you can realize a complete separation of (X)HTML code and PHP code).

(X)HTML-Generator methods – which MAIN or SLAVE PWO and which MAIN/SLAVE SCHEMA are we dealing with ?

In addition to some property/field identifiers an (X)HTML generator method of course has to know which SGL PWO and which SCHEMA it has to work with. This information can either be fetched by the TCO according to some rules or be injected directly into the methods.

In the past I wanted to keep interfaces lean. In many applications the SGL PWO object was a classical singleton. So, it could relatively easily be retrieved or identified by the central TCO. I did not see any reason to clutter TCO method interfaces with object references that were already known to their TCO object. So, my generator methods referred to and used the SGL PWO object and its SCHEMA via the "$this" operator:

function someTCOgenerator_method() {
    // ...
    // do something with $this->SGL and $this->Schema
    // ...
}

However, in the light of a more general approach this appears to be a too simplistic idea.

If we regard a SLAVE SGL object as a relatively compact entity inside a PWO – as a SLAVE object with its own property and field information SCHEMA – then we see: for a generator method it behaves almost like an independent object different from the MAIN PWO. This situation is comparable to one where the generator method really would be requested to operate on instances of a completely different PWO class:

An HTML generator method needs to know the qualities of certain OO properties and associated database field definitions. In our approach with MAIN SGL PWOs comprising composite SLAVE SGL PWO objects, each SGL object knows exactly about its associated MAIN or SLAVE Schema object. To work correctly the generator method must get access to this specific SCHEMA. In a reasonable application design this would also be valid for PWO objects representing other, i.e. different types of RWOs.

A (X)HTML generator method can work properly as soon as it knows about

  • the SGL object,
  • the object property (identified by some name, index or other unique information) to operate on and generate HTML code for,
  • the SCHEMA describing the qualities of the object property and related database fields.

In our approach this is also the case for our SLAVE SGL objects, or for any PWO, as soon as we inject it into our generator method.

Therefore, (X)HTML generator methods of TCOs should be programmed according to the following rules:

  • Do not assume that there is only one defined class of PWO SGL objects that the Template Control Object TCO needs to know about and needs to apply its generator methods to.
  • Instead, enable the (X)HTML generator methods of a TCO to work with any kind of PWO and its properties – as long as the PWO provides the appropriate SCHEMA information.
  • Inject the SGL [SLAVE] PWO and thereby also its associated [SLAVE] SCHEMA into each TCO generator method:
    function someTCOgenerator_method($SGL_ext, …..)
  • Do not refer to the TCO's knowledge about an SGL PWO by using the "$this" operator (like "$this->Sgl") inside a generator method of a TCO; refer all generator action to a local $SGL object reference that points to the injected (!) object $SGL_ext:
    $SGL = $SGL_ext;
    Also refer to a local $Schema which points to the $SGL->Schema of the injected $SGL object:
    $Schema = $SGL->Schema;
    (Remember: objects are passed by reference!)

These recipes give us the desired flexibility to deal with properties both of SLAVE objects and of objects of a different PWO class.
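A minimal sketch of such an injected generator method – class, method and accessor names are hypothetical and not taken from my framework:

// Hypothetical sketch: a TCO generator method working on any injected
// SGL PWO (MAIN or SLAVE) and on the SCHEMA carried by that object.
class TCO_Base {

    // $SGL_ext: injected MAIN or SLAVE SGL PWO;
    // $prop_name: the property for which an (X)HTML fragment shall be generated.
    function generate_input_field($SGL_ext, $prop_name) {
        $SGL    = $SGL_ext;        // local reference to the injected object
        $Schema = $SGL->Schema;    // ... and to its associated SCHEMA

        // fetch property/field qualities from the SCHEMA (assumed accessors)
        $label = $Schema->get_label($prop_name);
        $value = htmlspecialchars($SGL->get_property($prop_name));

        return '<label>' . $label . '</label>'
             . '<input type="text" name="' . $prop_name . '" value="' . $value . '" />';
    }
}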

The injection is a small but structurally important difference in comparison to the database interaction methods directly incorporated in SGL objects or, better, their base classes. Here the TCO (!) method must receive the required information – and we gain flexibility when we inject the SGL object into it (plus information identifying the property for which an HTML fragment has to be generated).

Had I followed the rules above some time ago, it would have saved me a lot of time now: I had to adapt the interfaces of many of my generator methods in my TCO base classes.

Again, I found one experience confirmed:

  • In case of doubt do not hesitate to use loose object coupling via injection for general methods which could in principle be applied to more – and other – objects than the ones the object containing the method knows about at the time of your class design.
  • Use injection even if it may look strange when you need to do something like
    $this->someTCOgenerator_method($this->KnownObject);
    i.e., when you inject something the present object containing the method knows about already.

It will save time afterwards when iterator patterns over other objects have to be used and when you may access the (public) method from outside, too.

Iteration over SLAVE objects

Now, if our (X)HTML generator methods are prepared for the injection of SGL PWO objects, we have no more difficulty generating the HTML fragments required in templates for selected properties of MAIN and SLAVE PWO SGL objects:

We just have to iterate the usage of the generator method over the properties/fields of the MAIN SGL PWO as well as its SLAVE SGL PWOs. By getting the SGL object injected, the method also knows the right SCHEMA to use and can provide the required detail information for the generation of the related HTML code (e.g. for a form element).

Think about a situation in which we want to provide a form with input fields which a user can use to update data for all properties of a certain PWO. We just have to apply generator methods to create the required input field types according to the appropriate SCHEMA information. For this

  • we loop (iterate) over the MAIN and all its SLAVE objects,
  • identify all properties according to the MAIN or SLAVE SCHEMA information
  • and apply the generator method by injecting the relevant property identifier plus the MAIN/SLAVE SGL object in question.

Mission accomplished.
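A sketch of such an iteration – again with hypothetical member and method names – might look like this:

// Hypothetical sketch: iterate over the MAIN SGL PWO and all of its
// SLAVE SGL PWOs and let the TCO generator method build the form fields.
$html        = '';
$sgl_objects = array_merge(array($main_sgl), $main_sgl->slave_sgls);

foreach ($sgl_objects as $sgl) {
    // the SCHEMA of each (MAIN or SLAVE) object lists its own properties
    foreach ($sgl->Schema->property_names() as $prop_name) {
        $html .= $tco->generate_input_field($sgl, $prop_name);
    }
}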

The principle of iteration over SLAVE objects is actually nothing new to us: We used it already when dealing with the database interaction methods. (It is a basic ingredient of a composite pattern).

If we only want to work on selected properties then we need to know which of the properties is located in which SLAVE PWO and described in which SLAVE SCHEMA. To be able to do so, we should create and use an array

  • that collects information about which PWO property belongs to which of the PWO's SLAVE objects
  • and which is filled in the course of the construction process of a PWO (see the sketch below).
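Such a lookup array could be built during construction roughly like this (member and method names are again assumptions):

// Hypothetical sketch: inside the MAIN SGL PWO constructor, map each
// property name to the SLAVE object it belongs to.
$this->ay_prop_to_slave = array();
foreach ($this->slave_sgls as $slave_key => $slave_sgl) {
    foreach ($slave_sgl->Schema->property_names() as $prop_name) {
        $this->ay_prop_to_slave[$prop_name] = $slave_key;
    }
}

// later: find the right SLAVE object (and thus SCHEMA) for a selected property
$slave_for_prop = $this->slave_sgls[ $this->ay_prop_to_slave['some_prop'] ];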

Conclusion

When realizing a composite pattern in the form of SLAVE objects (with SLAVE Schemata) to deal with closely associated property groups of complex objects, you can apply existing base class methods and iterate over the SLAVE objects to accomplish complete transactions affecting all properties of a structured PWO. This principle can be extended to the (X)HTML generator methods of Template Control Objects, if these methods are prepared to receive the SLAVE SGL PWO objects by injection. If we only want to apply generator methods to selected properties, we should use an array associating each PWO property with a SLAVE PWO SGL object.

MySQL: LOAD DATA INFILE, multiple csv-files and index creation for big tables

In some of the last articles in this blog I have discussed the following aspects of a data import from csv-files into MySQL MyISAM tables:

  • The use of the SQL command “LOAD DATA INFILE” may accelerate the import by a factor of at least 18 in comparison to reading the data line by line with fgetcsv() and inserting single records into the database table.
  • The creation of a (small) helper table may often be useful to improve the performance of repeated consistency checks based on the evaluation of “derived result sets” from COUNT(DISTINCT …) subselects or subqueries on big tables.

The first point implies a straightforward use of “LOAD DATA INFILE” in your PHP programs. The second point may be applicable if certain data structures are given and it requires a balanced use of indices.

See:
Importing large csv files with PHP into a MySQL MyISAM table
MySQL/PHP: LOAD DATA – import of large csv files – linearity with record number?
MySQL/Aggregation: Comparing COUNT(DISTINCT) values on big tables

In these articles I have more or less neglected two things:

  • A possible splitting of the import process into several sequential processes. Instead of loading one big file we may rather distribute the data records over several files. Each individual process loads data from just one file; so the data are loaded file after file into the table.
  • The impact of index creation on the data import time.

Reasons for sequential data loading from multiple csv-files are: better control of the loading process due to shorter times for well-defined data packages, and also adaptation to memory limits. However, without tests it is not clear whether such an approach has an impact on the overall loading performance.

The second point raises the question whether and how index creation influences the efficiency of "LOAD DATA INFILE". Note that the MySQL documentation explicitly recommends separating index creation from the loading process. See:
http://dev.mysql.com/doc/refman/5.1/de/load-data.html

This article provides some numbers from test runs for both measures. We shall see how the combination of both points influences the overall performance of the data import. Furthermore, an optimized loading process may influence your strategy regarding consistency checks during a sequential loading of several files.

Loading data from a sequence of multiple files instead of loading from one big file

I briefly have a look at some of the aspects of loading data from a sequence of files. In the article PHP/MySQL/Linux: File upload, database import and access right problems I have already discussed that one may split the data to be imported into bunches of records and distribute them over several files. I have pointed out that the data splitting shall not be done by fields. Instead records should be distributed over multiple files; but each record should contain all of the associated (!) fields if you want to profit from the performance of “LOAD DATA INFILE”.

A further aspect you have to take care of is logical relations between the data records. Such relations may become relevant if you already want to work with the data between two file loads:

Let's say you have data with the structure

nr,    m,n,    i    Q1, Q2, Q3

m,n,i shall represent keys which together form a unique key of each record. An [m,n]-combination may e.g. define an object called "node" and [i] may represent time periods. Let us further assume that, after having loaded one file, you for some reason want to check the complete SUM(Q2) over all (!) i-values for each "node" before you (or a PHP program) import the data of the next file. Then you must organize your data such that all records related to one and the same node are placed inside exactly one file – otherwise you would get a moving target for your SUM when loading your files. In general, the distribution of the records over several files must comply with logical conditions if you want to work with the data already during the loading process.

Another reason for distributing records over several files may be memory considerations: in one of the articles named above we saw that loading and index generation consume more and more memory for buffering as the absolute number of records grows. Therefore, if you want to avoid conflicts with limited RAM, it may be reasonable to load bunches of records (in the form of several csv-files) instead of processing one gigantic file with "LOAD DATA INFILE".

In most cases the central objective of using data packages is a better control of the file transfer and loading processes: With multiple files you can build up a “file pipeline” on the server. The files in this “pipeline” are handled by a repeatedly running import job. The required time interval to transfer each file to the server and load its data into the base can be limited to a reasonable value by choosing an appropriate number of records per file. After each file is processed on the server you may analyze the new table status, the pipeline status and send success or failure messages to the client (e.g. a browser). With a web client the communication exchange can be managed by Ajax.

A sequential loading of files also provides you with the possibility to fill the file pipeline on the server in parallel to loading data of already completely received files into the database. Despite a more complicated communication structure between a browser client and the server, you can still keep control over both the transfers from the client to the server and the sequence of server processes loading data into the database – file by file.

I have realized the sequential handling of several csv-files on a web-server with Ajax and PHP in the following way:
A web client starts an Ajax controlled PHP job that analyzes the number of files in a pipeline (a special directory) on the server and chooses the next file to import according to some given rules. After having loaded the data, the job returns control to the client and sends information about the processed file, the number of loaded records and the new status of the database table. Depending on parameter settings on the client and of course on the quality of the status information from the server, a new import job for the next file may automatically be started from the client with the help of Ajax. This continues until the pipeline is empty. The last job may create required indices on the filled database table – depending on user settings.
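A much simplified sketch of such a server-side import job is given below; the directory, table name and column list are placeholders, and a real job would additionally send status information back to the Ajax client:

// Simplified sketch: pick the next csv-file from a pipeline directory
// and load its records into a MyISAM table with LOAD DATA INFILE.
$pipeline_dir = '/srv/import/pipeline';                 // placeholder
$files = glob($pipeline_dir . '/*.csv');
sort($files);                                           // simple rule: lowest name first

if (count($files) > 0) {
    $file   = $files[0];
    $mysqli = new mysqli('db_host', 'db_user', 'db_password', 'my_database');

    $sql = "LOAD DATA INFILE '" . $mysqli->real_escape_string($file) . "'"
         . " INTO TABLE import_table"
         . " FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
         . " (nr, m, n, i, Q1, Q2, Q3)";

    if ($mysqli->query($sql)) {
        unlink($file);   // remove the processed file from the pipeline
    }
    // ... return a JSON status (records loaded, files left, errors) to the client
}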

Is the total import time independent of a record distribution over multiple files?

In the article MySQL/PHP: LOAD DATA – import of large csv files – linearity with record number? I discussed the seemingly linear dependency of “LOAD DATA INFILE” on the total record number of ONE file. Now, we should in addition investigate possible performance variations for import processes

  • that load a given number of records sequentially from several csv-files;
  • with or without a parallel building of (unique) indices on defined columns;
  • that perform the creation of required (unique) indices only AFTER all records have been loaded into the database table.

Importing 5 million records from one or two files

Let us again take the data structure

nr,    m,n,    i    Q1, Q2, Q3

All Q-values are floating point numbers with double accuracy (MySQL type DOUBLE) in our scenario. We assume that at the beginning of our import runs we have already defined some indices on our database table:

  • a unique index (with auto-incrementation) on the column “nr”
  • a unique index defined for the columns “[n,m,i]”

In the course of our tests we shall study the effect of

  • dropping these indices,
  • creating them whilst filling the table with data,
  • creating them after having filled the table with data.

For each test case with 5 million records we look at two different situations :

  • Importing one csv-file with all 5 million records.
  • Sequentially importing 2 csv-files – one with 3 million records and one with 2 million records.

Our test server is a relatively small one in the form of a KVM-virtualized LAMP server with 2 GByte assigned RAM, an Intel Q9550 CPU and a RAID 10 disk array. For the tests I worked with MySQL MyISAM tables. In my concrete test table I chose:

distinct m values : 1 (as an extreme situation)
distinct n values : 13580
distinct i values : 386

I got the following numbers:

import by | index on nr | index on [n,m,i] | response time
1 file  | unique (auto-inc) | unique     | 30 secs
2 files | unique (auto-inc) | unique     | 30 secs
1 file  | non-unique        | unique     | 30 secs
2 files | non-unique        | unique     | 30 secs
1 file  | non-unique        | non-unique | 21.3 secs
2 files | non-unique        | non-unique | 24.8 secs
1 file  | none              | unique     | 19.2 secs
2 files | none              | unique     | 19.4 secs
1 file  | none              | non-unique | 9.8 secs
2 files | none              | non-unique | 13.3 secs
1 file  | none              | none       | 8.7 secs
2 files | none              | none       | 8.8 secs

The accuracy of the response times is in the range of a second – depending on the general situation on the virtualization host and its disk system at the time of the measurements. The response times do not include any file transfer times between the client and the server in our LAN. However, the response times for the situations with a sequential load of several files include some common Ajax overhead plus the time to load the PHP program files into our Apache server again. But the system buffers and caches sufficiently, so the interaction between the client and the server should only have a marginal impact.

What do we learn from the results? I dare to derive four rules:

  • Rule 1: Any unique index built during “LOAD DATA INFILE” reduces performance (due to checks which have to be performed). So, one should carefully check whether a unique index really is required later on when you work with the data.
  • Rule 2: The total import times for several files or one big file are comparable if a unique index is created whilst or after data loading.
  • Rule 3: Any building of a non-unique index during the import increases the total import time for a sequential load of several csv-files in comparison to a loading process for just one big csv-file.
  • Rule 4: Without any index defined, the total loading time for several files is comparable to the loading time for one big csv-file.

The most surprising effect for me was that described in rule 3. However, regarding the accuracy in the range of a second we may have some doubts whether we see a real effect. So, we should have a look at a situation with more data records and more files to confirm this rule.

Now, it will of course be interesting to see a comparison between different loading processes with a creation of the [n,m,i]-index AFTER the processing of all files with "LOAD DATA INFILE". In such cases we technically drop any index before we load any file data and create the index only after the data of all files have been loaded into the database. Note that index dropping may take some time, too.
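Technically this can be done with a pair of ALTER TABLE statements around the sequence of loader jobs, roughly like this (table and index names are placeholders):

// Sketch: drop the index before the first loader job and recreate it
// only after the data of all files have been loaded.
$mysqli = new mysqli('db_host', 'db_user', 'db_password', 'my_database');

// before loading the first file:
$mysqli->query("ALTER TABLE import_table DROP INDEX idx_nmi");

// ... sequence of LOAD DATA INFILE jobs, one per csv-file ...

// after the last file has been processed:
$mysqli->query("ALTER TABLE import_table ADD INDEX idx_nmi (n, m, i)");
// or, if uniqueness really is required:
// $mysqli->query("ALTER TABLE import_table ADD UNIQUE INDEX idx_nmi (n, m, i)");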

Table: Comparison of server response times for index creation AFTER vs. WHILST loading 5 million records with "LOAD DATA INFILE"


import by | index on nr | index on [n,m,i] | response time
1 file  | none | unique, built whilst loading      | 19.2 secs
2 files | none | unique, built whilst loading      | 19.4 secs
1 file  | none | non-unique, built whilst loading  | 9.3 secs
2 files | none | non-unique, built whilst loading  | 13.2 secs
1 file  | none | unique, created after loading     | 19.5 / 18.5 secs
1 file  | none | unique, created after loading     | 20.7 / 19.1 secs
1 file  | none | non-unique, created after loading | 11.9 / 10.8 secs
2 files | none | non-unique, created after loading | 11.4 / 11.1 secs

The first time value given in the last four lines includes an index drop time which varies a bit according to precise server conditions at the time of measurement.

The most interesting result is that the differences measured between a sequential load of several files and a situation with just one big file almost disappear when we create the index after the pure data loading.

Importing 10 million records from one or five files

Now, let us turn to 10 million records and an equal distribution of records over 5 files (with 2 million records each) and see whether we confirm our results. For the test we have just doubled the number of nodes in comparison to the 5 million case.

Table: Server response times for loading 10 million records with “LOAD DATA INFILE”

import by | index on nr | index on [n,m,i] | response time
1 file  | none | unique, built whilst loading      | 42 secs
5 files | none | unique, built whilst loading      | 42.4 secs
1 file  | none | non-unique, built whilst loading  | 20 secs
5 files | none | non-unique, built whilst loading  | 37 secs
1 file  | none | unique, created after loading     | 43.3 / 42.0 secs
5 files | none | unique, created after loading     | 41.3 / 40.0 secs
1 file  | none | non-unique, created after loading | 22.9 / 21.8 secs
5 files | none | non-unique, created after loading | 22.5 / 21.4 secs

Again, we see the huge impact of the creation of a unique index – but the total loading time is almost independent of whether you create the index whilst or after the data loading. This is understandable as the required checks for uniqueness must be done in both situations.

However, at least for me, it was almost shocking to see the difference between lines 3 and 4:

There is a big difference between a sequential loading process for several files and a loading process for just one file when a non-unique index is created whilst loading with "LOAD DATA INFILE".

Note in contrast the considerable reduction of the required time if you create the non-unique index after all data have been loaded into the database. The measured discrepancy in total loading time is really remarkable – and actually I have no convincing explanation for these findings. The tree structure of the index is quite simple; I do not really see what causes the overhead. And I have tested the effect several times to exclude any extraordinary conditions on the server. I have also validated the parameter settings and the steps done during the sequence of loading jobs to exclude any unplanned work or actions on the server.

The [n,m,i] index in my case has a size of 215 MByte; the data themselves take another 400 MByte. All in all we speak about 600 MByte. So, I suspect some reading and writing from/to the disk at the end/beginning of the individual 5 loader jobs and some inefficient memory management. I have not yet investigated this in detail. Write me a mail if you know the reason.

Independent of the reasons: For me, it was really important to see this difference because I do not need a unique index and I want to work with many more files to cover much larger numbers of records.

So, our result is:

You should avoid creating a non-unique index whilst using "LOAD DATA INFILE" on multiple files loaded by sequentially started (PHP) jobs on the server. Whenever you load your data from several files, create the index after all data from all files have been imported into the database table.

Impact on consistency checks during a sequential load of several files

In the article MySQL/Aggregation: Comparing COUNT(DISTINCT) values on big tables I discussed the creation of a helper table and respective indices to get a good performance of consistency checks. Most of these consistency checks were based on an analysis of derived result sets of "SELECT COUNT(DISTINCT …)" statements. My basic idea was to perform such checks after each import process when dealing with a sequence of files.

However, in the named article I also showed that a fast creation of a helper table requires an already existing index on the big data table. But our present results favor a solution where an index is created only after all data of all files have been loaded. So, there can be no intermediate consistency checks between the loading processes for two csv-files!

After some thinking I came to the conclusion that this is no big disadvantage. Even if you perform consistency checks only after all data have been loaded, you will still be able to identify the records which are problematic. And with some effort in data and file analysis – which will cost some extra time, but may be worth it – you may even find the file from which each problematic record was loaded. So, you can give the user who tried to load the files a full report on the suspicious records that caused inconsistencies and the files that contained them.

Conclusion

We have tested different combinations for loading data from one or multiple files and with or without creating indices during/after the data loading. Our tests have shown that if you import data sequentially from several files and you want to create a non-unique index over some columns, it makes a huge difference whether you create this non-unique index during or after the sequential "LOAD DATA INFILE" processes. One should avoid creating such indices whilst loading the data with "LOAD DATA INFILE"; they should be created after all data from all files have been imported into the database table.

In contrast, such an order of steps does not seem to be required when creating unique indices. However, a unique index in general takes significantly more time to create than a non-unique one. You should therefore evaluate whether unique indices are really necessary.

PHP: Upload/import of large csv files, server resource limitations, php.ini

In this article I want to list and briefly discuss some resource limiting parameters for PHP that may become important when big data files are transferred to a server and afterwards imported into a database. Before I do that, I first want to discuss a related comment from a reader.

Data splitting to avoid huge transfer times to the server

In one of my last articles [Importing large csv files with PHP into a MySQL MyISAM table] I recommended using the SQL statement "LOAD DATA INFILE …" inside PHP programs to import big csv-files into a MySQL database. In addition I recommended delivering all fields of a record with the same key dependency in one file instead of distributing the data fields of records over several separate csv-files and importing these files into maybe several separate (denormalized) database tables.

A reader reacted by writing in a mail that a file combining many data fields would of course become much larger than each of the individual files with only a few data fields. He pointed out that this may become a problem for the transfer of the file from a client to the server – especially if the server is hosted at a provider. Typical and connected problem areas could be:

  • bandwidth of the client-server connection – especially for ADSL connections
  • limitations on the server side regarding e.g. file size, maximum job duration or input times

I agree – but only partially.
First, it is true that a limited bandwidth for file uploads may become a problem. An example of a “small” csv file with a million records each containing 7 fields (4 key fields, 3 quantity fields) makes that clear:

Upload time = transfer time to the server :

  • In my case the file size was around 35 MByte. Uploading such a file via ADSL with a maximum upload rate of 1 MBit/s gives you an effective upload time (or better: transfer time to the server) of roughly 5 minutes: 35 MByte ≈ 280 MBit, divided by 1 MBit/s gives about 280 seconds.
  • The transfer time has to be compared with the data import time on the server itself which turns out to be in the region of 6 seconds when using the “LOAD DATA INFILE” directive.

The time difference will get even bigger for larger files. So, we see that the transfer time may soon become a dominant factor when our server is located somewhere in the internet. The question arises, whether required upload times may collide with server settings. This would be one example of a server side resource limitation we need to deal with when working with big data. We come back to a potentially relevant, but disputed parameter later on.

Anyway, I do not agree with data splitting by fields to overcome bandwidth problems:
One reason is that the total upload time will not become smaller. The only advantage is a smaller upload time interval per file. This may help to get a better control over the upload process(es). My real argument against splitting by fields is that the total data import time for several files with a few data fields each but with the same huge number of records may become considerably bigger than for one file with all data fields (and the same number of records). At least if you use the fast “LOAD DATA INFILE” feature of the MySQL engine. So, I would

  1. either tolerate a relative big transfer time to the server
  2. or try to organize the data records in such a way that they can be uploaded to the server and imported into the data base sequentially – i.e. by files with the full data field spectrum but with reduced record numbers.

I would not give up the idea of transferring csv-files with as many fields as possible per record line – as long as this is compatible with a normalized data model. Point 2 can very often be realized without problems – and with Ajax and HTML5 technologies it can even be solved in such a way that the transfers are automatically done one after the other (file pipelines). So, if you want to limit the transfer times, split the data by records and not by fields, i.e. transfer several files with bunches of records instead of several files with fewer fields.
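
The following is a minimal sketch of such a record-wise split. The file names, the chunk size and the helper function name are assumptions for illustration; a csv header line, if present, is not treated specially here.

  <?php
  // Minimal sketch: split a big csv file into several smaller files that
  // keep the full field spectrum per record but contain fewer records each.
  function split_csv_by_records($srcFile, $targetDir, $recordsPerFile = 250000)
  {
      $src = fopen($srcFile, 'r');
      if ($src === false) {
          die("Cannot open $srcFile\n");
      }
      $chunkNo = 0;
      $lineNo  = 0;
      $dst     = null;
      while (($line = fgets($src)) !== false) {
          if ($lineNo % $recordsPerFile === 0) {
              if ($dst !== null) {
                  fclose($dst);
              }
              $dst = fopen(sprintf('%s/chunk_%03d.csv', $targetDir, $chunkNo++), 'w');
          }
          fwrite($dst, $line);
          $lineNo++;
      }
      if ($dst !== null) {
          fclose($dst);
      }
      fclose($src);
  }

  // Example call with illustrative paths:
  // split_csv_by_records('/var/tmp/records.csv', '/var/tmp/chunks');
  ?>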

Resource limitations require control over server parameter settings

Nevertheless, relatively big transfer times for big files may conflict with some server settings. Also – and more importantly – the time required for the import into the database, or the file size and memory consumption, can conflict with limits set on your server. E.g., the maximum time a PHP job is allowed to run is limited on most web servers – as are many other resources given to a PHP job.

Many of the PHP resource limitations we may be confronted with on an Apache web server are defined by parameter settings in a php.ini file. On a SuSE system this file is typically located at “/etc/php5/apache2/php.ini”. Many providers deny access to these global settings – although one could in principle influence them for certain scripts by .htaccess files or by putting php.ini files into the directories where your PHP scripts reside.
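
Where a provider allows such overrides at all and PHP runs as an Apache module (mod_php), a per-directory .htaccess file might contain lines like the following. The values are just illustrative assumptions; under CGI/FastCGI setups these directives do not work and per-directory php.ini files or the provider's own mechanisms have to be used instead.

  # Illustrative .htaccess overrides (mod_php only; ignored or forbidden by many providers)
  php_value upload_max_filesize 200M
  php_value post_max_size       210M
  php_value max_execution_time  300
  php_value max_input_time      300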

This is one of the reasons why I urge my customers to rent a root server or at least a virtualized server from a provider when seriously dealing with big data. You need to have control over a variety of server settings – sometimes general settings and not only php.ini parameters. Working with big data files and applications dealing with tens or hundreds of millions of records in joined tables requires full server control, especially during the development phase. This is, in my opinion, nothing for simple web site hosting.

Some relevant php.ini parameters

What are typical php.ini parameters that limit resources for PHP processes on a server and may become relevant for file uploads? I think the most important ones are the following:

  • max_execution_time :: default: 30 (many providers limit this to 10)
  • upload_max_filesize :: default: 2M
  • post_max_size :: default: 8M
  • memory_limit :: default: 128M
  • max_input_time :: default: 60
  • session.gc_maxlifetime :: default: 1440

The time values are given in seconds, the memory values in MByte.
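
A small PHP snippet like the following (just a sketch) can be used to print the values that are effectively active on your server or at your provider:

  <?php
  // Minimal sketch: show the effective values of some upload related
  // php.ini parameters - useful to see what a provider really allows.
  $params = array(
      'max_execution_time',
      'max_input_time',
      'memory_limit',
      'upload_max_filesize',
      'post_max_size',
      'session.gc_maxlifetime',
  );
  foreach ($params as $p) {
      printf("%-24s : %s\n", $p, ini_get($p));
  }
  ?>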

Most of these parameters are mentioned in the following web articles, which comment on their impact on file uploads. Please have a look at these pages:
http://php.net/manual/en/features.file-upload.php
http://php.net/manual/de/features.file-upload.common-pitfalls.php

Important parameters for the transfer of big data files are “upload_max_filesize” and “post_max_size”. I want to stress the following point:

Both of these parameters have to be set consistently when dealing with file uploads. (Actually, in the past I sometimes forgot this myself and wasted some time wondering why a file did not get uploaded although I had set a sufficient value for upload_max_filesize.)
 
When you upload a file plus some additional POST data, the total amount of POST data can be bigger than just the file size. However, “if the size of post data is greater than post_max_size, the $_POST and $_FILES superglobals are empty” – according to the PHP manual. You may even get no warnings or errors in some situations.
Therefore, the value of “post_max_size” should always be bigger than the value of “upload_max_filesize” – and the latter should of course be at least as big as (or a bit bigger than) the size of the file you plan to transfer to the server.
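
Since the superglobals arrive empty in this case, you can only detect the situation indirectly. The following is a hedged sketch that compares the raw CONTENT_LENGTH header with post_max_size; ini_bytes() and post_limit_exceeded() are illustrative helpers of my own, not standard PHP functions.

  <?php
  // Minimal sketch: detect the silent case in which the POST payload
  // exceeded post_max_size and $_POST / $_FILES arrived empty.
  function ini_bytes($val)
  {
      $val  = trim($val);
      $unit = strtoupper(substr($val, -1));
      $num  = (int) $val;
      switch ($unit) {
          case 'G': $num *= 1024; // fall through
          case 'M': $num *= 1024; // fall through
          case 'K': $num *= 1024;
      }
      return $num;
  }

  function post_limit_exceeded()
  {
      $contentLength = isset($_SERVER['CONTENT_LENGTH']) ? (int) $_SERVER['CONTENT_LENGTH'] : 0;
      return $contentLength > 0
          && empty($_POST) && empty($_FILES)
          && $contentLength > ini_bytes(ini_get('post_max_size'));
  }

  if (post_limit_exceeded()) {
      echo "The POST payload exceeded post_max_size - please upload a smaller file.\n";
  }
  ?>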

An important parameter is also “max_execution_time”. It limits the time a PHP script is allowed to run. It should be big enough to cover the file handling and database import times on the server. Transfer times are not included, if we believe the hints given at http://php.net/manual/de/features.file-upload.common-pitfalls.php.
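
Where the provider permits it, some of these limits can also be raised from inside the script instead of editing php.ini. A hedged sketch with arbitrary example values follows; note that upload_max_filesize and post_max_size cannot be changed this way at runtime.

  <?php
  // Minimal sketch: raise run-time limits per script where the hoster allows it.
  set_time_limit(600);               // resets the timer; allows 600 more seconds of execution
  ini_set('memory_limit', '512M');   // memory_limit can be changed at runtime
  ?>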

In my understanding the “max_input_time” parameter limits the time for “parsing” request data (= POST or GET data). See http://php.net/manual/en/info.configuration.php#ini.max-input-time
However, a comment on the following page
http://php.net/manual/de/features.file-upload.common-pitfalls.php
says something different:

“Warning – max_input_time sets the maximum time, in seconds, the script is allowed to receive input; this includes file uploads. For large or multiple files, or users on slower connections, the default of 60 seconds may be exceeded.”

Personally, I have some doubts about this last statement as I never experienced any problem with the standard settings and files above 200 MB. So a contrary position would be:

“Parsing” refers to the time interval between the arrival of the request data on the web server and the start of the PHP script's execution – so “max_input_time” would not include the upload time (= transfer time to the server). However, it would include the time to prepare the superglobal arrays such as $_GET, $_POST, $_FILES.

This interpretation makes some sense to me. However, I did not test it thoroughly.

Interestingly enough, there is some dispute on the internet about the meaning of the max_input_time parameter and its relation to the max_execution_time parameter. See:

http://blog.hqcodeshop.fi/archives/185-PHP-large-file-uploads.html
https://bugs.php.net/bug.php?id=53590&
https://bugs.php.net/bug.php?id=28572
http://stackoverflow.com/questions/11387113/php-file-upload-affected-or-not-by-max-input-time
http://www.php.de/php-fortgeschrittene/41473-php-ini-frage-zu-max_input_time.html
https://www.grumpyland.com/blog/101/settings-for-uploading-large-files-in-php/
http://www.devside.net/wamp-server/apache-and-php-limits-and-timeouts
http://www.techrepublic.com/article/a-tour-of-the-phpini-configuration-file-part-2/
http://serverfault.com/questions/224536/is-max-input-time-included-in-max-execution-time

I have no definite opinion about this discussion yet. In two of the articles listed above one can see that even the error messages produced when the max_input_time limit is exceeded can be misleading, as they may refer to the wrong parameter. Therefore, I recommend being aware of the parameter “max_input_time”, although the default should be sufficient for most cases. Setting it to “-1” corresponds to an unlimited max_input_time interval – whatever it really has an impact on.

If you really work with huge files of several hundred megabytes, even the garbage collection time may become important. session.gc_maxlifetime sets a lifetime limit for session data, after which the system regards them as garbage. So, if your import or data handling times get big enough, this parameter must be adjusted as well.

The “memory_limit” of a PHP script may also be reached when processing huge files.

So, you see there is a whole bunch of parameters which you may need to adjust when you start working with larger and larger files.

Web hosting providers and php.ini-settings

You find typical settings of php.ini parameters for the web hosting packages of the providers Strato and 1&1 here:
http://strato-faq.de/article/1315/Mit-welchen-Grundeinstellungen-steht-PHP-bei-STRATO-zur-Verf%C3%BCgung.html
https://hilfe-center.1und1.de/skript–und-programmiersprachen-c82634/php-c82686/bedienung-c82739/welche-php-einstellungen-kann-ich-aendern-a791899.html

Most web providers also give more or less clear information about which of the PHP-relevant parameters can be manipulated by customers (e.g. by the use of .htaccess files or directory-specific php.ini files) and which cannot. See e.g.:
https://hilfe-center.1und1.de/skript–und-programmiersprachen-c82634/php-c82686/bedienung-c82739/php-einstellungen-ueber-phpini-aendern-a791901.html

With some providers you can use a .htaccess file; with others you may need to put a php.ini file into each script directory. See e.g.:
http://www.webdecker.de/artikel/items/php-ini-value-htaccess.html
