CSV file upload with ZIP containers, jQuery, Ajax and PHP 5.4 progress tracking – I

This article series is written in support for French colleagues in a PHP collaboration project and therefore in English. I want to describe some basic elements of an

Ajax controlled file upload process between

  • a browser based User Interface (HTML4/5, Javascript, jQuery)
  • and some PHP/MySQL application programs on a LAMP server.

Our customer's project depends on a periodic transfer of up to 40 different CSV files with a lot of input data (around 0.5 GByte) to a database server. A requirement of our customer was that the data transfer should be performed with a ZIP file as a container for the individual CSV data files. After the transfer the contents of the individual CSV files should be imported into specific tables of a database.

As our whole interface of the web application is Ajax based, we decided to control all transfers via jQuery's Ajax API. Meanwhile, there are jQuery Plugins available for this type of task. However, we wanted to fully control all important phases of the file transfer and the data import - both on the client (browser) as well as on the server. This meant that we needed to program all basic steps during an Ajax communication cycle between the browser client and the LAMP server by ourselves. In addition we needed to guarantee some error control.

Personally, I found it a bit astonishing that such a seemingly simple task lead me to some relatively intricate obstacles to overcome. Although most of the necessary ingredients are documented on the Internet, the documentation is sparse and distributed. My objective with this article series is to provide a coherent picture of process design aspects, some coding tricks and also limitations of such a process. This may be useful also for other developers having to solve similar problems. However, if you want to read about a most simple and problem free approach to file upload tasks with Ajax you are probably looking at the wrong article.

Objectives and Assumptions

  • We want to upload several CSV-files (up to 40), whose contents shall be transferred to specific database tables.
  • These files shall be sent to the server in one ZIP file. Reasons for using a ZIP-container file: compression; limitations of the HTML4 file upload API.
  • As the Zip-container may get relatively large we want to see and control the transfer progress over the Internet by some means of PHP 5.4 - as far as possible today.
  • The server shall extract the files from the ZIP and build up a "pipeline" of these files for a subsequent database import of their contents.
  • The data import into database tables shall be done by a sequence of Ajax controlled PHP jobs. Reason: Intermediate information transfer to the client with the option to stop further processing.
  • The server shall decide by some naming conventions what to do with each file.
  • All steps shall be Ajax controlled - a relatively continuous flow of information between client and server has to be established.
  • For the sake of simplicity each Ajax answer of the server at the end of each controlled Ajax transaction cycle shall be encoded in form of a JSON object. (So, if you want to be particularly precise: we use Ajaj instead of Ajax.)

Wording used in this article series

  • JS, jQuery, Ajax:JS below stands for Javascript (on the client side). We furthermore use jQuery and its Ajax interface functionality. We expect JSON responses from the server. Although not completely correct we nevertheless use Ajax and Ajaj as synonyms in the articles of this series.
  • Upload: By "upload" we normally mean the whole process. It comprises a "file transfer process" from the Web client PC to the server and subsequent "database import processes". However, sometimes and for reasons of simplicity we also use the expression "file upload" in a restricted sense - namely for the file transfer to the server, only. It should become clear from the context what we mean.
  • Main PHP program/job:
    The PHP program receiving and working with the transferred Zip-container file and its contents is called "main PHP program" or "main PHP job". It has to be distinguished from "polling jobs" (see below).
  • Polling jobs: A sequence of additional PHP "polling jobs" may be triggered by the client. This is done in form of a time loop with a short period. A "polling job" on the LAMP server reads some status information of a previously started and still running "Main PHP job" (as e.g. the file transfer job or long lasting database import jobs). The status information of the running main job is fetched from a common data source as the $_SESSION or a database table accessible to both the status writing "main PHP job" and the status reading "polling job". Each short timed "polling job" fulfills its own complete Ajax transaction cycle. The evaluation of the Ajax response triggers the next polling job if the main PHP job is still running. We come back to the concept of "Ajax driven polling jobs" later on.
  • CtrlOs: Each user interaction area of a web page - e.g. a HTML FORM in a DIV container - shall be completely controlled by a so called JS "Control object" [CtrlO]. A CtrlO encapsulates all reactions of the UI to events in well defined prototype methods. A CtrlO uses jQuery's proxy mechanism to register events and delegate event handling to defined CtrlO methods. CtrlO methods furthermore control the Ajax communication with the server.
  • Phases: "Phases" describe a full cycle of defined Ajax interaction between client and server. An example of such a full cycle would be:

    HTML Form => Ajax Setup via JS CtrlO method => Submit via JS CtrlO method => POST/FILE data transfer => Server action (PHP) => JSON object as Ajax response => Client analysis of the JSON object via CtrlO method for the Ajax response

  • Client: The client is in our case typically a browser (Firefox) with active JS and jQuery. We do not care about specific requirements of MS IE browsers in this article series; but we assume that at least MS IE browsers > 10 should work.
  • Pipeline: The ordered sequence by which the files of the transferred ZIP container are imported into their related database.

Relevant phases

To get a more detailed overview over what is to be done we distinguish the following main phases and steps (I omit error handling in this overview which may occur at every step):

Phase I - file transfer, progress control and Zip extraction

Step I.1 - Client: Use a HTML form to choose a ZIP file (<input type="file">) and use methods of a specifically designed JS Control object [CtrlO] to control subsequent actions on the client. Add parameter data (hidden input fields) and prepare an Ajax transaction for the file upload (=transfer) process.

Step I.2 - Client: Start the transfer the ZIP file over the Internet to the server. Submit a special parameter in addition to the file to trigger the provision of transfer progress information on the server. Prepare and start the Ajax communication and the data transfer by a CtrlO method.

Step I.3 - Server: Initialize the progress measurement and provide progress data in the $_SESSION array.

Step I.4 - Client/Server: Initiate a sequence of Ajax polling jobs via a JS time loop for reading the progress information on the server. Handle the Ajax response of each polling job in separate defined methods of a special CtrlO. React to error situations and stop the polling job time loop in case of errors or when the file transfer has finalized.

Step I.5 - Server: Extract, expand and save the CSV files from the Zip-container into a special upload folder on the server. This is done by using standard methods of the PHP ZIP class. Define/Suggest a sequence of imports of the data contents of the different files into file specific database tables. This defined sequence may be controlled via an array ("DB Import Control array" = DBIC-array ) which is kept and updated in the $_SESSION array on the server AND which is also sent back via a JSON object to the client.

Step I.6 - Server: Prepare and send an Ajax response in form of a JSON object to the client with affirmation messages about which CSV files have been received, the name of the files and the order in which they shall be processed. Include error messages and system messages if necessary. The JSON object shall contain the "DBIC"-array.

Step I.7 - Client: Analyze the Ajax response. Display success and error information. Display the number and name of files to be processed afterwards. Stop the time loop for polling jobs.

Phase II - Database import of a file in the pipeline

Step II.1 - Client: Prepare and start a new Ajax job with some parameters. The PHP target program of this job shall import the data of one of the already transferred CSV files. Among other things it should be defined, which file shall be processed (= imported into its associated database table) next. This parameter can follow the suggested order of the array which came from the server at the end of Phase I. All parameters can be set up in a separate (hidden) form with hidden input fields. Submit the Ajax job.

Step II.2 - Server: Start the database import on the server with a flexible PHP program. For small and medium sized files (up to approx. below 500000 lines) do it line by line by appropriate special PHP standard methods for handling CSV files. Check the data of each line where reasonable. Gather at least 20 lines in one INSERT statement to accelerate the import process. Write intermediate progress information into a $_SESSION array or a special database table. (This status information may be read by "polling jobs" started on the client.)
For huge files you may extend the import methods later on by using the special MySQL command "LOAD DATA INFILE".

Step II.3 - Client: Launch a sequence of status information polling jobs via a time loop. Handle the return information of each job in separate methods.

Step II.3 - Server: After a successful import of a defined file remove the file from its upload directory (delete it or move it somewhere else, e.g. in a history directory for uploads). Update you Control Array for the sequence of uploads with the following info: Which of the original files have been loaded? Which had errors? Which are still unprocessed? Prepare a JSON object for an Ajax respond (including the upload Control Array). Send it back to the client.

Step II.4 - Client: Analyze the server's response. Stop any polling jobs issued after the previous submit. Continue with displaying information of the success of the database import of the handled file. Determine the next file to load. Continue with the elements of Step 5 described above.

Phases III + n - Client/Server - Loop:

Cycle through a sequence of Steps described under II.1 to II.4 for further phases until all files are processed or an error has occurred.

Enough for today. In the next article

CSV file upload with Zip containers, jQuery, Ajax and PHP 5.4 progress tracking – II

we shall cover some major elements of Phase I.

Eclipse PDT – Code Assist in PHP 5.4 Traits

Traits sind eine feine Sache, wenn man horizontales Design in PHP Projekten angemessen unterstützen will.
Siehe z.B.: http://www.kingcrunch.de/blog/2011/08/01/php5-4-traits-aka-horizontal-reuse/
Ich setze Traits z.Z. selbst verstärkt in Kundenprojekten ein. Ein kleines praktisches Problemchen, über das ich in Eclipse PDT dabei gestolpert bin, ist die Frage des Code Assists bzw. der Code Completion bei der Code-Erstellung innerhalb von Methoden (Funktionsblöcken) eines Traits.

In den Code-Blöcken der Methoden eines Traits kommt ja oft der $this-Operator mit Referenzen auf Methoden, Variablen und ggf. weitere (z.B. injizierte) Objekte der Zielobjekte bzw. Zielklassen des Traits zum Einsatz. Die Zielklassen eines Traits - also die Klassen, in die der Trait eingebunden und die Trait-Funktionen genutzt werden sollen - beinhalten ja typischerweise bereits Referenzen auf eigene Methoden und ggf. auch weitere Objekte (mit zugehörigen Klassen-Variablen und Methoden). Diese referenzierten Methoden und auch Objekte müssen im Trait u.U. angesprochen werden.

Bei der entwicklungstechnischen Bearbeitung der Zielklassen selbst unterstützt einen PDT umfangreich mit Code Assisting auch für weitere referenzierte Objekte und Klassen, wenn

  • diejenigen Member-Variablen, denen die intern referenzierten Objekte zugeordnet werden, z.B. per "phpDocumentor Tag"-Anweisung mit einem Hint hinsichtlich der Klassenzugehörigkeit versehen wurden
  • und/oder innerhalb der Methoden-Codes ein entsprechendes Type Hinting vorgenommen wurde.

Siehe hierzu:
PHP Type Hinting with Eclipse
PHP Code Content Assists und Inline Type Hinting in Eclipse

Nun möchte man gerne bei der Trait-Entwicklung

  • für die Methoden und Variablen der Zielklassen des Traits selbst
  • wie auch für Objekte und deren Klassen, die innerhalb der Zielklasse referenziert werden,

Code Assisting erhalten, wenn man diese in einer Trait-Methode ansprechen und nutzen will.

Natürlich kann die PDT-Engine bei der Bearbeitung eines Traits aber nicht wissen, für welche Ziel-Klassen das Trait zum Einsatz kommen wird. Mit welchem Klassentypus (einer Vererbungshierarchie) der $this-Operator innerhalb des Traits assoziiert sein wird, ist daher ohne weitere Hilfe unklar. Damit hängen auch alle über $this referenzierten internen Objekte in der Luft. Code Assist innerhalb eines Traits beschränkt sich daher ohne weiteres Zutun des Entwicklers nur auf genau die Variablen und Funktionen/Methoden, die im Trait selbst definiert wurden.

In dem Falle, dass die Ziel-Klassen des Traits alle von einer (gemeinsamen) Klasse abgeleitet wurden, kann man aber mehr erreichen.

Diese Situation entspricht z.B. der, dass man für bestimmte Objekt-Typen über eine Vererbungshierarchie einen Satz von Funktionalitäten bereitgestellt hat. Nun sollen entsprechende Objekte in einer unterschiedlichen Gruppen von Applikationen zum Einsatz kommen, in denen die Basis-Funktionalitäten in gruppen- und applikationsspezifischer Weise überschrieben und ergänzt werden sollen. Diese Änderungen/Ergänzungen seien applikationsspezifisch und nicht strukturell bedingt und mögen daher keinen Grund für eine Erweiterung der Klassenhierarchie selbst darstellen.
Dann kann man in PHP sein Ziel auf breiter Front pro Gruppe durch Integration von gruppenspezifischen Traits erreichen.

Beispiel: Eine Veererbungshierarchie stelle ein allgemeines Spektrum an Funktionalitäten für Template-Control-Objekte zur Verfügung. In unserem zu realisierenden Anwendungsspektrum sollen diese Funktionalitäten für definierte Applikationsgruppen spezifisch - innerhalb einer Applikatonsgruppe jedoch immer in gleicher Weise - kombiniert werden. Traits lösen dieses Problem auf einfache und elegante Weise.

Innerhalb des Traits möchte man nun Code-Assisting für die Variablen der gemeinsamen Basisklassen erhalten. Nennen wir die potentiellen Zielklassen mal "Class_CTRL_1", "Class_CTRL_2", ... und gehen wir davon aus, dass alle diese Klassen von einer gemeinsamen Parent-Klasse "Class_Basis_CTRL" abgeleitet sein sollen.

Der kleine aber wirkungsvolle Trick, um volles Code Assist zu bekommen, besteht nun darin, am Kopf einer Trait-Methode $this auf eine interne Variable abzubilden und dabei ein Typ-Hinting einzusetzen:

/* @var $myself Class_Basis_CTRL */
$myself = $this;

Danach arbeitet man in der Entwicklung der Trait-Methoden-Codes anstelle von $this mit $myself weiter und erhält dadurch kompletten Zugriff auf alle Variablen und Methoden von "Class_Basis_Ctrl" im Rahmen des PDT Code Assists. Und weiter auch auf referenzierte Objektklassen, wenn die für die zugehörigen Variablen in der Zielklasse bereits Type Hinting eingesetzt wurde. Das macht wirklich Spaß und Laune ...

Viel Spaß künftig bei der effizienten Entwicklung von Traits unter PHP5.4 mit Eclipse !