CSV file upload with ZIP containers, jQuery, Ajax and PHP 5.4 progress tracking – I

This article series is written in support for French colleagues in a PHP collaboration project and therefore in English. I want to describe some basic elements of an

Ajax controlled file upload process between

  • a browser based User Interface (HTML4/5, Javascript, jQuery)
  • and some PHP/MySQL application programs on a LAMP server.

Our customer’s project depends on a periodic transfer of up to 40 different CSV files with a lot of input data (around 0.5 GByte) to a database server. A requirement of our customer was that the data transfer should be performed with a ZIP file as a container for the individual CSV data files. After the transfer the contents of the individual CSV files should be imported into specific tables of a database.

As our whole interface of the web application is Ajax based, we decided to control all transfers via jQuery’s Ajax API. Meanwhile, there are jQuery Plugins available for this type of task. However, we wanted to fully control all important phases of the file transfer and the data import – both on the client (browser) as well as on the server. This meant that we needed to program all basic steps during an Ajax communication cycle between the browser client and the LAMP server by ourselves. In addition we needed to guarantee some error control.

Personally, I found it a bit astonishing that such a seemingly simple task lead me to some relatively intricate obstacles to overcome. Although most of the necessary ingredients are documented on the Internet, the documentation is sparse and distributed. My objective with this article series is to provide a coherent picture of process design aspects, some coding tricks and also limitations of such a process. This may be useful also for other developers having to solve similar problems. However, if you want to read about a most simple and problem free approach to file upload tasks with Ajax you are probably looking at the wrong article.

Objectives and Assumptions

  • We want to upload several CSV-files (up to 40), whose contents shall be transferred to specific database tables.
  • These files shall be sent to the server in one ZIP file. Reasons for using a ZIP-container file: compression; limitations of the HTML4 file upload API.
  • As the Zip-container may get relatively large we want to see and control the transfer progress over the Internet by some means of PHP 5.4 – as far as possible today.
  • The server shall extract the files from the ZIP and build up a “pipeline” of these files for a subsequent database import of their contents.
  • The data import into database tables shall be done by a sequence of Ajax controlled PHP jobs. Reason: Intermediate information transfer to the client with the option to stop further processing.
  • The server shall decide by some naming conventions what to do with each file.
  • All steps shall be Ajax controlled – a relatively continuous flow of information between client and server has to be established.
  • For the sake of simplicity each Ajax answer of the server at the end of each controlled Ajax transaction cycle shall be encoded in form of a JSON object. (So, if you want to be particularly precise: we use Ajaj instead of Ajax.)

Wording used in this article series

  • JS, jQuery, Ajax:JS below stands for Javascript (on the
    client side). We furthermore use jQuery and its Ajax interface functionality. We expect JSON responses from the server. Although not completely correct we nevertheless use Ajax and Ajaj as synonyms in the articles of this series.
  • Upload: By “upload” we normally mean the whole process. It comprises a “file transfer process” from the Web client PC to the server and subsequent “database import processes”. However, sometimes and for reasons of simplicity we also use the expression “file upload” in a restricted sense – namely for the file transfer to the server, only. It should become clear from the context what we mean.
  • Main PHP program/job:
    The PHP program receiving and working with the transferred Zip-container file and its contents is called “main PHP program” or “main PHP job”. It has to be distinguished from “polling jobs” (see below).
  • Polling jobs: A sequence of additional PHP “polling jobs” may be triggered by the client. This is done in form of a time loop with a short period. A “polling job” on the LAMP server reads some status information of a previously started and still running “Main PHP job” (as e.g. the file transfer job or long lasting database import jobs). The status information of the running main job is fetched from a common data source as the $_SESSION or a database table accessible to both the status writing “main PHP job” and the status reading “polling job”. Each short timed “polling job” fulfills its own complete Ajax transaction cycle. The evaluation of the Ajax response triggers the next polling job if the main PHP job is still running. We come back to the concept of “Ajax driven polling jobs” later on.
  • CtrlOs: Each user interaction area of a web page – e.g. a HTML FORM in a DIV container – shall be completely controlled by a so called JS “Control object” [CtrlO]. A CtrlO encapsulates all reactions of the UI to events in well defined prototype methods. A CtrlO uses jQuery’s proxy mechanism to register events and delegate event handling to defined CtrlO methods. CtrlO methods furthermore control the Ajax communication with the server.
  • Phases: “Phases” describe a full cycle of defined Ajax interaction between client and server. An example of such a full cycle would be:

    HTML Form => Ajax Setup via JS CtrlO method => Submit via JS CtrlO method => POST/FILE data transfer => Server action (PHP) => JSON object as Ajax response => Client analysis of the JSON object via CtrlO method for the Ajax response

  • Client: The client is in our case typically a browser (Firefox) with active JS and jQuery. We do not care about specific requirements of MS IE browsers in this article series; but we assume that at least MS IE browsers > 10 should work.
  • Pipeline: The ordered sequence by which the files of the transferred ZIP container are imported into their related database.

Relevant phases

To get a more detailed overview over what is to be done we distinguish the following main phases and steps (I omit error handling in this overview which may occur at every step):

Phase I – file transfer, progress control and Zip extraction

Step I.1 – Client: Use a HTML form to choose a ZIP file (<input type=”file”>) and use methods of a specifically designed JS
Control object [CtrlO] to control subsequent actions on the client. Add parameter data (hidden input fields) and prepare an Ajax transaction for the file upload (=transfer) process.

Step I.2 – Client: Start the transfer the ZIP file over the Internet to the server. Submit a special parameter in addition to the file to trigger the provision of transfer progress information on the server. Prepare and start the Ajax communication and the data transfer by a CtrlO method.

Step I.3 – Server: Initialize the progress measurement and provide progress data in the $_SESSION array.

Step I.4 – Client/Server: Initiate a sequence of Ajax polling jobs via a JS time loop for reading the progress information on the server. Handle the Ajax response of each polling job in separate defined methods of a special CtrlO. React to error situations and stop the polling job time loop in case of errors or when the file transfer has finalized.

Step I.5 – Server: Extract, expand and save the CSV files from the Zip-container into a special upload folder on the server. This is done by using standard methods of the PHP ZIP class. Define/Suggest a sequence of imports of the data contents of the different files into file specific database tables. This defined sequence may be controlled via an array (“DB Import Control array” = DBIC-array ) which is kept and updated in the $_SESSION array on the server AND which is also sent back via a JSON object to the client.

Step I.6 – Server: Prepare and send an Ajax response in form of a JSON object to the client with affirmation messages about which CSV files have been received, the name of the files and the order in which they shall be processed. Include error messages and system messages if necessary. The JSON object shall contain the “DBIC”-array.

Step I.7 – Client: Analyze the Ajax response. Display success and error information. Display the number and name of files to be processed afterwards. Stop the time loop for polling jobs.

Phase II – Database import of a file in the pipeline

Step II.1 – Client: Prepare and start a new Ajax job with some parameters. The PHP target program of this job shall import the data of one of the already transferred CSV files. Among other things it should be defined, which file shall be processed (= imported into its associated database table) next. This parameter can follow the suggested order of the array which came from the server at the end of Phase I. All parameters can be set up in a separate (hidden) form with hidden input fields. Submit the Ajax job.

Step II.2 – Server: Start the database import on the server with a flexible PHP program. For small and medium sized files (up to approx. below 500000 lines) do it line by line by appropriate special PHP standard methods for handling CSV files. Check the data of each line where reasonable. Gather at least 20 lines in one INSERT statement to accelerate the import process. Write intermediate progress information into a $_SESSION array or a special database table. (This status information may be read by “polling jobs” started on the client.)
For huge files you may extend the import methods later on by using the special MySQL
command “LOAD DATA INFILE”.

Step II.3 – Client: Launch a sequence of status information polling jobs via a time loop. Handle the return information of each job in separate methods.

Step II.3 – Server: After a successful import of a defined file remove the file from its upload directory (delete it or move it somewhere else, e.g. in a history directory for uploads). Update you Control Array for the sequence of uploads with the following info: Which of the original files have been loaded? Which had errors? Which are still unprocessed? Prepare a JSON object for an Ajax respond (including the upload Control Array). Send it back to the client.

Step II.4 – Client: Analyze the server’s response. Stop any polling jobs issued after the previous submit. Continue with displaying information of the success of the database import of the handled file. Determine the next file to load. Continue with the elements of Step 5 described above.

Phases III + n – Client/Server – Loop:

Cycle through a sequence of Steps described under II.1 to II.4 for further phases until all files are processed or an error has occurred.

Enough for today. In the next article

CSV file upload with Zip containers, jQuery, Ajax and PHP 5.4 progress tracking – II

we shall cover some major elements of Phase I.

PHP Code Content Assists und Inline Type Hinting in Eclipse

Vor ein paar Jahren war ich sehr happy, als ich entdeckte, dass Eclipse mit dem PHP Pluging (PDT) etliche Möglichkeiten im Bereich des PHP Content Assistings bietet. So nutze ich die Möglichkeiten des Type Hintings mittels “phpDocumentor tags” ausgiebig oberhalb der Deklaration von Class Member Variablen oder bei der Definition von Funktionen (frei oder als Objekt-Methoden). Es ist einfach schön, bequem und sehr effizient, wenn “Ctrl-Space” das Tippen von neuem Code in einem PHP-Projekt mit passenden Vorschlägen aufgrund der Definition von Klassen etc. in anderen Dateien unterstützt. Fast magisch !

Was ich bisher nicht hinbekommen hatte, war das Type Hinting und “Content Assisting” zu Variablen im PHP-Code, die im Rahmen einer dynamischen Namensgebung über die Nutzung der Funktion “call_user_func” als Objekt einer Klasse instanziiert werden. Das braucht man z.B., wenn man das Singleton-Pattern benutzt und das Singleton-Objekt entweder erstmalig instanziiert werden muss oder aber die Referenz auf das ggf. schon existierende Singleton-Objekt geholt werden soll – soweit es bereits woanders instanziiert wurde.

Aber auch das geht unter Eclipse PDT mit sog. “Inline Type Hinting”. Ein Beispiel:

	$get_instance ="getInstance"; 
	$Class_SS_Run_Data_Name = "Class_SS_Run_Data";
	$ay_SS_Run_Data		= array($Class_SS_Run_Data_Name, $get_instance);

	/* @var $SS_Run_Data Class_SS_Run_Data */		
	$SS_Run_Data = call_user_func($ay_SS_Run_Data);
	unset($ay_SS_Run_Data); 		

getInstance ist hier eine statische Methode, die gemäß des Singleton Patterns entweder das Objekt instanziiert oder die Referenz auf das Singleton Objekt aus einer statischen Variablen ermittelt.

Wichtig ist hier aber die Kommentar-Zeile und ihr Aufbau. Die Klasse “Class_SS_Run_Data” ist in einer anderen Datei definiert, deren Source Pfad natürlich im Bereich des BUILD Path des PHP-Projektes liegen muss.

Tippe ich nun später “$SS_Run_Data->” im Code erhalte ich je nach Einstellungen des Content Assistings umgehend oder spätestens über Ctrl-Space eine Vorschlagsliste zu Variablen oder Methoden der Klasse. Super !

Gelernt habe ich das nach Lesen eines Artikels von Norm McGarry, der sich mal die Mühe gemacht hat, die verschiedenen Arten des “Type Hintings” für “Content Assist”-Zwecke unter Eclipse PDT zusammenzustellen. Siehe:
PHP Type Hinting with Eclipse

Herzlichen Dank an Norm McGarry!

Character sets and Ajax, PHP, JSON – decode/encode your strings properly!

Ajax and PHP programs run in a more or less complex environment. Very often you want to transfer data from a browser client via Ajax to a PHP server and save them after some manipulation into a MariaDB or MySQL database. As you use Ajax you expect some asynchronous response sent from the PHP sever back at the client. This answer can have a complicated structure and may contain a combination of data from different sources – e.g. the database or from your PHP programs.

If and when all components and interfaces [web pages, Ajax-programs, the web server, files, PHP programs, PHP/MySQL interfaces, MySQL …) are set up for a UTF-8 character encoding you probably will not experience any problems regarding the transfer of POST data to a PHP server by Ajax and further on into a MySQL database via a suitable PHP/MySQL interface. The same would be true for the Ajax response. In this article I shall assume that the Ajax response is expected as a JSON object, which we prepare by using the function json_encode() on the PHP side.

Due to provider restrictions or customer requirements you may not always find such an ideal “utf-8 only” situation where you can control all components. Instead, you may be forced to combine your PHP classes and methods with programs others have developed. E.g., your classes may be included into programs of others. And what you have to or should do with the Ajax data may depend on settings others have already performed in classes which are beyond your control. A simple example where a lack of communication may lead to trouble is the following:

You may find situations where the data transfer from the server side PHP-programs into a MySQL database is pre-configured by a (foreign) class controlling the PHP/MySQL interface for a western character set iso-8859-1 instead of utf-8. Related settings of the MySQL system (SET NAMES) affect the PHP mysql, mysqli and pdo_mysql interfaces for the control program. In such situations the following statement would hold :

If your own classes and methods do not provide data encoded with the expected character set at your PHP/MySQL interface, you may get garbage inside the database. This may in particular lead to classical "Umlaut"-problems for German, French and other languages.

So, as a PHP developer you are prepared to decode the POST or GET data strings of an Ajax request properly before transferring such string data to the database! However, what one sometimes may forget is the following:

You have to encode all data contributing to your Ajax response – which you may deliver in a JSON format to your browser – properly, too. And this encoding may depend on the respective data source or its interface to PHP.

And even worse: For one Ajax request the response data may be fetched from multiple sources – each encoded for a different charset. In case you want to use the JSON format for the response data you probably use the json_encode() function. But this function may react allergic to an offered combination of strings encoded in different charsets! So, a proper and suitable encoding of string data from different sources should be performed before starting the json_encode()-process in your PHP-program ! This requires a complete knowledge and control over the encoding of data from all sources that contribute strings to an Ajax response !

Otherwise, you may never get any (reasonable) result data back to your javascript function handling the Ajax response data. This happened to me lately, when I deployed classes which worked perfectly in a UTF-8 environment on a French LAMP system where the PHP/MySQL interfaces were set up for a latin-1 character set (corresponding to iso-8859-1). Due to proper decoding on the server side Ajax data went correctly into a database –
however, the expected complex response data comprising database data, data from files and programs were not generated at all or incorrectly.

As I found it somewhat difficult to analyze what happened, I provide a short overview over some important steps for such Ajax situations below.

Setting a character set for the PHP/MySQL interface(s)

The character code setting for the PHP/MySQL-connections is performed from the PHP side by issuing a SQL command. For the old interface mysql-interface, e.g., this may look like

$sql_unames = “SET NAMES ‘latin1′”;
mysql_query($sql_unames, $this->db);

Note that this setting for the PHP/MySQL-interfaces has nothing to do with the MySQL character settings for the base, a specific table or a table row! The NAMES settings actually prepares the database for the character set of incoming and outgoing data streams. The transformation of string data to (or from) the character code defined in your database/tables/columns is additionally and internally done inside the MySQL RDBMS.

With such a PHP/MySQL setting you may arrive at situations like the one displayed in the following drawing:

ajax_encoding

In the case sketched above I expect the result data to come back to the server in a JSON format.

Looking at the transfer processes, one of the first questions is: How does or should the Ajax transfer to the server for POST data work with respect to character sets ?

Transfer POST data of Ajax-requests encoded with UTF-8

Normally, when you transfer data for a web form to a server you have to choose between the GET or the POST mechanism. This, of course, is also true for Ajax controlled data transfers. Before starting an Ajax request you have to set up the Ajax environment and objects in your Javascript programs accordingly. But potentially there are more things to configure. Via e.g. jQuery you may define an option regarding the so called “ContentType” for the character encoding of the transfer data, the “type” of the data to be sent to the server and the “dataType” for the structural format of the response data:

$.ajaxSetup( { …..
    ContentType : ‘application/x-www-form-urlencoded; charset=UTF-8’
    type : ‘POST’
    dataType : ‘json’
…});

With the first option you could at least in principle change the charset for the encoding to iso-8859-1. However, I normally refrain from doing so, because it is not compliant with W3C-requirements. The jQuery/Ajax documentation says:

" The W3C XMLHttpRequest specification dictates that the charset is always UTF-8; specifying another charset will not force the browser to change the encoding."
(See: http://api.jquery.com/jquery.ajax/).

Therefore, I use the standard and send POST data in Ajax-requests utf-8 encoded. In our scenario this setting would lead to dramatic consequences on the PHP/MySQL side if you did not properly decode the sent data on the server before saving them into the database.

In case you have used the “SET NAMES” SQL command to activate a latin-1 encoded database connection, you must apply the function utf8_decode() to utf-8 encoded strings in the $_POST-array before you want to save these strings in some database table-
fields!

In case you want to deploy Ajax and PHP codes in an international environment where “SET NAMES” may vary from server to server it is wise to analyze your PHP/MySQL interface settings before deciding whether and how to decode. Therefore, the PHP/MySQL interface settings should be available information for your PHP methods dealing with Ajax data.

Note, that the function utf8_decode() decodes to the iso-8859-1-charset, only. For some cases this may not be sufficient (think of the €-sign !). Then the more general function iconv() is your friend on the PHP side.
See: http://de1.php.net/manual/de/function.iconv.php.

Now, you may think we have gained what we wanted for the “Ajax to database” transfer. Not quite:

The strings you eventually want to save in the database may be composed of substrings coming from different sources – not only from the $_POST array after an Ajax request. So, you need to control where from and in which charset the strings you compose come from. A very simple source is the program itself – but the program files (and/or includes) may have another charset than the $-POST-data! So, the individual strings may require a different de- or en-coding treatment! For that purpose the general “Multibyte String Functions” of PHP may be of help for testing or creating specific encodings. See e.g.: http://php.net/manual/de/function.mb-detect-encoding.php

Do not forget to encode Ajax response data properly!

An Ajax request is answered asynchronously. I often use the JSON format for the response from the server to the browser. It is easy to handle and well suited for Javascript. On the PHP the json_encode() function helps to create the required JSON object from the components of an array. However, the strings combined into a JSON conform Ajax data response object may come from different sources. In my scenario I had to combine data defined

  • in data files,
  • in PHP class definition files,
  • in a MySQL database.

All of these sources may provide the data with a different character encoding. In the most simple case, think about a combination (inclusion) of PHP files which some other developers have encoded in UTF-8 whereas your own files are encoded in iso-8859-1. This may e.g. be due to different standard settings in the Eclipse environments the programmers use.

Or let’s take another more realistic example fitting our scenario above:
Assume you have to work with some strings which contain a German “umlaut” as “ü”, “ö”, “ä” or “ß”. E.g., in your $_POST-array you may have received (via Ajax) some string “München” in W3C compliant UTF-8 format. Now, due to database requirements discussed above you convert the “München” string in $_POST[‘muc’] with

$str_x = utf8_decode($_POST[‘muc’]);

to iso-8859-1 before saving it into the database. Then the correct characters would appear in your database table (a fact which you could check by phpMyAdmin).

However, in some other parts of your your UTF-8 encoded PHP(5) program file (or in included files) you (or some other contributing programmers) may have defined a string variable $str_x that eventually also shall contribute to a JSON formatted Ajax response:

$str_y = “München”;

Sooner or later, you prepare your Ajax response – maybe by something like :

$ay_ajax_response[‘x’] = $str_x;
$ay_ajax_response[‘y’] = $str_y;
$ajax_response = json_encode($ay_ajax_response);
echo $ajax_response;

n
(Of course I oversimplify; you would not use global data but much more sophisticated things … ). In such a situation you may never see your expected response values correctly. Depending on your concrete setup of the Ajax connection in your client Javascript/jQuery program you may not even get anything on the client side. Why? Because the PHP function json_encode() will return “false” ! Reason:

json_encode() expects all input strings to be utf-8 encoded !

But this is not the case for your decoded $str_x in our example! Now, think of string data coming from the database in our scenario:

For the same reason, weird things would also happen if you just retrieved some data from a database without thinking about the encoding of the PHP/MySQL interface. If you had used “SET NAMES” to set the PHP/MySQL interface to latin-1, then retrieved some string data from the base and injected them directly – i.e. without a transformation to utf-8 by utf8_encode() – into your Ajax response you would run into the same trouble as described in the example above. Therefore:

Before using json_encode() make sure that all strings in your input array – from whichever source they may come – are properly encoded in UTF-8 ! Watch out for specific settings for the database connection which may have been set by database handling objects. If your original strings coming from the database are encoded in iso-8859-1 you can use the PHP function ut8_encode() to get proper UTF-8 strings!

Some rules

The scenario and examples discussed above illustrate several important points when working with several sources that may use different charsets. I try to summarize these points as rules :

  • All program files should be written using the same character set encoding. (This rule seems natural but is not always guaranteed if the results of different developer groups have to be combined)
  • You should write your program statements such that you do not rely on some assumed charsets. Investigate the strings you deal with – e.g. with the PHP multibyte string functions “mb_….()” and test them for their (probable) charset.
  • When you actively use “SET NAMES” from your PHP code you should always make this information (i.e. the character code choice) available to the Ajax handling methods of your PHP objects dealing with the Ajax interface. This information is e.g. required to transform the POST input string data of Ajax requests into the right charset expected by your PHP/MySQL-interface.
  • In case of composing strings from different sources align the character enprintcoding over all sources. Relevant sources with different charsets may e.g. be: data files, data bases, POST/GET data, ..
  • In case you have used “SET NAMES” to use some specific character set for your MySQL database connection do not forget to decode properly before saving into the database and to encode data fetched from the base properly into utf-8 if these data shall be part of the Ajax response. Relevant functions for utf-8/iso-8859-1 transformations may be utf8_encode(), utf8_decode and for more general cases iconv().
  • If you use strings in your program that are encoded in some other charset than utf-8, but which shall contribute to your JSON formatted Ajax response, encode all these strings in utf-8 before you apply json_encode() ! Verify that all strings are in UTF8 format before using json_encode().
  • Always check the return value of json_encode() and react properly by something like
    if (json_encode($…) === false
    ) {
    …. error handling code …
    }
  • Last but not least: When designing your classes and methods for the Ajax handling on the PHP side always think about some internal debugging features, because due to restrictions and missing features you may not be able to fully debug variables on the server. You may need extra information in your Ajax response and you may need switches to change from a Ajax controlled situation to a standard synchronous client/server situation where you could directly see echo/print_r – outputs from the server. Take into account that in some situation you may never get the Ajax response to the client …

I hope, these rules may help some people that have to work with jQuery/Ajax, PHP, MySQL and are confronted with more than one character set.