Go4Expert

Go4Expert (http://www.go4expert.com/)
-   Web Development (http://www.go4expert.com/articles/web-development/)
-   -   Database Vs Files (http://www.go4expert.com/articles/database-vs-files-t283/)

shabbir 29Apr2005 11:17

Database Vs Files
 
1 Attachment(s)
There has been lots of web developer arguing about the fact as which one is faster Database or Files. I always replied in most cases as database is faster than Flat file and After getting that reply they always argued as why do you think that database is faster. I explained them with lots of words but now thought of just bench marking them so that its clear as what is faster.

Before going into the codes I would like to explain about the attachments
shabbir.txt - A Flat file that has 22400 names stored with one line containing one record.
benchmark.sql - SQL Script to create a table with the same data as the flat file contains. Remember that it also has same order as the flat file

Now here is the PHP code to be put in a file
PHP Code:

      <?php
      
      
echo("<h3>Test without condition!</h3>");
      echo 
"Reading File ... <BR />";
      
      
$start explode(" ",microtime());
      
$beginfile $start[1]+$start[0];
      
      
$h fopen("shabbir.txt","r");
      if (!
$h) return;
          
$msg="";
      
      while (!
feof($h))
          
fgets($h);
      
      
$start explode(" ",microtime());
      
$endfile $start[1]+$start[0];
      
      echo 
"Reading Database ... <BR />";
      
      
$start explode(" ",microtime());
      
$begindb $start[1]+$start[0];
      
      
$db=mysql_connect("localhost","root","");
      
mysql_select_db("benchmark");
      
$r mysql_query("Select f1 from t1");
      
      while(
$f mysql_fetch_array($r));
      
      
$start explode(" ",microtime());
      
$enddb $start[1]+$start[0];
      
      echo 
"Result of benchmark";
      
      
$resultfile $endfile $beginfile;
      
$resultdb $enddb $begindb;
      
      echo 
"<BR />Flat File time span to do the operation ==> $resultfile .";
      echo 
"<BR />Database time span to do the operation ==> $resultdb .<br/>";
      
      if(
$resultdb>$resultfile)
      {
          
$diff=$resultdb $resultfile;
          
printf("File operation was faster by %f",$diff);
      }
      else
      {
          
$diff=$resultfile-$resultdb;
          
printf("Database operation was faster by %f",$diff);
      }
      
      
mysql_close($db);
      
fclose($h);
      
      
//NEW TEST
      
echo("<h3>Test with condition where names start with 'A'!</h3>");
      echo 
"Reading File ... <BR />";
      
$start explode(" ",microtime());
      
$beginfile $start[1]+$start[0];
      
      
$h fopen("shabbir.txt","r");
      
      while (!
feof($h))
      {
          
$line=fgets($h);
          if(
eregi("^a",$line));
      }
      
      
$start explode(" ",microtime());
      
$endfile $start[1]+$start[0];
      
      echo 
"Reading Database ... <BR />";
      
      
$start explode(" ",microtime());
      
$begindb $start[1]+$start[0];
      
      
$db=mysql_connect("localhost","root","");
      
mysql_select_db("benchmark");
      
$r mysql_query("Select f1 from t1 where f1 like 'a%'");
      
      while(
$f mysql_fetch_array($r));
      
      
$start explode(" ",microtime());
      
$enddb $start[1]+$start[0];
      
      echo 
"Result of benchmark";
      
      
$resultfile $endfile $beginfile;
      
$resultdb $enddb $begindb;
      
      echo 
"<BR />Flat File time span to do the operation ==> $resultfile .";
      echo 
"<BR />Database time span to do the operation ==> $resultdb .<br/>";
      
      if(
$resultdb>$resultfile)
      {
          
$diff=$resultdb $resultfile;
          
printf("File operation was faster by %f",$diff);
      }
      else
      {
          
$diff=$resultfile-$resultdb;
          
printf("Database operation was faster by %f",$diff);
      }
      
      
//NEW TEST WRITING
      
echo("<h3>Test writing to Database & File</h3>");
      echo 
"Writing to File ... <BR />";
      
$start explode(" ",microtime());
      
$beginfile $start[1]+$start[0];
      
      
$newfile=fopen("shabbir.txt",'a');
      
fputs($newfile,"Name");
      
fclose($newfile);
      
      
$start explode(" ",microtime());
      
$endfile $start[1]+$start[0];
      
      
$start explode(" ",microtime());
      
$begindb $start[1]+$start[0];
      
      echo 
"Writing to Database ... <BR />";
      
$db=mysql_connect("localhost","root","");
      
mysql_select_db("benchmark");
      
$r mysql_query("insert into t1(f1) values('Name')");
      
      
$start explode(" ",microtime());
      
$enddb $start[1]+$start[0];
      
      echo 
"Result of benchmark";
      
      
$resultfile $endfile $beginfile;
      
$resultdb $enddb $begindb;
      
      echo 
"<BR />Flat File time span to do the operation ==> $resultfile .";
      echo 
"<BR />Database time span to do the operation ==> $resultdb .<br/>";
      
      if(
$resultdb>$resultfile)
      {
          
$diff=$resultdb $resultfile;
          
printf("File operation was faster by %f",$diff);
      }
      else
      {
          
$diff=$resultfile-$resultdb;
          
printf("Database operation was faster by %f",$diff);
      }
      
?>

Results for my machine *

Test without condition!

Reading File ...
Reading Database ...
Result of benchmark
Flat File time span to do the operation ==> 0.087211132049561 .
Database time span to do the operation ==> 0.13485312461853 .
File operation was faster by 0.047642

Test with condition where names start with 'A'!


Reading File ...
Reading Database ...
Result of benchmark
Flat File time span to do the operation ==> 0.11205410957336 .
Database time span to do the operation ==> 0.018458127975464 .
Database operation was faster by 0.093596

Test writing to Database & File


Writing to File ...
Writing to Database ...
Result of benchmark
Flat File time span to do the operation ==> 0.00011086463928223 .
Database time span to do the operation ==> 0.00024604797363281 .
File operation was faster by 0.000135

* can vary the time from machine to machine and from run to run depending on webserver loads and other network factors

Inference

As we can see from the above results
File operations were faster when there is no condition applied to the results and so the first impression we get is Files are faster than database but if you take the scenario into real time site then hardly there is any time you fetch all the records from the database and so for consideration we can ignore this.

Now applying a very simple condition where names start with 'A' and we see that the database operation is much faster than and practically similar operations are done more frequently on database where its database which is faster.

All the sites tend to follow WORM - write once read many and so the write operation in files is faster but again this operation is not that considerable speed.

coderzone 1Apr2006 18:18

Re: Database Vs Files
 
No doubt db.

rapwaydown 8Dec2007 05:57

Re: Database Vs Files
 
i still dont get the file and the database thing

ReekenX 21Jan2008 19:20

Re: Database Vs Files
 
Database is more secure to use than files, I think.

venkatesanj@hcl.in 25Jan2008 18:05

Re: Database Vs Files
 
1. Database provides a structured way of arranging datas instead of storing in irregular way

2. Logical and physical grouping and faster retrieval are some of the major points to be noted in db.

3. Querying and easy manipulation of stored datas can be achieved.

Regards,
Venkatesan Prabu. J

Magena 5Sep2008 11:57

Re: Database Vs Files
 
Hi

I also think DB is more secure than files.

thanks

Blagoj 17Oct2008 04:45

Re: Database Vs Files
 
Databases are better organized and utilize complex queries while flat file is just a file.
Databases are also faster.

JGRobinson 17Nov2008 05:07

Re: Database Vs Files
 
It basically depends on what you are doing, and how busy the hardware is - horses for courses. But usually we are complex animals, and databases rock for that...

hkp819 6Dec2008 18:31

Re: Database Vs Files
 
I also prefer database storage.

The rest of my application data is already in there. A database can have
better security, because it has its own permission and authentication
system, while file-based sessions often end up owned by "nobody", which
other users on a shared system can access.

I'm not so concerned about "faster", since databse storage is "fast
enough".

I think the only reason /not/ to use database storage is because the
database isn't be used for anything else. In that case, I would consider
file storage for simplicity.

xpi0t0s 7Dec2008 02:18

Re: Database Vs Files
 
Faster at what? A database will almost certainly be faster than writing your own code to do everything done, because it will already be debugged and optimised.

But if you're trying to say that accessing the same data through a database is faster than reading the data directly from a file with stuff like fopen() fread() then you need your head examining. A database will need to lookup the index, work out which block it's in, setup read consistency, check for exclusive locks that would prevent you reading the data, and multitask that with all the other stuff that goes on. There's no way on this planet that database access will be quicker than simple file access.

You wouldn't choose a database just because you need to store some data, you would choose it for the features it provides and for the fact that you don't have to write all that code yourself. If you don't need a database then just bung everything in flat files and take the performance benefit.

> A database can have
better security, because it has its own permission and authentication
system, while file-based sessions often end up owned by "nobody"

Wrong. Who owns the database files? Security of the filesystem is not a reason to pick database over flat file storage; make sure the files are owned by the relevant people and standard OS security will do the job. On the other hand, if the database files are owned by "nobody" then you get exactly the same problem: anyone can do anything with those files. The fact that you don't know how to setup security on data files is not an argument for a database.

> Test with condition where names start with 'A'!
> Flat File time span to do the operation ==> 0.11205410957336 .
> Database time span to do the operation ==> 0.018458127975464 .
> Database operation was faster by 0.093596

You have to do same thing of course for the comparison to be meaningful. In the "names beginning with A" test where the database was faster, the non-database was hampered by the fact that it was doing a regexp comparison of "^a" on each record; that's going to be a hell of a lot slower than doing "if str[0]=='a'" which is basically all that a "like 'A%'" has to do, or maybe "like 'A%'" will do a strncmpi() or similar). The database part of it used "$r = mysql_query("Select f1 from t1 where f1 like 'a%'");" - like and eregi are completely different operations and I would suggest to be fair you should make the database do a regexp lookup instead and see how well it fares.

Also I think the test for writing to a database and file could be improved substantially. Most of the time is going to be taken in opening and closing the file in the file test, and connecting and disconnecting in the database test. Instead of writing a single record, write 1,000,000 and divide the resulting time by 1,000,000 to get a time per record instead; this will be a far more meaningful comparison. Also you could open the file and connect to the database FIRST, then start the timing, write the data, stop the timing, then close the file and disconnect, then the timings will only relate to the record write and not to all the other gubbins.

File access WILL be faster, no question, because except for in-memory databases, a database will have to get the data from a file anyway, plus all the other RDBMSy stuff it has to do. Of course, it's perfectly possible to write highly optimised code against the database and really bad code against the files, and the results will be skewed in favour of the database. That doesn't prove the database is faster, it only proves that crap code is slower than good code.

Also you'll need to run the benchmark test several times to eliminate caching effects. The first run will be skewed by the fact that everything has to be loaded from disk and subsequent runs will be more reliable.


All times are GMT +5.5. The time now is 19:38.