I don't think you can simply say it's either of them.
I have some experience with file caching, and it is definitely faster when your database server is under load, thrashing the CPU but not accessing the disk.
On one server with a totally broken D5 installation and no opportunity to install memcache, I managed to install cacherouter with file caching on a ramdisk (shm). It's super fast.
However, file caching has consequences: several Drupal functions call file_scan_directory, and if you have too many files in the cache (100k+), it will slow down your installation (on node edit, create and delete; in my case by about 60 seconds!). I had to set up a cron script to delete the cache during the night.
You should always try to use memcache/APC/xcache, as they are RAM-based and the fastest solution. Fall back to the others only if you have no other choice.
Jakub
http://technoergonomics.com
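A nightly cleanup like the cron script mentioned above could look roughly like this. It is only a sketch, not the script actually used: the function name purge_old_cache_files is made up, it assumes PHP 5.3 or later, and it deletes by file age instead of wiping the whole cache.

```php
<?php
// Hypothetical cron helper (not part of Drupal or cacherouter): deletes
// regular files under $dir that are older than $max_age seconds and
// returns how many files were removed.
function purge_old_cache_files($dir, $max_age) {
  $deleted = 0;
  $cutoff = time() - $max_age;
  // Walk the cache directory recursively, skipping "." and "..".
  $iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
  );
  foreach ($iterator as $file) {
    // Only touch regular files; directories are left in place.
    if ($file->isFile() && $file->getMTime() < $cutoff) {
      if (unlink($file->getPathname())) {
        $deleted++;
      }
    }
  }
  return $deleted;
}
```

Called from cron with the cache directory and, say, 86400 as the maximum age, this keeps the file count down without emptying the cache completely.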
100k files is a lot.
Does the file caching system work with subfolders?
I ask because one limitation of the standard Linux file system is that only 32k files can be put in a single
directory. Any attempt to add more files causes pretty strange file errors. (Windows doesn't have that
limit, by the way.)
Also, without a file caching system, most file systems start to lose performance after 2-5k files in a single
directory.
I have no links at hand, but you can search for both topics on Google.
We have an MMORPG server that saves the player files, and it was running into the 32k limit.
A good way to overcome that is to use the first 2 characters of a file name as the index for a subdirectory.
For example, SourceForge stores projects and users that way,
like this:
a
-aa
-ab
-ac...
b
-ba
-bc
c
..
where the first character and then the first 2 characters name the directory and its subdirectory.
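The layout above can be sketched in a few lines of PHP. This is only an illustration under my own assumptions: the helper name sharded_path is made up, and lowercasing the name and padding short names with "_" are my choices, not something SourceForge or any Drupal module is known to do.

```php
<?php
// Hypothetical helper: maps a file name to a two-level sharded path,
// e.g. "abc.cache" -> "a/ab/abc.cache".
function sharded_path($filename) {
  // Lowercase the name and keep only [a-z0-9] for the directory names.
  $key = preg_replace('/[^a-z0-9]/', '', strtolower($filename));
  // Pad short names so there are always two characters to index on.
  $key = str_pad($key, 2, '_');
  // First character is the directory, first two are the subdirectory.
  return $key[0] . '/' . substr($key, 0, 2) . '/' . $filename;
}
```

With lowercase letters and digits only, that gives at most 36 top-level directories with 36 subdirectories each, so even a few hundred thousand files stay at a few hundred per directory if the names are spread evenly.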
When it comes to caching and files, some of the Drupal core functions are also critical.
For example, avatar uploads and the like can run into the large-directory problem and trigger a
well-hidden performance trap.
Related Boost Issue
http://drupal.org/node/410730
I am well aware of the problems with large directories. I am just passing on my experience.
By the way, current file system implementations (ext3, etc.) don't have a problem with more than 32k files in one directory, although I would never recommend it, because you lose performance and handling such a directory is extremely difficult.
Exactly. The performance loss is an even bigger problem because it is often triggered by a fairly
low number of files. Worse still, it is naturally tied to the bus/storage system of your host,
which can differ a lot from provider to provider, and you often have no real control over that
part of your server.
It sounds like one of those overhyped theoretical things, but it is a real and easily triggered problem,
because it is so hidden and silent and unrelated to normal work with LAMP and PHP.
For example, I installed IMCE, and for the file access I added a line like
php: return 'users/'.round(($user->uid) / 4000) .'/'.$user->name;
to ensure a clean structure.
I recommend using similar structures everywhere when it comes to storing files dynamically, and everyone
should check that the installed file caching system does this properly too (to come back to the topic).
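As a possible variant of the line above (a sketch only; the function name user_directory and the $bucket_size parameter are made up for illustration), floor() can be used instead of round(). With round(), uids 0 to 1999 land in directory 0 and every later directory gets 4000 uids, so the first bucket is half the size of the rest; with floor(), every bucket covers exactly 4000 uids.

```php
<?php
// Hypothetical helper: builds a per-user directory path with a fixed number
// of uids per bucket. floor() keeps all buckets the same size.
function user_directory($uid, $name, $bucket_size = 4000) {
  $bucket = (int) floor($uid / $bucket_size);
  return 'users/' . $bucket . '/' . $name;
}
```

Either way the idea is the same: no single directory ever holds more than a few thousand user folders.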
This would be a great patch to improve the scalability of IMCE. Why not submit a patch to the IMCE queue? It could also be applicable to filefield.