Not recommended for deletion.
Odd, but the System V and Solaris servers I have managed never seemed to have a file bloat problem, and file system checks and backups never seemed to take longer than on a typical Windows server.
But yeah, let's blame Unix.
This. I read the article, and the author never explains how lots of little files are the fault of "Unix." He then goes on to mention the symlinks created by Time Machine, and Spotlight. My Linux system has just shy of 1 million files on it, a large part of which are git object files, but I'm not blaming "Unix" for that.
Millions of tiny flies? Ugh. I knew there was a reason I didn't like Macs.
That's how I read it first too, and I was thinking "I don't recall seeing any swarms when I had a Macbook"
Well, they're, like, super tiny.
Not in detail, no. He says "There is something about Unix that loves a multiplicity of tiny files rather than monolithic larger ones, hence these huge counts." Which is true. Unix believes in aggregating small programs to do operations. Do one thing, do it well. Chain those things together. This good behavior is, to use your term, the fault of Unix.
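As a minimal sketch of that "small tools chained together" idea, here is the classic find-piped-into-wc file count driven from Python (the path is just a placeholder; any directory will do):

```python
# Minimal sketch of the Unix "chain small tools" idea:
# pipe `find` (list every file) into `wc -l` (count lines).
# The path is a placeholder; point it anywhere you like.
import subprocess

find = subprocess.Popen(
    ["find", "/usr/share", "-type", "f"],
    stdout=subprocess.PIPE,
)
count = subprocess.run(
    ["wc", "-l"],
    stdin=find.stdout,
    capture_output=True,
    text=True,
)
find.stdout.close()
find.wait()
print("files under /usr/share:", count.stdout.strip())
```

Each tool stays tiny and single-purpose; the pipe does the aggregation.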
He also goes on to point out that the number of files is not a big deal these days: "With HFS+, however, a drive can have nearly 4.3 billion files without each consuming unreasonable amounts of space." He goes into some detail on Time Machine, then quotes somebody: "I wouldn't worry about it too much - let the system handle its files, and don't be too concerned about the count. As long as your drive isn't mysteriously filling up, you're good."
Blame for what? For not having a giant-monolithic-slow-file approach like Windows, as the article implies?
What's the fuss, Gus? Did you actually read the article?
With HFS+, a room can have nearly 4.3 billion flies without each consuming unreasonable amounts of space.
The noise of their wings and mouthparts as they consume the moisture from your eyeballs is another story.
Remember that one time, in Wisconsin, where Mayflies showed up on the radar?
Do you know where Woz was on that day?
I wondered this myself. The best I could find was Unix philosophy - Wikipedia, which suggests this grew out of a tradition of building "lots of little things which only do one task" vs "larger things which do multiple tasks". I believe at this point it is a matter of tradition rather than engineering intent.
In the case of the Mac, file counts are also high because of the way Macintosh apps and files are "packaged". Though Mac applications, for example, appear as a single file to the user, in fact they are folders holding all of the text, foreign-language translations, icons, graphics, etc. that are seen in that program. To my knowledge no other operating system does that; either these "support" materials are built into the application or some other type of container is used. I think this originated during the Mac OS 9 to Mac OS X transition. Few people here probably remember this, but old-school Mac files used to have the idea of a separate "fork" or storage area where these support data were stored; Mac OS X does not, so "packages as folders of little files" had to be created to hold the material that used to be stored there.
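For a sense of scale, here is a rough sketch (Python; the app path is just a placeholder, any installed application would do) that walks a .app bundle, which is really just a directory, and counts what it contains:

```python
# Rough sketch: a macOS .app "file" is really a directory tree.
# Walk it and count the individual files it contains.
# The bundle path below is a placeholder example.
import os

bundle = "/Applications/Safari.app"  # placeholder; any .app will do

total_files = 0
total_bytes = 0
for root, dirs, files in os.walk(bundle):
    for name in files:
        path = os.path.join(root, name)
        try:
            total_bytes += os.path.getsize(path)
        except OSError:
            continue  # skip anything unreadable
        total_files += 1

print(f"{bundle}: {total_files} files, {total_bytes / 1e6:.1f} MB")
```

Run that against a few applications and a large overall file count stops looking mysterious.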
# Macs have millions of tiny files?
I think I see the problem.
/former Windows user due to work requirements
Will this help?
I'll let you know! I ordered it the first time Mark posted it.
Yes, and now that Mac app packages are bundles of files, they're more likely to be able to survive being copied through file servers that don't support file forks. There are fewer issues with less computer-literate users sending files through means that don't grok file forks.
Some people may remember that versions of Mac OS X before 10.5 supported booting off of UFS, which did not support file forks. There was probably considerable internal debate at Apple over whether HFS+ would be viable.
So the proliferation of files is probably more about supporting the Mac OS X experience when it interacts with non-Macs/non-HFS rather than anything attributable to Unix.
Talk about a completely information-free post.
I've noticed that archives created on Macs seem to have all these ._ files, usually in a ._MACOS (or something) subdirectory, that seem to be thumbnail files, but I'm not even sure that's what they are. They seem to be a bit too small.
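Rather than guessing, a quick sketch like this (Python; the archive name is a placeholder) lists those ._ entries in a zip, usually grouped under a __MACOSX folder, along with their sizes:

```python
# Quick look inside a Mac-created zip: list the "._*" entries
# (usually grouped under a "__MACOSX" folder) and their sizes.
# The archive name is a placeholder.
import os
import zipfile

archive = "example-from-a-mac.zip"  # placeholder

with zipfile.ZipFile(archive) as zf:
    for info in zf.infolist():
        base = os.path.basename(info.filename)
        if base.startswith("._") or info.filename.startswith("__MACOSX/"):
            print(f"{info.filename}: {info.file_size} bytes")
```

The small sizes suggest per-file metadata rather than thumbnails, but the listing lets you check for yourself.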
Blaming it on Unix is the real WTF. I've run everything from TRS-XENIX (go ahead and laugh) to IBM's version, AIX. I've never seen that number of files, or those hidden subdirectories/files either. Some hidden files and directories, sure. But it really doesn't explain Mac behavior; this is a "just cuz!" article.
Deleting those files has never caused me a problem, at least on Windows.
Trust me, I'm no fan of the old resource fork (though when I was a kid I found manipulating those resources a great way to learn about computers / cheat at computer games). Still, there are other ways they could have handled this, for example using a compressed or uncompressed zip file as a container (as Java apps, epub books, and some Microsoft Office files do), which would have protected Mac files even more effectively when resident on or accessed by non-Mac systems.
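For what it's worth, a rough sketch of that container idea (Python; the folder and archive names are placeholders): pack a bundle-style folder of many small files into a single zip, the way .jar and .epub files do.

```python
# Sketch of the "single zip as a container" idea:
# pack a bundle-style folder of many small files into one archive,
# the way .jar and .epub containers do. Paths are placeholders.
import os
import zipfile

bundle_dir = "MyApp.app"  # placeholder folder full of small files
container = "MyApp.zip"   # the single file the outside world would see

with zipfile.ZipFile(container, "w", zipfile.ZIP_DEFLATED) as zf:
    for root, dirs, files in os.walk(bundle_dir):
        for name in files:
            path = os.path.join(root, name)
            zf.write(path, arcname=os.path.relpath(path, bundle_dir))

print(f"packed {bundle_dir} into a single file: {container}")
```

The trade-off is that every launch or edit then has to go through the container instead of touching files directly.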
All that said… this is all a non-explanation for a non-problem. Shouldn't matter at all how many files are around on a modern system. Focussing on that metric encourages micromanagement and the use of dubious third-party utilities to "optimize" things that don't really need optimizing. (That said, all systems seem to have issues with "left behind" data after product uninstalls; no one seems to have solved that problem in any systematic way.)
Firstly… why are millions of tiny files a problem? A decent file system can handle this. Some may say that a monolithic single file accessed as a database is better, but that very much depends on the use case. Each method has its pros and cons.
Secondly, there is no real link to UNIX. Even the UNIX philosophy, as mentioned by others here, doesn't mean you'll have millions of files. It means that the tools you use are small, focused and capable of being used in pipes that feed data from one tool to the next, to get the desired result.
That does mean you might have some more configuration files. But many of the tools don't have - or need - any configuration files. There may be a few scripts to accomplish common tasks, but they won't be "millions of files".
The two tools that are blamed for the bloat - Spotlight and Time Machine - are Apple's own software, and are (as I understand it) monolithic products. Not built in the UNIX philosophy.
The rest of the files in these "millions of files" could be due to application packaging, as someone else suspects. And a small number of them might be hidden folders making up for a lack of forking support in OS X - but again, probably not millions. A more likely source would be caches for web browsers and media players (web browsers in particular can often have tens or hundreds of thousands of files in their caches, in my experience).
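If anyone wants to see where their own count comes from, a rough sketch like this (Python; the starting path is just a placeholder) tallies files per top-level folder, which is a quick way to spot an oversized browser cache:

```python
# Rough sketch: tally file counts per top-level folder under a
# starting directory, to see where the "millions of files" live.
# The starting path is a placeholder; point it wherever you like.
import os
from collections import Counter

start = os.path.expanduser("~/Library")  # placeholder starting point

counts = Counter()
for root, dirs, files in os.walk(start, onerror=lambda e: None):
    rel = os.path.relpath(root, start)
    top = rel.split(os.sep)[0] if rel != "." else "."
    counts[top] += len(files)

for folder, n in counts.most_common(10):
    print(f"{n:>10}  {folder}")
```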
All the candidates I can think of (iTunes, Safari) and the two culprits in the article (Spotlight, Time Machine) are monolithic software written by Apple, and have very little to do with UNIX except that they run on a variant of UNIX.
And I'll go back to my original opening - at no point does the article really cover why millions of files are a problem.
So I start thinking about the times I've dealt with large numbers of files in my career, and what downsides that brought with it. I've moved terabytes of data from one disk to another in enterprise environments - scheduled changes that required hours or even days of downtime to accomplish successfully. I'm well aware of the overhead that the findfirst/findnext filesystem behaviour adds to a copy of many files, be it local or across the network. It's there, but in my experience it's negligible - it might delay the copy by a few minutes per hour of data to be copied, if you're using an efficient program (like rsync/robocopy/richcopy).
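To get a feel for that per-file metadata overhead, a rough sketch like this (Python; the path is a placeholder) times how long it takes just to enumerate and stat a tree, before a single byte of file content is copied:

```python
# Rough sketch: time the per-file metadata work (enumerate + stat)
# for a directory tree. This is the overhead that grows with file
# count regardless of total data size. The path is a placeholder.
import os
import time

tree = "/usr"  # placeholder tree containing lots of files

start = time.perf_counter()
files = 0
for root, dirs, names in os.walk(tree, onerror=lambda e: None):
    for name in names:
        try:
            os.stat(os.path.join(root, name))
        except OSError:
            continue
        files += 1
elapsed = time.perf_counter() - start

print(f"stat'd {files} files in {elapsed:.1f}s "
      f"({files / elapsed:.0f} files/sec)")
```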
Another downside that occurs to me is that if you need to do a disk integrity check on the filesystem, it will take longer. But filesystems like ext4 show that even that can be sped up by sane design - I've seen fsck times drop to a tenth of the ext3 times on the same data. So if that's an issue, then the fault is still with Apple as they have control over the HFS+ filesystem…
The only other downside I can think of is increased disk usage due to allocation policies - the fact that a 1-byte file will probably take up an entire allocation unit, which on a >1 GB HFS+ volume will be 4 KB. This is a problem that has been solved elsewhere - it's called block suballocation, and it allows a block to be divided amongst smaller files. Another possible solution is to store small files in the filesystem structure itself, alongside other metadata, to avoid using an allocation block for the file content at all. Again, as with integrity checking, HFS+ is Apple's technology and not a UNIX one, so this is an Apple problem.
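A rough sketch of measuring that slack on a real tree (Python; the path is a placeholder, and it assumes a Unix-like stat() where st_blocks is reported in 512-byte units): compare each file's logical size with the space actually allocated for it.

```python
# Rough sketch: estimate allocation slack, i.e. the gap between the
# space allocated for each file and the bytes it actually contains.
# Assumes a Unix-like stat() where st_blocks is in 512-byte units.
# The starting path is a placeholder.
import os

tree = os.path.expanduser("~")  # placeholder

logical = 0
allocated = 0
for root, dirs, names in os.walk(tree, onerror=lambda e: None):
    for name in names:
        try:
            st = os.stat(os.path.join(root, name), follow_symlinks=False)
        except OSError:
            continue
        logical += st.st_size
        allocated += st.st_blocks * 512

print(f"logical:   {logical / 1e9:.2f} GB")
print(f"allocated: {allocated / 1e9:.2f} GB")
print(f"slack:     {(allocated - logical) / 1e6:.1f} MB")
```

Sparse or transparently compressed files can push individual numbers the other way, so treat the total as an estimate.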
I can't think of any other downsides off the top of my head…
The upsides are clear - multiple small files can be more resilient and survivable in the event of a software failure (due to a crash or just a bug). If you stored all your files in one big DB, a bug in any application you use could potentially wipe out all your data. Also, if the container file on the disk for that DB gets corrupted, you're more likely to lose everything. One big DB is one big point of failure, and one big risk.
None of which has anything to do with UNIX per se. The linked article is one of the worst I've read in recent memory - foundationally unsound, astonishingly misleading, and unable to bear any informed rational scrutiny.
tl;dr - 0/10, would not read again.
I read it. I also read the part where the author believes the number of files causes slowdowns for file system checks and backups. My point is that Unix systems do not have this problem.
I began on the AS/400, moved to System V and MUMPS on VAX Alpha servers, and now maintain both Debian and Windows servers. I compared the file system check and backup times to Windows servers because most people are familiar with Windows, which makes for a good common comparison.