Talk:Comparison of file comparison tools
From Wikipedia, the free encyclopedia
Would be useful to know which tools can handle large volumes of data without needing memory for each file processed (even if no differences were found). I need to compare directories with terabytes of data (millions of files). Even DirDiff on the Amiga could do this (and still can, in the emulation), but every tool I tried on Windows runs out of memory because it seems to allocate memory for each file it processes.
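For illustration only, here is a minimal Python sketch of what a comparison that keeps no per-file state could look like. It is not taken from DirDiff or any tool listed in the article. The names `same_content` and `compare_trees` and the 1 MiB chunk size are assumptions made for this example. Memory use is bounded by the name list of one directory per nesting level plus two read buffers, however many files match.

```python
import os

CHUNK = 1 << 20  # bytes read per step; an arbitrary choice for this sketch


def same_content(path_a, path_b, chunk=CHUNK):
    """Compare two regular files block by block, holding at most two blocks."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk)
            block_b = fb.read(chunk)
            if block_a != block_b:
                return False
            if not block_a:  # both files ended at the same point
                return True


def compare_trees(root_a, root_b):
    """Lazily yield (status, relative_path); matching files leave no trace."""
    names_a = set(os.listdir(root_a))
    names_b = set(os.listdir(root_b))
    for name in sorted(names_a | names_b):
        path_a = os.path.join(root_a, name)
        path_b = os.path.join(root_b, name)
        if name not in names_b:
            yield ("only_in_a", name)
        elif name not in names_a:
            yield ("only_in_b", name)
        elif os.path.isdir(path_a) and os.path.isdir(path_b):
            # Recurse; results stream out as they are found.
            for status, rel in compare_trees(path_a, path_b):
                yield (status, os.path.join(name, rel))
        elif os.path.isdir(path_a) or os.path.isdir(path_b):
            yield ("type_mismatch", name)
        elif not same_content(path_a, path_b):
            yield ("differs", name)
```

Because `compare_trees` is a generator, a caller can print or log each difference as it arrives instead of collecting a result list, which is the property the comment above asks about.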
I am looking for a file and directory tree comparer for Mac OS versions before X... Please don't let this fall by the wayside! Thanks!!
Suggestion: addition of samefile to tables. (203.87.122.227 01:11, 3 January 2007 (UTC))
The last table should be axed
- Documents
- What documents? HTML documents can be compared by all the utilities.
- Binary (display) Hexadecimal
- All the utilities can be made to do these. Pipes.
- Not true at all! Pdev3
- Tables
- All the utilities can compare tables in all the file formats they can handle.
- Metadata
- Seriously? What is this supposed to mean?
- Graphics
- All utilities can compare at least three types of graphics files.
- Not true at all! Pdev3
If done properly, this table would wind up containing only lots of yesses. Axe it. Shinobu 21:52, 21 March 2006 (UTC)
(Proposed merge: see talk:file comparison) Shinobu 12:02, 8 July 2006 (UTC)
@Pdev3: that you don't know how to do something doesn't mean it can't be done. In fact the things you tagged with "Not true at all!" are quite easy to do. Shinobu 16:17, 18 September 2006 (UTC)
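As one possible reading of the "Pipes" remark above, a binary file can be turned into a text hex dump and the dumps fed to any line-based diff. The Python sketch below only illustrates that idea; `hex_lines` and `hex_diff` are hypothetical helper names, and the dump layout (offset plus 16 bytes per line) is an assumption, not the format of any particular utility.

```python
import difflib


def hex_lines(data, width=16):
    """Render bytes as text lines: an 8-digit hex offset, then hex byte pairs."""
    return [
        f"{offset:08x}  {data[offset:offset + width].hex(' ')}"
        for offset in range(0, len(data), width)
    ]


def hex_diff(data_a, data_b, name_a="a", name_b="b"):
    """Return a unified diff of the two hex dumps as a list of lines."""
    return list(difflib.unified_diff(
        hex_lines(data_a), hex_lines(data_b),
        fromfile=name_a, tofile=name_b, lineterm="",
    ))
```

Note that with a fixed-width dump a single inserted byte shifts every later line, so this works best for in-place changes. It does not settle whether that counts as "handling" binary files in the sense Pdev3 means.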
Need info on handling document formats!
One of the important features is the ability to handle .doc files, .xml files, etc. Many lack the ability to handle these (unless you consider treating a .doc file as an opaque binary file, or an xml file as a structureless text file, "handling" them -- obviously no one seriously counts that). Pdev3
- There is no real need to compare .doc files with an external program, since Word does a much better job at comparing them, automatically flagging inserted and deleted text with the appropriate authors and dates.
- Haha, perhaps you have never tried to merge directories with multiple files (much less directories with hundreds or thousands of files). If you are just doing small work on one document, you are in a small, happy world, yes, but, unfortunately, some are not...
- And note that for any "real" work (at least those kinds of real work where diffing is important, i.e. work where files are interchanged between different people) you shouldn't use .doc files. Word's .doc is not an interchange format. [1]
- Comparing xml files as text actually works quite well, especially if you pipe them through a normalizer before the comparison. Shinobu 10:42, 3 October 2006 (UTC)
- Haha, again, I suspect you have only done small work with one or two files. I bet if you had to deal with real live data, spread over many directories that keep changing, you'd start to get interested in tools that could wrap up the xml normalization for you.
- The number of files to be compared is not really relevant. Just iterate through your files and pipe them through the normalizer to the diff utility. If you're lazy, you can write a shell script or similar utility that does exactly what you want in less than the time needed to figure out what "professional" tool to use and to install it. A similar solution will exist for Word (and it will probably yield better results, since it will automatically play nice with Word's revision control model). Shinobu 19:43, 27 October 2006 (UTC)
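The "iterate, normalize, diff" approach described above might look roughly like the following Python sketch. It is an assumption-laden illustration, not a recommendation of a specific normalizer. The normalizer here is the standard library's C14N 2.0 `canonicalize` followed by re-indentation, and `normalized_lines` and `diff_xml_trees` are names invented for this example. It needs Python 3.9 or later for `ET.indent`.

```python
import difflib
import os
import xml.etree.ElementTree as ET


def normalized_lines(path):
    """Canonicalize an XML file, then re-indent it so a line diff is readable."""
    # C14N sorts attributes; strip_text drops insignificant whitespace.
    canonical = ET.canonicalize(from_file=path, strip_text=True)
    root = ET.fromstring(canonical)
    ET.indent(root)  # one element per line, two-space indentation
    return ET.tostring(root, encoding="unicode").splitlines()


def diff_xml_trees(root_a, root_b):
    """Yield (relative_path, diff_lines) for .xml files that differ in both trees."""
    for folder, _dirs, files in os.walk(root_a):
        for name in sorted(files):
            if not name.endswith(".xml"):
                continue
            path_a = os.path.join(folder, name)
            rel = os.path.relpath(path_a, root_a)
            path_b = os.path.join(root_b, rel)
            if not os.path.isfile(path_b):
                continue  # files missing on one side are out of scope here
            diff = list(difflib.unified_diff(
                normalized_lines(path_a), normalized_lines(path_b),
                fromfile=path_a, tofile=path_b, lineterm="",
            ))
            if diff:
                yield rel, diff
```

With this, two files that differ only in attribute order or indentation produce no diff, while a changed element value shows up as an ordinary changed line. Whether that is enough for "real live data" is exactly the point under dispute in this thread.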
XML Diff Tools
Does anyone have any pointers to XML diff tools? Should we create a separate page for XML diff tools? Thanks --Dan 16:06, 20 January 2007 (UTC)
- There is an XML::Diff Perl library. -- Beland 20:03, 6 March 2007 (UTC)
Non-linear ?
What does non-linear refer to? Is it to do with the time required to compute the diff? Gary van der Merwe (Talk) 09:24, 13 February 2007 (UTC)