Talk:Type safety
From Wikipedia, the free encyclopedia
Contents |
[edit] Garbage collection and type safety
- Typically, a language needs to have garbage collection in order to be type-safe.
Is it so? I don't think gc has anything to do with type safety. The type safety, as described in datatype, is that type rules are enforced rigidly. For example, you cannot performace intergers addition over a value with a character datatype, even though it is perhaps possible on the level of the underlying computer architecture. -- Taku 23:13, Apr 17, 2004 (UTC)
- Well, if you read the article, gc has tons to do with type safety. If you don't have gc, you can possibly change a pointer to anything you want. The confusion lies in that people use the term type-safety in different ways. That's part of the purpose of the article. MShonle 23:16, 17 Apr 2004 (UTC)
-
- Oops, I was thinking of strong typing. Ok, what about smart pointer? Doesn't it make sure the pointer is not broken? -- Taku 23:29, Apr 17, 2004 (UTC)
-
-
- On the GC article, it lists smart pointers as a form of garbage collection. However, reference counting has the flaw that a circularly link list can end up being "dead": unreachable, but still allocated. Reference counting has some pros over mark and sweep, but it can still lead to memory leaks anyway. A hybrid form of reference counting plus mark and sweep probably wouldn't be as fast as a modern garbage collector. MShonle 23:59, 17 Apr 2004 (UTC)
-
Why merge? Datatypes and type-safety are different topics. Might I suggest the type safety discussion of datatype gets placed in here? This page should discussion type safety as in this book [1]
- Sorry I wasn't thinking carefully enough. I saw an article pop up suddely without the context of datatype so I thought it was a result of a contributor who doesn't know about datatype article. I guess I am not sure about this topic. My understading about type-safety is just safety described in datatype article. From the reading of the article, it seems the topic is different from a type system making the language safe. -- Taku 23:29, Apr 17, 2004 (UTC)
-
- Thanks, I think linking to datatype is good idea. I also included in datatype a link to here. It's unfortunate that "type safe" can refer to either strong typing or "complete semantics." In academia, we only use type safety in the mathematical sense (one could say Scheme was strongly typed, but it only has one type (ie. "object"), for example). The industry seems more to go with the other definition. It's not like one usage is "wrong", but in the literature it's good to know a distinction exists. MShonle 23:39, 17 Apr 2004 (UTC)
[edit] Complete semantic model?
Some question:
- The C language could become type-safe if it had a complete semantic model. For example, if C code was translated into Scheme code, together with a model to simulate a machine, that definition of C would be type-safe provided that all of the semantics were complete. In C, you can change pointer values to anything, but this can be simulated with a model of a memory system consisting of Scheme arrays.
Maybe I am not knowledgable enough to understand this. I read in your userpage that your Ph.D student so I don't doubt your expertise in computer science. I am just trying to understand some.
Anyhow, what is a complete semantic model? From reading, it seems C lacks semantic mode while Java and Scheme, so C is not type-safe but Java and Scheme are. Is this simply means C has a mean to bypass a type system? What about C++? Is it as unsafe as C? -- Taku 23:41, Apr 17, 2004 (UTC)
- A complete semantic model can tell you something like "given this program and these inputs, and if I run it for this amount of steps, what is the value of everything?" Semantics can start with lamdba calculus style reduction rules. To prove Java's typesafety, there was a "feather weight Java" model that would give you L-calculus style reductions. But if I have a C program, it can change any memory it cares too, which sometimes can change the whole behavior of the program in an unpredictible manner.
- Making a machine model in Scheme inorder to run any program would essentially require emulating a machine (it's not quite that bad, but close). Because the machine is fixed, it's possible to have complete specifications and determinism.
- C++ is also unsafe, because you can change pointers. You would have to remove pointer arithmetic and others from C++, and give it garbage collection too, to even begin to make it safe. (This essentially gives you something closer to Java.) So as not to seem to be bashing C++, I mention C. I also added that little part about C being used to write gc so as to be sure that C wasn't being bashed either. MShonle 23:55, 17 Apr 2004 (UTC)
-
- Thank you very much. Now the article makes much more sense to me. I am now trying a bit to make the article more accessible. Please feel free to correct any of my errors including gramatical ones. -- Taku 00:14, Apr 18, 2004 (UTC)
-
-
- I've made some minor edits. MShonle 00:21, 18 Apr 2004 (UTC)
-
Why did you remove this [2]? I thought it helps understand what does it mean by bugs making Java's type system unsafe. -- Taku 00:22, Apr 18, 2004 (UTC)
- That link actually talks about a different point that is not related to VM bugs, perse, but it should probably go under External Links or something.
[edit] Type safety/types
Maybe it's just a bad term, but I find the idea of "type-safety" having apparently nothing to do with types rather suspect. Is this some kind of historical accident?
Derrick Coetzee 17:01, 18 Apr 2004 (UTC)
- Apparently, the article is in the context of type theory, not datatype. In the line of datatype, it really doesn't make sense that Scheme is type-safe while being weakly typed. But I did some research. The article is a valid topic after all. I think we should emphasize more aspects of proof of programs in the article. It is not common (well for most of us) to prove a language is mathamtically type-safe. I am also suspecting this has something to do with proof-carrying code or such. Anyway, I am not into this topic and have no books or formal resources to tell much. MShonle is seeming knowledgable. So maybe we can just trust him?
- Anyway, true we have to put more contexts. Without background knowldge, I don't think the article makes much sense, like gc gurantees type-safety. It's probably true but the article needs to articulate more about why and how. -- Taku 17:10, Apr 18, 2004 (UTC)
-
- To Dcoetzee and Takuya: I took a graduate-level class with one of the preeminent researchers behind type safety and type soundness, so this isn't really a topic that comes up for even a typical CS graduate student.
-
- There is a relation to types, in that you don't want, in any way what-so-ever, to change the bits of a piece of data not sanctioned by the language. (Well, in the case of C, changing bits is sanctioned, but the behavior is undefined.) Matthias Felleisen has later said maybe his work should have been called "complete specification" instead of type safety, but, well, no one calls it that.
-
- Whatever the case, there is an insightful piece of human knowledge when you realize garbage collection is essential for security, and therefore worthy of an encylopedia entry. MShonle 20:37, 18 Apr 2004 (UTC)
-
-
- Actually, GC is not necessary for type-safety; GC is merely one mechanism for ensuring type-safe memory allocation. Region-based memory management, for example, is not usually thought of as garbage collection, but (with appropriate type system support) is type-safe. k.lee 07:14, 18 Aug 2004 (UTC)
-
-
-
-
- That's a good point. Perhaps it could be worded as "garbage collection or other restricted memory management systems (i.e., those memory systems without dangling pointers, or where dangling pointers can only point to objects of the same type)". Either way, the point is that people who make rules about strict typecasting (et cetera...) who think they have a type safe programming language are just fooling themselves if they don't have GC or anticipated it with some other form of restricted memory management. MShonle 19:20, 8 May 2005 (UTC)
-
-
[edit] Questionable claims
This article is full of highly questionable claims (Java and Scheme have been "mathematically proven" to be type-safe? Type safety requires a garbage collector?). There isn't even any mention of the key concepts in type-safety, viz. type-preservation and progress! I encourage editors to read the illuminating discussion on this very topic on the TYPES list. The present article only exposes a needless confusion of semantics (types) and implementation details (gc, etc.). — Kaustuv 20:06, 2004 Aug 17 (UTC)
- Note, in it's present state the article discusses preservation and progress, near the very top of the article. Other readers should also see the conversation with Kaustuv and myself below that explains why GC is more than just an implementation detail. (To be fair, I would call it a technical detail, in the same way that, in a mathematical proof, division by zero is a "technical detail" that can be significant enough to make the proof invalid.) --MShonle 02:34, 8 January 2006 (UTC)
[edit] Garbage Collection Confusion
The phrase "typically needs" is pure weasel wording. Either GC is necessary in a type safe language or not. If it is just one way to implement type safety, than the relationship is such that some type-safe languages do not have GC, and some non-type-safe language have GC. I think this qualifies as irrelevant.
The explanation given is weak as it is couched completely in a single example, and makes no effort to give a definitive reason why GC may be required.
- I think this is the thought trail used: type safety doesn't work with pointer arithmetic (see example). Lack of pointer arithmetic "usually" implies gc. Hence the relation. But you're right, it's not a strict claim. You can think of a language that's typesafe but doesn't have gc. Hell, think of a typesafe language without heap memory! Wouter Lievens 19:11, 7 May 2005 (UTC)
It's very difficult for a general language to be typesafe and not have GC. I'll try to explain the concept here: Suppose that there was no GC in your language. Then that would mean you could have so-called dangling pointers. Now suppose you have a dangling pointer to an object that has an integer field. Now suppose that a new object is allocated on top of where this integer field uses to be, but instead of an integer field it's a pointer. If you now take the dangling pointer and change the integer field, you've actually arbitrarily changed the pointer value!
A non-GC language could only get around this problem by only allowing objects of the same type to be allocated in only the same locations. Reference counting could also be used, but reference counting is really considered a GC technique... but that's a whole other topic. MShonle 22:14, 7 May 2005 (UTC)
I've updated the article. I also removed a rather snide, philistine comment that these claims about garbage collection were "dubious". I'm amazed how many times I've had to repeat this fact that appears repeatedly in computer science literature. I hope now that my saying "garbage collection or specialized memory management" will clear up any doubts.
- Can you provide citations for this claim that type-safety requires garbage collection? Perhaps in Pierce's book? — Kaustuv 22:27, 2005 May 13 (UTC)
-
- Yes, it's in Pierce. I don't have my copy here, but you could probably find it by looking up garbage collection in the index. Besdies, haven't I already supplied enough reasons? Has reasoning gone out of fashion? Anyway, here's a whole paper that assumes the topic: Memory Safety Without Runtime Checks or Garbage Collection. Note that the result of the paper has been addressed in the article by saying "or specialized memory management". Either way, it's completely non-trivial and it's vital to the understand of security and typesafety to realize that introducing typing and avoiding all casts can still get you into trouble (unless you use GC like Java, C#, Scheme, ML, Smalltalk... or the other tricks). MShonle 22:57, 13 May 2005 (UTC)
-
-
- The paper you mention talks about memory safety, which I would claim is quite different from type safety. The first is fundamentally an implementation detail, and the second fundamentally a semantic detail. I assume you are not proposing to consider them to be the same problem. (Note that walking off the ends of arrays is possible even in type-safe languages like SML.) I await your specific citation from Pierce where the claim that "type safety requires garbage collection (or specialized memory management)" is substantiated. Page numbers will be fine. — Kaustuv 23:10, 2005 May 13 (UTC)
-
-
-
-
- Why would Pierce saying 2+2=4 be any more true than my saying it? I think I've fully demonstrated to you how garbage collection is an important element of typesafety. Note that when you go off the end of the array, if you get an exception that's perfectly fine: the semantics say what happens when an exception is thrown. What's *not* ok is that the semantics of C never say what happens when you overwrite an array. What happens could affect even what code gets executed next. No one should ever claim that C is a typesafe language: calling it unsafe is not a value judgment on C itself. I've already made it clear in the article that garbage collectors are best written in languages like C (and Pierce makes the same point).
-
-
-
-
-
- (BTW, just read the introduction or so to the paper. They make the same claims I do. Only *after* that do they talk about memory safety.)
-
-
-
-
-
- If you do not understand the dangling pointer problem, you should ask me questions (here, on the talk page) and we'll try to work it out. If you think that undefined behavior (such as deferencing a floating-point number) can still be typesafe, then that's a completely separate discussion. If you just want to see Pierce say it, well, wait until I get home to the book and perhaps we can stop wasting our time arguing who says 2+2=4. MShonle 23:22, 13 May 2005 (UTC)
-
-
Here is the much-wanted quote from Pierce, in his book Types and Programming Languages, 2002. Page 158:
- "A last issue that we should mention before we move on formalizing references is storage deallocation. We have not provided any primitives for freeing reference cells when they are no longer needed. Instead, like many modern languages (including ML and Java) we rely on the run-time system to perform garbage collection, collecting and reusing cells that can no longer be reached by the program. This is not just a question of taste in language design: it is extremely difficult to achieve type safety in the presence of an explicit deallocation operation. The reason for this is the familiar dangling reference problem: we allocate a cell holding a number, save a reference to it in some data structure, use it for a while, then deallocate it and allocate a new cell holding a boolean, possibly reusing the same storage. Now we can have two names for the same storage cell--one with type Ref Nat and the other with type Ref Bool."
MShonle 01:31, 14 May 2005 (UTC)
- I had a comment here before, but realised on second thought that it was garbage. Thanks for that reference. — Kaustuv 01:50, 2005 May 14 (UTC)
Just to be sure: Type safety is about programs "not going wrong." What I've described here is a way that, even if you can't manipulate the bits of a pointer directly (like typecasting in C allows) you can still make the program "blow up" through this complex dangling pointer example I've described. To get around this dangling pointer problem you can either use garbage collection (common), or pools of object allocations where only structurally equivalent objects get allocated in the same memory regions (rarely used). If you don't fall into either GC or this (relatively) rare management, the language isn't typesafe. MShonle 21:59, 13 May 2005 (UTC)
- I've rephrased the discussion of GC and type safety. Saying that type safety requires GC is strictly-speaking wrong, IMHO. Consider a hypothetical language which disallows pointer aliasing (so assigning one pointer to another would be a deep copy, not a shallow copy). Or a language in which when an allocation is released, all pointers to the released storage are set to the null pointer. Both languages don't use GC, and while they are of dubious practical value, they also don't suffer from the specific type safety problem the article describes.
While we can debate the GC issue endlessly, I think the more important point is that the article should not focus on this question in the first place. The article should discuss the question "what kinds of operations can we allow in a type-safe language?" As it turns out, one of the kinds of operations that need to be disallowed are some combinations of typical "pointer" types and manual memory management. Those restrictions should be the focus of the discussion — the question of how one ought to implement resource management in a language without those properties is of secondary concern (so it should be mentioned in passing, as my revision of the article does). In other words, the article should speak about language semantics; GC is an implementation detail. Neilc 14:12, 14 May 2005 (UTC)
-
- Actually, we don't say type safety requires GC. We say it requires GC or some other memory management heroics. You prove the point with your two alternative examples (you suggest something as complex as the mark operation; and your deep copy suggestion seems to be impractical). I think it's highly important that the article address the GC issue, for one reason it's completely non-intuititive but it's something every type systems reseracher must know. Also, please check out the Peirce quote, about three paragraphs above this message. He's one of the pre-eminent researchers in the area, and I'll take his word for it, as should the wikipedia. MShonle 18:00, 14 May 2005 (UTC)
-
-
- Yeah, I'm not saying we needn't mention GC, I'm saying the emphasis of the article was wrong, and is just as wrong in your updated revision. I've made some changes that I think improve that. As for the "GC best implemented in languages with pointer arithmetic", it seems irrelevant to the article, so I've removed it again. Neilc 00:55, 15 May 2005 (UTC)
-
-
-
-
- The implementation note is not irrelevant. It serves three purposes, perhaps more: (1) it brings back the connection that you can have safe programs in type unsafe languages; (2) it shows that to create type safe systems you might need unsafe features; and thus (3) the article takes a neutral perspective toward C programmers. Many could read the article and think it was bashing C, when in reality it is saying "in order to implement systems with these properties that we're talking about, we need something like C."
-
-
-
-
-
- As for your opinion about the emphasis of the article, perhaps you can provide some reasoning other than you saying that you think it's "wrong." I've already addressed your other concerns. Moreover, there is nothing factually wrong in the article as I mostly recently edited it. I'm going to also change some of your wordings, which ruined the meaning. MShonle 01:51, 15 May 2005 (UTC)
-
-
-
-
-
-
- How did my wording "ruin" the "meaning"? Regarding what I think is "wrong", I already explained my reasoning for what the emphasis of the article should be; read my first comment. You never replied to that part of my comment, though, so I didn't bother repeating myself. Anyway, I'm reasonably happy with the current state of the article; I still disagree that the GC implementation note is worth including, but it's not a big deal. Neilc 02:21, 15 May 2005 (UTC)
-
-
-
-
-
-
-
-
- I'm glad we've reached consensus, because I'm tired of people reading this true, well-reasoned, logical, and interesting note on garbage collection and tring to reject it as non-factual. I felt that your intro sentence ruined the meaning by saying "place restrictions on ... manual memory management". To me, it no longer is manual memory management when it's GC (the qualifier "manual" is what ruined the meaning). Thus, I changed it to say "restrictions on the allocation and deallocation of memory". You can have GC, which certain does this (on the deallocation side), or region-based allocation (on the allocation side). Sorry for using strong words like "ruined", but I was quite miffed for you saying the article was "wrong" when it spoke the plain truth, and you simply disagreed with the style in which it was presented. MShonle 02:41, 15 May 2005 (UTC)
-
-
-
-
The requirement of storage reclamation (garbage collection or otherwise) is entirely a pragmatic one, having nothing to do with semantics. You could perfectly well define a language that has bounded storage requirements (requiring fixed-size arrays, no pointers, and no recursion, even indirectly -- and yes, such languages do exist), or that fails when it runs out of storage (the cleanest semantics can't prevent that in general...). The wording needs to be changed to reflect all this. --Macrakis 00:36, 27 January 2006 (UTC)
[edit] New "Definitions" section
I originally just wanted to add the link to Wright and Felleisen, who are generally credited with having invented Preservation and Progress. I ended up creating a new Definitions section, and became worried about the fact that the article defined type soundness in such a way as to completely rule out denotationally-specified languages (they have nothing analogous to Progress). W&F's paper describes some of the denotational-semantics-based notions of safety that were around before they did their thing; it seems like as long as this article is going to mention formal stuff we should at least leave room for someone who knows about denotational semantics to put in a word or two, so I tried to do that.
I have also clarified a bit how a dynamically-typed language can be type safe.
Another thing I think this article should say but doesn't is: Only in a type-safe language are type annotations really meaningful. In C, you can declare a local variable
struct something *x;
but, because C is not type-safe you can never be sure that x
points to a legitimate something
unless you have just created it yourself. In Java, on the other hand, if you say
Something
x;
then anything that might result in x
being anything other than an instance of Something
(or of one of its subclasses) will be illegal.
And since Ph.D. students in programming languages seem to be playing a big role in this discussion, I'll mention that I am one too. :-)
Cjoev 00:04, 24 January 2006 (UTC)
[edit] Theory vs. Praxis.
While I am a great advocate of type safetyI feel that the article is to much on the theoretical side of live.
- Type safety is usually a requirement for any toy language proposed in programming language research.
So more I read the discussion here so more if wonder if any "real" programming language can actualy archive "Type safety" as described here. That's unless they give up on the option of beeing self-hosted:
- Garbage collectors are best implemented in languages that allow pointer arithmetic, so that the library that implements the collector itself is best done in a type-unsafe language like C.
Actualy most languages mentioned here as type save are not self-hosted.
This reminds me of Why Pascal is Not My Favorite Programming Language - especial the Chaper 2.6. "There is no escape". Question: Is a type save language allowed to have an escape? Example: I allways considered Ada to be type save - and indeed: Ada's default behavior follows the two rules outlined in this module. However Ada does have escapes in the form of Unchecked_conversion and Unchecked_Deallocation (and some others).
--Krischik T 10:18, 25 January 2006 (UTC)
- It depends on what you mean by "escape". If attempts to escape are checked at run time and fail if they don't make sense (as with Java casts, for example), then the language can be type safe. If it is possible to escape in ways that don't make sense, without the language implementation noticing, then that means the language is not type safe. I'm a bit fuzzy on the semantics of the Ada constructs you mention, but certainly the fact that they are called "unchecked" suggests that they break type safety, making the statement that "Ada ... [has] been shown to be type safe" highly suspicious to say the least.
-
- Well in Ada is a case of [[grep]ing for "Unchecked_Deallocation", "Unchecked_Convertion" and "Unchecked_Access". If none of those have been used then the type system has not been violated and as such is save. Ada serves a dual purpouse: The designers wanted a language with type savety which is suitable for Embedded system programming as well. But it seams that System programming and Type safety are indeed mutualy exclusive.
-
- Ada has ben supplied with two convertions. A checked convertions like in Java (even has the same functions style syntax) and Unchecked_Convertion which is a generic/template.
- For that matter, I don't think it is true that Pascal (the classic "type-safe language" that everyone hated because of its restrictiveness) is really type safe -- if I remember correctly, at least the classic Pascal language from the old days was not, due to a problem with variant records. I don't know about more modern Pascal dialects.
- In theory, type safety is a black and white issue -- once one has specified the type system, the run-time behavior and what it means to "go wrong", either the language is type safe or it's not. Based on the above I would say that Pascal and Ada both fail the test. In practice, I suppose it is the case that a number of languages are "mostly type safe" in the sense that they have some unsafe constructs but it's understood those are only to be used in desperate situations and even then very carefully. (Even if the language itself is safe, it's possible to link unsafe libraries into a program; this is mentioned in the article.) From a formal standpoint, of course, such languages are type unsafe and that's all there is to it.
- By the way, I think it is safe to say that none of the many type safe languages in use today can be self-hosting. Most of them, however, do bootstrap, and attempt to limit the use of unsafe languages to the garbage collector, system libraries, and as few other parts of the runtime system as they can manage.
- Cjoev 20:57, 25 January 2006 (UTC)
Krischik, I hope you won't mind, but I reorganized your most recent contribution. I hope I preserved your intent, if not your language. I think what our conversation has shown is that when it comes to judging real-world languages and implementations, there is a large gray area between C (really really unsafe) and something that satisfies the formal definition to the letter. Based on that, I think that rather than just listing languages that fall into various categories, it may make more sense to discuss the issues associated with various languages. That's what I've tried to do (and I added a little bit about my favorite language). I think this list can grow: Java, for example, has quite a history of type safety bugs being discovered and patched. Most of them have to do with class loaders and things. Hopefully someone knowledgeable can fill that in.
Cjoev 01:22, 27 January 2006 (UTC)
- Well done - I think your approach ist better then mine - putting the issues up front. --Krischik T 07:12, 27 January 2006 (UTC)
I just thought of another point. You have said many times that Ada's "default behavior" is type safe. I would have said the opposite: "By default," Ada compilers let you use the type-unsafe features whenever you want. Therefore, absent some assurance to the contrary (like what you may get from grepping the code), any given Ada program should be presumed unsafe. Put another way, if I am presented with the binary form of a third-party library written in Ada and don't have access to the source code, I can't just assume its authors did not use any Unchecked functions! What I think you mean, and what I have tried to express, is that type-safe programming is supposed to be the default behavior of Ada programmers in the sense that use of the unsafe constructs is supposed to be rare.
Cjoev 22:31, 26 January 2006 (UTC)
- Indeed true. I have kicked of some discussion about the issues involved at [3]. This article got me thinking about some issues. --Krischik T 07:12, 27 January 2006 (UTC)
[edit] Java's type safety
Is it true that Java is believed to be type safe? Seems to me you can put an object of any class into a container, then pull it out as any other class. The first time you attempt to access any member, your program goes boom! If you believe that to be type safe you are working from a different definition of the term than I. —The preceding unsigned comment was added by 209.4.89.67 (talk • contribs) 22:35, 26 January 2006 (UTC)
- No, the program doesn't go boom. It raises an exception. There's a big difference, formally speaking: exceptions, the fact that you can catch and handle them, and the fact that they are raised in certain situations, are part of the language specification and hence completely predictable. You might even imagine a programmer relying on that exception to detect a certain perfectly normal condition in the program (but you'd hope no one really does this). In contrast, when a C++ cast goes wrong, you don't notice right away but the eventual result is implementation-dependent, that is, outside the specification of the language. This may be a different definition than yours, but it is the real definition.
- Cjoev 01:03, 27 January 2006 (UTC)
[edit] Degenerate type system?
assume for purposes of discussion that a certain implementation of a certain language has some type, say t, such that there exists some sequence of bits (of the appropriate length) that is not a legitimate member of t. (All but the most degenerate type systems satisfy this criterion.)
Um, what sequence of bits is invalid in an average int, float, or pointer type? Are type systems that only have a few really basic types degenerate? C, for instance, only has integer (including chars), floating-point, array, and pointer types, doesn't it? Any sequence of bits of the appropriate length would be valid for any of those, unless I'm missing something. (Well, a null pointer generally indicates a problem, but it's still a valid value, I should think.) —Simetrical (talk • contribs) 16:44, 19 May 2006 (UTC)
- I wrote that. Perhaps "degenerate" is not the right word; "trivial" might be better. Note that the example only requires one type in the entire language that has a non-member. Int and float don't qualify, but is a sequence of bits a valid pointer value if it points to inaccessible memory? Can a value be a function pointer if it doesn't point to code? Perhaps in C, but not in languages that satisfy stronger notions of type safety or memory safety. In that sense, yes, I would say C's type system is trivial and/or degenerate. Surely any system for "classifying" values is degenerate if it doesn't distinguish anything from anything else? Cjoev 22:30, 21 May 2006 (UTC)
-
- You're correct that some pointers might point to bad areas of memory, or might be null (which signifies no area of memory), but such values are still valid, just not useful. If you tell the program to assign such a value to a pointer, it will dutifully do so, because those values are valid for the type within the language.
- As I acknowledged, that's true in C, but it's not true of all languages by any means. If you reach into the memory of a running Java program and replace the contents of a variable of reference type (array or object) with random bits, you will most likely have introduced an error of a kind that could not possibly have occurred otherwise. On the other hand, if you do that to a C program, sure, you'll probably crash it, but not in a way that the language could ever have ruled out. Based on this, I say that the bits you wrote are not a valid value for the Java variable, since putting them there can cause the program to behave in very un-Java-like ways, even though you might say they are a valid value for the C variable, since you didn't break any language safety properties by introducing them. Cjoev
- See, the thing is, C classification of values does distinguish between different types of the same size. Pointers are usually the same length as ints, at least at the moment, but syntactically the two are treated differently: attempting to, for instance, set an integer to a string will result in a compiler error, whereas setting a pointer to equal a string is perfectly valid. The difference is that the distinction is made at the compiler level, not at the machine level: the compiler will catch incorrect typing, but once you've reduced yourself to assembly, the actual bit-patterns will be identical.
So basically: given that any type system that prohibits certain types (of a given length) from containing certain bit-patterns can always be reformulated (often more efficiently) via simple mapping such that all bit-patterns are valid, I don't think that the machine-level implementation of a language's types is a particularly good reason to declare it degenerate or not, or even trivial or not. —Simetrical (talk • contribs) 02:59, 24 May 2006 (UTC)
- Touché. I admit that looking at machine implementation is a bad way to judge languages. Normally I go around telling other people that. But: (1) the paragraph that started this discussion was about the relationship between type safety and memory safety, and the latter is an implementation issue (I think this was mentioned in a discussion much higher on this page), so meaningful discussion has to talk about implementation; and (2) we have to talk about any language on the level at which its semantics is defined. C is, if I'm not mistaken, defined in terms of its implementation on a machine; therefore it is impossible to say anything about its type system without referring to bits and things.
- Obviously, C compilers do static type checking, and certain expressions (like your string-to-integer assignment example) are not allowed. But that doesn't change the fact that any value can end up in any variable of the correct size, regardless of type. Sometimes you can just do the assignment, sometimes you have to do a cast, and sometimes you have to do other nasty tricks. This means we have to choose one of two possibilities: either C's type system is unsound, which I think qualifies it as degenerate, or it is sound but ultimately doesn't distinguish anything (in the sense that every type has the same set of values, whether you phrase that in terms of bits or in syntactic terms) and is therefore trivial.
- Bottom line, do you disagree with my argument that type safety and memory safety are related, or do you only disagree with my use of the word "degenerate"? If the former, we should talk about that instead. If the latter, can we end this argument by changing "all but the most degenerate" to "most"? Cjoev 04:41, 24 May 2006 (UTC)
- Well, the thing is, the types differ on the semantic level, not on the implementation level. If (hypothetically) you were to have a float and integer of the same bit length, any bit-pattern would be valid for either, but any given bit pattern (except 0000 0000 ...) would mean something different in each type. I would view the precise bit-patterns allowed for a type to be a minor technicality; they aren't fundamentally related to the nature of the type, which is to assign semantic value to bits and define interactions with differently-assigned bits. So I'd leave out mention of bit-patterns entirely.
- Hm. If the "nature" of a type is to assign semantic value to bits, then how can you expect to leave bit patterns out? Cjoev
- Because anything can be summarized as a bit-pattern, but that's typically irrelevant to any particular point about the thing in question. It doesn't matter to type safety whether the bit-patterns are the same or different, it matters that the meaning is different. —Simetrical (talk • contribs) 03:44, 11 June 2006 (UTC)
- Hm. If the "nature" of a type is to assign semantic value to bits, then how can you expect to leave bit patterns out? Cjoev
- As for memory safety and type safety: insofar as "memory-safe" means "you can't point to something insane" and "type-safe" means "you can't assign insane values to variables", and insofar as pointers (where they exist) are variables, yes, type safety and memory safety do seem related. Specifically, if you define assigning unreasonable (i.e., unused by the program: for writes, not requested for use; for reads, not initialized) values to pointers as type-unsafe, and assume that the only way to be memory-unsafe is to have a pointer pointing to something unreasonable, it's fairly clear that memory safety is a subset of type safety.
Or at least, that's how I see it, having read the article. There's no need to bring bit-patterns into it: even if the uninitialized data you're reading into the variable happens to match the bit-patterns allowed by type t, you've still broken memory safety by reading them. —Simetrical (talk • contribs) 05:52, 25 May 2006 (UTC)
- I argued in two directions. The bit pattern business was in the first direction, namely that memory-unsafe languages are also type-unsafe. Here the argument is that if you can read from an insane location, you can't expect to get a sane value. The catch is that if all values are considered sane, in the sense that anything you can possibly read from any location in memory is meaningful as a member of any type whatsoever, then of course you can expect to get a sane value. The fact that you've broken memory safety by reading them means nothing, because I've already assumed the language is memory-unsafe and am trying to argue that it is consequently also type-unsafe. So in order for the argument that memory-unsafety breaks type safety to hold water, there must exist some insane value that might be read from an insane location. My argument in the other direction, that certain kinds of type-unsafe behavior also break memory safety, had nothing to do with bits, triviality or degeneracy. Cjoev 00:07, 26 May 2006 (UTC)
- If you take random bits and copy them into a C int, you've broken type safety (unless you intended to create a random or otherwise arbitrary number, in which case copying the bits is just a way to generate a pseudorandom number). What results will be an integer, but a totally meaningless one. So what if it said:
Type safety is closely linked to so-called memory safety. To see why, assume for purposes of discussion that a certain implementation of a certain language has some type, say t, such that there exists some—Simetrical (talk • contribs) 03:44, 11 June 2006 (UTC)
sequence of bits (of the appropriate length)concept that is not a legitimate member of t. (All but the most degenerate type systems satisfy this criterion.) If that language is memory-unsafe to the extent of allowing data to be copied from unallocated or uninitialized memory into a variable of type t, then it is not type safe, because such an operationmightwould assign a non-tvaluemeaning to that variable. Conversely, if the language is type unsafe to the extent of allowing an arbitrary integer to be used as a pointer, then it is clearly not memory safe.- In the time that has passed since the last time I looked at this discussion, I've lost track of the reason I thought I had for saying "sequence of bits". What I really wanted to write at the time was "value" but for some reason I decided against it. I think "concept" is unnecessarily abstract. Whether the assignment "would" break type safety or just "might" depends on just how the language happens to define the outcome -- a gray area on which many informal language definitions are silent. I will boldly change "sequence of bits" to "value". Cjoev 18:44, 15 September 2006 (UTC)
- If you take random bits and copy them into a C int, you've broken type safety (unless you intended to create a random or otherwise arbitrary number, in which case copying the bits is just a way to generate a pseudorandom number). What results will be an integer, but a totally meaningless one. So what if it said:
- I argued in two directions. The bit pattern business was in the first direction, namely that memory-unsafe languages are also type-unsafe. Here the argument is that if you can read from an insane location, you can't expect to get a sane value. The catch is that if all values are considered sane, in the sense that anything you can possibly read from any location in memory is meaningful as a member of any type whatsoever, then of course you can expect to get a sane value. The fact that you've broken memory safety by reading them means nothing, because I've already assumed the language is memory-unsafe and am trying to argue that it is consequently also type-unsafe. So in order for the argument that memory-unsafety breaks type safety to hold water, there must exist some insane value that might be read from an insane location. My argument in the other direction, that certain kinds of type-unsafe behavior also break memory safety, had nothing to do with bits, triviality or degeneracy. Cjoev 00:07, 26 May 2006 (UTC)
- Well, the thing is, the types differ on the semantic level, not on the implementation level. If (hypothetically) you were to have a float and integer of the same bit length, any bit-pattern would be valid for either, but any given bit pattern (except 0000 0000 ...) would mean something different in each type. I would view the precise bit-patterns allowed for a type to be a minor technicality; they aren't fundamentally related to the nature of the type, which is to assign semantic value to bits and define interactions with differently-assigned bits. So I'd leave out mention of bit-patterns entirely.
- You're correct that some pointers might point to bad areas of memory, or might be null (which signifies no area of memory), but such values are still valid, just not useful. If you tell the program to assign such a value to a pointer, it will dutifully do so, because those values are valid for the type within the language.
[edit] Reverted deletions
I just reverted a major edit that removed large amounts of content with insufficient explanation: the worst of these was the deletion of the entire section on memory management on the basis that "garbage collection has nothing to do with type safety", a subject that has already been argued extensively on this page. Other deletions included Robin Milner's famous slogan, the nod to denotational semantics and a slightly philosophical discussion of the meaning of "unsafe". Cjoev 21:55, 22 September 2006 (UTC)